YouTube Audio Download & Chinese Subtitle Generation (Ubuntu + pyenv + faster-whisper): A Complete Guide
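Use cases:
- The YouTube video has no subtitles at all
- You need high-quality Chinese subtitles (SRT/TXT) generated locally
- Well suited to finance interviews and to preparing corpora for AI analysis
- Python versions are managed with pyenv on Ubuntu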
I. System Requirements
1. Operating system
- Ubuntu 20.04 / 22.04 / 24.04 (verified)
2. Required system packages (APT)
    sudo apt update
    sudo apt install -y ffmpeg git curl wget ca-certificates \
      build-essential pkg-config libssl-dev zlib1g-dev libbz2-dev \
      libreadline-dev libsqlite3-dev llvm libncursesw5-dev xz-utils \
      tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Notes:
- ffmpeg: audio decoding (required)
- The remaining packages are build dependencies for pyenv / compiling Python
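Before moving on, it can help to confirm that the tools installed above are actually on PATH. The following is a minimal sketch, not part of the original workflow: the function name `missing_commands` and the `REQUIRED_COMMANDS` list are illustrative choices, and you may want to check a different set of commands.

```python
import shutil

# Illustrative list; adjust it to the commands your setup relies on.
REQUIRED_COMMANDS = ["ffmpeg", "git", "curl"]


def missing_commands(commands):
    """Return the entries of `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Missing commands:", ", ".join(missing))
    else:
        print("All required commands are available.")
```

If anything is reported missing, re-run the apt install step for that package.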
II. Python Environment (pyenv)
1. Use pyenv (this is the setup you are already on)
Example:
    pyenv install 3.10.14
    pyenv local 3.10.14
Confirm that Python comes from pyenv:
    which python
III. Required Python Packages
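1. faster-whisper (core)
    pip install faster-whisper
Notes:
- Runs Whisper inference locally (no network needed once the model has been downloaded)
- Supports CPU and GPU
- large-v3 is the most reliable model for spoken Chinese on finance topics
(Optional) Other commonly used packages
    pip install torch numpy
Note: not required for CPU-only use. The torch CUDA version only matters when you run on a GPU (e.g. an RTX 4090).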
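To confirm that the packages landed in the interpreter pyenv selected, you can probe for them without importing them. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `has_module` is an illustrative choice.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The import name is faster_whisper; the pip package is faster-whisper.
    for module in ("faster_whisper", "numpy"):
        status = "OK" if has_module(module) else "missing"
        print(f"{module}: {status}")
```

Run it with the same `python` that `which python` reported, otherwise the result describes a different environment.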
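IV. yt-dlp (YouTube Download Tool)
1. Installation
    sudo apt install yt-dlp
Or, for the latest version:
    pip install -U yt-dlp
2. Required options (important)
Because of YouTube's anti-scraping measures, it is strongly recommended to always use:
- Browser cookies
- The EJS remote components
    --cookies-from-browser chrome
    --remote-components ejs:github
Close the browser first; otherwise its cookie database may be locked.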
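If you drive yt-dlp from Python rather than from the shell, the two options above can be assembled as a list so each one is easy to switch off. This is a hedged sketch: the function name `build_ytdlp_args` and its parameters are illustrative, and only the two flags already shown in this section are used.

```python
def build_ytdlp_args(browser="chrome", use_cookies=True, use_remote_components=True):
    """Return the anti-bot options as a list of command-line arguments."""
    args = []
    if use_cookies:
        # Read cookies from the named browser's profile.
        args += ["--cookies-from-browser", browser]
    if use_remote_components:
        # Allow yt-dlp to fetch the EJS challenge solver from GitHub.
        args += ["--remote-components", "ejs:github"]
    return args


if __name__ == "__main__":
    print(build_ytdlp_args())
```

The result can be spliced into a subprocess call, for example `subprocess.run(["yt-dlp", *build_ytdlp_args(), url])`, where `url` is whatever video you are fetching.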
V. Audio Download (Audio Only, No Video)
1. Recommended format (m4a, ID=140)
    yt-dlp --cookies-from-browser chrome --remote-components ejs:github \
      -f 140 -x --audio-format m4a \
      "https://www.youtube.com/watch?v=VIDEO_ID"
2. Automatic fallback selection (more robust)
    yt-dlp --cookies-from-browser chrome --remote-components ejs:github \
      -f "140/bestaudio[ext=m4a]/bestaudio[ext=webm]/bestaudio/best" \
      -x --audio-format m4a \
      "https://www.youtube.com/watch?v=VIDEO_ID"
VI. Generating Chinese Subtitles (faster-whisper)
1. Single-file transcription (example script)
    from faster_whisper import WhisperModel
    import os

    audio = "example.m4a"
    base = os.path.splitext(audio)[0]

    model = WhisperModel(
        "large-v3",
        device="cpu",          # switch to "cuda" if a GPU is available
        compute_type="int8",
    )

    segments, info = model.transcribe(
        audio,
        language="zh",
        beam_size=5,
        vad_filter=True,
    )

    def ts(t):
        # SRT timestamp HH:MM:SS,mmm. Working in integer milliseconds
        # avoids rounding 59.9996 s up to an invalid "60,000".
        ms = int(round(t * 1000))
        h, ms = divmod(ms, 3600000)
        m, ms = divmod(ms, 60000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    with open(base + ".zh.srt", "w", encoding="utf-8") as f:
        i = 1
        for seg in segments:
            text = seg.text.strip()
            if not text:
                continue  # skip empty segments
            f.write(f"{i}\n{ts(seg.start)} --> {ts(seg.end)}\n{text}\n\n")
            i += 1
Output files:
- xxx.zh.srt (subtitles)
- xxx.zh.txt can additionally be written as plain text
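Writing the optional TXT file takes a little care, because the `segments` value returned by faster-whisper's `transcribe()` is a generator and can only be iterated once. One way to handle that is to write both files in the same pass. The sketch below is illustrative: `write_srt_and_txt`, `srt_timestamp`, and `demo` are hypothetical names, and the stand-in segments are plain objects with `start`, `end`, and `text` attributes so the code runs without loading a model.

```python
import os
import tempfile
from types import SimpleNamespace


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm using integer milliseconds."""
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def write_srt_and_txt(segments, base):
    """Write base.zh.srt and base.zh.txt in a single pass over `segments`."""
    srt_path, txt_path = base + ".zh.srt", base + ".zh.txt"
    with open(srt_path, "w", encoding="utf-8") as srt, \
            open(txt_path, "w", encoding="utf-8") as txt:
        index = 1
        for seg in segments:
            text = seg.text.strip()
            if not text:
                continue  # skip empty segments in both outputs
            srt.write(f"{index}\n{srt_timestamp(seg.start)} --> "
                      f"{srt_timestamp(seg.end)}\n{text}\n\n")
            txt.write(text + "\n")
            index += 1
    return srt_path, txt_path


def demo():
    """Run the writer on stand-in segments and return both file contents."""
    # A generator, like the one model.transcribe() returns.
    fake = (SimpleNamespace(start=s, end=e, text=t) for s, e, t in [
        (0.0, 1.5, " hello "), (1.5, 2.0, "   "), (2.0, 3.25, "markets today"),
    ])
    with tempfile.TemporaryDirectory() as tmp:
        srt_path, txt_path = write_srt_and_txt(fake, os.path.join(tmp, "example"))
        with open(srt_path, encoding="utf-8") as f:
            srt_text = f.read()
        with open(txt_path, encoding="utf-8") as f:
            txt_text = f.read()
    return srt_text, txt_text


if __name__ == "__main__":
    srt_text, txt_text = demo()
    print(srt_text)
    print(txt_text)
```

With a real model, pass the `segments` object from `model.transcribe()` directly in place of the stand-in generator.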
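VII. The One-Step Script (yt_asr.sh)
Features
- Input: a YouTube URL or a local audio file
- Automatically:
  - downloads the audio (m4a)
  - generates Chinese subtitles (SRT + TXT)
- Compatible with pyenv / CPU / GPU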
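The "URL or local file" input handling comes down to one prefix test followed by a file check, which mirrors the is_url test used by yt_asr.sh. A minimal Python sketch of the same dispatch, with the hypothetical helper name `classify_input`:

```python
import os
import re

# Same rule as the script: anything starting with http:// or https:// is a URL.
URL_PATTERN = re.compile(r"^https?://")


def classify_input(item):
    """Return "url", "file", or "unknown" for one command-line input."""
    if URL_PATTERN.match(item):
        return "url"
    if os.path.isfile(item):
        return "file"
    return "unknown"


if __name__ == "__main__":
    for item in ("https://www.youtube.com/watch?v=VIDEO_ID", "example.m4a"):
        print(item, "->", classify_input(item))
```

URLs are downloaded first, files go straight to transcription, and anything else should be rejected with an error.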
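Key pitfalls (the ones you have already hit)
- A Bash function's "return value" must be clean. In download_audio():
  - logs must go to stderr
  - stdout must contain only the final audio path
- Otherwise the log text is captured as part of the path, and you get:
    ❌ Audio not found: ⬇️ Downloading audio ...
The correct approach:
    echo xxx >&2
    yt-dlp ... >&2
    printf '%s\n' "$audio_path"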
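The same stdout/stderr discipline applies when a Python program captures another process's result. The sketch below is only an illustration of the principle: `CHILD_CODE` is a stand-in for download_audio() (it downloads nothing), and `run_and_capture_path` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in for download_audio(): logs go to stderr, the result path to stdout.
CHILD_CODE = (
    "import sys\n"
    "print('Downloading audio ...', file=sys.stderr)\n"
    "print('out/example.m4a')\n"
)


def run_and_capture_path(code):
    """Run `code` in a child Python process; return (stdout, stderr) text."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,  # must carry only the path
        stderr=subprocess.PIPE,  # logs stay separate and cannot leak into it
        text=True,
        check=True,
    )
    return result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    path, logs = run_and_capture_path(CHILD_CODE)
    print("path:", path)
    print("logs:", logs)
```

Because the log line never touches stdout, the captured value is exactly the path and can be passed on without any cleanup.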
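VIII. Quick FAQ
Q1: YouTube shows "no subtitles"?
A: You have to run ASR yourself; yt-dlp cannot help here.
Q2: bestaudio fails?
A: Run --list-formats first, then pick 140.
Q3: YouTube asks for bot verification?
A:
    --cookies-from-browser chrome
    --remote-components ejs:github
Q4: Inaccurate results on spoken Chinese finance content?
A:
- Use large-v3
- Enable vad_filter=True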
IX. Recommended Directory Layout
    YoutubeLearnAStock/
    ├── yt_asr.sh
    ├── out/
    │   ├── *.m4a
    │   ├── *.zh.srt
    │   ├── *.zh.txt
    │   └── logs/
    └── README.md
X. What You Can Do Now
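- ✅ No dependence on YouTube's own subtitles
- ✅ Batch generation of high-quality Chinese subtitles
- ✅ Output that feeds directly into:
  - A-share interview reviews
  - AI analysis / RAG
  - Long-term corpus building
This is a professional-grade workflow, not a "subtitle download trick".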
XI. The Script
The one-step script:
    #!/usr/bin/env bash
    # yt_asr.sh - Final version + urls.txt batch mode
    # Supports:
    #   1) URLs / local audio files as args
    #   2) --urls-file urls.txt (one URL per line)
    # Robust for Ubuntu + pyenv + faster-whisper

    set -euo pipefail

    # -----------------------------
    # Defaults
    # -----------------------------
    OUTDIR="${OUTDIR:-./out}"
    ASR_LANG="${ASR_LANG:-zh}"          # zh / en / ja / ...
    MODEL="${MODEL:-large-v3}"
    DEVICE="${DEVICE:-cpu}"             # cpu | cuda
    COMPUTE="${COMPUTE:-int8}"
    BROWSER="${BROWSER:-chrome}"
    USE_COOKIES="${USE_COOKIES:-1}"
    USE_REMOTE_COMPONENTS="${USE_REMOTE_COMPONENTS:-1}"
    KEEP_AUDIO="${KEEP_AUDIO:-1}"
    FORMAT_SELECT="${FORMAT_SELECT:-140/bestaudio[ext=m4a]/bestaudio[ext=webm]/bestaudio/best}"
    AUDIO_FORMAT="${AUDIO_FORMAT:-m4a}"
    VAD_FILTER="${VAD_FILTER:-1}"

    URLS_FILE=""

    # -----------------------------
    # Helpers
    # -----------------------------
    die() { echo "❌ $*" >&2; exit 1; }
    log() { echo "👉 $*" >&2; }

    need_cmd() { command -v "$1" >/dev/null 2>&1 || die "Missing command: $1"; }

    ensure_python_pkg() {
      local mod="$1"
      local pkg="${2:-$1}"
      log "🔎 python: $(command -v python)"
      if python - <<PY >/dev/null 2>&1
    import importlib.util, sys
    sys.exit(0 if importlib.util.find_spec("$mod") else 1)
    PY
      then
        log "✅ Python package OK: $mod"
      else
        log "⬇️ Installing Python package: $pkg"
        python -m pip install -U "$pkg"
      fi
    }

    is_url() { [[ "$1" =~ ^https?:// ]]; }

    build_ytdlp_args() {
      local -a a=()
      [[ "$USE_COOKIES" == "1" ]] && a+=(--cookies-from-browser "$BROWSER")
      [[ "$USE_REMOTE_COMPONENTS" == "1" ]] && a+=(--remote-components ejs:github)
      # Print nothing for an empty list; a bare printf would emit one
      # empty argument that yt-dlp would then receive.
      if (( ${#a[@]} > 0 )); then
        printf '%s\0' "${a[@]}"
      fi
    }

    download_audio() {
      local input="$1" outdir="$2" logf="$3"
      mkdir -p "$outdir"

      local -a ytdlp=()
      local x
      while IFS= read -r -d '' x; do ytdlp+=("$x"); done < <(build_ytdlp_args)

      local tmpl="$outdir/%(title).200B [%(id)s].%(ext)s"
      log "⬇️ Downloading audio: $input"

      # stdout carries only the final path; yt-dlp's own logs go to the log file.
      local filepath
      filepath="$(
        yt-dlp "${ytdlp[@]}" \
          -f "$FORMAT_SELECT" \
          -x --audio-format "$AUDIO_FORMAT" \
          -o "$tmpl" \
          --print after_move:filepath \
          "$input" \
          2>>"$logf"
      )" || die "yt-dlp failed: $input"

      filepath="$(printf '%s\n' "$filepath" | sed '/^[[:space:]]*$/d' | tail -n 1)"
      [[ -f "$filepath" ]] || die "Audio not found after download: $filepath"
      printf '%s\n' "$filepath"
    }

    transcribe_audio() {
      local audio="$1" lang="$2" model="$3" device="$4" compute="$5" vad="$6" logf="$7"
      [[ -f "$audio" ]] || die "Audio not found: $audio"

      local base="${audio%.*}"
      local srt="${base}.${lang}.srt"
      local txt="${base}.${lang}.txt"
      local json="${base}.${lang}.json"
      local tsv="${base}.${lang}.tsv"

      log "🧠 Transcribing: $audio"
      python - "$audio" "$lang" "$model" "$device" "$compute" "$vad" \
        "$srt" "$txt" "$json" "$tsv" >>"$logf" 2>&1 << 'PY'
    import sys, json, os
    from faster_whisper import WhisperModel

    audio, lang, model_name, device, compute, vad, srt_p, txt_p, json_p, tsv_p = sys.argv[1:]
    vad = vad == "1"

    model = WhisperModel(model_name, device=device, compute_type=compute)
    segments, info = model.transcribe(audio, language=lang, beam_size=5, vad_filter=vad)

    def ts(t):
        # SRT timestamp HH:MM:SS,mmm, computed in integer milliseconds
        ms = int(round(t * 1000))
        h, ms = divmod(ms, 3600000)
        m, ms = divmod(ms, 60000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    rows = []
    with open(srt_p, "w", encoding="utf-8") as srt, open(txt_p, "w", encoding="utf-8") as txt:
        i = 1
        for seg in segments:
            text = (seg.text or "").strip()
            if not text:
                continue
            srt.write(f"{i}\n{ts(seg.start)} --> {ts(seg.end)}\n{text}\n\n")
            txt.write(text + "\n")
            rows.append({"i": i, "start": float(seg.start), "end": float(seg.end), "text": text})
            i += 1

    with open(json_p, "w", encoding="utf-8") as f:
        json.dump({"audio": os.path.basename(audio), "lang": lang, "segments": rows},
                  f, ensure_ascii=False, indent=2)

    with open(tsv_p, "w", encoding="utf-8") as f:
        f.write("i\tstart\tend\ttext\n")
        for r in rows:
            f.write(f"{r['i']}\t{r['start']:.3f}\t{r['end']:.3f}\t{r['text']}\n")
    PY
    }

    usage() {
      cat <<'USAGE'
    Usage:
      ./yt_asr.sh [options] <url_or_audio> [more...]
      ./yt_asr.sh --urls-file urls.txt

    Options:
      --urls-file FILE   Read URLs from file (one per line, # for comments)
      -o, --outdir DIR
      --lang LANG        zh / en / ja (default: zh)
      --model MODEL
      --device cpu|cuda
      --compute TYPE
      --browser NAME
      --no-cookies
      --no-remote-components
      --keep-audio | --no-keep-audio
      --no-vad
      -h, --help
    USAGE
    }

    # -----------------------------
    # Parse args
    # -----------------------------
    ARGS=()
    while [[ $# -gt 0 ]]; do
      case "$1" in
        --urls-file) URLS_FILE="$2"; shift 2;;
        -o|--outdir) OUTDIR="$2"; shift 2;;
        --lang) ASR_LANG="$2"; shift 2;;
        --model) MODEL="$2"; shift 2;;
        --device) DEVICE="$2"; shift 2;;
        --compute) COMPUTE="$2"; shift 2;;
        --browser) BROWSER="$2"; shift 2;;
        --no-cookies) USE_COOKIES=0; shift;;
        --no-remote-components) USE_REMOTE_COMPONENTS=0; shift;;
        --keep-audio) KEEP_AUDIO=1; shift;;
        --no-keep-audio) KEEP_AUDIO=0; shift;;
        --no-vad) VAD_FILTER=0; shift;;
        -h|--help) usage; exit 0;;
        *) ARGS+=("$1"); shift;;
      esac
    done

    # -----------------------------
    # Preflight
    # -----------------------------
    need_cmd yt-dlp
    need_cmd ffmpeg
    need_cmd python
    ensure_python_pkg faster_whisper faster-whisper

    mkdir -p "$OUTDIR" "$OUTDIR/logs"

    # -----------------------------
    # Collect inputs
    # -----------------------------
    ITEMS=()

    if [[ -n "$URLS_FILE" ]]; then
      [[ -f "$URLS_FILE" ]] || die "urls file not found: $URLS_FILE"
      # The "|| [[ -n ... ]]" keeps a last line that has no trailing newline.
      while IFS= read -r line || [[ -n "$line" ]]; do
        line="$(echo "$line" | sed 's/#.*//g' | xargs)"
        [[ -z "$line" ]] && continue
        ITEMS+=("$line")
      done < "$URLS_FILE"
    fi

    ITEMS+=("${ARGS[@]}")
    [[ ${#ITEMS[@]} -ge 1 ]] || { usage; exit 1; }

    # -----------------------------
    # Main loop
    # -----------------------------
    for item in "${ITEMS[@]}"; do
      echo "============================================================" >&2
      log "INPUT: $item"

      safe_id="$(echo "$item" | sed 's#[^A-Za-z0-9._-]#_#g' | cut -c1-80)"
      ts="$(date +%Y%m%d_%H%M%S)"
      logf="$OUTDIR/logs/${ts}_${safe_id}.log"
      : > "$logf"

      audio=""
      downloaded=0

      if is_url "$item"; then
        audio="$(download_audio "$item" "$OUTDIR" "$logf")"
        downloaded=1
      else
        [[ -f "$item" ]] || die "Not a URL or file: $item"
        audio="$item"
      fi

      transcribe_audio "$audio" "$ASR_LANG" "$MODEL" "$DEVICE" "$COMPUTE" "$VAD_FILTER" "$logf"

      if [[ "$downloaded" == "1" && "$KEEP_AUDIO" == "0" ]]; then
        rm -f -- "$audio"
      fi

      log "📝 Log saved: $logf"
    done

    log "🎉 All done."
XII. Inspecting Video Formats
List the formats available for a video:
    yt-dlp --cookies-from-browser chrome --remote-components ejs:github --list-formats "https://www.youtube.com/watch?v=XZP-LbYj8SA"
Result:
    android@HelloKitty:/data/mycodes/YoutubeLearnAStock$ yt-dlp --cookies-from-browser chrome --remote-components ejs:github --list-formats https://www.youtube.com/watch?v=XZP-LbYj8SA
    Extracting cookies from chrome
    Extracted 2337 cookies from chrome
    [youtube] Extracting URL: https://www.youtube.com/watch?v=XZP-LbYj8SA
    [youtube] XZP-LbYj8SA: Downloading webpage
    [youtube] XZP-LbYj8SA: Downloading tv downgraded player API JSON
    [youtube] XZP-LbYj8SA: Downloading web safari player API JSON
    [youtube] XZP-LbYj8SA: Downloading player b95b0e7a-main
    [youtube] [jsc:deno] Solving JS challenges using deno
    [youtube] [jsc:deno] Downloading challenge solver lib script from https://github.com/yt-dlp/ejs/releases/download/0.3.2/yt.solver.lib.min.js
    [youtube] XZP-LbYj8SA: Downloading m3u8 information
    [info] Available formats for XZP-LbYj8SA:
    ID      EXT   RESOLUTION FPS CH │   FILESIZE   TBR PROTO │ VCODEC         VBR ACODEC      ABR ASR MORE INFO
    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    sb2     mhtml 20x45        0    │                  mhtml │ images                                 storyboard
    sb3     mhtml 48x27        0    │                  mhtml │ images                                 storyboard
    sb1     mhtml 41x90        0    │                  mhtml │ images                                 storyboard
    sb0     mhtml 82x180       0    │                  mhtml │ images                                 storyboard
    249-drc webm  audio only      2 │   10.35MiB   51k https │ audio only         opus        51k 48k low, DRC, webm_dash
    250-drc webm  audio only      2 │   13.75MiB   68k https │ audio only         opus        68k 48k low, DRC, webm_dash
    249     webm  audio only      2 │   10.36MiB   51k https │ audio only         opus        51k 48k low, webm_dash
    250     webm  audio only      2 │   13.72MiB   68k https │ audio only         opus        68k 48k low, webm_dash
    140-drc m4a   audio only      2 │   26.26MiB  129k https │ audio only         mp4a.40.2  129k 44k medium, DRC, m4a_dash
    251-drc webm  audio only      2 │   24.60MiB  121k https │ audio only         opus       121k 48k medium, DRC, webm_dash
    140     m4a   audio only      2 │   26.26MiB  129k https │ audio only         mp4a.40.2  129k 44k medium, m4a_dash
    251     webm  audio only      2 │   24.57MiB  121k https │ audio only         opus       121k 48k medium, webm_dash
    91      mp4   128x256     30    │ ~ 24.86MiB  123k m3u8  │ avc1.4D400C        mp4a.40.5
    160     mp4   128x256     30    │    8.15MiB   40k https │ avc1.4d400c     40k video only          144p, mp4_dash
    278     webm  128x256     30    │   13.86MiB   68k https │ vp9             68k video only          144p, webm_dash
    394     mp4   128x256     30    │   10.49MiB   52k https │ av01.0.00M.08   52k video only          144p, mp4_dash
    92      mp4   196x426     30    │ ~ 38.13MiB  188k m3u8  │ avc1.4D400D        mp4a.40.5
    133     mp4   196x426     30    │   16.17MiB   80k https │ avc1.4d400d     80k video only          240p, mp4_dash
    242     webm  196x426     30    │   18.19MiB   90k https │ vp9             90k video only          240p, webm_dash
    395     mp4   196x426     30    │   17.92MiB   88k https │ av01.0.00M.08   88k video only          240p, mp4_dash
    93      mp4   294x640     30    │ ~ 73.21MiB  361k m3u8  │ avc1.4D401E        mp4a.40.2
    134     mp4   294x640     30    │   37.74MiB  186k https │ avc1.4d401e    186k video only          360p, mp4_dash
    18      mp4   294x640     30  2 │ ≈ 63.95MiB  315k https │ avc1.42001E        mp4a.40.2       44k 360p
    243     webm  294x640     30    │   28.74MiB  142k https │ vp9            142k video only          360p, webm_dash
    396     mp4   294x640     30    │   32.05MiB  158k https │ av01.0.01M.08  158k video only          360p, mp4_dash
    94      mp4   394x854     30    │ ~116.32MiB  574k m3u8  │ avc1.4D401E        mp4a.40.2
    135     mp4   394x854     30    │   74.97MiB  370k https │ avc1.4d401e    370k video only          480p, mp4_dash
    244     webm  394x854     30    │   40.40MiB  199k https │ vp9            199k video only          480p, webm_dash
    397     mp4   394x854     30    │   46.73MiB  230k https │ av01.0.04M.08  230k video only          480p, mp4_dash
    95      mp4   590x1280    30    │ ~209.47MiB 1033k m3u8  │ avc1.4D401F        mp4a.40.2
    136     mp4   590x1280    30    │  164.49MiB  811k https │ avc1.4d401f    811k video only          720p, mp4_dash
    247     webm  590x1280    30    │   63.96MiB  315k https │ vp9            315k video only          720p, webm_dash
    398     mp4   590x1280    30    │   77.00MiB  380k https │ av01.0.05M.08  380k video only          720p, mp4_dash
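The fallback chain `140/bestaudio[ext=m4a]/bestaudio[ext=webm]/bestaudio/best` can be read against a listing like the one above. The sketch below is a simplified emulation for illustration only and is not yt-dlp's actual selection logic: it ranks audio-only formats by bitrate alone and does not model the final `best` step, which falls back to a combined audio+video format. The function name `pick_audio_format` is hypothetical; the dictionary keys (`format_id`, `ext`, `vcodec`, `abr`) are assumed to mirror the per-format fields in yt-dlp's JSON output, and the sample values are copied from a few rows of the listing.

```python
def pick_audio_format(formats):
    """Emulate: 140 / best m4a audio / best webm audio / best audio.

    Returns the chosen format_id, or None if no audio-only format exists.
    """
    # Audio-only formats are the ones whose video codec is "none".
    audio = [f for f in formats if f.get("vcodec") == "none"]

    def best(candidates):
        # Simplification: "best" here means highest audio bitrate.
        return max(candidates, key=lambda f: f.get("abr") or 0)["format_id"]

    if any(f["format_id"] == "140" for f in audio):
        return "140"
    for ext in ("m4a", "webm"):
        candidates = [f for f in audio if f["ext"] == ext]
        if candidates:
            return best(candidates)
    return best(audio) if audio else None


# A few rows from the listing, rewritten as dictionaries.
SAMPLE = [
    {"format_id": "249", "ext": "webm", "vcodec": "none", "abr": 51},
    {"format_id": "250", "ext": "webm", "vcodec": "none", "abr": 68},
    {"format_id": "140", "ext": "m4a", "vcodec": "none", "abr": 129},
    {"format_id": "251", "ext": "webm", "vcodec": "none", "abr": 121},
    {"format_id": "160", "ext": "mp4", "vcodec": "avc1.4d400c", "abr": None},
]

if __name__ == "__main__":
    print(pick_audio_format(SAMPLE))
```

For this video the first choice, 140, is available, so the later alternatives never come into play; they only matter for videos where 140 is missing.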
https://jkwei.com/posts/knowledge/youtube_asr_workflow_ubuntu_pyenv/ Last updated 2026-01-15