Each search/graph/trending task was calling _fetch_and_index_channel
inline for up to 10-15 newly discovered channels, each making up to 4
yt-dlp calls (1 channel metadata + 3 individual video fetches for
dateless entries). This bypassed the 30-90 s worker gap, producing
bursts of 40-60 calls in rapid succession and hammering YouTube.
Changes:
- _fetch_and_index_channel: removed the dateless-video individual
fetch loop — one call per channel, videos without published_at are
simply skipped at discovery time
- _search_and_store and _fetch_graph_for_channel: queue channel
indexing as separate worker tasks (3 and 2 respectively) so the
30-90 s gap applies between every yt-dlp call, including channel
indexing
- update_trending_signal and update_graph_signal (old sync path):
removed inline _fetch_and_index_channel loops (15 and 10 channels)
- _discovery_task in channels.py: replaced run_full_discovery (old
synchronous path) with schedule_discovery so sync-all and
follow-by-url go through the queue system
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four more code paths were bypassing the _meta_lock guard and firing
raw yt-dlp processes concurrently with active downloads:
- Popular fetch Phase 1 (flat-playlist channel crawl): changed from
ytdlp._run to ytdlp._meta_run so it waits for active downloads
- download_subs_only: changed from _run to _meta_run
- fetch_video_comments: returns an empty list immediately if a download
is active (otherwise a call with a 90s timeout could block until the
download finishes)
- Diagnostic test endpoint (settings): switched to _meta_run
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three code paths could fire yt-dlp immediately (polite=False) while a
download was already running, causing YouTube to see two simultaneous
authenticated sessions and invalidate the cookie:
- search.py: live yt-dlp fallback now skipped while any download is active
- downloads.py: _ensure_video uses polite=True so it waits for active
downloads to finish before fetching metadata for an unknown video
- channels.py: follow_by_url uses polite=True when fetching metadata
for a brand-new channel
Added is_download_active() helper to ytdlp.py to expose the active
download state without importing private globals.
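A minimal sketch of how such a helper could expose the state, assuming downloads are tracked with a lock-protected counter. Only is_download_active and _active_downloads are named in these commits; download_started, download_finished and _state_lock are hypothetical names for illustration.

```python
import threading

# Assumed module-level state; the real ytdlp.py may track downloads differently.
_active_downloads = 0
_state_lock = threading.Lock()


def download_started() -> None:
    """Hypothetical hook: call when a download process is launched."""
    global _active_downloads
    with _state_lock:
        _active_downloads += 1


def download_finished() -> None:
    """Hypothetical hook: call when a download process exits."""
    global _active_downloads
    with _state_lock:
        _active_downloads = max(0, _active_downloads - 1)


def is_download_active() -> bool:
    """Public read-only view, so callers never import the private counter."""
    with _state_lock:
        return _active_downloads > 0
```

Callers such as search.py can then check is_download_active() before deciding whether to attempt a live fallback.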
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each yt-dlp call is now an independent task (one search query, one trending
fetch, one graph channel fetch). Tasks are shuffled together so we don't fire
10 searches in a row, then enqueued with 30-90s random gaps between them —
a full sweep of ~17 tasks completes in roughly 10-25 minutes instead of
hammering YouTube with 21 calls back-to-back.
Fast signals (community, category clusters) still run synchronously at
schedule time since they're pure SQL.
Progress is tracked per-user (total/done/running) and exposed on
GET /api/discovery/status. The Discovery page polls every 10s while
running and shows a progress bar + "Finding channels… X / Y" in the header.
The auto-discovery daemon skips scheduling if a manual sweep is already running.
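The shuffled, gap-spaced schedule described above can be sketched roughly as follows. plan_sweep and its parameters are hypothetical names; the real queue may compute delays differently.

```python
import random


def plan_sweep(tasks, min_gap=30.0, max_gap=90.0, rng=None):
    """Return (delay_seconds, task) pairs for one sweep.

    Tasks are shuffled so one signal type does not run many times in a
    row, and each task starts a random 30-90s after the previous one.
    """
    rng = rng or random.Random()
    shuffled = list(tasks)
    rng.shuffle(shuffled)

    plan = []
    offset = 0.0
    for task in shuffled:
        plan.append((offset, task))
        offset += rng.uniform(min_gap, max_gap)
    return plan
```

With 17 tasks there are 16 gaps, so the last task starts between 8 and 24 minutes after the first, which is in line with the estimate above.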
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each yt-dlp call is a separate subprocess that opens a new HTTP session with
YouTube. 64 sessions in a row looks like a bot regardless of rate limiting.
Changes:
- crawl_by_search: 30 queries → 10 (top 5 tags, 4 channel names, 1 serendipity)
- update_liked_signal: 10 queries → 4
- update_watch_signal: removed (tags already included in crawl_by_search)
- update_trending_signal: 2 regions → 1 (first region only)
- update_graph_signal: 12 sampled channels → 6
New total: ~21 yt-dlp calls per run (~105s with 5s gaps) vs ~320s before.
Signal quality is preserved — the removed queries were low-marginal-value
duplicates of content already covered by the remaining ones.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_meta_run now checks _active_downloads before each background yt-dlp call.
If a download is running it waits (3s poll loop) until the download finishes
before making the next metadata request.
This prevents YouTube from seeing the same session used simultaneously by
a download and a discovery/metadata call, which was causing cookie invalidation
even with private cookie copies.
Downloads still run immediately without waiting for metadata. Background
discovery is the one that yields.
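A minimal sketch of the yield-to-downloads wait, assuming the active state is available through a callable. wait_for_downloads is a hypothetical name; only the 3s poll interval comes from this commit.

```python
import time


def wait_for_downloads(is_active, poll_seconds=3.0, sleep=time.sleep):
    """Block until is_active() returns False.

    Returns the number of polls that had to wait, which is handy for
    logging how long background metadata work was held back.
    """
    polls = 0
    while is_active():
        sleep(poll_seconds)
        polls += 1
    return polls
```

The sleep function is injectable so the loop can be tested without real delays.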
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two back-to-back try: blocks with only one finally: caused
"expected 'except' or 'finally' block" at startup. Merged into one.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-discovery daemon:
- Runs every hour, triggers full discovery for any user whose last run
was >23 hours ago. First check is 5 minutes after startup.
- Tracks run time in user_settings.last_discovery_run (new column).
- Manual Find More also stamps last_discovery_run.
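The hourly due-check can be sketched as below, assuming last_discovery_run is stored as a timezone-aware datetime or NULL. discovery_due is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def discovery_due(last_run, now, min_age=timedelta(hours=23)):
    """True if the user has never run discovery or the last run is stale."""
    return last_run is None or (now - last_run) > min_age
```

Checking hourly against a 23-hour threshold keeps runs roughly daily without drifting later each day.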
Discovery status endpoint (GET /api/discovery/status):
- Returns pending_count (unseen queue size) and last_run timestamp.
- Shown in the Discover page header so users know queue state at a glance.
Find More UX fix:
- Was: kick background task, wait 8 seconds, refetch (task takes minutes).
- Now: button shows "Queued ✓" on success with an explanatory banner
telling the user it takes a few minutes and also runs daily automatically.
Query diversity:
- Added "best [category] channels" serendipity queries to crawl_by_search.
- Limit raised from 25 to 30 queries per run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Downloads run for minutes via Popen while metadata calls continue in parallel.
Both processes read from AND write back to the same --cookies file, causing
concurrent writes that corrupt the session cookie state.
Fix: _make_private_cookie_copy() intercepts --cookies <file> in any arg list
and swaps it for a NamedTemporaryFile copy. Each yt-dlp process gets its own
snapshot; write-backs go to the throwaway copy and are discarded on cleanup.
- _run() uses this for all subprocess.run calls (metadata, subtitles, comments)
- start_download() uses it for the long-lived Popen download process
- _meta_run() benefits automatically since it calls _run()
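A rough sketch of the cookie swap, assuming the arg list uses the two-token form "--cookies <file>". The function below is an illustration, not the actual _make_private_cookie_copy; its return shape is an assumption.

```python
import os
import shutil
import tempfile


def make_private_cookie_copy(args):
    """Return (new_args, temp_path); temp_path is None if nothing was swapped.

    The caller is expected to delete temp_path once the process exits,
    which also discards any cookie write-backs yt-dlp made to the copy.
    """
    new_args = list(args)  # never mutate the caller's list
    if "--cookies" not in new_args:
        return new_args, None
    idx = new_args.index("--cookies")
    if idx + 1 >= len(new_args) or not os.path.isfile(new_args[idx + 1]):
        return new_args, None

    tmp = tempfile.NamedTemporaryFile(prefix="cookies-", suffix=".txt", delete=False)
    tmp.close()
    shutil.copyfile(new_args[idx + 1], tmp.name)
    new_args[idx + 1] = tmp.name
    return new_args, tmp.name
```

delete=False is used so the file survives until the subprocess has finished; cleanup belongs in a finally block around the subprocess call.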
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Cap trending base_score at 18.0 (was unbounded — a viral channel could
score 240+ vs search's 15, making everything else invisible)
- Cap all discovery scores at 50.0 globally so no single signal dominates
- Fix score accumulation: cap accumulated total at 50.0 (was unbounded
across repeated runs, cementing high-score channels in top positions forever)
- Expire unseen queue entries older than 14 days at start of each run
- Add ±8 score perturbation to discovery list endpoint (was pure score DESC,
identical every visit until dismissed)
- Add score perturbation to discovery_videos ORDER BY too
- Fix SQL injection in update_category_clusters (category strings were
interpolated directly into query; now use parameterized queries per category)
- Raise category signal score from 3.0 → 5.0 to compensate for trending cap
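The caps and the list perturbation can be sketched as follows. The constants come from this commit; the function names and the dict shape of a channel are assumptions for illustration.

```python
import random

TRENDING_BASE_CAP = 18.0
GLOBAL_SCORE_CAP = 50.0
PERTURBATION = 8.0


def cap_trending(base_score):
    """Clamp a trending base score so viral channels cannot dwarf other signals."""
    return min(base_score, TRENDING_BASE_CAP)


def accumulate_score(existing, added, cap=GLOBAL_SCORE_CAP):
    """Add a new signal score to the stored total without exceeding the cap."""
    return min(cap, existing + added)


def display_order(channels, rng=None):
    """Sort by score plus a random offset in [-8, +8] so the list varies per visit."""
    rng = rng or random.Random()
    return sorted(
        channels,
        key=lambda c: c["score"] + rng.uniform(-PERTURBATION, PERTURBATION),
        reverse=True,
    )
```

Because the offset is bounded, channels whose scores differ by more than 16 never swap places; only near-ties get reshuffled.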
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
search_youtube now takes polite=False (default) for instant user
searches and polite=True for background discovery crawls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the lock was released before _run(), so multiple threads could
fire yt-dlp processes simultaneously — completely defeating the rate limiter.
Now the lock is held through the subprocess call and released in finally.
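A minimal sketch of the corrected locking pattern. _meta_lock is named elsewhere in this log; run_serialized and the callable argument are hypothetical.

```python
import threading

_meta_lock = threading.Lock()


def run_serialized(call):
    """Run call() while holding the lock, releasing it only after it returns.

    Releasing before call() would let several threads start their
    subprocesses at the same time, which is the bug described above.
    """
    _meta_lock.acquire()
    try:
        return call()
    finally:
        _meta_lock.release()
```

This is equivalent to wrapping the call in "with _meta_lock:"; the explicit try/finally just mirrors the wording of the fix.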
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- search_youtube, fetch_trending, fetch_featured_channels now use _meta_run
- Replaced ThreadPoolExecutor(4) parallel searches with sequential loop
- Replaced ThreadPoolExecutor(3) parallel featured-channel fetches with sequential
- _fetch_and_index_channel passes polite=True to fetch_channel/video_metadata
Discovery was firing 4+ simultaneous yt-dlp processes, each with cookies,
which is what invalidated the session.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fetch_video_metadata and fetch_channel_metadata now take polite=True for
background tasks (enforces 5s+ gap via global lock) while user-facing
calls (watch page, follow channel, download) use polite=False and run
immediately.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All fetch_video_metadata / fetch_channel_metadata / fetch_channel_playlists
/ fetch_available_subs calls now go through _meta_run which enforces a
minimum 5s gap (+ 0.5-2.5s random jitter) across all concurrent tasks.
Per-task sleep loops removed since the global lock serializes everything.
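The gap calculation can be sketched as below, assuming timestamps come from a monotonic clock. next_delay is a hypothetical helper; only the 5s gap and the 0.5-2.5s jitter range come from this commit.

```python
import random


def next_delay(last_call, now, min_gap=5.0, jitter=(0.5, 2.5), rng=None):
    """Seconds to sleep before the next metadata call may start.

    The earliest allowed start is last_call + min_gap + random jitter;
    if that moment has already passed, no sleep is needed.
    """
    rng = rng or random.Random()
    earliest = last_call + min_gap + rng.uniform(*jitter)
    return max(0.0, earliest - now)
```

Computing the delay inside the global lock is what lets the per-task sleep loops go away.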
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8 simultaneous yt-dlp processes hitting video pages look like a bot
attack and cause YouTube to nuke the session cookies. Drop to:
- Popular fetch view_count enrichment: 8→3 workers
- Discovery search: 8→4 workers
- Graph signal (featured channels): 8→3 workers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_stable_thumbnail expects a video ID but was being passed a playlist ID
(PLxxx), producing a broken URL. Now picks the best thumbnail from
yt-dlp's thumbnails array, falling back to the singular thumbnail field.
Also backfills playlist.thumbnail_url from the first video when indexing
a playlist that still has no thumbnail.
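A sketch of the thumbnail selection, assuming each entry in yt-dlp's thumbnails array is a dict with a url and optional width/height, and that later entries are the better ones when sizes are missing. pick_thumbnail is a hypothetical name.

```python
def pick_thumbnail(info):
    """Return the URL of the largest thumbnail, else the singular field, else None."""
    thumbs = [t for t in (info.get("thumbnails") or []) if t.get("url")]
    if thumbs:
        def rank(pair):
            index, thumb = pair
            area = (thumb.get("width") or 0) * (thumb.get("height") or 0)
            # Ties (for example, no dimensions at all) go to the later entry.
            return (area, index)

        _, best = max(enumerate(thumbs), key=rank)
        return best["url"]
    return info.get("thumbnail")
```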
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Popular fetch now uses a two-phase approach: a fast flat-playlist pass to get
IDs in popularity order, then a parallel full metadata fetch (8 workers)
to get the real view_count and published_at for each video. Previously
flat-playlist mode returned timestamp/view_count as null.
Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.
Raise all thread pool sizes to match 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New playlists router: fetch channel playlists from YouTube, index
playlist videos, browse by playlist with pagination
- Playlist model gets video_ids column to store ordered video list
- Register playlists router in main.py with DB migration
- Add Playlists tab to Channel page: grid of playlist cards, click to
browse videos, index/re-index per playlist
- Fix explore older videos skipping all entries without published_at;
flat-playlist entries for older videos rarely include timestamp data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Channel page:
- "Explore older videos" button fetches 100 videos at a time further back
in the channel history using yt-dlp --playlist-start/--playlist-end
- "Fetch entire history" still available for full crawl
- Backend: /channels/{id}/explore?page=N endpoint + playlist offset support
in fetch_channel_metadata(start_video=N)
Home feed:
- New "Rediscover" mode: older unwatched videos (90+ days old) from
followed channels, randomly sampled then re-ranked by tag affinity
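The Rediscover selection can be sketched as follows. The video dict shape, the sample and limit sizes, and the sum-of-affinities ranking are all assumptions for illustration; only the 90-day cutoff and the sample-then-rerank order come from this commit.

```python
import random
from datetime import datetime, timedelta


def rediscover(videos, tag_affinity, now, sample_size=50, limit=20, rng=None):
    """Pick older unwatched videos at random, then rank them by tag affinity."""
    rng = rng or random.Random()
    cutoff = now - timedelta(days=90)
    pool = [v for v in videos if not v["watched"] and v["published_at"] <= cutoff]

    # Random sampling first keeps the feed from showing the same videos every time.
    sample = rng.sample(pool, min(sample_size, len(pool)))

    def affinity(video):
        return sum(tag_affinity.get(tag, 0.0) for tag in video["tags"])

    return sorted(sample, key=affinity, reverse=True)[:limit]
```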
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run search queries concurrently (5 workers) instead of sequentially —
cuts crawl time dramatically. Add graph signal: fetch featured channels
from followed channels' /channels tab in parallel (4 workers), which
surfaces creator-curated recommendations as a high-signal, diverse pool
that search alone can't reach.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Double search results per query (20→40), increase query budget (15→25),
use more tags per signal (6→10-12), index more new channels per refresh
(5→10). Remove the YT logo from the header.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip yt-dlp's align:start position:0% cue settings from VTT files
after both video download and subtitle-only download so CSS ::cue centers them
- CC chip now shows already-downloaded langs (e.g. 'CC: en') directly
from disk with a '+' button to add more — no YouTube call needed
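A minimal sketch of the VTT cleanup, assuming the settings always appear as the exact pair "align:start position:0%" at the end of a cue timing line. strip_cue_settings is a hypothetical name.

```python
import re

# Matches the trailing cue settings on a timing line.
_CUE_SETTINGS = re.compile(r"\s+align:start\s+position:0%\s*$")


def strip_cue_settings(vtt_text):
    """Remove align/position settings from timing lines, leaving cue text untouched."""
    cleaned = []
    for line in vtt_text.split("\n"):
        if "-->" in line:  # only timing lines carry cue settings
            line = _CUE_SETTINGS.sub("", line)
        cleaned.append(line)
    return "\n".join(cleaned)
```

With the settings gone, the browser falls back to its default cue placement, which is what lets the ::cue styling center the text.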
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- download_subs_only(): yt-dlp --skip-download to fetch just .vtt sidecar
- POST /by-yt/{ytId}/download-subs endpoint
- CC chip now visible on downloaded videos; clicking checks YouTube,
shows lang picker with "Add subtitles" button separate from re-download
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Convert subs to .vtt (was .srt which browsers don't support in <track>)
- Add GET /subtitle-files endpoint: instant disk scan for .vtt sidecar files,
no yt-dlp call needed
- Inject <track> elements into the video player for each .vtt on disk;
browser CC button appears automatically
- Before download: CC chip triggers YouTube availability check (slow, on demand)
- After download with subs: shows "CC ✓" — subtitles live in the player controls
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- fetch_available_subs() queries yt-dlp for manual + auto-generated
subtitle langs available on YouTube for any given video
- GET /api/videos/by-yt/{ytId}/subs exposes this to the frontend
- DownloadRequest now accepts subtitle_langs to override the global
setting on a per-download basis
- Watch page fetches available subtitle langs on load (in parallel),
shows a CC dropdown with manual langs + auto-generated langs labeled
"(auto)"; selected lang is passed through to the download
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- auto-sync daemon: background thread checks every hour and syncs followed
channels for users with sync_interval_hours set (6/12/24h options)
- disk stats: /api/stats now returns total/used/free/download bytes;
Stats page shows a disk usage bar
- subtitles: subtitle_langs setting (e.g. "en,sv") passed through all
download paths; yt-dlp writes .srt files alongside the video
- Settings page: sync interval dropdown + subtitle languages input
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Merger+ffmpeg faststart postprocessor arg was overwritten by the
subsequent embed-metadata and embed-thumbnail passes anyway, making it
a pointless extra ffmpeg remux. Dropped it and restored the embeds.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both flags trigger extra ffmpeg passes over the entire file after the
stream merge. They're unnecessary — metadata lives in the DB and
thumbnails come from YouTube. Removing them cuts the post-join wait
to just the faststart rewrite.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp 2026.03.17 dropped support for tv_embedded — it silently skips it
and falls back to web-only, which only exposes the pre-merged 360p format
(ID 18). The override was added to avoid SABR restrictions but is now the
cause of the low-quality downloads.
Removing --extractor-args restores yt-dlp's default client selection
(android_vr + web fallback) which exposes all formats up to 2160p.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The per-quality format strings fell back to best[height<=NNN] which on
YouTube resolves to pre-merged streams capped at ~360p, causing every
quality selector choice to silently download low-res video. Replace with
bestvideo+bestaudio as the intermediate fallback so adaptive streams are
always preferred over pre-merged ones.
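One possible shape for the new format strings, using yt-dlp's format selector syntax. The exact fallback chain in the code is not shown in this commit, so the ordering below is an assumption, and format_selector is a hypothetical name.

```python
def format_selector(max_height=None):
    """Build a yt-dlp -f string that tries adaptive streams before pre-merged ones."""
    if max_height is None:
        return "bestvideo+bestaudio/best"
    return (
        f"bestvideo[height<={max_height}]+bestaudio/"  # capped adaptive streams
        "bestvideo+bestaudio/"                         # intermediate fallback: any adaptive pair
        f"best[height<={max_height}]/"                 # pre-merged, capped
        "best"                                         # last resort
    )
```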
Also fix detect_resolution to correctly label 1440p and 2160p files
instead of capping the display at 1080p.
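A sketch of resolution labelling that covers the higher tiers, assuming the label is derived from the pixel height alone. The signature and the tier table are assumptions; the real detect_resolution may inspect the file differently.

```python
# Tiers checked from highest to lowest; 1440p and 2160p are included.
_TIERS = [2160, 1440, 1080, 720, 480, 360]


def detect_resolution(height):
    """Map a pixel height to a display label such as '1440p'."""
    for tier in _TIERS:
        if height >= tier:
            return f"{tier}p"
    return f"{height}p"
```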
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
YouTube's web client gets SABR format restrictions in 2025-2026 yt-dlp,
limiting available streams and causing fallback to 360p. tv_embedded
bypasses SABR and exposes the full format list including 4K.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Most modern YouTube videos use VP9/AV1, so the old bestvideo[ext=mp4][vcodec^=avc1]
filter always failed and fell through to format codes 22/18 (720p/360p).
--merge-output-format mp4 handles the container; no need to restrict codec.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the engine was blind to dislikes/dismissals:
- _build_user_tag_profile only used liked/watched (positive only)
- dismiss_penalty was capped at 80% so hated content still surfaced
- _search_and_store had zero affinity filtering, any YouTube result entered the queue
- user_tag_affinity negative scores (written by dismiss/dislike) were never read
Now:
- _build_user_tag_profile reads directly from user_tag_affinity (positive + negative)
- _tag_relevance_score returns negative values, so disliked-tag channels score below zero and get dropped
- _search_and_store skips channels whose indexed videos match 3+ negatively-rated tags
- list_discovery post-filters channels already in the queue using the same neg-affinity check
- Removed the old _dismissed_channel_tags + dismiss_penalty (superseded)
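The negative-affinity check can be sketched roughly like this, assuming user_tag_affinity is loaded into a dict of tag to signed score. Both function names and the averaging formula are illustrative assumptions, not the actual implementation.

```python
def has_negative_affinity(channel_tags, tag_affinity, threshold=3):
    """True if the channel's tags include 3+ tags the user has rated negatively."""
    negative = {t for t in set(channel_tags) if tag_affinity.get(t, 0.0) < 0}
    return len(negative) >= threshold


def tag_relevance_score(channel_tags, tag_affinity):
    """Mean signed affinity over the channel's tags; can go below zero."""
    tags = set(channel_tags)
    if not tags:
        return 0.0
    return sum(tag_affinity.get(t, 0.0) for t in tags) / len(tags)
```

Unknown tags count as neutral (0.0), so a channel is only dropped on evidence the user actually provided.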
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
max_comments takes a comma-separated list of limits:
max-comments,max-parents,max-replies,max-replies-per-thread.
Passing just one value left the rest unset, which caused yt-dlp to fetch
only 1 comment. Now passes 20,20,0,0 to fetch 20 top-level comments
with no replies. Also switch --no-download to --skip-download.
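A small sketch of building the argument, assuming it is passed on the command line via --extractor-args. comment_extractor_args is a hypothetical helper; including comment_sort=top here assumes that setting from an earlier fix is still in place.

```python
def comment_extractor_args(limits=(20, 20, 0, 0), sort="top"):
    """Build the yt-dlp CLI tokens for comment limits.

    Values inside one option are comma-separated; different options for
    the same extractor are separated by semicolons.
    """
    joined = ",".join(str(n) for n in limits)
    return ["--extractor-args", f"youtube:max_comments={joined};comment_sort={sort}"]
```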
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--write-comments writes to .info.json reliably; parsing stdout with
--dump-json was never guaranteed to include comments. Use a TemporaryDirectory,
write the info.json there, read it, then let the context manager clean up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comments: switch from CLI --write-comments to yt-dlp Python API with
getcomments=True — more reliable, proper extractor_args dict format
Dislikes: add dislike_count column, fetch from returnyoutubedislike.com
after each video metadata upsert (5s timeout, non-fatal)
UI: replace emoji like count with a like/dislike ratio bar — blue fill
showing like proportion, labels on each end; views stay in meta row
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp separates extractor args with ; not ,. The malformed arg was
causing max_comments to parse as a garbage string, fetching ~1 comment.
Also swap max_comment_depth (not a real YouTube extractor arg) for
comment_sort=top to get highest-engagement comments first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same pattern as view_count: model column, yt-dlp extraction, SQL select,
VideoDetail field, startup migration, and display in Watch meta row.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Video model: view_count column (Integer, nullable)
- ytdlp._normalize_video: extract view_count from yt-dlp info
- _VIDEO_SELECT: include v.view_count in all queries
- VideoDetail schema: view_count field
- Watch page: formatViews() helper, show "X.XM views" in meta row
alongside date and category
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VideoComment model (video_id, author, text, likes, is_pinned, published_at)
- fetch_video_comments() in ytdlp.py: top 20 comments, no reply threads,
sorted pinned-first then by likes
- GET /videos/by-yt/{id}/comments — returns cached comments instantly
- POST /videos/by-yt/{id}/comments/refresh — fetches from YouTube, stores, returns
- Watch page: CommentsSection shows "Load comments" button when uncached,
renders comments with author/likes once loaded; Refresh link to re-fetch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add bottom tab bar (Home/Following/Discover/Downloads/Settings) for mobile
- Fetch and display channel banner images on channel pages
- Fix ChannelCard: channels without a local DB id now follow+navigate on click
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nodeenv is a Python package that downloads a pre-built Node.js binary —
no apt repos, no compilation, guaranteed to work in python:3.12-slim.
The 'node' binary is linked into /usr/local/bin so yt-dlp can find it.
With Node.js available the web client works fully (37 formats) and can
solve YouTube's n-challenge that every other approach was failing on.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
web_embedded: supports cookies, no Node.js/JS runtime needed, 23 video
formats available. android_vr was skipped by yt-dlp when cookies are
present since that client doesn't support cookie auth.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
android_vr provides pre-signed format URLs that bypass YouTube's
n-challenge and signature JS requirements entirely. Tested: 23 video
formats available without any JavaScript runtime installed.
Reverts Node.js Dockerfile addition (which failed to build anyway).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Node.js is required by yt-dlp to solve YouTube's n-challenge (format URL
deobfuscation). Without it the web client returns no video formats.
The tv and ios player clients were removed — both require GVS PO tokens
that we don't have, so they only produce warnings and block every request.
The web client with Node.js installed gives 30+ formats and works cleanly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _cookie_args() no longer falls through to --cookies-from-browser when
cookies_file is configured but missing. Firefox isn't installed in the
Docker image, so that fallback caused yt-dlp to exit with empty stdout
and every metadata fetch to return "Video not found on YouTube".
- fetch_video_metadata() now retries without auth args if the first call
fails, so a broken cookie config can't block public video fetches.
- Add use_oauth2 setting + full device-auth flow (POST /settings/oauth2-init,
GET /settings/oauth2-status) with OAuth2Section UI in Settings page.
- Add GET /settings/ytdlp-test diagnostics endpoint.
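The cookie-argument fix and the unauthenticated retry can be sketched as below. Signatures are assumptions: the real _cookie_args and fetch_video_metadata read from settings, and treating empty output as failure is a simplification.

```python
import os


def cookie_args(cookies_file, browser=None):
    """Return auth args for yt-dlp.

    A configured but missing cookies file yields no auth args at all,
    instead of falling through to --cookies-from-browser.
    """
    if cookies_file:
        return ["--cookies", cookies_file] if os.path.isfile(cookies_file) else []
    if browser:
        return ["--cookies-from-browser", browser]
    return []


def fetch_with_auth_fallback(run, base_args, auth_args):
    """Try with auth first; if that produces nothing, retry without it."""
    output = run(base_args + auth_args)
    if not output and auth_args:
        output = run(base_args)
    return output
```

The retry means a broken cookie setup degrades to anonymous access for public videos rather than failing every fetch.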
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>