Each yt-dlp call is now an independent task (one search query, one trending
fetch, one graph channel fetch). Tasks are shuffled together so we don't fire
10 searches in a row, then enqueued with 30-90s random gaps between them —
a full sweep of ~17 tasks completes in roughly 10-25 minutes instead of
hammering YouTube with 21 calls back-to-back.
Fast signals (community, category clusters) still run synchronously at
schedule time since they're pure SQL.
Progress is tracked per-user (total/done/running) and exposed on
GET /api/discovery/status. The Discovery page polls every 10s while
running and shows a progress bar + "Finding channels… X / Y" in the header.
The auto-discovery daemon skips scheduling if a manual sweep is already running.
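The shuffled, spaced-out scheduling described above can be sketched roughly as follows. This is a minimal illustration, not the actual scheduler: plan_sweep and the task names are hypothetical, and only the 30-90s gap range comes from this message.

```python
import random

def plan_sweep(tasks, min_gap=30.0, max_gap=90.0, rng=None):
    """Return (start_offset_seconds, task) pairs in shuffled order."""
    rng = rng or random.Random()
    shuffled = list(tasks)
    rng.shuffle(shuffled)  # mix task kinds instead of running all searches first
    plan = []
    offset = 0.0
    for task in shuffled:
        plan.append((offset, task))
        offset += rng.uniform(min_gap, max_gap)  # random pause before the next task
    return plan

# Illustrative task list only; the real mix of task kinds may differ.
tasks = [f"search:{i}" for i in range(10)] + ["trending"] + [f"graph:{i}" for i in range(6)]
plan = plan_sweep(tasks, rng=random.Random(0))
```

With 17 tasks there are 16 gaps, so the last task starts between 8 and 24 minutes after the first, which is in line with the estimate above.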
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each yt-dlp call is a separate subprocess that opens a new HTTP session with
YouTube. 64 sessions in a row looks like a bot regardless of rate limiting.
Changes:
- crawl_by_search: 30 queries → 10 (top 5 tags, 4 channel names, 1 serendipity)
- update_liked_signal: 10 queries → 4
- update_watch_signal: removed (tags already included in crawl_by_search)
- update_trending_signal: 2 regions → 1 (first region only)
- update_graph_signal: 12 sampled channels → 6
New total: ~21 yt-dlp calls per run (~105s with 5s gaps) vs ~320s before.
Signal quality is preserved — the removed queries were low-marginal-value
duplicates of content already covered by the remaining ones.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_meta_run now checks _active_downloads before each background yt-dlp call.
If a download is running it waits (3s poll loop) until the download finishes
before making the next metadata request.
This prevents YouTube from seeing the same session used simultaneously by
a download and a discovery/metadata call, which was causing cookie invalidation
even with private cookie copies.
Downloads still run immediately without waiting for metadata. Background
discovery is the one that yields.
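A minimal sketch of the yield-to-downloads loop, assuming the active downloads are tracked in a set guarded by a lock. The real _active_downloads structure may differ; download_started, download_finished and wait_for_downloads are hypothetical names.

```python
import threading
import time

_active_downloads = set()          # assumption: ids of running downloads
_downloads_lock = threading.Lock()

def download_started(download_id):
    with _downloads_lock:
        _active_downloads.add(download_id)

def download_finished(download_id):
    with _downloads_lock:
        _active_downloads.discard(download_id)

def wait_for_downloads(poll_seconds=3.0, sleep=time.sleep):
    """Block until no download is running; return how many polls were needed."""
    polls = 0
    while True:
        with _downloads_lock:
            if not _active_downloads:
                return polls
        sleep(poll_seconds)  # sleep outside the lock so downloads can finish
        polls += 1
```

A background metadata call would invoke wait_for_downloads() first; the download path never calls it, so downloads are not delayed.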
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two back-to-back try: blocks sharing a single finally: raised a SyntaxError
("expected 'except' or 'finally' block") at startup. Merged them into one.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The last_discovery_run column was added via SQL migration but was missing
from the SQLAlchemy model definition, causing an AttributeError when the
discovery status endpoint accesses s.last_discovery_run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-discovery daemon:
- Runs every hour, triggers full discovery for any user whose last run
was >23 hours ago. First check is 5 minutes after startup.
- Tracks run time in user_settings.last_discovery_run (new column).
- Manual Find More also stamps last_discovery_run.
Discovery status endpoint (GET /api/discovery/status):
- Returns pending_count (unseen queue size) and last_run timestamp.
- Shown in the Discover page header so users know queue state at a glance.
Find More UX fix:
- Was: kick background task, wait 8 seconds, refetch (task takes minutes).
- Now: button shows "Queued ✓" on success with an explanatory banner
telling the user it takes a few minutes and also runs daily automatically.
Query diversity:
- Added "best [category] channels" serendipity queries to crawl_by_search.
- Limit raised from 25 to 30 queries per run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Downloads run for minutes via Popen while metadata calls continue in parallel.
Both processes read from AND write back to the same --cookies file, causing
concurrent writes that corrupt the session cookie state.
Fix: _make_private_cookie_copy() intercepts --cookies <file> in any arg list
and swaps it for a NamedTemporaryFile copy. Each yt-dlp process gets its own
snapshot; write-backs go to the throwaway copy and are discarded on cleanup.
- _run() uses this for all subprocess.run calls (metadata, subtitles, comments)
- start_download() uses it for the long-lived Popen download process
- _meta_run() benefits automatically since it calls _run()
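The cookie-copy swap could look roughly like this. It is a sketch under assumptions: the function names, return shape and cleanup helper here are illustrative, not the project's actual _make_private_cookie_copy().

```python
import os
import shutil
import tempfile

def make_private_cookie_copy(args):
    """Return (new_args, temp_path); temp_path is None when --cookies is absent."""
    new_args = list(args)
    if "--cookies" not in new_args:
        return new_args, None
    idx = new_args.index("--cookies")
    if idx + 1 >= len(new_args):
        return new_args, None  # flag without a value: leave the args untouched
    tmp = tempfile.NamedTemporaryFile(prefix="cookies_", suffix=".txt", delete=False)
    tmp.close()
    shutil.copyfile(new_args[idx + 1], tmp.name)  # snapshot of the shared cookie file
    new_args[idx + 1] = tmp.name                  # the process only sees its own copy
    return new_args, tmp.name

def cleanup_cookie_copy(temp_path):
    """Discard the throwaway copy, including anything the process wrote back."""
    if temp_path and os.path.exists(temp_path):
        os.remove(temp_path)
```

The caller would run the subprocess with new_args and call cleanup_cookie_copy(temp_path) in a finally block.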
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Cap trending base_score at 18.0 (was unbounded — a viral channel could
score 240+ vs search's 15, making everything else invisible)
- Cap all discovery scores at 50.0 globally so no single signal dominates
- Fix score accumulation: cap accumulated total at 50.0 (was unbounded
across repeated runs, cementing high-score channels in top positions forever)
- Expire unseen queue entries older than 14 days at start of each run
- Add ±8 score perturbation to discovery list endpoint (was pure score DESC,
identical every visit until dismissed)
- Add score perturbation to discovery_videos ORDER BY too
- Fix SQL injection in update_category_clusters (category strings were
interpolated directly into query; now use parameterized queries per category)
- Raise category signal score from 3.0 → 5.0 to compensate for trending cap
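The caps and the perturbation above can be illustrated with a small sketch. The constants 18.0, 50.0 and ±8 come from this message; the function names and the (channel, score) row shape are assumptions made for the example.

```python
import random

TRENDING_CAP = 18.0
GLOBAL_CAP = 50.0

def capped_signal_score(signal, raw_score):
    """Cap a single signal's score: trending first, then the global ceiling."""
    if signal == "trending":
        raw_score = min(raw_score, TRENDING_CAP)
    return min(raw_score, GLOBAL_CAP)

def accumulate(existing_total, new_score):
    """Add a new score to the stored total without exceeding the global cap."""
    return min(existing_total + new_score, GLOBAL_CAP)

def perturbed_order(rows, spread=8.0, rng=None):
    """Sort (channel, score) rows by score plus uniform noise in [-spread, spread]."""
    rng = rng or random.Random()
    return sorted(rows, key=lambda r: r[1] + rng.uniform(-spread, spread), reverse=True)
```

Channels whose scores differ by more than twice the spread keep their relative order; closer ones can swap between visits.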
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
search_youtube now takes polite=False (default) for instant user
searches and polite=True for background discovery crawls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the lock was released before _run(), so multiple threads could
fire yt-dlp processes simultaneously — completely defeating the rate limiter.
Now the lock is held through the subprocess call and released in finally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- stats: started_count now includes any video opened (last_watched_at set),
not just ones with saved progress seconds
- VideoPlayer: fires updateProgress immediately on open so even a
click-and-back sets last_watched_at and counts as a started video
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an in-progress count: videos with watch_progress_seconds > 0 AND
watched = 0. Shown as an "In progress" card in the engagement row alongside
finished/bailed/rewatched.
Total liked moved to engagement row, top row condensed to 3 cards.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 discovery card per 3 followed videos (was 1 per 5).
Lower-ranked discovery cards also get shuffled so the same
channels don't always appear at fixed positions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ranking improvements:
- Wider candidate pool (4x limit) with ±12pt score perturbation so
same-score videos shuffle differently each load
- Recent channel engagement signal: channels watched in past 30 days
get a +4pts/watch boost
- Bail penalty: -25pts for videos started but abandoned before 20%
- Impression penalty: -3pts per prior feed appearance (capped at 10),
so repeatedly-skipped videos sink naturally
- rn cap raised to 5 for more candidates; Python-side sampling then keeps only
the top limit results
Feed UX:
- Reshuffle button now available on For You (ranked) mode, not just Explore
- shuffleKey now always included in query key (not just random mode)
- Ranked mode staleTime reduced from 10min to 90s
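The ranking adjustments listed above could be combined along these lines. This is a sketch, not the actual ranking code: the function names are hypothetical, and the impression cap is assumed to mean at most 10 counted appearances (the message could also be read as a cap on the points deducted).

```python
import random

def adjusted_score(base, recent_watches=0, bailed_before_20pct=False, impressions=0):
    score = base
    score += 4.0 * recent_watches        # +4 pts per watch of the channel in the past 30 days
    if bailed_before_20pct:
        score -= 25.0                    # started but abandoned before 20%
    score -= 3.0 * min(impressions, 10)  # assumption: the cap counts appearances
    return score

def pick_ranked(candidates, limit, spread=12.0, rng=None):
    """candidates: (video_id, score) pairs, ideally about 4x `limit` of them."""
    rng = rng or random.Random()
    jittered = sorted(
        candidates,
        key=lambda c: c[1] + rng.uniform(-spread, spread),  # ±12 pt perturbation
        reverse=True,
    )
    return [video_id for video_id, _ in jittered[:limit]]
```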
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- search_youtube, fetch_trending, fetch_featured_channels now use _meta_run
- Replaced ThreadPoolExecutor(4) parallel searches with sequential loop
- Replaced ThreadPoolExecutor(3) parallel featured-channel fetches with sequential
- _fetch_and_index_channel passes polite=True to fetch_channel/video_metadata
Discovery was firing 4+ simultaneous yt-dlp processes, each with cookies,
which is what invalidated the session.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The refresh endpoint was passing the request's db session to the
background task, but that session is closed before the task runs, so
every refresh silently did nothing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fetch_video_metadata and fetch_channel_metadata now take polite=True for
background tasks (enforces 5s+ gap via global lock) while user-facing
calls (watch page, follow channel, download) use polite=False and run
immediately.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All fetch_video_metadata / fetch_channel_metadata / fetch_channel_playlists
/ fetch_available_subs calls now go through _meta_run which enforces a
minimum 5s gap (+ 0.5-2.5s random jitter) across all concurrent tasks.
Per-task sleep loops removed since the global lock serializes everything.
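The gap calculation can be sketched as a small pure function. The 5s minimum and the 0.5-2.5s jitter come from this message; seconds_to_wait and its parameters are hypothetical and only illustrate the arithmetic, not the actual _meta_run code.

```python
import random

MIN_GAP = 5.0
JITTER_RANGE = (0.5, 2.5)

def seconds_to_wait(last_call_at, now, rng=random):
    """How long to sleep before the next metadata call may start."""
    if last_call_at is None:
        return 0.0  # first call: nothing to space out from
    earliest = last_call_at + MIN_GAP + rng.uniform(*JITTER_RANGE)
    return max(0.0, earliest - now)
```

For example, if the previous call finished 1s ago, the wait lands between 4.5s and 6.5s.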
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Top 10 shown as variable-size tag cloud, all tags below as a
two-column bar chart. Backend limit raised from 20 to 60.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Writes a Kodi/Jellyfin-compatible .nfo XML file next to each .mp4 on
download completion, deletes it when the download record is removed, and
exposes POST /api/downloads/nfo/generate to backfill NFOs for existing
downloads. Frontend adds a "Generate NFO" button in the Downloads header.
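A minimal sketch of writing and removing such a file with the standard library. The movie root element and the chosen fields (title, studio, plot, premiered) are assumptions for illustration; the actual NFO layout written by the backend may differ.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def nfo_path_for(video_path):
    """The .nfo sits next to the video and shares its base name."""
    return Path(video_path).with_suffix(".nfo")

def write_nfo(video_path, title, channel, description, upload_date):
    root = ET.Element("movie")  # assumption: movie-style NFO
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "studio").text = channel
    ET.SubElement(root, "plot").text = description
    ET.SubElement(root, "premiered").text = upload_date  # e.g. "2024-01-02"
    path = nfo_path_for(video_path)
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path

def remove_nfo(video_path):
    """Delete the sidecar file if present (requires Python 3.8+ for missing_ok)."""
    nfo_path_for(video_path).unlink(missing_ok=True)
```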
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Settings: Notifications section with toggle that requests browser permission
and stores preference in localStorage
- Layout: fires a Notification when new_count increases and user isn't on /following
- Works whenever the tab is open (foreground or background)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Popular fetch phase 2: sequential with 2s delay between requests (was 3 parallel workers)
- Reduced from 200 to 100 videos per popular fetch run
- DB writes happen after each video instead of all at end (no data loss on interrupt)
- _enrich_missing_task: delay increased 0.5s → 2s between requests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stats:
- Peak watching hours chart (24-bar) from last_watched_at timestamps
RSS:
- GET /api/channels/rss — last 100 videos from followed channels as RSS 2.0
- RSS link in Following > Health tab
Channel health:
- New Health tab in Following groups channels into Active / Slow / Dormant / Dead
based on days since last upload
Bulk video download:
- Select mode on Channel page (Videos tab) with checkboxes
- Sticky bottom bar shows count + Download button
- Queues a download for each selected video
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Offline banner in nav when backend is unreachable (network error, not 4xx)
- GET /channels/{id}/random — picks random unwatched video, navigates to watch
- GET /channels/{id}/in-progress — videos with >30s progress, not yet watched
- Channel page: 'Surprise me' button (desktop + mobile) navigates to random video
- Channel page: 'Continue watching' row above video list when in-progress videos exist
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows each active download (title + progress bar) and background task
(label, phase, done/total + bar) on hover. Pure CSS group-hover, no JS state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Phase 1 (crawling) now creates the task immediately so Downloads shows it
- Phase label updates to 'Enriching view counts' when phase 2 starts
- Nav bar DownloadIndicator also polls /channels/tasks and shows spinning
indicator + progress % for background tasks (not just file downloads)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Track active background tasks in an in-memory dict with a lock
- Expose GET /api/channels/tasks returning running task list
- _fetch_popular_task updates done count as each video fetch completes
- Downloads page polls /tasks every 2s and shows progress bars
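The in-memory registry might be shaped like this. It is a sketch only: the function names and the fields of each task record are assumptions, not the actual implementation.

```python
import threading

_tasks = {}                      # task_id -> task record
_tasks_lock = threading.Lock()

def start_task(task_id, label, total):
    with _tasks_lock:
        _tasks[task_id] = {"id": task_id, "label": label, "done": 0, "total": total}

def advance_task(task_id, step=1):
    """Bump the done count, e.g. after each video fetch completes."""
    with _tasks_lock:
        task = _tasks.get(task_id)
        if task is not None:
            task["done"] = min(task["done"] + step, task["total"])

def finish_task(task_id):
    with _tasks_lock:
        _tasks.pop(task_id, None)

def list_tasks():
    """Return copies so callers (such as the HTTP handler) never see live dicts."""
    with _tasks_lock:
        return [dict(t) for t in _tasks.values()]
```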
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8 simultaneous yt-dlp processes hitting video pages looks like a bot
attack and causes YouTube to nuke the session cookies. Drop to:
- Popular fetch view_count enrichment: 8→3 workers
- Discovery search: 8→4 workers
- Graph signal (featured channels): 8→3 workers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously ORDER BY published_at DESC meant only the newest 200 videos
ever got view counts. Now ORDER BY RANDOM() spreads the 200 slots across
the full channel history — videos without a count are still prioritised,
but among those they're drawn randomly. Each run of Fetch Popular covers
a different slice, converging toward full coverage over time.
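The selection order can be sketched in SQLite as below. The videos table and its columns are assumptions for the example; only the idea of prioritising missing counts and randomising within each group comes from this message.

```python
import sqlite3

def pick_for_enrichment(conn, channel_id, limit=200):
    """Videos without a view_count first, random order within each group."""
    rows = conn.execute(
        """
        SELECT video_id FROM videos
        WHERE channel_id = ?
        ORDER BY (view_count IS NULL) DESC, RANDOM()
        LIMIT ?
        """,
        (channel_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

In SQLite, (view_count IS NULL) evaluates to 1 or 0, so sorting it in descending order puts uncounted videos ahead of the rest.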
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1: crawl the full channel with flat-playlist to store any videos
not yet in DB (fast, no individual requests).
Phase 2: fetch real view_count for up to 200 channel videos in parallel
(8 workers), prioritising those missing a count.
Popular tab sorts all channel videos by view_count DESC NULLS LAST.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp's own test suite marks channel sort as 'Query for sorting no
longer works' — YouTube blocked it. New approach: fetch view_count for
up to 200 indexed videos in parallel (8 workers, prioritising those
missing counts), then Popular tab sorts by view_count DESC WHERE
view_count IS NOT NULL. Accurate for any channel once enrichment runs.
Frontend refetch wait raised to 60s to cover ~200 parallel fetches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The outer try had no except — any exception (e.g. table missing) killed
the whole background task with no error visible to the user. Now:
- CREATE TABLE IF NOT EXISTS inline so the task works even if the
startup migration hasn't run (no server restart required)
- Wrap DELETE in its own try/except
- Catch and print outer exceptions so failures appear in server logs
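The error-handling shape described above might look like this with plain sqlite3. It is a sketch under assumptions: the function name is hypothetical, and the table is taken to be channel_popular_videos (channel_id, video_id, rank), which this message does not name explicitly.

```python
import sqlite3

def rewrite_popular(conn, channel_id, video_ids):
    """Replace a channel's popular list; return False instead of dying silently."""
    try:
        # Create the table inline so the task works before the startup migration runs.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS channel_popular_videos "
            "(channel_id TEXT, video_id TEXT, rank INTEGER)"
        )
        try:
            conn.execute(
                "DELETE FROM channel_popular_videos WHERE channel_id = ?", (channel_id,)
            )
        except sqlite3.Error as exc:
            print(f"popular: delete failed: {exc}")
        conn.executemany(
            "INSERT INTO channel_popular_videos VALUES (?, ?, ?)",
            [(channel_id, vid, rank) for rank, vid in enumerate(video_ids, start=1)],
        )
        conn.commit()
        return True
    except Exception as exc:
        print(f"popular: task failed: {exc}")  # surfaces the failure in server logs
        return False
```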
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
flat-playlist mode returns timestamp=null for most playlist entries so
published_at is missing after the initial index. Now kicks off
_enrich_missing_task (scoped to the playlist size) as a daemon thread
immediately after indexing commits, filling in dates and view counts
in the background via individual video fetches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the task waited for all 30 parallel metadata fetches before
writing anything to the DB (~30s). Now Phase 1 (flat-playlist IDs +
basic info) commits to channel_popular_videos immediately (~5s), so the
tab populates fast. Phase 2 (view_count + dates) runs in a daemon thread
while the user is already browsing.
Also: catch table-not-found errors in the sort=popular query so a cold
server returns [] instead of 500. Frontend refetch wait 35s→8s to match
the faster Phase 1 commit time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When yt-dlp returns no thumbnail for a playlist entry, fetch the
playlist's first video (max_videos=1) and derive a stable thumbnail
URL from its video ID. Applied during the initial fetch as well; indexing
already did this as of the previous commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_stable_thumbnail expects a video ID but was being passed a playlist ID
(PLxxx), producing a broken URL. Now picks the best thumbnail from
yt-dlp's thumbnails array, falling back to the singular thumbnail field.
Also backfills playlist.thumbnail_url from the first video when indexing
a playlist that still has no thumbnail.
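The fallback chain could be sketched as follows. The helper names, the "largest area wins" rule and the hqdefault URL pattern are assumptions for illustration; the project's _stable_thumbnail may build its URL differently.

```python
def stable_thumbnail(video_id):
    """Derive a thumbnail URL from a video ID (not a playlist ID)."""
    return f"https://i.ytimg.com/vi/{video_id}/hqdefault.jpg"

def best_playlist_thumbnail(entry, first_video_id=None):
    """Prefer the thumbnails array, then the singular field, then the first video."""
    thumbs = [t for t in (entry.get("thumbnails") or []) if t.get("url")]
    if thumbs:
        # Assumption: "best" means the largest pixel area; missing sizes count as 0.
        best = max(thumbs, key=lambda t: (t.get("width") or 0) * (t.get("height") or 0))
        return best["url"]
    if entry.get("thumbnail"):
        return entry["thumbnail"]
    if first_video_id:
        return stable_thumbnail(first_video_id)
    return None
```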
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace 2-col grid of small cards with a full-width list layout:
thumbnail on the left, title + status on the right — same proportions
and hover behaviour as VideoCard variant="list". Index/Re-index button
appears on hover, video count shown as a pill overlay on the thumbnail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add channel_popular_videos table (channel_id, video_id, rank).
_fetch_popular_task clears and rewrites this table after each fetch.
GET /channels/{id}/videos?sort=popular now JOINs this table and orders
by rank instead of view_count, so the tab shows exactly the videos
YouTube returned in popularity order — nothing more.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Popular fetch now uses a two-phase approach: a fast flat-playlist pass to get
IDs in popularity order, then a parallel full metadata fetch (8 workers)
to get the real view_count and published_at for each video. Previously
flat-playlist mode returned timestamp/view_count as null.
Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.
Raise all thread pool sizes to match an 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New playlists router: fetch channel playlists from YouTube, index
playlist videos, browse by playlist with pagination
- Playlist model gets video_ids column to store ordered video list
- Register playlists router in main.py with DB migration
- Add Playlists tab to Channel page: grid of playlist cards, click to
browse videos, index/re-index per playlist
- Fix "Explore older videos" skipping all entries without published_at;
flat-playlist entries for older videos rarely include timestamp data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- YouTube sort=p fetch: indexes top 100 most-viewed videos from a channel,
storing view_count in the DB
- Popular tab on channel page shows videos sorted by view_count DESC
- Videos/Popular tab switcher with context-appropriate fetch buttons
- Expose view_count in VideoOut; add 'popular' sort to channel videos endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Channel page:
- "Explore older videos" button fetches 100 videos at a time further back
in the channel history using yt-dlp --playlist-start/--playlist-end
- "Fetch entire history" still available for full crawl
- Backend: /channels/{id}/explore?page=N endpoint + playlist offset support
in fetch_channel_metadata(start_video=N)
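The page-to-range mapping can be sketched as below. The helper name is hypothetical, and treating page as 1-based (page 1 covers entries 1-100) is an assumption; the real endpoint may offset pages differently.

```python
def explore_range_args(page, page_size=100):
    """Build the yt-dlp playlist window for one page of older videos."""
    start = (page - 1) * page_size + 1  # yt-dlp playlist indices are 1-based
    end = page * page_size
    return ["--playlist-start", str(start), "--playlist-end", str(end)]
```

These arguments would be appended to the usual channel fetch command.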
Home feed:
- New "Rediscover" mode: older unwatched videos (90+ days old) from
followed channels, randomly sampled then re-ranked by tag affinity
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Search bar filters indexed videos server-side; "Search YouTube" button
triggers a deep channel search and indexes matching results
- Server-side sort (newest/oldest/A-Z/unwatched) + infinite scroll (60/page)
- "Fetch recent" indexes last 30, "Fetch all" indexes full history
- Auto-reindex on page visit if stale (>1h); the page refetches after 8s
- Add /channels/{id}/index-full endpoint (max_videos=0)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SQLite returns datetime columns as strings via raw text() queries.
Parse crawled_at safely before comparing against utcnow().
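A minimal sketch of the defensive parse, assuming ISO-style strings as SQLite stores them. The helper names and the one-hour staleness window are hypothetical, chosen only to show the comparison.

```python
from datetime import datetime, timedelta, timezone

def parse_db_datetime(value):
    """Accept a datetime, an ISO-like string, or None; return a naive UTC datetime."""
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            return None  # unparseable text is treated as missing
    if not isinstance(value, datetime):
        return None
    if value.tzinfo is not None:
        value = value.astimezone(timezone.utc).replace(tzinfo=None)
    return value

def is_stale(crawled_at, max_age=timedelta(hours=1), now=None):
    parsed = parse_db_datetime(crawled_at)
    if parsed is None:
        return True  # never crawled, or bad data: crawl again
    if now is None:
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return now - parsed > max_age
```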
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>