Each yt-dlp call is a separate subprocess that opens a new HTTP session with
YouTube. 64 sessions in a row look like bot traffic regardless of rate limiting.
Changes:
- crawl_by_search: 30 queries → 10 (top 5 tags, 4 channel names, 1 serendipity)
- update_liked_signal: 10 queries → 4
- update_watch_signal: removed (tags already included in crawl_by_search)
- update_trending_signal: 2 regions → 1 (first region only)
- update_graph_signal: 12 sampled channels → 6
New total: ~21 yt-dlp calls per run (~105s with 5s gaps) vs 64 calls (~320s) before.
Signal quality is preserved — the removed queries were low-marginal-value
duplicates of content already covered by the remaining ones.
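
The arithmetic behind the new total can be checked with a short sketch. The
per-signal counts come from the list above. The previous query count for
update_watch_signal is not stated; the value 10 used here is inferred from the
64-session total and is an assumption, as are the names GAP_SECONDS, BEFORE,
AFTER and run_budget.

```python
# Per-signal yt-dlp call counts before and after this change.
GAP_SECONDS = 5  # gap between consecutive calls, from the "5s gaps" note

BEFORE = {
    "crawl_by_search": 30,
    "update_liked_signal": 10,
    "update_watch_signal": 10,  # ASSUMPTION: inferred as 64 - (30 + 10 + 2 + 12)
    "update_trending_signal": 2,
    "update_graph_signal": 12,
}
AFTER = {
    "crawl_by_search": 10,
    "update_liked_signal": 4,
    "update_watch_signal": 0,  # signal removed
    "update_trending_signal": 1,
    "update_graph_signal": 6,
}


def run_budget(calls: dict, gap: int = GAP_SECONDS) -> tuple:
    """Return (total calls, seconds spent waiting if every call costs one gap)."""
    total = sum(calls.values())
    return total, total * gap


print(run_budget(BEFORE))  # (64, 320)
print(run_budget(AFTER))   # (21, 105)
```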
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-discovery daemon:
- Runs every hour, triggers full discovery for any user whose last run
was >23 hours ago. First check is 5 minutes after startup.
- Tracks run time in user_settings.last_discovery_run (new column).
- Manual Find More also stamps last_discovery_run.
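
The hourly due-check described above could look roughly like the sketch below.
The function name is_discovery_due is hypothetical, and treating a user with no
recorded run as due is an assumption; only the 23-hour threshold comes from
this change.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DISCOVERY_INTERVAL = timedelta(hours=23)  # threshold from the notes above


def is_discovery_due(last_run: Optional[datetime], now: datetime) -> bool:
    """True when discovery has never run or the last run is more than 23 hours old."""
    if last_run is None:
        # ASSUMPTION: a user with no recorded run is treated as due.
        return True
    return now - last_run > DISCOVERY_INTERVAL


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_discovery_due(now - timedelta(hours=24), now))  # True
print(is_discovery_due(now - timedelta(hours=2), now))   # False
print(is_discovery_due(None, now))                       # True
```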
Discovery status endpoint (GET /api/discovery/status):
- Returns pending_count (unseen queue size) and last_run timestamp.
- Shown in the Discover page header so users know queue state at a glance.
Find More UX fix:
- Was: kick background task, wait 8 seconds, refetch (task takes minutes).
- Now: button shows "Queued ✓" on success with an explanatory banner
telling the user it takes a few minutes and also runs daily automatically.
Query diversity:
- Added "best [category] channels" serendipity queries to crawl_by_search.
- Limit raised from 25 to 30 queries per run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Cap trending base_score at 18.0 (was unbounded — a viral channel could
score 240+ vs search's 15, making everything else invisible)
- Cap all discovery scores at 50.0 globally so no single signal dominates
- Fix score accumulation: cap accumulated total at 50.0 (was unbounded
across repeated runs, cementing high-score channels in top positions forever)
- Expire unseen queue entries older than 14 days at start of each run
- Add ±8 score perturbation to discovery list endpoint (was pure score DESC,
identical every visit until dismissed)
- Add score perturbation to discovery_videos ORDER BY too
- Fix SQL injection in update_category_clusters (category strings were
interpolated directly into query; now use parameterized queries per category)
- Raise category signal score from 3.0 → 5.0 to compensate for trending cap
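
A minimal sketch of the capping and perturbation rules listed above, written in
plain Python for illustration. The real ordering is applied in the endpoint's
query; the function names and the choice of uniform jitter are assumptions.
Only the constants 18.0, 50.0 and 8 come from this change.

```python
import random

TRENDING_CAP = 18.0  # cap on the trending base_score
GLOBAL_CAP = 50.0    # cap on any single score and on the accumulated total
JITTER = 8.0         # +/- perturbation applied when listing


def trending_score(raw: float) -> float:
    """Clamp a raw trending score so a viral channel cannot dominate."""
    return min(raw, TRENDING_CAP)


def accumulate_score(existing: float, new_signal: float) -> float:
    """Add a new signal to the stored total without exceeding the global cap."""
    return min(existing + min(new_signal, GLOBAL_CAP), GLOBAL_CAP)


def perturbed_order(scores: dict, rng: random.Random) -> list:
    """Order channel ids by score plus jitter (ASSUMPTION: uniform in [-8, +8])."""
    jittered = {cid: s + rng.uniform(-JITTER, JITTER) for cid, s in scores.items()}
    return sorted(jittered, key=jittered.get, reverse=True)


print(trending_score(240.0))         # 18.0
print(accumulate_score(45.0, 18.0))  # 50.0
```

Channels whose scores differ by more than 16 keep their relative order; closer
scores can swap places between visits.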
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
search_youtube now takes a polite flag: polite=False (the default) for
instant user searches, polite=True for background discovery crawls.
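
One way the flag could behave, as a sketch only. The body below is not the real
search_youtube: the runner and sleep parameters, the 5-second value, and
sleeping before (rather than after) the call are all assumptions made so the
example is self-contained.

```python
import time
from typing import Callable

POLITE_GAP_SECONDS = 5.0  # ASSUMPTION: illustrative gap, not taken from the code


def search_youtube(
    query: str,
    polite: bool = False,
    runner: Callable[[str], list] = lambda q: [],  # stand-in for the yt-dlp call
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Run one search; when polite, wait first so background calls are spaced out."""
    if polite:
        sleep(POLITE_GAP_SECONDS)
    return runner(query)


# User-facing search: no delay.
print(search_youtube("python tutorials", runner=lambda q: [q]))
```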
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- search_youtube, fetch_trending, fetch_featured_channels now use _meta_run
- Replaced ThreadPoolExecutor(4) parallel searches with sequential loop
- Replaced ThreadPoolExecutor(3) parallel featured-channel fetches with sequential
- _fetch_and_index_channel passes polite=True to fetch_channel/video_metadata
Discovery was firing 4+ simultaneous yt-dlp processes, each with cookies,
which is what invalidated the session.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8 simultaneous yt-dlp processes hitting video pages looks like a bot
attack and causes YouTube to nuke the session cookies. Drop to:
- Popular fetch view_count enrichment: 8→3 workers
- Discovery search: 8→4 workers
- Graph signal (featured channels): 8→3 workers
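
For illustration, a bounded pool like the ones above can be sketched with
concurrent.futures. run_bounded, ConcurrencyProbe and the fake fetch are
hypothetical names; the sleep stands in for a yt-dlp subprocess.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(items, fetch, max_workers):
    """Apply fetch to every item with at most max_workers calls in flight.

    Results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))


class ConcurrencyProbe:
    """Fake fetch that records the peak number of simultaneous calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0

    def __call__(self, video_id):
        with self._lock:
            self._in_flight += 1
            self.peak = max(self.peak, self._in_flight)
        time.sleep(0.01)  # stand-in for a yt-dlp subprocess
        with self._lock:
            self._in_flight -= 1
        return video_id.upper()


probe = ConcurrencyProbe()
results = run_bounded([f"v{i}" for i in range(12)], probe, max_workers=3)
print(results[:3], probe.peak <= 3)
```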
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Popular fetch now uses a two-phase approach: a fast flat-playlist pass to get
IDs in popularity order, then a parallel full-metadata fetch (8 workers)
to get the real view_count and published_at for each video. Previously,
flat-playlist mode returned timestamp/view_count as null.
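
The two phases can be sketched as below. fetch_popular, list_ids and
fetch_metadata are hypothetical names; the two callables stand in for the
flat-playlist call and the per-video metadata call, which are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_popular(
    list_ids: Callable[[], list],
    fetch_metadata: Callable[[str], dict],
    workers: int = 8,
) -> list:
    """Phase 1: cheap ID listing in popularity order. Phase 2: full metadata per ID."""
    ids = list_ids()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so popularity rank is preserved.
        details = list(pool.map(fetch_metadata, ids))
    return [
        {"id": vid, "rank": rank, **meta}
        for rank, (vid, meta) in enumerate(zip(ids, details))
    ]


# Usage with fake data sources:
fake_meta = {"a1": {"view_count": 900}, "b2": {"view_count": 500}}
videos = fetch_popular(lambda: ["a1", "b2"], lambda vid: fake_meta[vid])
print(videos)
```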
Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.
Raise all thread pool sizes to match 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run search queries concurrently (5 workers) instead of sequentially, which
cuts crawl time substantially. Add graph signal: fetch featured channels
from followed channels' /channels tab in parallel (4 workers), which
surfaces creator-curated recommendations as a high-signal, diverse pool
that search alone can't reach.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Double search results per query (20→40), increase query budget (15→25),
use more tags per signal (6→10-12), index more new channels per refresh
(5→10). Remove the YT logo from the header.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the engine was blind to dislikes/dismissals:
- _build_user_tag_profile only used liked/watched (positive only)
- dismiss_penalty was capped at 80% so hated content still surfaced
- _search_and_store had zero affinity filtering, so any YouTube result entered the queue
- user_tag_affinity negative scores (written by dismiss/dislike) were never read
Now:
- _build_user_tag_profile reads directly from user_tag_affinity (positive + negative)
- _tag_relevance_score returns negative values, so disliked-tag channels score below zero and get dropped
- _search_and_store skips channels whose indexed videos match 3+ negatively-rated tags
- list_discovery post-filters channels already in the queue using the same neg-affinity check
- Removed the old _dismissed_channel_tags + dismiss_penalty (superseded)
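
The neg-affinity check can be sketched as follows. has_negative_affinity and
filter_candidates are hypothetical names, and the data shapes (a set of tags
per channel, a tag-to-score mapping for user_tag_affinity) are assumptions.
Only the "3+ negatively-rated tags" threshold comes from this change.

```python
NEGATIVE_TAG_THRESHOLD = 3  # "3+ negatively-rated tags" from the notes above


def has_negative_affinity(
    channel_tags: set,
    tag_affinity: dict,
    threshold: int = NEGATIVE_TAG_THRESHOLD,
) -> bool:
    """True when the channel's tags include at least `threshold` negatively-rated tags."""
    negative_hits = sum(1 for tag in channel_tags if tag_affinity.get(tag, 0.0) < 0)
    return negative_hits >= threshold


def filter_candidates(candidates: dict, tag_affinity: dict) -> list:
    """Keep only channel ids that do not trip the negative-affinity check."""
    return [
        cid for cid, tags in candidates.items()
        if not has_negative_affinity(tags, tag_affinity)
    ]


affinity = {"asmr": -2.0, "prank": -1.0, "drama": -0.5, "python": 3.0}
candidates = {
    "chan_a": {"asmr", "prank", "drama", "python"},  # 3 negative tags: dropped
    "chan_b": {"asmr", "python"},                    # 1 negative tag: kept
}
print(filter_candidates(candidates, affinity))  # ['chan_b']
```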
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Self-hosted personal YouTube management app.
FastAPI + SQLite backend, React + Vite + Tailwind frontend.
Dockerfiles and compose included for Portainer deployment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>