youclonedl

Author	SHA1	Message	Date
Mattias Thall	a3346c6e87	fix: stop discovery from bursting dozens of yt-dlp calls inside one task Each search/graph/trending task was calling _fetch_and_index_channel inline for up to 10-15 newly discovered channels, each making up to 4 yt-dlp calls (1 channel metadata + 3 individual video fetches for dateless entries). This bypassed the 30-90 s worker gap, producing bursts of 40-60 calls in rapid succession and hammering YouTube. Changes: - _fetch_and_index_channel: removed the dateless-video individual fetch loop — one call per channel, videos without published_at are simply skipped at discovery time - _search_and_store and _fetch_graph_for_channel: queue channel indexing as separate worker tasks (3 and 2 respectively) so the 30-90 s gap applies between every yt-dlp call, including channel indexing - update_trending_signal and update_graph_signal (old sync path): removed inline _fetch_and_index_channel loops (15 and 10 channels) - _discovery_task in channels.py: replaced run_full_discovery (old synchronous path) with schedule_discovery so sync-all and follow-by-url go through the queue system Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 03:17:37 +02:00
Mattias Thall	a535e9f22a	Add queue-based gradual discovery with shuffled call ordering and progress UI Each yt-dlp call is now an independent task (one search query, one trending fetch, one graph channel fetch). Tasks are shuffled together so we don't fire 10 searches in a row, then enqueued with 30-90s random gaps between them — a full sweep of ~17 tasks completes in roughly 10-25 minutes instead of hammering YouTube with 21 calls back-to-back. Fast signals (community, category clusters) still run synchronously at schedule time since they're pure SQL. Progress is tracked per-user (total/done/running) and exposed on GET /api/discovery/status. The Discovery page polls every 10s while running and shows a progress bar + "Finding channels… X / Y" in the header. The auto-discovery daemon skips scheduling if a manual sweep is already running. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 02:28:35 +02:00
Mattias Thall	e6faf8e08e	Drastically reduce discovery yt-dlp call count: 64 → ~21 Each yt-dlp call is a separate subprocess that opens a new HTTP session with YouTube. 64 sessions in a row looks like a bot regardless of rate limiting. Changes: - crawl_by_search: 30 queries → 10 (top 5 tags, 4 channel names, 1 serendipity) - update_liked_signal: 10 queries → 4 - update_watch_signal: removed (tags already included in crawl_by_search) - update_trending_signal: 2 regions → 1 (first region only) - update_graph_signal: 12 sampled channels → 6 New total: ~21 yt-dlp calls per run (~105s with 5s gaps) vs ~320s before. Signal quality is preserved — the removed queries were low-marginal-value duplicates of content already covered by the remaining ones. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 02:14:25 +02:00
Mattias Thall	12f54ac5b0	Auto-schedule daily discovery + fix Find More UX + expand query diversity Auto-discovery daemon: - Runs every hour, triggers full discovery for any user whose last run was >23 hours ago. First check is 5 minutes after startup. - Tracks run time in user_settings.last_discovery_run (new column). - Manual Find More also stamps last_discovery_run. Discovery status endpoint (GET /api/discovery/status): - Returns pending_count (unseen queue size) and last_run timestamp. - Shown in the Discover page header so users know queue state at a glance. Find More UX fix: - Was: kick background task, wait 8 seconds, refetch (task takes minutes). - Now: button shows "Queued ✓" on success with an explanatory banner telling the user it takes a few minutes and also runs daily automatically. Query diversity: - Added "best [category] channels" serendipity queries to crawl_by_search. - Limit raised from 25 to 30 queries per run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 01:58:39 +02:00
Mattias Thall	592194f2ca	Fix discovery scoring: cap trending, prevent score inflation, add freshness - Cap trending base_score at 18.0 (was unbounded — a viral channel could score 240+ vs search's 15, making everything else invisible) - Cap all discovery scores at 50.0 globally so no single signal dominates - Fix score accumulation: cap accumulated total at 50.0 (was unbounded across repeated runs, cementing high-score channels in top positions forever) - Expire unseen queue entries older than 14 days at start of each run - Add ±8 score perturbation to discovery list endpoint (was pure score DESC, identical every visit until dismissed) - Add score perturbation to discovery_videos ORDER BY too - Fix SQL injection in update_category_clusters (category strings were interpolated directly into query; now use parameterized queries per category) - Raise category signal score from 3.0 → 5.0 to compensate for trending cap Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 01:37:09 +02:00
Mattias Thall	b6a47249d0	Fix search latency: bypass rate limiter for user-triggered searches search_youtube now takes polite=False (default) for instant user searches and polite=True for background discovery crawls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 01:27:36 +02:00
Mattias Thall	19dae63385	Route all discovery fetches through global rate limiter - search_youtube, fetch_trending, fetch_featured_channels now use _meta_run - Replaced ThreadPoolExecutor(4) parallel searches with sequential loop - Replaced ThreadPoolExecutor(3) parallel featured-channel fetches with sequential - _fetch_and_index_channel passes polite=True to fetch_channel/video_metadata Discovery was firing 4+ simultaneous yt-dlp processes, each with cookies, which is what invalidated the session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 01:05:56 +02:00
Mattias Thall	c3290d33a7	Reduce parallel YouTube request workers to avoid cookie invalidation 8 simultaneous yt-dlp processes hitting video pages looks like a bot attack and causes YouTube to nuke the session cookies. Drop to: - Popular fetch view_count enrichment: 8→3 workers - Discovery search: 8→4 workers - Graph signal (featured channels): 8→3 workers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 23:11:07 +02:00
Mattias Thall	2f37072187	Fix popular fetch and improve date/view_count coverage Popular fetch now does a two-phase approach: fast flat-playlist to get IDs in popularity order, then parallel full metadata fetch (8 workers) to get real view_count and published_at for each video. Previously flat-playlist mode returned timestamp/view_count as null. Enrich task now also backfills published_at and view_count (not just description). Startup limit 3→50, enrichment sleep 2s→0.5s. Raise all thread pool sizes to match 8-core machine: - Discovery search: 5→8 workers - Graph signal: 4→8 workers - Popular fetch: 5→8 workers - Download semaphore default 3→6, cap 10→16 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 22:36:18 +02:00
Mattias Thall	871f668525	Parallelize discovery searches and add graph signal Run search queries concurrently (5 workers) instead of sequentially — cuts crawl time dramatically. Add graph signal: fetch featured channels from followed channels' /channels tab in parallel (4 workers), which surfaces creator-curated recommendations as a high-signal, diverse pool that search alone can't reach. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 21:59:23 +02:00
Mattias Thall	62c2c73906	Expand discovery pool and remove header logo Double search results per query (20→40), increase query budget (15→25), use more tags per signal (6→10-12), index more new channels per refresh (5→10). Remove the YT logo from the header. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 21:55:52 +02:00
Mattias Thall	3e3d2c7464	Fix discovery to actually use negative affinity signals Previously the engine was blind to dislikes/dismissals: - _build_user_tag_profile only used liked/watched (positive only) - dismiss_penalty was capped at 80% so hated content still surfaced - _search_and_store had zero affinity filtering, any YouTube result entered the queue - user_tag_affinity negative scores (written by dismiss/dislike) were never read Now: - _build_user_tag_profile reads directly from user_tag_affinity (positive + negative) - _tag_relevance_score returns negative values, so disliked-tag channels score below zero and get dropped - _search_and_store skips channels whose indexed videos match 3+ negatively-rated tags - list_discovery post-filters channels already in the queue using the same neg-affinity check - Removed the old _dismissed_channel_tags + dismiss_penalty (superseded) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 16:48:39 +02:00
inputnoise	1827dd6c4e	Initial commit — YT Hub Self-hosted personal YouTube management app. FastAPI + SQLite backend, React + Vite + Tailwind frontend. Dockerfiles and compose included for Portainer deployment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 20:09:04 +02:00
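
The scoring rules described in commit 592194f2ca (trending capped at 18.0, accumulated totals capped at 50.0, ±8 perturbation on the list order) can be sketched roughly as below. This is an illustrative reconstruction, not the repository's code: the function names `cap_trending`, `accumulate_score`, and `rank_with_jitter` are hypothetical, and only the three numeric constants come from the commit message.

```python
import random

# Constants taken from the commit message; everything else is assumed.
TRENDING_CAP = 18.0   # upper bound for the trending base score
GLOBAL_CAP = 50.0     # upper bound for any accumulated discovery score
JITTER = 8.0          # half-width of the perturbation applied at list time


def cap_trending(raw_score: float) -> float:
    """Clamp a raw trending score so a viral channel cannot dominate."""
    return min(raw_score, TRENDING_CAP)


def accumulate_score(existing: float, new_signal: float) -> float:
    """Add a new signal to a stored score without exceeding the global cap."""
    return min(existing + new_signal, GLOBAL_CAP)


def rank_with_jitter(scores: dict, jitter: float = JITTER, rng=None) -> list:
    """Return channel ids ordered by score plus a random offset in [-jitter, jitter].

    The offset is drawn once per channel, so the order can differ between
    calls while large score differences are still respected.
    """
    rng = rng or random.Random()
    return sorted(
        scores,
        key=lambda channel_id: scores[channel_id] + rng.uniform(-jitter, jitter),
        reverse=True,
    )
```

With these values, two channels swap places only when their scores are within 16 points of each other, so the perturbation freshens the list without letting weak candidates overtake strong ones.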

13 Commits
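
The queue-based sweep from commit a535e9f22a (each yt-dlp call as an independent task, all tasks shuffled together, 30-90 s random gaps between them) might look something like the following. This is a minimal sketch under stated assumptions: `ScheduledTask` and `plan_sweep` are hypothetical names, the plan is computed up front as start offsets rather than enqueued into the project's real worker, and only the shuffle and the 30-90 s gap range are taken from the commit message.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledTask:
    kind: str        # e.g. "search", "trending", "graph" (assumed labels)
    payload: str     # query string, region, or channel id
    delay_s: float   # seconds after the sweep starts


def plan_sweep(tasks, min_gap=30.0, max_gap=90.0, rng=None):
    """Shuffle (kind, payload) pairs and assign each a start offset.

    Consecutive tasks are separated by a random gap in [min_gap, max_gap],
    so no two yt-dlp calls run back-to-back and task kinds are interleaved.
    """
    rng = rng or random.Random()
    shuffled = list(tasks)
    rng.shuffle(shuffled)

    plan = []
    offset = 0.0
    for kind, payload in shuffled:
        plan.append(ScheduledTask(kind, payload, offset))
        offset += rng.uniform(min_gap, max_gap)
    return plan
```

A caller could hand each `delay_s` to whatever delayed-execution mechanism the worker offers; the sketch deliberately stops at planning so it makes no claim about how the actual queue is implemented.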