13 Commits

Author SHA1 Message Date
a3346c6e87 fix: stop discovery from bursting dozens of yt-dlp calls inside one task
Each search/graph/trending task was calling _fetch_and_index_channel
inline for up to 10-15 newly discovered channels, each making up to 4
yt-dlp calls (1 channel metadata + 3 individual video fetches for
dateless entries). This bypassed the 30-90 s worker gap, producing
bursts of 40-60 calls in rapid succession and hammering YouTube.

Changes:
- _fetch_and_index_channel: removed the dateless-video individual
  fetch loop — one call per channel, videos without published_at are
  simply skipped at discovery time
- _search_and_store and _fetch_graph_for_channel: queue channel
  indexing as separate worker tasks (3 and 2 respectively) so the
  30-90 s gap applies between every yt-dlp call, including channel
  indexing
- update_trending_signal and update_graph_signal (old sync path):
  removed inline _fetch_and_index_channel loops (15 and 10 channels)
- _discovery_task in channels.py: replaced run_full_discovery (old
  synchronous path) with schedule_discovery so sync-all and
  follow-by-url go through the queue system

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 03:17:37 +02:00
a535e9f22a Add queue-based gradual discovery with shuffled call ordering and progress UI
Each yt-dlp call is now an independent task (one search query, one trending
fetch, one graph channel fetch). Tasks are shuffled together so we don't fire
10 searches in a row, then enqueued with 30-90s random gaps between them —
a full sweep of ~17 tasks completes in roughly 10-25 minutes instead of
hammering YouTube with 21 calls back-to-back.
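
The shuffle-then-space idea described above can be sketched roughly as follows. This is an illustration only, not the project's code: `DiscoveryTask` and `plan_sweep` are hypothetical names, and the real queue presumably sleeps or schedules rather than returning a plan.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryTask:
    """Hypothetical task record: one yt-dlp call each."""
    kind: str      # "search", "trending", or "graph"
    payload: str   # query string, region code, or channel id


def plan_sweep(tasks, min_gap=30.0, max_gap=90.0, rng=None):
    """Return (offset_seconds, task) pairs in shuffled order.

    Each offset is measured from the start of the sweep, so consecutive
    tasks are separated by a random gap between min_gap and max_gap.
    """
    rng = rng or random.Random()
    shuffled = list(tasks)
    rng.shuffle(shuffled)  # interleave task kinds instead of grouping them

    plan = []
    offset = 0.0
    for task in shuffled:
        offset += rng.uniform(min_gap, max_gap)
        plan.append((offset, task))
    return plan
```

With 17 tasks and gaps of 30-90 s, the last offset falls between 8.5 and 25.5 minutes, which is in line with the "roughly 10-25 minutes" quoted above.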

Fast signals (community, category clusters) still run synchronously at
schedule time since they're pure SQL.

Progress is tracked per-user (total/done/running) and exposed on
GET /api/discovery/status. The Discovery page polls every 10s while
running and shows a progress bar + "Finding channels… X / Y" in the header.
The auto-discovery daemon skips scheduling if a manual sweep is already running.
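
A minimal sketch of per-user progress tracking, assuming an in-memory dict keyed by user id. All names here are hypothetical, and the actual storage and response shape of GET /api/discovery/status may differ.

```python
from dataclasses import dataclass


@dataclass
class SweepProgress:
    total: int = 0
    done: int = 0

    @property
    def running(self) -> bool:
        # A sweep counts as running while some of its tasks are unfinished.
        return self.done < self.total


_progress = {}  # user_id -> SweepProgress (assumed in-memory store)


def start_sweep(user_id, task_count):
    _progress[user_id] = SweepProgress(total=task_count)


def mark_task_done(user_id):
    progress = _progress[user_id]
    progress.done = min(progress.done + 1, progress.total)


def discovery_status(user_id):
    """Payload a status endpoint could return for the progress bar."""
    progress = _progress.get(user_id, SweepProgress())
    return {
        "total": progress.total,
        "done": progress.done,
        "running": progress.running,
    }


def should_auto_schedule(user_id):
    """The daemon skips users whose manual sweep is still running."""
    return not discovery_status(user_id)["running"]
```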

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 02:28:35 +02:00
e6faf8e08e Drastically reduce discovery yt-dlp call count: 64 → ~21
Each yt-dlp call is a separate subprocess that opens a new HTTP session with
YouTube. 64 sessions in a row looks like a bot regardless of rate limiting.

Changes:
- crawl_by_search: 30 queries → 10 (top 5 tags, 4 channel names, 1 serendipity)
- update_liked_signal: 10 queries → 4
- update_watch_signal: removed (tags already included in crawl_by_search)
- update_trending_signal: 2 regions → 1 (first region only)
- update_graph_signal: 12 sampled channels → 6

New total: ~21 yt-dlp calls per run (~105s with 5s gaps) vs 64 calls (~320s) before.
Signal quality is preserved — the removed queries were low-marginal-value
duplicates of content already covered by the remaining ones.
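
The totals above can be checked with a few lines of arithmetic. This assumes each query, region, and sampled channel costs exactly one yt-dlp call. The old count for update_watch_signal is not stated in the message, so it is inferred here from the 64-call total.

```python
GAP_SECONDS = 5

# Per-function call counts after the change, taken from the list above.
new_calls = {
    "crawl_by_search": 10,
    "update_liked_signal": 4,
    "update_trending_signal": 1,
    "update_graph_signal": 6,
}
new_total = sum(new_calls.values())

# Counts before the change that the message states explicitly.
old_stated = {
    "crawl_by_search": 30,
    "update_liked_signal": 10,
    "update_trending_signal": 2,
    "update_graph_signal": 12,
}
old_total = 64
# Inferred, not stated: whatever update_watch_signal contributed.
implied_watch_calls = old_total - sum(old_stated.values())

print(new_total, new_total * GAP_SECONDS)   # calls and seconds after
print(old_total, old_total * GAP_SECONDS)   # calls and seconds before
print(implied_watch_calls)
```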

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 02:14:25 +02:00
12f54ac5b0 Auto-schedule daily discovery + fix Find More UX + expand query diversity
Auto-discovery daemon:
- Runs every hour, triggers full discovery for any user whose last run
  was >23 hours ago. First check is 5 minutes after startup.
- Tracks run time in user_settings.last_discovery_run (new column).
- Manual Find More also stamps last_discovery_run.
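
The hourly "is this user due" check could look roughly like this. It is a sketch with hypothetical names, and it assumes a user with no recorded last_discovery_run is treated as due, which the message does not say.

```python
from datetime import datetime, timedelta, timezone

DISCOVERY_INTERVAL = timedelta(hours=23)


def users_due_for_discovery(last_runs, now):
    """Return the user ids whose last run was more than 23 hours ago.

    last_runs maps user id -> last_discovery_run (datetime, or None if
    discovery has never run for that user).
    """
    due = []
    for user_id, last_run in last_runs.items():
        if last_run is None or now - last_run > DISCOVERY_INTERVAL:
            due.append(user_id)
    return due


now = datetime(2026, 5, 27, 12, 0, tzinfo=timezone.utc)
last_runs = {
    "alice": now - timedelta(hours=24),  # due
    "bob": now - timedelta(hours=2),     # ran recently, skipped
    "carol": None,                       # never ran (assumed due)
}
print(users_due_for_discovery(last_runs, now))
```

A 23-hour threshold with an hourly check keeps runs close to daily without drifting later each day.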

Discovery status endpoint (GET /api/discovery/status):
- Returns pending_count (unseen queue size) and last_run timestamp.
- Shown in the Discover page header so users know queue state at a glance.

Find More UX fix:
- Was: kick off a background task, wait 8 seconds, refetch (but the task takes minutes).
- Now: button shows "Queued ✓" on success with an explanatory banner
  telling the user it takes a few minutes and also runs daily automatically.

Query diversity:
- Added "best [category] channels" serendipity queries to crawl_by_search.
- Limit raised from 25 to 30 queries per run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 01:58:39 +02:00
592194f2ca Fix discovery scoring: cap trending, prevent score inflation, add freshness
- Cap trending base_score at 18.0 (was unbounded — a viral channel could
  score 240+ vs search's 15, making everything else invisible)
- Cap all discovery scores at 50.0 globally so no single signal dominates
- Fix score accumulation: cap accumulated total at 50.0 (was unbounded
  across repeated runs, cementing high-score channels in top positions forever)
- Expire unseen queue entries older than 14 days at start of each run
- Add ±8 score perturbation to discovery list endpoint (was pure score DESC,
  identical every visit until dismissed)
- Add score perturbation to discovery_videos ORDER BY too
- Fix SQL injection in update_category_clusters (category strings were
  interpolated directly into query; now use parameterized queries per category)
- Raise category signal score from 3.0 → 5.0 to compensate for trending cap
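
The capping and perturbation rules above, sketched in Python. Function names are hypothetical, and the jitter is applied in Python here purely for illustration; the message suggests the real perturbation lives in the SQL ORDER BY.

```python
import random

TRENDING_CAP = 18.0
GLOBAL_CAP = 50.0
JITTER = 8.0


def capped_signal_score(signal, raw_score):
    """Cap trending at 18.0, then cap every signal at 50.0."""
    if signal == "trending":
        raw_score = min(raw_score, TRENDING_CAP)
    return min(raw_score, GLOBAL_CAP)


def accumulate(existing_total, new_score):
    """Add a new run's score without letting the total exceed 50.0."""
    return min(existing_total + new_score, GLOBAL_CAP)


def display_order(channels, rng=None):
    """Order (channel_id, score) pairs by score plus a +/-8 jitter."""
    rng = rng or random.Random()
    jittered = [
        (score + rng.uniform(-JITTER, JITTER), channel_id)
        for channel_id, score in channels
    ]
    jittered.sort(reverse=True)
    return [channel_id for _, channel_id in jittered]
```

Because the jitter is bounded at 8, two channels can only swap places when their scores are within 16 points of each other, so clearly stronger candidates stay on top.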

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 01:37:09 +02:00
b6a47249d0 Fix search latency: bypass rate limiter for user-triggered searches
search_youtube now takes polite=False (default) for instant user
searches and polite=True for background discovery crawls.
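
One way the polite flag could gate a shared limiter is sketched below. `MinGapLimiter`, the `run_search` callable, and the 5-second gap are assumptions for illustration. Only the `polite` parameter and its default come from the message, and the real `search_youtube` signature may differ.

```python
import threading
import time


class MinGapLimiter:
    """Serialize callers and keep at least min_gap seconds between calls."""

    def __init__(self, min_gap, clock=time.monotonic, sleep=time.sleep):
        self._min_gap = min_gap
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_call = None

    def wait(self):
        with self._lock:
            now = self._clock()
            if self._last_call is not None:
                remaining = self._min_gap - (now - self._last_call)
                if remaining > 0:
                    self._sleep(remaining)
                    now = self._clock()
            self._last_call = now


_limiter = MinGapLimiter(min_gap=5.0)  # assumed gap


def search_youtube(query, run_search, polite=False, limiter=_limiter):
    """Run a search; only polite (background) callers wait on the limiter.

    run_search is a hypothetical callable that performs the actual
    yt-dlp search and returns its results.
    """
    if polite:
        limiter.wait()
    return run_search(query)
```

User-triggered searches call with the default and return immediately; background crawls pass polite=True and queue behind the limiter.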

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 01:27:36 +02:00
19dae63385 Route all discovery fetches through global rate limiter
- search_youtube, fetch_trending, fetch_featured_channels now use _meta_run
- Replaced ThreadPoolExecutor(4) parallel searches with sequential loop
- Replaced ThreadPoolExecutor(3) parallel featured-channel fetches with sequential
- _fetch_and_index_channel passes polite=True to fetch_channel/video_metadata

Discovery was firing 4+ simultaneous yt-dlp processes, each with cookies,
which is what invalidated the session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 01:05:56 +02:00
c3290d33a7 Reduce parallel YouTube request workers to avoid cookie invalidation
8 simultaneous yt-dlp processes hitting video pages looks like a bot
attack and causes YouTube to nuke the session cookies. Drop to:
- Popular fetch view_count enrichment: 8→3 workers
- Discovery search: 8→4 workers
- Graph signal (featured channels): 8→3 workers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 23:11:07 +02:00
2f37072187 Fix popular fetch and improve date/view_count coverage
Popular fetch now does a two-phase approach: fast flat-playlist to get
IDs in popularity order, then parallel full metadata fetch (8 workers)
to get real view_count and published_at for each video. Previously
flat-playlist mode returned timestamp/view_count as null.
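
The two-phase shape can be sketched with the yt-dlp calls abstracted behind two injected callables. Both `list_ids_flat` and `fetch_full_metadata` are hypothetical stand-ins, not the project's functions.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_popular(list_ids_flat, fetch_full_metadata, max_workers=8):
    """Two-phase popular fetch.

    Phase 1: list_ids_flat() returns video ids in popularity order
             (fast, but without view_count or published_at).
    Phase 2: fetch_full_metadata(video_id) is called in parallel to fill
             in the missing fields for each id.
    """
    video_ids = list_ids_flat()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the popularity
        # ranking from phase 1 is preserved.
        return list(pool.map(fetch_full_metadata, video_ids))
```

The default of 8 workers mirrors this commit; commit c3290d33a7 lowers it to 3 after the parallel requests caused cookie invalidation.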

Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.

Raise all thread pool sizes to match 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:36:18 +02:00
871f668525 Parallelize discovery searches and add graph signal
Run search queries concurrently (5 workers) instead of sequentially —
cuts crawl time dramatically. Add graph signal: fetch featured channels
from followed channels' /channels tab in parallel (4 workers), which
surfaces creator-curated recommendations as a high-signal, diverse pool
that search alone can't reach.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 21:59:23 +02:00
62c2c73906 Expand discovery pool and remove header logo
Double search results per query (20→40), increase query budget (15→25),
use more tags per signal (6→10-12), index more new channels per refresh
(5→10). Remove the YT logo from the header.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 21:55:52 +02:00
Mattias Tall
3e3d2c7464 Fix discovery to actually use negative affinity signals
Previously the engine was blind to dislikes/dismissals:
- _build_user_tag_profile only used liked/watched (positive only)
- dismiss_penalty was capped at 80% so hated content still surfaced
- _search_and_store had zero affinity filtering, any YouTube result entered the queue
- user_tag_affinity negative scores (written by dismiss/dislike) were never read

Now:
- _build_user_tag_profile reads directly from user_tag_affinity (positive + negative)
- _tag_relevance_score returns negative values, so disliked-tag channels score below zero and get dropped
- _search_and_store skips channels whose indexed videos match 3+ negatively-rated tags
- list_discovery post-filters channels already in the queue using the same neg-affinity check
- Removed the old _dismissed_channel_tags + dismiss_penalty (superseded)
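
The "3+ negatively-rated tags" rule can be illustrated like this. The function names and the dict-based affinity profile are assumptions; the real check reads from user_tag_affinity and may weight tags differently.

```python
NEGATIVE_TAG_THRESHOLD = 3


def has_negative_affinity(channel_tags, tag_affinity,
                          threshold=NEGATIVE_TAG_THRESHOLD):
    """True if the channel matches at least `threshold` disliked tags.

    tag_affinity maps tag -> score, where negative scores come from
    dismissals and dislikes. Unknown tags count as neutral.
    """
    negative_hits = sum(
        1 for tag in set(channel_tags) if tag_affinity.get(tag, 0.0) < 0
    )
    return negative_hits >= threshold


def filter_candidates(candidates, tag_affinity):
    """Keep only (channel_id, tags) candidates that pass the check."""
    return [
        channel_id
        for channel_id, tags in candidates
        if not has_negative_affinity(tags, tag_affinity)
    ]
```

Applying the same helper both when storing search results and when listing the queue keeps the two filters consistent, as the last two "Now" items describe.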

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:48:39 +02:00
inputnoise
1827dd6c4e Initial commit — YT Hub
Self-hosted personal YouTube management app.
FastAPI + SQLite backend, React + Vite + Tailwind frontend.
Dockerfiles and compose included for Portainer deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 20:09:04 +02:00