13 Commits

Author SHA1 Message Date
a0384b2277 Schedule auto-discovery at 4 AM daily instead of every 23 hours
Replaced the rolling 23-hour check with a fixed-time scheduler that sleeps
until the next 4:00 AM, runs discovery for all users, then sleeps until the
following 4 AM. No longer reads last_discovery_run — just runs at the same
time every day.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 02:59:41 +02:00
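
The fixed-time scheduling described in a0384b2277 could look roughly like the sketch below. It is not the project's actual code: the names `seconds_until_next_run` and `run_daily` are hypothetical, and it assumes naive local-time datetimes, so a DST change could shift one run by an hour.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now: datetime, hour: int = 4) -> float:
    """Return the seconds from `now` until the next hour:00 (strictly in the future)."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is exactly now): use tomorrow's.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour: int = 4, now=datetime.now, sleep=time.sleep) -> None:
    """Sleep until the next hour:00, run `job`, and repeat forever."""
    while True:
        sleep(seconds_until_next_run(now(), hour))
        job()
```

Because the delay is recomputed from the wall clock before every sleep, a slow discovery run does not push later runs away from 4:00 AM.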
a535e9f22a Add queue-based gradual discovery with shuffled call ordering and progress UI
Each yt-dlp call is now an independent task (one search query, one trending
fetch, one graph channel fetch). Tasks are shuffled together so we don't fire
10 searches in a row, then enqueued with 30-90s random gaps between them —
a full sweep of ~17 tasks completes in roughly 10-25 minutes instead of
hammering YouTube with 21 calls back-to-back.

Fast signals (community, category clusters) still run synchronously at
schedule time since they're pure SQL.

Progress is tracked per-user (total/done/running) and exposed on
GET /api/discovery/status. The Discovery page polls every 10s while
running and shows a progress bar + "Finding channels… X / Y" in the header.
The auto-discovery daemon skips scheduling if a manual sweep is already running.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 02:28:35 +02:00
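
As an illustration of the shuffled, gapped queue described in a535e9f22a, the sketch below builds a start-time plan for a list of tasks. The function name `plan_sweep` and the representation of a task as an opaque value are assumptions made for this example; only the shuffle and the 30-90s random gaps come from the commit message.

```python
import random


def plan_sweep(tasks, min_gap=30.0, max_gap=90.0, rng=None):
    """Shuffle tasks and assign each a start offset (seconds from sweep start).

    Consecutive tasks are separated by a random gap in [min_gap, max_gap].
    """
    rng = rng if rng is not None else random.Random()
    order = list(tasks)
    rng.shuffle(order)  # mix searches, trending and graph fetches together

    plan = []
    offset = 0.0
    for task in order:
        plan.append((offset, task))
        offset += rng.uniform(min_gap, max_gap)
    return plan
```

A worker could then sleep until each offset and run the matching yt-dlp call, updating the per-user total/done/running counters as it goes.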
12f54ac5b0 Auto-schedule daily discovery + fix Find More UX + expand query diversity
Auto-discovery daemon:
- Runs every hour, triggers full discovery for any user whose last run
  was >23 hours ago. First check is 5 minutes after startup.
- Tracks run time in user_settings.last_discovery_run (new column).
- Manual Find More also stamps last_discovery_run.

Discovery status endpoint (GET /api/discovery/status):
- Returns pending_count (unseen queue size) and last_run timestamp.
- Shown in the Discover page header so users know queue state at a glance.

Find More UX fix:
- Was: kick background task, wait 8 seconds, refetch (task takes minutes).
- Now: button shows "Queued ✓" on success with an explanatory banner
  telling the user it takes a few minutes and also runs daily automatically.

Query diversity:
- Added "best [category] channels" serendipity queries to crawl_by_search.
- Limit raised from 25 to 30 queries per run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 01:58:39 +02:00
bbf7cc939b Overhaul For You feed ranking and freshness
Ranking improvements:
- Wider candidate pool (4x limit) with ±12pt score perturbation so
  same-score videos shuffle differently each load
- Recent channel engagement signal: channels watched in past 30 days
  get a +4pts/watch boost
- Bail penalty: -25pts for videos started but abandoned before the 20% mark
- Impression penalty: -3pts per prior feed appearance (capped at 10),
  so repeatedly-skipped videos sink naturally
- rn cap raised to 5 for more candidates; Python-side sampling picks top limit

Feed UX:
- Reshuffle button now available on For You (ranked) mode, not just Explore
- shuffleKey now always included in query key (not just random mode)
- Ranked mode staleTime reduced from 10min to 90s

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 01:14:10 +02:00
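
The ranking adjustments listed in bbf7cc939b can be read as a simple additive score. The sketch below is one possible interpretation, not the project's implementation: `adjusted_score` and its parameters are hypothetical names, and it assumes the "capped at 10" limit applies to the number of counted impressions (so the impression penalty never exceeds 30 points).

```python
import random


def adjusted_score(base, recent_watches, bailed, impressions, jitter=12.0, rng=None):
    """Apply the engagement boost, bail penalty, impression penalty and jitter."""
    rng = rng if rng is not None else random.Random()
    score = float(base)
    score += 4 * recent_watches          # +4pts per watch in the past 30 days
    if bailed:
        score -= 25                      # started but abandoned early
    score -= 3 * min(impressions, 10)    # assumed: cap is on the impression count
    score += rng.uniform(-jitter, jitter)  # shuffles same-score videos per load
    return score


def pick_top(scored, limit):
    """Given (score, video_id) pairs, return the `limit` highest-scoring ids."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [video_id for _, video_id in ranked[:limit]]
```

Passing `jitter=0` makes the score deterministic, which is convenient for testing the penalties in isolation.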
112f87e764 Popular tab now shows only flagged popular videos in rank order
Add channel_popular_videos table (channel_id, video_id, rank).
_fetch_popular_task clears and rewrites this table after each fetch.
GET /channels/{id}/videos?sort=popular now JOINs this table and orders
by rank instead of view_count, so the tab shows exactly the videos
YouTube returned in popularity order — nothing more.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:38:53 +02:00
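
The clear-and-rewrite plus JOIN approach from 112f87e764 might be sketched as follows. The `channel_popular_videos` columns come from the commit message; the `videos` table layout and the function names are assumptions made so the example runs on its own.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a minimal, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE videos (id TEXT PRIMARY KEY, view_count INTEGER)")
    conn.execute(
        'CREATE TABLE channel_popular_videos '
        '(channel_id TEXT, video_id TEXT, "rank" INTEGER)'
    )
    return conn


def rewrite_popular(conn, channel_id, video_ids):
    """Replace the channel's popular list; rank follows the given order."""
    with conn:  # one transaction: readers never see a half-written list
        conn.execute(
            "DELETE FROM channel_popular_videos WHERE channel_id = ?", (channel_id,)
        )
        conn.executemany(
            'INSERT INTO channel_popular_videos (channel_id, video_id, "rank") '
            "VALUES (?, ?, ?)",
            [(channel_id, vid, rank) for rank, vid in enumerate(video_ids, start=1)],
        )


def popular_videos(conn, channel_id):
    """Return video ids for the channel in stored rank order."""
    rows = conn.execute(
        "SELECT v.id FROM videos v "
        "JOIN channel_popular_videos p ON p.video_id = v.id "
        'WHERE p.channel_id = ? ORDER BY p."rank"',
        (channel_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the query uses an inner JOIN, videos that are not in the popular list never appear, regardless of their `view_count`.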
2f37072187 Fix popular fetch and improve date/view_count coverage
Popular fetch now does a two-phase approach: fast flat-playlist to get
IDs in popularity order, then parallel full metadata fetch (8 workers)
to get real view_count and published_at for each video. Previously
flat-playlist mode returned timestamp/view_count as null.

Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.

Raise all thread pool sizes to match 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:36:18 +02:00
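
A minimal sketch of the two-phase popular fetch from 2f37072187, using only the standard library. The callables `list_ids` and `fetch_metadata` are hypothetical stand-ins for the project's yt-dlp wrappers (a fast flat-playlist listing and a full per-video metadata fetch); they are injected so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_popular(channel_id, list_ids, fetch_metadata, workers=8):
    """Phase 1: list ids in popularity order. Phase 2: fetch metadata in parallel.

    `list_ids(channel_id)` returns ids in popularity order.
    `fetch_metadata(video_id)` returns one metadata record per id.
    """
    video_ids = list_ids(channel_id)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so popularity order survives
        # even though the individual fetches finish out of order.
        return list(pool.map(fetch_metadata, video_ids))
```

Threads suit this workload because each fetch spends its time waiting on a subprocess or the network rather than on Python bytecode.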
5b0cf27f07 Add playlists support and fix explore older videos
- New playlists router: fetch channel playlists from YouTube, index
  playlist videos, browse by playlist with pagination
- Playlist model gets video_ids column to store ordered video list
- Register playlists router in main.py with DB migration
- Add Playlists tab to Channel page: grid of playlist cards, click to
  browse videos, index/re-index per playlist
- Fix explore older videos skipping all entries without published_at;
  flat-playlist entries for older videos rarely include timestamp data

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:28:35 +02:00
ea99b74ba8 Add scheduled sync, disk space awareness, and subtitle downloads
- auto-sync daemon: background thread checks every hour and syncs followed
  channels for users with sync_interval_hours set (6/12/24h options)
- disk stats: /api/stats now returns total/used/free/download bytes;
  Stats page shows a disk usage bar
- subtitles: subtitle_langs setting (e.g. "en,sv") passed through all
  download paths; yt-dlp writes .srt files alongside the video
- Settings page: sync interval dropdown + subtitle languages input

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 20:36:50 +02:00
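
For the disk stats added in ea99b74ba8, something along these lines would produce the four values. The function name `disk_stats` and the dictionary keys are assumptions; the commit only states that total, used, free and download bytes are returned.

```python
import os
import shutil


def disk_stats(download_dir):
    """Return filesystem totals plus the bytes used by files under download_dir."""
    usage = shutil.disk_usage(download_dir)  # named tuple: total, used, free

    download_bytes = 0
    for root, _dirs, files in os.walk(download_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):  # skip broken symlinks
                download_bytes += os.path.getsize(path)

    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "download_bytes": download_bytes,
    }
```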
Mattias Tall
1cd8645957 Fix YouTube hammering, sync rate limiting, and Following load time
Sync throttling:
- sync-all now skips channels crawled within the last 6 hours (prevents
  re-scraping 1266 channels on every button press)
- Channels are queued into a single _index_channels_batch task that runs
  with 1.5s delay between each yt-dlp call instead of firing 1266
  background tasks simultaneously
- Startup enrich task reduced from 10 to 3 videos (3 yt-dlp calls on
  each container restart)
- Enrich task adds 2s sleep between metadata fetches

SQLite stability:
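- busy_timeout=5000 makes a blocked connection wait up to 5s for the lock
  instead of failing immediately with SQLITE_BUSY under concurrent load
- synchronous=NORMAL speeds up writes without corruption risk in WAL mode
  (the most recent commits can still roll back after a power loss)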

Following page:
- staleTime: 60s on channels query so cached data is reused immediately
  on revisit; gcTime keeps it in memory for 5 min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:00:37 +02:00
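
The SQLite settings from 1cd8645957 could be applied per connection as sketched below. `open_db` is a hypothetical helper, and enabling WAL here is an assumption based on the "safe with WAL" remark; the project may set the journal mode elsewhere.

```python
import sqlite3


def open_db(path):
    """Open a SQLite connection with the pragmas named in the commit message."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")     # assumed: WAL is already in use
    conn.execute("PRAGMA busy_timeout=5000")    # wait up to 5s for a lock
    conn.execute("PRAGMA synchronous=NORMAL")   # fewer fsyncs; fine with WAL
    return conn
```

`busy_timeout` and `synchronous` are per-connection settings, so they need to be issued every time a connection is opened, while the WAL journal mode persists in the database file.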
Mattias Tall
98d986cd95 Fix cookie fallback breaking yt-dlp in Docker; add OAuth2 auth flow
- _cookie_args() no longer falls through to --cookies-from-browser when
  cookies_file is configured but missing. Firefox isn't installed in the
  Docker image, so that fallback caused yt-dlp to exit with empty stdout
  and every metadata fetch to return "Video not found on YouTube".
- fetch_video_metadata() now retries without auth args if the first call
  fails, so a broken cookie config can't block public video fetches.
- Add use_oauth2 setting + full device-auth flow (POST /settings/oauth2-init,
  GET /settings/oauth2-status) with OAuth2Section UI in Settings page.
- Add GET /settings/ytdlp-test diagnostics endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 09:53:02 +02:00
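
The cookie-argument fix in 98d986cd95 amounts to not falling back to the browser when a cookies file is configured but absent. The sketch below shows that decision in isolation; `cookie_args` and its parameters are hypothetical and may not match the real `_cookie_args()` signature. `--cookies` and `--cookies-from-browser` are existing yt-dlp options.

```python
from pathlib import Path


def cookie_args(cookies_file=None, browser=None):
    """Build yt-dlp cookie arguments without an unintended browser fallback."""
    if cookies_file:
        path = Path(cookies_file)
        if path.is_file():
            return ["--cookies", str(path)]
        # Configured but missing: return no auth args rather than trying a
        # browser that may not be installed (e.g. Firefox in a Docker image).
        return []
    if browser:
        return ["--cookies-from-browser", browser]
    return []
```

With this shape, a caller can still retry a failed fetch with an empty argument list, which matches the retry-without-auth behaviour the commit adds to `fetch_video_metadata()`.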
inputnoise
56dd5f8360 Add cookies file support for Docker; auto-detect /data/cookies.txt
2026-05-25 20:57:04 +02:00
inputnoise
7194ec45ec Remove ALLOW_REGISTRATION env var — managed via admin UI instead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 20:16:15 +02:00
inputnoise
1827dd6c4e Initial commit — YT Hub
Self-hosted personal YouTube management app.
FastAPI + SQLite backend, React + Vite + Tailwind frontend.
Dockerfiles and compose included for Portainer deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 20:09:04 +02:00