20 Commits

Author SHA1 Message Date
c3290d33a7 Reduce parallel YouTube request workers to avoid cookie invalidation
Eight simultaneous yt-dlp processes hitting video pages look like a bot
attack and cause YouTube to invalidate the session cookies. Drop to:
- Popular fetch view_count enrichment: 8→3 workers
- Discovery search: 8→4 workers
- Graph signal (featured channels): 8→3 workers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 23:11:07 +02:00
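
The lower concurrency limits can be pictured as bounded thread pools. This is a minimal sketch, not the project's code: the constant names and the `fetch_view_count` stand-in are hypothetical, and only the worker counts come from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor

# Worker limits from the commit message; the constant names are hypothetical.
POPULAR_ENRICH_WORKERS = 3
DISCOVERY_SEARCH_WORKERS = 4
GRAPH_SIGNAL_WORKERS = 3


def fetch_view_count(video_id: str) -> int:
    """Stand-in for a per-video yt-dlp metadata fetch; returns a fake count."""
    return len(video_id)


def enrich(video_ids: list[str], workers: int = POPULAR_ENRICH_WORKERS) -> dict[str, int]:
    # max_workers caps how many fetches are in flight at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(fetch_view_count, video_ids))
    return dict(zip(video_ids, counts))
```

Keeping the limits in named constants makes it easy to tune each call site separately if YouTube's tolerance changes.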
be7319e96c Sample videos randomly for view_count enrichment, not newest-first
Previously ORDER BY published_at DESC meant only the newest 200 videos
ever got view counts. Now ORDER BY RANDOM() spreads the 200 slots across
the full channel history — videos without a count are still prioritised,
but among those they're drawn randomly. Each run of Fetch Popular covers
a different slice, converging toward full coverage over time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 23:06:32 +02:00
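
A sketch of the sampling query described above, assuming a simplified `videos` table (the schema, sample rows, and the `pick_for_enrichment` helper are illustrative, not taken from the repository). Videos without a count sort first; `RANDOM()` then shuffles within each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (id TEXT PRIMARY KEY, channel_id TEXT, view_count INTEGER)"
)
# Odd-numbered videos have no view_count yet; even-numbered ones do.
rows = [(f"v{i}", "ch1", None if i % 2 else i * 10) for i in range(10)]
conn.executemany("INSERT INTO videos VALUES (?, ?, ?)", rows)


def pick_for_enrichment(conn, channel_id, limit=200):
    sql = """
        SELECT id FROM videos
        WHERE channel_id = ?
        -- (view_count IS NOT NULL) is 0 for missing counts, so those come first;
        -- RANDOM() breaks ties so each run covers a different slice.
        ORDER BY (view_count IS NOT NULL), RANDOM()
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (channel_id, limit))]


picked = pick_for_enrichment(conn, "ch1", limit=5)
```

With five slots and five videos missing a count, `picked` always contains exactly the uncounted videos, in a random order.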
6e455ed8ce Fetch popular: flat-playlist crawl then parallel view_count enrichment
Phase 1: crawl the full channel with flat-playlist to store any videos
not yet in DB (fast, no individual requests).
Phase 2: fetch real view_count for up to 200 channel videos in parallel
(8 workers), prioritising those missing a count.
Popular tab sorts all channel videos by view_count DESC NULLS LAST.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 23:05:21 +02:00
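
A small sketch of the Popular tab ordering on an in-memory SQLite table. The schema and rows are made up. The query uses `view_count IS NULL` as a leading sort key, which gives the same result as `DESC NULLS LAST` and also runs on SQLite builds older than 3.30 that lack the `NULLS LAST` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id TEXT, view_count INTEGER)")
conn.executemany(
    "INSERT INTO videos VALUES (?, ?)",
    [("a", 50), ("b", None), ("c", 900), ("d", None), ("e", 7)],
)

# "view_count IS NULL" is 0 for counted rows and 1 for uncounted rows,
# so counted rows come first, highest count first; id breaks ties.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM videos ORDER BY view_count IS NULL, view_count DESC, id"
    )
]
```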
ff4d8e4ab4 Popular tab: rank by real view_count, drop broken ?sort=p URL
yt-dlp's own test suite marks channel sort as 'Query for sorting no
longer works' — YouTube blocked it. New approach: fetch view_count for
up to 200 indexed videos in parallel (8 workers, prioritising those
missing counts), then Popular tab sorts by view_count DESC WHERE
view_count IS NOT NULL. Accurate for any channel once enrichment runs.
Frontend refetch wait raised to 60s to cover ~200 parallel fetches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 23:02:03 +02:00
3e699d61b6 Fix popular task failing silently when table doesn't exist
The outer try had no except — any exception (e.g. table missing) killed
the whole background task with no error visible to the user. Now:
- CREATE TABLE IF NOT EXISTS inline so the task works even if the
  startup migration hasn't run (no server restart required)
- Wrap DELETE in its own try/except
- Catch and print outer exceptions so failures appear in server logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:52:30 +02:00
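
The three fixes can be sketched together as follows. This is an illustration under assumptions, not the actual `_fetch_popular_task`: the function signature and return value are hypothetical, and the table columns follow the (channel_id, video_id, rank) layout named in the 112f87e764 entry.

```python
import sqlite3
import traceback


def fetch_popular_task(conn, channel_id, video_ids):
    """Rewrite the popular list for one channel; never fail silently."""
    try:
        # Create the table inline so the task works before any startup migration.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS channel_popular_videos "
            "(channel_id TEXT, video_id TEXT, rank INTEGER)"
        )
        try:
            conn.execute(
                "DELETE FROM channel_popular_videos WHERE channel_id = ?",
                (channel_id,),
            )
        except sqlite3.Error:
            traceback.print_exc()  # a failed cleanup should not abort the task
        conn.executemany(
            "INSERT INTO channel_popular_videos VALUES (?, ?, ?)",
            [(channel_id, vid, rank) for rank, vid in enumerate(video_ids, start=1)],
        )
        conn.commit()
        return True
    except Exception:
        traceback.print_exc()  # surface the failure in the server logs
        return False
```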
77cba81ef4 Popular: write Phase 1 immediately, enrich view_count in background
Previously the task waited for all 30 parallel metadata fetches before
writing anything to the DB (~30s). Now Phase 1 (flat-playlist IDs +
basic info) commits to channel_popular_videos immediately (~5s), so the
tab populates fast. Phase 2 (view_count + dates) runs in a daemon thread
while the user is already browsing.

Also: catch table-not-found errors in the sort=popular query so a cold
server returns [] instead of 500. Frontend refetch wait 35s→8s to match
the faster Phase 1 commit time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:47:42 +02:00
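
The "write first, enrich later" flow might look roughly like this. Everything here is a stand-in: `store` replaces the database, and `phase1_flat_listing` and `phase2_enrich` fake the yt-dlp calls. Only the shape (commit Phase 1, then hand Phase 2 to a daemon thread) reflects the commit message.

```python
import threading

store = {}  # stand-in for the channel_popular_videos table


def phase1_flat_listing(channel_id):
    """Fake flat-playlist crawl: returns video IDs in popularity order."""
    return ["v1", "v2", "v3"]


def phase2_enrich(video_ids):
    """Fake slow metadata fetch: fills in view_count for each video."""
    for vid in video_ids:
        store[vid]["view_count"] = 100


def fetch_popular(channel_id):
    ids = phase1_flat_listing(channel_id)
    # Phase 1: write rank and basic info right away so the tab can populate.
    for rank, vid in enumerate(ids, start=1):
        store[vid] = {"rank": rank, "view_count": None}
    # Phase 2: enrich in the background; a daemon thread never blocks shutdown.
    worker = threading.Thread(target=phase2_enrich, args=(ids,), daemon=True)
    worker.start()
    return worker
```

A caller that needs the enriched data (a test, for instance) can `join()` the returned thread; the request path would simply ignore it.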
112f87e764 Popular tab now shows only flagged popular videos in rank order
Add channel_popular_videos table (channel_id, video_id, rank).
_fetch_popular_task clears and rewrites this table after each fetch.
GET /channels/{id}/videos?sort=popular now JOINs this table and orders
by rank instead of view_count, so the tab shows exactly the videos
YouTube returned in popularity order — nothing more.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:38:53 +02:00
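
A sketch of the rank-ordered JOIN, using the table layout from the commit message. The `videos` schema, the sample rows, and the `popular_videos` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE videos (id TEXT PRIMARY KEY, channel_id TEXT, title TEXT, view_count INTEGER);
    CREATE TABLE channel_popular_videos (channel_id TEXT, video_id TEXT, rank INTEGER);
    INSERT INTO videos VALUES
        ('a', 'ch1', 'Alpha', 10), ('b', 'ch1', 'Beta', 999), ('c', 'ch1', 'Gamma', 500);
    -- Only 'c' and 'a' were returned by the popular fetch, in that order.
    INSERT INTO channel_popular_videos VALUES ('ch1', 'c', 1), ('ch1', 'a', 2);
    """
)


def popular_videos(conn, channel_id):
    sql = """
        SELECT v.id, v.title
        FROM videos v
        JOIN channel_popular_videos p
          ON p.video_id = v.id AND p.channel_id = v.channel_id
        WHERE v.channel_id = ?
        ORDER BY p.rank
    """
    return conn.execute(sql, (channel_id,)).fetchall()


result = popular_videos(conn, "ch1")
```

Note that 'Beta' has the highest stored view_count but is excluded, because the inner JOIN keeps only videos present in the rank table.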
2f37072187 Fix popular fetch and improve date/view_count coverage
Popular fetch now does a two-phase approach: fast flat-playlist to get
IDs in popularity order, then parallel full metadata fetch (8 workers)
to get real view_count and published_at for each video. Previously
flat-playlist mode returned timestamp/view_count as null.

Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.

Raise all thread pool sizes to match the 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:36:18 +02:00
5b0cf27f07 Add playlists support and fix explore older videos
- New playlists router: fetch channel playlists from YouTube, index
  playlist videos, browse by playlist with pagination
- Playlist model gets video_ids column to store ordered video list
- Register playlists router in main.py with DB migration
- Add Playlists tab to Channel page: grid of playlist cards, click to
  browse videos, index/re-index per playlist
- Fix explore older videos skipping all entries without published_at;
  flat-playlist entries for older videos rarely include timestamp data

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:28:35 +02:00
d31fc1ef7f Add Popular tab to channel page
- YouTube sort=p fetch: indexes top 100 most-viewed videos from a channel,
  storing view_count in the DB
- Popular tab on channel page shows videos sorted by view_count DESC
- Videos/Popular tab switcher with context-appropriate fetch buttons
- Expose view_count in VideoOut; add 'popular' sort to channel videos endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:22:10 +02:00
aa91156bbc Add older content exploration: channel page + home feed Rediscover mode
Channel page:
- "Explore older videos" button fetches 100 videos at a time further back
  in the channel history using yt-dlp --playlist-start/--playlist-end
- "Fetch entire history" still available for full crawl
- Backend: /channels/{id}/explore?page=N endpoint + playlist offset support
  in fetch_channel_metadata(start_video=N)

Home feed:
- New "Rediscover" mode: older unwatched videos (90+ days old) from
  followed channels, randomly sampled then re-ranked by tag affinity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:17:20 +02:00
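
The page-to-offset arithmetic behind "Explore older videos" can be sketched as below. The helper names are hypothetical, and treating `page` as 1-based is an assumption; the commit only states that each page covers 100 videos and that yt-dlp's `--playlist-start`/`--playlist-end` options are used.

```python
PAGE_SIZE = 100  # videos fetched per "Explore older videos" click


def playlist_window(page: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Map a 1-based page number to 1-based inclusive playlist positions."""
    if page < 1:
        raise ValueError("page must be >= 1")
    start = (page - 1) * page_size + 1
    end = page * page_size
    return start, end


def ytdlp_args(page: int) -> list[str]:
    """Build the yt-dlp arguments that select one page of a channel listing."""
    start, end = playlist_window(page)
    return [
        "--flat-playlist",
        "--playlist-start", str(start),
        "--playlist-end", str(end),
    ]
```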
0b482b5d49 Overhaul channel page: search, pagination, fetch all history
- Search bar filters indexed videos server-side; "Search YouTube" button
  triggers a deep channel search and indexes matching results
- Server-side sort (newest/oldest/A-Z/unwatched) + infinite scroll (60/page)
- "Fetch recent" indexes last 30, "Fetch all" indexes full history
- Auto-reindex on page visit if stale (>1h); the frontend refetches 8s after page load
- Add /channels/{id}/index-full endpoint (max_videos=0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:15:09 +02:00
50d61b5774 Fix crawled_at type error in get_channel
SQLite returns datetime columns as strings via raw text() queries.
Parse crawled_at safely before comparing against utcnow().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:04:35 +02:00
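
A sketch of the safe parsing step, assuming `crawled_at` arrives either as a `datetime`, as an ISO-like string such as "2026-05-26 22:04:35", or as NULL. The helper names are hypothetical, and the one-hour threshold comes from the d740fd5224 entry.

```python
from datetime import datetime, timedelta


def parse_crawled_at(value):
    """Return a datetime, or None if the value is missing or unparseable."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value
    try:
        # Accepts both "YYYY-MM-DD HH:MM:SS" and the variant with microseconds.
        return datetime.fromisoformat(str(value))
    except ValueError:
        return None


def is_stale(value, now, max_age=timedelta(hours=1)):
    """Treat unknown or unparseable timestamps as stale so they get re-crawled."""
    crawled = parse_crawled_at(value)
    return crawled is None or now - crawled > max_age
```

Passing `now` in explicitly keeps the comparison testable and keeps both sides naive (timezone-free), matching what a raw SQLite string yields.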
d740fd5224 Auto-reindex channel on page visit if stale
GET /channels/{id} now fires a background _index_channel_task if the
channel hasn't been crawled in the last hour. The frontend refetches
channel + videos 8s after page load to pick up the updated data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 22:02:59 +02:00
ea99b74ba8 Add scheduled sync, disk space awareness, and subtitle downloads
- auto-sync daemon: background thread checks every hour and syncs followed
  channels for users with sync_interval_hours set (6/12/24h options)
- disk stats: /api/stats now returns total/used/free/download bytes;
  Stats page shows a disk usage bar
- subtitles: subtitle_langs setting (e.g. "en,sv") passed through all
  download paths; yt-dlp writes .srt files alongside the video
- Settings page: sync interval dropdown + subtitle languages input

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 20:36:50 +02:00
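
The disk figures could be gathered along these lines. This is only a sketch: the dictionary keys and helper names are invented, and the real /api/stats payload may be shaped differently.

```python
import os
import shutil
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def disk_stats(download_dir):
    usage = shutil.disk_usage(download_dir)  # filesystem holding the downloads
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "download_bytes": dir_size_bytes(download_dir),
    }


# Demo on a throwaway directory containing one 5-byte file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "clip.bin"), "wb") as fh:
    fh.write(b"12345")
stats = disk_stats(demo_dir)
```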
Mattias Tall
c00d5c7595 Optimise Following page: 4 aggregated queries, no correlated subqueries
- Rewrite list_channels to run exactly 4 SQL queries regardless of channel
  count: channel rows, aggregated video stats (GROUP BY), new-video counts,
  and latest video (derived-table JOIN replaces per-row correlated subquery)
- Remove dead _CHANNEL_STATS_SELECT (orphaned after the rewrite)
- Fix upload_frequency_days: use pre-computed date_span_days from vstats
  instead of a broken per-channel db.execute() call
- Restrict new_counts query to id_csv so it uses idx_videos_channel_indexed
- markChannelsSeen: optimistic setQueryData instead of invalidateQueries,
  eliminating a full channel-list re-fetch on every Following page visit
- DownloadIndicator idle poll: 10s → 30s (no need to hit DB when idle)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:18:33 +02:00
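
One of the aggregated queries might look like the sketch below: a single GROUP BY pass that returns per-channel stats for every requested channel, instead of one subquery per row. The schema, sample data, and `video_stats` helper are assumptions; the real `list_channels` is not shown in this log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE videos (id TEXT PRIMARY KEY, channel_id TEXT, published_at TEXT);
    INSERT INTO videos VALUES
        ('a1', 'ch1', '2026-01-01'),
        ('a2', 'ch1', '2026-03-01'),
        ('b1', 'ch2', '2025-12-24');
    """
)


def video_stats(conn, channel_ids):
    """One aggregated query for all channels, keyed by channel_id."""
    if not channel_ids:
        return {}
    placeholders = ",".join("?" for _ in channel_ids)
    sql = f"""
        SELECT channel_id, COUNT(*), MAX(published_at)
        FROM videos
        WHERE channel_id IN ({placeholders})
        GROUP BY channel_id
    """
    return {
        cid: {"video_count": count, "latest_published_at": latest}
        for cid, count, latest in conn.execute(sql, list(channel_ids))
    }


stats = video_stats(conn, ["ch1", "ch2", "ch3"])
```

Channels with no videos are simply absent from the result, so the caller fills in zeros. SQLite builds older than 3.32 allow only 999 bound parameters by default, so a list of more than a thousand channel IDs would need chunking or another way of passing the IDs.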
Mattias Tall
1405acfaed Revert channel stats to correlated subqueries (CTE had a param binding bug)
The CTE approach returned 0 rows — likely a SQLite/SQLAlchemy interaction
with :user_id appearing in multiple CTEs. Reverted to the original
correlated-subquery form which is proven correct.

The 4 indexes added in the previous commit still apply and will make
the per-channel subqueries faster once the DB is indexed on startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:10:24 +02:00
Mattias Tall
74e9a52096 Fix Following page: replace 9-subquery-per-channel stats with 2 CTEs + indexes
The old _CHANNEL_STATS_SELECT ran 9 correlated subqueries for each
channel row. With 1266 channels that was ~11,400 (9 × 1266) sub-executions per
GET /channels request, causing multi-second (or timeout) delays.

New approach: 2 CTEs (vinfo for counts/sums, nc for new_count) each do
a single aggregated pass over all followed-channel videos, joined back
to channels. Only 2 correlated LIMIT-1 subqueries remain for
latest_video_id/title (fast with the new index).

Also adds 4 indexes on startup (IF NOT EXISTS — safe to deploy):
- videos(channel_id, published_at DESC)  — latest video lookups
- videos(channel_id, indexed_at)         — new_count filter
- user_videos(video_id, user_id)         — watch/download aggregation
- user_channels(user_id, status)         — followed channel filter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:04:41 +02:00
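
A sketch of idempotent index creation at startup. The index name `idx_videos_channel_indexed` appears in the c00d5c7595 entry; the other three names, the minimal table definitions, and `ensure_indexes` are made up for the example.

```python
import sqlite3

INDEX_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS idx_videos_channel_published "
    "ON videos(channel_id, published_at DESC)",
    "CREATE INDEX IF NOT EXISTS idx_videos_channel_indexed "
    "ON videos(channel_id, indexed_at)",
    "CREATE INDEX IF NOT EXISTS idx_user_videos_video_user "
    "ON user_videos(video_id, user_id)",
    "CREATE INDEX IF NOT EXISTS idx_user_channels_user_status "
    "ON user_channels(user_id, status)",
]


def ensure_indexes(conn):
    """Safe to call on every startup: existing indexes are left untouched."""
    for statement in INDEX_STATEMENTS:
        conn.execute(statement)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE videos (id TEXT, channel_id TEXT, published_at TEXT, indexed_at TEXT);
    CREATE TABLE user_videos (video_id TEXT, user_id INTEGER);
    CREATE TABLE user_channels (user_id INTEGER, channel_id TEXT, status TEXT);
    """
)
ensure_indexes(conn)
ensure_indexes(conn)  # second call is a no-op thanks to IF NOT EXISTS
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
```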
Mattias Tall
1cd8645957 Fix YouTube hammering, sync rate limiting, and Following load time
Sync throttling:
- sync-all now skips channels crawled within the last 6 hours (prevents
  re-scraping 1266 channels on every button press)
- Channels are queued into a single _index_channels_batch task that runs
  with 1.5s delay between each yt-dlp call instead of firing 1266
  background tasks simultaneously
- Startup enrich task reduced from 10 to 3 videos (3 yt-dlp calls on
  each container restart)
- Enrich task adds 2s sleep between metadata fetches

SQLite stability:
- busy_timeout=5000 prevents SQLITE_BUSY errors under concurrent load
- synchronous=NORMAL speeds up writes without corruption risk under WAL
  (a power loss may roll back the most recent commits)

Following page:
- staleTime: 60s on channels query so cached data is reused immediately
  on revisit; gcTime keeps it in memory for 5 min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 16:00:37 +02:00
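
Two of the changes above, sketched under assumptions: the SQLite pragmas applied to a connection, and a batch loop that skips recently crawled channels and pauses between the rest. The function names, parameters, and the injected `index_one`/`sleep` callables are hypothetical; the 5000 ms, 6 hour, and 1.5 s values come from the commit message.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timedelta


def configure_sqlite(conn):
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s on a locked DB
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; intended for WAL mode


def index_channels_batch(channels, now, index_one, sleep,
                         min_age=timedelta(hours=6), delay_s=1.5):
    """Index stale channels one at a time, pausing between yt-dlp calls."""
    indexed = []
    for channel_id, crawled_at in channels:
        if crawled_at is not None and now - crawled_at < min_age:
            continue  # crawled recently: skip
        if indexed:
            sleep(delay_s)  # throttle between consecutive fetches
        index_one(channel_id)
        indexed.append(channel_id)
    return indexed


# Demo: pragmas on a throwaway database file.
conn = sqlite3.connect(os.path.join(tempfile.mkdtemp(), "demo.db"))
configure_sqlite(conn)
busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1 means NORMAL

# Demo: record the pauses instead of actually sleeping.
now = datetime(2026, 5, 26, 16, 0, 0)
channels = [
    ("fresh", now - timedelta(hours=1)),
    ("stale", now - timedelta(hours=7)),
    ("never", None),
]
pauses = []
indexed = index_channels_batch(
    channels, now, index_one=lambda cid: None, sleep=pauses.append
)
```

Passing `time.sleep` as `sleep` gives the real throttled behaviour; the demo passes `pauses.append` so the loop can be checked without waiting.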
inputnoise
1827dd6c4e Initial commit — YT Hub
Self-hosted personal YouTube management app.
FastAPI + SQLite backend, React + Vite + Tailwind frontend.
Dockerfiles and compose included for Portainer deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 20:09:04 +02:00