Phase 1: crawl the full channel with flat-playlist to store any videos
not yet in the DB (fast, no individual requests).
Phase 2: fetch real view_count for up to 200 channel videos in parallel
(8 workers), prioritising those missing a count.
Popular tab sorts all channel videos by view_count DESC NULLS LAST.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp's own test suite marks channel sort as 'Query for sorting no
longer works' — YouTube blocked it. New approach: fetch view_count for
up to 200 indexed videos in parallel (8 workers, prioritising those
missing counts), then Popular tab sorts by view_count DESC WHERE
view_count IS NOT NULL. Accurate for any channel once enrichment runs.
Frontend refetch wait raised to 60s to cover ~200 parallel fetches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The outer try had no except — any exception (e.g. table missing) killed
the whole background task with no error visible to the user. Now:
- CREATE TABLE IF NOT EXISTS inline so the task works even if the
startup migration hasn't run (no server restart required)
- Wrap DELETE in its own try/except
- Catch and print outer exceptions so failures appear in server logs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
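A minimal sketch of the error-handling shape described above, assuming a plain sqlite3 connection rather than the app's SQLAlchemy session. `rewrite_popular` is a hypothetical name, and the column layout follows the channel_popular_videos description (channel_id, video_id, rank):

```python
import sqlite3
import traceback


def rewrite_popular(conn: sqlite3.Connection, channel_id: str, ranked_ids: list[str]) -> bool:
    """Hypothetical background task body; returns False instead of raising on failure."""
    try:
        # Inline CREATE so the task works even if the startup migration has not run yet.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS channel_popular_videos ("
            "channel_id TEXT NOT NULL, video_id TEXT NOT NULL, rank INTEGER NOT NULL)"
        )
        # The DELETE has its own guard: a failure is logged and the insert still runs.
        try:
            conn.execute("DELETE FROM channel_popular_videos WHERE channel_id = ?", (channel_id,))
        except sqlite3.Error as exc:
            print(f"[popular] clearing old rows failed for {channel_id}: {exc}")
        conn.executemany(
            "INSERT INTO channel_popular_videos (channel_id, video_id, rank) VALUES (?, ?, ?)",
            [(channel_id, vid, rank) for rank, vid in enumerate(ranked_ids, start=1)],
        )
        conn.commit()
        return True
    except Exception:
        # Outer handler: print the traceback so the failure shows up in the server logs.
        print(f"[popular] task failed for {channel_id}")
        traceback.print_exc()
        return False
```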
flat-playlist mode returns timestamp=null for most playlist entries so
published_at is missing after the initial index. Now kicks off
_enrich_missing_task (scoped to the playlist size) as a daemon thread
immediately after indexing commits, filling in dates and view counts
in the background via individual video fetches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the task waited for all 30 parallel metadata fetches before
writing anything to the DB (~30s). Now Phase 1 (flat-playlist IDs +
basic info) commits to channel_popular_videos immediately (~5s), so the
tab populates fast. Phase 2 (view_count + dates) runs in a daemon thread
while the user is already browsing.
Also: catch table-not-found errors in the sort=popular query so a cold
server returns [] instead of 500. Frontend refetch wait 35s→8s to match
the faster Phase 1 commit time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
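The cold-server fallback for sort=popular can be sketched as below, assuming direct sqlite3 access; through SQLAlchemy the same condition surfaces as `sqlalchemy.exc.OperationalError` wrapping the driver error. `popular_videos` and the selected columns are hypothetical:

```python
import sqlite3


def popular_videos(conn: sqlite3.Connection, channel_id: str) -> list[tuple]:
    """Hypothetical sort=popular query that returns [] when the table does not exist yet."""
    try:
        return conn.execute(
            "SELECT v.id, v.title FROM channel_popular_videos p "
            "JOIN videos v ON v.id = p.video_id "
            "WHERE p.channel_id = ? ORDER BY p.rank",
            (channel_id,),
        ).fetchall()
    except sqlite3.OperationalError as exc:
        # SQLite reports a missing table as OperationalError("no such table: ...").
        if "no such table" in str(exc):
            return []
        raise  # any other operational error is still a real failure
```

Only the missing-table case is swallowed, so genuine query errors still produce a 500.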
When yt-dlp returns no thumbnail for a playlist entry, fetch the
playlist's first video (max_videos=1) and derive a stable thumbnail
URL from its video ID. This now runs during the initial fetch as well
as on index (the index path already did this as of the previous commit).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_stable_thumbnail expects a video ID but was being passed a playlist ID
(PLxxx), producing a broken URL. Now picks the best thumbnail from
yt-dlp's thumbnails array, falling back to the singular thumbnail field.
Also backfills playlist.thumbnail_url from the first video when indexing
a playlist that still has no thumbnail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
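A sketch of the thumbnail selection described in the two commits above, assuming yt-dlp entries carry a `thumbnails` list of dicts with `url` and optional `width`/`height`. Both function names are hypothetical, and the `i.ytimg.com/vi/<id>/hqdefault.jpg` pattern is an assumption about what the original `_stable_thumbnail` builds:

```python
from typing import Optional


def best_thumbnail(entry: dict) -> Optional[str]:
    """Largest thumbnail URL from a yt-dlp entry, else the singular 'thumbnail' field."""
    thumbs = [t for t in entry.get("thumbnails") or [] if t.get("url")]
    if thumbs:
        # Iterate in reverse so ties (e.g. entries without width/height) resolve to the
        # later entry; yt-dlp generally lists thumbnails from worst to best.
        best = max(reversed(thumbs), key=lambda t: (t.get("width") or 0) * (t.get("height") or 0))
        return best["url"]
    return entry.get("thumbnail")


def stable_thumbnail(video_id: str) -> str:
    """Hypothetical helper: build the URL from a *video* ID, never a PL... playlist ID."""
    return f"https://i.ytimg.com/vi/{video_id}/hqdefault.jpg"
```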
Add channel_popular_videos table (channel_id, video_id, rank).
_fetch_popular_task clears and rewrites this table after each fetch.
GET /channels/{id}/videos?sort=popular now JOINs this table and orders
by rank instead of view_count, so the tab shows exactly the videos
YouTube returned in popularity order — nothing more.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Popular fetch now uses a two-phase approach: a fast flat-playlist pass to get
IDs in popularity order, then a parallel full-metadata fetch (8 workers)
to get real view_count and published_at for each video. Previously only
the flat-playlist pass ran, and it returns timestamp/view_count as null.
Enrich task now also backfills published_at and view_count (not just
description). Startup limit 3→50, enrichment sleep 2s→0.5s.
Raise all thread pool sizes to match 8-core machine:
- Discovery search: 5→8 workers
- Graph signal: 4→8 workers
- Popular fetch: 5→8 workers
- Download semaphore default 3→6, cap 10→16
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
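Phase 2 of the popular fetch (parallel metadata with 8 workers, keeping the phase-1 order) can be sketched with `concurrent.futures`. `fetch` is a hypothetical callable standing in for the per-video yt-dlp call; it returns a metadata dict or None:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def enrich_in_order(
    video_ids: list[str],
    fetch: Callable[[str], Optional[dict]],
    workers: int = 8,
) -> list[tuple[str, dict]]:
    """Fetch full metadata concurrently while preserving the input (popularity) order."""

    def safe_fetch(vid: str) -> Optional[dict]:
        try:
            return fetch(vid)
        except Exception as exc:  # one failed video should not sink the whole batch
            print(f"metadata fetch failed for {vid}: {exc}")
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whatever order the threads finish in.
        results = list(pool.map(safe_fetch, video_ids))
    return [(vid, meta) for vid, meta in zip(video_ids, results) if meta is not None]
```

Threads suit this workload because each fetch is I/O-bound (a subprocess or network call), so the GIL is not the bottleneck.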
- New playlists router: fetch channel playlists from YouTube, index
playlist videos, browse by playlist with pagination
- Playlist model gets video_ids column to store ordered video list
- Register playlists router in main.py with DB migration
- Add Playlists tab to Channel page: grid of playlist cards, click to
browse videos, index/re-index per playlist
- Fix "Explore older videos" skipping every entry without published_at;
flat-playlist entries for older videos rarely include timestamp data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- YouTube sort=p fetch: indexes top 100 most-viewed videos from a channel,
storing view_count in the DB
- Popular tab on channel page shows videos sorted by view_count DESC
- Videos/Popular tab switcher with context-appropriate fetch buttons
- Expose view_count in VideoOut; add 'popular' sort to channel videos endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Channel page:
- "Explore older videos" button fetches 100 videos at a time further back
in the channel history using yt-dlp --playlist-start/--playlist-end
- "Fetch entire history" still available for full crawl
- Backend: /channels/{id}/explore?page=N endpoint + playlist offset support
in fetch_channel_metadata(start_video=N)
Home feed:
- New "Rediscover" mode: older unwatched videos (90+ days old) from
followed channels, randomly sampled then re-ranked by tag affinity
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
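The page-to-offset mapping behind /channels/{id}/explore?page=N might look like this, assuming 0-based pages of 100 entries. `explore_page_args` is hypothetical; `--flat-playlist`, `--dump-json`, `--playlist-start` and `--playlist-end` are standard yt-dlp options:

```python
PAGE_SIZE = 100  # "100 videos at a time", per the commit message


def explore_page_args(channel_url: str, page: int, page_size: int = PAGE_SIZE) -> list[str]:
    """Build yt-dlp arguments for one 'explore older videos' page (page 0 = newest entries)."""
    if page < 0:
        raise ValueError("page must be >= 0")
    start = page * page_size + 1  # yt-dlp playlist indices are 1-based
    end = start + page_size - 1   # --playlist-end is inclusive
    return [
        "yt-dlp", "--flat-playlist", "--dump-json",
        "--playlist-start", str(start), "--playlist-end", str(end),
        channel_url,
    ]
```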
- Search bar filters indexed videos server-side; "Search YouTube" button
triggers a deep channel search and indexes matching results
- Server-side sort (newest/oldest/A-Z/unwatched) + infinite scroll (60/page)
- "Fetch recent" indexes last 30, "Fetch all" indexes full history
- Auto-reindex on page visit if stale (>1h); the frontend refetches after 8s
- Add /channels/{id}/index-full endpoint (max_videos=0)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SQLite returns datetime columns as strings via raw text() queries.
Parse crawled_at safely before comparing against utcnow().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
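A defensive parse for crawled_at, sketched under the assumption that the column holds naive UTC strings such as `2025-01-02 03:04:05.123456`. Both helpers are hypothetical, and `now` is passed in (for example `datetime.now(timezone.utc).replace(tzinfo=None)`, since `utcnow()` is deprecated as of Python 3.12):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union


def parse_db_datetime(value: Union[str, datetime, None]) -> Optional[datetime]:
    """Naive-UTC datetime from a column that raw text() queries may return as a string."""
    if value is None:
        return None
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value)  # accepts '2025-01-02 03:04:05.123456'
        except ValueError:
            return None
    if value.tzinfo is not None:
        # Normalise aware values so they compare cleanly with naive UTC timestamps.
        value = value.astimezone(timezone.utc).replace(tzinfo=None)
    return value


def is_stale(crawled_at, now: datetime, max_age: timedelta = timedelta(hours=1)) -> bool:
    """Never crawled, unparseable, or older than max_age all count as stale."""
    parsed = parse_db_datetime(crawled_at)
    return parsed is None or now - parsed > max_age
```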
GET /channels/{id} now fires a background _index_channel_task if the
channel hasn't been crawled in the last hour. The frontend refetches
channel + videos 8s after page load to pick up the updated data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run search queries concurrently (5 workers) instead of sequentially —
cuts crawl time dramatically. Add graph signal: fetch featured channels
from followed channels' /channels tab in parallel (4 workers), which
surfaces creator-curated recommendations as a high-signal, diverse pool
that search alone can't reach.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Double search results per query (20→40), increase query budget (15→25),
use more tags per signal (6→10-12), index more new channels per refresh
(5→10). Remove the YT logo from the header.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip yt-dlp's align:start position:0% cue settings from VTT files
after both video download and subtitle-only download so CSS ::cue centers them
- CC chip now shows already-downloaded langs (e.g. 'CC: en') directly
from disk with a '+' button to add more — no YouTube call needed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
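Stripping the cue settings can be sketched as a line-wise regex pass that only touches timing lines (those containing `-->`), so caption text that happens to contain "align:start" survives. `strip_cue_settings` is a hypothetical name:

```python
import re

# The two settings named in the commit; they left-align each cue, and removing them
# lets the default centred layout apply.
_CUE_SETTINGS = re.compile(r"[ \t]+(?:align:start|position:0%)(?=[ \t\r]|$)")


def strip_cue_settings(vtt: str) -> str:
    out = []
    for line in vtt.splitlines(keepends=True):
        if "-->" in line:  # only cue timing lines, never caption text
            line = _CUE_SETTINGS.sub("", line)
        out.append(line)
    return "".join(out)
```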
- download_subs_only(): yt-dlp --skip-download to fetch just .vtt sidecar
- POST /by-yt/{ytId}/download-subs endpoint
- CC chip now visible on downloaded videos; clicking checks YouTube,
shows lang picker with "Add subtitles" button separate from re-download
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Convert subs to .vtt (was .srt, which browsers don't support in <track>)
- Add GET /subtitle-files endpoint: instant disk scan for .vtt sidecar files,
no yt-dlp call needed
- Inject <track> elements into the video player for each .vtt on disk;
browser CC button appears automatically
- Before download: CC chip triggers YouTube availability check (slow, on demand)
- After download with subs: shows "CC ✓" — subtitles live in the player controls
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
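The instant disk scan behind /subtitle-files can be sketched as a directory listing plus a filename match, assuming yt-dlp's default sidecar naming `<video stem>.<lang>.vtt`. Both function names are hypothetical. Matching on a plain prefix rather than a glob avoids problems with the `[video-id]` brackets yt-dlp puts in filenames, which glob would treat as a character class:

```python
import os
from pathlib import Path


def langs_from_names(video_name: str, names: list[str]) -> list[str]:
    """Languages of '<stem>.<lang>.vtt' sidecars among a directory's filenames."""
    prefix = Path(video_name).stem + "."
    langs = []
    for name in sorted(names):
        if name.startswith(prefix) and name.endswith(".vtt"):
            lang = name[len(prefix):-len(".vtt")]
            if lang:
                langs.append(lang)
    return langs


def subtitle_langs_on_disk(video_path: str) -> list[str]:
    """No yt-dlp call: just list the folder next to the downloaded video."""
    folder = os.path.dirname(video_path) or "."
    return langs_from_names(os.path.basename(video_path), os.listdir(folder))
```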
- fetch_available_subs() queries yt-dlp for manual + auto-generated
subtitle langs available on YouTube for any given video
- GET /api/videos/by-yt/{ytId}/subs exposes this to the frontend
- DownloadRequest now accepts subtitle_langs to override the global
setting on a per-download basis
- Watch page fetches available subtitle langs on load (in parallel),
shows a CC dropdown with manual langs + auto-generated langs labeled
"(auto)"; selected lang is passed through to the download
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
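Turning a yt-dlp info dict (from `extract_info(url, download=False)`) into the CC dropdown options could look like this. `subtitles` and `automatic_captions` are yt-dlp's info-dict keys; `available_subs` and the option shape are hypothetical. `live_chat` is dropped because yt-dlp lists chat replay under `subtitles`:

```python
def available_subs(info: dict) -> list[dict]:
    """Manual languages first, then auto-generated ones labelled '(auto)'."""
    manual = sorted(k for k in (info.get("subtitles") or {}) if k != "live_chat")
    # A language with a manual track does not need a separate auto entry.
    auto = sorted(set(info.get("automatic_captions") or {}) - set(manual))
    options = [{"lang": lang, "label": lang, "auto": False} for lang in manual]
    options += [{"lang": lang, "label": f"{lang} (auto)", "auto": True} for lang in auto]
    return options
```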
- auto-sync daemon: background thread checks every hour and syncs followed
channels for users with sync_interval_hours set (6/12/24h options)
- disk stats: /api/stats now returns total/used/free/download bytes;
Stats page shows a disk usage bar
- subtitles: subtitle_langs setting (e.g. "en,sv") passed through all
download paths; yt-dlp writes .srt files alongside the video
- Settings page: sync interval dropdown + subtitle languages input
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Merger+ffmpeg faststart postprocessor arg was overwritten by the
subsequent embed-metadata and embed-thumbnail passes anyway, making it
a pointless extra ffmpeg remux. Dropped it and restored the embeds.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both flags (--embed-metadata and --embed-thumbnail) trigger extra ffmpeg
passes over the entire file after the stream merge. They're unnecessary:
metadata lives in the DB and thumbnails come from YouTube. Removing them
cuts the post-merge wait to just the faststart rewrite.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp 2026.03.17 dropped support for tv_embedded — it silently skips it
and falls back to web-only, which only exposes the pre-merged 360p format
(ID 18). The override was added to avoid SABR restrictions but is now the
cause of the low-quality downloads.
Removing --extractor-args restores yt-dlp's default client selection
(android_vr + web fallback) which exposes all formats up to 2160p.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The per-quality format strings fell back to best[height<=NNN] which on
YouTube resolves to pre-merged streams capped at ~360p, causing every
quality selector choice to silently download low-res video. Replace with
bestvideo+bestaudio as the intermediate fallback so adaptive streams are
always preferred over pre-merged ones.
Also fix detect_resolution to correctly label 1440p and 2160p files
instead of capping the display at 1080p.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
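A sketch of the format chain and resolution labels described above. The exact strings in the code are not shown in the commit, so `format_for_quality` and the final `/best` fallback are assumptions; `bestvideo`, `bestaudio`, `best`, the `[height<=N]` filter and `/` as the fallback separator are standard yt-dlp format-selection syntax. The label is derived from frame height only, so unusual aspect ratios may get a lower label:

```python
from typing import Optional


def format_for_quality(max_height: Optional[int]) -> str:
    """yt-dlp -f selector that tries adaptive streams before any pre-merged one."""
    if max_height is None:
        return "bestvideo+bestaudio/best"
    return (
        f"bestvideo[height<={max_height}]+bestaudio"  # adaptive, capped at the chosen quality
        "/bestvideo+bestaudio"                        # adaptive, any height (intermediate fallback)
        "/best"                                       # pre-merged stream only as a last resort
    )


_RESOLUTION_STEPS = ((2160, "2160p"), (1440, "1440p"), (1080, "1080p"),
                     (720, "720p"), (480, "480p"), (360, "360p"))


def detect_resolution(height: int) -> str:
    """Label by frame height without capping the label at 1080p."""
    for threshold, label in _RESOLUTION_STEPS:
        if height >= threshold:
            return label
    return f"{height}p"
```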
YouTube's web client gets SABR format restrictions in 2025-2026 yt-dlp,
limiting available streams and causing fallback to 360p. tv_embedded
bypasses SABR and exposes the full format list including 4K.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Most modern YouTube videos use VP9/AV1, so the old bestvideo[ext=mp4][vcodec^=avc1]
filter always failed and fell through to format codes 22/18 (720p/360p).
--merge-output-format mp4 handles the container; no need to restrict codec.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the engine was blind to dislikes/dismissals:
- _build_user_tag_profile only used liked/watched (positive only)
- dismiss_penalty was capped at 80% so hated content still surfaced
- _search_and_store had zero affinity filtering, any YouTube result entered the queue
- user_tag_affinity negative scores (written by dismiss/dislike) were never read
Now:
- _build_user_tag_profile reads directly from user_tag_affinity (positive + negative)
- _tag_relevance_score returns negative values, so disliked-tag channels score below zero and get dropped
- _search_and_store skips channels whose indexed videos match 3+ negatively-rated tags
- list_discovery post-filters channels already in the queue using the same neg-affinity check
- Removed the old _dismissed_channel_tags + dismiss_penalty (superseded)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
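The negative-affinity filtering can be sketched with a plain dict of tag to score standing in for the user_tag_affinity rows. `tag_relevance`, `is_blocked` and `filter_candidates` are hypothetical stand-ins for `_tag_relevance_score` and the neg-affinity check; the threshold of 3 comes from the commit:

```python
def tag_relevance(tags: list[str], affinity: dict[str, float]) -> float:
    """Sum of the user's affinity over a channel's tags; may be negative."""
    return sum(affinity.get(t, 0.0) for t in set(tags))


def is_blocked(tags: list[str], affinity: dict[str, float], threshold: int = 3) -> bool:
    """True when the channel's indexed tags hit `threshold`+ negatively rated tags."""
    return sum(1 for t in set(tags) if affinity.get(t, 0.0) < 0) >= threshold


def filter_candidates(candidates: dict[str, list[str]], affinity: dict[str, float]) -> list[str]:
    """Drop channels that are blocked outright or whose relevance falls below zero."""
    return [
        channel_id
        for channel_id, tags in candidates.items()
        if not is_blocked(tags, affinity) and tag_relevance(tags, affinity) >= 0
    ]
```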
- Default list view across all pages (Home, Following, History, Queue,
ContinueWatching, Liked, Discovery, SearchResults, Channel)
- Watch.jsx mobile: smaller chips/title/avatar/meta, hide tags + keyboard
hint on mobile, tighter gaps, compact description padding
- Fix mobile bottom nav showing focus outline on tap
- Fix _update_affinity to write negative entries (not just positive) so
dislikes/dismissals on unseen content actually register
- Dismissing a discovery video now fires -3.0 affinity against its tags,
matching the dislike weight
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ranked feed — affinity was broken:
- Was looking up user_tag_affinity by v.category (e.g. "Science & Technology")
but affinity is stored using fine-grained video tags ("linux", "rust", etc.)
- Now uses SUM across all matching affinities: category OR any tag found in the
video's tags JSON via instr() — up to 5 matches to prevent runaway scores
Ranked feed — completion rate now influences channel scoring:
- Added avg_completion_pct to channel_stats CTE (AVG of completion_percent)
- Channels where you finish videos score higher; channels you bail on score lower
- Defaults to 50% (neutral) for channels with no tracked completions
Progress endpoint — backend auto-watched safety net:
- If completion_percent reaches ≥90% on a video >60s, mark watched automatically
- Catches cases where browser closes before the 10s debounce fires
- Guards against double-calling _update_affinity with not prev_watched check
VideoPlayer — seamless local file switch:
- Removed switchedToLocal state which caused a race condition: video loaded with
local_file_url already set but flag was still false, requiring a page refresh
- local_file_url from the backend is the single source of truth (backend gates
it with os.path.exists so it only appears when the file is actually on disk)
- Show spinner while video metadata loads, then immediately show local player
if file exists — no YouTube flash for already-downloaded videos
- After download completes, single refetchVideo() picks up the new URL and
React re-renders directly into local player
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
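The backend auto-watched safety net reduces to a small predicate plus a guard so the affinity update fires once. The thresholds (≥90% completion, video longer than 60s) come from the commit; the function names and the dict-based state are hypothetical:

```python
from typing import Callable, Optional


def should_auto_mark_watched(completion_percent: float, duration_s: Optional[float],
                             prev_watched: bool) -> bool:
    """≥90% completion on a video longer than 60s, and not already marked watched."""
    return (
        not prev_watched
        and duration_s is not None
        and duration_s > 60
        and completion_percent >= 90
    )


def apply_progress(state: dict, completion_percent: float, duration_s: Optional[float],
                   on_watched: Callable[[], None]) -> None:
    prev_watched = state.get("watched", False)
    state["completion_percent"] = completion_percent
    if should_auto_mark_watched(completion_percent, duration_s, prev_watched):
        state["watched"] = True
        on_watched()  # e.g. the affinity update; the prev_watched check keeps it to one call
```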
- Rewrite list_channels to run exactly 4 SQL queries regardless of channel
count: channel rows, aggregated video stats (GROUP BY), new-video counts,
and latest video (derived-table JOIN replaces per-row correlated subquery)
- Remove dead _CHANNEL_STATS_SELECT (orphaned after the rewrite)
- Fix upload_frequency_days: use pre-computed date_span_days from vstats
instead of a broken per-channel db.execute() call
- Restrict new_counts query to id_csv so it uses idx_videos_channel_indexed
- markChannelsSeen: optimistic setQueryData instead of invalidateQueries,
eliminating a full channel-list re-fetch on every Following page visit
- DownloadIndicator idle poll: 10s → 30s (no need to hit DB when idle)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CTE approach returned 0 rows — likely a SQLite/SQLAlchemy interaction
with :user_id appearing in multiple CTEs. Reverted to the original
correlated-subquery form which is proven correct.
The 4 indexes added in the previous commit still apply and will make
the per-channel subqueries faster once they are created on startup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The old _CHANNEL_STATS_SELECT ran 9 correlated subqueries for each
channel row. With 1266 channels that was ~11,400 (9 × 1266) sub-executions
per GET /channels request, causing multi-second delays or timeouts.
New approach: 2 CTEs (vinfo for counts/sums, nc for new_count) each do
a single aggregated pass over all followed-channel videos, joined back
to channels. Only 2 correlated LIMIT-1 subqueries remain for
latest_video_id/title (fast with the new index).
Also adds 4 indexes on startup (IF NOT EXISTS — safe to deploy):
- videos(channel_id, published_at DESC) — latest video lookups
- videos(channel_id, indexed_at) — new_count filter
- user_videos(video_id, user_id) — watch/download aggregation
- user_channels(user_id, status) — followed channel filter
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sync throttling:
- sync-all now skips channels crawled within the last 6 hours (prevents
re-scraping 1266 channels on every button press)
- Channels are queued into a single _index_channels_batch task that runs
with 1.5s delay between each yt-dlp call instead of firing 1266
background tasks simultaneously
- Startup enrich task reduced from 10 to 3 videos (3 yt-dlp calls on
each container restart)
- Enrich task adds 2s sleep between metadata fetches
SQLite stability:
- busy_timeout=5000 prevents SQLITE_BUSY errors under concurrent load
- synchronous=NORMAL speeds up writes without risking corruption (safe with
WAL, though a power loss can roll back the most recent commits)
Following page:
- staleTime: 60s on channels query so cached data is reused immediately
on revisit; gcTime keeps it in memory for 5 min
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
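The SQLite settings above, sketched with the stdlib driver. With SQLAlchemy the same PRAGMAs would typically run in a `connect` event listener, because busy_timeout and synchronous are per-connection settings while journal_mode=WAL is persisted in the database file. `open_db` is hypothetical:

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # persistent; an in-memory DB reports 'memory'
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s for a lock instead of SQLITE_BUSY
    conn.execute("PRAGMA synchronous=NORMAL")  # in WAL mode, sync at checkpoints, not every commit
    return conn
```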
All inline SQL queries in the feed endpoint (chronological, random,
inbox, ranked scored CTE, and discovery injection) were missing
c.thumbnail_url AS channel_thumbnail_url — only _VIDEO_SELECT had it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reading from the channels query cache was unreliable (cache might not be
loaded, or channel not followed). Add c.thumbnail_url AS channel_thumbnail_url
to _VIDEO_SELECT so every video response carries its channel avatar directly.
VideoCard uses it with cache as fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
max_comments takes max-comments,max-parents,max-replies,max-replies-per-thread.
Passing just one value left the rest unset, which caused yt-dlp to fetch
only 1 comment. Now passes 20,20,0,0 to fetch 20 top-level comments
with no replies. Also switch --no-download to --skip-download.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--write-comments writes to .info.json reliably; parsing stdout with
--dump-json was never guaranteed to include comments. Use a TemporaryDirectory,
write the info.json there, read it, then let the context manager clean up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comments: switch from CLI --write-comments to yt-dlp Python API with
getcomments=True — more reliable, proper extractor_args dict format
Dislikes: add dislike_count column, fetch from returnyoutubedislike.com
after each video metadata upsert (5s timeout, non-fatal)
UI: replace emoji like count with a like/dislike ratio bar — blue fill
showing like proportion, labels on each end; views stay in meta row
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp separates extractor args with ; not ,. The malformed arg was
causing max_comments to parse as a garbage string, fetching ~1 comment.
Also swap max_comment_depth (not a real YouTube extractor arg) for
comment_sort=top to get highest-engagement comments first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
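Both forms of the comment options are sketched below: the CLI string (`;` between extractor args, `,` between one arg's values) and the equivalent Python API options, where `extractor_args` maps extractor to arg name to a list of string values. The helper names are hypothetical; `getcomments` and `skip_download` are YoutubeDL options:

```python
def youtube_extractor_args(max_comments: tuple[int, ...] = (20, 20, 0, 0),
                           comment_sort: str = "top") -> str:
    """Value for --extractor-args: ';' separates args, ',' separates one arg's values."""
    values = ",".join(str(n) for n in max_comments)
    return f"youtube:max_comments={values};comment_sort={comment_sort}"


def youtube_api_opts(max_comments: tuple[int, ...] = (20, 20, 0, 0),
                     comment_sort: str = "top") -> dict:
    """Equivalent options for yt_dlp.YoutubeDL(...)."""
    return {
        "skip_download": True,
        "getcomments": True,
        "extractor_args": {
            "youtube": {
                "max_comments": [str(n) for n in max_comments],
                "comment_sort": [comment_sort],
            }
        },
    }
```

Usage: pass the API options to `yt_dlp.YoutubeDL(opts)`, call `extract_info(url, download=False)`, and read the list under the `comments` key of the returned dict.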
Same pattern as view_count: model column, yt-dlp extraction, SQL select,
VideoDetail field, startup migration, and display in Watch meta row.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
create_all doesn't add columns to existing tables. Add _add_column_if_missing
helper that checks PRAGMA table_info and runs ALTER TABLE if needed, called
on every startup before FTS setup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
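A minimal version of the column-migration helper, assuming a stdlib sqlite3 connection. `add_column_if_missing` is a hypothetical stand-in for `_add_column_if_missing`. Identifiers cannot be bound as SQL parameters, so table and column names are validated before interpolation, and `ddl_type` is assumed to be a trusted constant from the code:

```python
import re
import sqlite3

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, ddl_type: str) -> bool:
    """ALTER TABLE ... ADD COLUMN only when PRAGMA table_info lacks the column."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    conn.commit()
    return True
```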
- Video model: view_count column (Integer, nullable)
- ytdlp._normalize_video: extract view_count from yt-dlp info
- _VIDEO_SELECT: include v.view_count in all queries
- VideoDetail schema: view_count field
- Watch page: formatViews() helper, show "X.XM views" in meta row
alongside date and category
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VideoComment model (video_id, author, text, likes, is_pinned, published_at)
- fetch_video_comments() in ytdlp.py: top 20 comments, no reply threads,
sorted pinned-first then by likes
- GET /videos/by-yt/{id}/comments — returns cached comments instantly
- POST /videos/by-yt/{id}/comments/refresh — fetches from YouTube, stores, returns
- Watch page: CommentsSection shows "Load comments" button when uncached,
renders comments with author/likes once loaded; Refresh link to re-fetch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
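The "top 20, no replies, pinned first then by likes" selection can be sketched over yt-dlp comment dicts, which carry `author`, `text`, `like_count`, `is_pinned`, `timestamp` and `parent` (`"root"` for top-level comments). `top_comments` and the output shape mirroring VideoComment are hypothetical:

```python
def top_comments(raw: list[dict], limit: int = 20) -> list[dict]:
    """Top-level comments only, pinned first, then by like count descending."""
    top_level = [c for c in raw if c.get("parent", "root") == "root"]
    # False sorts before True, so pinned comments (not pinned == False) come first.
    ranked = sorted(top_level, key=lambda c: (not c.get("is_pinned", False), -(c.get("like_count") or 0)))
    return [
        {
            "author": c.get("author"),
            "text": c.get("text"),
            "likes": c.get("like_count") or 0,
            "is_pinned": bool(c.get("is_pinned")),
            "published_at": c.get("timestamp"),
        }
        for c in ranked[:limit]
    ]
```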
- Backend: DELETE /stats/taste/{tag} removes the row from user_tag_affinity
- API: deleteTasteTag(tag) helper
- Stats UI: × button on each tag chip, faint by default, full opacity on hover;
invalidates stats query so the tag disappears immediately
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add bottom tab bar (Home/Following/Discover/Downloads/Settings) for mobile
- Fetch and display channel banner images on channel pages
- Fix ChannelCard: channels without a local DB id now follow+navigate on click
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yt-dlp's EJS (External JavaScript Solver) needs two things:
1. The solver scripts — only bundled with yt-dlp[default], not bare yt-dlp
2. An explicit --js-runtimes flag — Node.js is not the default (Deno is)
Both are now set: pip installs the [default] extras, and /etc/yt-dlp.conf
sets --js-runtimes node globally so every yt-dlp call uses it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nodeenv is a Python package that downloads a pre-built Node.js binary —
no apt repos, no compilation, guaranteed to work in python:3.12-slim.
The 'node' binary is linked into /usr/local/bin so yt-dlp can find it.
With Node.js available the web client works fully (37 formats) and can
solve YouTube's n-challenge that every other approach was failing on.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>