Compare commits

65 Commits

Author SHA1 Message Date
Melbar fa40821319 Update cutter report 2026-05-18 08:48:26 +02:00
Melbar 68ec775916 Auto-update cutter report 2026-05-09 19:06
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 19:06:02 +02:00
Melbar 3b42c5d018 Mark trailer title cards as graphics 2026-05-09 18:48:24 +02:00
Melbar f3c3a9cfd4 Auto-update cutter report 2026-05-09 18:46
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:46:52 +02:00
Melbar e966a4c321 Filter cached vision action windows 2026-05-09 18:30:13 +02:00
Melbar 45b5376cef Auto-update cutter report 2026-05-09 18:28
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:28:33 +02:00
Melbar 4b3894a812 Auto-update cutter report 2026-05-09 18:22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:22:35 +02:00
Melbar 3ad2b51e56 Auto-update cutter report 2026-05-09 18:04
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:04:24 +02:00
Melbar c16e46fb9d Auto-update cutter report 2026-05-09 18:03
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:03:14 +02:00
Melbar 8ca6d4b696 Auto-update cutter report 2026-05-09 18:02
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:02:11 +02:00
Melbar b771c6792b Auto-update cutter report 2026-05-09 18:01
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:01:01 +02:00
Melbar 6bf3ab6626 Auto-update cutter report 2026-05-09 17:59
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 17:59:15 +02:00
Melbar 9a5abd5312 Auto-update cutter report 2026-05-09 17:55
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 17:55:55 +02:00
Melbar b2abdafc7a Auto-update cutter report 2026-05-09 17:53
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 17:53:39 +02:00
Melbar 02e9fee982 Auto-update cutter report 2026-05-09 17:36
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 17:36:04 +02:00
Melbar 5425939a84 Auto-update cutter report 2026-05-09 17:29
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 17:29:01 +02:00
Melbar ed7b083dca Recover weak low-light matches via vision 2026-05-09 17:26:10 +02:00
Melbar ae3c2b1b13 Improve local phase retuning 2026-05-09 12:35:33 +02:00
Melbar 71117a8a3b Auto-update cutter report 2026-05-09 12:30
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 12:30:17 +02:00
Melbar c1425003c1 Normalize visible island segments 2026-05-09 11:29:07 +02:00
Melbar bcaf0417b3 Recover short low-light vibe matches 2026-05-09 10:38:57 +02:00
Melbar f63d65fcd2 Handle fade-led segment phase ties 2026-05-09 10:11:36 +02:00
Melbar c08ba97d37 Improve multi-shot phase retune 2026-05-09 09:36:11 +02:00
Melbar a275b2efb6 Retune weak multi-shot segment phases 2026-05-09 05:10:38 +02:00
Melbar fab6c53698 Remove legacy match report 2026-05-09 04:33:53 +02:00
Melbar c5b7d61451 Restore visible beat 14 cutter candidate 2026-05-09 04:31:14 +02:00
Melbar acafe538b2 Tighten cutter phase span validation 2026-05-08 14:56:44 +02:00
Melbar 10e27afc8d Make cutter report the only generated review report 2026-05-08 14:29:49 +02:00
Melbar e335fffe92 Mask timecode in phase refine and guard cutter scene starts 2026-05-08 14:18:27 +02:00
Melbar bdc9e4ab31 Clamp cutter clips to source scene start 2026-05-08 14:11:02 +02:00
Melbar 430a81a988 Constrain hi-res phase refine and update beat 14 2026-05-08 13:45:09 +02:00
Melbar 5611902eb5 Update cutter report for beat 14 compare clip 2026-05-08 13:21:35 +02:00
Melbar 4eeecca80d Fix cutter compare fallback for single-shot matches 2026-05-08 13:18:56 +02:00
Melbar 5407f08fbc Auto-update cutter report 2026-05-08 12:46
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:46:57 +02:00
Melbar 0baedb3a17 Auto-update cutter report 2026-05-08 12:22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:22:35 +02:00
Melbar d83fced8d2 Fix multi-shot matching: increase cut correlation threshold to properly segment multi-island beats 2026-05-08 12:16:09 +02:00
Melbar 4fe1d35f1a Fix multi-shot matching: Always use continuity seed for first island to prevent wrong scene jumps 2026-05-08 11:50:13 +02:00
Melbar 730b5ef3c0 Auto-update cutter report 2026-05-08 11:31
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 11:31:15 +02:00
Melbar f20f89b06b Add hi-res phase refinement for intra-scene phase matching (Beat 03 investigation) 2026-05-08 10:52:11 +02:00
Melbar 18c8c89ee6 Auto-update cutter report 2026-05-08 10:31
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:31:59 +02:00
Melbar 9b524c9329 Auto-update cutter report 2026-05-08 10:18
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:18:12 +02:00
Melbar 1e5ffffd91 Auto-update cutter report 2026-05-08 10:04
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:04:18 +02:00
Melbar 8fd0442724 Auto-update cutter report 2026-05-08 09:40
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:40:51 +02:00
Melbar 18a67387f6 Auto-update cutter report 2026-05-08 09:25
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:25:43 +02:00
Melbar 7ffe4adc3b Auto-update cutter report 2026-05-08 09:10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:10:33 +02:00
Melbar 92a12276ee Auto-update cutter report 2026-05-06 20:53
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:53:01 +02:00
Melbar 64b53c0e82 Auto-update cutter report 2026-05-06 20:28
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:28:39 +02:00
Melbar 8096f9b4d8 Auto-update cutter report 2026-05-06 20:10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:10:55 +02:00
Melbar e960b1c080 Auto-update cutter report 2026-05-06 19:40
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 19:40:19 +02:00
Melbar c972894972 Fix: prevent tail-trimming of valid matches at hard scene boundaries in global_scan.py 2026-05-06 19:06:33 +02:00
Melbar 72e22969b4 Auto-update cutter report 2026-05-06 19:00
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 19:00:08 +02:00
Melbar 9d3c5d5afd Auto-update cutter report 2026-05-06 18:47
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 18:47:39 +02:00
Melbar 0375580373 Auto-update cutter report 2026-05-06 18:33
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 18:33:17 +02:00
Melbar e3a4c22b71 Auto-update cutter report 2026-05-06 17:34
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 17:34:37 +02:00
Melbar c71ed2b701 Auto-update cutter report 2026-05-06 14:07
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 14:07:48 +02:00
Melbar 49412c54a6 Auto-update cutter report 2026-05-06 13:58
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 13:58:00 +02:00
Melbar 533ab49d62 Auto-update cutter report 2026-05-06 13:25
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 13:25:40 +02:00
Melbar f1e9636a83 Auto-update cutter report 2026-05-06 13:05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 13:05:24 +02:00
Melbar cd10e2bc03 Auto-update cutter report 2026-05-06 12:48
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:48:55 +02:00
Melbar c118428167 gitignore: exclude .code-workspace and .claude/ session data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:45:36 +02:00
Melbar 45769aa366 Refactor report pipeline: redesign HTML, add motion alignment, remove legacy reporter
- scripts/generate_cutter_report.py: complete HTML redesign with glassmorphism
  dark-mode style, compare video links in markdown output
- cli.py: cmd_report now calls _regenerate_cutter_report directly; also writes
  legacy match_report.html; removes dependency on src/pipeline/reporter.py
- src/cv/global_scan.py: add motion-phase alignment refinement step after
  initial in-point search (align_in_point_by_motion, threshold +0.015)
- Remove HANDOVER.md and src/pipeline/reporter.py (superseded)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:44:10 +02:00
Melbar 3b90905d07 Auto-update cutter report 2026-05-06 12:30
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 12:30:03 +02:00
Melbar 07f47ebe2b Auto-update cutter report 2026-05-06 10:52
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:52:14 +02:00
Melbar 2f8b0585e2 Auto-update cutter report 2026-05-06 10:44
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:44:20 +02:00
Melbar d287952572 Auto-update cutter report 2026-05-06 10:29
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:29:06 +02:00
143 changed files with 1930 additions and 1998 deletions
+6
@@ -0,0 +1,6 @@
* text=auto
.gitattributes text eol=lf
*.py text eol=lf
*.md text eol=lf
*.html text eol=lf
*.ps1 text eol=crlf
+5 -1
@@ -30,10 +30,14 @@ proxy/
 *.jpeg
 *.png
-# IDE
+# IDE / editor
 .vscode/
 .idea/
 *.swp
+*.code-workspace
+# Claude Code session data
+.claude/
 # OS
 .DS_Store
+114 -69
File diff suppressed because one or more lines are too long
+102 -54
File diff suppressed because one or more lines are too long
-125
@@ -1,125 +0,0 @@
# Handover Notes
Status as of 2026-05-03 (Beat 20 repair completed).
## Current state
- `pytest tests/ -q` → 52/52 green.
- `python cli.py match --beat 20 --vision` runs to completion and writes
a confirmed match (score 0.6632, scene 613, in=5284.706s, dur=0.88s).
- The previous cache was backed up to `.cache/match_results.json.bak`.
- No open PR; local changes are committed (see the latest commit).
## What changed most recently, and why
### 1. `cli.py`: `realign_window` picks the action window per segment
In `_filter_semantically_invalid_vision_matches.realign_window`:
- **Before:** `find_action_window_in_scene(action_beat or check_beat, …)`: for
segmented beats, the whole beat was always used as the semantic context.
For Beat 20 this placed the source position on the kiss phase
(5270 s), even though the *visible* segment only shows "approaching and pulling
apart"; in the source, that phase only starts around 5284 s.
- **Now:** Two windows are searched (segment description *and* beat
description). The beat context wins only with a clear score lead (>0.06).
The trailer offset shift (`visible_content_offset`) is applied only
if the beat context was actually used; otherwise the
segment window already points at the correct phase.
Effect for Beat 20: 5270.118 → 5284.706, score 0.6449 (provisional) → 0.6632
(confirmed).
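The selection rule in section 1 can be sketched roughly as follows. This is an illustration only, not the code in `cli.py`: the `Window` type and the `pick_window` name are hypothetical, the fallback to the beat window when no segment window exists is an assumption, and so is the direction in which `visible_content_offset` is applied. Only the 0.06 margin and the "shift only when the beat context wins" rule come from the notes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# From the notes: the beat context must lead by more than this margin.
BEAT_CONTEXT_MARGIN = 0.06


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in for one action-window search result."""
    start_s: float
    score: float


def pick_window(
    segment_window: Optional[Window],
    beat_window: Optional[Window],
    visible_content_offset: float,
) -> Optional[Tuple[float, str]]:
    """Return (source start in seconds, context used), or None if nothing was found."""
    if segment_window is None and beat_window is None:
        return None
    use_beat = beat_window is not None and (
        segment_window is None  # assumption: fall back to the beat window
        or beat_window.score > segment_window.score + BEAT_CONTEXT_MARGIN
    )
    if use_beat:
        # The beat window describes the whole beat, so shift to the visible part.
        # Adding the offset is an assumption about the sign convention.
        return beat_window.start_s + visible_content_offset, "beat"
    # The segment window already points at the visible phase: no shift.
    return segment_window.start_s, "segment"
```

With a segment window that scores within 0.06 of the beat window, the segment start is returned unshifted, which matches the Beat 20 behavior described above.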
### 2. `cli.py`: filter/repair stage is crash-tolerant
`_filter_semantically_invalid_vision_matches` now has its per-result body
extracted into a local function `_filter_repair_one`, wrapped in a try/except.
If the repair aborts (e.g. because the vision API drops out in the middle of
a response), the previously cached match is kept instead of being discarded
entirely.
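A minimal sketch of the "keep the cached match on failure" pattern, assuming a per-result repair callable. The `repair_all` name and signature are hypothetical; the real `_filter_repair_one` takes many more arguments and may also drop results, which this sketch does not model.

```python
import logging
from typing import Callable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def repair_all(results: List[T], repair_one: Callable[[T], T]) -> List[T]:
    """Apply repair_one to every result; keep the cached result if it raises."""
    repaired: List[T] = []
    for result in results:
        try:
            repaired.append(repair_one(result))
        except Exception as exc:  # e.g. the vision API dropping mid-response
            log.warning("Repair failed, keeping cached result: %s", exc)
            repaired.append(result)
    return repaired
```

The point of the design is that one failing beat cannot wipe out matches that were already in the cache.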
### 3. `src/llm/vision_cache.py`: vision retry for read errors
`_call_vision_model` now additionally catches `TimeoutError`,
`socket.timeout`, `ConnectionError`, and `OSError` while reading the response
and retries with the same backoff as for HTTP/URL errors. The trigger
was a 24-hour DSL disconnect in the middle of a run; before this change the
match run was aborted hard and the cache was left at "no match".
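The retry behavior could look roughly like the sketch below. The `call_with_retry` helper, its parameters, and the exponential backoff schedule are assumptions for illustration; the notes only say that the same backoff as for HTTP/URL errors is reused, without stating what that schedule is.

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Exception types named in the notes. TimeoutError and ConnectionError are
# subclasses of OSError, and socket.timeout is an alias of TimeoutError on
# Python 3.10+, so the tuple is redundant but mirrors the description.
RETRYABLE = (TimeoutError, socket.timeout, ConnectionError, OSError)


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait and try again, up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return call()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to keep
            sleep(base_delay_s * (2 ** attempt))  # assumed exponential backoff
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.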
### 4. `README.md`
Added two short paragraphs describing (1) the segment-vs-beat window selection
and (2) the new crash/network-error behavior.
## Not touched, but relevant for the handover
- The **full FFmpeg scan** still yields no confirmed match for Beat 20
(final score 0.419 < provisional 0.430). The
confirmed match comes from the action-window repair. This is expected:
the visible segment is visually very generic (two-shot in profile with a
blurred background), and the correct phase only stands out through the
semantic action description.
- The `candidate_points` loop in `realign_window` (lines ~700-765) searches
only ±~2 s around `start_s`. As long as `start_s` now comes from the segment
window, the correct source point lies within this range. If
beats with longer visible islands show up in the future, this range may
become too narrow; in that case, widen the search radius rather than
reverting the window picking.
- There are **no tests** for `_filter_semantically_invalid_vision_matches`
or `realign_window`. Anyone touching them should use Beat 20 as a live
smoke test (see below).
## Reproduction / smoke test
```powershell
.\.venv\Scripts\Activate.ps1
python cli.py match --beat 20 --vision
```
Expected: `Beat 20: realigned semantically valid long scene by motion/action
windows`, followed by `is_confirmed: true` for Beat 20 in
`.cache/match_results.json` with `in_point_s ≈ 5284.7` and `match_score ≥ 0.65`.
If that fails:
1. `python -m pytest tests/ -q`: if red, the codebase itself is broken.
2. Check `.cache/vision_descriptions.json`: the keys
`beat:20:73.560:74.680:…` and `action_window:613:5282.390:5285.430:…` must
exist; otherwise vision is called live (costs credits; needs network).
3. Restore `match_results.json.bak` if the cache is corrupted.
## Current coverage (before the latest run)
```
total beats: 25
matched: 20 (5 confirmed, 15 provisional)
unmatched: beats 0, 2, 21, 23, 24
```
Beat 0 is the SHO logo (no source match possible, correct).
Beats 22/23/24 have no visible islands (end credits/title), so they are also
correctly unmatched.
Beat 2 and Beat 21 are the real recovery candidates; the new
recovery stage tries to pick them up on the next `match` run.
## Open risks / known weaknesses
- The `0.06` threshold for "beat context wins" in `realign_window` is
calibrated on Beat 20. The other beats should also be run before
any further beats are touched, ideally a full `python cli.py match`
without `--beat` and a diff of `match_results.json` against `.bak`.
- The filter/repair stage can run for minutes because of vision calls. This
is not new, but it is very noticeable when the network is unreliable.
- The `_filter_repair_one` function gets many arguments passed through
(closure variables from the parent). In a future iteration this could be
refactored into a small class.
## Useful greps
- `find_action_window_in_scene`: semantic action-window search (vision).
- `_reference_scoreable_segments`: determines the visible islands of a
beat.
- `estimate_usable_source_duration`: shortens match clips when the source
switches to a different phase before the beat ends.
- `_filter_semantically_invalid_vision_matches`: entry point of the
repair stage in `cli.py`.
+10 -2
@@ -36,6 +36,10 @@ What you get are two files that you work with:
 5. For `MAN.` beats, find the matching spot in the feature film yourself; the
 description in the report tells you what to look for.
+For visual review, **`CUTTER_REPORT.html`** is also relevant:
+it contains the frame-locked compare clips. The old `match_report.html` is
+no longer part of the workflow.
 Everything else below is background for the tool maintainer.
 ---
@@ -48,7 +52,7 @@ Everything else below is background for the tool maintainer.
 | **1** | Quick vibe check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
 | **2** | Optional: a vision LLM describes uncertain scenes using 3-frame samples; the descriptions are cached. |
 | **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
-| **4** | Phase repair: for segmented beats, the motion phase in the source is aligned with the visible trailer phase. |
+| **4** | Phase repair: for segmented beats, the motion phase is aligned with the visible trailer phase locally around the found in-point, weighted by saliency and motion. |
 | **5** | Recovery: beats without a match are retried via vision phase search in the top-K scenes. |
 | **6** | Export as FCPXML 1.10 or CMX 3600 EDL plus `CUTTER_REPORT.md`. |
@@ -56,6 +60,10 @@ Everything else below is background for the tool maintainer.
 excluded from the comparison so that title cards, logos, and letterbox bars do not
 skew the matches.
+**Cutter report caching:** Existing compare clips are reused.
+For targeted rematches, only the affected beat is re-rendered, so the
+report stays up to date quickly and no unnecessary video artifacts are regenerated.
 **Important:** Even when vision is enabled, the final match remains
 CV-verified. The LLM only provides additional search anchors.
@@ -159,7 +167,7 @@ if the underlying match has changed.
 | Source clip shows the right scene but the wrong motion phase | `python cli.py rematch --beat N --refine`: shifts the in-point frame-accurately based on image content. |
 | Score too low, a different scene would be correct | `python cli.py match --beat N --vision`: full re-match for this beat only, with vision phase check. |
 | Match is obviously the wrong scene | `python cli.py rematch --beat N --threshold 0.50`: lowers the threshold, new global scan for this beat only. |
-| Beat is a black frame / logo / title and should not match at all | do nothing; the status `MAN.` in `CUTTER_REPORT.md` is correct. |
+| Beat is a black frame / logo / title and should not match at all | do nothing; the status `GFX` in `CUTTER_REPORT.md` is correct. |
 ### Algorithmic details
+676 -48
@@ -104,10 +104,6 @@ def _auto_commit_push_reports(project_root: "Path") -> None: # type: ignore[nam
     report_globs = [
         "CUTTER_REPORT.html",
         "CUTTER_REPORT.md",
-        "output/report/match_report.html",
-        "output/report/beat_*_compare.mp4",
-        "output/report/beat_*_src.mp4",
-        "output/report/beat_*_ref.mp4",
         "output/cutter_clips/beat_*_compare.mp4",
         "output/cutter_clips/beat_*_source.mp4",
         "output/cutter_clips/beat_*_source_seg*.mp4",
@@ -135,7 +131,7 @@ def _auto_commit_push_reports(project_root: "Path") -> None: # type: ignore[nam
         log.warning("Auto-commit/push failed (non-fatal): %s", exc)
-def _regenerate_cutter_report(cfg: "AppConfig") -> None: # type: ignore[name-defined]
+def _regenerate_cutter_report(cfg: "AppConfig", force_beats: set[int] | None = None) -> None: # type: ignore[name-defined]
     """Re-render CUTTER_REPORT.{md,html} with Frame-Locked Compare clips.
     Called from every match-style command after the cache is written so all
@@ -145,10 +141,22 @@ def _regenerate_cutter_report(cfg: "AppConfig") -> None: # type: ignore[name-de
     """
     project_root = cfg.paths.cache_dir.parent
     try:
+        import os
         from scripts.generate_cutter_report import render_report
-        md, html = render_report(project_root, with_stills=True, with_clips=True)
+        old_force = os.environ.get("CUTTER_REPORT_FORCE_BEATS")
+        try:
+            if force_beats:
+                os.environ["CUTTER_REPORT_FORCE_BEATS"] = ",".join(str(b) for b in sorted(force_beats))
+            md, html = render_report(project_root, with_stills=True, with_clips=True)
+        finally:
+            if force_beats:
+                if old_force is None:
+                    os.environ.pop("CUTTER_REPORT_FORCE_BEATS", None)
+                else:
+                    os.environ["CUTTER_REPORT_FORCE_BEATS"] = old_force
         (project_root / "CUTTER_REPORT.md").write_text(md, encoding="utf-8")
         (project_root / "CUTTER_REPORT.html").write_text(html, encoding="utf-8")
         logging.getLogger(__name__).info("Cutter report regenerated (md + html + compare clips)")
     except Exception as exc:
         logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
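The set-then-restore handling of `CUTTER_REPORT_FORCE_BEATS` in this hunk follows a common pattern that can be isolated in a small context manager. The sketch below is an illustration of that pattern, not a proposed change to the commit; `temporary_env` and `force_beats_value` are hypothetical names.

```python
import os
from contextlib import contextmanager
from typing import Iterable, Iterator, Optional


def force_beats_value(force_beats: Iterable[int]) -> str:
    """Encode beat ids the way the hunk does: sorted and comma-separated."""
    return ",".join(str(b) for b in sorted(force_beats))


@contextmanager
def temporary_env(name: str, value: Optional[str]) -> Iterator[None]:
    """Set os.environ[name] inside the block, then restore the previous state."""
    if value is None:
        yield  # nothing to override, leave the environment untouched
        return
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # the variable did not exist before
        else:
            os.environ[name] = previous
```

Wrapping the `render_report` call in `with temporary_env("CUTTER_REPORT_FORCE_BEATS", force_beats_value({14, 3})):` would expose the value `3,14` only for the duration of the render.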
@@ -273,9 +281,57 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
     for result in results:
         beat = beats_by_id.get(result.beat_id)
         if getattr(result, "segments", ()):
-            segment_duration = sum(max(0.0, float(s.duration_s)) for s in result.segments)
+            segment_threshold = cfg.cv.deep_scan.multi_shot_segment_threshold
+            current_islands = _reference_scoreable_segments(beat, cfg) if beat is not None else []
+            repaired_segments = []
+            source_segments = list(result.segments)
+            if beat is not None and len(source_segments) == 1 and len(current_islands) == 1:
+                island_start_s, island_end_s = current_islands[0]
+                island_duration_s = max(0.0, island_end_s - island_start_s)
+                segment = source_segments[0]
+                if (
+                    abs(float(segment.trailer_offset_s) - island_start_s) > 0.04
+                    or abs(float(segment.duration_s) - island_duration_s) > 0.08
+                ):
+                    from dataclasses import replace as _replace
+                    source_segments[0] = _replace(
+                        segment,
+                        trailer_offset_s=island_start_s,
+                        duration_s=island_duration_s,
+                        out_point_s=float(segment.in_point_s) + island_duration_s,
+                    )
+            for segment in source_segments:
+                if float(segment.match_score) < segment_threshold:
+                    scene = _scene_by_id_light(scenes, segment.scene_id)
+                    if beat is not None and scene is not None:
+                        segment_beat = replace(
+                            beat,
+                            start_s=beat.start_s + float(segment.trailer_offset_s),
+                            end_s=beat.start_s + float(segment.trailer_offset_s) + float(segment.duration_s),
+                        )
+                        probe = _phase_probe_segment_in_scene(
+                            segment_beat,
+                            scene,
+                            float(segment.in_point_s),
+                            cfg,
+                        )
+                        if probe is not None:
+                            in_point_s, _phase_score = probe
+                            segment = replace(
+                                segment,
+                                in_point_s=in_point_s,
+                                out_point_s=in_point_s + float(segment.duration_s),
+                                match_score=max(float(segment.match_score), float(_phase_score)),
+                                is_confirmed=float(_phase_score) >= cfg.cv.deep_scan.match_threshold,
+                            )
+                repaired_segments.append(segment)
+            valid_segments = tuple(repaired_segments)
+            if not valid_segments:
+                continue
+            segment_duration = sum(max(0.0, float(s.duration_s)) for s in valid_segments)
             weighted_score = (
-                sum(max(0.0, float(s.duration_s)) * float(s.match_score) for s in result.segments)
+                sum(max(0.0, float(s.duration_s)) * float(s.match_score) for s in valid_segments)
                 / segment_duration
                 if segment_duration > 0 else result.match_score
             )
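The `weighted_score` expression in this hunk is a duration-weighted mean of the per-segment scores, with a fallback when the total duration is zero. As a standalone sketch (the function name and the plain `(duration_s, match_score)` tuples are illustrative assumptions, not the project's segment type):

```python
from typing import Iterable, Tuple


def duration_weighted_score(
    segments: Iterable[Tuple[float, float]],
    fallback: float,
) -> float:
    """Mean of match scores weighted by segment duration.

    `segments` holds (duration_s, match_score) pairs. Negative durations are
    clamped to zero, as in the hunk; if no duration remains, `fallback` is
    returned (the hunk falls back to result.match_score).
    """
    pairs = [(max(0.0, float(d)), float(s)) for d, s in segments]
    total = sum(d for d, _ in pairs)
    if total <= 0:
        return fallback
    return sum(d * s for d, s in pairs) / total
```

A 1 s segment scoring 0.5 and a 3 s segment scoring 0.9 combine to (0.5 + 2.7) / 4 = 0.8, so longer segments dominate the result.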
@@ -290,7 +346,15 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
             coverage = segment_duration / coverage_target
             if coverage < cfg.cv.deep_scan.min_duration_coverage:
                 continue
-            normalized.append(replace(result, match_score=weighted_score))
+            first_segment = valid_segments[0]
+            normalized.append(replace(
+                result,
+                scene_id=first_segment.scene_id,
+                in_point_s=first_segment.in_point_s,
+                out_point_s=first_segment.out_point_s,
+                match_score=weighted_score,
+                segments=valid_segments,
+            ))
             continue
         if result.match_score < cfg.cv.deep_scan.provisional_match_threshold:
@@ -320,6 +384,7 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
         fps = _scene_fps_light(scene, cfg)
         adjusted_in_s = result.in_point_s
+        phase_changed = False
         scene_changed = int(scene["scene_id"]) != result.scene_id
         starts_before_scene = result.in_point_s < float(scene["start_s"])
         if scene_changed or starts_before_scene or result.duration_s <= 0.12:
@@ -328,6 +393,25 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
             scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
             fps = _scene_fps_light(scene, cfg)
+        should_phase_probe = (
+            scene_changed
+            or starts_before_scene
+            or not result.is_confirmed
+            or result.match_score < cfg.cv.deep_scan.match_threshold
+        )
+        phase_score = result.match_score
+        if should_phase_probe:
+            probe = _phase_probe_segment_in_scene(beat, scene, adjusted_in_s, cfg)
+            if probe is not None:
+                probed_in_s, probed_score = probe
+                max_shift_s = max(0.12, min(0.75, beat.duration_s * 0.35))
+                if abs(probed_in_s - adjusted_in_s) <= max_shift_s:
+                    adjusted_in_s = probed_in_s
+                    phase_changed = True
+                    phase_score = max(float(result.match_score), float(probed_score))
+                    scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
+                    fps = _scene_fps_light(scene, cfg)
         matchable_duration_s = beat.duration_s
         try:
             from src.cv.global_scan import estimate_matchable_reference_duration
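The acceptance rule for a probed in-point in this hunk caps the allowed shift at 35% of the beat duration, clamped to the range 0.12 s to 0.75 s. A small sketch of that arithmetic, with hypothetical helper names:

```python
def max_phase_shift_s(beat_duration_s: float) -> float:
    """Allowed in-point shift: 35% of the beat, clamped to [0.12, 0.75] seconds."""
    return max(0.12, min(0.75, beat_duration_s * 0.35))


def accept_probe(current_in_s: float, probed_in_s: float, beat_duration_s: float) -> bool:
    """True if the probed in-point is close enough to replace the current one."""
    return abs(probed_in_s - current_in_s) <= max_phase_shift_s(beat_duration_s)
```

For a 1.0 s beat the limit is 0.35 s; very short beats still get 0.12 s of slack, and beats longer than about 2.14 s are capped at 0.75 s.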
@@ -350,6 +434,7 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
         if (
             scene_changed
             or starts_before_scene
+            or phase_changed
             or result.duration_s <= 0.12
             or result.out_point_s > adjusted_in_s + max_duration_s + (1.0 / fps)
         ):
@@ -359,6 +444,8 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
                 in_point_s=adjusted_in_s,
                 out_point_s=adjusted_in_s + max_duration_s,
                 in_point_frame=int(adjusted_in_s * fps),
+                match_score=phase_score,
+                is_confirmed=phase_score >= cfg.cv.deep_scan.match_threshold,
             )
         coverage = (
@@ -549,7 +636,7 @@ def _reference_scoreable_segments(beat, cfg) -> list[tuple[float, float]]:
     t = 0.0
     while t <= beat.duration_s:
         frame = grab_frame_at_path(beat.trailer_path, beat.start_s + t)
-        scoreable = frame is not None and _is_scoreable_reference_frame(frame, cfg)
+        scoreable = frame is not None and is_visible(frame)
         if scoreable:
             if start is None:
                 start = t
@@ -827,7 +914,7 @@ def _merge_best_results(existing: list, candidates: list, cfg) -> list:
 def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
-    """Try a vision-led search for beats that ended up without a match.
+    """Try a vision-led search for beats that ended up weak or unmatched.
     For each unmatched beat that has scoreable visual content (i.e. not pure
     fade/title-card material), this pass:
@@ -844,7 +931,7 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
     Confirmed and provisional matches both stay subject to the same thresholds
     used elsewhere; this only adds matches that pass the same quality gates.
     """
-    if not cfg.vision.enabled or not beats:
+    if not beats:
         return results
     from dataclasses import replace
@@ -855,17 +942,28 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
     from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision
     logger = logging.getLogger(__name__)
-    matched_ids = {r.beat_id for r in results}
-    unmatched = [b for b in beats if b.beat_id not in matched_ids]
-    if not unmatched:
+    results_by_id = {r.beat_id: r for r in results}
+    recovery_targets = [
+        b for b in beats
+        if (
+            b.beat_id not in results_by_id
+            or (
+                not results_by_id[b.beat_id].is_confirmed
+                and results_by_id[b.beat_id].match_score < cfg.cv.deep_scan.match_threshold
+            )
+        )
+    ]
+    if not recovery_targets:
         return results
     scenes = build_scene_index(cfg)
     if not scenes:
         return results
-    new_results = list(results)
-    for beat in unmatched:
+    target_ids = {b.beat_id for b in recovery_targets}
+    new_results = [r for r in results if r.beat_id not in target_ids]
+    replaced_results = {r.beat_id: r for r in results if r.beat_id in target_ids}
+    for beat in recovery_targets:
         try:
             islands = _reference_scoreable_segments(beat, cfg)
         except Exception:
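The new target selection in this hunk widens recovery from "no result at all" to "no result, or an unconfirmed result below the match threshold". A self-contained sketch of that predicate, using a hypothetical minimal `Result` record in place of the project's result type:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    """Hypothetical minimal stand-in for a cached match result."""
    beat_id: int
    match_score: float
    is_confirmed: bool


def recovery_targets(
    beat_ids: List[int],
    results: List[Result],
    match_threshold: float,
) -> List[int]:
    """Beat ids that are unmatched, or matched only weakly and unconfirmed."""
    by_id: Dict[int, Result] = {r.beat_id: r for r in results}
    return [
        b for b in beat_ids
        if b not in by_id
        or (not by_id[b].is_confirmed and by_id[b].match_score < match_threshold)
    ]
```

An unconfirmed result whose score is at or above the threshold is left alone, so recovery only revisits beats that are both unconfirmed and weak.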
@@ -902,6 +1000,79 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
         scenes_by_id = {s.scene_id: s for s in scenes}
         best = None # (score, scene, in_s, dur_s, reason)
+        try:
+            from src.llm.vision_cache import (
+                _load_cache,
+                _semantic_action_groups,
+                _semantic_match_score,
+                _STRONG_ACTION_GROUPS,
+            )
+            cache = _load_cache(cfg)
+            items = cache.get("items", {})
+            beat_desc = ""
+            if isinstance(items, dict):
+                for item in items.values():
+                    if (
+                        isinstance(item, dict)
+                        and item.get("kind") == "beat"
+                        and item.get("item_id") == beat.beat_id
+                    ):
+                        beat_desc = str(item.get("description", ""))
+                        break
+            beat_actions = _semantic_action_groups(beat_desc) & _STRONG_ACTION_GROUPS if beat_desc else set()
+            identity_vocab = {
+                "woman", "women", "man", "men", "girl", "boy", "child",
+                "blonde", "hair", "face", "mouth", "eyes", "profile",
+                "close-up", "closeup",
+            }
+            beat_identity = {term for term in identity_vocab if term in beat_desc.lower()}
+            distinctive_identity = {
+                term for term in ("woman", "women", "blonde", "mouth", "face")
+                if term in beat_desc.lower()
+            }
+            if beat_actions and isinstance(items, dict):
+                for item in items.values():
+                    if not isinstance(item, dict) or item.get("kind") != "action_window":
+                        continue
+                    scene = scenes_by_id.get(item.get("item_id"))
+                    desc = str(item.get("description", ""))
+                    source_actions = _semantic_action_groups(desc)
+                    if scene is None or not beat_actions <= source_actions:
+                        continue
+                    source_text = desc.lower()
+                    positive_source_text = source_text.split('"negatives"', 1)[0]
+                    identity_overlap = {term for term in beat_identity if term in source_text}
+                    if len(beat_identity) >= 2 and len(identity_overlap) < 2:
+                        continue
+                    if distinctive_identity and not any(term in positive_source_text for term in distinctive_identity):
+                        continue
+                    if "mouth" in beat_desc.lower() and "mouth" not in positive_source_text:
+                        continue
+                    if "dark interior" in beat_desc.lower() and (
+                        "interior" not in positive_source_text or "dark" not in positive_source_text
+                    ):
+                        continue
+                    score, reason = _semantic_match_score(beat_desc, desc)
+                    if score < max(0.60, cfg.cv.deep_scan.provisional_match_threshold):
+                        continue
+                    try:
+                        in_s = float(item.get("start_s"))
+                        out_s = float(item.get("end_s"))
+                    except (TypeError, ValueError):
+                        continue
+                    duration_s = max(0.32, min(anchor_beat.duration_s, out_s - in_s))
+                    candidate = (
+                        min(0.99, score),
+                        scene,
+                        in_s,
+                        duration_s,
+                        f"cached vision action; {reason}",
+                    )
+                    if best is None or candidate[0] > best[0]:
+                        best = candidate
+        except Exception as exc:
+            logger.debug("Beat %d: cached vision fallback failed (%s)", beat.beat_id, exc)
         seen = set()
         for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
             scene = scenes_by_id.get(hit.scene_id)
@@ -928,7 +1099,10 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
) )
except Exception as exc: except Exception as exc:
logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc) logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
continue aligned_in_s = start_s
combined_score = semantic_score
content_score = 0.0
motion_score = 0.0
aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s))) aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
try: try:
@@ -958,6 +1132,8 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
combined_score, combined_score,
min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08), min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
) )
if semantic_score >= max(0.60, cfg.cv.deep_scan.provisional_match_threshold):
final_score = max(final_score, semantic_score)
if final_score < cfg.cv.deep_scan.provisional_match_threshold: if final_score < cfg.cv.deep_scan.provisional_match_threshold:
continue continue
candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}") candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
@@ -965,6 +1141,9 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
best = candidate best = candidate
if best is None: if best is None:
previous = replaced_results.get(beat.beat_id)
if previous is not None:
new_results.append(previous)
continue continue
score, scene, aligned_in_s, usable_duration_s, repair_reason = best score, scene, aligned_in_s, usable_duration_s, repair_reason = best
logger.info( logger.info(
@@ -991,6 +1170,97 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
return sorted(new_results, key=lambda r: r.beat_id) return sorted(new_results, key=lambda r: r.beat_id)
def _recover_short_lowlight_vibe_matches(results: list, beats: list, cfg) -> list:
    """Keep obvious short low-light scene hits as provisional instead of no-match.

    Short blue/dark dialogue shots can be correctly ranked by scene-level
    histogram/pHash but then rejected by the stricter content aligner because
    the shot contains little texture, motion blur, or trailer timecode overlay.
    This fallback only accepts the top vibe scene when it has a clear margin and
    the local content scan still finds a usable in-point.
    """
    from src.core.models import MatchResult, Scene
    from src.cv.global_scan import _content_alignment_score, _content_alignment_templates
    from src.cv.vibe_check import run_vibe_check
    from src.cv.frame_extractor import open_video

    matched_ids = {r.beat_id for r in results}
    targets = [b for b in beats if b.beat_id not in matched_ids and b.duration_s <= 2.25]
    if not targets:
        return results

    raw_scenes = _load_scene_cache_light(cfg)
    scenes = [
        Scene(
            scene_id=int(s["scene_id"]),
            source_path=cfg.paths.source_movie,
            start_s=float(s["start_s"]),
            end_s=float(s["end_s"]),
            start_frame=int(s["start_frame"]),
            end_frame=int(s["end_frame"]),
            luma_hist=bytes.fromhex(s["luma_hist"]) if s.get("luma_hist") else None,
            sat_hist=bytes.fromhex(s["sat_hist"]) if s.get("sat_hist") else None,
            phash=s.get("phash"),
        )
        for s in raw_scenes
    ]
    scenes_by_id = {s.scene_id: s for s in scenes}
    recovered = list(results)

    with open_video(cfg.paths.source_movie) as cap:
        for beat in targets:
            templates = _content_alignment_templates(beat, cfg)
            if not templates:
                continue
            hits = run_vibe_check(
                beat,
                scenes,
                top_k=6,
                hist_method=cfg.cv.vibe_check.hist_compare_method,
                phash_max_distance=64,
            )
            if len(hits) < 2:
                continue
            top, second = hits[0], hits[1]
            if top.combined_score < 0.74 or top.combined_score - second.combined_score < 0.03:
                continue
            scene = scenes_by_id.get(top.scene_id)
            if scene is None or scene.duration_s < max(0.5, beat.duration_s):
                continue

            best: tuple[float, float] | None = None
            scan_end = max(scene.start_s, scene.end_s - beat.duration_s)
            step_s = 0.12
            t = scene.start_s
            while t <= scan_end:
                score = _content_alignment_score(cap, t, templates, cfg)
                if best is None or score > best[0]:
                    best = (score, t)
                t = round(t + step_s, 6)
            if best is None or best[0] < 0.15:
                continue

            content_score, in_point_s = best
            final_score = max(
                cfg.cv.deep_scan.provisional_match_threshold,
                min(0.64, top.combined_score * 0.55 + content_score * 0.45),
            )
            recovered.append(MatchResult(
                beat_id=beat.beat_id,
                scene_id=scene.scene_id,
                source_path=scene.source_path,
                in_point_s=in_point_s,
                out_point_s=in_point_s + beat.duration_s,
                in_point_frame=int(in_point_s * cfg.export.edl_frame_rate),
                match_score=final_score,
                match_location=(0, 0),
                is_confirmed=False,
                segments=tuple(),
            ))
    return sorted(recovered, key=lambda r: r.beat_id)

def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list: def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
"""Drop vision-enabled matches whose final action phase contradicts the beat.""" """Drop vision-enabled matches whose final action phase contradicts the beat."""
if not cfg.vision.enabled or not results: if not cfg.vision.enabled or not results:
@@ -1366,6 +1636,41 @@ def _attach_visual_segments(results: list, beats: list, cfg) -> list:
if not segment_matches: if not segment_matches:
continue continue
seg = segment_matches[0] seg = segment_matches[0]
if seg.match_score < cfg.cv.deep_scan.multi_shot_segment_threshold:
repaired = _local_same_scene_segment_match(
segment_beat,
beat,
start_s,
cached + expanded,
cfg,
)
if (
repaired is None
or repaired.match_score
< max(
cfg.cv.deep_scan.multi_shot_segment_threshold,
seg.match_score + cfg.cv.deep_scan.duration_tie_break_score_delta,
)
):
scenes = _load_scene_cache_light(cfg)
scene = _scene_by_id_light(scenes, seg.scene_id)
probe = (
_phase_probe_segment_in_scene(segment_beat, scene, seg.in_point_s, cfg)
if scene is not None else None
)
if probe is None:
continue
in_point_s, _phase_score = probe
from dataclasses import replace as _replace
seg = _replace(
seg,
in_point_s=in_point_s,
out_point_s=in_point_s + seg.duration_s,
match_score=max(seg.match_score, _phase_score),
is_confirmed=_phase_score >= cfg.cv.deep_scan.match_threshold,
)
else:
seg = repaired
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s)) seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
segments.append( segments.append(
MatchSegment( MatchSegment(
@@ -1466,21 +1771,12 @@ def _match_unmatched_visual_segments(
start_s=beat.start_s + start_s, start_s=beat.start_s + start_s,
end_s=beat.start_s + end_s, end_s=beat.start_s + end_s,
) )
if island_idx == 0: continuity = _continuity_seed_in_points(
# First island of an unmatched multi-shot beat: search globally beat.beat_id,
# without a continuity bias from the previous beat. Continuity [b if b.beat_id != beat.beat_id else segment_beat for b in beats],
# assumes the shot follows the previous beat in the source, but cached + expanded,
# the lead shot of a multi-shot beat is often an insert cut from cfg,
# a completely different scene. A wrong seed with score 0.92 )
# would push the real match out of the refinement candidate pool.
continuity = {}
else:
continuity = _continuity_seed_in_points(
beat.beat_id,
[b if b.beat_id != beat.beat_id else segment_beat for b in beats],
cached + expanded,
cfg,
)
segment_matches = [] segment_matches = []
if beat.beat_id not in skip_global_segment_scan_for: if beat.beat_id not in skip_global_segment_scan_for:
segment_matches = _run_segment_match(segment_beat, continuity, cfg, allow_fullscan=True) segment_matches = _run_segment_match(segment_beat, continuity, cfg, allow_fullscan=True)
@@ -1496,7 +1792,10 @@ def _match_unmatched_visual_segments(
if recovered: if recovered:
rec = recovered[0] rec = recovered[0]
seg_dur = min(max(0.0, end_s - start_s), max(0.0, rec.duration_s)) seg_dur = min(max(0.0, end_s - start_s), max(0.0, rec.duration_s))
if seg_dur > 0: if (
seg_dur > 0
and rec.match_score >= cfg.cv.deep_scan.multi_shot_segment_threshold
):
segments.append(MatchSegment( segments.append(MatchSegment(
trailer_offset_s=start_s, trailer_offset_s=start_s,
duration_s=seg_dur, duration_s=seg_dur,
@@ -1518,6 +1817,8 @@ def _match_unmatched_visual_segments(
segments.append(local_segment) segments.append(local_segment)
continue continue
seg = segment_matches[0] seg = segment_matches[0]
if seg.match_score < cfg.cv.deep_scan.multi_shot_segment_threshold:
continue
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s)) seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
segments.append( segments.append(
MatchSegment( MatchSegment(
@@ -1589,7 +1890,13 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
cfg.cv.deep_scan.provisional_content_threshold * 0.70, cfg.cv.deep_scan.provisional_content_threshold * 0.70,
cfg.cv.deep_scan.provisional_match_threshold, cfg.cv.deep_scan.provisional_match_threshold,
) )
step_s = max(1.0 / cfg.export.edl_frame_rate, 0.04) # Coarse repair scan over already plausible neighbouring scenes. A frame-step
# sweep across long dialogue scenes is slow and can overfit static layouts.
step_s = max(
cfg.vision.local_scan_step_s,
cfg.cv.deep_scan.content_align_sample_step_s,
0.25,
)
best: tuple[float, float, int] | None = None best: tuple[float, float, int] | None = None
with open_video(cfg.paths.source_movie) as cap: with open_video(cfg.paths.source_movie) as cap:
for scene_id in scene_ids: for scene_id in scene_ids:
@@ -1598,12 +1905,14 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
continue continue
start_s = max(0.0, float(scene["start_s"]) - 0.25) start_s = max(0.0, float(scene["start_s"]) - 0.25)
end_s = max(start_s, float(scene["end_s"]) - max(0.04, segment_beat.duration_s) + 0.25) end_s = max(start_s, float(scene["end_s"]) - max(0.04, segment_beat.duration_s) + 0.25)
max_points = max(4, min(48, int(cfg.vision.local_scan_max_points_per_scene)))
scene_step_s = max(step_s, (end_s - start_s) / max_points)
t = start_s t = start_s
while t <= end_s: while t <= end_s:
score = _content_alignment_score(cap, t, templates, cfg) score = _content_alignment_score(cap, t, templates, cfg)
if best is None or score > best[0]: if best is None or score > best[0]:
best = (score, t, int(scene_id)) best = (score, t, int(scene_id))
t = round(t + step_s, 6) t = round(t + scene_step_s, 6)
if best is None or best[0] < min_score: if best is None or best[0] < min_score:
return None return None
@@ -1621,6 +1930,186 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
) )
def _phase_probe_segment_in_scene(segment_beat, scene: dict, original_in_s: float, cfg):
    """Retune a weak multi-shot segment inside its own scene using saliency-weighted frames."""
    import cv2
    import numpy as np

    offsets = [0.0, 0.16, 0.32, 0.48, 0.64, 0.80, 0.96, 1.12]
    size = (160, 90)

    def prepared_gray(frame):
        if frame is None:
            return None
        h, w = frame.shape[:2]
        frame = frame.copy()
        # Timecode overlays and letterbox edges are trailer/source-specific and
        # should not pull the phase toward the wrong moment.
        frame[: int(h * 0.16), : int(w * 0.32)] = 0
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, size)
        return cv2.equalizeHist(gray).astype("float32") / 255.0

    def edge(gray):
        return cv2.Canny((gray * 255).astype("uint8"), 45, 130).astype("float32") / 255.0

    def pair_score(ref_gray, src_gray, mask):
        if ref_gray is None or src_gray is None:
            return None
        pixel = 1.0 - float((np.abs(ref_gray - src_gray) * mask).sum())
        edge_score = 1.0 - float((np.abs(edge(ref_gray) - edge(src_gray)) * mask).sum())
        return 0.65 * pixel + 0.35 * edge_score

    def frame_at(cap, t_s):
        cap.set(cv2.CAP_PROP_POS_MSEC, t_s * 1000.0)
        ok, frame = cap.read()
        return frame if ok else None

    trailer_cap = cv2.VideoCapture(str(cfg.paths.reference_trailer))
    ref_candidates = []
    fallback_items = []
    for offset in offsets:
        if offset > segment_beat.duration_s + 0.04:
            continue
        frame = frame_at(trailer_cap, segment_beat.start_s + offset)
        ref = prepared_gray(frame)
        if ref is None:
            continue
        fallback_items.append((offset, ref))
        raw_gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), size)
        h, w = raw_gray.shape[:2]
        raw_gray[: int(h * 0.16), : int(w * 0.32)] = 0
        roi = raw_gray[int(h * 0.12) : int(h * 0.90), :]
        mean_luma = float(roi.mean() / 255.0)
        p90_luma = float(np.percentile(roi, 90) / 255.0)
        contrast = float(roi.std() / 255.0)
        ref_candidates.append((offset, ref, mean_luma, p90_luma, contrast))
    # All reference frames are read; release the trailer handle before any early return.
    trailer_cap.release()

    transition_start = False
    ref_items = []
    if ref_candidates:
        max_mean = max(item[2] for item in ref_candidates)
        max_p90 = max(item[3] for item in ref_candidates)
        transition_start = (
            ref_candidates[0][2] < max_mean * 0.90
            or ref_candidates[0][3] < max_p90 * 0.90
        )
        ref_items = [
            (offset, ref)
            for offset, ref, mean_luma, p90_luma, contrast in ref_candidates
            if (
                mean_luma >= max(0.16, max_mean * 0.82)
                and p90_luma >= max(0.28, max_p90 * 0.86)
                and contrast >= 0.035
            )
        ]
    if len(ref_items) < 4:
        ref_items = fallback_items
    if len(ref_items) < 4:
        return None

    ref_offsets = [item[0] for item in ref_items]
    refs = [item[1] for item in ref_items]
    align_offset = ref_offsets[0]
    ref_offsets = [offset - align_offset for offset in ref_offsets]
    ref_stack = np.stack(refs, axis=0)
    edge_stack = np.stack([edge(ref) for ref in refs], axis=0)
    # Static window/room edges are useful for finding the scene, but toxic for
    # phase retuning inside a repeated dialogue shot. Bias the mask toward
    # areas that actually change across the reference segment.
    saliency = ref_stack.std(axis=0) * 3.0 + edge_stack.std(axis=0) * 0.75 + edge_stack.mean(axis=0) * 0.15
    saliency[:, : int(size[0] * 0.12)] *= 0.15
    saliency[: int(size[1] * 0.16), : int(size[0] * 0.32)] = 0.0
    threshold = np.quantile(saliency, 0.66)
    mask = (saliency >= threshold).astype("float32")
    mask /= mask.sum() + 1e-6

    scene_start = float(scene["start_s"])
    scene_end = float(scene["end_s"])
    center_t = max(scene_start, min(scene_end, original_in_s + align_offset))
    retune_radius_s = max(4.0, min(12.0, segment_beat.duration_s * 2.5))
    scan_start = max(scene_start, center_t - retune_radius_s)
    scene_scan_end = min(scene_end, center_t + retune_radius_s)
    scan_end = max(scan_start, scene_scan_end - max(0.04, segment_beat.duration_s - align_offset))
    max_points = 400
    step_s = max(0.04, (scan_end - scan_start) / max_points)

    source_cap = cv2.VideoCapture(str(cfg.paths.source_movie))
    source_fps = source_cap.get(cv2.CAP_PROP_FPS) or _scene_fps_light(scene, cfg)
    stride = max(1, int(round(step_s * source_fps)))
    start_frame = max(0, int(round(scan_start * source_fps)))
    end_frame = max(start_frame, int(round(scene_scan_end * source_fps)))
    times: list[float] = []
    source_frames: list = []
    frame_idx = start_frame
    while frame_idx <= end_frame:
        source_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = source_cap.read()
        if not ok:
            break
        times.append(frame_idx / source_fps)
        source_frames.append(prepared_gray(frame))
        frame_idx += stride
    # All source frames are buffered; release the source handle.
    source_cap.release()

    base_time = times[0] if times else scan_start
    candidates: list[tuple[float, float, float]] = []
    for i, t in enumerate(times):
        if t > scan_end:
            break
        vals = []
        src_for_offsets = []
        for offset, ref in zip(ref_offsets, refs):
            j = int(round((t + offset - base_time) / step_s))
            if 0 <= j < len(source_frames):
                src = source_frames[j]
                score = pair_score(ref, src, mask)
            else:
                src = None
                score = None
            if score is not None:
                vals.append(score)
            src_for_offsets.append(src)
        if len(vals) >= 4:
            avg_score = sum(vals) / len(vals)
            early_count = min(2, len(vals))
            tail_count = min(2, len(vals))
            early_score = sum(vals[:early_count]) / early_count
            tail_score = sum(vals[-tail_count:]) / tail_count
            motion_vals = []
            for idx in range(1, min(len(refs), len(src_for_offsets))):
                if src_for_offsets[idx - 1] is None or src_for_offsets[idx] is None:
                    continue
                ref_motion = refs[idx] - refs[idx - 1]
                src_motion = src_for_offsets[idx] - src_for_offsets[idx - 1]
                motion_vals.append(1.0 - float((np.abs(ref_motion - src_motion) * mask).sum()))
            motion_score = sum(motion_vals) / len(motion_vals) if motion_vals else avg_score
            # Phase retuning must reject "same shot, wrong moment" matches.
            # A plain average can hide a bad onset inside slow dialogue shots;
            # keep the low-water mark, onset, and frame-to-frame motion influential.
            phase_score = (
                0.26 * avg_score
                + 0.24 * min(vals)
                + 0.24 * early_score
                + 0.08 * tail_score
                + 0.18 * motion_score
            )
            candidates.append((phase_score, min(vals), t))
    if not candidates:
        return None

    candidates.sort(reverse=True)
    best_score = candidates[0][0]
    tie_window = 0.006 if transition_start else 0.002
    near_tie = [c for c in candidates if c[0] >= best_score - tie_window]
    if transition_start:
        chosen = max(near_tie, key=lambda c: (c[1], c[0]))
    else:
        chosen = min(near_tie, key=lambda c: abs((c[2] - align_offset) - original_in_s))
    return max(scene_start, chosen[2] - align_offset), chosen[0]

def cmd_match(args: argparse.Namespace, cfg) -> list: def cmd_match(args: argparse.Namespace, cfg) -> list:
from src.pipeline.matcher import run_matching from src.pipeline.matcher import run_matching
from dataclasses import replace from dataclasses import replace
@@ -1694,6 +2183,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
results = _attach_visual_segments(results, beats, cfg) results = _attach_visual_segments(results, beats, cfg)
results = _filter_semantically_invalid_vision_matches(results, beats, cfg) results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
results = _recover_unmatched_beats_via_vision(results, beats, cfg) results = _recover_unmatched_beats_via_vision(results, beats, cfg)
results = _recover_short_lowlight_vibe_matches(results, beats, cfg)
# A targeted one-beat match must NEVER delete or modify any other beat's # A targeted one-beat match must NEVER delete or modify any other beat's
# cache entry. We deliberately re-load the raw cache from disk here so # cache entry. We deliberately re-load the raw cache from disk here so
@@ -1720,7 +2210,8 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
results_to_save = results results_to_save = results
_save_results(results_to_save, cfg) _save_results(results_to_save, cfg)
_regenerate_cutter_report(cfg) force_report_beats = {int(args.beat)} if getattr(args, "beat", None) is not None else None
_regenerate_cutter_report(cfg, force_beats=force_report_beats)
print(f"\n{len(results)} / {len(beats)} beats matched.") print(f"\n{len(results)} / {len(beats)} beats matched.")
for r in results: for r in results:
@@ -1890,17 +2381,12 @@ def cmd_rematch(args: argparse.Namespace, cfg) -> None:
def cmd_report(args: argparse.Namespace, cfg) -> None: def cmd_report(args: argparse.Namespace, cfg) -> None:
from src.pipeline.reporter import generate_report if getattr(args, "beat", None) is not None:
beats = _select_beats(_load_beats(cfg), getattr(args, "beat", None)) print(f"\n⚠️ Generating cutter report for all beats (ignoring --beat {args.beat}).")
beat_ids = {b.beat_id for b in beats} if getattr(args, "beat", None) is not None else None
results = _select_results(_normalize_cached_results(_load_beats(cfg), _load_results(cfg), cfg), beat_ids) _regenerate_cutter_report(cfg)
out = generate_report(beats, results, cfg) project_root = cfg.paths.cache_dir.parent
if getattr(args, "beat", None) is not None and not results: print(f"\n✅ Report → {project_root / 'CUTTER_REPORT.html'} and CUTTER_REPORT.md")
print(
f"\n⚠️ Beat {args.beat} has no cached match yet. "
f"Run: python cli.py match --beat {args.beat}"
)
print(f"\n\u2705 Report \u2192 {out}")
def cmd_export(args: argparse.Namespace, cfg) -> None: def cmd_export(args: argparse.Namespace, cfg) -> None:
@@ -1941,6 +2427,141 @@ def cmd_run(args: argparse.Namespace, cfg) -> None:
cmd_export(args, cfg) cmd_export(args, cfg)
def cmd_preview(args: argparse.Namespace, cfg) -> None:
    """Assemble a rough preview video from cached source matches, with original audio."""
    import subprocess

    log = logging.getLogger(__name__)
    results_path = _results_cache_path(cfg)
    if not results_path.exists():
        log.error("No match_results.json — run 'match' first.")
        return
    data = sorted(
        json.loads(results_path.read_text(encoding="utf-8")),
        key=lambda r: r["beat_id"],
    )
    beats_path = cfg.paths.cache_dir / "trailer_beats.json"
    beats_by_id: dict = {}
    if beats_path.exists():
        for b in json.loads(beats_path.read_text(encoding="utf-8")):
            beats_by_id[int(b["beat_id"])] = b

    clip_width = 1280
    fps = 25
    out_dir = cfg.paths.output_dir / "preview_clips"
    out_dir.mkdir(parents=True, exist_ok=True)
    preview_out = cfg.paths.output_dir / "preview.mp4"

    def _run(cmd: list, timeout: int = 120) -> bool:
        r = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if r.returncode != 0:
            log.debug("ffmpeg stderr: %s", r.stderr[-600:])
        return r.returncode == 0

    def extract_with_audio(src: Path, start_s: float, duration_s: float, out: Path) -> bool:
        preroll = 2.0 if start_s >= 2.0 else 0.0
        input_seek = max(0.0, start_s - preroll)
        accurate_seek = start_s - input_seek
        return _run([
            "ffmpeg", "-y", "-loglevel", "error",
            "-ss", f"{input_seek:.3f}", "-i", str(src),
            "-ss", f"{accurate_seek:.3f}", "-t", f"{max(0.04, duration_s):.3f}",
            "-map", "0:v:0", "-map", "0:a:0",
            "-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
            "-vf", f"fps={fps},scale={clip_width}:-2,setsar=1,setpts=PTS-STARTPTS",
            "-c:a", "aac", "-ar", "48000", "-ac", "2",
            "-pix_fmt", "yuv420p", "-movflags", "+faststart", str(out),
        ])

    def black_silence(duration_s: float, out: Path) -> bool:
        return _run([
            "ffmpeg", "-y", "-loglevel", "error",
            "-f", "lavfi", "-i", f"color=black:s={clip_width}x720:r={fps}",
            "-f", "lavfi", "-i", "anullsrc=r=48000:cl=stereo",
            "-t", f"{max(0.5, duration_s):.3f}",
            "-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
            "-c:a", "aac", "-pix_fmt", "yuv420p", "-movflags", "+faststart", str(out),
        ])

    def concat_clips(parts: list[Path], out: Path) -> bool:
        lst = out.with_suffix(".txt")
        lst.write_text(
            "\n".join(f"file '{p.resolve().as_posix()}'" for p in parts),
            encoding="utf-8",
        )
        ok = _run([
            "ffmpeg", "-y", "-loglevel", "error",
            "-f", "concat", "-safe", "0", "-i", str(lst),
            "-c", "copy", str(out),
        ], timeout=300)
        lst.unlink(missing_ok=True)
        return ok

    beat_clips: list[Path] = []
    for rec in data:
        bid = int(rec["beat_id"])
        segs = rec.get("segments", [])
        src = Path(rec["source_path"]) if rec.get("source_path") else None
        clip_out = out_dir / f"beat_{bid:02d}.mp4"
        if src is None or not src.exists():
            beat = beats_by_id.get(bid, {})
            dur = max(0.5, float(beat.get("end_s", 1)) - float(beat.get("start_s", 0)))
            log.info("Beat %02d: NO MATCH — black/silence %.2fs", bid, dur)
            if black_silence(dur, clip_out):
                beat_clips.append(clip_out)
            continue
        if len(segs) >= 2:
            parts: list[Path] = []
            for idx, seg in enumerate(segs):
                in_s = float(seg["in_point_s"])
                dur = max(0.04, float(seg["out_point_s"]) - in_s)
                seg_src = Path(seg["source_path"]) if seg.get("source_path") else src
                part = out_dir / f"beat_{bid:02d}_seg{idx:02d}.mp4"
                log.info("Beat %02d seg%d: scene=%s %.2fs%.2fs", bid, idx, seg.get("scene_id"), in_s, in_s + dur)
                if extract_with_audio(seg_src, in_s, dur, part):
                    parts.append(part)
            if not parts:
                log.warning("Beat %02d: no segments extracted", bid)
                continue
            if len(parts) == 1:
                parts[0].rename(clip_out)
                beat_clips.append(clip_out)
            else:
                if concat_clips(parts, clip_out):
                    beat_clips.append(clip_out)
                for p in parts:
                    p.unlink(missing_ok=True)
        else:
            in_s = float(rec["in_point_s"])
            beat = beats_by_id.get(bid, {})
            beat_dur = float(beat["end_s"]) - float(beat["start_s"]) if beat else 0.0
            source_dur = float(rec["out_point_s"]) - in_s
            dur = max(0.04, beat_dur if beat_dur > 0.04 else source_dur)
            log.info("Beat %02d: scene=%s %.2fs+%.2fs (trailer=%.2fs src=%.2fs)", bid, rec.get("scene_id"), in_s, dur, beat_dur, source_dur)
            if extract_with_audio(src, in_s, dur, clip_out):
                beat_clips.append(clip_out)
            else:
                log.warning("Beat %02d: extraction failed", bid)

    if not beat_clips:
        log.error("No clips extracted — aborting.")
        return
    log.info("Concatenating %d beat clips → %s", len(beat_clips), preview_out)
    if concat_clips(beat_clips, preview_out):
        size_mb = preview_out.stat().st_size / 1_048_576
        log.info("Preview ready: %s (%.1f MB)", preview_out, size_mb)
        print(f"\n Preview → {preview_out} ({size_mb:.1f} MB)")
    else:
        log.error("Final concat failed — per-beat clips are in %s", out_dir)

# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Argument parser # Argument parser
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -2011,6 +2632,12 @@ def _build_parser() -> argparse.ArgumentParser:
p_run.add_argument("--beat", type=int, p_run.add_argument("--beat", type=int,
help="Run match/report/export for only one cached beat") help="Run match/report/export for only one cached beat")
# preview
sub.add_parser(
"preview",
help="Build output/preview.mp4 from cached matches — source clips with audio in beat order",
)
return parser return parser
@@ -2035,6 +2662,7 @@ def main() -> None:
"report": cmd_report, "report": cmd_report,
"export": cmd_export, "export": cmd_export,
"run": cmd_run, "run": cmd_run,
"preview": cmd_preview,
} }
handler = dispatch[args.command] handler = dispatch[args.command]
+6 -3
@@ -8,7 +8,7 @@
[project] [project]
name = "AI Trailer Generator v2" name = "AI Trailer Generator v2"
version = "2.0.0" version = "2.0.0"
log_level = "INFO" # DEBUG | INFO | WARNING | ERROR log_level = "DEBUG" # DEBUG | INFO | WARNING | ERROR
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# [paths] — External video sources (read-only access) # [paths] — External video sources (read-only access)
@@ -86,7 +86,10 @@ span_score_weight = 0.15
coarse_score_weight = 0.10 coarse_score_weight = 0.10
duration_score_weight = 0.20 duration_score_weight = 0.20
duration_tie_break_score_delta = 0.03 duration_tie_break_score_delta = 0.03
min_duration_coverage = 0.65 min_duration_coverage = 0.55
# Every visible sub-shot in a multi-shot beat must pass this stricter gate.
# A weak segment is left unmatched instead of being hidden by a strong neighbor.
multi_shot_segment_threshold = 0.50
continuity_seed_offsets_s = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0] continuity_seed_offsets_s = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
scene_seed_top_k = 30 scene_seed_top_k = 30
scene_seed_points_per_scene = 6 scene_seed_points_per_scene = 6
@@ -183,7 +186,7 @@ local_scan_step_s = 0.12
local_scan_max_points_per_scene = 180 local_scan_max_points_per_scene = 180
local_scan_top_candidates = 36 local_scan_top_candidates = 36
local_scan_tie_break_score_delta = 0.08 local_scan_tie_break_score_delta = 0.08
multi_shot_cut_corr_threshold = 0.20 multi_shot_cut_corr_threshold = 0.55
multi_shot_boundary_tolerance_s = 0.20 multi_shot_boundary_tolerance_s = 0.20
fullscan_fallback = false fullscan_fallback = false
content_threshold = 0.22 content_threshold = 0.22
+85 -2
@@ -132,8 +132,33 @@ already aligned to the visible action phase.
The segment offset counts only preceding scoreable image islands, not
black or fade gaps. After retiming, the usable source duration is
estimated again; if the source runs into a visibly different
-action phase at the end, the clip is shortened and the rest stays placeholder/fade
-instead of showing a wrong motion moment.
+action phase at the end, the match is clearly flagged as phase-critical in the
+cutter report. Black/placeholder is used only for genuinely unmatched trailer
+regions or fades, not to hide visible candidate motion in the
+review.

This span estimate is stricter than the coarse search score: a nearly static
opening must not rescue a match if later frames visibly drift into a
different gesture, body position, or an entering figure. Stable
score plateaus may extend the span only if they still lie close enough to the
starting level; otherwise the match stays provisional and must be searched again
or checked visually. The review clip still shows the candidate visibly, so that
phase errors are not masked by black.
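
As a rough illustration of the plateau rule above, the sketch below extends a usable span only while sampled scores stay close to the starting level. The function name `usable_span_s`, the `max_drop` tolerance, and the sampled score list are hypothetical and not taken from the project; the real span estimate probably weighs more signals.

```python
def usable_span_s(scores, step_s, max_drop=0.08):
    """Return how many seconds of a candidate stay near its starting score.

    scores   -- similarity scores sampled every step_s seconds (hypothetical input)
    max_drop -- assumed tolerance below the starting level, not a project setting
    """
    if not scores:
        return 0.0
    start_level = scores[0]
    usable = 0
    for score in scores:
        # Stop at the first sample that drifts too far below the opening level,
        # even if later samples form a stable plateau of their own.
        if score < start_level - max_drop:
            break
        usable += 1
    return usable * step_s
```

With scores `[0.80, 0.79, 0.78, 0.60, 0.60]` the later plateau at 0.60 is stable, but it no longer counts toward the usable span because it sits too far below the start.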

For multi-shot beats, an additional per-segment threshold applies to each visible
shot. A good first segment must not carry along a second segment with a weak
score. Segments below `multi_shot_segment_threshold` are not treated as
stable truth; instead they are re-tuned within the same plausible source scene.
The re-tuning uses a saliency-weighted multi-frame check:
timecodes and static border regions are down-weighted, while high-contrast image
regions that differ across several trailer frames count more. This lets
a weak second shot be repaired with better phase accuracy, without hiding the error
behind a black frame or curating a beat by hand.
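
The per-segment gate can be pictured with a small sketch. The helper name `split_by_segment_threshold` is hypothetical; the default of 0.50 mirrors the `multi_shot_segment_threshold = 0.50` value shown in the config part of this diff, and the re-tuning step itself is left out.

```python
def split_by_segment_threshold(segment_scores, threshold=0.50):
    """Partition segment indices into accepted ones and ones that need re-tuning.

    Each segment is judged on its own score, so a strong neighbor cannot
    lift a weak segment over the gate.
    """
    accepted, needs_retune = [], []
    for idx, score in enumerate(segment_scores):
        if score >= threshold:
            accepted.append(idx)
        else:
            needs_retune.append(idx)
    return accepted, needs_retune
```

For example, scores of `[0.82, 0.41]` keep the first segment and send only the second one to the same-scene re-tuning.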

The cutter report uses clip caching. Existing compare clips are
reused; for targeted rematches only the affected beat is re-rendered
(`CUTTER_REPORT_FORCE_BEATS`). This keeps the report current without re-encoding
every beat each time.
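
A minimal sketch of the caching decision, assuming the report knows which beats already have a compare clip. The function `beats_to_render` and its arguments are hypothetical; how `CUTTER_REPORT_FORCE_BEATS` is actually passed to the reporter is not shown in the source.

```python
def beats_to_render(all_beat_ids, cached_beat_ids, force_beats=None):
    """Return the beat ids whose compare clips must be (re-)encoded.

    A beat is rendered when it has no cached clip yet, or when it was
    explicitly forced, for example after a targeted rematch.
    """
    forced = set(force_beats or ())
    cached = set(cached_beat_ids)
    return sorted(b for b in all_beat_ids if b in forced or b not in cached)
```

After a targeted rematch of beat 2 with all clips cached, only beat 2 is rendered again; without forced beats, only missing clips are produced.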

## Vision seeds vs. full scan
@@ -165,6 +190,56 @@ a short gesture is first recognized correctly and then shifted into a later
similar body posture. If several vision candidates in the
same source scene score similarly well and cover the beat duration,
the matcher prefers the earlier phase.
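
The preference for the earlier phase can be sketched as a tie-break over near-equal candidates. The helper `pick_earlier_phase` is hypothetical; the default delta of 0.08 is borrowed from `local_scan_tie_break_score_delta` in the config shown in this diff, and whether the matcher uses that setting for this rule is an assumption. The duration-coverage check is omitted for brevity.

```python
def pick_earlier_phase(candidates, score_delta=0.08):
    """Pick the earliest in-point among candidates that score almost as well as the best.

    candidates -- list of (score, in_point_s) tuples from one source scene
    """
    if not candidates:
        return None
    best_score = max(score for score, _ in candidates)
    # Everything within score_delta of the best counts as "similarly good".
    near_ties = [c for c in candidates if c[0] >= best_score - score_delta]
    return min(near_ties, key=lambda c: c[1])
```

A candidate at 42.0 s with score 0.80 wins over one at 57.5 s with score 0.85, because both are within the delta and the first one is earlier.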
Vision recovery runs not only for completely missing beats but also for weak,
unconfirmed hits. Low-light beats in particular must not stay stuck on a wrong
dark CV hit when the cache semantically knows a better action phase.
For long source scenes, the action-window search always checks the scene start
and several early windows before it samples evenly across the whole scene.
That way, short trailer actions at the beginning of a long scene do not get lost
when the rest of the scene consists of credits, black frames, or quiet
follow-up frames.
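
One way to generate such a sampling order is sketched below. The parameter names and defaults (`early_windows`, `total_windows`) are assumptions; the text does not state how many early windows are checked.

```python
def action_window_starts(scene_start, scene_end, window_len,
                         early_windows=3, total_windows=8):
    """Window start times in check order: scene start and early windows
    first, then an even spread over the rest of the scene."""
    last_start = scene_end - window_len
    if last_start <= scene_start:
        return [float(scene_start)]
    ordered = []
    for i in range(early_windows):
        t = scene_start + i * window_len
        if t <= last_start:
            ordered.append(float(t))
    remaining = max(total_windows - len(ordered), 0)
    span = last_start - scene_start
    for i in range(remaining):
        ordered.append(float(scene_start + (i + 1) * span / remaining))
    # Drop duplicates while keeping the check order.
    seen = set()
    result = []
    for t in ordered:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result
```

For a 100-second scene with 2-second windows, `early_windows=3`, and `total_windows=5`, the first three checks cover seconds 0 to 6 before the remaining windows are spread over the scene.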
If an action window explicitly contains the strong beat action, it may have a
slightly lower text similarity; the action then counts more than incidental
words about light, framing, or mood.
Already cached action windows of a scene remain valid candidates even when the
current sampling grid changes. This way the matcher loses no expensive vision
hints and does not have to describe the same windows again.
When new vision calls are disabled, recovery may still read existing cache
descriptions; this incurs no API cost and prevents old weak CV hits from
staying in place.
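
The cache behavior described in the last two paragraphs can be sketched as a read-through cache with a switch for new calls. The key layout `(scene_id, start, end)`, the `allow_new_calls` flag, and `describe_fn` are assumptions for illustration; keying by absolute times rather than by grid index is what keeps old entries usable after the sampling grid changes.

```python
def describe_window(key, cache, allow_new_calls, describe_fn):
    """Return a window description, preferring the cache.

    key              hypothetical (scene_id, start, end) tuple
    cache            dict mapping keys to cached descriptions
    allow_new_calls  False means: read the cache only, never call the API
    describe_fn      callable that performs the (expensive) vision call
    """
    if key in cache:
        return cache[key]      # reading the cache costs no API call
    if not allow_new_calls:
        return None            # nothing cached and new calls are disabled
    description = describe_fn(key)
    cache[key] = description
    return description
```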
If CV fine adjustment fails on a semantically clear low-light window, the
action window is kept as a provisional hit. CV may refine a dark hit, but it
must not discard an unambiguous cache hint entirely.
In addition, recovery can rank existing cached action windows directly across
all scenes. This fast path avoids an expensive full scan when the cache already
contains a strong action such as hand-on-mouth, kiss, or an exchange of glances.
Unambiguous terms from the beat description act as hard filters for vision
windows: `mouth` must recur in the candidate, `dark interior` must not land on
outdoor footage, and distinctive personal features such as `blonde` remain
binding.
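
A minimal sketch of such hard filters follows. The three terms come from the examples above; the rule tables, the words that count as "outdoor", and the function name are assumptions, since the text does not list the full vocabulary.

```python
import re

# Hypothetical rule tables; the real term lists are not given in the text.
BINDING_TERMS = ("mouth", "blonde")
EXCLUSIONS = {"dark interior": ("outdoor", "exterior")}


def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def passes_hard_filters(beat_description, window_description):
    """Reject a vision window that violates a binding term or an exclusion."""
    beat_text = beat_description.lower()
    beat_words = _words(beat_description)
    window_words = _words(window_description)
    # Binding terms named in the beat must recur in the candidate window.
    for term in BINDING_TERMS:
        if term in beat_words and term not in window_words:
            return False
    # Some beat phrases forbid certain window content outright.
    for phrase, forbidden in EXCLUSIONS.items():
        if phrase in beat_text and window_words & set(forbidden):
            return False
    return True
```

Because these are filters rather than score terms, a window that fails them is dropped in this sketch regardless of how high its similarity score is.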
The additional hi-res phase refine stays local around the already validated
inpoint and adopts only clear improvements. It must not search an entire long
dialogue scene for similar layouts, because otherwise the same location with
a different gesture can win as the wrong phase and the runtime explodes.
The local retune scoring therefore uses not only the mean frame score but also
the worst single comparison, the first visible frames, and the frame-to-frame
motion. As a result, a later still image of the same shot no longer wins merely
because windows, faces, and light look almost identical.
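
The combination of these four signals might be sketched as a weighted sum. The weights, the `first_frames` count, and the way motion agreement is measured are assumptions; the text names the ingredients but not the formula.

```python
def _mean(values):
    return sum(values) / len(values) if values else 0.0


def retune_score(frame_scores, trailer_motion, source_motion,
                 first_frames=2, weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine mean, worst frame, first frames, and motion agreement.

    frame_scores    per-frame similarity between trailer and source
    trailer_motion  frame-to-frame change in the trailer, scaled to 0..1
    source_motion   frame-to-frame change in the source candidate, 0..1
    """
    w_mean, w_worst, w_first, w_motion = weights
    if not frame_scores:
        return 0.0
    motion_error = _mean(
        [abs(t - s) for t, s in zip(trailer_motion, source_motion)]
    )
    motion_agreement = max(0.0, 1.0 - motion_error)
    return (
        w_mean * _mean(frame_scores)
        + w_worst * min(frame_scores)
        + w_first * _mean(frame_scores[:first_frames])
        + w_motion * motion_agreement
    )
```

With these example weights, a still candidate scoring 0.92 on every frame against a moving trailer (motion 0.3 per step) ends up below a moving candidate that averages 0.875, because the still frames do not reproduce the trailer's motion.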
Uncertain single hits without a segment list also run through this local phase
probe. This repairs old cache entries whose scene is correct but whose inpoint
is off by a few frames within the motion. The probe stays limited to small
local shifts and is not forced for every confirmed hit, so that report
refreshes do not turn into a full scan.
Report clips are additionally clamped to the known source scene start plus a
very short one-frame guard zone, so that an inpoint lying just before or
directly on the cut edge does not begin with frames of the previous shot. The
guard zone is deliberately kept small, because a longer correction would shift
the visible motion phase within the same shot.
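
Expressed in frame numbers, the clamp could be as small as the sketch below. The function name and the use of integer frame indices are assumptions.

```python
def clamp_report_inpoint(inpoint_frame, scene_start_frame, guard_frames=1):
    """Keep the report clip from starting on or before the cut edge.

    An inpoint already inside the shot is left untouched, so the visible
    motion phase does not shift.
    """
    return max(inpoint_frame, scene_start_frame + guard_frames)
```

For a scene starting at frame 100, inpoints 99 and 100 both become 101, while an inpoint at 150 stays where it is.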
## Multi-shot beats
@@ -175,6 +250,13 @@ only if the relative source boundary matches a detected trailer cut
in time. This way a beat made of question/answer shots can be captured
completely without gluing scenes together arbitrarily.
## Title and graphics beats
Dark trailer cards with clearly isolated text are marked as `GFX` in the cutter
report when there is no source hit. These beats are not failed matches: the
cutter should take over the trailer graphic or an NLE title card and not search
the feature film for an image.
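
A rough sketch of this labeling rule is shown below. Only the `GFX` label comes from the text; the other labels, the luminance and text-area measures, and all thresholds are assumptions.

```python
def report_label(has_source_hit, mean_luma, text_area_ratio,
                 max_luma=0.15, min_text=0.01, max_text=0.25):
    """Label a beat for the report.

    mean_luma        average brightness of the trailer frame, 0..1
    text_area_ratio  share of the frame covered by detected text, 0..1
    """
    if has_source_hit:
        return "MATCH"
    is_dark_card = mean_luma <= max_luma
    has_isolated_text = min_text <= text_area_ratio <= max_text
    # A dark card with a little isolated text is a graphic, not a failed match.
    return "GFX" if is_dark_card and has_isolated_text else "UNMATCHED"
```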
## Reranking pipeline
Before the expensive frame refine, the entire candidate pool is [...] with a
@@ -296,3 +378,4 @@ or the last scorable frame of the same shot, respectively.
Hits below `provisional_content_threshold` are no longer stored or carried over
from old cache results.
Binary file not shown.

Some files were not shown because too many files have changed in this diff.