Compare commits: `e2c30d0062...main` (65 commits)
| SHA1 |
|---|
| fa40821319 |
| 68ec775916 |
| 3b42c5d018 |
| f3c3a9cfd4 |
| e966a4c321 |
| 45b5376cef |
| 4b3894a812 |
| 3ad2b51e56 |
| c16e46fb9d |
| 8ca6d4b696 |
| b771c6792b |
| 6bf3ab6626 |
| 9a5abd5312 |
| b2abdafc7a |
| 02e9fee982 |
| 5425939a84 |
| ed7b083dca |
| ae3c2b1b13 |
| 71117a8a3b |
| c1425003c1 |
| bcaf0417b3 |
| f63d65fcd2 |
| c08ba97d37 |
| a275b2efb6 |
| fab6c53698 |
| c5b7d61451 |
| acafe538b2 |
| 10e27afc8d |
| e335fffe92 |
| bdc9e4ab31 |
| 430a81a988 |
| 5611902eb5 |
| 4eeecca80d |
| 5407f08fbc |
| 0baedb3a17 |
| d83fced8d2 |
| 4fe1d35f1a |
| 730b5ef3c0 |
| f20f89b06b |
| 18c8c89ee6 |
| 9b524c9329 |
| 1e5ffffd91 |
| 8fd0442724 |
| 18a67387f6 |
| 7ffe4adc3b |
| 92a12276ee |
| 64b53c0e82 |
| 8096f9b4d8 |
| e960b1c080 |
| c972894972 |
| 72e22969b4 |
| 9d3c5d5afd |
| 0375580373 |
| e3a4c22b71 |
| c71ed2b701 |
| 49412c54a6 |
| 533ab49d62 |
| f1e9636a83 |
| cd10e2bc03 |
| c118428167 |
| 45769aa366 |
| 3b90905d07 |
| 07f47ebe2b |
| 2f8b0585e2 |
| d287952572 |
@@ -0,0 +1,6 @@
|
|||||||
|
* text=auto
|
||||||
|
.gitattributes text eol=lf
|
||||||
|
*.py text eol=lf
|
||||||
|
*.md text eol=lf
|
||||||
|
*.html text eol=lf
|
||||||
|
*.ps1 text eol=crlf
|
||||||
@@ -30,10 +30,14 @@ proxy/
|
|||||||
*.jpeg
|
*.jpeg
|
||||||
*.png
|
*.png
|
||||||
|
|
||||||
# IDE
|
# IDE / editor
|
||||||
.vscode/
|
.vscode/
|
||||||
.idea/
|
.idea/
|
||||||
*.swp
|
*.swp
|
||||||
|
*.code-workspace
|
||||||
|
|
||||||
|
# Claude Code session data
|
||||||
|
.claude/
|
||||||
|
|
||||||
# OS
|
# OS
|
||||||
.DS_Store
|
.DS_Store
|
||||||
|
|||||||
@@ -1,125 +0,0 @@
|
|||||||
# Handover Notes
|
|
||||||
|
|
||||||
As of 2026-05-03 (beat 20 repair completed).

## Status

- `pytest tests/ -q` → 52/52 green.
- `python cli.py match --beat 20 --vision` runs to completion and writes
  a confirmed match (score 0.6632, scene 613, in=5284.706s, dur=0.88s).
- The previous cache was backed up to `.cache/match_results.json.bak`.
- No open PR; local changes are committed (see the latest commit).
|
|
||||||
## What changed most recently and why

### 1. `cli.py`: `realign_window` picks the action window per segment

In `_filter_semantically_invalid_vision_matches.realign_window`:

- **Before:** `find_action_window_in_scene(action_beat or check_beat, …)`. For
  segmented beats, the whole beat was always used as the semantic context.
  For beat 20 this placed the source position on the kiss phase (5270 s),
  although the *visible* segment only shows "approaching and pulling
  apart", a phase that occurs in the source only around 5284 s.
- **Now:** Two windows are searched (segment description *and* beat
  description). The beat context wins only with a clear score lead (>0.06).
  The trailer offset shift (`visible_content_offset`) is applied only when
  the beat context was actually used; otherwise the segment window already
  points at the correct phase.

Effect for beat 20: 5270.118 → 5284.706, score 0.6449 (provisional) → 0.6632
(confirmed).
|
|
||||||
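The selection rule above can be illustrated with a minimal sketch. It is not the code from `cli.py`: the `Window` class and `pick_window` are hypothetical names, and the assumption that the offset is *added* to the beat window's start is mine; only the 0.06 margin and the "shift only when the beat context wins" rule come from the notes.

```python
from dataclasses import dataclass
from typing import Optional

BEAT_CONTEXT_MARGIN = 0.06  # score lead the beat context needs, per the notes


@dataclass(frozen=True)
class Window:
    """Hypothetical container for one action-window candidate."""
    start_s: float
    score: float


def pick_window(segment: Optional[Window], beat: Optional[Window],
                visible_content_offset: float) -> Optional[float]:
    """Return the source start time chosen from the two candidates."""
    if segment is None and beat is None:
        return None
    if segment is None:
        use_beat = True
    elif beat is None:
        use_beat = False
    else:
        # The beat context only wins with a clear lead over the segment context.
        use_beat = beat.score > segment.score + BEAT_CONTEXT_MARGIN
    if use_beat:
        # Assumption: the beat window covers the whole beat, so it is shifted
        # forward to the visible part. The segment window needs no shift.
        return beat.start_s + visible_content_offset
    return segment.start_s
```

With the beat 20 numbers, the segment window (5284.706, score 0.6632) is kept because the beat window (score 0.6449) has no lead at all.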
### 2. `cli.py`: filter/repair stage is crash-tolerant

`_filter_semantically_invalid_vision_matches` now has its per-result body
extracted into a local function `_filter_repair_one`, wrapped in a try/except.
If the repair aborts (e.g. because the vision API drops out in the middle of
a response), the previously cached hit is kept instead of being discarded
entirely.
|
|
||||||
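As an illustration of the crash-tolerant pattern described above, here is a minimal sketch. `repair_all` and its `repair_one` callback are hypothetical stand-ins; the real `_filter_repair_one` takes more arguments and may also drop results, which this sketch does not model.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def repair_all(results: Iterable[T], repair_one: Callable[[T], T]) -> List[T]:
    """Apply repair_one to every result; on failure keep the cached result."""
    repaired: List[T] = []
    for result in results:
        try:
            repaired.append(repair_one(result))
        except Exception as exc:
            # A failed repair must not lose the match that was already cached.
            log.warning("repair failed, keeping cached result: %s", exc)
            repaired.append(result)
    return repaired
```

The point of the design is that one failing result no longer aborts the whole stage; every other result is still processed.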
### 3. `src/llm/vision_cache.py`: vision retry for read errors

`_call_vision_model` now additionally catches `TimeoutError`,
`socket.timeout`, `ConnectionError` and `OSError` while reading the response
and retries with the same backoff as for HTTP/URL errors. The trigger was a
24-hour DSL disconnect in the middle of a run; before this change, the match
run aborted hard and the cache was left at "no match".
|
|
||||||
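A minimal sketch of such a retry loop follows. The function name `call_with_retry`, the attempt count, and the exponential delay schedule are assumptions for illustration; the notes only say that the existing backoff is reused. In Python 3, `TimeoutError` and `ConnectionError` are subclasses of `OSError` (and `socket.timeout` is an alias of `TimeoutError` since 3.10), so the tuple is redundant but mirrors the list in the notes.

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error types named in the notes; all are OSError subclasses in Python 3.
READ_ERRORS = (TimeoutError, socket.timeout, ConnectionError, OSError)


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on read/network errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except READ_ERRORS:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide
            sleep(base_delay_s * (2 ** attempt))
    # Only reached when attempts < 1, i.e. fn was never called.
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.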
### 4. `README.md`
|
|
||||||
|
|
||||||
Added two short paragraphs describing (1) the segment-vs-beat window selection
and (2) the new crash/network-error behavior.
|
|
||||||
## Not touched, but relevant for the handover

- The **full FFmpeg scan** still yields no confirmed hit for beat 20
  (final score 0.419 < provisional 0.430). The confirmed match comes from
  the action-window repair. This is expected: the visible segment is
  visually very generic (two-shot in profile with a blurred background),
  and the correct phase only stands out through the semantic action
  description.
- The `candidate_points` loop in `realign_window` (lines ~700–765) only
  searches about ±2 s around `start_s`. As long as `start_s` now comes from
  the segment window, the correct source point lies within that range. If
  beats with longer visible islands appear in the future, this range may
  become too narrow; in that case widen the search radius rather than
  reverting the window picking.
- There are **no tests** for `_filter_semantically_invalid_vision_matches`
  or `realign_window`. Anyone touching them should use beat 20 as a live
  smoke test (see below).
|
|
||||||
## Reproduction / smoke test

```powershell
.\.venv\Scripts\Activate.ps1
python cli.py match --beat 20 --vision
```

Expected: `Beat 20: realigned semantically valid long scene by motion/action windows`,
followed by `is_confirmed: true` for beat 20 in
`.cache/match_results.json` with `in_point_s ≈ 5284.7` and `match_score ≥ 0.65`.

If that fails:

1. `python -m pytest tests/ -q`: if red, the codebase itself is broken.
2. Check `.cache/vision_descriptions.json`: the keys
   `beat:20:73.560:74.680:…` and `action_window:613:5282.390:5285.430:…` must
   exist; otherwise vision is called live (costs credits; needs network).
3. Restore `match_results.json.bak` if the cache is corrupted.
|
|
||||||
## Current coverage (before the latest run)

```
total beats: 25
matched: 20 (5 confirmed, 15 provisional)
unmatched: beats 0, 2, 21, 23, 24
```

Beat 0 is the SHO logo (no source match possible, correct).
Beats 22/23/24 have no visible islands (end credits/title), so they are also
correctly unmatched.
Beat 2 and beat 21 are the real recovery candidates; the new
recovery stage tries to pick them up on the next `match` run.
|
|
||||||
## Open risks / known weaknesses

- The `0.06` threshold for "beat context wins" in `realign_window` is
  calibrated on beat 20. The other beats should also be run through before
  any further beats are touched, ideally with a full `python cli.py match`
  without `--beat`, followed by a diff of `match_results.json` against the
  `.bak`.
- The filter/repair stage can run for minutes because of vision calls. This
  is not new, but it is very noticeable when there are network problems.
- The `_filter_repair_one` function has many arguments passed through
  (closure variables from the parent). In a future iteration this could be
  refactored into a small class.
|
|
||||||
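For the suggested diff of `match_results.json` against the `.bak`, a small helper along these lines may be enough. It is only a sketch: it assumes the cache is a JSON list of objects carrying `beat_id`, `in_point_s`, `match_score` and `is_confirmed`, which the notes imply but do not spell out, and `load_by_beat` / `diff_results` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_by_beat(path: Path) -> Dict[int, dict]:
    """Index cached results by beat id (assumed layout: JSON list of objects)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return {int(item["beat_id"]): item for item in data}


def diff_results(old: Dict[int, dict], new: Dict[int, dict],
                 tol: float = 1e-3) -> List[str]:
    """Describe per-beat changes between two result sets."""
    lines: List[str] = []
    for beat_id in sorted(set(old) | set(new)):
        a, b = old.get(beat_id), new.get(beat_id)
        if a is None:
            lines.append(f"beat {beat_id}: added")
        elif b is None:
            lines.append(f"beat {beat_id}: removed")
        else:
            for key in ("in_point_s", "match_score"):
                if abs(float(a[key]) - float(b[key])) > tol:
                    lines.append(f"beat {beat_id}: {key} {a[key]} -> {b[key]}")
            if a.get("is_confirmed") != b.get("is_confirmed"):
                lines.append(
                    f"beat {beat_id}: is_confirmed "
                    f"{a.get('is_confirmed')} -> {b.get('is_confirmed')}"
                )
    return lines
```

Usage would be `diff_results(load_by_beat(Path(".cache/match_results.json.bak")), load_by_beat(Path(".cache/match_results.json")))`, printing one line per changed beat.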
## Useful greps
|
|
||||||
|
|
||||||
- `find_action_window_in_scene`: semantic action-window search (vision).
- `_reference_scoreable_segments`: determines the visible islands of a
  beat.
- `estimate_usable_source_duration`: shortens match clips when the source
  switches to a different phase before the beat ends.
- `_filter_semantically_invalid_vision_matches`: entry point of the
  repair stage in `cli.py`.
@@ -36,6 +36,10 @@ What you get are two files to work with:
|
|||||||
5. For `MAN.` beats, find the matching spot in the feature film yourself; the
|
5. For `MAN.` beats, find the matching spot in the feature film yourself; the
|
||||||
description in the report tells you what to look for.
|
description in the report tells you what to look for.
|
||||||
|
|
||||||
|
For visual checking, **`CUTTER_REPORT.html`** is also relevant:
|
||||||
|
it contains the frame-locked compare clips. The old `match_report.html` is
|
||||||
|
no longer part of the workflow.
|
||||||
|
|
||||||
Everything else below is background for the tool maintainer.
|
Everything else below is background for the tool maintainer.
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -48,7 +52,7 @@ Everything else below is background for the tool maintainer.
|
|||||||
| **1** | Quick vibe check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
|
| **1** | Quick vibe check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
|
||||||
| **2** | Optional: a vision LLM describes uncertain scenes using 3-frame samples; the descriptions are cached. |
|
| **2** | Optional: a vision LLM describes uncertain scenes using 3-frame samples; the descriptions are cached. |
|
||||||
| **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
|
| **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
|
||||||
| **4** | Phase repair: for segmented beats, the motion phase in the source is matched against the visible trailer phase. |
|
| **4** | Phase repair: for segmented beats, the motion phase is matched against the visible trailer phase locally around the found in-point, weighted by saliency and motion. |
|
||||||
| **5** | Recovery: beats without a hit are retried via vision phase search in the top-K scenes. |
|
| **5** | Recovery: beats without a hit are retried via vision phase search in the top-K scenes. |
|
||||||
| **6** | Export as FCPXML 1.10 or CMX-3600 EDL plus `CUTTER_REPORT.md`. |
|
| **6** | Export as FCPXML 1.10 or CMX-3600 EDL plus `CUTTER_REPORT.md`. |
|
||||||
|
|
||||||
@@ -56,6 +60,10 @@ Everything else below is background for the tool maintainer.
|
|||||||
masked out of the comparison so that title cards, logos and letterbox do not
|
masked out of the comparison so that title cards, logos and letterbox do not
|
||||||
skew the hits.
|
skew the hits.
|
||||||
|
|
||||||
|
**Cutter report caching:** Existing compare clips are reused.
|
||||||
|
For targeted rematches, only the affected beat is re-rendered, so the
|
||||||
|
report stays current quickly and no unnecessary video artifacts are regenerated.
|
||||||
|
|
||||||
**Important:** Even when vision is enabled, the final match remains
|
**Important:** Even when vision is enabled, the final match remains
|
||||||
CV-verified. The LLM only provides additional search anchors.
|
CV-verified. The LLM only provides additional search anchors.
|
||||||
|
|
||||||
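The cutter-report caching paragraph in this README hunk (reuse existing compare clips, re-render only the targeted beat) can be sketched as follows. The environment variable `CUTTER_REPORT_FORCE_BEATS` and its comma-separated format appear in the `cli.py` hunk of this diff; how `render_report` actually consumes it is not shown, so `forced_beats` and `needs_render` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Mapping, Set


def forced_beats(env: Mapping[str, str] = os.environ) -> Set[int]:
    """Parse the comma-separated beat ids that must be re-rendered."""
    raw = env.get("CUTTER_REPORT_FORCE_BEATS", "")
    return {int(part) for part in raw.split(",") if part.strip()}


def needs_render(beat_id: int, clip_path: Path, forced: Set[int]) -> bool:
    """Re-render when the beat is forced or no cached clip exists yet."""
    return beat_id in forced or not clip_path.exists()
```

With this rule, a targeted rematch of one beat leaves every other existing clip untouched.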
@@ -159,7 +167,7 @@ if the underlying match has changed.
|
|||||||
| Source clip shows the right scene but the wrong motion phase | `python cli.py rematch --beat N --refine`: shifts the in-point frame-accurately based on the image content. |
|
| Source clip shows the right scene but the wrong motion phase | `python cli.py rematch --beat N --refine`: shifts the in-point frame-accurately based on the image content. |
|
||||||
| Score too low, a different scene would be correct | `python cli.py match --beat N --vision`: full re-match for this beat only, with vision phase check. |
|
| Score too low, a different scene would be correct | `python cli.py match --beat N --vision`: full re-match for this beat only, with vision phase check. |
|
||||||
| Match is obviously the wrong scene | `python cli.py rematch --beat N --threshold 0.50`: lowers the threshold, new global scan for this beat only. |
|
| Match is obviously the wrong scene | `python cli.py rematch --beat N --threshold 0.50`: lowers the threshold, new global scan for this beat only. |
|
||||||
| Beat is black frame / logo / title and should not match at all | do nothing, the status `MAN.` in `CUTTER_REPORT.md` is correct. |
|
| Beat is black frame / logo / title and should not match at all | do nothing, the status `GFX` in `CUTTER_REPORT.md` is correct. |
|
||||||
|
|
||||||
### Algorithmic details
|
### Algorithmic details
|
||||||
|
|
||||||
|
|||||||
@@ -104,10 +104,6 @@ def _auto_commit_push_reports(project_root: "Path") -> None: # type: ignore[nam
|
|||||||
report_globs = [
|
report_globs = [
|
||||||
"CUTTER_REPORT.html",
|
"CUTTER_REPORT.html",
|
||||||
"CUTTER_REPORT.md",
|
"CUTTER_REPORT.md",
|
||||||
"output/report/match_report.html",
|
|
||||||
"output/report/beat_*_compare.mp4",
|
|
||||||
"output/report/beat_*_src.mp4",
|
|
||||||
"output/report/beat_*_ref.mp4",
|
|
||||||
"output/cutter_clips/beat_*_compare.mp4",
|
"output/cutter_clips/beat_*_compare.mp4",
|
||||||
"output/cutter_clips/beat_*_source.mp4",
|
"output/cutter_clips/beat_*_source.mp4",
|
||||||
"output/cutter_clips/beat_*_source_seg*.mp4",
|
"output/cutter_clips/beat_*_source_seg*.mp4",
|
||||||
@@ -135,7 +131,7 @@ def _auto_commit_push_reports(project_root: "Path") -> None: # type: ignore[nam
|
|||||||
log.warning("Auto-commit/push failed (non-fatal): %s", exc)
|
log.warning("Auto-commit/push failed (non-fatal): %s", exc)
|
||||||
|
|
||||||
|
|
||||||
def _regenerate_cutter_report(cfg: "AppConfig") -> None: # type: ignore[name-defined]
|
def _regenerate_cutter_report(cfg: "AppConfig", force_beats: set[int] | None = None) -> None: # type: ignore[name-defined]
|
||||||
"""Re-render CUTTER_REPORT.{md,html} with Frame-Locked Compare clips.
|
"""Re-render CUTTER_REPORT.{md,html} with Frame-Locked Compare clips.
|
||||||
|
|
||||||
Called from every match-style command after the cache is written so all
|
Called from every match-style command after the cache is written so all
|
||||||
@@ -145,10 +141,22 @@ def _regenerate_cutter_report(cfg: "AppConfig") -> None: # type: ignore[name-de
|
|||||||
"""
|
"""
|
||||||
project_root = cfg.paths.cache_dir.parent
|
project_root = cfg.paths.cache_dir.parent
|
||||||
try:
|
try:
|
||||||
|
import os
|
||||||
from scripts.generate_cutter_report import render_report
|
from scripts.generate_cutter_report import render_report
|
||||||
md, html = render_report(project_root, with_stills=True, with_clips=True)
|
old_force = os.environ.get("CUTTER_REPORT_FORCE_BEATS")
|
||||||
|
try:
|
||||||
|
if force_beats:
|
||||||
|
os.environ["CUTTER_REPORT_FORCE_BEATS"] = ",".join(str(b) for b in sorted(force_beats))
|
||||||
|
md, html = render_report(project_root, with_stills=True, with_clips=True)
|
||||||
|
finally:
|
||||||
|
if force_beats:
|
||||||
|
if old_force is None:
|
||||||
|
os.environ.pop("CUTTER_REPORT_FORCE_BEATS", None)
|
||||||
|
else:
|
||||||
|
os.environ["CUTTER_REPORT_FORCE_BEATS"] = old_force
|
||||||
(project_root / "CUTTER_REPORT.md").write_text(md, encoding="utf-8")
|
(project_root / "CUTTER_REPORT.md").write_text(md, encoding="utf-8")
|
||||||
(project_root / "CUTTER_REPORT.html").write_text(html, encoding="utf-8")
|
(project_root / "CUTTER_REPORT.html").write_text(html, encoding="utf-8")
|
||||||
|
|
||||||
logging.getLogger(__name__).info("Cutter report regenerated (md + html + compare clips)")
|
logging.getLogger(__name__).info("Cutter report regenerated (md + html + compare clips)")
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
|
logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
|
||||||
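The save/set/restore handling of `CUTTER_REPORT_FORCE_BEATS` in `_regenerate_cutter_report` could also be expressed as a small context manager. This is a sketch of an alternative, not what the commit does; `temp_env` is a hypothetical helper name.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def temp_env(name: str, value: Optional[str]) -> Iterator[None]:
    """Temporarily set an environment variable, restoring the old state on exit."""
    old = os.environ.get(name)
    if value is not None:
        os.environ[name] = value
    try:
        yield
    finally:
        # Only undo what was changed; a None value means "leave untouched".
        if value is not None:
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

The call site would then read `with temp_env("CUTTER_REPORT_FORCE_BEATS", value): ...`, with `value` built from the sorted beat ids or `None` when no beats are forced.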
@@ -273,9 +281,57 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
|||||||
for result in results:
|
for result in results:
|
||||||
beat = beats_by_id.get(result.beat_id)
|
beat = beats_by_id.get(result.beat_id)
|
||||||
if getattr(result, "segments", ()):
|
if getattr(result, "segments", ()):
|
||||||
segment_duration = sum(max(0.0, float(s.duration_s)) for s in result.segments)
|
segment_threshold = cfg.cv.deep_scan.multi_shot_segment_threshold
|
||||||
|
current_islands = _reference_scoreable_segments(beat, cfg) if beat is not None else []
|
||||||
|
repaired_segments = []
|
||||||
|
source_segments = list(result.segments)
|
||||||
|
if beat is not None and len(source_segments) == 1 and len(current_islands) == 1:
|
||||||
|
island_start_s, island_end_s = current_islands[0]
|
||||||
|
island_duration_s = max(0.0, island_end_s - island_start_s)
|
||||||
|
segment = source_segments[0]
|
||||||
|
if (
|
||||||
|
abs(float(segment.trailer_offset_s) - island_start_s) > 0.04
|
||||||
|
or abs(float(segment.duration_s) - island_duration_s) > 0.08
|
||||||
|
):
|
||||||
|
from dataclasses import replace as _replace
|
||||||
|
source_segments[0] = _replace(
|
||||||
|
segment,
|
||||||
|
trailer_offset_s=island_start_s,
|
||||||
|
duration_s=island_duration_s,
|
||||||
|
out_point_s=float(segment.in_point_s) + island_duration_s,
|
||||||
|
)
|
||||||
|
for segment in source_segments:
|
||||||
|
if float(segment.match_score) < segment_threshold:
|
||||||
|
scene = _scene_by_id_light(scenes, segment.scene_id)
|
||||||
|
if beat is not None and scene is not None:
|
||||||
|
segment_beat = replace(
|
||||||
|
beat,
|
||||||
|
start_s=beat.start_s + float(segment.trailer_offset_s),
|
||||||
|
end_s=beat.start_s + float(segment.trailer_offset_s) + float(segment.duration_s),
|
||||||
|
)
|
||||||
|
probe = _phase_probe_segment_in_scene(
|
||||||
|
segment_beat,
|
||||||
|
scene,
|
||||||
|
float(segment.in_point_s),
|
||||||
|
cfg,
|
||||||
|
)
|
||||||
|
if probe is not None:
|
||||||
|
in_point_s, _phase_score = probe
|
||||||
|
segment = replace(
|
||||||
|
segment,
|
||||||
|
in_point_s=in_point_s,
|
||||||
|
out_point_s=in_point_s + float(segment.duration_s),
|
||||||
|
match_score=max(float(segment.match_score), float(_phase_score)),
|
||||||
|
is_confirmed=float(_phase_score) >= cfg.cv.deep_scan.match_threshold,
|
||||||
|
)
|
||||||
|
repaired_segments.append(segment)
|
||||||
|
|
||||||
|
valid_segments = tuple(repaired_segments)
|
||||||
|
if not valid_segments:
|
||||||
|
continue
|
||||||
|
segment_duration = sum(max(0.0, float(s.duration_s)) for s in valid_segments)
|
||||||
weighted_score = (
|
weighted_score = (
|
||||||
sum(max(0.0, float(s.duration_s)) * float(s.match_score) for s in result.segments)
|
sum(max(0.0, float(s.duration_s)) * float(s.match_score) for s in valid_segments)
|
||||||
/ segment_duration
|
/ segment_duration
|
||||||
if segment_duration > 0 else result.match_score
|
if segment_duration > 0 else result.match_score
|
||||||
)
|
)
|
||||||
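The `weighted_score` expression in this hunk is a duration-weighted mean of the segment scores, falling back to the result's own score when no positive duration is left. A standalone sketch, using plain `(duration_s, match_score)` tuples instead of the project's segment objects (an assumption for illustration):

```python
from typing import Iterable, Tuple


def weighted_score(segments: Iterable[Tuple[float, float]], fallback: float) -> float:
    """Duration-weighted mean of (duration_s, match_score) pairs."""
    # Negative durations are clamped to zero, as in the hunk above.
    clamped = [(max(0.0, float(d)), float(s)) for d, s in segments]
    total = sum(d for d, _ in clamped)
    if total <= 0:
        return fallback
    return sum(d * s for d, s in clamped) / total
```

A 1 s segment scoring 0.8 and a 3 s segment scoring 0.4 average to about 0.5, so long segments dominate the combined score.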
@@ -290,7 +346,15 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
|||||||
coverage = segment_duration / coverage_target
|
coverage = segment_duration / coverage_target
|
||||||
if coverage < cfg.cv.deep_scan.min_duration_coverage:
|
if coverage < cfg.cv.deep_scan.min_duration_coverage:
|
||||||
continue
|
continue
|
||||||
normalized.append(replace(result, match_score=weighted_score))
|
first_segment = valid_segments[0]
|
||||||
|
normalized.append(replace(
|
||||||
|
result,
|
||||||
|
scene_id=first_segment.scene_id,
|
||||||
|
in_point_s=first_segment.in_point_s,
|
||||||
|
out_point_s=first_segment.out_point_s,
|
||||||
|
match_score=weighted_score,
|
||||||
|
segments=valid_segments,
|
||||||
|
))
|
||||||
continue
|
continue
|
||||||
|
|
||||||
if result.match_score < cfg.cv.deep_scan.provisional_match_threshold:
|
if result.match_score < cfg.cv.deep_scan.provisional_match_threshold:
|
||||||
@@ -320,6 +384,7 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
|||||||
|
|
||||||
fps = _scene_fps_light(scene, cfg)
|
fps = _scene_fps_light(scene, cfg)
|
||||||
adjusted_in_s = result.in_point_s
|
adjusted_in_s = result.in_point_s
|
||||||
|
phase_changed = False
|
||||||
scene_changed = int(scene["scene_id"]) != result.scene_id
|
scene_changed = int(scene["scene_id"]) != result.scene_id
|
||||||
starts_before_scene = result.in_point_s < float(scene["start_s"])
|
starts_before_scene = result.in_point_s < float(scene["start_s"])
|
||||||
if scene_changed or starts_before_scene or result.duration_s <= 0.12:
|
if scene_changed or starts_before_scene or result.duration_s <= 0.12:
|
||||||
@@ -328,6 +393,25 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
|||||||
scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
|
scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
|
||||||
fps = _scene_fps_light(scene, cfg)
|
fps = _scene_fps_light(scene, cfg)
|
||||||
|
|
||||||
|
should_phase_probe = (
|
||||||
|
scene_changed
|
||||||
|
or starts_before_scene
|
||||||
|
or not result.is_confirmed
|
||||||
|
or result.match_score < cfg.cv.deep_scan.match_threshold
|
||||||
|
)
|
||||||
|
phase_score = result.match_score
|
||||||
|
if should_phase_probe:
|
||||||
|
probe = _phase_probe_segment_in_scene(beat, scene, adjusted_in_s, cfg)
|
||||||
|
if probe is not None:
|
||||||
|
probed_in_s, probed_score = probe
|
||||||
|
max_shift_s = max(0.12, min(0.75, beat.duration_s * 0.35))
|
||||||
|
if abs(probed_in_s - adjusted_in_s) <= max_shift_s:
|
||||||
|
adjusted_in_s = probed_in_s
|
||||||
|
phase_changed = True
|
||||||
|
phase_score = max(float(result.match_score), float(probed_score))
|
||||||
|
scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
|
||||||
|
fps = _scene_fps_light(scene, cfg)
|
||||||
|
|
||||||
matchable_duration_s = beat.duration_s
|
matchable_duration_s = beat.duration_s
|
||||||
try:
|
try:
|
||||||
from src.cv.global_scan import estimate_matchable_reference_duration
|
from src.cv.global_scan import estimate_matchable_reference_duration
|
||||||
@@ -350,6 +434,7 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
|||||||
if (
|
if (
|
||||||
scene_changed
|
scene_changed
|
||||||
or starts_before_scene
|
or starts_before_scene
|
||||||
|
or phase_changed
|
||||||
or result.duration_s <= 0.12
|
or result.duration_s <= 0.12
|
||||||
or result.out_point_s > adjusted_in_s + max_duration_s + (1.0 / fps)
|
or result.out_point_s > adjusted_in_s + max_duration_s + (1.0 / fps)
|
||||||
):
|
):
|
||||||
@@ -359,6 +444,8 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
|||||||
in_point_s=adjusted_in_s,
|
in_point_s=adjusted_in_s,
|
||||||
out_point_s=adjusted_in_s + max_duration_s,
|
out_point_s=adjusted_in_s + max_duration_s,
|
||||||
in_point_frame=int(adjusted_in_s * fps),
|
in_point_frame=int(adjusted_in_s * fps),
|
||||||
|
match_score=phase_score,
|
||||||
|
is_confirmed=phase_score >= cfg.cv.deep_scan.match_threshold,
|
||||||
)
|
)
|
||||||
|
|
||||||
coverage = (
|
coverage = (
|
||||||
@@ -549,7 +636,7 @@ def _reference_scoreable_segments(beat, cfg) -> list[tuple[float, float]]:
|
|||||||
t = 0.0
|
t = 0.0
|
||||||
while t <= beat.duration_s:
|
while t <= beat.duration_s:
|
||||||
frame = grab_frame_at_path(beat.trailer_path, beat.start_s + t)
|
frame = grab_frame_at_path(beat.trailer_path, beat.start_s + t)
|
||||||
scoreable = frame is not None and _is_scoreable_reference_frame(frame, cfg)
|
scoreable = frame is not None and is_visible(frame)
|
||||||
if scoreable:
|
if scoreable:
|
||||||
if start is None:
|
if start is None:
|
||||||
start = t
|
start = t
|
||||||
@@ -827,7 +914,7 @@ def _merge_best_results(existing: list, candidates: list, cfg) -> list:
|
|||||||
|
|
||||||
|
|
||||||
def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
|
def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
|
||||||
"""Try a vision-led search for beats that ended up without a match.
|
"""Try a vision-led search for beats that ended up weak or unmatched.
|
||||||
|
|
||||||
For each unmatched beat that has scoreable visual content (i.e. not pure
|
For each unmatched beat that has scoreable visual content (i.e. not pure
|
||||||
fade/title-card material), this pass:
|
fade/title-card material), this pass:
|
||||||
@@ -844,7 +931,7 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
Confirmed and provisional matches both stay subject to the same thresholds
|
Confirmed and provisional matches both stay subject to the same thresholds
|
||||||
used elsewhere; this only adds matches that pass the same quality gates.
|
used elsewhere; this only adds matches that pass the same quality gates.
|
||||||
"""
|
"""
|
||||||
if not cfg.vision.enabled or not beats:
|
if not beats:
|
||||||
return results
|
return results
|
||||||
|
|
||||||
from dataclasses import replace
|
from dataclasses import replace
|
||||||
@@ -855,17 +942,28 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision
|
from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
matched_ids = {r.beat_id for r in results}
|
results_by_id = {r.beat_id: r for r in results}
|
||||||
unmatched = [b for b in beats if b.beat_id not in matched_ids]
|
recovery_targets = [
|
||||||
if not unmatched:
|
b for b in beats
|
||||||
|
if (
|
||||||
|
b.beat_id not in results_by_id
|
||||||
|
or (
|
||||||
|
not results_by_id[b.beat_id].is_confirmed
|
||||||
|
and results_by_id[b.beat_id].match_score < cfg.cv.deep_scan.match_threshold
|
||||||
|
)
|
||||||
|
)
|
||||||
|
]
|
||||||
|
if not recovery_targets:
|
||||||
return results
|
return results
|
||||||
|
|
||||||
scenes = build_scene_index(cfg)
|
scenes = build_scene_index(cfg)
|
||||||
if not scenes:
|
if not scenes:
|
||||||
return results
|
return results
|
||||||
|
|
||||||
new_results = list(results)
|
target_ids = {b.beat_id for b in recovery_targets}
|
||||||
for beat in unmatched:
|
new_results = [r for r in results if r.beat_id not in target_ids]
|
||||||
|
replaced_results = {r.beat_id: r for r in results if r.beat_id in target_ids}
|
||||||
|
for beat in recovery_targets:
|
||||||
try:
|
try:
|
||||||
islands = _reference_scoreable_segments(beat, cfg)
|
islands = _reference_scoreable_segments(beat, cfg)
|
||||||
except Exception:
|
except Exception:
|
||||||
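The rewritten target selection in `_recover_unmatched_beats_via_vision` now includes weak matches as well as missing ones. A reduced sketch of that predicate, with a hypothetical `Result` dataclass standing in for the project's result type and plain beat ids instead of beat objects:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in with only the fields the predicate needs."""
    beat_id: int
    match_score: float
    is_confirmed: bool


def recovery_targets(beat_ids: List[int], results: List[Result],
                     match_threshold: float) -> List[int]:
    """Beats with no result, or with an unconfirmed result below the threshold."""
    by_id: Dict[int, Result] = {r.beat_id: r for r in results}
    targets: List[int] = []
    for beat_id in beat_ids:
        result = by_id.get(beat_id)
        if result is None or (
            not result.is_confirmed and result.match_score < match_threshold
        ):
            targets.append(beat_id)
    return targets
```

Note that an unconfirmed result at or above the threshold is not retried, and, per a later hunk, a weak result is put back when recovery finds nothing better.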
@@ -902,6 +1000,79 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
|
|
||||||
scenes_by_id = {s.scene_id: s for s in scenes}
|
scenes_by_id = {s.scene_id: s for s in scenes}
|
||||||
best = None # (score, scene, in_s, dur_s, reason)
|
best = None # (score, scene, in_s, dur_s, reason)
|
||||||
|
try:
|
||||||
|
from src.llm.vision_cache import (
|
||||||
|
_load_cache,
|
||||||
|
_semantic_action_groups,
|
||||||
|
_semantic_match_score,
|
||||||
|
_STRONG_ACTION_GROUPS,
|
||||||
|
)
|
||||||
|
cache = _load_cache(cfg)
|
||||||
|
items = cache.get("items", {})
|
||||||
|
beat_desc = ""
|
||||||
|
if isinstance(items, dict):
|
||||||
|
for item in items.values():
|
||||||
|
if (
|
||||||
|
isinstance(item, dict)
|
||||||
|
and item.get("kind") == "beat"
|
||||||
|
and item.get("item_id") == beat.beat_id
|
||||||
|
):
|
||||||
|
beat_desc = str(item.get("description", ""))
|
||||||
|
break
|
||||||
|
beat_actions = _semantic_action_groups(beat_desc) & _STRONG_ACTION_GROUPS if beat_desc else set()
|
||||||
|
identity_vocab = {
|
||||||
|
"woman", "women", "man", "men", "girl", "boy", "child",
|
||||||
|
"blonde", "hair", "face", "mouth", "eyes", "profile",
|
||||||
|
"close-up", "closeup",
|
||||||
|
}
|
||||||
|
beat_identity = {term for term in identity_vocab if term in beat_desc.lower()}
|
||||||
|
distinctive_identity = {
|
||||||
|
term for term in ("woman", "women", "blonde", "mouth", "face")
|
||||||
|
if term in beat_desc.lower()
|
||||||
|
}
|
||||||
|
if beat_actions and isinstance(items, dict):
|
||||||
|
for item in items.values():
|
||||||
|
if not isinstance(item, dict) or item.get("kind") != "action_window":
|
||||||
|
continue
|
||||||
|
scene = scenes_by_id.get(item.get("item_id"))
|
||||||
|
desc = str(item.get("description", ""))
|
||||||
|
source_actions = _semantic_action_groups(desc)
|
||||||
|
if scene is None or not beat_actions <= source_actions:
|
||||||
|
continue
|
||||||
|
source_text = desc.lower()
|
||||||
|
positive_source_text = source_text.split('"negatives"', 1)[0]
|
||||||
|
identity_overlap = {term for term in beat_identity if term in source_text}
|
||||||
|
if len(beat_identity) >= 2 and len(identity_overlap) < 2:
|
||||||
|
continue
|
||||||
|
if distinctive_identity and not any(term in positive_source_text for term in distinctive_identity):
|
||||||
|
continue
|
||||||
|
if "mouth" in beat_desc.lower() and "mouth" not in positive_source_text:
|
||||||
|
continue
|
||||||
|
if "dark interior" in beat_desc.lower() and (
|
||||||
|
"interior" not in positive_source_text or "dark" not in positive_source_text
|
||||||
|
):
|
||||||
|
continue
|
||||||
|
score, reason = _semantic_match_score(beat_desc, desc)
|
||||||
|
if score < max(0.60, cfg.cv.deep_scan.provisional_match_threshold):
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
in_s = float(item.get("start_s"))
|
||||||
|
out_s = float(item.get("end_s"))
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
continue
|
||||||
|
duration_s = max(0.32, min(anchor_beat.duration_s, out_s - in_s))
|
||||||
|
candidate = (
|
||||||
|
min(0.99, score),
|
||||||
|
scene,
|
||||||
|
in_s,
|
||||||
|
duration_s,
|
||||||
|
f"cached vision action; {reason}",
|
||||||
|
)
|
||||||
|
if best is None or candidate[0] > best[0]:
|
||||||
|
best = candidate
|
||||||
|
except Exception as exc:
|
||||||
|
logger.debug("Beat %d: cached vision fallback failed (%s)", beat.beat_id, exc)
|
||||||
|
|
||||||
seen = set()
|
seen = set()
|
||||||
for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
|
for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
|
||||||
scene = scenes_by_id.get(hit.scene_id)
|
scene = scenes_by_id.get(hit.scene_id)
|
||||||
@@ -928,7 +1099,10 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
)
|
)
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
|
logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
|
||||||
continue
|
aligned_in_s = start_s
|
||||||
|
combined_score = semantic_score
|
||||||
|
content_score = 0.0
|
||||||
|
motion_score = 0.0
|
||||||
aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
|
aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
|
||||||
|
|
||||||
try:
|
try:
|
||||||
@@ -958,6 +1132,8 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
combined_score,
|
combined_score,
|
||||||
min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
|
min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
|
||||||
)
|
)
|
||||||
|
if semantic_score >= max(0.60, cfg.cv.deep_scan.provisional_match_threshold):
|
||||||
|
final_score = max(final_score, semantic_score)
|
||||||
if final_score < cfg.cv.deep_scan.provisional_match_threshold:
|
if final_score < cfg.cv.deep_scan.provisional_match_threshold:
|
||||||
continue
|
continue
|
||||||
candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
|
candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
|
||||||
@@ -965,6 +1141,9 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
best = candidate
|
best = candidate
|
||||||
|
|
||||||
if best is None:
|
if best is None:
|
||||||
|
previous = replaced_results.get(beat.beat_id)
|
||||||
|
if previous is not None:
|
||||||
|
new_results.append(previous)
|
||||||
continue
|
continue
|
||||||
score, scene, aligned_in_s, usable_duration_s, repair_reason = best
|
score, scene, aligned_in_s, usable_duration_s, repair_reason = best
|
||||||
logger.info(
|
logger.info(
|
||||||
@@ -991,6 +1170,97 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
|||||||
return sorted(new_results, key=lambda r: r.beat_id)
|
return sorted(new_results, key=lambda r: r.beat_id)
|
||||||
|
|
||||||
|
|
||||||
|
def _recover_short_lowlight_vibe_matches(results: list, beats: list, cfg) -> list:
|
||||||
|
"""Keep obvious short low-light scene hits as provisional instead of no-match.
|
||||||
|
|
||||||
|
Short blue/dark dialogue shots can be correctly ranked by scene-level
|
||||||
|
histogram/pHash but then rejected by the stricter content aligner because
|
||||||
|
the shot contains little texture, motion blur, or trailer timecode overlay.
|
||||||
|
This fallback only accepts the top vibe scene when it has a clear margin and
|
||||||
|
the local content scan still finds a usable in-point.
|
||||||
|
"""
|
||||||
|
from src.core.models import MatchResult, Scene
|
||||||
|
from src.cv.global_scan import _content_alignment_score, _content_alignment_templates
|
||||||
|
from src.cv.vibe_check import run_vibe_check
|
||||||
|
from src.cv.frame_extractor import open_video
|
||||||
|
|
||||||
|
matched_ids = {r.beat_id for r in results}
|
||||||
|
targets = [b for b in beats if b.beat_id not in matched_ids and b.duration_s <= 2.25]
|
||||||
|
if not targets:
|
||||||
|
return results
|
||||||
|
|
||||||
|
raw_scenes = _load_scene_cache_light(cfg)
|
||||||
|
scenes = [
|
||||||
|
Scene(
|
||||||
|
scene_id=int(s["scene_id"]),
|
||||||
|
source_path=cfg.paths.source_movie,
|
||||||
|
start_s=float(s["start_s"]),
|
||||||
|
end_s=float(s["end_s"]),
|
||||||
|
start_frame=int(s["start_frame"]),
|
||||||
|
end_frame=int(s["end_frame"]),
|
||||||
|
luma_hist=bytes.fromhex(s["luma_hist"]) if s.get("luma_hist") else None,
|
||||||
|
sat_hist=bytes.fromhex(s["sat_hist"]) if s.get("sat_hist") else None,
|
||||||
|
phash=s.get("phash"),
|
||||||
|
)
|
||||||
|
for s in raw_scenes
|
||||||
|
]
|
||||||
|
scenes_by_id = {s.scene_id: s for s in scenes}
|
||||||
|
recovered = list(results)
|
||||||
|
|
||||||
|
with open_video(cfg.paths.source_movie) as cap:
|
||||||
|
for beat in targets:
|
||||||
|
templates = _content_alignment_templates(beat, cfg)
|
||||||
|
if not templates:
|
||||||
|
continue
|
||||||
|
hits = run_vibe_check(
|
||||||
|
beat,
|
||||||
|
scenes,
|
||||||
|
top_k=6,
|
||||||
|
hist_method=cfg.cv.vibe_check.hist_compare_method,
|
||||||
|
phash_max_distance=64,
|
||||||
|
)
|
||||||
|
if len(hits) < 2:
|
||||||
|
continue
|
||||||
|
top, second = hits[0], hits[1]
|
||||||
|
if top.combined_score < 0.74 or top.combined_score - second.combined_score < 0.03:
|
||||||
|
continue
|
||||||
|
scene = scenes_by_id.get(top.scene_id)
|
||||||
|
if scene is None or scene.duration_s < max(0.5, beat.duration_s):
|
||||||
|
continue
|
||||||
|
|
||||||
|
best: tuple[float, float] | None = None
|
||||||
|
scan_end = max(scene.start_s, scene.end_s - beat.duration_s)
|
||||||
|
step_s = 0.12
|
||||||
|
t = scene.start_s
|
||||||
|
while t <= scan_end:
|
||||||
|
score = _content_alignment_score(cap, t, templates, cfg)
|
||||||
|
if best is None or score > best[0]:
|
||||||
|
best = (score, t)
|
||||||
|
t = round(t + step_s, 6)
|
||||||
|
if best is None or best[0] < 0.15:
|
||||||
|
continue
|
||||||
|
|
||||||
|
content_score, in_point_s = best
|
||||||
|
final_score = max(
|
||||||
|
cfg.cv.deep_scan.provisional_match_threshold,
|
||||||
|
min(0.64, top.combined_score * 0.55 + content_score * 0.45),
|
||||||
|
)
|
||||||
|
recovered.append(MatchResult(
|
||||||
|
beat_id=beat.beat_id,
|
||||||
|
scene_id=scene.scene_id,
|
||||||
|
source_path=scene.source_path,
|
||||||
|
in_point_s=in_point_s,
|
||||||
|
out_point_s=in_point_s + beat.duration_s,
|
||||||
|
in_point_frame=int(in_point_s * cfg.export.edl_frame_rate),
|
||||||
|
match_score=final_score,
|
||||||
|
match_location=(0, 0),
|
||||||
|
is_confirmed=False,
|
||||||
|
segments=tuple(),
|
||||||
|
))
|
||||||
|
|
||||||
|
return sorted(recovered, key=lambda r: r.beat_id)
|
||||||
|
|
||||||
|
|
||||||
def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
|
def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
|
||||||
"""Drop vision-enabled matches whose final action phase contradicts the beat."""
|
"""Drop vision-enabled matches whose final action phase contradicts the beat."""
|
||||||
if not cfg.vision.enabled or not results:
|
if not cfg.vision.enabled or not results:
|
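Reviewer note on `_recover_short_lowlight_vibe_matches`: the acceptance logic in this hunk can be read as two gates followed by a clamped blend. The sketch below restates that arithmetic in isolation. The function name `lowlight_fallback_score` and the default `provisional_threshold=0.45` are assumptions for illustration; the diff reads the real threshold from `cfg.cv.deep_scan.provisional_match_threshold`, whose value is not shown here.

```python
from typing import Optional


def lowlight_fallback_score(
    top_score: float,
    second_score: float,
    content_score: float,
    provisional_threshold: float = 0.45,  # assumed value, not taken from the diff
) -> Optional[float]:
    """Return a provisional match score, or None when a gate rejects the hit."""
    # Gate 1: the top vibe hit must be strong and clearly ahead of the runner-up.
    if top_score < 0.74 or top_score - second_score < 0.03:
        return None
    # Gate 2: the local content scan must still find a usable in-point.
    if content_score < 0.15:
        return None
    # Blend both signals, cap at 0.64 so the result stays provisional,
    # then lift it to at least the provisional threshold.
    blended = min(0.64, top_score * 0.55 + content_score * 0.45)
    return max(provisional_threshold, blended)
```

Because of the 0.64 cap, a recovered beat can never outrank a normally confirmed match, which appears consistent with the hunk always setting `is_confirmed=False`.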
||||||
@@ -1366,6 +1636,41 @@ def _attach_visual_segments(results: list, beats: list, cfg) -> list:
|
|||||||
if not segment_matches:
|
if not segment_matches:
|
||||||
continue
|
continue
|
||||||
seg = segment_matches[0]
|
seg = segment_matches[0]
|
||||||
|
if seg.match_score < cfg.cv.deep_scan.multi_shot_segment_threshold:
|
||||||
|
repaired = _local_same_scene_segment_match(
|
||||||
|
segment_beat,
|
||||||
|
beat,
|
||||||
|
start_s,
|
||||||
|
cached + expanded,
|
||||||
|
cfg,
|
||||||
|
)
|
||||||
|
if (
|
||||||
|
repaired is None
|
||||||
|
or repaired.match_score
|
||||||
|
< max(
|
||||||
|
cfg.cv.deep_scan.multi_shot_segment_threshold,
|
||||||
|
seg.match_score + cfg.cv.deep_scan.duration_tie_break_score_delta,
|
||||||
|
)
|
||||||
|
):
|
||||||
|
scenes = _load_scene_cache_light(cfg)
|
||||||
|
scene = _scene_by_id_light(scenes, seg.scene_id)
|
||||||
|
probe = (
|
||||||
|
_phase_probe_segment_in_scene(segment_beat, scene, seg.in_point_s, cfg)
|
||||||
|
if scene is not None else None
|
||||||
|
)
|
||||||
|
if probe is None:
|
||||||
|
continue
|
||||||
|
in_point_s, _phase_score = probe
|
||||||
|
from dataclasses import replace as _replace
|
||||||
|
seg = _replace(
|
||||||
|
seg,
|
||||||
|
in_point_s=in_point_s,
|
||||||
|
out_point_s=in_point_s + seg.duration_s,
|
||||||
|
match_score=max(seg.match_score, _phase_score),
|
||||||
|
is_confirmed=_phase_score >= cfg.cv.deep_scan.match_threshold,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
seg = repaired
|
||||||
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
|
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
|
||||||
segments.append(
|
segments.append(
|
||||||
MatchSegment(
|
MatchSegment(
|
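Reviewer note on the new weak-segment branch in `_attach_visual_segments`: a segment below `multi_shot_segment_threshold` is first sent to the local same-scene repair, and the repaired result is only kept if it clears both the threshold and the original score plus a tie-break delta. A minimal sketch of that comparison follows. The helper name `repair_is_accepted` is hypothetical; the defaults 0.50 and 0.03 mirror the config values visible in this diff.

```python
from typing import Optional


def repair_is_accepted(
    seg_score: float,
    repaired_score: Optional[float],
    segment_threshold: float = 0.50,  # multi_shot_segment_threshold in the config hunk
    tie_break_delta: float = 0.03,    # duration_tie_break_score_delta in the config hunk
) -> bool:
    """True when the repaired segment should replace the weak original."""
    if repaired_score is None:
        return False
    # The repair must pass the strict gate AND beat the original by a margin.
    required = max(segment_threshold, seg_score + tie_break_delta)
    return repaired_score >= required
```

When this returns False, the diff falls back to `_phase_probe_segment_in_scene` and skips the segment entirely if the probe yields nothing.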
||||||
@@ -1466,21 +1771,12 @@ def _match_unmatched_visual_segments(
|
|||||||
start_s=beat.start_s + start_s,
|
start_s=beat.start_s + start_s,
|
||||||
end_s=beat.start_s + end_s,
|
end_s=beat.start_s + end_s,
|
||||||
)
|
)
|
||||||
if island_idx == 0:
|
continuity = _continuity_seed_in_points(
|
||||||
# First island of an unmatched multi-shot beat: search globally
|
beat.beat_id,
|
||||||
# without a continuity bias from the previous beat. Continuity
|
[b if b.beat_id != beat.beat_id else segment_beat for b in beats],
|
||||||
# assumes the shot follows the previous beat in the source, but
|
cached + expanded,
|
||||||
# the lead shot of a multi-shot beat is often an insert cut from
|
cfg,
|
||||||
# a completely different scene. A wrong seed with score 0.92
|
)
|
||||||
# would push the real match out of the refinement candidate pool.
|
|
||||||
continuity = {}
|
|
||||||
else:
|
|
||||||
continuity = _continuity_seed_in_points(
|
|
||||||
beat.beat_id,
|
|
||||||
[b if b.beat_id != beat.beat_id else segment_beat for b in beats],
|
|
||||||
cached + expanded,
|
|
||||||
cfg,
|
|
||||||
)
|
|
||||||
segment_matches = []
|
segment_matches = []
|
||||||
if beat.beat_id not in skip_global_segment_scan_for:
|
if beat.beat_id not in skip_global_segment_scan_for:
|
||||||
segment_matches = _run_segment_match(segment_beat, continuity, cfg, allow_fullscan=True)
|
segment_matches = _run_segment_match(segment_beat, continuity, cfg, allow_fullscan=True)
|
||||||
@@ -1496,7 +1792,10 @@ def _match_unmatched_visual_segments(
|
|||||||
if recovered:
|
if recovered:
|
||||||
rec = recovered[0]
|
rec = recovered[0]
|
||||||
seg_dur = min(max(0.0, end_s - start_s), max(0.0, rec.duration_s))
|
seg_dur = min(max(0.0, end_s - start_s), max(0.0, rec.duration_s))
|
||||||
if seg_dur > 0:
|
if (
|
||||||
|
seg_dur > 0
|
||||||
|
and rec.match_score >= cfg.cv.deep_scan.multi_shot_segment_threshold
|
||||||
|
):
|
||||||
segments.append(MatchSegment(
|
segments.append(MatchSegment(
|
||||||
trailer_offset_s=start_s,
|
trailer_offset_s=start_s,
|
||||||
duration_s=seg_dur,
|
duration_s=seg_dur,
|
||||||
@@ -1518,6 +1817,8 @@ def _match_unmatched_visual_segments(
|
|||||||
segments.append(local_segment)
|
segments.append(local_segment)
|
||||||
continue
|
continue
|
||||||
seg = segment_matches[0]
|
seg = segment_matches[0]
|
||||||
|
if seg.match_score < cfg.cv.deep_scan.multi_shot_segment_threshold:
|
||||||
|
continue
|
||||||
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
|
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
|
||||||
segments.append(
|
segments.append(
|
||||||
MatchSegment(
|
MatchSegment(
|
||||||
@@ -1589,7 +1890,13 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
|
|||||||
cfg.cv.deep_scan.provisional_content_threshold * 0.70,
|
cfg.cv.deep_scan.provisional_content_threshold * 0.70,
|
||||||
cfg.cv.deep_scan.provisional_match_threshold,
|
cfg.cv.deep_scan.provisional_match_threshold,
|
||||||
)
|
)
|
||||||
step_s = max(1.0 / cfg.export.edl_frame_rate, 0.04)
|
# Coarse repair scan over already plausible neighbouring scenes. A frame-step
|
||||||
|
# sweep across long dialogue scenes is slow and can overfit static layouts.
|
||||||
|
step_s = max(
|
||||||
|
cfg.vision.local_scan_step_s,
|
||||||
|
cfg.cv.deep_scan.content_align_sample_step_s,
|
||||||
|
0.25,
|
||||||
|
)
|
||||||
best: tuple[float, float, int] | None = None
|
best: tuple[float, float, int] | None = None
|
||||||
with open_video(cfg.paths.source_movie) as cap:
|
with open_video(cfg.paths.source_movie) as cap:
|
||||||
for scene_id in scene_ids:
|
for scene_id in scene_ids:
|
||||||
@@ -1598,12 +1905,14 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
|
|||||||
continue
|
continue
|
||||||
start_s = max(0.0, float(scene["start_s"]) - 0.25)
|
start_s = max(0.0, float(scene["start_s"]) - 0.25)
|
||||||
end_s = max(start_s, float(scene["end_s"]) - max(0.04, segment_beat.duration_s) + 0.25)
|
end_s = max(start_s, float(scene["end_s"]) - max(0.04, segment_beat.duration_s) + 0.25)
|
||||||
|
max_points = max(4, min(48, int(cfg.vision.local_scan_max_points_per_scene)))
|
||||||
|
scene_step_s = max(step_s, (end_s - start_s) / max_points)
|
||||||
t = start_s
|
t = start_s
|
||||||
while t <= end_s:
|
while t <= end_s:
|
||||||
score = _content_alignment_score(cap, t, templates, cfg)
|
score = _content_alignment_score(cap, t, templates, cfg)
|
||||||
if best is None or score > best[0]:
|
if best is None or score > best[0]:
|
||||||
best = (score, t, int(scene_id))
|
best = (score, t, int(scene_id))
|
||||||
t = round(t + step_s, 6)
|
t = round(t + scene_step_s, 6)
|
||||||
|
|
||||||
if best is None or best[0] < min_score:
|
if best is None or best[0] < min_score:
|
||||||
return None
|
return None
|
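Reviewer note on the coarser scan in `_local_same_scene_segment_match`: the step is now widened per scene so that long dialogue scenes are sampled at a bounded number of points. The sketch below reproduces that stepping under assumptions: `scan_times` is a hypothetical helper, `base_step_s=0.25` stands in for the `max(...)` over the three configured steps, and `max_points_cfg=180` is the `local_scan_max_points_per_scene` value shown in the config hunk.

```python
def scan_times(
    start_s: float,
    end_s: float,
    base_step_s: float = 0.25,
    max_points_cfg: int = 180,
) -> list[float]:
    """Sample times in [start_s, end_s] with a step that grows for long scenes."""
    # The configured point budget is clamped to the range 4..48.
    max_points = max(4, min(48, int(max_points_cfg)))
    # Never step finer than the base step; widen the step for long scenes.
    scene_step_s = max(base_step_s, (end_s - start_s) / max_points)
    times = []
    t = start_s
    while t <= end_s:
        times.append(t)
        # Rounding avoids float drift accumulating across many steps.
        t = round(t + scene_step_s, 6)
    return times
```

With these defaults a 2 s window is sampled every 0.25 s, while a 96 s window is sampled every 2 s, so the number of `_content_alignment_score` calls per scene stays at roughly `max_points + 1` or fewer.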
||||||
@@ -1621,6 +1930,186 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _phase_probe_segment_in_scene(segment_beat, scene: dict, original_in_s: float, cfg):
|
||||||
|
"""Retune a weak multi-shot segment inside its own scene using saliency-weighted frames."""
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
offsets = [0.0, 0.16, 0.32, 0.48, 0.64, 0.80, 0.96, 1.12]
|
||||||
|
size = (160, 90)
|
||||||
|
|
||||||
|
def prepared_gray(frame):
|
||||||
|
if frame is None:
|
||||||
|
return None
|
||||||
|
h, w = frame.shape[:2]
|
||||||
|
frame = frame.copy()
|
||||||
|
# Timecode overlays and letterbox edges are trailer/source-specific and
|
||||||
|
# should not pull the phase toward the wrong moment.
|
||||||
|
frame[: int(h * 0.16), : int(w * 0.32)] = 0
|
||||||
|
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
|
||||||
|
gray = cv2.resize(gray, size)
|
||||||
|
return cv2.equalizeHist(gray).astype("float32") / 255.0
|
||||||
|
|
||||||
|
def edge(gray):
|
||||||
|
return cv2.Canny((gray * 255).astype("uint8"), 45, 130).astype("float32") / 255.0
|
||||||
|
|
||||||
|
def pair_score(ref_gray, src_gray, mask):
|
||||||
|
if ref_gray is None or src_gray is None:
|
||||||
|
return None
|
||||||
|
pixel = 1.0 - float((np.abs(ref_gray - src_gray) * mask).sum())
|
||||||
|
edge_score = 1.0 - float((np.abs(edge(ref_gray) - edge(src_gray)) * mask).sum())
|
||||||
|
return 0.65 * pixel + 0.35 * edge_score
|
||||||
|
|
||||||
|
def frame_at(cap, t_s):
|
||||||
|
cap.set(cv2.CAP_PROP_POS_MSEC, t_s * 1000.0)
|
||||||
|
ok, frame = cap.read()
|
||||||
|
return frame if ok else None
|
||||||
|
|
||||||
|
trailer_cap = cv2.VideoCapture(str(cfg.paths.reference_trailer))
|
||||||
|
ref_candidates = []
|
||||||
|
fallback_items = []
|
||||||
|
for offset in offsets:
|
||||||
|
if offset > segment_beat.duration_s + 0.04:
|
||||||
|
continue
|
||||||
|
frame = frame_at(trailer_cap, segment_beat.start_s + offset)
|
||||||
|
ref = prepared_gray(frame)
|
||||||
|
if ref is None:
|
||||||
|
continue
|
||||||
|
fallback_items.append((offset, ref))
|
||||||
|
raw_gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), size)
|
||||||
|
h, w = raw_gray.shape[:2]
|
||||||
|
raw_gray[: int(h * 0.16), : int(w * 0.32)] = 0
|
||||||
|
roi = raw_gray[int(h * 0.12) : int(h * 0.90), :]
|
||||||
|
mean_luma = float(roi.mean() / 255.0)
|
||||||
|
p90_luma = float(np.percentile(roi, 90) / 255.0)
|
||||||
|
contrast = float(roi.std() / 255.0)
|
||||||
|
ref_candidates.append((offset, ref, mean_luma, p90_luma, contrast))
|
||||||
|
|
||||||
|
transition_start = False
|
||||||
|
ref_items = []
|
||||||
|
if ref_candidates:
|
||||||
|
max_mean = max(item[2] for item in ref_candidates)
|
||||||
|
max_p90 = max(item[3] for item in ref_candidates)
|
||||||
|
transition_start = (
|
||||||
|
ref_candidates[0][2] < max_mean * 0.90
|
||||||
|
or ref_candidates[0][3] < max_p90 * 0.90
|
||||||
|
)
|
||||||
|
ref_items = [
|
||||||
|
(offset, ref)
|
||||||
|
for offset, ref, mean_luma, p90_luma, contrast in ref_candidates
|
||||||
|
if (
|
||||||
|
mean_luma >= max(0.16, max_mean * 0.82)
|
||||||
|
and p90_luma >= max(0.28, max_p90 * 0.86)
|
||||||
|
and contrast >= 0.035
|
||||||
|
)
|
||||||
|
]
|
||||||
|
if len(ref_items) < 4:
|
||||||
|
ref_items = fallback_items
|
||||||
|
if len(ref_items) < 4:
|
||||||
|
return None
|
||||||
|
ref_offsets = [item[0] for item in ref_items]
|
||||||
|
refs = [item[1] for item in ref_items]
|
||||||
|
|
||||||
|
align_offset = ref_offsets[0]
|
||||||
|
ref_offsets = [offset - align_offset for offset in ref_offsets]
|
||||||
|
|
||||||
|
ref_stack = np.stack(refs, axis=0)
|
||||||
|
edge_stack = np.stack([edge(ref) for ref in refs], axis=0)
|
||||||
|
# Static window/room edges are useful for finding the scene, but toxic for
|
||||||
|
# phase retuning inside a repeated dialogue shot. Bias the mask toward
|
||||||
|
# areas that actually change across the reference segment.
|
||||||
|
saliency = ref_stack.std(axis=0) * 3.0 + edge_stack.std(axis=0) * 0.75 + edge_stack.mean(axis=0) * 0.15
|
||||||
|
saliency[:, : int(size[0] * 0.12)] *= 0.15
|
||||||
|
saliency[: int(size[1] * 0.16), : int(size[0] * 0.32)] = 0.0
|
||||||
|
threshold = np.quantile(saliency, 0.66)
|
||||||
|
mask = (saliency >= threshold).astype("float32")
|
||||||
|
mask /= mask.sum() + 1e-6
|
||||||
|
|
||||||
|
scene_start = float(scene["start_s"])
|
||||||
|
scene_end = float(scene["end_s"])
|
||||||
|
center_t = max(scene_start, min(scene_end, original_in_s + align_offset))
|
||||||
|
retune_radius_s = max(4.0, min(12.0, segment_beat.duration_s * 2.5))
|
||||||
|
scan_start = max(scene_start, center_t - retune_radius_s)
|
||||||
|
scene_scan_end = min(scene_end, center_t + retune_radius_s)
|
||||||
|
scan_end = max(scan_start, scene_scan_end - max(0.04, segment_beat.duration_s - align_offset))
|
||||||
|
max_points = 400
|
||||||
|
step_s = max(0.04, (scan_end - scan_start) / max_points)
|
||||||
|
|
||||||
|
source_cap = cv2.VideoCapture(str(cfg.paths.source_movie))
|
||||||
|
source_fps = source_cap.get(cv2.CAP_PROP_FPS) or _scene_fps_light(scene, cfg)
|
||||||
|
stride = max(1, int(round(step_s * source_fps)))
|
||||||
|
start_frame = max(0, int(round(scan_start * source_fps)))
|
||||||
|
end_frame = max(start_frame, int(round(scene_scan_end * source_fps)))
|
||||||
|
times: list[float] = []
|
||||||
|
source_frames: list = []
|
||||||
|
frame_idx = start_frame
|
||||||
|
while frame_idx <= end_frame:
|
||||||
|
source_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
|
||||||
|
ok, frame = source_cap.read()
|
||||||
|
if not ok:
|
||||||
|
break
|
||||||
|
times.append(frame_idx / source_fps)
|
||||||
|
source_frames.append(prepared_gray(frame))
|
||||||
|
frame_idx += stride
|
||||||
|
base_time = times[0] if times else scan_start
|
||||||
|
|
||||||
|
candidates: list[tuple[float, float, float]] = []
|
||||||
|
for i, t in enumerate(times):
|
||||||
|
if t > scan_end:
|
||||||
|
break
|
||||||
|
vals = []
|
||||||
|
src_for_offsets = []
|
||||||
|
for offset, ref in zip(ref_offsets, refs):
|
||||||
|
j = int(round((t + offset - base_time) / step_s))
|
||||||
|
if 0 <= j < len(source_frames):
|
||||||
|
src = source_frames[j]
|
||||||
|
score = pair_score(ref, src, mask)
|
||||||
|
else:
|
||||||
|
src = None
|
||||||
|
score = None
|
||||||
|
if score is not None:
|
||||||
|
vals.append(score)
|
||||||
|
src_for_offsets.append(src)
|
||||||
|
if len(vals) >= 4:
|
||||||
|
avg_score = sum(vals) / len(vals)
|
||||||
|
early_count = min(2, len(vals))
|
||||||
|
tail_count = min(2, len(vals))
|
||||||
|
early_score = sum(vals[:early_count]) / early_count
|
||||||
|
tail_score = sum(vals[-tail_count:]) / tail_count
|
||||||
|
motion_vals = []
|
||||||
|
for idx in range(1, min(len(refs), len(src_for_offsets))):
|
||||||
|
if src_for_offsets[idx - 1] is None or src_for_offsets[idx] is None:
|
||||||
|
continue
|
||||||
|
ref_motion = refs[idx] - refs[idx - 1]
|
||||||
|
src_motion = src_for_offsets[idx] - src_for_offsets[idx - 1]
|
||||||
|
motion_vals.append(1.0 - float((np.abs(ref_motion - src_motion) * mask).sum()))
|
||||||
|
motion_score = sum(motion_vals) / len(motion_vals) if motion_vals else avg_score
|
||||||
|
# Phase retuning must reject "same shot, wrong moment" matches.
|
||||||
|
# A plain average can hide a bad onset inside slow dialogue shots;
|
||||||
|
# keep the low-water mark, onset, and frame-to-frame motion influential.
|
||||||
|
phase_score = (
|
||||||
|
0.26 * avg_score
|
||||||
|
+ 0.24 * min(vals)
|
||||||
|
+ 0.24 * early_score
|
||||||
|
+ 0.08 * tail_score
|
||||||
|
+ 0.18 * motion_score
|
||||||
|
)
|
||||||
|
candidates.append((phase_score, min(vals), t))
|
||||||
|
|
||||||
|
if not candidates:
|
||||||
|
return None
|
||||||
|
|
||||||
|
candidates.sort(reverse=True)
|
||||||
|
best_score = candidates[0][0]
|
||||||
|
tie_window = 0.006 if transition_start else 0.002
|
||||||
|
near_tie = [c for c in candidates if c[0] >= best_score - tie_window]
|
||||||
|
if transition_start:
|
||||||
|
chosen = max(near_tie, key=lambda c: (c[1], c[0]))
|
||||||
|
else:
|
||||||
|
chosen = min(near_tie, key=lambda c: abs((c[2] - align_offset) - original_in_s))
|
||||||
|
return max(scene_start, chosen[2] - align_offset), chosen[0]
|
||||||
|
|
||||||
|
|
||||||
def cmd_match(args: argparse.Namespace, cfg) -> list:
|
def cmd_match(args: argparse.Namespace, cfg) -> list:
|
||||||
from src.pipeline.matcher import run_matching
|
from src.pipeline.matcher import run_matching
|
||||||
from dataclasses import replace
|
from dataclasses import replace
|
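Reviewer note on the scoring in `_phase_probe_segment_in_scene`: the comment in the hunk says a plain average can hide a bad onset, so the score also weights the minimum, the first two samples, the last two samples, and a motion term. The sketch below isolates that weighted sum. `phase_score` as a standalone function is an illustration, not the code from the diff; as in the hunk, the motion term falls back to the average when no motion values are available.

```python
from typing import Optional, Sequence


def phase_score(vals: Sequence[float], motion_score: Optional[float] = None) -> Optional[float]:
    """Weighted phase score over per-offset similarity values."""
    if len(vals) < 4:
        return None  # the hunk requires at least four comparable offsets
    avg = sum(vals) / len(vals)
    early = sum(vals[:2]) / 2   # onset of the segment
    tail = sum(vals[-2:]) / 2   # end of the segment
    motion = avg if motion_score is None else motion_score
    # The weights sum to 1.0; onset and low-water mark count far more than the tail.
    return (
        0.26 * avg
        + 0.24 * min(vals)
        + 0.24 * early
        + 0.08 * tail
        + 0.18 * motion
    )
```

Two candidates with the same average and minimum are therefore separated by where the weak frames sit: `[0.2, 0.2, 0.9, 0.9]` scores lower than `[0.9, 0.9, 0.2, 0.2]` because the onset weight (0.24) is three times the tail weight (0.08).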
||||||
@@ -1694,6 +2183,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
|
|||||||
results = _attach_visual_segments(results, beats, cfg)
|
results = _attach_visual_segments(results, beats, cfg)
|
||||||
results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
|
results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
|
||||||
results = _recover_unmatched_beats_via_vision(results, beats, cfg)
|
results = _recover_unmatched_beats_via_vision(results, beats, cfg)
|
||||||
|
results = _recover_short_lowlight_vibe_matches(results, beats, cfg)
|
||||||
|
|
||||||
# A targeted one-beat match must NEVER delete or modify any other beat's
|
# A targeted one-beat match must NEVER delete or modify any other beat's
|
||||||
# cache entry. We deliberately re-load the raw cache from disk here so
|
# cache entry. We deliberately re-load the raw cache from disk here so
|
||||||
@@ -1720,7 +2210,8 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
|
|||||||
results_to_save = results
|
results_to_save = results
|
||||||
|
|
||||||
_save_results(results_to_save, cfg)
|
_save_results(results_to_save, cfg)
|
||||||
_regenerate_cutter_report(cfg)
|
force_report_beats = {int(args.beat)} if getattr(args, "beat", None) is not None else None
|
||||||
|
_regenerate_cutter_report(cfg, force_beats=force_report_beats)
|
||||||
|
|
||||||
print(f"\n✅ {len(results)} / {len(beats)} beats matched.")
|
print(f"\n✅ {len(results)} / {len(beats)} beats matched.")
|
||||||
for r in results:
|
for r in results:
|
||||||
@@ -1890,17 +2381,12 @@ def cmd_rematch(args: argparse.Namespace, cfg) -> None:
|
|||||||
|
|
||||||
|
|
||||||
def cmd_report(args: argparse.Namespace, cfg) -> None:
|
def cmd_report(args: argparse.Namespace, cfg) -> None:
|
||||||
from src.pipeline.reporter import generate_report
|
if getattr(args, "beat", None) is not None:
|
||||||
beats = _select_beats(_load_beats(cfg), getattr(args, "beat", None))
|
print(f"\n⚠️ Generating cutter report for all beats (ignoring --beat {args.beat}).")
|
||||||
beat_ids = {b.beat_id for b in beats} if getattr(args, "beat", None) is not None else None
|
|
||||||
results = _select_results(_normalize_cached_results(_load_beats(cfg), _load_results(cfg), cfg), beat_ids)
|
_regenerate_cutter_report(cfg)
|
||||||
out = generate_report(beats, results, cfg)
|
project_root = cfg.paths.cache_dir.parent
|
||||||
if getattr(args, "beat", None) is not None and not results:
|
print(f"\n✅ Report → {project_root / 'CUTTER_REPORT.html'} and CUTTER_REPORT.md")
|
||||||
print(
|
|
||||||
f"\n⚠️ Beat {args.beat} has no cached match yet. "
|
|
||||||
f"Run: python cli.py match --beat {args.beat}"
|
|
||||||
)
|
|
||||||
print(f"\n\u2705 Report \u2192 {out}")
|
|
||||||
|
|
||||||
|
|
||||||
def cmd_export(args: argparse.Namespace, cfg) -> None:
|
def cmd_export(args: argparse.Namespace, cfg) -> None:
|
||||||
@@ -1941,6 +2427,141 @@ def cmd_run(args: argparse.Namespace, cfg) -> None:
|
|||||||
cmd_export(args, cfg)
|
cmd_export(args, cfg)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_preview(args: argparse.Namespace, cfg) -> None:
|
||||||
|
"""Assemble a rough preview video from cached source matches, with original audio."""
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
results_path = _results_cache_path(cfg)
|
||||||
|
if not results_path.exists():
|
||||||
|
log.error("No match_results.json — run 'match' first.")
|
||||||
|
return
|
||||||
|
|
||||||
|
data = sorted(
|
||||||
|
json.loads(results_path.read_text(encoding="utf-8")),
|
||||||
|
key=lambda r: r["beat_id"],
|
||||||
|
)
|
||||||
|
|
||||||
|
beats_path = cfg.paths.cache_dir / "trailer_beats.json"
|
||||||
|
beats_by_id: dict = {}
|
||||||
|
if beats_path.exists():
|
||||||
|
for b in json.loads(beats_path.read_text(encoding="utf-8")):
|
||||||
|
beats_by_id[int(b["beat_id"])] = b
|
||||||
|
|
||||||
|
clip_width = 1280
|
||||||
|
fps = 25
|
||||||
|
out_dir = cfg.paths.output_dir / "preview_clips"
|
||||||
|
out_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
preview_out = cfg.paths.output_dir / "preview.mp4"
|
||||||
|
|
||||||
|
def _run(cmd: list, timeout: int = 120) -> bool:
|
||||||
|
r = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
|
||||||
|
if r.returncode != 0:
|
||||||
|
log.debug("ffmpeg stderr: %s", r.stderr[-600:])
|
||||||
|
return r.returncode == 0
|
||||||
|
|
||||||
|
def extract_with_audio(src: Path, start_s: float, duration_s: float, out: Path) -> bool:
|
||||||
|
preroll = 2.0 if start_s >= 2.0 else 0.0
|
||||||
|
input_seek = max(0.0, start_s - preroll)
|
||||||
|
accurate_seek = start_s - input_seek
|
||||||
|
return _run([
|
||||||
|
"ffmpeg", "-y", "-loglevel", "error",
|
||||||
|
"-ss", f"{input_seek:.3f}", "-i", str(src),
|
||||||
|
"-ss", f"{accurate_seek:.3f}", "-t", f"{max(0.04, duration_s):.3f}",
|
||||||
|
"-map", "0:v:0", "-map", "0:a:0",
|
||||||
|
"-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
|
||||||
|
"-vf", f"fps={fps},scale={clip_width}:-2,setsar=1,setpts=PTS-STARTPTS",
|
||||||
|
"-c:a", "aac", "-ar", "48000", "-ac", "2",
|
||||||
|
"-pix_fmt", "yuv420p", "-movflags", "+faststart", str(out),
|
||||||
|
])
|
||||||
|
|
||||||
|
def black_silence(duration_s: float, out: Path) -> bool:
|
||||||
|
return _run([
|
||||||
|
"ffmpeg", "-y", "-loglevel", "error",
|
||||||
|
"-f", "lavfi", "-i", f"color=black:s={clip_width}x720:r={fps}",
|
||||||
|
"-f", "lavfi", "-i", "anullsrc=r=48000:cl=stereo",
|
||||||
|
"-t", f"{max(0.5, duration_s):.3f}",
|
||||||
|
"-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
|
||||||
|
"-c:a", "aac", "-pix_fmt", "yuv420p", "-movflags", "+faststart", str(out),
|
||||||
|
])
|
||||||
|
|
||||||
|
def concat_clips(parts: list[Path], out: Path) -> bool:
|
||||||
|
lst = out.with_suffix(".txt")
|
||||||
|
lst.write_text(
|
||||||
|
"\n".join(f"file '{p.resolve().as_posix()}'" for p in parts),
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
ok = _run([
|
||||||
|
"ffmpeg", "-y", "-loglevel", "error",
|
||||||
|
"-f", "concat", "-safe", "0", "-i", str(lst),
|
||||||
|
"-c", "copy", str(out),
|
||||||
|
], timeout=300)
|
||||||
|
lst.unlink(missing_ok=True)
|
||||||
|
return ok
|
||||||
|
|
||||||
|
beat_clips: list[Path] = []
|
||||||
|
|
||||||
|
for rec in data:
|
||||||
|
bid = int(rec["beat_id"])
|
||||||
|
segs = rec.get("segments", [])
|
||||||
|
src = Path(rec["source_path"]) if rec.get("source_path") else None
|
||||||
|
clip_out = out_dir / f"beat_{bid:02d}.mp4"
|
||||||
|
|
||||||
|
if src is None or not src.exists():
|
||||||
|
beat = beats_by_id.get(bid, {})
|
||||||
|
dur = max(0.5, float(beat.get("end_s", 1)) - float(beat.get("start_s", 0)))
|
||||||
|
log.info("Beat %02d: NO MATCH — black/silence %.2fs", bid, dur)
|
||||||
|
if black_silence(dur, clip_out):
|
||||||
|
beat_clips.append(clip_out)
|
||||||
|
continue
|
||||||
|
|
||||||
|
if len(segs) >= 2:
|
||||||
|
parts: list[Path] = []
|
||||||
|
for idx, seg in enumerate(segs):
|
||||||
|
in_s = float(seg["in_point_s"])
|
||||||
|
dur = max(0.04, float(seg["out_point_s"]) - in_s)
|
||||||
|
seg_src = Path(seg["source_path"]) if seg.get("source_path") else src
|
||||||
|
part = out_dir / f"beat_{bid:02d}_seg{idx:02d}.mp4"
|
||||||
|
log.info("Beat %02d seg%d: scene=%s %.2fs–%.2fs", bid, idx, seg.get("scene_id"), in_s, in_s + dur)
|
||||||
|
if extract_with_audio(seg_src, in_s, dur, part):
|
||||||
|
parts.append(part)
|
||||||
|
if not parts:
|
||||||
|
log.warning("Beat %02d: no segments extracted", bid)
|
||||||
|
continue
|
||||||
|
if len(parts) == 1:
|
||||||
|
parts[0].rename(clip_out)
|
||||||
|
beat_clips.append(clip_out)
|
||||||
|
else:
|
||||||
|
if concat_clips(parts, clip_out):
|
||||||
|
beat_clips.append(clip_out)
|
||||||
|
for p in parts:
|
||||||
|
p.unlink(missing_ok=True)
|
||||||
|
else:
|
||||||
|
in_s = float(rec["in_point_s"])
|
||||||
|
beat = beats_by_id.get(bid, {})
|
||||||
|
beat_dur = float(beat["end_s"]) - float(beat["start_s"]) if beat else 0.0
|
||||||
|
source_dur = float(rec["out_point_s"]) - in_s
|
||||||
|
dur = max(0.04, beat_dur if beat_dur > 0.04 else source_dur)
|
||||||
|
log.info("Beat %02d: scene=%s %.2fs+%.2fs (trailer=%.2fs src=%.2fs)", bid, rec.get("scene_id"), in_s, dur, beat_dur, source_dur)
|
||||||
|
if extract_with_audio(src, in_s, dur, clip_out):
|
||||||
|
beat_clips.append(clip_out)
|
||||||
|
else:
|
||||||
|
log.warning("Beat %02d: extraction failed", bid)
|
||||||
|
|
||||||
|
if not beat_clips:
|
||||||
|
log.error("No clips extracted — aborting.")
|
||||||
|
return
|
||||||
|
|
||||||
|
log.info("Concatenating %d beat clips → %s", len(beat_clips), preview_out)
|
||||||
|
if concat_clips(beat_clips, preview_out):
|
||||||
|
size_mb = preview_out.stat().st_size / 1_048_576
|
||||||
|
log.info("Preview ready: %s (%.1f MB)", preview_out, size_mb)
|
||||||
|
print(f"\n Preview → {preview_out} ({size_mb:.1f} MB)")
|
||||||
|
else:
|
||||||
|
log.error("Final concat failed — per-beat clips are in %s", out_dir)
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Argument parser
|
# Argument parser
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -2011,6 +2632,12 @@ def _build_parser() -> argparse.ArgumentParser:
|
|||||||
p_run.add_argument("--beat", type=int,
|
p_run.add_argument("--beat", type=int,
|
||||||
help="Run match/report/export for only one cached beat")
|
help="Run match/report/export for only one cached beat")
|
||||||
|
|
||||||
|
# preview
|
||||||
|
sub.add_parser(
|
||||||
|
"preview",
|
||||||
|
help="Build output/preview.mp4 from cached matches — source clips with audio in beat order",
|
||||||
|
)
|
||||||
|
|
||||||
return parser
|
return parser
|
||||||
|
|
||||||
|
|
||||||
@@ -2035,6 +2662,7 @@ def main() -> None:
|
|||||||
"report": cmd_report,
|
"report": cmd_report,
|
||||||
"export": cmd_export,
|
"export": cmd_export,
|
||||||
"run": cmd_run,
|
"run": cmd_run,
|
||||||
|
"preview": cmd_preview,
|
||||||
}
|
}
|
||||||
|
|
||||||
handler = dispatch[args.command]
|
handler = dispatch[args.command]
|
||||||
|
|||||||
@@ -8,7 +8,7 @@
 [project]
 name = "AI Trailer Generator v2"
 version = "2.0.0"
-log_level = "INFO" # DEBUG | INFO | WARNING | ERROR
+log_level = "DEBUG" # DEBUG | INFO | WARNING | ERROR
 
 # -----------------------------------------------------------------------------
 # [paths] — External video sources (read-only access)
@@ -86,7 +86,10 @@ span_score_weight = 0.15
 coarse_score_weight = 0.10
 duration_score_weight = 0.20
 duration_tie_break_score_delta = 0.03
-min_duration_coverage = 0.65
+min_duration_coverage = 0.55
+# Every visible sub-shot in a multi-shot beat must pass this stricter gate.
+# A weak segment is left unmatched instead of being hidden by a strong neighbor.
+multi_shot_segment_threshold = 0.50
 continuity_seed_offsets_s = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
 scene_seed_top_k = 30
 scene_seed_points_per_scene = 6
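A minimal sketch of the per-segment gate that the new config comment describes. It assumes each sub-shot carries its own score and that a score equal to the threshold passes; the hunk does not say whether the comparison is inclusive. `Segment` and `gate_segments` are hypothetical names for illustration, not the project's API.

```python
from dataclasses import dataclass
from typing import Optional

MULTI_SHOT_SEGMENT_THRESHOLD = 0.50  # value taken from the config hunk above


@dataclass
class Segment:
    """Hypothetical per-sub-shot match result; not the project's real type."""
    index: int
    score: float
    source_in_s: Optional[float]


def gate_segments(segments: list[Segment],
                  threshold: float = MULTI_SHOT_SEGMENT_THRESHOLD) -> list[Segment]:
    """Judge every sub-shot on its own score and unmatch the weak ones."""
    gated = []
    for seg in segments:
        if seg.score >= threshold:  # inclusive comparison is an assumption
            gated.append(seg)
        else:
            # Left unmatched: a strong neighbour does not vouch for this segment.
            gated.append(Segment(seg.index, seg.score, None))
    return gated


beat = [Segment(0, 0.82, 412.4), Segment(1, 0.31, 415.0)]
result = gate_segments(beat)
```

Here the second sub-shot loses its source in-point even though the first one scored well, which is the behaviour the comment asks for.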
@@ -183,7 +186,7 @@ local_scan_step_s = 0.12
 local_scan_max_points_per_scene = 180
 local_scan_top_candidates = 36
 local_scan_tie_break_score_delta = 0.08
-multi_shot_cut_corr_threshold = 0.20
+multi_shot_cut_corr_threshold = 0.55
 multi_shot_boundary_tolerance_s = 0.20
 fullscan_fallback = false
 content_threshold = 0.22
@@ -132,8 +132,33 @@ already aligned to the visible action phase.
 The segment offset counts only across preceding scorable image islands, not across
 black or fade gaps. After retiming, the usable source
 duration is estimated again; if the source runs into a visibly different
-action phase at the end, the clip is shortened and the rest stays placeholder/fade
-instead of showing a wrong moment of motion.
+action phase at the end, the match is clearly flagged as phase-critical
+in the cutter report. Black/placeholder is used only for genuinely unmatched
+trailer regions or fades, not to hide visible candidate motion in the
+review.
+
+This span estimate is stricter than the coarse search score: a nearly static
+opening must not rescue a match if later frames visibly drift into a
+different gesture, body position, or an entering figure. Stable
+score plateaus may extend the span only while they stay close enough to the
+initial level; otherwise the match stays provisional and must be searched again
+or checked visually. The review clip keeps showing the candidate so that
+phase errors are not masked by black.
+
+Multi-shot beats are additionally subject to a segment threshold per visible
+shot. A good first segment must not carry a second segment with a weak
+score. Segments below `multi_shot_segment_threshold` are not treated as
+stable truth; they are re-adjusted within the same plausible source scene.
+The re-adjustment uses a saliency-weighted multi-frame check: timecodes and
+static border regions are down-weighted, while high-contrast image regions
+that remain distinguishable across several trailer frames count more. This
+lets a weak second shot be repaired with better phase accuracy, without masking
+the error with a black frame or curating a beat manually.
+
+The cutter report uses clip caching. Existing compare clips are reused; for
+targeted rematches only the affected beat is re-rendered
+(`CUTTER_REPORT_FORCE_BEATS`). This keeps the report current without re-encoding
+every beat each time.
 
 ## Vision seeds vs. full scan
 
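The clip-caching behaviour described in this hunk could look roughly like the following. It is a sketch only: the text names the `CUTTER_REPORT_FORCE_BEATS` variable but not its format, so a comma-separated list of beat numbers is assumed, and `parse_force_beats` and `clips_to_render` are hypothetical helpers.

```python
import os


def parse_force_beats(raw: str) -> set[int]:
    """Parse e.g. "20, 21" into {20, 21}; the comma format is an assumption."""
    return {int(tok) for tok in raw.split(",") if tok.strip()}


def clips_to_render(beats: list[int], cached: set[int], force: set[int]) -> list[int]:
    """Return the beats whose compare clip is missing or explicitly forced.

    `cached` holds the beat numbers that already have a compare clip on disk.
    """
    return [beat for beat in beats if beat in force or beat not in cached]


# An unset variable yields an empty set, so nothing is forced by default.
force = parse_force_beats(os.environ.get("CUTTER_REPORT_FORCE_BEATS", ""))
todo = clips_to_render([19, 20, 21], cached={19, 20}, force={20})
```

With beats 19 and 20 cached and beat 20 forced, only beats 20 and 21 are rendered again; beat 19 is reused from the cache.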
@@ -165,6 +190,56 @@ a short gesture is first recognized correctly and then shifted to a later
 similar body posture. If several vision candidates in
 the same source scene score similarly well and cover the beat duration,
 the matcher prefers the earlier phase.
+Vision recovery runs not only for completely missing beats but also for
+weak unconfirmed matches. Low-light beats in particular must not stay stuck
+on a wrong dark CV match when the cache knows a semantically better
+action phase.
+For long source scenes, the action-window search always checks the scene start
+and several early windows before it samples evenly across the whole scene.
+That way short trailer actions at the start of a long scene are not lost
+when the rest of the scene consists of credits, black frames, or quiet
+follow-up frames.
+If an action window explicitly contains the strong beat action, it may have a
+slightly lower text similarity; the action then counts more than
+secondary words about light, framing, or mood.
+Action windows already cached for a scene remain valid candidates even when
+the current sampling grid changes. The matcher therefore loses no expensive
+vision hints and does not have to describe the same windows again.
+When new vision calls are disabled, recovery may still read existing cached
+descriptions; this incurs no API cost and prevents old weak CV matches
+from being left in place.
+If CV fine-tuning fails on a semantically clear low-light window, the
+action window is kept as a provisional match. CV may refine a dark
+match, but it must not completely discard an unambiguous cache
+hint.
+In addition, recovery can rank existing cached action windows directly across
+all scenes. This fast path avoids an expensive full scan when the cache
+already contains a strong action such as hand-to-mouth, a kiss, or a change
+of gaze.
+Unambiguous terms from the beat description act as hard filters for vision
+windows: `mouth` must recur in the candidate, `dark interior` must not land
+on outdoor material, and distinctive personal features such as `blonde`
+remain binding.
+
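The hard-filter rule in the paragraph above can be illustrated with a small sketch. The term lists and the conflict mapping are assumptions made up for this example (only `mouth`, `dark interior` and `blonde` appear in the text), `passes_hard_filters` is a hypothetical name, and plain substring matching stands in for whatever text matching the matcher really uses.

```python
# Terms that must recur in the candidate when the beat description uses them.
REQUIRED_TERMS = ("mouth", "blonde")
# Beat terms that rule out certain candidate words (hypothetical mapping).
CONFLICTS = {"dark interior": ("outdoor", "exterior")}


def passes_hard_filters(beat_text: str, window_text: str) -> bool:
    """Reject a vision window that violates a binding term of the beat."""
    beat, window = beat_text.lower(), window_text.lower()
    for term in REQUIRED_TERMS:
        if term in beat and term not in window:
            return False  # binding term is missing from the candidate
    for term, forbidden in CONFLICTS.items():
        if term in beat and any(word in window for word in forbidden):
            return False  # candidate contradicts the beat's setting
    return True


ok = passes_hard_filters(
    "blonde woman covers her mouth, dark interior",
    "a blonde woman raises a hand to her mouth in a dim room",
)
```

A filter like this would run before any similarity ranking, so a window that fails it cannot win on a high text score alone.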
+The additional hi-res phase refine stays local around the already validated
+in-point and accepts only clear improvements. It must not search an entire long
+dialogue scene for similar layouts, because the same location with a different
+gesture could then win as the wrong phase and the runtime would explode.
+The local retune score therefore uses not only the mean frame score but also
+the worst single comparison, the first visible frames, and the
+frame-to-frame motion. As a result, a later still frame of the same shot no
+longer wins just because windows, faces, and light look almost
+identical.
+Uncertain single matches without a segment list also go through this local
+phase probe. This repairs old cache entries whose scene is correct but whose
+in-point is a few frames off within the motion. The probe stays limited to
+small local shifts and is not forced for every confirmed match, so that
+report refreshes do not turn into a full scan.
+Report clips are additionally clamped to the known source scene start plus a
+very short one-frame guard zone, so that an in-point just before or directly on
+the cut edge does not begin with frames of the previous shot.
+The guard zone is deliberately kept small, because a longer correction would
+shift the visible motion phase within the same shot.
 
 ## Multi-shot beats
 
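To make the local retune scoring from this hunk concrete, here is a hedged sketch that blends the four signals the text names: mean frame score, worst single comparison, the first visible frames, and frame-to-frame motion. The weights, the `head_frames` default, and the function name `retune_score` are illustrative assumptions, not values from the project.

```python
from statistics import fmean


def retune_score(frame_scores: list[float],
                 trailer_motion: list[float],
                 source_motion: list[float],
                 head_frames: int = 3) -> float:
    """Blend four signals into one score; inputs must be non-empty."""
    mean_score = fmean(frame_scores)
    worst_score = min(frame_scores)
    head_score = fmean(frame_scores[:head_frames])
    # Motion agreement: 1.0 when frame-to-frame motion matches exactly.
    gaps = [abs(t - s) for t, s in zip(trailer_motion, source_motion)]
    motion_score = max(0.0, 1.0 - fmean(gaps))
    # Illustrative weights; the real weighting is not given in the text.
    return (0.4 * mean_score + 0.2 * worst_score
            + 0.2 * head_score + 0.2 * motion_score)


trailer_motion = [0.3, 0.4, 0.3]
moving_scores = [0.80, 0.78, 0.79, 0.77]  # correct phase, real motion
still_scores = [0.82, 0.81, 0.80, 0.80]   # later still frame, same shot
moving = retune_score(moving_scores, trailer_motion, [0.3, 0.35, 0.3])
still = retune_score(still_scores, trailer_motion, [0.0, 0.0, 0.0])
```

In this example the still candidate has the higher mean frame score, yet the moving candidate gets the higher blended score because its motion agrees with the trailer.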
@@ -175,6 +250,13 @@ only if the relative source boundary lines up in time with a detected trailer
 cut. This lets a beat made of question/answer shots be captured completely
 without gluing scenes together arbitrarily.
 
+## Title and graphics beats
+
+Dark trailer cards with clearly isolated text are marked as `GFX` in the
+cutter report when there is no source match. These beats are not failed
+matches: the cutter should take over the trailer graphic or an NLE title card
+rather than search the feature film for an image.
+
 ## Reranking pipeline
 
 Before the expensive frame refine, the whole candidate pool is processed with a
@@ -296,3 +378,4 @@ or the last scorable frame of the same shot.
 
 Matches below `provisional_content_threshold` are no longer stored
 or carried over from old cache results.
+