Compare commits

2 Commits

Author SHA1 Message Date
Melbar 97a8f9e305 Add cutter report and auto-regen on each match
- New CUTTER_REPORT.md: per-beat hand-off table for the video editor doing
  the manual recut. Per beat: trailer SMPTE in/out, source SMPTE in/out,
  scene id, score, status (OK / ? / MAN.), and a one-line phase
  description from the cached vision text.
- New scripts/generate_cutter_report.py: pure renderer that reads the
  current cache (match_results.json + trailer_beats.json + optional
  vision_descriptions.json) and writes CUTTER_REPORT.md. No side effects on
  the cache.
- cli.py: after every successful match the cutter report is regenerated
  automatically (best-effort; failures are logged and do not abort).
- README.md: new top-section "Fuer den Cutter" describing exactly what the
  editor needs (which two files to look at, how the status flag works,
  the recommended NLE workflow). The technical algorithm description
  follows below.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 13:09:16 +02:00
Melbar 06a2326bf1 Broaden phase realign and add unmatched-beat recovery
- Phase realign for matched results: drop the "long scene" gate (>1.6x
  segment, >=6s) in favor of "scene has any meaningful slack beyond the
  segment". Already-confirmed segments in tight scenes are still skipped to
  protect strong matches. A repair is only committed if the new score is
  not meaningfully worse than the original (>=score-0.02).

- Recovery stage for unmatched beats: vibe-check (CV) feeds top-K candidate
  scenes into the semantic action-window search; CV alignment + vision phase
  validate gate decide whether the candidate becomes a provisional match.
  Beats without scoreable visual content (logos, title cards, full fades)
  remain unmatched by design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 07:12:20 +02:00
5 changed files with 598 additions and 18 deletions
+100
@@ -0,0 +1,100 @@
# Cutter report: manual recut

As of 2026-05-04. Frame rate: 23.976 fps. Source: BehindTheRedDoor_FTR_1080P_2398_Fixed.mp4; trailer: BehindTheRedDoor_Trailer_REFERENCE.mp4.

This file is generated automatically from the match cache. Regenerate it after every `python cli.py match` by running `python scripts/generate_cutter_report.py`.

## How to read this table

- **Beat**: number within the reference trailer.
- **Trailer In/Out**: SMPTE position of the beat in the trailer (h:mm:ss:ff).
- **Source In/Out**: proposed position in the source film. For `MAN.`, pick it yourself.
- **Scene**: ID of the source scene from PySceneDetect (for debugging only).
- **Score**: 0..1, higher is better. Scores >= 0.65 are classified as confirmed.
- **Status**:
  - `OK`: confirmed by CV plus the vision phase check; can be used without further review.
  - `?`: provisional; correct scene but score below 0.65. Check the motion phase in the preview clip and shift by a few frames if needed.
  - `MAN.`: no automatic match; either search manually or use a black fade/title.
- **Phase**: what is visible in the trailer beat (from the vision description). Helps you find the right spot in the source.

## Status overview

- **Total beats**: 25
- **Found automatically**: 20 (5 of them confirmed)
- **To be set manually**: 5

## Beat table

| Beat | Trailer In / Out | Source In / Out | Scene | Score | Status | What is visible in the frame |
|-----:|------------------|------------------|------:|------:|:------:|---------------------------|
| 0 | 00:00:00:00-00:00:03:00 | —-— | — | 0.000 | MAN. | logo animation assembling from distorted shapes with motion blur |
| 1 | 00:00:03:00-00:00:08:10 | 00:00:04:09-00:00:06:03 | 1 | 0.380 | ? | |
| 2 | 00:00:08:10-00:00:16:23 | —-— | — | 0.000 | MAN. | |
| 3 | 00:00:16:23-00:00:19:03 | 01:02:17:22-01:02:19:14 | 436 | 0.469 | ? | |
| 4 | 00:00:19:03-00:00:20:15 | 01:02:21:01-01:02:22:10 | 437 | 0.647 | ? | |
| 5 | 00:00:20:15-00:00:26:09 | 00:01:33:04-00:01:37:10 | 10 | 0.501 | ? | |
| 6 | 00:00:26:09-00:00:29:06 | 00:01:03:06-00:01:05:21 | 5 | 0.548 | ? | |
| 7 | 00:00:29:06-00:00:31:16 | 01:20:10:10-01:20:12:16 | 553 | 0.463 | ? | man appears to be engaged in conversation |
| 8 | 00:00:31:16-00:00:33:15 | 00:00:51:07-00:00:53:01 | 5 | 0.733 | OK | static or slow drifting |
| 9 | 00:00:33:15-00:00:36:18 | 01:20:28:20-01:20:31:17 | 557 | 0.529 | ? | speaking, transitioning from closed eyes to open mouth and focused gaze |
| 10 | 00:00:36:18-00:00:40:02 | 01:20:35:16-01:20:39:00 | 558 | 0.635 | ? | conversation |
| 11 | 00:00:40:02-00:00:42:03 | 01:20:40:18-01:20:42:18 | 559 | 0.502 | ? | static talking head with slight facial expression changes |
| 12 | 00:00:42:03-00:00:50:06 | 01:14:26:01-01:14:29:10 | 519 | 0.558 | ? | static profile shot transitioning to black/darkness |
| 13 | 00:00:50:06-00:00:53:20 | 00:43:20:02-00:43:23:10 | 308 | 0.468 | ? | static conversation; woman on right is standing and holding a cup |
| 14 | 00:00:53:20-00:00:57:02 | 00:43:24:09-00:43:27:04 | 309 | 0.444 | ? | static conversation, subject holding a white cup |
| 15 | 00:00:57:02-00:01:01:12 | 00:02:10:11-00:02:12:16 | 0 | 0.467 | ? | static conversation |
| 16 | 00:01:01:12-00:01:04:12 | 01:05:12:16-01:05:15:06 | 451 | 0.613 | ? | man reaches out and touches the red door with a small object |
| 17 | 00:01:04:12-00:01:09:03 | 01:31:22:10-01:31:24:09 | 623 | 0.684 | OK | Static intimacy transitioning to a spatial arrangement of figures |
| 18 | 00:01:09:03-00:01:10:18 | 00:09:13:12-00:09:14:19 | 75 | 0.668 | OK | Woman in foreground turns her head from profile to face the camera while speaking |
| 19 | 00:01:10:18-00:01:12:12 | 00:16:48:14-00:16:49:15 | 126 | 0.717 | OK | static conversation, subtle facial expression change |
| 20 | 00:01:12:12-00:01:15:13 | 01:28:04:17-01:28:05:14 | 613 | 0.663 | OK | man kisses woman's forehead, then they pull back slightly to face each other |
| 21 | 00:01:15:13-00:01:17:12 | —-— | — | 0.000 | MAN. | hand raised to mouth, slight facial movement |
| 22 | 00:01:17:12-00:01:19:22 | 01:03:05:16-01:03:07:10 | 442 | 0.545 | ? | |
| 23 | 00:01:19:22-00:01:25:13 | —-— | — | 0.000 | MAN. | |
| 24 | 00:01:25:13-00:01:32:07 | —-— | — | 0.000 | MAN. | |

## Beats that need manual attention

### Set manually (status `MAN.`)

- **Beat 0** 00:00:00:00-00:00:03:00: logo animation assembling from distorted shapes with motion blur
- **Beat 2** 00:00:08:10-00:00:16:23: no vision description; probably a title card / fade / logo
- **Beat 21** 00:01:15:13-00:01:17:12: hand raised to mouth, slight facial movement
- **Beat 23** 00:01:19:22-00:01:25:13: no vision description; probably a title card / fade / logo
- **Beat 24** 00:01:25:13-00:01:32:07: no vision description; probably a title card / fade / logo

### Provisional (status `?`): please review

| Beat | Score | Source In | Phase according to vision |
|-----:|------:|-----------|--------------------|
| 1 | 0.380 | 00:00:04:09 | |
| 3 | 0.469 | 01:02:17:22 | |
| 4 | 0.647 | 01:02:21:01 | |
| 5 | 0.501 | 00:01:33:04 | |
| 6 | 0.548 | 00:01:03:06 | |
| 7 | 0.463 | 01:20:10:10 | man appears to be engaged in conversation |
| 9 | 0.529 | 01:20:28:20 | speaking, transitioning from closed eyes to open mouth and focused gaze |
| 10 | 0.635 | 01:20:35:16 | conversation |
| 11 | 0.502 | 01:20:40:18 | static talking head with slight facial expression changes |
| 12 | 0.558 | 01:14:26:01 | static profile shot transitioning to black/darkness |
| 13 | 0.468 | 00:43:20:02 | static conversation; woman on right is standing and holding a cup |
| 14 | 0.444 | 00:43:24:09 | static conversation, subject holding a white cup |
| 15 | 0.467 | 00:02:10:11 | static conversation |
| 16 | 0.613 | 01:05:12:16 | man reaches out and touches the red door with a small object |
| 22 | 0.545 | 01:03:05:16 | |

### Confirmed (status `OK`): can be used as is

| Beat | Score | Source In | Phase according to vision |
|-----:|------:|-----------|--------------------|
| 8 | 0.733 | 00:00:51:07 | static or slow drifting |
| 17 | 0.684 | 01:31:22:10 | Static intimacy transitioning to a spatial arrangement of figures |
| 18 | 0.668 | 00:09:13:12 | Woman in foreground turns her head from profile to face the camera while speaking |
| 19 | 0.717 | 00:16:48:14 | static conversation, subtle facial expression change |
| 20 | 0.663 | 01:28:04:17 | man kisses woman's forehead, then they pull back slightly to face each other |

## Review notes

1. Source times should match the motion phase of the corresponding trailer beat. If they do not, shift Source In by a few frames forward or backward within the same source scene.
2. If the source clip is shorter than the trailer beat (Source Out < Trailer Out, measured from Source In), the trailer beat contains a fade or title card; fill the gap in the edit with a black fade or source tail.
3. `OK` beats are doubly verified by CV plus the vision phase check; spot-check them anyway.
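
The second review note can be checked mechanically. The following is a minimal sketch, not part of the tool: the helper names `tc_to_frames` and `shortfall_frames` are hypothetical, and it assumes the report's timecodes are non-drop-frame with 24 frame labels per second (the nominal rate for 23.976 fps material).

```python
def tc_to_frames(tc: str, fps: int = 24) -> int:
    """Convert an hh:mm:ss:ff timecode string into a frame count."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f


def shortfall_frames(trailer_in: str, trailer_out: str,
                     source_in: str, source_out: str, fps: int = 24) -> int:
    """Frames the editor has to fill (black fade or source tail).

    Returns 0 when the proposed source clip covers the whole trailer beat.
    """
    beat_len = tc_to_frames(trailer_out, fps) - tc_to_frames(trailer_in, fps)
    clip_len = tc_to_frames(source_out, fps) - tc_to_frames(source_in, fps)
    return max(0, beat_len - clip_len)


# Beat 12 from the table: an 8-second trailer beat with a ~3-second source clip.
gap = shortfall_frames("00:00:42:03", "00:00:50:06", "01:14:26:01", "01:14:29:10")
print(gap)
```

A non-zero result for a beat is a hint that the trailer beat ends in a fade or title card, as described in note 2.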
+14
@@ -88,6 +88,20 @@ If that fails:
   exist; otherwise Vision is called live (costs credits; needs a network connection).
3. Restore `match_results.json.bak` if the cache is corrupted.

## Current coverage (before the latest run)

```
total beats:   25
matched:       20  (5 confirmed, 15 provisional)
unmatched:     beats 0, 2, 21, 23, 24
```

Beat 0 is the SHO logo (no source match possible, correct).
Beats 23/24 have no visible islands (end credits/title), so they are also
correctly unmatched.
Beat 2 and beat 21 are the real recovery candidates; the new
recovery stage tries to pick them up on the next `match` run.

## Open risks / known weaknesses

- The threshold `0.06` for "beat context wins" in `realign_window` is
+66 -15
@@ -1,27 +1,63 @@
# AI Trailer Generator v2

-**Frame-accurate trailer reconstruction via pure Computer Vision**
**Frame-accurate rebuild of a trailer from the source film.**

-> Feed in a reference trailer and the matching source film, and get back a finished FCPXML/EDL that rebuilds the trailer frame-accurately from the source film.
You put in two videos, a reference trailer and the matching
feature film, and get back a finished FCPXML/EDL for your edit suite that
rebuilds the trailer beat by beat from the source film.

---

-## The core principle
-By default no LLM is used for visual matching. Optionally, a vision layer can
-supply cached 3-frame descriptions as additional search anchors; the final
-match nevertheless stays CV-verified.
-| Phase | What happens | Technology |
-|-------|-------------|-------------|
-| **0: Prep** | Analyze the reference trailer and extract beats | PySceneDetect + OpenCV |
-| **1: Global Scan** | Scan the entire source film against all beats via an FFmpeg stream (2 FPS) | FFmpeg pipe + luma histogram |
-| **1b: Optional Vision Seeds** | Cache 3-frame descriptions for uncertain top-K scenes | OpenAI-compatible vision LLM |
-| **2: Refine** | Refine the best hits at frame level | OpenCV `matchTemplate` |
-| **3: Dramaturgy** | Narrative BeatType classification from dialogue text | OpenRouter LLM |
-| **4: Export** | Timeline → FCPXML 1.10 or CMX 3600 EDL | xml.etree + custom timecode layer |
-**Text-Safe Crop:** The top 15% and bottom 30% of the frame are masked before every comparison to ignore title cards, logos, and letterbox.
## For the editor (Für den Cutter): what you actually need

You do **not have to operate this tool yourself** and do **not need to know Python**.
What you get are two files to work with:

1. **`CUTTER_REPORT.md`**: the table for manual checking and
   recutting. For each beat it lists:
   - the trailer timecode (h:mm:ss:ff),
   - the proposed source timecode from the feature film,
   - a status: `OK` (can be used as is), `?` (please review) or
     `MAN.` (no match, set manually),
   - a short description of what is visible in the trailer beat (so you
     can find the right spot in the source faster).
2. **`output/*.fcpxml`** and **`output/*.edl`**: the finished timeline for
   FCP / Premiere / Avid / Resolve. Beats with status `OK` are already
   placed correctly there; `?` and `MAN.` beats you have to check or set yourself in the NLE.

**Recommended workflow:**

1. Open `CUTTER_REPORT.md` and work through the table from top to bottom.
2. Import the FCPXML/EDL into the NLE and load the trailer and feature film alongside it.
3. For `OK` beats, spot-check only.
4. For `?` beats, check the preview clip from the report HTML (see below)
   and shift the Source In by a few frames forward or backward in the NLE until the
   motion phase matches the trailer exactly.
5. For `MAN.` beats, find the matching spot in the feature film yourself; the
   description in the report tells you what to look for.

Everything else below is background for whoever maintains the tool.

---

## How the tool finds the matches (short version)

| Phase | What happens |
|-------|--------------|
| **0** | Split the trailer into beats (PySceneDetect). |
| **1** | Quick vibe-check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
| **2** | Optional: a vision LLM describes uncertain scenes using 3-frame samples; the descriptions are cached. |
| **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
| **4** | Phase repair: for segmented beats, the motion phase in the source is aligned with the visible trailer phase. |
| **5** | Recovery: beats without a match are retried via a vision phase search in the top-K scenes. |
| **6** | Export as FCPXML 1.10 or CMX 3600 EDL plus `CUTTER_REPORT.md`. |

**Text-Safe Crop:** The top 15% and bottom 30% of every frame are masked before the
comparison so that title cards, logos, and letterbox do not skew the
matches.
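
To make the crop concrete, here is a small sketch of how the kept row range could be computed. The function names and the rounding choice are assumptions for illustration, not the tool's actual code; only the 15% / 30% fractions come from the text above.

```python
from typing import Sequence, Tuple


def text_safe_rows(height: int, top_frac: float = 0.15,
                   bottom_frac: float = 0.30) -> Tuple[int, int]:
    """Return (first_row, end_row) of the band that survives the crop."""
    top = int(round(height * top_frac))
    bottom = height - int(round(height * bottom_frac))
    return top, bottom


def crop_text_safe(frame: Sequence) -> Sequence:
    """Drop the top and bottom bands of a frame given as a sequence of rows.

    Plain slicing is used, so this also works on a NumPy image array
    (rows are the first axis).
    """
    top, bottom = text_safe_rows(len(frame))
    return frame[top:bottom]


print(text_safe_rows(1080))
```

For a 1080-row frame only the middle 594 rows take part in the comparison, which is why lower-third titles and letterbox bars have little influence on the scores.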

**Important:** Even with vision enabled, the final match stays
CV-verified. The LLM only supplies additional search anchors.

---
@@ -310,6 +346,21 @@ while establishing the connection. If vision verification nevertheless fails permanently during the final
filter/repair stage, the previously cached
match for this beat is kept rather than discarded; a network problem must not
delete an already correctly found match from the cache.

Phase repair on found matches no longer runs only in "long"
source scenes but wherever the scene carries more than just the
segment window (in `cli.py`: more than 0.5 s longer than the segment). A
corrected position is accepted as soon as it passes the image-content
validation AND does not score noticeably worse than the original (a loss of
at most 0.02). Already confirmed matches in tightly cut
scenes are deliberately left untouched so that a good match is not swapped for a
nominally equivalent alternative.
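
The acceptance rule described above can be condensed into one predicate. This is a sketch under assumptions: the function name and its flat parameter list are hypothetical, and the image-content validation is left out; the numeric gates (0.5 s slack, the 1.6x / 6 s "tight scene" test, one frame of movement, 0.02 score tolerance) are the ones that appear in the `cli.py` hunks of this diff.

```python
def should_commit_realign(scene_duration_s: float, window_duration_s: float,
                          is_confirmed: bool,
                          old_in_s: float, new_in_s: float,
                          old_score: float, new_score: float,
                          frame_rate: float) -> bool:
    """Decide whether a phase-realigned in-point replaces the original."""
    # The scene must offer meaningful slack beyond the matched window.
    wide_scene = scene_duration_s > window_duration_s + 0.5
    # Confirmed matches in tight scenes are protected from being replaced.
    tight_scene = scene_duration_s <= max(window_duration_s * 1.6, 6.0)
    if not wide_scene or (is_confirmed and tight_scene):
        return False
    # The new in-point must differ by more than one frame ...
    moved = abs(new_in_s - old_in_s) > 1.0 / frame_rate
    # ... and must not score meaningfully worse than the original.
    not_worse = new_score >= old_score - 0.02
    return moved and not_worse


print(should_commit_realign(10.0, 2.0, False, 5.0, 5.5, 0.50, 0.49, 23.976))
```

Keeping the tolerance small means a realign can trade a marginal score loss for a better motion phase, but cannot quietly replace a match with a clearly weaker one.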

Beats that end up as neither a full match nor a segment match after the CV run
then go through a recovery stage: the vibe-check (histogram/pHash)
supplies top-K candidate scenes, the semantic action-window search checks
the phase of the visible part of the trailer beat within them, and the CV
aligner sets the in-point frame-accurately. A candidate is accepted only if
it passes the same vision phase validation as the main path. Beats without
visible image material (logos, title cards, continuous fades) are not
searched for at all; they are deliberately left unmatched.
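
How a recovered candidate is scored can be sketched as follows. The blend weights and the 0.99 cap are taken from `_recover_unmatched_beats_via_vision` in this diff; the helper names are hypothetical, the confirmed threshold of 0.65 is the value quoted in the cutter report, and the provisional threshold of 0.35 is purely an assumed placeholder because the configured value is not shown here.

```python
from typing import Optional


def recovery_score(combined: float, semantic: float, motion: float,
                   content: float, usable: float) -> float:
    """Blend the sub-scores; never go below the plain CV score."""
    blended = semantic * 0.65 + motion * 0.18 + content * 0.09 + usable * 0.08
    return max(combined, min(0.99, blended))


def classify(score: float, provisional_threshold: float = 0.35,
             match_threshold: float = 0.65) -> Optional[str]:
    """Return None (rejected), 'provisional' or 'confirmed'."""
    if score < provisional_threshold:
        return None
    return "confirmed" if score >= match_threshold else "provisional"


score = recovery_score(combined=0.40, semantic=0.80, motion=0.50,
                       content=0.50, usable=0.50)
print(round(score, 3), classify(score))
```

The semantic score dominates the blend, which fits the purpose of this stage: it only runs for beats where the pure CV search has already failed.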

Long trailer beats are no longer automatically validated over their entire beat length
against a single source clip. As soon as a sustained fade to black or a
title/credit island begins after a visible source section,
+211 -3
@@ -92,6 +92,22 @@ def _save_results(results: list, cfg: "AppConfig") -> None: # type: ignore[name
    logging.getLogger(__name__).info("Match results cached → %s", p)


def _regenerate_cutter_report(cfg: "AppConfig") -> None:  # type: ignore[name-defined]
    """Re-render CUTTER_REPORT.md after each cache write so it stays in sync."""
    try:
        from scripts.generate_cutter_report import render_report
    except Exception as exc:
        logging.getLogger(__name__).warning("Cutter report regen skipped: %s", exc)
        return
    try:
        project_root = cfg.paths.cache_dir.parent
        out = project_root / "CUTTER_REPORT.md"
        out.write_text(render_report(project_root), encoding="utf-8")
        logging.getLogger(__name__).info("Cutter report regenerated → %s", out)
    except Exception as exc:
        logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)


def _load_results(cfg: "AppConfig") -> list:  # type: ignore[name-defined]
    from src.core.models import MatchResult, MatchSegment
    p = _results_cache_path(cfg)
@@ -632,6 +648,171 @@ def _merge_best_results(existing: list, candidates: list, cfg) -> list:
    return sorted(by_id.values(), key=lambda r: r.beat_id)


def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
    """Try a vision-led search for beats that ended up without a match.

    For each unmatched beat that has scoreable visual content (i.e. not pure
    fade/title-card material), this pass:

    1. Asks the vibe-check (CV histogram + pHash) for the top-K candidate
       scenes.
    2. For each candidate, runs the semantic action-window search with the
       beat's own description, preferring windows whose phase matches the
       visible part of the beat.
    3. Refines the in-point with the regular CV content/motion aligner.
    4. Validates the resulting window with the vision phase check, exactly
       like the main filter.
    5. Adds the best validated candidate as a provisional MatchResult.

    Confirmed and provisional matches both stay subject to the same thresholds
    used elsewhere; this only adds matches that pass the same quality gates.
    """
    if not cfg.vision.enabled or not beats:
        return results

    from dataclasses import replace
    from src.cv.global_scan import align_in_point_by_content_and_motion, estimate_usable_source_duration
    from src.cv.scene_indexer import build_scene_index
    from src.cv.vibe_check import run_vibe_check
    from src.core.models import MatchResult
    from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision

    logger = logging.getLogger(__name__)
    matched_ids = {r.beat_id for r in results}
    unmatched = [b for b in beats if b.beat_id not in matched_ids]
    if not unmatched:
        return results

    scenes = build_scene_index(cfg)
    if not scenes:
        return results

    new_results = list(results)
    for beat in unmatched:
        try:
            islands = _reference_scoreable_segments(beat, cfg)
        except Exception:
            islands = []

        # Anchor selection: prefer the longest visible island; if none exists,
        # fall back to the full beat. The latter handles dark / low-contrast
        # close-ups that drop below the scoreable luma/contrast thresholds but
        # are still semantically describable. The strict vision phase
        # validation later in this pass keeps us from accepting pure title-card
        # or logo material.
        if islands:
            anchor_start_s, anchor_end_s = max(islands, key=lambda iv: iv[1] - iv[0])
            anchor_beat = replace(
                beat,
                start_s=beat.start_s + anchor_start_s,
                end_s=beat.start_s + anchor_end_s,
            )
        else:
            anchor_beat = beat

        try:
            hits = run_vibe_check(
                beat,
                scenes,
                top_k=max(cfg.cv.deep_scan.scene_seed_top_k, cfg.cv.vibe_check.top_k_candidates),
                hist_method=cfg.cv.vibe_check.hist_compare_method,
                phash_max_distance=64,
            )
        except Exception as exc:
            logger.warning("Beat %d: recovery vibe-check failed (%s)", beat.beat_id, exc)
            continue

        scenes_by_id = {s.scene_id: s for s in scenes}
        best = None  # (score, scene, in_s, dur_s, reason)
        seen = set()
        for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
            scene = scenes_by_id.get(hit.scene_id)
            if scene is None or scene.scene_id in seen:
                continue
            seen.add(scene.scene_id)
            try:
                found = find_action_window_in_scene(anchor_beat, scene, cfg)
            except Exception as exc:
                logger.debug("Beat %d: action window failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
                continue
            if found is None:
                continue
            start_s, end_s, semantic_score, reason = found
            window_s = max(3.0, min(8.0, (end_s - start_s) * 4.0))
            try:
                aligned_in_s, combined_score, content_score, motion_score = align_in_point_by_content_and_motion(
                    anchor_beat,
                    start_s,
                    cfg,
                    search_window_s=window_s,
                )
            except Exception as exc:
                logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
                continue
            # Keep the in-point inside the candidate scene.
            aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
            try:
                usable_duration_s, usable_score = estimate_usable_source_duration(anchor_beat, aligned_in_s, cfg)
            except Exception:
                usable_duration_s, usable_score = anchor_beat.duration_s, 0.0
            usable_duration_s = max(0.0, min(anchor_beat.duration_s, usable_duration_s))
            if usable_duration_s < max(0.32, anchor_beat.duration_s * 0.45):
                usable_duration_s = anchor_beat.duration_s
            try:
                ok, verify_reason = validate_match_window_with_vision(
                    anchor_beat,
                    source_path=scene.source_path,
                    scene_id=scene.scene_id,
                    in_point_s=aligned_in_s,
                    out_point_s=aligned_in_s + usable_duration_s,
                    cfg=cfg,
                )
            except Exception as exc:
                logger.debug("Beat %d: validate failed scene=%d (%s)", beat.beat_id, scene.scene_id, exc)
                continue
            if not ok:
                continue
            final_score = max(
                combined_score,
                min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
            )
            if final_score < cfg.cv.deep_scan.provisional_match_threshold:
                continue
            candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
            if best is None or candidate[0] > best[0]:
                best = candidate

        if best is None:
            continue
        score, scene, aligned_in_s, usable_duration_s, repair_reason = best
        logger.info(
            "Beat %d: recovered via vision action search scene=%d in=%.3fs score=%.3f (%s)",
            beat.beat_id,
            scene.scene_id,
            aligned_in_s,
            score,
            repair_reason,
        )
        new_results.append(MatchResult(
            beat_id=beat.beat_id,
            scene_id=scene.scene_id,
            source_path=scene.source_path,
            in_point_s=aligned_in_s,
            out_point_s=aligned_in_s + usable_duration_s,
            in_point_frame=int(aligned_in_s * cfg.export.edl_frame_rate),
            match_score=score,
            match_location=(0, 0),
            is_confirmed=score >= cfg.cv.deep_scan.match_threshold,
            segments=tuple(),
        ))

    return sorted(new_results, key=lambda r: r.beat_id)


def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
    """Drop vision-enabled matches whose final action phase contradicts the beat."""
    if not cfg.vision.enabled or not results:
@@ -785,7 +966,16 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
        changed = False
        for segment in result.segments:
            scene = scenes_by_id.get(segment.scene_id)
-            if scene is None or scene.duration_s <= max(segment.duration_s * 1.6, 6.0):
            # Allow phase-realign whenever the scene has any meaningful
            # slack beyond the segment, not only for "long" scenes.
            # Short scenes don't need realigning because the segment
            # essentially is the scene.
            if scene is None or scene.duration_s <= segment.duration_s + 0.5:
                new_segments.append(segment)
                continue
            # For already-confirmed segments, skip the realign to avoid
            # destabilizing a strong original match.
            if segment.is_confirmed and scene.duration_s <= max(segment.duration_s * 1.6, 6.0):
                new_segments.append(segment)
                continue
            segment_beat = replace(
@@ -801,6 +991,11 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
            if abs(aligned_in_s - segment.in_point_s) <= 1.0 / cfg.export.edl_frame_rate:
                new_segments.append(segment)
                continue
            # Don't commit a repair that scores meaningfully worse than
            # the original; phase realign should improve, not regress.
            if score < segment.match_score - 0.02:
                new_segments.append(segment)
                continue
            changed = True
            repair_reasons.append(repair_reason)
            new_segments.append(replace(
@@ -833,11 +1028,22 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
            repaired = True
    else:
        scene = scenes_by_id.get(result.scene_id)
-        if scene is not None and scene.duration_s > max(result.duration_s * 1.6, 6.0):
        wide_scene = (
            scene is not None
            and scene.duration_s > result.duration_s + 0.5
        )
        already_confirmed_in_tight_scene = (
            result.is_confirmed
            and scene is not None
            and scene.duration_s <= max(result.duration_s * 1.6, 6.0)
        )
        if wide_scene and not already_confirmed_in_tight_scene:
            repair = realign_window(beat, result.scene_id)
            if repair is not None:
                repair_scene, aligned_in_s, usable_duration_s, score, repair_reason = repair
-                if abs(aligned_in_s - result.in_point_s) > 1.0 / cfg.export.edl_frame_rate:
                moved = abs(aligned_in_s - result.in_point_s) > 1.0 / cfg.export.edl_frame_rate
                improved = score >= result.match_score - 0.02
                if moved and improved:
                    logger.info(
                        "Beat %d: realigned semantically valid long scene by motion/action window (%s)",
                        result.beat_id,
@@ -1271,6 +1477,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
    )
    results = _attach_visual_segments(results, beats, cfg)
    results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
    results = _recover_unmatched_beats_via_vision(results, beats, cfg)

    # A targeted one-beat match should improve the cache without deleting
    # automatic matches for other beats.
@@ -1283,6 +1490,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
        results_to_save = results
    _save_results(results_to_save, cfg)
    _regenerate_cutter_report(cfg)

    print(f"\n{len(results)} / {len(beats)} beats matched.")
    for r in results:
+207
@@ -0,0 +1,207 @@
"""
scripts/generate_cutter_report.py - generate CUTTER_REPORT.md from current cache

Regenerates CUTTER_REPORT.md from .cache/match_results.json,
.cache/trailer_beats.json and .cache/vision_descriptions.json. The report is a
hand-off document for a video editor (Cutter) doing the manual recut: it lists,
per beat, the trailer position, the proposed source position in SMPTE
timecodes, the match score, and what the vision model saw in the trailer beat.

Usage (from project root):
    python scripts/generate_cutter_report.py

Run this any time after `python cli.py match` to keep CUTTER_REPORT.md in sync
with the latest cache.
"""
from __future__ import annotations

import json
import re
import sys
from datetime import date
from pathlib import Path


def smpte(t: float | None, fps: int) -> str:
    """Format a time in seconds as hh:mm:ss:ff, counting `fps` frames per second."""
    if t is None:
        return "--:--:--:--"
    total = int(round(t * fps))
    h = total // (3600 * fps)
    m = (total // (60 * fps)) % 60
    s = (total // fps) % 60
    f = total % fps
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"


def best_beat_description(items: dict, beat_id: int, start_s: float, end_s: float) -> str | None:
    """Pick the cached description whose key range is closest to the beat's range."""
    best, best_diff = None, 1e9
    for key, value in items.items():
        if not key.startswith(f"beat:{beat_id}:") or not isinstance(value, dict):
            continue
        try:
            parts = key.split(":")
            ks, ke = float(parts[2]), float(parts[3])
        except (IndexError, ValueError):
            continue
        diff = abs(ks - start_s) + abs(ke - end_s)
        if diff < best_diff:
            best_diff = diff
            best = value
    return best.get("description", "") if best else None


def parse_field(desc: str | None, key: str) -> str:
    """Extract a quoted JSON-style string field from the raw description text."""
    if not desc:
        return ""
    match = re.search(rf'"{key}"\s*:\s*"([^"]+)"', desc)
    return match.group(1) if match else ""


def render_report(project_root: Path) -> str:
    sys.path.insert(0, str(project_root))
    from src.core.config import load_config

    cfg = load_config(project_root / "config.toml")
    fps = int(round(cfg.export.edl_frame_rate))

    cache = project_root / ".cache"
    results = {r["beat_id"]: r for r in json.loads((cache / "match_results.json").read_text())}
    beats = json.loads((cache / "trailer_beats.json").read_text())
    vis_path = cache / "vision_descriptions.json"
    vis_items = json.loads(vis_path.read_text())["items"] if vis_path.exists() else {}

    lines: list[str] = []
    lines.append("# Cutter report: manual recut")
    lines.append("")
    lines.append(
        f"As of {date.today().isoformat()}. Frame rate: {cfg.export.edl_frame_rate} fps. "
        f"Source: {Path(cfg.paths.source_movie).name}; trailer: {Path(cfg.paths.reference_trailer).name}."
    )
    lines.append("")
    lines.append(
        "This file is generated automatically from the match cache. "
        "Regenerate it after every `python cli.py match` by running `python scripts/generate_cutter_report.py`."
    )
    lines.append("")
    lines.append("## How to read this table")
    lines.append("")
    lines.append("- **Beat**: number within the reference trailer.")
    lines.append("- **Trailer In/Out**: SMPTE position of the beat in the trailer (h:mm:ss:ff).")
    lines.append("- **Source In/Out**: proposed position in the source film. For `MAN.`, pick it yourself.")
    lines.append("- **Scene**: ID of the source scene from PySceneDetect (for debugging only).")
    lines.append("- **Score**: 0..1, higher is better. Scores >= 0.65 are classified as confirmed.")
    lines.append("- **Status**:")
    lines.append("  - `OK`: confirmed by CV plus the vision phase check; can be used without further review.")
    lines.append("  - `?`: provisional; correct scene but score below 0.65. Check the motion phase in the preview clip and shift by a few frames if needed.")
    lines.append("  - `MAN.`: no automatic match; either search manually or use a black fade/title.")
    lines.append("- **Phase**: what is visible in the trailer beat (from the vision description). Helps you find the right spot in the source.")
    lines.append("")

    matched = sum(1 for b in beats if b["beat_id"] in results)
    confirmed = sum(1 for b in beats if b["beat_id"] in results and results[b["beat_id"]]["is_confirmed"])
    lines.append("## Status overview")
    lines.append("")
    lines.append(f"- **Total beats**: {len(beats)}")
    lines.append(f"- **Found automatically**: {matched} ({confirmed} of them confirmed)")
    lines.append(f"- **To be set manually**: {len(beats) - matched}")
    lines.append("")

    lines.append("## Beat table")
    lines.append("")
    lines.append("| Beat | Trailer In / Out | Source In / Out | Scene | Score | Status | What is visible in the frame |")
    lines.append("|-----:|------------------|------------------|------:|------:|:------:|---------------------------|")

    def status_for(rec: dict | None) -> str:
        if rec is None:
            return "MAN."
        return "OK" if rec.get("is_confirmed") else "?"

    for beat in beats:
        bid = beat["beat_id"]
        rec = results.get(bid)
        ti, to = smpte(beat["start_s"], fps), smpte(beat["end_s"], fps)
        if rec is not None:
            si, so = smpte(rec["in_point_s"], fps), smpte(rec["out_point_s"], fps)
            scn = rec["scene_id"]
            sc = rec["match_score"]
        else:
            si = so = ""
            scn = ""
            sc = 0.0
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        phase = (parse_field(desc, "action_phase") or parse_field(desc, "subject"))[:90]
        lines.append(f"| {bid:>4} | {ti}-{to} | {si}-{so} | {scn} | {sc:.3f} | {status_for(rec)} | {phase} |")
    lines.append("")

    lines.append("## Beats that need manual attention")
    lines.append("")
    lines.append("### Set manually (status `MAN.`)")
    lines.append("")
    for beat in beats:
        bid = beat["beat_id"]
        if bid in results:
            continue
        ti, to = smpte(beat["start_s"], fps), smpte(beat["end_s"], fps)
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        phase = parse_field(desc, "action_phase")
        note = phase or "no vision description; probably a title card / fade / logo"
        lines.append(f"- **Beat {bid}** {ti}-{to}: {note}")
    lines.append("")

    lines.append("### Provisional (status `?`): please review")
    lines.append("")
    lines.append("| Beat | Score | Source In | Phase according to vision |")
    lines.append("|-----:|------:|-----------|--------------------|")
    for beat in beats:
        bid = beat["beat_id"]
        rec = results.get(bid)
        if rec is None or rec.get("is_confirmed"):
            continue
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        phase = parse_field(desc, "action_phase")
        lines.append(f"| {bid:>4} | {rec['match_score']:.3f} | {smpte(rec['in_point_s'], fps)} | {phase[:90]} |")
    lines.append("")

    lines.append("### Confirmed (status `OK`): can be used as is")
    lines.append("")
    lines.append("| Beat | Score | Source In | Phase according to vision |")
    lines.append("|-----:|------:|-----------|--------------------|")
    for beat in beats:
        bid = beat["beat_id"]
        rec = results.get(bid)
        if rec is None or not rec.get("is_confirmed"):
            continue
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        phase = parse_field(desc, "action_phase")
        lines.append(f"| {bid:>4} | {rec['match_score']:.3f} | {smpte(rec['in_point_s'], fps)} | {phase[:90]} |")
    lines.append("")

    lines.append("## Review notes")
    lines.append("")
    lines.append(
        "1. Source times should match the motion phase of the corresponding trailer beat. "
        "If they do not, shift Source In by a few frames forward or backward within the same source scene."
    )
    lines.append(
        "2. If the source clip is shorter than the trailer beat (Source Out < Trailer Out, measured from Source In), "
        "the trailer beat contains a fade or title card; fill the gap in the edit with a black fade or source tail."
    )
    lines.append(
        "3. `OK` beats are doubly verified by CV plus the vision phase check; spot-check them anyway."
    )
    lines.append("")
    return "\n".join(lines)


def main() -> int:
    here = Path(__file__).resolve().parent
    project_root = here.parent
    out = project_root / "CUTTER_REPORT.md"
    out.write_text(render_report(project_root), encoding="utf-8")
    print(f"Wrote {out}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
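
One point worth checking in review: `render_report` rounds the configured 23.976 fps to 24 and `smpte` then multiplies the time in seconds by that rounded value. If the cached `*_s` values are real-time seconds (an assumption; the cache format is not shown in this diff), the frame count should arguably come from the real rate and only the timecode labels from the nominal rate. A minimal sketch of that variant, with the hypothetical name `smpte_ndf`:

```python
def smpte_ndf(t_seconds: float, real_fps: float = 24000 / 1001,
              nominal_fps: int = 24) -> str:
    """Non-drop-frame timecode for fractional frame rates such as 23.976.

    The frame index is derived from the real frame rate; hours, minutes,
    seconds and frames are then counted with the nominal integer rate.
    """
    frames = int(round(t_seconds * real_fps))
    h = frames // (3600 * nominal_fps)
    m = (frames // (60 * nominal_fps)) % 60
    s = (frames // nominal_fps) % 60
    f = frames % nominal_fps
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"


# One hour of real time at 23.976 fps is 86314 frames, i.e. 00:59:56:10,
# whereas multiplying seconds by 24 would label it 01:00:00:00.
print(smpte_ndf(3600.0))
```

Whether this matters depends on what the NLE displays for the source file; comparing one late `OK` beat from the report against the NLE's source timecode would settle it.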