Add cutter report and auto-regen on each match

- New CUTTER_REPORT.md: per-beat hand-off table for the video editor doing the manual recut. Per beat: trailer SMPTE in/out, source SMPTE in/out, scene id, score, status (OK / ? / MAN.), and a one-line phase description from the cached vision text.
- New scripts/generate_cutter_report.py: pure renderer that reads the current cache (match_results.json + trailer_beats.json + optional vision_descriptions.json) and writes CUTTER_REPORT.md. No side effects on the cache.
- cli.py: after every successful match the cutter report is regenerated automatically (best-effort; failures are logged and do not abort).
- README.md: new top-section "Fuer den Cutter" describing exactly what the editor needs (which two files to look at, how the status flag works, the recommended NLE workflow). The technical algorithm description follows below.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Broaden phase realign and add unmatched-beat recovery
2026-05-04 13:09:16 +02:00 · 2026-05-04 07:12:20 +02:00
5 changed files with 598 additions and 18 deletions
@@ -0,0 +1,100 @@
 # Cutter report: manual recut
 As of: 2026-05-04. Frame rate: 23.976 fps. Source: BehindTheRedDoor_FTR_1080P_2398_Fixed.mp4; trailer: BehindTheRedDoor_Trailer_REFERENCE.mp4.
 This file is generated from the match cache. It is regenerated automatically after every successful `python cli.py match`; to rebuild it by hand, run `python scripts/generate_cutter_report.py`.
 ## How to read this table
 - **Beat**: number of the beat in the reference trailer.
 - **Trailer In/Out**: SMPTE position of the beat in the trailer (hh:mm:ss:ff).
 - **Source In/Out**: proposed position in the source film. For `MAN.`, pick it yourself.
 - **Scene**: ID of the source scene from PySceneDetect (for debugging only).
 - **Score**: 0..1, higher is better. A score >= 0.65 counts as confirmed.
 - **Status**:
    - `OK`: confirmed by CV plus the vision phase check; can be used as is (a spot check is still recommended).
    - `?`: provisional; the scene is presumed correct but the score is below 0.65. Check the motion phase in the preview clip and shift by a few frames if needed.
    - `MAN.`: no automatic match; either find the spot manually or treat the beat as a fade to black / title.
 - **Phase** (last table column): what is visible in the trailer beat, taken from the vision description. It helps you find the right spot in the source.
 ## Status overview
 - **Beats total**: 25
 - **Found automatically**: 20 (5 of them confirmed)
 - **To be set manually**: 5
 ## Beat table
 | Beat | Trailer In / Out | Source In / Out | Scene | Score | Status | What is visible in the frame |
 |-----:|------------------|------------------|------:|------:|:------:|---------------------------|
 |    0 | 00:00:00:00-00:00:03:00 | —-— | — | 0.000 | MAN. | logo animation assembling from distorted shapes with motion blur |
 |    1 | 00:00:03:00-00:00:08:10 | 00:00:04:09-00:00:06:03 | 1 | 0.380 | ? |  |
 |    2 | 00:00:08:10-00:00:16:23 | —-— | — | 0.000 | MAN. |  |
 |    3 | 00:00:16:23-00:00:19:03 | 01:02:17:22-01:02:19:14 | 436 | 0.469 | ? |  |
 |    4 | 00:00:19:03-00:00:20:15 | 01:02:21:01-01:02:22:10 | 437 | 0.647 | ? |  |
 |    5 | 00:00:20:15-00:00:26:09 | 00:01:33:04-00:01:37:10 | 10 | 0.501 | ? |  |
 |    6 | 00:00:26:09-00:00:29:06 | 00:01:03:06-00:01:05:21 | 5 | 0.548 | ? |  |
 |    7 | 00:00:29:06-00:00:31:16 | 01:20:10:10-01:20:12:16 | 553 | 0.463 | ? | man appears to be engaged in conversation |
 |    8 | 00:00:31:16-00:00:33:15 | 00:00:51:07-00:00:53:01 | 5 | 0.733 | OK | static or slow drifting |
 |    9 | 00:00:33:15-00:00:36:18 | 01:20:28:20-01:20:31:17 | 557 | 0.529 | ? | speaking, transitioning from closed eyes to open mouth and focused gaze |
 |   10 | 00:00:36:18-00:00:40:02 | 01:20:35:16-01:20:39:00 | 558 | 0.635 | ? | conversation |
 |   11 | 00:00:40:02-00:00:42:03 | 01:20:40:18-01:20:42:18 | 559 | 0.502 | ? | static talking head with slight facial expression changes |
 |   12 | 00:00:42:03-00:00:50:06 | 01:14:26:01-01:14:29:10 | 519 | 0.558 | ? | static profile shot transitioning to black/darkness |
 |   13 | 00:00:50:06-00:00:53:20 | 00:43:20:02-00:43:23:10 | 308 | 0.468 | ? | static conversation; woman on right is standing and holding a cup |
 |   14 | 00:00:53:20-00:00:57:02 | 00:43:24:09-00:43:27:04 | 309 | 0.444 | ? | static conversation, subject holding a white cup |
 |   15 | 00:00:57:02-00:01:01:12 | 00:02:10:11-00:02:12:16 | 0 | 0.467 | ? | static conversation |
 |   16 | 00:01:01:12-00:01:04:12 | 01:05:12:16-01:05:15:06 | 451 | 0.613 | ? | man reaches out and touches the red door with a small object |
 |   17 | 00:01:04:12-00:01:09:03 | 01:31:22:10-01:31:24:09 | 623 | 0.684 | OK | Static intimacy transitioning to a spatial arrangement of figures |
 |   18 | 00:01:09:03-00:01:10:18 | 00:09:13:12-00:09:14:19 | 75 | 0.668 | OK | Woman in foreground turns her head from profile to face the camera while speaking |
 |   19 | 00:01:10:18-00:01:12:12 | 00:16:48:14-00:16:49:15 | 126 | 0.717 | OK | static conversation, subtle facial expression change |
 |   20 | 00:01:12:12-00:01:15:13 | 01:28:04:17-01:28:05:14 | 613 | 0.663 | OK | man kisses woman's forehead, then they pull back slightly to face each other |
 |   21 | 00:01:15:13-00:01:17:12 | —-— | — | 0.000 | MAN. | hand raised to mouth, slight facial movement |
 |   22 | 00:01:17:12-00:01:19:22 | 01:03:05:16-01:03:07:10 | 442 | 0.545 | ? |  |
 |   23 | 00:01:19:22-00:01:25:13 | —-— | — | 0.000 | MAN. |  |
 |   24 | 00:01:25:13-00:01:32:07 | —-— | — | 0.000 | MAN. |  |
 ## Beats that need manual attention
 ### Set manually (status `MAN.`)
 - **Beat 0** 00:00:00:00-00:00:03:00: logo animation assembling from distorted shapes with motion blur
 - **Beat 2** 00:00:08:10-00:00:16:23: no vision description; probably a title card, fade, or logo
 - **Beat 21** 00:01:15:13-00:01:17:12: hand raised to mouth, slight facial movement
 - **Beat 23** 00:01:19:22-00:01:25:13: no vision description; probably a title card, fade, or logo
 - **Beat 24** 00:01:25:13-00:01:32:07: no vision description; probably a title card, fade, or logo
 ### Provisional (status `?`): please review
 | Beat | Score | Source In | Phase per vision |
 |-----:|------:|-----------|--------------------|
 |    1 | 0.380 | 00:00:04:09 |  |
 |    3 | 0.469 | 01:02:17:22 |  |
 |    4 | 0.647 | 01:02:21:01 |  |
 |    5 | 0.501 | 00:01:33:04 |  |
 |    6 | 0.548 | 00:01:03:06 |  |
 |    7 | 0.463 | 01:20:10:10 | man appears to be engaged in conversation |
 |    9 | 0.529 | 01:20:28:20 | speaking, transitioning from closed eyes to open mouth and focused gaze |
 |   10 | 0.635 | 01:20:35:16 | conversation |
 |   11 | 0.502 | 01:20:40:18 | static talking head with slight facial expression changes |
 |   12 | 0.558 | 01:14:26:01 | static profile shot transitioning to black/darkness |
 |   13 | 0.468 | 00:43:20:02 | static conversation; woman on right is standing and holding a cup |
 |   14 | 0.444 | 00:43:24:09 | static conversation, subject holding a white cup |
 |   15 | 0.467 | 00:02:10:11 | static conversation |
 |   16 | 0.613 | 01:05:12:16 | man reaches out and touches the red door with a small object |
 |   22 | 0.545 | 01:03:05:16 |  |
 ### Confirmed (status `OK`): can be used as is
 | Beat | Score | Source In | Phase per vision |
 |-----:|------:|-----------|--------------------|
 |    8 | 0.733 | 00:00:51:07 | static or slow drifting |
 |   17 | 0.684 | 01:31:22:10 | Static intimacy transitioning to a spatial arrangement of figures |
 |   18 | 0.668 | 00:09:13:12 | Woman in foreground turns her head from profile to face the camera while speaking |
 |   19 | 0.717 | 00:16:48:14 | static conversation, subtle facial expression change |
 |   20 | 0.663 | 01:28:04:17 | man kisses woman's forehead, then they pull back slightly to face each other |
 ## Review notes
 1. Source times should match the motion phase of the corresponding trailer beat. If they do not, shift the source in-point a few frames earlier or later within the same source scene.
 2. If the source clip is shorter than the trailer beat (source out minus source in is less than trailer out minus trailer in), the trailer beat contains a fade or title card; in the edit, fill the remainder with a fade to black or with source tail.
 3. `OK` beats are double-checked by CV plus the vision phase check; spot-check them anyway.
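Note 2 above can be checked mechanically from the table. The following is a minimal sketch, assuming non-drop timecodes labelled at a nominal 24 fps as in this report; `tc_to_frames` and `fill_frames` are hypothetical helpers for illustration and are not part of the tool.

```python
NOMINAL_FPS = 24  # assumption: 23.976 fps material labelled as 24 fps non-drop


def tc_to_frames(tc: str, fps: int = NOMINAL_FPS) -> int:
    """Convert 'hh:mm:ss:ff' into a frame count at the nominal rate."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f


def fill_frames(trailer_in: str, trailer_out: str, source_in: str, source_out: str) -> int:
    """Frames by which the source clip falls short of the trailer beat (0 if long enough)."""
    beat_len = tc_to_frames(trailer_out) - tc_to_frames(trailer_in)
    clip_len = tc_to_frames(source_out) - tc_to_frames(source_in)
    return max(0, beat_len - clip_len)


# Beat 17 from the table: the beat is 111 frames, the proposed clip 47 frames.
print(fill_frames("00:01:04:12", "00:01:09:03", "01:31:22:10", "01:31:24:09"))  # prints 64
```

A non-zero result is the number of frames that would have to be covered by a fade to black or by additional source tail.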
@@ -88,6 +88,20 @@ Wenn das fehlschlägt:
   exist; otherwise vision is called live (costs credits; needs network access).
 3. Restore `match_results.json.bak` if the cache has been corrupted.
 ## Current coverage (before the latest run)
 ```
 total beats: 25
 matched:     20 (5 confirmed, 15 provisional)
 unmatched:   beats 0, 2, 21, 23, 24
 ```
 Beat 0 is the SHO logo (no source match possible, which is correct).
 Beats 23 and 24 have no visible islands (end credits/title), so they are
 also correctly unmatched.
 Beat 2 and Beat 21 are the real recovery candidates; the new
 recovery stage tries to pick them up on the next `match` run.
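The coverage figures can be recomputed from the cache. This is a sketch that assumes each cached result is a dict with `beat_id` and `is_confirmed` (the keys the report generator in this commit reads); `coverage` is a hypothetical helper and the data below is a toy reconstruction of the numbers above, not the real cache.

```python
def coverage(beat_ids: list[int], results: list[dict]) -> dict:
    """Summarize matched / confirmed / provisional / unmatched beats."""
    by_id = {r["beat_id"]: r for r in results}
    matched = [b for b in beat_ids if b in by_id]
    confirmed = [b for b in matched if by_id[b].get("is_confirmed")]
    return {
        "total": len(beat_ids),
        "matched": len(matched),
        "confirmed": len(confirmed),
        "provisional": len(matched) - len(confirmed),
        "unmatched": [b for b in beat_ids if b not in by_id],
    }


# Toy data mirroring the block above: 25 beats, 5 unmatched, 5 confirmed.
unmatched_ids = {0, 2, 21, 23, 24}
confirmed_ids = {8, 17, 18, 19, 20}
results = [
    {"beat_id": b, "is_confirmed": b in confirmed_ids}
    for b in range(25)
    if b not in unmatched_ids
]
summary = coverage(list(range(25)), results)
print(summary)
```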
 ## Open risks / known weaknesses
 - The threshold `0.06` for "beat context wins" in `realign_window` is
@@ -1,27 +1,63 @@
 # AI Trailer Generator v2
-**Frame-accurate trailer reconstruction via pure Computer Vision**
-> Feed in a reference trailer and its source film, and get back a finished FCPXML/EDL that rebuilds the trailer frame-accurately from the source film.
+**Frame-accurate reconstruction of a trailer from the source film.**
+You put two videos in, a reference trailer and the matching
+feature film, and get back a finished FCPXML/EDL for your edit suite that
+rebuilds the trailer beat by beat from the source film.
 ---
-## The core principle
-By default, no LLM is used for visual matching. Optionally, a vision layer can
-supply cached 3-frame descriptions as additional search anchors; the final
-match nevertheless stays CV-verified.
-| Phase | What happens | Technology |
-|-------|-------------|-------------|
-| **0: Prep** | Analyze the reference trailer and extract beats | PySceneDetect + OpenCV |
-| **1: Global Scan** | Scan the entire source film against all beats via an FFmpeg stream (2 FPS) | FFmpeg pipe + luma histogram |
-| **1b: Optional Vision Seeds** | Cache 3-frame descriptions for uncertain top-K scenes | OpenAI-compatible vision LLM |
-| **2: Refine** | Refine the best hits down to frame level | OpenCV `matchTemplate` |
-| **3: Dramaturgy** | Narrative BeatType classification from dialogue text | OpenRouter LLM |
-| **4: Export** | Timeline → FCPXML 1.10 or CMX 3600 EDL | xml.etree + custom timecode layer |
-**Text-Safe Crop:** The top 15% and bottom 30% of the frame are masked before every comparison to ignore title cards, logos, and letterbox.
+## For the editor (Cutter): what you actually need
+You do **not have to operate this tool yourself** and you do **not need to know Python**.
+What you get are two files to work with:
+1. **`CUTTER_REPORT.md`**: the table for manual review and
+   recutting. For each beat it lists:
+   - the trailer timecode (hh:mm:ss:ff),
+   - the proposed source timecode from the feature film,
+   - a status: `OK` (can be used as is), `?` (please review), or
+     `MAN.` (no match, set manually),
+   - a short description of what is visible in the trailer beat (so you
+     can find the right spot in the source faster).
+2. **`output/*.fcpxml`** and **`output/*.edl`**: the finished timeline for
+   FCP / Premiere / Avid / Resolve. Beats with status `OK` are already placed
+   correctly there; `?` and `MAN.` beats you have to check or set yourself in the NLE.
+**Recommended workflow:**
+1. Open `CUTTER_REPORT.md` and work through the table from top to bottom.
+2. Import the FCPXML/EDL into the NLE and load the trailer and the feature film alongside it.
+3. For `OK` beats, spot-check only.
+4. For `?` beats, check the preview clip from the report HTML (see below)
+   and shift the source in-point a few frames earlier or later in the NLE until the
+   motion phase matches the trailer exactly.
+5. For `MAN.` beats, find the matching spot in the feature film yourself; the
+   description in the report tells you what to look for.
+Everything below is background for whoever maintains the tool.
+---
+## How the tool finds the matches (short version)
+| Phase | What happens |
+|-------|--------------|
+| **0** | Split the trailer into beats (PySceneDetect). |
+| **1** | Quick vibe check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
+| **2** | Optional: a vision LLM describes uncertain scenes from 3-frame samples; the descriptions are cached. |
+| **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
+| **4** | Phase repair: for segmented beats, the motion phase in the source is aligned with the visible trailer phase. |
+| **5** | Recovery: beats without a match get another attempt via a vision phase search in the top-K scenes. |
+| **6** | Export as FCPXML 1.10 or CMX 3600 EDL, plus `CUTTER_REPORT.md`. |
+**Text-Safe Crop:** The top 15 % and bottom 30 % of every frame are masked
+before the comparison so that title cards, logos, and letterbox do not
+distort the matches.
+**Important:** Even with vision enabled, the final match stays
+CV-verified. The LLM only supplies additional search anchors.
 ---
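The Text-Safe Crop rule in the README text above amounts to keeping a horizontal band of each frame. Below is a minimal sketch in pure Python, assuming a row-major frame (a list of pixel rows); `text_safe_rows` and `text_safe_crop` are hypothetical names, and the tool's own implementation is not shown in this diff.

```python
def text_safe_rows(height: int, top_pct: int = 15, bottom_pct: int = 30) -> tuple[int, int]:
    """Return (first_row, end_row) of the band that is kept for comparison."""
    first = height * top_pct // 100
    end = height - height * bottom_pct // 100
    return first, end


def text_safe_crop(frame: list[list[int]]) -> list[list[int]]:
    """Drop the top 15 % and bottom 30 % of rows from a row-major frame."""
    first, end = text_safe_rows(len(frame))
    return frame[first:end]


print(text_safe_rows(1080))  # prints (162, 756)
```

The same `frame[first:end]` slice would apply unchanged to a NumPy image array, since slicing the first axis selects rows.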
@@ -310,6 +346,21 @@ while establishing the connection. If vision verification nevertheless fails permanently during the final
 filter/repair stage, the previously cached match for this beat is kept
 instead of being discarded: a network problem must not delete an
 already correct match from the cache.
 Phase repair on existing matches no longer runs only in "long" source
 scenes, but wherever the scene carries more than just the segment window
 (scene longer than the segment by more than 0.5 s). A corrected position is
 accepted as soon as it passes the image-content validation AND does not score
 noticeably worse than the original (a loss of at most 0.02). Already confirmed
 matches in tightly cut scenes (scene no longer than max(1.6 x match duration, 6 s))
 are deliberately left untouched, so that a good match is not swapped for a
 nominally equivalent alternative.
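The two gates described in this paragraph can be sketched as small predicates. The constants (0.5 s slack, factor 1.6, 6.0 s floor, 0.02 score loss, one-frame movement) are taken from the cli.py hunks in this diff; the function names are hypothetical, and the image-content validation that must also pass is not modeled here.

```python
def should_try_realign(scene_dur_s: float, match_dur_s: float, is_confirmed: bool) -> bool:
    """Realign only if the scene has slack and the match is not a confirmed one in a tight scene."""
    wide_scene = scene_dur_s > match_dur_s + 0.5
    confirmed_in_tight_scene = is_confirmed and scene_dur_s <= max(match_dur_s * 1.6, 6.0)
    return wide_scene and not confirmed_in_tight_scene


def accept_realign(old_in_s: float, new_in_s: float, old_score: float,
                   new_score: float, fps: float, max_loss: float = 0.02) -> bool:
    """Accept a repair that moved by more than one frame and lost at most max_loss of score."""
    moved = abs(new_in_s - old_in_s) > 1.0 / fps
    not_worse = new_score >= old_score - max_loss
    return moved and not_worse


print(should_try_realign(5.0, 2.0, is_confirmed=True))    # tight scene, confirmed
print(accept_realign(10.0, 10.5, 0.60, 0.59, fps=23.976))  # moved, small loss
```

Under these assumptions the first call reports that a confirmed match in a 5 s scene is left alone, and the second that a half-second shift with a 0.01 score loss would be accepted.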
 Beats that end up as neither a full match nor a segment match after the CV
 run then go through a recovery stage: the vibe check (histogram/pHash)
 supplies the top-K candidate scenes, the semantic action-window search checks
 the phase of the visible part of the trailer beat within them, and the CV
 aligner sets the in-point frame-accurately. A candidate is accepted only if
 it passes the same vision phase validation as the main path. Beats without
 visible image material (logos, title cards, continuous fades) are expected to
 stay unmatched: the strict vision phase validation rejects such material.
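How a recovery candidate is scored and selected can be sketched as follows. The blend weights (0.65 / 0.18 / 0.09 / 0.08) and the 0.99 cap mirror the recovery function in the cli.py hunk of this diff; the `0.35` provisional threshold is an assumed placeholder (the real value lives in the config and is not shown), and both function names are hypothetical.

```python
def recovery_score(combined: float, semantic: float, motion: float,
                   content: float, usable: float) -> float:
    """Take the better of the CV score and a capped, semantics-weighted blend."""
    blended = semantic * 0.65 + motion * 0.18 + content * 0.09 + usable * 0.08
    return max(combined, min(0.99, blended))


def pick_best(candidates: list[tuple[float, str]], provisional_threshold: float):
    """Return the highest-scoring validated candidate at or above the threshold, else None."""
    eligible = [c for c in candidates if c[0] >= provisional_threshold]
    return max(eligible, key=lambda c: c[0], default=None)


# Candidates are (score, label) pairs that already passed vision validation.
score = recovery_score(combined=0.55, semantic=0.8, motion=0.5, content=0.4, usable=0.6)
best = pick_best([(0.30, "scene 12"), (score, "scene 308"), (0.50, "scene 75")],
                 provisional_threshold=0.35)  # 0.35 is an assumed value
print(round(score, 3), best[1])
```

Because the weights sum to 1.0, the blended score stays in the same 0..1 range as the CV score it is compared against.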
 Long trailer beats are no longer automatically validated over their entire
 beat length against a single source clip. As soon as a sustained fade to
 black or a title/credit island begins after a visible source section,
@@ -92,6 +92,22 @@ def _save_results(results: list, cfg: "AppConfig") -> None:  # type: ignore[name
    logging.getLogger(__name__).info("Match results cached → %s", p)
 def _regenerate_cutter_report(cfg: "AppConfig") -> None:  # type: ignore[name-defined]
    """Re-render CUTTER_REPORT.md after each cache write so it stays in sync."""
    try:
        from scripts.generate_cutter_report import render_report
    except Exception as exc:
        logging.getLogger(__name__).warning("Cutter report regen skipped: %s", exc)
        return
    try:
        project_root = cfg.paths.cache_dir.parent
        out = project_root / "CUTTER_REPORT.md"
        out.write_text(render_report(project_root), encoding="utf-8")
        logging.getLogger(__name__).info("Cutter report regenerated → %s", out)
    except Exception as exc:
        logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
 def _load_results(cfg: "AppConfig") -> list:  # type: ignore[name-defined]
    from src.core.models import MatchResult, MatchSegment
    p = _results_cache_path(cfg)
@@ -632,6 +648,171 @@ def _merge_best_results(existing: list, candidates: list, cfg) -> list:
    return sorted(by_id.values(), key=lambda r: r.beat_id)
 def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
    """Try a vision-led search for beats that ended up without a match.
    For each unmatched beat, this pass anchors on the longest scoreable
    visual island (falling back to the full beat when none is found) and:
      1. Asks the vibe-check (CV histogram + pHash) for the top-K candidate
         scenes.
      2. For each candidate, runs the semantic action-window search with the
         beat's own description, preferring windows whose phase matches the
         visible part of the beat.
      3. Refines the in-point with the regular CV content/motion aligner.
      4. Validates the resulting window with the vision phase check, exactly
         like the main filter.
      5. Adds the best validated candidate as a MatchResult (confirmed if its
         score reaches the match threshold, otherwise provisional).
    Confirmed and provisional matches both stay subject to the same thresholds
    used elsewhere; this only adds matches that pass the same quality gates.
    """
    if not cfg.vision.enabled or not beats:
        return results
    from dataclasses import replace
    from src.cv.global_scan import align_in_point_by_content_and_motion, estimate_usable_source_duration
    from src.cv.scene_indexer import build_scene_index
    from src.cv.vibe_check import run_vibe_check
    from src.core.models import MatchResult
    from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision
    logger = logging.getLogger(__name__)
    matched_ids = {r.beat_id for r in results}
    unmatched = [b for b in beats if b.beat_id not in matched_ids]
    if not unmatched:
        return results
    scenes = build_scene_index(cfg)
    if not scenes:
        return results
    new_results = list(results)
    for beat in unmatched:
        try:
            islands = _reference_scoreable_segments(beat, cfg)
        except Exception:
            islands = []
        # Anchor selection: prefer the longest visible island; if none exists,
        # fall back to the full beat. The latter handles dark / low-contrast
        # close-ups that drop below the scoreable luma/contrast thresholds but
        # are still semantically describable. The strict vision phase
        # validation later in this pass keeps us from accepting pure title-card
        # or logo material.
        if islands:
            anchor_start_s, anchor_end_s = max(islands, key=lambda iv: iv[1] - iv[0])
            anchor_beat = replace(
                beat,
                start_s=beat.start_s + anchor_start_s,
                end_s=beat.start_s + anchor_end_s,
            )
        else:
            anchor_beat = beat
        try:
            hits = run_vibe_check(
                beat,
                scenes,
                top_k=max(cfg.cv.deep_scan.scene_seed_top_k, cfg.cv.vibe_check.top_k_candidates),
                hist_method=cfg.cv.vibe_check.hist_compare_method,
                phash_max_distance=64,
            )
        except Exception as exc:
            logger.warning("Beat %d: recovery vibe-check failed (%s)", beat.beat_id, exc)
            continue
        scenes_by_id = {s.scene_id: s for s in scenes}
        best = None  # (score, scene, in_s, dur_s, reason)
        seen = set()
        for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
            scene = scenes_by_id.get(hit.scene_id)
            if scene is None or scene.scene_id in seen:
                continue
            seen.add(scene.scene_id)
            try:
                found = find_action_window_in_scene(anchor_beat, scene, cfg)
            except Exception as exc:
                logger.debug("Beat %d: action window failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
                continue
            if found is None:
                continue
            start_s, end_s, semantic_score, reason = found
            window_s = max(3.0, min(8.0, (end_s - start_s) * 4.0))
            try:
                aligned_in_s, combined_score, content_score, motion_score = align_in_point_by_content_and_motion(
                    anchor_beat,
                    start_s,
                    cfg,
                    search_window_s=window_s,
                )
            except Exception as exc:
                logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
                continue
            aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
            try:
                usable_duration_s, usable_score = estimate_usable_source_duration(anchor_beat, aligned_in_s, cfg)
            except Exception:
                usable_duration_s, usable_score = anchor_beat.duration_s, 0.0
            usable_duration_s = max(0.0, min(anchor_beat.duration_s, usable_duration_s))
            if usable_duration_s < max(0.32, anchor_beat.duration_s * 0.45):
                usable_duration_s = anchor_beat.duration_s
            try:
                ok, verify_reason = validate_match_window_with_vision(
                    anchor_beat,
                    source_path=scene.source_path,
                    scene_id=scene.scene_id,
                    in_point_s=aligned_in_s,
                    out_point_s=aligned_in_s + usable_duration_s,
                    cfg=cfg,
                )
            except Exception as exc:
                logger.debug("Beat %d: validate failed scene=%d (%s)", beat.beat_id, scene.scene_id, exc)
                continue
            if not ok:
                continue
            final_score = max(
                combined_score,
                min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
            )
            if final_score < cfg.cv.deep_scan.provisional_match_threshold:
                continue
            candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
            if best is None or candidate[0] > best[0]:
                best = candidate
        if best is None:
            continue
        score, scene, aligned_in_s, usable_duration_s, repair_reason = best
        logger.info(
            "Beat %d: recovered via vision action search scene=%d in=%.3fs score=%.3f (%s)",
            beat.beat_id,
            scene.scene_id,
            aligned_in_s,
            score,
            repair_reason,
        )
        new_results.append(MatchResult(
            beat_id=beat.beat_id,
            scene_id=scene.scene_id,
            source_path=scene.source_path,
            in_point_s=aligned_in_s,
            out_point_s=aligned_in_s + usable_duration_s,
            in_point_frame=int(aligned_in_s * cfg.export.edl_frame_rate),
            match_score=score,
            match_location=(0, 0),
            is_confirmed=score >= cfg.cv.deep_scan.match_threshold,
            segments=tuple(),
        ))
    return sorted(new_results, key=lambda r: r.beat_id)
 def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
    """Drop vision-enabled matches whose final action phase contradicts the beat."""
    if not cfg.vision.enabled or not results:
@@ -785,7 +966,16 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
                changed = False
                for segment in result.segments:
                    scene = scenes_by_id.get(segment.scene_id)
-                    if scene is None or scene.duration_s <= max(segment.duration_s * 1.6, 6.0):
+                    # Allow phase-realign whenever the scene has any meaningful
                    # slack beyond the segment, not only for "long" scenes.
                    # Short scenes don't need realigning because the segment
                    # essentially is the scene.
                    if scene is None or scene.duration_s <= segment.duration_s + 0.5:
                        new_segments.append(segment)
                        continue
                    # For already-confirmed segments, skip the realign to avoid
                    # destabilizing a strong original match.
                    if segment.is_confirmed and scene.duration_s <= max(segment.duration_s * 1.6, 6.0):
                        new_segments.append(segment)
                        continue
                    segment_beat = replace(
@@ -801,6 +991,11 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
                    if abs(aligned_in_s - segment.in_point_s) <= 1.0 / cfg.export.edl_frame_rate:
                        new_segments.append(segment)
                        continue
                    # Don't commit a repair that scores meaningfully worse than
                    # the original; phase realign should improve, not regress.
                    if score < segment.match_score - 0.02:
                        new_segments.append(segment)
                        continue
                    changed = True
                    repair_reasons.append(repair_reason)
                    new_segments.append(replace(
@@ -833,11 +1028,22 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
                    repaired = True
            else:
                scene = scenes_by_id.get(result.scene_id)
-                if scene is not None and scene.duration_s > max(result.duration_s * 1.6, 6.0):
+                wide_scene = (
                    scene is not None
                    and scene.duration_s > result.duration_s + 0.5
                )
                already_confirmed_in_tight_scene = (
                    result.is_confirmed
                    and scene is not None
                    and scene.duration_s <= max(result.duration_s * 1.6, 6.0)
                )
                if wide_scene and not already_confirmed_in_tight_scene:
                    repair = realign_window(beat, result.scene_id)
                    if repair is not None:
                        repair_scene, aligned_in_s, usable_duration_s, score, repair_reason = repair
-                        if abs(aligned_in_s - result.in_point_s) > 1.0 / cfg.export.edl_frame_rate:
+                        moved = abs(aligned_in_s - result.in_point_s) > 1.0 / cfg.export.edl_frame_rate
                        improved = score >= result.match_score - 0.02
                        if moved and improved:
                            logger.info(
                                "Beat %d: realigned semantically valid long scene by motion/action window (%s)",
                                result.beat_id,
@@ -1271,6 +1477,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
    )
    results = _attach_visual_segments(results, beats, cfg)
    results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
    results = _recover_unmatched_beats_via_vision(results, beats, cfg)
    # A targeted one-beat match should improve the cache without deleting
    # automatic matches for other beats.
@@ -1283,6 +1490,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
        results_to_save = results
    _save_results(results_to_save, cfg)
    _regenerate_cutter_report(cfg)
    print(f"\n✅  {len(results)} / {len(beats)} beats matched.")
    for r in results:
@@ -0,0 +1,207 @@
 """
 scripts/generate_cutter_report.py — generate CUTTER_REPORT.md from current cache
 Regenerates CUTTER_REPORT.md from .cache/match_results.json,
 .cache/trailer_beats.json and .cache/vision_descriptions.json. The report is a
 hand-off document for a video editor (Cutter) doing the manual recut: it lists,
 per beat, the trailer position, the proposed source position in SMPTE
 timecodes, the match score, and what the vision model saw in the trailer beat.
 Usage (from project root):
    python scripts/generate_cutter_report.py
 Run this any time after `python cli.py match` to keep CUTTER_REPORT.md in sync
 with the latest cache.
 """
 from __future__ import annotations
 import json
 import re
 import sys
 from datetime import date
 from pathlib import Path
 def smpte(t: float | None, fps: float) -> str:
    # Non-drop SMPTE timecode: frames are counted at the true rate (e.g. 23.976)
    # and labelled at the nominal integer rate (24), as an NLE numbers them.
    if t is None:
        return "--:--:--:--"
    nominal = int(round(fps))
    total = int(round(t * fps))
    h = total // (3600 * nominal)
    m = (total // (60 * nominal)) % 60
    s = (total // nominal) % 60
    f = total % nominal
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"
 def best_beat_description(items: dict, beat_id: int, start_s: float, end_s: float) -> str | None:
    # Keys look like "beat:<id>:<start_s>:<end_s>"; pick the entry whose
    # time range is closest to the requested beat range.
    best, best_diff = None, 1e9
    for key, value in items.items():
        if not key.startswith(f"beat:{beat_id}:") or not isinstance(value, dict):
            continue
        try:
            parts = key.split(":")
            ks, ke = float(parts[2]), float(parts[3])
        except (IndexError, ValueError):
            continue
        diff = abs(ks - start_s) + abs(ke - end_s)
        if diff < best_diff:
            best_diff = diff
            best = value
    return best.get("description", "") if best else None
 def parse_field(desc: str | None, key: str) -> str:
    # Extract a "key": "value" pair from the JSON-like description text.
    if not desc:
        return ""
    match = re.search(rf'"{re.escape(key)}"\s*:\s*"([^"]+)"', desc)
    return match.group(1) if match else ""
 def render_report(project_root: Path) -> str:
    sys.path.insert(0, str(project_root))
    from src.core.config import load_config
    cfg = load_config(project_root / "config.toml")
    # Pass the true frame rate; smpte() derives the nominal rate itself.
    fps = float(cfg.export.edl_frame_rate)
    cache = project_root / ".cache"
    results = {r["beat_id"]: r for r in json.loads((cache / "match_results.json").read_text())}
    beats = json.loads((cache / "trailer_beats.json").read_text())
    vis_path = cache / "vision_descriptions.json"
    vis_items = json.loads(vis_path.read_text())["items"] if vis_path.exists() else {}
    lines: list[str] = []
    lines.append("# Cutter report: manual recut")
    lines.append("")
    lines.append(
        f"As of: {date.today().isoformat()}. Frame rate: {cfg.export.edl_frame_rate} fps. "
        f"Source: {Path(cfg.paths.source_movie).name}; trailer: {Path(cfg.paths.reference_trailer).name}."
    )
    lines.append("")
    lines.append(
        "This file is generated from the match cache. "
        "It is regenerated automatically after every successful `python cli.py match`; "
        "to rebuild it by hand, run `python scripts/generate_cutter_report.py`."
    )
    lines.append("")
    lines.append("## How to read this table")
    lines.append("")
    lines.append("- **Beat**: number of the beat in the reference trailer.")
    lines.append("- **Trailer In/Out**: SMPTE position of the beat in the trailer (hh:mm:ss:ff).")
    lines.append("- **Source In/Out**: proposed position in the source film. For `MAN.`, pick it yourself.")
    lines.append("- **Scene**: ID of the source scene from PySceneDetect (for debugging only).")
    lines.append("- **Score**: 0..1, higher is better. A score >= 0.65 counts as confirmed.")
    lines.append("- **Status**:")
    lines.append("    - `OK`: confirmed by CV plus the vision phase check; can be used as is (a spot check is still recommended).")
    lines.append("    - `?`: provisional; the scene is presumed correct but the score is below 0.65. Check the motion phase in the preview clip and shift by a few frames if needed.")
    lines.append("    - `MAN.`: no automatic match; either find the spot manually or treat the beat as a fade to black / title.")
    lines.append("- **Phase** (last table column): what is visible in the trailer beat, taken from the vision description. It helps you find the right spot in the source.")
    lines.append("")
    matched = sum(1 for b in beats if b["beat_id"] in results)
    confirmed = sum(1 for b in beats if b["beat_id"] in results and results[b["beat_id"]]["is_confirmed"])
    lines.append("## Status overview")
    lines.append("")
    lines.append(f"- **Beats total**: {len(beats)}")
    lines.append(f"- **Found automatically**: {matched} ({confirmed} of them confirmed)")
    lines.append(f"- **To be set manually**: {len(beats) - matched}")
    lines.append("")
    lines.append("## Beat table")
    lines.append("")
    lines.append("| Beat | Trailer In / Out | Source In / Out | Scene | Score | Status | What is visible in the frame |")
    lines.append("|-----:|------------------|------------------|------:|------:|:------:|---------------------------|")
    def status_for(rec: dict | None) -> str:
        if rec is None:
            return "MAN."
        return "OK" if rec.get("is_confirmed") else "?"
    for beat in beats:
        bid = beat["beat_id"]
        rec = results.get(bid)
        ti, to = smpte(beat["start_s"], fps), smpte(beat["end_s"], fps)
        if rec is not None:
            si, so = smpte(rec["in_point_s"], fps), smpte(rec["out_point_s"], fps)
            scn = rec["scene_id"]
            sc = rec["match_score"]
        else:
            si = so = "—"
            scn = "—"
            sc = 0.0
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        # Fall back to "" so the slice cannot fail when neither field is present.
        phase = (parse_field(desc, "action_phase") or parse_field(desc, "subject") or "")[:90]
        lines.append(f"| {bid:>4} | {ti}-{to} | {si}-{so} | {scn} | {sc:.3f} | {status_for(rec)} | {phase} |")
    lines.append("")
    lines.append("## Beats die manuelle Aufmerksamkeit brauchen")
    lines.append("")
    lines.append("### Manuell setzen (Status `MAN.`)")
    lines.append("")
    for beat in beats:
        bid = beat["beat_id"]
        if bid in results:
            continue
        ti, to = smpte(beat["start_s"], fps), smpte(beat["end_s"], fps)
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        phase = parse_field(desc, "action_phase")
        note = phase or "keine Vision-Beschreibung — vermutlich Title-Card / Fade / Logo"
        lines.append(f"- **Beat {bid}** {ti}-{to}: {note}")
    lines.append("")
    lines.append("### Vorlaeufig (Status `?`) — bitte sichten")
    lines.append("")
    lines.append("| Beat | Score | Source In | Phase laut Vision |")
    lines.append("|-----:|------:|-----------|--------------------|")
    for beat in beats:
        bid = beat["beat_id"]
        rec = results.get(bid)
        if rec is None or rec.get("is_confirmed"):
            continue
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        # Fall back to "" so phase[:90] below cannot fail on a missing field.
        phase = parse_field(desc, "action_phase") or ""
        lines.append(f"| {bid:>4} | {rec['match_score']:.3f} | {smpte(rec['in_point_s'], fps)} | {phase[:90]} |")
    lines.append("")
    lines.append("### Bestaetigt (Status `OK`) — kann uebernommen werden")
    lines.append("")
    lines.append("| Beat | Score | Source In | Phase laut Vision |")
    lines.append("|-----:|------:|-----------|--------------------|")
    for beat in beats:
        bid = beat["beat_id"]
        rec = results.get(bid)
        if rec is None or not rec.get("is_confirmed"):
            continue
        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
        # Fall back to "" so phase[:90] below cannot fail on a missing field.
        phase = parse_field(desc, "action_phase") or ""
        lines.append(f"| {bid:>4} | {rec['match_score']:.3f} | {smpte(rec['in_point_s'], fps)} | {phase[:90]} |")
    lines.append("")
    lines.append("## Hinweise zur Pruefung")
    lines.append("")
    lines.append(
        "1. Source-Times sollten zur jeweiligen Trailer-Bewegungsphase passen. "
        "Wenn nicht: Source-In innerhalb derselben Source-Szene wenige Frames vor/zurueck verschieben."
    )
    lines.append(
        "2. Wenn der Source-Clip kuerzer ist als der Trailerbeat (Source-Out < Trailer-Out gerechnet ab Source-In), "
        "enthaelt der Trailerbeat eine Blende/Titelkarte; im Schnitt mit Schwarzfade oder Source-Tail auffuellen."
    )
    lines.append(
        "3. `OK`-Beats sind durch CV + Vision-Phasenpruefung doppelt verifiziert; trotzdem stichprobenartig sichten."
    )
    lines.append("")
    return "\n".join(lines)


def main() -> int:
    here = Path(__file__).resolve().parent
    project_root = here.parent
    out = project_root / "CUTTER_REPORT.md"
    out.write_text(render_report(project_root), encoding="utf-8")
    print(f"Wrote {out}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
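
The report's Trailer and Source columns depend on the `smpte(seconds, fps)` helper, which is defined outside this excerpt. As a rough illustration of what such a conversion could look like, here is a minimal sketch that assumes non-drop-frame timecode and the `h:mm:ss:ff` layout named in the report legend. The function name `seconds_to_timecode` is hypothetical, and the project's actual helper may round or format differently.

```python
def seconds_to_timecode(seconds: float, fps: float) -> str:
    """Hypothetical stand-in for smpte(): non-drop-frame h:mm:ss:ff."""
    # Non-drop-frame timecode labels frames at the nominal integer rate,
    # so 23.976 fps material is counted with 24 frame labels per timecode second.
    nominal = round(fps)
    # Absolute frame index at the real frame rate.
    total_frames = round(seconds * fps)
    frames = total_frames % nominal
    total_seconds = total_frames // nominal
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}:{frames:02d}"


if __name__ == "__main__":
    print(seconds_to_timecode(100.0, 23.976))
```

Because the frame labels run at 24 per timecode second while the material plays at 23.976 fps, the displayed timecode drifts slowly against wall-clock time. That is expected for non-drop-frame timecode and matches what an NLE shows for 23.976 fps footage.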