Add cutter report and auto-regen on each match

- New CUTTER_REPORT.md: per-beat hand-off table for the video editor doing the manual recut. Per beat: trailer SMPTE in/out, source SMPTE in/out, scene id, score, status (OK / ? / MAN.), and a one-line phase description from the cached vision text.
- New scripts/generate_cutter_report.py: pure renderer that reads the current cache (match_results.json + trailer_beats.json + optional vision_descriptions.json) and writes CUTTER_REPORT.md. No side effects on the cache.
- cli.py: after every successful match the cutter report is regenerated automatically (best-effort; failures are logged and do not abort).
- README.md: new top-section "Fuer den Cutter" describing exactly what the editor needs (which two files to look at, how the status flag works, the recommended NLE workflow). The technical algorithm description follows below.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Broaden phase realign and add unmatched-beat recovery
2026-05-04 13:09:16 +02:00 · 2026-05-04 07:12:20 +02:00
5 changed files with 598 additions and 18 deletions
@@ -0,0 +1,100 @@
+# Cutter report: manual recut
+
+As of: 2026-05-04. Frame rate: 23.976 fps. Source: BehindTheRedDoor_FTR_1080P_2398_Fixed.mp4, trailer: BehindTheRedDoor_Trailer_REFERENCE.mp4.
+
+This file is generated automatically from the match cache. It is re-rendered after every successful `python cli.py match`; run `python scripts/generate_cutter_report.py` to regenerate it by hand.
+
+## How to read this table
+
+- **Beat**: number of the beat in the reference trailer.
+- **Trailer In/Out**: SMPTE position of the beat in the trailer (hh:mm:ss:ff).
+- **Source In/Out**: proposed position in the source film. For `MAN.`, pick it yourself.
+- **Scene**: ID of the source scene from PySceneDetect (for debugging only).
+- **Score**: 0..1, higher is better. Matches with >=0.65 count as confirmed.
+- **Status**:
+    - `OK`: confirmed by CV plus the vision phase check; can be used as-is (spot checks are still recommended).
+    - `?`: provisional; right scene but score below 0.65. Check the motion phase in the preview clip and shift by a few frames if needed.
+    - `MAN.`: no automatic match; either find it manually or treat it as a fade to black / title.
+- **Phase**: what is visible in the trailer beat (from the vision description). Helps you find the right spot in the source.
+
+## Status overview
+
+- **Total beats**: 25
+- **Found automatically**: 20 (5 of them confirmed)
+- **To be set manually**: 5
+
+## Beat table
+
+| Beat | Trailer In / Out | Source In / Out | Scene | Score | Status | What is visible in the frame |
+|-----:|------------------|------------------|------:|------:|:------:|---------------------------|
+|    0 | 00:00:00:00-00:00:03:00 | —-— | — | 0.000 | MAN. | logo animation assembling from distorted shapes with motion blur |
+|    1 | 00:00:03:00-00:00:08:10 | 00:00:04:09-00:00:06:03 | 1 | 0.380 | ? |  |
+|    2 | 00:00:08:10-00:00:16:23 | —-— | — | 0.000 | MAN. |  |
+|    3 | 00:00:16:23-00:00:19:03 | 01:02:17:22-01:02:19:14 | 436 | 0.469 | ? |  |
+|    4 | 00:00:19:03-00:00:20:15 | 01:02:21:01-01:02:22:10 | 437 | 0.647 | ? |  |
+|    5 | 00:00:20:15-00:00:26:09 | 00:01:33:04-00:01:37:10 | 10 | 0.501 | ? |  |
+|    6 | 00:00:26:09-00:00:29:06 | 00:01:03:06-00:01:05:21 | 5 | 0.548 | ? |  |
+|    7 | 00:00:29:06-00:00:31:16 | 01:20:10:10-01:20:12:16 | 553 | 0.463 | ? | man appears to be engaged in conversation |
+|    8 | 00:00:31:16-00:00:33:15 | 00:00:51:07-00:00:53:01 | 5 | 0.733 | OK | static or slow drifting |
+|    9 | 00:00:33:15-00:00:36:18 | 01:20:28:20-01:20:31:17 | 557 | 0.529 | ? | speaking, transitioning from closed eyes to open mouth and focused gaze |
+|   10 | 00:00:36:18-00:00:40:02 | 01:20:35:16-01:20:39:00 | 558 | 0.635 | ? | conversation |
+|   11 | 00:00:40:02-00:00:42:03 | 01:20:40:18-01:20:42:18 | 559 | 0.502 | ? | static talking head with slight facial expression changes |
+|   12 | 00:00:42:03-00:00:50:06 | 01:14:26:01-01:14:29:10 | 519 | 0.558 | ? | static profile shot transitioning to black/darkness |
+|   13 | 00:00:50:06-00:00:53:20 | 00:43:20:02-00:43:23:10 | 308 | 0.468 | ? | static conversation; woman on right is standing and holding a cup |
+|   14 | 00:00:53:20-00:00:57:02 | 00:43:24:09-00:43:27:04 | 309 | 0.444 | ? | static conversation, subject holding a white cup |
+|   15 | 00:00:57:02-00:01:01:12 | 00:02:10:11-00:02:12:16 | 0 | 0.467 | ? | static conversation |
+|   16 | 00:01:01:12-00:01:04:12 | 01:05:12:16-01:05:15:06 | 451 | 0.613 | ? | man reaches out and touches the red door with a small object |
+|   17 | 00:01:04:12-00:01:09:03 | 01:31:22:10-01:31:24:09 | 623 | 0.684 | OK | Static intimacy transitioning to a spatial arrangement of figures |
+|   18 | 00:01:09:03-00:01:10:18 | 00:09:13:12-00:09:14:19 | 75 | 0.668 | OK | Woman in foreground turns her head from profile to face the camera while speaking |
+|   19 | 00:01:10:18-00:01:12:12 | 00:16:48:14-00:16:49:15 | 126 | 0.717 | OK | static conversation, subtle facial expression change |
+|   20 | 00:01:12:12-00:01:15:13 | 01:28:04:17-01:28:05:14 | 613 | 0.663 | OK | man kisses woman's forehead, then they pull back slightly to face each other |
+|   21 | 00:01:15:13-00:01:17:12 | —-— | — | 0.000 | MAN. | hand raised to mouth, slight facial movement |
+|   22 | 00:01:17:12-00:01:19:22 | 01:03:05:16-01:03:07:10 | 442 | 0.545 | ? |  |
+|   23 | 00:01:19:22-00:01:25:13 | —-— | — | 0.000 | MAN. |  |
+|   24 | 00:01:25:13-00:01:32:07 | —-— | — | 0.000 | MAN. |  |
+
+## Beats that need manual attention
+
+### Set manually (status `MAN.`)
+
+- **Beat 0** 00:00:00:00-00:00:03:00: logo animation assembling from distorted shapes with motion blur
+- **Beat 2** 00:00:08:10-00:00:16:23: no vision description, probably a title card / fade / logo
+- **Beat 21** 00:01:15:13-00:01:17:12: hand raised to mouth, slight facial movement
+- **Beat 23** 00:01:19:22-00:01:25:13: no vision description, probably a title card / fade / logo
+- **Beat 24** 00:01:25:13-00:01:32:07: no vision description, probably a title card / fade / logo
+
+### Provisional (status `?`): please review
+
+| Beat | Score | Source In | Phase per vision |
+|-----:|------:|-----------|--------------------|
+|    1 | 0.380 | 00:00:04:09 |  |
+|    3 | 0.469 | 01:02:17:22 |  |
+|    4 | 0.647 | 01:02:21:01 |  |
+|    5 | 0.501 | 00:01:33:04 |  |
+|    6 | 0.548 | 00:01:03:06 |  |
+|    7 | 0.463 | 01:20:10:10 | man appears to be engaged in conversation |
+|    9 | 0.529 | 01:20:28:20 | speaking, transitioning from closed eyes to open mouth and focused gaze |
+|   10 | 0.635 | 01:20:35:16 | conversation |
+|   11 | 0.502 | 01:20:40:18 | static talking head with slight facial expression changes |
+|   12 | 0.558 | 01:14:26:01 | static profile shot transitioning to black/darkness |
+|   13 | 0.468 | 00:43:20:02 | static conversation; woman on right is standing and holding a cup |
+|   14 | 0.444 | 00:43:24:09 | static conversation, subject holding a white cup |
+|   15 | 0.467 | 00:02:10:11 | static conversation |
+|   16 | 0.613 | 01:05:12:16 | man reaches out and touches the red door with a small object |
+|   22 | 0.545 | 01:03:05:16 |  |
+
+### Confirmed (status `OK`): ready to use
+
+| Beat | Score | Source In | Phase per vision |
+|-----:|------:|-----------|--------------------|
+|    8 | 0.733 | 00:00:51:07 | static or slow drifting |
+|   17 | 0.684 | 01:31:22:10 | Static intimacy transitioning to a spatial arrangement of figures |
+|   18 | 0.668 | 00:09:13:12 | Woman in foreground turns her head from profile to face the camera while speaking |
+|   19 | 0.717 | 00:16:48:14 | static conversation, subtle facial expression change |
+|   20 | 0.663 | 01:28:04:17 | man kisses woman's forehead, then they pull back slightly to face each other |
+
+## Notes for review
+
+1. Source times should match the motion phase of the corresponding trailer beat. If they do not, shift the source in-point a few frames earlier or later within the same source scene.
+2. If the source clip is shorter than the trailer beat (source out minus source in is less than trailer out minus trailer in), the trailer beat contains a fade or title card; fill the gap in the edit with a fade to black or with source tail.
+3. `OK` beats are double-verified by CV plus the vision phase check; still review a sample of them.
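
Review note on point 2 of the report hints above: the length comparison can be checked mechanically. The sketch below is only an illustration, assuming non-drop-frame timecodes counted at a nominal 24 fps (the report rounds 23.976 to 24 for the frame field). The helper names `tc_to_frames` and `fill_frames` are hypothetical and are not part of the repository.

```python
def tc_to_frames(tc: str, fps: int = 24) -> int:
    """Convert a non-drop-frame hh:mm:ss:ff timecode to a frame count."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f


def fill_frames(trailer_in: str, trailer_out: str,
                source_in: str, source_out: str, fps: int = 24) -> int:
    """Frames of the trailer beat that the proposed source clip does not cover."""
    beat_len = tc_to_frames(trailer_out, fps) - tc_to_frames(trailer_in, fps)
    clip_len = tc_to_frames(source_out, fps) - tc_to_frames(source_in, fps)
    return max(0, beat_len - clip_len)


# Beat 12 from the table: a 195-frame trailer beat against an 81-frame source clip.
gap = fill_frames("00:00:42:03", "00:00:50:06", "01:14:26:01", "01:14:29:10")
print(gap)  # 114
```

For beat 12 this leaves 114 frames that would have to be filled with a fade to black or source tail, which fits its description "static profile shot transitioning to black/darkness".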
@@ -88,6 +88,20 @@ If that fails:
   exist; otherwise vision is queried live (costs credits; needs network access).
 3. Restore `match_results.json.bak` if the cache is corrupted.

+## Current coverage (before the latest run)
+
+```
+total beats: 25
+matched:     20 (5 confirmed, 15 provisional)
+unmatched:   beats 0, 2, 21, 23, 24
+```
+
+Beat 0 is the SHO logo (no source match possible, correct).
+Beats 23/24 have no visible islands (end credits/title), so they are
+also correctly unmatched.
+Beat 2 and beat 21 are the real recovery candidates; the new
+recovery stage tries to pick them up on the next `match` run.
+
 ## Open risks / known weaknesses

 - The threshold `0.06` for "beat context wins" in `realign_window` is
@@ -1,27 +1,63 @@
 # AI Trailer Generator v2

-**Frame-accurate trailer reconstruction via pure Computer Vision**
+**Frame-accurate reconstruction of a trailer from the source film.**

-> Feed in a reference trailer and the matching source film, and get back a finished FCPXML/EDL that rebuilds the trailer frame-accurately from the source film.
+You put in two videos, a reference trailer and the matching feature
+film, and get back a finished FCPXML/EDL for your edit suite that
+rebuilds the trailer beat by beat from the source film.

 ---

-## The core principle
+## For the editor: what you actually need

-No LLM is used for visual matching by default. Optionally, a vision layer
-can supply cached 3-frame descriptions as additional search anchors; the final
-match still stays CV-verified.
+You do **not have to run this tool yourself** and do **not need to know Python**.
+What you get are two files to work with:

-| Phase | What happens | Technology |
-|-------|-------------|-------------|
-| **0: Prep** | Analyze the reference trailer and extract beats | PySceneDetect + OpenCV |
-| **1: Global Scan**| Scan the entire source film via FFmpeg stream (2 FPS) against all beats | FFmpeg pipe + luma histogram |
-| **1b: Optional Vision Seeds** | Cache uncertain top-K scenes with 3-frame descriptions | OpenAI-compatible vision LLM |
-| **2: Refine** | Refine the best hits at frame level | OpenCV `matchTemplate` |
-| **3: Dramaturgy** | Narrative BeatType classification from dialogue text | OpenRouter LLM |
-| **4: Export** | Timeline → FCPXML 1.10 or CMX 3600 EDL | xml.etree + custom timecode layer |
+1. **`CUTTER_REPORT.md`**: the table for manual review and
+   recutting. For each beat it lists:
+   - the trailer timecode (hh:mm:ss:ff),
+   - the proposed source timecode from the feature film,
+   - a status: `OK` (ready to use), `?` (please review) or
+     `MAN.` (no match, set manually),
+   - a short description of what is visible in the trailer beat (so you
+     can find the right spot in the source faster).
+2. **`output/*.fcpxml`** and **`output/*.edl`**: the finished timeline for
+   FCP / Premiere / Avid / Resolve. Beats with status `OK` are already
+   placed correctly there; `?` and `MAN.` beats need checking or manual placement in the NLE.

-**Text-Safe Crop:** The top 15% and bottom 30% of the frame are masked before every comparison to ignore title cards, logos and letterbox.
+**Recommended workflow:**
+
+1. Open `CUTTER_REPORT.md` and work through the table from top to bottom.
+2. Import the FCPXML/EDL into the NLE and load the trailer and feature film alongside it.
+3. For `OK` beats, spot-check only.
+4. For `?` beats, check the preview clip from the report HTML (see below)
+   and shift the source in-point a few frames earlier or later in the NLE until the
+   motion phase matches the trailer exactly.
+5. For `MAN.` beats, find the matching spot in the feature film yourself; the
+   description in the report tells you what to look for.
+
+Everything below is background for whoever maintains the tool.
+
+---
+
+## How the tool finds the matches (short version)
+
+| Phase | What happens |
+|-------|--------------|
+| **0** | Split the trailer into beats (PySceneDetect). |
+| **1** | Quick vibe check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
+| **2** | Optional: a vision LLM describes uncertain scenes from 3-frame samples; the descriptions are cached. |
+| **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
+| **4** | Phase repair: for segmented beats, the motion phase in the source is aligned with the visible trailer phase. |
+| **5** | Recovery: beats without a match are retried via a vision phase search in the top-K scenes. |
+| **6** | Export as FCPXML 1.10 or CMX 3600 EDL plus `CUTTER_REPORT.md`. |
+
+**Text-Safe Crop:** The top 15% and bottom 30% of every frame are masked
+before the comparison so that title cards, logos and letterbox do not
+distort the matches.
+
+**Important:** Even with vision enabled, the final match stays
+CV-verified. The LLM only supplies additional search anchors.

 ---
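
Review note on the Text-Safe Crop rule in the README hunk above: the row arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation. The real code presumably works on OpenCV/NumPy frames, and the diff does not show whether it crops or masks. The function names are hypothetical, and the percentages are kept as integers so the result is deterministic.

```python
from __future__ import annotations


def text_safe_rows(height: int, top_pct: int = 15, bottom_pct: int = 30) -> tuple[int, int]:
    """Return (first_row, end_row) of the band that is kept for comparison."""
    first = height * top_pct // 100
    end = height - height * bottom_pct // 100
    return first, end


def text_safe_crop(frame: list[list[int]]) -> list[list[int]]:
    """Drop the top 15% and bottom 30% of rows from a row-major frame."""
    first, end = text_safe_rows(len(frame))
    return frame[first:end]


print(text_safe_rows(1080))  # (162, 756)
```

For a 1080-line frame this keeps rows 162 to 755, that is 594 of 1080 lines, which is where most picture content sits once titles, logos and letterbox bars are excluded.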

@@ -310,6 +346,21 @@ during connection setup. If vision verification nevertheless keeps failing during the final
 filter/repair stage, the previously cached match for this beat
 is kept instead of discarded; a network problem must never delete
 a match from the cache that was already found correctly.
+Phase repair of existing matches no longer runs only in "long"
+source scenes but wherever the scene carries more than just the
+segment window (more than 0.5 s of slack). A corrected position is adopted once it
+passes the image-content validation AND scores no more than 0.02 below the
+original. Already confirmed matches in tightly cut
+scenes are deliberately left untouched so that a good match is not swapped for a
+nominally equivalent alternative.
+Beats that end up as neither a full match nor a segment match after the CV run
+then go through a recovery stage: the vibe check (histogram/pHash)
+supplies top-K candidate scenes, the semantic action-window search checks
+the phase of the visible part of the trailer beat within them, and the CV aligner sets
+the in-point frame-accurately. A candidate is only adopted if it passes the same
+vision phase validation as the main path. Beats without visible
+picture content (logos, title cards, continuous fades) are not expected to
+match: the strict vision phase validation rejects them.
 Long trailer beats are no longer automatically validated across their full beat length
 against a single source clip. As soon as, after a visible
 source section, a sustained fade to black or title/credit island begins,
@@ -92,6 +92,22 @@ def _save_results(results: list, cfg: "AppConfig") -> None:  # type: ignore[name
    logging.getLogger(__name__).info("Match results cached → %s", p)


+def _regenerate_cutter_report(cfg: "AppConfig") -> None:  # type: ignore[name-defined]
+    """Re-render CUTTER_REPORT.md after each cache write so it stays in sync."""
+    try:
+        from scripts.generate_cutter_report import render_report
+    except Exception as exc:
+        logging.getLogger(__name__).warning("Cutter report regen skipped: %s", exc)
+        return
+    try:
+        project_root = cfg.paths.cache_dir.parent
+        out = project_root / "CUTTER_REPORT.md"
+        out.write_text(render_report(project_root), encoding="utf-8")
+        logging.getLogger(__name__).info("Cutter report regenerated → %s", out)
+    except Exception as exc:
+        logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
+
+
 def _load_results(cfg: "AppConfig") -> list:  # type: ignore[name-defined]
    from src.core.models import MatchResult, MatchSegment
    p = _results_cache_path(cfg)
@@ -632,6 +648,171 @@ def _merge_best_results(existing: list, candidates: list, cfg) -> list:
    return sorted(by_id.values(), key=lambda r: r.beat_id)


+def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
+    """Try a vision-led search for beats that ended up without a match.
+
+    For each unmatched beat (anchored on its longest scoreable island, or on
+    the full beat when no island is found), this pass:
+      1. Asks the vibe-check (CV histogram + pHash) for the top-K candidate
+         scenes.
+      2. For each candidate, runs the semantic action-window search with the
+         beat's own description, preferring windows whose phase matches the
+         visible part of the beat.
+      3. Refines the in-point with the regular CV content/motion aligner.
+      4. Validates the resulting window with the vision phase check, exactly
+         like the main filter.
+      5. Adds the best validated candidate (confirmed or provisional by score).
+
+    Recovered matches stay subject to the same thresholds used elsewhere;
+    this only adds matches that pass the same quality gates.
+    """
+    if not cfg.vision.enabled or not beats:
+        return results
+
+    from dataclasses import replace
+    from src.cv.global_scan import align_in_point_by_content_and_motion, estimate_usable_source_duration
+    from src.cv.scene_indexer import build_scene_index
+    from src.cv.vibe_check import run_vibe_check
+    from src.core.models import MatchResult
+    from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision
+
+    logger = logging.getLogger(__name__)
+    matched_ids = {r.beat_id for r in results}
+    unmatched = [b for b in beats if b.beat_id not in matched_ids]
+    if not unmatched:
+        return results
+
+    scenes = build_scene_index(cfg)
+    if not scenes:
+        return results
+
+    new_results = list(results)
+    for beat in unmatched:
+        try:
+            islands = _reference_scoreable_segments(beat, cfg)
+        except Exception:
+            islands = []
+
+        # Anchor selection: prefer the longest visible island; if none exists,
+        # fall back to the full beat. The latter handles dark / low-contrast
+        # close-ups that drop below the scoreable luma/contrast thresholds but
+        # are still semantically describable. The strict vision phase
+        # validation later in this pass keeps us from accepting pure title-card
+        # or logo material.
+        if islands:
+            longest = max(islands, key=lambda iv: iv[1] - iv[0])
+            anchor_start_s, anchor_end_s = longest
+            anchor_beat = replace(
+                beat,
+                start_s=beat.start_s + anchor_start_s,
+                end_s=beat.start_s + anchor_end_s,
+            )
+        else:
+            anchor_beat = beat
+
+        try:
+            hits = run_vibe_check(
+                beat,
+                scenes,
+                top_k=max(cfg.cv.deep_scan.scene_seed_top_k, cfg.cv.vibe_check.top_k_candidates),
+                hist_method=cfg.cv.vibe_check.hist_compare_method,
+                phash_max_distance=64,
+            )
+        except Exception as exc:
+            logger.warning("Beat %d: recovery vibe-check failed (%s)", beat.beat_id, exc)
+            continue
+
+        scenes_by_id = {s.scene_id: s for s in scenes}
+        best = None  # (score, scene, in_s, dur_s, reason)
+        seen = set()
+        for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
+            scene = scenes_by_id.get(hit.scene_id)
+            if scene is None or scene.scene_id in seen:
+                continue
+            seen.add(scene.scene_id)
+
+            try:
+                found = find_action_window_in_scene(anchor_beat, scene, cfg)
+            except Exception as exc:
+                logger.debug("Beat %d: action window failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
+                continue
+            if found is None:
+                continue
+            start_s, end_s, semantic_score, reason = found
+
+            window_s = max(3.0, min(8.0, (end_s - start_s) * 4.0))
+            try:
+                aligned_in_s, combined_score, content_score, motion_score = align_in_point_by_content_and_motion(
+                    anchor_beat,
+                    start_s,
+                    cfg,
+                    search_window_s=window_s,
+                )
+            except Exception as exc:
+                logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
+                continue
+            aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
+
+            try:
+                usable_duration_s, usable_score = estimate_usable_source_duration(anchor_beat, aligned_in_s, cfg)
+            except Exception:
+                usable_duration_s, usable_score = anchor_beat.duration_s, 0.0
+            usable_duration_s = max(0.0, min(anchor_beat.duration_s, usable_duration_s))
+            if usable_duration_s < max(0.32, anchor_beat.duration_s * 0.45):
+                usable_duration_s = anchor_beat.duration_s
+
+            try:
+                ok, verify_reason = validate_match_window_with_vision(
+                    anchor_beat,
+                    source_path=scene.source_path,
+                    scene_id=scene.scene_id,
+                    in_point_s=aligned_in_s,
+                    out_point_s=aligned_in_s + usable_duration_s,
+                    cfg=cfg,
+                )
+            except Exception as exc:
+                logger.debug("Beat %d: validate failed scene=%d (%s)", beat.beat_id, scene.scene_id, exc)
+                continue
+            if not ok:
+                continue
+
+            final_score = max(
+                combined_score,
+                min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
+            )
+            if final_score < cfg.cv.deep_scan.provisional_match_threshold:
+                continue
+            candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
+            if best is None or candidate[0] > best[0]:
+                best = candidate
+
+        if best is None:
+            continue
+        score, scene, aligned_in_s, usable_duration_s, repair_reason = best
+        logger.info(
+            "Beat %d: recovered via vision action search scene=%d in=%.3fs score=%.3f (%s)",
+            beat.beat_id,
+            scene.scene_id,
+            aligned_in_s,
+            score,
+            repair_reason,
+        )
+        new_results.append(MatchResult(
+            beat_id=beat.beat_id,
+            scene_id=scene.scene_id,
+            source_path=scene.source_path,
+            in_point_s=aligned_in_s,
+            out_point_s=aligned_in_s + usable_duration_s,
+            in_point_frame=int(aligned_in_s * cfg.export.edl_frame_rate),
+            match_score=score,
+            match_location=(0, 0),
+            is_confirmed=score >= cfg.cv.deep_scan.match_threshold,
+            segments=tuple(),
+        ))
+
+    return sorted(new_results, key=lambda r: r.beat_id)
+
+
 def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
    """Drop vision-enabled matches whose final action phase contradicts the beat."""
    if not cfg.vision.enabled or not results:
@@ -785,7 +966,16 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
                changed = False
                for segment in result.segments:
                    scene = scenes_by_id.get(segment.scene_id)
-                    if scene is None or scene.duration_s <= max(segment.duration_s * 1.6, 6.0):
+                    # Allow phase-realign whenever the scene has any meaningful
+                    # slack beyond the segment, not only for "long" scenes.
+                    # Short scenes don't need realigning because the segment
+                    # essentially is the scene.
+                    if scene is None or scene.duration_s <= segment.duration_s + 0.5:
+                        new_segments.append(segment)
+                        continue
+                    # For already-confirmed segments, skip the realign to avoid
+                    # destabilizing a strong original match.
+                    if segment.is_confirmed and scene.duration_s <= max(segment.duration_s * 1.6, 6.0):
                        new_segments.append(segment)
                        continue
                    segment_beat = replace(
@@ -801,6 +991,11 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
                    if abs(aligned_in_s - segment.in_point_s) <= 1.0 / cfg.export.edl_frame_rate:
                        new_segments.append(segment)
                        continue
+                    # Don't commit a repair that scores meaningfully worse than
+                    # the original; phase realign should improve, not regress.
+                    if score < segment.match_score - 0.02:
+                        new_segments.append(segment)
+                        continue
                    changed = True
                    repair_reasons.append(repair_reason)
                    new_segments.append(replace(
@@ -833,11 +1028,22 @@ def _filter_repair_one(result, beat, beats_by_id, scenes_by_id, kept, cfg, reali
                    repaired = True
            else:
                scene = scenes_by_id.get(result.scene_id)
-                if scene is not None and scene.duration_s > max(result.duration_s * 1.6, 6.0):
+                wide_scene = (
+                    scene is not None
+                    and scene.duration_s > result.duration_s + 0.5
+                )
+                already_confirmed_in_tight_scene = (
+                    result.is_confirmed
+                    and scene is not None
+                    and scene.duration_s <= max(result.duration_s * 1.6, 6.0)
+                )
+                if wide_scene and not already_confirmed_in_tight_scene:
                    repair = realign_window(beat, result.scene_id)
                    if repair is not None:
                        repair_scene, aligned_in_s, usable_duration_s, score, repair_reason = repair
-                        if abs(aligned_in_s - result.in_point_s) > 1.0 / cfg.export.edl_frame_rate:
+                        moved = abs(aligned_in_s - result.in_point_s) > 1.0 / cfg.export.edl_frame_rate
+                        improved = score >= result.match_score - 0.02
+                        if moved and improved:
                            logger.info(
                                "Beat %d: realigned semantically valid long scene by motion/action window (%s)",
                                result.beat_id,
@@ -1271,6 +1477,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
    )
    results = _attach_visual_segments(results, beats, cfg)
    results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
+    results = _recover_unmatched_beats_via_vision(results, beats, cfg)

    # A targeted one-beat match should improve the cache without deleting
    # automatic matches for other beats.
@@ -1283,6 +1490,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
        results_to_save = results

    _save_results(results_to_save, cfg)
+    _regenerate_cutter_report(cfg)

    print(f"\n✅  {len(results)} / {len(beats)} beats matched.")
    for r in results:
@@ -0,0 +1,207 @@
+"""
+scripts/generate_cutter_report.py — generate CUTTER_REPORT.md from current cache
+
+Regenerates CUTTER_REPORT.md from .cache/match_results.json,
+.cache/trailer_beats.json and .cache/vision_descriptions.json. The report is a
+hand-off document for a video editor (Cutter) doing the manual recut: it lists,
+per beat, the trailer position, the proposed source position in SMPTE
+timecodes, the match score, and what the vision model saw in the trailer beat.
+
+Usage (from project root):
+    python scripts/generate_cutter_report.py
+
+`python cli.py match` re-renders the report automatically; run this script to
+regenerate CUTTER_REPORT.md by hand from the latest cache.
+"""
+
+from __future__ import annotations
+
+import json
+import re
+import sys
+from datetime import date
+from pathlib import Path
+
+
+def smpte(t: float | None, fps: int) -> str:
+    if t is None:
+        return "--:--:--:--"
+    total = int(round(t * fps))  # fps is the rounded nominal rate (24 for 23.976), so frames are counted from wall-clock seconds
+    h = total // (3600 * fps)
+    m = (total // (60 * fps)) % 60
+    s = (total // fps) % 60
+    f = total % fps
+    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"
+
+
+def best_beat_description(items: dict, beat_id: int, start_s: float, end_s: float) -> str | None:
+    best, best_diff = None, 1e9
+    for key, value in items.items():
+        if not key.startswith(f"beat:{beat_id}:") or not isinstance(value, dict):
+            continue
+        try:
+            parts = key.split(":")
+            ks, ke = float(parts[2]), float(parts[3])
+        except (IndexError, ValueError):
+            continue
+        diff = abs(ks - start_s) + abs(ke - end_s)
+        if diff < best_diff:
+            best_diff = diff
+            best = value
+    return best.get("description", "") if best else None
+
+
+def parse_field(desc: str | None, key: str) -> str:
+    if not desc:
+        return ""
+    match = re.search(rf'"{key}"\s*:\s*"([^"]+)"', desc)
+    return match.group(1) if match else ""
+
+
+def render_report(project_root: Path) -> str:
+    sys.path.insert(0, str(project_root))
+    from src.core.config import load_config
+
+    cfg = load_config(project_root / "config.toml")
+    fps = int(round(cfg.export.edl_frame_rate))
+
+    cache = project_root / ".cache"
+    results = {r["beat_id"]: r for r in json.loads((cache / "match_results.json").read_text())}
+    beats = json.loads((cache / "trailer_beats.json").read_text())
+    vis_path = cache / "vision_descriptions.json"
+    vis_items = json.loads(vis_path.read_text())["items"] if vis_path.exists() else {}
+
+    lines: list[str] = []
+    lines.append("# Cutter report: manual recut")
+    lines.append("")
+    lines.append(
+        f"As of: {date.today().isoformat()}. Frame rate: {cfg.export.edl_frame_rate} fps. "
+        f"Source: {Path(cfg.paths.source_movie).name}, trailer: {Path(cfg.paths.reference_trailer).name}."
+    )
+    lines.append("")
+    lines.append(
+        "This file is generated automatically from the match cache. "
+        "It is re-rendered after every successful `python cli.py match`; run `python scripts/generate_cutter_report.py` to regenerate it by hand."
+    )
+    lines.append("")
+    lines.append("## How to read this table")
+    lines.append("")
+    lines.append("- **Beat**: number of the beat in the reference trailer.")
+    lines.append("- **Trailer In/Out**: SMPTE position of the beat in the trailer (hh:mm:ss:ff).")
+    lines.append("- **Source In/Out**: proposed position in the source film. For `MAN.`, pick it yourself.")
+    lines.append("- **Scene**: ID of the source scene from PySceneDetect (for debugging only).")
+    lines.append("- **Score**: 0..1, higher is better. Matches with >=0.65 count as confirmed.")
+    lines.append("- **Status**:")
+    lines.append("    - `OK`: confirmed by CV plus the vision phase check; can be used as-is (spot checks are still recommended).")
+    lines.append("    - `?`: provisional; right scene but score below 0.65. Check the motion phase in the preview clip and shift by a few frames if needed.")
+    lines.append("    - `MAN.`: no automatic match; either find it manually or treat it as a fade to black / title.")
+    lines.append("- **Phase**: what is visible in the trailer beat (from the vision description). Helps you find the right spot in the source.")
+    lines.append("")
+
+    matched = sum(1 for b in beats if b["beat_id"] in results)
+    confirmed = sum(1 for b in beats if results.get(b["beat_id"], {}).get("is_confirmed"))
+    lines.append("## Status-Uebersicht")
+    lines.append("")
+    lines.append(f"- **Beats gesamt**: {len(beats)}")
+    lines.append(f"- **Automatisch gefunden**: {matched} ({confirmed} davon bestaetigt)")
+    lines.append(f"- **Manuell zu setzen**: {len(beats) - matched}")
+    lines.append("")
+    lines.append("## Beat-Tabelle")
+    lines.append("")
+    lines.append("| Beat | Trailer In / Out | Source In / Out | Scene | Score | Status | Was im Bild zu sehen ist |")
+    lines.append("|-----:|------------------|------------------|------:|------:|:------:|---------------------------|")
+
+    def status_for(rec: dict | None) -> str:
+        if rec is None:
+            return "MAN."
+        return "OK" if rec.get("is_confirmed") else "?"
+
+    for beat in beats:
+        bid = beat["beat_id"]
+        rec = results.get(bid)
+        ti, to = smpte(beat["start_s"], fps), smpte(beat["end_s"], fps)
+        if rec is not None:
+            si, so = smpte(rec["in_point_s"], fps), smpte(rec["out_point_s"], fps)
+            scn = rec["scene_id"]
+            sc = rec["match_score"]
+        else:
+            si = so = "—"
+            scn = "—"
+            sc = 0.0
+        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
+        # Fall back to the subject, then to "", so the slice never runs on a missing field.
+        phase = (parse_field(desc, "action_phase") or parse_field(desc, "subject") or "")[:90]
+        lines.append(f"| {bid:>4} | {ti}-{to} | {si}-{so} | {scn} | {sc:.3f} | {status_for(rec)} | {phase} |")
+
+    lines.append("")
+    lines.append("## Beats die manuelle Aufmerksamkeit brauchen")
+    lines.append("")
+    lines.append("### Manuell setzen (Status `MAN.`)")
+    lines.append("")
+    for beat in beats:
+        bid = beat["beat_id"]
+        if bid in results:
+            continue
+        ti, to = smpte(beat["start_s"], fps), smpte(beat["end_s"], fps)
+        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
+        phase = parse_field(desc, "action_phase")
+        note = phase or "keine Vision-Beschreibung — vermutlich Title-Card / Fade / Logo"
+        lines.append(f"- **Beat {bid}** {ti}-{to}: {note}")
+    lines.append("")
+
+    lines.append("### Vorlaeufig (Status `?`) — bitte sichten")
+    lines.append("")
+    lines.append("| Beat | Score | Source In | Phase laut Vision |")
+    lines.append("|-----:|------:|-----------|--------------------|")
+    for beat in beats:
+        bid = beat["beat_id"]
+        rec = results.get(bid)
+        if rec is None or rec.get("is_confirmed"):
+            continue
+        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
+        phase = parse_field(desc, "action_phase") or ""
+        lines.append(f"| {bid:>4} | {rec['match_score']:.3f} | {smpte(rec['in_point_s'], fps)} | {phase[:90]} |")
+    lines.append("")
+
+    lines.append("### Bestaetigt (Status `OK`) — kann uebernommen werden")
+    lines.append("")
+    lines.append("| Beat | Score | Source In | Phase laut Vision |")
+    lines.append("|-----:|------:|-----------|--------------------|")
+    for beat in beats:
+        bid = beat["beat_id"]
+        rec = results.get(bid)
+        if rec is None or not rec.get("is_confirmed"):
+            continue
+        desc = best_beat_description(vis_items, bid, beat["start_s"], beat["end_s"]) or ""
+        phase = parse_field(desc, "action_phase") or ""
+        lines.append(f"| {bid:>4} | {rec['match_score']:.3f} | {smpte(rec['in_point_s'], fps)} | {phase[:90]} |")
+    lines.append("")
+
+    lines.append("## Hinweise zur Pruefung")
+    lines.append("")
+    lines.append(
+        "1. Source-Times sollten zur jeweiligen Trailer-Bewegungsphase passen. "
+        "Wenn nicht: Source-In innerhalb derselben Source-Szene wenige Frames vor/zurueck verschieben."
+    )
+    lines.append(
+        "2. Wenn der Source-Clip kuerzer ist als der Trailerbeat (Source-Out < Trailer-Out gerechnet ab Source-In), "
+        "enthaelt der Trailerbeat eine Blende/Titelkarte; im Schnitt mit Schwarzfade oder Source-Tail auffuellen."
+    )
+    lines.append(
+        "3. `OK`-Beats sind durch CV + Vision-Phasenpruefung doppelt verifiziert; trotzdem stichprobenartig sichten."
+    )
+    lines.append("")
+
+    return "\n".join(lines)
+
+
+def main() -> int:
+    here = Path(__file__).resolve().parent
+    project_root = here.parent
+    out = project_root / "CUTTER_REPORT.md"
+    out.write_text(render_report(project_root), encoding="utf-8")
+    print(f"Wrote {out}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
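
The status rule that `render_report` applies to each beat can be exercised on its own. The following is a minimal sketch, assuming only the record fields visible in the diff (`beat_id`, `is_confirmed`, `match_score`). The helper `classify_beats` and the sample data are hypothetical and not part of this commit.

```python
from __future__ import annotations

from collections import Counter


def status_for(rec: dict | None) -> str:
    """Map a match record to the report's status flag (same rule as in render_report)."""
    if rec is None:
        return "MAN."  # no automatic match for this beat
    return "OK" if rec.get("is_confirmed") else "?"


def classify_beats(beats: list[dict], results: dict[int, dict]) -> Counter:
    """Count beats per status flag. Hypothetical helper, for illustration only."""
    return Counter(status_for(results.get(b["beat_id"])) for b in beats)


# Made-up sample data, shaped like the cache entries the script reads.
beats = [{"beat_id": 1}, {"beat_id": 2}, {"beat_id": 3}]
results = {
    1: {"beat_id": 1, "is_confirmed": True, "match_score": 0.81},
    2: {"beat_id": 2, "is_confirmed": False, "match_score": 0.52},
}

counts = classify_beats(beats, results)
print(dict(counts))
```

Because the counts come from the same `status_for` rule, the three numbers always add up to `len(beats)`, which is a cheap consistency check for the summary section of the report.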