Compare commits: a7b103b4fe..main (69 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| fa40821319 | |||
| 68ec775916 | |||
| 3b42c5d018 | |||
| f3c3a9cfd4 | |||
| e966a4c321 | |||
| 45b5376cef | |||
| 4b3894a812 | |||
| 3ad2b51e56 | |||
| c16e46fb9d | |||
| 8ca6d4b696 | |||
| b771c6792b | |||
| 6bf3ab6626 | |||
| 9a5abd5312 | |||
| b2abdafc7a | |||
| 02e9fee982 | |||
| 5425939a84 | |||
| ed7b083dca | |||
| ae3c2b1b13 | |||
| 71117a8a3b | |||
| c1425003c1 | |||
| bcaf0417b3 | |||
| f63d65fcd2 | |||
| c08ba97d37 | |||
| a275b2efb6 | |||
| fab6c53698 | |||
| c5b7d61451 | |||
| acafe538b2 | |||
| 10e27afc8d | |||
| e335fffe92 | |||
| bdc9e4ab31 | |||
| 430a81a988 | |||
| 5611902eb5 | |||
| 4eeecca80d | |||
| 5407f08fbc | |||
| 0baedb3a17 | |||
| d83fced8d2 | |||
| 4fe1d35f1a | |||
| 730b5ef3c0 | |||
| f20f89b06b | |||
| 18c8c89ee6 | |||
| 9b524c9329 | |||
| 1e5ffffd91 | |||
| 8fd0442724 | |||
| 18a67387f6 | |||
| 7ffe4adc3b | |||
| 92a12276ee | |||
| 64b53c0e82 | |||
| 8096f9b4d8 | |||
| e960b1c080 | |||
| c972894972 | |||
| 72e22969b4 | |||
| 9d3c5d5afd | |||
| 0375580373 | |||
| e3a4c22b71 | |||
| c71ed2b701 | |||
| 49412c54a6 | |||
| 533ab49d62 | |||
| f1e9636a83 | |||
| cd10e2bc03 | |||
| c118428167 | |||
| 45769aa366 | |||
| 3b90905d07 | |||
| 07f47ebe2b | |||
| 2f8b0585e2 | |||
| d287952572 | |||
| e2c30d0062 | |||
| fab6aa9388 | |||
| 64e0132cc7 | |||
| 4afb438a4d |
@@ -0,0 +1,6 @@
* text=auto
.gitattributes text eol=lf
*.py text eol=lf
*.md text eol=lf
*.html text eol=lf
*.ps1 text eol=crlf
@@ -30,10 +30,14 @@ proxy/
*.jpeg
*.png

# IDE
# IDE / editor
.vscode/
.idea/
*.swp
*.code-workspace

# Claude Code session data
.claude/

# OS
.DS_Store

@@ -1,125 +0,0 @@
# Handover Notes

As of: 2026-05-03 (Beat 20 repair completed).

## State

- `pytest tests/ -q` → 52/52 green.
- `python cli.py match --beat 20 --vision` runs to completion and writes
  a confirmed match (score 0.6632, scene 613, in=5284.706s, dur=0.88s).
- The previous cache was backed up to `.cache/match_results.json.bak`.
- No open PR; local changes are committed (see the latest commit).

## What changed last and why

### 1. `cli.py`: `realign_window` picks the action window per segment

In `_filter_semantically_invalid_vision_matches.realign_window`:

- **Before:** `find_action_window_in_scene(action_beat or check_beat, …)`: for
  segmented beats, the whole beat was always used as semantic context.
  For Beat 20 this put the source position on the kiss phase
  (5270 s), although the *visible* segment only shows "approaching and pulling
  apart"; in the source, that phase does not start until around 5284 s.
- **Now:** Two windows are searched (segment description *and* beat
  description). The beat context wins only with a clear score lead (> 0.06).
  The trailer offset shift (`visible_content_offset`) is applied only
  when the beat context was actually used; otherwise the segment window
  already points at the correct phase.

Effect for Beat 20: 5270.118 → 5284.706, score 0.6449 (provisional) → 0.6632
(confirmed).
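
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `cli.py`: the `Window` type and the `pick_start` helper are hypothetical, and the sketch assumes that the offset shift is additive and that the beat window is used as a fallback when no segment window was found.

```python
from dataclasses import dataclass
from typing import Optional

# Margin quoted in the notes: the beat context must lead by more than this.
BEAT_CONTEXT_MARGIN = 0.06


@dataclass(frozen=True)
class Window:
    """Hypothetical result of one action-window search (not a type from cli.py)."""
    start_s: float
    score: float


def pick_start(segment: Optional[Window], beat: Optional[Window],
               visible_content_offset: float,
               margin: float = BEAT_CONTEXT_MARGIN) -> Optional[float]:
    """Return the source start time, preferring the segment window."""
    if segment is None and beat is None:
        return None
    beat_wins = beat is not None and (
        segment is None or beat.score > segment.score + margin
    )
    if beat_wins:
        # Assumption: the shift is additive and only applies to beat context.
        return beat.start_s + visible_content_offset
    return segment.start_s
```

With a segment window scoring 0.62 and a beat window scoring 0.64, the lead is below the margin, so the segment window's start is returned unshifted.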

### 2. `cli.py`: filter/repair stage is crash-tolerant

In `_filter_semantically_invalid_vision_matches`, the per-result body has been
extracted into a local function `_filter_repair_one` and wrapped in a
try/except. If the repair aborts (e.g. because the vision API drops out in the
middle of a response), the previously cached match is kept instead of being
discarded entirely.
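
A minimal sketch of that pattern, assuming a per-result callable that returns the repaired result or `None` to drop it. The names `repair_all` and `repair_one` are hypothetical and stand in for the real loop and `_filter_repair_one`:

```python
import logging
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def repair_all(results: List[T],
               repair_one: Callable[[T], Optional[T]]) -> List[T]:
    """Run repair_one on every cached result; keep the cached one on a crash."""
    kept: List[T] = []
    for result in results:
        try:
            repaired = repair_one(result)
        except Exception as exc:
            # A failed repair must not cost us the match we already had.
            log.warning("Repair failed, keeping cached result: %s", exc)
            kept.append(result)
            continue
        if repaired is not None:  # None means "semantically invalid, drop it"
            kept.append(repaired)
    return kept
```

The important design point is that the `except` branch appends the original result, so a network failure degrades to "no repair" rather than "no match".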

### 3. `src/llm/vision_cache.py`: vision retry for read errors

`_call_vision_model` now also catches `TimeoutError`,
`socket.timeout`, `ConnectionError`, and `OSError` while reading the response
and retries with the same backoff as for HTTP/URL errors. The trigger
was a 24-hour DSL disconnect in the middle of a run; before this change, the
match run aborted hard and the cache was left at "no match".
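
The retry behaviour can be sketched as below. The notes do not say what the existing backoff looks like, so the exponential schedule, the attempt count, and the `call_with_retry` helper are all assumptions made for illustration:

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Exception types listed in the notes. TimeoutError and ConnectionError are
# OSError subclasses, so OSError alone would match; they are spelled out here
# to mirror the description.
READ_ERRORS = (TimeoutError, socket.timeout, ConnectionError, OSError)


def call_with_retry(fn: Callable[[], T], attempts: int = 4,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on read errors with an (assumed) exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except READ_ERRORS:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to keep
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.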

### 4. `README.md`

Added two short paragraphs describing (1) the segment-vs-beat window selection
and (2) the new crash and network error behavior.

## Not touched, but relevant for the handover

- The **full FFmpeg scan** still yields no confirmed match for Beat 20
  (final score 0.419 < provisional 0.430). The confirmed match comes from the
  action-window repair. This is expected: the visible segment is visually very
  generic (two-shot in profile with a blurred background); the correct phase
  only stands out through the semantic action description.
- The `candidate_points` loop in `realign_window` (lines ~700–765) only
  searches about ±2 s around `start_s`. As long as `start_s` now comes from the
  segment window, the correct source point lies within that range. If beats
  with longer visible islands show up in the future, this range may become
  too narrow; in that case, widen the search radius rather than reverting the
  window picking.
- There are **no tests** for `_filter_semantically_invalid_vision_matches`
  or `realign_window`. Anyone touching this code should use Beat 20 as a live
  smoke test (see below).

## Reproduction / smoke test

```powershell
.\.venv\Scripts\Activate.ps1
python cli.py match --beat 20 --vision
```

Expected: `Beat 20: realigned semantically valid long scene by motion/action windows`,
followed by `is_confirmed: true` for Beat 20 in
`.cache/match_results.json` with `in_point_s ≈ 5284.7` and `match_score ≥ 0.65`.

If that fails:

1. `python -m pytest tests/ -q`: if red, the codebase itself is broken.
2. Check `.cache/vision_descriptions.json`: the keys
   `beat:20:73.560:74.680:…` and `action_window:613:5282.390:5285.430:…` must
   exist; otherwise vision is called live (costs credits; needs network).
3. Restore `match_results.json.bak` if the cache is corrupted.
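
The expected values above could be checked with a few lines of Python. This is only a sketch: it assumes `match_results.json` is a JSON list of objects keyed by `beat_id` (the field names `is_confirmed`, `in_point_s`, and `match_score` come from the notes; the list layout and the 0.1 s tolerance are assumptions).

```python
import json
from pathlib import Path


def load_results(path):
    """Read the cached results; the list-of-objects layout is an assumption."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def beat_passes(results, beat_id=20, expected_in_s=5284.7,
                tol_s=0.1, min_score=0.65):
    """True if the beat is confirmed, near the expected in-point, and scores high enough."""
    for r in results:
        if r.get("beat_id") != beat_id:
            continue
        in_s = float(r.get("in_point_s", float("nan")))
        score = float(r.get("match_score", 0.0))
        return (
            r.get("is_confirmed") is True
            and abs(in_s - expected_in_s) <= tol_s
            and score >= min_score
        )
    return False  # beat not present in the cache at all
```

Typical use would be `beat_passes(load_results(".cache/match_results.json"))` after the smoke-test run.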

## Current coverage (before the latest run)

```
total beats: 25
matched: 20 (5 confirmed, 15 provisional)
unmatched: beats 0, 2, 21, 23, 24
```

Beat 0 is the SHO logo (no source match possible, correct).
Beats 22/23/24 have no visible islands (end credits/title), so they are also
correctly unmatched.
Beat 2 and Beat 21 are the real recovery candidates; the new
recovery stage tries to pick them up on the next `match` run.

## Open risks / known weaknesses

- The `0.06` threshold for "beat context wins" in `realign_window` is
  calibrated on Beat 20. The other beats should be run as well before any
  further beats are touched, ideally with a full `python cli.py match`
  without `--beat` and a diff of `match_results.json` against the `.bak`.
- The filter/repair stage can run for minutes because of vision calls. That
  is not new, but it is very noticeable during network problems.
- The `_filter_repair_one` function gets many arguments passed through
  (closure variables from the parent). In a future iteration this could
  be refactored into a small class.
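
The comparison of `match_results.json` against the `.bak` suggested in the first item above could be scripted roughly like this. It is a sketch under the assumption that both files hold a JSON list of objects with a `beat_id` field; the compared field names are taken from this diff, and the `diff_results` helper is hypothetical.

```python
# Fields compared per beat; names as they appear in this diff.
COMPARED_FIELDS = ("scene_id", "in_point_s", "match_score", "is_confirmed")


def diff_results(old, new):
    """Return one human-readable line per beat that was added, removed, or changed."""
    old_by_id = {r["beat_id"]: r for r in old}
    new_by_id = {r["beat_id"]: r for r in new}
    lines = []
    for beat_id in sorted(old_by_id.keys() | new_by_id.keys()):
        before, after = old_by_id.get(beat_id), new_by_id.get(beat_id)
        if before is None:
            lines.append(f"beat {beat_id}: added")
        elif after is None:
            lines.append(f"beat {beat_id}: removed")
        else:
            changed = [k for k in COMPARED_FIELDS if before.get(k) != after.get(k)]
            if changed:
                lines.append(f"beat {beat_id}: changed {', '.join(changed)}")
    return lines
```

Both inputs would be loaded with `json.load` from the two cache files; an empty return value means the full run left every beat as it was.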

## Useful greps

- `find_action_window_in_scene`: semantic action-window search (vision).
- `_reference_scoreable_segments`: determines the visible islands of a
  beat.
- `estimate_usable_source_duration`: shortens match clips when the source
  switches to a different phase before the beat ends.
- `_filter_semantically_invalid_vision_matches`: entry point of the
  repair stage in `cli.py`.
@@ -36,6 +36,10 @@ Was du bekommst sind zwei Dateien, mit denen du arbeitest:
5. For `MAN.` beats, find the matching spot in the feature film yourself; the
   description in the report tells you what to look for.

For visual checks, **`CUTTER_REPORT.html`** is also relevant:
it contains the frame-locked compare clips. The old `match_report.html` is
no longer part of the workflow.

Everything else below is background for whoever maintains the tool.

---
@@ -48,7 +52,7 @@ Alles andere unten ist Hintergrund für den Tool-Verantwortlichen.
| **1** | Quick vibe check: for each beat, preselect the top-K most similar scenes from the feature film (histogram + pHash). |
| **2** | Optional: a vision LLM describes uncertain scenes from 3-frame samples; the descriptions are cached. |
| **3** | Frame-accurate refinement per beat (OpenCV template matching, motion-phase comparison). |
| **4** | Phase repair: for segmented beats, the motion phase in the source is aligned with the visible trailer phase. |
| **4** | Phase repair: for segmented beats, the motion phase is aligned with the visible trailer phase locally around the detected in-point, weighted by saliency and motion. |
| **5** | Recovery: beats without a match are retried via a vision phase search in the top-K scenes. |
| **6** | Export as FCPXML 1.10 or CMX-3600 EDL plus `CUTTER_REPORT.md`. |
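
To make stage 1 more concrete, here is a small sketch of the pHash half of the preselection: ranking scenes by Hamming distance between perceptual hashes and keeping the top K. It assumes hashes are stored as hex strings; the helper names are hypothetical, and the histogram comparison that the real vibe check combines with this is left out.

```python
from typing import Dict, List


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def top_k_scenes(beat_hash: str, scene_hashes: Dict[int, str],
                 k: int = 3, max_distance: int = 64) -> List[int]:
    """Scene ids of the k closest hashes, nearest first, within max_distance."""
    ranked = sorted(
        (hamming_distance(beat_hash, scene_hash), scene_id)
        for scene_id, scene_hash in scene_hashes.items()
    )
    return [scene_id for distance, scene_id in ranked if distance <= max_distance][:k]
```

A smaller `max_distance` makes the preselection stricter; for a 64-bit hash, a value of 64 effectively disables the cutoff.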
@@ -56,6 +60,10 @@ Alles andere unten ist Hintergrund für den Tool-Verantwortlichen.
masked out of the comparison, so that title cards, logos, and letterboxing do
not skew the matches.

**Cutter report caching:** Existing compare clips are reused.
For targeted rematches, only the affected beat is re-rendered, so the
report is up to date quickly and no unnecessary video artifacts are regenerated.

**Important:** Even when vision is enabled, the final match remains
CV-verified. The LLM only supplies additional search anchors.

@@ -159,7 +167,7 @@ wenn sich das zugrundeliegende Match geändert hat.
| Source clip shows the right scene but the wrong motion phase | `python cli.py rematch --beat N --refine`: shifts the in-point frame-accurately based on the image content. |
| Score too low, a different scene would be correct | `python cli.py match --beat N --vision`: full re-match for this beat only, with vision phase check. |
| Match is obviously the wrong scene | `python cli.py rematch --beat N --threshold 0.50`: lowers the threshold and runs a new global scan for this beat only. |
| Beat is a black frame / logo / title and should not match at all | Do nothing; the status `MAN.` in `CUTTER_REPORT.md` is correct. |
| Beat is a black frame / logo / title and should not match at all | Do nothing; the status `GFX` in `CUTTER_REPORT.md` is correct. |

### Algorithmic details

@@ -92,40 +92,76 @@ def _save_results(results: list, cfg: "AppConfig") -> None: # type: ignore[name
|
||||
logging.getLogger(__name__).info("Match results cached → %s", p)
|
||||
|
||||
|
||||
def _regenerate_cutter_report(cfg: "AppConfig") -> None: # type: ignore[name-defined]
|
||||
"""Re-render CUTTER_REPORT.{md,html} and output/report/match_report.html.
def _auto_commit_push_reports(project_root: "Path") -> None:  # type: ignore[name-defined]
    """Stage changed report files, commit, and push to origin.

    Only touches report output files — never stages source or config changes.
    Failures are logged but never propagate.
    """
    import subprocess as _sp
    from datetime import datetime as _dt

    report_globs = [
        "CUTTER_REPORT.html",
        "CUTTER_REPORT.md",
        "output/cutter_clips/beat_*_compare.mp4",
        "output/cutter_clips/beat_*_source.mp4",
        "output/cutter_clips/beat_*_source_seg*.mp4",
        "output/cutter_clips/beat_*_trailer.mp4",
        "output/cutter_stills/beat_*_source.jpg",
        "output/cutter_stills/beat_*_trailer.jpg",
    ]
    log = logging.getLogger(__name__)
    cwd = str(project_root)
    try:
        for pattern in report_globs:
            _sp.run(["git", "add", "--", pattern], capture_output=True, cwd=cwd)
        status = _sp.run(
            ["git", "status", "--porcelain"], capture_output=True, text=True, cwd=cwd
        )
        if not status.stdout.strip():
            log.info("Auto-commit: nothing changed in report files.")
            return
        now = _dt.now().strftime("%Y-%m-%d %H:%M")
        msg = f"Auto-update cutter report {now}\n\nCo-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>"
        _sp.run(["git", "commit", "-m", msg], capture_output=True, cwd=cwd, check=True)
        _sp.run(["git", "push", "origin", "main"], capture_output=True, cwd=cwd, check=True)
        log.info("Auto-commit+push: cutter report updated → remote.")
    except Exception as exc:
        log.warning("Auto-commit/push failed (non-fatal): %s", exc)
|
||||
|
||||
|
||||
def _regenerate_cutter_report(cfg: "AppConfig", force_beats: set[int] | None = None) -> None: # type: ignore[name-defined]
|
||||
"""Re-render CUTTER_REPORT.{md,html} with Frame-Locked Compare clips.
|
||||
|
||||
Called from every match-style command after the cache is written so all
|
||||
cutter-facing artefacts stay in sync with the current `match_results.json`.
|
||||
Failures are logged but never abort the run — the cache is the source of
|
||||
truth, the reports can always be re-rendered manually later.
|
||||
cutter-facing artefacts stay in sync with `match_results.json`.
|
||||
After rendering, stages and pushes changed report files to the remote.
|
||||
Failures are logged but never abort the run.
|
||||
"""
|
||||
project_root = cfg.paths.cache_dir.parent
|
||||
try:
|
||||
import os
|
||||
from scripts.generate_cutter_report import render_report
|
||||
except Exception as exc:
|
||||
logging.getLogger(__name__).warning("Cutter report regen skipped: %s", exc)
|
||||
else:
|
||||
old_force = os.environ.get("CUTTER_REPORT_FORCE_BEATS")
|
||||
try:
|
||||
project_root = cfg.paths.cache_dir.parent
|
||||
if force_beats:
|
||||
os.environ["CUTTER_REPORT_FORCE_BEATS"] = ",".join(str(b) for b in sorted(force_beats))
|
||||
md, html = render_report(project_root, with_stills=True, with_clips=True)
|
||||
(project_root / "CUTTER_REPORT.md").write_text(md, encoding="utf-8")
|
||||
(project_root / "CUTTER_REPORT.html").write_text(html, encoding="utf-8")
|
||||
logging.getLogger(__name__).info("Cutter report regenerated (md + html)")
|
||||
except Exception as exc:
|
||||
logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
|
||||
finally:
|
||||
if force_beats:
|
||||
if old_force is None:
|
||||
os.environ.pop("CUTTER_REPORT_FORCE_BEATS", None)
|
||||
else:
|
||||
os.environ["CUTTER_REPORT_FORCE_BEATS"] = old_force
|
||||
(project_root / "CUTTER_REPORT.md").write_text(md, encoding="utf-8")
|
||||
(project_root / "CUTTER_REPORT.html").write_text(html, encoding="utf-8")
|
||||
|
||||
# Also keep the legacy output/report/match_report.html in sync. It uses
|
||||
# its own preview-clip pipeline (frame-locked compare videos) and is the
|
||||
# heavier of the two reports — kept up-to-date so the cutter can choose
|
||||
# whichever view they prefer.
|
||||
try:
|
||||
from src.pipeline.reporter import generate_report
|
||||
all_beats = _load_beats(cfg)
|
||||
all_results = _normalize_cached_results(all_beats, _load_results(cfg), cfg)
|
||||
generate_report(all_beats, all_results, cfg)
|
||||
logging.getLogger(__name__).info("Match report regenerated → output/report/match_report.html")
|
||||
logging.getLogger(__name__).info("Cutter report regenerated (md + html + compare clips)")
|
||||
except Exception as exc:
|
||||
logging.getLogger(__name__).warning("Match report regen failed: %s", exc)
|
||||
logging.getLogger(__name__).warning("Cutter report regen failed: %s", exc)
|
||||
|
||||
_auto_commit_push_reports(project_root)
|
||||
|
||||
|
||||
def _load_results(cfg: "AppConfig") -> list: # type: ignore[name-defined]
|
||||
@@ -245,9 +281,57 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
||||
for result in results:
|
||||
beat = beats_by_id.get(result.beat_id)
|
||||
if getattr(result, "segments", ()):
|
||||
segment_duration = sum(max(0.0, float(s.duration_s)) for s in result.segments)
|
||||
segment_threshold = cfg.cv.deep_scan.multi_shot_segment_threshold
|
||||
current_islands = _reference_scoreable_segments(beat, cfg) if beat is not None else []
|
||||
repaired_segments = []
|
||||
source_segments = list(result.segments)
|
||||
if beat is not None and len(source_segments) == 1 and len(current_islands) == 1:
|
||||
island_start_s, island_end_s = current_islands[0]
|
||||
island_duration_s = max(0.0, island_end_s - island_start_s)
|
||||
segment = source_segments[0]
|
||||
if (
|
||||
abs(float(segment.trailer_offset_s) - island_start_s) > 0.04
|
||||
or abs(float(segment.duration_s) - island_duration_s) > 0.08
|
||||
):
|
||||
from dataclasses import replace as _replace
|
||||
source_segments[0] = _replace(
|
||||
segment,
|
||||
trailer_offset_s=island_start_s,
|
||||
duration_s=island_duration_s,
|
||||
out_point_s=float(segment.in_point_s) + island_duration_s,
|
||||
)
|
||||
for segment in source_segments:
|
||||
if float(segment.match_score) < segment_threshold:
|
||||
scene = _scene_by_id_light(scenes, segment.scene_id)
|
||||
if beat is not None and scene is not None:
|
||||
segment_beat = replace(
|
||||
beat,
|
||||
start_s=beat.start_s + float(segment.trailer_offset_s),
|
||||
end_s=beat.start_s + float(segment.trailer_offset_s) + float(segment.duration_s),
|
||||
)
|
||||
probe = _phase_probe_segment_in_scene(
|
||||
segment_beat,
|
||||
scene,
|
||||
float(segment.in_point_s),
|
||||
cfg,
|
||||
)
|
||||
if probe is not None:
|
||||
in_point_s, _phase_score = probe
|
||||
segment = replace(
|
||||
segment,
|
||||
in_point_s=in_point_s,
|
||||
out_point_s=in_point_s + float(segment.duration_s),
|
||||
match_score=max(float(segment.match_score), float(_phase_score)),
|
||||
is_confirmed=float(_phase_score) >= cfg.cv.deep_scan.match_threshold,
|
||||
)
|
||||
repaired_segments.append(segment)
|
||||
|
||||
valid_segments = tuple(repaired_segments)
|
||||
if not valid_segments:
|
||||
continue
|
||||
segment_duration = sum(max(0.0, float(s.duration_s)) for s in valid_segments)
|
||||
weighted_score = (
|
||||
sum(max(0.0, float(s.duration_s)) * float(s.match_score) for s in result.segments)
|
||||
sum(max(0.0, float(s.duration_s)) * float(s.match_score) for s in valid_segments)
|
||||
/ segment_duration
|
||||
if segment_duration > 0 else result.match_score
|
||||
)
|
||||
@@ -262,7 +346,15 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
||||
coverage = segment_duration / coverage_target
|
||||
if coverage < cfg.cv.deep_scan.min_duration_coverage:
|
||||
continue
|
||||
normalized.append(replace(result, match_score=weighted_score))
|
||||
first_segment = valid_segments[0]
|
||||
normalized.append(replace(
|
||||
result,
|
||||
scene_id=first_segment.scene_id,
|
||||
in_point_s=first_segment.in_point_s,
|
||||
out_point_s=first_segment.out_point_s,
|
||||
match_score=weighted_score,
|
||||
segments=valid_segments,
|
||||
))
|
||||
continue
|
||||
|
||||
if result.match_score < cfg.cv.deep_scan.provisional_match_threshold:
|
||||
@@ -292,6 +384,7 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
||||
|
||||
fps = _scene_fps_light(scene, cfg)
|
||||
adjusted_in_s = result.in_point_s
|
||||
phase_changed = False
|
||||
scene_changed = int(scene["scene_id"]) != result.scene_id
|
||||
starts_before_scene = result.in_point_s < float(scene["start_s"])
|
||||
if scene_changed or starts_before_scene or result.duration_s <= 0.12:
|
||||
@@ -300,6 +393,25 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
||||
scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
|
||||
fps = _scene_fps_light(scene, cfg)
|
||||
|
||||
should_phase_probe = (
|
||||
scene_changed
|
||||
or starts_before_scene
|
||||
or not result.is_confirmed
|
||||
or result.match_score < cfg.cv.deep_scan.match_threshold
|
||||
)
|
||||
phase_score = result.match_score
|
||||
if should_phase_probe:
|
||||
probe = _phase_probe_segment_in_scene(beat, scene, adjusted_in_s, cfg)
|
||||
if probe is not None:
|
||||
probed_in_s, probed_score = probe
|
||||
max_shift_s = max(0.12, min(0.75, beat.duration_s * 0.35))
|
||||
if abs(probed_in_s - adjusted_in_s) <= max_shift_s:
|
||||
adjusted_in_s = probed_in_s
|
||||
phase_changed = True
|
||||
phase_score = max(float(result.match_score), float(probed_score))
|
||||
scene = _scene_for_time_light(scenes, adjusted_in_s, cfg) or scene
|
||||
fps = _scene_fps_light(scene, cfg)
|
||||
|
||||
matchable_duration_s = beat.duration_s
|
||||
try:
|
||||
from src.cv.global_scan import estimate_matchable_reference_duration
|
||||
@@ -322,6 +434,7 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
||||
if (
|
||||
scene_changed
|
||||
or starts_before_scene
|
||||
or phase_changed
|
||||
or result.duration_s <= 0.12
|
||||
or result.out_point_s > adjusted_in_s + max_duration_s + (1.0 / fps)
|
||||
):
|
||||
@@ -331,6 +444,8 @@ def _normalize_cached_results(beats: list, results: list, cfg) -> list:
|
||||
in_point_s=adjusted_in_s,
|
||||
out_point_s=adjusted_in_s + max_duration_s,
|
||||
in_point_frame=int(adjusted_in_s * fps),
|
||||
match_score=phase_score,
|
||||
is_confirmed=phase_score >= cfg.cv.deep_scan.match_threshold,
|
||||
)
|
||||
|
||||
coverage = (
|
||||
@@ -521,7 +636,7 @@ def _reference_scoreable_segments(beat, cfg) -> list[tuple[float, float]]:
|
||||
t = 0.0
|
||||
while t <= beat.duration_s:
|
||||
frame = grab_frame_at_path(beat.trailer_path, beat.start_s + t)
|
||||
scoreable = frame is not None and _is_scoreable_reference_frame(frame, cfg)
|
||||
scoreable = frame is not None and is_visible(frame)
|
||||
if scoreable:
|
||||
if start is None:
|
||||
start = t
|
||||
@@ -799,7 +914,7 @@ def _merge_best_results(existing: list, candidates: list, cfg) -> list:
|
||||
|
||||
|
||||
def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list:
|
||||
"""Try a vision-led search for beats that ended up without a match.
|
||||
"""Try a vision-led search for beats that ended up weak or unmatched.
|
||||
|
||||
For each unmatched beat that has scoreable visual content (i.e. not pure
|
||||
fade/title-card material), this pass:
|
||||
@@ -816,7 +931,7 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
Confirmed and provisional matches both stay subject to the same thresholds
|
||||
used elsewhere; this only adds matches that pass the same quality gates.
|
||||
"""
|
||||
if not cfg.vision.enabled or not beats:
|
||||
if not beats:
|
||||
return results
|
||||
|
||||
from dataclasses import replace
|
||||
@@ -827,17 +942,28 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
from src.llm.vision_cache import find_action_window_in_scene, validate_match_window_with_vision
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
matched_ids = {r.beat_id for r in results}
|
||||
unmatched = [b for b in beats if b.beat_id not in matched_ids]
|
||||
if not unmatched:
|
||||
results_by_id = {r.beat_id: r for r in results}
|
||||
recovery_targets = [
|
||||
b for b in beats
|
||||
if (
|
||||
b.beat_id not in results_by_id
|
||||
or (
|
||||
not results_by_id[b.beat_id].is_confirmed
|
||||
and results_by_id[b.beat_id].match_score < cfg.cv.deep_scan.match_threshold
|
||||
)
|
||||
)
|
||||
]
|
||||
if not recovery_targets:
|
||||
return results
|
||||
|
||||
scenes = build_scene_index(cfg)
|
||||
if not scenes:
|
||||
return results
|
||||
|
||||
new_results = list(results)
|
||||
for beat in unmatched:
|
||||
target_ids = {b.beat_id for b in recovery_targets}
|
||||
new_results = [r for r in results if r.beat_id not in target_ids]
|
||||
replaced_results = {r.beat_id: r for r in results if r.beat_id in target_ids}
|
||||
for beat in recovery_targets:
|
||||
try:
|
||||
islands = _reference_scoreable_segments(beat, cfg)
|
||||
except Exception:
|
||||
@@ -874,6 +1000,79 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
|
||||
scenes_by_id = {s.scene_id: s for s in scenes}
|
||||
best = None # (score, scene, in_s, dur_s, reason)
|
||||
try:
|
||||
from src.llm.vision_cache import (
|
||||
_load_cache,
|
||||
_semantic_action_groups,
|
||||
_semantic_match_score,
|
||||
_STRONG_ACTION_GROUPS,
|
||||
)
|
||||
cache = _load_cache(cfg)
|
||||
items = cache.get("items", {})
|
||||
beat_desc = ""
|
||||
if isinstance(items, dict):
|
||||
for item in items.values():
|
||||
if (
|
||||
isinstance(item, dict)
|
||||
and item.get("kind") == "beat"
|
||||
and item.get("item_id") == beat.beat_id
|
||||
):
|
||||
beat_desc = str(item.get("description", ""))
|
||||
break
|
||||
beat_actions = _semantic_action_groups(beat_desc) & _STRONG_ACTION_GROUPS if beat_desc else set()
|
||||
identity_vocab = {
|
||||
"woman", "women", "man", "men", "girl", "boy", "child",
|
||||
"blonde", "hair", "face", "mouth", "eyes", "profile",
|
||||
"close-up", "closeup",
|
||||
}
|
||||
beat_identity = {term for term in identity_vocab if term in beat_desc.lower()}
|
||||
distinctive_identity = {
|
||||
term for term in ("woman", "women", "blonde", "mouth", "face")
|
||||
if term in beat_desc.lower()
|
||||
}
|
||||
if beat_actions and isinstance(items, dict):
|
||||
for item in items.values():
|
||||
if not isinstance(item, dict) or item.get("kind") != "action_window":
|
||||
continue
|
||||
scene = scenes_by_id.get(item.get("item_id"))
|
||||
desc = str(item.get("description", ""))
|
||||
source_actions = _semantic_action_groups(desc)
|
||||
if scene is None or not beat_actions <= source_actions:
|
||||
continue
|
||||
source_text = desc.lower()
|
||||
positive_source_text = source_text.split('"negatives"', 1)[0]
|
||||
identity_overlap = {term for term in beat_identity if term in source_text}
|
||||
if len(beat_identity) >= 2 and len(identity_overlap) < 2:
|
||||
continue
|
||||
if distinctive_identity and not any(term in positive_source_text for term in distinctive_identity):
|
||||
continue
|
||||
if "mouth" in beat_desc.lower() and "mouth" not in positive_source_text:
|
||||
continue
|
||||
if "dark interior" in beat_desc.lower() and (
|
||||
"interior" not in positive_source_text or "dark" not in positive_source_text
|
||||
):
|
||||
continue
|
||||
score, reason = _semantic_match_score(beat_desc, desc)
|
||||
if score < max(0.60, cfg.cv.deep_scan.provisional_match_threshold):
|
||||
continue
|
||||
try:
|
||||
in_s = float(item.get("start_s"))
|
||||
out_s = float(item.get("end_s"))
|
||||
except (TypeError, ValueError):
|
||||
continue
|
||||
duration_s = max(0.32, min(anchor_beat.duration_s, out_s - in_s))
|
||||
candidate = (
|
||||
min(0.99, score),
|
||||
scene,
|
||||
in_s,
|
||||
duration_s,
|
||||
f"cached vision action; {reason}",
|
||||
)
|
||||
if best is None or candidate[0] > best[0]:
|
||||
best = candidate
|
||||
except Exception as exc:
|
||||
logger.debug("Beat %d: cached vision fallback failed (%s)", beat.beat_id, exc)
|
||||
|
||||
seen = set()
|
||||
for hit in hits[: cfg.cv.deep_scan.scene_seed_top_k]:
|
||||
scene = scenes_by_id.get(hit.scene_id)
|
||||
@@ -900,7 +1099,10 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.debug("Beat %d: align failed for scene %d (%s)", beat.beat_id, scene.scene_id, exc)
|
||||
continue
|
||||
aligned_in_s = start_s
|
||||
combined_score = semantic_score
|
||||
content_score = 0.0
|
||||
motion_score = 0.0
|
||||
aligned_in_s = max(scene.start_s, min(aligned_in_s, max(scene.start_s, scene.end_s - anchor_beat.duration_s)))
|
||||
|
||||
try:
|
||||
@@ -930,6 +1132,8 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
combined_score,
|
||||
min(0.99, semantic_score * 0.65 + motion_score * 0.18 + content_score * 0.09 + usable_score * 0.08),
|
||||
)
|
||||
if semantic_score >= max(0.60, cfg.cv.deep_scan.provisional_match_threshold):
|
||||
final_score = max(final_score, semantic_score)
|
||||
if final_score < cfg.cv.deep_scan.provisional_match_threshold:
|
||||
continue
|
||||
candidate = (final_score, scene, aligned_in_s, usable_duration_s, f"recovery; {reason}; {verify_reason}")
|
||||
@@ -937,6 +1141,9 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
best = candidate
|
||||
|
||||
if best is None:
|
||||
previous = replaced_results.get(beat.beat_id)
|
||||
if previous is not None:
|
||||
new_results.append(previous)
|
||||
continue
|
||||
score, scene, aligned_in_s, usable_duration_s, repair_reason = best
|
||||
logger.info(
|
||||
@@ -963,6 +1170,97 @@ def _recover_unmatched_beats_via_vision(results: list, beats: list, cfg) -> list
|
||||
return sorted(new_results, key=lambda r: r.beat_id)
|
||||
|
||||
def _recover_short_lowlight_vibe_matches(results: list, beats: list, cfg) -> list:
    """Keep obvious short low-light scene hits as provisional instead of no-match.

    Short blue/dark dialogue shots can be correctly ranked by scene-level
    histogram/pHash but then rejected by the stricter content aligner because
    the shot contains little texture, motion blur, or trailer timecode overlay.
    This fallback only accepts the top vibe scene when it has a clear margin and
    the local content scan still finds a usable in-point.
    """
    from src.core.models import MatchResult, Scene
    from src.cv.global_scan import _content_alignment_score, _content_alignment_templates
    from src.cv.vibe_check import run_vibe_check
    from src.cv.frame_extractor import open_video

    matched_ids = {r.beat_id for r in results}
    targets = [b for b in beats if b.beat_id not in matched_ids and b.duration_s <= 2.25]
    if not targets:
        return results

    raw_scenes = _load_scene_cache_light(cfg)
    scenes = [
        Scene(
            scene_id=int(s["scene_id"]),
            source_path=cfg.paths.source_movie,
            start_s=float(s["start_s"]),
            end_s=float(s["end_s"]),
            start_frame=int(s["start_frame"]),
            end_frame=int(s["end_frame"]),
            luma_hist=bytes.fromhex(s["luma_hist"]) if s.get("luma_hist") else None,
            sat_hist=bytes.fromhex(s["sat_hist"]) if s.get("sat_hist") else None,
            phash=s.get("phash"),
        )
        for s in raw_scenes
    ]
    scenes_by_id = {s.scene_id: s for s in scenes}
    recovered = list(results)

    with open_video(cfg.paths.source_movie) as cap:
        for beat in targets:
            templates = _content_alignment_templates(beat, cfg)
            if not templates:
                continue
            hits = run_vibe_check(
                beat,
                scenes,
                top_k=6,
                hist_method=cfg.cv.vibe_check.hist_compare_method,
                phash_max_distance=64,
            )
            if len(hits) < 2:
                continue
            top, second = hits[0], hits[1]
            if top.combined_score < 0.74 or top.combined_score - second.combined_score < 0.03:
                continue
            scene = scenes_by_id.get(top.scene_id)
            if scene is None or scene.duration_s < max(0.5, beat.duration_s):
                continue

            best: tuple[float, float] | None = None
            scan_end = max(scene.start_s, scene.end_s - beat.duration_s)
            step_s = 0.12
            t = scene.start_s
            while t <= scan_end:
                score = _content_alignment_score(cap, t, templates, cfg)
                if best is None or score > best[0]:
                    best = (score, t)
                t = round(t + step_s, 6)
            if best is None or best[0] < 0.15:
                continue

            content_score, in_point_s = best
            final_score = max(
                cfg.cv.deep_scan.provisional_match_threshold,
                min(0.64, top.combined_score * 0.55 + content_score * 0.45),
            )
            recovered.append(MatchResult(
                beat_id=beat.beat_id,
                scene_id=scene.scene_id,
                source_path=scene.source_path,
                in_point_s=in_point_s,
                out_point_s=in_point_s + beat.duration_s,
                in_point_frame=int(in_point_s * cfg.export.edl_frame_rate),
                match_score=final_score,
                match_location=(0, 0),
                is_confirmed=False,
                segments=tuple(),
            ))

    return sorted(recovered, key=lambda r: r.beat_id)
|
||||
|
||||
|
||||
def _filter_semantically_invalid_vision_matches(results: list, beats: list, cfg) -> list:
|
||||
"""Drop vision-enabled matches whose final action phase contradicts the beat."""
|
||||
if not cfg.vision.enabled or not results:
|
||||
@@ -1338,6 +1636,41 @@ def _attach_visual_segments(results: list, beats: list, cfg) -> list:
|
||||
if not segment_matches:
|
||||
continue
|
||||
seg = segment_matches[0]
|
||||
if seg.match_score < cfg.cv.deep_scan.multi_shot_segment_threshold:
|
||||
repaired = _local_same_scene_segment_match(
|
||||
segment_beat,
|
||||
beat,
|
||||
start_s,
|
||||
cached + expanded,
|
||||
cfg,
|
||||
)
|
||||
if (
|
||||
repaired is None
|
||||
or repaired.match_score
|
||||
< max(
|
||||
cfg.cv.deep_scan.multi_shot_segment_threshold,
|
||||
seg.match_score + cfg.cv.deep_scan.duration_tie_break_score_delta,
|
||||
)
|
||||
):
|
||||
scenes = _load_scene_cache_light(cfg)
|
||||
scene = _scene_by_id_light(scenes, seg.scene_id)
|
||||
probe = (
|
||||
_phase_probe_segment_in_scene(segment_beat, scene, seg.in_point_s, cfg)
|
||||
if scene is not None else None
|
||||
)
|
||||
if probe is None:
|
||||
continue
|
||||
in_point_s, _phase_score = probe
|
||||
from dataclasses import replace as _replace
|
||||
seg = _replace(
|
||||
seg,
|
||||
in_point_s=in_point_s,
|
||||
out_point_s=in_point_s + seg.duration_s,
|
||||
match_score=max(seg.match_score, _phase_score),
|
||||
is_confirmed=_phase_score >= cfg.cv.deep_scan.match_threshold,
|
||||
)
|
||||
else:
|
||||
seg = repaired
|
||||
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
|
||||
segments.append(
|
||||
MatchSegment(
|
||||
@@ -1438,21 +1771,12 @@ def _match_unmatched_visual_segments(
|
||||
start_s=beat.start_s + start_s,
|
||||
end_s=beat.start_s + end_s,
|
||||
)
|
||||
if island_idx == 0:
|
||||
# First island of an unmatched multi-shot beat: search globally
|
||||
# without a continuity bias from the previous beat. Continuity
|
||||
# assumes the shot follows the previous beat in the source, but
|
||||
# the lead shot of a multi-shot beat is often an insert cut from
|
||||
# a completely different scene. A wrong seed with score 0.92
|
||||
# would push the real match out of the refinement candidate pool.
|
||||
continuity = {}
|
||||
else:
|
||||
continuity = _continuity_seed_in_points(
|
||||
beat.beat_id,
|
||||
[b if b.beat_id != beat.beat_id else segment_beat for b in beats],
|
||||
cached + expanded,
|
||||
cfg,
|
||||
)
|
||||
continuity = _continuity_seed_in_points(
|
||||
beat.beat_id,
|
||||
[b if b.beat_id != beat.beat_id else segment_beat for b in beats],
|
||||
cached + expanded,
|
||||
cfg,
|
||||
)
|
||||
segment_matches = []
|
||||
if beat.beat_id not in skip_global_segment_scan_for:
|
||||
segment_matches = _run_segment_match(segment_beat, continuity, cfg, allow_fullscan=True)
|
||||
@@ -1468,7 +1792,10 @@ def _match_unmatched_visual_segments(
|
||||
if recovered:
|
||||
rec = recovered[0]
|
||||
seg_dur = min(max(0.0, end_s - start_s), max(0.0, rec.duration_s))
|
||||
if seg_dur > 0:
|
||||
if (
|
||||
seg_dur > 0
|
||||
and rec.match_score >= cfg.cv.deep_scan.multi_shot_segment_threshold
|
||||
):
|
||||
segments.append(MatchSegment(
|
||||
trailer_offset_s=start_s,
|
||||
duration_s=seg_dur,
|
||||
@@ -1490,6 +1817,8 @@ def _match_unmatched_visual_segments(
|
||||
segments.append(local_segment)
|
||||
continue
|
||||
seg = segment_matches[0]
|
||||
if seg.match_score < cfg.cv.deep_scan.multi_shot_segment_threshold:
|
||||
continue
|
||||
seg_dur = min(max(0.0, end_s - start_s), max(0.0, seg.duration_s))
|
||||
segments.append(
|
||||
MatchSegment(
|
||||
@@ -1561,7 +1890,13 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
|
||||
cfg.cv.deep_scan.provisional_content_threshold * 0.70,
|
||||
cfg.cv.deep_scan.provisional_match_threshold,
|
||||
)
|
||||
step_s = max(1.0 / cfg.export.edl_frame_rate, 0.04)
|
||||
# Coarse repair scan over already plausible neighbouring scenes. A frame-step
|
||||
# sweep across long dialogue scenes is slow and can overfit static layouts.
|
||||
step_s = max(
|
||||
cfg.vision.local_scan_step_s,
|
||||
cfg.cv.deep_scan.content_align_sample_step_s,
|
||||
0.25,
|
||||
)
|
||||
best: tuple[float, float, int] | None = None
|
||||
with open_video(cfg.paths.source_movie) as cap:
|
||||
for scene_id in scene_ids:
|
||||
@@ -1570,12 +1905,14 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
|
||||
continue
|
||||
start_s = max(0.0, float(scene["start_s"]) - 0.25)
|
||||
end_s = max(start_s, float(scene["end_s"]) - max(0.04, segment_beat.duration_s) + 0.25)
|
||||
max_points = max(4, min(48, int(cfg.vision.local_scan_max_points_per_scene)))
|
||||
scene_step_s = max(step_s, (end_s - start_s) / max_points)
|
||||
t = start_s
|
||||
while t <= end_s:
|
||||
score = _content_alignment_score(cap, t, templates, cfg)
|
||||
if best is None or score > best[0]:
|
||||
best = (score, t, int(scene_id))
|
||||
t = round(t + step_s, 6)
|
||||
t = round(t + scene_step_s, 6)
|
||||
|
||||
if best is None or best[0] < min_score:
|
||||
return None
|
||||
@@ -1593,6 +1930,186 @@ def _local_same_scene_segment_match(segment_beat, beat, segment_offset_s: float,
|
||||
)
|
||||
|
||||
|
||||
def _phase_probe_segment_in_scene(segment_beat, scene: dict, original_in_s: float, cfg):
    """Retune a weak multi-shot segment inside its own scene using saliency-weighted frames."""
    import cv2
    import numpy as np

    offsets = [0.0, 0.16, 0.32, 0.48, 0.64, 0.80, 0.96, 1.12]
    size = (160, 90)

    def prepared_gray(frame):
        if frame is None:
            return None
        h, w = frame.shape[:2]
        frame = frame.copy()
        # Timecode overlays and letterbox edges are trailer/source-specific and
        # should not pull the phase toward the wrong moment.
        frame[: int(h * 0.16), : int(w * 0.32)] = 0
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, size)
        return cv2.equalizeHist(gray).astype("float32") / 255.0

    def edge(gray):
        return cv2.Canny((gray * 255).astype("uint8"), 45, 130).astype("float32") / 255.0

    def pair_score(ref_gray, src_gray, mask):
        if ref_gray is None or src_gray is None:
            return None
        pixel = 1.0 - float((np.abs(ref_gray - src_gray) * mask).sum())
        edge_score = 1.0 - float((np.abs(edge(ref_gray) - edge(src_gray)) * mask).sum())
        return 0.65 * pixel + 0.35 * edge_score

    def frame_at(cap, t_s):
        cap.set(cv2.CAP_PROP_POS_MSEC, t_s * 1000.0)
        ok, frame = cap.read()
        return frame if ok else None

    trailer_cap = cv2.VideoCapture(str(cfg.paths.reference_trailer))
    ref_candidates = []
    fallback_items = []
    for offset in offsets:
        if offset > segment_beat.duration_s + 0.04:
            continue
        frame = frame_at(trailer_cap, segment_beat.start_s + offset)
        ref = prepared_gray(frame)
        if ref is None:
            continue
        fallback_items.append((offset, ref))
        raw_gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), size)
        h, w = raw_gray.shape[:2]
        raw_gray[: int(h * 0.16), : int(w * 0.32)] = 0
        roi = raw_gray[int(h * 0.12) : int(h * 0.90), :]
        mean_luma = float(roi.mean() / 255.0)
        p90_luma = float(np.percentile(roi, 90) / 255.0)
        contrast = float(roi.std() / 255.0)
        ref_candidates.append((offset, ref, mean_luma, p90_luma, contrast))
    # Release the trailer handle before any early return below.
    trailer_cap.release()

    transition_start = False
    ref_items = []
    if ref_candidates:
        max_mean = max(item[2] for item in ref_candidates)
        max_p90 = max(item[3] for item in ref_candidates)
        transition_start = (
            ref_candidates[0][2] < max_mean * 0.90
            or ref_candidates[0][3] < max_p90 * 0.90
        )
        ref_items = [
            (offset, ref)
            for offset, ref, mean_luma, p90_luma, contrast in ref_candidates
            if (
                mean_luma >= max(0.16, max_mean * 0.82)
                and p90_luma >= max(0.28, max_p90 * 0.86)
                and contrast >= 0.035
            )
        ]
    if len(ref_items) < 4:
        ref_items = fallback_items
    if len(ref_items) < 4:
        return None
    ref_offsets = [item[0] for item in ref_items]
    refs = [item[1] for item in ref_items]

    align_offset = ref_offsets[0]
    ref_offsets = [offset - align_offset for offset in ref_offsets]

    ref_stack = np.stack(refs, axis=0)
    edge_stack = np.stack([edge(ref) for ref in refs], axis=0)
    # Static window/room edges are useful for finding the scene, but toxic for
    # phase retuning inside a repeated dialogue shot. Bias the mask toward
    # areas that actually change across the reference segment.
    saliency = ref_stack.std(axis=0) * 3.0 + edge_stack.std(axis=0) * 0.75 + edge_stack.mean(axis=0) * 0.15
    saliency[:, : int(size[0] * 0.12)] *= 0.15
    saliency[: int(size[1] * 0.16), : int(size[0] * 0.32)] = 0.0
    threshold = np.quantile(saliency, 0.66)
    mask = (saliency >= threshold).astype("float32")
    mask /= mask.sum() + 1e-6

    scene_start = float(scene["start_s"])
    scene_end = float(scene["end_s"])
    center_t = max(scene_start, min(scene_end, original_in_s + align_offset))
    retune_radius_s = max(4.0, min(12.0, segment_beat.duration_s * 2.5))
    scan_start = max(scene_start, center_t - retune_radius_s)
    scene_scan_end = min(scene_end, center_t + retune_radius_s)
    scan_end = max(scan_start, scene_scan_end - max(0.04, segment_beat.duration_s - align_offset))
    max_points = 400
    step_s = max(0.04, (scan_end - scan_start) / max_points)

    source_cap = cv2.VideoCapture(str(cfg.paths.source_movie))
    source_fps = source_cap.get(cv2.CAP_PROP_FPS) or _scene_fps_light(scene, cfg)
    stride = max(1, int(round(step_s * source_fps)))
    # Frames are sampled every `stride` source frames, so the real spacing can
    # differ from the requested step_s; index lookups below use this spacing.
    sample_step_s = stride / source_fps
    start_frame = max(0, int(round(scan_start * source_fps)))
    end_frame = max(start_frame, int(round(scene_scan_end * source_fps)))
    times: list[float] = []
    source_frames: list = []
    frame_idx = start_frame
    while frame_idx <= end_frame:
        source_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = source_cap.read()
        if not ok:
            break
        times.append(frame_idx / source_fps)
        source_frames.append(prepared_gray(frame))
        frame_idx += stride
    source_cap.release()
    base_time = times[0] if times else scan_start

    candidates: list[tuple[float, float, float]] = []
    for i, t in enumerate(times):
        if t > scan_end:
            break
        vals = []
        src_for_offsets = []
        for offset, ref in zip(ref_offsets, refs):
            j = int(round((t + offset - base_time) / sample_step_s))
            if 0 <= j < len(source_frames):
                src = source_frames[j]
                score = pair_score(ref, src, mask)
            else:
                src = None
                score = None
            if score is not None:
                vals.append(score)
            src_for_offsets.append(src)
        if len(vals) >= 4:
            avg_score = sum(vals) / len(vals)
            early_count = min(2, len(vals))
            tail_count = min(2, len(vals))
            early_score = sum(vals[:early_count]) / early_count
            tail_score = sum(vals[-tail_count:]) / tail_count
            motion_vals = []
            for idx in range(1, min(len(refs), len(src_for_offsets))):
                if src_for_offsets[idx - 1] is None or src_for_offsets[idx] is None:
                    continue
                ref_motion = refs[idx] - refs[idx - 1]
                src_motion = src_for_offsets[idx] - src_for_offsets[idx - 1]
                motion_vals.append(1.0 - float((np.abs(ref_motion - src_motion) * mask).sum()))
            motion_score = sum(motion_vals) / len(motion_vals) if motion_vals else avg_score
            # Phase retuning must reject "same shot, wrong moment" matches.
            # A plain average can hide a bad onset inside slow dialogue shots;
            # keep the low-water mark, onset, and frame-to-frame motion influential.
            phase_score = (
                0.26 * avg_score
                + 0.24 * min(vals)
                + 0.24 * early_score
                + 0.08 * tail_score
                + 0.18 * motion_score
            )
            candidates.append((phase_score, min(vals), t))

    if not candidates:
        return None

    candidates.sort(reverse=True)
    best_score = candidates[0][0]
    tie_window = 0.006 if transition_start else 0.002
    near_tie = [c for c in candidates if c[0] >= best_score - tie_window]
    if transition_start:
        chosen = max(near_tie, key=lambda c: (c[1], c[0]))
    else:
        chosen = min(near_tie, key=lambda c: abs((c[2] - align_offset) - original_in_s))
    return max(scene_start, chosen[2] - align_offset), chosen[0]
|
||||
|
||||
|
||||
def cmd_match(args: argparse.Namespace, cfg) -> list:
|
||||
from src.pipeline.matcher import run_matching
|
||||
from dataclasses import replace
|
||||
@@ -1666,6 +2183,7 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
|
||||
results = _attach_visual_segments(results, beats, cfg)
|
||||
results = _filter_semantically_invalid_vision_matches(results, beats, cfg)
|
||||
results = _recover_unmatched_beats_via_vision(results, beats, cfg)
|
||||
results = _recover_short_lowlight_vibe_matches(results, beats, cfg)
|
||||
|
||||
# A targeted one-beat match must NEVER delete or modify any other beat's
|
||||
# cache entry. We deliberately re-load the raw cache from disk here so
|
||||
@@ -1692,7 +2210,8 @@ def cmd_match(args: argparse.Namespace, cfg) -> list:
|
||||
results_to_save = results
|
||||
|
||||
_save_results(results_to_save, cfg)
|
||||
_regenerate_cutter_report(cfg)
|
||||
force_report_beats = {int(args.beat)} if getattr(args, "beat", None) is not None else None
|
||||
_regenerate_cutter_report(cfg, force_beats=force_report_beats)
|
||||
|
||||
print(f"\n✅ {len(results)} / {len(beats)} beats matched.")
|
||||
for r in results:
|
||||
@@ -1862,17 +2381,12 @@ def cmd_rematch(args: argparse.Namespace, cfg) -> None:
|
||||
|
||||
|
||||
def cmd_report(args: argparse.Namespace, cfg) -> None:
|
||||
from src.pipeline.reporter import generate_report
|
||||
beats = _select_beats(_load_beats(cfg), getattr(args, "beat", None))
|
||||
beat_ids = {b.beat_id for b in beats} if getattr(args, "beat", None) is not None else None
|
||||
results = _select_results(_normalize_cached_results(_load_beats(cfg), _load_results(cfg), cfg), beat_ids)
|
||||
out = generate_report(beats, results, cfg)
|
||||
if getattr(args, "beat", None) is not None and not results:
|
||||
print(
|
||||
f"\n⚠️ Beat {args.beat} has no cached match yet. "
|
||||
f"Run: python cli.py match --beat {args.beat}"
|
||||
)
|
||||
print(f"\n\u2705 Report \u2192 {out}")
|
||||
if getattr(args, "beat", None) is not None:
|
||||
print(f"\n⚠️ Generating cutter report for all beats (ignoring --beat {args.beat}).")
|
||||
|
||||
_regenerate_cutter_report(cfg)
|
||||
project_root = cfg.paths.cache_dir.parent
|
||||
print(f"\n✅ Report → {project_root / 'CUTTER_REPORT.html'} and CUTTER_REPORT.md")
|
||||
|
||||
|
||||
def cmd_export(args: argparse.Namespace, cfg) -> None:
|
||||
@@ -1913,6 +2427,141 @@ def cmd_run(args: argparse.Namespace, cfg) -> None:
|
||||
cmd_export(args, cfg)
|
||||
|
||||
|
||||
def cmd_preview(args: argparse.Namespace, cfg) -> None:
    """Assemble a rough preview video from cached source matches, with original audio."""
    import subprocess

    log = logging.getLogger(__name__)

    results_path = _results_cache_path(cfg)
    if not results_path.exists():
        log.error("No match_results.json — run 'match' first.")
        return

    data = sorted(
        json.loads(results_path.read_text(encoding="utf-8")),
        key=lambda r: r["beat_id"],
    )

    beats_path = cfg.paths.cache_dir / "trailer_beats.json"
    beats_by_id: dict = {}
    if beats_path.exists():
        for b in json.loads(beats_path.read_text(encoding="utf-8")):
            beats_by_id[int(b["beat_id"])] = b

    clip_width = 1280
    fps = 25
    out_dir = cfg.paths.output_dir / "preview_clips"
    out_dir.mkdir(parents=True, exist_ok=True)
    preview_out = cfg.paths.output_dir / "preview.mp4"

    def _run(cmd: list, timeout: int = 120) -> bool:
        r = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if r.returncode != 0:
            log.debug("ffmpeg stderr: %s", r.stderr[-600:])
        return r.returncode == 0

    def extract_with_audio(src: Path, start_s: float, duration_s: float, out: Path) -> bool:
        # Fast input seek to just before the in-point, then an accurate
        # output seek for the remaining pre-roll.
        preroll = 2.0 if start_s >= 2.0 else 0.0
        input_seek = max(0.0, start_s - preroll)
        accurate_seek = start_s - input_seek
        return _run([
            "ffmpeg", "-y", "-loglevel", "error",
            "-ss", f"{input_seek:.3f}", "-i", str(src),
            "-ss", f"{accurate_seek:.3f}", "-t", f"{max(0.04, duration_s):.3f}",
            "-map", "0:v:0", "-map", "0:a:0",
            "-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
            "-vf", f"fps={fps},scale={clip_width}:-2,setsar=1,setpts=PTS-STARTPTS",
            "-c:a", "aac", "-ar", "48000", "-ac", "2",
            "-pix_fmt", "yuv420p", "-movflags", "+faststart", str(out),
        ])

    def black_silence(duration_s: float, out: Path) -> bool:
        return _run([
            "ffmpeg", "-y", "-loglevel", "error",
            "-f", "lavfi", "-i", f"color=black:s={clip_width}x720:r={fps}",
            "-f", "lavfi", "-i", "anullsrc=r=48000:cl=stereo",
            "-t", f"{max(0.5, duration_s):.3f}",
            "-c:v", "libx264", "-preset", "veryfast", "-crf", "23",
            "-c:a", "aac", "-pix_fmt", "yuv420p", "-movflags", "+faststart", str(out),
        ])

    def concat_clips(parts: list[Path], out: Path) -> bool:
        lst = out.with_suffix(".txt")
        lst.write_text(
            "\n".join(f"file '{p.resolve().as_posix()}'" for p in parts),
            encoding="utf-8",
        )
        ok = _run([
            "ffmpeg", "-y", "-loglevel", "error",
            "-f", "concat", "-safe", "0", "-i", str(lst),
            "-c", "copy", str(out),
        ], timeout=300)
        lst.unlink(missing_ok=True)
        return ok

    beat_clips: list[Path] = []

    for rec in data:
        bid = int(rec["beat_id"])
        segs = rec.get("segments", [])
        src = Path(rec["source_path"]) if rec.get("source_path") else None
        clip_out = out_dir / f"beat_{bid:02d}.mp4"

        if src is None or not src.exists():
            beat = beats_by_id.get(bid, {})
            dur = max(0.5, float(beat.get("end_s", 1)) - float(beat.get("start_s", 0)))
            log.info("Beat %02d: NO MATCH — black/silence %.2fs", bid, dur)
            if black_silence(dur, clip_out):
                beat_clips.append(clip_out)
            continue

        if len(segs) >= 2:
            parts: list[Path] = []
            for idx, seg in enumerate(segs):
                in_s = float(seg["in_point_s"])
                dur = max(0.04, float(seg["out_point_s"]) - in_s)
                seg_src = Path(seg["source_path"]) if seg.get("source_path") else src
                part = out_dir / f"beat_{bid:02d}_seg{idx:02d}.mp4"
                log.info("Beat %02d seg%d: scene=%s %.2fs–%.2fs", bid, idx, seg.get("scene_id"), in_s, in_s + dur)
                if extract_with_audio(seg_src, in_s, dur, part):
                    parts.append(part)
            if not parts:
                log.warning("Beat %02d: no segments extracted", bid)
                continue
            if len(parts) == 1:
                # replace() overwrites a clip left over from an earlier run;
                # rename() raises FileExistsError on Windows in that case.
                parts[0].replace(clip_out)
                beat_clips.append(clip_out)
            else:
                if concat_clips(parts, clip_out):
                    beat_clips.append(clip_out)
                for p in parts:
                    p.unlink(missing_ok=True)
        else:
            in_s = float(rec["in_point_s"])
            beat = beats_by_id.get(bid, {})
            beat_dur = float(beat["end_s"]) - float(beat["start_s"]) if beat else 0.0
            source_dur = float(rec["out_point_s"]) - in_s
            dur = max(0.04, beat_dur if beat_dur > 0.04 else source_dur)
            log.info("Beat %02d: scene=%s %.2fs+%.2fs (trailer=%.2fs src=%.2fs)", bid, rec.get("scene_id"), in_s, dur, beat_dur, source_dur)
            if extract_with_audio(src, in_s, dur, clip_out):
                beat_clips.append(clip_out)
            else:
                log.warning("Beat %02d: extraction failed", bid)

    if not beat_clips:
        log.error("No clips extracted — aborting.")
        return

    log.info("Concatenating %d beat clips → %s", len(beat_clips), preview_out)
    if concat_clips(beat_clips, preview_out):
        size_mb = preview_out.stat().st_size / 1_048_576
        log.info("Preview ready: %s (%.1f MB)", preview_out, size_mb)
        print(f"\n Preview → {preview_out} ({size_mb:.1f} MB)")
    else:
        log.error("Final concat failed — per-beat clips are in %s", out_dir)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Argument parser
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -1983,6 +2632,12 @@ def _build_parser() -> argparse.ArgumentParser:
|
||||
p_run.add_argument("--beat", type=int,
|
||||
help="Run match/report/export for only one cached beat")
|
||||
|
||||
# preview
|
||||
sub.add_parser(
|
||||
"preview",
|
||||
help="Build output/preview.mp4 from cached matches — source clips with audio in beat order",
|
||||
)
|
||||
|
||||
return parser
|
||||
|
||||
|
||||
@@ -2007,6 +2662,7 @@ def main() -> None:
|
||||
"report": cmd_report,
|
||||
"export": cmd_export,
|
||||
"run": cmd_run,
|
||||
"preview": cmd_preview,
|
||||
}
|
||||
|
||||
handler = dispatch[args.command]
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
[project]
|
||||
name = "AI Trailer Generator v2"
|
||||
version = "2.0.0"
|
||||
log_level = "INFO" # DEBUG | INFO | WARNING | ERROR
|
||||
log_level = "DEBUG" # DEBUG | INFO | WARNING | ERROR
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# [paths] — External video sources (read-only access)
|
||||
@@ -86,7 +86,10 @@ span_score_weight = 0.15
|
||||
coarse_score_weight = 0.10
|
||||
duration_score_weight = 0.20
|
||||
duration_tie_break_score_delta = 0.03
|
||||
min_duration_coverage = 0.65
|
||||
min_duration_coverage = 0.55
|
||||
# Every visible sub-shot in a multi-shot beat must pass this stricter gate.
|
||||
# A weak segment is left unmatched instead of being hidden by a strong neighbor.
|
||||
multi_shot_segment_threshold = 0.50
|
||||
continuity_seed_offsets_s = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
|
||||
scene_seed_top_k = 30
|
||||
scene_seed_points_per_scene = 6
|
||||
@@ -183,7 +186,7 @@ local_scan_step_s = 0.12
|
||||
local_scan_max_points_per_scene = 180
|
||||
local_scan_top_candidates = 36
|
||||
local_scan_tie_break_score_delta = 0.08
|
||||
multi_shot_cut_corr_threshold = 0.20
|
||||
multi_shot_cut_corr_threshold = 0.55
|
||||
multi_shot_boundary_tolerance_s = 0.20
|
||||
fullscan_fallback = false
|
||||
content_threshold = 0.22
|
||||
|
||||
@@ -132,8 +132,33 @@ already aligned to the visible action phase.
The segment offset counts only across preceding scorable image islands, not across
black or fade gaps. After retiming, the usable source duration
is estimated again; if the source runs into a visibly different
action phase at the end, the clip is shortened and the rest stays a placeholder/fade
instead of showing a wrong moment of motion.
action phase at the end, the match is clearly flagged as phase-critical in the cutter
report. Black/placeholder is used only for genuinely unmatched trailer
ranges or fades, not to hide visible candidate motion in the
review.
|
||||
|
||||
This span estimate is stricter than the coarse search score: a nearly static
start must not rescue a match if later frames visibly drift into a
different gesture, body position, or an entering figure. Stable
score plateaus may extend the span only if they are still close enough to the
starting level; otherwise the match stays provisional and must be searched again
or checked visually. The review clip still shows the candidate visibly, so that
phase errors are not hidden by black.
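
The span rule above can be illustrated with a minimal sketch. It assumes a list of per-frame similarity scores for one candidate; the function name `usable_span_frames` and the values `max_drop` and `min_score` are hypothetical and do not come from the project config.

```python
def usable_span_frames(scores, max_drop=0.08, min_score=0.50):
    """Count leading frames whose score stays close to the onset level.

    `max_drop` and `min_score` are illustrative values, not project settings.
    """
    if not scores:
        return 0
    onset_frames = scores[:2]
    onset = sum(onset_frames) / len(onset_frames)
    usable = 0
    for score in scores:
        # Stop at the first frame that is weak or has drifted away from the
        # onset level; a later plateau does not re-extend the span.
        if score < min_score or onset - score > max_drop:
            break
        usable += 1
    return usable


scores = [0.80, 0.79, 0.78, 0.60, 0.78]
print(usable_span_frames(scores))  # 3
```

The late plateau at 0.78 is ignored once the score has dropped, which mirrors the idea that a static start must not rescue drifting later frames.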
|
||||
|
||||
For multi-shot beats, a segment threshold per visible shot applies in
addition. A good first segment must not drag along a second segment with a
weak score. Segments below `multi_shot_segment_threshold` are not treated as
stable truth but are readjusted within the same plausible source scene.
The readjustment uses a saliency-weighted multi-frame check:
timecodes and static border regions are devalued, while high-contrast image
regions that differ across several trailer frames count more. This allows
a weak second shot to be repaired with better phase accuracy, without hiding the
error behind a black frame or curating a beat manually.
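
As a rough illustration of the saliency weighting described above, the following dependency-free sketch builds a mask from the per-pixel variation across reference frames and uses it to weight a frame difference. It is a simplification under stated assumptions: frames are small 2D lists of grayscale values in [0, 1], and the helper names `saliency_mask` and `masked_difference` are hypothetical. The project code works on NumPy arrays and also mixes in edge maps, which this sketch omits.

```python
from statistics import pstdev


def saliency_mask(frames, keep_quantile=0.66):
    """Weight pixels that change across the reference frames; ignore static ones."""
    height, width = len(frames[0]), len(frames[0][0])
    saliency = [
        [pstdev([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]
    ranked = sorted(value for row in saliency for value in row)
    threshold = ranked[min(len(ranked) - 1, int(keep_quantile * len(ranked)))]
    mask = [
        [1.0 if value >= threshold and value > 0.0 else 0.0 for value in row]
        for row in saliency
    ]
    total = sum(sum(row) for row in mask)
    if total == 0.0:
        # Fully static reference: fall back to uniform weights.
        return [[1.0 / (height * width)] * width for _ in range(height)]
    return [[value / total for value in row] for row in mask]


def masked_difference(a, b, mask):
    """Mean absolute difference of two frames, weighted by the mask."""
    return sum(
        abs(pa - pb) * m
        for row_a, row_b, row_m in zip(a, b, mask)
        for pa, pb, m in zip(row_a, row_b, row_m)
    )


# Only the top-left pixel changes across the three reference frames.
frames = [
    [[0.0, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[1.0, 0.5], [0.5, 0.5]],
]
mask = saliency_mask(frames)
print(mask)  # [[1.0, 0.0], [0.0, 0.0]]
```

Because the mask sums to 1, `1.0 - masked_difference(...)` can be read as a similarity in the same way the pair score in the diff is formed.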
|
||||
|
||||
The cutter report uses clip caching. Existing compare clips are
reused; for targeted rematches, only the affected beat is re-rendered
(`CUTTER_REPORT_FORCE_BEATS`). This keeps the report current without
re-encoding all beats every time.
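
The caching decision can be sketched as follows. This is an assumption-laden illustration: the helper `beats_to_render`, the `clip_dir` layout, and the `beat_NN.mp4` file naming are hypothetical, and how the project actually passes `CUTTER_REPORT_FORCE_BEATS` is not shown in this diff.

```python
from pathlib import Path


def beats_to_render(beat_ids, clip_dir, force_beats=None):
    """Return the beats whose compare clip must be (re-)encoded.

    A beat is rendered when it is explicitly forced or when no cached clip
    exists yet. File naming here is illustrative only.
    """
    forced = set(force_beats or ())
    return [
        beat_id
        for beat_id in beat_ids
        if beat_id in forced or not (Path(clip_dir) / f"beat_{beat_id:02d}.mp4").exists()
    ]
```

With cached clips for beats 1 and 2, `beats_to_render([1, 2, 3], clip_dir, force_beats={2})` would select beats 2 and 3: the forced one and the one that was never rendered.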
|
||||
|
||||
## Vision seeds vs. full scan
|
||||
|
||||
@@ -165,6 +190,56 @@ a short gesture is first recognized correctly and then shifted into a later
similar body posture. If several vision candidates in
the same source scene score similarly well and cover the beat duration,
the matcher prefers the earlier phase.
Vision recovery runs not only for completely missing beats but also
for weak unconfirmed matches. Low-light beats in particular must not stay stuck on a
wrong dark CV match when the cache knows a semantically
better action phase.
For long source scenes, the action-window search always checks the scene start
and several early windows before it samples evenly across the whole scene.
That way, short trailer actions at the start of a long scene are not lost
when the rest of the scene consists of credits, black frames, or quiet
follow-up frames.
If an action window explicitly contains the beat's strong action, it may have a
somewhat lower text similarity; the action then counts more than
secondary words about lighting, framing, or mood.
Already cached action windows of a scene remain valid candidates even
when the current sampling grid changes. The matcher thus loses no
expensive vision hints and does not have to describe the same windows again.
When new vision calls are disabled, recovery may still read existing cache
descriptions; this incurs no API cost and prevents
old weak CV matches from staying in place.
If CV fine-tuning fails on a semantically clear low-light window,
the action window is kept as a provisional match. CV may
refine a dark match but must not completely discard an unambiguous
cache hint.
In addition, recovery can rank existing cached action windows directly across all
scenes. This fast path avoids an expensive full scan when the
cache already contains a strong action such as hand-on-mouth, a kiss, or a change
of gaze.
Unambiguous terms from the beat description act as hard filters for
vision windows: `mouth` must recur in the candidate, `dark interior` must
not land on outdoor material, and distinctive personal features such as `blonde`
remain binding.
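
The hard-filter idea in the last paragraph can be sketched like this. The term lists and the function `passes_hard_filters` are hypothetical and only mirror the three examples given in the text; the project's real vocabulary and matching logic are not part of this diff.

```python
# Illustrative term lists only; they mirror the examples from the text.
REQUIRED_TERMS = ("mouth", "blonde")
CONFLICTING_TERMS = {"dark interior": ("outdoor", "exterior", "daylight")}


def passes_hard_filters(beat_description, window_description):
    """Reject a vision window that contradicts a binding term of the beat."""
    beat = beat_description.lower()
    window = window_description.lower()
    for term in REQUIRED_TERMS:
        # A binding term in the beat must recur in the candidate window.
        if term in beat and term not in window:
            return False
    for term, forbidden in CONFLICTING_TERMS.items():
        # A setting named in the beat excludes windows with the opposite setting.
        if term in beat and any(word in window for word in forbidden):
            return False
    return True


beat = "Blonde woman covers her mouth in a dark interior"
print(passes_hard_filters(beat, "A blonde woman lifts a hand to her mouth in a dim room"))  # True
print(passes_hard_filters(beat, "A blonde woman laughs outdoors, hand at her mouth"))  # False
```

Plain substring checks keep the sketch short; a real filter would likely need word boundaries so that, for example, `mouth` does not match `mouthpiece`.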
|
||||
|
||||
The additional hi-res phase refine stays local around the already validated
in-point and accepts only clear improvements. It must not search an entire long
dialogue scene for similar layouts, because otherwise the same location
with different gestures can win as the wrong phase and the runtime explodes.
The local retune score therefore uses not only the mean frame score
but also the worst single comparison, the first visible frames,
and the frame-to-frame motion. As a result, a later
still frame of the same shot no longer wins just because windows, faces, and lighting
look almost identical.
Uncertain single matches without a segment list also run through this local
phase probe. This repairs old cache entries whose scene is correct but whose
in-point is a few frames off in the motion. The probe stays limited to
small local shifts and is not forced for every confirmed match,
so that report refreshes do not turn into a full scan.
Report clips are additionally clamped to the known source scene start plus a
very short one-frame guard zone, so that an in-point just before or directly on
the cut edge does not begin with frames of the previous shot.
The guard zone is deliberately kept small, because a longer correction would shift the
visible motion phase within the same shot.
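
Two rules from this section lend themselves to a short sketch: the blended retune score and the one-frame guard clamp. The weights below are the ones used by `_phase_probe_segment_in_scene` in this diff; the standalone function names and the `fps` parameter of the clamp are assumptions made for illustration.

```python
def phase_score(vals, motion_vals=None):
    """Blend mean, worst frame, onset, tail, and motion agreement into one score."""
    if not vals:
        raise ValueError("at least one frame score is required")
    avg = sum(vals) / len(vals)
    count = min(2, len(vals))
    early = sum(vals[:count]) / count
    tail = sum(vals[-count:]) / count
    # Without motion samples, fall back to the plain average.
    motion = sum(motion_vals) / len(motion_vals) if motion_vals else avg
    return 0.26 * avg + 0.24 * min(vals) + 0.24 * early + 0.08 * tail + 0.18 * motion


def clamp_to_scene_start(in_point_s, scene_start_s, fps):
    """Keep the in-point at least one frame after the scene's first frame."""
    guard_s = 1.0 / fps
    return max(in_point_s, scene_start_s + guard_s)


steady = [0.9, 0.9, 0.9, 0.9]
bad_onset = [0.6, 1.0, 1.0, 1.0]  # same mean as `steady`, but a weak first frame
print(phase_score(steady) > phase_score(bad_onset))  # True
```

Both lists average 0.9, yet the weak onset scores lower because the minimum and the early frames carry almost half of the weight. For the clamp, an in-point of 9.99 s in a scene starting at 10.0 s would move to 10.04 s at 25 fps, while an in-point well inside the scene is left unchanged.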
|
||||
|
||||
## Multi-shot beats
|
||||
|
||||
@@ -175,6 +250,13 @@ only if the relative source boundary lines up in time with a detected trailer
cut. This lets a beat made of question/answer shots be captured completely
without gluing scenes together arbitrarily.
|
||||
|
||||
## Title and graphics beats
|
||||
|
||||
Dark trailer cards with clearly isolated text are marked as
`GFX` in the cutter report when there is no source match. These beats are not
failed matches: the cutter should take over the trailer graphic or
an NLE title card and not search the feature film for an image.
|
||||
|
||||
## Reranking pipeline
|
||||
|
||||
Vor dem teuren Frame-Refine wird der gesamte Kandidatenpool mit einer
|
||||
@@ -296,3 +378,4 @@ bzw. letzten scorebaren Frame derselben Einstellung passen.
|
||||
|
||||
Matches below `provisional_content_threshold` are no longer saved
or carried over from old cache results.
|
||||
|
||||
|
||||
|