1. Action-group classifier conflated object-touches and person-touches.
"man touches the red door with a small object" was tagged as
forehead_touch because "touch" was in the forehead_touch needle set.
As a result, the realign pass moved Beat 16 from scene 451 (correct: man
painting red door, IV stand) to scene 623 (woman/man in bed), a
completely wrong shot at score 0.344.
Fix: removed generic "touch*" verbs from forehead_touch's needle set.
forehead_touch is now added in _semantic_action_groups() only when a
touch verb is paired with an explicit body-part target (forehead,
face, cheek, head, hand, ...) and not paired with an object target
(door, handle, brush, tool, lock, ...).
Effect on Beat 16 after `match --beat 16 --vision`:
scene 623 in=5476.28 score=0.344 -> scene 451 in=3912.48 score=0.626.
2. Cutter-report stills/clips were cached by source-video mtime, so a
match-position change without a video change served stale frames from
the previous match. Dropped the mtime cache; both extractors now
render fresh every time. Slower (about a minute per full regen) but
correct.
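The pairing rule from item 1 can be sketched roughly as below. This is a
minimal illustration, not the code in _semantic_action_groups(): the
function name wants_forehead_touch and both word lists are hypothetical,
filled in only from the examples named above, and it matches whole words
only (so "handle" does not count as "hand").

```python
import re

# Hypothetical word lists, taken only from the examples in this message;
# the real needle sets are assumed to be longer.
BODY_PARTS = {"forehead", "face", "cheek", "head", "hand"}
OBJECT_TARGETS = {"door", "handle", "brush", "tool", "lock"}


def wants_forehead_touch(description: str) -> bool:
    """Return True only for a touch verb aimed at a body part, not an object."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    # Any inflection of "touch" counts as a touch verb (touches, touched, ...).
    has_touch_verb = any(word.startswith("touch") for word in words)
    has_body_part = bool(words & BODY_PARTS)
    has_object = bool(words & OBJECT_TARGETS)
    return has_touch_verb and has_body_part and not has_object
```

With this rule the Beat 16 description no longer qualifies, because "door"
is an object target, while a description such as "woman touches his
forehead" still does.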
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
For segmented beats, the repair stage now searches for the source action
window using the segment's own description first; the full beat context is
used only as a fallback or when it scores noticeably higher. The trailer-
offset shift is applied only when the beat context is actually chosen.
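A rough sketch of that selection order, under stated assumptions: the
Candidate type, the choose_window name, the 0.05 margin standing in for
"noticeably higher", and the additive treatment of the trailer offset are
all hypothetical and not taken from the repair stage itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    source: str   # "segment" or "beat"
    start: float  # window start in the source video, in seconds
    score: float  # match score, higher is better


def choose_window(
    segment: Optional[Candidate],
    beat: Optional[Candidate],
    trailer_offset: float,
    margin: float = 0.05,  # assumed threshold for "noticeably higher"
) -> Optional[Candidate]:
    """Prefer the segment-description match; fall back to beat context."""
    if segment is None and beat is None:
        return None
    if segment is not None and (beat is None or beat.score <= segment.score + margin):
        # Segment match wins: no trailer-offset shift is applied.
        return segment
    # Beat context chosen (fallback or clearly better): shift by the offset.
    return Candidate("beat", beat.start + trailer_offset, beat.score)
```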
Also harden vision-call retries to catch read-side network errors
(TimeoutError, socket.timeout, ConnectionError, OSError) and wrap the
filter/repair loop so a transient vision failure preserves the previously
cached match instead of dropping it.
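The retry-and-preserve behaviour might look roughly like this. It is a
sketch only: call_with_retries, repair_or_keep, the attempt count, and the
backoff are hypothetical names and values, not the actual helpers. On
current Python versions TimeoutError and ConnectionError are subclasses of
OSError, so the tuple is redundant but spells out the intent.

```python
import socket
import time

# Read-side network errors treated as transient.
RETRYABLE = (TimeoutError, socket.timeout, ConnectionError, OSError)


def call_with_retries(fn, attempts=3, delay=0.0):
    """Call fn(), retrying on transient network errors with simple backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE as exc:
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(delay * (2 ** attempt))
    raise last_error


def repair_or_keep(cached_match, repair_fn, attempts=3, delay=0.0):
    """Run the filter/repair step; keep the cached match if vision keeps failing."""
    try:
        return call_with_retries(repair_fn, attempts, delay)
    except RETRYABLE:
        # Transient failure: preserve the previous result instead of dropping it.
        return cached_match
```

Errors outside the tuple (for example a ValueError from bad input) still
propagate, so genuine bugs are not masked by the fallback.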
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>