Vibe Check

am i any good at this?

Anyone can hold down a camera shutter. Doesn't make them a photographer. Anyone can vibe code. Doesn't make them a software engineer. This page is an honest, self-graded report card of how much my vibe coding actually holds up — and what I'm doing about the gaps.

Vibe Score

decent

↑ +22 pts

over 9 weeks


How 63 gets calculated

Average of the three theme scores below. Each theme = average of its sub-metrics (3 or 4 per theme). Each metric scored 0–100 (higher = better) against a clear threshold (e.g. 0% fix commits = 100, 80% fix commits = 0).

🚨 71 · 🤡 67 · 🧹 52

(71 + 67 + 52) ÷ 3 ≈ 63
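The arithmetic above can be reproduced from the sub-metric scores listed further down this page. This is only a sketch: it assumes each theme is rounded to a whole number before the final average (the page doesn't say when rounding happens), and the dictionary keys are labels I made up for illustration.

```python
from statistics import mean

# Sub-metric scores as shown on this page (0-100, higher = better).
# The keys are illustrative labels, not names used by the real aggregator.
THEMES = {
    "does_it_work": [100, 11, 100, 73],
    "do_i_know_what_im_doing": [100, 100, 0],
    "did_i_leave_a_mess": [25, 80, 50],
}


def theme_score(metrics):
    """A theme is the rounded average of its sub-metric scores."""
    return round(mean(metrics))


def vibe_score(themes):
    """The Vibe Score is the rounded average of the theme scores."""
    return round(mean(theme_score(metrics) for metrics in themes.values()))


if __name__ == "__main__":
    for name, metrics in THEMES.items():
        print(name, theme_score(metrics))
    print("vibe score", vibe_score(THEMES))
```

Averaging the unrounded theme values instead lands on the same whole number here, so the rounding assumption doesn't change the headline score.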

What I'm working on this month

Stop the 6-hour loops

Targeting Longest debug spiral (0/100, lowest sub-score).

→ Worst loop was 6.0h in Interior-Design. When 3 hours in and still looping, stop and restart with a written goal.
→ Set a personal 2-hour timer on hard debugging sessions before walking away.

Refreshed every Sunday. Next score in this metric tells the story.

Improver activity (last 30 days)

The improver runs every Monday at 9am and opens a draft PR targeting the weakest metric. First run: next Monday. Check back for the loop closing.

Autonomous engine — opens draft PRs only, never auto-merges.

Habits I'm trying to break

Extracted weekly from my own session transcripts by vibe-coach. The lessons get written into my global CLAUDE.md so the next session enforces them automatically.

→ Break sessions at the 2-hour mark
Long sessions ≠ productive sessions. Force a commit and re-orient at 2h.
Evidence: 5 of 8 recent sessions hit the 6-hour cap · 1 week in flight
→ Grep before re-implementing
Check if the utility already exists before writing it. Habit lives in CLAUDE.md so the next session enforces it.
Evidence: Almost wrote safe_msg twice in one session · 1 week in flight
→ End research sessions with a written handoff
Any research session ≥1h ends with a one-page note. Otherwise the next session restarts the search from zero.
Evidence: One 6-hour session produced zero artifacts · 1 week in flight
→ Audit scheduled-task sessions that hit the 6h cap with no work
Cron tasks running to the cap with zero Edits/Writes are likely stuck, not productive. Verify their lastRunAt completion vs the cap timeout.
Evidence: 4 cron sessions (reddit-pulse-health-check, claude-reddit-pulse, calmar-bug-fixer, resume trigger) hit 6h cap with 0 Edits/Writes · 1 week in flight
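The last habit, auditing capped cron sessions, comes down to a simple filter. A minimal sketch, assuming each session has already been reduced to a duration plus counts of Edit and Write tool calls; the `Session` record and its field names are hypothetical, not the real session-log format.

```python
from dataclasses import dataclass

CAP_HOURS = 6.0  # the session cap mentioned on this page


@dataclass
class Session:
    # Hypothetical summary record for one session.
    name: str
    hours: float
    edits: int
    writes: int


def likely_stuck(sessions, cap_hours=CAP_HOURS):
    """Return sessions that ran to the cap without a single Edit or Write."""
    return [
        s for s in sessions
        if s.hours >= cap_hours and s.edits + s.writes == 0
    ]
```

A session that hits the cap but did make edits stays off the list on purpose: long and productive is a different problem from long and stuck.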

Going well, keep doing it

✓ Dry-runs caught real bugs before prod (awk truncation in vibe-improver)
✓ Zero hotfix sequences in the last 7 days (vs 69 in the 30-day baseline)

9 sessions analyzed · refreshed Saturdays

How well is Claude executing my vibe-coding?

Three signals: how often I have to course-correct, how well Claude follows my custom rules, and who's driving the habit fixes.

Override rate

How often I have to course-correct Claude per user message. Lower = Claude predicting my intent better.

2.7% this week

↓ -0.5 pts

over 6 weeks

⚠ Detects course-correction language ("no", "actually", "wait", "that's wrong"). Noisy in weeks where I'm refining requirements vs catching mistakes — won't fully separate productive corrections from Claude failures.
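Here is roughly what phrase-based detection like this looks like. It's a sketch under assumptions: the pattern contains only the four phrases quoted above, and it counts a message once however many phrases it contains. The real detector may differ on both points.

```python
import re

# Only the phrases quoted on this page; a real list is probably longer.
CORRECTION_PATTERN = re.compile(
    r"\b(no|actually|wait|that's wrong)\b", re.IGNORECASE
)


def override_rate(user_messages):
    """Percentage of user messages containing course-correction language."""
    if not user_messages:
        return 0.0
    hits = sum(1 for msg in user_messages if CORRECTION_PATTERN.search(msg))
    return 100 * hits / len(user_messages)
```

The word boundaries keep "know" and "not" from matching "no", but a friendly "no problem" still counts as an override, which is exactly the kind of noise the warning describes.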

CLAUDE.md compliance

Sampled session: 3270008a. Each rule scored against actual session behavior.

90 · Scope discipline (boring vs ambitious framing before >2 moving parts): Framed boring vs ambitious 4 times this session (privacy fixes, vibe-improver, vibe-coach, claude-quality). One miss on page UI changes.
60 · Constraint check before coding: Verified gh auth + local clones before vibe-improver run. Skipped pre-checks on aggregator + page changes.
100 · Deploy verification (confirmed live URL after every push): Every push followed by curl + grep verification in background. Zero "declared done before verified" instances.
80 · Don't punt work back to Hannah: Mostly self-sufficient. One legitimate UI handoff ("hit Run now in the sidebar") that I couldn't drive myself.

Who's driving the habit fixes?

Each habit in flight gets tagged with who first surfaced it. claude_proactive = Claude flagged the issue in the moment. hannah_corrected = Hannah noticed and pushed back. tool_caught = an automated check (linter, test, vibe-improver) caught it before review.

Claude proactive

Hannah corrected

Tool caught

🚨 Does it actually work?

the prod reality check

Average of 4 metrics below. Weakest: Live site latency (11/100).

Broken in prod

hotfixes within 24h of the commit they broke

100/100

Live site latency

avg time to first byte across live sites

1798ms

11/100

Live site response times (slowest first)

https://www.muse.shopping · 200 · 2903ms
https://kindle.schlacter.me · 200 · 1445ms
https://schlacter.me · 200 · 1046ms
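A rough Python equivalent of the curl check, for anyone who wants to reproduce these numbers. Assumptions: time to first byte is approximated as the time until the first body byte is read, redirects are followed (so their cost is included), and the function names are my own. The real measurement uses curl and may time things differently.

```python
import time
import urllib.request
from statistics import mean


def time_to_first_byte_ms(url, timeout=10.0):
    """Approximate TTFB: request start until the first body byte arrives."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)
        elapsed_ms = (time.perf_counter() - start) * 1000
        return response.status, elapsed_ms


def summarize(timings_ms):
    """Return the rounded average and the sites ranked slowest first."""
    ranked = sorted(timings_ms.items(), key=lambda item: item[1], reverse=True)
    return round(mean(timings_ms.values())), ranked
```

Feeding the three timings listed above into `summarize` gives the 1798ms average shown for this metric.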

Mean time to fix

median hours bugs lived before patched

n/a

100/100

Scheduled task health

scheduled tasks firing on time (stale = >2× expected period)

3 stale

73/100

Stale scheduled tasks

vibe-coach-weekly 3.3d stale (3.3× expected)
last ran 2026-07-18 · expected every 1d
vercel-deploy-fixer 0.6d stale (3.8× expected)
last ran 2026-07-21 · expected every 0.2d
managed-agents-pulse 0.6d stale (7.7× expected)
last ran 2026-07-21 · expected every 0.1d
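The staleness rule (more than 2× the expected period since the last run) is easy to sketch. The function name and signature are hypothetical; only the 2× threshold comes from this page.

```python
from datetime import datetime, timedelta

STALE_MULTIPLE = 2  # stale = more than 2x the expected period


def staleness(last_ran, expected_period, now):
    """Return (ratio, is_stale), where ratio = elapsed / expected period."""
    ratio = (now - last_ran) / expected_period
    return ratio, ratio > STALE_MULTIPLE


if __name__ == "__main__":
    now = datetime(2026, 7, 21, 12, 0)
    ratio, stale = staleness(now - timedelta(days=3.3), timedelta(days=1), now)
    print(f"{ratio:.1f}x expected, stale={stale}")
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside, keeps the check reproducible when re-running it against an old snapshot.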

🤡 Do I know what I'm doing?

the panic index

Average of 3 metrics below. Weakest: Longest debug spiral (0/100).

Fix-to-feature ratio

share of commits that start with 'fix'

100/100
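For this metric the thresholds are spelled out near the top of the page (0 fix commits = 100, 80% fix commits = 0). What happens in between isn't stated, so this sketch assumes straight linear interpolation, clamped at both ends. The function names are hypothetical.

```python
def linear_score(value, best, worst):
    """Map value onto 0-100: `best` scores 100, `worst` scores 0.

    Assumes linear interpolation; values beyond either threshold are clamped.
    """
    fraction = (value - worst) / (best - worst)
    return round(100 * min(1.0, max(0.0, fraction)))


def fix_ratio(commit_subjects):
    """Share of commit subjects that start with 'fix' (case-insensitive)."""
    if not commit_subjects:
        return 0.0
    fixes = sum(1 for s in commit_subjects if s.lower().startswith("fix"))
    return fixes / len(commit_subjects)
```

With these thresholds, a repo where 40% of commits are fixes would score 50, and anything at or past 80% bottoms out at 0.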

Revert / oops count

reverts and 'oops' commits

100/100

Longest debug spiral

longest single debug spiral, capped 6h

6.0h

0/100

Longest single sessions, capped at 6h

6h in Interior-Design · session c83448a2 · 2026-07-18
6h in Interior-Design · session 97450b3e · 2026-06-30
6h in Interior-Design · session 9418f1ca · 2026-06-20
6h in -Users-hannahschlacter · session 7a9dc381 · 2026-07-13
6h in -Users-hannahschlacter · session a6f3d26f · 2026-07-05

🧹 Did I leave a mess?

the tech debt tax

Average of 3 metrics below. Weakest: Test coverage (25/100).

Test coverage

repos with any test file at all

1/4

25/100

Repos with zero tests

claude-code-insights-dashboard
managed-agents-pulse
twitch-community-research

TODOs left in code

TODOs, FIXMEs, HACKs across the codebase

80/100

Sample TODOs left in code

muse-shopping/frontend/app/onboarding/start/page.tsx:76
// TODO: Send to backend to actually follow these curators
muse-shopping/frontend/scripts/auto-resolver.js:93
// TODO: Integrate with notification system (email, Slack, PagerDuty, etc.)
claude-code-insights-dashboard/insight-detector.py:22
data["suggestions"] = [] # TODO v2: actual pattern detection
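Counting these markers is a one-regex job. A minimal sketch, assuming a case-sensitive match on whole words and one count per line; how the count turns into the 80/100 score isn't stated on this page, so that part is left out.

```python
import re

# Uppercase markers only, matched as whole words.
MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b")


def count_markers(lines):
    """Number of lines that contain at least one TODO, FIXME, or HACK."""
    return sum(1 for line in lines if MARKER.search(line))
```

Keeping the match case-sensitive avoids flagging ordinary identifiers like `todos` or `hackathon`.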

Secret protection

secret-y patterns gitignored before they leaked

50/100

Repos missing secret protection

claude-code-insights-dashboard — no secret patterns in .gitignore
twitch-community-research — no .gitignore
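The two failure messages above suggest a check along these lines. This is a sketch, not the actual scanner: the list of secret-like patterns is hypothetical, and the substring match is deliberately crude.

```python
from pathlib import Path

# Hypothetical list of secret-like .gitignore entries.
SECRET_PATTERNS = (".env", "*.pem", "*.key", "secrets")


def secret_protection(repo_dir):
    """Classify a repo by whether its .gitignore mentions secret-like files."""
    gitignore = Path(repo_dir) / ".gitignore"
    if not gitignore.is_file():
        return "no .gitignore"
    entries = [
        line.strip()
        for line in gitignore.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    for entry in entries:
        if any(pattern in entry for pattern in SECRET_PATTERNS):
            return "protected"
    return "no secret patterns in .gitignore"
```

This only shows that a pattern is listed, not that no secret was ever committed; a history scan would be a separate check.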

methodology

Every metric is computed from data I can't fudge: my own commit history (hbschlac/*, public repos only), my Claude Code session logs, and direct curl hits to live sites. No API keys, no third parties.

Each metric is scored 0–100 (higher = better). Theme score is the average of its metrics. Vibe Score is the average of theme scores. Window: rolling 90 days. 4 repos scanned. Sparkline shows weekly Vibe Score history.

Refreshed weekly · last run Jul 21, 2026 · biggest lever: Longest debug spiral