agentic-ai claude-code developer-tooling

Ship-it: a Claude Code skill for the last mile

· 6 min read

Every time I ship code, the steps are the same: check formatting, run lints, run tests, sync translations if needed, get a code review, create the PR, update the ticket. The sequence changes depending on whether the change is frontend, backend, or both, but the shape is the same.

I wrote about start-ticket in the last post, the skill that kicks off work on a Linear ticket. Ship-it is the other end. It takes a completed feature branch through quality gates, code review, and PR creation. Same idea: encode the sequence so nothing gets skipped.

The work involved was pretty different, though. Start-ticket is mostly read-only: it gathers context and plans. Ship-it has to make decisions about the code itself, and most of that code is agent-generated, which means I'm reviewing work I didn't write line by line. That changes the problem. For bigger changes (five-plus files, a couple hundred lines of diff) I couldn't just run a code review agent and trust the output. I needed to walk through the changes myself before anyone else saw them.

So ship-it grew a guided walkthrough step, a stale-branch check, auto-formatting before review so the reviewer isn't drowning in whitespace noise, and approval gates at every point where the skill is about to do something visible to the team. It's a heavier skill than start-ticket, and it took more iterations to get right.

Scope detection

The first thing ship-it does is read the diff and figure out what kind of change it is. Our monorepo has a React frontend and Go backend with completely different toolchains. Running the full frontend pipeline on a backend-only change wastes minutes and produces noise.

The case I actually built this for was fullstack changes. Before ship-it, a change that touched both sides meant one massive PR that nobody wanted to review. We learned the hard way that review noise compounds fast. Now ship-it detects both scopes in the diff and splits the branch into two PRs: backend first targeting main, frontend targeting the backend branch. If there’s no child ticket in Linear for the frontend half, it creates one. That split used to be 15 minutes of cherry-picking and branch management.
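The split reduces to a small planning step. This is a hypothetical sketch, not ship-it's actual implementation, and the branch-naming convention is invented for illustration; the point is that the frontend PR's base is the backend branch, so each reviewer sees only one side of the change:

```python
# Sketch of the stacked-PR plan for a fullstack change (illustrative
# branch names). Backend targets main; frontend stacks on the backend
# branch so its diff only shows frontend changes.
def plan_stacked_prs(ticket, scopes, base="main"):
    plan = []
    if "backend" in scopes:
        plan.append({"branch": f"{ticket}-backend", "base": base})
        base = f"{ticket}-backend"  # frontend stacks on top of backend
    if "frontend" in scopes:
        plan.append({"branch": f"{ticket}-frontend", "base": base})
    return plan
```

Once the backend PR merges, the frontend PR retargets main automatically on GitHub, so the stack unwinds on its own.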

There’s also a fast path: if a PR already exists for the branch, ship-it skips creation entirely and just runs quality gates then pushes. I hit that path more than the full flow because I push incremental updates to open PRs constantly.

Quality gates

Early versions ran code review first. Bad idea. The reviewer would flag formatting issues that the formatter was about to fix anyway.

Now formatting runs first and auto-fixes. The skill applies the fix, shows me the diff, and moves on if I approve. Lint, typecheck, and tests come next as hard stops. If those fail I need to look at them. Code review runs last so the reviewer only sees real issues.

Translation sync is conditional. It checks whether i18n files are in the diff and only runs the Tolgee push/pull cycle if they are. I’d forgotten translations enough times that automating the check was worth it.
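The ordering boils down to a small planner. A minimal sketch, with gate names and the i18n path check invented for illustration rather than taken from ship-it's actual code:

```python
# Sketch of the gate ordering: formatting first (auto-fix), then the
# hard stops, then conditional translation sync, code review last so
# the reviewer only sees real issues.
def plan_gates(diff_paths):
    gates = ["format"]                      # auto-fix, show me the diff
    gates += ["lint", "typecheck", "test"]  # hard stops on failure
    if any("i18n" in p for p in diff_paths):
        gates.append("translation-sync")    # Tolgee push/pull cycle
    gates.append("code-review")             # after the noise is gone
    return gates
```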

There’s a stale-branch check before any of this runs too. If main has moved ahead, the GitHub diff inflates with changes that aren’t mine. I found this out when a code review agent flagged a “critical regression” in middleware that turned out to be someone else’s already-merged commit in the stale diff. Ship-it now merges the base branch before running gates so the reviewer only sees my actual changes.

The guided walkthrough

For small changes ship-it goes straight from quality gates to code review. For anything substantial (five-plus files or a couple hundred lines) it offers a guided walkthrough first.

This came from a problem with reviewing agent-generated code. When the agent builds a feature I’m looking at a diff I didn’t write line by line. It’s easy to skim past something that looks right but isn’t, especially when the change touches multiple files and the logic spans across them. The Microsoft/CMU study found that people stop scrutinizing AI output over time. I’ve caught myself doing exactly that.

The walkthrough chunks the diff by logical area and steps through each one, waiting for me to approve or flag issues before moving on. If I find something worth fixing, the skill fixes it and re-runs the relevant quality gates before continuing. Only after I’ve walked the whole thing does it hand off to the code review agent.
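The chunking itself is mechanical. As mentioned below, the current granularity is per file, so a sketch just splits the unified diff on its `diff --git` headers (this is an illustrative standalone function; ship-it is a Claude Code skill, not this script):

```python
# Split a unified diff into per-file chunks on 'diff --git' headers.
# The walkthrough pauses on each chunk for approval before moving on.
def chunk_diff_by_file(unified_diff):
    chunks, current = [], []
    for line in unified_diff.splitlines():
        if line.startswith("diff --git") and current:
            chunks.append("\n".join(current))  # close the previous file
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```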

Adds maybe five minutes to a big PR. Cheaper than finding the problem in a teammate’s review, or in production.

What surprised me

Scope detection turned out simpler than I expected. If the diff touches files in apps/web/ or libs/lyng/, it’s frontend. services/ or *.go, backend. Path matching covers 95% of cases. The remaining 5% (shared config, CI files) defaults to both scopes, which is the safe path. I kept expecting to need something smarter. Never came up.
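The whole detector fits in a few lines. A minimal sketch of the path-matching rules above, with the unmatched case defaulting to both scopes (the function name is mine; the path prefixes are the real ones from the post):

```python
# Scope detection by path prefix: frontend paths, backend paths, and
# everything else (shared config, CI files) defaults to both scopes.
FRONTEND_PREFIXES = ("apps/web/", "libs/lyng/")
BACKEND_PREFIXES = ("services/",)

def detect_scopes(changed_paths):
    scopes = set()
    for path in changed_paths:
        if path.startswith(FRONTEND_PREFIXES):
            scopes.add("frontend")
        elif path.startswith(BACKEND_PREFIXES) or path.endswith(".go"):
            scopes.add("backend")
        else:
            scopes.update(("frontend", "backend"))  # the safe path
    return scopes
```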

The PR approval gate was an afterthought that turned out to be the most important step in the whole skill. Ship-it writes the title and description, and it’s usually right. But “usually” isn’t good enough for something my team reads. The gate costs three seconds. It’s caught bad titles, wrong base branches, descriptions that missed the point of the change. I almost shipped the skill without it because it felt like unnecessary friction. Would have regretted that.

The thing I didn’t expect was how much of the value comes from ordering, not from any individual step. I was already running lints, already doing code review, already creating PRs. The skill’s contribution is doing them in the right sequence and not letting me skip one when I’m in a hurry. Agents are most useful when they cover for the parts you’re weakest at, and for me that turned out to be self-discipline: doing things the right way when I’m three hours into a feature and just want it merged.

What’s next

Ship-it still has rough edges. The domain knowledge capture step (spawning dala-expert after the PR to record what was learned) works, but it’s slow enough that I skip it on small tickets. I tried moving it to a background agent to avoid blocking. Background agents in Claude Code can’t prompt for bash permissions though, so the gh API calls silently fail. For now it stays synchronous.

The guided walkthrough could be smarter about what it shows me. Right now it chunks by file, so a rename that touches 12 files gets 12 approval prompts. Grouping by logical change instead of by file would fix that, but I haven’t found a reliable way to cluster diffs by intent yet.

The bigger question is how you iterate on skills like this without disrupting the team. Ship-it exists in two places: a team version in the repo that goes through code review, and a personal version in my dotfiles where I experiment freely. Changes I test in the personal copy need to get ported back through a PR, and they diverge silently if I forget. I’ve been bitten by that more than once. How to structure skills so they can be developed, tested, and shared without breaking things for everyone else is its own problem, and that’s what the next post is about.


Next up: how I structure skills so they compose instead of competing for context. 109 skills down to 25, and why that made everything work better.

Part of a series on agentic development tooling. See also: Harness engineering: the model is just the horse, Five revisions of start-ticket