Five revisions of start-ticket: how a Claude Code skill learned to start work
Every ticket at dala.care starts the same way. Pull up Linear, read the description, check out a branch, grep around the codebase, maybe ask someone what the business rules are, figure out an approach, update the ticket, start coding.
I did this manually for months and it mostly worked, except when it didn’t. Steps would get skipped in ways I wouldn’t notice until later. I’d explore the codebase before checking domain knowledge and miss a business rule that changed the whole approach. I’d start designing before scoping and end up with something too large for one PR. I’d update Linear after I’d already started building, so the ticket would quietly drift from what I was actually doing.
To solve this I ended up writing a Claude Code skill called start-ticket to lock the sequence down. It’s gone through five revisions since, each one fixing a different class of problem I hadn’t anticipated.
V1: the checklist
The first version had six steps: fetch the ticket from Linear, explore the codebase, evaluate scope, brainstorm the design, update Linear, transition to implementation. It mostly worked, but three problems only became obvious through use.
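In skill form, the whole V1 body amounted to roughly this (a paraphrase, not the literal file):

```markdown
## Steps

1. Fetch the ticket from Linear
2. Explore the codebase: search for related files
3. Evaluate scope: one PR or several?
4. Brainstorm the design
5. Update Linear (during or after brainstorming)
6. Transition to implementation
```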
Step 2 was just “search for related files.” No domain knowledge lookup, no vault query, nothing about business rules. Grep around and see what you find. That was fine when I was the only one using it because I had most of the domain context in my head already.
Step 6 was a lie. “Transition to implementation” wasn’t a real step. It was a side effect of step 4. The brainstorming skill would finish and auto-transition into writing plans. I listed it as its own step because it felt like one, but the skill didn’t do anything there.
The Linear update timing was vague. The skill said “during or after brainstorming,” which in practice meant it happened whenever the agent got around to it, or not at all. When brainstorming auto-transitioned to implementation, the agent had already moved on and the Linear update got skipped entirely.
This version lasted about two weeks. It was too loose where it mattered (the Linear update, the phantom step) and too rigid where it didn’t. A one-file bug fix still had to go through the full brainstorming flow, which just added time for no reason.
V2: hard gates and guard rails
The first real restructuring came when I tried sharing the skill with the team. The V1 problems were things I could work around because I knew what the skill was supposed to do. Other people couldn’t.
I collapsed steps 5 and 6 into a single hard gate. The agent has to update Linear and get explicit user approval before it can move to implementation. No more “whenever the agent gets around to it.” The gate is a wall, not a suggestion.
I added guard rails at the top. “Don’t use this for spikes without a ticket, branches not linked to Linear, or mid-ticket work where context is already loaded.” The skill kept getting invoked in situations it wasn’t built for, so I told it where it doesn’t apply.
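In the skill, the guard rails and the gate read roughly like this (wording approximate):

```markdown
## When NOT to use this skill

- Spikes without a ticket
- Branches not linked to Linear
- Mid-ticket work where context is already loaded

## Update Linear (HARD GATE)

Update the ticket with the agreed approach, then stop. Do not
transition to implementation until the user explicitly approves.
```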
And I had to deal with a plan mode conflict. Claude Code has a built-in plan mode, and start-ticket has its own planning flow (brainstorming into writing-plans). If both were active, the skill’s flow got cut short because plan mode tried to take over. Now the skill exits plan mode on startup if it detects it.
The domain knowledge gap was still there though. Step 2 was still just codebase exploration. If you didn’t already know the business rules for the area you were touching, the skill didn’t help you find them.
V3: domain knowledge
This is the version where start-ticket got a memory. We have a knowledge vault for the dala.care codebase, an Obsidian-based system where architectural decisions and business rules live as markdown notes. The vault existed but start-ticket didn’t touch it.
I added a record-discoveries step after brainstorming. When the agent finishes designing an approach it spawns a domain expert agent to capture anything new from the session. Business rules that got clarified, architectural patterns that weren’t documented, domain concepts that would help whoever works in that area next.
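The step itself is short (paraphrased):

```markdown
## Record discoveries

After brainstorming, spawn the dala-expert agent to capture anything
new from this session: business rules that got clarified, architectural
patterns that weren't documented, domain concepts that would help
whoever works in this area next.
```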
I also added disable-model-invocation: true to the skill frontmatter. Without that flag Claude Code can invoke the skill automatically when it thinks the context matches. Sounds convenient. It’s not. The skill has a specific entry point (you’ve checked out a branch, you have a ticket) and auto-invocation would fire in the middle of unrelated work because the agent saw a branch name that looked like a ticket number.
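The frontmatter now looks something like this (the description text here is illustrative):

```markdown
---
name: start-ticket
description: Structured workflow for starting work on a Linear ticket
disable-model-invocation: true
---
```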
Discovery capture was the bigger change though. Before this, knowledge left the session when the session ended. I’d figure out that municipality agreements have a specific lifecycle with constraints that aren’t obvious from the code, and two weeks later someone else would rediscover the same thing from scratch. Now that context persists.
The vault lookup was still gap-triggered. The skill told the agent “assess whether domain context is needed” and let it decide. If the agent didn’t know enough to know what it didn’t know, it would skip the lookup entirely.
V4: proactive lookups
Small change, big difference. I replaced “assess whether domain context is needed” with “always query the knowledge vault for the ticket’s domain area before proceeding to design.”
The gap-triggered approach had a blind spot that’s obvious in hindsight. The agent decides whether to look things up based on what it already knows. If it doesn’t know about a constraint, it doesn’t know to look for it. A ticket about visit type categories looks straightforward from the code. There’s a table, there’s a CRUD API. But the vault has a note saying visit type changes require municipality agreement validation because billing rates are tied to visit types. Nothing in the code or the ticket mentions billing, so the agent would never search for it on its own.
Making the lookup unconditional costs maybe 10 seconds per ticket. Missing a business rule that changes the whole approach costs an hour of rework when it surfaces during code review.
I also changed the dala-expert prompt guidance here. The old version was something like “what context do I need?”, which is too vague to be useful. The new version tells the agent to ask specific questions based on what codebase exploration left unanswered. “I found the agreement service at this path but I need to understand the valid state transitions” gets better results than “tell me about agreements.”
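In the skill, that guidance now reads something like this, with <path> standing in for whatever file exploration actually turned up:

```markdown
When querying dala-expert, ask specific questions based on what
codebase exploration left unanswered.

Good: "I found the agreement service at <path> but I need to
understand the valid state transitions."
Bad: "Tell me about agreements."
```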
V5: stop editing ticket descriptions
This one came from a code review comment. The skill was appending implementation plans to the Linear ticket description under an UPDATE section. Seemed reasonable: the plan is related to the ticket, so put it where the ticket is.
The problem is that ticket descriptions are the original intent. Product wrote them. They have context about why something needs to happen, screenshots, links to customer conversations. When the agent appends an implementation plan to that, the original intent gets buried under technical details. Six months later someone looks at the ticket and sees a stale implementation plan before the reason the ticket exists.
The fix was simple: post implementation plans as comments instead. The description stays clean. The plan is still attached to the ticket. Comments have timestamps so the chronology is obvious.
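The gate step, revised (roughly):

```markdown
## Update Linear (HARD GATE)

Post the implementation plan as a comment on the ticket. Never edit
the ticket description; it holds the original product intent.
```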
One of those changes where the previous behavior wasn’t obviously wrong until someone pointed it out. The skill was doing what I told it to do. I just told it wrong.
What the revisions have in common
Each version fixed a different kind of problem but they share a pattern.
V1 was ambiguity. The skill said what to do but not when or how strictly, so the agent filled in the gaps with its own judgment. That judgment was unreliable.
V2 was boundaries. The skill didn’t know where it applied and transitions between steps had no enforcement.
V3 was lifecycle. Knowledge wasn’t persisting between sessions, and the skill was activating when it shouldn’t.
V4 was epistemology, if that’s not too fancy a word for it. The agent couldn’t find what it didn’t know to look for. Changing the default from “check if you need context” to “always get context” removed a decision the agent was bad at making.
V5 was data ownership. Implementation details were overwriting product intent.
None of these were obvious before they happened. Each one surfaced through use, either me hitting the problem or a teammate flagging it in review. The skill got better by running it on real tickets and paying attention to where it broke, not through upfront design.
That’s probably the most useful thing I can say about building workflow skills: the first version will be wrong. Ship it anyway, use it daily, fix what breaks. Five revisions in three weeks got me to something I trust. Trying to design it perfectly upfront would have taken longer and still missed all five problems.
Where it is now
Seven steps: fetch the ticket from Linear, explore the codebase and query the knowledge vault, read team config, evaluate scope, design via design-gate, record discoveries back to the vault, update Linear (hard gate) and begin implementation.
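As a skill outline (step names paraphrased):

```markdown
## Steps

1. Fetch the ticket from Linear
2. Explore the codebase and query the knowledge vault (always)
3. Read team config
4. Evaluate scope; flag tickets too large for one PR
5. Design the approach via design-gate
6. Record discoveries back to the vault via dala-expert
7. Update Linear (HARD GATE: explicit user approval), then begin
   implementation
```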
The whole sequence takes about two minutes on a medium-sized ticket. Most of that is codebase exploration and the vault lookup. Actual overhead compared to doing it by hand is maybe 30 seconds, but the steps happen in the right order every time and nothing gets skipped.
The skill composes with other skills rather than doing everything itself. Design-gate handles brainstorming. Impl-plan handles task decomposition and TDD rhythm. Dala-expert, a specialized agent, handles vault reads and writes. Start-ticket is really just the orchestrator that calls them in sequence and enforces gates between phases. I’ll get into how that composition works, and why there are two versions of the skill (personal and team), in a separate post.
The one thing I’d still change is scope evaluation. It checks whether the ticket is too large and suggests splitting, but it doesn’t have a good sense of what “too large” means for our codebase. It flags things as too big that are actually fine, and I override it often enough that the step is losing credibility. It needs better calibration data, probably examples of tickets that were the right size versus ones that should have been split. That data will only accumulate through real use, so for now it’s a known imperfection that I work around.
Next up: the other end of the workflow. Ship-it automates the last mile: formatting, lints, code review, PR creation. Turns out the hard part isn’t the automation; it’s reviewing code you didn’t write.
Part of a series on agentic development tooling. See also: Harness engineering: the model is just the horse