Why we built a native MCP server for retros

Kelly Lewandowski
Last updated 19/05/2026 · 7 min read
"Native" doesn't mean "different code"
/api/v1/ endpoints that anyone with a personal access token can hit directly. The same handlers, the same Zod schemas, the same permission checks. If we change a rule in one, the other moves with it.

Decision 1: fewer tools, with toggles instead of pairs
retro_create_reaction and a retro_delete_reaction, the same way our REST routes split create from delete. Two tools per emoji reaction. Multiply that across items and comments and you're spending real tokens on noise before the model has done anything useful.

retro_toggle_reaction is one tool that flips an emoji on or off depending on whether the caller already reacted. It returns "added" or "removed" in the response, so the model can narrate what happened without storing reaction ids it would only need to call delete.

Decision 2: write hints into descriptions, not docs
userId and a kudoType and returned a friendly error if you passed someone who wasn't in the space. Models would consistently invent a user id, hit the error, apologise, and ask the user to paste the right id. Useless.

retro_create_item if none exists). The receiver
must be a member of the retro's space — use organization_list_users
(supports a search filter) to look up the user id by name.

organization_list_users with the name the user said out loud, gets the id, and gives the kudos in one shot. The handler didn't change. The hint did.
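
To make "the handler didn't change, the hint did" concrete, here is a minimal sketch. It is not the production tool definition: the `ToolDefinition` shape, the tool name `retro_give_kudos`, the member ids, and the wording of the "before" description are all assumptions made for illustration. Only `userId`, `kudoType`, and `organization_list_users` come from the post.

```typescript
// Hypothetical tool shape for illustration; the real definitions are not shown in the post.
interface ToolDefinition<I, O> {
  name: string;
  description: string; // what the model reads before deciding how to call the tool
  handler: (input: I) => O;
}

interface KudosInput {
  userId: string;
  kudoType: string;
}

type KudosResult =
  | { ok: true; receiverId: string; kudoType: string }
  | { ok: false; error: string };

// Stand-in for the real space-membership check (ids are invented).
const spaceMembers = new Set(["user_1", "user_2"]);

// One handler, shared by both versions of the tool below.
function giveKudos(input: KudosInput): KudosResult {
  if (!spaceMembers.has(input.userId)) {
    return { ok: false, error: "Receiver is not a member of this retro's space." };
  }
  return { ok: true, receiverId: input.userId, kudoType: input.kudoType };
}

// Before: accurate, but gives the model no way to find a valid id.
const kudosToolBefore: ToolDefinition<KudosInput, KudosResult> = {
  name: "retro_give_kudos", // hypothetical name
  description: "Give kudos to a user.",
  handler: giveKudos,
};

// After: same name, same handler; only the description changes.
const kudosToolAfter: ToolDefinition<KudosInput, KudosResult> = {
  ...kudosToolBefore,
  description:
    "Give kudos to a user. The receiver must be a member of the retro's space; " +
    "use organization_list_users (supports a search filter) to look up the user id by name.",
};
```

The design point is that the second definition costs a sentence of description and no backend work: the error path still exists, the model just stops walking into it.
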
retro_update warns inside the description that deleting a column also deletes every item in it. retro_cast_item_vote mentions the per-board and per-column vote cap explicitly so the model can prompt the user before hitting the 400. Every "you have to know this to call me correctly" goes inside the tool, not in a separate guide nobody reads.

Decision 3: a semantic search tool, not a list-and-filter chain
retro_list, then retro_list_items for each result, then read them all into context. That's a tool-call storm that costs the user money and produces a worse answer than grep would.

search tool that runs semantic search across the whole space at once. It returns retro items, comments, action items, standup answers, poll responses, ice-breaker answers, and notes, ranked by cosine similarity against the query, grouped by type. The model gets the relevant 20 results in one call instead of fanning out across hundreds of records.

Decision 4: per-feature consent at OAuth time
What this means at the retro board
Pre-seed the board
"Open this sprint's retro and add items from the postmortems we wrote in Linear this fortnight, one per incident, in the What Could Improve column." The model creates the items, marks them anonymous where the source ticket was, and stops. Find context from old retros
"Have we talked about CI flakiness before?" Semantic search returns the three retros where it came up and the action items that came out of them, in one call. Turn discussion into action items
"Make action items from the top-voted three items on the board, assign them to the person who wrote each, due Friday." retro_list_itemsthen a fewaction_item_createcalls. The model does the assignment from the item author.Give kudos by name
"Give Priya kudos for unblocking the migration." The model calls organization_list_userswith a search for "Priya," then attaches the kudos.
What we'd do again, what we'd skip
retro_get tool. Models then summarised the summary, and the latency tripled. Boring won. The MCP tool returns the same shape the REST endpoint does. AI on top of that is for the user to opt into via prompt, not for us to bake into the protocol.

/api/v1/ handlers. The interesting work is in the tool definitions: naming, grouping, hints, and which actions get collapsed into a single toggle. The backend logic is shared.

retro_create_item accepts an anonymous flag, and retro_create_item_comment does too. When set, the response omits the author user id, the same way the UI does. Anonymity is enforced at the handler level, not in the description.

retro_update warns that deleting a column also deletes every item in it. We rely on the model surfacing this to the user before calling it, the same way a developer reading docs would notice the warning. There's no separate confirmation step inside the protocol.
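
As an illustration of the anonymity point above (enforced in the handler, not in the description), here is a minimal sketch. It assumes a hypothetical `createItem` handler and invented field names such as `columnId` and `authorUserId`; only the `anonymous` flag and the omitted author id come from the post.

```typescript
interface CreateItemInput {
  columnId: string;
  text: string;
  anonymous?: boolean;
}

interface ItemResponse {
  id: string;
  columnId: string;
  text: string;
  anonymous: boolean;
  authorUserId?: string; // absent when the item is anonymous
}

let nextItemNumber = 1;

// Hypothetical shared handler. Because the author id is dropped here,
// every caller of the handler gets the same response shape,
// whatever the tool description says.
function createItem(callerUserId: string, input: CreateItemInput): ItemResponse {
  const anonymous = input.anonymous === true;
  const base = {
    id: `item_${nextItemNumber++}`,
    columnId: input.columnId,
    text: input.text,
    anonymous,
  };
  // Only attach the author when the caller did not ask for anonymity.
  return anonymous ? base : { ...base, authorUserId: callerUserId };
}

const named = createItem("user_1", { columnId: "col_1", text: "Deploys were smooth" });
const anon = createItem("user_1", { columnId: "col_1", text: "Standups run long", anonymous: true });
```

Putting the rule in the handler means a model that ignores or misreads the description still cannot leak the author.
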