
Why we built a native MCP server for retros

Kelly Lewandowski


Last updated 19/05/2026, 7 min read

We could have shipped a Kollabe MCP server in an afternoon by running our OpenAPI spec through a code generator. Instead, we spent a couple of weeks designing the tool surface by hand. This is the part of that decision that surprised us most: the work was almost entirely about retros.

Standups and planning poker are mechanical. You submit answers; you cast a vote. A model that can read JSON and write JSON does fine with a generated wrapper. Retros are the opposite shape. They're long, messy, anonymous in parts, voted on, reacted to, grouped, summarised, and they reference people you may not be able to look up by name. The first version we tried, the one that mirrored REST one-to-one, fell over the moment a model tried to give someone kudos.

"Native" doesn't mean "different code"

A native MCP server isn't a separate backend. Ours is a thin layer on top of the same /api/v1/ endpoints that anyone with a personal access token can hit directly. The same handlers, the same Zod schemas, the same permission checks. If we change a rule in one, the other moves with it.

What native does mean is that the tool surface was designed for the thing using it. A REST API is read by a developer with docs open. An MCP server is read by a model in the middle of a conversation, with a thousand tokens of system prompt already spent and a user waiting. The two callers want different things from the same backend. Four differences ended up mattering for retros.
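As a rough illustration of "same handlers, different surface", here is a minimal sketch in which one handler sits behind both a REST-style adapter and an MCP-style adapter. Every name in it (createItem, restCreateItem, mcpCreateItem, the field names) is hypothetical and not taken from Kollabe's code; only the text-content result shape follows the MCP tool result format.

```typescript
// Hypothetical caller and payload types, for illustration only.
type Caller = { userId: string; spaceIds: string[] };
type CreateItemInput = { spaceId: string; retroId: string; columnId: string; text: string };
type Item = CreateItemInput & { id: string; authorId: string };

let nextId = 1;

// The shared handler: validation and permission checks live here,
// so every transport that calls it inherits the same rules.
function createItem(caller: Caller, input: CreateItemInput): Item {
  if (!caller.spaceIds.includes(input.spaceId)) {
    throw new Error("forbidden: caller is not a member of this space");
  }
  if (input.text.trim() === "") {
    throw new Error("invalid: text must not be empty");
  }
  return { ...input, id: `item_${nextId++}`, authorId: caller.userId };
}

// REST-style adapter: parses a JSON body, returns a status and a body.
function restCreateItem(caller: Caller, body: string): { status: number; body: Item } {
  const input = JSON.parse(body) as CreateItemInput;
  return { status: 201, body: createItem(caller, input) };
}

// MCP-style adapter: takes tool arguments, returns text content for the model.
function mcpCreateItem(
  caller: Caller,
  args: CreateItemInput
): { content: { type: "text"; text: string }[] } {
  const item = createItem(caller, args);
  return { content: [{ type: "text", text: JSON.stringify(item) }] };
}
```

Because both adapters call createItem, a change to the membership rule shows up in both without touching either adapter.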

Decision 1: fewer tools, with toggles instead of pairs

A generated server would have given us a retro_create_reaction and a retro_delete_reaction, the same way our REST routes split create from delete. Two tools per emoji reaction. Multiply that across items and comments and you're spending real tokens on noise before the model has done anything useful.

We collapsed every reversible action into a single tool. retro_toggle_reaction is one tool that flips an emoji on or off depending on whether the caller already reacted. It returns "added" or "removed" in the response, so the model can narrate what happened without storing reaction ids whose only use would be a later delete call. The same logic kept item votes as a pair (you can hold multiple votes on the same item, so you genuinely do need an id to remove one) and kept item creation and deletion separate (items aren't reversible state, they're records). Toggle when reversible, keep separate when not.
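A minimal sketch of the toggle pattern, assuming an in-memory set in place of real storage. The function name, key format, and result shape are illustrative assumptions; the post only states that the tool reports "added" or "removed".

```typescript
type ToggleResult = { status: "added" | "removed"; emoji: string };

// In-memory stand-in for reaction storage, keyed by item, user, and emoji.
const reactions = new Set<string>();

// One call flips the reaction: present becomes absent, absent becomes present.
// The caller never has to hold a reaction id to undo it.
function toggleReaction(itemId: string, userId: string, emoji: string): ToggleResult {
  const key = `${itemId}:${userId}:${emoji}`;
  if (reactions.has(key)) {
    reactions.delete(key);
    return { status: "removed", emoji };
  }
  reactions.add(key);
  return { status: "added", emoji };
}
```

Votes would not fit this shape: if a user can hold several votes on one item, the (item, user) pair no longer identifies a single record to flip.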

Decision 2: write hints into descriptions, not docs

The retro kudos tool taught us this. The first version accepted a userId and a kudoType and returned a friendly error if you passed someone who wasn't in the space. Models would consistently invent a user id, hit the error, apologise, and ask the user to paste the right id. Useless. We fixed it by rewriting the tool description rather than the handler:
Give kudos to another user, attached to an existing retro item (create one first with retro_create_item if none exists). The receiver must be a member of the retro's space — use organization_list_users (supports a search filter) to look up the user id by name.
Same code, same error, but now the model reads the description, calls organization_list_users with the name the user said out loud, gets the id, and gives the kudos in one shot. The handler didn't change. The hint did.

We started doing this everywhere. retro_update warns inside the description that deleting a column also deletes every item in it. retro_cast_item_vote mentions the per-board and per-column vote cap explicitly so the model can prompt the user before hitting the 400. Every "you have to know this to call me correctly" goes inside the tool, not in a separate guide nobody reads.
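The kudos fix can be sketched as a tool definition plus the two calls the description steers the model towards. This is an assumption-heavy illustration: the tool name retro_give_kudos, the itemId parameter, the sample users, and the "teamwork" kudo type are all hypothetical, and the description text is lightly re-punctuated from the one quoted above. Only the name / description / inputSchema layout follows the MCP tool definition format.

```typescript
// Shape of an MCP tool definition: a name, a description the model reads,
// and a JSON Schema describing the arguments.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

const giveKudosTool: ToolDefinition = {
  name: "retro_give_kudos", // hypothetical name; the post does not state it
  description:
    "Give kudos to another user, attached to an existing retro item " +
    "(create one first with retro_create_item if none exists). " +
    "The receiver must be a member of the retro's space. " +
    "Use organization_list_users (supports a search filter) to look up the user id by name.",
  inputSchema: {
    type: "object",
    properties: {
      itemId: { type: "string" }, // assumed parameter
      userId: { type: "string", description: "Id from organization_list_users" },
      kudoType: { type: "string" },
    },
    required: ["itemId", "userId", "kudoType"],
  },
};

// Made-up sample data standing in for the members of a space.
type User = { id: string; name: string };
const spaceMembers: User[] = [
  { id: "usr_101", name: "Priya Raman" },
  { id: "usr_102", name: "Tom Okafor" },
];

// Stand-in for organization_list_users with its search filter.
function listUsers(search: string): User[] {
  const needle = search.toLowerCase();
  return spaceMembers.filter((u) => u.name.toLowerCase().includes(needle));
}

// Stand-in for the kudos handler: it still rejects non-members,
// exactly as it did before the description was rewritten.
function giveKudos(args: { itemId: string; userId: string; kudoType: string }): string {
  const receiver = spaceMembers.find((u) => u.id === args.userId);
  if (!receiver) {
    throw new Error("receiver is not a member of this retro's space");
  }
  return `${args.kudoType} kudos given to ${receiver.name} on ${args.itemId}`;
}
```

A model that follows the hint calls listUsers("Priya") first and passes the returned id to giveKudos; a model that invents an id still gets the same error as before.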

Decision 3: a semantic search tool, not a list-and-filter chain

Retros accumulate. A team that runs one every two weeks has 26 boards a year, with maybe 800 items across them. When someone asks an assistant "what did we say about flaky deploys last quarter," the worst possible answer is for the model to call retro_list, then retro_list_items for each result, then read them all into context. That's a tool-call storm that costs the user money and produces a worse answer than grep would.

So we built a search tool that runs semantic search across the whole space at once. It returns retro items, comments, action items, standup answers, poll responses, ice-breaker answers, and notes, ranked by cosine similarity against the query, grouped by type. The model gets the relevant 20 results in one call instead of fanning out across hundreds of records.
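The ranking step described here can be sketched as follows, assuming embeddings already exist for the query and for every record. How Kollabe produces or stores embeddings is not covered in the post, so the types, the function names, and the three record types shown are illustrative assumptions; the cosine similarity itself is the standard formula.

```typescript
type RecordType = "retro_item" | "comment" | "action_item";
type Indexed = { id: string; type: RecordType; text: string; embedding: number[] };
type Hit = { id: string; text: string; score: number };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  const n = Math.max(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every record, keep the best `limit` overall, then group by type.
// Within each group, hits stay in descending score order.
function semanticSearch(
  query: number[],
  records: Indexed[],
  limit: number
): Partial<Record<RecordType, Hit[]>> {
  const best = records
    .map((r) => ({ r, score: cosineSimilarity(query, r.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);

  const grouped: Partial<Record<RecordType, Hit[]>> = {};
  for (const { r, score } of best) {
    const bucket = grouped[r.type] ?? [];
    bucket.push({ id: r.id, text: r.text, score });
    grouped[r.type] = bucket;
  }
  return grouped;
}
```

A linear scan like this is only meant to show the ranking and grouping; a production search would more likely sit on a vector index.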

Decision 4: per-feature consent at OAuth time

The Kollabe MCP server has roughly forty tools across retros, standups, planning poker, action items, and search. Asking a user to consent to all forty in one screen is the kind of decision people click through without reading.

We split the consent screen by category. When you connect Kollabe to Claude or Cursor, you tick the categories you want (retros, standups, planning poker, action items, search) and the token is scoped to those. A team that only wants their PM to draft retro action items through Claude doesn't have to grant standup or poker access. Revoking from the user settings page kills the token, whether they consented to one category or all of them.

The token also names a specific Kollabe organization. If you belong to several orgs, you pick which one on the consent screen, and the token acts as you, in that org only. Switch orgs, you reconnect.
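A minimal sketch of what a category-and-org scoped token check could look like at tool-call time. The token fields, the tool-to-category map, and the authorize function are assumptions made for illustration; the post describes the behaviour (category scoping, single org, revocation) but not the data model.

```typescript
type Category = "retros" | "standups" | "planning_poker" | "action_items" | "search";

// Hypothetical token record: one user, one org, a set of consented categories.
type Token = { userId: string; orgId: string; categories: Category[]; revoked: boolean };

// Hypothetical map from tool name to the consent category that covers it.
const toolCategory: Record<string, Category> = {
  retro_list_items: "retros",
  retro_toggle_reaction: "retros",
  action_item_create: "action_items",
};

type Decision = { ok: true } | { ok: false; reason: string };

// Checks run in order: revocation, org match, then category consent.
function authorize(token: Token, tool: string, orgId: string): Decision {
  if (token.revoked) {
    return { ok: false, reason: "token revoked" };
  }
  if (token.orgId !== orgId) {
    return { ok: false, reason: "token is scoped to a different organization" };
  }
  const category = toolCategory[tool];
  if (category === undefined) {
    return { ok: false, reason: "unknown tool" };
  }
  if (!token.categories.includes(category)) {
    return { ok: false, reason: `no consent for ${category}` };
  }
  return { ok: true };
}
```

In this sketch, the PM example from above would need both the retros and action items categories ticked: reading the board falls under one, creating the action items under the other.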

What this means at the retro board

A user who has connected Kollabe to their AI client can now have a conversation that looks like this, with the model doing the legwork:
  1. Pre-seed the board
    "Open this sprint's retro and add items from the postmortems we wrote in Linear this fortnight, one per incident, in the What Could Improve column." The model creates the items, marks them anonymous where the source ticket was, and stops.
  2. Find context from old retros
    "Have we talked about CI flakiness before?" Semantic search returns the three retros where it came up and the action items that came out of them, in one call.
  3. Turn discussion into action items
    "Make action items from the top-voted three items on the board, assign them to the person who wrote each, due Friday." retro_list_items then a few action_item_create calls. The model does the assignment from the item author.
  4. Give kudos by name
    "Give Priya kudos for unblocking the migration." The model calls organization_list_users with a search for "Priya," then attaches the kudos.
None of this is a new feature. Every one of those calls maps to a REST endpoint that's been live for months. The MCP layer is the difference between "the API exists" and "the model can use the API."
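Step 3 above is the most mechanical of the four, so it makes a reasonable worked example. The sketch below shows the selection a model would perform between one retro_list_items call and the action_item_create calls that follow. The field names on both types are assumptions, and anonymous items (which would have no author to assign to) are left out for brevity.

```typescript
// Assumed shape of an item as returned by retro_list_items.
type RetroItem = { id: string; text: string; authorId: string; votes: number };

// Assumed shape of the arguments passed to action_item_create.
type ActionItemInput = {
  title: string;
  assigneeId: string;
  dueDate: string; // ISO date, e.g. the coming Friday
  sourceItemId: string;
};

// Pick the `count` top-voted items and turn each into an action item
// assigned to its author. The input array is copied before sorting,
// and ties keep their original board order because sort is stable.
function draftActionItems(items: RetroItem[], count: number, dueDate: string): ActionItemInput[] {
  return [...items]
    .sort((a, b) => b.votes - a.votes)
    .slice(0, count)
    .map((item) => ({
      title: item.text,
      assigneeId: item.authorId,
      dueDate,
      sourceItemId: item.id,
    }));
}
```

Each element of the returned array would become the arguments of one action_item_create call.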

What we'd do again, what we'd skip

We'd build the toggle pattern from day one. We'd write the cross-referenced hints into descriptions from day one. We'd ship semantic search before any individual list endpoint, because that's the tool the model actually wants.

The thing we'd skip is the temptation to make MCP "richer" than REST. We tried, briefly, baking auto-summaries into the retro_get tool. Models then summarised the summary, and the latency tripled. Boring won. The MCP tool returns the same shape the REST endpoint does. AI on top of that is for the user to opt into via prompt, not for us to bake into the protocol.

If you want to try this on your own retros, the setup guide is sixty seconds of OAuth. The deeper background on the protocol is in our MCP explainer. And if you'd rather see AI fit a retro before wiring up a model, our retrospective template generator is a good no-commitment start.

The MCP server is not a separate backend. It is a thin adapter over the same /api/v1/ handlers. The interesting work is in the tool definitions: naming, grouping, hints, and which actions get collapsed into a single toggle. The backend logic is shared.

Anonymous items and comments are supported over MCP. retro_create_item accepts an anonymous flag, and retro_create_item_comment does too. When set, the response omits the author user id, the same way the UI does. Anonymity is enforced at the handler level, not in the description.

The tool description for retro_update warns that deleting a column also deletes every item in it. We rely on the model surfacing this to the user before calling it, the same way a developer reading docs would notice the warning. There's no separate confirmation step inside the protocol.

Connecting an AI client does not grant access to everything. Consent is split by category at OAuth time. You can grant retro access alone, or any combination of retros, standups, planning poker, action items, and search. Tokens are revocable from your Kollabe settings.