Blog · 3 July 2026 · 5 min read

Dashboards inform, agents act.

Putting a Model Context Protocol endpoint on a churn model, and why the data layer underneath decides whether the agent can be trusted.

Every BI stack I have ever seen ends the same way: a human looks at a chart and decides what to do. All the engineering upstream (the ingestion, the modelling, the testing) exists to serve that one moment of a person reading a screen. The dashboard informs. A person acts. That handoff is where analytics has lived for twenty years.

For my MSc dissertation I built the usual first half of that story. A churn intelligence stack for a live B2B SaaS environment: survival analysis to say when accounts are likely to leave, gradient-boosted classification with SHAP explanations to say how likely and why, and DR-Learner causal inference to say which interventions actually work rather than which ones merely correlate with staying. Underneath it, a warehouse of 48 dbt models in layered Kimball style, ingested by Python extractors and orchestrated with Dagster.

Then, instead of stopping at the dashboard, I put a Model Context Protocol endpoint on top.

What MCP actually changes

MCP is a small open standard that lets an LLM agent call tools you define. My server exposes twelve of them over the intelligence stack: things like churn_at_risk_accounts, churn_score_for_account, and csm_intervention_priority, organised into churn, account, and customer-success domains over a shared database layer.
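To make "tools over a shared database layer" concrete, here is a minimal sketch using only the Python standard library. It is not the MCP SDK and not my actual server: the `tool` decorator, the `TOOLS` registry, `call_tool`, the `churn_scores` table, and the rows in it are all hypothetical stand-ins, and the domain I attach to each tool is a guess made for illustration. Only the two tool names come from the real build.

```python
import sqlite3
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for an MCP server's tool registry (not the real SDK).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(domain: str):
    """Register a function as a named tool and tag it with a domain."""
    def register(fn):
        fn.domain = domain
        TOOLS[fn.__name__] = fn
        return fn
    return register


# Shared database layer: one in-memory connection holding made-up rows.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE churn_scores (account_id TEXT PRIMARY KEY, churn_probability REAL)"
)
db.executemany(
    "INSERT INTO churn_scores VALUES (?, ?)",
    [("acc-1", 0.82), ("acc-2", 0.15), ("acc-3", 0.67)],
)


@tool(domain="churn")
def churn_at_risk_accounts(threshold: float = 0.5) -> List[dict]:
    """Accounts at or above the risk threshold, highest risk first."""
    rows = db.execute(
        "SELECT account_id, churn_probability FROM churn_scores "
        "WHERE churn_probability >= ? ORDER BY churn_probability DESC",
        (threshold,),
    ).fetchall()
    return [{"account_id": a, "churn_probability": p} for a, p in rows]


@tool(domain="account")
def churn_score_for_account(account_id: str) -> Optional[dict]:
    """Score for a single account, or None if the account is unknown."""
    row = db.execute(
        "SELECT account_id, churn_probability FROM churn_scores WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    if row is None:
        return None
    return {"account_id": row[0], "churn_probability": row[1]}


def call_tool(name: str, **arguments: Any) -> Any:
    """Dispatch a tool call by name, the way an agent request would arrive."""
    return TOOLS[name](**arguments)
```

In this sketch, `call_tool("churn_at_risk_accounts", threshold=0.6)` returns acc-1 followed by acc-3. The point of the shape is that every tool is a thin, named query over the same tested tables, so the agent never writes SQL of its own.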

The difference is easiest to see in one exchange. A customer success lead asks: "Which accounts are high risk this quarter, and which of those would actually respond if we intervened?" On a dashboard, that is two filters, a mental join, and an argument about what "respond" means. Through the MCP endpoint, an agent chains two tool calls, cross-references the survival window with the causal treatment group, and answers with the ranked list and the SHAP drivers per account. The person still decides. But the waiting, the filtering, and the mental join are gone.
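The chained call can be sketched roughly as follows. The two tool names are real, but everything they return here is an assumption: the field names, the SHAP driver labels, the uplift values, and the idea that "would respond" means a positive estimated uplift are all invented for illustration.

```python
def churn_at_risk_accounts():
    # Hypothetical tool output: at-risk accounts with their top SHAP drivers.
    return [
        {"account_id": "acc-1", "churn_probability": 0.82,
         "shap_drivers": ["login_drop", "open_tickets"]},
        {"account_id": "acc-3", "churn_probability": 0.67,
         "shap_drivers": ["seat_reduction"]},
        {"account_id": "acc-5", "churn_probability": 0.71,
         "shap_drivers": ["late_invoices"]},
    ]


def csm_intervention_priority():
    # Hypothetical tool output: estimated change in retention probability
    # if a customer success manager intervenes on the account.
    return {"acc-1": 0.05, "acc-3": 0.30, "acc-5": -0.02, "acc-9": 0.40}


def responsive_at_risk(min_uplift=0.0):
    """Join the two tool results and rank by estimated uplift, best first."""
    uplift = csm_intervention_priority()
    joined = [
        {**account, "estimated_uplift": uplift[account["account_id"]]}
        for account in churn_at_risk_accounts()
        # Keep only at-risk accounts whose estimated uplift clears the bar.
        if uplift.get(account["account_id"], 0.0) > min_uplift
    ]
    return sorted(joined, key=lambda row: row["estimated_uplift"], reverse=True)


ranked = responsive_at_risk()
```

With these made-up values, `ranked` holds acc-3 and then acc-1. Account acc-5 is at risk but drops out because its estimated uplift is negative, and acc-9 never appears because it was not at risk in the first place. That is the mental join, done in code.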

That is the whole thesis in one line: dashboards inform, agents act, and the interesting engineering question moves one layer down.

The agent is only as honest as the marts underneath it

Here is the part that gets skipped in most agent demos. An LLM agent will answer from a broken mart with exactly the same confidence as from a correct one. It has no idea your revenue model silently double-counts renewals. A human analyst might squint at a suspicious number; the agent will summarise it fluently and move on.

So the credibility of the whole agentic layer is decided by the least glamorous work in the stack. In this build that meant dimensional models with tests on every layer, contracts on the interfaces the tools query, and correctness gated in CI before anything reaches the serving layer. It is the same discipline I used on the production platform I built at Force24, where 123 dbt tests and 82 pytest tests caught 22 data quality bugs before they shipped. None of that work demos well. All of it is why the agent's answers can be trusted.
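As an illustration of the kind of check that catches the double-counted renewals mentioned earlier, here is a minimal sketch in plain Python and SQLite. The `fct_revenue` table, its columns, and its rows are hypothetical, and this is not one of the actual dbt or pytest tests. It borrows the failing-rows convention from dbt, where a test passes only when its query returns no rows.

```python
import sqlite3


def duplicate_renewals(conn):
    """Return (account_id, renewal_period, n) for renewals counted more than once."""
    return conn.execute(
        """
        SELECT account_id, renewal_period, COUNT(*) AS n
        FROM fct_revenue
        GROUP BY account_id, renewal_period
        HAVING COUNT(*) > 1
        ORDER BY account_id, renewal_period
        """
    ).fetchall()


# Hypothetical mart with one renewal loaded twice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fct_revenue (account_id TEXT, renewal_period TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO fct_revenue VALUES (?, ?, ?)",
    [
        ("acc-1", "2026-Q1", 1000.0),
        ("acc-2", "2026-Q1", 500.0),
        ("acc-2", "2026-Q1", 500.0),  # the silent double count
    ],
)

failures = duplicate_renewals(conn)
reported_revenue = conn.execute("SELECT SUM(amount) FROM fct_revenue").fetchone()[0]

if failures:
    # In CI this would fail the build instead of printing.
    print("blocking deploy, duplicated renewals:", failures)
```

Here the mart reports 2000.0 in revenue when the true figure is 1500.0. An agent querying it would repeat 2000.0 without hesitation; the check flags acc-2 before the number ever reaches a tool.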

If your marts are not tested, an MCP endpoint does not give you an AI analyst. It gives you a very articulate liar.

Boundaries, honestly stated

This was a dissertation build against an NDA-protected environment, so the case study on this site is sanitised. The models are strong but not magic: the survival model cross-validated at a C-index around 0.94, the classifier reached an AUC around 0.95 against a logistic regression baseline of 0.89, and the causal layer corrected a selection bias of roughly 50 percentage points that a naive comparison would have shipped as fact. Those numbers are good because the data layer let them be good.
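For readers who have not met selection bias in this setting, a toy example shows the mechanism. It uses simple stratification by segment, which is a much cruder technique than the DR-Learner in the dissertation, and the counts are synthetic, chosen to be easy to check by hand. They are not the project's data. The assumption baked in is that struggling accounts get most of the interventions, which is what makes the naive comparison misleading.

```python
# Synthetic counts, not real data: (segment, treated, accounts, retained).
COHORTS = [
    ("healthy", False, 10, 9),
    ("healthy", True, 2, 2),
    ("struggling", False, 2, 0),
    ("struggling", True, 10, 4),
]


def retention_rate(rows):
    """Share of accounts retained across the given cohort rows."""
    accounts = sum(n for _, _, n, _ in rows)
    retained = sum(r for _, _, _, r in rows)
    return retained / accounts


def naive_effect(cohorts):
    """Treated minus untreated retention, ignoring who got treated and why."""
    treated = [c for c in cohorts if c[1]]
    untreated = [c for c in cohorts if not c[1]]
    return retention_rate(treated) - retention_rate(untreated)


def stratified_effect(cohorts):
    """Compare within each segment, then weight by segment size."""
    total = sum(c[2] for c in cohorts)
    effect = 0.0
    for segment in sorted({c[0] for c in cohorts}):
        rows = [c for c in cohorts if c[0] == segment]
        weight = sum(c[2] for c in rows) / total
        effect += weight * naive_effect(rows)
    return effect


naive = naive_effect(COHORTS)
adjusted = stratified_effect(COHORTS)
```

On these counts the naive comparison says the intervention lowers retention by about 25 points, because the treated group is mostly struggling accounts. Comparing like with like, it raises retention by about 25 points. The intervention is identical in both cases; only the comparison changed.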

And agents do not replace analysts. They replace the waiting between a question and its answer. Someone still has to decide what a treatment group means, whether an intervention is worth its cost, and when the model is wrong. That someone now spends their time on judgement instead of filters.

The pattern travels

Nothing here is churn-specific. Any intelligence layer with real modelling underneath it (priced risk, demand forecasts, fraud scores) can be exposed the same way: tools over tested marts, explanations attached to every number, and an agent in front. Few teams have shipped this pattern end to end yet. I think in three years it will simply be what "serving layer" means.

Read the full case study →

I am a Manchester-based MSc Data Science graduate (Salford, 2026), available now for junior and mid-level analytics engineering, data engineering, and data science roles across the UK and Germany/EU. If your team is wrestling with the layer between models and decisions, I would like to hear about it.