Case study / 2026

Agentic ELT Data Platform for Customer Intelligence

An MSc dissertation in a live B2B SaaS environment under NDA: a JSONB-first ELT pipeline, a 48-model dbt warehouse, a three-model churn intelligence stack, and a Model Context Protocol endpoint surfacing it all to agentic LLM workflows.

6 min read  ·  Repository: private (academic NDA)

Academic research project under NDA with an industry partner. Architecture, methodology, and headline results described accurately. Client identities, schema details, and code excerpts withheld. GDPR-compliant pseudonymisation throughout.
[Thumbnail: layered data-flow diagram with an MCP node]

In one line.

I make data trustworthy, then I make it predictive. This dissertation is the proof.

The problem.

Off-the-shelf ELT tools and traditional ETL assume stable schemas and well-behaved APIs. In a real B2B SaaS environment, those assumptions break: schema drift, null primary keys, HTTP 429 rate limits, mid-pagination 500s, late-arriving SFTP files. Meanwhile, churn research is dominated by B2C telecoms; B2B SaaS (fewer accounts, higher revenue per account, multi-stakeholder cancellation paths) is largely ignored. I tackled both gaps in a single live deployment, fully owned end to end.

What I built.

1. Custom JSONB-first ELT pipeline.

Ingestion (Python 3.11). Custom extractors per vendor, each handling its own auth, pagination, and failure modes. Per-account isolation so a mid-pagination 500 in one account does not cascade to others. Multi-stage backoff respecting Retry-After. Deterministic _ingestion_id via SHA-256 content hashing for records with null primary keys. SFTP file-level cursors with lookback for late files. 82 pytest tests covering cursor persistence, hashing, and retry logic.
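
The extractor code itself is withheld under NDA, but the two ingestion primitives are simple to sketch. Everything below is illustrative: stable_ingestion_id and fetch_with_backoff are hypothetical names, and the real implementation differs in detail.

```python
import hashlib
import json
import time

import requests


def stable_ingestion_id(record: dict) -> str:
    """Deterministic surrogate key for records with null primary keys:
    SHA-256 over a canonical (sorted-key, compact) JSON serialisation,
    so replaying the same payload always yields the same _ingestion_id."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fetch_with_backoff(session: requests.Session, url: str, max_attempts: int = 5) -> requests.Response:
    """Multi-stage backoff: honour the server's Retry-After on HTTP 429,
    fall back to exponential delay on transient 5xx errors."""
    for attempt in range(max_attempts):
        resp = session.get(url, timeout=30)
        if resp.status_code == 429:
            delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        elif resp.status_code >= 500:
            delay = 2 ** attempt
        else:
            resp.raise_for_status()  # surface non-retryable 4xx loudly
            return resp
        time.sleep(delay)
    raise RuntimeError(f"{url} still failing after {max_attempts} attempts")
```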

Orchestration. Dagster software-defined assets, so a failure in one vendor extraction does not cascade to the others. Scheduled daily ingestion, downstream dbt transformation, and weekly ML scoring.
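
A schematic of the asset graph, assuming Dagster's @asset API; vendor names, the cron expression, and the job wiring are placeholders, not the production definitions.

```python
from dagster import (AssetSelection, Definitions, ScheduleDefinition, asset,
                     define_asset_job)


@asset
def vendor_a_raw() -> None:
    """Extract vendor A into the append-only raw layer. A failure here is
    isolated: vendor_b_raw has no edge to this asset, so it still runs."""


@asset
def vendor_b_raw() -> None:
    """Extract vendor B independently of vendor A."""


@asset(deps=[vendor_a_raw, vendor_b_raw])
def dbt_models() -> None:
    """Downstream dbt build, materialised only after its upstream raw assets."""


daily_ingestion = ScheduleDefinition(
    job=define_asset_job("daily_ingestion", selection=AssetSelection.all()),
    cron_schedule="0 2 * * *",  # daily; weekly ML scoring hangs off a second schedule
)

defs = Definitions(
    assets=[vendor_a_raw, vendor_b_raw, dbt_models],
    schedules=[daily_ingestion],
)
```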

Raw layer. Append-only PostgreSQL 16 with JSONB columns and GIN indexes. 19 tables, over 1 million records ingested, full source fidelity, deterministic replay. GIN-indexed lookups on the largest table under 200ms.
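
The raw-layer shape, reduced to a generic sketch; the schema and table names here are invented, and the real DDL is withheld. The pipeline only ever INSERTs into these tables.

```python
import psycopg  # psycopg 3

DDL = [
    "CREATE SCHEMA IF NOT EXISTS raw",
    """
    CREATE TABLE IF NOT EXISTS raw.vendor_a_events (
        _ingestion_id  TEXT PRIMARY KEY,                 -- SHA-256 content hash
        _ingested_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
        payload        JSONB NOT NULL                    -- full source fidelity
    )
    """,
    # GIN index keeps containment lookups (payload @> '{...}') fast at 1M+ rows
    """
    CREATE INDEX IF NOT EXISTS vendor_a_events_payload_gin
        ON raw.vendor_a_events USING GIN (payload)
    """,
]

with psycopg.connect("postgresql://localhost/warehouse") as conn:
    for statement in DDL:
        conn.execute(statement)
```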

Transformation. dbt Core 1.9, 48 models across staging (20 views), intermediate (7 tables, including an MDM-style cross-source account spine, a recursive-CTE customer hierarchy, and gaps-and-islands engagement streaks), star-schema marts (5), and an ML feature layer (3). Automated tests caught 14 data-quality issues, including UUID-casing conflicts that would have affected up to 100% of cross-source matches if missed.
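
The UUID-casing conflict is worth pausing on because it is so easy to miss. The real fix lives in a dbt staging model (SQL); this Python equivalent just demonstrates the failure mode and the canonicalisation.

```python
import uuid


def normalise_uuid(raw: str) -> str:
    """Cross-source keys arrived in mixed case; round-tripping through
    uuid.UUID canonicalises to lowercase, so the same account joins
    regardless of the source system's casing."""
    return str(uuid.UUID(raw.strip()))


# Without normalisation, these two spellings of one account never match:
a = "6FA459EA-EE8A-3CA4-894E-DB77E160355E"
b = "6fa459ea-ee8a-3ca4-894e-db77e160355e"
assert a != b
assert normalise_uuid(a) == normalise_uuid(b)
```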

2. Three-model churn intelligence stack.

Each model answers a different operational question for the Customer Success team:

  • Survival model: GradientBoostingSurvivalAnalysis (scikit-survival). When will an account churn? Cross-validated C-index 0.9449.
  • Health score: XGBoost via PostgresML (in-database training). How likely is churn? Cross-validated AUC 0.950 vs 0.888 logistic baseline. SHAP attributions surfaced for every prediction.
  • Causal model: DR-Learner (econml). Would CSM outreach actually help this account? The naive correlation between CSM assignment and retention was substantially inflated by reactive-assignment selection bias: CSMs were being assigned to accounts that were already deteriorating. The DR-Learner correction surfaced the honest treatment effect, materially smaller than the naive estimate but defensibly positive (a sketch of this correction follows the list). This is the kind of correction enterprise risk-modelling teams hire for.
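
As promised, a sketch of the causal step using econml's DRLearner. The data here is random noise standing in for the NDA'd features, and the nuisance-model choices are illustrative defaults, not the tuned production models.

```python
import numpy as np
from econml.dr import DRLearner
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LassoCV

# Placeholder data: X = the 29 account features, T = CSM assigned (0/1),
# Y = retained through the outcome window (0/1). Real data withheld under NDA.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 29))
T = rng.integers(0, 2, size=500)
Y = rng.integers(0, 2, size=500)

# Doubly robust: the propensity model absorbs reactive assignment (CSMs
# flowing to already-deteriorating accounts), so the final-stage CATE is
# not inflated by selection bias the way the naive correlation is.
dr = DRLearner(
    model_propensity=GradientBoostingClassifier(),
    model_regression=GradientBoostingRegressor(),
    model_final=LassoCV(),
)
dr.fit(Y, T, X=X)

cate = dr.effect(X)  # per-account effect of outreach on retention
naive = Y[T == 1].mean() - Y[T == 0].mean()
print(f"naive lift: {naive:+.3f}  vs  mean CATE: {cate.mean():+.3f}")
```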

All three models were trained on a strict prospective cohort (an 84-day observation window followed by a 90-day outcome window) to prevent leakage, using 29 features across 8 thematic groups (billing, engagement volume, engagement trajectory, support, integration, cross-level hierarchy aggregations).
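
The leakage guard is mechanical and worth showing; the dates and names here are illustrative.

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 84   # features computed only from this window
OUTCOME_DAYS = 90       # churn label observed only in this window


def cohort_windows(cutoff: date) -> tuple[tuple[date, date], tuple[date, date]]:
    """Strict prospective split: every feature is computed from data at or
    before the cutoff; the label comes strictly after it. Nothing in the
    outcome window can leak into the features."""
    obs = (cutoff - timedelta(days=OBSERVATION_DAYS), cutoff)
    outcome = (cutoff + timedelta(days=1), cutoff + timedelta(days=OUTCOME_DAYS))
    return obs, outcome


obs, outcome = cohort_windows(date(2025, 6, 30))
print(f"features from {obs[0]} to {obs[1]}; label from {outcome[0]} to {outcome[1]}")
```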

3. Serving and agentic interface.

FastAPI service with JWT authentication and PostgreSQL Row-Level Security for per-CSM scoping, so a CSM can only ever see the accounts assigned to them.
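
In outline (policy and names hypothetical; the real JWT handling is omitted), the scoping works like this: the API decodes the token, sets a transaction-local setting, and Postgres RLS does the actual filtering.

```python
# RLS policy applied once in the database (names illustrative):
#   ALTER TABLE marts.account_scores ENABLE ROW LEVEL SECURITY;
#   CREATE POLICY csm_scope ON marts.account_scores
#       USING (csm_id = current_setting('app.csm_id')::uuid);

import psycopg
from fastapi import Depends, FastAPI

app = FastAPI()


def current_csm_id() -> str:
    """Stand-in for the JWT dependency: verify the bearer token and
    return the CSM id from its claims."""
    raise NotImplementedError


@app.get("/accounts/at-risk")
def at_risk_accounts(csm_id: str = Depends(current_csm_id)) -> list[dict]:
    with psycopg.connect("postgresql://localhost/warehouse") as conn:
        # set_config(..., true) is transaction-local; the RLS policy then
        # filters every subsequent query to this CSM's accounts only.
        conn.execute("SELECT set_config('app.csm_id', %s, true)", (csm_id,))
        rows = conn.execute(
            "SELECT account_id, risk_tier FROM marts.account_scores"
        ).fetchall()
    return [{"account_id": a, "risk_tier": t} for a, t in rows]
```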

Angular 21 dashboard with SHAP explanations exposed at the per-account level for the CSM team.

Model Context Protocol (MCP) endpoint so an LLM agent can query account risk, drivers, and intervention recommendations directly. This is the differentiator: agentic workflows over the customer intelligence layer, accessible to any MCP-compatible client.
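
A sketch of the MCP surface, assuming the official Python SDK's FastMCP server; tool names, fields, and the hard-coded values are placeholders for the real mart lookups.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-intelligence")


@mcp.tool()
def account_risk(account_id: str) -> dict:
    """Return the blended risk profile for one account: survival-model
    time-to-churn, health-score probability, and top SHAP drivers.
    (Values hard-coded here; the real service reads the ML marts.)"""
    return {
        "account_id": account_id,
        "predicted_churn_window_days": 45,
        "churn_probability": 0.81,
        "top_drivers": ["login_trend_8w", "open_tickets", "failed_payments"],
    }


@mcp.tool()
def recommend_intervention(account_id: str) -> str:
    """Map the account's CATE estimate to a next-best-action string."""
    return "High estimated uplift from CSM outreach: schedule a call this week."


if __name__ == "__main__":
    mcp.run()  # stdio transport by default; any MCP-compatible client can connect
```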

4. Prioritisation framework.

Combined temporal urgency (survival), risk classification (health score), and intervention responsiveness (CATE from the causal model) into a single tier ranking. Output: a defensible, explainable prioritisation list ready for CSM action, with each recommendation traceable back to its underlying model evidence.
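
Schematically, the tiering folds the three signals like this; the thresholds are invented placeholders, not the calibrated production values.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    days_to_churn: float      # survival model (temporal urgency)
    churn_probability: float  # health score (risk classification)
    cate: float               # causal model (intervention responsiveness)


def priority_tier(s: AccountSignals) -> str:
    """Tier 1 = urgent, high-risk, AND responsive to outreach: the accounts
    where a CSM call is both needed and expected to work."""
    urgent = s.days_to_churn < 60
    at_risk = s.churn_probability > 0.7
    responsive = s.cate > 0.05
    if urgent and at_risk and responsive:
        return "tier-1: intervene now"
    if at_risk and responsive:
        return "tier-2: schedule outreach"
    if at_risk:
        return "tier-3: monitor (low expected uplift)"
    return "tier-4: healthy"


print(priority_tier(AccountSignals(days_to_churn=30, churn_probability=0.85, cate=0.12)))
```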

Architectural decisions worth defending.

JSONB-first raw layer. Schema drift in upstream vendor APIs is a fact of life. Storing raw payloads as JSONB with GIN indexes means I can replay history at any point, audit any record back to its source, and adapt to schema changes without rebuilding the pipeline. Append-only ensures every transformation is reproducible from raw.

Three models, three questions. Survival, classification, and causal inference each answer something different. Most "churn models" only do classification and treat it as if it answers all three. By separating the questions, I produce three different operational outputs the Customer Success team can act on.

Honest causal correction over naive correlation. Most churn-intervention systems report the raw correlation between treatment and outcome and treat it as a treatment effect. That conflates correlation with causation under selection bias. The DR-Learner correction surfaces the honest number, meaningfully positive but materially smaller than the naive estimate. I would rather report the smaller honest number than the inflated naive one. Enterprise risk-modelling teams test for exactly this.

MCP as the next-generation interface. Dashboards inform; agents act. Exposing the customer intelligence layer through MCP means an LLM agent can pull a per-account risk profile, ask why, and recommend the next intervention, all without a CSM context-switching to a dashboard. Few engineers have shipped this pattern end to end in production today. It is the work I want to take into industry at scale.

Outcomes worth quoting.

  • Over 1 million records ingested across multiple vendor APIs with zero silent loss.
  • 48 dbt models, 14 data-quality issues caught and resolved before production.
  • C-index 0.9449 (survival), AUC 0.950 (health score), measurable causal correction surfacing reactive-assignment selection bias.
  • Cursor-safe restarts: full vendor run recoverable mid-flight in under 5 minutes.
  • GDPR-compliant pseudonymisation under NDA throughout.

What this dissertation proves.

End-to-end ownership: API extractors, warehouse modelling, multi-model ML, causal inference, REST and MCP serving, and an analytics dashboard, all designed, built, and tested by me, defended under academic supervision, and deployed against a real production environment. Engineered for actual API failure modes, not idealised ones. Production-minded: tests, RLS, JWT, deterministic IDs, an append-only audit trail, and GDPR-compliant pseudonymisation under NDA.

Stack.

Python 3.11 · PostgreSQL 16 (JSONB + GIN) · Dagster · dbt Core 1.9 · PostgresML · XGBoost · scikit-survival · econml (DR-Learner) · SHAP · FastAPI · Angular 21 · Model Context Protocol (MCP) · pytest

Repository: private (academic NDA)