Selected work

Eleven projects, written out properly.

Each one is a case study, not a screenshot. Pick any.

01

2026 · Force24 · Analytics Engineer, Data Layer

Account Intelligence Platform

Force24 Account Intelligence Platform thumbnail with layered architecture mark

A greenfield Account Intelligence platform built over 16 weeks at Force24 for CSMs, accounts, and stakeholders, shipped with the engineering team into a live production environment. I owned the data layer end to end, integrated it into the FastAPI service through endpoint changes and Redis caching, and shipped features and enhancements across the Angular frontend. Built on the principle that dashboards should drive action, not just display data. Confidential, sanitised case study.

Dagster · dbt · PostgreSQL · Python · Redis · Angular

Read the case study →

02

2026 · Research · Production ML · Agentic AI

Agentic ELT Data Platform for Customer Intelligence

Agentic ELT Data Platform thumbnail showing layered data flow with MCP node

MSc dissertation in a live B2B SaaS environment under NDA. End-to-end JSONB-first ELT platform and three-model churn intelligence stack (survival analysis, XGBoost via PostgresML, and DR-Learner causal inference), surfaced via FastAPI, Angular, and an MCP endpoint for agentic LLM access. Over 1 million records ingested from multiple vendor APIs and modelled in 48 dbt models.

Python · PostgreSQL · dbt · Dagster · PostgresML · XGBoost · MCP

Read the case study →

03

2026 · Side project · Healthcare ML

Pharmaceutical Side Effect Classification

Pharmaceutical Side Effect Classification thumbnail with stacked classification bars

Production-grade Python package classifying free-text adverse-event descriptions into a MedDRA-style taxonomy of ten clinical categories across 11,825 marketed medicines. Single sklearn Pipeline with ColumnTransformer for TF-IDF text features and ordinal manufacturer encoding, joblib-serialized end to end. Pydantic config, CLI entrypoints, pytest fixtures, GitHub Actions CI matrix on Python 3.10, 3.11, 3.12.

Python · scikit-learn · pandas · pytest · GitHub Actions

Read the case study →   View on GitHub →
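To give a feel for the single-Pipeline shape described for project 03, here is a minimal sketch rather than the project's actual code. The column names `description` and `manufacturer`, the toy rows, the two category labels, and the choice of `LogisticRegression` are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder


def build_pipeline() -> Pipeline:
    """One Pipeline: TF-IDF on free text plus an ordinal manufacturer code."""
    features = ColumnTransformer(
        transformers=[
            # A bare string selects a 1-D column, which TfidfVectorizer expects.
            ("text", TfidfVectorizer(), "description"),
            # A list selects a 2-D frame; unseen manufacturers are encoded as -1.
            (
                "manufacturer",
                OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
                ["manufacturer"],
            ),
        ]
    )
    # The classifier is an assumption; the source does not name one.
    return Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])


# Hypothetical toy data, not taken from the real dataset.
frame = pd.DataFrame(
    {
        "description": [
            "nausea and vomiting",
            "skin rash and itching",
            "vomiting after dose",
            "itching red rash",
        ],
        "manufacturer": ["Acme", "Beta", "Acme", "Beta"],
    }
)
labels = ["gastrointestinal", "dermatological", "gastrointestinal", "dermatological"]

model = build_pipeline().fit(frame, labels)

# Serialise the whole fitted pipeline as one artefact, then reload it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

new_rows = pd.DataFrame({"description": ["severe rash"], "manufacturer": ["Gamma"]})
prediction = restored.predict(new_rows)
```

Keeping preprocessing and classifier in one object means the reloaded artefact accepts raw rows directly, so there is no separate vectoriser file to keep in sync.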

04

2023 · MSc Thesis · Sports analytics

Big Data Analytics for Player Recruitment

Big Data Player Scouting thumbnail with football pitch outline

Ranking and recommending football talent across five top European leagues using event data and PlayeRank metrics. Hypothesis-driven research piece exploring how data analytics can support coaches and scouts in player recruitment. Published thesis, public codebase.

Python · PySpark · Jupyter · PlayeRank · UEFA event data

Read the case study →   View on GitHub →

05

2026 · Side project · NLP and media analysis

Conflict Sentiment Analysis

Conflict Sentiment Analysis thumbnail with sentiment polarity wave

A comparative methodology study quantifying how 8,158 English-language news articles framed the Russia-Ukraine conflict across 68 publishers and 18 countries. Three sentiment engines (TextBlob, VADER, CardiffNLP RoBERTa) and a five-topic LDA run over the same corpus, behind an abstract SentimentEngine base class with a registry pattern. The transformer is mocked in CI so tests do not download gigabytes of weights.

Python · HuggingFace · PyTorch · NLTK · gensim LDA · GitHub Actions

Read the case study →   View on GitHub →
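The abstract base class plus registry idea mentioned for project 05 can be sketched roughly as follows. Only the name `SentimentEngine` comes from the description above; `ENGINE_REGISTRY`, `register`, `get_engine`, and both example engines are hypothetical stand-ins, and the tiny word lists are invented for the example.

```python
from abc import ABC, abstractmethod


class SentimentEngine(ABC):
    """Common interface: every engine maps text to a polarity score."""

    @abstractmethod
    def score(self, text: str) -> float:
        """Return a polarity in [-1, 1]."""


# Hypothetical registry mapping a short name to an engine class.
ENGINE_REGISTRY: dict[str, type[SentimentEngine]] = {}


def register(name: str):
    """Class decorator that records an engine under the given name."""

    def decorator(cls: type[SentimentEngine]) -> type[SentimentEngine]:
        ENGINE_REGISTRY[name] = cls
        return cls

    return decorator


@register("lexicon")
class LexiconEngine(SentimentEngine):
    """Toy word-count engine; the word lists are illustrative only."""

    POSITIVE = frozenset({"peace", "aid", "ceasefire"})
    NEGATIVE = frozenset({"attack", "shelling", "crisis"})

    def score(self, text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        hits = sum(w in self.POSITIVE for w in words) - sum(
            w in self.NEGATIVE for w in words
        )
        return hits / len(words)


@register("stub")
class StubEngine(SentimentEngine):
    """Stand-in for a heavyweight model: fixed score, no weights loaded."""

    def score(self, text: str) -> float:
        return 0.0


def get_engine(name: str) -> SentimentEngine:
    """Instantiate a registered engine by name."""
    return ENGINE_REGISTRY[name]()


lexicon_score = get_engine("lexicon").score("ceasefire brings aid")
```

With this shape, a test suite can request the `stub` engine by name wherever the real transformer would otherwise be loaded, which is one way to keep CI free of large downloads.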

06

2024 · Side project · Gaming sector

Collaborative Filtering Recommender at Scale

Steam Games Recommender thumbnail with three-node network graph

PySpark + ALS recommender on the Steam 200k implicit-feedback dataset. Distributed training, hyperparameter tuning via Grid Search with CrossValidator, full experiment tracking and model logging in MLflow. Built and tested on Databricks Community Edition.

PySpark · Spark MLlib · ALS · MLflow · Databricks · Python

Read the case study →   View on GitHub →

07

2026 · Side project · Finance, time series

Equity Forecasting

Equity Forecasting thumbnail with time series line and forecast cone

Reproducible R analysis package for daily equity closing-price forecasting. ARIMA, ETS, and a naive baseline behind a MODEL_REGISTRY pattern. Combined ADF + KPSS stationarity verdict, residual diagnostics (Ljung-Box, Shapiro-Wilk), forecast evaluation against the naive baseline. 5,124 daily observations of NYSE ticker A from 1999 to 2023. testthat suite, lintr config, R-CMD-check CI matrix across multiple R versions.

R · forecast · tseries · testthat · lintr · GitHub Actions

Read the case study →   View on GitHub →
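The combined ADF + KPSS verdict in project 07 rests on the two tests having opposite null hypotheses. The project itself is written in R; the sketch below is a Python illustration of one common way to combine the two p-values. The function name, the 0.05 threshold, and the labels for the two conflicting cases are assumptions rather than the package's actual rules.

```python
def stationarity_verdict(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into a single label.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_says_stationary = adf_p < alpha     # unit-root null rejected
    kpss_says_stationary = kpss_p >= alpha  # stationarity null not rejected

    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    if kpss_says_stationary:
        # ADF cannot reject a unit root, yet KPSS sees no evidence against
        # stationarity: often read as trend-stationary.
        return "trend-stationary"
    # ADF rejects a unit root, yet KPSS rejects stationarity: often read as
    # difference-stationary.
    return "difference-stationary"


verdicts = [
    stationarity_verdict(0.01, 0.10),
    stationarity_verdict(0.60, 0.01),
    stationarity_verdict(0.60, 0.10),
    stationarity_verdict(0.01, 0.01),
]
```

Reporting the two disagreement cases separately, instead of collapsing them into a yes or no, keeps the ambiguity visible when choosing whether to detrend or difference the series.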

08

2024 · Side project · Healthcare sector

Clinical Trial Data Analysis

Clinical Trial Data Analysis thumbnail with ascending bar chart

Statistical analysis and visualisation of clinical trial outcomes in Python. Methodical walk-through: load and profile the data, handle missing values with documented rules, run hypothesis tests selected by data shape, pair every p-value with an effect size.

Python · pandas · NumPy · Matplotlib · Statistics

Read the case study →   View on GitHub →
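Pairing every p-value with an effect size, as described for project 08, might look like the sketch below. It uses a two-sided permutation test and Cohen's d purely as stand-ins, since the description does not say which tests the project runs, and the two groups of measurements are made-up numbers.

```python
import random
import statistics


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardised mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_p_value(
    a: list[float], b: list[float], n_resamples: int = 2000, seed: int = 0
) -> float:
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical outcome measurements for two trial arms.
treatment = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
control = [4.0, 4.4, 3.9, 4.6, 4.1, 4.3]

result = {
    "p_value": permutation_p_value(treatment, control),
    "cohens_d": cohens_d(treatment, control),
}
```

Returning both numbers together makes it hard to report significance without also saying how large the difference is.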

09

2024 · Side project · Aviation sector

Relational DB Design for Airport Ticketing

AirWave Express Ticketing System thumbnail with flight path arc and aircraft

SQL Server schema covering passenger management, flight scheduling, reservations, ticketing, baggage, and ancillary services. Auto-incrementing IDs via sequences and triggers, data integrity enforced via CHECK / NOT NULL / UNIQUE constraints. Production-grade schema design with a detailed project report.

SQL Server · T-SQL · Schema Design · Triggers · Sequences

Read the case study →   View on GitHub →

10

2025 · Side project · Engineering, statistics

Building Energy Loads

Building Energy Loads thumbnail with overlapping building elevation lines

R analysis package predicting heating and cooling loads from eight building geometry parameters across 768 simulated configurations. Stepwise linear regression on each response, with VIF collinearity checks (car), Shapiro-Wilk normality, Breusch-Pagan heteroscedasticity, and the standard residual diagnostic plots. testthat suite with synthetic fixture, R-CMD-check CI.

R · stats::step · car (VIF) · testthat · lintr · GitHub Actions

Read the case study →   View on GitHub →

11

2025 · Side project · BI showcase, macroeconomics

Economic Resilience Dashboard

Economic Resilience Dashboard thumbnail with descending economic indicator bars

Single-screen Power BI report on 20 years of IMF World Economic Outlook data across 26 high-income economies (2001 to 2020). Star-schema model, eight indicators feeding nine DAX measures, a Metric Selector field parameter, and a Python companion validator (pandas + openpyxl + pytest) that checks every dataset descriptor in the README against the actual xlsx contents on every push.

Power BI · DAX · Power Query · Python validator · pytest · GitHub Actions

Read the case study →   View on GitHub →