Case study / 2024

Clinical Trial Data Analysis

Statistical analysis and visualisation of clinical trial outcomes in Python, with reproducible notebooks and a clear methodology trail.

3 min read


Outcomes data is messy by design.

Clinical trial datasets sit at the intersection of small sample sizes, strict measurement protocols, and downstream decisions that affect real patients. The analysis has to handle missing values without erasing them, run statistical tests that respect the trial design, and surface effects in a way a non-statistician can act on.
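"Handle missing values without erasing them" can be as simple as recording the missingness per arm before any imputation or exclusion happens. A minimal sketch with pandas, using hypothetical column names (`arm`, `outcome`) and synthetic values, not the project's actual schema:

```python
import pandas as pd
import numpy as np

# Hypothetical trial data: one row per participant, with gaps in the outcome.
df = pd.DataFrame({
    "arm": ["treatment"] * 4 + ["control"] * 4,
    "outcome": [5.1, np.nan, 4.8, 5.3, 4.2, 4.0, np.nan, 4.1],
})

# Record missingness per arm *before* any handling, so the gaps stay visible
# in the methodology trail rather than silently disappearing.
missing_by_arm = (
    df.assign(missing=df["outcome"].isna())
      .groupby("arm")["missing"]
      .sum()
)
print(missing_by_arm.to_dict())  # one missing value in each arm
```

Keeping this table in the notebook output means a reviewer can see exactly how much data was absent, and where, before any rule was applied.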

This project is a methodical walk-through of one such dataset: load it, profile it, test the hypotheses, visualise the effects, and write the conclusion in language the trial sponsor would actually read.

Approach.

The analysis follows the standard sequence and writes each step out in a reproducible Jupyter notebook. Data load and schema audit. Missing-value handling with documented rules. Descriptive statistics across treatment and control arms. Hypothesis tests selected by data shape, not by reflex (parametric where assumptions hold, non-parametric where they do not). Effect size estimation alongside p-values, because p-values without effect sizes are noise.
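The "test selected by data shape" step can be sketched with scipy: check the normality assumption on each arm, branch to a parametric or non-parametric test accordingly, and report an effect size next to the p-value. The data here is synthetic and the 0.05 normality threshold is an illustrative choice, not necessarily the one used in the project:

```python
import numpy as np
from scipy import stats

# Synthetic outcome scores for two arms (illustrative only).
rng = np.random.default_rng(0)
treatment = rng.normal(5.0, 1.0, 30)
control = rng.normal(4.4, 1.0, 30)

# Pick the test by data shape: Shapiro-Wilk on each arm first.
normal = all(stats.shapiro(arm).pvalue > 0.05 for arm in (treatment, control))

if normal:
    stat, p = stats.ttest_ind(treatment, control)      # parametric
else:
    stat, p = stats.mannwhitneyu(treatment, control)   # non-parametric

# Cohen's d: an effect size alongside the p-value, never instead of it.
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd
print(f"p = {p:.4f}, Cohen's d = {d:.2f}")
```

The branch makes the "not by reflex" rule executable: the notebook documents why a given test ran, rather than asserting it after the fact.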

Visualisation is restrained: distribution plots for the raw data, paired comparisons for the primary outcome, confidence-interval plots for the conclusions. Every figure has a caption that explains what it shows and what it does not show.
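A confidence-interval plot of the arm means is the kind of restrained conclusion figure described above. A matplotlib sketch with made-up means and interval widths, standing in for the real estimates:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the notebook exports reproducibly
import matplotlib.pyplot as plt

# Hypothetical arm means with 95% confidence-interval half-widths.
arms = ["control", "treatment"]
means = [4.4, 5.0]
ci_half_widths = [0.35, 0.40]

fig, ax = plt.subplots()
ax.errorbar(arms, means, yerr=ci_half_widths, fmt="o", capsize=4)
ax.set_ylabel("Primary outcome (mean with 95% CI)")
ax.set_title("Arm means with confidence intervals")
fig.savefig("ci_plot.png")
```

A caption for this figure would state what it shows (interval estimates for each arm) and what it does not (individual-level variation, which belongs in the distribution plots).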

What I shipped.

A reproducible analysis notebook that runs end to end on the dataset, a methodology section that explains every choice, a conclusions section written for a non-statistician audience, and a public repository with the data-handling rules documented so the work is auditable.

Lessons.

Most of the value in clinical-trial analysis is in the boring parts: documenting how missing values were handled, picking the right test for the data shape, and pairing every p-value with an effect size. The flashy chart matters less than the methodology trail behind it. Reviewers and sponsors trust an analysis they can re-run, not an analysis they have to take on faith.

→ GitHub repository