Problem.
How news framing differs across publishers, countries, and topics is a methodologically interesting question with active academic literature behind it. The harder version of the question is whether a single sentiment engine is enough, or whether the answer changes depending on which engine you trust.
This project quantifies how 8,158 English-language news articles framed the Russia-Ukraine conflict across 68 publishers and 18 countries, and tests three sentiment engines plus a topic model on the same corpus to surface where they agree and where they diverge. The work is descriptive of media framing, not of the underlying events.
My contribution.
Solo, end to end:
- Built the corpus ingestion and cleaning pipeline: URL stripping, alphabet-only normalisation, lowercasing, regex tokenisation, stopword removal, and WordNet lemmatisation across both verb and noun parts of speech (a minimal sketch follows this list).
- Wired three complementary sentiment engines on the same corpus behind an abstract `SentimentEngine` base class with a registry pattern. Adding a fourth engine is one class plus one registry entry (see the registry sketch after this list).
- Built a five-topic gensim LDA model over the cleaned corpus to surface what each article is talking about, not just how it talks about it. Pinned `random_state` for reproducibility (sketch below).
- Cross-tabulated sentiment by publisher, country, and topic. The transformer's signed sentiment score is the primary cross-publisher metric; the lexicon engines provide an interpretable check.
- Wrote the pytest suite with the transformer engine fully mocked so CI does not download gigabytes of weights on every run. Lazy load on real use, mocked load in tests (see the test sketch after this list).
- Set up the GitHub Actions matrix CI on Python 3.10, 3.11, and 3.12 with NLTK resource caching and ruff and black gating.
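A minimal sketch of the cleaning steps, assuming NLTK's standard stopword and WordNet resources; the function and regex names are illustrative, not the project's actual API. The double lemmatisation pass (verb, then noun) is the detail the first bullet refers to.

```python
import re

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[a-z]+")  # alphabet-only normalisation
STOPWORDS = set(stopwords.words("english"))  # needs NLTK's "stopwords" resource
LEMMATIZER = WordNetLemmatizer()             # needs NLTK's "wordnet" resource

def preprocess(text: str) -> list[str]:
    """One article body in, lemmatised content tokens out."""
    text = URL_RE.sub(" ", text).lower()
    tokens = (t for t in TOKEN_RE.findall(text) if t not in STOPWORDS)
    # Verb pass then noun pass, so "shelled" and "attacks" both reduce
    # to their base forms regardless of part of speech.
    return [
        LEMMATIZER.lemmatize(LEMMATIZER.lemmatize(t, pos="v"), pos="n")
        for t in tokens
    ]
```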
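The engine abstraction in miniature. `SentimentEngine` is the name from the bullet above; the decorator and method names are illustrative assumptions about the shape, not the repository's exact code.

```python
from abc import ABC, abstractmethod

ENGINES: dict[str, type["SentimentEngine"]] = {}

def register(name: str):
    """Class decorator: file the engine under `name` in the registry."""
    def wrap(cls):
        ENGINES[name] = cls
        return cls
    return wrap

class SentimentEngine(ABC):
    @abstractmethod
    def score(self, text: str) -> float:
        """Signed sentiment in [-1, 1]."""

@register("vader")
class VaderEngine(SentimentEngine):
    def __init__(self):
        from nltk.sentiment import SentimentIntensityAnalyzer  # needs "vader_lexicon"
        self._sia = SentimentIntensityAnalyzer()

    def score(self, text: str) -> float:
        return self._sia.polarity_scores(text)["compound"]

# A fourth engine really is one class plus one @register line.
```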
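The topic-model side, sketched with gensim's standard API. The filter thresholds and pass count are illustrative; the pinned `random_state` is the point.

```python
from gensim import corpora
from gensim.models import LdaModel

def fit_lda(token_lists: list[list[str]], num_topics: int = 5) -> LdaModel:
    """Five-topic LDA over the preprocessed corpus."""
    dictionary = corpora.Dictionary(token_lists)
    dictionary.filter_extremes(no_below=5, no_above=0.5)  # illustrative thresholds
    bow = [dictionary.doc2bow(doc) for doc in token_lists]
    return LdaModel(
        bow,
        id2word=dictionary,
        num_topics=num_topics,
        passes=10,          # illustrative
        random_state=42,    # pinned: same corpus, same topics, every run
    )
```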
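And the CI trick: the engine defers the HuggingFace download until first use, so a test can patch the loader before it ever fires. The class below is a stand-in showing the pattern, not the real engine's code.

```python
from unittest.mock import MagicMock, patch

class TransformerEngine:
    """Stand-in showing the lazy-load pattern from the bullet above."""
    _pipe = None

    def _load(self):
        if self._pipe is None:
            from transformers import pipeline  # deferred until first real call
            self._pipe = pipeline(
                "sentiment-analysis",
                model="cardiffnlp/twitter-roberta-base-sentiment-latest",
            )
        return self._pipe

    def label(self, text: str) -> str:
        return self._load()(text)[0]["label"]

def test_label_without_downloading_weights():
    fake = MagicMock(return_value=[{"label": "negative", "score": 0.91}])
    with patch("transformers.pipeline", return_value=fake):
        assert TransformerEngine().label("shelling intensified") == "negative"
```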
Methodology and the multi-engine question.
The reason the project runs three engines is that lexicon-based sentiment (TextBlob, VADER) and transformer-based sentiment (RoBERTa) sometimes disagree, and the disagreement itself is informative.
- TextBlob (lexical baseline). Pattern-lexicon polarity, with an optional Naive Bayes analyzer trained on movie reviews. Cheap and interpretable, but its prior is misaligned with conflict language.
- VADER (social and news lexicon). Handles intensifiers, negation, and the punctuation patterns common in headlines.
- CardiffNLP RoBERTa (`cardiffnlp/twitter-roberta-base-sentiment-latest`). Contextual transformer fine-tuned on roughly 124M tweets. Captures interactions that lexicons miss, and is the only engine producing calibrated three-class probabilities. Signed confidence (positive = +conf, negative = -conf, neutral = 0) is used for cross-publisher aggregation.
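That mapping is one line; a sketch assuming the pipeline's usual `{label, score}` output (the helper name is mine):

```python
def signed_score(label: str, confidence: float) -> float:
    """Collapse a three-class prediction into one signed scalar:
    positive -> +confidence, negative -> -confidence, neutral -> 0.0."""
    return {"positive": 1.0, "negative": -1.0, "neutral": 0.0}[label] * confidence
```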
When all three agree, the signal is strong. When the transformer says negative and the lexicon says neutral, you typically have hedged or sarcastic language that the lexicon misses. When the lexicon says strongly positive and the transformer says mildly negative, you typically have stylistic positivity that the lexicon scores up but the transformer reads in context.
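Surfacing those disagreement cells is a contingency-table exercise. A pandas sketch, with column names assumed rather than taken from the repo:

```python
import pandas as pd

df = pd.read_csv("sentiment_results.csv")  # column names below are assumptions

# 3x3 table: where do VADER and the transformer part ways, and how often?
print(pd.crosstab(df["vader_label"], df["roberta_label"], normalize="all"))

# The hedged/sarcastic suspects: transformer negative, lexicon neutral.
suspects = df[(df["roberta_label"] == "negative") & (df["vader_label"] == "neutral")]
```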
The five-topic LDA gives the secondary axis. Sentiment on the same conflict varies sharply by topic in ways the headline number hides.
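Making that cut is one merge and one pivot, assuming an `article_id` join key and the column names below (both illustrative):

```python
import pandas as pd

scores = pd.read_csv("sentiment_results.csv")
topics = pd.read_csv("topic_assignments.csv")

# Mean signed transformer score per (publisher, topic) cell: exactly the
# spread that a single per-publisher headline number flattens away.
table = scores.merge(topics, on="article_id").pivot_table(
    values="roberta_signed", index="publisher", columns="topic", aggfunc="mean"
)
```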
Architecture.
```
configs/default.yaml
          |
          v
data/raw/*.xls -> loader -> cleaner -> preprocessor
                  (URL strip, lowercase, tokenize, stopword, lemmatize)
          |
          +---------------------+---------------------+
          |                     |                     |
          v                     v                     v
   TextBlobEngine          VaderEngine     TransformerEngine (RoBERTa)
          |                     |                     |
          +----------> sentiment_results.csv <--------+
                                |
                +---------------+---------------+
                v                               v
     LDATopicModel (gensim)         analysis (publisher, country,
                |                             temporal, wordfreq)
                v                               |
     topic_assignments.csv                      v
     topic_summaries.json               reports/figures
```
Ethical considerations.
This is methodological research, not commentary. The framing question (how publishers cover a conflict) is well-established in media studies. The contribution is the multi-engine triangulation. No conclusions are drawn about which publisher is right, only about how their framings differ.
The English-only corpus excludes Russian, Ukrainian, Arabic, and Mandarin reporting, and the transformer's training distribution skews Western, so non-Western voices may be systematically misclassified. Aggregates are relative comparisons, not ground-truth severity. Cross-reference with independent sources before drawing policy or operational conclusions.
Stack.
Python · HuggingFace Transformers · PyTorch · NLTK · gensim · pandas · pytest · ruff · black · GitHub Actions