# Predicting mental health risk through social media

Canonical HTML: https://dsti.school/techblog/predicting-mental-health-risk-social-media

This Markdown copy is generated from the same DSTI static-site build as the canonical HTML page. It is intended for machine readability and concise retrieval.

Alumni research in action

The difficult target is not the post already on screen. It is the risk level of the next post — one that has not yet been written. DSTI alumnus Jannic Alexander Cutura explains the two-stage LLM and temporal-modelling approach that took first place in the IEEE Big Data Cup 2025.

By Jannic Alexander Cutura · DSTI alumnus · Research Fellow & Lecturer · Natural language processing and data engineering

30 Mar 2026 · 12 min read · IEEE Big Data Cup 2025 · 1st place

Tags: mental-health-ai, ordinal-classification, temporal-modelling, LLMs, NLP, responsible-ai

## Five observed posts. One unseen target.

Timeline: five observed posts, P1 (t−5) to P5 (t−1), followed by the future target at time t, a post that has not yet been written.

**The modelling question:** Can semantic understanding of individual posts be combined with a lightweight model of how risk changes over time?

- **1st** place, IEEE Big Data Cup 2025
- **7,000+** labelled post sequences
- **395** unique Reddit users
- **4** ordered risk levels

Mental-health language can change over time. The IEEE Big Data Cup 2025 asked participants to work with anonymised Reddit data and predict the ordinal suicide-risk level of a user’s next post from the five preceding posts and their timestamps. Jannic’s solution combined prompt-based large-language-model classification with simple temporal aggregation. It won first place and a US$1,000 prize.

> **Subject matter and scope:** This article discusses research on suicide-risk prediction. The models described produce statistical estimates, not clinical diagnoses, and the work explicitly requires privacy protection, qualified human oversight and careful evaluation before any operational use.

This is not ordinary text classification. The target text is absent. The system must infer a future ordinal state from a short, irregular sequence of earlier observations.

## 01 The challenge: forecast the next state

The IEEE International Conference on Big Data brings together research on large-scale data processing, machine learning and real-world applications. Its 2025 Big Data Cup, sponsored by Hong Kong Polytechnic University, focused on suicide-risk prediction from social-media posts.

For each sequence, the model receives a user’s five most recent posts and their timestamps. It must predict the risk level of a sixth post that has not yet been written. That distinction matters. Classifying visible language asks, “What does this text express?” Forecasting asks, “Given the recent trajectory, what is likely to come next?”

- **7,000+** post sequences in the research dataset
- **395** unique users represented in the anonymised Reddit data
- **<5%** attempt-level posts, creating a strongly imbalanced task

Nearly half of all labelled posts fall into the ideation category, while attempt-level posts are rare. The model therefore has to learn an ordered, imbalanced target rather than four interchangeable classes.

![Word cloud produced from language in the anonymised social-media research dataset](https://media.dsti.school/wp-content/uploads/2026/03/26095244/wordcloud.avif)

> **Figure caption:** Language represented in the project data. The research works with anonymised Reddit posts labelled for ordinal suicide risk.

## 02 Four ordered levels, not four unrelated labels

Each post is assigned one of four risk levels. Their order carries meaning: a prediction one step away is not equivalent to a prediction at the opposite end of the scale.

- **Level 1, Indicator:** General warning signs.
- **Level 2, Ideation:** Explicit suicidal thoughts.
- **Level 3, Behaviour:** Intent to act.
- **Level 4, Attempt:** Reference to suicidal actions.

This is why the research reports both weighted F1 and mean absolute error. F1 reflects classification quality across an imbalanced label distribution. Mean absolute error reflects the distance between the predicted and actual positions on the ordinal scale.

## 03 A two-stage method: understand each post, then model the trajectory

The core architecture separates semantic interpretation from temporal forecasting. Large language models classify individual posts. A second, lightweight stage combines those classifications across time.

### Stage 1: Post-level classification

Prompt-based GPT-5, GPT-4o and GPT-5-mini models assign an ordinal risk level to each observed post. The approach is zero-shot rather than fine-tuned.

### Stage 2: Temporal aggregation

The sequence of post-level predictions and timestamps is combined to forecast the risk level of the unseen next post.

The first stage uses prompts validated in earlier mental-health NLP research. The second asks how much weight to place on each previous observation, especially when posts are irregularly spaced.
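
As a rough sketch of what the first stage involves, the snippet below builds a zero-shot prompt for one post and parses the reply into a level from 1 to 4. Everything here is an assumption made for illustration: the prompt wording is hypothetical and is not the validated prompt used in the paper, and `complete` stands in for whatever hosted-model call is used, so no real API is shown.

```python
import re
from typing import Callable, Optional

# Hypothetical wording for illustration only; NOT the prompt used in the paper.
PROMPT_TEMPLATE = (
    "You are supporting a research study on anonymised posts.\n"
    "Assign the post one risk level: 1 = indicator, 2 = ideation, "
    "3 = behaviour, 4 = attempt.\n"
    "Answer with the digit only.\n\n"
    "Post:\n{post}"
)


def build_prompt(post: str) -> str:
    """Insert the post text into the zero-shot template."""
    return PROMPT_TEMPLATE.format(post=post.strip())


def parse_level(reply: str) -> Optional[int]:
    """Return the first standalone digit 1-4 in the reply, or None if absent."""
    match = re.search(r"\b[1-4]\b", reply)
    return int(match.group()) if match else None


def classify_post(post: str, complete: Callable[[str], str]) -> Optional[int]:
    """Classify one post; `complete` maps a prompt string to a reply string."""
    return parse_level(complete(build_prompt(post)))


# Stand-in for a hosted model so the sketch runs offline.
def fake_complete(prompt: str) -> str:
    return "Level 2"


level = classify_post("example post text", fake_complete)
```

Returning `None` for an unparseable reply, rather than guessing a level, leaves the decision about retries or manual review to the caller.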

Five aggregation strategies were compared:

- Simple average
- Linear recency weighting
- Exponential decay
- Time-distance weighting
- ARIMA forecasting

One of the sharpest findings is that the aggregation choice matters much less than the quality of the post-level classifications. The five strategies perform within 0.4% of one another. Once the individual posts are classified well, even a simple average is competitive.
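
The weighting-based strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the decay rate, the 24-hour scale, measuring time distance from the most recent observed post, and rounding the weighted mean to the nearest level are all choices made here. ARIMA is left out because it needs a dedicated time-series library.

```python
import math


def forecast(levels, weights):
    """Weighted mean of past levels, rounded half-up and clipped to 1..4."""
    mean = sum(l * w for l, w in zip(levels, weights)) / sum(weights)
    return min(4, max(1, math.floor(mean + 0.5)))


def linear_recency_weights(n):
    """Weights 1, 2, ..., n: the newest post counts n times the oldest."""
    return [float(i) for i in range(1, n + 1)]


def exponential_decay_weights(n, decay=0.5):
    """Each step back in the sequence multiplies the weight by `decay`."""
    return [decay ** (n - 1 - i) for i in range(n)]


def time_distance_weights(hours, scale_hours=24.0):
    """Down-weight posts by their time gap to the most recent observed post.

    `hours` must be sorted in ascending order.
    """
    latest = hours[-1]
    return [1.0 / (1.0 + (latest - h) / scale_hours) for h in hours]


# Toy sequence (made up): predicted levels and posting times in hours.
levels = [1, 2, 2, 3, 3]
hours = [0.0, 2.0, 30.0, 31.0, 100.0]
n = len(levels)

forecasts = {
    "simple": forecast(levels, [1.0] * n),
    "linear": forecast(levels, linear_recency_weights(n)),
    "exponential": forecast(levels, exponential_decay_weights(n)),
    "time_distance": forecast(levels, time_distance_weights(hours)),
}
```

On this rising toy sequence the simple average gives level 2 and the three recency-aware schemes give level 3. The strategies differ only in the weight vector, which is why swapping one for another is cheap once the post-level predictions exist.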

![Overview of the two-stage framework combining post-level classification with temporal aggregation](https://media.dsti.school/wp-content/uploads/2026/03/26100936/verview-of-the-two-stage-modeling-framework-scaled.avif)

> **Figure caption:** The two-stage modelling framework: semantic classification of observed posts followed by a lightweight temporal forecast of the unseen target.

## 04 LLM predictions against compact neural baselines

The research also evaluates three neural methods that learn directly from post sequences without external model calls.

| Approach | Representation and temporal logic | Operational characteristic |
| --- | --- | --- |
| MiniLM | Compact sentence embeddings, time-weighted pooling and an ordinal regression head. | Small local model, but weakest on the unseen final observation. |
| GRU | Sequential processing that learns interactions between language cues and posting rhythm. | Best neural baseline; overall accuracy within 0.02% of GPT-5. |
| DistilBERT + LoRA | Parameter-efficient transformer adaptation while most model weights remain frozen. | Local deployment without dependence on an external API. |
| GPT + aggregation | Prompt-based post classification followed by interpretable temporal aggregation. | Strongest on final-observation sequences and inexpensive to cache. |
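
The MiniLM baseline is described as using an ordinal regression head. The article does not say which formulation, so the sketch below shows one common choice, the cumulative-link (proportional-odds) head, purely as an assumption. A single score is compared against three increasing thresholds, and the four class probabilities are differences of neighbouring cumulative probabilities. The threshold values are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probabilities(score, thresholds):
    """Cumulative-link head: P(level <= k) = sigmoid(threshold_k - score).

    Three increasing thresholds yield probabilities for four ordered levels.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Illustrative thresholds (assumed values, not learned from data).
thresholds = [-1.0, 0.0, 1.0]
low = ordinal_probabilities(0.0, thresholds)   # score in the middle of the scale
high = ordinal_probabilities(3.0, thresholds)  # score above every threshold
predicted_level = high.index(max(high)) + 1
```

Because all four probabilities come from one score, raising the score shifts mass towards higher levels in order. That is the property that separates an ordinal head from four independent class outputs.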

Overall scores are close. The important separation appears on the hardest subset: sequences where the model must predict the unseen final post. Here, the pretrained semantic knowledge of the LLM approach generalises better than models trained only on the limited challenge data.

### F1 on final-observation sequences

- GPT-5: 0.46
- GRU: 0.38
- MiniLM: 0.25

## 05 What the best configuration achieved

GPT-5 combined with linear weighted averaging produced the strongest overall result.

- **0.72** overall weighted F1 score
- **0.30** mean absolute error on the four-point ordinal scale
- **≈ US$25** one-time cost to classify the training posts with GPT-5
- **<1 ms** temporal aggregation after post-level predictions are cached

An MAE of 0.30 means errors are generally local on the ordinal scale: the model is more likely to confuse adjacent categories than to jump from a general indicator to an attempt-level prediction.
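
A short piece of arithmetic shows what this figure can and cannot imply. Assume predictions are integer levels, and write $p_d$ for the share of predictions that land exactly $d$ levels from the true label. The symbol $p_d$ is introduced here for illustration and does not appear in the paper.

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert = \sum_{d=1}^{3} d\,p_d = 0.30
\quad\Longrightarrow\quad
p_1 + p_2 + p_3 \le 0.30
\quad\text{and}\quad
p_3 \le 0.10 .
$$

Under that assumption, at most 30% of predictions can be wrong at all, and at most 10% can span the full distance between indicator and attempt. The MAE alone does not reveal the actual split between adjacent and distant errors.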

### LLM: Hosted semantic models

- Best performance on unseen final observations.
- Post classifications can be cached and reused.
- Simple aggregation limits tuning and computational overhead.
- External processing requires careful treatment of sensitive data.
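
The caching point can be illustrated with a small wrapper. This is a sketch under assumptions, not the project's code: `CachedClassifier` and the stub classifier are hypothetical names, and the cache lives in memory, whereas a real pipeline would persist it somewhere suitably protected.

```python
import hashlib


class CachedClassifier:
    """Wrap a post classifier so each distinct post is classified only once."""

    def __init__(self, classify):
        self._classify = classify  # any callable: post text -> risk level
        self._cache = {}
        self.calls = 0             # number of underlying classifier calls

    def __call__(self, post):
        # Key by a SHA-256 digest so the raw text is not stored as the key.
        key = hashlib.sha256(post.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._classify(post)
        return self._cache[key]


# Stub standing in for an expensive hosted-model call.
def stub_classify(post):
    return 2


cached = CachedClassifier(stub_classify)
results = [cached("post a"), cached("post a"), cached("post b")]
```

Three lookups trigger only two underlying calls here, because the repeated post is served from the cache. Note that hashing the key is a storage convenience and does not anonymise the data.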

### Local: Neural sequence models

- No dependence on external API calls.
- Potentially preferable where data must remain within a controlled environment.
- Overall performance remains competitive.
- Generalisation is weaker on the genuinely predictive final-observation subset.

## 06 Ethics before automation

Suicide-risk prediction is not an ordinary ranking or recommendation problem. The paper treats deployment as a socio-technical responsibility rather than a simple accuracy threshold.

### Four non-negotiable boundaries

- **Not a diagnosis:** Outputs are statistical estimates and cannot replace assessment by qualified mental-health professionals.
- **Privacy by design:** Operational use would require secure handling, platform-policy compliance and strong data-protection safeguards.
- **Both error directions matter:** False positives can cause distress or unnecessary intervention; false negatives can miss people who may benefit from support.
- **Human oversight:** Uncertainty estimates, bias monitoring and human-in-the-loop review are necessary parts of any responsible system.

The technology may eventually support earlier intervention by identifying changing patterns at scale. It should complement access to qualified professionals, never substitute for the human connection at the centre of mental-health care.

## 07 Research, replication and the alumnus behind the work

The approach was published as *Time-Aware Ordinal Modelling of Sequential Text Data* in the proceedings of the 2025 IEEE International Conference on Big Data. The public repository contains the challenge solution and a copy of the paper.

### Jannic Alexander Cutura

DSTI alumnus, Research Fellow and Lecturer at DSTI School of Engineering, and Staff Data Engineer at the European Central Bank. His research interests include natural language processing, machine learning and applications of AI in social-good domains.

[LinkedIn](https://www.linkedin.com/in/jannic-cutura/) · [GitHub](https://github.com/JannicCutura) · [Website](https://www.janniccutura.net/)

Author’s disclaimer: the views presented in this work are solely those of the author and do not represent the views of the European Central Bank or the Eurosystem of central banks. Article adapted for the DSTI TechBlog from the author’s original WordPress contribution; wording and presentation have been revised without changing the research claims, methods or reported results.