Predicting mental health risk through social media

Mental-health language can change over time. The IEEE Big Data Cup 2025 asked participants to work with anonymised Reddit data and predict the ordinal suicide-risk level of a user’s next post from the five preceding posts and their timestamps. The solution by DSTI alumnus Jannic Alexander Cutura combined prompt-based large-language-model classification with simple temporal aggregation. It won first place and a US$1,000 prize.

Subject matter and scope

This article discusses research on suicide-risk prediction. The models described produce statistical estimates, not clinical diagnoses, and the work explicitly requires privacy protection, qualified human oversight and careful evaluation before any operational use.

This is not ordinary text classification. The target text is absent. The system must infer a future ordinal state from a short, irregular sequence of earlier observations.

01 The challenge: forecast the next state

The IEEE International Conference on Big Data brings together research on large-scale data processing, machine learning and real-world applications. Its 2025 Big Data Cup, sponsored by Hong Kong Polytechnic University, focused on suicide-risk prediction from social-media posts.

For each sequence, the model receives a user’s five most recent posts and their timestamps. It must predict the risk level of a sixth post that has not yet been written. That distinction matters. Classifying visible language asks, “What does this text express?” Forecasting asks, “Given the recent trajectory, what is likely to come next?”

7,000+ post sequences in the research dataset

395 unique users represented in the anonymised Reddit data

<5% attempt-level posts, creating a strongly imbalanced task

Nearly half of all labelled posts fall into the ideation category, while attempt-level posts are rare. The model therefore has to learn an ordered, imbalanced target rather than four interchangeable classes.

Figure: Word cloud produced from language in the anonymised social-media research dataset. The research works with anonymised Reddit posts labelled for ordinal suicide risk.

02 Four ordered levels, not four unrelated labels

Each post is assigned one of four risk levels. Their order carries meaning: a prediction one step away is not equivalent to a prediction at the opposite end of the scale.

Level 1: Indicator

General warning signs.

Level 2: Ideation

Explicit suicidal thoughts.

Level 3: Behaviour

Intent to act.

Level 4: Attempt

Reference to suicidal actions.

This is why the research reports both weighted F1 and mean absolute error. F1 reflects classification quality across an imbalanced label distribution. Mean absolute error reflects the distance between the predicted and actual positions on the ordinal scale.
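As a minimal sketch of how the two metrics differ, the following computes weighted F1 and mean absolute error in plain Python. The function names and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance between predicted and actual levels on the 1-4 scale."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (made up): both predictions miss the same single post,
# but one lands on the adjacent level and the other three levels away.
y_true = [1, 2, 2, 3, 4]
near_miss = [1, 2, 2, 3, 3]  # Level 4 predicted as Level 3
far_miss = [1, 2, 2, 3, 1]   # Level 4 predicted as Level 1

print(weighted_f1(y_true, near_miss), mean_absolute_error(y_true, near_miss))
print(weighted_f1(y_true, far_miss), mean_absolute_error(y_true, far_miss))
```

In this toy case both predictions receive the same weighted F1, while the MAE of the far miss is three times larger. Only the ordinal metric registers how far off the error was.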

03 A two-stage method: understand each post, then model the trajectory

The core architecture separates semantic interpretation from temporal forecasting. Large language models classify individual posts. A second, lightweight stage combines those classifications across time.

Post-level classification

Prompt-based GPT-5, GPT-4o and GPT-5-mini models assign an ordinal risk level to each observed post. The approach is zero-shot rather than fine-tuned.

Temporal aggregation

The sequence of post-level predictions and timestamps is combined to forecast the risk level of the unseen next post.
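The two stages can be sketched roughly as follows. This is an illustration under stated assumptions, not the published implementation: `Post`, `forecast_next_level` and `fake_levels` are hypothetical names, the hosted model call is replaced by a lookup table so the sketch runs offline, and rounding a plain mean back to a level is one possible choice of aggregation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class Post:
    text: str
    created_at: datetime


def forecast_next_level(posts: List[Post], classify: Callable[[str], int]) -> int:
    """Two-stage forecast: classify each observed post, then aggregate."""
    ordered = sorted(posts, key=lambda p: p.created_at)
    # Stage 1: one ordinal level (1-4) per observed post.
    levels = [classify(p.text) for p in ordered]
    # Stage 2: plain mean of the levels, rounded half up and clipped to 1-4.
    mean_level = sum(levels) / len(levels)
    return min(4, max(1, int(mean_level + 0.5)))


# Stand-in for the hosted model: a lookup over placeholder texts (made up).
fake_levels = {"post 1": 1, "post 2": 2, "post 3": 2, "post 4": 2, "post 5": 3}
posts = [Post(text, datetime(2025, 1, day)) for day, text in enumerate(fake_levels, start=1)]
prediction = forecast_next_level(posts, fake_levels.__getitem__)
print(prediction)
```

Because the classifier is passed in as a function, the semantic stage can be swapped without touching the aggregation stage.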

The first stage uses prompts validated in earlier mental-health NLP research. The second asks how much weight to place on each previous observation, especially when posts are irregularly spaced.

Simple average
Linear recency weighting
Exponential decay
Time-distance weighting
ARIMA forecasting

One of the sharpest findings is that the aggregation choice matters much less than the quality of the post-level classifications. The five strategies perform within 0.4% of one another. Once the individual posts are classified well, even a simple average is competitive.
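A rough sketch of four of the weighting schemes named above is given here. The exact formulas used in the paper are not stated in this article, so the weight definitions, the `decay` and `scale_hours` parameters and the sample data are all assumptions chosen for illustration. ARIMA is omitted because it needs a time-series library.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def weighted_mean(levels: Sequence[int], weights: Sequence[float]) -> float:
    return sum(lv * w for lv, w in zip(levels, weights)) / sum(weights)


def simple_weights(n: int) -> List[float]:
    return [1.0] * n


def linear_recency_weights(n: int) -> List[float]:
    # Oldest post gets weight 1, newest gets weight n.
    return [float(i + 1) for i in range(n)]


def exponential_decay_weights(n: int, decay: float = 0.5) -> List[float]:
    # Newest post gets weight 1; each step back multiplies by `decay`.
    return [decay ** (n - 1 - i) for i in range(n)]


def time_distance_weights(times: Sequence[datetime], scale_hours: float = 24.0) -> List[float]:
    # Weight shrinks with the gap between each post and the most recent one.
    latest = max(times)
    return [
        1.0 / (1.0 + (latest - t).total_seconds() / 3600.0 / scale_hours)
        for t in times
    ]


levels = [1, 1, 2, 2, 3]  # made-up post-level predictions, oldest first
start = datetime(2025, 1, 1)
times = [start + timedelta(days=d) for d in (0, 1, 2, 9, 10)]  # irregular gaps

schemes = {
    "simple": simple_weights(len(levels)),
    "linear": linear_recency_weights(len(levels)),
    "exponential": exponential_decay_weights(len(levels)),
    "time-distance": time_distance_weights(times),
}
scores = {name: weighted_mean(levels, w) for name, w in schemes.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With a rising sequence like this one, every recency-aware scheme pulls the estimate above the simple average. On real data the article reports that such differences are small compared with the effect of classification quality.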

Figure: Overview of the two-stage framework, in which semantic classification of the observed posts is followed by a lightweight temporal forecast of the unseen target.

04 LLM predictions against compact neural baselines

The research also evaluates three neural methods that learn directly from post sequences without external model calls.

Approach	Representation and temporal logic	Operational characteristic
MiniLM	Compact sentence embeddings, time-weighted pooling and an ordinal regression head.	Small local model, but weakest on the unseen final observation.
GRU	Sequential processing that learns interactions between language cues and posting rhythm.	Best neural baseline; overall accuracy within 0.02% of GPT-5.
DistilBERT + LoRA	Parameter-efficient transformer adaptation while most model weights remain frozen.	Local deployment without dependence on an external API.
GPT + aggregation	Prompt-based post classification followed by interpretable temporal aggregation.	Strongest on final-observation sequences and inexpensive to cache.

Overall scores are close. The important separation appears on the hardest subset: sequences where the model must predict the unseen final post. Here, the pretrained semantic knowledge of the LLM approach generalises better than models trained only on the limited challenge data.

F1 on final-observation sequences

GPT-5: 0.46
GRU: 0.38
MiniLM: 0.25

05 What the best configuration achieved

GPT-5 combined with linear weighted averaging produced the strongest overall result.

0.72 overall weighted F1 score

0.30 mean absolute error on the four-point ordinal scale

≈ US$25 one-time cost to classify the training posts with GPT-5

<1 ms temporal aggregation after post-level predictions are cached

An MAE of 0.30 means errors are generally local on the ordinal scale: the model is more likely to confuse adjacent categories than to jump from a general indicator to an attempt-level prediction.
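This reading can be made concrete with a simple bound. Assuming predictions are whole levels, so that every error is 0, 1, 2 or 3, Markov's inequality limits the share of predictions that are at least k levels off to MAE / k. The helper name below is illustrative.

```python
def max_share_off_by_at_least(mae: float, k: int) -> float:
    """Upper bound on the fraction of predictions with absolute error >= k.

    Markov's inequality for non-negative errors: P(error >= k) <= MAE / k.
    """
    if k < 1:
        raise ValueError("k must be a positive number of levels")
    return min(1.0, mae / k)


mae = 0.30  # value reported in the article
for k in (1, 2, 3):
    share = max_share_off_by_at_least(mae, k)
    print(f"off by {k} or more levels: at most {share:.0%}")
```

Under that assumption, at most 30% of predictions can miss the exact level, at most 15% can be two or more levels away, and at most 10% can span the full scale from Level 1 to Level 4. These are upper bounds, not the observed error distribution.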

LLM: Hosted semantic models

Best performance on unseen final observations.
Post classifications can be cached and reused.
Simple aggregation limits tuning and computational overhead.
External processing requires careful treatment of sensitive data.

Local: Neural sequence models

No dependence on external API calls.
Potentially preferable where data must remain within a controlled environment.
Overall performance remains competitive.
Generalisation is weaker on the genuinely predictive final-observation subset.
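The caching point in the comparison above can be sketched as follows. This is a minimal in-memory illustration, assuming a hypothetical `CachedClassifier` wrapper and a stand-in for the hosted model. A real system handling sensitive posts would need secure, access-controlled storage.

```python
import hashlib
from typing import Callable, Dict


class CachedClassifier:
    """Wraps a post classifier so each distinct text is classified only once."""

    def __init__(self, classify: Callable[[str], int]) -> None:
        self._classify = classify
        self._cache: Dict[str, int] = {}
        self.calls = 0  # number of times the wrapped classifier actually ran

    def __call__(self, text: str) -> int:
        # Key on a hash so raw post text is not stored as a cache key.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._classify(text)
            self.calls += 1
        return self._cache[key]


# Stand-in for the hosted model call (hypothetical): always returns Level 2.
classifier = CachedClassifier(lambda text: 2)
levels = [classifier(t) for t in ["post a", "post b", "post a"]]
print(levels, classifier.calls)
```

Once every post has a cached level, trying a different aggregation strategy only repeats the cheap second stage.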

06 Ethics before automation

Suicide-risk prediction is not an ordinary ranking or recommendation problem. The paper treats deployment as a socio-technical responsibility rather than a simple accuracy threshold.

Four non-negotiable boundaries

Not a diagnosis

Outputs are statistical estimates and cannot replace assessment by qualified mental-health professionals.

Privacy by design

Operational use would require secure handling, platform-policy compliance and strong data-protection safeguards.

Both error directions matter

False positives can cause distress or unnecessary intervention; false negatives can miss people who may benefit from support.

Human oversight

Uncertainty estimates, bias monitoring and human-in-the-loop review are necessary parts of any responsible system.

The technology may eventually support earlier intervention by identifying changing patterns at scale. It should complement access to qualified professionals, never substitute for the human connection at the centre of mental-health care.

07 Research, replication and the alumnus behind the work

The approach was published as Time-Aware Ordinal Modelling of Sequential Text Data in the proceedings of the 2025 IEEE International Conference on Big Data. The public repository contains the challenge solution and a copy of the paper.

Jannic Alexander Cutura

DSTI alumnus, Research Fellow and Lecturer at DSTI School of Engineering, and Staff Data Engineer at the European Central Bank. His research interests include natural language processing, machine learning and applications of AI in social-good domains.


Author’s disclaimer: the views presented in this work are solely those of the author and do not represent the views of the European Central Bank or the Eurosystem of central banks. Article adapted for the DSTI TechBlog from the author’s original WordPress contribution; wording and presentation have been revised without changing the research claims, methods or reported results.