Mental-health language can change over time. The IEEE Big Data Cup 2025 asked participants to work with anonymised Reddit data and predict the ordinal suicide-risk level of a user’s next post from the five preceding posts and their timestamps. Jannic’s solution combined prompt-based large-language-model classification with simple temporal aggregation. It won first place and a US$1,000 prize.
Subject matter and scopeThis article discusses research on suicide-risk prediction. The models described produce statistical estimates, not clinical diagnoses, and the work explicitly requires privacy protection, qualified human oversight and careful evaluation before any operational use.
01The challenge: forecast the next state
The IEEE International Conference on Big Data brings together research on large-scale data processing, machine learning and real-world applications. Its 2025 Big Data Cup, sponsored by Hong Kong Polytechnic University, focused on suicide-risk prediction from social-media posts.
For each sequence, the model receives a user’s five most recent posts and their timestamps. It must predict the risk level of a sixth post that has not yet been written. That distinction matters. Classifying visible language asks, “What does this text express?” Forecasting asks, “Given the recent trajectory, what is likely to come next?”
Nearly half of all labelled posts fall into the ideation category, while attempt-level posts are rare. The model therefore has to learn an ordered, imbalanced target rather than four interchangeable classes.

02Four ordered levels, not four unrelated labels
Each post is assigned one of four risk levels. Their order carries meaning: a prediction one step away is not equivalent to a prediction at the opposite end of the scale.
General warning signs.
Explicit suicidal thoughts.
Intent to act.
Reference to suicidal actions.
This is why the research reports both weighted F1 and mean absolute error. F1 reflects classification quality across an imbalanced label distribution. Mean absolute error reflects the distance between the predicted and actual positions on the ordinal scale.
03A two-stage method: understand each post, then model the trajectory
The core architecture separates semantic interpretation from temporal forecasting. Large language models classify individual posts. A second, lightweight stage combines those classifications across time.
Post-level classification
Prompt-based GPT-5, GPT-4o and GPT-5-mini models assign an ordinal risk level to each observed post. The approach is zero-shot rather than fine-tuned.
Temporal aggregation
The sequence of post-level predictions and timestamps is combined to forecast the risk level of the unseen next post.
The first stage uses prompts validated in earlier mental-health NLP research. The second asks how much weight to place on each previous observation, especially when posts are irregularly spaced.
One of the sharpest findings is that the aggregation choice matters much less than the quality of the post-level classifications. The five strategies perform within 0.4% of one another. Once the individual posts are classified well, even a simple average is competitive.

04LLM predictions against compact neural baselines
The research also evaluates three neural methods that learn directly from post sequences without external model calls.
| Approach | Representation and temporal logic | Operational characteristic |
|---|---|---|
| MiniLM | Compact sentence embeddings, time-weighted pooling and an ordinal regression head. | Small local model, but weakest on the unseen final observation. |
| GRU | Sequential processing that learns interactions between language cues and posting rhythm. | Best neural baseline; overall accuracy within 0.02% of GPT-5. |
| DistilBERT + LoRA | Parameter-efficient transformer adaptation while most model weights remain frozen. | Local deployment without dependence on an external API. |
| GPT + aggregation | Prompt-based post classification followed by interpretable temporal aggregation. | Strongest on final-observation sequences and inexpensive to cache. |
Overall scores are close. The important separation appears on the hardest subset: sequences where the model must predict the unseen final post. Here, the pretrained semantic knowledge of the LLM approach generalises better than models trained only on the limited challenge data.
F1 on final-observation sequences
05What the best configuration achieved
GPT-5 combined with linear weighted averaging produced the strongest overall result.
An MAE of 0.30 means errors are generally local on the ordinal scale: the model is more likely to confuse adjacent categories than to jump from a general indicator to an attempt-level prediction.
LLMHosted semantic models
- Best performance on unseen final observations.
- Post classifications can be cached and reused.
- Simple aggregation limits tuning and computational overhead.
- External processing requires careful treatment of sensitive data.
LocalNeural sequence models
- No dependence on external API calls.
- Potentially preferable where data must remain within a controlled environment.
- Overall performance remains competitive.
- Generalisation is weaker on the genuinely predictive final-observation subset.
06Ethics before automation
Suicide-risk prediction is not an ordinary ranking or recommendation problem. The paper treats deployment as a socio-technical responsibility rather than a simple accuracy threshold.
Four non-negotiable boundaries
Outputs are statistical estimates and cannot replace assessment by qualified mental-health professionals.
Operational use would require secure handling, platform-policy compliance and strong data-protection safeguards.
False positives can cause distress or unnecessary intervention; false negatives can miss people who may benefit from support.
Uncertainty estimates, bias monitoring and human-in-the-loop review are necessary parts of any responsible system.
The technology may eventually support earlier intervention by identifying changing patterns at scale. It should complement access to qualified professionals, never substitute for the human connection at the centre of mental-health care.
07Research, replication and the alumnus behind the work
The approach was published as Time-Aware Ordinal Modelling of Sequential Text Data in the proceedings of the 2025 IEEE International Conference on Big Data. The public repository contains the challenge solution and a copy of the paper.
Jannic Alexander Cutura
DSTI alumnus, Research Fellow and Lecturer at DSTI School of Engineering, and Staff Data Engineer at the European Central Bank. His research interests include natural language processing, machine learning and applications of AI in social-good domains.
Author’s disclaimer: the views presented in this work are solely those of the author and do not represent the views of the European Central Bank or the Eurosystem of central banks. Article adapted for the DSTI TechBlog from the author’s original WordPress contribution; wording and presentation have been revised without changing the research claims, methods or reported results.