A Deep Learning Model for Detecting Mental Illness
from User Content on Social Media

Jina Kim, Dept. of Interaction Science
Jieon Lee, Dept. of Interaction Science
Eunil Park, Dept. of Interaction Science & Applied AI
Jinyoung Han, Dept. of Applied AI

Sungkyunkwan University, Seoul, Republic of Korea

Abstract

Users of social media often share their feelings or emotional states through their posts. In this study, we developed a deep learning model to identify a user's mental state based on their posting information. To this end, we collected posts from mental health communities on Reddit. By analyzing and learning from posting information written by users, our proposed model could accurately identify whether a user's post belongs to a specific mental disorder, including depression, anxiety, bipolar, borderline personality disorder, schizophrenia, and autism. We believe our model can help identify potential sufferers of mental illness based on their posts. This study further discusses the implications of our proposed model, which can serve as a supplementary tool for monitoring the mental health states of individuals who frequently use social media.

Method

Data collection: We collected post data from the following six mental-health-related subreddits, each of which is reported to be associated with a specific disorder: r/depression, r/Anxiety, r/bipolar, r/BPD, r/schizophrenia, and r/autism. In addition, we collected post data from the most popular health-related subreddit, r/mentalhealth, to analyze posts with general health information.

Data anonymization: All user information is anonymized, so the data contain no personally identifiable information; we followed the full anonymization process specified by the Sungkyunkwan University Institutional Review Board (IRB).



Architecture of the proposed CNN-based classification model.

Classification models: We developed six binary classification models, each of which determines whether a user's post belongs to one of the following subreddits: r/depression, r/Anxiety, r/bipolar, r/BPD, r/schizophrenia, and r/autism. Our conjecture is that a user who suffers from a specific mental health problem writes posts in the subreddit that deals with that problem. We therefore developed an independent binary classification model for each disorder to improve performance. Because each model uses data from users who suffer from only one particular mental health problem, we were able to accurately identify a user's potential mental state.
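The idea of six independent binary tasks can be illustrated with a short sketch. It is not the code used in the study: the record fields (`author`, `subreddit`, `text`) and the helper names are hypothetical, and labelling posts from every other subreddit as the negative class is an assumption, since the description above does not say how negative examples were chosen.

```python
# Hypothetical sketch: one binary labelling per subreddit (six tasks in total).
SUBREDDITS = ["depression", "Anxiety", "bipolar", "BPD", "schizophrenia", "autism"]


def binary_labels(posts, target):
    """Label a post 1 if it was written in `target`, else 0.

    Assumption: posts from all other subreddits form the negative class.
    """
    return [1 if post["subreddit"] == target else 0 for post in posts]


def build_tasks(posts):
    """Return {subreddit: (texts, labels)}, one independent binary task each."""
    texts = [post["text"] for post in posts]
    return {name: (texts, binary_labels(posts, name)) for name in SUBREDDITS}


# Toy records; the field names are illustrative, not the dataset's schema.
posts = [
    {"author": "u1", "subreddit": "depression", "text": "example post A"},
    {"author": "u2", "subreddit": "Anxiety", "text": "example post B"},
    {"author": "u3", "subreddit": "autism", "text": "example post C"},
]

tasks = build_tasks(posts)
for name, (_, labels) in tasks.items():
    print(name, labels)
```

Each entry in `tasks` could then be passed to a separate classifier, so that every model only has to answer one yes/no question about a post.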

We divided our dataset into training (80%) and testing (20%) sets. We used two well-known classifiers, XGBoost and a convolutional neural network (CNN). Note that posts by users who wrote in multiple subreddits were excluded from the learning phase.
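The exclusion rule and the 80/20 split can be sketched as follows. This is a minimal illustration, not the study's pipeline: the record fields and function names are hypothetical, and both the order of the two steps (filter first, then split) and the post-level random split with a fixed seed are assumptions.

```python
import random
from collections import defaultdict


def drop_cross_posters(posts):
    """Keep only posts whose author wrote in exactly one subreddit."""
    subreddits_by_author = defaultdict(set)
    for post in posts:
        subreddits_by_author[post["author"]].add(post["subreddit"])
    return [p for p in posts if len(subreddits_by_author[p["author"]]) == 1]


def split_posts(posts, test_fraction=0.2, seed=0):
    """Shuffle a copy of the posts and return (train, test) lists."""
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: ten single-subreddit authors plus one author who cross-posted.
posts = [
    {"author": f"user{i}", "subreddit": "depression" if i < 5 else "Anxiety"}
    for i in range(10)
]
posts += [
    {"author": "userX", "subreddit": "bipolar"},
    {"author": "userX", "subreddit": "BPD"},
]

kept = drop_cross_posters(posts)
train, test = split_posts(kept)
print(len(kept), len(train), len(test))
```

Filtering before splitting keeps the cross-posting author out of both sets in this sketch; whether excluded users also stay out of the test set in the study is not stated above.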

Paper URL

Scientific Reports (Published: 16 July 2020)

BibTeX

@article{kim2020deep,
title={A deep learning model for detecting mental illness from user content on social media},
author={Kim, Jina and Lee, Jieon and Park, Eunil and Han, Jinyoung},
journal={Scientific Reports},
volume={10},
number={1},
pages={1--6},
year={2020},
publisher={Nature Publishing Group}
}

Dataset

  • Posts of Mental-health-related Subreddits (.csv format)
    - We are only allowed to distribute the data for research purposes. Please complete the request form.

Media Citation