A New Approach To Suicide Ideation Detection from Text Content

Document Type : Original Article

Authors

1 Information System Department, Faculty of Computers and Information, Menofia University

2 Information System Department, Faculty of Computers and Information, Menofia University

3 Information System Department, Faculty of Computers and Information, Menoufia University

Abstract

Suicide is a serious problem in societies worldwide. A number of risk factors contribute to it; the most common are anxiety, hopelessness, social isolation, and depression. Early detection of these risk factors can help reduce the number of suicide attempts. Many expressions of suicidal thoughts, mostly posted by young people, can be found in online communities. In this paper, a new approach to detecting suicidal ideation is built using natural language processing (NLP) and machine learning techniques. This study compares three classifiers: Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB). We extracted several feature sets, namely statistical, TF-IDF, part-of-speech (POS), n-gram, and topic modeling features, and applied two feature reduction techniques, Principal Component Analysis (PCA) and Information Gain (IG). The study aims to increase the accuracy of suicide ideation detection and to address shortcomings of previous studies, such as using few feature sets, focusing only on the word level without considering meaning in context, and using all extracted features in the classification task, including irrelevant and redundant ones. In this study, the RF classifier achieves the highest classification accuracy, 97.02%, when PCA is used as the feature reduction technique. The results indicate that using expressive feature sets and selecting relevant and informative features can make classification more accurate.

Keywords