Institute of Sociology
of the Federal Center of Theoretical and Applied Sociology
of the Russian Academy of Sciences

Divisenko, K.S. (2025) A biographical study of subjective well-being using natural language processing methods. Vestnik Tomskogo gosudarstvennogo universiteta. Filosofiya. Sotsiologiya. Politologiya – Tomsk State University Journal of Philosophy, Sociology and Political Science. 85. pp. 164–176. (In Russian). DOI: 10.17223/1998863X/85/14
ISSN 1998-863X
DOI 10.17223/1998863X/85/14
RSCI (Russian Science Citation Index): https://elibrary.ru/contents.asp?id=82690037

Posted on site: 01.10.25

Full text of the article on the journal website, URL: https://journals.tsu.ru/philosophy/&journal_page=archive&id=2621&article_id=53561 (accessed 01.10.2025)


Abstract

Text data have become more common in subjective well-being studies owing to the development of computer-assisted methods for qualitative research. The article examines the possibility of studying subjective well-being from autobiographical data by means of natural language processing and machine learning. In the present study, subjective well-being was reconstructed using the six-factor model of well-being developed by Carol Ryff. A corpus of autobiographical texts written by high school students (n = 197) was open-coded according to the six factors of this model: self-acceptance, positive relationships with others, autonomy, environmental mastery, purpose in life, and personal growth. Fragments describing purpose in life and positive relationships with others are the most frequent in the high school students' autobiographical texts. The labeled data were used to build baseline machine learning models based on count and TF-IDF vectorisation combined with logistic regression and random forest algorithms. Semantic vectorisation of the texts with ruBert (Bidirectional Encoder Representations from Transformers) improved classification accuracy. In binary classification, the weighted average F1 score for "personal growth", "purpose in life" and "positive relationships with others" was 0.92, 0.85 and 0.89, respectively. The results are fully consistent with previously described changes in the lifeworld of high school students and indicate the gradual development of a realistic type of biographical project. Further experiments with an expanded dataset and with other language models appear warranted. Classification accuracy might also be improved by adding part-of-speech tagging. The trained models can be used to analyze similar autobiographical texts and as a screening test for subjective well-being. The author declares no conflicts of interest.
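The baseline described in the abstract (count or TF-IDF vectorisation followed by logistic regression or a random forest) can be illustrated with a minimal scikit-learn sketch. This is an assumption-laden illustration, not the author's code: the article summary does not name the library, and the toy English fragments, the binary label scheme (1 = fragment coded as "purpose in life", 0 = other) and all hyperparameters below are hypothetical. Each vectoriser/classifier pair is scored with the weighted average F1, the metric the abstract reports.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy fragments (the real corpus consists of Russian autobiographies).
# Label 1 = fragment coded as "purpose in life", 0 = any other fragment.
train_texts = [
    "I want to become a doctor and help people",
    "my goal is to enter the university next year",
    "I plan to study engineering after school",
    "I dream of opening my own business one day",
    "we went to the sea with my parents last summer",
    "my grandmother lives in a small village",
    "in the first grade I had a strict teacher",
    "our family moved to another city when I was five",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = [
    "after school I plan to become an engineer",
    "last summer my parents and I visited the village",
]
test_labels = [1, 0]


def evaluate_baselines(train_x, train_y, test_x, test_y):
    """Fit every vectoriser/classifier pair and return its weighted-average F1."""
    scores = {}
    for vec_name, vec_cls in [("count", CountVectorizer), ("tfidf", TfidfVectorizer)]:
        # Fresh classifier instances are created for each vectoriser.
        for clf_name, clf in [
            ("logreg", LogisticRegression(max_iter=1000)),
            ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ]:
            model = make_pipeline(vec_cls(), clf)
            model.fit(train_x, train_y)
            pred = model.predict(test_x)
            scores[(vec_name, clf_name)] = f1_score(test_y, pred, average="weighted")
    return scores


for key, score in evaluate_baselines(train_texts, train_labels, test_texts, test_labels).items():
    print(key, round(score, 2))
```

One such binary model per well-being factor mirrors the per-factor scores reported in the abstract; replacing the count/TF-IDF step with sentence embeddings from a BERT-type model such as ruBert would correspond to the semantic vectorisation variant.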
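The weighted average F1 reported for binary classification averages the F1 score of each class (the factor present vs. absent), weighting each class by its share of true labels. A minimal pure-Python sketch of that computation is given below; the function names and the toy labels are hypothetical and serve only to show the arithmetic, not to reproduce the article's results.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by class support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_class(y_true, y_pred, label) * n / total
               for label, n in support.items())


# Hypothetical labels: 1 = fragment coded with the factor, 0 = not coded.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(weighted_f1(y_true, y_pred), 4))  # → 0.75
```

Here class 1 has F1 = 2/3 with support 3 and class 0 has F1 = 0.8 with support 5, so the weighted average is 3/8 · 2/3 + 5/8 · 0.8 = 0.75. Because the majority class dominates the weighting, a weighted F1 can look high even when the rarer "factor present" class is recognised less reliably.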