COMPARATIVE ANALYSIS OF PERSONALIZED RESPONSE GENERATION METHODS BASED ON MESSENGER CORPUS

Authors

  • Bohdan Utenko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine
  • Vladyslav Taran, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine https://orcid.org/0000-0003-2493-7239

Keywords:

personalized text generation, TF-IDF, Markov chains, DistilGPT-2, DialoGPT, fine-tuning, messenger corpus, BERTScore

Abstract

This paper presents a comparative study of four approaches to automatically generating text responses in the communication style of a specific individual: TF-IDF retrieval, Markov chains (bigram model), and two fine-tuned neural language models, DistilGPT-2 (82M parameters) and DialoGPT-medium (345M parameters). The corpus consists of Ukrainian-language private Telegram chat logs (~200,000 messages), from which 10,000 query-response pairs were extracted and split 80/20 into training and test sets. The neural models were fine-tuned on the Kaggle platform using an NVIDIA T4 GPU; inference and evaluation were performed on CPU. Evaluation was conducted on 500 test pairs using five metrics: BLEU, ROUGE-L, BERTScore F1, average inference time, and peak RAM consumption. DistilGPT-2 achieved the highest scores on all three quality metrics: BLEU (0.0088), ROUGE-L (0.0535), and BERTScore F1 (0.687). DialoGPT-medium underperformed on the quality metrics despite consuming roughly 4.5 times as much memory (net RAM: 1343 MB vs. 297 MB). TF-IDF retrieval remains viable on minimal hardware, and Markov chains are the fastest method (70 µs per query), but both lag behind the neural approaches in semantic coherence. The findings indicate that fine-tuned DistilGPT-2 offers the best quality-to-resource trade-off for style personalization in a local CPU deployment scenario.
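
To make the two non-neural baselines concrete, the following is a minimal, standard-library Python sketch of TF-IDF retrieval over stored query-response pairs and of a bigram Markov generator. It is an illustration under assumptions, not the authors' implementation: the toy English pairs, the class names `TfidfRetriever` and `BigramMarkov`, the whitespace tokenizer, and the smoothed IDF formula are all hypothetical choices made here. The abstract also does not say how the Markov model is conditioned on the incoming query, so this sketch generates unconditionally.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data; the corpus described in the paper is private.
PAIRS = [
    ("how are you", "fine thanks and you"),
    ("what are you doing tonight", "watching a film at home"),
    ("are you coming to the lecture", "yes see you at the lecture"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the paper's)."""
    return text.lower().split()


class TfidfRetriever:
    """Returns the stored response whose query is most similar to the input."""

    def __init__(self, pairs):
        self.pairs = pairs
        docs = [tokenize(query) for query, _ in pairs]
        n_docs = len(docs)
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed IDF keeps a nonzero weight for terms found in every query.
        self.idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
                    for tok, df in doc_freq.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        # Out-of-vocabulary tokens are ignored; the vector is L2-normalized.
        counts = Counter(tok for tok in tokens if tok in self.idf)
        vec = {tok: c * self.idf[tok] for tok, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {tok: w / norm for tok, w in vec.items()} if norm else {}

    def respond(self, query):
        q = self._vectorize(tokenize(query))
        # Dot product of unit vectors equals cosine similarity.
        scores = [sum(w * vec.get(tok, 0.0) for tok, w in q.items())
                  for vec in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.pairs[best][1]


class BigramMarkov:
    """Samples a response word by word from observed bigram transitions."""

    def __init__(self, responses):
        self.next_words = defaultdict(list)
        for text in responses:
            tokens = ["<s>"] + tokenize(text) + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                # Repeated entries encode transition frequencies.
                self.next_words[prev].append(cur)

    def generate(self, rng, max_len=20):
        word, out = "<s>", []
        while len(out) < max_len:
            word = rng.choice(self.next_words[word])
            if word == "</s>":
                break
            out.append(word)
        return " ".join(out)


retriever = TfidfRetriever(PAIRS)
chain = BigramMarkov([response for _, response in PAIRS])
print(retriever.respond("how are you today"))
print(chain.generate(random.Random(0)))
```

On this toy data the retriever answers "how are you today" with the stored reply "fine thanks and you", because the unseen word "today" is ignored and the remaining terms match the first stored query exactly. The Markov generator needs only a dictionary lookup and one random choice per word, which is consistent with its being the fastest of the four methods, while nothing in it ties the output to the meaning of the query.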

Published

2026-05-08

Section

Machine learning, Big Data (AI)