Detection and classification of fake news from social media

Perspective of End Consumer Behaviour

Authors

  • Zuzana Janková
  • Michal Páleš
  • Marcin Potrykus
  • László Buics

DOI:

https://doi.org/10.13164/

Keywords:

fake news, false information, financial market, stock market, social media, SVM, ML

Abstract

The tremendous rise of artificial intelligence and the use of ChatGPT enable an unprecedentedly rapid spread of messages of all kinds. The fundamental problem at present is the spread of false, unverified news and information known as fake news. This major challenge of the modern world leads to deliberate manipulation of public opinion, can contribute to losses in stock market value, and poses many risks at a global level. For the detection of fake news, a dataset containing text messages from the social platform X is used. Within the framework of natural language processing (NLP), a text analysis was performed that included traditional pre-processing steps, such as tokenization and the removal of word fragments, excessive punctuation, etc. Subsequently, the Bag of Words transformation was applied, and the text was encoded into word vectors using word embeddings and Word2Vec vectorization. Based on previous research and its strong practical performance, the Support Vector Machine (SVM) technique, a highly robust and effective machine learning algorithm, was chosen for fake news classification.

Detecting and classifying information disseminated on online platforms is difficult, and so far no unambiguous approach has been proposed that provides satisfactory results. To fill this research gap, this paper focuses on the detection of fake news in messages published on a social network using Machine Learning (ML) methods, specifically the SVM algorithm.

Pre-processing the text data reduced the dimensionality of the dataset by almost 50 %, as many news headlines contained a large number of meaningless tokens or excessive punctuation. This confirms the importance of the pre-processing step, especially when using unstructured data from social platforms. The accuracy of fake news classification using SVM is almost 75 %.
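The pipeline described above (pre-processing, Bag of Words, SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, omits the Word2Vec step, and trains a linear SVM by plain subgradient descent on the hinge loss. The function names, the hyperparameters (`lam`, `epochs`, `lr`) and the four toy headlines are all assumptions made for illustration.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase, remove URLs, and keep only alphanumeric word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9']+", text)

def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(tokens, vocab):
    """Count vector over the vocabulary; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j in range(len(w)):
                # The hinge term contributes only when the margin is violated.
                grad = lam * w[j] - (yi * xi[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
            if margin < 1:
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Return +1 (fake) or -1 (legitimate) from the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Invented toy headlines (hypothetical data, not from the paper's dataset).
headlines = [
    "SHOCKING!!! Stock XYZ will triple overnight, buy now!!!",
    "Secret insiders guarantee profit, buy now http://example.com",
    "Central bank keeps interest rates unchanged",
    "Company reports quarterly earnings in line with forecasts",
]
labels = [1, 1, -1, -1]  # +1 = fake, -1 = legitimate

token_lists = [preprocess(h) for h in headlines]
vocab = build_vocabulary(token_lists)
X = [bag_of_words(toks, vocab) for toks in token_lists]
w, b = train_linear_svm(X, labels)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

On real data, a held-out test split and a library implementation of SVM would normally replace the hand-written training loop; the sketch only shows how the stages connect.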
Because unstructured data and individual pre-processing steps can distort certain elements that distinguish fake news from legitimate news, future research should examine the individual pre-processing steps in more detail. In particular, excessive punctuation or frequent use of stop words may provide additional signals that help separate fake news from real news.
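One way to explore this suggestion is to compute simple stylistic features from the raw text before any cleaning, so that punctuation and stop words are measured rather than discarded. The sketch below is a hypothetical illustration, not part of the paper: the `style_features` function, the feature names and the short `STOP_WORDS` list are all assumptions.

```python
import string

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}

def style_features(text):
    """Compute punctuation and stop-word statistics on the uncleaned text."""
    chars = [c for c in text if not c.isspace()]
    punct = sum(1 for c in chars if c in string.punctuation)
    # Strip surrounding punctuation so "now!!!" is counted as the word "now".
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    stops = sum(1 for w in words if w in STOP_WORDS)
    return {
        "punct_ratio": punct / len(chars) if chars else 0.0,
        "stop_ratio": stops / len(words) if words else 0.0,
        "exclamations": text.count("!"),
    }

features = style_features("Buy now!!!")
print(features)
```

Such features could be appended to the Bag of Words vector of each message, which would let a classifier weigh them alongside the word counts.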

Published

2024-12-13

Section

ORIGINAL SCIENTIFIC ARTICLE