RT Journal Article
T1 Enhancing representation in the context of multiple-channel spam filtering
A1 Novo Lourés, María
A1 Ruano Ordás, David Alfonso
A1 Pavon Rial, Maria Reyes
A1 Laza Fidalgo, Rosalía
A1 Gomez Meire, Silvana
A1 Méndez Reboredo, José Ramón
K1 3304.99 Others
AB This study addresses the use of different features to complement synset-based and bag-of-words representations of texts when classical ML approaches are applied to spam filtering (Ferrara, 2019). Although a large number of complementary features exist, we have selected only those that can be computed regardless of the communication channel used to distribute content, in order to improve the applicability of this study. Feature evaluation has been performed using content distributed through different channels (social networks and email) and different classifiers (AdaBoost, Flexible Bayes, Naïve Bayes, Random Forests, and SVMs). The results have revealed the usefulness of detecting some non-textual entities (such as Uniform Resource Locators, URLs) in the addressed distribution channels. Moreover, we found that compression properties and/or information regarding the probability of correctly guessing the language of target texts could be successfully used to improve classification in a wide range of situations. Finally, we have also detected features that are influenced by specific fashions and habits of users of certain Internet services (e.g. the existence of words written in capital letters) and that are not useful for spam filtering.
PB Information Processing & Management
SN 03064573
YR 2022
FD 2022-03
LK http://hdl.handle.net/11093/2761
UL http://hdl.handle.net/11093/2761
LA eng
NO Information Processing & Management, 59(2): 102812 (2022)
NO Funded for open access publication: Universidade de Vigo/CISUG
NO Xunta de Galicia | Ref. ED481D-2021/024
DS Investigo
RD 04-oct-2023