
dc.contributor.author: Méndez Reboredo, José Ramón
dc.contributor.author: Cotos Yáñez, Tomas Raimundo
dc.contributor.author: Ruano Ordás, David Alfonso
dc.date.accessioned: 2019-01-15T12:20:24Z
dc.date.issued: 2019-03
dc.identifier.citation: Applied Soft Computing, 76: 89-104 (2019) [spa]
dc.identifier.issn: 15684946
dc.identifier.issn: 18729681
dc.identifier.uri: http://hdl.handle.net/11093/1149
dc.description.abstract: The Internet emerged as a powerful infrastructure for worldwide communication and interaction among people. Some unethical uses of this technology (for instance, spam or viruses) have made it challenging to develop mechanisms that guarantee an affordable and secure user experience. This study deals with the massive delivery of unwanted content or advertising campaigns without the consent of the targeted users (also known as spam). Currently, words (tokens) are selected using feature selection schemes and are then used to create feature vectors for training different Machine Learning (ML) approaches. This study introduces a new feature selection method that takes advantage of a semantic ontology to group words into topics and uses those topics to build feature vectors. To evaluate it, we compared the performance of nine well-known ML approaches in conjunction with (i) Information Gain, the most popular feature selection method in the spam-filtering domain; (ii) Latent Dirichlet Allocation, a generative statistical model that allows sets of observations to be explained by unobserved groups that describe why some parts of the data are similar; and (iii) our semantic-based feature selection proposal. The results show the suitability and additional benefits of topic-driven methods for developing and deploying high-performance spam filters. [en]
dc.description.sponsorship: Xunta de Galicia | Ref. ED481B 2017/018 [spa]
dc.description.sponsorship: Xunta de Galicia | Ref. ED431C2016-040 [spa]
dc.description.sponsorship: Agencia Estatal de Investigación | Ref. MTM2017-89422-P [spa]
dc.description.sponsorship: SMEIC/SRA/ERDF | Ref. TIN2017-84658-C2-1-R [spa]
dc.language.iso: eng [en]
dc.publisher: Applied Soft Computing [spa]
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación (PEICTI) 2013-2016/MTM2017-89422-P/ES/NUEVOS AVANCES METODOLOGICOS Y COMPUTATIONALES EN ESTADISTICA NO PARAMETRICA Y SEMIPARAMETRICA
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación (PEICTI) 2013-2016/TIN2017-84658-C2-1-R/ES/INTEGRACION DE CONOCIMIENTO SEMANTICO PARA EL FILTRADO DE SPAM BASADO EN CONTENIDO
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en
dc.title: A new semantic-based feature selection method for spam filtering [en]
dc.type: article [spa]
dc.rights.accessRights: openAccess
dc.identifier.doi: 10.1016/j.asoc.2018.12.008
dc.identifier.editor: https://www.sciencedirect.com/science/article/abs/pii/S1568494618306963
dc.publisher.departamento: Informática [spa]
dc.publisher.departamento: Estatística e investigación operativa [spa]
dc.publisher.grupoinvestigacion: Sistemas Informáticos de Nova Xeración [spa]
dc.publisher.grupoinvestigacion: Inferencia Estatística, Decisión e Investigación Operativa [spa]
dc.date.updated: 2019-01-15T11:34:54Z
dc.computerCitation: pub_title=Applied Soft Computing|volume=76|journal_number=|start_pag=89|end_pag=104 [spa]



    Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International