RT Journal Article
T1 Improvements for research data repositories: The case of text spam
A1 Vázquez, Ismael
A1 Novo Lourés, María
A1 Pavón Rial, Maria Reyes
A1 Laza Fidalgo, Rosalía
A1 Méndez Reboredo, José Ramón
A1 Ruano Ordás, David Alfonso
K1 1203.17 Informática
AB Current research has evolved in such a way scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure the possibility of reproducing the results and comparing them with those obtained through other approximations. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study has analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. Particularly, we identify and discuss the following demanding functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licencing issues and user rights. To show the introduced functionality, we created STRep (Spam Text Repository) web application which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep in the URL https://rdata.4spam.group to facilitate understanding of this study
PB Journal of Information Science
SN 01655515
YR 2023
FD 2023-04
LK http://hdl.handle.net/11093/7491
UL http://hdl.handle.net/11093/7491
LA eng
NO Journal of Information Science, 49(2), 285-301 (2023)
NO Xunta de Galicia | Ref. ED431C 2021/44
DS Investigo
RD 19-abr-2025