A methodology for the resolution of cashtag collisions on Twitter – A natural language processing & data fusion approach
DATE: 2019-08-01
UNIVERSAL IDENTIFIER: http://hdl.handle.net/11093/4186
EDITED VERSION: https://linkinghub.elsevier.com/retrieve/pii/S0957417419301812
DOCUMENT TYPE: article
ABSTRACT
Investors utilise social media such as Twitter as a means of sharing news about financial stocks
listed on international stock exchanges. Company ticker symbols are used to uniquely identify companies
listed on stock exchanges and can be embedded within tweets to create clickable hyperlinks referred to
as cashtags, allowing investors to associate their tweets with specific companies. The main limitation is
that identical ticker symbols are used on exchanges all over the world, so searching Twitter for such a
cashtag returns a stream of tweets matching any company to which the cashtag refers; we refer to this as
a cashtag collision. Colliding cashtags could sow confusion among investors seeking news about a specific
company. Resolving this issue would benefit investors who rely on the speed of tweets for financial
information, saving them precious time. We propose
a methodology to resolve this problem which combines Natural Language Processing and Data Fusion
to construct company-specific corpora to aid in the detection and resolution of colliding cashtags, so
that tweets can be classified as being related to a specific stock exchange or not. Supervised machine
learning classifiers are trained twice on the tweets: once on a count vectorisation of the tweet text,
and again with the addition of features contained in the company-specific corpora. We validate the
cashtag collision methodology by carrying out an experiment involving companies listed on the London
Stock Exchange. Results show that several machine learning classifiers benefit from the use of the custom
corpora, yielding higher classification accuracy in the prediction and resolution of colliding cashtags.
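The two training passes described in the abstract (count vectorisation alone, then count vectorisation plus corpus-derived features) can be illustrated with a minimal, stdlib-only Python sketch. Everything in it is a hypothetical stand-in, since the abstract names neither the classifiers nor the exact corpus features. The sketch substitutes a hand-written multinomial Naive Bayes, a toy corpus (`COMPANY_CORPUS`), a single pseudo-token (`__corpus_hit__`) that counts corpus matches, and invented tweets about a made-up ticker `$XYZ`.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\$?[a-z0-9]+")  # keeps the "$" on cashtags such as "$xyz"

# Hypothetical company-specific corpus for an LSE-listed company (illustrative only).
COMPANY_CORPUS = {"plc", "pence", "ftse", "london", "lse"}


def count_vectorise(text):
    """Bag-of-words term counts for a tweet."""
    return Counter(TOKEN_RE.findall(text.lower()))


def with_corpus_features(text, corpus=COMPANY_CORPUS):
    """Term counts plus a hypothetical pseudo-token counting corpus hits."""
    counts = count_vectorise(text)
    hits = sum(k for tok, k in counts.items() if tok in corpus)
    return counts + Counter({"__corpus_hit__": hits})  # "+" drops zero counts


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
            self.vocab.update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_priors = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_score(c):
            s = self.log_priors[c]
            for tok, k in doc.items():
                if tok in self.vocab:  # ignore terms never seen in training
                    p = (self.counts[c][tok] + self.alpha) / (
                        self.totals[c] + self.alpha * v
                    )
                    s += k * math.log(p)
            return s

        return max(self.classes, key=log_score)


# Hypothetical labelled tweets: 1 = about the LSE-listed company, 0 = a colliding company.
TRAIN = [
    ("$XYZ plc shares up 3 pence on the FTSE today", 1),
    ("$XYZ results lift London listed plc", 1),
    ("$XYZ falls on Nasdaq after earnings call", 0),
    ("$XYZ options volume spikes on Wall Street", 0),
]
texts = [t for t, _ in TRAIN]
labels = [y for _, y in TRAIN]

# Pass 1: count vectorisation only. Pass 2: count vectorisation plus corpus features.
text_only = MultinomialNB().fit([count_vectorise(t) for t in texts], labels)
with_corpus = MultinomialNB().fit([with_corpus_features(t) for t in texts], labels)

if __name__ == "__main__":
    tweet = "$XYZ plc gains in London trading"
    print(text_only.predict(count_vectorise(tweet)),
          with_corpus.predict(with_corpus_features(tweet)))
```

Running the script prints `1 1`: on this toy data both passes assign the test tweet to the LSE-listed company. Encoding corpus evidence as an extra count lets the same classifier consume it without any change to the model; the features used in the actual study may well differ.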