When we launched our project, nvest, we received a lot of positive feedback from traders about the cool charts, technical indicators and statistics we posted to show how traders and investors can use data in their decision-making process. However, one area that I think has not gotten the attention it deserves is text mining/natural language processing (NLP). Sure, everyone understands what a sentiment indicator is or has seen a word cloud, but that hardly scratches the surface of what you can do with NLP. Today I am going to walk through an example of a recent word cloud we built on the OmiseGo whitepaper.
A simple word cloud counts how often each word appears in a document and sizes each word according to that count. So if the word ‘car’ appears 20 times and the word ‘muffler’ only 10 times, the word ‘car’ would be proportionally bigger in the word cloud. The premise behind a word cloud is that you can get a quick and dirty overview of what topics are discussed in a document by looking at the counts or frequencies of the words used.
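The counting step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code behind our word cloud; the function name word_counts and the sample sentence are made up for the example.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, split it into words, and count each exact word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Made-up sample text for illustration.
sample = "The car needs a muffler. The car is old, but the car runs."
counts = word_counts(sample)

# A word cloud would draw each word at a size proportional to its count.
for word, n in counts.most_common(3):
    print(word, n)
```

Note that a filler word like ‘the’ scores as high as ‘car’ here, which is why word clouds usually drop very common words before drawing anything.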
The problem with using simple word frequencies
English has many rules governing how words are used. Verbs are conjugated for tense (e.g. buy, buying, bought), nouns are generally consistent but may take possessive forms (Jake’s car), and there are many others. When you ask a computer to do a simple count of words, it doesn’t know any of these rules and just looks for 100% matches. To the computer, ‘buy’ and ‘buying’ are not the same word.
However, researchers in NLP have been working on this issue for a while, and today there is plenty of software that can deal with most of these issues quite easily, in English at least. One popular option, and the one we used, is the spaCy package, which categorizes words and can convert them into a ‘lemma’: the base or dictionary form of a word, so that buy, buying and bought are all counted as ‘buy’.
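To see why lemmas matter for counting, here is a small sketch that compares raw counts with lemma counts. It assumes a tiny hand-written lookup table (TOY_LEMMAS, invented for this example) in place of a real lemmatizer, so it runs without installing anything; it is not how spaCy works internally.

```python
from collections import Counter

# Hypothetical lookup table standing in for a real lemmatizer.
# A library such as spaCy derives lemmas from its language pipeline instead.
TOY_LEMMAS = {"buying": "buy", "bought": "buy", "buys": "buy"}

def lemma_counts(words):
    """Count words after mapping each one to its lemma (or itself if unknown)."""
    return Counter(TOY_LEMMAS.get(w.lower(), w.lower()) for w in words)

words = ["Buy", "buying", "bought", "sell"]
raw = Counter(w.lower() for w in words)   # exact-match counting
merged = lemma_counts(words)              # counting by lemma

print(raw)
print(merged)
```

With exact matching, each verb form is counted once on its own; after mapping to lemmas, all three forms add up under ‘buy’. In spaCy itself, the lemma of each token is exposed as token.lemma_ once the text has been processed by a loaded language pipeline.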
In the above word cloud you will notice that we also have a few terms connected with an underscore. That’s because some multi-word terms function like single words. For example, ‘smart contract’ is really a noun and can be treated as one word if the end goal is to count frequencies.
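One simple way to get underscore-joined terms is to replace known phrases before counting. The sketch below assumes a hand-picked phrase list (PHRASES, where ‘payment network’ is a hypothetical entry) and a made-up sample sentence; it is one possible approach, not necessarily the one we used.

```python
import re
from collections import Counter

# Hypothetical phrase list; in practice it would be chosen per document.
PHRASES = ["smart contract", "payment network"]

def join_phrases(text, phrases):
    """Replace each known multi-word term with an underscore-joined token."""
    text = text.lower()
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    return text

# Made-up sample text for illustration.
sample = "A smart contract runs on chain. Each smart contract is code."
tokens = re.findall(r"[a-z_']+", join_phrases(sample, PHRASES))
counts = Counter(tokens)

print(counts["smart_contract"])
```

Because the phrase is joined before the text is split into words, ‘smart’ and ‘contract’ no longer show up as separate entries. This exact-match version would miss the plural ‘smart contracts’, so in practice it works best after the words have been reduced to lemmas.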
Next Steps
Counting words and putting them in a word cloud is easy, but for more advanced analysis you may want to compare two documents. For example, the US Federal Reserve releases a statement at each meeting and the financial markets pay close attention to each word. In this case it’s not so much the frequency of words that is important, but which words have changed. Or maybe you are looking on Reddit and want to see how many times the word HODL was used in the past week vs the average for the past 2 years. These are just two of the many examples of tasks that NLP is good for.
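Both ideas can be sketched with the standard library. The sentences and counts below are invented for illustration (they are not real Federal Reserve statements or Reddit data), and the function names changed_words and ratio_to_average are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def changed_words(old_text, new_text):
    """Return (added, removed): words only in the new text, words only in the old."""
    old, new = set(tokenize(old_text)), set(tokenize(new_text))
    return new - old, old - new

# Made-up sentences, not real Federal Reserve statements.
previous = "Economic activity has been rising at a moderate rate"
current = "Economic activity has been rising at a solid rate"
added, removed = changed_words(previous, current)
print(sorted(added), sorted(removed))

def ratio_to_average(recent_count, historical_counts):
    """Compare a recent count with the mean of earlier counts."""
    average = sum(historical_counts) / len(historical_counts)
    return recent_count / average

# Hypothetical weekly counts of the word HODL.
print(ratio_to_average(30, [10, 20, 15, 15]))
```

The first part ignores word order and only reports vocabulary that appeared or disappeared; the second expresses the latest count as a multiple of the historical average, so a value well above 1 would point to an unusual spike.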
At nvest, our expert Machine Learning team is ready to build you a technology platform that provides these types of tools, without you having to program or know all the grammar rules. If that sounds like something you are interested in, join us at www.nvest.ai.