The transition into the information age has brought a massive proliferation of data, but fortunately there has also been rapid innovation in tools to analyze it. In the past, most large-scale data analysis solutions focused on metrics that are easy to measure: the number of visitors to a webpage, what products visitors purchase, how many ‘likes’ certain posts get. However, focusing only on what is easy to measure can mean missing the most important data.
New methodologies in data analysis seek to change that by tapping the potential of a much wider range of data sources. One of these techniques is text analytics, also known as text mining: a way of transforming raw, unstructured text into structured data that can then be measured and analyzed scientifically. It seeks to quantify sprawling masses of text, such as product reviews, customer service interactions, or comments on a product page, and turn them into measurable data, identifying the “who,” “what,” “when,” “where,” and “why,” as well as the emotional tone of conversations.
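To make “transforming raw, unstructured text into structured data” more concrete, here is a minimal Python sketch of one common starting point, the bag-of-words model, in which each document becomes a row of word counts over a shared vocabulary. The simple regular-expression tokenizer and the sample reviews are illustrative assumptions; commercial text analytics tools use far more sophisticated processing.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn raw documents into a sorted vocabulary plus one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample reviews used only for illustration.
reviews = [
    "Great battery life, great screen.",
    "Battery died after a week.",
]
vocab, matrix = document_term_matrix(reviews)
print(vocab)
print(matrix)
```

Once text is in this tabular form, it can be counted, compared, and fed into ordinary statistical methods like any other structured data.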
The process of carrying out text analysis often includes tasks such as:
- Categorizing information
- Counting the number of times subjects are mentioned
- Identifying the sentiments of text
- Summarizing documents
- Statistically analyzing blocks of text
- Extracting concepts and themes
- Drawing connections between different hyperlinked web pages
- Identifying the relationships between entities in the text
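Two of the tasks above, counting how often subjects are mentioned and identifying sentiment, can be sketched in a few lines of Python. This is a deliberately naive, lexicon-based approach under stated assumptions: the tiny positive and negative word lists are hypothetical, and real tools rely on much larger curated lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicons for illustration only.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "broken", "terrible", "slow"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_mentions(docs: list[str], subjects: set[str]) -> Counter:
    """Count how many times each subject word appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t in subjects)
    return counts


def sentiment_score(doc: str) -> int:
    """Positive minus negative lexicon hits: >0 leans positive, <0 leans negative."""
    tokens = tokenize(doc)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


# Hypothetical customer comments.
comments = [
    "I love the camera, but the battery is terrible.",
    "Great battery, great price.",
    "The app is slow and the screen arrived broken.",
]
print(count_mentions(comments, {"camera", "battery", "screen"}))
print([sentiment_score(c) for c in comments])
```

The first comment scores 0 because one positive and one negative word cancel out, which illustrates why simple word counting is only a starting point for judging the emotional tone of a text.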
Text analytics can also integrate data from multiple sources, combining posts on Twitter, customer service interactions, and general mentions of a product or brand on the internet to give a more comprehensive view of what is being said. Text analysis solutions have also been developed for many languages other than English, including French, Spanish, German, Mandarin, Arabic, and Japanese.
The importance of text analytics is highlighted by its adoption by major companies. Facebook recently released ‘Topic Data’, a system for anonymously analyzing comments and posts about subjects relevant to specific products. On the system’s page, they give the example of a company selling hair de-frizzing products harvesting data from users’ posts about how humidity affects their hair. IBM also recently purchased AlchemyAPI to augment the analytics of its Watson platform, and Microsoft recently purchased the text analytics company Equivio. In addition, email providers widely use text analytics in their anti-spam filters; while these filters are never perfect, their improving accuracy at identifying spam highlights the effectiveness of text analytics.
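To illustrate how text analytics can power a spam filter, below is a minimal sketch of a multinomial naive Bayes classifier, one classic technique for this task. It is not a claim about how any particular email provider’s filter works, the four training messages are hypothetical, and the class assumes at least one message of each label has been trained before classifying.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes both "spam" and "ham" have at least one training message
    before classify() is called.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Record the words of one labelled message."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str) -> float:
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior: fraction of training messages carrying this label.
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            # Add-one smoothing so an unseen word does not zero out the probability.
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        """Return the label with the higher log-probability score."""
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(tokens, label))


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win a free prize now", "spam")
spam_filter.train("free money click now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")
print(spam_filter.classify("claim your free prize"))
print(spam_filter.classify("team meeting monday"))
```

Working in log-probabilities avoids numerical underflow when many small word probabilities are multiplied together; a real filter would be trained on far more messages and combine many additional signals.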
Other practical uses of text analytics include:
- Identifying consumer attitudes towards brands and products
- Checking for plagiarism
- The ‘electronic discovery’ process in legal investigations
- Automatically determining advertisement placement
- Monitoring online conversations for national security
- Indexing large publication databases in academic and scientific fields
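As one example from the list above, a basic plagiarism check can compare two texts by their overlapping word sequences. The sketch below uses word 3-grams (“shingles”) and Jaccard similarity, a common textbook approach; the shingle size and sample sentences are illustrative assumptions rather than settings used by any specific plagiarism tool.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 = disjoint, 1.0 = identical sets)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical texts: a source, a lightly reworded copy, and an unrelated sentence.
original = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox leaps over the lazy dog"
unrelated = "text analytics turns raw text into structured data"

print(jaccard_similarity(original, suspect))
print(jaccard_similarity(original, unrelated))
```

Changing a single word breaks every shingle that contains it, so the reworded copy scores 0.4 rather than 1.0 while the unrelated sentence scores 0.0. Comparing large collections this way is usually made efficient with hashing techniques such as MinHash.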
Thus, text analytics can be valuable for everyone from small businesses to multinational corporations. As it can be a complicated field, companies can benefit from outside help in the form of a technology consultant with expertise in this area. A good technology consulting firm can advise on the most appropriate software and help organizations get the most value from its use. Since it is such a new and diverse field, we still do not know all potential uses of text analytics, and as such, businesses could be surprised by innovative ways in which it could help them.
However, there is a limit to how much text reading can be automated. Having a program go over the comments section of a business’s Facebook page will never be the same as reading it personally. Yet with the sheer amount of data out there, it is often no longer feasible to have a human look at everything, as labour costs add up quickly compared to the cost of running a program. In addition, computer programs can be more objective than human readers, who tend to make mistakes. For example, a human reader may pay too much attention to certain passages of text over others, reading their own perceptions and emotional biases into the raw data.
It should also be noted that in some situations there could be copyright issues with text mining if the analysis is performed on copyrighted data. Just because someone has the right to read and access a certain piece of text does not mean they can carry out an automated analysis of it. This is more of a problem in countries with less permissive copyright laws. Users should research whether there are any relevant copyright issues before running an analysis on any non-public data.
Although it is difficult to say for certain, most estimates say that more than 80% of data is in the form of text. This suggests that there is enormous commercial potential in the field of text mining. While text mining was originally developed by intelligence agencies during the Second World War, only in recent years has the technology truly begun to come into its own. And because of its complexity, it is a field with huge potential for growth as machines learn to read more and more like their human counterparts. In the end, we can only guess at how effective the technique will become, but its potential is truly revolutionary, and the many diverse uses of text analytics available today show that we are already beginning to realize it.