Can Text and Sentiment Analysis Tools be Trusted to Interpret Human Data?
If you’ve read my other blog (The Importance of Humanising Insight), you’ll know that I don’t believe in artificial intelligence as a replacement for the human researcher, but I do believe in its benefits as a supporting technology.
For me, text and sentiment analysis tools are no different. They are a fantastic way of saving time and resource in certain areas. No longer do research professionals have to digest masses of content, from social media posts to verbatim survey responses, and tag it with topics, themes and sentiments. Text and sentiment analysis tools take care of this, allowing for a swifter project turnaround. However, there are caveats to their use.
Interpreting Contextual Qual
Have you ever misread a text or email, or missed the context of it? I’m guessing the answer is ‘yes’. At the end of the day, everyone is human! And it’s for this reason that I would never be totally comfortable relinquishing control and trusting any analytics software to interpret text and sentiment. If we can make mistakes ourselves, how is software (however intelligent) meant to get it right 100% of the time?
There’s a whole host of areas where text and sentiment analysis tools can fall down when you think about the interpretation of text based on the context it’s being used in.
Take the following sentence, “Great — I’ve used up my data allowance this month!”
Whilst I can tell by reading this that the person writing it is likely showing annoyance, analysis tools could well categorise it as positive. How can they know that ‘great’ isn’t actually great in this instance, but is in fact a negative?
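To make that concrete, here is a minimal sketch of the simplest kind of sentiment scoring: counting positive and negative words. The tiny word list and the `naive_sentiment` function are invented purely for illustration and are not taken from any real tool, which would be far more sophisticated. Even so, it shows how a word-level approach can be fooled by sarcasm.

```python
import re

# Hypothetical mini-lexicon, made up for this illustration only.
LEXICON = {
    "great": 1, "love": 1, "brilliant": 1,
    "terrible": -1, "annoying": -1, "hate": -1,
}

def naive_sentiment(text):
    """Label text by summing word scores; context is ignored entirely."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = naive_sentiment("Great - I've used up my data allowance this month!")
print(label)  # the sarcastic complaint is labelled "positive"
```

The only word the scorer recognises is ‘great’, so the complaint comes out as positive. Nothing in a simple word count tells it that using up a data allowance is bad news.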
Have you ever thought about how many emojis there are on your smartphone? And in how many contexts each could be used? And that’s before you even start thinking about the use of emoticons in place of certain words on social media.
People are constantly looking for creative ways to cut back on their words in tweets to fit within the 140-character limit, and emojis are a handy way of doing this. Clearly they don’t think about the inconvenience it causes to text analytics.
The Complexity of Language
Language is a complex thing. A recent BBC article, ‘The language rules we know – but don’t know we know’, highlights this with some brilliant examples. If you’ve ever tried to explain an English phrase to someone whose first language isn’t English, you’ll realise just how many linguistic anomalies there are.
For example, why are you ‘on the bus’, ‘in a canoe’ but ‘on a cruise ship’? As a native English speaker, I both say and understand these phrases naturally. There is inevitably a reason for the difference, but I don’t know what it is, nor does it matter to me… But it matters to analysis software. Without an understanding of the nuances of all languages, text analysis will be flawed. And that is a huge NLP undertaking.
An Ever Growing Dictionary
Last month the Oxford English Dictionary released a new update including over 1,000 new entries. Gen Z are coming up with more and more slang all the time (some of which even we Millennials struggle to understand!). Each area of every country has its own colloquial terms, and across countries with a shared language the same word can mean different things. This makes text analysis via technology alone incredibly hard. At the least, it creates areas where analysis quality may be lacking and must be monitored.
Using Text And Sentiment Analysis In Practice
As Brandwatch explained, sentiment analytics software must evolve constantly to learn and account for the equivalent evolution in our language. As a researcher, it’s also vital to be able to re-categorise sentiment should you disagree with the automated outcomes. When selecting your analytics software, take both of these points into account.
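As a rough illustration of what re-categorising means in practice, the sketch below keeps the automated label and the researcher’s correction side by side, with the human judgement taking priority whenever one exists. The `TaggedComment` class and its field names are hypothetical and do not describe how any particular product works.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaggedComment:
    """One piece of text with its automated label and an optional human override."""
    text: str
    auto_label: str                       # what the software decided
    reviewer_label: Optional[str] = None  # set only if a researcher disagrees

    @property
    def final_label(self) -> str:
        # The researcher's judgement wins whenever one has been recorded.
        if self.reviewer_label is not None:
            return self.reviewer_label
        return self.auto_label

comment = TaggedComment(
    "Great - I've used up my data allowance this month!",
    auto_label="positive",
)
comment.reviewer_label = "negative"  # researcher spots the sarcasm
print(comment.final_label)
```

Keeping the original automated label rather than overwriting it also leaves a record of where the software and the researcher disagreed, which is useful when monitoring analysis quality over time.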
Whilst it’s clear that text and sentiment analysis software offers some benefits, I am not ready to hand over the reins completely, and I may never be. For the reasons above, I will continue to sense-check any automated analysis before interpretation and client reporting begins.