Sentiment Analysis and Context Extraction

How do you determine the most popular TV show in the country?

Introduction

The first reaction to this problem is to conduct a survey. Prepare a questionnaire, send an army of field representatives to collect the data, run an online campaign to spread the word. The question is, is it reliable? There are quite a few limitations with the traditional survey mechanism – both in the process and in the outcome.

First, it is a very time-consuming exercise. It can take several weeks to collect the data and even more time to process it. This approach is especially challenging if the intent is to generate ratings on a daily basis.

Second, surveys are prone to error because the number of data points that can be collected on the ground or from an online survey is very small. The veracity of the data is also in question, as the responses are subjective.

Considering all these problems, when our client, who runs a popular TV channel guide portal, approached us to determine the latest trending TV shows in the country, we had to deviate from the traditional approach. By leveraging Social Media, Big Data and Natural Language Processing (NLP), we delivered a unique and robust solution.

Sentiment Analysis

Social Media, The Gold Mine Of Data.

In 2011, Twitter released an interesting promotional video. In it, a young man from New York, engrossed in his book at a break room table, is distracted by a Twitter notification on his phone. He checks it, nonchalantly picks up his coffee mug off the table and resumes reading. A second later, the room starts shaking. He waits for the tremor to pass and puts the mug back on the table.

This commercial was based on a mild earthquake that occurred in the US that year. When it happened, tweets began pouring in from Washington nearly 30 seconds before New Yorkers could feel the tremors. Twitter obviously produced this advertisement to showcase the speed at which it delivers tweets.

But there is also an important underlying message in it: people immediately post about things that ‘move’ them, both literally and figuratively.

Today, people have countless social media platforms on which to share their views. Popular social networks like Facebook and Twitter have a combined membership of over 1 billion. So, a huge amount of data corresponding to people’s reactions to products, movies, TV shows, events, etc. is readily available on the Internet. Since these are spontaneous reactions and not solicited responses, the data carries much more weight. But the question is: How do you capture, analyze and make sense of this ‘Big’ data?

Big Data

According to 2012 IBM estimates, we generate a staggering 2.5 billion gigabytes (GB) of data every day. “If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute,” says Bernard Marr, a Big Data guru. This data explosion poses great challenges to existing technology and infrastructure. We have long dealt with high volumes of data through data warehousing, but these systems were not developed with Big Data in mind and are very expensive. The other challenge is data velocity: so much data is generated in real time that it has to be captured, read, analyzed and moved just as quickly. A traditional data warehousing system would take days to complete this.

The variety of data being generated also poses a challenge. Traditional relational database systems were designed to take in predictable, consistent data structures. Much of the data generated today is unstructured, coming from various sources such as log files, pictures, text messages, emails and social media. IBM estimates that unstructured data comprises up to 75% of the total data generated.

Big Data technologies are helping us address the 3Vs: the Volume, Velocity and Variety of data. In this project, we leveraged Big Data frameworks to capture and store the gold mine of online data.

Natural Language Processing

Computers are far superior to the human brain on many counts, but when it comes to understanding human language, which is full of nuance, emotion and implied intent, they don’t compare. Natural Language Processing (NLP) is the scientific discipline that is trying to close this gap. In terms of text analysis, NLP helps us turn the enormous volume of unstructured text obtained from social media into structured values by identifying keywords and the relationships between them.
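To make the idea of turning unstructured text into structured values concrete, here is a minimal sketch in Python. It is not the solution delivered in this project, whose internals are not described here. It assumes a tiny hand-made sentiment lexicon and two made-up show titles ("Midnight Bakers" and "Harbor Lane"), and the function name `analyze` is hypothetical. It counts how often each show is mentioned and adds up a crude polarity score per post.

```python
import re
from collections import defaultdict

# Hypothetical show titles and a toy sentiment lexicon (illustration only).
SHOWS = ["Midnight Bakers", "Harbor Lane"]
POSITIVE = {"love", "great", "amazing", "brilliant"}
NEGATIVE = {"boring", "hate", "awful", "terrible"}


def analyze(posts):
    """Turn free-text posts into per-show mention counts and sentiment scores."""
    stats = defaultdict(lambda: {"mentions": 0, "score": 0})
    for post in posts:
        text = post.lower()
        # Keyword extraction: the set of distinct lowercase words in the post.
        words = set(re.findall(r"[a-z']+", text))
        # Polarity: positive keywords found minus negative keywords found.
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        # Relationship: attach the post's polarity to every show it mentions.
        for show in SHOWS:
            if show.lower() in text:
                stats[show]["mentions"] += 1
                stats[show]["score"] += polarity
    return dict(stats)


posts = [
    "I love Midnight Bakers, the finale was amazing",
    "Harbor Lane is so boring tonight",
    "Midnight Bakers again tonight. Great cast, but an awful plot twist",
]

for show, values in analyze(posts).items():
    print(show, values)
```

A word-list approach like this misses sarcasm, negation ("not great") and misspellings, which is exactly the nuance the paragraph above refers to. A production system would typically rely on a trained NLP model rather than a fixed lexicon.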
