Some thoughts inspired by this stellar paper.
Classifying information based on sentiment and opinion is the aim of the ongoing research on Sentiment Analysis (SA) and Opinion Mining (OM). Treating concepts like “good” and “bad” as first-class objects presents new challenges to the NLP community: on the one hand, heuristics that proved successful for other classification tasks simply cannot be applied to this kind of information. On the other, the challenges of discerning opinion come bundled with those of processing user-generated content.
There are several reasons to investigate the “[…] computational treatment of opinion, sentiment and subjectivity”. Improved search accuracy and recommendation systems are obvious candidates - users want to know how others feel about certain objects or services. Corporations are eager to collect intelligence on how their products are perceived. Classifying information as “hateful” might help governments prevent crime. Summarization can certainly benefit from isolating subjective paragraphs, to either present or omit them. Classifying opinions as liberal or conservative allows for more accurate predictions of election outcomes. Intelligent systems could boost their performance by adapting to the mood of the user.
One of the first to undertake this task was Peter D. Turney, as described in his article “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”.
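He does so by using Pointwise Mutual Information (PMI) to calculate the Semantic Orientation (SO) of phrases extracted from a review. In the paper's notation:

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{p(\mathit{word}_1 \,\&\, \mathit{word}_2)}{p(\mathit{word}_1)\, p(\mathit{word}_2)} \tag{1}$$

$$\mathrm{SO}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{``excellent''}) - \mathrm{PMI}(\mathit{phrase}, \text{``poor''}) \tag{2}$$

The PMI is estimated by issuing queries to a search engine, which turns it into PMI-IR. The estimate of SO derived from (1) and (2) is

$$\mathrm{SO}(\mathit{phrase}) = \log_2 \frac{\mathrm{hits}(\mathit{phrase} \ \text{NEAR} \ \text{``excellent''}) \cdot \mathrm{hits}(\text{``poor''})}{\mathrm{hits}(\mathit{phrase} \ \text{NEAR} \ \text{``poor''}) \cdot \mathrm{hits}(\text{``excellent''})} \tag{3}$$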
The review is then classified by the average SO of its phrases: recommended if the average is positive, not recommended otherwise.
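To make the procedure concrete, here is a minimal sketch of the PMI-IR estimate and the averaging step. It assumes SO is computed as the log-ratio of search-engine hit counts for a phrase near “excellent” versus near “poor”, as in Turney's paper. The function names and the hit counts are hypothetical, made up purely for illustration, and the small smoothing constant stands in for the 0.01 the paper adds to the hits to avoid division by zero.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Estimate SO of a phrase from hit counts (PMI-IR style log-ratio)."""
    # Smoothing keeps the ratio defined when a phrase never co-occurs
    # with one of the two reference words.
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_orientations):
    """Label a review by the sign of the average SO of its phrases."""
    if not phrase_orientations:
        raise ValueError("a review needs at least one scored phrase")
    average = sum(phrase_orientations) / len(phrase_orientations)
    label = "recommended" if average > 0 else "not recommended"
    return label, average


# Hypothetical hit counts for two phrases of one review.
scores = [
    semantic_orientation(400, 100, 1000, 1000),  # leans towards "excellent"
    semantic_orientation(90, 100, 1000, 1000),   # leans slightly towards "poor"
]
label, average = classify_review(scores)
print(label, round(average, 3))
```

Note that a single strongly oriented phrase can dominate the average, which is one way a review as a whole can end up mislabeled.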
The results invite interesting reflections on the challenges of sentiment analysis. The classifier does a good job with automobile reviews and a poor one with movie reviews. This might be because the quality of a car seems to be a function of the quality of its parts, yet a good director does not necessarily make for a good movie. Turney points us to examples such as this one:
Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.
This sentence does not contain any word that is in itself easily associated with negative sentiment. To classify this review correctly we have to rely on our understanding of how words like “special” depend semantically on polarity reversers like “nothing”. Turney’s classifier is misled by “very talented” and classifies the review as positive.
It is naive to think that OM can be done by keywords alone: sentiment is a subtle phenomenon. Positive features can be apparent to a human and obscure to a classifier just as often as the reverse (the feature “still” has been found to be a good indicator of positivity - so you can run and tell that, luddites!). Term presence has been shown to be more significant than term frequency; a certain token in a certain position can affect the overall subjectivity of the text. Parts of speech such as adjectives are strongly correlated with subjective content; patterns can be useful to disambiguate opinion.
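The difference between term presence and term frequency is easy to see in code. The sketch below is only an illustration: the vocabulary, the example sentence and the whitespace tokenization are all assumptions chosen for brevity, not anything prescribed by the literature.

```python
from collections import Counter


def frequency_features(tokens, vocabulary):
    """One count per vocabulary term: how many times it occurs."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def presence_features(tokens, vocabulary):
    """One binary flag per vocabulary term: does it occur at all?"""
    seen = set(tokens)
    return [1 if term in seen else 0 for term in vocabulary]


# Hypothetical review snippet and vocabulary.
tokens = "good good good plot but bad acting".split()
vocabulary = ["good", "bad", "plot"]

print(frequency_features(tokens, vocabulary))  # [3, 1, 1]
print(presence_features(tokens, vocabulary))   # [1, 1, 1]
```

With frequency features the repeated “good” outweighs the single “bad”; with presence features both count equally, so repetition alone cannot tip the balance.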