March 2011
On SA Corpora
Movie Review Data includes:
The sentiment polarity datasets - 1000 positive and 1000 negative reviews, lightly tokenized (contracted forms are preserved, but punctuation is space-separated) and down-cased, one sentence per line. Reviews were labeled according to their star rating.
The sentiment scale datasets - four sets of approximately 1300 “subjective snippets” each from four reviewers....
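Since the polarity data is already tokenized and down-cased, reading it takes only a few lines. The sketch below assumes the reviews are unpacked as one text file per review under `pos/` and `neg/` subdirectories (check the README of the release you download); `load_polarity` is a hypothetical helper, not part of the dataset distribution.

```python
from pathlib import Path

def load_polarity(root):
    """Return a list of (tokens, label) pairs, label 1 = positive, 0 = negative.

    Assumed layout (hypothetical, verify against the release): root/pos/*.txt
    and root/neg/*.txt, one review per file, one sentence per line.
    """
    data = []
    for label_name, label in (("pos", 1), ("neg", 0)):
        for path in sorted(Path(root, label_name).glob("*.txt")):
            # errors="replace" guards against stray non-UTF-8 bytes.
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
            # Punctuation is already space-separated, so whitespace splitting
            # recovers the tokens; sentences are flattened into one list.
            tokens = [tok for line in lines for tok in line.split()]
            data.append((tokens, label))
    return data
```

Flattening discards sentence boundaries; keep `lines` per review instead if a later step needs them.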
February 2011
On Dependencies and Possible Opinions
Extracting Product Features and Opinions from Reviews (Popescu & Etzioni, 2005) presents some interesting ideas at the intersection of SA and parsing. Rather than defining a context window for a product feature as the k words surrounding it, they extract the dependency relations linking the feature to nearby words. This allows them to use a set of domain-independent rules for the extraction of potential...
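The difference between a linear window and dependency context can be shown on a single sentence. This is a minimal sketch of the contrast only, not Popescu & Etzioni's system: the parse is hand-assigned for illustration (not real parser output), and both helper functions are hypothetical.

```python
# Hand-assigned parse (an assumption for illustration): HEADS[i] is the index
# of token i's head word; -1 marks the root.
TOKENS = ["the", "screen", ",", "although", "the", "menus", "are",
          "confusing", ",", "is", "very", "bright"]
HEADS = [1, 11, 7, 7, 5, 7, 7, 11, 7, 11, 11, -1]

def window_context(tokens, i, k):
    """Words within k positions of token i (the linear-window approach)."""
    lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]

def dependency_context(tokens, heads, i):
    """Words one dependency arc away from token i: its head, then its dependents."""
    ctx = []
    if heads[i] >= 0:
        ctx.append(tokens[heads[i]])
    ctx.extend(tokens[j] for j, h in enumerate(heads) if h == i)
    return ctx

print(window_context(TOKENS, 1, 3))         # ['the', ',', 'although', 'the']
print(dependency_context(TOKENS, HEADS, 1)) # ['bright', 'the']
```

For the feature "screen", a window of three words never reaches the opinion word "bright", while a single dependency arc does, however long the intervening clause.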
The case for Dependency Parsing
Dependency links are closer to the semantic relationships between the words
No phrase (non-terminal) nodes: one node per word makes for simpler computation
Word-at-a-time parsing: no need to wait for whole phrases to form
More adequate treatment of languages with variable word order
It is impossible to determine exactly what phrase modifiers modify (think of ambiguous PP attachment), but that is...
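Both the one-node-per-word point and the PP-attachment example can be made concrete by storing a dependency parse as one head index per word. This sketch hand-assigns the two readings of "I saw the man with the telescope" under an assumed convention where the preposition heads its phrase (Universal Dependencies would attach the noun instead); `is_tree` and `attachment_diff` are hypothetical helpers.

```python
WORDS = ["I", "saw", "the", "man", "with", "the", "telescope"]
# One entry per word, no extra phrase nodes; -1 marks the root ("saw").
HEADS_VERB = [1, -1, 3, 1, 1, 6, 4]  # "with" attaches to "saw" (instrument reading)
HEADS_NOUN = [1, -1, 3, 1, 3, 6, 4]  # "with" attaches to "man" (the man has the telescope)

def is_tree(heads):
    """True if heads encodes a single-rooted tree with no cycles."""
    n = len(heads)
    if heads.count(-1) != 1 or any(not -1 <= h < n for h in heads):
        return False
    for start in range(n):
        seen, node = set(), start
        while node != -1:  # follow head links up towards the root
            if node in seen:
                return False  # revisited a node: cycle
            seen.add(node)
            node = heads[node]
    return True

def attachment_diff(words, a, b):
    """List (word, head in a, head in b) for every word whose head differs."""
    name = lambda h: words[h] if h >= 0 else "ROOT"
    return [(words[i], name(a[i]), name(b[i]))
            for i in range(len(words)) if a[i] != b[i]]

print(attachment_diff(WORDS, HEADS_VERB, HEADS_NOUN))  # [('with', 'saw', 'man')]
```

The two readings differ in exactly one array entry, which is why attachment ambiguity is easy to represent in a dependency tree even though nothing in the word string resolves it.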
On Sentiment
Some thoughts inspired by this stellar paper.
Classifying information based on sentiment and opinion is the aim of ongoing research on Sentiment Analysis (SA) and Opinion Mining (OM). Treating concepts like “good” and “bad” as first-class objects presents new challenges to the NLP community: on one hand, heuristics that proved successful for other classification tasks simply cannot be...
Some meta-information
I decided to tumblog short notes on my MA thesis. This blog will be a digest of related literature and experiments, mostly in the form of “notes to self”. If your interests intersect with language technology and computer science, please be gracious about the approximations contained within it: I aim solely to track my own progress and to force myself to write in a presentable manner. I...