Pars Orationis

On SA Corpora

Movie Review Data includes:

  • The sentiment polarity datasets - 1000 positive and 1000 negative reviews, lightly tokenized (contracted forms are preserved, but punctuation is separated by spaces) and lowercased, one sentence per line. Reviews were labeled positive or negative according to their star rating.
  • The sentiment scale datasets - four sets of approximately 1300 “subjective snippets” each, from four reviewers. Each set is paired, line by line, with three label files: one with three classes, one with four classes, and a finer-grained one (values in [0, 1] with a step size of 0.1 or smaller).
  • The subjectivity datasets - 5000 subjective snippets (from reviews) and 5000 objective snippets (from movie plots), one per line.
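
Because the sentiment scale snippets and their labels correspond line by line, reading them amounts to zipping two files together. Here is a minimal sketch of that pairing; the function name and the sample lines are made up for illustration, and the real file names and label formatting should be checked against the dataset's README.

```python
def pair_snippets_with_labels(snippet_lines, label_lines, cast=str):
    """Pair each snippet with the label found on the same line number.

    `snippet_lines` and `label_lines` can be open file objects or lists of
    strings. `cast` converts the raw label text (e.g. int for class labels,
    float for the fine-grained scale). Hypothetical helper, not part of the
    dataset distribution.
    """
    snippets = [line.rstrip("\n") for line in snippet_lines]
    labels = [cast(line.strip()) for line in label_lines]
    # The per-line correspondence only holds if both files have equal length.
    if len(snippets) != len(labels):
        raise ValueError(
            f"{len(snippets)} snippets but {len(labels)} labels; files are misaligned"
        )
    return list(zip(snippets, labels))


# Made-up sample lines standing in for a snippet file and a fine-grained label file.
sample_snippets = ["a great film\n", "dull and slow\n"]
sample_labels = ["0.9\n", "0.2\n"]
pairs = pair_snippets_with_labels(sample_snippets, sample_labels, cast=float)
```

With real data, the two arguments would be the opened snippet file and one of its three label files; the length check catches a mismatched pair early.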

The MPQA Opinion Corpus is a database comprising five subsets of extensively (manually) annotated data. The annotations include:

  • Agent - Marks phrases that refer to sources of private states and speech events, or phrases that refer to agents who are targets of an attitude.
  • Expressive-subjectivity - Marks expressive-subjective elements, i.e., words and phrases that indirectly express a private state. For example, ‘fraud’ and ‘daylight robbery’ are annotated as expressive-subjective elements in the example sentence given in the corpus documentation.
  • Direct-subjective - Marks direct mentions of private states and speech events (spoken or written) expressing private states.
  • Objective-speech-event annotation - Marks speech events that do not express private states.
  • Attitude - Marks the attitudes that compose the expressed private states (attitude is discussed in greater detail in the excerpt “Representing attitude and targets”).
  • Target - Marks the targets of the attitudes, i.e., what the attitudes are about or what the attitudes are directed toward. 
  • Inside - The term ‘inside’ refers to the words inside the scope of a direct private state or speech event phrase.

The Customer Review Datasets comprise two sets of lightly tokenized reviews, covering five and nine products respectively. Each product feature is immediately followed by a positive or negative rating tag (e.g. [+3]). Additional metadata indicates whether the feature is absent from the sentence, whether pronoun resolution may be needed, and whether the opinionated sentence is a comparison or a suggestion.
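
The feature-plus-rating convention is easy to pull apart with a regular expression. The sketch below assumes annotations look like `picture quality[+3], battery life[-2]`, i.e. comma-separated features each followed by a signed integer in brackets; the exact line layout and the spelling of the other metadata tags are assumptions here and should be confirmed against the dataset's own documentation.

```python
import re

# A feature name (no commas or brackets) immediately followed by a signed
# rating tag such as [+3] or [-2]. Other bracketed tags are ignored.
FEATURE_TAG = re.compile(r"([^,\[\]]+?)\s*\[([+-]\d+)\]")


def extract_feature_ratings(annotation):
    """Return (feature, rating) pairs found in an annotation string.

    Hypothetical helper for illustration; it only handles the signed
    rating tags and skips any other metadata tags.
    """
    return [
        (name.strip(), int(score))
        for name, score in FEATURE_TAG.findall(annotation)
    ]


# Made-up annotation string in the assumed format.
ratings = extract_feature_ratings("picture quality[+3], battery life[-2]")
```

Summing the ratings per feature across all sentences would then give a rough per-feature sentiment score for a product.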

    Mar 9, 2011