Research: Sentiment Analysis
I have worked extensively with sentiment analysis systems, applying them to international and domestic news coverage, historical and contemporary material, document collections, social media, and communications flows for academic, governmental, and corporate projects. I have also worked on cross-domain and learning systems that maximize the accuracy of such systems by tailoring them to specific disciplines.
Sentiment mining, also known as "tone analysis," uses complex perceptual models to estimate either the average emotional response of a reader to a given passage of text or the emotional state of the text's author. Additional dimensions may measure the potential "energy" and "persuasiveness" of the text. Sentiment mining has become an increasingly popular tool for corporate "brand mining," offering a rough estimate of the overall tone of coverage of an organization or its products.
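As a rough illustration of the simplest whole-document approach, the sketch below scores a passage by averaging per-word scores from a small lexicon. The `LEXICON` values, the `tokenize` and `document_tone` names, and the plain-averaging rule are hypothetical assumptions chosen for illustration, not the perceptual models described above; real systems use far larger lexicons and also handle negation, intensifiers, and other context.

```python
import re

# Hypothetical toy lexicon: word -> score from -1 (negative) to +1 (positive).
# Real lexicons hold thousands of human- or machine-scored entries.
LEXICON = {
    "love": 0.9, "great": 0.8, "good": 0.5,
    "hate": -0.9, "terrible": -0.8, "bad": -0.5,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def document_tone(text, lexicon=LEXICON):
    """Return the mean score of the lexicon words in `text` (0.0 if none)."""
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(document_tone("The lens is great."))                          # 0.8
print(document_tone("I love the lens but hate the battery life."))  # 0.0
```

The second call shows the weakness of scoring at the document level: one strongly positive and one strongly negative word cancel out, so a clearly opinionated review comes back as neutral.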
- Whole-Document vs Entity Sentiment. Basic systems estimate only the overall tone of a given document. A blogger who posts that she loves the lens of her new camera but hates its battery life could receive a net score of neutral, because the strongly positive and strongly negative scores cancel each other out at the document level. Entity-level systems go further, separating which aspects of the product or brand the writer likes and dislikes.
- Human-Trained vs Learning Systems. Most current sentiment mining systems are built by having a large team of human editors (often college students) read through a dictionary or a large body of documents and score each word on a scale from negative to positive; these word scores are then aggregated to score a passage. Learning systems instead learn the "contexts" in which positive and negative words appear and continually update their internal lexicons with new words and "sayings," allowing them to adapt as language use changes.
- Cross-Domain. Sentiment systems tend to perform best on the kind of text they were originally trained on. A system trained on movie reviews typically performs poorly when scoring computer reviews or news articles. Domain customization and cross-domain techniques adjust a system's internal models to work on data from other disciplines, which requires an understanding of both language use and perceptual models.
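A crude way to see the difference between whole-document and entity-level scoring is to split a review into clauses and credit each clause's tone to the product aspects it mentions. This is a hypothetical heuristic sketch: the `ASPECTS` set, the clause-splitting rule (punctuation plus the word "but"), the `aspect_tone` function, and the toy `LEXICON` are all assumptions, and real entity-level systems rely on parsing and learned models rather than simple clause splitting.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon and aspect list for camera reviews.
LEXICON = {
    "love": 0.9, "great": 0.8, "good": 0.5,
    "hate": -0.9, "terrible": -0.8, "bad": -0.5,
}
ASPECTS = {"lens", "battery", "screen", "price"}

# Crude clause boundaries: punctuation, or the contrastive word "but".
CLAUSE_SPLIT = re.compile(r"[.,;!?]|\bbut\b")

def aspect_tone(text, lexicon=LEXICON, aspects=ASPECTS):
    """Credit each clause's summed word score to the aspects it mentions."""
    totals = defaultdict(float)
    for clause in CLAUSE_SPLIT.split(text.lower()):
        tokens = re.findall(r"[a-z']+", clause)
        score = sum(lexicon.get(t, 0.0) for t in tokens)
        # Each aspect named in the clause inherits that clause's tone once.
        for aspect in aspects.intersection(tokens):
            totals[aspect] += score
    return dict(totals)

print(aspect_tone("I love the lens but hate the battery life."))
# {'lens': 0.9, 'battery': -0.9}
```

On the camera example, the lens and the battery receive opposite scores instead of a single neutral average, which is the information a brand or product team usually wants.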
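Learning systems vary widely. One very simplified way to learn from "context" is to give an unseen word the average score of the known lexicon words it co-occurs with across many sentences. The sketch below assumes sentence-level co-occurrence and a hypothetical `expand_lexicon` function with a `min_count` threshold; it illustrates lexicon expansion in general, not any particular system. Run over in-domain text, the same idea is one simple way to pick up domain-specific vocabulary such as "sluggish" in product reviews.

```python
import re
from collections import defaultdict

def expand_lexicon(sentences, seed, min_count=2):
    """Hypothetical sketch: infer scores for new words from co-occurring seed words.

    Each word not in `seed` receives the mean "context" score of the
    sentences it appears in, where a sentence's context score is the mean
    of the seed words it contains. Words seen in fewer than `min_count`
    scored sentences are skipped. Returns a new dict; `seed` is unchanged.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sentence in sentences:
        # Use a set so a word repeated within one sentence counts once.
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        seed_scores = [seed[t] for t in tokens if t in seed]
        if not seed_scores:
            continue  # no known words, so no context to learn from
        context = sum(seed_scores) / len(seed_scores)
        for t in tokens:
            if t not in seed:
                sums[t] += context
                counts[t] += 1
    learned = dict(seed)
    for word, n in counts.items():
        if n >= min_count:
            learned[word] = sums[word] / n
    return learned

seed = {"love": 0.9, "hate": -0.9}
corpus = [
    "I love this crisp lens",
    "Crisp photos, love them",
    "I hate the sluggish battery",
    "Sluggish startup, I hate it",
]
learned = expand_lexicon(corpus, seed)
print(learned["crisp"], learned["sluggish"])  # prints: 0.9 -0.9
```

Note that function words such as "i" also pick up scores here; a practical system would filter stopwords and use a stronger association measure (for example, pointwise mutual information) over a much larger corpus.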