Computational analysis of text sentiment
This project addresses the problem of automatically
extracting sentiment, or subjective content, from any
given text. The objectives of the project are, first,
to establish the best methods and algorithms for
extracting sentiment from texts and, second, to implement
a system that can perform sentiment classification
automatically on a large corpus.
Sentiment is defined as subjective content: the positive or
negative views and opinions that a text expresses towards
its subject matter. The text may be, for example, an opinion
piece in a newspaper, a movie review, a report on a new
product, an e-mail message, or a post on a bulletin board.
The hypothesis is that, given a text, we can determine
whether it contains subjective material and, if it does,
determine whether its sentiment is positive or negative
by parsing its discourse structure.
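The two-step hypothesis above (decide first whether a text is subjective, then whether its sentiment is positive or negative, giving some parts of the discourse more weight than others) can be illustrated with a minimal lexicon-based sketch. Everything in it is an assumption for illustration: the word scores in LEXICON, the min_hits subjectivity threshold and the per-segment weights are hypothetical, and the weights merely stand in for information a discourse parser would supply. It is not the project's dictionary or its SO-CAL implementation.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical semantic-orientation lexicon on a -5..+5 scale.
# Entries and values are illustrative only, not the project's dictionary.
LEXICON = {
    "excellent": 5, "good": 3, "enjoyable": 3, "fine": 1,
    "weak": -2, "boring": -3, "bad": -3, "awful": -5,
}


def segment_score(segment: str) -> Tuple[float, int]:
    """Return the summed orientation of a segment and the number of lexicon words in it."""
    words = re.findall(r"[a-z']+", segment.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return float(sum(hits)), len(hits)


def classify(segments: List[str],
             weights: Optional[List[float]] = None,
             min_hits: int = 1) -> str:
    """Label a text given as a list of discourse segments.

    Stage 1 (subjectivity): fewer than `min_hits` lexicon words -> "objective".
    Stage 2 (polarity): sign of the weighted sum of segment scores.
    `weights` is a hypothetical stand-in for discourse-structure information;
    by default every segment counts equally.
    """
    if weights is None:
        weights = [1.0] * len(segments)
    if len(weights) != len(segments):
        raise ValueError("need exactly one weight per segment")

    total, hits = 0.0, 0
    for seg, weight in zip(segments, weights):
        score, n = segment_score(seg)
        total += weight * score
        hits += n

    if hits < min_hits:
        return "objective"
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


review = ["The plot is boring", "but the acting is excellent"]
print(classify(review))              # positive: -3 + 5 = 2
print(classify(review, [2.0, 0.5]))  # negative: -6 + 2.5 = -3.5
print(classify(["The film opens on Friday"]))  # objective: no lexicon words
```

The same words yield opposite labels depending on how much weight each segment receives, which is the kind of effect the discourse-structure hypothesis predicts; how such weights should actually be derived is exactly what a discourse parser would have to provide.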
Part of the project is building a discourse parser.
Its first stage is a discourse segmentation tool, which
you can download from the SLSeg page.
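SLSeg itself uses syntactic as well as lexical information (see the Tofiloski, Brooke and Taboada 2009 paper in the publication list). The purely lexical sketch here only illustrates what a segmenter takes in and gives back: it splits a text into sentences, then splits again before a small, hypothetical list of discourse markers. The marker list and the splitting rules are assumptions, and the sketch does not reproduce SLSeg's rules or output.

```python
import re
from typing import List

# Hypothetical, deliberately small list of cue words that often open a new segment.
MARKERS = ("but", "although", "because", "however", "whereas", "while")

# Sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Split point: an optional comma plus whitespace, immediately before a marker word.
# Requiring whitespace means a marker at the very start of a sentence is not split off.
_BEFORE_MARKER = re.compile(
    r"\s*,?\s+(?=(?:%s)\b)" % "|".join(MARKERS), re.IGNORECASE)


def segment(text: str) -> List[str]:
    """Split text into rough discourse segments: sentences first, then before cue words."""
    segments = []
    for sentence in _SENTENCE_END.split(text.strip()):
        for piece in _BEFORE_MARKER.split(sentence):
            piece = piece.strip()
            if piece:
                segments.append(piece)
    return segments


print(segment("The plot is boring, but the acting is excellent. I laughed a lot."))
# ['The plot is boring', 'but the acting is excellent.', 'I laughed a lot.']
```

Because a word such as "but" also has non-discourse uses ("nothing but trouble"), a rule this crude over-segments; consulting syntax is one way a segmenter can avoid such errors.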
The most detailed description of the project is in the Computational
Linguistics paper below. An informal description is given
in this presentation:
- Taboada, M. (2007) Thumbs Up or Thumbs Down? Detecting
Sentiment and Opinion Automatically. Presented at the Speaker
Series. SFU. November 2007.
- Taboada, M., J. Brooke, M. Tofiloski, K. Voll and M.
Stede (2011) Lexicon-Based Methods for Sentiment
Analysis. Computational Linguistics
37 (2): 267-307.
Related outputs include corpus
coding using Appraisal Theory and the SFU
Review Corpus. The system, SO-CAL, is available from
GitHub:
Funding:
- Natural Sciences and Engineering Research Council of
Canada (NSERC)
- Discovery Grant "A computational treatment of negation
and speculation in natural language" (2015-2020)
- Discovery Grant "Discourse parsing for summarization
and sentiment detection" (2008-2014)
- Discovery Grant "Computational analysis of text
sentiment" (2003-2008)
- Also funded through an NSERC University Faculty Award
(2004-2010)
Participants, present and past:
Rada Trnavac (Postdoc), Jennifer Hinnell (M.A. student),
Ashleigh Gonzales (M.A. student), Dennis Sharkey (M.A.
student), Debopam Das (Ph.D. student), Nicola Bergen (B.A.
student), Mathieu Dovan (B.A. student), Sam Al Khatib (M.A.
student), Vita Markman (Assistant Professor), Milan
Tofiloski (M.Sc. student), Julian Brooke (M.A. student),
Patrick Larrivee-Woods (B.Sc. student), K. Montana Hay
(B.A. student), Kim Voll (Ph.D. student), Caroline Anthony
(B.Sc. student), Jack Grieve (M.A. student), Dennis
Storoshenko (M.A. student), Katia Dilkina (B.Sc. student).
Publications, reports and manuals (please see the publications page for more related
publications)
- Taboada, M., J. Brooke, M. Tofiloski, K. Voll and M.
Stede (2011) Lexicon-Based Methods for Sentiment
Analysis. Computational Linguistics
37 (2): 267-307.
- Taboada, M., J. Brooke and M. Stede (2009) Genre-Based Paragraph Classification
for Sentiment Analysis. In Proceedings of the
10th Annual SIGDIAL Conference on Discourse and
Dialogue. London, UK. September 2009. pp. 62-70.
- Brooke, J., M. Tofiloski and M. Taboada (2009) Cross-Linguistic Sentiment Analysis:
From English to Spanish. In Proceedings of
RANLP 2009, Recent Advances in Natural Language
Processing. Borovets, Bulgaria. September 2009.
-- Poster
- Tofiloski, M., J. Brooke and M. Taboada (2009) A Syntactic and Lexical-Based
Discourse Segmenter. In Proceedings of the
47th Annual Meeting of the Association for
Computational Linguistics. Singapore, August
2009. pp. 77-80. -- Poster
- Brooke, J. (2009) A Semantic Approach to Automated Text
Sentiment Analysis. Master's Thesis. Department of
Linguistics. SFU.
- Stede, M., M. Taboada and J. Brooke (2008) Movie Stages Annotation Manual.
Guidelines for annotating functional zones or stages in
movie reviews. Universität Potsdam and Simon Fraser
University.
- Taboada, M., K. Voll and J. Brooke (2008) Extracting
Sentiment as a Function of Discourse Structure and
Topicality.
- Voll, K. and M. Taboada (2007) Not All Words are
Created Equal: Extracting Semantic Orientation as a
Function of Adjective Relevance. In Proceedings of
the 20th Australian Joint Conference on Artificial
Intelligence. Gold Coast, Australia. December
2007. pp. 337-346.
- Taboada, M., C. Anthony and K. Voll (2006) Methods
for Creating Semantic Orientation Dictionaries. Proceedings
of the 5th International Conference on Language Resources
and Evaluation (LREC). Genoa, Italy. May 2006.
pp. 427-432.
- Taboada, M. and J. Grieve (2004) Analyzing Appraisal
Automatically. American Association for Artificial
Intelligence Spring Symposium on Exploring Attitude
and Affect in Text. Stanford. March 2004. AAAI
Technical Report SS-04-07. pp. 158-161.
Related project:
The Construction of Literary Reputation in
Britain: 1900-1950
The objective of this grant is to develop a pilot
project to study the evolution of the literary reputations
of two authors (John Galsworthy and D. H. Lawrence).
Reputation is assessed based on the automatic extraction
of key phrases from the authors' work and from writings
concerning the authors. The project will create a database
of texts, and computational tools to analyze text content
automatically.
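As a rough illustration of tracking key phrases over time, the sketch below ranks two-word phrases by frequency within each decade of dated reviews. Frequency-ranked bigrams filtered by a small stopword list are an assumed stand-in, not the method the project used, and the example reviews are invented toy strings rather than material from the project's database.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "this", "to", "with"}


def candidate_phrases(text: str) -> List[str]:
    """Return adjacent word pairs (bigrams) in which neither word is a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    return [f"{w1} {w2}" for w1, w2 in zip(words, words[1:])
            if w1 not in STOPWORDS and w2 not in STOPWORDS]


def key_phrases_by_decade(reviews: List[Tuple[int, str]],
                          top_n: int = 3) -> Dict[int, List[Tuple[str, int]]]:
    """Group (year, text) pairs by decade and return each decade's most frequent phrases."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in reviews:
        counts[year // 10 * 10].update(candidate_phrases(text))
    return {decade: c.most_common(top_n) for decade, c in sorted(counts.items())}


# Invented toy snippets, for illustration only.
reviews = [
    (1921, "A masterly novel of English society."),
    (1924, "Another masterly novel, admirably plotted."),
    (1935, "A dull novel, with a dull plot: a dull novel indeed."),
]
print(key_phrases_by_decade(reviews, top_n=1))
# {1920: [('masterly novel', 2)], 1930: [('dull novel', 2)]}
```

A fuller treatment would likely restrict candidates to evaluative expressions and keep separate counts for each author and for primary versus critical texts; both refinements are left out of this sketch.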
Funding:
SFU's Social Sciences and Humanities
Research Council Grant.
PI:
Mary Ann Gillies.
Co-investigators:
Paul McFetridge, Maite Taboada
Publications
- Taboada, M., M. A. Gillies, P. McFetridge and R.
Outtrim (2008) Tracking literary reputation with text
analysis tools. Presented at the Meeting of the
Society of Digital Humanities. Vancouver. June
2008. (Poster)
- Gillies, M. A., P. McFetridge, R. Outtrim and M.
Taboada (2008) Finding, scanning, formatting and
processing literary reviews. Presented at the Symposium
of the Center for Print and Media Studies. SFU. May
2008.
- Taboada, M., M. A. Gillies and P. McFetridge (2006)
Sentiment Classification Techniques for Tracking
Literary Reputation. Proceedings of the LREC Workshop,
"Towards Computational Models of Literary Analysis". Genoa,
Italy. May 2006. pp. 36-43.