
Computational analysis of text sentiment

This project addresses the problem of automatically extracting sentiment, or subjective content, from text. It has two objectives: first, to establish the best methods and algorithms for extracting sentiment from texts; second, to implement a system that performs sentiment classification automatically on a large corpus.

Sentiment is defined as subjective content, expressed in whether a text contains positive or negative views and opinions towards its subject matter (e.g., an opinion piece in a newspaper, a movie review, a report on a new product, an e-mail message, or a post on a bulletin board). The hypothesis is that, given a text, we can determine whether it contains subjective material, and, if it does, we can determine its positive or negative sentiment by parsing its discourse structure.
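The two-step decision in the hypothesis (first whether a text is subjective, then whether it is positive or negative) can be illustrated with a minimal sketch. This is not the project's system: the word list, the scores, and the function names below are hypothetical, the negation handling is a deliberately crude assumption, and the sketch ignores discourse structure entirely.

```python
import re

# Hypothetical toy lexicon (word -> polarity strength). These entries and
# values are invented for illustration; they are not the project's dictionaries.
TOY_LEXICON = {
    "good": 2, "great": 3, "excellent": 4, "enjoyable": 2,
    "bad": -2, "boring": -3, "terrible": -4, "disappointing": -3,
}

# Assumption: a negator directly before a sentiment word flips its sign.
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text):
    """Return (is_subjective, score) for a text.

    A text counts as subjective here if it contains at least one
    lexicon word; the score is the sum of the (possibly negated) values.
    """
    tokens = tokenize(text)
    total = 0
    hits = 0
    for i, token in enumerate(tokens):
        if token not in TOY_LEXICON:
            continue
        value = TOY_LEXICON[token]
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
        hits += 1
    return hits > 0, total


def classify(text):
    """Step 1: subjective or not. Step 2: polarity of subjective texts."""
    is_subjective, score = score_text(text)
    if not is_subjective:
        return "objective"
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in (
        "The plot was great and the acting was excellent.",
        "The film was not good and the ending was boring.",
        "The film runs for two hours.",
    ):
        print(classify(sample), "<-", sample)
```

A word-counting approach like this treats every sentence as equally important, which is the limitation that motivates looking at discourse structure.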

Part of the project is to build a discourse parser. The first stage is a discourse segmentation tool, which you can download from the SLSeg page.

The most detailed description of the project is in the Computational Linguistics paper below. A more informal description is given in this presentation:

  • Taboada, M. (2007) Thumbs Up or Thumbs Down? Detecting Sentiment and Opinion Automatically. Presented at the Speaker Series. Simon Fraser University. November 2007.
  • Taboada, M., J. Brooke, M. Tofiloski, K. Voll and M. Stede (2011) Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37 (2): 267-307.

Related outputs were corpus coding using Appraisal Theory and the SFU Review Corpus. The system, SO-CAL, is available from GitHub.

Funding:

  • Natural Sciences and Engineering Research Council of Canada (NSERC)
  • Discovery Grant "A computational treatment of negation and speculation in natural language" (2015-2020)
  • Discovery Grant "Discourse parsing for summarization and sentiment detection" (2008-2014)
  • Discovery Grant "Computational analysis of text sentiment" (2003-2008)
  • Also funded through an NSERC University Faculty Award (2004-2010)

Participants, present and past:

Rada Trnavac (Postdoc), Jennifer Hinnell (M.A. student), Ashleigh Gonzales (M.A. student), Dennis Sharkey (M.A. student), Debopam Das (Ph.D. student), Nicola Bergen (B.A. student), Mathieu Dovan (B.A. student), Sam Al Khatib (M.A. student), Vita Markman (Assistant Professor), Milan Tofiloski (M.Sc. student), Julian Brooke (M.A. student), Patrick Larrivee-Woods (B.Sc. student), K. Montana Hay (B.A. student), Kim Voll (Ph.D. student), Caroline Anthony (B.Sc. student), Jack Grieve (M.A. student), Dennis Storoshenko (M.A. student), Katia Dilkina (B.Sc. student).

Publications, reports and manuals: please see the publications page for more related publications.

Related project:

The Construction of Literary Reputation in Britain: 1900-1950

The objective of this grant is to develop a pilot project to study the evolution of the literary reputations of two authors (John Galsworthy and D. H. Lawrence). Reputation is assessed based on the automatic extraction of key phrases from the authors' work and from writings concerning the authors. The project will create a database of texts, and computational tools to analyze text content automatically.

Funding:

Simon Fraser University's Social Sciences and Humanities Research Council Grant.

PI:

Mary Ann Gillies.

Co-investigators:

Paul McFetridge, Maite Taboada

Publications

  • Taboada, M., M. A. Gillies, P. McFetridge and R. Outtrim (2008) Tracking literary reputation with text analysis tools. Presented at the Meeting of the Society of Digital Humanities. Vancouver. June 2008. (Poster)
  • Gillies, M. A., P. McFetridge, R. Outtrim and M. Taboada (2008) Finding, scanning, formatting and processing literary reviews. Presented at the Symposium of the Center for Print and Media Studies. Simon Fraser University. May 2008.
  • Taboada, M., M. A. Gillies and P. McFetridge (2006) Sentiment Classification Techniques for Tracking Literary Reputation. Proceedings of LREC Workshop, "Towards Computational Models of Literary Analysis". Genoa, Italy. May 2006. pp. 36-43.