3.1 Subjectivity Lexicons

A subjectivity lexicon is a predefined list of words, each associated with a sentiment or emotion label such as positive or negative. Subjectivity lexicons are typically short (a few thousand words), but they work because of Zipf’s law, which holds that the nth-ranked item in a frequency table has a frequency of roughly 1/n that of the top-ranked item. Frequency therefore falls off steeply with rank, so infrequently used words are used very infrequently, and the words a short lexicon misses account for only a small share of a typical text.
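To make the Zipf argument concrete, the sketch below computes how much of a text an idealized Zipf distribution assigns to its top-ranked words. The vocabulary size and list size are assumptions chosen for illustration. A real lexicon is not simply the most frequent words, so this only illustrates how thin the tail is.

```r
# Idealized Zipf distribution: the rank-n word has frequency proportional to 1/n.
vocab_size <- 50000  # assumed number of distinct words (illustrative)
top_k      <- 5000   # assumed size of a short word list (illustrative)

rank     <- seq_len(vocab_size)
rel_freq <- (1 / rank) / sum(1 / rank)  # normalize so the frequencies sum to 1

# Share of all word occurrences accounted for by the top_k highest-ranked words
coverage <- sum(rel_freq[seq_len(top_k)])
round(coverage, 2)
```

Under these assumptions the top 10% of distinct words cover about 80% of all word occurrences.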

There are three common sentiment lexicons. Bing is common for polarity scoring and AFINN for valence (intensity) scoring. NRC is a less common option, used for emotion classification.

Bing classifies words as positive or negative.

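library(dplyr)    # %>%, filter(), count()
library(janitor)  # adorn_totals()

bing <- tidytext::get_sentiments("bing") %>%
  # drop the three words that Bing lists as both positive and negative
  filter(!word %in% c("envious", "enviously", "enviousness"))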

bing %>% 
  count(sentiment) %>% 
  adorn_totals() %>%
  flextable::flextable() %>% 
  flextable::autofit()

| sentiment | n |
|:---|---:|
| negative | 4,778 |
| positive | 2,002 |
| Total | 6,780 |
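A polarity lexicon like Bing is typically applied by joining a document's tokens to the lexicon and counting matches per label. The sketch below uses a hypothetical two-document corpus and a four-word toy lexicon with the same `word` and `sentiment` columns as `bing`, so the result does not depend on the real lexicon's contents. In practice `bing` would replace `toy_lexicon`.

```r
library(dplyr)
library(tidytext)

# Hypothetical corpus and toy lexicon, for illustration only
reviews <- tibble(
  id   = c(1, 2),
  text = c("A wonderful, brilliant film with a boring ending.",
           "Boring plot and terrible acting.")
)
toy_lexicon <- tibble(
  word      = c("wonderful", "brilliant", "boring", "terrible"),
  sentiment = c("positive", "positive", "negative", "negative")
)

polarity <- reviews %>%
  unnest_tokens(word, text) %>%             # one lowercase token per row
  inner_join(toy_lexicon, by = "word") %>%  # keep only words found in the lexicon
  group_by(id) %>%
  summarise(
    positive = sum(sentiment == "positive"),
    negative = sum(sentiment == "negative"),
    polarity = positive - negative
  )

polarity
# id 1: 2 positive, 1 negative, polarity  1
# id 2: 0 positive, 2 negative, polarity -2
```

Documents with no lexicon matches drop out of the inner join, so they would need to be added back if a neutral score of zero is wanted.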

AFINN, by Finn Årup Nielsen, assigns each word a manually rated integer valence between -5 (most negative) and +5 (most positive).

# AFINN is not bundled with tidytext; it is downloaded via the textdata package on first use
afinn <- tidytext::get_sentiments("afinn")

afinn %>%
  count(value) %>% 
  adorn_totals() %>%
  flextable::flextable() %>% 
  flextable::autofit()

| value | n |
|---:|---:|
| -5 | 16 |
| -4 | 43 |
| -3 | 264 |
| -2 | 966 |
| -1 | 309 |
| 0 | 1 |
| 1 | 208 |
| 2 | 448 |
| 3 | 172 |
| 4 | 45 |
| 5 | 5 |
| Total | 2,477 |
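Because AFINN attaches a number rather than a label to each word, a document score is usually a sum (or mean) of the matched values. The sketch below assumes a hypothetical corpus and a toy lexicon with the same `word` and `value` columns as `afinn`. The values are chosen for illustration and are not read from the real lexicon.

```r
library(dplyr)
library(tidytext)

# Hypothetical corpus and toy valence lexicon, for illustration only
reviews <- tibble(
  id   = c(1, 2),
  text = c("A wonderful, brilliant film with a boring ending.",
           "Boring plot and terrible acting.")
)
toy_afinn <- tibble(
  word  = c("wonderful", "brilliant", "boring", "terrible"),
  value = c(4, 4, -3, -3)
)

valence <- reviews %>%
  unnest_tokens(word, text) %>%           # one lowercase token per row
  inner_join(toy_afinn, by = "word") %>%  # keep only words found in the lexicon
  group_by(id) %>%
  summarise(
    n_matched = n(),         # number of scored words in the document
    valence   = sum(value)   # net valence; mean(value) would adjust for length
  )

valence
# id 1: 3 matched words, valence  5
# id 2: 2 matched words, valence -6
```

Summing rewards longer documents with more matches, so a mean over matched words may be preferable when document lengths vary widely.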

The NRC lexicon associates words with eight emotions, corresponding to the second level of Plutchik’s Wheel of Emotions, and two sentiments (negative and positive). A word can carry more than one label. NRC was created by manual annotation on a crowdsourcing platform.

# NRC is not bundled with tidytext; it is downloaded via the textdata package on first use
nrc <- tidytext::get_sentiments("nrc")

nrc %>%
  count(sentiment) %>% 
  adorn_totals() %>%
  flextable::flextable() %>% 
  flextable::autofit()

| sentiment | n |
|:---|---:|
| anger | 1,245 |
| anticipation | 837 |
| disgust | 1,056 |
| fear | 1,474 |
| joy | 687 |
| negative | 3,316 |
| positive | 2,308 |
| sadness | 1,187 |
| surprise | 532 |
| trust | 1,230 |
| Total | 13,872 |
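Since an NRC word can appear under several labels, joining tokens to the lexicon can return more than one row per token. The sketch below shows emotion counting under that structure, assuming a hypothetical corpus and a toy lexicon with the same `word` and `sentiment` columns as `nrc`. The label assignments are invented for illustration. The `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidytext)

# Hypothetical corpus and toy emotion lexicon, for illustration only
reviews <- tibble(
  id   = c(1, 2),
  text = c("A wonderful, brilliant film with a boring ending.",
           "Boring plot and terrible acting.")
)
toy_nrc <- tibble(
  word      = c("wonderful", "wonderful", "boring",
                "terrible", "terrible", "terrible"),
  sentiment = c("joy", "positive", "negative",
                "fear", "negative", "sadness")
)

emotions <- reviews %>%
  unnest_tokens(word, text) %>%   # one lowercase token per row
  # a token may match several labels, and a word may recur across documents
  inner_join(toy_nrc, by = "word", relationship = "many-to-many") %>%
  count(id, sentiment)            # matches per document and label

emotions
# id 1: joy 1, negative 1, positive 1
# id 2: fear 1, negative 2, sadness 1
```

Because one token can contribute to several labels, the counts across labels can exceed the number of matched words in a document.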