3.1 Subjectivity Lexicons
A subjectivity lexicon is a predefined list of words, each associated with an emotional context such as positive or negative. Subjectivity lexicons are typically short (a few thousand words), but they work because of Zipf’s law, which holds that the nth-ranked item in a frequency table has a frequency roughly equal to 1/n of the top-ranked item’s. Most of a text is therefore made up of a small number of frequent words, and the many words a lexicon omits are each used very rarely.
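The coverage argument can be checked numerically. The sketch below assumes an idealized Zipf distribution over a hypothetical vocabulary of 50,000 words; the vocabulary size and the 5,000-word cutoff are illustrative assumptions, not properties of any particular lexicon or corpus.

```r
# Idealized Zipf frequencies: the nth-ranked word occurs 1/n as often as the top word.
vocab_size <- 50000  # hypothetical vocabulary size
top_k <- 5000        # hypothetical number of most-frequent words considered

rel_freq <- 1 / seq_len(vocab_size)           # frequency relative to the top-ranked word
coverage <- cumsum(rel_freq) / sum(rel_freq)  # cumulative share of all word occurrences

coverage[top_k]  # share of occurrences accounted for by the top 5,000 words
```

Under these assumptions the top 10% of the vocabulary accounts for roughly 80% of all word occurrences. A real lexicon is not simply the set of top-ranked words, so its actual coverage of a given text will differ.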
Three sentiment lexicons are in common use. Bing is common for polarity scoring and AFINN for valence (intensity) scoring. NRC is a less common option, used for emotion classification.
Bing classifies words as positive or negative.
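library(dplyr)    # %>%, filter(), count()
library(janitor)  # adorn_totals()

bing <- tidytext::get_sentiments("bing") %>%
  # drop the three words that Bing lists as both positive and negative
  filter(!word %in% c("envious", "enviously", "enviousness"))

bing %>%
  count(sentiment) %>%
  adorn_totals() %>%
  flextable::flextable() %>%
  flextable::autofit()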
sentiment | n |
---|---|
negative | 4,778 |
positive | 2,002 |
Total | 6,780 |
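A polarity score from a lexicon of this shape can be sketched as a join followed by a count. The lexicon excerpt `toy_bing` and the token list below are hypothetical stand-ins invented for illustration; they are not taken from the real Bing lexicon, and this is one simple scoring rule among several possible ones.

```r
library(dplyr)

# Hypothetical lexicon excerpt with the same columns as `bing` (word, sentiment).
toy_bing <- tibble::tribble(
  ~word,      ~sentiment,
  "good",     "positive",
  "happy",    "positive",
  "bad",      "negative",
  "terrible", "negative"
)

# Hypothetical tokenized text, one word per row.
tokens <- tibble::tibble(
  word = c("the", "food", "was", "good", "but", "the",
           "service", "was", "bad", "really", "bad")
)

# Keep only the tokens that appear in the lexicon.
matched <- tokens %>%
  inner_join(toy_bing, by = "word")

# Net polarity: positive matches minus negative matches.
n_pos <- sum(matched$sentiment == "positive")
n_neg <- sum(matched$sentiment == "negative")
net_polarity <- n_pos - n_neg
net_polarity
```

Words missing from the lexicon ("food", "service", and so on) simply drop out of the join, which is why lexicon coverage matters.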
AFINN, by Finn Årup Nielsen, associates each word with a manually rated integer valence between -5 (most negative) and +5 (most positive).
# requires the textdata package, which downloads the lexicon on first use
afinn <- tidytext::get_sentiments("afinn")

afinn %>%
  count(value) %>%
  adorn_totals() %>%
  flextable::flextable() %>%
  flextable::autofit()
value | n |
---|---|
-5 | 16 |
-4 | 43 |
-3 | 264 |
-2 | 966 |
-1 | 309 |
0 | 1 |
1 | 208 |
2 | 448 |
3 | 172 |
4 | 45 |
5 | 5 |
Total | 2,477 |
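Because AFINN attaches a number to each word, a text can be scored by summing the values of its matched words. The sketch below uses a hypothetical excerpt, `toy_afinn`, with made-up valence values in the same shape as `afinn` (columns `word` and `value`); the tokens are invented for illustration as well.

```r
library(dplyr)

# Hypothetical lexicon excerpt; the values are illustrative, not the real AFINN ratings.
toy_afinn <- tibble::tribble(
  ~word,         ~value,
  "outstanding",  5,
  "good",         3,
  "bad",         -3,
  "terrible",    -3
)

# Hypothetical tokenized text, one word per row.
tokens <- tibble::tibble(
  word = c("outstanding", "food", "but", "terrible", "service")
)

# Sum the valence of every token found in the lexicon.
score <- tokens %>%
  inner_join(toy_afinn, by = "word") %>%
  summarise(total = sum(value), n_matched = n())

score
```

Dividing `total` by `n_matched` gives an average valence, which may be preferable when comparing texts of different lengths.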
The NRC lexicon associates words with eight emotions, corresponding to the second level of Plutchik’s Wheel of Emotions, and two sentiments (negative and positive). A word can carry more than one label, so the table below counts word-label pairs rather than distinct words. NRC was created by manual annotation on a crowdsourcing platform.
# requires the textdata package, which downloads the lexicon on first use
nrc <- tidytext::get_sentiments("nrc")

nrc %>%
  count(sentiment) %>%
  adorn_totals() %>%
  flextable::flextable() %>%
  flextable::autofit()
sentiment | n |
---|---|
anger | 1,245 |
anticipation | 837 |
disgust | 1,056 |
fear | 1,474 |
joy | 687 |
negative | 3,316 |
positive | 2,308 |
sadness | 1,187 |
surprise | 532 |
trust | 1,230 |
Total | 13,872 |
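Since an NRC word can map to several labels, joining tokens to the lexicon can produce more rows than there are matched words. The sketch below illustrates this with a hypothetical excerpt, `toy_nrc`, whose word-label pairs are invented for illustration and are not copied from the real lexicon.

```r
library(dplyr)

# Hypothetical lexicon excerpt: one row per word-label pair, like `nrc`.
toy_nrc <- tibble::tribble(
  ~word,   ~sentiment,
  "storm", "anger",
  "storm", "negative",
  "gift",  "joy",
  "gift",  "positive",
  "gift",  "surprise"
)

# Hypothetical tokenized text, one word per row.
tokens <- tibble::tibble(
  word = c("the", "storm", "ruined", "the", "gift")
)

# Each matched token expands to one row per label it carries.
matched <- tokens %>%
  inner_join(toy_nrc, by = "word")

# Emotion profile of the text: number of matched labels per category.
emotion_counts <- matched %>%
  count(sentiment)

emotion_counts
```

Here two matched words yield five labelled rows, so emotion counts should be read as label counts rather than word counts.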