1.4 Lemmatize
Stemming and lemmatizing both reduce word variations such as “staying”, “stayed”, and “stay” to a single base form. Stemming chops off endings to get a root, but the stem is often not a word itself; for example, a Porter-style stemmer turns “staying” into “stai”. Lemmatizing instead maps each word to its dictionary form (its lemma), which gives the more natural “stay”.
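# Replace each token with its lemma
token_2 <- token_1 %>% mutate(word = textstem::lemmatize_words(word))
# Tabulate the (before, after) pairs that changed, most frequent first
tibble(before = token_1$word, after = token_2$word) %>%
  filter(before != after) %>%
  count(before, after, sort = TRUE)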
## # A tibble: 2,711 × 3
##    before after     n
##    <chr>  <chr> <int>
##  1 was    be     4156
##  2 is     be     2291
##  3 were   be     1522
##  4 had    have   1224
##  5 are    be      916
##  6 an     a       586
##  7 stayed stay    550
##  8 rooms  room    528
##  9 well   good    440
## 10 more   much    341
## # ℹ 2,701 more rows
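To see the difference between stemming and lemmatizing described at the start of this section, the two can be run side by side on a few words. This is a minimal sketch, not part of the original analysis: the `words` vector and the `comparison` name are made up for illustration, and it assumes the textstem package's `stem_words()` with its default settings alongside the `lemmatize_words()` call used above.

```r
library(tibble)

# Illustrative input only; these words are not drawn from token_1
words <- c("staying", "stayed", "stay")

comparison <- tibble(
  word  = words,
  stem  = textstem::stem_words(words),       # rule-based suffix stripping; may yield non-words
  lemma = textstem::lemmatize_words(words)   # lookup in a lemma dictionary
)

comparison
```

Comparing the `stem` and `lemma` columns shows where the stemmer produces a truncated root while the lemmatizer returns a real word.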