So I heard from someone that they are using R to mine text, to look for sentiment in statements about the market.
I thought I’d give it a try, but instead using the Gutenberg Press text of Jane Eyre.
I used the TextMining package because I found that first, and it got me started, though I haven’t done any of the real work of analysing text (like looking for correlations between words).
But still, got me started.
#text mining
library(tm)
library(wordcloud2)
library(tidyverse)
docs <- Corpus(DirSource(pattern=”text_source*”, ignore.case = TRUE, encoding = “UTF-8”))
# don’t use this, it seems to break everything
# inspect(docs)
# clean the docs
docs <- tm_map(docs, removePunctuation)
# stopwords is a very slow step, avoid running it in demo
docs <- tm_map(docs, removeWords, stopwords(“english”))
docs <- tm_map(docs, removeWords, c(“will”, “now”, “one”, “said”, “like”, “little”))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, tolower)
dtm <- DocumentTermMatrix(docs)
freq <-colSums( as.matrix(dtm))
ord <- order(freq,decreasing=TRUE)
tops <- freq[head(ord, 1000)]
wf <- tibble(word=labels(tops), count=tops) %>% filter (count < 500)
wordcloud2(data = wf)