Trump Word Cloud

I've been having some fun web scraping and integrating R with social APIs such as LinkedIn, Facebook and, in this case, Twitter.
I started a collection of word clouds weighted by word frequency, created from President Trump's daily tweets.
After three days of running this file manually, I quickly became perturbed and automated it as a scheduled R script using launchd jobs. I'm working on a tutorial for that now and will share both how to automate scheduled R scripts and the Evolution of Trump's Tweets GIF being created from the word cloud outputs.

Initial Setup

Load Libraries

First, load the libraries that will be used in this exercise.

library(twitteR)
library(wordcloud)
library(tm)
library(XML)
library(RColorBrewer)
library(dplyr)

Connect to the Twitter API

Then use the twitteR package to send your credentials and create a session.
Note: my personal information has been hidden here; you must enter your own.
To get access to the Twitter API you must sign up through their developer program. If you haven't done so already, please do so HERE.

## [1] "Using direct authentication"
consumer_key <- "ENTER-CONSUMER-KEY"
consumer_secret <- "ENTER-CONSUMER-SECRET"
access_token <- "ENTER-ACCESS-TOKEN"
access_secret <- "ENTER-ACCESS-SECRET"
# setup OAuth between this app and you twitter app
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)

Most Recent 100 Tweets Word Cloud

Let's take a look at what we receive by setting parameters to retrieve the most recent 100 tweets.

# create a twitteR user object for Trump by searching Twitter for user @realDonaldTrump
## Note: the '@' symbol has been removed
D.User <- getUser('realDonaldTrump')


# exclude replies so we only have his words
Don100 <- userTimeline(D.User, n = 100, excludeReplies = TRUE)
# create data frame from twitteR list
t100.frame <- twListToDF(Don100)
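
As a quick sanity check (my own addition, not in the original script), it's worth confirming how many tweets came back and peeking at the raw text before processing it:

# confirm the count and look at the first few tweets
nrow(t100.frame)
head(t100.frame$text, 3)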

# create vector of words from column of sentences
# Start with most recent 100 collection
l100 <- strsplit(t100.frame$text, " ")

## Building the corpus directly from a data frame caused TermDocumentMatrix() issues,
## so the corpus below is built from the word list instead.
## The data frame is still needed for the table-based cloud later on.
d100 <- data.frame(Words = unlist(l100))

# use tm package to convert our column of words into a corpus of text
tCorpus = Corpus(VectorSource(l100))
# tm_map allows us to remove punctuation and numbers,
# make sure lower- and upper-case words are counted the same,
# and, perhaps most significantly, remove stop words like "the", "and", "but", etc.
# Otherwise our word cloud would be dominated by these words,
# given how often they appear in language.
## content_transformer(tolower) has issues here;
## use TermDocumentMatrix() to lowercase instead
#tCorpus = tm_map(tCorpus, content_transformer(tolower))
tCorpus = tm_map(tCorpus, removePunctuation)
tCorpus = tm_map(tCorpus, removeNumbers)
tCorpus = tm_map(tCorpus, removeWords, c(stopwords("SMART")))

# wordLengths = c(1, Inf) keeps even one-letter words
# (the old minWordLength control was dropped from tm)
myTDM = TermDocumentMatrix(tCorpus,
                           control = list(wordLengths = c(1, Inf),
                                          tolower = TRUE))
m <- as.matrix(myTDM)
v <- sort(rowSums(m), decreasing = TRUE)
wordcloud(names(v), v, scale = c(3,0.75), min.freq = 1,
              colors = brewer.pal(7, "Dark2"))
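
Since each run's cloud eventually becomes a frame in the GIF mentioned at the top, it can help to write the plot to a dated PNG instead of the plotting window. Here's a minimal sketch; the file naming convention is my own assumption, not part of the original script:

# save this run's cloud to a dated PNG so the images can later be
# stitched into the Evolution of Trump's Tweets GIF
png(paste0("trump_cloud_", Sys.Date(), ".png"), width = 800, height = 800)
wordcloud(names(v), v, scale = c(3,0.75), min.freq = 1,
          colors = brewer.pal(7, "Dark2"))
dev.off()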

The complete collection of stopwords in the SMART list can be seen at the SMART information retrieval system. The SMART list is a reliable and fairly standard set of stopwords and is also used by the MC toolkit.
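
If you're curious what's actually being filtered out, tm ships the SMART list itself, so you can inspect it directly (a quick check of my own, not part of the pipeline):

# look at the size of the SMART list and a sample of its entries
length(stopwords("SMART"))
head(stopwords("SMART"), 10)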

Using a table to make a word cloud

We can also create a word cloud from a frequency table, as shown here, using the most recent 100 tweets data frame, d100.
# create table from d100 data frame, grouping by unique words
# while providing a new column for the tally of each
# finally sort the table by number of occurrences for each unique word
t100 <- d100 %>% group_by(Words) %>% tally(sort = TRUE)

wordcloud(words = t100$Words, freq = t100$n, scale = c(3,0.5),
          random.order = FALSE, min.freq = 2,
          colors = brewer.pal(6, "Reds"))

Pulling Max Trump Tweets

The twitteR package's userTimeline() function returns a maximum of 3,200 tweets. That can be 3,200 tweets from the user alone, or, if the function's argument is left at its default value of excludeReplies = FALSE, a combined maximum of 3,200 of the user's most recent tweets and replies to those tweets.
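
As a sketch, the max pull looks just like the 100-tweet pull with n raised to the ceiling; the object names here are my own:

# request the API ceiling; Twitter returns however many actually exist
DonMax <- userTimeline(D.User, n = 3200, excludeReplies = TRUE)
tMax.frame <- twListToDF(DonMax)
# how many we actually received
nrow(tMax.frame)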

NOTE: while I requested 3,200 tweets, I only received 718 on Feb 8, 2017. It appears to be common for Trump to delete his old tweets nearly every day.
Extra NOTE: You can pull tweets that have been deleted, meaning Twitter keeps an accessible record of them. Just an FYI, since I've pulled deleted tweets from Trump's profile before.
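
If you want to catch those deletions yourself, one approach (entirely my own sketch, not part of the original workflow) is to persist each pull's tweet IDs and diff them against the previous run:

# save each pull's tweet IDs and compare with the last run to see
# which tweets disappeared in between (the file name is my assumption)
ids_file <- "trump_ids.rds"
if (file.exists(ids_file)) {
  old_ids <- readRDS(ids_file)
  print(setdiff(old_ids, tMax.frame$id))  # IDs missing since last run
}
saveRDS(tMax.frame$id, ids_file)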