To start, I installed the Anaconda Python distribution.
I then used the use_condaenv() function from the R package reticulate to specify that I wanted to use the primary Python environment from that distribution, not a virtual environment created within Anaconda. Those can be listed with reticulate::conda_list(), and the name of the environment that you want to use can simply replace “anaconda3” in the code below:
library(reticulate)
library(tidyverse)
library(lubridate)
use_condaenv("anaconda3")
my_tweets <- read_csv("~/OneDrive - University of Tennessee/ngsschat/all-ngsschat-tweets-flattened.csv")
If you wanted, py_config() would return information on the version of Python available. use_python() can be used to specify a different version (e.g., the one that is built into Mac computers, rather than the one in Anaconda).
Use r.my_tweets to access the object my_tweets generated by the above code:
d = r.my_tweets
import vaderSentiment as vs
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()
def sentiment_analyzer_scores(text):
    # this function is from here: https://towardsdatascience.com/almost-real-time-twitter-sentiment-analysis-with-tweep-vader-f88ed5b93b1c
    score = analyser.polarity_scores(text)
    lb = score['compound']
    if lb >= 0.05:
        return 1
    elif (lb > -0.05) and (lb < 0.05):
        return 0
    else:
        return -1
out = []
for i in d.index:
    tmp = sentiment_analyzer_scores(d.text[i])
    out.append(tmp)
d['sentiment'] = out
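The thresholding step inside sentiment_analyzer_scores() can be looked at on its own. The following is only a minimal sketch and does not call VADER at all: label_from_compound and its threshold argument are names I am introducing here for illustration, and the compound values passed in are made up rather than taken from the tweets.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to 1 (positive), 0 (neutral), or -1 (negative).

    Hypothetical helper: it mirrors the cut-offs used in the function above,
    with the 0.05 cut-off exposed as a parameter.
    """
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0


# Made-up compound scores, for illustration only
example_scores = [0.6, 0.05, 0.0, -0.05, -0.4]
labels = [label_from_compound(s) for s in example_scores]
print(labels)
```

Scores exactly at 0.05 count as positive and scores exactly at -0.05 count as negative, which matches the if/elif/else branches above.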
reticulate automatically translates the R data frame/tibble into a pandas data frame; it does the same in reverse in the code below. I note that I’m not a great Python programmer.
Use py$d to access the object d generated by the code above:
py$d %>%
  mutate(date = round_date(created_at, "day")) %>%
  group_by(date) %>%
  summarize(mean_sentiment = mean(sentiment)) %>%
  ggplot(aes(x = date, y = mean_sentiment)) +
  geom_point() +
  geom_smooth()
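For anyone who would rather stay in Python for the aggregation, the daily averaging done by the R pipeline above can be sketched with the standard library alone. This is a rough equivalent under stated assumptions, not the code I ran: daily_mean_sentiment is a hypothetical helper, the rows are made up, and I am assuming that rounding to the nearest day (with times exactly at noon going up to the next day) is an acceptable stand-in for round_date(created_at, "day").

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_mean_sentiment(rows):
    """Average sentiment per day for (created_at, sentiment) pairs.

    Hypothetical helper: created_at is a datetime, sentiment is -1, 0, or 1.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [sum of labels, count]
    for created_at, sentiment in rows:
        # Adding 12 hours and dropping the time rounds to the nearest day,
        # so noon and later are assigned to the following day.
        day = (created_at + timedelta(hours=12)).date()
        totals[day][0] += sentiment
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in sorted(totals.items())}


# Made-up rows, for illustration only
rows = [
    (datetime(2019, 1, 1, 9, 0), 1),
    (datetime(2019, 1, 1, 10, 30), 0),
    (datetime(2019, 1, 1, 20, 0), -1),
    (datetime(2019, 1, 2, 8, 0), 1),
]
means = daily_mean_sentiment(rows)
print(means)
```

Note that the tweet from the evening of January 1 is counted with January 2 because of the rounding. Truncating to the calendar day instead (floor_date() in lubridate) would group it with January 1.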