11/1/2020

Google Search Activity for Cupcakes in Guam and Egypt

Intro

Today, we will analyze Google data that tracks Google searches for the word “Cupcakes” in select countries.

The numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.
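To make the scale concrete, here is a minimal sketch of the arithmetic implied by that description. The `raw_counts` vector is hypothetical (made-up values, not taken from the spreadsheet), and this is only an illustration of "relative to the peak", not a claim about how Google computes the index internally.

```r
# Hypothetical raw monthly search counts (made-up values for illustration only)
raw_counts <- c(40, 80, 200, 100, 0)

# Rescale so that the largest value maps to 100 and everything else is relative to it
interest <- round(100 * raw_counts / max(raw_counts))
interest
```

In the real data a 0 means there was not enough data, which is not necessarily the same as zero searches.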

Getting Our Data From Google into R & Onto One Graph

library(readxl)
my_data <- read_excel("/Users/jaredpaulus/Downloads/Cupcake Google Searches.xlsx")
eg <- my_data$`Egypt Activity`
guam <- my_data$`Guam Activity`
plot(my_data$Month, eg, type = "l", xlab = "Time Period", ylab = "Frequency", main = "Cupcake Google Searches Over Time", ylim = c(0, 110), col = "red")
lines(my_data$Month, guam, col = "blue")
# Both series are drawn as solid lines, so the legend uses lty = 1 for both
legend("topright", legend = c("Egypt", "Guam"), col = c("red", "blue"), lty = 1, cex = 0.8)

Egypt’s Cupcake Search Activity Modeled with geom_smooth

library("ggplot2")
egyptPlot <- qplot(my_data$Month, eg, xlab = "Time Period", ylab = "Frequency")
egyptPlot + geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
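With `method = "lm"`, `geom_smooth()` draws an ordinary least-squares line for `y ~ x`. The same fit can be inspected numerically with `lm()`. The sketch below uses hypothetical stand-in data (`month_index` and `activity` are made-up names and values), since the spreadsheet itself is not reproduced here.

```r
# Hypothetical stand-in for the spreadsheet: 24 months of made-up interest values
set.seed(1)
month_index <- 1:24
activity <- 10 + 2 * month_index + rnorm(24, sd = 3)

# Ordinary least-squares fit of y ~ x, the same model geom_smooth(method = "lm") draws
fit <- lm(activity ~ month_index)
coef(fit)  # intercept and slope of the trend line
```

A positive slope means interest is trending upward over the period; applying the same call to `eg` and `my_data$Month` would give the slope of the line in the plot.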

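Now Let’s Do Hypothesis Testing

Let’s look at Guam’s average activity in 2010 and test that sample

We will use a one-sample t test of Guam’s activity in 2010 against the null hypothesis that the true population mean is 20 units of activity. Note that t.test() takes the null value through its mu argument; the call below passes n = 20, which t.test() does not use, so the printed test is against a mean of 0 (see “true mean is not equal to 0” in the output).

The 95 percent confidence interval does not depend on the null value. It runs from 11.38 to 22.29 and contains 20, so at the 5% level we fail to reject the null hypothesis that the true average is 20 units of activity.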

# Rows 73 to 84 are taken as the 12 months of 2010 (this assumes the data starts in January 2004)
guam2010 <- guam[73:84]
mean(guam2010)
## [1] 16.83333
t.test(guam2010, n = 20)
## 
##  One Sample t-test
## 
## data:  guam2010
## t = 6.7884, df = 11, p-value = 3e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  11.37552 22.29115
## sample estimates:
## mean of x 
##  16.83333
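The output above reports “true mean is not equal to 0”, so that run tested against 0 rather than 20. A minimal sketch of testing against 20 passes the null value through `mu`. The vector `guam2010_demo` is hypothetical (12 made-up values standing in for `guam2010`), so its results are not the results for the real data.

```r
# Hypothetical stand-in for guam2010: 12 made-up monthly values
guam2010_demo <- c(12, 25, 8, 30, 15, 10, 22, 5, 28, 14, 18, 16)

# mu sets the null-hypothesis mean; if it is omitted, t.test() tests against 0
result <- t.test(guam2010_demo, mu = 20)
result$null.value  # the mean being tested
result$p.value     # reject the null at the 5% level only if this is below 0.05
```

On the real data the equivalent call would be `t.test(guam2010, mu = 20)`.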

Using Plotly, we will plot Guam’s Cupcake Popularity

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
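# Axis titles are set through layout(); xaxis/yaxis inside plot_ly() do not set titles
plot_ly(my_data, x = my_data$Month, y = guam) %>%
  layout(xaxis = list(title = "Time Period"), yaxis = list(title = "Frequency"))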
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

We see Guam’s search frequency start to rise, then fall drastically

Lastly, we’ll look at the distribution of Egypt’s activity

Here is our Output

library("ggplot2")

# Backticks refer to the column; single quotes would map the literal string "Egypt Activity"
gghi <- ggplot(my_data, aes(x = `Egypt Activity`))

gghi + geom_bar(stat = "count")