James Anderson
Dr. McMullen
Psych 541 - Grad Stats
2/14/24

library(tidyverse)
library(pageviews)        # This package gets data on Wikipedia viewing
library(DT)               # DT stands for datatable, and creates interactive tables
library(infer)            # for some stats like t_test
library(devtools)

1. Look up views of a new Wikipedia article called “Mass shootings in the United States” over several years, graph it over time, do the same with the “Gun control” article again, and overlay the two in a single graph.

Mass_shootings <- pageviews::article_pageviews(article = "Mass shootings in the United States", start = as.Date("2017-1-1"), end = as.Date("2023-12-31"))

Mass_shootings |> 
  ggplot(aes(x = date, y = views)) +
  geom_line()

The graph above shows daily Wikipedia views of the “Mass shootings in the United States” article over time. There is a moderate spike in 2018, a larger one in 2020, and larger ones still in 2022-2023.

Gun_control <- pageviews::article_pageviews(article = "Gun control", start = as.Date("2017-1-1"), end = as.Date("2023-12-31"))

Gun_control |> 
  ggplot(aes(x = date, y = views)) +
  geom_line()

Like graph 1, this second graph shows daily Wikipedia views over time, this time for the “Gun control” article. The two graphs follow a similar pattern: views rise in 2018, again in 2020, and noticeably in 2022-2023. However, the two y-axes (views) are not on the same scale. Now let’s look at both graphs together.
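To put dates on the spikes described above instead of reading them off the graphs, one option is to pull each article’s busiest day in each year. This is a minimal sketch: `toy_views` is made-up data that only mimics the `article`, `date`, and `views` columns of the `article_pageviews()` output, and `yearly_peaks` is an illustrative name. With the real data you would start from `Mass_shootings` or `Gun_control` instead.

```r
library(dplyr)
library(lubridate)

# Made-up daily views in the same shape as article_pageviews() output
toy_views <- tibble(
  article = "Gun_control",
  date    = as.Date(c("2018-02-15", "2018-06-01", "2020-05-30",
                      "2020-09-01", "2022-05-25", "2022-11-01")),
  views   = c(9000, 1200, 4000, 900, 15000, 1100)
)

# Busiest single day for each article in each calendar year
yearly_peaks <- toy_views |>
  group_by(article, year = year(date)) |>
  slice_max(views, n = 1) |>
  ungroup()

yearly_peaks
```

Keeping `article` in `group_by()` means the same code works on the combined data, giving one peak per article per year.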

2. Create a scatterplot of views of the two articles.

shootings <- bind_rows(Gun_control, Mass_shootings)

shootings |> 
  ggplot(aes(color = article, x = date, y = views)) +
  geom_line()

Once the two series are overlaid, it is clear that the “Mass shootings in the United States” article was viewed far more often than the “Gun control” article. Although the two articles spike at similar times, the Mass shootings article’s peaks are much higher.
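The prompt asks for a scatterplot, while the code above overlays two lines. A scatterplot of the two articles would put one article’s daily views on each axis, so that each point is one day. The sketch below assumes the `article` column holds underscored titles (`Gun_control`, `Mass_shootings_in_the_United_States`); running `distinct(shootings, article)` on the real data would confirm the exact strings. `toy_shootings` is made-up data standing in for `shootings`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for bind_rows(Gun_control, Mass_shootings)
toy_shootings <- tibble(
  article = rep(c("Gun_control", "Mass_shootings_in_the_United_States"), each = 4),
  date    = rep(as.Date("2023-01-01") + 0:3, times = 2),
  views   = c(500, 800, 650, 700, 3000, 5200, 4100, 4400)
)

# Reshape to one row per day and one column of views per article
wide <- toy_shootings |>
  select(article, date, views) |>
  pivot_wider(names_from = article, values_from = views)

# Each point is one day; log scales keep both busy and quiet days visible
p <- wide |>
  ggplot(aes(x = Gun_control, y = Mass_shootings_in_the_United_States)) +
  geom_point(alpha = 0.4) +
  scale_x_log10(labels = scales::comma) +
  scale_y_log10(labels = scales::comma) +
  labs(x = "Gun control views", y = "Mass shootings views")

# Day-to-day association between the two articles
cor(wide$Gun_control, wide$Mass_shootings_in_the_United_States)
```

The log scales are a design choice, not a requirement; dropping the two `scale_*_log10()` lines gives the plot on raw counts.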

3. Find two different mass shootings in the US that received a lot of attention and look up the dates. Use different ones than the ones I used in the example. To see whether people’s attention was on those shootings, for each shooting, create a table (using datatable() and removing special pages) and a bar graph of the top article views of the day after each mass shooting happened.

top_florida <- top_articles(start = as.Date("2023-1-2"))     # day after New Year's Day 2023 (Miami Gardens)

top_louisiana <- top_articles(start = as.Date("2023-7-5"))   # day after July 4, 2023 (Shreveport)

top_louisiana |> 
  select(article, views) |>
  filter(!article == "Main_Page", !article == "Special:Search") |>
  datatable()
top_louisiana |> 
  select(article, views) |>
  filter(!article == "Main_Page", !article == "Special:Search") |>
  slice_max(views, n = 10) |>                  # keep the 10 most-viewed pages
  ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
  geom_col(fill = "blue") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Article", y = "Views")

The graph above shows the 10 most-viewed Wikipedia articles on the day after the shooting at a Fourth of July block party in Shreveport, Louisiana.

top_florida |> 
  select(article, views) |>
  filter(!article == "Main_Page", !article == "Special:Search") |>
  datatable()
top_florida |> 
  select(article, views) |>
  filter(!article == "Main_Page", !article == "Special:Search") |>
  slice_max(views, n = 10) |>                  # keep the 10 most-viewed pages
  ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
  geom_col(fill = "red") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Article", y = "Views")

The bar graph above shows the 10 most-viewed Wikipedia articles on the day after the mass shooting in Miami Gardens, Florida, on New Year’s Day 2023.
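The select/filter/top-10/plot pipeline above is repeated once per shooting, and the filter removes only two specific pages, while the prompt asks to remove special pages generally. A minimal sketch of one way to factor this out: the hypothetical helpers `clean_top()` and `plot_top()` drop the Main Page plus every page whose title starts with `Special:`. `toy_top` is made-up data that only mimics the `article` and `views` columns of `top_articles()`.

```r
library(dplyr)
library(stringr)
library(forcats)
library(ggplot2)

# Hypothetical helper: drop the Main Page and all "Special:" pages, keep top n
clean_top <- function(top_df, n = 10) {
  top_df |>
    select(article, views) |>
    filter(article != "Main_Page", !str_starts(article, "Special:")) |>
    slice_max(views, n = n)
}

# Hypothetical helper: horizontal bar chart, most-viewed article on top
plot_top <- function(clean_df, fill = "blue") {
  clean_df |>
    ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
    geom_col(fill = fill) +
    coord_flip() +
    scale_y_continuous(labels = scales::comma) +
    labs(x = "Article", y = "Views")
}

# Made-up stand-in for top_articles() output
toy_top <- tibble(
  article = c("Main_Page", "Special:Search", "Special:Random",
              "Topic_A", "Topic_B", "Topic_C"),
  views   = c(5e6, 1e6, 3e5, 2e5, 1.5e5, 1e5)
)

top2 <- clean_top(toy_top, n = 2)
p <- plot_top(top2, fill = "blue")
```

With the real data, `clean_top(top_louisiana) |> datatable()` and `plot_top(clean_top(top_louisiana))` would replace the two Louisiana pipelines.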

4. Find the views of the Gun control article 7 days before and 7 days after the two shootings, combine the data, and graph it over time.

florida <- article_pageviews(article = "Gun_control",
                           start = as.Date("2023-10-18"),
                           end = as.Date("2023-11-1"))

louisiana <- article_pageviews(article = "Gun_control",
                           start = as.Date("2022-5-17"),
                           end = as.Date("2022-5-31"))

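florida <- florida |> 
  mutate(day = -7:7) |>          # assumes exactly 15 daily rows, with the shooting on day 0
  mutate(event = "Florida")

louisiana <- louisiana |> 
  mutate(day = -7:7) |>          # same assumption: day 0 is the middle row
  mutate(event = "Louisiana")
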
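`mutate(day = -7:7)` works only when the query returns exactly 15 rows centered on the shooting; with any other row count, `mutate()` stops with a size error. A sketch that derives both the date window and the day index from the shooting date itself may be sturdier. `event_window()` and `add_event_day()` are hypothetical helpers, the shooting date comes from the text (New Year’s Day 2023), and `toy_florida` is made-up data; `as.Date()` is applied to `date` in case the column comes back as a date-time.

```r
library(dplyr)

# Hypothetical helper: start/end dates `half_width` days either side of a shooting
event_window <- function(shooting_date, half_width = 7) {
  shooting_date <- as.Date(shooting_date)
  list(start = shooting_date - half_width, end = shooting_date + half_width)
}

# Hypothetical helper: days from the shooting, computed from each row's date
add_event_day <- function(views_df, shooting_date, event_name) {
  views_df |>
    mutate(day = as.numeric(as.Date(date) - as.Date(shooting_date)),
           event = event_name)
}

# Window around New Year's Day 2023 (date taken from the text)
fl_window <- event_window("2023-01-01")

# Made-up views standing in for article_pageviews() output over that window
toy_florida <- tibble(
  date  = seq(fl_window$start, fl_window$end, by = "day"),
  views = c(500, 510, 490, 505, 495, 500, 520, 900,
            1500, 1200, 800, 700, 650, 600, 560)
)

toy_florida <- add_event_day(toy_florida, "2023-01-01", "Florida")

# With network access, the real call might look like:
# florida <- article_pageviews(article = "Gun_control",
#                              start = fl_window$start, end = fl_window$end) |>
#   add_event_day("2023-01-01", "Florida")
```

The same two calls with `"2023-07-04"` would build the Louisiana window.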
florida |> 
  ggplot(aes(x = day, y = views)) +
  geom_line()

This line graph tracks daily views of the Gun control article from a week before to a week after the mass shooting in Miami Gardens, Florida, on New Year’s Day 2023.

louisiana |> 
  ggplot(aes(x = day, y = views)) +
  geom_line()

This line graph tracks daily views of the Gun control article from a week before to a week after the mass shooting at a Fourth of July block party in Shreveport, Louisiana, in 2023.

shootings2 <- bind_rows(florida, louisiana)

shootings2 |> 
  ggplot(aes(x = day, y = views, color = event)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Days before/after Shooting", 
       y = "Wikipedia Views", 
       color = "Event", 
       title = "Views of the Wikipedia Gun Control Article before and after Two Mass Shootings")

This graph overlays daily views of the Gun control article from a week before to a week after the mass shootings in Miami Gardens, Florida, and Shreveport, Louisiana. On this shared scale, views rise far more after the Louisiana shooting than after the Florida shooting.
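Because the two events can start from different baseline traffic, a raw shared scale can exaggerate or hide a rise. One option, sketched here under the assumption that days -7 to -1 are a fair baseline, is to divide each day’s views by that event’s pre-shooting average, so that 1 means “a typical day” and 2 means “twice the usual traffic”. `toy_events` is made-up data with the same `event`, `day`, and `views` columns as `shootings2`.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for shootings2: two events whose baselines differ
toy_events <- tibble(
  event = rep(c("Florida", "Louisiana"), each = 15),
  day   = rep(-7:7, times = 2),
  views = c(rep(400, 8), rep(600, 7), rep(1000, 8), rep(3000, 7))
)

# Divide each day's views by that event's mean over days -7 to -1
relative <- toy_events |>
  group_by(event) |>
  mutate(baseline = mean(views[day < 0]),
         relative_views = views / baseline) |>
  ungroup()

# Dashed line at 1 marks the pre-shooting average
p <- relative |>
  ggplot(aes(x = day, y = relative_views, color = event)) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = "dashed") +
  labs(x = "Days before/after Shooting",
       y = "Views relative to pre-shooting average",
       color = "Event")
```

In this made-up example both events rise, but the Louisiana rise (3x) is twice the Florida one (1.5x) in relative terms, which a raw-count axis would not show directly.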

5. Conduct a t-test to see if views of the gun control article were higher in the 7 days after the shootings compared to the 7 days prior.

shootings2 |> 
  mutate(after_event = (day > 0)) |>                        # day 0 (the shooting itself) counts as "before"
  t_test(views ~ after_event, order = c("TRUE", "FALSE"))   # estimate = after minus before
# A tibble: 1 × 7
  statistic  t_df p_value alternative estimate lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
1      2.34  13.1  0.0354 two.sided      1621.     129.    3113.
shootings2 |> 
  mutate(after_event = (day > 0)) |> 
  group_by(after_event) |> 
  summarize(Mean = mean(views),
            StdDev = sd(views),
            N = n())
# A tibble: 2 × 4
  after_event  Mean StdDev     N
  <lgl>       <dbl>  <dbl> <int>
1 FALSE        488.   207.    16
2 TRUE        2109.  2580.    14
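The t statistic and degrees of freedom printed earlier can be checked by hand from this summary table, since `t_test()` here runs a Welch (unequal-variance) test. A short sketch using base R only; the means and SDs are the rounded values from the table, so the results match the printed output only approximately.

```r
# Summary statistics copied from the table above (rounded)
m_before <- 488;  s_before <- 207;  n_before <- 16
m_after  <- 2109; s_after  <- 2580; n_after  <- 14

# Squared standard error of each group mean
v_before <- s_before^2 / n_before
v_after  <- s_after^2 / n_after

# Welch t statistic: difference in means over its standard error
t_stat <- (m_after - m_before) / sqrt(v_before + v_after)

# Welch-Satterthwaite degrees of freedom
welch_df <- (v_before + v_after)^2 /
  (v_before^2 / (n_before - 1) + v_after^2 / (n_after - 1))

# Two-sided p-value
p_two_sided <- 2 * pt(-abs(t_stat), welch_df)

round(c(t = t_stat, df = welch_df, p = p_two_sided), 3)
```

This gives roughly t = 2.34 on about 13.1 degrees of freedom, in line with the `t_test()` output. Nearly all of the standard error comes from the "after" group's large SD.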

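Pooling both shootings, a Welch two-sample t-test shows that daily views of the Gun control article were significantly higher in the 7 days after the shootings (M = 2109, SD = 2580, n = 14) than in the week before, up to and including the day of each shooting (M = 488, SD = 207, n = 16), t(13.1) = 2.34, p = .035. The estimated difference is about 1,621 views per day, 95% CI [129, 3113]. The wide interval and the much larger standard deviation after the shootings suggest that the increase comes mostly from a few high-traffic days rather than a uniform rise.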
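The prompt asks whether views were *higher* after the shootings and compares 7 days with 7 days, while the test above is two-sided and places day 0 in the "before" group. A variant sketch that matches the question more closely drops day 0 and uses a one-sided alternative, using `infer::t_test()`'s `order` and `alternative` arguments. `toy` is made-up data in the shape of `shootings2`; with the real data, `shootings2` would replace it.

```r
library(dplyr)
library(infer)

# Made-up stand-in for shootings2 (event, day, views)
toy <- tibble(
  event = rep(c("Florida", "Louisiana"), each = 15),
  day   = rep(-7:7, times = 2),
  views = c(400, 420, 390, 410, 405, 395, 415, 600,
            900, 800, 700, 650, 620, 610, 600,
            500, 520, 480, 510, 490, 505, 495, 700,
            2500, 1800, 1200, 900, 800, 750, 700)
)

res <- toy |>
  filter(day != 0) |>                    # 7 days before vs 7 days after
  mutate(after_event = day > 0) |>
  t_test(views ~ after_event,
         order = c("TRUE", "FALSE"),     # estimate = after minus before
         alternative = "greater")        # H1: views are higher after

res
```

With a one-sided alternative the upper confidence bound is infinite, so `lower_ci` is the informative end of the interval.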