Wikipedia API- TG

#devtools::install_github("ironholds/pageviews")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pageviews)        # This package gets data on Wikipedia viewing
library(DT)               # DT stands for datatable, and creates interactive tables
library(infer)            # for some stats like t_test
library(devtools)
Loading required package: usethis
gun_control <- article_pageviews(article = "Gun_control", start = as.Date("2018-02-01"), end = as.Date("2023-12-31"))

Mass_Shootings <- article_pageviews(article = "Mass_shootings_in_the_United_States", start = as.Date("2018-02-01"), end = as.Date("2023-12-31"))

glimpse(gun_control)
Rows: 2,160
Columns: 8
$ project     <chr> "wikipedia", "wikipedia", "wikipedia", "wikipedia", "wikip…
$ language    <chr> "en", "en", "en", "en", "en", "en", "en", "en", "en", "en"…
$ article     <chr> "Gun_control", "Gun_control", "Gun_control", "Gun_control"…
$ access      <chr> "all-access", "all-access", "all-access", "all-access", "a…
$ agent       <chr> "all-agents", "all-agents", "all-agents", "all-agents", "a…
$ granularity <chr> "daily", "daily", "daily", "daily", "daily", "daily", "dai…
$ date        <dttm> 2018-02-01, 2018-02-02, 2018-02-03, 2018-02-04, 2018-02-0…
$ views       <dbl> 913, 535, 297, 376, 640, 639, 599, 678, 519, 240, 319, 688…

Next I use ggplot to graph daily views of the Gun control and Mass shootings in the United States articles.

gun_control |> 
  ggplot(aes(x = date, y = views)) +
  geom_line()

This graph shows a sharp spike in views of the Gun control article in 2022. The ten busiest days are listed below.

gun_control |> 
  slice_max(views, n = 10)
     project language     article     access      agent granularity       date
1  wikipedia       en Gun_control all-access all-agents       daily 2022-05-25
2  wikipedia       en Gun_control all-access all-agents       daily 2018-02-16
3  wikipedia       en Gun_control all-access all-agents       daily 2018-02-15
4  wikipedia       en Gun_control all-access all-agents       daily 2022-05-26
5  wikipedia       en Gun_control all-access all-agents       daily 2018-02-22
6  wikipedia       en Gun_control all-access all-agents       daily 2018-02-20
7  wikipedia       en Gun_control all-access all-agents       daily 2018-02-19
8  wikipedia       en Gun_control all-access all-agents       daily 2018-02-21
9  wikipedia       en Gun_control all-access all-agents       daily 2018-02-18
10 wikipedia       en Gun_control all-access all-agents       daily 2018-02-23
   views
1   9666
2   6549
3   6117
4   5587
5   5142
6   4233
7   4229
8   3914
9   3760
10  3754

The graph below is a line graph of daily views of the Mass shootings in the United States article.

Mass_Shootings |> 
  ggplot(aes(x = date, y = views)) +
  geom_line()

This graph shows a large spike in views of the Mass shootings in the United States article in 2022.

Mass_Shootings |> 
  slice_max(views, n = 10)
     project language                             article     access      agent
1  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
2  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
3  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
4  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
5  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
6  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
7  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
8  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
9  wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
10 wikipedia       en Mass_shootings_in_the_United_States all-access all-agents
   granularity       date  views
1        daily 2022-05-25 224670
2        daily 2019-08-04 187025
3        daily 2019-08-05 140170
4        daily 2022-05-26 104011
5        daily 2019-08-06 100574
6        daily 2018-02-15  87552
7        daily 2023-10-26  83192
8        daily 2018-02-16  70186
9        daily 2022-05-27  68475
10       daily 2021-03-23  68186
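# Re-fetch the mass shootings series; this range starts on 2017-01-01, earlier than gun_control's 2018-02-01 start
Mass_Shootings <- article_pageviews(article = "Mass_shootings_in_the_United_States", start = as.Date("2017-01-01"), end = as.Date("2023-12-31"))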
guns <- bind_rows(gun_control, Mass_Shootings) 

Below is a graph of daily views for both articles, Mass shootings in the United States and Gun control.

guns |> 
  ggplot(aes(x = date, y = views, color = article)) +
  geom_line()

  1. This graph shows that the Mass shootings in the United States article was viewed far more than the Gun control article, although the overlap between the two lines makes the smaller series hard to read. As the previous graphs and tables showed, both articles spiked at the same time in 2022: each had its highest daily view count on May 25, 2022, the day after the Robb Elementary School shooting in Uvalde, Texas. The rise in attention to gun violence was followed by a push for more gun control, and the Bipartisan Safer Communities Act, a federal gun safety law, was signed on June 25, 2022.
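
Since the two series differ so much in scale, one way to make both lines readable is a log-scaled y-axis. The following is only a minimal sketch: `toy` is a small made-up data frame (hypothetical values, not real pageview counts) standing in for `guns`, and a log axis is one option among several (faceting by article would be another).

```r
library(ggplot2)

# Hypothetical stand-in for `guns`: two articles, five days each.
# The view counts are invented for illustration only.
toy <- data.frame(
  date    = rep(as.Date("2022-05-23") + 0:4, times = 2),
  article = rep(c("Gun_control", "Mass_shootings_in_the_United_States"), each = 5),
  views   = c(400, 450, 9000, 5000, 2000,
              3000, 3500, 200000, 100000, 60000)
)

# Same kind of line plot, but with a log10 y-axis so the smaller
# series is not flattened against the x-axis by the larger one.
p <- ggplot(toy, aes(x = date, y = views, color = article)) +
  geom_line() +
  scale_y_log10(labels = scales::comma)

p
```

On a log axis, equal vertical distances mean equal ratios, so a tenfold jump looks the same size for both articles.
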
guns |> 
  pivot_wider(names_from = article, values_from = views)
# A tibble: 2,554 × 8
   project   language access   agent granularity date                Gun_control
   <chr>     <chr>    <chr>    <chr> <chr>       <dttm>                    <dbl>
 1 wikipedia en       all-acc… all-… daily       2018-02-01 00:00:00         913
 2 wikipedia en       all-acc… all-… daily       2018-02-02 00:00:00         535
 3 wikipedia en       all-acc… all-… daily       2018-02-03 00:00:00         297
 4 wikipedia en       all-acc… all-… daily       2018-02-04 00:00:00         376
 5 wikipedia en       all-acc… all-… daily       2018-02-05 00:00:00         640
 6 wikipedia en       all-acc… all-… daily       2018-02-06 00:00:00         639
 7 wikipedia en       all-acc… all-… daily       2018-02-07 00:00:00         599
 8 wikipedia en       all-acc… all-… daily       2018-02-08 00:00:00         678
 9 wikipedia en       all-acc… all-… daily       2018-02-09 00:00:00         519
10 wikipedia en       all-acc… all-… daily       2018-02-10 00:00:00         240
# ℹ 2,544 more rows
# ℹ 1 more variable: Mass_shootings_in_the_United_States <dbl>
guns |> 
  pivot_wider(names_from = article, values_from = views) |> 
  ggplot(aes(x = Mass_shootings_in_the_United_States, y = Gun_control)) +          # create scatterplot
  geom_point() +
  geom_smooth(method = lm) +                                  # create regression line
  labs(x = "Views of the Wikipedia Mass shootings in the US article", 
       y = "Views of the Wikipedia Gun control article", 
       title = "Relationship between Wikipedia article views") 
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 394 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 394 rows containing missing values (`geom_point()`).

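  1. This scatter plot compares daily views of the Mass shootings in the United States article (x-axis) with daily views of the Gun control article (y-axis). It shows that the two are related: days with heavy traffic on one article tend to be days with heavy traffic on the other. Most points are clumped together at low view counts and spread out as view counts increase. The 394 rows dropped in the warnings are days that appear in the Mass shootings pull but not in the Gun control pull, which only starts on 2018-02-01.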
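
The strength of a relationship like this can be summarized with a Pearson correlation coefficient. The sketch below is an illustration under assumptions, not part of the original analysis: `toy` is an invented data frame in the same long format as `guns`, with one extra day for the mass shootings article so that the widened table contains a missing value, as in the real data.

```r
library(tidyr)

# Hypothetical stand-in for `guns` (invented values). The second
# article has one extra day, so pivot_wider() will produce an NA.
toy <- data.frame(
  date    = c(as.Date("2022-05-23") + 0:4, as.Date("2022-05-22") + 0:5),
  article = rep(c("Gun_control", "Mass_shootings_in_the_United_States"),
                times = c(5, 6)),
  views   = c(400, 450, 9000, 5000, 2000,
              2800, 3000, 3500, 200000, 100000, 60000)
)

wide <- pivot_wider(toy, names_from = article, values_from = views)

# Pearson correlation; use = "complete.obs" drops days where either
# article has no value, much as ggplot dropped incomplete rows above.
r <- cor(wide$Gun_control,
         wide$Mass_shootings_in_the_United_States,
         use = "complete.obs")
r
```
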
topmichigan <- top_articles(start = as.Date("2023-2-13"))

topmichigan |> 
  select(article, views) |> 
  filter(!article == "Main_Page", !article == "Special:Search") |> 
  slice_max(views, n = 10) |> 
  datatable() 
topmaryland <- top_articles(start = as.Date("2023-7-2"))

topmaryland |> 
  select(article, views) |> 
  filter(!article == "Main_Page", !article == "Special:Search") |> 
  slice_max(views, n = 10) |> 
  datatable() 
topmichigan |> 
  select(article, views) |> 
  filter(!article == "Main_Page", !article == "Special:Search") |> 
  top_n(10, views) |> 
  ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
  geom_col(fill = "pink") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Number of Views", x = "Article", title = "Top Wikipedia articles, FEB. 13, 2023")

    1. This graph shows that on February 13, 2023, people looked up the Super Bowl, its players, and the halftime show more than the mass shooting in Michigan. A quick internet search explains why: the Super Bowl was played on February 12, so the next day's traffic was dominated by it. This suggests that when several events compete for attention, the biggest one draws the most views.
topmaryland |> 
  select(article, views) |> 
  filter(!article == "Main_Page", !article == "Special:Search") |> 
  top_n(10, views) |> 
  ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
  geom_col(fill = "purple") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Number of Views", x = "Article", title = "Top Wikipedia articles, JULY 2, 2023")

    1. This graph shows that the most viewed article that day was The Idol. Again, people look at whatever is most popular that day. The only article in the list relevant to gun control was Deaths in 2023, and it was on the lower end of the views.
# Gun control article views for the 7 days before and after 2023-07-02
# (an earlier pull of "Mass_shootings" into the same variable was overwritten by this call, so it is dropped)
Maryland <- article_pageviews(article = "Gun_control",
                           start = as.Date("2023-6-25"),
                           end = as.Date("2023-7-9"))

Maryland <- Maryland |> 
  mutate(day = -7:7) |> 
  mutate(event = "Maryland")

Maryland |> 
  ggplot(aes(x = day, y = views)) +
  geom_line()

Michigan <- article_pageviews(article = "Gun_control",
                           start = as.Date("2023-2-15"),
                           end = as.Date("2023-3-1"))

Michigan <- Michigan |> 
  mutate(day = -7:7) |> 
  mutate(event = "Michigan")

Michigan |> 
  ggplot(aes(x = day, y = views)) +
  geom_line()

shootings <- bind_rows(Maryland, Michigan) 

shootings |> 
  ggplot(aes(x = day, y = views, color = event)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Days before/after Shooting", 
       y = "Wikipedia Views", 
       color = "Event", 
       title = "Views of the Wikipedia Gun Control Article before and after Two Mass Shootings")

  1. This graph covers the 7 days before each shooting, the day of the shooting (day 0), and the 7 days after, for a total of 15 days per event. The Gun control article had fewer views in the Maryland window than in the Michigan window on every day. Maryland views stayed flat, between about 290 and 375 per day, with no visible reaction to the shooting. Michigan views were higher throughout (roughly 415 to 665 per day), rose on day -1, and spiked again on days 5 to 7. Note that the Michigan window as coded runs from February 15 to March 1, 2023, so its day 0 is February 22 rather than February 13, the date of the shooting used for the top-articles table above; all of the Michigan days therefore fall after that date.
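
Hard-coding `day = -7:7` only lines up with the event if the start and end dates are centered on it. One alternative, sketched here under assumptions, is to compute the offset from the event date directly. `add_event_day()` is a hypothetical helper written for this illustration (it is not part of the pageviews package), and `toy` is an invented stand-in for an `article_pageviews()` result.

```r
library(dplyr)

# Hypothetical helper: label each row with its offset in days from an event.
# `event_date` is supplied by the analyst; it is not returned by the API.
add_event_day <- function(df, event_date, event_name) {
  df |>
    mutate(day   = as.integer(as.Date(date) - as.Date(event_date)),
           event = event_name)
}

# Invented stand-in for a pageviews result: 15 daily rows with a
# date-time `date` column and made-up view counts.
toy <- data.frame(
  date  = as.POSIXct("2023-06-25", tz = "UTC") + 86400 * 0:14,
  views = 300 + 0:14
)

labelled <- add_event_day(toy, event_date = "2023-07-02", event_name = "Maryland")
range(labelled$day)
```

With this approach a window that is not centered on the event produces day values that visibly do not run from -7 to 7, which makes a date mismatch easy to spot.
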
shootings |> 
  mutate(after_event = (day > 0)) 
     project language     article     access      agent granularity       date
1  wikipedia       en Gun_control all-access all-agents       daily 2023-06-25
2  wikipedia       en Gun_control all-access all-agents       daily 2023-06-26
3  wikipedia       en Gun_control all-access all-agents       daily 2023-06-27
4  wikipedia       en Gun_control all-access all-agents       daily 2023-06-28
5  wikipedia       en Gun_control all-access all-agents       daily 2023-06-29
6  wikipedia       en Gun_control all-access all-agents       daily 2023-06-30
7  wikipedia       en Gun_control all-access all-agents       daily 2023-07-01
8  wikipedia       en Gun_control all-access all-agents       daily 2023-07-02
9  wikipedia       en Gun_control all-access all-agents       daily 2023-07-03
10 wikipedia       en Gun_control all-access all-agents       daily 2023-07-04
11 wikipedia       en Gun_control all-access all-agents       daily 2023-07-05
12 wikipedia       en Gun_control all-access all-agents       daily 2023-07-06
13 wikipedia       en Gun_control all-access all-agents       daily 2023-07-07
14 wikipedia       en Gun_control all-access all-agents       daily 2023-07-08
15 wikipedia       en Gun_control all-access all-agents       daily 2023-07-09
16 wikipedia       en Gun_control all-access all-agents       daily 2023-02-15
17 wikipedia       en Gun_control all-access all-agents       daily 2023-02-16
18 wikipedia       en Gun_control all-access all-agents       daily 2023-02-17
19 wikipedia       en Gun_control all-access all-agents       daily 2023-02-18
20 wikipedia       en Gun_control all-access all-agents       daily 2023-02-19
21 wikipedia       en Gun_control all-access all-agents       daily 2023-02-20
22 wikipedia       en Gun_control all-access all-agents       daily 2023-02-21
23 wikipedia       en Gun_control all-access all-agents       daily 2023-02-22
24 wikipedia       en Gun_control all-access all-agents       daily 2023-02-23
25 wikipedia       en Gun_control all-access all-agents       daily 2023-02-24
26 wikipedia       en Gun_control all-access all-agents       daily 2023-02-25
27 wikipedia       en Gun_control all-access all-agents       daily 2023-02-26
28 wikipedia       en Gun_control all-access all-agents       daily 2023-02-27
29 wikipedia       en Gun_control all-access all-agents       daily 2023-02-28
30 wikipedia       en Gun_control all-access all-agents       daily 2023-03-01
   views day    event after_event
1    315  -7 Maryland       FALSE
2    364  -6 Maryland       FALSE
3    318  -5 Maryland       FALSE
4    331  -4 Maryland       FALSE
5    371  -3 Maryland       FALSE
6    322  -2 Maryland       FALSE
7    343  -1 Maryland       FALSE
8    335   0 Maryland       FALSE
9    336   1 Maryland        TRUE
10   373   2 Maryland        TRUE
11   308   3 Maryland        TRUE
12   360   4 Maryland        TRUE
13   330   5 Maryland        TRUE
14   304   6 Maryland        TRUE
15   293   7 Maryland        TRUE
16   651  -7 Michigan       FALSE
17   590  -6 Michigan       FALSE
18   532  -5 Michigan       FALSE
19   419  -4 Michigan       FALSE
20   440  -3 Michigan       FALSE
21   507  -2 Michigan       FALSE
22   597  -1 Michigan       FALSE
23   508   0 Michigan       FALSE
24   524   1 Michigan        TRUE
25   482   2 Michigan        TRUE
26   415   3 Michigan        TRUE
27   439   4 Michigan        TRUE
28   665   5 Michigan        TRUE
29   636   6 Michigan        TRUE
30   614   7 Michigan        TRUE
shootings |> 
  mutate(after_event = (day > 0)) |> 
  t_test(views ~ after_event)
Warning: The statistic is based on a difference or ratio; by default, for
difference-based statistics, the explanatory variable is subtracted in the
order "TRUE" - "FALSE", or divided in the order "TRUE" / "FALSE" for
ratio-based statistics. To specify this order yourself, supply `order =
c("TRUE", "FALSE")`.
# A tibble: 1 × 7
  statistic  t_df p_value alternative estimate lower_ci upper_ci
      <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
1   0.00615  26.2   0.995 two.sided      0.277    -92.2     92.7
shootings |> 
  mutate(after_event = (day > 0)) |> 
  group_by(after_event) |> 
  summarize(Mean = mean(views),
            StdDev = sd(views),
            N = n())
# A tibble: 2 × 4
  after_event  Mean StdDev     N
  <lgl>       <dbl>  <dbl> <int>
1 FALSE        434.   114.    16
2 TRUE         434.   130.    14
  1. This t-test checks whether views of the Gun control article changed after the two shootings. Because after_event is defined as day > 0, the "before" group includes day 0. The average number of daily views of the Wikipedia Gun control article in the 8 days up to and including the two shootings (M = 433.94, SD = 114, n = 16) was not statistically significantly different from the average number of daily views in the 7 days after the shootings (M = 434.21, SD = 130, n = 14), t(26.18) = 0.0062, p = 0.995, 95% CI [-92.2, 92.7].
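
As a cross-check that does not need the API, the same comparison can be reproduced with base R's `t.test()`. This is a sketch using the view counts copied from the 30-row table printed above. The `infer` call in the analysis is the original method; base R is swapped in here only to keep the example self-contained.

```r
# Views copied from the 30-row table above (days -7..7 for each event).
maryland <- c(315, 364, 318, 331, 371, 322, 343, 335,
              336, 373, 308, 360, 330, 304, 293)
michigan <- c(651, 590, 532, 419, 440, 507, 597, 508,
              524, 482, 415, 439, 665, 636, 614)

shootings <- data.frame(
  views = c(maryland, michigan),
  day   = rep(-7:7, times = 2)
)
# Day 0 is FALSE here, so it lands in the "before" group (16 vs 14 rows).
shootings$after_event <- shootings$day > 0

# Welch two-sample t-test (base R default: var.equal = FALSE).
# Base R subtracts in the order FALSE - TRUE, so the sign of the
# statistic is flipped relative to the infer output.
res <- t.test(views ~ after_event, data = shootings)
res$estimate   # group means
res$p.value
```
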