Wikipedia API - TG
# devtools::install_github("ironholds/pageviews")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pageviews) # This package gets data on Wikipedia viewing
library(DT) # DT stands for datatable, and creates interactive tables
library(infer) # for some stats like t_test
library(devtools)
Loading required package: usethis
gun_control <- article_pageviews(article = "Gun_control", start = as.Date("2018-02-01"), end = as.Date("2023-12-31"))
Mass_Shootings <- article_pageviews(article = "Mass_shootings_in_the_United_States", start = as.Date("2018-02-01"), end = as.Date("2023-12-31"))
glimpse(gun_control)
Rows: 2,160
Columns: 8
$ project <chr> "wikipedia", "wikipedia", "wikipedia", "wikipedia", "wikip…
$ language <chr> "en", "en", "en", "en", "en", "en", "en", "en", "en", "en"…
$ article <chr> "Gun_control", "Gun_control", "Gun_control", "Gun_control"…
$ access <chr> "all-access", "all-access", "all-access", "all-access", "a…
$ agent <chr> "all-agents", "all-agents", "all-agents", "all-agents", "a…
$ granularity <chr> "daily", "daily", "daily", "daily", "daily", "daily", "dai…
$ date <dttm> 2018-02-01, 2018-02-02, 2018-02-03, 2018-02-04, 2018-02-0…
$ views <dbl> 913, 535, 297, 376, 640, 639, 599, 678, 519, 240, 319, 688…
Next, I use ggplot to graph daily views of the Gun control and Mass shootings in the United States articles.
gun_control |>
  ggplot(aes(x = date, y = views)) +
  geom_line()
This graph shows a spike in views of the Gun control article in 2022.
gun_control |>
  slice_max(views, n = 10)
   project language article access agent granularity date
1 wikipedia en Gun_control all-access all-agents daily 2022-05-25
2 wikipedia en Gun_control all-access all-agents daily 2018-02-16
3 wikipedia en Gun_control all-access all-agents daily 2018-02-15
4 wikipedia en Gun_control all-access all-agents daily 2022-05-26
5 wikipedia en Gun_control all-access all-agents daily 2018-02-22
6 wikipedia en Gun_control all-access all-agents daily 2018-02-20
7 wikipedia en Gun_control all-access all-agents daily 2018-02-19
8 wikipedia en Gun_control all-access all-agents daily 2018-02-21
9 wikipedia en Gun_control all-access all-agents daily 2018-02-18
10 wikipedia en Gun_control all-access all-agents daily 2018-02-23
views
1 9666
2 6549
3 6117
4 5587
5 5142
6 4233
7 4229
8 3914
9 3760
10 3754
The graph below is a line graph of daily views of the Mass shootings in the United States article.
Mass_Shootings |>
  ggplot(aes(x = date, y = views)) +
  geom_line()
This graph shows a large spike in views of the Mass shootings in the United States article in 2022.
Mass_Shootings |>
  slice_max(views, n = 10)
   project language article access agent
1 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
2 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
3 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
4 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
5 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
6 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
7 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
8 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
9 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
10 wikipedia en Mass_shootings_in_the_United_States all-access all-agents
granularity date views
1 daily 2022-05-25 224670
2 daily 2019-08-04 187025
3 daily 2019-08-05 140170
4 daily 2022-05-26 104011
5 daily 2019-08-06 100574
6 daily 2018-02-15 87552
7 daily 2023-10-26 83192
8 daily 2018-02-16 70186
9 daily 2022-05-27 68475
10 daily 2021-03-23 68186
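# This re-download starts in 2017, earlier than gun_control (2018-02-01),
# so the combined data has dates with no Gun_control value.
Mass_Shootings <- article_pageviews(article = "Mass shootings in the United States", start = as.Date("2017-1-1"), end = as.Date("2023-12-31"))
guns <- bind_rows(gun_control, Mass_Shootings)
Below is a graph of daily views of both articles, Mass shootings in the United States and Gun control.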
guns |>
  ggplot(aes(x = date, y = views, color = article)) +
  geom_line()
- This graph shows that the Mass shootings in the United States article was viewed far more than the Gun control article, although the detail is hard to make out because the two lines overlap. As the previous graphs and top-ten tables show, both articles spiked at the same time in 2022, with the highest day for each on May 25, 2022. News coverage from that period reports mass shootings, and the rise in gun violence led people to push for more gun control. Gun safety legislation was passed later in 2022.
guns |>
  pivot_wider(names_from = article, values_from = views)
# A tibble: 2,554 × 8
project language access agent granularity date Gun_control
<chr> <chr> <chr> <chr> <chr> <dttm> <dbl>
1 wikipedia en all-acc… all-… daily 2018-02-01 00:00:00 913
2 wikipedia en all-acc… all-… daily 2018-02-02 00:00:00 535
3 wikipedia en all-acc… all-… daily 2018-02-03 00:00:00 297
4 wikipedia en all-acc… all-… daily 2018-02-04 00:00:00 376
5 wikipedia en all-acc… all-… daily 2018-02-05 00:00:00 640
6 wikipedia en all-acc… all-… daily 2018-02-06 00:00:00 639
7 wikipedia en all-acc… all-… daily 2018-02-07 00:00:00 599
8 wikipedia en all-acc… all-… daily 2018-02-08 00:00:00 678
9 wikipedia en all-acc… all-… daily 2018-02-09 00:00:00 519
10 wikipedia en all-acc… all-… daily 2018-02-10 00:00:00 240
# ℹ 2,544 more rows
# ℹ 1 more variable: Mass_shootings_in_the_United_States <dbl>
guns |>
  pivot_wider(names_from = article, values_from = views) |>
  ggplot(aes(x = Mass_shootings_in_the_United_States, y = Gun_control)) + # create scatterplot
  geom_point() +
  geom_smooth(method = lm) + # create regression line
  labs(x = "Views of the Wikipedia Mass shootings in the US article",
       y = "Views of the Wikipedia Gun control article",
       title = "Relationship between Wikipedia article views")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 394 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 394 rows containing missing values (`geom_point()`).
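- Here is a scatter plot of daily views of the Mass shootings in the United States article against daily views of the Gun control article. The points are tightly clustered at low view counts and spread out as views increase, and the plot suggests that views of the two articles are correlated. The warnings report 394 rows removed: these are the dates for which the combined data has a Mass shootings value but no Gun control value (2,554 rows in the wide table versus 2,160 rows of Gun control data).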
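To put a number on the relationship described above, one option is a Pearson correlation computed only on days where both articles have a value. The sketch below is not part of the original analysis: the data frame `toy` and its values are made up for illustration, and it uses base R so that it runs without any packages. It also shows where a "rows removed" count like the one in the warnings comes from.

```r
# Toy data (hypothetical values): two days have no Gun control count,
# mimicking dates that exist for only one of the two articles.
toy <- data.frame(
  mass_shootings = c(1000, 2000, 3000, 4000, 5000, 6000),
  gun_control    = c(NA,   NA,   310,  420,  480,  610)
)

# Rows with a missing value in either column; these are the rows that
# a scatterplot or a regression line would drop.
n_dropped <- sum(!complete.cases(toy))

# Pearson correlation using only the complete rows.
r <- cor(toy$mass_shootings, toy$gun_control, use = "complete.obs")

# lm() also drops incomplete rows by default (na.action = na.omit).
fit <- lm(gun_control ~ mass_shootings, data = toy)

n_dropped
r
nobs(fit)
```

With the real data, the same calls could be applied to the `Mass_shootings_in_the_United_States` and `Gun_control` columns of the wide table, assuming it has been saved to an object first.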
topmichigan <- top_articles(start = as.Date("2023-2-13"))
topmichigan |>
  select(article, views) |>
  filter(article != "Main_Page", article != "Special:Search") |>
  slice_max(views, n = 10) |>
  datatable()
topmaryland <- top_articles(start = as.Date("2023-7-2"))
topmaryland |>
  select(article, views) |>
  filter(article != "Main_Page", article != "Special:Search") |>
  slice_max(views, n = 10) |>
  datatable()
topmichigan |>
  select(article, views) |>
  filter(article != "Main_Page", article != "Special:Search") |>
  top_n(10, views) |>
  ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
  geom_col(fill = "pink") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Number of Views", x = "Article", title = "Top Wikipedia articles, FEB. 13, 2023")
- This graph shows that on this date people looked up the Super Bowl, its players, and the halftime show more than the mass shooting in Michigan. A quick internet search explains why: the Super Bowl was on Feb. 12, so the next day it drew more views than anything else. This suggests that when several events compete for attention, the bigger and more popular one dominates what people look up.
topmaryland |>
  select(article, views) |>
  filter(article != "Main_Page", article != "Special:Search") |>
  top_n(10, views) |>
  ggplot(aes(x = fct_rev(as_factor(article)), y = views)) +
  geom_col(fill = "purple") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Number of Views", x = "Article", title = "Top Wikipedia articles, JULY 2, 2023")
- This graph shows that the most viewed article that day was The Idol. Again, what people look at depends on what is most popular that day. The only article in the top ten with any relevance to gun control was Deaths in 2023, and it was on the lower end of the views.
# This first download is overwritten by the next assignment, so only the
# Gun_control data is used below.
Maryland <- article_pageviews(article = "Mass_shootings",
                              start = as.Date("2023-6-25"),
                              end = as.Date("2023-7-9"))
Maryland <- article_pageviews(article = "Gun_control",
                              start = as.Date("2023-6-25"),
                              end = as.Date("2023-7-9"))
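Maryland <- Maryland |>
  mutate(day = -7:7) |> # day 0 is 2023-07-02
  mutate(event = "Maryland")
Maryland |>
  ggplot(aes(x = day, y = views)) +
  geom_line()
# Note: day 0 of this window is 2023-02-22, not the 2023-02-13 date used
# for top_articles() above.
Michigan <- article_pageviews(article = "Gun_control",
                              start = as.Date("2023-2-15"),
                              end = as.Date("2023-3-1"))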
Michigan <- Michigan |>
  mutate(day = -7:7) |>
  mutate(event = "Michigan")
Michigan |>
  ggplot(aes(x = day, y = views)) +
  geom_line()
shootings <- bind_rows(Maryland, Michigan)
shootings |>
  ggplot(aes(x = day, y = views, color = event)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Days before/after Shooting",
       y = "Wikipedia Views",
       color = "Event",
       title = "Views of the Wikipedia Gun Control Article before and after Two Mass Shootings")
- This graph covers the 7 days before each shooting, the day of the shooting, and the 7 days after, for a total of 15 days per event. Views of the Gun control article were lower in the Maryland window than in the Michigan window, both before and after day 0. Maryland's views stayed fairly flat across the whole window. Michigan's views varied more, with a rise on day -1 and a larger rise from day 5 onward. Note that the Michigan data was downloaded for February 15 to March 1, 2023, so its day 0 is February 22 rather than the February 13 date used for the top-articles table, and the Michigan curve should be read with that in mind.
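One way to avoid hard-coding `-7:7` is to derive the day offsets from the event date itself, so the download window and the `day` column cannot drift apart. The sketch below is an illustration only: `event_window()` is a hypothetical helper (not part of the pageviews package), and the Maryland event date of 2023-07-02 is taken from the `top_articles()` call above.

```r
# Hypothetical helper: build a symmetric window of dates around an event
# date, with `day` counted relative to that date.
event_window <- function(event_date, half_width = 7) {
  event_date <- as.Date(event_date)
  dates <- seq(event_date - half_width, event_date + half_width, by = "day")
  data.frame(
    date = dates,
    day  = as.integer(dates - event_date)  # negative before, 0 on the event day
  )
}

maryland_window <- event_window("2023-07-02")

# The first and last dates are the start and end to request from the API.
range(maryland_window$date)
nrow(maryland_window)
```

The first and last dates of the window could then be passed as `start` and `end` to `article_pageviews()`, and the `day` column joined back on by date instead of being assigned by row position.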
shootings |>
  mutate(after_event = (day > 0))
   project language article access agent granularity date
1 wikipedia en Gun_control all-access all-agents daily 2023-06-25
2 wikipedia en Gun_control all-access all-agents daily 2023-06-26
3 wikipedia en Gun_control all-access all-agents daily 2023-06-27
4 wikipedia en Gun_control all-access all-agents daily 2023-06-28
5 wikipedia en Gun_control all-access all-agents daily 2023-06-29
6 wikipedia en Gun_control all-access all-agents daily 2023-06-30
7 wikipedia en Gun_control all-access all-agents daily 2023-07-01
8 wikipedia en Gun_control all-access all-agents daily 2023-07-02
9 wikipedia en Gun_control all-access all-agents daily 2023-07-03
10 wikipedia en Gun_control all-access all-agents daily 2023-07-04
11 wikipedia en Gun_control all-access all-agents daily 2023-07-05
12 wikipedia en Gun_control all-access all-agents daily 2023-07-06
13 wikipedia en Gun_control all-access all-agents daily 2023-07-07
14 wikipedia en Gun_control all-access all-agents daily 2023-07-08
15 wikipedia en Gun_control all-access all-agents daily 2023-07-09
16 wikipedia en Gun_control all-access all-agents daily 2023-02-15
17 wikipedia en Gun_control all-access all-agents daily 2023-02-16
18 wikipedia en Gun_control all-access all-agents daily 2023-02-17
19 wikipedia en Gun_control all-access all-agents daily 2023-02-18
20 wikipedia en Gun_control all-access all-agents daily 2023-02-19
21 wikipedia en Gun_control all-access all-agents daily 2023-02-20
22 wikipedia en Gun_control all-access all-agents daily 2023-02-21
23 wikipedia en Gun_control all-access all-agents daily 2023-02-22
24 wikipedia en Gun_control all-access all-agents daily 2023-02-23
25 wikipedia en Gun_control all-access all-agents daily 2023-02-24
26 wikipedia en Gun_control all-access all-agents daily 2023-02-25
27 wikipedia en Gun_control all-access all-agents daily 2023-02-26
28 wikipedia en Gun_control all-access all-agents daily 2023-02-27
29 wikipedia en Gun_control all-access all-agents daily 2023-02-28
30 wikipedia en Gun_control all-access all-agents daily 2023-03-01
views day event after_event
1 315 -7 Maryland FALSE
2 364 -6 Maryland FALSE
3 318 -5 Maryland FALSE
4 331 -4 Maryland FALSE
5 371 -3 Maryland FALSE
6 322 -2 Maryland FALSE
7 343 -1 Maryland FALSE
8 335 0 Maryland FALSE
9 336 1 Maryland TRUE
10 373 2 Maryland TRUE
11 308 3 Maryland TRUE
12 360 4 Maryland TRUE
13 330 5 Maryland TRUE
14 304 6 Maryland TRUE
15 293 7 Maryland TRUE
16 651 -7 Michigan FALSE
17 590 -6 Michigan FALSE
18 532 -5 Michigan FALSE
19 419 -4 Michigan FALSE
20 440 -3 Michigan FALSE
21 507 -2 Michigan FALSE
22 597 -1 Michigan FALSE
23 508 0 Michigan FALSE
24 524 1 Michigan TRUE
25 482 2 Michigan TRUE
26 415 3 Michigan TRUE
27 439 4 Michigan TRUE
28 665 5 Michigan TRUE
29 636 6 Michigan TRUE
30 614 7 Michigan TRUE
shootings |>
  mutate(after_event = (day > 0)) |>
  t_test(views ~ after_event)
Warning: The statistic is based on a difference or ratio; by default, for
difference-based statistics, the explanatory variable is subtracted in the
order "TRUE" - "FALSE", or divided in the order "TRUE" / "FALSE" for
ratio-based statistics. To specify this order yourself, supply `order =
c("TRUE", "FALSE")`.
# A tibble: 1 × 7
statistic t_df p_value alternative estimate lower_ci upper_ci
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 0.00615 26.2 0.995 two.sided 0.277 -92.2 92.7
shootings |>
  mutate(after_event = (day > 0)) |>
  group_by(after_event) |>
  summarize(Mean = mean(views),
            StdDev = sd(views),
            N = n())
# A tibble: 2 × 4
after_event Mean StdDev N
<lgl> <dbl> <dbl> <int>
1 FALSE 434. 114. 16
2 TRUE 434. 130. 14
- This t-test checks whether views of the Gun control article changed after the shootings. The average number of daily views of the Wikipedia Gun control article in the 7 days before and the day of the two shootings (M = 433.94, SD = 114, N = 16) was not statistically significantly different from the average number of daily views in the 7 days after the shootings (M = 434.21, SD = 130, N = 14), t(26.2) = 0.006, p = 0.995.
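To make the reported statistic easier to follow, the sketch below recomputes it by hand in base R. The view counts are copied from the printed `shootings` table above. The use of Welch's unequal-variance formula is an assumption, based on the fractional degrees of freedom in the output. The variable names are chosen here for illustration.

```r
# Daily views copied from the printed table (days -7 to 7 for each event).
maryland <- c(315, 364, 318, 331, 371, 322, 343, 335,
              336, 373, 308, 360, 330, 304, 293)
michigan <- c(651, 590, 532, 419, 440, 507, 597, 508,
              524, 482, 415, 439, 665, 636, 614)

views <- c(maryland, michigan)
day <- rep(-7:7, times = 2)
after_event <- day > 0  # day 0 counts as "before", as in the original code

before <- views[!after_event]
after  <- views[after_event]

# Squared standard error contributed by each group.
se2_before <- var(before) / length(before)
se2_after  <- var(after) / length(after)

# Welch's t: difference in means ("TRUE" - "FALSE") over its standard error.
estimate <- mean(after) - mean(before)
t_stat <- estimate / sqrt(se2_before + se2_after)

# Welch-Satterthwaite approximation for the degrees of freedom.
welch_df <- (se2_before + se2_after)^2 /
  (se2_before^2 / (length(before) - 1) + se2_after^2 / (length(after) - 1))

# Two-sided p-value.
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

# Cross-check against base R's t.test(), which uses Welch's test by default.
check <- t.test(after, before)
```

A difference of well under one view per day, against a standard error of roughly 45 views, is why the statistic is so close to zero.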