Introduction

Magic Kingdom is widely considered the best theme park at Walt Disney World Resort for families visiting with children. When families visit often depends on their children's school schedule: many parents want their children to miss as little school as possible while still enjoying their time at the happiest place on earth. As a result, Magic Kingdom has historically drawn larger crowds when schools are out for holidays, which shows up as longer wait times for attractions at the park. In other words, it is highly plausible that the wait times for attractions at Magic Kingdom on a particular day are tied to the proportion of schools in session in the U.S. on that day.

My study applies statistical methods, performing multiple simple linear regressions to investigate the relationship between the average wait times for the top 5 most popular attractions at Magic Kingdom and the proportions of schools in session in different regions of the U.S. The analysis uses data gathered from the website touringplans.com (https://touringplans.com/walt-disney-world/crowd-calendar), which covers many popular attractions at the Walt Disney World Resort. Since the Walt Disney World Resort is located in Orlando, Florida, my project also explores whether the proportion of schools in session in Florida affects the average wait times for the top 5 attractions at Magic Kingdom.

According to a recent survey that asked participants to name their favorite ride across all four theme parks at Disney World and their favorite ride at each park, Space Mountain, Haunted Mansion, Big Thunder Mountain, Pirates of the Caribbean, and Splash Mountain are the top 5 most popular attractions at Magic Kingdom (https://www.prnewswire.com/news-releases/latest-survey-ranks-the-most-popular-rides-at-walt-disney-world-across-every-state-301262965.html).

library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)
library(gridExtra)

Cleaning Data

Dataset data

The original dataset is very complex, with 190 variables. It is therefore reduced to the variables relevant to the project topic: Date, Day of Week, Month, Year, Total Hours, five columns giving the percentage of schools in session in five U.S. regions (Midwest, Middle Atlantic, New England, Northwest, Southwest), and one column called FL giving the percentage of schools in session in Florida. All percentages of schools in session are converted in Excel into proportions with two decimal places. After the dataset is imported into R under the name data, the na.omit function is used to exclude all rows with missing values.

library(readr)
data <- read_csv("Final Project/cleaned main data.csv")
View(data)

na.omit(data) -> data
mdy(data$Date) -> data$Date
View(data)
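As a rough illustration, the percent-to-proportion conversion and the removal of missing rows could also be done entirely in R. This is only a sketch on a toy table: the "85%" text format, the toy values, and the helper name to_prop are assumptions for illustration, not part of the original workflow.

```r
# Toy stand-in for the raw schedule table; the values and the "85%" text
# format are assumptions for illustration only.
raw <- data.frame(
  Date    = c("01/01/2015", "01/02/2015", "01/03/2015"),
  Midwest = c("85%", "100%", NA),
  FL      = c("7%", "93%", "50%")
)

# Hypothetical helper: strip the percent sign, rescale to 0-1, and
# round to two decimal places.
to_prop <- function(x) {
  round(as.numeric(sub("%", "", x, fixed = TRUE)) / 100, 2)
}

raw$Midwest <- to_prop(raw$Midwest)
raw$FL      <- to_prop(raw$FL)

# Drop every row that still contains a missing value.
schedule <- na.omit(raw)
print(schedule)
```

Doing the conversion in R rather than in Excel would keep every cleaning step in one script, which makes the analysis easier to rerun.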

Here is a list of states in different regions.

Midwest: Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin

Middle Atlantic: New York, New Jersey, Pennsylvania

New England: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont

Northwest: Oregon, Washington, Idaho, Montana, Wyoming

Southwest: Arizona, New Mexico, California, Colorado, Nevada, Utah, Oklahoma, Texas

Note: In this project, Middle Atlantic contains only New York, New Jersey, and Pennsylvania.

Attractions datasets

All five datasets for the top 5 attractions at Magic Kingdom are first cleaned in Excel before being imported into R, in the following steps: deleting the column SPOSTMIN, which represents the posted standby wait time in minutes; renaming the column SACTMIN, which gives the actual wait time for the attraction in minutes, to waittime; deleting all rows with empty cells in the waittime column; and deleting all rows with the value -999 in the waittime column, which means the attraction was closed.
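The same Excel steps could be sketched in R with dplyr. The toy table below is made up for illustration; only the column names SPOSTMIN and SACTMIN and the -999 code come from the description above.

```r
library(dplyr)

# Toy stand-in for one raw attraction file (values are made up).
raw <- data.frame(
  date     = c("01/01/2015", "01/01/2015", "01/02/2015", "01/02/2015"),
  SACTMIN  = c(30, NA, -999, 45),
  SPOSTMIN = c(NA, 40, NA, NA)
)

cleaned <- raw %>%
  select(-SPOSTMIN) %>%            # drop the posted standby wait time
  rename(waittime = SACTMIN) %>%   # actual wait time in minutes
  filter(!is.na(waittime),         # drop rows with an empty wait time
         waittime != -999)         # drop rows marking a closed attraction

print(cleaned)
```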

Further cleaning and organizing of the five datasets is performed in R. The group_by function groups each dataset by date, and the mutate function creates a new variable, wait_mean, holding the average wait time per day for the attraction. A new dataset containing only the columns date and wait_mean is then created for each attraction, and the duplicated function is used to exclude all duplicate rows. The date variables in all five datasets are converted into the format yyyy-mm-dd (e.g., 2015-01-01) by using the mdy function. Because the dataset data, which contains the proportions of schools in session, only covers 2015-01-01 to 2020-12-31, the filter function excludes all 2021 data from the five attraction datasets. The separate function is then applied to create three new variables, Year, Month, and Day, from the date variable.
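Since the steps described above are identical for all five attractions, they could be wrapped in one helper. The sketch below is an alternative formulation, not the code used in this project: daily_wait is a hypothetical name, summarise replaces the mutate, select, and duplicated combination, and format replaces separate while producing the same zero-padded character columns.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: collapse one attraction table to one row per day.
# Assumes columns `date` (month/day/year text) and `waittime` (minutes).
daily_wait <- function(df, last_date = as.Date("2020-12-31")) {
  df %>%
    mutate(date = mdy(date)) %>%               # parse text into Date
    group_by(date) %>%
    summarise(wait_mean = mean(waittime),      # average wait per day
              .groups = "drop") %>%
    filter(date <= last_date) %>%              # drop everything after 2020
    mutate(Year  = format(date, "%Y"),         # character columns, like
           Month = format(date, "%m"),         # those separate() creates
           Day   = format(date, "%d"))
}

# Toy input (values are made up for illustration).
toy <- data.frame(
  date     = c("01/01/2015", "01/01/2015", "01/02/2015", "01/01/2021"),
  waittime = c(30, 50, 20, 10)
)

toy_daily <- daily_wait(toy)
print(toy_daily)
```

With such a helper, each attraction would need a single call, for example daily_wait(space_mountain), instead of a repeated block of code.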

Space Mountain

library(readr)
space_mountain <- read_csv("Final Project/space_mountain.csv")
View(space_mountain)

space_mountain %>%
  group_by(date) %>%
  mutate(wait_mean = mean(waittime)) -> space_mountain
View(space_mountain)

space_mountain %>%
select(date,wait_mean)-> space_mountain2
View(space_mountain2)

space_mountain2[!duplicated(space_mountain2),] -> space_mountain2
View(space_mountain2)

mdy(space_mountain2$date) -> space_mountain2$date
View(space_mountain2)

space_mountain2 %>% 
filter(date <= "2020-12-31") -> space_mountain2
View(space_mountain2)

space_mountain2 %>%
  separate(date, c("Year","Month","Day"),sep = '-', remove=FALSE)-> space_mountain2
View(space_mountain2)

Haunted Mansion

library(readr)
haunted_mansion <- read_csv("Final Project/haunted_mansion.csv")
View(haunted_mansion)

haunted_mansion %>%
  group_by(date) %>%
  mutate(wait_mean = mean(waittime)) -> haunted_mansion
View(haunted_mansion)

haunted_mansion %>%
  select(date,wait_mean)-> haunted_mansion2
View(haunted_mansion2)

haunted_mansion2[!duplicated(haunted_mansion2),] -> haunted_mansion2
View(haunted_mansion2)

mdy(haunted_mansion2$date) -> haunted_mansion2$date
View(haunted_mansion2)

haunted_mansion2 %>% 
  filter(date <= "2020-12-31") -> haunted_mansion2
View(haunted_mansion2)

haunted_mansion2 %>%
  separate(date, c("Year","Month","Day"),sep = '-', remove=FALSE)-> haunted_mansion2
View(haunted_mansion2)

Pirates of the Caribbean

library(readr)
pirates <- read_csv("Final Project/pirates_of_caribbean.csv")
View(pirates)

pirates %>%
  group_by(date) %>%
  mutate(wait_mean = mean(waittime)) -> pirates
View(pirates)

pirates %>%
  select(date,wait_mean)-> pirates2
View(pirates2)

pirates2[!duplicated(pirates2),] -> pirates2
View(pirates2)

mdy(pirates2$date) -> pirates2$date
View(pirates2)

pirates2 %>% 
  filter(date <= "2020-12-31") -> pirates2
View(pirates2)

pirates2 %>%
  separate(date, c("Year","Month","Day"),sep = '-', remove=FALSE)-> pirates2
View(pirates2)

Big Thunder Mountain Railroad

library(readr)
big <- read_csv("Final Project/big_thunder_mtn.csv")
View(big)

big %>%
  group_by(date) %>%
  mutate(wait_mean = mean(waittime)) -> big
View(big)

big %>%
  select(date,wait_mean)-> big2
View(big2)

big2[!duplicated(big2),] -> big2
View(big2)

mdy(big2$date) -> big2$date
View(big2)

big2 %>% 
  filter(date <= "2020-12-31") -> big2
View(big2)

big2 %>%
  separate(date, c("Year","Month","Day"),sep = '-', remove=FALSE)-> big2
View(big2)

Splash Mountain

library(readr)
splash <- read_csv("Final Project/splash_mountain.csv")
View(splash)

splash %>%
  group_by(date) %>%
  mutate(wait_mean = mean(waittime)) -> splash
View(splash)

splash %>%
  select(date,wait_mean)-> splash2
View(splash2)

splash2[!duplicated(splash2),] -> splash2
View(splash2)

mdy(splash2$date) -> splash2$date
View(splash2)

splash2 %>% 
  filter(date <= "2020-12-31") -> splash2
View(splash2)

splash2 %>%
  separate(date, c("Year","Month","Day"),sep = '-', remove=FALSE)-> splash2
View(splash2)

Missing Values

Even though the dataset data, which holds the proportions of schools in session in the different regions and in Florida, and the five attraction datasets all start on 2015-01-01 and end on 2020-12-31, the six datasets do not have the same number of rows. This is a problem because a simple linear regression between the proportion of schools in session and the average wait time for an attraction needs one matching row per date in both datasets. Therefore, the sum function is used to count the rows for each year and, where necessary, each month, to find the missing dates and exclude them.
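As a possible shortcut to the year-by-year and month-by-month counting, the mismatched dates can be listed directly with dplyr joins. The sketch below uses two small made-up tables named school and ride in place of data and an attraction dataset; it is not the procedure used in this project.

```r
library(dplyr)

# Toy stand-ins: `school` plays the role of `data`, `ride` the role of
# one attraction table. The dates are made up for illustration.
school <- data.frame(Date = as.Date("2017-09-08") + 0:4)
ride   <- data.frame(
  date = as.Date(c("2017-09-08", "2017-09-09", "2017-09-12", "2017-09-13"))
)

# Dates in the school table that have no match in the ride table.
missing_in_ride <- anti_join(school, ride, by = c("Date" = "date"))

# Dates in the ride table that have no match in the school table.
missing_in_school <- anti_join(ride, school, by = c("date" = "Date"))

# Keep only the dates present in both tables, so rows line up one-to-one.
both <- inner_join(school, ride, by = c("Date" = "date"))

print(missing_in_ride)
print(missing_in_school)
print(nrow(both))
```

An inner join of this kind also places the predictor and the response in one table, so the regression does not have to rely on two separate datasets being in the same row order.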

Space Mountain

nrow(space_mountain2)
nrow(data)

sum(space_mountain2$Year == "2015")
sum(data$Year == "2015")
sum(space_mountain2$Year == "2016")
sum(data$Year == "2016")
sum(space_mountain2$Year == "2017")
sum(data$Year == "2017")
sum(space_mountain2$Year == "2018")
sum(data$Year == "2018")
sum(space_mountain2$Year == "2019")
sum(data$Year == "2019")
sum(space_mountain2$Year == "2020")
sum(data$Year == "2020")

sum(data$Year == "2017" & data$Month == "1")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "01")
sum(data$Year == "2017" & data$Month == "2")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "02")
sum(data$Year == "2017" & data$Month == "3")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "03")
sum(data$Year == "2017" & data$Month == "4")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "04")
sum(data$Year == "2017" & data$Month == "5")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "05")
sum(data$Year == "2017" & data$Month == "6")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "06")
sum(data$Year == "2017" & data$Month == "7")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "07")
sum(data$Year == "2017" & data$Month == "8")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "08")
sum(data$Year == "2017" & data$Month == "9")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "09")
sum(data$Year == "2017" & data$Month == "10")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "10")
sum(data$Year == "2017" & data$Month == "11")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "11")
sum(data$Year == "2017" & data$Month == "12")
sum(space_mountain2$Year == "2017" & space_mountain2$Month == "12")

sum(data$Year == "2019" & data$Month == "1")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "01")
sum(data$Year == "2019" & data$Month == "2")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "02")
sum(data$Year == "2019" & data$Month == "3")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "03")
sum(data$Year == "2019" & data$Month == "4")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "04")
sum(data$Year == "2019" & data$Month == "5")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "05")
sum(data$Year == "2019" & data$Month == "6")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "06")
sum(data$Year == "2019" & data$Month == "7")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "07")
sum(data$Year == "2019" & data$Month == "8")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "08")
sum(data$Year == "2019" & data$Month == "9")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "09")
sum(data$Year == "2019" & data$Month == "10")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "10")
sum(data$Year == "2019" & data$Month == "11")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "11")
sum(data$Year == "2019" & data$Month == "12")
sum(space_mountain2$Year == "2019" & space_mountain2$Month == "12")

sum(data$Year == "2020" & data$Month == "1")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "01")
sum(data$Year == "2020" & data$Month == "2")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "02")
sum(data$Year == "2020" & data$Month == "3")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "03")
sum(data$Year == "2020" & data$Month == "4")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "04")
sum(data$Year == "2020" & data$Month == "5")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "05")
sum(data$Year == "2020" & data$Month == "6")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "06")
sum(data$Year == "2020" & data$Month == "7")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "07")
sum(data$Year == "2020" & data$Month == "8")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "08")
sum(data$Year == "2020" & data$Month == "9")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "09")
sum(data$Year == "2020" & data$Month == "10")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "10")
sum(data$Year == "2020" & data$Month == "11")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "11")
sum(data$Year == "2020" & data$Month == "12")
sum(space_mountain2$Year == "2020" & space_mountain2$Month == "12")

datafinal1 <- subset(data, 
                    Date!="2017-09-10" & 
                    Date!="2017-09-11" & 
                    Date!="2020-06-27")
nrow(datafinal1)

space_mountain_final <- subset(space_mountain2, 
                               date!="2019-06-30")
nrow(space_mountain_final)

Haunted Mansion

nrow(haunted_mansion2)
nrow(data)

sum(haunted_mansion2$Year == "2015")
sum(haunted_mansion2$Year == "2016")
sum(haunted_mansion2$Year == "2017")
sum(haunted_mansion2$Year == "2018")
sum(haunted_mansion2$Year == "2019")
sum(haunted_mansion2$Year == "2020")

sum(data$Year == "2016" & data$Month == "1")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "01")
sum(data$Year == "2016" & data$Month == "2")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "02")
sum(data$Year == "2016" & data$Month == "3")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "03")
sum(data$Year == "2016" & data$Month == "4")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "04")
sum(data$Year == "2016" & data$Month == "5")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "05")
sum(data$Year == "2016" & data$Month == "6")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "06")
sum(data$Year == "2016" & data$Month == "7")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "07")
sum(data$Year == "2016" & data$Month == "8")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "08")
sum(data$Year == "2016" & data$Month == "9")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "09")
sum(data$Year == "2016" & data$Month == "10")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "10")
sum(data$Year == "2016" & data$Month == "11")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "11")
sum(data$Year == "2016" & data$Month == "12")
sum(haunted_mansion2$Year == "2016" & haunted_mansion2$Month == "12")

sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "01")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "02")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "03")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "04")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "05")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "06")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "07")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "08")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "09")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "10")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "11")
sum(haunted_mansion2$Year == "2017" & haunted_mansion2$Month == "12")

sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "01")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "02")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "03")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "04")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "05")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "06")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "07")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "08")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "09")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "10")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "11")
sum(haunted_mansion2$Year == "2019" & haunted_mansion2$Month == "12")

sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "01")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "02")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "03")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "04")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "05")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "06")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "07")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "08")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "09")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "10")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "11")
sum(haunted_mansion2$Year == "2020" & haunted_mansion2$Month == "12")

datafinal2 <- subset(data,
                     Date!="2016-10-07" & 
                     Date!="2016-11-28" & 
                     Date!="2016-11-29" &
                     Date!="2016-11-30" & 
                     Date!="2016-12-01" & 
                     Date!="2017-09-10" &
                     Date!="2017-09-11" &
                     Date!="2020-03-03" & 
                     Date!="2020-03-04" & 
                     Date!="2020-03-05" &
                     Date!="2020-06-27")
nrow(datafinal2)

haunted_mansion_final <- subset(haunted_mansion2, 
                               date!="2019-06-30")
nrow(haunted_mansion_final)

Pirates of the Caribbean

nrow(pirates2)
nrow(data)

sum(pirates2$Year == "2015")
sum(pirates2$Year == "2016")
sum(pirates2$Year == "2017")
sum(pirates2$Year == "2018")
sum(pirates2$Year == "2019")
sum(pirates2$Year == "2020")

sum(data$Year == "2015" & data$Month == "1")
sum(pirates2$Year == "2015" & pirates2$Month == "01")
sum(data$Year == "2015" & data$Month == "2")
sum(pirates2$Year == "2015" & pirates2$Month == "02")
sum(data$Year == "2015" & data$Month == "3")
sum(pirates2$Year == "2015" & pirates2$Month == "03")
sum(data$Year == "2015" & data$Month == "4")
sum(pirates2$Year == "2015" & pirates2$Month == "04")
sum(data$Year == "2015" & data$Month == "5")
sum(pirates2$Year == "2015" & pirates2$Month == "05")
sum(data$Year == "2015" & data$Month == "6")
sum(pirates2$Year == "2015" & pirates2$Month == "06")
sum(data$Year == "2015" & data$Month == "7")
sum(pirates2$Year == "2015" & pirates2$Month == "07")
sum(data$Year == "2015" & data$Month == "8")
sum(pirates2$Year == "2015" & pirates2$Month == "08")
sum(data$Year == "2015" & data$Month == "9")
sum(pirates2$Year == "2015" & pirates2$Month == "09")
sum(data$Year == "2015" & data$Month == "10")
sum(pirates2$Year == "2015" & pirates2$Month == "10")
sum(data$Year == "2015" & data$Month == "11")
sum(pirates2$Year == "2015" & pirates2$Month == "11")
sum(data$Year == "2015" & data$Month == "12")
sum(pirates2$Year == "2015" & pirates2$Month == "12")

sum(pirates2$Year == "2016" & pirates2$Month == "01")
sum(pirates2$Year == "2016" & pirates2$Month == "02")
sum(pirates2$Year == "2016" & pirates2$Month == "03")
sum(pirates2$Year == "2016" & pirates2$Month == "04")
sum(pirates2$Year == "2016" & pirates2$Month == "05")
sum(pirates2$Year == "2016" & pirates2$Month == "06")
sum(pirates2$Year == "2016" & pirates2$Month == "07")
sum(pirates2$Year == "2016" & pirates2$Month == "08")
sum(pirates2$Year == "2016" & pirates2$Month == "09")
sum(pirates2$Year == "2016" & pirates2$Month == "10")
sum(pirates2$Year == "2016" & pirates2$Month == "11")
sum(pirates2$Year == "2016" & pirates2$Month == "12")

sum(pirates2$Year == "2017" & pirates2$Month == "01")
sum(pirates2$Year == "2017" & pirates2$Month == "02")
sum(pirates2$Year == "2017" & pirates2$Month == "03")
sum(pirates2$Year == "2017" & pirates2$Month == "04")
sum(pirates2$Year == "2017" & pirates2$Month == "05")
sum(pirates2$Year == "2017" & pirates2$Month == "06")
sum(pirates2$Year == "2017" & pirates2$Month == "07")
sum(pirates2$Year == "2017" & pirates2$Month == "08")
sum(pirates2$Year == "2017" & pirates2$Month == "09")
sum(pirates2$Year == "2017" & pirates2$Month == "10")
sum(pirates2$Year == "2017" & pirates2$Month == "11")
sum(pirates2$Year == "2017" & pirates2$Month == "12")

sum(data$Year == "2018" & data$Month == "1")
sum(pirates2$Year == "2018" & pirates2$Month == "01")
sum(data$Year == "2018" & data$Month == "2")
sum(pirates2$Year == "2018" & pirates2$Month == "02")
sum(data$Year == "2018" & data$Month == "3")
sum(pirates2$Year == "2018" & pirates2$Month == "03")
sum(data$Year == "2018" & data$Month == "4")
sum(pirates2$Year == "2018" & pirates2$Month == "04")
sum(data$Year == "2018" & data$Month == "5")
sum(pirates2$Year == "2018" & pirates2$Month == "05")
sum(data$Year == "2018" & data$Month == "6")
sum(pirates2$Year == "2018" & pirates2$Month == "06")
sum(data$Year == "2018" & data$Month == "7")
sum(pirates2$Year == "2018" & pirates2$Month == "07")
sum(data$Year == "2018" & data$Month == "8")
sum(pirates2$Year == "2018" & pirates2$Month == "08")
sum(data$Year == "2018" & data$Month == "9")
sum(pirates2$Year == "2018" & pirates2$Month == "09")
sum(data$Year == "2018" & data$Month == "10")
sum(pirates2$Year == "2018" & pirates2$Month == "10")
sum(data$Year == "2018" & data$Month == "11")
sum(pirates2$Year == "2018" & pirates2$Month == "11")
sum(data$Year == "2018" & data$Month == "12")
sum(pirates2$Year == "2018" & pirates2$Month == "12")

sum(pirates2$Year == "2019" & pirates2$Month == "01")
sum(pirates2$Year == "2019" & pirates2$Month == "02")
sum(pirates2$Year == "2019" & pirates2$Month == "03")
sum(pirates2$Year == "2019" & pirates2$Month == "04")
sum(pirates2$Year == "2019" & pirates2$Month == "05")
sum(pirates2$Year == "2019" & pirates2$Month == "06")
sum(pirates2$Year == "2019" & pirates2$Month == "07")
sum(pirates2$Year == "2019" & pirates2$Month == "08")
sum(pirates2$Year == "2019" & pirates2$Month == "09")
sum(pirates2$Year == "2019" & pirates2$Month == "10")
sum(pirates2$Year == "2019" & pirates2$Month == "11")
sum(pirates2$Year == "2019" & pirates2$Month == "12")

sum(pirates2$Year == "2020" & pirates2$Month == "01")
sum(pirates2$Year == "2020" & pirates2$Month == "02")
sum(pirates2$Year == "2020" & pirates2$Month == "03")
sum(pirates2$Year == "2020" & pirates2$Month == "04")
sum(pirates2$Year == "2020" & pirates2$Month == "05")
sum(pirates2$Year == "2020" & pirates2$Month == "06")
sum(pirates2$Year == "2020" & pirates2$Month == "07")
sum(pirates2$Year == "2020" & pirates2$Month == "08")
sum(pirates2$Year == "2020" & pirates2$Month == "09")
sum(pirates2$Year == "2020" & pirates2$Month == "10")
sum(pirates2$Year == "2020" & pirates2$Month == "11")
sum(pirates2$Year == "2020" & pirates2$Month == "12")

data %>% 
  filter(Date < "2015-06-08" | Date > "2015-09-25") %>%
  filter(Date < "2018-02-26" | Date > "2018-02-28") %>%
  filter(Date < "2018-03-01" | Date > "2018-03-18") %>%
  subset(Date!="2015-02-23" & 
         Date!="2016-03-05" & 
         Date!="2016-10-07" &
         Date!="2017-09-10" & 
         Date!="2017-09-11" & 
         Date!="2020-06-27")-> datafinal3
View(datafinal3)
nrow(datafinal3)

pirates_final <- subset(pirates2, date!="2019-06-30")
nrow(pirates_final)

Big Thunder Mountain Railroad

nrow(big2)
nrow(data)

sum(big2$Year == "2015")
sum(big2$Year == "2016")
sum(big2$Year == "2017")
sum(big2$Year == "2018")
sum(big2$Year == "2019")
sum(big2$Year == "2020")

sum(big2$Year == "2015" & big2$Month == "01")
sum(big2$Year == "2015" & big2$Month == "02")
sum(big2$Year == "2015" & big2$Month == "03")
sum(big2$Year == "2015" & big2$Month == "04")
sum(big2$Year == "2015" & big2$Month == "05")
sum(big2$Year == "2015" & big2$Month == "06")
sum(big2$Year == "2015" & big2$Month == "07")
sum(big2$Year == "2015" & big2$Month == "08")
sum(big2$Year == "2015" & big2$Month == "09")
sum(big2$Year == "2015" & big2$Month == "10")
sum(big2$Year == "2015" & big2$Month == "11")
sum(big2$Year == "2015" & big2$Month == "12")

sum(big2$Year == "2016" & big2$Month == "01")
sum(big2$Year == "2016" & big2$Month == "02")
sum(big2$Year == "2016" & big2$Month == "03")
sum(big2$Year == "2016" & big2$Month == "04")
sum(big2$Year == "2016" & big2$Month == "05")
sum(big2$Year == "2016" & big2$Month == "06")
sum(big2$Year == "2016" & big2$Month == "07")
sum(big2$Year == "2016" & big2$Month == "08")
sum(big2$Year == "2016" & big2$Month == "09")
sum(big2$Year == "2016" & big2$Month == "10")
sum(big2$Year == "2016" & big2$Month == "11")
sum(big2$Year == "2016" & big2$Month == "12")

sum(big2$Year == "2017" & big2$Month == "01")
sum(big2$Year == "2017" & big2$Month == "02")
sum(big2$Year == "2017" & big2$Month == "03")
sum(big2$Year == "2017" & big2$Month == "04")
sum(big2$Year == "2017" & big2$Month == "05")
sum(big2$Year == "2017" & big2$Month == "06")
sum(big2$Year == "2017" & big2$Month == "07")
sum(big2$Year == "2017" & big2$Month == "08")
sum(big2$Year == "2017" & big2$Month == "09")
sum(big2$Year == "2017" & big2$Month == "10")
sum(big2$Year == "2017" & big2$Month == "11")
sum(big2$Year == "2017" & big2$Month == "12")

sum(big2$Year == "2019" & big2$Month == "01")
sum(big2$Year == "2019" & big2$Month == "02")
sum(big2$Year == "2019" & big2$Month == "03")
sum(big2$Year == "2019" & big2$Month == "04")
sum(big2$Year == "2019" & big2$Month == "05")
sum(big2$Year == "2019" & big2$Month == "06")
sum(big2$Year == "2019" & big2$Month == "07")
sum(big2$Year == "2019" & big2$Month == "08")
sum(big2$Year == "2019" & big2$Month == "09")
sum(big2$Year == "2019" & big2$Month == "10")
sum(big2$Year == "2019" & big2$Month == "11")
sum(big2$Year == "2019" & big2$Month == "12")

sum(big2$Year == "2020" & big2$Month == "01")
sum(big2$Year == "2020" & big2$Month == "02")
sum(big2$Year == "2020" & big2$Month == "03")
sum(big2$Year == "2020" & big2$Month == "04")
sum(big2$Year == "2020" & big2$Month == "05")
sum(big2$Year == "2020" & big2$Month == "06")
sum(big2$Year == "2020" & big2$Month == "07")
sum(big2$Year == "2020" & big2$Month == "08")
sum(big2$Year == "2020" & big2$Month == "09")
sum(big2$Year == "2020" & big2$Month == "10")
sum(big2$Year == "2020" & big2$Month == "11")
sum(big2$Year == "2020" & big2$Month == "12")

data %>% 
  filter(Date < "2015-04-27"|Date > "2015-04-30") %>%
  filter(Date < "2016-08-08"|Date > "2016-11-18") %>%
  subset(Date!="2017-09-10" & 
         Date!="2017-09-11" & 
         Date!="2019-09-18" &
         Date!="2020-06-27" & 
         Date!="2020-07-08") -> datafinal4
View(datafinal4)
nrow(datafinal4)

big_final <- subset(big2, date!="2019-06-30")
nrow(big_final)

Splash Mountain

nrow(splash2)
nrow(data)

sum(splash2$Year == "2015")
sum(splash2$Year == "2016")
sum(splash2$Year == "2017")
sum(splash2$Year == "2018")
sum(splash2$Year == "2019")
sum(splash2$Year == "2020")

sum(splash2$Year == "2015" & splash2$Month == "01")
sum(splash2$Year == "2015" & splash2$Month == "02")
sum(splash2$Year == "2015" & splash2$Month == "03")
sum(splash2$Year == "2015" & splash2$Month == "04")
sum(splash2$Year == "2015" & splash2$Month == "05")
sum(splash2$Year == "2015" & splash2$Month == "06")
sum(splash2$Year == "2015" & splash2$Month == "07")
sum(splash2$Year == "2015" & splash2$Month == "08")
sum(splash2$Year == "2015" & splash2$Month == "09")
sum(splash2$Year == "2015" & splash2$Month == "10")
sum(splash2$Year == "2015" & splash2$Month == "11")
sum(splash2$Year == "2015" & splash2$Month == "12")

sum(splash2$Year == "2016" & splash2$Month == "01")
sum(splash2$Year == "2016" & splash2$Month == "02")
sum(splash2$Year == "2016" & splash2$Month == "03")
sum(splash2$Year == "2016" & splash2$Month == "04")
sum(splash2$Year == "2016" & splash2$Month == "05")
sum(splash2$Year == "2016" & splash2$Month == "06")
sum(splash2$Year == "2016" & splash2$Month == "07")
sum(splash2$Year == "2016" & splash2$Month == "08")
sum(splash2$Year == "2016" & splash2$Month == "09")
sum(splash2$Year == "2016" & splash2$Month == "10")
sum(splash2$Year == "2016" & splash2$Month == "11")
sum(splash2$Year == "2016" & splash2$Month == "12")

sum(splash2$Year == "2017" & splash2$Month == "01")
sum(splash2$Year == "2017" & splash2$Month == "02")
sum(splash2$Year == "2017" & splash2$Month == "03")
sum(splash2$Year == "2017" & splash2$Month == "04")
sum(splash2$Year == "2017" & splash2$Month == "05")
sum(splash2$Year == "2017" & splash2$Month == "06")
sum(splash2$Year == "2017" & splash2$Month == "07")
sum(splash2$Year == "2017" & splash2$Month == "08")
sum(splash2$Year == "2017" & splash2$Month == "09")
sum(splash2$Year == "2017" & splash2$Month == "10")
sum(splash2$Year == "2017" & splash2$Month == "11")
sum(splash2$Year == "2017" & splash2$Month == "12")

sum(splash2$Year == "2018" & splash2$Month == "01")
sum(splash2$Year == "2018" & splash2$Month == "02")
sum(splash2$Year == "2018" & splash2$Month == "03")
sum(splash2$Year == "2018" & splash2$Month == "04")
sum(splash2$Year == "2018" & splash2$Month == "05")
sum(splash2$Year == "2018" & splash2$Month == "06")
sum(splash2$Year == "2018" & splash2$Month == "07")
sum(splash2$Year == "2018" & splash2$Month == "08")
sum(splash2$Year == "2018" & splash2$Month == "09")
sum(splash2$Year == "2018" & splash2$Month == "10")
sum(splash2$Year == "2018" & splash2$Month == "11")
sum(splash2$Year == "2018" & splash2$Month == "12")

sum(splash2$Year == "2019" & splash2$Month == "01")
sum(splash2$Year == "2019" & splash2$Month == "02")
sum(splash2$Year == "2019" & splash2$Month == "03")
sum(splash2$Year == "2019" & splash2$Month == "04")
sum(splash2$Year == "2019" & splash2$Month == "05")
sum(splash2$Year == "2019" & splash2$Month == "06")
sum(splash2$Year == "2019" & splash2$Month == "07")
sum(splash2$Year == "2019" & splash2$Month == "08")
sum(splash2$Year == "2019" & splash2$Month == "09")
sum(splash2$Year == "2019" & splash2$Month == "10")
sum(splash2$Year == "2019" & splash2$Month == "11")
sum(splash2$Year == "2019" & splash2$Month == "12")

sum(splash2$Year == "2020" & splash2$Month == "01")
sum(splash2$Year == "2020" & splash2$Month == "02")
sum(splash2$Year == "2020" & splash2$Month == "03")
sum(splash2$Year == "2020" & splash2$Month == "04")
sum(splash2$Year == "2020" & splash2$Month == "05")
sum(splash2$Year == "2020" & splash2$Month == "06")
sum(splash2$Year == "2020" & splash2$Month == "07")
sum(splash2$Year == "2020" & splash2$Month == "08")
sum(splash2$Year == "2020" & splash2$Month == "09")
sum(splash2$Year == "2020" & splash2$Month == "10")
sum(splash2$Year == "2020" & splash2$Month == "11")
sum(splash2$Year == "2020" & splash2$Month == "12")

data %>% 
  filter(Date < "2015-01-05"|Date > "2015-01-30") %>%
  filter(Date < "2016-01-10"|Date > "2016-01-14") %>%
  filter(Date < "2017-08-28"|Date > "2017-11-16") %>%
  filter(Date < "2018-01-08"|Date > "2018-02-01") %>%
  filter(Date < "2020-01-06"|Date > "2020-02-27") %>%
  subset(Date!="2016-10-07" & 
         Date!="2016-12-07" & 
         Date!="2020-06-27") -> datafinal5
View(datafinal5)
nrow(datafinal5)

splash_final <- subset(splash2, date!="2019-06-30")
nrow(splash_final)

Below is a list of dates that are not included when performing simple linear regressions for each attraction.

Space Mountain: 2017-09-10, 2017-09-11, 2020-06-27 and 2019-06-30

Haunted Mansion: 2016-10-07, 2016-11-28 to 2016-12-01, 2017-09-10, 2017-09-11, 2019-06-30, 2020-03-03, 2020-03-04, 2020-03-05, 2020-06-27

Pirates of the Caribbean: 2015-06-08 to 2015-09-25, 2018-02-26 to 2018-02-28, 2018-03-01 to 2018-03-18, 2015-02-23, 2016-03-05, 2016-10-07, 2017-09-10, 2017-09-11, 2019-06-30, 2020-06-27

Big Thunder Mountain Railroad: 2015-04-27 to 2015-04-30, 2016-08-08 to 2016-11-18, 2017-09-10, 2019-06-30, 2019-09-11, 2019-09-18, 2020-06-27, 2020-07-08

Splash Mountain: 2015-01-05 to 2015-01-31, 2016-01-10 to 2016-01-14, 2016-10-07, 2016-12-07, 2017-08-28 to 2017-11-16, 2018-01-08 to 2018-02-01, 2019-06-30, 2020-01-06 to 2020-02-27, 2020-06-27
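An exclusion list like the ones above can also be kept as a single vector of dates and applied with `%in%`. The following is only a minimal sketch in base R, not the code used in this project: `excluded_splash` and `toy` are hypothetical names, `toy` is made-up stand-in data, and only three entries from the Splash Mountain list are shown.

```r
# Hypothetical exclusion set: one date range plus two single days
# taken from the Splash Mountain list above.
excluded_splash <- c(
  seq(as.Date("2016-01-10"), as.Date("2016-01-14"), by = "day"),
  as.Date(c("2016-10-07", "2016-12-07"))
)

# Toy stand-in for a daily table with a Date column (not the real dataset).
toy <- data.frame(Date = seq(as.Date("2016-01-08"), as.Date("2016-01-16"), by = "day"))

# Keep only the days that are not in the exclusion set.
toy_kept <- toy[!(toy$Date %in% excluded_splash), , drop = FALSE]
nrow(toy_kept)
```

Keeping the dates in one vector means the same vector could be applied to both the school-session table and the attraction table, which helps keep the two aligned.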

Simple Linear Regression

Simple linear regression generates an equation describing the statistical relationship between one predictor variable and one response variable. Thirty simple linear regressions are performed in this project, where the proportions of schools in session in five different regions and in the state of Florida are the predictor variables, and the average wait times for the top 5 most popular attractions at Magic Kingdom are the response variables.
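As an illustration of a single such regression, the sketch below fits one line on simulated data. It is an assumption-laden example rather than the project's code: the data are randomly generated, `schools`, `waits`, and `merged` are hypothetical names, and the two tables are joined on the date first so that each wait time is explicitly paired with the same day's proportion.

```r
set.seed(1)
dates <- seq(as.Date("2019-01-01"), by = "day", length.out = 60)

# Toy stand-ins for the school-session table and one attraction's daily means.
schools <- data.frame(Date = dates, Midwest = runif(60))
waits   <- data.frame(date = dates,
                      wait_mean = 70 - 30 * schools$Midwest + rnorm(60, sd = 5))

# Join on the calendar date so each wait time is paired with the same day's proportion.
merged <- merge(waits, schools, by.x = "date", by.y = "Date")

# Response on the left of ~, predictor on the right.
fit <- lm(wait_mean ~ Midwest, data = merged)
coef(fit)  # intercept and slope of the fitted line
```

The slope estimates how much the average wait time changes as the proportion of schools in session goes from 0 to 1.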

The p-value reported for each regression comes from a test of the null hypothesis that the slope coefficient equals zero, meaning there is no linear relationship between the predictor and response variables. A small p-value (< 0.05) indicates enough evidence to reject the null hypothesis and conclude that changes in the predictor are associated with changes in the response.
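To make the test concrete, the sketch below pulls the slope's p-value out of a fitted model and recomputes it from the t statistic. The data are simulated and the variable names are illustrative only.

```r
set.seed(2)
x <- runif(100)
y <- 50 - 20 * x + rnorm(100, sd = 5)
fit <- lm(y ~ x)

# Coefficient table: estimate, standard error, t value, and p-value per term.
coefs <- summary(fit)$coefficients
slope_p <- coefs["x", "Pr(>|t|)"]

# The t statistic is the estimate divided by its standard error, and the
# two-sided p-value is the tail area of the t distribution beyond it.
t_stat   <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]
manual_p <- 2 * pt(-abs(t_stat), df = df.residual(fit))

slope_p < 0.05  # TRUE here means the null hypothesis of a zero slope is rejected
```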

Space Mountain

# Regress Space Mountain's mean wait time on each region's proportion of schools in session.
# Rows are paired by position, so both data frames must cover the same dates in the same order.
Midwest1 <- lm(space_mountain_final$wait_mean ~ datafinal1$Midwest)
summary(Midwest1)
Atalentic1 <- lm(space_mountain_final$wait_mean ~ datafinal1$MiddleAtlantic)
summary(Atalentic1)
NewEngland1 <- lm(space_mountain_final$wait_mean ~ datafinal1$NewEngland)
summary(NewEngland1)
Northwest1 <- lm(space_mountain_final$wait_mean ~ datafinal1$Nothwest)
summary(Northwest1)
Southwest1 <- lm(space_mountain_final$wait_mean ~ datafinal1$Southwest)
summary(Southwest1)
FL1 <- lm(space_mountain_final$wait_mean ~ datafinal1$FL)
summary(FL1)

Haunted Mansion

# Regress Haunted Mansion's mean wait time on each region's proportion of schools in session.
Midwest2 <- lm(haunted_mansion_final$wait_mean ~ datafinal2$Midwest)
summary(Midwest2)
Atalentic2 <- lm(haunted_mansion_final$wait_mean ~ datafinal2$MiddleAtlantic)
summary(Atalentic2)
NewEngland2 <- lm(haunted_mansion_final$wait_mean ~ datafinal2$NewEngland)
summary(NewEngland2)
Northwest2 <- lm(haunted_mansion_final$wait_mean ~ datafinal2$Nothwest)
summary(Northwest2)
Southwest2 <- lm(haunted_mansion_final$wait_mean ~ datafinal2$Southwest)
summary(Southwest2)
FL2 <- lm(haunted_mansion_final$wait_mean ~ datafinal2$FL)
summary(FL2)

Pirates of the Caribbean

# Regress Pirates of the Caribbean's mean wait time on each region's proportion of schools in session.
Midwest3 <- lm(pirates_final$wait_mean ~ datafinal3$Midwest)
summary(Midwest3)
Atalentic3 <- lm(pirates_final$wait_mean ~ datafinal3$MiddleAtlantic)
summary(Atalentic3)
NewEngland3 <- lm(pirates_final$wait_mean ~ datafinal3$NewEngland)
summary(NewEngland3)
Northwest3 <- lm(pirates_final$wait_mean ~ datafinal3$Nothwest)
summary(Northwest3)
Southwest3 <- lm(pirates_final$wait_mean ~ datafinal3$Southwest)
summary(Southwest3)
FL3 <- lm(pirates_final$wait_mean ~ datafinal3$FL)
summary(FL3)

Big Thunder Mountain Railroad

# Regress Big Thunder Mountain Railroad's mean wait time on each region's proportion of schools in session.
Midwest4 <- lm(big_final$wait_mean ~ datafinal4$Midwest)
summary(Midwest4)

Atalentic4 <- lm(big_final$wait_mean ~ datafinal4$MiddleAtlantic)
summary(Atalentic4)

NewEngland4 <- lm(big_final$wait_mean ~ datafinal4$NewEngland)
summary(NewEngland4)

Northwest4 <- lm(big_final$wait_mean ~ datafinal4$Nothwest)
summary(Northwest4)

Southwest4 <- lm(big_final$wait_mean ~ datafinal4$Southwest)
summary(Southwest4)

FL4 <- lm(big_final$wait_mean ~ datafinal4$FL)
summary(FL4)

Splash Mountain

# Regress Splash Mountain's mean wait time on each region's proportion of schools in session.
Midwest5 <- lm(splash_final$wait_mean ~ datafinal5$Midwest)
summary(Midwest5)
Atalentic5 <- lm(splash_final$wait_mean ~ datafinal5$MiddleAtlantic)
summary(Atalentic5)
NewEngland5 <- lm(splash_final$wait_mean ~ datafinal5$NewEngland)
summary(NewEngland5)
Northwest5 <- lm(splash_final$wait_mean ~ datafinal5$Nothwest)
summary(Northwest5)
Southwest5 <- lm(splash_final$wait_mean ~ datafinal5$Southwest)
summary(Southwest5)
FL5 <- lm(splash_final$wait_mean ~ datafinal5$FL)
summary(FL5)

Test Results

P-values for Simple Linear Regressions

| Region/State | Space Mountain | Haunted Mansion | Pirates of the Caribbean | Big Thunder Mountain Railroad | Splash Mountain |
| --- | --- | --- | --- | --- | --- |
| Midwest | < 2E-16 | 9.05E-15 | < 2E-16 | < 2E-16 | < 2E-16 |
| Middle Atlantic | < 2E-16 | 3.87E-3 | 7.57E-5 | 1.5E-11 | < 2E-16 |
| New England | < 2E-16 | 5.43E-6 | 1.47E-7 | 2.44E-15 | < 2E-16 |
| Northwest | < 2E-16 | 2.03E-9 | 2.44E-11 | < 2E-16 | < 2E-16 |
| Southwest | < 2E-16 | < 2E-16 | < 2E-16 | < 2E-16 | < 2E-16 |
| Florida | < 2E-16 | < 2E-16 | < 2E-16 | < 2E-16 | < 2E-16 |

According to the table above, all 30 p-values are well below the 0.05 threshold; the largest is 3.87E-3 (Middle Atlantic, Haunted Mansion). We can therefore reject the null hypothesis in every case and conclude that the proportion of schools in session in each of the five regions (Midwest, Middle Atlantic, New England, Northwest, Southwest) and in Florida is correlated with the average wait time for each of the top 5 attractions at Magic Kingdom (Space Mountain, Haunted Mansion, Pirates of the Caribbean, Big Thunder Mountain Railroad, and Splash Mountain).
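A grid of p-values like this one can be assembled programmatically instead of being read off each printed summary. The sketch below is one possible way to do so on simulated data; `toy_schools`, `toy_waits`, `ride_a`, and `ride_b` are hypothetical, and only the region column names are borrowed from the regression code. Note that the printed summary shows very small p-values as "< 2e-16", while the value stored in the coefficient table can be smaller.

```r
set.seed(3)
n <- 80
regions <- c("Midwest", "MiddleAtlantic", "NewEngland", "Nothwest", "Southwest", "FL")

# Toy predictor table: one column of proportions per region (not the real data).
toy_schools <- as.data.frame(matrix(runif(n * length(regions)), nrow = n,
                                    dimnames = list(NULL, regions)))

# Toy responses for two hypothetical attractions.
toy_waits <- list(
  ride_a = 60 - 25 * toy_schools$Midwest + rnorm(n, sd = 5),
  ride_b = 40 - 20 * toy_schools$FL + rnorm(n, sd = 5)
)

# One simple regression per (region, attraction) pair; keep the slope's p-value.
p_values <- sapply(toy_waits, function(wait) {
  sapply(regions, function(region) {
    fit <- lm(wait ~ toy_schools[[region]])
    summary(fit)$coefficients[2, "Pr(>|t|)"]
  })
})

signif(p_values, 3)  # rows are regions, columns are attractions
```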

Conclusion

Results from the data analysis indicate a statistically significant correlation between the average wait time for each of the top 5 most popular attractions at Magic Kingdom and the proportion of schools in session in each of the five regions studied (Midwest, Middle Atlantic, New England, Northwest, and Southwest) and in the state of Florida. A possible future extension of the study could explore which regions or states have the greatest impact on the average wait time for attractions by combining the data dataset with the attraction datasets and reshaping the result to create a new column called region. Multiple linear regression could then be applied with two predictor variables, the proportion of schools in session and region, plus an interaction term between them.
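The proposed extension might look roughly like the sketch below. It uses simulated data and only two regions, and the names `wide`, `long`, `proportion`, and `fit_int` are hypothetical. Because each day's wait time appears once per region in the long table, the rows are not independent, so the sketch is meant to show the reshaping and the formula syntax rather than a finished analysis.

```r
set.seed(4)
n <- 50

# Toy wide table: one row per day, one proportion column per region.
wide <- data.frame(Midwest = runif(n), FL = runif(n))
wide$wait_mean <- 60 - 15 * wide$Midwest - 30 * wide$FL + rnorm(n, sd = 5)

# Long format: one row per (day, region), with a `region` column.
long <- data.frame(
  wait_mean  = rep(wide$wait_mean, times = 2),
  proportion = c(wide$Midwest, wide$FL),
  region     = factor(rep(c("Midwest", "FL"), each = n))
)

# `proportion * region` expands to both main effects plus their interaction.
fit_int <- lm(wait_mean ~ proportion * region, data = long)
names(coef(fit_int))
```

The interaction coefficient estimates how much the slope on proportion differs between the two regions, which is the quantity the extension would compare across regions.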