Connor Lewis
You have been hired as a consultant by Disney to choose a location for a new amusement park. Your job is to analyze weather data from different locations to pick out the best option.
Begin by looking at the documentation for the climate normal data.
Then load your ClimateNormalData.csv data file. You will need to do some cleanup, so be sure to look at the data carefully. Below is a list of suggested dplyr activities.
Suggested tasks:
library(dplyr)      # data manipulation verbs and %>%
library(lubridate)  # make_date(), month()
library(tidyr)      # pivot_wider(), used for the monthly table
# data_raw (ClimateNormalData.csv) and data_stations_raw (station metadata) are assumed to be loaded already
data <- data_raw %>%
  mutate(across(c(month, day, hour), as.numeric)) %>%
  # normals describe a typical year, so 2024 serves only as a placeholder year
  mutate(date = make_date(2024, month, day)) %>%
  select(STATION, date, matches("NORMAL"), matches("CLOD"), `HLY-WIND-AVGSPD`, month) %>%
  left_join(data_stations_raw, by = "STATION") %>%
  rename(mean_temp = `HLY-TEMP-NORMAL`, dew_point_mean = `HLY-DEWP-NORMAL`,
         average_wind_speed = `HLY-WIND-AVGSPD`, cloud_clear_percent = `HLY-CLOD-PCTCLR`) %>%
  select(STATION, NAME, LATITUDE, LONGITUDE, ELEVATION, date,
         mean_temp, dew_point_mean, average_wind_speed, cloud_clear_percent, month)
data
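Because the task warns about weird values, it can help to count the -9999 placeholder (the value this analysis later converts to NA) in every numeric column before summarizing anything. A minimal sketch, run on a hypothetical three-row tibble `toy` rather than the real dataset:

```r
library(dplyr)

# Hypothetical toy rows standing in for `data`; -9999 marks a missing value
toy <- tibble(
  mean_temp = c(71.2, -9999, 68.4),
  average_wind_speed = c(7.1, 6.8, -9999),
  cloud_clear_percent = c(-9999, 40.0, 55.5)
)

# Count the -9999 placeholder in every numeric column
sentinel_counts <- toy %>%
  summarise(across(where(is.numeric), ~ sum(.x == -9999, na.rm = TRUE)))
sentinel_counts

# Replace the placeholder with NA everywhere so it cannot distort any means
cleaned <- toy %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -9999)))
cleaned
```

Applying the same `across()` call to `data` would clean every numeric column at once instead of one column at a time.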
Now that you have the data, create some basic summary data. Show a table with the average temperature by month and location. Have the stations as rows, and the month as columns.
Hint: you may need to use tidyr's pivot_wider().
temperature_table <- data %>% filter(!is.na(STATION), STATION != "x") %>%
mutate(month_name = month(date, label = TRUE)) %>%
group_by(STATION, month_name) %>%
summarise(average_temp = mean(mean_temp, na.rm = TRUE)) %>%
pivot_wider(names_from = month_name, values_from = average_temp)
## `summarise()` has grouped output by 'STATION'. You can override using the
## `.groups` argument.
# Display the table
temperature_table
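The long-to-wide step can be seen on a small hypothetical tibble of hourly readings (`hourly`, with invented stations `S1` and `S2`). Passing `.groups = "drop"` to `summarise()` is one way to silence the grouping message shown above; the month names are stored as a factor to mimic `month(date, label = TRUE)`.

```r
library(dplyr)
library(tidyr)

# Hypothetical hourly readings: two stations, two months
hourly <- tibble(
  STATION = c("S1", "S1", "S1", "S2", "S2", "S2"),
  month_name = factor(c("Jan", "Jan", "Feb", "Jan", "Feb", "Feb"), levels = month.abb),
  mean_temp = c(38, 42, 42, 60, 62, 64)
)

wide <- hourly %>%
  group_by(STATION, month_name) %>%
  # one mean per station-month; .groups = "drop" returns an ungrouped tibble
  summarise(average_temp = mean(mean_temp, na.rm = TRUE), .groups = "drop") %>%
  # stations become rows, months become columns
  pivot_wider(names_from = month_name, values_from = average_temp)
wide
```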
We want to find the best location for an amusement park that isn’t too hot or too cold. Define a temperature range in which it is comfortable to be outside. Then, create a graph showing how well each location meets your temperature requirement.
Write a brief 2-3 sentence explanation of your findings.
location_table <- data %>%
  filter(month != 12 | month != 1 | month != 2,
         mean_temp >= 60 & mean_temp <= 80, !is.na(NAME)) %>%
  group_by(NAME) %>%
  # each row is one hour of the normal year, so n() counts hours rather than days
  summarize(Number_of_Suitable_Hours = n())
print(location_table)
## # A tibble: 4 × 2
##   NAME                              Number_of_Suitable_Hours
##   <chr>                                                <int>
## 1 AUSTIN BERGSTROM AP, TX US                            4023
## 2 BINGHAMTON, NY US                                     2689
## 3 JUNEAU INTL AP, AK US                                  418
## 4 PITTSBURGH ALLEGHENY CO AP, PA US                     3369
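The prompt asks for a graph, so here is a minimal ggplot2 sketch of the counts printed above. The numbers are copied from that output (hourly records with a mean temperature between 60 and 80 degrees F); the `suitable` data frame, the horizontal bar layout, and the labels are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# Counts copied from the location_table output above
suitable <- data.frame(
  NAME = c("AUSTIN BERGSTROM AP, TX US", "BINGHAMTON, NY US",
           "JUNEAU INTL AP, AK US", "PITTSBURGH ALLEGHENY CO AP, PA US"),
  hours = c(4023, 2689, 418, 3369)
)

# Bars sorted by count so the strongest candidates sit at the top
p <- ggplot(suitable, aes(x = reorder(NAME, hours), y = hours)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Hourly records between 60 and 80 degrees F",
       title = "Comfortable-temperature hours by station")
p
```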
My aim was to look only at non-winter months, since theme parks are busiest in spring and summer, although the month condition as written (`month != 12 | month != 1 | month != 2`) is always true, so December through February are still counted. I then kept only hourly mean temperatures between 60 and 80 degrees F and, for each station, counted how many hourly records fell in that range. Austin, Texas, and Pittsburgh, Pennsylvania, have the most comfortable hours, so either would make the best location.
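A sketch of the intended month filter, assuming the goal is to drop December through February: `!month %in% c(12, 1, 2)` removes those months, whereas an OR chain of `!=` tests is TRUE for every month. If days rather than hours are wanted, `n_distinct(date)` counts calendar dates with at least one suitable hour. The tibble `toy` is hypothetical and stands in for `data`.

```r
library(dplyr)

# Hypothetical hourly rows standing in for `data`
toy <- tibble(
  NAME = c("A", "A", "A", "A"),
  month = c(1, 6, 6, 7),
  date = as.Date(c("2024-01-15", "2024-06-01", "2024-06-01", "2024-07-04")),
  mean_temp = c(70, 65, 72, 90)
)

# The OR chain is TRUE for every month, so it removes nothing
or_chain_keeps_all <- all(toy$month != 12 | toy$month != 1 | toy$month != 2)

suitable <- toy %>%
  filter(!month %in% c(12, 1, 2),            # actually drops December-February
         mean_temp >= 60, mean_temp <= 80) %>%
  group_by(NAME) %>%
  summarise(suitable_hours = n(),             # hourly rows in range
            suitable_days = n_distinct(date)) # dates with at least one suitable hour
suitable
```

Here the January row is dropped by the month filter and the 90-degree row by the temperature filter, leaving two suitable hours on a single date.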
Which of the sites would be best for an airport? Explore the other variables in your dataset; for example, look at cloud cover or average wind speed. This may require some data cleaning, so look carefully at your dataset for unusual values.
Give a chart, and write a 2-3 sentence explanation of your answer.
airport_Table <- data %>%
  filter(!is.na(NAME)) %>%
  # -9999 marks missing values; convert it to NA so it does not drag the means down
  mutate(cloud_clear_percent = na_if(cloud_clear_percent, -9999),
         average_wind_speed = na_if(average_wind_speed, -9999)) %>%
  group_by(NAME) %>%
  summarise(cloud_clear_percent = mean(cloud_clear_percent, na.rm = TRUE),
            averaged_windspeed = mean(average_wind_speed, na.rm = TRUE))
print(airport_Table)
## # A tibble: 4 × 3
##   NAME                              cloud_clear_percent averaged_windspeed
##   <chr>                                           <dbl>              <dbl>
## 1 AUSTIN BERGSTROM AP, TX US                       38.8                7.62
## 2 BINGHAMTON, NY US                                46.9                8.03
## 3 JUNEAU INTL AP, AK US                             5.33               7.00
## 4 PITTSBURGH ALLEGHENY CO AP, PA US                65.7                7.18
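The prompt asks for a chart here too. A minimal ggplot2 sketch plots the two station means printed above against each other, so the favorable corner (clear skies, light wind) is visible at a glance; the values are copied from that output, and the scatter-plot form and labels are illustrative choices.

```r
library(ggplot2)

# Station means copied from the airport_Table output above
airport <- data.frame(
  NAME = c("AUSTIN BERGSTROM AP, TX US", "BINGHAMTON, NY US",
           "JUNEAU INTL AP, AK US", "PITTSBURGH ALLEGHENY CO AP, PA US"),
  cloud_clear_percent = c(38.8, 46.9, 5.33, 65.7),
  averaged_windspeed = c(7.62, 8.03, 7.00, 7.18)
)

# Upper left (frequent clear skies, low wind) is the best corner for an airport
p <- ggplot(airport, aes(x = averaged_windspeed, y = cloud_clear_percent)) +
  geom_point(size = 3) +
  geom_text(aes(label = NAME), vjust = -1, size = 3) +
  labs(x = "Average wind speed", y = "Clear-sky percent",
       title = "Clear skies vs. wind speed by station")
p
```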
To pick a location for an airport, I looked at the percent of time each area has clear skies and at its overall average wind speed, since a good airport site should have frequent clear skies and low average wind speeds. After setting the -9999 placeholder values to NA and ignoring NAs in the calculations, Pittsburgh looks like the best choice: it has the highest clear-sky percentage (65.7) and the second-lowest average wind speed (7.18, just above Juneau's 7.00).
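One way to make the two-criterion comparison explicit is an equal-weight rank sum: rank stations by clear-sky percent (highest first) and by wind speed (lowest first), then add the ranks. This scoring rule is an assumption for illustration, not the method used above; the values are copied from the airport_Table output.

```r
library(dplyr)

# Station means copied from the airport_Table output above
airport <- tibble(
  NAME = c("AUSTIN BERGSTROM AP, TX US", "BINGHAMTON, NY US",
           "JUNEAU INTL AP, AK US", "PITTSBURGH ALLEGHENY CO AP, PA US"),
  cloud_clear_percent = c(38.8, 46.9, 5.33, 65.7),
  averaged_windspeed = c(7.62, 8.03, 7.00, 7.18)
)

# Assumed scoring rule: equal weights, lower total score is better
ranked <- airport %>%
  mutate(clear_rank = min_rank(desc(cloud_clear_percent)),  # 1 = clearest skies
         wind_rank = min_rank(averaged_windspeed),          # 1 = calmest
         score = clear_rank + wind_rank) %>%
  arrange(score)
ranked
```

Under this rule Pittsburgh scores 3 (first for clear skies, second for wind), ahead of Juneau at 5, which agrees with the conclusion above; different weights could change the ordering of the remaining stations.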