# load data
library(RCurl)
weather_data <- getURL("https://raw.githubusercontent.com/josephsimone/DATA607/master/ww-ii-data..csv")
weather.raw <- read.csv(text = weather_data)
head(weather.raw, 5)
## STA Date Precip WindGustSpd MaxTemp MinTemp MeanTemp Snowfall
## 1 10001 7/1/1942 1.016 NA 25.55556 22.22222 23.88889 0
## 2 10001 7/2/1942 0 NA 28.88889 21.66667 25.55556 0
## 3 10001 7/3/1942 2.54 NA 26.11111 22.22222 24.44444 0
## 4 10001 7/4/1942 2.54 NA 26.66667 22.22222 24.44444 0
## 5 10001 7/5/1942 0 NA 26.66667 21.66667 24.44444 0
## PoorWeather YR MO DA PRCP DR SPD MAX MIN MEA SNF SND FT FB FTI ITH PGT
## 1 42 7 1 0.04 NA NA 78 72 75 0 NA NA NA NA NA NA
## 2 42 7 2 0 NA NA 84 71 78 0 NA NA NA NA NA NA
## 3 42 7 3 0.1 NA NA 79 72 76 0 NA NA NA NA NA NA
## 4 42 7 4 0.1 NA NA 80 72 76 0 NA NA NA NA NA NA
## 5 42 7 5 0 NA NA 80 71 76 0 NA NA NA NA NA NA
## TSHDSBRSGF SD3 RHX RHN RVG WTE
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is there a relationship between the daily minimum and maximum temperature? Can you predict the maximum temperature given the minimum temperature?
What are the cases, and how many are there?
At first glance, the information in this dataset includes precipitation, snowfall, temperatures, wind speed, and whether the day included thunderstorms or other poor weather conditions. Each case is one day of recorded weather at one station. There are 31 variables in this dataset; however, many of them are mostly NA. Therefore, I will be eliminating them from my dataset for analysis. The variables that I will be keeping include the station number, date, precipitation, wind gust speed, max, min, and mean temperature, and snowfall.
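One way to decide which columns to drop is to measure how much of each column is missing. The sketch below is only an illustration: `toy` is a hypothetical frame built from a few values printed in the `head()` output above, and the cutoff (drop a column only when every value is missing) is an assumption, not a rule taken from the dataset documentation.

```r
# Toy frame copied from a few columns of the first rows printed above
toy <- data.frame(
  STA         = c(10001, 10001, 10001),
  MaxTemp     = c(25.55556, 28.88889, 26.11111),
  WindGustSpd = c(NA, NA, NA),
  DR          = c(NA, NA, NA)
)

# Share of missing values in each column (0 = complete, 1 = all NA)
na_share <- colMeans(is.na(toy))

# Keep only columns that are not entirely missing (assumed cutoff)
keep <- names(na_share)[na_share < 1]
toy_clean <- toy[, keep, drop = FALSE]

print(na_share)
print(names(toy_clean))
```

On the full data the same two lines would be applied to `weather.raw`. Note that `is.na()` does not count blank strings, so columns that were read in as empty text would need to be checked separately.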
Describe the method of data collection.
The dataset contains 1940–1945 data for 162 stations outside of the United States. The actual period of data availability varies depending upon the station’s activity. Many stations in the European and Pacific theaters of operation are included.
What type of study is this (observational/experiment)? This is an observational study: it is a collection of weather conditions recorded each day at various weather stations around the world.
If you collected the data, state self-collected. If not, provide a citation/link.
“World War II Era Data.” National Climatic Data Center, www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/world-war-ii-era-data.
What is the response variable? Is it quantitative or qualitative?
The response variable that I will be trying to predict is the maximum temperature: given the minimum temperature, can you predict the maximum? This is a quantitative variable.
You should have two independent variables, one quantitative and one qualitative.
The two independent variables that I will be using for this linear regression analysis are the minimum and maximum temperatures.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
names(weather.raw)
## [1] "STA" "Date" "Precip" "WindGustSpd" "MaxTemp"
## [6] "MinTemp" "MeanTemp" "Snowfall" "PoorWeather" "YR"
## [11] "MO" "DA" "PRCP" "DR" "SPD"
## [16] "MAX" "MIN" "MEA" "SNF" "SND"
## [21] "FT" "FB" "FTI" "ITH" "PGT"
## [26] "TSHDSBRSGF" "SD3" "RHX" "RHN" "RVG"
## [31] "WTE"
# keep the station, date, and temperature columns
weather_df <- subset(weather.raw, select = c(STA, Date, MaxTemp, MinTemp, MeanTemp, MAX, MIN, MEA))
head(weather_df, 5)
## STA Date MaxTemp MinTemp MeanTemp MAX MIN MEA
## 1 10001 7/1/1942 25.55556 22.22222 23.88889 78 72 75
## 2 10001 7/2/1942 28.88889 21.66667 25.55556 84 71 78
## 3 10001 7/3/1942 26.11111 22.22222 24.44444 79 72 76
## 4 10001 7/4/1942 26.66667 22.22222 24.44444 80 72 76
## 5 10001 7/5/1942 26.66667 21.66667 24.44444 80 71 76
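The printed rows suggest that `MAX`, `MIN`, and `MEA` hold the same temperatures as `MaxTemp`, `MinTemp`, and `MeanTemp`, in Fahrenheit rather than Celsius. This is inferred from the values, not from a data dictionary. A minimal check, using a hypothetical `toy` frame copied from the first three rows above and a helper `f_to_c` introduced here for illustration:

```r
# Values copied from the first three rows of the head() output above
toy <- data.frame(
  MaxTemp = c(25.55556, 28.88889, 26.11111),
  MAX     = c(78, 84, 79)
)

# Standard Fahrenheit-to-Celsius conversion
f_to_c <- function(f) (f - 32) * 5 / 9

# Largest absolute gap between the converted MAX and MaxTemp
max_diff <- max(abs(f_to_c(toy$MAX) - toy$MaxTemp))
print(max_diff)
```

If the gap stays near zero on the full data as well, only one set of temperature columns is needed for the regression.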
summary(weather_df$MaxTemp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -33.33 25.56 29.44 27.05 31.67 50.00
boxplot(weather_df$MaxTemp)
barplot(weather_df$MaxTemp)
summary(weather_df$MinTemp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -38.33 15.00 21.11 17.79 23.33 34.44
barplot(weather_df$MinTemp)
summary(weather_df$MeanTemp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -35.56 20.56 25.56 22.41 27.22 40.00
barplot(weather_df$MeanTemp)
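Since the research question asks whether the maximum temperature can be predicted from the minimum, a scatter plot with a fitted line is a natural next visualization. The sketch below uses a hypothetical `toy` frame holding only the five rows printed by `head(weather_df, 5)`, so the fitted coefficients say nothing about the full dataset; it only shows the shape of the analysis, assuming a simple linear model is appropriate.

```r
# Five rows copied from head(weather_df, 5); all from one station in July 1942
toy <- data.frame(
  MinTemp = c(22.22222, 21.66667, 22.22222, 22.22222, 21.66667),
  MaxTemp = c(25.55556, 28.88889, 26.11111, 26.66667, 26.66667)
)

# Simple linear regression: maximum temperature as a function of the minimum
fit <- lm(MaxTemp ~ MinTemp, data = toy)

# Scatter plot of the two variables with the fitted line overlaid
plot(toy$MinTemp, toy$MaxTemp,
     xlab = "Minimum temperature (C)",
     ylab = "Maximum temperature (C)")
abline(fit)

# Intercept and slope of the fitted line
print(coef(fit))
```

On the real data, `toy` would be replaced by `weather_df`, and `summary(fit)` would report the slope, its standard error, and the R-squared value.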