Exploratory Data Analysis in R. Choose an interesting dataset and use R graphics to describe the data. You may use base R graphics, or a graphics package of your choice. You should include at least one example of each of the following: a histogram, a boxplot, and a scatterplot.
The weather data below was downloaded from the National Oceanic and Atmospheric Administration (NOAA) website.
Specifically, it is a subset of daily weather readings from the New York Central Park station, covering 1/1/2016 to 8/3/2017.
In addition to loading the data, we rename the column headers, recompute the average temperature as the mean of the daily maximum and minimum, and append Year, Month, Day, and Weekday columns derived from the date.
library(readr)
library(lubridate)
library(dplyr)
input_url <- "https://raw.githubusercontent.com/stevenjhan/R-Bridge-Week-5-Assignment/master/1039515.csv"
assignment_5 <- read_delim(file = input_url, delim = ",", col_names = TRUE, na = c("-9999"))
colnames(assignment_5) <- c("Station ID", "Station Name", "Elevation", "Latitude", "Longitude", "Date", "Precipitation", "Snow depth", "Snowfall", "Average temperature", "Maximum temperature", "Minimum temperature", "Average daily wind speed", "Direction of fastest 2-minute wind (degrees)", "Direction of fastest 5-second wind (degrees)", "Fastest 2-minute wind speed", "Fastest 5-second wind speed", "Weather Type: Fog, ice fog, or freezing fog (may include heavy fog)", "Weather Type: Glaze or rime", "Weather Type: Heavy fog or heaving freezing fog (not always distinguished from fog)", "Weather Type: Ice pellets, sleet, snow pellets, or small hail ", "Weather Type: Smoke or haze")
assignment_5$Date <- as.Date(as.character(assignment_5$Date), "%Y%m%d")
assignment_5$`Average temperature` <- rowMeans(assignment_5[c("Maximum temperature","Minimum temperature")])
Year <- as.character(year(assignment_5$Date))
Month <- as.character(month(assignment_5$Date))
Day <- as.character(day(assignment_5$Date))
Weekday <- as.character(wday(assignment_5$Date))
assignment_5 <- as_tibble(cbind(assignment_5, Year, Month, Day, Weekday))
assignment_5
## # A tibble: 581 x 26
## `Station ID` `Station Name` Elevation Latitude
## <chr> <chr> <dbl> <dbl>
## 1 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 2 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 3 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 4 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 5 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 6 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 7 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 8 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 9 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## 10 GHCND:USW00094728 NY CITY CENTRAL PARK NY US 39.6 40.77889
## # ... with 571 more rows, and 22 more variables: Longitude <dbl>,
## # Date <date>, Precipitation <dbl>, `Snow depth` <dbl>, Snowfall <dbl>,
## # `Average temperature` <dbl>, `Maximum temperature` <int>, `Minimum
## # temperature` <int>, `Average daily wind speed` <dbl>, `Direction of
## # fastest 2-minute wind (degrees)` <int>, `Direction of fastest 5-second
## # wind (degrees)` <int>, `Fastest 2-minute wind speed` <dbl>, `Fastest
## # 5-second wind speed` <dbl>, `Weather Type: Fog, ice fog, or freezing
## # fog (may include heavy fog)` <int>, `Weather Type: Glaze or
## # rime` <int>, `Weather Type: Heavy fog or heaving freezing fog (not
## # always distinguished from fog)` <int>, `Weather Type: Ice pellets,
## # sleet, snow pellets, or small hail ` <int>, `Weather Type: Smoke or
## # haze` <int>, Year <fctr>, Month <fctr>, Day <fctr>, Weekday <fctr>
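Because the load step maps the sentinel value -9999 to NA, it can help to count the missing values in each column before plotting. The sketch below shows one way to do that with base R. The small data frame `toy_weather` and its values are made up for illustration and only stand in for `assignment_5`.

```r
# Made-up stand-in for two columns of the weather data (not real readings)
toy_weather <- data.frame(
  Precipitation = c(0.00, 0.12, NA, 0.40),
  Snowfall      = c(0, NA, NA, 1.5)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy_weather))
na_counts
```

On the real data the equivalent call would be `colSums(is.na(assignment_5))`.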
From our weather data set, we will create a series of histograms showing the frequency of average temperatures for each month in 2016.
library(ggplot2)
library(dplyr)
hg_subset <- filter(assignment_5, Year == "2016")
hg_example <- ggplot(data = hg_subset) + geom_histogram(aes(x = `Average temperature`)) + facet_wrap(~ Month)
hg_example
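One thing to watch in the faceted histograms above: Month holds month numbers as text, so the panels are likely to appear in text order (1, 10, 11, 12, 2, ...) rather than calendar order. The sketch below shows one possible fix in base R, using a made-up vector `month_chr` as a stand-in for the Month column.

```r
# Made-up stand-in for the Month column: month numbers stored as text
month_chr <- c("1", "2", "10", "11", "12", "3")

# Text sorts character by character, so "10" comes before "2"
text_order <- sort(unique(month_chr))
text_order

# as.character() first, so this also works if the column is a factor
# (as.integer() on a factor alone would return level codes, not month numbers)
month_fct <- factor(as.integer(as.character(month_chr)),
                    levels = 1:12, labels = month.abb)

# Levels now follow the calendar; droplevels() hides months absent from the toy data
calendar_order <- levels(droplevels(month_fct))
calendar_order
```

Applied to the real data, the resulting factor could be stored in a new column (any name would do) and passed to `facet_wrap()` in place of Month.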
The example below uses the filter function from dplyr to subset the data to January, then draws a boxplot of average temperature for each year.
library(ggplot2)
library(dplyr)
bp_subset <- filter(assignment_5, Month == "1")
bp_example <- ggplot(data=bp_subset, aes(y = `Average temperature`, x = `Year`)) + geom_boxplot()
bp_example
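To read exact values rather than estimate them from the plot, the quartiles behind each box can be computed directly. This is a minimal sketch with base R's `quantile()`. The data frame `toy_jan` and its temperatures are made up for illustration and only stand in for `bp_subset`. Whisker limits and outlier rules are not reproduced here.

```r
# Made-up January temperatures for two years (not real readings)
toy_jan <- data.frame(
  Year     = rep(c("2016", "2017"), each = 5),
  avg_temp = c(30, 34, 28, 41, 36, 38, 45, 33, 40, 37)
)

# Split temperatures by year, then compute min, quartiles, and max per group;
# t() puts one year per row and one quantile per column
quartiles <- t(sapply(split(toy_jan$avg_temp, toy_jan$Year),
                      quantile, probs = c(0, 0.25, 0.5, 0.75, 1)))
quartiles
```

The `50%` column corresponds to the line drawn inside each box, and the `25%` and `75%` columns to the box edges.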
The example below creates a scatterplot to check whether there is a relationship between the fastest 5-second wind speed and the average temperature.
library(ggplot2)
sp_example <- ggplot(data=assignment_5, aes(x = `Fastest 5-second wind speed`, y = `Average temperature`)) + geom_point()
sp_example
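A scatterplot gives a visual impression. A correlation coefficient is one way to put a number on it. The sketch below uses base R's `cor()` on two made-up vectors, `wind` and `temp`, which stand in for the wind speed and temperature columns. Setting `use = "complete.obs"` drops pairs with a missing value, which matters here because -9999 readings were loaded as NA.

```r
# Made-up paired readings, including missing values (not real data)
wind <- c(12, 18, 25, NA, 30, 22)
temp <- c(70, 64, 55, 60, 41, NA)

# Pearson correlation over the pairs where both values are present
r <- cor(wind, temp, use = "complete.obs")
r
```

On the real data, the two arguments would be `assignment_5$"Fastest 5-second wind speed"` and `assignment_5$"Average temperature"`. A value near 0 would suggest little linear relationship, while values near -1 or 1 would suggest a strong one.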