     Ozone           Solar.R           Wind             Temp
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
 NA's   :37       NA's   :7
     Month            Day
 Min.   :5.000   Min.   : 1.0
 1st Qu.:6.000   1st Qu.: 8.0
 Median :7.000   Median :16.0
 Mean   :6.993   Mean   :15.8
 3rd Qu.:8.000   3rd Qu.:23.0
 Max.   :9.000   Max.   :31.0
     Ozone           Solar.R           Wind             Temp
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
 NA's   :37       NA's   :7
       Month         Day          Monthnum
 May      :31   Min.   : 1.0   Min.   :5.000
 June     :30   1st Qu.: 8.0   1st Qu.:6.000
 July     :31   Median :16.0   Median :7.000
 August   :31   Mean   :15.8   Mean   :6.993
 September:30   3rd Qu.:23.0   3rd Qu.:8.000
                Max.   :31.0   Max.   :9.000
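The second summary differs from the first in two ways: Month is now a labelled factor, and a numeric Monthnum column has appeared. The code that made this change is not shown here, so the following is only a minimal sketch of one way to get the same result in base R. The working copy `aq` is an illustrative name, not one used in the original.

```r
# Work on a copy so the built-in dataset stays untouched
# (aq is a hypothetical name used only for this sketch)
aq <- datasets::airquality

# Keep the numeric month (5 to 9) so it can be used later to build dates
aq$Monthnum <- aq$Month

# Recode Month as a factor with readable labels, kept in calendar order
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Month now reports counts per level instead of numeric quartiles
summary(aq[, c("Month", "Day", "Monthnum")])
```

Keeping the numeric copy matters because once Month is a factor it can no longer be pasted directly into a year-month-day string.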
Plot 1: Create a histogram categorized by Month
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
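The message above asks for an explicit bin width. One common rule of thumb is the Freedman-Diaconis rule, which sets the width to 2 × IQR / n^(1/3). This is offered as a sketch of how a width could be chosen, not as the method used in this notebook. The variable names are illustrative.

```r
# Temperatures from the built-in dataset (degrees Fahrenheit, no missing values)
temps <- datasets::airquality$Temp

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
bw_fd <- 2 * IQR(temps) / length(temps)^(1 / 3)
bw_fd

# Number of bins that width implies over the observed range
n_bins <- ceiling(diff(range(temps)) / bw_fd)
n_bins
```

For this data the rule gives a width slightly under 5 degrees, which is consistent with rounding to `binwidth = 5`.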
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September",
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p3
Plot 4: Side by Side Boxplots in Gray Scale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", # the x-axis shows months, not temperatures
       y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month",
                  labels = c("May", "June", "July", "August", "September"))
p4
Plot 5: Scatterplot of temperature by date
p5 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September",
       y = "Temperatures",
       title = "Scatterplot of Daily Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_point(shape = 21, size = 3) +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p5
It looks like a new column has to be created: with Month on the x-axis, every day in a month lands in the same vertical strip, so the plot shows temperatures by month rather than by date.
We’ll have to make a new column for the date and format the plot differently.
# Create a new Date column combining Monthnum and Day so that the data can be ordered by day
airquality <- airquality |>
  mutate(Date = as.Date(paste(1973, Monthnum, Day, sep = "-"))) # 1973 because the data is from 1973

p6 <- airquality |>
  ggplot(aes(Date, Temp, fill = Month)) +
  labs(x = "Date from May through September",
       y = "Temperatures",
       title = "Scatterplot of Daily Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_point(shape = 21, size = 3) +
  scale_x_date(date_breaks = "1 month", date_labels = "%B 1") + # show month names and "1"
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p6
Essay
I created a scatterplot that shows how the temperature fluctuates within each month, not just between months. With one point per day, rather than a scatterplot or histogram that groups each month together, we can trace a more precise trend line through the data. Additionally, given data from multiple years, we could better predict how temperatures fluctuate within months.
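A trend line of the kind described here could be added with `geom_smooth()`. The sketch below is an assumption about how one might do it, not part of the original plot: it rebuilds the date from the built-in dataset's numeric Month column, and the `span` value and the names `aq` and `p_trend` are illustrative choices.

```r
library(ggplot2)

# Fresh copy of the built-in data; Month is numeric (5 to 9) here
aq <- datasets::airquality
aq$Date <- as.Date(paste(1973, aq$Month, aq$Day, sep = "-"))

# Daily temperatures with a loess smoother drawn through them
p_trend <- ggplot(aq, aes(Date, Temp)) +
  geom_point(shape = 21, size = 3, fill = "grey70") +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3, se = FALSE) +
  scale_x_date(date_breaks = "1 month", date_labels = "%B 1") +
  labs(x = "Date from May through September", y = "Temperatures")
p_trend
```

A smaller `span` follows the daily swings more closely, while a larger one shows only the broad seasonal rise and fall.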
As you can see, the temperatures within months tend to be rather chaotic, but some patterns do emerge. For instance, while the temperatures throughout May tend to stay around 65 ± 10 °F, in June the temperatures rise, then fall, then rise again. The same happens in August, before temperatures fall quickly in September.
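Claims like these can be checked against per-month summaries. The following is a small base R sketch, with `monthly` as an illustrative name, that tabulates the minimum, mean, and maximum temperature for each month.

```r
aq <- datasets::airquality

# One row per month (row names "5" to "9"), one column per statistic
monthly <- t(sapply(split(aq$Temp, aq$Month),
                    function(x) c(min = min(x), mean = mean(x), max = max(x))))
round(monthly, 1)
```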
To achieve this plot, I had to create a new “Date” column. Some trouble emerged when I tried to use only the month and day: as.Date() needs a year as well, and dates are displayed in Y-m-d format by default. I solved this problem by including the year (1973) when building the date and by using scale_x_date() to label only the first of each month as a reference date, rather than labeling every day separately. Besides, labeling every day would produce far too many labels.
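The date handling described above can be seen on a single value. This is a minimal sketch using only base R; `d` is an illustrative name.

```r
# Build a date from year, month, and day; "1973-5-1" is read as year-month-day
d <- as.Date(paste(1973, 5, 1, sep = "-"))

class(d)             # "Date"
format(d)            # "1973-05-01", the default Y-m-d display
format(d, "%m-%d")   # "05-01", month and day only
format(d, "%B 1")    # full month name followed by a literal 1 (name depends on locale)
```

The last format string is the same one passed to `date_labels` in `scale_x_date()`, which is why each axis label reads as a month name followed by "1".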