Airquality HW 2

Author

Mike Alfaro

Loading in the dataset

Because airquality is a built-in R dataset, we can load it directly and, if we want, write a copy to our data directory to store it for later use.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loading the dataset into my global environment

airquality <- airquality
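The "store it for later use" idea could be sketched roughly as follows. The data folder name and the airquality.csv file name are my own assumptions, and the sketch writes under a temporary directory so it does not touch the real project folder.

```r
# Assumed location: a "data" folder under a temporary directory (hypothetical path)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# Write the built-in dataset to CSV, then read it back to confirm the round trip
csv_path <- file.path(data_dir, "airquality.csv")
write.csv(datasets::airquality, csv_path, row.names = FALSE)
aq_from_disk <- read.csv(csv_path)
dim(aq_from_disk)  # number of rows and columns read back
```

In a real project, data_dir would point at the project's own data folder instead of tempdir().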

Looking at the first 6 rows of my dataset (airquality) using the head function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Learning how to Calculate Summary Statistics

If you want to look at specific statistics, there are several ways to write the code. Here are two different ways to calculate the mean of the same column (by name and by column position), plus one example of my own.

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
mean(airquality$Month)
[1] 6.993464
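The columns above have no missing values, but head() showed NA entries in Ozone and Solar.R. A minimal sketch of how mean() behaves on such a column, using a fresh copy that I call aq (a name chosen only for this illustration):

```r
# Fresh copy of the built-in data so this runs on its own
aq <- datasets::airquality

# With missing values present, mean() returns NA by default
ozone_mean_raw <- mean(aq$Ozone)

# na.rm = TRUE drops the missing values before averaging
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)

# Count how many values were missing
n_missing <- sum(is.na(aq$Ozone))

ozone_mean_raw
ozone_mean
n_missing
```

The same na.rm argument is accepted by median(), sd(), and var().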

Next up, I'll calculate the median, standard deviation, and variance. You can do this with the median(), sd(), and var() functions.

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
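The two Wind results are related: the standard deviation is the square root of the variance. A short check, as a sketch (the wind variable name is just for this example):

```r
# Pull the Wind column from the built-in dataset
wind <- datasets::airquality$Wind

# sd() should equal the square root of var()
sd_matches_var <- all.equal(sd(wind), sqrt(var(wind)))
sd_matches_var

# summary() reports several statistics (min, quartiles, median, mean, max) at once
summary(wind)
```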

Learning how to change months from number to names

The numbers 5 through 9 become May through September.

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"

How can I see that the numbers have changed to words?

See how Month has changed to have characters instead of numbers

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
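summary() only reports the length and type for a character column. To see the actual values, class(), unique(), and table() may be more informative. This is a sketch on a fresh copy; aq and month_names are names I made up for the example, and the recoding uses a lookup vector rather than the line-by-line assignments above.

```r
# Fresh copy so the example does not depend on earlier steps
aq <- datasets::airquality

# Hypothetical lookup vector: month number (as text) -> month name
month_names <- c("5" = "May", "6" = "June", "7" = "July",
                 "8" = "August", "9" = "September")
aq$Month <- unname(month_names[as.character(aq$Month)])

class(aq$Month)   # the column is now character
unique(aq$Month)  # distinct month names, in order of first appearance
table(aq$Month)   # number of rows per month (sorted alphabetically)
```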

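Month is a categorical variable. In R, categorical variables are stored as factors, and the categories of a factor are called levels. Right now Month is still a character vector, so the next step converts it to a factor.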

Reorder the Months so they do not default to alphabetical

airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
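To see why the levels argument matters, compare the default level order with the explicit one. This sketch uses a fresh copy (aq, a name used only here) and the built-in month.name constant as an alternative way to recode the numbers.

```r
# Fresh copy so the example runs on its own
aq <- datasets::airquality

# month.name is a built-in constant: month.name[5] is "May", and so on
aq$Month <- month.name[aq$Month]

# Without levels =, factor() sorts the levels alphabetically
default_levels <- levels(factor(aq$Month))
default_levels

# With levels =, the factor keeps calendar order
aq$Month <- factor(aq$Month, levels = month.name[5:9])
levels(aq$Month)
```

Plots and tables that use Month then follow the level order, not the alphabet.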

Plot 1: Create a histogram categorized by Month with qplot

qplot stands for “quick plot” (in the ggplot2 package). As the warning below shows, it was deprecated in ggplot2 3.4.0.

p1 <- qplot(Temp, data = airquality, fill = Month, geom = "histogram", bins = 20)
Warning: `qplot()` was deprecated in ggplot2 3.4.0.
p1
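Since qplot() is deprecated, roughly the same plot can be written with ggplot() directly. This is a sketch, not a pixel-for-pixel match; aq and p1_alt are names I picked for the example, and Month is rebuilt as an ordered factor so the block runs on its own.

```r
library(ggplot2)

# Fresh copy with Month as a factor in calendar order
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Temp on the x axis, bars filled by Month, 20 bins as in the qplot() call
p1_alt <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(bins = 20)
p1_alt
```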

Plot 2: Make a histogram using ggplot

ggplot is more sophisticated than qplot, but it still uses the ggplot2 package (within the tidyverse).
Reorder the legend so that it follows the order of the months rather than the default (alphabetical) order.
Outline the bars in white using the color = "white" argument.

Histogram of Average Temperature by Month

p2 <- airquality %>%
  ggplot(aes(x = Temp, fill = Month)) +
  # position, alpha, binwidth, and color are separate arguments; only the strings are quoted
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  xlab("Monthly Temperatures") +
  ylab("Frequency") +
  ggtitle("Histogram of Monthly Temperatures")
p2