Loading in the dataset
Because airquality is a pre-built dataset, we can write it to our data directory to store it for later use.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
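Writing the built-in dataset to a data directory could look something like the sketch below. The folder name `data` and the file name `airquality.csv` are assumptions for illustration, not names taken from the original notebook.

```r
# Minimal sketch: save the built-in airquality data as a CSV file.
# The directory "data" and the file name are hypothetical choices.
aq <- datasets::airquality

dir.create("data", showWarnings = FALSE)         # create the folder if it is missing
csv_path <- file.path("data", "airquality.csv")  # build the path portably

write.csv(aq, csv_path, row.names = FALSE)       # base R; no extra packages needed
```

Reading the file back later with `read.csv(csv_path)` should return the same 153 rows.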
Loading dataset into my global environment
Looking at the first 6 rows of my dataset (airquality) using the head function
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
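The chunk that produced the output above is not shown. A minimal sketch that should reproduce it, assuming the dataset comes from base R's datasets package, is:

```r
# Load the built-in dataset into the global environment
data("airquality")

# head() returns the first 6 rows by default
first_rows <- head(airquality)
print(first_rows)
```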
Learning how to Calculate Summary Statistics
If you want to look at specific statistics, there are several ways to write the code. Here are two different ways to calculate the mean, plus one example of my own.
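The code for these calculations is not shown here, so the following is only a sketch of what they might look like. The choice of the Temp column, and of Ozone for the third example, is an assumption.

```r
aq <- datasets::airquality

# Way 1: select the column by name
mean_by_name <- mean(aq$Temp)

# Way 2: select the same column by position (Temp is the 4th column)
mean_by_index <- mean(aq[, 4])

# A third example: Ozone contains NA values, so mean() returns NA
# unless the missing values are dropped with na.rm = TRUE
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
```

Both ways of selecting Temp give the same number. The `na.rm = TRUE` argument matters for any column with missing values, such as Ozone and Solar.R.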
Learning how to change months from number to names
Change the numbers 5 through 9 to May through September
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now, how can I see that the numbers have changed to words?
See how Month has changed to have characters instead of numbers
summary(airquality$Month)
Length Class Mode
153 character character
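Beyond `summary()`, a couple of other base R functions can confirm the recoding. The sketch below repeats the recoding on a fresh copy (the name `aq` is just for illustration) so that it runs on its own.

```r
aq <- datasets::airquality

# Same recoding as above; the first assignment of a string
# converts the whole Month column from integer to character
aq$Month[aq$Month == 5] <- "May"
aq$Month[aq$Month == 6] <- "June"
aq$Month[aq$Month == 7] <- "July"
aq$Month[aq$Month == 8] <- "August"
aq$Month[aq$Month == 9] <- "September"

month_class  <- class(aq$Month)   # storage type of the column
month_values <- unique(aq$Month)  # distinct values, in order of appearance
month_counts <- table(aq$Month)   # number of rows (days) per month
```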
Month is a categorical variable. In R, categorical variables are stored as factors, and their distinct categories are called levels.
Reorder the Months so they do not default to alphabetical
airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
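As an alternative to recoding each month and then reordering, the two steps can be combined. This sketch is not the approach used above. It starts from a fresh copy of the data and relies on base R's built-in `month.name` vector.

```r
aq <- datasets::airquality

# levels = the original numeric codes, labels = the names to show instead;
# the factor levels keep calendar order rather than alphabetical order
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

month_levels <- levels(aq$Month)
```

Because the level order is set when the factor is created, plots and tables that use `aq$Month` should list May first and September last.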
Plot 1: Create a histogram categorized by Month with qplot
qplot stands for “quick plot” (in the ggplot2 package)
p1 <- qplot(data = airquality, Temp, fill = Month, geom = "histogram", bins = 20)
Warning: `qplot()` was deprecated in ggplot2 3.4.0.
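Since `qplot()` is deprecated, a roughly equivalent plot can be built with `ggplot()` directly. This is a sketch, assuming ggplot2 is installed. The names `aq` and `p1_alt` are illustrative only.

```r
library(ggplot2)

aq <- datasets::airquality
# Month must be a factor so that fill maps to discrete colours
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Same mapping as the qplot() call: Temp on x, bars filled by Month, 20 bins
p1_alt <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(bins = 20)
```

Typing `p1_alt` at the console (or calling `print(p1_alt)`) draws the plot.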
Plot 2: Make a histogram using ggplot
ggplot is more sophisticated than qplot, but it still uses the ggplot2 package (within the tidyverse).
Reorder the legend so that it is not in the default (alphabetical) order, but rather in the order in which the months occur.
Outline the bars in white using the color = "white" argument.
Histogram of Average Temperature by Month
p2 <- airquality %>%
  ggplot(aes(x = Temp, fill = Month)) +
  # position = "identity" overlays the monthly histograms instead of stacking them;
  # alpha makes the bars semi-transparent and color outlines them in white
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  xlab("Monthly Temperatures") +
  ylab("Frequency") +
  ggtitle("Histogram of Monthly Temperatures")