Objective:
You will use R to analyze the built-in airquality dataset, applying descriptive statistics techniques to explore environmental data. The assignment covers measures of central tendency, spread, histograms, boxplots, scatterplots, correlations, and summary tables, aligning with the Week 6 agenda on Descriptive Statistics.
Dataset
Source: Built-in R dataset airquality.
Description: Contains 153 observations of daily air quality measurements in New York from May to September 1973.
Variables (selected for this assignment):
Notes
-The airquality dataset has missing values in Ozone and Solar.R. The code uses na.rm = TRUE or use = “complete.obs” to handle them.
-If you encounter errors, check that tidyverse and corrplot are installed and loaded.
-Feel free to modify plot aesthetics (e.g., colors, binwidth) to enhance clarity.
Instructions:
Complete the following tasks using R to analyze the airquality dataset. Submit your Rpubs link that includes code, outputs (tables and plots), and written interpretations for each task. Ensure you load the dataset using data(airquality) and install/load the tidyverse and corrplot packages.
#Load your dataset
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded
data("airquality")
Tasks and Questions
Using functions you learned this week, compute mean, median, standard deviation, min, and max separately for Ozone, Temp, and Wind.
#Your code for Ozone goes here
library(tidyverse)
library(corrplot)
data("airquality")
Ozone_stats <- airquality %>%
summarise(
Ozone_mean = mean(Ozone, na.rm = TRUE),
Ozone_median = median(Ozone, na.rm = TRUE),
Ozone_sd = sd(Ozone, na.rm = TRUE),
Ozone_min = min(Ozone, na.rm = TRUE),
Ozone_max = max(Ozone, na.rm = TRUE),
)
Ozone_stats
## Ozone_mean Ozone_median Ozone_sd Ozone_min Ozone_max
## 1 42.12931 31.5 32.98788 1 168
print(Ozone_stats)
## Ozone_mean Ozone_median Ozone_sd Ozone_min Ozone_max
## 1 42.12931 31.5 32.98788 1 168
#Your code for Temp goes here
library(tidyverse)
library(corrplot)
data("airquality")
Temp_stats <- airquality %>%
summarise(
Temp_mean = mean(Temp, na.rm = TRUE),
Temp_median = median(Temp, na.rm = TRUE),
Temp_sd = sd(Temp, na.rm = TRUE),
Temp_min = min(Temp, na.rm = TRUE),
Temp_max = max(Temp, na.rm = TRUE),
)
Temp_stats
## Temp_mean Temp_median Temp_sd Temp_min Temp_max
## 1 77.88235 79 9.46527 56 97
print(Temp_stats)
## Temp_mean Temp_median Temp_sd Temp_min Temp_max
## 1 77.88235 79 9.46527 56 97
#Your code for Wind goes here
library(tidyverse)
library(corrplot)
data("airquality")
wind_stats <- airquality %>%
summarise(
mean = mean(Wind, na.rm = TRUE),
median = median(Wind, na.rm = TRUE),
sd = sd(Wind, na.rm = TRUE),
min = min(Wind, na.rm = TRUE),
max = max(Wind, na.rm = TRUE)
)
wind_stats
## mean median sd min max
## 1 9.957516 9.7 3.523001 1.7 20.7
print(wind_stats)
## mean median sd min max
## 1 9.957516 9.7 3.523001 1.7 20.7
Question: Compare the mean and median for each variable. Are they similar or different, and what does this suggest about the distribution (e.g., skewness)? What does the standard deviation indicate about variability?
Wind Mean: ~9.96 Median: ~9.7 The mean and median being so close suggests the wind fluctuation is evenly and symmetrically distributed. The standard deviation of ~3.52 suggests the wind values moderately vary around the mean.
Temp Mean: ~77.88 Median: ~79 Since mean and median are so close, this indicates data is symmetric and relatively evenly distributed. The standard deviation of ~9.33 suggests the temperature doesn’t fluctuate as much as Ozone but more than Wind.
Ozone Mean: ~42.13 Median: ~31.5 The mean being higher than median suggests a right skewed distribution due to some high Ozone days pulling the mean up, and the standard deviation of ~32.79 suggests the data highly varys day to day.
Generate the histogram for Ozone.
library(dplyr)
library(ggplot2)
library(tidyverse)
# Remove NAs
airquality_clean <- airquality %>% drop_na(Ozone)
# Create histogram and assign it to an object
ozone_plot <- ggplot(airquality_clean, aes(x = Ozone)) +
geom_histogram(binwidth = 10, fill = "steelblue", color = "black") +
labs(title = "Histogram of Ozone Levels",
x = "Ozone (ppb)",
y = "Frequency") +
theme_minimal()
print(ozone_plot)
Question: Describe the shape of the ozone distribution (e.g., normal, skewed, unimodal). Are there any outliers or unusual features?
The Ozone distribution is right skewed with most of the data clustered on the left side of the graph, and there are outliers on the far right at about 170.
Create a boxplot of ozone levels (Ozone) by month, with months
displayed as names (May, June, July, August, September) instead of
numbers (5–9).Recode the Month variable into a new column called
month_name with month names using case_when
from week 4.Generate a boxplot of Ozone by month_name.
# Your code here
library(tidyverse)
airquality_clean <- airquality %>% drop_na(Ozone)
airquality_clean <- airquality_clean %>%
mutate(
month_name = case_when(
Month == 5 ~ "May",
Month == 6 ~ "June",
Month == 7 ~ "July",
Month == 8 ~ "August",
Month == 9 ~"September"
)
)
airquality_clean <- airquality_clean %>%
mutate(month_name = factor(month_name, levels = c("May","June","July","August","September")))
# Plot
ggplot(airquality_clean, aes(x = month_name, y = Ozone)) +
geom_boxplot(fill = "red") +
labs(
title = "Ozone Levels by Month",
x = "Month",
y = "Ozone (ppb)"
) +
theme_minimal()
Question: How do ozone levels vary across months? Which month has the highest median ozone? Are there outliers in any month, and what might they indicate?
The levels vary month to month with July having the highest median ozone. Every single month listed except July had outliers, this indicates the data is shaped by many strongly different data points. Causing pattern trends to be less recognizable.
Produce the scatterplot of Temp vs. Ozone, colored by Month.
# Your code goes here
library(tidyverse)
ggplot(airquality_clean, aes(x = Temp, y = Ozone, color = month_name)) +
geom_point(size = 2, alpha = 0.7) +
labs(
title = "Scatterplot of Temperature vs Ozone",
x = "Temperature (°F)",
y = "Ozone (ppb)",
color = "Month"
) +
theme_minimal()
Question: Is there a visible relationship between temperature and ozone levels? Do certain months cluster together (e.g., higher ozone in warmer months)? Describe any patterns. As tempurature and warmer months approach, the Ozone layer tends to increase with it Extreme outliers such as seen in August skew and dilute the data set as it is so far away from the majority of the other months and is an anomoly.
Compute and visualize the correlation matrix for Ozone, Temp, and Wind.
# Your code goes here
# Load packages
library(tidyverse)
library(corrplot)
airquality_clean <- airquality %>%
select(Ozone, Temp, Wind) %>%
drop_na()
# Compute correlation matrix
cor_matrix <- cor(airquality_clean, use = "complete.obs")
print(cor_matrix)
## Ozone Temp Wind
## Ozone 1.0000000 0.6983603 -0.6015465
## Temp 0.6983603 1.0000000 -0.5110750
## Wind -0.6015465 -0.5110750 1.0000000
#Plot
corrplot(cor_matrix, method = "color", adCoef.col = "black",
tl.col = "black",tl.srt=45, number.cex = 0.8 )
## Warning in text.default(pos.xlabel[, 1], pos.xlabel[, 2], newcolnames, srt =
## tl.srt, : "adCoef.col" is not a graphical parameter
## Warning in text.default(pos.ylabel[, 1], pos.ylabel[, 2], newrownames, col =
## tl.col, : "adCoef.col" is not a graphical parameter
## Warning in title(title, ...): "adCoef.col" is not a graphical parameter
print(corrplot)
## function (corr, method = c("circle", "square", "ellipse", "number",
## "shade", "color", "pie"), type = c("full", "lower", "upper"),
## col = NULL, col.lim = NULL, is.corr = TRUE, bg = "white",
## title = "", add = FALSE, diag = TRUE, outline = FALSE, mar = c(0,
## 0, 0, 0), addgrid.col = NULL, addCoef.col = NULL, addCoefasPercent = FALSE,
## order = c("original", "AOE", "FPC", "hclust", "alphabet"),
## hclust.method = c("complete", "ward", "ward.D", "ward.D2",
## "single", "average", "mcquitty", "median", "centroid"),
## addrect = NULL, rect.col = "black", rect.lwd = 2, tl.pos = NULL,
## tl.cex = 1, tl.col = "red", tl.offset = 0.4, tl.srt = 90,
## cl.pos = NULL, cl.length = NULL, cl.cex = 0.8, cl.ratio = 0.15,
## cl.align.text = "c", cl.offset = 0.5, number.cex = 1, number.font = 2,
## number.digits = NULL, addshade = c("negative", "positive",
## "all"), shade.lwd = 1, shade.col = "white", transKeepSign = TRUE,
## p.mat = NULL, sig.level = 0.05, insig = c("pch", "p-value",
## "blank", "n", "label_sig"), pch = 4, pch.col = "black",
## pch.cex = 3, plotCI = c("n", "square", "circle", "rect"),
## lowCI.mat = NULL, uppCI.mat = NULL, na.label = "?", na.label.col = "black",
## win.asp = 1, ...)
## {
## method = match.arg(method)
## type = match.arg(type)
## order = match.arg(order)
## hclust.method = match.arg(hclust.method)
## addshade = match.arg(addshade)
## insig = match.arg(insig)
## plotCI = match.arg(plotCI)
## if (win.asp != 1 && !(method %in% c("circle", "square"))) {
## stop("Parameter 'win.asp' is supported only for circle and square methods.")
## }
## asp_rescale_factor = min(1, win.asp)/max(1, win.asp)
## stopifnot(asp_rescale_factor >= 0 && asp_rescale_factor <=
## 1)
## if (!is.matrix(corr) && !is.data.frame(corr)) {
## stop("Need a matrix or data frame!")
## }
## if (is.null(addgrid.col)) {
## addgrid.col = switch(method, color = NA, shade = NA,
## "grey")
## }
## if (!is.corr & !transKeepSign & method %in% c("circle", "square",
## "ellipse", "shade", "pie")) {
## stop("method should not be in c('circle', 'square', 'ellipse', 'shade', 'pie') when transKeepSign = FALSE")
## }
## if (any(corr[!is.na(corr)] < col.lim[1]) || any(corr[!is.na(corr)] >
## col.lim[2])) {
## stop("color limits should cover matrix")
## }
## if (is.null(col.lim)) {
## if (is.corr) {
## col.lim = c(-1, 1)
## }
## else {
## if (!diag) {
## diag(corr) = NA
## }
## col.lim = c(min(corr, na.rm = TRUE), max(corr, na.rm = TRUE))
## }
## }
## SpecialCorr = 0
## if (is.corr) {
## if (min(corr, na.rm = TRUE) < -1 - .Machine$double.eps^0.75 ||
## max(corr, na.rm = TRUE) > 1 + .Machine$double.eps^0.75) {
## stop("The matrix is not in [-1, 1]!")
## }
## SpecialCorr = 1
## if (col.lim[1] < -1 | col.lim[2] > 1) {
## stop("col.lim should be within the interval [-1, 1]")
## }
## }
## intercept = 0
## zoom = 1
## if (!is.corr) {
## c_max = max(corr, na.rm = TRUE)
## c_min = min(corr, na.rm = TRUE)
## if ((col.lim[1] > c_min) | (col.lim[2] < c_max)) {
## stop("Wrong color: matrix should be in col.lim interval!")
## }
## if (diff(col.lim)/(c_max - c_min) > 2) {
## warning("col.lim interval too wide, please set a suitable value")
## }
## if (c_max <= 0 | c_min >= 0 | !transKeepSign) {
## intercept = -col.lim[1]
## zoom = 1/(diff(col.lim))
## }
## else {
## stopifnot(c_max * c_min < 0)
## stopifnot(c_min < 0 && c_max > 0)
## intercept = 0
## zoom = 1/max(abs(col.lim))
## SpecialCorr = 1
## }
## corr = (intercept + corr) * zoom
## }
## col.lim2 = (intercept + col.lim) * zoom
## int = intercept * zoom
## if (is.null(col) & is.corr) {
## col = COL2("RdBu", 200)
## }
## if (is.null(col) & !is.corr) {
## if (col.lim[1] * col.lim[2] < 0) {
## col = COL2("RdBu", 200)
## }
## else {
## col = COL1("YlOrBr", 200)
## }
## }
## n = nrow(corr)
## m = ncol(corr)
## min.nm = min(n, m)
## ord = 1:min.nm
## if (order != "original") {
## ord = corrMatOrder(corr, order = order, hclust.method = hclust.method)
## corr = corr[ord, ord]
## if (!is.null(p.mat)) {
## p.mat = p.mat[ord, ord]
## }
## }
## if (is.null(rownames(corr))) {
## rownames(corr) = 1:n
## }
## if (is.null(colnames(corr))) {
## colnames(corr) = 1:m
## }
## apply_mat_filter = function(mat) {
## x = matrix(1:n * m, nrow = n, ncol = m)
## switch(type, upper = mat[row(x) > col(x)] <- Inf, lower = mat[row(x) <
## col(x)] <- Inf)
## if (!diag) {
## diag(mat) = Inf
## }
## return(mat)
## }
## getPos.Dat = function(mat) {
## tmp = apply_mat_filter(mat)
## Dat = tmp[is.finite(tmp)]
## ind = which(is.finite(tmp), arr.ind = TRUE)
## Pos = ind
## Pos[, 1] = ind[, 2]
## Pos[, 2] = -ind[, 1] + 1 + n
## PosName = ind
## PosName[, 1] = colnames(mat)[ind[, 2]]
## PosName[, 2] = rownames(mat)[ind[, 1]]
## return(list(Pos, Dat, PosName))
## }
## getPos.NAs = function(mat) {
## tmp = apply_mat_filter(mat)
## ind = which(is.na(tmp), arr.ind = TRUE)
## Pos = ind
## Pos[, 1] = ind[, 2]
## Pos[, 2] = -ind[, 1] + 1 + n
## return(Pos)
## }
## testTemp = getPos.Dat(corr)
## Pos = getPos.Dat(corr)[[1]]
## PosName = getPos.Dat(corr)[[3]]
## if (any(is.na(corr)) && is.character(na.label)) {
## PosNA = getPos.NAs(corr)
## }
## else {
## PosNA = NULL
## }
## AllCoords = rbind(Pos, PosNA)
## n2 = max(AllCoords[, 2])
## n1 = min(AllCoords[, 2])
## nn = n2 - n1
## m2 = max(AllCoords[, 1])
## m1 = min(AllCoords[, 1])
## mm = max(1, m2 - m1)
## expand_expression = function(s) {
## ifelse(grepl("^[:=$]", s), parse(text = substring(s,
## 2)), s)
## }
## newrownames = sapply(rownames(corr)[(n + 1 - n2):(n + 1 -
## n1)], expand_expression)
## newcolnames = sapply(colnames(corr)[m1:m2], expand_expression)
## DAT = getPos.Dat(corr)[[2]]
## len.DAT = length(DAT)
## rm(expand_expression)
## assign.color = function(dat = DAT, color = col, isSpecialCorr = SpecialCorr) {
## if (isSpecialCorr) {
## newcorr = (dat + 1)/2
## }
## else {
## newcorr = dat
## }
## newcorr[newcorr <= 0] = 0
## newcorr[newcorr >= 1] = 1 - 1e-16
## color[floor(newcorr * length(color)) + 1]
## }
## col.fill = assign.color()
## isFALSE = function(x) identical(x, FALSE)
## isTRUE = function(x) identical(x, TRUE)
## if (isFALSE(tl.pos)) {
## tl.pos = "n"
## }
## if (is.null(tl.pos) || isTRUE(tl.pos)) {
## tl.pos = switch(type, full = "lt", lower = "ld", upper = "td")
## }
## if (isFALSE(cl.pos)) {
## cl.pos = "n"
## }
## if (is.null(cl.pos) || isTRUE(cl.pos)) {
## cl.pos = switch(type, full = "r", lower = "b", upper = "r")
## }
## if (isFALSE(outline)) {
## col.border = col.fill
## }
## else if (isTRUE(outline)) {
## col.border = "black"
## }
## else if (is.character(outline)) {
## col.border = outline
## }
## else {
## stop("Unsupported value type for parameter outline")
## }
## oldpar = par(mar = mar, bg = par()$bg)
## on.exit(par(oldpar), add = TRUE)
## if (!add) {
## plot.new()
## xlabwidth = max(strwidth(newrownames, cex = tl.cex))
## ylabwidth = max(strwidth(newcolnames, cex = tl.cex))
## laboffset = strwidth("W", cex = tl.cex) * tl.offset
## for (i in 1:50) {
## xlim = c(m1 - 0.5 - laboffset - xlabwidth * (grepl("l",
## tl.pos) | grepl("d", tl.pos)), m2 + 0.5 + mm *
## cl.ratio * (cl.pos == "r") + xlabwidth * abs(cos(tl.srt *
## pi/180)) * grepl("d", tl.pos))
## ylim = c(n1 - 0.5 - nn * cl.ratio * (cl.pos == "b") -
## laboffset, n2 + 0.5 + laboffset + ylabwidth *
## abs(sin(tl.srt * pi/180)) * grepl("t", tl.pos) +
## ylabwidth * abs(sin(tl.srt * pi/180)) * (type ==
## "lower") * grepl("d", tl.pos))
## plot.window(xlim, ylim, asp = 1, xaxs = "i", yaxs = "i")
## x.tmp = max(strwidth(newrownames, cex = tl.cex))
## y.tmp = max(strwidth(newcolnames, cex = tl.cex))
## laboffset.tmp = strwidth("W", cex = tl.cex) * tl.offset
## if (max(x.tmp - xlabwidth, y.tmp - ylabwidth, laboffset.tmp -
## laboffset) < 0.001) {
## break
## }
## xlabwidth = x.tmp
## ylabwidth = y.tmp
## laboffset = laboffset.tmp
## if (i == 50) {
## warning(c("Not been able to calculate text margin, ",
## "please try again with a clean new empty window using ",
## "{plot.new(); dev.off()} or reduce tl.cex"))
## }
## }
## if (.Platform$OS.type == "windows") {
## grDevices::windows.options(width = 7, height = 7 *
## diff(ylim)/diff(xlim))
## }
## xlim = xlim + diff(xlim) * 0.01 * c(-1, 1)
## ylim = ylim + diff(ylim) * 0.01 * c(-1, 1)
## plot.window(xlim = xlim, ylim = ylim, asp = win.asp,
## xlab = "", ylab = "", xaxs = "i", yaxs = "i")
## }
## laboffset = strwidth("W", cex = tl.cex) * tl.offset
## symbols(Pos, add = TRUE, inches = FALSE, rectangles = matrix(1,
## len.DAT, 2), bg = bg, fg = bg)
## if (method == "circle" && plotCI == "n") {
## symbols(Pos, add = TRUE, inches = FALSE, circles = asp_rescale_factor *
## 0.9 * abs(DAT)^0.5/2, fg = col.border, bg = col.fill)
## }
## if (method == "ellipse" && plotCI == "n") {
## ell.dat = function(rho, length = 99) {
## k = seq(0, 2 * pi, length = length)
## x = cos(k + acos(rho)/2)/2
## y = cos(k - acos(rho)/2)/2
## cbind(rbind(x, y), c(NA, NA))
## }
## ELL.dat = lapply(DAT, ell.dat)
## ELL.dat2 = 0.85 * matrix(unlist(ELL.dat), ncol = 2, byrow = TRUE)
## ELL.dat2 = ELL.dat2 + Pos[rep(1:length(DAT), each = 100),
## ]
## polygon(ELL.dat2, border = col.border, col = col.fill)
## }
## if (is.null(number.digits)) {
## number.digits = switch(addCoefasPercent + 1, 2, 0)
## }
## stopifnot(number.digits%%1 == 0)
## stopifnot(number.digits >= 0)
## if (method == "number" && plotCI == "n") {
## x = (DAT - int) * ifelse(addCoefasPercent, 100, 1)/zoom
## text(Pos[, 1], Pos[, 2], font = number.font, col = col.fill,
## labels = format(round(x, number.digits), nsmall = number.digits),
## cex = number.cex)
## }
## NA_LABEL_MAX_CHARS = 2
## if (is.matrix(PosNA) && nrow(PosNA) > 0) {
## stopifnot(is.matrix(PosNA))
## if (na.label == "square") {
## symbols(PosNA, add = TRUE, inches = FALSE, squares = rep(1,
## nrow(PosNA)), bg = na.label.col, fg = na.label.col)
## }
## else if (nchar(na.label) %in% 1:NA_LABEL_MAX_CHARS) {
## symbols(PosNA, add = TRUE, inches = FALSE, squares = rep(1,
## nrow(PosNA)), fg = bg, bg = bg)
## text(PosNA[, 1], PosNA[, 2], font = number.font,
## col = na.label.col, labels = na.label, cex = number.cex,
## ...)
## }
## else {
## stop(paste("Maximum number of characters for NA label is:",
## NA_LABEL_MAX_CHARS))
## }
## }
## if (method == "pie" && plotCI == "n") {
## symbols(Pos, add = TRUE, inches = FALSE, circles = rep(0.5,
## len.DAT) * 0.85, fg = col.border)
## pie.dat = function(theta, length = 100) {
## k = seq(pi/2, pi/2 - theta, length = 0.5 * length *
## abs(theta)/pi)
## x = c(0, cos(k)/2, 0)
## y = c(0, sin(k)/2, 0)
## cbind(rbind(x, y), c(NA, NA))
## }
## PIE.dat = lapply(DAT * 2 * pi, pie.dat)
## len.pie = unlist(lapply(PIE.dat, length))/2
## PIE.dat2 = 0.85 * matrix(unlist(PIE.dat), ncol = 2, byrow = TRUE)
## PIE.dat2 = PIE.dat2 + Pos[rep(1:length(DAT), len.pie),
## ]
## polygon(PIE.dat2, border = "black", col = col.fill)
## }
## if (method == "shade" && plotCI == "n") {
## symbols(Pos, add = TRUE, inches = FALSE, squares = rep(1,
## len.DAT), bg = col.fill, fg = addgrid.col)
## shade.dat = function(w) {
## x = w[1]
## y = w[2]
## rho = w[3]
## x1 = x - 0.5
## x2 = x + 0.5
## y1 = y - 0.5
## y2 = y + 0.5
## dat = NA
## if ((addshade == "positive" || addshade == "all") &&
## rho > 0) {
## dat = cbind(c(x1, x1, x), c(y, y1, y1), c(x,
## x2, x2), c(y2, y2, y))
## }
## if ((addshade == "negative" || addshade == "all") &&
## rho < 0) {
## dat = cbind(c(x1, x1, x), c(y, y2, y2), c(x,
## x2, x2), c(y1, y1, y))
## }
## return(t(dat))
## }
## pos_corr = rbind(cbind(Pos, DAT))
## pos_corr2 = split(pos_corr, 1:nrow(pos_corr))
## SHADE.dat = matrix(na.omit(unlist(lapply(pos_corr2, shade.dat))),
## byrow = TRUE, ncol = 4)
## segments(SHADE.dat[, 1], SHADE.dat[, 2], SHADE.dat[,
## 3], SHADE.dat[, 4], col = shade.col, lwd = shade.lwd)
## }
## if (method == "square" && plotCI == "n") {
## draw_method_square(Pos, DAT, asp_rescale_factor, col.border,
## col.fill)
## }
## if (method == "color" && plotCI == "n") {
## draw_method_color(Pos, col.border, col.fill)
## }
## draw_grid(AllCoords, addgrid.col)
## if (plotCI != "n") {
## if (is.null(lowCI.mat) || is.null(uppCI.mat)) {
## stop("Need lowCI.mat and uppCI.mat!")
## }
## if (order != "original") {
## lowCI.mat = lowCI.mat[ord, ord]
## uppCI.mat = uppCI.mat[ord, ord]
## }
## pos.lowNew = getPos.Dat(lowCI.mat)[[1]]
## lowNew = getPos.Dat(lowCI.mat)[[2]]
## pos.uppNew = getPos.Dat(uppCI.mat)[[1]]
## uppNew = getPos.Dat(uppCI.mat)[[2]]
## k1 = (abs(uppNew) > abs(lowNew))
## bigabs = uppNew
## bigabs[which(!k1)] = lowNew[!k1]
## smallabs = lowNew
## smallabs[which(!k1)] = uppNew[!k1]
## sig = sign(uppNew * lowNew)
## color_bigabs = col[ceiling((bigabs + 1) * length(col)/2)]
## color_smallabs = col[ceiling((smallabs + 1) * length(col)/2)]
## if (plotCI == "circle") {
## symbols(pos.uppNew[, 1], pos.uppNew[, 2], add = TRUE,
## inches = FALSE, circles = 0.95 * abs(bigabs)^0.5/2,
## bg = ifelse(sig > 0, col.fill, color_bigabs),
## fg = ifelse(sig > 0, col.fill, color_bigabs))
## symbols(pos.lowNew[, 1], pos.lowNew[, 2], add = TRUE,
## inches = FALSE, circles = 0.95 * abs(smallabs)^0.5/2,
## bg = ifelse(sig > 0, bg, color_smallabs), fg = ifelse(sig >
## 0, col.fill, color_smallabs))
## }
## if (plotCI == "square") {
## symbols(pos.uppNew[, 1], pos.uppNew[, 2], add = TRUE,
## inches = FALSE, squares = abs(bigabs)^0.5, bg = ifelse(sig >
## 0, col.fill, color_bigabs), fg = ifelse(sig >
## 0, col.fill, color_bigabs))
## symbols(pos.lowNew[, 1], pos.lowNew[, 2], add = TRUE,
## inches = FALSE, squares = abs(smallabs)^0.5,
## bg = ifelse(sig > 0, bg, color_smallabs), fg = ifelse(sig >
## 0, col.fill, color_smallabs))
## }
## if (plotCI == "rect") {
## rect.width = 0.25
## rect(pos.uppNew[, 1] - rect.width, pos.uppNew[, 2] +
## smallabs/2, pos.uppNew[, 1] + rect.width, pos.uppNew[,
## 2] + bigabs/2, col = col.fill, border = col.fill)
## segments(pos.lowNew[, 1] - rect.width, pos.lowNew[,
## 2] + DAT/2, pos.lowNew[, 1] + rect.width, pos.lowNew[,
## 2] + DAT/2, col = "black", lwd = 1)
## segments(pos.uppNew[, 1] - rect.width, pos.uppNew[,
## 2] + uppNew/2, pos.uppNew[, 1] + rect.width,
## pos.uppNew[, 2] + uppNew/2, col = "black", lwd = 1)
## segments(pos.lowNew[, 1] - rect.width, pos.lowNew[,
## 2] + lowNew/2, pos.lowNew[, 1] + rect.width,
## pos.lowNew[, 2] + lowNew/2, col = "black", lwd = 1)
## segments(pos.lowNew[, 1] - 0.5, pos.lowNew[, 2],
## pos.lowNew[, 1] + 0.5, pos.lowNew[, 2], col = "grey70",
## lty = 3)
## }
## }
## if (!is.null(addCoef.col) && method != "number") {
## text(Pos[, 1], Pos[, 2], col = addCoef.col, labels = format(round((DAT -
## int) * ifelse(addCoefasPercent, 100, 1)/zoom, number.digits),
## nsmall = number.digits), cex = number.cex, font = number.font)
## }
## if (!is.null(p.mat)) {
## pos.pNew = getPos.Dat(p.mat)[[1]]
## pNew = getPos.Dat(p.mat)[[2]]
## }
## if (!is.null(p.mat) && insig != "n") {
## if (!is.null(rownames(p.mat)) | !is.null(rownames(p.mat))) {
## if (!all(colnames(p.mat) == colnames(corr)) | !all(rownames(p.mat) ==
## rownames(corr))) {
## warning("p.mat and corr may be not paired, their rownames and colnames are not totally same!")
## }
## }
## if (insig == "label_sig") {
## if (!is.character(pch))
## pch = "*"
## place_points = function(sig.locs, point) {
## text(pos.pNew[, 1][sig.locs], pos.pNew[, 2][sig.locs],
## labels = point, col = pch.col, cex = pch.cex,
## lwd = 2)
## }
## if (length(sig.level) == 1) {
## place_points(sig.locs = which(pNew < sig.level),
## point = pch)
## }
## else {
## l = length(sig.level)
## for (i in seq_along(sig.level)) {
## iter = l + 1 - i
## pchTmp = paste(rep(pch, i), collapse = "")
## if (i == length(sig.level)) {
## locs = which(pNew < sig.level[iter])
## if (length(locs)) {
## place_points(sig.locs = locs, point = pchTmp)
## }
## }
## else {
## locs = which(pNew < sig.level[iter] & pNew >
## sig.level[iter - 1])
## if (length(locs)) {
## place_points(sig.locs = locs, point = pchTmp)
## }
## }
## }
## }
## }
## else {
## ind.p = which(pNew > sig.level)
## p_inSig = length(ind.p) > 0
## if (insig == "pch" && p_inSig) {
## points(pos.pNew[, 1][ind.p], pos.pNew[, 2][ind.p],
## pch = pch, col = pch.col, cex = pch.cex, lwd = 2)
## }
## if (insig == "p-value" && p_inSig) {
## text(pos.pNew[, 1][ind.p], pos.pNew[, 2][ind.p],
## round(pNew[ind.p], number.digits), col = pch.col)
## }
## if (insig == "blank" && p_inSig) {
## symbols(pos.pNew[, 1][ind.p], pos.pNew[, 2][ind.p],
## inches = FALSE, squares = rep(1, length(pos.pNew[,
## 1][ind.p])), fg = addgrid.col, bg = bg, add = TRUE)
## }
## }
## }
## if (cl.pos != "n") {
## colRange = assign.color(dat = col.lim2)
## ind1 = which(col == colRange[1])
## ind2 = which(col == colRange[2])
## colbar = col[ind1:ind2]
## if (is.null(cl.length)) {
## cl.length = ifelse(length(colbar) > 20, 11, length(colbar) +
## 1)
## }
## labels = seq(col.lim[1], col.lim[2], length = cl.length)
## if (cl.pos == "r") {
## vertical = TRUE
## xlim = c(m2 + 0.5 + mm * 0.02, m2 + 0.5 + mm * cl.ratio)
## ylim = c(n1 - 0.5, n2 + 0.5)
## }
## if (cl.pos == "b") {
## vertical = FALSE
## xlim = c(m1 - 0.5, m2 + 0.5)
## ylim = c(n1 - 0.5 - nn * cl.ratio, n1 - 0.5 - nn *
## 0.02)
## }
## colorlegend(colbar = colbar, labels = round(labels, 2),
## offset = cl.offset, ratio.colbar = 0.3, cex = cl.cex,
## xlim = xlim, ylim = ylim, vertical = vertical, align = cl.align.text)
## }
## if (tl.pos != "n") {
## pos.xlabel = cbind(m1:m2, n2 + 0.5 + laboffset)
## pos.ylabel = cbind(m1 - 0.5, n2:n1)
## if (tl.pos == "td") {
## if (type != "upper") {
## stop("type should be 'upper' if tl.pos is 'dt'.")
## }
## pos.ylabel = cbind(m1:(m1 + nn) - 0.5, n2:n1)
## }
## if (tl.pos == "ld") {
## if (type != "lower") {
## stop("type should be 'lower' if tl.pos is 'ld'.")
## }
## pos.xlabel = cbind(m1:m2, n2:(n2 - mm) + 0.5 + laboffset)
## }
## if (tl.pos == "d") {
## pos.ylabel = cbind(m1:(m1 + nn) - 0.5, n2:n1)
## pos.ylabel = pos.ylabel[1:min(n, m), ]
## symbols(pos.ylabel[, 1] + 0.5, pos.ylabel[, 2], add = TRUE,
## bg = bg, fg = addgrid.col, inches = FALSE, squares = rep(1,
## length(pos.ylabel[, 1])))
## text(pos.ylabel[, 1] + 0.5, pos.ylabel[, 2], newcolnames[1:min(n,
## m)], col = tl.col, cex = tl.cex, ...)
## }
## else {
## if (tl.pos != "l") {
## text(pos.xlabel[, 1], pos.xlabel[, 2], newcolnames,
## srt = tl.srt, adj = ifelse(tl.srt == 0, c(0.5,
## 0), c(0, 0)), col = tl.col, cex = tl.cex,
## offset = tl.offset, ...)
## }
## text(pos.ylabel[, 1], pos.ylabel[, 2], newrownames,
## col = tl.col, cex = tl.cex, pos = 2, offset = tl.offset,
## ...)
## }
## }
## title(title, ...)
## if (type == "full" && plotCI == "n" && !is.null(addgrid.col)) {
## rect(m1 - 0.5, n1 - 0.5, m2 + 0.5, n2 + 0.5, border = addgrid.col)
## }
## if (!is.null(addrect) && order == "hclust" && type == "full") {
## corrRect.hclust(corr, k = addrect, method = hclust.method,
## col = rect.col, lwd = rect.lwd)
## }
## corrPos = data.frame(PosName, Pos, DAT)
## colnames(corrPos) = c("xName", "yName", "x", "y", "corr")
## if (!is.null(p.mat)) {
## corrPos = cbind(corrPos, pNew)
## colnames(corrPos)[6] = c("p.value")
## }
## corrPos = corrPos[order(corrPos[, 3], -corrPos[, 4]), ]
## rownames(corrPos) = NULL
## res = list(corr = corr, corrPos = corrPos, arg = list(type = type))
## invisible(res)
## }
## <bytecode: 0x7fd8b1623af8>
## <environment: namespace:corrplot>
Question: Identify the strongest and weakest correlations. For example, is ozone more strongly correlated with temperature or wind speed? Explain what the correlation values suggest about relationships between variables. The strongest correlation of Ozone and Temp is due the cause and effect that as tempurature rise Ozone tends to increase. The weakest correlation is of the Temp and Wind, meaning the data does not prove there is a strong relation between these two factors.
Generate the summary table grouped by Month.Generate the summary table grouped by Month. It should include count, average mean of ozone, average mean of temperature, and average mean of wind per month.
# your code goes here
library(tidyverse)
airquality_clean <- airquality %>%
select(Ozone, Temp, Wind, Month) %>%
drop_na()
monthly_summary <- airquality_clean %>%
group_by(Month) %>%
summarise(
count = n(),
mean_ozone = mean(Ozone),
mean_temp = mean(Temp),
mean_wind = mean(Wind)
)
monthly_summary
## # A tibble: 5 × 5
## Month count mean_ozone mean_temp mean_wind
## <int> <int> <dbl> <dbl> <dbl>
## 1 5 26 23.6 66.7 11.5
## 2 6 9 29.4 78.2 12.2
## 3 7 26 59.1 83.9 8.52
## 4 8 26 60.0 84.0 8.57
## 5 9 29 31.4 76.9 10.1
monthly_summary <- monthly_summary %>%
mutate(
month_name = case_when(
Month == 5 ~ "May",
Month == 6 ~ "June",
Month == 7 ~ "July",
Month == 8 ~ "August",
Month == 9 ~ "September"
)
) %>%
select(month_name, everything())
print(monthly_summary)
## # A tibble: 5 × 6
## month_name Month count mean_ozone mean_temp mean_wind
## <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 May 5 26 23.6 66.7 11.5
## 2 June 6 9 29.4 78.2 12.2
## 3 July 7 26 59.1 83.9 8.52
## 4 August 8 26 60.0 84.0 8.57
## 5 September 9 29 31.4 76.9 10.1
Question: Which month has the highest average ozone level? How do temperature and wind speed vary across months? What environmental factors might explain these differences?
The highest Average Ozone level falls in August at 59.96. The temp and wind speed vary across months as first the wind rises from May until July. Highest temps are seen in July and August. While the windspeed is seen at its highest in May& June, dropping to its lowest in July and August.
Submission Requirements
Publish it to Rpubs and submit your link on blackboard