DAT 301 Midterm

Jack Huie

2025-11-02

Dataset

- The dataset used in this project is the built-in R dataset "airquality".
- The dataset is based on the New York Air Quality Measurements from May to September 1973.
- The dataset has 6 variables, in columns titled: Ozone, Solar.R, Wind, Temp, Month, and Day.
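As a quick orientation, the dataset can be loaded and inspected as in the short sketch below. It only uses base R; the variable name `missing_per_column` is ours and is there just to show which columns contain missing values, since the later plots exclude missing data.

```r
# airquality ships with base R (package "datasets"), so no download is needed.
data(airquality)

# Show the structure: one row per day, six columns.
str(airquality)

# Count missing values in each column (name chosen for illustration).
missing_per_column <- colSums(is.na(airquality))
print(missing_per_column)
```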

Overview

The dataset will be analyzed using four plots to help visualize the data, and a statistical summary to identify its key values. Using these graphs, we will be able to draw a conclusion about how temperature, wind, and solar radiation relate to and affect ozone levels.

Ggplot Ozone Level vs. Temperature

By graphing a scatterplot of ozone level against temperature, we can look for a possible correlation between recorded temperature and recorded ozone levels. This correlation is best visualized with a line of best fit, whose upward slope indicates that higher temperatures tend to come with higher recorded ozone levels. The graph excludes missing data but keeps all recorded points, including some outliers.
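A minimal sketch of how such a scatterplot could be built with ggplot2 is shown here. The choice of a linear fit (`method = "lm"`), the axis labels, and the names `ozone_temp` and `p_scatter` are assumptions for illustration, not necessarily what was used for the original figure.

```r
library(ggplot2)

# Keep only rows where Ozone was recorded (Temp has no missing values here).
ozone_temp <- subset(airquality, !is.na(Ozone))

# Scatterplot with a straight line of best fit; outliers are left in.
p_scatter <- ggplot(ozone_temp, aes(x = Temp, y = Ozone)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Ozone Level vs. Temperature",
    x = "Temperature (degrees F)",
    y = "Ozone (ppb)"
  )

print(p_scatter)
```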

Ggplot Ozone Level by Month (May-September)

A boxplot is an excellent graph for comparing a quantitative variable across the groups of a categorical variable. By relabeling the data to show the name of the month rather than its number, it is easier to see how the ozone levels are grouped. The boxplot shows the overall spread of the data, as well as the key summary values: minimum, Q1, median, Q3, maximum, and outliers.
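The relabeling step can be sketched as follows: the numeric `Month` column is converted to a factor whose labels come from R's built-in `month.name` vector. The column name `MonthName` and the object names are ours, chosen for illustration.

```r
library(ggplot2)

# Drop rows with no Ozone reading so the boxplot has no missing values.
ozone_month <- subset(airquality, !is.na(Ozone))

# Map month numbers 5..9 to "May".."September", keeping calendar order.
ozone_month$MonthName <- factor(
  ozone_month$Month,
  levels = 5:9,
  labels = month.name[5:9]
)

# One box per month; points beyond the whiskers are drawn as outliers.
p_box <- ggplot(ozone_month, aes(x = MonthName, y = Ozone)) +
  geom_boxplot() +
  labs(
    title = "Ozone Level by Month (May-September)",
    x = "Month",
    y = "Ozone (ppb)"
  )

print(p_box)
```

Using a factor with explicit `levels` keeps the months in calendar order; plain character labels would be sorted alphabetically on the axis.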

Plotly: Temperature, Wind, and Ozone

In our analysis of the 3-variable scatterplot, it is important to note why wind, temperature, and ozone level were chosen as the variables. Because the dataset is an observation of air quality, the pollutant level, in this case ozone, must be included in the analysis. Wind and temperature can both affect the pollutant: temperature affects how much ozone forms, and wind affects how much of it stays in the air, that is, the concentration of ozone. Using these 3 variables in one scatterplot, we can visualize how ozone levels change with temperature and wind.
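One way to draw a three-variable scatterplot with plotly is sketched below. Which variable goes on which axis is an assumption on our part: here temperature and wind are placed on the axes and ozone is mapped to marker color. The names `aq_ozone` and `p_three` are illustrative.

```r
library(plotly)

# Only Ozone has missing values among these three columns.
aq_ozone <- subset(airquality, !is.na(Ozone))

# Temperature and wind on the axes, ozone shown as marker color.
p_three <- plot_ly(
  data = aq_ozone,
  x = ~Temp,
  y = ~Wind,
  color = ~Ozone,
  type = "scatter",
  mode = "markers"
)

# Add readable titles and axis labels.
p_three <- layout(
  p_three,
  title = "Temperature, Wind, and Ozone",
  xaxis = list(title = "Temperature (degrees F)"),
  yaxis = list(title = "Wind (mph)")
)

p_three
```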

3D plot

A 3D plot is useful when interpreting data with many variables. We have already discussed temperature and wind and how they affect pollution, but solar radiation is another variable that affects ozone. Ozone is plotted on its own axis, as that is the clearest way to see its trends and compare it against the other variables. Solar radiation is easiest to read as a color gradient: brighter colors signify more solar radiation and, in the graph, typically correspond to higher ozone levels, along with higher temperatures and lower wind speeds.
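A 3D plot along these lines could be sketched with plotly's `scatter3d` trace type. The axis assignment (temperature, wind, ozone) and the object names are assumptions for illustration; solar radiation is mapped to color as described above.

```r
library(plotly)

# All four variables are needed, so drop rows with any missing value.
aq_complete <- na.omit(airquality)

# Ozone on the vertical axis, solar radiation as the color gradient.
p_3d <- plot_ly(
  data = aq_complete,
  x = ~Temp,
  y = ~Wind,
  z = ~Ozone,
  color = ~Solar.R,
  type = "scatter3d",
  mode = "markers"
)

# Axis titles for 3D plots live under the "scene" layout attribute.
p_3d <- layout(
  p_3d,
  scene = list(
    xaxis = list(title = "Temperature (degrees F)"),
    yaxis = list(title = "Wind (mph)"),
    zaxis = list(title = "Ozone (ppb)")
  )
)

p_3d
```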

Statistical Analysis

The statistical analysis here consists of a traditional five-number summary (minimum, Q1, median, Q3, and maximum) together with the mean. The minimum and maximum include any outliers. We will evaluate this summary of the ozone level by month. We can see from the summary that the 8th month, August, had the widest range between the minimum and maximum recorded ozone levels (9 to 168), and the 6th month, June, had the narrowest (12 to 71). Comparing the medians, May has the lowest median (18) and also the lowest mean (24.1), while the 6th and 9th months share the next-lowest median of 23 but have different means (29.4 and 31.4). Similarly, July, the 7th month, has the highest median recorded ozone level (60), but not the highest mean, which belongs to August (60 versus 59.1 for July). If we were to introduce temperature as a variable and summarize it by month in the same way, we would expect to see a trend similar to the one in our graphs of the relationship between temperature and ozone level.

## # A tibble: 5 × 7
##   Month   Min    Q1 Median  Mean    Q3   Max
##   <int> <int> <dbl>  <dbl> <dbl> <dbl> <int>
## 1     5     1  11       18  24.1  32.5   115
## 2     6    12  20       23  29.4  37      71
## 3     7     7  36.2     60  59.1  79.8   135
## 4     8     9  25.5     45  60    84.5   168
## 5     9     7  16       23  31.4  36      96
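A summary like the table above could be computed with dplyr as sketched here. We assume that rows with any missing value were dropped with `na.omit()` before summarizing, which appears consistent with the means shown; the name `ozone_summary` is ours.

```r
library(dplyr)

# Assumption: drop every row that has a missing value in any column.
ozone_summary <- na.omit(airquality) %>%
  group_by(Month) %>%
  summarise(
    Min    = min(Ozone),
    Q1     = quantile(Ozone, 0.25),
    Median = median(Ozone),
    Mean   = mean(Ozone),
    Q3     = quantile(Ozone, 0.75),
    Max    = max(Ozone)
  )

print(ozone_summary)

# Range (Max - Min) per month, the "variation" compared in the text.
print(mutate(ozone_summary, Range = Max - Min))
```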

Conclusion

Based on the statistical summary and our graphs, we can conclude that ozone levels are correlated with temperature, wind, and solar radiation. Higher values of the two variables associated with ozone production, temperature and solar radiation, correspond to higher ozone levels, while higher wind speeds correspond to lower ozone concentrations in parts per billion. The conditions for worse air quality are therefore high temperatures, high solar radiation, and low winds. Based on the boxplot, these conditions are most typical of two months, July and August.
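As a numeric check on the direction of these relationships, the correlation of ozone with each of the three variables can be computed with base R's `cor()`. This is a minimal sketch that again assumes rows with missing values are dropped first; `ozone_cor` is an illustrative name.

```r
# Use complete rows only so every correlation is based on the same days.
aq_complete <- na.omit(airquality)

# Pearson correlation of Ozone with each explanatory variable.
ozone_cor <- cor(
  aq_complete[, c("Temp", "Wind", "Solar.R")],
  aq_complete$Ozone
)

# A positive value means the variable rises with ozone; negative means it falls.
print(round(ozone_cor, 2))
```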