Noise Data
noisedata <- c(55.3, 55.3, 55.3, 55.9, 55.9, 55.9, 55.9, 56.1, 56.1, 56.1, 56.1, 56.1, 56.1, 56.8, 56.8, 57.0, 57.0, 57.0, 57.8, 57.8, 57.8, 57.9, 57.9, 57.9, 58.8, 58.8, 58.8, 59.8, 59.8, 59.8, 62.2, 62.2, 63.8, 63.8, 63.8, 63.9, 63.9, 63.9, 64.7, 64.7, 64.7, 65.1, 65.1, 65.1, 65.3, 65.3, 65.3, 65.3, 67.4, 67.4, 67.4, 67.4, 68.7, 68.7, 68.7, 68.7, 69.0, 70.4, 70.4, 71.2, 71.2, 71.2, 73.0, 73.0, 73.1, 73.1, 74.6, 74.6, 74.6, 74.6, 79.3, 79.3, 79.3, 79.3, 83.0, 83.0, 83.0)
Basic Info
median(noisedata)
## [1] 64.7
The median noise level recorded for a worker in the data set is 64.7 dBA: half of the recorded levels fall at or below this value.
mean(noisedata)
## [1] 64.88701
The mean noise level recorded in the data set is 64.887 dBA.
The mean and median are nearly identical, which suggests that the center of the data is roughly balanced. The mean is slightly larger than the median, which hints at a mild right skew.
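As a rough numeric check on symmetry, the sketch below compares the mean and median and computes a moment-based skewness coefficient. The data vector is rebuilt with rep() so the block runs on its own; the names vals, counts, gap, and skew are illustrative and not part of the original analysis.

```r
# Rebuild the noise data compactly: each value repeated by its frequency.
vals   <- c(55.3, 55.9, 56.1, 56.8, 57.0, 57.8, 57.9, 58.8, 59.8, 62.2,
            63.8, 63.9, 64.7, 65.1, 65.3, 67.4, 68.7, 69.0, 70.4, 71.2,
            73.0, 73.1, 74.6, 79.3, 83.0)
counts <- c(3, 4, 6, 2, 3, 3, 3, 3, 3, 2,
            3, 3, 3, 3, 4, 4, 4, 1, 2, 3,
            2, 2, 4, 4, 3)
noisedata <- rep(vals, times = counts)

# Gap between mean and median: a positive gap hints at a right skew.
gap <- mean(noisedata) - median(noisedata)

# Moment-based skewness: 0 for a perfectly symmetric sample,
# positive when the right tail is longer.
z    <- noisedata - mean(noisedata)
skew <- mean(z^3) / mean(z^2)^1.5

print(c(gap = gap, skew = skew))
```

A small gap with a positive skewness value would be consistent with a distribution that is balanced near its center but has a longer right tail.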
sd(noisedata)
## [1] 7.802671
The standard deviation is 7.803 dBA: a typical recorded noise level lies roughly 7.8 dBA from the mean.
IQR(noisedata)
## [1] 12.6
The interquartile range is 12.6 dBA: the middle 50% of the recorded noise levels spans 12.6 dBA.
From the interquartile range and standard deviation, we can see that the central part of the data is fairly spread out: the middle half of the observations covers almost half of the full 27.7 dBA range (55.3 to 83.0 dBA).
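To make the two spread measures concrete, the following sketch recomputes them from their definitions and compares the results with sd() and IQR(). It assumes R's default quantile method (type 7); manual_sd and manual_iqr are illustrative names.

```r
# Rebuild the noise data: each value repeated by its frequency.
vals   <- c(55.3, 55.9, 56.1, 56.8, 57.0, 57.8, 57.9, 58.8, 59.8, 62.2,
            63.8, 63.9, 64.7, 65.1, 65.3, 67.4, 68.7, 69.0, 70.4, 71.2,
            73.0, 73.1, 74.6, 79.3, 83.0)
counts <- c(3, 4, 6, 2, 3, 3, 3, 3, 3, 2,
            3, 3, 3, 3, 4, 4, 4, 1, 2, 3,
            2, 2, 4, 4, 3)
noisedata <- rep(vals, times = counts)
n <- length(noisedata)

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1.
manual_sd <- sqrt(sum((noisedata - mean(noisedata))^2) / (n - 1))

# Interquartile range: third quartile minus first quartile.
q <- quantile(noisedata, probs = c(0.25, 0.75))
manual_iqr <- unname(q[2] - q[1])

print(c(sd = manual_sd, iqr = manual_iqr))
```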
Histogram
hist(noisedata,
main = "Noise Level For Workers Near Construction",
xlab = "Noise Level in dBA",
ylab = "Frequency")
From the histogram, the data is bimodal, with two distinct peaks, and is skewed to the right. Specifically, most workers in the office building experienced noise levels closer to the minimum in the data set than to the maximum, and the most common noise levels fall within one of the two peaks.
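The shape described above can be checked against the bin counts themselves. The sketch below tabulates the data without drawing a plot; the 5 dBA bin edges are an assumption made for illustration, and the bins chosen by the default hist() call may differ.

```r
# Rebuild the noise data: each value repeated by its frequency.
vals   <- c(55.3, 55.9, 56.1, 56.8, 57.0, 57.8, 57.9, 58.8, 59.8, 62.2,
            63.8, 63.9, 64.7, 65.1, 65.3, 67.4, 68.7, 69.0, 70.4, 71.2,
            73.0, 73.1, 74.6, 79.3, 83.0)
counts <- c(3, 4, 6, 2, 3, 3, 3, 3, 3, 2,
            3, 3, 3, 3, 4, 4, 4, 1, 2, 3,
            2, 2, 4, 4, 3)
noisedata <- rep(vals, times = counts)

# Count observations per bin without plotting (assumed 5 dBA bins).
h <- hist(noisedata, breaks = seq(55, 85, by = 5), plot = FALSE)

# One row per bin: lower edge, upper edge, number of observations.
bins <- data.frame(from = head(h$breaks, -1),
                   to   = tail(h$breaks, -1),
                   n    = h$counts)
print(bins)
# Counts for these bins: 30, 11, 16, 13, 4, 3
```

With these bins, the largest count sits in the 55 to 60 dBA bin and a second, smaller peak in the 65 to 70 dBA bin, with counts tapering off toward 85 dBA.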
Box Plot
boxplot(noisedata,
main = "Noise Level For Workers Near Construction",
ylab = "Noise Level in dBA",
horizontal = TRUE)
From the box plot, there do not appear to be any outliers in the data set, and the data is skewed to the right. This means that no worker experienced a noise level far above or below those of the other workers in the sample. Additionally, most workers experienced noise levels near the lower end of the range in the data set.
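The absence of outliers can be confirmed with the usual 1.5 x IQR rule that box plots use to flag points. This is a minimal sketch that uses quantile() for the quartiles (boxplot() itself uses hinges, which coincide with the quartiles for this data set); lower_fence, upper_fence, and outliers are illustrative names.

```r
# Rebuild the noise data: each value repeated by its frequency.
vals   <- c(55.3, 55.9, 56.1, 56.8, 57.0, 57.8, 57.9, 58.8, 59.8, 62.2,
            63.8, 63.9, 64.7, 65.1, 65.3, 67.4, 68.7, 69.0, 70.4, 71.2,
            73.0, 73.1, 74.6, 79.3, 83.0)
counts <- c(3, 4, 6, 2, 3, 3, 3, 3, 3, 2,
            3, 3, 3, 3, 4, 4, 4, 1, 2, 3,
            2, 2, 4, 4, 3)
noisedata <- rep(vals, times = counts)

# Quartiles and fences: points beyond 1.5 * IQR from the box are flagged.
q1 <- unname(quantile(noisedata, 0.25))
q3 <- unname(quantile(noisedata, 0.75))
lower_fence <- q1 - 1.5 * (q3 - q1)
upper_fence <- q3 + 1.5 * (q3 - q1)

# Observations outside the fences (none expected for this data).
outliers <- noisedata[noisedata < lower_fence | noisedata > upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

Because the smallest (55.3) and largest (83.0) observations both fall inside the fences, no points are flagged.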
Part A - Ozone Analysis
mean(airquality$Ozone, na.rm = TRUE)
## [1] 42.12931
sd(airquality$Ozone, na.rm = TRUE)
## [1] 32.98788
boxplot(airquality$Ozone,
main = "Ozone Measurements in New York 1973",
ylab = "Ozone Measurements in ppb",
horizontal = TRUE)
Above are the mean and standard deviation of the ozone measurements in the airquality data set, which records daily air quality in New York from May to September 1973. The average ozone measurement over this period is 42.129 ppb, and the standard deviation is 32.988 ppb, meaning that a typical measurement lies roughly 33 ppb from the mean. A standard deviation this large relative to the mean indicates wide variability: in context, ozone levels varied considerably from day to day. The box plot above shows that the data is right skewed and includes two outliers. In context, there were two days on which the ozone measurement reached extremely high values of approximately 130 and 165 ppb, as read from the plot.
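The outlier values and the relative size of the spread can also be read from the data rather than from the plot. The sketch below uses boxplot.stats(), which applies the same rule boxplot() uses to flag points; cv (the coefficient of variation) is an extra summary added here for illustration and was not part of the original analysis.

```r
# Ozone readings with missing values removed (airquality ships with R).
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# Points that boxplot() would draw as outliers.
stats <- boxplot.stats(ozone)
print(stats$out)

# Coefficient of variation: standard deviation as a fraction of the mean.
cv <- sd(ozone) / mean(ozone)
print(round(cv, 3))
```

A coefficient of variation well above 0.5 supports the description of the measurements as highly variable.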
Part B - Normal Distribution Comparison
ozone_norm <- rnorm(200, mean = mean(airquality$Ozone, na.rm = TRUE), sd = sd(airquality$Ozone, na.rm = TRUE))
temp_norm <- rnorm(200, mean = mean(airquality$Temp, na.rm = TRUE), sd = sd(airquality$Temp, na.rm = TRUE))
boxplot(airquality$Ozone, ozone_norm, airquality$Temp, temp_norm,
main = "Multiple boxplots for comparison",
at = c(1,2,4,5),
names = c("ozone", "normal", "temp", "normal"),
las = 2,
col = c("orange","red"),
border = "brown",
horizontal = TRUE,
notch = TRUE)
Part C - Analysis of Normality
Above is a box plot comparison between the ozone measurements from Part A, displayed in orange, and 200 points randomly sampled from a normal distribution, displayed in red. The normal distribution's mean and standard deviation are set to the sample mean and sample standard deviation of the ozone measurements. The same comparison is made for the temperatures recorded in the same data set. Although close, the summary statistics of each random sample do not exactly match those of the recorded data. This is because the sample mean and sample standard deviation of a random sample are themselves random variables, so they vary from draw to draw. Along with that, neither the ozone nor the temperature data appears to be drawn from a normal distribution; the ozone data in particular shows a clear right skew.
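Two points in this paragraph can be illustrated numerically. The sketch below first shows that repeated normal samples with the same parameters give different sample means, then applies the Shapiro-Wilk test as one possible formal check of normality. The test is a substitute for the visual comparison and was not used in the original analysis; the seed value and the name draw_means are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Ozone readings with missing values removed.
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]
mu    <- mean(ozone)
sigma <- sd(ozone)

# Five independent samples of 200 normal draws with identical parameters:
# each sample mean lands near mu, but none is expected to equal it exactly.
draw_means <- replicate(5, mean(rnorm(200, mean = mu, sd = sigma)))
print(round(c(target = mu, draw_means), 2))

# Shapiro-Wilk test: a small p-value is evidence against normality.
ozone_test <- shapiro.test(ozone)
temp_test  <- shapiro.test(airquality$Temp)
print(c(ozone_p = ozone_test$p.value, temp_p = temp_test$p.value))
```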
Part D - Monthly Temperature Comparison
boxplot(Temp ~ Month, data = airquality,
main = "Temperature by Month in New York in 1973",
xlab = "Month (May to September)",
ylab = "Temperature in Fahrenheit")
Part E - Monthly Temperature Analysis
The graphic above shows the temperatures recorded in New York from May to September 1973. Each box plot shows one month of data, ordered left to right starting with May. As we can see, September had the largest variation in temperature. There is also a large change between May and June: the median temperature increased by more than 10 degrees Fahrenheit. June and July each show outliers, days on which the temperature was well below the typical value for that month. One day in June was about 67 degrees, when the median that month was around 78 degrees. Two days in July were approximately 75 and 76 degrees, while the median temperature was around 85 degrees. Finally, the September temperatures appear roughly symmetric, which is consistent with a normal distribution.
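The monthly comparisons above can be backed by per-month summaries. This is a minimal sketch using split() and sapply(); by_month, monthly, and flagged are illustrative names, and the flagged values come from boxplot.stats(), which applies the same outlier rule as boxplot().

```r
# Split temperatures by month (5 = May, ..., 9 = September).
by_month <- split(airquality$Temp, airquality$Month)

# One row per month: median, mean, and standard deviation.
monthly <- t(sapply(by_month, function(x) {
  c(median = median(x), mean = mean(x), sd = sd(x))
}))
print(round(monthly, 2))

# Temperatures that the box plot would flag as outliers in each month.
flagged <- lapply(by_month, function(x) boxplot.stats(x)$out)
print(flagged)
```

The sd column gives a direct comparison of month-to-month variability, and the difference between the May and June rows quantifies the jump seen in the plot.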