# mfv() below comes from the modeest package
library(modeest)
surfaceTemp <- read.csv("average-monthly-surface-temperature.csv")
# Drop every row that contains an NA value
surfaceTemp <- na.omit(surfaceTemp)
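A minimal sketch of what `na.omit()` does. It uses a small made-up data frame; the columns `year` and `temp` are hypothetical and are not the CSV's real columns. Any row with an `NA` in any column is dropped.

```r
# Hypothetical toy data frame; the real CSV has different columns
toy <- data.frame(
  year = c(2000, 2001, 2002, 2003),
  temp = c(21.5, NA, 23.1, 22.4)
)

# na.omit() drops every row that has an NA in any column (here, the 2001 row)
clean <- na.omit(toy)
print(nrow(clean))
# [1] 3
```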
sort() - sorts the data in ascending (default) or descending order
summary() - returns the minimum, first quartile, median, mean, third quartile, and maximum
mean() - finds the arithmetic average (adds all of the values and divides by the number of values)
median() - finds the middle value of the sorted data
mfv() - finds the most frequent value(s), also called the mode; requires the modeest package
sd() - finds the sample standard deviation
max() - returns the largest value in the variable
min() - returns the smallest value in the variable
range() - returns the lowest and highest values as a two-element vector
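To make the functions listed above concrete, here is a short sketch on a hypothetical six-value vector standing in for one temperature column. `mfv()` is left out because it needs the modeest package.

```r
# Hypothetical toy vector standing in for one temperature column
temps <- c(22.1, 18.4, 25.0, 22.1, -3.2, 24.7)

sort(temps)                      # ascending
sort(temps, decreasing = TRUE)   # descending
summary(temps)                   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
mean(temps)                      # sum(temps) / length(temps)
median(temps)                    # n is even, so the mean of the two middle values
sd(temps)                        # sample standard deviation (n - 1 denominator)
max(temps)
min(temps)
range(temps)                     # c(min, max), a length-2 vector
```

The single low value (-3.2) pulls the mean (about 18.2) well below the median (22.1), the same pattern as in the real data.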
# Number of rows (observations)
lengthData <- nrow(surfaceTemp)
#variable one
averageOne <- surfaceTemp$Average.surface.temperature
#variable two
averageTwo <- surfaceTemp$Average.surface.temperature.1
# summary
avgOneSummary <- summary(averageOne)
avgTwoSummary <- summary(averageTwo)
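The "1st Qu." and "3rd Qu." columns of `summary()` are quartiles computed by `quantile()` (default type 7), and together they bound the middle half of the data. A small sketch on a hypothetical toy vector:

```r
# Hypothetical toy vector
temps <- c(22.1, 18.4, 25.0, 22.1, -3.2, 24.7)

# The 1st and 3rd quartiles bound the middle 50% of the observations
quantile(temps, c(0.25, 0.75))
IQR(temps)   # 3rd quartile minus 1st quartile
```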
# averages
avgOne <- mean(averageOne)
avgTwo <- mean(averageTwo)
# Combine both means into one vector; a second positional argument to mean() is read as `trim`
avgAvg <- mean(c(avgOne, avgTwo))
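`mean()` averages a single vector. Any second positional argument is matched to `trim`, not treated as more data. This sketch, using two hypothetical numbers, shows the difference. With `trim >= 0.5`, `mean()` returns the median of the first argument alone.

```r
a <- 10
b <- 20

# Wrong: b becomes the `trim` argument, so only a is averaged
mean(a, b)
# [1] 10

# Right: put both values in one vector first
mean(c(a, b))
# [1] 15
```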
#medians
medianTemp <- median(averageOne)
medianTempTwo <- median(averageTwo)
#Mode
modeTemp <- mfv(averageOne)
modeTempTwo <- mfv(averageTwo)
rangeMode <- (max(modeTemp) - min(modeTemp))
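`mfv()` returns every value tied for the highest count. This is why the printed `modeTemp` below contains 35 values: with continuous temperatures, many values occur equally often. The following is a base-R sketch of that idea. `modeValues()` is a hypothetical helper, not modeest's implementation, and converting table names back to numbers can round values with more than 15 significant digits.

```r
# Hypothetical helper: every value tied for the highest count (base R only)
modeValues <- function(x) {
  counts <- table(x)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # all values with the top count
}

x <- c(3, 1, 3, 2, 1, 5)
modeValues(x)
# [1] 1 3

# With no repeats, every value ties, so every value is returned
modeValues(c(4.2, 1.7, 9.9))
```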
# SD
sdTemp <- sd(averageOne)
sdTempTwo <- sd(averageTwo)
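`sd()` computes the sample standard deviation, which divides by n - 1 rather than n. This sketch, on a hypothetical toy vector, reproduces it by hand.

```r
# Hypothetical toy vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample SD: sum the squared deviations from the mean, divide by n - 1, take the square root
n <- length(x)
sdByHand <- sqrt(sum((x - mean(x))^2) / (n - 1))

sdByHand
sd(x)
```

Dividing by n instead (here `sqrt(32 / 8)`, which is 2) would give the population standard deviation, which is slightly smaller.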
# range of temps
maxAvg <- max(averageOne)
minAvg <- min(averageOne)
rangeAvg <- maxAvg - minAvg
# range of years
years <- surfaceTemp$year
oldestYear <- min(years)
latestYear <- max(years)
rangeYears <- latestYear - oldestYear
print(avgAvg)
## [1] 18.07207
print(avgOneSummary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -36.24 12.30 22.06 18.07 25.32 39.89
print(rangeYears)
## [1] 84
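An alternative way to get a span is `diff(range(x))`. It always computes the largest value minus the smallest, so the result cannot come out negative. The years below are hypothetical.

```r
# Hypothetical years standing in for surfaceTemp$year
toyYears <- c(1990, 2015, 1985, 2001)

range(toyYears)        # c(oldest, latest)
diff(range(toyYears))  # latest - oldest, always non-negative
```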
print(medianTemp)
## [1] 22.05579
print(modeTemp)
## [1] 22.61807 22.98580 23.74204 24.02933 24.20334 24.34774 24.48684 24.67424
## [9] 24.68173 24.81125 24.94287 24.95078 25.12210 25.20633 25.24775 25.29535
## [17] 25.33140 25.49304 25.57120 25.63913 25.87320 26.14557 26.26956 26.31015
## [25] 26.43381 26.45145 26.46195 26.59354 26.70111 26.76041 26.91000 26.91238
## [33] 27.15457 27.38791 27.67205
hist(averageOne,
     breaks = 100,
     xlab = "Temperatures",
     main = "Histogram of the average surface temperatures (100 breaks)",
     ylab = "Density",
     freq = FALSE)
lines(density(averageOne),
      col = "maroon",
      lwd = 4)
hist(averageOne,
     breaks = 500,
     xlab = "Temperatures",
     main = "Histogram of the average surface temperatures (500 breaks)",
     ylab = "Density",
     freq = FALSE)
lines(density(averageOne),
      col = "pink",
      lwd = 4)
Looking at the summary statistics, the data range from a minimum of about -36 degrees to a maximum of about 40 degrees. The median is about 22 degrees, while the mean is about 18. From that information alone, I expected most of the data to sit around 15-25 degrees; the quartiles confirm this, placing the middle half of the observations between about 12 and 25 degrees. The most frequent values (the modes) all fall between about 22 and 27 degrees, within 5 degrees of each other. The histogram supports this picture: the data are skewed to the left, with a long tail of cold temperatures pulling the mean below the median. This helps me see that most surface temperatures cluster around 22 degrees. Although there was once a low of -36 degrees, values that cold are very rare.
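Left skew can also be checked numerically. The sketch below uses a hypothetical helper, `sampleSkewness()`, which computes the moment-based sample skewness (g1). A negative value means a longer left tail, and it typically coincides with the mean sitting below the median. The toy vector is made up to mimic a long cold tail.

```r
# Hypothetical helper: moment-based sample skewness (g1); negative means left-skewed
sampleSkewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Toy data with one very cold value, similar in shape to the surface temperatures
temps <- c(22.1, 18.4, 25.0, 22.1, -3.2, 24.7)
sampleSkewness(temps)          # negative
mean(temps) < median(temps)    # TRUE
```

On the real data, `sampleSkewness(averageOne)` would give the same kind of check. Packages such as e1071 or moments also provide skewness functions, with slightly different bias corrections.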