load("/cloud/project/OAW.Rdata")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(gmodels)
str(OAW)
## tibble [28,708 × 7] (S3: tbl_df/tbl/data.frame)
## $ PRCP : num [1:28708] 0 0 0.2992 1.0787 0.0591 ...
## $ TMAX : num [1:28708] 66 63 57.9 55 57 ...
## $ TMIN : num [1:28708] 50 46.9 44.1 45 46 ...
## $ yr : num [1:28708] 1941 1941 1941 1941 1941 ...
## $ mo : Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ warmth : Factor w/ 2 levels "Cold","Warm": 2 2 1 1 1 1 1 2 2 2 ...
## $ wetness: Factor w/ 3 levels "Dry","Damp","Really Wet": 1 1 3 3 2 1 1 1 1 1 ...
summary(OAW)
## PRCP TMAX TMIN yr
## Min. :0.0000 Min. : 17.96 Min. :-7.96 Min. :1941
## 1st Qu.:0.0000 1st Qu.: 50.00 1st Qu.:33.08 1st Qu.:1961
## Median :0.0000 Median : 59.00 Median :39.92 Median :1980
## Mean :0.1363 Mean : 60.54 Mean :39.81 Mean :1980
## 3rd Qu.:0.1417 3rd Qu.: 71.06 3rd Qu.:46.94 3rd Qu.:2000
## Max. :4.8189 Max. :104.00 Max. :69.08 Max. :2019
##
## mo warmth wetness
## 8 : 2449 Cold:15233 Dry :15903
## 10 : 2449 Warm:13475 Damp : 5517
## 7 : 2448 Really Wet: 7288
## 12 : 2447
## 5 : 2432
## 3 : 2418
## (Other):14065
There are 28,708 observations of 7 variables. Month (mo), warmth and wetness are categorical; PRCP, TMAX, TMIN and year (yr) are quantitative.
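One way to double-check which variables are categorical and which are quantitative is to ask R for each column's type. The sketch below uses a small made-up data frame (the name toy and its values are hypothetical) that only mimics some of the column types shown by str(OAW); with the real data, the same sapply() calls would be applied to OAW.

```r
# Toy stand-in for OAW: made-up values, column types as reported by str(OAW)
toy <- data.frame(
  PRCP   = c(0, 0.3, 1.1),
  TMAX   = c(66, 63, 58),
  yr     = c(1941, 1941, 1941),
  mo     = factor(c("5", "5", "5")),
  warmth = factor(c("Warm", "Warm", "Cold"))
)

# TRUE for factor columns (categorical variables)
is_categorical <- sapply(toy, is.factor)

# TRUE for numeric columns (quantitative variables)
is_quantitative <- sapply(toy, is.numeric)

names(toy)[is_categorical]
names(toy)[is_quantitative]
```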
summary(OAW$TMAX)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.96 50.00 59.00 60.54 71.06 104.00
sd(OAW$TMAX)
## [1] 13.68366
IQR(OAW$TMAX)
## [1] 21.06
hist(OAW$TMAX)
boxplot(OAW$TMAX, horizontal = TRUE)
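The distribution of TMAX is skewed slightly to the right (the mean, 60.54, is a little above the median, 59.00), and the boxplot shows two outliers. The histogram has a pronounced peak at or around 45.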
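The outliers can be cross-checked with the usual 1.5 × IQR rule. The sketch below is only an approximation built from the rounded quartiles printed by summary(OAW$TMAX); boxplot() works from hinges, which can differ slightly from these quartiles, and the variable names here are illustrative.

```r
# Quartiles of TMAX copied from the summary() output
q1 <- 50.00
q3 <- 71.06
iqr <- q3 - q1  # agrees with IQR(OAW$TMAX) = 21.06

# Fences of the 1.5 * IQR rule
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)

# Compare the reported minimum and maximum with the fences
17.96 < lower_fence   # is the minimum a low-end outlier?
104.00 > upper_fence  # is the maximum a high-end outlier?
```

Both comparisons come out TRUE, so the smallest and largest TMAX values fall just outside the fences.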
str(OAW$PRCP)
## num [1:28708] 0 0 0.2992 1.0787 0.0591 ...
summary(OAW$PRCP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1363 0.1417 4.8189
sd(OAW$PRCP)
## [1] 0.3007292
IQR(OAW$PRCP)
## [1] 0.1417323
hist(OAW$PRCP)
boxplot(OAW$PRCP, horizontal = TRUE)
The structure shows the variable's type and its first few values; the summary, together with sd() and IQR(), describes its location and variation. The answer to question two is the most appropriate, because the graph shows the precipitation increasing, in line with the measurements we have been provided in the PRCP variable.
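Because the median of PRCP is 0, at least half of the days recorded no precipitation, so it can help to look at the zero days and the wet days separately. The sketch below uses a short made-up vector (prcp and its values are hypothetical, not taken from OAW) to show the idea; on the real data the same lines would use OAW$PRCP.

```r
# Made-up daily precipitation values, for illustration only
prcp <- c(0, 0, 0, 0, 0.06, 0.30, 1.08, 0, 0.14, 0)

# Share of days with no precipitation
share_dry <- mean(prcp == 0)
share_dry

# Summary restricted to days with some precipitation
wet_days <- prcp[prcp > 0]
summary(wet_days)

# A mean above the median is a quick sign of right skew
mean(prcp) > median(prcp)
```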
table(OAW$wetness)
##
## Dry Damp Really Wet
## 15903 5517 7288
The table gives the frequency of each wetness level. Dry days are the most frequent (15,903) and damp days are the least frequent (5,517).
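Frequencies are easier to compare as proportions. The sketch below re-enters the three counts from the table by hand (the names counts and props are illustrative); on the real data, prop.table(table(OAW$wetness)) should give the same proportions directly.

```r
# Counts copied from table(OAW$wetness)
counts <- c(Dry = 15903, Damp = 5517, "Really Wet" = 7288)

# Convert counts to proportions of all days
props <- counts / sum(counts)
round(props, 3)

# Most and least frequent wetness levels
names(which.max(counts))
names(which.min(counts))
```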
MO is categorical and TMAX is numerical.
str(OAW$mo)
## Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
summary(OAW$mo)
## 1 2 3 4 5 6 7 8 9 10 11 12
## 2417 2203 2418 2335 2432 2370 2448 2449 2370 2449 2370 2447
str(OAW$warmth)
## Factor w/ 2 levels "Cold","Warm": 2 2 1 1 1 1 1 2 2 2 ...
summary(OAW$warmth)
## Cold Warm
## 15233 13475
plot(OAW$warmth, OAW$mo)
tapply(OAW$warmth, OAW$mo, summary)
## $`1`
## Cold Warm
## 2411 6
##
## $`2`
## Cold Warm
## 2133 70
##
## $`3`
## Cold Warm
## 2097 321
##
## $`4`
## Cold Warm
## 1518 817
##
## $`5`
## Cold Warm
## 724 1708
##
## $`6`
## Cold Warm
## 170 2200
##
## $`7`
## Cold Warm
## 12 2436
##
## $`8`
## Cold Warm
## 16 2433
##
## $`9`
## Cold Warm
## 126 2244
##
## $`10`
## Cold Warm
## 1302 1147
##
## $`11`
## Cold Warm
## 2283 87
##
## $`12`
## Cold Warm
## 2441 6
The warmest month is July: it has 2,436 Warm days out of 2,448, slightly ahead of August with 2,433 out of 2,449. The coldest month is December, with 2,441 Cold days out of 2,447, marginally ahead of January with 2,411 out of 2,417. These warmth counts do not measure the variance of TMAX directly; in terms of warmth, December and January are the least mixed months, while October (1,302 Cold, 1,147 Warm) is the most evenly split.
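The monthly comparison is easier to read as the share of Warm days per month. The sketch below re-enters the Cold and Warm counts from the tapply() output by hand (the vector names are illustrative) and picks out the month with the largest share, the smallest share, and the share closest to one half.

```r
# Cold and Warm day counts for months 1 to 12, copied from the tapply() output
cold <- c(2411, 2133, 2097, 1518, 724, 170, 12, 16, 126, 1302, 2283, 2441)
warm <- c(6, 70, 321, 817, 1708, 2200, 2436, 2433, 2244, 1147, 87, 6)
names(cold) <- names(warm) <- 1:12

# Share of Warm days in each month
warm_share <- warm / (cold + warm)
round(warm_share, 4)

names(which.max(warm_share))             # month with the largest Warm share
names(which.min(warm_share))             # month with the smallest Warm share
names(which.min(abs(warm_share - 0.5)))  # month closest to an even split
```

To compare the spread of TMAX itself by month, a call such as tapply(OAW$TMAX, OAW$mo, sd) on the real data would be the more direct check.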
mo is a categorical variable, and wetness is also categorical: it is a factor with three levels.
str(OAW$mo)
## Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
summary(OAW$mo)
## 1 2 3 4 5 6 7 8 9 10 11 12
## 2417 2203 2418 2335 2432 2370 2448 2449 2370 2449 2370 2447
str(OAW$wetness)
## Factor w/ 3 levels "Dry","Damp","Really Wet": 1 1 3 3 2 1 1 1 1 1 ...
summary(OAW$wetness)
## Dry Damp Really Wet
## 15903 5517 7288
plot(OAW$mo, OAW$wetness)
November has the largest fraction of “Really Wet” days, and July has the largest fraction of “Dry” days. The first signs of spring appear in March, when the fraction of dry days begins to increase. As wetness gradually decreases and dryness increases on the graph, we can suppose that a less wet April brings more sunshine, allowing more flowers to bloom in May.
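The fractions read off the plot can also be computed as row proportions of a two-way table. The sketch below uses a tiny made-up data frame (toy and its values are hypothetical and are not evidence about OAW); on the real data the table would be table(OAW$mo, OAW$wetness).

```r
# Made-up month and wetness values, for illustration only
toy <- data.frame(
  mo = factor(c("3", "3", "3", "7", "7", "7", "11", "11", "11", "11"),
              levels = c("3", "7", "11")),
  wetness = factor(c("Dry", "Damp", "Really Wet",
                     "Dry", "Dry", "Dry",
                     "Really Wet", "Really Wet", "Damp", "Dry"),
                   levels = c("Dry", "Damp", "Really Wet"))
)

# Two-way table of counts: months in rows, wetness levels in columns
tab <- table(toy$mo, toy$wetness)

# Row proportions: each month's row sums to 1
row_props <- prop.table(tab, margin = 1)
round(row_props, 2)

# Month with the largest fraction of each level of interest
rownames(row_props)[which.max(row_props[, "Really Wet"])]
rownames(row_props)[which.max(row_props[, "Dry"])]
```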
Both variables are numerical according to the summary.
str(OAW$yr)
## num [1:28708] 1941 1941 1941 1941 1941 ...
summary(OAW$yr)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1941 1961 1980 1980 2000 2019
str(OAW$TMAX)
## num [1:28708] 66 63 57.9 55 57 ...
summary(OAW$TMAX)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.96 50.00 59.00 60.54 71.06 104.00
plot(OAW$yr, OAW$TMAX)
Column names in R are case-sensitive: the column is TMAX, and OAW$tmax returns NULL with an "Unknown or uninitialised column" warning, which leaves nothing to plot. With the correct name, the scatterplot shows every daily TMAX observation against the year in which it was recorded.
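With the column spelled TMAX (R column names are case-sensitive), a scatterplot of several thousand daily values against year can still be hard to read, so averaging by year is one option. The sketch below uses a small made-up data frame (toy and its values are hypothetical); on the real data the call would be tapply(OAW$TMAX, OAW$yr, mean).

```r
# Made-up yearly observations, for illustration only
toy <- data.frame(
  yr   = c(1941, 1941, 1942, 1942, 1943, 1943),
  TMAX = c(66, 63, 58, 60, 61, 65)
)

# Mean TMAX for each year; the result is named by year
annual_mean <- tapply(toy$TMAX, toy$yr, mean)
annual_mean
```

The annual means could then be drawn with plot(as.numeric(names(annual_mean)), annual_mean, type = "b").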
According to our data summary they’re both numerical variables.
str(OAW$TMAX)
## num [1:28708] 66 63 57.9 55 57 ...
summary(OAW$TMAX)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.96 50.00 59.00 60.54 71.06 104.00
str(OAW$TMIN)
## num [1:28708] 50 46.9 44.1 45 46 ...
summary(OAW$TMIN)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -7.96 33.08 39.92 39.81 46.94 69.08
plot(OAW$TMAX, OAW$TMIN)
Yes, the data are present, so the plot is readable. The summaries give the minimum, quartiles, median, mean and maximum of the observations for both TMIN and TMAX.
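The relationship shown in the scatterplot can be summarised with a correlation coefficient and the daily temperature range. The sketch below uses only the first five TMAX and TMIN values printed by str(), so it illustrates the calls rather than the result for the full data; on the real data the call would be cor(OAW$TMAX, OAW$TMIN).

```r
# First five values of each variable, copied from the str() output
tmax <- c(66, 63, 57.9, 55, 57)
tmin <- c(50, 46.9, 44.1, 45, 46)

# Pearson correlation between daily maximum and minimum temperature
r <- cor(tmax, tmin)
r

# Daily temperature range (maximum minus minimum)
daily_range <- tmax - tmin
daily_range
```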