load("/cloud/project/OAW.Rdata")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gmodels)

problem 1

str(OAW)
## tibble [28,708 × 7] (S3: tbl_df/tbl/data.frame)
##  $ PRCP   : num [1:28708] 0 0 0.2992 1.0787 0.0591 ...
##  $ TMAX   : num [1:28708] 66 63 57.9 55 57 ...
##  $ TMIN   : num [1:28708] 50 46.9 44.1 45 46 ...
##  $ yr     : num [1:28708] 1941 1941 1941 1941 1941 ...
##  $ mo     : Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ warmth : Factor w/ 2 levels "Cold","Warm": 2 2 1 1 1 1 1 2 2 2 ...
##  $ wetness: Factor w/ 3 levels "Dry","Damp","Really Wet": 1 1 3 3 2 1 1 1 1 1 ...
summary(OAW)
##       PRCP             TMAX             TMIN             yr      
##  Min.   :0.0000   Min.   : 17.96   Min.   :-7.96   Min.   :1941  
##  1st Qu.:0.0000   1st Qu.: 50.00   1st Qu.:33.08   1st Qu.:1961  
##  Median :0.0000   Median : 59.00   Median :39.92   Median :1980  
##  Mean   :0.1363   Mean   : 60.54   Mean   :39.81   Mean   :1980  
##  3rd Qu.:0.1417   3rd Qu.: 71.06   3rd Qu.:46.94   3rd Qu.:2000  
##  Max.   :4.8189   Max.   :104.00   Max.   :69.08   Max.   :2019  
##                                                                  
##        mo         warmth            wetness     
##  8      : 2449   Cold:15233   Dry       :15903  
##  10     : 2449   Warm:13475   Damp      : 5517  
##  7      : 2448                Really Wet: 7288  
##  12     : 2447                                  
##  5      : 2432                                  
##  3      : 2418                                  
##  (Other):14065

There are 28,708 observations of 7 variables. Month (mo), warmth and wetness are categorical; PRCP, TMAX, TMIN and year (yr) are quantitative.
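The split between categorical and quantitative variables can also be checked in code by asking which columns are factors. The sketch below uses a small hypothetical data frame, `toy`, that copies only the column names and types of OAW; its values are made up for illustration.

```r
# Hypothetical stand-in for OAW: same column names and types, made-up values.
toy <- data.frame(
  PRCP    = c(0, 0.30, 1.08),
  TMAX    = c(66, 57.9, 55),
  TMIN    = c(50, 44.1, 45),
  yr      = c(1941, 1941, 1941),
  mo      = factor(c(5, 5, 5), levels = 1:12),
  warmth  = factor(c("Warm", "Cold", "Cold"), levels = c("Cold", "Warm")),
  wetness = factor(c("Dry", "Really Wet", "Really Wet"),
                   levels = c("Dry", "Damp", "Really Wet"))
)

# TRUE for factor (categorical) columns, FALSE for numeric ones.
is_categorical <- sapply(toy, is.factor)

categorical  <- names(toy)[is_categorical]
quantitative <- names(toy)[!is_categorical]
print(categorical)
print(quantitative)
```

Running the same `sapply(OAW, is.factor)` call on the real data should give the same split, since `str(OAW)` reports mo, warmth and wetness as factors.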

problem 2

summary(OAW$TMAX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.96   50.00   59.00   60.54   71.06  104.00
sd(OAW$TMAX)
## [1] 13.68366
IQR(OAW$TMAX)
## [1] 21.06
hist(OAW$TMAX)

boxplot(OAW$TMAX, horizontal = TRUE)

The distribution of TMAX is skewed slightly to the right (the mean, 60.54, is a little above the median, 59.00). The boxplot shows two outliers, and the histogram has a prominent peak at or around 45.
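The outliers can be cross-checked against the usual 1.5 × IQR rule that boxplots are based on. This is a sketch that only reuses the quartiles, IQR, minimum and maximum printed above; the variable names are chosen here for illustration.

```r
# Quartiles, IQR and extremes of TMAX, copied from the summary() and IQR() output above.
q1  <- 50.00
q3  <- 71.06
iqr <- 21.06
tmax_min <- 17.96
tmax_max <- 104.00

# Points beyond these fences are drawn as outliers.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
print(c(lower = lower_fence, upper = upper_fence))

below_lower <- tmax_min < lower_fence   # is the minimum beyond the lower fence?
above_upper <- tmax_max > upper_fence   # is the maximum beyond the upper fence?
print(c(below_lower = below_lower, above_upper = above_upper))
```

Both the minimum and the maximum fall just outside the fences, so at least one point at each end of the distribution is flagged.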

problem 3

str(OAW$PRCP)
##  num [1:28708] 0 0 0.2992 1.0787 0.0591 ...
summary(OAW$PRCP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.1363  0.1417  4.8189
sd(OAW$PRCP)
## [1] 0.3007292
IQR(OAW$PRCP)
## [1] 0.1417323
hist(OAW$PRCP)

boxplot(OAW$PRCP, horizontal = TRUE)

str() only shows the type and the first few values of PRCP, so it does not describe the variable's location. summary() gives the location (mean and median) and the quartiles, while sd() and IQR() give the variation. PRCP is strongly right-skewed: the median is 0 and the mean is 0.1363, with a maximum of 4.8189. The approach from problem 2 is therefore the most appropriate, because the histogram and boxplot show the same pattern as the PRCP measurements we've been provided with.
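For a variable like PRCP, where most days record no precipitation, it helps to see how each summary reacts to a single extreme day. The vectors below are hypothetical (made-up values, not taken from OAW), and `stats()` is a helper defined only for this sketch.

```r
# Hypothetical daily precipitation: mostly dry days plus a few wet ones.
prcp <- c(0, 0, 0, 0, 0, 0, 0.05, 0.10, 0.30, 1.10)

# The same data with one extreme storm day added.
prcp_storm <- c(prcp, 4.80)

# Helper for this sketch: location and spread measures side by side.
stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
}

comparison <- rbind(without_storm = stats(prcp), with_storm = stats(prcp_storm))
print(round(comparison, 3))
```

In this toy case the median stays at 0 while the mean and standard deviation rise sharply, which is why the median is the safer measure of location for heavily skewed data.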

problem 4

table(OAW$wetness)
## 
##        Dry       Damp Really Wet 
##      15903       5517       7288

wetness is a categorical variable, so it is summarized with a frequency table. Dry days have the highest frequency (15,903) and damp days the lowest (5,517).
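Frequencies are easier to compare as proportions. A minimal sketch that re-enters the three counts from the table above (the vector name `wetness_counts` is chosen here):

```r
# Counts copied from the table(OAW$wetness) output above.
wetness_counts <- c(Dry = 15903, Damp = 5517, `Really Wet` = 7288)

# Convert counts to proportions of all days.
wetness_prop <- prop.table(wetness_counts)
print(round(wetness_prop, 3))
```

On the real data, `prop.table(table(OAW$wetness))` should give the same proportions without retyping the counts.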

problem 5

mo is categorical and TMAX is numerical. The code below works with warmth, the categorical Cold/Warm factor, rather than with TMAX itself.

str(OAW$mo)
##  Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
summary(OAW$mo)
##    1    2    3    4    5    6    7    8    9   10   11   12 
## 2417 2203 2418 2335 2432 2370 2448 2449 2370 2449 2370 2447
str(OAW$warmth)
##  Factor w/ 2 levels "Cold","Warm": 2 2 1 1 1 1 1 2 2 2 ...
summary(OAW$warmth)
##  Cold  Warm 
## 15233 13475
plot(OAW$warmth, OAW$mo)

tapply(OAW$warmth, OAW$mo, summary)
## $`1`
## Cold Warm 
## 2411    6 
## 
## $`2`
## Cold Warm 
## 2133   70 
## 
## $`3`
## Cold Warm 
## 2097  321 
## 
## $`4`
## Cold Warm 
## 1518  817 
## 
## $`5`
## Cold Warm 
##  724 1708 
## 
## $`6`
## Cold Warm 
##  170 2200 
## 
## $`7`
## Cold Warm 
##   12 2436 
## 
## $`8`
## Cold Warm 
##   16 2433 
## 
## $`9`
## Cold Warm 
##  126 2244 
## 
## $`10`
## Cold Warm 
## 1302 1147 
## 
## $`11`
## Cold Warm 
## 2283   87 
## 
## $`12`
## Cold Warm 
## 2441    6

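The warmest month is July: its share of warm days (2436 of 2448) is slightly larger than August's (2433 of 2449). The coldest month is December, whose share of cold days (2441 of 2447) is marginally higher than January's (2411 of 2417). These Cold/Warm counts do not measure the variance of TMAX directly. They show that December and January are the most uniformly cold months, while October has the most even split between cold and warm days (1302 to 1147).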
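The comparison between months is easier to read as fractions than as raw counts. The sketch below re-enters the Cold/Warm counts from the tapply() output above; the vector names are chosen here for illustration.

```r
# Cold/Warm counts per month (January to December), copied from the tapply() output above.
cold <- c(2411, 2133, 2097, 1518,  724,  170,   12,   16,  126, 1302, 2283, 2441)
warm <- c(   6,   70,  321,  817, 1708, 2200, 2436, 2433, 2244, 1147,   87,    6)

# Share of warm days in each month.
warm_frac <- warm / (cold + warm)
names(warm_frac) <- 1:12
print(round(warm_frac, 4))

warmest    <- which.max(warm_frac)             # largest share of warm days
coldest    <- which.min(warm_frac)             # largest share of cold days
most_mixed <- which.min(abs(warm_frac - 0.5))  # closest to an even Cold/Warm split
print(c(warmest = unname(warmest), coldest = unname(coldest),
        most_mixed = unname(most_mixed)))
```

To compare the spread of TMAX itself across months, something like `tapply(OAW$TMAX, OAW$mo, sd)` on the real data would be the more direct check.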

problem 6

mo is a categorical variable, and wetness is categorical as well (a factor with three levels).

str(OAW$mo)
##  Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
summary(OAW$mo)
##    1    2    3    4    5    6    7    8    9   10   11   12 
## 2417 2203 2418 2335 2432 2370 2448 2449 2370 2449 2370 2447
str(OAW$wetness)
##  Factor w/ 3 levels "Dry","Damp","Really Wet": 1 1 3 3 2 1 1 1 1 1 ...
summary(OAW$wetness)
##        Dry       Damp Really Wet 
##      15903       5517       7288
plot(OAW$mo, OAW$wetness)

November has the largest fraction of “really wet” days, and July has the largest fraction of “dry” days. The first signs of spring appear in March, when the fraction of dry days begins to increase. Because the graph shows wetness gradually decreasing and dryness increasing, we can assume that as April becomes less wet, more sunshine appears, allowing more flowers to bloom in May.
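The fractions read off the plot can be computed as a table of row proportions. The data frame `toy` below is a hypothetical stand-in with made-up values and only three months; on the real data the same two calls would use `OAW$mo` and `OAW$wetness`.

```r
# Hypothetical stand-in for OAW with made-up values; only the column names match.
toy <- data.frame(
  mo      = factor(c(3, 3, 3, 3, 7, 7, 7, 7, 11, 11, 11, 11), levels = c(3, 7, 11)),
  wetness = factor(c("Dry", "Damp", "Really Wet", "Dry",
                     "Dry", "Dry", "Dry", "Damp",
                     "Really Wet", "Really Wet", "Damp", "Dry"),
                   levels = c("Dry", "Damp", "Really Wet"))
)

# Rows are months, columns are wetness levels; margin = 1 makes each row sum to 1.
wet_by_month <- prop.table(table(toy$mo, toy$wetness), margin = 1)
print(wet_by_month)

# Month with the largest fraction of each wetness level (first month wins ties).
top_month <- rownames(wet_by_month)[apply(wet_by_month, 2, which.max)]
names(top_month) <- colnames(wet_by_month)
print(top_month)
```

Using row proportions rather than counts matters because the months do not all have the same number of observations.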

problem 7

Both variables are numerical according to the summary.

str(OAW$yr)
##  num [1:28708] 1941 1941 1941 1941 1941 ...
summary(OAW$yr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1941    1961    1980    1980    2000    2019
str(OAW$TMAX)
##  num [1:28708] 66 63 57.9 55 57 ...
summary(OAW$TMAX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.96   50.00   59.00   60.54   71.06  104.00
plot(OAW$yr, OAW$TMAX)

The column has to be referenced as TMAX: R is case-sensitive, so OAW$tmax returns NULL with an "Unknown or uninitialised column" warning and nothing is plotted. In the scatterplot, each point is one daily TMAX observation placed at the year (yr) in which it was recorded.
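The NULL result comes from R treating column names as case-sensitive. The sketch below shows this on a hypothetical three-row data frame, `toy`, and one way to look a column up without worrying about case. Note that a plain data frame returns NULL silently, whereas a tibble such as OAW also prints a warning.

```r
# Hypothetical data frame; only the column names mirror OAW.
toy <- data.frame(yr = c(1941, 1980, 2019), TMAX = c(66, 59, 71))

print(names(toy))               # column names exactly as R stores them

lower_is_null <- is.null(toy$tmax)   # TRUE: there is no column called "tmax"
upper_is_null <- is.null(toy$TMAX)   # FALSE: the upper-case name matches
print(c(tmax = lower_is_null, TMAX = upper_is_null))

# A case-insensitive lookup recovers the intended column name.
wanted     <- "tmax"
match_name <- names(toy)[tolower(names(toy)) == wanted]
print(match_name)
print(toy[[match_name]])
```

Checking `names(OAW)` before plotting is usually the quickest way to catch this kind of mismatch.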

problem 8

According to the data summary, TMAX and TMIN are both numerical variables.

str(OAW$TMAX)
##  num [1:28708] 66 63 57.9 55 57 ...
summary(OAW$TMAX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.96   50.00   59.00   60.54   71.06  104.00
str(OAW$TMIN)
##  num [1:28708] 50 46.9 44.1 45 46 ...
summary(OAW$TMIN)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -7.96   33.08   39.92   39.81   46.94   69.08
plot(OAW$TMAX, OAW$TMIN)

Yes, the data is available for both variables, so the output is readable. The numerical summaries give the minimum, quartiles, median, mean and maximum of the observations for both TMIN and TMAX.
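For two numerical variables, the scatterplot can be backed up with a correlation coefficient and a fitted line. The paired values below are hypothetical (made up for illustration, not taken from OAW); on the real data the same calls would use `OAW$TMAX` and `OAW$TMIN`.

```r
# Hypothetical paired daily temperatures (made-up values).
tmax <- c(45, 52, 59, 66, 71, 80, 88)
tmin <- c(30, 35, 40, 46, 47, 55, 58)

# Pearson correlation: values near +1 mean TMIN rises with TMAX roughly along a line.
r <- cor(tmax, tmin)
print(r)

# Least-squares line for TMIN as a function of TMAX.
fit <- lm(tmin ~ tmax)
print(coef(fit))
```

The slope estimates how much TMIN changes for each one-unit increase in TMAX, which is something the separate summaries of the two variables cannot show.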