Harold Nelson
2024-02-26
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Review the section
Find the row(s) with the maximum air temperature using the tidyverse.
## Ozone Solar.R Wind Temp Month Day
## 1 76 203 9.7 97 8 28
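The output above is shown without the code that produced it. One way to get it, sketched here with `filter` (the object name `hottest` is an assumption, not from the original):

```r
library(dplyr)

# Keep the row(s) whose Temp equals the overall maximum;
# if several days tied for the maximum, all of them would be returned.
hottest = airquality %>%
  filter(Temp == max(Temp))

hottest
```

Comparing against `max(Temp)` rather than sorting and taking the first row is what allows more than one row to come back.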
Use mutate to create the variable hot in the airquality dataframe. A day is hot if Temp > 90; otherwise it is not_hot. Count the days of each type, and compute each type's fraction of the total days.
airquality_counts = airquality %>%
  mutate(hot = Temp > 90,
         not_hot = !hot) %>%
  summarize(hot_count = sum(hot),
            fract_hot = mean(hot),
            not_hot_count = sum(not_hot),
            fract_not_hot = mean(not_hot))
airquality_counts
## hot_count fract_hot not_hot_count fract_not_hot
## 1 14 0.09150327 139 0.9084967
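The same counts can also be laid out with one row per type of day. This is a possible alternative sketch, not the approach used above; the names `airquality_long`, `day_type`, and `fraction` are chosen here for illustration.

```r
library(dplyr)

# Label each day, tally the labels, then turn the tallies into fractions.
airquality_long = airquality %>%
  mutate(day_type = if_else(Temp > 90, "hot", "not_hot")) %>%
  count(day_type) %>%
  mutate(fraction = n / sum(n))

airquality_long
```

This layout extends naturally if more than two categories are needed later, since each category gets its own row rather than its own pair of columns.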
Put the philosphers.csv file in your current working directory. Then use the import control in RStudio to import it.
Copy the import command from the console and save it in a chunk.
Use group_by and summarize to get the mean value of Ozone for each month from the airquality dataframe. Also count the missing Ozone values for each month.
ozone_mo = airquality %>%
  group_by(Month) %>%
  summarize(mean_ozone = mean(Ozone, na.rm = TRUE),
            na_days = sum(is.na(Ozone)))
ozone_mo
## # A tibble: 5 × 3
## Month mean_ozone na_days
## <int> <dbl> <int>
## 1 5 23.6 5
## 2 6 29.4 21
## 3 7 59.1 5
## 4 8 60.0 5
## 5 9 31.4 1
Do the work using the tidyverse.
library(datasauRus)
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(cor = cor(x, y),
            mean_x = mean(x),
            sd_x = sd(x),
            mean_y = mean(y),
            sd_y = sd(y))
## # A tibble: 13 × 6
## dataset cor mean_x sd_x mean_y sd_y
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 away -0.0641 54.3 16.8 47.8 26.9
## 2 bullseye -0.0686 54.3 16.8 47.8 26.9
## 3 circle -0.0683 54.3 16.8 47.8 26.9
## 4 dino -0.0645 54.3 16.8 47.8 26.9
## 5 dots -0.0603 54.3 16.8 47.8 26.9
## 6 h_lines -0.0617 54.3 16.8 47.8 26.9
## 7 high_lines -0.0685 54.3 16.8 47.8 26.9
## 8 slant_down -0.0690 54.3 16.8 47.8 26.9
## 9 slant_up -0.0686 54.3 16.8 47.8 26.9
## 10 star -0.0630 54.3 16.8 47.8 26.9
## 11 v_lines -0.0694 54.3 16.8 47.8 26.9
## 12 wide_lines -0.0666 54.3 16.8 47.8 26.9
## 13 x_shape -0.0656 54.3 16.8 47.8 26.9
Use faceting to look at scatterplots of x and y by dataset.
datasaurus_dozen %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(size = .5) +
  facet_wrap(~dataset) +
  ggtitle("Woe be he...")
## A Weather Report
Load the OAW2309 dataframe.
Create the dataframe Mar01 using filter.
Verify your work using head().
## # A tibble: 6 × 7
## DATE PRCP TMAX TMIN mo dy yr
## <date> <dbl> <dbl> <dbl> <fct> <int> <dbl>
## 1 1942-03-01 0.16 58 35 3 1 1942
## 2 1943-03-01 0 57 28 3 1 1943
## 3 1944-03-01 0 55 25 3 1 1944
## 4 1945-03-01 0.01 53 32 3 1 1945
## 5 1946-03-01 0.19 53 41 3 1 1946
## 6 1947-03-01 0 57 24 3 1 1947
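The filter step is not shown in the output above. A minimal sketch follows, assuming OAW2309 is already loaded and has the mo and dy columns seen in the head() output; the helper name `get_mar01` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: keep only the March 1 rows of a weather dataframe
# that has mo (month) and dy (day of month) columns.
get_mar01 = function(weather) {
  weather %>%
    filter(mo == 3, dy == 1)
}

# Runs only if OAW2309 has already been loaded into the session.
if (exists("OAW2309")) {
  Mar01 = get_mar01(OAW2309)
  print(head(Mar01))
}
```

Although mo is stored as a factor, the comparison `mo == 3` still works, because R compares a factor's labels to the number as text.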
What is the probability of rain?
Describe the possibilities for TMAX using the values produced by summary().
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36.00 46.00 50.00 50.46 55.00 62.00
A typical maximum temperature on this date is about 50 (the median is 50 and the mean is 50.46). However, maximum temperatures on this date have ranged from 36 to 62. The middle 50% of temperatures have fallen between 46 and 55.
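The two calculations behind this section can be sketched as follows. Because the full Mar01 dataframe is not available here, the sketch uses a stand-in, `Mar01_head` (a name chosen for illustration), built from only the six rows printed by head() above, so its results differ from the full-data summary.

```r
# Stand-in for Mar01: only the six rows printed by head(),
# not the full set of March 1 observations.
Mar01_head = data.frame(
  yr   = 1942:1947,
  PRCP = c(0.16, 0, 0, 0.01, 0.19, 0),
  TMAX = c(58, 57, 55, 53, 53, 57)
)

# PRCP > 0 is TRUE on days with rain; the mean of a logical vector
# is the fraction of TRUE values, used here as the estimated probability.
rain_prob = mean(Mar01_head$PRCP > 0)
rain_prob

# Minimum, quartiles, median, mean, and maximum of the daily highs.
summary(Mar01_head$TMAX)
```

Applying the same two calls to the full Mar01 dataframe would presumably give the rain probability and the TMAX summary for every year on record.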
Create a function weather_forecast that accepts any month and day to produce these results.
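weather_forecast = function(month, day){
  # Keep every year's observation for the requested calendar day
  days = filter(OAW2309,
                mo == month,
                dy == day)
  # Fraction of those days with any recorded precipitation
  rain_prob = mean(days$PRCP > 0)
  # paste() already inserts a space, so the string has no trailing space
  print(paste("The probability of rain is", round(rain_prob, 2)))
  print('')
  print("Summary of the Maximum Temperature")
  # Last expression is the return value; it prints when called at the console
  summary(days$TMAX)
}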
weather_forecast(7,4)
## [1] "The probability of rain is 0.23"
## [1] ""
## [1] "Summary of the Maximum Temperature"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 61.00 69.00 74.00 74.88 80.00 93.00