In this milestone, you’ll explore covariation between variables in nasa_power
with ggplot2. Then, you’ll fit a linear model and extract tidy model output.
For the first part of this milestone, your task is to explore covariation between variables in the nasa_power
dataset, generating at least two plots. You can visualize multiple relationships, or explore using different geoms to visualize the same one.
You can do your work in as many code chunks as you wish, but we’ve provided one to get you started:
library(tidyverse)
library(weathermetrics)
library(kableExtra)
library(readxl)
library(skimr)
library(hydroTSM)
library(lubridate)
library(broom)
nasa_power<-data.table::fread(here::here("data","power.csv")) %>%
left_join(.,read_excel(here::here("data","precip.xlsx"))) %>%
mutate(tdew=humidity.to.dewpoint(rh = rh2m,
t = t2m,
temperature.metric = "celsius")) %>%
select(-site_id) %>%
data.frame()
skim(nasa_power)
Name | nasa_power |
Number of rows | 3653 |
Number of columns | 16 |
_______________________ | |
Column type frequency: | |
numeric | 16 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
lon | 0 | 1 | -67.50 | 0.00 | -67.50 | -67.50 | -67.50 | -67.50 | -67.50 | ▁▁▇▁▁ |
lat | 0 | 1 | -26.50 | 0.00 | -26.50 | -26.50 | -26.50 | -26.50 | -26.50 | ▁▁▇▁▁ |
year | 0 | 1 | 2016.50 | 2.87 | 2012.00 | 2014.00 | 2016.00 | 2019.00 | 2021.00 | ▇▇▇▇▇ |
mm | 0 | 1 | 6.52 | 3.45 | 1.00 | 4.00 | 7.00 | 10.00 | 12.00 | ▇▅▅▅▇ |
dd | 0 | 1 | 15.73 | 8.80 | 1.00 | 8.00 | 16.00 | 23.00 | 31.00 | ▇▇▇▇▆ |
doy | 0 | 1 | 183.15 | 105.47 | 1.00 | 92.00 | 183.00 | 274.00 | 366.00 | ▇▇▇▇▇ |
ps | 0 | 1 | 63.81 | 0.20 | 62.89 | 63.70 | 63.83 | 63.95 | 64.42 | ▁▁▆▇▁ |
t2m | 0 | 1 | 5.20 | 4.57 | -9.48 | 1.43 | 5.25 | 9.15 | 15.58 | ▁▃▇▇▃ |
t2m_max | 0 | 1 | 14.69 | 5.04 | -4.90 | 10.87 | 15.43 | 18.76 | 25.11 | ▁▂▆▇▅ |
t2m_min | 0 | 1 | -2.39 | 4.01 | -16.00 | -4.99 | -3.35 | 0.44 | 8.10 | ▁▁▇▃▂ |
ws2m | 0 | 1 | 3.15 | 1.68 | 0.70 | 1.91 | 2.67 | 3.97 | 12.42 | ▇▅▁▁▁ |
wd2m | 0 | 1 | 264.96 | 55.47 | 67.62 | 231.94 | 286.00 | 308.19 | 341.50 | ▁▂▂▅▇ |
prectotcorr | 0 | 1 | 1.59 | 5.04 | 0.00 | 0.00 | 0.00 | 1.03 | 176.81 | ▇▁▁▁▁ |
qv2m | 0 | 1 | 3.31 | 2.01 | 0.79 | 1.71 | 2.56 | 4.46 | 9.58 | ▇▃▂▂▁ |
rh2m | 0 | 1 | 40.24 | 14.35 | 10.69 | 28.50 | 38.19 | 50.69 | 81.19 | ▃▇▆▃▁ |
tdew | 0 | 1 | -8.03 | 7.31 | -23.15 | -14.19 | -9.39 | -1.71 | 7.32 | ▂▇▆▅▃ |
nasa_power %>%
dplyr::select(-lon,-lat) %>%
cor() %>%
corrplot::corrplot(type = "lower",
method = "number")
nasa_power %>%
pivot_longer(data = .,cols = c("ws2m","wd2m")) %>%
mutate(mon=factor(month.abb[mm],levels = month.abb,
ordered = T)) %>%
ggplot(data=.,aes(x=mon,y=value))+
geom_point()+
facet_wrap(~name,scales="free_y")
nasa_power %>%
pivot_longer(data = .,cols = c("t2m","t2m_min","t2m_max",
"ps","prectotcorr")) %>%
# mutate(mon=factor(month.abb[mm],levels = month.abb,
# ordered = T)) %>%
ggplot(data=.,aes(x=qv2m,y=value))+
geom_point(alpha=0.2,shape=21)+
geom_smooth(method="lm",formula = y~poly(x,1))+
facet_wrap(~name,scales="free_y")+
theme_bw()
Let’s turn our attention to two variables in nasa_power
: temperature (t2m
) and surface pressure (ps
). Run the chunk below to see a plot of these variables.
knitr::include_graphics("images/solution_08_plot.png")
First, recreate this plot showing temperature and surface pressure.
nasa_power %>%
ggplot(data=.,aes(x=t2m,y=ps))+
geom_point()+
geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'
Next, fit a linear model using these variables, and extract tidy model output with broom::tidy()
. The result should match the output below:
readr::read_csv("data/solution_08_table.csv")
## Rows: 2 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): term
## dbl (4): estimate, std.error, statistic, p.value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 63.7 0.00457 13950. 0
## 2 t2m 0.0199 0.000660 30.2 1.08e-178
ps_mdl<-lm(data=nasa_power,formula = ps~t2m)
tidy(ps_mdl)
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 63.7 0.00457 13950. 0
## 2 t2m 0.0199 0.000660 30.2 1.08e-178
For your extension, consider joining the nasa_power
and precip
data once again to explore additional relationships. Alternatively, try fitting a different model based on your exploration of other variables in the nasa_power
dataset.
Write your extension in the following chunk:
nasa_power_mon<-nasa_power %>%
select(-lon,-lat) %>%
# mutate(date=ymd(paste(year,mm,dd))) %>%
group_by(year,mm) %>%
summarise_all(mean) %>%
mutate(dm=lubridate::days_in_month(ymd(paste(year,mm,1)))) %>%
mutate(prectotcorr=prectotcorr*dm)
nasa_power_ann<-nasa_power_mon %>%
group_by(year) %>%
summarise_all(mean) %>%
mutate(prectotcorr=12*prectotcorr) %>%
data.frame()
nasa_power_ann_adj<-nasa_power_ann %>%
select(-mm,-dd,-doy,-dm)
nasa_power_ann_adj %>%
pivot_longer(cols=names(nasa_power_ann_adj)[-1]) %>%
ggplot(data=.,aes(x=year,y=value,col=name))+
geom_smooth(method="lm")+
geom_point()+
facet_wrap(~name,scales="free_y")+
theme_bw()+
theme(legend.position = "none")+
labs(y="parameter")
## `geom_smooth()` using formula 'y ~ x'
nasa_power_ann_adj %>%
map(function(x)lm(data = .,formula = x~year) %>%
tidy() %>% slice(2)) %>%
do.call(rbind,.) %>%
mutate(term=names(nasa_power_ann_adj)) %>%
arrange(p.value) %>%
filter(term!="year") %>%
kbl() %>%
kable_styling(bootstrap_options = "striped",
full_width = F)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
ps | 0.0063773 | 0.0019236 | 3.3153165 | 0.0106138 |
tdew | 0.1306386 | 0.0579182 | 2.2555708 | 0.0540950 |
qv2m | 0.0344733 | 0.0154530 | 2.2308468 | 0.0562227 |
t2m_min | 0.0768864 | 0.0464948 | 1.6536552 | 0.1367964 |
t2m | 0.0624738 | 0.0435452 | 1.4346880 | 0.1892895 |
rh2m | 0.2410060 | 0.1798591 | 1.3399707 | 0.2170661 |
t2m_max | 0.0599993 | 0.0492043 | 1.2193907 | 0.2574250 |
wd2m | -0.6221411 | 0.5229836 | -1.1895995 | 0.2683092 |
ws2m | -0.0140380 | 0.0138477 | -1.0137425 | 0.3403807 |
prectotcorr | 8.3283030 | 9.6121569 | 0.8664344 | 0.4114831 |