Models and hypothesis tests

In this milestone, you’ll explore covariation between variables in nasa_power with ggplot2. Then, you’ll fit a linear model and extract tidy model output.

Recreation

Part one: Explore covariation

For the first part of this milestone, your task is to explore covariation between variables in the nasa_power dataset, generating at least two plots. You can visualize multiple relationships, or explore using different geoms to visualize the same one.

You can do your work in as many code chunks as you wish, but we’ve provided one to get you started:

library(tidyverse)
library(weathermetrics)
library(kableExtra)  
library(readxl)
library(skimr)
library(hydroTSM)
library(lubridate)
library(broom)

nasa_power<-data.table::fread(here::here("data","power.csv")) %>% 
  left_join(.,read_excel(here::here("data","precip.xlsx"))) %>% 
  mutate(tdew=humidity.to.dewpoint(rh = rh2m,
                                  t = t2m,
                                  temperature.metric = "celsius")) %>% 
  select(-site_id) %>% 
  data.frame()

skim(nasa_power)
Data summary
Name nasa_power
Number of rows 3653
Number of columns 16
_______________________
Column type frequency:
numeric 16
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
lon 0 1 -67.50 0.00 -67.50 -67.50 -67.50 -67.50 -67.50 ▁▁▇▁▁
lat 0 1 -26.50 0.00 -26.50 -26.50 -26.50 -26.50 -26.50 ▁▁▇▁▁
year 0 1 2016.50 2.87 2012.00 2014.00 2016.00 2019.00 2021.00 ▇▇▇▇▇
mm 0 1 6.52 3.45 1.00 4.00 7.00 10.00 12.00 ▇▅▅▅▇
dd 0 1 15.73 8.80 1.00 8.00 16.00 23.00 31.00 ▇▇▇▇▆
doy 0 1 183.15 105.47 1.00 92.00 183.00 274.00 366.00 ▇▇▇▇▇
ps 0 1 63.81 0.20 62.89 63.70 63.83 63.95 64.42 ▁▁▆▇▁
t2m 0 1 5.20 4.57 -9.48 1.43 5.25 9.15 15.58 ▁▃▇▇▃
t2m_max 0 1 14.69 5.04 -4.90 10.87 15.43 18.76 25.11 ▁▂▆▇▅
t2m_min 0 1 -2.39 4.01 -16.00 -4.99 -3.35 0.44 8.10 ▁▁▇▃▂
ws2m 0 1 3.15 1.68 0.70 1.91 2.67 3.97 12.42 ▇▅▁▁▁
wd2m 0 1 264.96 55.47 67.62 231.94 286.00 308.19 341.50 ▁▂▂▅▇
prectotcorr 0 1 1.59 5.04 0.00 0.00 0.00 1.03 176.81 ▇▁▁▁▁
qv2m 0 1 3.31 2.01 0.79 1.71 2.56 4.46 9.58 ▇▃▂▂▁
rh2m 0 1 40.24 14.35 10.69 28.50 38.19 50.69 81.19 ▃▇▆▃▁
tdew 0 1 -8.03 7.31 -23.15 -14.19 -9.39 -1.71 7.32 ▂▇▆▅▃
nasa_power %>% 
  dplyr::select(-lon,-lat) %>%
  cor() %>% 
  corrplot::corrplot(type = "lower",
                     method = "number")

nasa_power %>% 
  pivot_longer(data = .,cols = c("ws2m","wd2m")) %>% 
  mutate(mon=factor(month.abb[mm],levels =  month.abb,
                    ordered = T)) %>%
  ggplot(data=.,aes(x=mon,y=value))+
  geom_point()+
  facet_wrap(~name,scales="free_y")

nasa_power %>% 
  pivot_longer(data = .,cols = c("t2m","t2m_min","t2m_max",
                                 "ps","prectotcorr")) %>% 
  # mutate(mon=factor(month.abb[mm],levels =  month.abb,
  #                   ordered = T)) %>%
  ggplot(data=.,aes(x=qv2m,y=value))+
  geom_point(alpha=0.2,shape=21)+
  geom_smooth(method="lm",formula = y~poly(x,1))+
  facet_wrap(~name,scales="free_y")+
  theme_bw()

Part two: Fit a linear model

Let’s turn our attention to two variables in nasa_power: temperature (t2m) and surface pressure (ps). Run the chunk below to see a plot of these variables.

knitr::include_graphics("images/solution_08_plot.png")

First, recreate this plot showing temperature and surface pressure.

nasa_power %>% 
  ggplot(data=.,aes(x=t2m,y=ps))+
  geom_point()+
  geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'

Next, fit a linear model using these variables, and extract tidy model output with broom::tidy(). The result should match the output below:

readr::read_csv("data/solution_08_table.csv")
## Rows: 2 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): term
## dbl (4): estimate, std.error, statistic, p.value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  63.7     0.00457    13950.  0        
## 2 t2m           0.0199  0.000660      30.2 1.08e-178
ps_mdl<-lm(data=nasa_power,formula = ps~t2m)


tidy(ps_mdl)
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  63.7     0.00457    13950.  0        
## 2 t2m           0.0199  0.000660      30.2 1.08e-178

Extension

For your extension, consider joining the nasa_power and precip data once again to explore additional relationships. Alternatively, try fitting a different model based on your exploration of other variables in the nasa_power dataset.

Write your extension in the following chunk:

nasa_power_mon<-nasa_power %>% 
  select(-lon,-lat) %>% 
  # mutate(date=ymd(paste(year,mm,dd))) %>% 
  group_by(year,mm) %>% 
  summarise_all(mean) %>% 
  mutate(dm=lubridate::days_in_month(ymd(paste(year,mm,1)))) %>% 
  mutate(prectotcorr=prectotcorr*dm)

nasa_power_ann<-nasa_power_mon %>% 
  group_by(year) %>% 
  summarise_all(mean) %>% 
  mutate(prectotcorr=12*prectotcorr) %>% 
  data.frame() 

nasa_power_ann_adj<-nasa_power_ann %>% 
  select(-mm,-dd,-doy,-dm) 

nasa_power_ann_adj %>% 
  pivot_longer(cols=names(nasa_power_ann_adj)[-1]) %>% 
  ggplot(data=.,aes(x=year,y=value,col=name))+
  geom_smooth(method="lm")+
  geom_point()+
  facet_wrap(~name,scales="free_y")+
  theme_bw()+
  theme(legend.position = "none")+
  labs(y="parameter")
## `geom_smooth()` using formula 'y ~ x'

  nasa_power_ann_adj %>% 
  map(function(x)lm(data = .,formula = x~year) %>% 
        tidy() %>% slice(2)) %>% 
  do.call(rbind,.) %>% 
  mutate(term=names(nasa_power_ann_adj)) %>% 
  arrange(p.value) %>% 
  filter(term!="year") %>% 
  kbl() %>% 
  kable_styling(bootstrap_options = "striped",
                full_width = F)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
term estimate std.error statistic p.value
ps 0.0063773 0.0019236 3.3153165 0.0106138
tdew 0.1306386 0.0579182 2.2555708 0.0540950
qv2m 0.0344733 0.0154530 2.2308468 0.0562227
t2m_min 0.0768864 0.0464948 1.6536552 0.1367964
t2m 0.0624738 0.0435452 1.4346880 0.1892895
rh2m 0.2410060 0.1798591 1.3399707 0.2170661
t2m_max 0.0599993 0.0492043 1.2193907 0.2574250
wd2m -0.6221411 0.5229836 -1.1895995 0.2683092
ws2m -0.0140380 0.0138477 -1.0137425 0.3403807
prectotcorr 8.3283030 9.6121569 0.8664344 0.4114831