Alright we have got our final concentration driven results!!

Whenever possible we used all of the avaiable temperature data and the avaiable future heat flux data. However when future heat flux data was unavaiable we used the cmip range for the heat flux as a an aproximation for heat flux. This notebook walks through our results.

# Load required packages. 
library(tidyr)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(ggrepel)
library(hectorcal)
library(GGally)
# Set up the directories. 
PROJECT_DIR <- "/Users/dorh012/Documents/2019/hectorcal"
INPUT_DIR   <- file.path(PROJECT_DIR, 'analysis', 'best_fit')
OUTPUT_DIR  <- file.path(PROJECT_DIR, 'analysis', 'best_fit', 'final_conc')
# Import the parameter data frame.
list.files(OUTPUT_DIR, pattern = '.rds', full.names = TRUE) %>%
    lapply(function(input){
        object <- readRDS(input)
        if(!is.null(object) & object$optim_rslt$convergence == 0 ){
            cbind(model = gsub('conc_|.rds', '', basename(input)),
                  data.frame(matrix(object$optim_rslt$par, nrow = 1, dimnames = list(NULL, names(object$optim_rslt$par))),
                             stringsAsFactors = FALSE),
                  min = object$optim_rslt$value) %>%
                mutate(method = if_else('comparison_plot_range' %in% names(object), 'heatflux range', 'model heatflux'))
        }
    }) %>%
    bind_rows() ->
    best_fit_params
# Save a list of all the rds files 
rds_list <- list.files(OUTPUT_DIR, pattern = '.rds', full.names = TRUE)
my_param_v_param_plot <- function(data, x, y){
    
    xmid <- 0.5*(min(data[[x]]) + max(data[[x]]))
    xrng <- 1.5*(max(data[[x]]) - min(data[[x]]))
    xlo <- xmid - 0.5*xrng
    xhi <- xmid + 0.5*xrng
    
    ggplot(data = data) +
        geom_label(aes(x = data[[x]], y = data[[y]], label = model, color = method)) +
       # geom_point(aes(x = data[[x]], y = data[[y]],  color = method)) + 
       # geom_text_repel(aes(x = data[[x]], y = data[[y]], label = model)) + 
        ggplot2::xlim(c(xlo,xhi)) + 
        labs(x = x, 
             y = y) + 
        theme_bw(base_size = 14)
}

Overview of the Runs

How many best fits convereged?

nrow(best_fit_params)
[1] 30

How many best fits converged for the two different methods?

best_fit_params %>%
    group_by(method) %>%
    summarise('model count' = n()) %>%  
     kable(digits = 2)  %>%
    kable_styling(full_width = F) 
method model count
heatflux range 11
model heatflux 19

Well I think that is pretty neat that more of our fits use their own heat flux data than the range


Parameters

Summary table of the parameters and dot plots.

best_fit_params %>%
    gather(param, value, S, diff, alpha, volscl) %>%
    select(param, method, value) %>%
    group_by(method, param) %>%
    summarise(min = min(value),
              max = max(value),
              mean = mean(value)) %>%
    arrange(param, method) %>%
    ungroup %>%
    select(param, method, min, max, mean) %>%
    kable(digits = 2)  %>%
    kable_styling(full_width = F) %>%
    column_spec(1, bold = T) %>%
    collapse_rows(columns = 1:2, valign = "top")
param method min max mean
alpha heatflux range 0.46 2.18 1.57
model heatflux 0.13 2.41 1.35
diff heatflux range 2.74 18.34 7.14
model heatflux 1.00 9.29 4.66
S heatflux range 2.59 5.58 4.03
model heatflux 2.36 7.41 3.85
volscl heatflux range 0.48 2.00 1.35
model heatflux -0.38 1.84 0.94


ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = S, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()

ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = diff, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()

ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = alpha, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()

ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = volscl, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()

What I take a way from this is that there is no obvious pattern in the parameter values with respect to the best fit method we used. Which is great, it means that the range method is not bad in a sense that it provides too little or too much information that the methodology is skeewing the best fits some how.


Param vs Param Plots

ggpairs(best_fit_params[ , c('S', 'diff', 'alpha', 'volscl')])

Looks like there are some outliers for \(S\) and \(\kappa\).


If we look at the full range of the the S and diff paramter values, notice that MIROC and CISRO-MK3L-1-2 are outliers. I wonder why that is the case, could it be related to the comparison data that was used?

my_param_v_param_plot(data = best_fit_params, x = 'S', y = 'diff')

When we exclude the outlier models we get a better picture of a relationship between S and diff. Excluding MPI-ESM-MR and EC-EARTH we see a negative correlation between \(\kappa\) and \(S\).

What is up with the models that are outliers?

What scenarios are they using? Are they missing data? Is it telling us something about the model?

CSIRO-Mk3L-1-2 has high \(\diff\) but otherwise reasonable S.

object <- readRDS(rds_list[grepl(pattern = 'CSIRO-Mk3L-1-2', x = rds_list)])
object$comparison_plot

object$comparison_plot_range

Well it looks like CSIRO-Mk3L-1-2 has comparison data for historical and future temperature. Hector underestimates the ESM temperature but alos it looks like the the temperature is ESM temp is pretty warm in 2000. The hector heat fux falls on the upper end of the rcp45 cmip heat flux range. May be the high \(\kappa\) is required to warm Hector that rapidly between 1950 to 2050.


MIROC-ESM has high S but otherwise reasonable \(\kappa\).

object <- readRDS(rds_list[grepl(pattern = 'MIROC-ESM.rds', x = rds_list)])
object$comparison_plot

Once again it looks like the ESM is has a higher temp at 2000 than I was expecting.


MIROC-ESM-CHEM has high S but otherwise reasonable \(\kappa\).

object <- readRDS(rds_list[grepl(pattern = 'MIROC-ESM-CHEM', x = rds_list)])
object$comparison_plot

Also looks pretty warm in 2000.


So I thought that if I looked at a comparison of the temperature from 1950 - 2000 vs S I might be able to see a pattern or something.

cmip_individual %>% 
    filter(model %in% best_fit_params$model & variable == 'tas') %>% 
    rename(temp = value) %>% 
    left_join(best_fit_params, by = 'model') %>% 
    filter(year %in% 1950:2005) %>%
    group_by(model, S, diff) %>% 
    summarise(temp = mean(temp)) %>% 
    ungroup %>%
    mutate(outlier = if_else(model %in% c('MIROC-ESM-CHEM', 'MIROC-ESM', 'CSIRO-Mk3L-1-2'), TRUE, FALSE)) %>% 
    ggplot(aes(temp, S, label = model, color = outlier)) + 
    geom_label() + 
    theme_bw() + 
    labs(x = 'mean temp from 1950 - 2005 in deg C')

NA

But it looks like our models with abnormal S and \(\kappa\) values (blue print) fall withtin in the middle of the temperature range. So that is still puzzeling anyways I degress if we exclude these outliers what does our \(S\) vs \(\kappa\) plot look like now?


my_param_v_param_plot(data = best_fit_params, x = 'S', y = 'diff') + 
    coord_cartesian(xlim = c(2, 6), 
                    ylim = c(0, 10))


my_param_v_param_plot(data = best_fit_params, x = 'volscl', y = 'S') + 
    coord_cartesian(ylim = c(2, 6))

Once again I have excluded the larger S values to make patterns more obvious. I wonder what the models with the negative volscl fits look like. What does the output data look like for the models that have negative values for \(volscl\) .

object <- readRDS(rds_list[grepl(pattern = 'CMCC-CM.rds', x = rds_list)])
object$comparison_plot

object <- readRDS(rds_list[grepl(pattern = 'CMCC-CMS', x = rds_list)])
object$comparison_plot

Nothing obvious jumps out at me.

Concluding Thoughts

---
title: "Final Concentration Best Fit Results"
output: html_notebook
code_folding: "hide"
---

Alright we have got our final concentration driven results!! 

Whenever possible we used all of the avaiable temperature data and the avaiable future heat flux data. However when future heat flux data was unavaiable we used the cmip range for the heat flux as a an aproximation for heat flux. This notebook walks through our results. 



```{r, warning=FALSE}
# Load required packages. 
library(tidyr)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(ggrepel)
library(hectorcal)
library(GGally)

# Set up the directories. 
PROJECT_DIR <- "/Users/dorh012/Documents/2019/hectorcal"
INPUT_DIR   <- file.path(PROJECT_DIR, 'analysis', 'best_fit')
OUTPUT_DIR  <- file.path(PROJECT_DIR, 'analysis', 'best_fit', 'final_conc')

# Import the parameter data frame.
list.files(OUTPUT_DIR, pattern = '.rds', full.names = TRUE) %>%
    lapply(function(input){

        object <- readRDS(input)

        if(!is.null(object) & object$optim_rslt$convergence == 0 ){

            cbind(model = gsub('conc_|.rds', '', basename(input)),
                  data.frame(matrix(object$optim_rslt$par, nrow = 1, dimnames = list(NULL, names(object$optim_rslt$par))),
                             stringsAsFactors = FALSE),
                  min = object$optim_rslt$value) %>%
                mutate(method = if_else('comparison_plot_range' %in% names(object), 'heatflux range', 'model heatflux'))

        }

    }) %>%
    bind_rows() ->
    best_fit_params

# Save a list of all the rds files 
rds_list <- list.files(OUTPUT_DIR, pattern = '.rds', full.names = TRUE)

my_param_v_param_plot <- function(data, x, y){
    
    xmid <- 0.5*(min(data[[x]]) + max(data[[x]]))
    xrng <- 1.5*(max(data[[x]]) - min(data[[x]]))
    xlo <- xmid - 0.5*xrng
    xhi <- xmid + 0.5*xrng
    
    ggplot(data = data) +
        geom_label(aes(x = data[[x]], y = data[[y]], label = model, color = method)) +
       # geom_point(aes(x = data[[x]], y = data[[y]],  color = method)) + 
       # geom_text_repel(aes(x = data[[x]], y = data[[y]], label = model)) + 
        ggplot2::xlim(c(xlo,xhi)) + 
        labs(x = x, 
             y = y) + 
        theme_bw(base_size = 14)
}



```


## Overview of the Runs 

How many best fits convereged? 
```{r}
nrow(best_fit_params)
```


How many best fits converged for the two different methods? 
```{r}
best_fit_params %>%
    group_by(method) %>%
    summarise('model count' = n()) %>%  
     kable(digits = 2)  %>%
    kable_styling(full_width = F) 
```

Well I think that is pretty neat that more of our fits use their own heat flux data than the range

*****


## Parameters 


### Summary table of the parameters and dot plots. 

```{r}
best_fit_params %>%
    gather(param, value, S, diff, alpha, volscl) %>%
    select(param, method, value) %>%
    group_by(method, param) %>%
    summarise(min = min(value),
              max = max(value),
              mean = mean(value)) %>%
    arrange(param, method) %>%
    ungroup %>%
    select(param, method, min, max, mean) %>%
    kable(digits = 2)  %>%
    kable_styling(full_width = F) %>%
    column_spec(1, bold = T) %>%
    collapse_rows(columns = 1:2, valign = "top")
```

<br>



```{r}
ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = S, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()
```

```{r}
ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = diff, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()
```

```{r}
ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = alpha, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()
```

```{r}
ggplot(data = best_fit_params) + 
    geom_dotplot(aes(x = volscl, fill = method), stackgroups = TRUE, binpositions = "all") + 
    theme_bw()
```


What I take a way from this is that there is no obvious pattern in the parameter values with respect to the best fit method we used. Which is great, it means that the range method is not bad in a sense that it provides too little or too much information that the methodology is skeewing the best fits some how. 



<br>

### Param vs Param Plots

```{r}

ggpairs(best_fit_params[ , c('S', 'diff', 'alpha', 'volscl')])
```

Looks like there are some outliers for $S$ and $\kappa$. 

<br>

If we look at the full range of the the S and diff paramter values, notice that MIROC and CISRO-MK3L-1-2 are outliers. I wonder why that is the case, could it be related to the comparison data that was used? 

```{r}

my_param_v_param_plot(data = best_fit_params, x = 'S', y = 'diff')

```

When we exclude the outlier models we get a better picture of a relationship between S and diff. Excluding MPI-ESM-MR and EC-EARTH we see a negative correlation between $\kappa$ and $S$.

### What is up with the models that are outliers? 

What scenarios are they using? Are they missing data? Is it telling us something about the model? 

`CSIRO-Mk3L-1-2` has high $\diff$ but otherwise reasonable S. 

```{r, warning = FALSE}
object <- readRDS(rds_list[grepl(pattern = 'CSIRO-Mk3L-1-2', x = rds_list)])

object$comparison_plot
object$comparison_plot_range
```


Well it looks like CSIRO-Mk3L-1-2 has comparison data for historical and future temperature. Hector underestimates the ESM temperature but alos it looks like the the temperature is ESM temp is pretty warm in 2000. The hector heat fux falls on the upper end of the rcp45 cmip heat flux range. May be the high $\kappa$ is required to warm Hector that rapidly between 1950 to 2050.

<br>


`MIROC-ESM` has high S but otherwise reasonable $\kappa$. 

```{r, warning = FALSE}
object <- readRDS(rds_list[grepl(pattern = 'MIROC-ESM.rds', x = rds_list)])

object$comparison_plot
```

Once again it looks like the ESM is has a higher temp at 2000 than I was expecting. 

<br>


`MIROC-ESM-CHEM` has high S but otherwise reasonable $\kappa$. 

```{r, warning = FALSE}
object <- readRDS(rds_list[grepl(pattern = 'MIROC-ESM-CHEM', x = rds_list)])

object$comparison_plot
```


Also looks pretty warm in 2000. 

<br>
 
 So I thought that if I looked at a comparison of the temperature from 1950 - 2000 vs S I might be able to see a pattern or something. 
 
```{r}
cmip_individual %>% 
    filter(model %in% best_fit_params$model & variable == 'tas') %>% 
    rename(temp = value) %>% 
    left_join(best_fit_params, by = 'model') %>% 
    filter(year %in% 1950:2005) %>%
    group_by(model, S, diff) %>% 
    summarise(temp = mean(temp)) %>% 
    ungroup %>%
    mutate(outlier = if_else(model %in% c('MIROC-ESM-CHEM', 'MIROC-ESM', 'CSIRO-Mk3L-1-2'), TRUE, FALSE)) %>% 
    ggplot(aes(temp, S, label = model, color = outlier)) + 
    geom_label() + 
    theme_bw() + 
    labs(x = 'mean temp from 1950 - 2005 in deg C')
 
```
 
 But it looks like our models with abnormal S and $\kappa$ values (blue print) fall withtin in the middle of the temperature range. So that is still puzzeling anyways I degress if we exclude these outliers what does our $S$ vs $\kappa$ plot look like now? 
 
 
<br>
 
 
```{r}
my_param_v_param_plot(data = best_fit_params, x = 'S', y = 'diff') + 
    coord_cartesian(xlim = c(2, 6), 
                    ylim = c(0, 10))

```

<br>

```{r}
my_param_v_param_plot(data = best_fit_params, x = 'volscl', y = 'S') + 
    coord_cartesian(ylim = c(2, 6))
```

Once again I have excluded the larger S values to make patterns more obvious. I wonder what the models with the negative volscl fits look like. What does the output data look like for the models that have negative values for $volscl$ .



```{r}
object <- readRDS(rds_list[grepl(pattern = 'CMCC-CM.rds', x = rds_list)])
object$comparison_plot
object <- readRDS(rds_list[grepl(pattern = 'CMCC-CMS', x = rds_list)])
object$comparison_plot
```

Nothing obvious jumps out at me. 


## Concluding Thoughts 

* I think that there are some patterns may I dare say clusters in the the $S$ and $\kappa$ plots that I think could reveal insight into the ESMs that Hector is calibrating. 
* It is unclear to me why there are the negative values for $\alpha$


