Air Temperature in Alaska

a. Dataset Description

The dataset contains historical average air temperatures by season for different regions of Alaska. It covers thirteen Alaskan regions from 1900 to 2015, with four values per year, one for each season. The dataset is composed of the following variables:

  • Region
  • RegionID
  • Temperature
  • Year
  • Month
  • Season

The following figure gives an overview of air temperatures across the Alaskan territory:

Alaskan Air Temperature.

These data were derived from the “Historical Monthly Temperature - 1 km CRU TS” dataset, available at http://ckan.snap.uaf.edu/dataset/historical-monthly-temperature-1-km-cru-ts.

b. Installation of the dataone package

With the command install.packages("dataone"), we can install the R package for the datasets available on the DataONE platform. The dataone R package enables R scripts to search, download and upload science data and metadata from/to the DataONE Federation. The website describes DataONE as “a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data”. We then load this package together with the others used in the analysis.

library(dataone)
library(ggplot2)
library(tidyverse)
library(rmarkdown)
library(memisc)

c. Case Study: Yukon Time Series of Air temperature 1900-2015

In our analysis, we are going to concentrate on the time series evolution of air temperatures for the region of Yukon. Yukon is one of the largest Alaskan regions and its territory is divided between Alaska and Canada.

Map of Yukon Region.

Our variables of interest are the Region, the Year, the Month, the Season and the air Temperature. More details about the dataset and the variables can be found on the web page http://ckan.snap.uaf.edu/dataset/historical-monthly-temperature-1-km-cru-ts.

d. Download of data

First, we use the following R script to search for the ID of the data package we are interested in. To find it more easily, we search for the keywords “Alaskan air temperatures” in the abstract. R then returns a data frame listing all DataONE datasets whose abstract contains the specified keywords.

d1c <- D1Client("PROD", "urn:node:KNB")
# Ask for the id and title of the matching metadata records
queryParams <- list(q="abstract:\"Alaskan air temperatures\"", fq="id:doi*", 
                    fq="formatType:METADATA", fl="id,title") 
result <- query(d1c@mn, solrQuery=queryParams, as="data.frame", parse=FALSE)
pid <- result[1,'id']
dataObj <- getDataObject(d1c, pid)
bytes <- getData(dataObj)
metadataXML <- rawToChar(bytes)
paged_table(result)

DataONE data packages usually contain more than one dataset. Using the right ID, we can therefore download a zip archive that contains all the datasets, in csv format, belonging to our data package of interest. The following R commands display the path of the file that R has downloaded:

cn <- CNode()
mn <- getMNode(cn, "urn:node:KNB")
bagitFileName <- getPackage(mn, id="doi:10.5063/F1MK6B60")
bagitFileName
## [1] "C:\\Users\\loren\\AppData\\Local\\Temp\\Rtmp6XkNsj\\0972e0b8-0608-11ea-b4d2-d3b44de60042-5788733826e3.zip"

Then, using the “UnZip” command provided by the memisc package, we can unzip the downloaded file.

UnZip("0a55d2ae-fbce-11e9-882c-2dc8b3ab35e1-2ed055d159c4.zip","air_temp_seasonal_long.csv")

Finally, after setting the working directory to the unzipped folder, we can read the selected csv file:

ATSLfile <- "air_temp_seasonal_long.csv" 
ATSL <- read.csv(ATSLfile, sep=";", skip=2, dec=",", stringsAsFactors=FALSE)
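
The arguments passed to read.csv can be illustrated on a small in-memory example. This is only a sketch: the two leading lines and the column header are assumptions about the file layout, and the two data rows reuse values shown in the structure output of the Yukon data frame.

```r
# Hypothetical excerpt: two leading lines to skip, then semicolon-separated
# fields with decimal commas. The layout is assumed for illustration.
raw_text <- paste(
  "Seasonal air temperature (illustrative header line)",
  "Second line to skip",
  "Region;RegionID;Temperature;Year;Month;Season",
  "Yukon;13;-23,44;1900;12;winter",
  "Yukon;13;-4,99;1901;3;spring",
  sep = "\n"
)

# sep = ";" splits the fields, dec = "," parses "-23,44" as -23.44,
# and skip = 2 ignores the two leading lines before the header.
demo <- read.csv(text = raw_text, sep = ";", skip = 2, dec = ",",
                 stringsAsFactors = FALSE)
str(demo)
```

Without dec = ",", the Temperature column would be read as character strings rather than numbers.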

We can rename the variables using the following commands:

names(ATSL)[names(ATSL) == "X"] <- "Region"
names(ATSL)[names(ATSL) == "X.1"] <- "RegionID"
names(ATSL)[names(ATSL) == "X.2"] <- "Temperature"
names(ATSL)[names(ATSL) == "X.3"] <- "Year"
names(ATSL)[names(ATSL) == "X.4"] <- "Month"
names(ATSL)[names(ATSL) == "X.5"] <- "Season"
paged_table(ATSL)

e. Data Operations and Statistics

Regional Average Temperature

The following R script computes the mean temperature for each region. As we can see from the table, Yukon, our region of interest, is the third coldest region in Alaska, with an average temperature of -4.54 degrees Celsius. The coldest region in Alaska is the Arctic, while the warmest is Kodiak.

MeanRegion <- aggregate(x = ATSL$Temperature, by = list(ATSL$Region), FUN = mean) 
names(MeanRegion)[names(MeanRegion) == "Group.1"] <- "Region"
names(MeanRegion)[names(MeanRegion) == "x"] <- "MeanTemperature"
paged_table(MeanRegion)
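
As a minimal sketch of the same grouping step, the formula interface of aggregate names the output columns directly, and sorting the result makes the ranking of regions easy to read. The temperatures below are made-up values, not the real regional data.

```r
# Illustrative values only, not the Alaskan dataset.
temps <- data.frame(
  Region = c("Arctic", "Arctic", "Kodiak", "Kodiak"),
  Temperature = c(-20, -10, 2, 6)
)

# Mean temperature per region, using the formula interface of aggregate().
mean_region <- aggregate(Temperature ~ Region, data = temps, FUN = mean)
names(mean_region)[2] <- "MeanTemperature"

# Sort from coldest to warmest so the ranking can be read off the table.
mean_region <- mean_region[order(mean_region$MeanTemperature), ]
print(mean_region)
```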

The following bar chart shows the average temperature of the Alaskan regions graphically:

ggplot(MeanRegion, aes(Region, MeanTemperature, fill = Region)) +
  geom_bar(stat="identity") + theme_minimal() + xlab("Region") + ylab("Temperature") + ggtitle("Region Average Air Temperature") + theme(axis.text.x=element_blank(), axis.ticks.x=element_blank())

Yukon Case Study

With the following command, we create a subset of the initial dataset: the observations for the Yukon region, which is the one we are interested in for the time series analysis.

Yukon <- ATSL[which(ATSL$Region == "Yukon"),names(ATSL) %in% c("Region","RegionID","Temperature","Year","Month","Season")]
paged_table(Yukon)

With the following command we can show general information about the structure of the Yukon data frame:

str(Yukon)
## 'data.frame':    461 obs. of  6 variables:
##  $ Region     : chr  "Yukon" "Yukon" "Yukon" "Yukon" ...
##  $ RegionID   : int  13 13 13 13 13 13 13 13 13 13 ...
##  $ Temperature: num  -23.44 -4.99 11.14 -4.95 -18.56 ...
##  $ Year       : int  1900 1901 1901 1901 1901 1902 1902 1902 1902 1903 ...
##  $ Month      : int  12 3 6 9 12 3 6 9 12 3 ...
##  $ Season     : chr  "winter" "spring" "summer" "fall" ...

Creation of a subset

Now we can create a subset of the Yukon data frame that keeps only the season and the temperature of each observation:

YukonTempSeas <- Yukon[,c("Season", "Temperature")]
paged_table(YukonTempSeas)

Here we display another subset for the observations with negative temperatures:

YukonFilter <- filter(Yukon, Temperature < 0)
paged_table(YukonFilter)

Order of observations

We can show the observations in ascending order of temperature:

YukonOrderTemperature <- Yukon[order(Yukon$Temperature),]
paged_table(YukonOrderTemperature)

Alternatively, we can display the observations in descending order of temperature:

YukonUnOrderTemperature <- Yukon[order(- Yukon$Temperature),]
paged_table(YukonUnOrderTemperature)
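
The two orderings can be sketched on a toy data frame. order() returns the row indices that sort a vector, and its decreasing argument is an alternative to negating the values. The three rows below are only illustrative.

```r
# Toy observations for illustration.
obs <- data.frame(
  Year = c(1901, 1902, 1903),
  Temperature = c(-4.99, 11.14, -18.56)
)

# order() gives the permutation of row indices that sorts Temperature.
ascending  <- obs[order(obs$Temperature), ]
descending <- obs[order(obs$Temperature, decreasing = TRUE), ]

print(ascending$Year)
print(descending$Year)
```

Unlike negation, decreasing = TRUE also works for character columns such as Season.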

Statistics

Here we compute the mean of the temperature for the region of Yukon:

MeanTemperature <- mean(Yukon$Temperature)
print(MeanTemperature)
## [1] -4.539925

We are also interested in the minimum and maximum temperatures reached in Yukon between 1900 and 2015:

MinTemperature <- min(Yukon$Temperature)
print(MinTemperature)
## [1] -25.95014
MaxTemperature <- max(Yukon$Temperature)
print(MaxTemperature)
## [1] 14.19767

Next, we compute the variance and the standard deviation of the temperature in Yukon:

VarTemperature <- var(Yukon$Temperature)
print(VarTemperature)
## [1] 130.3119
SdTemperature <- sd(Yukon$Temperature)
print(SdTemperature)
## [1] 11.41542

Finally, using the mean and the standard deviation of the temperatures, we can also manually compute a 95% confidence interval for the mean, based on the normal approximation:

n <- length(Yukon$Temperature)
error <- qnorm(0.975)* SdTemperature /sqrt(n)
Left <- MeanTemperature - error
Right <- MeanTemperature + error
print(Left)
## [1] -5.581977
print(Right)
## [1] -3.497872
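
The interval has the form mean ± z * sd / sqrt(n), where z is the normal quantile for the chosen confidence level. A minimal sketch wraps this in a helper; the function name ci_mean is hypothetical, and the inputs are the mean, standard deviation and number of observations reported above. The formula assumes independent observations, so for a seasonal time series the result should be read as a rough indication only.

```r
# Hypothetical helper: normal-approximation confidence interval for a mean.
ci_mean <- function(mean_value, sd_value, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)       # 0.975 quantile when level = 0.95
  margin <- z * sd_value / sqrt(n)      # half-width of the interval
  c(lower = mean_value - margin, upper = mean_value + margin)
}

# Inputs taken from the statistics printed above (461 observations).
ci <- ci_mean(-4.539925, 11.41542, 461)
print(ci)
```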

Graphical Analysis

We display the graph of the overall set of temperatures for Yukon, from 1900 to 2015:

ggplot(Yukon, aes(Year, Temperature)) + geom_point(col = "red") + xlab("Year") + ylab("Temperature") + ggtitle("Yukon Air Temperature")

Now we concentrate on the seasonal variation, plotting a smoothed time series for each season. As we can see, temperatures increase in every season between 1900 and 2015:

YukonW <- Yukon[which(Yukon$Season == "winter"),names(Yukon) %in% c("Year","Month","Season","Temperature")]
ggplot(YukonW, aes(Year, Temperature)) + xlab("Year") + ylab("Temperature") + ggtitle("Winter Yukon Air Temperature") + stat_smooth(colour = "blue", span = 0.2)

YukonSpr <- Yukon[which(Yukon$Season == "spring"),names(Yukon) %in% c("Year","Month","Season","Temperature")]
ggplot(YukonSpr, aes(Year, Temperature)) + xlab("Year") + ylab("Temperature") + ggtitle("Spring Yukon Air Temperature") + stat_smooth(colour = "green", span = 0.2)

YukonS <- Yukon[which(Yukon$Season == "summer"),names(Yukon) %in% c("Year","Month","Season","Temperature")]
ggplot(YukonS, aes(Year, Temperature)) + xlab("Year") + ylab("Temperature") + ggtitle("Summer Yukon Air Temperature") + stat_smooth(colour = "red", span = 0.2)

YukonA <- Yukon[which(Yukon$Season == "fall"),names(Yukon) %in% c("Year","Month","Season","Temperature")]
ggplot(YukonA, aes(Year, Temperature)) + xlab("Year") + ylab("Temperature") + ggtitle("Autumn Yukon Air Temperature") + stat_smooth(colour = "orange", span = 0.2)
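
A visual impression of a trend can be complemented by a number, for example the slope of a linear regression of temperature on year within each season. The sketch below is not part of the original analysis: it uses synthetic data in which every season warms by a made-up 0.01 degrees per year, purely to show the mechanics. The same split-and-fit step could be applied to the Yukon data frame.

```r
# Synthetic data for illustration only (not the Alaskan temperatures):
# each season starts at an invented baseline and warms by 0.01 per year.
years <- 1900:2015
baselines <- c(winter = -20, spring = -5, summer = 12, fall = -4)
demo <- do.call(rbind, lapply(names(baselines), function(s) {
  data.frame(Year = years, Season = s,
             Temperature = baselines[[s]] + 0.01 * (years - 1900))
}))

# Slope of Temperature ~ Year within each season, in degrees per year.
slopes <- sapply(split(demo, demo$Season), function(d) {
  coef(lm(Temperature ~ Year, data = d))[["Year"]]
})
print(slopes)
```

A positive slope indicates warming over the period; on real data the slopes would differ by season and carry sampling uncertainty.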

f. Native Data Format

The native format of the dataset is “long”: each row holds a single temperature observation, identified by the region, the year, the month and the season.

g. Native Data Format Transformation

With the following procedure we can transform our long dataset into a wide one. Here we select a subset of the dataset, concentrating on the observations for the years strictly between 1910 and 1920, that is 1911 to 1919:

Yukon1920 <- Yukon[which(Yukon$Year < 1920 & Yukon$Year > 1910), names(Yukon) %in% c("Year","Season","Temperature")]
paged_table(Yukon1920)
Yukon1920Wide <- spread(Yukon1920, key = Season, value = Temperature)
paged_table(Yukon1920Wide)
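
For comparison, the same long-to-wide step can be sketched with base R's reshape, which needs no additional packages. The temperatures below are invented for the example. In recent versions of tidyr, spread is superseded by pivot_wider, which could also be used here.

```r
# Invented long-format data: one row per year and season.
long <- data.frame(
  Year = rep(c(1911, 1912), each = 4),
  Season = rep(c("winter", "spring", "summer", "fall"), times = 2),
  Temperature = c(-20.1, -5.2, 12.3, -4.4, -19.8, -4.9, 12.9, -4.1)
)

# One row per Year, one column per Season.
wide <- reshape(long, idvar = "Year", timevar = "Season", direction = "wide")

# reshape() prefixes the new columns with "Temperature."; strip the prefix.
names(wide) <- sub("^Temperature\\.", "", names(wide))
print(wide)
```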

Health Spending in OECD Countries

a. Dataset Description

Our dataset describes the composition of health spending for a set of OECD countries. Health spending measures the final consumption of health care goods and services, including personal health care (curative care, rehabilitative care, long-term care, ancillary services and medical goods) and collective services (prevention and public health services as well as health administration), but excluding spending on investments. Health care is financed through a mix of financing arrangements including:

  • Government spending and compulsory health insurance
  • Voluntary health insurance and private funds

Our dataset contains data on:

  • Country: Set of OECD countries
  • Expenditure Type: Voluntary, Compulsory and Out-of-Pocket
  • Year: 2018
  • Value: Value for health spending measured as per capita share of total health spending and in USD (using economy-wide PPPs)

More information can be found on the OECD website: https://data.oecd.org/healthres/health-spending.htm

b. Installation of the OECD package

With the command install.packages("OECD"), we can install the R package for the datasets available on the OECD platform. The OECD R package enables R scripts to search for and download data and metadata from the OECD (Organisation for Economic Co-operation and Development). We then load this package together with the others used in the analysis.

library(OECD)
library(ggplot2)
library(tidyverse)
library(rmarkdown)
library(dplyr)
library(corrplot)

c. Case Study: Health Spending Composition of OECD Countries

Our analysis concerns the differences in the composition of health spending among OECD countries. We will see which countries rely more on compulsory than on voluntary expenditure, and vice versa. The OECD countries included in our analysis are highlighted in the following map:

OECD Countries.

The variables of interest are the country, the expenditure type (voluntary, compulsory or out-of-pocket) and the value of health spending. More details can be found at https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance_19991312

d. Download of data

First, we search for the ID of our dataset of interest using the following command; R returns a list of all datasets related to the theme “Health”:

search_dataset("Health", data = get_datasets(), ignore.case = TRUE)
## # A tibble: 11 x 2
##    id          title                                                       
##    <fct>       <fct>                                                       
##  1 HEALTH_STAT Health Status                                               
##  2 HEALTH_REAC Health Care Resources                                       
##  3 HEALTH_PROC Health Care Utilisation                                     
##  4 HEALTH_HCQI Health Care Quality Indicators                              
##  5 HEALTH_LVNG Non-Medical Determinants of Health                          
##  6 SHA         Health expenditure and financing                            
##  7 EBDAG       Expenditure by disease, age and gender under the System of ~
##  8 HEALTH_WFMI Health Workforce Migration                                  
##  9 SHA_FP      Input costs for health care provision                       
## 10 SHA_FS      Revenues of health care financing schemes                   
## 11 SHA_HK      Gross fixed capital formation in the health care system

Using the right ID and adding the filters needed to obtain our data of interest, R downloads the data frame directly into its environment; we then rename the variables and the values of “ExpenditureType”. The filter codes can be found on the OECD Statistics website (https://stats.oecd.org/):

dataOrigin <- get_dataset("SHA", filter = "HF1+HF2+HF3.HCTOT.HPTOT.PPPPER.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+NMEC+BRA+CHN+COL+CRI+IND+IDN+RUS+ZAF", start_time = 2018, end_time = 2018)

OECD_Health_Exp <- select(dataOrigin, -HC, -HP, -MEASURE, -TIME_FORMAT, -POWERCODE, -OBS_STATUS)

names(OECD_Health_Exp)[names(OECD_Health_Exp) == "HF"] <- "ExpenditureType"
names(OECD_Health_Exp)[names(OECD_Health_Exp) == "LOCATION"] <- "Country"
names(OECD_Health_Exp)[names(OECD_Health_Exp) == "obsTime"] <- "Year"
names(OECD_Health_Exp)[names(OECD_Health_Exp) == "obsValue"] <- "Value"

OECD_Health_Exp$ExpenditureType[OECD_Health_Exp$ExpenditureType == "HF2"] <- "Voluntary"
OECD_Health_Exp$ExpenditureType[OECD_Health_Exp$ExpenditureType == "HF1"] <- "Compulsory"
OECD_Health_Exp$ExpenditureType[OECD_Health_Exp$ExpenditureType == "HF3"] <- "Out-Of-Pocket"
paged_table(OECD_Health_Exp)
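
The three recoding assignments above can also be written as a single lookup with a named vector. This is a small sketch on a toy vector of codes; the mapping is the one used above, while the names codes and labels are chosen only for this example.

```r
# Toy vector of financing-scheme codes.
codes <- c("HF1", "HF3", "HF2", "HF1")

# Named vector acting as a lookup table: code -> readable label.
labels <- c(HF1 = "Compulsory", HF2 = "Voluntary", HF3 = "Out-Of-Pocket")

# Indexing by name translates every code in one step.
recoded <- unname(labels[codes])
print(recoded)
```

Any code missing from the lookup table would come back as NA, which makes unexpected values easy to spot.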

With the following command we can show general information about the structure of the OECD_Health_Exp data frame:

str(OECD_Health_Exp)
## 'data.frame':    68 obs. of  5 variables:
##  $ ExpenditureType: chr  "Compulsory" "Compulsory" "Voluntary" "Voluntary" ...
##  $ Country        : chr  "GRC" "NOR" "NOR" "CAN" ...
##  $ Year           : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
##  $ UNIT           : chr  "USD" "USD" "USD" "USD" ...
##  $ Value          : num  1348.8 5288.8 21.4 759.1 749.3 ...

e. Data Operations and Statistics

Creation of a subset

Now we can create a subset, limiting our analysis to the country and the value of health expenditure:

Country_Value <- select(OECD_Health_Exp, Country, Value)
paged_table(Country_Value)

We display the entire dataframe without the year:

No_Year <- select(OECD_Health_Exp, -Year)
paged_table(No_Year)

Order of observations

We sort the observations in descending order of the value of health spending:

Order_Value <- OECD_Health_Exp[order(OECD_Health_Exp$Value , decreasing = TRUE),]
paged_table(Order_Value)

Graphical Analysis

Here we plot pie charts showing the percentage composition of health expenditure for a subset of countries, namely Canada, Italy and Germany:

Canada

Canada <- OECD_Health_Exp[which(OECD_Health_Exp$Country == "CAN"), names(OECD_Health_Exp) %in% c("ExpenditureType", "Value")]
paged_table(Canada)
CanadaPercentage <- Canada$Value / sum(Canada$Value)*100
expenditureType <- c("Voluntary", "Out-of-Pocket", "Compulsory")

CanadaPerc <- data.frame(ExpenditureType=expenditureType, Value=CanadaPercentage)
paged_table(CanadaPerc)
ggplot(CanadaPerc, aes(x = "", y = Value, fill = ExpenditureType)) +  coord_polar("y", start=0) + geom_bar(width = 1, stat = "identity", color = "white") + theme_minimal() + theme_void() + ggtitle("Health Spending Composition in Canada")

Italy

Italy <- OECD_Health_Exp[which(OECD_Health_Exp$Country == "ITA"), names(OECD_Health_Exp) %in% c("ExpenditureType", "Value")]
paged_table(Italy)
ItalyPercentage <- Italy$Value / sum(Italy$Value)*100

ItaPerc <- data.frame(ExpenditureType=expenditureType, Value=ItalyPercentage)
paged_table(ItaPerc)
ggplot(ItaPerc, aes(x = "", y = Value, fill = ExpenditureType)) +  coord_polar("y", start=0) + geom_bar(width = 1, stat = "identity", color = "white") + theme_minimal() + theme_void() + ggtitle("Health Spending Composition in Italy")

Germany

Germany <- OECD_Health_Exp[which(OECD_Health_Exp$Country == "DEU"), names(OECD_Health_Exp) %in% c("ExpenditureType", "Value")]
paged_table(Germany)
GermanyPercentage <- Germany$Value / sum(Germany$Value)*100
expenditureType2 <- c("Voluntary", "Compulsory","Out-of-Pocket" )

GermanyPerc <- data.frame(ExpenditureType=expenditureType2, Value=GermanyPercentage)
paged_table(GermanyPerc)
ggplot(GermanyPerc, aes(x = "", y = Value, fill = ExpenditureType)) +  coord_polar("y", start=0) + geom_bar(width = 1, stat = "identity", color = "white") + theme_minimal() + theme_void() + ggtitle("Health Spending Composition in Germany")
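
The pie charts pair hand-written label vectors with the computed percentages, which only works if the labels are listed in the same order as the rows. A possible alternative, sketched here on invented figures for two hypothetical countries, computes the shares inside the data frame so that every percentage stays next to its own ExpenditureType label.

```r
# Invented spending figures for two hypothetical countries.
spend <- data.frame(
  Country = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  ExpenditureType = rep(c("Compulsory", "Voluntary", "Out-Of-Pocket"), 2),
  Value = c(700, 100, 200, 300, 300, 400)
)

# ave() returns, for each row, the total of its own country, so the
# division yields the percentage share within that country.
spend$Share <- 100 * spend$Value / ave(spend$Value, spend$Country, FUN = sum)
print(spend)
```

The resulting data frame could be filtered by country and passed to the same ggplot call used for the pie charts, with Share as the y value.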

Finally, we create a grouped bar chart of the health spending composition for a subset of countries: Korea, Italy, Germany, Japan, Canada, the Netherlands, New Zealand and Sweden:

OECD_Subset <- OECD_Health_Exp[which(OECD_Health_Exp$Country %in% c("ITA", "KOR", "JPN", "CAN", "NLD", "NZL", "DEU", "SWE")), names(OECD_Health_Exp) %in% c("Country", "ExpenditureType", "Value")]
paged_table(OECD_Subset)
ggplot(OECD_Subset,aes(x=Country, y=Value, fill=ExpenditureType)) + geom_bar(stat="identity", position=position_dodge()) + ggtitle("Health Spending Composition in OECD Countries")

f. Native Data Format

The native format of the dataset is “long”: each row holds a single observation, identified by the country, the year and the type of expenditure (compulsory, voluntary or out-of-pocket), together with the respective value.

g. Native Data Format Transformation

With the following procedure we can transform our long dataset into a wide one:

OECDHEWide <- spread(OECD_Health_Exp, key = ExpenditureType, value = Value)
paged_table(OECDHEWide)

To compare it with the original version, we also display the long one:

paged_table(OECD_Health_Exp)

e.1 Statistics

We present the statistics at this stage because they are computed on the wide version of the dataset.

We create a version of the dataset without missing values, to avoid errors in the computations:

OECDWide_NoNA <- na.omit(OECDHEWide)

We compute the covariance between Voluntary and Compulsory expenditure:

Covariance <- cov(OECDWide_NoNA$Voluntary, OECDWide_NoNA$Compulsory)
print(Covariance)
## [1] 35680.47

Here we have the Spearman rank correlation matrix for the different types of health expenditure:

OECDCorr <- select(OECDWide_NoNA, - Year, - Country, -UNIT)
OECDWide_cor <- cor(OECDCorr, method = "spearman")
print(OECDWide_cor)
##               Compulsory Out-Of-Pocket  Voluntary
## Compulsory     1.0000000     0.3176471  0.1970588
## Out-Of-Pocket  0.3176471     1.0000000 -0.2941176
## Voluntary      0.1970588    -0.2941176  1.0000000
corrplot(OECDWide_cor)
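
The Spearman coefficient used above is the Pearson correlation computed on the ranks of the values, which makes it insensitive to the scale of the spending figures and less sensitive to outliers. A short sketch with invented numbers shows the equivalence.

```r
# Invented values for two spending categories in five countries.
x <- c(120, 340, 560, 780, 990)
y <- c(30, 10, 40, 50, 20)

# Spearman correlation computed directly ...
spearman <- cor(x, y, method = "spearman")

# ... and as the Pearson correlation of the ranks.
pearson_on_ranks <- cor(rank(x), rank(y), method = "pearson")

print(c(spearman = spearman, pearson_on_ranks = pearson_on_ranks))
```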

Finally, we calculate the mean for the compulsory health expenditures for our set of OECD countries:

OECDMean <- mean(OECDWide_NoNA$Compulsory)
print(OECDMean)
## [1] 2807.921