Preparations:
getwd()
## [1] "C:/Users/laure/Desktop"
setwd("C:/Users/laure/Desktop")
getwd()
## [1] "C:/Users/laure/Desktop"
rm(list=ls())
Sys.setenv(LANG = "en")
The European Statistical Office, known as Eurostat, is a Directorate-General of the European Commission located in Luxembourg.
Its main responsibilities are to provide statistical information to the institutions of the European Union (EU) and to promote the harmonisation of statistical methods across its member states and the EFTA countries. Its mission is to provide high-quality statistics that enable comparisons between countries and regions.
Eurostat's statistical work covers a wide variety of subjects, such as Economy and Finance, Population and Social Conditions, Science and Technology, and Industry and Agriculture.
Its statistics and statistical databases are publicly accessible.
Sources: https://ec.europa.eu/eurostat/home? https://es.wikipedia.org/wiki/Eurostat
# install the "eurostat" package from CRAN (only needed once)
#install.packages("eurostat", repos = "https://cran.r-project.org/")
# load the installed package
library(eurostat)
Data set chosen: Life Expectancy by age and sex: “demo_mlexpec”
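Life expectancy is a statistical measure of the average time an organism is expected to live, based on the year of its birth, its current age and other demographic factors, including sex. In this data set, the value reported for a given age is the expected number of remaining years of life at that age, which is why the values decrease with age.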
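To see why the values for older ages are much smaller (for example, between 26 and 28.5 years for Central European men aged 55 in 2017), here is a minimal sketch of how remaining life expectancy e_x can be derived from a life table. It assumes a simplified life table with hypothetical survivor counts and deaths spread evenly within each year of age; Eurostat's own methodology may differ in detail.

```r
# Hypothetical number of survivors l_x (out of 100 newborns) at exact ages 0..4
l_x <- c(100, 80, 50, 20, 0)

remaining_life_expectancy <- function(l_x) {
  n <- length(l_x)
  # L_x: person-years lived between age x and x+1, assuming deaths are
  # spread evenly over the year (average of survivors at both ends)
  L_x <- (l_x[-n] + l_x[-1]) / 2
  # T_x: total person-years lived from age x onwards (reverse cumulative sum)
  T_x <- rev(cumsum(rev(L_x)))
  # e_x = T_x / l_x for the ages 0 .. n-2
  T_x / l_x[-n]
}

e_x <- remaining_life_expectancy(l_x)
e_x
```

Each e_x counts only the years still ahead at age x, so it shrinks as age increases, just like the values in demo_mlexpec.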
Source of the “demo_mlexpec” data set: https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=demo_mlexpec&lang=en
We base our analysis on comparing the life expectancy of men in different European regions by generation:
European regions: Scandinavia, CentralEU, EasternEU, Mediterranean
Scandinavia consists of Denmark (DK), Finland (FI), Norway (NO), Sweden (SE)
CentralEU consists of Switzerland (CH), Germany (DE), Belgium (BE), Netherlands (NL), Luxembourg (LU), France (FR), Austria (AT)
EasternEU consists of Slovenia (SI), Bulgaria (BG), Hungary (HU), Poland (PL), Ukraine (UA), Czech Republic (CZ)
Mediterranean consists of Italy (IT), Spain (ES), Portugal (PT), Greece (EL, Eurostat's code for Greece)
Generations: Silent, Boomers, Generation X, Millennials, Generation Z
We use 2017, the most recent year in the data set, for our analysis, i.e. the life expectancy values reported for 2017.
Let’s download our data set “demo_mlexpec” with the eurostat package and assign it to “LifeExp” (“Life Expectancy”):
LifeExp <- get_eurostat("demo_mlexpec")
## Table demo_mlexpec cached at C:\Users\laure\AppData\Local\Temp\RtmpwjGuoe/eurostat/demo_mlexpec_date_code_TF.rds
Let’s take a look at our data and open it in a table
LifeExp
## # A tibble: 434,988 x 6
## unit sex age geo time values
## <fct> <fct> <fct> <fct> <date> <dbl>
## 1 YR F Y1 AL 2017-01-01 79.7
## 2 YR F Y1 AM 2017-01-01 78.5
## 3 YR F Y1 AT 2017-01-01 83.2
## 4 YR F Y1 AZ 2017-01-01 77.7
## 5 YR F Y1 BE 2017-01-01 83.2
## 6 YR F Y1 BG 2017-01-01 77.9
## 7 YR F Y1 BY 2017-01-01 78.6
## 8 YR F Y1 CH 2017-01-01 84.9
## 9 YR F Y1 CY 2017-01-01 83.3
## 10 YR F Y1 CZ 2017-01-01 81.2
## # ... with 434,978 more rows
View(LifeExp)
Let’s take a closer look at the variables
summary(LifeExp)
## unit sex age geo time
## YR:434988 F:145168 Y1 : 5058 BE : 14964 Min. :1960-01-01
## M:144910 Y10 : 5058 BG : 14964 1st Qu.:1985-01-01
## T:144910 Y11 : 5058 CH : 14964 Median :1999-01-01
## Y12 : 5058 CZ : 14964 Mean :1996-01-24
## Y13 : 5058 DE_TOT : 14964 3rd Qu.:2009-01-01
## Y14 : 5058 EE : 14964 Max. :2017-01-01
## (Other):404640 (Other):345204
## values
## Min. : 1.50
## 1st Qu.:17.30
## Median :35.60
## Mean :37.32
## 3rd Qu.:56.10
## Max. :86.80
## NA's :258
str(LifeExp)
## Classes 'tbl_df', 'tbl' and 'data.frame': 434988 obs. of 6 variables:
## $ unit : Factor w/ 1 level "YR": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 3 levels "F","M","T": 1 1 1 1 1 1 1 1 1 1 ...
## $ age : Factor w/ 86 levels "Y1","Y10","Y11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ geo : Factor w/ 55 levels "AL","AM","AT",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ time : Date, format: "2017-01-01" "2017-01-01" ...
## $ values: num 79.7 78.5 83.2 77.7 83.2 77.9 78.6 84.9 83.3 81.2 ...
LifeExp$age[1]
## [1] Y1
## 86 Levels: Y1 Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17 Y18 Y19 Y2 Y20 Y21 Y22 ... Y_LT1
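The output above shows that the age levels are sorted alphabetically (Y1, Y10, Y11, ...), not by age. Here is a small sketch that converts the age codes to numbers and reorders the levels. The helper age_to_num() is hypothetical, and mapping Y_LT1 to 0 and Y_GE85 to 85 is our own assumption.

```r
# Hypothetical helper: convert Eurostat age codes to numeric ages.
# Assumption: "Y_LT1" (less than 1 year) -> 0, "Y_GE85" (85 or over) -> 85
age_to_num <- function(age) {
  age <- as.character(age)
  # "Y10" -> "10" -> 10; the special codes become NA here and are fixed below
  out <- suppressWarnings(as.numeric(sub("^Y", "", age)))
  out[age == "Y_LT1"] <- 0
  out[age == "Y_GE85"] <- 85
  out
}

ages <- factor(c("Y1", "Y10", "Y2", "Y_GE85", "Y_LT1"))
levels(ages)        # sorted as text, not by age
age_to_num(ages)

# Reorder the factor levels by numeric age
ages_sorted <- factor(ages, levels = levels(ages)[order(age_to_num(levels(ages)))])
levels(ages_sorted)
```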
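The analysis shows that our chosen data set contains the following variables:
* unit: unit of measure (YR, years)
* sex: F (female), M (male), T (total)
* age: 86 age classes: Y_LT1 (less than 1 year), Y1 to Y84, and Y_GE85 (85 years or over)
* geo: 55 geographic codes, mostly countries (e.g. AT, BE, CH)
* time: reference year, from 1960 to 2017 (stored as a date, e.g. 2017-01-01)
* values: life expectancy in years (258 missing values)
We chose this data set because the members of our R mini-project come from different European countries, so we want to find out which region has the longest life expectancy.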
#Vignette
browseVignettes(package = "eurostat")
## starting httpd help server ... done
Subsetting idea:
Subset by Gender (M) and Date (2017):
LifeExp_M <- subset(LifeExp, sex == "M" & time == "2017-01-01")
# We assign this subset to "LifeExp_M" which means the life expectancy only for males for the date 01-01-2017
LifeExp_M
## # A tibble: 4,300 x 6
## unit sex age geo time values
## <fct> <fct> <fct> <fct> <date> <dbl>
## 1 YR M Y1 AL 2017-01-01 76.7
## 2 YR M Y1 AM 2017-01-01 72.1
## 3 YR M Y1 AT 2017-01-01 78.7
## 4 YR M Y1 AZ 2017-01-01 73.1
## 5 YR M Y1 BE 2017-01-01 78.5
## 6 YR M Y1 BG 2017-01-01 70.9
## 7 YR M Y1 BY 2017-01-01 68.5
## 8 YR M Y1 CH 2017-01-01 80.9
## 9 YR M Y1 CY 2017-01-01 79.3
## 10 YR M Y1 CZ 2017-01-01 75.3
## # ... with 4,290 more rows
View(LifeExp_M)
Subset by Generations (age)
Source: https://www.pewresearch.org/topics/generations-and-age/
GenZ <- c("Y_LT1","Y1","Y2","Y3","Y4","Y5","Y6","Y7","Y8","Y9","Y10","Y11","Y12","Y13","Y14","Y15","Y16","Y17","Y18","Y19")
# Generation Z "GenZ" is defined as people aged from less than 1 year to 19 years
Millenials <- c("Y20","Y21","Y22","Y23","Y24","Y25","Y26","Y27","Y28","Y29","Y30","Y31","Y32","Y33","Y34","Y35","Y36","Y37","Y38")
# Generation Y "Millenials" is defined as people aged 20 to 38 years
GenX <- c("Y39","Y40","Y41","Y42","Y43","Y44","Y45","Y46","Y47","Y48","Y49","Y50","Y51","Y52","Y53","Y54")
# Generation X "GenX" is defined as people aged 39 to 54 years
Boomers <- c("Y55","Y56","Y57","Y58","Y59","Y60","Y61","Y62","Y63","Y64","Y65","Y66","Y67","Y68","Y69","Y70","Y71","Y72","Y73")
# The Baby Boomer generation "Boomers" is defined as people aged 55 to 73 years
Silent <- c("Y74","Y75","Y76","Y77","Y78","Y79","Y80","Y81","Y82","Y83","Y84","Y_GE85")
# The Silent Generation "Silent" is defined as people aged 74 years to 85 years or over
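Typing the age codes by hand makes it easy to skip one. Here is a sketch that builds the same vectors from numeric ranges with paste0(), using the age boundaries stated in the comments above, and then checks that every one of the 86 age classes appears exactly once.

```r
# Build the generation age codes from numeric ranges instead of typing them
GenZ       <- c("Y_LT1", paste0("Y", 1:19))
Millenials <- paste0("Y", 20:38)
GenX       <- paste0("Y", 39:54)
Boomers    <- paste0("Y", 55:73)
Silent     <- c(paste0("Y", 74:84), "Y_GE85")

all_codes <- c(GenZ, Millenials, GenX, Boomers, Silent)
# Every age class should appear exactly once: Y_LT1 + Y1..Y84 + Y_GE85 = 86 codes
length(all_codes)
anyDuplicated(all_codes)   # 0 means no code is used twice
```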
Subset by Region (geo):
#The Life Expectancy for the Scandinavian countries for each Generation
LifeExp_M_Scandinavia_GenZ <- subset(LifeExp_M, geo %in% c("DK","FI","NO","SE") & age %in% GenZ)
LifeExp_M_Scandinavia_Millenials <- subset(LifeExp_M, geo %in% c("DK","FI","NO","SE") & age %in% Millenials)
LifeExp_M_Scandinavia_GenX <- subset(LifeExp_M, geo %in% c("DK","FI","NO","SE") & age %in% GenX)
LifeExp_M_Scandinavia_Boomers <- subset(LifeExp_M, geo %in% c("DK","FI","NO","SE") & age %in% Boomers)
LifeExp_M_Scandinavia_Silent <- subset(LifeExp_M, geo %in% c("DK","FI","NO","SE") & age %in% Silent)
#The Life Expectancy for the Eastern EU countries for each Generation (Eurostat's code for Poland is "PL")
LifeExp_M_EasternEU_GenZ <- subset(LifeExp_M, geo %in% c("SI","BG","HU","PL","UA","CZ") & age %in% GenZ)
LifeExp_M_EasternEU_Millenials <- subset(LifeExp_M, geo %in% c("SI","BG","HU","PL","UA","CZ") & age %in% Millenials)
LifeExp_M_EasternEU_GenX <- subset(LifeExp_M, geo %in% c("SI","BG","HU","PL","UA","CZ") & age %in% GenX)
LifeExp_M_EasternEU_Boomers <- subset(LifeExp_M, geo %in% c("SI","BG","HU","PL","UA","CZ") & age %in% Boomers)
LifeExp_M_EasternEU_Silent <- subset(LifeExp_M, geo %in% c("SI","BG","HU","PL","UA","CZ") & age %in% Silent)
#The Life Expectancy for the Mediterranean countries for each Generation (Eurostat's code for Greece is "EL")
LifeExp_M_Mediterranean_GenZ <- subset(LifeExp_M, geo %in% c("IT","ES","PT","EL") & age %in% GenZ)
LifeExp_M_Mediterranean_Millenials <- subset(LifeExp_M, geo %in% c("IT","ES","PT","EL") & age %in% Millenials)
LifeExp_M_Mediterranean_GenX <- subset(LifeExp_M, geo %in% c("IT","ES","PT","EL") & age %in% GenX)
LifeExp_M_Mediterranean_Boomers <- subset(LifeExp_M, geo %in% c("IT","ES","PT","EL") & age %in% Boomers)
LifeExp_M_Mediterranean_Silent <- subset(LifeExp_M, geo %in% c("IT","ES","PT","EL") & age %in% Silent)
#The Life Expectancy for the CentralEU countries for each Generation
LifeExp_M_CentralEU_GenZ <- subset(LifeExp_M, geo %in% c("CH","DE","BE","NL","LU","FR","AT") & age %in% GenZ)
LifeExp_M_CentralEU_Millenials <- subset(LifeExp_M, geo %in% c("CH","DE","BE","NL","LU","FR","AT") & age %in% Millenials)
LifeExp_M_CentralEU_GenX <- subset(LifeExp_M, geo %in% c("CH","DE","BE","NL","LU","FR","AT") & age %in% GenX)
LifeExp_M_CentralEU_Boomers <- subset(LifeExp_M, geo %in% c("CH","DE","BE","NL","LU","FR","AT") & age %in% Boomers)
LifeExp_M_CentralEU_Silent <- subset(LifeExp_M, geo %in% c("CH","DE","BE","NL","LU","FR","AT") & age %in% Silent)
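The twenty subset() calls above repeat the same pattern. Here is a sketch of an alternative that stores every region x generation subset in one nested named list. It runs on a small hypothetical stand-in for LifeExp_M (toy_M, with made-up values) and shows only two regions and two generations. The names regions and generations are our own.

```r
# Hypothetical stand-in for LifeExp_M (made-up values, two ages per country)
toy_M <- data.frame(
  age    = rep(c("Y10", "Y60"), times = 4),
  geo    = rep(c("DK", "SE", "IT", "ES"), each = 2),
  values = c(70, 20, 72, 22, 71, 23, 73, 21)
)

# Country codes per region and age codes per generation
regions <- list(
  Scandinavia   = c("DK", "FI", "NO", "SE"),
  Mediterranean = c("IT", "ES", "PT", "EL")
)
generations <- list(
  GenZ    = c("Y_LT1", paste0("Y", 1:19)),
  Boomers = paste0("Y", 55:73)
)

# One subset per region x generation, stored as subsets$<region>$<generation>
subsets <- lapply(regions, function(countries) {
  lapply(generations, function(ages) {
    toy_M[toy_M$geo %in% countries & toy_M$age %in% ages, ]
  })
})

subsets$Scandinavia$GenZ
```

Adding a region or a generation then only means adding one entry to the corresponding list.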
Statistics idea:
* We want to compare the average life expectancy of each generation across the regions for 2017
* We want to compare the standard deviation for Generation Z across the regions for 2017
* We want to compare the maximum life expectancy for “Y_LT1” between men and women for 2017
* We want to compare the minimum life expectancy for “Y_LT1” between men and women for 2017
* We want to compute the quantiles as an additional descriptive statistic for 2017
Mean:
mean_Scandinavia_GenZ <- mean(LifeExp_M_Scandinavia_GenZ$values)
mean_EasternEU_GenZ <- mean(LifeExp_M_EasternEU_GenZ$values)
mean_Mediterranean_GenZ <- mean(LifeExp_M_Mediterranean_GenZ$values)
mean_CentralEU_GenZ <- mean(LifeExp_M_CentralEU_GenZ$values)
mean_Scandinavia_Millenials <- mean(LifeExp_M_Scandinavia_Millenials$values)
mean_EasternEU_Millenials <- mean(LifeExp_M_EasternEU_Millenials$values)
mean_Mediterranean_Millenials <- mean(LifeExp_M_Mediterranean_Millenials$values)
mean_CentralEU_Millenials <- mean(LifeExp_M_CentralEU_Millenials$values)
mean_Scandinavia_GenX <- mean(LifeExp_M_Scandinavia_GenX$values)
mean_EasternEU_GenX <- mean(LifeExp_M_EasternEU_GenX$values)
mean_Mediterranean_GenX <- mean(LifeExp_M_Mediterranean_GenX$values)
mean_CentralEU_GenX <- mean(LifeExp_M_CentralEU_GenX$values)
mean_Scandinavia_Boomers <- mean(LifeExp_M_Scandinavia_Boomers$values)
mean_EasternEU_Boomers <- mean(LifeExp_M_EasternEU_Boomers$values)
mean_Mediterranean_Boomers <- mean(LifeExp_M_Mediterranean_Boomers$values)
mean_CentralEU_Boomers <- mean(LifeExp_M_CentralEU_Boomers$values)
mean_Scandinavia_Silent <- mean(LifeExp_M_Scandinavia_Silent$values)
mean_EasternEU_Silent <- mean(LifeExp_M_EasternEU_Silent$values)
mean_Mediterranean_Silent <- mean(LifeExp_M_Mediterranean_Silent$values)
mean_CentralEU_Silent <- mean(LifeExp_M_CentralEU_Silent$values)
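The same twenty means can be computed in a single call once every row carries a region and a generation label. Here is a minimal sketch on a hypothetical toy table. In the real data, these two columns would first have to be derived from geo and age, for example by matching against the region and generation vectors.

```r
# Hypothetical toy table: one row per country and age, labelled with region and generation
toy <- data.frame(
  region     = rep(c("Scandinavia", "Mediterranean"), each = 4),
  generation = rep(c("GenZ", "GenZ", "Boomers", "Boomers"), times = 2),
  values     = c(70, 72, 20, 22, 71, 73, 23, 21)
)

# Mean life expectancy for every region x generation combination
means <- aggregate(values ~ region + generation, data = toy, FUN = mean)
means

# The same means as a region x generation matrix
mean_matrix <- with(toy, tapply(values, list(region, generation), mean))
mean_matrix
```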
Standard Deviation:
stDev_GenZ <- NULL # Create an empty vector in order to assign the standard deviation of each region to an element
stDev_GenZ[1] <- sd(LifeExp_M_Scandinavia_GenZ$values)
stDev_GenZ[2] <- sd(LifeExp_M_EasternEU_GenZ$values)
stDev_GenZ[3] <- sd(LifeExp_M_Mediterranean_GenZ$values)
stDev_GenZ[4] <- sd(LifeExp_M_CentralEU_GenZ$values)
header <- c("Scandinavia", "EasternEU", "Mediterranean", "CentralEU")
names(stDev_GenZ) <- header
stDev_GenZ
## Scandinavia EasternEU Mediterranean CentralEU
## 5.821170 6.612482 5.851897 5.779247
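Here is a more compact way to build the named vector of standard deviations. If the values of each region are kept in a named list, sapply() applies sd() to every element and keeps the names, so no empty vector or separate header is needed. The numbers below are hypothetical stand-ins for the LifeExp_M_<Region>_GenZ$values columns.

```r
# Hypothetical GenZ values per region (made-up numbers)
genz_values <- list(
  Scandinavia   = c(70, 72, 74),
  EasternEU     = c(60, 70, 80),
  Mediterranean = c(71, 73, 75),
  CentralEU     = c(69, 71, 73)
)

# sapply() returns a numeric vector named after the list elements
stDev_toy <- sapply(genz_values, sd)
stDev_toy
```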
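Conclusions:
The standard deviation is highest in EasternEU, whereas it is similar in all other regions.
Because every region covers the same ages (0 to 19 years), the higher spread in EasternEU most likely reflects larger differences between the Eastern European countries themselves. We hypothesize that in EasternEU the poorer population has a relatively low life expectancy, while the richer population has about the same life expectancy as in the other regions. The data set has no income breakdown, so this hypothesis cannot be tested with it.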
Quantiles:
quantile(LifeExp_M$values)
## 0% 25% 50% 75% 100%
## 4.0 17.8 35.9 56.5 81.6
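By default, quantile() returns the minimum, the quartiles and the maximum, and it interpolates linearly between the sorted values (type 7). Here is a short sketch on hypothetical values showing the default output and how other percentiles can be requested with probs.

```r
x_toy <- c(10, 20, 30, 40, 50)   # hypothetical life expectancy values

# Default: 0%, 25%, 50%, 75% and 100% quantiles
quantile(x_toy)

# Custom probabilities, e.g. the 10th and 90th percentiles
quantile(x_toy, probs = c(0.1, 0.9))
```

With five values, the 10th percentile lies 40% of the way from 10 to 20 (i.e. 14), and the 90th percentile lies 60% of the way from 40 to 50 (i.e. 46).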
### Min & Max
#Subset men (M), time 2017, age Y_LT1
LifeExp_M_YLT1 <- subset(LifeExp, sex == "M" & time == "2017-01-01" & age == "Y_LT1")
LifeExp_M_YLT1
## # A tibble: 50 x 6
## unit sex age geo time values
## <fct> <fct> <fct> <fct> <date> <dbl>
## 1 YR M Y_LT1 AL 2017-01-01 77.1
## 2 YR M Y_LT1 AM 2017-01-01 72.5
## 3 YR M Y_LT1 AT 2017-01-01 79.4
## 4 YR M Y_LT1 AZ 2017-01-01 73.2
## 5 YR M Y_LT1 BE 2017-01-01 79.2
## 6 YR M Y_LT1 BG 2017-01-01 71.4
## 7 YR M Y_LT1 BY 2017-01-01 69.3
## 8 YR M Y_LT1 CH 2017-01-01 81.6
## 9 YR M Y_LT1 CY 2017-01-01 80.2
## 10 YR M Y_LT1 CZ 2017-01-01 76.1
## # ... with 40 more rows
View(LifeExp_M_YLT1)
#Subset women (F), time 2017, age Y_LT1
LifeExp_F_YLT1 <- subset(LifeExp, sex == "F" & time == "2017-01-01" & age == "Y_LT1")
LifeExp_F_YLT1
## # A tibble: 50 x 6
## unit sex age geo time values
## <fct> <fct> <fct> <fct> <date> <dbl>
## 1 YR F Y_LT1 AL 2017-01-01 80.1
## 2 YR F Y_LT1 AM 2017-01-01 78.9
## 3 YR F Y_LT1 AT 2017-01-01 84
## 4 YR F Y_LT1 AZ 2017-01-01 77.9
## 5 YR F Y_LT1 BE 2017-01-01 83.9
## 6 YR F Y_LT1 BG 2017-01-01 78.4
## 7 YR F Y_LT1 BY 2017-01-01 79.3
## 8 YR F Y_LT1 CH 2017-01-01 85.6
## 9 YR F Y_LT1 CY 2017-01-01 84.2
## 10 YR F Y_LT1 CZ 2017-01-01 82
## # ... with 40 more rows
View(LifeExp_F_YLT1)
#Min-Max Life Expectancy (M,2017,Y_LT1)
max_LifeExp_M_YLT1 <- LifeExp_M_YLT1[which.max(LifeExp_M_YLT1$values),c(4,6)]
max_LifeExp_M_YLT1 # Men (M) have the highest life expectancy in Switzerland of 81.6yrs for "less than 1 year (Y_LT1)", year 2017
## # A tibble: 1 x 2
## geo values
## <fct> <dbl>
## 1 CH 81.6
min_LifeExp_M_YLT1 <- LifeExp_M_YLT1[which.min(LifeExp_M_YLT1$values),c(4,6)]
min_LifeExp_M_YLT1 # Men (M) have the lowest life expectancy in Ukraine (UA) of 68.3yrs for "less than 1 year (Y_LT1)", year 2017
## # A tibble: 1 x 2
## geo values
## <fct> <dbl>
## 1 UA 68.3
#Min-Max Life Expectancy (F,2017,Y_LT1)
max_LifeExp_F_YLT1 <- LifeExp_F_YLT1[which.max(LifeExp_F_YLT1$values),c(4,6)]
max_LifeExp_F_YLT1 # Women (F) have the highest life expectancy in Spain (ES) of 86.1yrs for "less than 1 year (Y_LT1)", year 2017
## # A tibble: 1 x 2
## geo values
## <fct> <dbl>
## 1 ES 86.1
min_LifeExp_F_YLT1 <- LifeExp_F_YLT1[which.min(LifeExp_F_YLT1$values),c(4,6)]
min_LifeExp_F_YLT1 # Women (F) have the lowest life expectancy in Georgia (GE) of 77.8yrs for "less than 1 year (Y_LT1)", year 2017
## # A tibble: 1 x 2
## geo values
## <fct> <dbl>
## 1 GE 77.8
#How much longer do women live than men when considering the maximum life expectancy (2017,Y_LT1)?
max_LifeExp_YLT1_Difference <- (max_LifeExp_F_YLT1[,2] - max_LifeExp_M_YLT1[,2])
max_LifeExp_YLT1_Difference # Women's maximum life expectancy (ES) is 4.5 years higher than men's maximum (CH)
## values
## 1 4.5
#How much longer do women live than men when considering the minimum life expectancy (2017,Y_LT1)?
min_LifeExp_YLT1_Difference <- (min_LifeExp_F_YLT1[,2] - min_LifeExp_M_YLT1[,2])
min_LifeExp_YLT1_Difference # Women's minimum life expectancy (GE) is 9.5 years higher than men's minimum (UA)
## values
## 1 9.5
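The two differences above compare extremes that come from different countries (ES vs. CH for the maximum, GE vs. UA for the minimum). An alternative is to compute the gender gap within each country by joining the female and male tables on geo. Here is a sketch with hypothetical values and placeholder country codes (AA, BB, CC).

```r
# Hypothetical Y_LT1 values (made-up numbers, placeholder country codes)
men   <- data.frame(geo = c("AA", "BB", "CC"), values = c(80, 70, 75))
women <- data.frame(geo = c("AA", "BB", "CC"), values = c(84, 79, 80))

# Join on country so that each row compares women and men of the same country
gap <- merge(women, men, by = "geo", suffixes = c("_F", "_M"))
gap$difference <- gap$values_F - gap$values_M
gap

# Country with the largest gender gap
gap[which.max(gap$difference), c("geo", "difference")]
```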
Plotting idea:
Instructions:
The package “plotly” is able to “Create interactive web graphics from ‘ggplot2’ graphs and/or a custom interface to the (MIT-licensed) JavaScript library ‘plotly.js’ inspired by the grammar of graphics.”
In our plot, you can double-click on a legend entry to isolate one trace (useful for comparing the regions within a certain generation).
In our plot, you can hover over a bar to display its life expectancy.
Source: https://www.rdocumentation.org/packages/plotly/versions/4.9.1
Install the package “plotly” and load it:
#install.packages("plotly", repos = "http://cran.rstudio.com/")
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Plot (M, 2017, mean life expectancy per generation and region)
x <- c("Scandinavia","EasternEU","Mediterranean","CentralEU")
y_GenZ <- round(c(mean_Scandinavia_GenZ,mean_EasternEU_GenZ,mean_Mediterranean_GenZ,mean_CentralEU_GenZ), digits = 1)
y_Millenials <- round(c(mean_Scandinavia_Millenials,mean_EasternEU_Millenials,mean_Mediterranean_Millenials,mean_CentralEU_Millenials), digits = 1)
y_GenX <- round(c(mean_Scandinavia_GenX,mean_EasternEU_GenX,mean_Mediterranean_GenX,mean_CentralEU_GenX), digits = 1)
y_Boomers <- round(c(mean_Scandinavia_Boomers,mean_EasternEU_Boomers,mean_Mediterranean_Boomers,mean_CentralEU_Boomers), digits = 1)
y_Silent <- round(c(mean_Scandinavia_Silent,mean_EasternEU_Silent,mean_Mediterranean_Silent,mean_CentralEU_Silent), digits = 1)
data <- data.frame(x, y_GenZ, y_Millenials, y_GenX, y_Boomers, y_Silent)
data %>%
plot_ly() %>%
add_trace(x = ~x, y = ~y_Silent, name = 'Silent', type = 'bar',
text = y_Silent, textposition = 'auto',
marker = list(color = 'rgb(204,229,255)',
line = list(color = 'rgb(0,51,102)', width = 1.5))) %>%
add_trace(x = ~x, y = ~y_Boomers, name = 'Boomers', type = 'bar',
text = y_Boomers, textposition = 'auto',
marker = list(color = 'rgb(153,204,255)',
line = list(color = 'rgb(0,51,102)', width = 1.5))) %>%
add_trace(x = ~x, y = ~y_GenX, name = 'GenX', type = 'bar',
text = y_GenX, textposition = 'auto',
marker = list(color = 'rgb(102,178,255)',
line = list(color = 'rgb(0,51,102)', width = 1.5))) %>%
add_trace(x = ~x, y = ~y_Millenials, name = 'Millenials', type = 'bar',
text = y_Millenials, textposition = 'auto',
marker = list(color = 'rgb(51,153,255)',
line = list(color = 'rgb(0,51,102)', width = 1.5))) %>%
add_trace(x = ~x, y = ~y_GenZ, name = 'GenZ', type = 'bar',
text = y_GenZ, textposition = 'auto',
marker = list(color = 'rgb(0,102,204)',
line = list(color = 'rgb(0,51,102)', width = 1.5))) %>%
layout(title = "Comparison of average Life Expectancy per Generation by Region",
barmode = 'group',
xaxis = list(title = "Region"),
yaxis = list(title = "Mean of Life Expectancy per Generation"))
On the Eurostat website (https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=demo_mlexpec&lang=en), the data are presented with one row per country and each year (time) in a separate column; we therefore conclude that this format is wide.
By contrast, the data downloaded with get_eurostat() (see View(LifeExp)) are in long format: each row holds a single value for one combination of unit, sex, age, geo and time.
View(LifeExp)
# install the package "tidyverse" from CRAN (only needed once)
#install.packages("tidyverse", repos = "https://cran.r-project.org/")
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v tibble 2.1.3 v purrr 0.3.3
## v tidyr 1.0.0 v dplyr 0.8.3
## v readr 1.3.1 v stringr 1.4.0
## v tibble 2.1.3 v forcats 0.4.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
# Long
LifeExp_M_CentralEU_Boomers
## # A tibble: 133 x 6
## unit sex age geo time values
## <fct> <fct> <fct> <fct> <date> <dbl>
## 1 YR M Y55 AT 2017-01-01 26.7
## 2 YR M Y55 BE 2017-01-01 26.6
## 3 YR M Y55 CH 2017-01-01 28.5
## 4 YR M Y55 DE 2017-01-01 26
## 5 YR M Y55 FR 2017-01-01 27.5
## 6 YR M Y55 LU 2017-01-01 27.1
## 7 YR M Y55 NL 2017-01-01 27.2
## 8 YR M Y56 AT 2017-01-01 25.8
## 9 YR M Y56 BE 2017-01-01 25.8
## 10 YR M Y56 CH 2017-01-01 27.6
## # ... with 123 more rows
View(LifeExp_M_CentralEU_Boomers)
# Wide: one column per country (for new code, tidyr recommends pivot_wider() instead of spread())
LifeExp_M_CentralEU_Boomers_wide <- spread(data = LifeExp_M_CentralEU_Boomers, key = geo, value = values)
View(LifeExp_M_CentralEU_Boomers_wide)
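Here is the same long-to-wide step sketched with base R's reshape(), so that it runs without tidyverse. The toy table mimics the layout of LifeExp_M_CentralEU_Boomers (age, geo, values), but its numbers are hypothetical.

```r
# Hypothetical long table: one row per age and country
long <- data.frame(
  age    = c("Y55", "Y55", "Y56", "Y56"),
  geo    = c("AT", "BE", "AT", "BE"),
  values = c(26, 27, 25, 26)
)

# Long -> wide: one row per age, one "values.<country>" column per country
wide <- reshape(long, idvar = "age", timevar = "geo", v.names = "values",
                direction = "wide")
wide
```

In the wide table, each country's values sit side by side for the same age, which is the layout spread() produces above (there the columns are named after the countries only).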
Deutscher Wetterdienst (DWD, the German Meteorological Service), accessed with the “rdwd” R package
?rdwd
## No documentation for 'rdwd' in specified packages and libraries:
## you could try '??rdwd'
browseVignettes(package = "rdwd")
Sources: https://www.dwd.de/SharedDocs/broschueren/EN/press/kurzportraet_en.pdf?__blob=publicationFile&v=9 https://www.dwd.de/EN/aboutus/aboutus_node.html
#??rdwd
# install the package "rdwd" from CRAN (only needed once)
#install.packages("rdwd", repos = "https://cran.r-project.org/")
library(rdwd)
# install the package "lubridate"
#install.packages("lubridate", repos = "https://cran.r-project.org/")
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
# install the package "ggplot2"
#install.packages("ggplot2", repos = "https://cran.r-project.org/")
library(ggplot2)
findID("Zugspitze")
## Zugspitze
## 5792
findID("Regensburg")
## Regensburg
## 4104
selectDWD: Select files for downloading
* name: Choose the name of the weather station
* res: Choose the resolution, e.g. “hourly”, “daily”, “monthly”
* var: Choose the variable of interest, e.g. “air_temperature”, “cloudiness”, or “kl” (daily climate data, used below)
* per: Choose the desired time period, “recent” or “historical”
Zugspitze
link1 <- selectDWD(name="Zugspitze", res="daily", var="kl", per="recent")
file1 <- dataDWD(link1, read=FALSE)
## rmarkdown::render -> knitr::knit -> call_block -> block_exec -> in_dir -> evaluate -> evaluate::evaluate -> evaluate_call -> timing_fn -> handle -> dataDWD -> dirDWD: adding to directory 'C:/Users/laure/Desktop/DWDdata'
## rmarkdown::render -> knitr::knit -> call_block -> block_exec -> in_dir -> evaluate -> evaluate::evaluate -> evaluate_call -> timing_fn -> handle -> dataDWD: 1 file already existing and not downloaded again: 'daily_kl_recent_tageswerte_KL_05792_akt.zip'
## Now downloading 0 files...
# dataDWD: Get climate data from the German Weather Service (DWD) FTP-server.
# The desired .zip (or .txt) dataset is downloaded
# Complete file URL(s) (including base and filename.zip) as returned by 'selectDWD'
Zugspitze <- readDWD(file1, varnames=TRUE)
# Read climate data that was downloaded with dataDWD
# The data is unzipped and subsequently, the file is read, processed and returned as a data.frame
# varnames: TRUE to obtain more informative column names
Regensburg
link2 <- selectDWD(name="Regensburg", res="daily", var="kl", per="recent")
# selectDWD: Select files for downloading
# name: Choose the name of the weather station
# res: Choose resolution, e.g. "hourly","daily", "monthly"
# var: Choose variable of interest, e.g. "air_temperature", "cloudiness"
# per: Choose desired time period
# "recent": data from the last year, up to date usually within a few days
# "historical": long time series
file2 <- dataDWD(link2, read=FALSE)
## rmarkdown::render -> knitr::knit -> call_block -> block_exec -> in_dir -> evaluate -> evaluate::evaluate -> evaluate_call -> timing_fn -> handle -> dataDWD -> dirDWD: adding to directory 'C:/Users/laure/Desktop/DWDdata'
## rmarkdown::render -> knitr::knit -> call_block -> block_exec -> in_dir -> evaluate -> evaluate::evaluate -> evaluate_call -> timing_fn -> handle -> dataDWD: 1 file already existing and not downloaded again: 'daily_kl_recent_tageswerte_KL_04104_akt.zip'
## Now downloading 0 files...
# dataDWD: Get climate data from the German Weather Service (DWD) FTP-server.
# The desired .zip (or .txt) dataset is downloaded
# Complete file URL(s) (including base and filename.zip) as returned by 'selectDWD'
Regensburg <- readDWD(file2, varnames=TRUE)
# Read climate data that was downloaded with dataDWD
# The data is unzipped and subsequently, the file is read, processed and returned as a data.frame
# varnames: TRUE to obtain more informative column names
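# Subset of Zugspitze: date and daily mean air temperature (columns 2 and 14)
tempZugspitze <- Zugspitze[, c("MESS_DATUM", "TMK.Lufttemperatur")]
# Subset of Regensburg: date and daily mean air temperature (columns 2 and 14)
tempRegensburg <- Regensburg[, c("MESS_DATUM", "TMK.Lufttemperatur")]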
Statistics of Zugspitze
length(Zugspitze$MESS_DATUM) # number of data points
## [1] 550
mean(Zugspitze$TMK.Lufttemperatur) # mean of air temperature
## [1] -1.525636
max(Zugspitze$TMK.Lufttemperatur) # max air temperature
## [1] 13.6
min(Zugspitze$TMK.Lufttemperatur) # min air temperature
## [1] -21
mean(Zugspitze$SHK_TAG.Schneehoehe, na.rm=TRUE) # mean snow depth (cm)
## [1] 191.0712
Statistics of Regensburg
length(Regensburg$MESS_DATUM) # number of data points
## [1] 550
mean(Regensburg$TMK.Lufttemperatur) # mean air temperature
## [1] 12.59127
max(Regensburg$TMK.Lufttemperatur) # max air temperature
## [1] 27.3
min(Regensburg$TMK.Lufttemperatur) # min air temperature
## [1] -5.4
mean(Regensburg$SHK_TAG.Schneehoehe, na.rm=TRUE) # mean snow depth (cm)
## [1] 0.7896825
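The same four statistics are computed separately for each station above, and only the snow depth uses na.rm=TRUE. Here is a sketch of a small hypothetical helper, summarise_temp(), that reports all statistics at once and skips missing days consistently. It is demonstrated on made-up temperatures containing one NA.

```r
# Hypothetical daily temperatures with one missing day (NA)
temp <- c(-3.5, 0.5, NA, 2.0, 6.0)

# Summary statistics for one weather variable; na.rm = TRUE skips missing days
summarise_temp <- function(x) {
  c(n    = length(x),
    n_na = sum(is.na(x)),
    mean = mean(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE))
}

temp_summary <- summarise_temp(temp)
temp_summary
```

Applied to the downloaded data, sapply(list(Zugspitze = Zugspitze$TMK.Lufttemperatur, Regensburg = Regensburg$TMK.Lufttemperatur), summarise_temp) would put both stations side by side in one table.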
Comparison of the statistics: Zugspitze vs. Regensburg
Air temperature (°C):
Zugspitze: -1.53 (mean), 13.6 (max), -21 (min)
Regensburg: 12.59 (mean), 27.3 (max), -5.4 (min)
Snow depth (cm):
Zugspitze: 191 (mean)
Regensburg: 0.79 (mean)
# Plot of Zugspitze & Regensburg
par(mar=c(4,4,2,0.5), mgp=c(2.7, 0.8, 0), cex=0.8)
# Zugspitze in blue; ylim extended so that the minimum of -21 °C is not cut off
plot(tempZugspitze, type="l", ylim=c(-25,30), col="blue", xaxt="n", las=1,
     main="Daily temp Zugspitze vs. Regensburg")
# add Regensburg in red (lines() only draws into the existing plot, so no title or axis arguments)
lines(tempRegensburg, col="red")
berryFunctions::monthAxis() ; abline(h=0)
# legend colours must match the line colours above
legend("top", c("Zugspitze","Regensburg"), cex=.8, col=c("blue","red"), lty=c(1,1))
View(Zugspitze)
# The data format is wide, because all measurements of one day (the subject's repeated responses) are in a single row, and each measured variable is in a separate column.