suppressPackageStartupMessages(library('BBmisc'))
pkgs <- c('knitr', 'kableExtra', 'devtools', 'lubridate', 'data.table', 'tidyquant', 'stringr', 'magrittr', 'tidyverse', 'plyr', 'dplyr', 'broom', 'highcharter', 'formattable', 'DT', 'httr', 'openxlsx')
suppressAll(lib(pkgs))  # BBmisc::lib() installs (if needed) and attaches every package quietly
#funs <- c('')
#l_ply(funs, function(x) source(paste0('./function/', x)))
options(warn = -1)  # silence warnings in the rendered document
rm(pkgs)
This course provides a rigorous introduction to the R programming language, with a particular focus on using R for software development in a data science setting. Whether you are part of a data science team or working individually within a community of developers, this course will give you the knowledge of R needed to make useful contributions in those settings. As the first course in the Specialization, the course provides the essential foundation of R needed for the following courses. We cover basic R concepts and language fundamentals, key concepts like tidy data and related “tidyverse” tools, processing and manipulation of complex and large datasets, handling textual data, and basic data science tasks. Upon completing this course, learners will have fluency at the R console and will be able to create tidy datasets from a wide range of possible data sources.
Kindly refer to the manual Mastering Software Development in R (web-based), or Mastering Software Development in R.pdf.
Dynamic Documents for R using R Markdown introduces some useful functions and packages for R users.
Kindly refer to the Mastering Software Development in R Specialization for an overview of all the courses in the R specialization.
swirl::install_course("The R Programming Environment")
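Once the course is installed, start swirl interactively and pick it from the menu:
library(swirl)
swirl()  # select "The R Programming Environment", then one of the lessons below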
# 1: Setting Up Swirl <- 'uYaZJxzcqIQ1y6cKCB4e'
# 2: Basic Building Blocks <- 'htv8enScWY5qMnuWwNXV'
# 3: Sequences of Numbers <- 'H8yEkfT49PAQ1qroHCRn'
# 4: Vectors <- 'nwXVEoJ1smaLlEHl0DIm'
# 5: Missing Values <- 'nMVLxE9f6vDEIjixH7ZQ'
# 6: Subsetting Vectors <- 'fBPFjy5E34DWbK51f0qh'
# 7: Matrices and Data Frames <- 'qGozNYaCJ5f3TdYEKVbn'
# 8: Logic <- 'r4PQMCG9k2a3RPsIJfHy'
# 9: Workspace and Files <- 'F1BHKCEnAeirZMc0YCSq'
#10: Reading Tabular Data <- 'B8Y4eKmHny5uivnpYJ05'
#11: Looking at Data <- 'katPyvJxaPMD1q8Ug6EF'
#12: Data Manipulation <- 'NopXxlF40CchYURJEbVg'
#13: Text Manipulation Functions <- 'uv2hFE4U3wjdMRfHVGY0'
#14: Regular Expressions <- 'Djrf113sbiWxZTfg7Vek'
#15: The stringr Package <- 'mixvNacXgbIdBnzauoNt'
Some of the datasets described in the readings are not easily accessible, so we have made them available for download in this reading. Attached to this reading is a zip file ("datasets.zip")
that contains some of the data referred to in the readings of this course.
# ---------------- eval=FALSE ------------------------
# Note: these signed CloudFront links carry an `Expires` parameter and will
# eventually stop working; grab fresh links from the course page if needed.
if (!dir.exists('data')) dir.create('data')
lnk <- 'https://d3c33hcgiwev3.cloudfront.net/_c51e36d1897263b6cfadc7de150f05a1_datasets.zip?Expires=1538697600&Signature=R63JeOjOgKMZuQZh6aXKWTLIHXPNJoE--Zxem6jL6XoNjWGGhXgdX9HDPzRUUNbnlrDTxnrzHehRZZEzv2B5hP2ZlwJljDocWHIlu7mCr09VtF~2Yz4Lyvl7mr9DaFyVdKTBmyxN6KXYtAj9MGZTyNPvixx41XgWc-s2Y-CLEcU_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A'
download.file(lnk, destfile = 'data/datasets.zip')
unzip(zipfile = 'data/datasets.zip', exdir = './data')
lnk2 <- 'https://d3c33hcgiwev3.cloudfront.net/_f018d9fe5547b1a722ce260af0fa71af_quiz_data.zip?Expires=1538784000&Signature=LPxLR-yO3YS-caxtkS7lE5nOM4QQiRWMTiVVvaJZ8jhIY6ydzqHe7UmUcgCj-GiYcxCMEVBHdjRHX4eaa-96fbJSLB0AFYPUZxbRJyBS7SN7c3CHLnEvGyvmvj5qayqb8cib~yvxUhYlj1qdp5Xo874U-u6J0VdqZFfC69xSayo_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A'
ext_tracks_file <- paste0('http://rammb.cira.colostate.edu/research/',
'tropical_cyclones/tc_extended_best_track_dataset/',
'data/ebtrk_atlc_1988_2015.txt')
zika_file <- paste0('https://raw.githubusercontent.com/cdcepi/zika/master/',
'Brazil/COES_Microcephaly/data/COES_Microcephaly-2016-06-25.csv')
library(httr)
meso_url <- 'https://mesonet.agron.iastate.edu/cgi-bin/request/asos.py/'
denver <- GET(url = meso_url,
              query = list(station = 'DEN',
                           data = 'sped',
                           year1 = '2016', month1 = '6', day1 = '1',
                           year2 = '2016', month2 = '6', day2 = '30',
                           tz = 'America/Denver',
                           format = 'comma')) %>%
  content() %>%
  read_csv(skip = 5, na = 'M')
denver %>% slice(1:3)
# A tibble: 3 × 3
# station valid sped
# <chr> <dttm> <dbl>
#1 DEN 2016-06-01 00:00:00 9.2
#2 DEN 2016-06-01 00:05:00 9.2
#3 DEN 2016-06-01 00:10:00 6.9
readr function | Use |
---|---|
read_csv | Reads comma-separated file |
read_csv2 | Reads semicolon-separated file |
read_tsv | Reads tab-separated file |
read_delim | General function for reading delimited files |
read_fwf | Reads fixed-width files |
read_log | Reads log files |
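Each of these follows the same interface as read_csv above. A short sketch with hypothetical file paths (the files are placeholders, not course data):
# Semicolon-delimited, as used in locales where ',' is the decimal mark
dat1 <- read_csv2('data/example_semicolon.csv')  # hypothetical path
# Any other single-character delimiter via the general function
dat2 <- read_delim('data/example_pipe.txt', delim = '|')  # hypothetical path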
Operator | Meaning | Example |
---|---|---|
== | Equals | storm_name == "KATRINA" |
!= | Does not equal | min_pressure != 0 |
> | Greater than | latitude > 25 |
>= | Greater than or equal to | max_wind >= 160 |
< | Less than | min_pressure < 900 |
<= | Less than or equal to | distance_to_land <= 0 |
%in% | Included in | storm_name %in% c("KATRINA", "ANDREW") |
is.na() | Is a missing value | is.na(radius_34_ne) |
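These operators are typically used inside dplyr::filter(). A sketch, assuming the extended best track file above has been read into a data frame named ext_tracks with the column names shown in the table:
# Katrina observations with a recorded central pressure below 920 mb
ext_tracks %>%
  dplyr::filter(storm_name == 'KATRINA', min_pressure != 0, min_pressure < 920)
# Observations belonging to either of two storms
ext_tracks %>%
  dplyr::filter(storm_name %in% c('KATRINA', 'ANDREW'))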
# The mutate function in dplyr can be used to add new columns to a data frame or change existing columns. As an example, I'll use the worldcup dataset from the faraway package, which contains statistics from the 2010 World Cup. To load this example data frame, you can run:
library(faraway)
data(worldcup)
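A minimal sketch of mutate itself on this data (worldcup stores player names as row names; the same trick is used again further down):
worldcup %>%
  mutate(Name = rownames(worldcup),   # promote row names to a proper column
         Time_hours = Time / 60) %>%  # minutes played, rescaled to hours
  head(3)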
library(tidyr)
library(ggplot2)
worldcup %>%
select(Position, Time, Shots, Tackles, Saves) %>%
gather(Type, Number, -Position, -Time) %>%
ggplot(aes(x = Time, y = Number)) +
geom_point() +
facet_grid(Type ~ Position)
library(knitr)
# Summarize the data to create the summary statistics you want
wc_table <- worldcup %>%
filter(Team %in% c("Spain", "Netherlands", "Uruguay", "Germany")) %>%
select(Team, Position, Passes) %>%
group_by(Team, Position) %>%
summarize(ave_passes = mean(Passes),
min_passes = min(Passes),
max_passes = max(Passes),
pass_summary = paste0(round(ave_passes), " (",
min_passes, ", ",
max_passes, ")")) %>%
select(Team, Position, pass_summary)
# What the data looks like before using `spread`
wc_table
Source: local data frame [16 x 3]
Groups: Team [4]
Team Position pass_summary
<fctr> <fctr> <chr>
1 Germany Defender 190 (44, 360)
2 Germany Forward 90 (5, 217)
3 Germany Goalkeeper 99 (99, 99)
4 Germany Midfielder 177 (6, 423)
5 Netherlands Defender 182 (30, 271)
6 Netherlands Forward 97 (12, 248)
7 Netherlands Goalkeeper 149 (149, 149)
8 Netherlands Midfielder 170 (22, 307)
9 Spain Defender 213 (1, 402)
10 Spain Forward 77 (12, 169)
11 Spain Goalkeeper 67 (67, 67)
12 Spain Midfielder 212 (16, 563)
13 Uruguay Defender 83 (22, 141)
14 Uruguay Forward 100 (5, 202)
15 Uruguay Goalkeeper 75 (75, 75)
16 Uruguay Midfielder 100 (1, 252)
# Use spread to create a prettier format for a table
wc_table %>%
spread(Position, pass_summary) %>%
kable()
Team | Defender | Forward | Goalkeeper | Midfielder |
---|---|---|---|---|
Germany | 190 (44, 360) | 90 (5, 217) | 99 (99, 99) | 177 (6, 423) |
Netherlands | 182 (30, 271) | 97 (12, 248) | 149 (149, 149) | 170 (22, 307) |
Spain | 213 (1, 402) | 77 (12, 169) | 67 (67, 67) | 212 (16, 563) |
Uruguay | 83 (22, 141) | 100 (5, 202) | 75 (75, 75) | 100 (1, 252) |
#teams <- read_csv('data/team_standings.csv')
team_standings <- read_csv('data/team_standings.csv')
left_join(worldcup, team_standings, by = 'Team')  # join team standings onto the worldcup data
worldcup %>%
mutate(Name = rownames(worldcup),
Team = as.character(Team)) %>%
select(Name, Position, Shots, Team) %>%
arrange(desc(Shots)) %>%
slice(1:5) %>%
left_join(team_standings, by = "Team") %>% # Merge in team standings
rename("Team Standing" = Standing) %>%
kable()
Name | Position | Shots | Team | Team Standing |
---|---|---|---|---|
Gyan | Forward | 27 | Ghana | 7 |
Villa | Forward | 22 | Spain | 1 |
Messi | Forward | 21 | Argentina | 5 |
Suarez | Forward | 19 | Uruguay | 4 |
Forlan | Forward | 18 | Uruguay | 4 |
## https://jeevanyue.github.io/post/2018-01-08-read_data_in_r/
if(!file.exists('data/daily_SPEC_2014.csv')) unzip(zipfile = 'data/daily_SPEC_2014.zip', exdir = './data')
pth <- ifelse(dir.exists('The R Programming Environment'),
file.path('The R Programming Environment', 'data'),
file.path('data'))
lf <- list.files(pth, pattern = '.xlsx$|.csv$')
Qz <- llply(lf, function(x) {
if(grepl('.csv$', x)) {
fread(paste0(pth, '/', x)) %>% tbl_df
} else {
read.xlsx(paste0(pth, '/', x)) %>% tbl_df
}
})
names(Qz) <- lf %>% str_replace_all('\\.[a-z]{3,4}$', '')
Qzdf <- Qz$`daily_SPEC_2014` %>%
ddply(.(`State Name`, `Parameter Name`), summarize,
AM = mean(`Arithmetic Mean`, na.rm = TRUE)) %>%
tbl_df
file.remove('data/daily_SPEC_2014.csv')
## [1] TRUE
## Question 1
Q1 <- Qzdf %>%
dplyr::filter(`State Name` == 'Wisconsin' &
`Parameter Name` == 'Bromine PM2.5 LC')
Q1
## # A tibble: 1 x 3
## `State Name` `Parameter Name` AM
## <chr> <chr> <dbl>
## 1 Wisconsin Bromine PM2.5 LC 0.00396
## Question 2
Q2 <- Qzdf %>%
dplyr::filter(`Parameter Name` == 'EC2 PM2.5 LC'|
`Parameter Name` == 'Sodium PM2.5 LC'|
`Parameter Name` == 'Sulfur PM2.5 LC'|
`Parameter Name` == 'OC CSN Unadjusted PM2.5 LC TOT') %>%
tbl_df %>%
arrange(desc(AM))
Q2
## # A tibble: 157 x 3
## `State Name` `Parameter Name` AM
## <chr> <chr> <dbl>
## 1 District Of Columbia OC CSN Unadjusted PM2.5 LC TOT 411.
## 2 Michigan OC CSN Unadjusted PM2.5 LC TOT 3.44
## 3 Illinois OC CSN Unadjusted PM2.5 LC TOT 2.46
## 4 Missouri OC CSN Unadjusted PM2.5 LC TOT 2.41
## 5 Maine OC CSN Unadjusted PM2.5 LC TOT 2.27
## 6 Texas OC CSN Unadjusted PM2.5 LC TOT 2.18
## 7 Nevada OC CSN Unadjusted PM2.5 LC TOT 1.31
## 8 Ohio Sulfur PM2.5 LC 0.811
## 9 Kentucky Sulfur PM2.5 LC 0.799
## 10 Indiana Sulfur PM2.5 LC 0.790
## # ... with 147 more rows
## Question 3
Q3 <- Qz$`daily_SPEC_2014` %>%
ddply(.(`State Code`, `County Code`, `Site Num`, `Parameter Name`),
summarize, AM = mean(`Arithmetic Mean`, na.rm = TRUE)) %>%
tbl_df
Q3 %<>% dplyr::filter(`Parameter Name` == 'Sulfate PM2.5 LC') %>%
tbl_df %>%
arrange(desc(AM))
Q3
## # A tibble: 358 x 5
## `State Code` `County Code` `Site Num` `Parameter Name` AM
## <int> <int> <int> <chr> <dbl>
## 1 39 81 17 Sulfate PM2.5 LC 3.18
## 2 42 3 64 Sulfate PM2.5 LC 3.06
## 3 54 39 1005 Sulfate PM2.5 LC 2.94
## 4 18 19 6 Sulfate PM2.5 LC 2.74
## 5 39 153 23 Sulfate PM2.5 LC 2.71
## 6 39 35 60 Sulfate PM2.5 LC 2.64
## 7 39 87 12 Sulfate PM2.5 LC 2.64
## 8 54 51 1002 Sulfate PM2.5 LC 2.62
## 9 21 111 67 Sulfate PM2.5 LC 2.55
## 10 18 37 2001 Sulfate PM2.5 LC 2.52
## # ... with 348 more rows
## Question 4
Q4 <- Qz$`daily_SPEC_2014` %>%
ddply(.(`State Name`, `Parameter Name`),
summarize, AM = mean(`Arithmetic Mean`, na.rm = TRUE)) %>%
tbl_df
Q4 %<>% dplyr::filter(
(`State Name` == 'California' | `State Name` == 'Arizona') &
`Parameter Name` == 'EC PM2.5 LC TOR') %>%
tbl_df %>%
arrange(desc(AM))
Q4
## # A tibble: 2 x 3
## `State Name` `Parameter Name` AM
## <chr> <chr> <dbl>
## 1 California EC PM2.5 LC TOR 0.198
## 2 Arizona EC PM2.5 LC TOR 0.179
## difference between the highest and the second-highest state average
Q4$AM[1] - Q4$AM[2]
## [1] 0.01856696
## Question 5
Q5 <- Qz$`daily_SPEC_2014` %>% dplyr::filter(
Longitude < -100 & `Parameter Name` == 'OC PM2.5 LC TOR') %>%
tbl_df
median(Q5$`Arithmetic Mean`, na.rm = TRUE)
## [1] 0.43
## Question 6
Q6 <- Qz$`aqs_sites`
Q6 %<>% dplyr::count(Land.Use, Location.Setting) %>%
dplyr::filter(Land.Use == 'RESIDENTIAL' & Location.Setting == 'SUBURBAN')
Q6
## # A tibble: 1 x 3
## Land.Use Location.Setting n
## <chr> <chr> <int>
## 1 RESIDENTIAL SUBURBAN 3527
## Question 7
Qz7 <- left_join(Qz$`aqs_sites`, Qz$`daily_SPEC_2014`)
## Joining, by = c("Latitude", "Longitude", "Datum", "Address")
Q7 <- Qz7 %>%
dplyr::filter(Longitude >= -100 & `Parameter Name` == 'EC PM2.5 LC TOR' &
Land.Use == 'RESIDENTIAL' & Location.Setting == 'SUBURBAN') %>%
.$`Arithmetic Mean` %>% median(na.rm = TRUE)
Q7
## [1] 0.61
## Question 8
Q8 <- Qz7 %>%
dplyr::filter(`Parameter Name` == 'Sulfate PM2.5 LC' &
Land.Use == 'COMMERCIAL')
Q8 %<>% mutate(month = lubridate::month(ymd(`Date Local`))) %>%
select(month, `Arithmetic Mean`) %>%
ddply(.(month), summarize, `Arithmetic Mean` = mean(`Arithmetic Mean`, na.rm=TRUE)) %>%
arrange(desc(`Arithmetic Mean`))
Q8
## month Arithmetic Mean
## 1 2 2.021325
## 2 3 1.805260
## 3 7 1.777605
## 4 8 1.761226
## 5 6 1.750571
## 6 9 1.645010
## 7 4 1.567614
## 8 5 1.558096
## 9 12 1.537649
## 10 1 1.316738
## 11 10 1.313770
## 12 11 1.295837
## Question 9
Q9 <- Qz7 %>%
dplyr::filter((`Parameter Name` == 'Sulfate PM2.5 LC' |
`Parameter Name` == 'Total Nitrate PM2.5 LC') &
State.Code == 6 & County.Code == 65 &
Site.Number == 8001)
Q9 %<>% mutate(`Date Local` = ymd(`Date Local`)) %>%
select(`Date Local`, `Arithmetic Mean`) %>%
ddply(.(`Date Local`), summarize,
`Arithmetic Mean` = sum(`Arithmetic Mean`, na.rm=TRUE)) %>%
dplyr::filter(`Arithmetic Mean` > 10) %>% tbl_df
Q9 #nrow(Q9) = 37
## # A tibble: 37 x 2
## `Date Local` `Arithmetic Mean`
## <date> <dbl>
## 1 2014-01-11 40.8
## 2 2014-01-29 51.5
## 3 2014-02-10 23.6
## 4 2014-02-16 19.8
## 5 2014-02-19 17.7
## 6 2014-02-22 12.4
## 7 2014-02-25 16.9
## 8 2014-03-06 34.1
## 9 2014-03-24 30.2
## 10 2014-04-05 11.6
## # ... with 27 more rows
## Question 10
Q10 <- Qz7 %>%
dplyr::filter(`Parameter Name` == 'Sulfate PM2.5 LC' |
`Parameter Name` == 'Total Nitrate PM2.5 LC')
Q10 %<>% ddply(.(State.Code, County.Code, Site.Number, `Parameter Name`),
summarize,
`Arithmetic Mean` = mean(`Arithmetic Mean`, na.rm=TRUE)) %>%
tbl_df
Q10 %<>% spread(`Parameter Name`, `Arithmetic Mean`)
## The quiz does not say which correlation method or missing-value option to use, so I compute the correlation under every combination of `use` and `method`.
use <- c('everything', 'all.obs', 'complete.obs', 'na.or.complete', 'pairwise.complete.obs')
method <- c('pearson', 'kendall', 'spearman')
corr <- llply(use, function(x) {
llply(method, function(y) {
Q10 %>% mutate(
use = x, method = y,
corr = tryCatch(cor(`Sulfate PM2.5 LC`, `Total Nitrate PM2.5 LC`,
use = x, method = y), error=function(e) NA))
}) %>% bind_rows
}) %>% bind_rows
corr
## # A tibble: 5,355 x 8
## State.Code County.Code Site.Number `Sulfate PM2.5 ~ `Total Nitrate ~
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 73 23 2.06 0.644
## 2 1 73 2003 2.36 0.607
## 3 1 79 2 1.87 0.557
## 4 1 89 14 1.98 0.911
## 5 1 101 1002 1.94 0.484
## 6 1 113 1 1.87 0.543
## 7 10 1 3 2.03 2.78
## 8 10 3 2004 1.96 1.56
## 9 11 1 42 1.85 1.05
## 10 11 1 43 1.94 1.34
## # ... with 5,345 more rows, and 3 more variables: use <chr>, method <chr>,
## # corr <dbl>
## Keep the distinct correlation values across all combinations.
sumr <- corr %>% na.omit %>%
dplyr::select(use, method, corr) %>% unique
sumr
## # A tibble: 9 x 3
## use method corr
## <chr> <chr> <dbl>
## 1 complete.obs pearson 0.526
## 2 complete.obs kendall 0.506
## 3 complete.obs spearman 0.685
## 4 na.or.complete pearson 0.526
## 5 na.or.complete kendall 0.506
## 6 na.or.complete spearman 0.685
## 7 pairwise.complete.obs pearson 0.526
## 8 pairwise.complete.obs kendall 0.506
## 9 pairwise.complete.obs spearman 0.685
## Here I compare the correlation methods.
sapply(method, function(x) {
xx <- na.omit(corr)
suppressWarnings(cor.test(
xx$`Sulfate PM2.5 LC`,
xx$`Total Nitrate PM2.5 LC`,
method = x))
})
## $pearson
##
## Pearson's product-moment correlation
##
## data: xx$`Sulfate PM2.5 LC` and xx$`Total Nitrate PM2.5 LC`
## t = 34.717, df = 3148, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5004545 0.5509808
## sample estimates:
## cor
## 0.5261819
##
##
## $kendall
##
## Kendall's rank correlation tau
##
## data: xx$`Sulfate PM2.5 LC` and xx$`Total Nitrate PM2.5 LC`
## z = 42.48, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.5061482
##
##
## $spearman
##
## Spearman's rank correlation rho
##
## data: xx$`Sulfate PM2.5 LC` and xx$`Total Nitrate PM2.5 LC`
## S = 1642600000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6846733
## Tidy the cor.test results into one table with broom::tidy()
llply(method, function(x) {
xx <- na.omit(corr)
suppressWarnings(cor.test(
xx$`Sulfate PM2.5 LC`,
xx$`Total Nitrate PM2.5 LC`,
method = x)) %>% broom::tidy()
}) %>% bind_rows
## # A tibble: 3 x 8
## estimate statistic p.value parameter conf.low conf.high method
## <dbl> <dbl> <dbl> <int> <dbl> <dbl> <chr>
## 1 0.526 3.47e1 6.93e-224 3148 0.500 0.551 Pears~
## 2 0.506 4.25e1 0. NA NA NA Kenda~
## 3 0.685 1.64e9 0. NA NA NA Spear~
## # ... with 1 more variable: alternative <chr>
Q10B <- corr %>% na.omit %>%
dplyr::select(-`Sulfate PM2.5 LC`, -`Total Nitrate PM2.5 LC`) %>%
tbl_df %>% unique %>%
dplyr::filter(State.Code %in% c(2,5,16,42) &
County.Code %in% c(37,45,90,113) &
Site.Number %in% c(2,3,35)) %>%
arrange(desc(corr))
Q10B %>% ddply(.(method), head, corr = corr, 1)
## State.Code County.Code Site.Number use method
## 1 16 37 2 complete.obs kendall
## 2 16 37 2 pairwise.complete.obs pearson
## 3 16 37 2 complete.obs spearman
## corr
## 1 0.5061482
## 2 0.5261819
## 3 0.6846733
Pearson r correlation: Pearson r correlation is the most widely used correlation statistic to measure the degree of the relationship between linearly related variables. For example, in the stock market, if we want to measure how two stocks are related to each other, Pearson r correlation is used to measure the degree of relationship between the two. The point-biserial correlation is conducted with the Pearson correlation formula except that one of the variables is dichotomous. The following formula is used to calculate the Pearson r correlation:
$$r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$
where $r$ = Pearson correlation coefficient, $N$ = number of observations, $\sum xy$ = sum of the products of paired scores, $\sum x$ and $\sum y$ = sums of the x and y scores, and $\sum x^2$ and $\sum y^2$ = sums of the squared x and y scores.
Types of research questions a Pearson correlation can examine: for example, is there a statistically significant relationship between age, measured in years, and height, measured in inches?
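To make the formula concrete, here is a minimal sketch on two made-up vectors (not course data), checking the raw-score formula against base R's cor():
x <- c(1, 3, 5, 7, 9)
y <- c(2, 1, 4, 3, 5)
N <- length(x)
r_manual <- (N * sum(x * y) - sum(x) * sum(y)) /
  sqrt((N * sum(x^2) - sum(x)^2) * (N * sum(y^2) - sum(y)^2))
r_manual
## [1] 0.8
cor(x, y, method = 'pearson')
## [1] 0.8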
Kendall rank correlation: Kendall rank correlation is a non-parametric test that measures the strength of dependence between two variables. If we consider two samples, a and b, each of size n, the total number of pairings between a and b is n(n-1)/2. The following formula is used to calculate the value of Kendall rank correlation:
$$\tau = \frac{N_c - N_d}{\frac{1}{2}\,n(n-1)}$$
where $N_c$ = number of concordant pairs and $N_d$ = number of discordant pairs.
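Again as a sketch on the same made-up vectors, counting concordant and discordant pairs by brute force (this is tau-a, which agrees with cor() when there are no ties):
x <- c(1, 3, 5, 7, 9)
y <- c(2, 1, 4, 3, 5)
n <- length(x)
idx <- combn(n, 2)  # all n(n-1)/2 index pairs
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
Nc <- sum(s > 0)  # concordant pairs
Nd <- sum(s < 0)  # discordant pairs
(Nc - Nd) / (n * (n - 1) / 2)
## [1] 0.6
cor(x, y, method = 'kendall')
## [1] 0.6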
Spearman rank correlation: Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables. The Spearman rank correlation test does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.
The following formula is used to calculate the Spearman rank correlation:
$$\rho = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)}$$
where $\rho$ = Spearman rank correlation, $d_i$ = difference between the ranks of corresponding values, and $n$ = number of observations.
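And the same check for Spearman: rank both made-up vectors and apply the formula (exact when there are no ties):
x <- c(1, 3, 5, 7, 9)
y <- c(2, 1, 4, 3, 5)
n <- length(x)
d <- rank(x) - rank(y)  # rank differences d_i
1 - 6 * sum(d^2) / (n * (n^2 - 1))
## [1] 0.8
cor(x, y, method = 'spearman')
## [1] 0.8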
Types of research questions a Spearman Correlation can examine:
Is there a statistically significant relationship between a horse's finishing position in a race and the horse's age?
A comparison of the Pearson and Spearman correlation methods
Pearson product-moment correlation: The Pearson correlation evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable.
For example, you might use a Pearson correlation to evaluate whether increases in temperature at your production facility are associated with decreasing thickness of your chocolate coating.
Spearman rank-order correlation: The Spearman correlation evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate. The Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data.
Spearman correlation is often used to evaluate relationships involving ordinal variables. For example, you might use a Spearman correlation to evaluate whether the order in which employees complete a test exercise is related to the number of months they have been employed.
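To see the difference on made-up data: an exponential relationship is monotonic but not linear, so Spearman reports a perfect association while Pearson does not.
x <- 1:10
y <- exp(x)  # monotonic but strongly non-linear
cor(x, y, method = 'pearson')   # noticeably below 1
cor(x, y, method = 'spearman')  # exactly 1: the ranks agree perfectly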
It is always a good idea to examine the relationship between variables with a scatterplot. Correlation coefficients only measure linear (Pearson) or monotonic (Spearman) relationships. Other relationships are possible.
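Applying that advice to the quiz data, a quick sketch using the Q10 data frame built above (after spread(), so each pollutant is its own column):
Q10 %>%
  ggplot(aes(x = `Sulfate PM2.5 LC`, y = `Total Nitrate PM2.5 LC`)) +
  geom_point(alpha = 0.4, na.rm = TRUE) +
  geom_smooth(method = 'lm', na.rm = TRUE) +
  ggtitle('Site-level annual means: Sulfate vs Total Nitrate PM2.5 LC')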
Whichever method and missing-value option we choose, the conclusion is the same (p-values always below 0.0001, correlation coefficients always greater than 0.5): Sulfate PM2.5 LC and Total Nitrate PM2.5 LC are positively correlated.
It’s useful to record some information about how your file was created.
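A minimal sketch of how to capture it (devtools is attached in the setup chunk; Sys.info() and Sys.time() are base R); the table below pairs the two outputs:
devtools::session_info()  # R version, OS, locale, attached packages
Sys.info()                # system name, release, machine, user
Sys.time()                # timestamp of when the document was rendered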
Category | session_info() | Category | Sys.info() |
---|---|---|---|
version | R version 3.5.1 (2018-07-02) | sysname | Windows |
system | x86_64, mingw32 | release | 10 x64 |
ui | RTerm | version | build 17134 |
language | en | nodename | RSTUDIO-SCIBROK |
collate | Japanese_Japan.932 | machine | x86-64 |
tz | Asia/Tokyo | login | scibr |
date | 2018-10-19 | user | scibr |
Current time | 2018-10-19 20:28:45 JST | effective_user | scibr |