cor.test

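Tests for correlation between paired samples. By default it reports the Pearson correlation coefficient together with a t statistic, p-value and confidence interval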

x <- rnorm(10)
y <- rnorm(10)
x
 [1] -0.40674264  2.41954178  2.42796127 -0.28477615 -0.54452441  0.69373477
 [7]  0.85569233 -1.42600450 -0.06299629  0.94166159
y
 [1]  0.03117989  1.16086128  0.25655903  1.02594775 -0.82415760  0.32674814
 [7]  1.23306169  1.24575192  0.14762524  0.87471457

cor.test

plot(x,y)

cor.test

x and y must be numeric vectors of the same length

cor.test(x,y)
    Pearson's product-moment correlation

data:  x and y
t = 0.51281, df = 8, p-value = 0.6219
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5083266  0.7264281
sample estimates:
      cor 
0.1783965 
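
The t statistic and degrees of freedom in this output follow from the correlation estimate r and the sample size n: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. A minimal sketch, assuming freshly simulated data (the seed and the names a, b, res are illustrative and are not the vectors used above):

```r
set.seed(1)                      # illustrative seed, not from the slides
a <- rnorm(10)
b <- rnorm(10)

r <- cor(a, b)                   # Pearson correlation estimate
n <- length(a)

# t statistic for H0: true correlation is 0, with n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)   # two-sided p-value

res <- cor.test(a, b)
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```

Both comparisons should return TRUE, which shows that cor.test adds inference on top of the same estimate that cor returns.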

cor.test

z = x + rnorm(10, mean = 0, sd = 0.1)
plot(x,z)

cor.test

cor.test(x,z)
    Pearson's product-moment correlation

data:  x and z
t = 60.21, df = 8, p-value = 6.433e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9951623 0.9997495
sample estimates:
      cor 
0.9988985 

cor.test

test = cor.test(x,y)
test$statistic
        t 
0.5128077 
test$p.value
[1] 0.6219418
test$estimate
      cor 
0.1783965 
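
Besides statistic, p.value and estimate, the result holds a few more components. A short sketch of how they might be inspected, assuming simulated data (the seed and the names a, b, res are illustrative):

```r
set.seed(2)                      # illustrative seed, not from the slides
a <- rnorm(10)
b <- rnorm(10)
res <- cor.test(a, b)

names(res)                           # all components stored in the result
res$parameter                        # degrees of freedom (n - 2)
res$conf.int                         # confidence interval for the correlation
attr(res$conf.int, "conf.level")     # confidence level, 0.95 by default
```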

cor.test for matrices

x_matrix <- matrix(rnorm(30), nrow = 10)
dim(x_matrix)
[1] 10  3
apply(x_matrix, 2, function(x) cor.test(x,y)$estimate)
[1] -0.2527349  0.2279156 -0.1988432

cor

Calculates Pearson’s correlation coefficient: x and y can be numeric vectors, matrices or data frames

cor(x_matrix,y)
           [,1]
[1,] -0.2527349
[2,]  0.2279156
[3,] -0.1988432
apply(x_matrix, 2, function(x) cor.test(x,y)$estimate)
[1] -0.2527349  0.2279156 -0.1988432
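
cor is the shorter route when only the coefficients are needed, while the apply version can also collect p-values. A sketch under the assumption of simulated data (the seed and the names m, v, per_column are illustrative):

```r
set.seed(3)                          # illustrative seed, not from the slides
m <- matrix(rnorm(30), nrow = 10)    # 10 observations, 3 columns
v <- rnorm(10)

# One cor.test per column; keep the estimate and the p-value
per_column <- apply(m, 2, function(col) {
  res <- cor.test(col, v)
  c(estimate = unname(res$estimate), p.value = res$p.value)
})
per_column                 # 2 x 3 matrix: rows are estimate and p.value

# The estimate row matches what cor() returns for the whole matrix
all.equal(per_column["estimate", ], as.vector(cor(m, v)))
```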

cor

Calculates Pearson’s correlation coefficient

cor(x,y)
[1] 0.1783965

x and y can be numeric vectors, matrices or data frames

cor - dealing with missing values

x[2] = NA
x
 [1] -0.40674264          NA  2.42796127 -0.28477615 -0.54452441  0.69373477
 [7]  0.85569233 -1.42600450 -0.06299629  0.94166159
is.na(x)
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
which(is.na(x))
[1] 2

cor - dealing with missing values

cor(x,y)
[1] NA

cor

cor(x,y,use = "complete.obs")
[1] 0.00559633
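
What use = "complete.obs" does for two vectors can be reproduced by hand: keep only the positions where both values are present, then correlate. A minimal sketch, assuming simulated data (the seed and the names a, b, keep are illustrative):

```r
set.seed(4)                      # illustrative seed, not from the slides
a <- rnorm(10)
b <- rnorm(10)
a[2] <- NA

keep <- complete.cases(a, b)     # TRUE where both a and b are observed
manual  <- cor(a[keep], b[keep])
builtin <- cor(a, b, use = "complete.obs")
all.equal(manual, builtin)
```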

cor

x_matrix[1,1] = NA
cor(x_matrix,y)
           [,1]
[1,]         NA
[2,]  0.2279156
[3,] -0.1988432

cor

cor(x_matrix,y)
           [,1]
[1,]         NA
[2,]  0.2279156
[3,] -0.1988432
cor(x_matrix,y,use = "complete.obs")
           [,1]
[1,] -0.3989906
[2,]  0.3619255
[3,] -0.2409004

cor

cor(x_matrix[2:10,],y[2:10])
           [,1]
[1,] -0.3989906
[2,]  0.3619255
[3,] -0.2409004
cor(x_matrix,y,use = "complete.obs")
           [,1]
[1,] -0.3989906
[2,]  0.3619255
[3,] -0.2409004

cor

cor(x_matrix,y)
           [,1]
[1,]         NA
[2,]  0.2279156
[3,] -0.1988432
cor(x_matrix,y,use = "pairwise.complete.obs")
           [,1]
[1,] -0.3989906
[2,]  0.2279156
[3,] -0.1988432

cor

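use = "complete.obs" - removes the entire row if any NA value is found in that row
use = "pairwise.complete.obs" - for each pair of variables, removes only the rows where one of the two is NA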

cor(x_matrix[2:10,],y[2:10]) == cor(x_matrix,y,use = "complete.obs")
     [,1]
[1,] TRUE
[2,] TRUE
[3,] TRUE
cor(x_matrix[2:10,],y[2:10]) == cor(x_matrix,y,use = "pairwise.complete.obs")
      [,1]
[1,]  TRUE
[2,] FALSE
[3,] FALSE
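
The difference between the two options can be checked column by column. A sketch assuming simulated data with a single NA in column 1 (the seed and the names m, v, full, complete, pairwise are illustrative):

```r
set.seed(5)                         # illustrative seed, not from the slides
m <- matrix(rnorm(30), nrow = 10)
v <- rnorm(10)
m[1, 1] <- NA                       # a single missing value in column 1

full     <- cor(m[, 2:3], v)        # columns 2 and 3 have no NA: all 10 rows
complete <- cor(m, v, use = "complete.obs")
pairwise <- cor(m, v, use = "pairwise.complete.obs")

# Column 1: both options drop row 1, so they agree
all.equal(complete[1, 1], pairwise[1, 1])
# Columns 2 and 3: pairwise keeps all 10 rows, complete.obs only 9
all.equal(pairwise[2:3, 1], full[, 1])
all.equal(complete[2:3, 1], as.vector(cor(m[-1, 2:3], v[-1])))
```

The comparisons use all.equal rather than ==, since exact equality on floating-point results can fail for values that differ only by rounding.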

Linear regression

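lm(formula) fits a linear model; y ~ x regresses y on x
x[2] is still NA here, so lm drops that observation from the fit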

lm(y ~ x)
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   0.478887     0.003394  

Linear regression

fit = lm(y ~ x)
summary(fit)
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.3012 -0.3311 -0.1545  0.5480  0.7717 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.478887   0.248491   1.927   0.0953 .
x           0.003394   0.229233   0.015   0.9886  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7264 on 7 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  3.132e-05, Adjusted R-squared:  -0.1428 
F-statistic: 0.0002192 on 1 and 7 DF,  p-value: 0.9886
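
For a regression with a single predictor, the fit is closely tied to the correlation: the slope equals r * sd(y) / sd(x) and the multiple R-squared equals r^2. A minimal sketch, assuming simulated data (the seed, the 0.5 slope and the names a, b, fit are illustrative):

```r
set.seed(6)                        # illustrative seed, not from the slides
a <- rnorm(10)
b <- 0.5 * a + rnorm(10)           # assumed relationship, for illustration only

fit <- lm(b ~ a)
coef(fit)                          # intercept and slope as a named vector
slope <- unname(coef(fit)["a"])

r <- cor(a, b)
all.equal(slope, r * sd(b) / sd(a))        # slope = r * sd(y) / sd(x)
all.equal(summary(fit)$r.squared, r^2)     # R-squared = r^2
```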

Sorting vectors

sample = sample.int(10,5)
sample
[1]  6  8  5  2 10

Sorting vectors

sample
[1]  6  8  5  2 10
order(sample)
[1] 4 3 1 2 5

order returns the permutation of indices that rearranges its first argument into ascending order (or descending order with decreasing = TRUE)

sample[order(sample)]
[1]  2  5  6  8 10

Sorting vectors: order

order(sample)
[1] 4 3 1 2 5
order(sample, decreasing = TRUE)
[1] 5 2 1 3 4
sample[order(sample, decreasing = TRUE)]
[1] 10  8  6  5  2
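
Because order returns indices rather than sorted values, the same indices can reorder a second vector in step with the first. A short sketch, using the same five numbers as above and a hypothetical labels vector:

```r
values <- c(6, 8, 5, 2, 10)               # same numbers as the sample above
labels <- c("a", "b", "c", "d", "e")      # hypothetical labels, one per value

idx <- order(values)
values[idx]                               # 2 5 6 8 10
labels[idx]                               # "d" "c" "a" "b" "e"

identical(sort(values), values[idx])      # sort() is the shortcut for this
identical(sort(values, decreasing = TRUE),
          values[order(values, decreasing = TRUE)])
```

When only the sorted values are needed, sort is enough; order is the tool when other data has to follow the same arrangement.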