Paired data are bivariate data whose two measurements are coupled or matched together, for example taken on the same subject. The two variables are not necessarily independent.
require(UsingR)
names(fat)
## [1] "case" "body.fat" "body.fat.siri" "density"
## [5] "age" "weight" "height" "BMI"
## [9] "ffweight" "neck" "chest" "abdomen"
## [13] "hip" "thigh" "knee" "ankle"
## [17] "bicep" "forearm" "wrist"
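# Scatterplot of neck against wrist measurements
plot(fat$wrist, fat$neck)
# Two panels side by side: all cases, then only ages 20 to 29
par(mfrow=c(1,2))
plot(neck~wrist, data=fat)
plot(neck~wrist, data=fat, subset = 20 <= age & age < 30)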
plot(fat$wrist, fat$neck)
abline(v=mean(fat$wrist))
abline(h=mean(fat$neck))
points(mean(fat$wrist), mean(fat$neck), pch=16, col=rgb(.35,0,0))
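If the two variables are positively related, most of the points should fall in the first and third quadrants formed by the two mean lines (upper right and lower left).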
cor(fat$wrist, fat$neck)
## [1] 0.7448264
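The picture with the two mean lines connects directly to the formula for the Pearson coefficient. As a reminder, the standard sample definition is given below, writing $x_i$ for the wrist and $y_i$ for the neck measurements (the symbols are chosen here for illustration and do not appear in the notes):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A point in the first or third quadrant around $(\bar{x}, \bar{y})$ contributes a positive product $(x_i - \bar{x})(y_i - \bar{y})$, and a point in the second or fourth quadrant contributes a negative one, so $r$ is positive when the first and third quadrants dominate.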
require(MASS)
plot(Animals$body,Animals$brain)
cor(Animals$body,Animals$brain)
## [1] -0.005341163
cor(rank(Animals$body), rank(Animals$brain))
## [1] 0.7162994
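The gap between the two values above is what one would expect when a few extreme observations dominate the Pearson coefficient while the ordering of the data is still largely monotone. The following is a minimal sketch: it checks that the rank-based value is the Spearman correlation that `cor` computes with `method = "spearman"`, and, as an added illustration that is not part of the original notes, looks at the correlation on a log-log scale.

```r
library(MASS)  # provides the Animals data set

# Pearson correlation on the raw scale
pearson_raw <- cor(Animals$body, Animals$brain)

# Spearman correlation is the Pearson correlation of the ranks
spearman <- cor(Animals$body, Animals$brain, method = "spearman")
by_ranks <- cor(rank(Animals$body), rank(Animals$brain))

# Assumption for illustration: logs bring the relationship closer to linear
pearson_log <- cor(log(Animals$body), log(Animals$brain))

print(c(raw = pearson_raw, spearman = spearman, log_log = pearson_log))
```

Because ranks and logs both preserve ordering, they are far less sensitive to a handful of very large body weights than the raw-scale coefficient.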
Note: Countries with higher per capita chocolate consumption have more Nobel laureates per capita. A tempting but wrong conclusion: chocolate consumption causes better scientific research!
Spurious: Facebook use and the marks of its users.
Causality: Smoking and lung cancer; wine and heart risk.
The Pearson correlation coefficient is a measure of the linearity of the (possible) relationship between two variables X and Y. Even if the correlation coefficient is high, that does not mean there is a causal relationship between X and Y: correlation does not tell you cause and effect.
Care must be taken when correlation is used for predictive purposes. Establishing causality requires domain knowledge and a well-designed controlled experiment.
Squared loss:
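For a line with intercept $a$ and slope $b$ fitted to points $(x_i, y_i)$ (symbols assumed here, since the notes do not name them), the squared loss minimised by `lm` and the absolute loss minimised by `ABSMINLINE` further down can be written as:

```latex
L_2(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2, \qquad
L_1(a, b) = \sum_{i=1}^{n} \lvert y_i - a - b x_i \rvert
```

Squared loss penalises large residuals more heavily than absolute loss, so outliers pull the least-squares line more strongly than the least-absolute-deviation line.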
#y = read.csv("annual_temp.csv", header=TRUE)
#head(y)
#plot(Temp ~ CO2, data=y, pch=19,cex=.5, col="#440154")
#abline(lm(Temp ~ CO2, data=y), col="#21918c")
require(MASS)
plot(calls ~ year, data=phones, pch=19,cex=.5, col="#440154")
abline(lm(calls ~ year, data=phones), col="#21918c")
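# Sum of absolute residuals for the line with intercept x[1] and slope x[2]
ABSMINLINE = function(x)
{ with(phones, sum(abs(calls - x[1] - x[2]*year))) }
# Minimise the absolute loss numerically (optim defaults to Nelder-Mead), starting from c(0,0)
OPTIMAL = optim(c(0,0), fn = ABSMINLINE)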
abline(lm(calls ~ year, data=phones), col="#21918c")
abline(OPTIMAL$par, col="#3b528b")
# The default limit of 20 iterations is not enough for rlm to converge here, so raise maxit
abline(rlm(calls ~ year, data=phones, maxit=50), col="#fc8961")
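The three lines drawn above can also be compared numerically. The sketch below is one way to do that, under stated assumptions: the helper names `sq_loss` and `abs_loss` are hypothetical, the absolute-loss search is started from the least-squares coefficients rather than from `c(0,0)`, and `maxit = 50` is a choice made here to give `rlm` more iterations than its default.

```r
library(MASS)  # provides phones and rlm()

# Loss functions for a line with intercept p[1] and slope p[2]
sq_loss  <- function(p) with(phones, sum((calls - p[1] - p[2] * year)^2))
abs_loss <- function(p) with(phones, sum(abs(calls - p[1] - p[2] * year)))

# Least-squares fit (minimises sq_loss)
ls_par <- unname(coef(lm(calls ~ year, data = phones)))

# Least-absolute-deviation fit; starting at the least-squares solution
# is an assumption of this sketch, not taken from the notes
lad_par <- optim(ls_par, fn = abs_loss)$par

# Robust M-estimate with more iterations than the default of 20
rob_par <- unname(coef(rlm(calls ~ year, data = phones, maxit = 50)))

comparison <- rbind(least_squares = ls_par, least_abs = lad_par, robust = rob_par)
colnames(comparison) <- c("intercept", "slope")
print(comparison)
print(c(abs_loss_ls = abs_loss(ls_par), abs_loss_lad = abs_loss(lad_par)))
```

Starting `optim` at the least-squares coefficients gives the search a sensible scale for the intercept and slope, and it means the absolute loss of the returned line cannot be worse than that of the least-squares line.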
par(mfrow=c(2,2))
plot(lm(calls ~ year, data=phones), col="#440154")