These equations are similar to the calculation of the variance of a random variable.
\[\sigma_{X}^{2}=E\left[(X-\mu_{X})^{2}\right]\]
Whereas variance cannot be negative, covariance can.
E is the expected value operator. For population calculations, \(E(X)\) and \(\mu_{X}\) are the same thing. The sample covariance works a bit differently: the sum of cross products is divided by \(n-1\) rather than \(n\).
#Set up X and Y
X=c(1,5,9)
Y=c(9,5,1)
#Sample covariance
cov(X,Y)
[1] -16
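To see where the -16 comes from, the sample covariance can be computed by hand. This is a minimal sketch that assumes the usual \(n-1\) denominator, which is what R's `cov` uses; the names `n` and `manual_cov` are illustrative only.

```r
# Same data as in the example above
X <- c(1, 5, 9)
Y <- c(9, 5, 1)
n <- length(X)

# Deviations from the sample means, multiplied pairwise and summed,
# then divided by n - 1 (not n) for the sample covariance
manual_cov <- sum((X - mean(X)) * (Y - mean(Y))) / (n - 1)

manual_cov   # -16
cov(X, Y)    # -16, same as the built-in
```

The deviations are (-4, 0, 4) and (4, 0, -4), so the cross products sum to -32, and dividing by \(n-1=2\) gives -16.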
#If you change the order of the (X,Y) pairs, the covariance does not change.
#It will change if the order of X or Y by itself is changed.
#Set up X and Y: the first pair of values is swapped with the third.
X=c(9,5,1)
Y=c(1,5,9)
cov(X,Y)
[1] -16
#Set up X and Y: only the order of X is changed
X=c(5,9,1)
Y=c(9,5,1)
#Sample covariance does change
cov(X,Y)
[1] 8
Correlation
Now that we have discussed covariance, let us recall \(\rho\) and \(r\).
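As a reminder, these are the standard definitions. The symbols \(s_{XY}\), \(s_{X}\) and \(s_{Y}\) for the sample covariance and sample standard deviations are notation assumed here, not taken from the slides.

\[\rho_{X,Y}=\frac{\mathrm{Cov}(X,Y)}{\sigma_{X}\sigma_{Y}}, \qquad r=\frac{s_{XY}}{s_{X}s_{Y}}=\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}}\]

Both are covariances rescaled by the standard deviations, so both lie between -1 and 1.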
Those of you who are practically minded might want to simulate a large number of values of \(X^{2}\) and compute their mean. This would provide an approximate solution, and we can fall back on it if we do not have an exact solution.
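The simulation idea can be sketched in R. This assumes X is standard normal, as stated further down in this section; the seed, the number of draws and the variable names are arbitrary choices for illustration.

```r
set.seed(2333)        # arbitrary seed so the run is reproducible
n_draws <- 100000     # assumed number of simulated values

# Draw from the assumed distribution of X
x <- rnorm(n_draws, mean = 0, sd = 1)

# Monte Carlo estimate of E(X^2): the average of the squared draws
est_EX2 <- mean(x^2)
est_EX2
```

Increasing `n_draws` should tighten the approximation, at the cost of a longer run.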
Can we do \(E(X)*E(X)\) to calculate \(E(X^{2})\)?
Curious Cats: Discussion
NO! \(E(XY)=E(X)E(Y)\) is only guaranteed when the two r.v.s are independent (strictly, it holds exactly when they are uncorrelated).
Clearly the r.v. X is not independent of itself: \(p(X=x|X=x)=1 \neq p(X=x)\)
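A quick numerical check of this point, as a sketch only: it assumes standard normal draws, and `x` and `z` are hypothetical names for two independently generated samples.

```r
set.seed(2333)          # arbitrary seed
n_draws <- 100000
x <- rnorm(n_draws)     # stands in for X
z <- rnorm(n_draws)     # a second r.v., generated independently of x

# Independent pair: the mean of the product is close to the product of the means
indep_gap <- mean(x * z) - mean(x) * mean(z)

# X with itself: the mean of the product is far from the product of the means
self_gap <- mean(x * x) - mean(x) * mean(x)

c(independent = indep_gap, with_itself = self_gap)
```

The first gap should be near zero, while the second is the (population-style) variance of the draws and so stays well away from zero.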
OK, now think about what we know about the random variable X: \[X \sim N(0,1)\]
The r.v. X has a normal distribution with mean 0 and standard deviation 1.
Curious Cats: Discussion
In that case X is a random variable whose variance is 1 as well.
Why is this helpful? Think about the variance calculation!
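One way to spell out the hint is to rearrange the standard identity \(\sigma_{X}^{2}=E(X^{2})-[E(X)]^{2}\) and plug in \(\mu_{X}=0\) and \(\sigma_{X}^{2}=1\):

\[E(X^{2})=\sigma_{X}^{2}+\mu_{X}^{2}=1+0^{2}=1\]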
3. If you have three correlations \(\rho_{X,Y}=0.45\), \(\rho_{A,Y}=0.75\) and \(\rho_{A,X}=0.8\), what conclusions can you draw?
Some Answers: 1
\(\rho\) is a population parameter whose value is calculated from known population parameters. \(r\) is a statistic calculated from the sample values. As \(n \to \infty\), \(r\) converges to \(\rho\) because \(r\) is a consistent estimator of \(\rho\) (for finite \(n\) it is in general slightly biased).
tldr; if the sample size is large enough, you would expect the value of \(r\) to be close to \(\rho\). However, since \(r\) is calculated from a finite sample, it will generally not equal the population parameter \(\rho\).
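The effect of sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: the pairs are bivariate normal with \(\rho=0.45\), the helper `sample_r` is hypothetical, and the sample sizes and seed are arbitrary.

```r
set.seed(2333)    # arbitrary seed for reproducibility
rho <- 0.45       # assumed population correlation for the illustration

# Hypothetical helper: draw n pairs with population correlation rho
# and return the sample correlation r
sample_r <- function(n, rho) {
  x <- rnorm(n)
  z <- rnorm(n)                        # independent of x
  y <- rho * x + sqrt(1 - rho^2) * z   # Cor(x, y) = rho by construction
  cor(x, y)
}

sizes <- c(10, 100, 1000, 100000)
r_values <- sapply(sizes, sample_r, rho = rho)

# One row per sample size: r and its distance from rho
data.frame(n = sizes, r = r_values, abs_error = abs(r_values - rho))
```

The error need not shrink at every step in a single run, but it tends to get smaller as n grows, and r never lands exactly on \(\rho\).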
Some Answers: 2
If \(\rho_{X,Y}=0.45\), this means that as the values of X increase (decrease) you would expect Y to also increase (decrease). Some like to label correlation values as weak, moderate or strong based on cutoff points that an important statistician suggested. These cutoffs are not dogma and are subjective, so we avoid assigning adjectives in that manner.
tldr; I would say that X and Y have a moderate positive linear relationship but the word moderate is subjective.
Some Answers: 3
The r.v. A has a stronger correlation with Y than X does. This does not necessarily mean that A causes Y and that X is just trivially correlated with Y. Data by itself cannot determine causal relationships, but it should drive our search in building explanations.
Y can be caused by both X and A. In fact, perhaps X has a stronger relationship to Y when X and A are considered together. Perhaps the relationship of A to Y is trivial for the problem we are trying to solve (e.g., city population size and number of fires in the city).
Some Answers: 3
A and X are positively correlated. \(\rho_{A,X}\) is larger than the correlation of either variable with Y. This large value suggests that, if we try to build a model to explain how Y occurs, one of these variables (A, X) might be causing the other.
None of these are given in the data but they are questions you need to think about when investigating the statistics from the data.
In an age of automation, asking good questions becomes even more important.
Table of Correlations:Excel 1
Table of Correlations:Excel 2
- By default, New Worksheet Ply is selected. This will create the table of correlations in a new worksheet.
#Read the dataset, which has header labels (X,Y,A)
#and tab-delimited ('\t') columns
data=read.table('C:/Users/rm84/Desktop/teaching/2333/2333_Linear_Regression_files/data.txt',header=TRUE,sep='\t')
cor(data)
          X         Y         A
X 1.0000000 0.4219863 0.6999944
Y 0.4219863 1.0000000 0.6101665
A 0.6999944 0.6101665 1.0000000
The read.table function reads the tab-delimited values.
cor returns the full table rather than just the diagonal and lower triangle, since the correlation table can be used as an object in algorithms that require the whole table.
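The difference between the full table and a lower-triangle display can be sketched as follows. The data here are simulated stand-ins, not the course data set, and `dat`, `full` and `lower_only` are illustrative names; `eigen` is used only as one example of a routine that expects the whole matrix.

```r
set.seed(2333)   # arbitrary seed

# Simulated stand-in data (NOT the course data set)
A <- rnorm(200)
X <- 0.7 * A + rnorm(200)
Y <- 0.5 * A + rnorm(200)
dat <- data.frame(X = X, Y = Y, A = A)

# cor() returns the full symmetric matrix
full <- cor(dat)
isSymmetric(full)   # checks that the upper triangle mirrors the lower

# Display-only version: blank out the redundant upper triangle
lower_only <- full
lower_only[upper.tri(lower_only)] <- NA
round(lower_only, 3)

# The full matrix can be passed straight to matrix routines
eigen(full)$values
```

The blanked version is easier to read in a report, while the full matrix is what downstream matrix computations generally need.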