Two variates X and Y are said to be correlated if an increase or decrease in one variable is accompanied by an increase or decrease in the other.
X and Y are said to be positively correlated if Y increases when X increases and Y decreases when X decreases, i.e., both variables move in the same direction.
X and Y are said to be negatively correlated if Y decreases when X increases and Y increases when X decreases, i.e., the variables move in opposite directions.
If there is no change in Y with an increase or decrease in X, the variables are said to be uncorrelated.
If \((x_i,y_i),i=1,2,\ldots,N\) are N paired observations of X and Y, then plotting one variable on the X axis and the other on the Y axis gives a scatter diagram.
Merits: 1. It is simple to draw. 2. It gives a quick visualization of the relationship. 3. It is a good starting point for further analysis.
Demerits: 1. It is rough and not accurate.
x=c(84, 66, 68, 129, 90, 91, 74, 76, 70, 85, 122, 74, 104, 97, 104, 91)
y=c(86, 82, 92, 151, 96, 86, 99, 88, 87, 98, 125, 85, 127, 108, 104, 102)
plot(x,y,main='positively correlated data',pch=16)
abline(h=mean(y),col=2)
abline(v=mean(x),col=4)
points(mean(x),mean(y),pch=14)
The coefficient of correlation \(r_{xy}\) is \[ r_{xy} = \frac{COV(X,Y)}{\sqrt{Var(X)Var(Y)}} \]
A simpler formula for computation is
\[ r_{xy} = \frac{N\sum X\, Y-\sum X \, \sum Y}{ \sqrt{\left[ N\, \sum X^2 - (\sum X)^2 \right] \left[ N\, \sum Y^2 - (\sum Y)^2 \right] }} \]
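As a quick check, this formula can be applied directly to the scatter-plot data above and compared with R's built-in cor() (a minimal sketch):

N=length(x)
r=(N*sum(x*y)-sum(x)*sum(y))/
sqrt((N*sum(x^2)-sum(x)^2)*(N*sum(y^2)-sum(y)^2))
# both values agree (about 0.884)
round(c(formula=r,builtin=cor(x,y)),3)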
x 84 66 68 129 90 91 74 76 70 85 122 74 104 97 104 91
y 86 82 92 151 96 86 99 88 87 98 125 85 127 108 104 102
cov_xy=sum((x-mean(x))*(y-mean(y)))/length(x)
data.frame(mean(x),mean(y),cov_xy)
## mean.x. mean.y. cov_xy
## 1 89.0625 101 291.75
# R's cov() computes the sample covariance, dividing by (n-1) instead of n
cov(x,y)
## [1] 311.2
## to find simple correlation
x=c(84, 66, 68, 129, 90, 91, 74, 76, 70, 85, 122, 74, 104, 97, 104, 91)
y=c(86, 82, 92, 151, 96, 86, 99, 88, 87, 98, 125, 85, 127, 108, 104, 102)
##sample size
N=length(x)
## shift the origin A=90 and B=110
A=90
B=110
u=x-A
v=y-B
uv=u*v
u2=u*u
v2=v*v
df=data.frame(x,y,u,v,uv,u2,v2)
srow=apply(df,2,sum)
df = rbind(df,srow)
df
## x y u v uv u2 v2
## 1 84 86 -6 -24 144 36 576
## 2 66 82 -24 -28 672 576 784
## 3 68 92 -22 -18 396 484 324
## 4 129 151 39 41 1599 1521 1681
## 5 90 96 0 -14 0 0 196
## 6 91 86 1 -24 -24 1 576
## 7 74 99 -16 -11 176 256 121
## 8 76 88 -14 -22 308 196 484
## 9 70 87 -20 -23 460 400 529
## 10 85 98 -5 -12 60 25 144
## 11 122 125 32 15 480 1024 225
## 12 74 85 -16 -25 400 256 625
## 13 104 127 14 17 238 196 289
## 14 97 108 7 -2 -14 49 4
## 15 104 104 14 -6 -84 196 36
## 16 91 102 1 -8 -8 1 64
## 17 1425 1616 -15 -144 4803 5217 6658
#product moment formula
r_uv=(N*sum(uv)-sum(u)*sum(v))/
(sqrt(N*sum(u2)-sum(u)^2)*sqrt(N*sum(v2)-sum(v)^2))
data.frame(r_uv=round(r_uv,3))
## r_uv
## 1 0.884
# Pearson's formula (covariance over the product of standard deviations)
pearson_r=cov(u,v)/sqrt(var(u)*var(v))
data.frame(pearson_r=round(pearson_r,3))
## pearson_r
## 1 0.884
The variables have a strong positive correlation.
f<-function(x) x
curve(f(x),-1,1,lwd=2,col=2,main='linear correlation')
text(0.65,0.5,'+ve r',col=2)
curve(-f(x),-1,1,lwd=2,col=4,add = TRUE,lty=2)
text(0.65,-0.50,'-ve r',col=4)
If u = (x-a)/b and v = (y-c)/d are linear transformations of X and Y, the correlation coefficient is unchanged except possibly for its sign:
\[ r_{xy}=\frac{b\times d}{|b|\times |d|}\, r_{uv} \]
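A quick numerical check of this invariance (a small sketch with arbitrarily chosen shifts and scales; note the sign flip when a scale factor is negative):

x1=c(84, 66, 68, 129, 90, 91)
y1=c(86, 82, 92, 151, 96, 86)
# same r after a change of origin/scale; negating x flips the sign
round(c(cor(x1,y1),cor((x1-90)/10,(y1-110)/5),cor(-x1,y1)),3)

The next example shows that r captures only linear association: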
x=c(-3, -2, -1, 1, 2,3)
y=c(9, 4, 1, 1, 4, 9)
df=data.frame(x,y,xy=x*y,x2=x*x,y2=y*y)
df
## x y xy x2 y2
## 1 -3 9 -27 9 81
## 2 -2 4 -8 4 16
## 3 -1 1 -1 1 1
## 4 1 1 1 1 1
## 5 2 4 8 4 16
## 6 3 9 27 9 81
apply(df,2,sum)
## x y xy x2 y2
## 0 28 0 28 196
print('correlation coefficient is 0 though y=x^2')
## [1] "correlation coefficient is 0 though y=x^2"
Thus a zero correlation coefficient does not rule out a (non-linear) relationship; it only means there is no linear relationship.
rm(list=ls())
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y):
X : 65 66 67 67 68 69 70 72
Y : 67 68 65 68 72 72 69 71
Method-1
x = c(65, 66, 67, 67, 68, 69, 70, 72)
y = c(67, 68, 65, 68, 72, 72, 69, 71)
df=data.frame(x,y,xy=x*y,x2=x*x,y2=y*y)
srow=apply(df,2,sum)
rbind(df,srow)
## x y xy x2 y2
## 1 65 67 4355 4225 4489
## 2 66 68 4488 4356 4624
## 3 67 65 4355 4489 4225
## 4 67 68 4556 4489 4624
## 5 68 72 4896 4624 5184
## 6 69 72 4968 4761 5184
## 7 70 69 4830 4900 4761
## 8 72 71 5112 5184 5041
## 9 544 552 37560 37028 38132
data.frame(srow)
## srow
## x 544
## y 552
## xy 37560
## x2 37028
## y2 38132
N=length(x)
rxy=(N*srow['xy']-srow['x']*srow['y'])/
sqrt((N*srow['x2']-srow['x']^2)*(N*srow['y2']-srow['y']^2))
names(rxy)<-'correlation x and y'
rxy
## correlation x and y
## 0.6030227
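Method-2 (a minimal alternative): R's built-in function reproduces the same value.

# Method-2: built-in correlation
cor(x,y)  # gives 0.6030227, matching Method-1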
Show that the coefficient of correlation r is independent of a change of scale and origin of the variables. Also prove that for two independent variables r = 0. Show by an example that the converse is not true. State the limits between which r lies and give a proof.
Let r be the correlation coefficient between two jointly distributed random variables X and Y. Show that \(|r|<1\) and that \(|r|=1\) if and only if X and Y are linearly related.
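A sketch of the standard argument, using the Cauchy-Schwarz inequality:
\[ r^2 = \frac{\left[\sum_i (x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum_i (x_i-\bar{x})^2 \, \sum_i (y_i-\bar{y})^2} \le 1, \]
so \(|r|\le 1\), with equality if and only if the deviations \(y_i-\bar{y}\) are proportional to \(x_i-\bar{x}\), i.e., Y is a linear function of X.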
Calculate the coefficient of correlation between X and Y for the following:
X 1 3 4 5 7 8 10
Y 2 6 8 10 14 16 20
x=c(1, 3, 4, 5, 7, 8, 10)
y=c(2, 6, 8, 10, 14, 16, 20)
df=data.frame(x,y,xy=x*y,x2=x*x,y2=y*y)
df
## x y xy x2 y2
## 1 1 2 2 1 4
## 2 3 6 18 9 36
## 3 4 8 32 16 64
## 4 5 10 50 25 100
## 5 7 14 98 49 196
## 6 8 16 128 64 256
## 7 10 20 200 100 400
srow=apply(df,2,sum)
n=length(x)
r= (n*srow[3]-srow[1]*srow[2])/
sqrt((n*srow[4]-srow[1]^2)*(n*srow[5]-srow[2]^2))
names(r)<-'correlation'
r
## correlation
## 1
Here r = 1 exactly, since Y = 2X is a perfect linear relationship.
By effecting a suitable change of origin and scale, compute the product moment correlation coefficient for the following set of 5 observations on (X, Y):
X: -10 -5 0 5 10
Y: 5 9 7 11 13
x=c(-10, -5, 0, 5, 10)
y=c(5, 9, 7, 11, 13)
A=0
B=7
u=x-A
v=y-B
df=data.frame(x,y,u,v,uv=u*v,u2=u*u,v2=v*v)
srow=apply(df,2,sum)
df
## x y u v uv u2 v2
## 1 -10 5 -10 -2 20 100 4
## 2 -5 9 -5 2 -10 25 4
## 3 0 7 0 0 0 0 0
## 4 5 11 5 4 20 25 16
## 5 10 13 10 6 60 100 36
srow
## x y u v uv u2 v2
## 0 45 0 10 90 250 60
n=length(x)
r= (n*srow[5]-srow[3]*srow[4])/
sqrt((n*srow[6]-srow[3]^2)*(n*srow[7]-srow[4]^2))
names(r)<-'correlation'
r
## correlation
## 0.9
Calculate the coefficient of correlation for the following data:
x=c(15.5, 16.5, 17.5, 18.5, 19.5, 20.5)
y=c(75, 60, 50, 50, 45, 40)
# Ans. r = -0.94 (note the negative sign: y decreases as x increases)
data.frame(correlation=round(cor(x,y),2))
## correlation
## 1 -0.94
From the following data, compute the co-efficient of correlation between X and Y.
| | X | Y |
|---|---|---|
| No. of items | 15 | 15 |
| Arithmetic mean | 25 | 18 |
| Sum of squared deviations from mean | 136 | 138 |
Summation of product of deviations of X and Y series from respective arithmetic means = 122
Correlation coefficient is given by
\[ \begin{array}{rcl} r &=& \frac{COV(X,Y)}{\sqrt{VAR(X) \times VAR(Y)}}\\ &=& \frac{\frac{1}{N}\sum (X-\bar{X}) (Y-\bar{Y})} {\sqrt{ \frac{1}{N} \sum (X-\bar{X})^2 \frac{1}{N} \sum (Y-\bar{Y})^2 }}\\ &=& \frac{122}{\sqrt{136 \times 138}}\\ &=& 0.891 \end{array} \]
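The same arithmetic as a one-line check in R:

round(122/sqrt(136*138),3)  # gives 0.891, matching the hand computation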
The following computes r for a bivariate frequency distribution \(f_{ij}\), with class marks \(x_i\) for X and values \(y_j\) for Y:
d=c(4,2,2,0, 5,4,6,4, 6,8,10,11, 4,4,6,8, 0,2,4,4, 0,2,3,1)
fij=matrix(d,nrow=6,byrow=TRUE)
fij
## [,1] [,2] [,3] [,4]
## [1,] 4 2 2 0
## [2,] 5 4 6 4
## [3,] 6 8 10 11
## [4,] 4 4 6 8
## [5,] 0 2 4 4
## [6,] 0 2 3 1
fxj=apply(fij,2,sum)
fxj
## [1] 19 22 31 28
fix=apply(fij,1,sum)
fix
## [1] 8 19 35 22 10 6
xi=seq(15,65,10)
xi
## [1] 15 25 35 45 55 65
yj=18:21
yj
## [1] 18 19 20 21
A=35
B=19
h=10
ui=(xi-A)/h
vj=yj-B
ui
## [1] -2 -1 0 1 2 3
vj
## [1] -1 0 1 2
N=sum(fix)
uifix=ui%*%fix
vjfxj=vj%*%fxj
ui2fix=ui^2%*%fix
vj2fxj=vj^2%*%fxj
uivjfij=t(ui)%*%(fij%*%vj)
res=data.frame(N,uifix,vjfxj,ui2fix,vj2fxj,uivjfij)
r=(N*uivjfij-uifix*vjfxj)/sqrt(
(N*ui2fix-uifix^2)*(N*vj2fxj-vjfxj^2))
res
## N uifix vjfxj ui2fix vj2fxj uivjfij
## 1 100 25 68 167 162 52
names(r)<-'correlation coefficient'
r
## [,1]
## [1,] 0.2565744
## attr(,"names")
## [1] "correlation coefficient"
\[ \begin{array}{rcl} r&=&\frac{N\sum_i\sum_j u_i v_j f_{ij} - \left( \sum_i u_i f_{ix}\right) \left( \sum_j v_j f_{xj}\right) }{\sqrt{ \left[ N \sum_i u_i^2 f_{ix}-\left(\sum_i u_i f_{ix} \right)^2\right] \left[ N \sum_j v_j^2 f_{xj}-\left(\sum_j v_j f_{xj} \right)^2\right] }}\\ &=& \frac{100 \times 52 - 25 \times 68}{\sqrt{ \left[100\times 167 -25^2\right] \left[100\times 162 -68^2\right] }}\\ &=& 0.257 \end{array} \]
Merits: 1. Accurate. 2. Invariant under change of origin and scale. 3. Bounded between -1 and +1.
Demerits: 1. Computationally intensive. 2. Sensitive to extreme values in the data.
Spearman's rank correlation coefficient is Pearson's formula applied to the ranks:
\[ \rho = \frac{N\sum R_x \, R_y-\sum R_x \, \sum R_y}{ \sqrt{\left[ N\, \sum R_x^2 - (\sum R_x)^2 \right] \left[ N\, \sum R_y^2 - (\sum R_y)^2 \right] }} \]
where \(R_x\) and \(R_y\) are the ranks. For untied ranks this simplifies to
\[ \rho = 1- \frac{6 \sum_i d_i^2}{n(n^2-1)} \]
where \(d_i\) is the difference between the ranks of the X and Y values of the \(i\)th observation.
Find the rank correlation of the following data:
x=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
y=c(1, 10, 3, 4, 5, 7, 2, 6, 8, 11, 15, 9, 14, 12, 16, 13)
n=length(x)
df=data.frame(x,y,di2=(x-y)^2)
srow=apply(df,2,sum)
srow
## x y di2
## 136 136 136
rho=1-(6*sum(df$di2)/(n*(n^2-1)))
print(rho)
## [1] 0.8
## Example 2: rank correlation with repeated (tied) ranks
X=c( 68, 64, 75, 50, 64, 80, 75, 40, 55, 64)
Y=c( 62, 58, 68, 45, 81, 60, 68, 48, 50, 70)
round(cor(rank(X),rank(Y)),3)
## [1] 0.556
x=rank(X)
y=rank(Y)
n=length(x)
df=data.frame(x,y,di2=(x-y)^2)
srow=apply(df,2,sum)
df
## x y di2
## 1 7.0 6.0 1
## 2 5.0 4.0 1
## 3 8.5 7.5 1
## 4 2.0 1.0 1
## 5 5.0 10.0 25
## 6 10.0 5.0 25
## 7 8.5 7.5 1
## 8 1.0 2.0 1
## 9 3.0 3.0 0
## 10 5.0 9.0 16
srow
## x y di2
## 55 55 72
rho=1-(6*sum(df$di2)/(n*(n^2-1)))
print(rho)
## [1] 0.5636364
This differs slightly from the 0.556 obtained above because the simplified formula assumes untied ranks; applying Pearson's formula directly to the mid-ranks, as cor(rank(X), rank(Y)) does, handles the ties correctly.
Another example, shifting the origins to A = 16 and B = 34:
X=c(10, 15, 12, 17, 13, 16, 24, 14, 22, 20)
Y=c(30, 42, 45, 46, 33, 34, 40, 35, 39, 38)
u=X-16
v=Y-34
df=data.frame(X,Y,u,v,u*v,u^2,v^2)
df
## X Y u v u...v u.2 v.2
## 1 10 30 -6 -4 24 36 16
## 2 15 42 -1 8 -8 1 64
## 3 12 45 -4 11 -44 16 121
## 4 17 46 1 12 12 1 144
## 5 13 33 -3 -1 3 9 1
## 6 16 34 0 0 0 0 0
## 7 24 40 8 6 48 64 36
## 8 14 35 -2 1 -2 4 1
## 9 22 39 6 5 30 36 25
## 10 20 38 4 4 16 16 16
apply(df,2,sum)
## X Y u v u...v u.2 v.2
## 163 382 3 42 79 183 424
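Completing the computation from these sums (a short sketch; the result agrees with cor(X, Y), since r is invariant under the change of origin):

n=length(X)
r=(n*sum(u*v)-sum(u)*sum(v))/
sqrt((n*sum(u^2)-sum(u)^2)*(n*sum(v^2)-sum(v)^2))
round(c(r_uv=r,direct=cor(X,Y)),3)  # both are about 0.313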
The word regression, meaning "going backwards", was first introduced by the British biometrician Sir Francis Galton (1822-1911) in his study of the heights of offspring in connection with inheritance.
Regression analysis is a mathematical measure of the average relationship between two or more variables, expressed in the original units of the data.
In regression there is one variable Y to be predicted from the values of another variable X. Y is called the dependent variable and X the independent variable.
If there is more than one independent variable, the regression is called multiple regression.
Y is also called the regressed, explained, or target variable.
If the data points cluster around some curve, that curve is called the curve of regression.
If the curve is a polynomial, or otherwise non-linear, the regression is called curvilinear regression: \[ Y=b_0+b_1 x+b_2 x^2+\cdots \]
Power regression (the gas equation): \[ Y=b_0 x^{b_1} \]
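A power curve can be fitted by linearization, since \(\log Y = \log b_0 + b_1 \log X\); a minimal sketch with synthetic data (the constants 2.5 and 1.4 are assumptions for illustration only):

X=c(1,2,3,4,5)
Y=2.5*X^1.4                 # synthetic data following Y = b0 * X^b1
fit=lm(log(Y)~log(X))
c(b0=exp(coef(fit)[[1]]),b1=coef(fit)[[2]])  # recovers b0 = 2.5, b1 = 1.4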
### Simple Linear Regression

- Let \((x_i,y_i),i=1,2,\ldots,N\) be the data
- To fit the model \(Y=b_0+b_1\times x\)
- Write the normal equations
\[ \begin{array}{rcl} \sum y_i &=& N\, b_0 +b_1 \sum x_i \\ \sum x_i y_i &=& b_0 \sum x_i+b_1 \sum x_i^2 \\ \end{array} \]
### Example 1
Suppose the observations on X and Y are given as :
| X | 59 | 65 | 45 | 52 | 60 | 62 | 70 |
|---|---|---|---|---|---|---|---|
| Y | 75 | 70 | 55 | 65 | 60 | 69 | 80 |
where N = 7 students, Y = Marks in Maths, and X = Marks in Economics. Compute the least squares regression equations of Y on X and of X on Y.
x=c(59, 65, 45, 52, 60, 62, 70)
y=c(75, 70, 55, 65, 60, 69, 80)
n=length(x)
u=x-52
v=y-65
df=data.frame(x,y,u,v,uv=u*v,u2=u*u,v2=v*v)
df
## x y u v uv u2 v2
## 1 59 75 7 10 70 49 100
## 2 65 70 13 5 65 169 25
## 3 45 55 -7 -10 70 49 100
## 4 52 65 0 0 0 0 0
## 5 60 60 8 -5 -40 64 25
## 6 62 69 10 4 40 100 16
## 7 70 80 18 15 270 324 225
apply(df,2,sum)
## x y u v uv u2 v2
## 413 474 49 19 475 755 491
A=matrix(c(n,sum(x),sum(x),sum(x^2)),nrow = 2)
A
## [,1] [,2]
## [1,] 7 413
## [2,] 413 24779
b=c(sum(y),sum(x*y))
print(b)
## [1] 474 28308
solve(A,b)
## [1] 18.7385576 0.8300971
We can use R's built-in lm() function directly:
model=lm(y~x)
model
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 18.7386 0.8301
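The problem also asks for the regression of X on Y; a short sketch of that second equation:

# regression of X on Y: coefficients come to about 6.299 and 0.778
lm(x~y)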
If a student gets 61 marks in Economics, what would you estimate his marks in Maths to be?
# model2=lm(v~u,df)
# model2
model_61=predict(model,newdata = data.frame(x=61))
print(model_61)
## 1
## 69.37448
The line of regression of Y on X can also be written in terms of r:
\[ Y-\bar{Y}=r\, \frac{\sigma_y}{\sigma_x} (X-\bar{X}) \]
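A quick check of this identity on the marks data above (the (n-1) factors in the sample standard deviations cancel in the ratio):

b1=cor(x,y)*sd(y)/sd(x)
b0=mean(y)-b1*mean(x)
round(c(b0,b1),4)  # matches the lm coefficients 18.7386 and 0.8301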
### Example 9.1

Fit a straight line to the following data.

X: 1 2 3 4 6 8
Y: 2.4 3 3.6 4 5 6
X=c(1,2,3,4,6,8)
Y=c(2.4,3,3.6,4,5,6)
lm(Y~X)
##
## Call:
## lm(formula = Y ~ X)
##
## Coefficients:
## (Intercept) X
## 1.9765 0.5059
df=data.frame(X,Y,X*Y,X^2)
df
## X Y X...Y X.2
## 1 1 2.4 2.4 1
## 2 2 3.0 6.0 4
## 3 3 3.6 10.8 9
## 4 4 4.0 16.0 16
## 5 6 5.0 30.0 36
## 6 8 6.0 48.0 64
srow=apply(df,2,sum)
srow
## X Y X...Y X.2
## 24.0 24.0 113.2 130.0
sx=sum(X)
sy=sum(Y)
sxy=sum(X*Y)
sx2=sum(X*X)
n=length(X)
xmat=matrix(c(n,sx,sx,sx2),nrow = 2)
xmat
## [,1] [,2]
## [1,] 6 24
## [2,] 24 130
b=c(sy,sxy)
b
## [1] 24.0 113.2
solve(xmat,b)
## [1] 1.9764706 0.5058824
### Example 9.2

Fit a parabola of second degree to the following data:

X: 0 1 2 3 4
Y: 1 1.8 1.3 2.5 6.3
X=c(0,1,2,3,4)
Y=c(1,1.8,1.3,2.5,6.3)
df=data.frame(X,Y,X^2,X^3,X^4,X*Y,X^2*Y)
df
## X Y X.2 X.3 X.4 X...Y X.2...Y
## 1 0 1.0 0 0 0 0.0 0.0
## 2 1 1.8 1 1 1 1.8 1.8
## 3 2 1.3 4 8 16 2.6 5.2
## 4 3 2.5 9 27 81 7.5 22.5
## 5 4 6.3 16 64 256 25.2 100.8
apply(df,2,sum)
## X Y X.2 X.3 X.4 X...Y X.2...Y
## 10.0 12.9 30.0 100.0 354.0 37.1 130.3
n=length(X)
sx=sum(X)
sy=sum(Y)
sx2=sum(X^2)
sx3=sum(X^3)
sx4=sum(X^4)
sxy=sum(X*Y)
sx2y=sum(X^2*Y)
dmat=matrix(c(n,sx,sx2,sx,sx2,sx3,sx2,sx3,sx4),nrow = 3)
dmat
## [,1] [,2] [,3]
## [1,] 5 10 30
## [2,] 10 30 100
## [3,] 30 100 354
b=c(sy,sxy,sx2y)
b
## [1] 12.9 37.1 130.3
solve(dmat,b)
## [1] 1.42 -1.07 0.55
A simple R command gives the same coefficients:
lm(Y~X+I(X^2))
##
## Call:
## lm(formula = Y ~ X + I(X^2))
##
## Coefficients:
## (Intercept) X I(X^2)
## 1.42 -1.07 0.55
### Example 9.4

Fit a second degree parabola to the following data:

X: 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0
Y: 1.1, 1.3, 1.6, 2.6, 2.7, 3.4, 4.1
X=c(1.0,1.5,2.0,2.5,3.0,3.5,4.0)
Y=c(1.1,1.3,1.6,2.6,2.7,3.4,4.1)
lm(Y~X+I(X^2))
##
## Call:
## lm(formula = Y ~ X + I(X^2))
##
## Coefficients:
## (Intercept) X I(X^2)
## 0.5214 0.3786 0.1286
### Example 9.5

Fit an exponential curve of the form \(Y=a\, b^x\) to the following data:

X: 1 2 3 4 5 6 7 8
Y: 1.0 1.2 1.8 2.5 3.6 4.7 6.6 9.1
X=c(1,2,3,4,5,6,7,8)
Y=c(1.0,1.2,1.8,2.5,3.6,4.7,6.6,9.1)
y=round(log10(Y),3)
df=data.frame(X,Y,y)
df
## X Y y
## 1 1 1.0 0.000
## 2 2 1.2 0.079
## 3 3 1.8 0.255
## 4 4 2.5 0.398
## 5 5 3.6 0.556
## 6 6 4.7 0.672
## 7 7 6.6 0.820
## 8 8 9.1 0.959
fit=lm(y~X)
# y = log10(Y), so back-transform the coefficients with 10^
b=10^coef(fit)[2]
a=10^coef(fit)[1]
round(data.frame(a,b),3)
## a b
## (Intercept) 0.682 1.383
#alternatively
model_expo<-nls(Y~a*(b^X),
start = list(a = 0.5, b = 0.2))
round(coef(model_expo),3)
## a b
## 0.690 1.381
Fit an equation of the form \(Y = ab^X\) to the following data:

X: 2 3 4 5 6
Y: 144 172.8 207.4 248.8 298.6

Ans. \(Y= 101.3\, (1.1961)^X\)
X=c(2,3,4,5,6)
Y=c(144,172.8,207.4,248.8,298.6)
# direct non-linear least squares fit with nls()
model_expo<-nls(Y~a*(b^X),
start = list(a = 0.5, b = 1.0))
round(coef(model_expo),4)
## a b
## 100.0089 1.2000
X=c(2,3,4,5,6)
Y=c(144,172.8,207.4,248.8,298.6)
y=log(Y)
x2=X^2
xy=X*y
n=length(X)
df=data.frame(X,Y,y,x2,xy)
df
## X Y y x2 xy
## 1 2 144.0 4.969813 4 9.939627
## 2 3 172.8 5.152135 9 15.456405
## 3 4 207.4 5.334649 16 21.338597
## 4 5 248.8 5.516649 25 27.583247
## 5 6 298.6 5.699105 36 34.194629
srow=round(apply(df,2,sum),3)
srow
## X Y y x2 xy
## 20.000 1071.600 26.672 90.000 108.513
# normal equations for the log-linear fit: note x2 (not xy) in the matrix
xmat=matrix(c(n,sum(X),sum(X),sum(x2)),nrow = 2)
rhs=c(sum(y),sum(xy))
rhs
## [1] 26.67235 108.51250
xmat
##      [,1] [,2]
## [1,]    5   20
## [2,]   20   90
# y = log(Y) uses natural logarithms, so back-transform with exp(), not 10^
round(exp(solve(xmat,rhs)),3)
## [1] 100.006   1.200