Correlation
Abstract:
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense “correlation” may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the heights of parents and their offspring, and the correlation between the price of a good and the quantity that consumers are willing to purchase, as depicted in the so-called demand curve.
Introduction:
Correlation means connection. Correlation analysis studies the relationship or connection between two or more variables. Two variables are said to be correlated if they vary in such a way that changes in one variable accompany changes in the other.
Example: The heights and weights of the students in a class are related, as are a family's earnings and the amount it spends each month.
Correlation is a statistical tool used to study the relationship between two or more variables.
Example: As summer approaches, the heat rises and the atmospheric temperature increases, so people tend to travel to hill stations to enjoy the cooler weather. Hence, the hill stations get crowded. Similarly, we can observe that the sales of ice cream, cool drinks, and fruits like watermelon increase during this period.
What does Correlation Measure?
Correlation is a means of systematically examining such relationships or associations. Although correlation measures the direction and degree of the relationship, it says nothing about cause and effect between two or more variables.
Example: We know that the demand for a commodity and its price are closely related. But we cannot say whether demand is causing the changes in price or whether the changes in price are causing the changes in demand.
Even though the cause-and-effect relationship cannot be established, we can conclude that the two variables, demand and price, are correlated.
Thus, correlation does not establish causation, that is, cause and effect in a relationship.
Types of Correlation
There are three types of correlation between two variables:
Positive Correlation
Negative Correlation
No correlation
1. Positive Correlation
A positive correlation occurs when the values of two variables move in the same direction. In other words, an increase in one variable is accompanied by an increase in the other, and a decrease in one by a decrease in the other.
Examples:
When income falls, consumption also falls.
The sale of ice cream increases as the temperature increases.
2. Negative Correlation
A negative correlation occurs when the values of two variables move in opposite directions. In other words, an increase in one variable is accompanied by a decrease in the other, and vice versa: when the variable x increases, the variable y decreases.
Examples:
When the price of mango increases, the demand for mango decreases.
As the price of mango drops, the demand for mango increases.
Formula:
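For reference, the solved examples further down rely on the following expressions, restated here in their notation as a sketch rather than a full derivation. In the first form, x and y are assumed to be deviations from the respective means; in the last, D_i is the difference between the two ranks of item i.

```latex
% Karl Pearson's coefficient, deviation form (x = X - mean of X, y = Y - mean of Y)
r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}

% Karl Pearson's coefficient, raw-score form
r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,
          \sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}

% Spearman's rank correlation coefficient (no tied ranks)
\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}
```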
Real Life Applications
In statistics, correlation is a measure of the linear relationship between two variables. The value of a correlation coefficient is always between -1 and 1, where:
-1 indicates a perfectly negative linear correlation between two variables
0 indicates no linear correlation between two variables
1 indicates a perfectly positive linear correlation between two variables
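The three reference values can be illustrated with a minimal Python sketch. The function name `pearson_r` and the small data sets are assumptions made up for illustration; the function uses the deviation form of Karl Pearson's coefficient and assumes neither input is constant.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of x from its mean
    dy = [y - mean_y for y in ys]          # deviations of y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

xs = [1, 2, 3, 4, 5]                        # illustrative values only
print(pearson_r(xs, [2, 4, 6, 8, 10]))      # y rises exactly in line with x
print(pearson_r(xs, [10, 8, 6, 4, 2]))      # y falls exactly in line with x
print(pearson_r(xs, [2, 1, 0, 1, 2]))       # symmetric V shape: no linear trend
```

The last case shows that a coefficient near 0 rules out only a linear relationship: y still depends on x there, just not along a straight line.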
Solved Examples on Correlation
Q.1. Tom has started a new catering business and is working out what it costs to make sandwiches and at what price he should sell them. He gathered the information below after talking to several other cooks.
Number of Sandwiches   Cost of Bread   Cost of Vegetables   Total Cost
10                     100             30                   130
20                     200             60                   260
30                     300             90                   390
40                     400             120                  520
Tom is convinced that there is a positive linear relationship between the number of sandwiches and the total cost of making them. Analyse whether this statement is true.
Sol: Plot the number of sandwiches prepared against the total cost of making them. The points rise from left to right, so the relationship is positive. In fact, the total cost is 13 per sandwich in every row (130/10 = 260/20 = 390/30 = 520/40 = 13), so the points lie exactly on a straight line and Tom's statement is true.
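As a numerical cross-check of Tom's claim, the sketch below computes Karl Pearson's coefficient for the number of sandwiches and the total cost. The helper name `pearson_r` is an assumption introduced here; it uses the raw-score form of the coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the raw-score (sum-based) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

sandwiches = [10, 20, 30, 40]
total_cost = [130, 260, 390, 520]

# Cost per sandwich is the same in every row, so the points are collinear
print([cost / count for count, cost in zip(sandwiches, total_cost)])

r = pearson_r(sandwiches, total_cost)
print(round(r, 3))  # 1.0
```

A coefficient of 1 corresponds to a perfectly positive linear relationship, which agrees with reading the plot.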
Q.2. Find the correlation coefficient between the data given below.
Roll No.   Marks in Subject A   Marks in Subject B
1          48                   45
2          35                   20
3          17                   40
4          23                   25
…
Sol: Let the marks in subject A be denoted by X and those in subject B by Y, and let x = X − 34 and y = Y − 35 be the deviations from the respective means.
X     x = X − 34   x^2    Y     y = Y − 35   y^2    xy
48    14           196    45    10           100    140
35    1            1      20    −15          225    −15
17    −17          289    40    5            25     −85
23    −11          121    25    −10          100    110
…
Mean of X: $\frac{\sum X}{5} = 34$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
$$= \frac{280}{\sqrt{776 \times 550}}$$
$$= \frac{280}{653.3}$$
$$\therefore r = 0.429$$
Since r = 0.429, there is a moderate positive correlation between the marks in subjects A and B.
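The arithmetic above can be checked with a short Python sketch. It recomputes the deviation columns for the four rows visible in the table, taking the means 34 and 35 as given, and then evaluates r from the totals quoted in the solution. The helper name `deviation_row` is hypothetical and introduced only for illustration.

```python
from math import sqrt

MEAN_X, MEAN_Y = 34, 35  # means used in the solution's column headers

def deviation_row(X, Y):
    """Return (x, x^2, y, y^2, xy) with x = X - 34 and y = Y - 35."""
    x, y = X - MEAN_X, Y - MEAN_Y
    return x, x * x, y, y * y, x * y

# The four rows that are visible in the table
visible = [(48, 45), (35, 20), (17, 40), (23, 25)]
for X, Y in visible:
    print(X, Y, deviation_row(X, Y))

# Totals over all five students, as quoted in the solution
sum_xy, sum_x2, sum_y2 = 280, 776, 550
r = sum_xy / sqrt(sum_x2 * sum_y2)
print(round(r, 3))  # 0.429
```

Note that the second row gives y^2 = (−15)^2 = 225, which is the value consistent with the quoted total of 550.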
Q.3. Draw a scatter diagram showing a positive correlation.
Sol:
When the points in the graph are rising, moving from left to right, then the scatter plot shows a positive correlation.
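A rising pattern of this kind can be sketched in plain text with a few lines of Python. The function `ascii_scatter` and the data points are assumptions made up for illustration; the function assumes the x values are not all equal and the y values are not all equal.

```python
def ascii_scatter(xs, ys, width=24, height=8):
    """Return a text scatter diagram as a list of strings (top line = largest y)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each point to a column and a row of the character grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]

xs = [1, 2, 3, 4, 5, 6]   # illustrative values only
ys = [2, 3, 5, 4, 6, 7]
for line in ascii_scatter(xs, ys):
    print(line)
```

The markers run from the bottom-left corner to the top-right corner, which is the visual signature of a positive correlation.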
Q.4. The following data gives the heights (in inches) of a father and his eldest son. Compute the correlation coefficient between the heights of fathers and sons using Karl Pearson's method.
Height of father   65   66   67   67   68   69   70   72
Height of son      67   68   65   68   72   72   69   71
Sol:
Let x denote the height of the father and y the height of the son. The data are on the ratio scale.
We use Karl Pearson's method.
$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$
        x_i    y_i    x_i^2    y_i^2    x_i y_i
        65     67     4225     4489     4355
        66     68     4356     4624     4488
        67     65     4489     4225     4355
        67     68     4489     4624     4556
        68     72     4624     5184     4896
        69     72     4761     5184     4968
        70     69     4900     4761     4830
        72     71     5184     5041     5112
Total   544    552    37028    38132    37560
$$r = \frac{8 \times 37560 - 544 \times 552}{\sqrt{8 \times 37028 - (544)^2}\,\sqrt{8 \times 38132 - (552)^2}} = 0.603$$
The heights of fathers and sons are positively correlated. On average, if the father is tall the son is likely to be tall, and if the father is short the son is likely to be short.
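The column totals and the final value can be reproduced with a short Python sketch that follows the raw-score formula step by step. The variable names are chosen here for illustration only.

```python
from math import sqrt

father = [65, 66, 67, 67, 68, 69, 70, 72]   # x_i
son = [67, 68, 65, 68, 72, 72, 69, 71]      # y_i
n = len(father)

# Column totals of the working table
sum_x = sum(father)
sum_y = sum(son)
sum_x2 = sum(x * x for x in father)
sum_y2 = sum(y * y for y in son)
sum_xy = sum(x * y for x, y in zip(father, son))
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)

# Karl Pearson's coefficient, raw-score form
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(round(r, 3))  # 0.603
```

Here the numerator is 192 and the two terms under the square roots are 288 and 352, so r = 192 / √(288 × 352).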
Q.5. Calculate the Spearman's rank correlation coefficient for the following data.
Candidates         1    2    3    4    5
Marks in Tamil     75   40   52   65   60
Marks in English   25   42   35   29   33
Sol:
Tamil marks   Rank (R_1i)   English marks   Rank (R_2i)   D_i = R_1i − R_2i   D_i^2
75            1             25              5             −4                  16
40            5             42              1             4                   16
52            4             35              2             2                   4
65            2             29              4             −2                  4
60            3             33              3             0                   0
$\sum_{i=1}^{n} D_i^2 = 40$ and $n = 5$
$$\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$$
$$= 1 - \frac{6 \times 40}{5(5^2 - 1)} = 1 - \frac{240}{5(24)} = -1$$
Interpretation: The perfect negative rank correlation of −1 indicates that the rankings in the two subjects are exactly reversed. The student who is best in Tamil is the weakest in English, and vice versa.
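The ranking and the final value can be reproduced with a minimal Python sketch. The helper names `ranks_descending` and `spearman_rho` are assumptions introduced for illustration, and the simple ranking used here assumes there are no tied marks, as is the case in this data.

```python
def ranks_descending(values):
    """Rank 1 goes to the highest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rank correlation for data without tied ranks."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

tamil = [75, 40, 52, 65, 60]
english = [25, 42, 35, 29, 33]

print(ranks_descending(tamil))        # [1, 5, 4, 2, 3]
print(ranks_descending(english))      # [5, 1, 2, 4, 3]
print(spearman_rho(tamil, english))   # -1.0
```

With tied marks, the tied items would normally share an average rank and this shortcut formula would no longer be exact, so the sketch should not be reused unchanged for such data.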
Conclusion:
Correlation describes the relationship between two variables and expresses its direction and strength. It says nothing about a cause-and-effect relationship between the variables. Correlation can be positive or negative, or it can be absent. Scatter plots, Karl Pearson's coefficient of correlation, and Spearman's rank correlation are the most commonly used methods for studying correlation. The degree of correlation is expressed by the value of r. If r = +1, the variables have a perfect positive correlation; if r = −1, they have a perfect negative correlation; and if r = 0, there is no linear correlation between them.