Correlation

Author

sai

Abstract:

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense “correlation” may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the heights of parents and their offspring, and the correlation between the price of a good and the quantity that consumers are willing to purchase, as depicted in the so-called demand curve.

Introduction:

Correlation means connection. Correlation analysis studies the relationship or connection between two or more variables. Two variables are said to be correlated if they vary in such a way that changes in one variable accompany changes in the other.

Example: The relationship between the heights and weights of the students in a class, or between a family's earnings and the amount it spends each month.

Correlation is a statistical tool used to measure how strongly two or more variables are related.

Example: As summer approaches, the atmospheric temperature rises, so people tend to travel to hill stations to enjoy the cooler weather, and the hill stations get crowded. Similarly, sales of ice cream, cool drinks, and fruits like watermelon increase during this period.

What does Correlation Measure?

Correlation is a means of systematically examining such relationships or associations. Although correlation measures the direction and degree of a relationship, it does not say anything about the cause-and-effect relationship between two or more variables.

Example: We know that the demand for a commodity and its price are closely related. But we cannot say from the correlation alone whether changes in demand are causing the changes in price or whether changes in price are causing the changes in demand.

Even though the cause-and-effect relationship cannot be established, we can conclude that the two variables, demand and price, are correlated.

Thus, correlation does not establish causation, that is, a cause-and-effect relationship.

Types of Correlation

There are three types of correlation between two variables.

  • Positive Correlation

  • Negative Correlation

  • No correlation

1. Positive Correlation

A positive correlation occurs when the values of two variables move in the same direction. In other words, an increase in one variable is accompanied by an increase in the other, and a decrease in one by a decrease in the other.

Examples:

  • When income falls, consumption also falls.

  • The sale of ice cream increases as the temperature increases.

2. Negative Correlation

A negative correlation occurs when the values of two variables move in opposite directions. In other words, an increase in one variable is accompanied by a decrease in the other, and vice versa. When the variable x increases, the variable y decreases.

    Examples:

    • When the price of mango increases, the demand for mango decreases.

    • As the price of mango drops, the demand for mango increases.

Formula:
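As a reference point, and assuming this heading refers to Karl Pearson's coefficient (the measure used in the solved examples of this article), the coefficient can be written in deviation form as below. The symbols \bar{X} and \bar{Y} for the sample means are notation introduced here; x and y are the deviations from those means.

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}
  = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}},
\qquad x = X - \bar{X},\quad y = Y - \bar{Y}.
```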

Real Life Applications

In statistics, correlation is a measure of the linear relationship between two variables. The value of a correlation coefficient always lies between -1 and 1, where:

      • -1 indicates a perfectly negative linear correlation between two variables

      • 0 indicates no linear correlation between two variables

      • 1 indicates a perfectly positive linear correlation between two variables
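The three reference values can be illustrated with a minimal Python sketch. The helper name `pearson_r` and the data are hypothetical, chosen only so that the coefficient lands on 1, -1 and 0. It assumes the usual Pearson definition of the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations from the mean of x
    dy = [y - mean_y for y in ys]          # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data chosen to hit the three reference values.
xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])    # y rises with x along a straight line
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])    # y falls as x rises along a straight line
r_zero = pearson_r(xs, [4, 1, 0, 1, 4])    # y = (x - 3)^2: related, but not linearly
print(r_pos, r_neg, r_zero)
```

The last case is worth noting: y depends entirely on x, yet the coefficient is 0, because the coefficient only measures the linear part of a relationship.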

        Solved Examples on Correlation

Q.1. Tom has started a new catering business. He first wants to analyse the cost of making sandwiches and the price at which he should sell them. He has gathered the information below after talking to various other cooks.

Number of Sandwiches   Cost of Bread   Cost of Vegetables   Total Cost
10                     100             30                   130
20                     200             60                   260
30                     300             90                   390
40                     400             120                  520

        Tom was convinced that there is a positive linear relationship between the number of sandwiches and the total cost of making them. Analyse if this statement is true.
        Sol:
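Plot the number of sandwiches prepared against the total cost of making them.

The points rise from left to right along a straight line, because the total cost is always 13 times the number of sandwiches. The relationship is therefore positive and linear, and Tom's statement is true.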
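Tom's claim can also be checked numerically. The following is a minimal Python sketch using the table from the question; the helper `pearson_r` is a hypothetical name and assumes the usual Pearson definition of the coefficient.

```python
import math

sandwiches = [10, 20, 30, 40]
bread_cost = [100, 200, 300, 400]
vegetable_cost = [30, 60, 90, 120]
total_cost = [b + v for b, v in zip(bread_cost, vegetable_cost)]

# If the cost per sandwich is identical in every row, the points
# (number, total cost) lie on one straight line through the origin.
cost_per_sandwich = [t / n for t, n in zip(total_cost, sandwiches)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient (deviation form)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(sandwiches, total_cost)
print(total_cost, cost_per_sandwich, round(r, 3))
```

A coefficient of 1 here confirms a perfect positive linear relationship between the number of sandwiches and the total cost.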

Q.2. Find the correlation coefficient between the data given below.

Roll No.   Marks in Subject A   Marks in Subject B
1          48                   45
2          35                   20
3          17                   40
4          23                   25

(Data for five students; the fifth row is not shown here.)

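Sol: Let the marks in Subject A be denoted by X and those in Subject B by Y, and let x = X − 34 and y = Y − 35 be their deviations from the respective means.

X      x = X − 34   x²     Y      y = Y − 35   y²     xy
48      14          196    45      10          100    140
35       1            1    20     −15          225    −15
17     −17          289    40       5           25    −85
23     −11          121    25     −10          100    110
(fifth row not shown; totals cover all five students)
Total               776                        550    280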

Mean of X:

\[
\bar{X} = \frac{\sum X}{5} = 34
\]

\[
r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{280}{\sqrt{776 \times 550}} = \frac{280}{653.3}
\]

\[
\therefore r = 0.429
\]

Since r = 0.429, there is a moderate positive correlation between the marks in Subjects A and B.
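The arithmetic above can be checked with a short Python sketch. Only four of the five students' marks appear in the table here, so the fifth pair (47, 45) is an assumption: it is the pair implied by the stated means of 34 and 35 over five students, and it reproduces the totals 776, 550 and 280 used in the solution.

```python
import math

# Four visible rows plus one inferred row.
# ASSUMPTION: the last pair (47, 45) is not shown in the table; it is
# inferred from the stated means (34 and 35) over five students.
marks_a = [48, 35, 17, 23, 47]
marks_b = [45, 20, 40, 25, 45]

n = len(marks_a)
mean_a = sum(marks_a) / n
mean_b = sum(marks_b) / n

# Deviations from the means (x and y in the solution).
x = [a - mean_a for a in marks_a]
y = [b - mean_b for b in marks_b]

sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(p * q for p, q in zip(x, y))

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(mean_a, mean_b, sum_x2, sum_y2, sum_xy, round(r, 3))
```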

Q.3. Draw a scatter diagram showing a positive correlation.
Sol:

When the points in the graph rise as we move from left to right, the scatter plot shows a positive correlation.
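As a rough illustration, the sketch below prints a small text-based scatter diagram in Python. The points are hypothetical values picked so that they rise from left to right; a plotting library would normally be used instead of this character grid.

```python
# Hypothetical (x, y) pairs that rise from left to right.
points = [(1, 1), (2, 2), (3, 2), (4, 3), (5, 4), (6, 4), (7, 5)]

max_x = max(x for x, _ in points)
max_y = max(y for _, y in points)
occupied = set(points)

rows = []
for y in range(max_y, 0, -1):  # print the largest y value on the top row
    cells = "".join("*" if (x, y) in occupied else " "
                    for x in range(1, max_x + 1))
    rows.append(f"{y} |{cells}")
rows.append("  +" + "-" * max_x)  # horizontal axis

plot = "\n".join(rows)
print(plot)
```

The stars drift from the bottom-left corner to the top-right corner, which is the visual signature of a positive correlation.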

Q.4. The following data give the heights (in inches) of fathers and their eldest sons. Compute the correlation coefficient between the heights of fathers and sons using Karl Pearson's method.

Height of father   65   66   67   67   68   69   70   72
Height of son      67   68   65   68   72   72   69   71

        Sol:

Let x denote the height of the father and y the height of the son. The data are on the ratio scale.

        We use Karl Pearson's method.

\[
r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}
\]

        x_i    y_i    x_i²     y_i²     x_i·y_i
        65     67     4225     4489     4355
        66     68     4356     4624     4488
        67     65     4489     4225     4355
        67     68     4489     4624     4556
        68     72     4624     5184     4896
        69     72     4761     5184     4968
        70     69     4900     4761     4830
        72     71     5184     5041     5112
Total   544    552    37028    38132    37560

\[
r = \frac{8 \times 37560 - 544 \times 552}{\sqrt{8 \times 37028 - (544)^2}\,\sqrt{8 \times 38132 - (552)^2}} = 0.603
\]

The heights of fathers and sons are positively correlated. On average, tall fathers tend to have tall sons, and short fathers tend to have short sons.
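The column totals and the final value can be reproduced with a short Python sketch of the raw-sum form of Karl Pearson's formula. The variable names are illustrative only.

```python
import math

father = [65, 66, 67, 67, 68, 69, 70, 72]
son = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(father)

# Column totals, as in the working table.
sum_x = sum(father)
sum_y = sum(son)
sum_x2 = sum(x * x for x in father)
sum_y2 = sum(y * y for y in son)
sum_xy = sum(x * y for x, y in zip(father, son))

# Raw-sum (computational) form of Pearson's coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy, round(r, 3))
```

The raw-sum form avoids computing the means first, which is why it suits hand calculation from a table of totals.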

Q.5. Calculate Spearman's rank correlation coefficient for the following data.

Candidate          1    2    3    4    5
Marks in Tamil     75   40   52   65   60
Marks in English   25   42   35   29   33

Sol:

Tamil marks   Rank (R1i)   English marks   Rank (R2i)   Di = R1i − R2i   Di²
75            1            25              5            −4               16
40            5            42              1             4               16
52            4            35              2             2                4
65            2            29              4            −2                4
60            3            33              3             0                0
Total                                                                    40

\[
\sum_{i=1}^{n} D_i^2 = 40 \quad \text{and} \quad n = 5
\]

\[
\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{5(5^2 - 1)} = 1 - \frac{240}{5(24)} = -1
\]

Interpretation: The perfect negative rank correlation of −1 indicates that the rankings in the two subjects are exactly reversed. The student who is best in Tamil is the weakest in English, and vice versa.
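The ranking and the final value can be reproduced with a minimal Python sketch. The helper `ranks_descending` is a hypothetical name; it assumes there are no tied marks, which holds for this data set, and it does not implement the tie correction that the formula would otherwise need.

```python
def ranks_descending(values):
    """Rank 1 goes to the highest value. Assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

tamil = [75, 40, 52, 65, 60]
english = [25, 42, 35, 29, 33]

r1 = ranks_descending(tamil)      # ranks in Tamil (R1i)
r2 = ranks_descending(english)    # ranks in English (R2i)
d = [a - b for a, b in zip(r1, r2)]
sum_d2 = sum(v * v for v in d)

n = len(tamil)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(r1, r2, d, sum_d2, rho)
```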

Conclusion:

Correlation describes the relationship between two variables. It expresses the direction and strength of that relationship, but says nothing about a cause-and-effect relationship between the variables. Correlation can be positive or negative, or it can be non-existent. Scatter plots, Karl Pearson's coefficient of correlation, and Spearman's rank correlation are the most commonly used methods for studying correlation. The degree of correlation is expressed by the value of r. If r = +1, the variables have a perfect positive correlation; if r = −1, they have a perfect negative correlation; and if r = 0, there is no linear correlation between the variables.
