---
title: "Lab 3 - Correlation, Regression, Scatterplots"
author: "Muhammad Sami (SAMM3D2203)"
date: "2023-02-18"
output:
  pdf_document: default
  html_document: default
---

## Learning Objectives

By the end of this lab, you should have a grasp of the following concepts:

* How to create a scatterplot in `R`
* How to calculate the sample correlation coefficient in `R`
* How to modify the units of a vector
* How to do linear regression in `R`
* How to plot a linear regression line on top of a scatterplot in `R`

## Instructions

To complete this worksheet, add code as needed into the R code chunks given below. **Do not** delete the question text. All text should be in complete English sentences. Be sure to change the author of this file to reflect your name and student number.

To properly see the questions, knit this .Rmd file to .pdf and view the output.

You will have a link in your email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it to .pdf and upload your output to Crowdmark.

\clearpage

# Exercises

Import the dataset `FHA`, which contains the Heights, Armspans, and Footlengths of a sample of Grade 12 students across the United States. All measurements are in centimeters.

```r
# Read the lab dataset from the local CSV file
FHA <- read.csv("C:/Users/msami/Downloads/FHA.csv")
```

Create a scatterplot comparing Height (X) to Armspan (Y), based on our `FHA` dataset. Use the `xlab`, `ylab`, and `main` arguments to set the x- and y-axis labels as well as the main title.

```r
plot(FHA$Height, FHA$Armspan,
     xlab = "Height (cm)",
     ylab = "Armspan (cm)",
     main = "Height vs Armspan")
```

Compare the relationship between Footlength (X) and Armspan (Y) with a scatterplot.

```r
plot(FHA$Footlength, FHA$Armspan)
```

Compare the previous two scatterplots. Do they both show a linear relationship? Is one relationship stronger than the other? If so, which one?

*Both scatterplots show a linear relationship. However, Height vs Armspan shows a stronger relationship than Footlength vs Armspan.*

**Exercise: Create a scatterplot comparing the students' foot lengths (X) to their heights (Y). Give appropriate labels for the x- and y-axes.**

```r
# Height, not Armspan, goes on the y-axis for this exercise
plot(FHA$Footlength, FHA$Height,
     xlab = "Footlength (cm)",
     ylab = "Height (cm)")
```

Calculate the correlation coefficient between Height and Armspan, as well as the correlation coefficient between Footlength and Armspan.

```r
cor(FHA$Height, FHA$Armspan)
```

```
## [1] 0.7899087
```

```r
cor(FHA$Footlength, FHA$Armspan)
```

```
## [1] 0.6132733
```

**Exercise: Calculate the correlation between students' foot lengths (X) and heights (Y).**

```r
cor(FHA$Footlength, FHA$Height)
```

```
## [1] 0.6857033
```

**Exercise: Without using any functions, or making any calculations, what would be the correlation between Foot Lengths and Heights, if height were instead measured in inches? (Recall: 1 inch = 2.54cm).**

*The correlation would stay the same (about 0.686). The correlation coefficient has no units, so converting height from centimeters to inches, which only rescales the values by a positive constant, does not change it.*

Find the least-squares regression equation for predicting Armspan (Y) from Height (X).

```r
lm(FHA$Armspan ~ FHA$Height)
```

```
## 
## Call:
## lm(formula = FHA$Armspan ~ FHA$Height)
## 
## Coefficients:
## (Intercept)   FHA$Height  
##      0.1431       0.9954
```

Find the least-squares regression equation for predicting Armspan from Footlength:

```r
lm(FHA$Armspan ~ FHA$Footlength)
```

```
## 
## Call:
## lm(formula = FHA$Armspan ~ FHA$Footlength)
## 
## Coefficients:
##    (Intercept)  FHA$Footlength  
##         92.254           3.109
```

Save the linear models you found earlier as objects.

```r
lm.HvA <- lm(FHA$Armspan ~ FHA$Height)      # Armspan predicted from Height
lm.FvA <- lm(FHA$Armspan ~ FHA$Footlength)  # Armspan predicted from Footlength
```

Using the linear models saved as objects, access the coefficients with the `$coefficients` suffix.
```r
lm.HvA$coefficients
```

```
## (Intercept)  FHA$Height 
##   0.1431401   0.9953772
```

```r
lm.FvA$coefficients
```

```
##    (Intercept) FHA$Footlength 
##       92.25401        3.10936
```

**Exercise: Determine the least-squares regression equation (i.e., print out the coefficients of the equation) for predicting Height from Foot Length.**

```r
lm.FvH <- lm(FHA$Height ~ FHA$Footlength)  # Height predicted from Footlength
lm.FvH$coefficients
```

```
##    (Intercept) FHA$Footlength 
##     101.642512       2.758941
```

Plot the regression line for predicting Armspan from Height on top of its corresponding scatterplot with the function `abline`. Change the colour of the regression line with `col` and the line width with `lwd`.

```r
plot(FHA$Height, FHA$Armspan,
     xlab = "Height (cm)",
     ylab = "Armspan (cm)",
     main = "Height vs Armspan")
abline(lm.HvA, col = "tomato2", lwd = 4)
```

Do the same for comparing Footlength (X) to Armspan (Y).

```r
plot(FHA$Footlength, FHA$Armspan,
     xlab = "Footlength (cm)",
     ylab = "Armspan (cm)",
     main = "Footlength vs Armspan")
abline(lm.FvA, col = "maroon", lwd = 4)
```

**Exercise: Reproduce the scatterplot you created, comparing Footlengths (X) to Height (Y). Overlay the least-squares regression line. Thicken the line and change the colour, to improve visibility.**

```r
plot(FHA$Footlength, FHA$Height,
     xlab = "Footlength (cm)",
     ylab = "Height (cm)",
     main = "Footlength vs Height")
abline(lm.FvH, col = "pink", lwd = 4)
```
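
The unit-change exercise earlier in this lab can also be checked numerically. The following is a minimal sketch, not part of the original worksheet. It uses simulated values rather than the `FHA` data, and the names `foot_cm`, `height_cm`, and `height_in` are made up for illustration. It converts a height vector from centimeters to inches and compares the correlation and the regression coefficients before and after the conversion.

```r
# Simulated data for illustration only; these are NOT the FHA measurements.
set.seed(1)
foot_cm   <- rnorm(50, mean = 25, sd = 2)
height_cm <- 100 + 2.8 * foot_cm + rnorm(50, sd = 5)

# Modify the units of a vector: 1 inch = 2.54 cm
height_in <- height_cm / 2.54

# The correlation is unit-free, so both versions should agree
r_cm <- cor(foot_cm, height_cm)
r_in <- cor(foot_cm, height_in)
all.equal(r_cm, r_in)

# The regression coefficients are NOT unit-free:
# rescaling Y by 1/2.54 rescales the intercept and slope by 1/2.54
b_cm <- coef(lm(height_cm ~ foot_cm))
b_in <- coef(lm(height_in ~ foot_cm))
all.equal(unname(b_in), unname(b_cm) / 2.54)
```

Both `all.equal` calls should return `TRUE`. With the lab data, the same check would presumably apply by replacing the simulated vectors with `FHA$Footlength` and `FHA$Height`.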