STA 111 Lab 3

Complete all questions, and submit your final document in PDF form on Canvas.

Goal

In the last lab, we focused on visualizations as well as summary statistics for one numeric variable. Today, we are going to use visualizations and numeric measures to describe the relationship between two numeric variables. The primary visualization we will use is a scatter plot, and the primary numeric measure we will use is called correlation. We will also see how to create a least squares linear regression line. This is another tool we use to describe the relationship between two numeric variables.

The Data

We are going to continue to work on the data set on student stress and sleep from our last lab. Recall that the data were collected from \(n=1442\) university students who wore a Fit Bit 3 device that recorded information on student sleep, stress, and motion.

The data set can be found on Canvas and is also linked here: https://www.dropbox.com/scl/fi/vj2cr3xdjo4gces1273d1/StressStudy.csv?rlkey=napepcy0dpo56ysw4xpeskpkd&st=9a1o84vi&dl=0

As we did in Lab 2, we are going to use StatKey to do our analysis. However, we will use a different StatKey tool today: https://www.lock5stat.com/StatKey/descriptive_2_quant/descriptive_2_quant.html

Graphing the Relationship between two numeric variables

Suppose a counselor from a University Counseling Center asks you whether the quality of sleep a student gets each night has a strong relationship with the stress score recorded by the Fit Bit. The client is specifically interested in having you use a linear regression model to describe the relationship between X = sleep score and Y = stress score. Remember that response variables are usually denoted by a capital letter Y, and explanatory variables are usually denoted by a capital letter X.

To complete this task, we need to

  1. Determine if using linear regression is appropriate and
  2. If so, build and interpret a linear regression line.

We need to start with (1), because it will not make sense to use linear regression if the relationship between X and Y is not the right shape!

Question 1

What shape does the relationship between X and Y need to be in order for us to reasonably use linear regression?

In order to check the shape, we need to create a visualization that explores the relationship between sleep score and stress score.

Question 2

What type of plot would you use to determine the shape of the relationship between X = sleep score and Y = stress score?

Now that we know what type of plot we need, let’s make it! Once you upload your data, you should see this on your screen:

Click the column for sleep score first, and then the column for stress. The order you click tells the applet which variable is X and which is Y.

Question 3

Take a screenshot of your plot and include it here as your answer to this question.

Creating the plot is an excellent first step to answering the client’s question, but once we have created the plot we then have to be able to interpret what this plot is telling us. Generally, when we are comparing two numeric variables using this type of plot, we are interested in four things:

  1. Does the relationship seem linear?
  2. Is the relationship positive, negative, or neither?
  3. Are there any points that seem not to follow the trend of the rest of the data? In other words, are there any visually evident outliers?
  4. Does the relationship between these variables seem to be strong or weak?

There is a reason why we are interested in these specific questions. The first question (Does the relationship seem linear?) tells us whether or not using a regression line makes sense for these two variables. All models have assumptions. A big one for regression lines is that the relationship between the two variables we are modeling should be able to be described using a line. In other words, the relationship should look linear. If it does not, we should not use a regression line.

Question 4

Does it seem reasonable to consider a regression line to describe the relationship between X = sleep score and Y = stress score? Explain.

The second question (Is the relationship positive, negative, or neither?) tells us what sort of slope to expect when we fit a regression line to the data. Do we expect a positive slope? A negative one? This intuition will allow us to check our line once we fit it. Do the results of the line make sense with what the plot is telling us?

Question 5

Is the relationship positive or negative?

The third question (Are there any points that seem not to follow the trend of the rest of the data?) helps with identifying outliers that can strongly impact the fit of a line.
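To see why a single unusual point matters, here is a small optional sketch in Python. It is not part of the StatKey workflow, and the numbers are made up for illustration (they are not from the stress study). The helper name `least_squares_slope` is our own. The sketch computes the slope of a least squares line with and without one point that does not follow the trend of the others.

```python
def least_squares_slope(xs, ys):
    """Slope of the least squares line for equal-length lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up values for illustration only (NOT from the stress study)
x_vals = [1, 2, 3, 4, 5]
y_vals = [2, 4, 5, 4, 5]

slope_without = least_squares_slope(x_vals, y_vals)

# Add one hypothetical point that sits far from the trend of the others
slope_with = least_squares_slope(x_vals + [6], y_vals + [0])

print(f"Slope without the unusual point: {slope_without:.2f}")
print(f"Slope with the unusual point:    {slope_with:.2f}")
```

In this toy example, one added point is enough to change the slope from positive to negative. With a data set as large as ours, a single point will usually have far less pull, but the idea is the same.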

Question 6

Do you see any visually evident outliers? If so, state the coordinates of the outlier(s). Hint: you can get the coordinates by hovering your mouse over a point.

NOTE: There is not a specific correct answer to this question. We just want to see how you are thinking!

The final question asks about the strength of the relationship between the two variables. This is important because a regression line like the one the client has asked us for uses X to describe the variation in Y. In other words, we are hoping to be able to say something like "as X goes up by 1, Y generally goes up by 2.5." If the relationship between X and Y is not very strong, then the line may not be a good tool for describing the relationship.

It can be difficult to assess the strength of a linear relationship just by looking at a plot. Luckily, we have learned that a numerical way to quantify the strength of a relationship is with the correlation.
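StatKey reports the correlation for us, but for the curious, here is a minimal Python sketch of how a correlation can be computed by hand. We assume the usual Pearson correlation here. The function name `correlation` and the data values are our own illustration and are not from the stress study.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much each varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only (NOT from the stress study)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

r = correlation(x_demo, y_demo)
print(f"r = {r:.3f}")
```

The result is always between -1 and 1. Values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one.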

Question 7

What is the correlation between X = sleep score and Y = stress score? Does this suggest (a) a strong relationship, (b) a weak relationship, or (c) no relationship?

Hint: The information you need is on the right-hand side of your plot.

Based on all of this, regression seems reasonable! We do not need a strong fit to use regression; we just need a line to look like a reasonable choice.

Fitting the regression line

Once we have determined that regression is an appropriate choice, we can move into the actual process of estimating our line. The question then becomes… which line?
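The least squares line is the line that makes the sum of the squared vertical distances between the points and the line as small as possible. StatKey does this calculation for us. As an optional illustration, here is a short Python sketch of the same idea. The function name `least_squares_line` and the data are made up for this example and are not from the stress study.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up values for illustration only (NOT from the stress study)
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x_toy, y_toy)
print(f"Y-hat = {b0:.2f} + {b1:.2f} * X")
```

For these toy values the slope is 0.6, so we would say that as X goes up by 1, the predicted Y goes up by 0.6.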

Question 8

Write down the regression line.

Hint: The information you need is on the right-hand side of your plot.

Question 9

Think back to your plot. Does it make sense that the slope coefficient is positive? Why or why not?

Question 10

Interpret the slope in the context of the data.

Hint: The units for both X and Y are points.

Question 11

Do you think the slope represents a practically important change in stress score? In other words, is this change large enough to make a sizable impact on student stress? Explain your reasoning.

Question 12

Is the intercept an extrapolation in this case? Why or why not?

Making Predictions
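
Before working through the questions in this section, it may help to see the mechanics of a prediction and a residual. The Python sketch below is optional. The intercept, slope, and observation are hypothetical values chosen for illustration, not the ones StatKey reports for our data, and the function names `predict` and `residual` are our own.

```python
def predict(intercept, slope, x):
    """Predicted Y for a given X using the line Y-hat = intercept + slope * X."""
    return intercept + slope * x

def residual(observed_y, predicted_y):
    """Residual = observed Y minus predicted Y."""
    return observed_y - predicted_y

# Hypothetical line and observation (NOT the values from the stress study)
toy_intercept = 2.2
toy_slope = 0.6
x_new = 4
y_observed = 4

y_hat = predict(toy_intercept, toy_slope, x_new)
res = residual(y_observed, y_hat)

print(f"Predicted Y: {y_hat:.2f}")
print(f"Residual:    {res:.2f}")
```

A negative residual means the observed value is below the line, so the line over-predicts for that point. A positive residual means the observed value is above the line.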

Question 13

Using our line, what stress score would you predict for a student with a sleep score of 52?

Question 14

According to the plot, there is only one student with a sleep score of 52. What is the stress score for that student?

Question 15

  1. What is the residual for this student with a sleep score of 52? Show your work.

  2. Interpret the residual from part 1.

Question 16

Recall that we have a client who works in a counseling center and is interested in the relationship between student sleep and stress. Based on these data, what would you tell them? Make sure to comment on how confident you are in the conclusions based on the data that we have and the analysis we have run.

Creative Commons License
This work was created by Nicole Dalzell and is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Last updated 2026 June 6.

The data set used in this lab is from:

“A Dataset of University Students’ Stress and Anxiety Levels based on Questionnaires and Wearable Sensors”, Enrique Garcia-Ceja, Joanna Alvarado-Uribe, Ponciano Jorge Escamilla-Ambrosio, Adriana Lara, Alma Mena-Martinez, Gina Gallegos-Garcia, Miguel Gonzalez-Mendoza, Raul Monroy, Gilberto Martinez Luna, Juan Manuel Fernández-Cárdenas (2026).