STA 111 Lab 2: Data Visualization

Complete all Questions and submit your final PDF or HTML file under Assignments in Canvas.

Submission Set Up

Create a Google Doc or Microsoft Word document for your responses. You will be answering each of the Questions in the boxes below. When you are done, submit your document as a PDF!

The Data Set

We have been learning about visualizing and describing data. Today, we are going to put that into practice and see how we can create the visualizations we have been talking about!

The first thing we need to start an analysis is raw data. Our data set for today contains \(n=1442\) rows of data collected from university students who wore a FitBit 3 device that recorded information on student sleep, stress, and motion.

The data set can be found on Canvas and is also linked here: https://www.dropbox.com/scl/fi/vj2cr3xdjo4gces1273d1/StressStudy.csv?rlkey=napepcy0dpo56ysw4xpeskpkd&st=9a1o84vi&dl=0

Question 1

How many variables are in this data set? Hint: You can answer this by clicking on the link above and looking at the raw data sheet.

Question 2

Classify the variables in the data set as either numeric or categorical.

Question 3

What do you think each row in the data set represents?

So far, we have answered all the lab questions just by looking at the data set ourselves. For the rest of the lab, we will need to use computing to help us analyze the data set. At this point, you will need to download this data set onto your computer in order to complete the rest of the lab.
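For readers curious what "using computing" can look like outside of a point-and-click tool, here is a minimal Python sketch that counts the columns (variables) and rows in a CSV file. It is not part of the lab. The sample text is made up so the example runs on its own: only the column name stress_score comes from the lab, and the other column names and all of the values are hypothetical.

```python
import csv
import io

# Made-up sample standing in for the real file. Only stress_score is a
# column name taken from the lab; the other names and all values are
# hypothetical.
SAMPLE_CSV = """student_id,stress_score,total_steps
A,78,10432
B,81,8210
C,74,12950
"""


def summarize_csv(handle):
    """Return (list of column names, number of data rows) for an open CSV."""
    reader = csv.DictReader(handle)
    rows = list(reader)  # reading the rows also loads the header
    return reader.fieldnames, len(rows)


columns, n_rows = summarize_csv(io.StringIO(SAMPLE_CSV))
print("variables:", columns)
print("rows:", n_rows)
```

To try the same function on the downloaded file, you could open it with open("StressStudy.csv", newline="") and pass the resulting file handle to summarize_csv, assuming the file sits in the folder you run Python from.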

Getting Started with StatKey

Once we have the data, we need to use some sort of statistical software to help us create visualizations of the data. For today, we are going to use a free online tool called StatKey.

To access the tool, go to the following website: https://www.lock5stat.com/StatKey/descriptive_1_quant/descriptive_1_quant.html

You should see something that looks like this:

None of this is useful to us at this point, because it has nothing to do with our data on student stress, sleep, and motion. Luckily, we can upload our data set into the tool. To do this, look above the plot and find “Upload File”.

Navigate to the StressStudy.csv data set you have downloaded onto your computer and upload it! For most of you, the data set is likely in your downloads folder. If you need help, let me know!

Once you have successfully uploaded the file, you should see something like this:

At this point, you are ready to begin!! We will use StatKey again in other labs, and the same steps will be used to upload data sets.

Stress Score

One variable in the data set is a stress score. This is a score measured by the FitBit device each day for each student. Our first research question: What does the distribution of stress scores look like?

We are now interested in looking at one specific column. To choose that column in the data set, just click on stress_score in the data set and then click OK.

You will now be at a screen that gives you 3 options for plots: dot plot, histogram, box plot. What is showing on your screen is a dot plot.

Question 4

Typically, we make a histogram or a dot plot, not both. In this case, a histogram makes more sense. Why is that?

Since a histogram is what we want, change the plot so you are looking at a histogram of stress. Histograms allow us to look at the distribution of a numeric variable. Remember that a distribution just means what values are possible for a particular variable and how often those different values occur.
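To make the idea of "how often different values occur" concrete, here is a small Python sketch of the counting that sits behind a histogram. The scores, the starting value, and the bin width are all made up for illustration, and the rule that each bin includes its left edge but not its right edge is an assumption; StatKey may treat bin edges differently.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [left, left + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which bin this value lands in
        if 0 <= index < n_bins:            # ignore values outside the bins
            counts[index] += 1
    return counts


# Made-up scores, NOT taken from the StressStudy data.
scores = [70, 72, 73, 75, 75, 76, 78, 79, 80, 80, 81, 84]
counts = histogram_counts(scores, start=68, width=4, n_bins=5)

# Print a rough text histogram: one star per score in the bin.
for i, c in enumerate(counts):
    left = 68 + 4 * i
    print(f"{left}-{left + 4}: {'*' * c}")
```

Each row of stars plays the role of one bar: the taller the bar, the more scores landed in that bin.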

Question 5

How many bins (buckets) are in the histogram when you first look at it?

Question 6

How many stress scores are between 72 and 76? Hint: The tool can give you this information! Let me know if you get stuck finding it.

Question 7

Which bin has the most stress scores in it?

Question 8

One of the things we have to do with a histogram is decide on how many bins (buckets) we want to use. The goal is to use enough so that we can see the distribution, but not so many that it is hard to read.

Play with the histogram and decide whether you would recommend (a) one bin, (b) 4 bins, (c) 19 bins, or (d) 40 bins. Explain your choice.

Note: There is no specific right answer to this. We want to see how you are thinking!

Question 9

Based on your choice in Question 8, describe the distribution of stress scores. Remember, there are two things we comment on. Make sure your answer includes both!

Question 10

Which measure of center would you use to describe stress scores? Explain your choice and state the numeric value of your chosen measure of center.

Question 11

Which measure of spread would you use to describe stress scores: IQR or standard deviation? Explain your choice, and state AND interpret the numeric value of your chosen measure of spread.

Box plots

Histograms are very useful, but they are not the only tool that we use to visualize the distribution of a numeric variable. Another tool we use is a boxplot, which visualizes the center and spread of a distribution quite differently from a histogram. Specifically, boxplots show the first quartile, median, and third quartile of a variable. Box plots also make it easier to see outliers, i.e., unusually large or small values of the variable.
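The pieces of a boxplot can also be computed directly. The Python sketch below finds the quartiles of a made-up list of values and flags possible outliers. Two things here are assumptions rather than part of the lab: the outlier cutoff of 1.5 times the IQR beyond the quartiles is a common convention, and the "inclusive" quartile method is one of several in use, so the numbers a tool such as StatKey reports may differ slightly.

```python
import statistics

# Made-up values, NOT taken from the StressStudy data.
values = [62, 70, 72, 74, 75, 76, 78, 80, 82, 84, 99]

# Quartiles split the sorted data into four equal parts. The "inclusive"
# method is one of several conventions for computing them.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # width of the box in a boxplot

# Assumed convention: flag points more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

In this made-up example only the largest value is flagged. The smallest value looks low but still falls inside the lower fence, so it would be drawn as the end of the whisker rather than as a separate point.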

Question 12

Which measure of center is depicted in a boxplot: the mean or the median?

Question 13

Change to the Box plot tab and create a box plot for stress. Based on the boxplot, are there any outliers? If so, is there one outlier, just a handful of outliers, or many outliers? State whether these outliers are abnormally large, abnormally small, or if both types of outliers are present.

Question 14

Create a box plot for total number of steps. Based on the boxplot, are there any outliers? If so, state the values of these outliers.

Hint: You can hover over them to get the exact values!

Question 15

What is the IQR of stress in this data set? What does this tell us in words?

Question 16

We have now seen two different visualizations of the distribution of stress scores. What pieces of information about the distribution of stress scores are provided in the histogram but not the boxplot, and vice versa?

Practical Exercise

What we have done so far is a step-by-step walk-through of an analysis. This is useful for helping us practice concepts and review what we have done thus far. However, when we do this in practice, the process is less structured.

Question 17

Choose either Total Steps or Sleep Score as your variable of interest. Briefly explain why you found this variable more interesting.

Question 18

Take a screenshot of the histogram and box plot of your chosen variable. Using these visuals, describe the distribution of your chosen variable. Explain to an interested student what your analysis suggests about the typical value of your chosen variable, as well as how the values spread out around that typical value.

Submitting

  • Make sure your name is on the document, along with the name of your partner if you worked with a partner.
  • Make sure you run spell check.
  • Convert your document to a PDF and submit on Canvas!

References

Creative Commons License
This work was created by Nicole Dalzell and is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Last updated 2026 June 5.

The data set used in this lab is from:

“A Dataset of University Students’ Stress and Anxiety Levels based on Questionnaires and Wearable Sensors”, Enrique Garcia-Ceja, Joanna Alvarado-Uribe, Ponciano Jorge Escamilla-Ambrosio, Adriana Lara, Alma Mena-Martinez, Gina Gallegos-Garcia, Miguel Gonzalez-Mendoza, Raul Monroy, Gilberto Martinez Luna, Juan Manuel Fernández-Cárdenas (2026).