Topic 2: Descriptive Statistics

🎧 Online students

Throughout the computer lab question sheets, you will see emojis and/or collapsible sections like this one. Each emoji has a particular meaning and will sometimes be associated with additional instructions:

Prompts for you

💬 Write your answer in the chat.

Modes at different times during the lab

🏡 Main room. All together in the main room. Your computer lab demonstrator will be presenting information or facilitating class discussion.

💡 Breakout rooms. Your computer lab demonstrator will pick a random date, and the person whose birthday is closest to that date shares their screen or whiteboard. Here you will discuss a question together and bring your group’s answer back to the main room.

💻 Focus mode. You will still be in the main room, but working independently. All students will be sharing their screens during this time so that your computer lab demonstrator (but not other students) can see them.

🏫 Face-to-face (blended) students

Throughout the computer lab question sheets, you will see emojis and/or collapsible sections like this one. You can ignore the emojis and collapsible sections, as they contain information relevant to students who are studying online.

In Topic 2, we focused on descriptive statistics. In this computer lab, we will practise describing real data using numerical and graphical summaries.

After working through the questions in this computer lab, you will be ready to complete Quiz 3. If you have time during today’s lab, you may like to work on the quiz.

1 Descriptive statistics and plots for a single variable

For this question, we will be assessing happiness data published in the World Happiness Report (WHR) of the Sustainable Development Solutions Network (Gapminder 2021a).

This data has been collected between the years 2005 and 2019 for 163 countries. Each year, a happiness score from 0 to 10 is assigned to each country, based on the national average response to the Cantril life ladder question, summarised here:

Imagine a ladder with rungs from 0, being the worst possible life for you, up to 10, representing the best possible life for you.
On which rung do you feel you stand at this point in time?

For convenience, this score has been rescaled to run from 0 to 100 in our data set (that is, each score has been multiplied by 10). Let’s take a look at the results for Australia over the years by looking at the following R output:

##     country happiness_2005 happiness_2006 happiness_2007 happiness_2008
## 6 Australia           73.4             NA           72.9           72.5
##   happiness_2009 happiness_2010 happiness_2011 happiness_2012 happiness_2013
## 6             NA           74.5           74.1             72           73.6
##   happiness_2014 happiness_2015 happiness_2016 happiness_2017 happiness_2018
## 6           72.9           73.1           72.5           72.6           71.8
##   happiness_2019
## 6           72.2

We can see that in 2019, Australia’s happiness score was 72.2. You’ll notice that the happiness score seems fairly consistent across the years, although there are some years with missing data, denoted by NA.
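As a rough check of the "fairly consistent" observation, the scores in the R output above can be summarised while skipping the missing years. The following is a minimal Python sketch only: Python is not used in this lab, the variable names are illustrative, and `None` stands in for R's `NA`.

```python
from statistics import mean

# Australia's happiness scores for 2005-2019, copied from the R output above.
# None stands in for R's NA (a missing value).
years = list(range(2005, 2020))
scores = [73.4, None, 72.9, 72.5, None, 74.5, 74.1, 72, 73.6,
          72.9, 73.1, 72.5, 72.6, 71.8, 72.2]

# Separate the years with an observed score from the years without one.
observed = [s for s in scores if s is not None]
missing_years = [year for year, s in zip(years, scores) if s is None]

lowest = min(observed)
highest = max(observed)

print("Years with missing data:", missing_years)
print("Number of observed scores:", len(observed))
print("Lowest and highest score:", lowest, highest)
print("Range:", round(highest - lowest, 1))
print("Mean of observed scores:", round(mean(observed), 1))
```

The missing years are dropped before any statistic is calculated, which is why the count of observed scores is smaller than the number of years listed.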

Suppose that we want to assess the 2019 happiness scores for countries around the world, and compare them to Australia’s score.

1.1

💻 The happiness_income_2019.csv file in the Week 3 tile in LMS contains data on the 163 surveyed countries for 2019. Download this file now, and save it in a relevant location on your PC.

Once you have done so, import the happiness_income_2019.csv file into jamovi. For revision on how to do this, see Computer Lab 1.

1.2

💻 In jamovi, create a descriptives table for the happiness_2019 variable that includes the following:

N
Missing
Mean 💬
Median 💬
Standard deviation
Variance
IQR 💬
Skewness
Minimum 💬
Maximum 💬
Quartiles (i.e. 25th, 50th and 75th percentiles) 💬
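If you would like to see how most of the statistics in this list are calculated, the following minimal Python sketch works through them on a small made-up data set. It is an illustration only: the numbers are hypothetical (they are not the lab data), skewness is left out, and the quartiles use linear interpolation, which is an assumption about the convention rather than a statement of how jamovi computes them.

```python
import statistics as st

# Hypothetical scores for illustration only (NOT the lab data).
# None marks a missing value, like NA in R or an empty cell in jamovi.
raw = [42.0, 55.5, None, 61.0, 48.5, 72.0, 66.5, 58.0, None, 51.0]

values = sorted(x for x in raw if x is not None)

n = len(values)               # N: number of observed values
n_missing = len(raw) - n      # Missing: number of missing values

mean = st.mean(values)
median = st.median(values)
sd = st.stdev(values)           # sample standard deviation (n - 1 divisor)
variance = st.variance(values)  # sample variance, equal to sd squared
minimum, maximum = values[0], values[-1]

# Quartiles by linear interpolation between sorted values. Software packages
# differ in their quantile conventions, so small discrepancies are possible.
q1, q2, q3 = st.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                   # IQR: distance between the 75th and 25th percentiles

print("N:", n, "Missing:", n_missing)
print("Mean:", mean, "Median:", median)
print("SD:", round(sd, 2), "Variance:", round(variance, 2))
print("Min:", minimum, "Max:", maximum)
print("Quartiles:", q1, q2, q3, "IQR:", iqr)
```

Note that the 50th percentile and the median are the same number, and that the IQR is obtained directly from the quartiles.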

1.3

🏡

What is the mean happiness score, and how does this compare to Australia’s happiness score?
What is the median? Can you explain what this result means?
Considering the minimum, maximum, as well as the \(25\%\), \(50\%\) and \(75\%\) quantiles, what do these values tell you about the spread of the data? Which quartile does Australia lie in?
What is the IQR? What can you infer from this value?
Add one more number to the ‘Percentiles’ box and, by trial and error, find the percentile that corresponds to Australia’s happiness score in 2019. Are you surprised by the result?
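The trial-and-error search for a percentile can also be automated. The sketch below is a hedged illustration in Python on made-up scores (not the lab data): the `percentile` helper and the `target` value are hypothetical, and the linear interpolation rule is an assumed convention.

```python
def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between sorted values."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Hypothetical scores for illustration only (NOT the lab data).
scores = sorted([42.0, 48.5, 51.0, 55.5, 58.0, 61.0, 66.5, 72.0, 74.5, 80.0, 83.0])
target = 72.0  # the score whose percentile we want to locate

# Automated "trial and error": try every whole-number percentile and keep the
# one whose value lands closest to the target score.
best_p = min(range(0, 101), key=lambda p: abs(percentile(scores, p) - target))

print("Closest whole-number percentile:", best_p)
print("Value at that percentile:", percentile(scores, best_p))
```

In jamovi you would do the same thing by hand: enter a candidate percentile, compare the reported value with the target score, and adjust up or down.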

1.4

💻 In addition to the happiness scores, the happiness_income_2019.csv data set also contains the average income per person for each country, i.e. the GDP (gross domestic product) per person, adjusted for purchasing power differences (Gapminder 2021b).

So far, we have found that happiness scores differ quite markedly between countries, with Australia enjoying a higher average level of happiness than most other countries in 2019. Suppose that we are now interested in determining whether or not there is a relationship between a country’s GDP per person and its citizens’ average happiness.

Add the income_2019 variable to the table you created in Question 1.2, so that the same list of descriptive statistics is now displayed for both variables in the same table.

1.5

💻 Create the following plots for both the happiness_2019 and income_2019 variables:

Histogram (without density)
Boxplot with violin plot added (but without data and mean)

1.6

💻 Considering both variables one at a time, answer the following question: Based on what you can tell about how skewed vs. symmetric the data is, what measure of spread and what measure of location do you think would be best to use? Make sure to provide a justification for your choices. 💬
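One way to build intuition for this question is to see how a single extreme value affects different summaries. The following minimal Python sketch uses two made-up data sets (not the lab data) that differ only in their largest value; the `summarise` helper is hypothetical and the quartile convention is an assumption.

```python
import statistics as st

# Hypothetical incomes (in thousands) for illustration only (NOT the lab data).
symmetric = [20, 30, 40, 50, 60, 70, 80]
right_skewed = [20, 30, 40, 50, 60, 70, 800]  # same data, one extreme value

def summarise(values):
    """Return two measures of location and two measures of spread."""
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive")
    return {
        "mean": round(st.mean(values), 1),
        "median": st.median(values),
        "sd": round(st.stdev(values), 1),
        "iqr": q3 - q1,
    }

print("Symmetric:   ", summarise(symmetric))
print("Right-skewed:", summarise(right_skewed))
```

When you run it, compare which summaries move between the two data sets and which stay put, then relate that to the shapes you see in your histograms and boxplots.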

2 Descriptive statistics and plots to assess the relationship between two variables

2.1

🏡 Although jamovi does not give us the option to calculate the covariance, consider the R output below, which gives the covariance between the income_2019 and happiness_2019 variables:

## [1] 157188

What does this result tell us about the relationship between these two variables? 💬

2.2

💻 While the covariance value is helpful, it is hard to interpret. Typically, it is more beneficial to calculate the correlation coefficient for two variables. Using jamovi, calculate the correlation coefficient for the GDP per capita and the happiness score for countries in 2019.

What is the correlation between the two variables? 💬 Does the result seem reasonable? How would you describe the correlation in terms of strength?

Note: There is actually more than one way to calculate the correlation, depending on what type of data we are assessing. We will discuss this later on in the semester, but for now it is sufficient to use the default method, the Pearson correlation coefficient.
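To see why the covariance is hard to interpret on its own, it helps to change the units of one variable. The minimal Python sketch below uses made-up values (not the lab data); the `covariance` and `correlation` helpers are written out by hand for illustration and use the usual sample formulas with an n - 1 divisor.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance with an n - 1 divisor."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical values for illustration only (NOT the lab data).
income_dollars = [1000, 5000, 12000, 30000, 50000]
happiness = [40.0, 48.0, 55.0, 66.0, 71.0]

# The same incomes expressed in thousands of dollars.
income_thousands = [v / 1000 for v in income_dollars]

print("Covariance (dollars):   ", round(covariance(income_dollars, happiness), 2))
print("Covariance (thousands): ", round(covariance(income_thousands, happiness), 2))
print("Correlation (dollars):  ", round(correlation(income_dollars, happiness), 3))
print("Correlation (thousands):", round(correlation(income_thousands, happiness), 3))
```

The covariance changes by a factor of 1000 when the income units change, while the correlation is unaffected and always lies between -1 and 1.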

2.3

💻 To help visualise the data, create a scatter plot of the 2019 variables using jamovi. Make sure to consider which axis each variable should be plotted on.

Are you surprised by anything shown in the graph?

Hints: If you are not sure which variable should be used for each axis, remember that the variable listed on the \(y\)-axis is (generally speaking) reliant to some extent upon the variable listed on the \(x\)-axis.

Since we are assessing income and happiness, do you think it would be more reasonable to say income is reliant on happiness, or that happiness is reliant on income? (Of course it is more nuanced than this, but for the purposes of this question, these are the only two variables under consideration.)

If you are still not quite sure, or would like to check if you are on the right track, you can also refer back to the Topic 2 section on scatter plots.

3 Extension

If you have completed all the questions above, and would like some extra practice, try the following questions:

3.1

Download the happiness_income_2015.csv file from LMS, save it in a relevant location on your PC, and load it into jamovi.

3.2

Repeat Question 1 for the 2015 data. Are your results similar to those for 2019?

3.3

Repeat Question 2 for the 2015 data. Are your results similar to those for 2019?

That’s everything for today! If you still have time, you may like to have a go at Quiz 3, which is based on the Topic 3 readings.

Before you finish up, remember to save your work (e.g. your jamovi and Word files) somewhere safe (e.g. OneDrive) so that you can access it at a later time.

References

Gapminder. 2021a. “Happiness Score (WHR) [.csv File].” 2021. http://gapm.io/dhapiscore_whr.

———. 2021b. “Income Per Person [.csv File].” 2021. http://gapm.io/dgdppc.

These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

STM1001: Computer Lab 3 (jamovi)