Lab 4

Author

Raphael Aseron

Use R to answer the following questions. Make sure that you show your R code and the output (results or graphs). You can do this by taking a screenshot and pasting it into your Google / Word document, or (for students wanting an extra challenge) working in a Quarto document.

Problem 1. World Happiness Problems.

Download the “Country Happiness” data, and import these data into R. Note that this is a slightly different dataset than the “World Happiness Data” that were in Chapter 4 and that you may have downloaded. Apologies for the confusion! Please trust that the professor has reasons and is doing his best :) Thank you for being on this spinning earth with me.

h <- read.csv("~/Downloads/DATASET_happy_data.csv", stringsAsFactors=TRUE)

Choose one variable from the dataset. Graph the variable (make the graph look nice), report the relevant descriptive statistics, and describe what you learn about this variable below the graph. Then, conclude with some other questions you might have about the variable. Make sure to be transparent about any data cleaning that you did (e.g., removing outliers).

hist(h$Child.Mortality, main = "Child Mortality", xlab = "Number of Deaths of Children (age < 5), per 100 Live Births")

mean(h$Child.Mortality, na.rm = TRUE)
[1] 2.378706
median(h$Child.Mortality, na.rm = TRUE)
[1] 1.328307
sd(h$Child.Mortality, na.rm = TRUE)
[1] 2.563054
range(h$Child.Mortality, na.rm = TRUE)
[1]  0.2069039 11.4788685

Based on the graph and descriptive statistics, child mortality is right-skewed. Most countries have a low number of deaths per 100 live births, and a smaller group of countries forms a long tail of higher values. The values range from 0.21 to 11.48 deaths per 100 live births, and the largest values in that tail are possible outliers. The mean is 2.38, which means that, averaged across countries, there are roughly 2.4 deaths of children under 5 per 100 live births. The median is 1.33, which means that half of the countries have a rate below 1.33 and half have a rate above it. The mean is influenced by every value, including the few very large ones, while the median depends only on the middle of the sorted values. That is why the median is lower than the mean here: the large values in the right tail pull the mean upward but barely move the median.
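Here is a small illustration of the mean/median reasoning above. It is a sketch with made-up numbers: the vector `x` is hypothetical and is not taken from the happiness dataset. It shows that a couple of large values pull the mean up a lot, while the median barely moves when those values are dropped.

```r
# Hypothetical toy data (NOT from the happiness dataset): mostly small values
# plus two large ones that form a long right tail.
x <- c(0.5, 0.8, 1.0, 1.2, 1.3, 1.5, 2.0, 9.0, 11.0)

mean(x)    # about 3.14: pulled upward by 9.0 and 11.0
median(x)  # 1.3: the 5th of the 9 sorted values

# Drop the two large values (same idea as a "> 8" cutoff) and compare
x_trimmed <- x[x <= 8]
mean(x_trimmed)    # about 1.19: a big drop
median(x_trimmed)  # 1.2: barely changes
```

The same pattern shows up in the real data. The mean falls from 2.38 to 1.99 after the cutoff, while the median only moves from 1.33 to 1.28.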

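# Treat values above 8 as outliers by setting them to NA (note: this overwrites the original column)
h$Child.Mortality[h$Child.Mortality > 8] <- NA
hist(h$Child.Mortality, main = "Child Mortality", xlab = "Number of Deaths of Children (age < 5), per 100 Live Births")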

mean(h$Child.Mortality, na.rm = TRUE)
[1] 1.988164
median(h$Child.Mortality, na.rm = TRUE)
[1] 1.284895
sd(h$Child.Mortality, na.rm = TRUE)
[1] 1.921985
range(h$Child.Mortality, na.rm = TRUE)
[1] 0.2069039 7.7902117
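The cutoff of 8 above was picked by looking at the histogram. A common, more systematic alternative is Tukey's 1.5 × IQR rule, which is not the rule used in this lab. It flags values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The helper `flag_iqr_outliers()` and the toy vector are hypothetical and were written only for this sketch. On the real data, this rule may flag a different set of countries than the cutoff of 8.

```r
# Hypothetical helper: flag values outside Tukey's fences,
# i.e. below Q1 - k*IQR or above Q3 + k*IQR (k = 1.5 by default).
# Returns a logical vector; positions where the input is NA stay NA.
flag_iqr_outliers <- function(v, k = 1.5) {
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE)  # Q1 and Q3
  spread <- q[2] - q[1]                                  # the IQR
  lower <- q[1] - k * spread
  upper <- q[2] + k * spread
  unname(v < lower | v > upper)
}

# Hypothetical toy data (NOT from the dataset)
toy <- c(0.5, 0.8, 1.0, 1.2, 1.3, 1.5, 2.0, 9.0, 11.0)
flag_iqr_outliers(toy)       # only 9.0 and 11.0 are flagged
toy[!flag_iqr_outliers(toy)] # the values that would be kept
```

To try this on the real column, it would need to run before the NA-replacement step, because that step overwrites `h$Child.Mortality`. Then `which(flag_iqr_outliers(h$Child.Mortality))` would list the row numbers of the flagged countries.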

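After removing values above 8 as outliers, the mean dropped from 2.38 to 1.99, the median dropped only slightly from 1.33 to 1.28, and the maximum is now 7.79. Even with the outliers removed, there are approximately 2 deaths of children under the age of 5 per 100 live births, averaged across countries.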

Problem 2. More Problems.

Repeat the steps above for another variable in the same dataset.

hist(h$LifeExpectancy, main = "Life Expectancy of Different Countries", xlab = "Life Expectancy at Birth (years)")

mean(h$LifeExpectancy, na.rm = TRUE)
[1] 73.829
median(h$LifeExpectancy, na.rm = TRUE)
[1] 75.008
sd(h$LifeExpectancy, na.rm = TRUE)
[1] 7.247281
range(h$LifeExpectancy, na.rm = TRUE)
[1] 54.462 84.712

Based on the graph and the descriptive statistics, life expectancy is left-skewed. Most countries have life expectancies at the higher end of the range, with a smaller tail of countries that have lower life expectancies. Values range from 54.46 to 84.71 years. In this graph, I had already removed the outliers. The mean is 73.83, which tells us that average life expectancy across countries is high. The median is 75.01, which tells us that the typical country's life expectancy lies in the mid-70s. Because the distribution is left-skewed, the lower values in the tail pull the mean slightly below the median.
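To put a number on "left-skewed," one option is the moment-based sample skewness g1 = m3 / m2^(3/2), where m_k is the average of (x - mean(x))^k. A negative value indicates a left skew, a positive value a right skew, and a value near 0 a roughly symmetric distribution. This is a sketch: `skewness_g1()` is a hypothetical helper, the toy vector is made up, and some software uses a bias-adjusted version of this formula, so results can differ slightly.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2),
# where m2 and m3 are central moments computed with divisor n.
# NA values are dropped first.
skewness_g1 <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)   # deviations from the mean
  m2 <- mean(d^2)    # second central moment
  m3 <- mean(d^3)    # third central moment
  m3 / m2^(3 / 2)
}

# Hypothetical toy data (NOT from the dataset): a tail of low values
left_tail <- c(55, 62, 70, 74, 75, 76, 77, 78, 80)
skewness_g1(left_tail)   # negative: left-skewed
skewness_g1(-left_tail)  # mirror image: same size, opposite sign
```

If the histogram's shapes hold, `skewness_g1(h$LifeExpectancy)` would be expected to come out negative and `skewness_g1(h$Child.Mortality)` positive.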

Problem 3. Quarto Problems.

Try to create a Quarto document that contains your analyses from Problems 1 and 2 and organizes them into a nice-looking report. We discussed Quarto in class, and there are lots of tutorials on how to use Quarto online that you can draw from. Make sure that you load the dataset in the Quarto document before rendering; I also find it easiest to render as HTML and then export / print as a PDF. If you get stuck on this for more than 30 minutes, please stop working on the problem, explain what you tried and where you got stuck, and you’ll get full credit for this question.

  • Quarto was very straightforward for me, and I didn’t really struggle to figure out how things worked. The only problem I had was putting in the code: it was easy, but it took me a while to get used to the interface. I wouldn’t really use it personally, because I prefer working in an R script.

Problem 4. Get started on Milestone #2 (and writing the final project.)

Look at Milestone #2, and copy / adapt the Introduction Outline template. Share a link to this with your GSI (make sure anyone can VIEW the link, but do not allow others to edit!) and consider posting this on the Final Project Vision Board.

https://docs.google.com/document/d/1S_SaMyiGkrOWRlobitRpEemgGh0EhNmXbsrS1XydlQ8/edit?usp=sharing