Coding Temple R-Programming Challenge: World Happiness Dataset

1. Setting up the Environment

Import necessary libraries and load the dataset:

library(tidyverse)  # Loads dplyr, ggplot2, and the other core tidyverse packages
library(skimr)      # Provides skim()
whr <- read.csv("data/WHR2023.csv")

2. Initial Exploration of the Dataset

Explore the structure and summary of the dataset:

str(whr)  # View the structure of the dataset

## 'data.frame':    137 obs. of  19 variables:
##  $ Country.name                              : chr  "Finland" "Denmark" "Iceland" "Israel" ...
##  $ Ladder.score                              : num  7.8 7.59 7.53 7.47 7.4 ...
##  $ Standard.error.of.ladder.score            : num  0.036 0.041 0.049 0.032 0.029 0.037 0.044 0.043 0.069 0.038 ...
##  $ upperwhisker                              : num  7.88 7.67 7.62 7.54 7.46 ...
##  $ lowerwhisker                              : num  7.73 7.51 7.43 7.41 7.35 ...
##  $ Logged.GDP.per.capita                     : num  10.8 11 10.9 10.6 10.9 ...
##  $ Social.support                            : num  0.969 0.954 0.983 0.943 0.93 0.939 0.943 0.92 0.879 0.952 ...
##  $ Healthy.life.expectancy                   : num  71.2 71.2 72 72.7 71.5 ...
##  $ Freedom.to.make.life.choices              : num  0.961 0.934 0.936 0.809 0.887 0.948 0.947 0.891 0.915 0.887 ...
##  $ Generosity                                : num  -0.019 0.134 0.211 -0.023 0.213 0.165 0.141 0.027 0.024 0.175 ...
##  $ Perceptions.of.corruption                 : num  0.182 0.196 0.668 0.708 0.379 0.202 0.283 0.266 0.345 0.271 ...
##  $ Ladder.score.in.Dystopia                  : num  1.78 1.78 1.78 1.78 1.78 ...
##  $ Explained.by..Log.GDP.per.capita          : num  1.89 1.95 1.93 1.83 1.94 ...
##  $ Explained.by..Social.support              : num  1.58 1.55 1.62 1.52 1.49 ...
##  $ Explained.by..Healthy.life.expectancy     : num  0.535 0.537 0.559 0.577 0.545 0.562 0.544 0.582 0.549 0.513 ...
##  $ Explained.by..Freedom.to.make.life.choices: num  0.772 0.734 0.738 0.569 0.672 0.754 0.752 0.678 0.71 0.672 ...
##  $ Explained.by..Generosity                  : num  0.126 0.208 0.25 0.124 0.251 0.225 0.212 0.151 0.149 0.23 ...
##  $ Explained.by..Perceptions.of.corruption   : num  0.535 0.525 0.187 0.158 0.394 0.52 0.463 0.475 0.418 0.471 ...
##  $ Dystopia...residual                       : num  2.36 2.08 2.25 2.69 2.11 ...
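The dotted column names in the output above are what read.csv() produces by default: with check.names = TRUE it passes the headers through make.names(), which replaces spaces and punctuation with dots. A minimal sketch of that conversion follows; the two raw header strings are assumed examples for illustration and are not read from the actual CSV file.

```r
# Hypothetical raw headers (assumed for illustration, not taken from the file)
raw_headers <- c("Explained by: Log GDP per capita", "Dystopia + residual")

# make.names() turns each invalid character (space, colon, plus sign) into a dot
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Passing check.names = FALSE to read.csv() would keep the headers as written, but the names would then need backticks whenever they are used in code.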

head(whr)  # Preview the first few rows

summary(whr)  # Summary statistics of the dataset

##  Country.name        Ladder.score   Standard.error.of.ladder.score
##  Length:137         Min.   :1.859   Min.   :0.02900               
##  Class :character   1st Qu.:4.724   1st Qu.:0.04700               
##  Mode  :character   Median :5.684   Median :0.06000               
##                     Mean   :5.540   Mean   :0.06472               
##                     3rd Qu.:6.334   3rd Qu.:0.07700               
##                     Max.   :7.804   Max.   :0.14700               
##                                                                   
##   upperwhisker    lowerwhisker   Logged.GDP.per.capita Social.support  
##  Min.   :1.923   Min.   :1.795   Min.   : 5.527        Min.   :0.3410  
##  1st Qu.:4.980   1st Qu.:4.496   1st Qu.: 8.591        1st Qu.:0.7220  
##  Median :5.797   Median :5.529   Median : 9.567        Median :0.8270  
##  Mean   :5.667   Mean   :5.413   Mean   : 9.450        Mean   :0.7991  
##  3rd Qu.:6.441   3rd Qu.:6.243   3rd Qu.:10.540        3rd Qu.:0.8960  
##  Max.   :7.875   Max.   :7.733   Max.   :11.660        Max.   :0.9830  
##                                                                        
##  Healthy.life.expectancy Freedom.to.make.life.choices   Generosity      
##  Min.   :51.53           Min.   :0.3820               Min.   :-0.25400  
##  1st Qu.:60.65           1st Qu.:0.7240               1st Qu.:-0.07400  
##  Median :65.84           Median :0.8010               Median : 0.00100  
##  Mean   :64.97           Mean   :0.7874               Mean   : 0.02243  
##  3rd Qu.:69.41           3rd Qu.:0.8740               3rd Qu.: 0.11700  
##  Max.   :77.28           Max.   :0.9610               Max.   : 0.53100  
##  NA's   :1                                                              
##  Perceptions.of.corruption Ladder.score.in.Dystopia
##  Min.   :0.1460            Min.   :1.778           
##  1st Qu.:0.6680            1st Qu.:1.778           
##  Median :0.7740            Median :1.778           
##  Mean   :0.7254            Mean   :1.778           
##  3rd Qu.:0.8460            3rd Qu.:1.778           
##  Max.   :0.9290            Max.   :1.778           
##                                                    
##  Explained.by..Log.GDP.per.capita Explained.by..Social.support
##  Min.   :0.000                    Min.   :0.000               
##  1st Qu.:1.099                    1st Qu.:0.962               
##  Median :1.449                    Median :1.227               
##  Mean   :1.407                    Mean   :1.156               
##  3rd Qu.:1.798                    3rd Qu.:1.401               
##  Max.   :2.200                    Max.   :1.620               
##                                                               
##  Explained.by..Healthy.life.expectancy
##  Min.   :0.0000                       
##  1st Qu.:0.2485                       
##  Median :0.3895                       
##  Mean   :0.3662                       
##  3rd Qu.:0.4875                       
##  Max.   :0.7020                       
##  NA's   :1                            
##  Explained.by..Freedom.to.make.life.choices Explained.by..Generosity
##  Min.   :0.000                              Min.   :0.0000          
##  1st Qu.:0.455                              1st Qu.:0.0970          
##  Median :0.557                              Median :0.1370          
##  Mean   :0.540                              Mean   :0.1485          
##  3rd Qu.:0.656                              3rd Qu.:0.1990          
##  Max.   :0.772                              Max.   :0.4220          
##                                                                     
##  Explained.by..Perceptions.of.corruption Dystopia...residual
##  Min.   :0.0000                          Min.   :-0.110     
##  1st Qu.:0.0600                          1st Qu.: 1.555     
##  Median :0.1110                          Median : 1.849     
##  Mean   :0.1459                          Mean   : 1.778     
##  3rd Qu.:0.1870                          3rd Qu.: 2.079     
##  Max.   :0.5610                          Max.   : 2.955     
##                                          NA's   :1

skim(whr)  # Detailed skim of the dataset

Data summary
Name	whr
Number of rows	137
Number of columns	19
_______________________
Column type frequency:
character	1
numeric	18
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Country.name	0	1	4	25	0	137	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Ladder.score	0	1.00	5.54	1.14	1.86	4.72	5.68	6.33	7.80	▁▂▆▇▃
Standard.error.of.ladder.score	0	1.00	0.06	0.02	0.03	0.05	0.06	0.08	0.15	▆▇▃▁▁
upperwhisker	0	1.00	5.67	1.12	1.92	4.98	5.80	6.44	7.88	▁▂▆▇▃
lowerwhisker	0	1.00	5.41	1.16	1.79	4.50	5.53	6.24	7.73	▁▂▆▇▃
Logged.GDP.per.capita	0	1.00	9.45	1.21	5.53	8.59	9.57	10.54	11.66	▁▃▆▇▆
Social.support	0	1.00	0.80	0.13	0.34	0.72	0.83	0.90	0.98	▁▂▃▆▇
Healthy.life.expectancy	1	0.99	64.97	5.75	51.53	60.65	65.84	69.41	77.28	▃▃▇▇▂
Freedom.to.make.life.choices	0	1.00	0.79	0.11	0.38	0.72	0.80	0.87	0.96	▁▁▃▇▇
Generosity	0	1.00	0.02	0.14	-0.25	-0.07	0.00	0.12	0.53	▃▇▅▁▁
Perceptions.of.corruption	0	1.00	0.73	0.18	0.15	0.67	0.77	0.85	0.93	▁▁▁▅▇
Ladder.score.in.Dystopia	0	1.00	1.78	0.00	1.78	1.78	1.78	1.78	1.78	▁▁▇▁▁
Explained.by..Log.GDP.per.capita	0	1.00	1.41	0.43	0.00	1.10	1.45	1.80	2.20	▁▃▆▇▆
Explained.by..Social.support	0	1.00	1.16	0.33	0.00	0.96	1.23	1.40	1.62	▁▂▃▆▇
Explained.by..Healthy.life.expectancy	1	0.99	0.37	0.16	0.00	0.25	0.39	0.49	0.70	▃▃▇▇▂
Explained.by..Freedom.to.make.life.choices	0	1.00	0.54	0.15	0.00	0.46	0.56	0.66	0.77	▁▁▃▇▇
Explained.by..Generosity	0	1.00	0.15	0.08	0.00	0.10	0.14	0.20	0.42	▃▇▅▁▁
Explained.by..Perceptions.of.corruption	0	1.00	0.15	0.13	0.00	0.06	0.11	0.19	0.56	▇▅▁▁▁
Dystopia...residual	1	0.99	1.78	0.50	-0.11	1.56	1.85	2.08	2.96	▁▂▅▇▂

Check for missing data:

missing_data <- colSums(is.na(whr))
missing_percent <- (missing_data/nrow(whr))*100
missing_df <- data.frame(
    variable = names(missing_data),
    missing_percent = missing_percent
)
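To see what the colSums(is.na()) pattern returns, it can help to run it on a tiny data frame first. The sketch below uses a made-up data frame named toy (hypothetical values, not from the dataset) and then keeps only the variables that have at least one missing value.

```r
# Toy data frame with made-up values; column b has one missing entry
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, NA, 30, 40))

missing_data <- colSums(is.na(toy))                # Number of NAs per column
missing_percent <- missing_data / nrow(toy) * 100  # Share of rows that are NA

# Keep only the variables that actually contain missing values
missing_only <- missing_percent[missing_percent > 0]
print(missing_only)
```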

Visualize missing data:

ggplot(missing_df, aes(x = reorder(variable, missing_percent),
                    y = missing_percent)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_minimal() +
    labs(title = "Percentage of Missing Values by Variable",
        x = "Variables", y = "Missing Percentage")

3. Data Cleaning and Transformation

Create a GDP category: Classify countries as High GDP or Low GDP according to whether their Logged.GDP.per.capita is at or above the median value, or below it. Hint: Use the median() function to find the median GDP, and ifelse() to categorize the countries.

Clean the data: Remove any rows where the happiness score Ladder.score is missing.

whr_clean <- whr %>%
    # Label each country relative to the median logged GDP per capita
    mutate(GDP = ifelse(Logged.GDP.per.capita >= median(Logged.GDP.per.capita, na.rm = TRUE),
                        "High GDP", "Low GDP")) %>%
    filter(!is.na(Ladder.score))  # Remove rows with missing ladder score values
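The median split and the missing-value filter can be checked by hand on a small example. The sketch below uses base R and a made-up data frame (toy, log_gdp, and score are hypothetical names and values). It also shows why na.rm = TRUE matters if the GDP column ever contains missing values: without it, median() returns NA and every category becomes NA.

```r
# Toy data with made-up values; one GDP value and one score are missing
toy <- data.frame(
    country = c("A", "B", "C", "D", "E"),
    log_gdp = c(8.0, 9.5, NA, 10.2, 11.0),
    score   = c(4.1, 5.6, 5.0, NA, 7.2)
)

# Median of the non-missing GDP values
gdp_median <- median(toy$log_gdp, na.rm = TRUE)

# Countries at or above the median are "High GDP"; a missing GDP stays NA
toy$GDP <- ifelse(toy$log_gdp >= gdp_median, "High GDP", "Low GDP")

# Drop rows where the score is missing
toy_clean <- toy[!is.na(toy$score), ]
print(toy_clean)
```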

4. Data Summarization

Calculate average happiness scores: Group the dataset by GDP category (high vs. low GDP) and calculate the average happiness score Ladder.score for each group.

country_stats <- whr_clean %>%
    group_by(GDP) %>%
    summarise(
        Avg_Happiness_Score = mean(Ladder.score)) %>%
    arrange(desc(Avg_Happiness_Score))
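For comparison, the same grouped mean can be computed in base R with aggregate(). This is only a sketch on made-up data (the toy data frame and its values are hypothetical), intended as a way to cross-check the group_by() and summarise() result.

```r
# Toy data with made-up scores for the two GDP groups
toy <- data.frame(
    GDP   = c("High GDP", "High GDP", "Low GDP", "Low GDP", "Low GDP"),
    score = c(7.0, 6.0, 4.0, 5.0, 3.0)
)

# Mean score per GDP group, returned as a data frame
stats <- aggregate(score ~ GDP, data = toy, FUN = mean)

# Sort from the highest to the lowest average, like arrange(desc(...))
stats <- stats[order(stats$score, decreasing = TRUE), ]
print(stats)
```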

5. Data Visualization

Create a box plot: Create a box plot that compares the happiness scores Ladder.score between high and low GDP countries. Use ggplot2 to create the plot.

Box Plot (Happiness Score by GDP):

box_plot <- ggplot(whr_clean,
                aes(x = GDP, y = Ladder.score)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1) +
    theme_minimal() +
    labs(title = "Happiness Score by GDP",
        x = "GDP", y = "Happiness Score")
print(box_plot)
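Because GDP is a character column, ggplot2 orders the two boxes alphabetically, so High GDP appears to the left of Low GDP. If the opposite order reads better, one option is to convert the column to a factor with explicit levels before plotting. The sketch below shows the idea on a standalone made-up vector (gdp and gdp_ordered are hypothetical names).

```r
# Made-up category labels
gdp <- c("High GDP", "Low GDP", "Low GDP", "High GDP")

# Default factor levels are alphabetical
default_levels <- levels(factor(gdp))

# Explicit levels control the order used on a discrete axis
gdp_ordered <- factor(gdp, levels = c("Low GDP", "High GDP"))
print(levels(gdp_ordered))
```

Applied to the cleaned data, the same factor() call could be placed inside a mutate() step before the ggplot() call.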

6. Statistical Analysis

Perform a t-test: Perform a t-test to compare the average happiness scores between high and low GDP countries. Interpret the result briefly (focus on the p-value).

t_test_result <- t.test(Ladder.score ~ GDP, data = whr_clean)
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  Ladder.score by GDP
## t = 10.779, df = 130.93, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group High GDP and group Low GDP is not equal to 0
## 95 percent confidence interval:
##  1.262241 1.829667
## sample estimates:
## mean in group High GDP  mean in group Low GDP 
##               6.307130               4.761176
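Interpretation: the p-value in the output above is below 2.2e-16, far under the usual 0.05 threshold, so the difference in mean happiness score between high GDP countries (about 6.31) and low GDP countries (about 4.76) is statistically significant. The 95 percent confidence interval puts that difference between roughly 1.26 and 1.83 ladder points.

The pieces used in this interpretation can also be pulled out of the test object directly rather than read off the printout. The sketch below runs the same kind of Welch t-test on made-up data (the toy data frame and its values are hypothetical) and extracts the p-value, confidence interval, and group means.

```r
# Toy data with made-up scores, five countries per group
toy <- data.frame(
    GDP   = rep(c("High GDP", "Low GDP"), each = 5),
    score = c(6.8, 7.1, 6.5, 7.4, 6.9, 4.2, 4.9, 5.1, 4.4, 4.7)
)

# Welch two-sample t-test (the default for t.test with a formula)
res <- t.test(score ~ GDP, data = toy)

p_value  <- res$p.value    # Probability of a difference this large under the null
conf_int <- res$conf.int   # 95 percent interval for the difference in means
means    <- res$estimate   # Mean of each group

# TRUE when the difference is significant at the 5 percent level
significant <- p_value < 0.05
print(significant)
```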