MATH1324 Introduction to Statistics Assignment 2

Between 2010 and 2014, was there a general upward or downward trend in the unemployment rate?

Individual | Student Name: EI THIRI LWIN, Student ID: S3866360

Last updated: 15 October, 2022

RPubs link information

Introduction

Unemployment happens when people are eager and able to work but are unable to find work.

It is commonly believed that improvements in people's scientific and technical skills over time, together with a wider range of work options, help to lower the unemployment rate.

Introduction Cont.

World Bank - Youth Unemployment Rates (ILO) by country, 2010 - 2014
Country Name: name of the country, 219 unique values
Country Code: unambiguous abbreviation of the country name, 219 unique values

Problem Statement

This study seeks to determine whether the mean youth unemployment rate differs statistically significantly between 2010 and 2014.
The result of this test is used to answer the problem statement:
Was there a rise or fall in unemployment between 2010 and 2014?
The unemployment rates in 2010 and 2014 were compared using the paired-samples t-test to see whether there was any difference.

Data

World Bank Youth Unemployment Rates: this dataset contains youth unemployment rates (% of total labor force ages 15-24) (modeled ILO estimate). Source: https://www.kaggle.com/datasets/sovannt/world-bank-youth-unemployment/download?datasetVersionNumber=1

Data Cont.

This data set contains the youth unemployment rate for countries around the world from 2010 to 2014.

The data were obtained from the URL given in the preceding section.

The data set is a sample of 219 countries and reports the unemployment rate for each year from 2010 to 2014.

I chose to examine the difference in the unemployment rate between 2010 and 2014. The four-year gap is long enough for noticeable changes to occur and is therefore worth investigating.

There are seven variables, namely:

The five year variables, 2010 to 2014, are numeric (double) and hold the unemployment rate for each year.

Country Code is a character variable containing the abbreviation of the country's name.

Country Name is a character variable that identifies each country.

Descriptive Statistics and Visualisation

The dependent-samples t-test, also called the paired-samples t-test, is appropriate because the data are dependent: each country is measured twice, once in 2010 and once in 2014. It is used to determine whether the mean unemployment rate differs statistically between 2010 and 2014.

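# Packages used in this report: readr (read_csv), dplyr (select, mutate, %>%),
# mosaic (favstats), outliers (scores)
library(readr)
library(dplyr)
library(mosaic)
library(outliers)

data <- read_csv("API_ILO_country_YU.csv")

# Rename the five year columns, then keep only the 2010 and 2014 values
colnames(data)[3:7] <- c("y2010_unemployment_count", "y2011_unemployment_count", "y2012_unemployment_count", "y2013_unemployment_count", "y2014_unemployment_count")
d2 <- data %>% dplyr::select(-(y2011_unemployment_count:y2013_unemployment_count))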

favstats(~y2010_unemployment_count, data = d2)

favstats(~y2014_unemployment_count, data = d2)

The average unemployment rate increased only slightly.
The paired-samples t-test is used to evaluate whether this change is statistically significant.
According to the Central Limit Theorem, when the sample size exceeds 30 the sampling distribution of the mean difference is approximately normal, so the t-test can be applied even if the differences themselves are not normally distributed.
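
The Central Limit Theorem argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the seed, and the number of replications are illustrative choices and are not taken from the unemployment data. Only the sample size of 219 matches this report.

```r
# Illustrative simulation (assumed population, not the report's data)
set.seed(1324)   # arbitrary seed, used only for reproducibility
n    <- 219      # same sample size as in this report
reps <- 5000     # number of simulated samples (assumption)

# Draw repeatedly from a right-skewed population: exponential with mean 1 and sd 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The sample means should centre on the population mean (1),
# with a spread close to the theoretical standard error 1 / sqrt(n)
mean(sample_means)
sd(sample_means)
1 / sqrt(n)
```

Even though each simulated sample comes from a skewed population, the means cluster symmetrically around 1, which is the behaviour the t-test relies on.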

d3 <- d2 %>% mutate(d = y2014_unemployment_count - y2010_unemployment_count)
# Check whether there are outliers in the difference column to which the test will be applied.
boxplot(d3$d)

# A second method, z-scores, is used to count the outliers and find their positions.
z.scores <- d3$d %>%  scores(type = "z")
z.scores %>% summary()

##       Dim1          
##  Min.   :-16.30000  
##  1st Qu.: -1.30000  
##  Median : -0.10000  
##  Mean   :  0.05058  
##  3rd Qu.:  0.60000  
##  Max.   : 21.50000

which( abs(z.scores) >3 )

##  [1]   3   4   7  18  37  43  47  55  56  58  59  69  74  78  81  83  88  94  96
## [20] 105 110 119 121 124 127 134 135 136 141 142 143 146 155 163 168 179 185 188
## [39] 197 201 203 209 213

length(which(abs(z.scores) >3 ))

## [1] 43
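
As a cross-check on the z-score approach, standardised scores can also be computed by hand in base R. The sketch below uses a small hypothetical vector `d_example` (not the report's data). By construction, z-scores have mean 0 and standard deviation 1, which gives a quick sanity check on any z-score output.

```r
# Hypothetical differences: 21 small values plus one extreme value (illustrative only)
d_example <- c(seq(-1, 1, by = 0.1), 25)

# z-score: distance from the mean, measured in standard deviations
z <- (d_example - mean(d_example)) / sd(d_example)

# Sanity check: standardised scores have mean 0 and standard deviation 1
mean(z)
sd(z)

# Positions of values more than 3 standard deviations from the mean
which(abs(z) > 3)
```

Here only the last element, the value 25, is flagged.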

Descriptive Statistics Cont.

# Cap outliers: values beyond the 1.5 * IQR fences are replaced
# by the 5th percentile (low side) or the 95th percentile (high side).
cap <- function(x){
  quantiles <- quantile( x, c(.05, 0.25, 0.75, .95 ) )
  x[ x < quantiles[2] - 1.5*IQR(x) ] <- quantiles[1]
  x[ x > quantiles[3] + 1.5*IQR(x) ] <- quantiles[4]
  x}
# To confirm that the outliers were handled properly, the boxplot is drawn once more after cap().
capped_data <- d3$d %>% cap()
boxplot(capped_data)
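
To make the effect of capping concrete, here is a small sketch on a hypothetical vector `x_example` (not the report's data). The function is repeated under the name `cap_example` only so that the block runs on its own.

```r
# Copy of the capping rule, repeated so that this example is self-contained
cap_example <- function(x) {
  quantiles <- quantile(x, c(0.05, 0.25, 0.75, 0.95))
  x[x < quantiles[2] - 1.5 * IQR(x)] <- quantiles[1]
  x[x > quantiles[3] + 1.5 * IQR(x)] <- quantiles[4]
  x
}

# Hypothetical data: the values 1 to 19 plus one extreme value
x_example <- c(1:19, 100)

# Upper fence = Q3 + 1.5 * IQR; anything above it is treated as an outlier
upper_fence <- quantile(x_example, 0.75) + 1.5 * IQR(x_example)

capped_example <- cap_example(x_example)

# Only the extreme value changes; it is replaced by the 95th percentile
x_example[x_example > upper_fence]
max(capped_example)
```

The first 19 values are left untouched, and the value 100 is pulled back to the 95th percentile of the original vector.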

Hypothesis Testing

Because the sample size is greater than 30, we proceed with the paired-samples t-test, carried out as a one-sample t-test on the capped differences. The statistical hypotheses for the paired-samples t-test are: H0: μΔ = 0, HA: μΔ ≠ 0

t.test(capped_data, 
       mu = 0, 
       alternative = "two.sided")

## 
##  One Sample t-test
## 
## data:  capped_data
## t = -0.90086, df = 218, p-value = 0.3687
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.4703550  0.1752595
## sample estimates:
##  mean of x 
## -0.1475477

Test statistic t = -0.90086
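
A one-sample t-test on the differences gives the same result as a paired-samples t-test on the two original columns. The sketch below checks this on hypothetical rates for five countries. The values in `rate_2010` and `rate_2014` are made up for illustration and are not taken from the data set.

```r
# Hypothetical unemployment rates for five countries (illustrative values only)
rate_2010 <- c(12.1, 25.4, 8.3, 17.9, 30.2)
rate_2014 <- c(11.5, 27.0, 8.9, 16.4, 31.0)

# Paired-samples t-test on the two columns
paired <- t.test(rate_2014, rate_2010, paired = TRUE)

# One-sample t-test on the differences, as done in this report
one_sample <- t.test(rate_2014 - rate_2010, mu = 0, alternative = "two.sided")

# Both approaches give the same test statistic and p-value
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
```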

Hypothesis Testing Cont.

The two-tailed critical value of t for df = 218 at the 0.05 significance level is obtained for comparison with the test statistic:

qt(p = 0.025, df = 218)

## [1] -1.970906

- The test statistic from the one-sample t-test of the differences, d, was t = -0.90086, which is less extreme than the critical value of -1.970906 shown above.

- Therefore, we fail to reject H0: there is no statistically significant difference between the unemployment rates in 2010 and 2014.

The two-tailed p-value is calculated as follows:

2*pt(q = -0.90086, df = 218)

## [1] 0.3686569
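
The critical-value rule, the p-value rule, and the confidence interval all lead to the same decision. The sketch below recomputes them from the figures reported in the t-test output above (t = -0.90086, df = 218, mean difference = -0.1475477). The variable names are illustrative, and the standard error is recovered from the reported mean and t statistic rather than from the raw data.

```r
# Figures taken from the t-test output in this report
t_stat <- -0.90086
df     <- 218
mean_d <- -0.1475477
alpha  <- 0.05

# Critical-value approach: reject H0 only if |t| exceeds the two-tailed critical value
t_crit <- qt(1 - alpha / 2, df)
reject_by_critical <- abs(t_stat) > t_crit

# p-value approach: reject H0 only if the two-tailed p-value is below alpha
p_value <- 2 * pt(-abs(t_stat), df)
reject_by_p <- p_value < alpha

# Confidence-interval approach: the standard error follows from t = mean / SE
se <- mean_d / t_stat
ci <- mean_d + c(-1, 1) * t_crit * se

reject_by_critical
reject_by_p
ci   # H0 is not rejected when this interval contains 0
```

All three checks agree: the test statistic lies inside the critical bounds, the p-value is above 0.05, and the interval contains 0.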

- Because p = 0.3686569 > 0.05, the p-value approach also fails to reject H0.

- Consequently, there was no statistically significant difference in the unemployment rate between 2010 and 2014.

Discussion

The paired-samples t-test indicated that there was no statistically significant difference in the mean unemployment rate between 2010 and 2014.
The strengths of this study are its large sample and the four-year interval between measurements, which is long enough for changes in the unemployment rate to appear.