MATH1324 Introduction to Statistics Assignment 2

Between 2010 and 2014, was there a general upward or downward trend in the unemployment rate?

Individual | Student Name: EI THIRI LWIN, Student ID: S3866360

Last updated: 15 October, 2022

Introduction

Unemployment happens when people are eager and able to work but are unable to find work.

It is commonly believed that rising scientific and technical skill levels over time, together with a growing range of job opportunities, should help lower the unemployment rate.

Introduction Cont.

Problem Statement

Data

World Bank Youth Unemployment Rates: this dataset contains youth unemployment rates (% of total labor force ages 15-24, modeled ILO estimate). Source: https://www.kaggle.com/datasets/sovannt/world-bank-youth-unemployment/download?datasetVersionNumber=1

Data Cont.

This data set contains the youth unemployment rate for countries around the world from 2010 to 2014.

The data were downloaded from the URL given in the preceding section.

The data set is a sample of 219 countries, with one unemployment rate per country for each year from 2010 to 2014.

I chose to compare the unemployment rates of 2010 and 2014. The four-year gap between them is long enough for meaningful changes to appear.

There are seven variables, namely:

The five year variables, 2010 to 2014, are of type double and hold the unemployment rate for each year.

Country Code is a character variable holding the abbreviation of the country's name.

Country Name is a character variable holding the name of each country.

Descriptive Statistics and Visualisation

Because the 2010 and 2014 rates are measured on the same countries, the data are dependent, so the dependent samples t-test (also called the paired-samples t-test) is used. It determines whether the mean unemployment rate differs significantly between 2010 and 2014.
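A paired-samples t-test gives the same result as a one-sample t-test on the per-country differences, which is the form used later in this analysis. The sketch below illustrates that equivalence on made-up numbers; the vectors `rate_2010` and `rate_2014` are hypothetical and are not taken from the data set.

```r
# Hypothetical rates for six countries (illustration only, not real data).
rate_2010 <- c(12.1, 25.4, 8.3, 17.9, 30.2, 9.6)
rate_2014 <- c(11.5, 27.0, 8.9, 16.2, 31.4, 9.1)

# Paired-samples t-test on the two years.
paired <- t.test(rate_2014, rate_2010, paired = TRUE)

# One-sample t-test on the differences against mu = 0.
d <- rate_2014 - rate_2010
one_sample <- t.test(d, mu = 0)

# Both calls give the same test statistic and the same p-value.
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```

Working with the difference column directly makes it easy to inspect and treat outliers before the test is run.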

library(readr)    # read_csv()
library(dplyr)    # select(), mutate(), %>%
library(mosaic)   # favstats()
library(outliers) # scores()

data <- read_csv("API_ILO_country_YU.csv")

# Rename the five year columns. The values are rates (%), despite the "_count" suffix.
colnames(data)[3:7] <- c("y2010_unemployment_count", "y2011_unemployment_count", "y2012_unemployment_count", "y2013_unemployment_count", "y2014_unemployment_count")
# Keep only the 2010 and 2014 columns.
d2 <- data %>% dplyr::select(-(y2011_unemployment_count:y2013_unemployment_count))

favstats(~y2010_unemployment_count, data = d2)
favstats(~y2014_unemployment_count, data = d2)
# Difference for each country: 2014 rate minus 2010 rate.
d3 <- d2 %>% mutate(d = y2014_unemployment_count - y2010_unemployment_count)
# Check the difference column, to which the test will be applied, for outliers.
boxplot(d3$d)

# Use z-scores as a second method to count the outliers and find their positions.
z.scores <- d3$d %>%  scores(type = "z")
z.scores %>% summary()
##       Dim1          
##  Min.   :-16.30000  
##  1st Qu.: -1.30000  
##  Median : -0.10000  
##  Mean   :  0.05058  
##  3rd Qu.:  0.60000  
##  Max.   : 21.50000
which( abs(z.scores) >3 )
##  [1]   3   4   7  18  37  43  47  55  56  58  59  69  74  78  81  83  88  94  96
## [20] 105 110 119 121 124 127 134 135 136 141 142 143 146 155 163 168 179 185 188
## [39] 197 201 203 209 213
length(which(abs(z.scores) >3 ))
## [1] 43
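The summary printed above runs from -16.3 to 21.5 with a mean of 0.05, which is the scale of the raw differences; standardised scores would have a mean of 0 and a standard deviation of 1. One possible explanation, stated here only as an assumption, is that a `scores()` function from another loaded package masked the one from the outliers package. A minimal sketch of computing z-scores directly in base R, on made-up values (the vector `d` below is hypothetical):

```r
# Hypothetical differences in percentage points (illustration only, not real data).
d <- c(-1.3, -0.1, 0.6, 0.2, -0.8, 1.1, -0.4, 0.3, 0.9, -0.6, 0.0, 21.5)

# Standardise: subtract the mean and divide by the standard deviation.
z <- (d - mean(d)) / sd(d)

# After standardising, the mean is 0 and the standard deviation is 1.
summary(z)

# Positions of observations more than 3 standard deviations from the mean.
which(abs(z) > 3)
```

Calling the function with its namespace, as in `outliers::scores(x, type = "z")`, is another way to avoid masking.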

Descriptive Statistics Cont.

# Cap outliers: values beyond the 1.5 * IQR fences are replaced by the 5th or 95th percentile.
cap <- function(x) {
  quantiles <- quantile(x, c(0.05, 0.25, 0.75, 0.95))
  x[x < quantiles[2] - 1.5 * IQR(x)] <- quantiles[1]
  x[x > quantiles[3] + 1.5 * IQR(x)] <- quantiles[4]
  x
}
# Draw the boxplot again after cap() to confirm that the outliers were handled properly.
capped_data <- d3$d %>% cap()
boxplot(capped_data)
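To see what the capping step does to individual values, the sketch below applies the same rule to a small made-up vector. The vector `x` is hypothetical, and the function is repeated only so the example runs on its own.

```r
# Same capping rule as the cap() function above, repeated so the sketch is self-contained.
cap <- function(x) {
  quantiles <- quantile(x, c(0.05, 0.25, 0.75, 0.95))
  x[x < quantiles[2] - 1.5 * IQR(x)] <- quantiles[1]
  x[x > quantiles[3] + 1.5 * IQR(x)] <- quantiles[4]
  x
}

# Hypothetical values: 1 to 19 plus one extreme observation (illustration only).
x <- c(1:19, 100)
capped <- cap(x)

# Positions whose values were changed by the capping step.
changed <- which(capped != x)
changed

# The replaced value equals the 95th percentile of the original data.
capped[changed]
quantile(x, 0.95)
```

Only the extreme observation is replaced; values inside the fences are left untouched, so capping reduces the influence of outliers without dropping any rows.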

Hypothesis Testing
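The test in this section can be read against the following hypotheses, writing $\mu_d$ for the population mean of the difference d (2014 rate minus 2010 rate). The symbols $\bar{d}$, $s_d$ and $n$ for the sample mean, sample standard deviation and number of differences are notation assumed here.

```latex
\begin{align*}
H_0 &: \mu_d = 0 \\
H_A &: \mu_d \neq 0 \\
t &= \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad df = n - 1
\end{align*}
```

With n = 219 differences, the test has df = 218.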

t.test(capped_data, 
       mu = 0, 
       alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  capped_data
## t = -0.90086, df = 218, p-value = 0.3687
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.4703550  0.1752595
## sample estimates:
##  mean of x 
## -0.1475477
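The confidence interval in the output above can be reconstructed from the reported mean and test statistic. The sketch below is a check under the assumption that the rounded values printed by `t.test()` are accurate enough; the variable names are chosen here for illustration.

```r
# Values copied from the t-test output above.
mean_d <- -0.1475477
t_stat <- -0.90086
df <- 218

# Standard error implied by t = mean / SE.
se <- mean_d / t_stat

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df)

# Confidence interval: mean plus or minus critical value times standard error.
ci <- mean_d + c(-1, 1) * t_crit * se
ci
```

The result agrees with the reported interval up to rounding of the inputs. Because the interval contains 0, it is consistent with a mean difference of zero.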

Hypothesis Testing Cont.

qt(p = 0.025, df = 218)
## [1] -1.970906

-The test statistic from the one-sample t-test of the differences, d, was t = -0.90086, which is less extreme than the critical value of -1.970906 shown in the output above.

-So we fail to reject H0: there is no statistically significant difference between the unemployment rates in 2010 and 2014.

The two-tailed p-value is calculated as follows:

2*pt(q = -0.90086, df = 218)
## [1] 0.3686569
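The critical value approach and the p-value approach always lead to the same decision. A minimal sketch that applies both rules to the reported statistic, assuming a significance level of 0.05 (the variable names are illustrative):

```r
# Values copied from the output above.
t_stat <- -0.90086
df <- 218
alpha <- 0.05

# Critical value approach: reject H0 only if |t| exceeds the upper critical value.
t_crit <- qt(1 - alpha / 2, df)
reject_by_critical_value <- abs(t_stat) > t_crit

# p-value approach: reject H0 only if the two-tailed p-value is below alpha.
p_value <- 2 * pt(-abs(t_stat), df)
reject_by_p_value <- p_value < alpha

c(critical_value = reject_by_critical_value, p_value = reject_by_p_value)
```

Using `-abs(t_stat)` keeps the p-value calculation correct whether the statistic is negative or positive.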

-Because p = 0.3686569 > 0.05, the p-value approach also fails to reject H0.

-Consequently, there was no statistically significant difference in the unemployment rate between 2010 and 2014.

Discussion

References