A. Introduction
My research question: How did male primary school exclusion change in France and Tunisia from 2000 to 2004, and which country showed greater improvement? The data is provided by the World Bank Group; the values analysed here are yearly counts of male pupils excluded from primary school. To answer the question, the data set is reduced to two rows (France and Tunisia), and the analysis concentrates on the Country Name column and the columns for the years 2000 to 2004. Comparing the two countries' trends across these five years shows which country was more successful at reducing its count of excluded male students.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
project <- read_csv("C:/Users/mezni/OneDrive/Desktop/Project 1/dataset (1).csv")
## Rows: 266 Columns: 54
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country Name, Country Code
## dbl (52): 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(project)
## # A tibble: 6 × 54
## `Country Name` `Country Code` `1973` `1974` `1975` `1976` `1977` `1978` `1979`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Aruba ABW NA NA NA NA NA NA NA
## 2 Africa Easter… AFE NA NA NA NA NA NA NA
## 3 Afghanistan AFG NA 528840 NA NA NA NA NA
## 4 Africa Wester… AFW NA NA NA NA NA NA NA
## 5 Angola AGO NA NA NA NA NA NA NA
## 6 Albania ALB NA NA NA NA NA NA NA
## # ℹ 45 more variables: `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
## # `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
## # `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
## # `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>,
## # `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>,
## # `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>,
## # `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, …
dim(project)
## [1] 266 54
names(project)
## [1] "Country Name" "Country Code" "1973" "1974" "1975"
## [6] "1976" "1977" "1978" "1979" "1980"
## [11] "1981" "1982" "1983" "1984" "1985"
## [16] "1986" "1987" "1988" "1989" "1990"
## [21] "1991" "1992" "1993" "1994" "1995"
## [26] "1996" "1997" "1998" "1999" "2000"
## [31] "2001" "2002" "2003" "2004" "2005"
## [36] "2006" "2007" "2008" "2009" "2010"
## [41] "2011" "2012" "2013" "2014" "2015"
## [46] "2016" "2017" "2018" "2019" "2020"
## [51] "2021" "2022" "2023" "2024"
cleaned_data <- project |>
  select(`Country Name`, `2000`, `2001`, `2002`, `2003`, `2004`) |>
  filter(`Country Name` %in% c("France", "Tunisia"))
cleaned_data
## # A tibble: 2 × 6
## `Country Name` `2000` `2001` `2002` `2003` `2004`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France 72517 73823 75192 79580 81089
## 2 Tunisia 29607 26867 13043 9991 9073
# I struggled with this step and fixed it with help from YouTube and AI
long_data <- data.frame(
  Country = rep(c("Tunisia", "France"), each = 5),
  Year = rep(2000:2004, 2),
  Excluded = c(29607, 26867, 13043, 9991, 9073,   # Tunisia
               72517, 73823, 75192, 79580, 81089) # France
)
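The long table above is typed in by hand. A minimal sketch of an alternative, assuming the `cleaned_data` tibble printed earlier, reshapes it with `tidyr::pivot_longer()` so the values cannot drift from the source. Here `cleaned_data` is rebuilt from the printed values so the snippet runs on its own, and the name `long_from_wide` is illustrative, not part of the original analysis.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuilt from the printed cleaned_data so the example is self-contained
cleaned_data <- tribble(
  ~`Country Name`, ~`2000`, ~`2001`, ~`2002`, ~`2003`, ~`2004`,
  "France",        72517,   73823,   75192,   79580,   81089,
  "Tunisia",       29607,   26867,   13043,    9991,    9073
)

# One row per country-year instead of one column per year
long_from_wide <- cleaned_data |>
  pivot_longer(
    cols = `2000`:`2004`,
    names_to = "Year",
    values_to = "Excluded"
  ) |>
  rename(Country = `Country Name`) |>
  mutate(Year = as.integer(Year))  # year column names arrive as text
```

The result has the same three columns as the hand-built table (`Country`, `Year`, `Excluded`), with the rows ordered as in `cleaned_data` (France first).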
summary(long_data$Excluded[long_data$Country == "Tunisia"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9073 9991 13043 17716 26867 29607
summary(long_data$Excluded[long_data$Country == "France"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 72517 73823 75192 76440 79580 81089
names(long_data)
## [1] "Country" "Year" "Excluded"
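The two `summary()` calls can also be collapsed into one grouped table. The following is a minimal sketch with dplyr, rebuilding `long_data` from the values above so that it runs on its own; the name `country_summary` and its column names are illustrative. Placing mean and median side by side gives a quick read on skew: a mean above the median points to a right-skewed set of values.

```r
library(dplyr)

# Same values as the long table above
long_data <- data.frame(
  Country = rep(c("Tunisia", "France"), each = 5),
  Year = rep(2000:2004, 2),
  Excluded = c(29607, 26867, 13043, 9991, 9073,   # Tunisia
               72517, 73823, 75192, 79580, 81089) # France
)

# One row of summary statistics per country
country_summary <- long_data |>
  group_by(Country) |>
  summarise(
    min_excluded = min(Excluded),
    median_excluded = median(Excluded),
    mean_excluded = mean(Excluded),
    max_excluded = max(Excluded)
  )

country_summary
```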
# AI help
# Grouped bar plot of yearly exclusion counts by country, built from long_data
ggplot(long_data, aes(x = factor(Year), y = Excluded, fill = Country)) +
  geom_col(position = "dodge") +
  labs(
    title = "Male Primary School Exclusions: France vs Tunisia (2000-2004)",
    x = "Year",
    y = "Number of Boys Excluded",
    fill = "Country"
  ) +
  scale_fill_manual(values = c("Tunisia" = "#2ca02c",   # green
                               "France" = "#FF4040")) + # red
  theme_minimal()
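Because the question is about change over time, a line chart is one possible alternative view of the same data. This is only a sketch, not part of the original analysis: it rebuilds `long_data` so that it runs on its own, and the object name `trend_plot` is illustrative.

```r
library(ggplot2)

# Same values as the long table used for the bar plot
long_data <- data.frame(
  Country = rep(c("Tunisia", "France"), each = 5),
  Year = rep(2000:2004, 2),
  Excluded = c(29607, 26867, 13043, 9991, 9073,   # Tunisia
               72517, 73823, 75192, 79580, 81089) # France
)

# One line per country; points mark the five yearly observations
trend_plot <- ggplot(long_data, aes(x = Year, y = Excluded, colour = Country)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Male Primary School Exclusions: France vs Tunisia (2000-2004)",
    x = "Year",
    y = "Number of Boys Excluded",
    colour = "Country"
  ) +
  theme_minimal()

trend_plot
```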
To investigate how male exclusion from primary school changed in France and Tunisia between 2000 and 2004, I carried out data cleaning and exploratory analysis. I started by keeping only the required years and countries with the select() and filter() functions. Then I created a long-format data set to organize the data for plotting. For the exploratory analysis, I used names() to verify the column names and summary() to calculate summary statistics and observe the distribution of the data. Finally, I drew a comparative bar plot with ggplot2 (with AI help) to present both countries' exclusion patterns over the five years, which clearly illustrates the contrasting patterns in France and Tunisia.
C. Conclusion and Future Directions
The exploratory analysis reveals a clear divergence between French and Tunisian male primary school exclusion trends from 2000 to 2004. According to the data, Tunisia posted an impressive 69.4% reduction in the exclusion of male pupils, from 29,607 to 9,073. Tunisia's summary statistics are consistent with this improvement: the mean (17,716) sits well above the median (13,043), a right-skewed (positively skewed) distribution produced by the high counts in 2000 and 2001 followed by a sharp drop toward the minimum.
Conversely, France experienced a steady 11.8% increase in exclusions over the same period, from 72,517 to 81,089. France's summary statistics confirm this steady increase: the values are tightly clustered, slightly right-skewed (mean 76,440 against a median of 75,192), and all far higher than those of Tunisia.
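The two percentages quoted above can be checked with a short calculation. This is a minimal sketch in base R; the helper `pct_change()` is a hypothetical name introduced here for illustration, and the inputs are the 2000 and 2004 counts from the table.

```r
# Percent change from the first value to the last, relative to the first
pct_change <- function(first, last) {
  (last - first) / first * 100
}

tunisia_change <- pct_change(29607, 9073)   # 2000 -> 2004
france_change  <- pct_change(72517, 81089)  # 2000 -> 2004

# Rounded to one decimal place: Tunisia -69.4, France 11.8
round(c(Tunisia = tunisia_change, France = france_change), 1)
```

A negative value means fewer excluded pupils in 2004 than in 2000, so Tunisia's -69.4% is the improvement and France's +11.8% is the deterioration.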
For further analysis, we should first study what Tunisia changed in its schools during this time, so that other countries can learn from its success. Second, we should check more recent figures to see whether Tunisia kept improving and whether France's situation got better or worse.
D. References
https://www.youtube.com/watch?v=BvKETZ6kr9Q
https://r4ds.hadley.nz/functions.html