Project 1

A. Introduction

My research question: How have male primary school exclusion rates changed in France and Tunisia from 2000 to 2004, and which country showed greater improvement? The data are provided by the World Bank Group. To answer the question, the data set will be reduced to two rows (France and Tunisia), and the analysis will concentrate on the Country Name column and the columns holding the yearly exclusion figures for 2000 to 2004, which are recorded as counts of excluded boys. Comparing the two countries’ trends over this five-year period will show which country was more successful at reducing its count of excluded male students.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

project <- read_csv("C:/Users/mezni/OneDrive/Desktop/Project 1/dataset (1).csv")

## Rows: 266 Columns: 54
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Country Name, Country Code
## dbl (52): 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(project)

## # A tibble: 6 × 54
##   `Country Name` `Country Code` `1973` `1974` `1975` `1976` `1977` `1978` `1979`
##   <chr>          <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Aruba          ABW                NA     NA     NA     NA     NA     NA     NA
## 2 Africa Easter… AFE                NA     NA     NA     NA     NA     NA     NA
## 3 Afghanistan    AFG                NA 528840     NA     NA     NA     NA     NA
## 4 Africa Wester… AFW                NA     NA     NA     NA     NA     NA     NA
## 5 Angola         AGO                NA     NA     NA     NA     NA     NA     NA
## 6 Albania        ALB                NA     NA     NA     NA     NA     NA     NA
## # ℹ 45 more variables: `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
## #   `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
## #   `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
## #   `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>,
## #   `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>,
## #   `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>,
## #   `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, …

dim(project)

## [1] 266  54

names(project)

##  [1] "Country Name" "Country Code" "1973"         "1974"         "1975"        
##  [6] "1976"         "1977"         "1978"         "1979"         "1980"        
## [11] "1981"         "1982"         "1983"         "1984"         "1985"        
## [16] "1986"         "1987"         "1988"         "1989"         "1990"        
## [21] "1991"         "1992"         "1993"         "1994"         "1995"        
## [26] "1996"         "1997"         "1998"         "1999"         "2000"        
## [31] "2001"         "2002"         "2003"         "2004"         "2005"        
## [36] "2006"         "2007"         "2008"         "2009"         "2010"        
## [41] "2011"         "2012"         "2013"         "2014"         "2015"        
## [46] "2016"         "2017"         "2018"         "2019"         "2020"        
## [51] "2021"         "2022"         "2023"         "2024"

cleaned_data <- project |>
  select(`Country Name`, `2000`, `2001`, `2002`, `2003`, `2004`) |>
  filter(`Country Name` %in% c("France", "Tunisia"))
cleaned_data

## # A tibble: 2 × 6
##   `Country Name` `2000` `2001` `2002` `2003` `2004`
##   <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 France          72517  73823  75192  79580  81089
## 2 Tunisia         29607  26867  13043   9991   9073

# Long-format copy of cleaned_data, typed in by hand
# (I struggled with this step and used YouTube and AI correction to fix it)
long_data <- data.frame(
  Country = rep(c("Tunisia", "France"), each = 5),
  Year = rep(2000:2004, 2),
  Excluded = c(29607, 26867, 13043, 9991, 9073,   # Tunisia
               72517, 73823, 75192, 79580, 81089)  # France
)
summary(long_data$Excluded[long_data$Country == "Tunisia"])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9073    9991   13043   17716   26867   29607

summary(long_data$Excluded[long_data$Country == "France"])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   72517   73823   75192   76440   79580   81089

names(long_data)

## [1] "Country"  "Year"     "Excluded"

# AI help was used for this plot
library(ggplot2)
# The plot reuses the long-format long_data built above

# Grouped bar plot of the yearly exclusion counts, one bar per country and year
ggplot(long_data, aes(x = factor(Year), y = Excluded, fill = Country)) +
  geom_col(position = "dodge") +
  labs(
    title = "Male Primary School Exclusions: France vs Tunisia (2000-2004)",
    x = "Year",
    y = "Number of Boys Excluded",
    fill = "Country"
  ) +
  scale_fill_manual(values = c("Tunisia" = "#2ca02c",  # green
                               "France" = "#FF4040")) +  # red
  theme_minimal()

To investigate how male exclusion from primary school changed in France and Tunisia between 2000 and 2004, I carried out data cleaning followed by exploratory analysis. I started by keeping only the required years and countries with the select() and filter() functions. Then I created a long-format data set to organize the data for plotting. For the exploratory analysis, I used names() to verify the column names and summary() to calculate summary statistics for each country. Finally, I drew a comparative bar plot with ggplot2 (with AI help) to present both countries’ exclusion patterns over the five-year period, which clearly shows the contrasting trends in France and Tunisia.
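The long-format table in this analysis was typed in by hand. As a minimal sketch, the same reshaping could probably be done with tidyr's pivot_longer() instead. The small tibble below re-enters the values printed for cleaned_data so that the snippet runs on its own, and the name long_data_alt is hypothetical rather than part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Stand-in for cleaned_data, re-typed from the printed output (assumption: same values)
cleaned_data <- tibble::tribble(
  ~`Country Name`, ~`2000`, ~`2001`, ~`2002`, ~`2003`, ~`2004`,
  "France",          72517,   73823,   75192,   79580,   81089,
  "Tunisia",         29607,   26867,   13043,    9991,    9073
)

# Turn the five year columns into Year/Excluded pairs
# (long_data_alt is a hypothetical name used only for this sketch)
long_data_alt <- cleaned_data |>
  pivot_longer(cols = `2000`:`2004`, names_to = "Year", values_to = "Excluded") |>
  rename(Country = `Country Name`) |>
  mutate(Year = as.integer(Year))

long_data_alt
```

Reshaping from cleaned_data avoids copying the numbers by hand, so the long table would stay in step with the source file if the countries or years were changed.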

C. Conclusion and Future Directions

The exploratory data analysis reveals a clear divergence between French and Tunisian male primary school exclusion trends from 2000 to 2004. According to the data, Tunisia achieved a 69.4% reduction in the number of excluded male pupils, from 29,607 to 9,073. Tunisia’s summary statistics are consistent with this improvement: the values are positively (right-) skewed, with a mean (17,716) above the median (13,043), and there is a steep drop from the maximum to the minimum value.

Conversely, France experienced a steady 11.8% increase in exclusions over the same period, from 72,517 to 81,089. France’s summary statistics confirm this steady increase, showing a tightly clustered, right-skewed set of values, all of which are substantially higher than those in Tunisia.
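The percentage changes quoted for the two countries can be reproduced with a short calculation. This is only a sketch: it re-enters the 2000 and 2004 counts from the cleaned_data output, and pct_change is a hypothetical helper, not a function from the original analysis.

```r
# Counts for 2000 and 2004, copied from the cleaned_data output
first_year <- c(France = 72517, Tunisia = 29607)
last_year  <- c(France = 81089, Tunisia = 9073)

# Hypothetical helper: percent change from a start value to an end value
pct_change <- function(start, end) (end - start) / start * 100

# Positive values are increases, negative values are reductions
round(pct_change(first_year, last_year), 1)
# France: 11.8, Tunisia: -69.4
```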

For further analysis, we should first study what Tunisia did in its schools during this period, so that other countries can learn from its success. Second, we should check more recent figures to see whether Tunisia kept improving and whether France’s situation got better or worse.

D. References

https://www.youtube.com/watch?v=BvKETZ6kr9Q

https://r4ds.hadley.nz/functions.html

Mezni Youssef

2025-10-14