Brief Introduction

Canada has 10 provinces and 3 territories, each with many schools that offer education at various levels and are open to everybody. In some schools, male students outnumber female students; this can be due to various reasons, such as the types of courses offered and other external factors. Education should be a priority for everybody, as we all learn and equip ourselves with knowledge that we can use to shape the world and, importantly, our future. A survey has been carried out in which each gender is represented equally. We will examine 5 levels of education across the provinces and territories in the education dataset, and then draw some conclusions and reveal relationships between different parts of the data.

Objective

Which province or territory has the largest number of educated women, and which level of education do they most often attain? Are more women educated than men? Are there similarities in education levels across provinces and territories? As we analyse the Education Level Data, we will answer these questions, uncover patterns that show how well women are represented in education, and produce a solid data-driven report.

To achieve this, we will also create visualizations that convey the relationships between age, level of education and gender, so that our results and deductions are presented more effectively.

Descriptive Statistics Of The Data

We begin with a summary of the data.

Descriptive Statistics of the Quantitative Variables

| Variable | Mean | Std Deviation | Median | Max | Min | Range |
|---|---|---|---|---|---|---|
| Age | - | - | - | 55-64 | 25-34 | - |
| Total_Count | 338060.5 | 619992.018 | 68405 | 2549315 | 1085 | 2548230 |
| No Education | 38746.03 | 74996.16 | 9270 | 411550 | 150 | 411400 |
| High School | 80260.76 | 150060.4 | 18822.5 | 771475 | 90 | 771385 |
| Apprenticeship/Trades | 36472.14 | 72901.86 | 5725 | 387665 | 55 | 387610 |
| College | 75749.91 | 142351.8 | 14697.5 | 692990 | 235 | 692755 |
| Certificate | 10372.99 | 19302.76 | 1762.5 | 95360 | 15 | 95345 |
| University | 96458.97 | 184542.8 | 14192.5 | 941135 | 190 | 940945 |

Descriptive Statistics of the Qualitative Variables

Count of Genders Surveyed: Male: 56, Female: 56

Overall, the summary statistics show that the university level has the highest average count in the data, about 96,459 people per record, with a standard deviation of 184,542.8.

In addition, the number of people with no formal education ranged from 150 to 411,550 per record across provinces and territories, with an average of 38,746.03 and a standard deviation of 74,996.16. Roughly 95% of the people surveyed were younger than the 55-64 age bracket.

Finally, because both genders have the same number of records (56 each), each gender is equally represented in the data, and comparisons between them are not skewed by unequal group sizes.
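As a rough illustration of how the columns of the summary table can be computed, here is a minimal base R sketch. The `describe` helper and the `toy` data frame are hypothetical: only the minimum and maximum of each toy column are taken from the table above, and the middle values are made up, so the printed means and medians will not match the report.

```r
# Summary statistics for one numeric column, mirroring the table's columns.
describe <- function(x) {
  c(mean   = mean(x),
    sd     = sd(x),
    median = median(x),
    max    = max(x),
    min    = min(x),
    range  = max(x) - min(x))
}

# Hypothetical toy counts (NOT the survey data); only the smallest and largest
# value in each column come from the summary table.
toy <- data.frame(
  College    = c(235, 9400, 19995, 692990),
  University = c(190, 8200, 20185, 941135)
)

stats <- sapply(toy, describe)   # one column of statistics per variable
print(round(stats, 2))
```

The range is simply the maximum minus the minimum, so for College it is 692990 - 235 = 692755.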

Exploratory Data Analysis

The first visualization summarizes the total number of educated females in each province and territory. As Figure 1 below shows, Ontario leads with over 300,000 educated females, followed by Quebec with almost 200,000, while Nunavut is the territory with the fewest educated females.

Visualisation 1

Number of Educated Women in Canada by Province

Visualisation 2

The second visualization, presented below as Figure 2, shows the total number of people at the university level for each gender, classified by age. Of the 4 age brackets listed to the right of the diagram, females aged 25-34 are the most represented at the university level, and those aged 55-64 the least. By comparison, fewer men are at the university level: across the age groups, most of the male counts at the university level fall between 600,000 and 700,000.

Gender vs Total Number At The University Level by Age

Visualisation 3

To gain better insight into gender, the geographic name has been added to the previous visualization as an extra class, which makes the analysis easier to follow. The diagram is split into two halves, and it shows that the groups most represented at the university level are females aged 25-34 and males aged 25-34, both from Ontario. The horizontal line is placed to show how many women are educated compared with men, and the illustration below shows that more women than men are educated and hold a university degree. Women make up 55% of the people at the university level and men 45%.

| Gender | University |
|---|---|
| Female | 2984140 |
| Male | 2417550 |

Visualisation 4

From the table and diagram below, we can see that men account for about 55% of the people with no formal education, while women account for about 45%.

| Gender | None |
|---|---|
| Female | 963335 |
| Male | 1200085 |
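The gender shares quoted for the two tables above can be checked with a few lines of base R. This is only a sketch: the `share` helper and the `totals` data frame are names introduced here for illustration, and the counts are copied from the tables as printed.

```r
# Totals copied from the University and None tables above.
totals <- data.frame(
  Gender     = c("Female", "Male"),
  University = c(2984140, 2417550),
  None       = c(963335, 1200085)
)

# Share of each gender within a column, in percent (each column sums to 100).
share <- function(x) round(100 * x / sum(x), 1)

shares <- data.frame(
  Gender         = totals$Gender,
  University_pct = share(totals$University),
  None_pct       = share(totals$None)
)
print(shares)
```

Note that these are shares within each education level (for example, the proportion of university-level people who are women), not the proportion of all women who reached that level.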

Claim Evidence Reasoning

To support the claim, Environics Analytics reported in 2017, based on the 2016 census, that 64.7 percent of women aged 25 to 64 had a postsecondary degree or diploma, compared with 60.8 percent in 2006.

Our data and the illustration in Figure 2 show that women make up about 55% of the people with a university-level education, with men making up the remaining 45%.

https://environicsanalytics.com/resources/blogs/ea-blog/2017/11/29/latest-census-release-shows-canadian-women-more-educated-than-ever

Clustering

Optimal Number of Clusters = 3

| Province / Territory | University | None | HS | College | Cert | App | cluster |
|---|---|---|---|---|---|---|---|
| Alberta | 635725 | 244745 | 569435 | 496380 | 73315 | 238220 | 1 |
| British Columbia | 758005 | 244000 | 671010 | 528810 | 99705 | 231445 | 1 |
| Manitoba | 165085 | 94305 | 182325 | 137935 | 21210 | 53795 | 2 |
| New Brunswick | 80815 | 55635 | 114335 | 104590 | 7665 | 37720 | 2 |
| Newfoundland and Labrador | 52780 | 45160 | 65205 | 81960 | 6680 | 36080 | 2 |
| Northwest Territories | 5800 | 4855 | 4565 | 5135 | 660 | 2625 | 2 |
| Nova Scotia | 125975 | 60290 | 116280 | 128010 | 11600 | 52375 | 2 |
| Nunavut | 2360 | 6740 | 2410 | 3185 | 205 | 1590 | 2 |
| Ontario | 2307320 | 753000 | 1768970 | 1782530 | 170925 | 446395 | 3 |
| Prince Edward Island | 17605 | 8945 | 18670 | 20975 | 1860 | 6255 | 2 |
| Quebec | 1116305 | 580625 | 808970 | 832430 | 167025 | 866595 | 3 |
| Saskatchewan | 127580 | 69200 | 168035 | 115085 | 19375 | 66960 | 2 |
| Yukon | 6335 | 2255 | 4410 | 4975 | 670 | 2395 | 2 |
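The cluster column in the table above comes from hierarchical clustering. The following base R sketch shows one way to obtain such a grouping from the counts in the table, assuming standardized columns, Euclidean distances and Ward linkage (the settings used in the code appendix). The object names `edu`, `scaled`, `tree` and `groups` are chosen here for illustration.

```r
# Counts per province/territory, copied from the table above.
edu <- data.frame(
  University = c(635725, 758005, 165085, 80815, 52780, 5800, 125975, 2360,
                 2307320, 17605, 1116305, 127580, 6335),
  None       = c(244745, 244000, 94305, 55635, 45160, 4855, 60290, 6740,
                 753000, 8945, 580625, 69200, 2255),
  HS         = c(569435, 671010, 182325, 114335, 65205, 4565, 116280, 2410,
                 1768970, 18670, 808970, 168035, 4410),
  College    = c(496380, 528810, 137935, 104590, 81960, 5135, 128010, 3185,
                 1782530, 20975, 832430, 115085, 4975),
  Cert       = c(73315, 99705, 21210, 7665, 6680, 660, 11600, 205,
                 170925, 1860, 167025, 19375, 670),
  App        = c(238220, 231445, 53795, 37720, 36080, 2625, 52375, 1590,
                 446395, 6255, 866595, 66960, 2395),
  row.names  = c("Alberta", "British Columbia", "Manitoba", "New Brunswick",
                 "Newfoundland and Labrador", "Northwest Territories",
                 "Nova Scotia", "Nunavut", "Ontario", "Prince Edward Island",
                 "Quebec", "Saskatchewan", "Yukon")
)

scaled <- scale(edu)                          # z-score each column
d      <- dist(scaled, method = "euclidean")  # pairwise distances between rows
tree   <- hclust(d, method = "ward.D2")       # Ward linkage
groups <- cutree(tree, k = 3)                 # cut the tree into 3 clusters

print(split(names(groups), groups))           # members of each cluster
```

Standardizing first keeps the large University and High School counts from dominating the distance calculation, although the very large provinces still stand apart because they are large on every column.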

From the dendrogram below, we can see that Prince Edward Island, Nunavut, Northwest Territories and Yukon are the most similar: the height of the dendrogram link that joins them is the smallest, so small that it is almost impossible to see. The next 5 most similar provinces are Manitoba, Saskatchewan, Newfoundland and Labrador, New Brunswick and Nova Scotia. Ontario and Quebec are then grouped together because they share similar features, and finally Alberta and British Columbia are grouped together for the same reason.

The dendrogram below also shows a clear difference between clusters A and B on the one hand and cluster C on the other, where A is the red outlined rectangle, B is the lime outlined rectangle and C is the blue outlined rectangle.

Dendrogram of Provinces Using Features From The Education Dataset

Reporting Confidence Intervals

BarPlot 1

The 95% confidence interval for the mean number of educated people is 196413.3 to 402215.7. We can therefore say with 95% confidence that the mean number of educated people is between 196413.3 and 402215.7. If repeated samples were taken and a 95% confidence interval computed for each of them, about 95% of those intervals would contain the population mean number of educated people. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.

Barplot 2

The 95% confidence interval for the mean number of people with no form of education is 24703.71 to 52788.34. We can therefore say with 95% confidence that the mean number of people with no form of education is between 24703.71 and 52788.34. If repeated samples were taken and a 95% confidence interval computed for each of them, about 95% of those intervals would contain the population mean number of people with no level of education. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.
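To make the interval arithmetic concrete, here is a minimal base R sketch that rebuilds a t-based 95% confidence interval from summary statistics alone. The helper `ci_from_summary` is a name introduced here for illustration; the mean and standard deviation are taken from the "No Education" row of the descriptive statistics, and n = 112 assumes one record per row of the data (56 male plus 56 female).

```r
# 95% t-based confidence interval from summary statistics:
#   xbar +/- t(0.975, n - 1) * s / sqrt(n)
ci_from_summary <- function(xbar, s, n, level = 0.95) {
  se     <- s / sqrt(n)                          # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
}

# "No Education": mean 38746.03, standard deviation 74996.16, 112 records.
ci_none <- ci_from_summary(xbar = 38746.03, s = 74996.16, n = 112)
print(round(ci_none, 2))
```

Because the inputs are rounded, the result agrees with the reported interval of 24703.71 to 52788.34 only up to small rounding differences.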

Conclusion

  1. Ontario and Quebec have the most educated females.
  2. Most of the women educated at the university level are in the 25-34 age group.
  3. Based on the data, more women than men are educated at the university level (about 55% versus 45%).
  4. Men account for about 55% of the people with no formal education, and women for about 45%.
  5. Prince Edward Island, Nunavut, Northwest Territories and Yukon have very similar patterns in their education levels.
  6. Ontario and Quebec show a reasonable degree of similarity in their education levels.
  7. Both genders are equally represented in the data, so the comparisons are not biased toward either gender.

Code - Appendix

Brief Introduction

Canada has 10 provinces and 3 territories, each with many schools that offer education at various levels and are open to everybody. In some schools, male students outnumber female students; this can be due to various reasons, such as the types of courses offered and other external factors. We will consider some possible reasons, based on external data, to support the results in the hypothesis testing section of this project. Education should be a priority for everybody, as we all learn and equip ourselves with knowledge that we can use to shape the world and, importantly, our future. We will examine 5 levels of education.

Objective

Which province or territory has the largest number of educated women, and which level of education do they most often attain? Is there a particular age group of women that is most represented in education? Does age play a role in the level of education a woman receives? As we analyse the Education Level Data, we will answer these questions, uncover patterns that show how well women are represented in education, and produce a solid data-driven report.

To achieve this, we will also create visualizations that convey the relationships between age, level of education and gender, so that our results and deductions are presented more effectively.

Descriptive Statistics

We begin with a summary of the data.

Descriptive Statistics of the Quantitative Variables

| Variable | Mean | Std Deviation | Median | Max | Min | Range |
|---|---|---|---|---|---|---|
| Age | - | - | - | 55-64 | 25-34 | - |
| No Education | 38746.03 | 74996.16 | 9270 | 411550 | 150 | 411400 |
| High School | 80260.76 | 150060.4 | 18822.5 | 771475 | 90 | 771385 |
| Apprenticeship/Trades | 36472.14 | 72901.86 | 5725 | 387665 | 55 | 387610 |
| College | 75749.91 | 142351.8 | 14697.5 | 692990 | 235 | 692755 |
| Certificate | 10372.99 | 19302.76 | 1762.5 | 95360 | 15 | 95345 |
| University | 96458.97 | 184542.8 | 14192.5 | 941135 | 190 | 940945 |

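# Packages used throughout this appendix
library(dplyr)    # %>%, select(), filter(), mutate(), summarise()
library(ggplot2)  # plots
library(tibble)   # remove_rownames(), column_to_rownames()
library(modelr)   # geom_ref_line()
# `education` is assumed to be loaded already; the source does not show how it is read in
table(education$Gender)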
## 
##  0  1 
## 56 56
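# Mean, standard deviation, median, max and min of each numeric column
education %>%
  summarise_at(vars(Age, None, HS, App, College, Cert, University),
               list(mean = mean,
                    sd = sd,
                    median = median,
                    max = max,
                    min = min))
# Cross-tabulate the non-response rate (NRR) by gender, selecting columns by name
t1 <- education %>% select(NRR, Gender)
t1 %>% filter(Gender == 0)
table(t1)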
##      Gender
## NRR   0 1
##   4.3 4 4
##   4.6 4 4
##   4.9 4 4
##   5.1 4 4
##   5.3 4 4
##   5.5 4 4
##   5.6 4 4
##   6.1 8 8
##   6.3 4 4
##   6.8 8 8
##   8.7 4 4
##   8.8 4 4

Descriptive Statistics of the Qualitative Variables

Count of Genders Surveyed: Male: 56, Female: 56

Overall, the summary statistics show that the university level has the highest average count in the data, 96,458.97 people per record, with a standard deviation of 184,542.8. Applying the two-standard-deviation rule of thumb (96,458.97 + 2 × 184,542.8 ≈ 465,545), roughly 95% of the university-level counts across the provinces and territories would not exceed 465,545, provided the counts are approximately normally distributed.

In addition, the number of people with no formal education ranged from 150 to 411,550 per record across provinces and territories, with an average of 38,746.03 and a standard deviation of 74,996.16. Roughly 95% of the people surveyed were younger than the 55-64 age bracket.

Exploratory Data Analysis

The first visualization summarizes the total number of educated females in each province and territory. As Figure 1 below shows, Ontario leads with over 300,000 educated females, followed by Quebec with almost 200,000, while Nunavut is the territory with the fewest educated females.

Visualisation 1

genders_select <- education %>% select(-c(NRR)) 
fem <- genders_select %>% filter(Gender == 0)
barp_Female <- fem %>% mutate(NumOfEducated = Total_Count - None) %>% select(-c(None, Total_Count)) %>% filter(Geographic_Name != "Canada")
aggr_female <- aggregate(NumOfEducated ~ Geographic_Name, barp_Female, sum)
aggr_female %>%
  filter(!is.na(NumOfEducated)) %>%
  arrange(NumOfEducated) %>%
  tail(20) %>%
  mutate(Geographic_Name=factor(Geographic_Name, Geographic_Name)) %>%
  ggplot( aes(x=Geographic_Name, y=NumOfEducated) ) +
    geom_bar(stat="identity", fill="#69b3a2") +
    coord_flip() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Number of Educated Females In Each Province")

everything <- data.frame(Gender=c("Female","Male"),
                 HS=c(4495155, 4494050),
                 App=c(1329320, 2755560),
                College=c(4911850, 3572140),
                 Certification=c(681700, 480075),
                 University=c(5968300, 4835105))
# Recode the numeric Age codes into readable age brackets
recode_age <- function(data) {
  data %>%
    mutate(Age = case_when(Age == 1 ~ "25-34",
                           Age == 2 ~ "35-44",
                           Age == 3 ~ "45-54",
                           Age == 4 ~ "55-64"))
}
fem_age_Adjust <- recode_age(fem)
barp_Female <- recode_age(barp_Female)
# Also replace the numeric Gender codes with labels
genders_select11 <- recode_age(genders_select) %>%
  mutate(Gender = case_when(Gender == 0 ~ "Female",
                            Gender == 1 ~ "Male"))
val <- genders_select11 %>% select(c(Gender,University))
val3 <- genders_select %>% select(c(Gender, Age,University, None, Geographic_Name))
val1 <- aggregate(University ~ Gender, val, sum)
val2 <- genders_select11 %>% select(c(Gender,Age,University, Geographic_Name, None)) %>% filter(Geographic_Name != "Canada")
val2a <- aggregate(University ~ Age + Gender, val2, sum)
val4 <- aggregate(University ~ Age + Gender + None, val2, sum)

Visualisation 2

The second visualization, presented below as Figure 2, shows the total number of people at the university level for each gender, classified by age. Of the 4 age brackets listed to the right of the diagram, females aged 25-34 are the most represented at the university level, and those aged 55-64 the least. By comparison, fewer men are at the university level: across the age groups, most of the male counts at the university level fall between 600,000 and 700,000.

ggplot(val2a, aes(x = Gender, y = University,
                     color = Age)) + 
  geom_point()

Visualisation 3

To gain better insight into gender, the geographic name has been added to the previous visualization as an extra class, which makes the analysis easier to follow. The diagram is split into two halves, and it shows that the groups most represented at the university level are females aged 25-34 and males aged 25-34, both from Ontario. The horizontal line is placed to show how many women are educated compared with men, and the illustration below shows that more women than men are educated and hold a university degree. There are 6 groups above 150,000 for women and 4 groups above 150,000 for men.

valw_prov <- genders_select11 %>% select(c(Gender,University,None,Geographic_Name,Age)) %>% filter(Geographic_Name != "Canada")

xy <- ggplot(valw_prov, aes(x = Gender, y = University, color=Age, size=Geographic_Name)) + 
  geom_point(position = "jitter") + geom_ref_line(v=1.5, colour = "black")
xy + geom_ref_line(h=150000, size = 1.5, colour = "black")
## Warning: Using size for a discrete variable is not advised.

Visualisation 4

ggplot(val4, aes(x = Gender, y = None,
                     color = Age)) + 
  geom_point(position = "jitter") + geom_ref_line(h=50000, colour = "yellow")

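# Totals by gender, computed from val2: val4 is also grouped by None, so rows that
# share the same Age, Gender and None value would be merged and None counted once
ttt1 <- aggregate(None ~ Gender, val2, sum)
ttt1
ttt <- aggregate(University ~ Gender, val2, sum)
ttt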

Claim Evidence Reasoning

To support the claim, Environics Analytics reported in 2017, based on the 2016 census, that 64.7 percent of women aged 25 to 64 had a postsecondary degree or diploma, compared with 60.8 percent in 2006.

Our data and the illustration in Figure 2 show that women make up about 55% of the people with a university-level education, with men making up the remaining 45%.

https://environicsanalytics.com/resources/blogs/ea-blog/2017/11/29/latest-census-release-shows-canadian-women-more-educated-than-ever

Clustering

cl1 <- genders_select %>% select(-c(Gender, Age)) %>% filter(Geographic_Name != "Canada")
cl1_final<- aggregate(cbind(University,None,HS,College, Cert,App) ~ Geographic_Name, cl1, sum)
cl1_final <- cl1_final%>%
  na.omit() %>%
  remove_rownames %>% column_to_rownames(var="Geographic_Name")
df <- scale(cl1_final)  
set.seed(123)           
head(df, n=5)
##                           University       None         HS    College
## Alberta                    0.3290878  0.3260308  0.4403318  0.3357524
## British Columbia           0.5118260  0.3229104  0.6402769  0.3997749
## Manitoba                  -0.3742481 -0.3040769 -0.3216738 -0.3718805
## New Brunswick             -0.5001833 -0.4660436 -0.4555086 -0.4377094
## Newfoundland and Labrador -0.5420794 -0.5099174 -0.5522184 -0.4823850
##                                 Cert        App
## Alberta                    0.4548789  0.3238777
## British Columbia           0.8741571  0.2968241
## Manitoba                  -0.3729531 -0.4125577
## New Brunswick             -0.5881528 -0.4767475
## Newfoundland and Labrador -0.6038023 -0.4832962
Distance <- dist(df, method = "euclidean")
h_clust <- hclust(d = Distance,
                         method = "ward.D2")

Optimal Number of Clusters = 3

library(factoextra)
fviz_nbclust(df, FUN = hcut, method = "wss")
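The "wss" method above picks the number of clusters from an elbow in the total within-cluster sum of squares. As a rough base R sketch of that quantity, the block below computes it by hand for k = 1 to 8. It uses the built-in `USArrests` data set as a stand-in (not the education data), and `wss_for_k` is a helper name introduced here; it is not claimed to reproduce the internals of `fviz_nbclust`.

```r
# Stand-in data: the built-in USArrests data set, NOT the education data.
x    <- scale(USArrests)
tree <- hclust(dist(x), method = "ward.D2")

# Total within-cluster sum of squares when the tree is cut into k clusters.
wss_for_k <- function(k) {
  groups <- cutree(tree, k = k)
  sum(sapply(split(as.data.frame(x), groups), function(cluster) {
    # squared deviations of each row from its cluster centroid
    sum(scale(cluster, center = TRUE, scale = FALSE)^2)
  }))
}

ks  <- 1:8
wss <- sapply(ks, wss_for_k)
print(data.frame(k = ks, wss = round(wss, 2)))
```

Plotting `wss` against `ks` gives the elbow curve; the value of k after which the curve flattens is a common, if informal, choice for the number of clusters.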

group <- cutree(h_clust, k=3)     
table(group)  
## group
## 1 2 3 
## 2 9 2
cl1_final$cluster <- group
cl1_final %>% knitr::kable()

| Province / Territory | University | None | HS | College | Cert | App | cluster |
|---|---|---|---|---|---|---|---|
| Alberta | 635725 | 244745 | 569435 | 496380 | 73315 | 238220 | 1 |
| British Columbia | 758005 | 244000 | 671010 | 528810 | 99705 | 231445 | 1 |
| Manitoba | 165085 | 94305 | 182325 | 137935 | 21210 | 53795 | 2 |
| New Brunswick | 80815 | 55635 | 114335 | 104590 | 7665 | 37720 | 2 |
| Newfoundland and Labrador | 52780 | 45160 | 65205 | 81960 | 6680 | 36080 | 2 |
| Northwest Territories | 5800 | 4855 | 4565 | 5135 | 660 | 2625 | 2 |
| Nova Scotia | 125975 | 60290 | 116280 | 128010 | 11600 | 52375 | 2 |
| Nunavut | 2360 | 6740 | 2410 | 3185 | 205 | 1590 | 2 |
| Ontario | 2307320 | 753000 | 1768970 | 1782530 | 170925 | 446395 | 3 |
| Prince Edward Island | 17605 | 8945 | 18670 | 20975 | 1860 | 6255 | 2 |
| Quebec | 1116305 | 580625 | 808970 | 832430 | 167025 | 866595 | 3 |
| Saskatchewan | 127580 | 69200 | 168035 | 115085 | 19375 | 66960 | 2 |
| Yukon | 6335 | 2255 | 4410 | 4975 | 670 | 2395 | 2 |

From the dendrogram below, we can see that Prince Edward Island, Nunavut, Northwest Territories and Yukon are the most similar: the height of the dendrogram link that joins them is the smallest, so small that it is almost impossible to see. The next 5 most similar provinces are Manitoba, Saskatchewan, Newfoundland and Labrador, New Brunswick and Nova Scotia. Ontario and Quebec are then grouped together because they share similar features, and finally Alberta and British Columbia are grouped together for the same reason.

The dendrogram below also shows a clear difference between clusters A and B on the one hand and cluster C on the other, where A is the red outlined rectangle, B is the lime outlined rectangle and C is the blue outlined rectangle.

plot(h_clust, cex = 0.6, hang = -1)
rect.hclust(h_clust, k = 3, border = 2:5)

fviz_cluster(list(data = df, cluster = group))

Reporting Confidence Intervals

BarPlot 1

The 95% confidence interval for the mean number of educated people is 196413.3 to 402215.7. We can therefore say with 95% confidence that the mean number of educated people is between 196413.3 and 402215.7. If repeated samples were taken and a 95% confidence interval computed for each of them, about 95% of those intervals would contain the population mean number of educated people. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.

data_t<- genders_select11 %>% select(Gender,Total_Count,None) %>% mutate(NumOfEducated = Total_Count - None)
data_gr <- data_t %>% select(c(Gender,NumOfEducated))
dt <- data_gr %>%
  dplyr::group_by(Gender)%>%
  dplyr::summarise(
    mean = mean(NumOfEducated),
    lci = t.test(NumOfEducated, conf.level = 0.95)$conf.int[1],
    uci = t.test(NumOfEducated, conf.level = 0.95)$conf.int[2])
dt
t.test(data_gr$NumOfEducated, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  data_gr$NumOfEducated
## t = 5.7639, df = 111, p-value = 0.00000007485
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  196413.3 402215.7
## sample estimates:
## mean of x 
##  299314.5
pl1 <- ggplot(data = dt)
pl1 <- pl1 + geom_bar(aes(x=Gender, y=mean, fill = Gender), stat="identity")
pl1 <- pl1 + geom_errorbar(aes(x=Gender, ymin=lci, ymax= uci), width = 0.4, color ="red",size =1)
pl1 <- pl1 + geom_text(aes(x=Gender, y=lci, label = round(lci,1)), size= 2, vjust = 1)
pl1 <- pl1 + geom_text(aes(x=Gender, y=uci, label = round(uci,1)), size= 2, vjust = -1)
pl1 <- pl1 + theme_classic()
pl1 <- pl1 + labs(title = "Bar chart with 95% confidence intervals for Total Count - None")
pl1 <- pl1 + labs(x= "Gender", y = "Mean of Number of Educated")
pl1

Barplot 2

The 95% confidence interval for the mean number of people with no form of education is 24703.71 to 52788.34. We can therefore say with 95% confidence that the mean number of people with no form of education is between 24703.71 and 52788.34. If repeated samples were taken and a 95% confidence interval computed for each of them, about 95% of those intervals would contain the population mean number of people with no level of education. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.

dt2<- data_t %>% select(c(Gender,None))
dt3 <- dt2 %>%
  dplyr::group_by(Gender)%>%
  dplyr::summarise(
    mean = mean(None),
    lci = t.test(None, conf.level = 0.95)$conf.int[1],
    uci = t.test(None, conf.level = 0.95)$conf.int[2])
dt3
t.test(dt2$None, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  dt2$None
## t = 5.4676, df = 111, p-value = 0.0000002841
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  24703.71 52788.34
## sample estimates:
## mean of x 
##  38746.03
pl2 <- ggplot(data = dt3)
pl2 <- pl2 + geom_bar(aes(x=Gender, y=mean, fill = Gender), stat="identity")
pl2 <- pl2 + geom_errorbar(aes(x=Gender, ymin=lci, ymax= uci), width = 0.4, color ="red",size =1)
pl2 <- pl2 + geom_text(aes(x=Gender, y=lci, label = round(lci,1)), size= 2, vjust = 1)
pl2 <- pl2 + geom_text(aes(x=Gender, y=uci, label = round(uci,1)), size= 2, vjust = -1)
pl2 <- pl2 + theme_classic()
pl2 <- pl2 + labs(title = "Bar chart with 95% confidence intervals for None/ No Formal Education")
pl2 <- pl2 + labs(x= "Gender", y = "Mean of Number of Population with No Formal Education")
pl2

Conclusion

  1. Ontario and Quebec have the most educated females.
  2. Most of the women educated at the university level are in the 25-34 age group.
  3. Based on the data, more women than men are educated at the university level (about 55% versus 45%).
  4. Men account for about 55% of the people with no formal education, and women for about 45%.
  5. Prince Edward Island, Nunavut, Northwest Territories and Yukon have very similar patterns in their education levels.
  6. Ontario and Quebec show a reasonable degree of similarity in their education levels.
  7. Both genders are equally represented in the data, so the comparisons are not biased toward either gender.

Codebook For Education Dataset

| Original Data Name | Variable Name | Variable Label | Missing Data | Range | Data Type | Value Label |
|---|---|---|---|---|---|---|
| Geographic_Name | GName | Name of Province / Country | None | - | Char | |
| NRR | NRR | Non-Response Rate | None | 4.3 - 8.8 | Num | |
| Age | Age | Age of Individual | None | - | Num | 1 = 25-34, 2 = 35-44, 3 = 45-54, 4 = 55-64 |
| Gender | Gender | Gender of Individual | None | - | Num | 0 = Female, 1 = Male |
| Total_Count | TCount | Total Number of Educated and Uneducated | None | 1085 - 2549315 | Num | |
| None | None | Number of Uneducated | None | 150 - 411550 | Num | |
| HS | HS | Total Number of People at High School Level | None | 90 - 771475 | Num | |
| App | App | Total Number of People at Apprenticeship & Trades Level | None | 55 - 387665 | Num | |
| College | College | Total Number of People at College Level | None | 235 - 692990 | Num | |
| Cert | Cert | Total Number of People at Certification Level | None | 15 - 95360 | Num | |
| University | University | Total Number of People at University Level | None | 190 - 941135 | Num | |