Canada has 10 provinces and 3 territories. Across them, many schools offer education at various levels, and these schools are accessible to everybody. In some schools male students outnumber female students; this can be due to various reasons, such as the type of courses available and other external factors. Education should be a priority for everybody, as we all learn and equip ourselves with knowledge that we can use to shape the world and, importantly, our future. A survey has been carried out, and each gender has been represented fairly. We will examine 5 levels of education, as well as the provinces and territories in the education dataset, after which we will draw some conclusions and reveal relationships between different parts of the data.
Which province or territory has the largest number of educated women, and what level of education do they attain most often? Are more women educated than men? Are education levels similar across provinces and territories? As we analyze the Education Level Data, we will answer these questions, uncover patterns that show how well women are represented in the educational sector, and produce a solid data-driven report.
To achieve all of the above, we will also produce visualizations that convey the relationships between age, level of education and gender, so that our results and deductions are presented more effectively.
We begin with a summary of the data.
| | Mean | Std Deviation | Median | Max | Min | Range |
|---|---|---|---|---|---|---|
| Age | - | - | - | 55-64 | 25-34 | - |
| Total_Count | 338060.5 | 619992.018 | 68405 | 2549315 | 1085 | 2548230 |
| No Education | 38746.03 | 74996.16 | 9270 | 411550 | 150 | 411400 |
| High School | 80260.76 | 150060.4 | 18822.5 | 771475 | 90 | 771385 |
| Apprenticeship/Trades | 36472.14 | 72901.86 | 5725 | 387665 | 55 | 387610 |
| College | 75749.91 | 142351.8 | 14697.5 | 692990 | 235 | 692755 |
| Certificate | 10372.99 | 19302.76 | 1762.5 | 95360 | 15 | 95345 |
| University | 96458.97 | 184542.8 | 14192.5 | 941135 | 190 | 940945 |
| Count of Genders Surveyed | |
|---|---|
| Male : 56 | Female : 56 |
Overall, the summary statistics show that the university level has the highest average count in the data: 96459 people on average, with a standard deviation of 184542.8.
In addition, the number of people with no level of education across provinces and territories ranges from 150 to 411550, with an average of 38746.03 and a standard deviation of 74996.16. Roughly 95% of the people surveyed were below the 55-64 age bracket.
Finally, the two genders have equal counts (56 records each), so neither gender is over-represented in the data.
The first visualization summarizes the total number of educated females in each province and territory. As Figure 1 below shows, Ontario leads with over 300,000 educated females, followed by Quebec with almost 200,000, while Nunavut has the fewest educated females.
Number of Educated Women in Canada by Province
The second visualization, presented below as Figure 2, shows the total number of people at the university level for each gender, classified by age. Of the 4 age brackets listed at the right of the diagram, the 25-34 group has the most females at the university level and the 55-64 group has the fewest. By comparison, fewer men are at the university level: most of the male counts lie between 600,000 and 700,000.
Gender vs Total Number At The University Level by Age
To gain further insight into gender, the geographic name has been added to the previous visualization as a class, which makes the analysis easier to follow. The diagram is split into two halves, and in both the largest university-level groups are aged 25-34 and come from Ontario, for females and males alike. The horizontal line has been placed to show how many women are educated compared with men, and the illustration below also shows that more women than men possess a university degree. Of all the people at the university level, 55% are women and 45% are men.
| Gender | University |
|---|---|
| Female | 2984140 |
| Male | 2417550 |
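The shares quoted for the university level follow directly from the two totals in the table above. A minimal sketch in base R (the vector name `university` is ours and does not appear in the original analysis):

```r
# University-level totals by gender, copied from the table above
university <- c(Female = 2984140, Male = 2417550)

# Share of each gender among all people at the university level, in percent
share <- round(100 * university / sum(university), 1)
print(share)
```

Rounded to whole numbers, this gives the 55% and 45% split quoted in the text.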
From the table and diagram below, we can see that men make up about 55.5% of the people with no formal level of education, while women make up about 44.5%.
| Gender | None |
|---|---|
| Female | 963335 |
| Male | 1200085 |
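As a check, the gender split of people with no formal education can be worked out from the two totals in the table above:

$$\frac{1200085}{963335 + 1200085} = \frac{1200085}{2163420} \approx 0.555, \qquad \frac{963335}{2163420} \approx 0.445$$

That is, men account for roughly 55.5% of this group and women for roughly 44.5%.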
To support this claim, data from Environics Analytics show that, as of 2017, right after the 2016 census, 64.7 percent of women aged 25 to 64 had a postsecondary degree or diploma, compared to 60.8 percent in 2006.
Our data, illustrated in Figure 2, show that women make up just over 55% of the people with a university degree.
Optimal Number of Clusters = 3
| | University | None | HS | College | Cert | App | cluster |
|---|---|---|---|---|---|---|---|
| Alberta | 635725 | 244745 | 569435 | 496380 | 73315 | 238220 | 1 |
| British Columbia | 758005 | 244000 | 671010 | 528810 | 99705 | 231445 | 1 |
| Manitoba | 165085 | 94305 | 182325 | 137935 | 21210 | 53795 | 2 |
| New Brunswick | 80815 | 55635 | 114335 | 104590 | 7665 | 37720 | 2 |
| Newfoundland and Labrador | 52780 | 45160 | 65205 | 81960 | 6680 | 36080 | 2 |
| Northwest Territories | 5800 | 4855 | 4565 | 5135 | 660 | 2625 | 2 |
| Nova Scotia | 125975 | 60290 | 116280 | 128010 | 11600 | 52375 | 2 |
| Nunavut | 2360 | 6740 | 2410 | 3185 | 205 | 1590 | 2 |
| Ontario | 2307320 | 753000 | 1768970 | 1782530 | 170925 | 446395 | 3 |
| Prince Edward Island | 17605 | 8945 | 18670 | 20975 | 1860 | 6255 | 2 |
| Quebec | 1116305 | 580625 | 808970 | 832430 | 167025 | 866595 | 3 |
| Saskatchewan | 127580 | 69200 | 168035 | 115085 | 19375 | 66960 | 2 |
| Yukon | 6335 | 2255 | 4410 | 4975 | 670 | 2395 | 2 |
From the dendrogram below, we can see that Prince Edward Island, Nunavut, Northwest Territories and Yukon are the most similar: the dendrogram link that joins them has the smallest height, so small that it is almost impossible to see. The next 5 most similar provinces are Manitoba, Saskatchewan, Newfoundland and Labrador, New Brunswick and Nova Scotia. Ontario and Quebec are then grouped together because they share similar features, and finally Alberta and British Columbia share similar features as well.
The dendrogram below also shows a clear difference between clusters A and B on one side and cluster C on the other, where A is the rectangle outlined in red, B the rectangle outlined in lime and C the rectangle outlined in blue.
Dendrogram of Provinces Using Features From The Education Dataset
The 95% confidence interval for the mean number of educated people is 196413.3 to 402215.7. We can therefore conclude with 95% confidence that the mean number of educated people is between 196413.3 and 402215.7. If repeated samples were taken and a 95% confidence interval computed for each of them, then 95% of the intervals would contain the population mean number of educated people. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.
The 95% confidence interval for the mean number of people with no form of education is 24703.71 to 52788.34. We can therefore conclude with 95% confidence that the mean number of people with no form of education is between 24703.71 and 52788.34. If repeated samples were taken and a 95% confidence interval computed for each of them, then 95% of the intervals would contain the population mean number of people with no level of education. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.
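Both intervals are consistent with the usual one-sample t interval. As a sketch, assuming the $n = 112$ records are treated as a simple random sample, and using the mean and standard deviation from the No Education row of the summary table:

$$\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}} = 38746.03 \pm 1.98 \times \frac{74996.16}{\sqrt{112}} \approx 38746.03 \pm 14042$$

This gives approximately 24704 to 52788, in line with the interval reported for people with no form of education.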
Canada has 10 provinces and 3 territories. Across them, many schools offer education at various levels, and these schools are accessible to everybody. In some schools male students outnumber female students; this can be due to various reasons, such as the type of courses available and other external factors. In the hypothesis testing section of this project, we will consider some possible reasons, based on external data, to support our results. Education should be a priority for everybody, as we all learn and equip ourselves with knowledge that we can use to shape the world and, importantly, our future. We will examine 5 levels of education.
Which province or territory has the largest number of educated women, and what level of education do they attain most often? Is there a particular age group of women that is most present in education? Does age play a role in the level of education a woman receives? As we analyze the Education Level Data, we will answer these questions, uncover patterns that show how well women are represented in the educational sector, and produce a solid data-driven report.
To achieve all of the above, we will also produce visualizations that convey the relationships between age, level of education and gender, so that our results and deductions are presented more effectively.
We begin with a summary of the data.
| | Mean | Std Deviation | Median | Max | Min | Range |
|---|---|---|---|---|---|---|
| Age | - | - | - | 55-64 | 25-34 | - |
| No Education | 38746.03 | 74996.16 | 9270 | 411550 | 150 | 411400 |
| High School | 80260.76 | 150060.4 | 18822.5 | 771475 | 90 | 771385 |
| Apprenticeship/Trades | 36472.14 | 72901.86 | 5725 | 387665 | 55 | 387610 |
| College | 75749.91 | 142351.8 | 14697.5 | 692990 | 235 | 692755 |
| Certificate | 10372.99 | 19302.76 | 1762.5 | 95360 | 15 | 95345 |
| University | 96458.97 | 184542.8 | 14192.5 | 941135 | 190 | 940945 |
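# Packages the code in this report relies on (assumed; the original does not show them being loaded)
library(dplyr)    # %>%, select(), filter(), mutate(), summarise_at()
library(ggplot2)  # plots
library(tibble)   # remove_rownames(), column_to_rownames()
library(modelr)   # geom_ref_line()
table(education$Gender)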
##
## 0 1
## 56 56
education %>%
  summarise_at(vars(Age, None, HS, App, College, Cert, University),
               list(mean = mean, sd = sd,
                    median = median,
                    max = max,
                    min = min))
t1 <- education %>% select(2,4) %>% group_by(Gender)
t1 %>% filter(Gender==0)
table(t1)
## Gender
## NRR 0 1
## 4.3 4 4
## 4.6 4 4
## 4.9 4 4
## 5.1 4 4
## 5.3 4 4
## 5.5 4 4
## 5.6 4 4
## 6.1 8 8
## 6.3 4 4
## 6.8 8 8
## 8.7 4 4
## 8.8 4 4
| Count of Genders Surveyed | |
|---|---|
| Male : 56 | Female : 56 |
Overall, the summary statistics show that the university level has the highest average count in the data, 96458.97, with a standard deviation of 184542.8. Taking the mean plus two standard deviations (96458.97 + 2 × 184542.8 ≈ 465,545) as a rough bound, about 95% of the university-level counts across the provinces and territories would not exceed 465,545. This rule of thumb assumes a roughly normal distribution, which these right-skewed counts (median 14192.5) satisfy only loosely.
In addition, the number of people with no level of education across provinces and territories ranges from 150 to 411550, with an average of 38746.03 and a standard deviation of 74996.16. Roughly 95% of the people surveyed were below the 55-64 age bracket.
The first visualization summarizes the total number of educated females in each province and territory. As Figure 1 below shows, Ontario leads with over 300,000 educated females, followed by Quebec with almost 200,000, while Nunavut has the fewest educated females.
# Drop the non-response rate and keep the female records (Gender == 0)
genders_select <- education %>% select(-c(NRR))
fem <- genders_select %>% filter(Gender == 0)
# Educated = total count minus people with no education; national rows are excluded
barp_Female <- fem %>%
  mutate(NumOfEducated = Total_Count - None) %>%
  select(-c(None, Total_Count)) %>%
  filter(Geographic_Name != "Canada")
aggr_female <- aggregate(NumOfEducated ~ Geographic_Name, barp_Female, sum)
# Horizontal bar chart of educated females, ordered by count
aggr_female %>%
  filter(!is.na(NumOfEducated)) %>%
  arrange(NumOfEducated) %>%
  tail(20) %>%
  mutate(Geographic_Name = factor(Geographic_Name, Geographic_Name)) %>%
  ggplot(aes(x = Geographic_Name, y = NumOfEducated)) +
  geom_bar(stat = "identity", fill = "#69b3a2") +
  coord_flip() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none"
  ) +
  xlab("") +
  ylab("Number of Educated Females In Each Province")
everything <- data.frame(Gender = c("Female", "Male"),
                         HS = c(4495155, 4494050),
                         App = c(1329320, 2755560),
                         College = c(4911850, 3572140),
                         Certification = c(681700, 480075),
                         University = c(5968300, 4835105))
fem_age_Adjust <- fem %>%
  mutate(Age = case_when(Age == "1" ~ "25-34",
                         Age == "2" ~ "35-44",
                         Age == "3" ~ "45-54",
                         Age == "4" ~ "55-64"))
barp_Female <- barp_Female %>%
  mutate(Age = case_when(Age == "1" ~ "25-34",
                         Age == "2" ~ "35-44",
                         Age == "3" ~ "45-54",
                         Age == "4" ~ "55-64"))
genders_select11 <- genders_select %>%
  mutate(Age = case_when(Age == "1" ~ "25-34",
                         Age == "2" ~ "35-44",
                         Age == "3" ~ "45-54",
                         Age == "4" ~ "55-64")) %>%
  mutate(Gender = case_when(Gender == "0" ~ "Female",
                            Gender == "1" ~ "Male"))
val <- genders_select11 %>% select(c(Gender,University))
val3 <- genders_select %>% select(c(Gender, Age,University, None, Geographic_Name))
val1 <- aggregate(University ~ Gender, val, sum)
val2 <- genders_select11 %>% select(c(Gender,Age,University, Geographic_Name, None)) %>% filter(Geographic_Name != "Canada")
val2a <- aggregate(University ~ Age + Gender, val2, sum)
val4 <- aggregate(University ~ Age + Gender + None, val2, sum)
The second visualization, presented below as Figure 2, shows the total number of people at the university level for each gender, classified by age. Of the 4 age brackets listed at the right of the diagram, the 25-34 group has the most females at the university level and the 55-64 group has the fewest. By comparison, fewer men are at the university level: most of the male counts lie between 600,000 and 700,000.
ggplot(val2a, aes(x = Gender, y = University, color = Age)) +
  geom_point()
To gain further insight into gender, the geographic name has been added to the previous visualization as a class, which makes the analysis easier to follow. The diagram is split into two halves, and in both the largest university-level groups are aged 25-34 and come from Ontario, for females and males alike. The horizontal line has been placed to show how many women are educated compared with men, and the illustration below also shows that more women than men possess a university degree. There are 6 groups above 150,000 for women and 4 groups above 150,000 for men.
valw_prov <- genders_select11 %>%
  select(c(Gender, University, None, Geographic_Name, Age)) %>%
  filter(Geographic_Name != "Canada")
xy <- ggplot(valw_prov, aes(x = Gender, y = University, color = Age, size = Geographic_Name)) +
  geom_point(position = "jitter") +
  geom_ref_line(v = 1.5, colour = "black")
xy + geom_ref_line(h = 150000, size = 1.5, colour = "black")
## Warning: Using size for a discrete variable is not advised.
ggplot(val4, aes(x = Gender, y = None, color = Age)) +
  geom_point(position = "jitter") +
  geom_ref_line(h = 50000, colour = "yellow")
ttt1 <- aggregate(None ~ Gender, val4, sum)
ttt1
ttt <-aggregate(University ~ Gender, val4, sum)
ttt
To support this claim, data from Environics Analytics show that, as of 2017, right after the 2016 census, 64.7 percent of women aged 25 to 64 had a postsecondary degree or diploma, compared to 60.8 percent in 2006.
Our data, illustrated in Figure 2, show that women make up just over 55% of the people with a university degree.
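# Sum each education level per province/territory (national rows excluded)
cl1 <- genders_select %>% select(-c(Gender, Age)) %>% filter(Geographic_Name != "Canada")
cl1_final <- aggregate(cbind(University, None, HS, College, Cert, App) ~ Geographic_Name, cl1, sum)
cl1_final <- cl1_final %>%
  na.omit() %>%
  remove_rownames %>%
  column_to_rownames(var = "Geographic_Name")
# Standardize the columns so that no single level dominates the distances
df <- scale(cl1_final)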
set.seed(123)
head(df, n=5)
## University None HS College
## Alberta 0.3290878 0.3260308 0.4403318 0.3357524
## British Columbia 0.5118260 0.3229104 0.6402769 0.3997749
## Manitoba -0.3742481 -0.3040769 -0.3216738 -0.3718805
## New Brunswick -0.5001833 -0.4660436 -0.4555086 -0.4377094
## Newfoundland and Labrador -0.5420794 -0.5099174 -0.5522184 -0.4823850
## Cert App
## Alberta 0.4548789 0.3238777
## British Columbia 0.8741571 0.2968241
## Manitoba -0.3729531 -0.4125577
## New Brunswick -0.5881528 -0.4767475
## Newfoundland and Labrador -0.6038023 -0.4832962
Distance <- dist(df, method = "euclidean")
h_clust <- hclust(d = Distance,
                  method = "ward.D2")
Optimal Number of Clusters = 3
library(factoextra)
fviz_nbclust(df, FUN = hcut, method = "wss")
group <- cutree(h_clust, k=3)
table(group)
## group
## 1 2 3
## 2 9 2
cl1_final$cluster <- group
cl1_final %>% knitr::kable()
| | University | None | HS | College | Cert | App | cluster |
|---|---|---|---|---|---|---|---|
| Alberta | 635725 | 244745 | 569435 | 496380 | 73315 | 238220 | 1 |
| British Columbia | 758005 | 244000 | 671010 | 528810 | 99705 | 231445 | 1 |
| Manitoba | 165085 | 94305 | 182325 | 137935 | 21210 | 53795 | 2 |
| New Brunswick | 80815 | 55635 | 114335 | 104590 | 7665 | 37720 | 2 |
| Newfoundland and Labrador | 52780 | 45160 | 65205 | 81960 | 6680 | 36080 | 2 |
| Northwest Territories | 5800 | 4855 | 4565 | 5135 | 660 | 2625 | 2 |
| Nova Scotia | 125975 | 60290 | 116280 | 128010 | 11600 | 52375 | 2 |
| Nunavut | 2360 | 6740 | 2410 | 3185 | 205 | 1590 | 2 |
| Ontario | 2307320 | 753000 | 1768970 | 1782530 | 170925 | 446395 | 3 |
| Prince Edward Island | 17605 | 8945 | 18670 | 20975 | 1860 | 6255 | 2 |
| Quebec | 1116305 | 580625 | 808970 | 832430 | 167025 | 866595 | 3 |
| Saskatchewan | 127580 | 69200 | 168035 | 115085 | 19375 | 66960 | 2 |
| Yukon | 6335 | 2255 | 4410 | 4975 | 670 | 2395 | 2 |
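The choice of 3 clusters rests on an elbow plot of the total within-cluster sum of squares (WSS). The sketch below shows one way to compute that quantity in base R from the province totals in the table above. It is an illustration under our own naming (`edu`, `scaled`, `tree`, `total_wss`), not the internals of `fviz_nbclust()`, and its values need not match that function's plot exactly.

```r
# Province/territory totals copied from the table above
edu <- data.frame(
  University = c(635725, 758005, 165085, 80815, 52780, 5800, 125975, 2360,
                 2307320, 17605, 1116305, 127580, 6335),
  None       = c(244745, 244000, 94305, 55635, 45160, 4855, 60290, 6740,
                 753000, 8945, 580625, 69200, 2255),
  HS         = c(569435, 671010, 182325, 114335, 65205, 4565, 116280, 2410,
                 1768970, 18670, 808970, 168035, 4410),
  College    = c(496380, 528810, 137935, 104590, 81960, 5135, 128010, 3185,
                 1782530, 20975, 832430, 115085, 4975),
  Cert       = c(73315, 99705, 21210, 7665, 6680, 660, 11600, 205,
                 170925, 1860, 167025, 19375, 670),
  App        = c(238220, 231445, 53795, 37720, 36080, 2625, 52375, 1590,
                 446395, 6255, 866595, 66960, 2395),
  row.names = c("Alberta", "British Columbia", "Manitoba", "New Brunswick",
                "Newfoundland and Labrador", "Northwest Territories",
                "Nova Scotia", "Nunavut", "Ontario", "Prince Edward Island",
                "Quebec", "Saskatchewan", "Yukon")
)

# Standardize each column, then build the Ward tree on Euclidean distances
scaled <- scale(edu)
tree   <- hclust(dist(scaled, method = "euclidean"), method = "ward.D2")

# Total within-cluster sum of squares when the tree is cut into k clusters
total_wss <- function(k) {
  groups <- cutree(tree, k = k)
  sum(sapply(split(as.data.frame(scaled), groups), function(cluster) {
    # Squared deviations of each row from its cluster's column means
    sum(scale(cluster, center = TRUE, scale = FALSE)^2)
  }))
}

wss <- sapply(1:6, total_wss)
print(round(wss, 2))
```

Plotting `wss` against `1:6` and looking for the point where the curve flattens is the elbow heuristic. Because each cut only splits an existing cluster, the values never increase with `k`.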
From the dendrogram below, we can see that Prince Edward Island, Nunavut, Northwest Territories and Yukon are the most similar: the dendrogram link that joins them has the smallest height, so small that it is almost impossible to see. The next 5 most similar provinces are Manitoba, Saskatchewan, Newfoundland and Labrador, New Brunswick and Nova Scotia. Ontario and Quebec are then grouped together because they share similar features, and finally Alberta and British Columbia share similar features as well.
The dendrogram below also shows a clear difference between clusters A and B on one side and cluster C on the other, where A is the rectangle outlined in red, B the rectangle outlined in lime and C the rectangle outlined in blue.
plot(h_clust, cex = 0.6, hang = -1)
rect.hclust(h_clust, k = 3, border = 2:5)
fviz_cluster(list(data = df, cluster = group))
The 95% confidence interval for the mean number of educated people is 196413.3 to 402215.7. We can therefore conclude with 95% confidence that the mean number of educated people is between 196413.3 and 402215.7. If repeated samples were taken and a 95% confidence interval computed for each of them, then 95% of the intervals would contain the population mean number of educated people. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.
data_t <- genders_select11 %>%
  select(Gender, Total_Count, None) %>%
  mutate(NumOfEducated = Total_Count - None)
data_gr <- data_t %>% select(c(Gender, NumOfEducated))
dt <- data_gr %>%
  dplyr::group_by(Gender) %>%
  dplyr::summarise(
    mean = mean(NumOfEducated),
    lci = t.test(NumOfEducated, conf.level = 0.95)$conf.int[1],
    uci = t.test(NumOfEducated, conf.level = 0.95)$conf.int[2])
dt
t.test(data_gr$NumOfEducated, conf.level = 0.95)
##
## One Sample t-test
##
## data: data_gr$NumOfEducated
## t = 5.7639, df = 111, p-value = 0.00000007485
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 196413.3 402215.7
## sample estimates:
## mean of x
## 299314.5
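The interval in the output above can be reproduced from the reported mean and t statistic alone. A minimal sketch in base R, using only the figures printed by `t.test()` (the variable names are ours):

```r
# Figures copied from the t.test() output above
x_bar  <- 299314.5   # sample mean
t_stat <- 5.7639     # t statistic for the null hypothesis mean = 0
df     <- 111        # degrees of freedom (n - 1)

# With a null value of 0, t = x_bar / SE, so the standard error is x_bar / t
se <- x_bar / t_stat

# Two-sided 95% interval: mean +/- critical t value * standard error
margin <- qt(0.975, df) * se
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(round(ci, 1))
```

Because the printed t statistic is rounded, the result agrees with the reported interval only up to rounding.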
pl1 <- ggplot(data = dt)
pl1 <- pl1 + geom_bar(aes(x=Gender, y=mean, fill = Gender), stat="identity")
pl1 <- pl1 + geom_errorbar(aes(x=Gender, ymin=lci, ymax= uci), width = 0.4, color ="red",size =1)
pl1 <- pl1 + geom_text(aes(x=Gender, y=lci, label = round(lci,1)), size= 2, vjust = 1)
pl1 <- pl1 + geom_text(aes(x=Gender, y=uci, label = round(uci,1)), size= 2, vjust = -1)
pl1 <- pl1 + theme_classic()
pl1 <- pl1 + labs(title = "Bar chart with 95% confidence intervals for Total Count - None")
pl1 <- pl1 + labs(x= "Gender", y = "Mean of Number of Educated")
pl1
The 95% confidence interval for the mean number of people with no form of education is 24703.71 to 52788.34. We can therefore conclude with 95% confidence that the mean number of people with no form of education is between 24703.71 and 52788.34. If repeated samples were taken and a 95% confidence interval computed for each of them, then 95% of the intervals would contain the population mean number of people with no level of education. We can make the analysis clearer by breaking it down and showing the confidence interval for each gender.
dt2 <- data_t %>% select(c(Gender, None))
dt3 <- dt2 %>%
  dplyr::group_by(Gender) %>%
  dplyr::summarise(
    mean = mean(None),
    lci = t.test(None, conf.level = 0.95)$conf.int[1],
    uci = t.test(None, conf.level = 0.95)$conf.int[2])
dt3
t.test(dt2$None, conf.level = 0.95)
##
## One Sample t-test
##
## data: dt2$None
## t = 5.4676, df = 111, p-value = 0.0000002841
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 24703.71 52788.34
## sample estimates:
## mean of x
## 38746.03
pl2 <- ggplot(data = dt3)
pl2 <- pl2 + geom_bar(aes(x=Gender, y=mean, fill = Gender), stat="identity")
pl2 <- pl2 + geom_errorbar(aes(x=Gender, ymin=lci, ymax= uci), width = 0.4, color ="red",size =1)
pl2 <- pl2 + geom_text(aes(x=Gender, y=lci, label = round(lci,1)), size= 2, vjust = 1)
pl2 <- pl2 + geom_text(aes(x=Gender, y=uci, label = round(uci,1)), size= 2, vjust = -1)
pl2 <- pl2 + theme_classic()
pl2 <- pl2 + labs(title = "Bar chart with 95% confidence intervals for None/ No Formal Education")
pl2 <- pl2 + labs(x= "Gender", y = "Mean of Number of Population with No Formal Education")
pl2
| Original Data Name | Variable Name | Variable Label | Missing Data | Range | Data Type | Value | Label |
|---|---|---|---|---|---|---|---|
| Geographic_Name | GName | Name of Province / Country | None | - | Char | | |
| NRR | NRR | Non-Response Rate | None | 4.3 - 8.8 | Num | | |
| Age | Age | Age of Individual | None | - | Num | 1 | 25-34 |
| | | | | | | 2 | 35-44 |
| | | | | | | 3 | 45-54 |
| | | | | | | 4 | 55-64 |
| Gender | Gender | Gender of Individual | None | - | Num | 0 | Female |
| | | | | | | 1 | Male |
| Total_Count | TCount | Total Number of Educated and Uneducated | None | 1085-2549315 | Num | | |
| None | None | Number of Uneducated | None | 150-411550 | Num | | |
| HS | HS | Total Number of People at High School Level | None | 90-771475 | Num | | |
| App | App | Total Number of People at Apprenticeship & Trades Level | None | 55-387665 | Num | | |
| College | College | Total Number of People at College Level | None | 235-692990 | Num | | |
| Cert | Cert | Total Number of People at Certification Level | None | 15-95360 | Num | | |
| University | University | Total Number of People at University Level | None | 190-941135 | Num | | |
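The value labels in the codebook can be attached to the coded columns with `factor()`. A minimal sketch in base R, using a small made-up data frame (`toy` is hypothetical and stands in for the real dataset):

```r
# Hypothetical rows that use the codes documented in the codebook
toy <- data.frame(Age = c(1, 2, 3, 4), Gender = c(0, 1, 0, 1))

# Map each numeric code to its label
toy$Age <- factor(toy$Age, levels = 1:4,
                  labels = c("25-34", "35-44", "45-54", "55-64"))
toy$Gender <- factor(toy$Gender, levels = c(0, 1),
                     labels = c("Female", "Male"))
print(toy)
```

Keeping the levels and labels in one place like this avoids repeating the same recoding for each derived table.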