Group Members: Nithin Reddy Padicherla, Chegu Hitesh Sai Sushanth, Tirumala Naga sai Gottumukkala, Sai Pavani Gutha, Susenthar Raj Jegadeesh Chandra Bose
The list of libraries that we utilized to make tables, format material, and style graphs for our exploratory analysis is provided below.
# List of all the libraries used in this project
library(stargazer)
library(dplyr)
library(gmodels)
library(epiDisplay)
library(ggplot2)
library(kableExtra)
library(xtable)
library(janitor)
library(naniar)
library(summarytools)
library(vcd)
library(prettydoc)
The dataset for this analysis is the Canadian Internet Use Survey (CIUS), which provides statistics on the adoption, use, and location of internet access for people older than 15 living in Canada’s ten provinces. The data set has 23 dimensions, which include variables such as province, area, age, gender, education level, internet use, and where the internet is accessed.
The aforementioned data can be examined using various exploratory and predictive analysis techniques, and the findings can be applied to conduct evidence-based policy-making, resource management, and development planning in all of the provinces, as well as provide internationally comparable statistics on the use and access trends of the internet in Canada.
Some of the benefits include
Additional information about the CIUS data set can be found here Link
According to our preliminary analysis, the implementation of public programs depends on evidence-based policy formulation. The Internet usage data in this case contains a number of characteristics that can be used to interpret and evaluate different aspects of how the general public uses the internet.
Analyzing where the survey respondents live, by region and province, can reveal information about usage patterns based on location.
Analyzing demographic variables like age and gender can help us understand the factors that influence respondents' behaviour, which can eventually be combined with Internet usage patterns in different regions to gain useful insights.
Also, analyzing educational and employment status can reveal information on specific purposes of internet use, such as educational research and workplace usage.
Finally, analyzing how many years the respondents have used the internet and where they have used it from (work, home, school, a public library, or a friend's place) can reveal usage patterns at each of these places and their purposes. Combining these with the regional, educational, and demographic variables above can provide detailed and specific insights into the usage dynamics of the respondents.
The Canadian Internet Use Survey (CIUS) dataset used in this analysis is categorical and contains mostly nominal and ordinal variables. For the exploratory analysis we therefore use contingency tables, relative frequency tables, bar charts, mosaic plots, density plots, and similar tools. We are also considering logistic regression and chi-square tests for predictive analysis, to understand the relationships between the variables, draw effective conclusions, and provide recommendations.
Before conducting the analysis, the dataset required pre-processing, so we applied a few pre-processing techniques to clean the dataset and improve the efficiency of our analysis.
The variable names in the dataset were unclear and were encoded in accordance with the needs of the survey to be compatible with their processing systems. It is not practical to interpret the data using the encoded variable headers. Therefore, we changed the header names to a more comprehensible and meaningful format.
# Read the survey extract
locationofUse <- read.csv("~/University of Windsor/locationofUse.csv")
# Rename the encoded column headers (new name = original name)
locationofUse <-
locationofUse %>% rename(
"Customer ID" = "PUMFID",
"Province" = "PROVINCE",
"Region" = "REGION",
"Community" = "G_URBRUR",
"Age" = "GCAGEGR6",
"Gender" = "CSEX",
"Education" = "G_CEDUC",
"Student_Status" = "G_CSTUD",
"Employment" = "G_CLFSST",
"Houshold_Type" = "GFAMTYPE",
"House_Size" = "G_HHSIZE",
"Household_Education" = "G_HEDUC",
"Student_Household" = "G_HSTUD",
"Internet_User" = "EV_Q01",
"Internet_Usage_Years" = "EV_Q02",
"Internet_Usage_Home" = "LU_Q01",
"Internet_Usage_Work" = "LU_Q02",
"Internet_Usage_School" = "LU_G03",
"Internet_Usage_Library" = "LU_Q04",
"Internet_Usage_Others" = "LU_Q05",
"Internet_Usage_Relatives" = "LU_Q06A",
"Internet_Usage_Neighbours" = "LU_Q06B",
"Internet_Others" = "LU_G06",
)
We checked if the dataset already has any missing values that might hinder our analysis and we found no missing values.
# check if there are any missing values
sapply(locationofUse, function(x) sum(is.na(x)))
This dataset has 23 dimensions, and each variable has its own set of levels. For easier interpretation and analysis, we reassigned a few levels in columns 16 to 23: 2 [NO] is recoded as 0, and 6, 7, 8, and 9 are recoded as NA and treated together as an "other" category. The existing 1 [YES] is kept as 1 with no change.
# This reassigns values 6,7,8,9 to 'NA' and 2 to '0' for the columns in the dataset.
Recode_columns <- function(startcol, endCol) {
for (i in startcol:endCol) {
locationofUse[, i] <<-
ifelse(locationofUse[, i] == 2, 0, locationofUse[, i])
locationofUse[, i] <<-
ifelse(locationofUse[, i] == 6, NA, locationofUse[, i])
locationofUse[, i] <<-
ifelse(locationofUse[, i] == 7, NA, locationofUse[, i])
locationofUse[, i] <<-
ifelse(locationofUse[, i] == 8, NA, locationofUse[, i])
locationofUse[, i] <<-
ifelse(locationofUse[, i] == 9, NA, locationofUse[, i])
}
}
# Function call - This calls the function 'Recode_columns' and passes the startcol and endCol values.
Recode_columns(16, 23)
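The same recoding can also be written without modifying a global object from inside a function. The following is a minimal sketch, assuming dplyr 1.0 or later (for `across()`); the helper name `recode_yes_no` and the toy data frame `toy` are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Hypothetical helper: recode 2 -> 0 and 6, 7, 8, 9 -> NA in the chosen
# columns, returning a new data frame instead of assigning with `<<-`.
recode_yes_no <- function(data, cols) {
  data %>%
    mutate(across(all_of(cols), function(x) {
      x <- ifelse(x == 2, 0, x)            # 2 [NO] becomes 0
      ifelse(x %in% c(6, 7, 8, 9), NA, x)  # codes 6 to 9 become NA
    }))
}

# Toy data standing in for two of the location-of-use columns (made-up values)
toy <- data.frame(
  Internet_Usage_Home = c(1, 2, 6, 9),
  Internet_Usage_Work = c(2, 1, 7, 8)
)

toy_recoded <- recode_yes_no(toy, names(toy))
print(toy_recoded)
```

On the survey data, the equivalent call would be along the lines of `locationofUse <- recode_yes_no(locationofUse, names(locationofUse)[16:23])`. Returning a new data frame keeps the function reusable on any subset and makes the change visible at the call site.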
After processing, R interpreted the variables in this dataset as numeric. This can cause problems when working with categorical variables, because numeric variables may be treated as continuous while the categorical ones here are discrete, which can lead to logic errors when the code runs. So, we use the built-in functions as.character and as.factor to convert the numeric data type to a character or a factor where appropriate.
#Change the datatype to character or factor for the mentioned columns
locationofUse <- locationofUse %>% mutate_at(c('column name(s)'), as.character)
#or
locationofUse <- locationofUse %>% mutate_at(c('column name(s)'), as.factor)
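As a concrete illustration of the conversion template above, here is a small sketch on made-up data. It assumes dplyr 1.0 or later and uses `across()` in place of `mutate_at()`; the data frame `toy` and its values are hypothetical, while the column names follow the renaming step.

```r
library(dplyr)

# Toy data with column names from the renaming step; values are made up
toy <- data.frame(
  Province = c(10, 35, 35),
  Gender   = c(1, 2, 2),
  Age      = c(1, 6, 4)
)

# Convert the coded columns to factors so R treats them as categories;
# Age is left numeric here only to show that untouched columns are unchanged
toy_factor <- toy %>%
  mutate(across(all_of(c("Province", "Gender")), as.factor))

str(toy_factor)
```

Listing the columns by name rather than by position keeps the conversion correct even if the column order changes.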
As part of the exploratory analysis, we wanted to understand each individual dimension and its underlying patterns, and to develop an effective analysis that draws the maximum insight from the available data.
The purpose of this frequency table is to show how frequently each province appears among the respondents. We accomplish this by counting each province in the table. Based on this, we would like to understand which provinces appear the most. This data can also be used to understand the dynamics of the provinces, such as the total number of observations, the least and most frequent provinces in the dataset, and each province’s contribution to the respondent count.
# Bind the frequency, cumulative and relative frequency of the provinces
cbind(
Frequency = table(locationofUse$Province),
Cummulative_Frequency = cumsum(table(locationofUse$Province)),
Relative_Frequency = prop.table(table(locationofUse$Province))
) %>%
kable(caption = " Table:1 A Frequency Table on Provinces") %>%
kable_classic(font_size = "13", full_width = F)
| Province | Frequency | Cummulative_Frequency | Relative_Frequency |
|---|---|---|---|
| 10 | 882 | 882 | 0.0380533 |
| 11 | 592 | 1474 | 0.0255415 |
| 12 | 1240 | 2714 | 0.0534990 |
| 13 | 1084 | 3798 | 0.0467685 |
| 24 | 4437 | 8235 | 0.1914315 |
| 35 | 6518 | 14753 | 0.2812149 |
| 46 | 2023 | 16776 | 0.0872810 |
| 47 | 1627 | 18403 | 0.0701959 |
| 48 | 2242 | 20645 | 0.0967297 |
| 59 | 2533 | 23178 | 0.1092847 |
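The table above is indexed by numeric province codes. A possible way to make it easier to read is to attach province names before tabulating. The sketch below assumes the code-to-name mapping used in the axis labels of the later plots in this report; the vector `codes` is made-up data standing in for `locationofUse$Province`.

```r
# Province codes and names, taken from the plot labels used in this report
province_labels <- c(
  "10" = "Newfoundland and Labrador",
  "11" = "Prince Edward Island",
  "12" = "Nova Scotia",
  "13" = "New Brunswick",
  "24" = "Quebec",
  "35" = "Ontario",
  "46" = "Manitoba",
  "47" = "Saskatchewan",
  "48" = "Alberta",
  "59" = "British Columbia"
)

# Toy vector of province codes (hypothetical values)
codes <- c(35, 24, 35, 11, 59, 35)

# Map codes to names; levels with no observations are kept with a count of 0
province_named <- factor(
  codes,
  levels = names(province_labels),
  labels = unname(province_labels)
)

freq <- table(province_named)
print(cbind(
  Frequency = freq,
  Relative_Frequency = round(prop.table(freq), 2)
))
```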
We found that Ontario [35] was the most frequent province among the survey respondents, occurring 6518 times with a relative frequency of 0.28.
We found that Prince Edward Island [11] was the least frequent province among the survey respondents, occurring 592 times with a relative frequency of 0.03.
We also found that Ontario [35] is followed by Quebec [24], British Columbia [59], and Alberta [48], with occurrences of 4437, 2533, and 2242 and relative frequencies of 0.19, 0.11, and 0.10 respectively.
One possible reason for Ontario [35] having the highest occurrence is that a large share of the survey respondents came from the province, which may in turn reflect its population [the most populated province in Canada]. The same reasoning might apply to Prince Edward Island [11] appearing the least number of times, and so on for the other provinces.
The objective of the density plot is to help us understand the age distribution of the survey respondents by estimating the probability density function of their age. This can later be combined with region/province and gender to analyse how the different age groups in the data set relate to those variables.
# Filled Density Plot
dplot_variable <- density(locationofUse$Age)
plot(dplot_variable,xlab=" Fig:1 Age Ranges of Respondents", main="Age Distribution of Respondents ")
polygon(dplot_variable, col="#fb8072", border="black")
We found that respondents aged 65 or older [6] have the highest density, around 0.49, which shows that people in this age range responded to the survey the most and account for the largest part of the data.
We found that respondents aged 16 to 24 [1] have the lowest density in the dataset, around 0.20, which shows that people in this age range responded to the survey the least and contribute the fewest responses.
Finally, we also found that respondents aged 45 to 54 [4] and 55 to 64 [5] have about the same density of 0.40, which shows that their response counts are almost the same.
One reason for the higher density of respondents above the age of 45 may be that the average age of respondents who take internet surveys is around 53.51 years, according to (Price, 2012). So, respondents who regularly take surveys are more likely to be older than 50.
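Because Age is stored as six ordinal group codes rather than as a continuous value, a bar chart of group shares is an alternative way to look at the same distribution as the density plot. The sketch below is only an illustration under that assumption: `age_codes` is made-up data standing in for `locationofUse$Age`, and the group labels are the ones used in the age plots later in this report.

```r
# Toy age-group codes (1 = 16-24, ..., 6 = >65); values are made up
age_codes <- c(1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6)
age_labels <- c("16-24", "25-34", "35-44", "45-54", "55-64", ">65")

# Treat the codes as ordered categories and compute each group's share
age_group <- factor(age_codes, levels = 1:6, labels = age_labels)
age_share <- prop.table(table(age_group))
print(round(age_share, 3))

# Bar chart of the share of respondents in each age group
barplot(
  age_share,
  xlab = "Age group",
  ylab = "Share of respondents",
  col = "#fb8072"
)
```

Unlike a kernel density estimate, the bar heights here are exact proportions, so they can be read directly as the share of respondents in each group.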
The objective of the pivot table below is to summarize and organize education levels by province. This helps us understand how education levels are distributed among the provinces. Based on the findings, we can see what level of education respondents hold in each province, which provinces' respondents have the highest and lowest education levels, and what their percentages are.
#Create a pivot table with province and education as variables
locationofUse %>%
tabyl(Province, Education) %>%
adorn_totals(c("row", "col")) %>%
adorn_percentages("row") %>%
adorn_pct_formatting() %>%
adorn_ns() %>%
adorn_title("combined") %>%
kable(caption = "Table:2 A Pivot Table on Provinces and Education") %>%
kable_classic(font_size = "13")
| Province/Education | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| 10 | 43.3% (382) | 44.2% (390) | 12.5% (110) | 100.0% (882) |
| 11 | 38.3% (227) | 45.8% (271) | 15.9% (94) | 100.0% (592) |
| 12 | 39.2% (486) | 42.9% (532) | 17.9% (222) | 100.0% (1240) |
| 13 | 41.9% (454) | 42.6% (462) | 15.5% (168) | 100.0% (1084) |
| 24 | 39.4% (1748) | 42.8% (1900) | 17.8% (789) | 100.0% (4437) |
| 35 | 38.2% (2489) | 40.8% (2657) | 21.0% (1372) | 100.0% (6518) |
| 46 | 43.4% (878) | 39.2% (794) | 17.4% (351) | 100.0% (2023) |
| 47 | 43.0% (699) | 39.5% (642) | 17.6% (286) | 100.0% (1627) |
| 48 | 38.9% (872) | 43.2% (969) | 17.9% (401) | 100.0% (2242) |
| 59 | 33.4% (847) | 44.8% (1136) | 21.7% (550) | 100.0% (2533) |
| Total | 39.2% (9082) | 42.1% (9753) | 18.7% (4343) | 100.0% (23178) |
We found that across all 10 provinces, 39.2% [9082] of respondents have high school level or less education [1]. British Columbia [59] has the lowest share of respondents with level [1] education, 33.4% [847], and Manitoba [46] has the highest share, 43.4% [878].
We found that across all 10 provinces, 42.1% [9753] of respondents have College or some post-secondary level education [2]. Prince Edward Island [11] has the highest share of respondents with level [2] education, 45.8% [271], and Manitoba [46] has the lowest share, 39.2% [794].
Finally, we found that across all 10 provinces, 18.7% [4343] of respondents have a University Certificate or degree [3]. Newfoundland and Labrador [10] has the lowest share of respondents with level [3] education, 12.5% [110], and British Columbia [59] has the highest share, 21.7% [550].
16 and in some provinces like Nova Scotia, Manitoba, and New Brunswick until the age of 18 (Nair, 2022). So, that may be the reason why a significant portion have a college degree.
Our objective is to understand which region has the highest number of users who have ever used the Internet (E-mail or World Wide Web) from home, work, school, or any other location for personal non-business use. Based on this, we can identify the region in which Internet users are most concentrated.
#Create a subset for the columns
Internet_Userset <- locationofUse[c(3, 14)]
# Change the datatype of the variables for processing
Internet_Userset$Internet_User <-
as.character(Internet_Userset$Internet_User)
Internet_Userset$Region <-
as.character(Internet_Userset$Region)
# Create a plot with Internet users and region variables
Internet_Userset %>%
filter(Internet_User == "1") %>% # filter on values
ggplot(aes(Region, ..count..)) + geom_bar(aes(fill = Internet_User),
position = "dodge2" ,
show.legend = FALSE, colour="Black") + ggtitle("Fig:2 Internet Users Across the Regions") +
theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) + labs(x =
"Region", y = "Count") +
scale_x_discrete(
labels = c(
"1" = "Atlantic Regions",
"2" = "Quebec",
"3" = "Ontario",
"4" = "Manitoba/Saskatchewan",
"5" = "Alberta",
"6" = "British Columbia"
)
) +
geom_bar(fill = "#00BFC4") # hex colours need a leading '#'
We found that the Ontario [3] and Quebec [2] regions have the highest counts of users who have used the internet [e-mail or World Wide Web] from home, work, school, or other locations for personal non-business use, with counts above 5000 and above 3000 respectively.
We also found that the Atlantic [1] and Manitoba/Saskatchewan [4] regions had similar counts of around 2700 users using the internet for personal non-business purposes.
Finally, we also found that the Alberta [5] region had the lowest count of internet users, around 1800, who used the internet for personal non-business purposes.
One possible reason the Ontario [3] and Quebec [2] regions have the highest counts is that a large share of the survey respondents came from these provinces, which may in turn reflect their populations [the 1st and 2nd most populated provinces in Canada].
Our objective is to understand, based on the survey, whether the respondents have used the internet from home for personal non-business purposes. These results can help us understand whether respondents are using the internet at home for recreational or personal use. Based on this, we can identify which province has the largest number of internet users who use the internet from home for personal use. This can be combined with other variables to understand the rise of home internet usage in recent years.
#Create a subset for the columns
Internet_Province <- locationofUse[c(2, 16)]
# Change the datatype of the variables for processing
Internet_Province$Internet_Usage_Home <-
as.character(Internet_Province$Internet_Usage_Home)
Internet_Province$Province <-
as.character(Internet_Province$Province)
# Create a plot with Internet usage at home and province variables
Internet_Province %>%
filter(!is.na(Internet_Usage_Home) & Internet_Usage_Home != "0") %>% # keep non-missing "Yes" responses
ggplot(aes(Province, ..count..)) +geom_bar(aes(fill = Internet_Usage_Home),
position = "dodge2" ,
show.legend = FALSE, colour="Black") + ggtitle(" Fig:3 Internet Usage At Home by Province") +
theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) + labs(x =
"Province", y = "Internet Users") +
scale_x_discrete(
labels = c(
"10" = "NL",
"11" = "PE",
"12" = "NS",
"13" = "NB",
"24" = "QC",
"35" = "ON",
"46" = "MB",
"47" = "SK",
"48" = "AB",
"59" = "BC"
)
) +
geom_bar(fill = "#00BFC4") # hex colours need a leading '#'
We found that Ontario [ON], followed by Quebec [QC], has the largest number of Internet users for personal non-business use from home, with close to 5000 and 3000 users respectively.
We found that British Columbia [BC] was the next province, with a count of around 2000 users who used the internet at home.
We found that Newfoundland [NL], Prince Edward Island [PE], Nova Scotia [NS], and New Brunswick [NB] were the only provinces with fewer than 1000 users using the internet at home for personal non-business purposes.
A similar reason, such as population, might apply to Ontario and Quebec having the largest number of Internet users for personal use. In addition, as regions advance and modernize, so do their means of communication. These can include social media, shopping, online entertainment, information seeking, and so on, which can eventually mean more internet usage for personal non-business purposes in those regions.
Our objective is to understand, for respondents who have used the internet, how many years they have used it and in which province. Based on this, we can identify how many users (respondents) fall into each band of usage years: less than 1 year, 1 to 2 years, 2 to 5 years, or greater than 5 years. This can be analysed further by usage location [home, work, school], etc.
#Create a subset for the columns
Internet_yearsset <- locationofUse[c(2, 15)]
# Change the datatype of the variables for processing
Internet_yearsset$Province <-
as.character(Internet_yearsset$Province)
Internet_yearsset$Internet_Usage_Years <-
as.character(Internet_yearsset$Internet_Usage_Years)
# Change the values 6,7,8 in this subset to NA
Internet_yearsset[Internet_yearsset == "6" |
Internet_yearsset == "7" | Internet_yearsset == "8"] <- NA
# Create a plot with internet usage years and provinces
Internet_yearsset %>%
filter(!is.na(Internet_Usage_Years)) %>% # filter values
ggplot(aes(Province, ..count..)) + geom_bar(aes(fill = Internet_Usage_Years), position = "stack", colour="Black") +
theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) + labs(title = " Fig:4 Internet Usage Years Across the Provinces", x = "Provinces", y =
"Count") +
scale_x_discrete(
labels = c(
"10" = "Newfoundland and Labrador",
"11" = "Prince Edward Island",
"12" = "Nova Scotia",
"13" = "New Brunswick ",
"24" = "Quebec",
"35" = "Ontario",
"46" = "Manitoba",
"47" = "Saskatchewan",
"48" = "Alberta",
"59" = "British Columbia"
)
) +
scale_fill_discrete(name = "Internet Usage Years", labels = c("<1", "1-2", "2-5", ">5")) +
coord_flip()
We found that a significant portion of the users in all 10 provinces have been using the internet for more than 5 years, and among them the respondents from Ontario form the largest group, with a count of 4000 respondents using the Internet for more than five years.
We found that very few respondents have been using the internet for less than a year in any of the provinces.
We also found that Ontario and Quebec have the largest number of users who have been using the internet for 2 years or more.
Finally, we found that Prince Edward Island has the lowest count, <1000, of users who have been using the internet for more than 5 years.
Since Canada is already a developed country, it is likely that all the provinces have had access to technology for a long time. This is evident in the results, where the longest usage band is the largest in most provinces. That being said, Prince Edward Island having the fewest users who have used the internet for more than 5 years might be because of the slower population settlement and the lower population density in the province.
Our objective is to find how many employed respondents were using the internet for personal non-business purposes from their workplace. This will help us identify whether users are using the internet for personal purposes at work, which can help us further interpret reasons and usage patterns.
#Create a subset for the columns
Internet_Workset <- locationofUse[c(9,17)]
# Change the datatype of the variables for processing
Internet_Workset$Internet_Usage_Work <-
as.character(Internet_Workset$Internet_Usage_Work)
Internet_Workset$Employment <-
as.character(Internet_Workset$Employment)
#Create a plot with internet usage frequency who are employed
Internet_Workset %>%
filter(!is.na(Internet_Usage_Work) & Employment == "1") %>% # filter on non-missing values
ggplot(aes(Internet_Usage_Work,
..count..)) + geom_bar(aes(fill = Employment), position = "dodge2", show.legend = FALSE, colour="Black")+ theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) + labs(title=" Fig:5 Internet Usage At Work",
x="Internet Usage at Work", y= "Employee count") +
scale_x_discrete(labels=c("0" = "No", "1" = "Yes"))
6000 and who are not using are around 5500 for all provinces.
The results show that a large portion of employed people use the internet at work for personal purposes. This might be due to workplaces not having stringent internet usage policies, to people wanting to finish personal tasks during office hours while workloads are low, or to people relieving boredom during office hours.
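The counts discussed for the work-usage plot are read off the bar chart. They can also be tabulated directly so that exact numbers are available. The sketch below assumes dplyr and uses a made-up data frame `toy` in place of the Employment and Internet_Usage_Work columns; the output column name `Employee_count` is a hypothetical label.

```r
library(dplyr)

# Toy stand-in for the Employment and Internet_Usage_Work columns (made-up values)
toy <- data.frame(
  Employment          = c(1, 1, 1, 2, 1, 2),
  Internet_Usage_Work = c(1, 0, 1, NA, NA, 0)
)

# Same filter as the plot: employed respondents with a non-missing answer,
# then one row per answer with its count
work_counts <- toy %>%
  filter(!is.na(Internet_Usage_Work), Employment == 1) %>%
  count(Internet_Usage_Work, name = "Employee_count")

print(work_counts)
```

The same pattern applies to the other bar charts in this report by swapping in the relevant grouping and usage columns.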
As part of this analysis, our objective is to understand how different age groups of respondents use the internet. This will help us understand which age group has used the internet or World Wide Web services more than the others. This can be analysed further by looking at how each age category uses the internet at home, work, school, and other places.
#Create a subset for the columns
Internet_ageset <- locationofUse[c(5, 14)]
# Change the datatype of the variables for processing
Internet_ageset$Internet_User <-
as.character(Internet_ageset$Internet_User)
Internet_ageset$Age <-
as.character(Internet_ageset$Age)
#Create a plot with internet user and age
Internet_ageset %>%
filter(!is.na(Internet_User) &
Internet_User == "1") %>% # filter values
ggplot(aes(Age, ..count..)) + geom_bar(aes(fill = Internet_User), show.legend = FALSE, colour="Black") + theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) +
labs(title = " Fig:6 Internet Users Among Different Age Groups", x = "Age Groups", y =
"Count") +
scale_x_discrete(labels = c(
"1" = "16-24",
"2" = "25-34",
"3" = "35-44",
"4" = "45-54",
"5" = "55-64",
"6" = ">65"
))
We found that the age group 45-54 used the internet the most, with a count of almost 4000 users.
We also found that the age group greater than 65 used the internet the least, with a count of around 2000 users.
Finally, we also found that the age groups 16-24 and greater than 65 used the internet about equally, with a count of around 2000 users each.
The respondents in the age group 45-54 were the ones that used the internet the most, mainly because they are the working class and they were the generation that started the internet revolution, so it is likely that internet usage grew alongside that generation.
Our objective is to analyse the joint frequency distribution of the education and gender variables. This will help us understand how respondents of each gender are distributed across the education levels.
# Create a contingency table
CrossTable(locationofUse$Gender, locationofUse$Education)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 23178
##
##
## | locationofUse$Education
## locationofUse$Gender | 1 | 2 | 3 | Row Total |
## ---------------------|-----------|-----------|-----------|-----------|
## 1 | 4012 | 4357 | 1992 | 10361 |
## | 0.563 | 0.002 | 1.319 | |
## | 0.387 | 0.421 | 0.192 | 0.447 |
## | 0.442 | 0.447 | 0.459 | |
## | 0.173 | 0.188 | 0.086 | |
## ---------------------|-----------|-----------|-----------|-----------|
## 2 | 5070 | 5396 | 2351 | 12817 |
## | 0.455 | 0.001 | 1.066 | |
## | 0.396 | 0.421 | 0.183 | 0.553 |
## | 0.558 | 0.553 | 0.541 | |
## | 0.219 | 0.233 | 0.101 | |
## ---------------------|-----------|-----------|-----------|-----------|
## Column Total | 9082 | 9753 | 4343 | 23178 |
## | 0.392 | 0.421 | 0.187 | |
## ---------------------|-----------|-----------|-----------|-----------|
##
##
From the above contingency table we found that males [1] with high school or less education [1] across all the provinces number 4012 and females [2] with the same level of education number 5070, totaling 9082 males and females with high school or less education.
We found that males [1] with college or some post secondary level education [2] across all provinces number 4357 and females [2] with the same level of education number 5396, totaling 9753 males and females with post secondary level education.
Finally, we found that males [1] with a university degree or certificate [3] across all provinces number 1992 and females [2] with the same level of education number 2351, totaling 4343 males and females with university level education.
The lowest count was males with university level education and the highest was females with college or some post-secondary level education.
Educational indicators show that females [2] tend to get more education than men at all levels of education, which is consistent with the higher female counts at every level in the table (Zechuan Deng, 2021). Some studies show that men stop continuing education for family, financial, and other personal reasons, which might be the case here. Research also shows that women tend to choose and stay in education even when the stream is harder to pass, while men tend to drop out and look for other means of making a living (Guo, 2016).
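Since chi-square tests are among the techniques considered for this analysis, the counts in the contingency table above can be passed to base R's `chisq.test` as a check on whether gender and education level are associated. This is a sketch rather than part of the original analysis: the matrix simply re-enters the six cell counts printed by `CrossTable`, whose chi-square contributions sum to roughly 3.41.

```r
# Cell counts copied from the contingency table above
# (rows: gender 1 and 2, columns: education levels 1 to 3)
gender_education <- matrix(
  c(4012, 4357, 1992,
    5070, 5396, 2351),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("1", "2"), Education = c("1", "2", "3"))
)

# Pearson chi-square test of independence between gender and education
test_result <- chisq.test(gender_education)
print(test_result)

# Row proportions: share of each gender at each education level
print(round(prop.table(gender_education, margin = 1), 3))
```

With the full data loaded, the equivalent call would be `chisq.test(table(locationofUse$Gender, locationofUse$Education))`. Comparing the row proportions alongside the test statistic helps separate differences in education from the simple fact that there are more female than male respondents.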
We want to visually analyse the proportions of the different age categories and their respective genders, to understand the ratio of male to female respondents in each age category.
counts_subset <- table(locationofUse$Age, locationofUse$Gender)
#create mosaic plot on age vs gender
mosaicplot(counts_subset, xlab='Age', ylab='Gender',
main='Fig:7 Age vs Gender', col='#00CCCC', border = "black")
We found that male [1] and female [2] respondents are in equal proportions in the age category 16-24 [1], which has relatively the fewest respondents of all the age groups in the data set [based on visual interpretation].
We found that female [2] respondents make up a slightly larger proportion than males in the age category 25-34 [2] [based on visual interpretation].
We found a similar pattern of almost equal proportions of male [1] and female [2] respondents in the age category 45-54 [4], as in age category [1] [based on visual interpretation].
We also found a slightly higher proportion of female [2] respondents in the age category 55-64 [5], similar to age category 25-34 [2] [based on visual interpretation].
Finally, we also found that the age category 65 and older [6] has a higher count of females [2] than males [1], and this category has the highest number of respondents of all the age groups in the data set [based on visual interpretation].
The findings show that the genders are almost equally distributed in the survey, with only ages 25-34 [2] and 65 and older [6] having slightly more female respondents. The number of respondents is highest in the 65 and older [6] category, mainly because that generation tends to show interest in answering surveys and providing feedback, and tends to sign up for them more actively than younger generations.
We want to analyse how many respondents have used the internet at school for personal non-business purposes. This will help us understand how each gender uses the internet facilities at school for personal purposes. This can be further analysed by region and province, and with additional information the analysis can be extended to internet usage patterns and types at school.
#Create a subset for the columns
Internet_schoolset <- locationofUse[c(6,18)]
# Change the datatype of the variables for processing
Internet_schoolset$Gender <-
as.character(Internet_schoolset$Gender)
Internet_schoolset$Internet_Usage_School <-
as.character(Internet_schoolset$Internet_Usage_School)
#Create a plot with Internet usage, gender
Internet_schoolset %>%
filter(!is.na(Internet_Usage_School)) %>% # filter values
ggplot(aes(Gender, ..count..)) + geom_bar(aes(fill = Internet_Usage_School), colour="Black") + theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) +
labs(title = " Fig:8 Internet Usage at School for Personal Use", x = "Gender", y =
"Count") +
scale_x_discrete(labels = c(
"1" = "Male",
"2" = "Female")) +
scale_fill_discrete(name = "Internet Usage School", labels = c("No", "Yes"))
We found that overall the number of people [both males and females] using the internet for personal non-business purposes at school is very low.
We found that around 500 male students and around 700 female students use the internet for personal purposes.
Finally, we also found that more people have not used the internet for personal purposes at school than have. There are around 7500 male students who have not used school internet for personal purposes and around 8500 female students who have not used the internet for any personal non-business purposes at school.
The number of people, both male and female, who use the internet at school for personal purposes is very low, possibly because schools have stricter policies for internet access or because students do not need to use the internet at school when they have access to technology at home. The students are also likely disciplined enough to use school resources for the purposes they are intended.
We want to analyse the internet usage pattern of respondents at library for personal non-business usage by different age groups. This will help us understand how different age groups have used the library for internet.
#Create a subset for the columns
Internet_libraryset <- locationofUse[c(5, 19)]
# Change the datatype of the variables for processing
Internet_libraryset$Age <-
as.character(Internet_libraryset$Age)
Internet_libraryset$Internet_Usage_Library <-
as.character(Internet_libraryset$Internet_Usage_Library)
#Create a plot with internet usage library, and age group
Internet_libraryset %>%
filter(!is.na(Internet_Usage_Library)) %>% # filter values
ggplot(aes(Age, ..count..)) + geom_bar(aes(fill = Internet_Usage_Library), colour =
"Black") + theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) +
labs(title = " Fig:9 Internet Usage at Library", x = "Age Groups", y =
"Count") +
scale_x_discrete(labels = c(
"1" = "16-24",
"2" = "25-34",
"3" = "35-44",
"4" = "45-54",
"5" = "55-64",
"6" = ">65"
)) +
scale_fill_discrete(name = "Internet Used at Library", labels = c("No", "Yes"))
We found that respondents in the 16-24 age range used the internet
at the library for personal non-business purposes the most of all the
age ranges.
We found that respondents older than 65 used the library internet
the least for personal purposes.
We found that the 45-54 age range has the highest number of
respondents who have not used the library internet for personal
purposes.
We found that the 25-34 and 55-64 age ranges also have among the
highest numbers of respondents who have not used the library internet
for any personal purposes, and their counts are relatively similar.
Finally, we found that the respondent counts across the age ranges are roughly normally distributed with respect to internet usage at the library.
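Because the age groups differ in size, raw counts can be hard to compare across groups. The following is a minimal sketch of how within-group shares could be computed next to the counts. It uses a small made-up data frame, `toy_library`, in place of `Internet_libraryset`; the values and the "0"/"1" coding for No/Yes are assumptions for illustration and are not taken from the CIUS data.

```r
# Toy stand-in for Internet_libraryset; values are invented for illustration.
# Assumed coding: "0" = did not use library internet, "1" = used it.
toy_library <- data.frame(
  Age = c("1", "1", "1", "1", "4", "4", "4", "4", "6", "6"),
  Internet_Usage_Library = c("1", "1", "1", "0", "1", "0", "0", "0", "0", "0")
)

# Raw counts of non-users and users per age group (rows = age, columns = usage)
usage_counts <- table(toy_library$Age, toy_library$Internet_Usage_Library)

# Row-wise proportions: share of each age group that did or did not use library internet
usage_shares <- prop.table(usage_counts, margin = 1)

print(usage_counts)
print(round(usage_shares, 2))
```

Reading the shares row by row makes it easier to tell whether a tall bar reflects heavier library use or simply a larger age group.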
The findings show that respondents aged 16-24 tend to use the
library’s internet for personal non-business purposes the most,
probably because people in that age range attend schools, colleges,
and universities and may therefore use the library for internet access
more often than others. The 45-54 age group uses library internet for
personal purposes less frequently than other age groups, likely
because they are less often students and may not visit a library for
internet access in the first place, or because they use the internet
at home or at work.
We want to analyse how respondents in different age categories have accessed the internet from a friend’s or neighbor’s home. We do this by counting how many respondents have or have not used the internet there and separating them by age category.
# Create a subset with the age and friends'/neighbours'-usage columns
Internet_friendsset <- locationofUse[c(5, 22)]
# Change the datatype of the variables for processing
Internet_friendsset$Age <- as.character(Internet_friendsset$Age)
Internet_friendsset$Internet_Usage_Neighbours <-
  as.character(Internet_friendsset$Internet_Usage_Neighbours)
# Create a plot of internet usage at a friend's or neighbor's home by age group
Internet_friendsset %>%
  filter(!is.na(Internet_Usage_Neighbours)) %>% # drop missing responses
  ggplot(aes(x = Age)) + # geom_bar() counts rows per age group by default
  geom_bar(aes(fill = Internet_Usage_Neighbours), colour = "Black") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) +
  labs(title = " Fig:10 Internet Usage at Friends or Neighbor's home", x = "Age Groups", y = "Count") +
  scale_x_discrete(labels = c(
    "1" = "16-24",
    "2" = "25-34",
    "3" = "35-44",
    "4" = "45-54",
    "5" = "55-64",
    "6" = ">65"
  )) +
  scale_fill_discrete(name = "Usage at Friends'/ Neighbor's home", labels = c("No", "Yes"))
We found that respondents aged 16-24 are the ones who use the
internet at a friend’s or neighbor’s home the most, with a count of
750 respondents.
We found that respondents aged 25-34 are split evenly between
using and not using the internet from a friend’s or neighbor’s home,
with a count of 750 each.
We found that respondents aged 35-44 show a decline in usage
from a friend’s or neighbor’s home, with Yes at around 450 and No at
around 750.
We also found that the next two age groups [45-54 and 55-64]
continue this downward trend in using the internet from a friend’s or
neighbor’s place.
Finally, we found that respondents aged 65 and above are the ones
who used the internet the least from a friend’s or neighbor’s place.
Fewer than 100 people said yes and more than 250 people said no.
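The counts above are read off the bar chart, so they are approximate. A minimal sketch of how exact counts and within-group shares could be tabulated with `dplyr` follows. The data frame `toy_friends` is a made-up stand-in for `Internet_friendsset`, and its values and "0"/"1" coding for No/Yes are assumptions for illustration only.

```r
library(dplyr)

# Toy stand-in for Internet_friendsset; values are invented for illustration.
# Assumed coding: "0" = No, "1" = Yes.
toy_friends <- data.frame(
  Age = c("1", "1", "1", "2", "2", "3", "3", "3", NA, "6"),
  Internet_Usage_Neighbours = c("1", "1", "0", "1", "0", "1", "0", "0", "1", NA)
)

friends_summary <- toy_friends %>%
  filter(!is.na(Internet_Usage_Neighbours), !is.na(Age)) %>% # drop missing responses
  count(Age, Internet_Usage_Neighbours, name = "respondents") %>% # one row per bar segment
  group_by(Age) %>%
  mutate(share = respondents / sum(respondents)) %>% # share within each age group
  ungroup()

print(friends_summary)
```

Each row of `friends_summary` corresponds to one coloured segment of the stacked bar, so the same table could be reported alongside the figure.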
The results show that respondents aged 16-24 are the ones who use
the internet the most at a friend’s or neighbor’s place. This might be
because the younger generation tends to work and relax in groups: they
study together, play online games with friends, and watch movies
together at a friend’s place, all of which account for internet usage
there. Older people, in contrast, tend to stay isolated and alone,
which may explain why they use the internet the least from a friend’s
or neighbor’s place.
We want to analyse the number of people
[1, 2, 3, or 4 or more] in a household for each
province. The respondents have provided information on how many people
live with them as part of their household, which can be used to answer
this question. This can further help us understand the household
dynamics of each province in the future.
# Create a subset with the province and household-size columns
Householdsset <- locationofUse[c(2, 10)]
# Change the datatype of the variables for processing
Householdsset$Province <- as.character(Householdsset$Province)
Householdsset$Houshold_Type <- as.character(Householdsset$Houshold_Type)
# Create a plot of the number of people in a household across provinces
Householdsset %>%
  filter(!is.na(Houshold_Type)) %>% # drop missing responses
  ggplot(aes(x = Province)) + # geom_bar() counts rows per province by default
  geom_bar(aes(fill = Houshold_Type), position = "stack", colour = "Black") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 13)) +
  labs(title = " Fig:11 No.of People in Household Across the Provinces", x = "Provinces", y = "Count") +
  scale_x_discrete(
    labels = c(
      "10" = "Newfoundland and Labrador",
      "11" = "Prince Edward Island",
      "12" = "Nova Scotia",
      "13" = "New Brunswick",
      "24" = "Quebec",
      "35" = "Ontario",
      "46" = "Manitoba",
      "47" = "Saskatchewan",
      "48" = "Alberta",
      "59" = "British Columbia"
    )
  ) +
  scale_fill_discrete(name = "No.of People in Household", labels = c("1 Person", "2 Persons", "3 Persons", "4 or more persons")) +
  coord_flip()
We found that Ontario[35] had the most respondents, and
households of 1, 2, or 3 people were the most common there: more than
6000 respondents reported that no more than 3 people lived in their
household, which accounts for around 98% of the responses.
We found that Quebec[24] had ratios similar to those of
Ontario, where a major chunk of respondents reported that no
more than 3 people lived in their household.
We also found that British Columbia[59] and
Alberta[48] were next in order, with the highest numbers of
respondents stating that no more than 3 people lived in the
household; the count was around 3000.
We also found that Prince Edward Island[11] had the
lowest total respondent count, around 700, and only a very small
portion of its respondents have 4 or more people in their household.
Finally, every province had only a very small portion of
respondents who stated that they have 4 or more
people in their household.
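A percentage such as the share of respondents living in households of three or fewer people can be computed directly rather than estimated from the chart. The sketch below assumes a made-up data frame, `toy_households`, standing in for `Householdsset`, and assumes that `Houshold_Type` codes "1" to "3" mean one to three people and "4" means four or more; the values are illustrative and not from the survey.

```r
# Toy stand-in for Householdsset; values are invented for illustration.
# Assumed coding: "1", "2", "3" = that many people, "4" = four or more.
toy_households <- data.frame(
  Province = c("35", "35", "35", "35", "24", "24", "24", "11", "11", "11"),
  Houshold_Type = c("1", "2", "3", "4", "2", "2", "4", "1", "3", "3")
)

# TRUE when the household has three or fewer people
toy_households$three_or_fewer <- toy_households$Houshold_Type %in% c("1", "2", "3")

# Percentage of respondents per province living in households of three or fewer people
pct_three_or_fewer <- 100 * tapply(
  toy_households$three_or_fewer,
  toy_households$Province,
  mean
)

print(round(pct_three_or_fewer, 1))
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, which is why `tapply(..., mean)` yields the per-province share here.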
In Ontario[35] and Quebec[24], a significant number of
respondents had 3 or fewer people in their homes. This may be because
urbanization has made living as a combined family more difficult for
work, financial, and personal reasons, so families might not be
willing to stay together or even to have large families. Because all
the provinces have only a very small portion of respondents living in
households of 4 or more, the results are consistent with this
explanation.
Price, A. C. (2012, April 16). The AAVSO 2011 Demographic and Background Survey. arXiv.org. https://arxiv.org/abs/1204.3582
Nair, M. (2022, July 18). Understanding The Canadian Education System. University of the People. https://www.uopeople.edu/blog/understanding-the-canadian-education-system/
Gender-related differences in desired level of educational attainment among students in Canada. (2021, September 22). https://www150.statcan.gc.ca/n1/pub/36-28-0001/2021009/article/00004-eng.htm
Guo, J. (2016, January 28). The serious reason boys do worse than girls. Washington Post. https://www.washingtonpost.com/news/wonk/wp/2016/01/28/the-serious-reason-boys-do-worse-than-girls/