The main purpose of this project is to use survey data to compare the effort students invest in their daily study with the academic rewards they obtain.
The most significant finding from the data is that students who invest more time in learning tend to achieve relatively high grades.
According to the results of the National Opinion Research Center, “Learning time has a significant impact on students’ performance.” Based on this, we focused our analysis on the time undergraduate students spend studying and the results they obtain. Reference: https://psycnet.apa.org/record/1982-24391-001
library(readxl)
StudentSurvey <- read_excel("StudentSurvey.xlsx")
# View(StudentSurvey)
For this project, we created a 10-question survey that focused on exploring how the lifestyle/habits of undergraduate students impact their academics. This survey was shared with our friends on Facebook. The following are the 10 questions asked:
To make this survey ethical, respondents were kept anonymous to protect their privacy. We also clarified that collected responses would solely be used for this project. However, maintaining ethical integrity led to limitations such as:
# First 6 rows of data (head() returns 6 rows by default)
# StudentSurvey=read.csv("StudentSurvey.csv",header=T)
StudentSurvey <- StudentSurvey[1:45,]
head(StudentSurvey)
## # A tibble: 6 x 11
## Timestamp Age Sleep Grades Study Lectures Job Attendance Socialise
## <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 9/23/201… 19 6 Fail … 2 20% (2 … 30 2 3
## 2 9/23/201… 18 7.5 High … 18 60% (6 … 2 5 2
## 3 9/23/201… 20 7 Credi… 10 60% (6 … 0 4 7
## 4 9/23/201… 18 6 Pass … 3 60% (6 … 8 4 6
## 5 9/23/201… 19 7 Pass … 8 20% (2 … 0 2 4
## 6 9/23/201… 20 5 Credi… 6 20% (2 … 0 4 2
## # … with 2 more variables: Exercise <dbl>, Distance <dbl>
## Size of data
dim(StudentSurvey)
## [1] 45 11
## R's classification of data
class(StudentSurvey)
## [1] "tbl_df" "tbl" "data.frame"
## R's classification of variables
str(StudentSurvey)
## Classes 'tbl_df', 'tbl' and 'data.frame': 45 obs. of 11 variables:
## $ Timestamp : chr "9/23/2019 12:40:08" "9/23/2019 12:47:57" "9/23/2019 12:53:59" "9/23/2019 12:56:50" ...
## $ Age : num 19 18 20 18 19 20 19 20 18 19 ...
## $ Sleep : num 6 7.5 7 6 7 5 7 8 7 8 ...
## $ Grades : chr "Fail (0-49%)" "High distinction (85-100%)" "Credit (65-74%)" "Pass (50-64%)" ...
## $ Study : num 2 18 10 3 8 6 20 17 15 6 ...
## $ Lectures : chr "20% (2 lectures)" "60% (6 lectures)" "60% (6 lectures)" "60% (6 lectures)" ...
## $ Job : num 30 2 0 8 0 0 0 8 0 20 ...
## $ Attendance: num 2 5 4 4 2 4 5 4 5 3 ...
## $ Socialise : num 3 2 7 6 4 2 2 3 2 2 ...
## $ Exercise : num 5 1 4 0 1 5 5 3 1 1 ...
## $ Distance : num 55 5 10 5 5 5 15 40 10 15 ...
sapply(StudentSurvey, class)
## Timestamp Age Sleep Grades Study Lectures
## "character" "numeric" "numeric" "character" "numeric" "character"
## Job Attendance Socialise Exercise Distance
## "numeric" "numeric" "numeric" "numeric" "numeric"
Summary:
- Each row represents an undergraduate student's response to the survey
- Each column represents a different variable that gives us an insight into their lifestyle/habits
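Several columns are stored as text even though they carry dates, percentages or ordered categories. The following is a minimal sketch of how they could be converted, using four rows copied from the `str()` output above. The `survey` data frame, the `LecturePct` column and the UTC time zone are assumptions made for illustration, not part of the original analysis.

```r
# Illustrative rows copied from the str() output above (not the full data set).
survey <- data.frame(
  Timestamp = c("9/23/2019 12:40:08", "9/23/2019 12:47:57",
                "9/23/2019 12:53:59", "9/23/2019 12:56:50"),
  Grades    = c("Fail (0-49%)", "High distinction (85-100%)",
                "Credit (65-74%)", "Pass (50-64%)"),
  Lectures  = c("20% (2 lectures)", "60% (6 lectures)",
                "60% (6 lectures)", "60% (6 lectures)"),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are month/day/year; the time zone is unknown, so UTC is used.
survey$Timestamp <- as.POSIXct(survey$Timestamp,
                               format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only the leading percentage of the Lectures answer as a number.
survey$LecturePct <- as.numeric(sub("%.*$", "", survey$Lectures))

# Assumption: only the four grade bands visible above are listed as levels.
# Any band missing from this vector would become NA in the full data.
grade_levels <- c("Fail (0-49%)", "Pass (50-64%)",
                  "Credit (65-74%)", "High distinction (85-100%)")
survey$Grades <- factor(survey$Grades, levels = grade_levels, ordered = TRUE)

# First class of each column after conversion.
sapply(survey, function(x) class(x)[1])
```

With an ordered factor, grades can be compared directly (for example `survey$Grades >= "Credit (65-74%)"`), and the numeric lecture percentage can be used in plots or correlations.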
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# xlab/ylab are not arguments of ggplot(); axis labels are set with labs()
p <- ggplot(StudentSurvey, aes(Distance, Attendance)) +
  labs(x = "Commute", y = "Academic Attendance")
p + geom_point(aes(col = Lectures))
## Warning: Removed 1 rows containing missing values (geom_point).
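# plot_ly() has no `main` argument; the trace type is set with `type` and `mode`
p1 <- plot_ly(StudentSurvey, x = ~Distance, y = ~Attendance, type = "scatter", mode = "markers")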
#Analysis of Graph
The residual plot does not show a random scatter, indicating that a linear model does not describe our data well. This is consistent with the scattered appearance of our scatterplot, even though it shows a moderate correlation. There is still a relationship between Commute Distance and Academic Attendance; it just isn’t a linear one. Additionally, the “fanning” behavior of the residual plot shows that our data is not homoscedastic (the spread of the residuals changes across the range of distances). Hence, the data collected from our survey alone is inadequate to produce a prediction model, a point reinforced by our graphical summary, which shows loosely scattered points with only faint trends. Further investigation with a larger sample size would be required. In conclusion, there is a moderate negative correlation between Commute Distance and Academic Attendance, as indicated by our correlation coefficient of -0.51: attendance tends to increase as commute distance decreases.
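The correlation coefficient and residuals discussed above can be computed along the following lines. This is a minimal sketch on the first ten `Distance` and `Attendance` values shown in the `str()` output, so its numbers will not match the full-survey figure of -0.51. The names `commute`, `fit` and `res` are illustrative.

```r
# First ten Distance/Attendance values, copied from the str() output above.
# This is an illustrative subset, so results will differ from the full survey.
commute <- data.frame(
  Distance   = c(55, 5, 10, 5, 5, 5, 15, 40, 10, 15),
  Attendance = c(2, 5, 4, 4, 2, 4, 5, 4, 5, 3)
)

# Pearson correlation; "complete.obs" drops rows with missing values,
# which matters for the full data (ggplot2 warned about one missing row).
r <- cor(commute$Distance, commute$Attendance, use = "complete.obs")

# Simple linear regression of attendance on commute distance.
fit <- lm(Attendance ~ Distance, data = commute)
res <- resid(fit)

# Residuals vs fitted values: a random, even band supports a linear model;
# curvature or "fanning" suggests non-linearity or unequal variance.
if (interactive()) {
  plot(fitted(fit), res, xlab = "Fitted attendance", ylab = "Residual")
  abline(h = 0, lty = 2)
}
```

Running the same steps on `StudentSurvey` instead of `commute` would reproduce the analysis on all 45 responses.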
library(plotly)
# Exercise is on the x-axis and Sleep on the y-axis; axis labels are set with labs()
p <- ggplot(StudentSurvey, aes(Exercise, Sleep)) +
  labs(x = "Days of exercise per week", y = "Hours of sleep")
p + geom_point(aes(col = Grades))
## Warning: Removed 1 rows containing missing values (geom_point).
#Analysis of Graph
Looking at the residual plot, we can see that the data is roughly homoscedastic: the points lie at about the same distance from the regression line across the whole range, meaning the variance of the residuals is similar throughout. The correlation coefficient for this graph is approximately 0.15, which tells us there is a weak positive correlation between the hours of sleep and the number of days undergraduate students exercised during the week: exercise increases slightly as sleep increases. Yet when looking at the grades of the students, we can see a slight trend: those with lower grades, such as credit, tend to have fewer hours of sleep and less exercise than students with high distinctions, who sleep more and all exercise at least one day a week. This suggests that an undergraduate student with what might be deemed a healthier lifestyle (more hours of sleep and exercise) is more likely to obtain higher grades.
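One way to check the grade-related trend numerically is to average sleep and exercise within each grade band. The sketch below uses only the six rows printed by `head()` earlier, with grade labels shortened to the band name. Six rows are far too few to draw conclusions, so this illustrates the method only. The names `habits` and `by_grade` are hypothetical.

```r
# First six rows as printed by head() above; grade labels are shortened
# to the band name for readability (an assumption for this sketch).
habits <- data.frame(
  Grades   = c("Fail", "High distinction", "Credit", "Pass", "Pass", "Credit"),
  Sleep    = c(6, 7.5, 7, 6, 7, 5),
  Exercise = c(5, 1, 4, 0, 1, 5)
)

# Order the bands from lowest to highest so the summary reads naturally.
habits$Grades <- factor(habits$Grades,
                        levels = c("Fail", "Pass", "Credit", "High distinction"),
                        ordered = TRUE)

# Mean sleep and exercise within each grade band.
by_grade <- aggregate(cbind(Sleep, Exercise) ~ Grades, data = habits, FUN = mean)
by_grade
```

Applied to the full `StudentSurvey` data (with its original grade labels as levels), the same `aggregate()` call would show whether mean sleep and exercise rise with grade band.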
#Summary
Whilst there are several factors to take into consideration when looking at an undergraduate student's lifestyle/habits, in analysing the variables we observed, we noticed trends that were associated with higher academic results.
The closer a student lived to university, the more likely they were to attend on more days and go to lectures more frequently. Exercising and sleeping more were also associated with higher grades.
Ultimately, the more willing a student is to put effort into commuting to university and to adopt a healthier lifestyle by sleeping and exercising more, the more likely their academics will be impacted positively.