1 Executive Summary

The main purpose of this project is to use survey data to compare the effort that students invest in their daily study with the results they obtain.

The most significant finding from the data is that students who invest more time in learning tend to achieve relatively high grades.

According to research from the National Opinion Research Center, “Learning time has a significant impact on students’ performance.” Based on this, we focused our analysis on the time undergraduate students spend on their studies and the results they obtain. Reference: https://psycnet.apa.org/record/1982-24391-001

2 Full Report

2.1 Initial Data Analysis (IDA)

library(readxl)
StudentSurvey <- read_excel("StudentSurvey.xlsx")
# View(StudentSurvey)

For this project, we created a 10-question survey that focused on exploring how the lifestyle/habits of undergraduate students impact their academics. This survey was shared with our friends on Facebook. The following are the 10 questions asked:

  1. How old are you?
  2. On average, how many hours of sleep do you get per night?
  3. What grades do you usually get?
  4. How much time (hours), do you spend studying every week?
  5. What is your lecture attendance rate? Assuming you have 10 lectures a week (roughly)
  6. Do you have a job? If so, how many hours do you work every week?
  7. How many days of the week do you really attend university?
  8. How many days of the week do you go out and socialize?
  9. How many days of the week do you exercise?
  10. How long does it take you to get to university (minutes)

To make this survey ethical, respondents were kept anonymous to protect their privacy. We also clarified that collected responses would solely be used for this project. However, the survey design led to limitations such as:

  • Selection bias within our dataset. The survey was only shared on Facebook, so only our friends (mostly undergraduate students aged 17-20) had access, rather than the entire student population of the university. As a result, almost none of the respondents are studying for a master's or doctoral degree.
  • Limited respondents. 40 responses do not accurately reflect the overall student population. We cannot make generalizations based on a selective group of 40 students. A larger sample size would provide more reliable predictions and correlations.
  • We did not address whether respondents were full-time or part-time students. This would affect their answers for academic attendance and work hours.

Survey: https://docs.google.com/forms/d/e/1FAIpQLSf2OL7jOsk8h24qQwHUE1C4BAv0bS-ZrVM757ZIPBfT9s3KkQ/viewform?usp=sf_link

# First 6 rows of data
# StudentSurvey=read.csv("StudentSurvey.csv",header=T)
# Keep the first 45 rows of the spreadsheet
StudentSurvey <- StudentSurvey[1:45,]

head(StudentSurvey)
## # A tibble: 6 x 11
##   Timestamp   Age Sleep Grades Study Lectures   Job Attendance Socialise
##   <chr>     <dbl> <dbl> <chr>  <dbl> <chr>    <dbl>      <dbl>     <dbl>
## 1 9/23/201…    19   6   Fail …     2 20% (2 …    30          2         3
## 2 9/23/201…    18   7.5 High …    18 60% (6 …     2          5         2
## 3 9/23/201…    20   7   Credi…    10 60% (6 …     0          4         7
## 4 9/23/201…    18   6   Pass …     3 60% (6 …     8          4         6
## 5 9/23/201…    19   7   Pass …     8 20% (2 …     0          2         4
## 6 9/23/201…    20   5   Credi…     6 20% (2 …     0          4         2
## # … with 2 more variables: Exercise <dbl>, Distance <dbl>
## Size of data
dim(StudentSurvey)
## [1] 45 11
## R's classification of data
class(StudentSurvey)
## [1] "tbl_df"     "tbl"        "data.frame"
## R's classification of variables
str(StudentSurvey)
## Classes 'tbl_df', 'tbl' and 'data.frame':    45 obs. of  11 variables:
##  $ Timestamp : chr  "9/23/2019 12:40:08" "9/23/2019 12:47:57" "9/23/2019 12:53:59" "9/23/2019 12:56:50" ...
##  $ Age       : num  19 18 20 18 19 20 19 20 18 19 ...
##  $ Sleep     : num  6 7.5 7 6 7 5 7 8 7 8 ...
##  $ Grades    : chr  "Fail (0-49%)" "High distinction (85-100%)" "Credit (65-74%)" "Pass (50-64%)" ...
##  $ Study     : num  2 18 10 3 8 6 20 17 15 6 ...
##  $ Lectures  : chr  "20% (2 lectures)" "60% (6 lectures)" "60% (6 lectures)" "60% (6 lectures)" ...
##  $ Job       : num  30 2 0 8 0 0 0 8 0 20 ...
##  $ Attendance: num  2 5 4 4 2 4 5 4 5 3 ...
##  $ Socialise : num  3 2 7 6 4 2 2 3 2 2 ...
##  $ Exercise  : num  5 1 4 0 1 5 5 3 1 1 ...
##  $ Distance  : num  55 5 10 5 5 5 15 40 10 15 ...
sapply(StudentSurvey, class)
##   Timestamp         Age       Sleep      Grades       Study    Lectures 
## "character"   "numeric"   "numeric" "character"   "numeric" "character" 
##         Job  Attendance   Socialise    Exercise    Distance 
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"

Summary:

  • Each row represents one undergraduate student's response to the survey.
  • Each column represents a different variable that gives us an insight into their lifestyle/habits.
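
Because `Grades` and `Lectures` are stored as character columns, R has no notion of their natural order. A minimal sketch of one way to encode `Grades` as an ordered factor follows. The four example values are copied from the `str()` output above; the `"Distinction (75-84%)"` level does not appear in the output shown and is an assumption about the survey's remaining answer option.

```r
# First four Grades values, as shown in the str() output above
grades <- c("Fail (0-49%)", "High distinction (85-100%)",
            "Credit (65-74%)", "Pass (50-64%)")

# Grade bands from lowest to highest.
# NOTE: "Distinction (75-84%)" is an assumed label; it is not visible in the output above.
grade_levels <- c("Fail (0-49%)", "Pass (50-64%)", "Credit (65-74%)",
                  "Distinction (75-84%)", "High distinction (85-100%)")

grades_ord <- factor(grades, levels = grade_levels, ordered = TRUE)

# Ordered factors support comparisons and sort from lowest to highest grade
above_pass <- grades_ord > "Pass (50-64%)"
sorted_grades <- sort(grades_ord)
```

With the grades ordered this way, plot legends and grouped summaries would list the bands from Fail to High distinction rather than alphabetically.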


2.2 Research Question 1 - What is the correlation/relationship between commute distance and academic attendance?

library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
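# Scatter plot of commute (minutes) against days attended per week.
# Axis labels are set with labs(); ggplot() itself ignores xlab/ylab arguments.
p <- ggplot(StudentSurvey, aes(Distance, Attendance)) +
  labs(x = "Commute", y = "Academic Attendance")
p + geom_point(aes(col = Lectures))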
## Warning: Removed 1 rows containing missing values (geom_point).

# Interactive version of the same scatter plot (stored in p1, not printed here)
p1 <- plot_ly(StudentSurvey, x = ~Distance, y = ~Attendance, type = "scatter", mode = "markers")
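
The warning above says that one row was dropped because of a missing value. A minimal sketch of how such rows could be located before plotting is given below. The `survey` data frame is a hypothetical stand-in for `StudentSurvey`, with one missing `Distance` value inserted purely for illustration.

```r
# Hypothetical stand-in for StudentSurvey with one missing Distance value
survey <- data.frame(
  Distance   = c(55, 5, NA, 5),
  Attendance = c(2, 5, 4, 4)
)

# Number of missing values in each column
na_counts <- colSums(is.na(survey))
print(na_counts)

# Rows that ggplot2 would drop because x or y is missing
incomplete <- !complete.cases(survey[, c("Distance", "Attendance")])
print(which(incomplete))
```

Running the same two checks on `StudentSurvey` would show which respondent left a question blank.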

Analysis of Graph

The residual plot is not a random scatter, which indicates that a linear model does not describe our data well. This is consistent with the scattered appearance of our scatterplot, even though it shows a moderate correlation. There is still a relationship between commute distance and academic attendance; it is just not a linear one. Additionally, the “fanning” behavior of our residual plot shows that our data is not homoscedastic (the spread of the residuals changes across the range of commute distances). Hence, the data collected from our survey alone is inadequate to produce a reliable prediction model, and further investigation with a larger sample size would be required. In conclusion, there is a moderate negative correlation between commute distance and academic attendance, as indicated by our correlation coefficient of -0.51: attendance tends to increase as commute distance decreases.
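
The correlation coefficient and residual plot discussed above are not shown in the code, so the following is only a sketch of how they might be obtained. It uses the first ten `Distance` and `Attendance` values from the `str()` output rather than the full dataset, so it will not reproduce the reported value of -0.51.

```r
# First ten Distance and Attendance values, copied from the str() output above
Distance   <- c(55, 5, 10, 5, 5, 5, 15, 40, 10, 15)
Attendance <- c(2, 5, 4, 4, 2, 4, 5, 4, 5, 3)

# Pearson correlation; use = "complete.obs" skips rows with missing values
r <- cor(Distance, Attendance, use = "complete.obs")

# Fit a straight line and extract the residuals
fit <- lm(Attendance ~ Distance)
res <- resid(fit)

# Residuals vs fitted values: a patternless band around zero supports a
# linear model, while a curve or a "fan" shape suggests non-linearity or
# unequal spread (heteroscedasticity)
plot(fitted(fit), res, xlab = "Fitted attendance", ylab = "Residual")
abline(h = 0, lty = 2)
```

On the full dataset, the same calls would take `StudentSurvey$Distance` and `StudentSurvey$Attendance` in place of the two short vectors.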

2.3 Research Question 2 - How do an undergraduate student's lifestyle habits (sleep and exercise) affect their grades?

library(plotly)
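# Exercise (days per week) on the x-axis, Sleep (hours per night) on the y-axis.
# Axis labels are set with labs(); ggplot() itself ignores xlab/ylab arguments.
p <- ggplot(StudentSurvey, aes(Exercise, Sleep)) +
  labs(x = "Days of exercise per week", y = "Hours of sleep per night")
p + geom_point(aes(col = Grades))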
## Warning: Removed 1 rows containing missing values (geom_point).

Analysis of Graph

The residual plot suggests that the data is roughly homoscedastic: the points lie at about the same distance from the regression line across the range of values, so the spread of the residuals is approximately constant. The correlation coefficient for this graph is approximately 0.15, which indicates a weak positive correlation between the hours of sleep undergraduate students get per night and the number of days they exercise during the week: exercise increases slightly as sleep increases. When we look at the students' grades, we can see a slight trend: those with lower grades, e.g. credit, tend to sleep and exercise less than students with high distinctions, who sleep more and all exercise at least one day a week. This suggests that an undergraduate student with what might be deemed a healthier lifestyle (more hours of sleep and exercise) is more likely to obtain higher grades, although our survey data alone cannot show that the lifestyle causes the improvement.
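
Because `Exercise` is a count of days rather than a continuous measurement, a rank-based (Spearman) correlation may be a useful check alongside the Pearson coefficient quoted above. The sketch below uses only the first ten `Sleep` and `Exercise` values from the `str()` output, so its results will differ from the full-data value of roughly 0.15.

```r
# First ten Sleep and Exercise values, copied from the str() output above
Sleep    <- c(6, 7.5, 7, 6, 7, 5, 7, 8, 7, 8)
Exercise <- c(5, 1, 4, 0, 1, 5, 5, 3, 1, 1)

# Pearson measures linear association between the raw values
pearson <- cor(Sleep, Exercise, use = "complete.obs")

# Spearman works on ranks, so it is less sensitive to outliers and to the
# discrete, bounded nature of a days-per-week count
spearman <- cor(Sleep, Exercise, use = "complete.obs", method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

If the two coefficients differ noticeably on the full dataset, that would be a sign that a few unusual respondents are driving the Pearson value.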

Summary

Whilst there are several factors to take into consideration when looking at an undergraduate student's lifestyle/habits, the variables we analysed showed trends that were associated with higher academic results.

The closer a student lived to university, the more likely they were to attend on more days and go to lectures more frequently. Students who slept and exercised more also tended to achieve higher grades.

Ultimately, the more willing a student is to put effort into commuting to university and to adopt a healthier lifestyle by sleeping and exercising more, the more likely it is that their academic results will benefit.