Mucheli Sudhakar Sairam Sukesh

library(readr)
mydata <- read.csv("./Assignment 1.csv", header=TRUE, 
                   sep=",", dec=".")

#Later parts of the analysis revealed one respondent with a CGPA of 0 and some rows with NA values. Neither seems plausible, so these rows are removed here.

library(tidyr)
mydata <- na.omit(mydata)              # drop rows with any missing value
mydata <- mydata[mydata$CGPA != 0, ]   # drop rows with a CGPA of exactly 0
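
To document how many rows each cleaning step removes, a small helper along the following lines could be used. This is only a sketch: the function name `count_dropped` and the toy data frame are illustrative assumptions and are not part of the assignment data. Running `na.omit()` before the CGPA filter matters, because a logical index that contains NA would otherwise return all-NA rows.

```r
# Hypothetical helper: report how many rows each cleaning step removes.
count_dropped <- function(df) {
  n_start <- nrow(df)
  no_na   <- na.omit(df)                # step 1: drop rows with any NA
  cleaned <- no_na[no_na$CGPA != 0, ]   # step 2: drop rows with CGPA == 0
  c(start             = n_start,
    dropped_na        = n_start - nrow(no_na),
    dropped_zero_cgpa = nrow(no_na) - nrow(cleaned),
    remaining         = nrow(cleaned))
}

# Toy data for illustration only (not the Kaggle file).
toy <- data.frame(CGPA = c(7.5, 0, 8.2, NA, 6.1),
                  Age  = c(21, 25, NA, 30, 19))
result <- count_dropped(toy)
print(result)
```

On the real data the same call would be made on the data frame right after `read.csv()`, before any rows are removed.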

#Explain the data

Unit of Observation: One individual
Sample Size: 27,889 students (after removing the exceptions)
Gender: Whether the individual is male or female
Age: Age of the student (Unit: years)
City: Geographic location of the individual
Profession: Current profession of the individual (almost everyone is a student, but there are some workers)
Academic Pressure: How much academic pressure the individual reports facing (Unit: on a scale of 1 - 5)
Work Pressure: How much work pressure the individual reports facing (Unit: on a scale of 1 - 5)
CGPA: The Cumulative GPA of the individual, i.e. their grades (Unit: on a scale of 0 - 10)
Study Satisfaction: How satisfied the individual is with their results and studies (Unit: on a scale of 0 - 5), where 5 means the most satisfied
Job Satisfaction: How satisfied the individual is with their current job, if they have one (Unit: on a scale of 0 - 5), where 5 means the most satisfied
Work/Study hours: The number of hours spent on work or study each day (Unit: hours per day)
Financial Stress: The level of stress experienced due to financial constraints (Unit: on a scale of 0 - 5), where 5 means the most stressed
Family history of mental health issue: Whether the individual has a family history of mental health problems (Unit: True or False)
Depression: Whether the individual has experienced depression (Unit: Yes-1 or No-0)
Sleep duration: The average number of hours the individual sleeps per night (Unit: hours of sleep per night, in the categories provided)
Degree: The highest level of education completed or being pursued (Unit: categorical)
Suicidal Thoughts: Whether the individual has ever had suicidal thoughts (Unit: Yes-1 or No-0)

#Name the source of the data

Kaggle

#Carry out data manipulation

mydata2 <-mydata[,c(-1,-4,-5,-7,-10,-12,-13,-14,-16,-17)]

#Removing variables that are not needed, such as Work Pressure and Job Satisfaction.

mydata2$DepressionYorN <- factor(mydata2$Depression,
                                 levels= c(0,1),
                                 labels= c("No","Yes"))

#Converting the binary 0/1 coding of Depression into a categorical variable with the labels No and Yes.
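
One way to confirm that the recoding worked as intended is to cross-tabulate the original codes against the new labels. The sketch below uses a short made-up 0/1 vector in place of the real Depression column, so the variable names and values are assumptions for illustration only.

```r
# Toy 0/1 vector standing in for the Depression column (illustrative only).
depression <- c(0, 1, 1, 0, 1)

# Same recoding as above: 0 becomes "No", 1 becomes "Yes".
depression_label <- factor(depression,
                           levels = c(0, 1),
                           labels = c("No", "Yes"))

# Cross-tabulate codes against labels; every count should sit on the
# diagonal (0 with "No", 1 with "Yes") if the mapping is correct.
check <- table(code = depression, label = depression_label)
print(check)
```

Any non-zero count off the diagonal would indicate that the levels and labels were paired in the wrong order.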

library(psych)
describe(mydata2$CGPA)
##    vars     n mean   sd median trimmed  mad  min max range  skew kurtosis   se
## X1    1 27889 7.66 1.46   7.77    7.67 1.87 5.03  10  4.97 -0.07    -1.23 0.01

#Present the descriptive statistics for the selected variable

I have selected the cumulative GPA (CGPA) as the variable to explain. The average CGPA of the people included in the sample is 7.66. The standard deviation, which is a measure of variability, is 1.46. The skew is -0.07, which is slightly negative, so the distribution is close to symmetric with only a very slight left skew. The median is 7.77, which means that 50% of the people have a CGPA of 7.77 or lower, while the other 50% have a CGPA of 7.77 or higher.
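
To make the sign of the skew easier to interpret, the sketch below computes a moment-based skewness by hand in base R. The function name `sample_skew` and the toy vectors are assumptions for illustration. `psych::describe()` may use a slightly different convention depending on its `type` argument, so small differences from its output are possible.

```r
# Hypothetical helper: third central moment divided by the cubed
# sample standard deviation (one common definition of skewness).
sample_skew <- function(x) {
  x  <- x[!is.na(x)]
  m3 <- mean((x - mean(x))^3)   # third central moment
  m3 / sd(x)^3
}

# Toy vectors for illustration only (not the CGPA data).
symmetric   <- c(1, 2, 3, 4, 5)     # mirror image around 3
left_skewed <- c(1, 8, 9, 9, 10)    # one low value pulls the tail left

skew_sym  <- sample_skew(symmetric)
skew_left <- sample_skew(left_skewed)
print(c(symmetric = skew_sym, left_skewed = skew_left))
```

A value near zero indicates a roughly symmetric distribution, while a negative value indicates a longer tail on the low side.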

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
ggplot(mydata, aes(x=Age))+
  geom_boxplot(fill="darkred")

The minimum age is slightly below 20, at around 16, and the 25th percentile is slightly above 20, at about 22. The median age is around 25, so 50% of the people are 25 or younger and the other 50% are 25 or older. The 75th percentile is at about 28 years old, and the upper whisker (the largest age not flagged as an outlier) ends at around 42 years old. The box plot flags many higher ages as outliers, i.e. older people studying for their degree. However, I do not consider these to be anomalies, as I believe age should not stop people from studying and pursuing their dream of getting a degree.
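
Values read off a box plot by eye can be checked numerically. The sketch below uses base R's `quantile()` and `boxplot.stats()` on a made-up age vector; the vector and variable names are assumptions for illustration. `boxplot.stats()` applies the 1.5 x IQR whisker rule, which is also the default for `geom_boxplot()`, although the two can differ slightly in how the box edges are computed.

```r
# Toy ages for illustration only (not the Kaggle data).
toy_age <- c(18, 19, 20, 21, 22, 24, 25, 26, 28, 30, 33, 58)

# Quartiles: 25th percentile, median, 75th percentile.
q <- quantile(toy_age, probs = c(0.25, 0.5, 0.75))

# boxplot.stats() returns the five plotted values in $stats
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# and the points drawn individually as outliers in $out.
bs <- boxplot.stats(toy_age)

print(q)
print(bs$stats)
print(bs$out)
```

With the real data, `toy_age` would be replaced by the Age column, which makes it straightforward to state the quartiles and whisker ends exactly rather than estimating them from the plot.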

library(ggplot2)
ggplot(mydata, aes(x=CGPA)) +
  geom_histogram(binwidth=0.1,fill="darkred")+
  ylab("Frequency of the Cumulative GPA")+
  xlab("Cumulative GPA")

The histogram shows that the Cumulative GPA is not normally distributed, even though IQ is commonly assumed to be normally distributed and could be expected to have a direct relationship with the Cumulative GPA. There are various possible reasons why the CGPA is not normally distributed, and some may be related to the academic pressure and depression recorded in this study.
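
The visual impression of non-normality can be backed up with a formal check. The sketch below uses uniform toy data rather than the real CGPA values, so all names and numbers are assumptions for illustration. A flat distribution was chosen because the kurtosis of -1.23 reported earlier is close to the -1.2 of a uniform distribution. `shapiro.test()` accepts at most 5000 observations, so a larger variable such as CGPA would need to be subsampled first.

```r
set.seed(1)

# Flat, clearly non-normal toy data on the 5 to 10 range (illustrative only).
toy_cgpa <- runif(2000, min = 5, max = 10)

# shapiro.test() is limited to 5000 values; subsample if needed.
n_max <- 5000
x <- if (length(toy_cgpa) > n_max) sample(toy_cgpa, n_max) else toy_cgpa

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)
print(sw$p.value)

# Normal Q-Q coordinates; set plot.it = TRUE to draw the plot, where
# points bending away from a straight line indicate non-normality.
qq <- qqnorm(x, plot.it = FALSE)
```

With very large samples such tests reject normality even for minor deviations, so the Q-Q plot and the histogram remain useful for judging how large the departure actually is.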