mental_health=read.csv(file.choose(), header=TRUE)  # file.choose() works on every platform; choose.files() is Windows-only
attach(mental_health)
summary(mental_health)
## User_ID Age Gender UserType
## Min. : 1.0 Min. :18.00 Length:2036 Length:2036
## 1st Qu.: 509.8 1st Qu.:18.00 Class :character Class :character
## Median :1018.5 Median :19.00 Mode :character Mode :character
## Mean :1018.5 Mean :19.05
## 3rd Qu.:1527.2 3rd Qu.:20.00
## Max. :2036.0 Max. :22.00
## Platform Screen Content Activity
## Length:2036 Min. : 0.500 Length:2036 Length:2036
## Class :character 1st Qu.: 2.440 Class :character Class :character
## Mode :character Median : 4.310 Mode :character Mode :character
## Mean : 4.285
## 3rd Qu.: 6.130
## Max. :11.310
## Night Social Sleep Anxiety
## Min. :0.0000 Min. :0.000 Min. :3.000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:4.900 1st Qu.: 4.000
## Median :0.0000 Median :0.000 Median :5.800 Median : 8.000
## Mean :0.3806 Mean :0.164 Mean :5.788 Mean : 7.592
## 3rd Qu.:1.0000 3rd Qu.:0.000 3rd Qu.:6.700 3rd Qu.:11.000
## Max. :1.0000 Max. :1.000 Max. :9.900 Max. :21.000
## Depression
## Min. : 0.000
## 1st Qu.: 1.000
## Median : 5.000
## Mean : 5.385
## 3rd Qu.: 9.000
## Max. :22.000
summary(Anxiety)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 4.000 8.000 7.592 11.000 21.000
summary(Depression)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 5.000 5.385 9.000 22.000
# The 'Anxiety' variable appears slightly left skewed, since its mean (7.592) is less than its median (8.000).
# The 'Depression' variable appears slightly right skewed, since its mean (5.385) is greater than its median (5.000).
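The mean-versus-median comparison above can be written out explicitly. This is a minimal sketch that uses only the summary values reported in the output; the helper name `skew_direction` is hypothetical, and the comparison is a rough heuristic rather than a formal test of skewness.

```r
# Summary values copied from the summary() output above
anxiety_mean <- 7.592
anxiety_median <- 8.000
depression_mean <- 5.385
depression_median <- 5.000

# Hypothetical helper: rough skew direction from mean vs. median
skew_direction <- function(mean_value, median_value) {
  if (mean_value > median_value) {
    "right"
  } else if (mean_value < median_value) {
    "left"
  } else {
    "none"
  }
}

anxiety_skew <- skew_direction(anxiety_mean, anxiety_median)
depression_skew <- skew_direction(depression_mean, depression_median)
```

The heuristic can disagree with what histograms and boxplots show, so it is best read alongside the plots.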
# 1b
par(mfrow=c(3,2))
hist(Anxiety)
hist(Depression)
boxplot(Anxiety)
boxplot(Depression)
qqnorm(Anxiety)
qqline(Anxiety)
qqnorm(Depression)
qqline(Depression)
# From the graphs, both Anxiety and Depression appear right skewed. The Depression boxplot shows an outlier, which likely contributes to its stronger skew, while the Anxiety boxplot shows no outliers.
# 1c
par(mfrow=c(1,2))
hist(Anxiety)
hist(sqrt(Anxiety))
# After the square root transformation, the Anxiety variable is closer to a normal distribution, and the skewness seen in the original histogram is no longer present.
# 1d
par(mfrow=c(2,1))
hist(Depression)
hist(log(Depression))
# After the log transformation, the Depression variable appears closer to normal, with less right skew.
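One caveat about the log transformation: Depression has a minimum of 0, and `log(0)` is `-Inf`. `hist()` keeps only finite values, so zero scores are left out of the transformed histogram. A common alternative, which the analysis above does not use, is `log1p(x)`, which computes `log(1 + x)`. The sketch below uses made-up scores, not the mental_health data.

```r
# Made-up scores for illustration only (not the mental_health data)
scores <- c(0, 0, 1, 2, 5, 9, 22)

log_scores <- log(scores)      # zeros become -Inf
log1p_scores <- log1p(scores)  # log(1 + x); zeros stay at 0

# hist() keeps only finite values, so the two zeros are missing from the first count
n_plotted_log <- sum(hist(log_scores, plot = FALSE)$counts)
n_plotted_log1p <- sum(hist(log1p_scores, plot = FALSE)$counts)
```

Comparing `n_plotted_log` with `length(scores)` shows how many observations the plain log histogram leaves out.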
# 1e
hist(Screen)
# The Screen variable is fairly normally distributed with slight right skewness.
# 1f
boxplot(Screen ~ UserType)
# Screen time seems to differ by user type: the boxplots have different medians, with Hyper-connected the highest at around 7 and Digital Minimalist the lowest at around 1.75.
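The center line of a boxplot is the median, so group medians are the numbers to compare with the plot. A minimal sketch with toy data follows; the values are made up and only the two user type labels come from the text above.

```r
# Toy data for illustration only; the Screen values are made up
toy <- data.frame(
  UserType = c("Digital Minimalist", "Digital Minimalist", "Digital Minimalist",
               "Hyper-connected", "Hyper-connected", "Hyper-connected"),
  Screen = c(1.5, 1.75, 2.0, 6.5, 7.0, 7.5)
)

# Median and mean screen time for each user type
group_medians <- tapply(toy$Screen, toy$UserType, median)
group_means <- tapply(toy$Screen, toy$UserType, mean)
```

On the real data, the same call would be `tapply(Screen, UserType, median)`.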
# 1g
# Use a name other than 'subset' so the subset() function is not shadowed
high_screen=subset(mental_health, Screen>8, select = c(Age))
head(high_screen)
## Age
## 7 19
## 34 18
## 40 18
## 42 20
## 56 19
## 99 19
par(mfrow=c(3,2))
hist(Age)
boxplot(Age)
qqnorm(Age)
qqline(Age)
# The Age plots show a right-skewed distribution with no outliers.
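Note that `hist(Age)`, `boxplot(Age)`, and `qqnorm(Age)` use the attached `Age` column of the full data set, not the subset created in 1g. If the intent was to examine the ages of users with more than 8 hours of screen time, the subset's own column would need to be passed instead. A sketch with toy data follows; the values and the name `high_screen` are hypothetical.

```r
# Toy data for illustration only; the values are made up
toy <- data.frame(
  Age = c(18, 19, 20, 21, 22, 18),
  Screen = c(9.1, 2.0, 8.5, 4.0, 10.2, 7.9)
)

# Keep only rows with Screen > 8, and only the Age column
high_screen <- subset(toy, Screen > 8, select = c(Age))

# Work with the subset's column, not an attached full-data column
high_screen_ages <- high_screen$Age
age_summary <- summary(high_screen_ages)
```

The plotting calls would then take `high_screen_ages` as their argument.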
# 2a
# mental_health is already attached (see above), so attach() is not repeated here; calling it again only produces masking messages.
cor(Screen, Anxiety)
## [1] 0.6272098
# 0.6272098 is a moderately strong positive correlation. Of the three variables, Screen has the strongest correlation with Anxiety.
cor(Sleep, Anxiety)
## [1] -0.4379324
# -0.4379324 is a moderate negative correlation.
cor(Age, Anxiety)
## [1] 0.03827854
# 0.03827854 is a positive correlation but very weak, since it is close to zero. Age has the weakest correlation with Anxiety of the three variables.
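The three separate `cor()` calls can also be done in one step with a correlation matrix, which makes it easy to pick out the strongest relationship by absolute value. A minimal sketch with toy data follows; the values are made up and are not the mental_health data.

```r
# Toy data for illustration only; the values are made up
toy <- data.frame(
  Screen = c(1, 2, 3, 4, 5, 6),
  Sleep = c(8, 7.5, 7, 6, 6.5, 5),
  Age = c(18, 19, 20, 18, 19, 20),
  Anxiety = c(2, 4, 5, 7, 8, 11)
)

# Correlation of each predictor with Anxiety, taken from the full matrix
cors <- cor(toy)[c("Screen", "Sleep", "Age"), "Anxiety"]

# Strength is judged by absolute value, so a strong negative correlation counts too
strongest <- names(which.max(abs(cors)))
```

On the real data, the equivalent call would be `cor(mental_health[, c("Screen", "Sleep", "Age", "Anxiety")])`.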
# 2b
plot(Age, Anxiety)
plot(Screen, Anxiety)
plot(Sleep, Anxiety)
# The correlations and the plots agree: the positive relationship between Screen and Anxiety is visible, as is the negative relationship between Sleep and Anxiety. While I did not expect the Age and Anxiety plot to look the way it does, a correlation coefficient of around 0.04 makes sense, as the points fall in 'columns' at each integer value of age with no clear trend.
# 2c
m1= lm(Anxiety ~ Age)
summary(m1)
##
## Call:
## lm(formula = Anxiety ~ Age)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0585 -3.5843 0.0996 3.4157 13.5738
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5806 1.7463 2.623 0.00878 **
## Age 0.1581 0.0915 1.728 0.08421 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.674 on 2034 degrees of freedom
## Multiple R-squared: 0.001465, Adjusted R-squared: 0.0009743
## F-statistic: 2.985 on 1 and 2034 DF, p-value: 0.08421
# 2d
#Equation: Anxiety = 4.5806 + 0.1581(Age)
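As a worked example of the fitted equation, the sketch below plugs in an age by hand. The coefficients are copied from `summary(m1)`; the age of 20 is an arbitrary value chosen for illustration.

```r
# Coefficients copied from summary(m1) above
intercept <- 4.5806
slope_age <- 0.1581

# Predicted Anxiety for a hypothetical 20-year-old
example_age <- 20
predicted_anxiety <- intercept + slope_age * example_age
```

This gives 4.5806 + 0.1581 * 20 = 7.7426. Since the model's R-squared is only 0.001465 and the p-value for Age is 0.08421, the prediction is barely better than using the overall mean Anxiety.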
# 2e
m2= lm(Depression ~ Screen)
summary(m2)
##
## Call:
## lm(formula = Depression ~ Screen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.4820 -2.6447 -0.6219 2.3826 13.8267
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03636 0.18170 0.20 0.841
## Screen 1.24799 0.03732 33.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.891 on 2034 degrees of freedom
## Multiple R-squared: 0.3547, Adjusted R-squared: 0.3544
## F-statistic: 1118 on 1 and 2034 DF, p-value: < 2.2e-16
# 2f
dim(mental_health)
## [1] 2036 13
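alpha=.05
# H0: B1 = 0
# Ha: B1 != 0
# Note: the residual degrees of freedom for m2 are 2034 (n - 2 = 2036 - 2, as reported by summary(m2)), not 2022. At this sample size the critical values below are nearly identical either way.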
qt(.025, 2022)
## [1] -1.961138
qt(.975, 2022)
## [1] 1.961138
(1-pt(1.961138, 2022))*2
## [1] 0.04999999
confint(m2, level=.95)
## 2.5 % 97.5 %
## (Intercept) -0.3199737 0.3926877
## Screen 1.1748020 1.3211814
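To connect the critical value to a decision, the slope's t statistic and 95% confidence interval can be recomputed by hand. This is a sketch that uses the estimate, standard error, and residual degrees of freedom printed by `summary(m2)`; because those inputs are rounded, the result matches `confint()` only to about four decimal places.

```r
# Values copied from summary(m2) above
estimate <- 1.24799
std_error <- 0.03732
df_resid <- 2034

# Test statistic for H0: B1 = 0
t_stat <- estimate / std_error

# Two-sided critical value at alpha = 0.05
crit <- qt(0.975, df_resid)

# Reject H0 when |t| exceeds the critical value
reject_h0 <- abs(t_stat) > crit

# 95% confidence interval: estimate +/- critical value * standard error
ci_95 <- estimate + c(-1, 1) * crit * std_error
```

The t statistic of about 33.44 is far beyond the critical value of about 1.96, so H0 is rejected, which agrees with the interval excluding 0.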
# 2g
confint(m2, level=.99)
## 0.5 % 99.5 %
## (Intercept) -0.4321019 0.5048159
## Screen 1.1517711 1.3442123
# 2h
Screen[23]
## [1] 0.68
predict(m2)[23]
## 23
## 0.8849914
residuals(m2)[23]
## 23
## 4.115009
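The fitted value and residual for observation 23 can be checked by hand from the regression equation. This sketch uses the rounded coefficients printed by `summary(m2)`, so it agrees with `predict()` only to a few decimal places.

```r
# Coefficients copied from summary(m2) above
intercept <- 0.03636
slope_screen <- 1.24799

# Screen time for observation 23, from Screen[23]
screen_23 <- 0.68

# Fitted value: intercept + slope * Screen
fitted_23 <- intercept + slope_screen * screen_23

# A residual is observed minus fitted, so observed = fitted + residual
residual_23 <- 4.115009
observed_23 <- fitted_23 + residual_23
```

The implied observed Depression score is about 5, roughly 4.1 points above what the model predicts for 0.68 hours of screen time.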
# 2i
newdata = data.frame(Screen=.68)
predict(m2, newdata, interval="prediction", level=.95)
## fit lwr upr
## 1 0.8849914 -6.751506 8.521488
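The width of the prediction interval comes almost entirely from the residual standard error. As a rough check, the interval can be approximated as fit plus or minus the critical t value times the residual standard error. This sketch deliberately ignores the small extra term for uncertainty in the fitted line, so it is an approximation, not the formula `predict()` uses.

```r
# Values copied from the output above
fit <- 0.8849914
resid_std_error <- 3.891   # from summary(m2)
df_resid <- 2034           # from summary(m2)

# Two-sided critical value for a 95% interval
crit <- qt(0.975, df_resid)

# Approximate prediction interval (slightly narrower than the exact one)
approx_lwr <- fit - crit * resid_std_error
approx_upr <- fit + crit * resid_std_error
```

The approximate bounds land within about 0.01 of the reported -6.751506 and 8.521488. The lower bound is negative even though the smallest observed Depression score is 0, which is a limitation of applying a linear model to a bounded score.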
# 2j