STAT 170 HW 1

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

mental_health=read.csv(file.choose(), header=TRUE)  # file.choose() works on every platform; choose.files() is Windows-only
attach(mental_health)

summary(mental_health)

##     User_ID            Age           Gender            UserType        
##  Min.   :   1.0   Min.   :18.00   Length:2036        Length:2036       
##  1st Qu.: 509.8   1st Qu.:18.00   Class :character   Class :character  
##  Median :1018.5   Median :19.00   Mode  :character   Mode  :character  
##  Mean   :1018.5   Mean   :19.05                                        
##  3rd Qu.:1527.2   3rd Qu.:20.00                                        
##  Max.   :2036.0   Max.   :22.00                                        
##    Platform             Screen         Content            Activity        
##  Length:2036        Min.   : 0.500   Length:2036        Length:2036       
##  Class :character   1st Qu.: 2.440   Class :character   Class :character  
##  Mode  :character   Median : 4.310   Mode  :character   Mode  :character  
##                     Mean   : 4.285                                        
##                     3rd Qu.: 6.130                                        
##                     Max.   :11.310                                        
##      Night            Social          Sleep          Anxiety      
##  Min.   :0.0000   Min.   :0.000   Min.   :3.000   Min.   : 0.000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:4.900   1st Qu.: 4.000  
##  Median :0.0000   Median :0.000   Median :5.800   Median : 8.000  
##  Mean   :0.3806   Mean   :0.164   Mean   :5.788   Mean   : 7.592  
##  3rd Qu.:1.0000   3rd Qu.:0.000   3rd Qu.:6.700   3rd Qu.:11.000  
##  Max.   :1.0000   Max.   :1.000   Max.   :9.900   Max.   :21.000  
##    Depression    
##  Min.   : 0.000  
##  1st Qu.: 1.000  
##  Median : 5.000  
##  Mean   : 5.385  
##  3rd Qu.: 9.000  
##  Max.   :22.000

summary(Anxiety)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   8.000   7.592  11.000  21.000

summary(Depression)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   5.000   5.385   9.000  22.000

# The 'Anxiety' variable appears slightly left skewed, since its mean (7.592) is less than its median (8).
# The 'Depression' variable appears slightly right skewed, since its mean (5.385) is greater than its median (5).
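
The mean-versus-median comparison above is a quick heuristic. As a rough cross-check, a moment-based sample skewness puts a number on it. This is only a sketch: `sample_skewness` is a hypothetical helper defined here (it is not a base R function), and the data are simulated rather than taken from the homework file.

```r
# Hypothetical helper: moment-based sample skewness (third central moment
# divided by the second central moment raised to the power 3/2).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^(3 / 2)
}

set.seed(170)
right_skewed <- rexp(2000, rate = 1)  # simulated data with a long right tail
symmetric <- rnorm(2000)              # simulated symmetric data

skew_right <- sample_skewness(right_skewed)
skew_sym <- sample_skewness(symmetric)

# For the right-skewed sample the mean should sit above the median.
mean_gt_median <- mean(right_skewed) > median(right_skewed)

print(c(right = skew_right, symmetric = skew_sym))
```

A positive value points to right skew, a negative value to left skew, and a value near zero to rough symmetry.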

# 1b
par(mfrow=c(3,2))
hist(Anxiety)
hist(Depression)

boxplot(Anxiety)
boxplot(Depression)

qqnorm(Anxiety)
qqline(Anxiety)

qqnorm(Depression)
qqline(Depression)

# From the graphs, both the Anxiety and Depression variables appear to be right skewed. The Depression variable has an outlier that is likely contributing to the higher level of skewness compared to the Anxiety variable which doesn't show any outlier in the boxplot graph.

# 1c
par(mfrow=c(1,2))
hist(Anxiety)
hist(sqrt(Anxiety))

# After the square root transformation, the Anxiety data is closer to a normal distribution and the skewness seen in the original histogram is no longer apparent.

# 1d
par(mfrow=c(2,1))
hist(Depression)
hist(log(Depression))

# After the log transformation, the Depression variable appears closer to normal and less right skewed.
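
One thing to watch with the log transformation: the summary output shows Depression has a minimum of 0, and `log(0)` is `-Inf`. `hist()` only plots finite values, so those rows are left out of the histogram without a warning. The sketch below uses a made-up vector (not the homework data) to show the effect and one possible alternative, `log1p()`, which computes `log(1 + x)`.

```r
# Made-up scores containing zeros, standing in for a variable like Depression.
scores <- c(0, 0, 1, 2, 5, 9, 22)

logged <- log(scores)     # log(0) is -Inf
shifted <- log1p(scores)  # log(1 + x), which stays finite at zero

n_dropped <- sum(!is.finite(logged))   # values a histogram would leave out
all_finite <- all(is.finite(shifted))  # nothing is lost with log1p()

print(n_dropped)
```

Whether shifting by 1 is appropriate depends on the assignment, so treat it as an option rather than a required change.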

# 1e
hist(Screen)
# The Screen variable is fairly normally distributed with slight right skewness.


# 1f
boxplot(Screen ~ UserType)

# Screen time seems to differ by user type: the boxplots have different medians, with Hyper-connected showing the highest (around 7) and Digital Minimalist the lowest (around 1.75).
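
The center line of a boxplot is the group median, so the values read off the plot can be checked numerically with `tapply()`. This is a sketch on simulated data; the group labels "A", "B", "C" and all values are made up and only stand in for columns like UserType and Screen.

```r
# Simulated stand-in for the Screen and UserType columns (values are made up).
set.seed(1)
demo <- data.frame(
  UserType = rep(c("A", "B", "C"), each = 50),
  Screen = c(rnorm(50, 2, 0.5), rnorm(50, 4, 0.5), rnorm(50, 7, 0.5))
)

# One median and one mean per group, in the same order as the boxplots.
group_medians <- tapply(demo$Screen, demo$UserType, median)
group_means <- tapply(demo$Screen, demo$UserType, mean)

print(round(rbind(median = group_medians, mean = group_means), 2))
```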

# 1g
high_screen=subset(mental_health, Screen>8, select = c(Age))  # renamed so the object does not mask the subset() function
head(high_screen)

##    Age
## 7   19
## 34  18
## 40  18
## 42  20
## 56  19
## 99  19
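
For comparison, the same kind of row filter and column selection can be written with bracket indexing. This is a small sketch on a made-up data frame, not the homework data.

```r
# Made-up data frame with the two columns used in the filter.
demo <- data.frame(Age = c(19, 18, 20, 21), Screen = c(9.1, 3.2, 8.5, 7.9))

via_subset <- subset(demo, Screen > 8, select = c(Age))
# drop = FALSE keeps the result as a one-column data frame instead of a vector.
via_brackets <- demo[demo$Screen > 8, "Age", drop = FALSE]

same_result <- isTRUE(all.equal(via_subset, via_brackets))
print(via_brackets)
```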

par(mfrow=c(3,2))
hist(Age)

boxplot(Age)

qqnorm(Age)
qqline(Age)
# The Age plots show a right-skewed distribution with no outliers.

# 2a
attach(mental_health)

## The following objects are masked from mental_health (pos = 3):
## 
##     Activity, Age, Anxiety, Content, Depression, Gender, Night,
##     Platform, Screen, Sleep, Social, User_ID, UserType

cor(Screen, Anxiety)

## [1] 0.6272098

# 0.6272098 is a moderately strong positive correlation; Screen has the strongest correlation with Anxiety of the three variables.
cor(Sleep, Anxiety)

## [1] -0.4379324

# -0.4379324 is a moderate negative correlation.
cor(Age, Anxiety)

## [1] 0.03827854

# 0.03827854 is a very weak positive correlation, close to zero, and the weakest of the three.
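
The three pairwise calls above can also be collected into one correlation matrix, and `cor.test()` adds a significance test for a single pair. The sketch below runs on simulated data; the lowercase column names and the coefficients used to generate them are made up for illustration.

```r
# Simulated stand-ins for screen time, sleep, and anxiety (all values made up).
set.seed(2)
n <- 500
screen <- runif(n, 0.5, 11)
sleep <- 8 - 0.3 * screen + rnorm(n)
anxiety <- 1.5 * screen + rnorm(n, sd = 4)
demo <- data.frame(screen, sleep, anxiety)

cor_matrix <- cor(demo)                             # all pairwise correlations
test_result <- cor.test(demo$screen, demo$anxiety)  # test of H0: correlation = 0

print(round(cor_matrix, 2))
```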

# 2b
plot(Age, Anxiety)

plot(Screen, Anxiety)

plot(Sleep, Anxiety)

# The correlations and the plots agree: the positive correlation between Screen and Anxiety and the negative correlation between Sleep and Anxiety are both visible. Although I did not expect the Age and Anxiety plot to look the way it does, the correlation coefficient of about 0.04 makes sense because the points stack in 'columns' at each integer age.


# 2c
m1= lm(Anxiety ~ Age)  

summary(m1)

## 
## Call:
## lm(formula = Anxiety ~ Age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0585 -3.5843  0.0996  3.4157 13.5738 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   4.5806     1.7463   2.623  0.00878 **
## Age           0.1581     0.0915   1.728  0.08421 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.674 on 2034 degrees of freedom
## Multiple R-squared:  0.001465,   Adjusted R-squared:  0.0009743 
## F-statistic: 2.985 on 1 and 2034 DF,  p-value: 0.08421

# 2d
# Equation: predicted Anxiety = 4.5806 + 0.1581 * Age
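
To see the equation in use, the sketch below plugs in an age by hand. The coefficients are copied from the `summary(m1)` output; `predict_anxiety` is a hypothetical helper, and the age of 20 is just an example value.

```r
# Coefficients copied from the summary(m1) output.
b0 <- 4.5806  # intercept
b1 <- 0.1581  # slope for Age

# Hypothetical helper: fitted Anxiety for a given age.
predict_anxiety <- function(age) b0 + b1 * age

predicted_at_20 <- predict_anxiety(20)
print(predicted_at_20)  # 7.7426
```

With the fitted model in the session, `coef(m1)` would return the same two coefficients without retyping them.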

# 2e
m2= lm(Depression ~ Screen)  

summary(m2)

## 
## Call:
## lm(formula = Depression ~ Screen)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4820  -2.6447  -0.6219   2.3826  13.8267 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03636    0.18170    0.20    0.841    
## Screen       1.24799    0.03732   33.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.891 on 2034 degrees of freedom
## Multiple R-squared:  0.3547, Adjusted R-squared:  0.3544 
## F-statistic:  1118 on 1 and 2034 DF,  p-value: < 2.2e-16
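
In a simple linear regression with one predictor, the Multiple R-squared in the summary equals the squared correlation between the predictor and the response. The sketch below checks this on simulated data (the variables `x` and `y` are made up, not the homework columns).

```r
# Simulated predictor and response (values are made up).
set.seed(4)
x <- runif(200, 0.5, 11)
y <- 1.2 * x + rnorm(200, sd = 4)

fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared  # R-squared reported by the model
squared_cor <- cor(x, y)^2           # squared Pearson correlation

print(c(r_squared = r_squared, squared_cor = squared_cor))
```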

# 2f 
dim(mental_health)

## [1] 2036   13

alpha=.05

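# Ho: B1 = 0 (the slope of Screen in m2 is zero)
# Ha: B1 != 0
# Note: summary(m2) reports 2034 residual degrees of freedom (n - 2); the 2022 used below gives almost the same critical value.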

qt(.025, 2022)

## [1] -1.961138

qt(.975, 2022)

## [1] 1.961138

(1-pt(1.961138, 2022))*2

## [1] 0.04999999

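# Decision: the t value for Screen in summary(m2) is 33.44, far beyond the critical value of 1.961, and its p-value (<2e-16) is below alpha, so reject Ho. The 95% confidence interval for the slope also excludes 0.
confint(m2, level=.95)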

##                  2.5 %    97.5 %
## (Intercept) -0.3199737 0.3926877
## Screen       1.1748020 1.3211814
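
The interval for Screen can be rebuilt by hand as estimate plus or minus the critical t value times the standard error. The sketch below uses the rounded estimate and standard error copied from `summary(m2)` and the 2034 residual degrees of freedom it reports, so the result should match `confint()` only up to rounding.

```r
# Values copied from the summary(m2) output (rounded as printed there).
estimate <- 1.24799
std_error <- 0.03732
df_resid <- 2034

t_crit <- qt(0.975, df_resid)                       # two-sided 95% critical value
ci <- estimate + c(-1, 1) * t_crit * std_error      # lower and upper limits

t_stat <- estimate / std_error                      # test statistic for Ho: B1 = 0
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

print(round(ci, 4))
print(round(t_stat, 2))
```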

# 2g
confint(m2, level=.99)

##                  0.5 %    99.5 %
## (Intercept) -0.4321019 0.5048159
## Screen       1.1517711 1.3442123

# 2h
Screen[23]

## [1] 0.68

predict(m2)[23]

##        23 
## 0.8849914

residuals(m2)[23]

##       23 
## 4.115009
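
The fitted value and residual for observation 23 are connected by simple arithmetic: fitted = intercept + slope * Screen, and residual = observed - fitted. The sketch below uses the rounded coefficients from `summary(m2)`, so it matches the output above only up to rounding; the observed Depression value is implied by these numbers rather than printed in the source.

```r
# Values copied from the output above (coefficients are rounded as printed).
b0 <- 0.03636
b1 <- 1.24799
screen_23 <- 0.68
residual_23 <- 4.115009

fitted_23 <- b0 + b1 * screen_23          # should be close to predict(m2)[23]
observed_23 <- fitted_23 + residual_23    # implied observed Depression score

print(round(c(fitted = fitted_23, observed = observed_23), 3))
```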

# 2i 
newdata = data.frame(Screen=.68)
predict(m2, newdata, interval="prediction", level=.95)

##         fit       lwr      upr
## 1 0.8849914 -6.751506 8.521488
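
The prediction interval is much wider than the confidence interval for the slope because it includes the residual variation of a single new observation. A sketch of the standard simple-regression formula, checked against `predict()` on simulated data (the variables `x` and `y` are made up; only the value 0.68 is borrowed from above):

```r
# Simulated predictor and response (values are made up).
set.seed(3)
x <- runif(100, 0.5, 11)
y <- 1.2 * x + rnorm(100, sd = 4)
fit <- lm(y ~ x)

x0 <- 0.68                 # new predictor value
n <- length(x)
s <- summary(fit)$sigma    # residual standard error

# Point prediction and its standard error for a single new observation.
fit0 <- unname(coef(fit)[1] + coef(fit)[2] * x0)
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

manual <- fit0 + c(-1, 1) * qt(0.975, n - 2) * se_pred
builtin <- predict(fit, data.frame(x = x0), interval = "prediction", level = 0.95)

print(manual)
print(builtin)
```

The leading 1 under the square root is the extra term for the new observation's own error; dropping it gives the narrower confidence interval for the mean response.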

# 2j

2026-02-16
