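# Load the full alien data set.
aliens <- read.csv("aliens.csv", header = TRUE, stringsAsFactors = TRUE)

# Draw a reproducible sample of aliens, seeded by the student ID.
make.my.sample <- function(studentID, n, data) {
  RNGversion("3.2.1")  # match the random number generator settings of R 3.2.1
  set.seed(studentID)
  sample_values <- sample(size = n, c(1:nrow(data)))  # n values drawn from 1..nrow(data)
  my_sample <- data[data$ID %in% sample_values, ]     # keep rows whose ID was drawn
  return(my_sample)
}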
my_sample <- suppressWarnings(make.my.sample(33002176, 100, aliens))
library(lsr)

Question 1

plot(my_sample$anxiety, my_sample$depression)

cor(my_sample$anxiety, my_sample$depression)
## [1] 0.6263347

I would say this is a positive linear association. With r = 0.63 it is moderate rather than very strong, but there is a clear positive correlation.

Question 2

regress.1 <- lm(my_sample$depression~my_sample$anxiety)
summary(regress.1)
## 
## Call:
## lm(formula = my_sample$depression ~ my_sample$anxiety)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.7262  -5.2236   0.2738   5.7102  16.8985 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        30.9601     8.7168   3.552  0.00059 ***
## my_sample$anxiety   1.3753     0.1729   7.954 3.18e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.517 on 98 degrees of freedom
## Multiple R-squared:  0.3923, Adjusted R-squared:  0.3861 
## F-statistic: 63.26 on 1 and 98 DF,  p-value: 3.183e-12
plot(my_sample$anxiety, my_sample$depression)
abline(regress.1)

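The slope is 1.3753. This tells me that each one-point increase in anxiety is associated with a predicted increase of about 1.38 points in the depression score, so aliens with more anxiety tend to have more depression. The null hypothesis being tested is that the slope is zero, meaning anxiety has no linear relationship with the dependent variable, depression. I would reject this because the p-value for the slope (3.18e-12) is far below 0.05.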
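
The decision to reject can be read directly from the coefficient table rather than judged by eye. The sketch below is only an illustration: `demo` is a made-up data frame standing in for `my_sample`, and the 0.05 cutoff is the conventional significance level, assumed here rather than stated in the assignment.

```r
# Hypothetical data standing in for my_sample (values are made up, not the alien data).
demo <- data.frame(anxiety = seq(30, 70, length.out = 50))
demo$depression <- 31 + 1.4 * demo$anxiety + 8 * sin(seq_len(50))

# Fit the regression with a formula and a data argument.
fit <- lm(depression ~ anxiety, data = demo)

# coef(summary(fit)) is the coefficient table printed by summary().
coefs <- coef(summary(fit))
slope <- coefs["anxiety", "Estimate"]
p_value <- coefs["anxiety", "Pr(>|t|)"]

# Reject the null hypothesis (slope = 0) when the p-value is below 0.05.
reject_null <- p_value < 0.05
print(c(slope = slope, p_value = p_value))
print(reject_null)
```

The same two lookups would apply to a model fitted on the real sample, with the row name matching whatever the predictor is called in that model's summary.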

Question 3

Based on my regression line, the predicted depression score for an alien with an anxiety score of 60 would be about 113.48. I got this by multiplying the slope of 1.3753 by 60 (giving 82.518) and then adding the intercept of 30.9601.
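
A prediction from a regression line uses both coefficients: predicted depression = intercept + slope × anxiety. As a quick check, the arithmetic can be done in R with the estimates copied from the summary output above (the variable names here are just for illustration):

```r
# Coefficients copied from the summary(regress.1) output for depression ~ anxiety.
intercept <- 30.9601
slope <- 1.3753

# Anxiety score we want a prediction for.
anxiety_score <- 60

# Predicted value on the regression line: intercept + slope * x.
predicted_depression <- intercept + slope * anxiety_score
print(predicted_depression)
## [1] 113.4781
```

If the model were instead fitted as `lm(depression ~ anxiety, data = my_sample)`, the same value could be obtained with `predict()` and `newdata = data.frame(anxiety = 60)`.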

Question 4

Do Questions 1-2 again, but now we’re interested in how the sociable variable might depend on anxiety.

plot(my_sample$anxiety, my_sample$sociable)

cor(my_sample$anxiety, my_sample$sociable)
## [1] -0.5737381

I would say this is a negative linear association. With r = -0.57 it is moderate rather than very strong, but there is a clear negative correlation.

regress.1 <- lm(my_sample$sociable~my_sample$anxiety)
summary(regress.1)
## 
## Call:
## lm(formula = my_sample$sociable ~ my_sample$anxiety)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.971  -5.578   0.821   5.944  26.465 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        170.206     10.286  16.548  < 2e-16 ***
## my_sample$anxiety   -1.415      0.204  -6.935 4.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.05 on 98 degrees of freedom
## Multiple R-squared:  0.3292, Adjusted R-squared:  0.3223 
## F-statistic: 48.09 on 1 and 98 DF,  p-value: 4.384e-10
plot(my_sample$anxiety, my_sample$sociable)
abline(regress.1)

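The slope is -1.415. This tells me that each one-point increase in anxiety is associated with a predicted decrease of about 1.42 points in the sociable score, so the more anxious the aliens are, the less sociable they tend to be. The null hypothesis being tested is that the slope is zero, meaning anxiety has no linear relationship with the dependent variable, sociable. I would reject this because the p-value for the slope (4.38e-10) is far below 0.05.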

Question 5

Based on everything I've done so far, I would expect the depression and sociable variables to be negatively associated. Anxiety is positively associated with depression and negatively associated with sociable, so aliens with higher depression scores should tend to have lower sociable scores.

regress.1 <- lm(my_sample$sociable~my_sample$depression)
summary(regress.1)
## 
## Call:
## lm(formula = my_sample$sociable ~ my_sample$depression)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.9500  -1.1252   0.7963   1.7993  18.0379 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          200.38726    4.94501   40.52   <2e-16 ***
## my_sample$depression  -1.01208    0.04918  -20.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.319 on 98 degrees of freedom
## Multiple R-squared:  0.8121, Adjusted R-squared:  0.8101 
## F-statistic: 423.4 on 1 and 98 DF,  p-value: < 2.2e-16
plot(my_sample$depression, my_sample$sociable)
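abline(regress.1)

The regression agrees with this expectation: the slope is -1.01208 (p < 2e-16) and R-squared is 0.8121, so the association between depression and sociable is negative and strong.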

Question 6

my_sample_new <- my_sample
my_sample_new[100,9:10] <- c(20, 140)
plot(my_sample_new$anxiety, my_sample_new$depression)

Comparing these new results with what I got for the original Questions 1-2, I conclude that correlation and regression are not hugely affected by changing a single data point, because the overall relationship between depression and anxiety among the aliens stays the same.
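
One way to test a sensitivity claim like this is to compute the correlation and the slope before and after the change and compare the numbers. The sketch below uses a made-up data frame, `demo`, not the alien sample, so its values say nothing about the real results. It only illustrates the comparison, with row 100 replaced by the same point used above (anxiety 20, depression 140).

```r
# Hypothetical data standing in for my_sample (values are made up, not the alien data).
demo <- data.frame(anxiety = seq(30, 70, length.out = 100))
demo$depression <- 31 + 1.4 * demo$anxiety + 8 * sin(seq_len(100))

# Copy the data and overwrite one row, selecting columns by name rather than position.
demo_new <- demo
demo_new[100, c("anxiety", "depression")] <- c(20, 140)

# Correlation before and after changing the single point.
r_old <- cor(demo$anxiety, demo$depression)
r_new <- cor(demo_new$anxiety, demo_new$depression)

# Regression slope before and after changing the single point.
slope_old <- coef(lm(depression ~ anxiety, data = demo))["anxiety"]
slope_new <- coef(lm(depression ~ anxiety, data = demo_new))["anxiety"]

print(c(r_old = r_old, r_new = r_new))
print(c(slope_old = unname(slope_old), slope_new = unname(slope_new)))
```

In this made-up example the altered point has low anxiety but very high depression, so both the correlation and the slope come out smaller after the change. Running the same comparison on `my_sample` and `my_sample_new` would show how large the effect is for the real data.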