aliens <- read.csv("aliens.csv", header = TRUE, stringsAsFactors = TRUE)
# Draw a reproducible sample of n rows, seeded by the student ID.
make.my.sample <- function(studentID, n, data) {
  RNGversion("3.2.1")  # reproduce the random number behaviour of R 3.2.1
  set.seed(studentID)
  sample_values <- sample(1:nrow(data), size = n)
  my_sample <- data[data$ID %in% sample_values, ]
  return(my_sample)
}
my_sample <- suppressWarnings(make.my.sample(33002176, 100, aliens))
library(lsr)
plot(my_sample$anxiety, my_sample$depression)
cor(my_sample$anxiety, my_sample$depression)
## [1] 0.6263347
I would say this is a positive linear association. At r = 0.63 it is moderate rather than very strong, but the correlation is clear.
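One way to back up a statement about the strength of a correlation is `cor.test()`, which reports a confidence interval and a p-value alongside r. The sketch below uses simulated stand-in values (the names `anxiety` and `depression` here are hypothetical vectors, not the aliens data), so its numbers will not match the output above.

```r
# Hypothetical stand-in data: the real my_sample comes from aliens.csv.
set.seed(1)
anxiety <- rnorm(100, mean = 50, sd = 5)
depression <- 31 + 1.4 * anxiety + rnorm(100, sd = 8.5)

result <- cor.test(anxiety, depression)  # Pearson correlation by default
result$estimate   # sample correlation r
result$conf.int   # 95% confidence interval for the population correlation
result$p.value    # p-value for H0: the population correlation is 0
```

With the real sample, the same call would be `cor.test(my_sample$anxiety, my_sample$depression)`.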
regress.1 <- lm(my_sample$depression~my_sample$anxiety)
summary(regress.1)
##
## Call:
## lm(formula = my_sample$depression ~ my_sample$anxiety)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.7262 -5.2236 0.2738 5.7102 16.8985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.9601 8.7168 3.552 0.00059 ***
## my_sample$anxiety 1.3753 0.1729 7.954 3.18e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.517 on 98 degrees of freedom
## Multiple R-squared: 0.3923, Adjusted R-squared: 0.3861
## F-statistic: 63.26 on 1 and 98 DF, p-value: 3.183e-12
plot(my_sample$anxiety, my_sample$depression)
abline(regress.1)
The slope is 1.3753, which tells me that for each one-point increase in
anxiety, the predicted depression score increases by about 1.38 points.
The null hypothesis being tested is that the true slope is zero, meaning
anxiety has no linear relationship with depression. I reject this because
the p-value for the slope (3.18e-12) is far below 0.05.
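The test of the slope can be reproduced by hand from the coefficient table: the t value is the estimate divided by its standard error, and the p-value comes from a t distribution with the residual degrees of freedom. This is a small sketch using the rounded values printed above, so the result agrees with the summary only up to rounding.

```r
# Values copied from the coefficient table of the depression ~ anxiety model.
estimate  <- 1.3753
std_error <- 0.1729
df_resid  <- 98

t_value <- estimate / std_error             # about 7.95
p_value <- 2 * pt(-abs(t_value), df_resid)  # two-sided p-value
c(t = t_value, p = p_value)
```

A p-value this small is the evidence for rejecting the null hypothesis that the slope is zero.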
Based on the regression line, the predicted depression score for an alien with an anxiety score of 60 would be about 113.48. I got this by adding the intercept of 30.9601 to the slope of 1.3753 multiplied by 60: 30.9601 + 1.3753 * 60 = 113.478.
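A prediction from a fitted line uses both the intercept and the slope. As a quick check, the arithmetic can be done directly with the coefficients reported in the summary output above (the variable names below are only for illustration):

```r
# Coefficients copied from the summary(regress.1) output above.
intercept <- 30.9601
slope     <- 1.3753

# Predicted depression = intercept + slope * anxiety
predicted_depression <- intercept + slope * 60
predicted_depression  # 113.4781
```

If the model were instead fitted with a `data` argument, for example `lm(depression ~ anxiety, data = my_sample)`, then `predict()` with `newdata = data.frame(anxiety = 60)` would do the same calculation from the unrounded coefficients.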
Do Questions 1-2 again, but now we’re interested in how the sociable variable might depend on anxiety.
plot(my_sample$anxiety, my_sample$sociable)
cor(my_sample$anxiety, my_sample$sociable)
## [1] -0.5737381
I would say this is a negative linear association. At r = -0.57 it is moderate rather than very strong, but the correlation is clear.
regress.1 <- lm(my_sample$sociable~my_sample$anxiety)
summary(regress.1)
##
## Call:
## lm(formula = my_sample$sociable ~ my_sample$anxiety)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.971 -5.578 0.821 5.944 26.465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 170.206 10.286 16.548 < 2e-16 ***
## my_sample$anxiety -1.415 0.204 -6.935 4.38e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.05 on 98 degrees of freedom
## Multiple R-squared: 0.3292, Adjusted R-squared: 0.3223
## F-statistic: 48.09 on 1 and 98 DF, p-value: 4.384e-10
plot(my_sample$anxiety, my_sample$sociable)
abline(regress.1)
The slope is -1.415, which tells me that for each one-point increase in
anxiety, the predicted sociable score decreases by about 1.42 points.
The null hypothesis being tested is that the true slope is zero, meaning
anxiety has no linear relationship with sociable. I reject this because
the p-value for the slope (4.38e-10) is far below 0.05.
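In a regression with a single predictor, the Multiple R-squared equals the square of the correlation between the two variables, which gives a simple consistency check between the `cor()` results and the two summaries above. A short sketch using the printed correlations:

```r
# Correlations copied from the cor() output above.
r_depression <- 0.6263347   # cor(anxiety, depression)
r_sociable   <- -0.5737381  # cor(anxiety, sociable)

# Squared correlations, to compare with Multiple R-squared in each summary.
r_squared <- round(c(depression = r_depression^2, sociable = r_sociable^2), 4)
r_squared  # depression 0.3923, sociable 0.3292
```

So anxiety accounts for roughly 39% of the variation in depression and roughly 33% of the variation in sociable in this sample.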
Based on everything I’ve done so far, I would expect the depression and sociable variables to be negatively associated: higher anxiety goes with higher depression and with lower sociability, so aliens with higher depression scores are likely to have lower sociable scores.
regress.1 <- lm(my_sample$sociable~my_sample$depression)
summary(regress.1)
##
## Call:
## lm(formula = my_sample$sociable ~ my_sample$depression)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.9500 -1.1252 0.7963 1.7993 18.0379
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 200.38726 4.94501 40.52 <2e-16 ***
## my_sample$depression -1.01208 0.04918 -20.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.319 on 98 degrees of freedom
## Multiple R-squared: 0.8121, Adjusted R-squared: 0.8101
## F-statistic: 423.4 on 1 and 98 DF, p-value: < 2.2e-16
plot(my_sample$depression, my_sample$sociable)
abline(regress.1)
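No correlation was computed for this pair of variables, but in a one-predictor regression it can be recovered from the summary: its size is the square root of Multiple R-squared and its sign matches the sign of the slope. A minimal sketch using the rounded values printed above:

```r
# Values copied from the sociable ~ depression summary above.
r_squared <- 0.8121
slope     <- -1.01208

# Correlation: magnitude from R-squared, sign from the slope.
r <- sign(slope) * sqrt(r_squared)
r  # about -0.90
```

This supports the expected negative association and shows it is considerably stronger than either relationship with anxiety. With the real sample, `cor(my_sample$depression, my_sample$sociable)` would give the unrounded value.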
my_sample_new <- my_sample
my_sample_new[100,9:10] <- c(20, 140)
plot(my_sample_new$anxiety, my_sample_new$depression)
Based on the comparison between the new scatterplot and what I got for the original Questions 1-2, I conclude that correlation and regression are not hugely affected by changing a single data point in this sample, because the overall positive relationship between depression and anxiety among the aliens stays the same. This judgement rests on the scatterplot alone, since the correlation and regression were not recomputed for my_sample_new.
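A scatterplot alone cannot show how far the fit has moved, and a single point that lies far from the others in the predictor can have a large influence on both the correlation and the slope. One way to check is to recompute both before and after the change. The sketch below does this on simulated stand-in data (the data frame `sim` and its values are hypothetical, and the columns are addressed by name rather than by position), so the numbers illustrate the method and are not results for the aliens sample.

```r
# Simulated stand-in for my_sample (hypothetical values, not the aliens data).
set.seed(42)
sim <- data.frame(anxiety = rnorm(100, mean = 50, sd = 5))
sim$depression <- 31 + 1.4 * sim$anxiety + rnorm(100, sd = 8.5)

# Replace row 100 with the same values used above, addressing columns by name.
sim_new <- sim
sim_new[100, c("anxiety", "depression")] <- c(20, 140)

# Refit and compare the correlation and slope before and after the change.
fit_orig <- lm(depression ~ anxiety, data = sim)
fit_new  <- lm(depression ~ anxiety, data = sim_new)
comparison <- data.frame(
  data  = c("original", "one point replaced"),
  r     = c(cor(sim$anxiety, sim$depression),
            cor(sim_new$anxiety, sim_new$depression)),
  slope = c(coef(fit_orig)[["anxiety"]], coef(fit_new)[["anxiety"]])
)
print(comparison)
```

Running the equivalent `cor()` and `lm()` calls on `my_sample` and `my_sample_new` would show whether the same thing happens in the real sample.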