Question 1: Calc Scores
After performing a regression test comparing section A’s scores on tests 1 and 4, I determined for a number of reasons that the correlation is not strong. First off, the p-value extracted from the regression of test 1 scores against test 4 scores, 1.84e-05, is much smaller than the significance level of .05, so I must reject the null hypothesis that the scores are uncorrelated. That result only says the relationship is statistically significant; it does not say the relationship is strong. Additionally, the residuals vs fitted chart does not look like what a well-behaved regression produces: the lower score values are distributed on this chart very differently from the higher score values. Furthering this point is the Normal Q-Q chart, which has most of the data points falling on the line, but the points with theoretical quantiles between -1 and -2 are concerning because they are too far from the line.
plot(scores$B1, scores$B4, col="blue", main="Scores on Exam 1 and Exam 4 in Math 157 B", xlab="Exam 1 Score", ylab="Exam 4 Score")
n<-lm(scores$B4~scores$B1)
abline(n)

summary(n)
##
## Call:
## lm(formula = scores$B4 ~ scores$B1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.442  -6.869   3.051   8.624  23.508
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  22.0561    15.3364   1.438  0.16188
## scores$B1     0.6884     0.1911   3.603  0.00125 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.71 on 27 degrees of freedom
## Multiple R-squared: 0.3247, Adjusted R-squared: 0.2997
## F-statistic: 12.98 on 1 and 27 DF, p-value: 0.001252
plot(n)




After performing a regression test comparing section B’s scores on tests 1 and 4, I determined for a number of reasons that the correlation is not strong. First off, the p-value for the slope of test 4 scores on test 1 scores, 0.00125, is much smaller than the significance level of .05, so I must reject the null hypothesis that the scores are uncorrelated. The relationship is statistically significant, but the R-squared of 0.3247 means test 1 scores explain only about a third of the variation in test 4 scores. Additionally, the residuals vs fitted chart does not look like what a well-behaved regression produces. There is no clean, flat band of residuals on this chart. Rather, the points form a sort of wave, which casts doubt on a simple linear fit. Furthering this point is the Normal Q-Q chart, which has most of the data points falling on the line, but the points with theoretical quantiles between -1 and -2 are concerning because they are too far from the line.
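Significance and strength are separate questions. For a regression with one predictor, the correlation coefficient is the square root of R-squared (carrying the sign of the slope), so the section B output implies r = sqrt(0.3247), roughly 0.57. The sketch below uses simulated scores to show how the two quantities come out of the same fit; the data frame `sim` and its columns `exam1` and `exam4` are made up for illustration and are not the class data.

```r
# Simulated exam scores; these are NOT the Math 157 data.
set.seed(157)
sim <- data.frame(exam1 = rnorm(29, mean = 80, sd = 10))
sim$exam4 <- 20 + 0.7 * sim$exam1 + rnorm(29, sd = 12)

fit <- lm(exam4 ~ exam1, data = sim)

# Significance: p-value for H0 "slope = 0" (no linear relationship).
slope_p <- summary(fit)$coefficients["exam1", "Pr(>|t|)"]

# Strength: correlation coefficient and R-squared.
r <- cor(sim$exam1, sim$exam4)
r_squared <- summary(fit)$r.squared

# cor.test() tests the same null hypothesis as the slope t-test.
ct <- cor.test(sim$exam1, sim$exam4)

c(slope_p = slope_p, cor_test_p = ct$p.value, r = r, r_squared = r_squared)
```

A small p-value here rejects "no linear relationship", while `r` and `r_squared` describe how tightly the points follow the line.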
Another interesting analysis would require gathering data on how much each student studied for each test. That data could show which section studied more, which test was studied for the most, and how much influence study time had on exam success. A further test, possible with the data already provided, would be to compare the variances of the two sections and determine which class has the more volatile scores.
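The variance comparison could be sketched with the F test in base R's `var.test()`. The two vectors below, `section_a` and `section_b`, are invented stand-ins for a section's scores on one exam, so the numbers say nothing about the real classes.

```r
# Hypothetical scores for two sections; NOT the class data.
set.seed(4)
section_a <- rnorm(30, mean = 78, sd = 8)
section_b <- rnorm(29, mean = 78, sd = 13)

# Sample variance of each section.
c(var_a = var(section_a), var_b = var(section_b))

# F test of H0: the two sections have equal variance.
# Note: this test assumes each section's scores are roughly normal.
vt <- var.test(section_a, section_b)
vt$p.value
```

A p-value below .05 would be evidence that one section's scores are more spread out than the other's.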
Question 2: Lion Hunting
Given the data on lion hunting success rates by group size and prey, the obvious question to answer is whether group size correlates with hunting success rate. In order to test this, I will do regression tests for group size against gazelle success, group size against Wildebeest and Zebra success, group size against other success, and group size against mean success.
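Running the same regression against several response columns can be written as a loop. This is only a sketch: the data frame `hunts`, its column names, and all of its values are hypothetical, and treating the mean success rate as the row mean of the three prey columns is my assumption rather than something stated in the data.

```r
# Hypothetical success rates by group size; all values are invented.
hunts <- data.frame(
  group_size = c(1, 2, 3, 4, 6),
  gazelle    = c(0.15, 0.31, 0.27, 0.35, 0.32),
  wz         = c(0.15, 0.19, 0.12, 0.43, 0.40),
  other      = c(0.19, 0.20, 0.30, 0.25, 0.40)
)
# Assumption: "mean success" is the average of the three prey columns.
hunts$mean_rate <- rowMeans(hunts[, c("gazelle", "wz", "other")])

# Fit success ~ group_size for each response and keep the slope p-value.
responses <- c("gazelle", "wz", "other", "mean_rate")
slope_p <- sapply(responses, function(resp) {
  fit <- lm(reformulate("group_size", response = resp), data = hunts)
  summary(fit)$coefficients["group_size", "Pr(>|t|)"]
})
slope_p
```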
library(readxl)
gazelle <- read_excel("~/R/win-library/3.4/gudatavizfa17/data/MathStatsCalcScores/gazelle.xlsx")
WildebeestandZebra <- read_excel("~/R/win-library/3.4/gudatavizfa17/data/MathStatsCalcScores/WildebeestandZebra.xlsx")
Other <- read_excel("~/R/win-library/3.4/gudatavizfa17/data/MathStatsCalcScores/Other.xlsx")
LionHunting <- read_excel("~/R/win-library/3.4/gudatavizfa17/data/MathStatsCalcScores/Lion Hunting.xlsx")
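# Scatterplot: group size on the x axis, gazelle success on the y axis
plot(LionHunting$'Group Size', LionHunting$'G Successes', col="blue", main="Group Size Against Gazelle Success", xlab="Group Size", ylab="Gazelle Success")
# Note: this formula uses group size as the response, the reverse of the plot's axes
q<-lm(LionHunting$'Group Size'~LionHunting$'G Successes')
abline(q)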

summary(q)
##
## Call:
## lm(formula = LionHunting$"Group Size" ~ LionHunting$"G Successes")
##
## Residuals:
##       1       2       3       4       5
##  0.1310 -1.6699 -1.1292  0.7973  1.8708
##
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 -1.842      3.178  -0.580    0.603
## LionHunting$"G Successes"   17.915     10.766   1.664    0.195
##
## Residual standard error: 1.655 on 3 degrees of freedom
## Multiple R-squared: 0.48, Adjusted R-squared: 0.3066
## F-statistic: 2.769 on 1 and 3 DF, p-value: 0.1947
plot(q)



## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
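In the gazelle model, the formula puts group size on the left of `~`, so it predicts group size from success rate, while the scatterplot has group size on the horizontal axis. For `abline()` to draw the fitted line on that plot, the response in the formula would need to be the plotted y variable. A sketch with a made-up data frame follows; `hunts` and its values are hypothetical, and only the two column names mirror the ones used in the real data.

```r
# Hypothetical stand-in for LionHunting; the values are invented.
hunts <- data.frame(
  `Group Size`  = c(1, 2, 3, 4, 6),
  `G Successes` = c(0.15, 0.31, 0.27, 0.35, 0.32),
  check.names = FALSE
)

# Response (y axis) on the left of ~, predictor (x axis) on the right.
fit_y_on_x <- lm(`G Successes` ~ `Group Size`, data = hunts)

plot(hunts$`Group Size`, hunts$`G Successes`,
     xlab = "Group Size", ylab = "Gazelle Success")
abline(fit_y_on_x)  # the line now matches the plotted axes

# Swapping the roles changes the coefficients but not the slope's p-value.
fit_x_on_y <- lm(`Group Size` ~ `G Successes`, data = hunts)
p_y_on_x <- summary(fit_y_on_x)$coefficients[2, "Pr(>|t|)"]
p_x_on_y <- summary(fit_x_on_y)$coefficients[2, "Pr(>|t|)"]
all.equal(p_y_on_x, p_x_on_y)
```

Because the slope's p-value is the same in either orientation for a one-predictor model, the significance conclusion does not depend on which way the formula is written; only the coefficients and the drawn line do.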

This was likely the most relevant of the tests I have performed on question 2. After performing this regression test, I fail to reject the null hypothesis that there is no correlation between group size and mean success rate. The p-value extracted was 0.0976, which is greater than the significance level of .05. Again, apart from single outliers, the residuals vs fitted and Normal Q-Q charts are consistent with what the p-value shows.
Now that I have tested group size against success rate, I would like to approach the problem from the other direction and try to determine which group size is most efficient for which prey. Additionally, hunts were for the most part more successful with larger group sizes, but that does not necessarily mean the lions go less hungry, since the meal is spread across more mouths. I would like to see which group size results in the lions having the fullest stomachs.
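One rough way to frame the "fullest stomachs" question is the expected share of a kill per lion. The sketch below rests on two simplifying assumptions of mine, not on anything in the data: each successful hunt yields one kill, and that kill is split evenly across the group. The data frame `hunts` and its values are invented.

```r
# Hypothetical success rates by group size; invented values.
hunts <- data.frame(
  group_size   = c(1, 2, 3, 4, 6),
  success_rate = c(0.15, 0.31, 0.27, 0.35, 0.32)
)

# Assumption: one kill per successful hunt, split evenly among the group.
hunts$per_lion <- hunts$success_rate / hunts$group_size

# Group size with the highest expected share per lion.
best <- hunts$group_size[which.max(hunts$per_lion)]
best
```

With real data, prey size would also matter, since a shared zebra may feed each lion better than an unshared gazelle.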
Question 3: Life Expectancy with Contraceptive Use
After performing a regression test comparing life expectancies and contraceptive use of different countries, I have some reservations about the strength of the correlation. First off, the p-value extracted from the regression test was < 2e-16. This value is much smaller than the significance level of .05, so I must reject the null hypothesis that life expectancy and contraceptive use are uncorrelated; the relationship is statistically significant. Significance alone does not tell me how strong the relationship is, however, and the Normal Q-Q chart gives some reason for caution. While a majority of the data points fall on the reference line, the values with theoretical quantiles less than -1 and greater than 2 drift away from the line, which is concerning. That is enough to question how well the linear model fits.
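A visual read of the Q-Q chart can be supplemented with a formal check on the residuals. The sketch below uses the Shapiro-Wilk test from base R on simulated data; the data frame `countries` and its columns `contraceptive_use` and `life_expectancy` are hypothetical names with made-up values, not the data set analyzed here.

```r
# Simulated country data; NOT the real data set.
set.seed(16)
countries <- data.frame(contraceptive_use = runif(100, min = 5, max = 85))
countries$life_expectancy <- 50 + 0.3 * countries$contraceptive_use +
  rnorm(100, sd = 5)

fit <- lm(life_expectancy ~ contraceptive_use, data = countries)

# Q-Q plot of the residuals with a reference line.
res <- residuals(fit)
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of H0: the residuals are normally distributed.
sw <- shapiro.test(res)
sw$p.value
```

A small p-value from `shapiro.test()` would back up the concern about the tails; a large one would suggest the drift seen in the chart is within what normal residuals can produce.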
Other factors that would be relevant to explore are how much access each country has to contraceptives and how many children each family has on average. I think life expectancy could be more strongly correlated with contraceptive access than with contraceptive use.