7.6
From the scatter plot, the ages of husbands and wives in this survey have a strong, positive correlation.
From the scatter plot, the heights of husbands and wives in this survey have a very weak correlation.
Which plot shows a stronger correlation? Explain your reasoning. The age scatter plot shows a much stronger correlation, since its points cluster more tightly around an upward-sloping line.
No. Assuming the conversion is a change of units (for example, centimeters to inches), it would not affect the correlation between husbands’ and wives’ heights. Rescaling both variables by a positive constant changes their variances, but correlation is unitless and stays the same.
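As a quick numerical check of how a change of units interacts with correlation, here is a minimal sketch. The heights are simulated and purely hypothetical (they are not the survey data), and the variable names are made up for illustration.

```r
# Hypothetical, simulated heights in centimeters (not the survey data).
set.seed(1)
husband_cm <- rnorm(50, mean = 175, sd = 7)
wife_cm <- 0.3 * husband_cm + rnorm(50, mean = 110, sd = 6)

# Convert both variables from centimeters to inches.
husband_in <- husband_cm / 2.54
wife_in <- wife_cm / 2.54

# Correlation before and after the unit conversion.
r_cm <- cor(husband_cm, wife_cm)
r_in <- cor(husband_in, wife_in)

# The two correlations agree up to floating-point error.
all.equal(r_cm, r_in)
```

Dividing by 2.54 shrinks the spread of both variables, but the standardization inside the correlation formula cancels that factor out.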
7.12
There is a moderate, positive correlation in the data.
There is a strong, positive correlation.
Diameter, since the correlation between volume and diameter is stronger.
7.18
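If men always made $5,000 more than women, there would be a perfect positive correlation (r = 1) between men’s and women’s salaries, since every point would fall exactly on a line with positive slope.
If men always made 25% more than women, there would again be a perfect positive correlation (r = 1) between men’s and women’s salaries.
If men always made 15% less than women, the correlation would still be perfectly positive (r = 1), not negative: men’s salaries would equal 0.85 times women’s salaries, which is a line with positive slope.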
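The three salary scenarios can be checked numerically. The sketch below uses a small set of hypothetical women's salaries (illustrative values only, not from any data set) and builds the men's salaries from them under each rule.

```r
# Hypothetical women's salaries in dollars (illustrative values only).
women <- c(32000, 41000, 45500, 58000, 73000)

men_plus_5000 <- women + 5000     # always $5,000 more
men_plus_25pct <- 1.25 * women    # always 25% more
men_minus_15pct <- 0.85 * women   # always 15% less

# Each relationship is an exact line with positive slope.
cor(women, men_plus_5000)
cor(women, men_plus_25pct)
cor(women, men_minus_15pct)
```

In each case the men's salary is an exact linear function of the women's salary with a positive slope, so the correlation depends only on the sign of that slope and not on whether men earn more or less.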
7.24
There is a moderate, positive correlation in the data.
Calories are the explanatory variable and carbs are the response variable.
We fit a regression line so that we can predict the carbs in an item from its calories.
From the scatter plot, the data do follow a linear trend. However, the data do not meet all of the conditions required for fitting a least squares line.
7.30
HeartWeight = -0.357 + (4.034)BodyWeight
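An intercept of -0.357 means that a cat weighing 0 kg would be predicted to have a heart weighing -0.357 g. This value is not meaningful in context; the intercept only sets the height of the line.
A slope of 4.034 means that for every 1 kg increase in a cat’s body weight, the predicted heart weight increases by 4.034 g.
About 64% of the variation in cats’ heart weights is explained by body weight.
Calculate the correlation coefficient. The correlation coefficient is 0.8041, the positive square root of R-squared, since the slope is positive.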
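For a simple linear regression, R-squared equals the square of the correlation coefficient, and the correlation takes the sign of the slope. A short check using only the values reported above (the variable names are just for illustration):

```r
r <- 0.8041      # correlation coefficient reported above
slope <- 4.034   # slope of the fitted line reported above

# Squaring r recovers R-squared (about 64 to 65 percent).
r_squared <- r^2
round(r_squared, 4)

# The correlation and the slope share the same sign.
sign(r) == sign(slope)
```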
7.36
There is a moderate, positive correlation in the data.
BAC = -0.0127 + (0.018)Cans
The intercept shows that a person who has consumed 0 cans of beer is predicted to have a BAC of -0.0127. A negative BAC is impossible, so the intercept has no practical meaning here.
The slope shows that for every additional can of beer a person drinks, their predicted BAC rises by 0.018.
With a small p-value, the data provide strong evidence that drinking more cans of beer is associated with an increase in BAC.
This means that 79% of the variation in BAC is explained by the number of cans of beer consumed.
7.42
head_circumference = 3.91 + 0.78 x (28) = 25.75 cm
t = (0.78 - 0)/0.35 = 2.229, df = 23, one-sided p-value = 0.0178 < 0.05
This model provides strong evidence that gestational age is positively associated with head circumference.
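The prediction and the test statistic above can be reproduced with a few lines of R. This is a sketch that assumes a one-sided alternative (slope greater than zero), which is what the reported p-value appears to correspond to; the variable names are illustrative.

```r
b0 <- 3.91      # intercept
b1 <- 0.78      # slope for gestational age
se_b1 <- 0.35   # standard error of the slope
df <- 23        # degrees of freedom used above

# Predicted head circumference at a gestational age of 28 weeks.
predicted <- b0 + b1 * 28

# Test statistic for H0: slope = 0.
t_stat <- (b1 - 0) / se_b1

# One-sided p-value (assumed alternative: slope > 0).
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)

predicted
round(t_stat, 3)
p_one_sided
```

A two-sided test would double this p-value, so the choice of alternative should be stated alongside the result.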
8.2
The model predicts that a child who is not first born will weigh 1.93 ounces less at birth than a first-born child.
The p-value for parity is 0.1052, which is greater than 0.05, so the data do not provide evidence of a statistically significant relationship.
8.4
Absenteeism = 18.93 - 9.11 * (ethnic background) + 3.10 * (sex) + 2.15 * (learner status)
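Ethnic background: The model predicts 9.11 fewer days absent for students whose ethnic background indicator equals 1, all else held constant.
Sex: The model predicts 3.10 more days absent for male students, all else held constant.
Learner status: The model predicts 2.15 more days absent for slow learners, all else held constant.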
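# Predicted days absent: ethnic background = 0, sex = 1, learner status = 1
Absent <- 18.93 - 9.11 * (0) + 3.10 * (1) + 2.15 * (1)
# Residual = observed (2 days) - predicted
Residual <- 2 - Absent
paste("Residual for this student: ", Residual)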
## [1] "Residual for this student: -22.18"
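# R-squared = 1 - Var(residuals) / Var(outcome)
R2.Ab <- 1 - (240.57)/(264.17)
# Adjusted R-squared with n = 146 observations and k = 3 predictors
R2.Ab.adj <- 1 - (240.57/264.17)*((146-1)/(146-3-1))
paste("R-squared: ", round(R2.Ab,4)) ;paste("R-squared adjusted: ", round(R2.Ab.adj,4))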
## [1] "R-squared: 0.0893"
## [1] "R-squared adjusted: 0.0701"
8.6
dia<-c(83,86,88,105,107,108,110,110,111,112,113,114,114,117,120,129,129,133,137,138,140,142,145,160,163,173,175,179,180,180,206)
ht<-c(70,65,63,72,81,83,66,75,80,75,79,76,76,69,75,74,85,86,71,64,78,80,74,72,77,81,82,80,80,80,87)
vol<-c(103,103,102,164,188,197,156,182,226,199,242,210,214,213,191,222,338,274,257,249,345,317,363,383,426,554,557,583,515,510,770)
dia<-dia/10
vol<-vol/10
meas <- cbind(dia,ht,vol)
meas
## dia ht vol
## [1,] 8.3 70 10.3
## [2,] 8.6 65 10.3
## [3,] 8.8 63 10.2
## [4,] 10.5 72 16.4
## [5,] 10.7 81 18.8
## [6,] 10.8 83 19.7
## [7,] 11.0 66 15.6
## [8,] 11.0 75 18.2
## [9,] 11.1 80 22.6
## [10,] 11.2 75 19.9
## [11,] 11.3 79 24.2
## [12,] 11.4 76 21.0
## [13,] 11.4 76 21.4
## [14,] 11.7 69 21.3
## [15,] 12.0 75 19.1
## [16,] 12.9 74 22.2
## [17,] 12.9 85 33.8
## [18,] 13.3 86 27.4
## [19,] 13.7 71 25.7
## [20,] 13.8 64 24.9
## [21,] 14.0 78 34.5
## [22,] 14.2 80 31.7
## [23,] 14.5 74 36.3
## [24,] 16.0 72 38.3
## [25,] 16.3 77 42.6
## [26,] 17.3 81 55.4
## [27,] 17.5 82 55.7
## [28,] 17.9 80 58.3
## [29,] 18.0 80 51.5
## [30,] 18.0 80 51.0
## [31,] 20.6 87 77.0
df = n - 2 = 29. Because n is small, we use the t table to find the two-tailed multiplier at a = 0.05 (0.025 in each tail).
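Instead of reading the multiplier from a printed t table, it can be computed with `qt()`. A minimal sketch, taking n = 31 from the number of rows in the data above and keeping the degrees of freedom as stated:

```r
n <- 31          # number of trees in the data above
df <- n - 2      # degrees of freedom as stated above
alpha <- 0.05    # two-tailed significance level

# Upper-tail critical value with alpha / 2 in each tail.
t_star <- qt(1 - alpha / 2, df = df)
round(t_star, 3)
```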
8.8
Learner status should be removed from the model first, since removing it yields a higher adjusted R^2.
8.10
The p-value for ethnicity is less than 0.05, making it the only significant variable, so we add it first. It also contributes substantially to R^2.
8.12
I think the company should use the p-value approach, since it is useful for selecting variables.
8.14
From the graph, almost all of the observations lie along the line in the normal quantile plot, apart from the first few values. The nearly normal residuals condition therefore appears to be reasonably met.
8.18
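p <- function(temp) {
  # Log-odds of O-ring failure from the logistic model
  log_odds <- 11.6630 - 0.2162 * temp
  # Inverse logit converts log-odds to a probability
  p_hat <- exp(log_odds) / (1 + exp(log_odds))
  return(round(p_hat * 100, 2))  # percentage, two decimals
}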
paste("O-Ring Failure Probability at Temp=51 F: ", p(51),"%") ;
## [1] "O-Ring Failure Probability at Temp=51 F: 65.4 %"
paste("O-Ring Failure Probability at Temp=53 F: ", p(53),"%") ;
## [1] "O-Ring Failure Probability at Temp=53 F: 55.09 %"
paste("O-Ring Failure Probability at Temp=55 F: ", p(55),"%")
## [1] "O-Ring Failure Probability at Temp=55 F: 44.32 %"
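The inverse-logit step can also be done with base R's `plogis()`, which is vectorized, so all three temperatures are handled in one call. A small sketch using the same coefficients as above (the names `temps`, `log_odds`, and `p_hat` are illustrative):

```r
temps <- c(51, 53, 55)

# Log-odds of failure from the logistic model coefficients used above.
log_odds <- 11.6630 - 0.2162 * temps

# plogis() is the inverse logit: exp(x) / (1 + exp(x)).
p_hat <- plogis(log_odds)

# Probabilities as percentages, matching the values printed above.
round(100 * p_hat, 2)
```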
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)
ShuttleMission <- data.frame(temperature, damaged, undamaged)
library(ggplot2)
ggplot(ShuttleMission, aes(x = temperature, y = damaged / (damaged + undamaged), weight = damaged + undamaged)) + geom_point() + stat_smooth(method = 'glm', method.args = list(family = 'binomial'))
## `geom_smooth()` using formula 'y ~ x'
temp.x <- seq(from = 51, to = 71, by = 2)
y <- c(p(51)/100, p(53)/100, p(55)/100, 0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024)
plot(temp.x, y, type = "o", col = "red")
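The coefficients 11.6630 and -0.2162 used in `p()` were taken as given. As a hedged sketch, a logistic model can also be fit directly to the mission data with `glm()`, treating each mission as a set of damaged and undamaged O-rings. If these are the data behind the given model, the estimates should come out close to those coefficients, but that is an assumption rather than something verified here.

```r
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

# Binomial response given as (successes, failures) counts per mission.
fit <- glm(cbind(damaged, undamaged) ~ temperature, family = binomial)

# Estimated intercept and slope on the log-odds scale.
coef(fit)

# Fitted failure probabilities at a few temperatures.
predict(fit, newdata = data.frame(temperature = c(51, 53, 55)), type = "response")
```

Using `cbind(damaged, undamaged)` as the response lets `glm()` weight each mission by its number of O-rings, which a plain 0/1 response would not capture.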