### First install the package with the command below
### install.packages("UsingR") 
### and then load it with the command below
library(UsingR)

Problem 10.4 page 68

Load the Simple data set cancer. Look only at cancer[["stomach"]]. These are survival times for stomach cancer patients taking a large dosage of Vitamin C. Test the null hypothesis that the median is 100 days. Should you also use a t-test? Why or why not? (A boxplot of the cancer data is interesting.)

### First, some exploratory diagnostic plots:

simple.eda(cancer$stomach)

### The plots show that we have only a few observations and that they are strongly right-skewed, not normally distributed.
### A t-test is therefore not appropriate here: it tests the mean and relies on approximate normality, which a small, skewed sample does not support.
### For that reason we apply a Wilcoxon signed-rank test.
wilcox.test(cancer$stomach, mu=100, alternative = c("two.sided"))
## 
##  Wilcoxon signed rank test
## 
## data:  cancer$stomach
## V = 61, p-value = 0.3054
## alternative hypothesis: true location is not equal to 100
### Note that the p-value is about 0.31, so the null hypothesis is not rejected at conventional significance levels (e.g. 5%).

### For reference, the median of the sample is:

median(cancer$stomach)
## [1] 124
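
The Wilcoxon signed-rank test above assumes the distribution is symmetric about its median, which the skew seen in the plots calls into question. As a cross-check that is not part of the original solution, a sign test makes no symmetry assumption: under the null hypothesis each observation lies above 100 with probability 0.5, so the count of observations above 100 can be tested with `binom.test()`. This is a minimal sketch; the vector `stomach` is typed in by hand so the block runs without UsingR and is assumed to match `cancer$stomach` (it reproduces the median of 124 and V = 61 reported above).

```r
# Sign test for H0: median = 100 (no symmetry assumption required).
# ASSUMPTION: these 13 values are taken to match cancer$stomach from UsingR;
# with the package loaded, use stomach <- cancer$stomach instead.
stomach <- c(124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396)
m0 <- 100                          # hypothesized median

n_above <- sum(stomach > m0)       # observations above the hypothesized median
n_used  <- sum(stomach != m0)      # observations exactly equal to m0 are dropped

# Under H0, n_above ~ Binomial(n_used, 0.5)
sign_result <- binom.test(n_above, n_used, p = 0.5, alternative = "two.sided")
sign_result$p.value
```

If the sign test and the Wilcoxon test lead to the same decision, the conclusion does not hinge on the symmetry assumption.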

Problem 13.1 page 83

Make a scatterplot, and fit the data with a regression line. On the same graph, test the hypothesis that an extra bedroom costs $60,000 against the alternative that it costs more.

price<-c(300,250,400,550,317,389,425,289,389,559)
no.bedrooms<-c(3,3,4,5,4,3,6,3,4,5)

plot(no.bedrooms, price); abline(lm(price~no.bedrooms))

lm(price~no.bedrooms)
## 
## Call:
## lm(formula = price ~ no.bedrooms)
## 
## Coefficients:
## (Intercept)  no.bedrooms  
##        94.4         73.1
### Or, using simple.lm() from UsingR:
lm.result<-simple.lm(no.bedrooms, price)

summary(lm.result)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -108.00  -53.95   -5.75   59.77   99.10 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    94.40      97.98   0.963   0.3635  
## x              73.10      23.76   3.076   0.0152 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 75.15 on 8 degrees of freedom
## Multiple R-squared:  0.5419, Adjusted R-squared:  0.4846 
## F-statistic: 9.462 on 1 and 8 DF,  p-value: 0.01521
### Finally, we test whether the slope is 60 (in thousands of dollars) against the alternative that it is larger.

es <- resid(lm.result)  # residuals of the fitted model
b1 <- coef(lm.result)[["x"]]  # estimated slope
n <- length(price)
s <- sqrt(sum(es^2) / (n - 2))  # residual standard error
SE <- s / sqrt(sum((no.bedrooms - mean(no.bedrooms))^2))  # standard error of the slope

t.stat <- (b1 - 60) / SE  # 60 is the hypothesized slope; named t.stat to avoid masking base R's t()

pt(t.stat, n - 2, lower.tail = FALSE)  # right-tail p-value with 10 - 2 = 8 degrees of freedom
## [1] 0.2982602
### Based on the p-value (about 0.30) we do not reject the null hypothesis: the data are consistent with an extra bedroom costing $60,000
### and give no evidence that it costs more.
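
The same one-sided test can be cross-checked without recomputing the standard error by hand, because `summary()` of an `lm` fit already reports the slope estimate and its standard error. The sketch below is an alternative route, not the original solution's method; the names `fit`, `se_b1`, `t_stat` and `p_value` are introduced here for illustration only.

```r
# Data from Problem 13.1 (prices in thousands of dollars)
price <- c(300, 250, 400, 550, 317, 389, 425, 289, 389, 559)
no.bedrooms <- c(3, 3, 4, 5, 4, 3, 6, 3, 4, 5)

fit <- lm(price ~ no.bedrooms)

# Coefficient table: one row per term, columns include
# "Estimate" and "Std. Error"
coefs <- coef(summary(fit))
b1    <- coefs["no.bedrooms", "Estimate"]
se_b1 <- coefs["no.bedrooms", "Std. Error"]

# H0: slope = 60 versus H1: slope > 60
t_stat  <- (b1 - 60) / se_b1
p_value <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)
p_value
```

Reading the standard error from the coefficient table should agree with the manual calculation from the residuals, which makes it a quick check that the formula was typed correctly.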