### First, install the package with the command below:
### install.packages("UsingR")
### and then load it with the command below
library(UsingR)
Load the Simple data set cancer. Look only at cancer[['stomach']]. These are survival times (in days) for stomach cancer patients taking a large dosage of vitamin C. Test the null hypothesis that the median is 100 days. Should you also use a t-test? Why or why not? (A boxplot of the cancer data is interesting.)
### First, some diagnostic plots (histogram, boxplot and normal plot):
simple.eda(cancer$stomach)
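Alongside the plots, a quick numerical check of the skewness argument can help. This is a sketch: the vector below is assumed to be the 13 entries of cancer$stomach from UsingR (it reproduces the sample median of 124 and the statistic V = 61 reported below), and the Shapiro-Wilk test is an addition that the original solution does not use.

```r
# Assumed copy of cancer$stomach from the UsingR package (survival times in days);
# with UsingR loaded, x <- cancer$stomach gives the same vector.
x <- c(124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396)

# With right-skewed data the mean sits far above the median.
print(c(mean = mean(x), median = median(x)))

# Shapiro-Wilk test of normality: a small p-value is evidence against normality.
sw <- shapiro.test(x)
print(sw$p.value)
```

The mean (286 days) is more than twice the median (124 days), so a test about the mean answers a different question than a test about the median.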
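### The plots show only 13 observations, strongly skewed to the right, so they clearly do not follow the normal distribution.
### A t-test is therefore not a proper test here: it is a test about the mean and relies on approximate normality, and with data this skewed the mean is pulled well above the median.
### For that reason we apply the Wilcoxon signed rank test instead.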
wilcox.test(cancer$stomach, mu = 100, alternative = "two.sided")
##
## Wilcoxon signed rank test
##
## data: cancer$stomach
## V = 61, p-value = 0.3054
## alternative hypothesis: true location is not equal to 100
### The p-value is about 0.31, well above 0.05, so we do not reject the null hypothesis that the median survival time is 100 days.
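The statistic V is the sum of the ranks of |x - 100| over the observations that lie above 100. A minimal sketch reproducing V and its exact two-sided p-value with psignrank(), assuming the same stomach values as cancer$stomach (there are no ties and no values equal to 100, so the exact test applies):

```r
x <- c(124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396)  # assumed copy of cancer$stomach
d <- x - 100                 # differences from the hypothesized median
r <- rank(abs(d))            # rank the absolute differences
V <- sum(r[d > 0])           # signed rank statistic: sum of ranks of the positive differences
n <- length(d)

# V is above its null mean n(n+1)/4, so double the upper tail P(V >= observed V)
p <- min(1, 2 * psignrank(V - 1, n, lower.tail = FALSE))
print(c(V = V, p.value = p))
```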
### For reference, the sample median is:
median(cancer$stomach)
## [1] 124
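The signed rank test assumes the differences are symmetric about the median, which is doubtful for data this skewed. A sign test needs no symmetry: under H0 each survival time exceeds 100 days with probability 1/2, so the number above 100 is Binomial(13, 1/2). This is a swapped-in cross-check, not part of the original solution, sketched with binom.test() and the same assumed stomach values:

```r
x <- c(124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396)  # assumed copy of cancer$stomach
above <- sum(x > 100)      # patients surviving more than 100 days
n <- sum(x != 100)         # values equal to 100 would be dropped (none here)
st <- binom.test(above, n, p = 0.5, alternative = "two.sided")
print(st$p.value)
```

Here 8 of the 13 patients survived more than 100 days, and the sign test also fails to reject a median of 100 days.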
Make a scatterplot, and fit the data with a regression line. On the same graph, test the hypothesis that an extra bedroom costs $60,000 against the alternative that it costs more.
price <- c(300, 250, 400, 550, 317, 389, 425, 289, 389, 559)   # house prices, in thousands of dollars
no.bedrooms <- c(3, 3, 4, 5, 4, 3, 6, 3, 4, 5)                 # number of bedrooms
plot(no.bedrooms, price); abline(lm(price~no.bedrooms))
lm(price~no.bedrooms)
##
## Call:
## lm(formula = price ~ no.bedrooms)
##
## Coefficients:
## (Intercept) no.bedrooms
## 94.4 73.1
### Or, equivalently, using simple.lm() from UsingR:
lm.result<-simple.lm(no.bedrooms, price)
summary(lm.result)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -108.00 -53.95 -5.75 59.77 99.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 94.40 97.98 0.963 0.3635
## x 73.10 23.76 3.076 0.0152 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75.15 on 8 degrees of freedom
## Multiple R-squared: 0.5419, Adjusted R-squared: 0.4846
## F-statistic: 9.462 on 1 and 8 DF, p-value: 0.01521
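One way to show the hypothesis on the same graph, as a sketch: a least-squares line with an intercept passes through the point of means, so a line with the hypothesized slope of 60 ($60,000 per bedroom) can be drawn through (mean(no.bedrooms), mean(price)) next to the fitted line. The dashed line, axis labels and the name b0.h0 are illustrative choices, not part of the original exercise.

```r
price <- c(300, 250, 400, 550, 317, 389, 425, 289, 389, 559)   # in thousands of dollars
no.bedrooms <- c(3, 3, 4, 5, 4, 3, 6, 3, 4, 5)

fit <- lm(price ~ no.bedrooms)
plot(no.bedrooms, price, xlab = "Number of bedrooms", ylab = "Price ($1000s)")
abline(fit)                                   # fitted line, slope about 73.1

# Hypothesized line: slope 60 through the point of means (4, 386.8)
b0.h0 <- mean(price) - 60 * mean(no.bedrooms)
abline(a = b0.h0, b = 60, lty = 2)
legend("topleft", legend = c("fitted", "slope = 60"), lty = c(1, 2))
```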
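### Finally, we test H0: slope = 60 (prices are in thousands, so this is $60,000 per extra bedroom) against H1: slope > 60, using t = (b1 - 60)/SE(b1) with n - 2 degrees of freedom.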
es <- resid(lm.result)                                      # the residuals of our model
b1 <- coef(lm.result)[["x"]]                                # estimated slope (extra price per bedroom, in $1000s)
s <- sqrt(sum(es^2) / (length(price) - 2))                  # residual standard error
SE <- s / sqrt(sum((no.bedrooms - mean(no.bedrooms))^2))    # standard error of the slope
t.stat <- (b1 - 60) / SE                                    # 60 is the hypothesized value we compare the slope b1 with
pt(t.stat, length(price) - 2, lower.tail = FALSE)           # right-tail p-value with 10 - 2 = 8 degrees of freedom
## [1] 0.2982602
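### The p-value (about 0.30) is well above 0.05, so we do not reject the null hypothesis. The data are consistent with an extra bedroom costing $60,000 and give no significant evidence that it costs more; failing to reject does not, however, prove that the $60,000 figure is correct.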
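As a cross-check, the same one-sided test can be read off the standard coefficient table, and a one-sided 95% lower confidence bound for the slope is the lower end of a two-sided 90% interval from confint(). A sketch, refitting with lm() so that the slope is named no.bedrooms rather than x:

```r
price <- c(300, 250, 400, 550, 317, 389, 425, 289, 389, 559)   # in thousands of dollars
no.bedrooms <- c(3, 3, 4, 5, 4, 3, 6, 3, 4, 5)
fit <- lm(price ~ no.bedrooms)

ct <- coef(summary(fit))                  # estimates, std. errors, t and p values
b1 <- ct["no.bedrooms", "Estimate"]
se <- ct["no.bedrooms", "Std. Error"]
t.stat <- (b1 - 60) / se                  # H0: slope = 60 vs H1: slope > 60
p.value <- pt(t.stat, df = df.residual(fit), lower.tail = FALSE)

# Lower end of a two-sided 90% interval = one-sided 95% lower bound for the slope
lower <- confint(fit, "no.bedrooms", level = 0.90)[1, 1]
print(c(t = t.stat, p.value = p.value, lower.bound = lower))
```

The lower bound comes out near 28.9, below 60, which agrees with the test: a price of $60,000 per extra bedroom is not ruled out.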