1 Question 1

1.1 Question of interest/goal of the study

The aim was to build a model explaining how the time students spend on screen-based digital devices after school relates to their year at school. We wanted to investigate the relationship between after-school screen time and school year, and in particular to identify whether one of the four year groups has the lowest average after-school screen time. If so, we wanted to identify that year and quantify the increase in the other years relative to it.

1.2 Inspect the data

screen.df <- read.csv("screentime.csv",header=TRUE,stringsAsFactors = TRUE)
screen.df$Year <- factor(screen.df$Year, levels = c("Y4","Y7","Y10","Y13"))
boxplot(Screen~Year,screen.df)

summaryStats(Screen ~ Year,data=screen.df)
##     Sample Size     Mean Median   Std Dev Midspread
## Y4           30 1.596667   1.60 0.9226959     1.150
## Y7           30 1.776667   1.70 1.1443062     1.900
## Y10          30 3.066667   3.05 1.0446360     1.325
## Y13          30 3.473333   3.75 1.4548216     1.975

1.3 Comment on plots and summary statistics

Screen time is much higher in Y13 than in Y4. Students in Y4 and Y7 averaged less than 2 hours (means of about 1.6 and 1.8), while students in Y10 and Y13 averaged more than 3 hours (means of about 3.1 and 3.5). Y13 has one outlying screen time value above 6. The spread is broadly similar across years, with standard deviations between about 0.9 and 1.5.

1.5 Fit an appropriate linear Model and Check Assumptions

screen.fit1=lm(Screen ~ Year,data=screen.df)

plot(screen.fit1,which=1)

normcheck(screen.fit1)

cooks20x(screen.fit1)

summary(screen.fit1)
## 
## Call:
## lm(formula = Screen ~ Year, data = screen.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5733 -0.8817  0.0683  0.6333  4.1267 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.5967     0.2115   7.549 1.09e-11 ***
## YearY7        0.1800     0.2991   0.602    0.549    
## YearY10       1.4700     0.2991   4.914 2.95e-06 ***
## YearY13       1.8767     0.2991   6.274 6.27e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.159 on 116 degrees of freedom
## Multiple R-squared:  0.3343, Adjusted R-squared:  0.3171 
## F-statistic: 19.42 on 3 and 116 DF,  p-value: 2.866e-10
summary1way(screen.fit1)
## ANOVA Table:
##                 Df  Sum Squares  Mean Square  F-statistic  p-value   
## Between Groups  3   78.175       26.05833     19.41546     0         
## Within Groups   116 155.68867    1.34214                             
## Total           119 233.86367                                        
## 
## Numeric Summary:
##           Sample size     Mean  Median  Std Dev  Midspread
## All Data          120  2.47833    2.30  1.40187      2.000
## Y4                 30  1.59667    1.60  0.92270      1.150
## Y7                 30  1.77667    1.70  1.14431      1.900
## Y10                30  3.06667    3.05  1.04464      1.325
## Y13                30  3.47333    3.75  1.45482      1.975
## 
## Table of Effects: (GrandMean and deviations from GM)
##  typ.val       Y4       Y7      Y10      Y13 
##  2.47833 -0.88167 -0.70167  0.58833  0.99500
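The ANOVA table above can be reproduced from the group summaries alone, which is a useful check on how the F-statistic and R-squared are built. The following is a minimal base-R sketch, assuming only the sample sizes, means and standard deviations printed by summaryStats(); the variable names are illustrative and not part of the original analysis.

```r
# Group summaries copied from the summaryStats() output above
n     <- c(Y4 = 30, Y7 = 30, Y10 = 30, Y13 = 30)
means <- c(Y4 = 1.596667, Y7 = 1.776667, Y10 = 3.066667, Y13 = 3.473333)
sds   <- c(Y4 = 0.9226959, Y7 = 1.1443062, Y10 = 1.0446360, Y13 = 1.4548216)

# Grand mean weighted by group size
grand_mean <- sum(n * means) / sum(n)

# Between-groups and within-groups sums of squares
ss_between <- sum(n * (means - grand_mean)^2)
ss_within  <- sum((n - 1) * sds^2)

# Degrees of freedom: k - 1 and N - k
df_between <- length(n) - 1
df_within  <- sum(n) - length(n)

# F-statistic, its p-value, and the proportion of variation explained
f_stat    <- (ss_between / df_between) / (ss_within / df_within)
p_value   <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
r_squared <- ss_between / (ss_between + ss_within)

round(c(SSB = ss_between, SSW = ss_within, F = f_stat, R2 = r_squared), 4)
```

The p-value is far smaller than the table's display precision, which is why summary1way() prints it as 0.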

multipleComp(screen.fit1)
##               Estimate Tukey.L Tukey.U Tukey.p
## Y4  -  Y7   -0.1800000 -0.9597  0.5997  0.9313
## Y4  -  Y10  -1.4700000 -2.2497 -0.6903  0.0000
## Y4  -  Y13  -1.8766667 -2.6564 -1.0969  0.0000
## Y7  -  Y10  -1.2900000 -2.0697 -0.5103  0.0002
## Y7  -  Y13  -1.6966667 -2.4764 -0.9169  0.0000
## Y10  -  Y13 -0.4066667 -1.1864  0.3731  0.5272
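To see where the Tukey limits come from, the interval for one comparison can be rebuilt by hand. This is a sketch in base R, assuming the within-groups mean square (1.34214) and residual degrees of freedom (116) from the ANOVA table and equal group sizes of 30; it is not a description of how multipleComp() is implemented internally.

```r
# Quantities copied from the output above
mse      <- 1.34214   # within-groups mean square
n        <- 30        # students per year group
k        <- 4         # number of year groups
df_resid <- 116       # residual degrees of freedom

# Standard error of a difference between two group means
se_diff <- sqrt(mse * (1 / n + 1 / n))

# Tukey half-width: studentized range quantile scaled by 1/sqrt(2)
q_crit     <- qtukey(0.95, nmeans = k, df = df_resid)
half_width <- q_crit / sqrt(2) * se_diff

# Interval for the Y7 - Y10 comparison (estimate -1.29)
est_y7_y10 <- -1.29
ci_y7_y10  <- est_y7_y10 + c(-1, 1) * half_width
round(ci_y7_y10, 4)
```

Because every group has the same size, the same half-width applies to all six comparisons.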

1.6 Method and Assumption Checks

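Screen time was the response variable and the student's school year, a factor with four levels, was the explanatory variable, so the model fitted above is a one-way ANOVA rather than a simple regression on a numeric predictor. The model had no control variables. The residual plot gave no clear cause for concern, and the group standard deviations (0.92 to 1.45) are similar enough for the equal-variance assumption to be reasonable.

The R-squared value of 0.3343 means that school year explained about 33% of the total variation in screen time, so a good deal of variation between individual students remains unexplained. The overall F-test was nevertheless highly significant (F = 19.42 on 3 and 116 degrees of freedom, p-value = 2.866e-10, displayed as 0 in the ANOVA table), which is well below the 0.05 level of significance, so there is strong evidence that mean screen time differs between at least some of the years.

Our final model is Screen = 1.5967 + 0.1800 YearY7 + 1.4700 YearY10 + 1.8767 YearY13, where each Year term is an indicator that equals 1 for students in that year and 0 otherwise. The intercept, 1.5967, is the estimated mean screen time for the baseline year Y4, and the other coefficients are differences from Y4. Equivalently, in the Table of Effects the grand mean is 2.47833 and the year effects are -0.88167 (Y4), -0.70167 (Y7), 0.58833 (Y10) and 0.99500 (Y13). The negative effects for Y4 and Y7 mean that these years averaged less screen time than the grand mean, and the positive effects for Y10 and Y13 mean that these years averaged more.

The researcher is interested in differences between consecutive school year groups (so comparing Year 4 to Year 7, Year 7 to Year 10 and Year 10 to Year 13). Which of these comparisons, if any, show significant changes in screen time? For any that are significant, how big are the changes? From the Tukey-adjusted comparisons, only the change from Y7 to Y10 is significant (p-value = 0.0002): we estimate that Y10 students average between 0.51 and 2.07 hours more screen time than Y7 students. There is no evidence of a difference between Y4 and Y7 (p-value = 0.9313) or between Y10 and Y13 (p-value = 0.5272).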
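The coefficients printed by summary() and the Table of Effects printed by summary1way() describe the same four group means in two different parameterizations. The base-R sketch below, which uses only the group means from the summaryStats() output and illustrative variable names, shows how each set of numbers is obtained.

```r
# Group means copied from the summaryStats() output above
grp_means <- c(Y4 = 1.596667, Y7 = 1.776667, Y10 = 3.066667, Y13 = 3.473333)

# Baseline (treatment) coding, as reported by summary() for lm():
# the intercept is the Y4 mean, the rest are differences from Y4
coefs <- c(grp_means["Y4"], grp_means[-1] - grp_means["Y4"])
names(coefs) <- c("(Intercept)", "YearY7", "YearY10", "YearY13")

# Effects coding, as in the Table of Effects:
# grand mean plus each year's deviation from it
# (a plain mean of the group means is valid here because every group has n = 30)
grand_mean <- mean(grp_means)
effects    <- grp_means - grand_mean

round(coefs, 4)
round(effects, 5)
```

Either form recovers the fitted mean for a year: for example, the Y10 mean is the intercept plus the YearY10 coefficient, or the grand mean plus the Y10 effect.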

1.7 Executive Summary

We were interested in how after-school screen time relates to a student's year at school. We found strong evidence that average screen time differs between years. Students in Y10 and Y13 averaged more screen time than students in Y4 and Y7. Y4 had the lowest average, but it was not significantly different from Y7. The only significant change between consecutive year groups was the increase from Y7 to Y10, estimated at between 0.5 and 2.1 hours, and there was no evidence of a difference between Y10 and Y13. Year at school explained only about a third of the variation in screen time, so the relationship, while clearly present, is not a strong one, and screen time still varies considerably between students in the same year. A possible explanation for the higher screen time in Y10 and Y13 is the research that students do in their final years, which requires more time on computers and other devices after school.


2 Question 2

2.1 Question of interest/goal of the study

It was of interest to see how the brand of club affects the distance the golf ball travels, and to see whether any brand effect depends on the type of club: 5 iron or driver. Another objective was to see whether distances were consistent between the two clubs and whether one brand is better than the others.

2.2 Read in and inspect the data

golf.df <- read.table("golf2.txt", header = TRUE,stringsAsFactors = TRUE)
head(golf.df)
##     Club Brand Distance
## 1 Driver     A    226.4
## 2 Driver     A    232.6
## 3 Driver     A    234.0
## 4 Driver     A    220.7
## 5 5.Iron     A    163.8
## 6 5.Iron     A    179.4
golf.df$Club <- factor(golf.df$Club, levels = c("Driver","5.Iron"))
golf.df$Brand <- factor(golf.df$Brand, levels = c("A","B","C"))
comb.df<-cbind(golf.df$Club, golf.df$Brand)
boxplot(Distance~Club + Brand ,golf.df)

summaryStats(Distance ~ Club + Brand,data=golf.df)
##        Sample Size     Mean Median  Std Dev Midspread
## Driver          12 235.0833  235.6 7.740547     9.650
## 5.Iron          12 173.7167  176.2 9.230664    12.475

2.3 Comment on the plots

The box plots show that, for every brand, drivers hit the ball farther than 5 irons. The 5-iron distances appear similar across the brands, whereas the driver distances appear to increase progressively from Brand A to Brand C.

2.4 Fit model and check assumptions

Distance.fit1=lm(Distance ~ Club+Brand,data=golf.df)

plot(Distance.fit1,which=1)

normcheck(Distance.fit1)

cooks20x(Distance.fit1)

summary(Distance.fit1)
## 
## Call:
## lm(formula = Distance ~ Club + Brand, data = golf.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.954  -5.648   2.019   4.579  11.079 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  230.546      3.306  69.730  < 2e-16 ***
## Club5.Iron   -61.367      3.306 -18.561 4.47e-14 ***
## BrandB         8.338      4.049   2.059   0.0528 .  
## BrandC         5.275      4.049   1.303   0.2075    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.099 on 20 degrees of freedom
## Multiple R-squared:  0.9458, Adjusted R-squared:  0.9376 
## F-statistic: 116.3 on 3 and 20 DF,  p-value: 7.927e-13
summary2way(Distance.fit1)
## ANOVA Table:
##           Df Sum Squares Mean Square F-statistic p-value
## Club       1     22595.2     22595.2      344.50 0.00000
## Brand      2       284.6       142.3        2.17 0.14037
## Residuals 20      1311.8        65.6
summary2way(
  Distance.fit1,
  page = c("table", "means", "effects", "interaction", "nointeraction"),
  digit = 4,
  conf.level = 0.95,
  print.out = TRUE,
  new = TRUE,
  all = FALSE,
  FUN = "identity"
)
## ANOVA Table:
##           Df Sum Squares Mean Square F-statistic p-value
## Club       1     22595.2     22595.2      344.50 0.00000
## Brand      2       284.6       142.3        2.17 0.14037
## Residuals 20      1311.8        65.6
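The Brand line of the ANOVA table and a confidence interval for the club effect can both be recovered from the printed output. This is a minimal base-R sketch, assuming the sums of squares, estimate and standard error shown above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_brand <- 284.6;  df_brand <- 2
ss_resid <- 1311.8; df_resid <- 20

# F-test for Brand: ratio of mean squares, upper-tail p-value
f_brand <- (ss_brand / df_brand) / (ss_resid / df_resid)
p_brand <- pf(f_brand, df_brand, df_resid, lower.tail = FALSE)

# 95% confidence interval for the Club5.Iron coefficient,
# using the estimate and standard error from summary()
est_club <- -61.367
se_club  <- 3.306
club_ci  <- est_club + c(-1, 1) * qt(0.975, df_resid) * se_club

round(c(F = f_brand, p = p_brand), 4)
round(club_ci, 2)
```

The interval lies well below zero, in line with the very small p-value for Club.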

2.5 Methods and assumption checks

I fitted a two-way ANOVA model, with no interaction term, to evaluate the effects of club type and brand on distance. Distance is the response variable, and Club and Brand are the explanatory variables. The residual plot looked acceptable, although the spread of the residuals was not perfectly even. Normality of the residuals was checked with a histogram, which looked approximately normal.

The final fitted model was Distance = 230.546 - 61.367 Club5.Iron + 8.338 BrandB + 5.275 BrandC, where Club5.Iron, BrandB and BrandC are indicator variables. The intercept, 230.546, is the estimated mean distance for the baseline combination, a driver of Brand A. The other coefficients are main effects, not interactions: with a 5 iron the estimated mean distance is 61.367 lower than with a driver of the same brand, and Brands B and C have estimated mean distances 8.338 and 5.275 higher than Brand A for the same club.

The significance of the terms was assessed with the two-way ANOVA table. Club was significant at the 5% level (p-value displayed as 0.00000), whereas Brand was not significant, since its p-value of 0.14037 is higher than the 0.05 level of significance. Because the model contains no Club:Brand interaction term, it cannot show on its own whether any brand effect depends on the type of club.
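Whether a brand effect depends on the type of club is a question about the Club:Brand interaction, which the additive formula Club + Brand does not estimate. The sketch below shows one way such a check could be set up in base R. It runs on simulated data, not on golf2.txt: the data frame sim.df, the means 235 and 174, and the standard deviation 8 are assumptions chosen only to mimic the layout of the real data, so its numerical results say nothing about the actual brands.

```r
set.seed(1)

# SIMULATED layout mirroring the study: 2 clubs x 3 brands x 4 replicates
sim.df <- expand.grid(
  rep   = 1:4,
  Club  = c("Driver", "5.Iron"),
  Brand = c("A", "B", "C")
)

# SIMULATED distances with a club effect only (assumed values, not real data)
sim.df$Distance <- ifelse(sim.df$Club == "Driver", 235, 174) +
  rnorm(nrow(sim.df), sd = 8)

# Additive model (as fitted in this report) and interaction model
fit_add <- lm(Distance ~ Club + Brand, data = sim.df)
fit_int <- lm(Distance ~ Club * Brand, data = sim.df)

# F-test comparing the two models: this is the test of the interaction
int_test <- anova(fit_add, fit_int)
int_test
```

If the interaction test on the real data were not significant, the additive model used here would be the natural simplification; if it were significant, brand comparisons would need to be made separately for each club.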

2.6 Executive Summary

This analysis investigated how the brand of club affects the distance a golf ball travels, and whether any brand effect depends on the type of club. The type of club had a very large effect: on average the ball travelled about 61 distance units less with a 5 iron than with a driver, and this difference accounts for most of the high R-squared value of 0.9458. In contrast, the ANOVA table gave no evidence that brand affects distance (p-value = 0.14). Brand B had the largest estimated advantage over Brand A (8.338), followed by Brand C (5.275), but neither difference was statistically significant at the 5% level, so we cannot conclude that one brand is better than another. Distances were slightly less variable for the driver than for the 5 iron (standard deviations of 7.7 and 9.2). The fitted model did not include an interaction between club and brand, so it does not answer whether a brand effect depends on the type of club.