In-class part

This data set is taken from the BYSH exercise Gender discrimination in bank salaries, and a description of the variables can be found there. Use the following code to read the data into R, or click here to download as a file.

banksalary = read.csv(file='http://personal.psu.edu/~sar320/497/data/banksalary.csv', header=T)
  1. These parts are taken directly from the BYSH exercise. In addition to those given in Chapter 1, here are some helpful commands:
  1. Identify observational units, the response variable, and explanatory variables.

Observational units: the individual employees (subjects). Response variable: beginning salary (bsal). Explanatory variables: sex, seniority (senior), age, education (educ), and experience (exper).

  b. The mean starting salary of male workers ($5957) was 16% higher than the mean starting salary of female workers ($5139). Confirm these mean salaries. Is this enough evidence to conclude gender discrimination exists? If not, what further evidence would you need?

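No, this is not enough evidence on its own. A difference in raw means does not account for differences in qualifications between the two groups, and I would also like to see data from multiple years.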
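One way to confirm the group means is to average bsal within each level of sex. The sketch below uses six rows copied from the banksalary listing printed later in this document (rows 1-3 and 15-17) purely so that it runs on its own, which means its subset means will not equal the full-data values. The names `bank_subset` and `pct_higher` are introduced here for illustration. Running the same `tapply()` call on the full banksalary data frame is the actual check of the $5957 and $5139 figures.

```r
# Six rows copied from the banksalary listing (a subset for illustration only;
# run the same calls on the full banksalary data frame for the real answer).
bank_subset <- data.frame(
  bsal = c(5040, 6300, 6000, 5100, 4800, 5280),
  sex  = factor(c("MALE", "MALE", "MALE", "FEMALE", "FEMALE", "FEMALE"))
)

# Mean starting salary within each sex
mean_by_sex <- tapply(bank_subset$bsal, bank_subset$sex, mean)
print(mean_by_sex)

# Hypothetical helper: percent by which a exceeds b
pct_higher <- function(a, b) 100 * (a - b) / b

# Percent difference for the subset means
print(pct_higher(unname(mean_by_sex["MALE"]), unname(mean_by_sex["FEMALE"])))

# Percent difference for the means quoted in the question
print(pct_higher(5957, 5139))
```

The last line checks the "16% higher" statement directly from the two quoted means.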

  1. How would you expect age, experience, and education to be related to starting salary? Generate appropriate exploratory plots; are the relationships as you expected? What implications does this have for modeling?

My initial expectation was that education and experience would be the variables most related to beginning salary, and that age would not matter much. According to the initial plots, none of the three is strongly correlated with beginning salary. The clearest pattern is that more education goes with a higher starting salary, but none of the relationships is tight.

library(corrplot)
## corrplot 0.84 loaded
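# mod.full is not defined in the code shown; it is assumed here to be the model with all explanatory variables
mod.full = lm(bsal ~ senior + age + educ + exper + sex, data = banksalary)
plot(mod.full)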

plot(bsal~age+educ+exper,  data =banksalary )

  1. Why might it be important to control for seniority (number of years with the bank) if we are only concerned with the salary when the worker started?

Seniority is important to control for because it records when each worker joined the bank. Beginning salary alone does not reflect the date a worker was hired, so inflation and other changes over time are not accounted for. A worker may also have started in one position and later moved to a new one, where previous experience with the bank would mean a higher salary.

  2. By referring to exploratory plots and summary statistics, are any explanatory variables (including sex) closely related to each other? What implications does this have for modeling?

It is hard to compare the explanatory variables using a separate summary statistic for each one. The best way to see how they relate is to look at them together, comparing means and similar summaries side by side rather than independently.

plot(mod.full)

##favstats(~bsal, data = banksalary)
##favstats(~age, data = banksalary)
##favstats(~educ, data=banksalary)
##favstats(~sex, data=banksalary)
##install.packages(mosaic)
##library(mosaic)
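  1. Fit a simple linear regression model with starting salary as the response and experience as the sole explanatory variable (Model 1). Interpret the intercept and slope of this model; also interpret the R-squared value. Is there a significant relationship between experience and starting salary?

R-squared is 2.78% (adjusted 1.7%), which is very low: experience explains less than 3% of the variability in starting salary. The intercept is 5289.02, the predicted starting salary for a worker with no prior experience, and the slope is 1.30, so each additional unit of experience is associated with a predicted starting salary that is $1.30 higher. The relationship is not significant (p = 0.11).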
mod.1 = lm(bsal~exper, data=banksalary)
summary(mod.1)
## 
## Call:
## lm(formula = bsal ~ exper, data = banksalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1389.02  -503.33   -36.03   383.14  2740.08 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5289.0217   109.2984  48.391   <2e-16 ***
## exper          1.3009     0.8064   1.613     0.11    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 703.5 on 91 degrees of freedom
## Multiple R-squared:  0.0278, Adjusted R-squared:  0.01712 
## F-statistic: 2.602 on 1 and 91 DF,  p-value: 0.1102
anova(mod.1)
## Analysis of Variance Table
## 
## Response: bsal
##           Df   Sum Sq Mean Sq F value Pr(>F)
## exper      1  1287898 1287898  2.6024 0.1102
## Residuals 91 45035392  494894
plot(mod.1)

  1. Does Model 1 meet all linear regression assumptions? List each assumption and how you decided if it was met or not.

The four linear regression assumptions:
  1. Linearity
  2. Homoscedasticity
  3. Independence
  4. Normality

Independence: I am assuming each observation is independent of the others. Normality: this appears to be met, as the residuals look approximately normally distributed. Homoscedasticity: for the most part the spread of the residuals stays similar across the values of X. Linearity: this is the hardest one to judge.
  1. Is a model with all 4 confounding variables (Model 2, with senior, educ, exper, and age) better than a model with just experience (Model 1)? Justify with an appropriate significance test in addition to summary statistics of model performance.

Yes, the model with the four variables is much better than the first one. Its overall p-value is substantially lower (7.47e-07 versus 0.11), and its F-statistic of 10.24 is substantial. However, R-squared is still low: only about 30% of the variability in starting salary is accounted for (R-squared 0.318, adjusted 0.287). So this is not the best model, but it works to an extent.

mod.2 = lm(bsal~senior+ educ+exper+age, data=banksalary)
summary(mod.2)
## 
## Call:
## lm(formula = bsal ~ senior + educ + exper + age, data = banksalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1308.38  -378.67   -22.45   364.97  1873.29 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5879.5837   765.3756   7.682 2.06e-11 ***
## senior       -22.4400     6.2469  -3.592  0.00054 ***
## educ         130.1487    28.3551   4.590 1.47e-05 ***
## exper          2.7983     1.1586   2.415  0.01779 *  
## age           -1.1022     0.7777  -1.417  0.15992    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 599.4 on 88 degrees of freedom
## Multiple R-squared:  0.3176, Adjusted R-squared:  0.2866 
## F-statistic: 10.24 on 4 and 88 DF,  p-value: 7.474e-07
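Because Model 1 is nested in Model 2, a nested (partial) F-test is one appropriate significance test, and with both fits in memory `anova(mod.1, mod.2)` performs it. The sketch below rebuilds the same statistic by hand from numbers printed above: the residual sum of squares of 45035392 on 91 df for Model 1, and the residual standard error of 599.4 on 88 df for Model 2. The result is therefore only as precise as that rounded output.

```r
# Values read from the printed output (rounded, so the result is approximate)
rss1 <- 45035392        # Model 1 residual sum of squares, from anova(mod.1)
df1  <- 91              # Model 1 residual degrees of freedom
rse2 <- 599.4           # Model 2 residual standard error, from summary(mod.2)
df2  <- 88              # Model 2 residual degrees of freedom
rss2 <- rse2^2 * df2    # residual SE = sqrt(RSS / df), so RSS = SE^2 * df

# Nested F statistic: drop in RSS per added term, relative to Model 2's error variance
f_stat  <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))

# With the fitted models available, the equivalent call is:
# anova(mod.1, mod.2)
```

A small p-value here indicates that senior, educ, and age together add explanatory power beyond experience alone.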
  1. You should have noticed that the term for age was not significant in Model 2. What does this imply about age and about future modeling steps?

It implies that a variable can look important on its own, but once it is paired with other terms you can see how much it adds beyond the rest of the data. In this model age is not significant once seniority at the firm, education, and experience are included, so age adds little beyond those variables and is a candidate to drop in future modeling steps.

  1. Generate an appropriate coded scatterplot to examine a potential age-by-experience interaction. How would you describe the nature of this interaction?

This scatterplot shows the relationship between age and experience. For the majority of the data, the younger a worker is, the less experience they have, but not every point follows that pattern.
plot(age~exper,data=banksalary)

Exercises

  1. (these are the remaining parts from the banksalary exercise above)
  1. A potential final model (Model 3) would contain terms for seniority, education, and experience in addition to sex. Does this model meet all regression assumptions? State a 95% confidence interval for sex and interpret this interval carefully in the context of the problem.

Part 1: The residuals look approximately normal and the relationship looks linear. I am assuming the observations are independent, and homoscedasticity appears to hold because the residuals are evenly spread.

Part 2: A 95% confidence interval for the mean starting salary of men is 5710 to 6210, and for women it is 5000 to 5280. It is a safe conclusion that men had higher starting salaries than women at this bank.

library(rcompanion)

mod.3 = lm(bsal ~ senior + educ + exper+sex,data = banksalary)
banksalary
##    bsal     sal77    sex senior age educ  exper
## 1  5040 12420.000   MALE     96 329   15  14.00
## 2  6300 12060.000   MALE     82 357   15  72.00
## 3  6000 15120.000   MALE     67 315   15  35.50
## 4  6000 16320.001   MALE     97 354   12  24.00
## 5  6000 12300.000   MALE     66 351   12  56.00
## 6  6840 10380.000   MALE     92 374   15  41.50
## 7  8100 13979.999   MALE     66 369   16  54.50
## 8  6000 10140.000   MALE     82 363   12  32.00
## 9  6000 12360.000   MALE     88 555   12 252.00
## 10 6900 10920.000   MALE     75 416   15 132.00
## 11 6900 10920.000   MALE     89 481   12 175.00
## 12 5400 12660.001   MALE     91 331   15  17.50
## 13 6000 12960.000   MALE     66 355   15  64.00
## 14 6000 12360.000   MALE     86 348   15  25.00
## 15 5100  8940.000 FEMALE     95 640   15 165.00
## 16 4800  8580.000 FEMALE     98 774   12 381.00
## 17 5280  8760.000 FEMALE     98 557    8 190.00
## 18 5280  8040.000 FEMALE     88 745    8  90.00
## 19 4800  9000.000 FEMALE     77 505   12  63.00
## 20 4800  8820.000 FEMALE     76 482   12   6.00
## 21 5400 13320.000 FEMALE     86 329   15  24.00
## 22 5520  9600.000 FEMALE     82 558   12  97.00
## 23 5400  8940.000 FEMALE     88 338   12  26.00
## 24 5700  9000.000 FEMALE     76 667   12  90.00
## 25 3900  8760.000 FEMALE     98 327   12   0.00
## 26 4800  9780.000 FEMALE     75 619   12 144.00
## 27 6120  9360.000 FEMALE     78 624   12 208.50
## 28 5220  7860.000 FEMALE     70 671    8 102.00
## 29 5100  9660.000 FEMALE     66 554    8  96.00
## 30 4380  9600.000 FEMALE     92 305    8   6.25
## 31 4290  9180.001 FEMALE     69 280   12   5.00
## 32 5400  9540.000 FEMALE     66 534   15 122.00
## 33 4380 10380.000 FEMALE     92 305   12   0.00
## 34 5400  8640.000 FEMALE     65 603    8 173.00
## 35 5400 11880.000 FEMALE     66 302   12  26.00
## 36 4500 12540.001 FEMALE     96 366    8  52.00
## 37 5400  8400.000 FEMALE     70 628   12  82.00
## 38 5520  8880.000 FEMALE     67 694   12 196.00
## 39 5640 10080.000 FEMALE     90 368   12  55.00
## 40 4800  9240.000 FEMALE     73 590   12 228.00
## 41 5400  8640.000 FEMALE     66 771    8 228.00
## 42 4500  7980.000 FEMALE     80 298   12   8.00
## 43 5400 11940.000 FEMALE     77 325   12  38.00
## 44 5400  9420.000 FEMALE     72 589   15  49.00
## 45 6300  9780.000 FEMALE     66 394   12  86.50
## 46 5160 10680.001 FEMALE     87 320   12  18.00
## 47 5100 11160.000 FEMALE     98 571   15 115.00
## 48 4800  8340.000 FEMALE     79 602    8  70.00
## 49 5400  9600.000 FEMALE     98 568   12 244.00
## 50 4020  9840.000 FEMALE     92 528   10  44.00
## 51 4980  8700.000 FEMALE     74 718    8 318.00
## 52 5280  9780.000 FEMALE     88 653   12 107.00
## 53 5700  8280.000 FEMALE     65 714   15 241.00
## 54 4800  8340.000 FEMALE     87 647   12 163.00
## 55 4800 13560.000 FEMALE     82 338   12  11.00
## 56 5700 10260.000 FEMALE     82 362   15  51.00
## 57 4380  9720.000 FEMALE     93 303   12   4.50
## 58 4380 10500.001 FEMALE     89 310   12   0.00
## 59 5400 10680.001   MALE     88 359   12  38.00
## 60 5400 11640.000   MALE     96 474   12 113.00
## 61 5100  7860.000   MALE     84 535   12 180.00
## 62 6600 11220.000   MALE     66 369   15  84.00
## 63 5100  8700.000   MALE     97 637   12 315.00
## 64 6600 12240.001   MALE     83 536   15 215.50
## 65 5700 11220.000   MALE     94 392   15  36.00
## 66 6000 12180.000   MALE     91 364   12  49.00
## 67 6000 11580.000   MALE     83 521   15 108.00
## 68 6000  8940.000   MALE     80 686   12 272.00
## 69 6000 10680.001   MALE     87 364   15  56.00
## 70 4620 11100.000   MALE     77 293   12  11.50
## 71 5220 10080.000   MALE     85 344   12  29.00
## 72 6600 15360.001   MALE     83 340   15  64.00
## 73 5400 12600.000   MALE     78 305   12   7.00
## 74 6000  8940.000   MALE     78 659    8 320.00
## 75 5400  9480.000   MALE     88 690   15 359.00
## 76 6000 14400.000   MALE     96 402   16  45.50
## 77 5700 10620.000 FEMALE     88 410   15  61.00
## 78 5400 10320.000 FEMALE     78 584   15  51.00
## 79 4440  9600.000 FEMALE     97 341   15  75.00
## 80 6300 10860.001 FEMALE     84 662   15 231.00
## 81 6000  9720.000 FEMALE     69 488   12 121.00
## 82 5100  9600.000 FEMALE     85 406   12  59.00
## 83 4800 11100.000 FEMALE     87 349   12  11.00
## 84 5100 10020.001 FEMALE     87 508   16 123.00
## 85 5700  9780.000 FEMALE     74 542   12 116.50
## 86 5400 10440.000 FEMALE     72 604   12 169.00
## 87 5100 10560.000 FEMALE     84 458   12  36.00
## 88 4800  9240.000 FEMALE     84 571   16 214.00
## 89 6000 11940.000 FEMALE     86 486   15  78.50
## 90 4380 10020.001 FEMALE     93 313    8   7.50
## 91 5580  7860.000 FEMALE     69 600   12 132.50
## 92 4620  9420.000 FEMALE     96 385   12  52.00
## 93 5220  8340.000 FEMALE     70 468   12 127.00
#t.test(banksalary$sex, conf.level=0.95)
#attach(banksalary)
summary(mod.3)
## 
## Call:
## lm(formula = bsal ~ senior + educ + exper + sex, data = banksalary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1238.9  -353.1   -16.6   280.0  1568.8 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5845.7085   526.3677  11.106  < 2e-16 ***
## senior       -23.4276     5.2001  -4.505 2.03e-05 ***
## educ          90.0197    24.6932   3.646 0.000451 ***
## exper          1.2679     0.5871   2.160 0.033527 *  
## sexMALE      722.3031   117.8246   6.130 2.42e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 507.4 on 88 degrees of freedom
## Multiple R-squared:  0.5109, Adjusted R-squared:  0.4887 
## F-statistic: 22.98 on 4 and 88 DF,  p-value: 5.068e-13
plot(mod.3)
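A 95% confidence interval for the sex term itself can be read from the Model 3 coefficient table: it is the sexMALE estimate plus or minus a t quantile times its standard error. The sketch below does that arithmetic from the values printed in summary(mod.3). It works out to roughly $488 to $956, meaning that among workers with the same seniority, education, and experience, men's mean starting salary is estimated to be that much higher than women's.

```r
# Estimate and standard error for sexMALE, read from summary(mod.3)
est <- 722.3031
se  <- 117.8246
df  <- 88                       # residual degrees of freedom

# 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 1))

# With the fitted model available, the equivalent call is:
# confint(mod.3, "sexMALE", level = 0.95)
```

The interval lies entirely above zero, which matches the very small p-value reported for sexMALE.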

  1. Based on Model 3, what conclusions can be drawn about gender discrimination at Harris Trust? Do these conclusions have to be qualified at all, or are they pretty clear cut?

Holding seniority, education, and experience fixed, the estimated starting salary for men is about $722 higher than for women (p = 2.42e-08). Experience is the only term that needs to be qualified, as it has the weakest p-value (0.034): significant at the 5% level but not at the 1% level. The model is still not strong. It has a very good overall p-value and F-statistic, but adjusted R-squared is only about 49%, so I would still be cautious about relying on it.

  2. Often salary data is logged before analysis. Would you recommend logging starting salary in this study? Support your decision analytically.

Yes, I think logging starting salary would be beneficial, since we could look at the difference between starting and current salary; I think a factor like that would show how much more experience matters. It could also address issues that were not taken into consideration. For example, if Harris Trust hired more women in the past 5 years than in the previous 5 years, beginning salaries would look lower simply because the majority of new workers are women.

  3. Regardless of your answer to the previous question, provide an interpretation for the coefficient for the male coefficient in a modified Model 3 after logging starting salary.

The male coefficient has a very small p-value (2.42e-08) and a large t value (6.13), so sex is strongly related to beginning salary.

mod.3adj = lm(bsal ~ senior + educ + exper+sex,data = banksalary)
anova(mod.3adj)
## Analysis of Variance Table
## 
## Response: bsal
##           Df   Sum Sq Mean Sq F value    Pr(>F)    
## senior     1  3784915 3784915  14.700 0.0002363 ***
## educ       1  8559662 8559662  33.245 1.184e-07 ***
## exper      1  1645247 1645247   6.390 0.0132612 *  
## sex        1  9675997 9675997  37.581 2.423e-08 ***
## Residuals 88 22657469  257471                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod.3adj)
## 
## Call:
## lm(formula = bsal ~ senior + educ + exper + sex, data = banksalary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1238.9  -353.1   -16.6   280.0  1568.8 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5845.7085   526.3677  11.106  < 2e-16 ***
## senior       -23.4276     5.2001  -4.505 2.03e-05 ***
## educ          90.0197    24.6932   3.646 0.000451 ***
## exper          1.2679     0.5871   2.160 0.033527 *  
## sexMALE      722.3031   117.8246   6.130 2.42e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 507.4 on 88 degrees of freedom
## Multiple R-squared:  0.5109, Adjusted R-squared:  0.4887 
## F-statistic: 22.98 on 4 and 88 DF,  p-value: 5.068e-13
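When the response is log(bsal), a coefficient b on sexMALE is read multiplicatively: exp(b) is the estimated ratio of male to female median starting salary with the other terms held fixed, so 100 * (exp(b) - 1) is the percent difference. The sketch below shows only that conversion. `pct_from_log_coef` is a hypothetical helper, and 0.10 is a made-up coefficient for illustration, not a result from these data. The logged fit itself would be along the lines of lm(log(bsal) ~ senior + educ + exper + sex, data = banksalary).

```r
# Hypothetical helper: convert a coefficient from a log-response model
# into a percent change in the original response
pct_from_log_coef <- function(b) 100 * (exp(b) - 1)

# Made-up coefficient, for illustration only (not estimated from banksalary)
b_example <- 0.10
print(pct_from_log_coef(b_example))
```

For small coefficients the percent change is close to 100 * b, which is why logged salary models are often read as approximate percent differences.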
  1. Build your own final model for this study and justify the selection of your final model. You might consider interactions with gender, since those terms could show that discrimination is stronger among certain workers. Based on your final model, do you find evidence of gender discrimination at Harris Trust?

There is some evidence of gender discrimination at Harris Trust, but I would not be confident enough to say there is discrimination without more data to examine and track. Other factors need to be recorded and observed to make it clearer.

Final.mod = lm(bsal~senior+ educ +age + sex, data = banksalary)
summary(Final.mod)
## 
## Call:
## lm(formula = bsal ~ senior + educ + age + sex, data = banksalary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1223.68  -349.81   -49.44   304.94  1571.02 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5391.3940   599.3038   8.996 4.18e-14 ***
## senior       -22.3036     5.2398  -4.257 5.17e-05 ***
## educ          92.6090    24.7457   3.742 0.000324 ***
## age            0.9149     0.3997   2.289 0.024481 *  
## sexMALE      790.2871   119.5081   6.613 2.82e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 505.8 on 88 degrees of freedom
## Multiple R-squared:  0.5139, Adjusted R-squared:  0.4918 
## F-statistic: 23.26 on 4 and 88 DF,  p-value: 3.881e-13
  1. The parts below are taken from the BYSH authors’ guided exercise #2 on Sitting and MTL Thickness
## sitting.csv

sittingdata = read.csv(file='https://raw.githubusercontent.com/proback/BeyondMLR/master/data/sitting.csv', header=T)
summary(sittingdata)
##       MTL           sitting          MET            age        sex   
##  Min.   :2.221   Min.   : 2.0   Min.   :  99   Min.   :46.00   F:25  
##  1st Qu.:2.403   1st Qu.: 5.0   1st Qu.: 693   1st Qu.:54.00   M:10  
##  Median :2.541   Median : 7.0   Median :1040   Median :63.00         
##  Mean   :2.535   Mean   : 7.2   Mean   :1521   Mean   :60.37         
##  3rd Qu.:2.658   3rd Qu.: 9.5   3rd Qu.:2218   3rd Qu.:66.00         
##  Max.   :3.054   Max.   :15.0   Max.   :5112   Max.   :75.00         
##    education    
##  Min.   :12.00  
##  1st Qu.:14.00  
##  Median :16.00  
##  Mean   :16.37  
##  3rd Qu.:18.00  
##  Max.   :23.00
list(sittingdata)
## [[1]]
##        MTL sitting    MET age sex education
## 1  2.67061      10  777.0  66   M        14
## 2  2.56716      11 1039.8  71   M        20
## 3  2.69771       5  795.0  66   F        14
## 4  2.34203       7 2400.0  63   F        14
## 5  2.51524       3 2358.0  71   F        18
## 6  2.58794       5  693.0  71   F        18
## 7  2.90495       4  495.0  66   F        14
## 8  2.40919       6 1645.8  65   F        18
## 9  2.31250       7  396.0  51   F        18
## 10 2.42224      10  742.8  53   F        14
## 11 2.37321      15 2736.0  65   M        23
## 12 2.58554       2 4377.0  66   M        12
## 13 2.55969       6   99.0  75   M        18
## 14 2.22053       9 1713.0  55   F        18
## 15 2.57698       7  693.0  48   F        13
## 16 2.63438       8  727.8  52   F        14
## 17 2.77321      10 1971.0  54   F        16
## 18 2.72016       3 2874.0  46   F        18
## 19 2.22710       6 1639.8  54   F        20
## 20 2.71841       3 2430.0  49   F        16
## 21 2.77133       5  594.0  59   F        18
## 22 3.05417       5 2079.0  65   F        14
## 23 2.56617      12  693.0  63   M        18
## 24 2.64451       3  777.0  66   F        14
## 25 2.44100       9 1230.0  57   M        18
## 26 2.54093       5  684.0  69   F        14
## 27 2.39711      12 1392.0  57   F        15
## 28 2.43301       4  643.8  70   F        18
## 29 2.72078       8 5112.0  65   M        20
## 30 2.44821       8 1150.2  60   F        14
## 31 2.53537       3  132.0  63   M        16
## 32 2.33931       8 3000.0  47   M        14
## 33 2.23054      12  852.0  46   F        18
## 34 2.41255      13  360.0  62   F        16
## 35 2.36269       8 3942.0  57   F        16
  1. In their article’s introduction, Siddarth et al. (2018) differentiate their analysis on sedentary behavior from analysis on active behavior by citing evidence supporting the claim that “one can be highly active yet still be sedentary for most of the day.” Fit your own linear model with MET and sitting as your explanatory and response variables, respectively. Using \(R^2\), how much of the subject to subject variability in hours/day spent sitting can be explained by MET minutes per week? Does this support the claim that sedentary behaviors may be independent from physical activity?

R-squared is about 0.5% (0.005345), and adjusted R-squared is negative at about -2.5% (-0.0248), so MET minutes per week explains almost none of the subject-to-subject variability in hours per day spent sitting. Such a weak relationship is consistent with the claim, but more explanatory variables would be needed to make a better argument. The fit below regresses MET on sitting rather than the reverse; with a single predictor, R-squared is the same in either direction.
lm(MET~sitting, data = sittingdata)
## 
## Call:
## lm(formula = MET ~ sitting, data = sittingdata)
## 
## Coefficients:
## (Intercept)      sitting  
##     1715.37       -26.96
sit.mod =  lm(MET~sitting, data = sittingdata)
summary(sit.mod)
## 
## Call:
## lm(formula = MET ~ sitting, data = sittingdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1502.5  -872.5  -379.0   624.4  3612.3 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1715.37     506.43   3.387  0.00184 **
## sitting       -26.96      64.02  -0.421  0.67641   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1241 on 33 degrees of freedom
## Multiple R-squared:  0.005345,   Adjusted R-squared:  -0.0248 
## F-statistic: 0.1773 on 1 and 33 DF,  p-value: 0.6764
  1. In the paper’s section, “Statistical analysis”, the authors report that “Due to the skewed distribution of physical activity levels, we used log-transformed values in all analyses using continuous physical activity measures.” Generate both a histogram of MET values and log–transformed MET values. Do you agree with the paper’s decision to use a log-transformation here?

Yes, I agree with their decision: the histogram of log(MET) is much more evenly spread. The histogram of raw MET is right-skewed, with most of the data piled up on the left and a tail trailing off to the right. After taking the log of the MET values the histogram becomes much more even.
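# Add the log-transformed values as a column instead of using attach()
sittingdata$logMET = log(sittingdata$MET)
hist(sittingdata$MET, main = "MET minutes per week")
hist(sittingdata$logMET, main = "log(MET minutes per week)")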

  1. Fit a preliminary model with MTL as the response and sitting as the sole explanatory variable. Are OLS conditions satisfied?

The conditions I checked are random sampling, having more observations than parameters, and the regression being linear in its parameters. I am treating the sample as random, and there are more observations (35) than parameters (2), so the conditions are all being met.
lm(MTL~sitting, data=sittingdata)
## 
## Call:
## lm(formula = MTL ~ sitting, data = sittingdata)
## 
## Coefficients:
## (Intercept)      sitting  
##     2.69951     -0.02288
MTL.mod = lm(MTL~sitting, data=sittingdata)
summary(MTL.mod)
## 
## Call:
## lm(formula = MTL ~ sitting, data = sittingdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33511 -0.13432 -0.00252  0.11527  0.46907 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.69951    0.07309  36.933   <2e-16 ***
## sitting     -0.02288    0.00924  -2.476   0.0186 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1791 on 33 degrees of freedom
## Multiple R-squared:  0.1567, Adjusted R-squared:  0.1312 
## F-statistic: 6.132 on 1 and 33 DF,  p-value: 0.01857
  1. Expand on your previous model by including a centered version of age as a covariate. Interpret all three coefficients in this model.
lm(MTL~sitting+age, data = sittingdata)
## 
## Call:
## lm(formula = MTL ~ sitting + age, data = sittingdata)
## 
## Coefficients:
## (Intercept)      sitting          age  
##     2.42318     -0.02098      0.00435
MTL.age = lm(MTL~sitting+age, data = sittingdata)
summary(MTL.age)
## 
## Call:
## lm(formula = MTL ~ sitting + age, data = sittingdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3051 -0.1306 -0.0223  0.1328  0.4531 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.423185   0.255039   9.501 7.81e-11 ***
## sitting     -0.020976   0.009355  -2.242    0.032 *  
## age          0.004350   0.003848   1.130    0.267    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1783 on 32 degrees of freedom
## Multiple R-squared:  0.1891, Adjusted R-squared:  0.1384 
## F-statistic: 3.731 on 2 and 32 DF,  p-value: 0.03496

The intercept has by far the smallest p-value (7.81e-11). Sitting is significant at the 5% level (p = 0.032) but not at the 1% level, while age is not significant (p = 0.267), so we fail to reject the null hypothesis for age. The adjusted R-squared for the overall model is about 14% (multiple R-squared 18.9%), which is very low. The p-value for the overall model is 0.035, which is significant at the 5% level but larger than 1%.
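The question asks for a centered version of age, whereas the fit above uses age as recorded. A minimal sketch of centering follows, using the first eight rows of the sittingdata listing so that it runs on its own. The names `sit_subset` and `age_c` are introduced here for illustration, and the subset coefficients will not equal the full-data ones. Centering leaves the slopes unchanged and moves the intercept to the predicted MTL for a subject of average age with zero hours of sitting.

```r
# First eight rows of the sittingdata listing (subset for illustration only)
sit_subset <- data.frame(
  MTL     = c(2.67061, 2.56716, 2.69771, 2.34203, 2.51524, 2.58794, 2.90495, 2.40919),
  sitting = c(10, 11, 5, 7, 3, 5, 4, 6),
  age     = c(66, 71, 66, 63, 71, 71, 66, 65)
)

# Centered age: deviation from the sample mean age
sit_subset$age_c <- sit_subset$age - mean(sit_subset$age)

fit_raw <- lm(MTL ~ sitting + age,   data = sit_subset)
fit_ctr <- lm(MTL ~ sitting + age_c, data = sit_subset)

# Slopes agree between the two fits; only the intercept changes
print(coef(fit_raw))
print(coef(fit_ctr))
```

On the full data the same two lines would be sittingdata$age_c = sittingdata$age - mean(sittingdata$age) followed by lm(MTL ~ sitting + age_c, data = sittingdata).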


  1. One model fit in Siddarth et al. (2018) includes sitting, log–transformed MET, and age as explanatory variables. They report an estimate \(\widehat{\beta_1} = -0.02\) with confidence interval \((-0.04,-0.002)\) for the coefficient corresponding to sitting, and \(\widehat{\beta_2} = 0.007\) with confidence interval \((-0.07, 0.08)\) for the coefficient corresponding to MET. Verify these intervals and estimates on your own.

I used t-based confidence interval code. For sitting, my 95% confidence interval did not match the paper's: my lower bound was 6.06 and my upper bound was 8.34. This is an interval for the mean hours of sitting rather than for the regression coefficient, which is why the numbers do not match.
## sitting logMet , age 


library(Rmisc)
## Loading required package: plyr
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:mosaic':
## 
##     count
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
CI(sittingdata$sitting, ci=0.95)
##    upper     mean    lower 
## 8.341735 7.200000 6.058265
#CI(sittingdata$LogMet, ci=0.95)
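The intervals reported in the paper are for regression coefficients, so one way to check them is `confint()` on a fitted model rather than a t interval for the mean of sitting. The sketch below retypes the MTL, sitting, MET, and age columns from the sittingdata listing so that it runs on its own; with the data already loaded, the `lm()` call can use sittingdata directly. It assumes the paper's model is MTL on sitting, log(MET), and age with no other terms, so small differences from the published values would not be surprising. The names `sit`, `paper_fit`, and `ci_table` are introduced here for illustration.

```r
# Columns retyped from the sittingdata listing (35 subjects)
sit <- data.frame(
  MTL = c(2.67061, 2.56716, 2.69771, 2.34203, 2.51524, 2.58794, 2.90495,
          2.40919, 2.31250, 2.42224, 2.37321, 2.58554, 2.55969, 2.22053,
          2.57698, 2.63438, 2.77321, 2.72016, 2.22710, 2.71841, 2.77133,
          3.05417, 2.56617, 2.64451, 2.44100, 2.54093, 2.39711, 2.43301,
          2.72078, 2.44821, 2.53537, 2.33931, 2.23054, 2.41255, 2.36269),
  sitting = c(10, 11, 5, 7, 3, 5, 4, 6, 7, 10, 15, 2, 6, 9, 7, 8, 10, 3,
              6, 3, 5, 5, 12, 3, 9, 5, 12, 4, 8, 8, 3, 8, 12, 13, 8),
  MET = c(777, 1039.8, 795, 2400, 2358, 693, 495, 1645.8, 396, 742.8, 2736,
          4377, 99, 1713, 693, 727.8, 1971, 2874, 1639.8, 2430, 594, 2079,
          693, 777, 1230, 684, 1392, 643.8, 5112, 1150.2, 132, 3000, 852,
          360, 3942),
  age = c(66, 71, 66, 63, 71, 71, 66, 65, 51, 53, 65, 66, 75, 55, 48, 52,
          54, 46, 54, 49, 59, 65, 63, 66, 57, 69, 57, 70, 65, 60, 63, 47,
          46, 62, 57)
)

# Assumed form of the paper's model: MTL on sitting, log(MET), and age
paper_fit <- lm(MTL ~ sitting + log(MET) + age, data = sit)

# Estimates alongside 95% confidence intervals for each coefficient
ci_table <- cbind(estimate = coef(paper_fit), confint(paper_fit, level = 0.95))
print(round(ci_table, 3))
```

The rows labeled sitting and log(MET) are the ones to compare against the paper's reported estimates and intervals.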
  1. Based on your confidence intervals from the previous part, do you support the paper’s claim that “it is possible that sedentary behavior is a more significant predictor of brain structure, specifically MTL thickness [than physical activity]”? Why or why not?

No, I do not think the confidence intervals above gave me good enough information to support any claim. Many of my earlier models did not fit well, with high p-values and low R-squared values. These confidence intervals had wide margins and did not match the ones given in the paper.
  2. A New York Times Article was published discussing Siddarth et al. (2018) with the title “Standing Up at Your Desk Could Make You Smarter” (Friedman, 2018). Do you agree with this headline choice? Why or why not?

Normally I would agree, since I would think activity helps the brain, but the data do not clearly show that it helps: several times throughout the analysis the p-values showed no significance.