This data set is taken from the BYSH exercise Gender discrimination in bank salaries, and a description of the variables can be found there. Use the following code to read the data into R, or download the CSV file directly from the URL below.
banksalary = read.csv(file='http://personal.psu.edu/~sar320/497/data/banksalary.csv', header=TRUE)
Basic descriptive statistics:

summary(banksalary)

Comparing a full versus reduced model:
mod.full = lm(bsal ~ sex + senior + age, data = banksalary)  # full model
mod.red = lm(bsal ~ sex, data = banksalary)  # reduced model, nested in mod.full
anova(mod.full, mod.red)  # nested F-test comparing the two fits
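To see what anova() is doing here, a minimal sketch that recomputes the nested-model F statistic by hand (assuming mod.full and mod.red are fit as above; the value should match the F reported by anova()):

rss.red = sum(resid(mod.red)^2)    # residual sum of squares, reduced model
rss.full = sum(resid(mod.full)^2)  # residual sum of squares, full model
df.extra = mod.red$df.residual - mod.full$df.residual  # parameters added by the full model
F.stat = ((rss.red - rss.full) / df.extra) / (rss.full / mod.full$df.residual)
F.stat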
require(mosaic)  # the formula interface for mean() and favstats() comes from mosaic
mean(bsal ~ sex, data = banksalary)  # without mosaic loaded, mean() warns and returns NA
## Loading required package: mosaic
## (startup and masking messages from mosaic and its dependencies omitted)
favstats(bsal ~ sex, data = banksalary)

a. Observational units: the 93 bank employees. Response variable: beginning salary (bsal). Explanatory variables: sex, seniority, and age.

b. The mean starting salary of male workers ($5957) was 16% higher than the mean starting salary of female workers ($5139). Confirm these mean salaries. Is this enough evidence to conclude gender discrimination exists? If not, what further evidence would you need?

The means alone are not enough evidence. I would want to see salary data over multiple years, along with information on the workers' qualifications, before drawing a conclusion.
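A quick check of the reported means and the 16% gap (assuming mosaic is loaded as above):

means = mean(bsal ~ sex, data = banksalary)  # named vector indexed by sex level
(means["MALE"] - means["FEMALE"]) / means["FEMALE"]  # relative gap, about 0.16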
My initial thought is that education and experience should be the variables most related to beginning salary, while age should not matter much. The initial plots largely agree: none of the variables is strongly correlated with beginning salary. The closest relationship is that more education goes with a higher starting salary, but even those two are not closely linked.
library(corrplot)
## corrplot 0.84 loaded
plot(mod.full)  # residual diagnostics for the full model
plot(bsal ~ age + educ + exper, data = banksalary)  # beginning salary against each predictor
Why might it be important to control for seniority (number of years with the bank) if we are only concerned with the salary when the worker started? Seniority tells us when the beginning salary was set. A long-tenured worker's starting salary reflects the pay scales and price levels of an earlier era, so raw beginning salaries from different hire dates are not directly comparable once inflation is considered. A worker may also have started a new position after holding another one at the bank, in which case a higher beginning salary could reflect previous experience with the bank rather than anything about the worker's sex.
By referring to exploratory plots and summary statistics, are any explanatory variables (including sex) closely related to each other? What implications does this have for modeling? It is hard to judge relationships by looking at each variable's summary statistics independently; the best approach is to examine the variables together, for example in a correlation matrix or pairwise plots, as sketched below. If two explanatory variables turn out to be closely related, putting both in the same model makes their individual coefficient estimates unstable, so related predictors should be handled with care.
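A minimal sketch of that idea, using the corrplot package loaded above (plain cor() output works too):

num.vars = banksalary[, c("bsal", "senior", "age", "educ", "exper")]
round(cor(num.vars), 2)   # pairwise correlations among the numeric variables
corrplot(cor(num.vars))   # visualize the correlation matrix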
## favstats(~bsal, data = banksalary)
## favstats(~age, data = banksalary)
## favstats(~educ, data = banksalary)
## favstats(~sex, data = banksalary)
## install.packages("mosaic")  # note: the package name must be quoted
## library(mosaic)
mod.1 = lm(bsal ~ exper, data = banksalary)  # simple model: experience only
summary(mod.1)
##
## Call:
## lm(formula = bsal ~ exper, data = banksalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1389.02 -503.33 -36.03 383.14 2740.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5289.0217 109.2984 48.391 <2e-16 ***
## exper 1.3009 0.8064 1.613 0.11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 703.5 on 91 degrees of freedom
## Multiple R-squared: 0.0278, Adjusted R-squared: 0.01712
## F-statistic: 2.602 on 1 and 91 DF, p-value: 0.1102
anova(mod.1)
## Analysis of Variance Table
##
## Response: bsal
## Df Sum Sq Mean Sq F value Pr(>F)
## exper 1 1287898 1287898 2.6024 0.1102
## Residuals 91 45035392 494894
plot(mod.1)
Yes, the model with four variables is much better than the experience-only model. Two things stand out: the overall p-value is substantially lower, at 7.5e-07, and the F statistic of 10.24 is large. However, the R-squared is still low: only about 32% of the variation in beginning salary is accounted for. So this is not a great model, but it works to an extent.
mod.2 = lm(bsal ~ senior + educ + exper + age, data = banksalary)  # four-predictor model
summary(mod.2)
##
## Call:
## lm(formula = bsal ~ senior + educ + exper + age, data = banksalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1308.38 -378.67 -22.45 364.97 1873.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5879.5837 765.3756 7.682 2.06e-11 ***
## senior -22.4400 6.2469 -3.592 0.00054 ***
## educ 130.1487 28.3551 4.590 1.47e-05 ***
## exper 2.7983 1.1586 2.415 0.01779 *
## age -1.1022 0.7777 -1.417 0.15992
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 599.4 on 88 degrees of freedom
## Multiple R-squared: 0.3176, Adjusted R-squared: 0.2866
## F-statistic: 10.24 on 4 and 88 DF, p-value: 7.474e-07
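To make that comparison formal, a nested F-test between the two fits (mod.1 is nested in mod.2, since exper is one of mod.2's predictors):

anova(mod.1, mod.2)  # do senior, educ, and age jointly improve on exper alone?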
This shows that a variable can be significant on its own, but pairing it with other terms reveals its effect relative to the rest of the data. In this model, age is not significant compared with seniority at the firm and education: once those are accounted for, how old you are does not appear to affect your starting salary.
plot(age ~ exper, data = banksalary)  # check whether age and experience are themselves related
Part 2: 95% confidence intervals for mean starting salary. For males, the interval is roughly $5710 to $6210; for females, roughly $5000 to $5280. The mean salary for each group should fall in its respective range, and since the two intervals do not overlap, it is a safe conclusion that men's mean starting salary exceeds women's at this bank.
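A minimal sketch that reproduces those intervals with base R one-sample t intervals, one per group:

with(subset(banksalary, sex == "MALE"), t.test(bsal)$conf.int)    # 95% CI, men
with(subset(banksalary, sex == "FEMALE"), t.test(bsal)$conf.int)  # 95% CI, women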
library(rcompanion)
mod.3 = lm(bsal ~ senior + educ + exper + sex, data = banksalary)  # Model 3: adds sex
head(banksalary)
##   bsal     sal77  sex senior age educ exper
## 1 5040 12420.000 MALE     96 329   15  14.0
## 2 6300 12060.000 MALE     82 357   15  72.0
## 3 6000 15120.000 MALE     67 315   15  35.5
## 4 6000 16320.001 MALE     97 354   12  24.0
## 5 6000 12300.000 MALE     66 351   12  56.0
## 6 6840 10380.000 MALE     92 374   15  41.5
# t.test(bsal ~ sex, data = banksalary, conf.level = 0.95)  # two-sample comparison by sex
summary(mod.3)
##
## Call:
## lm(formula = bsal ~ senior + educ + exper + sex, data = banksalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1238.9 -353.1 -16.6 280.0 1568.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5845.7085 526.3677 11.106 < 2e-16 ***
## senior -23.4276 5.2001 -4.505 2.03e-05 ***
## educ 90.0197 24.6932 3.646 0.000451 ***
## exper 1.2679 0.5871 2.160 0.033527 *
## sexMALE 722.3031 117.8246 6.130 2.42e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 507.4 on 88 degrees of freedom
## Multiple R-squared: 0.5109, Adjusted R-squared: 0.4887
## F-statistic: 22.98 on 4 and 88 DF, p-value: 5.068e-13
plot(mod.3)
Based on Model 3, what conclusions can be drawn about gender discrimination at Harris Trust? Do these conclusions have to be qualified at all, or are they pretty clear cut? They need qualification. Experience is the predictor whose p-value is the least significant of the four, so it is the term most in need of caution. The model as a whole has very strong p and F values, but the adjusted R-squared is still only about 49%, so roughly half the variation in starting salary is unexplained; I would not treat this model as the final word.
Often salary data is logged before analysis. Would you recommend logging starting salary in this study? Support your decision analytically. Yes, I think logging starting salary would be beneficial. Salary data is typically right-skewed, and on the log scale the coefficients describe proportional rather than dollar differences, which makes salaries set at different times more comparable. It would also help with factors the raw scale hides; for example, if Harris Trust hired more women in the most recent five years than in the previous five, the two sexes' beginning salaries would have been set under different pay scales, distorting a raw-dollar comparison.
Regardless of your answer to the previous question, provide an interpretation for the male coefficient in a modified Model 3 after logging starting salary. With log(bsal) as the response, the male coefficient is the estimated difference in log starting salary between men and women with the same seniority, education, and experience; exponentiating it gives the multiplicative gap, i.e., men are estimated to start at about exp(coefficient) times the salary of comparable women (roughly 100 times the coefficient in percent, when the coefficient is small). In the unlogged Model 3 the male coefficient was highly significant (t = 6.13, p = 2.42e-08), and a similar pattern would be expected after logging.
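A minimal sketch of the logged fit (mod.3log is a name introduced here; output not shown):

mod.3log = lm(log(bsal) ~ senior + educ + exper + sex, data = banksalary)
summary(mod.3log)
exp(coef(mod.3log)["sexMALE"]) - 1  # approximate proportional male/female gap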
mod.3adj = lm(bsal ~ senior + educ + exper + sex, data = banksalary)  # same specification as mod.3
anova(mod.3adj)  # sequential (Type I) sums of squares
## Analysis of Variance Table
##
## Response: bsal
## Df Sum Sq Mean Sq F value Pr(>F)
## senior 1 3784915 3784915 14.700 0.0002363 ***
## educ 1 8559662 8559662 33.245 1.184e-07 ***
## exper 1 1645247 1645247 6.390 0.0132612 *
## sex 1 9675997 9675997 37.581 2.423e-08 ***
## Residuals 88 22657469 257471
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is some evidence of gender discrimination at Harris Trust, but I would not be confident in claiming discrimination without more variables to account for and track. Other factors need to be identified and examined to make the picture clearer.
Final.mod = lm(bsal ~ senior + educ + age + sex, data = banksalary)  # swaps exper for age
summary(Final.mod)
##
## Call:
## lm(formula = bsal ~ senior + educ + age + sex, data = banksalary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1223.68 -349.81 -49.44 304.94 1571.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5391.3940 599.3038 8.996 4.18e-14 ***
## senior -22.3036 5.2398 -4.257 5.17e-05 ***
## educ 92.6090 24.7457 3.742 0.000324 ***
## age 0.9149 0.3997 2.289 0.024481 *
## sexMALE 790.2871 119.5081 6.613 2.82e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 505.8 on 88 degrees of freedom
## Multiple R-squared: 0.5139, Adjusted R-squared: 0.4918
## F-statistic: 23.26 on 4 and 88 DF, p-value: 3.881e-13
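One quick way to choose between the two four-predictor models, as a sketch (assuming mod.3 and Final.mod are both still in the workspace; lower AIC is better):

AIC(mod.3, Final.mod)  # compare the experience-based and age-based specifications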
## sitting.csv
sittingdata = read.csv(file='https://raw.githubusercontent.com/proback/BeyondMLR/master/data/sitting.csv', header=T)
summary(sittingdata)
## MTL sitting MET age sex
## Min. :2.221 Min. : 2.0 Min. : 99 Min. :46.00 F:25
## 1st Qu.:2.403 1st Qu.: 5.0 1st Qu.: 693 1st Qu.:54.00 M:10
## Median :2.541 Median : 7.0 Median :1040 Median :63.00
## Mean :2.535 Mean : 7.2 Mean :1521 Mean :60.37
## 3rd Qu.:2.658 3rd Qu.: 9.5 3rd Qu.:2218 3rd Qu.:66.00
## Max. :3.054 Max. :15.0 Max. :5112 Max. :75.00
## education
## Min. :12.00
## 1st Qu.:14.00
## Median :16.00
## Mean :16.37
## 3rd Qu.:18.00
## Max. :23.00
head(sittingdata)
##       MTL sitting    MET age sex education
## 1 2.67061      10  777.0  66   M        14
## 2 2.56716      11 1039.8  71   M        20
## 3 2.69771       5  795.0  66   F        14
## 4 2.34203       7 2400.0  63   F        14
## 5 2.51524       3 2358.0  71   F        18
## 6 2.58794       5  693.0  71   F        18
Fit a model with MET and sitting as your explanatory and response variables, respectively. Using \(R^2\), how much of the subject-to-subject variability in hours/day spent sitting can be explained by MET minutes per week? Does this support the claim that sedentary behaviors may be independent from physical activity?

Only about 0.5% of the variability is explained (multiple \(R^2\) = 0.005; the adjusted \(R^2\) is slightly negative, at -0.025). MET explains essentially none of the variation in sitting time, which supports the claim that sedentary behavior may be independent from physical activity. (The model below happens to use MET as the response, but in a simple regression \(R^2\) is the same whichever of the two variables plays that role.)

lm(MET ~ sitting, data = sittingdata)
##
## Call:
## lm(formula = MET ~ sitting, data = sittingdata)
##
## Coefficients:
## (Intercept) sitting
## 1715.37 -26.96
sit.mod = lm(MET~sitting, data = sittingdata)
summary(sit.mod)
##
## Call:
## lm(formula = MET ~ sitting, data = sittingdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1502.5 -872.5 -379.0 624.4 3612.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1715.37 506.43 3.387 0.00184 **
## sitting -26.96 64.02 -0.421 0.67641
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1241 on 33 degrees of freedom
## Multiple R-squared: 0.005345, Adjusted R-squared: -0.0248
## F-statistic: 0.1773 on 1 and 33 DF, p-value: 0.6764
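As a sanity check, in a simple regression \(R^2\) equals the squared correlation between the two variables:

cor(sittingdata$MET, sittingdata$sitting)^2  # should match the Multiple R-squared above (about 0.005)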
The paper compares raw MET values and log-transformed MET values. Do you agree with the paper's decision to use a log-transformation here? Yes, I agree with their assessment that log(MET) is much better behaved: the raw MET histogram is strongly right-skewed, with most of the data piled up on the left and a long tail running to the right, while the histogram of the logged values is much more evenly spread out.
# Compare the raw and log-transformed MET distributions side by side
par(mfrow = c(1, 2))
hist(sittingdata$MET, main = "MET")
hist(log(sittingdata$MET), main = "log(MET)")
par(mfrow = c(1, 1))
Fit a model with MTL as the response and sitting as the sole explanatory variable. Are OLS conditions satisfied? The conditions to check are random sampling, having more observations than parameters, and a regression that is linear in its parameters. The sampling here is random, the 35 observations far exceed the 2 parameters, and the model is linear, so the conditions are met.

lm(MTL ~ sitting, data = sittingdata)
##
## Call:
## lm(formula = MTL ~ sitting, data = sittingdata)
##
## Coefficients:
## (Intercept) sitting
## 2.69951 -0.02288
MTL.mod = lm(MTL~sitting, data=sittingdata)
summary(MTL.mod)
##
## Call:
## lm(formula = MTL ~ sitting, data = sittingdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33511 -0.13432 -0.00252 0.11527 0.46907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.69951 0.07309 36.933 <2e-16 ***
## sitting -0.02288 0.00924 -2.476 0.0186 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1791 on 33 degrees of freedom
## Multiple R-squared: 0.1567, Adjusted R-squared: 0.1312
## F-statistic: 6.132 on 1 and 33 DF, p-value: 0.01857
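To check the residual-based conditions graphically, the standard lm diagnostics can be drawn as a sketch:

par(mfrow = c(2, 2))
plot(MTL.mod)  # residuals vs fitted, normal QQ, scale-location, leverage
par(mfrow = c(1, 1))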
Add age as a covariate. Interpret all three coefficients in this model.

lm(MTL ~ sitting + age, data = sittingdata)
##
## Call:
## lm(formula = MTL ~ sitting + age, data = sittingdata)
##
## Coefficients:
## (Intercept) sitting age
## 2.42318 -0.02098 0.00435
MTL.age = lm(MTL~sitting+age, data = sittingdata)
summary(MTL.age)
##
## Call:
## lm(formula = MTL ~ sitting + age, data = sittingdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3051 -0.1306 -0.0223 0.1328 0.4531
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.423185 0.255039 9.501 7.81e-11 ***
## sitting -0.020976 0.009355 -2.242 0.032 *
## age 0.004350 0.003848 1.130 0.267
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1783 on 32 degrees of freedom
## Multiple R-squared: 0.1891, Adjusted R-squared: 0.1384
## F-statistic: 3.731 on 2 and 32 DF, p-value: 0.03496
Interpreting the coefficients: the intercept (2.42) is the predicted MTL thickness for a hypothetical person of age zero who never sits, an extrapolation far outside the data; the sitting coefficient says each additional hour/day spent sitting is associated with a 0.021 decrease in MTL thickness, holding age fixed; and the age coefficient says each additional year of age is associated with a 0.004 increase in MTL, holding sitting fixed. After adjusting for age, sitting remains significant at the 5% level (p = 0.032), while age is not (p = 0.267), so we would fail to reject the null only for age. The overall model is significant at the 5% level (p = 0.035) but not at 1%, and its \(R^2\) is only about 19% (adjusted 14%), which is quite low.
The paper fits a model with sitting, log-transformed MET, and age as explanatory variables. They report an estimate \(\widehat{\beta_1} = -0.02\) with confidence interval \((-0.04,-0.002)\) for the coefficient corresponding to sitting, and \(\widehat{\beta_2} = 0.007\) with confidence interval \((-0.07, 0.08)\) for the coefficient corresponding to MET. Verify these intervals and estimates on your own.

My first attempt at a 95% confidence interval for sitting did not match the paper's numbers: I got a lower bound of 6.05 and an upper bound of 8.34. The reason is that the interval computed below is for the mean of the raw sitting variable, whereas the paper's intervals are for regression coefficients in the model with sitting, log(MET), and age.
library(Rmisc)
## Loading required package: plyr
## (masking messages omitted; if you need plyr and dplyr together, load plyr first)
CI(sittingdata$sitting, ci = 0.95)  # CI for the mean of sitting, not for a regression coefficient
## upper mean lower
## 8.341735 7.200000 6.058265
# CI(log(sittingdata$MET), ci = 0.95)  # note: sittingdata has no LogMet column; use log(MET)
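To actually verify the paper's estimates, a minimal sketch that fits their model and asks confint() for the coefficient intervals (paper.mod is a name introduced here):

paper.mod = lm(MTL ~ sitting + log(MET) + age, data = sittingdata)
coef(paper.mod)[c("sitting", "log(MET)")]                    # point estimates
confint(paper.mod, c("sitting", "log(MET)"), level = 0.95)   # 95% intervals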
Normally I would agree that physical activity should help the brain, but the data do not show it: multiple times throughout the analysis, the p-values for MET indicated no significant relationship with MTL thickness.