The first part of this is intended to take no more than 90 minutes to complete. For each question, show the R code using this markdown file, and provide a written description of what you did and what you found. You may consult any resources (your notes, previous problem sets, books, websites, etc.), but no other people during this part of the test.
The test will use the following data:
data <- read.table("exam2.csv",header=T)
Use appropriate t-tests AND Bayes factor tests to answer the following questions:
* Is the mean of x1 different from 0?
* Is the mean of x2 different from 0?
* Is the mean of x2 different from x1, assuming they are independent groups?
* Is the mean of x2 different from x1, assuming they are two measures of the same group of people (e.g., before and after a medicine)?
* Do men have a different x2 score than women?
library(car)
## Warning: package 'car' was built under R version 4.0.2
## Loading required package: carData
par(mfrow = c(1,2))
qqPlot(data$x1,main = 'x1 Checking for Normality')
## [1] 39 30
qqPlot(data$x2,main = 'x2 Checking for Normality')
## [1] 39 34
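# As a numeric complement to the Q-Q plots, a Shapiro-Wilk test could also be run on each
# variable (an optional quick sketch using the same data object loaded above):
shapiro.test(data$x1)
shapiro.test(data$x2)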
library(BayesFactor)
## Warning: package 'BayesFactor' was built under R version 4.0.2
## Loading required package: coda
## Warning: package 'coda' was built under R version 4.0.2
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
##
## Type BFManual() to open the manual.
## ************
#1)
t.test(data$x1, mu = 0, alternative = "two.sided")
##
## One Sample t-test
##
## data: data$x1
## t = 0.90552, df = 49, p-value = 0.3696
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.2818193 0.7441024
## sample estimates:
## mean of x
## 0.2311416
#The mean of x1 is not statistically different from 0 (t = 0.906, df = 49, p = 0.370).
ttestBF(data$x1, mu = 0) # BF = 0.23 (about 4.4:1 in favor of the null), substantial evidence that the mean does not differ from 0
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 0.226535 ±0%
##
## Against denominator:
## Null, mu = 0
## ---
## Bayes factor type: BFoneSample, JZS
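# For reporting, the numeric Bayes factor can be pulled out of the BayesFactor object with
# extractBF() (a small sketch; the test above is simply stored in an object first):
bf_x1 = ttestBF(data$x1, mu = 0)   # same one-sample Bayes factor test as above
extractBF(bf_x1)$bf                # numeric BF for the alternative (about 0.23 here)
1/extractBF(bf_x1)$bf              # evidence in favor of the null (about 4.4)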
#2)
t.test(data$x2, mu = 0, alternative = "two.sided")
##
## One Sample t-test
##
## data: data$x2
## t = 2.6864, df = 49, p-value = 0.009833
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.1723599 1.1958402
## sample estimates:
## mean of x
## 0.6841
#The mean of x2 is statistically different from 0 (t = 2.686, df = 49, p = 0.0098).
ttestBF(data$x2, mu = 0) # BF = 3.81, substantial evidence for the alternative hypothesis that the mean differs from 0
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 3.8119 ±0%
##
## Against denominator:
## Null, mu = 0
## ---
## Bayes factor type: BFoneSample, JZS
#3)
t.test(data$x1,data$x2, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: data$x1 and data$x2
## t = -1.2563, df = 97.999, p-value = 0.212
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.1684792 0.2625623
## sample estimates:
## mean of x mean of y
## 0.2311416 0.6841000
#Treated as independent groups, the mean of x1 is not statistically different from the mean of x2 (t = -1.256, df = 98, p = 0.212).
ttestBF(data$x1,data$x2) # BF = 0.42 (about 2.4:1 in favor of the null), weak evidence that the two means do not differ
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 0.4243157 ±0%
##
## Against denominator:
## Null, mu1-mu2 = 0
## ---
## Bayes factor type: BFindepSample, JZS
#4)
t.test(data$x1,data$x2, alternative = "two.sided", paired = T)
##
## Paired t-test
##
## data: data$x1 and data$x2
## t = -5.4564, df = 49, p-value = 1.591e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.6197826 -0.2861342
## sample estimates:
## mean of the differences
## -0.4529584
#In a paired comparison, the mean of x1 is statistically lower than the mean of x2 (t = -5.456, df = 49, p = 1.59e-06).
ttestBF(data$x1,data$x2,paired = T) #BF = 10608.79, decisive evidence that the two means differ when the measures are paired
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 10608.79 ±0%
##
## Against denominator:
## Null, mu = 0
## ---
## Bayes factor type: BFoneSample, JZS
#5)
# table(data$x2, data$gender)
women = data$x2[data$gender == 'F']
men = data$x2[data$gender == 'M']
t.test(men,women,alternative = "two.sided") # t = 0.675, df = 40.1, p = 0.503: no significant difference in mean x2 between men and women
##
## Welch Two Sample t-test
##
## data: men and women
## t = 0.67523, df = 40.114, p-value = 0.5034
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.7076365 1.4177938
## sample estimates:
## mean of x mean of y
## 0.8758425 0.5207639
ttestBF(men,women) #BF = 0.34 (about 2.9:1 in favor of the null), favoring the hypothesis that mean x2 does not differ by gender
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 0.3447744 ±0.02%
##
## Against denominator:
## Null, mu1-mu2 = 0
## ---
## Bayes factor type: BFindepSample, JZS
# length(data$x2[data$gender == 'M'])
# length(data$x2[data$gender == 'F'])
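# The same gender comparison can be written with the formula interface, which avoids the manual
# subsetting (a sketch; gender is converted to a factor to be safe, since the ttestBF formula
# interface expects a two-level grouping factor):
data2 = data
data2$gender = factor(data2$gender)
t.test(x2 ~ gender, data = data2)             # same Welch test as above (the sign of t flips because groups are ordered F, M)
ttestBF(formula = x2 ~ gender, data = data2)  # corresponding Bayes factor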
Form tables showing the relationship between (1) gender and favorite TV show, and (2) gender and favorite super hero character. Use a chi-squared test and the corresponding Bayes factor test to determine whether either TV show or super hero preference depends on gender. Briefly describe your conclusions.
library(BayesFactor)
tv_gender = table(data$tvshow,data$gender)
chisq.test(tv_gender)
## Warning in chisq.test(tv_gender): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: tv_gender
## X-squared = 21.08, df = 3, p-value = 0.0001013
contingencyTableBF(tv_gender, sampleType="indepMulti",fixedMargin = 'cols')
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 2095.778 ±0%
##
## Against denominator:
## Null, independence, a = 1
## ---
## Bayes factor type: BFcontingencyTable, independent multinomial
#Both the Bayes factor (BF = 2095.78) and the chi-squared test (X-squared = 21.08, df = 3, p = 0.0001) strongly favor the alternative hypothesis: TV-show preference is not independent of gender.
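# Because R warned that the chi-squared approximation may be incorrect (some expected cell
# counts are small), a Monte Carlo p-value is a reasonable robustness check (optional sketch):
chisq.test(tv_gender)$expected                              # inspect the expected counts behind the warning
chisq.test(tv_gender, simulate.p.value = TRUE, B = 10000)   # simulated p-value, no approximation needed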
hero_gender = table(data$hero,data$gender)
chisq.test(hero_gender)
## Warning in chisq.test(hero_gender): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: hero_gender
## X-squared = 2.4247, df = 3, p-value = 0.4891
contingencyTableBF(hero_gender, sampleType="indepMulti",fixedMargin = 'cols')
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 0.2835535 ±0%
##
## Against denominator:
## Null, independence, a = 1
## ---
## Bayes factor type: BFcontingencyTable, independent multinomial
#On the other hand, there is no evidence of gender-related preference for super heroes (X-squared = 2.42, df = 3, p = 0.49; BF = 0.28, which favors independence by about 3.5:1).
Is x1 significantly correlated with x2, x3, or x4 (test using Pearson's correlation)? For which pair should you use a non-parametric test? Explain why, and how you know. For the pair you identify, test this correlation with both a parametric and non-parametric test.
library(car)
par(mfrow = c(2,2))
qqPlot(data$x1,main = 'x1 Checking for Normality')
## [1] 39 30
qqPlot(data$x2,main = 'x2 Checking for Normality')
## [1] 39 34
qqPlot(data$x3,main = 'x3 Checking for Normality')
## [1] 50 1
qqPlot(data$x4,main = 'x4 Checking for Normality')
## [1] 49 50
# The Q-Q plots show that x4 is not normally distributed, so the pair involving x4 calls for a non-parametric test
#1)
cor.test(data$x1,data$x2, method="pearson")
##
## Pearson's product-moment correlation
##
## data: data$x1 and data$x2
## t = 20.423, df = 48, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9079842 0.9697273
## sample estimates:
## cor
## 0.9469933
# t = 20.42, df = 48, p < 2.2e-16, r = 0.947: strong evidence of a positive correlation between x1 and x2
#2)
cor.test(data$x1,data$x3, method="pearson")
##
## Pearson's product-moment correlation
##
## data: data$x1 and data$x3
## t = -0.20588, df = 48, p-value = 0.8378
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3055253 0.2507170
## sample estimates:
## cor
## -0.0297036
# t = -0.206, df = 48, p = 0.838, r = -0.030: no evidence of a correlation between x1 and x3
#3)
cor.test(data$x1,data$x4, method="spearman")
##
## Spearman's rank correlation rho
##
## data: data$x1 and data$x4
## S = 16028, p-value = 0.1075
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.2303481
# Spearman's rank-order test is used here because x4 fails the normality check.
#S = 16028, p = 0.108, rho = 0.23: a small positive rank correlation that is not statistically significant
cor.test(data$x1,data$x4, method="pearson")
##
## Pearson's product-moment correlation
##
## data: data$x1 and data$x4
## t = -0.45013, df = 48, p-value = 0.6546
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3370982 0.2174378
## sample estimates:
## cor
## -0.06483387
# t = -0.45, df = 48, p = 0.655, r = -0.065: as with x1 and x3, the parametric test also shows no evidence of a correlation between x1 and x4
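# For a quick overview alongside the pairwise tests, the full correlation matrix can be computed
# in one call (sketch; the column names are the same ones used above):
round(cor(data[, c("x1","x2","x3","x4")], method = "pearson"), 3)    # Pearson correlations
round(cor(data[, c("x1","x2","x3","x4")], method = "spearman"), 3)   # rank-based, robust to the non-normal x4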
Create a linear regression model predicting x1 by x2, x3, x4, and gender. Show the fitted coefficient values.
linear_model = lm(x1~x2+x3+x4+gender,data=data)
summary(linear_model)
##
## Call:
## lm(formula = x1 ~ x2 + x3 + x4 + gender, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.40989 -0.36441 0.05034 0.33158 1.51932
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4549277 0.1288637 -3.530 0.000971 ***
## x2 0.9467475 0.0480819 19.690 < 2e-16 ***
## x3 -0.0185110 0.0582653 -0.318 0.752180
## x4 0.0005956 0.0036616 0.163 0.871516
## genderM 0.0999731 0.1772316 0.564 0.575500
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6025 on 45 degrees of freedom
## Multiple R-squared: 0.8977, Adjusted R-squared: 0.8886
## F-statistic: 98.7 on 4 and 45 DF, p-value: < 2.2e-16
linear_model$fitted.values
## 1 2 3 4 5 6
## -1.434838285 -0.521969178 -0.103248514 -2.952759980 -0.685019721 -0.052594289
## 7 8 9 10 11 12
## 3.030823343 -0.557876513 0.468535961 -1.968465278 0.918351255 2.233391935
## 13 14 15 16 17 18
## -0.913285685 -0.635925111 1.694591827 0.578167295 0.121493713 -2.221428186
## 19 20 21 22 23 24
## -0.003952669 1.719513379 1.000730311 0.923908646 1.256107774 1.790989500
## 25 26 27 28 29 30
## -2.731278965 -2.368939608 -0.103093533 2.834613953 1.797034682 -2.940312941
## 31 32 33 34 35 36
## 0.532178326 0.497921027 1.954198646 3.474207520 0.657715295 1.704091361
## 37 38 39 40 41 42
## -1.948223432 0.172777124 4.639726421 -0.922112494 -1.287324768 0.944273758
## 43 44 45 46 47 48
## 1.306127312 2.401492427 -0.074808561 -0.493253478 -0.236536814 0.488051195
## 49 50
## 0.121623029 -2.548309311
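# The fitted coefficient values themselves (as opposed to the fitted/predicted values listed
# above) can be displayed directly (a short sketch):
coef(linear_model)      # fitted coefficient estimates
confint(linear_model)   # 95% confidence intervals for the coefficients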
For the model you created, which predictors are significantly different from 0, other than the intercept term? Write a sentence for each predictor describing whether the t-test associated with the predictor shows it is different from 0. Include the t value, degrees of freedom, and the resulting p-value.
#From the regression summary, x2 is the only predictor whose coefficient is significantly different from 0: t = 19.69, df = 45, p < 2e-16.
#x3: t = -0.32, df = 45, p = 0.752, so its coefficient is not different from 0.
#x4: t = 0.16, df = 45, p = 0.872, not different from 0.
#genderM: t = 0.56, df = 45, p = 0.576, not different from 0.
#The tests below address a related but different question, comparing each variable's values with x1 directly: x2 vs x1 (t = -1.26, df = 98.0, p = 0.212, n.s.), x3 vs x1 (t = -1.49, df = 97.9, p = 0.138, n.s.), x4 vs x1 (W = 475, p = 9.3e-08, clearly different), and x1 by gender (W = 263, p = 0.364, n.s.).
t.test(data$x1,data$x2, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: data$x1 and data$x2
## t = -1.2563, df = 97.999, p-value = 0.212
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.1684792 0.2625623
## sample estimates:
## mean of x mean of y
## 0.2311416 0.6841000
t.test(data$x1,data$x3, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: data$x1 and data$x3
## t = -1.4947, df = 97.93, p-value = 0.1382
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.2730931 0.1792065
## sample estimates:
## mean of x mean of y
## 0.2311416 0.7780849
wilcox.test(data$x1,data$x4)
##
## Wilcoxon rank sum test with continuity correction
##
## data: data$x1 and data$x4
## W = 475, p-value = 9.332e-08
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(data$x1~data$gender)
##
## Wilcoxon rank sum exact test
##
## data: data$x1 by data$gender
## W = 263, p-value = 0.3635
## alternative hypothesis: true location shift is not equal to 0
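# The t value, degrees of freedom, and p-value for each coefficient can also be read straight
# from the regression table, rather than from separate two-sample tests (sketch):
summary(linear_model)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|) for each predictor
df.residual(linear_model)            # residual df for these t-tests (45 here)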
For the significant predictors in the model, describe in words how you would interpret the coefficient. In addition, describe the coefficient for gender, and discuss specifically what that coefficient represents.
#the only significant predictor is the variable x2
# The linear model predicts x1 by the equation -->
#x1 = -0.45 + 0.946 * x2 - 0.018 * x3 + 0.0005 * x4 + 0.0999 * gender (where male = 1 and female = 0)
#Holding x3, x4, and gender constant, each one-unit increase in x2 is associated with a 0.946-unit increase in predicted x1.
#For example, with x3 = 0, x4 = 0, and gender = F: x2 = 1 gives x1 = -0.45 + 0.946*1 = 0.496; x2 = 2 gives 1.442; x2 = 3 gives 2.388. So an n-unit increase in x2 raises the prediction by 0.946*n, a much larger effect than the near-zero x3 and x4 coefficients.
#The genderM coefficient (0.0999) is the expected difference in x1 between a male and a female with the same x2, x3, and x4: the prediction is 0.0999 higher for males and unchanged for females (0.0999 * 0 = 0). This coefficient, however, is not significantly different from 0.
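# A worked prediction makes the gender coefficient concrete: for two hypothetical people with
# identical x2, x3, and x4 (the values below are made up for illustration), the predictions
# differ only by the genderM coefficient (a sketch):
new_people = data.frame(x2 = 1, x3 = 0, x4 = 0, gender = c("F","M"))  # hypothetical covariate values
predict(linear_model, newdata = new_people)        # two predictions, identical except for gender
diff(predict(linear_model, newdata = new_people))  # equals the genderM coefficient, about 0.0999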
Use an AIC criterion to reduce the model to the simplest, most descriptive model. You may do this with a stepwise regression function or obtain the AIC values in some other way and select the model manually. For this simpler model, examine the multiple R^2 and compare it to the largest model. Discuss how you would interpret the difference between the models in R^2.
library(MASS)
## Warning: package 'MASS' was built under R version 4.0.3
AIC_selected_model = stepAIC(linear_model,direction = 'both')
## Start: AIC=-45.94
## x1 ~ x2 + x3 + x4 + gender
##
## Df Sum of Sq RSS AIC
## - x4 1 0.010 16.343 -47.911
## - x3 1 0.037 16.370 -47.828
## - gender 1 0.115 16.449 -47.588
## <none> 16.333 -45.940
## - x2 1 140.725 157.058 65.230
##
## Step: AIC=-47.91
## x1 ~ x2 + x3 + gender
##
## Df Sum of Sq RSS AIC
## - x3 1 0.028 16.371 -49.827
## - gender 1 0.106 16.449 -49.587
## <none> 16.343 -47.911
## + x4 1 0.010 16.333 -45.940
## - x2 1 140.855 157.198 63.274
##
## Step: AIC=-49.83
## x1 ~ x2 + gender
##
## Df Sum of Sq RSS AIC
## - gender 1 0.104 16.475 -51.509
## <none> 16.371 -49.827
## + x3 1 0.028 16.343 -47.911
## + x4 1 0.000 16.370 -47.828
## - x2 1 140.986 157.356 61.325
##
## Step: AIC=-51.51
## x1 ~ x2
##
## Df Sum of Sq RSS AIC
## <none> 16.475 -51.509
## + gender 1 0.104 16.371 -49.827
## + x3 1 0.026 16.449 -49.587
## + x4 1 0.007 16.468 -49.530
## - x2 1 143.159 159.634 60.043
cat(' rsquared AIC model = ',summary(AIC_selected_model)$r.squared,'\n'
,'adjusted r squared AIC model = ',summary(AIC_selected_model)$adj.r.squared,'\n','rsquared largest model = ',summary(linear_model)$r.squared
,'\n','adjusted r squared largest model = ',summary(linear_model)$adj.r.squared)
## rsquared AIC model = 0.8967962
## adjusted r squared AIC model = 0.8946462
## rsquared largest model = 0.8976822
## adjusted r squared largest model = 0.8885873
#It is wise to report both the multiple and the adjusted R squared, since the adjusted version penalizes additional predictors. Looking at the multiple R squared, the largest model accounts for 89.77% of the variation in x1, while in the reduced model x2 alone accounts for 89.68%: dropping x3, x4, and gender costs almost nothing in explained variance. The adjusted R squared, which accounts for the number of predictors, is actually about 0.6 percentage points higher for the reduced model (0.8946 vs 0.8886). In conclusion, the reduced model, which also has the lowest AIC (-51.51), is the better of the two for predicting x1, even though larger models usually fit the training data at least as well.
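# The drop in R squared can also be tested formally: a nested-model F-test asks whether x3, x4,
# and gender jointly add explanatory power beyond x2 alone (optional sketch):
anova(AIC_selected_model, linear_model)   # F-test comparing the reduced and full models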
Make a plot of x4 by x3. Make a model predicting x4 with x3, and examine the residuals. Choose a transform of x4 that makes the residuals appear normally distributed, and create a new model predicting this transformed variable. Compare the R^2 values of these two models and use this to determine whether the transformation was justified.
library(rcompanion)
## Warning: package 'rcompanion' was built under R version 4.0.3
#summary(lm(x4~x3,data = data))
tranformed_x4 = transformTukey(data$x4,plotit=T)
##
## lambda W Shapiro.p.value
## 403 0.05 0.9776 0.4568
##
## if (lambda > 0){TRANS = x ^ lambda}
## if (lambda == 0){TRANS = log(x)}
## if (lambda < 0){TRANS = -1 * x ^ lambda}
summary(lm(tranformed_x4~x3,data = data))$r.squared
## [1] 0.7528011
summary(lm(x4~x3,data = data))$r.squared
## [1] 0.3478208
#The transformation appears well justified: the multiple R squared increases from 0.348 to 0.753 when predicting the transformed x4
par(mfrow=c(1,2))
plot(data$x3,data$x4,xlab = 'x3', ylab = 'x4')
plot(data$x3,lm(tranformed_x4~x3,data = data)$fitted.values, xlab = 'x3', ylab = 'Transformed x4')
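# Since transformTukey reported lambda = 0.05 (> 0), the transform it applied is simply
# x4^0.05, which can be reproduced by hand as a check (sketch; assumes x4 is strictly positive,
# as a positive power transform requires):
manual_x4 = data$x4^0.05                         # Tukey ladder transform: lambda > 0 means x^lambda
all.equal(as.numeric(tranformed_x4), manual_x4)  # should agree with the transformTukey result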
Create a polynomial regression predicting x5 based on x1. Use either an AIC, BIC, or F-test to select the order of the polynomial (number of predictors) you want to use to predict the data. Discuss which polynomial model you would select, and why.
library(car)
qqPlot(data$x5,main = 'x5 normality qqplot')
## [1] 39 4
lmodel = lm(x5~x1,data=data)
poly1 = lm(x5~x1 + I(x1^2),data=data)
poly2 = lm(x5~x1 + I(x1^2) + I(x1^3),data=data)
poly3 = lm(x5~x1 + I(x1^2) + I(x1^3) + I(x1^4),data=data)
poly4 = lm(x5~x1 + I(x1^2) + I(x1^3) + I(x1^4) + I(x1^5),data=data)
poly5 = lm(x5~x1 + I(x1^2) + I(x1^3) + I(x1^4) + I(x1^5) +I (x1^6),data=data)
extractAIC(lmodel)
## [1] 2.0000 244.6147
extractAIC(poly1)
## [1] 3.0000 240.5827
extractAIC(poly2)
## [1] 4.00000 74.97184
extractAIC(poly3)
## [1] 5.00000 76.09727
extractAIC(poly4)
## [1] 6.00000 77.97617
extractAIC(poly5)
## [1] 7.00000 78.10213
#I would select the model poly2 (the cubic polynomial, x1 + x1^2 + x1^3) because it has the lowest AIC (74.97); adding higher-order terms only increases the AIC
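# As a cross-check on the AIC values, nested F-tests between successive polynomial orders can be
# obtained in a single call (sketch; each model adds one higher-order term to the previous one):
anova(lmodel, poly1, poly2, poly3, poly4, poly5)   # sequential F-tests between the nested fits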
Use the model to predict the value of x5 for x1 values = c(-5,-1,2,4)
predict(poly2,list(x1 = c(-5,-1,2,4)))
## 1 2 3 4
## -119.459714 3.898447 -8.135844 22.107415
You must work alone on this exam. Do not ask for or give assistance to other members of the class. It is due by Monday at midnight. For this portion of the exam, create a professional-looking report that answers these questions. You can include R code as markdown, but do not include excessive output; provide your answers to a non-technical manager, and back them up with stats/code/results.
Imagine you are consulting with a medical group that is recording a number of physiological sensors on their patients using a mobile device. Your ultimate goal is to create methods for displaying and cleaning data, and representing it with a simpler model that will be uploaded to your company servers and used to track your customers over time.
Unless otherwise specified, demonstrate the following for each of the four data sets separately. Your best approach will be to create a function that works on a single patient, and then apply it to each patient data set separately.
library(readr)
## Warning: package 'readr' was built under R version 4.0.2
library(unikn)
## Warning: package 'unikn' was built under R version 4.0.3
A2_1_MOPP = read_csv("C:/Users/Andrea/Desktop/PSY5210/R files/A2_1_MOPP.csv")
B1_2_BDU = read_csv("C:/Users/Andrea/Desktop/PSY5210/R files/B1_2_BDU.csv")
C2_2_MOPP = read_csv("C:/Users/Andrea/Desktop/PSY5210/R files/C2_2_MOPP.csv")
D2_2_MOPPnew = read_csv("C:/Users/Andrea/Desktop/PSY5210/R files/D2_2_MOPPnew.csv")
#plots ECG.HR, EDR.BR, Belt.BR, CoreTemp, and Temp all on the same time axis.
#Be aware of missing data, and data that are out of a reasonable range (i.e., temp=0), and convert these to NA values.
best_function = function(data_set){
  # Extract each signal, dropping NAs and implausible readings (e.g., temp = 0, HR = 0)
  temperature = data_set[,2][data_set[,2]>20 & !is.na(data_set[,2])]
  HR = data_set[,8][data_set[,8]>0 & !is.na(data_set[,8])]
  BR = data_set[,9][data_set[,9]>0 & !is.na(data_set[,9])]
  BELT_BR = data_set[,10][data_set[,10]>0 & !is.na(data_set[,10])]
  coretemp = data_set[,15][data_set[,15]>0 & !is.na(data_set[,15])]
  # Descriptive statistics shown along the top of the plot
  mean_hr = round(mean(HR),1)
  max_hr = round(max(HR),1)
  mean_temp = round(mean(temperature),1)
  max_temp = round(max(temperature),1)
  median_BR = round(median(BR),1)
  # Empty canvas, titled with the name of the data set that was passed in
  plot(NA, ylim = c(0,max(HR)*1.3), xlim = c(0,length(HR)), xlab = 'Time(5s)', ylab = '',
       main = deparse(substitute(data_set)))
  points(HR, type = 'l', col = 'grey')                    # raw HR
  lines(filter(HR,c(1,1,1,1)/4), col = "blue", lwd = 2)   # 4-point moving average of HR (stats::filter)
  points(BR, type = 'l', col = 'brown', lwd = 2)
  points(BELT_BR, type = 'l', col = 'yellow', lwd = 2)
  points(coretemp, type = 'l', col = 'green', lwd = 2)
  points(temperature, type = 'l', col = 'purple', lwd = 2)
  # Legend colors match the plotted traces
  legend('top', legend = c('HR','BR','BELT BR',' Core Temp',' Temp'),
         col = c('blue','brown','yellow','green','purple'), lty = 1, cex = 0.8, horiz = T, bty = 'n')
  # Horizontal gridlines every 15 units
  segments(-100,seq(0, max(HR)*1.1, by=15),1500,seq(0, max(HR)*1.1, by=15),lty = 3)
  # Annotate the descriptive statistics across the top of the plot (unikn::uline)
  uline(labels = c("mean HR =",mean_hr, "max HR = ", max_hr,"mean Temp C° = ",mean_temp,"max Temp C° = ",max_temp,"median BR = ", median_BR) ,x = c(length(HR)/10,length(HR)/6,length(HR)/10,length(HR)/6,length(HR)/2.3,length(HR)/1.85,length(HR)/2.3,length(HR)/1.85,length(HR)/1.2,length(HR)/1.1), y = c(max(HR)*1.12,max(HR)*1.12,max(HR)*1.19,max(HR)*1.19,max(HR)*1.12,max(HR)*1.12,max(HR)*1.19,max(HR)*1.19,max(HR)*1.165,max(HR)*1.165), cex = 0.75,col_bg = c('blue','blue','blue','blue','purple','purple','purple','purple','yellow','yellow'))
}
best_function (A2_1_MOPP)
# This plot lets you visualize trends in HR, BR, Belt BR, temperature, and core temperature over time, with some descriptive statistics printed along the top. The gridlines help read off values. For HR, the grey trace is the raw signal and the blue trace is a 4-point moving average that smooths the data and makes the trend easier to see. The function also filters out NAs and implausible values (such as zero or near-zero readings) before plotting.
best_function (B1_2_BDU)
# This second plot works, but the traces overlap. This may be because data collection was stopped about halfway through and then restarted, which would also explain why the time variable stops and then restarts. Taken at face value, the record would describe a subject doing heavy aerobic exercise, with heart rate peaking at 221 bpm (which is unlikely) and a median breathing rate of 50.8 (the median is used because the distribution is not normal). If that were really the case, we would also expect higher mean and maximum temperatures than we observe.
best_function (C2_2_MOPP)
best_function (D2_2_MOPPnew)
# These last two subjects look very similar, although D2_2_MOPPnew shows higher overall HR and temperature values.
#Overall, plotting all of these variables in a single panel may not be the wisest choice: temperature and core temperature change so little relative to the HR scale that they appear as flat lines and are hard to interpret. If this were for a research setting, I would also suggest filtering noise from the other three signals, as is commonly recommended in the literature.
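# The prompt asks for out-of-range values to be converted to NA; best_function instead drops
# them, which shortens each series. A small helper along these lines could be used instead
# (a sketch; the column index and cutoff in the example are hypothetical, mirroring best_function):
clean_series = function(x, lower){
  x = as.numeric(unlist(x))        # tolerate a tibble column as input
  x[is.na(x) | x <= lower] = NA    # implausible readings (e.g., temp = 0) become NA
  x
}
# e.g., HR_clean = clean_series(A2_1_MOPP[,8], lower = 0)
# Plotting HR_clean keeps the time axis aligned and simply leaves gaps where the data were bad.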
A2_1_MOPP$real_time = seq(1,length(A2_1_MOPP$Time)*5,by=5)
loess_span_10 = loess(`ECG HR`~real_time, data = A2_1_MOPP, span=0.10)
loess_span_30 = loess(`ECG HR`~real_time, data = A2_1_MOPP, span=0.30)
loess_span_65 = loess(`ECG HR`~real_time, data = A2_1_MOPP, span=0.65)
loess_predict_10 = predict(loess_span_10)
loess_predict_30 = predict(loess_span_30)
loess_predict_65 = predict(loess_span_65)
# first loess graph span = 10%
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.2
theme_set(theme_classic())
ggplot(A2_1_MOPP, aes(x=real_time,y=`ECG HR`)) +
geom_line() +
labs(title="ECG HR with Loess regression line",
subtitle="10% smoothing span",
caption="Subject: A2_1_MOPP",
x='Time(s)',
y="bpm")+
geom_smooth(span = 0.1)
# 2nd loess plot with span = 30% and 65%
plot(A2_1_MOPP$real_time,A2_1_MOPP$`ECG HR`,type = 'l', main = 'ECG HR with larger Loess smoothing span Subject: A2_1_MOPP', xlab = 'Time(s)', ylab = 'ECG HR',col = 'darkgrey')
lines(A2_1_MOPP$real_time,loess_predict_30, col="blue",lwd = 3)
lines(A2_1_MOPP$real_time,loess_predict_65, col="green",lwd = 3)
legend('bottomright',legend = c('30% span','65% span'),col = c('blue','green'), lty = 1, lwd = 3)
#In the first plot, the 10% span smooths the data while keeping every major bend in the series; a lot of noise is filtered out, but the sharp ECG HR peaks are flattened, which may matter if we need to report them. So a 10% span fits the data reasonably well, with some minor drawbacks. The second plot shows 30% and 65% spans: at 0.3 we only capture the rough ups and downs, and at 0.65 almost all of the structure in the data is lost.
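# One simple way to quantify how much each span smooths away is to compare the spread of the
# residuals from the three loess fits above (a quick sketch):
sapply(list(span_10 = loess_span_10, span_30 = loess_span_30, span_65 = loess_span_65),
       function(m) sd(residuals(m)))   # larger span -> smoother curve -> larger residual SD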
BR_A2 = c(A2_1_MOPP$`EDR BR`, A2_1_MOPP$`Belt BR`)
BR_B1 = c(B1_2_BDU$`EDR BR`, B1_2_BDU$`Belt BR`)
time_predictor_A2 = rep(1:length(A2_1_MOPP$`EDR BR`),2)
time_predictor_B1 = rep(1:length(B1_2_BDU$`EDR BR`),2)
type_A2 = as.factor(rep(c('ecg','belt'),each=length(A2_1_MOPP$`EDR BR`)))
type_B1 = as.factor(rep(c('ecg','belt'),each=length(B1_2_BDU$`EDR BR`)))
df_A2 = data.frame(time = time_predictor_A2,type = type_A2,BRrate = BR_A2)
df_B1 = data.frame(time = time_predictor_B1,type = type_B1,BRrate = BR_B1)
poly1_A2 = lm(BRrate~time+I(time^2)+type,data = df_A2)
poly2_A2 = lm(BRrate~time+I(time^2)+I(time^3)+type,data = df_A2)
poly3_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+type,data = df_A2)
poly4_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+type,data = df_A2)
poly5_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+type,data = df_A2)
poly6_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+type,data = df_A2)
poly7_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+type,data = df_A2)
poly8_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+type,data = df_A2)
poly9_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+type,data = df_A2)
poly10_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+type,data = df_A2)
poly11_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+type,data = df_A2)
poly12_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+type,data = df_A2)
poly13_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+type,data = df_A2)
poly14_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+I(time^15)+type,data = df_A2)
poly15_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+I(time^15)+I(time^16)+type,data = df_A2)
poly16_A2 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+I(time^15)+I(time^16)+I(time^17)+type,data = df_A2)
extractAIC(poly1_A2)
## [1] 4.000 3600.168
extractAIC(poly2_A2)
## [1] 5.000 3597.911
extractAIC(poly3_A2)
## [1] 6.000 3599.907
extractAIC(poly4_A2)
## [1] 7.000 3601.839
extractAIC(poly5_A2)
## [1] 8.000 3468.746
extractAIC(poly6_A2)
## [1] 9.000 3468.921
extractAIC(poly7_A2)
## [1] 10.000 3443.102
extractAIC(poly8_A2)
## [1] 11.000 3354.447
extractAIC(poly9_A2)
## [1] 12.000 3346.762
extractAIC(poly10_A2)
## [1] 13.00 3348.62
extractAIC(poly11_A2)
## [1] 14.000 3335.522
extractAIC(poly12_A2)
## [1] 14.000 3335.522
extractAIC(poly13_A2)
## [1] 15.000 3107.627
extractAIC(poly14_A2)
## [1] 15.000 3107.627
extractAIC(poly15_A2)
## [1] 16.000 3053.123
extractAIC(poly16_A2)
## [1] 16.000 3053.123
poly1_B1 = lm(BRrate~time+I(time^2)+type,data = df_B1)
poly2_B1 = lm(BRrate~time+I(time^2)+I(time^3)+type,data = df_B1)
poly3_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+type,data = df_B1)
poly4_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+type,data = df_B1)
poly5_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+type,data = df_B1)
poly6_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+type,data = df_B1)
poly7_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+type,data = df_B1)
poly8_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+type,data = df_B1)
poly9_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+type,data = df_B1)
poly10_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+type,data = df_B1)
poly11_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+type,data = df_B1)
poly12_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+type,data = df_B1)
poly13_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+type,data = df_B1)
poly14_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+I(time^15)+type,data = df_B1)
poly15_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+I(time^15)+I(time^16)+type,data = df_B1)
poly16_B1 = lm(BRrate~time+I(time^2)+I(time^3)+I(time^4)+I(time^5)+I(time^6)+I(time^7)+I(time^8)+I(time^9)+I(time^10)+I(time^11)+I(time^12)+I(time^13)+I(time^14)+I(time^15)+I(time^16)+I(time^17)+type,data = df_B1)
extractAIC(poly1_B1)
## [1] 4.000 9137.523
extractAIC(poly2_B1)
## [1] 5.000 9120.276
extractAIC(poly3_B1)
## [1] 6.000 9108.001
extractAIC(poly4_B1)
## [1] 7.000 9109.952
extractAIC(poly5_B1)
## [1] 8.000 9109.804
extractAIC(poly6_B1)
## [1] 9.000 9078.308
extractAIC(poly7_B1)
## [1] 10.000 9038.351
extractAIC(poly8_B1)
## [1] 11.000 9030.223
extractAIC(poly9_B1)
## [1] 12.000 9025.488
extractAIC(poly10_B1)
## [1] 13.000 8996.828
extractAIC(poly11_B1)
## [1] 14.000 8991.383
extractAIC(poly12_B1)
## [1] 14.000 8991.383
extractAIC(poly13_B1)
## [1] 15.000 8993.207
extractAIC(poly14_B1)
## [1] 15.000 8993.207
extractAIC(poly15_B1)
## [1] 16.000 8993.629
extractAIC(poly16_B1)
## [1] 16.000 8993.629
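# The long list of hand-written polynomial models above can also be generated in a loop;
# poly(..., raw = TRUE) fits the same raw power terms, so the AICs should match those printed
# above (a sketch for df_A2; the same idea applies to df_B1):
orders = 2:17   # highest power in each model, matching poly1_A2 ... poly16_A2
aic_A2 = sapply(orders, function(k){
  fit = lm(BRrate ~ poly(time, k, raw = TRUE) + type, data = df_A2)
  extractAIC(fit)[2]
})
data.frame(order = orders, AIC = aic_A2)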
par(mfrow=c(1,2))
plot(x = 1:(length(df_A2$time)/2), y = df_A2$BRrate[0:(length(df_A2$time)/2)], bty = 'n', pch = '', ylab = 'BR', xlab = 'Time(s)',main='BR by ECG and Belt records with 16th order poly')
points(x = 1:(length(df_A2$time)/2),df_A2$BRrate[0:(length(df_A2$time)/2)], col="#697EFF",pch = 17,)
points(x = 1:(length(df_A2$time)/2),df_A2$BRrate[-(1:(length(df_A2$time)/2))], col="#FD6161",pch = 15,)
lines(x = 1:(length(df_A2$time)/2), poly15_A2$fitted.values[0:(length(df_A2$time)/2)],col='blue', lwd=5)
lines(x = 1:(length(df_A2$time)/2), poly15_A2$fitted.values[-(1:(length(df_A2$time)/2))],col='red', lwd=5)
legend('topleft', legend = c('ecg','belt','poly-ecg','poly-belt'), pch = c(17,15,NA,NA), lty = c(NA,NA,1,1),col = c('#697EFF','#FD6161','blue','red'),bty = "n", lwd = 2, cex = 1.2)
text(535,21,labels = 'Subject:A2')
plot(x = 1:(length(df_B1$time)/2), y = df_B1$BRrate[0:(length(df_B1$time)/2)], bty = 'n', pch = '', ylab = 'BR', xlab = 'Time(s)',main='BR by ECG and Belt records with 12th order poly')
points(x = 1:(length(df_B1$time)/2),df_B1$BRrate[0:(length(df_B1$time)/2)], col="#697EFF",pch = 17,)
points(x = 1:(length(df_B1$time)/2),df_B1$BRrate[-(1:(length(df_B1$time)/2))], col="#FD6161",pch = 15,)
lines(x = 1:(length(df_B1$time)/2), poly11_B1$fitted.values[0:(length(df_B1$time)/2)],col='blue', lwd=5)
lines(x = 1:(length(df_B1$time)/2), poly11_B1$fitted.values[-(1:(length(df_B1$time)/2))],col='red', lwd=5)
#legend('bottom', legend = c('ecg','belt','poly3ecg','poly3belt'), pch = c(17,15,NA,NA), lty = c(NA,NA,1,1),col = c('#697EFF','#FD6161','red','blue'),bty = "n", lwd = 2, cex = 1.2,)
text(1000,6,labels = 'Subject:B1')
summary(poly15_A2)
##
## Call:
## lm(formula = BRrate ~ time + I(time^2) + I(time^3) + I(time^4) +
## I(time^5) + I(time^6) + I(time^7) + I(time^8) + I(time^9) +
## I(time^10) + I(time^11) + I(time^12) + I(time^13) + I(time^14) +
## I(time^15) + I(time^16) + type, data = df_A2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0131 -2.5297 -0.5846 1.8988 12.0227
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.246e+01 1.709e+00 7.292 5.62e-13 ***
## time 3.170e+00 3.537e-01 8.962 < 2e-16 ***
## I(time^2) -2.591e-01 2.439e-02 -10.626 < 2e-16 ***
## I(time^3) 9.485e-03 8.008e-04 11.844 < 2e-16 ***
## I(time^4) -1.864e-04 1.494e-05 -12.471 < 2e-16 ***
## I(time^5) 2.210e-06 1.748e-07 12.646 < 2e-16 ***
## I(time^6) -1.698e-08 1.356e-09 -12.520 < 2e-16 ***
## I(time^7) 8.818e-11 7.224e-12 12.207 < 2e-16 ***
## I(time^8) -3.157e-13 2.679e-14 -11.781 < 2e-16 ***
## I(time^9) 7.793e-16 6.904e-17 11.288 < 2e-16 ***
## I(time^10) -1.296e-18 1.205e-19 -10.759 < 2e-16 ***
## I(time^11) 1.356e-21 1.328e-22 10.213 < 2e-16 ***
## I(time^12) -7.266e-25 7.519e-26 -9.664 < 2e-16 ***
## I(time^13) NA NA NA NA
## I(time^14) 1.658e-31 1.932e-32 8.584 < 2e-16 ***
## I(time^15) NA NA NA NA
## I(time^16) -3.747e-38 4.959e-39 -7.556 8.37e-14 ***
## typeecg 1.262e+00 2.096e-01 6.021 2.32e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.606 on 1168 degrees of freedom
## Multiple R-squared: 0.4579, Adjusted R-squared: 0.451
## F-statistic: 65.78 on 15 and 1168 DF, p-value: < 2.2e-16
summary(poly11_B1)
##
## Call:
## lm(formula = BRrate ~ time + I(time^2) + I(time^3) + I(time^4) +
## I(time^5) + I(time^6) + I(time^7) + I(time^8) + I(time^9) +
## I(time^10) + I(time^11) + I(time^12) + type, data = df_B1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.7067 -2.9864 -0.2894 3.2038 18.3369
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.819e+00 1.500e+00 1.213 0.225429
## time 1.069e+00 1.112e-01 9.614 < 2e-16 ***
## I(time^2) -2.122e-02 2.727e-03 -7.781 1.03e-14 ***
## I(time^3) 2.073e-04 3.153e-05 6.574 5.91e-11 ***
## I(time^4) -1.172e-06 2.049e-07 -5.721 1.18e-08 ***
## I(time^5) 4.185e-09 8.231e-10 5.084 3.97e-07 ***
## I(time^6) -9.883e-12 2.158e-12 -4.580 4.88e-06 ***
## I(time^7) 1.580e-14 3.797e-15 4.161 3.27e-05 ***
## I(time^8) -1.717e-17 4.516e-18 -3.802 0.000147 ***
## I(time^9) 1.249e-20 3.583e-21 3.486 0.000499 ***
## I(time^10) -5.819e-24 1.816e-24 -3.205 0.001369 **
## I(time^11) 1.569e-27 5.314e-28 2.952 0.003188 **
## I(time^12) -1.861e-31 6.834e-32 -2.723 0.006508 **
## typeecg 2.887e+01 2.227e-01 129.595 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.666 on 2574 degrees of freedom
## Multiple R-squared: 0.8691, Adjusted R-squared: 0.8684
## F-statistic: 1314 on 13 and 2574 DF, p-value: < 2.2e-16
# For subject A2, a 16th-order polynomial gives the best fit of those compared, judging by the AIC values, and the sensor type variable is a significant predictor of BR (p = 2.3e-09). The same holds for subject B1, where a 12th-order polynomial was selected and type is again strongly significant (p < 2e-16). However, the A2 model explains much less variance (multiple R-squared = 0.458) than the B1 model (R-squared = 0.869); the difference is largely because in B1 the two sensors record very different BR levels, so the type term alone accounts for a large share of the variance.
library(BayesFactor)
library(Bolstad)
## Warning: package 'Bolstad' was built under R version 4.0.3
linear_model_D2 = lm(BRrate~time+I(time^2)+type,data = df_A2)
SubjectD2 = D2_2_MOPPnew
SubjectD2$Time= (1:length(SubjectD2$Time))
SubjectD2$Time_squared = SubjectD2$Time * SubjectD2$Time
SubjectD2$Time_cubic = SubjectD2$Time * SubjectD2$Time * SubjectD2$Time
SubjectD2$Interaction = SubjectD2$`EDR BR`*SubjectD2$`ECG HR`
# modelBF_D2 = regressionBF(`Belt BR`~ `EDR BR`+ Temp + CoreTemp + `ECG HR` + Motion + `Body Pos` + Time + Time_squared + Time_cubic + Interaction, data = SubjectD2)
#
# plot(head(modelBF_D2))
# plot(head(modelBF_D2)/max(modelBF_D2)) # comparing it to the other best models
# The best model selected by the Bayesian regression (regressionBF) for predicting the belt-sensor breathing rate includes `EDR BR`, Temp, Time, Time_squared, Time_cubic, and the Interaction term. The interaction between EDR BR and ECG HR is a useful predictor, along with EDR BR and Time. This model is about 1e+207 times better than the intercept-only model, and I prefer this approach over the others because it evaluates all possible predictor combinations and is less prone to human error.
best_model = bayes.lm (`Belt BR`~`EDR BR`+ Temp + Time + Time_squared + Time_cubic + Interaction,data = SubjectD2)
## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts argument
## ignored
summary(best_model)
##
## Call:
## bayes.lm(formula = `Belt BR` ~ `EDR BR` + Temp + Time + Time_squared +
## Time_cubic + Interaction, data = SubjectD2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1790 -1.1535 -0.1132 0.9804 7.7195
##
## Coefficients:
## Posterior Mean Std. Error t value
## (Intercept) 2.110e+01 3.397e-03 6.213e+03
## `EDR BR` 5.884e-01 1.501e-03 3.920e+02
## Temp -1.797e-01 1.383e-01 -1.299e+00
## Time 2.453e-02 8.428e-05 2.910e+02
## Time_squared -4.806e-05 1.572e-10 -3.058e+05
## Time_cubic 2.636e-08 3.638e-17 7.245e+08
## Interaction 7.353e-04 5.690e-08 1.292e+04
## ---
##
## Note: No prior given (Using flat prior).
plot(SubjectD2$Time, SubjectD2$`Belt BR`, pch = 1, cex = 1.2, col = 'black', main = 'Belt BR and Fitted Values Subject D2', xlab = 'Time(5s)', ylab = 'Belt BR')
points(SubjectD2$Time, SubjectD2$`Belt BR`,col = 'orange',cex = 1, pch = 16)
points(SubjectD2$Time,best_model$fitted.values,col = 'red', pch = 1, cex = 1.2)
points(SubjectD2$Time,best_model$fitted.values,col = 'blue', pch = 16, cex = 1)
legend(660, 32, legend = c('Values','Fitted Values'), pch = c(16,16), col = c('orange','blue'))
ols_model = lm(`Belt BR`~`EDR BR`+ Temp + Time + Time_squared + Time_cubic + Interaction,data = SubjectD2) # ordinary least-squares version of the same model (named ols_model to avoid masking the lm function)
summary(ols_model)$r.squared
## [1] 0.7377614
#This model accounts for about 73.8% of the total variability in Belt BR (multiple R-squared = 0.738).
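# Sanity check: with a flat prior, bayes.lm should essentially reproduce the ordinary
# least-squares fit, so the two sets of fitted values should fall on the identity line (sketch):
plot(ols_model$fitted.values, best_model$fitted.values,
     xlab = 'OLS fitted Belt BR', ylab = 'bayes.lm fitted Belt BR',
     main = 'Flat-prior Bayesian fit vs OLS fit')
abline(0, 1, col = 'red')   # identity line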