library(ggplot2)
1) Find the regression formula for the points below
Following the example in the textbook, we want to fit a quadratic regression model to the x and y points below. There are multiple ways to create a quadratic regression model; the way I chose was to use lm() with the formula \[y=\beta_0+\beta_1x+\beta_2x^2\] given in example 19.
In the summary below, our quadratic regression model accounts for about 99.99% of the total variance (multiple R-squared of 0.9999). The residual standard error is about 0.048, which is small relative to the range of the y values, so the model fits the points well. Our final quadratic regression formula is \[y=-0.505515-2.026159x+1.006506x^2\]
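As one illustration of the "multiple ways" mentioned above, the sketch below fits the same quadratic with `I(x^2)` and with `poly(x, 2, raw = TRUE)` instead of a separate squared variable. The names `fit_I` and `fit_poly` are illustrative assumptions and are not part of the original analysis.

```r
# Same data as in the exercise
x <- c(-0.98, 1.00, 2.02, 3.03, 4.00)
y <- c(2.44, -1.51, -0.47, 2.54, 7.52)

# I() protects x^2 so that ^ is treated as arithmetic, not formula syntax
fit_I <- lm(y ~ x + I(x^2))

# raw = TRUE uses plain powers of x rather than orthogonal polynomials
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE))

# Both parameterisations should report the same three coefficients
coef(fit_I)
coef(fit_poly)
```

Either form avoids keeping a separate squared variable in sync with x, which matters mostly when predicting at new x values.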
#Plotting out the x&Y
x<-c(-0.98,1.00,2.02,3.03,4.00)
y<-c(2.44,-1.51,-0.47,2.54,7.52)
d<-data.frame(x,y)
ggplot(d,aes(x,y))+geom_point()+theme_light()+labs(title="Quadratic regression")
#To express the quadratic term for the lm function, we add a new variable holding x^2 and use it as a second predictor
xsqrd<-x^2
quadLm<-lm(y~x+xsqrd)
summary(quadLm)
##
## Call:
## lm(formula = y ~ x + xsqrd)
##
## Residuals:
##        1        2        3        4        5
## -0.00677  0.01517  0.02141 -0.05586  0.02605
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.505515   0.031778  -15.91 0.003928 **
## x           -2.026159   0.026478  -76.52 0.000171 ***
## xsqrd        1.006506   0.007948  126.63 6.24e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04761 on 2 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9998
## F-statistic: 1.088e+04 on 2 and 2 DF, p-value: 9.19e-05
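The coefficients in the summary above do not have to be retyped by hand. As a sketch, `predict()` can evaluate the fitted curve directly from the model object. The names `quad_fit`, `grid`, `y_hat`, and `manual` are illustrative assumptions.

```r
# Same data and model as above, written with I(x^2) so predict() only needs x
x <- c(-0.98, 1.00, 2.02, 3.03, 4.00)
y <- c(2.44, -1.51, -0.47, 2.54, 7.52)
quad_fit <- lm(y ~ x + I(x^2))

# Evaluate the fitted curve on a fine grid of x values
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
grid$y_hat <- as.numeric(predict(quad_fit, newdata = grid))

# The same curve computed manually from the stored coefficients
b <- unname(coef(quad_fit))
manual <- b[1] + b[2] * grid$x + b[3] * grid$x^2

# TRUE when both routes agree up to floating-point tolerance
all.equal(grid$y_hat, manual)
```

Reading the coefficients from the model keeps full precision and avoids transcription errors if the data change.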
fx<-function(x){
y<-(-0.505515)-2.026159*x+1.006506*x^2
return(y)
}
ggplot(d,aes(x,y))+geom_point()+geom_function(fun=fx,colour="pink")+theme_light()+labs(title="Quadratic regression with best fit line")
2) Create best fit line for the non linear curve
We can use R's nonlinear least squares function, nls(), to create the non linear regression model. Using the formula given in example 20 \[y=\frac{x}{a+bx^2}\], we pass this equation as the right-hand side of the model formula. For simplicity, we use a = 1 and b = 1 as the starting values of the coefficients.
After the model is created, we predict the y-values from the model and plot the predicted curve over the original scatter plot. The resource to understand the nls() function is found here
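nls() can be sensitive to its starting values. One possible way to obtain them, offered here as an assumption and not as the textbook's method, is to linearise the model: rearranging \(y=\frac{x}{a+bx^2}\) gives \(x/y = a + bx^2\), which an ordinary lm() fit can estimate. The names `lin_fit`, `start_vals`, and `fit` are illustrative.

```r
# Same data as in the exercise
x <- c(0.10, 0.50, 1.00, 1.50, 2.00, 2.50)
y <- c(0.10, 0.28, 0.40, 0.40, 0.37, 0.32)
d <- data.frame(x, y)

# x/y = a + b*x^2 is linear in a and b, so lm() gives rough estimates
lin_fit <- lm(I(x / y) ~ I(x^2), data = d)
start_vals <- list(a = unname(coef(lin_fit)[1]),
                   b = unname(coef(lin_fit)[2]))

# Refine the rough estimates with nonlinear least squares on the original scale
fit <- nls(y ~ x / (a + b * x^2), data = d, start = start_vals)
coef(fit)
```

The linearised fit minimises error in x/y instead of y, so it is only used to seed nls(), not as the final answer.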
x<-c(0.10,0.50,1.00,1.50,2.00,2.50)
y<-c(0.10,0.28,0.40,0.40,0.37,0.32)
d<-data.frame(x,y)
#Plotting the original scatter plot
ggplot(d,aes(x,y))+geom_point()+theme_light()+labs(title="Non-linear function")
#model given in the textbook
fx<-function(x,a,b){
y<-x/(a+b*x^2)
return(y)
}
#creating the non linear regression model and produce its predictions for the plot
nonls<-nls(y~fx(x,a,b),d,start = list(a=1,b=1))
summary(nonls)
##
## Formula: y ~ fx(x, a, b)
##
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)
## a  1.48544    0.08777   16.92 7.15e-05 ***
## b  1.00212    0.05019   19.96 3.71e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01739 on 4 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.899e-07
mdl<-predict(nonls)
#Plot the scatter plus its predicted best fitted curve
ggplot(d,aes(x,y))+geom_point()+geom_smooth(aes(y=mdl),colour="pink")+theme_light()+labs(title="Non-linear regression with best fit curve")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
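The plot above passes the six predicted values through geom_smooth(), which, as its message says, fits a loess smoother to them, so the drawn line is a smooth of the predictions and not the model curve itself. A possible alternative, sketched below, evaluates the fitted model on a fine grid and draws it with geom_line(). The names `fit`, `curve_df`, and `p` are illustrative assumptions.

```r
library(ggplot2)

# Same data and model as in the exercise
x <- c(0.10, 0.50, 1.00, 1.50, 2.00, 2.50)
y <- c(0.10, 0.28, 0.40, 0.40, 0.37, 0.32)
d <- data.frame(x, y)
fit <- nls(y ~ x / (a + b * x^2), data = d, start = list(a = 1, b = 1))

# Evaluate the fitted model on 200 evenly spaced x values
curve_df <- data.frame(x = seq(min(x), max(x), length.out = 200))
curve_df$y <- as.numeric(predict(fit, newdata = curve_df))

# Points from the data, line from the model evaluated on the grid
p <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_line(data = curve_df, colour = "pink") +
  theme_light() +
  labs(title = "Non-linear regression with fitted curve")
p
```

Because the line comes from predict() on new x values, it follows the fitted function between the observed points instead of interpolating them.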
3) Create the best fit line for the non linear function below
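Following the same steps as above, we fit the function \[y=\frac{1}{1+e^{a+bx}}\] with nls(), again using a = 1 and b = 1 as starting values.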
x<-c(0.1,0.5,1.0,1.5,2.0,2.5)
y<-c(0,0,1,1,1,0)
fx<-function(x,a,b){
y<-1/(1+exp(a+b*x))
return(y)
}
d<-data.frame(x,y)
#Plotting the original scatter plot
ggplot(d,aes(x,y))+geom_point()+theme_light()+labs(title="Non-linear function")
#creating the non linear regression model and produce its predictions for the plot
nonls<-nls(y~fx(x,a,b),d,start = list(a=1,b=1))
summary(nonls)
##
## Formula: y ~ fx(x, a, b)
##
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)
## a    36.42  211289.32       0        1
## b   -48.51  261890.12       0        1
##
## Residual standard error: 0.5 on 4 degrees of freedom
##
## Number of iterations to convergence: 12
## Achieved convergence tolerance: 7.656e-06
mdl<-predict(nonls)
#Plot the scatter plus its predicted best fitted curve
ggplot(d,aes(x,y))+geom_point()+geom_smooth(aes(y=mdl),colour="pink")+theme_light()+labs(title="Non linear regression with best fit curve")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
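The very large standard errors in the summary above suggest that a and b are poorly determined by these six points. As a rough check, the sketch below evaluates the model at the reported estimates and recomputes the residual standard error. The names `a_hat`, `b_hat`, `fitted_vals`, and `rse` are illustrative assumptions; the estimates are copied from the summary and not refitted.

```r
# Same data as in the exercise
x <- c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5)
y <- c(0, 0, 1, 1, 1, 0)

# Model from the exercise
fx <- function(x, a, b) {
  1 / (1 + exp(a + b * x))
}

# Estimates copied from the nls() summary output
a_hat <- 36.42
b_hat <- -48.51

# Fitted values at the observed x: close to 0 up to x = 0.5, close to 1 from x = 1
fitted_vals <- fx(x, a_hat, b_hat)
round(fitted_vals, 3)

# Residual standard error with 6 observations and 2 parameters
rse <- sqrt(sum((y - fitted_vals)^2) / (length(y) - 2))
rse
```

The fitted values form what is essentially a step between x = 0.5 and x = 1, and only the last point (x = 2.5, y = 0) is missed, which reproduces the residual standard error of 0.5. Many steep curves would fit almost equally well, which is consistent with the large standard errors reported for a and b.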