library(tidyverse)
library(openintro)
library(statsr)
## Warning: package 'BayesFactor' was built under R version 4.1.3
## Warning: package 'coda' was built under R version 4.1.3
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "packedMatrix" of class "replValueSp"; definition not updated
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "packedMatrix" of class "mMatrix"; definition not updated
## Warning: replacing previous import 'dplyr::combine' by 'gridExtra::combine' when
## loading 'statsr'
library(broom)
low_v = read_csv("data/low_v.csv", show_col_types = FALSE)
high_v = read_csv("data/high_v.csv", show_col_types = FALSE)
all_v = bind_rows(low_v,high_v)
glimpse(all_v)
## Rows: 32
## Columns: 2
## $ V <dbl> 0.5, 0.9, 1.6, 1.7, 1.8, 2.1, 2.2, 2.5, 2.5, 2.2, 2.1, 1.8, 1.7, 1.6~
## $ I <dbl> 0.000238, 0.000431, 0.000761, 0.000785, 0.000861, 0.000995, 0.001055~
What are the dimensions of the dataset? What does each row represent?
Ans. The dataset has 32 rows and 2 columns. Each row records one measurement: a voltage V and the corresponding current I through the given resistor.
dim(all_v)
## [1] 32 2
We can visualize part of the data (for instance, V <= 5):
low_v5<-all_v%>%filter(V<=5)…
What type of plot would you use to display the voltage as a function of current?
Ans. I would use a line plot or a scatter plot to show the voltage as a function of current.
ggplot(low_v5) + geom_line(aes(x=I,y=V))
Display the resistance as a function of current. For this you need to calculate the resistance by mutating the data:
low_v5 = low_v5 %>% mutate(R=V/I)
ggplot(low_v5) + geom_line(aes(x=I,y=R)) + ylim(1500,2500)
## Warning: Removed 1 row(s) containing missing values (geom_path).
If the relationship looks linear, we can quantify the strength of the relationship with the correlation coefficient. There is a high correlation in the data.
low_v5 %>%
  summarise(cor(I, V))
## # A tibble: 1 x 1
## `cor(I, V)`
## <dbl>
## 1 0.857
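As a brief illustration of what `cor()` computes, the sketch below evaluates the Pearson correlation from its definition and compares it with the built-in function. The `current` and `voltage` vectors are made-up values for illustration only, not the lab measurements.

```r
# Hypothetical measurements (NOT the lab data)
current <- c(0.0002, 0.0005, 0.0008, 0.0010, 0.0012)
voltage <- c(0.45, 1.02, 1.55, 2.10, 2.38)

# Pearson correlation from its definition: the sum of cross-deviations
# divided by the square root of the product of the sums of squared
# deviations (the n - 1 factors cancel)
dx <- current - mean(current)
dy <- voltage - mean(voltage)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent
r_builtin <- cor(current, voltage)

print(c(manual = r_manual, builtin = r_builtin))
```

The coefficient is unitless and unchanged by rescaling either variable, so it would be the same whether the current is expressed in amperes or milliamperes.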
Looking at your plot from the previous exercise, describe the relationship between these two variables. Make sure to discuss the form, direction, and strength of the relationship as well as any unusual observations.
Just as you’ve used the mean and standard deviation to summarize a single variable, you can summarize the relationship between these two variables by finding the line that best follows their association. Use the following interactive function to select the line that you think does the best job of going through the cloud of points.
#plot_ss(x = I, y = V, data = low_v5)
However, the original model may not make physical sense, since it predicts a nonzero voltage at zero current. To impose this physical constraint, we force the intercept to zero by replacing the formula y ~ x with y ~ 0 + x.
m2 <- lm(V ~ 0+I, data = low_v5)
tidy(m2)
## # A tibble: 1 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 I 2023. 99.2 20.4 1.63e-17
glance(m2)
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.941 0.939 0.633 NA NA NA -25.5 54.9 57.5
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
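The zero-intercept fit above has a simple closed form, which the following sketch checks against `lm()`. It uses simulated data under an assumed 2000-ohm resistor with Gaussian noise; the names `sim_I`, `sim_V`, and `fit0` are hypothetical and not part of the lab.

```r
set.seed(1)

# Simulated data (assumption: ideal 2000-ohm resistor plus noise)
sim_I <- seq(0.0002, 0.0025, length.out = 20)
sim_V <- 2000 * sim_I + rnorm(20, sd = 0.05)

# Regression through the origin, same formula style as m2
fit0 <- lm(sim_V ~ 0 + sim_I)

# Closed-form least-squares slope for a line through the origin
slope_manual <- sum(sim_I * sim_V) / sum(sim_I^2)

# Without an intercept, R reports R-squared relative to the
# uncentered total sum of squares, sum(y^2)
r2_uncentered <- 1 - sum(resid(fit0)^2) / sum(sim_V^2)

print(c(lm = unname(coef(fit0)), manual = slope_manual))
print(c(summary = summary(fit0)$r.squared, manual = r2_uncentered))
```

Because R computes R-squared differently when the intercept is removed, the `r.squared` values reported for a zero-intercept model and for a model with an intercept are not directly comparable.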
ggplot(data = low_v5, aes(x = I, y = V)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ 0 + x, se = FALSE)
If someone saw the least squares regression line and not the actual data, how would they predict a voltage for a current I not in your data? Is it valid to use the model to predict the voltage for 5 mA?
Ans. They would substitute the current into the fitted line, V = 2023 * I, to predict the voltage for a value of I not in the data. At 5 mA this gives about 10 V (assuming I is recorded in amperes), which lies outside the V <= 5 range used for the fit, so the prediction is an extrapolation and should be treated with caution.
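Predictions from a fitted line can be obtained with `predict()`. The sketch below uses simulated data (an assumed 2000-ohm resistor, with hypothetical names `sim`, `fit`, and `new_points`) and also flags whether each new current falls inside the range that was used for fitting.

```r
set.seed(2)

# Simulated data (assumption: 2000-ohm resistor plus noise, NOT the lab data)
sim <- data.frame(I = seq(0.0002, 0.0025, length.out = 20))
sim$V <- 2000 * sim$I + rnorm(20, sd = 0.05)

# Zero-intercept linear model
fit <- lm(V ~ 0 + I, data = sim)

# New currents: one inside the fitted range, one outside (5 mA)
new_points <- data.frame(I = c(0.0015, 0.005))
pred <- predict(fit, newdata = new_points)

# Flag predictions whose current lies within the observed range
inside <- new_points$I >= min(sim$I) & new_points$I <= max(sim$I)

print(data.frame(new_points, V_pred = pred, inside_range = inside))
```

A prediction flagged as outside the range relies on the linear relationship continuing to hold where no data were observed.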
m1 <- lm(V ~ I, data = low_v5)
tidy(m1)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.645 0.230 2.81 0.00950
## 2 I 1559. 187. 8.33 0.0000000111
glance(m1)
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.735 0.725 0.563 69.4 0.0000000111 1 -21.8 49.5 53.4
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
m1_aug <- augment(m1)
ggplot(data = m1_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  xlab("Fitted values") +
  ylab("Residuals")
Is there any apparent pattern in the residuals plot? What does this indicate about the linearity of the relationship between the two variables?
Ans. The residuals scatter around zero without an apparent pattern, which indicates that a linear relationship between the two variables is reasonable.
ggplot(data = m1_aug, aes(x = .resid)) +
  geom_histogram(binwidth = 0.005) +
  xlab("Residuals")
### Ex 14
Based on the histogram, does the nearly normal residuals condition appear to be violated? Why or why not?
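Ans. No, the nearly normal residuals condition does not appear to be violated. The relevant check is that the histogram is roughly symmetric and unimodal around zero; linearity of the data is a separate condition.
Based on the residuals vs. fitted plot, does the constant variability condition appear to be violated? Why or why not?
Ans. The constant variability condition does not appear to be violated either. It holds when the spread of the residuals is roughly the same across the range of fitted values.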
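The diagnostic conditions discussed above can also be examined numerically. The following is an informal sketch on simulated data (hypothetical names `sim`, `fit`, `res`, `fit_vals`; an assumed linear relationship with constant-variance Gaussian noise), not a formal test and not the lab's prescribed method.

```r
set.seed(3)

# Simulated data (assumption: linear relationship with constant noise)
sim <- data.frame(I = seq(0.0002, 0.0025, length.out = 30))
sim$V <- 0.1 + 2000 * sim$I + rnorm(30, sd = 0.05)

fit <- lm(V ~ I, data = sim)
res <- resid(fit)
fit_vals <- fitted(fit)

# With an intercept, least-squares residuals sum to (numerically) zero
# and are uncorrelated with the fitted values by construction
print(sum(res))
print(cor(res, fit_vals))

# Informal constant-variability check: residual spread in the lower
# versus the upper half of the fitted values
lower <- fit_vals <= median(fit_vals)
spread <- c(lower = sd(res[lower]), upper = sd(res[!lower]))
print(spread)

# Informal normality check: correlation between the sample quantiles
# and the theoretical normal quantiles (values near 1 suggest normality)
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

Since the residuals are uncorrelated with the fitted values by construction, a residual plot is mainly useful for spotting curvature or changing spread, which these summaries only approximate.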
What type of plot would you use to display the voltage as a function of current?
ggplot(high_v) + geom_line(aes(x=I,y=V))
Display the resistance as a function of current. For this you need to calculate the resistance by mutating the data:
high_v = high_v %>% mutate(R=V/I)
ggplot(high_v) + geom_line(aes(x=I,y=R))
Create and compare quadratic models specified using different techniques:
high_v <- high_v %>%
  mutate(Isq = I^2)
mod2l <- lm(V ~ I, data = high_v)
mod2q_a <- lm(V ~ I + Isq, data = high_v)
mod2q_b <- lm(V ~ I + I(I^2), data = high_v)
mod2q_c <- lm(V ~ poly(I, 2), data = high_v)
mod2l_aug = augment(mod2l,high_v)
mod2q = mod2q_c
mod2q_aug = augment(mod2q_c,high_v)
plt = ggplot(mod2l_aug, aes(x = I, y = V)) +
  geom_point() +
  geom_line(aes(x = I, y = .fitted), col = "blue") +
  geom_line(data = mod2q_aug, aes(x = I, y = .fitted), col = "red")
plt + labs(title = "Linear (blue) vs. Quadratic (red) Polynomial Fits")
mod2q0 <- lm(V ~ 0+poly(I, 2), data = high_v)
summary(mod2q0)
##
## Call:
## lm(formula = V ~ 0 + poly(I, 2), data = high_v)
##
## Residuals:
## Min 1Q Median 3Q Max
## 2.751 3.057 3.900 5.123 6.957
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## poly(I, 2)1 1.749 4.760 0.367 0.719
## poly(I, 2)2 2.706 4.760 0.568 0.579
##
## Residual standard error: 4.76 on 14 degrees of freedom
## Multiple R-squared: 0.03167, Adjusted R-squared: -0.1067
## F-statistic: 0.229 on 2 and 14 DF, p-value: 0.7983
mod2q0_aug = augment(mod2q0,high_v)
plt+geom_line(data =mod2q0_aug, aes(x = I, y = .fitted), col = "red", linetype ='3313')
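One point worth checking is how removing the intercept interacts with `poly()`. By default `poly()` builds orthogonal polynomial columns that are centered, so `0 + poly(I, 2)` does not give a quadratic through the origin. The sketch below compares the parameterizations on simulated data; the data, the coefficients 2000 and 5e4, and the names `sim`, `fit_raw`, `fit_orth`, `fit0_orth`, and `fit0_raw` are all assumptions for illustration.

```r
set.seed(4)

# Simulated data (assumption: V = 2000*I + 5e4*I^2 plus noise, NOT the lab data)
sim <- data.frame(I = seq(0.001, 0.01, length.out = 16))
sim$V <- 2000 * sim$I + 5e4 * sim$I^2 + rnorm(16, sd = 0.05)

# With an intercept, raw and orthogonal parameterizations give the same fit
fit_raw  <- lm(V ~ I + I(I^2), data = sim)
fit_orth <- lm(V ~ poly(I, 2), data = sim)
print(all.equal(fitted(fit_raw), fitted(fit_orth)))

# Without an intercept they differ: the orthogonal columns are centered,
# so the fitted values average to zero and the residuals absorb mean(V)
fit0_orth <- lm(V ~ 0 + poly(I, 2), data = sim)
print(c(mean_resid = mean(resid(fit0_orth)), mean_V = mean(sim$V)))

# A quadratic through the origin uses raw powers of I instead
fit0_raw <- lm(V ~ 0 + I + I(I^2), data = sim)
print(predict(fit0_raw, newdata = data.frame(I = 0)))

# Residual standard errors of the two no-intercept fits
print(c(orth = summary(fit0_orth)$sigma, raw = summary(fit0_raw)$sigma))
```

In this simulation the residuals of the orthogonal no-intercept fit average to the mean of V, a pattern consistent with the all-positive residuals in the `summary(mod2q0)` output. If a zero-voltage-at-zero-current constraint is the goal, a raw-power formula such as `V ~ 0 + I + I(I^2)` is one way to express it.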