Call:
lm(formula = treatment_1 ~ predictor, data = df)
Residuals:
   Min     1Q Median     3Q    Max
-5.148 -2.863  1.618  2.611  4.941
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.04237    1.34205   0.777    0.445
predictor    0.91795    0.09028  10.168 5.57e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.255 on 23 degrees of freedom
Multiple R-squared: 0.818, Adjusted R-squared: 0.8101
F-statistic: 103.4 on 1 and 23 DF, p-value: 5.575e-10
summary(model_2)
Call:
lm(formula = treatment_2 ~ predictor, data = df)
Residuals:
   Min     1Q Median     3Q    Max
-51.14 -25.44   0.90  19.88  46.59
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  17.3799    11.2528   1.544    0.136
predictor     0.1956     0.7569   0.258    0.798
Residual standard error: 27.29 on 23 degrees of freedom
Multiple R-squared: 0.002893, Adjusted R-squared: -0.04046
F-statistic: 0.06674 on 1 and 23 DF, p-value: 0.7984
These two models tell us the effect of the predictor within each group (treatment 1 and treatment 2) separately, but not the effect of the predictor across both groups.
We also don’t learn the effect of switching between treatment 1 and treatment 2.
“Tidy” Data (cont.)
The model needs a bit of work, but let’s see if we can visualize the data.
library(ggplot2) # provides ggplot() and geom_point()
p <- ggplot(df, aes(x = predictor))
p + geom_point(aes(y = treatment_1))
p + geom_point(aes(y = treatment_2))
We can create a separate graph for each condition, but plotting both on the same graph is awkward with the data in this shape.
“Tidy” Data (cont.)
Real-world data comes in many formats, which can make it difficult to work with during analysis.
When data is all in the same format, with columns as variables, rows as observations, and cells as values, we call this tidy data.
The tidyverse is a collection of packages built around the philosophy of “tidy” data.
We’ve already used one of these packages, ggplot2.
If you want to read more about the philosophy behind tidy data, feel free to read this article.
Wide vs. Long Data
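As a small illustration of the two layouts, here is the same set of made-up values stored both ways. The `subject` column, the `treatment` and `value` column names, and all numbers are hypothetical and are not taken from the lesson's `df`. In the wide layout each treatment gets its own column. In the long layout each row holds a single observation, which matches the tidy definition given earlier.

```r
# Wide layout: one row per subject, one column per treatment (hypothetical values)
wide <- data.frame(
  subject     = c("a", "b"),
  treatment_1 = c(5.1, 6.3),
  treatment_2 = c(20.4, 18.9)
)

# Long layout: one row per observation, with the treatment stored as a variable
long <- data.frame(
  subject   = c("a", "a", "b", "b"),
  treatment = c("treatment_1", "treatment_2", "treatment_1", "treatment_2"),
  value     = c(5.1, 20.4, 6.3, 18.9)
)

wide
long
```

Both frames contain the same four measurements; only the arrangement differs.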
Installing Packages
Let’s get started with our first tidyverse function (besides ggplot2). If you haven’t already, you will want to install the tidyverse package.
# install.packages("tidyverse")
library(tidyverse)
Loading tidyverse attaches its core packages all at once. You can also load only the specific package you want.
library(tidyr)
pivot_longer
To fix the problem we were encountering with our earlier dataframe, let’s transform the dataframe to have the following three columns.
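A minimal sketch of such a reshape with `tidyr::pivot_longer()` might look like the following. The lesson's `df` and its intended three columns are not shown here, so the data below is simulated with the same column names (`predictor`, `treatment_1`, `treatment_2`), and the new column names `treatment` and `value` are assumptions chosen for illustration.

```r
library(tidyr)
library(ggplot2)

# Simulated stand-in for the lesson's `df` (made-up values, same column names)
set.seed(1)
df <- data.frame(
  predictor   = seq_len(25),
  treatment_1 = seq_len(25) + rnorm(25, sd = 3),
  treatment_2 = rnorm(25, mean = 20, sd = 25)
)

# Stack the two treatment columns into name/value pairs
df_long <- pivot_longer(
  df,
  cols      = c(treatment_1, treatment_2),
  names_to  = "treatment", # assumed name for the column holding the old column names
  values_to = "value"      # assumed name for the column holding the measurements
)

head(df_long)

# With one row per observation, both conditions fit on a single graph
ggplot(df_long, aes(x = predictor, y = value, colour = treatment)) +
  geom_point()
```

Each original row becomes two rows, one per treatment, so the long frame has twice as many rows as `df`.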
Most things that can be done with the tidyverse can also be done in base R, so which one you use is up to you. We will go over a few more tidyverse equivalents of base R functions today.
dplyr
pivot_longer comes from the tidyr portion of the tidyverse. We will now go over the dplyr portion, which contains many other functions for dataframe manipulation.
base_mutate <- iris # creating a copy of the original dataframe
base_mutate$Average.Length <- (base_mutate$Sepal.Length + base_mutate$Petal.Length) / 2
head(base_mutate)
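For comparison, a dplyr version of the same step could be written with `mutate()`, which adds or modifies columns. This is a sketch of one likely equivalent rather than the lesson's own code; the name `dplyr_mutate` is made up for illustration.

```r
library(dplyr)

# Same derived column as the base R version, built with dplyr::mutate()
dplyr_mutate <- iris %>% # `dplyr_mutate` is a hypothetical name
  mutate(Average.Length = (Sepal.Length + Petal.Length) / 2)

head(dplyr_mutate)
```

Inside `mutate()`, columns are referred to by bare name, so the `base_mutate$` prefix is not needed, and `iris` itself is left unchanged.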