require("tidyverse")
Loading required package: tidyverse
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ----------------------------------------------------------------------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
require("data.table")
Loading required package: data.table
data.table 1.9.6 For help type ?data.table or https://github.com/Rdatatable/data.table/wiki
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, last
The following object is masked from ‘package:purrr’:
transpose
# Read the measurements and lower-case the column names
df <- read.csv("ximenaseguel.csv", header = TRUE)
df <- setnames(df, tolower(names(df)))
# Drop the identifier and difference columns, rename the measurement
# columns, and reshape to long format (one row per measurement)
df <- df %>%
  select(-c(nombre, dif..t.p., dif..t.b.)) %>%
  rename(real = medida.cono,
         paralela = medida.con.t.p.,
         bisectriz = medida.con.t.b.) %>%
  gather("grupo", "valor", real:bisectriz)
First, I look at the distribution of the measurements for each group.
ggplot(df, aes(x = valor, fill = grupo)) +
  geom_density(alpha = .3) +
  theme_minimal()
Then, I compare them.
ggplot(df, aes(x = grupo, y = valor, fill = grupo)) +
  geom_boxplot() +
  theme_minimal()
There does not appear to be much difference between the groups. I compare them with a one-way ANOVA.
comparacion.entre.grupos <- aov(valor~grupo, data = df)
comparacion.entre.grupos
Call:
aov(formula = valor ~ grupo, data = df)
Terms:
                    grupo Residuals
Sum of Squares    49.9643 1245.5536
Deg. of Freedom         2       165
Residual standard error: 2.74751
Estimated effects may be unbalanced
summary(comparacion.entre.grupos)
             Df Sum Sq Mean Sq F value Pr(>F)
grupo         2     50  24.982   3.309  0.039 *
Residuals   165   1246   7.549
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
There are indeed significant differences at the 0.05 level. I check which groups differ.
TukeyHSD(comparacion.entre.grupos, "grupo", ordered = TRUE)
Tukey multiple comparisons of means
95% family-wise confidence level
factor levels have been ordered
Fit: aov(formula = valor ~ grupo, data = df)
$grupo
                       diff         lwr      upr     p adj
bisectriz-real     1.089286 -0.13872361 2.317295 0.0933775
paralela-real      1.214286 -0.01372361 2.442295 0.0533471
paralela-bisectriz 0.125000 -1.10300932 1.353009 0.9685620
Now I plot the differences. Pairs whose confidence interval includes 0 do not differ significantly.
plot(TukeyHSD(comparacion.entre.grupos, "grupo"))
All of the intervals include 0. Although the omnibus test indicates a significant difference at the 0.05 level, in this case I suggest setting the significance threshold at 0.01 and retaining the null hypothesis that there is no difference between groups.
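As a quick numeric check of that suggestion, the Tukey-adjusted p-values can be compared against the stricter threshold directly (a minimal sketch reusing the objects fitted above; the name tukey is only illustrative):
# Extract the adjusted p-values and test them against the stricter alpha = 0.01
tukey <- TukeyHSD(comparacion.entre.grupos, "grupo")
tukey$grupo[, "p adj"] < 0.01  # all FALSE above, so no pair is significant at 0.01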
To compare the techniques subject by subject, the data has to go back to wide format. Because the subject identifier was dropped earlier, spread() fails with duplicate-identifier errors, so a row number within each group is added as the key.
df.correlacion <- df %>%
  group_by(grupo) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  spread("grupo", "valor")
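As a sanity check (a suggested addition, not part of the original script), the reshaped table should have one row per subject; the 165 residual degrees of freedom in the ANOVA imply 168 measurements, i.e. 56 subjects per technique:
nrow(df.correlacion)  # expected: 56
head(df.correlacion)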
fit1 <- lm(paralela ~ real, data = df.correlacion)
summary(fit1)
Call:
lm(formula = paralela ~ real, data = df.correlacion)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4149 -0.6106 -0.1809  0.6787  2.5383
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.21071    1.07695   2.053    0.045 *
real         0.95319    0.05026  18.964   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9199 on 54 degrees of freedom
Multiple R-squared: 0.8694, Adjusted R-squared: 0.867
F-statistic: 359.6 on 1 and 54 DF, p-value: < 2.2e-16
# Scatter plot of the two fitted variables with the OLS line; the plot
# title reports adjusted R2, intercept, slope and the slope's p-value
ggplotRegression <- function(fit) {
  require(ggplot2)
  ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
    geom_point() +
    stat_smooth(method = "lm", col = "red") +
    labs(title = paste("Adj R2 =", signif(summary(fit)$adj.r.squared, 5),
                       "Intercept =", signif(fit$coef[[1]], 5),
                       " Slope =", signif(fit$coef[[2]], 5),
                       " P =", signif(summary(fit)$coef[2, 4], 5)))
}
ggplotRegression(lm(paralela ~ real, data = df.correlacion))
fit1 <- lm(bisectriz ~ paralela, data = df.correlacion)
summary(fit1)
Call:
lm(formula = bisectriz ~ paralela, data = df.correlacion)
Residuals:
    Min      1Q  Median      3Q     Max
-4.3896 -1.5171 -0.0368  1.0711  8.6250
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3018     2.7656   0.832    0.409
paralela      0.8921     0.1222   7.303 1.33e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.285 on 54 degrees of freedom
Multiple R-squared: 0.4969, Adjusted R-squared: 0.4876
F-statistic: 53.33 on 1 and 54 DF, p-value: 1.327e-09
ggplotRegression(lm(bisectriz ~ paralela, data = df.correlacion))
Comparing the means, paralela and bisectriz do not differ significantly from each other, nor from the real measurement. Comparing the correlations, paralela correlates better with the real measurement than bisectriz does.
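To make that comparison explicit, both techniques can be correlated directly with the real measurement (a minimal sketch, assuming df.correlacion was rebuilt with the per-group row id as above):
# Pearson correlation of each technique with the real measurement
with(df.correlacion, cor(paralela, real))
with(df.correlacion, cor(bisectriz, real))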