analisis

Cargar datos

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(haven)
library(ggstatsplot)

Warning: package 'ggstatsplot' was built under R version 4.4.1

You can cite this package as:
     Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
     Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167

library(lme4)

Warning: package 'lme4' was built under R version 4.4.1

Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

library(effects)

Warning: package 'effects' was built under R version 4.4.1

Loading required package: carData
lattice theme set by effectsTheme()
See ?effectsTheme for details.

library(ggeffects)

Warning: package 'ggeffects' was built under R version 4.4.1

library(emmeans)

Warning: package 'emmeans' was built under R version 4.4.1

Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'

library(gtsummary)

Warning: package 'gtsummary' was built under R version 4.4.1

library(sjPlot)

Warning: package 'sjPlot' was built under R version 4.4.1

Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!

df  <- readxl::read_xlsx("TOTAL.GRADO6.xlsx",1) %>%   
  mutate(Sex= factor(as.factor(sex),
                     levels = c("masculino", "femenino"), 
                     labels = c("Male", "Female")),
   Age=round(age,2),
   Rating=round(Nota,1),
   Usability=round(Usabilidad,1),
   Grupo=Grupo+1,
   Grupo=ifelse(Grupo==1 & site=="Málaga",0,Grupo), 
   Group= factor(as.factor(Grupo),
                 levels = c(0, 1,2,3), 
                 labels = c("Control-Z", "Control-A", "Zapworks", "Augmentaty")),
   Center=factor(as.factor(site),
                 levels=c("Cádiz","Málaga"),
   labels=c("Cadiz","Malaga")
   ))  %>% 
  select(Orden,Group,Center,Sex,Age,Rating,Usability,fn)
  
  
df %>% head()

# A tibble: 6 × 8
  Orden Group      Center Sex      Age Rating Usability fn        
  <dbl> <fct>      <fct>  <fct>  <dbl>  <dbl>     <dbl> <chr>     
1    43 Augmentaty Cadiz  Male    26.3    9          NA 1994-11-23
2    60 Augmentaty Cadiz  Male    25.8    8.5        65 1995-05-17
3    16 Control-A  Cadiz  Female  25.8    6.9        NA 1995-06-08
4   189 Zapworks   Malaga Female  25.7    9.2        59 1995-05-03
5   133 Control-Z  Malaga Female  24.6    7.9        NA 1996-06-03
6    76 Augmentaty Cadiz  Female  24.4    8.4        NA 1996-10-16

#write_sav(df,"baseDatos.sav")

Tabla uno

df %>%   tbl_summary(
    include=c(Sex,Age,Rating,Usability),
    by = Group, 
    missing = "no",
    type = list(Age = "continuous2",Rating = "continuous2",Usability = "continuous2"),
    statistic = list(
      all_continuous() ~ c("{mean} ({sd})", "{median} [{p25}, {p75}]", "{min}, {max}")
    )
  ) |> 
  add_n() |>
  add_p() |>
  modify_header(label = "**Variable**") |> 
  bold_labels()

The following warnings were returned during `modify_header()`:
! For variable `Usability` (`Group = "Control-Z"`) and "min" statistic: no
  non-missing arguments to min; returning Inf
! For variable `Usability` (`Group = "Control-Z"`) and "max" statistic: no
  non-missing arguments to max; returning -Inf
! For variable `Usability` (`Group = "Control-A"`) and "min" statistic: no
  non-missing arguments to min; returning Inf
! For variable `Usability` (`Group = "Control-A"`) and "max" statistic: no
  non-missing arguments to max; returning -Inf

Variable	N	Control-Z N = 59 ¹	Control-A N = 37 ¹	Zapworks N = 54 ¹	Augmentaty N = 49 ¹	p-value ²
Sex	199					0.4
Male		27 (46%)	15 (41%)	24 (44%)	28 (57%)
Female		32 (54%)	22 (59%)	30 (56%)	21 (43%)
Age	199					<0.001
Mean (SD)		20.88 (0.97)	21.45 (1.04)	20.23 (1.30)	21.62 (1.36)
Median [Q1, Q3]		20.86 [20.37, 21.17]	21.06 [20.81, 22.00]	19.83 [19.46, 20.47]	20.99 [20.65, 22.45]
Min, Max		19.15, 24.62	20.33, 25.76	19.07, 25.71	20.31, 26.29
Rating	199					<0.001
Mean (SD)		7.44 (0.76)	7.31 (0.95)	8.36 (0.52)	8.87 (0.37)
Median [Q1, Q3]		7.40 [6.90, 8.10]	7.40 [6.80, 8.00]	8.40 [8.10, 8.70]	9.00 [8.80, 9.00]
Min, Max		6.00, 8.90	5.30, 8.90	6.30, 9.20	7.30, 9.40
Usability	93					0.049
Mean (SD)		NA (NA)	NA (NA)	61.6 (4.0)	63.9 (6.1)
Median [Q1, Q3]		NA [NA, NA]	NA [NA, NA]	61.0 [59.0, 64.0]	63.0 [60.0, 64.0]
Min, Max		Inf, -Inf	Inf, -Inf	54.0, 76.0	54.0, 80.0
¹ n (%)
² Pearson’s Chi-squared test; Kruskal-Wallis rank sum test

Esta es la nueva estructura de la base de datos, etiquetada de la forma adecuada para poder usarla en formato de publicaciones.

Análisis descriptivo

df %>%  
ggstatsplot::ggbarstats(x=Sex, y=Group,bf.message = FALSE)

En ese gráfico tenemos la distribución de sexos según grupo, y no se aprecian diferencias en la distribución del sexo entre los tres grupos. Hay algo más de hombres en el grupo Augmentaty, pero la “V de Cramer” nos indica que el effect size es muy pequeño.

df %>% 
  ggbetweenstats(x=Group, y=Age, bf.message = FALSE)+
  ylab("Age in years")+xlab("")

#Hacemos la prueba no paramétrica de Kruskal-Wallis para confirmar que hay diferencias en la edad entre los tres grupos
df %>% 
  ggbetweenstats(x=Group, y=Age, bf.message = FALSE, type="non-parametric")+
  ylab("Age in years")+xlab("")

Prácticamente todos los grupos se diferencian entre sí por edades. No es que sea importante, pero es dificil justificar que la asignación de los individuos a su respectivo grupo es aleatoria. Cuidado con dar mucho peso a la aleatorización.

Análisis univariante de la variable Rating

df %>% 
  ggbetweenstats(x=Group, y=Rating, bf.message = FALSE)+
  ylab("Rating")+xlab("")

df %>% 
  ggbetweenstats(x=Group, y=Rating, bf.message = FALSE, type="non-parametric")+
  ylab("Rating")+xlab("")

De esto os interesa especialmente que los grupos de control son similares entre sí, y que a su vez ambos controles se diferencian de cada intervención. Además las intervenciones se diferencian entre sí.

Análisis Univariante en forma de tablas en lugar de gráficos

En forma de tabla y dada la falta de normalidad de los datos, lo más conveniente es usar pruebas robustas no paramétricas

df %>%   tbl_summary(
    include=c(Age,Rating,Usability),
    by = Group, # split table by group
    missing = "no" # don't list missing data separately
  ) |> 
  add_n() |> # add column with total number of non-missing observations
  add_p() |> # test for a difference between groups
  modify_header(label = "**Variable**") |> # update the column header
  bold_labels()

Variable	N	Control-Z N = 59 ¹	Control-A N = 37 ¹	Zapworks N = 54 ¹	Augmentaty N = 49 ¹	p-value ²
Age	199	20.86 (20.37, 21.17)	21.06 (20.81, 22.00)	19.83 (19.46, 20.47)	20.99 (20.65, 22.45)	<0.001
Rating	199	7.40 (6.90, 8.10)	7.40 (6.80, 8.00)	8.40 (8.10, 8.70)	9.00 (8.80, 9.00)	<0.001
Usability	93	NA (NA, NA)	NA (NA, NA)	61.0 (59.0, 64.0)	63.0 (60.0, 64.0)	0.049
¹ Median (Q1, Q3)
² Kruskal-Wallis rank sum test

Análisis multivariante

En estos análisis anteriores no hemos corregido por edad, y ya que existían diferencias significativs entre los grupos, vamos a tenerlo en cuenta. Adicionalmente añadimos la variable sexo. No se espera que cambie nada, pero añadimos que se ha hecho para tener más robustez en la interpretación de los resultados:

m1 <- lm(Rating ~ Group + Sex+Age, data = df)
gtsummary::tbl_regression(m1, digits=3, 
                          tidy_fun = broom.mixed::tidy,exponentiate = FALSE) %>% 
  add_significance_stars(
    hide_p=F,hide_se=F, hide_ci=F) %>%
  bold_p() %>% 
  #add_q() %>% 
  #add_vif() %>% 
  add_n(location="level")

Characteristic	N	Beta ¹	SE ²	95% CI ²	p-value
Group
Control-Z	59	—	—	—
Control-A	37	-0.14	0.142	-0.42, 0.15	0.3
Zapworks	54	0.92***	0.129	0.66, 1.2	<0.001
Augmentaty	49	1.4***	0.133	1.2, 1.7	<0.001
Sex
Male	94	—	—	—
Female	105	0.07	0.096	-0.12, 0.26	0.4
Age	199	0.00	0.041	-0.08, 0.08	>0.9
¹ p<0.05; p<0.01; p<0.001
² SE = Standard Error, CI = Confidence Interval

Y lo mismo para Usability:

m1 <- lm(Usability ~ Group + Sex+Age, data = df)
gtsummary::tbl_regression(m1, digits=3, 
                          tidy_fun = broom.mixed::tidy,exponentiate = FALSE) %>% 
  add_significance_stars(
    hide_p=F,hide_se=F, hide_ci=F) %>%
  bold_p() %>% 
  #add_q() %>% 
  #add_vif() %>% 
  add_n(location="level")

Characteristic	N	Beta ¹	SE ²	95% CI ²	p-value
Group
Zapworks	54	—	—	—
Augmentaty	39	2.7*	1.19	0.28, 5.0	0.029
Sex
Male	46	—	—	—
Female	47	0.66	1.06	-1.4, 2.8	0.5
Age	93	-0.26	0.424	-1.1, 0.58	0.5
¹ p<0.05; p<0.01; p<0.001
² SE = Standard Error, CI = Confidence Interval

En este último, las diferencias entre grupos, que estaban en el límite de ser significativos se han vuelto significativas más claramente.

No estaría de más explorar gráficamente como la edad afecta a las calificaciones, controlando por todo lo que tenemos.

df %>% mutate(Subgroup=paste(Group,Center)) %>%
  ggplot(aes(x=Age, y=Rating, color=Subgroup))+
  geom_point()+
  geom_smooth(method="lm")+
  ylab("Rating")+xlab("Age")+
  theme_classic()

`geom_smooth()` using formula = 'y ~ x'

No diría que la edad tiene mucho que decir en cuanto a afectar la Usability. No se alejan los modelo significativamente de la horizontal. Las diferencias en edad entre grupos no deberían tener consecuencia.