Generalized reverse Discrete Choice Models

Author

SANJOR

Importation des données

Les données sont disponible dans le package Ecdat

library(Ecdat)
Warning: le package 'Ecdat' a été compilé avec la version R 4.4.3
Le chargement a nécessité le package : Ecfun
Warning: le package 'Ecfun' a été compilé avec la version R 4.4.3

Attachement du package : 'Ecfun'
L'objet suivant est masqué depuis 'package:base':

    sign

Attachement du package : 'Ecdat'
L'objet suivant est masqué depuis 'package:datasets':

    Orange
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(stargazer)

Please cite as: 

 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 

Explorer les données

#data(package = "Ecdat")
Yogurt %>% as.tibble() -> df
Warning: `as.tibble()` was deprecated in tibble 2.0.0.
ℹ Please use `as_tibble()` instead.
ℹ The signature and semantics have changed, see `?as_tibble`.

Proportion des yaourts

Yogurt%>%select(choice)%>%table()%>%prop.table()%>%round(5)
choice
yoplait  dannon  hiland  weight 
0.33914 0.40216 0.02944 0.22927 
# Moyenne et écart type des varaibles numériques
Yogurt %>%
 summarise(across(where(is.numeric) & !all_of("id"), list(mean = mean, sd = sd), na.rm = TRUE))
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `across(...)`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.

  # Previously
  across(a:b, mean, na.rm = TRUE)

  # Now
  across(a:b, \(x) mean(x, na.rm = TRUE))
  feat.yoplait_mean feat.yoplait_sd feat.dannon_mean feat.dannon_sd
1        0.05597015       0.2299117       0.03772803      0.1905772
  feat.hiland_mean feat.hiland_sd feat.weight_mean feat.weight_sd
1       0.03689884      0.1885525       0.03772803      0.1905772
  price.yoplait_mean price.yoplait_sd price.dannon_mean price.dannon_sd
1           10.68213         1.906265          8.163474        1.062886
  price.hiland_mean price.hiland_sd price.weight_mean price.weight_sd
1          5.362935        0.805391          7.949088       0.7735004
# 
Yogurt %>%
  group_by(choice) %>%
  summarise(proportion = n())
# A tibble: 4 × 2
  choice  proportion
  <fct>        <int>
1 yoplait        818
2 dannon         970
3 hiland          71
4 weight         553

Création d’un jeu de données mixte

set.seed(123)
data <- tibble(
 groupe = sample(c("A", "B", "C"), 100, replace = TRUE),
 x = rnorm(100, mean = 50, sd = 10),  # Variable quantitative
 y = rnorm(100, mean = 30, sd = 5)    # Variable quantitative
)
data %>%
 summarise(
   mean_x = mean(x, na.rm = TRUE),
   sd_x = sd(x, na.rm = TRUE),
   min_x = min(x, na.rm = TRUE),
   max_x = max(x, na.rm = TRUE),
   mean_y = mean(y, na.rm = TRUE),
   sd_y = sd(y, na.rm = TRUE)
   )
# A tibble: 1 × 6
  mean_x  sd_x min_x max_x mean_y  sd_y
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
1   49.4  9.57  26.9  71.9   30.0  4.71
stargazer(Yogurt, type = "text", summary = TRUE)

================================================
Statistic       N    Mean  St. Dev.  Min   Max  
------------------------------------------------
id            2,412 48.704  27.878    1    100  
feat.yoplait  2,412 0.056   0.230     0     1   
feat.dannon   2,412 0.038   0.191     0     1   
feat.hiland   2,412 0.037   0.189     0     1   
feat.weight   2,412 0.038   0.191     0     1   
price.yoplait 2,412 10.682  1.906   0.300 19.300
price.dannon  2,412 8.163   1.063   1.900 11.100
price.hiland  2,412 5.363   0.805   2.500 8.600 
price.weight  2,412 7.949   0.774   0.400 10.400
------------------------------------------------

Reproduire Table 1

Un exemple pédagogique pour comprendre pivot longer

df <- tibble(
 id = c(1, 2, 3),
 math = c(80, 90, 85),
 science = c(75, 95, 88),
 english = c(78, 85, 82)
)

df
# A tibble: 3 × 4
     id  math science english
  <dbl> <dbl>   <dbl>   <dbl>
1     1    80      75      78
2     2    90      95      85
3     3    85      88      82

En réalité il n y a qu’une seule variable, c’est la note mais déclinée par élève et par matière. On pourrait créer 3 colonnes : id, note et matière

df_long <- df %>%
 pivot_longer(cols = c(math, science, english),
              names_to = "matiere",
              values_to = "note")
df_long
# A tibble: 9 × 3
     id matiere  note
  <dbl> <chr>   <dbl>
1     1 math       80
2     1 science    75
3     1 english    78
4     2 math       90
5     2 science    95
6     2 english    85
7     3 math       85
8     3 science    88
9     3 english    82

Pivoter le dataframe

Le data frame est dans un format peu commode qui s’appelle un format wide. Il y a beaucoup de colonnes (1 + 4 + 4 + 1 = 10) colonnes. Il est plsu commode et plus parlant statistiquement de créer un format long avec des colonnes id, feat,price, choice. On sera obliger d’ajouter une autre colonne qui identifie la marque en question

Yogurt |>
 mutate(purshase = as.integer(row_number()),id=as.integer(id)) |>  
 pivot_longer(cols = starts_with("price") | starts_with("feat"),
              names_to = c(".value", "brand"),
              names_sep = "\\.") |>  
 relocate( choice, .after = last_col())->Yogurt_long
Yogurt_long
# A tibble: 9,648 × 6
      id purshase brand   price  feat choice
   <int>    <int> <chr>   <dbl> <dbl> <fct> 
 1     1        1 yoplait 10.8      0 weight
 2     1        1 dannon   8.1      0 weight
 3     1        1 hiland   6.10     0 weight
 4     1        1 weight   7.90     0 weight
 5     1        2 yoplait 10.8      0 dannon
 6     1        2 dannon   9.80     0 dannon
 7     1        2 hiland   6.40     0 dannon
 8     1        2 weight   7.50     0 dannon
 9     1        3 yoplait 10.8      0 dannon
10     1        3 dannon   9.80     0 dannon
# ℹ 9,638 more rows

Utiliser gtsummary

library(glue)
packageVersion("glue")
[1] '1.8.0'
library("gtsummary")
Warning: le package 'gtsummary' a été compilé avec la version R 4.4.3
Yogurt_long %>%
 mutate(price = price / 100, ms = as.numeric(brand == choice)) %>%
 tbl_summary(
   by = brand,  
   statistic = list(
     ms ~ "{mean}({sd})",
     feat ~ "{mean}({sd})",
     price ~ "{mean}({sd})"
   ),
   type = list(feat ~ "continuous", ms ~ "continuous"),
   digits = list(ms ~ 3, feat ~ 3, price ~ 3),
   include = c(ms, feat, price),
   label = list(ms = "Market Share", feat = "Feature", price = "Price"),
   missing = "no"
 ) %>%
 modify_header(
   all_stat_cols() ~ "**{level}**"  # Modifie les en-têtes pour supprimer "N = ..."
 ) %>%
 modify_header(
   label = "**Variable**"
 ) %>%
 bold_labels() %>%modify_caption("Summary statistics for Yogurt data")
Summary statistics for Yogurt data
Variable dannon1 hiland1 weight1 yoplait1
Market Share 0.402(0.490) 0.029(0.169) 0.229(0.420) 0.339(0.474)
Feature 0.038(0.191) 0.037(0.189) 0.038(0.191) 0.056(0.230)
Price 0.082(0.011) 0.054(0.008) 0.079(0.008) 0.107(0.019)
1 Mean(SD)