Avignon

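The tests:

* Chi-square (χ²) tests
* Wilcoxon
* MCA (multiple correspondence analysis, ACM in French)

Here we do not analyse the free-text answers (e.g. the open-ended “Pourquoi ?” column shown below, which is then dropped).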

kable(head(db_econfin_TP_u[,c(1,16)],2))

code	Pourquoi ?
2020-04-17_18:00:19	J’ai peur que cela ne suffise pas pour que l’on change suffisamment nos habitudes pour remédier à la crise écologique…
2020-04-18_21:07:51	Je ne suis pas certaine qu’elle engendre à elle seule un changement radical et général des habitudes de production et de consommation

db_econfin_TP_u = db_econfin_TP_u[,-16]
kable(head(db_econfin_TP_u))

code	day	year	sex	diplôme_plus_élevé	fam_st	religion	combien_pièces	liv	Avec_qui_confiné	travaille_confinement	after_COVID	ST_env	LT_env
2020-04-17_18:00:19	1	1995	Féminin	Bac +5	Célibataire	Athée	2	En zone urbaine	Avec un.e conjoint.e.	Oui, je suis en télétravail	Oui	Plutôt oui	Plutôt non
2020-04-18_21:07:51	2	1995	Féminin	Bac +5	Célibataire	NA	6	En zone urbaine	Chez vos parents	Oui, je suis en télétravail	Non	Plutôt non	Plutôt non
2020-04-19_15:01:46	3	1994	Féminin	Bac +5	Célibataire	Croyant non pratiquant (ex: de culture chrétienne)	8	En zone périurbaine	Chez vos parents	Oui, je suis en télétravail	Non	Plutôt oui	Plutôt non
2020-04-23_13:05:43	7	1962	Masculin	Bac +3	Marié.e	Bouddhiste	6	En zone rurale	Avec un.e conjoint.e.	Oui, je suis en télétravail	Je n’ai pas d’opinion	Plutôt oui	Plutôt non
2020-04-23_13:10:16	7	1994	Féminin	Bac +5	Célibataire	Croyant non pratiquant (ex: de culture chrétienne)	7	En zone urbaine	Chez vos parents	Oui, je suis en télétravail	Oui	Plutôt oui	Plutôt oui
2020-04-23_13:31:23	7	1969	Féminin	Bac ou bac professionnel	Célibataire	Agnostique	2	En zone urbaine	Seul.e	Oui, je suis sur mon site de travail habituel	Non	Plutôt oui	Plutôt non

The survey questions have already been renamed with short names so that they are easier to use in R, e.g.:

“Quel est votre diplôme le plus élevé?” (“What is your highest degree?”) → diplôme_plus_élevé

Recoding the variables

An example of a variable that needs recoding for the analyses:

table(db_econfin_TP_u$diplôme_plus_élevé)

## 
##                             Aucun diplôme 
##                                        16 
##                                    Bac +3 
##                                       263 
##                                    Bac +5 
##                                       466 
##                  Bac ou bac professionnel 
##                                       248 
## Brevet d'étude du premier cycle, CAP, BEP 
##                                        57 
##                          Doctorat et plus 
##                                        41 
##                                  DUT, BTS 
##                                        98

db_econfin_TP_u[db_econfin_TP_u == "NA"] = NA  # turn literal "NA" strings into real missing values
# diplôme_plus_élevé -> ES: 1 = higher education (post-bac degree), 0 = otherwise
db_econfin_TP_u = db_econfin_TP_u %>% mutate(ES = case_when(
  diplôme_plus_élevé %in% c("Bac +3", "Bac +5", "DUT, BTS", "Doctorat et plus") ~ 1,
  diplôme_plus_élevé %in% c("Aucun diplôme", "Bac ou bac professionnel", "Brevet d'étude du premier cycle, CAP, BEP") ~ 0))
# fam_st -> fam: 0 = single or widowed, 1 = in a couple or married
db_econfin_TP_u = db_econfin_TP_u %>% mutate(fam = case_when(
  fam_st %in% c("Célibataire", "Veuf.ve") ~ 0,
  fam_st %in% c("En couple (concubinage)", "Marié.e") ~ 1))
# Combien_enfants -> chi: 1 = at least one child, 0 = no children
db_econfin_TP_u = db_econfin_TP_u %>% mutate(chi = case_when(Combien_enfants >= 1 ~ 1, Combien_enfants < 1 ~ 0))

Joining the data frames

db_econfin_TP_u = left_join(db_econfin_TP_u,db_econfin_TP_rcodage, by = "code")

db_econfin_TP_u = db_econfin_TP_u %>% select(-c(diplôme_plus_élevé,Combien_enfants,fam_st,religion,combien_pièces,Avec_qui_confiné,travaille_confinement))

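Religion (religion), work during lockdown (travaille_confinement), who the respondent is confined with (Avec_qui_confiné) and the number of rooms (combien_pièces) were recoded following the same procedure. For example, we created a room variable with three categories: Moins de 3 (fewer than 3), Entre 3 et 6 (between 3 and 6), Plus de 6 (more than 6).

For religion, there are two categories: presence or absence of a traditional religion.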
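The room recoding itself is not shown. A minimal sketch with base R `cut()`, assuming combien_pièces holds integer room counts and that “Entre 3 et 6” includes both 3 and 6 (consistent with the rows displayed above, where 6 rooms gives Entre 3 et 6 and 7 or 8 gives Plus de 6). The helper name `recode_rooms` is hypothetical:

```r
# Hypothetical helper: bin an integer number of rooms into the three categories.
# Assumption: "Entre 3 et 6" includes both 3 and 6.
recode_rooms <- function(n_rooms) {
  cut(n_rooms,
      breaks = c(-Inf, 2, 6, Inf),   # intervals (-Inf, 2], (2, 6], (6, Inf]
      labels = c("Moins de 3", "Entre 3 et 6", "Plus de 6"))
}

recode_rooms(c(2, 6, 8, 7, NA))   # NA stays NA
```

Using `cut()` with explicit breaks keeps the category boundaries in one place, which makes the assumption about 3 and 6 easy to check or change.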

This gives the following dataset:

kable(head(db_econfin_TP_u))

code	day	year	sex	liv	after_COVID	ST_env	LT_env	ES	fam	rel	room	act	lon
2020-04-17_18:00:19	1	1995	Féminin	En zone urbaine	Oui	Plutôt oui	Plutôt non	1	0	0	Moins de 3	1	0
2020-04-18_21:07:51	2	1995	Féminin	En zone urbaine	Non	Plutôt non	Plutôt non	1	0	NA	Entre 3 et 6	1	0
2020-04-19_15:01:46	3	1994	Féminin	En zone périurbaine	Non	Plutôt oui	Plutôt non	1	0	0	Plus de 6	1	0
2020-04-23_13:05:43	7	1962	Masculin	En zone rurale	Je n’ai pas d’opinion	Plutôt oui	Plutôt non	1	1	1	Entre 3 et 6	1	0
2020-04-23_13:10:16	7	1994	Féminin	En zone urbaine	Oui	Plutôt oui	Plutôt oui	1	0	0	Plus de 6	1	0
2020-04-23_13:31:23	7	1969	Féminin	En zone urbaine	Non	Plutôt oui	Plutôt non	0	0	0	Moins de 3	1	1

Data exploration and visualisation

table(db_econfin_TP_u$sex,useNA = "ifany")

## 
##   Agenre  Féminin Masculin     <NA> 
##        1      872      313        4

lapply(db_econfin_TP_u[, 9:15], table, useNA = "ifany")

## $ES
## 
##    0    1 <NA> 
##  321  868    1 
## 
## $fam
## 
##    0    1 <NA> 
##  516  623   51 
## 
## $chi
## 
##    0    1 <NA> 
##  824  331   35 
## 
## $rel
## 
##    0    1 <NA> 
##  955  218   17 
## 
## $room
## 
## Entre 3 et 6   Moins de 3    Plus de 6         <NA> 
##          640          189          340           21 
## 
## $act
## 
##    0    1 <NA> 
##  370  781   39 
## 
## $lon
## 
##    0    1 <NA> 
## 1052  136    2

ggplot(db_econfin_TP_u) + aes(x = sex) + geom_bar() + theme_minimal()

# barplot(table(db_econfin_TP_u$sex,useNA = "ifany")) # not so useful 
ggplot(db_econfin_TP_u) + aes(x = liv) +  geom_bar() + theme_minimal()

The variables we want to explain

ggplot(db_econfin_TP_u) + aes(x = ST_env) +  geom_bar() + theme_minimal()

ggplot(db_econfin_TP_u) + aes(x = LT_env) +  geom_bar() + theme_minimal()

ggplot(db_econfin_TP_u) + aes(x = after_COVID) +  geom_bar() + theme_minimal()

table(db_econfin_TP_u$ST_env)

## 
## Plutôt non Plutôt oui 
##        152       1022

table(db_econfin_TP_u$ST_env,db_econfin_TP_u$sex,useNA = "ifany")

##             
##              Agenre Féminin Masculin <NA>
##   Plutôt non      1      95       56    0
##   Plutôt oui      0     766      252    4
##   <NA>            0      11        5    0

(test = db_econfin_TP_u %>% group_by(sex) %>% count(ST_env) %>% mutate(total_by_sex = sum(n)) %>% mutate(per = round(n/sum(n)*100,2)))

## # A tibble: 8 x 5
## # Groups:   sex [4]
##   sex      ST_env         n total_by_sex    per
##   <chr>    <chr>      <int>        <int>  <dbl>
## 1 Agenre   Plutôt non     1            1 100   
## 2 Féminin  Plutôt non    95          872  10.9 
## 3 Féminin  Plutôt oui   766          872  87.8 
## 4 Féminin  <NA>          11          872   1.26
## 5 Masculin Plutôt non    56          313  17.9 
## 6 Masculin Plutôt oui   252          313  80.5 
## 7 Masculin <NA>           5          313   1.6 
## 8 <NA>     Plutôt oui     4            4 100

#(test = db_econfin_TP_u %>% group_by(sex) %>% count(after_COVID) %>% mutate(total_by_sex = sum(n)) %>% mutate(per = round(n/sum(n)*100,2)))
#sum(test$n)
#rbind(test, colSums(test[1:10,2:4]),colSums(test))
#prop.table(table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$sex,useNA = "ifany"))

Preparing the data for the analyses

Renaming the variable categories

#liv
db_econfin_TP_u [db_econfin_TP_u == "En zone rurale"] = "1"
db_econfin_TP_u [db_econfin_TP_u == "En zone périurbaine"] = "2"
db_econfin_TP_u [db_econfin_TP_u == "En zone urbaine"] = "3"
#room
db_econfin_TP_u [db_econfin_TP_u == "Moins de 3"] = "1"
db_econfin_TP_u [db_econfin_TP_u == "Entre 3 et 6"] = "2"
db_econfin_TP_u [db_econfin_TP_u == "Plus de 6"] = "3"
# ST_env et LT_env
db_econfin_TP_u [db_econfin_TP_u == "Plutôt non"] = "0"
db_econfin_TP_u [db_econfin_TP_u == "Plutôt oui"] = "1"
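# sex: Masculin -> "0", Féminin -> "1"
db_econfin_TP_u [db_econfin_TP_u == "Masculin"] = "0"
db_econfin_TP_u [db_econfin_TP_u == "Féminin"] = "1"
# drop the single "Agenre" respondent (filter() also drops rows where sex is NA)
db_econfin_TP_u = filter(db_econfin_TP_u, sex != "Agenre")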
#After covid
db_econfin_TP_u [db_econfin_TP_u == "Non"] = "0"
db_econfin_TP_u [db_econfin_TP_u == "Oui"] = "1"
db_econfin_TP_u [db_econfin_TP_u == "Je n'ai pas d'opinion"] = "2"
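The replacements above apply to every cell of the data frame, so a label shared by several columns (for example Oui/Non) is recoded everywhere at once. A sketch of a column-targeted alternative with `factor(levels =, labels =)`, shown on a toy data frame whose column names mirror the survey's but whose values are made up:

```r
# Toy data (made-up values) using the same kind of labels as the survey.
toy <- data.frame(
  liv = c("En zone rurale", "En zone urbaine", "En zone périurbaine", NA),
  ST_env = c("Plutôt oui", "Plutôt non", "Plutôt oui", "Plutôt oui"),
  stringsAsFactors = FALSE
)

# Recode one column at a time: `levels` = original labels, `labels` = new codes.
# Any value not listed in `levels` (and any NA) becomes NA.
toy$liv <- factor(toy$liv,
                  levels = c("En zone rurale", "En zone périurbaine", "En zone urbaine"),
                  labels = c("1", "2", "3"))
toy$ST_env <- factor(toy$ST_env,
                     levels = c("Plutôt non", "Plutôt oui"),
                     labels = c("0", "1"))
toy
```

Because unlisted values become NA, a label that does not match exactly (for instance a different apostrophe character) shows up as a missing value instead of silently staying unrecoded.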

Removing missing values

db_econfin_TP_u <- na.omit(db_econfin_TP_u)
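`na.omit()` performs listwise deletion: a row is dropped as soon as any of its columns is missing, even a column that a given test does not use. A toy sketch (made-up data) for counting, before dropping, how many rows will go and which columns cause it:

```r
toy <- data.frame(sex = c("1", "0", NA, "1"),
                  ST_env = c("1", NA, "1", "0"),
                  age = c(25, 31, 40, NA))

colSums(is.na(toy))          # missing values per column
sum(!complete.cases(toy))    # rows that na.omit() would drop

kept <- na.omit(toy)
nrow(kept)                   # rows left
attr(kept, "na.action")      # indices of the dropped rows
```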

The result:

kable(head(db_econfin_TP_u))

code	day	year	sex	liv	after_COVID	ST_env	LT_env	ES	fam	rel	room	act	lon
2020-04-17_18:00:19	1	1995	1	3	1	1	0	1	0	0	1	1	0
2020-04-19_15:01:46	3	1994	1	2	0	1	0	1	0	0	3	1	0
2020-04-23_13:05:43	7	1962	0	1	2	1	0	1	1	1	2	1	0
2020-04-23_13:10:16	7	1994	1	3	1	1	1	1	0	0	3	1	0
2020-04-23_13:31:23	7	1969	1	3	0	1	0	0	0	0	1	1	1
2020-04-23_13:35:29	7	1996	0	1	1	1	1	1	0	0	3	0	0

dim(db_econfin_TP_u)

## [1] 1001   15

Converting data types

class(db_econfin_TP_u$sex)

## [1] "character"

#rownames(db_econfin_TP_u) = c(db_econfin_TP_u$...1) ; # db_econfin_TP_u = db_econfin_TP_u[,2:13]
db_econfin_TP_u[,c(4:15)] <- lapply(db_econfin_TP_u[,c(4:15)], factor) 

db_econfin_TP_u[,c(2:3)] <- lapply(db_econfin_TP_u[,c(2:3)], as.numeric) # coerce day and year to numeric


str(db_econfin_TP_u)

## tibble [1,001 x 15] (S3: tbl_df/tbl/data.frame)
##  $ code       : chr [1:1001] "2020-04-17_18:00:19" "2020-04-19_15:01:46" "2020-04-23_13:05:43" "2020-04-23_13:10:16" ...
##  $ day        : num [1:1001] 1 3 7 7 7 7 7 7 7 7 ...
##  $ year       : num [1:1001] 1995 1994 1962 1994 1969 ...
##  $ sex        : Factor w/ 2 levels "0","1": 2 2 1 2 2 1 2 1 2 1 ...
##  $ liv        : Factor w/ 3 levels "1","2","3": 3 2 1 3 3 1 1 2 2 3 ...
##  $ after_COVID: Factor w/ 3 levels "0","1","2": 2 1 3 2 1 2 1 1 2 1 ...
##  $ ST_env     : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 1 2 1 ...
##  $ LT_env     : Factor w/ 2 levels "0","1": 1 1 1 2 1 2 1 1 1 1 ...
##  $ ES         : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 2 2 ...
##  $ fam        : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 2 2 2 1 ...
##  $ chi        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 1 1 ...
##  $ rel        : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 2 1 1 1 ...
##  $ room       : Factor w/ 3 levels "1","2","3": 1 3 2 3 1 3 2 2 2 1 ...
##  $ act        : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 2 1 2 2 ...
##  $ lon        : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 2 ...
##  - attr(*, "na.action")= 'omit' Named int [1:184] 2 30 44 49 54 55 56 59 62 76 ...
##   ..- attr(*, "names")= chr [1:184] "2" "30" "44" "49" ...

attach(db_econfin_TP_u)

Quantitative variables

Survey response days

Day 1: 2020-04-17; day 25: 2020-05-11

hist(db_econfin_TP_u$day)

Creating new variables: age and age group

Measures of central tendency

db_econfin_TP_u = db_econfin_TP_u  %>% mutate(age = 2021 - year)

hist(db_econfin_TP_u$age, xlab="Age")

summary(db_econfin_TP_u$age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.00   24.00   27.00   31.99   34.00   79.00

db_econfin_TP_u %>% group_by(sex) %>% dplyr::summarise(n = n(), mean = round(mean(age, na.rm = TRUE), 1), median = median(age),
                                               min = min(age), max = max(age))

## # A tibble: 2 x 6
##   sex       n  mean median   min   max
##   <fct> <int> <dbl>  <dbl> <dbl> <dbl>
## 1 0       261  35.9     30    19    79
## 2 1       740  30.6     27    16    77

Age groups

db_econfin_TP_u = db_econfin_TP_u %>% mutate(t_age = case_when(age <= 27 ~ 1,age > 27 & age <= 40 ~ 2, age > 40 ~ 3)) 
db_econfin_TP_u <- within(db_econfin_TP_u, {t_age<-factor(t_age)})
ggplot(db_econfin_TP_u) +  aes(x = t_age) + geom_bar() +  scale_fill_hue() + theme_minimal()

First MCA

Social variables and LT_env, ST_env, after_COVID

library(FactoMineR);library(factoextra)

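MCA summarises the associations among several categorical variables by projecting individuals and categories onto a small number of dimensions. Unlike constrained ordination methods (e.g. RDA or CCA), its axes are not constrained to be linear combinations of explanatory variables.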

mca_c = MCA(db_econfin_TP_u[,c(4:15,17)], graph = FALSE)
print(mca_c) #mca_c$var$contrib

## **Results of the Multiple Correspondence Analysis (MCA)**
## The analysis was performed on 1001 individuals, described by 13 variables
## *The results are available in the following objects:
## 
##    name              description                       
## 1  "$eig"            "eigenvalues"                     
## 2  "$var"            "results for the variables"       
## 3  "$var$coord"      "coord. of the categories"        
## 4  "$var$cos2"       "cos2 for the categories"         
## 5  "$var$contrib"    "contributions of the categories" 
## 6  "$var$v.test"     "v-test for the categories"       
## 7  "$ind"            "results for the individuals"     
## 8  "$ind$coord"      "coord. for the individuals"      
## 9  "$ind$cos2"       "cos2 for the individuals"        
## 10 "$ind$contrib"    "contributions of the individuals"
## 11 "$call"           "intermediate results"            
## 12 "$call$marge.col" "weights of columns"              
## 13 "$call$marge.li"  "weights of rows"

Eigenvalues: the proportion of variance (inertia) retained by each dimension.

Visualising the percentage of inertia explained by each MCA dimension.

#eig.val <- get_eigenvalue(mca_c) # factoextra R package
fviz_screeplot(mca_c, addlabels = TRUE, ylim = c(0, 30))

# Contributions of the variable categories to dimension 1
fviz_contrib(mca_c, choice = "var", axes = 1, top = 15)

# Contributions of the variable categories to dimension 2
fviz_contrib(mca_c, choice = "var", axes = 2, top = 15)

fviz_mca_var(mca_c, choice = "mca.cor",
repel = TRUE, # Avoid text overlapping (slow)
ggtheme = theme_minimal())

fviz_mca_var(mca_c,repel = TRUE,ggtheme = theme_minimal())

fviz_mca_var(mca_c,select.var=list("ST_env"),col.var=c("black"),repel = TRUE,ggtheme = theme_minimal()) #

Chi-square test basics

The chi-square test of independence examines whether the row and column variables of a contingency table are significantly associated.

Null hypothesis (H0): the row and column variables of the contingency table are independent (in practice, the dependent variable and the factor)
Alternative hypothesis (H1): the row and column variables are dependent

The chi-square statistic:
\[X^2 = \sum \frac{(o - e)^2}{e}\]
* o is the observed count in a cell
* e is the expected count in that cell under H0: (row total × column total) / n
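To make the formula concrete, a sketch that computes X² by hand for the after_COVID × lon table analysed below (counts copied from its printed output) and checks it against `chisq.test()`. Each expected count is its row total times its column total divided by n:

```r
# Observed counts of after_COVID (rows) by lon (columns), copied from the printed table.
obs <- matrix(c(519, 52,
                267, 35,
                107, 21),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Non", "Oui", "Pas avis"), c("Non-seule", "Seule")))

n <- sum(obs)
# Expected counts under H0: row total * column total / n
expected <- outer(rowSums(obs), colSums(obs)) / n
x2 <- sum((obs - expected)^2 / expected)

x2
chisq.test(obs)$statistic   # same value (no continuity correction for a 3 x 2 table)
```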

Degrees of freedom (DF):
\[DF = (\text{rows} - 1) \times (\text{columns} - 1)\]

Example with two variables that each have two categories: \(DF = (2 - 1) \times (2 - 1) = 1\)
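For the 3 × 2 after_COVID × lon table analysed below, DF = (3 − 1) × (2 − 1) = 2. A sketch (counts copied from the printed table) showing that the p-value is the upper-tail probability of the chi-square distribution with DF degrees of freedom:

```r
obs <- matrix(c(519, 52, 267, 35, 107, 21), nrow = 3, byrow = TRUE)

df <- (nrow(obs) - 1) * (ncol(obs) - 1)          # (3 - 1) * (2 - 1) = 2
x2 <- chisq.test(obs)$statistic
p <- pchisq(x2, df = df, lower.tail = FALSE)     # P(X^2 >= observed value) under H0

c(df = df, p = unname(p))
```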

Explaining perceptions of the future by social variables

after_COVID: “Do you think the post-lockdown world will be radically different from the pre-lockdown world?”

lon: “During the lockdown period, do you live…” (alone = 1, not alone = 0)

tab_after_COVID_lon = table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$lon)
colnames(tab_after_COVID_lon) <- c("Non-seule","Seule")
rownames(tab_after_COVID_lon) <- c("Non","Oui","Pas avis")
tab_after_COVID_lon

##           
##            Non-seule Seule
##   Non            519    52
##   Oui            267    35
##   Pas avis       107    21

# The statistic, the degrees of freedom, the p-value: what can we conclude?
(chisq_a  <- chisq.test(tab_after_COVID_lon))

## 
##  Pearson's Chi-squared test
## 
## data:  tab_after_COVID_lon
## X-squared = 6.0758, df = 2, p-value = 0.04793

Bar plot

ggplot(db_econfin_TP_u) +  aes(x = lon, fill = after_COVID) +  geom_bar() +  scale_fill_hue() + theme_minimal()

Checking the expected counts: is Cochran's criterion satisfied?

chisq_a$expected

##           
##            Non-seule    Seule
##   Non       509.3936 61.60639
##   Oui       269.4166 32.58342
##   Pas avis  114.1898 13.81019
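Cochran's rule is commonly stated as: no expected count below 1, and at most 20% of the expected counts below 5. Here the smallest expected count is 13.81, so the rule is met. A small helper for checking any table (the name `cochran_ok` is hypothetical):

```r
cochran_ok <- function(tab) {
  # Expected counts under H0; warnings about small counts are silenced on purpose
  e <- suppressWarnings(chisq.test(tab)$expected)
  all(e >= 1) && mean(e < 5) <= 0.2
}

obs <- matrix(c(519, 52, 267, 35, 107, 21), nrow = 3, byrow = TRUE)
cochran_ok(obs)   # TRUE: every expected count is above 13
```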

Cells contributing most to the total chi-square score (Pearson residuals):

\[r = \frac{o - e}{\sqrt{e}}\]

round(chisq_a$residuals, 3)

##           
##            Non-seule  Seule
##   Non          0.426 -1.224
##   Oui         -0.147  0.423
##   Pas avis    -0.673  1.935
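A sketch computing the Pearson residuals by hand from the same counts and checking them against `chisq.test()`. A positive residual means more observations than expected under independence, and the squared residuals add up to X²:

```r
obs <- matrix(c(519, 52, 267, 35, 107, 21), nrow = 3, byrow = TRUE,
              dimnames = list(c("Non", "Oui", "Pas avis"), c("Non-seule", "Seule")))
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

r <- (obs - expected) / sqrt(expected)   # Pearson residuals
round(r, 3)

sum(r^2)   # equals the X-squared statistic
```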

library(corrplot)
corrplot(chisq_a$residuals, is.cor = FALSE) #cl.pos = 'b')

Respondents who live alone answer “Non” less often than expected and “Pas avis” (2, no opinion) more often than expected.

room: “How many rooms do you have in your lockdown residence?”

tab_room_after = table(db_econfin_TP_u$after_COVID, db_econfin_TP_u$room)
(chisq_b  <- chisq.test(tab_room_after))

## 
##  Pearson's Chi-squared test
## 
## data:  tab_room_after
## X-squared = 2.2388, df = 4, p-value = 0.6919

The room variable contributes strongly to the MCA dimensions but shows no significant association with after_COVID in the chi-square test (p = 0.69).

#tab_COVID_liv_env = table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$liv) # not significant
#chisq.test(table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$t_age)) # not significant 
#chisq.test(table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$chi))   # not significant 
#chisq.test(table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$fam))   # not significant

Link between the view of the future and the impact on the environment

(chisq_c = chisq.test(table(db_econfin_TP_u$ST_env,db_econfin_TP_u$after_COVID)))

## 
##  Pearson's Chi-squared test
## 
## data:  table(db_econfin_TP_u$ST_env, db_econfin_TP_u$after_COVID)
## X-squared = 4.6835, df = 2, p-value = 0.09616

db_econfin_TP_u %>% group_by(after_COVID) %>% count(ST_env) %>% mutate(per = n/sum(n)*100)

## # A tibble: 6 x 4
## # Groups:   after_COVID [3]
##   after_COVID ST_env     n   per
##   <fct>       <fct>  <int> <dbl>
## 1 0           0         77 13.5 
## 2 0           1        494 86.5 
## 3 1           0         26  8.61
## 4 1           1        276 91.4 
## 5 2           0         17 13.3 
## 6 2           1        111 86.7
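The same within-group percentages can be obtained in base R with `prop.table()`, where `margin = 2` normalises within each column. A sketch rebuilt from the counts printed above:

```r
# Counts of ST_env (rows) by after_COVID (columns), copied from the tibble above.
counts <- matrix(c(77, 26, 17,
                   494, 276, 111),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(ST_env = c("0", "1"), after_COVID = c("0", "1", "2")))

# margin = 2: proportions within each after_COVID answer (each column sums to 100%)
pct <- round(100 * prop.table(counts, margin = 2), 1)
pct
```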

Other variables explaining whether respondents think the Covid-19 crisis will have a beneficial short-term impact on the environment (coded ST_env in R)

fam: “What is your family status?”

tab_st_fam = table(db_econfin_TP_u$ST_env,db_econfin_TP_u$fam)
colnames(tab_st_fam) <- c("Seule","Famille")
rownames(tab_st_fam) <- c("Plutôt non","Plutôt oui")

(chisq_d = chisq.test(tab_st_fam))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab_st_fam
## X-squared = 4.0817, df = 1, p-value = 0.04335

corrplot(chisq_d$residuals, is.cor = FALSE)

The differences lie in the “Plutôt non” answers: respondents who live as a family (fam = 1, “Famille”) are more likely to answer “Plutôt non”.

liv: “During the lockdown period, do you live…” (in a rural, peri-urban or urban area)

tab_st_liv_env = table(db_econfin_TP_u$ST_env,db_econfin_TP_u$liv)
colnames(tab_st_liv_env) <- c("En zone rurale","En zone périurbaine","En zone urbaine")
rownames(tab_st_liv_env) <- c("Plutôt non","Plutôt oui")

(chisq_e  <- chisq.test(tab_st_liv_env))

## 
##  Pearson's Chi-squared test
## 
## data:  tab_st_liv_env
## X-squared = 9.0172, df = 2, p-value = 0.01101

#round(chisq_e$residuals, 3)
corrplot(chisq_e$residuals, is.cor = FALSE)

Respondents living in urban areas are more likely to answer “Plutôt non”.

chi: “How many children do you have?” (recoded: 0 = no children, 1 = at least one child)

tab_st_chi = table(db_econfin_TP_u$ST_env,db_econfin_TP_u$chi)
rownames(tab_st_chi) <- c("Plutôt non","Plutôt oui")

(chisq_f= chisq.test(tab_st_chi))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab_st_chi
## X-squared = 8.6959, df = 1, p-value = 0.003189

corrplot(chisq_f$residuals, is.cor = FALSE)

#round(chisq_f$residuals, 3)

ggplot(db_econfin_TP_u) +
 aes(x = chi, fill = ST_env) +
 geom_bar() +  scale_fill_hue() + theme_minimal()

db_econfin_TP_u %>% group_by(chi) %>% count(ST_env) %>% mutate(per = n/sum(n)*100)

## # A tibble: 4 x 4
## # Groups:   chi [2]
##   chi   ST_env     n   per
##   <fct> <fct>  <int> <dbl>
## 1 0     0         76  10.2
## 2 0     1        672  89.8
## 3 1     0         44  17.4
## 4 1     1        209  82.6
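For 2 × 2 tables, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`), shrinking each |o − e| by 0.5 (or by |o − e| itself if that is smaller) before squaring; this is why these outputs read “with Yates' continuity correction”. A sketch rebuilt from the ST_env × chi counts just above, comparing the corrected and uncorrected statistics:

```r
# ST_env (rows) by chi (columns): counts copied from the tibble above.
tab <- matrix(c(76, 44,
                672, 209),
              nrow = 2, byrow = TRUE,
              dimnames = list(ST_env = c("0", "1"), chi = c("0", "1")))

with_yates <- chisq.test(tab)                     # default: correct = TRUE for 2 x 2
without_yates <- chisq.test(tab, correct = FALSE)

# Yates by hand: subtract 0.5 from each |o - e| (valid here since every |o - e| > 0.5)
e <- with_yates$expected
x2_yates <- sum((abs(tab - e) - 0.5)^2 / e)

c(yates = unname(with_yates$statistic),
  by_hand = x2_yates,
  no_correction = unname(without_yates$statistic))
```

The correction makes the test more conservative: the corrected statistic is smaller, so its p-value is larger.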

#test1 = db_econfin_TP_u %>% group_by(chi,ST_env) %>% dplyr::count() 
#transform(test1,perc = ave(n,chi,FUN = prop.table))

Respondents with children are more likely to answer “Plutôt non” (17.4% vs 10.2% for those without children).

Other variables, such as sex.

#chisq.test(table(db_econfin_TP_u$ST_env,db_econfin_TP_u$t_age)) # not significant
#chisq.test(table(db_econfin_TP_u$ST_env,db_econfin_TP_u$room)) # no
#chisq.test(table(db_econfin_TP_u$ST_env,db_econfin_TP_u$act))  # no
#chisq.test(table(db_econfin_TP_u$ST_env,db_econfin_TP_u$ES))  # no

tab_st_sex =  table(db_econfin_TP_u$ST_env,db_econfin_TP_u$sex) # significant, but not very important in the MCA
rownames(tab_st_sex) <- c("Plutôt non","Plutôt oui")
colnames(tab_st_sex) <- c("Masculin","Féminin")

(chisq_g <- chisq.test(tab_st_sex))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab_st_sex
## X-squared = 4.1679, df = 1, p-value = 0.0412

#round(chisq_g$residuals, 3)
corrplot(chisq_g$residuals, is.cor = FALSE)

Men are more likely to answer “Plutôt non”.

What about the long-term environmental impact (LT_env)?

The chi-square tests, at least, do not show much.

For example:

chisq.test(table(db_econfin_TP_u$LT_env,db_econfin_TP_u$chi))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(db_econfin_TP_u$LT_env, db_econfin_TP_u$chi)
## X-squared = 2.6224, df = 1, p-value = 0.1054

#chisq.test(table(db_econfin_TP_u$after_COVID,db_econfin_TP_u$chi))

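How to analyse a quantitative variable against a qualitative variable with two categories

Non-parametric tests

Mann-Whitney-Wilcoxon

Reminder of the hypotheses: H0: the quantitative variable has the same distribution in both groups (location shift equal to 0); H1: the location shift between the two groups is not 0.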

boxplot(day ~ ST_env, data = db_econfin_TP_u)

wilcox.test(day ~ ST_env, data=db_econfin_TP_u) # diff

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  day by ST_env
## W = 45365, p-value = 0.01105
## alternative hypothesis: true location shift is not equal to 0
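The W reported by `wilcox.test()` is the Mann-Whitney statistic: the sum of the first group's ranks in the pooled sample minus n1(n1 + 1)/2, i.e. the number of pairs in which an observation from the first group exceeds one from the second (ties count one half). With the formula interface, the first group is the first factor level (here ST_env = "0"). A toy sketch with made-up values and no ties:

```r
x <- c(1.2, 3.4, 5.1)        # group 1 (made-up values)
y <- c(2.0, 4.4, 6.3, 7.0)   # group 2 (made-up values)

ranks <- rank(c(x, y))                         # ranks in the pooled sample
n1 <- length(x)
W_by_hand <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

W_by_hand                             # 3
sum(outer(x, y, ">"))                 # 3 pairs with x > y
unname(wilcox.test(x, y)$statistic)   # 3
```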

Positive relationship between the response day and answering 1 (Plutôt oui).

We may drop this result if we cannot explain it.

boxplot(age ~ ST_env, data = db_econfin_TP_u)

wilcox.test(age ~ ST_env, data=db_econfin_TP_u) # diff

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  age by ST_env
## W = 60031, p-value = 0.01566
## alternative hypothesis: true location shift is not equal to 0

Younger respondents are more likely to answer “Plutôt oui”.