Ordinal Data / Counts / Exam SSP 2024 question 1

ORDINAL DATA: CONTINGENCY TABLE

Question 1 of the exam presents a study based on Likert-scale ratings. Although the Likert scores are integers, they represent the levels of a factor (item x). A cross tabulation (contingency table, "table de contingence") counts how many respondents gave each pairing of responses, across all possible levels of the two factors (ordinal data in this case: "moyen", "faible", ...). Each cell is a joint count for item1 and item2, so the table as a whole describes their joint distribution. Marginal counts are not provided but are easily calculated.

The first problem is how to start: should I construct the two-way table?

The answer depends on the research question, and I acknowledge that in R this is not straightforward with functions that apply to factors. Let us work through an example step by step:

Reconstruct the table

Several ways are possible and are shown in the code below, for a TWO-WAY table (item1 crossed with item2) whose factors have 4 levels each.

Let's start with a data frame.

Note: table() is a very old function inherited from S-PLUS.

For a data frame that already holds the counts, xtabs() is more convenient (see ?xtabs).

MYTAB=data.frame(expand.grid(item1=c("faible", "moy", "for", "tresfort"), 
    item2=c("faible", "moy", "for", "tresfort")),
    count=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
## count is simply 1..16 to keep the coding simple
str(MYTAB)   ## count is numeric (num), not a factor

## 'data.frame':    16 obs. of  3 variables:
##  $ item1: Factor w/ 4 levels "faible","moy",..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ item2: Factor w/ 4 levels "faible","moy",..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ count: num  1 2 3 4 5 6 7 8 9 10 ...

MYTAB

##       item1    item2 count
## 1    faible   faible     1
## 2       moy   faible     2
## 3       for   faible     3
## 4  tresfort   faible     4
## 5    faible      moy     5
## 6       moy      moy     6
## 7       for      moy     7
## 8  tresfort      moy     8
## 9    faible      for     9
## 10      moy      for    10
## 11      for      for    11
## 12 tresfort      for    12
## 13   faible tresfort    13
## 14      moy tresfort    14
## 15      for tresfort    15
## 16 tresfort tresfort    16

mytable=xtabs(MYTAB$count~MYTAB$item1+MYTAB$item2)     
mytable

##            MYTAB$item2
## MYTAB$item1 faible moy for tresfort
##    faible        1   5   9       13
##    moy           2   6  10       14
##    for           3   7  11       15
##    tresfort      4   8  12       16
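The same table can also be built with the `data` argument of xtabs(), which gives shorter dimension names (item1, item2 instead of MYTAB$item1, MYTAB$item2). A minimal sketch, assuming the same MYTAB data frame as above; the names `lev` and `mytable2` are introduced here only for illustration:

```r
# Rebuild the example data frame (same content as MYTAB above)
lev <- c("faible", "moy", "for", "tresfort")
MYTAB <- data.frame(expand.grid(item1 = lev, item2 = lev), count = 1:16)

# Formula interface with data=: the counts go on the left-hand side
mytable2 <- xtabs(count ~ item1 + item2, data = MYTAB)

names(dimnames(mytable2))   # dimension names are now "item1" and "item2"
mytable2["moy", "for"]      # one cell: item1 = moy, item2 = for
mytable2
```

The counts are identical to those in mytable; only the labels of the two dimensions differ.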

# Let's check that every level of item1 is crossed with every level of item2:

table(MYTAB$item1,MYTAB$item2)## each combination appears once: 12 mixed pairs + 4 same-level pairs = 16 cells

##           
##            faible moy for tresfort
##   faible        1   1   1        1
##   moy           1   1   1        1
##   for           1   1   1        1
##   tresfort      1   1   1        1

## Add the marginal totals

addmargins(mytable)# 136 paired answers (item1 & item2) in total

##            MYTAB$item2
## MYTAB$item1 faible moy for tresfort Sum
##    faible        1   5   9       13  28
##    moy           2   6  10       14  32
##    for           3   7  11       15  36
##    tresfort      4   8  12       16  40
##    Sum          10  26  42       58 136

# this is why the grand total is usually written in maths as a double summation ∑ᵢ∑ⱼ nᵢⱼ (row-wise over i, then column-wise over j)
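The grand total N = ∑ᵢ ∑ⱼ nᵢⱼ can be checked by hand: summing the row totals or the column totals gives the same 136. A small sketch in base R, rebuilding the counts as a matrix (the names `lev`, `tab`, `row_tot` and `col_tot` are chosen here for illustration):

```r
lev <- c("faible", "moy", "for", "tresfort")
# matrix() fills column by column, which reproduces the cell counts of mytable
tab <- matrix(1:16, nrow = 4, dimnames = list(item1 = lev, item2 = lev))

row_tot <- rowSums(tab)   # marginal counts of item1: 28 32 36 40
col_tot <- colSums(tab)   # marginal counts of item2: 10 26 42 58

sum(row_tot)              # 136
sum(col_tot)              # 136
sum(tab)                  # 136, the double sum over all cells
```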

addmargins(prop.table(mytable))# a table of proportions that sums to 1 (100%)

##            MYTAB$item2
## MYTAB$item1      faible         moy         for    tresfort         Sum
##    faible   0.007352941 0.036764706 0.066176471 0.095588235 0.205882353
##    moy      0.014705882 0.044117647 0.073529412 0.102941176 0.235294118
##    for      0.022058824 0.051470588 0.080882353 0.110294118 0.264705882
##    tresfort 0.029411765 0.058823529 0.088235294 0.117647059 0.294117647
##    Sum      0.073529412 0.191176471 0.308823529 0.426470588 1.000000000

# Note that there are other kinds of totals, called conditional margins (not explained here to avoid confusion)

Avant=factor(rep(c("oui","non"),c(20,30)))
    
Apres=factor(rep(c("oui","non"),c(30,20)))
    
Ma=data.frame(Avant,Apres)
str(Ma)

## 'data.frame':    50 obs. of  2 variables:
##  $ Avant: Factor w/ 2 levels "non","oui": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Apres: Factor w/ 2 levels "non","oui": 2 2 2 2 2 2 2 2 2 2 ...

table(Ma)

##      Apres
## Avant non oui
##   non  20  10
##   oui   0  20

## WARNING: WRONG CODING

prop.table(addmargins(mytable))

##            MYTAB$item2
## MYTAB$item1      faible         moy         for    tresfort         Sum
##    faible   0.001838235 0.009191176 0.016544118 0.023897059 0.051470588
##    moy      0.003676471 0.011029412 0.018382353 0.025735294 0.058823529
##    for      0.005514706 0.012867647 0.020220588 0.027573529 0.066176471
##    tresfort 0.007352941 0.014705882 0.022058824 0.029411765 0.073529412
##    Sum      0.018382353 0.047794118 0.077205882 0.106617647 0.250000000

# the Sum row and column are treated as extra factor levels, so every cell is divided by 4 x 136 = 544 and the proportions are WRONG!!!
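The 0.25 in the bottom-right corner can be explained by counting: once the margins are added, the 136 answers appear four times in the table (cells, row sums, column sums, grand total). A short sketch in base R comparing the two orders of the calls (the names `tab`, `wrong` and `right` are introduced here for illustration):

```r
lev <- c("faible", "moy", "for", "tresfort")
tab <- as.table(matrix(1:16, nrow = 4, dimnames = list(item1 = lev, item2 = lev)))

sum(addmargins(tab))                   # 544 = 4 * 136

wrong <- prop.table(addmargins(tab))   # margins first: everything divided by 544
right <- addmargins(prop.table(tab))   # proportions first, then margins

wrong["Sum", "Sum"]                    # 0.25
right["Sum", "Sum"]                    # 1
```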

PLOTTING

library(waffle)

## Warning: package 'waffle' was built under R version 4.2.3

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.2.2

mytable

##            MYTAB$item2
## MYTAB$item1 faible moy for tresfort
##    faible        1   5   9       13
##    moy           2   6  10       14
##    for           3   7  11       15
##    tresfort      4   8  12       16

mosaicplot(mytable,col=rainbow(4))# an appropriate way to visualize a two-way table whose factors have more than 2 levels

barplot(mytable)### MISLEADING here: it stacks the cell counts of item1 inside one bar per item2 level

# a plain bar plot only makes sense for a single item, i.e. a vector of counts
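A bar plot of a single item can be obtained from the marginal counts of the table. This is only a sketch in base graphics, assuming the same counts as mytable; `tab`, `item1_counts` and `mids` are illustrative names:

```r
lev <- c("faible", "moy", "for", "tresfort")
tab <- as.table(matrix(1:16, nrow = 4, dimnames = list(item1 = lev, item2 = lev)))

# Marginal counts of item1 (row totals): one bar per level
item1_counts <- margin.table(tab, 1)
mids <- barplot(item1_counts, main = "item1", ylab = "count")

# If both items must appear on one plot, side-by-side bars are easier to read
# than the default stacked bars
barplot(tab, beside = TRUE, legend.text = TRUE, xlab = "item2", ylab = "count")
```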

A cross table via a MATRIX

### via a matrix
mytable

##            MYTAB$item2
## MYTAB$item1 faible moy for tresfort
##    faible        1   5   9       13
##    moy           2   6  10       14
##    for           3   7  11       15
##    tresfort      4   8  12       16

c(1,5,9,13,2,6,10,14,3,7,11,15,4,8,12,16)

##  [1]  1  5  9 13  2  6 10 14  3  7 11 15  4  8 12 16

# use rbind or cbind
tab <- as.table(rbind(c(1,5,9,13),c(2,6,10,14),c(3,7,11,15),c(4,8,12,16)))
tab

##    A  B  C  D
## A  1  5  9 13
## B  2  6 10 14
## C  3  7 11 15
## D  4  8 12 16

dimnames(tab) <- list(item1 = c("faible", "moy", "for", "tresfort"), 
                      item2  = c("faible", "moy", "for", "tresfort"))
tab

##           item2
## item1      faible moy for tresfort
##   faible        1   5   9       13
##   moy           2   6  10       14
##   for           3   7  11       15
##   tresfort      4   8  12       16

mytable

##            MYTAB$item2
## MYTAB$item1 faible moy for tresfort
##    faible        1   5   9       13
##    moy           2   6  10       14
##    for           3   7  11       15
##    tresfort      4   8  12       16

# tab reproduces the same table as mytable: OK
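The same table can also be built in one call to matrix(), with the dimnames passed directly. A hedged sketch: the only point to watch is the fill order, since matrix() fills column by column unless byrow = TRUE is given. The names `tab_byrow` and `tab_bycol` are illustrative:

```r
lev <- c("faible", "moy", "for", "tresfort")
dn  <- list(item1 = lev, item2 = lev)

# Counts typed row by row, as they read in the printed table
tab_byrow <- as.table(matrix(c(1, 5,  9, 13,
                               2, 6, 10, 14,
                               3, 7, 11, 15,
                               4, 8, 12, 16),
                             nrow = 4, byrow = TRUE, dimnames = dn))

# Counts typed column by column (the default fill order)
tab_bycol <- as.table(matrix(1:16, nrow = 4, dimnames = dn))

all(tab_byrow == tab_bycol)   # TRUE: both give the same table
```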

If you want item1 on top (as columns):

t(mytable)# t() swaps rows and columns, like a matrix transpose; only the layout changes, the counts stay the same

##            MYTAB$item1
## MYTAB$item2 faible moy for tresfort
##    faible        1   2   3        4
##    moy           5   6   7        8
##    for           9  10  11       12
##    tresfort     13  14  15       16

mytable

##            MYTAB$item2
## MYTAB$item1 faible moy for tresfort
##    faible        1   5   9       13
##    moy           2   6  10       14
##    for           3   7  11       15
##    tresfort      4   8  12       16

WHAT IS A QUANTILE FOR ORDINAL DATA

A quantile is a measure of position that applies to ranked variables.

On a numeric column of a data frame it works directly, but on the table we coded it is a bit trickier:

quantile(mtcars$cyl)

##   0%  25%  50%  75% 100% 
##    4    4    6    8    8

quantile(tab)

##    0%   25%   50%   75%  100% 
##  1.00  4.75  8.50 12.25 16.00

## wrong: these are the quantiles of the 16 cell counts
## with ordinal data, the quantile is a position among all N paired answers
## (here the total of 136)
quantile(mytable)

##    0%   25%   50%   75%  100% 
##  1.00  4.75  8.50 12.25 16.00

# wrong again: still the 16 cell counts
quantile(MYTAB$count)

##    0%   25%   50%   75%  100% 
##  1.00  4.75  8.50 12.25 16.00

# wrong again, for the same reason

## the solution: construct the full vector of answers in rank order, then read off the position you want:
# item1, for example
myvector1=rep(c("faible","moy","for","tresfort"),c(28   ,    32   ,    36    ,   40 ))
summary(myvector1)

##    Length     Class      Mode 
##       136 character character

myvector1=factor(myvector1)# levels are sorted alphabetically (faible, for, moy, tresfort), not in rank order
sum(table(myvector1))

## [1] 136

q=quantile(order(myvector1))## quantile(myvector1) fails on a factor; order() gives a permutation of 1..136, so we get quantiles of the positions
q# these are positions in a vector of length 136

##     0%    25%    50%    75%   100% 
##   1.00  34.75  68.50 102.25 136.00

# read the value at the position of each desired quantile, here 25%, 50% and 75%
myvector1[c(35,68,102)]# ordinal levels at the 25%, 50% and 75% quantiles

## [1] moy      for      tresfort
## Levels: faible for moy tresfort

# double-check that position 35 is indeed moy
myvector1[35]

## [1] moy
## Levels: faible for moy tresfort
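The same quantiles can be read from the cumulative proportions of the marginal counts, without building the vector of 136 answers: the quantile of order p is the first level whose cumulative proportion reaches p. A minimal sketch, assuming the item1 margins 28, 32, 36, 40 from above; `ordinal_quantile` is a hypothetical helper written for this example. As a cross-check, quantile() accepts an ordered factor when type = 1 or type = 3 is requested.

```r
lev    <- c("faible", "moy", "for", "tresfort")
counts <- setNames(c(28, 32, 36, 40), lev)   # item1 marginal counts

# Cumulative proportions in rank order
cum_prop <- cumsum(counts) / sum(counts)

# Hypothetical helper: first level whose cumulative proportion reaches p
ordinal_quantile <- function(p) names(cum_prop)[which(cum_prop >= p)[1]]
q_levels <- sapply(c(0.25, 0.5, 0.75), ordinal_quantile)
q_levels

# Cross-check with an ordered factor whose levels are in rank order
v <- factor(rep(lev, counts), levels = lev, ordered = TRUE)
q_type1 <- quantile(v, probs = c(0.25, 0.5, 0.75), type = 1)
q_type1
```

Declaring the levels explicitly keeps them in rank order (faible < moy < for < tresfort) rather than alphabetical order, which is what makes the positions meaningful.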

KAPPA SYMMETRY TEST

AGREEMENT-RELIABILITY TESTING:

ALL FOLLOW THE SAME PRINCIPLE

NOTE: your course presents Kappa as a covariance (symmetry) Kappa, but there are other procedures for calculating a Kappa, derived from the chi-squared statistic (see T. Ancelle, YouTube, Epidemiology).

In a two-way table, n₁₁ and n₂₂ are the diagonal cells of the matrix (YES-YES / NO-NO). Perfect concordance, or agreement, between the crossed answers (VAR) is not what interests you here. Conversely, for a treatment (e.g. crises before and after a drug), it is the YES/NO and NO/YES cells that define the treatment effect (depending, of course, on the experimental design): the off-diagonal cells n₁₂ and n₂₁ (COV) are the ones of interest. The difference between these two values measures the extent of the asymmetry.
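To make the role of n₁₂ and n₂₁ concrete, here is a sketch on the Avant/Apres table built earlier. McNemar's test (mcnemar.test() in base R) is used as an assumption: it is one standard test of symmetry for a 2x2 before/after table, and the notes do not say which test the course expects.

```r
# Same before/after data as the Avant/Apres example above
Avant <- factor(rep(c("oui", "non"), c(20, 30)))
Apres <- factor(rep(c("oui", "non"), c(30, 20)))
tab2  <- table(Avant, Apres)

# Off-diagonal (discordant) cells: answers that changed
n12 <- tab2["non", "oui"]   # non before, oui after
n21 <- tab2["oui", "non"]   # oui before, non after
n12 - n21                   # extent of the asymmetry

# McNemar's test uses only the discordant cells
res <- mcnemar.test(tab2)
res
```

The diagonal cells (oui-oui, non-non) do not enter the test statistic at all, which matches the idea that only the changes carry information about the treatment.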

library(vcd)

## Loading required package: grid

Kappa(mytable)

##               value     ASE       z Pr(>|z|)
## Unweighted -0.02361 0.04795 -0.4924   0.6224
## Weighted   -0.04938 0.05494 -0.8988   0.3688

K=Kappa(tab)
summary(K)

##               value     ASE       z Pr(>|z|)
## Unweighted -0.02361 0.04795 -0.4924   0.6224
## Weighted   -0.04938 0.05494 -0.8988   0.3688
## 
## Weights:
##           [,1]      [,2]      [,3]      [,4]
## [1,] 1.0000000 0.6666667 0.3333333 0.0000000
## [2,] 0.6666667 1.0000000 0.6666667 0.3333333
## [3,] 0.3333333 0.6666667 1.0000000 0.6666667
## [4,] 0.0000000 0.3333333 0.6666667 1.0000000
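The two values can be reproduced by hand in base R, which shows what Kappa() computes: observed agreement compared with the agreement expected from the margins. A sketch assuming the same 4x4 counts and the equal-spacing weights printed above; the variable names are illustrative:

```r
lev <- c("faible", "moy", "for", "tresfort")
tab <- matrix(1:16, nrow = 4, dimnames = list(item1 = lev, item2 = lev))
N   <- sum(tab)

# Unweighted kappa: only the diagonal counts as agreement
po <- sum(diag(tab)) / N                        # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / N^2    # agreement expected by chance
kappa_u <- (po - pe) / (1 - pe)

# Weighted kappa: partial credit 1 - |i - j| / 3 for near-diagonal cells
w    <- 1 - abs(outer(1:4, 1:4, "-")) / 3
po_w <- sum(w * tab) / N
pe_w <- sum(w * outer(rowSums(tab), colSums(tab))) / N^2
kappa_w <- (po_w - pe_w) / (1 - pe_w)

round(c(Unweighted = kappa_u, Weighted = kappa_w), 5)   # -0.02361 -0.04938
```

Both values are slightly negative because the observed agreement (0.25 unweighted) is a little below what the margins alone would produce.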