Test The Multinomial Distribution

Fair Dice Example
Goodness-of-fit tests for nominal variables example
Multinomial test example with plot and confidence intervals

Fair Dice Example

As the p-value need to be greater than the .05 significance level so we do not reject the null hypothesis that the sample data observed supports the prob statistics.

library(EMT)
library(DescTools)
library(ggplot2)

## Fair Dice
pdice = 1/6
prob <- c(pdice, pdice, pdice, pdice, pdice, pdice) 
observed <- c(4, 5, 4, 7, 6, 5)              
out <- multinomial.test(observed, prob)

## 
##  Exact Multinomial Test, distance measure: p
## 
##     Events    pObs    p.value
##     376992   2e-04     0.9588

if (out$p.value > 0.05) print("YES! the sample data observed support the prob statistics") else print("NO! the sample data observed does not support the prob statistics")

## [1] "YES! the sample data observed support the prob statistics"

plotMultinom(out, showmax = 10000)

## Biased Dice
observed <- c(4, 5, 2, 7, 0, 1) 
out <- multinomial.test(observed, prob)

## 
##  Exact Multinomial Test, distance measure: p
## 
##     Events    pObs    p.value
##      42504       0     0.0357

if (out$p.value > 0.05) print("YES! the sample data observed support the prob statistics") else print("NO! the sample data observed does not support the prob statistics")

## [1] "NO! the sample data observed does not support the prob statistics"

plotMultinom(out, showmax = 10000)

Goodness-of-fit tests for nominal variables example

Goodness-of-fit tests are used to compare proportions of levels of a nominal variable to theoretical or expected proportions.

As part of a demographic survey of students in this environmental issues webinar series, Alucard recorded the race and ethnicity of his students. He wants to compare the data for his class to the demographic data for Cumberland County, New Jersey as a whole

Race Alucard’s_class County_proportion White 20 0.775 Black 9 0.132 American Indian 9 0.012 Asian 1 0.054 Pacific Islander 1 0.002 Two or more races 1 0.025 —————– — —— Total 41 1.000

Interpretation Significant results can be reported as “The proportions for the levels for the nominal variable were statistically different from the expected proportions.

observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)
out <- multinomial.test(observed, expected)

## 
##  Exact Multinomial Test, distance measure: p
## 
##     Events    pObs    p.value
##    1370754       0          0

if (out$p.value > 0.05) print("YES! the sample data observed support the prob statistics") else print("NO! the sample data observed does not support the prob statistics")

## [1] "NO! the sample data observed does not support the prob statistics"

plotMultinom(out)

### A faster, approximate test by Monte Carlo simulation

observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)
out <- multinomial.test(observed, expected, MonteCarlo = TRUE)

##  
##  WARNING: Number of simulated withdrawels is lower than the number of possible outcomes. 
##                 This might yield unreliable results!
## 
## 
##  Monte Carlo Multinomial Test, distance measure: f
## 
##     Events    fObs    p.value
##    1370754       0          0

if (out$p.value > 0.05) print("YES! the sample data observed support the prob statistics") else print("NO! the sample data observed does not support the prob statistics")

## [1] "NO! the sample data observed does not support the prob statistics"

plotMultinom(out)

Multinomial test example with plot and confidence intervals

Walking to the store, Jerry Coyne observed the colors of nail polish on women’s toes (Coyne, 2016). Presumably because that’s the kind of thing retired professors are apt to do. He concluded that red was a more popular color but didn’t do any statistical analysis to support his conclusion.

Color of polish Count Red 19 None or clear 3 White 1 Green 1 Purple 2 Blue 2

We will use a multinomial goodness of fit test to determine if there is an overall difference in the proportion of colors (multinomial.test function in the EMT package). The confidence intervals for each proportion can be found with the MultinomCI function in the DescTools package. The data then needs to be manipulated some so that we can plot the data as counts and not proportions.

nail.color = c("Red", "None", "White", "Green", "Purple", "Blue")
observed   = c( 19,    3,      1,       1,       2,        2    )
expected   = c( 1/6,   1/6,    1/6,     1/6,     1/6,      1/6  )
out <- multinomial.test(observed, expected, MonteCarlo = TRUE)

##  
##  WARNING: Number of simulated withdrawels is lower than the number of possible outcomes. 
##                 This might yield unreliable results!
## 
## 
##  Monte Carlo Multinomial Test, distance measure: f
## 
##     Events    fObs    p.value
##     237336       0          0

if (out$p.value > 0.05) print("YES! the sample data observed support the prob statistics") else print("NO! the sample data observed does not support the prob statistics")

## [1] "NO! the sample data observed does not support the prob statistics"

plotMultinom(out)

MCI = MultinomCI(observed, conf.level=0.95,method="sisonglaz")

MCI

##             est    lwr.ci    upr.ci
## [1,] 0.67857143 0.5357143 0.8423162
## [2,] 0.10714286 0.0000000 0.2708876
## [3,] 0.03571429 0.0000000 0.1994590
## [4,] 0.03571429 0.0000000 0.1994590
## [5,] 0.07142857 0.0000000 0.2351733
## [6,] 0.07142857 0.0000000 0.2351733

### Order the levels, otherwise R will alphabetize them

Nail.color = factor(nail.color,levels=unique(nail.color))      


### For plot, Create variables of counts, and then wrap them into a data frame

Total = sum(observed)
Count = observed
Lower = MCI[,'lwr.ci'] * Total
Upper = MCI[,'upr.ci'] * Total
Data = data.frame(Count, Lower, Upper)
Data

##   Count Lower     Upper
## 1    19    15 23.584853
## 2     3     0  7.584853
## 3     1     0  5.584853
## 4     1     0  5.584853
## 5     2     0  6.584853
## 6     2     0  6.584853

ggplot(Data,aes(x     = Nail.color, y     = Count)) +geom_bar(stat = "identity",color = "black",fill  = "gray50", width =  0.7) +
geom_errorbar(aes(ymin  = Lower,ymax  = Upper),width = 0.2, size  = 0.7,color = "black") +
theme_bw() +
theme(axis.title = element_text(face = "bold")) +
ylab("Count of observations") +
xlab("Nail color")

Test The Multinomial Distribution

Janpu Hou

August 17, 2017

Fair Dice Example

Goodness-of-fit tests for nominal variables example

Multinomial test example with plot and confidence intervals