Run the Shiny app for the two-way table on sex and love_first.
How many times did you resample?
30000
What percentage of the time did the resampled chi-square statistic exceed the chi-square statistic in the actual study?
13.54
Do you think there is overwhelming evidence that in the GC population the two sexes differ in whether they believe in love at first sight?
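No, I do not. The resampled statistic exceeded the observed one about 13.54% of the time, so a difference this large could plausibly be due to chance.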
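The resampling idea behind the app can be sketched in base R. This is only an approximation of what the app does: it assumes the observed counts from the two-way table for sex and love_first in this worksheet, uses base R's `chisq.test()` with `simulate.p.value = TRUE` (which holds both margins fixed), and picks an arbitrary seed. The app may resample in a different way, so the percentage will not match 13.54 exactly.

```r
# Observed counts copied from the two-way table for sex and love_first
love <- matrix(c(22, 18,
                 23,  8),
               nrow = 2, byrow = TRUE,
               dimnames = list(sex = c("female", "male"),
                               love_first = c("no", "yes")))

set.seed(2024)  # arbitrary seed (an assumption), only so the run is repeatable

# Monte Carlo test: simulate B tables under the Null, margins held fixed
sim <- chisq.test(love, simulate.p.value = TRUE, B = 30000)

sim$statistic       # chi-square statistic (no continuity correction here)
100 * sim$p.value   # percent of simulated statistics at least as large
```

Note that the statistic printed here is uncorrected, so it is larger than the Yates-corrected value that chisqtestGC() reports for a 2-by-2 table.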
Here’s the code for a chi-square test to see if sex and belief in love at first sight are related in the GC population. Run the code:
chisqtestGC(~sex+love_first,data=m111survey,
graph=TRUE)
## Pearson's Chi-squared test with Yates' continuity correction
##
## Observed Counts:
## love_first
## sex no yes
## female 22 18
## male 23 8
##
## Counts Expected by Null:
## love_first
## sex no yes
## female 25.35 14.65
## male 19.65 11.35
##
## Contributions to the chi-square statistic:
## love_first
## sex no yes
## female 0.44 0.77
## male 0.57 0.99
##
##
## Chi-Square Statistic = 2.0068
## Degrees of Freedom of the table = 1
## P-Value = 0.1566
Now look at the output and answer these questions:
What’s the test statistic?
2.0068
About how big should it be if the Null is correct?
1
What’s the P-value?
0.1566
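The answers above can be checked by hand. If the Null is correct, the statistic should be about as big as the degrees of freedom, because the mean of a chi-square distribution equals its degrees of freedom. The sketch below assumes only the statistic and table dimensions shown in the output, and uses base R's `pchisq()` to recover the P-value.

```r
# Values read from the chisqtestGC() output for sex and love_first
stat   <- 2.0068
n_rows <- 2
n_cols <- 2

# Degrees of freedom for a two-way table
df <- (n_rows - 1) * (n_cols - 1)

# Upper-tail area of the chi-square distribution beyond the statistic
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

df                 # typical size of the statistic under the Null
round(p_value, 4)  # compare with the P-value in the output
```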
Are race and gun ownership related in the U.S. population? In the code chunk below, insert the code needed to use chisqtestGC() to investigate this question. Tip: copy-paste and then modify the code from the previous problem.
chisqtestGC(~race+owngun,data=gss02,
graph=TRUE)
## Pearson's Chi-squared test
##
## Observed Counts:
## owngun
## race No Yes
## AfrAm 106 16
## Hispanic 20 3
## Other 25 7
## White 454 284
##
## Counts Expected by Null:
## owngun
## race No Yes
## AfrAm 80.67 41.33
## Hispanic 15.21 7.79
## Other 21.16 10.84
## White 487.97 250.03
##
## Contributions to the chi-square statistic:
## owngun
## race No Yes
## AfrAm 7.96 15.53
## Hispanic 1.51 2.95
## Other 0.70 1.36
## White 2.36 4.61
##
##
## Chi-Square Statistic = 36.9779
## Degrees of Freedom of the table = 3
## P-Value = 0
Looking at the output, answer the following questions.
What’s the test statistic?
36.9779
About how big should it be if the Null is correct?
3
What’s the P-value?
0 (rounded; the actual value is very small but not exactly zero)
Do you think we have strong evidence for a relationship in the population, or could the pattern in the data be due just to chance?
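We have strong evidence for a relationship. The chi-square statistic (36.9779) is far larger than the value of about 3 expected under the Null, and the P-value is approximately 0, so chance alone is a very unlikely explanation for the pattern.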
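The P-value printed as 0 in the race and gun-ownership output is a rounded value. A quick check with base R's `pchisq()`, using the statistic and degrees of freedom from that output, shows how small it really is.

```r
# Statistic and degrees of freedom read from the race/owngun output
p_value <- pchisq(36.9779, df = 3, lower.tail = FALSE)

# Scientific notation makes the size of the P-value visible
format(p_value, scientific = TRUE)
```

A P-value this small means a statistic of this size would almost never arise by chance if race and gun ownership were unrelated.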
Are degree and opinion on marijuana legalization related in the U.S.?
chisqtestGC(~degree+marijuan,data=gss02,
graph=TRUE)
## Pearson's Chi-squared test
##
## Observed Counts:
## marijuan
## degree Legal NotLegal
## Bachelor 52 96
## Graduate 35 44
## HighSchool 157 291
## JunColl 22 34
## NotHs 40 80
##
## Counts Expected by Null:
## marijuan
## degree Legal NotLegal
## Bachelor 53.22 94.78
## Graduate 28.41 50.59
## HighSchool 161.09 286.91
## JunColl 20.14 35.86
## NotHs 43.15 76.85
##
## Contributions to the chi-square statistic:
## marijuan
## degree Legal NotLegal
## Bachelor 0.03 0.02
## Graduate 1.53 0.86
## HighSchool 0.10 0.06
## JunColl 0.17 0.10
## NotHs 0.23 0.13
##
##
## Chi-Square Statistic = 3.2236
## Degrees of Freedom of the table = 4
## P-Value = 0.5211
xtabs(~degree+marijuan,data=gss02)
## marijuan
## degree Legal NotLegal
## Bachelor 52 96
## Graduate 35 44
## HighSchool 157 291
## JunColl 22 34
## NotHs 40 80
degreemari<-xtabs(~degree+marijuan,data=gss02)
rowPerc(degreemari)
## marijuan
## degree Legal NotLegal Total
## Bachelor 35.14 64.86 100.00
## Graduate 44.30 55.70 100.00
## HighSchool 35.04 64.96 100.00
## JunColl 39.29 60.71 100.00
## NotHs 33.33 66.67 100.00
There is little evidence of a relationship: the P-value is 0.5211, and in every degree group more than half of the respondents (between about 56% and 67%) say marijuana should not be legal.
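To see where the "Counts Expected by Null" and the contributions in the output come from, here is a small base R sketch. It assumes only the observed counts for degree and marijuan shown above: each expected count is (row total × column total) / grand total, and each contribution is (observed − expected)² / expected.

```r
# Observed counts copied from the degree-by-marijuan table
obs <- matrix(c( 52,  96,
                 35,  44,
                157, 291,
                 22,  34,
                 40,  80),
              nrow = 5, byrow = TRUE,
              dimnames = list(
                degree = c("Bachelor", "Graduate", "HighSchool",
                           "JunColl", "NotHs"),
                marijuan = c("Legal", "NotLegal")))

# Expected counts if degree and opinion were unrelated
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Each cell's contribution to the chi-square statistic
contrib <- (obs - expected)^2 / expected

round(expected, 2)
round(contrib, 2)
sum(contrib)   # the chi-square statistic
```

The Graduate row has the largest contributions, which matches its row percentages being the furthest from the others, but the total is still small for 4 degrees of freedom.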
Try simulation on the sex and seating-preference study:
chisqtestGC(~sex+seat,data=m111survey,
simulate.p.value="random",
B=3000)
Now try it again, without simulation:
chisqtestGC(~sex+seat,data=m111survey)
Compare the P-values: are they about the same, or very different? About the same.
Let’s learn about a new data frame (it’s from the mosaicData package):
View(CPS85)
help(CPS85)
Say that we want to know: who makes more money on average, males or females? Males.
In the code chunk below, write some code with favstats that will help you answer this question.
favstats( wage ~ sex, data =CPS85)
Complete the chunk below to get a density plot to answer the same question, graphically:
densityplot( ~wage|sex, data =CPS85,
main = "Who Makes more, Guy or Gal?",
xlab = "Wages")
Who seems to make more?
Male
Before we conclude that there is wage discrimination on the basis of sex we should think about possible confounding factors.
Maybe work sector is a confounding factor. If men and women differ in the type of work they choose, and men tend to choose higher-paying types of work, then maybe that’s why their wages are higher.
In the code chunk below, produce some numerical output to help see whether men and women choose different types of work.
favstats(sector~sex,data=CPS85)
## Warning in (function (x, ..., na.rm = TRUE) : Auto-converting factor to
## numeric.
## Warning in (function (x, ..., na.rm = TRUE) : Auto-converting factor to
## numeric.
## sex min Q1 median Q3 max mean sd n missing
## 1 F 1 1 5 7 8 4.440816 2.722396 245 0
## 2 M 1 3 5 6 8 4.795848 1.978140 289 0
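Because sector is a factor, favstats() converts its levels to the numbers 1 through 8 (hence the warnings), so the means and medians above do not describe types of work. Row percentages are a more natural numerical summary for two categorical variables. The sketch below is self-contained: it assumes the observed counts printed in the chi-square output for sex and sector in this worksheet. With the data loaded, `rowPerc(xtabs(~sex+sector,data=CPS85))` should give the same kind of table.

```r
# Observed counts for sex by sector, copied from this worksheet's output
sector_counts <- matrix(
  c(76,  0, 21, 24,  6, 52, 17, 49,
    21, 20, 34, 44, 62, 53, 21, 34),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("F", "M"),
    sector = c("clerical", "const", "manag", "manuf",
               "other", "prof", "sales", "service")))

# Percentage of each sex working in each sector (each row sums to 100)
row_pct <- 100 * prop.table(sector_counts, margin = 1)
round(row_pct, 1)
```

Comparing the two rows shows directly which sectors are chosen more often by women and which by men.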
In the code chunk below, produce some graphical output to help see whether men and women choose different types of work.
barchartGC(~sector|sex,data=CPS85,
main="Do people choose different jobs?",
xlab="Sector")
In the code chunk below, run a chi-square test to see if the relationship you see in the data provides strong evidence that sex and sector are related in the U.S. population.
chisqtestGC(~sex+sector,data=CPS85)
## Pearson's Chi-squared test
##
## Observed Counts:
## sector
## sex clerical const manag manuf other prof sales service
## F 76 0 21 24 6 52 17 49
## M 21 20 34 44 62 53 21 34
##
## Counts Expected by Null:
## sector
## sex clerical const manag manuf other prof sales service
## F 44.5 9.18 25.23 31.2 31.2 48.17 17.43 38.08
## M 52.5 10.82 29.77 36.8 36.8 56.83 20.57 44.92
##
## Contributions to the chi-square statistic:
## sector
## sex clerical const manag manuf other prof sales service
## F 22.29 9.18 0.71 1.66 20.35 0.30 0.01 3.13
## M 18.90 7.78 0.60 1.41 17.25 0.26 0.01 2.65
##
##
## Chi-Square Statistic = 106.4973
## Degrees of Freedom of the table = 7
## P-Value = 0
In the code chunk below, produce some numerical output to help see whether wages vary by work sector.
favstats(wage~sector,data=CPS85)
## sector min Q1 median Q3 max mean sd n missing
## 1 clerical 3.00 5.2000 7.500 9.5000 15.03 7.422577 2.699018 97 0
## 2 const 3.75 7.2250 9.750 11.6275 15.00 9.502000 3.343877 20 0
## 3 manag 1.00 7.2500 10.620 16.3950 44.50 12.704000 7.572513 55 0
## 4 manuf 3.00 4.9250 6.750 9.8725 22.20 8.036029 4.117607 68 0
## 5 other 2.85 5.0000 6.940 10.8150 26.00 8.500588 4.601049 68 0
## 6 prof 4.35 7.5000 10.610 15.3800 24.98 11.947429 5.523833 105 0
## 7 sales 3.35 4.3125 5.725 10.8325 19.98 7.592632 4.232272 38 0
## 8 service 1.75 3.9650 5.500 8.0000 25.00 6.537470 3.673278 83 0
In the code chunk below, produce some graphical output to help see whether wages vary by work sector.
densityplot(~wage|sector,data=CPS85,
main="Wages By Work Sector",
xlab="Wages")
Run the code below: what does it tell you? It shows the distribution of wages for each combination of sector and sex, so men and women can be compared within the same sector.
densityplot(~wage|sector*sex, data =CPS85)
The following code gives the mean wage for each sex in each work sector. What does it tell you? It tells me that in every sector where both sexes appear, males make more on average than females. (There are no females in construction, hence the NA.)
with(CPS85, tapply(wage, INDEX = list(sex,sector), FUN = mean))
## clerical const manag manuf other prof sales service
## F 7.404211 NA 11.05619 5.713750 5.801667 11.10500 5.241765 6.059388
## M 7.489048 9.502 13.72176 9.302727 8.761774 12.77396 9.495714 7.226471
Does it appear that the overall difference between men and women can be explained by the fact that they choose different sectors of work? Why or why not?
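Sector choice explains part of the difference, but not all of it. Even within the same sector males earn more on average, and in some sectors (manufacturing, sales, other) the gap is several dollars per hour.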
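One rough way to look at this question is to compare men and women within each sector and then average those within-sector gaps. The sketch below is an illustration, not a formal adjustment: it assumes the sector means from the tapply() output and the female counts from the chi-square output in this worksheet, and it weights each sector's gap by the number of women working in it (a choice made here for illustration). Construction is left out because it has no women.

```r
# Mean wage by sex and sector, copied from the tapply() output
mean_wage <- rbind(
  F = c(7.404211, NA, 11.05619, 5.713750,
        5.801667, 11.10500, 5.241765, 6.059388),
  M = c(7.489048, 9.502, 13.72176, 9.302727,
        8.761774, 12.77396, 9.495714, 7.226471))
colnames(mean_wage) <- c("clerical", "const", "manag", "manuf",
                         "other", "prof", "sales", "service")

# Number of women in each sector, from the observed counts
n_women <- c(clerical = 76, const = 0, manag = 21, manuf = 24,
             other = 6, prof = 52, sales = 17, service = 49)

# Male-minus-female gap in mean wage within each sector
gap <- mean_wage["M", ] - mean_wage["F", ]
round(gap, 2)

# Average within-sector gap, weighted by where women actually work
has_both <- !is.na(gap)
adjusted_gap <- weighted.mean(gap[has_both], w = n_women[has_both])
adjusted_gap
```

If sector choice explained the whole difference, the within-sector gaps would be close to zero; a clearly positive average suggests that something beyond sector is involved.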