PBA Extra Credits

Opening Libraries

library(dplyr)
library(knitr)
library(pollster)
library(kableExtra)

Question 1

setwd("/Users/olivia/Documents/Documents/Study/Semester 5/PBA")
judges <- read.csv(file = 'judges.csv')
temp<-xtabs(~woman+republican,judges)
rownames(temp)<-c("Men","Woman")
colnames(temp)<-c("Democrat","Republican")
temp

##        republican
## woman   Democrat Republican
##   Men         82        124
##   Woman       27         11

# Share of all judges in each cell; row and column labels carry over from temp
gender_rep_table <- prop.table(temp)
gender_rep_table

##        republican
## woman     Democrat Republican
##   Men   0.33606557 0.50819672
##   Woman 0.11065574 0.04508197

1. How many judges are in this data set?

cat("Sum of Judges:",sum(temp))

## Sum of Judges: 244

2. What proportion of the judges are men?

cat("Men Proportion:",round(sum(gender_rep_table[1,])*100,3),"%")

## Men Proportion: 84.426 %

3. Is the party composition different for male and female judges?

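Yes, the party composition differs by gender. From the counts in the tables below, 124 of 206 male judges (60.2%) were appointed by Republican presidents, compared with 11 of 38 female judges (28.9%).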

Counts:

        Democrat  Republican
Men           82         124
Woman         27          11

Proportions of all judges:

        Democrat  Republican
Men    0.3360656   0.5081967
Woman  0.1106557   0.0450820
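
The proportions above are shares of all 244 judges. To compare party composition within each gender, row proportions are more direct. The following is a minimal sketch that rebuilds the counts table by hand from the output above; the object names counts and row_props are illustrative and not part of the original analysis.

```r
# Counts copied from the xtabs() output above (filled column by column).
counts <- matrix(c(82, 27, 124, 11), nrow = 2,
                 dimnames = list(woman = c("Men", "Woman"),
                                 republican = c("Democrat", "Republican")))

# margin = 1 divides each cell by its row total, so every row sums to 1.
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

With the judges data loaded, the same call on the original table would be prop.table(temp, margin = 1).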

Question 2

Histogram of progressive_vote

hist(judges$progressive_vote,main = "Histogram of progressive_vote",breaks = 10,col="lightcoral")

mean(judges$progressive_vote)

## [1] 0.4351555

median(judges$progressive_vote)

## [1] 0.422619

The histogram is slightly right-skewed. As supporting evidence, the mean (0.435) is larger than the median (0.423), both computed above.

The median lies to the left of the mean, which is characteristic of a right-skewed distribution.

Where is the region of highest density of this variable?

The region of highest density is between 0.25 and 0.50.
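
The densest region can also be read off numerically instead of by eye. The sketch below is one possible approach: densest_bin is a hypothetical helper, and the vector x holds made-up values, not the judges data. It relies on hist(..., plot = FALSE), which returns the bin edges and counts without drawing anything.

```r
# Hypothetical helper: return the edges of the histogram bin with the most observations.
# With equal-width bins, the bin with the highest count also has the highest density.
densest_bin <- function(x, breaks = 10) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  i <- which.max(h$counts)
  c(lower = h$breaks[i], upper = h$breaks[i + 1])
}

# Made-up values for illustration only.
x <- c(0.05, 0.30, 0.32, 0.35, 0.40, 0.45, 0.48, 0.60, 0.90)
res <- densest_bin(x, breaks = c(0, 0.25, 0.5, 0.75, 1))
res
```

On the real data the call would be densest_bin(judges$progressive_vote), assuming judges is loaded as above.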

Question 3

Create a new factor variable called judges$gender_party

“F_Demo” for female judges appointed by Democratic presidents.
“F_Repub” for female judges appointed by Republican presidents.
“M_Demo” for male judges appointed by Democratic presidents.
“M_Repub” for male judges appointed by Republican presidents.

# case_when() returns a character vector, so wrap it in factor() to get a factor
judges$gender_party <- factor(case_when(
  judges$woman == 1 & judges$republican == 0 ~ "F_Demo",
  judges$woman == 1 & judges$republican == 1 ~ "F_Repub",
  judges$woman == 0 & judges$republican == 0 ~ "M_Demo",
  judges$woman == 0 & judges$republican == 1 ~ "M_Repub"
))

Use tapply() to calculate the mean of progressive_vote in each of these groups and store this vector as gender_party_means.

gender_party_means<-tapply(judges$progressive_vote,judges$gender_party,mean)
gender_party_means

##    F_Demo   F_Repub    M_Demo   M_Repub 
## 0.4547162 0.3069867 0.5062359 0.3952614

Plot these means using a barplot

barplot(height = gender_party_means,col="lightcoral")

Does anything stand out to you?

Based on the bar plot, judges appointed by Democratic presidents have a higher mean progressive_vote than judges appointed by Republican presidents, among both women (0.455 vs. 0.307) and men (0.506 vs. 0.395).
Based on gender, male judges have a higher mean progressive_vote than female judges within each party (0.506 vs. 0.455 among Democratic appointees, 0.395 vs. 0.307 among Republican appointees).
The F_Repub group is small (11 judges), so its mean is the least precise of the four.
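
The comparisons above can be made explicit by subtracting the group means. This is a small sketch that copies the four means from the tapply() output; the names party_gap and gender_gap are illustrative and not from the original analysis.

```r
# Group means copied from the tapply() output above.
gender_party_means <- c(F_Demo = 0.4547162, F_Repub = 0.3069867,
                        M_Demo = 0.5062359, M_Repub = 0.3952614)

# Democratic minus Republican appointees, within each gender.
party_gap <- c(
  Female = unname(gender_party_means["F_Demo"] - gender_party_means["F_Repub"]),
  Male   = unname(gender_party_means["M_Demo"] - gender_party_means["M_Repub"])
)

# Male minus female judges, within each party.
gender_gap <- c(
  Demo  = unname(gender_party_means["M_Demo"] - gender_party_means["F_Demo"]),
  Repub = unname(gender_party_means["M_Repub"] - gender_party_means["F_Repub"])
)

round(party_gap, 3)
round(gender_gap, 3)
```

All four gaps are positive, and the party gaps are larger than the gender gaps.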

Question 4

Create a variable called judges$any_girls that is 1 when the judge has at least 1 girl and 0 otherwise.

# Judges with a missing girls value are coded 0 here, the same as judges with no girls
judges$any_girls <- if_else(is.na(judges$girls) | judges$girls < 1, 0, 1)
head(judges)

##                    name child circuit girls progressive_vote race religion
## 1    Alarcon, Arthur L.     3       9     1        0.0000000    3        4
## 2     Aldisert, Ruggero     3       3     1        0.6666667    1        4
## 3       Aldrich, Bailey     2       1     0        0.3333333    1        1
## 4 Alito, Samuel A., Jr.     2       3     1        0.5000000    1        4
## 5    Altimari, Frank X.     4       2     1        0.5000000    1        4
## 6         Ambro, Thomas    NA       3    NA        0.0000000   NA       NA
##   republican sons woman yearb gender_party any_girls
## 1          0    2     0  1925       M_Demo         1
## 2          0    2     0  1919       M_Demo         1
## 3          1    2     0  1907      M_Repub         0
## 4          1    1     0  1950      M_Repub         1
## 5          1    3     0  1928      M_Repub         1
## 6          0   NA     0    NA       M_Demo         0
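
Row 6 shows that a judge whose girls value is missing ends up with any_girls equal to 0. Whether that is appropriate is a judgment call. The sketch below contrasts the two options on a made-up vector using base ifelse(); the vector and the names any_girls_zero and any_girls_na are hypothetical.

```r
# Made-up values for illustration; NA stands for a judge with unknown number of girls.
girls <- c(1, 0, 2, NA)

# Coding used in this report: missing values are treated as "no girls".
any_girls_zero <- ifelse(is.na(girls) | girls < 1, 0, 1)

# Alternative: leave missing values as NA so they drop out of later group means.
any_girls_na <- ifelse(girls >= 1, 1, 0)

any_girls_zero
any_girls_na
```

With the second coding, tapply() would exclude the NA judges from both groups, since NA is not a level of the grouping variable.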

Create a subset of the data called parents that contains judges that have at least one child.

parents<- subset(judges,child>=1)
head(parents)

##                       name child circuit girls progressive_vote race religion
## 1       Alarcon, Arthur L.     3       9     1        0.0000000    3        4
## 2        Aldisert, Ruggero     3       3     1        0.6666667    1        4
## 3          Aldrich, Bailey     2       1     0        0.3333333    1        1
## 4    Alito, Samuel A., Jr.     2       3     1        0.5000000    1        4
## 5       Altimari, Frank X.     4       2     1        0.5000000    1        4
## 7 Anderson, R. Lanier, III     3       5    NA        0.2500000    1        2
##   republican sons woman yearb gender_party any_girls
## 1          0    2     0  1925       M_Demo         1
## 2          0    2     0  1919       M_Demo         1
## 3          1    2     0  1907      M_Repub         0
## 4          1    1     0  1950      M_Repub         1
## 5          1    3     0  1928      M_Repub         1
## 7          0   NA     0  1936       M_Demo         0

Create an object called ATE that is the difference in means of progressive_vote between judges that have at least one girl versus those that have no girls among those judges with any children.

ATE <- tapply(parents$progressive_vote, parents$any_girls, mean)
ATE

##         0         1 
## 0.3976401 0.4518024

ATE<-ATE[2]-ATE[1]
cat("Difference : ",ATE)

## Difference :  0.05416231

Can we interpret the result causally (i.e., can we safely say that the difference, if there is any, is caused by the judges having a daughter)?

No. A difference in means alone does not show that having a daughter causes a judge to vote more progressively, because many other factors can influence a judge's decisions. The difference is also small (about 0.054), so we cannot draw a causal conclusion from it alone.
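
One way to judge whether a difference of this size is distinguishable from sampling noise is a two-sample t-test. The sketch below uses a made-up data frame named toy in place of parents, so its numbers say nothing about the real judges; it only illustrates the mechanics. A test like this addresses uncertainty, not confounding, so it would not by itself justify a causal reading.

```r
# Made-up data standing in for `parents`; the values are hypothetical.
toy <- data.frame(
  progressive_vote = c(0.2, 0.4, 0.5, 0.3, 0.6, 0.5, 0.7, 0.4),
  any_girls        = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Difference in means, computed the same way as ATE above.
group_means <- tapply(toy$progressive_vote, toy$any_girls, mean)
diff_means <- unname(group_means["1"] - group_means["0"])

# Welch two-sample t-test. Note that the reported confidence interval is for
# (mean of group 0) minus (mean of group 1), the reverse of diff_means.
welch <- t.test(progressive_vote ~ any_girls, data = toy)
diff_means
welch$conf.int
```

On the real data the call would be t.test(progressive_vote ~ any_girls, data = parents), assuming parents is built as above.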