
####Deliverable

Choose variables to analyze and submit at least one visualization and description on Canvas. Use the step-by-step example on page 31 of your textbook as a model. You should include a table and either a bar plot or a pie chart with an accompanying description.

The following document walks through an example analysis of two variables, the state in which a respondent's school is located and how respondents describe their school, but you are encouraged to explore different variables for your submission.

####Learning Objectives

Here are some of the skills you should be familiar with when analyzing categorical data. This file provides the technical resources to make these visualizations. Although it is not the focus of this document, make sure you can also interpret the meaning of your graphical displays. Use Chapter 3 as a resource.


####Resources

Chapter 3 of your textbook

R Markdown help

We will also be using the package ggplot2 to visualize our data. Here are resources for bar plots (Resource 1, Resource 2) and pie charts (Resource 1).

####The data

How is the data stored?

str(data_resilience)
## Classes 'tbl_df', 'tbl' and 'data.frame':    300 obs. of  34 variables:
##  $ Timestamp                                            : POSIXct, format: "2019-08-24 09:56:13" "2019-08-24 14:09:08" ...
##  $ Please indicate your age range.                      : chr  "45 - 49" "35 - 39" "50 - 54" "50 - 54" ...
##  $ Please indicate your gender.                         : chr  "Female" "Female" "Female" "Female" ...
##  $ Please indicate your race.                           : chr  "White" "White" "White" "White" ...
##  $ How many years have you been working in education?   : chr  "25 - 29 years" "5 - 9 years" "25 - 29 years" "20 - 24 years" ...
##  $ Which option best describes your role in your school?: chr  "Administrator" "Administrator" "Administrator" "Administrator" ...
##  $ In what division do you work?                        : chr  "Upper School" "Upper School" "Upper School" "Upper School" ...
##  $ In what state is your school located?                : chr  "DE" "AZ" "MS" "NY" ...
##  $ How would you describe your school?                  : num  2 3 3 3 4 4 3 4 4 3 ...
##  $ How much autonomy do you have in your job?           : num  3 4 4 4 4 4 4 5 4 3 ...
##  $ Know Yourself                                        : num  5 4 4 4 5 5 4 4 5 5 ...
##  $ Understand Emotions                                  : num  5 4 4 4 5 4 4 3 5 4 ...
##  $ Tell Empowering Stories                              : num  4 3 3 3 4 4 4 4 5 4 ...
##  $ Build Community                                      : num  4 3 5 4 3 5 3 4 5 3 ...
##  $ Be Here Now                                          : num  3 2 4 3 3 5 4 3 5 4 ...
##  $ Take Care of Yourself                                : num  3 3 2 5 4 2 3 3 4 5 ...
##  $ Focus on the Bright Spots                            : num  3 3 4 3 2 3 4 4 4 5 ...
##  $ Cultivate Compassion                                 : num  3 4 4 4 2 5 3 4 5 4 ...
##  $ Be a Learner                                         : num  3 4 4 3 3 5 4 4 4 4 ...
##  $ Play and Create                                      : num  3 3 5 3 3 5 4 4 5 5 ...
##  $ Ride the Waves of Change                             : num  4 3 4 4 4 5 4 5 4 4 ...
##  $ Celebrate and Appreciate                             : num  4 3 5 5 4 4 4 5 5 4 ...
##  $ Purposefulness                                       : num  5 4 4 4 4 4 4 4 5 4 ...
##  $ Acceptance                                           : num  4 3 3 4 5 3 4 5 5 4 ...
##  $ Optimism                                             : num  4 3 4 4 3 4 4 4 5 4 ...
##  $ Empathy                                              : num  3 5 4 4 5 5 4 4 5 5 ...
##  $ Humor                                                : num  4 5 4 2 4 5 4 5 4 5 ...
##  $ Positive Self-Perception                             : num  2 3 4 4 5 3 3 4 4 3 ...
##  $ Empowerment                                          : num  3 4 5 4 5 5 4 4 4 4 ...
##  $ Perspective                                          : num  3 5 4 5 5 5 4 4 4 4 ...
##  $ Curiosity                                            : num  4 4 4 3 4 5 4 5 4 4 ...
##  $ Courage                                              : num  4 4 4 4 5 4 4 5 4 3 ...
##  $ Perseverance                                         : num  4 5 4 4 5 5 3 5 4 4 ...
##  $ Trust                                                : num  3 3 4 4 5 5 3 4 5 3 ...

Because all of our data is categorical (even those variables that record a number for each observation), we need to convert every variable in our dataset to a “factor”.

data_resilience[]<-lapply(data_resilience, factor) 
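
To see what this conversion does, here is a minimal sketch on a made-up data frame. The name `toy` and its columns are hypothetical and are not part of the survey data. It suggests that numeric codes become category labels once they are stored as factors.

```r
# Hypothetical toy data standing in for a few survey columns
toy <- data.frame(
  role  = c("Teacher", "Administrator", "Teacher"),
  score = c(3, 5, 3),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; assigning into toy[] keeps the data frame structure
toy[] <- lapply(toy, factor)

# Inspect the result: both columns are now factors
str(toy)

# The numeric codes are now treated as category labels
levels(toy$score)
```

After the conversion, functions such as `table()` and `geom_bar()` treat each distinct value as its own category rather than as a number on a scale.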

####1. Tables

Let’s start by looking at how our variables are distributed across different categories. We can organize these counts into tables, which record the totals or percentages alongside the category names.

Here we will look at how two variables are distributed: the state in which each respondent’s school is located and how respondents describe their school. For each, we will look at a table of counts (frequencies) and a table of proportions (relative frequencies).

#frequency table (counts)
table_state<-table(data_resilience$`In what state is your school located?`)
table_state
## 
##  AZ  CA  CO  DE  FL  KY  MA  MS  NJ  NY  SC  TN  TX 
##  64 135   1   1   1  32   3   1   2   1   1  50   1
#relative frequency table (proportions)
table_state_rel<-prop.table(table_state)
table_state_rel
## 
##          AZ          CA          CO          DE          FL          KY 
## 0.218430034 0.460750853 0.003412969 0.003412969 0.003412969 0.109215017 
##          MA          MS          NJ          NY          SC          TN 
## 0.010238908 0.003412969 0.006825939 0.003412969 0.003412969 0.170648464 
##          TX 
## 0.003412969
#marginal distribution
addmargins(table_state)
## 
##  AZ  CA  CO  DE  FL  KY  MA  MS  NJ  NY  SC  TN  TX Sum 
##  64 135   1   1   1  32   3   1   2   1   1  50   1 293
#frequency table (counts)
table_school<-table(data_resilience$`How would you describe your school?`)
table_school
## 
##          1          2          3 3.68013468          4          5 
##          5         33         79          1        115         65
#relative frequency table (proportions)
table_school_rel<-prop.table(table_school)
table_school_rel
## 
##           1           2           3  3.68013468           4           5 
## 0.016778523 0.110738255 0.265100671 0.003355705 0.385906040 0.218120805
#marginal distribution
addmargins(table_school)
## 
##          1          2          3 3.68013468          4          5 
##          5         33         79          1        115         65 
##        Sum 
##        298

What are the differences between these tables?
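
One way to explore this question is with a small made-up vector, where the arithmetic is easy to check by hand. The sketch below uses a hypothetical `state` vector of six responses, not the survey data.

```r
# Hypothetical responses from six people
state <- factor(c("AZ", "CA", "CA", "TN", "CA", "AZ"))

# Frequency table: how many responses fall in each category
counts <- table(state)

# Relative frequency table: each count divided by the total (6)
props <- prop.table(counts)

# Same counts with the overall total appended as "Sum"
with_total <- addmargins(counts)

counts
props
with_total
```

The counts describe how many respondents fall in each category, the proportions describe what share of all respondents that is, and `addmargins()` only appends the total to whichever table it is given.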

Next we will create a contingency table comparing the state in which each school is located and how respondents describe their school.

#contingency table
table_state_school<-table(data_resilience$`In what state is your school located?`, data_resilience$`How would you describe your school?`)
addmargins(table_state_school)
##      
##         1   2   3 3.68013468   4   5 Sum
##   AZ    1   4  15          0  34  10  64
##   CA    2   9  19          0  53  51 134
##   CO    0   1   0          0   0   0   1
##   DE    0   1   0          0   0   0   1
##   FL    0   1   0          0   0   0   1
##   KY    2   6  17          0   7   0  32
##   MA    0   0   0          0   3   0   3
##   MS    0   0   1          0   0   0   1
##   NJ    0   0   1          0   1   0   2
##   NY    0   0   1          0   0   0   1
##   SC    0   0   1          0   0   0   1
##   TN    0  11  22          0  13   4  50
##   TX    0   0   0          0   1   0   1
##   Sum   5  33  77          0 112  65 292

What is the difference between the following two tables?

addmargins(prop.table(table_state_school, margin=1))
##      
##                 1           2           3  3.68013468           4
##   AZ   0.01562500  0.06250000  0.23437500  0.00000000  0.53125000
##   CA   0.01492537  0.06716418  0.14179104  0.00000000  0.39552239
##   CO   0.00000000  1.00000000  0.00000000  0.00000000  0.00000000
##   DE   0.00000000  1.00000000  0.00000000  0.00000000  0.00000000
##   FL   0.00000000  1.00000000  0.00000000  0.00000000  0.00000000
##   KY   0.06250000  0.18750000  0.53125000  0.00000000  0.21875000
##   MA   0.00000000  0.00000000  0.00000000  0.00000000  1.00000000
##   MS   0.00000000  0.00000000  1.00000000  0.00000000  0.00000000
##   NJ   0.00000000  0.00000000  0.50000000  0.00000000  0.50000000
##   NY   0.00000000  0.00000000  1.00000000  0.00000000  0.00000000
##   SC   0.00000000  0.00000000  1.00000000  0.00000000  0.00000000
##   TN   0.00000000  0.22000000  0.44000000  0.00000000  0.26000000
##   TX   0.00000000  0.00000000  0.00000000  0.00000000  1.00000000
##   Sum  0.09305037  3.53716418  4.84741604  0.00000000  3.90552239
##      
##                 5         Sum
##   AZ   0.15625000  1.00000000
##   CA   0.38059701  1.00000000
##   CO   0.00000000  1.00000000
##   DE   0.00000000  1.00000000
##   FL   0.00000000  1.00000000
##   KY   0.00000000  1.00000000
##   MA   0.00000000  1.00000000
##   MS   0.00000000  1.00000000
##   NJ   0.00000000  1.00000000
##   NY   0.00000000  1.00000000
##   SC   0.00000000  1.00000000
##   TN   0.08000000  1.00000000
##   TX   0.00000000  1.00000000
##   Sum  0.61684701 13.00000000
addmargins(prop.table(table_state_school, margin=2))
##      
##                 1           2           3 3.68013468           4
##   AZ  0.200000000 0.121212121 0.194805195            0.303571429
##   CA  0.400000000 0.272727273 0.246753247            0.473214286
##   CO  0.000000000 0.030303030 0.000000000            0.000000000
##   DE  0.000000000 0.030303030 0.000000000            0.000000000
##   FL  0.000000000 0.030303030 0.000000000            0.000000000
##   KY  0.400000000 0.181818182 0.220779221            0.062500000
##   MA  0.000000000 0.000000000 0.000000000            0.026785714
##   MS  0.000000000 0.000000000 0.012987013            0.000000000
##   NJ  0.000000000 0.000000000 0.012987013            0.008928571
##   NY  0.000000000 0.000000000 0.012987013            0.000000000
##   SC  0.000000000 0.000000000 0.012987013            0.000000000
##   TN  0.000000000 0.333333333 0.285714286            0.116071429
##   TX  0.000000000 0.000000000 0.000000000            0.008928571
##   Sum 1.000000000 1.000000000 1.000000000            1.000000000
##      
##                 5 Sum
##   AZ  0.153846154    
##   CA  0.784615385    
##   CO  0.000000000    
##   DE  0.000000000    
##   FL  0.000000000    
##   KY  0.000000000    
##   MA  0.000000000    
##   MS  0.000000000    
##   NJ  0.000000000    
##   NY  0.000000000    
##   SC  0.000000000    
##   TN  0.061538462    
##   TX  0.000000000    
##   Sum 1.000000000
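
The role of the `margin` argument can be checked on a small made-up table. This is a sketch only: `toy` and its values are hypothetical and chosen so that the row and column proportions differ.

```r
# Hypothetical responses: three from AZ and three from CA
toy <- data.frame(
  state  = c("AZ", "AZ", "AZ", "CA", "CA", "CA"),
  rating = c("3", "4", "4", "4", "4", "5")
)

# Contingency table of counts: rows are states, columns are ratings
tab <- table(toy$state, toy$rating)

# margin = 1 divides each cell by its row total (distribution of ratings within a state)
row_props <- prop.table(tab, margin = 1)

# margin = 2 divides each cell by its column total (distribution of states within a rating)
col_props <- prop.table(tab, margin = 2)

rowSums(row_props)  # every row sums to 1
colSums(col_props)  # every column sums to 1
```

In this toy table, two of the three AZ respondents chose rating 4, so the row proportion for that cell is 2/3. Two of the four respondents who chose rating 4 are from AZ, so the column proportion for the same cell is 1/2.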

####2. Barplots

A bar chart displays the distribution of a categorical variable, showing the counts or proportions for each category next to each other for easy comparison.

Bar charts should have small spaces between the bars to indicate that these are freestanding bars that could be rearranged into any order. The bars should also be the same width, so their heights determine their areas, and the areas are proportional to the counts in each class. This convention will help you satisfy the “area principle”, which says that the area occupied by a part of the graph should correspond to the magnitude of the value it represents.

Don’t violate the area principle. This is probably the most common mistake in a graphical display.

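# Load ggplot2 for plotting
library(ggplot2)
# Basic barplot (refer to columns by name inside aes(), not with data_resilience$)
g <- ggplot(data_resilience, aes(x = `In what state is your school located?`)) + geom_bar()
g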

# Horizontal bar plot
g + coord_flip()

#stacked bar plot (notice the fill argument that was added)
gy <- ggplot(data_resilience, aes(x = `In what state is your school located?`)) +
  geom_bar(aes(fill = `How would you describe your school?`)) +
  theme(legend.position = "top")
gy

#Side-by-side bar chart (notice the position argument that was added)
gy_s <- ggplot(data_resilience, aes(x = `In what state is your school located?`)) +
  geom_bar(aes(fill = `How would you describe your school?`), position = position_dodge()) +
  theme(legend.position = "top")
gy_s

#relative frequency bar chart (notice the y= argument; after_stat() needs ggplot2 3.3.0 or later)
gy_r <- ggplot(data_resilience, aes(x = `In what state is your school located?`)) +
  geom_bar(aes(y = after_stat(count) / sum(after_stat(count)), fill = `How would you describe your school?`)) +
  theme(legend.position = "top") + ylab("Proportion of Respondents")
gy_r


####3. Piecharts

Before you make a bar chart or a pie chart, always check the Categorical Data Condition: The data are counts or percentages of individuals in categories.

If you want to make a relative frequency bar chart or a pie chart, you’ll need to also make sure that the categories don’t overlap so that no individual is counted twice. If the categories do overlap, you can still make a bar chart, but the percentages won’t add up to 100%.
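
A minimal sketch of why overlapping categories break a pie chart, using made-up "check all that apply" answers. The `picks` list is hypothetical and is not a question from the survey.

```r
# Hypothetical multi-select answers: one respondent can appear in several categories
picks <- list(
  c("Teacher", "Coach"),
  "Teacher",
  c("Coach", "Administrator"),
  "Teacher"
)

# Number of individuals, not number of selections
n_respondents <- length(picks)

# Share of respondents who selected each category
share <- table(unlist(picks)) / n_respondents
share

# The shares sum to more than 1 because some respondents are counted twice
sum(share)
```

Here four respondents made six selections, so the shares add up to 1.5. A bar chart of these shares is still readable, but a pie chart would misrepresent them as parts of a single whole.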

To make a pie chart, we will first store the frequency or contingency table as a data frame, and then build the pie chart from that table instead of from the raw data.

#store table as dataframe
dftg1<-data.frame(table_state_rel)
dftg2<-data.frame(table_school_rel)


#pie chart of state
ggplot(dftg1, aes(x="", y=Freq, fill=Var1)) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y", start=0) +theme_void()

#pie chart of school description
ggplot(dftg2, aes(x="", y=Freq, fill=Var1)) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y", start=0) +theme_void()

#store table as dataframe
dftgy<-data.frame(prop.table(table_state_school, margin=1))

#pie charts of school description, one per state
ggplot(dftgy, aes(x="", y=Freq, fill=Var2)) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y", start=0) + facet_grid(. ~ Var1) + theme_void()
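
The column names `Var1`, `Var2`, and `Freq` used in the plots above come from how `data.frame()` converts a table whose dimensions have no names. The sketch below shows this on a hypothetical `toy` data frame with made-up values.

```r
# Hypothetical responses
toy <- data.frame(
  state  = c("AZ", "CA", "CA", "TN"),
  rating = c("3", "4", "5", "4")
)

# One-way table of proportions stored as a data frame
one_way <- data.frame(prop.table(table(toy$state)))

# Two-way table of row proportions stored as a data frame
two_way <- data.frame(prop.table(table(toy$state, toy$rating), margin = 1))

names(one_way)  # category column and proportion column
names(two_way)  # one column per table dimension, plus the proportion
nrow(two_way)   # one row per state/rating combination
```

Each row of the converted data frame holds one category (or one combination of categories) and its proportion, which is the long layout that `ggplot()` expects.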

The first pie chart shows the percent of participants from each state. The second pie chart shows how participants rated the “progressiveness” of their school. This question was asked on a scale of 1-5, with 1 being traditional and 5 being progressive. The last group of pie charts shows, for each state, the percent of participants who gave each progressiveness rating. Each value on the 1-5 scale is assigned its own color. These graphs visualize the relationship between the perceived progressiveness of a school and the state in which it is located. This information could have further implications for anyone studying why educators perceive their schools as more progressive or more traditional.