Afrikagglers, the african kagglers

Resumen del proyecto.

A raíz de la encuesta de Kaggle hacia su comunidad sobre los usos del machine learning y la ciencia de datos, surgió el interes de conocer la parte africana de la comunidad con el objetivo objetivo de tener una visión general de los Afrikagglers, como son llamados los miembros de la comunidad africanos, y sus experiencias en ciencia de datos y machine learning. Además, intentará averiguar cómo se compara África con otros continentes. Para ello se utilizaron los siguientes datasets: The 2017 Machine Learning and Data Science survey The 2018 Machine Learning and Data Science survey *A custom dataset with the list of countries present in the 2018 survey and each corresponding continent. En base a estos datos, se concluyó lo siguiente, La encuesta recopiló respuestas de participantes africanos de varios países, incluyendo Egipto, Kenia, Marruecos, Nigeria, Sudáfrica y Túnez. Se encontró que en los países africanos hay una proporción más alta de mujeres que de hombres. Además, la mayoría de los encuestados se encontraban en el rango de edad de 22 a 29 años. Se observó una disminución en la proporción de estudiantes mujeres en comparación con el año anterior, no obstante, se encontró que las mujeres encuestadas tienen una educación más alta que los hombres.

Lamentablemente, se identificó una brecha salarial de género en la que los hombres reciben salarios más altos que las mujeres con el mismo nivel de experiencia. Aunque se encontró que la ciencia de datos aún se encuentra en sus primeras etapas en África, Sudáfrica y Kenia tienen los trabajos mejor remunerados, incluyendo la ciencia de datos. En cuanto a los lenguajes de programación más populares, se encontró que Python, R y SQL son los más utilizados por los encuestados.

En cuanto a la experiencia de los encuestados, se observó que la mayoría se considera a sí misma como científicos de datos, aunque con pocos años de experiencia en codificación y están dispuestos a aprender. En cuanto a las plataformas de aprendizaje en línea, Coursera fue la más popular entre los encuestados.

Aunque el tamaño de la muestra es pequeño, los resultados sugieren que la ciencia de datos y el aprendizaje automático son campos en crecimiento en África con un futuro prometedor. Sin embargo, África difiere de las tendencias globales, como la proporción de hombres y mujeres, lo que indica la necesidad de abordar las desigualdades de género en la educación y el empleo.

Librerías necesarias

library(tidyverse)
library(grid)
library(gridExtra)
library(ggforce)

Conjunto de datos

Se usaran los siguientes datasets:

# STRING PROCESSING
# countries
multipleChoice18$Q3 <- str_replace(multipleChoice18$Q3,"Iran, Islamic Republic of...","Iran")
multipleChoice18$Q3 <- str_replace(multipleChoice18$Q3,"I do not wish to disclose my location","Won't disclose")
multipleChoice18$Q3 <- str_replace(multipleChoice18$Q3,"United Kingdom of Great Britain and Northern Ireland","UK and NI")
multipleChoice18$Q3 <- str_replace(multipleChoice18$Q3,"United States of America","USA")

continents$Country <- str_replace(continents$Country,"Iran, Islamic Republic of...","Iran")
continents$Country <- str_replace(continents$Country,"I do not wish to disclose my location","Won't disclose")
continents$Country <- str_replace(continents$Country,"United Kingdom of Great Britain and Northern Ireland","UK and NI")
continents$Country <- str_replace(continents$Country,"United States of America","USA")

# CONVERT CATEGORICAL DATA TO FACTOR
# age groups
multipleChoice18$Q2 <- factor(multipleChoice18$Q2,
                              levels = c("18-21","22-24","25-29",
                                        "30-34","35-39","40-44",
                                        "45-49","50-54","55-59",
                                        "60-69","70-79","80+"), 
                           labels = c("18-21","22-24","25-29",
                                      "30-34","35-39","40-44",
                                      "45-49","50-54","55-59",
                                      "60-69","70-79","80+"))

# degree
multipleChoice18$Q4 <- factor(multipleChoice18$Q4,
                              levels = c("Doctoral degree","Master’s degree","Bachelor’s degree","Professional degree",
                                         "No formal education past high school",
                                         "Some college/university study without earning a bachelor’s degree",
                                         "I prefer not to answer"), 
                           labels = c("PhD","Master","Bachelor","Professional",
                                         "High school","No degree","Won't disclose"))

# undergraduate major
multipleChoice18$Q5 <- factor(multipleChoice18$Q5, 
                               levels = c("Medical or life sciences (biology, chemistry, medicine, etc.)",
                                          "Computer science (software engineering, etc.)",
                                          "Engineering (non-computer focused)",
                                          "Mathematics or statistics",
                                          "A business discipline (accounting, economics, finance, etc.)",
                                          "Environmental science or geology",
                                          "Social sciences (anthropology, psychology, sociology, etc.)",
                                          "Physics or astronomy",
                                          "Information technology, networking, or system administration",
                                          "I never declared a major",
                                          "Other",
                                          "Humanities (history, literature, philosophy, etc.)") ,
                               labels = c("Medical/life sciences", "Computer science",
                                          "Engineering", "Mathematics/statistics",
                                          "A business discipline", "Physics/astronomy",
                                          "IT/Network/Sys. admin", "No major declared",
                                          "Humanities", "Env. science", "Social sciences", "Other"))

# In what industry is your current employer?
multipleChoice18$Q7 <- factor(multipleChoice18$Q7,
                              levels = c("Retail/Sales", "I am a student", 
                                         "Computers/Technology", "Accounting/Finance",
                                         "Academics/Education", 
                                         "Insurance/Risk Assessment","Other",
                                         "Energy/Mining", "Non-profit/Service",
                                         "Marketing/CRM", "Government/Public Service",
                                         "Manufacturing/Fabrication", 
                                         "Online Service/Internet-based Services",
                                         "Broadcasting/Communications",
                                         "Medical/Pharmaceutical",
                                         "Online Business/Internet-based Sales",
                                         "Military/Security/Defense",
                                         "Shipping/Transportation",
                                         "Hospitality/Entertainment/Sports"),
                              labels = c("Retail / Sales", "Student", 
                                         "Computers / Technology", "Accounting / Finance",
                                         "Academics / Education", 
                                         "Insurance / Risk Assessment","Other",
                                         "Energy / Mining", "Non-profit / Service",
                                         "Marketing / CRM", "Government / Public Service",
                                         "Manufacturing / Fabrication", 
                                         "Online Service / Internet-based Services",
                                         "Broadcasting / Communications",
                                         "Medical / Pharmaceutical",
                                         "Online Business / Internet-based Sales",
                                         "Military / Security/Defense",
                                         "Shipping / Transportation",
                                         "Hospitality / Entertainment/Sports"))

# experience in current role
multipleChoice18$Q8 <- factor(multipleChoice18$Q8, levels = c("0-1","1-2","2-3",
                                                   "3-4","4-5","5-10",
                                                   "10-15","15-20","20-25",
                                                   "25-30","30+"))

# yearly compensation
multipleChoice18$Q9 <- factor(multipleChoice18$Q9, 
                         levels = c("I do not wish to disclose my approximate yearly compensation",
                                   "0-10,000","10-20,000","20-30,000","30-40,000",
                                   "40-50,000","50-60,000","60-70,000","70-80,000",
                                   "80-90,000","90-100,000","100-125,000",
                                   "125-150,000","150-200,000","200-250,000",
                                   "250-300,000","300-400,000", "400-500,000","500,000+"),
                         labels = c("Won't disclose",
                                   "0-10,000","10-20,000","20-30,000","30-40,000",
                                   "40-50,000","50-60,000","60-70,000","70-80,000",
                                   "80-90,000","90-100,000","100-125,000",
                                   "125-150,000","150-200,000","200-250,000",
                                   "250-300,000","300-400,000", "400-500,000","500,000+"))

# time spent coding
multipleChoice18$Q23 <- factor(multipleChoice18$Q23, levels = c("0% of my time",
                                                                "1% to 25% of my time",
                                                                "25% to 49% of my time",
                                                                "50% to 74% of my time",
                                                                "75% to 99% of my time",
                                                                "100% of my time"), 
                               labels = c("0%","1% to 25%","25% to 49%",
                                          "50% to 74%","75% to 99%","100%"))
# coding experience
multipleChoice18$Q24 <- factor(multipleChoice18$Q24, 
                               levels = c("I have never written code and I do not want to learn",
                                          "I have never written code but I want to learn",
                                          "< 1 year","1-2 years","3-5 years","5-10 years",
                                          "10-20 years","20-30 years","30-40 years", "40+ years") ,
                               labels = c("I don't write code and don't want to learn",
                                         "I don't write code but want to learn",
                                        "< 1 year", "1-2 years", "3-5 years", 
                                       "5-10 years", "10-20 years","20-30 years","30-40 years", "40+ years")
)

# For how many years have you used machine learning methods
multipleChoice18$Q25 <- factor(multipleChoice18$Q25,
                               levels = c("I have never studied machine learning and I do not plan to", 
                                          "I have never studied machine learning but plan to learn in the future",
                               "< 1 year", "1-2 years", "2-3 years", "3-4 years", "4-5 years", 
                               "5-10 years", "10-15 years", "20+ years"),
                               labels = c("Never studied, do not plan to", 
                                          "Never studied, plan to learn",
                               "< 1 year", "1-2 years", "2-3 years", "3-4 years", "4-5 years", 
                               "5-10 years", "10-15 years", "20+ years"))

# use of machine learning in industries
multipleChoice18$Q10 <- factor(multipleChoice18$Q10,
                               levels = c("I do not know",
                                          "No (we do not use ML methods)",
                                          "We are exploring ML methods (and may one day put a model into production)",
                                          
                                          "We recently started using ML methods (i.e., models in production for less than 2 years)",
                                          "We have well established ML methods (i.e., models in production for more than 2 years)",
                                          "We use ML methods for generating insights (but do not put working models into production)"),
                               labels = c("I do not know", "No", "Exploring ML methods",
                                          "Recently started", "Well established ML methods", 
                                          "For generating insights"))
# expertise in data science
multipleChoice18$Q40 <- factor(multipleChoice18$Q40, 
                            levels = c("Independent projects are equally important as academic achievements",
                                       "Independent projects are much more important than academic achievements",
                                       "Independent projects are slightly more important than academic achievements",
                                       "Independent projects are slightly less important than academic achievements",
                                       "Independent projects are much less important than academic achievements",
                                       "No opinion; I do not know"),
                            labels = c("Equally important",
                                       "Much more important",
                                       "Slightly more important",
                                       "Less important",
                                       "Much less important",
                                       "No opinion/Don't know"))


# are you a data scientist?
multipleChoice18$Q26 <- factor(multipleChoice18$Q26, 
                               levels = c("Definitely yes", "Probably yes", "Maybe", 
                                          "Probably not", "Definitely not"), 
                               labels = c("Definitely yes", "Probably yes", "Maybe", 
                                          "Probably not", "Definitely not"))

Gráfica 1

Esta gráfica de barras nos describe el número de encuestados clasifados por continente, donde África se ubica en la quinta posición con un valor relativamente bajo ya que representan el 2.85% de los participantes.

newMultipleChoice %>%
  group_by(Continent) %>%
  summarise(Count = length(Continent)) %>%
  mutate(highlight_flag = ifelse((Continent == "Africa"), T, F)) %>%
  ggplot(aes(x = reorder(Continent,-Count), y = Count, fill = Continent)) +
  geom_bar(aes(fill = highlight_flag), stat = "identity", color = "grey") +
  geom_text(aes(label =as.character(Count)), 
            position = position_dodge(width = 1), 
            hjust = 0.5, vjust = -0.25, size = 3) +
  scale_fill_brewer(palette = "PuBu") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12), 
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "Number of respondents", 
       x = "", y = "Count", fill = "",
       caption = "Africa and the world")

Gráfica 2 & 3

El gráfico de la izquierda, utiliza el conjunto de datos “df” para trazar un gráfico de línea y puntos que muestra el número de encuestados por país a lo largo de los años. El gráfico de la derecha, utiliza el conjunto de datos “afroCountries” para trazar un gráfico de barras apiladas que muestra el número de encuestados por género y país de residencia.

p1 <- df %>%
  group_by(Country,Year) %>%
  summarise(Count = length(Country)) %>%
  ggplot(aes(x = Year, y = Count, group = Country)) +
  geom_line(aes(color = Country), size = 0.5) +
  geom_point(aes(color = Country), size = 4) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        plot.subtitle = element_text(hjust = 0.5),
        axis.text = element_text(size = 12), 
        legend.position = "bottom",
        legend.title=element_blank(),
        legend.text = element_text(size = 10)) +
  labs(title = "Number of respondents",
       x = "", y = "Count", fill = "", caption = "")

p2 <- afroCountries %>% 
  group_by(Q1,Q3) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  summarise(Count = length(Q3)) %>%
  ggplot(aes(x = reorder(Q3,-Count), y = Count, fill = Q1)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Paired") + 
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 12), 
        axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5),
        legend.position = "top",
        legend.text = element_text(size = 10)) +
  labs(title = "Country of residence", x = "", y = "", fill = "",
       caption = "About us")

grid.arrange(p1,p2, ncol = 2)

### Gráfica 4 & 5 El primer gráfico de barras, nos indican las proporciones de mujeres a hombres en cada país de residencia utilizando barras apiladas en orden descendente. Mientras que el siguiente gráfico nos muestra el ratio de mujer a hombre clasificado por continente.

multipleChoice18 %>%
  group_by(Q1,Q3) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q3)) %>%
  summarise(Count = n()) %>%
  spread(Q1,Count) %>%
  mutate(ratio = Female/Male) %>%
  mutate(highlight_flag = ifelse((Q3 == "Egypt" | Q3 == "Kenya" | Q3 == "Morocco" |
                                   Q3 == "Nigeria" | Q3 == "Tunisia" | Q3 == "South Africa"), T, F)) %>%
  ggplot(aes(x = reorder(Q3,-ratio), y = ratio, fill = ratio)) +
  geom_bar(aes(fill = highlight_flag), stat = "identity") +
  scale_fill_brewer(palette = "Paired") +
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 11),
        axis.text.x = element_text(size = 9.5, angle = -90,
                                   hjust = 0 , vjust = 0.5)) +  
  labs(title = "Female to Male ratio", 
       x = "", y = "Ratio", fill = "",
       caption = "About us")

newMultipleChoice %>%
  group_by(Continent,Q1) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  summarise(Count = n()) %>%
  spread(Q1,Count) %>%
  mutate(ratio = Female/Male) %>%
  mutate(highlight_flag = ifelse((Continent == "Africa"), T, F)) %>%
  ggplot(aes(x = reorder(Continent,-ratio), y = ratio, fill = ratio)) +
  geom_bar(aes(fill = highlight_flag), stat = "identity", color = "grey") +
  scale_fill_brewer(palette = "PuBu") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 12),
        axis.text.x = element_text(size = 12)) +  
  labs(title = "Female to Male ratio", 
       x = "", y = "Ratio", fill = "",
       caption = "Africa and the world")

### Gráfica 6 Este gráfico de barras nos muestra la distribución de edad en África. Se han agrupado previamente por género y edad y se ha calculado la frecuencia de cada edad para cada género.

ggplot(data = temp, 
       aes(x = Q2, fill = Q1)) +
  geom_bar(data = filter(temp, Q1 == "Male"), aes(y = Count), stat = "identity") +
  geom_bar(data = filter(temp, Q1 == "Female"), aes(y = -1*Count), stat = "identity") +
  scale_y_continuous(breaks = seq(-50,150,50), 
                     labels = as.character(c(seq(50,0,-50), seq(50,150,50)))) + 
  scale_fill_brewer(palette = "Paired") +
  coord_flip() +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text =   element_text(size = 12),
        legend.position = "top",
        legend.text = element_text(size = 11)) + 
  labs(title = "Age distribution in Africa", 
       x = "Age group (years)", y = "Count", fill = "", 
       caption = "")

### Gráficas 7 & 8 La gráfica de la izquierda es un gráfico de lineas y puntos que muestra la distribución de la educación de los encuestados en diferentes continentes. Mientras que el de la derecha es un gráfico de barras apiladas con dos filas de paneles facetadas (una para hombres y otra para mujeres) que muestran la distribución de porcentajes de títulos universitarios en diferentes categorías.

p1 <- newMultipleChoice %>%
  group_by(Continent,Q4) %>%
  filter(!is.na(Q4)) %>%
  summarise(Count = length(Continent)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = Q4, y = pct, group = Continent)) +
  geom_line(aes(color = Continent), size = 0.5) +
  geom_point(aes(color = Continent), size = 2) +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 5)) +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 30))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 12),
        axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5),
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) +
  labs(title = "Educational background", 
       x = "", y = "%", fill = "",
       caption = "")

p2 <- afroCountries %>%
  group_by(Q1,Q4) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q4)) %>%
  summarise(Count = length(Q4)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = "", y = pct, fill = Q4)) +
  geom_col(width = 1) + 
  scale_fill_brewer(palette = "Set3") + 
  facet_grid(Q1~.) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 12), 
        legend.text = element_text(size = 11)) +
  labs(title = "Degree", 
       x = "", y = "", fill = "Degree",
       caption = "About us")

grid.arrange(p1,p2,ncol = 2)

### Gráficos 9, 10, 11, 12 El primero es un gráfico de dispersión que muestra la distribución relativa de estudiantes de pregrado en diferentes áreas de estudio en diferentes continentes. El segundo es un gráfico de linea que muestra la distribución de la elección de carrera por los estudiantes de pregrado africanos. El tercero es nuevamente un grafico de puntos que muestra la distribución del puesto actual en función del continente de origen de los encuestados. El cuarto es similar al segundo, solo que muestra el rol actual del encuestado y no la carrera.

newMultipleChoice %>%
  group_by(Continent,Q5) %>%
  filter(!is.na(Q5)) %>%
  summarise(Count = length(Q5)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot() + 
  geom_point(mapping = aes(x = Continent, y = reorder(Q5,pct), 
                           size = pct, color = Q5)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 5)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 11), 
        axis.text.x = element_text(size = 12),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Undergraduate major", 
       x = "", y = "", fill = "",
       caption = "Africa and the world")

afroCountries %>%
  group_by(Q1,Q5) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q5)) %>%
  summarise(Count = length(Q5)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = reorder(Q5,-pct), y = pct, group = Q1)) +
  geom_point(aes(color = Q1), size = 2) + geom_line(aes(color = Q1), size = 1) +
  scale_fill_brewer(palette = "Set3") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 10)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 11), 
        axis.text.x = element_text(size = 12, angle = -90,hjust = 0,vjust = 0.5),
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) +
  labs(title = "Undergraduate major", 
       x = "", y = "%", fill = "",
       caption = "About us")

newMultipleChoice %>%
  group_by(Continent,Q6) %>%
  filter(!is.na(Q6)) %>%
  summarise(Count = length(Continent)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot() + 
  geom_point(mapping = aes(x = Continent, y = reorder(Q6,pct), 
                           size = 5*pct, color = Q6)) +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 8)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 10),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Current role", 
       x = "", y = "", fill = "",
       caption = "Africa and the world")

afroCountries %>% 
  group_by(Q1,Q6) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q6)) %>%
  summarise(Count = length(Q6)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = reorder(Q6,-pct), y = pct, group = Q1)) +
  geom_point(aes(color = Q1), size = 2) + geom_line(aes(color = Q1), size = 1) +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 10)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 12), 
        axis.text.x = element_text(size = 11, angle = -90, 
                                   hjust = 0, vjust = 0.5),
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) +
  labs(title = "Current role", x = "", y = "%", fill = "",
       caption = "About us")

### Gráfica 13 Presenta un gráfico de mapa de calor que muestra la relación entre la ocupación actual y el país de origen de los encuestados de africanos.

afroCountries %>%
  group_by(Q6,Q3) %>% 
  filter(!is.na(Q6)) %>%
  summarise(Count = length(Q6)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q6, y = Q3, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text.x = element_text(size = 12, angle = -90, hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 12), 
        legend.text = element_text(size = 11)) + 
  labs(title = "Current role by country", 
       x = "", y = "", fill = "",
       caption = "About us")

### Gráfica 14 & 15 Presenta dos gráficos circulares que muestran el avance con respecto a que los profesionales están adoptando la vida de la ciencia de datos.

propStud17 <- afroCountries17 %>%
  group_by(GenderSelect,StudentStatus) %>%
  filter(GenderSelect == "Female" | GenderSelect == "Male") %>%
  filter(!is.na(StudentStatus)) %>%
  summarise(Count = length(StudentStatus)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = "", y = pct, fill = StudentStatus)) + 
  geom_col(width = 1) +
  coord_polar("y", start = pi / 3) +
  scale_fill_brewer(palette = "Paired") +
  facet_wrap(GenderSelect~.) +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "top") +
  labs(title = "Proportion of students", subtitle = "2017", x = "", y = "", fill = "",
       caption = "")


propStud <- propStud18 %>%
  group_by(Q1,Q6) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q6)) %>%
  summarise(Count = length(Q6)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = "", y = pct, fill = Q6)) +
  geom_col(width = 1) +
  coord_polar("y", start = pi / 3) +
  scale_fill_brewer(palette = "Paired") +
  facet_wrap(Q1~.) +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "top") +
  labs(title = "Proportion of students", subtitle = "2018", x = "", y = "", fill = "",
       caption = "About us")

grid.arrange(propStud17,propStud, ncol = 2)

### Gráfica 16 Estas graficas de barras horizontales muestran la cantidad de personas africanas en dos años en diferentes roles laborales , agrupadas por su título de trabajo actual.

# 2017
p1 <- afroCountries17 %>%
  group_by(CurrentJobTitleSelect) %>%
  filter(!is.na(CurrentJobTitleSelect)) %>%
  summarise(Count = length(CurrentJobTitleSelect)) %>%
  ggplot(aes(x = reorder(CurrentJobTitleSelect, Count), y = Count, fill = CurrentJobTitleSelect)) +
  geom_col() +
  coord_flip() + 
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.y = element_text(size = 8),
        axis.text.x = element_text(size = 11),
        legend.position = "none") +
  labs(title = "Current role", subtitle = "2017", x = "", y = "Count", fill = "",
       caption = "")

# 2018
p2 <- afroCountries %>% 
  group_by(Q6) %>%
  filter(Q6 != "Student") %>%
  filter(!is.na(Q6)) %>%
  summarise(Count = length(Q6)) %>%
  ggplot(aes(x = reorder(Q6, Count), y = Count, fill = Q6)) + 
  geom_col() +
  coord_flip()+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.y = element_text(size = 8),
        axis.text.x = element_text(size = 11),
        legend.position = "none") +
  labs(title = "Current role", subtitle = "2018", x = "", y = "Count", fill = "",
       caption = "About us") 

grid.arrange(p1,p2, ncol = 2)

### Gráficos 17, 18 & 19 Los siguientes son gráficos de calor, el primero muestra los años de experiencia en la ocupación actual, el segundo el uso de machine learning en la industria y por quienes es usado, mientras que el tercer gráfico el uso de machine learning dia a dia.

afroCountries %>%
  group_by(Q6,Q8) %>%
  filter(!is.na(Q6)) %>%
  filter(!is.na(Q8)) %>%
  summarise(Count = length(Q8)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q8, y = Q6, fill = Count)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3) +
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11,
                                   hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 9), 
        legend.text = element_text(size = 11)) +
  labs(title = "Experience in current role", 
       x = "Years", y = "", fill = "",
       caption = "About us")

afroCountries %>%
  group_by(Q6,Q10) %>%
  filter(!is.na(Q10)) %>%
  summarise(Count = length(Q10)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q10, y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11), 
        axis.text.y = element_text(size = 10),
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "Use of ML in industries", 
       x = "", y = "", fill = "",
       caption = "Machine learning usage")

afroCountries %>% 
  select(Q6,Q11_Part_1,Q11_Part_2, Q11_Part_3,Q11_Part_4,Q11_Part_5,Q11_Part_6,Q11_Part_7)%>%
  gather(2:8, key = "questions", value = "Function")%>%
  group_by(Q6,Function)%>%
  filter(!is.na(Function))%>%
  summarise(Count = length(Function))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = Function, y = Q6, fill = pct)) + 
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 9),
        axis.text.y = element_text(size = 9), 
        legend.position = "none") + 
  labs(title = "Day to day function", 
       x = "", y = "", fill = "",
       caption = "Machine learning use")

### Gráfica 20 & 21 La primera es una gráfica de barras apilada que señala el tiempo que llevan programando los encuestados y el de la segunda es un gráfico de puntos y lineas que formula la experiencia en programación por paises africanos.

p1 <- afroCountries %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  group_by(Q1,Q24) %>%
  filter(!is.na(Q24)) %>%
  summarise(Count = length(Q24)) %>%
  ggplot(aes(x = Q24, y = Count, fill = Q1)) +
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 12), 
        axis.text.x = element_text(size = 10, angle = -90,hjust = 0,vjust = 0.5),
        legend.position = "top",
        legend.text = element_text(size = 11)) + 
  labs(title = "Coding experience", 
       x = "", y = "Count", fill = "")

p2 <- afroCountries %>%
  group_by(Q3,Q24) %>%
  filter(!is.na(Q3)) %>%
  filter(!is.na(Q24)) %>%
  summarise(Count = length(Q24)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = Q24, y = pct, group = Q3)) +
  geom_point(aes(color = Q3), size = 1.5) + geom_line(aes(color = Q3), size = 0.5) +
  scale_fill_brewer(palette = "Set3") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 15)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 11), 
        axis.text.x = element_text(size = 10, angle = -90,hjust = 0,vjust = 0.5),
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) +
  labs(title = "Coding experience by country", 
       x = "", y = "%", fill = "",
       caption = "About us")

grid.arrange(p1,p2,ncol = 2)

### Gráficas 22, 23, 24 & 25 La primera gráfica de calor muestra los años de experiencia escribiendo código, la segunda señala los lenguajes de programación más usados, la tercera los más recomendados y la cuarta una comparativa entre los más usados contra los más recomendados.

afroCountries %>%
  group_by(Q6,Q24) %>%
  filter(!is.na(Q24)) %>%
  summarise(Count = length(Q24)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q24, y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3.5) + 
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90,
                                   hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 9), 
        legend.text = element_text(size = 11)) +
  labs(title = "Coding experience", x = "", y = "",
       caption = "Coding experience")

afroCountries %>%
  group_by(Q6,Q17) %>%
  filter(!is.na(Q17)) %>%
  summarise(Count = length(Q17)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = reorder(Q17,-pct), y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3) + 
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = 45, hjust = 1), 
        axis.text.y = element_text(size = 9), 
        legend.text = element_text(size = 11)) +
  labs(title = "Most used programming language", x = "", y = "",
       caption = "Coding experience")

afroCountries %>%
  group_by(Q6,Q18) %>%
  filter(!is.na(Q18)) %>%
  summarise(Count = length(Q18)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = reorder(Q18,-pct), y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3) + 
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11), 
        axis.text.y = element_text(size = 9), 
        legend.text = element_text(size = 11)) +
  labs(title = "Recommended programming language", x = "", y = "",
       caption = "Coding experience")

afroCountries %>%
  group_by(Q17,Q18) %>%
  filter(!is.na(Q17)) %>%
  filter(!is.na(Q18)) %>%
  summarise(Count = length(Q17)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = reorder(Q17,pct), y = Q18, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 20))+
  geom_text(aes(label = as.character(Count)), color = "white", size = 3) + 
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 11),
        axis.text.x = element_text(angle = 35, hjust = 1),
        legend.text = element_text(size = 11)) + 
  labs(title = "Most used vs. Recommended programming languages", 
       x = "Most used", y = "Recommended",
       caption = "Coding experience")

### Gráficas 26 & 27 Ambas gráficas indican el tiempo usado programando de los africanos, la primera es una gráfica de lineas y puntos que ademas muestra el porcentaje por genero. La segunda un heatmap que los clasifica por su posición actual.

afroCountries %>%
  group_by(Q1,Q23) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q23)) %>%
  summarise(Count = length(Q23)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = Q23, y = pct, group = Q1)) +
  geom_point(aes(color = Q1), size = 2) + geom_line(aes(color = Q1), size = 1) +
  scale_fill_brewer(palette = "Set3") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 10)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.y = element_text(size = 11), 
        axis.text.x = element_text(size = 12), 
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) +
  labs(title = "Time spent actively coding", 
       x = "of time", y = "% of people", fill = "",
       caption = "Coding experience")

afroCountries %>%
  group_by(Q6,Q23) %>%
  filter(!is.na(Q23)) %>%
  summarise(Count = length(Q23)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q23, y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3.5) + 
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 11), 
        axis.text.y = element_text(size = 9),
        legend.text = element_text(size = 11)) + 
  labs(title = "Time spent coding", 
       x = "of time", y = "",
       caption = "Coding experience")

### Gráficas 28,29 & 30 Gráficos de mapas de calor, el primero muestra los softwares más utilizados para programar segun la posición, el segundo indica los “hosted notebook” más utilizados en los ultimos 5 años y el tercero señala las herramientas más utiles en el analisis de datos.

afroCountries %>% 
  select(Q6,30:45)%>%
  gather(2:16, key = "questions", value = "IDEs")%>%
  group_by(Q6,IDEs)%>%
  filter(!is.na(IDEs))%>%
  summarise(Count = length(IDEs))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = reorder(IDEs,-pct), y = Q6, fill = Count)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90,
                                   hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 9), 
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "IDEs", 
       x = "", y = "", fill = "",
       caption = "Coding experience")

afroCountries %>% 
  select(Q6,Q14_Part_1:Q14_Part_11)%>%
  gather(2:12, key = "questions", value = "Hosted_Notebook")%>%
  group_by(Q6,Hosted_Notebook)%>%
  filter(!is.na(Hosted_Notebook))%>%
  summarise(Count = length(Hosted_Notebook))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = reorder(Hosted_Notebook,-pct), y = Q6, fill = Count)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 9),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Hosted notebook used at school or work", 
       subtitle = "(past 5 years)",
       x = "", y = "", fill = "",
       caption = "")

afroCountries %>%
  group_by(Q6,Q12_MULTIPLE_CHOICE) %>%
  filter(!is.na(Q12_MULTIPLE_CHOICE)) %>%
  summarise(Count = length(Q12_MULTIPLE_CHOICE)) %>%
  ggplot(aes(x = reorder(Q12_MULTIPLE_CHOICE,-Count), y = Q6, fill = Count)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3.5) + 
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 11), 
        axis.text.y = element_text(size = 9),
        legend.text = element_text(size = 11)) + 
  labs(title = "Primary tools for data analysis", 
       x = "", y = "",
       caption = "Coding experience")

### Gráfica 31 & 32 Ambas gráficas muestran como se consideran los posibles cientificos de datos, la primera una gráfica de barras que ademas indica el genero y la segunda un gráfico de lineas que clasifica por continentes.

afroCountries %>%
  filter(Q1 == "Female"| Q1 == "Male") %>%
  group_by(Q1,Q26) %>%
  filter(!is.na(Q26)) %>%
  summarise(Count = length(Q26)) %>%
  ggplot(aes(x = Q26, y = Count, fill = Q1))+
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 8)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 9),
        axis.text.y = element_text(size = 12), 
        legend.position = "top",
        legend.text = element_text(size = 11)) + 
  labs(title = "Think of themself as a data scientist", 
       x = "", y = "Count", fill = "",
       caption = "Personal views")

newMultipleChoice %>%
  group_by(Continent,Q26) %>%
  filter(!is.na(Q26)) %>%
  summarise(Count = length(Q26)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q26, y = pct, group = Continent)) +
  geom_line(aes(color = Continent), size = 0.5) +
  geom_point(aes(color = Continent), size = 2) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12), 
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) + 
  labs(title = "Think of themself as a data scientist", 
       x = "", y = "%", fill = "",
       caption = "Africa and the world")

Gráficas 33, 34, 35, 36 & 37

Todas estas gráficas tienen en común el uso de Machine learning, la primera es un mapa de calor que muestra su uso en la escuela o trabajo, la segunda es un gráfico de puntos que indica los productos gracias a el uso de Machine Learning. la tercera es similar a la anterior y adiciona la clasificación por continente, la cuarta es otro mapa de calor que señala los frameworks por Machine learning más comunes y la quinta es un gráfico de barras apiladas que muestra la distribución de respuestas a la pregunta “¿Consideras al Machine Learning como ‘Black boxes’?” en función del género de los encuestados.

afroCountries %>%
  group_by(Q6,Q25) %>%
  filter(!is.na(Q25)) %>%
  summarise(Count = length(Q25)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q25, y = Q6, fill = Count)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3.5) + 
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0, vjust = 0.5), 
        axis.text.y = element_text(size = 9),
        legend.text = element_text(size = 11)) + 
  labs(title = "Usage of machine learning at work/school", 
       x = "", y = "",
       caption = "Coding experience")

afroCountries %>% 
  select(Q6,152:194)%>%
  gather(2:44, key = "questions", value = "ML_Products")%>%
  group_by(Q6,ML_Products)%>%
  filter(!is.na(ML_Products))%>%
  summarise(Count = length(ML_Products))%>%
  mutate(pct =  prop.table(Count)*100)%>% 
  top_n(5,pct) %>%
  ggplot() + 
  geom_point(mapping = aes(x = reorder(ML_Products,-Count), y = Q6, 
                           size = pct, color = ML_Products)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 8, angle = 45, hjust = 1), 
        axis.text.y = element_text(size = 8), 
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "Machine learning products (past 5 years)", 
       x = "", y = "", fill = "",
       caption = "Africa and the world")

newMultipleChoice %>% 
  select(Continent,152:194)%>%
  gather(2:44, key = "questions", value = "ML_Products")%>%
  group_by(Continent,ML_Products)%>%
  filter(!is.na(ML_Products))%>%
  summarise(Count = length(ML_Products))%>%
  mutate(pct =  prop.table(Count)*100)%>% 
  top_n(5,Count) %>%
  ggplot() + 
  geom_point(mapping = aes(x = Continent, y = reorder(ML_Products,pct), 
                           size = pct, color = ML_Products)) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 8), 
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "Machine learning products (past 5 years)", 
       x = "", y = "", fill = "",
       caption = "Africa and the world")

afroCountries %>% 
  select(Q6,Q19_Part_1:Q19_Part_19)%>%
  gather(2:19, key = "questions", value = "ML_Framework")%>%
  group_by(Q6,ML_Framework)%>%
  filter(!is.na(ML_Framework))%>%
  summarise(Count = length(ML_Framework))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = reorder(ML_Framework,-Count), y = Q6, fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 10), 
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "Machine learning framework (past 5 years)", 
       x = "", y = "", fill = "",
       caption = "Africa and the world")

afroCountries %>%
  group_by(Q1,Q48) %>% 
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q48)) %>%
  filter(!is.na(Q1)) %>%
  ggplot(aes(x = Q1, fill = Q48)) +
  geom_bar(position = "fill") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 10)) +
  scale_fill_brewer(palette = "Set3") +
  coord_flip() +
  labs(title = "Do you consider ML as 'black boxes'?", 
       x = "", y = "", fill = "", caption = "Personal views") +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        legend.position = "bottom", 
        axis.text = element_text(size = 12),
        legend.text = element_text(size = 10)) +
  guides(fill = guide_legend(ncol = 1))

### Gráficas 38 & 39 La primera es una gráfica de barras apiladas que indica los parametros usados en los paises de África para medir el exito y la segunda las metricas usadas por organizaciones en los diferentes continentes

afroCountries %>% 
  select(Q42_Part_1:Q42_Part_5,Q3) %>%
  gather(1:5, key = "questions", value = "metrics")%>%
  group_by(metrics,Q3) %>%
  filter(!is.na(metrics)) %>%
  filter(!is.na(Q3)) %>%
  ggplot(aes(x = Q3, fill = metrics)) +
  geom_bar(position = "fill") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 15)) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Metrics used to measure model success", 
       x = "", y = "", fill = "", caption = "Machine learning usage") +
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 11), 
        legend.text = element_text(size = 11), 
        legend.position = "bottom") +
  guides(fill = guide_legend(ncol = 1))

newMultipleChoice %>% 
  select(Continent,Q42_Part_1:Q42_Part_5)%>%
  gather(2:6, key = "questions", value = "Metrics")%>%
  group_by(Continent,Metrics)%>%
  filter(!is.na(Metrics))%>%
  summarise(Count = length(Metrics))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = Continent, y = reorder(Metrics,pct), fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), 
            hjust = 0.5,vjust = 0.5, size = 4, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  scale_y_discrete(labels = function(x) str_wrap(x, width = 20))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12), 
        legend.position = "none",
        legend.text = element_text(size = 11)) + 
  labs(title = "Metrics used by organizations", 
       x = "", y = "", fill = "",
       caption = "Africa and the world")

###Conclusiones de las gráficas 40 - 55 Los Tipo de datos más utilizados son principalmente datos numéricos, aunque los científicos de datos manejan más datos tabulares. Las plataformas de agregación de conjuntos de datos son las principales fuentes de datos utilizadas, seguidas de la búsqueda en Google y Github. Un gran número de encuestados nunca ha utilizado servicios de computación en la nube. Aquellos que lo hacen utilizan principalmente Amazon Web Services, Google Cloud Platform o Microsoft Azure. La mayoría de los encuestados africanos no utilizan herramientas de Big Data y análisis. Google BigQuery es la más utilizada para aquellos que usan herramientas de Big Data y análisis, y hay muy pocos de ellos. Para mantenerse al día en el juego, los Afrikagglers utilizan otros medios para aprender sobre nuevas tendencias en aprendizaje automático y ciencia de datos. El valor medio del porcentaje de aprendizaje de ciencia de datos y aprendizaje automático es ligeramente mayor para los hombres que para las mujeres africanas en la enseñanza autodidacta, cursos en línea y trabajo. Sin embargo, la mediana es mayor para las mujeres africanas para el aprendizaje universitario. El porcentaje de mujeres que tienen títulos universitarios es mayor que el de los hombres en África, un porcentaje más alto de ellas también estaban / están estudiando ciencias de la computación y matemáticas / estadísticas, lo que podría explicar esto. Pocos de ellos están aprendiendo a través de competencias de Kaggle.

afroCountries %>% 
  select(Q6,Q32) %>%
  group_by(Q6,Q32) %>%
  filter(!is.na(Q32)) %>%
  summarise(Count = length(Q32)) %>%
  mutate(pct = round(prop.table(Count)*100,2)) %>%
  ggplot(aes(x = reorder(Q32,-Count), y = Q6, fill = pct)) +
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none", 
        axis.text.y = element_text(size = 11),
        axis.text.x = element_text(size = 12, angle = -90, hjust = 0, vjust = 0.5),
        legend.text = element_text(size = 11)) +
  labs(title = "Most used data types", 
       x = "Type of data", y = "", fill = "", 
       caption = "Data")

afroCountries %>% 
  select(Q1,266:276) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  gather(2:12, key = "questions", value = "DataSource")%>%
  group_by(Q1,DataSource) %>%
  filter(!is.na(DataSource)) %>%
  summarise(Count = length(DataSource))%>%
  ggplot(aes(x = reorder(DataSource,-Count), y = Count, fill = Q1)) + 
  geom_col() +
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text = element_text(size = 12),
        axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5),
        legend.position = "top",
        legend.text = element_text(size = 11)) +
  labs(title = "Sources used to get public datasets", 
       x = "", y = "Count", fill = "",
       caption = "Data")

afroCountries %>% 
  select(Q6,Q15_Part_1:Q15_Part_7)%>%
  gather(2:8, key = "questions", value = "Cloud_services")%>%
  group_by(Q6,Cloud_services)%>%
  filter(!is.na(Cloud_services))%>%
  summarise(Count = length(Cloud_services))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = reorder(Cloud_services,-Count), y = Q6, fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15))+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 9),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Cloud computing services at work/school", 
       x = "", y = "", fill = "",
       caption = "")

afroCountries %>% 
  select(Q6,Q27_Part_1:Q27_Part_20) %>%
  gather(2:21, key = "questions", value = "cloud")%>%
  group_by(Q6,cloud)%>%
  filter(!is.na(cloud))%>%
  filter(!is.na(Q6))%>%
  summarise(Count = length(cloud))%>%
  mutate(pct = prop.table(Count)*100)%>%
  ggplot(aes(x = reorder(cloud,-Count), y = Q6, fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15))+
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.5, size = 2.5, color = "white") +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Cloud computing products (past 5 years)", 
       x = "", y = "", fill = "",
       caption = "Cloud computing products")

afroCountries %>% 
  select(Q1,Q22)%>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  group_by(Q1,Q22)%>%
  filter(!is.na(Q22))%>%
  summarise(Count = length(Q22))%>%
  ggplot(aes(x = reorder(Q22,-Count), y = Count, fill = Q1)) + 
  geom_col() +
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 10)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text = element_text(size = 12),
        axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5),
        legend.text = element_text(size = 11),
        legend.position = "top") +
  labs(title = "Most used vizualisation libraries", 
       x = "", y = "Count", fill = "",
       caption = "Other tools")

afroCountries %>% 
  select(Q17,Q22)%>%
  group_by(Q17,Q22)%>%
  filter(!is.na(Q17)) %>%
  filter(!is.na(Q22))%>%
  summarise(Count = length(Q22))%>%
  mutate(pct =  prop.table(Count)*100)%>%
  ggplot(aes(x = reorder(Q22,-Count), y = Q17, fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), 
            hjust = 0.5,vjust = 0.25, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x,width = 10)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text.y = element_text(size = 9),
        axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.5),
        legend.text = element_text(size = 11),
        legend.position = "none") +
  labs(title = "Most used vizualisation library", 
       x = "", y = "Most used programming language", fill = "",
       caption = "Vizualisation libraries")

afroCountries %>% 
  select(Q1,196:223) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  gather(2:29, key = "questions", value = "RDB_Products") %>%
  group_by(Q1,RDB_Products) %>%
  filter(!is.na(RDB_Products))%>%
  filter(!is.na(Q1))%>%
  summarise(Count = length(RDB_Products))%>%
  ggplot(aes(x = reorder(RDB_Products,-Count), y = Count, fill = Q1)) +
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +  
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15))+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text.x = element_text(size = 8, 
                                   angle = -90, hjust = 0, vjust = 0.5),
        legend.position = "top",
        legend.text = element_text(size = 11)) +
  labs(title = "Relational database products (past 5 years)", 
       x = "", y = "Count", fill = "",
       caption = "Relational database")

afroCountries %>% 
  select(Q1,225:249) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  gather(2:26, key = "questions", value = "BigData_Products") %>%
  group_by(Q1,BigData_Products) %>%
  filter(!is.na(BigData_Products)) %>%
  filter(!is.na(Q1)) %>%
  summarise(Count = length(BigData_Products))%>%
  ggplot(aes(x = reorder(BigData_Products,-Count), y = Count, fill = Q1)) + 
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text.x = element_text(size = 9, angle = -90, hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 12),
        legend.position = "top",
        legend.text = element_text(size = 11)) +
  labs(title = "Big data and analytics tools (past 5 years)", 
       x = "", y = "Count", fill = "",
       caption = "Big data and analytics tools")

p1 <- multipleChoice18 %>%
  select(Q1,Q35_Part_1) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  ggplot(aes(x = "",y = Q35_Part_1, fill = Q1)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 13),
        legend.text = element_text(size = 9),
        legend.title = element_blank()) +
  labs(title = "Self-taught", x = "", y = "%")
  
p2 <- multipleChoice18 %>%
  select(Q1,Q35_Part_2) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  ggplot(aes(x = "", y = Q35_Part_2, fill = Q1)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 13),
        legend.text = element_text(size = 9),
        legend.title = element_blank()) +
  labs(title = "Online courses", x = "", y = "%")

p3 <- multipleChoice18 %>%
  select(Q1,Q35_Part_3) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  ggplot(aes(x = "",y = Q35_Part_3, fill = Q1)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 13),
        legend.text = element_text(size = 9),
        legend.title = element_blank()) +
  labs(title = "Work", x = "", y = "%")
  
p4 <- multipleChoice18 %>%
  select(Q1,Q35_Part_4) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  ggplot(aes(x = "", y = Q35_Part_4, fill = Q1)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 13),
        legend.text = element_text(size = 9),
        legend.title = element_blank()) +
  labs(title = "University", x = "", y = "%")

p5 <- multipleChoice18 %>%
  select(Q1,Q35_Part_5) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  ggplot(aes(x = "",y = Q35_Part_5, fill = Q1)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 13),
        legend.text = element_text(size = 9),
        legend.title = element_blank()) +
  labs(title = "Kaggle competitions", x = "", y = "%")
  
p6 <- multipleChoice18 %>%
  select(Q1,Q35_Part_6) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  ggplot(aes(x = "", y = Q35_Part_6, fill = Q1)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "Paired") +
  theme(plot.title = element_text(size = 13, hjust = 0.5),
        legend.text = element_text(size = 9),
        legend.title = element_blank()) +
  labs(title = "Other", x = "", y = "%")

grid.arrange(p1,p2, p3, p4, p5, p6, ncol = 3)

afroCountries %>% 
  select(Q3,Q36_Part_1:Q36_Part_13)%>%
  gather(2:14, key = "questions", value = "OnlinePlatform")%>%
  group_by(Q3,OnlinePlatform)%>%
  filter(!is.na(OnlinePlatform))%>%
  summarise(Count = length(OnlinePlatform))%>%
  mutate(pct =  prop.table(Count)*100)%>% 
  ggplot(aes(x = OnlinePlatform, y = pct, group = Q3)) +
  geom_point(aes(color = Q3), size = 2) + geom_line(aes(color = Q3), size = 0.5) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 12), 
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) + 
  labs(title = "Platform for data science courses", 
       x = "", y = "%", fill = "",
       caption = "")

afroCountries %>% 
  select(Q1,Q36_Part_1:Q36_Part_13)%>%
  filter(Q1 == "Female"|Q1 == "Male")%>%
  gather(2:14, key = "questions", value = "OnlinePlatform")%>%
  group_by(Q1,OnlinePlatform)%>%
  filter(!is.na(OnlinePlatform))%>%
  summarise(Count = length(OnlinePlatform))%>%
  ggplot(aes(x = reorder(OnlinePlatform,-Count), y = Count, fill = Q1)) + 
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 20))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0),
        axis.text.y = element_text(size = 12), 
        legend.position = "top",
        legend.text = element_text(size = 11)) + 
  labs(title = "Online platform used for learning", 
       x = "", y = "Count", fill = "",
       caption = "Online learning")

afroCountries %>% 
  select(Q1,Q38_Part_1:Q38_Part_22)%>%
  filter(Q1 == "Female"|Q1 == "Male")%>%
  gather(2:14, key = "questions", value = "OnlinePlatform")%>%
  group_by(Q1,OnlinePlatform)%>%
  filter(!is.na(OnlinePlatform))%>%
  summarise(Count = length(OnlinePlatform))%>%
  ggplot(aes(x = reorder(OnlinePlatform,-Count), y = Count, fill = Q1)) + 
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 20))+
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0),
        axis.text.y = element_text(size = 12), 
        legend.position = "top",
        legend.text = element_text(size = 11)) + 
  labs(title = "Online platform used for learning", 
       x = "", y = "Count", fill = "",
       caption = "Online learning")

afroCountries %>%
  group_by(Q6,Q39_Part_1) %>%
  filter(!is.na(Q39_Part_1)) %>%
  summarise(Count = length(Q39_Part_1)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = reorder(Q39_Part_1,-Count), y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3.5) + 
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 11), 
        axis.text.y = element_text(size = 9),
        legend.text = element_text(size = 11)) + 
  labs(title = "Online learning vs. Traditional institution", 
       x = "", y = "",
       caption = "")

afroCountries %>%
  group_by(Q6,Q39_Part_2) %>%
  filter(!is.na(Q39_Part_2)) %>%
  summarise(Count = length(Q39_Part_2)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = reorder(Q39_Part_2,-pct), y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = as.character(Count)), color = "white", size = 3.5) + 
  theme(legend.position = "none",
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text = element_text(size = 11), 
        axis.text.y = element_text(size = 9),
        legend.text = element_text(size = 11)) + 
  labs(title = "In-person bootcamp vs. Traditional institution", 
       x = "", y = "",
       caption = "")

afroCountries %>% 
  select(Q1,Q40) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  group_by(Q1,Q40) %>%
  filter(!is.na(Q40)) %>%
  filter(!is.na(Q1)) %>%
  summarise(Count = length(Q40))%>%
  mutate(pct = prop.table(Count)*100) %>%
  ggplot(aes(x = reorder(Q40,-pct), y = pct, fill = Q1)) + 
  geom_col() + 
  scale_fill_brewer(palette = "Paired") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        axis.text = element_text(size = 12),
        legend.position = "top",
        legend.text = element_text(size = 11)) +
  labs(title = "Independent projects vs. Academic achievements", 
       x = "", y = "Count", fill = "",
       caption = "Expertise in data science")

newMultipleChoice %>% 
  group_by(Continent, Q40)%>%
  filter(!is.na(Q40))%>%
  summarise(Count = length(Q40))%>%
  mutate(pct = prop.table(Count)*100)%>%
  ggplot(aes(x = Continent, y = reorder(Q40,pct), fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), 
            hjust = 0.5,vjust = 0.25, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  scale_y_discrete(labels = function(x) str_wrap(x, width = 30))+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(size = 11),
        axis.text.y = element_text(size = 11),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Expertise in data science",
       subtitle = "Independent projects vs. academic achievements",
       x = "", y = "", fill = "",
       caption = "Africa and the world")

### Gráficas 57-65 #####Conclusiones de las gráficas *El ingreso promedio para los encuestados que estuvieron dispuestos a compartir su información, es de alrededor de 0-10,000 y 10,000-20,000$ para todos los países. Los países mejor pagados en África son Kenia y Sudáfrica, con más de 300,000$ para los encuestados masculinos. Independientemente del grado académico, la mayoría de los encuestados ganan menos de 10,000$ al año, tanto para hombres como mujeres. A simple vista, los hombres con doctorado ganan mucho más que sus contrapartes femeninas.

*Para ganar más dinero, es recomendable ser un científico de datos, estadístico o ingeniero de datos. Parece que el dinero está en el análisis de datos.

*Si cruzamos estos resultados con la experiencia en el puesto actual, parece que se necesitan de 1 a 2 años de experiencia para que las mujeres ganen 90,000-100,000$ como científicas de datos, mientras que los hombres pueden ganar entre 400,000 y más de 500,000$ con la misma experiencia.

*Los años de experiencia parecen tener poca importancia en cuanto a cuánto ganan las personas (en particular, como científicos de datos), pero el género sí importa. La mayoría de los encuestados africanos ganan menos de 10,000$. Norteamérica y Oceanía tienen la mayor proporción de personas que ganan más de 100,000$. En Europa, la mayoría se encuentra en el rango de 0-60,000$.

afroCountries %>% 
  group_by(Q1,Q9)%>%
  filter(Q1 == "Female"|Q1 == "Male")%>%
  filter(!is.na(Q9))%>%
  summarise(Count = length(Q9))%>%
  mutate(pct = prop.table(Count)*100)%>%
  ggplot(aes(x = Q9, y = pct, group = Q1)) +
  geom_point(aes(color = Q1), size = 2) + geom_line(aes(color = Q1), size = 1) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +
  
  theme(plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, vjust = 0.5, hjust = 0),
        axis.text.y = element_text(size = 12), 
        legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 11)) + 
  scale_fill_brewer(palette = "Paired") +
  labs(title = "Yearly compensation", 
       x = "$", y = "%", fill = "",
       caption = "About us")

afroCountries %>%
  group_by(Q1,Q9,Q3) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q9)) %>%
  summarise(Count = length(Q9)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q3, y = Q9, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), color = "white", size = 2) + 
  facet_grid(Q1~.) +
  coord_flip() +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90,  
                                   hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 11), 
        legend.text = element_text(size = 11)) + 
  labs(title = "Yearly compensation", 
       x = "", y = "$", fill = "",
       caption = "About us")

afroCountries %>%
  group_by(Q1,Q9,Q4) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q4)) %>%
  filter(!is.na(Q9)) %>%
  summarise(Count = length(Q9)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q4, y = Q9, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), color = "white", size = 2) + 
  facet_grid(Q1~.) +
  coord_flip() +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90,  
                                   hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 11), 
        legend.text = element_text(size = 11)) + 
  labs(title = "Yearly compensation by degree", 
       x = "", y = "$", fill = "",
       caption = "About us")

afroCountries %>%
  group_by(Q1,Q9,Q6) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q6)) %>%
  filter(!is.na(Q9)) %>%
  summarise(Count = length(Q9)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q9, y = Q6, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), color = "white", size = 2) +
  facet_grid(Q1~.) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 30)) +
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, 
                                   hjust = 0, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        legend.text = element_text(size = 11)) +
  labs(title = "Yearly compensation vs. current role", 
       x = "$", y = "", fill = "",
       caption = "About us")

afroCountries %>%
  group_by(Q1,Q9,Q8) %>%
  filter(Q1 == "Female" | Q1 == "Male") %>%
  filter(!is.na(Q8)) %>%
  filter(!is.na(Q9)) %>%
  summarise(Count = length(Q9)) %>%
  mutate(pct = round(prop.table(Count)*100,2))%>%
  ggplot(aes(x = Q9, y = Q8, fill = pct)) +
  geom_tile(size = 0.5, show.legend = TRUE) +
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), color = "white", size = 2) +
  facet_grid(Q1~.) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme(legend.position = "none", 
        plot.title = element_text(size = 15, hjust = 0.5), 
        axis.text.x = element_text(size = 11, angle = -90, hjust = 0),
        axis.text.y = element_text(size = 11), 
        legend.text = element_text(size = 11)) +
  labs(title = "Yearly compensation by gender and experience in current role", 
       x = "$", y = "Years of experience", fill = "",
       caption = "About us")

newMultipleChoice %>% 
  group_by(Continent, Q9)%>%
  filter(!is.na(Q9))%>%
  summarise(Count = length(Q9))%>%
  mutate(pct = prop.table(Count)*100)%>%
  ggplot(aes(x = Continent, y = Q9, fill = pct)) + 
  geom_tile(stat = "identity") + 
  scale_fill_gradient(low = "salmon1", high = "blue") +
  geom_text(aes(label = sprintf("%.2f%%", pct)), 
            hjust = 0.5,vjust = 0.25, size = 3, color = "white") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5))+
  scale_y_discrete(labels = function(x) str_wrap(x, width = 30))+
  theme(plot.title = element_text(size = 15, hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(size = 11),
        axis.text.y = element_text(size = 11),
        legend.position = "none",
        legend.text = element_text(size = 11)) +
  labs(title = "Yearly compensation",
       x = "", y = "", fill = "",
       caption = "Africa and the world")

Informe 1

Angello Restrepo

2023-03-02