Open Course - Data Analysis and Manipulation with Tidyverse. Unidad de Informática - Facultad de Ciencias Económicas, Universidad Nacional de Colombia
Load the readr, dplyr, and ggplot2 packages, which are used to import the data, transform it, and plot it.
pacman::p_load(
  readr,
  dplyr,
  ggplot2)
Import the matches file, which is located in the folder of the new R project.
matches <- read_csv("matches.csv")
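As a minimal sketch of how read_csv treats missing values, the snippet below parses a small inline CSV and counts the NAs in dist. The rows are made up for illustration and are not taken from matches.csv. It assumes readr 2.0 or later, where literal data can be passed with I(); by default, empty fields and the string NA are both read as missing.

```r
library(readr)

# Hypothetical rows, not taken from matches.csv; the last two have no dist value
toy_csv <- I("team,dist\nArsenal,17.2\nLiverpool,16.8\nArsenal,\nLiverpool,NA\n")

# show_col_types = FALSE silences the column specification message
toy <- read_csv(toy_csv, show_col_types = FALSE)

# Number of missing values in dist
n_missing <- sum(is.na(toy$dist))
print(n_missing)
```

Running the same count on the real data, sum(is.na(matches$dist)), is a quick way to confirm how many records the NA filter will drop.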
Draw a density histogram of dist in blue with 45 bins. Before plotting, the two records in which dist is NA are removed.
# Inspect the distribution of dist, including its NA count
summary(matches$dist)
# Drop the rows where dist is missing
matches <- matches %>% filter(!is.na(dist))
ggplot(matches, aes(x = dist)) +
  # after_stat(density) scales the bars so that their total area is 1
  geom_histogram(aes(y = after_stat(density)),
                 bins = 45,
                 fill = "blue",
                 color = "black") +
  labs(x = "dist",
       y = "Density") +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.grid.major = element_line(linewidth = 0.2),
    panel.grid.minor = element_line(linewidth = 0.1)
  )
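To illustrate what a density histogram means, the sketch below builds one from simulated values (not the course data) and checks that the bar areas add up to 1. The data frame toy and its parameters are assumptions chosen only for the example; layer_data() is the ggplot2 helper that returns the values computed for a layer.

```r
library(ggplot2)

# Simulated stand-in for matches$dist; the mean and sd are arbitrary
set.seed(1)
toy <- data.frame(dist = rnorm(200, mean = 17, sd = 3))

p <- ggplot(toy, aes(x = dist)) +
  geom_histogram(aes(y = after_stat(density)), bins = 45)

# One row per bar, with the computed density and the bin edges
bars <- layer_data(p, 1)

# Area of each bar is height (density) times width (xmax - xmin)
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
print(all.equal(total_area, 1))
```

Because the y axis shows density rather than counts, the bar heights depend on the bin width, so histograms with different numbers of bins remain comparable in shape.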
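Draw a bar chart with the number of records per day, filling each bar with its own color.
ggplot(matches, aes(x = day)) +
  # One fill color per value of day; the legend is hidden because the x axis already names each day
  geom_bar(aes(fill = day), color = "black") +
  scale_fill_manual(values = c("red", "brown", "green", "cyan", "blue",
                               "purple", "pink")) +
  labs(title = "Distribution of days",
       x = "Day",
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")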
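The bar heights in a geom_bar() chart are row counts per category, and the bar order follows the factor levels (alphabetical for a character column). A small sketch with made-up day labels, which may not match the actual values of matches$day, shows both points:

```r
library(dplyr)

# Hypothetical day labels; the real values in matches$day may differ
toy <- data.frame(day = c("Sat", "Sun", "Sat", "Mon", "Sat", "Sun"))

# geom_bar() draws one bar per value of day, with height equal to n
counts <- toy %>% count(day)
print(counts)

# Turning day into a factor fixes the order of the bars on the axis
toy$day <- factor(toy$day, levels = c("Sat", "Sun", "Mon"))
ordered_counts <- toy %>% count(day)
print(ordered_counts)
```

The unnamed colors given to scale_fill_manual() are assigned in this same level order, so setting the levels explicitly also controls which day gets which color.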
Keep only the records of Arsenal, Liverpool, Manchester City, and Manchester United, then plot xg against dist with one panel and one linear fit per team.
matches_filtrado <- matches[matches$team %in% c("Arsenal",
                                                "Liverpool", "Manchester City",
                                                "Manchester United"), ]
graf_disp <- ggplot(matches_filtrado, aes(x = dist, y = xg, color = team)) +
  geom_point() +
  # Linear regression line with its confidence band, fitted separately for each team
  geom_smooth(method = "lm", se = TRUE) +
  # One panel per team, each with its own axis ranges
  facet_wrap(~team, scales = "free", strip.position = "top") +
  scale_x_continuous(
    name = "Average shot distance",
    breaks = seq(floor(min(matches_filtrado$dist)), ceiling(max(matches_filtrado$dist)), by = 1)
  ) +
  scale_y_continuous(
    name = "Expected goals",
    breaks = seq(floor(min(matches_filtrado$xg)), ceiling(max(matches_filtrado$xg)), by = 1)
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
    panel.grid.major = element_line(color = "gray15"),
    panel.grid.minor = element_line(color = "gray15"),
    panel.background = element_rect(fill = "lightblue1"),
    legend.position = "none",
    strip.placement = "outside",
    axis.text.x = element_text(size = 10)
  ) +
  ggtitle("Scatter plot of shot distance and expected goals")
print(graf_disp)
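Since team is mapped to color and each team has its own panel, geom_smooth(method = "lm") fits a separate regression of xg on dist for every team. The sketch below reproduces that idea with base R on invented data built from exact lines, so the expected slopes are known in advance; none of the numbers come from matches.csv.

```r
# Hypothetical data: each team follows an exact line, so the slopes are known
toy <- data.frame(
  team = rep(c("Arsenal", "Liverpool"), each = 4),
  dist = rep(c(14, 16, 18, 20), times = 2)
)
toy$xg <- ifelse(toy$team == "Arsenal",
                 3.0 - 0.10 * toy$dist,
                 2.5 - 0.05 * toy$dist)

# Fit xg ~ dist separately for each team and keep the slope on dist
slopes <- sapply(split(toy, toy$team),
                 function(d) coef(lm(xg ~ dist, data = d))[["dist"]])
print(round(slopes, 2))
```

On the real data, the same split-and-fit pattern applied to matches_filtrado would give the slope of each line drawn in the panels.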