1. Introduction

This report replicates Case 1 (Descriptive Analytics) requested for the Adidas database. The objectives are:
- Analyze profitability (operating margin) and its relationship with price per unit and volume.
- Evaluate sales efficiency (the relationship between price and units sold).
- Evaluate overall financial performance (contribution to sales and profit).

Reproducibility. Place the file Adidas(1).xlsx in the same folder as this .Rmd and knit to HTML, Word, or PDF.
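As a small guard for the reproducibility step above, the following sketch stops early when the workbook is missing. The helper name `check_input` is hypothetical and not part of the original report.

```r
# Hypothetical helper: fail early with a clear message if the input file is missing.
check_input <- function(path) {
  if (!file.exists(path)) {
    stop("Input file not found: ", path,
         " (working directory: ", getwd(), ")")
  }
  invisible(path)
}

# Usage, assuming the workbook name used in this report:
# check_input("Adidas(1).xlsx")
```

Calling it at the top of the first chunk makes a misplaced file show up as one readable error instead of a failure inside `read_excel()`.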

2. Packages and data loading

# Install any missing packages, then attach them all
needs <- c("tidyverse", "readxl", "janitor", "scales", "glue", "knitr", "kableExtra", "ggrepel")
to_install <- needs[!needs %in% installed.packages()[, "Package"]]
if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
invisible(lapply(needs, library, character.only = TRUE))

# Relative path: the Excel file must sit next to this .Rmd
excel_path <- "Adidas(1).xlsx"

# Read the Excel file (first sheet by default) and normalize column names
raw <- readxl::read_excel(excel_path) %>% janitor::clean_names()

# Show the detected column names
names(raw)

##  [1] "retailer"         "region"           "state"            "city"            
##  [5] "product"          "price_per_unit"   "units_sold"       "total_sales"     
##  [9] "operating_profit" "operating_margin" "sales_method"

2.1. Standardizing variable names

We map the column names commonly found in the Adidas dataset to a standard schema.

# Dictionary: standard name -> possible source names
dict <- list(
  retailer          = c("retailer", "tienda", "comercio"),
  region            = c("region"),
  state             = c("state", "estado"),
  city              = c("city", "ciudad"),
  product           = c("product", "producto", "product_name"),
  price_per_unit    = c("price_per_unit", "precio", "unit_price", "price"),
  units_sold        = c("units_sold", "unidades", "quantity", "qty"),
  total_sales       = c("total_sales", "ventas_totales", "sales"),
  operating_profit  = c("operating_profit", "utilidad_operativa", "profit"),
  operating_margin  = c("operating_margin", "margen_operativo", "margin"),
  sales_method      = c("sales_method", "metodo_venta", "sales_channel")
)

standard_names <- names(dict)

# Helper that returns the first candidate present in the dataset
find_first <- function(candidates, cols) {
  cand <- candidates[candidates %in% cols]
  if (length(cand) == 0) NA_character_ else cand[1]
}

cols <- names(raw)
mapped <- purrr::map_chr(dict, find_first, cols = cols)
mapped

##           retailer             region              state               city 
##         "retailer"           "region"            "state"             "city" 
##            product     price_per_unit         units_sold        total_sales 
##          "product"   "price_per_unit"       "units_sold"      "total_sales" 
##   operating_profit   operating_margin       sales_method 
## "operating_profit" "operating_margin"     "sales_method"
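Because the Adidas file already uses the standard names, the mapping above is an identity. To illustrate what the dictionary lookup does when headers differ, here is a minimal sketch on a toy data frame with Spanish headers. The toy data and the names `toy`, `toy_dict`, `toy_mapped`, `toy_pairs`, and `toy_std` are illustrative assumptions, and `vapply()` stands in for `purrr::map_chr()` to keep dependencies small.

```r
library(dplyr)

# Toy data with Spanish headers (illustrative only, not from the Adidas file)
toy <- data.frame(producto = c("A", "B"), precio = c(10, 20), unidades = c(5, 3))

# Reduced dictionary: standard name -> candidate source names
toy_dict <- list(
  product        = c("product", "producto"),
  price_per_unit = c("price_per_unit", "precio"),
  units_sold     = c("units_sold", "unidades"),
  total_sales    = c("total_sales", "ventas_totales")  # absent from the toy data
)

# Same helper as in the report: first candidate that exists, else NA
find_first <- function(candidates, cols) {
  cand <- candidates[candidates %in% cols]
  if (length(cand) == 0) NA_character_ else cand[1]
}

toy_mapped <- vapply(toy_dict, find_first, character(1), cols = names(toy))
toy_pairs  <- toy_mapped[!is.na(toy_mapped)]  # names = new name, values = old name

# rename() expects new = old, which is exactly the shape of toy_pairs
toy_std <- rename(toy, all_of(toy_pairs))
names(toy_std)
```

Standard names with no matching column (here `total_sales`) come back as `NA` and are simply skipped by the rename.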

# Rename only the columns that were found.
# `mapped` is named by standard name with the source column as value,
# which is the new = old form that dplyr::rename() expects.
rename_pairs <- mapped[!is.na(mapped)]

datos_col <- raw %>% dplyr::rename(dplyr::all_of(rename_pairs))

# Validation: print the standard columns that are present
intersect(names(datos_col), standard_names)

##  [1] "retailer"         "region"           "state"            "city"            
##  [5] "product"          "price_per_unit"   "units_sold"       "total_sales"     
##  [9] "operating_profit" "operating_margin" "sales_method"

Note: This step also fixes the common error “object 'datos_col' not found” by explicitly creating the datos_col object from the data that was read.

3. Initial diagnostics and minimal cleaning

# Total rows and NA count per variable
resumen_na <- datos_col %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "na_count") %>%
  arrange(desc(na_count))

n_filas <- nrow(datos_col)

knitr::kable(head(resumen_na, 20), caption = glue("NA counts by variable (top 20), n = {n_filas}")) %>%
  kableExtra::kable_classic(full_width = FALSE)

NA counts by variable (top 20), n = 9648
variable	na_count
retailer	0
region	0
state	0
city	0
product	0
price_per_unit	0
units_sold	0
total_sales	0
operating_profit	0
operating_margin	0
sales_method	0

# Coerce to numeric where applicable
coerce_numeric <- c("price_per_unit", "units_sold", "total_sales", "operating_profit", "operating_margin")
for (v in intersect(coerce_numeric, names(datos_col))) {
  datos_col[[v]] <- suppressWarnings(as.numeric(datos_col[[v]]))
}

# Drop rows with a missing or negative price or unit count, if any
datos_col <- datos_col %>%
  filter(!is.na(price_per_unit), !is.na(units_sold),
         price_per_unit >= 0, units_sold >= 0)
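The coercion loop above suppresses warnings, so values that fail to parse turn into `NA` silently. A minimal sketch of how one might count those cases before overwriting a column; the helper name `count_coercion_na` and the sample vector are hypothetical.

```r
# Count values that become NA only because of numeric coercion
# (entries that were already NA are not counted).
count_coercion_na <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  sum(is.na(converted) & !is.na(x))
}

# Illustrative vector: "$45" and "n/a" cannot be parsed as numbers
prices_chr <- c("45", "$45", "n/a", NA, "50.5")
count_coercion_na(prices_chr)
```

A non-zero count suggests cleaning the text first (for example, stripping currency symbols) rather than letting the later filter drop those rows.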

4. Descriptive statistics

descriptive_vars <- intersect(c("price_per_unit", "units_sold", "total_sales", "operating_profit", "operating_margin"), names(datos_col))

stats_tbl <- datos_col %>%
  summarise(across(all_of(descriptive_vars), list(
    n = ~sum(!is.na(.)),
    mean = ~mean(., na.rm = TRUE),
    median = ~median(., na.rm = TRUE),
    sd = ~sd(., na.rm = TRUE),
    p25 = ~quantile(., 0.25, na.rm = TRUE),
    p75 = ~quantile(., 0.75, na.rm = TRUE)
  ), .names = "{.col}__{.fn}")) %>%
  pivot_longer(everything(), names_to = c("variable", ".value"), names_sep = "__")

knitr::kable(stats_tbl, digits = 2, caption = "Descriptive statistics (n, mean, median, sd, p25, p75)") %>%
  kableExtra::kable_classic(full_width = FALSE)

Descriptive statistics (n, mean, median, sd, p25, p75)
variable	n	mean	median	sd	p25	p75
price_per_unit	9648	45.22	45.00	14.71	35.00	55.00
units_sold	9648	256.93	176.00	214.25	106.00	350.00
total_sales	9648	12455.08	7803.50	12716.39	4065.25	15864.50
operating_profit	9648	34425.24	4371.42	54193.11	1921.75	52062.50
operating_margin	9648	0.42	0.41	0.10	0.35	0.49
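In the table above, the mean of operating_profit (34425.24) is larger than the mean of total_sales (12455.08) even though the mean margin is 0.42, which may point to a units or parsing issue in one of the columns. A minimal sketch of a row-level check follows, assuming that operating_profit should be roughly total_sales × operating_margin; that relationship is an assumption about the dataset, not something this report states. The toy data, the name `inconsistent`, and the 1% tolerance are illustrative.

```r
# Toy rows (illustrative only); the third row is deliberately inconsistent
toy <- data.frame(
  total_sales      = c(1000, 2000, 500),
  operating_margin = c(0.40, 0.35, 0.50),
  operating_profit = c(400, 700, 2500)
)

# Profit implied by sales and margin, and relative gap to the reported profit
expected <- toy$total_sales * toy$operating_margin
rel_diff <- abs(toy$operating_profit - expected) / pmax(abs(expected), 1e-9)

# Flag rows whose reported profit deviates by more than 1% (assumed tolerance)
toy$inconsistent <- rel_diff > 0.01
which(toy$inconsistent)
```

Running the same comparison on `datos_col` would show whether the gap comes from a few rows or from a systematic scaling difference.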

5. Key visualizations

5.1. Distributions (histogram and boxplot)

vars_plot <- intersect(c("operating_margin", "price_per_unit", "units_sold"), names(datos_col))

for (v in vars_plot) {
  # Histogram of the variable (explicit print() is required inside a loop)
  p_hist <- ggplot(datos_col, aes(x = .data[[v]])) +
    geom_histogram(bins = 30) +
    labs(title = glue("Distribution of {v}"), x = v, y = "Frequency")
  print(p_hist)

  # Boxplot of the same variable
  p_box <- ggplot(datos_col, aes(y = .data[[v]])) +
    geom_boxplot(outlier.alpha = 0.4) +
    labs(title = glue("Boxplot of {v}"), y = v, x = NULL)
  print(p_box)
}

5.2. Price vs. units sold (sales efficiency)

if (all(c("price_per_unit", "units_sold") %in% names(datos_col))) {
  ggplot(datos_col, aes(x = price_per_unit, y = units_sold)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "loess", se = TRUE) +
    labs(title = "Price per Unit vs. Units Sold",
         x = "Price per unit",
         y = "Units sold")
}
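The scatter plot shows the shape of the price and volume relationship but not its size. One common way to summarize it is a log-log regression, whose slope can be read as a price elasticity. The sketch below uses simulated data with a known elasticity of -1.2; nothing here is estimated from the Adidas file, and the variable names are illustrative.

```r
set.seed(1)
n <- 200
price <- runif(n, 20, 80)

# Simulated demand with a true elasticity of -1.2 (illustrative, not Adidas data)
units <- exp(8 - 1.2 * log(price) + rnorm(n, sd = 0.2))

# Slope of log(units) on log(price) estimates the elasticity
fit <- lm(log(units) ~ log(price))
elasticity <- unname(coef(fit)["log(price)"])
elasticity
```

On real data, rows with zero price or zero units would need to be excluded before taking logs, and the slope describes an association across transactions rather than a causal effect.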

5.3. Margin vs. price and volume (profitability)

if (all(c("operating_margin", "price_per_unit") %in% names(datos_col))) {
  ggplot(datos_col, aes(x = price_per_unit, y = operating_margin)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm", se = TRUE) +
    labs(title = "Operating Margin vs. Price per Unit",
         x = "Price per unit",
         y = "Operating margin")
}

if (all(c("operating_margin", "units_sold") %in% names(datos_col))) {
  ggplot(datos_col, aes(x = units_sold, y = operating_margin)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm", se = TRUE) +
    labs(title = "Operating Margin vs. Units Sold",
         x = "Units sold",
         y = "Operating margin")
}

6. Performance by product and contributions

group_key <- intersect(c("product"), names(datos_col))
if (length(group_key) == 1) {
  contrib_prod <- datos_col %>%
    group_by(.data[[group_key]]) %>%
    summarise(
      n = n(),
      units = sum(units_sold, na.rm = TRUE),
      sales = sum(total_sales, na.rm = TRUE),
      op_profit = sum(operating_profit, na.rm = TRUE),
      avg_price = mean(price_per_unit, na.rm = TRUE),
      avg_margin = mean(operating_margin, na.rm = TRUE)
    ) %>%
    ungroup() %>%
    arrange(desc(op_profit)) %>%
    mutate(
      sales_share = sales / sum(sales, na.rm = TRUE),
      profit_share = op_profit / sum(op_profit, na.rm = TRUE),
      cum_sales_share = cumsum(replace_na(sales_share, 0)),
      cum_profit_share = cumsum(replace_na(profit_share, 0))
    )

  # Inside a block only the last value is auto-printed, so print explicitly.
  # Tables printed this way may need the chunk option results = 'asis' to render.
  print(knitr::kable(head(contrib_prod, 20), digits = 2,
                     caption = "Top 20 products by operating profit") %>%
    kableExtra::kable_classic(full_width = FALSE))

  # Bar chart: top products by operating profit
  n_top <- 15  # renamed from top_n to avoid shadowing dplyr::top_n()
  print(ggplot(slice_head(contrib_prod, n = n_top),
         aes(x = reorder(!!sym(group_key), op_profit), y = op_profit)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(labels = scales::dollar_format(prefix = "$", big.mark = ",")) +
    labs(title = glue("Top {n_top} products by operating profit"),
         x = "Product", y = "Operating profit"))

  # Pareto curve of sales: rank products by sales before accumulating shares
  pareto_sales <- contrib_prod %>%
    arrange(desc(sales)) %>%
    mutate(rank = row_number(),
           cum_sales_share = cumsum(replace_na(sales_share, 0)))

  ggplot(pareto_sales, aes(x = rank, y = cum_sales_share)) +
    geom_line() +
    geom_point(size = 1) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    labs(title = "Pareto curve: cumulative share of sales",
         x = "Product rank", y = "Cumulative sales (%)")
}
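To read the Pareto curve as a single number, one can count how many products are needed to reach a given cumulative share of sales. A minimal sketch in base R; the sales figures and the 80% threshold are illustrative assumptions, not results from the Adidas data.

```r
# Illustrative sales by product (not Adidas data)
sales <- c(A = 500, B = 350, C = 100, D = 30, E = 20)

# Sort from largest to smallest, then accumulate shares
sorted    <- sort(sales, decreasing = TRUE)
cum_share <- cumsum(sorted) / sum(sorted)

# Smallest number of products whose cumulative share reaches 80%
n_for_80 <- unname(which(cum_share >= 0.80)[1])
n_for_80
```

The same three lines applied to `contrib_prod$sales` would tell whether the catalogue is concentrated in a few products or spread evenly.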

7. Useful segmentations (optional)

# Run only for the segmentation variables that exist (region/state/city/sales_method)
seg_vars <- intersect(c("region", "state", "city", "sales_method"), names(datos_col))
if (length(seg_vars) > 0) {
  for (s in seg_vars) {
    if (all(c("total_sales", "operating_profit") %in% names(datos_col))) {
      seg <- datos_col %>% group_by(.data[[s]]) %>%
        summarise(sales = sum(total_sales, na.rm = TRUE),
                  op_profit = sum(operating_profit, na.rm = TRUE),
                  avg_margin = mean(operating_margin, na.rm = TRUE),
                  .groups = "drop") %>%
        arrange(desc(sales))

      # Nothing is auto-printed inside a loop, so print explicitly.
      # Tables printed this way may need the chunk option results = 'asis' to render.
      print(knitr::kable(head(seg, 10), digits = 2,
                         caption = glue("Top segments by {s}: sales and profitability")) %>%
        kableExtra::kable_classic(full_width = FALSE))

      print(ggplot(seg, aes(x = reorder(.data[[s]], sales), y = sales)) +
        geom_col() +
        coord_flip() +
        scale_y_continuous(labels = scales::dollar_format(prefix = "$", big.mark = ",")) +
        labs(title = glue("Sales by {s}"), x = s, y = "Sales"))
    }
  }
}

8. Profit by store type (sales channel)

# Analysis by sales channel (In-Store, Outlet, Online)

if ("sales_method" %in% names(datos_col)) {
  canal <- datos_col %>%
    group_by(sales_method) %>%
    summarise(
      ventas_totales = sum(total_sales, na.rm = TRUE),
      utilidad_total = sum(operating_profit, na.rm = TRUE),
      margen_promedio = mean(operating_margin, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(desc(utilidad_total))

  # Summary table (printed explicitly: inside a block only the last value is auto-printed;
  # the chunk may need results = 'asis' for the table to render)
  print(knitr::kable(canal, digits = 2,
                     caption = "Profit and sales by sales channel (In-Store, Outlet, Online)") %>%
    kableExtra::kable_classic(full_width = FALSE))

  # Comparison chart
  ggplot(canal, aes(x = reorder(sales_method, utilidad_total), y = utilidad_total, fill = sales_method)) +
    geom_col(show.legend = FALSE) +
    coord_flip() +
    scale_y_continuous(labels = scales::dollar_format(prefix = "$", big.mark = ",")) +
    labs(title = "Operating Profit by Sales Channel",
         x = "Sales channel",
         y = "Operating profit")
}
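Note that `margen_promedio` above is a simple mean of row-level margins, so a small transaction counts as much as a large one. A sales-weighted margin (total profit divided by total sales) can differ noticeably. The sketch below shows the gap on toy data; the data and the column name `margen_ponderado` are illustrative assumptions.

```r
library(dplyr)

# Toy transactions (illustrative only): one tiny high-margin sale, one large low-margin sale
toy <- data.frame(
  sales_method     = c("Online", "Online", "Outlet"),
  total_sales      = c(100, 10000, 5000),
  operating_profit = c(60, 3000, 2000)
)
toy$operating_margin <- toy$operating_profit / toy$total_sales

margins <- toy %>%
  group_by(sales_method) %>%
  summarise(
    margen_promedio  = mean(operating_margin),                    # simple mean of row margins
    margen_ponderado = sum(operating_profit) / sum(total_sales),  # sales-weighted margin
    .groups = "drop"
  )
margins
```

For the Online rows the simple mean is 0.45 while the weighted margin is about 0.30, which is why the two figures should not be used interchangeably when ranking channels.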


9. Findings and recommendations (example)

Typical findings to verify against your data:
- A negative relationship between price per unit and units sold (demand elasticity).
- Products/segments with high margin but low volume: candidates to promote with campaigns.
- Products with high volume and low margin: evaluate price adjustments or cost reductions.
- Pareto-type concentration (80/20), where a few products account for most of the sales/profit.

Indicative recommendations (adjust according to the results):
1. Price optimization in lines with high elasticity: A/B price tests.
2. Portfolio mix: prioritize promoting the products that contribute most to profit (not only to sales).
3. Commercial efficiency: reallocate effort to the most profitable channels/segments according to sales_method/region.
4. Cost control on high-volume products where the margin is thin.

Next step: If you prefer, change output: to your preferred format, then press Knit. If you get errors, check that the file Adidas(1).xlsx is in the same folder and that the column names are mapped in section 2.1 (variable name standardization).

Adidas Financial Analysis: Descriptive Analytics Case

CAMILO MARTINEZ ORTEGA

2025-08-22