1. Data Manipulation

1.1 Load the dataset (iris)

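# Assumes dplyr is installed; it provides as_tibble(), glimpse() and %>%
library(dplyr)

data("iris")
df <- as_tibble(iris)  # convert the built-in data frame to a tibble
head(df, 8)            # first 8 rows
glimpse(df)            # compact column-wise overview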
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
summary(df)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

1.2 Simple Manipulation

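# Add the sepal length-to-width ratio as a new column
df2 <- df %>% mutate(Sepal.Ratio = Sepal.Length / Sepal.Width)
# Keep flowers with Sepal.Length above 5.5, then sort by ratio (largest first)
df_filtered <- df2 %>% filter(Sepal.Length > 5.5)
df_sorted <- df_filtered %>% arrange(desc(Sepal.Ratio))
# Per-species count, mean and standard deviation of sepal length, and mean ratio
summary_by_species <- df2 %>%
  group_by(Species) %>%
  summarise(
    n = n(),
    mean_sepal_length = mean(Sepal.Length),
    sd_sepal_length = sd(Sepal.Length),
    mean_sepal_ratio = mean(Sepal.Ratio)
  )
# Reorder columns so that Sepal.Ratio sits next to the measurements, before Species
df_preview <- df_sorted %>%
  select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Sepal.Ratio, Species)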

cat("Original dimensions:", dim(df), "\n")
## Original dimensions: 150 5
cat("Dimensions after filter (Sepal.Length > 5.5):", dim(df_filtered), "\n\n")
## Dimensions after filter (Sepal.Length > 5.5): 91 6
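knitr::kable(summary_by_species, digits = 3, caption = "Summary by species")

Summary by species

| Species    |  n | mean_sepal_length | sd_sepal_length | mean_sepal_ratio |
|:-----------|---:|------------------:|----------------:|-----------------:|
| setosa     | 50 |             5.006 |           0.352 |             1.47 |
| versicolor | 50 |             5.936 |           0.516 |             2.16 |
| virginica  | 50 |             6.588 |           0.636 |             2.23 |
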
head(df_preview, 10)
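
The grouped summary above names each statistic by hand. As a minimal sketch of a more compact variant, assuming dplyr 1.0 or later, `across()` can apply the same function to every numeric column at once. The name `means_by_species` is illustrative and does not appear in the original report.

```r
library(dplyr)

# Mean of every numeric column within each species.
# `means_by_species` is a hypothetical name used only for this sketch.
means_by_species <- as_tibble(iris) %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

means_by_species
```

The result has one row per species and one column per measurement, so it can be passed to `knitr::kable()` in the same way as `summary_by_species`.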

2. Interactive Table

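# Assumes the DT package is installed; it provides datatable()
library(DT)

datatable(
  df2,
  caption = 'Interactive table: iris (with Sepal.Ratio)',
  style = 'bootstrap4',
  class = 'cell-border stripe hover',
  options = list(
    pageLength = 10,                 # rows shown per page
    lengthMenu = c(5, 10, 25, 50),   # page-size choices offered to the reader
    autoWidth = TRUE,
    scrollX = TRUE                   # horizontal scrolling for wide tables
  )
)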
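
Because `Sepal.Ratio` is a computed value, it is displayed with many decimal places by default. One possible refinement, sketched here under the assumption that three decimals are enough for display, is to pipe the widget through `formatRound()` from DT. The object name `tbl` is illustrative.

```r
library(dplyr)
library(DT)

# Recreate the table data so the example is self-contained
df2 <- as_tibble(iris) %>%
  mutate(Sepal.Ratio = Sepal.Length / Sepal.Width)

# formatRound() changes only how the column is displayed;
# the underlying values in df2 are left untouched
tbl <- datatable(df2, options = list(pageLength = 10, scrollX = TRUE)) %>%
  formatRound(columns = "Sepal.Ratio", digits = 3)

tbl
```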

3. Mathematical Equations

3.1 SVD

\[ \mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\top \]
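
As a quick numerical check of this factorization, the sketch below applies base R's `svd()` to the four numeric columns of iris and rebuilds the matrix from the three factors. Using iris as \(\mathbf{X}\) is a choice made for this example only. Note that `svd()` returns the thin decomposition, so \(\mathbf{U}\) is 150 x 4 here.

```r
# Numeric part of iris as a 150 x 4 matrix
X <- as.matrix(iris[, 1:4])

# s$u is 150 x 4, s$d holds the singular values, s$v is 4 x 4
s <- svd(X)

# Rebuild X from the three factors: U %*% diag(d) %*% t(V)
X_hat <- s$u %*% diag(s$d) %*% t(s$v)

# Largest absolute reconstruction error; should be close to machine precision
max(abs(X - X_hat))
```

The singular values in `s$d` are returned in decreasing order, which is what makes truncating the decomposition useful for dimensionality reduction.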

3.2 Logistic log-likelihood

\[ \ell(\beta) = \sum_{i=1}^n [y_i \log \sigma(\mathbf{x}_i^\top \beta) + (1-y_i)\log(1-\sigma(\mathbf{x}_i^\top \beta))] \]
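
The formula above can be evaluated directly and compared with what `glm()` reports. The sketch below is illustrative only: the binary target (virginica versus the other species) and the single predictor `Sepal.Length` are assumptions made for this example, and `sigmoid` and `log_lik` are hypothetical helper names.

```r
# Binary target (an illustrative choice): is the flower virginica?
y <- as.numeric(iris$Species == "virginica")
X <- cbind(1, iris$Sepal.Length)   # intercept column plus one predictor

sigmoid <- function(z) 1 / (1 + exp(-z))

# Log-likelihood exactly as written in the formula
log_lik <- function(beta, X, y) {
  p <- sigmoid(drop(X %*% beta))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Compare with the value glm() reports at its fitted coefficients
fit <- glm(y ~ iris$Sepal.Length, family = binomial)
c(manual = log_lik(coef(fit), X, y), glm = as.numeric(logLik(fit)))
```

The two numbers should agree up to floating-point error, since `glm()` with `family = binomial` maximizes this same quantity.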

3.3 SVM

\[ \min_{\mathbf{w}, b, \xi} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i \]
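
The objective above is normally stated together with constraints that the source does not show. In the standard soft-margin formulation, with labels \(y_i \in \{-1, +1\}\), they are:

\[ y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n \]

Under that assumption, for fixed \(\mathbf{w}\) and \(b\) the smallest feasible slack is the hinge loss \(\xi_i = \max(0, 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b))\). The sketch below evaluates the objective on made-up toy data; `svm_objective` is a hypothetical helper, and it does not solve the optimization problem.

```r
# Evaluate the soft-margin objective for a given (w, b).
# For fixed w and b, the smallest feasible slack is the hinge loss:
#   xi_i = max(0, 1 - y_i * (w' x_i + b))
svm_objective <- function(w, b, X, y, C = 1) {
  margins <- y * (drop(X %*% w) + b)
  xi <- pmax(0, 1 - margins)
  0.5 * sum(w^2) + C * sum(xi)
}

# Toy data (made up for illustration): two points per class, labels in {-1, +1}
X <- rbind(c(2, 2), c(3, 3), c(0, 0), c(1, 0))
y <- c(1, 1, -1, -1)

# All four points satisfy the margin here, so only the 0.5 * ||w||^2 term remains
svm_objective(w = c(1, 1), b = -3, X = X, y = y, C = 1)
```

Larger values of `C` penalize margin violations more heavily, trading a wider margin for fewer misclassified training points.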

3.4 Bayes' Theorem

\[ p(\theta | \mathcal{D}) = \frac{p(\mathcal{D} | \theta)p(\theta)}{p(\mathcal{D})} \]
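
A grid approximation makes each term of the theorem concrete. The sketch below assumes a binomial model with a flat prior and made-up data (7 successes in 10 trials); none of these choices come from the original report. Dividing by the sum over the grid plays the role of the evidence \(p(\mathcal{D})\), so `posterior` is a discrete probability mass over the grid points.

```r
# Grid approximation of p(theta | D) for a coin-flip model.
# Data (made up for illustration): 7 successes in 10 trials.
successes <- 7
trials <- 10

theta <- seq(0, 1, length.out = 1001)            # candidate parameter values
prior <- rep(1, length(theta))                   # flat prior p(theta)
likelihood <- dbinom(successes, trials, theta)   # p(D | theta)

# Numerator of Bayes' theorem, then normalize by the (approximate) evidence
unnormalized <- likelihood * prior
posterior <- unnormalized / sum(unnormalized)

# Posterior mode; under a flat prior it coincides with the maximum-likelihood estimate
theta[which.max(posterior)]
```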

3.5 Cross-Entropy

\[ \mathcal{L} = -\frac{1}{n}\sum_{i=1}^n \sum_{k=1}^K y_{i,k} \log \hat{p}_{i,k} \]
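
A minimal sketch of this loss in R, assuming one-hot labels in an n x K matrix and predicted probabilities in a matrix of the same shape. The helper `cross_entropy` and the clipping constant `eps` are additions for this example; the clipping guards against `log(0)` and is not part of the formula.

```r
# Mean cross-entropy between one-hot labels Y (n x K) and predicted
# probabilities P (n x K). `eps` guards against log(0); it is an added
# safeguard, not part of the formula.
cross_entropy <- function(Y, P, eps = 1e-15) {
  P <- pmin(pmax(P, eps), 1 - eps)
  -mean(rowSums(Y * log(P)))
}

# One-hot encode the three iris species (n = 150, K = 3)
Y <- model.matrix(~ Species - 1, data = iris)

# Baseline: a "classifier" that always predicts 1/3 for every class
P_uniform <- matrix(1 / 3, nrow = nrow(Y), ncol = ncol(Y))

# For a uniform prediction over K classes the loss equals log(K), here log(3)
cross_entropy(Y, P_uniform)
```

The uniform baseline is a useful reference point: a trained classifier should achieve a loss below `log(K)`.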

4. Figures Related to Data Science

Figure 1: Relationship between Sepal.Length and Petal.Length

Figure 2: Distribution of Petal Width by Species
