Import libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)

Import data

data(iris)
write.csv(iris, "iris.csv", row.names = FALSE)

Question 1:

1) Compute the maximum and minimum of the Petal.Width column for each Species group

petal_stats <- iris %>%
  group_by(Species) %>%
  summarise(
    PetalWidth_max = max(Petal.Width, na.rm = TRUE),
    PetalWidth_min = min(Petal.Width, na.rm = TRUE)
  )

print(petal_stats)

## # A tibble: 3 × 3
##   Species    PetalWidth_max PetalWidth_min
##   <fct>               <dbl>          <dbl>
## 1 setosa                0.6            0.1
## 2 versicolor            1.8            1  
## 3 virginica             2.5            1.4
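As a cross-check on what group_by() followed by summarise() computes, the same grouped max/min can be sketched in plain Python. The `sample` values and the helper name `petal_width_range` are assumptions made for illustration; the sample is not the full 150-row iris dataset, so its results differ from the table above.

```python
from collections import defaultdict

# Illustrative (Species, Petal.Width) pairs; assumed sample values,
# not the full 150-row iris dataset.
sample = [
    ("setosa", 0.2), ("setosa", 0.4), ("setosa", 0.3),
    ("versicolor", 1.4), ("versicolor", 1.5), ("versicolor", 1.3),
    ("virginica", 2.5), ("virginica", 1.9), ("virginica", 2.1),
]


def petal_width_range(pairs):
    """Return {species: (max, min)} for the given (species, width) pairs."""
    groups = defaultdict(list)
    for species, width in pairs:
        # Collect every width under its species, like group_by(Species)
        groups[species].append(width)
    # Reduce each group to one (max, min) pair, like summarise()
    return {
        species: (max(widths), min(widths))
        for species, widths in groups.items()
    }


petal_range = petal_width_range(sample)
for species, (widest, narrowest) in petal_range.items():
    print(species, widest, narrowest)
```

Each species collapses to a single row, which is why the tibble above has three rows for three groups.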

2) Draw a line chart of the mean Sepal.Width for each Species group using ggplot2

sepal_means <- iris %>%
  group_by(Species) %>%
  summarise(mean_Sepal_Width = mean(Sepal.Width, na.rm = TRUE))

ggplot(sepal_means, aes(x = Species, y = mean_Sepal_Width, group = 1)) +
  geom_line() +
  geom_point(size = 3) +
  labs(
    title = "Mean Sepal.Width by Species",
    x = "Species",
    y = "Mean Sepal.Width"
  ) +
  theme_minimal()

Question 2:

Use pandas to:

Read the file iris.csv.
Group the data by Species and compute the mean of every column (except Species).
Save the result to the file mean_by_species.csv.

import pandas as pd

# 1) Read the file iris.csv
df = pd.read_csv("iris.csv")

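# 2) Group by Species and compute the mean of every column (except Species)
# Note: Species is the grouping key, so it is never averaged. numeric_only=True
# additionally skips any other non-numeric column, which pandas 2.x would
# otherwise reject with a TypeError instead of dropping silently.
mean_by_species = df.groupby("Species").mean(numeric_only=True)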

# Reset the index so that Species becomes a regular column:
mean_by_species = mean_by_species.reset_index()

print(mean_by_species)

##       Species  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
## 0      setosa         5.006        3.428         1.462        0.246
## 1  versicolor         5.936        2.770         4.260        1.326
## 2   virginica         6.588        2.974         5.552        2.026


# 3) Save the result to the file mean_by_species.csv
mean_by_species.to_csv("mean_by_species.csv", index=False)
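The read, group, average, and save steps can also be sketched without pandas, which may help when checking the result by hand. This is a minimal sketch using only the Python standard library; `SAMPLE_CSV` is an assumed two-column excerpt standing in for iris.csv, and `mean_by_group` and `to_csv_text` are hypothetical helper names, not part of the solution above.

```python
import csv
import io
from collections import defaultdict
from statistics import fmean

# Assumed stand-in for iris.csv: a few rows, two numeric columns only.
SAMPLE_CSV = """Species,Sepal.Length,Sepal.Width
setosa,5.0,3.5
setosa,4.6,3.1
versicolor,7.0,3.2
versicolor,6.4,3.2
"""


def mean_by_group(csv_text, key="Species"):
    """Average every column except `key`, one record per group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    value_cols = [col for col in reader.fieldnames if col != key]
    groups = defaultdict(lambda: defaultdict(list))
    for row in reader:
        for col in value_cols:
            groups[row[key]][col].append(float(row[col]))
    # Sorted group names mirror the default key ordering of pandas groupby
    return [
        {key: name, **{col: fmean(groups[name][col]) for col in value_cols}}
        for name in sorted(groups)
    ]


def to_csv_text(records):
    """Serialise the records as CSV text with a header and no index column."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


means = mean_by_group(SAMPLE_CSV)
csv_out = to_csv_text(means)
print(csv_out)
```

Writing to an in-memory buffer keeps the sketch free of side effects; replacing `io.StringIO()` with an opened file would produce an actual CSV file on disk.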

Test 1

NÔNG NGỌC YÊN

2026-01-08
