BaiKTKN2

Question 1 (5 points):

Generate 600 random values from a binomial distribution with n = 20, p = 0.5.
Plot a histogram of the data and compare it with the approximating normal distribution.

set.seed(123)
data <- rbinom(n = 600, size = 20, prob = 0.5)

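# Density-scaled histogram (bar areas sum to 1)
hist(data,
     probability = TRUE,
     breaks = seq(-0.5, 20.5, 1),  # unit-width bins centered on each integer 0..20
     col = "lightblue",
     main = "Histogram of the binomial sample with normal approximation",
     xlab = "Value")

# Normal parameters estimated from the sample
# (theory: mean = n*p = 10, sd = sqrt(n*p*(1-p)) = sqrt(5), about 2.236)
mu <- mean(data)
sigma <- sd(data)

# Overlay the normal density curve
x <- seq(0, 20, length.out = 200)
lines(x, dnorm(x, mean = mu, sd = sigma), col = "red", lwd = 2)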
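As a cross-check on the plot, the exact Binomial(20, 0.5) probabilities can be compared with the normal density using the theoretical parameters, mean np = 10 and variance np(1 - p) = 5, instead of the sample estimates. With unit-width bins and probability = TRUE, each bar height estimates P(X = k), so the pmf is what the red curve should track. This is a minimal sketch in Python (standard library only) rather than R; the helper names `binom_pmf` and `normal_pdf` are illustrative.

```python
import math

# Parameters from Question 1: Binomial(n = 20, p = 0.5)
N, P = 20, 0.5

def binom_pmf(k: int, n: int = N, p: float = P) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Theoretical moments of the binomial: mean = n*p, variance = n*p*(1-p)
mu = N * P                          # 10.0
sigma = math.sqrt(N * P * (1 - P))  # sqrt(5), about 2.236

# Compare P(X = k) with the normal density at k for every support point
rows = [(k, binom_pmf(k), normal_pdf(k, mu, sigma)) for k in range(N + 1)]
max_gap = max(abs(b - f) for _, b, f in rows)

for k, b, f in rows:
    print(f"{k:2d}  binom={b:.4f}  normal={f:.4f}")
print(f"largest absolute gap: {max_gap:.4f}")
```

The gap is largest near the center, where the discrete pmf is slightly lower than the normal peak; for n = 20 and p = 0.5 it stays well below 0.01.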

Question 2 (5 points):

Read the Iris dataset with the Pandas library.
Compute the mean and standard deviation of each feature for each species.
Draw a boxplot of sepal width (Sepal.Width).

import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset (bundled with scikit-learn)
iris = load_iris()

# Convert to a Pandas DataFrame
df = pd.DataFrame(
    iris.data,
    columns=iris.feature_names
)

# Add the species column: map integer target codes to class names
df["species"] = iris.target
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

df.head()

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa
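The species column is built in two steps: `iris.target` holds integer codes 0, 1, 2, and `dict(enumerate(iris.target_names))` turns the name array into a code-to-name lookup that `Series.map` applies row by row. A plain-Python sketch of the same lookup follows; the list `codes` is a short hypothetical stand-in for `iris.target`.

```python
# Class names in the order used by load_iris() (target codes 0, 1, 2)
target_names = ["setosa", "versicolor", "virginica"]

# dict(enumerate(...)) builds {0: "setosa", 1: "versicolor", 2: "virginica"}
code_to_name = dict(enumerate(target_names))

# Hypothetical sample of integer target codes, as stored in iris.target
codes = [0, 0, 1, 2, 1]

# Same effect as df["species"].map(code_to_name), applied to a plain list
species = [code_to_name[c] for c in codes]
print(code_to_name)
print(species)
```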

stats = df.groupby("species").agg(["mean", "std"])
stats

species	sepal length (cm) mean	sepal length (cm) std	sepal width (cm) mean	sepal width (cm) std	petal length (cm) mean	petal length (cm) std	petal width (cm) mean	petal width (cm) std
setosa	5.006	0.352490	3.428	0.379064	1.462	0.173664	0.246	0.105386
versicolor	5.936	0.516171	2.770	0.313798	4.260	0.469911	1.326	0.197753
virginica	6.588	0.635880	2.974	0.322497	5.552	0.551895	2.026	0.274650
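In this table, std is the sample standard deviation: pandas' `.std()` divides by n - 1 (ddof=1) by default, whereas NumPy's `np.std` divides by n (ddof=0). The standard-library sketch below repeats the group-by computation on the five rows printed by `df.head()` (all setosa) to make the difference visible; `statistics.stdev` corresponds to ddof=1 and `statistics.pstdev` to ddof=0. It illustrates the calculation and does not replace the pandas call.

```python
from collections import defaultdict
from statistics import mean, pstdev, stdev

# The five rows printed by df.head(): (species, sepal length in cm)
rows = [
    ("setosa", 5.1),
    ("setosa", 4.9),
    ("setosa", 4.7),
    ("setosa", 4.6),
    ("setosa", 5.0),
]

# Group values by species, like df.groupby("species")
groups: dict[str, list[float]] = defaultdict(list)
for species, value in rows:
    groups[species].append(value)

# Per group: mean, sample std (ddof=1, pandas' default), population std (ddof=0)
summary = {
    name: (mean(vals), stdev(vals), pstdev(vals))
    for name, vals in groups.items()
}
for name, (m, s_sample, s_pop) in summary.items():
    print(f"{name}: mean={m:.3f}  std(ddof=1)={s_sample:.4f}  std(ddof=0)={s_pop:.4f}")
```

If population standard deviations are wanted in pandas, `df.groupby("species").std(ddof=0)` gives them.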

import matplotlib.pyplot as plt

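# Create the axes first and pass them in: with by=..., pandas would otherwise
# open its own figure and leave the plt.figure() window empty
fig, ax = plt.subplots()
df.boxplot(column="sepal width (cm)", by="species", ax=ax)
ax.set_title("Boxplot of Sepal.Width by species")
fig.suptitle("")  # remove the automatic "Boxplot grouped by species" title
ax.set_xlabel("Species")
ax.set_ylabel("Sepal.Width (cm)")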
plt.show()
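To read the plot: each box spans the first to third quartile with a line at the median. Under matplotlib's default `whis=1.5`, which pandas' boxplot passes through, the whiskers reach the most extreme observations within 1.5 × IQR of the box, and points beyond that are drawn individually as outliers. The sketch below computes these quantities with the standard library on the five sepal widths shown by `df.head()`. `box_stats` is a hypothetical helper, and it assumes that the `"inclusive"` quantile method matches the linear interpolation matplotlib uses by default.

```python
from statistics import quantiles

def box_stats(values: list[float], whis: float = 1.5) -> dict:
    """Quantities a box plot draws: quartiles, whisker ends and outliers.

    Quartiles use linear interpolation (the 'inclusive' method); whiskers
    reach the most extreme points within whis * IQR of the box.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Sepal widths of the five rows printed by df.head()
widths = [3.5, 3.0, 3.2, 3.1, 3.6]
print(box_stats(widths))
```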


Sung Thi Tho

2026-02-05
