Task 2: Draw a histogram of the distribution

library(ggplot2)
# Overall histogram of log income. The fill is a constant, so no fill mapping is needed here.
p <- ggplot(data = df, aes(x = log(income))) +
     geom_histogram(fill = "darkgreen", col = "white") +
     labs(x = "Income (log scale)", y = "Frequency", title = "Income distribution")
print(p)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 241 rows containing non-finite outside the scale range
## (`stat_bin()`).
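
The two messages above have separate causes: no bin width was given, so `stat_bin()` fell back to 30 bins, and some rows had a log income that is not finite (for example `log(0)` is `-Inf`). The sketch below shows one way to count such rows and to set the bin width explicitly. It uses a made-up data frame `toy` rather than the real `df`, and the bin width of 0.25 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for df; the real CHNS data will differ.
set.seed(1)
toy <- data.frame(income = c(rlnorm(200, meanlog = 9, sdlog = 1), 0, 0, NA, -50))

# log(0) is -Inf, log(NA) is NA, and the log of a negative number is NaN.
log_income <- suppressWarnings(log(toy$income))
n_dropped <- sum(!is.finite(log_income))
n_dropped  # rows that ggplot2 would remove before binning

# Keep only rows with a finite log income, then pick the bin width by hand.
toy_ok <- toy[is.finite(log_income), , drop = FALSE]
p_toy <- ggplot(toy_ok, aes(x = log(income))) +
  geom_histogram(binwidth = 0.25, fill = "darkgreen", col = "white") +
  labs(x = "Income (log scale)", y = "Frequency")
print(p_toy)
```

Filtering first makes the number of excluded rows explicit instead of leaving it to a warning.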

Draw the distribution of income (income) by gender (gender)

p <- ggplot(data = df, aes(x = log(income), fill = factor(gender))) +
     geom_histogram(col = "white") +
     labs(x = "Income (log scale)", y = "Frequency", title = "Income distribution")
print(p)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 241 rows containing non-finite outside the scale range
## (`stat_bin()`).

p <- ggplot(data = df, aes(x = log(income), fill = factor(gender))) +
     geom_histogram(col = "white", position = "stack") +
     labs(x = "Income (log scale)", y = "Frequency", title = "Income distribution") +
     scale_fill_manual(
         values = c("1" = "red", "2" = "purple"),  # assign a color to each specific value
         name = "Gender",
         labels = c("Male", "Female"))
print(p)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 241 rows containing non-finite outside the scale range
## (`stat_bin()`).
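
In the manual fill scale above, the colors are matched to the gender codes by name, but the legend labels are matched by position. If the codes ever appear in a different order, or a third code shows up, the legend can silently mislabel the groups. A minimal sketch of one alternative is to attach the labels to the codes before plotting. The vector `gender_code` is made up, and the coding 1 = male, 2 = female is an assumption inferred from the label order, not something the data dictionary confirms here.

```r
# Hypothetical gender codes; assumed coding is 1 = male, 2 = female.
gender_code <- c(1, 2, 2, 1, NA, 2)

# Bind each label to its code, so the legend text cannot drift out of order.
gender_f <- factor(gender_code, levels = c(1, 2), labels = c("Male", "Female"))

# Show how many rows fall in each group, including missing values.
table(gender_f, useNA = "ifany")
```

With a factor built this way, the fill can be mapped to it directly and the colors named by label, for example `values = c(Male = "red", Female = "purple")`.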

By proportion

p <- ggplot(data = df, aes(x = log(income), fill = factor(gender))) +
     geom_density(alpha = 0.5, col = "white") +
     labs(x = "Income (log scale)", y = "Density", title = "Income distribution")
print(p)

## Warning: Removed 241 rows containing non-finite outside the scale range
## (`stat_density()`).
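
A density curve rescales each group so that its area is 1, which is what makes groups of different sizes comparable. The same rescaling can be applied to a histogram by mapping `y` to `after_stat(density)`. The sketch below uses made-up data (`toy`, with 200 rows in one group and 100 in the other) and an arbitrary bin width, then checks that each group's bars do add up to an area of 1.

```r
library(ggplot2)

# Hypothetical data with unequal group sizes; not the real CHNS data.
set.seed(2)
toy <- data.frame(
  income = rlnorm(300, meanlog = 9, sdlog = 1),
  gender = rep(c(1, 2), times = c(200, 100))
)

# Histogram on the density scale, with the two groups overlaid rather than stacked.
p_prop <- ggplot(toy, aes(x = log(income), fill = factor(gender))) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25,
                 position = "identity", alpha = 0.5, col = "white") +
  labs(x = "Income (log scale)", y = "Density")
print(p_prop)

# Check: within each group, bar height times bin width sums to 1.
bars <- layer_data(p_prop, 1)
area_by_group <- tapply(bars$density * (bars$xmax - bars$xmin), bars$group, sum)
area_by_group
```

On the count scale the larger group would dominate the plot. On the density scale only the shapes of the two distributions are compared.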

Day 2

Oanh Pham

2025-05-11

Task 1: Read the data file "CHNS data full.csv" into R and name the data "df"
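
The notes do not show the code for this step, so the following is only a sketch of one way to do it. It assumes the file sits in the current working directory and is a plain comma-separated file that `read.csv()` can parse with its defaults. The helper name `read_chns` is made up for illustration.

```r
# Hypothetical helper: stop early with a clear error if the file is missing.
read_chns <- function(path) {
  stopifnot(file.exists(path))
  read.csv(path)
}

# File name taken from the task; assumed to be in the working directory.
csv_path <- "CHNS data full.csv"

if (file.exists(csv_path)) {
  df <- read_chns(csv_path)
  str(df)  # inspect column names and types before plotting
} else {
  message("File not found: ", csv_path, " (working directory: ", getwd(), ")")
}
```

Checking the structure with `str(df)` right after reading is a quick way to confirm that `income` and `gender` exist under those names.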
