Day 2: Data visualisation

Preparation

arr = read.csv("C:\\VN trips\\VN trip 4 (Dec 2022)\\VLU\\Regression analysis\\Datasets\\Arrest data.csv", header = TRUE)

library(ggplot2)

(2.1) Task 1: Distribution of the week variable with the hist function

hist(arr$week)

hist(arr$week, col="blue", border="white")

hist(arr$week, col="blue", border="white", main="Distribution of Time to arrest (week)")

hist(arr$week, col="blue", border="white", main="Distribution of time to arrest (week)", xlab="Week", ylab="Number of participants")
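
hist() picks its bins automatically; the breaks argument gives explicit control over them. A minimal sketch, assuming a hypothetical stand-in data frame demo (simulated, not the real arrest data) whose week values run from 1 to 52:

```r
# Hypothetical stand-in for arr; the real data come from the CSV file above.
set.seed(1)
demo <- data.frame(week = sample(1:52, 200, replace = TRUE))

# Explicit 4-week bins with edges at 0, 4, 8, ..., 52
h <- hist(demo$week, breaks = seq(0, 52, by = 4),
          col = "blue", border = "white",
          main = "Distribution of time to arrest (week)",
          xlab = "Week", ylab = "Number of participants")

# hist() returns the histogram object invisibly, so the bin counts can be inspected.
h$counts
```

Choosing the breaks by hand makes the bars line up with meaningful units (here, 4-week blocks) instead of whatever the default rule selects.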

(2.2) Task 2: Plot the distribution of the week variable with the ggplot function from the ggplot2 package

ggplot(data=arr, aes(x=week)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data=arr, aes(x=week)) + geom_histogram(fill="blue", col="white")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data=arr, aes(x=week)) + geom_histogram(fill="blue", col="white") + labs(title="Distribution of Time to arrest", x="Time to arrest (weeks)", y="Number of participants")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
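
The message above asks for an explicit bin width. A minimal sketch of one way to respond, assuming a hypothetical simulated data frame demo in place of arr and an illustrative width of 4 weeks (not a value taken from the course material):

```r
library(ggplot2)

# Hypothetical stand-in for arr; the real data come from the CSV file above.
set.seed(1)
demo <- data.frame(week = sample(1:52, 200, replace = TRUE))

# binwidth = 4 with boundary = 0 gives 4-week bins aligned at 0, 4, 8, ...
p <- ggplot(data = demo, aes(x = week)) +
  geom_histogram(binwidth = 4, boundary = 0, fill = "blue", col = "white") +
  labs(title = "Distribution of Time to arrest",
       x = "Time to arrest (weeks)", y = "Number of participants")
p
```

Setting binwidth (or bins) explicitly should also stop ggplot2 from printing the `bins = 30` message.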

(2.3) Task 3: Plot the distribution of the age variable with a probability density curve

ggplot(data=arr, aes(x=age)) + geom_histogram(fill="blue", col="white")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

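# after_stat(density) replaces the ..density.. notation, which is deprecated in ggplot2 3.4.0 and later
ggplot(data=arr, aes(x=age)) + geom_histogram(aes(y=after_stat(density)), fill="blue", col="white")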
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data=arr, aes(x=age)) + geom_histogram(aes(y=after_stat(density)), fill="blue", col="white") + geom_density(col="red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
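
The histogram has to be drawn on the density scale before a density curve can be overlaid, because the curve integrates to 1 while raw counts do not. A minimal sketch that checks this, assuming a hypothetical simulated data frame demo in place of arr, an illustrative bin width of 2 years, and a ggplot2 version that provides after_stat() (3.3.0 or later):

```r
library(ggplot2)

# Hypothetical stand-in for arr; ages are simulated, not real data.
set.seed(1)
demo <- data.frame(age = round(rnorm(200, mean = 25, sd = 6)))

p <- ggplot(data = demo, aes(x = age)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "blue", col = "white") +
  geom_density(col = "red")
p

# On the density scale, bar height times bar width summed over all bars equals 1.
bars <- layer_data(p, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```

Because both layers now share the same vertical scale, the red curve can be read directly against the bars.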

(2.4) Task 4: Draw a bar chart of the educ variable

ggplot(data=arr, aes(x=educ)) + geom_bar(col="blue") + labs(title = "Distribution of education", x = "Education", y = "Number of participants")

ggplot(data=arr, aes(x=educ)) + geom_bar(fill="blue") + labs(title = "Distribution of education", x = "Education", y = "Number of participants")
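
geom_bar() counts the rows at each value of x, so the bar heights should match what table() reports. A minimal sketch, assuming a small hypothetical data frame demo in which educ is stored as numeric codes (the actual coding in the arrest data may differ); wrapping educ in factor() is an optional choice that treats the codes as categories:

```r
library(ggplot2)

# Hypothetical education codes, for illustration only.
demo <- data.frame(educ = c(2, 3, 3, 3, 4, 4, 5, 3, 4, 6))

# Frequency of each education code
counts <- table(demo$educ)
counts

# factor(educ) gives every code its own labelled position on the x axis.
p <- ggplot(data = demo, aes(x = factor(educ))) +
  geom_bar(fill = "blue") +
  labs(title = "Distribution of education",
       x = "Education", y = "Number of participants")
p
```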

(2.5) Task 5: Draw a bar chart of the two variables educ and arrest

ggplot(data=arr, aes(x=educ, fill=arrest)) + geom_bar()

arr$arrest1[arr$arrest == 1] = "Yes"
arr$arrest1[arr$arrest == 0] = "No"
ggplot(data=arr, aes(x=educ, fill=arrest1)) + geom_bar()
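
The recoding above can also be done in one step with factor(), which fixes the order of the legend entries as well. A minimal sketch, assuming a small hypothetical data frame demo with arrest coded 0/1 as in the lines above; position = "fill" and the axis labels are illustrative additions, not part of the original exercise:

```r
library(ggplot2)

# Hypothetical values, for illustration only.
demo <- data.frame(educ   = c(2, 3, 3, 3, 4, 4, 5, 3, 4, 6),
                   arrest = c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0))

# Map 0 -> "No" and 1 -> "Yes" in a single call.
demo$arrest1 <- factor(demo$arrest, levels = c(0, 1), labels = c("No", "Yes"))

# position = "fill" rescales each bar to proportions, which makes it easier
# to compare arrest rates across education levels.
p <- ggplot(data = demo, aes(x = educ, fill = arrest1)) +
  geom_bar(position = "fill") +
  labs(x = "Education", y = "Proportion of participants", fill = "Arrested")
p
```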

(2.6) Task 6: Save the work to an rpubs.com account (https://rpubs.com/ThachTran/981509)