Input data

pisa = read.csv("C:\\Users\\Paul Vo\\Desktop\\Textbook\\PISA Data Vietnam 2015.csv")
head(pisa)
##     School SchoolSize ClassSize STratio SchoolType  Area Region   Age Gender
## 1 70400001        883        18  22.075          3 URBAN  SOUTH 15.58   Boys
## 2 70400001        883        18  22.075          3 URBAN  SOUTH 15.92   Boys
## 3 70400001        883        18  22.075          3 URBAN  SOUTH 15.42  Girls
## 4 70400001        883        18  22.075          3 URBAN  SOUTH 15.58  Girls
## 5 70400001        883        18  22.075          3 URBAN  SOUTH 15.92  Girls
## 6 70400001        883        18  22.075          3 URBAN  SOUTH 16.25  Girls
##   PARED HISCED  WEALTH INSTSCIE JOYSCIE  ICTRES    Math    Read Science
## 1     9      2 -2.0697   0.9798  2.1635 -1.5244 439.923 412.290 475.612
## 2    12      4 -1.7903   1.7359  2.1635 -1.9305 406.251 409.598 450.320
## 3     9      2 -2.1942  -0.2063 -0.1808 -1.6093 414.369 384.307 405.787
## 4     5      1 -2.0301  -0.3115 -0.4318 -1.6250 468.801 459.104 462.968
## 5     9      2 -1.0522   0.7648  1.3031 -0.5305 355.432 402.435 453.736
## 6     5      1 -3.0570   0.3708  0.5094 -2.5873 458.955 483.885 529.866

Multiple charts

A bit of grouping:

# plot = FALSE computes each histogram without drawing it
p1 = hist(pisa$Science[pisa$Gender == "Boys"], plot = FALSE)
p2 = hist(pisa$Science[pisa$Gender == "Girls"], plot = FALSE)
plot(p1, col = "skyblue", border = "white")
plot(p2, add = TRUE, col = scales::alpha("yellow", 0.4), border = "aliceblue")

## the add argument in the line above merges the girls' histogram into the boys' one; in other words, it is drawn on top of it
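
One catch with overlaying: hist() picks the bins for each group separately, so the bars may not line up, and the second histogram can get clipped by the axes of the first. A minimal sketch of one way around that, assuming simulated scores instead of the real pisa data (the names boys, girls, brks, h1, h2 are made up here) and using adjustcolor() from base R for transparency:

```r
# Simulated scores standing in for pisa$Science by gender (assumption, not the real data)
set.seed(1)
boys  <- rnorm(500, mean = 520, sd = 75)
girls <- rnorm(500, mean = 525, sd = 70)

# One set of bin edges covering both groups, so the bars line up
brks <- pretty(range(c(boys, girls)), n = 20)
h1 <- hist(boys,  breaks = brks, plot = FALSE)
h2 <- hist(girls, breaks = brks, plot = FALSE)

# Shared y-limit so the taller histogram is not clipped
ymax <- max(h1$counts, h2$counts)
plot(h1, col = adjustcolor("skyblue", alpha.f = 0.6), border = "white",
     ylim = c(0, ymax), main = "Science score by gender", xlab = "Science")
plot(h2, add = TRUE, col = adjustcolor("yellow", alpha.f = 0.4), border = "white")
legend("topright", legend = c("Boys", "Girls"),
       fill = c(adjustcolor("skyblue", alpha.f = 0.6),
                adjustcolor("yellow", alpha.f = 0.4)))
```

On the real data, the two vectors would be pisa$Science[pisa$Gender == "Boys"] and pisa$Science[pisa$Gender == "Girls"].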

Now switch to the lattice package (though it probably won't be much fun either)

library(lattice)

densityplot(~ Science, groups = Gender, data = pisa, auto.key = TRUE)

densityplot(~ Science | Gender, data = pisa)

Time to Play with ggplot2

lipit = read.csv("C:\\Users\\Paul Vo\\Desktop\\Textbook\\Obesity data.csv")
library(ggplot2)
library(ggthemes)

Now plot pcfat against BMI, coloured by gender, with a quadratic fit:

p = ggplot(data = lipit, aes(x = bmi, y = pcfat, col = gender, fill = gender)) +
  geom_point() + labs(x = "BMI", y = "pcfat") +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2))
p

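The I() in the formula means "as is": it tells R to treat x^2 as ordinary arithmetic (x squared) rather than as a formula operator. Without I(), y ~ x + x^2 collapses to y ~ x. So the model actually fitted here is the quadratic y = b0 + b1*x + b2*x^2.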
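
To see what I() changes inside a model formula, here is a small sketch on made-up data (x, y, fit_plain and fit_asis are illustrative names, not from the obesity file). It fits the same formula with and without I() and lists the coefficient names:

```r
set.seed(42)
x <- seq(15, 40, length.out = 200)          # made-up BMI-like values
y <- -30 + 4 * x - 0.05 * x^2 + rnorm(200)  # made-up curved response

fit_plain <- lm(y ~ x + x^2)     # ^ is a formula operator here, so x^2 reduces to x
fit_asis  <- lm(y ~ x + I(x^2))  # I() protects x^2, so a squared term is fitted

names(coef(fit_plain))  # "(Intercept)" "x"
names(coef(fit_asis))   # "(Intercept)" "x" "I(x^2)"
```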

Why a quadratic? From experience: in real data the relationship between two variables is often not y ~ x, not a straight line, but a curve.
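
Rather than relying on experience alone, the straight line and the curve can be compared directly. A rough sketch, assuming simulated bmi and pcfat values (not the real lipit data) where the curvature is built in on purpose:

```r
set.seed(7)
# Simulated stand-ins for lipit$bmi and lipit$pcfat (assumption)
bmi   <- runif(300, min = 15, max = 40)
pcfat <- -40 + 4.5 * bmi - 0.06 * bmi^2 + rnorm(300, sd = 3)

fit_line  <- lm(pcfat ~ bmi)             # straight line
fit_curve <- lm(pcfat ~ bmi + I(bmi^2))  # quadratic

anova(fit_line, fit_curve)                       # F-test for the added squared term
c(line = AIC(fit_line), curve = AIC(fit_curve))  # lower AIC = better trade-off
```

On the real data the same two lm() calls would take data = lipit; whether the curve wins there is something to check, not assume.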

Jitter should be used often; learn more about ggplot2.
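
A minimal sketch of jitter in ggplot2, assuming a simulated data frame d (made up here, with a gender and a pcfat column) in place of the real data. geom_jitter() adds a little random noise to the point positions so overlapping points become visible:

```r
library(ggplot2)

set.seed(3)
# Simulated stand-in data (assumption): a two-level group and a numeric measurement
d <- data.frame(
  gender = rep(c("F", "M"), each = 100),
  pcfat  = c(rnorm(100, mean = 35, sd = 5), rnorm(100, mean = 25, sd = 5))
)

# width spreads points horizontally; height = 0 leaves the measured values untouched
p_jit <- ggplot(d, aes(x = gender, y = pcfat, col = gender)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6) +
  labs(x = "Gender", y = "pcfat")
p_jit
```

Keeping height = 0 is a deliberate choice here: the noise only separates points sideways within each group, so the y-values you read off the plot are still the real ones.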