STATISTICS MIDTERM EXAM (UTS STATISTIKA)

1. Generate random numbers that follow an exponential(λ) distribution using the Inverse Transform Method, with 10000 observations, with pdf [formula missing from the source]

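n <- 10000
u <- runif(n)
set.seed(10) # sets the seed; it comes after runif(), so u itself is not reproducible
x <- -(log(1-u)/2) # log() is the natural logarithm (ln) in R; dividing by 2 means λ = 2
head(x)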

## [1] 0.19515593 0.80926837 0.03295969 0.12159156 0.50176118 0.67455867

length(x)

## [1] 10000

set.seed(10) # set the seed
y <- rexp(n,rate = 3) # note: rate = 3 here, while x above uses λ = 2
head(y)

## [1] 0.004985469 0.306740402 0.250719646 0.525013950 0.077219539 0.362224334

length(y)

## [1] 10000

par(mfrow=c(1,2))
hist(x,main = "Exp from inverse transform")
hist(y,main = "Exp from rexp()")
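
For an exponential distribution with rate λ, the CDF is F(x) = 1 - exp(-λx), so solving u = F(x) for x gives x = -ln(1 - u)/λ. That is the formula used for x, with λ = 2. The sketch below is a minimal illustration, not the graded answer: it assumes a single shared rate (the name `lambda` and the value 2 are assumptions), seeds before every draw, and checks both samples against the theoretical mean 1/λ.

```r
# Minimal sketch of the inverse transform method for Exp(lambda).
# `lambda` is a hypothetical name; 2 is assumed to match the divisor used for x.
lambda <- 2
n <- 10000

set.seed(10)                      # seed BEFORE drawing so u is reproducible
u <- runif(n)
x_inv <- -log(1 - u) / lambda     # invert F(x) = 1 - exp(-lambda * x)

set.seed(10)
x_ref <- rexp(n, rate = lambda)   # built-in generator with the same rate

# Both sample means should be close to the theoretical mean 1 / lambda
c(inverse = mean(x_inv), rexp = mean(x_ref), theory = 1 / lambda)
```

With the same rate in both samples, the two histograms become directly comparable.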

2 A. Create a data frame with new columns for the total score and the average score

studentperformace= read.csv("C:/Users/Hp/Downloads/StudentsPerformance.csv",header= TRUE)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

head(studentperformace)

##   gender race.ethnicity parental.level.of.education        lunch
## 1 female        group B           bachelor's degree     standard
## 2 female        group C                some college     standard
## 3 female        group B             master's degree     standard
## 4   male        group A          associate's degree free/reduced
## 5   male        group C                some college     standard
## 6 female        group B          associate's degree     standard
##   test.preparation.course math.score reading.score writing.score
## 1                    none         72            72            74
## 2               completed         69            90            88
## 3                    none         90            95            93
## 4                    none         47            57            44
## 5                    none         76            78            75
## 6                    none         71            83            78

Data.Siswa <- studentperformace %>%
  mutate(score.average = (math.score + reading.score + writing.score) / 3,
         score.sum = math.score + reading.score + writing.score)
head(Data.Siswa)

##   gender race.ethnicity parental.level.of.education        lunch
## 1 female        group B           bachelor's degree     standard
## 2 female        group C                some college     standard
## 3 female        group B             master's degree     standard
## 4   male        group A          associate's degree free/reduced
## 5   male        group C                some college     standard
## 6 female        group B          associate's degree     standard
##   test.preparation.course math.score reading.score writing.score score.average
## 1                    none         72            72            74      72.66667
## 2               completed         69            90            88      82.33333
## 3                    none         90            95            93      92.66667
## 4                    none         47            57            44      49.33333
## 5                    none         76            78            75      76.33333
## 6                    none         71            83            78      77.33333
##   score.sum
## 1       218
## 2       247
## 3       278
## 4       148
## 5       229
## 6       232

as_tibble(Data.Siswa)

## # A tibble: 1,000 × 10
##    gender race.e…¹ paren…² lunch test.…³ math.…⁴ readi…⁵ writi…⁶ score…⁷ score…⁸
##    <chr>  <chr>    <chr>   <chr> <chr>     <int>   <int>   <int>   <dbl>   <int>
##  1 female group B  bachel… stan… none         72      72      74    72.7     218
##  2 female group C  some c… stan… comple…      69      90      88    82.3     247
##  3 female group B  master… stan… none         90      95      93    92.7     278
##  4 male   group A  associ… free… none         47      57      44    49.3     148
##  5 male   group C  some c… stan… none         76      78      75    76.3     229
##  6 female group B  associ… stan… none         71      83      78    77.3     232
##  7 female group B  some c… stan… comple…      88      95      92    91.7     275
##  8 male   group B  some c… free… none         40      43      39    40.7     122
##  9 male   group D  high s… free… comple…      64      64      67    65       195
## 10 female group B  high s… free… none         38      60      50    49.3     148
## # … with 990 more rows, and abbreviated variable names ¹race.ethnicity,
## #   ²parental.level.of.education, ³test.preparation.course, ⁴math.score,
## #   ⁵reading.score, ⁶writing.score, ⁷score.average, ⁸score.sum
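
The same two columns can also be built in base R with `rowSums()` and `rowMeans()`. The sketch below is only an illustration under stated assumptions: `toy` and `score_cols` are hypothetical names, and the three rows are copied from the `head()` output rather than read from the CSV file.

```r
# Toy data frame holding the first three rows of scores from the head() output.
toy <- data.frame(
  math.score    = c(72, 69, 90),
  reading.score = c(72, 90, 95),
  writing.score = c(74, 88, 93)
)
score_cols <- c("math.score", "reading.score", "writing.score")

toy$score.sum     <- rowSums(toy[score_cols])    # total score per student
toy$score.average <- rowMeans(toy[score_cols])   # average score per student
toy
```

The totals for these rows (218, 247, 278) agree with the `score.sum` values printed for rows 1 to 3.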

B. Compute the summary statistics

summary(Data.Siswa)

##     gender          race.ethnicity     parental.level.of.education
##  Length:1000        Length:1000        Length:1000                
##  Class :character   Class :character   Class :character           
##  Mode  :character   Mode  :character   Mode  :character           
##                                                                   
##                                                                   
##                                                                   
##     lunch           test.preparation.course   math.score     reading.score   
##  Length:1000        Length:1000             Min.   :  0.00   Min.   : 17.00  
##  Class :character   Class :character        1st Qu.: 57.00   1st Qu.: 59.00  
##  Mode  :character   Mode  :character        Median : 66.00   Median : 70.00  
##                                             Mean   : 66.09   Mean   : 69.17  
##                                             3rd Qu.: 77.00   3rd Qu.: 79.00  
##                                             Max.   :100.00   Max.   :100.00  
##  writing.score    score.average      score.sum    
##  Min.   : 10.00   Min.   :  9.00   Min.   : 27.0  
##  1st Qu.: 57.75   1st Qu.: 58.33   1st Qu.:175.0  
##  Median : 69.00   Median : 68.33   Median :205.0  
##  Mean   : 68.05   Mean   : 67.77   Mean   :203.3  
##  3rd Qu.: 79.00   3rd Qu.: 77.67   3rd Qu.:233.0  
##  Max.   :100.00   Max.   :100.00   Max.   :300.0

B. Compute the skewness and kurtosis

library(e1071)
skewness(studentperformace$math.score)

## [1] -0.2780989

skewness(studentperformace$reading.score)

## [1] -0.2583277

skewness(studentperformace$writing.score)

## [1] -0.2885762

kurtosis(Data.Siswa$math.score)

## [1] 0.2610652

kurtosis(Data.Siswa$reading.score)

## [1] -0.07976785

kurtosis(Data.Siswa$writing.score)

## [1] -0.04511069
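
All three skewness values are slightly negative, meaning each score distribution has a mild left tail. The kurtosis values are excess kurtosis, so 0 corresponds to a normal distribution. According to the e1071 documentation, both functions default to `type = 3`, which divides the central moments by powers of the sample standard deviation. The sketch below reproduces that definition in base R. The helper names `skew_type3` and `kurt_type3` are hypothetical, and the input is only the first six math scores shown earlier, so the results will not equal the full-sample values.

```r
# Base R sketch of the type 3 estimators (the e1071 default, per its docs).
# skew_type3 and kurt_type3 are hypothetical helper names.
skew_type3 <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  g1 <- mean(d^3) / mean(d^2)^1.5        # moment-based skewness g1
  g1 * ((n - 1) / n)^1.5                 # rescaled: equals m3 / sd(x)^3
}

kurt_type3 <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  g2 <- mean(d^4) / mean(d^2)^2 - 3      # moment-based excess kurtosis g2
  (g2 + 3) * (1 - 1 / n)^2 - 3           # rescaled: equals m4 / sd(x)^4 - 3
}

scores <- c(72, 69, 90, 47, 76, 71)      # first six math scores from head()
skew_type3(scores)
kurt_type3(scores)
```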

C. Create pie charts for (i) gender, (ii) race.ethnicity, (iii) parental.level.of.education, (iv) lunch, and (v) test.preparation.course, showing percentage labels (%)

gender <- table(studentperformace$gender)
piepercent <- round(100 * gender / sum(gender), 1)

# append "%" to each label; legend names come from the table so they match the slices
pie(gender, labels = paste0(piepercent, "%"), main = "pie chart gender", col = rainbow(length(gender)))
legend("topright", names(gender), cex = 0.8,
   fill = rainbow(length(gender)))

race.ethnicity <- table(studentperformace$race.ethnicity)
piepercent <- round(100 * race.ethnicity / sum(race.ethnicity), 1)

# append "%" to each label; legend names come from the table so they match the slices
pie(race.ethnicity, labels = paste0(piepercent, "%"), main = "pie chart race.ethnicity", col = rainbow(length(race.ethnicity)))
legend("topright", names(race.ethnicity), cex = 0.8,
   fill = rainbow(length(race.ethnicity)))

parental.level.of.education <- table(studentperformace$parental.level.of.education)
piepercent <- round(100 * parental.level.of.education / sum(parental.level.of.education), 1)

# table() sorts the categories alphabetically, so take the legend names from the table
# instead of typing them by hand (a hand-typed list can miss a level or be in the wrong order)
pie(parental.level.of.education, labels = paste0(piepercent, "%"), main = "pie chart parental.level.of.education", col = rainbow(length(parental.level.of.education)))
legend("topright", names(parental.level.of.education), cex = 0.8,
   fill = rainbow(length(parental.level.of.education)))

lunch <- table(studentperformace$lunch)
piepercent <- round(100 * lunch / sum(lunch), 1)

# table() order is "free/reduced", "standard"; names(lunch) keeps the legend aligned with the slices
pie(lunch, labels = paste0(piepercent, "%"), main = "pie chart lunch", col = rainbow(length(lunch)))
legend("topright", names(lunch), cex = 0.8,
   fill = rainbow(length(lunch)))

test.preparation.course <- table(studentperformace$test.preparation.course)
piepercent <- round(100 * test.preparation.course / sum(test.preparation.course), 1)

# table() order is "completed", "none"; names() keeps the legend aligned with the slices
pie(test.preparation.course, labels = paste0(piepercent, "%"), main = "pie chart test.preparation.course", col = rainbow(length(test.preparation.course)))
legend("topright", names(test.preparation.course), cex = 0.8,
   fill = rainbow(length(test.preparation.course)))
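
Because `table()` returns categories in sorted order, labels built from `names()` of the table always line up with the slices. The five charts repeat the same steps, so those steps could be wrapped in a small helper. The sketch below is one possible way to do that: `pct_labels` is a hypothetical name and `lunch_toy` is made-up data, not the exam data.

```r
# Hypothetical helper: builds "name (xx.x%)" labels from a categorical vector.
pct_labels <- function(x) {
  tab <- table(x)                              # categories in sorted order
  pct <- round(100 * tab / sum(tab), 1)        # percentage per category
  paste0(names(tab), " (", pct, "%)")
}

# Toy vector (made-up values, not the exam data)
lunch_toy <- c("standard", "standard", "free/reduced", "standard")
pct_labels(lunch_toy)

# The labels can be passed straight to pie(), for example:
# pie(table(lunch_toy), labels = pct_labels(lunch_toy), col = rainbow(2))
```

Putting the category name and the percentage in the same label also makes a separate legend optional.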

D. Create sorted bar charts of the average score by (i) parental.level.of.education (ii) race.ethnicity

rataan <- aggregate(score.average ~ parental.level.of.education, data = Data.Siswa, FUN = mean)
# reorder() sorts the bars by mean score, as the task requires
ggplot(rataan, aes(x = reorder(parental.level.of.education, score.average), y = score.average, fill = score.average)) +
  geom_bar(stat = "identity") +
  labs(title = "AVERAGE SCORE BY PARENTAL EDUCATION", x = "parental.level.of.education")

rataan <- aggregate(score.average ~ race.ethnicity, data = Data.Siswa, FUN = mean)
# reorder() sorts the bars by mean score, as the task requires
ggplot(rataan, aes(x = reorder(race.ethnicity, score.average), y = score.average, fill = score.average)) +
  geom_bar(stat = "identity") +
  labs(title = "AVERAGE SCORE BY RACE/ETHNICITY", x = "race.ethnicity")

E. Using ggplot, create clustered/stacked bar charts of (i) parental level of education and gender against total score; (ii) race.ethnicity and gender against total score

# score.sum already lives in Data.Siswa, so no separate vector is needed;
# geom_bar(stat = "identity") stacks the rows, so each segment is the summed total score of one gender
ggplot(Data.Siswa, aes(x = parental.level.of.education, y = score.sum, fill = gender)) +
  geom_bar(stat = "identity") +
  labs(title = "STACKED BAR CHART")

ggplot(Data.Siswa, aes(x = race.ethnicity, y = score.sum, fill = gender)) +
  geom_bar(stat = "identity") +
  labs(title = "STACKED BAR CHART")

F. Visualize the distribution of the math, writing, and reading scores for each gender using histograms

# distribution of math.score for each gender
ggplot(studentperformace, aes(x = math.score, color = gender)) +
  geom_histogram(fill = "light grey", alpha = 0.5, position = "identity") +
  labs(title = "histogram of math.score by gender")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# distribution of reading.score for each gender
ggplot(studentperformace, aes(x = reading.score, color = gender)) +
  geom_histogram(fill = "light grey", alpha = 0.5, position = "identity") +
  labs(title = "histogram of reading.score by gender")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# distribution of writing.score for each gender
ggplot(studentperformace, aes(x = writing.score, color = gender)) +
  geom_histogram(fill = "light grey", alpha = 0.5, position = "identity") +
  labs(title = "histogram of writing.score by gender")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

G. Create boxplots of each ability score (math, writing, reading) by parental education level for each gender

ggplot(studentperformace, mapping = aes(x = parental.level.of.education, y = math.score, fill = gender)) +
  geom_violin(fill = "#2D2D2D", alpha = 0.5, trim = FALSE, width = 1.00) +
  geom_boxplot() +
  labs(title = "boxplot of math score by parental education for each gender") +
  coord_flip() + scale_fill_brewer(palette = "RdPu")

ggplot(studentperformace, mapping = aes(x = parental.level.of.education, y = writing.score, fill = gender)) +
  geom_violin(fill = "#2D2D2D", alpha = 0.5, trim = FALSE, width = 1.00) +
  geom_boxplot() +
  labs(title = "boxplot of writing score by parental education for each gender") +
  coord_flip() + scale_fill_brewer(palette = "RdPu")

ggplot(studentperformace, mapping = aes(x = parental.level.of.education, y = reading.score, fill = gender)) +
  geom_violin(fill = "#2D2D2D", alpha = 0.5, trim = FALSE, width = 1.00) +
  geom_boxplot() +
  labs(title = "boxplot of reading score by parental education for each gender") +
  coord_flip() + scale_fill_brewer(palette = "RdPu")

H. Create scatter plots among the math, writing, and reading scores

# scatterplot of math.score vs reading.score
plot(studentperformace$math.score,
studentperformace$reading.score, pch = 20, cex = 1, frame = FALSE, xlab = "Math Score", ylab = "Reading Score", main="Scatterplot of math.score and reading.score")

# scatterplot of reading.score vs writing.score
plot(studentperformace$reading.score, studentperformace$writing.score, pch = 20, cex = 1, frame = FALSE, xlab = "Reading Score" , ylab = "Writing Score", main="Scatterplot of Reading Score and Writing Score")

# scatterplot of writing.score vs math.score
plot(studentperformace$writing.score, studentperformace$math.score, pch = 20, cex = 1, frame = FALSE, xlab = "Writing Score", ylab = "Math Score", main="Scatterplot of Writing Score and Math Score")

Arisna Nuri Salsabila

2022-10-26
