Missing data analysis

In this assignment, you will use the Turkey and United States data from one booklet of the TIMSS 2015 administration.

  1. Data set name: “TRUSA.RDS”. Import this data set into R.

  2. Check whether the data set contains missing data.

  3. Compute the sum of the 35 items in the booklet and add it to the data set as a new column.

  4. Compute descriptive statistics of the total score for each of the two countries.

  5. Use a t test to examine whether the total score differs between the Turkey and USA samples.

  6. Create versions of the data set with 5%, 10%, and 15% missing data.

  7. For each data set with missing values, first test whether the data are missing at random. Then remove the missing data by listwise deletion, repeat the t test from step 5, and compare the results with those from the complete data.

  8. Fill in the missing values in the data sets created in step 6 using an imputation method of your choice. Then repeat the t test from step 5 and compare the results with those from the complete data.

  9. Explain how the missing data rate affects the performance of the methods applied.

Happy coding :)

Solution 1: Instructor Emrah's version

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
TRUSA <- readRDS("TRUSA.RDS")

TRUSA$toplam <- TRUSA %>% select(starts_with("M")) %>% rowSums()

TRUSA %>% group_by(CNT) %>% summarise(n=n(),
                                      ort=mean(toplam),
                                      sd=sd(toplam),
                                      min=min(toplam),
                                      max=max(toplam))
## # A tibble: 2 × 6
##   CNT       n   ort    sd   min   max
##   <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 TUR     435  13.5  7.57     2    32
## 2 USA     716  17.0  7.53     1    34
t.test(toplam ~ CNT, data=TRUSA)
## 
##  Welch Two Sample t-test
## 
## data:  toplam by CNT
## t = -7.8242, df = 912.31, p-value = 1.41e-14
## alternative hypothesis: true difference in means between group TUR and group USA is not equal to 0
## 95 percent confidence interval:
##  -4.494510 -2.691921
## sample estimates:
## mean in group TUR mean in group USA 
##          13.45287          17.04609

Solution 2: The instructor's walkthrough

  1. Import the data
TRUSA <- readRDS("TRUSA.RDS")
  2. Check for missing data
#library(naniar)
#miss_var_table(TRUSA)
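
Because the naniar calls above are commented out, a base R check can serve as a quick alternative. The sketch below uses a small made-up data frame (`demo`, with placeholder column names) rather than the TRUSA data, just to show the calls.

```r
# Toy data frame standing in for TRUSA (hypothetical values)
demo <- data.frame(
  M1  = c(1, 0, NA, 1),
  M2  = c(0, 0, 1, NA),
  CNT = c("TUR", "TUR", "USA", "USA")
)

any_missing  <- anyNA(demo)            # TRUE if at least one NA exists
miss_per_col <- colSums(is.na(demo))   # NA count per column
miss_per_row <- rowSums(is.na(demo))   # NA count per row
prop_missing <- mean(is.na(demo))      # overall share of NA cells

print(miss_per_col)
```

The same four calls can be pointed at TRUSA directly once it has been read in.
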
  3. Calculate the total score
SUM <- TRUSA %>% dplyr::select(starts_with("M")) %>% rowSums()
TRUSA$SUM <-SUM

  4. Descriptive statistics of the total score

library(dplyr)
library(psych)

describeBy(TRUSA$SUM, TRUSA$CNT)
## 
##  Descriptive statistics by group 
## group: TUR
##    vars   n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 435 13.45 7.57     11   12.74 7.41   2  32    30 0.71    -0.61 0.36
## ------------------------------------------------------------ 
## group: USA
##    vars   n  mean   sd median trimmed mad min max range skew kurtosis   se
## X1    1 716 17.05 7.53     17   16.92 8.9   1  34    33  0.1     -0.9 0.28
  5. Independent-samples t test
library(effsize) #to calculate the effect size
## 
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
## 
##     cohen.d
t_test_result  <- t.test(SUM~CNT, data=TRUSA, var.equal=TRUE)
print(t_test_result)
## 
##  Two Sample t-test
## 
## data:  SUM by CNT
## t = -7.8348, df = 1149, p-value = 1.064e-14
## alternative hypothesis: true difference in means between group TUR and group USA is not equal to 0
## 95 percent confidence interval:
##  -4.493049 -2.693382
## sample estimates:
## mean in group TUR mean in group USA 
##          13.45287          17.04609

Calculate Cohen’s d

cohen_d_result <-effsize::cohen.d(TRUSA$SUM[TRUSA$CNT=="TUR"],
                          TRUSA$SUM[TRUSA$CNT=="USA"])
print(cohen_d_result)
## 
## Cohen's d
## 
## d estimate: -0.4762813 (small)
## 95 percent confidence interval:
##      lower      upper 
## -0.5971341 -0.3554285
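
As a rough cross-check, the pooled-variance t statistic and Cohen's d can be rebuilt from the group summaries printed above. This is only a sketch: the standard deviations are the rounded values from the descriptive table, so the results should land close to, but not exactly on, the reported numbers.

```r
# Summary statistics copied from the output above (SDs are rounded)
n_tur <- 435; m_tur <- 13.45287; sd_tur <- 7.57
n_usa <- 716; m_usa <- 17.04609; sd_usa <- 7.53

# Pooled standard deviation across the two groups
sd_pooled <- sqrt(((n_tur - 1) * sd_tur^2 + (n_usa - 1) * sd_usa^2) /
                    (n_tur + n_usa - 2))

# Pooled-variance t statistic and Cohen's d
t_stat  <- (m_tur - m_usa) / (sd_pooled * sqrt(1 / n_tur + 1 / n_usa))
cohen_d <- (m_tur - m_usa) / sd_pooled

round(c(t = t_stat, d = cohen_d), 2)
```

Both quantities share the same numerator and pooled SD; the t statistic additionally scales by the sample sizes, which is why a modest d can still come with a very small p-value here.
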
  6. Create missing data at different percentages
library(mvdalab)
## 
## Attaching package: 'mvdalab'
## The following object is masked from 'package:psych':
## 
##     smc
TRUSA_5 <- introNAs(TRUSA, percent=5)
TRUSA_10 <- introNAs(TRUSA, percent = 10)
TRUSA_15 <- introNAs(TRUSA, percent= 15)
TRUSA_5 %>% is.na() %>% colSums()
##   IDSTUD   IDBOOK  M042182  M042081  M042049  M042052  M042076 M042302A 
##       53       71       49       63       57       53       63       51 
## M042302B M042302C  M042100  M042202  M042240  M042093  M042271  M042268 
##       60       63       54       66       55       50       62       50 
##  M042159  M042164  M042167  M062208 M062208A M062208B M062208C M062208D 
##       57       56       64       54       76       58       59       58 
##  M062153 M062111A M062111B  M062237  M062314  M062074  M062183  M062202 
##       64       49       56       64       55       56       54       46 
##  M062246  M062286  M062325  M062106  M062124      CNT      SUM 
##       50       51       60       65       50       66       56
TRUSA_10 %>% is.na() %>% colSums()
##   IDSTUD   IDBOOK  M042182  M042081  M042049  M042052  M042076 M042302A 
##      114      127      115      119      112      115      116      111 
## M042302B M042302C  M042100  M042202  M042240  M042093  M042271  M042268 
##      112      107      123      108      114      103      105      106 
##  M042159  M042164  M042167  M062208 M062208A M062208B M062208C M062208D 
##      112      127      117      131      135      114       99      117 
##  M062153 M062111A M062111B  M062237  M062314  M062074  M062183  M062202 
##      116      102      104      120      121      120      126      118 
##  M062246  M062286  M062325  M062106  M062124      CNT      SUM 
##      111      107      116      136      117      101      115
TRUSA_15 %>% is.na() %>% colSums()
##   IDSTUD   IDBOOK  M042182  M042081  M042049  M042052  M042076 M042302A 
##      175      160      187      198      201      174      157      185 
## M042302B M042302C  M042100  M042202  M042240  M042093  M042271  M042268 
##      164      189      187      147      160      164      147      195 
##  M042159  M042164  M042167  M062208 M062208A M062208B M062208C M062208D 
##      171      170      171      153      161      166      192      177 
##  M062153 M062111A M062111B  M062237  M062314  M062074  M062183  M062202 
##      147      181      162      179      169      168      163      173 
##  M062246  M062286  M062325  M062106  M062124      CNT      SUM 
##      177      170      184      173      176      185      175
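
The counts above show that IDSTUD, CNT and SUM also received NAs, because the whole data frame was passed in. One way to avoid that is to blank out cells in the item columns only. The sketch below does this in base R on simulated 0/1 items; `add_missing()` is a hypothetical helper written for these notes, not the mechanism used by introNAs, and the seed is arbitrary.

```r
set.seed(123)  # arbitrary seed so the result is reproducible

# Toy 0/1 item matrix standing in for the 35 TRUSA items (hypothetical data)
items <- as.data.frame(matrix(rbinom(200 * 5, 1, 0.5), nrow = 200))

# Hypothetical helper: set a fixed share of cells to NA completely at random
add_missing <- function(df, percent) {
  m    <- as.matrix(df)
  n_na <- round(length(m) * percent / 100)
  m[sample(length(m), n_na)] <- NA
  as.data.frame(m)
}

items_5  <- add_missing(items, 5)
items_10 <- add_missing(items, 10)
items_15 <- add_missing(items, 15)

# Share of NA cells in each version
sapply(list(items_5, items_10, items_15), function(d) mean(is.na(d)))
```

With the real data, the helper would be applied to the item columns and the result bound back to IDSTUD and CNT before recomputing the total.
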
  7. Listwise deletion
TRUSA_5_lw <- na.omit(TRUSA_5)
TRUSA_10_lw <- na.omit(TRUSA_10)
TRUSA_15_lw <- na.omit(TRUSA_15)
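
Listwise deletion keeps only the rows that have no NA in any column. If the NAs are scattered independently over all 39 columns (an assumption about how the missing values were generated, which these notes do not confirm), the expected share of complete rows is (1 - rate)^39. The sketch below works that out for the three rates.

```r
n_rows <- 1151   # rows in TRUSA (435 TUR + 716 USA)
n_cols <- 39     # columns that show NAs in the output above
rates  <- c(0.05, 0.10, 0.15)

# Probability that a row has no NA in any column, assuming independent cells
p_complete <- (1 - rates)^n_cols

data.frame(
  rate          = rates,
  p_complete    = round(p_complete, 4),
  expected_rows = round(n_rows * p_complete)
)
```

Under this assumption only a small fraction of rows survives even at 5%, and almost none at 15%, so comparing these figures with nrow(TRUSA_5_lw), nrow(TRUSA_10_lw) and nrow(TRUSA_15_lw) is a useful first step before rerunning the t test.
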

If the items are ordinal, use a polytomous method (in mice, polr is the method for ordered categories).

We need to perform multiple imputation.

#library(mice)
# Columns 3:37 hold the 35 items; "logreg" is intended for binary variables
#TRUSA_5_im1 <- mice(TRUSA_5[,3:37],m=5,maxit = 50, method = 'logreg', seed=500)
#TRUSA_10_im1 <- mice(TRUSA_10[,3:37],m=5, maxit=50, method='logreg',seed=500)
#TRUSA_15_im1 <- mice(TRUSA_15[,3:37],m=5, maxit = 50, method='logreg',seed=500)

#completed_data_1 <- complete(TRUSA_5_im1, 1)
#completed_data_2 <- complete(TRUSA_5_im1,2)
#completed_data_3 <- complete(TRUSA_5_im1,3)
#completed_data_4 <- complete(TRUSA_5_im1, 4)
#completed_data_5 <- complete(TRUSA_5_im1, 5)

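# completed_data_1 holds only the columns passed to mice(), so SUM and CNT must be added back before this call
#t_test_result_1 <- t.test(SUM~CNT, data=completed_data_1, var.equal=TRUE)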

#note: knitting took too long, so I commented this part out :(
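
With multiple imputation, the analysis is run once per completed data set and the results are then combined. A minimal sketch of the combining step (Rubin's rules) is given below in base R. `pool_rubin()` is a hypothetical helper, and the mean differences and standard errors are made-up numbers, not results from the TRUSA data.

```r
# Hypothetical helper: pool m estimates and their standard errors (Rubin's rules)
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)            # pooled point estimate
  w     <- mean(se^2)           # within-imputation variance
  b     <- var(est)             # between-imputation variance
  t_var <- w + (1 + 1 / m) * b  # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

# Made-up mean differences and standard errors from m = 5 imputed data sets
diffs <- c(-3.6, -3.4, -3.7, -3.5, -3.8)
ses   <- c(0.46, 0.47, 0.45, 0.46, 0.46)

pooled <- pool_rubin(diffs, ses)
pooled
```

The pooled standard error is larger than the average of the individual ones because it also carries the disagreement between imputations, which is the extra uncertainty due to the missing values.
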
library(tidyverse)
library(stevemisc)
## 
## Attaching package: 'stevemisc'
## The following object is masked from 'package:lubridate':
## 
##     dst
## The following object is masked from 'package:dplyr':
## 
##     tbl_df
library(knitr)
library(haven)
library(summarytools)
## 
## Attaching package: 'summarytools'
## The following object is masked from 'package:tibble':
## 
##     view
library(outliers)
## 
## Attaching package: 'outliers'
## The following object is masked from 'package:psych':
## 
##     outlier
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggpmisc)
## Loading required package: ggpp
## Registered S3 methods overwritten by 'ggpp':
##   method                  from   
##   heightDetails.titleGrob ggplot2
##   widthDetails.titleGrob  ggplot2
## 
## Attaching package: 'ggpp'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(psych)
library(sur)
## 
## Attaching package: 'sur'
## The following object is masked from 'package:psych':
## 
##     skew
library(moments)
library(corrplot)
## corrplot 0.95 loaded
library(olsrr)
## 
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
## 
##     rivers

NORMALITY

library(dplyr)
library(haven) # Use the haven package to import SPSS files into R.
screen <- read_sav("SCREEN.sav")
screen <- expss::drop_var_labs(screen)
head(screen) # Show the first few rows of the data set
## # A tibble: 6 × 8
##   SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS  RACE
##   <dbl>   <dbl>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl> <dbl>
## 1     1       1       8       27      5       1       2     1
## 2     2       3       7       20      6       0       2     1
## 3     3       0       8       23      3       0       2     1
## 4     4      13       9       28      8       1       2     1
## 5     5      15       7       24      1       1       2     1
## 6     6       3       8       25      4       0       2     1

Handling missing data

screen <- screen %>% 
mutate(INCOME = ifelse(is.na(INCOME), mean(INCOME, na.rm =TRUE),INCOME)) %>% na.omit()
summary(screen)
##      SUBNO          TIMEDRS          ATTDRUG         ATTHOUSE    
##  Min.   :  1.0   Min.   : 0.000   Min.   : 5.00   Min.   : 2.00  
##  1st Qu.:136.8   1st Qu.: 2.000   1st Qu.: 7.00   1st Qu.:21.00  
##  Median :313.5   Median : 4.000   Median : 8.00   Median :24.00  
##  Mean   :317.3   Mean   : 7.914   Mean   : 7.69   Mean   :23.54  
##  3rd Qu.:483.2   3rd Qu.:10.000   3rd Qu.: 9.00   3rd Qu.:27.00  
##  Max.   :758.0   Max.   :81.000   Max.   :10.00   Max.   :35.00  
##      INCOME          EMPLMNT         MSTATUS          RACE      
##  Min.   : 1.000   Min.   :0.000   Min.   :1.00   Min.   :1.000  
##  1st Qu.: 3.000   1st Qu.:0.000   1st Qu.:2.00   1st Qu.:1.000  
##  Median : 4.000   Median :0.000   Median :2.00   Median :1.000  
##  Mean   : 4.208   Mean   :0.472   Mean   :1.78   Mean   :1.086  
##  3rd Qu.: 6.000   3rd Qu.:1.000   3rd Qu.:2.00   3rd Qu.:1.000  
##  Max.   :10.000   Max.   :1.000   Max.   :2.00   Max.   :2.000
x <- c(3,5,7,NA,9)
ifelse(is.na(x),mean(x,na.rm=TRUE),x)
## [1] 3 5 7 6 9

For categorical variables:

library(dplyr)
table(screen$RACE)
## 
##   1   2 
## 424  40
library(summarytools)
freq(screen$RACE, 
     round.digits=2,report.nas = FALSE,
 style = "rmarkdown") 
## setting plain.ascii to FALSE
## ### Frequencies  
## #### screen$RACE  
## **Type:** Numeric  
## 
## |    &nbsp; | Freq |      % | % Cum. |
## |----------:|-----:|-------:|-------:|
## |     **1** |  424 |  91.38 |  91.38 |
## |     **2** |   40 |   8.62 | 100.00 |
## | **Total** |  464 | 100.00 | 100.00 |
library(knitr)
freq(screen$MSTATUS,report.nas = FALSE) %>%
  kable(format='markdown', 
      caption="Frequency Table",digits = 2)
Frequency Table

|       | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
|:------|-----:|--------:|-------------:|--------:|-------------:|
| 1     |  102 |   21.98 |        21.98 |   21.98 |        21.98 |
| 2     |  362 |   78.02 |       100.00 |   78.02 |       100.00 |
| <NA>  |    0 |      NA |           NA |    0.00 |       100.00 |
| Total |  464 |  100.00 |       100.00 |  100.00 |       100.00 |

Take a closer look at the summarytools package.

Outliers in continuous variables:

library(outliers)
z.scores <- screen %>%  
 select(2:5) %>% 
 scores(type = "z") %>%
 round(2)
head(z.scores)
##   TIMEDRS ATTDRUG ATTHOUSE INCOME
## 1   -0.63    0.27     0.77   0.34
## 2   -0.45   -0.60    -0.79   0.76
## 3   -0.72    0.27    -0.12  -0.51
## 4    0.46    1.13     0.99   1.61
## 5    0.65   -0.60     0.10  -1.36
## 6   -0.45    0.27     0.33  -0.09
summarytools::descr(z.scores,
 stats     = c("min", "max"),
 transpose = TRUE,
 headings  = FALSE) 
## 
##                    Min    Max
## -------------- ------- ------
##        ATTDRUG   -2.33   2.00
##       ATTHOUSE   -4.80   2.56
##         INCOME   -1.36   2.46
##        TIMEDRS   -0.72   6.67
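
The minimum and maximum z scores are a quick way to spot variables with extreme cases, such as TIMEDRS above. The sketch below shows the same idea in base R on a made-up vector: compute z scores by definition and flag values beyond a cutoff. The cutoff of 3 is an assumption chosen for illustration, not a value taken from these notes.

```r
# Toy vector with one extreme value (hypothetical data)
visits <- c(1, 3, 0, 13, 15, 3, 2, 4, 5, 2, 1, 3, 4, 2, 6, 81)

# z score: distance from the mean in standard-deviation units
z <- (visits - mean(visits)) / sd(visits)

cutoff  <- 3   # assumed threshold, adjust as needed
flagged <- which(abs(z) > cutoff)

visits[flagged]
```
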
library(DT)

DT::datatable(z.scores)
library(ggplot2)
ggplot(screen, aes(x = TIMEDRS)) +
  geom_histogram(bins = 30L, fill = "#0c4c8a")

# library(ggpmisc)
ggplot(screen, aes(x = TIMEDRS)) + geom_histogram() + 
geom_vline(xintercept =7.914, color = "red", 
linetype = "dashed") + 
annotate("text", label = "Mean = 7.914", x = 10, y = 100,  color ="black")
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

ggplot(screen, aes(x = TIMEDRS)) +
 geom_histogram(aes(y = after_stat(density))) +
 geom_density(alpha=.5, fill="#0c4c8a") +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

library(plotly)
plot_ly(x = screen$TIMEDRS,  type = "histogram", 
histnorm = "probability")
ggplot(screen, aes(y = TIMEDRS)) + 
  geom_boxplot() 

out <- boxplot.stats(screen$TIMEDRS)$out
out
##  [1] 60 23 39 33 38 34 27 30 25 49 60 27 27 52 24 57 52 58 57 43 37 75 29 30 25
## [26] 37 56 29 37 81 27 23
out_ind <- which(screen$TIMEDRS %in% c(out))
out_ind
##  [1]  40  64  67  76  79  96 102 117 150 163 168 170 178 193 203 206 213 249 274
## [20] 278 285 289 309 342 344 362 367 374 388 404 408 443
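
The values returned in `$out` are the points that fall beyond the boxplot whiskers. With the default setting, that means more than 1.5 times the hinge spread below the lower hinge or above the upper hinge. The sketch below rebuilds those fences by hand on a made-up vector and compares the result with boxplot.stats().

```r
# Toy data with one extreme value (hypothetical)
x <- c(1, 2, 2, 3, 3, 4, 4, 5, 30)

h      <- fivenum(x)    # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]   # distance between the hinges

fence_lo <- h[2] - 1.5 * spread
fence_hi <- h[4] + 1.5 * spread

manual_out <- x[x < fence_lo | x > fence_hi]

# Should agree with the built-in rule
identical(manual_out, boxplot.stats(x)$out)
```
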
plot_ly(y = screen$TIMEDRS, type = 'box') 
plot_ly(y = screen$TIMEDRS, type = 'box')  %>% 
  layout(title = 'Box Plot',
annotations = list( x = -0.01,  y = boxplot.stats(screen$TIMEDRS)$out, 
text = paste(out_ind), showarrow = FALSE,
xanchor = "right"))
ggplot(screen, aes(x = factor(MSTATUS), 
y = TIMEDRS, fill = factor(MSTATUS))) +
  geom_boxplot() +
  theme_minimal()

ggplot(screen) + aes(x =  ATTDRUG) +
  geom_histogram( bins = 6, fill = "#0c4c8a")+
theme_minimal()

ggplot(screen) +
aes(x =  ATTHOUSE) +
geom_histogram( bins = 10, fill = "darkgreen") +
theme_minimal()

plot_ly(y = screen$ATTHOUSE, type = 'box')
screen[c(260,298),]
## # A tibble: 2 × 8
##   SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS  RACE
##   <dbl>   <dbl>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl> <dbl>
## 1   346       2       8        2      1       0       1     1
## 2   407       2       8        2      4       0       1     1
screen2 <- screen[-c(260,298),]

Mahalanobis Distance

library(psych)
# Note: columns 1:5 include the ID column SUBNO along with the four scores
veri <- screen2[,1:5]
md <- mahalanobis(veri, center = colMeans(veri), cov = cov(veri))
head(md,20)
##  [1]  3.785517  4.541493  3.501077  7.281365  5.457240  2.896550  5.807898
##  [8]  3.879478  4.751166  7.415405 10.602100  5.249121  6.073732  3.271885
## [15] 12.316463  4.440749  4.836160  6.362806  4.126524 10.797545
library(psych)
alpha <- .001
cutoff <- (qchisq(p = 1 - alpha, df = ncol(veri)))
cutoff
## [1] 20.51501
ucdegerler <- which(md > cutoff)
veri[ucdegerler, ]
## # A tibble: 9 × 5
##   SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME
##   <dbl>   <dbl>   <dbl>    <dbl>  <dbl>
## 1    48      60       7       24      1
## 2   235      60      10       29      4
## 3   276      57       9       24      2
## 4   291      52       8       19      1
## 5   330      58       7       29      4
## 6   370      57       8       23      4
## 7   398      75       9       33      9
## 8   502      56       8       19      3
## 9   548      81       8       24      9
data_temiz <- veri[-ucdegerler, ]
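
The cutoff comes from a chi-square distribution whose degrees of freedom equal the number of variables entering the distance. The sketch below repeats the procedure on simulated data with three variables and no ID column (a choice made for this illustration) and plants one obvious outlier. The seed and the planted values are arbitrary.

```r
set.seed(42)  # arbitrary seed

# Simulated data: 3 numeric variables, no ID column (hypothetical)
sim <- data.frame(x1 = rnorm(300), x2 = rnorm(300), x3 = rnorm(300))
sim[1, ] <- c(8, -8, 8)   # plant one obvious multivariate outlier

d2 <- mahalanobis(sim, center = colMeans(sim), cov = cov(sim))

# Chi-square cutoff: df equals the number of variables in the distance
cut_3 <- qchisq(1 - 0.001, df = ncol(sim))
cut_5 <- qchisq(1 - 0.001, df = 5)   # the five-variable cutoff used above

which(d2 > cut_3)   # row 1 should be among the flagged cases
```
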

Multivariate Normality Assumption

library(sur)
attach(screen)

skew(screen$TIMEDRS)
## [1] 3.234045
skew(data_temiz$TIMEDRS)
se.skew(TIMEDRS)
## [1] 0.1133494
skew.ratio(TIMEDRS)
## [1] 28.53164
skew(TIMEDRS)/se.skew(TIMEDRS)
## [1] 28.53164
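
The skew ratio is simply the skewness divided by its standard error. The sketch below assumes the common small-sample formula for that standard error and n = 464 (the total in the frequency tables above). It should come out close to the se.skew() value shown, although these notes do not state which formula the sur package uses.

```r
n <- 464   # sample size of screen after cleaning (total in the frequency tables)

# Common small-sample standard error of skewness (assumed formula)
se_skew <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

skew_value <- 3.234045          # skew(TIMEDRS) from the output above
ratio      <- skew_value / se_skew

c(se = se_skew, ratio = ratio)
```

A commonly used rough guide treats absolute ratios above about 2 as a sign of noticeable skew, so a ratio near 28.5 points to a strongly right-skewed TIMEDRS.
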

The jarque.test function tests the null hypothesis that the data do not depart from a normal distribution, so a small p-value indicates non-normality.

library(moments)
library(labelled)
jarque.test(remove_labels(TIMEDRS))
## 
##  Jarque-Bera Normality Test
## 
## data:  remove_labels(TIMEDRS)
## JB = 4034.9, p-value < 2.2e-16
## alternative hypothesis: greater
jarque.test(remove_labels(ATTDRUG))
## 
##  Jarque-Bera Normality Test
## 
## data:  remove_labels(ATTDRUG)
## JB = 5.0552, p-value = 0.07985
## alternative hypothesis: greater
skew.ratio(ATTDRUG)
## [1] -1.10762
jarque.test(remove_labels(ATTHOUSE))
## 
##  Jarque-Bera Normality Test
## 
## data:  remove_labels(ATTHOUSE)
## JB = 61.092, p-value = 5.418e-14
## alternative hypothesis: greater
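
To see what the test measures, the Jarque-Bera statistic can be computed by hand from the sample skewness and kurtosis: JB = n * (S^2 / 6 + (K - 3)^2 / 24), compared against a chi-square distribution with 2 degrees of freedom. The helper `jb_by_hand()` below is a hypothetical sketch written for these notes and applied to simulated data; it is not the source of the moments package.

```r
# Hypothetical helper: Jarque-Bera statistic from moment-based skewness/kurtosis
jb_by_hand <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  s  <- mean(d^3) / m2^1.5   # skewness
  k  <- mean(d^4) / m2^2     # kurtosis (3 for a normal distribution)
  jb <- n * (s^2 / 6 + (k - 3)^2 / 24)
  c(JB = jb, p_value = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(0)
jb_by_hand(rnorm(200))            # sample drawn from a normal distribution
jb_by_hand(rexp(200, rate = 3))   # clearly skewed sample
```

Large skewness or kurtosis far from 3 both push JB up, which is why TIMEDRS (skewness above 3) gives such a large statistic while ATTDRUG does not reject normality at the .05 level.
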
set.seed(0)
normal <- rnorm(200)
non_normal <- rexp(200, rate=3)
par(mfrow=c(1,2))
hist(normal, col='steelblue', main='Normal')
hist(non_normal, col='steelblue', main='Non-normal')

par(mfrow=c(1,2))
qqnorm(normal, main='Normal')
qqline(normal)
qqnorm(non_normal, main='Non-normal')
qqline(non_normal)

ggplot(data=screen, aes(sample=ATTHOUSE))+
  geom_qq()+
  geom_qq_line()

This is an extra note; apparently I wrote it during class :)

#any_na(TRUSA)
#n_miss(TRUSA)
#prop_miss(TRUSA)

#TRUSA %>% is.na() %>% colSums()

#miss_var_summary(TRUSA)
#miss_var_table(TRUSA)
TRUSA$toplam <-rowSums(TRUSA[,3:37],na.rm =TRUE)
TRUSA$toplam
##    [1] 10 18  8 11  6 21 19 18 19 20 12  6  7 28 15 12 26 18  9 25 13  7  5 22
##   [25] 20 16 26 23 10 13  9 27 20  7  6  5  7  5  7  6 30 28 30  6 20  8 29 13
##   [49] 12  5  8  9 11  7 15 12 21 22 12  7  4  6 27 13 22 20  7 15  3 10  6 28
##   [73]  8 12 30 30 21 13 11 12  9 14 19  7 16 13  6 30 27  8  9 18 11 31  5  8
##   [97]  5 10 27  9  3  7  8 16 22  6 18 10  9  3  7 16 14  7 21  8  8  9  7 14
##  [121] 27  7 26 20  9 12 20 29 17 24 23  9  2 26 22 10 21  9 12 22 25 11 13  5
##  [145]  9  6 16 13 10  7 15 11 15 27 21  4 14 12 10  9 13  6  9 11  5 13 15 21
##  [169] 12 26 29 23  8 11 14  9  7  5  8  8  9 27 22 16 15  5  8 19 12  8 22 10
##  [193] 10  6 24 19 16 16  8 17  6 12 24  9  9  8  7 21 18 11  9  7 15  9 26  6
##  [217] 24  7 13 10 31  8 23  5  6 11  8 13  8 15  9  9 15  7 25  9 17  8  6 10
##  [241] 31 28 32 32 11 13 17 28  7 21  4 13  8  7 12 26 25 11  7 18 21 12 16 24
##  [265]  6  6 12  4  8  6  4  7  4  8 21  7 29  3 24 17 12  6 15  7  8 15 11 10
##  [289] 17  8  3  6  9 16 13  6 17 11  4  6 16  8 10  6  6  6 14 26 12  7 16  2
##  [313] 28  4  6 27 17 14  7  3 17  8  6  5  8  9  9 25  4  6  7 28 16  7  7  7
##  [337]  3 13 26  8 25 19  9  8 12 15 18 16 12 13 19 29  5 32 10 10 22  4  6  9
##  [361]  5 10 10  6  6  4 24  8  6  7 20 11 14 15  8 24 23 25 16 21 25  8 10  8
##  [385]  5  9  9  8  6  6 28  7 27  6 13  9  6  5  8 21 20  9  8 10 16 28  4 11
##  [409]  6 24 17 10  5  8 26  9 17 23 19  8  6 15 16 23 21 14 21 13 20  4 17 19
##  [433]  5 28 32  7 22  8 19 20 27 13  8 29 33 16 12 26 22 19 20 11 10  1  1  1
##  [457] 14 22 22 25 14  6 17 28 29 30 28 28 18 22 21 26  7  7 18 11  6 27 18  5
##  [481] 17 23 19 22 15  9 22 29 12 23 27 17 23  8 23 14 18 24 28 28 32 21 15 10
##  [505] 34 20 17 19 14 16  6 17 14  9 20  8 12  7  8 18 17  7 22 11 21 17 26 15
##  [529] 11 27  5  9  4 23  6 23 19 16 20  8  6 14 24  7 19 29  8 22 24 21 10 22
##  [553] 22 13 17 28 28 24 10 15  4 24 22 19 18 16 13  6 17  9 19 13 19  7  8 11
##  [577] 11 19 16 11 19 16 22 22 31 16 30 30 19 25 11  8 19  5 30 12 15 28 20 19
##  [601] 22  3 18 18 19 21 23 24 22  9 32 20  9  9  6 29 20  9 23 26 15 13 14  5
##  [625] 10 15 17 32 31 31 20 10 15 21 22 22 30 31 17 20 32 20 11  9 12 13 10 17
##  [649] 22  6  6 15 21 18  9  6 22 14  9 24 16 10 28 26 15 19  6 18 27 16  6  7
##  [673]  4  8  9 16 30 15 10 10 23 28 18  9 20 12 18 25 31 20 28 27 10 20 24 30
##  [697]  4 23 21 26 20 11 30 16  8  7 32 24  7  9 22 17 24 29 12  7 17 27 11 17
##  [721]  9  8 15  6  9 27 30  7 11 10 11 19 27 15  1  8  8  6  8 14 12 15  6 18
##  [745]  7 15 21 32  9 11 18 22 25 16 30 27 25 16 12 17 14 18 20 18 19 20 21 22
##  [769] 19 29 24 29 23 11 16 11 17 13 23 22  5  8  9  5 10  7  7 14  4 17 22 10
##  [793]  3 11 23 23 27  8  6  3 10  6 10  8  8 15 15  7 21 16 24 10 12  8 24 24
##  [817] 25 18 17 11 13 17 17 12 26 26 15 13  6 15 20 22  8 29 22  7 11 17 10 22
##  [841] 10  2  4 19  8 26 21 26 22 27 29 22 26  7 19 17 25 13 17 22 12 12  9 14
##  [865] 19 20 16 24 33 29 12 25 30 24 16 23 26 15  6 29 29 17 18  7  7 25 31 17
##  [889] 12 15 15  8 14  7 22 24 12  7 18 10 12 16 20 18 14 32 29 29 24 29 17  6
##  [913] 10 13 24 22 18 30 21  8  7 20 27 25 10 23 26 11 11 12 27 24 28 23 26 19
##  [937] 16 18 30 31 21 26 17 29 18  8 16 23 11 12 17 18 17 17 22 20  4 15 13 11
##  [961] 20 20 12 19 30 15 11 20 32 14 28  6 16 17 25 23 16 10 24 29  8  8 23 22
##  [985] 20 26 29 16 19 20 17 28 22 16 29 22 12  2 11 15 21 23 16 17 19 11  6 15
## [1009] 11 15 16  8  9  9 10 24 19 17 29 18 22 15 17 21 20 10 28 30 20 19 22 15
## [1033] 10 16 15 14 28 15 19  9  5 11 19 26  5 16 25 14 32 23 17 25 17 17 18 21
## [1057] 13  6 19 15 18  5 14 25 10 12 12 14  4 11  6 10  8 15 16 18 18  6 11 10
## [1081] 20 17  8 11 15 20  3 13 12  6 19  2  9 22 17 11 11 16  7 28 25 24 25 20
## [1105] 28 31 14 17 11 21 22 17 28 32 12 23 12 18 29 17 14 15 13 22  8 17 11  7
## [1129]  9  9  7 12 27  9  6  7 26 27 23 22 27 10  5 19 14 29 30 25 26 13 13
veri_1 <- TRUSA %>% 
  group_by(TRUSA$toplam) %>% 
  select(CNT)
## Adding missing grouping variables: `TRUSA$toplam`
veri_1
## # A tibble: 1,151 × 2
## # Groups:   TRUSA$toplam [34]
##    `TRUSA$toplam` CNT  
##             <dbl> <chr>
##  1             10 TUR  
##  2             18 TUR  
##  3              8 TUR  
##  4             11 TUR  
##  5              6 TUR  
##  6             21 TUR  
##  7             19 TUR  
##  8             18 TUR  
##  9             19 TUR  
## 10             20 TUR  
## # ℹ 1,141 more rows

🧠 Learning Journal

R is just like a foreign language: it takes constant practice. Today we talked about missing data. This class was the first time I heard of jarque.test(), a new-generation normality test.