MISSING DATA

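As a rule of thumb, if less than 5% of the data are missing, the incomplete cases can be deleted. However, why the data are missing also matters: respondents may have skipped an item because it asked a personal or sensitive question, so the pattern deserves attention. Missing data can also be entirely random. Three mechanisms are distinguished. Data are missing completely at random (MCAR) when missingness is unrelated to any variable. They are missing at random (MAR) when missingness depends on other observed variables. They are missing not at random (MNAR) when missingness depends on the value of the variable itself (for example, on a reading comprehension measure, the students with weaker reading comprehension may be the ones who leave items blank). Little's MCAR test assesses whether the data are MCAR. MAR and MNAR cannot be told apart by a formal test on the observed data alone, because MNAR depends on values that were never observed.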
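The three mechanisms can be made concrete with a small simulation. This is a sketch under assumptions: the variables `age` and `score`, the 20% missingness rates and the age cut-off of 50 are all hypothetical and do not come from the course data. The same variable loses values in three different ways, and the mean of the remaining values is compared with the true mean.

```r
set.seed(1)
n     <- 10000
age   <- rnorm(n, mean = 40, sd = 10)          # fully observed covariate (hypothetical)
score <- 50 + 0.5 * age + rnorm(n, sd = 10)    # variable that will receive missing values

# MCAR: every value has the same 20% chance of being missing
mcar <- score
mcar[runif(n) < 0.20] <- NA

# MAR: the chance of being missing depends on another observed variable (age)
mar <- score
mar[runif(n) < ifelse(age > 50, 0.50, 0.10)] <- NA

# MNAR: the chance of being missing depends on the value itself
# (here the lowest 20% of scores are never reported)
mnar <- score
mnar[score < quantile(score, 0.20)] <- NA

# Mean of the values that remain after dropping the missing ones
round(c(true = mean(score),
        MCAR = mean(mcar, na.rm = TRUE),
        MAR  = mean(mar,  na.rm = TRUE),
        MNAR = mean(mnar, na.rm = TRUE)), 2)

# Under MAR the cases with a missing score differ on the observed variable age
tapply(age, is.na(mar), mean)
```

Under MCAR the observed mean stays close to the true one, while under MNAR it is clearly too high. The MAR mechanism leaves a trace in the observed data (cases with a missing score are older on average), which is what MCAR tests and group comparisons can pick up; the MNAR mechanism depends on the unobserved values themselves and leaves no such trace.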

library(dplyr)     # for data manipulation
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(haven)     # imports SPSS (.sav) files into R
screen <- read_sav("SCREEN.sav")
summary(screen) # after reading the data, summary() shows descriptive statistics such as the minimum and maximum, plus the number of NA's
##      SUBNO          TIMEDRS          ATTDRUG          ATTHOUSE    
##  Min.   :  1.0   Min.   : 0.000   Min.   : 5.000   Min.   : 2.00  
##  1st Qu.:137.0   1st Qu.: 2.000   1st Qu.: 7.000   1st Qu.:21.00  
##  Median :314.0   Median : 4.000   Median : 8.000   Median :24.00  
##  Mean   :317.4   Mean   : 7.901   Mean   : 7.686   Mean   :23.54  
##  3rd Qu.:483.0   3rd Qu.:10.000   3rd Qu.: 9.000   3rd Qu.:27.00  
##  Max.   :758.0   Max.   :81.000   Max.   :10.000   Max.   :35.00  
##                                                    NA's   :1      
##      INCOME         EMPLMNT         MSTATUS           RACE      
##  Min.   : 1.00   Min.   :0.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 2.50   1st Qu.:0.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median : 4.00   Median :0.000   Median :2.000   Median :1.000  
##  Mean   : 4.21   Mean   :0.471   Mean   :1.778   Mean   :1.088  
##  3rd Qu.: 6.00   3rd Qu.:1.000   3rd Qu.:2.000   3rd Qu.:1.000  
##  Max.   :10.00   Max.   :1.000   Max.   :2.000   Max.   :2.000  
##  NA's   :26
library(psych)
describe(screen[,-1])   # more detailed descriptives; column 1 (SUBNO, the case ID) is excluded
##          vars   n  mean    sd median trimmed  mad min max range  skew kurtosis
## TIMEDRS     1 465  7.90 10.95      4    5.61 4.45   0  81    81  3.23    12.88
## ATTDRUG     2 465  7.69  1.16      8    7.71 1.48   5  10     5 -0.12    -0.47
## ATTHOUSE    3 464 23.54  4.48     24   23.62 4.45   2  35    33 -0.45     1.51
## INCOME      4 439  4.21  2.42      4    4.01 2.97   1  10     9  0.58    -0.38
## EMPLMNT     5 465  0.47  0.50      0    0.46 0.00   0   1     1  0.12    -1.99
## MSTATUS     6 465  1.78  0.42      2    1.85 0.00   1   2     1 -1.34    -0.21
## RACE        7 465  1.09  0.28      1    1.00 0.00   1   2     1  2.90     6.40
##            se
## TIMEDRS  0.51
## ATTDRUG  0.05
## ATTHOUSE 0.21
## INCOME   0.12
## EMPLMNT  0.02
## MSTATUS  0.02
## RACE     0.01
library(gtsummary)
screen %>%
  select(2:6) %>%
  tbl_summary(statistic = all_continuous() ~ "{min}, {max}", missing = "always")
## ! Column(s) "EMPLMNT" are class "haven_labelled".
## ℹ This is an intermediate data structure not meant for analysis.
## ℹ Convert columns with `haven::as_factor()`, `labelled::to_factor()`,
##   `labelled::unlabelled()`, and `unclass()`. Failure to convert may have
##   unintended consequences or result in error.
## <https://haven.tidyverse.org/articles/semantics.html>
## <https://larmarange.github.io/labelled/articles/intro_labelled.html#unlabelled>
| Characteristic | N = 465¹ |
|:---|:---|
| Visits to health professionals | 0, 81 |
|     Unknown | 0 |
| Attitudes toward medication | |
|     5 | 13 (2.8%) |
|     6 | 60 (13%) |
|     7 | 126 (27%) |
|     8 | 149 (32%) |
|     9 | 95 (20%) |
|     10 | 22 (4.7%) |
|     Unknown | 0 |
| Attitudes toward housework | 2.0, 35.0 |
|     Unknown | 1 |
| INCOME | 1.00, 10.00 |
|     Unknown | 26 |
| Whether currently employed | |
|     0 | 246 (53%) |
|     1 | 219 (47%) |
|     Unknown | 0 |

¹ Min, Max; n (%)
library(vtable)  # creates summary tables
## Loading required package: kableExtra
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
sumtable(screen, summ = c('notNA(x)', 'min(x)', 'max(x)'))
Summary Statistics

| Variable | NotNA | Min | Max |
|:---|---:|---:|---:|
| SUBNO | 465 | 1 | 758 |
| TIMEDRS | 465 | 0 | 81 |
| ATTDRUG | 465 | 5 | 10 |
| ATTHOUSE | 464 | 2 | 35 |
| INCOME | 439 | 1 | 10 |
| MSTATUS | 465 | 1 | 2 |
| RACE | 465 | 1 | 2 |
st(screen, summ = c('notNA(x)', 'min(x)', 'max(x)'), summ.names = c('Non-missing', 'Minimum', 'Maximum'))
Summary Statistics

| Variable | Non-missing | Minimum | Maximum |
|:---|---:|---:|---:|
| SUBNO | 465 | 1 | 758 |
| TIMEDRS | 465 | 0 | 81 |
| ATTDRUG | 465 | 5 | 10 |
| ATTHOUSE | 464 | 2 | 35 |
| INCOME | 439 | 1 | 10 |
| MSTATUS | 465 | 1 | 2 |
| RACE | 465 | 1 | 2 |
kable(describe(screen[,-1]), format = 'markdown', caption = "Descriptive statistics", digits = 2)   # describe() gives detailed descriptives; kable() prints them as a markdown table
Descriptive statistics

| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| TIMEDRS | 1 | 465 | 7.90 | 10.95 | 4 | 5.61 | 4.45 | 0 | 81 | 81 | 3.23 | 12.88 | 0.51 |
| ATTDRUG | 2 | 465 | 7.69 | 1.16 | 8 | 7.71 | 1.48 | 5 | 10 | 5 | -0.12 | -0.47 | 0.05 |
| ATTHOUSE | 3 | 464 | 23.54 | 4.48 | 24 | 23.62 | 4.45 | 2 | 35 | 33 | -0.45 | 1.51 | 0.21 |
| INCOME | 4 | 439 | 4.21 | 2.42 | 4 | 4.01 | 2.97 | 1 | 10 | 9 | 0.58 | -0.38 | 0.12 |
| EMPLMNT | 5 | 465 | 0.47 | 0.50 | 0 | 0.46 | 0.00 | 0 | 1 | 1 | 0.12 | -1.99 | 0.02 |
| MSTATUS | 6 | 465 | 1.78 | 0.42 | 2 | 1.85 | 0.00 | 1 | 2 | 1 | -1.34 | -0.21 | 0.02 |
| RACE | 7 | 465 | 1.09 | 0.28 | 1 | 1.00 | 0.00 | 1 | 2 | 1 | 2.90 | 6.40 | 0.01 |
library(skimr)   # skimr gives a detailed overview of the data set
skim(screen)
Data summary

| | |
|:---|:---|
| Name | screen |
| Number of rows | 465 |
| Number of columns | 8 |
| Column type frequency: | |
| numeric | 8 |
| Group variables | None |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| SUBNO | 0 | 1.00 | 317.38 | 194.16 | 1 | 137.0 | 314 | 483 | 758 | ▇▆▆▇▁ |
| TIMEDRS | 0 | 1.00 | 7.90 | 10.95 | 0 | 2.0 | 4 | 10 | 81 | ▇▁▁▁▁ |
| ATTDRUG | 0 | 1.00 | 7.69 | 1.16 | 5 | 7.0 | 8 | 9 | 10 | ▃▇▇▅▁ |
| ATTHOUSE | 1 | 1.00 | 23.54 | 4.48 | 2 | 21.0 | 24 | 27 | 35 | ▁▁▅▇▂ |
| INCOME | 26 | 0.94 | 4.21 | 2.42 | 1 | 2.5 | 4 | 6 | 10 | ▆▇▅▃▂ |
| EMPLMNT | 0 | 1.00 | 0.47 | 0.50 | 0 | 0.0 | 0 | 1 | 1 | ▇▁▁▁▇ |
| MSTATUS | 0 | 1.00 | 1.78 | 0.42 | 1 | 2.0 | 2 | 2 | 2 | ▂▁▁▁▇ |
| RACE | 0 | 1.00 | 1.09 | 0.28 | 1 | 1.0 | 1 | 1 | 2 | ▇▁▁▁▁ |
library(DataExplorer) # generates an automatic HTML report on the data set
create_report(screen)
## Output created: report.html
library(expss)
## Loading required package: maditr
## 
## To modify variables or add new variables:
##              let(mtcars, new_var = 42, new_var2 = new_var*hp) %>% head()
## 
## Attaching package: 'maditr'
## The following objects are masked from 'package:data.table':
## 
##     copy, dcast, let, melt
## The following object is masked from 'package:skimr':
## 
##     to_long
## The following objects are masked from 'package:dplyr':
## 
##     between, coalesce, first, last
## 
## Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
##  To return to the console output, use 'expss_output_default()'.
## 
## Attaching package: 'expss'
## The following objects are masked from 'package:data.table':
## 
##     copy, fctr, like
## The following object is masked from 'package:DataExplorer':
## 
##     split_columns
## The following objects are masked from 'package:gtsummary':
## 
##     contains, vars, where
## The following objects are masked from 'package:haven':
## 
##     is.labelled, read_spss
## The following objects are masked from 'package:dplyr':
## 
##     compute, contains, na_if, recode, vars, where
screen <- expss::drop_var_labs(screen)   # drop the SPSS variable labels imported by read_sav()
head(screen)
## # A tibble: 6 × 8
##   SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS  RACE
##   <dbl>   <dbl>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl> <dbl>
## 1     1       1       8       27      5       1       2     1
## 2     2       3       7       20      6       0       2     1
## 3     3       0       8       23      3       0       2     1
## 4     4      13       9       28      8       1       2     1
## 5     5      15       7       24      1       1       2     1
## 6     6       3       8       25      4       0       2     1
library(naniar)  # missing data are examined with naniar and ggplot2
## 
## Attaching package: 'naniar'
## The following object is masked from 'package:expss':
## 
##     is_na
## The following object is masked from 'package:skimr':
## 
##     n_complete
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:expss':
## 
##     vars
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
any_na(screen)
## [1] TRUE
n_miss(screen)     # number of missing values
## [1] 27
prop_miss(screen)     # proportion of missing cells: 27 / (465 * 8)
## [1] 0.007258065
screen %>% is.na() %>% colSums()   # missing values per column
##    SUBNO  TIMEDRS  ATTDRUG ATTHOUSE   INCOME  EMPLMNT  MSTATUS     RACE 
##        0        0        0        1       26        0        0        0
miss_var_summary(screen)
## # A tibble: 8 × 3
##   variable n_miss pct_miss
##   <chr>     <int>    <num>
## 1 INCOME       26    5.59 
## 2 ATTHOUSE      1    0.215
## 3 SUBNO         0    0    
## 4 TIMEDRS       0    0    
## 5 ATTDRUG       0    0    
## 6 EMPLMNT       0    0    
## 7 MSTATUS       0    0    
## 8 RACE          0    0
miss_var_table(screen)
## # A tibble: 3 × 3
##   n_miss_in_var n_vars pct_vars
##           <int>  <int>    <dbl>
## 1             0      6     75  
## 2             1      1     12.5
## 3            26      1     12.5
miss_case_summary(screen)
## # A tibble: 465 × 3
##     case n_miss pct_miss
##    <int>  <int>    <dbl>
##  1    52      1     12.5
##  2    64      1     12.5
##  3    69      1     12.5
##  4    77      1     12.5
##  5   118      1     12.5
##  6   135      1     12.5
##  7   161      1     12.5
##  8   172      1     12.5
##  9   173      1     12.5
## 10   174      1     12.5
## # ℹ 455 more rows
miss_case_table(screen)
## # A tibble: 2 × 3
##   n_miss_in_case n_cases pct_cases
##            <int>   <int>     <dbl>
## 1              0     438     94.2 
## 2              1      27      5.81
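The percentages reported by miss_var_summary() are the number of missing values divided by the number of rows, times 100 (for INCOME, 26 / 465 * 100 ≈ 5.59). The base R sketch below reproduces that calculation and flags variables above the 5% rule of thumb mentioned at the start. The helper name `missing_report` and its default threshold are illustrative choices, and the built-in `airquality` data set stands in for SCREEN.sav so that the example runs on its own.

```r
# Hypothetical helper: per-variable missing counts, percentages and a threshold flag
missing_report <- function(data, threshold = 5) {
  n_miss   <- colSums(is.na(data))
  pct_miss <- 100 * n_miss / nrow(data)
  out <- data.frame(variable        = names(data),
                    n_miss          = unname(n_miss),
                    pct_miss        = round(unname(pct_miss), 2),
                    above_threshold = unname(pct_miss > threshold))
  out[order(-out$n_miss), ]   # most incomplete variables first
}

report <- missing_report(airquality)
report

# Number of cases with no missing value at all (what listwise deletion keeps)
sum(complete.cases(airquality))
```

Applied to `screen`, the same helper would flag INCOME (5.59%) but not ATTHOUSE (about 0.2%), in line with the miss_var_summary() output above.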
library(rlang)
## 
## Attaching package: 'rlang'
## The following object is masked from 'package:expss':
## 
##     is_na
## The following object is masked from 'package:maditr':
## 
##     :=
## The following object is masked from 'package:data.table':
## 
##     :=
library(UpSetR)   # gg_miss_upset() draws its plot with UpSetR
gg_miss_upset(screen)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the UpSetR package.
##   Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## ℹ The deprecated feature was likely used in the UpSetR package.
##   Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

vis_miss(screen)

mcar_test(data = screen[, c(2, 3, 4, 5, 7, 8)])  # Little's MCAR test on TIMEDRS, ATTDRUG, ATTHOUSE, INCOME, MSTATUS and RACE (SUBNO and EMPLMNT excluded)
## # A tibble: 1 × 4
##   statistic    df p.value missing.patterns
##       <dbl> <dbl>   <dbl>            <int>
## 1      18.7    10  0.0440                3
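For reference, Little's (1988) MCAR statistic, reported by mcar_test() as `statistic`, compares the observed means in each missing-data pattern with the overall maximum-likelihood (EM) estimates. Here $J$ is the number of patterns, $n_j$ the number of cases in pattern $j$, $p_j$ the number of variables observed in that pattern, $p$ the total number of variables, $\bar{\mathbf{y}}_j$ the observed-variable means in pattern $j$, and $\hat{\boldsymbol{\mu}}_j$, $\hat{\boldsymbol{\Sigma}}_j$ the matching parts of the ML mean vector and covariance matrix. Under MCAR the statistic is approximately chi-square distributed:

```latex
d^2 = \sum_{j=1}^{J} n_j \left(\bar{\mathbf{y}}_j - \hat{\boldsymbol{\mu}}_j\right)^{\top}
      \hat{\boldsymbol{\Sigma}}_j^{-1}
      \left(\bar{\mathbf{y}}_j - \hat{\boldsymbol{\mu}}_j\right),
\qquad \mathrm{df} = \sum_{j=1}^{J} p_j - p

% Six variables tested above (p = 6) in J = 3 patterns:
% complete cases p_1 = 6, INCOME missing p_2 = 5, ATTHOUSE missing p_3 = 5
\mathrm{df} = (6 + 5 + 5) - 6 = 10
```

The df of 10 matches the output above. With p = 0.044 < .05 the MCAR hypothesis is rejected at the 5% level, so the missing values should not be assumed to be MCAR; the test itself cannot tell whether they are MAR or MNAR.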
screen2 <- screen
screen2$INCOME_m <- screen2$INCOME   # copy of INCOME; its missingness defines the two groups compared below
library(finalfit)   # finalfit: checks with t-tests whether missingness in INCOME is related to other variables
explanatory <- c("TIMEDRS", "ATTDRUG", "ATTHOUSE")
dependent <- "INCOME_m"
screen2 %>%
  missing_compare(dependent, explanatory) %>%
  knitr::kable(row.names = FALSE, align = c("l", "l", "r", "r", "r"),
               caption = "Mean comparison of cases with and without missing INCOME")
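Mean comparison of cases with and without missing INCOME

| Missing data analysis: INCOME_m | | Not missing | Missing | p |
|:---|:---|---:|---:|---:|
| TIMEDRS | Mean (SD) | 7.9 (11.1) | 7.6 (7.4) | 0.891 |
| ATTDRUG | Mean (SD) | 7.7 (1.2) | 7.9 (1.0) | 0.368 |
| ATTHOUSE | Mean (SD) | 23.5 (4.5) | 23.7 (4.2) | 0.860 |

All p-values are well above .05, so cases with missing INCOME do not differ significantly from the other cases on TIMEDRS, ATTDRUG or ATTHOUSE.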
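The idea behind this comparison can be sketched in base R: split the cases by whether the target variable is missing and compare the means of each other variable with a two-sample t-test. This is an illustration, not finalfit's own code. The helper name `compare_missing` is hypothetical, pooled-variance t-tests (`var.equal = TRUE`) are an assumption about how such p-values are computed, and `airquality` (where Ozone has missing values) replaces SCREEN.sav so the example is self-contained.

```r
# Hypothetical helper: compare cases with and without a missing value on `target`
compare_missing <- function(data, target, vars) {
  miss <- is.na(data[[target]])
  rows <- lapply(vars, function(v) {
    x  <- data[[v]]
    tt <- t.test(x[!miss], x[miss], var.equal = TRUE)  # pooled-variance two-sample t-test
    data.frame(variable    = v,
               not_missing = mean(x[!miss], na.rm = TRUE),
               missing     = mean(x[miss],  na.rm = TRUE),
               p           = tt$p.value)
  })
  do.call(rbind, rows)
}

res <- compare_missing(airquality, target = "Ozone", vars = c("Temp", "Wind"))
res
```

With the course data loaded, `compare_missing(screen, "INCOME", c("TIMEDRS", "ATTDRUG", "ATTHOUSE"))` runs the same kind of comparison. A small p-value would suggest that missingness in the target depends on that variable, which is evidence against MCAR.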

METHODS FOR HANDLING MISSING DATA

  • DELETION-BASED METHODS
      A. LISTWISE DELETION
      B. PAIRWISE DELETION

  • IMPUTATION-BASED METHODS
      A. MEAN IMPUTATION
      B. MEDIAN IMPUTATION
      C. REGRESSION-BASED IMPUTATION
      D. EXPECTATION MAXIMIZATION (EM)
      E. MULTIPLE IMPUTATION

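NOTE: For skewed variables, imputing the median is preferable to imputing the mean, because the median is not pulled toward extreme values. Both methods, however, reduce the variability of the variable, since every missing case receives the same value.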
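A short simulation shows both points: single-value imputation shrinks the spread, and the median stays near the bulk of a skewed distribution. The exponential variable, the 20% missingness and the helper `impute_with` are hypothetical choices made only for this sketch.

```r
set.seed(42)
x <- rexp(200, rate = 0.2)   # hypothetical right-skewed variable (population mean 5, median about 3.5)
x[sample(200, 40)] <- NA     # make 20% of the values missing completely at random

# Replace every missing value with one summary statistic of the observed values
impute_with <- function(v, stat) {
  v[is.na(v)] <- stat(v, na.rm = TRUE)
  v
}
x_mean   <- impute_with(x, mean)
x_median <- impute_with(x, median)

# Centre and spread before and after imputation
round(rbind(
  observed       = c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE),
                     sd = sd(x, na.rm = TRUE)),
  mean_imputed   = c(mean(x_mean), median(x_mean), sd(x_mean)),
  median_imputed = c(mean(x_median), median(x_median), sd(x_median))
), 2)
```

Both imputed versions have a smaller standard deviation than the observed values, because 40 cases now share one value. Mean imputation leaves the mean unchanged, while median imputation fills in a value that is not pulled upward by the long right tail. EM and multiple imputation are designed to avoid this artificial loss of variability.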

na.omit(screen)  # listwise deletion: drop every row that has at least one missing value
## # A tibble: 438 × 8
##    SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS  RACE
##    <dbl>   <dbl>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl> <dbl>
##  1     1       1       8       27      5       1       2     1
##  2     2       3       7       20      6       0       2     1
##  3     3       0       8       23      3       0       2     1
##  4     4      13       9       28      8       1       2     1
##  5     5      15       7       24      1       1       2     1
##  6     6       3       8       25      4       0       2     1
##  7     7       2       7       30      6       1       2     1
##  8     8       0       7       24      6       1       2     1
##  9     9       7       7       20      2       1       2     1
## 10    10       4       8       30      8       0       1     1
## # ℹ 428 more rows
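na.omit() performs listwise deletion: 438 of the 465 cases remain. Pairwise deletion, the other deletion-based method in the list, instead uses every case observed on the pair of variables being analysed, so each statistic can rest on a different n. A sketch with base R's cor() on the built-in `airquality` data, used here instead of SCREEN.sav so the example is self-contained:

```r
aq <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Listwise: only rows complete on all four variables are used
nrow(na.omit(aq))
r_listwise <- cor(aq, use = "complete.obs")

# Pairwise: each correlation uses the rows complete on that pair only
r_pairwise <- cor(aq, use = "pairwise.complete.obs")

# Number of cases behind each pairwise correlation
obs     <- (!is.na(aq)) + 0   # 1 = observed, 0 = missing
n_pairs <- crossprod(obs)
n_pairs

# The two approaches give slightly different estimates
round(r_listwise - r_pairwise, 3)
```

Pairwise deletion keeps more data, but because each correlation is based on a different subset of cases, the resulting matrix is not guaranteed to be positive definite.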
screen3 <- screen
screen3$INCOME[is.na(screen3$INCOME)] <- mean(screen3$INCOME, na.rm = TRUE)   # mean imputation for INCOME
summary(screen3$INCOME)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.00    4.00    4.21    6.00   10.00
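Regression-based imputation (item C in the list above) replaces each missing value with the value predicted from other variables. A minimal sketch on `airquality`, assuming that Ozone can be predicted linearly from Temp and Wind (a modelling choice made only for this example):

```r
aq   <- airquality
fit  <- lm(Ozone ~ Temp + Wind, data = aq)   # fitted on complete cases (lm drops rows with NA)
miss <- is.na(aq$Ozone)

aq$Ozone_reg <- aq$Ozone
aq$Ozone_reg[miss] <- predict(fit, newdata = aq[miss, ])   # fill only the missing values

summary(aq$Ozone_reg)
c(sd_observed = sd(aq$Ozone, na.rm = TRUE), sd_imputed = sd(aq$Ozone_reg))
```

The predicted values lie exactly on the regression line, so this method also understates variability; stochastic regression imputation adds a random residual to each prediction to counter that.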
library(mvdalab)
## 
## Attaching package: 'mvdalab'
## The following object is masked from 'package:psych':
## 
##     smc
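dat <- introNAs(iris, percent = 25)   # add 25% missing values to iris at random (no seed is set, so results vary between runs)
dat_EM <- imputeEM(dat[,-5])          # EM imputation of the four numeric columns (Species excluded)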
dat_EM

##     Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1       5.100000    3.500000     1.400000   0.2605775
## 2       4.900000    3.000000     1.400000   0.2000000
## 3       4.700000    3.200000     1.300000   0.2000000
## 4       4.600000    3.100000     1.500000   0.1675099
## 5       5.000000    3.600000     1.350640   0.2000000
## 6       5.400000    3.900000     1.700000   0.4000000
## 7       4.600000    3.400000     1.400000   0.3000000
## 8       5.000000    3.464630     1.500000   0.2000000
## 9       4.400000    2.816480     1.400000   0.1311091
## 10      4.900000    3.100000     1.500000   0.1000000
## 11      5.400000    3.700000     1.650034   0.2000000
## 12      4.800000    3.400000     1.600000   0.2000000
## 13      4.569955    3.000000     1.400000   0.1580321
## 14      4.300000    3.000000     1.100000   0.1000000
## 15      5.800000    4.000000     1.780257   0.2000000
## 16      5.700000    4.160741     1.500000   0.4000000
## 17      5.400000    3.900000     1.300000   0.4000000
## 18      5.100000    3.500000     1.400000   0.3000000
## 19      5.318467    3.800000     1.700000   0.3000000
## 20      5.100000    3.800000     1.500000   0.3000000
## 21      5.400000    3.400000     1.700000   0.2000000
## 22      5.100000    3.700000     1.355922   0.1961955
## 23      4.600000    3.166981     1.000000   0.2000000
## 24      5.065494    3.300000     1.700000   0.5000000
## 25      4.800000    3.400000     1.900000   0.2000000
## 26      5.000000    3.000000     1.600000   0.3953391
## 27      5.000000    3.400000     1.600000   0.4000000
## 28      5.200000    3.500000     1.500000   0.3275324
## 29      5.200000    3.400000     1.400000   0.2000000
## 30      4.755735    3.200000     1.454014   0.2000000
## 31      4.750937    3.100000     1.600000   0.2567006
## 32      5.400000    3.618041     1.960561   0.4000000
## 33      5.200000    4.100000     1.500000   0.1000000
## 34      5.500000    4.200000     1.400000   0.2504571
## 35      4.693085    3.100000     1.500000   0.2000000
## 36      5.000000    3.200000     1.200000   0.2000000
## 37      5.500000    3.500000     2.447001   0.6724876
## 38      4.900000    3.600000     1.400000   0.1000000
## 39      5.783064    3.000000     3.770185   1.2291526
## 40      4.914848    3.400000     1.500000   0.2000000
## 41      5.000000    3.500000     1.300000   0.1936225
## 42      4.500000    2.300000     1.300000   0.3000000
## 43      4.400000    3.200000     1.300000   0.2000000
## 44      5.000000    3.500000     1.600000   0.6000000
## 45      5.100000    3.800000     1.900000   0.4000000
## 46      4.800000    3.000000     1.400000   0.3000000
## 47      5.100000    3.800000     1.201324   0.1345443
## 48      4.600000    3.033520     1.400000   0.2000000
## 49      4.826266    3.279563     1.500000   0.2000000
## 50      5.000000    2.883139     2.421321   0.6107047
## 51      7.000000    3.200000     4.700000   1.9259951
## 52      6.400000    3.200000     4.499329   1.5000000
## 53      6.217499    2.990440     4.900000   1.5000000
## 54      5.500000    2.300000     4.302173   1.4123021
## 55      6.500000    2.800000     4.600000   1.5000000
## 56      5.700000    2.800000     4.500000   1.3000000
## 57      6.300000    3.300000     4.319964   1.5017695
## 58      4.900000    2.400000     3.300000   0.8917802
## 59      6.600000    3.441122     4.600000   1.6622401
## 60      5.200000    2.700000     3.657515   1.4000000
## 61      5.000000    2.000000     3.500000   1.0888674
## 62      5.900000    3.000000     4.200000   1.5000000
## 63      6.000000    3.324799     4.000000   1.0000000
## 64      6.100000    2.874835     4.700000   1.6091982
## 65      5.600000    2.900000     3.600000   1.3000000
## 66      6.700000    3.100000     4.759098   1.4000000
## 67      5.600000    2.502050     4.303934   1.5000000
## 68      5.800000    3.317375     3.266934   1.0000000
## 69      6.200000    2.200000     4.500000   1.5000000
## 70      5.600000    2.500000     3.900000   1.1000000
## 71      5.900000    3.200000     4.800000   1.8000000
## 72      6.100000    2.800000     4.000000   1.3000000
## 73      6.300000    2.978249     4.900000   1.7159807
## 74      5.848407    2.800000     4.700000   1.2000000
## 75      6.101488    3.062212     4.300000   1.4722477
## 76      6.600000    3.000000     4.400000   1.7343988
## 77      6.800000    2.800000     6.070307   2.2512627
## 78      6.316377    3.000000     4.813281   1.7000000
## 79      6.000000    2.900000     4.500000   1.5000000
## 80      5.827278    3.093573     3.712993   1.2109524
## 81      5.500000    2.752574     3.800000   1.1000000
## 82      5.500000    2.400000     3.700000   1.0000000
## 83      5.800000    2.700000     3.900000   1.2000000
## 84      6.000000    2.566669     5.100000   1.7340689
## 85      5.400000    2.169454     4.500000   1.5000000
## 86      6.000000    3.400000     3.578953   1.1753760
## 87      6.700000    3.612269     4.700000   1.5000000
## 88      6.300000    2.300000     4.400000   1.3000000
## 89      5.600000    3.000000     3.415460   1.0689911
## 90      5.500000    2.500000     4.000000   1.3000000
## 91      5.500000    2.494021     4.400000   1.2000000
## 92      6.100000    3.007282     4.600000   1.4000000
## 93      5.800000    2.975839     4.000000   1.2000000
## 94      5.000000    2.300000     3.300000   1.0000000
## 95      5.600000    2.700000     4.200000   1.3000000
## 96      5.700000    3.000000     4.200000   1.2000000
## 97      5.700000    2.900000     4.200000   1.3000000
## 98      6.200000    3.183001     4.300000   1.4843275
## 99      5.100000    2.500000     3.000000   1.1000000
## 100     5.700000    2.849359     3.891071   1.3000000
## 101     6.300000    3.300000     6.000000   1.8766559
## 102     5.800000    2.360670     5.100000   1.6985912
## 103     5.827278    3.093573     3.712993   1.2109524
## 104     6.300000    2.900000     5.600000   1.8000000
## 105     6.500000    3.000000     5.800000   2.2000000
## 106     7.600000    3.000000     6.600000   2.6717286
## 107     4.900000    1.741430     4.500000   1.3250272
## 108     7.300000    2.900000     6.300000   2.4978180
## 109     6.700000    2.500000     5.800000   1.8000000
## 110     7.200000    3.600000     6.100000   2.5000000
## 111     6.539349    2.972621     5.100000   2.0000000
## 112     6.400000    2.700000     5.373462   1.9000000
## 113     6.800000    3.048538     5.687773   2.1000000
## 114     5.700000    2.500000     5.000000   2.0000000
## 115     5.800000    2.800000     5.100000   2.4000000
## 116     6.400000    3.200000     5.300000   2.3000000
## 117     6.500000    3.000000     5.500000   1.8000000
## 118     7.700000    3.800000     6.700000   2.5230490
## 119     7.700000    2.600000     6.900000   2.3000000
## 120     6.000000    2.617960     5.000000   1.6984165
## 121     6.900000    2.913018     6.120003   2.3000000
## 122     5.688659    2.800000     3.892386   1.2680470
## 123     7.700000    4.122294     5.563648   2.0000000
## 124     6.300000    2.700000     4.900000   1.8000000
## 125     6.700000    2.942644     5.678090   2.1000000
## 126     7.200000    3.200000     6.000000   1.8000000
## 127     6.223111    2.800000     4.800000   1.8000000
## 128     6.100000    3.000000     4.900000   1.8000000
## 129     6.400000    2.800000     5.600000   2.1000000
## 130     7.200000    3.000000     5.800000   2.3146487
## 131     7.400000    3.495737     6.100000   2.3389366
## 132     7.900000    3.800000     6.400000   2.5453839
## 133     6.400000    2.800000     5.627280   2.2000000
## 134     6.122595    2.800000     5.100000   1.5000000
## 135     6.100000    2.600000     4.606057   1.4000000
## 136     7.700000    3.000000     6.100000   2.3000000
## 137     6.300000    3.400000     5.600000   2.4000000
## 138     6.400000    3.100000     5.500000   1.8636295
## 139     6.000000    3.000000     4.800000   1.5558189
## 140     6.653771    2.947138     5.400000   2.1000000
## 141     6.993250    3.100000     5.600000   2.4000000
## 142     6.900000    3.244158     5.100000   2.3000000
## 143     5.800000    2.700000     5.100000   1.9000000
## 144     6.800000    3.200000     5.900000   2.1044951
## 145     6.700000    2.979911     5.700000   2.0721551
## 146     6.700000    2.978555     5.200000   2.3000000
## 147     6.430902    3.026141     5.000000   1.7851680
## 148     6.500000    3.000000     5.200000   2.0000000
## 149     6.200000    2.171758     6.052220   2.3000000
## 150     5.900000    2.374774     5.100000   1.8000000
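imputeEM() fills the gaps with the expectation-maximization algorithm. The idea can be shown from scratch for the simplest case: two normally distributed variables, x fully observed and y partly missing. The E-step replaces the missing parts of the sufficient statistics (y and y squared) with their expected values given x and the current parameters; the M-step re-estimates the means and covariance matrix from those completed statistics; the two steps repeat until the estimates stop changing. This is an illustrative sketch under those assumptions, not mvdalab's implementation; `em_bivariate` and the simulated data are hypothetical.

```r
# EM for a bivariate normal: x fully observed, y partly missing
em_bivariate <- function(x, y, tol = 1e-10, max_iter = 1000) {
  miss <- is.na(y)
  mu <- c(mean(x), mean(y, na.rm = TRUE))           # start from complete-case estimates
  S  <- cov(cbind(x, y), use = "complete.obs")
  for (iter in seq_len(max_iter)) {
    # E-step: regression of y on x implied by the current mu and S
    beta  <- S[1, 2] / S[1, 1]
    alpha <- mu[2] - beta * mu[1]
    s2    <- S[2, 2] - beta * S[1, 2]                # residual variance of y given x
    ey  <- ifelse(miss, alpha + beta * x, y)                # E[y | x]
    ey2 <- ifelse(miss, (alpha + beta * x)^2 + s2, y^2)     # E[y^2 | x]
    # M-step: maximum-likelihood mean vector and covariance matrix (divisor n)
    mu_new <- c(mean(x), mean(ey))
    sxy    <- mean(x * ey) - mu_new[1] * mu_new[2]
    S_new  <- matrix(c(mean(x^2) - mu_new[1]^2, sxy,
                       sxy, mean(ey2) - mu_new[2]^2), nrow = 2)
    change <- max(abs(c(mu_new - mu, S_new - S)))
    mu <- mu_new
    S  <- S_new
    if (change < tol) break
  }
  list(mu = mu, sigma = S, iterations = iter, y_imputed = ey)
}

set.seed(7)
x <- rnorm(300)
y <- 2 + 1.5 * x + rnorm(300)
y[x > 1 & runif(300) < 0.7] <- NA   # MAR: cases with large x lose their y more often

fit <- em_bivariate(x, y)
c(complete_case_mean = mean(y, na.rm = TRUE), em_mean = fit$mu[2])
```

The EM estimate of the mean of y is higher than the complete-case mean: the cases that lost their y have large x, and x and y are positively correlated, so EM uses that relationship instead of ignoring those cases.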
library(mice) # multiple imputation with mice
## 
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
## 
##     filter
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
md.pattern(screen)   # table and plot of the missing-data patterns

##     SUBNO TIMEDRS ATTDRUG EMPLMNT MSTATUS RACE ATTHOUSE INCOME   
## 438     1       1       1       1       1    1        1      1  0
## 26      1       1       1       1       1    1        1      0  1
## 1       1       1       1       1       1    1        0      1  1
##         0       0       0       0       0    0        1     26 27
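imputed_data <- mice(screen, m = 5, maxit = 50, method = 'pmm', seed = 50)
# Note: the log below comes from a run in which the argument was misspelled as `maksit`;
# mice() did not use it and fell back to the default maxit = 5, hence only 5 iterations.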
## 
##  iter imp variable
##   1   1  ATTHOUSE  INCOME
##   1   2  ATTHOUSE  INCOME
##   1   3  ATTHOUSE  INCOME
##   1   4  ATTHOUSE  INCOME
##   1   5  ATTHOUSE  INCOME
##   2   1  ATTHOUSE  INCOME
##   2   2  ATTHOUSE  INCOME
##   2   3  ATTHOUSE  INCOME
##   2   4  ATTHOUSE  INCOME
##   2   5  ATTHOUSE  INCOME
##   3   1  ATTHOUSE  INCOME
##   3   2  ATTHOUSE  INCOME
##   3   3  ATTHOUSE  INCOME
##   3   4  ATTHOUSE  INCOME
##   3   5  ATTHOUSE  INCOME
##   4   1  ATTHOUSE  INCOME
##   4   2  ATTHOUSE  INCOME
##   4   3  ATTHOUSE  INCOME
##   4   4  ATTHOUSE  INCOME
##   4   5  ATTHOUSE  INCOME
##   5   1  ATTHOUSE  INCOME
##   5   2  ATTHOUSE  INCOME
##   5   3  ATTHOUSE  INCOME
##   5   4  ATTHOUSE  INCOME
##   5   5  ATTHOUSE  INCOME
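mice() only creates the m completed data sets; the usual next steps are to fit the analysis model in each of them with with() and to combine the results with pool() (Rubin's rules). Below is a self-contained sketch on mice's built-in `nhanes` data. The `id` column is added on purpose to mirror SUBNO: mice uses every column as a predictor by default, so an ID variable has to be switched off in the predictor matrix. The model `chl ~ age + bmi`, `maxit = 10` and the seed are illustrative choices.

```r
library(mice)

dat <- data.frame(id = seq_len(nrow(nhanes)), nhanes)   # hypothetical ID column, like SUBNO

pred <- make.predictorMatrix(dat)
pred[, "id"] <- 0                                        # never use the ID to predict other variables

imp <- mice(dat, m = 5, maxit = 10, method = "pmm",
            predictorMatrix = pred, seed = 123, printFlag = FALSE)

fits   <- with(imp, lm(chl ~ age + bmi))   # same model in each of the 5 completed data sets
pooled <- pool(fits)                       # combine the estimates with Rubin's rules
summary(pooled)

head(mice::complete(imp, 1))               # first completed data set
```

For the course data the same pattern would be `pred <- make.predictorMatrix(screen); pred[, "SUBNO"] <- 0` before calling mice().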

:)) LEARNING JOURNAL :))

This week in class we learned about missing data. We discussed how missing data can sometimes be completely random, sometimes related to other variables, and sometimes dependent on the very variable the person did not answer. This is where we met the concepts of MCAR, MAR and MNAR. I found it especially hard to tell these concepts apart.

I strongly felt how hard it is to take this course without first taking the introductory R course. Gaps in my basic knowledge slowed down my understanding and made the course harder. I struggled a lot when it came to applying the general logic of the operations in R. In particular, the large number of functions used, each package producing a different kind of output, and occasionally running into warnings or error messages wore me out. Since this was the first class, I think this is normal, but I still feel that getting used to R will take time. Even so, I believe that with more practice I will gain a better command of both R and the course content. So the first class was a somewhat tiring but also instructive start for me :))