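If less than 5% of the data are missing, the affected cases can usually be deleted. However, the reason the data are missing also matters. For example, a respondent may have skipped an item because the question was personal, so this needs attention. Missingness can also be completely random. Three mechanisms are distinguished: MCAR (missing completely at random), where missingness is unrelated to any variable; MAR (missing at random), where missingness depends on other observed variables; and MNAR (missing not at random), where missingness depends on the value of the variable itself (for example, a variable measuring reading comprehension proficiency). Whether the data are MCAR can be checked with Little's MCAR test; MAR and MNAR cannot be told apart from the observed data alone.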
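To make the three mechanisms concrete, here is a minimal simulation sketch in base R. The variables (`age`, `income`), the cut-offs, and the missingness probabilities are all hypothetical and chosen only for illustration; they are not taken from the SCREEN data.

```r
# Simulate the three missingness mechanisms on a toy data set.
# All names, cut-offs and probabilities are hypothetical.
set.seed(1)
n      <- 1000
age    <- round(runif(n, 20, 60))
income <- round(rnorm(n, mean = 5, sd = 2), 1)

# MCAR: every income value has the same 10% chance of being missing.
income_mcar <- ifelse(runif(n) < 0.10, NA, income)

# MAR: the chance of a missing income depends only on the observed variable age.
p_mar      <- ifelse(age > 45, 0.30, 0.05)
income_mar <- ifelse(runif(n) < p_mar, NA, income)

# MNAR: the chance of a missing income depends on income itself.
p_mnar      <- ifelse(income > 6, 0.40, 0.05)
income_mnar <- ifelse(runif(n) < p_mnar, NA, income)

# Compare the mean of the observed values under each mechanism.
c(true = mean(income),
  mcar = mean(income_mcar, na.rm = TRUE),
  mar  = mean(income_mar,  na.rm = TRUE),
  mnar = mean(income_mnar, na.rm = TRUE))
```

In this setup high incomes go missing more often under MNAR, so the mean of the remaining values should fall below the true mean, which is why the mechanism matters before deleting or imputing.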
library(dplyr) # for data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(haven) # imports SPSS files into the R environment.
screen <- read_sav("SCREEN.sav")
summary(screen) # after reading the data set, summary() displays descriptive statistics such as the minimum and maximum.
## SUBNO TIMEDRS ATTDRUG ATTHOUSE
## Min. : 1.0 Min. : 0.000 Min. : 5.000 Min. : 2.00
## 1st Qu.:137.0 1st Qu.: 2.000 1st Qu.: 7.000 1st Qu.:21.00
## Median :314.0 Median : 4.000 Median : 8.000 Median :24.00
## Mean :317.4 Mean : 7.901 Mean : 7.686 Mean :23.54
## 3rd Qu.:483.0 3rd Qu.:10.000 3rd Qu.: 9.000 3rd Qu.:27.00
## Max. :758.0 Max. :81.000 Max. :10.000 Max. :35.00
## NA's :1
## INCOME EMPLMNT MSTATUS RACE
## Min. : 1.00 Min. :0.000 Min. :1.000 Min. :1.000
## 1st Qu.: 2.50 1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000
## Median : 4.00 Median :0.000 Median :2.000 Median :1.000
## Mean : 4.21 Mean :0.471 Mean :1.778 Mean :1.088
## 3rd Qu.: 6.00 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:1.000
## Max. :10.00 Max. :1.000 Max. :2.000 Max. :2.000
## NA's :26
library(psych)
describe(screen[,-1])
## vars n mean sd median trimmed mad min max range skew kurtosis
## TIMEDRS 1 465 7.90 10.95 4 5.61 4.45 0 81 81 3.23 12.88
## ATTDRUG 2 465 7.69 1.16 8 7.71 1.48 5 10 5 -0.12 -0.47
## ATTHOUSE 3 464 23.54 4.48 24 23.62 4.45 2 35 33 -0.45 1.51
## INCOME 4 439 4.21 2.42 4 4.01 2.97 1 10 9 0.58 -0.38
## EMPLMNT 5 465 0.47 0.50 0 0.46 0.00 0 1 1 0.12 -1.99
## MSTATUS 6 465 1.78 0.42 2 1.85 0.00 1 2 1 -1.34 -0.21
## RACE 7 465 1.09 0.28 1 1.00 0.00 1 2 1 2.90 6.40
## se
## TIMEDRS 0.51
## ATTDRUG 0.05
## ATTHOUSE 0.21
## INCOME 0.12
## EMPLMNT 0.02
## MSTATUS 0.02
## RACE 0.01
library(gtsummary)
screen %>%
select(2:6) %>%
tbl_summary(statistic=all_continuous() ~ c ("{min}, {max}"), missing = "always" )
## ! Column(s) "EMPLMNT" are class "haven_labelled".
## ℹ This is an intermediate data structure not meant for analysis.
## ℹ Convert columns with `haven::as_factor()`, `labelled::to_factor()`,
## `labelled::unlabelled()`, and `unclass()`. Failure to convert may have
## unintended consequences or result in error.
## <https://haven.tidyverse.org/articles/semantics.html>
## <https://larmarange.github.io/labelled/articles/intro_labelled.html#unlabelled>
| Characteristic | N = 465¹ |
|---|---|
| Visits to health professionals | 0, 81 |
| Unknown | 0 |
| Attitudes toward medication | |
| 5 | 13 (2.8%) |
| 6 | 60 (13%) |
| 7 | 126 (27%) |
| 8 | 149 (32%) |
| 9 | 95 (20%) |
| 10 | 22 (4.7%) |
| Unknown | 0 |
| Attitudes toward housework | 2.0, 35.0 |
| Unknown | 1 |
| INCOME | 1.00, 10.00 |
| Unknown | 26 |
| Whether currently employed | |
| 0 | 246 (53%) |
| 1 | 219 (47%) |
| Unknown | 0 |
| ¹ Min, Max; n (%) | |
library(vtable) # creates summary tables.
## Loading required package: kableExtra
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
sumtable(screen, summ = c ('notNA(x)', 'min(x)', 'max(x)'))
| Variable | NotNA | Min | Max |
|---|---|---|---|
| SUBNO | 465 | 1 | 758 |
| TIMEDRS | 465 | 0 | 81 |
| ATTDRUG | 465 | 5 | 10 |
| ATTHOUSE | 464 | 2 | 35 |
| INCOME | 439 | 1 | 10 |
| MSTATUS | 465 | 1 | 2 |
| RACE | 465 | 1 | 2 |
st(screen, summ = c('notNA(x)', 'min(x)', 'max(x)'), summ.names = c('N', 'Minimum', 'Maximum'))
| Variable | N | Minimum | Maximum |
|---|---|---|---|
| SUBNO | 465 | 1 | 758 |
| TIMEDRS | 465 | 0 | 81 |
| ATTDRUG | 465 | 5 | 10 |
| ATTHOUSE | 464 | 2 | 35 |
| INCOME | 439 | 1 | 10 |
| MSTATUS | 465 | 1 | 2 |
| RACE | 465 | 1 | 2 |
kable(describe(screen[,-1]), format = 'markdown', caption = "Descriptive statistics", digits = 2) # describe() gives more detailed descriptive statistics.
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TIMEDRS | 1 | 465 | 7.90 | 10.95 | 4 | 5.61 | 4.45 | 0 | 81 | 81 | 3.23 | 12.88 | 0.51 |
| ATTDRUG | 2 | 465 | 7.69 | 1.16 | 8 | 7.71 | 1.48 | 5 | 10 | 5 | -0.12 | -0.47 | 0.05 |
| ATTHOUSE | 3 | 464 | 23.54 | 4.48 | 24 | 23.62 | 4.45 | 2 | 35 | 33 | -0.45 | 1.51 | 0.21 |
| INCOME | 4 | 439 | 4.21 | 2.42 | 4 | 4.01 | 2.97 | 1 | 10 | 9 | 0.58 | -0.38 | 0.12 |
| EMPLMNT | 5 | 465 | 0.47 | 0.50 | 0 | 0.46 | 0.00 | 0 | 1 | 1 | 0.12 | -1.99 | 0.02 |
| MSTATUS | 6 | 465 | 1.78 | 0.42 | 2 | 1.85 | 0.00 | 1 | 2 | 1 | -1.34 | -0.21 | 0.02 |
| RACE | 7 | 465 | 1.09 | 0.28 | 1 | 1.00 | 0.00 | 1 | 2 | 1 | 2.90 | 6.40 | 0.01 |
# kable() produces tables in markdown format.
library(skimr) # skimr gives a detailed summary of the data set.
skim(screen)
| Name | screen |
| Number of rows | 465 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| numeric | 8 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| SUBNO | 0 | 1.00 | 317.38 | 194.16 | 1 | 137.0 | 314 | 483 | 758 | ▇▆▆▇▁ |
| TIMEDRS | 0 | 1.00 | 7.90 | 10.95 | 0 | 2.0 | 4 | 10 | 81 | ▇▁▁▁▁ |
| ATTDRUG | 0 | 1.00 | 7.69 | 1.16 | 5 | 7.0 | 8 | 9 | 10 | ▃▇▇▅▁ |
| ATTHOUSE | 1 | 1.00 | 23.54 | 4.48 | 2 | 21.0 | 24 | 27 | 35 | ▁▁▅▇▂ |
| INCOME | 26 | 0.94 | 4.21 | 2.42 | 1 | 2.5 | 4 | 6 | 10 | ▆▇▅▃▂ |
| EMPLMNT | 0 | 1.00 | 0.47 | 0.50 | 0 | 0.0 | 0 | 1 | 1 | ▇▁▁▁▇ |
| MSTATUS | 0 | 1.00 | 1.78 | 0.42 | 1 | 2.0 | 2 | 2 | 2 | ▂▁▁▁▇ |
| RACE | 0 | 1.00 | 1.09 | 0.28 | 1 | 1.0 | 1 | 1 | 2 | ▇▁▁▁▁ |
library(DataExplorer) # generates an automatic report about the data set.
create_report(screen)
## Output created: report.html
library(expss)
## Loading required package: maditr
##
## To modify variables or add new variables:
## let(mtcars, new_var = 42, new_var2 = new_var*hp) %>% head()
##
## Attaching package: 'maditr'
## The following objects are masked from 'package:data.table':
##
## copy, dcast, let, melt
## The following object is masked from 'package:skimr':
##
## to_long
## The following objects are masked from 'package:dplyr':
##
## between, coalesce, first, last
##
## Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
## To return to the console output, use 'expss_output_default()'.
##
## Attaching package: 'expss'
## The following objects are masked from 'package:data.table':
##
## copy, fctr, like
## The following object is masked from 'package:DataExplorer':
##
## split_columns
## The following objects are masked from 'package:gtsummary':
##
## contains, vars, where
## The following objects are masked from 'package:haven':
##
## is.labelled, read_spss
## The following objects are masked from 'package:dplyr':
##
## compute, contains, na_if, recode, vars, where
screen <- expss::drop_var_labs(screen)
head(screen)
## # A tibble: 6 × 8
## SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS RACE
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 8 27 5 1 2 1
## 2 2 3 7 20 6 0 2 1
## 3 3 0 8 23 3 0 2 1
## 4 4 13 9 28 8 1 2 1
## 5 5 15 7 24 1 1 2 1
## 6 6 3 8 25 4 0 2 1
library(naniar) # missing data are examined with naniar and ggplot2.
##
## Attaching package: 'naniar'
## The following object is masked from 'package:expss':
##
## is_na
## The following object is masked from 'package:skimr':
##
## n_complete
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:expss':
##
## vars
## The following objects are masked from 'package:psych':
##
## %+%, alpha
any_na(screen)
## [1] TRUE
n_miss(screen) # number of missing values
## [1] 27
prop_miss(screen) # proportion of missing values
## [1] 0.007258065
screen %>% is.na() %>% colSums()
## SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS RACE
## 0 0 0 1 26 0 0 0
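The counts above can be reproduced with base R alone, which helps show what the naniar helpers report: the proportion is the number of missing cells divided by all cells, here 27 / (465 × 8) ≈ 0.00726. A minimal sketch on a hypothetical toy data frame (the column names and values are made up for illustration):

```r
# Toy data frame with hypothetical values; NA marks a missing cell.
toy <- data.frame(
  id     = 1:5,
  score  = c(10, NA, 12, 15, NA),
  income = c(3, 4, NA, 6, 2)
)

n_missing    <- sum(is.na(toy))      # total number of missing cells
prop_missing <- mean(is.na(toy))     # missing cells / (rows * columns)
per_variable <- colSums(is.na(toy))  # missing cells in each column
per_case     <- rowSums(is.na(toy))  # missing cells in each row

n_missing     # 3 missing cells
prop_missing  # 3 / 15 = 0.2
per_variable
per_case
```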
miss_var_summary(screen)
## # A tibble: 8 × 3
## variable n_miss pct_miss
## <chr> <int> <num>
## 1 INCOME 26 5.59
## 2 ATTHOUSE 1 0.215
## 3 SUBNO 0 0
## 4 TIMEDRS 0 0
## 5 ATTDRUG 0 0
## 6 EMPLMNT 0 0
## 7 MSTATUS 0 0
## 8 RACE 0 0
miss_var_table(screen)
## # A tibble: 3 × 3
## n_miss_in_var n_vars pct_vars
## <int> <int> <dbl>
## 1 0 6 75
## 2 1 1 12.5
## 3 26 1 12.5
miss_case_summary(screen)
## # A tibble: 465 × 3
## case n_miss pct_miss
## <int> <int> <dbl>
## 1 52 1 12.5
## 2 64 1 12.5
## 3 69 1 12.5
## 4 77 1 12.5
## 5 118 1 12.5
## 6 135 1 12.5
## 7 161 1 12.5
## 8 172 1 12.5
## 9 173 1 12.5
## 10 174 1 12.5
## # ℹ 455 more rows
miss_case_table(screen)
## # A tibble: 2 × 3
## n_miss_in_case n_cases pct_cases
## <int> <int> <dbl>
## 1 0 438 94.2
## 2 1 27 5.81
library(rlang)
##
## Attaching package: 'rlang'
## The following object is masked from 'package:expss':
##
## is_na
## The following object is masked from 'package:maditr':
##
## :=
## The following object is masked from 'package:data.table':
##
## :=
library(UpSetR) # needed by gg_miss_upset(); ggplot2 and naniar are already attached above.
gg_miss_upset(screen)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the UpSetR package.
## Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## ℹ The deprecated feature was likely used in the UpSetR package.
## Please report the issue to the authors.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
vis_miss(screen)
mcar_test(data = screen[,c(2,3,4,5,7,8)]) # Little's MCAR test; p < .05 suggests the data are not MCAR.
## # A tibble: 1 × 4
## statistic df p.value missing.patterns
## <dbl> <dbl> <dbl> <int>
## 1 18.7 10 0.0440 3
screen2 <- screen
screen2$INCOME_m <- screen2$INCOME
library(finalfit) # finalfit checks whether missingness is related to other variables by comparing their means between cases with missing and observed values.
explanatory = c("TIMEDRS", "ATTDRUG", "ATTHOUSE")
dependent = "INCOME_m"
screen2 %>%
missing_compare(dependent,explanatory) %>%
knitr::kable(row.names = FALSE, align = c("l", "l", "r", "r", "r"),
caption = "Comparison of means between cases with and without missing INCOME values")
| Missing data analysis: INCOME_m | | Not missing | Missing | p |
|---|---|---|---|---|
| TIMEDRS | Mean (SD) | 7.9 (11.1) | 7.6 (7.4) | 0.891 |
| ATTDRUG | Mean (SD) | 7.7 (1.2) | 7.9 (1.0) | 0.368 |
| ATTHOUSE | Mean (SD) | 23.5 (4.5) | 23.7 (4.2) | 0.860 |
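The idea behind this comparison can be sketched in base R: build an indicator for whether the variable is missing, then compare another variable between the two groups. The data below are hypothetical, and the Welch t-test used here is an assumption for illustration; the exact test that `missing_compare()` applies by default may differ.

```r
# Hypothetical toy data: is missingness in income related to visits?
set.seed(2)
toy <- data.frame(
  visits = rpois(200, lambda = 8),
  income = sample(1:10, 200, replace = TRUE)
)
toy$income[sample(200, 20)] <- NA   # remove 20 income values at random

# Indicator: TRUE where income is missing, FALSE where it is observed.
toy$income_missing <- is.na(toy$income)

# Mean and SD of visits in each group.
aggregate(visits ~ income_missing, data = toy,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Welch two-sample t-test comparing visits between the two groups.
result <- t.test(visits ~ income_missing, data = toy)
result$p.value
```

A large p-value, as in the table above, gives no evidence that missingness in the variable is related to the compared variable; it does not prove that the data are MCAR.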
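Imputation-based methods:
A. Mean imputation
B. Median imputation
C. Regression-based imputation
D. Expectation maximization (EM)
E. Multiple imputation
NOTE: Median imputation is often preferable to mean imputation when the variable is skewed, because the median is robust to outliers. Either way, filling every gap with a single value reduces the variability of the variable.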
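A small base R sketch of the note above, using a hypothetical right-skewed vector with one extreme value. It fills the gaps once with the mean and once with the median and compares the resulting standard deviations.

```r
# Hypothetical right-skewed variable with one extreme value and two gaps.
x <- c(1, 2, 2, 3, 3, 4, 5, 40, NA, NA)

x_mean   <- x
x_median <- x
x_mean[is.na(x_mean)]     <- mean(x, na.rm = TRUE)    # fills with 7.5
x_median[is.na(x_median)] <- median(x, na.rm = TRUE)  # fills with 3

# Standard deviation of the observed values versus each imputed version.
c(observed   = sd(x, na.rm = TRUE),
  mean_imp   = sd(x_mean),
  median_imp = sd(x_median))
```

In this example the mean (7.5) is pulled up by the single value of 40, while the median (3) stays close to the bulk of the data; both imputed versions have a smaller standard deviation than the observed values.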
na.omit(screen) # listwise deletion with na.omit().
## # A tibble: 438 × 8
## SUBNO TIMEDRS ATTDRUG ATTHOUSE INCOME EMPLMNT MSTATUS RACE
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 8 27 5 1 2 1
## 2 2 3 7 20 6 0 2 1
## 3 3 0 8 23 3 0 2 1
## 4 4 13 9 28 8 1 2 1
## 5 5 15 7 24 1 1 2 1
## 6 6 3 8 25 4 0 2 1
## 7 7 2 7 30 6 1 2 1
## 8 8 0 7 24 6 1 2 1
## 9 9 7 7 20 2 1 2 1
## 10 10 4 8 30 8 0 1 1
## # ℹ 428 more rows
screen3 <- screen
screen3$INCOME[is.na(screen3$INCOME)] <- mean(screen3$INCOME, na.rm =TRUE)
summary(screen3$INCOME)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.00 4.00 4.21 6.00 10.00
library(mvdalab)
##
## Attaching package: 'mvdalab'
## The following object is masked from 'package:psych':
##
## smc
dat <- introNAs(iris, percent = 25)
dat_EM <- imputeEM(dat[,-5])
dat_EM
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.100000 3.500000 1.400000 0.2605775
## 2 4.900000 3.000000 1.400000 0.2000000
## 3 4.700000 3.200000 1.300000 0.2000000
## 4 4.600000 3.100000 1.500000 0.1675099
## 5 5.000000 3.600000 1.350640 0.2000000
## 6 5.400000 3.900000 1.700000 0.4000000
## 7 4.600000 3.400000 1.400000 0.3000000
## 8 5.000000 3.464630 1.500000 0.2000000
## 9 4.400000 2.816480 1.400000 0.1311091
## 10 4.900000 3.100000 1.500000 0.1000000
## 11 5.400000 3.700000 1.650034 0.2000000
## 12 4.800000 3.400000 1.600000 0.2000000
## 13 4.569955 3.000000 1.400000 0.1580321
## 14 4.300000 3.000000 1.100000 0.1000000
## 15 5.800000 4.000000 1.780257 0.2000000
## 16 5.700000 4.160741 1.500000 0.4000000
## 17 5.400000 3.900000 1.300000 0.4000000
## 18 5.100000 3.500000 1.400000 0.3000000
## 19 5.318467 3.800000 1.700000 0.3000000
## 20 5.100000 3.800000 1.500000 0.3000000
## 21 5.400000 3.400000 1.700000 0.2000000
## 22 5.100000 3.700000 1.355922 0.1961955
## 23 4.600000 3.166981 1.000000 0.2000000
## 24 5.065494 3.300000 1.700000 0.5000000
## 25 4.800000 3.400000 1.900000 0.2000000
## 26 5.000000 3.000000 1.600000 0.3953391
## 27 5.000000 3.400000 1.600000 0.4000000
## 28 5.200000 3.500000 1.500000 0.3275324
## 29 5.200000 3.400000 1.400000 0.2000000
## 30 4.755735 3.200000 1.454014 0.2000000
## 31 4.750937 3.100000 1.600000 0.2567006
## 32 5.400000 3.618041 1.960561 0.4000000
## 33 5.200000 4.100000 1.500000 0.1000000
## 34 5.500000 4.200000 1.400000 0.2504571
## 35 4.693085 3.100000 1.500000 0.2000000
## 36 5.000000 3.200000 1.200000 0.2000000
## 37 5.500000 3.500000 2.447001 0.6724876
## 38 4.900000 3.600000 1.400000 0.1000000
## 39 5.783064 3.000000 3.770185 1.2291526
## 40 4.914848 3.400000 1.500000 0.2000000
## 41 5.000000 3.500000 1.300000 0.1936225
## 42 4.500000 2.300000 1.300000 0.3000000
## 43 4.400000 3.200000 1.300000 0.2000000
## 44 5.000000 3.500000 1.600000 0.6000000
## 45 5.100000 3.800000 1.900000 0.4000000
## 46 4.800000 3.000000 1.400000 0.3000000
## 47 5.100000 3.800000 1.201324 0.1345443
## 48 4.600000 3.033520 1.400000 0.2000000
## 49 4.826266 3.279563 1.500000 0.2000000
## 50 5.000000 2.883139 2.421321 0.6107047
## 51 7.000000 3.200000 4.700000 1.9259951
## 52 6.400000 3.200000 4.499329 1.5000000
## 53 6.217499 2.990440 4.900000 1.5000000
## 54 5.500000 2.300000 4.302173 1.4123021
## 55 6.500000 2.800000 4.600000 1.5000000
## 56 5.700000 2.800000 4.500000 1.3000000
## 57 6.300000 3.300000 4.319964 1.5017695
## 58 4.900000 2.400000 3.300000 0.8917802
## 59 6.600000 3.441122 4.600000 1.6622401
## 60 5.200000 2.700000 3.657515 1.4000000
## 61 5.000000 2.000000 3.500000 1.0888674
## 62 5.900000 3.000000 4.200000 1.5000000
## 63 6.000000 3.324799 4.000000 1.0000000
## 64 6.100000 2.874835 4.700000 1.6091982
## 65 5.600000 2.900000 3.600000 1.3000000
## 66 6.700000 3.100000 4.759098 1.4000000
## 67 5.600000 2.502050 4.303934 1.5000000
## 68 5.800000 3.317375 3.266934 1.0000000
## 69 6.200000 2.200000 4.500000 1.5000000
## 70 5.600000 2.500000 3.900000 1.1000000
## 71 5.900000 3.200000 4.800000 1.8000000
## 72 6.100000 2.800000 4.000000 1.3000000
## 73 6.300000 2.978249 4.900000 1.7159807
## 74 5.848407 2.800000 4.700000 1.2000000
## 75 6.101488 3.062212 4.300000 1.4722477
## 76 6.600000 3.000000 4.400000 1.7343988
## 77 6.800000 2.800000 6.070307 2.2512627
## 78 6.316377 3.000000 4.813281 1.7000000
## 79 6.000000 2.900000 4.500000 1.5000000
## 80 5.827278 3.093573 3.712993 1.2109524
## 81 5.500000 2.752574 3.800000 1.1000000
## 82 5.500000 2.400000 3.700000 1.0000000
## 83 5.800000 2.700000 3.900000 1.2000000
## 84 6.000000 2.566669 5.100000 1.7340689
## 85 5.400000 2.169454 4.500000 1.5000000
## 86 6.000000 3.400000 3.578953 1.1753760
## 87 6.700000 3.612269 4.700000 1.5000000
## 88 6.300000 2.300000 4.400000 1.3000000
## 89 5.600000 3.000000 3.415460 1.0689911
## 90 5.500000 2.500000 4.000000 1.3000000
## 91 5.500000 2.494021 4.400000 1.2000000
## 92 6.100000 3.007282 4.600000 1.4000000
## 93 5.800000 2.975839 4.000000 1.2000000
## 94 5.000000 2.300000 3.300000 1.0000000
## 95 5.600000 2.700000 4.200000 1.3000000
## 96 5.700000 3.000000 4.200000 1.2000000
## 97 5.700000 2.900000 4.200000 1.3000000
## 98 6.200000 3.183001 4.300000 1.4843275
## 99 5.100000 2.500000 3.000000 1.1000000
## 100 5.700000 2.849359 3.891071 1.3000000
## 101 6.300000 3.300000 6.000000 1.8766559
## 102 5.800000 2.360670 5.100000 1.6985912
## 103 5.827278 3.093573 3.712993 1.2109524
## 104 6.300000 2.900000 5.600000 1.8000000
## 105 6.500000 3.000000 5.800000 2.2000000
## 106 7.600000 3.000000 6.600000 2.6717286
## 107 4.900000 1.741430 4.500000 1.3250272
## 108 7.300000 2.900000 6.300000 2.4978180
## 109 6.700000 2.500000 5.800000 1.8000000
## 110 7.200000 3.600000 6.100000 2.5000000
## 111 6.539349 2.972621 5.100000 2.0000000
## 112 6.400000 2.700000 5.373462 1.9000000
## 113 6.800000 3.048538 5.687773 2.1000000
## 114 5.700000 2.500000 5.000000 2.0000000
## 115 5.800000 2.800000 5.100000 2.4000000
## 116 6.400000 3.200000 5.300000 2.3000000
## 117 6.500000 3.000000 5.500000 1.8000000
## 118 7.700000 3.800000 6.700000 2.5230490
## 119 7.700000 2.600000 6.900000 2.3000000
## 120 6.000000 2.617960 5.000000 1.6984165
## 121 6.900000 2.913018 6.120003 2.3000000
## 122 5.688659 2.800000 3.892386 1.2680470
## 123 7.700000 4.122294 5.563648 2.0000000
## 124 6.300000 2.700000 4.900000 1.8000000
## 125 6.700000 2.942644 5.678090 2.1000000
## 126 7.200000 3.200000 6.000000 1.8000000
## 127 6.223111 2.800000 4.800000 1.8000000
## 128 6.100000 3.000000 4.900000 1.8000000
## 129 6.400000 2.800000 5.600000 2.1000000
## 130 7.200000 3.000000 5.800000 2.3146487
## 131 7.400000 3.495737 6.100000 2.3389366
## 132 7.900000 3.800000 6.400000 2.5453839
## 133 6.400000 2.800000 5.627280 2.2000000
## 134 6.122595 2.800000 5.100000 1.5000000
## 135 6.100000 2.600000 4.606057 1.4000000
## 136 7.700000 3.000000 6.100000 2.3000000
## 137 6.300000 3.400000 5.600000 2.4000000
## 138 6.400000 3.100000 5.500000 1.8636295
## 139 6.000000 3.000000 4.800000 1.5558189
## 140 6.653771 2.947138 5.400000 2.1000000
## 141 6.993250 3.100000 5.600000 2.4000000
## 142 6.900000 3.244158 5.100000 2.3000000
## 143 5.800000 2.700000 5.100000 1.9000000
## 144 6.800000 3.200000 5.900000 2.1044951
## 145 6.700000 2.979911 5.700000 2.0721551
## 146 6.700000 2.978555 5.200000 2.3000000
## 147 6.430902 3.026141 5.000000 1.7851680
## 148 6.500000 3.000000 5.200000 2.0000000
## 149 6.200000 2.171758 6.052220 2.3000000
## 150 5.900000 2.374774 5.100000 1.8000000
library(mice) # multiple imputation with mice.
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
md.pattern(screen)
## SUBNO TIMEDRS ATTDRUG EMPLMNT MSTATUS RACE ATTHOUSE INCOME
## 438 1 1 1 1 1 1 1 1 0
## 26 1 1 1 1 1 1 1 0 1
## 1 1 1 1 1 1 1 0 1 1
## 0 0 0 0 0 0 1 26 27
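# The iteration argument is `maxit`. The run logged below spelled it `maksit`, so only the default 5 iterations appear.
imputed_data <- mice(screen, m = 5, maxit = 50, method = 'pmm', seed = 50)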
##
## iter imp variable
## 1 1 ATTHOUSE INCOME
## 1 2 ATTHOUSE INCOME
## 1 3 ATTHOUSE INCOME
## 1 4 ATTHOUSE INCOME
## 1 5 ATTHOUSE INCOME
## 2 1 ATTHOUSE INCOME
## 2 2 ATTHOUSE INCOME
## 2 3 ATTHOUSE INCOME
## 2 4 ATTHOUSE INCOME
## 2 5 ATTHOUSE INCOME
## 3 1 ATTHOUSE INCOME
## 3 2 ATTHOUSE INCOME
## 3 3 ATTHOUSE INCOME
## 3 4 ATTHOUSE INCOME
## 3 5 ATTHOUSE INCOME
## 4 1 ATTHOUSE INCOME
## 4 2 ATTHOUSE INCOME
## 4 3 ATTHOUSE INCOME
## 4 4 ATTHOUSE INCOME
## 4 5 ATTHOUSE INCOME
## 5 1 ATTHOUSE INCOME
## 5 2 ATTHOUSE INCOME
## 5 3 ATTHOUSE INCOME
## 5 4 ATTHOUSE INCOME
## 5 5 ATTHOUSE INCOME
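The `method = 'pmm'` argument stands for predictive mean matching. The following is a deliberately simplified base R sketch of that idea for a single variable, on hypothetical data: each missing value is replaced by the observed value of a "donor" case whose regression prediction is close. It is not the mice implementation, which among other things also draws the regression parameters at random; the donor count of 5 is an assumption made here.

```r
# Simplified predictive mean matching for one variable (hypothetical data).
set.seed(3)
n   <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n)
dat$y[sample(n, 15)] <- NA                 # make 15 values of y missing

obs  <- !is.na(dat$y)
fit  <- lm(y ~ x, data = dat[obs, ])       # regression on the observed cases
pred <- predict(fit, newdata = dat)        # predicted y for every case

y_imp <- dat$y
for (i in which(!obs)) {
  # the 5 observed cases whose prediction is closest to that of case i
  dist   <- abs(pred[obs] - pred[i])
  donors <- which(obs)[order(dist)[1:5]]
  # impute the observed y of one randomly chosen donor
  y_imp[i] <- dat$y[donors[sample.int(5, 1)]]
}

summary(y_imp)
```

Because every imputed value is copied from a real case, the imputations stay within the range of the observed data, which is one reason this method suits bounded scores such as INCOME.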
:)) LEARNING JOURNAL :))
This week in class we learned about missing data. We discussed that missing data can sometimes be completely random, sometimes related to other variables, and sometimes directly dependent on the very variable the person left unanswered. At this point we covered the concepts of MCAR, MAR, and MNAR. I struggled somewhat to tell these concepts apart.
I felt very strongly how hard it is to take this course without having taken the introductory R course first. The gaps in my foundations slowed down and complicated my understanding of the lessons. I had a hard time when it came to applying the general logic of the in-class procedures in R. In particular, the large number of functions, the different output each package produces, and the occasional warning or error messages wore me out a bit. Since this was the first class, I think this is normal, but I still sense that getting used to R will take time. Even so, I believe that as I practice more over time I will get a better command of both R and the course content. So the first class was a somewhat tiring but also instructive start for me :))