    library("cleaninginspectoR")

Here we create some fake data for illustration purposes. It is not important to understand this; we keep it in so you can run the example yourself if you like. The dataset contains:

- a: random values and outliers
- uuid: values should be unique but are not
- water.source.other: all NA except for two
- GPS.lat: just some numbers, but the column header indicates this is potentially sensitive

    testdf <- data.frame(
      a = c(runif(98), 7287, -100),
      b = sample(letters, 100, TRUE),
      uuid = c(1:98, 4, 20),
      water.source.other = c(rep(NA, 98), "neighbour's well", "neighbour's well"),
      GPS.lat = runif(100)
    )

The function inspect_all runs all cleaning checks that are available.

    inspect_all(testdf)

| index | value | variable | has_issue | issue_type |
|---|---|---|---|---|
| NA | NA | GPS.lat | TRUE | Potentially sensitive information. Please ensure all PII is removed |
| 99 | 4 | uuid | TRUE | duplicate in uuid |
| 100 | 20 | uuid | TRUE | duplicate in uuid |
| 99 | 7287 | a | TRUE | normal distribution outlier |
| NA | neighbour’s well \ 2 instance(s) | water.source.other | NA | ‘other’ response. may need recoding. |
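
The uuid and a rows in the table above can be reproduced by hand. The following is a minimal base R sketch, not the package's implementation. It uses a deterministic stand-in for the random values in a, and it assumes that "normal distribution outlier" means a value more than three standard deviations from the mean, which the text here does not confirm.

```r
# Same uuid column as in testdf: 98 unique values, then 4 and 20 repeated.
uuid <- c(1:98, 4, 20)

# duplicated() marks every repeat of a value that already appeared earlier.
dup_index <- which(duplicated(uuid))
data.frame(index = dup_index, value = uuid[dup_index])

# Deterministic stand-in for runif(98), followed by the two extreme values.
a <- c((1:98) / 100, 7287, -100)

# Assumed rule (not confirmed by the package): flag values more than
# three standard deviations away from the mean.
outlier_index <- which(abs(a - mean(a)) > 3 * sd(a))
data.frame(index = outlier_index, value = a[outlier_index])
```

Under that assumption only 7287 is flagged: the extreme value itself inflates the standard deviation to roughly 730, so -100 stays inside the three-sigma band. That would be consistent with the table, where -100 does not appear.
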
One of the things inspect_all does is to look for duplicates in the first column containing the word “uuid”. If your ID column has a different name, you can specify it in the second parameter:

    inspect_all(df = testdf, uuid.column.name = "b")
    knitr::kable(inspect_all(df = testdf, uuid.column.name = "b"))

| index | value | variable | has_issue | issue_type |
|---|---|---|---|---|
| NA | NA | GPS.lat | TRUE | Potentially sensitive information. Please ensure all PII is removed |
| 10 | b | b | TRUE | duplicate in b |
| 13 | k | b | TRUE | duplicate in b |
| 15 | t | b | TRUE | duplicate in b |
| 16 | d | b | TRUE | duplicate in b |
| 19 | u | b | TRUE | duplicate in b |
| 21 | h | b | TRUE | duplicate in b |
| 22 | h | b | TRUE | duplicate in b |
| 23 | e | b | TRUE | duplicate in b |
| 25 | p | b | TRUE | duplicate in b |
| 26 | t | b | TRUE | duplicate in b |
| 27 | u | b | TRUE | duplicate in b |
| 30 | z | b | TRUE | duplicate in b |
| 32 | r | b | TRUE | duplicate in b |
| 33 | t | b | TRUE | duplicate in b |
| 36 | u | b | TRUE | duplicate in b |
| 37 | f | b | TRUE | duplicate in b |
| 38 | k | b | TRUE | duplicate in b |
| 39 | l | b | TRUE | duplicate in b |
| 40 | u | b | TRUE | duplicate in b |
| 41 | w | b | TRUE | duplicate in b |
| 42 | e | b | TRUE | duplicate in b |
| 44 | p | b | TRUE | duplicate in b |
| 46 | s | b | TRUE | duplicate in b |
| 47 | e | b | TRUE | duplicate in b |
| 48 | t | b | TRUE | duplicate in b |
| 50 | p | b | TRUE | duplicate in b |
| 52 | m | b | TRUE | duplicate in b |
| 53 | l | b | TRUE | duplicate in b |
| 54 | v | b | TRUE | duplicate in b |
| 55 | y | b | TRUE | duplicate in b |
| 56 | c | b | TRUE | duplicate in b |
| 57 | v | b | TRUE | duplicate in b |
| 58 | v | b | TRUE | duplicate in b |
| 59 | c | b | TRUE | duplicate in b |
| 60 | j | b | TRUE | duplicate in b |
| 61 | j | b | TRUE | duplicate in b |
| 62 | j | b | TRUE | duplicate in b |
| 63 | i | b | TRUE | duplicate in b |
| 64 | m | b | TRUE | duplicate in b |
| 66 | d | b | TRUE | duplicate in b |
| 67 | l | b | TRUE | duplicate in b |
| 68 | h | b | TRUE | duplicate in b |
| 69 | t | b | TRUE | duplicate in b |
| 70 | l | b | TRUE | duplicate in b |
| 71 | a | b | TRUE | duplicate in b |
| 72 | a | b | TRUE | duplicate in b |
| 73 | y | b | TRUE | duplicate in b |
| 74 | q | b | TRUE | duplicate in b |
| 75 | p | b | TRUE | duplicate in b |
| 76 | w | b | TRUE | duplicate in b |
| 77 | y | b | TRUE | duplicate in b |
| 78 | o | b | TRUE | duplicate in b |
| 79 | m | b | TRUE | duplicate in b |
| 80 | n | b | TRUE | duplicate in b |
| 81 | u | b | TRUE | duplicate in b |
| 82 | f | b | TRUE | duplicate in b |
| 83 | s | b | TRUE | duplicate in b |
| 84 | s | b | TRUE | duplicate in b |
| 85 | o | b | TRUE | duplicate in b |
| 86 | u | b | TRUE | duplicate in b |
| 87 | o | b | TRUE | duplicate in b |
| 88 | t | b | TRUE | duplicate in b |
| 89 | y | b | TRUE | duplicate in b |
| 90 | v | b | TRUE | duplicate in b |
| 91 | r | b | TRUE | duplicate in b |
| 92 | o | b | TRUE | duplicate in b |
| 93 | y | b | TRUE | duplicate in b |
| 94 | v | b | TRUE | duplicate in b |
| 95 | t | b | TRUE | duplicate in b |
| 96 | h | b | TRUE | duplicate in b |
| 97 | t | b | TRUE | duplicate in b |
| 98 | l | b | TRUE | duplicate in b |
| 99 | e | b | TRUE | duplicate in b |
| 100 | u | b | TRUE | duplicate in b |
| 99 | 7287 | a | TRUE | normal distribution outlier |
| NA | neighbour’s well \ 2 instance(s) | water.source.other | NA | ‘other’ response. may need recoding. |
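
When one issue type dominates the result, as in the table above, it can help to set that type aside and look at what remains. The sketch below assumes the result behaves like an ordinary data frame with an issue_type column. It rebuilds a few rows of the table by hand instead of calling the package, and the object names issues and other_issues are hypothetical.

```r
# A few rows copied from the table above; `issues` is a hypothetical name
# standing in for the result of inspect_all(df = testdf, uuid.column.name = "b").
issues <- data.frame(
  index      = c(NA, 10, 13, 99),
  value      = c(NA, "b", "k", "7287"),
  variable   = c("GPS.lat", "b", "b", "a"),
  has_issue  = c(TRUE, TRUE, TRUE, TRUE),
  issue_type = c(
    "Potentially sensitive information. Please ensure all PII is removed",
    "duplicate in b",
    "duplicate in b",
    "normal distribution outlier"
  ),
  stringsAsFactors = FALSE
)

# Count how often each issue type occurs.
table(issues$issue_type)

# Keep everything except the duplicate flags.
other_issues <- issues[issues$issue_type != "duplicate in b", ]
other_issues
```
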
For more information and individual check functions, see the detailed example.