Data Description

This cleaning looks at the first 6 variables in miteResponse.csv.

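m <- read.csv('miteResponse.csv')[1:6]   # keep only the first six columns
point <- m[1]   # single-bracket indexing keeps each variable as a one-column data frame
eco <- m[2]
area <- m[3]
pyr <- m[4]
month <- m[5]
ph <- m[6]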

Number of Unique Values

Check how many unique values each column contains against the expected values.

Expected

  • point: 216, one ID per pyramid, area and month combination (18 × 2 × 6), all in the same format
  • ecosystem: 2, {patch,pasture}
  • area: 2, {exterior,interior}
  • pyramid: 18, {1f,1p,2f,2p,3f,3p,4f,4p,5f,5p,6f,6p,7f,7p,8f,8p,9f,9p}
  • month: 6, {february,april,june,august,october,december}
  • pH: no fixed number (continuous measurement)
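The expected value sets above can also be checked in code rather than by eye. This is a minimal sketch, assuming the column names printed in the console output (`ecosystem`, `area`, `pyramid`, `month`); `expected_sets` and `check_levels` are hypothetical names, and the toy data frame is made up for illustration rather than taken from miteResponse.csv.

```r
# Expected values, copied from the list above.
# Pyramid codes are generated as 1f, 1p, 2f, 2p, ..., 9f, 9p.
expected_sets <- list(
  ecosystem = c("patch", "pasture"),
  area      = c("exterior", "interior"),
  pyramid   = paste0(rep(1:9, each = 2), c("f", "p")),
  month     = c("february", "april", "june", "august", "october", "december")
)

# Hypothetical helper: for each listed column, count the distinct values and
# test whether they form exactly the expected set (an NA counts as unexpected).
check_levels <- function(df, expected) {
  cols <- names(expected)
  data.frame(
    column   = cols,
    n_unique = vapply(cols, function(col) length(unique(df[[col]])),
                      integer(1), USE.NAMES = FALSE),
    matches  = vapply(cols, function(col)
                        setequal(as.character(df[[col]]), expected[[col]]),
                      logical(1), USE.NAMES = FALSE)
  )
}

# Toy data (not from miteResponse.csv) containing only some expected values.
toy <- data.frame(
  ecosystem = c("patch", "pasture", "patch"),
  area      = c("exterior", "interior", "exterior"),
  pyramid   = c("1f", "1p", "1f"),
  month     = c("february", "april", "june")
)
check_levels(toy, expected_sets)
```

On the real file, `check_levels(read.csv('miteResponse.csv'), expected_sets)` should report `TRUE` in `matches` for every column if the expectations above hold. Comparing sets rather than only counts also catches a misspelled level that happens to keep the count unchanged.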

All results below match the expected number of unique values.

unique(point)
##     point
## 1   1FE-1
## 2   1FE-2
## 3   1FE-3
## 4   1FE-4
## 5   1FE-5
## 6   1FE-6
## 7   1FI-1
## 8   1FI-2
## 9   1FI-3
## 10  1FI-4
## 11  1FI-5
## 12  1FI-6
## 13  1PE-1
## 14  1PE-2
## 15  1PE-3
## 16  1PE-4
## 17  1PE-5
## 18  1PE-6
## 19  1PI-1
## 20  1PI-2
## 21  1PI-3
## 22  1PI-4
## 23  1PI-5
## 24  1PI-6
## 25  2FE-1
## 26  2FE-2
## 27  2FE-3
## 28  2FE-4
## 29  2FE-5
## 30  2FE-6
## 31  2FI-1
## 32  2FI-2
## 33  2FI-3
## 34  2FI-4
## 35  2FI-5
## 36  2FI-6
## 37  2PE-1
## 38  2PE-2
## 39  2PE-3
## 40  2PE-4
## 41  2PE-5
## 42  2PE-6
## 43  2PI-1
## 44  2PI-2
## 45  2PI-3
## 46  2PI-4
## 47  2PI-5
## 48  2PI-6
## 49  3FE-1
## 50  3FE-2
## 51  3FE-3
## 52  3FE-4
## 53  3FE-5
## 54  3FE-6
## 55  3FI-1
## 56  3FI-2
## 57  3FI-3
## 58  3FI-4
## 59  3FI-5
## 60  3FI-6
## 61  3PE-1
## 62  3PE-2
## 63  3PE-3
## 64  3PE-4
## 65  3PE-5
## 66  3PE-6
## 67  3PI-1
## 68  3PI-2
## 69  3PI-3
## 70  3PI-4
## 71  3PI-5
## 72  3PI-6
## 73  4FE-1
## 74  4FE-2
## 75  4FE-3
## 76  4FE-4
## 77  4FE-5
## 78  4FE-6
## 79  4FI-1
## 80  4FI-2
## 81  4FI-3
## 82  4FI-4
## 83  4FI-5
## 84  4FI-6
## 85  4PE-1
## 86  4PE-2
## 87  4PE-3
## 88  4PE-4
## 89  4PE-5
## 90  4PE-6
## 91  4PI-1
## 92  4PI-2
## 93  4PI-3
## 94  4PI-4
## 95  4PI-5
## 96  4PI-6
## 97  5FE-1
## 98  5FE-2
## 99  5FE-3
## 100 5FE-4
## 101 5FE-5
## 102 5FE-6
## 103 5FI-1
## 104 5FI-2
## 105 5FI-3
## 106 5FI-4
## 107 5FI-5
## 108 5FI-6
## 109 5PE-1
## 110 5PE-2
## 111 5PE-3
## 112 5PE-4
## 113 5PE-5
## 114 5PE-6
## 115 5PI-1
## 116 5PI-2
## 117 5PI-3
## 118 5PI-4
## 119 5PI-5
## 120 5PI-6
## 121 6FE-1
## 122 6FE-2
## 123 6FE-3
## 124 6FE-4
## 125 6FE-5
## 126 6FE-6
## 127 6FI-1
## 128 6FI-2
## 129 6FI-3
## 130 6FI-4
## 131 6FI-5
## 132 6FI-6
## 133 6PE-1
## 134 6PE-2
## 135 6PE-3
## 136 6PE-4
## 137 6PE-5
## 138 6PE-6
## 139 6PI-1
## 140 6PI-2
## 141 6PI-3
## 142 6PI-4
## 143 6PI-5
## 144 6PI-6
## 145 7FE-1
## 146 7FE-2
## 147 7FE-3
## 148 7FE-4
## 149 7FE-5
## 150 7FE-6
## 151 7FI-1
## 152 7FI-2
## 153 7FI-3
## 154 7FI-4
## 155 7FI-5
## 156 7FI-6
## 157 7PE-1
## 158 7PE-2
## 159 7PE-3
## 160 7PE-4
## 161 7PE-5
## 162 7PE-6
## 163 7PI-1
## 164 7PI-2
## 165 7PI-3
## 166 7PI-4
## 167 7PI-5
## 168 7PI-6
## 169 8FE-1
## 170 8FE-2
## 171 8FE-3
## 172 8FE-4
## 173 8FE-5
## 174 8FE-6
## 175 8FI-1
## 176 8FI-2
## 177 8FI-3
## 178 8FI-4
## 179 8FI-5
## 180 8FI-6
## 181 8PE-1
## 182 8PE-2
## 183 8PE-3
## 184 8PE-4
## 185 8PE-5
## 186 8PE-6
## 187 8PI-1
## 188 8PI-2
## 189 8PI-3
## 190 8PI-4
## 191 8PI-5
## 192 8PI-6
## 193 9FE-1
## 194 9FE-2
## 195 9FE-3
## 196 9FE-4
## 197 9FE-5
## 198 9FE-6
## 199 9FI-1
## 200 9FI-2
## 201 9FI-3
## 202 9FI-4
## 203 9FI-5
## 204 9FI-6
## 205 9PE-1
## 206 9PE-2
## 207 9PE-3
## 208 9PE-4
## 209 9PE-5
## 210 9PE-6
## 211 9PI-1
## 212 9PI-2
## 213 9PI-3
## 214 9PI-4
## 215 9PI-5
## 216 9PI-6
unique(eco)
##    ecosystem
## 1      patch
## 13   pasture
unique(area)
##       area
## 1 exterior
## 7 interior
unique(pyr)
##     pyramid
## 1        1f
## 13       1p
## 25       2f
## 37       2p
## 49       3f
## 61       3p
## 73       4f
## 85       4p
## 97       5f
## 109      5p
## 121      6f
## 133      6p
## 145      7f
## 157      7p
## 169      8f
## 181      8p
## 193      9f
## 205      9p
unique(month)
##      month
## 1 february
## 2    april
## 3     june
## 4   august
## 5  october
## 6 december
unique(ph)
##       pH
## 1   4.78
## 2   4.58
## 3   5.26
## 4   4.12
## 5   3.94
## 6   3.51
## 7   4.56
## 8   4.50
## 9   4.75
## 10  4.43
## 11  4.20
## 12  4.29
## 13  5.71
## 14  5.46
## 15  6.20
## 16  5.42
## 17  5.94
## 18  4.74
## 19  5.80
## 20  5.51
## 21  5.74
## 22  5.85
## 23  5.53
## 24  4.41
## 25  4.05
## 26  3.91
## 27  4.21
## 28  4.23
## 29  3.57
## 30  3.48
## 31  4.59
## 32  4.32
## 33  4.64
## 34  4.09
## 35  4.01
## 36  3.33
## 37  5.67
## 38  5.23
## 40  4.98
## 41  5.40
## 42  4.72
## 43  5.68
## 44  5.19
## 45  5.59
## 46  5.37
## 47  5.34
## 48  4.47
## 49  4.61
## 50  4.24
## 51  4.66
## 52  3.82
## 54  3.34
## 55  4.33
## 56  4.27
## 57  4.77
## 58  4.60
## 59  4.93
## 60  4.02
## 61  5.54
## 62  5.38
## 63  6.10
## 65  5.01
## 66  5.39
## 67  5.81
## 68  5.57
## 69  6.07
## 70  5.69
## 71  5.36
## 72  5.07
## 74  4.04
## 75  5.11
## 76  4.63
## 77  3.73
## 78  3.92
## 79  4.45
## 80  4.49
## 82  4.46
## 83  4.28
## 85  5.62
## 87  6.00
## 88  5.18
## 89  5.49
## 90  4.37
## 91  5.58
## 92  5.60
## 96  5.08
## 97  4.89
## 98  4.18
## 99  5.27
## 103 5.05
## 104 4.06
## 105 5.09
## 108 3.90
## 109 5.72
## 111 5.43
## 112 4.99
## 116 5.13
## 120 4.40
## 121 4.00
## 126 3.49
## 128 4.11
## 130 4.10
## 132 3.52
## 133 5.10
## 137 5.25
## 138 4.73
## 140 6.08
## 141 5.70
## 142 5.76
## 144   NA
## 146 3.84
## 148 4.19
## 149 4.25
## 150 3.71
## 151 4.38
## 152 4.70
## 153 3.95
## 154 3.63
## 155 4.44
## 156 3.74
## 158 5.32
## 159 5.95
## 161 5.65
## 162 4.83
## 163 6.05
## 165 5.64
## 166 5.84
## 170 3.96
## 172 4.35
## 173 4.16
## 174 3.38
## 177 4.52
## 180 3.40
## 181 5.90
## 183 6.30
## 184 5.87
## 185 5.91
## 186 4.76
## 187 6.03
## 190 6.23
## 192 5.33
## 193 5.15
## 195 5.29
## 200 4.82
## 201 4.92
## 204 4.15
## 207 5.82
## 208 5.21
## 209 5.50
## 215 5.20
## 216 4.57

Checking for Errors

Pyramid/Eco

Pyramid codes appear to take the form xf or xp, where x is the pyramid number and f and p refer to patch and pasture, respectively.

As the table below shows, every f code corresponds to patch and every p code to pasture.

pyraPlot <- data.frame(x=pyr, y=eco)
unique(pyraPlot)
##     pyramid ecosystem
## 1        1f     patch
## 13       1p   pasture
## 25       2f     patch
## 37       2p   pasture
## 49       3f     patch
## 61       3p   pasture
## 73       4f     patch
## 85       4p   pasture
## 97       5f     patch
## 109      5p   pasture
## 121      6f     patch
## 133      6p   pasture
## 145      7f     patch
## 157      7p   pasture
## 169      8f     patch
## 181      8p   pasture
## 193      9f     patch
## 205      9p   pasture
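The same correspondence can be asserted row by row instead of read off the table. A minimal sketch, assuming the f/p meaning described above; `pyramid_matches_eco` is a hypothetical name.

```r
# Hypothetical helper: TRUE only if every pyramid code's trailing letter
# ("f" or "p") maps to the ecosystem recorded on the same row.
pyramid_matches_eco <- function(pyramid, ecosystem) {
  pyramid <- as.character(pyramid)
  suffix  <- substr(pyramid, nchar(pyramid), nchar(pyramid))
  lookup  <- c(f = "patch", p = "pasture")   # assumed meaning of f and p
  mapped  <- unname(lookup[suffix])          # unknown suffixes become NA
  isTRUE(all(mapped == as.character(ecosystem)))
}

pyramid_matches_eco(c("1f", "1p", "2f"), c("patch", "pasture", "patch"))  # TRUE
pyramid_matches_eco(c("1f", "1p"), c("pasture", "pasture"))               # FALSE
```

Against the loaded data this would be called as `pyramid_matches_eco(m$pyramid, m$ecosystem)`. Wrapping the result in `isTRUE()` means an unknown suffix or a missing ecosystem yields `FALSE` rather than `NA`.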

ID Formatting

point is used as a unique ID and is a condensed version of several other variables.

For example, 1FE-1 corresponds to pyramid 1f (1F), area E (exterior), and month 1 (February).

The months are encoded as follows:

  • February = 1
  • April = 2
  • June = 3
  • August = 4
  • October = 5
  • December = 6

The ID therefore takes the form [pyramid][area]-[month], condensing three variables into one short index.

As shown above, point has 216 unique values, so the IDs can serve as a primary key of sorts.
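The encoding can be reversed to confirm that each ID agrees with its own pyramid, area and month columns. A minimal sketch, assuming the month numbering listed above; `month_levels` and `build_point_id` are hypothetical names. The `expand.grid()` call enumerates the full design with month varying fastest, which mirrors the row order seen in the `unique(point)` output.

```r
# Month order used in the point IDs (February = 1, ..., December = 6).
month_levels <- c("february", "april", "june", "august", "october", "december")

# Hypothetical helper: rebuild an ID such as "1FE-1" from pyramid, area, month.
build_point_id <- function(pyramid, area, month) {
  paste0(toupper(as.character(pyramid)),             # "1f" -> "1F"
         toupper(substr(as.character(area), 1, 1)),  # "exterior" -> "E"
         "-",
         match(as.character(month), month_levels))   # "february" -> 1
}

# Every combination of the design, with month varying fastest.
design <- expand.grid(
  month   = month_levels,
  area    = c("exterior", "interior"),
  pyramid = paste0(rep(1:9, each = 2), c("f", "p")),
  stringsAsFactors = FALSE
)
ids <- build_point_id(design$pyramid, design$area, design$month)
length(unique(ids))  # 216
head(ids, 3)         # "1FE-1" "1FE-2" "1FE-3"
```

To check the real IDs, `all(build_point_id(m$pyramid, m$area, m$month) == m$point)` should be `TRUE` if the encoding holds for every row. An unrecognised month would produce an ID ending in `-NA`, which makes such rows easy to spot.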

pH

Looking for outliers in pH may help reveal erroneous values.

The boxplot below shows no outliers. All values lie between 3.33 and 6.3, which are plausible values for pH in general.

The IQR runs from 4.2 to 5.475, which the Natural Resources Conservation Service classifies as the ‘Extremely acid’ to ‘Strongly acid’ range. Given that the dataset comes from Colombia, this is expected, as acidic soil is typical in South America.
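The classification can be applied to every reading with `cut()`. This sketch assumes the USDA NRCS soil reaction class boundaries (ultra acid below 3.5, extremely acid 3.5 to 4.4, very strongly acid 4.5 to 5.0, strongly acid 5.1 to 5.5, moderately acid 5.6 to 6.0, slightly acid 6.1 to 6.5). It treats each class as a half-open interval so that two-decimal readings such as 4.45 fall into exactly one class; that interval choice is an assumption, not part of the published table. `nrcs_breaks`, `nrcs_labels` and `classify_ph` are hypothetical names.

```r
# Assumed NRCS soil reaction classes as half-open intervals [lower, upper).
nrcs_breaks <- c(-Inf, 3.5, 4.5, 5.1, 5.6, 6.1, 6.6, Inf)
nrcs_labels <- c("Ultra acid", "Extremely acid", "Very strongly acid",
                 "Strongly acid", "Moderately acid", "Slightly acid",
                 "Neutral or above")

# Hypothetical helper: map numeric pH readings to a factor of class labels.
classify_ph <- function(ph) {
  cut(ph, breaks = nrcs_breaks, labels = nrcs_labels, right = FALSE)
}

# Minimum, quartiles and maximum from the summary in this section.
classify_ph(c(3.33, 4.2, 5.475, 6.3))
```

With these boundaries the quartiles 4.2 and 5.475 land in ‘Extremely acid’ and ‘Strongly acid’, matching the range stated above. On the loaded data, `table(classify_ph(m$pH), useNA = "ifany")` would count readings per class, with the missing value shown separately.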

There is 1 NA value, corresponding to 6PI-6.

boxplot(ph)

summary(ph)
##        pH       
##  Min.   :3.330  
##  1st Qu.:4.200  
##  Median :4.780  
##  Mean   :4.847  
##  3rd Qu.:5.475  
##  Max.   :6.300  
##  NA's   :1
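The visual checks in this section can also be repeated in code. A minimal sketch, assuming the `point` and `pH` column names shown in the output; `ph_checks` is a hypothetical name. `grDevices::boxplot.stats()` applies the same 1.5 × IQR whisker rule that `boxplot()` uses to draw outlying points, so an empty `outliers` vector corresponds to a boxplot with no points beyond the whiskers.

```r
# Hypothetical helper: repeat the pH checks without reading values off a plot.
ph_checks <- function(df) {
  ph  <- df$pH
  ids <- as.character(df$point)
  list(
    # values beyond the whiskers, using boxplot()'s 1.5 * IQR rule
    outliers     = grDevices::boxplot.stats(ph)$out,
    # IDs whose pH falls outside the 0 to 14 scale
    out_of_range = ids[!is.na(ph) & (ph < 0 | ph > 14)],
    # IDs with a missing pH reading
    missing_ids  = ids[is.na(ph)]
  )
}

# Toy data (not from miteResponse.csv), with one missing reading.
toy <- data.frame(
  point = c("1FE-1", "1FE-2", "1FE-3", "1FE-4", "1FE-5"),
  pH    = c(4.8, 4.6, 5.3, NA, 4.1)
)
str(ph_checks(toy))
```

On the loaded data, `ph_checks(read.csv('miteResponse.csv'))` should return an empty `outliers` vector and `missing_ids` equal to "6PI-6", consistent with the boxplot and summary above.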