This cleaning step looks at the first 6 variables in miteResponse.csv.
point: Code identifying each sample taken. It follows a fixed format that condenses several of the other variables.
ecosystem: 2 options, pasture (field) or patch (forest).
area: Where the sample was taken, inside (interior) or outside (exterior) the pyramid.
pyramid: Identifier of the pyramid where the sample was taken. There are 9 patch pyramids and 9 pasture pyramids.
month: Month in which the sample was taken. Sampling ran for one year, every 2 months: February, April, June, August, October, December.
pH: A measure of how acidic or basic a sample is. The scale runs from 0 to 14; values below 7 are acidic and values above 7 are basic.
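# Read the data and keep only the first six columns.
m <- read.csv('miteResponse.csv')[1:6]
# Single-bracket indexing keeps each variable as a one-column data frame.
point <- m[1]
eco <- m[2]
area <- m[3]
pyr <- m[4]
month <- m[5]
ph <- m[6]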
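Checking how many unique elements are in each column against the expected values.
For the five categorical variables, the results below match the expected number of unique values (216 points, 2 ecosystems, 2 areas, 18 pyramids, 6 months). pH is continuous, so it has no expected count; its values are listed for inspection.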
unique(point)
## point
## 1 1FE-1
## 2 1FE-2
## 3 1FE-3
## 4 1FE-4
## 5 1FE-5
## 6 1FE-6
## 7 1FI-1
## 8 1FI-2
## 9 1FI-3
## 10 1FI-4
## 11 1FI-5
## 12 1FI-6
## 13 1PE-1
## 14 1PE-2
## 15 1PE-3
## 16 1PE-4
## 17 1PE-5
## 18 1PE-6
## 19 1PI-1
## 20 1PI-2
## 21 1PI-3
## 22 1PI-4
## 23 1PI-5
## 24 1PI-6
## 25 2FE-1
## 26 2FE-2
## 27 2FE-3
## 28 2FE-4
## 29 2FE-5
## 30 2FE-6
## 31 2FI-1
## 32 2FI-2
## 33 2FI-3
## 34 2FI-4
## 35 2FI-5
## 36 2FI-6
## 37 2PE-1
## 38 2PE-2
## 39 2PE-3
## 40 2PE-4
## 41 2PE-5
## 42 2PE-6
## 43 2PI-1
## 44 2PI-2
## 45 2PI-3
## 46 2PI-4
## 47 2PI-5
## 48 2PI-6
## 49 3FE-1
## 50 3FE-2
## 51 3FE-3
## 52 3FE-4
## 53 3FE-5
## 54 3FE-6
## 55 3FI-1
## 56 3FI-2
## 57 3FI-3
## 58 3FI-4
## 59 3FI-5
## 60 3FI-6
## 61 3PE-1
## 62 3PE-2
## 63 3PE-3
## 64 3PE-4
## 65 3PE-5
## 66 3PE-6
## 67 3PI-1
## 68 3PI-2
## 69 3PI-3
## 70 3PI-4
## 71 3PI-5
## 72 3PI-6
## 73 4FE-1
## 74 4FE-2
## 75 4FE-3
## 76 4FE-4
## 77 4FE-5
## 78 4FE-6
## 79 4FI-1
## 80 4FI-2
## 81 4FI-3
## 82 4FI-4
## 83 4FI-5
## 84 4FI-6
## 85 4PE-1
## 86 4PE-2
## 87 4PE-3
## 88 4PE-4
## 89 4PE-5
## 90 4PE-6
## 91 4PI-1
## 92 4PI-2
## 93 4PI-3
## 94 4PI-4
## 95 4PI-5
## 96 4PI-6
## 97 5FE-1
## 98 5FE-2
## 99 5FE-3
## 100 5FE-4
## 101 5FE-5
## 102 5FE-6
## 103 5FI-1
## 104 5FI-2
## 105 5FI-3
## 106 5FI-4
## 107 5FI-5
## 108 5FI-6
## 109 5PE-1
## 110 5PE-2
## 111 5PE-3
## 112 5PE-4
## 113 5PE-5
## 114 5PE-6
## 115 5PI-1
## 116 5PI-2
## 117 5PI-3
## 118 5PI-4
## 119 5PI-5
## 120 5PI-6
## 121 6FE-1
## 122 6FE-2
## 123 6FE-3
## 124 6FE-4
## 125 6FE-5
## 126 6FE-6
## 127 6FI-1
## 128 6FI-2
## 129 6FI-3
## 130 6FI-4
## 131 6FI-5
## 132 6FI-6
## 133 6PE-1
## 134 6PE-2
## 135 6PE-3
## 136 6PE-4
## 137 6PE-5
## 138 6PE-6
## 139 6PI-1
## 140 6PI-2
## 141 6PI-3
## 142 6PI-4
## 143 6PI-5
## 144 6PI-6
## 145 7FE-1
## 146 7FE-2
## 147 7FE-3
## 148 7FE-4
## 149 7FE-5
## 150 7FE-6
## 151 7FI-1
## 152 7FI-2
## 153 7FI-3
## 154 7FI-4
## 155 7FI-5
## 156 7FI-6
## 157 7PE-1
## 158 7PE-2
## 159 7PE-3
## 160 7PE-4
## 161 7PE-5
## 162 7PE-6
## 163 7PI-1
## 164 7PI-2
## 165 7PI-3
## 166 7PI-4
## 167 7PI-5
## 168 7PI-6
## 169 8FE-1
## 170 8FE-2
## 171 8FE-3
## 172 8FE-4
## 173 8FE-5
## 174 8FE-6
## 175 8FI-1
## 176 8FI-2
## 177 8FI-3
## 178 8FI-4
## 179 8FI-5
## 180 8FI-6
## 181 8PE-1
## 182 8PE-2
## 183 8PE-3
## 184 8PE-4
## 185 8PE-5
## 186 8PE-6
## 187 8PI-1
## 188 8PI-2
## 189 8PI-3
## 190 8PI-4
## 191 8PI-5
## 192 8PI-6
## 193 9FE-1
## 194 9FE-2
## 195 9FE-3
## 196 9FE-4
## 197 9FE-5
## 198 9FE-6
## 199 9FI-1
## 200 9FI-2
## 201 9FI-3
## 202 9FI-4
## 203 9FI-5
## 204 9FI-6
## 205 9PE-1
## 206 9PE-2
## 207 9PE-3
## 208 9PE-4
## 209 9PE-5
## 210 9PE-6
## 211 9PI-1
## 212 9PI-2
## 213 9PI-3
## 214 9PI-4
## 215 9PI-5
## 216 9PI-6
unique(eco)
## ecosystem
## 1 patch
## 13 pasture
unique(area)
## area
## 1 exterior
## 7 interior
unique(pyr)
## pyramid
## 1 1f
## 13 1p
## 25 2f
## 37 2p
## 49 3f
## 61 3p
## 73 4f
## 85 4p
## 97 5f
## 109 5p
## 121 6f
## 133 6p
## 145 7f
## 157 7p
## 169 8f
## 181 8p
## 193 9f
## 205 9p
unique(month)
## month
## 1 february
## 2 april
## 3 june
## 4 august
## 5 october
## 6 december
unique(ph)
## pH
## 1 4.78
## 2 4.58
## 3 5.26
## 4 4.12
## 5 3.94
## 6 3.51
## 7 4.56
## 8 4.50
## 9 4.75
## 10 4.43
## 11 4.20
## 12 4.29
## 13 5.71
## 14 5.46
## 15 6.20
## 16 5.42
## 17 5.94
## 18 4.74
## 19 5.80
## 20 5.51
## 21 5.74
## 22 5.85
## 23 5.53
## 24 4.41
## 25 4.05
## 26 3.91
## 27 4.21
## 28 4.23
## 29 3.57
## 30 3.48
## 31 4.59
## 32 4.32
## 33 4.64
## 34 4.09
## 35 4.01
## 36 3.33
## 37 5.67
## 38 5.23
## 40 4.98
## 41 5.40
## 42 4.72
## 43 5.68
## 44 5.19
## 45 5.59
## 46 5.37
## 47 5.34
## 48 4.47
## 49 4.61
## 50 4.24
## 51 4.66
## 52 3.82
## 54 3.34
## 55 4.33
## 56 4.27
## 57 4.77
## 58 4.60
## 59 4.93
## 60 4.02
## 61 5.54
## 62 5.38
## 63 6.10
## 65 5.01
## 66 5.39
## 67 5.81
## 68 5.57
## 69 6.07
## 70 5.69
## 71 5.36
## 72 5.07
## 74 4.04
## 75 5.11
## 76 4.63
## 77 3.73
## 78 3.92
## 79 4.45
## 80 4.49
## 82 4.46
## 83 4.28
## 85 5.62
## 87 6.00
## 88 5.18
## 89 5.49
## 90 4.37
## 91 5.58
## 92 5.60
## 96 5.08
## 97 4.89
## 98 4.18
## 99 5.27
## 103 5.05
## 104 4.06
## 105 5.09
## 108 3.90
## 109 5.72
## 111 5.43
## 112 4.99
## 116 5.13
## 120 4.40
## 121 4.00
## 126 3.49
## 128 4.11
## 130 4.10
## 132 3.52
## 133 5.10
## 137 5.25
## 138 4.73
## 140 6.08
## 141 5.70
## 142 5.76
## 144 NA
## 146 3.84
## 148 4.19
## 149 4.25
## 150 3.71
## 151 4.38
## 152 4.70
## 153 3.95
## 154 3.63
## 155 4.44
## 156 3.74
## 158 5.32
## 159 5.95
## 161 5.65
## 162 4.83
## 163 6.05
## 165 5.64
## 166 5.84
## 170 3.96
## 172 4.35
## 173 4.16
## 174 3.38
## 177 4.52
## 180 3.40
## 181 5.90
## 183 6.30
## 184 5.87
## 185 5.91
## 186 4.76
## 187 6.03
## 190 6.23
## 192 5.33
## 193 5.15
## 195 5.29
## 200 4.82
## 201 4.92
## 204 4.15
## 207 5.82
## 208 5.21
## 209 5.50
## 215 5.20
## 216 4.57
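The comparison against expected counts can also be done programmatically rather than by reading the printed output. The following is a minimal sketch on a small made-up data frame; the toy values and the names `expected`, `observed`, and `check` are assumptions for illustration, not part of the original analysis.

```r
# Toy stand-in for the first six columns of miteResponse.csv (values are made up).
m <- data.frame(
  point     = c("1FE-1", "1FE-2", "1PI-1", "1PI-2"),
  ecosystem = c("patch", "patch", "pasture", "pasture"),
  area      = c("exterior", "exterior", "interior", "interior"),
  pyramid   = c("1f", "1f", "1p", "1p"),
  month     = c("february", "april", "february", "april"),
  pH        = c(4.78, 4.58, 5.80, NA)
)

# Expected number of distinct values per categorical column for this toy data.
# For the full dataset the variable descriptions imply
# c(point = 216, ecosystem = 2, area = 2, pyramid = 18, month = 6).
expected <- c(point = 4, ecosystem = 2, area = 2, pyramid = 2, month = 2)

# Count distinct values in each column named in `expected`.
observed <- vapply(m[names(expected)], function(col) length(unique(col)), integer(1))

# One row per column: observed count, expected count, and whether they agree.
check <- data.frame(observed, expected, ok = observed == expected)
print(check)
```

Any row with `ok` equal to FALSE would point at a column worth inspecting by hand.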
The pyramid variable appears to take the form xf or xp, where x is the pyramid number and f and p refer to patch and pasture respectively.
As can be seen below, every f corresponds to patch and every p to pasture.
pyraPlot <- data.frame(x=pyr, y=eco)
unique(pyraPlot)
## pyramid ecosystem
## 1 1f patch
## 13 1p pasture
## 25 2f patch
## 37 2p pasture
## 49 3f patch
## 61 3p pasture
## 73 4f patch
## 85 4p pasture
## 97 5f patch
## 109 5p pasture
## 121 6f patch
## 133 6p pasture
## 145 7f patch
## 157 7p pasture
## 169 8f patch
## 181 8p pasture
## 193 9f patch
## 205 9p pasture
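The same correspondence can be tested directly by splitting the pyramid code and comparing its letter with the ecosystem column. This is a sketch on made-up rows; the column names `number`, `suffix`, and `consistent` and the `lookup` vector are assumptions introduced for the example.

```r
# Toy stand-in for the pyramid and ecosystem columns (values are made up).
d <- data.frame(
  pyramid   = c("1f", "1p", "2f", "2p", "9f", "9p"),
  ecosystem = c("patch", "pasture", "patch", "pasture", "patch", "pasture")
)

# Split "xf"/"xp" into the pyramid number and the ecosystem letter.
d$number <- as.integer(sub("[fp]$", "", d$pyramid))
d$suffix <- sub("^[0-9]+", "", d$pyramid)

# Translate the letter using the mapping inferred in the text.
lookup <- c(f = "patch", p = "pasture")
d$consistent <- unname(lookup[d$suffix]) == d$ecosystem

# Rows where the suffix disagrees with the ecosystem column (none expected).
mismatches <- d[!d$consistent, ]
print(nrow(mismatches))
```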
point is used as a unique ID and is a condensed version of several other variables.
For example, 1FE-1 corresponds to pyramid 1f (1F), area exterior (E), and month 1 (February).
The months are encoded as follows: February = 1, April = 2, June = 3, August = 4, October = 5, December = 6.
The ID therefore takes the form pyramid, area, then a hyphen and the month code, all condensed into one short index.
As shown before, there are 216 unique values for point, which matches 18 pyramids x 2 areas x 6 months, so the IDs are usable as a primary key of sorts.
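The decoding described above can be checked by rebuilding pyramid, area, and month from each ID and comparing them with the stored columns. The sketch below uses made-up rows that follow the stated format; the regular expression `pattern` and the helper names `decoded`, `ids_consistent`, and `ids_unique` are assumptions for illustration.

```r
# Toy stand-in rows (values are made up, but follow the format described above).
d <- data.frame(
  point   = c("1FE-1", "1FI-2", "6PI-6", "9PE-3"),
  area    = c("exterior", "interior", "interior", "exterior"),
  pyramid = c("1f", "1f", "6p", "9p"),
  month   = c("february", "april", "december", "june")
)

# Pattern: pyramid number, ecosystem letter (F/P), area letter (E/I), "-", month code.
pattern <- "^([0-9]+)([FP])([EI])-([1-6])$"
stopifnot(all(grepl(pattern, d$point)))  # every ID must follow the format

# Rebuild the three source columns from the ID alone.
months <- c("february", "april", "june", "august", "october", "december")
areas  <- c(E = "exterior", I = "interior")
decoded <- data.frame(
  pyramid = paste0(sub(pattern, "\\1", d$point), tolower(sub(pattern, "\\2", d$point))),
  area    = unname(areas[sub(pattern, "\\3", d$point)]),
  month   = months[as.integer(sub(pattern, "\\4", d$point))]
)

# TRUE when the ID agrees with the other columns and contains no duplicates.
ids_consistent <- all(decoded$pyramid == d$pyramid,
                      decoded$area == d$area,
                      decoded$month == d$month)
ids_unique <- anyDuplicated(d$point) == 0
print(c(consistent = ids_consistent, unique = ids_unique))
```

`anyDuplicated()` returns 0 when no value repeats, which is the property a primary key needs.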
Looking for outliers in pH may help reveal erroneous values.
As can be seen from the boxplot below, there are no outliers, and all the values lie between 3.33 and 6.3, all of which are acceptable values for pH in general.
The interquartile range spans 4.2 to 5.475, which the Natural Resources Conservation Service classifies in the ‘Extremely acidic’ to ‘Strongly acidic’ range. The dataset is from Colombia, and acidic soil is typical for South America.
There is 1 NA value, corresponding to 6PI-6.
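To make the classification step concrete, the sketch below bins the summary statistics of pH with `cut()`. The class boundaries and labels are an assumption based on the USDA NRCS soil reaction classes and should be checked against the official table; the names `ph_values`, `breaks`, and `labels` are introduced only for this example.

```r
# Quartiles and extremes reported in the summary of pH.
ph_values <- c(min = 3.33, q1 = 4.2, median = 4.78, q3 = 5.475, max = 6.3)

# Class boundaries: an assumption based on the USDA NRCS soil reaction classes;
# verify them against the official table before relying on them.
breaks <- c(-Inf, 3.5, 4.5, 5.1, 5.6, 6.1, 6.6, Inf)
labels <- c("Ultra acid", "Extremely acid", "Very strongly acid",
            "Strongly acid", "Moderately acid", "Slightly acid",
            "Neutral or alkaline")

# right = FALSE makes each interval closed on the left, e.g. [3.5, 4.5).
ph_class <- cut(ph_values, breaks = breaks, labels = labels, right = FALSE)
print(data.frame(pH = ph_values, class = ph_class))
```

Under these assumed boundaries the first and third quartiles fall in the extremely acid and strongly acid classes respectively.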
boxplot(ph)
summary(ph)
## pH
## Min. :3.330
## 1st Qu.:4.200
## Median :4.780
## Mean :4.847
## 3rd Qu.:5.475
## Max. :6.300
## NA's :1
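The visual check with the boxplot can be complemented by a numeric one. The sketch below uses `boxplot.stats()`, which applies the same 1.5 x IQR rule that `boxplot()` uses by default, and looks up the ID of any missing reading. The data frame is illustrative only: the IDs `s1` to `s7` are hypothetical and the NA is placed on purpose.

```r
# Illustrative data only: a handful of pH readings with one NA placed on purpose.
d <- data.frame(
  point = c("s1", "s2", "s3", "s4", "s5", "s6", "s7"),  # hypothetical IDs
  pH    = c(4.78, 4.58, 5.26, 4.12, 3.94, 3.51, NA)
)

# Values beyond 1.5 * IQR from the hinges, the rule boxplot() uses by default.
outliers <- boxplot.stats(d$pH)$out

# IDs of samples with a missing pH.
missing_ids <- as.character(d$point[is.na(d$pH)])

# Basic plausibility check: every recorded pH lies on the 0-14 scale.
in_range <- all(d$pH >= 0 & d$pH <= 14, na.rm = TRUE)

print(list(outliers = outliers, missing = missing_ids, in_range = in_range))
```

An empty `outliers` vector corresponds to a boxplot with no points drawn beyond the whiskers.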