x<-1
library(readxl)
arsenic<-read_excel(path="C:/Users/lenovo/Desktop/arsenic.xlsx")
flouride<-read_excel(path="C:/Users/lenovo/Desktop/flouride.xlsx")
summary(arsenic)
   location         n_wells_tested   percent_wells_above_guideline     median
 Length:917         Min.   :  0.00   Min.   : 0.000                Min.   : 0.250
 Class :character   1st Qu.:  0.00   1st Qu.: 3.225                1st Qu.: 0.500
 Mode  :character   Median :  5.00   Median : 8.300                Median : 1.000
                    Mean   : 33.99   Mean   :12.411                Mean   : 1.617
                    3rd Qu.: 41.00   3rd Qu.:18.375                3rd Qu.: 1.887
                    Max.   :632.00   Max.   :58.900                Max.   :14.000
                                     NA's   :575                   NA's   :575
 percentile_95        maximum
 Min.   :  0.500   Min.   :   0.00
 1st Qu.:  6.265   1st Qu.:   6.20
 Median : 13.650   Median :  24.00
 Mean   : 25.550   Mean   :  67.35
 3rd Qu.: 28.350   3rd Qu.:  64.00
 Max.   :372.500   Max.   :3100.00
 NA's   :575       NA's   :364
summary(flouride)
   location         n_wells_tested   percent_wells_above_guideline     median
 Length:917         Min.   :  0.00   Min.   : 0.000                Min.   :0.1000
 Class :character   1st Qu.:  0.00   1st Qu.: 0.000                1st Qu.:0.1000
 Mode  :character   Median :  6.00   Median : 0.600                Median :0.1000
                    Mean   : 38.17   Mean   : 2.448                Mean   :0.1762
                    3rd Qu.: 49.00   3rd Qu.: 3.125                3rd Qu.:0.2000
                    Max.   :503.00   Max.   :30.000                Max.   :1.2900
                                     NA's   :557                   NA's   :557
 percentile_95       maximum
 Min.   :0.1000   Min.   : 0.0500
 1st Qu.:0.5195   1st Qu.: 0.4225
 Median :0.9855   Median : 1.3000
 Mean   :1.1471   Mean   : 1.8987
 3rd Qu.:1.5995   3rd Qu.: 2.9000
 Max.   :4.4400   Max.   :14.0000
 NA's   :557      NA's   :363

The arsenic data set has 917 observations on 6 variables. Location is measured on a nominal scale and has no missing values. The other variables, n_wells_tested, percent_wells_above_guideline, median, percentile_95, and maximum, are numeric and have 0, 575, 575, 575, and 364 missing values respectively. A subset is taken that has no missing values in any variable.

The fluoride data set also has 917 observations on 6 variables. Location is measured on a nominal scale and has no missing values. The other variables, n_wells_tested, percent_wells_above_guideline, median, percentile_95, and maximum, are numeric and have 0, 557, 557, 557, and 363 missing values respectively. A subset is taken that has no missing values in any variable.

> mydata1 <- na.omit(arsenic)
> mydata2 <- na.omit(flouride)
> summary(mydata1)
   location         n_wells_tested   percent_wells_above_guideline     median
 Length:342         Min.   : 20.00   Min.   : 0.000                Min.   : 0.250
 Class :character   1st Qu.: 36.00   1st Qu.: 3.225                1st Qu.: 0.500
 Mode  :character   Median : 57.00   Median : 8.300                Median : 1.000
                    Mean   : 86.51   Mean   :12.411                Mean   : 1.617
                    3rd Qu.:108.75   3rd Qu.:18.375                3rd Qu.: 1.887
                    Max.   :632.00   Max.   :58.900                Max.   :14.000
 percentile_95        maximum
 Min.   :  0.500   Min.   :   1.00
 1st Qu.:  6.265   1st Qu.:  18.00
 Median : 13.650   Median :  39.00
 Mean   : 25.550   Mean   :  96.10
 3rd Qu.: 28.350   3rd Qu.:  92.75
 Max.   :372.500   Max.   :3100.00
> summary(mydata2)
   location         n_wells_tested   percent_wells_above_guideline     median
 Length:360         Min.   : 20.00   Min.   : 0.000                Min.   :0.1000
 Class :character   1st Qu.: 40.00   1st Qu.: 0.000                1st Qu.:0.1000
 Mode  :character   Median : 64.00   Median : 0.600                Median :0.1000
                    Mean   : 93.33   Mean   : 2.448                Mean   :0.1762
                    3rd Qu.:118.25   3rd Qu.: 3.125                3rd Qu.:0.2000
                    Max.   :503.00   Max.   :30.000                Max.   :1.2900
 percentile_95       maximum
 Min.   :0.1000   Min.   : 0.100
 1st Qu.:0.5195   1st Qu.: 1.130
 Median :0.9855   Median : 2.100
 Mean   :1.1471   Mean   : 2.491
 3rd Qu.:1.5995   3rd Qu.: 3.400
 Max.   :4.4400   Max.   :14.000
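
The missing-value counts and the effect of na.omit() can be checked directly. The following is a minimal sketch on a small made-up data frame (the name toy and all of its values are hypothetical, not taken from the arsenic or fluoride files); it counts the NAs per column and confirms that na.omit() keeps only rows that are complete in every column.

```r
# Toy data frame standing in for one of the well tables (values are made up).
toy <- data.frame(
  location = c("A", "B", "C", "D"),
  n_wells_tested = c(0, 25, 40, 12),
  median = c(NA, 1.0, 0.5, NA),
  maximum = c(NA, 39, 18, 7),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit() drops every row that has an NA in any column.
complete <- na.omit(toy)
print(nrow(complete))
```

Because a row is dropped if any single column is missing, the number of rows kept can be smaller than 917 minus the largest per-column NA count would suggest only when the NAs fall in different rows; here the retained counts (342 and 360) equal 917 minus 575 and 917 minus 557.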

The arsenic subset contains 342 complete observations on 6 variables. The fluoride subset contains 360 complete observations on 6 variables.

> total <- merge(mydata1,mydata2,by="location")
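
To illustrate what merge() does with by = "location", here is a small sketch on two made-up tables (arsenic_toy and fluoride_toy are hypothetical names and values). By default merge() performs an inner join, so only locations present in both tables survive, and columns that share a name get the suffixes .x (first table) and .y (second table).

```r
# Two toy tables that share the key column "location" (values are made up).
arsenic_toy <- data.frame(location = c("A", "B", "C"),
                          median = c(1.0, 0.5, 2.0))
fluoride_toy <- data.frame(location = c("B", "C", "D"),
                           median = c(0.1, 0.2, 0.1))

# Inner join on location: A and D are dropped because each appears in one table only.
joined <- merge(arsenic_toy, fluoride_toy, by = "location")
print(joined)
print(names(joined))
```

This is why the merged data can have fewer rows than either input. Passing all = TRUE to merge() would keep unmatched locations and fill the missing side with NA instead.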

The two subsets are joined on the variable location. The merged data set has 341 observations; variables ending in .x come from the arsenic subset and variables ending in .y come from the fluoride subset.

> basicStats(total$n_wells_tested.x)
            X..total.n_wells_tested.x
nobs                       341.000000
NAs                          0.000000
Minimum                     20.000000
Maximum                    632.000000
1. Quartile                 36.000000
3. Quartile                109.000000
Mean                        86.686217
Median                      57.000000
Sum                      29560.000000
SE Mean                      4.413053
LCL Mean                    78.005894
UCL Mean                    95.366540
Variance                  6640.986545
Stdev                       81.492248
Skewness                     2.664927
Kurtosis                     9.665998
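
The SE Mean, LCL Mean, and UCL Mean rows follow from nobs, Mean, and Stdev. The sketch below recomputes them from the printed values for the number of wells tested for arsenic, assuming a t-based 95% confidence interval (the 0.95 level is an assumption here, taken to be the basicStats default).

```r
# Values copied from the basicStats output for total$n_wells_tested.x.
n <- 341
mean_x <- 86.686217
stdev <- 81.492248

# Standard error of the mean.
se_mean <- stdev / sqrt(n)

# Two-sided 95% interval using the t distribution with n - 1 degrees of freedom.
t_crit <- qt(0.975, df = n - 1)
lcl <- mean_x - t_crit * se_mean
ucl <- mean_x + t_crit * se_mean

print(c(se_mean = se_mean, lcl = lcl, ucl = ucl))
```

The results agree with the printed SE Mean (4.413053), LCL Mean (78.005894), and UCL Mean (95.366540) up to rounding of the inputs.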

The mean number of wells tested for arsenic is about 87, with a high standard deviation (about 81). The skewness (2.66) indicates that the distribution of the number of wells tested is positively skewed, so the best measure of central tendency is the median, 57 wells tested.

> basicStats(total$percent_wells_above_guideline.x)
            X..total.percent_wells_above_guideline.x
nobs                                      341.000000
NAs                                         0.000000
Minimum                                     0.000000
Maximum                                    58.900000
1. Quartile                                 3.200000
3. Quartile                                18.300000
Mean                                       12.371848
Median                                      8.300000
Sum                                      4218.800000
SE Mean                                     0.658457
LCL Mean                                   11.076686
UCL Mean                                   13.667009
Variance                                  147.845793
Stdev                                      12.159186
Skewness                                    1.236798
Kurtosis                                    1.009462

The mean percentage of wells above the arsenic guideline is about 12%, with a high standard deviation (about 12). The skewness (1.24) indicates that the distribution of the percentage of wells above the guideline is positively skewed: few locations have a large percentage of wells above the guideline. The best measure of central tendency is the median, 8.3% of wells above the guideline.

> basicStats(total$median.x)
            X..total.median.x
nobs               341.000000
NAs                  0.000000
Minimum              0.250000
Maximum             14.000000
1. Quartile          0.500000
3. Quartile          1.900000
Mean                 1.620073
Median               1.000000
Sum                552.445000
SE Mean              0.104448
LCL Mean             1.414627
UCL Mean             1.825520
Variance             3.720124
Stdev                1.928762
Skewness             2.748265
Kurtosis             9.151919

The minimum and maximum of the median arsenic level are 0.25 ug/L and 14 ug/L respectively. The distribution is positively skewed (skewness 2.75), indicating that few observations have high median values. The median of the median arsenic level is 1 ug/L.

> basicStats(total$percentile_95.x)
            X..total.percentile_95.x
nobs                      341.000000
NAs                         0.000000
Minimum                     0.500000
Maximum                   372.500000
1. Quartile                 6.220000
3. Quartile                28.200000
Mean                       24.872824
Median                     13.550000
Sum                      8481.633000
SE Mean                     1.961244
LCL Mean                   21.015125
UCL Mean                   28.730523
Variance                 1311.648654
Stdev                      36.216690
Skewness                    4.542307
Kurtosis                   30.723733

The minimum and maximum of the 95th percentile reading for arsenic are 0.5 ug/L and 372.5 ug/L respectively. The distribution is positively skewed (skewness 4.54), indicating that few observations have high 95th percentile readings. The median of the 95th percentile reading for arsenic is 13.55 ug/L.

> basicStats(total$maximum.x)
            X..total.maximum.x
nobs                 341.00000
NAs                    0.00000
Minimum                1.00000
Maximum             3100.00000
1. Quartile           18.00000
3. Quartile           92.00000
Mean                  95.03372
Median                39.00000
Sum                32406.50000
SE Mean               11.92396
LCL Mean              71.57971
UCL Mean             118.48774
Variance           48483.63842
Stdev                220.19001
Skewness               9.06942
Kurtosis             108.40563

The distribution of maximum arsenic is strongly positively skewed (skewness 9.07): few observations have very high maximum values. The median of maximum arsenic is 39 ug/L.

> basicStats(total$n_wells_tested.y)
            X..total.n_wells_tested.y
nobs                       341.000000
NAs                          0.000000
Minimum                     21.000000
Maximum                    503.000000
1. Quartile                 43.000000
3. Quartile                126.000000
Mean                        97.208211
Median                      69.000000
Sum                      33148.000000
SE Mean                      4.416756
LCL Mean                    88.520604
UCL Mean                   105.895818
Variance                  6652.135932
Stdev                       81.560627
Skewness                     2.106506
Kurtosis                     5.365651

The mean number of wells tested for fluoride is about 97, with a high standard deviation (about 82). The skewness (2.11) indicates that the distribution of the number of wells tested is positively skewed: a few locations have a large number of wells tested. The best measure of central tendency is the median, 69 wells tested.

> basicStats(total$percent_wells_above_guideline.y)
            X..total.percent_wells_above_guideline.y
nobs                                      341.000000
NAs                                         0.000000
Minimum                                     0.000000
Maximum                                    30.000000
1. Quartile                                 0.000000
3. Quartile                                 3.100000
Mean                                        2.464809
Median                                      0.700000
Sum                                       840.500000
SE Mean                                     0.222086
LCL Mean                                    2.027974
UCL Mean                                    2.901645
Variance                                   16.818876
Stdev                                       4.101082
Skewness                                    2.726037
Kurtosis                                    9.524881

The mean percentage of wells above the fluoride guideline is about 2.46%, with a high standard deviation (about 4.1). The skewness (2.73) indicates that the distribution of the percentage of wells above the guideline is positively skewed: a few locations have a large percentage of wells above the guideline. The best measure of central tendency is the median, 0.7% of wells above the guideline.

> basicStats(total$median.y)
            X..total.median.y
nobs               341.000000
NAs                  0.000000
Minimum              0.100000
Maximum              1.290000
1. Quartile          0.100000
3. Quartile          0.200000
Mean                 0.176525
Median               0.100000
Sum                 60.195000
SE Mean              0.008395
LCL Mean             0.160011
UCL Mean             0.193038
Variance             0.024035
Stdev                0.155031
Skewness             3.496615
Kurtosis            16.132907

The minimum and maximum of the median fluoride level are 0.1 ug/L and 1.29 ug/L respectively. The distribution is positively skewed (skewness 3.50), indicating that few observations have high median values. The median of the median fluoride level is 0.1 ug/L.

> basicStats(total$percentile_95.y)
            X..total.percentile_95.y
nobs                      341.000000
NAs                         0.000000
Minimum                     0.100000
Maximum                     4.180000
1. Quartile                 0.522000
3. Quartile                 1.601000
Mean                        1.152806
Median                      1.003000
Sum                       393.107000
SE Mean                     0.043660
LCL Mean                    1.066930
UCL Mean                    1.238683
Variance                    0.649998
Stdev                       0.806225
Skewness                    0.928524
Kurtosis                    0.659096

The minimum and maximum of the 95th percentile reading for fluoride are 0.1 ug/L and 4.18 ug/L respectively. The distribution is slightly positively skewed (skewness 0.93), indicating that few observations have high 95th percentile readings. The median of the 95th percentile reading for fluoride is about 1.00 ug/L.

> basicStats(total$maximum.y)
            X..total.maximum.y
nobs                341.000000
NAs                   0.000000
Minimum               0.100000
Maximum              14.000000
1. Quartile           1.200000
3. Quartile           3.400000
Mean                  2.546305
Median                2.200000
Sum                 868.290000
SE Mean               0.102967
LCL Mean              2.343772
UCL Mean              2.748838
Variance              3.615363
Stdev                 1.901411
Skewness              1.594526
Kurtosis              4.763082

The distribution of maximum fluoride is positively skewed (skewness 1.59): few observations have high maximum values. The median of maximum fluoride is 2.2 ug/L.
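
The skewness readings used throughout can be illustrated with a short base R sketch. It uses one common moment-based definition of sample skewness; whether it matches the fBasics estimator digit for digit is an assumption, and the helper name moment_skewness and the sample values are hypothetical.

```r
# Moment-based sample skewness: third central moment divided by sd cubed.
moment_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  mean((x - m)^3) / s^3
}

# Made-up right-skewed sample: most values are small, one is very large.
x <- c(1, 2, 2, 3, 3, 4, 50)

print(moment_skewness(x))                     # positive for a right-skewed sample
print(c(mean = mean(x), median = median(x)))  # the large value pulls the mean up
```

A positive value means a long right tail, which pulls the mean above the median. That is why the median is preferred as the measure of central tendency for the skewed variables above.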

Code:
x<-1
library(readxl)
arsenic<-read_excel(path="C:/Users/lenovo/Desktop/arsenic.xlsx")
flouride<-read_excel(path="C:/Users/lenovo/Desktop/flouride.xlsx")
summary(arsenic)
summary(flouride)

mydata1 <- na.omit(arsenic)
mydata2 <- na.omit(flouride)
summary(mydata1)
summary(mydata2)

total <- merge(mydata1,mydata2,by="location")

library(fBasics)
basicStats(total$n_wells_tested.x)
basicStats(total$percent_wells_above_guideline.x)
basicStats(total$median.x)
basicStats(total$percentile_95.x)
basicStats(total$maximum.x)

basicStats(total$n_wells_tested.y)
basicStats(total$percent_wells_above_guideline.y)
basicStats(total$median.y)
basicStats(total$percentile_95.y)
basicStats(total$maximum.y)
