Abstract

This report continues the research from the previous report. Now that the data set has been cleaned, the data for the United States and China are extracted and summarized using descriptive statistics and graphics. The results indicate that the United States has higher water accessibility than China, and that the data are approximately normally distributed.

Introduction

This week, I will look to answer one of the questions I came up with in the last report: “Are the countries with the largest economies necessarily those with the highest percent of their population with access to clean water?” The two largest economies in the world are those of the United States and China. The standard of living in a country is most commonly measured by GDP per capita. This value differs widely from total GDP because it divides output by population, which implicitly assumes that wealth is distributed uniformly (in reality it is not). Median household income, however, is a better estimator because it accounts for an unequal distribution of wealth. According to a 2013 Gallup poll, these values for China and the United States are $6,180 and $43,585, respectively. Based on these numbers, I would expect the United States to have a higher percent of its population with access to drinking water than China does. This report attempts to answer that question.

Methods

To compare the United States and China, it is first necessary to extract these countries’ data from the parent data set. Because the observation count is very small (26), I decided to do this in Google Sheets. The original data set was downloaded previously, and from it I created another spreadsheet with 3 variables: “Year”, “US”, and “China”. I saved this as a CSV file and imported it into RStudio using the following command:

Using this data set, I will examine the ability of the populations of both China and the United States to access drinking water. The examination will include tables and graphics showing descriptive statistics, including the mean, median, minimum, maximum, quartiles, standard deviation, kurtosis, and skewness, along with scatter plots, histograms, and Q-Q plots.
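As an illustration of the three-column layout described above, the sketch below parses a CSV with the columns “Year”, “US”, and “China”. It is written in Python rather than R and is only a sketch: the function name `load_water_access` and the sample rows are hypothetical placeholders, not the report's data or its original import command.

```python
import csv
import io


def load_water_access(csv_text):
    """Parse CSV text with columns Year, US, China into three lists."""
    reader = csv.DictReader(io.StringIO(csv_text))
    years, us, china = [], [], []
    for row in reader:
        years.append(int(row["Year"]))
        us.append(float(row["US"]))
        china.append(float(row["China"]))
    return years, us, china


# Placeholder rows for illustration only; these are NOT the report's data.
sample = "Year,US,China\n1990,98.0,67.0\n1991,98.1,68.0\n"
years, us, china = load_water_access(sample)
print(years, us, china)
```

With the real file, the text passed to the function would come from reading the saved CSV instead of the inline sample string.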

Results and Discussion

To better visualize the data, I will use a scatter plot that directly reflects the complete refined data set in Table 1.
In Figure 1, we can see the progression of both countries over the years 1990 to 2015. The United States shows little improvement, but this is because its starting point was already very high; given that the upper limit is 100, a low rate of change is expected. China, on the other hand, started significantly lower and made great progress over the years, nearly reaching the values of the United States.

Next, I will look at the basic descriptive statistics of both countries:

Table 2 shows the mean, median, minimum, maximum, and the 1st and 3rd quartiles of both countries. We can see that, on average, the United States’ population has had substantially greater access to drinking water than China’s population. There could be many explanations for this. One explanation is that the United States has been a superpower economy for over a century, giving it ample time to improve living conditions, while China saw its rapid increase in production far more recently and has not had equal time to reach the highest standards. The mean value of the United States is 98.85, which means that, averaged over 1990 to 2015, 98.85% of the population of the United States had access to drinking water. While this is a high value, it is also surprising that the United States, the largest economy in the world, had not reached a value of 100 by 2015. The mean value of China is 82.63. Given the history of China and its minimum value of 66.9, it is clear that the country has made strides in improving the quality of life of its population, and by 2015 it had improved to 95.5, close to the level of the United States. More generally, this might suggest that improvement in water access lags behind a country’s economic growth.
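The statistics of the kind reported in Table 2 can be computed with a short sketch like the one below. It uses Python's standard library rather than R, and the function name `summarize` and the input values are hypothetical placeholders, not the report's data. The `inclusive` quartile method is chosen on the assumption that the original figures came from R's default quantile interpolation (type 7), which it matches.

```python
from statistics import mean, median, quantiles


def summarize(values):
    """Return the summary statistics reported in a table like Table 2."""
    # n=4 yields the three quartile cut points; the middle one is the median.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "mean": mean(values),
        "q3": q3,
        "max": max(values),
    }


# Placeholder values for illustration only; NOT the report's data.
placeholder = [1, 2, 3, 4, 5]
print(summarize(placeholder))
```

Applying `summarize` once to the “US” column and once to the “China” column would give the two rows of such a table.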

Further, the distributional statistics of both countries are found below in Table 3:

These statistics tell us more about how the data are shaped. The standard deviation measures the dispersion of the data about the mean. A high value indicates that a significant portion of the data lies far from the mean, so that, visually, the distribution about the mean appears “wide”. A low value indicates that the data lie very close to the mean, so that the distribution appears “narrow”. We can see in Table 3 that the standard deviation of the United States is very low, 0.25. This means that over the past 26 years, the share of the country’s population with access to drinking water has been consistent and has stayed close to the mean value. China, however, has a high standard deviation. This reflects the serious improvement China has made in 26 years: because of the steady progression, the data contain values much lower and much higher than the average. In the calculation of the standard deviation, the differences between the data values and the mean are squared to obtain positive values, summed, divided by the number of observations minus one, and square rooted; thus, the high value shows that much of the data is far from the mean.
The skewness and kurtosis tell us about the symmetry about the mean and the thickness of the tails, respectively. In Table 3, the skewness values of the US and China are -0.32 and -0.21, respectively. These values are close to 0, the skewness of a normal distribution; however, they do indicate a slight left skew in each distribution. In theory, this implies that the median of the data is slightly greater than the mean: a few lower values pull the mean down, so somewhat more of the years lie above the mean than below it. The kurtosis values of both countries are also very similar, and both are less than 3, the kurtosis of a normal distribution. This means that less of the data lies in the tails of each distribution than would under a normal distribution, that is, there are fewer extreme values.
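The three shape statistics discussed above can be sketched as follows. This is a Python illustration under stated assumptions, not the report's own calculation: the standard deviation uses the sample (n - 1) form, while skewness and kurtosis use the simple moment-based definitions, with kurtosis not reduced by 3 so that a normal distribution scores 3. The function name `shape_statistics` and the input values are hypothetical.

```python
from math import sqrt


def shape_statistics(values):
    """Sample standard deviation plus moment-based skewness and kurtosis."""
    n = len(values)
    avg = sum(values) / n
    deviations = [v - avg for v in values]
    # Central moments, each averaged over n observations.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    return {
        # Squared deviations are summed, divided by n - 1, then square rooted.
        "sd": sqrt(sum(d ** 2 for d in deviations) / (n - 1)),
        # Negative values indicate a longer left tail.
        "skewness": m3 / m2 ** 1.5,
        # Values below 3 indicate lighter tails than a normal distribution.
        "kurtosis": m4 / m2 ** 2,
    }


# Placeholder values for illustration only; NOT the report's data.
print(shape_statistics([1, 2, 3, 4, 5]))
```

Other software may apply small-sample corrections to skewness and kurtosis, so results can differ slightly from the values in Table 3 depending on which definition was used.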

The distributions of each country compared with the normal distribution curve are below in Figure 2 and Figure 3:

These distributions give us an idea of how close to normal the data are. Many standard statistical methods assume normally distributed data, so it is ideal to work with data that are close to being normally distributed. The data in this paper are not perfectly normal; however, they are close enough to carry out further analysis.
Another useful tool for assessing the normality of the data is the Q-Q plot, which plots the ordered data values against the quantiles expected under a normal distribution. This tool can be more visually indicative of normality because it does not rely on the number of bins and the bin size as a histogram does. In theory, if the points fall on a straight line, the data are consistent with a normal distribution. In Figure 4 and Figure 5 below, we can see how the data values of each country compare to a straight line:

Here again the data are not perfectly normal, but they follow the general trend of a straight line, which indicates that the data are close to being normal. This allows the analysis to continue with statistical meaning and impact.
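The idea behind a normal Q-Q plot can be sketched without any plotting library: pair each sorted data value with the standard normal quantile at the same rank, then measure how close those pairs are to a straight line. The sketch below is in Python and rests on assumptions: the plotting positions (i - 0.5) / n are one common convention, the correlation coefficient is used here only as a rough numeric stand-in for judging straightness by eye, and the names `qq_points` and `straightness` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_points(data):
    """Pair each sorted value with its theoretical standard normal quantile."""
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 suggest a line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Placeholder values for illustration only; NOT the report's data.
points = qq_points([2.0, 3.5, 1.0, 4.0, 3.0])
print(straightness(points))
```

Plotting the first element of each pair on the horizontal axis and the second on the vertical axis gives the kind of picture shown in a Q-Q plot.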

Conclusion

The purpose of this report was to get a better idea of the key descriptive statistics and the shape of the data. After performing the necessary analysis, we find that the United States has a higher percent of its population with access to clean water than China has. Despite this, China has made far greater improvement, increasing by 29.9% over the past 26 years. This shows that China is making improvements and is quickly approaching the level of the United States. Given the histograms and Q-Q plots, we can also conclude that the data are approximately normal, which suggests that they can be treated as if they were normal in further analysis. Continued research on this topic will attempt to model the data based on the year and test the significance of these models.

References

Phelps, Glenn, and Steve Crabtree. “Worldwide, Median Household Income About $10,000.” Gallup.com, 16 Dec. 2013, news.gallup.com/poll/166211/worldwide-median-household-income-000.aspx.