Introduction

The purpose of this document is to interpret information on global populations provided by “worldometer.com”. This webpage lists the countries of the world by population (2020) in a table. By examining this data, we can learn how countries with different population sizes compare in terms of fertility rate, median age, and other factors, and better understand how we might address global issues such as human overpopulation.

Load Packages

All packages are installed, loaded, and ready for use.

-tidyverse includes ggplot2, which can be used to create visualizations and graphs.

-rmarkdown and knitr allow for reporting

-XML allows for working with XML table objects on HTML webpages

-httr is useful for making HTTP requests, including web authentication

-rvest provides useful tools for working with HTML and XML

-countrycode converts between country names and coding schemes, which is used here to add a new continent column

Defining the HTML source

The data set has been downloaded from the web source and put into a data frame.
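As a rough sketch of this step, the snippet below reads an HTML table into a data frame with rvest. The report does not show which function was actually used (both XML and rvest are loaded), so the choice of `read_html()` and `html_table()` is an assumption, and the inline HTML with made-up rows only stands in for the real page.

```r
library(rvest)

# Stand-in for the downloaded page: a tiny inline HTML table with made-up rows.
# In the real workflow, read_html() would be given the page URL instead.
page <- read_html('<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Exampleland</td><td>1,000,000</td></tr>
  <tr><td>Sampleton</td><td>250,000</td></tr>
</table>')

# Select the first <table> element and convert it to a plain data frame
population_raw <- as.data.frame(html_table(html_element(page, "table")))
```

Numbers that contain thousands separators are likely to arrive as text rather than numeric values, which is why a cleaning step is still needed afterwards.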

Data Wrangling

After the dataset has been imported, the first step is to change the data types so that they can be used appropriately for analysis. The country column will be changed to character, while all other columns in the data set will be changed to numeric. Because the numeric values are initially stored as factors, they will be converted to character first and then to numeric; converting a factor directly to numeric would return its internal level codes rather than the original values.
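The conversion described above can be sketched as follows. The data frame, its column names, and its values are made up for illustration, and stripping thousands separators is an assumption about how the raw values are formatted.

```r
# Made-up stand-in for the imported table; values are illustrative only
raw <- data.frame(
  Country    = factor(c("Exampleland", "Sampleton", "Testonia")),
  Population = factor(c("1,000,000", "250,000", "75,500"))
)

# Converting a factor straight to numeric returns the internal level codes,
# not the values that are printed
level_codes <- as.numeric(raw$Population)

# Going through character first keeps the printed values; commas are
# removed (assumed formatting) so that as.numeric() can parse the text
to_number <- function(x) as.numeric(gsub(",", "", as.character(x)))

clean <- raw
clean$Country    <- as.character(raw$Country)
clean$Population <- to_number(raw$Population)
```

Comparing `level_codes` with `clean$Population` shows why the intermediate character step matters.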

The following is an interactive table for the user.

Explored Analysis 1

For the countries that have the highest Net Change, how much of this Net Change is due to migration? This is an interesting question because two common concerns for many countries are overpopulation and an influx of immigrants. Seeing whether immigration leads to overpopulation in a given country could affect lawmaking decisions.

Now that the data has been loaded and cleaned for this analysis, we first need to determine which countries have the highest net change. We can do this by creating an interactive table and then clicking on the top row under “Net Migrants” to sort the rows in order. While this table gives the information we need for this specific question, it is also a good tool for helping the user understand the data. Then, using this information, we can create a bar graph that shows the number of migrants for these top countries.

To my surprise, it appears that these countries (which have huge net positive changes in population) have negative values for Migrants. That means that even though the population of these countries is growing overall, people are still migrating away from them in large numbers. A next step in this research would be to see whether there is a linear relationship between net population change and the number of migrants, to see whether this pattern holds globally, and then to use linear regression to try to predict this relationship.
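A minimal sketch of both the ranking step and the suggested regression is shown below. The column names `NetChange` and `Migrants` and all values are hypothetical, and the cut-off of three countries is arbitrary.

```r
# Made-up values for illustration; column names are assumptions
pop <- data.frame(
  Country   = c("A", "B", "C", "D", "E"),
  NetChange = c(500, 1200, 300, 900, 50),
  Migrants  = c(-40, -150, 20, -60, 5)
)

# Countries ordered from highest to lowest net change, keeping the top three
top <- head(pop[order(pop$NetChange, decreasing = TRUE), ], 3)

# Share of each country's net change that is attributable to migration
top$MigrantShare <- top$Migrants / top$NetChange

# Simple linear regression of migrants on net change across all countries
fit <- lm(Migrants ~ NetChange, data = pop)
coef(fit)
```

A negative `MigrantShare` means the population grew despite net emigration, which is the pattern described above.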

Explored Analysis 2

What is the relationship between population size and land area? While this data set already provides a number for density, a visual representation of this relationship will help us understand how location plays a role in how closely packed populations are.

Because we already have the data of interest, the first thing to do is add a new variable that indicates the continent of each country. That way, we also have a clearer indication of where the more highly populated countries are.
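One way to add such a column is with the countrycode package listed earlier. This is a sketch that assumes the country names are in English and stored in a column called `Country`; the report does not show the exact call that was used.

```r
library(countrycode)

countries <- data.frame(
  Country = c("France", "Nigeria", "Japan", "Brazil"),
  stringsAsFactors = FALSE
)

# Map English country names to the continent coding shipped with countrycode
countries$Continent <- countrycode(countries$Country,
                                   origin = "country.name",
                                   destination = "continent")
```

Names that cannot be matched are returned as NA with a warning, so it is worth checking the new column for missing values afterwards.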

Next, we create the visualization. For these two continuous variables, a scatterplot is best able to show the relationship. To make the results easier to read, limits on population and area have been put in place to remove some of the outliers, such as Russia.
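The plot could be built along the following lines. The cut-off values, column names, and data are all hypothetical, since the report does not state which limits were applied.

```r
library(ggplot2)

# Made-up values for illustration only
pop <- data.frame(
  Population = c(5e6, 2e7, 8e7, 1.4e9, 3e6),
  LandArea   = c(4e4, 3e5, 3.5e5, 9e6, 1.6e7),
  Continent  = c("Europe", "Africa", "Europe", "Asia", "Asia")
)

# Hypothetical cut-offs used to drop very large countries
max_pop  <- 1e8
max_area <- 1e6

# Filter before plotting so that the dropped rows are explicit
plot_data <- subset(pop, Population <= max_pop & LandArea <= max_area)

p <- ggplot(plot_data, aes(x = LandArea, y = Population, colour = Continent)) +
  geom_point() +
  labs(x = "Land area", y = "Population")
```

Filtering the data first, rather than only setting axis limits, makes it easy to report how many countries were excluded.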

Based on our analysis, countries that are more densely populated tend to have a smaller area, which makes sense because density is calculated by dividing the number of people by the area of that location. To pursue this further, I would recommend computing the Pearson correlation coefficient as a next step, to see whether there is a linear relationship between these variables and confirm this analysis.
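The density calculation and the suggested correlation check can be sketched with base R. The values and column names below are made up for illustration.

```r
# Made-up values for illustration only
pop <- data.frame(
  Population = c(5e6, 2e7, 8e7, 1.2e7, 3e6),
  LandArea   = c(4e4, 3e5, 3.5e5, 1e5, 2e4)
)

# Density is the number of people divided by the area
pop$Density <- pop$Population / pop$LandArea

# Pearson correlation between density and land area, with a significance test
result <- cor.test(pop$Density, pop$LandArea, method = "pearson")
result$estimate  # correlation coefficient, between -1 and 1
result$p.value
```

A coefficient near zero would suggest no linear relationship, even if the scatterplot hints at some other pattern.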

Explored Analysis 3

Do countries with a larger population tend to have higher fertility rates? This is an interesting topic to pursue because it can provide some indication of the role that women play in the overall population of their countries. According to IZA World of Labor (https://wol.iza.org/), women who are more highly educated tend to have fewer children. I want to see whether this has any general effect on the overall population.

As we already have all the data we need, with no new variables needing to be created, let’s go straight into the visualization! Similar to the last analysis, a scatterplot will be used to show the relationship.

Based on the results of this visual, there doesn’t appear to be much of a relationship between Fertility Rate and Population. While there are a few outliers, it’s hard to say definitively that a higher fertility rate means a higher population. However, I would be interested in seeing how fertility rate and population have been related on a global scale; there may be other factors affecting the relationship between these two variables from country to country, while on a global scale the relationship may be clearer. From a statistical standpoint, you could also use the Pearson correlation coefficient to see whether there is in fact a linear relationship.

Explored Analysis 4

Are older populations more urban than younger populations? This could provide some useful information on cultural values for different ages. While this is by no means confirmed, the media often repeats the trope of the young person wanting to leave their small town and branch out into the ‘big city’. By looking at the relationship between the median age of a population and how urban it is, we can see whether this idea has any support behind it.

Before we create a visualization that compares these two variables, I have decided to create a new column based on age that assigns each row to an age bracket.
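A bracket column like this can be created with `cut()`. The boundaries and labels below are hypothetical; the report only mentions a 34-39 bracket, so the six-year width and the whole-year ages are assumptions.

```r
# Made-up median ages, including one missing value
countries <- data.frame(MedianAge = c(15, 19, 27, 33, 38, 44, NA))

# Hypothetical bracket boundaries; each interval includes its upper bound
breaks <- c(-Inf, 21, 27, 33, 39, Inf)
labels <- c("Under 22", "22-27", "28-33", "34-39", "40 and over")

countries$AgeBracket <- cut(countries$MedianAge, breaks = breaks, labels = labels)
```

Rows with a missing median age stay NA in the new column, so they can be counted or excluded explicitly before plotting.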

Now that we have a variable that breaks the ages into categories, a bar graph can be created that shows the percent of the population that is urban for each median-age bracket.

With the exception of the 34-39 age bracket, the data almost appears to be normally distributed, with extremely young and extremely old populations being more rural and middle-aged populations being more urban. This could indicate that countries in the middle-aged range tend to value the busy working environments of urban cities, while older and younger populations tend to live more in rural areas. An interesting next step would be to look at the role that wealth plays in the percentage of urban populations, so that we can see whether these middle-aged populations benefit from being concentrated in cities. However, before doing this additional analysis, I would do the following:

1: Research the NAs. For many of the smaller countries, no information is recorded for the median age of their population or for the percent of the population that is urban. To get a truly accurate representation of the data, this information should be gathered first.

2: Perform a normality test to see whether the information is actually normally distributed once the missing values have been filled in.
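Both checks can be sketched in base R. The values below are made up, and the Shapiro-Wilk test is only one possible choice of normality test; the report does not name a specific one.

```r
# Made-up urban percentages, with two missing values
urban_pct <- c(34, 41, 55, 58, 62, 66, 71, 77, 85, NA, NA)

# Step 1: count how many values are missing
n_missing <- sum(is.na(urban_pct))

# Step 2: Shapiro-Wilk normality test on the recorded values
recorded <- urban_pct[!is.na(urban_pct)]
normality <- shapiro.test(recorded)
normality$p.value
```

A small p-value would be evidence against normality; a large one only means the test found no evidence of a departure from it.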

Explored Analysis 5

Demonstrate the yearly change by continent. This is useful because it helps to show (in the broadest sense) how populations are shifting globally. Seeing global trends helps governments plan how they allocate their resources, and seeing these trends by continent is just one step in gaining this understanding.

As we have already gathered our data and created the column that categorizes countries by continent, the only thing left to do is to create the visualization.
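Before plotting, the country rows need to be summarized by continent. The sketch below shows one way to do that; the column names and values are made up, and computing a population-weighted yearly change (total net change divided by total population) is an assumption, since the report does not say how the per-country figures were combined.

```r
# Made-up values for illustration only
pop <- data.frame(
  Continent  = c("Africa", "Africa", "Europe", "Europe", "Asia"),
  Population = c(100, 200, 150, 50, 400),
  NetChange  = c(3, 5, 0.3, 0.1, 4)
)

# Sum population and net change within each continent
totals <- aggregate(cbind(Population, NetChange) ~ Continent, data = pop, FUN = sum)

# Population-weighted yearly change, as a percentage
totals$YearlyChangePct <- 100 * totals$NetChange / totals$Population

# Order continents from fastest to slowest growing
totals <- totals[order(totals$YearlyChangePct, decreasing = TRUE), ]
```

Averaging the per-country percentages instead would give small countries the same weight as large ones, which is why the weighted version is used in this sketch.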

Based on our visual, Africa is overwhelmingly the fastest-growing continent in comparison to the rest of the world. This means that African governments will face challenges in feeding and providing healthcare for the growing population, but will also gain economic opportunities as more people take part in their continental markets. In comparison, Europe’s population is changing the least out of the group, indicating that its population is becoming more stable. The next step in understanding this data would be to gather how the African yearly change has behaved in the past and, if appropriate, use linear regression to help predict future growth. If the African continent’s population continues to grow at this rate, then its governments will have some indication of how much they need to prepare.