1 Executive Summary

The aim of this report is to note and explain some of the key trends revealed by the dataset ‘Countries of the World’. These trends relate specifically to demographic information for all countries and independent islands throughout the world. Correlations between a number of the dataset’s quantitative variables can be drawn to reveal underlying, implicit cultural trends that contribute to what makes each country unique. The report reveals details regarding


2 Full Report

2.1 Initial Data Analysis (IDA)

Summary:

The data is a compilation of demographic information for all of the world’s countries and independent islands. Even though the data doesn’t include any specific variables relating to cultural, social or political aspects, some implicit information (such as a country’s level of development, general health standards and economic contribution) can be drawn - as explained in Research Question 1.

It has 227 rows (countries and independent islands) and 20 columns (variables). Three of the variables are qualitative (2 nominal and 1 ordinal) and 17 are quantitative (14 discrete and 3 continuous). One of the major discoveries was the huge divide in countries’ populations. China and India stand alone with populations well over 1 billion. In fact, the difference in population between the country with the world’s 2nd highest population (India) and the country with the world’s 3rd highest population (USA) is approximately 797 million. The world’s 5 most heavily populated countries account for 50.69% of the world’s total population, while the combined population of the 5 rows with the smallest populations is only 49,064, accounting for less than 1/100,000th of the world’s total population.

  • The data came from ‘Kaggle.com’, an American website designed for the public sharing of US Government-approved datasets. The data is considered valid because it is sourced from and approved by a reputable source, the United States Government, and is shared on a well-rated website with strict regulations regarding what can and can’t be shared.

  • One particular issue with the dataset is that it does not distinguish between rows that are classified as legitimate countries and rows that are not. The dataset has 35 independent islands and 195 legitimate countries, but nowhere does it record this difference. There is also a variable titled ‘Other’ in column 14, which is measured like a discrete quantitative variable, but its name gives no clear indication of what is being measured.

  • Each row represents a different country (or independent island).

  • Each column represents one of the 20 different variables.
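The population shares quoted in the summary above can be checked with simple arithmetic. The sketch below is illustrative only: `top_n_share` is a hypothetical helper name, the `toy` list is made-up data rather than rows from the dataset, and the world total is implied from the row count and the mean population reported in Section 2.3.

```python
def top_n_share(populations, n):
    """Return the fraction of the total held by the n largest values."""
    ordered = sorted(populations, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


# Hypothetical populations, for illustration only (not rows from the dataset).
toy = [1_000, 800, 100, 50, 30, 20]
toy_share = top_n_share(toy, 2)

# Figures quoted in this report: 227 rows, a mean population of 28,740,284,
# and a combined population of 49,064 for the five smallest rows.
implied_total = 227 * 28_740_284
smallest_five_share = 49_064 / implied_total

print(f"toy top-2 share: {toy_share:.2f}")
print(f"implied world total: {implied_total:,}")
print(f"smallest five below 1/100,000: {smallest_five_share < 1 / 100_000}")
```

Applied to the real ‘population’ column, `top_n_share(populations, 5)` would be the calculation behind the 50.69% figure, assuming the column has been read in as plain numbers.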


2.2 Research Question 1

Is there a correlation between countries’ GDP growth per capita and their literacy rates? Explain what the trend could reveal.

2.2.1 Conclusions

As shown on the scatterplot, there is some positive correlation between GDP growth per capita and average literacy rates. The plot reveals that the countries with the highest GDP growth per capita also have literacy rates above the middle of the range. This suggests that the countries that perform best economically have high literacy levels, which are a strong indication of a well-educated population. On the other hand, low GDP growth does not necessarily correspond to low literacy rates. As seen on the plot, a large cluster of countries in the top left-hand corner has lower GDP growth per capita but above-average literacy rates. This pattern indicates that a country isn’t necessarily economically efficient even when it has high levels of education and literacy.
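The strength of a relationship like the one read off the scatterplot can be summarised with a Pearson correlation coefficient. The report does not state which statistic, if any, was calculated, so the following is only a minimal sketch under that assumption. The function name `pearson_r` and the sample values are hypothetical, and `None` stands in for a missing entry.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired values, skipping pairs with a missing entry."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only (not taken from the dataset).
gdp = [500, 1_500, 3_200, 9_000, 20_000, 37_800]
literacy = [36.0, 60.0, 87.9, None, 99.0, 97.0]

r = pearson_r(gdp, literacy)
print(f"r = {r:.2f}")
```

A value of r near +1 would indicate a strong positive linear relationship. A cluster such as the one described in the top left-hand corner of the plot would pull r down without making it negative.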

2.3 Research Question 2

Which measurement might show a more accurate representation of the central tendency for the size of a population - median or mean? Explain why.

2.3.1 Conclusions

The mean value of the ‘population’ variable is 28,740,284, while the median value is only 4,786,994. As explained earlier, the only two countries with populations exceeding 1 billion are China and India. The next largest country in terms of population is the US, whose population is 797 million smaller than that of India. For this reason, India and China are outliers: they dramatically pull up the mean population. Excluding these two countries, the mean population of the remaining 225 rows is 18,287,639. As such, the better indicator of central tendency for a country’s population is the median.
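The effect of the two outliers can be worked through using only the figures quoted in this section. The sketch below multiplies each mean by its row count to recover the implied totals, then subtracts to get the combined population of the two excluded countries. The `toy` list at the end is hypothetical data, included only to show how a single extreme value moves the mean while barely moving the median.

```python
from statistics import mean, median

# Figures quoted in this report.
ROWS = 227
MEAN_ALL = 28_740_284
MEAN_WITHOUT_TOP_TWO = 18_287_639

total_all = ROWS * MEAN_ALL
total_without_top_two = (ROWS - 2) * MEAN_WITHOUT_TOP_TWO
implied_top_two = total_all - total_without_top_two

print(f"implied world total: {total_all:,}")
print(f"implied combined population of China and India: {implied_top_two:,}")

# Hypothetical values, for illustration only: one extreme value among small ones.
toy = [2, 3, 4, 5, 6, 1_000]
print(f"with outlier:    mean={mean(toy):.1f}, median={median(toy):.1f}")
print(f"without outlier: mean={mean(toy[:-1]):.1f}, median={median(toy[:-1]):.1f}")
```

On these figures, the two excluded countries together account for roughly 2.41 billion of an implied total of about 6.52 billion people, which is why removing them lowers the mean by more than 10 million.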

2.4 Research Question 3

Do literacy rates and/or GDP per capita correlate with the number of phones per 1000 people? How does this correlate to the dispersal of technology around the world?

2.4.1 Conclusions

Literacy rates correlate positively with the number of phones per 1000 people. In the dataset, the median literacy rate is 67.1 and the median number of phones per 1000 people is 195.7. If the literacy rate increases by one standard deviation, the number of phones per 1000 people becomes 207.3. This shows that as literacy rates increase, the number of phones per 1000 people also increases. However, the number of phones per 1000 people can also be affected by other variables, such as GDP, so literacy rates are not the only factor affecting the number of phones per 1000 people in a population.
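The report does not state how the one-standard-deviation figure was obtained. One common way to estimate such a shift is to fit a least-squares line and multiply its slope by the standard deviation of the predictor. The sketch below assumes that approach. The function name `fit_line` and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys ~ intercept + slope * xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical values, for illustration only (not taken from the dataset).
literacy = [40.0, 55.0, 70.0, 85.0, 99.0]
phones = [10.0, 40.0, 120.0, 300.0, 600.0]

intercept, slope = fit_line(literacy, phones)

# Estimated change in phones per 1000 people when literacy rises by one
# sample standard deviation.
shift = slope * stdev(literacy)
print(f"slope: {slope:.2f} phones per literacy point")
print(f"one-SD shift: {shift:.1f} phones per 1000 people")
```

A straight line is only a rough summary here. If the real relationship is curved, or driven largely by GDP, the slope will understate or overstate the effect of literacy alone.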

For example, the USA (GDP per capita of US $37,800), which has a literacy rate of 97.0, has a very high number of phones per 1000 people (898.0). By contrast, Indonesia, the next most heavily populated country after the USA, has a GDP per capita of US $3,200 and a literacy rate of 87.9 but, on average, only 52.0 phones per 1000 people. These figures suggest some correlation between the literacy rate (an indication of the general level of education) and the number of phones per 1000 people. However, GDP per capita, which is a strong indicator of a country’s economic success, could also be making a large contribution to the number of phones people have (i.e. their level of wealth).

This trend suggests that the level of education and economic success in a country correlates with the dispersal of technology. Today, technology makes up a huge portion of daily life in a developed country like Australia. A potential solution therefore lies not only in improving literacy and increasing the use of technology separately; one can deduce that if both are done together, the result will be more profound.

3 References

Lasso, F. (2018). Countries of the World. Retrieved 12/9/2018 from https://www.kaggle.com/fernandol/countries-of-the-world/version/1#countries%20of%20the%20world.csv

Hofferth, S. L., et al. (2016). Cell Phone Use and Child and Adolescent Reading Proficiency. Retrieved 12/9/2014 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5036529/

4 Personal reflection on group work

  • I mainly contributed to the group work by finding the dataset and drawing out most of the statistical information used throughout the report.

  • What I learnt about group work is that it can be useful for minimising errors in your work because you have one or multiple other people reading over the work that you’ve produced. This aspect is really useful because it allows people to view what has already been written with a fresh set of eyes.

  • On the other hand, group work can be challenging because it’s often difficult to communicate ideas to your partner/s when you’re not with them in person.