1.1 Print out the dimensions of the data frame.
## [1] 56000 4
1.2 Print out the names and type of each of the data frame’s columns.
## Rows: 56,000
## Columns: 4
## $ region <chr> "SSC20005", "SSC20005", "SSC20005", "SSC20005", "SSC20005",~
## $ age <dbl> 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,~
## $ gender <chr> "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F",~
## $ population <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
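The code that produced the output for 1.1 and 1.2 is not shown. A minimal base-R sketch of the same two checks follows, assuming the data sits in a data frame called `pop_data` (a hypothetical name) and using a small made-up stand-in with the same four columns. `str()` is used here as a base-R analogue of `glimpse()`.

```r
# Toy stand-in for the real data (hypothetical values, same four columns).
pop_data <- data.frame(
  region     = rep(c("R1", "R2"), each = 4),
  age        = rep(c(0, 0, 1, 1), times = 2),
  gender     = rep(c("M", "F"), times = 4),
  population = c(3, 2, 4, 1, 0, 5, 2, 2),
  stringsAsFactors = FALSE
)

dims <- dim(pop_data)  # number of rows, then number of columns
print(dims)

str(pop_data)          # column names and types
```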
1.3 Print out the number of unique regions in the dataset. Ans: 500 unique regions, each with 112 observations.
## Rows: 500
## Columns: 2
## $ region <chr> "SSC20005", "SSC20012", "SSC20018", "SSC20027", "SSC20029", "SS~
## $ n <int> 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 112~
1.4 What is the minimum age bin? Ans: 0 years
1.5 What is the maximum age bin? Ans: 55 years
1.6 What is the bin size for the age field? Ans: 1 year
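The answers to 1.3 to 1.6 can be checked with a few base-R calls. The sketch below uses a made-up data frame (`pop_data`, a hypothetical name) with two regions and ages 0 to 3. It assumes the age bins are evenly spaced, so the bin size is the single distinct gap between consecutive ages.

```r
# Toy data: 2 regions x 4 ages x 2 genders (hypothetical values).
pop_data <- expand.grid(
  gender = c("M", "F"),
  age    = 0:3,
  region = c("R1", "R2"),
  stringsAsFactors = FALSE
)
pop_data$population <- 1

region_counts <- table(pop_data$region)  # observations per region
n_regions     <- length(region_counts)   # number of unique regions
age_range     <- range(pop_data$age)     # minimum and maximum age bin
bin_size      <- unique(diff(sort(unique(pop_data$age))))  # gap between bins

print(n_regions)
print(age_range)
print(bin_size)
```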
2.1 Use the expected value for the age to find the mean age for the whole data sample. Ans: Expected Value is 27.800
## [1] 27.80027
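Because each row holds a head count rather than one person, the expected value of age is a population-weighted mean: the sum of age times population, divided by the total population. A minimal sketch on made-up numbers (the data frame `pop_data` and its values are hypothetical):

```r
# Toy data: one row per age/gender bin with a head count (hypothetical values).
pop_data <- data.frame(
  age        = c(0, 0, 1, 1, 2, 2, 3, 3),
  gender     = rep(c("M", "F"), times = 4),
  population = c(1, 1, 2, 2, 2, 1, 0, 1)
)

# Expected value of age, weighting each age by its head count.
mean_age <- sum(pop_data$age * pop_data$population) / sum(pop_data$population)
print(mean_age)
```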
2.2 Standard Deviation for whole data sample
Ans: The sample standard deviation is 15.778, the same as the population standard deviation (15.778) to 3 dp.
## [1] 15.77804
## [1] 15.77818
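The two standard deviations differ only in the divisor: the population form divides the weighted sum of squared deviations by the total head count N, the sample form by N - 1, so the sample value is always slightly larger. The sketch below shows both on made-up data (all names and values are hypothetical) and cross-checks the sample form against `sd()` applied to one expanded row per person.

```r
# Toy data with head counts per age bin (hypothetical values).
pop_data <- data.frame(
  age        = c(0, 0, 1, 1, 2, 2, 3, 3),
  population = c(1, 1, 2, 2, 2, 1, 0, 1)
)

total    <- sum(pop_data$population)
mean_age <- sum(pop_data$age * pop_data$population) / total

# Weighted sum of squared deviations from the mean.
ss <- sum(pop_data$population * (pop_data$age - mean_age)^2)

sd_population <- sqrt(ss / total)        # divide by N
sd_sample     <- sqrt(ss / (total - 1))  # divide by N - 1

# Cross-check: expand to one value per person and use the built-in sd().
ages_expanded <- rep(pop_data$age, times = pop_data$population)
sd_check      <- sd(ages_expanded)

print(c(sd_population, sd_sample, sd_check))
```

With a large N the two divisors are almost identical, which is why the two values in the output above agree to 3 dp.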
Question 3 Statistics of mean age for each region
3.1 Mean=30.608
3.2 SD=7.996
3.3 Minimum = 2
3.4 First Quartile = 27.426
3.5 Median = 29.232
3.6 Third Quartile = 33.35
3.7 Maximum = 55
3.8 IQR = 5.924
## [1] "1"
## mean sd min Q1 Median Q3 max IQR
## 1 30.608 7.9962 2 27.426 29.232 33.35 55 5.9243
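A summary of this kind can be built by first computing a population-weighted mean age per region and then summarising those regional means. The following base-R sketch uses three made-up regions (all names and values are hypothetical). It assumes every region has a non-zero population, since a region with no people would give `NaN`, and it uses R's default quantile method.

```r
# Toy data: 3 regions, ages 0 to 3, head counts per age (hypothetical values).
pop_data <- data.frame(
  region     = rep(c("R1", "R2", "R3"), each = 4),
  age        = rep(0:3, times = 3),
  population = c(2, 4, 3, 1,  0, 0, 5, 5,  1, 0, 0, 1)
)

# Population-weighted mean age for each region.
mean_by_region <- sapply(
  split(pop_data, pop_data$region),
  function(d) sum(d$age * d$population) / sum(d$population)
)

# Summary statistics of the regional means.
summary_stats <- data.frame(
  mean   = mean(mean_by_region),
  sd     = sd(mean_by_region),
  min    = min(mean_by_region),
  Q1     = quantile(mean_by_region, 0.25, names = FALSE),
  Median = median(mean_by_region),
  Q3     = quantile(mean_by_region, 0.75, names = FALSE),
  max    = max(mean_by_region),
  IQR    = IQR(mean_by_region)
)
print(summary_stats)
```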
3.9 Histogram of the distribution of means

Question 4 Region with smallest population
SSC20099 is one of the regions with the smallest population of 3 people
## Rows: 35
## Columns: 2
## $ region <chr> "SSC20099", "SSC20127", "SSC20151", "SSC20346", "SSC20383", "SS~
## $ n <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ~
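Finding the smallest and largest regions amounts to summing the population within each region and comparing the totals. Because several regions can tie for the minimum, the sketch below keeps every region that matches it rather than only the first. The data and names are made up for illustration.

```r
# Toy data: head counts by region and gender (hypothetical values).
pop_data <- data.frame(
  region     = rep(c("R1", "R2", "R3"), each = 2),
  gender     = rep(c("M", "F"), times = 3),
  population = c(1, 2,  3, 0,  4, 6)
)

# Total population per region.
region_pop <- sapply(split(pop_data$population, pop_data$region), sum)

# All regions tied for the smallest total, and the single largest region.
smallest <- names(region_pop)[region_pop == min(region_pop)]
largest  <- names(region_pop)[which.max(region_pop)]

print(region_pop)
print(smallest)
print(largest)
```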
Question 5 Region with largest population. Ans: Region is SSC22015 with population of 37948
From the plot in 5.1, it is observed that:
1. The population peaks at around 3 and 30 years of age.
2. The population declines from age 4 to 20, and again beyond 30.
3. The trend suggests that the region is populated mainly by young families with young children.
## # A tibble: 1 x 2
## region n
## <chr> <dbl>
## 1 SSC22015 37948



6.1 Scatter Plot: Ratio of old to young vs population
## [1] "1"

The scatter plot in 6.1 indicates the following trends:
1. When a region's population is low, the ratio of old to young tends to be high. This suggests that low-population regions have more old people than young people.
2. When the population is high, the ratio is low. This shows that the more populous regions have more young people than old people.
3. This is consistent with older people, many of them likely retired, moving to small country towns where housing is cheaper and their limited funds have greater spending power.
4. Younger couples and individuals tend to live in more populous regions for the cheaper housing, jobs, schooling, health facilities and other conveniences that support their lifestyles.
5. Many regions fall between these two extremes, where the trend depends on a combination of factors, e.g. life stages, wealth levels, empty nesters and availability of jobs.
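The ratio plotted in 6.1 can be derived per region by splitting the head count into an old group and a young group and dividing one total by the other. The age cutoff used in the report is not stated, so the value of 40 below is purely an assumption for illustration, as are the data and variable names.

```r
# Toy data: two regions with one young and one old age bin (hypothetical values).
pop_data <- data.frame(
  region     = rep(c("R1", "R2"), each = 2),
  age        = rep(c(10, 50), times = 2),
  population = c(2, 6,  80, 20)
)

old_cutoff <- 40  # assumed threshold; the report's definition is not given
pop_data$group <- ifelse(pop_data$age >= old_cutoff, "old", "young")

# Head count per region and age group (rows: regions, columns: old/young).
totals <- tapply(pop_data$population,
                 list(pop_data$region, pop_data$group), sum)

ratio      <- totals[, "old"] / totals[, "young"]  # old-to-young ratio
region_pop <- rowSums(totals)                      # total population per region

print(data.frame(region_pop, ratio))
```

Plotting `ratio` against `region_pop`, for example with `plot(region_pop, ratio)`, gives a scatter plot of this kind. A region with no young people would produce `Inf` and may need to be filtered out first.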
7.1 Scatter Plot: Ratio of female to male vs population
## [1] "1"
## [1] 5.3333
## [1] 0

The scatter plot in 7.1 indicates the following trends:
1. In regions where the population is low, the female-to-male ratio ranges from 0 to a high of 5.33.
2. One possible reason relates to Question 6.2, where retired people tended to move to small country towns. Females generally live longer than males, so more males have died in these low-population regions.
3. Another possible reason for the high ratio is that male family members who are still fit and healthy enough to work go to more populous regions to find work and send money home. The female members can then stay in a region where the cost of living is lower and rely on the income earned by the male members. This is quite typical in countries like China, where male family members go from rural villages to the big cities to earn an income that supports their families in the poor villages.
4. The red regression line shows that the ratio trends around 1, representing the more common balance between females and males in most regions.
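The female-to-male ratio and its range can be computed per region in the same way as any other group ratio. The base-R sketch below uses made-up data and names. A region with no males would give `Inf` (or `NaN` if it also has no females), so such regions may need separate handling.

```r
# Toy data: head counts by region and gender (hypothetical values).
pop_data <- data.frame(
  region     = rep(c("R1", "R2", "R3"), each = 2),
  gender     = rep(c("M", "F"), times = 3),
  population = c(3, 6,  4, 0,  50, 50)
)

# Head count per region and gender (rows: regions, columns: F/M).
totals <- tapply(pop_data$population,
                 list(pop_data$region, pop_data$gender), sum)

fm_ratio    <- totals[, "F"] / totals[, "M"]  # female-to-male ratio
ratio_range <- range(fm_ratio)                # lowest and highest ratio

print(fm_ratio)
print(ratio_range)
```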
8.1
Females aged 18 to 21 have been chosen as the primary customers for a hypothetical product: a face cream that will make a young woman look even more beautiful. In addition, each purchaser has a chance to refer other customers to buy the product. The top three referrers will each earn a free dinner for two at a five-star restaurant.
8.2
1. The two regions with the largest population of females between 18 and 21 are SSC22015 (1113) and SSC20492 (2566).
The four plots in 9.3 show that as n increases, the distribution of the sample means approaches the normal distribution, confirming the Central Limit Theorem (CLT), which states that the distribution of sample means drawn from any distribution with finite variance tends to the normal distribution as the sample size grows. (Math2406, Applied Analytics 3.20)
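The behaviour described above can be reproduced with a short simulation. The sketch below is not the code behind the plots in 9.3: the sample sizes, the number of repetitions and the skewed age weights are all assumptions chosen for illustration. It draws repeated samples of size n from a non-normal age distribution and records each sample mean. By the CLT, the spread of those means should shrink roughly in proportion to 1/sqrt(n).

```r
set.seed(1)  # fixed seed so the simulation is repeatable

ages    <- 0:55
weights <- rev(seq_along(ages))  # hypothetical skewed weights, heavier at young ages

# Draw `reps` samples of size n and return the mean of each sample.
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(ages, size = n, replace = TRUE, prob = weights)))
}

sizes  <- c(1, 10, 100, 1000)  # assumed sample sizes
spread <- sapply(sizes, function(n) sd(sample_means(n)))

print(data.frame(n = sizes, sd_of_sample_means = spread))
```

Calling `hist(sample_means(100))` shows the bell shape emerging even though the underlying age distribution is skewed.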




References
Adding manual legend to ggplot2, viewed 20 Jan 2022 https://community.rstudio.com/t/adding-manual-legend-to-ggplot2/41651
Arrange rows by column values, viewed 26 Jan 2022 https://dplyr.tidyverse.org/reference/arrange.html
Convert a Numeric Object to Character, viewed 20 Jan 2022 https://www.geeksforgeeks.org/convert-a-numeric-object-to-character-in-r-programming-as-character-function/
Convert Factor to Numeric, viewed 20 Jan 2022 https://www.geeksforgeeks.org/convert-factor-to-numeric-and-numeric-to-factor-in-r-programming/
Descriptive Statistics with dplyr, viewed 26 Jan 2022 https://www.marsja.se/learn-how-to-calculate-descriptive-statistics-in-r-the-easy-way/
Filter function, viewed 19 Jan 2022 https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter
Filtering Data with dplyr, viewed 26 Jan 2022 https://blog.exploratory.io/filter-data-with-dplyr-76cf5f1a258e
ggplot2 scatter plots, viewed 23 Jan 2022 http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-start-guide-r-software-and-data-visualization
Group by one or more variables, viewed 26 Jan 2022 https://dplyr.tidyverse.org/reference/group_by.html
head() or glimpse function, viewed 26 Jan 2022 https://stackoverflow.com/questions/23408510/head-function-in-r-package-dplyr
How to Make ECDF Plot with ggplot2, viewed 20 Jan 2022 https://www.geeksforgeeks.org/how-to-make-ecdf-plot-with-ggplot2-in-r/
Random number generator, viewed 25 Jan 2022 https://www.educba.com/random-number-generator-in-r/
Repeating rows, viewed 25 Jan 2022 https://stackoverflow.com/questions/8753531/repeat-rows-of-a-data-frame-n-times
RMIT Course Math 2404 Data Visualisation and Communication
RMIT Course Math 2406 Applied Analytics
Select certain rows, viewed 23 Jan 2022 https://stackoverflow.com/questions/2854625/select-only-rows-if-its-value-in-a-particular-column-is-less-than-the-value-in-t
Select values that have specific characters, viewed 22 Jan 2022 https://community.rstudio.com/t/how-to-select-values-that-have-specific-characters/68748
6.2 Comment on Trends