1.1 Print out the dimensions of the data frame.
## [1] 56000 4
1.2 Print out the names and type of each of the data frame’s columns.
## tibble [56,000 x 4] (S3: tbl_df/tbl/data.frame)
## $ region : chr [1:56000] "SSC20005" "SSC20005" "SSC20005" "SSC20005" ...
## $ age : num [1:56000] 0 0 1 1 2 2 3 3 4 4 ...
## $ gender : chr [1:56000] "M" "F" "M" "F" ...
## $ population: num [1:56000] 0 0 0 0 0 0 0 0 0 0 ...
1.3 Print out the number of unique regions in the dataset (500 unique regions, each with 112 observations).
## 'data.frame': 500 obs. of 2 variables:
## $ Group.1: chr "SSC20005" "SSC20012" "SSC20018" "SSC20027" ...
## $ x : int 112 112 112 112 112 112 112 112 112 112 ...
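The three outputs above could be produced along the following lines. This is a minimal sketch on a toy data frame: the name `toy` and its values are invented for illustration, and only the column names are taken from the `str()` output above.

```r
# Toy stand-in for the assignment data; all values are invented.
toy <- data.frame(
  region     = rep(c("R1", "R2"), each = 4),
  age        = rep(c(0, 0, 1, 1), times = 2),
  gender     = rep(c("M", "F"), times = 4),
  population = c(1, 2, 3, 4, 5, 6, 7, 8),
  stringsAsFactors = FALSE
)

dims <- dim(toy)                          # number of rows, number of columns
str(toy)                                  # column names and types
n_regions <- length(unique(toy$region))   # number of distinct regions

# Rows per region; with an unnamed `by` list, aggregate() labels its
# output columns Group.1 and x, as in the output above.
obs_per_region <- aggregate(toy$population, by = list(toy$region), FUN = length)

print(dims)
print(n_regions)
str(obs_per_region)
```

On the real data, the same calls would be applied to the full data frame instead of `toy`.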
1.4 What is the minimum age bin? Ans: 0 years
1.5 What is the maximum age bin? Ans: 55 years
1.6 What is the bin size for the age field? Ans: 1 year
2.1 Use the expected value for the age to find the mean age for the whole data sample
## [1] 27.80027
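Treating each row's population as a frequency weight, the expected value of age is sum(age x population) / sum(population). A minimal sketch with invented numbers (the vectors below are not the assignment data):

```r
# Invented example: three age bins and the number of people in each.
age        <- c(0, 1, 2)
population <- c(1, 2, 1)

# Expected value of age, weighting each bin by its population.
mean_age <- sum(age * population) / sum(population)

# Base R's weighted.mean() computes the same quantity.
stopifnot(isTRUE(all.equal(mean_age, weighted.mean(age, population))))

print(mean_age)  # 1
```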
Question 2.2 Standard Deviation for whole data sample
Ans: Sample standard deviation = 15.778, the same as the population standard deviation (15.778) to 3 d.p.
## [1] 15.77804
## [1] 15.77818
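One way to obtain both standard deviations from binned data is to weight the squared deviations by population and change only the divisor. The sketch below uses invented numbers and is not necessarily how the two values above were computed.

```r
# Invented example: three age bins and the number of people in each.
age        <- c(0, 1, 2)
population <- c(1, 2, 1)

n  <- sum(population)                    # total number of people
mu <- sum(age * population) / n          # weighted mean age
ss <- sum(population * (age - mu)^2)     # weighted sum of squared deviations

sd_population <- sqrt(ss / n)            # divisor n
sd_sample     <- sqrt(ss / (n - 1))      # divisor n - 1

# Cross-check: sd() on the expanded sample uses the n - 1 divisor.
expanded <- rep(age, times = population)
stopifnot(isTRUE(all.equal(sd_sample, sd(expanded))))

print(c(population = sd_population, sample = sd_sample))
```

For large n the two divisors give nearly identical results, which is consistent with the two values above agreeing to 3 d.p.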
Question 3 Statistics of mean age for each region
3.1 Mean=30.608
3.2 SD=7.996
3.3 Minimum = 2
3.4 First Quartile = 27.426
3.5 Median = 29.232
3.6 Third Quartile = 33.35
3.7 Maximum = 55
3.8 IQR = 5.924
## [1] 30.608
## [1] 7.9962
## [1] 2
## 25%
## 27.426
## 50%
## 29.232
## 75%
## 33.35
## [1] 55
## [1] 5.9243
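The statistics in Question 3 describe the distribution of per-region mean ages. A minimal sketch of that two-step calculation, assuming each region's mean is population-weighted; the data frame `toy` and the name `region_means` are hypothetical.

```r
# Invented example: three regions with a few age bins each.
toy <- data.frame(
  region     = c("R1", "R1", "R2", "R2", "R3"),
  age        = c(0, 1, 2, 3, 4),
  population = c(1, 1, 1, 3, 2)
)

# Step 1: population-weighted mean age per region.
# A region with zero total population would give NaN here.
region_means <- sapply(split(toy, toy$region), function(d) {
  sum(d$age * d$population) / sum(d$population)
})

# Step 2: summary statistics of those regional means.
stats <- c(
  mean   = mean(region_means),
  sd     = sd(region_means),
  min    = min(region_means),
  q1     = unname(quantile(region_means, 0.25)),
  median = median(region_means),
  q3     = unname(quantile(region_means, 0.75)),
  max    = max(region_means),
  iqr    = IQR(region_means)
)

print(round(stats, 3))
```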
3.9 Histogram of the distribution of means

## $breaks
## [1] 0 5 10 15 20 25 30 35 40 45 50 55
##
## $counts
## [1] 2 8 6 10 36 232 106 45 26 13 16
##
## $density
## [1] 0.0008 0.0032 0.0024 0.0040 0.0144 0.0928 0.0424 0.0180 0.0104 0.0052
## [11] 0.0064
##
## $mids
## [1] 2.5 7.5 12.5 17.5 22.5 27.5 32.5 37.5 42.5 47.5 52.5
##
## $xname
## [1] "WMS"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
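The `$breaks` above run from 0 to 55 in steps of 5. A minimal sketch of building such a histogram object with base R's `hist()`, using an invented vector `x` in place of the regional means:

```r
# Invented values standing in for the regional mean ages.
x <- c(2, 5, 7, 30, 55)

# plot = FALSE returns the histogram object without drawing it.
# By default bins are right-closed, so 5 falls in the first bin (0 to 5)
# and 30 falls in the bin from 25 to 30.
h <- hist(x, breaks = seq(0, 55, by = 5), plot = FALSE)

print(h$counts)
print(h$mids)
```

The counts always sum to the number of values supplied, just as the counts above sum to the 500 regions.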
Question 4 Region with smallest population
SSC20099 is one of the regions with the smallest population (3 people).
Question 5 Region with largest population
## Group.1 population
## 1 SSC22015 37948
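Questions 4 and 5 both rest on the total population per region. A minimal sketch, assuming the totals come from `aggregate()` as the `Group.1` column above suggests; the data frame `toy` is invented.

```r
# Invented example: three regions, two rows each.
toy <- data.frame(
  region     = c("R1", "R1", "R2", "R2", "R3", "R3"),
  population = c(1, 2, 10, 20, 3, 0)
)

# Total population per region; output columns are Group.1 and population.
totals <- aggregate(toy["population"], by = list(toy$region), FUN = sum)

# Keep every region tied for the minimum, since several may share it.
smallest <- totals[totals$population == min(totals$population), ]

# which.max() returns the first region with the maximum total.
largest <- totals[which.max(totals$population), ]

print(smallest)
print(largest)
```

Filtering on equality with the minimum, rather than using `which.min()`, is what makes ties visible, which matters when the answer is "one of the regions" with the smallest population.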

## geom_step: na.rm = FALSE
## stat_ecdf: n = NULL, pad = TRUE, na.rm = FALSE
## position_identity


Question 6.1 Ratio of old to young vs population scatter plot

Question 7.1 Ratio of female to male vs population scatter plot
## [1] "1"
## [1] 5.3333
## [1] 0
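The female-to-male ratio for a region is its total female population divided by its total male population. A minimal sketch with invented data; `toy`, `by_sex`, `ratio` and `total` are hypothetical names.

```r
# Invented example: three regions, one row per gender.
toy <- data.frame(
  region     = rep(c("R1", "R2", "R3"), each = 2),
  gender     = rep(c("F", "M"), times = 3),
  population = c(16, 3, 0, 2, 5, 5)
)

# Region-by-gender matrix of population totals.
by_sex <- tapply(toy$population, list(toy$region, toy$gender), sum)

# Ratio of females to males per region.
# A region with no males would give Inf (or NaN if it has no females either).
ratio <- by_sex[, "F"] / by_sex[, "M"]

# Total population per region, for the x-axis of the scatter plot.
total <- rowSums(by_sex)

print(round(ratio, 4))
print(total)
```

A scatter plot of `ratio` against `total` could then be drawn with, for example, `plot(total, ratio)`.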

Question 7.2
The scatter plot in 7.1 indicates the following trends:
1. In regions where the population is low, the female-to-male ratio varies widely, from very low to as high as 5.33.
2. One possible reason for this trend is related to Question 6.2, where retired people tended to move to small country towns. Females generally live longer than males, so more males have died in these low-population regions.
3. Another possible reason for the high ratio is that male family members who are still fit and healthy enough to work move to more populous regions to find work and send money home. The female members can then stay in a region where the cost of living is lower and rely on the income earned by the male members. This is quite typical in countries like China, where men go from rural villages to the big cities to earn an income to support their families in poor villages.
4. The red regression line shows that the ratio trends around 1, representing the more common balance between females and males in most regions.
Question 8.1
Females aged 18 to 21 have been chosen as the primary customers for the hypothetical product, a tonic intended to make a young woman beautiful.
Question 8.2
The two regions with the largest population of females between 18 and 21 are SSC20492 (2566) and SSC22015 (1113).
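A ranking like this can be obtained by filtering to the target group, summing per region, and sorting. A minimal sketch on invented data, assuming the age range 18 to 21 is inclusive at both ends; `toy`, `target`, `counts` and `top2` are hypothetical names.

```r
# Invented example: a few rows around the target age range.
toy <- data.frame(
  region     = c("R1", "R1", "R1", "R2", "R2", "R3"),
  age        = c(18, 21, 22, 19, 20, 18),
  gender     = c("F", "F", "F", "F", "M", "F"),
  population = c(10, 5, 99, 40, 99, 7)
)

# Keep only females aged 18 to 21 inclusive.
target <- toy[toy$gender == "F" & toy$age >= 18 & toy$age <= 21, ]

# Sum the target population per region.
counts <- aggregate(population ~ region, data = target, FUN = sum)

# Two regions with the most target customers, largest first.
top2 <- head(counts[order(counts$population, decreasing = TRUE), ], 2)

print(top2)
```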