head(state.x77)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
## Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
## California      21198   5114        1.1    71.71   10.3    62.6    20 156361
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
states <- as.data.frame(state.x77)
attach(states)
mean(Population)
## [1] 4246.42
median(Population)
## [1] 2838.5
The sample mean is a measure of central tendency: the sum of all observed values divided by the number of observations. Because Population is recorded in thousands, the sample mean of 4246.42 tells us that the average estimated population of the 50 states in 1975 was about 4,246,420 people. The sample median is another measure of central tendency: the value of the middle-ranked observation once the observations are arranged in numerical order (with 50 observations, the average of the 25th and 26th values). Thus, the sample median tells us that half of the 50 states had populations above 2,838,500 people and the other half had populations below that value. The mean is well above the median, which suggests that a few very populous states pull the average upward.
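As a rough check on the definitions above, the mean and median can be recomputed by hand and compared with the built-in functions. This is a minimal sketch; the names `pop`, `manual_mean`, and `manual_median` are illustrative and not part of the original analysis.

```r
# state.x77 ships with base R; Population is recorded in thousands of people.
pop <- unname(state.x77[, "Population"])

# Mean: sum of the observations divided by the number of observations.
manual_mean <- sum(pop) / length(pop)

# Median: with an even number of observations (n = 50),
# average the two middle-ranked values.
sorted <- sort(pop)
n <- length(sorted)
manual_median <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

# Both comparisons should return TRUE.
all.equal(manual_mean, mean(pop))
all.equal(manual_median, median(pop))
```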
var(Population)
## [1] 19931684
sd(Population)
## [1] 4464.491
The sample variance is a measure of spread: the sum of the squared differences between each observation and the sample mean, divided by n - 1, which is roughly the average squared deviation from the mean. Because Population is recorded in thousands, the variance of 19,931,684 is in squared thousands of people, or about 1.99 x 10^13 in squared people. This indicates that the populations of the 50 states vary considerably and are widely spread around the sample mean of 4,246,420 people. The variance is this large because it is based on squared deviations, so its units are squared as well. For that reason, the standard deviation, which is the square root of the sample variance and is expressed in the original units, is often easier to interpret. The standard deviation of this sample tells us that state populations typically differ from the mean by roughly 4,464,491 people, which confirms that the population sizes of the states are quite spread out.
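The relationship between the variance, the standard deviation, and the units can be sketched as follows. The variable names are illustrative; the only assumption is that Population is stored in thousands, as in `state.x77`.

```r
pop <- state.x77[, "Population"]  # thousands of people
n <- length(pop)

# Sample variance: squared deviations from the mean, divided by n - 1.
manual_var <- sum((pop - mean(pop))^2) / (n - 1)

# Standard deviation: square root of the variance (back in thousands).
manual_sd <- sqrt(manual_var)

# Convert to people: the sd scales by 1000, the variance by 1000^2.
sd_people  <- manual_sd * 1000
var_people <- manual_var * 1000^2

# Both comparisons should return TRUE.
all.equal(manual_var, var(pop))
all.equal(manual_sd, sd(pop))
```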
max(Population)
## [1] 21198
min(Population)
## [1] 365
hist(Population)
The histogram shows a large concentration of observations (state population sizes) between 0 and 5,000,000, with frequencies declining at higher values because fewer and fewer states have very large populations. Thus, the majority of the states have populations between 0 and 5,000,000 people, and the distribution is skewed to the right.
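To read the bin counts directly rather than off the plot, `hist()` can be called with `plot = FALSE`. This is a small sketch using the default bins; the object names `h` and `bins` are illustrative.

```r
pop <- state.x77[, "Population"]  # thousands of people

# Compute the histogram without drawing it.
h <- hist(pop, plot = FALSE)

# One row per bin: lower edge, upper edge, number of states.
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = h$breaks[-1],
  count = h$counts
)
bins
```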
sum(Population)
## [1] 212321
big <- Population > 5000
table(big)
## big
## FALSE TRUE
## 38 12
The table shows that 12 states have populations larger than 5,000,000 people, while the other 38 have populations of 5,000,000 or fewer.
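The same logical vector can also give the proportion of large states and their names. A brief sketch, rebuilding `states` and `big` without `attach()` so that it runs on its own:

```r
states <- as.data.frame(state.x77)

# TRUE for states with more than 5,000 thousand (5,000,000) people.
big <- states$Population > 5000

sum(big)                # number of large states
mean(big)               # proportion of large states
rownames(states)[big]   # which states they are
```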
plot(Area, Population)
The scatterplot shows little evidence of a linear relationship between population size and state area (in square miles). Most points cluster in the lower left corner, with populations between 0 and 5,000,000 people and areas between 0 and 100,000 square miles. A few states sit far from this cluster, such as Alaska (very large area, small population) and California (very large population), and these outliers can strongly influence any apparent trend.
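A visual impression of correlation can be checked numerically with `cor()`. The sketch below also recomputes the coefficient without Alaska, on the assumption that it is the most extreme point in area; the object names are illustrative.

```r
states <- as.data.frame(state.x77)

# Pearson correlation between area and population for all 50 states.
r_all <- cor(states$Area, states$Population)

# Same correlation with Alaska removed, to see how much one outlier matters.
no_alaska <- states[rownames(states) != "Alaska", ]
r_no_alaska <- cor(no_alaska$Area, no_alaska$Population)

c(all_states = r_all, without_alaska = r_no_alaska)
```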
pop.density <- Population / Area
hist(pop.density)
Most of the data on the histogram is concentrated between population density values of 0 and 0.2, with frequencies decreasing at higher values because fewer states have densities above 0.2. Since Population is recorded in thousands and Area in square miles, pop.density is measured in thousands of people per square mile, so 0.2 corresponds to 200 people per square mile. This observation tells us that most states have low population densities, meaning their populations are spread out, with relatively few people per square mile of land. In essence, their populations are small relative to the geographical area available.
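Converting the density to people per square mile can make the histogram easier to interpret. A minimal sketch, assuming Population is in thousands as in `state.x77`; the name `density_per_sq_mile` is illustrative.

```r
states <- as.data.frame(state.x77)

# Population is in thousands, so multiply by 1000 before dividing by area.
density_per_sq_mile <- states$Population * 1000 / states$Area
names(density_per_sq_mile) <- rownames(states)

# Number of states below 200 people per square mile (0.2 on the original scale).
sum(density_per_sq_mile < 200)

# The most densely populated states.
head(sort(density_per_sq_mile, decreasing = TRUE))
```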