This is an individual assignment. Submit your .Rmd and “knitted”.html files through Collab.
Upload your html file on RPubs and include the link when you submit your submission files on Collab.
Please don’t use ggplot2 for this assignment. We’ll use ggplot2 almost all the time after this assignment.
Use the occupational experience variable (“oexp”) of the income_example dataset and plot
You can either produce four separate but small plots, or you can use par(mfrow = c(2, 2)) to create a plotting region consisting of four subplots.
Briefly describe the distributions of occupational experience in words (also include your plots and the R syntax).
[“Play” with the hist() and density() functions; for instance, by choosing a different number of bins or different break points for the hist() function, or different bandwidths using the adjust argument in density(). See also the corresponding help files and the examples given there. Only include the histogram and density estimate you find most informative. Also, add useful axis labels and a title using the following arguments inside the plotting functions: xlab, ylab, main. See the help page ?par for descriptions of many more plotting parameters.]
[That’s what the plots could look like – but you have to do it with your group;-)]
#file.choose()
data <- read.table("/Users/michaelvaden/Downloads/income_exmpl.dat", header = TRUE, sep = "\t")
par(mfrow = c(2, 2))
hist(data$oexp, xlab = "Occupational Experience (years)", main = "Histogram of Occupational Experience", col = "darkblue", border = "orange")
plot(density(data$oexp), main = "Density Estimate of Occupational Experience", col = "orange", xlab = "Occupational Experience (years)")
boxplot(data$oexp, horizontal = TRUE, main = "Boxplot of Occupational Experience", col = "orange", border = "darkblue", xlab = "Occupational Experience (years)")
boxplot(data$oexp ~ data$sex + data$occ, horizontal = TRUE, main = "Occ. Experience by Sex and Occ. Status", col = rep(c("orange", "darkblue"), each = 3), border = rep(c("darkblue", "orange"), each = 3), xlab = "Occupational Experience (years)", ylab = "", las = 1)
The histogram of occupational experience is right-skewed, with the modal bin at 0-5 years. Frequencies are similar across the range of 5-35 years, after which they decrease.
The density estimate of occupational experience appears slightly right-skewed, with two humps. The density peaks at about 0.023, at roughly 8 years of occupational experience, and is similar across the range of 10 to 30 years.
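The effect of bin width and bandwidth on these two plots can be explored with a short sketch like the one below. It does not read the assignment data: `oexp_sim` is a simulated stand-in for `data$oexp`, and the particular break points and `adjust` values are illustrative assumptions, not the settings used for the plots above.

```r
# Simulated stand-in for data$oexp (assumption: values between 0 and 50 years)
set.seed(1)
oexp_sim <- pmin(rexp(500, rate = 1 / 18), 50)

op <- par(mfrow = c(2, 2))

# Coarse versus fine bins via explicit break points
h_coarse <- hist(oexp_sim, breaks = seq(0, 50, by = 10),
                 main = "10-year bins", xlab = "Occupational Experience (years)")
h_fine <- hist(oexp_sim, breaks = seq(0, 50, by = 2.5),
               main = "2.5-year bins", xlab = "Occupational Experience (years)")

# Narrow versus wide bandwidth via the adjust argument
d_narrow <- density(oexp_sim, adjust = 0.5)
d_wide <- density(oexp_sim, adjust = 2)
plot(d_narrow, main = "adjust = 0.5", xlab = "Occupational Experience (years)")
plot(d_wide, main = "adjust = 2", xlab = "Occupational Experience (years)")

par(op)  # restore the previous plotting layout
```

A smaller `adjust` value tends to reveal extra humps that may be noise, while a larger one smooths them away, so the number of humps reported for a density estimate depends on this choice.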
The boxplot of occupational experience shows that the data are slightly right-skewed, with a range of approximately 50 years. The 25th percentile is approximately 9 years, the median is approximately 19 years, and the 75th percentile is at 30 years.
The boxplots of occupational experience by sex and occupational status show that males with low occupational status have the smallest range, as well as the lowest median and 75th percentile. All of the groups are slightly right-skewed except males with high occupational status, whose boxplot is slightly left-skewed and has the highest five-number summary.
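Values read off a boxplot can be checked numerically with fivenum(), which computes the same five-number summary that boxplot() draws. The sketch below uses simulated data rather than the assignment file; the data frame `sim` and the level names "female", "male", "low", "medium", and "high" are assumptions made only for illustration.

```r
# Simulated stand-in for the income data (assumed variable levels)
set.seed(2)
sim <- data.frame(
  oexp = round(runif(300, min = 0, max = 50)),
  sex = sample(c("female", "male"), 300, replace = TRUE),
  occ = sample(c("low", "medium", "high"), 300, replace = TRUE)
)

# Minimum, lower hinge, median, upper hinge, maximum for the whole sample
overall <- fivenum(sim$oexp)
print(overall)

# The same summary for each sex-by-occupational-status group;
# by_group$oexp is a matrix with one row per group and five columns
by_group <- aggregate(oexp ~ sex + occ, data = sim, FUN = fivenum)
print(by_group)
```

The hinges returned by fivenum() can differ slightly from the quartiles given by quantile(), which matters only when reporting values to more precision than a plot allows.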
jitter() or alpha() for avoiding overlying points.
#file.choose()
library(foreign)
data <- read.spss("/Users/michaelvaden/Downloads/SCS_QE.sav", to.data.frame=TRUE)
## re-encoding from CP1252
## Warning in read.spss("/Users/michaelvaden/Downloads/SCS_QE.sav", to.data.frame =
## TRUE): Undeclared level(s) 0 added in variable: married
plot(jitter(data$mars, factor = 3), jitter(data$mathpre, factor = 3), xlab = "Math Anxiety Score", ylab = "Math Achievement Score")
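As an alternative to jittering alone, overlapping points can also be drawn semi-transparently in base graphics. The sketch below is one way to do this under stated assumptions: `x` and `y` are simulated integer scores, not the SCS_QE variables, and adjustcolor() from grDevices is used here in place of an alpha() helper.

```r
# Simulated integer-valued scores that overlap heavily (illustrative only)
set.seed(4)
x <- sample(20:80, 300, replace = TRUE)
y <- sample(1:10, 300, replace = TRUE)

# A semi-transparent version of a named colour; alpha.f = 0.3 is an arbitrary choice
semi <- adjustcolor("darkblue", alpha.f = 0.3)

# Light jitter plus transparency: darker regions indicate more overlapping points
plot(jitter(x, factor = 1), jitter(y, factor = 1), pch = 16, col = semi,
     xlab = "Simulated x score", ylab = "Simulated y score")
```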
| male” in your first argument to create a conditioning plot.
coplot(mathpre ~ mars | male, data = data, xlab = "Math Anxiety Score", ylab = "Math Achievement Score")
describe your plots.
The scatterplot shows a slight negative linear relationship between math anxiety and math achievement scores. Simpson’s Paradox is the phenomenon in which a trend that appears within each of several groups disappears or reverses when the groups are combined. Examining the plots split by sex, we do not find much evidence of Simpson’s Paradox. Although the sample of females is larger and its points cover a greater range, the slight negative linear relationship is apparent in both conditioned plots, and it matches the trend in the combined plot.
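A visual check for Simpson’s Paradox can be backed up by comparing the pooled regression slope with the slope within each group. The following is a minimal sketch on simulated data: the data frame `sim`, its effect sizes, and the sign-comparison rule are assumptions for illustration and are not results from the SCS_QE data.

```r
# Simulated data with the same variable names as the plots above (values are made up)
set.seed(3)
n <- 200
sim <- data.frame(male = rbinom(n, size = 1, prob = 0.4))
sim$mars <- rnorm(n, mean = 55 - 5 * sim$male, sd = 10)
sim$mathpre <- 12 - 0.1 * sim$mars + rnorm(n, sd = 1.5)

# Slope of math achievement on math anxiety, ignoring sex
pooled_slope <- coef(lm(mathpre ~ mars, data = sim))["mars"]

# The same slope estimated separately for each level of male
group_slopes <- sapply(split(sim, sim$male), function(g) {
  coef(lm(mathpre ~ mars, data = g))["mars"]
})

# Flag a sign reversal: every within-group slope points the opposite way from the pooled one
paradox <- all(sign(group_slopes) != sign(pooled_slope))

print(pooled_slope)
print(group_slopes)
print(paradox)
```

Applying the same comparison to the real data would replace `sim` with the loaded data frame; matching signs in the pooled and within-group slopes would agree with the reading of the plots given above.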
Use a dataset that is available in data repositories (e.g., kaggle)
Briefly describe the dataset you’re using (e.g., means to access data, context, sample, variables, etc…)
This dataset catalogues the average life-expectancy at birth for each year in the range 1900 - 2013 overall and for each sex. All races are included, and the dataset also includes mortality rates. Populations in the study are based on standard population and the census, with non-census years using post-census estimates.
Re-do Part 2, i.e.,
jitter() or alpha() for avoiding overlying points.
# file.choose()
lifedata <- read.csv("/Users/michaelvaden/Downloads/NCHS_-_Age-adjusted_death_rates_and_life-expectancy_at_birth___All_Races__Both_Sexes___United_States__1900-2013.csv")
# View(lifedata)
plot(jitter(lifedata$Year, factor = 2), jitter(lifedata$Average.Life.Expectancy, factor = 2), xlab = "Year", ylab = "Average Life Expectancy")
| C” in your first argument to create a conditioning plot.
coplot(Average.Life.Expectancy ~ Year | Sex, data = lifedata, xlab = "Year", ylab = "Average Life Expectancy")
There appears to be a strong positive relationship between Year and Average Life Expectancy. There seems to be more variability before 1950, with a very strong positive relationship after 1950. If we examine the plots split up by sex, we do not find much evidence of Simpson’s Paradox. The plots have similar sample sizes, and both the male and female plots are almost identical in trend to the plot of both sexes. Comparing the combined plot with the two plots separated by sex, there is no notable difference in trend.