As of August 28, 2014, superseding the version of August 24. Always use the most recent version.
Choose one of the large datasets listed on the Realtime Board (e.g., babynames or nasaweather).
Make sure you have > 1000 data points.

### What is the problem that you were given?
remove(list=ls())
#load the package from the default library path instead of a machine-specific lib.loc
library(babynames)
#get information on babynames dataset
??babynames
## starting httpd help server ... done
#name dataset
x<-babynames
#view first few lines
head(x)
## year sex name n prop
## 1 1880 F Mary 7065 0.07238
## 2 1880 F Anna 2604 0.02668
## 3 1880 F Emma 2003 0.02052
## 4 1880 F Elizabeth 1939 0.01987
## 5 1880 F Minnie 1746 0.01789
## 6 1880 F Margaret 1578 0.01617
#observe the structure of the data, i.e., how many variables
str(x)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1792091 obs. of 5 variables:
## $ year: num 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
## $ sex : chr "F" "F" "F" "F" ...
## $ name: chr "Mary" "Anna" "Emma" "Elizabeth" ...
## $ n : int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
## $ prop: num 0.0724 0.0267 0.0205 0.0199 0.0179 ...
#subset data for computational purposes. We will only look at babies named Mary.
x=subset(x,name=="Mary")
x
## year sex name n prop
## 1 1880 F Mary 7065 7.238e-02
## 1274 1880 M Mary 27 2.280e-04
## 2001 1881 F Mary 6919 6.999e-02
## 3239 1881 M Mary 29 2.678e-04
## 3936 1882 F Mary 8148 7.042e-02
## 5278 1882 M Mary 30 2.458e-04
## 6063 1883 F Mary 8012 6.673e-02
## 7408 1883 M Mary 32 2.845e-04
## 8147 1884 F Mary 9217 6.699e-02
## 9611 1884 M Mary 36 2.933e-04
## 10444 1885 F Mary 9128 6.430e-02
## 11911 1885 M Mary 38 3.277e-04
## 12738 1886 F Mary 9890 6.433e-02
## 14324 1886 M Mary 32 2.688e-04
## 15130 1887 F Mary 9888 6.362e-02
## 16672 1887 M Mary 47 4.300e-04
## 17503 1888 F Mary 11754 6.204e-02
## 19224 1888 M Mary 50 3.849e-04
## 20154 1889 F Mary 11648 6.156e-02
## 21894 1889 M Mary 41 3.444e-04
## 22744 1890 F Mary 12078 5.989e-02
## 24570 1890 M Mary 35 2.924e-04
## 25439 1891 F Mary 11703 5.954e-02
## 27230 1891 M Mary 39 3.569e-04
## 28099 1892 F Mary 13174 5.857e-02
## 30010 1892 M Mary 50 3.804e-04
## 31020 1893 F Mary 12784 5.676e-02
## 32903 1893 M Mary 55 4.544e-04
## 33851 1894 F Mary 13151 5.573e-02
## 35803 1894 M Mary 48 3.843e-04
## 36792 1895 F Mary 13446 5.441e-02
## 38860 1895 M Mary 47 3.711e-04
## 39841 1896 F Mary 13811 5.481e-02
## 41907 1896 M Mary 57 4.416e-04
## 42932 1897 F Mary 13413 5.402e-02
## 44941 1897 M Mary 63 5.166e-04
## 45960 1898 F Mary 14406 5.255e-02
## 48185 1898 M Mary 50 3.785e-04
## 49224 1899 F Mary 13172 5.322e-02
## 51301 1899 M Mary 50 4.340e-04
## 52266 1900 F Mary 16707 5.257e-02
## 54723 1900 M Mary 75 4.625e-04
## 55998 1901 F Mary 13136 5.167e-02
## 58171 1901 M Mary 58 5.017e-04
## 59151 1902 F Mary 14486 5.167e-02
## 61441 1902 M Mary 56 4.218e-04
## 62513 1903 F Mary 14275 5.131e-02
## 64825 1903 M Mary 63 4.871e-04
## 65902 1904 F Mary 14962 5.116e-02
## 68319 1904 M Mary 58 4.187e-04
## 69463 1905 F Mary 16067 5.185e-02
## 71936 1905 M Mary 68 4.747e-04
## 73119 1906 F Mary 16370 5.223e-02
## 75563 1906 M Mary 76 5.275e-04
## 76752 1907 F Mary 17580 5.210e-02
## 79402 1907 M Mary 70 4.414e-04
## 80700 1908 F Mary 18666 5.265e-02
## 83369 1908 M Mary 79 4.748e-04
## 84718 1909 F Mary 19258 5.232e-02
## 87513 1909 M Mary 80 4.523e-04
## 88945 1910 F Mary 22847 5.446e-02
## 91972 1910 M Mary 99 4.748e-04
## 93574 1911 F Mary 24390 5.521e-02
## 96703 1911 M Mary 98 4.060e-04
## 98441 1912 F Mary 32302 5.506e-02
## 102235 1912 M Mary 118 2.614e-04
## 104792 1913 F Mary 36641 5.595e-02
## 108875 1913 M Mary 125 2.331e-04
## 111760 1914 F Mary 45344 5.692e-02
## 116388 1914 M Mary 127 1.859e-04
## 119723 1915 F Mary 58187 5.683e-02
## 125120 1915 M Mary 159 1.805e-04
## 129082 1916 F Mary 61436 5.659e-02
## 134680 1916 M Mary 164 1.776e-04
## 138778 1917 F Mary 64280 5.720e-02
## 144538 1917 M Mary 159 1.657e-04
## 148693 1918 F Mary 67372 5.603e-02
## 154735 1918 M Mary 169 1.611e-04
## 159093 1919 F Mary 65835 5.605e-02
## 165134 1919 M Mary 155 1.527e-04
## 169462 1920 F Mary 70980 5.706e-02
## 175657 1920 M Mary 195 1.771e-04
## 180219 1921 F Mary 73981 5.781e-02
## 186539 1921 M Mary 187 1.643e-04
## 191075 1922 F Mary 72172 5.785e-02
## 197313 1922 M Mary 186 1.653e-04
## 201833 1923 F Mary 71631 5.720e-02
## 207997 1923 M Mary 203 1.793e-04
## 212478 1924 F Mary 73520 5.674e-02
## 218789 1924 M Mary 223 1.908e-04
## 223347 1925 F Mary 70600 5.590e-02
## 229487 1925 M Mary 261 2.267e-04
## 233989 1926 F Mary 67828 5.514e-02
## 239960 1926 M Mary 272 2.375e-04
## 244449 1927 F Mary 70628 5.713e-02
## 250401 1927 M Mary 282 2.428e-04
## 254857 1928 F Mary 66862 5.594e-02
## 260623 1928 M Mary 294 2.577e-04
## 265019 1929 F Mary 63507 5.487e-02
## 270596 1929 M Mary 327 2.953e-04
## 274831 1930 F Mary 64131 5.499e-02
## 280386 1930 M Mary 340 3.011e-04
## 284618 1931 F Mary 60296 5.464e-02
## 289898 1931 M Mary 326 3.049e-04
## 293908 1932 F Mary 59866 5.412e-02
## 299311 1932 M Mary 330 3.073e-04
## 303289 1933 F Mary 55490 5.306e-02
## 308445 1933 M Mary 316 3.099e-04
## 312300 1934 F Mary 56911 5.259e-02
## 317584 1934 M Mary 303 2.854e-04
## 321480 1935 F Mary 55067 5.068e-02
## 326697 1935 M Mary 276 2.581e-04
## 330512 1936 F Mary 54362 5.046e-02
## 335680 1936 M Mary 293 2.754e-04
## 339407 1937 F Mary 55639 5.050e-02
## 344653 1937 M Mary 288 2.634e-04
## 348353 1938 F Mary 56208 4.925e-02
## 353676 1938 M Mary 284 2.500e-04
## 357378 1939 F Mary 54898 4.841e-02
## 362670 1939 M Mary 272 2.401e-04
## 366297 1940 F Mary 56197 4.758e-02
## 371648 1940 M Mary 303 2.555e-04
## 375258 1941 F Mary 58026 4.658e-02
## 380672 1941 M Mary 315 2.511e-04
## 384342 1942 F Mary 63242 4.549e-02
## 390084 1942 M Mary 276 1.960e-04
## 393767 1943 F Mary 66164 4.610e-02
## 399514 1943 M Mary 254 1.747e-04
## 403173 1944 F Mary 62468 4.572e-02
## 408796 1944 M Mary 252 1.815e-04
## 412327 1945 F Mary 59286 4.404e-02
## 417996 1945 M Mary 198 1.444e-04
## 421353 1946 F Mary 67464 4.183e-02
## 427509 1946 M Mary 186 1.127e-04
## 431058 1947 F Mary 71679 3.943e-02
## 437658 1947 M Mary 184 9.907e-05
## 441431 1948 F Mary 68601 3.936e-02
## 447984 1948 M Mary 163 9.142e-05
## 451668 1949 F Mary 66844 3.809e-02
## 458235 1949 M Mary 170 9.443e-05
## 461924 1950 F Mary 65454 3.722e-02
## 468661 1950 M Mary 120 6.597e-05
## 472232 1951 F Mary 65677 3.557e-02
## 478979 1951 M Mary 160 8.377e-05
## 482691 1952 F Mary 65677 3.453e-02
## 489634 1952 M Mary 158 8.009e-05
## 493339 1953 F Mary 64338 3.336e-02
## 500428 1953 M Mary 146 7.302e-05
## 504169 1954 F Mary 67994 3.416e-02
## 511342 1954 M Mary 169 8.174e-05
## 515130 1955 F Mary 63165 3.151e-02
## 522458 1955 M Mary 149 7.135e-05
## 526244 1956 F Mary 61753 2.999e-02
## 533765 1956 M Mary 143 6.668e-05
## 537584 1957 F Mary 61086 2.912e-02
## 545166 1957 M Mary 184 8.412e-05
## 549148 1958 F Mary 55846 2.705e-02
## 556821 1958 M Mary 141 6.549e-05
## 560667 1959 F Mary 54474 2.621e-02
## 568457 1959 M Mary 170 7.847e-05
## 572434 1960 F Mary 51479 2.475e-02
## 580371 1960 M Mary 169 7.802e-05
## 584358 1961 F Mary 47660 2.296e-02
## 592496 1961 M Mary 167 7.747e-05
## 596533 1962 F Mary 43494 2.146e-02
## 604727 1962 M Mary 158 7.516e-05
## 608736 1963 F Mary 41550 2.090e-02
## 617066 1963 M Mary 136 6.585e-05
## 621015 1964 F Mary 40984 2.094e-02
## 629482 1964 M Mary 134 6.609e-05
## 633406 1965 F Mary 34272 1.876e-02
## 641579 1965 M Mary 136 7.174e-05
## 645359 1966 F Mary 28883 1.645e-02
## 653686 1966 M Mary 108 5.940e-05
## 657508 1967 F Mary 25314 1.475e-02
## 666017 1967 M Mary 127 7.135e-05
## 669908 1968 F Mary 21721 1.271e-02
## 678839 1968 M Mary 101 5.686e-05
## 682838 1969 F Mary 19853 1.126e-02
## 692293 1969 M Mary 108 5.900e-05
## 696590 1970 F Mary 19200 1.048e-02
## 706740 1970 M Mary 99 5.195e-05
## 711364 1971 F Mary 16698 9.530e-03
## 721865 1971 M Mary 86 4.729e-05
## 726653 1972 F Mary 13763 8.535e-03
## 737230 1972 M Mary 74 4.418e-05
## 742065 1973 F Mary 12322 7.929e-03
## 752968 1973 M Mary 54 3.345e-05
## 757739 1974 F Mary 11752 7.504e-03
## 769022 1974 M Mary 61 3.741e-05
## 773987 1975 F Mary 10964 7.026e-03
## 785585 1975 M Mary 66 4.066e-05
## 790919 1976 F Mary 10325 6.569e-03
## 802864 1976 M Mary 62 3.796e-05
## 808311 1977 F Mary 10661 6.482e-03
## 820842 1977 M Mary 52 3.042e-05
## 826483 1978 F Mary 10050 6.115e-03
## 839052 1978 M Mary 61 3.570e-05
## 844705 1979 F Mary 10552 6.125e-03
## 857750 1979 M Mary 65 3.629e-05
## 863731 1980 F Mary 11473 6.446e-03
## 877164 1980 M Mary 52 2.804e-05
## 883168 1981 F Mary 11037 6.174e-03
## 896442 1981 M Mary 67 3.599e-05
## 902634 1982 F Mary 10849 5.983e-03
## 915993 1982 M Mary 73 3.870e-05
## 922314 1983 F Mary 9894 5.531e-03
## 935553 1983 M Mary 57 3.061e-05
## 941709 1984 F Mary 9288 5.153e-03
## 955074 1984 M Mary 56 2.986e-05
## 961209 1985 F Mary 9238 5.006e-03
## 974815 1985 M Mary 67 3.484e-05
## 981288 1986 F Mary 8504 4.611e-03
## 995533 1986 M Mary 47 2.448e-05
## 1001923 1987 F Mary 8394 4.481e-03
## 1016293 1987 M Mary 69 3.541e-05
## 1023315 1988 F Mary 8508 4.427e-03
## 1038693 1988 M Mary 48 2.399e-05
## 1045673 1989 F Mary 8640 4.338e-03
## 1061435 1989 M Mary 74 3.532e-05
## 1069433 1990 F Mary 8662 4.218e-03
## 1086397 1990 M Mary 45 2.092e-05
## 1094146 1991 F Mary 8756 4.307e-03
## 1111841 1991 M Mary 32 1.510e-05
## 1119248 1992 F Mary 8447 4.215e-03
## 1137604 1992 M Mary 25 1.191e-05
## 1144663 1993 F Mary 8103 4.112e-03
## 1164011 1993 M Mary 18 8.719e-06
## 1170614 1994 F Mary 7741 3.973e-03
## 1192071 1994 M Mary 10 4.908e-06
## 1196612 1995 F Mary 7424 3.865e-03
## 1217026 1995 M Mary 13 6.466e-06
## 1222695 1996 F Mary 6937 3.620e-03
## 1242854 1996 M Mary 15 7.490e-06
## 1249117 1997 F Mary 6623 3.471e-03
## 1270078 1997 M Mary 13 6.511e-06
## 1276081 1998 F Mary 6423 3.315e-03
## 1297425 1998 M Mary 14 6.909e-06
## 1303971 1999 F Mary 6356 3.267e-03
## 1326768 1999 M Mary 11 5.398e-06
## 1332516 2000 F Mary 6179 3.099e-03
## 1356766 2000 M Mary 10 4.792e-06
## 1362281 2001 F Mary 5722 2.891e-03
## 1385756 2001 M Mary 13 6.290e-06
## 1392542 2002 F Mary 5449 2.761e-03
## 1416510 2002 M Mary 12 5.811e-06
## 1423106 2003 F Mary 5003 2.496e-03
## 1449087 2003 M Mary 9 4.287e-06
## 1454281 2004 F Mary 4801 2.382e-03
## 1476293 2004 M Mary 30 1.421e-05
## 1486324 2005 F Mary 4445 2.193e-03
## 1512891 2005 M Mary 10 4.706e-06
## 1518860 2006 F Mary 4080 1.954e-03
## 1546618 2006 M Mary 10 4.567e-06
## 1552938 2007 F Mary 3670 1.737e-03
## 1581349 2007 M Mary 10 4.521e-06
## 1587866 2008 F Mary 3488 1.678e-03
## 1622910 2009 F Mary 3152 1.560e-03
## 1657585 2010 F Mary 2860 1.463e-03
## 1691615 2011 F Mary 2700 1.398e-03
## 1725483 2012 F Mary 2559 1.326e-03
## 1756514 2012 M Mary 6 2.971e-06
## 1759140 2013 F Mary 2602 1.363e-03
attach(x)
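The data set has 5 variables: year (134 levels, 1880 through 2013), sex (2 levels), name (many levels in the full data set, but only Mary after subsetting), n (the response), and prop (many distinct values).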
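A quick way to count the levels of each variable is sketched below. The data frame `toy` and its values are made up for illustration and are not taken from babynames; only the 1880 to 2013 year span comes from the report.

```r
# Toy stand-in for the Mary subset; values are made up for illustration.
toy <- data.frame(
  year = c(1880, 1880, 1881, 1881, 1882),
  sex  = c("F", "M", "F", "M", "F"),
  name = "Mary",
  stringsAsFactors = FALSE
)

# Number of distinct values (levels) in each column.
level_counts <- sapply(toy, function(col) length(unique(col)))
print(level_counts)

# Number of calendar years from 1880 through 2013, inclusive.
n_years <- length(1880:2013)
print(n_years)
```

Running the same `sapply()` call on the real subset `x` would give the level counts for the actual data.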
#view summary statistics
summary(x)
## year sex name n
## Min. :1880 Length:263 Length:263 Min. : 6
## 1st Qu.:1912 Class :character Class :character 1st Qu.: 80
## Median :1945 Mode :character Mode :character Median : 2700
## Mean :1945 Mean :15694
## 3rd Qu.:1978 3rd Qu.:16534
## Max. :2013 Max. :73981
## prop
## Min. :0.00000
## 1st Qu.:0.00016
## Median :0.00140
## Mean :0.01727
## 3rd Qu.:0.04294
## Max. :0.07238
n (int) is the number of babies with a particular name.
prop (num) is n divided by the total number of applicants in that year.

### Response variables
n is the number of babies given each particular name.

### The Data: How is it organized and what does it look like?
The data are tabulated into 5 columns, with little missing data. Variables are numeric, character, and integer.

### Randomization
The data were collected by the US Social Security Administration each year from 1880 to 2013.
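The definition of prop given above (n divided by the total number of applicants) can be illustrated with the first row of the `head(x)` output. This is only a sketch: the variable names are hypothetical, and because the printed prop is rounded, the implied total is approximate.

```r
# Values for Mary (F) in 1880, copied from the head(x) output.
n_mary    <- 7065
prop_mary <- 0.07238

# If prop = n / total, then total = n / prop.
implied_total <- n_mary / prop_mary
print(round(implied_total))

# Recomputing prop from the implied total recovers the printed value.
print(n_mary / implied_total)
```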
Since the dataset is so large, it will need to be subset in order to make computational analysis feasible. Analysis of variance will be used to determine whether the factors sex or year have an effect on the number of babies given a name. Factor interactions and blocking will also be considered.
### What is the rationale for this design?
It is possible that individual factors by themselves may have an effect on the number, and it is also possible that the combination of factors may have an effect on the number.

### Randomize: What is the Randomization Scheme?
The data were collected by the US Social Security Administration each year from 1880 to 2013.

### Replicate: Are there replicates and/or repeated measures?
There are replicates of each name and there are repeated measures, as the SSA received birth certificates every year.

### Block: Did you use blocking in the design?
Yes, one model was performed with blocking to determine if each of the factors had an effect by itself.
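The difference between a main-effects model and a model with interaction, as described in the design above, can be sketched on simulated data. Everything in this block (the `toy` data frame, the coefficients, the seed) is an assumption made for illustration and does not reproduce the babynames results.

```r
set.seed(1)  # simulated data only; not the babynames values

# Hypothetical layout: 20 years, both sexes observed each year.
toy <- data.frame(
  year = rep(1:20, times = 2),
  sex  = factor(rep(c("F", "M"), each = 20))
)
toy$n <- 100 + 5 * toy$year + 50 * (toy$sex == "F") + rnorm(40, sd = 10)

# Additive model: main effects of year and sex, no interaction.
fit_add <- aov(n ~ year + sex, data = toy)

# Interaction model: also lets the year trend differ between sexes.
fit_int <- aov(n ~ year * sex, data = toy)

# F-test of whether the interaction term improves on the additive model.
cmp <- anova(fit_add, fit_int)
print(cmp)
```

Comparing the two nested fits directly is one way to judge whether the interaction term is needed.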
par(mfrow=c(1,1))
boxplot(n~year)
boxplot(n~sex)
### Testing
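#convert to factors
#NOTE: the next line stores factor(year) in x$name rather than x$year, so year stays numeric and enters each model below as a single linear term (Df = 1)
x$name=as.factor(x$year)
x$sex=as.factor(x$sex)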
#run analysis of variance for individual factor effects
model1=(aov(n~year,data=x))
anova(model1)
## Analysis of Variance Table
##
## Response: n
## Df Sum Sq Mean Sq F value Pr(>F)
## year 1 2.03e+09 2.03e+09 3.68 0.056 .
## Residuals 261 1.44e+11 5.52e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#based on year alone, we fail to reject the H0 that year has no effect on n for all babies named Mary (p = 0.056, above the 0.05 level).
model2=(aov(n~sex, data=x))
anova(model2)
## Analysis of Variance Table
##
## Response: n
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 6.14e+10 6.14e+10 189 <2e-16 ***
## Residuals 261 8.46e+10 3.24e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#based on sex alone, we reject the H0 that sex has no effect on n for all babies named Mary.
#run analysis of variance using blocking
block=aov(n~year+sex, data=x)
anova(block)
## Analysis of Variance Table
##
## Response: n
## Df Sum Sq Mean Sq F value Pr(>F)
## year 1 2.03e+09 2.03e+09 6.45 0.012 *
## sex 1 6.22e+10 6.22e+10 197.74 <2e-16 ***
## Residuals 260 8.18e+10 3.15e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#in this particular model, we reject the H0 that each factor independently has no effect on n for all babies named Mary. Both year (p = 0.012) and sex (p < 2e-16) have an effect on n.
#run analysis of variance with interaction
interaction=aov(n~year*sex, data=x)
anova(interaction)
## Analysis of Variance Table
##
## Response: n
## Df Sum Sq Mean Sq F value Pr(>F)
## year 1 2.03e+09 2.03e+09 6.62 0.0106 *
## sex 1 6.22e+10 6.22e+10 203.18 <2e-16 ***
## year:sex 1 2.50e+09 2.50e+09 8.15 0.0046 **
## Residuals 259 7.93e+10 3.06e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#when considering interaction, year, sex, and the year:sex interaction are all significant at the 0.05 level for babies named Mary.
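As a rough check of how the interaction row in the table above is obtained, the F value can be recomputed from the printed sums of squares. The variable names are hypothetical, and because the printed values are rounded to three significant figures, the result only approximately matches the table.

```r
# Values copied from the year:sex row and the Residuals row of the table.
ss_interaction <- 2.50e9   # Sum Sq for year:sex
df_interaction <- 1
ss_residual    <- 7.93e10  # Sum Sq for Residuals
df_residual    <- 259

# Mean squares are sums of squares divided by their degrees of freedom.
ms_interaction <- ss_interaction / df_interaction
ms_residual    <- ss_residual / df_residual

# F statistic and its upper-tail p-value.
f_value <- ms_interaction / ms_residual
p_value <- pf(f_value, df_interaction, df_residual, lower.tail = FALSE)
print(c(F = f_value, p = p_value))
```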
Based on the results of these analyses of variance, we reject the H0 that sex and year have no effect on the number of babies named Mary: both terms are significant at the 0.05 level in the additive and interaction models, although year alone falls just short (p = 0.056). Said differently, variation in the number of babies named Mary can be explained by something other than randomization. We also reject the H0 of no interaction between sex and year (p = 0.0046), so the effect of year on the response variable n differs by sex. Finally, we must check normality to ensure these results are valid.
# Shapiro-Wilk test of normality. Normality is adequate if p > 0.1 (a small p-value rejects normality)
shapiro.test(year[1:4999])
##
## Shapiro-Wilk normality test
##
## data: year[1:4999]
## W = 0.9561, p-value = 3.82e-07
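The ANOVA normality assumption concerns the model residuals rather than a predictor, so a check along the following lines may be more informative. This is a minimal sketch on simulated data; the `toy` data frame, the coefficients, and the seed are assumptions and do not reproduce the babynames analysis.

```r
set.seed(42)  # simulated data only; not the babynames values

# Hypothetical layout: 30 years, both sexes observed each year.
toy <- data.frame(
  year = rep(1:30, times = 2),
  sex  = factor(rep(c("F", "M"), each = 30))
)
toy$n <- 200 + 3 * toy$year + 40 * (toy$sex == "F") + rnorm(60, sd = 15)

fit <- aov(n ~ year * sex, data = toy)

# Test the residuals, since the normality assumption concerns the model errors.
sw <- shapiro.test(residuals(fit))
print(sw)
```

Applied to the report's own model, the equivalent call would be `shapiro.test(residuals(interaction))`.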
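One of the primary assumptions of the analysis of variance is that the residuals are normally distributed. The Shapiro-Wilk test rejects normality (p = 3.82e-07), which calls the results from the analysis of variance into question. Note that the test was run on year rather than on the model residuals, so the residual plots below are the more direct check.

### Diagnostics/Model Adequacy Checking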
qqnorm(residuals(interaction))
#arguments are x.factor, trace.factor, response
interaction.plot(year,sex,n)
plot(fitted(interaction),residuals(interaction))