Base graphics
In-class exercises:
1.Run the R script to see what happens. Change the ‘df’ parameter to a slightly larger integer and do it again. What statistical concept does this script illustrate?
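## x is a sequence from -2*pi to 2*pi in steps of .05
x <- seq(-pi*2, 2*pi, .05)
## z is the standard normal density evaluated at x
z <- dnorm(x)
## y is the t density evaluated at x, with 3 degrees of freedom
y <- dt(x, df=3)
## Plot z against x as a line; label the x axis "Standard unit" and the y axis "Density"; bty sets the style of the box around the plot
plot(x, z, type="l", bty="L", xlab="Standard unit", ylab="Density")
## Shade the region between the t curve and the normal curve (a polygon through x, y and back through rev(x), rev(z)) in aliceblue
polygon(c(x, rev(x)), c(y, rev(z)), col='aliceblue')
## Add the t density as a line in cadetblue
lines(x, y, col='cadetblue')
## As the degrees of freedom increase, the t distribution becomes more concentrated and approaches the normal curve, so the shaded area shrinks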
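The shrinking shaded area can also be checked numerically. The sketch below reuses the same grid of standard units and reports the largest vertical gap between the t density and the normal density; the df values 3 and 9 come from the exercise, while 30 is an extra value assumed here for illustration.

```r
# Same grid of standard units as the script
x <- seq(-2 * pi, 2 * pi, .05)

# Largest vertical gap between the t density and the normal density
# for each df (30 is an assumed, illustrative value)
dfs <- c(3, 9, 30)
gap <- sapply(dfs, function(df) max(abs(dt(x, df = df) - dnorm(x))))
names(gap) <- paste0("df=", dfs)

print(round(gap, 4))
```

The gap gets smaller as df grows, which is the convergence of the t distribution to the standard normal that the plot illustrates.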
x <- seq(-pi*2, 2*pi, .05)
z <- dnorm(x)
y <- dt(x, df=9)
plot(x, z, type="l", bty="L", xlab="Standard unit", ylab="Density")
polygon(c(x, rev(x)), c(y, rev(z)), col='aliceblue')
lines(x, y, col='cadetblue')
2.Doll (1955) showed per capita consumption of cigarettes in 11 countries in 1930, and the death rates from lung cancer for men in 1950. Use R base graphics to generate the plot shown below. Source: Freedman, et al. (1997). Statistics. pp. 148-150.
Column 1: Country names
Column 2: Cigarette consumption (per capita)
Column 3: Death rate (per million)
## Read the data file
dta <- read.table("C:/Users/boss/Desktop/data_management/cigarettes.txt", header=TRUE, stringsAsFactors=FALSE, fill=TRUE)
## Take a look at the data
head(dta)
Country consumption death
1 Australia 480 180
2 Canada 500 150
3 Denmark 380 170
4 Finland 1100 350
5 UK 1100 460
6 Iceland 230 60
'data.frame': 11 obs. of 3 variables:
$ Country : chr "Australia" "Canada" "Denmark" "Finland" ...
$ consumption: int 480 500 380 1100 1100 230 490 250 300 510 ...
$ death : int 180 150 170 350 460 60 240 90 110 250 ...
## Set up an empty scatter plot (pch=" " hides the points) so that the country names can serve as the plotting symbols
with(dta, plot(consumption, death,
xlab="Cigarette consumption (per capita)",
ylab="Death rate (per million)",
pch=" ",
main="Lung Cancer and Cigarette Consumption"))
## Fit the regression of death on consumption and draw it as a dashed line
m0 <- lm(death ~ consumption, data=dta)
abline(m0, lty=2)
## Add the country names as labels; set the text size and position
text(dta$consumption, dta$death, dta$Country,
cex = 1,
adj = -0.001)
## Add grid lines
grid()
3.Use R base graphics to create the national flag of Denmark:
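## Set all margins to 0 and the background to red
par(mar = c(0, 0, 0, 0),
bg="red")
## Set up an empty plot with both axes running from 0 to 6
plot(0:6, 0:6, xlim = c(0,6), ylim = c(0,6), type = "n")
## Draw two white rectangles; the coordinates overshoot the 0-6 range because, by default, the plot region extends 4% beyond xlim and ylim, so rectangles ending exactly at 0 and 6 would stop short of the edges
rect(xleft = c(-1, 1.5), ybottom = c(2.5, -1), xright = c(7, 2.3), ytop = c(3.5, 7), col = c("white", "white"), border = c("white", "white"))
4.Run the R script to see what happens first and then explain how the effect is achieved by the script.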
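# Set n to 60; t is a sequence of n equally spaced values from 0 to 2*pi; x is sin(t) and y is cos(t)
n <- 60
t <- seq(0, 2*pi, length.out=n)
x <- sin(t)
y <- cos(t)
# Keep the plotting region square
par(pty = "s")
# for is the loop command: i takes the values 1 to n in turn, and the code inside the braces is repeated each time
# Each pass starts a new plot, redraws the whole curve in gray, and marks the i-th point with a shade of gray that lightens as i grows; Sys.sleep(x) pauses for x seconds, which produces the animation
for (i in 1:n) {
plot.new()
plot.window(c(-1, 1), c(-1, 1))
lines(x*y, -y, col="gray")
points(x[i]*y[i], -y[i], pch=16,
col=gray((i-1)/(n+1)))
Sys.sleep(.05)}
5.Draw a pie chart to represent 50 shades of gray. Hint: Use ‘?gray’ to examine the gray level specification documented for the gray{grDevices}. Use ‘?pie’ to study the function for making pie charts documented for pie{graphics}.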
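One possible approach to the pie chart, offered as a minimal sketch rather than the intended solution: gray() accepts levels between 0 (black) and 1 (white), so 50 evenly spaced levels give 50 shades, and pie() can draw them as equal slices. The names n_shades and shades are chosen here for illustration.

```r
n_shades <- 50

# gray() takes levels between 0 (black) and 1 (white)
shades <- gray(seq(0, 1, length.out = n_shades))

# Fifty equal slices, one per shade, without labels or borders
pie(rep(1, n_shades), col = shades, labels = NA, border = NA)
```

Dropping the labels and borders keeps the thin slices from being obscured by text and outlines.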
Exercises:
1.This R script illustrates how to split the plot region to include histograms on the margins of a scatter diagram using the Galton{HistData} data set. Compile it as a html document with comments on each code chunk.
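Before reading the script, it may help to look at the layout matrix on its own. The sketch below only builds and prints the matrix that the script passes to layout(); it draws nothing.

```r
# Each number is the order in which that cell of the device is drawn;
# 0 marks a cell that is left empty
zones <- matrix(c(2, 0, 1, 3), ncol = 2, byrow = TRUE)
print(zones)
#      [,1] [,2]
# [1,]    2    0
# [2,]    1    3
```

Read this way, the first plot (the scatter diagram) goes in the bottom-left cell, the second (the histogram of parent heights) sits above it, the third (the histogram of child heights) goes to its right, and the top-right corner stays blank.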
# Galton's data on the heights of parents and their children
# Install the HistData package first
# (I removed the install command because the document would not knit with it)
# Assign HistData::Galton to dta
dta <- HistData::Galton
# matrix() with ncol=2 and byrow=TRUE arranges the vector c(2, 0, 1, 3) row by row into a 2x2 matrix
zones <- matrix(c(2, 0, 1, 3), ncol=2, byrow=TRUE)
## layout() splits the device according to the matrix: each number gives the order in which that cell is drawn, and 0 leaves the cell empty
layout(zones, widths=c(4/5, 1/5), heights = c(1/5, 4/5))
## Compute the histograms of parent and child heights without drawing them
xh <- with(dta, hist(parent, plot=FALSE))
yh <- with(dta, hist(child, plot=FALSE))
## max() gives the largest bin count across both histograms, used as a common limit for the bar plots
ub <- max(c(xh$counts, yh$counts))
par(mar=c(3, 3, 1, 1))
## sunflowerplot() draws one point per (x, y) pair; a pair that occurs more than once gets one petal per observation instead of being overplotted
with(dta, sunflowerplot(parent, child))
par(mar=c(0, 3, 1, 1))
## barplot() draws a bar chart; horiz=TRUE switches from vertical to horizontal bars
barplot(xh$counts, axes=FALSE, ylim=c(0, ub), space=0)
par(mar=c(3, 0, 1, 1))
barplot(yh$counts, axes=FALSE, xlim=c(0, ub), space=0, horiz=TRUE)
par(oma=c(3, 3, 0, 0))
## mtext() adds text to one of the four margins of the current plot
mtext("Average height of parents (in inch)", side=1, line=2,
outer=TRUE, adj=0,
at=.4 * (mean(dta$parent) - min(dta$parent))/(diff(range(dta$parent))))
mtext("Height of child (in inch)", side=2, line=2,
outer=TRUE, adj=0,
at=.4 * (mean(dta$child) - min(dta$child))/(diff(range(dta$child))))
2.Deaths per 100,000 from male suicides for 5 age groups and 15 countries are given in the table below. The data set is available as suicides2{HSAUR3}. Construct side-by-side box plots for the data from different age groups and comment briefly.
# Load the data
library(HSAUR3)
pacman::p_load(HSAUR3)
data("suicides2", package="HSAUR3")
dta <- HSAUR3::suicides2
# Take a look at the data
head(dta)
A25.34 A35.44 A45.54 A55.64 A65.74
Canada 22 27 31 34 24
Israel 9 19 10 14 27
Japan 22 19 21 31 49
Austria 29 40 52 53 69
France 16 25 36 47 56
Germany 28 35 41 49 52
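# Rename the columns to age ranges (names shown before and after)
names(dta)
[1] "A25.34" "A35.44" "A45.54" "A55.64" "A65.74"
names(dta) <- c("25-34", "35-44", "45-54", "55-64", "65-74")
names(dta)
[1] "25-34" "35-44" "45-54" "55-64" "65-74"
# Reset the row names so that the rows are numbered 1 to n
rownames(dta) <- NULL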
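# stack() combines the separate columns into a single values column, with ind recording which column each value came from
dta1<-stack(dta)
# Check the result
head(dta1)
values ind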
1 22 25-34
2 9 25-34
3 22 25-34
4 29 25-34
5 16 25-34
6 28 25-34
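To see what stack() does in isolation, here is a minimal sketch on a made-up two-column data frame (toy, with columns a and b, is invented for illustration and is not part of the suicides data).

```r
# A made-up wide data frame with two columns
toy <- data.frame(a = c(1, 2), b = c(3, 4))

# stack() puts all values in one column and records the source column in ind
long <- stack(toy)
print(long)
#   values ind
# 1      1   a
# 2      2   a
# 3      3   b
# 4      4   b
```

The long format is what the formula interface of boxplot() expects: one numeric column and one grouping factor.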
# Draw the side-by-side box plots
boxplot( values ~ ind ,
data=dta1,
horizontal=TRUE,
varwidth=TRUE,
cex.axis=.6,
xlab='Male suicides (per 100,000)',
ylab="Age")
3.The R script illustrates how to implement ‘small multiples’ in base graphics given the 4 different diets of the ChickWeight{datasets} example. Adapt the script to produce a plot of 5 panels in which each panel shows a histogram of IQ for each of 5 classes with over 30 pupils in the nlschools{MASS} dataset.
'data.frame': 2287 obs. of 6 variables:
$ lang : int 46 45 33 46 20 30 30 57 36 36 ...
$ IQ : num 15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
$ class: Factor w/ 133 levels "180","280","1082",..: 1 1 1 1 1 1 1 1 1 1 ...
$ GS : int 29 29 29 29 29 29 29 29 29 29 ...
$ SES : int 23 10 15 23 10 10 23 10 13 15 ...
$ COMB : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
lang IQ class GS SES COMB
1 46 15.0 180 29 23 0
2 45 14.5 180 29 10 0
3 33 9.5 180 29 15 0
4 46 11.0 180 29 23 0
5 20 8.0 180 29 10 0
6 30 9.5 180 29 10 0
lang IQ class GS SES COMB
38 23 9.5 1280 31 27 1
39 43 10.5 1280 31 33 1
40 25 8.5 1280 31 33 1
41 24 7.5 1280 31 28 1
42 41 11.0 1280 31 37 1
43 32 10.5 1280 31 30 1
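One way the small-multiples idea could be applied to nlschools is sketched below. It is not the ChickWeight script the exercise refers to (that script is not reproduced here), and the names class_size, big, and sub are chosen for illustration. The number of panels is taken from the data rather than hard-coded.

```r
library(MASS)  # provides the nlschools data

# Count pupils per class and keep the classes with more than 30
class_size <- table(nlschools$class)
big <- names(class_size)[class_size > 30]
sub <- droplevels(subset(nlschools, class %in% big))

# One panel per class, with common breaks and x range so panels are comparable
op <- par(mfrow = c(1, length(big)), mar = c(4, 3, 2, 1))
for (cl in levels(sub$class)) {
  hist(sub$IQ[sub$class == cl],
       breaks = seq(floor(min(sub$IQ)), ceiling(max(sub$IQ)), by = 1),
       xlim = range(sub$IQ),
       main = paste("Class", cl), xlab = "IQ")
}
par(op)  # restore the previous graphics settings
```

Sharing the breaks and the x range across panels is the design choice that makes the histograms directly comparable.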
4.Use the dataset to replicate the plot below
library(tidyverse)
dta <- read.table("C:/Users/boss/Desktop/data_management/sat_gpa.txt", header=TRUE, stringsAsFactors=FALSE, fill=TRUE)
str(dta)
'data.frame': 6 obs. of 5 variables:
$ College: chr "Barnard" "Northwestern" "Bowdoin" "Colby" ...
$ SAT_No : int 1210 1243 1200 1220 1237 1233
$ GPA_No : num 3.08 3.1 2.85 2.9 2.7 2.62
$ SAT_Yes: int 1317 1333 1312 1280 1308 1287
$ GPA_Yes: num 3.3 3.24 3.12 3.04 2.94 2.8
College SAT_No GPA_No SAT_Yes GPA_Yes
1 Barnard 1210 3.08 1317 3.30
2 Northwestern 1243 3.10 1333 3.24
3 Bowdoin 1200 2.85 1312 3.12
4 Colby 1220 2.90 1280 3.04
5 Carnegie Mellon 1237 2.70 1308 2.94
6 Georgia Tech 1233 2.62 1287 2.80
## Use all six colleges and store College as a factor
dta_ml <- dta %>%
dplyr::mutate(College=factor(College))
## Plot first-year GPA against SAT for the students who did not submit SAT scores
with(dta_ml, plot(SAT_No, GPA_No, axes=FALSE,
xlim=range(SAT_No, SAT_Yes)+c(-30, 30),
ylim=c(2.6, 3.4),
xlab="SAT(V+M)",
ylab="First Year GPA",
panel.first=abline(h=seq(2.6, 3.4, .2),
col="grey80",
lty=3)))
## Add the students who did submit SAT scores as filled points
with(dta_ml, points(SAT_Yes, GPA_Yes, pch=16))
axis(1)
axis(2)
## Join the two groups within each college
with(dta_ml,
segments(SAT_No, GPA_No,
SAT_Yes, GPA_Yes,
lty=3, lwd=2,
col="darkgray"))
## Label each college above its SAT_Yes point
with(dta_ml, text(SAT_Yes, GPA_Yes, labels=as.character(College), pos=3, cex=.8))
5.Use the free recall data to replicate the figure
library(tidyverse)
library(readr)
dta <- read_csv("C:/Users/boss/Desktop/data_management/nobel_winners.txt")
str(dta)
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 1 variable:
$ Name Gender Year: chr "Patrick Modiano\" Male 2014 \n\"Bertrand Russell\" Male 1950\n\"Kazuo Ishiguro\" Male 2017\n\"Bob "| __truncated__
- attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 14 obs. of 5 variables:
..$ row : int 1 1 1 1 1 1 1 1 1 1 ...
..$ col : chr "Name Gender Year" "Name Gender Year" "Name Gender Year" "Name Gender Year" ...
..$ expected: chr "delimiter or quote" "delimiter or quote" "delimiter or quote" "delimiter or quote" ...
..$ actual : chr " " "B" " " "K" ...
..$ file : chr "'C:/Users/boss/Desktop/data_management/nobel_winners.txt'" "'C:/Users/boss/Desktop/data_management/nobel_winners.txt'" "'C:/Users/boss/Desktop/data_management/nobel_winners.txt'" "'C:/Users/boss/Desktop/data_management/nobel_winners.txt'" ...
- attr(*, "spec")=
.. cols(
.. `Name Gender Year` = col_character()
.. )
# A tibble: 1 x 1
`Name Gender Year`
<chr>
1 "Patrick Modiano\" Male 2014 \n\"Bertrand Russell\" Male 1950\n\"~
[1] "Name Gender Year"
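The output above shows the whole file landing in a single column, which suggests that read_csv() looked for commas in a file that is whitespace-delimited with quoted names. A minimal sketch of an alternative, assuming that layout: base read.table() splits on whitespace and respects quotes by default. The inline text below copies only the header and the first three rows visible in the output; the real file path is left untouched.

```r
# Inline copy of the header and first rows shown in the output above
txt <- 'Name Gender Year
"Patrick Modiano" Male 2014
"Bertrand Russell" Male 1950
"Kazuo Ishiguro" Male 2017'

# read.table() splits on whitespace and keeps quoted strings together
winners <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(winners)
```

If the file does follow this layout, replacing the text argument with the file path should yield three proper columns instead of one.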
Trellis graphics
In-class exercises:
1.Render the R script for replicating figures in Chapter 4 of Lattice: Multivariate Data Visualization with R (Sarkar, D. 2008) to html document with comments at each code chunk indicated by ‘##’.
Rural Male Rural Female Urban Male Urban Female
50-54 11.7 8.7 15.4 8.4
55-59 18.1 11.7 24.3 13.6
60-64 26.9 20.3 37.0 19.3
65-69 41.0 30.9 54.6 35.1
70-74 66.0 54.3 71.1 50.0
[1] "matrix"
[1] dotplot.array* dotplot.default* dotplot.formula* dotplot.matrix*
[5] dotplot.numeric* dotplot.table*
see '?methods' for accessing help and source code
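# Draw the plot above with the panels stacked in a single column
dotplot(VADeaths, groups=FALSE,
layout=c(1, 4),
aspect=0.7,
origin=0,
type=c("p", "h"),
main="Death Rates in Virginia - 1940",
xlab="Rate (per 1000)")
# Draw all groups in one panel, with the points joined by lines
dotplot(VADeaths, type="o",
auto.key=list(lines=TRUE, space="right"),
main="Death Rates in Virginia - 1940",
xlab="Rate (per 1000)")
# Same as the stacked-panel dot plot above, but drawn as a bar chart
barchart(VADeaths, groups=FALSE,
layout=c(1, 4),
aspect=0.7,
reference=FALSE,
main="Death Rates in Virginia - 1940",
xlab="Rate (per 1000)")
# Load the postdoc data from the latticeExtra package
data(postdoc, package="latticeExtra")
# Draw a stacked bar chart of proportions; the sizes of individual segments are hard to compare across fields
barchart(prop.table(postdoc, margin=1),
xlab="Proportion",
auto.key=list(adj=1))
# Draw one panel per reason for taking a postdoc, so the fields can be compared within each panel
dotplot(prop.table(postdoc, margin=1),
groups=FALSE,
xlab="Proportion",
par.strip.text=list(abbreviate=TRUE, minlength=10))
# Same as the previous plot, but with the panels stacked in one column and the fields reordered within each panel for easier comparison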
dotplot(prop.table(postdoc, margin=1),
groups=FALSE,
index.cond=function(x, y) median(x),
xlab="Proportion",
layout=c(1, 5),
aspect=0.6,
scales=list(y=list(relation="free", rot=0)),
prepanel=function(x, y) {
list(ylim=levels(reorder(y, x)))
},
panel=function(x, y, ...) {
panel.dotplot(x, reorder(y, x), ...)
})
# Load the Chem97 data from the mlmRev package
data(Chem97, package="mlmRev")
# gcsescore.tab is the cross-tabulation of gcsescore by gender in Chem97
gcsescore.tab <- xtabs(~ gcsescore + gender, Chem97)
# Convert the table to a data frame with one row per gcsescore-gender combination and its count in Freq
gcsescore.df <- as.data.frame(gcsescore.tab)
# as.data.frame() turns gcsescore into a factor; convert it back to numeric by way of character
gcsescore.df$gcsescore <- as.numeric(as.character(gcsescore.df$gcsescore))
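The detour through as.character() in the conversion above matters. A minimal sketch with a made-up factor (the values are invented for illustration): applying as.numeric() directly to a factor returns the internal level codes, not the numbers printed as labels.

```r
# A made-up factor whose labels look like numbers
f <- factor(c("7.25", "5.5", "6"))

# Direct conversion returns the level codes (levels sort as "5.5", "6", "7.25")
codes <- as.numeric(f)
print(codes)
# [1] 3 1 2

# Going through character recovers the labelled values
vals <- as.numeric(as.character(f))
print(vals)
# [1] 7.25 5.50 6.00
```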
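# Plot Freq against gcsescore with one panel per gender; type="h" draws a vertical line for each frequency, and layout=c(1, 2) stacks the two panels in one column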
xyplot(Freq ~ gcsescore | gender,
data = gcsescore.df,
type="h",
layout=c(1, 2),
xlab="Average GCSE Score")
# score.tab is the cross-tabulation of score by gender
score.tab <- xtabs(~score + gender, Chem97)
# Convert score.tab to a data frame
score.df <- as.data.frame(score.tab)
# Draw a bar chart of Freq against score, one panel per gender
barchart(Freq ~ score | gender, score.df, origin=0)
2.Create a new student-teacher ratio variable from the enrltot and teachers variables in the data set Caschool{Ecdat} to generate the following plot in which reading scores (readscr) for grade span assignment grspan equals “KK-08” in the data set are split into three levels: lower-third, middle-third, and upper-third:
# Load the data
library(Ecdat)
pacman::p_load(Ecdat)
data("Caschool", package="Ecdat")
dta <- Ecdat::Caschool
# Take a look at the data
head(dta)
distcod county district grspan enrltot teachers
1 75119 Alameda Sunol Glen Unified KK-08 195 10.90
2 61499 Butte Manzanita Elementary KK-08 240 11.15
3 61549 Butte Thermalito Union Elementary KK-08 1550 82.90
4 61457 Butte Golden Feather Union Elementary KK-08 243 14.00
calwpct mealpct computer testscr compstu expnstu str avginc elpct
1 0.5102 2.0408 67 690.8 0.3435898 6384.911 17.88991 22.690 0.000000
2 15.4167 47.9167 101 661.2 0.4208333 5099.381 21.52466 9.824 4.583333
3 55.0323 76.3226 169 643.6 0.1090323 5501.955 18.69723 8.978 30.000002
4 36.4754 77.0492 85 647.7 0.3497942 7101.831 17.35714 8.978 0.000000
readscr mathscr
1 691.6 690.0
2 660.5 661.9
3 636.3 650.9
4 651.9 643.5
[ reached 'max' / getOption("max.print") -- omitted 2 rows ]
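# Load packages
library(lattice)
library(tidyverse)
library(magrittr)
# Keep the KK-08 districts, compute the student-teacher ratio, and split the reading scores into thirds
dta<- dta%>%
filter(grspan == "KK-08") %>%
mutate(ratio = enrltot/teachers,
Reading = cut(readscr, breaks = quantile(readscr, probs = c(0, 1/3, 2/3, 1)),
labels = c("L", "M", "H"), include.lowest = TRUE, ordered_result = TRUE))
# Check that the split worked
head(dta)
distcod county district grspan enrltot teachers calwpct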
1 75119 Alameda Sunol Glen Unified KK-08 195 10.90 0.5102
2 61499 Butte Manzanita Elementary KK-08 240 11.15 15.4167
3 61549 Butte Thermalito Union Elementary KK-08 1550 82.90 55.0323
mealpct computer testscr compstu expnstu str avginc elpct readscr
1 2.0408 67 690.8 0.3435898 6384.911 17.88991 22.690 0.000000 691.6
2 47.9167 101 661.2 0.4208333 5099.381 21.52466 9.824 4.583333 660.5
3 76.3226 169 643.6 0.1090323 5501.955 18.69723 8.978 30.000002 636.3
mathscr ratio Reading
1 690.0 17.88991 H
2 661.9 21.52466 M
3 650.9 18.69723 L
[ reached 'max' / getOption("max.print") -- omitted 3 rows ]
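One detail of splitting a variable at its own quantiles is worth checking: cut() uses intervals that are open on the left by default, so the smallest value falls outside the first interval unless include.lowest = TRUE is set. A minimal sketch on a made-up vector (x, brks, open_cut, and closed_cut are names chosen for illustration):

```r
# A made-up vector standing in for the reading scores
x <- 1:9
brks <- quantile(x, probs = c(0, 1/3, 2/3, 1))

# Default: left-open intervals, so the minimum is not assigned to any third
open_cut <- cut(x, breaks = brks, labels = c("L", "M", "H"))

# include.lowest = TRUE closes the first interval on the left
closed_cut <- cut(x, breaks = brks, labels = c("L", "M", "H"),
                  include.lowest = TRUE, ordered_result = TRUE)

print(table(open_cut, useNA = "ifany"))
print(table(closed_cut, useNA = "ifany"))
```

With the first call the minimum comes back as NA; with the second, each third holds three values.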
# Draw the plot
xyplot(readscr~ratio|Reading,data=dta,
type=c("p","g","r"),layout=c(3,1),
xlab="Student-Teacher Ratio",
ylab="Reading Score")
3.The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.
# Load packages
library(lattice)
library(tidyverse)
library(magrittr)
# Read the data file
dta<- read.table("C:/Users/boss/Desktop/data_management/beautyCourseEval.txt", header = TRUE)
# Take a look at the data
head(dta)
eval beauty sex age minority tenure courseID
1 4.3 0.2015666 1 36 1 0 3
2 4.5 -0.8260813 0 59 0 1 0
3 3.7 -0.6603327 0 51 0 1 4
4 4.3 -0.7663125 1 40 0 1 2
5 4.4 1.4214450 1 31 0 0 0
6 4.2 0.5002196 0 62 0 1 0
# Scatter plots of eval against beauty with fitted regression lines, one panel per course
lattice::xyplot(eval ~ beauty | factor(courseID), type = c("p", "g", "r"), data = dta, auto.key = list(columns = 2), xlab = "beauty", ylab = "eval")
# Scatter plot matrix of the four variables, grouped by course
splom(~ dta[,c("eval", "minority", "tenure", "beauty")] ,groups = courseID,
data=dta,
pch='.',
axis.text.cex=0.3,
par.settings=standard.theme(color=FALSE))
# Load packages
library(lattice)
library(tidyverse)
library(magrittr)
# Read the data file
dta<- read.table("C:/Users/boss/Desktop/data_management/brainsize.txt", header = TRUE)
# Take a look at the data
head(dta)
Sbj Gender FSIQ VIQ PIQ Weight Height MRICount
1 1 Female 133 132 124 118 64.5 816932
2 2 Male 140 150 124 NA 72.5 1001121
3 3 Male 139 123 150 143 73.3 1038437
4 4 Male 133 129 128 172 68.8 965353
5 5 Female 137 132 134 147 65.0 951545
6 6 Female 99 90 110 146 69.0 928799
# Draw a scatter plot of FSIQ against PIQ, one panel per gender
stripplot(FSIQ ~ PIQ | Gender,
data=dta,
pch=1,
cex=.5,
alpha=.5,
type=c('g','p'),
jitter.data=TRUE,
xlab="PIQ",
ylab='FSIQ',
auto.key=list(space="top",
columns=4),
par.settings=standard.theme(color=FALSE))
Welch Two Sample t-test
data: dta$FSIQ by dta$Gender
t = -0.40267, df = 37.892, p-value = 0.6895
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-18.68639 12.48639
sample estimates:
mean in group Female mean in group Male
111.9 115.0
Welch Two Sample t-test
data: dta$VIQ by dta$Gender
t = -0.77262, df = 36.973, p-value = 0.4447
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-21.010922 9.410922
sample estimates:
mean in group Female mean in group Male
109.45 115.25
Welch Two Sample t-test
data: dta$PIQ by dta$Gender
t = -0.1598, df = 37.815, p-value = 0.8739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-15.72079 13.42079
sample estimates:
mean in group Female mean in group Male
110.45 111.60
# Draw normal Q-Q plots, one panel per gender
qqmath(~ Height | Gender,
aspect="xy",
data=dta,
type=c('p','g'),
prepanel=prepanel.qqmathline,
panel=function(x, ...) {
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
},
par.settings=standard.theme(color=FALSE))
Call:
lm(formula = dta$Weight ~ dta$Gender, data = dta, na.action = na.omit)
Residuals:
Min 1Q Median 3Q Max
-34.444 -15.383 3.678 13.306 37.800
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 137.200 4.132 33.203 < 2e-16 ***
dta$GenderMale 29.244 6.004 4.871 2.23e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.48 on 36 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.3972, Adjusted R-squared: 0.3805
F-statistic: 23.73 on 1 and 36 DF, p-value: 2.227e-05
Call:
lm(formula = dta$Height ~ dta$Gender, data = dta, na.action = na.omit)
Residuals:
Min 1Q Median 3Q Max
-5.132 -2.432 0.235 2.152 5.568
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 65.7650 0.6298 104.42 < 2e-16 ***
dta$GenderMale 5.6666 0.9023 6.28 2.62e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.816 on 37 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.516, Adjusted R-squared: 0.5029
F-statistic: 39.44 on 1 and 37 DF, p-value: 2.624e-07
# Use xyplot to look at how gender, brain size, and IQ are related
xyplot(FSIQ ~ MRICount | Gender,
data=dta,
type="smooth",
panel=function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.grid(h=-1,
v=-1,
col="gray80",
lty=3, ...)
panel.average(x, y, fun=mean,
horizontal=FALSE,
col='gray', ...)},
par.settings=standard.theme(color=FALSE))
## Gender has a significant effect on brain size (MRICount), but not on IQ
summary(lm(dta$MRICount ~ dta$Gender, data=dta, na.action = na.omit))
Call:
lm(formula = dta$MRICount ~ dta$Gender, data = dta, na.action = na.omit)
Residuals:
Min 1Q Median 3Q Max
-74868 -34593 -7290 20014 128650
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 862655 12500 69.011 < 2e-16 ***
dta$GenderMale 92201 17678 5.216 6.76e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 55900 on 38 degrees of freedom
Multiple R-squared: 0.4172, Adjusted R-squared: 0.4019
F-statistic: 27.2 on 1 and 38 DF, p-value: 6.758e-06
Call:
lm(formula = dta$FSIQ ~ dta$Gender, data = dta, na.action = na.omit)
Residuals:
Min 1Q Median 3Q Max
-35.00 -24.18 3.55 23.32 29.00
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 111.900 5.444 20.556 <2e-16 ***
dta$GenderMale 3.100 7.699 0.403 0.689
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 24.34 on 38 degrees of freedom
Multiple R-squared: 0.004249, Adjusted R-squared: -0.02196
F-statistic: 0.1621 on 1 and 38 DF, p-value: 0.6894
Exercises:
1.Use trellis graphics to explore various ways to display the sample data from the National Longitudinal Survey of Youth.
#開工具
library(lattice)
library(tidyverse)
library(magrittr)
library(readr)
dta <- read_csv("C:/Users/boss/Desktop/data_management/nlsy86long.csv")
str(dta)
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 664 obs. of 9 variables:
$ id : num 2390 2560 3740 4020 6350 7030 7200 7610 7680 7700 ...
$ sex : chr "Female" "Female" "Female" "Male" ...
$ race : chr "Majority" "Majority" "Majority" "Majority" ...
$ time : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade: num 0 0 0 0 1 0 0 0 0 0 ...
$ year : num 6 6 6 5 7 5 6 7 6 6 ...
$ month: num 67 66 67 60 78 62 66 79 76 67 ...
$ math : num 14.29 20.24 17.86 7.14 29.76 ...
$ read : num 19.05 21.43 21.43 7.14 30.95 ...
- attr(*, "spec")=
.. cols(
.. id = col_double(),
.. sex = col_character(),
.. race = col_character(),
.. time = col_double(),
.. grade = col_double(),
.. year = col_double(),
.. month = col_double(),
.. math = col_double(),
.. read = col_double()
.. )
2.Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old. Explore various ways to display the data using trellis graphics.
Column 1: Weight in grams
Column 2: Height in mms
Column 3: Head to butt length in mms
Column 4: Head circumference in mms
Column 5: Chest circumference in mms
Column 6: Arm length in mms
Column 7: Calf length in mms
Column 8: Pelvis circumference in mms
Column 9: Age in years
Column 10: Girl ID
3.Your manager gave you sales data on several products in SAS format. Your task is to summarize and report the data in tables and graphs using the R lattice package.
Recode the region variable (1 to 4) by “Northern”, “Southern”, “Eastern” and “Western”; the district variable (1 - 5) by “North East”, “South East”, “South West”, “North West”, “Central West”; the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”; and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.
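The recoding step can be sketched with factor() and pmax(). The data frame below is made up for illustration, and its column names (region, district, quarter, month, sales) are assumptions; the actual SAS file may name them differently.

```r
# A made-up stand-in for the sales data (column names are assumptions)
sales <- data.frame(region   = c(1, 4, 2, 3),
                    district = c(1, 5, 3, 2),
                    quarter  = c(1, 2, 3, 4),
                    month    = c(1, 5, 9, 12),
                    sales    = c(120, -15, 80, 0))

# Map the integer codes to labels, in the order given in the exercise
sales$region   <- factor(sales$region, levels = 1:4,
                         labels = c("Northern", "Southern", "Eastern", "Western"))
sales$district <- factor(sales$district, levels = 1:5,
                         labels = c("North East", "South East", "South West",
                                    "North West", "Central West"))
sales$quarter  <- factor(sales$quarter, levels = 1:4,
                         labels = c("1st", "2nd", "3rd", "4th"))
sales$month    <- factor(sales$month, levels = 1:12, labels = month.abb)

# Set negative sales values to zero
sales$sales <- pmax(sales$sales, 0)

print(sales)
```

Giving levels explicitly keeps the labels in the intended order even when some codes are absent from the data, which in turn controls panel order in lattice plots.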
4.Use the Lattice package to graphically explore the age and gender effects on reaction time reported in the Bassin data example.