Base graphics & Trellis graphics Exercise

Base graphics

In-class exercises:

1.Run the R script to see what happens. Change the ‘df’ parameter to a slightly larger integer and do it again. What statistical concept does this script illustrate?

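## x is a sequence from -2*pi to 2*pi in steps of 0.05
x <- seq(-pi*2, 2*pi, .05)
## z is the standard normal density evaluated at x
z <- dnorm(x)
## y is the density of the t distribution with 3 degrees of freedom, evaluated at x
y <- dt(x, df=3)
## Plot the normal curve as a line; label the x-axis "Standard unit" and the y-axis "Density"; bty sets the type of box drawn around the plot
plot(x, z, type="l", bty="L", xlab="Standard unit", ylab="Density")
## Shade the region enclosed between the t curve and the normal curve in aliceblue
polygon(c(x, rev(x)), c(y, rev(z)), col='aliceblue')
## Add the t density as a cadetblue line
lines(x, y, col='cadetblue')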

## Raise the degrees of freedom to 9: the t distribution moves closer to the normal curve, so the shaded area shrinks
x <- seq(-pi*2, 2*pi, .05)
z <- dnorm(x)
y <- dt(x, df=9)
plot(x, z, type="l", bty="L", xlab="Standard unit", ylab="Density")
polygon(c(x, rev(x)), c(y, rev(z)), col='aliceblue')
lines(x, y, col='cadetblue')
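
The shrinking shaded area can also be checked numerically. The sketch below is not part of the original script. It computes the largest vertical gap between the t density and the standard normal density on the same grid, for a few illustrative values of df (3, 9, 30 and 100 are arbitrary choices).

```r
# Sketch: how far is the t density from the normal density as df grows?
x <- seq(-2 * pi, 2 * pi, by = 0.05)
dfs <- c(3, 9, 30, 100)          # illustrative degrees of freedom
max_gap <- sapply(dfs, function(df) max(abs(dt(x, df = df) - dnorm(x))))
names(max_gap) <- paste0("df=", dfs)
print(round(max_gap, 4))         # the gap gets smaller as df increases
```

The gap decreases with each larger df, which is the convergence of the t distribution to the standard normal that the plots illustrate.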

2.Doll (1955) showed per capita consumption of cigarettes in 11 countries in 1930, and the death rates from lung cancer for men in 1950. Use R base graphics to generate the plot shown below. Source: Freedman, et al. (1997). Statistics. pp. 148-150.

Column 1: Country names
Column 2: Cigarette consumption (per capita)
Column 3: Death rate (per million)

## Read the data file
dta <- read.table("C:/Users/boss/Desktop/data_management/cigarettes.txt", header=TRUE, stringsAsFactors=FALSE, fill=TRUE)
## Look at the first rows
head(dta)

    Country consumption death
1 Australia         480   180
2    Canada         500   150
3   Denmark         380   170
4   Finland        1100   350
5        UK        1100   460
6   Iceland         230    60

## Check the structure of the data
str(dta)

'data.frame':   11 obs. of  3 variables:
 $ Country    : chr  "Australia" "Canada" "Denmark" "Finland" ...
 $ consumption: int  480 500 380 1100 1100 230 490 250 300 510 ...
 $ death      : int  180 150 170 350 460 60 240 90 110 250 ...

## Set up an empty plot of death rate against consumption; the country names are added below as point labels
with(dta, plot(consumption, death, 
                xlab="Consumption (per capita)", 
                ylab="Death rate (per million)", 
                type="n",
                main="Lung Cancer and Cigarette Consumption"))
## Add the regression line of death on consumption
m0 <- lm(death ~ consumption, data=dta)
abline(m0, lty=2)
## Add the country names, setting the text size and position
text(dta$consumption, dta$death, dta$Country, 
     cex = 1,
     adj = -0.001)
## Add grid lines
grid()

3.Use R base graphics to create the national flag of Denmark:

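## Set all margins to 0 and the background to red
par(mar = c(0, 0, 0, 0), 
          bg="red")
## Set up an empty plot with both axes running from 0 to 6
plot(0:6, 0:6, xlim = c(0,6), ylim = c(0,6), type = "n")
## Draw two white rectangles for the cross; by default R pads each axis range by 4% on both sides, so the rectangles have to extend beyond 0 and 6 to reach the edges
rect(xleft = c(-1, 1.5), ybottom = c(2.5, -1), xright = c(7, 2.3), ytop = c(3.5, 7), col = c("white", "white"), border = c("white", "white"))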
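
The rectangles only need to overshoot the 0 to 6 range because of the default axis padding. As a sketch of an alternative (not the original solution), the padding can be switched off with xaxs = "i" and yaxs = "i", so that the plot region runs exactly from 0 to 6 and the cross can use coordinates inside that range. The bar positions are the same ones used above.

```r
# Sketch: same flag, with axis padding switched off
par(mar = c(0, 0, 0, 0), bg = "red")
plot.new()
plot.window(xlim = c(0, 6), ylim = c(0, 6), xaxs = "i", yaxs = "i")
usr <- par("usr")   # limits of the plot region in user coordinates
print(usr)
# Horizontal and vertical bars of the cross, drawn edge to edge
rect(xleft = c(0, 1.5), ybottom = c(2.5, 0), xright = c(6, 2.3), ytop = c(3.5, 6),
     col = "white", border = "white")
```

With the padding removed, par("usr") reports the limits as exactly 0 and 6 on both axes.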

4.Run the R script to see what happens first and then explain how the effect is achieved by the script.

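# Set n to 60; t is a sequence of n equally spaced values from 0 to 2*pi; x is sin(t) and y is cos(t)
n <- 60
t <- seq(0, 2*pi, length.out=n)
x <- sin(t)
y <- cos(t)
# Keep the plotting region square
par(pty = "s")
# The for loop assigns i the values 1 to n in turn and repeats the code inside the braces
# Each pass starts a new frame, redraws the whole curve in gray, and marks the i-th point in a shade of gray that lightens as i grows; Sys.sleep(.05) then pauses for 0.05 seconds
for (i in 1:n) {
  plot.new()
  plot.window(c(-1, 1), c(-1, 1))
  lines(x*y, -y, col="gray")
  points(x[i]*y[i], -y[i], pch=16,
         col=gray((i-1)/(n+1)))
  Sys.sleep(.05)
}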

###

5.Draw a pie chart to represent 50 shades of gray. Hint: Use ‘?gray’ to examine the gray level specification documented for the gray{grDevices}. Use ‘?pie’ to study the function for making pie charts documented for pie{graphics}.

pie(rep(1, 50), col = gray(seq(0, 1, length.out = 50)), radius = 1)
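
pie() recycles the col vector when it is shorter than the number of slices, so 50 different shades need a colour vector of length 50. The following check is a small sketch added for illustration. It counts the distinct colours in a 50-level gray scale and in a 9-level scale recycled to 50 slices.

```r
# Sketch: count distinct gray levels in two candidate colour vectors
shades <- gray(seq(0, 1, length.out = 50))              # 50 evenly spaced levels
n_distinct <- length(unique(shades))
n_short <- length(unique(rep_len(gray(0:8 / 8), 50)))   # 9 levels recycled to 50 slices
print(c(n_distinct, n_short))
```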

Exercises:

1.This R script illustrates how to split the plot region to include histograms on the margins of a scatter diagram using the Galton{HistData} data set. Compile it as an HTML document with comments on each code chunk.

# Galton's data on the heights of parents and their children
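# Install the HistData package first
# (the install command was removed here because the document would not knit with it)
# Assign HistData::Galton to dta
dta <- HistData::Galton
# matrix() with ncol = 2 and byrow = TRUE turns the vector c(2, 0, 1, 3) into a 2x2 matrix, filled row by row
zones <- matrix(c(2, 0, 1, 3), ncol=2, byrow=TRUE)
## layout() divides the device according to the matrix; each number gives the order in which a plot is drawn, and 0 marks a cell that is left empty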
layout(zones, widths=c(4/5, 1/5), heights = c(1/5, 4/5))

xh <- with(dta, hist(parent, plot=FALSE))

yh <- with(dta, hist(child, plot=FALSE))
## max() returns the largest count across both histograms; it is used as a common upper limit for the bars
ub <- max(c(xh$counts, yh$counts))

par(mar=c(3, 3, 1, 1))
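## sunflowerplot() draws one symbol per (x, y) pair; a pair that occurs several times is shown with one petal per observation instead of overplotted points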
with(dta, sunflowerplot(parent, child))

par(mar=c(0, 3, 1, 1))
## barplot() draws a bar chart; the bars can be vertical (the default) or horizontal (horiz=TRUE)
barplot(xh$counts, axes=FALSE, ylim=c(0, ub), space=0)

par(mar=c(3, 0, 1, 1))

barplot(yh$counts, axes=FALSE, xlim=c(0, ub), space=0, horiz=TRUE)

par(oma=c(3, 3, 0, 0))
## mtext() writes text in one of the four margins of the current plot
mtext("Average height of parents (in inch)", side=1, line=2, 
      outer=TRUE, adj=0, 
      at=.4 * (mean(dta$parent) - min(dta$parent))/(diff(range(dta$parent))))
mtext("Height of child (in inch)", side=2, line=2, 
      outer=TRUE, adj=0,
      at=.4 * (mean(dta$child) - min(dta$child))/(diff(range(dta$child))))
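
How the three plots are placed follows directly from the zones matrix passed to layout(). As a brief sketch, printing the matrix shows the arrangement: plot 1 (the scatter diagram) goes bottom left, plot 2 (the parent histogram) top left, plot 3 (the child histogram) bottom right, and the top right cell stays empty.

```r
# Sketch: inspect the layout matrix used above
zones <- matrix(c(2, 0, 1, 3), ncol = 2, byrow = TRUE)
print(zones)
```

Calling layout.show(3) right after layout() draws the outlines of the three regions on the device, which is a convenient way to check the widths and heights.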

2.Deaths per 100,000 from male suicides for 5 age groups and 15 countries are given in the table below. The data set is available as suicides2{HSAUR3}. Construct side-by-side box plots for the data from different age groups and comment briefly.

# Load the data
library(HSAUR3)
data("suicides2", package="HSAUR3")
dta <- HSAUR3::suicides2
# Look at the first rows
head(dta)

        A25.34 A35.44 A45.54 A55.64 A65.74
Canada      22     27     31     34     24
Israel       9     19     10     14     27
Japan       22     19     21     31     49
Austria     29     40     52     53     69
France      16     25     36     47     56
Germany     28     35     41     49     52

names(dta)

[1] "A25.34" "A35.44" "A45.54" "A55.64" "A65.74"

# Rename the columns
names(dta) <- c("25-34", "35-44", "45-54", "55-64","65-74")
names(dta)

[1] "25-34" "35-44" "45-54" "55-64" "65-74"

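# stack() combines the separate columns into a single vector (values) and records the source column in a factor (ind)
# The country row names are dropped, so the stacked rows are numbered 1 to n
dta1<-stack(dta)
# Check the result
head(dta1)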

  values   ind
1     22 25-34
2      9 25-34
3     22 25-34
4     29 25-34
5     16 25-34
6     28 25-34

# Draw side-by-side box plots, one per age group
boxplot( values ~ ind , 
         data=dta1, 
         horizontal=TRUE, 
         varwidth=TRUE,
         cex.axis=.6,
         xlab="Deaths per 100,000",
         ylab="Age")
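
The reshaping step is what makes the formula values ~ ind possible. As a minimal sketch on a made-up two-column data frame (the names toy, a and b are hypothetical), stack() returns one row per original cell, with the measurements in values and the source column name in ind.

```r
# Sketch: what stack() does to a wide data frame (toy data, not suicides2)
toy <- data.frame(a = c(1, 2), b = c(3, 4))
long <- stack(toy)
print(long)
```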

3.The R script illustrates how to implement ‘small multiples’ in base graphics given the 4 different diets of the ChickWeight{datasets} example. Adapt the script to produce a plot of 5 panels in which each panel shows a histogram of IQ for each of 5 classes with over 30 pupils in the nlschools{MASS} dataset.

# Load the data
library(MASS)
dta <- nlschools
# Check the structure of the data
str(dta)

'data.frame':   2287 obs. of  6 variables:
 $ lang : int  46 45 33 46 20 30 30 57 36 36 ...
 $ IQ   : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
 $ class: Factor w/ 133 levels "180","280","1082",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ GS   : int  29 29 29 29 29 29 29 29 29 29 ...
 $ SES  : int  23 10 15 23 10 10 23 10 13 15 ...
 $ COMB : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

head(dta)

  lang   IQ class GS SES COMB
1   46 15.0   180 29  23    0
2   45 14.5   180 29  10    0
3   33  9.5   180 29  15    0
4   46 11.0   180 29  23    0
5   20  8.0   180 29  10    0
6   30  9.5   180 29  10    0

# Keep the classes with more than 30 pupils (GS is the class size)
dta<- dta[dta$GS>30,]
head(dta)

   lang   IQ class GS SES COMB
38   23  9.5  1280 31  27    1
39   43 10.5  1280 31  33    1
40   25  8.5  1280 31  33    1
41   24  7.5  1280 31  28    1
42   41 11.0  1280 31  37    1
43   32 10.5  1280 31  30    1

# Use lattice to draw one histogram panel per class
require(lattice)
# Draw the histograms of IQ
histogram(x= ~ IQ | class, data=dta, xlab="IQ",layout=c(5,1))
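
The exercise asks for base graphics, and "over 30 pupils" can also be read as more than 30 rows recorded for a class rather than GS above 30. The sketch below takes that second reading, which is an assumption and not necessarily the intended one. It counts the rows per class with table() and draws one hist() panel per retained class. The object names n_per_class, big and sub are made up for this example.

```r
# Sketch: small multiples in base graphics, selecting classes by number of rows
library(MASS)  # provides nlschools

n_per_class <- table(nlschools$class)              # pupils recorded per class
big <- names(n_per_class)[n_per_class > 30]        # classes with more than 30 rows
sub <- droplevels(nlschools[nlschools$class %in% big, ])

if (length(big) > 0) {
  # n2mfrow() picks a rows-by-columns grid for the number of panels
  op <- par(mfrow = n2mfrow(length(big)), mar = c(3, 3, 2, 1))
  for (cl in big) {
    hist(sub$IQ[sub$class == cl], xlim = range(nlschools$IQ),
         main = paste("Class", cl), xlab = "IQ")
  }
  par(op)
}
```

Using the same xlim in every panel keeps the histograms comparable across classes.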

4.Use the dataset to replicate the plot below

library(tidyverse)
dta <- read.table("C:/Users/boss/Desktop/data_management/sat_gpa.txt", header=TRUE, stringsAsFactors=FALSE, fill=TRUE)
str(dta)

'data.frame':   6 obs. of  5 variables:
 $ College: chr  "Barnard" "Northwestern" "Bowdoin" "Colby" ...
 $ SAT_No : int  1210 1243 1200 1220 1237 1233
 $ GPA_No : num  3.08 3.1 2.85 2.9 2.7 2.62
 $ SAT_Yes: int  1317 1333 1312 1280 1308 1287
 $ GPA_Yes: num  3.3 3.24 3.12 3.04 2.94 2.8

head(dta)

          College SAT_No GPA_No SAT_Yes GPA_Yes
1         Barnard   1210   3.08    1317    3.30
2    Northwestern   1243   3.10    1333    3.24
3         Bowdoin   1200   2.85    1312    3.12
4           Colby   1220   2.90    1280    3.04
5 Carnegie Mellon   1237   2.70    1308    2.94
6    Georgia Tech   1233   2.62    1287    2.80

## The earlier filter() step compared the numeric SAT columns with strings and returned no rows, so the plot is drawn directly from dta
## Set up an empty plot of first-year GPA against SAT score
with(dta, plot(c(SAT_No, SAT_Yes), c(GPA_No, GPA_Yes), type="n",
               xlab="SAT(V+M)",
               ylab="First Year GPA"))
grid()
## Join the "No" and "Yes" points of each college with a line segment
with(dta, 
     segments(SAT_No, GPA_No, 
              SAT_Yes, GPA_Yes,
              lty=3, lwd=2, 
              col="darkgray"))
## Open circles mark the "No" group, filled circles the "Yes" group
with(dta, points(SAT_No, GPA_No, pch=1))
with(dta, points(SAT_Yes, GPA_Yes, pch=16))
## Label each "Yes" point with the college name
with(dta, text(SAT_Yes, GPA_Yes, labels=College, pos=3, cex=.8))

5.Use the free recall data to replicate the figure

# The file is whitespace-delimited with quoted names, so read_csv() reads each line as a single "Name Gender Year" column; read.table() splits the three fields
dta <- read.table("C:/Users/boss/Desktop/data_management/nobel_winners.txt", header=TRUE, stringsAsFactors=FALSE)
str(dta)

head(dta)

names(dta)

Trellis graphics

In-class exercises:

1.Render the R script for replicating figures in Chapter 4 of Lattice: Multivariate Data Visualization with R (Sarkar, D. 2008) to html document with comments at each code chunk indicated by ‘##’.

#Print the VADeaths data#
VADeaths

      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0

#Check the class of the data#
class(VADeaths)

[1] "matrix"

library("lattice")

#List the available dotplot methods#
methods("dotplot")

[1] dotplot.array*   dotplot.default* dotplot.formula* dotplot.matrix* 
[5] dotplot.numeric* dotplot.table*  
see '?methods' for accessing help and source code

#Dot plot of VADeaths; groups=FALSE gives each column (Rural/Urban by Male/Female) its own panel for comparison#
dotplot(VADeaths, groups=FALSE)

#Same plot with the four panels stacked in one column and a line drawn from the origin to each point#
dotplot(VADeaths, groups=FALSE, 
        layout=c(1, 4), 
        aspect=0.7, 
        origin=0, 
        type=c("p", "h"),
        main="Death Rates in Virginia - 1940", 
        xlab="Rate (per 1000)")

#Overlay the four groups in a single panel, joining the points with lines#
dotplot(VADeaths, type="o",
        auto.key=list(lines=TRUE, space="right"),
        main="Death Rates in Virginia - 1940",
        xlab="Rate (per 1000)")

#Same layout as the second plot, but drawn as a bar chart#
barchart(VADeaths, groups=FALSE,
         layout=c(1, 4), 
         aspect=0.7, 
         reference=FALSE, 
         main="Death Rates in Virginia - 1940",
         xlab="Rate (per 1000)")

#Load the postdoc data from the latticeExtra package#
data(postdoc, package="latticeExtra")

#Stacked bar chart of the row proportions; in this form it is hard to read off the size of each segment for a given field#
barchart(prop.table(postdoc, margin=1), 
         xlab="Proportion",
         auto.key=list(adj=1))

#One panel per column of the table, so the fields can be compared within each panel#
dotplot(prop.table(postdoc, margin=1), 
        groups=FALSE, 
        xlab="Proportion",
        par.strip.text=list(abbreviate=TRUE, minlength=10))

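#Same plot with the panels stacked in one column and ordered by median proportion; the fields are reordered within each panel to ease comparison#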
dotplot(prop.table(postdoc, margin=1), 
        groups=FALSE, 
        index.cond=function(x, y) median(x),
        xlab="Proportion", 
        layout=c(1, 5), 
        aspect=0.6,
        scales=list(y=list(relation="free", rot=0)),
        prepanel=function(x, y) {
            list(ylim=levels(reorder(y, x)))
        },
        panel=function(x, y, ...) {
            panel.dotplot(x, reorder(y, x), ...)
        })
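
The postdoc plots all start from prop.table(postdoc, margin=1), which divides each cell by its row total. A minimal sketch on a made-up 2x2 table (the dimnames Field, Reason, A, B, r1 and r2 are hypothetical) shows that every row of the result sums to 1.

```r
# Sketch: row proportions with prop.table() on toy counts
counts <- matrix(c(1, 3, 2, 2), nrow = 2, byrow = TRUE,
                 dimnames = list(Field = c("A", "B"), Reason = c("r1", "r2")))
p <- prop.table(counts, margin = 1)   # margin = 1 means "within each row"
print(p)
print(rowSums(p))
```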

#Load the Chem97 data from the mlmRev package#
data(Chem97, package="mlmRev")

#gcsescore.tab is the cross-tabulation of gcsescore by gender in Chem97#
gcsescore.tab <- xtabs(~ gcsescore + gender, Chem97)

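#gcsescore.df stores gcsescore.tab as a data frame, a spreadsheet-like structure similar to a matrix except that each column can hold a different type of data; it has one row per gcsescore-gender combination and a Freq column#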
gcsescore.df <- as.data.frame(gcsescore.tab)

#gcsescore became a factor in the conversion; turn it back into numbers by going through as.character()#
gcsescore.df$gcsescore <- as.numeric(as.character(gcsescore.df$gcsescore))

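#Plot Freq against gcsescore with one panel per gender; type="h" draws a vertical line from the axis up to each point, and layout=c(1, 2) stacks the two panels in one column#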
xyplot(Freq ~ gcsescore | gender, 
       data = gcsescore.df, 
       type="h", 
       layout=c(1, 2), 
       xlab="Average GCSE Score")

#score.tab is the cross-tabulation of score by gender#
score.tab <- xtabs(~score + gender, Chem97)

#Convert score.tab to a data frame#
score.df <- as.data.frame(score.tab)

#Bar chart of Freq by score, one panel per gender#
barchart(Freq ~ score | gender, score.df, origin=0)

## The End
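
The detour through as.character() in the gcsescore conversion matters because as.numeric() applied directly to a factor returns the internal level codes, not the labels. A small sketch with made-up score labels shows the difference.

```r
# Sketch: converting a factor of numeric labels back to numbers (toy values)
f <- factor(c("4.5", "6.25", "4.5", "8"))
codes  <- as.numeric(f)                 # internal level codes: 1 2 1 3
values <- as.numeric(as.character(f))   # the labels as numbers: 4.5 6.25 4.5 8
print(codes)
print(values)
```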

2.Create a new student-teacher ratio variable from the enrltot and teachers variables in the data set Caschool{Ecdat} to generate the following plot in which reading scores (readscr) for grade span assignment grspan equals “KK-08” in the data set are split into three levels: lower-third, middle-third, and upper-third:

# Load the data
library(Ecdat)
data("Caschool", package="Ecdat")
dta <- Ecdat::Caschool
# Look at the first rows
head(dta)

  distcod  county                        district grspan enrltot teachers
1   75119 Alameda              Sunol Glen Unified  KK-08     195    10.90
2   61499   Butte            Manzanita Elementary  KK-08     240    11.15
3   61549   Butte     Thermalito Union Elementary  KK-08    1550    82.90
4   61457   Butte Golden Feather Union Elementary  KK-08     243    14.00
  calwpct mealpct computer testscr   compstu  expnstu      str avginc     elpct
1  0.5102  2.0408       67   690.8 0.3435898 6384.911 17.88991 22.690  0.000000
2 15.4167 47.9167      101   661.2 0.4208333 5099.381 21.52466  9.824  4.583333
3 55.0323 76.3226      169   643.6 0.1090323 5501.955 18.69723  8.978 30.000002
4 36.4754 77.0492       85   647.7 0.3497942 7101.831 17.35714  8.978  0.000000
  readscr mathscr
1   691.6   690.0
2   660.5   661.9
3   636.3   650.9
4   651.9   643.5
 [ reached 'max' / getOption("max.print") -- omitted 2 rows ]

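# Load packages
library(lattice)
library(tidyverse)
library(magrittr)
# Compute the student-teacher ratio and split the reading scores into thirds
dta<- dta%>%
  mutate(ratio = enrltot/teachers,
       Reading = cut(readscr, breaks = quantile(readscr, probs = c(0, .33, .67, 1)), 
                                           labels = c("L", "M", "H"), ordered_result = TRUE,
                                           include.lowest = TRUE))
# Check that the new variables were created
head(dta)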

  distcod  county                    district grspan enrltot teachers calwpct
1   75119 Alameda          Sunol Glen Unified  KK-08     195    10.90  0.5102
2   61499   Butte        Manzanita Elementary  KK-08     240    11.15 15.4167
3   61549   Butte Thermalito Union Elementary  KK-08    1550    82.90 55.0323
  mealpct computer testscr   compstu  expnstu      str avginc     elpct readscr
1  2.0408       67   690.8 0.3435898 6384.911 17.88991 22.690  0.000000   691.6
2 47.9167      101   661.2 0.4208333 5099.381 21.52466  9.824  4.583333   660.5
3 76.3226      169   643.6 0.1090323 5501.955 18.69723  8.978 30.000002   636.3
  mathscr    ratio Reading
1   690.0 17.88991       H
2   661.9 21.52466       M
3   650.9 18.69723       L
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

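# Draw the plot for the districts with grspan equal to "KK-08" (the thirds above were computed from all districts)
xyplot(readscr~ratio|Reading,data=subset(dta, grspan=="KK-08"),
       type=c("p","g","r"),layout=c(3,1),
       xlab="Student-Teacher Ratio",
       ylab="Reading Score")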
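
One detail of splitting by quantiles is easy to miss: cut() builds intervals that are open on the left, so the smallest observation falls outside the first interval and becomes NA unless include.lowest = TRUE is set. The sketch below uses six made-up scores to show this; the names score, brks, without and with_low are hypothetical.

```r
# Sketch: cutting a variable into thirds at its quantiles (toy scores)
score <- c(10, 20, 30, 40, 50, 60)
brks <- quantile(score, probs = c(0, 1/3, 2/3, 1))
without  <- cut(score, breaks = brks, labels = c("L", "M", "H"))
with_low <- cut(score, breaks = brks, labels = c("L", "M", "H"),
                include.lowest = TRUE)
print(data.frame(score, without, with_low))
```

In the first version the minimum score is left unclassified; in the second it is assigned to the lowest third.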

3.The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.

# Load packages
library(lattice)
library(tidyverse)
library(magrittr)
# Read the data file
dta<- read.table("C:/Users/boss/Desktop/data_management/beautyCourseEval.txt", header = TRUE)
# Look at the first rows
head(dta)

  eval     beauty sex age minority tenure courseID
1  4.3  0.2015666   1  36        1      0        3
2  4.5 -0.8260813   0  59        0      1        0
3  3.7 -0.6603327   0  51        0      1        4
4  4.3 -0.7663125   1  40        0      1        2
5  4.4  1.4214450   1  31        0      0        0
6  4.2  0.5002196   0  62        0      1        0

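# Scatter plots of evaluation against beauty with a regression line, one panel per course
lattice::xyplot(eval ~ beauty | factor(courseID), type = c("p", "g", "r"), data = dta, xlab = "beauty", ylab = "eval")

# Scatter plot matrix of the four variables, grouped by course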
splom(~ dta[,c("eval", "minority", "tenure", "beauty")] ,groups = courseID, 
      data=dta,
      pch='.', 
      axis.text.cex=0.3,
      par.settings=standard.theme(color=FALSE))

# Load packages
library(lattice)
library(tidyverse)
library(magrittr)
# Read the data file
dta<- read.table("C:/Users/boss/Desktop/data_management/brainsize.txt", header = TRUE)
# Look at the first rows
head(dta)

  Sbj Gender FSIQ VIQ PIQ Weight Height MRICount
1   1 Female  133 132 124    118   64.5   816932
2   2   Male  140 150 124     NA   72.5  1001121
3   3   Male  139 123 150    143   73.3  1038437
4   4   Male  133 129 128    172   68.8   965353
5   5 Female  137 132 134    147   65.0   951545
6   6 Female   99  90 110    146   69.0   928799

# Scatter plot of FSIQ against PIQ, one panel per gender
xyplot(FSIQ ~ PIQ | Gender,
        data=dta, 
        pch=1, 
        cex=.5, 
        alpha=.5,
        type=c('g','p'),
        xlab="PIQ", 
        ylab='FSIQ', 
        par.settings=standard.theme(color=FALSE))

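# For all three IQ measures the p-value for the gender difference is above .05, so the data show no significant difference in IQ between females and males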
t.test(dta$FSIQ ~ dta$Gender, paired=F, na.action = na.pass)


    Welch Two Sample t-test

data:  dta$FSIQ by dta$Gender
t = -0.40267, df = 37.892, p-value = 0.6895
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.68639  12.48639
sample estimates:
mean in group Female   mean in group Male 
               111.9                115.0

t.test(dta$VIQ ~ dta$Gender, paired=F, na.action = na.pass)


    Welch Two Sample t-test

data:  dta$VIQ by dta$Gender
t = -0.77262, df = 36.973, p-value = 0.4447
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -21.010922   9.410922
sample estimates:
mean in group Female   mean in group Male 
              109.45               115.25

t.test(dta$PIQ ~ dta$Gender, paired=F, na.action = na.pass)


    Welch Two Sample t-test

data:  dta$PIQ by dta$Gender
t = -0.1598, df = 37.815, p-value = 0.8739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.72079  13.42079
sample estimates:
mean in group Female   mean in group Male 
              110.45               111.60

# Draw normal Q-Q plots, one panel per gender
qqmath( Weight~ Height | Gender, 
       aspect="xy", 
       data=dta,
       type=c('p','g'),
       prepanel=prepanel.qqmathline,
       panel=function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       },
       par.settings=standard.theme(color=FALSE))

# Gender is significantly related to both weight and height
summary(lm(dta$Weight ~ dta$Gender, data=dta, na.action = na.omit))


Call:
lm(formula = dta$Weight ~ dta$Gender, data = dta, na.action = na.omit)

Residuals:
    Min      1Q  Median      3Q     Max 
-34.444 -15.383   3.678  13.306  37.800 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     137.200      4.132  33.203  < 2e-16 ***
dta$GenderMale   29.244      6.004   4.871 2.23e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.48 on 36 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.3972,    Adjusted R-squared:  0.3805 
F-statistic: 23.73 on 1 and 36 DF,  p-value: 2.227e-05

summary(lm(dta$Height ~ dta$Gender, data=dta, na.action = na.omit))


Call:
lm(formula = dta$Height ~ dta$Gender, data = dta, na.action = na.omit)

Residuals:
   Min     1Q Median     3Q    Max 
-5.132 -2.432  0.235  2.152  5.568 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     65.7650     0.6298  104.42  < 2e-16 ***
dta$GenderMale   5.6666     0.9023    6.28 2.62e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.816 on 37 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.516, Adjusted R-squared:  0.5029 
F-statistic: 39.44 on 1 and 37 DF,  p-value: 2.624e-07

# Use xyplot to look at the relation between brain size (MRICount) and FSIQ within each gender
xyplot(FSIQ ~ MRICount | Gender, 
       data=dta, 
       type="smooth",
       panel=function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.grid(h=-1, 
                    v=-1, 
                    col="gray80", 
                    lty=3, ...)
         panel.average(x, y, fun=mean, 
                       horizontal=FALSE, 
                       col='gray', ...)},
       par.settings=standard.theme(color=FALSE))

## Gender is significantly related to brain size (MRICount), but not to FSIQ
summary(lm(dta$MRICount ~ dta$Gender, data=dta, na.action = na.omit))


Call:
lm(formula = dta$MRICount ~ dta$Gender, data = dta, na.action = na.omit)

Residuals:
   Min     1Q Median     3Q    Max 
-74868 -34593  -7290  20014 128650 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      862655      12500  69.011  < 2e-16 ***
dta$GenderMale    92201      17678   5.216 6.76e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 55900 on 38 degrees of freedom
Multiple R-squared:  0.4172,    Adjusted R-squared:  0.4019 
F-statistic:  27.2 on 1 and 38 DF,  p-value: 6.758e-06

summary(lm(dta$FSIQ ~ dta$Gender, data=dta, na.action = na.omit))


Call:
lm(formula = dta$FSIQ ~ dta$Gender, data = dta, na.action = na.omit)

Residuals:
   Min     1Q Median     3Q    Max 
-35.00 -24.18   3.55  23.32  29.00 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     111.900      5.444  20.556   <2e-16 ***
dta$GenderMale    3.100      7.699   0.403    0.689    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.34 on 38 degrees of freedom
Multiple R-squared:  0.004249,  Adjusted R-squared:  -0.02196 
F-statistic: 0.1621 on 1 and 38 DF,  p-value: 0.6894
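
The lm() fit of FSIQ on Gender gives p = 0.689, almost the same as the Welch test above (p = 0.6895). This is expected, because the slope test in a regression on a two-level factor is the pooled-variance two-sample t test. The sketch below illustrates the equivalence on simulated data, not the brain size data; the data frame d and the seed are made up for this example.

```r
# Sketch: pooled-variance t test vs. lm() on a two-level factor (simulated data)
set.seed(1)
d <- data.frame(g = factor(rep(c("Female", "Male"), each = 20)),
                y = rnorm(40, mean = 100, sd = 15))
tt  <- t.test(y ~ g, data = d, var.equal = TRUE)   # pooled-variance t test
fit <- summary(lm(y ~ g, data = d))
p_t  <- unname(tt$p.value)
p_lm <- fit$coefficients["gMale", "Pr(>|t|)"]
print(c(t_test = p_t, lm = p_lm))                  # the two p-values agree
```

The t statistics have opposite signs because t.test() reports Female minus Male while the lm() coefficient is Male minus Female. The Welch version differs only in not pooling the variances.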

Exercises:

1.Use trellis graphics to explore various ways to display the sample data from the National Longitudinal Survey of Youth.

# Load packages and read the data
library(lattice)
library(tidyverse)
library(magrittr)
library(readr)
dta <- read_csv("C:/Users/boss/Desktop/data_management/nlsy86long.csv")
str(dta)

Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    664 obs. of  9 variables:
 $ id   : num  2390 2560 3740 4020 6350 7030 7200 7610 7680 7700 ...
 $ sex  : chr  "Female" "Female" "Female" "Male" ...
 $ race : chr  "Majority" "Majority" "Majority" "Majority" ...
 $ time : num  1 1 1 1 1 1 1 1 1 1 ...
 $ grade: num  0 0 0 0 1 0 0 0 0 0 ...
 $ year : num  6 6 6 5 7 5 6 7 6 6 ...
 $ month: num  67 66 67 60 78 62 66 79 76 67 ...
 $ math : num  14.29 20.24 17.86 7.14 29.76 ...
 $ read : num  19.05 21.43 21.43 7.14 30.95 ...
 - attr(*, "spec")=
  .. cols(
  ..   id = col_double(),
  ..   sex = col_character(),
  ..   race = col_character(),
  ..   time = col_double(),
  ..   grade = col_double(),
  ..   year = col_double(),
  ..   month = col_double(),
  ..   math = col_double(),
  ..   read = col_double()
  .. )

2.Eight different physical measurements of 30 French girls were recorded from 4 to 15 years old. Explore various ways to display the data using trellis graphics.

Column 1: Weight in grams
Column 2: Height in mms
Column 3: Head to butt length in mms
Column 4: Head circumference in mms
Column 5: Chest circumference in mms
Column 6: Arm length in mms
Column 7: Calf length in mms
Column 8: Pelvis circumference in mms
Column 9: Age in years
Column 10: Girl ID

3.Your manager gave you sales data on several products in SAS format. Your task is to summarize and report the data in tables and graphs using the R lattice package.

Recode the region variable (1 to 4) by “Northern”, “Southern”, “Eastern” and “Western”; the district variable (1-5) by “North East”, “South East”, “South West”, “North West”, “Central West”; the quarter variable (1-4) by “1st”, “2nd”, “3rd”, “4th”; and the month variable (1-12) by “Jan”, “Feb”, etc. Set negative sales values to zero.
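
The recoding described above can be sketched with factor() and pmax(). The data frame below is a made-up stand-in for the SAS file, and the column names region, quarter, month and sales are assumptions about how the real variables are named.

```r
# Sketch: recoding integer codes to labels and flooring sales at zero (toy data)
sales <- data.frame(region  = c(1, 4, 2, 3),
                    quarter = c(1, 2, 3, 4),
                    month   = c(1, 12, 6, 3),
                    sales   = c(120, -5, 30, 0))
sales$region  <- factor(sales$region, levels = 1:4,
                        labels = c("Northern", "Southern", "Eastern", "Western"))
sales$quarter <- factor(sales$quarter, levels = 1:4,
                        labels = c("1st", "2nd", "3rd", "4th"))
sales$month   <- factor(sales$month, levels = 1:12, labels = month.abb)
sales$sales   <- pmax(sales$sales, 0)   # negative values become 0
print(sales)
```

Giving levels explicitly keeps the labels in the intended order in lattice panels, even when some codes do not occur in the data.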

4.Use the Lattice package to graphically explore the age and gender effects on reaction time reported in the Bassin data example.