Q1

Chatterjee and Hadi (Regression Analysis by Example, 2006) provide a link to the right-to-work data set on their web page. Display the relationship between Income and Taxes.

#Import tab-delimited files
q1fL <- "http://www1.aucegypt.edu/faculty/hadi/RABE5/Data5/P005.txt"
q1dta <- read.csv(q1fL, sep='\t')

dplyr::glimpse(q1dta)

## Rows: 38
## Columns: 8
## $ City   <chr> "Atlanta", "Austin", "Bakersfield", "Baltimore", "Baton Rouge",~
## $ COL    <int> 169, 143, 339, 173, 99, 363, 253, 117, 294, 291, 170, 239, 174,~
## $ PD     <int> 414, 239, 43, 951, 255, 1257, 834, 162, 229, 1886, 643, 1295, 3~
## $ URate  <dbl> 13.6, 11.0, 23.7, 21.0, 16.0, 24.4, 39.2, 31.5, 18.2, 31.5, 29.~
## $ Pop    <int> 1790128, 396891, 349874, 2147850, 411725, 3914071, 1326848, 162~
## $ Taxes  <int> 5128, 4303, 4166, 5001, 3965, 4928, 4471, 4813, 4839, 5408, 463~
## $ Income <int> 2961, 1711, 2122, 4654, 1620, 5634, 7213, 5535, 7224, 6113, 480~
## $ RTWL   <int> 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, ~

head(q1dta[,c(1,6,7)]) |> knitr::kable()

City	Taxes	Income
Atlanta	5128	2961
Austin	4303	1711
Bakersfield	4166	2122
Baltimore	5001	4654
Baton Rouge	3965	1620
Boston	4928	5634

# Keep only the variables to analyze: City, Taxes, Income
mydata <- q1dta[, c("City", "Taxes", "Income")]

summary(mydata)

##      City               Taxes          Income    
##  Length:38          Min.   :3965   Min.   : 782  
##  Class :character   1st Qu.:4620   1st Qu.:3110  
##  Mode  :character   Median :4858   Median :4865  
##                     Mean   :4903   Mean   :4709  
##                     3rd Qu.:5166   3rd Qu.:6082  
##                     Max.   :6404   Max.   :8392

# Scatter plot with the fitted regression line and city labels
with(mydata, plot(Income, Taxes, 
               xlab = "Income",
               ylab = "Taxes",
               main = "Overall relationship between tax and income",
               xlim = c(750,8500),
               ylim = c(3900, 6500),
               type='n', bty='n'))
grid()
abline(lm(Taxes ~ Income, data=mydata))
with(mydata, text(Income, Taxes, 
               labels=City, 
               cex=.5))

Income and taxes appear to have a weak positive relationship.
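"Weak positive" can be put into numbers by reporting Pearson's correlation, the least-squares slope, and R². The sketch below does this with a hypothetical helper, describe_relationship(). It is only a minimal example. The demo uses just the six rows printed by head() above, so its numbers are not the full-sample result. Call describe_relationship(mydata) for that.

```r
# Hypothetical helper: summarize a bivariate relationship by Pearson's r,
# the OLS slope of Taxes on Income, and R-squared.
describe_relationship <- function(df) {
  fit <- lm(Taxes ~ Income, data = df)
  list(
    r     = cor(df$Income, df$Taxes),      # Pearson correlation
    slope = unname(coef(fit)["Income"]),   # change in Taxes per unit of Income
    r2    = summary(fit)$r.squared         # share of Taxes variance explained
  )
}

# The six rows shown by head() above, for illustration only
six_rows <- data.frame(
  City   = c("Atlanta", "Austin", "Bakersfield", "Baltimore", "Baton Rouge", "Boston"),
  Taxes  = c(5128, 4303, 4166, 5001, 3965, 4928),
  Income = c(2961, 1711, 2122, 4654, 1620, 5634)
)
res <- describe_relationship(six_rows)
str(res)

# On the full data: describe_relationship(mydata)
```

In simple regression R² equals r², so the two summaries agree on strength. The slope adds the scale: how much Taxes change per unit of Income.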

Q2

The PDF version of the paper by Hsu et al. (2020), "Health inequality: a longitudinal study on geographic variations in lung cancer incidence and mortality in Taiwan", is available online. Extract the values in Table 1 of the article to replicate Figure 2 in the paper.

#Extract tables from web pages
pacman::p_load(rvest)

q2fL <- "https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-020-09044-2/tables/1"

df <- q2fL |> 
  read_html() |> 
  html_nodes("table") |>                 # CSS selector for <table> elements ("tables" matches nothing)
  html_table(fill = TRUE, header = TRUE)  # fill = TRUE pads missing cells with NA

#dta <- data.frame(df[[1]])[-1, 1:5]  # take the first table, drop its first row (header), keep columns 1 to 5

#head(dta,7) |> knitr::kable()

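This gave the error Error in df[[1]] : subscript out of bounds, and I did not understand why the index was out of bounds. The cause is the CSS selector. The HTML element is <table>, so html_nodes("tables") matches nothing. As a result, html_table() returns an empty list, and df[[1]] asks for the first element of a list that has none. Selecting "table" instead avoids this.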
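The empty result can be reproduced offline. The sketch below builds a tiny hypothetical HTML page (not the BMC article) and compares the two selectors. It assumes rvest is installed. read_html() accepts a literal HTML string.

```r
library(rvest)

# A tiny literal HTML page with one table (hypothetical content)
page <- read_html("<html><body>
  <table>
    <tr><th>Region</th><th>Rate</th></tr>
    <tr><td>North</td><td>1.5</td></tr>
    <tr><td>South</td><td>2.5</td></tr>
  </table>
</body></html>")

wrong <- page |> html_nodes("tables") |> html_table(header = TRUE)
right <- page |> html_nodes("table")  |> html_table(header = TRUE)

length(wrong)  # 0: there is no <tables> element, so wrong[[1]] is out of bounds
length(right)  # 1: one <table> element was found
right[[1]]
```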

Q3

Reproducible data and code are available for Leongomez, J.D. et al. (2020). Self-reported Health is Related to Body Height and Waist Circumference in Rural Indigenous and Urbanised Latin-American Populations. Scientific Reports, 10, 4391.

Summarize the mean of height, waist, and weight in each study sample by gender.

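# Download a file to a directory
dir.create(file.path(getwd(), "tmp_data"), showWarnings = FALSE)

# Appending /download to the OSF file URL requests the file itself rather than its web page
fL <- "https://osf.io/kxut3/download"
fD <- "./tmp_data/Full_data.csv"
download.file(fL, destfile = fD, mode = "wb")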

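This printed: trying URL 'https://osf.io/kxut3/' Content type 'text/html; charset=utf-8' length 41091 bytes (40 KB) downloaded 40 KB

Does this mean the data were downloaded? Not the data. The content type is text/html, so the 40 KB file is the OSF web page for the file, not the CSV itself.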
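Before parsing a download, a quick check is whether download.file() saved data or a web page. Looking at the first lines of the file is enough for that. The helper below, looks_like_html(), is hypothetical. The check is a heuristic, not a full content-type test. The demo writes two temporary files that stand in for the two possible downloads.

```r
# Hypothetical helper: TRUE if a downloaded file starts like an HTML page
# rather than delimited data. Reads only the first n lines.
looks_like_html <- function(path, n = 5) {
  first <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", first, ignore.case = TRUE))
}

# A web page saved under a .csv name
html_file <- tempfile(fileext = ".csv")
writeLines(c("<!DOCTYPE html>", "<html><head><title>OSF</title></head>"), html_file)

# Real comma-separated data
csv_file <- tempfile(fileext = ".csv")
writeLines(c("ID,Sex,Height", "F001,Female,157.8"), csv_file)

looks_like_html(html_file)  # TRUE
looks_like_html(csv_file)   # FALSE
```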

#data.table::fread(fD, fill=F)

I could not get past the error :(

# Download the file manually instead, then read it
q3dta <- read.csv("C:/Users/Ching-Fang Wu/Documents/dataM/Full_data.csv", header = TRUE)
head(q3dta)

##     ID  Country Population    Sex Age    Waist       Hip   Height   Weight
## 1 F001 Colombia      Urban Female  23 67.33333  90.43333 157.8000 48.80000
## 2 F003 Colombia      Urban Female  24 97.50000 107.50000 164.6000 71.43333
## 3 F004 Colombia      Urban Female  19 81.13333 106.06667 164.7667 73.90000
## 4 F005 Colombia      Urban Female  19 70.30000  96.06667 161.0667 56.16667
## 5 F009 Colombia      Urban Female  18 66.73333  91.46667 162.1667 54.80000
## 6 F010 Colombia      Urban Female  18 82.53333 102.06667 160.9667 72.73333
##        Fat VisceralFat      BMI   Muscle Health         Sample
## 1 29.96667    3.000000 19.70000 24.66667  75.00 Colombia-Urban
## 2 42.56667    5.000000 26.23333 26.36667  50.00 Colombia-Urban
## 3 43.50000    5.000000 27.10000 23.73333  43.75 Colombia-Urban
## 4 34.23333    3.666667 21.66667 25.66667  50.00 Colombia-Urban
## 5 32.36667    3.000000 20.90000 26.23333  68.75 Colombia-Urban
## 6 45.96667    5.000000 28.10000 22.23333  62.50 Colombia-Urban

#Summarize the mean of height, waist, and weight in each study sample by gender.

show(mss <- aggregate(cbind(Height, Waist, Weight) ~ Sex, data=q3dta, FUN=mean))

##      Sex   Height    Waist   Weight
## 1 Female 157.4532 75.34818 58.44692
## 2   Male 170.1851 80.66893 68.16513

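Pooled over all study samples, women have a mean height of 157.45 cm versus 170.19 cm for men. Their mean waist circumference is 75.35 cm versus 80.67 cm, and their mean weight is 58.45 kg versus 68.17 kg.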
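The aggregate() call above groups by Sex only, so each mean pools every study sample. The task asks for means within each study sample by gender. That only requires adding Sample to the right-hand side of the formula. The sketch below runs on hypothetical toy data. The values and the labels Sample-A and Sample-B are invented for illustration. With the real data, replace toy by q3dta.

```r
# Hypothetical toy data with the same column names as Full_data.csv
toy <- data.frame(
  Sample = rep(c("Sample-A", "Sample-B"), each = 4),
  Sex    = rep(c("Female", "Male"), times = 4),
  Height = c(158, 171, 162, 169, 150, 160, 152, 162),
  Waist  = c(70, 80, 74, 82, 78, 84, 80, 86),
  Weight = c(55, 68, 57, 70, 52, 60, 54, 62)
)

# Group by study sample AND sex; the formula method drops rows with NA
# in any of the three variables (na.action = na.omit by default)
mss2 <- aggregate(cbind(Height, Waist, Weight) ~ Sample + Sex, data = toy, FUN = mean)
mss2

# On the real data:
# aggregate(cbind(Height, Waist, Weight) ~ Sample + Sex, data = q3dta, FUN = mean)
```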

Q4

The following zip file contains one subject's laser-evoked potential (LEP) data for 4 separate conditions (different levels of stimulus intensity), each in a plain text file (1w.dat, 2w.dat, 3w.dat and 4w.dat).

The rows are time points from -100 to 800 ms sampled at 2 ms per record. The columns are channel IDs.

Input all the files into R for graphical exploration.
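The description fixes the number of rows per file. From -100 ms to 800 ms in 2 ms steps gives (800 - (-100)) / 2 + 1 = 451 time points, with time 0 at row 51. The check below assumes both endpoints are included.

```r
# Time axis implied by the description: -100 to 800 ms in 2 ms steps
time_ms <- seq(-100, 800, by = 2)
length(time_ms)         # 451 rows expected per file
which(time_ms == 0)     # 51
```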

# Import a zipped set of plain-text files

# Step 1: create a temporary file
tmp <- tempfile()

# Step 2: use download.file() to fetch the zip into the temporary file
fL <- "http://140.116.183.121/~sheu/dataM/Data/Subject1.zip"
#download.file(fL, tmp, mode = "wb")

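This produced: Warning in download.file(fL, tmp, mode = "wb") : cannot open URL 'http://140.116.183.121/~sheu/dataM/Data/Subject1.zip': HTTP status was '401 Unauthorized'. Does this mean I am not authorized to download the file? Yes. HTTP 401 means the server requires authentication, so download.file() cannot fetch the file anonymously. The zip was therefore downloaded manually.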

# Download the zip manually, then extract it; unzip() returns the paths of the extracted files
q4dta <- unzip("C:/Users/Ching-Fang Wu/Documents/dataM/tmp_data/Subject1.zip", unzip = "internal")

str(q4dta)

##  chr [1:4] "./Subject1/1w.dat" "./Subject1/2w.dat" "./Subject1/3w.dat" ...

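# The .dat files are plain text, not Stata files, so foreign::read.dta() does not apply.
# Read them with read.table(); header = TRUE assumes the first row holds the channel IDs.
q4list <- lapply(q4dta, read.table, header = TRUE)
names(q4list) <- basename(q4dta)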
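For graphical exploration, one option is to stack the four files into one long data frame with columns condition, time, channel and value. The conditions can then be overlaid for a chosen channel. This is a sketch under stated assumptions:
- read_lep() is a hypothetical helper.
- Each .dat file is assumed to be whitespace-separated, with a header row of channel IDs and 451 rows.
- The demo writes two simulated files (random noise, with made-up channel names Fz, Cz and Pz) instead of using the real data.

With the real files, use lep <- do.call(rbind, lapply(q4dta, read_lep)).

```r
# Hypothetical reader: one .dat file -> long data frame (condition, time, channel, value)
read_lep <- function(path, time_ms = seq(-100, 800, by = 2)) {
  m <- as.matrix(read.table(path, header = TRUE, check.names = FALSE))
  stopifnot(nrow(m) == length(time_ms))          # one row per time point
  data.frame(
    condition = sub("\\.dat$", "", basename(path)),
    time      = rep(time_ms, times = ncol(m)),
    channel   = rep(colnames(m), each = nrow(m)),
    value     = as.vector(m)                     # column-major: channel by channel
  )
}

# Demo on two simulated files (random noise, not real LEP data)
set.seed(1)
tmpdir <- tempfile("lep_demo")
dir.create(tmpdir)
paths <- file.path(tmpdir, c("1w.dat", "2w.dat"))
for (p in paths) {
  sim <- matrix(rnorm(451 * 3), ncol = 3, dimnames = list(NULL, c("Fz", "Cz", "Pz")))
  write.table(sim, p, row.names = FALSE, quote = FALSE)
}
lep <- do.call(rbind, lapply(paths, read_lep))

# Overlay the conditions for one channel: one column per condition
cz   <- lep[lep$channel == "Cz", ]
wide <- do.call(cbind, split(cz$value, cz$condition))   # 451 x number of conditions
matplot(unique(cz$time), wide, type = "l", lty = 1,
        xlab = "Time (ms)", ylab = "Amplitude", main = "Channel Cz by condition")
legend("topright", legend = colnames(wide), col = seq_len(ncol(wide)), lty = 1, bty = "n")
```

The long format keeps one reader for all four files. The same data frame can be split by channel for a panel per channel.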

Q5

The ASCII (plain text) file schiz.asc contains response times (in milliseconds) for 11 non-schizophrenics and 6 schizophrenics (30 measurements for each person). Summarize and compare descriptive statistics of the measurements from the two groups. Source: Belin, T., & Rubin, D. (1995). The analysis of repeated-measures data on schizophrenic reaction times using mixture models. Statistics in Medicine 14(8), 747-768.

#Import plain text files
q5fL<-"http://www.stat.columbia.edu/~gelman/book/data/schiz.asc"
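This sketch compares the two groups under several assumptions:
- The file holds 17 rows (persons) of 30 whitespace-separated response times, with no header.
- The 11 non-schizophrenic subjects come first.

Check the layout with readLines(q5fL, n = 3) before relying on it. summarize_groups() is a hypothetical helper. The demo uses simulated response times, not the real data. With the real file, use summarize_groups(as.matrix(read.table(q5fL))).

```r
# Hypothetical helper: descriptive statistics by group for a persons x trials matrix.
# Assumes the first n_control rows are the non-schizophrenic subjects.
summarize_groups <- function(m, n_control = 11) {
  group <- rep(c("non-schizophrenic", "schizophrenic"),
               times = c(n_control, nrow(m) - n_control))
  stats <- function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                         median = median(x), min = min(x), max = max(x))
  # as.vector() is column-major, so the row-group labels repeat once per column
  vals <- split(as.vector(m), rep(group, times = ncol(m)))
  t(sapply(vals, stats))
}

# Demo on simulated response times (ms), not the real schiz.asc data
set.seed(42)
sim <- rbind(matrix(rnorm(11 * 30, mean = 300, sd = 30), nrow = 11),
             matrix(rnorm(6 * 30,  mean = 400, sd = 60), nrow = 6))
res5 <- summarize_groups(sim)
round(res5, 1)
```

Pooling all measurements per group treats repeated trials as independent. That is acceptable for descriptive statistics but not for inference.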

Q6

I built my profile on Google Scholar.

myfile<-"https://scholar.google.com/citations?hl=en&user=g8B2YwwAAAAJ"

dataM L4 Exercise

Ching-Fang Wu

Tue Oct 26 21:53:44 2021
