DataM: Inclass Exercise 0323
In-class exercise 1.
The following student ID file is missing the ID U76067010 at the third-to-last position. Find a way to fix it. Download and display the data contents in R.
[Solution and Answer]
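# Read the IDs and store them as character strings
studentID <- read.table("../data/student_ID.txt", header = TRUE)
studentID$ID <- as.character(studentID$ID)
library(DataCombine)
# After insertion there are nrow + 1 rows, so the third-to-last
# position is row nrow - 1
studentID <- InsertRow(studentID, NewRow = 'U76067010',
                       RowNum = nrow(studentID) - 1)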
studentID
In-class exercise 2.
A classmate of yours used data.entry() to change the first woman’s height to 50 in the women{datasets}. She then closed the editor and issued plot(women). To her surprise, she got this message:
Error in xy.coords(x, y, xlabel, ylabel, log) :
‘x’ is a list, but does not have components ‘x’ and ‘y’
[Solution and Answer]
[1] "list"
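Calling data.entry(women) turns the data frame women into a plain list, so plot(women) can no longer tell which components to use as x and y. We should therefore use edit(women) instead of data.entry(women), assigning the result back (women <- edit(women)) so that the change is kept in a data frame. We can also specify x and y explicitly when calling plot(), for example plot(women$height, women$weight).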
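The behaviour can be reproduced without opening the interactive editor. The sketch below is only an approximation: it assumes that as.list() is a fair stand-in for what data.entry() leaves behind, and the names women_list and women_df are introduced here for illustration.

```r
# Assumption: as.list() mimics the list that data.entry(women) leaves behind.
women_list <- as.list(datasets::women)
women_list$height[1] <- 50      # the classmate's edit
class(women_list)               # a plain list, no longer a data frame

pdf(NULL)  # null graphics device so the sketch also runs non-interactively

# plot() fails on the list because it has no components named x and y;
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(plot(women_list), error = function(e) conditionMessage(e))

# Fix 1: name the two variables explicitly.
plot(women_list$height, women_list$weight, xlab = "height", ylab = "weight")

# Fix 2: convert the list back to a data frame before plotting.
women_df <- as.data.frame(women_list)
plot(women_df)

dev.off()
```

Either fix gives the same scatterplot; converting back to a data frame has the advantage that later commands expecting a data frame keep working.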
In-class exercise 3.
Make a Google citations plot (for the last 5 years) of one of the NCKU faculty members you know.
[Solution and Answer]
library(scholar)
citation <- get_citation_history('39ghs30AAAAJ')
plot(citation, xlab="Year", ylab="Citations", main='Google citations plot of Prof. Cheng-Te Li',
type='h', lwd=2, xaxt="n", xlim = c(2009, 2020))
axis(side=1, at=seq(2007, 2019, by=1), cex.axis=0.7)
abline(h=seq(0, 400, by=200), lty=3, col="gray")
In-class exercise 4.
Data on body temperature, gender, and heart rate are taken from Mackowiak et al. (1992), “A Critical Appraisal of 98.6 Degrees F …,” in the Journal of the American Medical Association (268), 1578-80. Import the file. Find the correlation between body temperature and heart rate, and investigate whether there is a gender difference in mean temperature.
[Solution and Answer]
Classes 'tbl_df', 'tbl' and 'data.frame': 130 obs. of 3 variables:
$ Temp : num 96.3 96.7 96.9 97 97.1 97.1 97.1 97.2 97.3 97.4 ...
$ Sex : num 1 1 1 1 1 1 1 1 1 1 ...
$ Beats: num 70 71 74 80 73 75 82 64 69 70 ...
[1] 0.2536564
There is a weak positive correlation (r ≈ 0.25) between body temperature and heart rate.
To investigate whether mean temperature differs by gender, we conduct an independent two-sample t-test (Welch's version, which does not assume equal variances).
Welch Two Sample t-test
data: Temp by Sex
t = -2.2854, df = 127.51, p-value = 0.02394
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.53964856 -0.03881298
sample estimates:
mean in group 1 mean in group 2
98.10462 98.39385
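Since \(p = .024 < \alpha = .05\), we reject the null hypothesis of equal means. There is a significant gender difference in body temperature: the mean in group 2 is about 0.29 degrees F higher than the mean in group 1.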
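The two steps above, the correlation and the Welch test, can be sketched as follows. The data frame toy is hypothetical: it reuses the column names Temp, Sex and Beats from the str() output, but its values are made up for illustration and are not the Mackowiak data, so its results will differ from those reported above.

```r
# Hypothetical toy data; column names follow the str() output,
# the values are invented and are NOT the Mackowiak et al. data.
toy <- data.frame(
  Temp  = c(97.1, 97.8, 98.0, 98.2, 98.6, 98.4, 98.8, 99.0),
  Sex   = c(1, 1, 1, 1, 2, 2, 2, 2),
  Beats = c(70, 72, 75, 74, 78, 76, 80, 82)
)

# Pearson correlation between body temperature and heart rate.
r <- cor(toy$Temp, toy$Beats)

# Two-sample t-test of Temp by Sex; t.test() uses the Welch version
# by default (var.equal = FALSE).
tt <- t.test(Temp ~ Sex, data = toy)

r
tt$p.value   # compare with alpha = .05
```

With the real data, the same two calls on the imported data frame should give the correlation and the test output shown in this solution.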
In-class exercise 5.
The AAUP2 data set is a comma-delimited fixed column format text file with '*' for missing values. Import the file into R and indicate missing values by 'NA'. Hint: ?read.csv
[Solution and Answer]
library(readr)
AAUP2 <- read_fwf('../data/AAUP2.txt', na = '*',
                  col_positions = fwf_widths(c(6, 31, 3, 5, 4, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 5)))
Parsed with column specification:
cols(
X1 = col_double(),
X2 = col_character(),
X3 = col_character(),
X4 = col_character(),
X5 = col_character(),
X6 = col_double(),
X7 = col_double(),
X8 = col_character(),
X9 = col_character(),
X10 = col_double(),
X11 = col_character(),
X12 = col_character(),
X13 = col_double(),
X14 = col_double(),
X15 = col_double(),
X16 = col_double(),
X17 = col_double()
)
AAUP2 is not actually comma-delimited. Its fields sit at fixed column positions rather than being separated by a delimiter, so we use read_fwf{readr} with explicit column widths to load it correctly. The argument na = '*' makes the missing values display as NA.
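The same idea can be tried on a small self-made file. This sketch deliberately swaps in base R's read.fwf() for read_fwf{readr} so that it runs without extra packages. The two records, the widths c(4, 16, 4) and the column names are hypothetical and are not taken from AAUP2.

```r
# Build a tiny fixed-width file; sprintf() pads each field to its width.
# The records are invented for illustration, not taken from AAUP2.
toy_lines <- sprintf("%4d%-16s%4s",
                     c(1061L, 1063L),
                     c("Alaska Pacific", "Univ. of Alaska"),
                     c("454", "*"))
path <- tempfile(fileext = ".txt")
writeLines(toy_lines, path)

# Base R alternative to read_fwf(): widths give the column layout,
# na.strings = "*" marks missing values, and strip.white = TRUE removes
# the padding so that a padded "*" is still recognised as missing.
toy <- read.fwf(path, widths = c(4, 16, 4),
                col.names = c("id", "name", "salary"),
                na.strings = "*", strip.white = TRUE)
toy
unlink(path)
```

In both functions the key choices are the same: describe the layout by column widths and tell the reader which string marks a missing value.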