Source file ⇒ 2017-lec1.Rmd
Stat 133 student testimonial: “Not to sound overeager, but learning R in 133 with you last year was the most valuable educational experience I’ve had at Cal! I am very interested in the lab assistant position, how can I learn more?”
I will communicate with you about assignments through b-courses.
Piazza Site (to be set up soon)
Lab starts this week! You must go to your assigned lab. Attendance is part of your lab grade.
My OH are T,Th 10-11 in Evans 449. I am very good with email (alucas@berkeley.edu)
Get an i-clicker by next Tuesday (participation points)
Final exam rescheduled for Monday May 8 at 3pm.
Examples of Tidy data:
Imagine a data set with three variables: name, trt, and result.
name has three values (John, Mary, and Jane).
trt has two values (a and b).
result has six values (-, 16, 3, 2, 11, 1).
When we display the data set so that the columns are our variables and the rows are observations, we call it a data table or a data frame. Data tables make it easy to analyse and visualize data because they provide a standard way of structuring a data set. For this reason we call the data in a data table tidy data.
For example:
The data in the table above is tidy because it is organized according to two simple rules:
The rows, called cases, each refer to a specific, unique and similar sort of thing. For example the treatment and result of a particular patient.
The columns, called variables, each have the same sort of value recorded for each row. For example, trt is categorical (a or b) and result is numerical.
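The two rules above can be sketched in base R. This is only an illustration: the data frame name tidy, the order of the rows, and the pairing of each result value with a particular name and trt are assumptions, and the "-" entry is stored as NA (missing).

```r
# Tidy layout: one row per case (a person under one treatment),
# one column per variable.
# ASSUMPTION: which result belongs to which (name, trt) pair is illustrative.
tidy <- data.frame(
  name   = rep(c("John", "Jane", "Mary"), times = 2),
  trt    = factor(rep(c("a", "b"), each = 3)),  # categorical variable
  result = c(NA, 16, 3, 2, 11, 1),              # numerical; "-" becomes NA
  stringsAsFactors = FALSE
)

str(tidy)             # lists each variable with its type
nrow(tidy)            # number of cases (rows)
sapply(tidy, class)   # every column holds one sort of value
```

Each row is one case and each column holds a single kind of value, which is what the two rules ask for.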
Notice that tidy data isn’t usually concise. You might see the data represented more concisely as below.
This isn’t a data table (i.e., the data isn’t tidy).
In this example the columns (person, treatmenta, treatmentb) are not all variables. treatmenta and treatmentb are values of the variable trt; they aren’t variables themselves.
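As a rough sketch of how the concise layout relates to the tidy one, the base R code below stacks the two treatment columns into a single result column and records the treatment in trt. The object names wide and tidy and the cell values are assumptions made for illustration.

```r
# Concise (untidy) layout: treatmenta and treatmentb are values of trt,
# not variables in their own right.
# ASSUMPTION: the cell values are illustrative.
wide <- data.frame(
  person     = c("John", "Jane", "Mary"),
  treatmenta = c(NA, 16, 3),
  treatmentb = c(2, 11, 1),
  stringsAsFactors = FALSE
)

# Stack the two treatment columns into one result column and
# keep track of which treatment each value came from.
tidy <- data.frame(
  name   = rep(wide$person, times = 2),
  trt    = rep(c("a", "b"), each = nrow(wide)),
  result = c(wide$treatmenta, wide$treatmentb),
  stringsAsFactors = FALSE
)

tidy
```

The tidy version has six rows instead of three, which is the loss of conciseness noted above. Packages such as tidyr provide functions for this kind of reshaping; the base R version here just makes the steps explicit.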
Is the following data set tidy? Why or why not?
i-clicker questions: