Overview: In this lab exercise, you will begin to familiarize yourself with the statistical software R and RStudio.
Objectives: At the end of this lab, you will be able to:
Open RStudio.
In the bottom-right pane of RStudio, click the Packages tab, then Install. In the search box, type and choose “mosaic” (lower case), leave everything else as it is, and click Install.
Choose a location for your lab files that you will be able to access later. We strongly recommend the OSU OneDrive folder when working on the RStudio server.
In your chosen location (the OSU OneDrive folder is strongly recommended), create a directory or folder called “PUBHBIO 2210 Labs”.
Enter the “PUBHBIO 2210 Labs” directory or folder and create a subdirectory named “Lab 1”.
Download the five lab files from Carmen while in the RStudio server:
lab-01-intro-blank.html
lab-01-intro-blank.Rmd
lab-01-intro-worksheet-blank.docx
nhanesSubset.csv
nhanes-data-dictionary.xls
If you have not downloaded all of these files, do so now.
Save the five downloaded files in the PUBHBIO 2210 Labs/Lab 1 directory (i.e., in the Lab 1 directory or folder you created). When working on labs, it is important to keep all related files in the same directory.
The file lab-01-intro-blank.Rmd is a source file, and lab-01-intro-blank.html is the output file produced by “knitting” the source file in RStudio. Text written in the source file will generally appear in the output file in a nice format, along with the results of running code. You can read the lab instructions in either the source or the output file, whichever is more convenient for you.
Open the source file lab-01-intro-blank.Rmd in RStudio. In the toolbar above the file editor (top-left) window, click Knit (the button with a blue yarn ball icon). Make sure that the output file is produced.
At the top of this file, replace “Firstname Lastname” with your name as the author. Knit the document and make sure that your name appears at the top of the output.
Follow the instructions below to complete the rest of the lab, writing your code and answers in lab-01-intro-blank.Rmd using RStudio. As you work, knit the document to see your progress and make sure everything is working correctly.
When prompted to answer a question, answer it in the worksheet file lab-01-intro-worksheet-blank.docx. You will submit the completed worksheet and the updated RMD and HTML files to Carmen.
We will load a dataset from the nhanesSubset.csv file into R using the read.csv() function. For example, the command
# Not evaluated
mydata <- read.csv("datafile.csv")
loads the dataset from the file “datafile.csv” and stores it in an object called “mydata”. R does not execute the code above because the chunk option eval = FALSE is set (“eval” stands for “evaluate”).
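To see what read.csv() returns without needing a file on disk, the minimal sketch below passes a few lines of comma-separated text through read.csv()'s text argument instead of a file name. The column names and values are made up for illustration and do not come from nhanesSubset.csv.

```r
# Hypothetical CSV content; in the lab you would pass a file name instead
csv_lines <- "id,age,sex
1,56,2
2,73,2
3,25,1"

# read.csv() returns a data.frame and uses the first line as column names
mydata <- read.csv(text = csv_lines)

str(mydata)    # 3 obs. of 3 variables, each stored as integer
nrow(mydata)   # number of rows (observations)
names(mydata)  # column (variable) names
```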
In the code chunk below, read the dataset from the nhanesSubset.csv file and store it in an object called nhanes.
nhanes <- read.csv("nhanesSubset.csv")
The dataset is now held in the object named nhanes.
We will convert the nhanes object into the format we will work with (a tibble, which is a kind of table) and print that object (i.e., the stored dataset), using code similar to the following:
# Not evaluated
mydata <- as_tibble(mydata)
print(mydata)
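The sketch below shows what as_tibble() changes, using a small made-up data frame in place of the result of read.csv(). It assumes the dplyr package is installed. dplyr re-exports as_tibble() from the tibble package.

```r
library(dplyr)  # provides as_tibble() (re-exported from tibble)

# A small made-up data frame standing in for the result of read.csv()
mydata <- data.frame(id = 1:3, age = c(56L, 73L, 25L), pir = c(0.9, 2.1, 4.5))

class(mydata)               # "data.frame"
mydata <- as_tibble(mydata)
class(mydata)               # "tbl_df" "tbl" "data.frame"
print(mydata)               # header with the dimensions, then <int>/<dbl> column types
```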
In the code chunk below, convert the nhanes object to a tibble and print it.
# Enter code here
nhanes <- as_tibble(nhanes)
print(nhanes)
## # A tibble: 100 × 33
## id race ethnicity sex age familySize urban region
## <int> <int> <int> <int> <int> <int> <int> <int>
## 1 1 2 3 2 56 1 1 2
## 2 2 1 3 2 73 1 2 4
## 3 3 1 3 2 25 2 1 3
## 4 4 1 1 2 53 2 2 3
## 5 5 1 1 2 68 2 2 3
## 6 6 1 3 2 44 3 2 4
## 7 7 2 3 2 28 2 1 3
## 8 8 1 3 1 74 2 2 2
## 9 9 1 3 2 65 1 2 1
## 10 10 1 2 2 61 3 1 4
## # ℹ 90 more rows
## # ℹ 25 more variables: pir <dbl>, yrsEducation <int>,
## # maritalStatus <int>, healthStatus <int>,
## # heightInSelf <int>, weightLbSelf <int>, beer <int>,
## # wine <int>, liquor <int>, everSmoke <int>,
## # smokeNow <int>, active <int>, SBP <int>, DBP <int>,
## # weightKg <dbl>, heightCm <dbl>, waist <dbl>, …
Printing the dataset, either with print(nhanes) or by simply entering the object name nhanes on its own line, should produce output like the above when you knit your document. Note the following components of the table:
A tibble: 100 x 33: the table has 100 rows (observations) and 33 columns (variables);
id race ethnicity ...: these are the column (variable) names;
<int> <int> <int> ... <dbl>: these are the variable types, which will be discussed in Part 2;
NA (not available): indicates missing data;
by default, printing a tibble object shows only the first few rows and columns. At the bottom of the output, a note says that there are 90 more rows and 25 more variables, followed by a list of the remaining variables and their types.
We may not be interested in all of the variables at the same time. We can use the select() function to select only a few columns or variables. For example, the command
# Not evaluated
mydata %>% select(variable.1, variable.2, variable.3)
will select variable.1, variable.2, and variable.3 from mydata and print the result.
The “pipe” symbol %>% means that we start with the object on the left side and apply the action described on the right side.
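As a sketch of what the pipe does, the example below runs the same select() call with and without %>%. In general, x %>% f(y) is equivalent to f(x, y). The columns are made up, and dplyr is assumed to be installed.

```r
library(dplyr)

# Made-up tibble with four columns
mydata <- tibble(
  variable.1 = 1:3,
  variable.2 = c("a", "b", "c"),
  variable.3 = c(TRUE, FALSE, TRUE),
  variable.4 = c(2.5, 3.5, 4.5)
)

# With the pipe: start with mydata, then apply select() to it
piped <- mydata %>% select(variable.1, variable.2, variable.3)

# Without the pipe: mydata is passed as the first argument of select()
nested <- select(mydata, variable.1, variable.2, variable.3)

identical(piped, nested)  # TRUE: the two forms give the same result
names(piped)              # variable.4 has been dropped
```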
In the code chunk below, use select() to select and print just the id, race, ethnicity, sex, age, and healthStatus variables in the nhanes dataset.
# Enter code here
nhanes %>% select(id, race, ethnicity, sex, age, healthStatus)
## # A tibble: 100 × 6
## id race ethnicity sex age healthStatus
## <int> <int> <int> <int> <int> <int>
## 1 1 2 3 2 56 4
## 2 2 1 3 2 73 4
## 3 3 1 3 2 25 3
## 4 4 1 1 2 53 5
## 5 5 1 1 2 68 4
## 6 6 1 3 2 44 2
## 7 7 2 3 2 28 3
## 8 8 1 3 1 74 3
## 9 9 1 3 2 65 3
## 10 10 1 2 2 61 3
## # ℹ 90 more rows
In order to perform statistical analyses correctly in R, we need to pay attention to the type of the data. In a tibble, we can have the following types:
Integers (<int> in a tibble) are numbers without a decimal part, i.e., a discrete variable;
Doubles (<dbl>, for “double precision”, in a tibble) are numbers with a decimal part, i.e., a continuous variable;
Factors (<fct> or <ord> in a tibble) are not numbers and have defined levels, i.e., nominal or ordinal variables;
Characters (<chr> in a tibble) are free text;
Logicals (<lgl> in a tibble) can take the values TRUE or FALSE.
How R analyzes a variable depends on how the variable is stored. For example, R will not calculate a mean for a variable stored as a factor (a nominal or ordinal variable). Instead, mean() returns NA with a warning.
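The following base R sketch uses made-up vectors to show why the storage type matters. class() and typeof() report how each vector is stored. mean() on a factor returns NA (with a warning) rather than a number, while table() gives the per-level counts that suit a factor.

```r
# Made-up values illustrating three storage types
counts  <- c(2L, 3L, 5L)                          # integer (<int>)
weights <- c(63.8, 72.6, 82.1)                    # double  (<dbl>)
codes   <- factor(c("male", "female", "female"))  # factor  (<fct>)

class(counts)    # "integer"
class(weights)   # "numeric" (stored as double)
typeof(weights)  # "double"
class(codes)     # "factor"

mean(weights)    # works for numeric data

# mean() on a factor warns "argument is not numeric or logical" and returns NA
suppressWarnings(mean(codes))

table(codes)     # counts per level: the natural summary for a factor
```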
As with many datasets, this NHANES dataset is coded. This means that instead of recording responses like “male” and “female” as words (text), we store the data as numeric values that correspond to specific responses (i.e., numeric codes). For example, the variable sex has values 1 and 2 in the NHANES dataset. We need to know what each numeric value corresponds to if we want to understand the demographics of our sample. We can determine what type each variable should be by reading the data dictionary (nhanes_dataDictionary.xlsx).
The inspect() function (from the mosaic package) prints a summary of a data object:
# Not evaluated
inspect(mydata)
In the code chunk below, inspect the nhanes dataset.
# Enter code here
inspect(nhanes)
##
## quantitative variables:
## name class min Q1 median
## 1 id integer 1.000 25.75000 50.5000
## 2 race integer 1.000 1.00000 1.0000
## 3 ethnicity integer 1.000 1.00000 3.0000
## 4 sex integer 1.000 1.00000 2.0000
## 5 age integer 17.000 34.75000 48.5000
## 6 familySize integer 1.000 2.00000 2.0000
## 7 urban integer 1.000 1.00000 2.0000
## 8 region integer 1.000 2.00000 3.0000
## 9 pir numeric 0.179 1.17625 1.9145
## 10 yrsEducation integer 0.000 8.00000 12.0000
## 11 maritalStatus integer 1.000 1.00000 1.0000
## 12 healthStatus integer 1.000 2.00000 3.0000
## 13 heightInSelf integer 52.000 62.00000 66.0000
## 14 weightLbSelf integer 82.000 135.00000 160.0000
## 15 beer integer 0.000 0.00000 0.0000
## 16 wine integer 0.000 0.00000 0.0000
## 17 liquor integer 0.000 0.00000 0.0000
## 18 everSmoke integer 0.000 0.00000 0.0000
## 19 smokeNow integer 1.000 2.00000 2.0000
## 20 active integer 1.000 1.25000 2.0000
## 21 SBP integer 94.000 112.00000 126.0000
## 22 DBP integer 0.000 64.00000 74.0000
## 23 weightKg numeric 35.800 63.85000 72.6500
## 24 heightCm numeric 139.400 157.20000 164.5000
## 25 waist numeric 68.100 84.50000 92.0000
## 26 tricep numeric 5.100 11.45000 19.2500
## 27 thigh numeric 4.400 10.10000 18.0000
## 28 BMD numeric 0.358 0.80750 0.9010
## 29 RBC numeric 3.430 4.28500 4.5900
## 30 lead numeric 0.700 2.07500 3.0500
## 31 cholesterol integer 122.000 179.25000 206.0000
## 32 triglyceride integer 29.000 86.75000 137.5000
## 33 hdl integer 14.000 41.75000 49.0000
## Q3 max mean sd n missing
## 1 75.2500 100.00 50.5000000 29.0114920 100 0
## 2 1.0000 2.00 1.2200000 0.4163332 100 0
## 3 3.0000 3.00 2.4400000 0.8912595 100 0
## 4 2.0000 2.00 1.5800000 0.4960450 100 0
## 5 70.2500 90.00 51.4300000 21.5285833 100 0
## 6 4.0000 10.00 3.0100000 1.7780650 100 0
## 7 2.0000 2.00 1.5400000 0.5009083 100 0
## 8 3.0000 4.00 2.7400000 0.9600084 100 0
## 9 3.3545 6.77 2.4536395 1.6725237 86 14
## 10 13.0000 17.00 10.8989899 4.0317489 99 1
## 11 4.0000 7.00 2.6500000 2.2036471 100 0
## 12 4.0000 5.00 2.8200000 1.0766578 100 0
## 13 69.0000 76.00 65.7473684 4.4338361 95 5
## 14 180.0000 270.00 160.0736842 37.6004307 95 5
## 15 0.5000 365.00 7.1313131 38.2141246 99 1
## 16 0.0000 61.00 1.2424242 6.4824581 99 1
## 17 0.0000 13.00 0.7070707 2.3571538 99 1
## 18 1.0000 1.00 0.4444444 0.4994328 99 1
## 19 2.0000 2.00 1.7878788 0.4108907 99 1
## 20 3.0000 3.00 2.2040816 0.8243608 98 2
## 21 141.0000 216.00 129.4216867 22.8815191 83 17
## 22 81.0000 118.00 73.1084337 16.6374629 83 17
## 23 82.1500 137.75 73.4323656 17.4040663 93 7
## 24 171.9000 184.10 164.4440860 9.9532981 93 7
## 25 98.7000 129.90 93.3670588 13.1250613 85 15
## 26 26.6000 39.80 19.7340909 9.5348349 88 12
## 27 29.5000 41.80 20.4780822 11.1813491 73 27
## 28 1.0510 1.26 0.9028824 0.1881680 68 32
## 29 4.9850 6.15 4.6184783 0.5013256 92 8
## 30 4.7250 13.30 3.6934783 2.5428549 92 8
## 31 239.5000 370.00 210.0326087 43.8146574 92 8
## 32 192.2500 524.00 152.6304348 92.8197731 92 8
## 33 61.0000 101.00 51.8913043 15.8068399 92 8
You will notice that some variables that should be nominal or ordinal (categorical) are not stored as factors. For these variables, inspect() displays the minimum, 1st quartile, median, 3rd quartile, maximum, mean, and standard deviation instead of what we would expect for a categorical variable, the percentage in each category. We will do two tasks at once: convert a variable to a factor and assign informative labels to its numeric codes, using code similar to the code below.
# Not evaluated
mydata <- mydata %>%
  mutate(
    sex = recode_factor(
      sex,
      `1` = "male",
      `2` = "female"
    )
  )
Before moving on, let’s parse the above code.
mydata <- means that we’re going to assign a new value to the mydata object. Everything in the code after this part creates the value that will be assigned to mydata.
mydata %>% means that we’re going to start with the current mydata object and perform a sequence of one or more actions on it to produce the final value.
mutate() is the action we’re performing. Mutating data adds a variable or replaces an existing one. We’ve split the argument (everything between the parentheses) of this function across multiple lines to make it easier to read.
sex = recode_factor(sex, ...) means that we’re replacing the variable sex with a version of itself that has been recoded as a factor. We’ve also split the argument of recode_factor() across multiple lines for readability.
`1` = "male" means that we’re assigning the label "male" to the code 1, and similarly for `2` = "female". The backticks are needed because a name that starts with a digit, such as 1, is not a valid R name on its own.
In the code chunk below, convert the sex variable in the nhanes dataset to a factor and code it according to the data dictionary.
# Enter code here
nhanes <- nhanes %>%
  mutate(
    sex = recode_factor(
      sex,
      `1` = "male",
      `2` = "female"
    )
  )
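To see exactly what recode_factor() does outside of a data frame, the sketch below applies it to a made-up vector of sex codes. It assumes dplyr is installed. The levels follow the order in which the codes are listed, not the alphabetical order of the labels.

```r
library(dplyr)

# Made-up numeric codes, as they might appear before recoding
sex_codes <- c(2, 2, 1, 2, 1)

sex_labels <- recode_factor(sex_codes, `1` = "male", `2` = "female")

sex_labels              # a factor showing labels instead of codes
levels(sex_labels)      # "male" "female": the order in which the codes were listed
as.integer(sex_labels)  # underlying level positions: 2 2 1 2 1
```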
In the code chunk below, print the nhanes object again, and then use inspect() to see what has changed in the summary.
# Enter code here
print(nhanes)
## # A tibble: 100 × 33
## id race ethnicity sex age familySize urban region
## <int> <int> <int> <fct> <int> <int> <int> <int>
## 1 1 2 3 fema… 56 1 1 2
## 2 2 1 3 fema… 73 1 2 4
## 3 3 1 3 fema… 25 2 1 3
## 4 4 1 1 fema… 53 2 2 3
## 5 5 1 1 fema… 68 2 2 3
## 6 6 1 3 fema… 44 3 2 4
## 7 7 2 3 fema… 28 2 1 3
## 8 8 1 3 male 74 2 2 2
## 9 9 1 3 fema… 65 1 2 1
## 10 10 1 2 fema… 61 3 1 4
## # ℹ 90 more rows
## # ℹ 25 more variables: pir <dbl>, yrsEducation <int>,
## # maritalStatus <int>, healthStatus <int>,
## # heightInSelf <int>, weightLbSelf <int>, beer <int>,
## # wine <int>, liquor <int>, everSmoke <int>,
## # smokeNow <int>, active <int>, SBP <int>, DBP <int>,
## # weightKg <dbl>, heightCm <dbl>, waist <dbl>, …
inspect(nhanes)
##
## categorical variables:
## name class levels n missing
## 1 sex factor 2 100 0
## distribution
## 1 female (58%), male (42%)
##
## quantitative variables:
## name class min Q1 median
## 1 id integer 1.000 25.75000 50.5000
## 2 race integer 1.000 1.00000 1.0000
## 3 ethnicity integer 1.000 1.00000 3.0000
## 4 age integer 17.000 34.75000 48.5000
## 5 familySize integer 1.000 2.00000 2.0000
## 6 urban integer 1.000 1.00000 2.0000
## 7 region integer 1.000 2.00000 3.0000
## 8 pir numeric 0.179 1.17625 1.9145
## 9 yrsEducation integer 0.000 8.00000 12.0000
## 10 maritalStatus integer 1.000 1.00000 1.0000
## 11 healthStatus integer 1.000 2.00000 3.0000
## 12 heightInSelf integer 52.000 62.00000 66.0000
## 13 weightLbSelf integer 82.000 135.00000 160.0000
## 14 beer integer 0.000 0.00000 0.0000
## 15 wine integer 0.000 0.00000 0.0000
## 16 liquor integer 0.000 0.00000 0.0000
## 17 everSmoke integer 0.000 0.00000 0.0000
## 18 smokeNow integer 1.000 2.00000 2.0000
## 19 active integer 1.000 1.25000 2.0000
## 20 SBP integer 94.000 112.00000 126.0000
## 21 DBP integer 0.000 64.00000 74.0000
## 22 weightKg numeric 35.800 63.85000 72.6500
## 23 heightCm numeric 139.400 157.20000 164.5000
## 24 waist numeric 68.100 84.50000 92.0000
## 25 tricep numeric 5.100 11.45000 19.2500
## 26 thigh numeric 4.400 10.10000 18.0000
## 27 BMD numeric 0.358 0.80750 0.9010
## 28 RBC numeric 3.430 4.28500 4.5900
## 29 lead numeric 0.700 2.07500 3.0500
## 30 cholesterol integer 122.000 179.25000 206.0000
## 31 triglyceride integer 29.000 86.75000 137.5000
## 32 hdl integer 14.000 41.75000 49.0000
## Q3 max mean sd n missing
## 1 75.2500 100.00 50.5000000 29.0114920 100 0
## 2 1.0000 2.00 1.2200000 0.4163332 100 0
## 3 3.0000 3.00 2.4400000 0.8912595 100 0
## 4 70.2500 90.00 51.4300000 21.5285833 100 0
## 5 4.0000 10.00 3.0100000 1.7780650 100 0
## 6 2.0000 2.00 1.5400000 0.5009083 100 0
## 7 3.0000 4.00 2.7400000 0.9600084 100 0
## 8 3.3545 6.77 2.4536395 1.6725237 86 14
## 9 13.0000 17.00 10.8989899 4.0317489 99 1
## 10 4.0000 7.00 2.6500000 2.2036471 100 0
## 11 4.0000 5.00 2.8200000 1.0766578 100 0
## 12 69.0000 76.00 65.7473684 4.4338361 95 5
## 13 180.0000 270.00 160.0736842 37.6004307 95 5
## 14 0.5000 365.00 7.1313131 38.2141246 99 1
## 15 0.0000 61.00 1.2424242 6.4824581 99 1
## 16 0.0000 13.00 0.7070707 2.3571538 99 1
## 17 1.0000 1.00 0.4444444 0.4994328 99 1
## 18 2.0000 2.00 1.7878788 0.4108907 99 1
## 19 3.0000 3.00 2.2040816 0.8243608 98 2
## 20 141.0000 216.00 129.4216867 22.8815191 83 17
## 21 81.0000 118.00 73.1084337 16.6374629 83 17
## 22 82.1500 137.75 73.4323656 17.4040663 93 7
## 23 171.9000 184.10 164.4440860 9.9532981 93 7
## 24 98.7000 129.90 93.3670588 13.1250613 85 15
## 25 26.6000 39.80 19.7340909 9.5348349 88 12
## 26 29.5000 41.80 20.4780822 11.1813491 73 27
## 27 1.0510 1.26 0.9028824 0.1881680 68 32
## 28 4.9850 6.15 4.6184783 0.5013256 92 8
## 29 4.7250 13.30 3.6934783 2.5428549 92 8
## 30 239.5000 370.00 210.0326087 43.8146574 92 8
## 31 192.2500 524.00 152.6304348 92.8197731 92 8
## 32 61.0000 101.00 51.8913043 15.8068399 92 8
We can alter several variables at once using the mutate() function. For example,
# Not evaluated
nhanes <- nhanes %>% mutate(
  race = recode_factor(
    race,
    `1` = "white",
    `2` = "black",
    `3` = "other"
  ),
  ethnicity = recode_factor(
    ethnicity,
    `1` = "mexican-american",
    `2` = "other hispanic",
    `3` = "not hispanic"
  ),
  urban = recode_factor(
    urban,
    `1` = "metro area of 1 million",
    `2` = "other"
  )
)
alters the race, ethnicity, and urban variables all at once.
In the code chunk below, convert all nominal variables (including the three above) to factors using mutate() and recode_factor(). Read the data dictionary carefully, as some variable codings are not consistent (e.g., everSmoke and smokeNow).
# Enter code here
nhanes <- nhanes %>% mutate(
  race = recode_factor(
    race,
    `1` = "white",
    `2` = "black",
    `3` = "other"
  ),
  ethnicity = recode_factor(
    ethnicity,
    `1` = "mexican-american",
    `2` = "other hispanic",
    `3` = "not hispanic"
  ),
  urban = recode_factor(
    urban,
    `1` = "metro area of 1 million",
    `2` = "other"
  ),
  everSmoke = recode_factor(
    everSmoke,
    `1` = "yes",
    `2` = "no"
  ),
  smokeNow = recode_factor(
    smokeNow,
    `1` = "every day",
    `2` = "some days",
    `3` = "not at all"
  ),
  sex = recode_factor(
    sex,
    `1` = "male",
    `2` = "female"
  ),
  maritalStatus = recode_factor(
    maritalStatus,
    `1` = "married",
    `2` = "widowed",
    `3` = "divorced",
    `4` = "separated",
    `5` = "never married",
    `6` = "living with partner"
  )
)
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `everSmoke = recode_factor(everSmoke, `1` =
## "yes", `2` = "no")`.
## Caused by warning:
## ! Unreplaced values treated as NA as `.x` is not
## compatible.
## Please specify replacements exhaustively or supply
## `.default`.
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining
## warning.
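The warning above means that some values in the data had no matching code in recode_factor(), and those values were turned into NA. For example, the earlier inspect() output shows that everSmoke ranges from 0 to 1, so a recode that lists only `1` and `2` leaves every 0 unreplaced. The sketch below reproduces this on made-up values, assuming dplyr is installed. The meanings given to 0 and 1 in the second call are assumptions for illustration only. For the real variable, use the labels in the data dictionary.

```r
library(dplyr)

# Made-up codes: 0 occurs in the data but is not listed in the first recode
ever_smoke <- c(0, 1, 1, 0)

# Listing only `1` and `2` leaves the 0s unreplaced, so they become NA
# (dplyr warns about this; the warning is suppressed here)
partial <- suppressWarnings(
  recode_factor(ever_smoke, `1` = "yes", `2` = "no")
)
partial   # NA yes yes NA

# Listing every code that actually occurs avoids the NAs (labels are hypothetical)
complete <- recode_factor(ever_smoke, `0` = "no", `1` = "yes")
complete  # no yes yes no
```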
The variable healthStatus is an ordinal variable, so we want to tell R that it is an ordered factor. Create ordered value labels for the variable healthStatus using the .ordered = TRUE option, as in the following code:
# Not evaluated
mydata <- mydata %>% mutate(
  smokingCausesCancer = recode_factor(
    smokingCausesCancer,
    `1` = "strongly disagree",
    `2` = "disagree",
    `3` = "unsure",
    `4` = "agree",
    `5` = "strongly agree",
    .ordered = TRUE
  )
)
In the code chunk below, convert the variable healthStatus to an ordinal variable with an appropriate ordering.
# Enter code here
nhanes <- nhanes %>% mutate(
  healthStatus = recode_factor(
    healthStatus,
    `1` = "poor",
    `2` = "fair",
    `3` = "good",
    `4` = "very good",
    `5` = "excellent",
    .ordered = TRUE
  )
)
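Because healthStatus is now an ordered factor, R can compare and sort its values by level order rather than alphabetically. The sketch below shows this on made-up codes, using the same code-to-label mapping as the chunk above. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up health status codes (1 = poor ... 5 = excellent, as in the chunk above)
health_codes <- c(4, 2, 5, 3, 1)

health <- recode_factor(
  health_codes,
  `1` = "poor", `2` = "fair", `3` = "good", `4` = "very good", `5` = "excellent",
  .ordered = TRUE
)

is.ordered(health)  # TRUE
levels(health)      # "poor" "fair" "good" "very good" "excellent"
health >= "good"    # comparisons follow the level order
sort(health)        # sorted from poor to excellent, not alphabetically
```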
In the code chunk below, use the select() function as you did earlier to select and print just the variables id, race, and healthStatus.
# Enter code here
nhanes %>% select(id, race, healthStatus)
## # A tibble: 100 × 3
## id race healthStatus
## <int> <fct> <ord>
## 1 1 black very good
## 2 2 white very good
## 3 3 white good
## 4 4 white excellent
## 5 5 white very good
## 6 6 white fair
## 7 7 black good
## 8 8 white good
## 9 9 white good
## 10 10 white good
## # ℹ 90 more rows
Note the variable types <fct> for “factor” and <ord> for “ordered factor” (an ordinal variable).
To look closely at a single variable, we can use the pull() function, as in
# Not evaluated
mydata %>% pull(smokingCausesCancer)
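select() and pull() differ in the kind of object they return. select() keeps a tibble, while pull() extracts a single column as a plain vector. The sketch below uses a made-up tibble containing the hypothetical smokingCausesCancer variable from the example above. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up data with the hypothetical smokingCausesCancer variable
mydata <- tibble(
  id = 1:4,
  smokingCausesCancer = factor(c("agree", "unsure", "agree", "strongly agree"))
)

as_column <- mydata %>% select(smokingCausesCancer)  # still a tibble (one column)
as_vector <- mydata %>% pull(smokingCausesCancer)    # just the values

class(as_column)   # "tbl_df" "tbl" "data.frame"
class(as_vector)   # "factor"
length(as_vector)  # 4
```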
In the code chunk below, use the pull() function to print just the race variable, and in a separate command, print just the healthStatus variable.
# Enter code here
nhanes %>% pull(race)
## [1] black white white white white white black white white
## [10] white white white white white white black white white
## [19] white white white white white white white white white
## [28] white white black white white black white white white
## [37] white white white black white white white white white
## [46] white white white black black white white white white
## [55] white black black black black white white white black
## [64] white white white white white white black black white
## [73] white white white black white white black white white
## [82] white white white black white white white black black
## [91] white white white white black white black white white
## [100] white
## Levels: white black
nhanes %>% pull(healthStatus)
## [1] very good very good good excellent very good
## [6] fair good good good good
## [11] good excellent fair fair very good
## [16] fair good fair very good very good
## [21] fair very good fair good fair
## [26] fair fair very good good very good
## [31] good good very good poor excellent
## [36] very good good good very good good
## [41] fair very good good poor good
## [46] good good poor good very good
## [51] fair good poor poor good
## [56] fair good poor poor good
## [61] fair good good good poor
## [66] very good fair good good excellent
## [71] good very good good fair very good
## [76] fair fair very good fair good
## [81] excellent fair fair poor good
## [86] very good very good poor good poor
## [91] fair good poor good poor
## [96] very good very good good fair fair
## Levels: poor < fair < good < very good < excellent
Note that the order of the levels of healthStatus is the order in which they were listed in recode_factor(), not the order of the old numeric codes.
Look at the codes (i.e., the labels for the numeric values) for the variable active by consulting the data dictionary (the Excel file you downloaded earlier, nhanes_dataDictionary.xlsx; see column E). Then, in the code chunk below, reorder the response options so that they appear in a logical order, if necessary. Note that you are not required to change the original codes (i.e., the labels for the numeric values) for the variable. Instead, reorder the numeric values so that their labels from the data dictionary appear in a logical order.
# Enter code here
nhanes$active <- factor(nhanes$active,
levels = c(2, 3, 1),
labels = c("Less Active", "About the Same", "More Active"))
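The chunk above uses base R's factor(). Its levels argument lists the numeric codes in the desired display order, and labels gives the text for each code in that same order. The sketch below does the same reordering with recode_factor(), the function used earlier in this lab, by listing the codes in the desired order. The code-to-label mapping is copied from the chunk above. The data values are made up, and dplyr is assumed to be installed.

```r
library(dplyr)

active_codes <- c(1, 3, 2, 2, 1)  # made-up codes

# Base R: levels gives the codes in display order, labels the matching text
base_version <- factor(active_codes,
                       levels = c(2, 3, 1),
                       labels = c("Less Active", "About the Same", "More Active"))

# dplyr: recode_factor() orders the levels as the replacements are listed
dplyr_version <- recode_factor(active_codes,
                               `2` = "Less Active",
                               `3` = "About the Same",
                               `1` = "More Active")

levels(dplyr_version)  # "Less Active" "About the Same" "More Active"
```

Either approach works. recode_factor() keeps each code next to its label, which can make the mapping easier to check against the data dictionary.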
You can display a specific row of the nhanes dataset by extracting the row that corresponds to a unique patient with the filter() function (note the double equals sign, ==). Here we use two “pipes” (%>%) to perform two actions in sequence: filter just the rows with id == 10, then select a few columns or variables to display. For example:
# Not evaluated
nhanes %>%
  filter(id == 10) %>%
  select(id, race, ethnicity, sex, age)
prints the variables id, race, ethnicity, sex, and age for the subject with id = 10.
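Here is a sketch of the same filter-then-select pattern on a small made-up tibble (dplyr assumed installed). Writing filter(id = 3) with a single equals sign is an error in dplyr, because = names an argument while == tests equality.

```r
library(dplyr)

# Made-up rows standing in for the nhanes data
mydata <- tibble(
  id  = 1:4,
  sex = factor(c("female", "male", "female", "male")),
  age = c(56L, 73L, 25L, 53L)
)

one_row <- mydata %>%
  filter(id == 3) %>%  # keep only the row(s) where id equals 3
  select(id, age)      # then keep only these columns

one_row
```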
In the code chunk below, print the variables id, race, ethnicity, sex, age, and healthStatus for the subject with id = 2.
# Enter code here
nhanes %>%
  filter(id == 2) %>%
  select(id, race, ethnicity, sex, age, healthStatus)
## # A tibble: 1 × 6
## id race ethnicity sex age healthStatus
## <int> <fct> <fct> <fct> <int> <ord>
## 1 2 white not hispanic female 73 very good
To save all the work you did on this dataset (such as setting variable types and value labels), save the dataset as an R object using the following command:
# Not evaluated
save(nhanes, file = "nhanes.RData")
In the code chunk below, use the exact command above to save the modified dataset. Open the folder to make sure the file has been saved.
# Enter code here
save(nhanes, file = "nhanes.RData")
The next time you want to use this R object, set your working directory to the folder where the object is stored and then use the following command:
# Not evaluated
load("nhanes.RData")
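The sketch below runs the full save-and-load round trip, using a made-up data frame in place of the modified nhanes object. It writes to a temporary file from tempfile() rather than to your lab folder, so running it leaves nothing behind. In the lab, you would use the file name nhanes.RData as shown above.

```r
# Made-up data frame standing in for the modified nhanes object
nhanes <- data.frame(id = 1:3, sex = factor(c("female", "male", "female")))

# tempfile() gives a throwaway path so this example leaves no file behind
path <- tempfile(fileext = ".RData")
save(nhanes, file = path)

rm(nhanes)                  # remove the object from the session
loaded_names <- load(path)  # load() restores it and returns the object names
loaded_names                # "nhanes"
levels(nhanes$sex)          # factor levels survive the round trip: "female" "male"

unlink(path)                # delete the temporary file
```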
Please turn in your completed worksheet (DOCX, i.e., Word document), your RMD file, and your updated HTML file to Carmen by the due date. Make sure to upload all three (3) files before you click “Submit Assignment” to complete your submission.