- Set up R and RStudio
- Understand the layout and functionality of RStudio
- Experiment with basic R expressions and data types
- Advanced data types
- Data manipulation
R is a programming environment
Step by step installation guides from YouTube:
RStudio is a development environment for R, and provides many advanced features to improve efficiency and ease of use for R users.
This is the most important panel, because this is where R actually does stuff
Collections of commands (scripts) can be edited and saved.
There are ways to speed up the workflow:
If you don’t select any code, R will just execute the line where the blinking cursor is
Instead of clicking the “Run” icon, you can just use the keyboard shortcut: Ctrl + Enter
7 + 5
## [1] 12
7 - 5
## [1] 2
7 * 5
## [1] 35
7 / 5
## [1] 1.4
Like a calculator, R also has many functions that let you do more sophisticated manipulations.
round(2.05)
## [1] 2
factorial(3) # 3! = 3 * 2 * 1
## [1] 6
sqrt(9) # square root
## [1] 3
Note:
#
(number sign) to comment your codes#
There will be many occasions where you want to learn more about a built-in command or function. Type help(function_name)
or ?function_name
to get more information. For example:
help(factorial) ?factorial
Use two question marks to search the whole help database, especially when you don’t know exactly the function name. For example,
??read
R can recognize different types of data:
Any number, no quotes.
Appropriate for math.
1 + 1 200000 sqrt(9)
class(0.3) # "class" is function that shows the data type of an input
## [1] "numeric"
Any symbols surrounded by single quotes (’) or double quotes (“)
class("Business Intelligence")
## [1] "character"
nchar('Business Intelligence')
## [1] 21
toupper("Business Intelligence")
## [1] "BUSINESS INTELLIGENCE"
paste("Business", "Intelligence", sep="_")
## [1] "Business_Intelligence"
How many characters are in the following strings:
123
Email
meiling_jiang@baylor.edu
Use paste to join the following words so that the result looks like How#are#you?
(hint: ?paste)
We can give names to data objects; these give us variables
Variables are created with the assignment operator, <-
or =
Be careful that R is a case sensitive language. FOO, Foo, and foo are three different variables!
x = 2 # use the equal sign to assign value y <- 3 # you can also use an arrow to assign value x # print the value of a variable by typing its name
## [1] 2
x * y
## [1] 6
The assignment operator also changes values:
x
## [1] 2
x <- 8 x
## [1] 8
Using names and variables makes code: easier to design, easier to debug, less prone to bugs, easier to improve, and easier for others to read
Variable names cannot begin with numbers. Wise to avoid special characters, except for period (.
) and underline (_
)
Example of valid names:
Example of invalid names:
A data frame is a set of vectors of equal length. Consider data frame as an Excel sheet or a database table.
Column names are preserved or guessed if not explicitly set
course <- c("MIS4730", "MIS4710", "MIS4950", "MIS1234") num_of_students <- c(20, 10, 40, 30) data_analytics_minor <- c(TRUE, TRUE, TRUE, FALSE) df <- data.frame(course, n_students=num_of_students, data_analytics_minor, stringsAsFactors=F) df # notice the column names and row names
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE ## 2 MIS4710 10 TRUE ## 3 MIS4950 40 TRUE ## 4 MIS1234 30 FALSE
ncol(df) # number of columns
## [1] 3
nrow(df) # number of rows
## [1] 4
colnames(df) # get column names
## [1] "course" "n_students" "data_analytics_minor"
rownames(df) # get row names
## [1] "1" "2" "3" "4"
You can change column and row names:
df2 <- df # create a copy of df, and name it as "df2" colnames(df2) <- c("col1", "col2", "col3") # assign column names colnames(df2) # they were "course", "n_students", "ba_minor_course"
## [1] "col1" "col2" "col3"
rownames(df2) <- c("row1", "row2", "row3", "row4") # assign row names rownames(df2) # they were "1", "2", "3", "4"
## [1] "row1" "row2" "row3" "row4"
There are many ways you can get values out of a column:
dataframe_name$column_name
df$course
## [1] "MIS4730" "MIS4710" "MIS4950" "MIS1234"
df$n_students
## [1] 20 10 40 30
df
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE ## 2 MIS4710 10 TRUE ## 3 MIS4950 40 TRUE ## 4 MIS1234 30 FALSE
All of a row
df[2,] # row 2
## course n_students data_analytics_minor ## 2 MIS4710 10 TRUE
df
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE ## 2 MIS4710 10 TRUE ## 3 MIS4950 40 TRUE ## 4 MIS1234 30 FALSE
Multiple rows
df[c(1,3),] # rows 1 & 3
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE ## 3 MIS4950 40 TRUE
df[2,1] # row 2, column 1
## [1] "MIS4710"
df[c(3,4),c(1,2)] # rows 3 & 4, columns 1 & 2
## course n_students ## 3 MIS4950 40 ## 4 MIS1234 30
df["2","n_students"] # "2": row name, "n_students": column name
## [1] 10
Rows matching a condition:
df
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE ## 2 MIS4710 10 TRUE ## 3 MIS4950 40 TRUE ## 4 MIS1234 30 FALSE
# Be careful that it's df$course in the brackets, not just course df[df$course == 'MIS4730', ]
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE
df
## course n_students data_analytics_minor ## 1 MIS4730 20 TRUE ## 2 MIS4710 10 TRUE ## 3 MIS4950 40 TRUE ## 4 MIS1234 30 FALSE
Retrieve Course MIS4950’s number of students from the data frame (using the data frame name and column name)
#Please do not use: df[3,2] or df$n_student[3]
Importing data into R is fairly simple. We can use built-in functions or libraries to read data from the following sources:
You can go to Canvas - Dataset Module to download the following data.
Your working directory is the folder on your computer in which you are currently working.
# Show your current working directory getwd() # List the files and folders in the current working directory list.files()
You can set your working directory in the following ways:
You can set your working directory in the following ways:
You can set your working directory in the following ways:
Create a new folder BI-Dataset
for this course; Download the two HousePrices data files and move them into the folder
Set this new folder as your working directory
Run list.files()
in the R console. Do you see these files?
Important: From this point forward and throughout the rest of the semester, I will assume that you are in the BI-Dataset
project when you are developing R code in RStudio.
You can import your dataset in the following ways:
Import the HousePrices.csv file to R
What the data looks like?
head()
is a function allows you to see the top few rows of the data frame
HousePrices <- read.csv("~/Downloads/HousePrices.csv") head(HousePrices, n=3) # n indicates how many rows you'd like to see
## X price lot_size waterfront age land_value construction air_cond fuel ## 1 1 132500 0.09 No 42 50000 No No Electric ## 2 2 181115 0.92 No 0 22300 No No Gas ## 3 3 109000 0.19 No 133 7300 No No Gas ## heat sewer living_area fireplaces bathrooms rooms ## 1 Electric Private 906 1 1.0 5 ## 2 Hot Water Private 1953 0 2.5 6 ## 3 Hot Water Public 1944 1 1.0 8
To read Excel data, we need to use a library: readxl
First, you need to install the package on your computer.
install.packages("readxl") # R will then go online and donwload the libray
library("readxl") # You need to load the library into R before you can use it. df <- read_excel("HousePrices.xlsx", col_names = TRUE) head(df, n=3)
## # A tibble: 3 × 15 ## ...1 price lot_size waterfront age land_value construction air_cond fuel ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> ## 1 1 132500 0.09 No 42 50000 No No Elect… ## 2 2 181115 0.92 No 0 22300 No No Gas ## 3 3 109000 0.19 No 133 7300 No No Gas ## # ℹ 6 more variables: heat <chr>, sewer <chr>, living_area <dbl>, ## # fireplaces <dbl>, bathrooms <dbl>, rooms <dbl>
file_path <- "my_test.csv" write.csv(df, file=file_path, row.names = FALSE) # row.names are usually not very useful. So I like to exclude them. file_path <- "my_test2.csv" write.table(df, file_path, sep=",", row.names = FALSE)
str(df)
## tibble [1,728 × 15] (S3: tbl_df/tbl/data.frame) ## $ ...1 : num [1:1728] 1 2 3 4 5 6 7 8 9 10 ... ## $ price : num [1:1728] 132500 181115 109000 155000 86060 ... ## $ lot_size : num [1:1728] 0.09 0.92 0.19 0.41 0.11 0.68 0.4 1.21 0.83 1.94 ... ## $ waterfront : chr [1:1728] "No" "No" "No" "No" ... ## $ age : num [1:1728] 42 0 133 13 0 31 33 23 36 4 ... ## $ land_value : num [1:1728] 50000 22300 7300 18700 15000 14000 23300 14600 22200 21200 ... ## $ construction: chr [1:1728] "No" "No" "No" "No" ... ## $ air_cond : chr [1:1728] "No" "No" "No" "No" ... ## $ fuel : chr [1:1728] "Electric" "Gas" "Gas" "Gas" ... ## $ heat : chr [1:1728] "Electric" "Hot Water" "Hot Water" "Hot Air" ... ## $ sewer : chr [1:1728] "Private" "Private" "Public" "Private" ... ## $ living_area : num [1:1728] 906 1953 1944 1944 840 ... ## $ fireplaces : num [1:1728] 1 0 1 1 0 1 1 1 0 0 ... ## $ bathrooms : num [1:1728] 1 2.5 1 1.5 1 1 1.5 1.5 1.5 1.5 ... ## $ rooms : num [1:1728] 5 6 8 5 3 8 8 9 8 6 ...
summary(df)
## ...1 price lot_size waterfront ## Min. : 1.0 Min. : 5000 Min. : 0.0000 Length:1728 ## 1st Qu.: 432.8 1st Qu.:145000 1st Qu.: 0.1700 Class :character ## Median : 864.5 Median :189900 Median : 0.3700 Mode :character ## Mean : 864.5 Mean :211967 Mean : 0.5002 ## 3rd Qu.:1296.2 3rd Qu.:259000 3rd Qu.: 0.5400 ## Max. :1728.0 Max. :775000 Max. :12.2000 ## age land_value construction air_cond ## Min. : 0.00 Min. : 200 Length:1728 Length:1728 ## 1st Qu.: 13.00 1st Qu.: 15100 Class :character Class :character ## Median : 19.00 Median : 25000 Mode :character Mode :character ## Mean : 27.92 Mean : 34557 ## 3rd Qu.: 34.00 3rd Qu.: 40200 ## Max. :225.00 Max. :412600 ## fuel heat sewer living_area ## Length:1728 Length:1728 Length:1728 Min. : 616 ## Class :character Class :character Class :character 1st Qu.:1300 ## Mode :character Mode :character Mode :character Median :1634 ## Mean :1755 ## 3rd Qu.:2138 ## Max. :5228 ## fireplaces bathrooms rooms ## Min. :0.0000 Min. :0.0 Min. : 2.000 ## 1st Qu.:0.0000 1st Qu.:1.5 1st Qu.: 5.000 ## Median :1.0000 Median :2.0 Median : 7.000 ## Mean :0.6019 Mean :1.9 Mean : 7.042 ## 3rd Qu.:1.0000 3rd Qu.:2.5 3rd Qu.: 8.250 ## Max. :4.0000 Max. :4.5 Max. :12.000
head(df, n=2)
## # A tibble: 2 × 15 ## ...1 price lot_size waterfront age land_value construction air_cond fuel ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> ## 1 1 132500 0.09 No 42 50000 No No Elect… ## 2 2 181115 0.92 No 0 22300 No No Gas ## # ℹ 6 more variables: heat <chr>, sewer <chr>, living_area <dbl>, ## # fireplaces <dbl>, bathrooms <dbl>, rooms <dbl>
df[1:2,] # df[ rows_you_want, columns_you_want ]
## # A tibble: 2 × 15 ## ...1 price lot_size waterfront age land_value construction air_cond fuel ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> ## 1 1 132500 0.09 No 42 50000 No No Elect… ## 2 2 181115 0.92 No 0 22300 No No Gas ## # ℹ 6 more variables: heat <chr>, sewer <chr>, living_area <dbl>, ## # fireplaces <dbl>, bathrooms <dbl>, rooms <dbl>
tail(df, n=2)
## # A tibble: 2 × 15 ## ...1 price lot_size waterfront age land_value construction air_cond fuel ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> ## 1 1727 125000 0.24 No 48 16800 No No Gas ## 2 1728 111300 0.59 No 86 26000 No No Gas ## # ℹ 6 more variables: heat <chr>, sewer <chr>, living_area <dbl>, ## # fireplaces <dbl>, bathrooms <dbl>, rooms <dbl>
install.packages("tidyverse")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ── ## ✔ dplyr 1.1.3 ✔ readr 2.1.4 ## ✔ forcats 1.0.0 ✔ stringr 1.5.0 ## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1 ## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0 ## ✔ purrr 1.0.2 ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() ## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
read.csv()
vs. read_csv()
First, let’s use the usual way to read csv data into R (make sure that you are using the right working directory).
df_original <- read.csv(file="HousePrices.csv", header=TRUE, stringsAsFactors=FALSE) class(df_original) # this is a data frame
## [1] "data.frame"
We can also use read_csv()
from the readr package in the tidyverse to read the same csv file:
df <- read_csv(file="HousePrices.csv") class(df) # this is a tibble (tbl_df)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
If you haven’t done this already, download HousePrices.csv and put it into your working directory.
Run the following code:
install.packages("tidyverse") library(tidyverse) df_original <- read.csv(file="HousePrices.csv", header=TRUE, stringsAsFactors=FALSE) df <- read_csv(file="HousePrices.csv")
And then print these two data sets by entering their names to the console. What are your observations?
df_original df
dplyr provides the following verbs for data manipulation.
We will be using HousePrices.csv throughout this lab. Please download from Canvas (Sample Dataset Module) and put the file in your working directory. (Use getwd
and setwd
to make sure you have a correct working directory.)
select()
: Pick columns by nameselect(df, c(price, lot_size)) # equivalent to df[, c("price", "lotsize")]
## # A tibble: 1,728 × 2 ## price lot_size ## <dbl> <dbl> ## 1 132500 0.09 ## 2 181115 0.92 ## 3 109000 0.19 ## 4 155000 0.41 ## 5 86060 0.11 ## 6 120000 0.68 ## 7 153000 0.4 ## 8 170000 1.21 ## 9 90000 0.83 ## 10 122900 1.94 ## # ℹ 1,718 more rows
Pipe operator %>%
makes the code much readable. Essentially, you send the data through a set of operations and the operations are connected by a pipe.
Note: You must have a space before and after %>%
df %>% select(,c(price, lot_size)) #or more easily: df %>% select(c(price, lot_size))
## # A tibble: 1,728 × 2 ## price lot_size ## <dbl> <dbl> ## 1 132500 0.09 ## 2 181115 0.92 ## 3 109000 0.19 ## 4 155000 0.41 ## 5 86060 0.11 ## 6 120000 0.68 ## 7 153000 0.4 ## 8 170000 1.21 ## 9 90000 0.83 ## 10 122900 1.94 ## # ℹ 1,718 more rows
df %>% select(price:heat) # all columns between price and fuel
## # A tibble: 1,728 × 9 ## price lot_size waterfront age land_value construction air_cond fuel heat ## <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> ## 1 132500 0.09 No 42 50000 No No Elec… Elec… ## 2 181115 0.92 No 0 22300 No No Gas Hot … ## 3 109000 0.19 No 133 7300 No No Gas Hot … ## 4 155000 0.41 No 13 18700 No No Gas Hot … ## 5 86060 0.11 No 0 15000 Yes Yes Gas Hot … ## 6 120000 0.68 No 31 14000 No No Gas Hot … ## 7 153000 0.4 No 33 23300 No No Oil Hot … ## 8 170000 1.21 No 23 14600 No No Oil Hot … ## 9 90000 0.83 No 36 22200 No No Elec… Elec… ## 10 122900 1.94 No 4 21200 No No Gas Hot … ## # ℹ 1,718 more rows
# select columns that contain "room" in their column name df %>% select(contains("room"))
## # A tibble: 1,728 × 2 ## bathrooms rooms ## <dbl> <dbl> ## 1 1 5 ## 2 2.5 6 ## 3 1 8 ## 4 1.5 5 ## 5 1 3 ## 6 1 8 ## 7 1.5 8 ## 8 1.5 9 ## 9 1.5 8 ## 10 1.5 6 ## # ℹ 1,718 more rows
Select columns age and land_value from df
filter()
: Keep rows that match criteriadf %>% filter(price < 200000, fuel == "Electric")
## # A tibble: 246 × 15 ## ...1 price lot_size waterfront age land_value construction air_cond fuel ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> ## 1 1 132500 0.09 No 42 50000 No No Elec… ## 2 9 90000 0.83 No 36 22200 No No Elec… ## 3 13 85860 8.97 No 13 4800 No No Elec… ## 4 21 112000 1 No 12 8600 No No Elec… ## 5 22 104900 0.43 No 21 5600 No No Elec… ## 6 25 90400 0.36 No 16 5200 No No Elec… ## 7 60 175000 0.47 No 15 27200 No No Elec… ## 8 65 171000 3.16 No 15 24100 No No Elec… ## 9 69 182000 1 No 16 26600 No No Elec… ## 10 81 114000 0.06 No 14 1600 No No Elec… ## # ℹ 236 more rows ## # ℹ 6 more variables: heat <chr>, sewer <chr>, living_area <dbl>, ## # fireplaces <dbl>, bathrooms <dbl>, rooms <dbl>
df %>% filter(price < 200000, fuel == "Electric") %>% select(price, fuel, air_cond)
## # A tibble: 246 × 3 ## price fuel air_cond ## <dbl> <chr> <chr> ## 1 132500 Electric No ## 2 90000 Electric No ## 3 85860 Electric No ## 4 112000 Electric No ## 5 104900 Electric No ## 6 90400 Electric No ## 7 175000 Electric No ## 8 171000 Electric No ## 9 182000 Electric No ## 10 114000 Electric No ## # ℹ 236 more rows
Note: In the interest of readability, put different operations in separate lines.
After the operations, you may want to save the result as a new data frame. You can then use the new data frame for other analyses.
new_df <- df %>% filter(price < 200000, fuel == "Electric") %>% select(price, fuel, air_cond) nrow(new_df)
## [1] 246
Use pipe %>%
to chain the following two operations
arrange()
: Reorder rowsdf %>% select(price, air_cond, rooms) %>% arrange(price)
## # A tibble: 1,728 × 3 ## price air_cond rooms ## <dbl> <chr> <dbl> ## 1 5000 Yes 6 ## 2 10300 No 4 ## 3 10300 No 6 ## 4 20000 No 4 ## 5 25000 No 6 ## 6 45000 No 4 ## 7 49387 No 5 ## 8 58500 No 5 ## 9 60000 No 6 ## 10 60000 No 7 ## # ℹ 1,718 more rows
use desc()
for a descending order
df %>% select(price, air_cond, rooms) %>% arrange(desc(price))
## # A tibble: 1,728 × 3 ## price air_cond rooms ## <dbl> <chr> <dbl> ## 1 775000 Yes 9 ## 2 775000 Yes 12 ## 3 760000 Yes 12 ## 4 725000 Yes 10 ## 5 670000 No 8 ## 6 658300 Yes 11 ## 7 655000 No 6 ## 8 650000 Yes 10 ## 9 650000 Yes 12 ## 10 649000 Yes 12 ## # ℹ 1,718 more rows
mutate()
: Add new variablesCreate new variables that are functions of existing variables. (But this won’t be saved to original df)
df %>% mutate(bedrooms = rooms - bathrooms) %>% filter(bedrooms>3) %>% select(bedrooms, bathrooms, rooms)
## # A tibble: 1,381 × 3 ## bedrooms bathrooms rooms ## <dbl> <dbl> <dbl> ## 1 4 1 5 ## 2 3.5 2.5 6 ## 3 7 1 8 ## 4 3.5 1.5 5 ## 5 7 1 8 ## 6 6.5 1.5 8 ## 7 7.5 1.5 9 ## 8 6.5 1.5 8 ## 9 4.5 1.5 6 ## 10 11 1 12 ## # ℹ 1,371 more rows
mutate()
: Add new variablesIf we want to save to the original df
#Option 1 df<- df %>% mutate(bedrooms = rooms - bathrooms) #Option 2 df$bedrooms = df$rooms - df$bathrooms
If we want to use mutate to create a new column, expensive_house
, which equals “yes” if the house price
is greater than 150000 and “no” otherwise.
price
and expensive_house
columns and arrange the rows in the descending order of price
.df %>% mutate(expensive_house = ifelse(price>150000, "yes", "no")) %>% arrange(desc(price)) %>% select(price, expensive_house)
## # A tibble: 1,728 × 2 ## price expensive_house ## <dbl> <chr> ## 1 775000 yes ## 2 775000 yes ## 3 760000 yes ## 4 725000 yes ## 5 670000 yes ## 6 658300 yes ## 7 655000 yes ## 8 650000 yes ## 9 650000 yes ## 10 649000 yes ## # ℹ 1,718 more rows
Use mutate to create a new column, old_house
, which equals “TRUE” if the house age
is greater than 40 years and “False” otherwise.
After that, select the price
, age
and old_house
columns and arrange the rows in the descending order of price
.
## # A tibble: 1,728 × 3 ## price age old_house ## <dbl> <dbl> <chr> ## 1 775000 5 FALSE ## 2 775000 31 FALSE ## 3 760000 2 FALSE ## 4 725000 3 FALSE ## 5 670000 121 TRUE ## 6 658300 3 FALSE ## 7 655000 55 TRUE ## 8 650000 2 FALSE ## 9 650000 3 FALSE ## 10 649000 10 FALSE ## # ℹ 1,718 more rows
summarise()
: Reduce variables to valuesgroup_by()
creates the groups that will be operated onsummarise()
uses the provided aggregation function to summarise each groupdf %>% group_by(air_cond) %>% summarise(avg_price = mean(price))
## # A tibble: 2 × 2 ## air_cond avg_price ## <chr> <dbl> ## 1 No 187022. ## 2 Yes 254904.
You can have multiple summary/aggregate statistics:
df %>% group_by(air_cond) %>% summarise(n_house = n(), # n() gives the number of rows in each group avg_price = mean(price))
## # A tibble: 2 × 3 ## air_cond n_house avg_price ## <chr> <int> <dbl> ## 1 No 1093 187022. ## 2 Yes 635 254904.
Some houses have a drive way, some don’t. Use group_by
and summarise
to obtain the following:
## # A tibble: 2 × 4 ## construction n_house min_lotsize max_lotsize ## <chr> <int> <dbl> <dbl> ## 1 No 1647 0 12.2 ## 2 Yes 81 0.01 3.47
Once you finish the following tasks, please put everything in one single R file with the file name assignment1.R
(.R is the file extension) and upload it to Canvas.
In addition, lab assignments will be graded based on:
## name female age ## row_1 Alex FALSE 20 ## row_2 Bob FALSE 25 ## row_3 Claire TRUE 30 ## row_4 Denise TRUE 35
Use the HousePrices.csv dataset and import it as a tibble with read_csv function.
Show price
, air_cond
, heat
, fireplaces
for houses that have no fireplaces
First create a new variable bedroom
using rooms
minusbathrooms
. Then create a new variable price_per_bedroom
which is price divided by the number of bedrooms. Show only price
, bedrooms
, and price_per_bedroom
columns and arrange the rows in the descending order of price_per_bedroom
Create a new variable has_4_or_more_bedrooms
which is TRUE
if the house has 4 or more bedrooms and FALSE
otherwise. Use this variable and group_by
and summarise()
to find how many houses have 4 or more bedrooms and how many don’t