Summary Case Study
This pack contains the case study, including the R Markdown code and the
analysis results.
The pack can be accessed via an HTML link: <?????>.
Load the required libraries
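The chunk that loads the packages is not shown in this rendering. Judging from the functions called later in the document (mutate, across, distinct, str_squish), a minimal sketch of it might look like the following; the exact package list is an assumption.

```r
# Assumed package list, inferred from the functions used later in this document.
library(dplyr)    # mutate(), across(), distinct()
library(stringr)  # str_squish()
```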
Create R Project
knitr::include_graphics("C:\\Users\\bewad\\OneDrive\\Desktop\\case study\\case study - folder structure.png")
An R Project (Rproj) is a directory that RStudio recognizes as
a project home. There are many advantages to managing a project with
an “Rproj”.
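One such advantage is that RStudio sets the working directory to the project root when the project is opened, so files can be referenced with relative paths instead of machine-specific absolute ones. A minimal sketch, assuming a hypothetical folder layout and file name (data/raw/example.csv) that are not taken from this case study:

```r
# Hypothetical layout: <project root>/data/raw/example.csv
# file.path() joins the components with a forward slash, which R accepts
# on Windows, macOS, and Linux.
raw_file <- file.path("data", "raw", "example.csv")

# The path is relative, so it resolves against the current working
# directory, which is the project root when the .Rproj file is open.
print(raw_file)
print(file.exists(raw_file))
```

Because nothing in the path depends on a user name or drive letter, the same code can run unchanged on another machine that has a copy of the project folder.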
Work with Raw Data
Load the raw data
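The loading chunk is not shown in this rendering. The summary later in this document lists hotel-review columns and order columns in the same data frame, which suggests that several raw files were stacked into one `raw_data` object. As a hedged sketch only, assuming the raw files are CSVs in a single folder (the folder, file names, and values below are hypothetical stand-ins created in a temporary directory):

```r
library(dplyr)

# Hypothetical stand-in for the raw data folder: two small CSVs in a temp dir.
raw_dir <- file.path(tempdir(), "raw_demo")
dir.create(raw_dir, showWarnings = FALSE)
write.csv(data.frame(hotel_name = c("A", "B"), reviewer_score = c(8.5, 9.0)),
          file.path(raw_dir, "reviews.csv"), row.names = FALSE)
write.csv(data.frame(order_id = c("1", "2"), product = c("x", "y")),
          file.path(raw_dir, "orders.csv"), row.names = FALSE)

# Read every CSV in the folder and stack the results; bind_rows() fills
# columns that a given file lacks with NA.
csv_files <- list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)
raw_data <- bind_rows(lapply(csv_files, read.csv))

print(dim(raw_data))
```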
Clean the data
Raw data is usually dirty and should be cleaned before analysis. This case study applies three common, simple cleaning steps for illustration.
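To see what such cleaning steps do before they are applied to `raw_data`, here is a minimal sketch on a toy data frame. The object `toy` and its values are hypothetical and exist only for this illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: one empty row, one empty column, messy whitespace,
# and a row that becomes a duplicate once the whitespace is squished.
toy <- data.frame(
  name  = c("  Hotel   A ", "Hotel A", NA, "Hotel B"),
  city  = c("Paris", "Paris", NA, "Rome"),
  empty = c(NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# 1. Drop rows and columns that are entirely NA.
toy <- toy[!apply(is.na(toy), 1, all), !apply(is.na(toy), 2, all)]

# 2. Trim and collapse whitespace in character columns.
toy <- toy |>
  mutate(across(where(is.character), str_squish))

# 3. Drop duplicate rows.
toy <- distinct(toy)

print(toy)
```

Squishing before de-duplicating matters here: rows that differ only in whitespace collapse into one only after the whitespace has been normalized.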
# Remove all the empty rows and columns
raw_data <- raw_data[
  !apply(is.na(raw_data), 1, all),
  !apply(is.na(raw_data), 2, all)
]
# Remove leading and trailing whitespace, and replace each run of internal whitespace with a single space.
raw_data <- raw_data |>
  mutate(across(where(is.character), str_squish))
# Remove all the duplicates
raw_data <- distinct(raw_data)
Explore the data
The data structure should be understood before running the analysis. This case study uses the “summary” function for illustration.
##  hotel_address      additional_number_of_scoring  review_date
##  Length:700899      Min.   :   1.0                Length:700899
##  Class :character   1st Qu.: 169.0                Class :character
##  Mode  :character   Median : 342.0                Mode  :character
##                     Mean   : 498.4
##                     3rd Qu.: 660.0
##                     Max.   :2682.0
##                     NA's   :185687
##  average_score    hotel_name         reviewer_nationality  negative_review
##  Min.   :5.2      Length:700899      Length:700899         Length:700899
##  1st Qu.:8.1      Class :character   Class :character      Class :character
##  Median :8.4      Mode  :character   Mode  :character      Mode  :character
##  Mean   :8.4
##  3rd Qu.:8.8
##  Max.   :9.8
##  NA's   :185687
##  review_total_negative_word_counts  total_number_of_reviews  positive_review
##  Min.   :  0.00                     Min.   :   43            Length:700899
##  1st Qu.:  2.00                     1st Qu.: 1161            Class :character
##  Median :  9.00                     Median : 2134            Mode  :character
##  Mean   : 18.54                     Mean   : 2745
##  3rd Qu.: 23.00                     3rd Qu.: 3633
##  Max.   :408.00                     Max.   :16670
##  NA's   :185687                     NA's   :185687
##  review_total_positive_word_counts  total_number_of_reviews_reviewer_has_given
##  Min.   :  0.00                     Min.   :  1.00
##  1st Qu.:  5.00                     1st Qu.:  1.00
##  Median : 11.00                     Median :  3.00
##  Mean   : 17.78                     Mean   :  7.16
##  3rd Qu.: 22.00                     3rd Qu.:  8.00
##  Max.   :395.00                     Max.   :355.00
##  NA's   :185687                     NA's   :185687
##  reviewer_score   tags               days_since_review  lat
##  Min.   : 2.5     Length:700899      Length:700899      Min.   :41.33
##  1st Qu.: 7.5     Class :character   Class :character   1st Qu.:48.21
##  Median : 8.8     Mode  :character   Mode  :character   Median :51.50
##  Mean   : 8.4                                           Mean   :49.44
##  3rd Qu.: 9.6                                           3rd Qu.:51.52
##  Max.   :10.0                                           Max.   :52.40
##  NA's   :185687                                         NA's   :188955
##  lng              order_id           product            quantity_ordered
##  Min.   :-0.37    Length:700899      Length:700899      Length:700899
##  1st Qu.:-0.14    Class :character   Class :character   Class :character
##  Median : 0.00    Mode  :character   Mode  :character   Mode  :character
##  Mean   : 2.82
##  3rd Qu.: 4.83
##  Max.   :16.43
##  NA's   :188955
##  price_each         order_date         purchase_address
##  Length:700899      Length:700899      Length:700899
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##
##
##
##
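Two things that a summary like the one above makes visible are the missing-value counts and the column types. A minimal sketch of checking both directly, on a hypothetical toy data frame rather than the case-study data:

```r
# Hypothetical toy data with one numeric and one character column.
toy <- data.frame(
  reviewer_score = c(7.5, 9.0, NA, 10),
  hotel_name     = c("A", "B", "B", NA),
  stringsAsFactors = FALSE
)

# Missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Class of each column; numeric-looking fields stored as character
# would show up here as "character".
col_types <- sapply(toy, class)
print(col_types)

# Compact overview of types and the first few values.
str(toy)
```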
Thank you
Thank you, Lisa and Rob, for preparing the case study! I thoroughly enjoyed the scripting exercise.