Electronic Shipment Analysis

Author

Joe Galuppo

```{r setup, include=FALSE}
# Set global chunk options so warnings and messages stay out of the rendered report
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```


Electronics shipping data

In this document, we explore reviews for two electronics shipping companies, Mimeo and Electronic Express. We scrape a website for reviews of both to better understand how their customers view them. We then run a sentiment analysis to compare the reviews, see which company stands out, and look for anything in the reviews that helps explain why.

Analysis


To begin, we want to find out which company gets more positive reviews overall, which gives us better insight into which company we would rather use. Using a graph and a table, we compare which company's reviews are more positive overall, which sentiment words show up in each company's reviews, and which of those words are used most frequently. We limit this to words that appear at least 5 times in a company's reviews. We use the Bing lexicon to score each word as positive or negative and then compare the results.
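The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the `reviews` tibble, its `text` column, and the sample review strings are made-up stand-ins, and `min_count` is a hypothetical name for the frequency cutoff.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the scraped reviews (not the real data)
reviews <- tibble::tibble(
  Business  = c("Mimeo", "Mimeo", "Electronic express"),
  review_id = 1:3,
  text      = c("Easy to use and great support",
                "Slow delivery but easy ordering",
                "Fast and helpful staff")
)

# The report keeps words that appear at least 5 times;
# 1 is used here only so the toy data is not filtered away
min_count <- 1

word_scores <- reviews %>%
  unnest_tokens(word, text) %>%                       # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>% # keep words the Bing lexicon scores
  count(Business, word, sentiment, name = "n") %>%    # frequency per company and word
  filter(n >= min_count)

word_scores
```

Words that are not in the Bing lexicon drop out at the join, so the counts cover only words the lexicon labels positive or negative.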


As you can see, both companies are net positive overall. Mimeo reviewers more consistently say that Mimeo is easy to use, while Electronic Express reviewers often say that the company is fast and helpful.

# A tibble: 4 × 3
# Groups:   Business [2]
  Business           sentiment total_n
  <chr>              <chr>       <int>
1 Electronic express negative       95
2 Electronic express positive      470
3 Mimeo              negative      114
4 Mimeo              positive      558

Here we can see that Mimeo has more sentiment words in its reviews overall (672 versus 565). However, if we look at the ratios, Mimeo has a positive-to-negative ratio of 4.89 (558/114) while Electronic Express has a ratio of 4.95 (470/95). This tells us that Electronic Express's reviewers are slightly more positive overall. I believe that if you want fast and helpful service you should go to Electronic Express, and if you want easy service you should go with Mimeo.
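As a quick check on the ratios, the counts from the table can be re-entered by hand and divided. This is only a sketch in base R; the `counts` data frame is typed in from the table above rather than taken from the original pipeline.

```r
# Counts copied from the summary table in the report
counts <- data.frame(
  Business  = c("Electronic express", "Electronic express", "Mimeo", "Mimeo"),
  sentiment = c("negative", "positive", "negative", "positive"),
  total_n   = c(95, 470, 114, 558)
)

# Cross-tabulate so each company is a row and each sentiment a column
tab <- xtabs(total_n ~ Business + sentiment, data = counts)

# Positive-to-negative ratio per company
ratio <- tab[, "positive"] / tab[, "negative"]
round(ratio, 2)
# Electronic express: 470 / 95  = 4.947..., which rounds to 4.95
# Mimeo:              558 / 114 = 4.894..., which rounds to 4.89
```

The two ratios differ by only about 0.05, so the gap between the companies on this measure is small.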

Next, we look at the positivity scores by month for each company. We want to understand when the best time to use each company would be, according to reviewers' scores for each month.


Here you can see the months June through December. Granted, June and December have far fewer reviews than the other months because of how the data was scraped. Even so, the graph shows that Mimeo beats Electronic Express in 4 of the 7 months, according to reviewers' sentiment. In the earlier months, people seem to prefer Mimeo, and in the later months they prefer Electronic Express, except in December.
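A monthly comparison like this one could be built along the following lines. This is a sketch under assumptions: the `scored` tibble, its per-review `positive` and `negative` counts, and the dates (including the year) are invented for illustration, and net score is taken to be positive minus negative word counts, which may differ from the score used in the graph. Counting reviews per month alongside the score makes thin months such as June and December easy to spot.

```r
library(dplyr)
library(lubridate)

# Hypothetical per-review sentiment counts (not the real data)
scored <- tibble::tibble(
  Business    = c("Mimeo", "Mimeo", "Electronic express", "Electronic express"),
  review_time = as.Date(c("2023-06-28", "2023-07-03", "2023-07-10", "2023-12-01")),
  review_id   = 1:4,
  positive    = c(3, 2, 4, 1),
  negative    = c(1, 0, 1, 2)
)

monthly <- scored %>%
  mutate(month = month(review_time, label = TRUE)) %>%  # Jan..Dec as an ordered factor
  group_by(Business, month) %>%
  summarise(
    n_reviews = n(),                           # how many reviews back each month
    net_score = sum(positive) - sum(negative), # assumed definition of the monthly score
    .groups   = "drop"                         # return an ungrouped tibble
  )

monthly
```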

Finally, we look at which emotions show up in each company's reviews overall, using the NRC lexicon. This helps steer our decision toward one company over the other. We show the results in a graph so it is easy to compare which emotions are stronger.
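One detail worth illustrating: unlike Bing, the NRC lexicon can tag a single word with several categories, so joining review words to it yields more than one row per word. The sketch below uses a tiny made-up lexicon, `nrc_toy`, in the same word/sentiment shape, rather than the real NRC data, and the `tokens` tibble is hypothetical as well. In dplyr 1.1.0 and later, `relationship = "many-to-many"` declares that this row expansion is expected.

```r
library(dplyr)

# Made-up lexicon in the NRC shape (word, sentiment); not real NRC entries
nrc_toy <- tibble::tibble(
  word      = c("helpful", "helpful", "helpful", "late", "late"),
  sentiment = c("positive", "trust", "joy", "negative", "sadness")
)

# Hypothetical review words, one row per word occurrence
tokens <- tibble::tibble(
  Business = c("Mimeo", "Mimeo", "Electronic express"),
  word     = c("helpful", "late", "helpful")
)

emotions <- tokens %>%
  # each word matches every category the lexicon assigns to it
  inner_join(nrc_toy, by = "word", relationship = "many-to-many") %>%
  count(Business, sentiment, name = "n")

emotions
```

Because one word can add to several categories at once, the category totals sum to more than the number of words scored.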


Here we can see which emotions are most associated with each business's reviews. Overall, Mimeo's reviews seem to be more positive while simultaneously being more negative. Reviewers tell us that Electronic Express can be trusted more; however, it falls behind on most of the other emotions. Honestly, for a shipping company, not drawing a lot of emotion from customers can be a good thing, because it suggests they had nothing to complain about and the job got done right. Comparing positive to negative, we can also see that both companies make people happy overall. If I were to order from one of these two companies, I would use Electronic Express because it inspires more trust in its customers.