# import packages 
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(trelliscopejs)

Problem 1

For a data set of your choosing, make a faceted plot using the trelliscopejs package. You may make any type of plot; scatter plot, histogram, etc. but, as mentioned in the discussion below, you must explain why you chose this plot and what you are investigating about the variable you are graphing.

The trelliscope plot must include one cognostic measure of your own. Include a description of what it is and what information this measure gives.

# load in phone sales data
df <- read.csv("sales.csv")

# view structure
str(df)
## 'data.frame':    3114 obs. of  11 variables:
##  $ Brands        : chr  "SAMSUNG" "Nokia" "realme" "Infinix" ...
##  $ Models        : chr  "GALAXY M31S" "3.2" "C2" "Note 5" ...
##  $ Colors        : chr  "Mirage Black" "Steel" "Diamond Black" "Ice Blue" ...
##  $ Memory        : chr  "8 GB" "2 GB" "2 GB" "4 GB" ...
##  $ Storage       : chr  "128 GB" "16 GB" "" "64 GB" ...
##  $ Camera        : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Rating        : num  4.3 3.8 4.4 4.2 4.6 4 NA 4.6 4.2 4.3 ...
##  $ Selling.Price : int  19330 10199 6999 12999 49900 2199 99900 42999 20400 21736 ...
##  $ Original.Price: int  20999 10199 7999 12999 49900 2199 99900 47900 20400 22999 ...
##  $ Mobile        : chr  "SAMSUNG GALAXY M31S" "Nokia 3.2" "realme C2" "Infinix Note 5" ...
##  $ Discount      : int  1669 0 1000 0 0 0 0 4901 0 1263 ...
# format prices into correct dollar amounts 
df$Selling.Price <- df$Selling.Price / 100
df$Original.Price <- df$Original.Price / 100
df$Discount <- df$Discount / 100

# add a cognostic for the discount as a percent of the original price
df$Discount.Percent <- cog(round(100 * df$Discount / df$Original.Price), desc = "Discount Percent")

# trelliscope plot
ggplot(df, aes(x = Selling.Price, y = Rating)) + 
  geom_point(color = "lightseagreen") + 
  labs(x = "Selling Price", y = "Rating") + 
  geom_smooth(method = "lm", color = "hotpink4", se = FALSE) +
  facet_trelliscope(~ Brands, 
                    nrow = 2, ncol = 3, 
                    name = "Phone Rating vs Price By Brand", 
                    desc = "Examining the correlation between price and customer satisfaction", 
                    scales = "sliced", 
                    path = ".",
                    self_contained = TRUE)


Description 2-3 paragraphs.

Describe the data set. Explain the variable you are graphing in your plots and the reason you are investigating with it. Discuss the reason/motivation you chose the variable to facet on, and what insight or trend you are attempting to investigate. Discuss any challenges you had in making the graphs and how you dealt with these challenges. Name at least one cognostic measure (this can include the cognostic you created or be different) the reader could investigate, and explain any insight they might gain from it.

I am using a dataset with information about the sales of phones from different brands. It includes the model, color, memory, storage, camera, rating, original price, discount, and selling price. I specifically wanted to investigate the relationship between the price a customer paid for a phone and their rating of it. Phones are getting very expensive, especially for certain brands, so I wanted to see whether you are actually paying for a better experience. I expected that the more a customer paid for a phone, the more satisfied they would be, and therefore the higher the rating they would give. I chose to facet by brand so that I could examine this relationship within each brand. From what I found, this is true for most of the brands. Some, such as ASUS, HTC, and LG, have much stronger positive correlations. Roughly 6 of the brands have a slight negative correlation between rating and price, suggesting that the more a customer pays, the less satisfied they are. I am hesitant to draw a generalized conclusion because some brands have only a few data points. To confidently reject or accept my hypothesis, I would want to collect more data for those brands.
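One way to check how much weight each brand's trend line deserves is to compute the correlation and the number of usable points per brand. The following is only a minimal sketch in base R: the `toy` data frame and its values are made up for illustration, and only the column names are borrowed from `df`.

```r
# Hypothetical toy data; only the column names match df
toy <- data.frame(
  Brands        = rep(c("A", "B"), times = c(4, 3)),
  Selling.Price = c(100, 200, 300, 400, 100, 200, 300),
  Rating        = c(3.9, 4.1, 4.2, 4.5, 4.4, 4.2, 4.1)
)

# For each brand, count complete price/rating pairs and compute their correlation
per_brand <- do.call(rbind, lapply(split(toy, toy$Brands), function(d) {
  ok <- complete.cases(d$Selling.Price, d$Rating)
  data.frame(
    Brands = d$Brands[1],
    n      = sum(ok),
    r      = cor(d$Selling.Price[ok], d$Rating[ok])
  )
}))

print(per_brand)
```

A brand with a small `n` can show a strong positive or negative `r` by chance alone, which is why the count is worth reading alongside the correlation.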

My biggest challenge for this assignment was finding a dataset. I struggled to find one that was big enough to warrant a faceted trelliscope plot and that included a good categorical variable to facet on, along with enough quantitative variables to plot and create a cognostic from. Once I found this data, I experimented with plotting different combinations of variables to explore it and see what insights I could extract. I originally wanted to make a cognostic for the storage variable. I tried creating 3 bins (low, medium, high) for the 27 different storage values, but that did not work out the way I wanted. I wanted to see if brands that offered higher storage options had higher prices and/or higher ratings. I also wanted readers to be able to filter the dataset to see which brands offer higher storage options, because that is an important feature for a lot of people. Because the storage variable is currently an unordered character variable, it cannot be sorted in a meaningful way.
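A possible way around the storage problem would be to convert the strings to numbers and then cut them into an ordered factor. This is only a sketch under assumptions: the `storage` vector is a made-up sample in the "128 GB" format shown by `str(df)`, the cut points of 32 and 128 GB are arbitrary, and any values in other units (for example TB) would need extra handling that is not shown here.

```r
# Hypothetical sample of Storage strings; the real column has 27 distinct values
storage <- c("128 GB", "16 GB", "", "64 GB", "512 GB", "8 GB")

# Drop the " GB" suffix and convert to numeric; blank strings become NA
storage_gb <- as.numeric(sub(" GB$", "", storage))

# Bin into an ordered factor so the levels sort as low < medium < high
# (the break points are illustrative assumptions, not taken from the data)
storage_bin <- cut(
  storage_gb,
  breaks = c(0, 32, 128, Inf),
  labels = c("low", "medium", "high"),
  ordered_result = TRUE
)

print(data.frame(storage, storage_gb, storage_bin))
```

Because the result is an ordered factor, comparisons such as `storage_bin >= "medium"` work, which is what sorting or filtering on storage would need.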

I chose to make a discount percentage cognostic so the data could be sorted by the brands that offer the highest percentage discounts. This would be useful for someone on a budget who wants to know which brands to look into for a discounted phone. It would also be useful for someone considering a phone from a specific brand who wants to see whether its phones typically go on discount and by how much. Another useful cognostic that is available is the mean rating. Sorting by it would show the reader which brands have the highest average rating, which would help inform their next phone purchase.
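To make the two brand-level measures concrete, here is a small base R sketch of how a mean discount percent and a mean rating per brand could be computed and sorted. The `toy` data frame is hypothetical; it only reuses the column names from `df`, and it is not how the trelliscope display itself computes its cognostics.

```r
# Hypothetical toy data; only the column names match df
toy <- data.frame(
  Brands         = c("A", "A", "B", "B"),
  Original.Price = c(200, 100, 400, 100),
  Discount       = c(20, 0, 100, 50),
  Rating         = c(4.0, 4.4, 3.8, NA)
)

# Discount as a percent of the original price, per phone
toy$Discount.Percent <- 100 * toy$Discount / toy$Original.Price

# Brand-level means; na.rm = TRUE skips phones with a missing rating
mean_discount <- tapply(toy$Discount.Percent, toy$Brands, mean, na.rm = TRUE)
mean_rating   <- tapply(toy$Rating, toy$Brands, mean, na.rm = TRUE)

# Brands ordered from the largest to the smallest average discount
print(sort(mean_discount, decreasing = TRUE))
print(mean_rating)
```

Sorting on either vector gives the same kind of ranking a reader would get by sorting the panels on that cognostic.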


grading: trelliscope plot[25 points], discussion[25 points]


Note: you can add a url directly to the text and it will be active in the html (and word document if you knit to that)

Example: https://www.google.com

If you want to be fancy and make your url active text, you can do this