The data set we selected

Our team selected a data set named Mobile Phones with 1105 records and loaded it into a data frame called phone.

We first selected the Stars column of the data set and used hist() and dotchart() to get a quick overview of the data. Based on these plots, we think that this column is roughly consistent with a normal distribution.
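A minimal sketch of this first look is shown here. The real Mobile Phones file is not included, so the `phone` data frame below is a simulated stand-in (an assumption, not the original data); only the column name `Stars` comes from the report.

```r
# Hypothetical stand-in for the real data: simulated ratings, clamped to
# the 1-5 range and rounded to one decimal place.
set.seed(1)
phone <- data.frame(
  Stars = round(pmin(pmax(rnorm(1105, mean = 4.2, sd = 0.3), 1), 5), 1)
)

# Base-graphics histogram; hist() also returns the bin counts invisibly.
h <- hist(phone$Stars, main = "Histogram of Stars", xlab = "Stars")

# Cleveland dot chart: one point per phone, drawn in row order.
dotchart(phone$Stars, main = "Dot chart of Stars", xlab = "Stars")
```

With the real data, only the two plotting calls are needed; the simulation lines can be dropped.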

Next, we use the ggplot() function for a more customizable analysis. Its first argument, data=, takes our data frame, and its second argument, mapping=, maps the Stars column to the x-axis aesthetic. This produces the histogram and the dot plot based on ggplot(), as below:

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
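The two messages above appear when no bin width is given. A sketch of how the plots might be drawn with an explicit `binwidth` follows; the `phone` data frame is again a simulated stand-in, and the values `binwidth = 0.1` and `dotsize = 0.05` are illustrative assumptions rather than the settings our team used.

```r
library(ggplot2)

# Hypothetical stand-in for the real data (simulated, not the original file).
set.seed(1)
phone <- data.frame(
  Stars = round(pmin(pmax(rnorm(1105, mean = 4.2, sd = 0.3), 1), 5), 1)
)

# Histogram: data= supplies the data frame, mapping= maps Stars to the x-axis.
# An explicit binwidth avoids the "bins = 30" message.
p_hist <- ggplot(data = phone, mapping = aes(x = Stars)) +
  geom_histogram(binwidth = 0.1)

# Dot plot: the y-axis of a dot plot carries no meaning, so it is hidden.
# dotsize is shrunk so that tall stacks fit in the panel; tune to taste.
p_dot <- ggplot(data = phone, mapping = aes(x = Stars)) +
  geom_dotplot(binwidth = 0.1, dotsize = 0.05) +
  scale_y_continuous(NULL, breaks = NULL)

print(p_hist)
print(p_dot)
```

Since the ratings are recorded to one decimal place, a bin width of 0.1 gives one bar per distinct rating.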

The difference is easy to see once ggplot() is used. Compared with the ordinary dotchart, the ggplot2 dot plot shows the dispersion of the Stars column much more intuitively. To examine the normality of the data more closely, we next use the geom_jitter() and geom_boxplot() functions to draw a boxplot based on the quartiles.
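One way such a plot could be built is sketched here. The `phone` data frame is a simulated stand-in, and the jitter width, transparency, and the choice to hide the boxplot's own outlier points are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the real data (simulated, not the original file).
set.seed(1)
phone <- data.frame(
  Stars = round(pmin(pmax(rnorm(1105, mean = 4.2, sd = 0.3), 1), 5), 1)
)

# A single boxplot of Stars with the raw points jittered on top.
# outlier.shape = NA hides the boxplot's outlier markers so that each
# phone is drawn only once (by geom_jitter).
p_box <- ggplot(phone, aes(x = "", y = Stars)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.3) +
  labs(x = NULL, y = "Stars")

print(p_box)
```

Setting `height = 0` keeps the jitter horizontal only, so the plotted ratings are not shifted along the Stars axis.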

As a result, we found that most of the phone ratings are concentrated between 3.7 and 4.6 stars.

The boxplot is a bit more intuitive and shows the dispersion and distribution of the Stars column. The second quarter of the data (25th to 50th percentile) ranges from 4.0 to about 4.3. The third quarter (50th to 75th percentile) is narrower and concentrated between 4.3 and 4.4. This means that the middle 50% of phones are rated between 4.0 and 4.4 stars. The remaining 50% of the data lies outside the box, in the whiskers and beyond, and is spread over a much wider range than the box even though it holds the same share of the data. This concentration around the center, with thinner tails on either side, suggests the normality of this column of data.

Through these data visualization methods, our team has gained an intuitive understanding of the normality of the Stars data. We believe that this group of data follows a normal distribution rather than a uniform distribution. To further verify the validity of our conclusion, we first obtain some relevant statistics through the summary() function.

The summary of Phone stars

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.800   4.000   4.300   4.182   4.400   5.000
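A table of this kind can be produced as in the sketch below. It runs on a simulated stand-in for `phone`, so its numbers will differ from the ones printed above; the extra quartile and mean-versus-median checks are additions for illustration.

```r
# Hypothetical stand-in for the real data (simulated, not the original file).
set.seed(1)
phone <- data.frame(
  Stars = round(pmin(pmax(rnorm(1105, mean = 4.2, sd = 0.3), 1), 5), 1)
)

# Five-number summary plus the mean.
s <- summary(phone$Stars)
print(s)

# Quartiles and interquartile range, i.e. the edges and height of the box
# in the boxplot.
q <- quantile(phone$Stars, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(phone$Stars)

# Sign of (mean - median): negative hints at a longer left tail,
# positive at a longer right tail.
skew_sign <- sign(mean(phone$Stars) - median(phone$Stars))
```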

The data range from a minimum of 2.8 stars to a maximum of 5 stars. Since the median of 4.3 stars is greater than the mean of 4.182 stars, the distribution is slightly skewed to the left: low ratings in the left tail pull the mean below the median. The first quartile is 4.0 and the third quartile is 4.4, which is consistent with our analysis of the boxplot above. We then calculated the standard deviation of the sample and counted the share of data points within one and two standard deviations of the mean:

Standard Deviation

## [1] 0.2914782
# Share of ratings strictly within one standard deviation of the mean
lower_bound <- mean(phone$Stars) - sd(phone$Stars)
upper_bound <- mean(phone$Stars) + sd(phone$Stars)
index <- phone$Stars > lower_bound &
    phone$Stars < upper_bound
sum(index) / nrow(phone)
## [1] 0.7744565
# Share of ratings strictly within two standard deviations of the mean
lower_bound <- mean(phone$Stars) - 2 * sd(phone$Stars)
upper_bound <- mean(phone$Stars) + 2 * sd(phone$Stars)
index <- phone$Stars > lower_bound &
    phone$Stars < upper_bound
sum(index) / nrow(phone)
## [1] 0.9673913

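These shares, about 77.4% within one standard deviation and 96.7% within two, are roughly consistent with the 68% rule and 95% rule of a perfect normal distribution, although the one-standard-deviation share is somewhat higher than 68%. We take this as support for our conclusion on the normality of this set of data.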
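For reference, the 68% and 95% figures come from the normal distribution function, and the counting approach can be wrapped in a small helper. The sketch below is illustrative: `share_within` and `theoretical_share` are hypothetical helper names, not part of our original script.

```r
# Share of values strictly within k standard deviations of the mean,
# using the same strict inequalities as the counting code above.
share_within <- function(x, k) {
  m <- mean(x)
  s <- sd(x)
  mean(x > m - k * s & x < m + k * s)
}

# Share that a perfect normal distribution places within k standard deviations.
theoretical_share <- function(k) {
  pnorm(k) - pnorm(-k)
}

theoretical_share(1)  # about 0.683
theoretical_share(2)  # about 0.954
```

With the real data, `share_within(phone$Stars, 1)` and `share_within(phone$Stars, 2)` would return the two proportions computed above, ready to be compared against the theoretical values.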

Thank you!
