R Project 2 Report
INTRODUCTION
1. Descriptive statistics refers to data analysis that explains, shows,
or summarizes data in a comprehensible way so that patterns can emerge
from it. Descriptive statistics are important because raw data,
especially in large volumes, makes it difficult to see what the data
indicate. They let us present data in a more meaningful form that is
easier to understand.
Inferential statistics are techniques that allow us to use samples to
make generalizations about the populations from which the samples were
drawn. It is therefore important that the sample accurately represents
the population. The process of achieving this is called sampling.
Inferential statistics arise from the fact that sampling naturally
incurs sampling error, so a sample is not expected to represent the
population perfectly.
2. Data presentation does more than make a report look visually
appealing, although effective presentation also makes the results more
engaging to read. The main reason for extracting and presenting the
relevant data from your results is to show the reader that you can
select the data most appropriate for answering your research questions
and work with it graphically so that its inherent correlations and
relationships stand out. A lengthy data table may technically serve the
same purpose, but forcing the reader to ‘discover’ the pertinent data
among a mess of numbers is a symptom of poor research.
3. Banking firms, like other financial institutions, use R programming
for credit risk modeling and many other kinds of risk analysis. Banks
make extensive use of the Mortgage Haircut Model, which allows them to
take over the property in the event of a loan default. Mortgage haircut
modeling involves the sales price distribution, the volatility of the
sales price, and the calculation of the expected shortfall. For these
purposes, R is typically used in conjunction with proprietary tools such
as SAS. Banks also use R for financial reporting: data scientists can
use R to analyze financial losses and take advantage of R’s
visualization tools.
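As a minimal sketch of the distinction drawn in item 1, the base R code below first summarizes a sample (descriptive statistics) and then uses the same sample to estimate a population mean with a confidence interval (inferential statistics). The simulated population and the names `population` and `sample_values` are assumptions made for illustration; they are not part of the project data.

```r
# Hypothetical population, simulated only for illustration
set.seed(42)
population <- rnorm(10000, mean = 100, sd = 15)

# Draw a simple random sample of 50 values
sample_values <- sample(population, size = 50)

# Descriptive statistics: summarize the sample itself
sample_mean <- mean(sample_values)
sample_sd <- sd(sample_values)
summary(sample_values)

# Inferential statistics: use the sample to estimate the population mean.
# t.test() reports a 95% confidence interval by default.
ci <- t.test(sample_values)$conf.int
ci
```

Because of sampling error, a different sample would give a slightly different mean and interval, which is exactly what inferential methods are designed to account for.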
ANALYSIS
Task 1: Presenting a table with the first and last 5 observations
• Here, the first and last 5 observations are presented as a table.
head(M2Data, 5)
## # A tibble: 5 × 10
## Region Market Company_Segment Product_Category Product_SubCate… Price Quantity
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Centr… USCA Consumer Technology Phones 222. 2
## 2 Ocean… Asia … Corporate Furniture Chairs 3709. 9
## 3 Ocean… Asia … Consumer Technology Phones 5175. 9
## 4 Weste… Europe Home Office Technology Phones 2893. 5
## 5 Weste… Africa Consumer Technology Copiers 2833. 8
## # … with 3 more variables: Sales <dbl>, Profits <dbl>, ShippingCost <dbl>
tail(M2Data, 5)
## # A tibble: 5 × 10
## Region Market Company_Segment Product_Category Product_SubCate… Price
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Eastern Asia Asia … Consumer Furniture Tables 2615.
## 2 Western US USCA Corporate Office Supplies Appliances 69.5
## 3 Oceania Asia … Consumer Technology Copiers 637.
## 4 South America LATAM Corporate Furniture Bookcases 2751.
## 5 Southeastern … Asia … Corporate Technology Phones 1587
## # … with 4 more variables: Quantity <dbl>, Sales <dbl>, Profits <dbl>,
## # ShippingCost <dbl>
rbind(head(M2Data,5), tail(M2Data, 5))
## # A tibble: 10 × 10
## Region Market Company_Segment Product_Category Product_SubCate… Price
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Central US USCA Consumer Technology Phones 222.
## 2 Oceania Asia … Corporate Furniture Chairs 3709.
## 3 Oceania Asia … Consumer Technology Phones 5175.
## 4 Western Euro… Europe Home Office Technology Phones 2893.
## 5 Western Afri… Africa Consumer Technology Copiers 2833.
## 6 Eastern Asia Asia … Consumer Furniture Tables 2615.
## 7 Western US USCA Corporate Office Supplies Appliances 69.5
## 8 Oceania Asia … Consumer Technology Copiers 637.
## 9 South America LATAM Corporate Furniture Bookcases 2751.
## 10 Southeastern… Asia … Corporate Technology Phones 1587
## # … with 4 more variables: Quantity <dbl>, Sales <dbl>, Profits <dbl>,
## # ShippingCost <dbl>
knitr::kable(rbind(head(M2Data,5), tail(M2Data, 5)))
| Region | Market | Company_Segment | Product_Category | Product_SubCategory | Price | Quantity | Sales | Profits | ShippingCost |
|---|---|---|---|---|---|---|---|---|---|
| Central US | USCA | Consumer | Technology | Phones | 221.98 | 2 | 443.96 | 62.15 | 40.77 |
| Oceania | Asia Pacific | Corporate | Furniture | Chairs | 3709.40 | 9 | 33384.60 | -288.77 | 923.63 |
| Oceania | Asia Pacific | Consumer | Technology | Phones | 5175.17 | 9 | 46576.53 | 919.97 | 915.49 |
| Western Europe | Europe | Home Office | Technology | Phones | 2892.51 | 5 | 14462.55 | -96.54 | 910.16 |
| Western Africa | Africa | Consumer | Technology | Copiers | 2832.96 | 8 | 22663.68 | 311.52 | 903.04 |
| Eastern Asia | Asia Pacific | Consumer | Furniture | Tables | 2614.69 | 7 | 18302.83 | -821.96 | 203.26 |
| Western US | USCA | Corporate | Office Supplies | Appliances | 69.48 | 1 | 69.48 | 20.84 | 12.04 |
| Oceania | Asia Pacific | Consumer | Technology | Copiers | 636.78 | 2 | 1273.56 | 286.50 | 203.20 |
| South America | LATAM | Corporate | Furniture | Bookcases | 2751.20 | 10 | 27512.00 | 110.00 | 203.13 |
| Southeastern Asia | Asia Pacific | Corporate | Technology | Phones | 1587.00 | 3 | 4761.00 | -76.56 | 203.08 |
• Data analysts are expected to work with large datasets, and it is
hard to view an entire dataset at once. The head and tail functions
give us a quick glance at a large dataset.
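The same preview pattern can be sketched on any data frame. The example below uses a small hypothetical data frame named `orders` (an assumption standing in for a large dataset, not the project's M2Data) and stacks its first and last 5 rows into one table.

```r
# Hypothetical stand-in for a large dataset (not the project's M2Data)
orders <- data.frame(
  id = 1:1000,
  sales = seq(10, by = 5, length.out = 1000)
)

# First and last 5 rows stacked into a single 10-row preview
preview <- rbind(head(orders, 5), tail(orders, 5))
preview

# dim() shows how much of the data the preview leaves out
dim(orders)
dim(preview)
```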
Task 2: Finding categories of Market and their frequencies
• We intend to find the categories of Market and their frequency
distribution.
knitr::kable(rbind(table(M2Data$Market)))
| Africa | Asia Pacific | Europe | LATAM | USCA |
|---|---|---|---|---|
| 54 | 365 | 248 | 133 | 200 |
• Frequency is the number of occurrences of a value in a dataset. From
the table above, we can see how many sales occurred in each market
category.
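To make the idea of a frequency distribution concrete, the sketch below rebuilds a market column from the counts reported in the table above and derives both absolute and relative frequencies. The vector `market` is a reconstruction for illustration only, since M2Data itself is not available here.

```r
# Rebuild a Market column from the counts reported in the Task 2 table
# (a stand-in for M2Data$Market, which is not available here)
market <- rep(
  c("Africa", "Asia Pacific", "Europe", "LATAM", "USCA"),
  times = c(54, 365, 248, 133, 200)
)

# Absolute frequencies
freq <- table(market)
freq

# Relative frequencies (proportion of all observations)
rel_freq <- prop.table(freq)
round(rel_freq, 3)

# Category with the most observations
top_market <- names(freq)[which.max(freq)]
top_market
```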
Task 3: Plotting a bar graph for Market and their frequencies
• A bar graph of the markets and their frequencies is plotted, with
colors added using the RColorBrewer library. The text() function
displays the frequency values.
library(RColorBrewer)  # provides brewer.pal()
table1 = table(M2Data$Market)
plot2 = barplot(table1, horiz = TRUE, xlab = 'frequency', ylab = 'category', xlim = c(0,400), col = brewer.pal(5, "Accent"))
# pos = 4 writes each count just to the right of the end of its bar
text(x = table1, y = plot2, labels = table1, cex = 0.8, pos = 4)
• A bar graph is a great way of visualizing data. As we can see above,
the frequency of each market category is easy to read, and the
different colors make the observations more visually appealing.
Task 4: Analyzing product category and frequency in the African market
using a pie chart
• Here, we analyze the frequency of product categories in the African
market and visualize it using a pie chart.
t4Africa = dplyr::filter(M2Data, Market=="Africa")
tablet4 = table(t4Africa$Product_Category)
pie(tablet4)
• From the pie chart above, technology products account for the largest
share of observations in the African market, followed by Furniture and
then Office Supplies. A pie chart is one of the simplest ways to show a
parts-to-whole relationship for categorical data.
Task 5: Analyzing product subcategories and their frequencies
• A barplot is used to visualize product subcategories and their
frequencies in the African market.
task5_table = table(t4Africa$Product_SubCategory)
t5bar = barplot(task5_table)
text(y = table(t4Africa$Product_SubCategory), t5bar, table(t4Africa$Product_SubCategory), cex = 0.8, pos = 3)
• It is clear from the barplot that phones are sold more often than any
other product subcategory in the African market. A barplot makes the
frequency values easy to compare across categories.
Task 6: Improving the barplot of subcategories and their frequencies
• Labels, colors, and margins are added for a more eye-catching
visualization.
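# par() only affects plots drawn after it, so set the margins first (lines: bottom, left, top, right)
par(mar=c(5, 4, 3, 1))
# the 6 Accent colors are recycled when there are more than 6 bars
barplot(task5_table, xlab="Subcategory",ylab="Frequency", col = brewer.pal(6, "Accent"),
main="Product Subcategory")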
• Barplots offer plenty of scope for adding attractive features. With
the RColorBrewer library, we can add colors to the bars in the barplot.
Adding labels and setting margins is also important for viewing the
observations in a systematic way.
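One detail worth illustrating is that margin settings only apply to plots drawn after the call to par(). The sketch below uses hypothetical counts and a null PDF device (so it runs without opening a window); the names `counts`, `old_par`, and `bar_mids` are illustrative assumptions.

```r
# Draw to a null PDF device so the sketch runs without opening a window
pdf(NULL)

# Hypothetical frequencies, for illustration only
counts <- c(Phones = 12, Chairs = 7, Tables = 5)

# Set margins BEFORE plotting; par() returns the previous settings
old_par <- par(mar = c(5, 4, 3, 1))
active_mar <- par("mar")   # margins in effect for the plot below

# barplot() returns the midpoint of each bar, useful for adding labels
bar_mids <- barplot(counts, xlab = "Subcategory", ylab = "Frequency",
                    main = "Product Subcategory")

# Restore the previous margins so later plots are unaffected
par(old_par)
dev.off()
```

Saving and restoring the old settings is a common habit in base graphics, because par() changes persist on the device until they are reset.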
Task 7: Finding mean sales per subcategory
• In this task we find the average sales per subcategory and use a dot
plot to display the observations.
Mean_Sales = tapply(t4Africa$Sales, t4Africa$Product_SubCategory, mean)
knitr::kable(Mean_Sales)
| Subcategory | Mean Sales |
|---|---|
| Accessories | 6478.980 |
| Appliances | 8601.975 |
| Bookcases | 10441.840 |
| Chairs | 19306.760 |
| Copiers | 26338.286 |
| Machines | 6991.880 |
| Phones | 15001.698 |
| Storage | 21289.200 |
| Tables | 14738.970 |
dotchart(Mean_Sales)
• Dot plots are simple statistical plots suitable for small datasets. A
dot plot is convenient when we need to analyze categorical data and read
off precise values.
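The grouped mean used in this task relies on tapply(), which applies a function to the values within each group. A minimal sketch with hypothetical numbers (the vectors `sales` and `subcategory` are made up for illustration) is:

```r
# Hypothetical sales records (not the project data)
sales <- c(100, 300, 250, 150, 400, 200)
subcategory <- c("Phones", "Phones", "Chairs", "Chairs", "Tables", "Tables")

# tapply(values, groups, function) applies the function within each group
mean_sales <- tapply(sales, subcategory, mean)
mean_sales

# Sorting makes the ranking easy to read before plotting it
ranked <- sort(mean_sales, decreasing = TRUE)
ranked
```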
Task 8: Finding total sales per region in the African market
• We interpret the total sales for each region in the African market.
Total_Sales = tapply(t4Africa$Sales, t4Africa$Region, sum)
knitr::kable(Total_Sales)
| Region | Total Sales |
|---|---|
| Central Africa | 205523.8 |
| Eastern Africa | 96575.4 |
| North Africa | 178792.3 |
| Southern Africa | 161749.4 |
| Western Africa | 116827.0 |
# par() only affects plots drawn after it, so set the margins (in inches) first
par(mai=c(1, 0.6, 1, 1))
barplot(Total_Sales, xlab="Region",ylab="Sale", col = brewer.pal(6, "Accent"),
main="Total Regional Sales")
• The barplot shows the total sales in each region. Eastern Africa has
the lowest total sales, so it should be the main focus when it comes to
increasing total sales.
Task 9: Finding average shipping cost per region in the African market
• The average shipping cost per region is analyzed for the African
market.
Mean_Shipping = tapply(t4Africa$ShippingCost, t4Africa$Region, mean)
knitr::kable(Mean_Shipping)
| Region | Mean Shipping Cost |
|---|---|
| Central Africa | 354.3857 |
| Eastern Africa | 386.9600 |
| North Africa | 326.8583 |
| Southern Africa | 325.5718 |
| Western Africa | 351.1562 |
barplot(Mean_Shipping, xlab="Region",ylab="Shipping Cost", col = brewer.pal(6, "Accent"),main="Average Shipping Cost")
• As we can see from the barplot, the average shipping cost is highest
in Eastern Africa, while the previous plot shows that its total sales
are the lowest. A proper strategy to optimize sales and shipping is
therefore needed.
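The comparison between the two tables can also be checked directly. The sketch below copies the values reported in the Task 8 and Task 9 tables into one data frame (the name `regions` and its column names are illustrative) and looks up the region with the lowest total sales and the region with the highest average shipping cost.

```r
# Values copied from the Task 8 and Task 9 tables above
regions <- data.frame(
  region = c("Central Africa", "Eastern Africa", "North Africa",
             "Southern Africa", "Western Africa"),
  total_sales = c(205523.8, 96575.4, 178792.3, 161749.4, 116827.0),
  mean_shipping = c(354.3857, 386.9600, 326.8583, 325.5718, 351.1562)
)

# Region with the lowest total sales
lowest_sales <- regions$region[which.min(regions$total_sales)]

# Region with the highest average shipping cost
highest_shipping <- regions$region[which.max(regions$mean_shipping)]

lowest_sales
highest_shipping

# Order regions by total sales to see the full ranking
regions[order(regions$total_sales), ]
```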
Task 10: Differences between data type designations used in R
• Several storage types are classified as “numeric”; the two most
common are double (for double-precision floating-point values) and
integer.
• R converts automatically between numeric types when necessary, so for
the average user it makes little difference whether the value 3 is
stored as an integer or as a double.
• Most math is done in double precision, so double is the default
storage type. Because integers take up less storage, we may choose to
store a vector as integers if we know it will never need to hold
non-integer values (for example, ID values or indexes).
• However, if the values are going to be used in any math that would
convert them to doubles, it is probably best to store them as doubles
from the start.
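The points above can be checked with a few base R calls. This is a small sketch; the variable names are illustrative.

```r
x_double <- 3      # numeric literals are stored as double by default
x_integer <- 3L    # the L suffix requests integer storage

typeof(x_double)
typeof(x_integer)

# Both count as numeric, and the two values compare as equal
is.numeric(x_double)
is.numeric(x_integer)
x_double == x_integer

# Division always returns a double, even for two integers
typeof(x_integer / 2L)

# Integers use 4 bytes per element and doubles 8, so a long integer
# vector takes less memory than a double vector of the same length
size_integer <- object.size(integer(1000))
size_double <- object.size(numeric(1000))
size_integer
size_double
```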
Task 11: Analyzing profits
• Here, we visualize the profits through a boxplot and a histogram.
par(mfcol=c(2,1),
mai = c(1,1,0.2,0.4),
mar = c(4,4,0.5,2))
boxplot(M2Data$Profits,
horizontal = T)
hist(M2Data$Profits,
breaks = 50,
main = "Histogram",
xlab = "Profits",
col = brewer.pal(12, "Set3"),
las = 1,
ylim = c(0,100))
• From the plots above, we can say that the median profit lies between
0 and 1000. Histograms and boxplots are effective for visualizing large
datasets.
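The quantities that a boxplot and a histogram display can also be obtained as numbers, which helps when reading values such as the median off a plot. The sketch below uses a hypothetical vector `profits`, not the project data.

```r
# Hypothetical profit values, for illustration only
profits <- c(-820, -95, 20, 60, 110, 150, 290, 310, 920, 4500)

# Five numbers behind a boxplot: lower whisker, lower hinge,
# median, upper hinge, upper whisker
box <- boxplot.stats(profits)
box$stats

# Points beyond the whiskers are reported as outliers
box$out

# hist() with plot = FALSE returns the bin counts without drawing
h <- hist(profits, breaks = 10, plot = FALSE)
h$breaks
h$counts
```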
Task 12: Finding profits in the Latin American market
• We intend to find the profits in the Latin American Market using
boxplot and a histogram
t13LATAM = dplyr::filter(M2Data, Market=="LATAM")
par(mfcol=c(2,1),
mai = c(1,1,0.2,0.4),
mar = c(4,4,0.5,2))
LATAM_profits = (t13LATAM$Profits)
boxplot(LATAM_profits,main = "Boxplot",
horizontal = T)
hist(LATAM_profits,
breaks = 50,
main = "Histogram",
xlab = "Profits ",
col = brewer.pal(12, "Set3"),
las = 1,
ylim = c(0,20),
xlim = c(-2000,1500))
• In the Latin American market, most profits fall in the range of 0 to
500, though there are some outliers as well. The boxplot highlights the
outliers, while the histogram shows the distribution of profits.
Task 13: Finding total sales in the Latin American market
• We are supposed to find the total amount of sales in the Latin
American market
Total_Sales1 = tapply(t13LATAM$Sales, t13LATAM$Region, sum)
knitr::kable(Total_Sales1)
| Region | Total Sales |
|---|---|
| Caribbean | 196775.2 |
| Central America | 924226.2 |
| South America | 457623.3 |
• The table above shows the total sales in the three subregions of the
Latin American market: Caribbean, Central America, and South America.
Sales are lowest in the Caribbean and highest in Central America.
Task 14: Finding regional profits in the Latin American market
• We find the regional profits of the Latin American market and
visualize them with a boxplot.
boxplot(Profits~Region,data=t13LATAM,
xlab="Regions", ylab="Profits made",main = "Boxplot",
horizontal = F)
• Using a boxplot, we compare the profits across the regions of the
Latin American market. The median profit in Central America is lower
than in the other regions.
Task 15: Probability distribution table for product subcategories
• A table containing the frequency, cumulative frequency, probability,
and cumulative probability of product subcategories.
library(dplyr)  # provides %>%, rename() and mutate()
t15 = M2Data$Product_SubCategory%>%
table()%>%
as.data.frame()%>%
rename(Frequency = Freq)%>%
mutate(CumFrequency = cumsum(Frequency),
Probability = Frequency/nrow(M2Data),
CumProbability = cumsum(Probability))
colnames(t15)<-c('Product Type','Frequency','Cum Frequency','Probability','Cum Probability')
knitr::kable(t15,
digits = 2,
caption = "Probability of product Subcategory",
format = "html",
table.attr = "style='width:40%;'",
align = 'c')%>%
kable_classic(bootstrap_options = "striped",
full_width = TRUE,
position = "center",
font_size = 12)
| Product Type | Frequency | Cum Frequency | Probability | Cum Probability |
|---|---|---|---|---|
| Accessories | 38 | 38 | 0.04 | 0.04 |
| Appliances | 125 | 163 | 0.12 | 0.16 |
| Art | 18 | 181 | 0.02 | 0.18 |
| Binders | 38 | 219 | 0.04 | 0.22 |
| Bookcases | 130 | 349 | 0.13 | 0.35 |
| Chairs | 95 | 444 | 0.10 | 0.44 |
| Copiers | 126 | 570 | 0.13 | 0.57 |
| Envelopes | 2 | 572 | 0.00 | 0.57 |
| Fasteners | 6 | 578 | 0.01 | 0.58 |
| Furnishings | 14 | 592 | 0.01 | 0.59 |
| Labels | 6 | 598 | 0.01 | 0.60 |
| Machines | 52 | 650 | 0.05 | 0.65 |
| Paper | 32 | 682 | 0.03 | 0.68 |
| Phones | 179 | 861 | 0.18 | 0.86 |
| Storage | 45 | 906 | 0.04 | 0.91 |
| Supplies | 7 | 913 | 0.01 | 0.91 |
| Tables | 87 | 1000 | 0.09 | 1.00 |
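• The table above shows the frequency, cumulative frequency,
probability, and cumulative probability of each product subcategory.
Phones are the most frequent subcategory (179 of 1000 observations, a
probability of 0.18), while Envelopes are the rarest (2 observations).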
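The arithmetic behind the table can be reproduced in base R from the reported frequencies alone. The sketch below is an alternative to the dplyr pipeline above, not a replacement for it; the names `freq` and `prob_table` are illustrative.

```r
# Frequencies copied from the Task 15 table above
freq <- c(Accessories = 38, Appliances = 125, Art = 18, Binders = 38,
          Bookcases = 130, Chairs = 95, Copiers = 126, Envelopes = 2,
          Fasteners = 6, Furnishings = 14, Labels = 6, Machines = 52,
          Paper = 32, Phones = 179, Storage = 45, Supplies = 7,
          Tables = 87)

prob_table <- data.frame(
  product_type = names(freq),
  frequency = unname(freq),
  cum_frequency = cumsum(unname(freq)),
  probability = unname(freq) / sum(freq),
  stringsAsFactors = FALSE
)
prob_table$cum_probability <- cumsum(prob_table$probability)

# Probabilities sum to 1, so the last cumulative probability is 1
tail(prob_table, 1)

# Most frequent subcategory
most_frequent <- prob_table$product_type[which.max(prob_table$frequency)]
most_frequent
```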
Task 16: Plotting frequencies and probabilities of product subcategories
• We intend to plot pie charts for frequency and probability, and
barplots for cumulative frequency and cumulative probability.
par(mfrow=c(2,2))
# "Paired" has at most 12 colors, so interpolate one color per subcategory
t16_cols = colorRampPalette(brewer.pal(12, "Paired"))(nrow(t15))
pie(t15$Frequency, labels = t15$`Product Type`, radius = 1,
col = t16_cols,
border = "white",
lty = 1,
cex=0.9,
font = 3)
barplot(t15$`Cum Frequency`, xlab="Product Subcategory",ylab="Frequency", col = brewer.pal(6, "Accent"),main="Cumulative Frequency")
pie(t15$Probability, labels = t15$`Product Type`, radius = 1,
col = t16_cols,
border = "white",
lty = 1,
cex=0.9,
font = 3)
barplot(t15$`Cum Probability`, xlab="Product Subcategory",ylab="Probability", col = brewer.pal(6, "Accent"),main="Cumulative Probability")
• Pie charts show the frequency and probability of each subcategory,
and barplots show the cumulative frequency and cumulative probability.
Task 17: Average sale of products by company segment
• We intend to find average sales of products for company segments
and visualize it with help of a barplot
Segment_Sale = tapply(M2Data$Sales, M2Data$Company_Segment, mean)
barplot(Segment_Sale, xlab="Segment",ylab="Sale", col = brewer.pal(6, "Accent"),
main="Average Segment Sale")
• We infer that the average sale in the Corporate segment is higher
than in the Consumer and Home Office segments. Thus, to increase sales,
research and development and a good marketing strategy are needed for
Consumer and Home Office products.
CONCLUSION
In this R project, I learned how to analyze data, present data
visually, and perform basic calculations using exploratory analysis. I
have a thorough understanding of R Markdown and its purpose, and I
believe this assignment helped me develop a solid understanding of
analytics and R programming. Several factors influence market growth,
sales, and profitability, and all of them need to be understood before
reaching a conclusion and moving forward with strategy development.
Clear graphs make it easier to gain that basic understanding, so for
each analysis we created a graphical representation using bar plots,
box plots, histograms, and pie charts. From these results, we can
conclude that a robust sales strategy is required in the African
market. Technology products are the most frequently sold category in
the African market, and phones are the most frequently sold subcategory
there, so there is room to sell more technology products. Eastern Africa
has the lowest total sales and the highest average shipping cost, so a
strategy to reduce shipping costs and improve sales in Eastern Africa is
required. The corporate segment has the highest average sale, so
products for corporate customers should be available at all times.
BIBLIOGRAPHY
1. Introduction to analytics using R, R Studio and R Markdown. Short manual series by Dr. Dee Chiluiza, PhD. Retrieved from https://rpubs.com/Dee_Chiluiza/home
2. R Applications: 9 Real World Use Cases of R Programming. Retrieved from https://techvidvan.com/tutorials/r-applications/
3. Harvard Business Review: Present Your Data Like a Pro by Joel Schwartzberg. Retrieved from https://hbr.org/2020/02/present-your-data-like-a-pro
4. Harvard Business Review: A Refresher on Statistical Significance by Amy Gallo. Retrieved from https://hbr.org/2016/02/a-refresher-on-statistical-significance
APPENDIX
R for Data Science by Hadley Wickham, published by O’Reilly. https://r4ds.had.co.nz/index.html
Discovering Statistics Using R by Andy Field, published by SAGE Publications Limited.