Clear Data
rm(list = ls()) # Remove all objects from your environment
gc() # Clear unused memory
##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 526490 28.2    1169547 62.5         NA   669420 35.8
## Vcells 970770  7.5    8388608 64.0      16384  1851931 14.2
cat("\f") # Clear the console
graphics.off() # Clear all graphs
Do a few Google searches and tell us what correlation is (5 lines max).
Correlation is a standardized measure between -1 and 1 that indicates the strength and direction of the linear relationship between two variables.
1 indicates a perfect positive linear relationship
-1 indicates a perfect negative linear relationship
0 indicates no linear relationship
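As a small illustration of the definition above, the sketch below computes the Pearson correlation by hand (covariance divided by the product of the standard deviations) and compares it with R's built-in `cor()`. The vectors `x` and `y` are made-up values for illustration, not taken from the pizza data.

```r
# Toy vectors (made up for illustration; not from the pizza data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson correlation: covariance divided by the product of the standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)

# A perfectly linear relationship sits at the ends of the scale
r_pos <- cor(x, 2 * x + 1)  # increasing line
r_neg <- cor(x, -3 * x)     # decreasing line
print(c(r_pos = r_pos, r_neg = r_neg))
```

The two hand-built lines land at the positive and negative ends of the scale, while the noisier `y` falls somewhere in between.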
Do a few Google searches and tell us what covariance is (5 lines max).
Covariance measures the degree to which two random variables vary together. It can take any value in \((-\infty, +\infty)\), and its sign gives the direction of the relationship. If it is positive, the variables tend to increase or decrease together; if it is negative, one tends to increase while the other decreases.
Unlike correlation, the magnitude of covariance depends on the scales (units) of the variables, so it cannot be used on its own to judge the strength of the relationship.
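The scale dependence can be checked with a short sketch. The heights and weights below are made-up numbers used only for illustration; the same heights are expressed once in metres and once in centimetres.

```r
# Toy data (made up): heights in metres and weights in kilograms
height_m  <- c(1.60, 1.70, 1.75, 1.80, 1.90)
weight_kg <- c(55, 65, 70, 80, 90)

# The same heights expressed in centimetres
height_cm <- height_m * 100

cov_m  <- cov(height_m,  weight_kg)
cov_cm <- cov(height_cm, weight_kg)
cor_m  <- cor(height_m,  weight_kg)
cor_cm <- cor(height_cm, weight_kg)

# Covariance scales with the units; correlation does not
print(c(cov_m = cov_m, cov_cm = cov_cm))
print(c(cor_m = cor_m, cor_cm = cor_cm))
```

Changing the unit multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is the easier of the two to compare across variables.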
Try merging any datasets that interest you based on the data dictionary (pay attention to the unique keys), and create a meaningful dataset that has an interesting y (outcome) and an interesting x (independent variable).
library(readr)
setwd("~/Desktop/Data Analysis/Discussion 12 - Merging Data & correlation and covariance")
orders <- read_csv("order_details.csv")
## Rows: 48620 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): pizza_id
## dbl (3): order_details_id, order_id, quantity
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pizzas <- read.csv("pizzas.csv")
# Merge data
pizza_merged <- merge(pizzas, orders, by = "pizza_id", all = TRUE)
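Before relying on a merge like the one above, it can help to confirm that the key is unique in the lookup table and to see which keys have no match. The sketch below uses two hypothetical miniature tables (`pizzas_toy` and `orders_toy`, with made-up values) rather than the real CSV files. With `all = TRUE`, `merge()` performs a full outer join, so unmatched rows are kept and their missing columns are filled with `NA`.

```r
# Hypothetical miniature versions of the two tables (values are made up)
pizzas_toy <- data.frame(
  pizza_id = c("bbq_s", "bbq_m", "veg_s"),
  price    = c(12.75, 16.75, 12.00)
)
orders_toy <- data.frame(
  order_details_id = 1:4,
  pizza_id         = c("bbq_s", "bbq_s", "bbq_m", "bbq_m"),
  quantity         = c(1, 2, 1, 1)
)

# The key should be unique in the lookup table (one row per pizza)
key_is_unique <- anyDuplicated(pizzas_toy$pizza_id) == 0

# Pizzas that never appear in any order
never_ordered <- setdiff(pizzas_toy$pizza_id, orders_toy$pizza_id)

# Full outer join keeps unmatched pizzas (with NA order columns);
# the default inner join drops them
merged_all   <- merge(pizzas_toy, orders_toy, by = "pizza_id", all = TRUE)
merged_inner <- merge(pizzas_toy, orders_toy, by = "pizza_id")

print(key_is_unique)
print(never_ordered)
print(nrow(merged_all))
print(nrow(merged_inner))
print(sum(is.na(merged_all$quantity)))
```

Whether to keep the unmatched rows depends on the question being asked; for statistics on orders, the rows with `NA` order columns have to be dropped or handled explicitly.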
Create a summary statistics table of the merged dataset.
#install.packages("stargazer")
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(pizza_merged,
          type = "text",
          title = "Summary")
##
## Summary
## ==========================================================
## Statistic          N       Mean     St. Dev.   Min   Max
## ----------------------------------------------------------
## price            48,625   16.494     3.622    9.750 35.950
## order_details_id 48,620 24,310.500 14,035.530   1   48,620
## order_id         48,620 10,701.480 6,180.120    1   21,350
## quantity         48,620   1.020      0.143      1     4
## ----------------------------------------------------------
summary(pizza_merged)
##    pizza_id         pizza_type_id          size               price
##  Length:48625       Length:48625       Length:48625       Min.   : 9.75
##  Class :character   Class :character   Class :character   1st Qu.:12.75
##  Mode  :character   Mode  :character   Mode  :character   Median :16.50
##                                                           Mean   :16.49
##                                                           3rd Qu.:20.25
##                                                           Max.   :35.95
##
##  order_details_id    order_id        quantity
##  Min.   :    1     Min.   :    1   Min.   :1.00
##  1st Qu.:12156     1st Qu.: 5337   1st Qu.:1.00
##  Median :24310     Median :10682   Median :1.00
##  Mean   :24310     Mean   :10701   Mean   :1.02
##  3rd Qu.:36465     3rd Qu.:16100   3rd Qu.:1.00
##  Max.   :48620     Max.   :21350   Max.   :4.00
##  NA's   :5         NA's   :5       NA's   :5
Pick any two quantitative variables from the dataset that interest you. Run a correlation (which measures the strength of the linear relationship) between the two variables, and run the covariance between the two variables. Interpret.
# Create sales field
pizza_merged$sales <- pizza_merged$price * pizza_merged$quantity
# Stats
correlation <- cor(pizza_merged$quantity,
                   pizza_merged$sales,
                   use = "complete.obs")
covariance <- cov(pizza_merged$quantity,
                  pizza_merged$sales,
                  use = "complete.obs")
# Print
print(correlation)
## [1] 0.5419262
print(covariance)
## [1] 0.3440633
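With a correlation of 0.54 and a covariance of 0.34, sales and quantity have a positive relationship: as the quantity of pizzas sold increases, so do sales. The correlation is the more informative of the two, since it indicates a moderately strong linear relationship. The covariance confirms only the direction, because its magnitude depends on the units of the two variables. Part of this relationship holds by construction, since sales is defined as price times quantity.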
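Because sales is computed as price times quantity, some positive correlation between quantity and sales is expected even when price and quantity are unrelated. The simulation below is a rough sketch of that effect. The price range, the quantity values, and the sample size are assumptions chosen for illustration and are not estimated from the pizza data.

```r
set.seed(1)  # make the simulation reproducible

n <- 1000
# Simulated, independent inputs (assumed ranges, not the real data)
price    <- runif(n, min = 10, max = 20)
quantity <- sample(1:4, size = n, replace = TRUE)

# Same construction as in the analysis: sales = price * quantity
sales <- price * quantity

cor_price_quantity <- cor(price, quantity)  # near zero: inputs are independent
cor_quantity_sales <- cor(quantity, sales)  # clearly positive by construction

print(cor_price_quantity)
print(cor_quantity_sales)
```

A more informative follow-up on the real data might be the correlation between `price` and `quantity`, since neither variable is built from the other.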