library(RCurl)
library(Hmisc)
library(ggplot2)
Is there a correlation in diamond value with respect to it’s carat amount and cut quality?
This dataset contains the prices and various attributes of almost 54,000 diamonds. Specifically, it is a data frame with 53940 rows and 10 variables. Some characteristics specified for each row are price[in U.S Dollars], carat, cut quality, color, clarity, total depth percentage and table.
This classic dataset has been found on the Kaggle site.
This is an observational study since we are trying to infer from already collected data.No specific experiment is being conducted data is being observed only.
Original data is from Kaggle site: https://www.kaggle.com/shivam2503/diamonds
The response variable is the diamond value and it is quantitative.
Carat amount is the quantitative independent variable Cut quality is the qualitative independent variable
Provide summary statistics for each the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
my_data <- getURL("https://raw.githubusercontent.com/johnm1990/DATA607/master/diamonds.csv")
diamond_info <- read.csv(text=my_data)
diamond_df <- data.frame(diamond_info)
summary(diamond_df$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 326 950 2401 3933 5324 18823
summary(diamond_df$carat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2000 0.4000 0.7000 0.7979 1.0400 5.0100
summary(diamond_df$cut)
## Length Class Mode
## 53940 character character
boxplot(diamond_df$price, main = "Price")
boxplot(diamond_df$carat ~ diamond_df$cut)