Dataset Description

View of dataset

Dataset Description

Dataset Structure

tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

Summary

     carat               cut        color        clarity          depth      
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00  
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00  
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80  
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75  
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50  
 Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00  
                                    J: 2808   (Other): 2531                  
     table           price             x                y         
 Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000  
 1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720  
 Median :57.00   Median : 2401   Median : 5.700   Median : 5.710  
 Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735  
 3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540  
 Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900  
                                                                  
       z         
 Min.   : 0.000  
 1st Qu.: 2.910  
 Median : 3.530  
 Mean   : 3.539  
 3rd Qu.: 4.040  
 Max.   :31.800  
                 

Univariate Analysis

Histogram of Carat

Histogram of Price

Cut Quality

Color

Clarity

Carat vs. Price colored by Cut

Price distribution across Cuts and Colors

Price distribution across Clarity and Colors

Boxplot of Diamond Prices by Color

Boxplot of Diamond Prices by Cut

---
title: "Exploratory Data Analysis of the Diamonds Data Set"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: scroll
    theme: journal
    social: menu
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(MASS)
library(DT)
library(ggplot2)
attach(diamonds)

```


## Dataset Description {.tabset}
### View of dataset
```{r}
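# Show the diamonds data with export buttons (the DT option name is lowercase `buttons`)
datatable(diamonds, extensions = "Buttons", options = list(dom = "Bfrtip", buttons = c("copy", "print", "pdf", "csv")))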
```

## Dataset Description {.tabset}
### Dataset Structure

```{r}
    str(diamonds)
```

### Summary

```{r}
summary(diamonds)

```
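
The summary above reports a minimum of 0 for `x`, `y`, and `z`, which is not a plausible physical dimension. A minimal sketch for counting those rows, assuming that a zero marks a missing measurement (an assumption, not something stated in this dashboard); `zero_dims` and `diamonds_clean` are hypothetical names that no other chunk uses:

```{r}
library(ggplot2)  # provides the diamonds data set

# Rows where at least one dimension is recorded as zero
# (assumed here to be missing measurements)
zero_dims <- subset(diamonds, x == 0 | y == 0 | z == 0)
nrow(zero_dims)

# Hypothetical cleaned copy; the plots in this dashboard still use diamonds
diamonds_clean <- subset(diamonds, x > 0 & y > 0 & z > 0)
nrow(diamonds_clean)
```

Whether to drop or keep such rows depends on the analysis; the price-based plots do not use `x`, `y`, or `z`, so they are unaffected either way.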

## Univariate Analysis {.tabset}
### Histogram of Carat

```{r}
ggplot(diamonds, aes(carat)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  # Dashed reference lines: mean in red, median in green
  geom_vline(xintercept = mean(diamonds$carat), color = "red", linetype = "dashed") +
  geom_vline(xintercept = median(diamonds$carat), color = "green", linetype = "dashed") +
  labs(title = "Histogram of Carat",
       x = "Carat Value",
       y = "Count")

```

### Histogram of Price

```{r}
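ggplot(diamonds, aes(price)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  # Dashed reference lines: mean in red, median in green
  geom_vline(xintercept = mean(diamonds$price), color = "red", linetype = "dashed") +
  geom_vline(xintercept = median(diamonds$price), color = "green", linetype = "dashed") +
  labs(title = "Histogram of Price",
       x = "Price",
       y = "Frequency") +
  # Zoom to 0-10,000 without dropping the rows above 10,000 from the binning
  coord_cartesian(xlim = c(0, 10000))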
```
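
The summary reports a mean price (3933) well above the median (2401), so the distribution is right-skewed. One possible alternative view is sketched below with a log10 x-axis; `price_stats` and `p_log` are hypothetical names, and the bin count of 50 is an arbitrary choice:

```{r}
library(ggplot2)  # provides the diamonds data set

# Mean above median is a quick indication of right skew
price_stats <- c(mean = mean(diamonds$price), median = median(diamonds$price))
price_stats

# Same histogram on a log10 x-axis, which spreads out the low-price bulk
p_log <- ggplot(diamonds, aes(price)) +
  geom_histogram(bins = 50, fill = "skyblue", color = "black") +
  scale_x_log10() +
  labs(title = "Histogram of Price (log10 scale)",
       x = "Price (log10 scale)",
       y = "Count")
p_log
```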

### Cut Quality
```{r}
  ggplot(diamonds, aes(x = cut)) +
  geom_bar(fill = "skyblue", color = "black") +  # Basic barplot with color
  labs(title = "Count of Diamonds by Cut Quality",
       x = "Cut Quality",
       y = "Count")

```


### Color
```{r}
  ggplot(diamonds, aes(x = color)) +
  geom_bar(fill = "skyblue", color = "black") +  # Basic barplot with color
  labs(title = "Count of Diamonds by Color",
       x = "Color",
       y = "Count")
```

### Clarity
```{r}
  ggplot(diamonds, aes(x = clarity)) +
  geom_bar(fill = "skyblue", color = "black") +  # Basic barplot with color
  labs(title = "Count of Diamonds by Clarity",
       x = "Clarity",
       y = "Count")

```

###  Carat vs. Price colored by Cut
```{r}
  ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.5) +
  labs(title = "Carat vs. Price colored by Cut")

```

###  Price distribution across Cuts and Colors
```{r}
ggplot(diamonds, aes(x = cut, y = price, fill = color)) +
  geom_boxplot() +
  labs(title = "Price distribution across Cuts and Colors")

```

###  Price distribution across Clarity and Colors
```{r}
  ggplot(diamonds,aes(x=clarity,y=price,fill=color)) +
  geom_boxplot()+
  labs(title = "Price distribution across Clarity and Colors")

```

###  Boxplot of Diamond Prices by Color
```{r}
  ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +  # Create the boxplot
  labs(title = "Boxplot of Diamond Prices by Color",
       x = "Color",
       y = "Price (in USD)")
```

###  Boxplot of Diamond Prices by Cut
```{r}
  ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +  # Create the boxplot
  labs(title = "Boxplot of Diamond Prices by Cut",
       x = "Cut Quality",
       y = "Price (in USD)")

```
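
The boxplots compare price distributions visually; the group medians they draw can also be printed as numbers. A minimal sketch using base R `tapply()`, with `median_price_by_cut` as a hypothetical name:

```{r}
library(ggplot2)  # provides the diamonds data set

# Median price within each cut level, in the factor's own order (Fair to Ideal)
median_price_by_cut <- tapply(diamonds$price, diamonds$cut, median)
median_price_by_cut
```

The same call with `diamonds$color` in place of `diamonds$cut` would give the medians behind the color boxplot.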