Dataset Description

Summary of dataset

      X.4              X.3              X.2              X.1        
 Min.   :   1.0   Min.   :   1.0   Min.   :   1.0   Min.   :   1.0  
 1st Qu.: 500.8   1st Qu.: 500.8   1st Qu.: 500.8   1st Qu.: 500.8  
 Median :1000.5   Median :1000.5   Median :1000.5   Median :1000.5  
 Mean   :1000.5   Mean   :1000.5   Mean   :1000.5   Mean   :1000.5  
 3rd Qu.:1500.2   3rd Qu.:1500.2   3rd Qu.:1500.2   3rd Qu.:1500.2  
 Max.   :2000.0   Max.   :2000.0   Max.   :2000.0   Max.   :2000.0  
       X                ID        Marital.Status        Gender         
 Min.   :   1.0   Min.   :11000   Length:2000        Length:2000       
 1st Qu.: 500.8   1st Qu.:15291   Class :character   Class :character  
 Median :1000.5   Median :19744   Mode  :character   Mode  :character  
 Mean   :1000.5   Mean   :19966                                        
 3rd Qu.:1500.2   3rd Qu.:24471                                        
 Max.   :2000.0   Max.   :29447                                        
     Income          Children      Education          Occupation       
 Min.   : 10000   Min.   :0.000   Length:2000        Length:2000       
 1st Qu.: 30000   1st Qu.:0.000   Class :character   Class :character  
 Median : 60000   Median :2.000   Mode  :character   Mode  :character  
 Mean   : 56215   Mean   :1.901                                        
 3rd Qu.: 70000   3rd Qu.:3.000                                        
 Max.   :170000   Max.   :5.000                                        
  Home.Owner             Cars       Commute.Distance      Region         
 Length:2000        Min.   :0.000   Length:2000        Length:2000       
 Class :character   1st Qu.:1.000   Class :character   Class :character  
 Mode  :character   Median :1.000   Mode  :character   Mode  :character  
                    Mean   :1.454                                        
                    3rd Qu.:2.000                                        
                    Max.   :4.000                                        
      Age        Purchased.Bike    
 Min.   :25.00   Length:2000       
 1st Qu.:35.00   Class :character  
 Median :43.00   Mode  :character  
 Mean   :44.18                     
 3rd Qu.:52.00                     
 Max.   :89.00                     

First rows of the dataset

  X.4 X.3 X.2 X.1 X    ID Marital.Status Gender Income Children       Education
1   1   1   1   1 1 12496        Married Female  40000        1       Bachelors
2   2   2   2   2 2 24107        Married   Male  30000        3 Partial College
3   3   3   3   3 3 14177        Married   Male  80000        5 Partial College
4   4   4   4   4 4 24381         Single   Male  70000        0       Bachelors
5   5   5   5   5 5 25597         Single   Male  30000        0       Bachelors
6   6   6   6   6 6 13507        Married Female  10000        2 Partial College
      Occupation Home.Owner Cars Commute.Distance  Region Age Purchased.Bike
1 Skilled Manual        Yes    0        0-1 Miles  Europe  42             No
2       Clerical        Yes    1        0-1 Miles  Europe  43             No
3   Professional         No    2        2-5 Miles  Europe  60             No
4   Professional        Yes    1       5-10 Miles Pacific  41            Yes
5       Clerical         No    0        0-1 Miles  Europe  36            Yes
6         Manual        Yes    0        1-2 Miles  Europe  50             No

Structure of the dataset

'data.frame':   2000 obs. of  18 variables:
 $ X.4             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ X.3             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ X.2             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ X.1             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ X               : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ID              : int  12496 24107 14177 24381 25597 13507 27974 19364 22155 19280 ...
 $ Marital.Status  : chr  "Married" "Married" "Married" "Single" ...
 $ Gender          : chr  "Female" "Male" "Male" "Male" ...
 $ Income          : num  40000 30000 80000 70000 30000 10000 160000 40000 20000 60000 ...
 $ Children        : int  1 3 5 0 0 2 2 1 2 2 ...
 $ Education       : chr  "Bachelors" "Partial College" "Partial College" "Bachelors" ...
 $ Occupation      : chr  "Skilled Manual" "Clerical" "Professional" "Professional" ...
 $ Home.Owner      : chr  "Yes" "Yes" "No" "Yes" ...
 $ Cars            : num  0 1 2 1 0 0 4 0 2 1 ...
 $ Commute.Distance: chr  "0-1 Miles" "0-1 Miles" "2-5 Miles" "5-10 Miles" ...
 $ Region          : chr  "Europe" "Europe" "Europe" "Pacific" ...
 $ Age             : int  42 43 60 41 36 50 33 43 58 43 ...
 $ Purchased.Bike  : chr  "No" "No" "No" "Yes" ...

Univariate

Histogram of income

Histogram of Age:

Density Plot of Income:

Bivariate

Bar Plot of Cars by Gender:

Scatter Plot of Income by Row Index

Scatter Plot of Age vs. Gender:

Scatter Plot of Age vs. Income:

Multivariate

Scatter Plot with Color Gradient:

Line Plot of Age vs. Occupation

Heatmap:

---
title: "Assignment-2"
output: 
  flexdashboard::flex_dashboard:
    orientation: row
    vertical_layout: scroll
    source_code: embed
    theme: spacelab
    social: menu
---

```{r}
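# reshape2 supplies melt(); tidyverse already attaches ggplot2 and dplyr.
# ggvis and corrplot are loaded but not used by any chunk below.
library(reshape2)
library(ggvis)
library(tidyverse)
library(ggplot2)
library(corrplot)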
```

## Dataset Description {.tabset}

### Summary of dataset
```{r}
# Read the cleaned data; empty strings are treated as missing values
bike_buyers <- read.csv("bike_buyers_clean.csv", header = TRUE, na.strings = "")
summary(bike_buyers)

```

### First rows of the dataset
```{r}
head(bike_buyers)
```

### Structure of the dataset
```{r}
 
# Column types and the first few values of each variable
str(bike_buyers)

```
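
The `X.4` to `X` columns in the summary and `str()` output only count rows from 1 to 2000. This pattern usually appears when a data frame is written with `write.csv()` using its default row names and then read back, though that origin is an assumption here. The sketch below shows one way to detect and drop such columns. It uses a small stand-in data frame with made-up values, and `is_row_index` is a hypothetical helper, not part of the original analysis.

```{r}
# Toy stand-in for bike_buyers (hypothetical values, same naming pattern)
demo <- data.frame(
  X.1 = 1:3,
  X   = 1:3,
  ID  = c(12496, 24107, 14177),
  Age = c(42, 43, 60)
)

# Flag columns named X, X.1, X.2, ... whose values are exactly 1..nrow(df)
is_row_index <- function(df) {
  name_matches <- grepl("^X(\\.[0-9]+)?$", names(df))
  counts_rows <- vapply(
    df,
    function(col) {
      is.numeric(col) &&
        isTRUE(all.equal(as.numeric(col), as.numeric(seq_len(nrow(df)))))
    },
    logical(1)
  )
  name_matches & counts_rows
}

# Keep every column that is not a leftover row index
demo_clean <- demo[, !is_row_index(demo), drop = FALSE]
names(demo_clean)
```

Checking the values as well as the names avoids dropping a genuine variable that happens to be called `X`. Writing the file with `write.csv(..., row.names = FALSE)` prevents the extra column from being created at all.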

## Univariate {.tabset}

### Histogram of income
```{r}
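# Label the plot explicitly instead of the default "bike_buyers$Income"
hist(bike_buyers$Income, main = "Histogram of Income", xlab = "Income")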

```

### Histogram of Age:
```{r}
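# Label the plot explicitly instead of the default "bike_buyers$Age"
hist(bike_buyers$Age, main = "Histogram of Age", xlab = "Age")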
```

### Density Plot of Income:
```{r}
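# Kernel density estimate of income; na.rm = TRUE guards against missing values
plot(density(bike_buyers$Income, na.rm = TRUE),
     main = "Income Density Spread", xlab = "Income")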
```


## Bivariate {.tabset}

### Bar Plot of Cars by Gender:
```{r}
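# Cross-tabulate number of cars (rows) by gender (columns)
counts <- table(bike_buyers$Cars, bike_buyers$Gender)
# Stacked bars: one bar per gender, segments by number of cars owned
barplot(counts, main = "Cars Owned by Gender",
        xlab = "Gender", ylab = "Count",
        legend.text = rownames(counts),
        args.legend = list(title = "Cars"))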
```

### Scatter Plot of Income by Row Index
```{r}
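# With a single vector, plot() draws the values against their row index
plot(bike_buyers$Income, type = "p", xlab = "Row index", ylab = "Income")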
```

### Scatter Plot of Age vs. Gender:
```{r}
ggplot(bike_buyers, aes(y = Age, x = Gender)) +
  geom_point()


```

### Scatter Plot of Age vs. Income:
```{r}
ggplot(bike_buyers, aes(y = Age, x = Income)) +
  geom_point()



```



## Multivariate {.tabset}

### Scatter Plot with Color Gradient:
```{r}
# Age on x, Income on y; jitter horizontally to reduce overplotting
p3 <- ggplot(bike_buyers, aes(x = Age, y = Income)) +
  theme(legend.position = "top", axis.text = element_text(size = 6))
p4 <- p3 + geom_point(aes(color = Age), alpha = 0.5, size = 1.5,
                      position = position_jitter(width = 0.25, height = 0))
# Age is continuous, so use a continuous x scale labelled with the mapped variable
p4 + scale_x_continuous(name = "Age") +
  scale_color_gradient(name = "Age", low = "blue", high = "red")

```

### Line Plot of Age vs. Occupation
```{r}
# One line per occupation across ages, coloured by age, one panel per gender
p5 <- ggplot(bike_buyers, aes(x = Age, y = Occupation))
p5 + geom_line(aes(color = Age)) + facet_wrap(~Gender)
```



### Heatmap:
```{r}
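# Select numeric variables, dropping the row-index columns (X, X.1, ...) and ID,
# which are identifiers rather than measurements
numeric_vars <- bike_buyers %>%
  select(where(is.numeric), -matches("^X(\\.[0-9]+)?$"), -ID)

# Compute the correlation matrix
corr_matrix <- cor(numeric_vars, use = "complete.obs")

# Melt the correlation matrix for ggplot
melted_corr_matrix <- melt(corr_matrix)

# Create the heatmap; a diverging scale centred on 0 separates
# negative from positive correlations
ggplot(melted_corr_matrix, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(title = "Correlation Heatmap", x = "Variable", y = "Variable",
       fill = "Correlation")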
```