Dataset Description

Structure of dataset

'data.frame':   1836 obs. of  10 variables:
 $ id           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ country      : chr  "India" "India" "India" "India" ...
 $ state        : chr  "Andhra_Pradesh" "Andhra_Pradesh" "Andhra_Pradesh" "Andhra_Pradesh" ...
 $ city         : chr  "Amaravati" "Amaravati" "Amaravati" "Amaravati" ...
 $ station      : chr  "Secretariat, Amaravati - APPCB" "Secretariat, Amaravati - APPCB" "Secretariat, Amaravati - APPCB" "Secretariat, Amaravati - APPCB" ...
 $ pollutant_id : chr  "PM2.5" "PM10" "NO2" "NH3" ...
 $ last_update  : chr  "21-10-2021 01:00:00" "21-10-2021 01:00:00" "21-10-2021 01:00:00" "21-10-2021 01:00:00" ...
 $ pollutant_min: int  69 82 10 4 16 15 4 47 49 11 ...
 $ pollutant_max: int  109 138 42 5 42 45 82 111 120 44 ...
 $ pollutant_avg: int  86 105 19 4 27 32 42 71 86 23 ...

Summary

       id           country             state               city          
 Min.   :   1.0   Length:1836        Length:1836        Length:1836       
 1st Qu.: 459.8   Class :character   Class :character   Class :character  
 Median : 918.5   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 918.5                                                           
 3rd Qu.:1377.2                                                           
 Max.   :1836.0                                                           
                                                                          
   station          pollutant_id       last_update        pollutant_min   
 Length:1836        Length:1836        Length:1836        Min.   :  1.00  
 Class :character   Class :character   Class :character   1st Qu.:  5.00  
 Mode  :character   Mode  :character   Mode  :character   Median : 14.00  
                                                          Mean   : 28.41  
                                                          3rd Qu.: 39.00  
                                                          Max.   :217.00  
                                                          NA's   :98      
 pollutant_max    pollutant_avg  
 Min.   :  1.00   Min.   :  1.0  
 1st Qu.: 21.00   1st Qu.: 12.0  
 Median : 63.00   Median : 31.0  
 Mean   : 96.87   Mean   : 54.1  
 3rd Qu.:124.00   3rd Qu.: 70.0  
 Max.   :500.00   Max.   :314.0  
 NA's   :98       NA's   :98     

Dataset visualization

Bar plot to compare average pollutant levels by state

Boxplot of pollutant_avg for each state

Boxplot of pollutant_avg by pollutant type

Histogram visualization

Histogram of PM2.5 pollutant average values

Histogram of PM2.5 pollutant minimum values

Histogram of PM2.5 pollutant maximum values

Boxplot visualization

Boxplot of PM2.5 pollutant average values

Boxplot of PM2.5 pollutant minimum values

Boxplot of PM2.5 pollutant maximum values

Scatterplot visualization

Scatter plot of pollutant_min vs pollutant_max for PM2.5

Scatter plot of pollutant_avg vs pollutant_min for PM2.5

Scatter plot of pollutant_avg vs pollutant_max for PM2.5

Linechart visualization

Line plot of PM2.5 pollutant averages over time

---
title: "Air Quality Dashboard"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: scroll
    theme: cosmo
    social: menu
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dplyr)
library(ggplot2)
library(DT)
```

## Dataset Description {.tabset}

### Structure of dataset 

```{r}
# Load the dataset
air_quality <- read.csv("Air_Quality.csv")

# View the structure of the dataset
str(air_quality)
```

### Summary

```{r}
# Summary statistics of the dataset (note the 98 NA values in each pollutant column)
summary(air_quality)

# Replace missing values with the column median (computed across all pollutant types)
air_quality$pollutant_min[is.na(air_quality$pollutant_min)] <- median(air_quality$pollutant_min, na.rm = TRUE)
air_quality$pollutant_max[is.na(air_quality$pollutant_max)] <- median(air_quality$pollutant_max, na.rm = TRUE)
air_quality$pollutant_avg[is.na(air_quality$pollutant_avg)] <- median(air_quality$pollutant_avg, na.rm = TRUE)


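# Remove duplicate rows, ignoring id: id is unique per row,
# so a plain distinct() would never drop anything
air_quality <- air_quality %>% distinct(across(-id), .keep_all = TRUE)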
```
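
The chunk above fills every missing value with one median taken over all pollutant types, so a missing PM2.5 reading and a missing CO reading receive the same number. One possible alternative, shown here only as a sketch on made-up data, is to take the median within each `pollutant_id`. The `toy` data frame and its values are hypothetical and are not drawn from `Air_Quality.csv`.

```r
library(dplyr)

# Hypothetical toy data standing in for air_quality (values are illustrative only)
toy <- data.frame(
  pollutant_id  = c("PM2.5", "PM2.5", "PM2.5", "CO", "CO", "CO"),
  pollutant_avg = c(80, NA, 100, 10, 20, NA)
)

# Fill each NA with the median of its own pollutant group
toy_imputed <- toy %>%
  group_by(pollutant_id) %>%
  mutate(pollutant_avg = ifelse(is.na(pollutant_avg),
                                median(pollutant_avg, na.rm = TRUE),
                                pollutant_avg)) %>%
  ungroup()

toy_imputed$pollutant_avg
```

In this toy case the missing PM2.5 value becomes 90 and the missing CO value becomes 15, whereas a single overall median would assign 50 to both. Whether group-wise imputation suits the real data depends on how the 98 missing rows are distributed across pollutants.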

## Dataset visualization {.tabset}

### Bar plot to compare average pollutant levels by state

```{r}
# Aggregate data to get the mean pollutant_avg for each state
state_pollution_avg <- air_quality %>% 
  group_by(state) %>% 
  summarise(mean_pollutant_avg = mean(pollutant_avg, na.rm = TRUE))

# Bar plot to compare average pollutant levels by state
ggplot(state_pollution_avg, aes(x = reorder(state, mean_pollutant_avg), y = mean_pollutant_avg)) + 
  geom_bar(stat = "identity", fill = "lightblue") + 
  coord_flip() +  # Flip coordinates to make the plot more readable
  labs(title = "Average Pollutant Levels by State", x = "State", y = "Average Pollutant Levels") +
  theme_minimal()
```

### Boxplot of pollutant_avg for each state
```{r}
ggplot(air_quality, aes(x = state, y = pollutant_avg)) + 
  geom_boxplot(fill = "lightgreen") +
  coord_flip() +  # Flip coordinates for better readability
  labs(title = "Boxplot of Pollutant Levels by State", x = "State", y = "Pollutant Average") +
  theme_minimal()  # after coord_flip() the state names sit on the vertical axis, so no label rotation is needed
```

### Boxplot of pollutant_avg by pollutant type
```{r}
ggplot(air_quality, aes(x = pollutant_id, y = pollutant_avg)) + 
  geom_boxplot(fill = "lightgreen") +
  coord_flip() + 
  labs(title = "Boxplot of Pollutant Levels by Pollutant Type", x = "Pollutant", y = "Pollutant Level") +
  theme_minimal()
```

## Histogram visualization {.tabset}

### Histogram of PM2.5 pollutant average values
```{r}
# Subset data for a specific pollutant, e.g., PM2.5
subset_PM25 <- air_quality %>% filter(pollutant_id == "PM2.5")

ggplot(subset_PM25, aes(x = pollutant_avg)) + 
  geom_histogram(binwidth = 10, fill = "lightgreen", color = "black") +
  labs(title = "Histogram of PM2.5 Pollutant Averages", x = "Pollutant Average", y = "Frequency")

```

### Histogram of PM2.5 pollutant minimum values
```{r}
ggplot(subset_PM25, aes(x = pollutant_min)) + 
  geom_histogram(binwidth = 10, fill = "lightcoral", color = "black") +
  labs(title = "Histogram of PM2.5 Pollutant Minimum Values", x = "Pollutant Min", y = "Frequency")

```

### Histogram of PM2.5 pollutant maximum values
```{r}
ggplot(subset_PM25, aes(x = pollutant_max)) + 
  geom_histogram(binwidth = 10, fill = "lightblue", color = "black") +
  labs(title = "Histogram of PM2.5 Pollutant Maximum Values", x = "Pollutant Max", y = "Frequency")

```

## Boxplot visualization {.tabset}

### Boxplot of PM2.5 pollutant average values
```{r}
ggplot(subset_PM25, aes(y = pollutant_avg)) + 
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of PM2.5 Pollutant Averages", y = "Pollutant Average")

```

### Boxplot of PM2.5 pollutant minimum values
```{r}
ggplot(subset_PM25, aes(y = pollutant_min)) + 
  geom_boxplot(fill = "lightpink") +
  labs(title = "Boxplot of PM2.5 Pollutant Minimum Values", y = "Pollutant Min")

```

### Boxplot of PM2.5 pollutant maximum values
```{r}
ggplot(subset_PM25, aes(y = pollutant_max)) + 
  geom_boxplot(fill = "lightyellow") +
  labs(title = "Boxplot of PM2.5 Pollutant Maximum Values", y = "Pollutant Max")

```

## Scatterplot visualization {.tabset}

### Scatter plot of pollutant_min vs pollutant_max for PM2.5
```{r}
ggplot(subset_PM25, aes(x = pollutant_min, y = pollutant_max)) + 
  geom_point(color = "blue") +
  labs(title = "Scatter Plot of PM2.5 Min vs Max", x = "Pollutant Min", y = "Pollutant Max")

```

### Scatter plot of pollutant_avg vs pollutant_min for PM2.5
```{r}
ggplot(subset_PM25, aes(x = pollutant_min, y = pollutant_avg)) + 
  geom_point(color = "darkgreen") +
  labs(title = "Scatter Plot of PM2.5 Avg vs Min", x = "Pollutant Min", y = "Pollutant Avg")

```

### Scatter plot of pollutant_avg vs pollutant_max for PM2.5
```{r}
ggplot(subset_PM25, aes(x = pollutant_max, y = pollutant_avg)) + 
  geom_point(color = "orange") +
  labs(title = "Scatter Plot of PM2.5 Avg vs Max", x = "Pollutant Max", y = "Pollutant Avg")

```

## Linechart visualization {.tabset}

### Line plot of PM2.5 pollutant averages over time
```{r}
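# last_update is read as text ("dd-mm-yyyy HH:MM:SS"); parse it so the x-axis is a true time scale
pm25_time <- subset_PM25 %>%
  mutate(last_update = as.POSIXct(last_update, format = "%d-%m-%Y %H:%M:%S"))

ggplot(pm25_time, aes(x = last_update, y = pollutant_avg)) + 
  geom_line(color = "purple") +
  labs(title = "PM2.5 Average Over Time", x = "Date", y = "Pollutant Average") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))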

```
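
`str()` reports `last_update` as a character column with values such as "21-10-2021 01:00:00", so ggplot2 treats it as a discrete axis unless it is converted to a date-time. The sketch below shows how the day-month-year layout can be parsed with base R. The second timestamp and the choice of `tz = "UTC"` are assumptions made for illustration; the source does not state the time zone of the data.

```r
# Sample strings in the same day-month-year layout as last_update
# (the second value is made up for illustration)
stamps <- c("21-10-2021 01:00:00", "21-10-2021 02:00:00")

# Parse with an explicit format; tz = "UTC" is an assumption
parsed <- as.POSIXct(stamps, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

class(parsed)                                    # date-time class instead of character
difftime(parsed[2], parsed[1], units = "hours")  # gap between the two timestamps
```

Passing the format explicitly matters here because the default parser expects year-first strings and would not read "21-10-2021" as 21 October 2021.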