---
title: "**Creating a foundational ggplot visualization.**"
author: Miguelangel Palma Rojas
output:
  html_notebook:
    df_print: paged
    toc: true
    toc_float: true
    toc_depth: 5
---

Communicating insights through graphics is a central part of data analysis. One of the most powerful tools for this is the `ggplot2` package in R, which builds sophisticated, aesthetically pleasing visualizations that convey complex data in an accessible manner.

A foundational ggplot visualization serves as the starting point for exploring and presenting data. It is built with the `ggplot()` function, which takes a data frame and, through `aes()`, maps variables to aesthetics such as the x and y axes, color, and shape. Geoms such as `geom_bar()`, `geom_histogram()`, `geom_point()`, and `geom_boxplot()` are then added as layers to draw the data. Customizing axis labels, titles, and colors makes the result clearer and easier to interpret.

Throughout this introduction, we will delve into the essential steps for creating these foundational visualizations, using the **mtcars** dataset as a practical example. 

### **1. Libraries**

```{r setup , message=FALSE , warning=FALSE}
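library(ggplot2)    # plotting; also attached by tidyverse, loaded explicitly for clarity
library(tidyverse)  # dplyr verbs (mutate, %>%) and glimpse()
library(plotly)     # interactive graphics; not used in the examples below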
```

### **2. Dataset**

The **mtcars** dataset ships with R. It was extracted from the 1974 Motor Trend US magazine and records fuel consumption and ten aspects of design and performance for 32 automobiles (1973-74 models). Because it is small and mixes continuous and categorical variables, it is widely used for teaching statistical analysis and visualization.


```{r}
mtcars
```
The **mtcars** dataset contains data on 32 car models and 11 variables related to their performance and characteristics. The variables include:

- **mpg**: Miles per gallon (fuel efficiency).
- **cyl**: Number of cylinders.
- **disp**: Displacement (in cubic inches).
- **hp**: Gross horsepower.
- **drat**: Rear axle ratio.
- **wt**: Weight (in 1000 lbs).
- **qsec**: 1/4 mile time (in seconds).
- **vs**: Engine (0 = V-shaped, 1 = Straight).
- **am**: Transmission (0 = automatic, 1 = manual).
- **gear**: Number of forward gears.
- **carb**: Number of carburetors.


### **3. Explore data**

```{r}
glimpse(mtcars)
```

###  **4. Create Basic Charts with ggplot2**


#### _4.1 Bar chart_

Bar charts display categorical data: each bar's height shows the count (or another summary) for one category, which makes groups quick to compare. Their simplicity explains why they are widely used in fields such as economics, the social sciences, and business.

```{r}

# qplot() is deprecated since ggplot2 3.4.0, so the chart is built with ggplot()
ggplot(mtcars, aes(x = factor(cyl))) +  # factor() makes cyl a discrete axis
  geom_bar(fill = "red", colour = "black") +
  labs(
    title = "Cylinders in mtcars",
    x = "Cylinders",
    y = "Number of Vehicles"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14)
  )

```

The bar chart shows the number of vehicles in the `mtcars` dataset for each cylinder count. The horizontal axis (x-axis) shows the cylinder categories (4, 6, and 8), while the vertical axis (y-axis) shows the count of vehicles: 11, 7, and 14 respectively.


#### _4.2 Histograms_

Histograms show the distribution of a numeric variable by counting how many observations fall into each of a series of ranges (bins). They reveal skewness, central tendency, spread, and possible outliers. The choice of bin width affects the picture, so it is worth trying more than one value.

```{r}

# qplot() is deprecated since ggplot2 3.4.0, so the histogram is built with ggplot()
ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 20, fill = "blue", colour = "black") +  # 20-hp-wide bins
  labs(
    title = "Histogram of Horsepower",
    x = "Horsepower",
    y = "Number of Cars"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14)
  )


```
The histogram shows how the 32 cars are distributed by horsepower. The horizontal axis (x-axis) represents horsepower in bins 20 hp wide; the data run from 52 to 335 hp. The vertical axis (y-axis) shows the number of cars in each bin. Axis labels such as 2.5 and 7.5 are only tick positions: every bar height is a whole number.

The tallest bar is centred near 100 horsepower, where nine cars have between 91 and 110 hp. A second cluster appears around 175 to 180 horsepower, which accounts for six cars.

Overall, just over half of the cars (17 of 32) have less than 150 horsepower, and the counts thin out toward the high end: only two cars exceed 250 hp.
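
As a rough cross-check on the reading above, the counts can be tabulated directly. This is a minimal sketch: the break points (`seq(50, 350, by = 20)`) and the names `hp_bins` and `hp_counts` are assumptions chosen for illustration, and they need not coincide with the bin edges ggplot2 picks for the plot.

```{r}
# Assumed breaks: 20-hp-wide intervals (50,70], (70,90], ..., (330,350]
hp_bins <- cut(mtcars$hp, breaks = seq(50, 350, by = 20))

# Number of cars in each interval
hp_counts <- table(hp_bins)
hp_counts

# Five-number summary plus mean, for the overall range and center
summary(mtcars$hp)
```

Changing the `by` value in `seq()` is a quick way to see how sensitive the apparent peaks are to the bin width.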


#### _4.3 Pie Chart_ 

Pie charts show how a whole divides into categories, with each slice's angle proportional to that category's share. They work best for a handful of categories and are a popular choice in business, marketing, and the social sciences. ggplot2 has no dedicated pie geom: a pie chart is a single stacked bar drawn in polar coordinates with `coord_polar()`. This section therefore first converts `cyl` to a factor and draws a bar chart of the counts; the pie chart itself is built in section 4.4 from the stacked bar.

```{r}
# Convert cyl to factor 

mtcars <- mtcars %>%
  mutate(cyl_factor = as.factor(cyl))

```

In this code, the `mutate()` function from the `dplyr` package creates a new column called `cyl_factor` by converting the existing numeric `cyl` column into a factor. ggplot2 treats a factor as a discrete variable, so each cylinder count gets its own bar, color, and legend entry instead of a position on a continuous scale.
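
The effect of the conversion can be inspected directly. The sketch below is illustrative only; `cyl_as_factor` is a hypothetical name used so that the example does not depend on the `cyl_factor` column created above.

```{r}
# cyl is stored as a plain number in mtcars
class(mtcars$cyl)

# Converting it to a factor yields one level per distinct cylinder count
cyl_as_factor <- as.factor(mtcars$cyl)
class(cyl_as_factor)
levels(cyl_as_factor)
```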


```{r}

ggplot(data = mtcars, aes(x = cyl_factor, fill = cyl_factor)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Number of Cars by Cylinder Count",
    x = "Number of Cylinders",
    y = "Count of Cars",
    fill = "Cylinders"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )

```
The bar chart illustrates the number of cars based on their cylinder count. The vertical axis represents the **"Count of Cars,"** while the horizontal axis shows the **"Number of Cylinders."** There are three bars, each representing a different cylinder count: 4, 6, and 8.

- The red bar represents cars with 4 cylinders.
- The green bar represents cars with 6 cylinders.
- The blue bar represents cars with 8 cylinders.

The height of each bar indicates the number of cars with that cylinder count. Cars with 8 cylinders are the most common (14 cars), followed by cars with 4 cylinders (11), and then cars with 6 cylinders (7).


#### _4.4 Stacked bar chart_

Stacked bar charts show how a total is composed: each bar is divided into segments, one per sub-group, so the segment heights show how much each sub-group contributes to the whole. With several bars they also allow the composition to be compared across categories. The example below uses a single bar, with an empty string mapped to x, so that all 32 cars are stacked by cylinder count.

```{r}

ggplot(data = mtcars, aes(x = "", fill = cyl_factor)) +
  geom_bar(position = "stack") +
  labs(
    title = "Distribution of Cars by Cylinder Count",
    x = "",
    y = "Count of Cars",
    fill = "Cylinders"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.title.y = element_text(size = 14),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )

```
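
A stacked bar chart becomes more informative when there are several bars to compare. As a hedged illustration, the sketch below stacks cars by number of forward gears within each cylinder count. The choice of `gear` as the sub-group and the name `stacked_plot` are assumptions made for this example, not part of the original analysis.

```{r}
library(ggplot2)

# One bar per cylinder count, split into segments by number of forward gears
stacked_plot <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "stack") +
  labs(
    title = "Cars by Cylinder Count and Number of Gears",
    x = "Number of Cylinders",
    y = "Count of Cars",
    fill = "Gears"
  ) +
  theme_classic()

stacked_plot
```

Replacing `position = "stack"` with `position = "fill"` rescales every bar to the same height, which shows proportions rather than counts.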

```{r}
ggplot(data = mtcars, aes(x = "", fill = cyl_factor)) +
  geom_bar(position = "stack") +          # one stacked bar holding all cars
  coord_polar(theta = "y") +              # map counts to angles: the bar becomes a pie
  scale_fill_brewer(palette = "Dark2") +
  theme_classic()
```
The pie chart shows the share of cars with each cylinder count in the mtcars dataset. Cars with 8 cylinders form the largest segment (14 of 32 cars, about 44%), followed by cars with 4 cylinders (11 cars, about 34%) and cars with 6 cylinders (7 cars, about 22%).

The chart gives an immediate impression of how the dataset is composed, which is useful as a first overview before further analysis. Because a pie chart encodes values as angles, small differences between segments are harder to judge than in the bar chart of the same counts.
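
The segment sizes can be checked numerically. The following is a minimal sketch using `dplyr`; the names `cyl_share` and `share` are introduced here for illustration.

```{r}
library(dplyr)

# Count cars per cylinder count and convert the counts to proportions
cyl_share <- mtcars %>%
  count(cyl) %>%
  mutate(share = n / sum(n))

cyl_share
```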

##### _Example of ggplot2_


```{r}
ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl), shape = factor(cyl))) +
    geom_point(size=2) + 
    labs(x = "Gross horsepower", 
         y = "Miles/gallon", 
         color = "Cylinders", 
         shape = "Cylinders", 
         title = "Mileage by horsepower and number of cylinders",
         subtitle = "Data source: 1974 Motor Trend US magazine") +
    theme_dark() +
    scale_color_brewer(palette = "Set2")
```

The graph shows the relationship between mileage (miles per gallon) and gross horsepower of vehicles, differentiated by the number of cylinders. The data is from the 1974 Motor Trend US magazine. On the vertical axis, we have miles per gallon, ranging from 10 to 35. On the horizontal axis, we have gross horsepower, ranging from 50 to 350. The points on the graph are color-coded and shaped to represent different numbers of cylinders: green for 4 cylinders, orange for 6 cylinders, and blue for 8 cylinders.

There is a clear inverse relationship between horsepower and miles per gallon. As horsepower increases, the miles per gallon tend to decrease. This indicates that more powerful cars generally have lower fuel efficiency. Cars with 4 cylinders tend to have higher fuel efficiency (higher mpg) and lower horsepower, clustering towards the top-left of the graph. Cars with 6 cylinders have a moderate range of horsepower and fuel efficiency, positioned in the middle of the graph. Cars with 8 cylinders generally have higher horsepower and lower fuel efficiency, clustering towards the bottom-right of the graph.

The most fuel-efficient cars (highest mpg) are those with 4 cylinders and lower horsepower. Conversely, the least fuel-efficient cars are those with 8 cylinders and higher horsepower. The graph thus highlights the trade-off between performance (horsepower) and fuel efficiency (mpg).

This visualization shows how cylinder count and horsepower are associated with fuel efficiency. Because the data are observational, it describes a pattern rather than proving a cause.

#### _4.5 Scatter Plots_

Scatter plots show the relationship between two quantitative variables by plotting each observation as a point on a two-dimensional graph. They help reveal patterns, trends, and correlations, which makes them a natural first step when forming hypotheses about how two variables move together.

```{r}

ggplot(mtcars, aes(x = mpg, y = wt, shape = cyl_factor, color = cyl_factor)) +
  geom_point(size = 3) +  # Size of the points
  labs(title = "Relationship Between MPG and Weight of Cars",
       x = "Miles Per Gallon (MPG)",
       y = "Weight (1000 lbs)",
       shape = "Number of Cylinders",
       color = "Number of Cylinders") + # Custom labels
  theme_minimal() # Minimalist theme for a clean look

```

The scatter plot reveals a negative correlation between miles per gallon (MPG) and the weight of cars. Generally, as the weight of the car increases, the MPG decreases. This indicates that heavier cars tend to be less fuel-efficient. Understanding this relationship is crucial for assessing how vehicle weight impacts fuel efficiency.
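
The strength of this relationship can be quantified with a correlation coefficient and shown with a fitted line. This is a sketch under stated assumptions: it uses Pearson correlation (the default of `cor()`) and a straight-line fit, neither of which appears in the original analysis, and `mpg_wt_cor` is a name introduced here.

```{r}
library(ggplot2)

# Pearson correlation between fuel efficiency and weight
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)
mpg_wt_cor

# Same axes as the scatter plot above, with a least-squares line added
ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Miles Per Gallon (MPG)", y = "Weight (1000 lbs)") +
  theme_minimal()
```

A coefficient close to -1 indicates a strong negative linear association; it does not by itself show that weight causes lower fuel efficiency.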

#### _4.6 Box Plot_

A box plot summarizes the distribution of a variable. The box spans the first to the third quartile, the line inside it marks the median, and the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box. Points beyond the whiskers are drawn individually as potential outliers. This makes it easy to compare the center and spread of several groups side by side and to spot anomalies.


```{r}

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + 
  geom_boxplot() +
  labs(
    title = "Distribution of Miles per Gallon by Number of Cylinders",
    x = "Number of Cylinders",
    y = "Miles per Gallon"
  ) +
  theme_classic()

```
The box plot compares miles per gallon (MPG) across cylinder counts. Vehicles with 4 cylinders have a median MPG of about 26, with an interquartile range (IQR) from approximately 23 to 30. Vehicles with 6 cylinders have a median MPG of about 20, with an IQR from roughly 19 to 21. Vehicles with 8 cylinders have a median MPG of about 15, with an IQR from roughly 14 to 16. Vehicles with fewer cylinders therefore generally achieve higher MPG. The plot flags outliers only in the 8-cylinder group.
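
The quartiles that define each box can be computed directly, which is a useful check on values read off the plot. The sketch below assumes the default quantile definition of `quantile()`; the names `mpg_by_cyl`, `q1`, `med`, and `q3` are introduced here for illustration.

```{r}
library(dplyr)

# First quartile, median, and third quartile of mpg for each cylinder count
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    q1  = quantile(mpg, 0.25, names = FALSE),
    med = median(mpg),
    q3  = quantile(mpg, 0.75, names = FALSE),
    .groups = "drop"
  )

mpg_by_cyl
```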
