The section below loads the tidyverse library.

library(tidyverse)

The section below takes the txhousing dataset, which is part of ggplot2, and reduces it to two columns, city and year, plus one new column, total_sales. The cities are limited to Austin, Dallas, Houston and San Antonio, and the years to 2000, 2005, 2010 and 2015. Below, take a look at the original dataset, txhousing, and the new dataset, txHouses.

data(txhousing)
txHouses <- txhousing |> 
  filter(city %in% c("Austin", "Dallas", "Houston", "San Antonio")) |> 
  filter(year %in% c(2000, 2005, 2010, 2015)) |> 
  group_by(city, year) |> 
  summarise(total_sales = sum(sales))
## `summarise()` has grouped output by 'city'. You can override using the
## `.groups` argument.
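
To take the look suggested above, a minimal sketch follows. It assumes the tidyverse is installed and rebuilds txHouses so the block runs on its own. The use of glimpse() and print(), and the `.groups = "drop"` argument that silences the message, are our choices rather than part of the assignment.

```r
library(tidyverse)

# Rebuild the summary table so this block is self-contained
txHouses <- txhousing |>
  filter(city %in% c("Austin", "Dallas", "Houston", "San Antonio"),
         year %in% c(2000, 2005, 2010, 2015)) |>
  group_by(city, year) |>
  summarise(total_sales = sum(sales), .groups = "drop")

glimpse(txhousing)       # original: one row per city and month
print(txHouses, n = 16)  # derived: one row per city and year
```

With four cities and four years, the derived table should have one row for each of the 16 combinations.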

We will work with the dataset txHouses, which has been derived from the txhousing dataset provided by ggplot2. See here for details of the original dataset: https://ggplot2.tidyverse.org/reference/txhousing.html. txHouses contains three columns: city (containing four Texas cities), year (containing four years between 2000 and 2015) and total_sales, the total number of sales for the specified year and city.

QUESTION 1: Use ggplot to make a bar plot of the total housing sales (column total_sales) for each city and show one panel per year. You do not have to worry about the order of the bars. Hint: Use facet_wrap().

txHouses |> 
  ggplot(aes(x = city, y = total_sales)) +
  geom_col() +
  facet_wrap(~year)

QUESTION 2: Use ggplot to make a bar plot of the total housing sales (column total_sales) for each year. Color the bar borders with color "gray20" and assign a fill color based on the city column.

txHouses |> 
  ggplot(aes(x = year, y = total_sales)) +
  geom_col(aes(fill = city), color = "gray20")

QUESTION 3: Modify the plot from Question 2 by placing the bars for each city side-by-side rather than stacked.

txHouses |> 
  ggplot(aes(x = year, y = total_sales)) +
  geom_col(aes(fill = city), color = "gray20", position = "dodge")

QUESTION 4: We will work with the diamonds dataset. See here for details: https://ggplot2.tidyverse.org/reference/diamonds.html.

Use ggplot to make a bar plot of the total diamond count per color and show the proportion of each cut within each color category. Show the Fair cut at the bottom and the Ideal cut at the top.

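library(forcats)
data(diamonds)
# fct_infreq() orders cut from most to least frequent (Ideal first, Fair last);
# ggplot2 stacks the first level on top, so Ideal ends up at the top and Fair at the bottom
diamonds |> 
  ggplot(aes(x = color, fill = fct_infreq(cut))) + 
  geom_bar(color = "black") +
  labs(title = "Diamond Cut",
       x = "Color",
       y = "Number of diamonds") +
  scale_fill_discrete(name = "Cut type")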
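
The stacking order depends on the factor levels: ggplot2 places the first level of the fill variable at the top of each stack. The check below, offered as a sketch, confirms that fct_infreq() gives the required order for this dataset, and shows fct_rev() as an alternative that does not depend on the counts. The names freq_levels and rev_levels are illustrative only.

```r
library(ggplot2)
library(forcats)

# Levels after ordering by frequency: the first level ends up on top
freq_levels <- levels(fct_infreq(diamonds$cut))
# Levels after reversing the built-in Fair-to-Ideal order
rev_levels <- levels(fct_rev(diamonds$cut))

print(freq_levels)
print(rev_levels)

# Same stacked bar plot with the order stated explicitly
ggplot(diamonds, aes(x = color, fill = fct_rev(cut))) +
  geom_bar(color = "black")
```

Both approaches give the same level order here because the cut quality ranking happens to match the frequency ranking in diamonds; fct_rev() would keep working if the counts changed.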

QUESTION 5: We will work with the dataset OH_pop, which contains Ohio state demographics and has been derived from the midwest dataset provided by ggplot2. See here for details of the original dataset: https://ggplot2.tidyverse.org/reference/midwest.html. OH_pop contains two columns, county and poptotal (the county’s total population), and it only contains counties with at least 100,000 inhabitants.

Use ggplot to make a scatter plot of county vs total population (column poptotal) and order the counties by increasing population.

Rename the axes and set appropriate limits, breaks and labels. Note: Do not use xlab() or ylab() to label the axes.

library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
OH_pop <- midwest |> 
  filter(state == "OH", poptotal >= 100000) |> 
  select(county, poptotal)

OH_pop |> 
  ggplot(aes(x = reorder(county, poptotal), y = poptotal)) +
  geom_point() +
  labs(title = "Total Population",
       subtitle = "Comparison of County Populations",
       x = "County Name",
       y = "Population") +
  # A single y scale sets the limits, breaks and labels together
  scale_y_continuous(labels = label_comma(),
                     breaks = seq(0, 1500000, by = 250000),
                     limits = c(NA, 1500000)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
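
The upper limit of 1,500,000 can be checked against the data before it is hard-coded. A minimal sketch, assuming ggplot2 and dplyr are installed; the names pop_range and y_breaks, and the 250,000 spacing, are illustrative choices rather than part of the assignment.

```r
library(ggplot2)
library(dplyr)

# Rebuild the table so this block is self-contained
OH_pop <- midwest |>
  filter(state == "OH", poptotal >= 100000) |>
  select(county, poptotal)

# Smallest and largest county populations in the filtered table
pop_range <- range(OH_pop$poptotal)
print(pop_range)

# Candidate breaks every 250,000 up to the chosen upper limit
y_breaks <- seq(0, 1500000, by = 250000)
print(y_breaks)
```

If the largest value sits below the upper limit, no points are dropped by the scale; breaks that fall outside the limits are simply not drawn.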