Q1: Data visualization and exploration tasks with gpa data set

  1. By doing your own research, give the precise meaning of each variable.
?gpa
glimpse(gpa)
## Rows: 55
## Columns: 5
## $ gpa        <dbl> 3.890, 3.900, 3.750, 3.600, 4.000, 3.150, 3.250, 3.925, 3.4…
## $ studyweek  <int> 50, 15, 15, 10, 25, 20, 15, 10, 12, 2, 10, 30, 30, 21, 10, …
## $ sleepnight <dbl> 6.0, 6.0, 7.0, 6.0, 7.0, 7.0, 6.0, 8.0, 8.0, 8.0, 8.0, 6.0,…
## $ out        <dbl> 3.0, 1.0, 1.0, 4.0, 3.0, 3.0, 1.0, 3.0, 2.0, 4.0, 1.0, 2.0,…
## $ gender     <fct> female, female, female, male, female, male, female, female,…

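A: From the output, the data set has 55 rows (students) and 5 columns. Based on the variable names and values, the variables appear to mean:

- gpa: the student's grade point average (numeric)
- studyweek: hours spent studying per week (integer)
- sleepnight: hours of sleep per night (numeric)
- out: nights per week the student goes out (numeric)
- gender: the student's gender (factor: female or male)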

  1. Visualize the relationship between studyweek and gpa. What does your graph indicate?
ggplot(gpa, aes(x= studyweek, y = gpa)) +
  geom_point(position = "jitter") + geom_smooth() +
  labs(title = "Study vs gpa", x = "studyweek", y = "gpa") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
      axis.title = element_text(size = rel(1.2), color = "blue"),
      axis.title.x = element_text(margin = margin(10,5,5,5)),
      axis.title.y = element_text(margin = margin(5,10,5,5)),
      axis.text = element_text(size = rel(1.2)))

A: From the figure, we can see that gpa gradually goes up as study time increases.
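The visual trend can also be checked numerically. The sketch below computes the Pearson and Spearman correlations with base R's `cor()`. It uses a small made-up data frame as a stand-in for gpa, so the values are illustrative only and are not taken from the data set.

```r
# Toy stand-in for the gpa data (values are made up for illustration only)
toy <- data.frame(
  studyweek = c(5, 10, 15, 20, 30),
  gpa       = c(3.1, 3.4, 3.3, 3.6, 3.8)
)

# Pearson correlation measures linear association (between -1 and 1)
r <- cor(toy$studyweek, toy$gpa)

# Spearman correlation uses ranks, so it is less sensitive to outliers
rho <- cor(toy$studyweek, toy$gpa, method = "spearman")

print(c(pearson = r, spearman = rho))

# With the real data (assuming the openintro package is installed):
# cor(openintro::gpa$studyweek, openintro::gpa$gpa, use = "complete.obs")
```

A value near 0 would suggest that the upward trend in the smoothed curve is weak, even if the curve itself looks clear.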

  1. Visualize the relationship between out and gpa. What does your graph indicate?
ggplot(gpa, aes(x= out, y = gpa)) +
  geom_point(position = "jitter") + geom_smooth() +
  labs(title = "out vs gpa", x = "out", y = "gpa") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the figure, we cannot see a clear relationship between out time and gpa.

  1. Visualize the relationship between out and sleepnight. What does your graph indicate?
ggplot(gpa, aes(x= out, y = sleepnight)) +
  geom_point(position = "jitter") + geom_smooth() +
  labs(title = "out vs sleepnight", x = "out", y = "sleepnight") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the figure, we can see that as out time increases, sleep time at night also increases.

  1. Visualize the relationship between gender and studyweek. What does your graph indicate?
ggplot(gpa, mapping = aes(x= gender, y = studyweek)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot() +
  
  labs(title = "gender vs studyweek", x = "gender", y = "studyweek") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the figure, we can see that female students tend to spend more time studying than male students.

  1. Visualize the relationship between gender and out. What does your graph indicate?
ggplot(gpa, mapping = aes(x= gender, y = out)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot() +
  
  labs(title = "gender vs out", x = "gender", y = "out") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the plot, we can see that male students go out at night more often than female students.

  1. Present a question of your own interest related to this data set. Answer your question with analysis or visualization.

Question: Visualize the relationship between gender and gpa. What does your graph indicate?

ggplot(gpa, mapping = aes(x= gender, y = gpa)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot() +
  
  labs(title = "gender vs gpa", x = "gender", y = "gpa") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.4), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(0.8)))

A: From the plot, we can see that, in general, female students tend to have a higher gpa than male students, but their gpa is also more spread out.
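The "higher" and "more spread" readings of a boxplot correspond to the group median and interquartile range, which can be computed directly. The following is a minimal sketch with base R's `tapply()`. The data frame is a made-up stand-in for gpa, not the real values.

```r
# Toy stand-in for the gpa data (values are made up for illustration only)
toy <- data.frame(
  gender = c("female", "female", "female", "male", "male", "male"),
  gpa    = c(3.2, 3.6, 4.0, 3.3, 3.5, 3.6)
)

# Median per group: the thick line inside each box
center <- tapply(toy$gpa, toy$gender, median)

# Interquartile range per group: the height of each box
spread <- tapply(toy$gpa, toy$gender, IQR)

print(rbind(median = center, IQR = spread))

# With the real data (assuming the openintro package is installed):
# tapply(openintro::gpa$gpa, openintro::gpa$gender, median, na.rm = TRUE)
```

The same two calls work for studyweek or out in place of gpa, which covers the other gender comparisons above.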

Q2: Data visualization tasks with loans_full_schema data set

Finish the following data visualization tasks using the full loans_full_schema data set (55 columns) in openintro library. For each task, you need to summarize what you learn from the graph accurately and concisely.

  1. Create a histogram of a numeric variable that you select and plot a density curve on top of the histogram. Carefully select bin numbers/sizes/boundaries to make the plot informative. What does this graph indicate?
my_data <- loans_full_schema %>%
  filter(annual_income > 0) %>%
  mutate(log_income = log10(annual_income))
ggplot(my_data, aes(x = log_income)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1, color = "white") +
  geom_density(color = "blue", linewidth = 1.2) + 
  scale_x_continuous(
    breaks = seq(2, 6, by = 0.5)
  ) +
  labs(title = "Distribution of Log Annual Income",
       x = "Log Annual Income",
       y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = rel(1.5)),
    axis.title = element_text(size = rel(1.4)),
    axis.text = element_text(size = rel(1.2)),
    axis.text.x = element_text(angle = 0, hjust = 0.5)
  )

A: From the figure, we can see that log10 annual income roughly follows a normal distribution (so annual income itself is roughly log-normal), and the most common income is about \(10^{4.7}\), or roughly $50,000.
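Because the x-axis is on a log10 scale, a value read from the plot has to be converted back to dollars. A short sketch of that conversion, where `mode_log10` is a hypothetical name for the peak location read off the figure:

```r
# Peak of the density curve as read from the log10 axis (approximate)
mode_log10 <- 4.7

# Undo the log10 transform to get back to dollars
mode_dollars <- 10^mode_log10

print(round(mode_dollars))  # 50119
```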

  1. Create a graph to study the effect of a categorical/discrete variable on the distributions of a numeric variable. What does this graph indicate?
ggplot(loans_full_schema, mapping = aes(x= homeownership, y = debt_to_income)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot() +
  
  labs(title = "homeownership vs debt_to_income", x = "homeownership", y = "debt to income") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

ggplot(loans_full_schema, aes(x = debt_to_income, y = homeownership, 
                  fill = homeownership, color = homeownership)) + 
  ggridges::geom_density_ridges(alpha = 0.5)

A: From these plots, we can see that the debt-to-income ratio follows a similar distribution regardless of homeownership status.

  1. Create a bin heatmap (2d density plot) to study the relationship between two numeric variables that you select. Summarize the findings from the graph.
ggplot(loans_full_schema) + 
  geom_bin_2d(aes(x = interest_rate/100, y = annual_income)) +
  scale_x_continuous(labels = scales::percent, limits = c(0, 1)) +
  scale_y_log10(limits = c(5000, 2500000), labels = scales::dollar) +
  labs(title = "Interest Rate vs Annual Income", 
       x = "Interest rate", 
       y = "Annual income (US dollars, log scale)") + 
  theme(plot.title = element_text(hjust = 0.5, size = rel(1.5), margin = margin(15,15,15,15)), 
        axis.title = element_text(size = rel(1.4)), 
        axis.title.x = element_text(margin = margin(10,5,5,5)), 
        axis.title.y = element_text(margin = margin(5,10,5,5)), 
        axis.text = element_text(size = rel(1.4)))

A: From the plot, we can see that people with higher interest rates usually have lower incomes.

  1. Use facet_wrap to create an informative plot. Summarize the findings from the graph.
ggplot(data = loans_full_schema) + 
  geom_point(mapping = aes(x = emp_length, y = debt_to_income/100), position = "jitter",alpha = 0.5) + 
  facet_wrap(~ homeownership, nrow = 2) +
  ylim(0, 2) +
  labs(title = "emp_length vs debt_to_income rates by homeownership", 
       x = "emp_length", 
       y = "debt_to_income") + 
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = rel(1.1), margin = margin(15,15,15,15)), 
        axis.title = element_text(size = rel(1.2)), 
        axis.title.x = element_text(margin = margin(10,5,5,5)), 
        axis.title.y = element_text(margin = margin(5,10,5,5)), 
        axis.text = element_text(size = rel(1.2)))

A: From the plot, we can see that as employment length increases, the debt-to-income ratio tends to decrease, regardless of homeownership status.

  1. Use facet_grid to create an informative plot. Summarize the findings from the graph.
ggplot(data = loans_full_schema) + 
  geom_point(mapping = aes(x = emp_length, y = debt_to_income/100), position = "jitter",alpha = 0.5) + 
  facet_grid(grade ~ homeownership) +
  ylim(0, 2) +
  labs(title = "emp_length vs debt_to_income rates by homeownership and grade", 
       x = "emp_length", 
       y = "debt_to_income") + 
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = rel(1.1), margin = margin(15,15,15,15)), 
        axis.title = element_text(size = rel(0.5)), 
        axis.title.x = element_text(margin = margin(10,5,5,5)), 
        axis.title.y = element_text(margin = margin(5,10,5,5)), 
        axis.text = element_text(size = rel(0.8)))

A: From the plot, we can see the relationship between employment length and debt-to-income ratio for each combination of grade and homeownership. For example, grade G borrowers tend to have a lower debt-to-income ratio in every homeownership category.

  1. Present a question of your own interest related to this data set. Answer your question with analysis or visualization.

Question: What is the relationship between grade and annual_income?

ggplot(loans_full_schema, mapping = aes(x= grade, y = annual_income)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot(aes(fill = grade)) +
  ylim(0, 8000) +
  labs(title = "grade vs annual_income", x = "grade", y = "annual_income") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.4), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(0.8)))

A: Because ylim(0, 8000) turns every annual income above 8,000 into NA, most observations are removed and some grades do not appear in this plot. From the remaining boxplots, annual income appears to increase with grade, but this reflects only the lowest incomes.
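One possible way to limit the visible range without losing observations is `coord_cartesian()`, which zooms the view instead of filtering the data. The sketch below contrasts the two approaches. The data frame is a made-up stand-in for loans_full_schema, and the upper limit of 100000 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Toy stand-in for loans_full_schema (values are made up for illustration only)
toy <- data.frame(
  grade         = rep(c("A", "B"), each = 3),
  annual_income = c(5000, 6000, 7000, 40000, 60000, 90000)
)

base_plot <- ggplot(toy, aes(x = grade, y = annual_income)) +
  geom_boxplot()

# ylim() sets scale limits: values outside them become NA before the
# boxplot statistics are computed, so grade B loses all of its rows.
p_ylim <- base_plot + ylim(0, 8000)

# coord_cartesian() only zooms the view: every row still feeds the statistics.
p_zoom <- base_plot + coord_cartesian(ylim = c(0, 100000))

# Count how many boxes each version actually draws
boxes_ylim <- nrow(suppressWarnings(layer_data(p_ylim)))
boxes_zoom <- nrow(layer_data(p_zoom))
print(c(ylim = boxes_ylim, coord_cartesian = boxes_zoom))
```

With the real data, the analogous change would be replacing the `ylim()` line with a `coord_cartesian(ylim = ...)` call whose upper limit is chosen after looking at the income distribution.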

Q3: Data visualization and exploration tasks with ames data set

The ames data set is available through openintro package in R.

  1. Write an introductory paragraph to the data set which provides the basic information - what the data set is about; the number of samples and features; the scope that the features cover.
?ames
glimpse(ames)
## Rows: 2,930
## Columns: 82
## $ Order           <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ PID             <int> 526301100, 526350040, 526351010, 526353030, 527105010,…
## $ area            <int> 1656, 896, 1329, 2110, 1629, 1604, 1338, 1280, 1616, 1…
## $ price           <int> 215000, 105000, 172000, 244000, 189900, 195500, 213500…
## $ MS.SubClass     <int> 20, 20, 20, 20, 60, 60, 120, 120, 120, 60, 60, 20, 60,…
## $ MS.Zoning       <fct> RL, RH, RL, RL, RL, RL, RL, RL, RL, RL, RL, RL, RL, RL…
## $ Lot.Frontage    <int> 141, 80, 81, 93, 74, 78, 41, 43, 39, 60, 75, NA, 63, 8…
## $ Lot.Area        <int> 31770, 11622, 14267, 11160, 13830, 9978, 4920, 5005, 5…
## $ Street          <fct> Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pave, …
## $ Alley           <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Lot.Shape       <fct> IR1, Reg, IR1, Reg, IR1, IR1, Reg, IR1, IR1, Reg, IR1,…
## $ Land.Contour    <fct> Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, HLS, Lvl, Lvl, Lvl,…
## $ Utilities       <fct> AllPub, AllPub, AllPub, AllPub, AllPub, AllPub, AllPub…
## $ Lot.Config      <fct> Corner, Inside, Corner, Corner, Inside, Inside, Inside…
## $ Land.Slope      <fct> Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl,…
## $ Neighborhood    <fct> NAmes, NAmes, NAmes, NAmes, Gilbert, Gilbert, StoneBr,…
## $ Condition.1     <fct> Norm, Feedr, Norm, Norm, Norm, Norm, Norm, Norm, Norm,…
## $ Condition.2     <fct> Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, …
## $ Bldg.Type       <fct> 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, TwnhsE, TwnhsE, Tw…
## $ House.Style     <fct> 1Story, 1Story, 1Story, 1Story, 2Story, 2Story, 1Story…
## $ Overall.Qual    <int> 6, 5, 6, 7, 5, 6, 8, 8, 8, 7, 6, 6, 6, 7, 8, 8, 8, 9, …
## $ Overall.Cond    <int> 5, 6, 6, 5, 5, 6, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 7, 2, …
## $ Year.Built      <int> 1960, 1961, 1958, 1968, 1997, 1998, 2001, 1992, 1995, …
## $ Year.Remod.Add  <int> 1960, 1961, 1958, 1968, 1998, 1998, 2001, 1992, 1996, …
## $ Roof.Style      <fct> Hip, Gable, Hip, Hip, Gable, Gable, Gable, Gable, Gabl…
## $ Roof.Matl       <fct> CompShg, CompShg, CompShg, CompShg, CompShg, CompShg, …
## $ Exterior.1st    <fct> BrkFace, VinylSd, Wd Sdng, BrkFace, VinylSd, VinylSd, …
## $ Exterior.2nd    <fct> Plywood, VinylSd, Wd Sdng, BrkFace, VinylSd, VinylSd, …
## $ Mas.Vnr.Type    <fct> Stone, None, BrkFace, None, None, BrkFace, None, None,…
## $ Mas.Vnr.Area    <int> 112, 0, 108, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 603,…
## $ Exter.Qual      <fct> TA, TA, TA, Gd, TA, TA, Gd, Gd, Gd, TA, TA, TA, TA, TA…
## $ Exter.Cond      <fct> TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, Gd, TA, TA…
## $ Foundation      <fct> CBlock, CBlock, CBlock, CBlock, PConc, PConc, PConc, P…
## $ Bsmt.Qual       <fct> TA, TA, TA, TA, Gd, TA, Gd, Gd, Gd, TA, Gd, Gd, Gd, Gd…
## $ Bsmt.Cond       <fct> Gd, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA…
## $ Bsmt.Exposure   <fct> Gd, No, No, No, No, No, Mn, No, No, No, No, No, No, Gd…
## $ BsmtFin.Type.1  <fct> BLQ, Rec, ALQ, ALQ, GLQ, GLQ, GLQ, ALQ, GLQ, Unf, Unf,…
## $ BsmtFin.SF.1    <int> 639, 468, 923, 1065, 791, 602, 616, 263, 1180, 0, 0, 9…
## $ BsmtFin.Type.2  <fct> Unf, LwQ, Unf, Unf, Unf, Unf, Unf, Unf, Unf, Unf, Unf,…
## $ BsmtFin.SF.2    <int> 0, 144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1120, 0, 0…
## $ Bsmt.Unf.SF     <int> 441, 270, 406, 1045, 137, 324, 722, 1017, 415, 994, 76…
## $ Total.Bsmt.SF   <int> 1080, 882, 1329, 2110, 928, 926, 1338, 1280, 1595, 994…
## $ Heating         <fct> GasA, GasA, GasA, GasA, GasA, GasA, GasA, GasA, GasA, …
## $ Heating.QC      <fct> Fa, TA, TA, Ex, Gd, Ex, Ex, Ex, Ex, Gd, Gd, Ex, Gd, Gd…
## $ Central.Air     <fct> Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, …
## $ Electrical      <fct> SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, SBrkr…
## $ X1st.Flr.SF     <int> 1656, 896, 1329, 2110, 928, 926, 1338, 1280, 1616, 102…
## $ X2nd.Flr.SF     <int> 0, 0, 0, 0, 701, 678, 0, 0, 0, 776, 892, 0, 676, 0, 0,…
## $ Low.Qual.Fin.SF <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Bsmt.Full.Bath  <int> 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, …
## $ Bsmt.Half.Bath  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Full.Bath       <int> 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 3, 2, 1, …
## $ Half.Bath       <int> 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
## $ Bedroom.AbvGr   <int> 3, 2, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3, 2, 1, 4, 4, 1, …
## $ Kitchen.AbvGr   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Kitchen.Qual    <fct> TA, TA, Gd, Ex, TA, Gd, Gd, Gd, Gd, Gd, TA, TA, TA, Gd…
## $ TotRms.AbvGrd   <int> 7, 5, 6, 8, 6, 7, 6, 5, 5, 7, 7, 6, 7, 5, 4, 12, 8, 8,…
## $ Functional      <fct> Typ, Typ, Typ, Typ, Typ, Typ, Typ, Typ, Typ, Typ, Typ,…
## $ Fireplaces      <int> 2, 0, 0, 2, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, …
## $ Fireplace.Qu    <fct> Gd, NA, NA, TA, TA, Gd, NA, NA, TA, TA, TA, NA, Gd, Po…
## $ Garage.Type     <fct> Attchd, Attchd, Attchd, Attchd, Attchd, Attchd, Attchd…
## $ Garage.Yr.Blt   <int> 1960, 1961, 1958, 1968, 1997, 1998, 2001, 1992, 1995, …
## $ Garage.Finish   <fct> Fin, Unf, Unf, Fin, Fin, Fin, Fin, RFn, RFn, Fin, Fin,…
## $ Garage.Cars     <int> 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, …
## $ Garage.Area     <int> 528, 730, 312, 522, 482, 470, 582, 506, 608, 442, 440,…
## $ Garage.Qual     <fct> TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA…
## $ Garage.Cond     <fct> TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA…
## $ Paved.Drive     <fct> P, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, …
## $ Wood.Deck.SF    <int> 210, 140, 393, 0, 212, 360, 0, 0, 237, 140, 157, 483, …
## $ Open.Porch.SF   <int> 62, 0, 36, 0, 34, 36, 0, 82, 152, 60, 84, 21, 75, 0, 5…
## $ Enclosed.Porch  <int> 0, 0, 0, 0, 0, 0, 170, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ X3Ssn.Porch     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Screen.Porch    <int> 0, 120, 0, 0, 0, 0, 0, 144, 0, 0, 0, 0, 0, 0, 140, 210…
## $ Pool.Area       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Pool.QC         <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Fence           <fct> NA, MnPrv, NA, NA, MnPrv, NA, NA, NA, NA, NA, NA, GdPr…
## $ Misc.Feature    <fct> NA, NA, Gar2, NA, NA, NA, NA, NA, NA, NA, NA, Shed, NA…
## $ Misc.Val        <int> 0, 0, 12500, 0, 0, 0, 0, 0, 0, 0, 0, 500, 0, 0, 0, 0, …
## $ Mo.Sold         <int> 5, 6, 6, 4, 3, 6, 4, 1, 3, 6, 4, 3, 5, 2, 6, 6, 6, 6, …
## $ Yr.Sold         <int> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, …
## $ Sale.Type       <fct> WD , WD , WD , WD , WD , WD , WD , WD , WD , WD , WD ,…
## $ Sale.Condition  <fct> Normal, Normal, Normal, Normal, Normal, Normal, Normal…
dim(ames)

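A: The ames data set contains information from the Ames Assessor’s Office used in computing assessed values for individual residential properties sold in Ames, IA from 2006 to 2010. It has 2,930 rows (properties) and 82 variables. The variables cover the lot and its surroundings (e.g. Lot.Area, Neighborhood), the building's type, style, age, and quality ratings (e.g. Bldg.Type, Year.Built, Overall.Qual), interior and basement areas and room counts, amenities such as garages, porches, fireplaces, and pools, and the details of the sale (e.g. Mo.Sold, Yr.Sold, Sale.Type, price).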
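The number of samples and features, and the mix of variable types, can be summarized programmatically rather than read off the glimpse output. The sketch below uses a hypothetical helper, `describe_shape()`, and demonstrates it on R's built-in iris data as a stand-in so that it runs without extra packages.

```r
# Hypothetical helper: row count, column count, and a tally of column classes
describe_shape <- function(df) {
  list(
    samples  = nrow(df),
    features = ncol(df),
    # Use the first class of each column (e.g. "factor", "integer")
    types    = table(vapply(df, function(col) class(col)[1], character(1)))
  )
}

shape <- describe_shape(iris)
print(shape)

# With the real data (assuming the openintro package is installed):
# describe_shape(openintro::ames)
```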

  1. Use a plot to analyze how area correlates with price. Summarize your finding from the graph.
ggplot(ames, aes(x= area, y = price)) +
  geom_point(position = "jitter", alpha = 0.7) + geom_smooth() +
  labs(title = "area vs price", x = "area", y = "price") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the figure, we can see that, in general, price increases as area increases.

  1. Use a plot to analyze how Bldg.Type correlates with price. Explain the meaning of each label for Bldg.Type and summarize your finding from the graph.
unique(ames$Bldg.Type)
## [1] 1Fam   TwnhsE Twnhs  Duplex 2fmCon
## Levels: 1Fam 2fmCon Duplex Twnhs TwnhsE
ggplot(ames, aes(x = price, fill = Bldg.Type)) +
  geom_histogram(binwidth = 20000, alpha = 0.7, color = "white") +  
  scale_x_continuous(labels = scales::dollar) +
  labs(title = "Prices vs dwelling Type",
       x = "Sale Price", 
       y = "Count",
       fill = "dwelling Type") + 
  theme_minimal() + 
  theme(
    plot.title = element_text(hjust = 0.5, size = rel(1.5)),
    axis.title = element_text(size = rel(1.2)),
    axis.text = element_text(size = rel(1.1))
  )

A:

- 1Fam: a standalone single-family residential building
- 2fmCon: a two-family conversion, originally built as a one-family dwelling
- Duplex: a building that contains two separate living units
- Twnhs: a middle unit within a row of townhouses
- TwnhsE: a townhouse located at either end of the row

From the plot, we can see that 1Fam is by far the most common dwelling type and 2fmCon is the least common.
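Statements about which type is most or least common, and how price differs by type, can be backed by a frequency table and per-type medians. A minimal sketch with base R's `table()` and `aggregate()` follows. The data frame is a made-up stand-in for ames, so the numbers are illustrative only.

```r
# Toy stand-in for the ames data (values are made up for illustration only)
toy <- data.frame(
  Bldg.Type = c("1Fam", "1Fam", "1Fam", "TwnhsE", "TwnhsE", "Duplex"),
  price     = c(150000, 200000, 250000, 180000, 190000, 120000)
)

# How many properties of each type, most common first
counts <- sort(table(toy$Bldg.Type), decreasing = TRUE)

# Median sale price for each type
median_price <- aggregate(price ~ Bldg.Type, data = toy, FUN = median)

print(counts)
print(median_price)

# With the real data (assuming the openintro package is installed):
# sort(table(openintro::ames$Bldg.Type), decreasing = TRUE)
# aggregate(price ~ Bldg.Type, data = openintro::ames, FUN = median)
```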

  1. Use a plot to analyze how Bldg.Type and area altogether correlates with price. Summarize your finding from the graph.
ggplot(ames, aes(x = area, y = price, color = Bldg.Type)) +
  geom_point(alpha = 0.7, position = "jitter") +
    geom_smooth() +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "area vs price", x = "area", y = "price") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the plot, we can see that price increases as area increases, but for 1Fam this trend no longer holds once area exceeds about 3,500.

  1. You may need to self-study to fulfill this task: use a plot to study how area and Year.Built together correlates with price. Summarize your finding from the graph. You are allowed to ask AI to give you hints about the plot types that you can use. But you are not allowed to ask AI to generate codes or give function names directly.
ggplot(ames, aes(x = area, y = price, color = Year.Built)) +
  geom_point(alpha = 0.3, position = "jitter") +
  scale_y_continuous(labels = scales::dollar) + 
    geom_smooth(color = "black") +
  scale_color_gradient(low = "blue", high = "yellow") + 
  labs(title = "area vs price by Year.Built", x = "area", y = "price") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

  1. Present a question of your own interest related to this data set. Answer your question with analysis or visualization.

Question: Use a plot to study how area and House.Style together correlate with price.

ggplot(ames, aes(x = area, y = price, color = House.Style)) +
  geom_point(alpha = 0.7, position = "jitter") +
    geom_smooth() +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "area vs price by House.Style", x = "area", y = "price") +
  theme(plot.title = element_text(hjust = 0.5,size = rel(1.5), color = "red", margin = margin(15,15,15,15)),
  axis.title = element_text(size = rel(1.2), color = "blue"),
  axis.title.x = element_text(margin = margin(10,5,5,5)),
  axis.title.y = element_text(margin = margin(10,5,5,5)),
  axis.text = element_text(size = rel(1.2)))

A: From the plot, we can see that, in general, price increases as area increases, but for 1Story and 2Story houses the fitted curve turns downward when area is very large.
