The gpa data set is available through the openintro package in R. Answer the following questions with an appropriate graph. Summarize your findings in plain text for each graph to answer the question.
library(airports)
library(cherryblossom)
library(usdata)
library(tidyverse)
library(openintro)
library(scales)
view(gpa)
glimpse(gpa)
?gpa
gpa is a data frame with 55 observations on the following 5 variables: gpa, studyweek, sleepnight, out, gender
gpa: Grade Point Average (GPA) is a standardized measure of a student’s academic performance on a scale from 0.0 to 4.0
studyweek: hours of study per week
sleepnight: hours of sleep per night
out: number of nights per week the student goes out
gender: levels female and male, indicating the student’s gender
ggplot(gpa, aes(x=studyweek,y= gpa))+
geom_point(position= 'jitter')+
geom_smooth(fill = "blue") +
labs(title = "Relationship between Study hours per week and GPA",
x = "Study hours (per week)",
y = "GPA ") +
theme(plot.title = element_text(hjust = 0.5) )
At first glance, the points look fairly uniform, as if study hours per week do not really affect GPA. However, the smoothed curve trends slightly upward: students who study more hours tend to have better GPAs.
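To back up the visual impression with a number, the sketch below computes the correlation between study hours and GPA and the slope of a simple linear fit. The object names `r_study` and `fit_study` are introduced here for illustration only, and a straight line is an assumption; the plot itself uses the default smoother.

```r
library(openintro)

# Pearson correlation between weekly study hours and GPA;
# use = "complete.obs" drops any rows with missing values.
r_study <- cor(gpa$studyweek, gpa$gpa, use = "complete.obs")
r_study

# Simple linear fit: the studyweek coefficient is the estimated
# change in GPA for each additional study hour per week.
fit_study <- lm(gpa ~ studyweek, data = gpa)
coef(fit_study)["studyweek"]
```

A correlation close to 0 would support the "fairly uniform" reading, while a clearly positive value would support the upward trend.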
ggplot(gpa, aes(x=out,y= gpa))+
geom_point(position= 'jitter')+
geom_smooth(fill = "blue")+
labs(title = "Relationship between Nights Out per Week and GPA",
x = "Nights out (per week)",
y = "GPA ") +
theme(plot.title = element_text(hjust = 0.5) )
Students who go out about 2 nights per week appear more likely to have a lower GPA.
ggplot(gpa, aes(x=out,y= sleepnight))+
geom_point(position= 'jitter')+
geom_smooth(fill = "blue") +
labs(title = "Relationship between Nights Out per Week and Hours of Sleep",
x = "Nights out (per week)",
y = "Hours of sleep (per night)") +
theme(plot.title = element_text(hjust = 0.5) )
This graph indicates an upward slope: students who go out more nights per week tend to report more sleep.
ggplot(gpa, aes(x=gender,y= studyweek))+
stat_boxplot(geom = "errorbar", width = 0.5)+
geom_boxplot() +
labs(title = "Relationship between Gender and Study hour per week",
y = "Study hours (per week)",
x = "Gender ") +
theme(plot.title = element_text(hjust = 0.5) )
The box plots show that, in general, female students study more hours than male students, but the median for both groups is about 15 hours a week.
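The comparison read off the box plots can be checked with a grouped summary. This is a minimal sketch; `study_by_gender` and its column names are hypothetical names chosen for this example.

```r
library(openintro)
library(dplyr)

# Count, median, mean, and IQR of weekly study hours for each gender.
study_by_gender <- gpa %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    median_hours = median(studyweek, na.rm = TRUE),
    mean_hours = mean(studyweek, na.rm = TRUE),
    iqr_hours = IQR(studyweek, na.rm = TRUE)
  )
study_by_gender
```

The median corresponds to the center line of each box, so comparing `median_hours` with `mean_hours` also shows whether a few students with very long study hours pull the average up.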
ggplot(gpa, aes(x=gender,y= out))+
stat_boxplot(geom = "errorbar", width = 0.5)+
geom_boxplot()+
labs(title = "Relationship between Nights Out per Week and Gender",
y = "Nights out (per week)",
x = "Gender ") +
theme(plot.title = element_text(hjust = 0.5) )
According to the graph, male students tend to go out more nights per week than female students.
Finish the following data visualization tasks using the full loans_full_schema data set (55 columns) in the openintro library. For each task, summarize what you learn from the graph accurately and concisely.
view(loans_full_schema)
glimpse(loans_full_schema)
Data: interest_rate
ggplot(loans_full_schema, aes(x = interest_rate))+
geom_histogram( aes(y = after_stat(density)), bins = 20, boundary= 0, fill = "white", color = "black")+
geom_density( adjust = 3/2, color = "blue")+
labs(title = "Density of Interest rate",
x = "Interest rate (percent)",
y = "Density ") +
theme(plot.title = element_text(hjust = 0.5) )
The distribution is right-skewed: the higher the interest rate, the fewer loans carry it, which seems reasonable. The most common interest rate is about 10%.
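The shape of the distribution and the most common rate can be cross-checked numerically. The sketch below is one possible way to do it; the 2-point bin width and the names `rates`, `bin_counts`, and `modal_bin` are assumptions made for this example, not part of the assignment.

```r
library(openintro)

rates <- loans_full_schema$interest_rate

# Five-number summary plus the mean; in a right-skewed
# distribution the mean is typically above the median.
summary(rates)

# Group the rates into 2-point-wide bins that cover the full range,
# then look up the bin holding the most loans.
upper <- ceiling(max(rates, na.rm = TRUE) / 2) * 2
bins <- cut(rates, breaks = seq(0, upper, by = 2), include.lowest = TRUE)
bin_counts <- table(bins)
modal_bin <- names(bin_counts)[which.max(bin_counts)]
modal_bin
```

The modal bin depends on the chosen bin width, so it is best read as a rough range rather than an exact most common rate.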
Data: homeownership and loan_amount
ggplot(loans_full_schema, aes(x=homeownership,y= loan_amount))+
stat_boxplot(geom = "errorbar", width = 0.5)+
geom_boxplot()+
labs(title = "Comparison of Loan Amount by Home Ownership",
x = "Home ownership",
y = "Loan amount ($) ") +
theme(plot.title = element_text(hjust = 0.5) )
The graph shows that people who have a mortgage often have a higher loan amount compared to people who own or rent their house.
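The values behind the box plots can be tabulated to confirm the ordering of the groups. This is a sketch under the assumption that medians and quartiles are the quantities of interest; `loan_by_home` and its column names are hypothetical.

```r
library(openintro)
library(dplyr)

# Count, median, and quartiles of loan amount per home ownership group,
# sorted so the group with the highest median comes first.
loan_by_home <- loans_full_schema %>%
  group_by(homeownership) %>%
  summarise(
    n = n(),
    q1_loan = quantile(loan_amount, 0.25, na.rm = TRUE),
    median_loan = median(loan_amount, na.rm = TRUE),
    q3_loan = quantile(loan_amount, 0.75, na.rm = TRUE)
  ) %>%
  arrange(desc(median_loan))
loan_by_home
```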
Data: debt_to_income and interest_rate
ggplot(loans_full_schema) +
geom_bin_2d(aes(x = debt_to_income, y = interest_rate), bins= 20) +
xlim(0, 100) +
labs(title = "2D Density plot of Debt-to-income ratio to Interest rate",
y = "Interest rate (percent)",
x = "Debt-to-income ratio") +
theme(plot.title = element_text(hjust = 0.5) )
The lower the debt-to-income ratio, the stronger the borrower’s credit standing, and therefore the lower the interest rate. Many borrowers with a debt-to-income ratio under 36% receive lower interest rates. Because lighter colors indicate higher density, the lower-left corner of the plot falls in the lighter blue range.
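The 36% threshold mentioned above can be examined directly by splitting borrowers into two bands. This is a rough sketch; the band labels and the names `dti_summary` and `rho_dti` are made up for this example, and a rank correlation is only one of several reasonable ways to measure the association.

```r
library(openintro)
library(dplyr)

# Split loans at a debt-to-income ratio of 36 and compare the bands.
dti_summary <- loans_full_schema %>%
  filter(!is.na(debt_to_income)) %>%
  mutate(dti_band = if_else(debt_to_income < 36, "below 36", "36 or above")) %>%
  group_by(dti_band) %>%
  summarise(
    n = n(),
    median_rate = median(interest_rate, na.rm = TRUE)
  ) %>%
  mutate(share = n / sum(n))  # fraction of all loans in each band
dti_summary

# Spearman rank correlation: robust to the extreme debt-to-income values
# that xlim(0, 100) hides in the plot.
rho_dti <- cor(loans_full_schema$debt_to_income,
               loans_full_schema$interest_rate,
               method = "spearman", use = "complete.obs")
rho_dti
```

A larger `share` in the "below 36" band matches the dense lower-left region of the plot, and a positive `rho_dti` would agree with interest rates rising alongside the debt-to-income ratio.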
ggplot(data = loans_full_schema) +
geom_point(mapping = aes(x = interest_rate, y = annual_income)) +
facet_wrap(~ grade, nrow = 2) +
labs(title = "Interest rate compared with annual income by grade",
x = "Interest rate (percent)",
y = "Annual income (dollar)") +
theme(plot.title = element_text(hjust = 0.5))
Loans with better grades (closer to A) tend to have lower interest rates, and their borrowers tend to have higher incomes.
ggplot(data = loans_full_schema) +
geom_point(mapping = aes(x = interest_rate, y = annual_income)) +
facet_grid(grade~ homeownership) +
labs(title = "Interest rate compared with annual income by grade and home ownership",
x = "Interest rate (percent)",
y = "Annual income (dollar)") +
theme(plot.title = element_text(hjust = 0.5),
axis.text.y =element_text(size = 8) )
There is not much difference between the home ownership groups, but many borrowers have a mortgage, and most of the high-income borrowers fall in that group. Borrowers with lower incomes generally have higher interest rates, regardless of whether they have a mortgage, own their home, or rent.
The ames data set is available through the openintro package in R.
view (ames)
glimpse (ames)
?ames
#### Answer:
The data set contains information from the Ames Assessor’s Office used in computing assessed values for individual residential properties sold in Ames, IA from 2006 to 2010. It includes 2,930 observations and 82 variables covering information about housing in Ames, Iowa (price, area, street, …).
ggplot(ames, aes(x= area, y = price))+
geom_point()+
geom_smooth()+
scale_y_continuous(labels = comma)+
labs(title = "Relationship between Area of housing and Price in Ames",
x = "Area (square feet)",
y = "Price (dollar)") +
theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
For areas smaller than about 4,500 square feet, price increases as area gets larger, but at 4,500 square feet or larger the price follows a slightly downward slope.
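Before reading much into the downward turn, it helps to check how many houses sit in that region, because a smoother can bend sharply where data are sparse. The sketch below uses the 4,500 square feet cutoff from the text; `large_homes` is a hypothetical name.

```r
library(openintro)
library(dplyr)

# Houses at or above the area where the smoothed curve turns downward.
large_homes <- ames %>%
  filter(area >= 4500) %>%
  select(area, price, Year.Built, Bldg.Type)

nrow(large_homes)  # how many houses drive the right end of the curve
large_homes
```

If only a handful of houses appear here, the decline may reflect a few unusual sales rather than a general pattern for large homes.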
ggplot(ames, aes(x=Bldg.Type,y= price))+
stat_boxplot(geom = "errorbar", width = 0.5)+
geom_boxplot() +
scale_y_continuous(labels = comma)+
labs(title = "Relationship between Type of dwelling and Prices of housing in Ames",
y = "Price (dollar) ",
x = "Type of dwelling") +
theme(plot.title = element_text(hjust = 0.5) )
Prices across the different types of dwelling are generally not much different. 2fmCon and Duplex in particular are very close: Duplex is only slightly more expensive, and both have a narrow price range. The 1Fam type, however, has many high outliers, which means many houses of this type sell well above the typical price and can be very expensive. 1Fam is also the most common type of dwelling in the data.
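The claims about which type is most common and which has the most outliers can be verified with a grouped summary. This is a sketch; `price_by_type` and its column names are hypothetical, and `boxplot.stats()` uses a 1.5 × IQR rule based on hinges, so its outlier counts may differ slightly from the points drawn by `geom_boxplot()`.

```r
library(openintro)
library(dplyr)

# Count, median price, spread, and number of outliers per dwelling type,
# sorted from the most to the least common type.
price_by_type <- ames %>%
  group_by(Bldg.Type) %>%
  summarise(
    n = n(),
    median_price = median(price, na.rm = TRUE),
    iqr_price = IQR(price, na.rm = TRUE),
    n_outliers = length(boxplot.stats(price)$out)
  ) %>%
  arrange(desc(n))
price_by_type
```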
ggplot(ames, aes(x=Bldg.Type,y= area))+
stat_boxplot(geom = "errorbar", width = 0.5)+
geom_boxplot() +
labs(title = "Relationship between Type of dwelling and Area of housing in Ames",
y = "Area (square feet) ",
x = "Type of dwelling") +
theme(plot.title = element_text(hjust = 0.5) )
In this plot, 1Fam again has the largest range, with many outliers, which means a 1Fam house can be very large or very small. 2fmCon and Duplex are quite similar in area, but on average Duplex is larger than 2fmCon. The smallest type on average is Twnhs, at just slightly more than 1,000 square feet.
ggplot(data = ames) +
geom_point(mapping = aes(x = area, y = price, color = Year.Built), position = "jitter") +
scale_y_continuous(labels = comma)+
labs(title = "Area and price by year built",
x = "Area (Square feet)",
y = "Price (dollar)") +
theme(plot.title = element_text(hjust = 0.5))
Based on the plot, for most of the data, the larger the area, the more expensive the house. Houses built before 1960 appear to be larger and cheaper than most houses built around 2000, but there are still a few houses built around 2000 with a large area and a low price.
ggplot(data = ames) +
geom_point(mapping = aes(x = area, y = price, color = Year.Built), position = "jitter", fill = "White") +
facet_wrap(~Bldg.Type, nrow = 2)+
scale_y_continuous(labels = comma)+
labs(title = "Area and price by year built, separated by type of dwelling",
x = "Area (Square feet)",
y = "Price (dollar)") +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(size = 5))
This plot shows that most of the houses in this area are of the 1Fam type. This type has a wide variety of prices and areas and has been popular throughout the whole period. 2fmCon houses were built mostly in the late 1800s and early 1900s and have relatively low prices. Twnhs and TwnhsE are clearly the more modern types, since most were built in the late 1990s and early 2000s.