Answer: gpa: The GPA (grade point average) of the students
studyweek: Students’ study hours per week
sleepnight: Students’ sleep hours per night
out: The average number of nights students go out per week
gender: Students’ gender (male/female)
ggplot(gpa, aes(x = studyweek, y = gpa)) +
geom_density2d_filled() +
labs(title = "Students' GPA vs. study hours per week",
x = "Study Hours per week",
y = "GPA") +
ylim(2.5,4) +
theme(plot.title = element_text(hjust = 0.5, size = 20),
text = element_text(size = 15))
Comments: Almost all the students have a GPA above 3, and most of them study between 10 and 20 hours per week. Based on the graph, the correlation between students’ weekly study hours and their GPA is fairly weak. Many factors could lead to this result; for example, most students in this Duke University sample may already perform very well, so their GPAs vary little regardless of how much they study.
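To put a number on "fairly weak", one option is to compute the correlation coefficient alongside the plot. The sketch below is a hypothetical helper (`cor_summary()` is not from any package) that returns both the Pearson (linear) and Spearman (rank-based, monotonic) correlations, dropping incomplete pairs. It assumes the `gpa` data frame used above is loaded.

```r
# Hypothetical helper: Pearson and Spearman correlation for two numeric vectors.
# use = "complete.obs" drops any pair where x or y is missing.
cor_summary <- function(x, y) {
  c(
    pearson  = cor(x, y, use = "complete.obs", method = "pearson"),
    spearman = cor(x, y, use = "complete.obs", method = "spearman")
  )
}
```

For example, `cor_summary(gpa$studyweek, gpa$gpa)` gives two values between -1 and 1; values close to 0 support the "weak correlation" reading of the density plot.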
ggplot(gpa, aes(x = out, y = gpa)) +
geom_density2d_filled() +
labs(title = "Students' social time and GPA",
x = "Average nights students go out",
y = "GPA") +
ylim(2.5,4) +
theme(plot.title = element_text(hjust = 0.5, size = 20),
text = element_text(size = 15))
Comments: Most of the students go out about 1.5 to 2 nights per week on average and have a GPA of around 3.5 or higher. Some students go out about 3 nights per week on average but still have a very good GPA, close to 4.0. The correlation between out and gpa is also fairly weak, so it is hard to predict a student’s GPA from the number of nights they go out per week.
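Whether a weak correlation is distinguishable from zero can be checked with a correlation test. This is a minimal sketch using base R's `cor.test()`; the helper name `weak_cor_check()` and the |r| < 0.3 cut-off for "weak" are assumptions (a common rule of thumb, not a fixed standard). It assumes the `gpa` data frame is loaded.

```r
# Hypothetical helper: Pearson correlation with a 95% confidence interval and p-value.
# cor.test() drops incomplete (x, y) pairs before computing the estimate.
weak_cor_check <- function(x, y, threshold = 0.3) {
  ct <- cor.test(x, y, method = "pearson", conf.level = 0.95)
  r <- unname(ct$estimate)
  list(
    r = r,
    conf_int = as.numeric(ct$conf.int),  # lower and upper bounds of the 95% CI
    p_value = ct$p.value,
    weak = abs(r) < threshold            # assumed rule-of-thumb cut-off
  )
}
```

A call such as `weak_cor_check(gpa$out, gpa$gpa)` reports how uncertain the estimate is; a wide interval that includes small values is consistent with "hard to predict GPA from nights out".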
ggplot(gpa, aes(x = out, y = sleepnight)) +
geom_density2d_filled() +
labs(title = "Students' social time and sleep hours per night",
x = "Average nights students go out",
y = "Sleep hours per night") +
theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 12))
ggplot(gpa, aes(x = out, y = sleepnight)) +
geom_point(position = "jitter") +
geom_smooth() +
labs(title = "Students' social time and sleep hours per night",
x = "Average nights students go out",
y = "Sleep hours per night") +
theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 12))
Comments: My graphs indicate that the number of nights students go out is more strongly correlated with sleep hours than with GPA. Both graphs show that the more nights a student goes out per week, the more hours they tend to sleep per night. However, this relationship weakens once students sleep around 7.5 hours. Many students who go out around 1.5 to 2 nights per week sleep around 7 hours per night.
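The "levelling off" seen in the smoothed curve can also be checked numerically by averaging sleep hours at each level of nights out. In this sketch, `mean_by_level()` is a hypothetical helper, and rounding `out` to the nearest half night (`step = 0.5`) is an assumption made so that fractional averages group together. It assumes the `gpa` data frame is loaded.

```r
# Hypothetical helper: mean of y within buckets of x (x rounded to the nearest `step`).
mean_by_level <- function(x, y, step = 0.5) {
  keep <- !is.na(x) & !is.na(y)
  bucket <- round(x[keep] / step) * step   # e.g. 1.2 -> 1.0, 2.1 -> 2.0 with step = 0.5
  means <- tapply(y[keep], bucket, mean)
  counts <- tapply(y[keep], bucket, length)
  data.frame(
    x = as.numeric(names(means)),
    mean_y = as.numeric(means),
    n = as.integer(counts)
  )
}
```

With the data loaded, `mean_by_level(gpa$out, gpa$sleepnight)` lists the average sleep per bucket together with its sample size, which helps judge whether a flat section rests on only a few students.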
ggplot(gpa, aes(studyweek, gender)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar", width = 0.5) +
labs(title = "Graph of Students' gender and study hours per week",
x = "Study hours per week",
y = "Gender") +
theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 12))
Comments: My graph indicates that female students’ study hours have a wider range, and female students tend to study more hours per week than male students. Male students’ study hours are more concentrated and, as a group, lower than female students’ study hours.
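The box plot comparison of centre and spread can be backed up with a small table of summary statistics per group. `spread_by_group()` is a hypothetical helper; it reports the median, interquartile range (IQR), full range (max minus min) and group size. It assumes the `gpa` data frame is loaded.

```r
# Hypothetical helper: median, IQR, range width and count of x within each group.
spread_by_group <- function(x, group) {
  keep <- !is.na(x) & !is.na(group)
  stats <- lapply(split(x[keep], group[keep]), function(v) {
    c(median = median(v), iqr = IQR(v), range = diff(range(v)), n = length(v))
  })
  do.call(rbind, stats)  # one row per group, rows named by group level
}
```

For example, `spread_by_group(gpa$studyweek, gpa$gender)` shows whether the female group's wider box and whiskers correspond to a larger IQR and range.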
ggplot(gpa, aes(out, gender)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar", width = 0.5) +
labs(title = "Students' gender and nights out per week",
x = "Nights out per week",
y = "Gender") +
theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 12))
Comments: Based on my graph, male students spend more nights going out, and their data are fairly concentrated. Compared with male students, female students typically spend fewer nights going out, but their data are more spread out.
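To check whether the difference between the two gender groups is larger than chance would suggest, a rank-based test is one reasonable choice. This sketch uses base R's `wilcox.test()` (a Wilcoxon rank-sum test); choosing a rank test is an assumption motivated by `out` being a small, discrete and possibly skewed quantity, and `compare_two_groups()` is a hypothetical helper. It assumes the `gpa` data frame is loaded.

```r
# Hypothetical helper: group medians plus a two-sided Wilcoxon rank-sum test.
# exact = FALSE uses the normal approximation, which handles tied values.
compare_two_groups <- function(x, group) {
  keep <- !is.na(x) & !is.na(group)
  g <- factor(group[keep])
  stopifnot(nlevels(g) == 2)
  test <- wilcox.test(x[keep] ~ g, exact = FALSE)
  list(medians = tapply(x[keep], g, median), p_value = test$p.value)
}
```

A call like `compare_two_groups(gpa$out, gpa$gender)` returns the median nights out per gender and a p-value; a small p-value would support the claim that male students tend to go out more.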
ggplot(loans_full_schema, aes(annual_income)) +
geom_histogram(aes(y = after_stat(density)), bins = 30, boundary = 0, fill = "skyblue", colour = "white") +
geom_density(linewidth = 1.2, colour = "red", adjust = 30/8) +
scale_x_continuous(labels = scales::dollar, limits = c(0, 300000)) +
labs(title = "Density plot of annual income",
x = "Annual income (USD)",
y = "Density") +
theme(plot.title = element_text(hjust = 0.5, size = 20),
text = element_text(size = 15))
Comments: Based on the graph, many of the loan applicants have an annual income of around $50k. It is possible that they need the money to buy houses or other expensive items that they cannot yet afford. The graph also shows that the distribution is right-skewed: only a few people earn a lot of money, while many earn comparatively little.
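The "around $50k" peak and the right skew can be summarised with the median, the mean and the most populated histogram bin. In this sketch, `income_summary()` is a hypothetical helper, the $10,000 bin width is an arbitrary assumption, and incomes are assumed to be non-negative. It assumes the `loans_full_schema` data frame is loaded.

```r
# Hypothetical helper: median, mean and modal bin of a non-negative variable.
income_summary <- function(income, binwidth = 10000) {
  income <- income[!is.na(income)]
  breaks <- seq(0, ceiling(max(income) / binwidth) * binwidth, by = binwidth)
  h <- hist(income, breaks = breaks, plot = FALSE)  # counts only, no plot
  top <- which.max(h$counts)                        # first bin with the most values
  list(
    median = median(income),
    mean = mean(income),
    modal_bin = c(lower = breaks[top], upper = breaks[top + 1])
  )
}
```

For a right-skewed variable such as `loans_full_schema$annual_income`, the mean is expected to sit above the median, because a few very high incomes pull the mean upward.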
ggplot(loans_full_schema) +
geom_boxplot(aes(homeownership, debt_to_income/100)) +
scale_y_continuous(labels = scales::percent, limits = c(0,1)) +
labs(title = "Debt-to-income ratio by homeownership",
x = "Homeownership",
y = "Debt to income ratio") +
theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 12))
Comments: In general, homeownership does not appear to have a large effect on a person’s debt-to-income ratio. However, I found that the mortgage group has more people with a high debt-to-income ratio.
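"More people with a high ratio" is a statement about the upper tail, so comparing upper quantiles per group is more direct than comparing medians alone. `upper_quantiles()` is a hypothetical helper, and the 50th/75th/90th percentiles are arbitrary choices. It assumes the `loans_full_schema` data frame is loaded.

```r
# Hypothetical helper: selected quantiles of x within each group.
upper_quantiles <- function(x, group, probs = c(0.5, 0.75, 0.9)) {
  keep <- !is.na(x) & !is.na(group)
  # one row per group; columns named "50%", "75%", "90%" by quantile()
  do.call(rbind, lapply(split(x[keep], group[keep]), quantile, probs = probs))
}
```

For example, `upper_quantiles(loans_full_schema$debt_to_income / 100, loans_full_schema$homeownership)` would show whether the MORTGAGE row has a higher 90th percentile than OWN and RENT.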
ggplot(loans_full_schema, aes(total_credit_limit, annual_income)) +
geom_density2d_filled() +
scale_x_continuous(name = "Total credit limit (USD)", labels = scales::dollar, limits = c(0, 500000)) +
scale_y_continuous(name = "Annual Income (USD)", labels = scales::dollar, limits = c(0, 150000)) +
labs(title = "2D density plot of annual income and total credit limit") +
theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 12))
Comments: Based on the plot, annual income and total credit limit are positively correlated: normally, the more you earn, the higher your total credit limit. Many people have an annual income of around $30k and a total credit limit of about the same amount.
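Because both income and credit limit are heavily right-skewed, one way to describe "the more you earn, the higher the credit limit" is a straight-line fit on the log-log scale, where the slope reads as an approximate elasticity (a slope of 1 means the limit grows in proportion to income). This is a swapped-in technique, not the method used for the plot above; `loglog_slope()` is a hypothetical helper, and dropping zero or negative values (where `log()` is undefined) is an assumption. It assumes the `loans_full_schema` data frame is loaded.

```r
# Hypothetical helper: slope and R^2 of log(y) regressed on log(x).
# Rows with missing, zero or negative values are dropped because log() needs x, y > 0.
loglog_slope <- function(x, y) {
  keep <- !is.na(x) & !is.na(y) & x > 0 & y > 0
  lx <- log(x[keep])
  ly <- log(y[keep])
  fit <- lm(ly ~ lx)
  c(slope = coef(fit)[["lx"]], r_squared = cor(lx, ly)^2)
}
```

A call such as `loglog_slope(loans_full_schema$annual_income, loans_full_schema$total_credit_limit)` gives a single slope to compare against 1.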
ggplot(data = loans_full_schema) +
geom_bin2d(mapping = aes(x = total_credit_limit, y = debt_to_income/100)) +
facet_wrap(~ homeownership, nrow = 3) +
labs(title = "Lending Club loans data by homeownership",
x = "Total credit limit (USD)",
y = "debt-to-income ratio") +
scale_x_continuous(labels = scales::dollar, limits = c(0, 500000)) +
scale_y_continuous(labels = scales::percent, limits = c(0,1)) +
theme(plot.title = element_text(hjust = 0.5, size = rel(1.5), margin = margin(15,15,15,15)),
axis.title = element_text(size = 15),
axis.text = element_text(size = 10))
Comments: It is surprising that some people who have a mortgage have a higher total credit limit than others while also carrying a fairly heavy debt-to-income ratio. Homeowners (OWN) and renters (RENT) have similar total credit limits, but more renters have a high debt-to-income ratio.
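The claim that more renters carry a large debt-to-income ratio can be expressed as the share of each group above a cut-off. `share_above()` is a hypothetical helper, and the 0.4 (40%) default cut-off is an illustrative assumption rather than a lending standard taken from the data. It assumes the `loans_full_schema` data frame is loaded.

```r
# Hypothetical helper: proportion of x above `cutoff` within each group.
# The mean of a logical vector is the fraction of TRUE values.
share_above <- function(x, group, cutoff = 0.4) {
  keep <- !is.na(x) & !is.na(group)
  tapply(x[keep] > cutoff, group[keep], mean)
}
```

For example, `share_above(loans_full_schema$debt_to_income / 100, loans_full_schema$homeownership)` returns one proportion each for MORTGAGE, OWN and RENT; reporting proportions rather than counts matters here because the groups differ in size.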
ggplot(data = loans_full_schema) +
geom_bin_2d(mapping = aes(x = loan_amount, y = interest_rate/100), bins = 20) +
facet_grid(verified_income ~ grade) +
labs(title = "Lending Club loans data",
x = "Loan amount (USD)",
y = "Interest rate") +
scale_x_continuous(labels = scales::dollar) +
scale_y_continuous(labels = scales::percent, limits = c(0,0.5)) +
theme(plot.title = element_text(hjust = 0.5, size = 20, margin = margin(15,15,15,15)),
text = element_text(size = 15),
axis.text.x = element_text(angle = 60, size = 10, hjust = 0.5, vjust = 0.5))
Comments: The higher the loan grade (A is the best), the lower the interest rate applied to the loan. Applicants with Verified income are fewer in number than the other groups. Income verification status also appears to influence the loan amount a person can get.
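The three claims in the comment (rate by grade, group sizes by verification status, and loan amount by verification status) can each be checked with a grouped summary. `grade_summary()` is a hypothetical helper built on `tapply()` and `table()`. It assumes the `loans_full_schema` data frame is loaded.

```r
# Hypothetical helper: grouped summaries behind the faceted grade/verification plot.
grade_summary <- function(rate, amount, grade, verified) {
  list(
    mean_rate_by_grade = tapply(rate, grade, mean, na.rm = TRUE),
    n_by_verification = table(verified, useNA = "no"),
    median_amount_by_verification = tapply(amount, verified, median, na.rm = TRUE)
  )
}
```

A possible call is `grade_summary(loans_full_schema$interest_rate, loans_full_schema$loan_amount, loans_full_schema$grade, loans_full_schema$verified_income)`; the mean rates should increase from grade A onward if the pattern in the plot holds.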
Answer: The ames data set contains data on individual residential properties sold in Ames, Iowa, from 2006 to 2010. There are 2930 rows and 82 variables, comprising a mix of nominal, ordinal, discrete, and continuous data types. The data set has detailed measurements of the houses, including total living area, basement square footage, location, garage size, and distinct porch types. It also includes quality ratings for each house, which allow a more detailed analysis, as well as information on heating and cooling systems, electrical systems, and the presence of amenities such as fireplaces, pools, and masonry veneer.
ggplot(ames, aes(x = area, y = price)) +
geom_density2d_filled() +
labs(title = "2D density plot for house living area and price",
x = "House living (ground) area (square feet)",
y = "Price (USD)") +
scale_y_continuous(labels = scales::dollar, limits = c(0,400000)) +
xlim(0,3000) +
theme(plot.title = element_text(hjust = 0.5, size = 20),
text = element_text(size = 15))
Comments: Based on my graph, a house’s price is strongly correlated with its above-ground living area: the larger the area, the higher the price. Most of the houses in the data have a living area of around 1000 square feet.
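A simple linear fit turns "the larger the area, the higher the price" into a price per square foot and a correlation coefficient. `price_area_fit()` is a hypothetical helper; a straight line is an assumption and may not describe the largest houses well. It assumes the `ames` data frame is loaded.

```r
# Hypothetical helper: slope of price on area (dollars per square foot) and Pearson r.
price_area_fit <- function(area, price) {
  fit <- lm(price ~ area, na.action = na.omit)
  c(
    dollars_per_sqft = coef(fit)[["area"]],
    r = cor(area, price, use = "complete.obs")
  )
}
```

For example, `price_area_fit(ames$area, ames$price)` returns the estimated extra price per additional square foot of living area; an r close to 1 would support the "strong correlation" reading.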
ggplot(ames, aes(x = Bldg.Type, y = price)) +
geom_boxplot() +
stat_boxplot(geom = 'errorbar', width = 0.5) +
labs(title = "Box plot of house price by dwelling type",
x = "Type of dwelling",
y = "Price (USD)") +
scale_y_continuous(labels = scales::dollar, limits = c(0,400000)) +
theme(plot.title = element_text(hjust = 0.5, size = 20),
text = element_text(size = 15))
Comments: In Bldg.Type, 1Fam means single-family detached; 2fmCon means two-family conversion (originally built as a one-family dwelling); Duplex means duplex dwelling; TwnhsE means townhouse end unit; and Twnhs means townhouse inside unit. Single-family detached dwellings have a wide price range, and most of them are more expensive than the other types. Townhouse end units are similar to single-family detached dwellings except that they share a wall with a neighbor, and their prices are also fairly high; their lowest price is higher than the lowest price of single-family detached dwellings. 2fmCon, Duplex, and Twnhs have narrower price ranges and are cheaper than most single-family detached dwellings and townhouse end units.
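The box plot comparisons (range width, lowest price, typical price) can be read off a five-number summary per dwelling type. `five_num_by_type()` is a hypothetical helper built on base R's `fivenum()` (Tukey's minimum, lower hinge, median, upper hinge, maximum). It assumes the `ames` data frame is loaded.

```r
# Hypothetical helper: Tukey five-number summary of price for each dwelling type.
five_num_by_type <- function(price, type) {
  keep <- !is.na(price) & !is.na(type)
  out <- do.call(rbind, lapply(split(price[keep], type[keep]), fivenum))
  colnames(out) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
  out
}
```

With `five_num_by_type(ames$price, ames$Bldg.Type)`, the claim that townhouse end units have a higher minimum price than single-family detached dwellings corresponds to comparing the `min` column of the TwnhsE and 1Fam rows. Note that the plot above sets `limits = c(0, 400000)`, which drops more expensive houses, while this summary uses all of them.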
ggplot(data = ames, aes(x = area, y = price)) +
geom_bin_2d() +
geom_smooth(color = "red") +
facet_wrap(~ Bldg.Type, nrow = 2) +
labs(title = "2D bin plot of how house type and area affect price",
x = "House Area (Square Feet)",
y = "Price (USD)") +
scale_y_continuous(labels = scales::dollar, limits = c(0, 600000)) +
xlim(0, 4000) +
theme(plot.title = element_text(hjust = 0.5, size = rel(1.5), margin = margin(15,15,15,15)),
axis.title = element_text(size = 15),
axis.text = element_text(size = 10))
Comments: Ignoring the outliers with a living area above 2000 square feet, townhouse end units have the fastest rate of price increase with area among the dwelling types. Single-family detached dwellings have the second-fastest rate.
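The red curves in the faceted plot come from `geom_smooth()`, which by default uses loess for groups under 1,000 observations and a GAM otherwise, so the "rate of increase" is read by eye. One way to rank the types numerically is to fit a straight line within each type and compare slopes. This is a simplification, not the smoother used in the plot. `slopes_by_type()` is a hypothetical helper, and `max_area = 2000` is an assumption that mirrors the outlier cut described in the comment. It assumes the `ames` data frame is loaded.

```r
# Hypothetical helper: per-type linear slope of price on area (dollars per sq ft),
# excluding houses larger than max_area; results sorted from steepest to flattest.
slopes_by_type <- function(area, price, type, max_area = 2000) {
  keep <- !is.na(area) & !is.na(price) & !is.na(type) & area <= max_area
  d <- data.frame(area = area[keep], price = price[keep], type = type[keep])
  slopes <- sapply(split(d, d$type), function(g) coef(lm(price ~ area, data = g))[["area"]])
  sort(slopes, decreasing = TRUE)
}
```

For example, `slopes_by_type(ames$area, ames$price, ames$Bldg.Type)` should list TwnhsE first and 1Fam second if the visual reading of the plot holds. Types with too few houses to fit a line produce `NA`, which `sort()` drops.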