The first step in any comprehensive data analysis is to explore each important variable in turn. Univariate graphs plot the distribution of data from a single variable. The variable can be categorical (e.g., race, sex, political affiliation) or quantitative (e.g., age, weight, income).
The Marriage dataset contains the marriage records of 98 individuals in Mobile County, Alabama (see Appendix A.5). We’ll explore the distribution of three variables from this dataset: the age and race of the wedding participants and the occupation of the wedding officials.
The race of the participants and the occupation of the officials are both categorical variables. The distribution of a single categorical variable is typically plotted with a bar chart, a pie chart, or (less commonly) a tree map or waffle chart.
library(ggplot2)
data(Marriage, package = "mosaicData")
# plot the distribution of race
ggplot(Marriage, aes(x = race)) +
geom_bar() +
labs(title = "Race of 98 Individuals From Marriage Records",
subtitle = "Mobile County, Alabama",
x = "Race",
y = "# People")
What do you observe?
I observe that in Mobile County Alabama the race with the most marriage records is white.
How can we improve this graph?
We can improve this graph by adding color to the bars and making the bars narrower.
You can modify the bar fill and border colors, plot labels, and title by adding options to the geom_bar function. In ggplot2, the fill parameter is used to specify the color of areas such as bars, rectangles, and polygons. The color parameter specifies the color of objects that technically do not have an area, such as points, lines, and borders.
# plot the distribution of race with modified colors and labels
ggplot(Marriage, aes(x=race)) +
geom_bar(fill = "cornflowerblue",
color="black") +
labs(x = "Race",
y = "Frequency",
title = "Participants by race")
Suppose we want to modify this to represent percents instead of counts. How might we do this?
We would do this by specifying we want percentages through an argument.
Bars can represent percents rather than counts. For bar charts, the code aes(x=race) is actually a shortcut for aes(x = race, y = after_stat(count)), where count is a special variable representing the frequency within each category. You can use this to calculate percentages by specifying the y variable explicitly.
# plot the distribution as percentages
ggplot(Marriage,
aes(x = race, y = after_stat(count/sum(count)))) +
geom_bar() +
labs(x = "Race",
y = "Percent",
title = "Participants by race") +
scale_y_continuous(labels = scales::percent)
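To see what after_stat(count/sum(count)) actually computes, here is a minimal sketch. It assumes a small made-up data frame (toy, not part of the Marriage data) and uses ggplot2's layer_data() to pull out the bar heights that would be drawn.

```r
library(ggplot2)

# Hypothetical toy data: three rows in group "a", one in group "b"
toy <- data.frame(group = c("a", "a", "a", "b"))

p <- ggplot(toy, aes(x = group, y = after_stat(count / sum(count)))) +
  geom_bar()

# layer_data() returns the data computed for a layer; y holds the bar heights
bar_heights <- layer_data(p)$y
bar_heights
```

The heights are proportions that sum to one, which is why scales::percent can display them as percentages on the axis.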
It is often helpful to sort the bars by frequency. The frequencies are calculated explicitly in the code below.
# calculate number of participants in each race category
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
plotdata <- Marriage %>%
count(race)
plotdata
## race n
## 1 American Indian 1
## 2 Black 22
## 3 Hispanic 1
## 4 White 74
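The count function is a shorthand for grouping and then summarising. The sketch below, which assumes a made-up data frame named toy, shows that the two forms give the same counts.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(race = c("White", "Black", "White", "White"))

# Short form: one row per category, with the frequency in column n
by_count <- toy %>% count(race)

# Long form: group, count the rows in each group, then drop the grouping
by_summarise <- toy %>%
  group_by(race) %>%
  summarise(n = n(), .groups = "drop")

by_count
by_summarise
```

Either form can be used to prepare data for the plots that follow.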
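This new dataset is then used to create the graph with the following modifications:
The reorder function is used to sort the categories by frequency.
The option stat = "identity" tells geom_bar to use the supplied y values rather than counting rows.
# plot the bars in ascending order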
ggplot(plotdata,
aes(x = reorder(race, n), y = n)) +
geom_bar(stat="identity") +
labs(x = "Race",
y = "Frequency",
title = "Participants by race")
The graph bars are sorted in ascending order. Use reorder(race, -n) to sort in descending order.
Try making this change to the above graph.
ggplot(plotdata,
aes(x = reorder(race, -n), y = n)) +
geom_bar(stat="identity") +
labs(x = "Race",
y = "Frequency",
title = "Participants by race")
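The effect of reorder can be checked without drawing anything. The sketch below assumes a made-up data frame (toy) and prints the factor levels, which is the order ggplot2 uses to place the bars.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(category = c("x", "y", "z"),
                  n = c(5, 20, 1))

# Levels are ordered by the second argument, smallest first
ascending <- levels(reorder(toy$category, toy$n))

# Negating the counts reverses the order, largest first
descending <- levels(reorder(toy$category, -toy$n))

ascending
descending
```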
Finally, you may want to label each bar with its numerical value. In the code below, geom_text adds the labels and vjust adjusts their vertical position.
# plot the bars with numeric labels
ggplot(plotdata,
aes(x = race, y = n)) +
geom_bar(stat="identity") +
geom_text(aes(label = n), vjust=-0.5) +
labs(x = "Race",
y = "Frequency",
title = "Participants by race")
Modify the above graph so that the bars are sorted in descending order and the vertical axis and bar labels show percentages:
ggplot(plotdata,
aes(x = reorder(race, -n),
y = n/sum(n))) +
geom_bar(stat="identity",fill= "indianred3", color = "black") +
geom_text(aes(label = paste(c(
round(n/sum(n),2)*100),"%", sep="")),
vjust=-0.2) +
labs(x = "Race",
y = "Percent",
title = "Participants by race") +
scale_y_continuous(breaks = seq(0, 0.8, 0.2),
labels = scales::percent)
It’s a bit tricky, but we sort the bars AND relabel the vertical axis and bars as percent:
# plot the bars with numeric labels
ggplot(plotdata,
aes(x = reorder(race, -n),
y = n/sum(n))) +
geom_bar(stat="identity",fill= "indianred3", color = "black") +
geom_text(aes(label = paste(c(
round(n/sum(n),2)*100),"%", sep="")),
vjust=-0.5) +
labs(x = "Race",
y = "Percent",
title = "Participants by race") +
scale_y_continuous(limits = c(0,0.8),
labels = scales::percent)
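The percent labels on the bars are plain strings built from the counts. As a small sketch, here is the same calculation done on its own, using the four counts from the plotdata table printed earlier.

```r
# Counts by race, copied from the plotdata table printed earlier
n <- c(1, 22, 1, 74)

# Convert each count to a whole-number percent label
pct_labels <- paste0(round(100 * n / sum(n)), "%")
pct_labels
```

Because each label is rounded separately, the labels are not guaranteed to add up to exactly 100% in general.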
Consider the distribution of marriage officials.
What is problematic with the following?
The bar chart is going to have overlapping labels.
# Basic bar chart with overlapping labels
ggplot(Marriage, aes(x=officialTitle)) +
geom_bar() +
labs(x = "Officiate",
y = "Frequency",
title = "Marriages by officiate")
What ideas do you have for fixing this?
Sorting the bars in descending order, adding color, and spacing out the labels so that they are not overlapping each other.
Here are three approaches. Identify the key features of each.
In the chart below the labels are organized better because they do not overlap, making the chart easier to read.
# horizontal bar chart
ggplot(Marriage, aes(x = officialTitle)) +
geom_bar() +
labs(x = "",
y = "Frequency",
title = "Marriages by officiate") +
coord_flip()
In the chart below the labels are slanted which prevents them from overlapping, making the chart easier to read.
# bar chart with rotated labels
ggplot(Marriage, aes(x=officialTitle)) +
geom_bar() +
labs(x = "",
y = "Frequency",
title = "Marriages by officiate") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1))
In the chart below the labels alternate between being positioned higher and lower, making the chart easier to read.
# bar chart with staggered labels
lbls <- paste0(c("","\n"), levels(Marriage$officialTitle))
ggplot(Marriage,
aes(x=factor(officialTitle,
labels = lbls))) +
geom_bar() +
labs(x = "",
y = "Frequency",
title = "Marriages by officiate")
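The staggering relies on R recycling the two-element vector c("", "\n") across the labels, so every second label gets a leading line break. A minimal sketch with four of the official titles:

```r
# Four of the official titles, used here only to show the recycling
titles <- c("BISHOP", "CATHOLIC PRIEST", "CHIEF CLERK", "CIRCUIT JUDGE")

# "" and "\n" alternate, pushing every second label down one line
staggered <- paste0(c("", "\n"), titles)
staggered
```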
Another variant of a bar chart is the stacked bar chart. This helps visualize the relative frequency of counts as part of a whole.
To make a stacked bar chart, we specify the following:
ggplot( data = Marriage,
aes( x = "",
fill = officialTitle)) +
geom_bar( position = "stack" )
We can also specify the
ggplot( data = Marriage,
aes( x = "",
fill = officialTitle)) +
geom_bar( position = "stack" )
Use whichever of the above graphs you prefer, then add an appropriate title and axis labels:
ggplot( data = Marriage,
aes( x = "",
fill = officialTitle)) +
geom_bar( position = "stack" ) +
labs( title = "Wedding Officiate Occupation",
x = "",
fill = "Official Title")
Choose another categorical variable from the Marriage dataset to make a stacked bar chart.
ggplot( data = Marriage,
aes( x = "",
fill = race)) +
geom_bar( position = "stack" ) +
labs( title = "Wedding Participants by Race",
x = "",
fill = "Race")
Pie charts are controversial in statistics.
Why do you think this is the case?
Because they are not as exact as other visualizations.
A pie chart is essentially a stacked bar chart in polar coordinates.
So to make a pie chart in ggplot2, we simply add the layer coord_polar() to a stacked bar chart:
ggplot( data = Marriage,
aes( x = "",
fill = sign)) +
geom_bar( position = "stack" ) +
coord_polar( theta = "y")
What is problematic here?
The problem is that you cannot tell the exact number of each variable.
Make a bar chart of the sign variable. Which is more informative, this or the previous graph?
ggplot(data = Marriage,
aes(x = sign,
fill = sign)) +
geom_bar(position = "stack") +
coord_flip() +
labs(x = "Sign",
y = "Count",
title = "Frequency of Wedding Signs")
If you aim to compare the frequency of categories, you are better off with bar charts (humans are better at judging the length of bars than the area of pie slices). If your goal is to compare each category with the whole (e.g., what portion of participants are Hispanic compared to all participants), and the number of categories is small, then pie charts may work.
Make a pie chart of the race variable, with appropriate labels. Compared to the bar chart of the race variable, which do you prefer?
ggplot(data = Marriage,
aes(x = "",
fill = race)) +
geom_bar(position = "stack") +
coord_polar(theta = "y") +
labs(fill = "Race",
title = "Participants by Race",
x = "",
y = "Count")
An alternative to a pie chart is a tree map. Unlike pie charts, it can handle categorical variables that have many levels.
library(treemapify)
# create a treemap of marriage officials
plotdata <- Marriage %>%
count(officialTitle)
ggplot(plotdata,
aes(fill = officialTitle, area = n)) +
geom_treemap() +
labs(title = "Marriages by officiate")
Here is a more useful version with labels.
# create a treemap with tile labels
ggplot(plotdata,
aes(fill = officialTitle,
area = n,
label = officialTitle)) +
geom_treemap() +
geom_treemap_text(colour = "white",
place = "centre") +
labs(title = "Marriages by officiate") +
theme(legend.position = "none")
Make a tree map for the sign variable.
# install.packages("treemapify")  # run once, only if the package is not yet installed
library(ggplot2)
library(treemapify)
library(dplyr)
plotdata <- Marriage %>%
group_by(sign) %>%
summarise(n = n(), .groups = 'drop')
ggplot(plotdata,
aes(fill = sign,
area = n,
label = sign)) +
geom_treemap() +
geom_treemap_text(colour = "white",
place = "centre",
grow = TRUE) +
labs(title = "Marriages by Sign") +
theme(legend.position = "none")
A waffle chart, also known as a gridplot or square pie chart, represents observations as squares in a rectangular grid, where each cell represents a percentage of the whole. You can create a ggplot2 waffle chart using the geom_waffle function in the waffle package.
Let’s create a waffle chart for the professions of wedding officiates. As with treemaps, start by summarizing the data into groups and counts.
library(dplyr)
plotdata <- Marriage %>%
count(officialTitle)
plotdata
## officialTitle n
## 1 BISHOP 2
## 2 CATHOLIC PRIEST 2
## 3 CHIEF CLERK 2
## 4 CIRCUIT JUDGE 2
## 5 ELDER 2
## 6 MARRIAGE OFFICIAL 44
## 7 MINISTER 20
## 8 PASTOR 22
## 9 REVEREND 2
Next, create the ggplot2 graph. Set the fill to the grouping variable and values to the counts. Don’t specify an x and y.
Download the waffle package.
# create a basic waffle chart
library(waffle)
library(dplyr)
plotdata <- Marriage %>%
group_by(officialTitle) %>%
summarise(n = n()) %>%
ungroup()
ggplot(plotdata, aes(fill = officialTitle, values=n)) +
geom_waffle(na.rm=TRUE)
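Next, we’ll customize the graph by setting the number of rows, the cell size, and a white cell border; changing the color scheme to "Spectral"; making the cells square; simplifying the theme; removing the legend title; and adding a title and caption.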
# Create a customized caption
cap <- paste0("1 square = ", ceiling(sum(plotdata$n)/100),
" case(s).")
library(waffle)
ggplot(plotdata, aes(fill = officialTitle, values=n)) +
geom_waffle(na.rm=TRUE,
n_rows = 10,
size = .4,
color = "white") +
scale_fill_brewer(palette = "Spectral") +
coord_equal() +
theme_minimal() +
theme_enhance_waffle() +
theme(legend.title = element_blank()) +
labs(title = "Proportion of Wedding Officials",
caption = cap)
Make a waffle chart of the sign variable.
library(dplyr)
plotdata <- Marriage %>%
group_by(sign) %>%
summarise(n = n()) %>%
ungroup()
cap <- paste0("1 square = ", ceiling(sum(plotdata$n)/100),
" case(s).")
library(waffle)
ggplot(plotdata, aes(fill = sign, values=n)) +
geom_waffle(na.rm=TRUE,
n_rows = 10,
size = .4,
color = "white") +
scale_fill_brewer(palette = "Spectral") +
coord_equal() +
theme_minimal() +
theme_enhance_waffle() +
theme(legend.title = element_blank()) +
labs(title = "Proportion of Participants by Sign",
caption = cap)
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Spectral is 11
## Returning the palette you asked for with that many colors
In the Marriage dataset, age is a quantitative variable. The distribution of a single quantitative variable is typically plotted with a histogram, kernel density plot, or dot plot.
Histograms are the most common approach to visualizing a quantitative variable. In a histogram, the values of a variable are typically divided up into adjacent, equal-width ranges (called bins), and the number of observations in each bin is plotted with a vertical bar.
library(ggplot2)
# plot the age distribution using a histogram
ggplot(Marriage, aes(x = age)) +
geom_histogram() +
labs(title = "Participants by age",
x = "Age")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Most participants appear to be in their early 20s, with another group in their 40s and a much smaller group in their late sixties and early seventies. This would be a multimodal distribution.
Histogram colors can be modified using two options: fill sets the fill color of the bars, and color sets the border color around them.
# plot the histogram with blue bars and white borders
ggplot(Marriage, aes(x = age)) +
geom_histogram(fill = "cornflowerblue",
color = "white") +
labs(title="Participants by age",
x = "Age")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
One of the most important histogram options is bins, which controls the number of bins into which the numeric variable is divided (i.e., the number of bars in the plot). The default is 30, but it is helpful to try smaller and larger numbers to get a better impression of the shape of the distribution.
# plot the histogram with 20 bins
ggplot(Marriage, aes(x = age)) +
geom_histogram(fill = "cornflowerblue",
color = "white",
bins = 20) +
labs(title="Participants by age",
subtitle = "number of bins = 20",
x = "Age")
Alternatively, you can specify binwidth, the width of the bins represented by the bars.
# plot the histogram with a binwidth of 5
ggplot(Marriage, aes(x = age)) +
geom_histogram(fill = "cornflowerblue",
color = "white",
binwidth = 5) +
labs(title="Participants by age",
subtitle = "binwidth = 5 years",
x = "Age")
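To confirm what binwidth does, the following sketch inspects the bins that geom_histogram computes. It assumes a made-up vector of ages (not the Marriage data) and uses layer_data() to read the bin edges.

```r
library(ggplot2)

# Hypothetical ages, used only to illustrate the binning
set.seed(1)
toy <- data.frame(age = round(runif(98, min = 18, max = 70)))

p <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 5)

# Each row of the computed layer data is one bar; xmin and xmax are its edges
bars <- layer_data(p)
bar_widths <- bars$xmax - bars$xmin

range(bar_widths)   # every bar spans the requested width
sum(bars$count)     # every observation falls in exactly one bin
```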
As with bar charts, the y-axis can represent counts or percent of the total.
# plot the histogram with percentages on the y-axis
library(scales)
ggplot(Marriage,
aes(x = age, y= after_stat(count/sum(count)))) +
geom_histogram(fill = "cornflowerblue",
color = "white",
binwidth = 5) +
labs(title="Participants by age",
y = "Percent",
x = "Age") +
scale_y_continuous(labels = scales::percent)
Make a histogram of the dayOfBirth variable. What is an appropriate number of bins to use?
# plot the day-of-birth histogram with percentages on the y-axis
library(scales)
ggplot(Marriage,
aes(x = dayOfBirth, y= after_stat(count/sum(count)))) +
geom_histogram(fill = "cornflowerblue",
color = "white",
bins = 12) +
labs(title="Participants by day of birth",
y = "Percent",
x = "Day of birth") +
scale_y_continuous(labels = scales::percent)
What if we realize that we really want to plot the month of each person’s date of birth? How could we do this?
Hint: you can extract the month from a date using the lubridate package.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
# Extract month from dates
month(Marriage$dob)
## [1] 4 8 2 5 12 2 10 1 12 7 2 11 9 10 10 11 1 5 2 5 4 11 10 1 11
## [26] 8 10 10 4 3 2 9 4 6 2 11 5 3 5 9 2 3 5 9 2 9 5 7 12 2
## [51] 4 3 5 12 11 12 9 3 7 4 4 4 2 6 3 4 4 11 6 9 10 5 3 2 9
## [76] 8 6 12 10 11 4 3 7 3 8 8 10 2 2 9 7 8 9 6 6 1 5 8
plotdata = Marriage%>%
mutate(month_of_birth = as.factor(month(dob)))
plotdata <- Marriage %>%
mutate(month_of_birth = month(dob, label = TRUE)) %>% # Extracts the month and labels it (e.g., "Jan", "Feb")
group_by(month_of_birth) %>%
summarise(n = n()) %>%
ungroup()
library(dplyr)
ggplot(data = plotdata,
aes( x = reorder(month_of_birth, n),
y = n)) +
geom_bar(stat = "identity")
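The month function can return either a number or a labelled month. The sketch below assumes three made-up birth dates and shows that label = TRUE gives an ordered factor whose levels follow the calendar, so mapping it to x without reorder would keep the bars in calendar order rather than sorting them by frequency.

```r
library(lubridate)

# Hypothetical birth dates, for illustration only
dob <- as.Date(c("1990-04-15", "1985-12-01", "1979-01-30"))

month_num <- month(dob)                # numeric month
month_fct <- month(dob, label = TRUE)  # ordered factor with 12 month levels

month_num
is.ordered(month_fct)
as.integer(month_fct)                  # level codes follow calendar order
```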
An alternative to a histogram is the kernel density plot. Technically, kernel density estimation is a nonparametric method for estimating the probability density function of a continuous random variable (what??). We are trying to draw a smoothed histogram, where the area under the curve equals one.
# Create a kernel density plot of age
ggplot(Marriage, aes(x = age)) +
geom_density() +
labs(title = "Participants by age")
The graph shows the distribution of scores. For example, the proportion of cases between 20 and 40 years old would be represented by the area under the curve between 20 and 40 on the x-axis.
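The "area under the curve" idea can be checked numerically. This is only a sketch: it assumes a made-up vector of ages and a small helper, area_between (a hypothetical name), that applies the trapezoid rule to the output of base R's density function.

```r
# Hypothetical ages, used only for illustration
set.seed(1)
ages <- c(rnorm(70, mean = 25, sd = 4), rnorm(28, mean = 45, sd = 6))

# Gaussian kernel with the default bandwidth rule
d <- density(ages)

# Trapezoid-rule area under the estimated curve between two x values
area_between <- function(d, lo, hi) {
  keep <- d$x >= lo & d$x <= hi
  x <- d$x[keep]
  y <- d$y[keep]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

total_area  <- area_between(d, min(d$x), max(d$x))  # close to 1
share_20_40 <- area_between(d, 20, 40)              # estimated share aged 20 to 40

c(total = total_area, between_20_and_40 = share_20_40)
```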
As with previous charts, we can use fill and color to specify the fill and border colors.
# Create a kernel density plot of age
ggplot(Marriage, aes(x = age)) +
geom_density(fill = "indianred3") +
labs(title = "Participants by age")
The degree of smoothness is controlled by the bandwidth parameter bw. To find the default value for a particular variable, use the bw.nrd0 function. Larger values will result in more smoothing, while smaller values will produce less smoothing.
# default bandwidth for the age variable
bw.nrd0(Marriage$age)
## [1] 5.181946
# Create a kernel density plot of age
ggplot(Marriage, aes(x = age)) +
geom_density(fill = "deepskyblue",
bw = 1) +
labs(title = "Participants by age",
subtitle = "bandwidth = 1")
In this example, the default bandwidth for age is 5.18. Choosing a value of 1 resulted in less smoothing and more detail.
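For reference, the default bandwidth comes from a rule of thumb based on the spread of the data and the sample size. The sketch below reproduces that rule by hand on a made-up vector of ages and compares it with bw.nrd0.

```r
# Hypothetical ages, used only for illustration
set.seed(1)
ages <- round(runif(98, min = 18, max = 70))

# Rule of thumb behind bw.nrd0():
# 0.9 * min(standard deviation, IQR / 1.34) * n^(-1/5)
manual_bw <- 0.9 * min(sd(ages), IQR(ages) / 1.34) * length(ages)^(-1/5)

c(manual = manual_bw, bw.nrd0 = bw.nrd0(ages))
```

Since the bandwidth shrinks as the sample size grows, larger datasets get less smoothing by default.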
Kernel density plots allow you to easily see which scores are most frequent and which are relatively rare. However, it can be difficult to explain the meaning of the y-axis to a non-statistician. (But it will make you look really smart at parties!)
Make two kernel density plots of the dayOfBirth variable: one with the default smoothing parameter, and one with a smoothing parameter of your choice.
# default bandwidth
ggplot(Marriage, aes(x = dayOfBirth)) +
geom_density(fill = "deepskyblue") +
labs(title = "Participants by Day of Birth",
subtitle = "default bandwidth")
# bandwidth of 10
ggplot(Marriage, aes(x = dayOfBirth)) +
geom_density(fill = "deepskyblue",
bw = 10) +
labs(title = "Participants by Day of Birth",
subtitle = "bandwidth = 10")
Another alternative to the histogram is the dot chart. Again, the quantitative variable is divided into bins, but rather than summary bars, each observation is represented by a dot. By default, the width of a dot corresponds to the bin width, and dots are stacked, with each dot representing one observation. This works best when the number of observations is small (say, less than 150).
# plot the age distribution using a dot plot
ggplot(Marriage, aes(x = age)) +
geom_dotplot() +
labs(title = "Participants by age",
y = "Proportion",
x = "Age")
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
The fill and color options can be used to specify the fill and border color of each dot respectively.
# Plot ages as a dot plot using
# gold dots with black borders
ggplot(Marriage, aes(x = age)) +
geom_dotplot(fill = "gold",
color="black") +
labs(title = "Participants by age",
y = "Proportion",
x = "Age")
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
There are many more options available. See ?geom_dotplot for details and examples.
Make a dot chart of delay, using:
the dotsize and stackratio parameters,
scale_x_continuous(), and
scale_y_continuous(NULL, breaks = NULL).
ggplot(Marriage, aes(x = delay)) +
geom_dotplot(dotsize = 0.8, stackratio = 0.5, fill = "deepskyblue") +
labs(title = "Participants by Delay in Marriage") +
scale_x_continuous(name = "Delay (days)",
breaks = seq(0, max(Marriage$delay, na.rm = TRUE), by = 5)) +
scale_y_continuous(NULL, breaks = NULL)
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
Since dotplots represent one observation per dot, they lend themselves to coloring the dots by a categorical variable.
Let’s add race as fill to the above chart. What do you notice?
I notice that there are now two different colors in the dotplot.
ggplot( Marriage,
aes( x = delay, fill = race)) +
geom_dotplot(stackratio = 0.7,
dotsize = 0.85) +
scale_y_continuous(NULL, breaks = NULL) +
scale_x_continuous(breaks = seq(0,30,2))
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
To fix this, we need the following arguments:
stackgroups = TRUE allows different fill groups to be stacked together
binpositions = "all" determines position of bins with all the data taken together; this is used for aligning dot stacks across multiple groups.
ggplot( Marriage,
aes( x = delay, fill = race)) +
geom_dotplot(stackgroups = TRUE,
binpositions = "all",
stackratio = 0.7,
dotsize = 0.85,
color = "black") +
scale_x_continuous(breaks = seq(0,30,2))
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
Make a dotplot of the age variable, colored by race.
ggplot( Marriage,
aes( x = age, fill = race)) +
geom_dotplot(stackgroups = TRUE,
binpositions = "all",
stackratio = 0.7,
dotsize = 0.7,
color = "black") +
scale_y_continuous(NULL, breaks = NULL) +
scale_x_continuous(breaks = seq(10, 80, 10))
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
Create graphs to analyze the following variables of the loan50.csv data set separately.
loan50 <- read.csv("loan50.csv")
library(dplyr)
plotdata2 <- loan50 %>%
count(state)
library(ggplot2)
ggplot(plotdata2,
aes(x=reorder(state, -n),y=n))+
geom_bar(fill = "plum",
stat = "identity") +
labs(x = "state", y = "frequency")
library(dplyr)
library(ggplot2)
ggplot(loan50,
aes( x = emp_length)) +
geom_histogram(fill = "deeppink1") +
labs(title = "Years of Employment",
x = "Length of Employment",
y = "Count") +
scale_x_continuous(breaks = seq(0,10,2)) +
scale_y_continuous(breaks = seq(0,10,2))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
library(dplyr)
library(ggplot2)
plotdata <- loan50 %>%
count(homeownership)
ggplot(plotdata, aes(x = "", y = n, fill = homeownership)) +
geom_bar(stat = "identity", position = "stack") +
coord_polar(theta = "y") +
labs(x = "", y = "", fill = "Homeownership") +
geom_text(aes(label = n),
position = position_stack(vjust = 0.5))
library(dplyr)
library(ggplot2)
# plot the raw values; counting them first would show only the distinct values
ggplot(loan50, aes(x = debt_to_income)) +
geom_dotplot(fill = "darkmagenta") +
labs(title = "Rate of Debt to Income",
x = "Debt to income") +
scale_x_continuous(breaks = seq(0,6,0.5)) +
scale_y_continuous(NULL, breaks = NULL)
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
library(ggplot2)
ggplot(loan50, aes(x = annual_income)) +
geom_histogram(fill = "lightpink",
color = "magenta3") +
labs(title = "Participants' Annual Income",
x = "Annual Income") +
scale_x_continuous(breaks = seq(0, 325000, 50000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
library(dplyr)
library(ggplot2)
ggplot(loan50, aes(x = loan_purpose)) +
geom_bar(fill = "rosybrown3") +
labs(x = "Loan Purpose",
y = "Frequency",
title = "Reason for Loan") +
coord_flip()
library(ggplot2)
ggplot(loan50, aes(x = interest_rate)) +
geom_density(fill = "mediumorchid", bw = 0.5) +
labs(title = "Interest Rate")
Choose the graph for each variable that provides the best understanding.
Please refer to the following for more description of the variables in this data set.
https://www.openintro.org/data/index.php?data=loan50