R can be used to make basic visual analytics, which can be helpful in understanding the data holistically. Additionally, R can help find correlations between variables and create scatter plots.
Tableau is a tool more tailored for visual analytics, while R is a powerful tool for statistics and other advanced topics in data analytics. In this lab we will explore both capabilities using two earlier sets of data credistrisk and marketing.
Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.
Read the file marketing.csv and make sure all the columns are captured by checking the first couple rows of the dataset “head(mydata)”
mydata = read.csv("data/marketing.csv")
head(mydata)
## case_number State sales radio paper tv pos
## 1 1 IL 11125 65 89 250 1.3
## 2 2 IL 16121 73 55 260 1.6
## 3 3 AZ 16440 74 58 270 1.7
## 4 4 AZ 16876 75 82 270 1.3
## 5 5 IL 13965 69 75 255 1.5
## 6 6 MI 14999 70 71 255 2.1
How to create a bar chart using categorical variable
# Extract the State column from mydata
state = mydata$State
# Create a frequency table to extract the count for each state
state_table = table(state)
# Execute the command
barplot(state_table)
# Add labels to plot by replacing the ?? with a proper title
barplot(state_table, xlab= '??', ylab = '??' )
barplot(state_table, xlab = 'State (Abreviated)', ylab = 'Number of Occurences Analyzed')
How to create a histogram
# Extract the TV column from the data and create a histogram by running the command hist(variable)
# where variable corresponds to the extracted sales column variable
tv=mydata$tv
hist(tv)
sales = mydata$sales
hist(sales)
###TASK 2B. Can you find the total cummulative sales from the histogram? Explain your answer ANSWER:
How to create a pie chart
# The command to create a pie chart is pie(variable) where variable is in reference to the particular column extracted from the file. In this example we define a variable called x.
x = c(2,3,4,5)
pie(x)
pie (state_table)
ANSWER: The earlier historgram gives the analyst a better look at the amount of occurances and better look at the data when it comes to total sales. The pie chart is difficult to know the proportion of the data becasue it doesn’t give any numerical value of the occurences. An anylyst can’t tell the differences in the slices compared to the historagram.
The previous task focused on visualizing one variable. A good way to visualize two variables and also very common is a scatter plot.
How to create a scatter plot
# Plot Sales vs. Radio
# Radio will be on the x-axis
# Sales will be on the y-axis
sales = mydata$sales
radio = mydata$radio
plot(radio,sales)
# It is easier to see the trend and possible relationship by including a line that fit through the points.
# This is done with the command
scatter.smooth(radio,sales)
plot(tv, sales)
scatter.smooth(tv, sales)
paper = mydata$paper
plot(paper, sales)
scatter.smooth(paper, sales)
pos = mydata$pos
plot(pos, sales)
scatter.smooth(pos, sales)
cor(sales,tv)
## [1] 0.9579703
cor(sales,paper)
## [1] -0.2830683
cor(sales,pos)
## [1] 0.0126486
ANSWER: The numerical values display what is heppening in the previous scatter plots. Radio/sales are the highest correlation considering they are the closest to 1. Then tv/sales after that. even though sales/paper are a negative number, they have more of a correlation than sales/pos. The numberical value depends on the absolute value. The negative or positve tells us which direction the trand is heading; down or up.
Follow the directions on the worksheet, download tableau academic on your personal computer or use one of the labs computers.
– Download Tableau academic here: https://www.tableau.com/academic/students
– Refer to file ‘creditrisk.csv’ in the data folder
– Start Tableau and enter your LUC email if prompted.
– Import the file into Tableau. Choose the Text File option when importing
| Click on Sheet 1 (bottom left). – Under the dimensions tab located on the left side of the screen DOUBLE click on the ‘Loan Purpose’, then DOUBLE click on ‘Number of Records’ variable located under Measures on the bottom left of the screen. |
| – From the upper right corner of the screen select the horizontal bars view and note the new chart. Familiarize yourself with the tool by trying other views. Only valid views will be highlighted in the menu. |
| – Create a new sheet by clicking on the icon in the bottom next to your current sheet. |
| ###TASK 6A. Go to new worksheet. Double-click on the ‘Age’ variable in Measures and select the ‘Histogram’ view. Capture a screen shot and include here. |
| ###TASK 6B. Which age bin has the highest age count and what is the count? ANSWER: The age bin with the highest count is age 22 with 97 total counts. |
| ###TASK 7A. Drag-drop the variable ’Marital Status’found under Dimensions into the Marks Color box (make sure to drop in the Marks Color box until you see the small plus mark). Capture a screen shot and include here. |
| ###TASK 7B. Which age bin has the highest divorce count and what is the count? ANSWER: The age bin with the highest divorce count is age 22 with a total of 46 counts. |
| ###TASK 8A. Create another new sheet. Double-click ‘Months Employed’ and then double-click ‘Age’. Make sure Age appears in the columns field as shown in the image below. From the Sum(Age) drop down menu select Dimension. Repeat for Sum(Months Employed). Add another variable to the scatter plot by drag-drop the dimension variable ‘Gender’ into the Marks Color box (until you see the small plus mark). Capture a screen shot and include here. |
| ###TASK 8B. Share your observations ANSWER: The Majority of the employees who have been employed less than 40 months are under the age of 35 years old. Most of the employees are male. |
| ###TASK 9A. In a new sheet generate a view of Gender, Number of Records, and Marital Status. Choose the best fitting view of your choice for the intended scope. Capture a screen shot and include here. More male than female. |
| ###TASK 9B. Share your observations. ANSWER: Married Singles who are male is the largest group at 233. Then divorced females at 135. Married males at 36 and divorced males at 21 are the smallest groups. |
| ###TASK 10A. Create another new sheet. Double-click ‘Job’ and then double-click ‘Number of Records’. Add another variable to the graph by drag-drop the dimension variable ‘Gender’ into the Marks Color box (until you see the small plus mark). Click on staked bars. Capture a screen shot and include here. |
| ###TASK 10B. Share your observations. ANSWER: Majority of the employees are skilled males at 182;and skilled females at 89. |