Lecture 28 - Course Wrap Up and Review
Penelope Pooler Eisenbies
BUA 345
2024-04-28
💥 Lecture 28 In-class Exercises - Q1 - Review 💥
Session ID: bua345s24
If you suspect your time series data have a seasonal component, the third set of practice questions demonstrates that you should develop TWO versions of the Auto ARIMA model (auto.arima in R) and compare them.
Specify, correctly and with no spaces, the option stating that the model should assume there IS a seasonal component.
Upcoming Dates
HW 10 was due on Monday, 4/22.
- Grace Period ends tonight (Thu. 4/25) at midnight.
Lecture 26 was optional; the updated materials are posted if you are interested.
Today is the last lecture, but I will hold a Zoom review next week if there is interest.
- Google Poll will be sent out.
Continuing in Business Analytics
BUA 345 gives you a good foundation in data fluency and literacy to work with analysts.
If you find this material interesting and want to go on to take the lead on analyzing data, consider the Business Analytics (BA) major.
Skills that the BA major focuses on:
Advanced Analytics focuses on dealing with very large data sets.
Data Management shows students how to go from raw data on the internet to informative data visualization dashboards.
Predictive Analytics expands on the modeling skills in this course to show students how to develop models and make sound business predictions.
Data Mining and Network Modeling
Visual Analytics builds on the visualization skills from the Data Management course to show how to present data effectively.
Financial Analytics and Marketing Analytics are electives for the BA major.
Expand your Analytics and Data Science Skill-set.
Talking about Your Skill-Set
Explaining your analytics skill-set is challenging, but it’s getting easier.
As Data Science and Analytics grow in importance, more people understand what this skill set can offer.
You should not assume that interviewers, colleagues, or supervisors understand your skills.
This White Paper from DataCamp (also posted on Blackboard) is helpful.
Starting on page 9, it lays out the different roles people take on when working with data.
Comparing these descriptions to skills you learn in BUA 345 (and future courses) will help you communicate your skill-set with confidence.
Overview of Course
Excel Skills
- Lectures 1 - 6
Correlation, SLR, MLR, Logistic Regression
- Lectures 7 - 18
- HW Assignments 4.2, 5, 6, 7, 8
Non-linear Models and Optimization
- Lectures 21 - 23
- HW 9 and Q1 of HW 10
Forecasting
- Lectures 24 - 25
Additional Important Course Material
Recommended Studying Strategy
Go through previous quizzes and all three sets of practice questions.
Take notes for yourself on skills and terminology you are unsure of.
Go back to those skills and terms in HW assignments, lectures, and videos and take notes.
Redo questions.
Come to Zoom Review with Questions
Excel Skills - Lectures 1 - 6
💥 Lecture 28 In-class Exercises - Q2 💥
Session ID: bua345s24
What percent of the females that survived the Titanic disaster were in Second Class?
There are MANY ways to approach this question.
Hint: This may be easier to do if you keep the data as counts instead of converting values to percentages.
Round your answer to the nearest whole percent and don’t include the percent sign in your answer.
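One way to keep the calculation in counts, as the hint suggests, is sketched below in Python. The counts are made-up placeholders, NOT the Titanic values, and the function name is hypothetical; substitute the surviving-female counts from your own table. The key point is that the denominator is all surviving females, not all Second Class passengers.

```python
def percent_of_total(counts, category):
    """Return the share of `category` among all counts, as a whole percent."""
    total = sum(counts.values())            # denominator: ALL surviving females
    share = 100 * counts[category] / total  # percent before rounding
    return int(share + 0.5)                 # round half up (non-negative values)


# PLACEHOLDER counts of surviving females by passenger class (not real data).
surviving_females = {"First": 50, "Second": 30, "Third": 20}

print(percent_of_total(surviving_females, "Second"))  # prints 30
```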
Correlation, SLR, MLR, and Logistic Regression
This material comprises the largest part of the course.
Correlation and Simple Linear Regression were included in Quiz 1
Calculating and interpreting correlations using the cor command
Creating a Simple Linear Regression (SLR) model and verifying it is valid.
Knowing when the log (natural log) transformation is useful.
All MLR topics were included in Quiz 2
Model Selection Methods
Measures of Goodness of Fit, e.g., Adjusted \(R^2\) and AIC
Basic commands for creating a model, e.g., lm, ols_regress
Logistic Regression (glm)
💥 Lecture 28 In-class Exercises - Q3 & Q4 💥
Question 3. What is the slope for the Premium cut diamond category in these data?
Question 4. In this diamonds dataset, Ideal cut is the baseline category and there are a total of three cut categories: Ideal, Premium, and Very Good.
Which other category, Very Good or Premium, is not significantly different from Ideal?
Model Summary
------------------------------------------------------------
R                      0.880      RMSE               401.048
R-Squared              0.774      MSE             161810.497
Adj. R-Squared         0.773      Coef. Var           16.202
Pred R-Squared         0.769      AIC              14840.040
MAE                  310.166      SBC              14874.394
------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
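As a quick check on how Adjusted \(R^2\) relates to \(R^2\), the reported value can be reproduced from the summary. This assumes the usual adjustment formula, with \(n = 1000\) observations (Total DF of 999 in the ANOVA table) and \(p = 5\) model terms (Regression DF of 5):

\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.774)\,\frac{999}{994}
          \approx 0.773
\]

The penalty is small here because \(n\) is large relative to \(p\), which is why the two values differ only in the third decimal place.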
ANOVA
----------------------------------------------------------------------------
              Sum of Squares      DF       Mean Square           F      Sig.
----------------------------------------------------------------------------
Regression     550650373.945       5     110130074.789     680.611    0.0000
Residual       160839633.955     994        161810.497
Total          711490007.900     999
----------------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------------------
model                     Beta    Std. Error    Std. Beta         t      Sig       lower       upper
----------------------------------------------------------------------------------------------------
(Intercept)           -231.059        83.222                 -2.776    0.006    -394.369     -67.749
carat                 4112.089       120.651        0.907    34.083    0.000    3875.330    4348.848
cutPremium             147.418       115.427       -0.079     1.277    0.202     -79.091     373.927
cutVery Good          -152.745       120.764       -0.038    -1.265    0.206    -389.726      84.237
carat:cutPremium      -425.098       163.797       -0.058    -2.595    0.010    -746.525    -103.670
carat:cutVery Good     117.709       174.465        0.014     0.675    0.500    -224.652     460.071
----------------------------------------------------------------------------------------------------
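The reasoning behind Questions 3 and 4 can be sketched in a few lines of Python using the values from the Parameter Estimates table. In an interaction model, the slope for a non-baseline category is the baseline slope (carat) plus that category's interaction term. The helper names and the 0.05 significance cutoff are assumptions for illustration, not something stated in the output.

```python
# Coefficients and p-values (Sig) copied from the Parameter Estimates table.
coef = {
    "carat": 4112.089,
    "cutPremium": 147.418,
    "cutVery Good": -152.745,
    "carat:cutPremium": -425.098,
    "carat:cutVery Good": 117.709,
}
p_value = {
    "cutPremium": 0.202,
    "cutVery Good": 0.206,
    "carat:cutPremium": 0.010,
    "carat:cutVery Good": 0.500,
}


def slope(cut):
    """Slope on carat for a cut category; Ideal is the baseline."""
    if cut == "Ideal":
        return coef["carat"]
    return coef["carat"] + coef[f"carat:cut{cut}"]


def differs_from_ideal(cut, alpha=0.05):
    """True if either the intercept shift or the slope shift is significant.

    alpha = 0.05 is an ASSUMED cutoff.
    """
    terms = [f"cut{cut}", f"carat:cut{cut}"]
    return any(p_value[t] < alpha for t in terms)


print(round(slope("Premium"), 3))       # prints 3686.991
print(differs_from_ideal("Premium"))    # prints True
print(differs_from_ideal("Very Good"))  # prints False
```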
Non-linear Models and Optimization - Lectures 21 - 23
Forecasting - Lectures 24 - 25
Cross-sectional vs. Time-Series Data
Forecasting Terminology
Using ts command to correctly specify a time series
Implementing and Interpreting Auto-ARIMA models (auto.arima in R)
Determining if data have a seasonal component
Reporting requested prediction bounds or the prediction interval (Hi - Lo)
Calculating model percent accuracy: (100 - MAPE)%
Examining and comparing model residuals (HW 10)
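The two arithmetic items in the list above, percent accuracy and the prediction interval width, can be sketched as follows. This is a Python illustration with made-up placeholder numbers and hypothetical function names; in practice the MAPE and the Hi and Lo bounds would presumably be read from the R model output rather than computed by hand.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


def percent_accuracy(mape_value):
    """Model percent accuracy: (100 - MAPE)%."""
    return 100 - mape_value


def interval_width(hi, lo):
    """Width of a prediction interval: Hi - Lo."""
    return hi - lo


# PLACEHOLDER values for illustration only (not course data).
actual = [200, 400]
forecast = [175, 450]

model_mape = mape(actual, forecast)
print(model_mape)                     # prints 12.5
print(percent_accuracy(model_mape))   # prints 87.5
print(interval_width(hi=120, lo=80))  # prints 40
```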
Key Points from Today
Evaluations are VERY Important: coursefeedback.syr.edu
The rest of today’s lecture will be a Q&A session.
Reminder of Recommended Study Strategy:
Go through previous quizzes and all three sets of practice questions.
Take notes for yourself on skills and terminology you are unsure of.
Go back to those skills and terms in HW assignments, lectures, and videos and take notes.
Redo questions.
Come to Zoom Review with Questions
To submit an Engagement Question or Comment about material from today’s lecture, submit by midnight today (day of lecture). Click on the link next to the ❓ under today’s lecture.