Lecture 28 - Course Wrap Up and Review

Penelope Pooler Eisenbies
BUA 345

2024-04-28

💥 Lecture 28 In-class Exercises - Q1 - Review 💥

Session ID: bua345s24

If you suspect your time series data have a seasonal component, the third set of practice questions demonstrates that you should develop TWO versions of the Auto ARIMA (auto.arima in R) model and compare them.


Specify the option that tells the model to assume there IS a seasonal component. Type it correctly, with no spaces.

Upcoming Dates

  • HW 10 was due on Monday, 4/22.

    • Grace Period ends tonight (Thu. 4/25) at midnight.

  • Lecture 26 was optional; updated materials are posted if you are interested.

  • Today is the last lecture, but I will hold a Zoom review next week if there is interest.

    • A Google poll will be sent out.

Continuing in Business Analytics

  • BUA 345 gives you a solid foundation in data fluency and literacy for working with analysts.

  • IF you find this material interesting and want to go on to take the lead on analyzing data, consider the Business Analytics (BA) major.

  • Skills that the BA major focuses on:

    • Advanced Analytics focuses on dealing with very large data sets.

    • Data Management shows students how to go from raw data on the internet to informative data visualization dashboards.

    • Predictive Analytics expands on the modeling skills in this course to show students how to develop models and make sound business predictions.

    • Data Mining and Network Modeling

    • Visual Analytics builds on the visualization skills from the Data Management course to show how to present data effectively.

  • Financial Analytics and Marketing Analytics are electives for the BA major.

Expand Your Analytics and Data Science Skill-Set

  • LinkedIn Learning

    • Free for SU Students (Great value!)

    • Hundreds of excellent courses in R, Python, SQL, and Quarto

  • DataCamp provides excellent (but not free) courses in R, Python, and SQL.

Talking about Your Skill-Set

  • Explaining your analytics skill-set is challenging, but it’s getting easier.

  • As Data Science and Analytics grow in importance, more people understand what this skill set can offer.

  • You should not assume that interviewers, colleagues, or supervisors understand your skills.

  • This white paper from DataCamp (also posted on Blackboard) is helpful.

  • Starting on page 9, it lays out the different roles people take on when working with data.

  • Comparing these descriptions to skills you learn in BUA 345 (and future courses) will help you communicate your skill-set with confidence.

Course Evaluations

  • Evaluations are VERY Important:

    • coursefeedback.syr.edu

    • I will step out before the Q&A to give students 5 minutes to complete evaluations.

    • Please complete evaluations for ALL courses.

Final Exam Information

  • Timed test - 90 minutes

  • Students can choose ONE of the four options below

  • In-person Testing Options:

    • Friday 5/3 10:15 AM in Room 104

    • Tuesday 5/7 12:45 PM in Room 101

    • Students from both sections are welcome to attend either in-person test or to take the test asynchronously.

  • Asynchronous Options:

    • Friday 5/3 1:00 PM - 10:00 PM

    • Monday 5/6 10:00 AM - 10:00 PM

  • The final exam will be UNAVAILABLE after the exam period on Tuesday.

Overview of Course

  • Excel Skills

    • Lectures 1 - 6

    • HW Assignments 2, 3 and 4.1

  • Correlation, SLR, MLR, Logistic Regression

    • Lectures 7 - 18

    • HW Assignments 4.2, 5, 6, 7, 8

  • Non-linear Models and Optimization

    • Lectures 21 - 23

    • HW 9 and Q1 of HW 10

  • Forecasting

    • Lectures 24 - 25

    • HW 10

Additional Important Course Material

  • Practice Questions

    • For Quiz 1

    • For Quiz 2

    • Additional Practice Questions

  • Quizzes

    • Quiz 1

    • Quiz 2

  • Final Exam Questions will mostly be adapted from previous quiz questions and practice questions.

Excel Skills - Lectures 1 - 6

  • Relative and Absolute ($) cell references

  • Excel Tables and Table Options

    • Sorting, filtering, and finding duplicates
  • Excel Pivot Tables

    • Summarizing complex data

    • Many options

  • VLOOKUP

    • Including an embedded MATCH function to increase functionality

    • Both range (approximate) and exact match lookups
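The two VLOOKUP modes and the embedded MATCH can be hard to picture from the bullet list alone. Below is a minimal Python sketch of the same logic, assuming a made-up grade table. The names `vlookup`, `match_exact`, `TABLE`, and `HEADERS` are hypothetical helpers written for illustration. They are not Excel functions. A range lookup returns the last row whose key is at or below the lookup value, so the first column must be sorted. An exact lookup fails (Excel shows #N/A) when no key matches.

```python
from bisect import bisect_right

# Hypothetical lookup table (illustration only). For a range lookup the first
# column must be sorted ascending, which is also what Excel's VLOOKUP expects.
HEADERS = ["Min Score", "Letter", "Points"]
TABLE = [
    (0, "F", 0.0),
    (60, "D", 1.0),
    (70, "C", 2.0),
    (80, "B", 3.0),
    (90, "A", 4.0),
]


def match_exact(label, headers):
    """Mimic MATCH(label, headers, 0): 1-based position of an exact match."""
    return headers.index(label) + 1  # raises ValueError if missing (Excel: #N/A)


def vlookup(value, table, col_index, range_lookup=True):
    """Mimic VLOOKUP(value, table, col_index, range_lookup)."""
    keys = [row[0] for row in table]
    if range_lookup:
        # Range match: use the last row whose key is <= value.
        i = bisect_right(keys, value)
        if i == 0:
            raise KeyError(f"{value} is smaller than every key")  # Excel: #N/A
        row = table[i - 1]
    else:
        # Exact match: the key must equal value.
        if value not in keys:
            raise KeyError(f"{value} not found")  # Excel: #N/A
        row = table[keys.index(value)]
    return row[col_index - 1]  # col_index is 1-based, as in Excel


# The embedded MATCH picks the column by its header instead of a hard-coded number.
print(vlookup(85, TABLE, match_exact("Points", HEADERS)))                      # 3.0
print(vlookup(70, TABLE, match_exact("Letter", HEADERS), range_lookup=False))  # C
```

Because MATCH looks up the column number from the header text, the formula still points at the right column if columns are inserted or reordered.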

💥 Lecture 28 In-class Exercises - Q2 💥

Session ID: bua345s24

What percent of the females that survived the Titanic disaster were in Second Class?

There are MANY ways to approach this question.

Hint: This may be easier to do if you keep the data as counts instead of converting values to percentages.

Round your answer to the nearest whole percent and don't include the percent sign in your answer.
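The hint to work with counts follows a three-step pattern. First, restrict to the group in the denominator (females who survived). Second, count by class. Third, divide. The Python sketch below walks through those steps on made-up passenger records. The counts are hypothetical and are not the Titanic data used in class. In Excel, the same steps correspond to a pivot table filtered to female survivors, with Class in Rows and a count in Values.

```python
from collections import Counter

# Hypothetical passenger records: (class, sex, survived). NOT the lecture data.
records = (
    [("First", "Female", "Yes")] * 6
    + [("Second", "Female", "Yes")] * 3
    + [("Third", "Female", "Yes")] * 3
    + [("Third", "Female", "No")] * 5
    + [("Second", "Male", "Yes")] * 2
)

# Step 1: keep only the denominator group (females who survived).
female_survivors = [r for r in records if r[1] == "Female" and r[2] == "Yes"]

# Step 2: count that group by class (like a pivot table of counts).
counts = Counter(cls for cls, _, _ in female_survivors)

# Step 3: divide the Second Class count by the group total, then round.
pct_second = round(100 * counts["Second"] / sum(counts.values()))
print(pct_second)  # 25
```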

Correlation, SLR, MLR, and Logistic Regression

  • This material comprises the largest part of the course.

  • Correlation and Simple Linear Regression were included in Quiz 1.

    • Calculating and interpreting correlations using the cor command

    • Creating a Simple Linear Regression (SLR) model and verifying it is valid.

    • Knowing when a log (natural log) transformation is useful.

  • All Multiple Linear Regression (MLR) topics were included in Quiz 2.

    • Model Selection Methods

    • Measures of Goodness of Fit, e.g., Adjusted \(R^2\) and AIC

    • Basic commands for creating a model, e.g., lm and ols_regress

  • Logistic Regression (glm)

    • When is it used?

    • How do we use model results to find probabilities?
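A logistic regression model (for example, glm with family = binomial in R) reports its coefficients on the log-odds scale. To turn a prediction into a probability, you apply the logistic function p = 1 / (1 + e^(-(b0 + b1·x))). The Python sketch below shows that conversion. The coefficients `B0` and `B1` are hypothetical values chosen for illustration.

```python
import math

# Hypothetical fitted coefficients from a logistic regression (illustration only).
B0 = -3.0  # intercept: log-odds when x = 0
B1 = 0.5   # change in log-odds for each one-unit increase in x


def predicted_probability(x, b0=B0, b1=B1):
    """Convert the model's log-odds (b0 + b1 * x) into a probability."""
    log_odds = b0 + b1 * x
    return 1 / (1 + math.exp(-log_odds))


print(round(predicted_probability(4), 3))  # 0.269  (log-odds = -3.0 + 0.5 * 4 = -1.0)
print(round(math.exp(B1), 3))              # 1.649  (odds multiplier per unit of x)
```

When the log-odds equal 0 (here at x = 6), the predicted probability is exactly 0.5.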

💥 Lecture 28 In-class Exercises - Q3 & Q4 💥

Question 3. What is the slope for the Premium cut diamond category in these data?

Question 4. In this diamonds dataset, Ideal cut is the baseline category, and there are three cut categories in total: Ideal, Premium, and Very Good.

Which other category, Very Good or Premium, is not significantly different from Ideal?

                           Model Summary                             
--------------------------------------------------------------------
R                         0.880       RMSE                  401.048 
R-Squared                 0.774       MSE                161810.497 
Adj. R-Squared            0.773       Coef. Var              16.202 
Pred R-Squared            0.769       AIC                 14840.040 
MAE                     310.166       SBC                 14874.394 
--------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                    ANOVA                                     
-----------------------------------------------------------------------------
                     Sum of                                                  
                    Squares         DF      Mean Square       F         Sig. 
-----------------------------------------------------------------------------
Regression    550650373.945          5    110130074.789    680.611    0.0000 
Residual      160839633.955        994       161810.497                      
Total         711490007.900        999                                       
-----------------------------------------------------------------------------

                                         Parameter Estimates                                          
-----------------------------------------------------------------------------------------------------
             model        Beta    Std. Error    Std. Beta      t        Sig        lower       upper 
-----------------------------------------------------------------------------------------------------
       (Intercept)    -231.059        83.222                 -2.776    0.006    -394.369     -67.749 
             carat    4112.089       120.651        0.907    34.083    0.000    3875.330    4348.848 
        cutPremium     147.418       115.427       -0.079     1.277    0.202     -79.091     373.927 
      cutVery Good    -152.745       120.764       -0.038    -1.265    0.206    -389.726      84.237 
  carat:cutPremium    -425.098       163.797       -0.058    -2.595    0.010    -746.525    -103.670 
carat:cutVery Good     117.709       174.465        0.014     0.675    0.500    -224.652     460.071 
-----------------------------------------------------------------------------------------------------
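In a model with a carat × cut interaction, each non-baseline cut gets its own intercept and slope. The intercept is the baseline intercept plus the cut's dummy coefficient. The slope is the baseline carat slope plus the cut's interaction coefficient. The Python sketch below copies the estimates from the printout above and applies that rule. It also recomputes R-Squared and Adj. R-Squared from the ANOVA sums of squares as a consistency check. The helper names are hypothetical.

```python
# Coefficients copied from the Parameter Estimates table above
# (price model with a carat x cut interaction; Ideal is the baseline cut).
coef = {
    "(Intercept)": -231.059,
    "carat": 4112.089,
    "cutPremium": 147.418,
    "cutVery Good": -152.745,
    "carat:cutPremium": -425.098,
    "carat:cutVery Good": 117.709,
}


def intercept_for(cut):
    """Baseline intercept plus the cut's dummy-variable coefficient."""
    if cut == "Ideal":
        return coef["(Intercept)"]
    return coef["(Intercept)"] + coef[f"cut{cut}"]


def slope_for(cut):
    """Baseline carat slope plus the cut's interaction coefficient."""
    if cut == "Ideal":
        return coef["carat"]
    return coef["carat"] + coef[f"carat:cut{cut}"]


for cut in ("Ideal", "Premium", "Very Good"):
    print(f"{cut:>9}: price = {intercept_for(cut):.3f} + {slope_for(cut):.3f} * carat")

# Goodness-of-fit check from the ANOVA table: Total DF = 999, so n = 1000; k = 5 terms.
ss_resid, ss_total = 160839633.955, 711490007.900
n, k = 1000, 5
r2 = 1 - ss_resid / ss_total
adj_r2 = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))
print(round(r2, 3), round(adj_r2, 3))  # 0.774 0.773
```

For Premium, the carat slope is 4112.089 + (-425.098) = 3686.991. The Sig column also bears on Question 4. Both Very Good terms have p-values above 0.05 (0.206 and 0.500). Premium's interaction term has p = 0.010.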

Non-linear Models and Optimization - Lectures 21 - 23

  • Example Questions in Lectures, HW 9, HW 10, and Additional Practice Questions

  • Non-linear models

    • Excel is a great tool for efficiently comparing different model choices.

    • Model coefficients and \(R^2\) can be requested for each model.

    • Adjusted \(R^2\) values will be provided.

  • Optimization

    • Non-linear model optimization using the GRG Nonlinear method in Excel Solver.

    • Optimizing systems of linear equations using the Simplex LP method in Excel Solver.
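Solver's GRG Nonlinear method searches iteratively for an optimum, so it helps to know what answer to expect. When the fitted non-linear model is a single-variable quadratic, the optimum can be checked in closed form at x = -B / (2A). The Python sketch below uses a hypothetical quadratic profit model (made-up coefficients). It finds the vertex and confirms it with a simple grid search. This is a sanity check, not an implementation of GRG.

```python
# Hypothetical fitted quadratic (made-up coefficients for illustration):
# profit = A * price**2 + B * price + C
A, B, C = -2.0, 120.0, -500.0


def profit(price):
    return A * price**2 + B * price + C


# Closed-form optimum of a quadratic with A < 0 (downward-opening parabola).
best_price = -B / (2 * A)
print(best_price, profit(best_price))  # 30.0 1300.0

# Brute-force check over candidate prices 0.0, 0.5, ..., 60.0.
grid = [p / 2 for p in range(0, 121)]
grid_best = max(grid, key=profit)
print(grid_best)  # 30.0
```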

Forecasting - Lectures 24 - 25

  • Cross-sectional vs. Time-Series Data

  • Forecasting Terminology

  • Using ts command to correctly specify a time series

  • Implementing and Interpreting Auto-ARIMA models (auto.arima in R)

    • Determining if data have a seasonal component

    • Reporting requested prediction bounds or the width of the prediction interval (Hi - Lo)

    • Calculating model percent accuracy: (100 - MAPE)%

    • Examining and comparing model residuals (HW 10)
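Percent accuracy and prediction interval width are both simple arithmetic once the model output is available. In R, the forecast package's accuracy() function reports MAPE. MAPE is the average of |actual - fitted| / |actual|, expressed as a percent, and accuracy is 100 - MAPE. The Python sketch below computes both on hypothetical actual and fitted values. It also computes the width of a hypothetical 95% interval as Hi - Lo.

```python
# Hypothetical actual and fitted values (not from a real data set).
actual = [100.0, 200.0, 400.0]
fitted = [110.0, 190.0, 380.0]


def mape(actual, fitted):
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, fitted)]
    return 100 * sum(errors) / len(errors)


m = mape(actual, fitted)  # (10% + 5% + 5%) / 3
accuracy = 100 - m        # model percent accuracy: (100 - MAPE)%
print(round(m, 2), round(accuracy, 2))  # 6.67 93.33

# Width of a hypothetical 95% prediction interval: Hi - Lo.
lo_95, hi_95 = 480.0, 520.0
print(hi_95 - lo_95)  # 40.0
```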

Key Points from Today

  • Evaluations are VERY Important: coursefeedback.syr.edu

  • The rest of today’s lecture will be a Q&A session.

  • Reminder of Recommended Study Strategy:

    • Go through previous quizzes and all three sets of practice questions.

    • Take notes for yourself on skills and terminology you are unsure of.

    • Go back to those skills and terms in HW assignments, lectures, and videos and take notes.

    • Redo questions.

  • Come to the Zoom review with questions.

To submit an Engagement Question or Comment about material from today's lecture, submit it by midnight today (the day of the lecture). Click on the link under today's lecture.