Statistics 6933 Lecture Notes

1 Welcome to the BA 6933 Statistics and Quantitative Methods in-person session! (480 mins)

1.1 BA-6933: In Person Section to Hybrid Course, 8:00 am to 5:30pm Given by: J. Lizardi.

This class is an introduction to basic statistical theory and its business applications intended for graduate students attending one of Trine University’s Online Graduate Programs.

This session will cover various tools & methods used to summarize, analyze, and interpret data. These notes are for educational purposes and may be used, remixed, and shared freely without limitation.

1.1.0.0.1 THELINK-Trine University OER

1.1.0.0.1.1 CC License Information

2 TODAY’S SCHEDULE:

● 8am-12pm: Set up, Q & A, Statistics & Quantitative Methods Review

○ Introductions - Who am I? Who are you?

○ Morning Attendance

○ Review class mapping and schedule.

● Lecture: Review of Statistical Concepts: Descriptive and Inferential Statistics. Univariate, Bivariate, and Multivariate Analysis

■ What is Data?

■ Where does Data come from? What do we do with it?

○ Introduction to Scenario and Case Study Data

■ Statistical Tools Introduction: R, Python, Excel

■ The Limits of Excel

■ What is R? How is it used for Statistics?

■ RStudio

■ Statistical calculators: StatisticsKing, Wolfram

● Case Study: Use Statistical Techniques to Gain Insights from Business Data: Applied Data Analysis, Hypothesis Generation & Decision Making

○ T-test & Z-test, Chi² test, ANOVA, Statistical Test Assumptions, Parametric vs nonparametric tests

● 12pm to 1:30pm: Lunch, Networking/Socializing, Free time

○ Lunch, Questions, 1 on 1 meetings about Grades, Assignments, Quizzes

● 1:30pm to 5:30pm: After Lunch Attendance, Case Study Cont.

○ Develop and use Statistical Models on Business Data to make Predictions.

■ Regression Analysis, Logistic Regression, Time Series Analysis, Neural Networks.

○ Closing Attendance

3 About You

4 About Me

My name is Joshua Lizardi. For the past 8 years, I have worked for various institutions teaching a wide range of courses in Math, Statistics, and Technology. These included Quantitative Reasoning, Calculus, Applied Technical Mathematics, Remedial Mathematics, Statistics, Computers & Office Automation, Introductory College Algebra, Intermediate College Algebra, and Business Statistics.

I hold a bachelor’s in mathematics (Mercy College), a master’s in applied mathematics (Purdue University), and a master’s in data analytics (Western Governors University). I also hold a few industry recognized certifications including “SAS Certified Statistical Business Analyst SAS 9”, “SAS Certified Base Programmer SAS 9”, “Oracle Database SQL Certified Associate”.

“Subjects like mathematics, statistics, and computer science should not be taught as if they were spectator sports; the best way to learn these subjects is to perform them. Although understanding textbooks and lecture notes is valuable, the learning that comes from one’s own attempts at solving problems is the key to becoming competent in the subject overall. I have always been passionate about mathematics, statistics, and computer science, and I enjoy encouraging students to see the utility of these subjects.”

SPECIALTIES

Applied Mathematics, Applied Statistics, Data Analytics, Data Science, Machine Learning, Artificial Intelligence

SKILLS

R, Python, SQL, SAS, MiniTab, Tableau, Power BI, Microsoft Office

https://www.youracclaim.com/users/joshua-lizardi

5 Introduction: What exactly is data?

What is Data? -Page 1 & 2

Where does Data come from? What do we Do with Data? - page 1

6 But what IS DATA?

experiment data survey data - all

A random variable is a variable whose value is determined by chance. The variable assumes a value (sales volume, rate of return, R&D cost, etc.) from a sample selection process that generates the values randomly, i.e., by “random” selection. A random variable is usually designated by a capital letter, X, to differentiate it from its observed values, x. For instance, X might represent the amount of money a game made, or X might represent the type of game made. Once a game has been selected, its sales amount or game type is given by the value x, say x = $688,000 or x = Sports.

Data Format Types- all

What Is a Random Variable? Mixed Random Variables*

7 Domain Expertise

Business Data & Domain knowledge - all

8 The Case Study Domain & Dataset

Case Study- page 1 to page 4 & Videos

How large is the global gaming industry?

This just changed the gaming industry forever...

The Future Of Gaming Is Mobile

An Introduction to R Tools and Techniques - Video Game Cost & Sales Case Study

9 Data Types

What is Data? page 1 to page 2

Excel intro (discuss variable RL context ) show R

What is Data? page 3 to page 5

Excel data types* show R

What is data 6-8

Data Science & Statistics: Levels of measurement

Ordinal Vs. Discrete Variables

10 More on Data

Cleaning data

11 Summarizing & Presenting Data

What is Data?

Excel data (Histograms graphs) data types in context stats king R

What is data 9-10

What is Data?

Excel data -> calculate center and spread for each variable tie back to data types.

What is data? (what is summarizing ? (aggregation, information loss ))

What is Data?

information loss and data aggregation

What is data? 14- 17

What is Data?

excel( uni vs bivariate data, different types of bivariate data, multivariate data )- graphing and visualization…the limits of pictures.

BAD GRAPHS

What is data? 18- 19 finish

What is Data?

Where does data come from? 1-19

12 Statistics!

Where does data come from? 1-19

13 MATH TIME 2

Probability. Probability Distributions: “The values of a random variable are subject to chance”, and some values can be more likely to occur than others. For instance, the rating of a randomly selected game is more likely to be 5 than 10. It is the random variable’s probability distribution that determines the likelihood of possible values.

Where does data come from?

We use (at least) four distributions designed to model data: the Normal, t, F, and Chi-Square. The normal distribution is the mother of the other three, and this, among many other facts, makes it by far the most important distribution in statistical inference. Much of the “what we do and why we do it” is based upon an understanding of the properties of the normal distribution, and of the theorems involving it, particularly the Central Limit Theorem.

CLT

Populations, Samples, Parameters, and Statistics

Sampling

The Central Limit Theorem, Clearly Explained!!!

Z-scores to T-scores: “To convert any raw score into a T score, first transform the score into a Z score and then use the following basic formula: T = 10 z + 50”
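The conversion quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `t_scores` and the ratings are made up for the example, and the z-scores are computed with the population standard deviation of the listed scores, which is an assumption about how the scores are standardized.

```python
from statistics import mean, pstdev

def t_scores(raw_scores):
    """Convert raw scores to T scores using T = 10 * z + 50."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation (an assumption)
    return [10 * (x - mu) / sigma + 50 for x in raw_scores]

# Hypothetical game ratings, chosen so the mean is 5 and the SD is 2.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
scores = t_scores(ratings)
print([round(t, 1) for t in scores])
# prints [35.0, 45.0, 45.0, 45.0, 50.0, 50.0, 60.0, 70.0]
```

A rating equal to the mean always maps to 50, and each standard deviation above or below the mean moves the T score by 10.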

T test to F-test - “in fact, if you have only two groups/factor levels, the F-test statistic is the square of the t-test statistic, and the F-test is equivalent to the two-sided t-test.”
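The quoted relationship can be checked numerically. The sketch below assumes the equal-variance (pooled) two-sample t-test and a one-way ANOVA F statistic; the helper names and the two groups of numbers are hypothetical and are not taken from the case study dataset.

```python
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical ratings for two platforms.
console = [7.1, 6.4, 8.0, 7.5, 6.9]
mobile = [5.9, 6.2, 5.1, 6.8, 5.5, 6.0]

t = pooled_t(console, mobile)
f = anova_f([console, mobile])
print(f"t^2 = {t ** 2:.6f}, F = {f:.6f}")  # the two values agree
```

The equality holds only for the pooled version of the t-test; Welch's unequal-variance t-test does not square to the ANOVA F in general.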

Always keep in mind that, for a continuous random variable X, probabilities are areas under the distribution curve: P(X < x) is the area under the curve from -inf to x, P(X > x) is the area under the curve from x to inf, and P(a < X < b) is the area under the curve between a and b.
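As a small illustration of these three areas, the sketch below uses `statistics.NormalDist` from the Python standard library. The normal distribution with mean 100 and standard deviation 15 is a made-up example, not a value from the case study.

```python
from statistics import NormalDist

# Hypothetical random variable: X ~ Normal(mean=100, sd=15).
X = NormalDist(mu=100, sigma=15)

p_below = X.cdf(85)                 # P(X < 85): area from -inf to 85
p_above = 1 - X.cdf(130)            # P(X > 130): area from 130 to +inf
p_between = X.cdf(115) - X.cdf(85)  # P(85 < X < 115): area between 85 and 115

print(f"P(X < 85)       = {p_below:.4f}")
print(f"P(X > 130)      = {p_above:.4f}")
print(f"P(85 < X < 115) = {p_between:.4f}")
```

Here `cdf(x)` returns the area to the left of x, so right-tail and between-two-values probabilities are obtained by subtraction.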

14 More Statistics!

So how do we find probabilities? MAtH MAJoRs!…

https://tutorial.math.lamar.edu/classes/calcii/probability.aspx

https://www2.econ.iastate.edu/classes/econ671/hallam/documents/RV_Prob_Distributions.pdf

BUT HOW AND WHY? Things that scare math majors…

https://www.math.uni-bielefeld.de/~grigor/mwlect.pdf

Statistical Inference: How?

We need a measure of spread and center. (one to test but both can be estimates)
Then we calculate a critical value.
We use that information to create intervals.
The width of the interval provides a sense of the accuracy for the estimate in question.
We can then use this information as evidence to reject or fail to reject a null hypothesis.

Let’s define some more things.

Point Estimate. A single number used to estimate a parameter. For example, the sample mean is typically used to estimate the population mean.

Interval Estimate. A range of values used as an estimate of a population parameter. The width of the interval provides a sense of the accuracy of a point estimate.

Intervals have a characteristic format:

Interval = estimate ± (critical value) × (stdev / sqrt(count of data))
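The interval format above can be sketched in Python as follows. This sketch assumes a normal (z) critical value, because the standard library has no t quantile function; for small samples a t critical value is the customary choice. The function name `mean_interval` and the profit figures are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_interval(data, confidence=0.95):
    """Return (low, high) for estimate +/- critical value * stdev / sqrt(n)."""
    n = len(data)
    estimate = mean(data)                            # point estimate of the mean
    tail = (1 - confidence) / 2                      # probability left in each tail
    critical_value = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95%
    margin = critical_value * stdev(data) / n ** 0.5
    return estimate - margin, estimate + margin

# Hypothetical profits (in $1,000s) for 12 games; not from the case study dataset.
profits = [520, 610, 580, 640, 590, 700, 560, 630, 615, 575, 660, 605]

low_95, high_95 = mean_interval(profits, 0.95)
low_99, high_99 = mean_interval(profits, 0.99)
print(f"95% interval: ({low_95:.1f}, {high_95:.1f})")
print(f"99% interval: ({low_99:.1f}, {high_99:.1f})")
```

Asking for more confidence gives a larger critical value and therefore a wider interval around the same point estimate.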

But how do we create statistical questions?

For example, we may have initially grouped games according to their Platform and IGN rating and made graphs and tables. There could be many questions asked (hypotheses made) about this grouping. For example: are the differences we see among Game platform types significant? Does a significant relationship between Platform type and IGN rating exist? Can we use a variable or combination of variables to predict units sold?

How did I come up with these questions?

15 Hypotheses Generation

Hypotheses generation/question finding from Data.

An Introduction to R Tools and Techniques - Descriptive Analysis

16 Statistical Inference: Decision Making: Hypothesis Testing and Intervals

An Introduction to R Tools and Techniques - Inferential Analysis

Assumption checking & Parametric VS non-Parametric jam

Critical Values- cv’s - video

The critical value at significance level alpha, TEST_alpha, is defined by P(test statistic > TEST_alpha) = alpha, where the probability is computed assuming H0 is true.

The goal is to find the critical value associated with the significance level. How do we do this? Tables or calculus.
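Besides tables or calculus, a critical value can be looked up with an inverse CDF. The sketch below assumes a standard normal test statistic and uses `NormalDist.inv_cdf` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Upper-tailed test: P(Z > critical value) = alpha.
upper_tail = z.inv_cdf(1 - alpha)

# Two-sided test: alpha is split evenly between the two tails.
two_sided = z.inv_cdf(1 - alpha / 2)

print(f"upper-tail critical value: {upper_tail:.3f}")  # about 1.645
print(f"two-sided critical value:  {two_sided:.3f}")   # about 1.960

# Check the defining property: the area beyond the critical value is alpha.
print(f"P(Z > upper_tail) = {1 - z.cdf(upper_tail):.3f}")
```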

The P-value Approach and Hypothesis Testing

The p-value can be thought of as the smallest significance level at which you would reject H0.

The p-value is calculated from the test statistic. For a two-sided test with a symmetric test distribution (such as Z or t), it is double the one-tailed probability.

Note: alpha and the p-value are like the “before” and “after” significance levels for the test. We can reach a decision to reject or fail to reject H0 by comparing the two significance levels.

Rule: If the p-value > alpha, then we “fail to reject” H0. If the p-value < alpha, then we reject H0, i.e., we reject H0 for small p-values.
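The p-value rule can be sketched for a one-sample z-test as follows. The sketch assumes the population standard deviation is known and the sample mean is approximately normal; the function names and all numbers are hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for a one-sample z-test of H0: mean = mu0."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    cdf = NormalDist().cdf
    if alternative == "less":
        return z, cdf(z)             # area to the left of z
    if alternative == "greater":
        return z, 1 - cdf(z)         # area to the right of z
    return z, 2 * (1 - cdf(abs(z)))  # one tail doubled for a two-sided test

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical example: n = 36, sample mean 52, H0 mean 50, sigma 6.
z_stat, p = z_test_p_value(52, 50, 6, 36)
print(f"z = {z_stat:.2f}, p-value = {p:.4f}, decision: {decide(p)}")
```

With these numbers z = 2, the two-sided p-value is about 0.0455, and H0 is rejected at alpha = 0.05 but would not be rejected at alpha = 0.01.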

Always avoid saying things like “accepting” the null hypothesis, for the same reason juries never say the defendant is “innocent”; they say the defendant is “not guilty”.

Although hypothesis testing uses probability distributions to arrive at a reasonable (and defensible) decision either to reject or “fail to reject” the claim associated with the null hypothesis of the test, H0, it does not guarantee that the decision is correct!

Decisions & Errors

Type I error: The error of incorrectly rejecting H0 when, in fact, it’s true. In a hypothesis test conducted at significance level alpha, the probability of making a Type I error, if H0 is true, is at most alpha.

Type II error: The error of incorrectly failing to reject H0 when, in fact, it’s false.

Think cancer screening. Suppose the null hypothesis is “the person has cancer.” A Type I error is the rejection of a true null hypothesis: the person has cancer and we miss it. This is worse than a Type II error, where the person doesn’t have cancer and we say they do. So we control for Type I errors.

Keep in mind: For a fixed sample size n, you cannot reduce the probability of making a Type I error and the probability of making a Type II error at the same time.

This is the statistician’s version of “there is no such thing as a free lunch.”

However, if you can afford to take a larger sample (More Data), it is possible to reduce both.
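The trade-off can be made concrete with a small calculation. The sketch below assumes an upper-tailed one-sample z-test with known sigma and a specific true mean under the alternative; the function name `type_ii_error` and all numbers are hypothetical.

```python
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """P(fail to reject H0: mean = mu0) when the true mean is mu1 > mu0."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)            # critical value of the test
    shift = (mu1 - mu0) / (sigma / n ** 0.5)  # true effect in standard-error units
    return z.cdf(z_alpha - shift)

# Fixed sample size: lowering alpha raises the Type II error probability.
beta_n25_a05 = type_ii_error(50, 52, 6, n=25, alpha=0.05)
beta_n25_a01 = type_ii_error(50, 52, 6, n=25, alpha=0.01)

# Larger sample: both error probabilities can be lower at once.
beta_n100_a01 = type_ii_error(50, 52, 6, n=100, alpha=0.01)

print(f"n=25,  alpha=0.05: beta = {beta_n25_a05:.3f}")
print(f"n=25,  alpha=0.01: beta = {beta_n25_a01:.3f}")
print(f"n=100, alpha=0.01: beta = {beta_n100_a01:.3f}")
```

In this example, moving from n = 25 to n = 100 allows a stricter alpha (0.01 instead of 0.05) and still produces a smaller Type II error probability.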

16.1 Assumption Checking for Statistical Testing

Assumption checking allows you to determine if conclusions drawn from the results of your analysis are valid. Assumptions are the requirements you must fulfill (Moran, 2017). To stick to our car example: just as you should not drive a car until you can demonstrate working knowledge of the rules of the road, you should not conduct statistical analyses without demonstrating that your data “follows the rules” and can receive a permit for testing, if you will.

….

This leaves us with the well-known framework for Inferential Analysis, a.k.a. the tools concerned with making decisions and predictions and calculating estimates and intervals based on information contained in a set of data (Scott, 2009).

17 Hypothesis Tests: The basics

Below is the basic formal structure of hypothesis testing:

An observation has been made/ A question has been asked about the data.
The null (H₀ ) and alternative (H₁ or H_A) hypotheses are specified.
With given data, a value of a statistic is calculated.
Under a set of general assumptions about the data, as well as assuming the null hypothesis is true, the distribution of the test statistic is known.
Given the distribution and value of the test statistic, as well as the form of the alternative hypothesis, we can calculate a p-value of the test.
Based on the p-value and the prespecified level of significance, we make a decision: either fail to reject the null hypothesis, or reject the null hypothesis.

Inferential Analysis (statistical testing) is done for subgroups of variables from our dataset as a way to answer any questions, or confirm or deny any hypotheses, about these subgroups of data that may have arisen during the Exploratory phase.

All statistical tests include a set of assumptions, a null hypothesis, an alternate hypothesis, a p-value, and a significance level. This allows us to create statistically valid estimates & intervals (given that the set of general assumptions is true).

Keep in mind the point: the smaller the p-value, the stronger the evidence against the null hypothesis and in support of the alternate hypothesis. The p-value is not the probability that either hypothesis is true; it measures how surprising the data would be if the null hypothesis were true. Reject the null hypothesis for small p-values.

17.1 Summary of Inferential Analysis Steps

Step 1. Create a Question about the data, most likely based on the Exploratory phase.

Step 2. Take the question and make it a statement of a relationship among variables not existing, or a difference between variables not existing, and you have a null hypothesis.

Step 3. Choose an appropriate statistical test and verify any required test assumptions.

Step 4. Pick a significance level, which is 0.05, usually. (why?)

Step 5. Reject the null hypothesis if the calculated p-value is less than the significance level. (why).

17.1.1 So we can answer statistical questions!

18 Statistics: Ask and Answer.

CASE STUDY 4-9 3 VIDEOS ABOUT Excel vs R

18.1 Data Activities 1-

A finance manager claims that the average profit of most games we create is $6000, with a standard deviation of $1000. Find the probability that a random sample of 36 games averages less than $5700 in profit.
Suppose a senior analyst from the company claimed that the average profit from most games made by the company is at least $600,000, and any game that makes less is a fluke, an outlier. Suppose that you suspect the claim may be exaggerated. Using our sample of 50 games, find the average profit. Test the analyst’s claim against your suspicion at the 5% level of significance.
“In Game Purchases” (IGP) are a growing revenue stream for many video-game companies. One survey showed that up to 20% of players take part in IGP. Based on this information, what is the probability that, in a random sample of 10 gamers, 6 will take part in IGP? Do we need our dataset to answer this question?

Binomial dis
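The first and third activities above can be checked numerically without the dataset. The sketch below rests on labeled assumptions: for the first activity, the manager's claimed mean and standard deviation are treated as the population values and the sample mean is treated as approximately normal (Central Limit Theorem); for the third, the participation rate is taken to be exactly 20% and "6 will take part" is read as exactly 6 out of 10.

```python
from math import comb
from statistics import NormalDist

# Activity 1: P(sample mean of 36 games < $5,700), claimed mean $6,000, SD $1,000.
standard_error = 1000 / 36 ** 0.5      # SD of the sample mean
z = (5700 - 6000) / standard_error     # about -1.8
p_mean_below = NormalDist().cdf(z)
print(f"z = {z:.2f}, P(mean < 5700) = {p_mean_below:.4f}")

# Activity 3: P(exactly 6 of 10 gamers take part), assuming p = 0.20.
n, k, p = 10, 6, 0.20
p_six = comb(n, k) * p ** k * (1 - p) ** (n - k)
print(f"P(X = 6) = {p_six:.6f}")
```

Under these assumptions the first probability is about 0.036 and the second about 0.0055. Neither calculation uses the case study dataset, only the stated claim and survey rate.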
The CEO of the business claims that, because the average profit from the past 3 Nintendo race car games was less than $200,000, the next Nintendo race car game we make will have an average profit of less than $200,000. Given this information, find the probability that the profit from a random sample of 3 of these games averages less than $200,000. What does this suggest about the claim made by the CEO?
The marketing manager wants to estimate the minimum cost of making a high rated MMO game.

ANOVA
Suppose the Operations Manager claimed that Research & Development, Marketing, and Administration have roughly the same budget on every project. How can we check this claim?
On average, the company spends the same amount on Marketing as it does on Research & Development per game.
Suppose a Manager claimed that Marketing and Administration have roughly the same average budget on every project. How can we check this claim?
How could we check the claim that the average sales of MOBA games before 2012 were significantly higher than sales of MOBA games after 2012?
The CFO of the business claims that IGP is a main driver of sales among all games. How could we check this claim?
How could we check the claim that average sales of MOBA games before 2012 were significantly lower than sales of RTS games after 2012?
Dataset 2 contains the play time in minutes from a high-earning MOBA game. A new analyst is convinced time of day affects playtime regardless of country. Test whether country and time of day have an effect on playtime. Is the analyst’s claim justified? Explain.
Suppose the Operations Manager claimed that the median number of minutes played in the US can’t be more than 750 minutes. A sales manager doubts the accuracy of this claim. Can you reject the claim given the data?

19 Regression analysis.
The Owner wants to understand the relationship between sales and the amount spent to develop, market, and distribute a game. She suspects the sales of new games can be predicted from the amount of time money and effort spent on the game, regardless of game type or console.

Regression

Over-fitting METRICS MODEL LIMITS

● Correlation/Association

● Model fitting

● Model validation

● Over fitting

● Model refinement

● Feature Selection

● Interpreting results

PowerPoints and videos: the calculus derivation of the regression coefficients, and the linear algebra derivation of the same.

A scatter plot can be used to show the relationship between two variables. Correlation analysis is used to measure the strength of the association (linear relationship) between two variables. Correlation is only concerned with the strength of the relationship.
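A minimal sketch of how the strength of a linear relationship can be measured is Pearson's correlation coefficient, computed here from its definition. The function name `pearson_r` and the spend and sales figures are hypothetical and are not from the case study dataset.

```python
def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical marketing spend and units sold for six games.
marketing = [10, 15, 20, 25, 30, 35]
units_sold = [110, 135, 170, 160, 210, 230]

r = pearson_r(marketing, units_sold)
print(f"r = {r:.3f}")

# Correlation is symmetric: it does not matter which variable is called x.
print(f"r (swapped) = {pearson_r(units_sold, marketing):.3f}")
```

The coefficient lies between -1 and 1. Because it is symmetric in the two variables, it describes strength of association only and says nothing about which variable predicts the other.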

Dependent variable: the variable we predict. Independent variables: the variables used to predict the dependent variable. Types of relationships: strong/weak, linear/curvilinear, no relationship.

Regression analysis is used to: predict the value of a dependent variable based on the value of at least one independent variable, and explain the impact of changes to independent variables on the dependent variable. Simple Linear Regression Model: only one independent variable, X; the relationship between X and Y is described by a linear function; changes in Y are assumed to be related to changes in X.
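A simple linear regression can be sketched with the usual least-squares formulas for the slope and intercept. The function names and the cost and sales numbers below are hypothetical; in the course itself this kind of fit would more likely be done with R or Excel.

```python
def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical development cost (X) and sales (Y), both in $ millions.
cost = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(cost, sales)
print(f"sales = {b0:.2f} + {b1:.2f} * cost")
print(f"R^2 = {r_squared(cost, sales, b0, b1):.4f}")
print(f"predicted sales at cost 6: {b0 + b1 * 6:.2f}")
```

The slope estimates how much Y changes for a one-unit change in X. Predicting at a cost of 6 extrapolates beyond the observed data, so that value should be read with caution.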

20 Logistic Regression intro:

A freshly hired product analyst alleges that “in game purchases” are favored by the rating agency IGN Entertainment Inc. His claim is that this would bias any metrics based on these ratings that we use to measure the performance of our games: “…essentially this would/could have us making games for high ratings and not for high sales or profit (or for the customers), for example putting IGP in every game because it will raise our IGN rating.” Is there a significant association between IGP and IGN Rating?

21 Neural Networks intro:

Neural Networks are mathematical models, inspired by the structure of the human brain, used to detect patterns in data sets. These models can detect subtle and complex relationships in data. Neural networks can be used to make predictions on dependent variables of many types, including numerical, categorical, and time series.

The structure of a neural-network model has three kinds of layers. The input layer is where each variable starts; hence the size of your input layer is the number of input variables in your dataset. The output layer is where the results are produced. The hidden layers are in the middle. One (very) simple way for a new analyst to start thinking about a Neural Net Model is as a “net” of logit models.

Though, if we want, we can use other activation functions. And we can even mix and match… it gets complicated…

The important conceptual point to keep in mind is that we input variables and the model outputs predictions. We can check the predictions using the techniques and metrics we have utilized for predictive analysis so far.
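The “net of logit models” idea can be sketched as a forward pass through one hidden layer, where every unit is a logistic (logit) model of the layer before it. The weights below are made up purely for illustration; a real network would learn them from data, and the function names are hypothetical.

```python
from math import exp

def logistic(z):
    """Logistic (sigmoid) activation: maps any number into (0, 1)."""
    return 1 / (1 + exp(-z))

def layer(inputs, weights, biases):
    """Each unit is a small logit model: logistic(weighted sum + bias)."""
    return [
        logistic(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def predict(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Two input variables, three hidden units, one output (all weights hypothetical).
hidden_w = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]
hidden_b = [0.0, -0.1, 0.2]
output_w = [[1.2, -0.6, 0.4]]
output_b = [0.05]

prediction = predict([0.9, 0.4], hidden_w, hidden_b, output_w, output_b)
print(f"predicted probability: {prediction[0]:.3f}")
```

Swapping `logistic` for another activation function changes only the `layer` function, which is one way to see how architectures can “mix and match.”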

nn-slides

So what is next?

22 What Is a Transformer Model?

23 Tools for your careers.

https://www.kaggle.com

https://math.stackexchange.com/questions/358799/random-variables-that-arent-measurable

https://math.stackexchange.com/?tab=month

24 References:

3 Biggest Excel Problems That Limits A Data Scientist—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=Fhmug5fZPzc

3-Minute Data Science (Director). (2022, September 18). Logistic Regression in 3 Minutes. https://www.youtube.com/watch?v=EKm0spFxFG4

365 Data Science (Director). (2017, August 11). Data Science & Statistics: Levels of measurement. https://www.youtube.com/watch?v=eghn__C7JLQ

Amber Professional Development (Director). (2020, December 16). Deloitte Problem Solving: Generating Hypotheses. https://www.youtube.com/watch?v=AMpcCLk6yKs

ANOVA - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/19n_rGCYefY0-jnpzd2tzcyseRYvD7iHeoFAUaTtY1qQ/viewer?f=0

Assumption checking & Parametric VS non-Parametric—Google Jamboard. (n.d.-a). Retrieved April 26, 2023, from https://jamboard.google.com/d/1ZKu6LbwmLYM4BnJHsYUOV0_nRS_xMa5ISJmsvtskROQ/viewer?f=0

Assumption checking & Parametric VS non-Parametric—Google Jamboard. (n.d.-b). Retrieved April 26, 2023, from https://jamboard.google.com/d/1ZKu6LbwmLYM4BnJHsYUOV0_nRS_xMa5ISJmsvtskROQ/viewer?f=0

BAD GRAPHS - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/13GIvi72585lXMFiNUQ8A6ZRTB_kRJJLtZcp-F6UHRuA/viewer?f=0

Binomial Dis—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1RnUUXzlY_5vPPOsSdIc64j1BdPu1-LRKmfzbsZdYoGw/viewer?f=0

Burns, P. J. (2003). Robustness of the Ljung-Box Test and its Rank Equivalent. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.443560

Business Data & Domain knowledge—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1-OkfYNsSbKTWzkQqQJaG8ddhJ694vekjHtwqf8zqbqw/viewer?f=0

Calculus II - Probability. (n.d.). Retrieved April 26, 2023, from https://tutorial.math.lamar.edu/classes/calcii/probability.aspx

Case Study—Google Jamboard. (n.d.-a). Retrieved April 26, 2023, from https://jamboard.google.com/d/1vp9svDxgrLBMNCPRozl1HpzZE1l3IlQRnmghN5_8aJw/viewer?f=0

Case Study—Google Jamboard. (n.d.-b). Retrieved April 26, 2023, from https://jamboard.google.com/d/1vp9svDxgrLBMNCPRozl1HpzZE1l3IlQRnmghN5_8aJw/viewer?f=0

CGTN Europe (Director). (2020, October 4). How large is the global gaming industry? https://www.youtube.com/watch?v=ihi9seV-4uU

CLEANING DATA - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1g1GwOJuLjAYv1CNDBQbzRcnil8jAV2EbXJMz7OAV_9M/viewer?f=8

CLT - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1aoVtBo9B1kAc6M5G0EaMEc2Byd-Xxh2xziH0bA7cNoo/viewer?f=0

Data Format Types—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1GQK2dT2zcnYyaiUvO3Jw_qA7_zpEPQ4XYOTdtqlBiGw/viewer?f=0

Deriving the least squares estimators of the slope and intercept (simple linear regression)—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=ewnc1cXJmGA

experiment data survey data—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1_7Sl8uHjWU8s3IpCzKCJ9HnNw15RU2LWi1I8BY8zqmM/viewer?f=0

Eye on Tech (Director). (2019, December 17). What is Data? Data Types, Storage and Management. https://www.youtube.com/watch?v=Qnk2FP3_r-I

Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond. Journal of Statistics and Data Science Education, 29(3), 251–266. https://doi.org/10.1080/26939169.2021.1999871

GitHub (Director). (2022, November 9). What is GitHub? https://www.youtube.com/watch?v=pBy1zgt0XPc

Great Learning (Director). (2017, July 6). Domain Knowledge is the Key to Success in Big Data Analytics | Amit Kapoor | Great Learning. https://www.youtube.com/watch?v=y1bDZXVqZy8

Grigoryan, A. (n.d.). Measure theory and probability.

Hassani, H., & Yeganegi, M. R. (2020). Selecting optimal lag order in Ljung–Box test. Physica A: Statistical Mechanics and Its Applications, 541, 123700. https://doi.org/10.1016/j.physa.2019.123700

Hyndman, R. J., & Khandakar, Y. (2008). Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software, 27(3). https://doi.org/10.18637/jss.v027.i03

information loss and data aggregation—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1vZN2aAjesB3MaIHQZLylxIlG5w0opQe9LdE8S1gkqtQ/viewer?f=0

jackfrags (Director). (2023, March 25). This just changed the gaming industry forever... https://www.youtube.com/watch?v=BEOunRvOUjQ

McHugh, M. L. (2013). The Chi-square test of independence. Biochemia Medica, 143–149. https://doi.org/10.11613/BM.2013.018

metrics—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1-JH4cLkSWdSoovKL2LxFksgIC4KooNV1mPCQc5IaleE/viewer?f=0

Mixed Random Variables | Examples. (n.d.). Retrieved April 26, 2023, from https://www.probabilitycourse.com/chapter4/4_3_1_mixed.php

model limits—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1GIMKNa-r0QSl9NzWzplgK6r_XID5lsFdWV6mgdNqO0E/viewer?f=0

Ordinal Vs. Discrete Variables—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1dqKaN8m_qohWBKudd5DVjpxmSKXTyUSdHs1CFEUy6r4/viewer?f=0

Overfitting and train test splits and CV - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1qEMNi19P6iY5TeS2Wm3mC6JxqopXIAo7HU0XGNLm0x8/viewer?f=0

Perform Linear Regression Using Matrices—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=Qa_FI92_qo8

Populations, Samples, Parameters, and Statistics—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/embed/MYjgfoNAKkk?feature=oembed

probability—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1dIzPQ_Eyy9mbskczMTsAz_elHRBa8J7zo-Kp_xUbtL8/viewer?f=0

Prof. Essa (Director). (2021, June 4). What is a critical value? https://www.youtube.com/watch?v=EhZUYFxjVPw

R programming for beginners – statistic with R (t-test and linear regression) and dplyr and ggplot—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=ANMuuq502rE

Random Variable: Definition, Types, How Its Used, and Example. (n.d.). Investopedia. Retrieved April 26, 2023, from https://www.investopedia.com/terms/r/random-variable.asp

Review of Basic Statistical Concepts - Introductory statistics dealt with three main areas: - Studocu. (n.d.). Retrieved April 26, 2023, from https://www.studocu.com/ph/document/university-of-baguio/statistics/review-of-basic-statistical-concepts/29620752

Regression—Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1RMK5D3XDR3cseI_QM9ay5siQr1TAqlyOj6iihqfzXgs/viewer?f=0

RPubs—Data Analysis: An Introduction to R Tools and Techniques. Compiled By: Joshua Lizardi. (n.d.). Retrieved April 26, 2023, from https://rpubs.com/ibang/An_Introduction_to_R_Tools_and_Techniques

SciToons (Director). (2019, January 31). Data Visualization and Misrepresentation. https://www.youtube.com/watch?v=x-rDVXVwW9s

Scott, D. M. (2009). Statistics, Inferential. In International Encyclopedia of Human Geography (pp. 429–435). Elsevier. https://doi.org/10.1016/B978-008044910-4.00535-6

The Central Limit Theorem, Clearly Explained!!! - YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/embed/YAlJCEDH2uY?feature=oembed

The Future Of Gaming Is Mobile—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=w8-af0msz1E

Types of Sampling Methods (4.1)—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/embed/pTuj57uXWlk?feature=oembed

What is Data? - Definition from WhatIs.com. (n.d.). Data Management. Retrieved April 26, 2023, from https://www.techtarget.com/searchdatamanagement/definition/data

What is Data? - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1cBX555LiytMSDFeuznpZApVvUC4JKZaf1Hk9FT_JwBM/viewer?f=0

What is Git? Explained in 2 Minutes! - YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=2ReR1YJrNOM

Where does Data come from? What do we Do with Data? - Google Jamboard. (n.d.). Retrieved April 26, 2023, from https://jamboard.google.com/d/1X_dN5x1Sv1xd_QUGzgaMvGhCB2eoNsStTugFBIPnxAs/viewer?f=0

Why Use R? - R Tidyverse Reporting and Analytics for Excel Users—YouTube. (n.d.). Retrieved April 26, 2023, from https://www.youtube.com/watch?v=jn_3N_o2d6Q
