Data Analytics

Data analytics is the process of turning data into meaningful insights. This definition is broad, but so is the world of analytics: it covers many techniques, including basic data summaries, data visualizations, regression techniques, and others. Data science, on the other hand, focuses on discovering new questions that we may not have known needed answering in order to drive innovation. We can think of data analytics as a subcategory of data science. In this course, while we often refer to ourselves as data analysts, we strive to go far beyond what is done by an analyst and into the data science space. It should be noted that most companies use these terms interchangeably (though they may not realize that they are doing so). You can be confident that this program will give you the skills necessary to excel at both jobs.

Types of Analytics

Analytics normally falls into three or four broad categories: descriptive, diagnostic, predictive, and prescriptive (diagnostic analytics is sometimes omitted, as is the case in your textbook). Each type of analytics has a different level of complexity; however, as the complexity rises, so too does the value to the business or customer.

Descriptive analytics focuses on what happened in the past. Data queries, reports, and descriptive statistics are all part of descriptive analytics. Data dashboards are also common in descriptive analytics because they illustrate several metrics visually at the same time.
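As a small illustration of descriptive statistics, the sketch below summarizes a week of sales using Python's standard `statistics` module. The `daily_sales` figures are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical daily sales figures (units sold); not taken from any real data set.
daily_sales = [12, 15, 11, 20, 15, 18, 14]

# Each entry answers a "what happened?" question about the past week.
summary = {
    "count": len(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "mode": statistics.mode(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A dashboard typically presents several summaries like these side by side, often as charts rather than raw numbers.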

Diagnostic analytics focuses on why something happened. Data mining is a broad term that describes the use of analytical techniques to understand patterns and relationships within data sets. Using data mining techniques, we can begin to build insight as to why there is a relationship within the data.

Predictive analytics consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another. This type of analytics goes beyond understanding the relationship between data and tries to understand if the relationship will continue moving forward. Techniques like linear regression, simulation and logistic regression all fall under this umbrella. Often it is data mining that unearths these relationships and so data mining plays a role in this type of analytics, too.
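To make the idea of "a model constructed from past data" concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The function name `fit_line` and the advertising data are hypothetical, and the data are deliberately exactly linear so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past data: advertising spend (in $1,000s) and units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]  # exactly linear, for illustration only

slope, intercept = fit_line(ad_spend, units_sold)

# Use the fitted model to predict an observation we have not seen yet.
forecast = intercept + slope * 6
print(slope, intercept, forecast)  # 2.0 10.0 22.0
```

The slope estimates the impact of one variable on another (here, two extra units sold per additional $1,000 of spend), and the forecast extends the relationship to a future value. Real data are noisy, so a fitted line would not pass through every point as it does here.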

Prescriptive analytics indicates a course of action moving forward for the company or client. Thus, the output of a prescriptive model is a decision. Forecasts and predictions come from predictive analytics. When those forecasts and predictions are combined with a set of rules, the analysis becomes prescriptive. Hence, prescriptive analytics is sometimes referred to as rule-based modeling.
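The step from a forecast to a decision can be sketched as follows. The inventory setting, the function name `reorder_decision`, and the rule itself are all assumptions made for illustration; real prescriptive models usually involve more elaborate rules or optimization.

```python
def reorder_decision(forecast_demand, on_hand, safety_stock):
    """Hypothetical rule: order enough to cover forecast demand plus a safety stock.

    The forecast would come from a predictive model; the rule turns it
    into a decision (how many units to order).
    """
    shortfall = forecast_demand + safety_stock - on_hand
    return max(0, shortfall)  # never order a negative quantity

print(reorder_decision(forecast_demand=120, on_hand=90, safety_stock=20))   # 50
print(reorder_decision(forecast_demand=120, on_hand=200, safety_stock=20))  # 0
```

The forecast alone (120 units) is predictive; the output of the rule (order 50 units, or order nothing) is prescriptive.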

What About Big Data?

According to Wikipedia, “Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software”. Data sets with a large number of columns are often useful in statistical techniques, but they also tend to produce more discoveries that ultimately prove incorrect (false discoveries).
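One way to see why more columns lead to more incorrect discoveries: if each column is tested separately at a 5% significance level, the chance of at least one false positive grows quickly with the number of tests. The sketch below assumes the tests are independent and that none of the columns has a real effect, which is a simplification; the function name `prob_false_discovery` is hypothetical.

```python
def prob_false_discovery(num_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    so each test avoids a false positive with probability (1 - alpha).
    """
    return 1 - (1 - alpha) ** num_tests

for m in (1, 10, 50, 100):
    print(m, round(prob_false_discovery(m), 3))
```

With a single test the risk is the familiar 5%, but with 100 unrelated columns a spurious "discovery" is almost guaranteed.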

Normally, we discuss big data in terms of V’s. While there is some disagreement as to the number of V’s that should be used, the standard is somewhere between three and five:

Other data V’s not discussed in your textbook include value and variability. As a result of these challenges, technologies like Hadoop were developed to help deal with big data. You can read more about Hadoop at its website.

How Big Is Big Data?

Because data are collected electronically, we are able to collect more of it. To be useful, these data must be stored, and this storage has led to vast quantities of data. Many companies now store in excess of 100 terabytes of data (a terabyte is 1,024 gigabytes). Thus, the volume of data has made traditional processing techniques difficult.

In the chart below, it is helpful to recall that \(2^{10} = 1,024\).

| Name | Equal to | Size in bytes |
|---|---|---|
| Bit | 1 bit | 1/8 |
| Nibble | 4 bits | 1/2 (rare) |
| Byte | 8 bits | 1 |
| Kilobyte | 1,024 bytes | 1,024 |
| Megabyte | 1,024 kilobytes | 1,048,576 |
| Gigabyte | 1,024 megabytes | 1,073,741,824 |
| Terabyte | 1,024 gigabytes | 1,099,511,627,776 |
| Petabyte | 1,024 terabytes | 1,125,899,906,842,624 |
| Exabyte | 1,024 petabytes | 1,152,921,504,606,846,976 |
| Zettabyte | 1,024 exabytes | 1,180,591,620,717,411,303,424 |
| Yottabyte | 1,024 zettabytes | 1,208,925,819,614,629,174,706,176 |

Real-time capture and analysis of data present unique challenges both in how data are stored, and the speed with which those data can be analyzed. The New York Stock Exchange, for example, collects 1 terabyte of data in a single trading session, and having current data and real-time rules for trades and predictive modeling are important for managing stock portfolios. This increase in velocity both in terms of intake and in terms of processing has made previous processing techniques obsolete.

To determine how many kilobytes are in 1 terabyte, we observe: \[1\ \text{TB} \times \dfrac{2^{10}\ \text{GB}}{1\ \text{TB}} \times \dfrac{2^{10}\ \text{MB}}{1\ \text{GB}} \times \dfrac{2^{10}\ \text{KB}}{1\ \text{MB}} = 2^{30}\ \text{KB}.\]
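The same unit-cancelling argument can be written as a short conversion function. This is a sketch that assumes the binary convention used in these notes (each unit is 1,024 of the next smaller one); the names `UNITS` and `convert` are hypothetical.

```python
# Units in increasing order; each is 2**10 = 1,024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def convert(amount, from_unit, to_unit):
    """Convert between binary storage units by counting steps of 1,024."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return amount * 1024 ** steps

print(convert(1, "TB", "KB"))    # 1073741824, which is 2**30
print(convert(100, "TB", "B"))   # 109951162777600 bytes in 100 terabytes
```

Going from terabytes to kilobytes crosses three unit boundaries, so the factor is 1,024 cubed, or 2 to the power 30.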

Applications of Analytics

While deciding what flavor of data science you might pursue, it makes sense to keep a number of options in mind:

  1. Financial Analytics

Predictive models are used to forecast financial performance, to assess the risk of investment portfolios and projects, and to construct financial instruments such as derivatives.

  2. Human Resource (HR) Analytics

A relatively new area of application for analytics is the management of an organization’s human resources (people analytics). The HR function is charged with ensuring that the organization has the mix of skill sets necessary to meet its needs, is hiring the highest-quality talent and providing an environment that retains it, and achieves its organizational diversity goals.

  3. Marketing Analytics

A better understanding of consumer behavior through the use of scanner data and data generated from social media has led to an increased interest in marketing analytics, and analytics is now heavily used in marketing. This understanding leads to better use of advertising budgets, more effective pricing strategies, improved forecasting of demand, improved product-line management, and increased customer satisfaction and loyalty.

  4. Health Care Analytics

The use of analytics in health care is on the increase because of pressure to simultaneously control costs and provide more effective treatment. A study by McKinsey Global Institute (MGI) and McKinsey & Company estimates that the health care system in the United States could save more than $300 billion per year by better utilizing analytics; these savings are approximately the equivalent of the entire gross domestic product of countries such as Finland, Singapore, and Ireland.

  5. Supply Chain Analytics

The core service of companies such as UPS and FedEx is the efficient delivery of goods, and analytics has long been used to achieve efficiency. The optimal sorting of goods, vehicle and staff scheduling, and vehicle routing are all key to profitability for logistics companies such as UPS and FedEx.

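Finding optimal travel routes for delivery drivers is a notoriously difficult problem: it is a form of the traveling salesman problem, which is NP-hard, so computing the exact best route becomes computationally infeasible as the number of stops grows. Thus, data science techniques in this area often deal with finding “better” solutions rather than finding the “best” solutions.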
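The difference between a "better" and a "best" route can be sketched with a toy example. The code below compares a simple greedy heuristic (always drive to the nearest unvisited stop) with an exhaustive search that is only feasible for a handful of stops. The stop coordinates and function names are hypothetical, and the nearest-neighbor rule is just one of many heuristics, not the method any particular logistics company uses.

```python
import itertools
import math

def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def nearest_neighbor(points, start=0):
    """Greedy heuristic: always drive to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def brute_force(points):
    """Exact answer by trying every ordering; only feasible for tiny inputs."""
    rest = range(1, len(points))
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length(points, (0,) + p),
    )
    return [0, *best]

# Hypothetical delivery stops as (x, y) coordinates; stop 0 is the depot.
stops = [(0, 0), (2, 1), (5, 0), (1, 4), (4, 3), (3, 5), (0, 2)]

heuristic = nearest_neighbor(stops)
optimal = brute_force(stops)
print("heuristic:", heuristic, round(tour_length(stops, heuristic), 2))
print("optimal:  ", optimal, round(tour_length(stops, optimal), 2))
```

With 7 stops the exhaustive search checks only 720 orderings, but the count grows factorially: 20 stops would already require about 1.2 × 10^17 orderings. The heuristic runs almost instantly at any realistic size, at the cost of possibly returning a longer route than the true optimum.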

  6. Analytics for Nonprofits

Government and other nonprofit groups have used analytics to drive out inefficiencies and increase the effectiveness and accountability of programs. Nonprofit agencies have also used analytics to ensure their effectiveness and accountability to their donors and clients.

  7. Sports Analytics

The use of analytics for player evaluation and on-field strategy is now commonplace. Professional sports teams use analytics to assess players for amateur drafts and to make decisions on financial compensation during contract negotiations. Teams use analytics to assist with on-field decisions such as which pitchers to use and for how long. The use of analytics for off-the-field business decisions related to the fan experience inside stadiums is also increasing rapidly. Ensuring customer satisfaction is important for any company, and fans are the customers of sports teams. Over $4.5 million is spent annually on sports analytics.

  8. Web Analytics

Web analytics is the analysis of online activity, which includes, but is not limited to, visits to web sites and social media sites such as Facebook and LinkedIn. Web analytics obviously has huge implications for promoting and selling products and services via the Internet. Leading companies apply descriptive and advanced analytics to data collected in online experiments to determine the best way to configure web sites, position ads, and utilize social networks for the promotion of products and services.

  9. City Planning

Data science can be used to model traffic patterns for cars, bikes and pedestrians. Based on data from cellphones, city planners can determine optimal street light and traffic light patterns.

  10. Analytics for Military

Analytics has been used in military operations since World War II. Analytics can predict the outcome of a mission, the probability of a successful operation, and the likely casualties that may be incurred. It is also common to use analytics in applications related to information security and espionage, as we saw during World War II.

  11. Analytics in Law Enforcement

U.S. Immigration and Customs Enforcement (ICE) has used facial recognition technology to mine driver’s license photo databases, with the goal of deporting undocumented immigrants. Additionally, the American judicial system employs software to gauge an incarcerated person’s risk of reoffending.

  12. Analytics in Accounting

Tax evasion costs the U.S. government $458 billion each year. To combat these crimes, the IRS has modernized its fraud-detection protocols. The agency has improved efficiency by constructing multidimensional taxpayer profiles from public social media data, assorted metadata, email analysis, electronic payment patterns and more. Based on those profiles, the agency forecasts individual tax returns. Anyone with wildly different real and forecasted returns gets flagged for auditing.

  13. Analytics in Dating

When singles match on Tinder, a carefully crafted algorithm works behind the scenes, boosting the probability of matches. Initially, this algorithm relied on users’ Elo scores, essentially an attractiveness ranking. Now, it prioritizes matches between active users, users near each other and users who seem like each other’s “types” based on their swiping history.
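For readers unfamiliar with Elo scores, the sketch below shows the standard Elo rating update as used in chess. How Tinder adapted the idea is not described here, so treating a matchup between two profiles as a "game" is purely an assumption for illustration, as are the function names and the choice of `k=32`.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, score_a, k=32):
    """Return both new ratings after one matchup.

    score_a is 1 if A "wins", 0.5 for a draw, and 0 if A "loses".
    k controls how strongly a single result moves the ratings.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b

# Two equally rated users: the "winner" gains what the "loser" gives up.
print(update(1500, 1500, score_a=1))  # (1516.0, 1484.0)
```

A result against a much higher-rated opponent moves the ratings more than an expected result, which is what makes Elo-style scores behave like a ranking.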

Citations

Camm, Jeffrey D. Business Analytics. Third edition, Cengage, 2019.

McKinsey Global Institute. (2016). The Age of Analytics: Competing in a Data-Driven World.

Wikipedia contributors. (2021, April 12). Apache Hadoop. In Wikipedia, The Free Encyclopedia.

Wikipedia contributors. (2021, April 12). Big data. In Wikipedia, The Free Encyclopedia.