Wikipedia :
“Data represents the raw facts and figures which can be used in such a manner in order to capture the useful information out of it”.
Data has been described as “the new oil of the digital economy”.
Data, as a general concept, refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.
Data is the smallest unit of factual information that can be used as a basis for calculation, reasoning, or discussion.
Information is defined as classified or organized data that has some meaningful value for the user. Information is also the processed data used to make decisions and take action.
Processed data must meet the following criteria for it to be of any significant use in decision-making:
Accuracy: The information must be accurate.
Completeness: The information must be complete.
Timeliness: The information must be available when it’s needed.
Information is created when data are processed, organized, or structured to provide context and meaning. Information is essentially processed data.
Knowledge is what we know. Knowledge is unique to each individual and is the accumulation of past experience and insight that shapes the lens by which we interpret, and assign meaning to, information. For knowledge to result in action, an individual must have the authority and capacity to make and implement a decision.
Knowledge (and authority) are needed to produce actionable information that can lead to impact.
Data science is the domain of study that deals with vast volumes of data, using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Data science often uses complex machine learning algorithms to build predictive models.
Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning.
Data science uses powerful hardware, programming systems, and efficient algorithms to solve data-related problems, and it is often described as the future of artificial intelligence.
In short, we can say that data science is all about:
Asking the correct questions and analyzing the raw data.
Modeling the data using various complex and efficient algorithms.
Visualizing the data to get a better perspective.
Understanding the data to make better decisions and find the final result.
Various sectors use data science to extract the information they need to create different services and products. The infographic below illustrates some of the domains where data science is making an impact.
Wikipedia
Analytics is the systematic computational analysis of data or statistics.
It is used for the discovery, interpretation, and communication of meaningful patterns in data.
Analytics entails applying data patterns toward effective decision-making.
Analytics can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance.
Any business professional who makes decisions needs foundational data analytics knowledge. Access to data is more common than ever. If you formulate strategies and make decisions without considering the data you have access to, you could miss major opportunities or red flags that it communicates.
Professionals who can benefit from data analytics skills include:
Marketers, who utilize customer data, industry trends, and performance data from past campaigns to plan marketing strategies
Product managers, who analyze market, industry, and user data to improve their companies’ products
Finance professionals, who use historical performance data and industry trends to forecast their companies’ financial trajectories
Human resources and diversity, equity, and inclusion professionals, who gain insights into employees’ opinions, motivations, and behaviors and pair them with industry trend data to make meaningful changes within their organizations
Data by itself is just a source of information; unless you understand it, you will not be able to use it effectively. Consider, for example, the transactions in a bank account. A sample extract is illustrated below.
When these transaction details are presented as a line chart, the deposit and withdrawal patterns become apparent. The chart helps you view and analyze general trends and discrepancies.
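The sample extract is not reproduced in this text, so the following Python sketch uses a handful of made-up transactions (all dates, amounts, and function names are hypothetical) to compute the two views a chart makes visible: a running balance, which is the series a line chart would plot, and per-month deposit and withdrawal totals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (date, amount); positive = deposit, negative = withdrawal.
transactions = [
    (date(2023, 1, 3), 2500.00),
    (date(2023, 1, 15), -800.00),
    (date(2023, 1, 28), -450.00),
    (date(2023, 2, 3), 2500.00),
    (date(2023, 2, 10), -1200.00),
    (date(2023, 2, 22), -900.00),
]

def running_balance(txns, opening=0.0):
    """Return (date, balance) points in date order: the series a line chart would plot."""
    balance = opening
    points = []
    for day, amount in sorted(txns):
        balance += amount
        points.append((day, balance))
    return points

def monthly_totals(txns):
    """Sum deposits and withdrawals separately for each (year, month)."""
    totals = defaultdict(lambda: {"deposits": 0.0, "withdrawals": 0.0})
    for day, amount in txns:
        key = (day.year, day.month)
        if amount >= 0:
            totals[key]["deposits"] += amount
        else:
            totals[key]["withdrawals"] += -amount
    return dict(totals)

for day, balance in running_balance(transactions):
    print(day.isoformat(), f"{balance:8.2f}")
print(monthly_totals(transactions))
```

In this made-up data, withdrawals rise from 1,250 in January to 2,100 in February while deposits stay flat, which is the kind of pattern the line chart is meant to surface.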
Data analytics is a combination of processes to extract information from datasets.
In today’s information age, organizations have access to more forms of data than ever before, with new data and information coming from multiple sources by the minute. Huge amounts of data are processed at each stage of business analytics.
There are four main categories of data analytics: descriptive, diagnostic, predictive, and prescriptive. Together, these four types answer the key questions a company faces, from what is going on in the company to which solutions to adopt to optimize its operations.
Descriptive data analytics is the process of describing or summarizing existing data using business intelligence tools to better understand what is going on or what has happened.
This type of analytics involves analyzing historical data and identifying key performance indicators (KPIs) that relate to business objectives.
This can be further separated into two categories: ad hoc reports and canned reports.
A canned report is one that has been designed in advance and contains information on a given subject, e.g. a monthly report from the advertising team detailing performance metrics for your latest ad efforts.
Ad hoc reports, on the other hand, are designed for a specific query and are useful for obtaining more in-depth information about it, e.g. a report on your social media profile showing the types of people who have liked your page, what other pages in your industry they have liked, and any other engagement and demographic information.
Techniques like data aggregation, data mining, clustering and/or summary statistics all serve to provide analytics that describe a past state.
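As an illustration, the sketch below applies two of the techniques named above, data aggregation and summary statistics, to a few hypothetical sales records using only Python's standard library. The records and the function names `describe` and `aggregate_by` are assumptions for illustration, not the interface of any particular business intelligence tool.

```python
import statistics
from collections import defaultdict

# Hypothetical past sales records: (region, units_sold).
sales = [
    ("North", 120), ("North", 135), ("North", 128),
    ("South", 90), ("South", 95), ("South", 160),
]

def describe(values):
    """Summary statistics describing a past state of the data."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def aggregate_by(records):
    """Data aggregation: total units sold per region."""
    totals = defaultdict(int)
    for region, units in records:
        totals[region] += units
    return dict(totals)

print(aggregate_by(sales))  # {'North': 383, 'South': 345}
print(describe([units for _, units in sales]))
```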
Diagnostic data analytics is the process of examining data to understand cause and effect, that is, why something happened. The result of the analysis is often an analytic dashboard.
Diagnostic Data Analysis is broken down into two specific categories: discover & alerts and query & drill-downs.
i. Query and drill-downs are what you’ll use to get more detail from a report. For example, suppose one of your sales reps closed significantly fewer deals last month. A drill-down could show fewer work days, reminding you that they had taken two weeks of vacation that month, which explains the dip.
ii. Discover and alerts notify you of a potential issue in advance, such as a low number of work hours, which could result in a dip in closed deals. You could also use diagnostic data analytics to “discover” information such as who the best candidate for a new position at your company is.
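The two diagnostic categories can be sketched in Python as follows. The per-rep records, field names, and the 140-hour alert threshold are all hypothetical; a real BI tool would provide these views through its own interface.

```python
# Hypothetical monthly records per sales rep.
records = [
    {"rep": "Ana", "month": "2023-05", "deals": 20, "work_days": 21, "hours": 168},
    {"rep": "Ana", "month": "2023-06", "deals": 10, "work_days": 11, "hours": 88},
    {"rep": "Ben", "month": "2023-06", "deals": 19, "work_days": 20, "hours": 160},
]

def drill_down(rows, rep):
    """Query and drill-down: break one rep's results into per-month detail,
    including deals per work day."""
    return [
        (row["month"], row["deals"], row["work_days"],
         round(row["deals"] / row["work_days"], 2))
        for row in rows
        if row["rep"] == rep
    ]

def low_hours_alerts(rows, month, min_hours=140):
    """Discover and alerts: list reps whose hours for `month` fall below `min_hours`."""
    return [row["rep"] for row in rows
            if row["month"] == month and row["hours"] < min_hours]

for detail in drill_down(records, "Ana"):
    print(detail)
print(low_hours_alerts(records, "2023-06"))  # ['Ana']
```

Ana's deals halve, but so do her work days, so her deals per work day stay roughly constant. If hours are known in advance (for example, from scheduled leave), the same check can raise the alert before the month's results arrive.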
Diagnostic analytics can also provide guidance by helping to:
i. Identify outliers : e.g. a sudden drop in sales or an explosion in website traffic that can’t be explained and may indicate a need for additional examination.
ii. Isolate patterns : in this case the analysts may need to look outside the existing data-set to identify the source of the pattern. e.g. a sudden drop in sales may have stemmed from the launch of a competitor product.
iii. Uncover relationships : using more complex analytics, analysts may employ probability theory, regression analysis, or time series analysis to isolate cause and effect relationships.
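A minimal sketch of two of these steps, assuming hypothetical data: outliers are flagged with a z-score rule (a common choice, swapped in here because the text does not name a specific method), and relationships are screened with the Pearson correlation coefficient. The threshold of 2 standard deviations is an assumption; in small samples a single extreme value inflates the standard deviation, so the often-quoted cutoff of 3 can miss it.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Indices of values lying more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical daily website visits with one unexplained spike (index 5).
visits = [1020, 980, 1005, 995, 1010, 4800, 990, 1000]
print(z_score_outliers(visits))  # [5]

# Hypothetical weekly ad spend and sales figures.
ad_spend = [10, 20, 30, 40, 50]
sales = [105, 118, 131, 139, 152]
print(round(pearson_r(ad_spend, sales), 3))
```

A high correlation such as the one in this made-up ad spend data is only a starting point: establishing cause and effect needs further work, such as the regression or time series methods mentioned above, or a controlled experiment.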
Predictive data analytics focuses on predicting possible outcomes using statistical models and machine learning techniques.
Predictive analytics focuses on determining “what will happen” in the future based on analysis of historical data.
Prediction is accomplished by applying techniques such as principal components analysis, sensitivity analysis and training algorithms for classification and regression on historical data.
Predictive analytics can also provide a diagnosis, i.e. serve as a tool for diagnostic analytics.
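As a minimal example of training a regression model on historical data, the sketch below fits an ordinary least squares line to six months of hypothetical sales and extrapolates it one month ahead. Real predictive models are usually far richer; the data and the function names `fit_line` and `forecast_next` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def forecast_next(history, steps_ahead=1):
    """Fit a linear trend to periods 1..n and extrapolate `steps_ahead` periods."""
    xs = list(range(1, len(history) + 1))
    intercept, slope = fit_line(xs, history)
    return intercept + slope * (len(history) + steps_ahead)

# Hypothetical monthly sales for months 1-6.
sales_history = [200, 212, 225, 236, 250, 261]
print(round(forecast_next(sales_history), 1))  # 273.7
```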
Prescriptive analytics is a type of predictive analytics that is used to recommend one or more courses of action based on analysis of the data.
Prescriptive analytics builds on predictive analytics by helping determine recommended (prescribed) actions based on desired potential (predicted) outcomes, helping organizations achieve their business objectives.
Prescriptive analytics models are constantly “learning” through feedback mechanisms to continuously analyze action and event relationships and recommend the optimal solution. By simulating the solution, prescriptive analytics can examine all the key performance criteria to ensure the outcome would achieve the correct metric goals before anything is implemented.
Artificial intelligence, machine learning and neural network algorithms are often employed to support prescriptive analytics by helping to make specific suggestions based on nuanced patterns and perceptions of organizational goals, limitations and influencing factors.
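The recommendation step can be sketched as choosing, among candidate actions, the one whose predicted outcome is best while respecting a constraint. In this Python sketch the linear demand model, the candidate prices, and the capacity limit are all hypothetical, and the feedback loop and simulation described above are not modeled.

```python
def predicted_demand(price):
    """Hypothetical predictive model: expected units sold at a given price."""
    return max(0.0, 1000 - 40 * price)

def recommend_price(candidate_prices, capacity):
    """Pick the candidate price with the highest predicted revenue whose
    predicted demand does not exceed the available capacity."""
    best_price, best_revenue = None, float("-inf")
    for price in candidate_prices:
        units = predicted_demand(price)
        if units > capacity:
            continue  # infeasible: predicted demand exceeds what can be supplied
        revenue = price * units
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue

print(recommend_price([5, 10, 12.5, 15, 20], capacity=600))  # (12.5, 6250.0)
```

A price of 5 would sell the most units but is rejected because its predicted 800 units exceed capacity; among the feasible prices, 12.5 yields the highest predicted revenue.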
Statistics is the art and science of answering questions and exploring ideas through the processes of gathering data, describing data, and making generalizations about a population on the basis of a smaller sample.
Examples :
Choosing a Medication : Your doctor gives you the option to choose one of two different medications. She provides you with research studies comparing the two medications. How can you use those research studies to inform your decision?
School Curriculum : Your child’s school is selecting a new science curriculum. The administration has narrowed it down to three different curricula and is asking parents to vote. What information would you ask for to inform your vote?
Driving to Work : There are two routes that you could take to get to work in the morning. If you go through town, it usually takes between 6 and 14 minutes, depending on the traffic and red lights. If you take the interstate, it consistently takes 10 minutes. Which route will you take to work this morning?
Marketing Decisions : Your company has put you in charge of making a decision between two marketing campaigns. How can you design a research study to collect data to inform your decision?
Study of height in the population
Population : Any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired.
Parameter : Any summary number, like an average or percentage, that describes the entire population.
The population mean μ (the Greek letter “mu”) and the population proportion p are two different population parameters.
We might be interested in learning about the average weight of all middle-aged female Americans. The population consists of all middle-aged female Americans, and the parameter is μ.
We might be interested in learning about the proportion of likely American voters approving of the president’s job performance. The population comprises all likely American voters, and the parameter is p.
Sample : A representative group drawn from the population.
Statistic : Any summary number, like an average or percentage, that describes the sample.
Because samples are manageable in size, we can determine the actual value of any statistic. We use the known value of the sample statistic to learn about the unknown value of the population parameter.
Techniques for describing data in ways that capture the essence of the information in the data are called descriptive statistics; the sample mean, for example, is a descriptive statistic. Drawing conclusions about the population from data is called inferential statistics.
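Because the true parameters are normally unknown, a simulation is a convenient way to see how a statistic relates to its parameter. The sketch below assumes a made-up population of 100,000 heights drawn from a normal distribution (mean 170 cm, standard deviation 10 cm), computes μ and a proportion p (here, hypothetically, the share taller than 180 cm) for the whole population, and compares them with the corresponding statistics from a random sample of 500.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: simulated heights (cm) of 100,000 adults.
population = [random.gauss(170, 10) for _ in range(100_000)]

# Parameters: describe the whole population (unknown in a real study).
mu = statistics.mean(population)
p = sum(h > 180 for h in population) / len(population)

# Statistics: describe a random sample and serve as estimates of the parameters.
sample = random.sample(population, 500)
x_bar = statistics.mean(sample)
p_hat = sum(h > 180 for h in sample) / len(sample)

print(f"mu = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"p  = {p:.3f}   p_hat = {p_hat:.3f}")
```

Computing x_bar and p_hat is descriptive; using them as estimates of μ and p is inference.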
There are four steps in the statistical analysis process :
Step 1: Find the population of interest that suits the purpose of statistical analysis.
Step 2: Draw a random sample that represents the population.
Step 3: Compute sample statistics to describe the spread and shape of the data-set.
Step 4: Make inferences using the sample and these calculations, and apply them back to the population.
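The four steps can be sketched end to end in Python. The population (simulated commute times between 6 and 14 minutes), the sample size of 200, and the normal-approximation 95% confidence interval for the mean are assumptions chosen for illustration; the skewness value is a simple moment-based estimate.

```python
import math
import random
import statistics

random.seed(7)

# Step 1 (hypothetical): the population of interest is 20,000 simulated commute times in minutes.
population = [random.uniform(6, 14) for _ in range(20_000)]

# Step 2: draw a simple random sample.
sample = random.sample(population, 200)

# Step 3: describe the center, spread, and shape of the sample.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)
skewness = sum((x - mean) ** 3 for x in sample) / (len(sample) * sd ** 3)

# Step 4: infer the population mean with an approximate 95% confidence interval.
se = sd / math.sqrt(len(sample))
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = {mean:.2f}, sd = {sd:.2f}, skewness = {skewness:.2f}")
print(f"95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```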
Collecting data is an important first step in statistical analysis. The goal of statistics is to make inferences about a population based on a sample. How we collect the data is important. If the sample is not representative of the whole population, we cannot make inferences about the population from that sample.
The following are a few frequently used methods for collecting data:
Personal Interview : People usually respond when asked by a person but their answers may be influenced by the interviewer.
Telephone Interview : Cost-effective, but interviews need to be kept short since respondents tend to be impatient.
Self-Administered Questionnaires : Cost-effective, but the response rate is lower and the respondents may be a biased sample.
Direct Observation : For certain quantities of interest, one may be able to measure them directly on the sample.
Web-Based Survey : Can only reach the part of the population that uses the web.
Simulation : A computer model of a system generates the data; for example, a model of the operation of an industrial system in which an important measurement is the percentage purity of a chemical product.
Controlled Experiments : An experiment is possible when the background conditions can be controlled, at least to some extent. For example, we may be interested in choosing the best type of grass seed to use in the sports field.
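Random assignment is what lets a controlled experiment attribute differences in outcome to the treatment rather than to the background conditions. A minimal sketch for the grass seed example, assuming twelve hypothetical field plots and three hypothetical seed types, shuffles the plots and deals them out to the seed types in equal groups.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign experimental units to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    # Deal the shuffled units round-robin so group sizes differ by at most one.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

plots = [f"plot-{n}" for n in range(1, 13)]   # hypothetical field plots
seeds = ["rye", "fescue", "bluegrass"]       # hypothetical seed types
print(assign_treatments(plots, seeds, seed=1))
```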
Variable : A variable is any characteristic, number, or quantity that can be measured, counted, or observed and recorded.
There may be many variables in a study. The variables may play different roles in the study.
Variables can be classified as either explanatory or response variables.
Response Variable : The variable about which the researcher is posing the question. It may also be called the outcome or the dependent variable.
Explanatory Variable : Variables that serve to explain changes in the response. They may also be called the predictor or independent variables.
Distinguishing between the different types of variables is a basic and integral part of applied statistics. The methods used to analyze each type are very different, so it is important to make the distinction. Generally, variables come in two varieties: categorical and quantitative.
Categorical variables group observations into separate categories that can be ordered or unordered.
Quantitative variables, on the other hand, are variables expressed numerically, whether as a count or a measurement.
Variable Types
Quantitative Variables
We can think of quantitative variables as any information about an observation that can only be described with numbers. Quantitative variables are generally counts or measurements of something (e.g., the number of points earned in a game, or height).
There are two types of quantitative variables, discrete and continuous, and they serve different functions in a dataset.
Discrete Variables : Discrete quantitative variables are numeric values that represent counts and can only take on integer values. They represent whole units that cannot be broken down into smaller pieces, and as such cannot be meaningfully expressed with decimals or fractions. Examples of discrete variables are the number of children in a person’s family or the number of coin flips a person makes.
Continuous Variables : Continuous quantitative variables are numeric measurements that can be expressed with decimal precision. Theoretically, continuous variables can take on infinitely many values within a given range. Examples of continuous variables are length, weight, and age, which can all be described with decimal values.
Sometimes the line between discrete and continuous variables can be a bit blurry. For example, age with decimal values is a continuous variable, but age rounded to the nearest whole year is, by definition, discrete. The precision with which a quantitative variable is recorded can also determine how we classify it.
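The age example can be made concrete in Python: the same birth date yields a continuous value or a discrete one depending on the precision recorded. The dates are hypothetical, and dividing elapsed days by 365.25 is only an approximation of a year.

```python
from datetime import date

def age_decimal(born, on):
    """Age as a continuous quantity (elapsed days / 365.25, an approximate year length)."""
    return (on - born).days / 365.25

def age_nearest_whole_year(born, on):
    """The same quantity recorded at coarser precision: a discrete integer."""
    return round(age_decimal(born, on))

born = date(1990, 8, 15)   # hypothetical birth date
on = date(2024, 3, 1)      # hypothetical reference date
print(age_decimal(born, on))             # a decimal value between 33 and 34
print(age_nearest_whole_year(born, on))  # 34
```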
Categorical Variables
Categorical variables differ from quantitative variables in that they focus on the different ways data can be grouped rather than counted or measured. With categorical variables, we want to understand how the observations in our data-set can be grouped and separated from one another based on their attributes. There are two types of categorical variables : ordinal and nominal.
Ordinal Variables : When the groupings of a categorical variable have a specific order or ranking, it is an ordinal variable.
Suppose there was a variable containing responses to the question “Rate your agreement with the statement: The minimum age to drive should be lowered.” The response options are “strongly disagree”, “disagree”, “neutral”, “agree”, and “strongly agree”. Because we can see an order where “strongly disagree” < “disagree” < “neutral” < “agree” < “strongly agree” in relation to agreement, we consider the variable to be ordinal.
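One common way to work with an ordinal variable in code is to map each level to its rank, which preserves the order but says nothing about the distance between levels. The sketch below, with hypothetical responses, uses the ranks to compare answers and to find the median response; averaging the ranks is often avoided for exactly this reason.

```python
import statistics

LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
# Map each label to its rank: the order is meaningful, the gaps between ranks are not.
RANK = {label: i for i, label in enumerate(LIKERT)}

def is_more_agreeable(a, b):
    """True if response a expresses more agreement than response b."""
    return RANK[a] > RANK[b]

def median_response(answers):
    """Median category; median_low always returns a rank that was actually observed."""
    return LIKERT[statistics.median_low(RANK[a] for a in answers)]

responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]
print(median_response(responses))  # agree
```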
Nominal Variables : If there is no apparent order or ranking to the categories of a categorical variable, we refer to it as a nominal variable.
Nominal categorical variables are those variables with two or more categories that do not have any relational order. Examples of nominal categories could be states in the U.S., brands of computers, or ethnicity. Notice how for each of these variables, there is no intrinsic ordering that distinguishes a category as greater than or less than another category.
The number of possible values for a nominal variable can be quite large. It’s even possible that a nominal categorical variable will take on a unique value for every observation in a dataset, as in the case of unique identifiers such as name or email_address.
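For a nominal variable, counting how often each category occurs (and finding the most common one) is meaningful, while ordering the categories is not. This sketch uses hypothetical records; `is_unique_identifier` is an illustrative helper, not a library function, that checks whether a column such as email_address takes a different value in every row.

```python
from collections import Counter

# Hypothetical records with two nominal variables.
rows = [
    {"email_address": "a@example.com", "brand": "Acer"},
    {"email_address": "b@example.com", "brand": "Dell"},
    {"email_address": "c@example.com", "brand": "Acer"},
    {"email_address": "d@example.com", "brand": "Lenovo"},
]

# Frequencies and the mode are meaningful for nominal data; "greater than" is not.
brand_counts = Counter(row["brand"] for row in rows)
most_common_brand, count = brand_counts.most_common(1)[0]

def is_unique_identifier(records, column):
    """Hypothetical helper: True if `column` takes a different value in every record."""
    values = [record[column] for record in records]
    return len(set(values)) == len(values)

print(most_common_brand, count)                     # Acer 2
print(is_unique_identifier(rows, "email_address"))  # True
print(is_unique_identifier(rows, "brand"))          # False
```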
Binary Variables : Binary or dichotomous variables are a special kind of nominal variable that have only two categories.
Because binary variables have only two possible values, those values are mutually exclusive. We can imagine a variable that describes whether a picture contains a cat or a dog as a binary variable. In this case, if the picture does not contain a dog, it must contain a cat, and vice versa. Binary variables can also be described with numbers, like bits with values 0 or 1. Likewise, you may find binary variables containing the Boolean values True or False.
Simplilearn
GeeksforGeeks
Codecademy
Wikipedia