Review from Lecture 1 and MAS 261 (or previous course)
Open the small data file, Lecture 2 Review Question.xlsx
Use relative and absolute cell references as shown to determine the value of the SMALLEST NEGATIVE deviation.
Round your answer to two decimal places.
Recall from MAS 261: A deviation is the difference from the mean, Observation - Mean
Parsing Text
You work at Jet Blue and plan to offer flights to Rapid City, South Dakota
Flight data for the state of South Dakota for 2024 is shown in the first sheet of Lecture 2 Excel Worksheets.xlsx.
We ONLY want data for flights TO Rapid City so we need to
Separate ‘Origin’ and ‘Destination’ into TWO separate columns
Subset (filter) data to observations with Rapid City as ‘Destination’
Parsing Text Con’t
In Excel:
Select column with text we want to ‘parse’ or ‘split’, column D.
Notice that between each Origin and Destination in the data there is a ‘-’.
Click on ‘Text to Columns’ in Data tab.
In the pop-up window, select Delimited and click Next.
Parsing Text Con’t
In Excel:
Under Delimiters select Other and specify –.
– is the delimiter or separator between ORIGIN and DESTINATION in our data.
The Data preview window below should show that the two variables are split correctly.
Click Finish (No need for Step 3).
Data set now has two SEPARATE variables, ORIGIN, and DESTINATION.
Filtering Data
We want to filter the data to rows (observations) that have Rapid City, SD as the destination.
Select the DESTINATION column and click on the Filter icon.
Click on the triangle that appears next to the variable name.
Uncheck ‘Select All’
Filtering Data Con’t
We want to filter the data to rows (observations) that have Rapid City, SD as the destination.
Scroll down, or enter Rapid City, SD in the search window.
Check the box next to Rapid City, SD and click OK.
Click grey triangle in upper left corner of sheet to select whole sheet and then copy and paste it into Sheet 2.
Green dashed lines between rows verify that only the filtered rows are selected.
Formatting Filtered Data
With whole second sheet (Flights - Worksheet 2) selected:
click Cells > Format > Autofit Column Width.
Save your Excel file.
Lecture 2 In-class Exercise - Q2-Q3
Question 2. How many observations are in this new data set? That is, how many flights were there to Rapid City, SD in 2024?
HINT: Exclude first row which lists the variable names.
Question 3. Now filter the data on Worksheet 2 to determine how many of the flights to Rapid City in 2024 were on Allegiant Air.
HINT: Select ‘CARRIER’ column and filter for ‘Allegiant Air’ only.
Undo the filter option from Question 3 before we continue.
Concatentating Text
Sometimes we may want to concatenate or join text from two variables into one.
This is the reverse of the parsing what we just did.
We will take the full data set (148 observations) on Worksheet 2 and create a new variable Origin / Destination.
This is done using the =CONCAT command in Excel and uses the RELATIVE cell references from Lecture 1.
We start by labeling our new column:
Concatentating Text Cont’d
In cell F2, type the following formula and click Enter or Return:
=CONCAT(D2, “ / ”, E2)
Note: Quotes are REQUIRED and that there is a space before and after the slash.
Copy and paste cell F2 into cells F3 through F149.
Delinking Columns
Column F is LINKED to columns D and E.
If we delete the text in cell D2, it will also disappear, in cell F2.
We can resolve this issue for ANY column with a formula, by replacing the formula with the values shown in the cells.
Delinking Columns
Select column we want to delink, column F and copy it.
Ctrl + C or Cmd + C
Right click to bring up Paste Options and select Values (V) (clipboard with numbers).
This replaces formulas with the values in the cells and delinks that text from columns used to create it.
We can then DELETE Origin and Destination and our new column will not change.
Excel Tables
Worksheet 3 has a famous dataset that shows the passengers on the ill-fated Titanic that sank on April 15th, 1912.
We want to convert these data to an Excel Table to make our work more efficient.
Note that these data are located in
Columns: A – F
Rows: 1 - 1311
Excel Tables Cont’d
Click on any NON-EMPTY cell in the worksheet.
In the ‘Tables’ section of the ‘Insert’ tab, click on ‘Table’
Verify (or fill in if incorrect) location of all of your data, and click ‘OK’
Excel Table Created!
This Excel table has useful features so we can:
remove duplicates automatically.
do quick(er) calculations.
sort and filter data efficiently.
Duplicates
Notice that the 2nd and 3rd rows appear to be duplicates.
These errors are not always so easy to spot and there may be others in the data.
Removing Duplicates
Click on any cell in table so that Table Design option appears and then click on it.
Click on Remove Duplicates
Removing Duplicates Cont’d
We opt to select only NAME and AGE to maximize the chance of finding all duplicates. Cick OK.
We see that only one duplicate was removed.
Note that the variables selected affects how duplicates are determined.
This choice depends on your data and may require trial and error.
Excel Table Calculations
The Fares (ticket prices) are what the passengers paid in 1912 in British Pounds.
Convert these fares to today’s British Pound (Fare x 146) as follows:
Type FARE TODAY in G1 and notice that table automatically expands.
Widen column if needed.
In cell G2 type =F2*146 and click Enter.
Entire column is instantly populated.
Excel Table Calculations Cont’d
Convert these fares in today’s British Pounds to today’s U.S. dollars.
Type FARE TODAY ($) in H1 and notice (again) that table automatically expands.
Widen column if needed.
In cell H2 type =G2*1.23 and click Enter.
Entire column is instantly populated.
Select column H and convert
Sorting Excel Tables
We can easily sort and filter data in Excel tablesto answer questions.
For example:
How much was the most expensive ticket in US Dollars?
Click triangle next to FARE TODAY ($).
This opens a menu for sorting and filtering.
Select Sort Largest to Smallest.
Lecture 2 In-class Exercise - Q4
Based on the sorted data from the Titanic, what was the price of the most expensive fare sold in today’s U.S. dollars?
Round answer to closest whole dollar.
Sorting and Filtering Excel Tables
We can sort and filter the data to find the age of the oldest male passenger that survived.
Filter SURVIVED? to Yes.
Filter GENDER to male.
Sort AGELargest to Smallest.
Lecture 2 In-class Exercise - Q5-Q6
Question 5. What is the age of the oldest male to survive the titanic disaster?
Question 6. What was the age of the oldest female in THIRD class to survive?
HINT: Switch the GENDER filter to female and filter CLASS to Third.
Key Points from Today
Text Functions
Text to Columns is used for parsing (splitting) data.
=CONCAT function is used for joining text from 2 columns.
Excel Tables
Expedites data cleaning
Automates column calcuations
Includes efficient sorting and filtering options
Note: These skills are excellent for single tasks.
If you plan to repeat these tasks on multiple similar data sets 5 or 10 or 100 times, then it’s time to learn some coding in R or Python.
For 1 or 2 (or 3) tasks, Excel may be more efficient
To submit an Engagement Question or Comment about material from Lecture 2: Submit it by midnight today (day of lecture).
Source Code
---title: "BUA 345 - Lecture 2"subtitle: "Data Preparation in Excel"author: "Penelope Pooler Eisenbies"date: last-modifiedlightbox: truetoc: truetoc-depth: 3toc-location: lefttoc-title: "Table of Contents"toc-expand: 1format: html: code-line-numbers: true code-fold: true code-tools: trueexecute: echo: fenced---## Housekeeping```{r setup, echo=FALSE, warning=F, message=F, include=F}#| include: false# this line specifies options for default options for all R Chunksknitr::opts_chunk$set(echo=F)# suppress scientific notationoptions(scipen=100)# install helper package that loads and installs other packages, if neededif (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")# install and load required packagespacman::p_load(pacman,tidyverse, magrittr, olsrr, shadowtext, mapproj, knitr, kableExtra, countrycode, usdata, maps, RColorBrewer, gridExtra, ggthemes, gt, mosaicData, epiDisplay, vistributions, psych, tidyquant, dygraphs)# verify packages# p_loaded()```### Today's plan- Review Question- Text Functions - Parsing (Splitting) Text - Concatenating (Joining) Text- Excel Tables - Removing Duplicates - Quick Calculations - Intro to Sorting and Filtering<br>::: fragment**In-class Polling (Session ID: bua345s25)**:::## ### Lecture 2 In-class Exercise - Q1**Review from Lecture 1 and MAS 261 (or previous course)**Open the small data file, Lecture 2 Review Question.xlsxUse relative and absolute cell references as shown to determine the value of the SMALLEST NEGATIVE deviation.Round your answer to two decimal places.**Recall from MAS 261: A deviation is the difference from the mean, Observation - Mean**## Parsing Text- You work at Jet Blue and plan to offer flights to Rapid City, South Dakota- Flight data for the state of South Dakota for 2024 is shown in the first sheet of `Lecture 2 Excel Worksheets.xlsx`.::: fragment{fig-align="center"}:::- We ONLY want data for flights TO Rapid City so we need to 1. Separate ‘Origin’ and ‘Destination’ into TWO separate columns 2. Subset (filter) data to observations with Rapid City as ‘Destination’## :::::::: columns::: {.column width="48%"}### Parsing Text Con'tIn Excel:1. Select column with text we want to ‘parse’ or ‘split’, column `D`.2. Notice that between each Origin and Destination in the data there is a ‘-’.3. Click on ‘Text to Columns’ in Data tab.4. In the pop-up window, select `Delimited` and click `Next`.:::::: {.column width="4%"}:::::::: {.column width="48%"}::: fragment{fig-align="center" height="2.5in"}:::::: fragment{fig-align="center" height="3.5in"}::::::::::::::::## ::::::::: columns:::: {.column width="48%"}### Parsing Text Con'tIn Excel:1. Under `Delimiters` select `Other` and specify `–`. - `–` is the delimiter or separator between `ORIGIN` and `DESTINATION` in our data.2. The `Data preview` window below should show that the two variables are split correctly.3. Click `Finish` (No need for Step 3).::: fragmentData set now has two SEPARATE variables, ORIGIN, and DESTINATION.:::::::::: {.column width="4%"}:::::::: {.column width="48%"}::: fragment{fig-align="center" height="3.5in"}:::::: fragment{fig-align="center"}:::::::::::::::::## Filtering Data:::::::: columns::: {.column width="48%"}We want to filter the data to rows (observations) that have `Rapid City, SD` as the destination.1. Select the `DESTINATION` column and click on the `Filter` icon.2. Click on the triangle that appears next to the variable name.3. Uncheck ‘Select All’:::::: {.column width="4%"}:::::::: {.column width="48%"}::: fragment{fig-align="center" height="2.5in"}:::::: fragment{fig-align="center" height="3.5in"}::::::::::::::::## ::::::::: columns::: {.column width="48%"}### Filtering Data Con'tWe want to filter the data to rows (observations) that have `Rapid City, SD` as the destination.4. Scroll down, or enter `Rapid City, SD` in the search window.5. Check the box next to `Rapid City, SD` and click OK.6. Click grey triangle in upper left corner of sheet to select whole sheet and then copy and paste it into Sheet 2. - Green dashed lines between rows verify that only the filtered rows are selected.:::::: {.column width="4%"}::::::::: {.column width="48%"}::: fragment{fig-align="center" height="2.5in"}:::::: fragment{fig-align="center" height="1in"}:::::: fragment{fig-align="center" height="1in"}::::::::::::::::::## Formatting Filtered Data1. With whole second sheet (`Flights - Worksheet 2`) selected: - click `Cells`\>`Format`\>`Autofit Column Width`.2. Save your Excel file.::: fragment{height="2.5in"}:::::: fragment:::## ### Lecture 2 In-class Exercise - Q2-Q3<br>**Question 2.** How many observations are in this new data set? That is, how many flights were there to Rapid City, SD in 2024?- HINT: Exclude first row which lists the variable names.<br>**Question 3.** Now filter the data on Worksheet 2 to determine how many of the flights to Rapid City in 2024 were on Allegiant Air.- HINT: Select ‘CARRIER’ column and filter for ‘Allegiant Air’ only.<br>::: fragment**Undo the filter option from Question 3 before we continue.**:::## Concatentating Text- Sometimes we may want to concatenate or join text from two variables into one.- This is the reverse of the parsing what we just did.- We will take the full data set (148 observations) on Worksheet 2 and create a new variable `Origin / Destination`.- This is done using the `=CONCAT` command in Excel and uses the RELATIVE cell references from Lecture 1.- We start by labeling our new column:::: fragment:::## ### Concatentating Text Cont'd1. In cell `F2`, type the following formula and click `Enter` or `Return`: - `=CONCAT(D2, “ / ”, E2)` - **Note:** Quotes are REQUIRED and that there is a space before and after the slash.::: fragment:::2. Copy and paste cell `F2` into cells `F3` through `F149`.::: fragment:::## Delinking Columns- Column `F` is LINKED to columns `D` and `E`.- If we delete the text in cell `D2`, it will also disappear, in cell `F2`.::: fragment:::- We can resolve this issue for ANY column with a formula, by replacing the formula with the values shown in the cells.## ::::::::: columns::: {.column width="48%"}### Delinking Columns1. Select column we want to delink, column `F` and copy it. - `Ctrl + C` or `Cmd + C`2. Right click to bring up `Paste Options` and select `Values (V)` (clipboard with numbers).- This replaces formulas with the values in the cells and delinks that text from columns used to create it.- We can then DELETE Origin and Destination and our new column will not change.:::::: {.column width="4%"}::::::::: {.column width="48%"}::: fragment{fig-align="center"}:::::: fragment{fig-align="center"}:::::: fragment{fig-align="center"}::::::::::::::::::## Excel Tables:::::: columns::: {.column width="58%"}- Worksheet 3 has a famous dataset that shows the passengers on the ill-fated Titanic that sank on April 15th, 1912.- We want to convert these data to an Excel Table to make our work more efficient.- Note that these data are located in - Columns: A – F - Rows: 1 - 1311:::::: {.column width="4%"}:::::: {.column width="38%"}{fig-align="center"}:::::::::::: fragment{fig-align="center"}:::## :::::::: columns::: {.column width="58%"}### Excel Tables Cont'd{fig-align="center"}1. Click on any NON-EMPTY cell in the worksheet.2. In the ‘Tables’ section of the ‘Insert’ tab, click on ‘Table’3. Verify (or fill in if incorrect) location of all of your data, and click ‘OK’:::::: {.column width="4%"}:::::::: {.column width="38%"}::: fragment{fig-align="center"}:::::: fragment{fig-align="center"}::::::::::::::::## :::::: columns::: {.column width="58%"}### Excel Table Created!- This Excel table has useful features so we can: - remove duplicates automatically. - do quick(er) calculations. - sort and filter data efficiently.:::::: {.column width="4%"}:::::: {.column width="38%"}{fig-align="center"}:::::::::::: fragment{fig-align="center"}:::## Duplicates:::::: columns::: {.column width="58%"}- Notice that the 2nd and 3rd rows appear to be duplicates.- These errors are not always so easy to spot and there may be others in the data.:::::: {.column width="4%"}:::::: {.column width="38%"}{fig-align="center"}:::::::::::: fragment{fig-align="center"}:::## Removing Duplicates1. Click on any cell in table so that `Table Design` option appears and then click on it.2. Click on `Remove Duplicates`::: fragment{fig-align="center"}:::::: fragment{fig-align="center" height="3in"}:::## :::::::: columns::: {.column width="48%"}### Removing Duplicates Cont'd1. We opt to select only NAME and AGE to maximize the chance of finding all duplicates. Cick OK.2. We see that only one duplicate was removed.- Note that the variables selected affects how duplicates are determined.- This choice depends on your data and may require trial and error.:::::: {.column width="4%"}:::::::: {.column width="48%"}::: fragment{fig-align="center"}:::::: fragment{fig-align="center"}::::::::::::::::## Excel Table Calculations- The `Fares` (ticket prices) are what the passengers paid in 1912 in British Pounds.- Convert these fares to today’s British Pound (`Fare` x 146) as follows::::::::: columns::: {.column width="48%"}1. Type `FARE TODAY` in `G1` and notice that table automatically expands. - Widen column if needed.2. In cell `G2` type `=F2*146` and click `Enter`.3. Entire column is instantly populated.:::::: {.column width="4%"}:::::::: {.column width="48%"}::: fragment{fig-align="center"}:::::: fragment{fig-align="center"}::::::::::::::::## ### Excel Table Calculations Cont'd- Convert these fares in today’s British Pounds to today's U.S. dollars.:::::::: columns::: {.column width="48%"}1. Type `FARE TODAY ($)` in `H1` and notice (again) that table automatically expands. - Widen column if needed.2. In cell `H2` type `=G2*1.23` and click `Enter`.3. Entire column is instantly populated.4. Select column `H` and convert:::::: {.column width="4%"}:::::::: {.column width="48%"}::: fragment{fig-align="center"}:::::: fragment{fig-align="center"}::::::::::::::::## Sorting Excel Tables::::::: columns::: {.column width="58%"}- We can easily sort and filter data in Excel tablesto answer questions.- For example: - How much was the most expensive ticket in US Dollars?1. Click triangle next to `FARE TODAY ($)`. - This opens a menu for sorting and filtering.2. Select `Sort Largest to Smallest`.:::::: {.column width="4%"}::::::: {.column width="38%"}::: fragment{fig-align="center"}::::::::::::::## ### Lecture 2 In-class Exercise - Q4<br>Based on the sorted data from the Titanic, what was the price of the most expensive fare sold in today’s U.S. dollars?Round answer to closest whole dollar.## ::::::::: columns::: {.column width="48%"}### Sorting and Filtering Excel Tables- We can sort and filter the data to find the age of the oldest male passenger that survived.1. Filter `SURVIVED?` to `Yes`.2. Filter `GENDER` to `male`.3. Sort `AGE``Largest to Smallest`.:::::: {.column width="4%"}::::::::: {.column width="48%"}::: fragment{fig-align="center" height="2in"}:::::: fragment{fig-align="center" height="2in"}:::::: fragment{fig-align="center" height="2in"}::::::::::::::::::## ### Lecture 2 In-class Exercise - Q5-Q6<br>**Question 5.** What is the age of the oldest male to survive the titanic disaster?<br>**Question 6.** What was the age of the oldest female in THIRD class to survive?- **HINT:** Switch the `GENDER` filter to `female` and filter `CLASS` to `Third`.## ### Key Points from Today- Text Functions - `Text to Columns` is used for parsing (splitting) data. - `=CONCAT` function is used for joining text from 2 columns.- Excel Tables - Expedites data cleaning - Automates column calcuations - Includes efficient sorting and filtering options- Note: These skills are excellent for single tasks.\ - If you plan to repeat these tasks on multiple similar data sets 5 or 10 or 100 times, then it’s time to learn some coding in R or Python. - For 1 or 2 (or 3) tasks, Excel may be more efficient::: fragment**To submit an Engagement Question or Comment about material from Lecture 2:** Submit it by midnight today (day of lecture).:::