Module 1
Individual / Group Practice 3–Simple Queries SELECT, WHERE, ORDER BY,
LIMIT
Part 1 (nfldb)
1. List the team names and cities for teams in cities with more than
a million people.
- SELECT teamname, city FROM teams WHERE population > 1000000;
2. List the city name and population of the most populous city from
Ohio or California. In the case of a tie in population, show only the
first alphabetically listed city.
- SELECT city FROM teams WHERE state IN (‘Ohio’, ‘California’) ORDER
BY population DESC, city LIMIT 1;
Part 2 (cincinnati_salary)
2. Employees with a missing salary value are currently in the
on-boarding process. Write a query to identify the name, business title,
and department name of all employees currently in the on-boarding
process.
- SELECT name, business_title, deptname FROM employee_salary WHERE
annual_rt IS NULL;
3. Write a query to find the name and salary of the 5 lowest paid
dentists in the city of Cincinnati.
- SELECT name, annual_rt FROM employee_salary WHERE jobtitle LIKE
(‘%dentist%’) AND annual_rt IS NOT NULL ORDER BY annual_rt ASC LIMIT
5;
Module 2
5–Practice Queries
Part 1: nfldb
1. List the cities that include one of the words from the following
list: “San”, “New”, “City” (hint: this is not asking for teams)
- SELECT DISTINCT city FROM teams WHERE city LIKE ‘%San%’ OR city LIKE
‘%New%’ OR city LIKE ‘%City%’;
2. List only the city name of the largest city in terms of
population from Ohio or California. In the case of a tie, show all
cities in order of landsize.
- SELECT city FROM teams WHERE state IN (‘Ohio’, ‘California’) AND
population = (SELECT max(population) FROM teams WHERE state IN (‘Ohio’,
‘California’) ORDER BY landsize DESC);
3. Report all cities, teams and population densities (population per
unit of landsize) for teams with a population greater than the average
population of all teams. Rank the results by population density from
largest to smallest.
- SELECT city, teamname, population/landsize AS
population density
FROM teams WHERE population > (SELECT
avg(population) FROM teams ORDER BY population/landsize DESC);
Part 2: cincinnati_salaries
1. The acronym “CPD” stands for “Cincinnati Police Department”. Show
only the name, job title, age range, annual salary and effective hourly
wage for all employees who either work in a department that includes the
acronym “CPD” or have a job title that includes the word “police”. Order
the result by hourly wage from largest to smallest.
- SELECT name, jobtitle, age_range, annual_rt, std_hours FROM
employee_salary WHERE deptname_abbrv LIKE ‘%CPD%’ OR jobtitle LIKE
‘%police%’ ORDER BY std_hours DESC;
2. How many city employees make more money than the most highly
compensated employee in the police department (‘CPD-Department of
Police’)? Order the result by salary from highest to lowest.
- SELECT count(name) AS
employees
FROM employee_salary
WHERE annual_rt > (SELECT max(annual_rt) FROM employee_salary WHERE
deptname IN (‘CPD-Department of Police’) );
3. What percent of employees are NOT in the department with the name
“CPD-Police”police officers?
- SELECT (SELECT count() FROM employee_salary WHERE deptname
<> ‘CPD-Department of Police’) / (SELECT count() FROM
employee_salary) AS
% of employees who are not in the Police Department
;
denom
- SELECT count(*) FROM employee_salary;
Num
- SELECT count(*) FROM employee_salary WHERE deptname <>
‘CPD-Department of Police’;
4. Write a query to answer the following question: “What percent of
departments have at least one employee earning a minimum of
$100,000?”
- SELECT (SELECT count(DISTINCT deptname) FROM employee_salary WHERE
annual_rt >= 100000) / (SELECT count(DISTINCT deptname) FROM
employee_salary ORDER BY deptname) ;
denom
- SELECT count(DISTINCT deptname) FROM employee_salary ORDER BY
deptname_abbrv;
num
- SELECT count(DISTINCT deptname) FROM employee_salary WHERE annual_rt
> 100000;
5. Assume any employee with the word ‘analyst’ in their job title is
considered an analyst. Show the following summary statistics for only
the analysts. Your calculated fields should be given intuitive names and
rounded to the hundredths place
- Total number of analysts.
- Average annual salary of analysts.
- Maximum and minimum number of hours worked by analysts.
- Number of departments with an analyst employee.
- SELECT COUNT(*), ROUND(avg(annual_rt), 2), ROUND(max(std_hours), 2),
ROUND(min(std_hours), 2), ROUND(count(DISTINCT deptname), 2) FROM
employee_salary WHERE jobtitle LIKE ‘%analyst%’;
Module 3
6–Practice Queries (nfldb + cincinnati_salaries)
1. Show a list of states and the number of Cities in the state with
an NFL team
- SELECT state, count(city) FROM teams GROUP BY state;
2. Which state(s) (has/have) the most NFL teams?
- SELECT state, count() AS
number of teams
FROM teams
GROUP BY state HAVING number of teams
= (SELECT
MAX(COUNT(*)
) FROM ( SELECT count() FROM teams GROUP
BY state) AS t1 );
3. Write a query that can be used to assess the following question:
“Do women employees make nominally less money than men on average?”
- SELECT gender, avg(annual_rt) FROM employee_salary GROUP BY
gender;
4. Write a query that can be used to assess the following question:
“What percent of employees are either racial minorities or female?”
- SELECT (SELECT count() FROM employee_salary WHERE gender = ‘F’
OR race <> ‘WHITE’) / (SELECT count() FROM employee_salary)
;
denom
- SELECT count(*) FROM employee_salary;
num
- SELECT count(*) FROM employee_salary WHERE gender = ‘F’ OR race
<> ‘WHITE’;
5. Write a query that can be used to assess the following question:
“Which business title(s) make the least amount of money on average for
at least 34 hours of work per week?”
SELECT business_title, avg(annual_rt) FROM employee_salary WHERE
std_hours >= 34 GROUP BY business_title)
- SELECT business_title FROM titles WHERE
AVG(annual_rt)
= (SELECT MIN(AVG(annual_rt)
) FROM titles);
Assignmnet 2 queries
Q1. [0.10 points] Write a query that compares the institutions named
“Xavier University” and “University of Cincinnati-Main Campus” in the
following ways:
- Family Income
- Percent of students receiving a pell grant
- Admission rate
- Average SAT scores
- Total revenue from undergraduate student tuition (HINT: Revenue
equals price times quantity) HINT: The output of this query should have
5 columns with only two observations: one for Xavier and one for
UC.
- SELECT faminc, pctpell, adm_rate, sat_avg, costt4_a*ugds FROM
scorecard WHERE instnm IN (“Xavier University”, “University of
Cincinnati-Main Campus”);
Q2. [0.15 points] Show the institution name, city, state postcode,
average cost of attendance and undergraduate enrollment for all the
public institutions where the average cost of attendance is greater than
the average cost of attendance for all institutions and where
undergraduate enrollment is over 5000 students. Order the result
alphabetically by state abbreviation first, city second, and institution
name third.
- SELECT INSTNM, CITY, STABBR, COSTT4_A, UGDS FROM scorecard WHERE
UGDS > 5000 AND CONTROL = 1 AND COSTT4_A > (SELECT AVG(COSTT4_A)
FROM scorecard) ORDER BY STABBR, CITY, INSTNM;
Q3. [0.25 points] Write a query to answer the following to the best
of your ability: “What percent of institutions in this table appear to
be a college or university?”
SELECT (SELECT COUNT(*) AS number of institutions
FROM scorecard WHERE instnm LIKE “%college%” OR instnm LIKE
“%university%”)
/ (SELECT COUNT(*) FROM scorecard) ;
SELECT COUNT(*) FROM scorecard;
Q4. [0.25 points] If the average undergraduate cost of attending an
institution times the number of undergraduates at an instituion is
defined as the instituion’s revenue, write a query that reports the
following:
- The institution name(s) and revenue amount(s) of the institution
with the highest revenue
- The institution name(s) and revenue amount(s) of the institution
with the lowest revenue
- SELECT instnm, costt4_augds AS
revenue
FROM
scorecard WHERE costt4_augds = ( SELECT MIN(costt4_a)ugds FROM
scorecard) OR ( SELECT MAX(costt4_a)ugds FROM scorecard);
Q5. [0.25 points] Write a query to answer the following question in
a single attribute instance: “On average, how many institutions are in a
state?” Round the result to the nearest whole unit and give it a name
that is easy to understand.
- SELECT ROUND((SELECT COUNT(*) FROM scorecard)
- / (SELECT COUNT(DISTINCT STABBR) FROM scorecard), 0) ;
- SELECT COUNT(*) FROM scorecard;
Q6. [0.25 points] Write a query that displays the answer to the
following question in a single attribute-instance: What percent of all
institutions are designated as either “men only” or “women only”? NOTE:
Assume no college can be both in this sample of data.
- SELECT (SELECT COUNT(*) AS
men only institutions
FROM
scorecard WHERE menonly = 1 OR womenonly = 1)
- /
- (SELECT COUNT(*) FROM scorecard) ;
Q8.[0.50 points]Show the following summary statistics for
institutions located in Ohio. Your calculated fields should be given
intuitive names and rounded to the hundredths place
- Total number of undergraduate students
- Average faculty salary (this does NOT need to be a weighted
average)
- Average pell grant percentage (this does NOT need to be a weighted
average)
- Total number of historically black colleges and universities
- Average age of entry (this does NOT need to be a weighted
average)
HINT: This query is asking for only a single instance summarizing the
entire state of Ohio.
- SELECT SUM(ugds), avgfacsal, avg(pctpell), COUNT(hbcu),
AVG(age_entry) FROM scorecard WHERE stabbr = “OH”;
Q10. [0.75 points] Report a list of state abbreviations in
alphabetical order (A-Z) that do NOT have any institutions with the word
‘seminary’ in their name.
- SELECT DISTINCT STABBR, instnm FROM scorecard WHERE stabbr NOT IN (
SELECT DISTINCT STABBR FROM scorecard WHERE instnm LIKE “%Seminary%”)
ORDER BY STABBR;
Assignment 3 Queries
Q1. [0.10 points] Show a list of the following summary statistics
for every state in the data:
- The total number of institutions
- The average family income
- The average cost of attendance
- The total number of undergrads
- Order the result by state abbreviation in alphabetical order and
rename any calculated fields to use intuitive names.
- HINT: These summary statistics do not need to be weighted
calculations. Your result should be a table with 59 rows and 5
columns.
- SELECT count(instnm) AS
total number of institutions
,
avg(faminc) AS avg family income
, avg(costt4_a) AS
avg cost of undergrads
, sum(ugds) AS
total number of undergrads
FROM scorecard ORDER BY stabbr
DESC;
Q2. [0.15 points] Write a query to display the average faculty
salary for each of the following states:
- Michigan, Illinois, Kentucky, Indiana, and Ohio.
- This result should show each state’s name and the corresponding
average faculty salary of the schools in that state.
- Order the result by average faculty salary from highest to
lowest.
- HINT: The output of this query should be at the state level
- SELECT stabbr, avgfacsal FROM scorecard GROUP BY stabbr HAVING
stabbr IN (‘MI’,‘IL’, ‘KY’, ‘IN’, ‘OH’) ORDER BY avgfacsal DESC ;
Q3. [0.25 points] Show a list of institutions with the word
“college” or “university” in their name that have an average family
income greater than that of Xavier University. Also include Xavier
University in this list of output for comparison. Sort this result by
average family income from highest to lowest.
- SELECT instnm, faminc FROM scorecard WHERE faminc >= ( SELECT
faminc FROM scorecard WHERE instnm = “Xavier University” ORDER BY faminc
DESC);
Q5. [0.25 points] For every city in the state of Ohio, write a query
that displays the following:
- The city name,
- The average faculty salary for all institutions in the city
- The average cost of attendance for all institutions in the city
- The average admission rate for all institutions in the city.
- Only show results for cities that do not have any missing (NULL)
data.
- Round any calculations to the second decimal place (two significant
digits).
- Order the results alphabetically by city name.
- Your calculations do NOT need to be weighted averages.
- All reported columns must use names that are easy to
understand.
- HINT: The output of this query should be at the city level.
WITH cities AS (
SELECT city, avgfacsal AS avg_fac_sal
, AVG(costt4_a)
AS avg_cost_per_a
, AVG(adm_rate) AS
avg_adm_rate
FROM scorecard GROUP BY STABBR)
SELECT * FROM cities WHERE avg_adm_rate IS NOT NULL AND
avg_fac_sal IS NOT NULL;
Q6. [0.50 points] A “percentile ranking” is a statistic that shows
the percentage of scores that are equal to or lower than a given score.
For example, Xavier’s average SAT score reported in the college
scorecard is 1190. If we compare this result against the index of values
provided by the College Board (the organization that administers the
SAT,) we see this score represents the 80th percentile: https://research.collegeboard.org/reports/sat-suite/understanding-scores/sat
We would interpret this result as: “80% of students who take the
SAT have an SAT score equal to or lower than the average Xavier
University student’s SAT score.”
Write a query to assess whether the college scorecard data
supports applying this percentile ranking at the institution level by
answering this question:
“Is Xavier University’s average SAT score better than 80% of all
institutions?” Briefly interpret your result and comment on whether your
findings support the 80 percentile claim.
SELECT (SELECT COUNT(sat_avg) FROM scorecard WHERE sat_avg IS NOT
NULL ORDER BY sat_avg DESC)*.2;
WITH sat_80 AS (
SELECT sat_avg FROM scorecard WHERE sat_avg IS NOT NULL ORDER BY
sat_avg DESC LIMIT 264)
SELECT min(sat_avg) FROM sat_80;
Q7. [0.50 points] Write a query that displays the answer to the
following question in a single attribute-instance: How many states have
a total number of undergraduates greater than the total number of
undergraduates in Ohio?
- SELECT count(DISTINCT stabbr) FROM scorecard WHERE ugds > (SELECT
SUM(ugds) FROM scorecard WHERE stabbr = “OH”);
Q8. [0.50 points] Write a query that reports the state (or states)
with the most institutions. For this state (or states), report only the
state abbreviation and the and the total number of undergrads attending
school in the state(s).
WITH states as (
SELECT stabbr, count(instnm), sum(ugds) FROM scorecard GROUP BY
stabbr)
SELECT * FROM states WHERE count(instnm)
= ( SELECT
max(count(instnm)
) FROM states);
Q9. [0.50 points] The average faculty salary column records the
average monthly salary of the faculty working at the institution. This
value is reported in monthly terms because most faculty contracts are
8-10 months rather than 12 months.
The family income column records the average annual
family income of students attending the institution.
Xavier is one of the few institutions in the country
where students attending the university come from families that are more
wealthy than the professors teaching them. Write a query to identify how
common this phenomenon is in the college scorecard data and interpret
your result.
WITH institutions AS (
SELECT instnm AS institution_name
, avgfacsal AS
avg_fac_sal
, faminc AS fam_inc
FROM
scorecard)
SELECT * FROM institutions WHERE fam_inc >= (SELECT
avg_fac_sal FROM institutions WHERE institution_name = “Xavier
University”);