Module 1

Individual / Group Practice 3–Simple Queries SELECT, WHERE, ORDER BY, LIMIT

Part 1 (nfldb)

1. List the team names and cities for teams in cities with more than a million people.
  • SELECT teamname, city FROM teams WHERE population > 1000000;
2. List the city name and population of the most populous city from Ohio or California. In the case of a tie in population, show only the first alphabetically listed city.
  • SELECT city FROM teams WHERE state IN (‘Ohio’, ‘California’) ORDER BY population DESC, city LIMIT 1;
3. List all team information for teams in Florida. Order the results by land size from greatest to smallest.
  • SELECT * FROM teams WHERE state = ‘Florida’ ORDER BY landsize DESC;

Part 2 (cincinnati_salary)

1. Show all employee information for those employees meeting all the following criteria:
  1. Have a salary over $100,000.
  2. Have a business title including the word ‘manager’
  3. In a department with an abbreviation of ‘Maint’ or ‘City Mgr’.
  4. Order the results by annual salary from largest to smallest.
  5. For employees having the same annual salary, order by name in alphabetical order.
  • SELECT * FROM employee_salary WHERE annual_rt > 100000 AND business_title LIKE ‘%manager%’ AND deptname_abbrv IN (‘Maint’, ‘City Mgr’) ORDER BY annual_rt DESC, name;
2. Employees with a missing salary value are currently in the on-boarding process. Write a query to identify the name, business title, and department name of all employees currently in the on-boarding process.
  • SELECT name, business_title, deptname FROM employee_salary WHERE annual_rt IS NULL;
3. Write a query to find the name and salary of the 5 lowest paid dentists in the city of Cincinnati.
  • SELECT name, annual_rt FROM employee_salary WHERE jobtitle LIKE (‘%dentist%’) AND annual_rt IS NOT NULL ORDER BY annual_rt ASC LIMIT 5;

Module 2

5–Practice Queries

Part 1: nfldb

1. List the cities that include one of the words from the following list: “San”, “New”, “City” (hint: this is not asking for teams)
  • SELECT DISTINCT city FROM teams WHERE city LIKE ‘%San%’ OR city LIKE ‘%New%’ OR city LIKE ‘%City%’;
2. List only the city name of the largest city in terms of population from Ohio or California. In the case of a tie, show all cities in order of landsize.
  • SELECT city FROM teams WHERE state IN (‘Ohio’, ‘California’) AND population = (SELECT max(population) FROM teams WHERE state IN (‘Ohio’, ‘California’) ORDER BY landsize DESC);
3. Report all cities, teams and population densities (population per unit of landsize) for teams with a population greater than the average population of all teams. Rank the results by population density from largest to smallest.
  • SELECT city, teamname, population/landsize AS population density FROM teams WHERE population > (SELECT avg(population) FROM teams ORDER BY population/landsize DESC);

Part 2: cincinnati_salaries

1. The acronym “CPD” stands for “Cincinnati Police Department”. Show only the name, job title, age range, annual salary and effective hourly wage for all employees who either work in a department that includes the acronym “CPD” or have a job title that includes the word “police”. Order the result by hourly wage from largest to smallest.
  • SELECT name, jobtitle, age_range, annual_rt, std_hours FROM employee_salary WHERE deptname_abbrv LIKE ‘%CPD%’ OR jobtitle LIKE ‘%police%’ ORDER BY std_hours DESC;
2. How many city employees make more money than the most highly compensated employee in the police department (‘CPD-Department of Police’)? Order the result by salary from highest to lowest.
  • SELECT count(name) AS employees FROM employee_salary WHERE annual_rt > (SELECT max(annual_rt) FROM employee_salary WHERE deptname IN (‘CPD-Department of Police’) );
3. What percent of employees are NOT in the department with the name “CPD-Police”police officers?
  • SELECT (SELECT count() FROM employee_salary WHERE deptname <> ‘CPD-Department of Police’) / (SELECT count() FROM employee_salary) AS % of employees who are not in the Police Department ;

denom

  • SELECT count(*) FROM employee_salary;

Num

  • SELECT count(*) FROM employee_salary WHERE deptname <> ‘CPD-Department of Police’;
4. Write a query to answer the following question: “What percent of departments have at least one employee earning a minimum of $100,000?”
  • SELECT (SELECT count(DISTINCT deptname) FROM employee_salary WHERE annual_rt >= 100000) / (SELECT count(DISTINCT deptname) FROM employee_salary ORDER BY deptname) ;

denom

  • SELECT count(DISTINCT deptname) FROM employee_salary ORDER BY deptname_abbrv;

num

  • SELECT count(DISTINCT deptname) FROM employee_salary WHERE annual_rt > 100000;
5. Assume any employee with the word ‘analyst’ in their job title is considered an analyst. Show the following summary statistics for only the analysts. Your calculated fields should be given intuitive names and rounded to the hundredths place
  1. Total number of analysts.
  2. Average annual salary of analysts.
  3. Maximum and minimum number of hours worked by analysts.
  4. Number of departments with an analyst employee.
  • SELECT COUNT(*), ROUND(avg(annual_rt), 2), ROUND(max(std_hours), 2), ROUND(min(std_hours), 2), ROUND(count(DISTINCT deptname), 2) FROM employee_salary WHERE jobtitle LIKE ‘%analyst%’;

Module 3

6–Practice Queries (nfldb + cincinnati_salaries)

1. Show a list of states and the number of Cities in the state with an NFL team
  • SELECT state, count(city) FROM teams GROUP BY state;
2. Which state(s) (has/have) the most NFL teams?
  • SELECT state, count() AS number of teams FROM teams GROUP BY state HAVING number of teams = (SELECT MAX(COUNT(*)) FROM ( SELECT count() FROM teams GROUP BY state) AS t1 );
3. Write a query that can be used to assess the following question: “Do women employees make nominally less money than men on average?”
  • SELECT gender, avg(annual_rt) FROM employee_salary GROUP BY gender;
4. Write a query that can be used to assess the following question: “What percent of employees are either racial minorities or female?”
  • SELECT (SELECT count() FROM employee_salary WHERE gender = ‘F’ OR race <> ‘WHITE’) / (SELECT count() FROM employee_salary) ;

denom

  • SELECT count(*) FROM employee_salary;

num

  • SELECT count(*) FROM employee_salary WHERE gender = ‘F’ OR race <> ‘WHITE’;
5. Write a query that can be used to assess the following question: “Which business title(s) make the least amount of money on average for at least 34 hours of work per week?”
  • with titles as (

SELECT business_title, avg(annual_rt) FROM employee_salary WHERE std_hours >= 34 GROUP BY business_title)

  • SELECT business_title FROM titles WHERE AVG(annual_rt) = (SELECT MIN(AVG(annual_rt)) FROM titles);

Assignmnet 2 queries

Q1. [0.10 points] Write a query that compares the institutions named “Xavier University” and “University of Cincinnati-Main Campus” in the following ways:
  1. Family Income
  2. Percent of students receiving a pell grant
  3. Admission rate
  4. Average SAT scores
  5. Total revenue from undergraduate student tuition (HINT: Revenue equals price times quantity) HINT: The output of this query should have 5 columns with only two observations: one for Xavier and one for UC.
  • SELECT faminc, pctpell, adm_rate, sat_avg, costt4_a*ugds FROM scorecard WHERE instnm IN (“Xavier University”, “University of Cincinnati-Main Campus”);
Q2. [0.15 points] Show the institution name, city, state postcode, average cost of attendance and undergraduate enrollment for all the public institutions where the average cost of attendance is greater than the average cost of attendance for all institutions and where undergraduate enrollment is over 5000 students. Order the result alphabetically by state abbreviation first, city second, and institution name third.
  • SELECT INSTNM, CITY, STABBR, COSTT4_A, UGDS FROM scorecard WHERE UGDS > 5000 AND CONTROL = 1 AND COSTT4_A > (SELECT AVG(COSTT4_A) FROM scorecard) ORDER BY STABBR, CITY, INSTNM;
Q3. [0.25 points] Write a query to answer the following to the best of your ability: “What percent of institutions in this table appear to be a college or university?”
  • SELECT (SELECT COUNT(*) AS number of institutions FROM scorecard WHERE instnm LIKE “%college%” OR instnm LIKE “%university%”)

  • / (SELECT COUNT(*) FROM scorecard) ;

  • SELECT COUNT(*) FROM scorecard;

Q4. [0.25 points] If the average undergraduate cost of attending an institution times the number of undergraduates at an instituion is defined as the instituion’s revenue, write a query that reports the following:
  1. The institution name(s) and revenue amount(s) of the institution with the highest revenue
  2. The institution name(s) and revenue amount(s) of the institution with the lowest revenue
  • SELECT instnm, costt4_augds AS revenue FROM scorecard WHERE costt4_augds = ( SELECT MIN(costt4_a)ugds FROM scorecard) OR ( SELECT MAX(costt4_a)ugds FROM scorecard);
Q5. [0.25 points] Write a query to answer the following question in a single attribute instance: “On average, how many institutions are in a state?” Round the result to the nearest whole unit and give it a name that is easy to understand.
  • SELECT ROUND((SELECT COUNT(*) FROM scorecard)
  • / (SELECT COUNT(DISTINCT STABBR) FROM scorecard), 0) ;
  • SELECT COUNT(*) FROM scorecard;
Q6. [0.25 points] Write a query that displays the answer to the following question in a single attribute-instance: What percent of all institutions are designated as either “men only” or “women only”? NOTE: Assume no college can be both in this sample of data.
  • SELECT (SELECT COUNT(*) AS men only institutions FROM scorecard WHERE menonly = 1 OR womenonly = 1)
  • /
  • (SELECT COUNT(*) FROM scorecard) ;
Q7. [0.50 points]Show the institution name, state, average faculty salary and average family income for the institutions with highest or second highest median ACT score. HINT: If multiple schools have the same highest or second highest median ACT score, your query should report more than 2 rows of output.
  • SELECT instnm, stabbr AS state, avgfacsal, faminc, actmmid FROM scorecard WHERE actmmid >= ( SELECT MAX(actmmid) FROM scorecard WHERE actmmid < ( SELECT MAX(actmmid) FROM scorecard)) ORDER BY faminc DESC;
Q8.[0.50 points]Show the following summary statistics for institutions located in Ohio. Your calculated fields should be given intuitive names and rounded to the hundredths place
  1. Total number of undergraduate students
  2. Average faculty salary (this does NOT need to be a weighted average)
  3. Average pell grant percentage (this does NOT need to be a weighted average)
  4. Total number of historically black colleges and universities
  5. Average age of entry (this does NOT need to be a weighted average)

HINT: This query is asking for only a single instance summarizing the entire state of Ohio.

  • SELECT SUM(ugds), avgfacsal, avg(pctpell), COUNT(hbcu), AVG(age_entry) FROM scorecard WHERE stabbr = “OH”;
Q9. [0.50 points] Display a list of institutions (with all the institution’s information) that meet one or more of the following criteria:
  1. The instution has the highest average faculty salary of all institutions
  2. The instution has the highest average SAT score of all instutions
  3. The instution has the highest average undergraduate cost of all institutions
  4. Order the results by institution name in alphabetical order.
  • SELECT instnm FROM scorecard WHERE avgfacsal = ( SELECT MAX(avgfacsal) FROM scorecard) OR sat_avg = ( SELECT MAX(sat_avg) FROM scorecard) OR costt4_a = ( SELECT MAX(costt4_a) FROM scorecard) ;
Q10. [0.75 points] Report a list of state abbreviations in alphabetical order (A-Z) that do NOT have any institutions with the word ‘seminary’ in their name.
  • SELECT DISTINCT STABBR, instnm FROM scorecard WHERE stabbr NOT IN ( SELECT DISTINCT STABBR FROM scorecard WHERE instnm LIKE “%Seminary%”) ORDER BY STABBR;

Assignment 3 Queries

Q1. [0.10 points] Show a list of the following summary statistics for every state in the data:
  1. The total number of institutions
  2. The average family income
  3. The average cost of attendance
  4. The total number of undergrads
  5. Order the result by state abbreviation in alphabetical order and rename any calculated fields to use intuitive names.
  6. HINT: These summary statistics do not need to be weighted calculations. Your result should be a table with 59 rows and 5 columns.
  • SELECT count(instnm) AS total number of institutions, avg(faminc) AS avg family income, avg(costt4_a) AS avg cost of undergrads, sum(ugds) AS total number of undergrads FROM scorecard ORDER BY stabbr DESC;
Q2. [0.15 points] Write a query to display the average faculty salary for each of the following states:
  1. Michigan, Illinois, Kentucky, Indiana, and Ohio.
  2. This result should show each state’s name and the corresponding average faculty salary of the schools in that state.
  3. Order the result by average faculty salary from highest to lowest.
  4. HINT: The output of this query should be at the state level
  • SELECT stabbr, avgfacsal FROM scorecard GROUP BY stabbr HAVING stabbr IN (‘MI’,‘IL’, ‘KY’, ‘IN’, ‘OH’) ORDER BY avgfacsal DESC ;
Q3. [0.25 points] Show a list of institutions with the word “college” or “university” in their name that have an average family income greater than that of Xavier University. Also include Xavier University in this list of output for comparison. Sort this result by average family income from highest to lowest.
  • SELECT instnm, faminc FROM scorecard WHERE faminc >= ( SELECT faminc FROM scorecard WHERE instnm = “Xavier University” ORDER BY faminc DESC);
Q5. [0.25 points] For every city in the state of Ohio, write a query that displays the following:
  1. The city name,
  2. The average faculty salary for all institutions in the city
  3. The average cost of attendance for all institutions in the city
  4. The average admission rate for all institutions in the city.
  5. Only show results for cities that do not have any missing (NULL) data.
  6. Round any calculations to the second decimal place (two significant digits).
  7. Order the results alphabetically by city name.
  8. Your calculations do NOT need to be weighted averages.
  9. All reported columns must use names that are easy to understand.
  10. HINT: The output of this query should be at the city level.
  • WITH cities AS (

  • SELECT city, avgfacsal AS avg_fac_sal, AVG(costt4_a) AS avg_cost_per_a, AVG(adm_rate) AS avg_adm_rate FROM scorecard GROUP BY STABBR)

  • SELECT * FROM cities WHERE avg_adm_rate IS NOT NULL AND avg_fac_sal IS NOT NULL;

Q6. [0.50 points] A “percentile ranking” is a statistic that shows the percentage of scores that are equal to or lower than a given score. For example, Xavier’s average SAT score reported in the college scorecard is 1190. If we compare this result against the index of values provided by the College Board (the organization that administers the SAT,) we see this score represents the 80th percentile: https://research.collegeboard.org/reports/sat-suite/understanding-scores/sat
  1. We would interpret this result as: “80% of students who take the SAT have an SAT score equal to or lower than the average Xavier University student’s SAT score.”

  2. Write a query to assess whether the college scorecard data supports applying this percentile ranking at the institution level by answering this question:

  3. “Is Xavier University’s average SAT score better than 80% of all institutions?” Briefly interpret your result and comment on whether your findings support the 80 percentile claim.

  • SELECT (SELECT COUNT(sat_avg) FROM scorecard WHERE sat_avg IS NOT NULL ORDER BY sat_avg DESC)*.2;

  • WITH sat_80 AS (

  • SELECT sat_avg FROM scorecard WHERE sat_avg IS NOT NULL ORDER BY sat_avg DESC LIMIT 264)

  • SELECT min(sat_avg) FROM sat_80;

Q7. [0.50 points] Write a query that displays the answer to the following question in a single attribute-instance: How many states have a total number of undergraduates greater than the total number of undergraduates in Ohio?
  • SELECT count(DISTINCT stabbr) FROM scorecard WHERE ugds > (SELECT SUM(ugds) FROM scorecard WHERE stabbr = “OH”);
Q8. [0.50 points] Write a query that reports the state (or states) with the most institutions. For this state (or states), report only the state abbreviation and the and the total number of undergrads attending school in the state(s).
  • WITH states as (

  • SELECT stabbr, count(instnm), sum(ugds) FROM scorecard GROUP BY stabbr)

  • SELECT * FROM states WHERE count(instnm) = ( SELECT max(count(instnm)) FROM states);

Q9. [0.50 points] The average faculty salary column records the average monthly salary of the faculty working at the institution. This value is reported in monthly terms because most faculty contracts are 8-10 months rather than 12 months.

The family income column records the average annual family income of students attending the institution.

Xavier is one of the few institutions in the country where students attending the university come from families that are more wealthy than the professors teaching them. Write a query to identify how common this phenomenon is in the college scorecard data and interpret your result.

  • WITH institutions AS (

  • SELECT instnm AS institution_name, avgfacsal AS avg_fac_sal, faminc AS fam_inc FROM scorecard)

  • SELECT * FROM institutions WHERE fam_inc >= (SELECT avg_fac_sal FROM institutions WHERE institution_name = “Xavier University”);