MBA 673 Practice 4

Author

Dr. Mitchell Church

Working with Dates and Times

The MIMIC-III Database has some very interesting time-sensitive data. For that reason, it is worth learning to use dates.

Math with dates

Julianday turns a date into the number of days since noon on Monday, January 1, 4713 BC. The explanation why is sort of complicated…but being able to do it sure makes things easier! This query calculates the 5 shortest dates stay lengths in the database. Why was their stay so short? Complete the practice to find out!

SELECT (julianday(dischtime) - julianday(admittime)) as stay_length
FROM admissions
ORDER BY stay_length
LIMIT 5
5 records
stay_length
0.0381944
0.1444444
0.1458333
0.5527778
0.6368056

Dates and strftime()

You can also parse out pieces of the date with the strftime() function. strfttime() has little codes (eg. %d) that it uses to identify parts of a “well-formed” date. One nice thing about dates in most databases is that they are usually quite “well-formed”, meaning they have a YYYY-MM-DD format with a four digit year, 2 digit month and 2 digit day. This makes them easy to work with once you declare them to be a date. This query takes admittime and tells SQL that it is a date variable, then breaks it up into its component time units.

SELECT date(admittime) as admit, 
strftime("%d",admittime) as day,
strftime("%m",admittime) as month,
strftime("%Y",admittime) as year,
strftime("%H",admittime) as hour
FROM admissions
LIMIT 5
5 records
admit day month year hour
2164-10-23 23 10 2164 21
2126-08-14 14 08 2126 22
2125-10-04 04 10 2125 23
2149-05-26 26 05 2149 17
2163-05-14 14 05 2163 20

Parsing out pieces of dates, when combined with grouping, allows you to Look at time trends. Here I’m looking at how long married vs single people stay in the hospital. Recognize though that you are likely going to need CAST here again to change the text to a number.

SELECT CAST(strftime("%m",admittime) as integer) as month, marital_status,
avg((julianday(dischtime) - julianday(admittime))) as stay_length
FROM admissions
WHERE marital_status IN ("MARRIED", "SINGLE")
GROUP BY month, marital_status
Displaying records 1 - 10
month marital_status stay_length
1 MARRIED 12.149306
1 SINGLE 2.591319
2 MARRIED 4.577778
2 SINGLE 6.758333
3 MARRIED 6.210880
4 MARRIED 5.988194
5 MARRIED 5.657361
5 SINGLE 5.319583
6 MARRIED 21.888889
6 SINGLE 5.703472

A new geom - geom_line()

Time trends look great on a geom_line. Here is a line showing average stay length by month for married vs single patients. Notice how I use a “color” aesthetic to get different lines for each marital_status.

Warning

Make sure you understand and can accurately interpret the variables used on a geom_line(). Since this data isn’t filtered to any specific year, this is tracking the stay_length across months for ALL years. So like, when they enter the hospital in october in general, how long do they stay…for example.

ggplot(data = myquery,
       aes(x = month, y = stay_length, color = marital_status))+
  geom_line()