SELECT (julianday(dischtime) - julianday(admittime)) as stay_length
FROM admissions
ORDER BY stay_length
LIMIT 5| stay_length |
|---|
| 0.0381944 |
| 0.1444444 |
| 0.1458333 |
| 0.5527778 |
| 0.6368056 |
The MIMIC-III Database has some very interesting time-sensitive data. For that reason, it is worth learning to use dates.
Julianday turns a date into the number of days since noon on Monday, January 1, 4713 BC. The explanation why is sort of complicated…but being able to do it sure makes things easier! This query calculates the 5 shortest dates stay lengths in the database. Why was their stay so short? Complete the practice to find out!
SELECT (julianday(dischtime) - julianday(admittime)) as stay_length
FROM admissions
ORDER BY stay_length
LIMIT 5| stay_length |
|---|
| 0.0381944 |
| 0.1444444 |
| 0.1458333 |
| 0.5527778 |
| 0.6368056 |
You can also parse out pieces of the date with the strftime() function. strfttime() has little codes (eg. %d) that it uses to identify parts of a “well-formed” date. One nice thing about dates in most databases is that they are usually quite “well-formed”, meaning they have a YYYY-MM-DD format with a four digit year, 2 digit month and 2 digit day. This makes them easy to work with once you declare them to be a date. This query takes admittime and tells SQL that it is a date variable, then breaks it up into its component time units.
SELECT date(admittime) as admit,
strftime("%d",admittime) as day,
strftime("%m",admittime) as month,
strftime("%Y",admittime) as year,
strftime("%H",admittime) as hour
FROM admissions
LIMIT 5| admit | day | month | year | hour |
|---|---|---|---|---|
| 2164-10-23 | 23 | 10 | 2164 | 21 |
| 2126-08-14 | 14 | 08 | 2126 | 22 |
| 2125-10-04 | 04 | 10 | 2125 | 23 |
| 2149-05-26 | 26 | 05 | 2149 | 17 |
| 2163-05-14 | 14 | 05 | 2163 | 20 |
Parsing out pieces of dates, when combined with grouping, allows you to Look at time trends. Here I’m looking at how long married vs single people stay in the hospital. Recognize though that you are likely going to need CAST here again to change the text to a number.
SELECT CAST(strftime("%m",admittime) as integer) as month, marital_status,
avg((julianday(dischtime) - julianday(admittime))) as stay_length
FROM admissions
WHERE marital_status IN ("MARRIED", "SINGLE")
GROUP BY month, marital_status| month | marital_status | stay_length |
|---|---|---|
| 1 | MARRIED | 12.149306 |
| 1 | SINGLE | 2.591319 |
| 2 | MARRIED | 4.577778 |
| 2 | SINGLE | 6.758333 |
| 3 | MARRIED | 6.210880 |
| 4 | MARRIED | 5.988194 |
| 5 | MARRIED | 5.657361 |
| 5 | SINGLE | 5.319583 |
| 6 | MARRIED | 21.888889 |
| 6 | SINGLE | 5.703472 |
Time trends look great on a geom_line. Here is a line showing average stay length by month for married vs single patients. Notice how I use a “color” aesthetic to get different lines for each marital_status.
Make sure you understand and can accurately interpret the variables used on a geom_line(). Since this data isn’t filtered to any specific year, this is tracking the stay_length across months for ALL years. So like, when they enter the hospital in october in general, how long do they stay…for example.
ggplot(data = myquery,
aes(x = month, y = stay_length, color = marital_status))+
geom_line()