Thursday 18 November 2010

a neat(?) postgresql solution using row_number() and self joining

For the scheduling aspect of openmolar, a common task is to find where "freeslots" occur. In the original openmolar, I performed this logic using python code. For openmolar2 the database itself will perform this search.
Here's how.

let's make a trivial diary table
create table diary (
 ix serial primary key, 
 startTime time, 
 endTime time, 
 activity varchar(80));
leading to this table of data.
select * from diary;
ixstarttimeendtimeactivity
109:00:0009:30:00breakfast
209:30:0010:00:00jogging
313:00:0014:00:00lunch
410:00:0011:30:00meeting with joe
514:00:0018:00:00golfing at Royal Dornoch

so the problem is this... how can I query that table to discover my free time on that day?
(there is 90 minutes before lunch)
NOTE - the activities are NOT in order!

the answer (and I found this in the excellent "sql cookbook") involves getting a view of that table in the correct order, and then performing a join on adjacent rows.

so first, get the ordered view.
=> select * from diary order by starttime;
ix starttime endtime activity
1 09:00:00 09:30:00 breakfast
2 09:30:00 10:00:00 jogging
4 13:00:00 14:00:00 lunch
3 10:00:00 11:30:00 meeting with joe
5 14:00:00 18:00:00 golfing at Royal Dornoch


But this doesn't help... because there is no way to definatively refer to the "next" activity, to do this we need to replace ix with a generated rownumber. We will use the row_number() window function of postgres 8.4.

=> select * from (select row_number() over (order by starttime) as rownumber,
starttime, endtime, activity from diary) as ordered_diary;

rownumber starttime endtime activity
1 09:00:00 09:30:00 breakfast
2 09:30:00 10:00:00 jogging
3 10:00:00 11:30:00 meeting with joe
4 13:00:00 14:00:00 lunch
5 14:00:00 18:00:00 golfing at Royal Dornoch

no we can self join that result set on itself..
(adding columns from the next row to the current row)

=> select * from
(select row_number() over (order by starttime) as row_number, * from diary) as diary1,
(select row_number() over (order by starttime) as row_number, * from diary) as diary2
where diary2.row_number = diary1.row_number+1

row_number ix starttime endtime activity row_number ix starttime endtime activity
1 1 09:00:00 09:30:00 breakfast 2 2 09:30:00 10:00:00 jogging
2 2 09:30:00 10:00:00 jogging 3 4 10:00:00 11:30:00 meeting with joe
3 4 10:00:00 11:30:00 meeting with joe 4 3 13:00:00 14:00:00 lunch
4 3 13:00:00 14:00:00 lunch 5 5 14:00:00 18:00:00 golfing at Royal Dornoch


or more succinctly...
=> select diary1.activity as first_activity, diary1.endtime as finishes,
diary2.activity as next_activity, diary2.starttime as starts from
(select row_number() over (order by starttime) as row_number, * from diary) as diary1,
(select row_number() over (order by starttime) as row_number, * from diary) as diary2
where diary2.row_number = diary1.row_number+1

first_activity finishes next_activity starts
breakfast 09:30:00 jogging 09:30:00
jogging 10:00:00 meeting with joe 10:00:00
meeting with joe 11:30:00 lunch 13:00:00
lunch 14:00:00 golfing at Royal Dornoch 14:00:00

so we simply now have to add a check to see if those times differ

so here's the final query!

=> select diary1.endtime as freetime_start, diary2.starttime as freetime_end from
(select row_number() over (order by starttime) as row_number, endtime from diary) as diary1,
(select row_number() over (order by starttime) as row_number, starttime from diary) as diary2
where diary2.row_number = diary1.row_number+1
and diary1.endtime < diary2.starttime




freetime_start freetime_end
11:30:00 13:00:00

No comments: