Tuesday, December 15, 2009

Fun with Recursive SQL (Part 3)


This blog post, Part 1, and Part 2 are derived from an article that I wrote as an article for Teradata Magazine and for Teradata Partners conference about fun uses for recursive SQL.

Finally, to the really interesting example of how to use recursive SQL productively.  In this example, we're dealing with segments of time that are defined with a start and end time.  Imagine staggered work shifts or a work schedule that specifies when a worker is scheduled to take breaks.

Imagine a series of time segments, like those shown in the picture below.  Each segment has a start and end time as well as a priority level.




The desired output from this data is a new, single, continuous time line that denotes only the highest priority segment for any particular clock time.



Here's a simple summary of the recursive algorithm that we're going to apply to solve this challenge:
  • For each segment at a given priority level
  • Join to the first segment that appears from a higher priority level that overlaps in any way
  • Calculate the results of the overlap.  This may result in:
    • A head segment
    • A tail segment
    • Or two new segments

We'll start off with a table that has the following definition and sample data:
create table tl (
  seg_num integer not null,
  p integer not null,
  st_tm integer not null,
  end_tm integer not null
);

insert into tl values (1,1,100,400);
insert into tl values (2,1,445,600);
insert into tl values (3,3,130,200);
insert into tl values (4,3,315,415);
insert into tl values (5,2,345,500);


You can find the first layer of overlaps that will cause splicing of any given segment using a self-join that matches every time segment with any time segments that:
  • have a higher priority; and
  • either:
    • have a start time between the current segment's start/end time; or
    • have an end time between the current segment's start/end times.
You can see the three conditions in the WHERE clause below:

select a.seg_num, 
       row_number() over(order by b.st_tm) rnk,
       b.seg_num, b.p, b.st_tm, b.end_tm
from   tl a,
       tl b
where  a.p < b.p
  and  ((a.st_tm  < b.st_tm  and a.end_tm > b.st_tm ) or
        (a.st_tm  > b.st_tm  and a.st_tm  < b.end_tm))

The trick is that we need to do the same kind of operation on the output of this first statement to determine if there are other, higher priority segments that need to splice these further. So, we recursively call the same kind of SQL on the output. In the first attempt at the recursive version, we also add some CASE logic to output exactly what the newly created slices are.

with recursive tl_rec (
  seq_num, seg_num, p, st_tm, end_tm, b_seg_num
) as (
  select 1, seg_num, p, st_tm, end_tm, -1
  from   tl 
  union all
  select a.seq_num + 1, a.seg_num, a.p,
         case when a.st_tm  < b.st_tm and 
                   a.end_tm < b.end_tm then a.st_tm  
              when a.st_tm  < b.st_tm and 
                   a.end_tm > b.end_tm then a.st_tm
              when a.st_tm  > b.st_tm and 
                   a.end_tm > b.end_tm then b.end_tm end,
         case when a.st_tm  < b.st_tm and 
                   a.end_tm < b.end_tm then b.st_tm  
              when a.st_tm  < b.st_tm and 
                   a.end_tm > b.end_tm then b.st_tm 
              when a.st_tm  > b.st_tm and 
                   a.end_tm > b.end_tm then a.end_tm end,
         b.seg_num
  from   tl_rec a,
         tl b
  where  a.p < b.p
    and  a.b_seg_num <> b.seg_num
    and  ((a.st_tm  < b.st_tm  and a.end_tm > b.st_tm ) or
          (a.st_tm  > b.st_tm  and a.st_tm  < b.end_tm))
)
select * from (
  select 
    row_number() over(partition by seg_num 
                      order by seq_num desc) rnk,
    seg_num, p, st_tm, end_tm
  from tl_rec ) z
where rnk = 1
;

And you get the following output records:



But wait!  There's something missing.  There should be a left-over LOW segment there between the two HIGH's.  We lost something along the way... Hmm...  The output of the join we've defined can only be one record.  So, when a higher priority segment is contained within the start/end of a lower priority segment, we're only returning the leading piece of the LOW segment.  There's no way for that join to output two records with different logic, though.  Luckily, with recursive SQL, we can have more than one recursive statement.  So, we just need to add another UNION ALL to output the trailing LOW segment.

with recursive tl_rec (
  seq_num, seg_num, seg_split_num, p, 
  st_tm, end_tm, b_seg_num
) as (

  ... repeat from previous SQL ...
  
  union all
  select a.seq_num + 1, a.seg_num, a.seg_split_num+1, 
         a.p, b.end_tm, a.end_tm, b.seg_num
  from   tl_rec a,
         tl b
  where  a.p < b.p
    and  a.b_seg_num <> b.seg_num
    and  (a.st_tm  < b.st_tm  and a.end_tm > b.end_tm)
)
select * from
(
  select 
    row_number() over(partition by seg_num, seg_split_num 
                      order by seq_num desc) rnk,
    seg_num, seg_split_num p, st_tm, end_tm
  from tl_rec
) z
where rnk = 1
;

Phew! With that combined SQL, now we finally get the expected output!




Personally, I recommend studying through that a few times so that you've really internalized the strategy. It certainly took me a while to actually get it to work correctly. Then look around at your existing projects or challenges you've faced in the past and see where else you can apply this same kind of recursive technique.

Good luck!

2 comments: