sql - Create index to speedup join on (1) ids match and (2) dates within a range -

what index should create table b optimize join on (1) ids matching , (2) date falls within range.

update: issue have hundreds of thousands of dates 25 portfolios... condition (2) work in cutting down size of resulting join...

more specifically, imagine have table a:

portfolio id    |   begin_date   |   end_date   ---------------------------------------------       1         |    20150101    |   20150130 etc...

and table b:

portfolio_id    |     date       |    daily_ret -----------------------------------------------       1              20150102            .00001       1              20150103           -.01023 etc...

the query not totally suck is:

select * tablea left join tableb b on (a.portfolio_id = b.portfolio_id) ,                       (a.begin_date < b.date) ,                       (b.date <= a.end_date)

i think want kinda index like: create index tableb_idx on tableb (portfolio_id, date)

but i'm total n00b sql indexes... imagine there's issue (a.begin_date < b.date) , (b.date <= a.end_date) part?

update more specific info:

   column   |       type       | modifiers  ------------+------------------+-----------  yyyymmdd   | integer          | not null  size_index | integer          | not null  beme_index | integer          | not null  ret        | double precision |  indexes:     "portfolio_25_size_beme_pkey" primary key, btree (yyyymmdd, size_index, beme_index)

exact query:

explain analyze select exp(sum(log(1 + f_sbm.ret))) - 1 size_beme_ret, count(*) t,         permno, period_start, period_end eb2.deleteme x join fama_french.portfolio_25_size_beme f_sbm on f_sbm.size_index = x.me_idx5 , f_sbm.beme_index = x.beme_idx5 , x.ps < f_sbm.yyyymmdd , f_sbm.yyyymmdd <= x.pe group permno, period_start, period_end

results (note limited eb2.deleteme 1000 rows because 180000 wouldn't finish...)

hashaggregate  (cost=83599.59..83602.59 rows=200 width=24) (actual time=11209.030..11209.455 rows=949 loops=1)   group key: x.permno, x.period_start, x.period_end   ->  merge join  (cost=78736.24..83566.19 rows=1909 width=24) (actual time=1183.677..11184.172 rows=19492 loops=1)         merge cond: ((f_sbm.size_index = x.me_idx5) , (f_sbm.beme_index = x.beme_idx5))         join filter: ((x.ps < f_sbm.yyyymmdd) , (f_sbm.yyyymmdd <= x.pe))         rows removed join filter: 22659494         ->  sort  (cost=78655.60..80136.72 rows=592450 width=20) (actual time=1182.834..1647.155 rows=545055 loops=1)               sort key: f_sbm.size_index, f_sbm.beme_index               sort method: external merge  disk: 17328kb               ->  seq scan on portfolio_25_size_beme f_sbm  (cost=0.00..9698.50 rows=592450 width=20) (actual time=0.014..155.770 rows=592450 loops=1)         ->  sort  (cost=80.64..83.54 rows=1160 width=32) (actual time=0.797..1618.708 rows=22678987 loops=1)               sort key: x.me_idx5, x.beme_idx5               sort method: quicksort  memory: 102kb               ->  seq scan on deleteme x  (cost=0.00..21.60 rows=1160 width=32) (actual time=0.020..0.357 rows=1000 loops=1) planning time: 0.574 ms execution time: 11228.745 ms

maybe hack, forcing join on date first pruning portfolio mismatches later runs 11x faster...

with stage1 as( select * eb2.deleteme x join fama_french.portfolio_25_size_beme f_sbm on x.ps < f_sbm.yyyymmdd , f_sbm.yyyymmdd <= x.pe ) select exp(sum(log(1+ret))) - 1 size_beme_ret, count(*) t,         permno, period_start, period_end stage1 f_sbm size_index = me_idx5 , beme_index = beme_idx5 group permno, period_start, period_end

always create indexes intended being compared through equality first, , through greater/lower (or between) last (and final one, since including further columns not assist query of nature).

adding following index speed-up query considerably:

create index on fama_french.portfolio_25_size_beme (size_index, beme_index, yyyymmdd);

Search This Blog

Business

sql - Create index to speedup join on (1) ids match and (2) dates within a range -

Comments

Post a Comment

Popular posts from this blog

scala - 'wrong top statement declaration' when using slick in IntelliJ -

c# - DevExpress.Wpf.Grid.InfiniteGridSizeException was unhandled -

PySide and Qt Properties: Connecting signals from Python to QML -