performance - top-N query doing too much work in spite of STOPKEY optimization -


this going long, here's quick summary draw in: top-n query count stopkey , order stopkey in plan still slow no reason.

now, details. starts slow function. in real life involves string manipulations regexps. demonstration purposes, here's intentionally stupid recursive fibonacci algorithm. find pretty fast inputs 25, slow around 30, , ridiculous @ 35.

-- repeat: please no advice on how fibonacci correctly. -- slow on purpose! create or replace function tmp_fib (   n integer )   return integer begin   if n = 0 or n = 1     return 1;   end if;   return tmp_fib(n-2) + tmp_fib(n-1); end; / 

now input: list of names , numbers.

create table tmp_table (   name varchar2(20) unique not null,   num number(2,0) ); insert tmp_table (name,num)   select 'alpha',    10 dual union   select 'bravo',    11 dual union   select 'charlie',  33 dual; 

here's example of slow query: use slow fibonacci function select rows num generates fibonacci number doubled digit.

select p.name, p.num tmp_table p regexp_like(tmp_fib(p.num), '(.)\1') order p.name; 

this true 11 , 33, bravo , charlie in output. takes 5 seconds run, of slow calculation of tmp_fib(33). want faster version of slow query converting top-n query. n=1, looks this:

select * (   select p.name, p.num   tmp_table p   regexp_like(tmp_fib(p.num), '(.)\1')   order p.name ) rownum <= 1; 

and returns top result, bravo. still takes 5 seconds run! explanation it's still calculating tmp_fib(33), though result of calculation irrelevant result. should have been able decide bravo going output, there's no need test condition rest of table.

i've thought maybe optimizer needs told tmp_fib expensive. tried tell that, this:

associate statistics functions tmp_fib default cost (999999999,0,0); 

that alters of cost numbers in plan, doesn't make query run faster.

output of select * v$version in case version-dependent:

oracle database 11g enterprise edition release 11.2.0.2.0 - 64bit production pl/sql release 11.2.0.2.0 - production core    11.2.0.2.0      production tns 64-bit windows: version 11.2.0.2.0 - production nlsrtl version 11.2.0.2.0 - production 

and here's autotrace of top-1 query. appears claiming query took 1 second, that's not true. ran 5 seconds.

name                        num -------------------- ---------- bravo                        11   execution plan ---------------------------------------------------------- plan hash value: 548796432  ------------------------------------------------------------------------------------- | id  | operation               | name      | rows  | bytes | cost (%cpu)| time     | ------------------------------------------------------------------------------------- |   0 | select statement        |           |     1 |    55 |     4  (25)| 00:00:01 | |*  1 |  count stopkey          |           |       |       |            |          | |   2 |   view                  |           |     1 |    55 |     4  (25)| 00:00:01 | |*  3 |    sort order stopkey|           |     1 |    55 |     4  (25)| 00:00:01 | |*  4 |     table access full   | tmp_table |     1 |    55 |     3   (0)| 00:00:01 | -------------------------------------------------------------------------------------  predicate information (identified operation id): ---------------------------------------------------     1 - filter(rownum<=1)    3 - filter(rownum<=1)    4 - filter( regexp_like (to_char("tmp_fib"("p"."num")),'(.)\1'))  note -----    - dynamic sampling used statement (level=2)   statistics ----------------------------------------------------------          27  recursive calls           0  db block gets          25  consistent gets           0  physical reads           0  redo size         593  bytes sent via sql*net client         524  bytes received via sql*net client           2  sql*net roundtrips to/from client           1  sorts (memory)           0  sorts (disk)           1  rows processed 

update: mentioned in comments, index hint helps query lot. enough accepted correct answer, though doesn't translate real-world scenario. , in ironic twist, oracle seems have learned experience, , chooses index plan default; have tell no_index reproduce original slow behavior.

in real-world scenario i've applied more complex solution, rewriting query pl/sql function. here's how technique looks, applied fib problem:

create or replace package tmp_package   type t_namenum table of tmp_table%rowtype;   function get_interesting_names (howmany integer) return t_namenum pipelined; end; /  create or replace package body tmp_package   function get_interesting_names (howmany integer) return t_namenum pipelined     cursor c select name, num tmp_table order name;     rec c%rowtype;     outcount integer;   begin     open c;     outcount := 0;     while outcount < howmany loop       fetch c rec;       exit when c%notfound;       if regexp_like(tmp_fib(rec.num), '(.)\1')         pipe row(rec);         outcount := outcount + 1;       end if;     end loop;   end; end; /  select * table(tmp_package.get_interesting_names(1)); 

thanks responders read question , ran tests , helped me understand execution plans, , dispose of question suggest.

follow-up comment big. running under 11.2.0.3 (oel), query:

select * (   select p.name, p.num   tmp_table p   regexp_like(tmp_fib(p.num), '(.)\1')   order p.name ) rownum <= 1;  name                        num -------------------- ---------- bravo                        11   elapsed: 00:00:00.094 plan hash value: 1058933870  ---------------------------------------------------------------------------------- | id  | operation            | name      | rows  | bytes | cost (%cpu)| time     | ---------------------------------------------------------------------------------- |   0 | select statement     |           |     1 |    25 |     4  (25)| 00:00:01 | |*  1 |  count stopkey       |           |       |       |            |          | |*  2 |   view               |           |     3 |    75 |     4  (25)| 00:00:01 | |   3 |    sort order     |           |     3 |    75 |     4  (25)| 00:00:01 | |   4 |     table access full| tmp_table |     3 |    75 |     3   (0)| 00:00:01 | ----------------------------------------------------------------------------------  predicate information (identified operation id): ---------------------------------------------------     1 - filter(rownum<=1)    2 - filter( regexp_like (to_char("tmp_fib"("num")),'(.)\1'))  note -----    - dynamic sampling used statement (level=2) 

note variation in sort order by saw, , corresponding rows values. moving order-by sub-select looks more yours:

select * (   select * (     select p.name, p.num     tmp_table p     order p.name   )   regexp_like(tmp_fib(num), '(.)\1') ) rownum <= 1;  name                        num -------------------- ---------- bravo                        11   elapsed: 00:00:07.894 plan hash value: 548796432  ------------------------------------------------------------------------------------- | id  | operation               | name      | rows  | bytes | cost (%cpu)| time     | ------------------------------------------------------------------------------------- |   0 | select statement        |           |     1 |    25 |   171  (99)| 00:00:03 | |*  1 |  count stopkey          |           |       |       |            |          | |   2 |   view                  |           |     1 |    25 |   171  (99)| 00:00:03 | |*  3 |    sort order stopkey|           |     1 |    25 |   171  (99)| 00:00:03 | |*  4 |     table access full   | tmp_table |     1 |    25 |   170  (99)| 00:00:03 | -------------------------------------------------------------------------------------  predicate information (identified operation id): ---------------------------------------------------     1 - filter(rownum<=1)    3 - filter(rownum<=1)    4 - filter( regexp_like (to_char("tmp_fib"("p"."num")),'(.)\1'))  note -----    - dynamic sampling used statement (level=2) 

don't know how helpful or practical in real scenario, in case (in environment, anyway), adding index across fetched columns - full index scan instead of full table scan - seems change behaviour:

create index tmp_index on tmp_table(name, num);  index tmp_index created.  select * (   select p.name, p.num   tmp_table p   regexp_like(tmp_fib(p.num), '(.)\1')   order p.name ) rownum <= 1;  name                        num -------------------- ---------- bravo                        11   elapsed: 00:00:00.093 plan hash value: 1841475998  ------------------------------------------------------------------------------- | id  | operation         | name      | rows  | bytes | cost (%cpu)| time     | ------------------------------------------------------------------------------- |   0 | select statement  |           |     1 |    25 |     1   (0)| 00:00:01 | |*  1 |  count stopkey    |           |       |       |            |          | |*  2 |   view            |           |     3 |    75 |     1   (0)| 00:00:01 | |   3 |    index full scan| tmp_index |     3 |    75 |     1   (0)| 00:00:01 | -------------------------------------------------------------------------------  predicate information (identified operation id): ---------------------------------------------------     1 - filter(rownum<=1)    2 - filter( regexp_like (to_char("tmp_fib"("num")),'(.)\1'))  note -----    - dynamic sampling used statement (level=2)  select * (   select * (     select p.name, p.num     tmp_table p     order p.name   )   regexp_like(tmp_fib(num), '(.)\1') ) rownum <= 1;  name                        num -------------------- ---------- bravo                        11   elapsed: 00:00:00.093 plan hash value: 1841475998  ------------------------------------------------------------------------------- | id  | operation         | name      | rows  | bytes | cost (%cpu)| time     | ------------------------------------------------------------------------------- |   0 | select statement  |           |     1 |    25 |     1   (0)| 00:00:01 | |*  1 |  count stopkey    |           |       |       |            |          | |   2 |   view            |           |     1 |    25 |     1   (0)| 00:00:01 | |*  3 |    index full scan| tmp_index |     1 |    25 |     1   (0)| 00:00:01 | -------------------------------------------------------------------------------  predicate information (identified operation id): ---------------------------------------------------     1 - filter(rownum<=1)    3 - filter( regexp_like (to_char("tmp_fib"("p"."num")),'(.)\1'))  note -----    - dynamic sampling used statement (level=2) 

incidentally, aftr i've run threw several times of rownum variants start getting ora-01000: maximum open cursors exceeded errors. i'm dropping objects @ end of each run staying connected. think suggests bug somewhere, though not related you're seeing since happens index scan.


Comments

Popular posts from this blog

python - How to create a legend for 3D bar in matplotlib? -

java - Multi-Label Document Classification -

php - Dynamic url re-writing using htaccess -