Using OR condition + function in join appears to confuse SQL Server's query optimizer -


i struggling debug performance on particular query. query this:

select count(*)   dbo.user d inner join dbo.distinct_first_name dfn on (         [dbo].jw(dfn.first_name, 'john') > 0.8         ,         (d.first_name = dfn.first_name          or d.nick_name = dfn.first_name          or d.middle_name = dfn.first_name)         ) 

the query runs jaro winkler filter on distinct first name table (containing approx 15k rows) , inner joins against user table produce result set. defined, takes around 1 minute run approx 500k rows in user table.

here's know:

1) jaro winkler filter instant (0.1s itself)

2) if change user clause include 1 of columns (i.e. remove ors) takes 0.4s

3) if change 3 queries, , run them back, takes approx 2s

4) if change jaro winkler filter 0.99 (so there's 1 result) makes no substantive difference in query execution time

5) if replace jaro winkler filter equality operation (dfn.first_name = 'john') total query time reduced 4s

(all timings on slow virt; real life performance better.)

so, reason, combination of function , ors confusing query optimizer. execution plan not informative; says 90% of query spent on:

<relop nodeid="63" physicalop="clustered index seek" logicalop="clustered index seek" estimaterows="1.69029" estimateio="0.003125" estimatecpu="0.000158859" avgrowsize="17" estimatedtotalsubtreecost="71.4311" tablecardinality="15958" parallel="0" estimaterebinds="448881" estimaterewinds="0.504024" estimatedexecutionmode="row">                               <outputlist>                                 <columnreference database="[mydb]" schema="[dbo]" table="[distinct_first_name]" alias="[dfn]" column="first_name" />                               </outputlist>                               <runtimeinformation>                                 <runtimecountersperthread thread="0" actualrows="857936" actualendofscans="859454" actualexecutions="859454" />                               </runtimeinformation>                               <indexscan ordered="1" scandirection="forward" forcedindex="0" forceseek="0" forcescan="0" noexpandhint="0" storage="rowstore"> 

splitting query option, since in sproc, , can redesign schema little, i'm stumped what's bogging down. ideas?

first of all, both first_name_alphaonly , nick_name_alphaonly non-persisted computed columns, cardinality est off, , multiplied.

then, there 857.936 individual clustered index seeks on distinct_first_name table, , after filter including jw function applied.

creating indexes on computed columns help. filtering on distinct_first_name prior join (into #temp table) also. , it's advice on turning ors union alls.

optimizer, afaik, never rearange ors unions itself. believe it's called playing safe.


Comments

Popular posts from this blog

How to access named pipes using JavaScript in Firefox add-on? -

multithreading - OPAL (Open Phone Abstraction Library) Transport not terminated when reattaching thread? -

node.js - req param returns an empty array -