Using OR condition + function in join appears to confuse SQL Server's query optimizer -


i struggling debug performance on particular query. query this:

select count(*)   dbo.user d inner join dbo.distinct_first_name dfn on (         [dbo].jw(dfn.first_name, 'john') > 0.8         ,         (d.first_name = dfn.first_name          or d.nick_name = dfn.first_name          or d.middle_name = dfn.first_name)         ) 

the query runs jaro winkler filter on distinct first name table (containing approx 15k rows) , inner joins against user table produce result set. defined, takes around 1 minute run approx 500k rows in user table.

here's know:

1) jaro winkler filter instant (0.1s itself)

2) if change user clause include 1 of columns (i.e. remove ors) takes 0.4s

3) if change 3 queries, , run them back, takes approx 2s

4) if change jaro winkler filter 0.99 (so there's 1 result) makes no substantive difference in query execution time

5) if replace jaro winkler filter equality operation (dfn.first_name = 'john') total query time reduced 4s

(all timings on slow virt; real life performance better.)

so, reason, combination of function , ors confusing query optimizer. execution plan not informative; says 90% of query spent on:

<relop nodeid="63" physicalop="clustered index seek" logicalop="clustered index seek" estimaterows="1.69029" estimateio="0.003125" estimatecpu="0.000158859" avgrowsize="17" estimatedtotalsubtreecost="71.4311" tablecardinality="15958" parallel="0" estimaterebinds="448881" estimaterewinds="0.504024" estimatedexecutionmode="row">                               <outputlist>                                 <columnreference database="[mydb]" schema="[dbo]" table="[distinct_first_name]" alias="[dfn]" column="first_name" />                               </outputlist>                               <runtimeinformation>                                 <runtimecountersperthread thread="0" actualrows="857936" actualendofscans="859454" actualexecutions="859454" />                               </runtimeinformation>                               <indexscan ordered="1" scandirection="forward" forcedindex="0" forceseek="0" forcescan="0" noexpandhint="0" storage="rowstore"> 

splitting query option, since in sproc, , can redesign schema little, i'm stumped what's bogging down. ideas?

first of all, both first_name_alphaonly , nick_name_alphaonly non-persisted computed columns, cardinality est off, , multiplied.

then, there 857.936 individual clustered index seeks on distinct_first_name table, , after filter including jw function applied.

creating indexes on computed columns help. filtering on distinct_first_name prior join (into #temp table) also. , it's advice on turning ors union alls.

optimizer, afaik, never rearange ors unions itself. believe it's called playing safe.


Comments

Popular posts from this blog

java - Intellij Synchronizing output directories .. -

git - Initial Commit: "fatal: could not create leading directories of ..." -