SQL Server Basics: Avoiding IN and NOT IN

Posted by Adrian Fri, 26 Jun 2009 20:59:00 GMT

IN and NOT IN are often the cause of otherwise inexplicably slow procedures in SQL Server. This article briefly explains why they should be avoided and how to better implement the same functionality.

IN and NOT IN are a feature of Microsoft SQL Server that should be avoided at all cost. The use of IN can cause a massive bottleneck in your application.

Just so you know what I'm talking about, here's an example of a select statement using IN:

SELECT *
    FROM TableOne
    WHERE SomeId IN (SELECT SomeReferece FROM TableTwo)

An example of NOT IN:

SELECT *
    FROM TableOne
    WHERE SomeId NOT IN (SELECT SomeReferece FROM TableTwo)

Apart from being messy, the problem is this: SQL Server does not know ahead of time what values are going to be returned by the IN query. This means that it has to run the same query over and over again for each row in TableOne (in our example). That causes a massive overhead.

NOT IN is even worse. With IN, as soon as the sub-query returns a matching value it knows that it can stop searching and move on to the next row. NOT IN requires a check of every single row of the sub-query, because without checking every single row we can’t know if our value is not in that list.

Imaging that TableOne returns 200 rows, and that TableTwo has 500 records. For both examples, SQL Server will run 201 queries. For the first example, we will need to scan between 220 and 100,000 (approx) rows to complete the query, based on the data. For the second example we will always need to scan 100,000 rows. That’s a lot.

Indexes can cover-up the problem somewhat, but you can’t get away from the problem of number of queries – especially if this stops SQL Server from creating a ‘plan’. There are better, more elegant solutions. The two most common are shown here.

To replace IN, we use an INNER JOIN. They are effectively the same thing. This code does exactly the same as example 1:

SELECT t1.*
    FROM TableOne t1
    INNER JOIN TableTwo t2
        ON t1.SomeId = t2.SomeReference

We will only ever get rows for t1 where there is a corresponding entry in t2.

NOT IN is effectly saying "where t2 doesn’t have the corresponding row". That requires a left outer join. I shall explain LEFT OUTER JOIN more fully in another article, but what we need to know is that it will return data if it can, other wise it will return an empty result. We can check for the empty result by seeing if the joined column IS NULL. It’s much easier than it sounds.

The following code achieves exactly the same results as example 2 above:

SELECT t1.*
    FROM TableOne t1
    LEFT OUTER JOIN TableTwo t2
        ON t1.SomeId = t2.SomeReference
    WHERE t2.SomeReference IS NULL

Before I finish, I shall say that there is one place where IN might just be OK (although even that is debatable). That one case is when you are supplying a hard-coded list of numbers. E.g.:

WHERE TypeId IN (1, 2, 3, 5, 9)

That is the only time.


About

We are a small British company that produces business-oriented software and solutions. These articles are a product of our daily work - information that we think might be useful to share. We hope you find them useful.

Our Software

These are some of our products. Several are open source, some are web-based and others are proprietary:

Categories

Archives

Syndicate

ml> ._trackPageview(); } catch(err) {} ml> l> pageTracker._trackPageview(); } catch(err) {} ml> ._trackPageview(); } catch(err) {} ml> l>