MySQL5 : MySQL Optimizations - Left Join with several "or" conditions

2013-07-09
Laurent Bientz

If you have already had to handle large queries with LEFT JOIN cascade that involve multiple conditions "or" in the joints, this article should interest you.

Imagine 3 tables:

  • news with FK nullable to user + FK nullable to account
  • user with FK nullable to account
  • account

Of course, you want to retrieve in a single query all news optionally with the user and/or the accout.

At first, you try one:

SELECT
    *
FROM
    news n
    LEFT JOIN user u ON n.user_id = u.user_id
    LEFT JOIN account a ON u.account_id = a.account_id

This will work but will give you all the news + user (if there's join) + account (if there's join from the user).

Except that our news table also has a nullable FK to account, the problem is that you do not recovered the news accounts that do not have users, because the joint with account is done from user.

Basically, your query returns:

news -> user (if there) -> account (if there if has a user which have a FK to account)

But not:

news -> account (if the news has a FK directly to account)

You can change the join but the problem is reversed, you will have the news' accounts but not the accounts of users of the news.

Second, you try to add a join condition "or" on account:

SELECT
    *
FROM
    news n
    LEFT JOIN user u ON n.user_id = u.user_id
    LEFT JOIN account a ON (u.account_id = a.account_id OR n.account_id = a.account_id)

This allows you to attach the user account from the OR from the news.

Then, you say « Victory! » but how MySQL reacts when offers several possibilities for join?

The reason (at least mine) would like to say that he try the first condition (u.account_id = a.account_id) based on its index (PK and FK), then the 2nd (n.account_id = a.account_id) based again on its index, etc.

Unfortunately the reality is quite different: not knowing on which condition and index is based, MySQL will perform full table scan between news and user THEN between news and account table.

EXPLAIN will give you this kind of message that are not fun at all:

Type: ALL + Extra: Range checked for each record (index map: 0x1)

Imagine what it can give if you have 5 or 6 conditions "or" join, all in subqueries with group by, having, math functions and other wonderful treatments.

It happens very quickly to millions of scanned lines and exponential running time.

Third, we say that if MySQL is unable to find the proper condition, we will force his hand:

It's not sexy but perfectly functional. The "or" is replaced by nested "if" to do the job in his place and to provide directly the condition and therefore the index to use.

SELECT
    *
FROM
    news n
    LEFT JOIN user u ON n.user_id = u.user_id
    LEFT JOIN account a ON (IF (u.account_id IS NOT NULL, u.account_id, n.account_id) = a.account_id)

Instead of full table scans, me just made a simple matching index.

In my case now, before: over 5 million of scanned lines - after: only 25,500 of scanned lines... MySQL thank you!

More infos on MySQL website: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html

neobie - 2016-05-18 19:04:53
This is amazing. Never know the join condition can be replace by IF !
It save a lot! Thank you!
craig brown - 2016-10-28 20:22:25
is this good coding practice? know how to implement similar logic in sql server?
Alex - 2017-03-09 22:30:53
Took me too long to find this article, exactly what I needed. Great work!

Comment

USER EXPERIENCE

Wandi invites you to discover a new approach to navigating a website...
Are you ready?

Launch the experience