This post will be short. It’s about a lesson I learned while tuning a large SQL query that loads items from unrelated tables using the same WHERE clause. The tuning was done on Oracle 11g, but I am fairly confident that it applies to most SQL databases.
Applying the WHERE clause after the whole UNION has been performed turned out to be significantly slower than applying it inside the inner SELECTs.
This (provided you can guarantee that the filtered result sets contain no duplicate rows):
SELECT * FROM TABLE_1 WHERE COL > 1 UNION ALL SELECT * FROM TABLE_2 WHERE COL > 1;
is better than the following:
SELECT * FROM TABLE_1 WHERE COL > 1 UNION SELECT * FROM TABLE_2 WHERE COL > 1;
which in turn is better than the following:
SELECT * FROM (SELECT * FROM TABLE_1 UNION SELECT * FROM TABLE_2) WHERE COL > 1;
UNION ALL is faster than UNION because a plain UNION assumes that the two combined data sets may contain duplicates, which it then has to find and remove. If you can ensure (by the inner WHERE clauses) that there will be no duplicates, it’s far better to use UNION ALL and let the database engine optimize the inner SELECTs.
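To see the difference in semantics, here is a minimal sketch that uses Oracle's built-in DUAL table instead of real data (the column alias COL is only illustrative):

```sql
-- UNION compares the combined rows and returns each distinct row once.
SELECT 1 AS COL FROM DUAL
UNION
SELECT 1 AS COL FROM DUAL;      -- returns a single row

-- UNION ALL simply appends the second result set to the first.
SELECT 1 AS COL FROM DUAL
UNION ALL
SELECT 1 AS COL FROM DUAL;      -- returns two rows
```

The duplicate check is the extra work that UNION ALL skips, which is also why it is only safe when duplicates either cannot occur or do not matter.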
Applying the WHERE clause to the combined result is expensive because the engine operates on more intermediate rows than you need. In addition, the database engine may not be able to optimize the query, because the combined results come from tables that have nothing in common.
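If you want to check this on your own data, comparing execution plans is a reasonable starting point. The sketch below assumes TABLE_1 and TABLE_2 exist with the same number of columns and compatible types (otherwise SELECT * cannot be combined with UNION), and it uses Oracle's EXPLAIN PLAN statement together with DBMS_XPLAN.DISPLAY:

```sql
-- Plan for the variant that filters inside each SELECT.
EXPLAIN PLAN FOR
SELECT * FROM TABLE_1 WHERE COL > 1
UNION ALL
SELECT * FROM TABLE_2 WHERE COL > 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the variant that filters the combined result.
EXPLAIN PLAN FOR
SELECT * FROM (SELECT * FROM TABLE_1 UNION SELECT * FROM TABLE_2)
WHERE COL > 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Depending on the database version and statistics, the optimizer may push the outer filter into the inner SELECTs on its own, so the plans (and timings) you see can differ from what I observed.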
P.S. If you enjoyed this post, feel free to share it, and follow me on Twitter to stay in touch with my further articles and other thoughts.
P.S.2 Cover image by Joel Herzog | unsplash.com.