Where (implicit inner join) vs. explicit inner join - does it affect indexing?

Question

For the query

SELECT * from table_a, b WHERE table_a.id = b.id AND table_a.status ='success'

or

SELECT * from table_a WHERE table_a.status ='success' JOIN b ON table_a.id = b.id

Somehow, I would tend to create one index (id, status) on table_a for the top form,
whereas my natural tendency for the bottom form would be to create two separate indices on table_a, one on id and one on status.

The two queries are effectively the same, right? Would you index both the same way?
How would you index table_a (assuming this is the only query that exists in the system, to avoid other considerations)? One or two indices?

Do you really want your WHERE clause before the JOIN clause? It looks like the first part is a sub-query whose results should be joined, but then the parentheses are missing.
– Fabian, Commented Nov 19, 2013 at 13:46
Fabian, I think you're right. The second query is missing parentheses. But would that be equivalent to placing the WHERE at the end?
– inor, Commented Nov 20, 2013 at 5:30
Please do not think of the id field in either table as a primary key. For the purpose of my question it's just a normal field. I should have used another name, like 'foo'.
– inor, Commented Nov 20, 2013 at 5:32

Fabian · Accepted Answer · 2013-11-20 10:17:20Z

The "traditional style" and the SQL-92 style inner join are semantically equivalent, and most DBMSs treat them the same way (Oracle, for example, does): they use the same execution plan for both forms. This is nevertheless implementation-dependent and not guaranteed by any standard.

Hence, indexes are used the same way in both forms, too.

Independently of the syntax you use, the appropriate indexing strategy is implementation-dependent. Some DBMSs (such as Postgres) generally prefer single-column indexes and can combine them very efficiently. Others, such as Oracle, can take more advantage of combined (or even covering) indexes. Both kinds of index work in both DBMSs, of course.
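As a rough way to compare the two indexing options from the question, the sketch below builds the tables twice and prints the plan each time. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in, which is not one of the DBMSs discussed here; the index names, the payload column, and the row counts are made up for illustration, and plans on Postgres or Oracle may look quite different.

```python
import sqlite3

QUERY = """
    SELECT * FROM table_a JOIN b ON table_a.id = b.id
    WHERE table_a.status = 'success'
"""


def run_with_indexes(index_ddl):
    """Build a fresh in-memory database, apply the given index DDL, and
    return (plan detail lines, sorted result rows) for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (id INTEGER, status TEXT);
        CREATE TABLE b (id INTEGER, payload TEXT);
    """)
    # Illustration data (assumption): every tenth row has status 'success'.
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?)",
        [(i, "success" if i % 10 == 0 else "failed") for i in range(200)],
    )
    conn.executemany(
        "INSERT INTO b VALUES (?, ?)", [(i, f"row {i}") for i in range(200)]
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    conn.execute("ANALYZE")  # gather statistics for the optimizer
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    rows = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return plan, rows


# Option 1: one combined index. Option 2: two single-column indexes.
composite = ["CREATE INDEX idx_a_id_status ON table_a (id, status)"]
separate = [
    "CREATE INDEX idx_a_id ON table_a (id)",
    "CREATE INDEX idx_a_status ON table_a (status)",
]

plan_composite, rows_composite = run_with_indexes(composite)
plan_separate, rows_separate = run_with_indexes(separate)

for label, plan in (("composite", plan_composite), ("separate", plan_separate)):
    print(label)
    for line in plan:
        print("   ", line)
```

Both variants return the same rows; only the printed plans can differ, and those depend on the SQLite version and the statistics, so they are best read on your own system rather than taken from here.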

Regarding the syntax of your example, the position of the WHERE clause in your second query surprises me a little bit.

The following two queries are processed the same way in most DBMS:

SELECT * FROM table_a, b WHERE table_a.id = b.id AND table_a.status ='success'

and

SELECT * FROM table_a JOIN b ON table_a.id = b.id WHERE table_a.status ='success'

However, your second query shifts the WHERE clause inside the FROM clause, which is not valid SQL in my view.

A quick check for

SELECT * from table_a WHERE table_a.status ='success' JOIN b ON table_a.id = b.id

confirms: MySQL 5.5, Postgres 9.3, and Oracle 11g all yield a syntax error for it.
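The same check can be reproduced locally. The sketch below assumes SQLite (through Python's standard sqlite3 module), which is not one of the three systems tested above, and uses made-up rows and a made-up payload column. It compares the results of the implicit and the explicit join and then tries the form with the misplaced WHERE clause.

```python
import sqlite3

# Assumption: SQLite stands in for the DBMS; the rows are illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, status TEXT);
    CREATE TABLE b (id INTEGER, payload TEXT);
    INSERT INTO table_a VALUES (1, 'success'), (2, 'failed'), (3, 'success');
    INSERT INTO b VALUES (1, 'x'), (2, 'y'), (3, 'z'), (3, 'w');
""")

implicit_join = """
    SELECT * FROM table_a, b
    WHERE table_a.id = b.id AND table_a.status = 'success'
"""
explicit_join = """
    SELECT * FROM table_a JOIN b ON table_a.id = b.id
    WHERE table_a.status = 'success'
"""
misplaced_where = """
    SELECT * FROM table_a WHERE table_a.status = 'success'
    JOIN b ON table_a.id = b.id
"""

# Sort both result sets so the comparison ignores row order.
rows_implicit = sorted(conn.execute(implicit_join).fetchall())
rows_explicit = sorted(conn.execute(explicit_join).fetchall())
same_rows = rows_implicit == rows_explicit

# The third form should be rejected by the parser before any data is read.
try:
    conn.execute(misplaced_where)
    misplaced_error = None
except sqlite3.OperationalError as exc:
    misplaced_error = str(exc)

print("same rows:", same_rows)
print("misplaced WHERE:", misplaced_error)
```

Identical result sets only show that the two forms mean the same thing; whether the plans are also identical has to be checked with the plan output of the DBMS in question.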

Stuart Ainsworth · Accepted Answer · 2013-11-19 13:48:00Z

The two queries should be optimized to perform the same way; however, the join syntax is ANSI compliant, and the older version should be deprecated. As far as index usage is concerned, you only want to touch a table (index) once. The RDBMS and tabular design you are using will determine the specifics as to whether or not you need to include the PRIMARY KEY (assuming that's what ID represents in your example) in a covering index. Also, SELECT * may or may not be covered; better to use specific column names.

Markus Winand · Accepted Answer · 2013-11-20 16:50:40Z

Well, you ruled out other queries, but there are still open questions, particularly about the data distribution. E.g. how does the number of rows WHERE table_a.status ='success' compare to the size of table_b? Depending on its estimates, the optimizer has to make two important decisions:

Which join algorithm to use (nested loops, hash, or sort/merge)?
In which order to process the tables?

Unfortunately, these decisions affect indexing (and are affected by indexing!).

Example: consider there is only one row WHERE table_a.status ='success'. Then it would be fine to have an index on table_a.status to find that row quickly. Next, we'd like to have an index on table_b.id to find the corresponding rows quickly using a nested loops join. Considering that you select *, it doesn't make any sense to include additional columns in these indexes (not considering any other queries in the system).

But now imagine that you don't have an index on table_a.status but one on table_a.id, and that this table is huge compared to table_b. For demonstration, let's assume table_b has only one row (an extreme case, of course). Now it would be better to go to table_b, fetch all rows (just one), and then fetch the corresponding rows from table_a using the index. You see how indexing affects the join order (for a nested loops join in this example)?

This is just one simple example of how things interact. Most databases have three join algorithms to choose from (MySQL is an exception).

If you create the three indexes mentioned above and look at how the database executes the join (explain plan), you'll note that one or two of the indexes remain unused for the join algorithm and join order selected for your query. In theory, you could drop those indexes. However, keep in mind that the optimizer makes its decision based on the statistics available to it, and that the optimizer's estimates might be wrong.
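A minimal sketch of that experiment, assuming SQLite (through Python's standard sqlite3 module) as a stand-in: SQLite only implements nested loops joins, so it exercises the join-order decision but not the choice of join algorithm. The index names, the payload column, and the data distribution are assumptions for illustration; the second table is named b here, as in the question's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, status TEXT);
    CREATE TABLE b (id INTEGER, payload TEXT);
    -- The three candidate indexes (names are hypothetical).
    CREATE INDEX idx_a_status ON table_a (status);
    CREATE INDEX idx_a_id ON table_a (id);
    CREATE INDEX idx_b_id ON b (id);
""")

# Illustration data (assumption): 1 in 100 rows of table_a is a 'success'.
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [(i, "success" if i % 100 == 0 else "failed") for i in range(1000)],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?)", [(i, f"row {i}") for i in range(1000)]
)
conn.execute("ANALYZE")  # give the optimizer statistics to work with

query = """
    SELECT * FROM table_a JOIN b ON table_a.id = b.id
    WHERE table_a.status = 'success'
"""

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = "\n".join(row[-1] for row in plan)
print(plan_text)

# An index whose name never shows up in the plan was not chosen for this query.
index_names = ["idx_a_status", "idx_a_id", "idx_b_id"]
unused = [name for name in index_names if name not in plan_text]
print("unused for this plan:", unused)
```

Which indexes end up in the unused list depends on the statistics and the SQLite version, so treat the result as a hint for this one query and data set, not as proof that an index can be dropped.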

You can find more about indexing joins on my web-site: http://use-the-index-luke.com/sql/join
