I have a query which is taking more than 15 mins to execute in Redshift. This query is being triggered using AWS Lambda which has a timeout of 15 mins. So, I wanted to check if there is a way to optimize the query to make it give results fast.
Here is my SQL query:
insert into test.qa_locked select '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481', 'ABC-013505', 'ABC-013505-2-2020', user_id, cast(TIMEOFDAY() as timestamp) from ( select user_id from ( select contact_id from test.qa_locked ) where contact_cnt <= 1 ) ) Here is the plan:
XN Subquery Scan "*SELECT*" (cost=1000028198481.69..1000028198481.75 rows=1 width=218) -> XN Subquery Scan derived_table1 (cost=1000028198481.69..1000028198481.73 rows=1 width=210) -> XN Window (cost=1000028198481.69..1000028198481.71 rows=1 width=56) -> XN Sort (cost=1000028198481.69..1000028198481.70 rows=1 width=56) -> XN Network (cost=1645148.05..28198481.68 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_OUTER (cost=1645148.05..28198481.68 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_INNER (cost=1645147.76..28091814.71 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_INNER (cost=1645147.09..7491814.01 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_INNER (cost=1645146.68..6805146.91 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_INNER (cost=1645146.16..6438479.71 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_NONE (cost=1645145.65..6071812.51 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_NONE (cost=1645145.29..6071812.13 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_BOTH (cost=1645144.96..6071811.77 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_NONE (cost=1645144.50..5598477.96 rows=1 width=56) -> XN Hash NOT IN Join DS_DIST_BOTH (cost=1645144.47..5598477.91 rows=1 width=84) -> XN Hash NOT IN Join DS_DIST_OUTER (cost=1645142.59..5078476.00 rows=1 width=84) -> XN Hash NOT IN Join DS_BCAST_INNER (cost=1645142.57..4065142.63 rows=1 width=600) -> XN Hash Left Join DS_DIST_BOTH (cost=1201145.21..3221145.24 rows=1 width=1116) -> XN Seq Scan on contacts xa (cost=1201145.21..1201145.21 rows=1 width=640) -> XN Hash (cost=0.00..0.00 rows=1 width=556) -> XN Seq Scan on accounts ya (cost=0.00..0.00 rows=1 width=556) -> XN Hash (cost=443997.35..443997.35 rows=1 width=32) -> XN Subquery Scan "IN_subquery" (cost=23989.76..443997.35 rows=1 width=32) -> XN Unique (cost=23989.76..443997.34 rows=1 width=516) -> XN Nested Loop DS_BCAST_INNER (cost=23989.76..443997.34 rows=1 width=516) -> XN Seq Scan on accounts con (cost=0.00..0.00 rows=1 width=516) -> XN Hash NOT IN Join DS_DIST_OUTER (cost=23989.76..83997.32 rows=1 width=26) -> XN Seq Scan on campaign_exclusion_list cam (cost=0.00..7.53 rows=1 width=26) -> XN Hash (cost=23989.75..23989.75 rows=1 width=32) -> XN Subquery Scan "IN_subquery" (cost=0.00..23989.75 rows=1 width=32) -> XN Unique (cost=0.00..23989.74 rows=1 width=18) -> XN Seq Scan on campaign_inclusion_list (cost=0.00..23989.74 rows=1 width=18) -> XN Hash (cost=0.01..0.01 rows=1 width=516) -> XN Subquery Scan "IN_subquery" (cost=0.00..0.01 rows=1 width=516) -> XN Unique (cost=0.00..0.00 rows=1 width=516) -> XN Seq Scan on contacts (cost=0.00..0.00 rows=1 width=516) -> XN Hash (cost=1.88..1.88 rows=1 width=210) -> XN Seq Scan on bh_email_open_clicks (cost=0.00..1.88 rows=1 width=210) -> XN Hash (cost=0.01..0.01 rows=1 width=210) -> XN Subquery Scan "IN_subquery" (cost=0.00..0.01 rows=1 width=210) -> XN Unique (cost=0.00..0.00 rows=1 width=28) -> XN Seq Scan on contacts (cost=0.00..0.00 rows=1 width=28) -> XN Hash (cost=0.45..0.45 rows=1 width=210) -> XN Seq Scan on bh_leads (cost=0.00..0.45 rows=1 width=210) -> XN Hash (cost=0.32..0.32 rows=1 width=402) -> XN Subquery Scan "IN_subquery" (cost=0.30..0.32 rows=1 width=402) -> XN HashAggregate (cost=0.30..0.31 rows=1 width=402) -> XN Seq Scan on campaign_extraction_history (cost=0.00..0.30 rows=1 width=402) -> XN Hash (cost=0.35..0.35 rows=1 width=402) -> XN Subquery Scan "IN_subquery" (cost=0.33..0.35 rows=1 width=402) -> XN HashAggregate (cost=0.33..0.34 rows=1 width=402) -> XN Seq Scan on campaign_extraction_history (cost=0.00..0.33 rows=1 width=402) -> XN Hash (cost=0.50..0.50 rows=1 width=210) -> XN Seq Scan on bh_leads (cost=0.00..0.50 rows=1 width=210) -> XN Hash (cost=0.50..0.50 rows=1 width=210) -> XN Seq Scan on bh_leads (cost=0.00..0.50 rows=1 width=210) -> XN Hash (cost=0.40..0.40 rows=1 width=402) -> XN Seq Scan on campaign_extraction_history (cost=0.00..0.40 rows=1 width=402) -> XN Hash (cost=0.30..0.30 rows=30 width=402) -> XN Seq Scan on ce_locked_records_tb (cost=0.00..0.30 rows=30 width=402) -> XN Hash (cost=0.27..0.27 rows=1 width=210) -> XN Subquery Scan "IN_subquery" (cost=0.26..0.27 rows=1 width=210) -> XN HashAggregate (cost=0.26..0.26 rows=1 width=210) -> XN Seq Scan on bh_leads (cost=0.00..0.25 rows=1 width=210) Please suggest if there are any ways to optimize this query.
not inoperators, which are notoriously bad for efficiency. Anot inrequires the selection of a whole lot of data, then checking that the desired value isn't in each returned row. That makes things very slow in any database. They also seem to be responsible for most of thecostcalculations. There are 31 sub-queries (SELECT) in that query, too.NOT INcan often be replaced by aLEFT OUTER JOIN. Then, confirm that a joined field is NULL. There is quite a bit of discussion on the Internet about this, eg: SQL performance on LEFT OUTER JOIN vs NOT EXISTS and Consider using NOT EXISTS instead of NOT IN with a subquery - Redgate Software and NOT IN vs. NOT EXISTS vs. OUTER APPLY vs. OUTER JOIN.costfigure. You should concentrate on reducing or removing the high costs. In addition, the high costs might be caused by theDS_DIST_INNERandDS_DIST_BOTHactivities. These can typically be avoided by tables sharing the sameDISTKEYor by replicating tables on all nodes. See: Evaluating the query plan - Amazon Redshift