I have three tables: units (apartment units), leases, and line_items (a better name would have been recurring rents). A unit has many leases and lease has many line items (associated with each rent change). My goal is, given a set of a months, to list each unit with its most recent rent to it.
Table public.units Column | Type | Collation | Nullable | Default ----------------+-----------------------------+-----------+----------+----------------------------------- id | integer | | not null | nextval('units_id_seq'::regclass) number | character varying | | | bedrooms | integer | | | bathrooms | integer | | | square_footage | integer | | | community_id | integer | | | building_id | integer | | | created_at | timestamp without time zone | | not null | updated_at | timestamp without time zone | | not null | slug | character varying | | | status | integer | | not null | 0 note | text | | | half_bathrooms | integer | | | display_number | character varying | | | Indexes: units_pkey PRIMARY KEY, btree (id) index_units_on_building_id btree (building_id) index_units_on_community_id btree (community_id) index_units_on_slug btree (slug) Foreign-key constraints: fk_rails_5850136a38 FOREIGN KEY (building_id) REFERENCES buildings(id) fk_rails_b860cf198b FOREIGN KEY (community_id) REFERENCES communities(id) Referenced by: TABLE leases CONSTRAINT fk_rails_29210439a5 FOREIGN KEY (unit_id) REFERENCES units(id) TABLE market_rents CONSTRAINT fk_rails_7270410e47 FOREIGN KEY (unit_id) REFERENCES units(id) TABLE incomes_unit_budgets CONSTRAINT fk_rails_791330e8fe FOREIGN KEY (unit_id) REFERENCES units(id) Table public.leases Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---------------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+------------- id | integer | | not null | nextval('leases_id_seq'::regclass) | plain | | start | date | | | | plain | | unit_id | integer | | | | plain | | created_at | timestamp without time zone | | not null | | plain | | updated_at | timestamp without time zone | | not null | | plain | | vacate | date | | | | plain | | community_id | bigint | | not null | | plain | | Indexes: leases_pkey PRIMARY KEY, btree (id) index_leases_on_community_id btree (community_id) index_leases_on_unit_id btree (unit_id) Foreign-key constraints: fk_rails_29210439a5 FOREIGN KEY (unit_id) REFERENCES units(id) fk_rails_46ad3e9d34 FOREIGN KEY (community_id) REFERENCES communities(id) Referenced by: TABLE line_items CONSTRAINT fk_rails_02720bca5b FOREIGN KEY (lease_id) REFERENCES leases(id) TABLE terms CONSTRAINT fk_rails_49f65f38eb FOREIGN KEY (lease_id) REFERENCES leases(id) Table public.line_items Column | Type | Collation | Nullable | Default -----------------+-----------------------------+-----------+----------+---------------------------------------- id | integer | | not null | nextval('line_items_id_seq'::regclass) name | character varying | | | lease_id | integer | | | amount_cents | integer | | not null | 0 amount_currency | character varying | | not null | 'USD'::character varying created_at | timestamp without time zone | | not null | updated_at | timestamp without time zone | | not null | start | date | | | expiration | date | | | Indexes: line_items_pkey PRIMARY KEY, btree (id) index_line_items_on_lease_id btree (lease_id) index_line_items_on_start btree (start) Foreign-key constraints: fk_rails_02720bca5b FOREIGN KEY (lease_id) REFERENCES leases(id) The following query does exactly what I would hope, but is extremely slow (8 seconds) on a relatively small data set (~1500 units, ~5000 leases, ~15000 line items):
with last_rent as ( SELECT line_items.* FROM (SELECT line_items.*, leases.unit_id, row_number() over ( partition by unit_id, date_trunc('month', line_items.start) order by line_items.start desc ) FROM line_items INNER JOIN leases ON leases.id = line_items.lease_id WHERE line_items.name = 'RNT' AND leases.community_id IN (X)) as line_items WHERE line_items.row_number = 1 ), month_series as ( SELECT id as unit_id, community_id, bedrooms, bathrooms, generate_series( '2019-03-01', '2019-03-01'::date, interval '1 month' ) - interval '1 day' dt FROM units WHERE units.community_id IN (X) ORDER BY units.id ASC ), unit_rent_month as ( select unit_id, community_id, bedrooms, bathrooms, amount_cents, date_trunc('month', dt)::date as period from month_series left join lateral ( select amount_cents from last_rent where last_rent.unit_id = month_series.unit_id and last_rent.start <= month_series.dt order by unit_id, last_rent.start desc limit 1 ) last_rent on true order by dt, unit_id ) select * from unit_rent_month; My strategy was to isolate the last rent of each month (in case of early move outs or cancellations) by using a window function. I created another table (month_series) which lists all the unit_ids and the months. I tried doing a lateral join between last rent and month series.
The analyze looks like this:
CTE Scan on unit_rent_month (cost=2563655.29..2593175.29 rows=1476000 width=24) (actual time=8019.924..8020.607 rows=1476 loops=1) Output: unit_rent_month.unit_id, unit_rent_month.community_id, unit_rent_month.bedrooms, unit_rent_month.bathrooms, unit_rent_month.amount_cents, unit_rent_month.period Buffers: shared hit=990 CTE last_rent -> Subquery Scan on line_items (cost=1682.58..2163.78 rows=60 width=56) (actual time=44.584..63.921 rows=12010 loops=1) Output: line_items.id, line_items.name, line_items.lease_id, line_items.amount_cents, line_items.amount_currency, line_items.created_at, line_items.updated_at, line_items.start, line_items.expiration, line_items.unit_id, line_items.row_number Filter: (line_items.row_number = 1) Rows Removed by Filter: 63 Buffers: shared hit=269 -> WindowAgg (cost=1682.58..2013.40 rows=12030 width=68) (actual time=44.582..60.093 rows=12073 loops=1) Output: line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.start, line_items_1.expiration, leases.unit_id, row_number() OVER (?), leases.unit_id, (date_trunc('month'::text, (line_items_1.start)::timestamp with time zone)) Buffers: shared hit=269 -> Sort (cost=1682.58..1712.65 rows=12030 width=56) (actual time=44.524..45.478 rows=12073 loops=1) Output: line_items_1.start, leases.unit_id, (date_trunc('month'::text, (line_items_1.start)::timestamp with time zone)), line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.expiration Sort Key: leases.unit_id, (date_trunc('month'::text, (line_items_1.start)::timestamp with time zone)), line_items_1.start DESC Sort Method: quicksort Memory: 2082kB Buffers: shared hit=269 -> Hash Join (cost=378.43..867.28 rows=12030 width=56) (actual time=7.626..33.901 rows=12073 loops=1) Output: line_items_1.start, leases.unit_id, date_trunc('month'::text, (line_items_1.start)::timestamp with time zone), line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.expiration Inner Unique: true Hash Cond: (line_items_1.lease_id = leases.id) Buffers: shared hit=269 -> Seq Scan on public.line_items line_items_1 (cost=0.00..395.25 rows=12736 width=44) (actual time=0.012..5.813 rows=12736 loops=1) Output: line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.start, line_items_1.expiration Filter: ((line_items_1.name)::text = 'RNT'::text) Rows Removed by Filter: 4884 Buffers: shared hit=175 -> Hash (cost=292.99..292.99 rows=6835 width=8) (actual time=7.537..7.537 rows=6837 loops=1) Output: leases.unit_id, leases.id Buckets: 8192 Batches: 1 Memory Usage: 332kB Buffers: shared hit=94 -> Seq Scan on public.leases (cost=0.00..292.99 rows=6835 width=8) (actual time=0.010..4.954 rows=6837 loops=1) Output: leases.unit_id, leases.id Filter: (leases.community_id = ANY ('{X}'::bigint[])) Rows Removed by Filter: 401 Buffers: shared hit=94 CTE month_series -> Result (cost=0.28..33371.24 rows=1476000 width=24) (actual time=0.065..11.553 rows=1476 loops=1) Output: units.id, units.community_id, units.bedrooms, units.bathrooms, ((generate_series('2019-03-01 00:00:00-08'::timestamp with time zone, ('2019-03-01'::date)::timestamp with time zone, '1 mon'::interval)) - '1 day'::interval) Buffers: shared hit=712 -> ProjectSet (cost=0.28..7541.24 rows=1476000 width=24) (actual time=0.062..9.869 rows=1476 loops=1) Output: generate_series('2019-03-01 00:00:00-08'::timestamp with time zone, ('2019-03-01'::date)::timestamp with time zone, '1 mon'::interval), units.id, units.community_id, units.bedrooms, units.bathrooms Buffers: shared hit=712 -> Index Scan using units_pkey on public.units (cost=0.28..146.48 rows=1476 width=16) (actual time=0.038..4.365 rows=1476 loops=1) Output: units.id, units.number, units.bedrooms, units.bathrooms, units.square_footage, units.community_id, units.building_id, units.created_at, units.updated_at, units.slug, units.status, units.note, units.half_bathrooms, units.display_number Filter: (units.community_id = ANY ('{X}'::integer[])) Rows Removed by Filter: 201 Buffers: shared hit=712 CTE unit_rent_month -> Sort (cost=2524430.27..2528120.27 rows=1476000 width=32) (actual time=8019.918..8019.998 rows=1476 loops=1) Output: month_series.unit_id, month_series.community_id, month_series.bedrooms, month_series.bathrooms, last_rent.amount_cents, ((date_trunc('month'::text, month_series.dt))::date), month_series.dt Sort Key: month_series.dt, month_series.unit_id Sort Method: quicksort Memory: 164kB Buffers: shared hit=990 -> Nested Loop Left Join (cost=1.51..2302560.00 rows=1476000 width=32) (actual time=78.693..8018.611 rows=1476 loops=1) Output: month_series.unit_id, month_series.community_id, month_series.bedrooms, month_series.bathrooms, last_rent.amount_cents, (date_trunc('month'::text, month_series.dt))::date, month_series.dt Buffers: shared hit=984 -> CTE Scan on month_series (cost=0.00..29520.00 rows=1476000 width=24) (actual time=0.067..13.332 rows=1476 loops=1) Output: month_series.unit_id, month_series.community_id, month_series.bedrooms, month_series.bathrooms, month_series.dt Buffers: shared hit=712 -> Limit (cost=1.51..1.51 rows=1 width=12) (actual time=5.419..5.420 rows=1 loops=1476) Output: last_rent.amount_cents, last_rent.unit_id, last_rent.start Buffers: shared hit=272 -> Sort (cost=1.51..1.51 rows=1 width=12) (actual time=5.418..5.418 rows=1 loops=1476) Output: last_rent.amount_cents, last_rent.unit_id, last_rent.start Sort Key: last_rent.start DESC Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=272 -> CTE Scan on last_rent (cost=0.00..1.50 rows=1 width=12) (actual time=3.395..5.411 rows=8 loops=1476) Output: last_rent.amount_cents, last_rent.unit_id, last_rent.start Filter: ((last_rent.start <= month_series.dt) AND (last_rent.unit_id = month_series.unit_id)) Rows Removed by Filter: 12002 Buffers: shared hit=269 Planning time: 3.219 ms Execution time: 8021.388 ms Based on what the analyze is saying, I believe the issue to be the subquery and the lateral join. Would anyone help me figure how to optimize this? Happy to share more information if needed.
EDIT: Here is my final query:
SELECT id as unit_id, community_id, bedrooms, bathrooms, date_trunc('month', mon.dt)::date AS period, last_rent.amount_cents FROM "units" CROSS JOIN generate_series(timestamp '2018-01-01' , timestamp '2018-12-01' , interval '1 month') mon(dt) left join lateral ( SELECT li.amount_cents FROM leases l JOIN line_items li ON l.id = li.lease_id WHERE l."unit_id" = "units"."id" AND li.start < mon.dt + interval '1 month' AND li.name = 'RNT' order by li.start desc limit 1 ) last_rent on true WHERE "units"."community_id" IN (X) order by period, unit_id EDIT 2:
For reference, here's a test case with rows and desired output
Line Item 1: id: 1 name: "RNT" lease_id: 1 amount_cents: 100,000 start: January 1, 2018 Line Item 2: id: 2 name: "RNT" lease_id: 2 amount_cents: 110,000 start: March 5, 2018 Lease 1: id: 1 start: January 1, 2018 vacate: March 4, 2018 unit_id: 1 Lease 2: id: 2 start: March 5, 2018 vacate: null unit_id: 1 Unit 1: id: 1 community_id: 1 number: 101 bedrooms: 1 bathrooms: 1 Output:
| unit_id | community_id | bedrooms | bathrooms | period | amount_cents --------------------------------------------------------------------------- | 1 | 1 | 1 | 1 | Jan 1, 2018 | 100,0000 | 1 | 1 | 1 | 1 | Feb 1, 2018 | 100,0000 | 1 | 1 | 1 | 1 | Mar 1, 2018 | 110,0000 | 1 | 1 | 1 | 1 | Apr 1, 2018 | 110,0000 ... Repeating 110,000 for the rest of 2018.