POSTGRES INDEXES: A DEEP DIVE
INTRODUCTION Bartosz Sypytkowski ▪ @Horusiath ▪ b.sypytkowski@gmail.com ▪ bartoszsypytkowski.com
 B+Tree: how databases manage data?  How indexes work  Query plans & execution  Types of indexes  Vector search AGENDA
B+TREE
HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 32 49 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 44 DATA 49 DATA 62 DATA 64 DATA 8KB B+TREE link link link link
HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 32 49 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 49 DATA 49 DATA 62 DATA 64 DATA 8KB SEARCH BY EQUALITY link link link link
HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 32 49 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 44 DATA 49 DATA 62 DATA 64 DATA 8KB RANGE SCAN link link link link
POSTGRESQL TOAST TABLES book_id title content 1 ‘Haikus’ ‘In the twilight rain these brilliant-hued hibiscus - A lovely sunset.’ 2 ‘Moby Dick’ public.books chunk_id chunk_seq chunk_data 16403 0 0xFF2F000000436 16403 1 0x6167F27521F25B 16403 2 0x23FB21030E6F6 16403 3 0x7974686108676 pg_toast.pg_toast_{$OID} select relname from pg_class where oid = ( select reltoastrelid from pg_class where relname='books') Find name of a corresponding TOAST table if(compress(content).len > 2000)
HEADER HEADER HEADER HEADER HEADER 4 25 4 DATA 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA TUPLE IDs link Table Heap Index Storage 4 TID 8 TID 16 TID 20 TID 25 TID 27 TID 30 TID 32 TID 32 DATA <block id, tuple offset> select ctid from my_table
QUERY EXECUTION
select * from books where author_id = 10 SEQ SCAN
SEQ SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. (Hopefully) sequential I/O 2. Scans all table’s related pages 3. Doesn’t use index pages
create index on books(publication_date); select publication_date from books where publication_date > ‘2020/01/01’ INDEX ONLY SCAN
INDEX ONLY SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Sequential I/O over index pages 2. Doesn’t use table’s related pages
create index on books(publication_date); select title, publication_date from books where publication_date > ‘2020/01/01’ INDEX SCAN
INDEX SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table…
INDEX SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table… 2. Position read cursor on the first page…
INDEX SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table… 2. Position read cursor on the first page… 3. Sequential I/O over all table’s pages until condition is done
create index on books using gist(description_lex); select title, publication_date from books where description_lex @@ ‘epic’ BITMAP SCAN
BITMAP SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Using index create bitmap of matching pages Bitmap
BITMAP SCAN M1 T1 I1 T2 T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Using index create bitmap of matching pages 2. Random I/O over pages covered by bitmap Bitmap
INCLUDE & PARTIAL INDEXES create index ix_books_by_author on books(author_id) include (created_at) where author_id is not null; HEADER HEADER 4 25 HEADER HEADER HEADER 4 DATA 7 DATA 13 DATA 16 DATA 19 DATA 25 DATA 32 DATA 47 DATA 61 DATA 4 TID INC 16 TID INC 25 TID INC 32 TID INC duplicated columns Index Storage
BTREE INDEX create index ix_users_birthdate on users(birthdate desc);
COMPAR ING VECTOR CLOCKS B-TREE INDEX 1. Default 2. Access time: always O(logN) 3. Supports most of the index features create index ix_users_birthdate on users(birthdate desc);
HASH INDEX create index ix_orders_no on orders using hash(order_no);
HEADER HEADER 0x 01 0x 02 HEADER HEADER HEADER Meta page HASH INDEX HEADER Bucket 0 Overflow page Bitmap page Bucket 1 4 TID 71 TID 13 TID 73 TID 42 TID 67 TID 86 TID 99 TID 5 TID 38 TID 7 TID 51 TID 14 TID 34 TID 66 TID 70 TID 72 TID 90 TID 79 TID 91 TID 82 TID
COMPAR ING VECTOR CLOCKS HASH INDEX 1. Access time: O(1) – O(N) 2. Doesn’t shrink in size* create index ix_orders_no on orders using hash(order_no);
BRIN INDEX create index ih_events_created_at on events using brin(created_at) with (pages_per_range = 128);
HEADER HEADER HEADER 2020/01/01 2020/03/14 2020/02/09 2020/11/10 2020/01/01 2020/01/21 0..127 BRIN INDEX link 2020/01/21 2020/03/14 128..255 2020/02/09 2020/07/28 256..383 2020/03/17 2020/11/10 384..511
select tablename, attname, correlation from pg_stats where tablename = 'film' tablename attname correlation film film_id 0.9979791 film title 0.9979791 film description 0.04854498 film release_year 1 film rating 0.1953281 film last_update 1 film fulltext <null> COLUMN-TUPLE CORRELATION
COMPAR ING VECTOR CLOCKS BRIN INDEX 1. Imprecise 2. Very small in size 3. Good for columns aligned with tuple insert order and immutable records create index ih_events_created_at on events using brin(created_at) with (pages_per_range = 128);
BLOOM INDEX create index ix_active_codes on active_codes using bloom(keycode) with (length=80, col1=2);
BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len
BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len
BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len CONTAINS(A) = h1(A)/len | h2(A)/len .. | hn(A)/len FALSE
BLOOM FILTER INSERT(C) = h1(C)/len | h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len CONTAINS(A) = h1(A)/len | h2(A)/len .. | hn(A)/len CONTAINS(B) = h1(B)/len | h2(B)/len .. | hn(B)/len FALSE MAYBE?
COMPAR ING VECTOR CLOCKS BLOOM INDEX 1. Small in size 2. Good for exclusion/narrowing 3. False positive ratio: hur.st/bloomfilter/ create extension bloom; create index ix_active_codes on active_codes using bloom(keycode) with (length=80, col1=2); number of bits per record number of hashes for each column
GiST INDEX create index ix_books_content on books using gist(content_lex);
GiST INDEX GEO POINTS image: https://postgrespro.com/blog/pgsql/4175817
GiST INDEX TSVECTOR -- gist cannot be applied directly on text columns alter table film add column description_lex tsvector generated always as (to_tsvector('english', description)) stored; create index idx_film_description_lex on film using gist(description_lex); select * from film where description_lex @@ 'epic'; Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416) Recheck Cond: (description_lex @@ '''epic'''::tsquery) -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0) Index Cond: (description_lex @@ '''epic'''::tsquery) Query Plan
GiST INDEX TSVECTOR HEADER 011011 110111 HEADER 010011 011010 HEADER 100110 110001 HEADER aab TID aac TID aba TID HEADER adf TID azf TID bac TID HEADER brc TID caa TID cdl TID HEADER cff TID fre TID klm TID
COMPAR ING VECTOR CLOCKS GiST INDEX 1. Supports specialized operators 2. Index is not updated during deletes create index ix_books_content on books using gist(content_lex);
SP-GiST INDEX create index ix_files_path on files using spgist(path);
SP-GiST INDEX GEO POINTS image: https://postgrespro.com/blog/pgsql/4220639
SP-GiST INDEX TSVECTOR -- spgist can be created on text column but not on nvarchar create index idx_film_title on film using spgist(title); select * from film where title like ‘A Fast-Paced% in New Orleans'; Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416) Filter: (description ~~ 'A Fast-Paced%'::text) -> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0) Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text)) Query Plan
COMPAR ING VECTOR CLOCKS GiST INDEX 1. Just like GiST, but faster for some ops… 2. … but unable to perform some other 3. Indexed space is partitioned into non- overlapping regions create index ix_files_path on files using spgist(path);
GIN INDEX create index ix_books_content on books using gin(content_lex);
GIN INDEX -- gist cannot be applied directly on text columns alter table film add column description_lex tsvector generated always as (to_tsvector('english', description)) stored; create index idx_film_description_lex on film using gin(description_lex); select * from film where description_lex @@ 'epic'; Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416) Recheck Cond: (description_lex @@ '''epic'''::tsquery) -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0) Index Cond: (description_lex @@ '''epic'''::tsquery) Query Plan
GIN INDEX HEADER ever gone HEADER HEADER HEADER HEADER omit call (1,1) (3,1) dome (1,2) ever (1,1) (2,1) faucet (4,0) gather (2,1) (3,2) gone leather (2,1) omit (2,1) (3,2) (2,2) (3,1) HEADER HEADER (1,1) (1,6) (1,9) (1,2) (1,3) (1,7) (1,8) (2,1) (2,2) (2,3) (2,6) (2,8) (2,5) (2,7) (3,1) Posting list Posting tree
COMPAR ING VECTOR CLOCKS GIN INDEX create index ix_books_content on books using gin(content_lex); 1. Reads usually faster than GiST 2. Writes are usually slower than GiST 3. Index size greater than GiST
RUM INDEX create index ix_books_content on books using rum(content_lex);
RUM INDEX HEADER ever gone HEADER HEADER HEADER HEADER omit call (1,1) (3,1) dome (1,2) ever (1,1) (2,1) faucet (4,0) gather (2,1) (3,2) gone leather (2,1) omit (2,1) (3,2) (2,2) (3,1) HEADER HEADER (1,1) (1,6) (1,9) (1,2) (1,3) (1,7) (1,8) (2,1) (2,2) 1, 14 5 2, 5 25 1, 9 45 1, 21 9, 11 2, 10 45 1, 22 13 1 211 2, 4 2, 21 13 25 5 4, 7 (2,3) (2,6) (2,8) (2,5) (2,7) (3,1) 2 3 1, 7 8, 13 17 1
RUM INDEX -- similarity ranking select description_lex <=> to_tsquery('epic’) as similarity from books; -- find description with 2 words located one after another select * from books where description_lex @@ to_tsquery(‘hello <-> world’);
COMPAR ING VECTOR CLOCKS RUM INDEX 1. GIN on steroids (bigger but more capable) 2. Allows to query for terms and their relative positions in text 3. Supports Index Scan and EXCLUDE create extension rum; create index ix_books_content on books using rum(content_lex);
VECTOR DATABASE
1. Uses only lexical similarity and is language-sensitive 2. Misses the context (meaning) 3. Works only on text LIMITS OF FULL-TEXT SEARCH select * from books where description_lex @@ to_tsquery(‘white <-> castle’); select * from books where description_lex @@ to_tsquery(‘white <-> fortress’);
VECTOR SIMILARITY 101 R G B 250 185 0 115 0 152 Embedding
EUCLIDEAN DISTANCE COSINE DISTANCE x y A B x y A B
IVFFLAT VECTOR INDEX create index ix_books_content on books using ivfflat(embedding content_lex) with (lists = 1000); select * from items where embedding <-> ‘[3,1,2]’ < 5;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 lists create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 centroids create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
QUERYING
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 set ivfflat.probes = 2; select * from items where embedding <-> ‘[1,2]’ < 5; [1,2]
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4; [1,2]
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
HNSW VECTOR INDEX create index on items using hnsw (embedding vector_cosine_ops) with (m = 16, ef_construction = 64);
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=2, ef_construction=3);
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=2, ef_construction=3); Layer 1
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=3, ef_construction=3); Layer 1 Layer 2
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=3, ef_construction=3); Layer 1 Layer 2 Layer 3
QUERYING
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
INVERTED FILE WITH FLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
1. Fast build time 2. Smaller size 3. Slower query performance 4. Bad for frequent index updates 1. Slow initial build time 2. Bigger index size 3. Faster performance 4. Better recall after updates IVFFLAT INDEX HNSW INDEX
SUMMARY
THANK YOU

Postgres indexes: how to make them work for your application

  • 1.
  • 2.
    INTRODUCTION Bartosz Sypytkowski ▪ @Horusiath ▪b.sypytkowski@gmail.com ▪ bartoszsypytkowski.com
  • 3.
     B+Tree: howdatabases manage data?  How indexes work  Query plans & execution  Types of indexes  Vector search AGENDA
  • 4.
  • 5.
    HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 3249 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 44 DATA 49 DATA 62 DATA 64 DATA 8KB B+TREE link link link link
  • 6.
    HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 3249 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 49 DATA 49 DATA 62 DATA 64 DATA 8KB SEARCH BY EQUALITY link link link link
  • 7.
    HEADER HEADER HEADER HEADER HEADER HEADER HEADER 4 32 4 25 4 DATA 3249 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA 32 DATA 42 DATA 43 DATA 44 DATA 49 DATA 62 DATA 64 DATA 8KB RANGE SCAN link link link link
  • 8.
    POSTGRESQL TOAST TABLES book_id titlecontent 1 ‘Haikus’ ‘In the twilight rain these brilliant-hued hibiscus - A lovely sunset.’ 2 ‘Moby Dick’ public.books chunk_id chunk_seq chunk_data 16403 0 0xFF2F000000436 16403 1 0x6167F27521F25B 16403 2 0x23FB21030E6F6 16403 3 0x7974686108676 pg_toast.pg_toast_{$OID} select relname from pg_class where oid = ( select reltoastrelid from pg_class where relname='books') Find name of a corresponding TOAST table if(compress(content).len > 2000)
  • 9.
    HEADER HEADER HEADER HEADER HEADER 4 25 4 DATA 8 DATA 16 DATA 20 DATA 25 DATA 27 DATA 30 DATA TUPLE IDs link TableHeap Index Storage 4 TID 8 TID 16 TID 20 TID 25 TID 27 TID 30 TID 32 TID 32 DATA <block id, tuple offset> select ctid from my_table
  • 10.
  • 11.
    select * from books whereauthor_id = 10 SEQ SCAN
  • 12.
    SEQ SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. (Hopefully) sequential I/O 2. Scans all table’s related pages 3. Doesn’t use index pages
  • 13.
    create index onbooks(publication_date); select publication_date from books where publication_date > ‘2020/01/01’ INDEX ONLY SCAN
  • 14.
    INDEX ONLY SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Sequential I/O over index pages 2. Doesn’t use table’s related pages
  • 15.
    create index onbooks(publication_date); select title, publication_date from books where publication_date > ‘2020/01/01’ INDEX SCAN
  • 16.
    INDEX SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table…
  • 17.
    INDEX SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table… 2. Position read cursor on the first page…
  • 18.
    INDEX SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Uses index to find a first page of the related table… 2. Position read cursor on the first page… 3. Sequential I/O over all table’s pages until condition is done
  • 19.
    create index onbooks using gist(description_lex); select title, publication_date from books where description_lex @@ ‘epic’ BITMAP SCAN
  • 20.
    BITMAP SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Using index create bitmap of matching pages Bitmap
  • 21.
    BITMAP SCAN M1 T1 I1 T2T3 I2 I3 Index Storage T4 T5 T6 T7 Table Heap 1. Using index create bitmap of matching pages 2. Random I/O over pages covered by bitmap Bitmap
  • 22.
    INCLUDE & PARTIAL INDEXES createindex ix_books_by_author on books(author_id) include (created_at) where author_id is not null; HEADER HEADER 4 25 HEADER HEADER HEADER 4 DATA 7 DATA 13 DATA 16 DATA 19 DATA 25 DATA 32 DATA 47 DATA 61 DATA 4 TID INC 16 TID INC 25 TID INC 32 TID INC duplicated columns Index Storage
  • 23.
    BTREE INDEX create indexix_users_birthdate on users(birthdate desc);
  • 24.
    COMPAR ING VECTOR CLOCKS B-TREE INDEX 1. Default 2.Access time: always O(logN) 3. Supports most of the index features create index ix_users_birthdate on users(birthdate desc);
  • 25.
    HASH INDEX create indexix_orders_no on orders using hash(order_no);
  • 26.
    HEADER HEADER 0x 01 0x 02 HEADER HEADER HEADER Meta page HASH INDEX HEADER Bucket0 Overflow page Bitmap page Bucket 1 4 TID 71 TID 13 TID 73 TID 42 TID 67 TID 86 TID 99 TID 5 TID 38 TID 7 TID 51 TID 14 TID 34 TID 66 TID 70 TID 72 TID 90 TID 79 TID 91 TID 82 TID
  • 27.
    COMPAR ING VECTOR CLOCKS HASH INDEX 1. Accesstime: O(1) – O(N) 2. Doesn’t shrink in size* create index ix_orders_no on orders using hash(order_no);
  • 28.
    BRIN INDEX create indexih_events_created_at on events using brin(created_at) with (pages_per_range = 128);
  • 29.
  • 30.
    select tablename, attname,correlation from pg_stats where tablename = 'film' tablename attname correlation film film_id 0.9979791 film title 0.9979791 film description 0.04854498 film release_year 1 film rating 0.1953281 film last_update 1 film fulltext <null> COLUMN-TUPLE CORRELATION
  • 31.
    COMPAR ING VECTOR CLOCKS BRIN INDEX 1. Imprecise 2.Very small in size 3. Good for columns aligned with tuple insert order and immutable records create index ih_events_created_at on events using brin(created_at) with (pages_per_range = 128);
  • 32.
    BLOOM INDEX create indexix_active_codes on active_codes using bloom(keycode) with (length=80, col1=2);
  • 33.
    BLOOM FILTER INSERT(C) = h1(C)/len| h2(C)/len .. | hn(C)/len
  • 34.
    BLOOM FILTER INSERT(C) = h1(C)/len| h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len
  • 35.
    BLOOM FILTER INSERT(C) = h1(C)/len| h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len CONTAINS(A) = h1(A)/len | h2(A)/len .. | hn(A)/len FALSE
  • 36.
    BLOOM FILTER INSERT(C) = h1(C)/len| h2(C)/len .. | hn(C)/len INSERT(D) = h1(D)/len | h2(D)/len .. | hn(D)/len CONTAINS(A) = h1(A)/len | h2(A)/len .. | hn(A)/len CONTAINS(B) = h1(B)/len | h2(B)/len .. | hn(B)/len FALSE MAYBE?
  • 37.
    COMPAR ING VECTOR CLOCKS BLOOM INDEX 1. Smallin size 2. Good for exclusion/narrowing 3. False positive ratio: hur.st/bloomfilter/ create extension bloom; create index ix_active_codes on active_codes using bloom(keycode) with (length=80, col1=2); number of bits per record number of hashes for each column
  • 38.
    GiST INDEX create indexix_books_content on books using gist(content_lex);
  • 39.
    GiST INDEX GEO POINTS image:https://postgrespro.com/blog/pgsql/4175817
  • 40.
    GiST INDEX TSVECTOR -- gistcannot be applied directly on text columns alter table film add column description_lex tsvector generated always as (to_tsvector('english', description)) stored; create index idx_film_description_lex on film using gist(description_lex); select * from film where description_lex @@ 'epic'; Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416) Recheck Cond: (description_lex @@ '''epic'''::tsquery) -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0) Index Cond: (description_lex @@ '''epic'''::tsquery) Query Plan
  • 41.
    GiST INDEX TSVECTOR HEADER 011011 110111 HEADER 010011011010 HEADER 100110 110001 HEADER aab TID aac TID aba TID HEADER adf TID azf TID bac TID HEADER brc TID caa TID cdl TID HEADER cff TID fre TID klm TID
  • 42.
    COMPAR ING VECTOR CLOCKS GiST INDEX 1. Supportsspecialized operators 2. Index is not updated during deletes create index ix_books_content on books using gist(content_lex);
  • 43.
    SP-GiST INDEX create indexix_files_path on files using spgist(path);
  • 44.
    SP-GiST INDEX GEO POINTS image:https://postgrespro.com/blog/pgsql/4220639
  • 45.
    SP-GiST INDEX TSVECTOR -- spgistcan be created on text column but not on nvarchar create index idx_film_title on film using spgist(title); select * from film where title like ‘A Fast-Paced% in New Orleans'; Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416) Filter: (description ~~ 'A Fast-Paced%'::text) -> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0) Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text)) Query Plan
  • 46.
    COMPAR ING VECTOR CLOCKS GiST INDEX 1. Justlike GiST, but faster for some ops… 2. … but unable to perform some other 3. Indexed space is partitioned into non- overlapping regions create index ix_files_path on files using spgist(path);
  • 47.
    GIN INDEX create indexix_books_content on books using gin(content_lex);
  • 48.
    GIN INDEX --gist cannot be applied directly on text columns alter table film add column description_lex tsvector generated always as (to_tsvector('english', description)) stored; create index idx_film_description_lex on film using gin(description_lex); select * from film where description_lex @@ 'epic'; Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416) Recheck Cond: (description_lex @@ '''epic'''::tsquery) -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0) Index Cond: (description_lex @@ '''epic'''::tsquery) Query Plan
  • 49.
    GIN INDEX HEADER ever gone HEADER HEADER HEADER HEADER omit call (1,1) (3,1) dome (1,2) ever (1,1) (2,1) faucet (4,0) gather (2,1) (3,2) goneleather (2,1) omit (2,1) (3,2) (2,2) (3,1) HEADER HEADER (1,1) (1,6) (1,9) (1,2) (1,3) (1,7) (1,8) (2,1) (2,2) (2,3) (2,6) (2,8) (2,5) (2,7) (3,1) Posting list Posting tree
  • 50.
    COMPAR ING VECTOR CLOCKS GIN INDEX create indexix_books_content on books using gin(content_lex); 1. Reads usually faster than GiST 2. Writes are usually slower than GiST 3. Index size greater than GiST
  • 51.
    RUM INDEX create indexix_books_content on books using rum(content_lex);
  • 52.
    RUM INDEX HEADER ever gone HEADER HEADER HEADER HEADER omit call (1,1) (3,1) dome (1,2) ever (1,1) (2,1) faucet (4,0) gather (2,1) (3,2) goneleather (2,1) omit (2,1) (3,2) (2,2) (3,1) HEADER HEADER (1,1) (1,6) (1,9) (1,2) (1,3) (1,7) (1,8) (2,1) (2,2) 1, 14 5 2, 5 25 1, 9 45 1, 21 9, 11 2, 10 45 1, 22 13 1 211 2, 4 2, 21 13 25 5 4, 7 (2,3) (2,6) (2,8) (2,5) (2,7) (3,1) 2 3 1, 7 8, 13 17 1
  • 53.
    RUM INDEX -- similarityranking select description_lex <=> to_tsquery('epic’) as similarity from books; -- find description with 2 words located one after another select * from books where description_lex @@ to_tsquery(‘hello <-> world’);
  • 54.
    COMPAR ING VECTOR CLOCKS RUM INDEX 1. GINon steroids (bigger but more capable) 2. Allows to query for terms and their relative positions in text 3. Supports Index Scan and EXCLUDE create extension rum; create index ix_books_content on books using rum(content_lex);
  • 55.
  • 56.
    1. Uses onlylexical similarity and is language-sensitive 2. Misses the context (meaning) 3. Works only on text LIMITS OF FULL-TEXT SEARCH select * from books where description_lex @@ to_tsquery(‘white <-> castle’); select * from books where description_lex @@ to_tsquery(‘white <-> fortress’);
  • 57.
    VECTOR SIMILARITY 101 R G B 250185 0 115 0 152 Embedding
  • 58.
  • 59.
    IVFFLAT VECTOR INDEX create indexix_books_content on books using ivfflat(embedding content_lex) with (lists = 1000); select * from items where embedding <-> ‘[3,1,2]’ < 5;
  • 60.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
  • 61.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 lists create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
  • 62.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 centroids create index on items using ivfflat(embedding vector_l2_ops) with (lists=3);
  • 63.
  • 64.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 set ivfflat.probes = 2; select * from items where embedding <-> ‘[1,2]’ < 5; [1,2]
  • 65.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4; [1,2]
  • 66.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 67.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 68.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H L1 L2 L3 C1 C2 C3 [1,2] set ivfflat.probes = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 69.
    HNSW VECTOR INDEX createindex on items using hnsw (embedding vector_cosine_ops) with (m = 16, ef_construction = 64);
  • 70.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=2, ef_construction=3);
  • 71.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=2, ef_construction=3); Layer 1
  • 72.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=3, ef_construction=3); Layer 1 Layer 2
  • 73.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H create index on items using hnsw(embedding vector_l2_ops) with (m=3, ef_construction=3); Layer 1 Layer 2 Layer 3
  • 74.
  • 75.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 76.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 77.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 78.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 79.
    INVERTED FILE WITHFLAT COMPRESSION x y A C B D G E L O M N F I K J H [1,2] set hnsw.ef_search = 2; select * from items order by embedding <-> ‘[1,2]’ take 4;
  • 80.
    1. Fast buildtime 2. Smaller size 3. Slower query performance 4. Bad for frequent index updates 1. Slow initial build time 2. Bigger index size 3. Faster performance 4. Better recall after updates IVFFLAT INDEX HNSW INDEX
  • 81.
  • 82.