I have imported a copy of the IP2Location LITE DB11 database, which contains 3,319,097 rows, and I am looking to optimize a numeric range query where the low and high values are stored in separate columns of the table (ip_from, ip_to).
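For context, the ip_from/ip_to values are IPv4 addresses packed into a single integer (a*2^24 + b*2^16 + c*2^8 + d, per the IP2Location format). A quick sketch of the conversion in Python; the literal 2538629520 probed in the query plans below decodes to 151.80.105.144:

```python
import socket
import struct

def ip_to_num(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into the bigint form used by ip_from/ip_to."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def num_to_ip(num: int) -> str:
    """Inverse: unpack an IP number back into dotted-quad notation."""
    return socket.inet_ntoa(struct.pack("!I", num))

print(ip_to_num("151.80.105.144"))  # 2538629520, the value probed below
print(num_to_ip(2538629520))        # 151.80.105.144
```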
Importing the database:
CREATE TABLE ip2location_db11 (
    ip_from      bigint NOT NULL,                  -- First IP address in netblock.
    ip_to        bigint NOT NULL,                  -- Last IP address in netblock.
    country_code character(2) NOT NULL,            -- Two-character country code based on ISO 3166.
    country_name character varying(64) NOT NULL,   -- Country name based on ISO 3166.
    region_name  character varying(128) NOT NULL,  -- Region or state name.
    city_name    character varying(128) NOT NULL,  -- City name.
    latitude     real NOT NULL,                    -- City latitude. Defaults to capital city latitude if city is unknown.
    longitude    real NOT NULL,                    -- City longitude. Defaults to capital city longitude if city is unknown.
    zip_code     character varying(30) NOT NULL,   -- ZIP/Postal code.
    time_zone    character varying(8) NOT NULL,    -- UTC time zone (with DST supported).
    CONSTRAINT ip2location_db11_pkey PRIMARY KEY (ip_from, ip_to)
);

\copy ip2location_db11 FROM 'IP2LOCATION-LITE-DB11.CSV' WITH CSV QUOTE AS '"';

My first naive indexing attempt was to create separate indexes on each of those columns, which resulted in a sequential scan with query times around 400 ms:
account=> CREATE INDEX ip_from_db11_idx ON ip2location_db11 (ip_from);
account=> CREATE INDEX ip_to_db11_idx ON ip2location_db11 (ip_to);
account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.ip2location_db11  (cost=0.00..48930.99 rows=43111 width=842) (actual time=286.714..401.805 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Filter: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
   Rows Removed by Filter: 3319096
 Planning time: 0.155 ms
 Execution time: 401.834 ms
(6 rows)

account=> \d ip2location_db11
           Table "public.ip2location_db11"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 ip_from      | bigint                 | not null
 ip_to        | bigint                 | not null
 country_code | character(2)           | not null
 country_name | character varying(64)  | not null
 region_name  | character varying(128) | not null
 city_name    | character varying(128) | not null
 latitude     | real                   | not null
 longitude    | real                   | not null
 zip_code     | character varying(30)  | not null
 time_zone    | character varying(8)   | not null
Indexes:
    "ip2location_db11_pkey" PRIMARY KEY, btree (ip_from, ip_to)
    "ip_from_db11_idx" btree (ip_from)
    "ip_to_db11_idx" btree (ip_to)

My second attempt was to create a multi-column btree index. The planner still chose an index scan (on ip_to_db11_idx rather than the new index), with query times around 290 ms:
account=> CREATE INDEX ip_range_db11_idx ON ip2location_db11 (ip_from, ip_to);
account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ip_to_db11_idx on public.ip2location_db11  (cost=0.43..51334.91 rows=756866 width=69) (actual time=1.109..289.143 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Index Cond: ('2538629520'::bigint <= ip2location_db11.ip_to)
   Filter: ('2538629520'::bigint >= ip2location_db11.ip_from)
   Rows Removed by Filter: 1160706
 Planning time: 0.324 ms
 Execution time: 289.172 ms
(7 rows)

n4l_account=> \d ip2location_db11
           Table "public.ip2location_db11"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 ip_from      | bigint                 | not null
 ip_to        | bigint                 | not null
 country_code | character(2)           | not null
 country_name | character varying(64)  | not null
 region_name  | character varying(128) | not null
 city_name    | character varying(128) | not null
 latitude     | real                   | not null
 longitude    | real                   | not null
 zip_code     | character varying(30)  | not null
 time_zone    | character varying(8)   | not null
Indexes:
    "ip2location_db11_pkey" PRIMARY KEY, btree (ip_from, ip_to)
    "ip_from_db11_idx" btree (ip_from)
    "ip_range_db11_idx" btree (ip_from, ip_to)
    "ip_to_db11_idx" btree (ip_to)

Update: as requested in the comments, I have re-run the query. Timings for the first 15 runs after re-creating the table were: 165 ms, 65 ms, 86 ms, 83 ms, 86 ms, 64 ms, 85 ms, 811 ms, 868 ms, 845 ms, 810 ms, 781 ms, 797 ms, 890 ms, 806 ms.
account=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING) SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.ip2location_db11  (cost=28200.29..76843.12 rows=368789 width=842) (actual time=64.866..64.866 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Recheck Cond: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
   Heap Blocks: exact=1
   Buffers: shared hit=8273
   ->  Bitmap Index Scan on ip_range_db11_idx  (cost=0.00..28108.09 rows=368789 width=0) (actual time=64.859..64.859 rows=1 loops=1)
         Index Cond: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
         Buffers: shared hit=8272
 Planning time: 0.099 ms
 Execution time: 64.907 ms
(10 rows)

account=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING) SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.ip2location_db11  (cost=0.00..92906.18 rows=754776 width=69) (actual time=577.234..811.757 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Filter: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
   Rows Removed by Filter: 3319096
   Buffers: shared hit=33 read=43078
 Planning time: 0.667 ms
 Execution time: 811.783 ms
(7 rows)

Sample rows from the imported CSV file:
"0","16777215","-","-","-","-","0.000000","0.000000","-","-" "16777216","16777471","AU","Australia","Queensland","Brisbane","-27.467940","153.028090","4000","+10:00" "16777472","16778239","CN","China","Fujian","Fuzhou","26.061390","119.306110","350004","+08:00" Is there a better way to index this table that would improve the query, or is there a more efficient query that would get me the same result?