35

I'm trying to optimize my PostgreSQL 8.3 DB tables to the best of my ability, and I'm unsure if I need to use varchar_pattern_ops for certain columns where I'm performing a LIKE against the first N characters of a string. According to this documentation, the use of xxx_pattern_ops is only necessary "...when the server does not use the standard 'C' locale".

Can someone explain what this means? How do I check what locale my database is using?

6 Answers

34

Currently, some locale support can only be set at initdb time, and LC_COLLATE, the setting relevant to _pattern_ops, is one of those: in 8.3 it is fixed when the cluster is initialized and cannot be changed with SET at runtime. To see the current values, use the SHOW command.

For example:

SHOW LC_COLLATE;

_pattern_ops indexes are useful on columns that are queried with pattern matching constructs, such as LIKE or regular expressions. You still have to create a regular index (without _pattern_ops) to do equality searches on an index. Take all of this into consideration when deciding whether you need such indexes on your tables.

As for what a locale is: it is a set of rules about character ordering, formatting, and similar conventions that vary from one language or country to another. For instance, the locale fr_CA (French in Canada) may have different sorting rules (or ways of displaying numbers, and so on) than en_CA (English in Canada). The standard "C" locale is the POSIX standards-compliant default locale. Under it, strings are compared byte by byte, only ASCII letters are treated as letters, and formatting mostly follows US English conventions.

In computing, locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier.
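
To make the ordering difference concrete, here is a minimal Python sketch. It is an illustration only: sorting by raw UTF-8 bytes mirrors the byte-wise comparison of the "C" locale, while the case-folding sort is just a rough stand-in for a language-aware collation such as en_US (real collations are more involved). The word list and both helper functions are made up for this example.

```python
# Illustration only: byte-wise ("C"-style) ordering vs. a dictionary-like ordering.
words = ["banana", "Apple", "apple", "Banana", "zebra", "Zulu"]


def c_locale_sort(items):
    """Sort by raw UTF-8 bytes, i.e. a plain byte-by-byte comparison."""
    return sorted(items, key=lambda s: s.encode("utf-8"))


def dictionary_like_sort(items):
    """Rough stand-in for a language-aware collation: compare case-insensitively
    first, then use the original string to break ties."""
    return sorted(items, key=lambda s: (s.casefold(), s))


c_order = c_locale_sort(words)
dict_order = dictionary_like_sort(words)

print(c_order)     # all uppercase-initial words sort before lowercase ones
print(dict_order)  # upper and lower case variants are interleaved
```

In the byte-wise result every string sharing a prefix sits in one contiguous block, which is the property a plain b-tree index relies on for prefix searches.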

3 Comments

So if I'm understanding the SHOW documentation correctly, then my server's LC_COLLATE value of "en_US.UTF-8" means that it's not using the "C" locale, in which case I need to make sure to use xxx_pattern_ops. Is that right?
You need to create such indexes only if the criteria apply (pattern matching over the columns). See my edits.
24

From the command-line: psql -l

Or from the psql interface: \l

Example output:

                               List of databases
    Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges
 -----------+--------+----------+-------------+-------------+-------------------
  packrd    | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
  postgres  | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
  template0 | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
            |        |          |             |             | packrd=CTc/packrd
  template1 | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
            |        |          |             |             | packrd=CTc/packrd
 (5 rows)

8

OK, from my perusings, it appears that this initial setting

initdb --locale=xxx

 --locale=locale
     Specifies the locale to be used in this database. This is equivalent to specifying both --lc-collate and --lc-ctype.

basically specifies the default locale for all databases that you create afterwards (i.e. it specifies the settings for template1, which is the default template). Locale is different from encoding. In PostgreSQL 8.4 and later you can create a new database with a different locale, encoding, or both by spelling them out explicitly:

 CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0; 

Basically if you don't specify it, it uses the system default, which is almost never "C".

So if SHOW LC_COLLATE returns anything other than "C" or "POSIX", you are not using the standard C locale and you will need to specify xxx_pattern_ops for indexes that should serve LIKE. Note also the caveat that if you want to use the <, <=, >, or >= operators, you need to create a second index without the xxx_pattern_ops operator class (unless your database uses the standard C locale, which is rare). For just = and LIKE (etc.), you don't need a second index. And if you don't need LIKE at all, you probably don't need the xxx_pattern_ops index either.
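
The decision rule above can be sketched as a small Python helper. This is only a sketch of the rule as stated in this answer, not anything PostgreSQL provides: the function names, the generated index names, and the choice of varchar_pattern_ops (which assumes a varchar column) are all hypothetical.

```python
def needs_pattern_ops(lc_collate):
    """True when the collation is not the standard C locale ("C" or "POSIX")."""
    return lc_collate.strip() not in ("C", "POSIX")


def index_statements(table, column, lc_collate, uses_like=True, uses_range_ops=False):
    """Return the CREATE INDEX statements suggested by the rule above.

    table and column are placeholder identifiers; varchar_pattern_ops assumes
    the column is a varchar (text columns would use text_pattern_ops).
    """
    statements = []
    if uses_like and needs_pattern_ops(lc_collate):
        # Non-C locale plus LIKE 'prefix%': use the pattern operator class.
        statements.append(
            f"CREATE INDEX {table}_{column}_like_idx ON {table} ({column} varchar_pattern_ops);"
        )
        if uses_range_ops:
            # <, <=, >, >= still need an index with the default operator class.
            statements.append(f"CREATE INDEX {table}_{column}_idx ON {table} ({column});")
    else:
        # C locale, or no LIKE queries: a plain index is enough.
        statements.append(f"CREATE INDEX {table}_{column}_idx ON {table} ({column});")
    return statements


for stmt in index_statements("users", "name", "en_US.UTF-8", uses_range_ops=True):
    print(stmt)
```

Feeding it the value reported by SHOW LC_COLLATE gives a quick checklist of which indexes to consider, but the statements should still be reviewed against the actual column types and queries.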

Even if your indexes are defined with the "default" collation, like this:

CREATE INDEX my_index_name ON table_name USING btree (identifier COLLATE pg_catalog."default"); 

That is not enough: unless the default is the "C" (or POSIX, which is the same thing) collation, the index can't be used for patterns like LIKE 'ABC%'. You need something like this:

CREATE INDEX my_index_name ON table_name USING btree (identifier COLLATE pg_catalog."default" varchar_pattern_ops); 
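
Why does the comparison order matter for LIKE 'ABC%'? A prefix search can be served by a sorted index only if it can be turned into a range, roughly "key >= 'ABC' AND key < 'ABD'", and that only works when strings are compared byte by byte. The following Python sketch illustrates the idea on a sorted list; it is a simplified illustration under that assumption, not PostgreSQL's actual implementation, and the helper names are invented for this example.

```python
import bisect


def prefix_upper_bound(prefix):
    """Smallest byte string that sorts after every string starting with prefix."""
    trimmed = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not trimmed:
        raise ValueError("no finite upper bound for this prefix")
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix using two binary searches."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix_upper_bound(prefix))
    return sorted_keys[lo:hi]


# Keys sorted byte-wise, the way a "C"-collated or xxx_pattern_ops index orders them.
keys = sorted(s.encode("utf-8") for s in ["ABC", "ABCD", "ABD", "AB", "abc", "ABC1", "B"])

matches = prefix_scan(keys, b"ABC")
print(matches)
```

With a language-aware collation the index is sorted by different rules, so a byte-wise range like this no longer lines up with the index order; the xxx_pattern_ops operator classes restore the byte-wise ordering for that one index.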

4

There is also another way (assuming you want to check them, not modify them):

Check the file /var/lib/postgres/data/postgresql.conf (the Linux default). It should contain the following lines:

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'   # locale for system error message strings
lc_monetary = 'en_US.UTF-8'   # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'    # locale for number formatting
lc_time = 'en_US.UTF-8'       # locale for time formatting

1 Comment

My DB was created with the right collation, but it always fell back to the default. Changing this solved my problem.
2

If you've got the option...

You could recreate the database cluster with the C locale.

You need to pass the locale to initdb when initializing your Postgres instance.

You can do this regardless of what the server's default or user's locale is.

That's a server administration command though, not a database schema designer's task. The cluster contains all the databases on the server, not just the one you're optimising.

It creates a brand new cluster, and does not migrate any of your existing databases or data. That'd be additional work.

Furthermore, if you're in a position where you can consider creating a new cluster as an option, you really should be considering using PostgreSQL 8.4 instead, which can have per-database locales, specified in the CREATE DATABASE statement.

2

Another, per-database way (not sure how it compares to the others):

select datname, datcollate from pg_database; 

or as a CLI one-liner:

psql -c 'select datname, datcollate from pg_database' 

Sample output:

  datname   | datcollate
 -----------+-------------
  postgres  | en_GB.UTF-8
  ciro      | en_GB.UTF-8
  template1 | en_GB.UTF-8
