How is the information_schema table in Trino updated?

Question

We use Trino (https://trino.io/) to connect to HDFS. I discovered that the data in the information_schema tables, for example:

select * from information_schema.columns clz where clz.table_catalog = 'hive' and clz.table_schema = '<schema_name>' and clz.table_name = '<table_name>'

doesn’t always match up with what I get if I run

show tables from [schema]
show columns in [schema].[table]

etc. It seems that the show tables/show columns commands pretty much always match up with what I see if I run the hadoop command (hadoop fs -ls ...) to show the contents of the hdfs folder.

So I’m trying to figure out:

why the information_schema doesn’t give the same results as show tables/show columns/etc.
if there is a way to refresh/update information_schema to make it current

Thank you.

Manfred Moser · Accepted Answer · 2022-09-27 15:51:46Z

The information_schema in Trino just exposes the underlying schema data from each data source. Its contents therefore vary depending on the data source and connector in use:

For connectors to an RDBMS such as PostgreSQL, it basically just exposes the information schema from PostgreSQL after applying type mapping and similar adjustments.
For the Hive and Iceberg connectors, it exposes the information from the Hive metastore service (HMS) and the table format.
For other systems such as Elasticsearch, it is completely different again, but it basically always gets the information from the underlying system.

For your specific case, the results might not match up if some external system also modifies the object storage and the HMS. In particular, there is also a metadata cache in play for the HMS, and it can be stale.
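
If a stale metastore cache is the suspected cause, one thing to try is flushing it from SQL. This is a minimal sketch, assuming the catalog is named hive as in the question and that the Trino version in use provides the Hive connector's flush_metadata_cache procedure. The schema and table names are the question's placeholders, not real values.

```sql
-- Assumption: the catalog is called "hive", as in the question.
-- Flush all cached metastore entries for that catalog.
CALL hive.system.flush_metadata_cache();

-- Or flush only the entries for a single table (named arguments).
CALL hive.system.flush_metadata_cache(
    schema_name => '<schema_name>',
    table_name => '<table_name>'
);
```

After the flush, re-running the information_schema.columns query and show columns should show whether the cache explained the difference. Metastore caching is controlled by catalog properties such as hive.metastore-cache-ttl, so checking that setting may also be worthwhile.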
