Given a Parquet file, how can I create the corresponding table in my Redshift database? Note that the Parquet file is Snappy-compressed.
- Amazon Redshift Can Now COPY from Parquet and ORC File Formats – John Rotenstein, Apr 15, 2021
- Is your real problem that you don't know what columns are stored in the file? – Parsifal, Apr 15, 2021
- @Parsifal Yes, I don't want to guess at the column types, but I can't COPY the data unless I create the table first. – BugCatcherJoe, Apr 15, 2021
1 Answer
If you're dealing with multiple files, especially over the long term, then I think the best solution is to upload them to an S3 bucket and run a Glue crawler over it.
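As a minimal sketch of that step using boto3, something like the following would register the crawler and kick it off. The bucket path, IAM role ARN, and all names here are placeholders, not anything from your environment:

```python
import boto3

glue = boto3.client("glue")

# Point the crawler at the S3 prefix holding the Parquet files; the
# built-in Parquet classifier handles Snappy compression transparently.
glue.create_crawler(
    Name="parquet-crawler",  # hypothetical crawler name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # your IAM role
    DatabaseName="parquet_catalog",  # Glue catalog database to populate
    Targets={"S3Targets": [{"Path": "s3://my-bucket/parquet-data/"}]},
)

# Run it; the inferred table (column names and types) lands in the catalog.
glue.start_crawler(Name="parquet-crawler")
```

Once the crawler finishes, you can inspect the inferred schema in the Glue console rather than guessing at column types yourself.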
In addition to populating the Glue data catalog, you can use this information to configure external tables for Redshift Spectrum, and then create your on-cluster tables with CREATE TABLE AS SELECT.
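Here's a sketch of that Spectrum route using the redshift_connector driver; all identifiers (the external schema, catalog database, table names, and IAM role) are assumptions you'd replace with your own:

```python
import redshift_connector  # pip install redshift-connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# Expose the Glue catalog database as an external schema in Redshift.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_schema
    FROM DATA CATALOG
    DATABASE 'parquet_catalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
""")

# Materialize an on-cluster copy; the column names and types carry over
# from the external table's definition, so no hand-written DDL is needed.
cur.execute("""
    CREATE TABLE my_local_table AS
    SELECT * FROM spectrum_schema.my_crawled_table
""")
conn.commit()
```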
If this is just a one-off task, then I've used parquet-tools in the past. The version that I've used is a Java library, but I see that there's also a version on PyPI.
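For a one-off, a pure-Python alternative to parquet-tools is to read the schema with pyarrow, which gives you the column names and types you need to hand-write the CREATE TABLE statement; the file path below is a placeholder:

```python
import pyarrow.parquet as pq

# Reads only the footer metadata, so this is fast even for large
# (Snappy-compressed) Parquet files.
schema = pq.read_schema("my_file.parquet")
for field in schema:
    print(field.name, field.type)
```

Mapping the printed Arrow types to Redshift types (e.g. int64 to BIGINT, string to VARCHAR) is then a straightforward manual step.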