Actually, a lot of folks have asked a similar question. Right now it's not possible to add an extension to the unloaded files (Parquet unloads do get an extension, though).
The reason is that Redshift exports in parallel by default, which is a good thing: each slice exports its own data. From the docs:
> PARALLEL
>
> By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates the following three files.
So UNLOAD has to start a new file after 6.2 GB, which is why a number is added as a suffix to each file name.
How do we solve this?
There is no native option in Redshift, but we can work around it with Lambda.
- Create a new S3 bucket and a folder inside it specifically for this process (eg: s3://unloadbucket/redshift-files/). Your unload files should go to this folder.
- Trigger a Lambda function on the S3 put-object event.
- The Lambda function then:
  - Downloads the file (if it is large, use EFS)
  - Renames it with .csv
  - Uploads it to the same bucket (or a different bucket) under a different path (eg: s3://unloadbucket/csvfiles/)
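The Lambda steps above could be sketched roughly as below (a Python sketch, not a definitive implementation; the bucket and the `csvfiles/` destination prefix follow the example paths in this answer, and `csv_key` is a helper name I made up):

```python
import os
import urllib.parse

DEST_PREFIX = "csvfiles/"  # hypothetical destination folder for the renamed files

def csv_key(src_key, dest_prefix=DEST_PREFIX):
    """Map an unloaded object key to its renamed .csv key."""
    base = os.path.basename(src_key)   # e.g. "venue_0000"
    return dest_prefix + base + ".csv" # e.g. "csvfiles/venue_0000.csv"

def handler(event, context):
    # boto3 is available in the Lambda runtime; imported here so the
    # pure renaming logic above can be used without it.
    import boto3
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        local = "/tmp/" + os.path.basename(key)  # for large files, use EFS instead
        s3.download_file(bucket, key, local)     # download the unloaded file
        s3.upload_file(local, bucket, csv_key(key))  # re-upload with .csv
        os.remove(local)
```

Note that for a pure rename, a server-side `copy_object` would also work and avoids the download/upload round trip; the download variant is shown to match the steps above.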
Or, even simpler, use a shell/PowerShell script to do the following:
- Download the file
- Rename it with .csv
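A minimal shell sketch of that process, assuming the AWS CLI is installed and configured (the bucket and prefixes are the hypothetical example paths from above):

```shell
#!/bin/sh
SRC="s3://unloadbucket/redshift-files/"  # where UNLOAD writes
DEST="s3://unloadbucket/csvfiles/"       # where renamed files go

# Append .csv to every file in the given directory.
rename_to_csv() {
  for f in "$1"/*; do
    [ -e "$f" ] && mv "$f" "$f.csv"
  done
}

# Run the full download/rename/upload cycle only when invoked
# with "run", so the function above can be reused on its own.
if [ "$1" = "run" ]; then
  WORK=$(mktemp -d)
  aws s3 cp "$SRC" "$WORK" --recursive   # download the unloaded files
  rename_to_csv "$WORK"                  # rename them with .csv
  aws s3 cp "$WORK" "$DEST" --recursive  # upload to the csv folder
fi
```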