
I have data in a very particular format (e.g., TDMS files generated by NI systems) stored in an S3 bucket. If the data were on my local computer, I would typically read it in Python with the npTDMS package. But how should I read these TDMS files when they are stored in an S3 bucket? One solution is to download the data, for instance to an EC2 instance, and then use npTDMS to read it into Python, but that does not seem like a perfect solution. Is there any way I can read the data similar to reading CSV files from S3?

3 Comments
  • Do you have an example of your TDMS file? Commented Dec 20, 2019 at 19:48
  • Have you looked at boto3? Commented Dec 20, 2019 at 19:48
  • Thanks for the comments. The TDMS files are large, usually more than 1 GB, so I can't share them here. Also, as suggested by "Guy", I can use boto3 for reading the file, but I can't change the working directory to the S3 bucket location. Commented Jan 2, 2020 at 20:53

3 Answers


Some Python packages (such as Pandas) support reading data directly from S3, as it is the most popular location for data. See this question, for example, on how to do that with Pandas.
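For instance, a minimal sketch of the Pandas case (the bucket and key below are placeholders; pandas needs the s3fs package installed to resolve s3:// URLs):

import pandas as pd

# pandas resolves s3:// URLs via s3fs, so this reads straight from the bucket
df = pd.read_csv("s3://my-bucket/path/to/data.csv")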

If the package (npTDMS) doesn't support reading directly from S3, you should copy the data to the local disk of the notebook instance.

The simplest way to copy is to run the AWS CLI in a cell in your notebook:

!aws s3 cp s3://bucket_name/path_to_your_data/ data/ --recursive

This command copies all the files under the "folder" (prefix) in S3 to the local folder data; the --recursive flag is needed to copy more than a single object.
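Once the files are local, npTDMS can read them as usual. A minimal sketch, assuming npTDMS 1.x and that one of the copied files is named data/measurement.tdms (a placeholder name):

from nptdms import TdmsFile

# Read a TDMS file that was copied from S3 into the local "data" folder
tdms_file = TdmsFile.read("data/measurement.tdms")
for group in tdms_file.groups():
    for channel in group.channels():
        print(group.name, channel.name, channel[:10])  # first 10 samples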

You can do a more fine-grained copy, filtering files and handling other specific requirements, using boto3's rich capabilities. For example:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objs = bucket.objects.filter(Prefix='myprefix')
for obj in objs:
    # ObjectSummary has no download_file method; call it on the bucket instead,
    # using each object's key as the local file name
    bucket.download_file(obj.key, obj.key)

1 Comment

Thanks for the answer and clarification. I think what you said is the best (and only) possible solution to the problem.
import boto3

s3 = boto3.resource('s3')
bucketname = "your-bucket-name"
filename = "the-file-you-want-to-read"
obj = s3.Object(bucketname, filename)
# Read the object's contents into memory as bytes
body = obj.get()['Body'].read()
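Since npTDMS's TdmsFile.read accepts a file-like object as well as a path, you can combine this with io.BytesIO to read a TDMS file without writing it to disk. A sketch (bucket and key are placeholders; note that the whole file is held in memory, which matters for files over 1 GB):

import io
import boto3
from nptdms import TdmsFile

s3 = boto3.resource('s3')
obj = s3.Object("your-bucket-name", "path/to/file.tdms")

# Wrap the downloaded bytes in a file-like object and hand it to npTDMS;
# the entire file is kept in memory, so watch out with multi-GB files
body = io.BytesIO(obj.get()['Body'].read())
tdms_file = TdmsFile.read(body)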



boto3 is the default option; as an alternative, awswrangler provides some nice wrappers.
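For example, a sketch using awswrangler's s3.download helper (available in recent awswrangler releases) to pull an object to local disk before reading it with npTDMS; the bucket, key, and local file name are placeholders:

import awswrangler as wr

# Download the object to a local file, then read it with npTDMS as usual
wr.s3.download(path="s3://my-bucket/path/to/file.tdms", local_file="file.tdms")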

