
I am using the following code snippet for listing objects in a bucket.

    objectListing = client.listObjects(bucketname);
    do {
        for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
            System.out.printf(" - %s (size: %d)\n", objectSummary.getKey(), objectSummary.getSize());
        }
        // the next batch is fetched here, after printing, so a final batch
        // that is no longer truncated is fetched but never printed
        objectListing = client.listNextBatchOfObjects(objectListing);
    } while (objectListing.isTruncated());

I am not able to get the last batch of objects. I did some research: the usual approach saves each batch into a list, but I cannot collect all the objects in a list, because there are millions of them and that sometimes causes heap memory problems. How can I get all the objects? Thanks!

Update:

I am running this:

    BasicAWSCredentials credentials = new BasicAWSCredentials("foo", "bar");
    AmazonS3 client = AmazonS3ClientBuilder
            .standard()
            .withCredentials(new AWSStaticCredentialsProvider(credentials))
            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("http://localhost:" + port, null))
            .withPathStyleAccessEnabled(true)
            .withChunkedEncodingDisabled(true)
            .build();

    ObjectListing listing = client.listObjects("bucketname");
    System.out.println("Listing size " + listing.getObjectSummaries().size());
    System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
    System.out.println("At 999 index " + listing.getObjectSummaries().get(999).getKey());
    while (listing.isTruncated()) {
        System.out.println("-----------------------------------------------");
        listing = client.listNextBatchOfObjects(listing);
        System.out.println("Listing size " + listing.getObjectSummaries().size());
        System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
        System.out.println("At 1000 index " + listing.getObjectSummaries().get(1000).getKey());
    }

I am getting the following result. Note that each batch after the first reports 1001 entries, and its first key repeats the last key of the previous batch:

    Listing size 1000
    At 0 index folder1/a.gz
    At 999 index folder1/b.gz
    ---------------------------------------------------------------
    Listing size 1001
    At 0 index folder1/b.gz
    At 1000 index folder1/d.gz
    ---------------------------------------------------------------
    Listing size 1001
    At 0 index folder1/d.gz
    At 1000 index folder1/e.gz

1 Answer


Simple and Straightforward

    ObjectListing listing = s3.listObjects(bucketName, prefix);
    List<S3ObjectSummary> summaries = listing.getObjectSummaries();
    while (listing.isTruncated()) {
        listing = s3.listNextBatchOfObjects(listing);
        summaries.addAll(listing.getObjectSummaries());
    }

Or, if you would rather not hold everything in memory:

    ObjectListing listing = s3.listObjects(bucketName, prefix);
    doSomeProcessing(listing);
    while (listing.isTruncated()) {
        listing = s3.listNextBatchOfObjects(listing);
        doSomeProcessing(listing);
    }
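Here, doSomeProcessing is a placeholder for whatever you do with each batch. As a hypothetical example (not part of the original answer), it could print each summary as it arrives, so nothing accumulates on the heap:

    // Hypothetical doSomeProcessing: handle each summary as it arrives
    // instead of collecting it, so memory use stays flat no matter how
    // many objects the bucket holds.
    private static void doSomeProcessing(ObjectListing listing) {
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            System.out.printf(" - %s (size: %d)%n", summary.getKey(), summary.getSize());
        }
    }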

Update:
Regarding the comment below about repeating elements, I ran the code below.

"Yeah, I am getting objects, but the 1000th and 1001st objects are duplicates, then the 2001st and 2002nd, and so on. How can I avoid this with the second method? @raevilman Thank you"

    public static void main(String[] args) {
        int i = 0;
        System.out.println("start");
        ObjectListing listing = s3Client.listObjects("emr-logs");
        System.out.println("Listing size " + listing.getObjectSummaries().size());
        System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
        System.out.println("At 999 index " + listing.getObjectSummaries().get(999).getKey());
        while (listing.isTruncated()) {
            if (i > 3) break; // stop after a few batches for this test
            System.out.println("========================================================================");
            listing = s3Client.listNextBatchOfObjects(listing);
            System.out.println("Listing size " + listing.getObjectSummaries().size());
            System.out.println("At 0 index " + listing.getObjectSummaries().get(0).getKey());
            System.out.println("At 999 index " + listing.getObjectSummaries().get(999).getKey());
            i++;
        }
        System.out.println("end");
    }

I got the results below, with no repeating elements:

    start
    Listing size 1000
    At 0 index j-10HD9DMBVVTJL/containers/application_1507189355052_0001/container_1507189355052_0001_01_000001/stderr.gz
    At 999 index j-156WGS0LMKA2I/node/i-00085367e194fc02a/daemons/instance-state/instance-state.log-2017-11-16-05-15.gz
    ========================================================================
    Listing size 1000
    At 0 index j-156WGS0LMKA2I/node/i-00085367e194fc02a/daemons/instance-state/instance-state.log-2017-11-16-05-30.gz
    At 999 index j-182UIXOOU8GZ6/node/i-061ffd1d1ae11da74/provision-node/0d1707a0-71dd-4dd5-a1dc-ab226ee2d150/stdout.gz
    ========================================================================
    Listing size 1000
    At 0 index j-182UIXOOU8GZ6/node/i-061ffd1d1ae11da74/provision-node/apps-phase/stderr.gz
    At 999 index j-1BW9J554DDY15/containers/application_1521803257216_0002/container_1521803257216_0002_01_000002/stderr.gz
    ========================================================================
    Listing size 1000
    At 0 index j-1BW9J554DDY15/containers/application_1521803257216_0002/container_1521803257216_0002_01_000002/stdout.gz
    At 999 index j-1EKRPTSEXCTB5/node/i-0576a3c452d00384b/applications/hadoop/steps/s-2B5LZ2PC741FD/controller.gz
    ========================================================================
    Listing size 1000
    At 0 index j-1EKRPTSEXCTB5/node/i-0576a3c452d00384b/applications/hadoop/steps/s-2B5LZ2PC741FD/stderr.gz
    At 999 index j-1G6AYY5EMTR94/node/i-02363f6ac11c89135/daemons/instance-state/instance-state.log-2017-10-29-14-15.gz
    end

    Process finished with exit code 0
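If the duplicates only show up against your local endpoint, a plausible cause is the mock echoing the marker key back as the first element of the next batch (note the 1001-sized batches in your output; real S3 caps list results at 1000 keys and does not repeat them). A minimal defensive sketch under that assumption, not something standard S3 requires:

    // Sketch of a de-duplicating page loop. Assumes the only duplicates are
    // keys echoed across batch boundaries (e.g. by a local S3 mock); real S3
    // does not repeat keys, so the guard simply never fires there.
    static void listAllObjects(AmazonS3 s3Client, String bucketName) {
        String lastKey = null;
        ObjectListing listing = s3Client.listObjects(bucketName);
        while (true) {
            for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                if (summary.getKey().equals(lastKey)) {
                    continue; // skip a key we already processed
                }
                lastKey = summary.getKey();
                System.out.println(lastKey); // process the object here
            }
            if (!listing.isTruncated()) {
                break; // the final batch has been processed above
            }
            listing = s3Client.listNextBatchOfObjects(listing);
        }
    }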

14 Comments

A list is being used to save the object summaries, and everything accumulates in the summaries list. I can't use a list to hold millions of objects. @raevilman
OK. In that case, you can process the summaries inside the while loop instead of adding them to a list...
Yeah, I am getting objects, but the 1000th and 1001st objects are duplicates, then the 2001st and 2002nd, and so on. How can I avoid this with the second method? @raevilman Thank you
I ran the code but didn't get repeating elements. I updated the answer; please have a look and suggest any changes if needed.
I used your sample code with my S3 credentials and I am still getting repeating elements; I don't know why. Could there be some difference in the AmazonS3 s3Client connection or something? @raevilman Thank you.
