S3 write failure with alluxio 2.9.4 release #18643

@pragnesh

Description

Alluxio Version:
2.9.4

Describe the bug
After upgrading Alluxio from 2.9.3 to 2.9.4, we are seeing the following exception when a Spark job writes output to a directory mounted on S3 as the UFS. Multiple jobs are running that write to the same location.

2024-06-28 14:15:33,548 ERROR [task-execution-service-2](S3AOutputStream.java:156) - Failed to upload s3path/y=2024/mo=06/d=28/h=13/part-00008-df7dc7d1-cde0-4ba9-98b4-e6563e37ee59.c000.snappy.parquet
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@22e014ef rejected from java.util.concurrent.ThreadPoolExecutor@722238d8[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
	at com.amazonaws.services.s3.transfer.internal.UploadMonitor.create(UploadMonitor.java:95)
	at com.amazonaws.services.s3.transfer.TransferManager.doUpload(TransferManager.java:685)
	at com.amazonaws.services.s3.transfer.TransferManager.upload(TransferManager.java:534)
	at alluxio.underfs.s3a.S3AOutputStream.close(S3AOutputStream.java:154)
	at com.google.common.io.Closer.close(Closer.java:218)
	at alluxio.job.plan.persist.PersistDefinition.runTask(PersistDefinition.java:183)
	at alluxio.job.plan.persist.PersistDefinition.runTask(PersistDefinition.java:57)
	at alluxio.worker.job.task.TaskExecutor.run(TaskExecutor.java:88)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
2024-06-28 14:15:33,553 INFO [task-execution-service-2](TaskExecutorManager.java:204) - Task 0 for job 1719583916388 failed:
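The stack trace shows the AWS SDK's TransferManager trying to submit the upload task to a ThreadPoolExecutor that is already in the Terminated state, which means its executor was shut down before S3AOutputStream.close() ran. As a minimal sketch (not Alluxio code, just the JDK failure mode), submitting to any executor after shutdown reproduces this exact exception:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedSubmitDemo {
    // Returns true if submit() after shutdown is rejected,
    // mirroring the "rejected from ... [Terminated, ...]" error in the log.
    static boolean submitAfterShutdownIsRejected() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.shutdownNow(); // pool transitions toward the Terminated state seen in the log
        try {
            pool.submit(() -> "upload part");
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws, just like in the stack trace
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdownIsRejected()); // prints true
    }
}
```

If 2.9.4 changed when the transfer manager (or its executor) is closed relative to stream close, that would explain why the same workload passed on 2.9.3.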

To Reproduce
Run a Spark job with Alluxio 2.9.4 using S3 as the UFS, with multiple jobs writing to the same output location.

Expected behavior
Alluxio should be able to upload the file to S3; instead the upload fails with the exception above.

Urgency
We are unable to upgrade to 2.9.4 until this is resolved.

Are you planning to fix it
Not working on it, but I am willing to help with a PR if someone points me in the right direction.

Additional context
Other jobs are running that also write to the same location, so an S3 metadata update could be the issue.
I also see that the default value of the alluxio.proxy.s3.bucketpathcache.timeout property changed from 1 min to 0 min in the 2.9.4 release.
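To test whether the bucket-path cache change is related, one workaround to try (assuming the property is still honored in 2.9.4 and that this cache is actually on the affected path, which I have not confirmed) is restoring the previous default in alluxio-site.properties:

```properties
# Hypothetical workaround: restore the pre-2.9.4 default cache timeout
alluxio.proxy.s3.bucketpathcache.timeout=1min
```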

Labels: type-bug (This issue is about a bug)