Skip to main content
-3 votes
1 answer
145 views

Issue: Flink application throws Thread 'jobmanager-io-thread-25' produced an uncaught exception. java.lang.OutOfMemoryError: Direct buffer memory and terminates after running for 2-3 days. No matter ...
Strange's user avatar
  • 1,514
0 votes
0 answers
76 views

I am observing different write behaviors when executing queries on EMR Notebook (correct behavior) vs when using spark-submit to submit a spark application to EMR Cluster (incorrect behavior). When I ...
shiva's user avatar
  • 2,781
0 votes
0 answers
70 views

I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...
Vishwas Singh's user avatar
1 vote
0 answers
67 views

I am connecting to an EMR cluster through SageMaker Unified Studio(JupyterLab). My EMR cluster is configured with Delta Lake support, and I have the following Spark properties set on the cluster: ...
sakshi's user avatar
  • 41
0 votes
0 answers
61 views

I have one Iceberg table in Glue Catalog. I am unable to runw a select * as one of metadata file is missing. I am trying to point to latest metadata file. How can I do that? I am using EMR 7.7 with ...
user3858193's user avatar
  • 1,558
2 votes
0 answers
166 views

I'm trying to connect to an existing EMR cluster from SageMaker Unified Studio to run SQL queries via JupyterLab. SageMaker requires that the EMR cluster be runtime role-enabled to integrate with ...
valzor's user avatar
  • 315
0 votes
1 answer
59 views

I am using emr 6.15 and hudi 0.14 I submitted following hudi job which should create a database and a table in aws glue. IAM Role assigned to EMR serverless has all neccessary permissions of s3 and ...
Roobal Jindal's user avatar
1 vote
0 answers
56 views

I have successfully implemented the IBM S3 Shuffle Plugin v0.9.6 (https://github.com/IBM/spark-s3-shuffle) on EMR on EKS (Spark 3.5.0) and the shuffle operations are working correctly with S3 storage. ...
metersk's user avatar
  • 12.7k
0 votes
1 answer
141 views

I am writing data into s3 and table format is Iceberg in Glue Catalog. I see the /data and /metadata folders are getting created. However when I am writing data, it's creating 001/002 kind of folders. ...
user3858193's user avatar
  • 1,558
0 votes
0 answers
40 views

I want to install external Python packages on EMR with an EC2 setup, but currently, apart from bootstrap actions, nothing else seems to be working. The problem with this setup is that if I want to ...
RushHour's user avatar
  • 645
3 votes
1 answer
104 views

Having trouble getting dynamic allocation to properly terminate idle executors when using FSx Lustre for shuffle persistence on EMR 7.8 (Spark 3.5.4) on EKS. Trying this strategy out to battle cost ...
metersk's user avatar
  • 12.7k
0 votes
0 answers
41 views

I am exploring data write into glue Table (Iceberg Table format). I have been using saveAsTable method mentioned as option1 . However is there any difference between two methods. Iceberg stores ...
user3858193's user avatar
  • 1,558
0 votes
1 answer
104 views

I have a pyspark script that reads data from S3 in a different AWS account, using AssumedRoleCredentialProvider , it is working on emr serverless 6.9 but when I upgrade to EMR Serverless 7.5 it fails ...
Sayed's user avatar
  • 11
0 votes
0 answers
33 views

I have an EMR cluster configured with the following SecurityConfiguration: "AuthenticationConfiguration": { "IdentityCenterConfiguration": { "EnableIdentityCenter":...
ExK's user avatar
  • 1
0 votes
0 answers
59 views

Gives the below JSON: { "environments": [ {"env": "dev", "description": "dev environment"}, {"env": "dev01", "...
dplvs's user avatar
  • 43

15 30 50 per page
1
2 3 4 5
333