0 votes
0 answers
13 views

I have MongoDB collections forms and submissions where forms define dynamic UI components (textfield, checkbox, radio, selectboxes, columns, tables, datagrids) and submissions contain the user data in ...
asked by Aniruth N (106 rep)
2 votes
0 answers
27 views

I have the following setup: a Kubernetes cluster with Spark Connect 4.0.1 and an MLflow tracking server 3.5.0. The MLflow tracking server should serve all artifacts and is configured this way: --backend-store-...
asked by hage (6,213 rep)
0 votes
1 answer
49 views

I have a Spark job that runs daily to load data from S3. The data are composed of thousands of gzip files. However, in some cases there are one or two corrupted files in S3, and that causes the whole ...
asked by Nakeuh (1,933 rep)
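For jobs that should tolerate a few bad input files, Spark's generic file-source options include `ignoreCorruptFiles`, which logs and skips unreadable files rather than failing the stage. A minimal sketch, assuming the daily path and JSON format below (both hypothetical placeholders; whether a truncated gzip stream is caught can also vary by Spark version):

```python
# Hedged sketch: make a daily S3 load survive one or two corrupted gzip files.
# `ignoreCorruptFiles` is a documented generic file-source read option.

def corrupt_file_safe_options() -> dict:
    """Reader options that tolerate corrupted input files."""
    return {"ignoreCorruptFiles": "true"}

def load_daily(spark, path="s3a://my-bucket/daily/"):  # hypothetical path
    return (
        spark.read
        .options(**corrupt_file_safe_options())
        .json(path)  # gzip files are decompressed transparently by extension
    )
```

The same switch can be set session-wide as `spark.sql.files.ignoreCorruptFiles`; the per-read option keeps the tolerance local to this one load.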
0 votes
0 answers
25 views

Writing a SharePoint list to the Delta file format, I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the datatype when writing ...
asked by Sruthi Gopalakrishnan
-1 votes
2 answers
45 views

In an Azure VM, I have installed standalone Spark 4.0. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program: from pyspark.sql import SparkSession ...
asked by Ziggy (43 rep)
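When Jupyter and a standalone Spark deployment share one VM, the notebook attaches to the cluster through the standalone master URL, which uses the `spark://HOST:PORT` scheme (default port 7077). A minimal sketch, assuming the master was started with `start-master.sh` on the same machine:

```python
# Hedged sketch: connect a notebook to a standalone Spark master on the
# same VM. Host name and app name below are placeholders.

def standalone_master_url(host: str, port: int = 7077) -> str:
    """Standalone cluster URLs use the spark://HOST:PORT scheme."""
    return f"spark://{host}:{port}"

# In the notebook (not executed here):
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .master(standalone_master_url("localhost"))
#          .appName("notebook")
#          .getOrCreate())
```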
1 vote
1 answer
85 views

I am very new to Spark (I have just started learning it), and I have encountered a recursion error in a very simple piece of code. Background: Spark version 3.5.7, Java version 11.0.29 (Eclipse ...
asked by GINzzZ100
2 votes
1 answer
104 views

I’m trying to create a Delta Lake table in MinIO using Spark 4.0.0 inside a Docker container. I’ve added the required JARs: delta-spark_2.13-4.0.0.jar delta-storage-4.0.0.jar hadoop-aws-3.3.6.jar aws-...
asked by Tutu ツ (155 rep)
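Delta on MinIO usually comes down to a handful of session settings: the Delta SQL extension and catalog (the class names the Delta docs specify), plus s3a pointed at the MinIO endpoint with path-style addressing. A sketch, with endpoint and credentials as placeholders:

```python
# Hedged sketch of session settings for Delta Lake on MinIO via s3a.
# Endpoint, access key, and secret key are hypothetical placeholders.

def delta_minio_conf(endpoint="http://minio:9000",       # hypothetical
                     access_key="minioadmin",            # hypothetical
                     secret_key="minioadmin") -> dict:   # hypothetical
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO buckets are usually addressed path-style, not virtual-host style
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }

# Usage (not executed here):
# builder = SparkSession.builder.appName("delta-minio")
# for k, v in delta_minio_conf().items():
#     builder = builder.config(k, v)
```

The JAR versions must also agree with each other (Delta 4.0.0 is built for Spark 4.0.x / Scala 2.13, as in the question's list).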
3 votes
1 answer
92 views

I’m experiencing data loss when writing a large DataFrame to Redis using the Spark-Redis connector. Details: I have a DataFrame with millions of rows. Writing to Redis works correctly for small ...
asked by gianfranco de siena
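One cause of apparent row loss with a key-value sink like Redis is key collisions: rows that map to the same key silently overwrite each other, which only becomes visible at scale. A cheap pre-flight check is to compare the row count against the distinct count of the key column; the sketch below shows the idea on plain Python values, with the PySpark equivalent in comments (`id` as the key column is an assumption):

```python
# Hedged sketch: detect rows that would be silently overwritten in a
# keyed store such as Redis before writing a large DataFrame.

def duplicate_key_count(keys) -> int:
    """Number of rows that share a key with an earlier row."""
    return len(keys) - len(set(keys))

# PySpark equivalent (not executed here), assuming `id` is the key column:
# n_rows = df.count()
# n_keys = df.select("id").distinct().count()
# if n_rows != n_keys:
#     print(f"{n_rows - n_keys} rows share a key and will be overwritten")
```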
0 votes
1 answer
39 views

I am new to PySpark. I have installed Java 17 and made sure it works: C:\Windows\System32>java -version java version "17.0.12" 2024-07-16 LTS. I installed Python 3.9 and made sure it works: C:\...
asked by Blogger Anonymous
0 votes
0 answers
44 views

I'm using a PySpark notebook inside of Azure Synapse. This is my schema definition qcew_schema = StructType([ StructField( 'area_fips', dataType = CharType(5), ...
asked by Vijay Tripathi
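PySpark does define `CharType`, but fixed-length CHAR/VARCHAR types are not accepted everywhere a reader schema is, and the behavior varies by Spark version. A common workaround is to declare `StringType` and enforce the length constraint after loading. A schema fragment sketch, keeping the question's `area_fips` column (everything else is a placeholder):

```python
# Hedged sketch: replace CharType(5) with StringType in a reader schema,
# then validate the fixed length after the load.
from pyspark.sql.types import StructType, StructField, StringType

qcew_schema = StructType([
    StructField("area_fips", StringType(), nullable=True),  # was CharType(5)
    # ... remaining fields ...
])

# Optional post-load length check (not executed here):
# from pyspark.sql import functions as F
# bad_rows = df.filter(F.length("area_fips") != 5)
```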
1 vote
1 answer
66 views

I'm reading data from a PostgreSQL 8.4 database into PySpark using the JDBC connector. The database's server_encoding is SQL_ASCII. When I query the table directly in pgAdmin, names like SÉRGIO or ...
asked by Thiago Luan
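Garbled accented names after a JDBC read usually follow the standard mojibake mechanism: UTF-8 bytes get decoded with a single-byte codec, so "SÉRGIO" arrives as "SÃ‰RGIO"-style text. If that is what happened here, re-encoding with the wrong codec and decoding as UTF-8 reverses it; which single-byte codec applies (latin-1 below) is an assumption to verify against the data:

```python
# Hedged sketch: undo text that was UTF-8 on the wire but decoded with a
# single-byte codec. Plain-Python version shown; PySpark equivalent below.

def fix_mojibake(s: str, wrong_codec: str = "latin-1") -> str:
    """Recover the original string by reversing the bad decode."""
    return s.encode(wrong_codec).decode("utf-8")

# PySpark equivalent on a column (not executed here); Spark's encode/decode
# functions support ISO-8859-1 and UTF-8 among their charsets:
# from pyspark.sql import functions as F
# df = df.withColumn("name", F.decode(F.encode("name", "ISO-8859-1"), "UTF-8"))
```

With a `SQL_ASCII` server encoding, PostgreSQL performs no conversion itself, so the fix may equally belong on the JDBC side; the sketch only addresses repairing already-garbled values.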
2 votes
0 answers
56 views

I am running a data ingestion ETL pipeline orchestrated by Airflow using PySpark to read data from MongoDB (using the MongoDB Spark Connector) and load it into a Delta Lake table. The pipeline is ...
asked by Tavakoli (1,433 rep)
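The Mongo-to-Delta read side can be sketched with the MongoDB Spark Connector 10.x option names (`connection.uri`, `database`, `collection`); the URI, database, and collection values below are placeholders:

```python
# Hedged sketch of the MongoDB read options for connector 10.x; all
# concrete values are hypothetical.

def mongo_read_options(uri="mongodb://host:27017",   # hypothetical
                       db="mydb", coll="events") -> dict:
    return {
        "connection.uri": uri,
        "database": db,
        "collection": coll,
    }

# Usage (not executed here):
# df = spark.read.format("mongodb").options(**mongo_read_options()).load()
# df.write.format("delta").mode("append").save("s3a://lake/events")  # hypothetical path
```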
0 votes
0 answers
26 views

I want to query a SQL pool in a different subscription using Spark. Can I just use the same syntax, or is additional configuration necessary, and if so, how? df = spark.read.option(Constants.SERVER, "...
asked by javadev (287 rep)
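Per Microsoft's documentation for the Synapse Dedicated SQL Pool connector, pointing at a pool outside the current workspace is mostly a matter of the `SERVER` option plus permissions (the calling identity needs access on the target pool and its staging storage). A sketch under those assumptions; it runs only on a Synapse Spark pool, and the server and table names are placeholders:

```python
# Hedged sketch, per the Azure Synapse Dedicated SQL Pool connector docs.
# Not executable outside a Synapse Spark pool, so shown as comments.

# from com.microsoft.spark.sqlanalytics.Constants import Constants
#
# df = (spark.read
#       .option(Constants.SERVER, "otherws.sql.azuresynapse.net")  # hypothetical
#       .synapsesql("targetdb.dbo.sometable"))                     # hypothetical
```

Whether cross-subscription access needs anything beyond RBAC/firewall grants on the target is worth confirming against the current connector docs.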
0 votes
0 answers
19 views

I have an application using EKS in AWS that runs a spark session that can run multiple workloads. In each workload, I need to access data from S3 in another AWS account, for which I have STS ...
asked by md12345
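For per-workload cross-account access, Hadoop's S3A client can assume an IAM role per bucket, so one Spark session can reach buckets in different accounts with different roles. A sketch using the hadoop-aws property names; bucket and role ARN are placeholders, and the base credentials that perform the STS call are configured separately:

```python
# Hedged sketch: per-bucket assumed-role settings for S3A, using the
# provider class documented in hadoop-aws. Bucket and ARN are hypothetical.

def cross_account_s3a_conf(bucket="other-account-bucket",              # hypothetical
                           role_arn="arn:aws:iam::111122223333:role/reader"):
    prefix = f"spark.hadoop.fs.s3a.bucket.{bucket}"
    return {
        f"{prefix}.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
        f"{prefix}.assumed.role.arn": role_arn,
    }
```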
0 votes
1 answer
102 views

I have a use case like this: I have a list of many queries, and I run them in multiple threads with PySpark, each thread submitting some SQL. Some queries report success, but the final ...
asked by user31827888
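When many SQL statements are submitted from threads, a failure in one thread is easy to lose unless each future's result or exception is captured explicitly. A sketch that collects a per-query status map; `run_one` stands in for something like `lambda q: spark.sql(q).collect()` (SparkSession methods are generally safe to call from multiple threads):

```python
# Hedged sketch: submit queries concurrently and surface per-query
# success/failure instead of letting exceptions vanish in worker threads.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all(queries, run_one, max_workers=4):
    """Returns {query: ("ok", result)} or {query: ("error", exception)}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_one, q): q for q in queries}
        for fut in as_completed(futures):
            q = futures[fut]
            try:
                results[q] = ("ok", fut.result())  # re-raises worker exceptions
            except Exception as exc:
                results[q] = ("error", exc)
    return results
```

Inspecting the returned map after the pool drains makes "reported success but wrote nothing" cases checkable query by query.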