1,001 questions
0 votes
1 answer
59 views
PySpark/Spark connection to Geomesa Cassandra DB
I'm trying to make a PySpark connection to a Cassandra DB indexed with GeoMesa. While researching this, I noticed that it uses the GeoTools Spark runtime, since there is no optimized runtime for Cassandra. I'...
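Since the excerpt is cut off, here is a minimal sketch of how a GeoMesa-indexed layer is typically read through Spark SQL from PySpark. The runtime jar path, the Cassandra datastore option keys, and the feature-type name are assumptions for illustration, not taken from the question.

```python
from pyspark.sql import SparkSession

# Sketch: the GeoMesa Cassandra spark-runtime jar (GeoTools flavour) must be
# on the classpath; the path below is an assumed location.
spark = (
    SparkSession.builder
    .appName("geomesa-cassandra-read")
    .config("spark.jars", "/path/to/geomesa-cassandra-spark-runtime.jar")  # assumed path
    .getOrCreate()
)

# GeoMesa exposes a "geomesa" Spark SQL data source; the exact option keys
# for the Cassandra datastore below are assumptions.
df = (
    spark.read.format("geomesa")
    .option("cassandra.contact.point", "127.0.0.1:9042")  # assumed key/value
    .option("cassandra.keyspace", "geomesa")               # assumed key/value
    .option("geomesa.feature", "my_feature")               # assumed feature type
    .load()
)
df.printSchema()
```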
0 votes
1 answer
151 views
Executors for PySpark app always finish with "state KILLED exitStatus 143"
I ran into this problem while running spark-submit --master spark://localhost:7077 \ --packages com.datastax.spark:spark-cassandra-connector_2.12:3.5.1, \ org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 \ -...
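The trailing comma and the space split the --packages list in that command, so the second coordinate is treated as the application. A minimal sketch of passing the same coordinates as one comma-separated string from inside the app; the app name and memory setting are assumptions:

```python
from pyspark.sql import SparkSession

# Sketch: spark.jars.packages must be a single comma-separated string, no spaces.
spark = (
    SparkSession.builder
    .appName("kafka-to-cassandra")            # assumed name
    .master("spark://localhost:7077")
    .config(
        "spark.jars.packages",
        "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1,"
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1",
    )
    .config("spark.executor.memory", "2g")    # assumed value; exit 143 often means killed executors
    .getOrCreate()
)
```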
0 votes
2 answers
103 views
Unable to connect to Cassandra from Apache Spark: com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Lost connection to remote peer
[Cassandra is running in Docker on Windows] and I am running Spark from WSL2: spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.5.1 [this is the spark-shell started with the command above] and ...
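A hedged sketch of the connection configuration for this layout: Cassandra published by Docker on the Windows host, Spark running inside WSL2. The host address, keyspace, and table names are assumptions.

```python
from pyspark.sql import SparkSession

# Sketch: from WSL2, "localhost" may not reach a container published on the
# Windows side; the address below is an assumed Windows-host IP.
spark = (
    SparkSession.builder
    .appName("cassandra-read")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    .config("spark.cassandra.connection.host", "172.17.112.1")  # assumed host IP
    .config("spark.cassandra.connection.port", "9042")
    .getOrCreate()
)

df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(table="my_table", keyspace="my_keyspace")  # assumed names
    .load()
)
df.show(5)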
0 votes
1 answer
55 views
Can't connect/write stream from spark container to table in cassandra container
I am composing these services in separate Docker containers, all on the same Confluent network: broker: image: confluentinc/cp-server:7.4.0 hostname: broker container_name: broker ...
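When the Spark and Cassandra containers share a Docker network, the Cassandra service name usually serves as the connection host. A minimal sketch of a streaming write via foreachBatch; the service name, Kafka listener, topic, keyspace, and table names are assumptions.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("stream-to-cassandra")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    .config("spark.cassandra.connection.host", "cassandra")  # assumed service name on the network
    .getOrCreate()
)

def write_to_cassandra(batch_df, batch_id):
    # Each micro-batch is written with the batch DataFrame API.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(table="events", keyspace="demo")  # assumed names; table must exist
        .mode("append")
        .save())

stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:29092")  # assumed internal listener
    .option("subscribe", "events")                      # assumed topic
    .load()
)

query = (
    stream_df.selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")
    .writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/events")  # assumed path
    .start()
)
query.awaitTermination()
```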
1 vote
1 answer
154 views
ERROR SparkContext: Failed to add file java.io.FileNotFoundException: Jar to Spark not Found
Please help me fix the above error based on the code I used. The proccesing_data.py script processes data with Spark Streaming: import logging from pyspark.sql import SparkSession from ...
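"Failed to add file" usually means the path handed to spark.jars (or --jars) does not exist from the driver's point of view. A minimal sketch using an absolute path; the jar location is an assumption.

```python
import logging
from pyspark.sql import SparkSession

# Sketch: the jar path is assumed; it must be readable on the machine (or in
# the container) that runs the driver, and an absolute path avoids surprises
# with the working directory. Alternatively, spark.jars.packages lets Spark
# download the connector instead of pointing at a local file.
spark = (
    SparkSession.builder
    .appName("processing_data")
    .config("spark.jars",
            "/opt/spark/jars/spark-cassandra-connector-assembly_2.12-3.5.1.jar")  # assumed path
    .getOrCreate()
)
logging.info("Spark session created, version %s", spark.version)
```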
0 votes
1 answer
64 views
PySpark connection to Cassandra returns "Py4JJavaError: An error occurred while calling o54.start"
I'm trying to make a connection from PySpark to Cassandra in a virtual environment, and the services are installed via Docker. I've been using the --packages method to resolve the dependencies, but it seems it doesn'...
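When PySpark is launched from a plain Python script in a virtualenv, the --packages argument has to reach the JVM before the session starts. One hedged way is the PYSPARK_SUBMIT_ARGS environment variable; the connector version and host are assumptions taken from the neighbouring questions.

```python
import os

# Must be set before the SparkSession/JVM is created; the trailing
# "pyspark-shell" token is required when launching from a Python script.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.datastax.spark:spark-cassandra-connector_2.12:3.5.1 "
    "pyspark-shell"
)

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark-cassandra")
    .config("spark.cassandra.connection.host", "127.0.0.1")  # assumed host
    .getOrCreate()
)
```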
2 votes
0 answers
37 views
PySpark connector slow during joins
I'm building an application that uses PySpark to join an Oracle table with a Cassandra table. The Cassandra table holds hundreds of millions of rows, while the Oracle table has only a few thousand. ...
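With a small Oracle side and a huge Cassandra side, a full scan of the Cassandra table is usually the bottleneck; the connector's Catalyst extensions can turn such a join into per-partition lookups ("direct join") when the join keys match the Cassandra partition key. A hedged sketch; the JDBC details, table names, and join column are assumptions, and the Oracle JDBC driver would also need to be on the classpath.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oracle-cassandra-join")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    # Enables the connector's Catalyst rules, including the direct join optimization.
    .config("spark.sql.extensions",
            "com.datastax.spark.connector.CassandraSparkExtensions")
    .config("spark.cassandra.connection.host", "127.0.0.1")  # assumed
    .getOrCreate()
)

# Small side: a few thousand Oracle rows (connection details are assumptions).
oracle_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
    .option("dbtable", "customers")
    .option("user", "scott").option("password", "tiger")
    .load()
)

# Large side: the Cassandra table; joining on its partition key lets the
# connector push lookups down instead of scanning the whole table.
cassandra_df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(table="orders", keyspace="sales")  # assumed names
    .load()
)

joined = oracle_df.join(cassandra_df, on="customer_id")  # assumed partition key column
joined.explain()  # check the physical plan for a Cassandra direct-join node
```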
0 votes
0 answers
98 views
Spark is not inserting data into Cassandra when using `writeStream`
I'm trying to build a streaming pipeline that calls an API via Airflow, processes the data with Kafka, and inserts it into Cassandra using Spark. I'm struggling when inserting data ...
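Two common reasons nothing lands in Cassandra are the driver exiting before any batch runs (missing awaitTermination) and a stale or missing checkpoint. A compact, self-contained sketch of the sink wiring, using Spark's built-in "rate" source to stand in for the Kafka stage; the host, checkpoint path, keyspace, and table names are assumptions.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("writestream-to-cassandra")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    .config("spark.cassandra.connection.host", "cassandra")  # assumed host
    .getOrCreate()
)

# "rate" emits (timestamp, value) rows and stands in for the Kafka source here.
stream_df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

def write_batch(batch_df, batch_id):
    (batch_df.withColumnRenamed("timestamp", "ts")
        .write.format("org.apache.spark.sql.cassandra")
        .options(table="api_data", keyspace="pipeline")  # assumed names; columns must match the table
        .mode("append")
        .save())

query = (
    stream_df.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/tmp/chk/api_data")  # assumed path; keep it stable between runs
    .start()
)
query.awaitTermination()  # without this, a plain script exits before any batch is written
```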
1 vote
1 answer
64 views
Ignoring codec because it collides with previously generated codec
I am trying to register a custom codec (for a map) like below: val session: CqlSession = CassandraConnector.apply(spark.sparkContext).openSession() val codecRegistry: MutableCodecRegistry = session....
2 votes
0 answers
96 views
Register a custom codec in the Cassandra connector
I am using spark-cassandra-connector_2.11, version 2.5.2, in my Scala application and want to register a custom map codec, but I am facing issues. Is there any way to register it? I did the same thing on ...
0 votes
1 answer
92 views
How To Use Spark Submit Operator With Cassandra Remote Server In Apache Airflow
I'm running Airflow in a Docker container on a Windows PC. I have some problems with the Apache Airflow Spark submit operator. I want to write data to a remote Cassandra server. When I was using df.write....
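A hedged sketch of a SparkSubmitOperator task that ships the connector and points it at the remote Cassandra host. The DAG id, application path, Airflow connection id, and Cassandra address are assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="write_to_remote_cassandra",   # assumed DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,                        # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    submit = SparkSubmitOperator(
        task_id="spark_write_cassandra",
        application="/opt/airflow/dags/scripts/write_cassandra.py",   # assumed path inside the container
        conn_id="spark_default",                                      # assumed Airflow Spark connection
        packages="com.datastax.spark:spark-cassandra-connector_2.12:3.5.1",
        conf={
            "spark.cassandra.connection.host": "203.0.113.10",  # assumed remote Cassandra address
            "spark.cassandra.connection.port": "9042",
        },
    )
```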
0 votes
1 answer
317 views
PySpark app returns "NoClassDefFoundError: com/datastax/spark/connector/util/Logging"
I had this error: py4j.protocol.Py4JJavaError: An error occurred while calling o59.start. : java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging at java.base/java.lang....
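com/datastax/spark/connector/util/Logging typically goes missing when only part of the connector ends up on the classpath (for example a single jar without its transitive dependencies). A minimal sketch of letting Spark resolve the full dependency tree; the version is an assumption, and the alternative "assembly" coordinates should be verified for the version in use.

```python
from pyspark.sql import SparkSession

# Sketch: resolving the connector via spark.jars.packages pulls its
# dependencies too, which usually avoids NoClassDefFoundError for
# connector-internal classes.
spark = (
    SparkSession.builder
    .appName("cassandra-stream")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    # Alternative (assumed coordinates): the self-contained assembly jar,
    # "com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.5.1"
    .getOrCreate()
)
```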
0 votes
1 answer
74 views
PySpark Cassandra connector generates tombstones during writing
I understand that when inserting data, tombstones might be created because of null values in the DataFrame's columns. To mitigate this issue and minimize tombstones, insertion queries ...
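The connector has a write setting that skips null columns on insert instead of writing them (writing a null is what produces the tombstone). A minimal sketch setting it session-wide; the host, keyspace, and table names are assumptions.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("write-without-tombstones")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    .config("spark.cassandra.connection.host", "127.0.0.1")   # assumed
    # Skip null DataFrame columns on insert so no tombstone is written for them.
    .config("spark.cassandra.output.ignoreNulls", "true")
    .getOrCreate()
)

df = spark.createDataFrame(
    [(1, "alice", None), (2, None, "bob@example.com")],
    ["id", "name", "email"],
)

(df.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="users", keyspace="demo")  # assumed names
    .mode("append")
    .save())
```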
0 votes
3 answers
953 views
Simple Python app failed to load, getting "ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra"
Context: I am struggling to install the Cassandra-Spark connector. My goal is to use it with Spark SQL, since Cassandra has strong limitations on queries. I have: ...
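The org.apache.spark.sql.cassandra data source only resolves if the connector is on the classpath before the session exists; a session started without it cannot load the class afterwards. A hedged sketch, including the optional catalog route for plain Spark SQL; the host, catalog name, keyspace, and table names are assumptions.

```python
from pyspark.sql import SparkSession

# Sketch: configure the package *before* getOrCreate(); an already-running
# session will not pick it up.
spark = (
    SparkSession.builder
    .appName("spark-sql-over-cassandra")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.5.1")
    .config("spark.cassandra.connection.host", "127.0.0.1")  # assumed
    # Optional (connector 3.x): expose keyspaces as a Spark SQL catalog.
    .config("spark.sql.catalog.cass",
            "com.datastax.spark.connector.datasource.CassandraCatalog")
    .getOrCreate()
)

# Either the data source...
df = (spark.read.format("org.apache.spark.sql.cassandra")
      .options(table="my_table", keyspace="my_keyspace")  # assumed names
      .load())

# ...or plain Spark SQL through the catalog.
spark.sql("SELECT * FROM cass.my_keyspace.my_table LIMIT 10").show()
```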
0 votes
1 answer
315 views
Unable to connect to Spark
Running the Python code does not connect to Spark, and it does not create a database in Cassandra either. I have confirmed the services are up in Docker and accessible from the PC. I placed the .jar files ...
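A minimal sketch of pointing the session at both the standalone master and manually downloaded jars; the master URL, jar paths, and Cassandra address are all assumptions.

```python
from pyspark.sql import SparkSession

# Sketch: every listed jar must be readable by the driver; Spark ships them
# to the executors. Master URL, paths, and host are assumed.
spark = (
    SparkSession.builder
    .appName("spark-cassandra-from-jars")
    .master("spark://localhost:7077")  # assumed standalone master
    .config("spark.jars", ",".join([
        "/opt/jars/spark-cassandra-connector_2.12-3.5.1.jar",         # assumed paths
        "/opt/jars/spark-cassandra-connector-driver_2.12-3.5.1.jar",
    ]))
    .config("spark.cassandra.connection.host", "127.0.0.1")           # assumed
    .getOrCreate()
)
print(spark.version)  # confirms the session actually reached the master
```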