Cannot load local file into PySpark Dataframe

Question

I am a macOS user and I just downloaded Apache Spark. I then put it in /usr/local/spark. Here is what is inside my .bash_profile:

export SPARK_HOME="/usr/local/spark"
export PYSPARK_PYTHON=python3
export PATH=$PATH:$SPARK_HOME/bin
#export PYSPARK_DRIVER_PYTHON="jupyter"
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

The problem is that when I type pyspark to enter the PySpark shell and then type these two lines:

spark = SparkSession.builder.appName("preprocessing").config("spark-master", "local").getOrCreate()
df = spark.read.format("csv").option("header","true").option("inferSchema", "true").option("delimiter",",").load("src/census-income.data")

An error occurs:

2018-10-02 19:55:24 ERROR PoolWatchThread:118 - Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.setReadOnly(Unknown Source)
    at com.jolbox.bonecp.ConnectionHandle.setReadOnly(ConnectionHandle.java:1324)
    at com.jolbox.bonecp.ConnectionHandle.<init>(ConnectionHandle.java:262)
    at com.jolbox.bonecp.PoolWatchThread.fillConnections(PoolWatchThread.java:115)
    at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: ERROR 25505: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.sql.conn.GenericAuthorizer.setReadOnlyConnection(Unknown Source)
    at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.setReadOnly(Unknown Source)
    ... 8 more

Spark version: 2.3.2
Python version: 3.7.0

pvy4917 · Accepted Answer · 2018-10-02 14:55:25Z

Can you try deleting the file metastore_db/dbex.lck from the current directory (SPARK_HOME)?
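
If you would rather do this from Python than from the shell, here is a minimal sketch. The helper name remove_derby_lock and its work_dir parameter are made up for illustration; the only thing taken from the answer above is the path metastore_db/dbex.lck. It assumes no other Spark session is still running against that directory, since a live session may legitimately hold the lock.

```python
from pathlib import Path


def remove_derby_lock(work_dir="."):
    """Delete metastore_db/dbex.lck under work_dir.

    Hypothetical helper: returns True if a lock file was removed,
    False if there was nothing to delete.
    """
    lock_file = Path(work_dir) / "metastore_db" / "dbex.lck"
    if lock_file.is_file():
        lock_file.unlink()
        return True
    return False
```

Call it with the directory you start pyspark from, for example remove_derby_lock("."), then restart the shell and retry the read.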

Source: https://github.com/bpn1/ingestion/wiki/Troubleshooting

iurii_n · Accepted Answer · 2025-09-29 20:20:45Z

Spark is trying to load the file from HDFS. Apparently you don't have Hadoop installed, so Spark fails to connect to HDFS. If you want to load from the local file system, you have to specify it explicitly:

file:///src/census-income.data
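
Note that a file:// URI is always absolute, so file:///src/census-income.data points at /src on the root of the disk, not at a src folder under the current directory. One way to avoid hand-writing the URI is to build it from the relative path. This is only a sketch using Python's standard pathlib; the helper name local_file_uri is an assumption for illustration and is not part of PySpark.

```python
from pathlib import Path


def local_file_uri(path):
    """Turn a relative or absolute local path into an absolute file:// URI.

    Hypothetical helper: resolve() anchors a relative path at the current
    working directory, and as_uri() adds the file:// scheme.
    """
    return Path(path).expanduser().resolve().as_uri()


uri = local_file_uri("src/census-income.data")
# Inside the pyspark shell the URI can then be passed to the reader, e.g.:
# df = spark.read.format("csv").option("header", "true").load(uri)
```

The resulting string starts with file:/// followed by the full path of the working directory, so it depends on where the shell was started.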

Thank you, I had been stuck on this for about two weeks and this resolved my problem!
