I am trying to install pyspark like this:

    python setup.py install

I get this error:

    Could not import pypandoc - required to package PySpark

even though pypandoc is already installed.
Any ideas on how I can install pyspark?
I faced the same issue and solved it as below: install pypandoc before installing pyspark:

    pip install pypandoc
    pip install pyspark

pip3 install pypandoc (after running sudo apt install python3-pip) also solved the problem.
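The reason the order matters is that pyspark's setup.py tries to import pypandoc (to convert the README for packaging), so pypandoc must be visible to the same interpreter that runs the install. A minimal sanity check, just a sketch:

    # Check that pypandoc is importable by the same Python that builds pyspark
    import importlib.util

    print("pypandoc found:", importlib.util.find_spec("pypandoc") is not None)

If this prints False, pypandoc was installed into a different Python environment than the one running setup.py.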
If you are using Windows, follow these steps:
1) Install the JDK from https://www.oracle.com/technetwork/java/javase/downloads/index.html
2) Set the environment variable JAVA_HOME=\path\where\you\installed\jdk, then add %JAVA_HOME%\bin to PATH (you can verify both variables with the snippet after step 5).
3) Download Spark from https://spark.apache.org/downloads.html. The download is a compressed archive named like spark-2.3.1-bin-hadoop2.7.tgz; extract it, move the folder to the C: drive, and set the environment variable SPARK_HOME=\path\of\the\spark.
4) Download the Scala IDE from http://scala-ide.org/, extract the file, and copy the Eclipse folder to the C: directory.
5) Now open cmd and run spark-shell; it will open the Scala shell for you.
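After steps 2 and 3, a quick way to confirm a fresh shell actually sees the variables (a small verification sketch, not part of the install itself):

    import os

    # Both should print the paths you configured; None means the variable is
    # not visible to this process (open a new cmd window after setting them)
    for var in ("JAVA_HOME", "SPARK_HOME"):
        print(var, "=", os.environ.get(var))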
2018 version:
Install PySpark on Windows 10 for Jupyter Notebook with Anaconda Navigator.
Download Packages
1) spark-2.2.0-bin-hadoop2.7.tgz
2) Java JDK 8
3) Anaconda v5.2
4) scala-2.12.6.msi
5) hadoop v2.7.1
Create a SPARK folder on the C:\ drive; extract Hadoop and Spark into it, and install Scala into the same directory using scala-2.12.6.msi. The directory structure should end up as C:\spark\hadoop, C:\spark\scala, and C:\spark\spark, matching the variables below.
Note: During installation of Scala, specify the C:\spark folder.
Now set the Windows environment variables:
HADOOP_HOME=C:\spark\hadoop
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151
SCALA_HOME=C:\spark\scala\bin
SPARK_HOME=C:\spark\spark\bin
PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe
PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS=notebook
Now add the Spark path to PATH:
Click on Edit, then New, and add "C:\spark\spark\bin" to the "Path" variable.
That's it. Run pyspark from cmd; because the driver is pointed at Jupyter, your browser will pop up with a Jupyter notebook on localhost.
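Before running anything heavier, it can help to confirm from inside the notebook that the interpreter paths configured above actually resolve (a sketch; the Anaconda paths are this answer's examples):

    import os

    # Print each configured interpreter and whether the file really exists;
    # "unset" usually means Jupyter was started before the variables were set
    for var in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        path = os.environ.get(var)
        print(var, "->", path, "| exists:", os.path.exists(path) if path else "unset")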
Check whether PySpark is working:
Type this simple code and run it:
    from pyspark.sql import Row

    a = Row(name='Vinay', age=22, height=165)
    print("a: ", a)
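The fields of the Row can also be read back as attributes, which is a quick way to confirm the object behaves as expected:

    # Attribute access on the Row created above
    print(a.name, a.age, a.height)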
Steps to install the PySpark API for Jupyter Notebook:
Go to https://spark.apache.org/downloads.html to download the latest Spark. The file comes in .tgz format; extract it into the directory where you want to install PySpark.
After extracting the tgz file, you will need winutils, because Apache Spark requires Hadoop on Windows. Download 'winutils.exe' from https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe and copy it into the 'bin/' directory of your Spark folder (spark-2.2.0-bin-hadoop2.7/bin).
If you have Anaconda installed, there will be a .condarc file under C:\Users\; open it and change ssl_verify from true to false. This lets you install Python libraries directly from the prompt (useful if you are on a restricted network).
Open the Anaconda prompt and type 'conda install findspark' to install the findspark Python module. If you are not able to install it that way, go to https://github.com/minrk/findspark, download the ZIP, extract it, then open the Anaconda prompt in the extracted path and run 'python setup.py install'.
Open This PC >> Properties >> Advanced System Settings (you need admin access for that), click on Environment Variables, and add new user environment variables.
After creating the 4 user variables and adding the Spark path to the 'PATH' system variable, open a Jupyter notebook and run this code:
    import findspark
    findspark.init()

    import pyspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local", "First App")
    sqlContext = SQLContext(sc)

If you don't get any error, the installation has completed successfully.
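If findspark.init() cannot locate Spark on its own, it also accepts the Spark directory as an argument, and a tiny job is a stronger smoke test than an error-free import. A sketch (the path below is an example for the spark-2.2.0 extract discussed here, not a required location):

    import findspark
    findspark.init("C:/spark-2.2.0-bin-hadoop2.7")  # example path; point at your extracted folder

    from pyspark import SparkContext

    sc = SparkContext("local", "Smoke Test")
    print(sc.version)                        # installed Spark version
    print(sc.parallelize([1, 2, 3]).sum())   # should print 6
    sc.stop()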
What worked for me inside Docker was actually done in three steps:
    RUN python -m pip install --upgrade pip
    RUN pip3 install pypandoc
    RUN pip install -r requirements.txt --no-input

Or shorter:

    RUN python -m pip install --upgrade pip && \
        pip3 install pypandoc && \
        pip install -r requirements.txt --no-input

The key point is that pypandoc gets installed before the requirements file that pulls in pyspark.
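Assuming pyspark itself is listed in requirements.txt, a quick check inside the built container confirms the install actually succeeded:

    # Run inside the container; prints the installed PySpark version
    import pyspark

    print(pyspark.__version__)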