I am getting the following error while running PySpark in Docker:

Traceback (most recent call last):
  File "/opt/application/main.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
My Dockerfile is as follows:
FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36
RUN yum -y install wget
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN pip3.6 install numpy
RUN pip3.6 install pandas
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
    && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
    && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
ENV SPARK_HOME=/usr/local/bin/spark
RUN yum -y install java-1.8.0-openjdk
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
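
My guess is that the pyspark module itself is never installed into the image (only numpy and pandas are), or that PYTHONPATH never points at the Spark distribution that gets downloaded, but I am not sure which of the following additions (if either) would be the correct fix:

# Guess 1: install the pyspark package from PyPI, matching the Spark version in the image
RUN pip3.6 install pyspark==2.4.7

# Guess 2: expose the PySpark bundled with the downloaded Spark distribution via PYTHONPATH
# (the py4j version below is the one I believe Spark 2.4.x ships with, and SPARK_HOME would
# need to actually point at the directory the archive was extracted to)
ENV PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}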