863 questions
0 votes
1 answer
79 views
Unable to stream data to Azure Blob using a Flink job
I'm running a Flink job. On my local machine I don't see any issue streaming the data to Azure Blob, but when I deploy to the dev environment I see an error in the console like Caused by: org....
-1 votes
1 answer
45 views
How to change the file name with the updated date in a Flink job
I have a Flink job which streams data to Azure using Hadoop FS. Currently I'm able to push the data and create a new file, but I want to roll over to a new file when there is a date change (like from 2025-03-...
1 vote
1 answer
37 views
Hadoop streaming job hangs at reduce-side merge stage
I wrote a Hadoop streaming job that uses Python code to transform the data, but the job runs into an error. When the input file is larger (e.g. 70M bytes), it hangs at the reduce stage. When I ...
0 votes
1 answer
78 views
Python - How to run Hadoop stream passing command line arguments
I need help with a school project. For the labs I've done, I wrote the mapper and reducer scripts in Python (version 3) and was able to run Hadoop streaming with no problems. Then I edited ...
2 votes
0 answers
123 views
Troubleshooting MapReduce with Python scripts as mapper and reducer using hadoop-streaming-3.3.6.jar
core-site.xml config : <configuration> <property> <name>fs.defaultFS</name> <value>hdfs://Master:9000</value> </property> </...
0 votes
1 answer
55 views
Specifying N in Hadoop streaming when using NLineInputFormat
If I use NLineInputFormat in Hadoop streaming, how do I specify N? hadoop jar /home/Software/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \ -D stream.non.zero.exit.is.failure=false \ -D ...
0 votes
0 answers
69 views
Unable to process text file using MapReduce on Linux
I am currently trying to use Hadoop streaming. I have a file called diamonds.txt that contains the carat of a diamond and its price beside it, all separated by commas (csv). An example of the first ...
0 votes
0 answers
132 views
Hadoop MapReduce error: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
I'm trying to convert XML files through a MapReduce job and receive the error: 2023-04-04 09:41:52,515 INFO mapreduce.Job: map 0% reduce 0% 2023-04-04 09:42:12,676 INFO mapreduce.Job: Task Id : ...
0 votes
1 answer
126 views
How to execute multiple reduce jobs with one mapper using bash file in Hadoop using Python as the base?
bash file code I formatted the mapper and the reducer to be the same so I can skip the mapping steps and just continue to reduce it. In this case I am only doing two reduce jobs. It works fine using ...
0 votes
0 answers
166 views
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
I am fairly new to using Hadoop and I've been getting these exceptions when I run a file on Hadoop. Please help. This is the command: hadoop jar /home/eeman/hadoop-3.2.4/share/hadoop/tools/lib/hadoop-...
0 votes
1 answer
86 views
Word count application is not running on Hadoop
This is my first time using Hadoop for anything, so I started with a basic program, word count. On my local machine it works perfectly fine. The real issue is that I am unable to run it on ...
3 votes
2 answers
4k views
How to fix "java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" in PySpark
Below are the runtime versions in PyCharm. Java Home /Library/Java/JavaVirtualMachines/jdk-11.0.16.1.jdk/Contents/Home Java Version 11.0.16.1 (Oracle Corporation) Scala Version version 2.12.15 ...
0 votes
1 answer
416 views
Hadoop Streaming Exception (No FileSystem for Scheme "C")
I'm new to Hadoop and am trying to use the streaming option to develop some jobs using Python on Windows 10 locally. After double-checking the paths I gave, and even my program, I encounter an exception that ...
1 vote
0 answers
1k views
Caused by: java.io.IOException: error=2, No such file or directory error in Colab Hadoop
I'm using Hadoop in Colab and I have two documents that I've made in PyCharm, one with the mapper and another one with the reducer part. This is the code: !apt-get install -y openjdk-11-jdk-headless -qq >...
-2 votes
1 answer
102 views
Calculate average temperature in reducer
I am trying to write code that calculates the average temperature (reducer.py) based on NCDC weather data. 0057011060999991928010112004+67500+012067FM-12+001199999V0202001N012319999999N0500001N9+00281+...