Questions tagged [apache-hadoop]
Hadoop is an Apache open-source project that provides software for reliable and scalable distributed computing. The project also includes a variety of complementary additions.
118 questions
0 votes
1 answer
177 views
Hadoop, Spark, and the Cloud
It seems Hadoop, Spark, and various cloud platforms offer facilities to store and analyze big data. There are some articles comparing Hadoop and Spark (for example, this article). There are also ...
0 votes
1 answer
60 views
Can I update the source of data found in a Data Lake or Data Blob
Is it possible to update the source of data found in a Data Lake or Data Blob? What about while using HDInsight or Azure Databricks?
0 votes
1 answer
524 views
Storage of N-dimensional matrices (tensors) as part of machine learning pipelines
I'm an infra person working on a storage product. I've been googling quite a bit to find an answer to the following question but have been unable to do so. Hence, I am attempting to ask the question here. I am ...
0 votes
1 answer
42 views
Can a single-node Hadoop cluster be installed on a system with 1 GB of RAM
I am trying to learn Hadoop and would like to know whether 1 GB of RAM would be enough for a basic single-node installation, or whether more is needed. It would be helpful if someone could share what other minimum ...
5 votes
1 answer
1k views
What is the main difference between Hadoop and Spark? [closed]
I recently read the following about Hadoop vs. Spark: Insist upon in-memory columnar data querying. This was the killer-feature that let Apache Spark run in seconds the queries that would take Hadoop ...
0 votes
1 answer
35 views
What are common problems around Hadoop storage?
I've been asked to lead a program to understand why our Hadoop storage is constantly near capacity. What questions should I ask? Data age, data size? Housekeeping schedule? How do we identify the ...
1 vote
0 answers
49 views
Loading files into and out of HDFS via system call/command line vs. using libhdfs
I am trying to implement a simple C/C++ program for the HDFS file system, like word count: it takes a file from the input path, puts it into HDFS (where it gets split), and processes it with my map-reduce function ...
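For readers skimming this question, the two approaches it contrasts look roughly like the sketch below: shelling out to the hdfs command-line client versus writing through the libhdfs C API. This is a minimal, illustrative sketch only, not the asker's actual program; the NameNode address ("default"), the file names, and the sample payload are hypothetical placeholders, and error handling is abbreviated.

    /* Minimal sketch: copying data into HDFS two ways.
     * Assumes $HADOOP_HOME is set and libhdfs is available; built with e.g.
     *   gcc put_example.c -I$HADOOP_HOME/include -L$HADOOP_HOME/lib/native -lhdfs -ljvm
     * Paths and payload below are placeholders for illustration.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include "hdfs.h"

    int main(void) {
        /* Approach 1: shell out to the hdfs CLI (simple, but forks a new JVM per call). */
        if (system("hdfs dfs -put -f input.txt /tmp/input.txt") != 0) {
            fprintf(stderr, "hdfs dfs -put failed\n");
        }

        /* Approach 2: write through libhdfs (stays in-process, reuses one connection). */
        hdfsFS fs = hdfsConnect("default", 0);  /* "default" = fs.defaultFS from core-site.xml */
        if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

        hdfsFile out = hdfsOpenFile(fs, "/tmp/input_copy.txt", O_WRONLY | O_CREAT, 0, 0, 0);
        if (!out) { fprintf(stderr, "hdfsOpenFile failed\n"); hdfsDisconnect(fs); return 1; }

        const char *buffer = "hello hdfs\n";    /* placeholder payload */
        hdfsWrite(fs, out, (void *)buffer, (tSize)strlen(buffer));
        hdfsFlush(fs, out);
        hdfsCloseFile(fs, out);
        hdfsDisconnect(fs);
        return 0;
    }

The trade-off the question is getting at then comes down to overhead: each system()/hdfs dfs invocation starts a fresh JVM, while libhdfs (a JNI wrapper around the Java HDFS client) keeps one connection open inside the calling process.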
3 votes
1 answer
631 views
BERT in production
I've created a BERT model. What are the ways to deploy this model? Is it possible to use it with Spark, Hadoop, or Docker?