I know this kind of approach, but I'd like to avoid it for two main reasons: when it is used in a transformation, the result cannot be trusted if some tasks fail; and there is still a (small) overhead. I was just wondering whether a counter is accessible somehow, like there is in MapReduce, since the web UI shows the number of rows written...

– mgaido, 2016-05-29 11:44:03 +00:00
Well, thank you for your answer... though I still wonder how they can show this info in the web UI if there is no internal counter...

– mgaido, 2016-05-29 11:59:25 +00:00
@mark91 Ah, well, you could clone the UI code and dig through it, I guess. Having read the documentation, the code I've given is fine (Spark says it protects against restarted tasks). It seems what you want to protect against is an RDD being transformed multiple times, but in the code I've given, the RDD isn't accessible outside the Pimps scope. It will only accumulate before writing, and only once.

– samthebest, 2016-05-29 12:01:49 +00:00
`count = rdd.count(); rdd.saveAsTextFile(p);` Is this any better?

– Amit Kumar, 2016-05-29 12:18:09 +00:00
@amit_kumar If the RDD is not cached, counting while writing (as in the answer) should be more efficient than a separate `count`, because the data will be materialized only once.

– zero323, 2016-05-29 12:30:27 +00:00
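
To make the "materialized only once" point concrete, here is a minimal sketch in plain Python. It is not Spark code: `LazyDataset`, `save_and_count` and the in-memory `sink` lists are hypothetical stand-ins for an uncached RDD, an accumulator-style counter and the output file. It only assumes that every action on an uncached dataset recomputes it from the source.

```python
from typing import Callable, Iterable, Iterator, List


class LazyDataset:
    """Toy stand-in for an uncached RDD: each action recomputes the rows."""

    def __init__(self, compute: Callable[[], Iterable[str]]) -> None:
        self._compute = compute
        self.materializations = 0  # how many times the source was recomputed

    def _rows(self) -> Iterator[str]:
        self.materializations += 1
        return iter(self._compute())

    def count(self) -> int:
        # A separate action: walks the whole dataset just to count it.
        return sum(1 for _ in self._rows())

    def save(self, sink: List[str]) -> None:
        # Another action: walks the whole dataset again to write it.
        sink.extend(self._rows())

    def save_and_count(self, sink: List[str]) -> int:
        # Single pass: count each row at the moment it is written.
        written = 0
        for row in self._rows():
            sink.append(row)
            written += 1
        return written


def make_dataset() -> LazyDataset:
    # Hypothetical source of five rows; a new generator is built per pass.
    return LazyDataset(lambda: (f"row-{i}" for i in range(5)))


# Approach from the comment above: count first, then save (two passes).
two_pass = make_dataset()
sink_a: List[str] = []
count_a = two_pass.count()
two_pass.save(sink_a)

# Count-while-writing approach (one pass).
one_pass = make_dataset()
sink_b: List[str] = []
count_b = one_pass.save_and_count(sink_b)
```

Both approaches give the same count and the same output, but in this toy model the count-then-save version computes the rows twice. In real Spark, caching the RDD before counting would avoid the recomputation at the cost of memory or disk.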
