2

I'm implementing a loop in Ruby, but it looks ugly and I wonder if there's a neater, more Ruby-like way of writing it:

def get_all_items items = []; page = 1; page_items = nil while page_items != [] # Loop runs until no more items are received items += (page_items = get_page_items(page)) page += 1 end items end 

Note that the get_page_items method runs a HTTP request to get the items for the page, and there is no way of knowing the number of pages, or the total number of items, or the number of items for any page before actually executing the requests in order until one of them returns an empty item set.

Imagine leafing through a catalog and writing down all the products, without knowing in advance how many pages it has, or how many products there are.

2
  • Does get_page_items return an empty array when page is out of bounds? I'm inferring from the structure that this is the logic. Commented Aug 12, 2011 at 19:35
  • Yes, though what is or isn't out of bounds is not known by the method. Commented Aug 12, 2011 at 19:36

8 Answers 8

3

I think that this particular problem is compounded because A) there's no API for getting the total number of items and B) the response from get_page_items is always truthy. Further, it doesn't make sense for you to iteratively call a method that is surely making individual requests to your DB with an arbitrary limit, only to concatenate them together. You should, at the risk of repeating yourself, implement this method to prompt a DB query (i.e. model.all).

Normally when you are defining an empty collection, iterating and transforming a set, and then returning a result, you should be using reduce (a.k.a inject):

array.reduce(0) { |result, item| result + item } # a quick sum 

Your need to do a form of streaming in this same process makes this difficult without tapping into Enumerable. I find this to be a good compromise that is much more readable, even if a bit distasteful in fondling this items variable too much:

items = [] begin items << page_items = get_page_items(page ||= 1) page += 1 end until page_items.empty? items.flatten 
Sign up to request clarification or add additional context in comments.

6 Comments

Looks good! Concerning your remarks: No, there are no database requests, and there is no API, since this is for a web crawler which reads items from external sources by going from page to page, parsing the HTML, etc.
That makes a lot more sense as to why we've got these constraints! :)
One cosmetic thing: items will contain an empty array at the end.
@emboss: I'm not sure I follow your logic. Items is the array we're appending all of the get_page_items responses to. It'll only be an empty array if no items are ever returned from get_page_items.
I think the problem is that << is used with items here, which doesn't merge the arrays together, but adds an array of items as a new element of the items array. So we need to return items.flatten at the end to get all the items, or use += instead of <<.
|
1

Here's how I'd have written it. You'll see it's actually more lines, but it's easier to read and more Rubyish.

def get_all_items items = [] page = 1 page_items = get_page_items page until page_items.empty? # Loop runs until no more items are received items += page_items page += 1 page_items = get_page_items page end items end 

You could also implement get_page_items as an Enumerator which would eliminate the awkward page += 1 pattern but that might be overkill.

1 Comment

Definitely easier to read, but I don't really like the fact that page_items = get_page_items page is repeated. If I change the method name I have to change both occurrences, for example.
1

I don't know that this is any better, but it does have a couple of Ruby-isms in it:

def get_all_items items = []; n = 0; page = 1 while items.push(*get_page_items(page)).length > n page += 1 n = items.length end end 

Comments

1

I would use this solution, which is a good compromise between readability and length:

def get_all_items [].tap do |items| page = 0 until (page_items = get_page_items(page)).empty? items << page_items page += 1 end end end 

Comments

1

The short version, just for fun ;-)

i=[]; p=0; loop { i+=get_page_items(p+=1).tap { |r| return i if r.empty? } } 

Comments

1

I wanted to write a functional solution which would closely resemble the task you want to achieve.

I'd say that your solution comes down to this:

For all page numbers from 1 on, you get the corresponding list of items; Take lists while they are not empty, and join them into a single array.

Sounds ok?

Now let's try to translate this, almost literally, to Ruby:

(1..Float::INFINITY). # For all page numbers from 1 on map{|page| get_page_items page}. # get the corresponding list of items take_while{|items| !items.empty?}. # Take lists while they are not empty inject(&:+) # and join them into a single array. 

Unfortunately, the above code won't work right away, as Ruby's map is not lazy, i.e. it would try to evaluate on all members of the infinite range first, before our take_while had the chance to peek at the values.

However, implementing a lazy map is not that hard at all, and it could be useful for other stuff. Here's one straightforward implementation, along with nice examples in the blog post.

module Enumerable def lazy_map Enumerator.new do |yielder| self.each do |value| yielder.yield(yield value) end end end end 

Along with a mockup of your actual HTTP call, which returns arrays of random length between 0 and 4:

# This simulates actual HTTP call, sometimes returning an empty array def get_page_items page (1..rand(5)).to_a end 

Now we have all the needed parts to solve our problem easily:

(1..Float::INFINITY). # For all page numbers from 1 on lazy_map{|page| get_page_items page}. # get the corresponding list of items take_while{|items| !items.empty?}. # Take lists while they are not empty inject(&:+) # and join them into a single array. #=> [1, 1, 2, 3, 1] 

4 Comments

I get the idea, but frankly I don't think this type of code would increase my popularity in a team :-)
I just hope it wasn't too hard to understand. I think take_while describes pretty closely what you are trying to achieve - take one result at a time while it satisfies a condition - not to be an empty array. Enumerator is just a way to get the "infinite pool" of results.
I don't think the central paradigm should be to avoid everything that looks like C, but to make the code cleaner and more intuitive. So, I understand it, but I don't think it's more intuitive, and I think the infinity part looks scary to some people :-)
I think it's all about what we are used to - every C's while has the same Infinity hidden within, implicitly. :) BTW, I will try to think a bit less intimidating version.
0

It's a small (and almost entirely cosmetic) tweak, but one option would be to replace while page_items != [] with until page_items.empty?. It's a little more "Ruby-ish," in my opinion, which is what you're asking about.

1 Comment

It's more Ruby-ish, though it will produce an error if I declare page_items = nil before, since nil.empty? is undefined.
0
def get_all_items items = []; page = 0 items << page_items while (page_items = get_page_items(page += 1)) items end 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.