14

Editor's note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code produce a different errors, but the answers still contain valuable information.

I'm trying to write a function to split a string into clumps of letters and numbers; for example, "test123test" would turn into [ "test", "123", "test" ]. Here's my attempt so far:

pub fn split(input: &str) -> Vec<String> { let mut bits: Vec<String> = vec![]; let mut iter = input.chars().peekable(); loop { match iter.peek() { None => return bits, Some(c) => if c.is_digit() { bits.push(iter.take_while(|c| c.is_digit()).collect()); } else { bits.push(iter.take_while(|c| !c.is_digit()).collect()); } } } return bits; } 

However, this doesn't work, looping forever. It seems that it is using a clone of iter each time I call take_while, starting from the same position over and over again. I would like it to use the same iter each time, advancing the same iterator over all the each_times. Is this possible?

1
  • 3
    For what it's worth, as of 2/2015 Peekable<Chars> is not a Copy, so the original code doesn't loop infinitely -- it throws a compile error indicating an ownership change on calling take_while. See, for example: is.gd/1ppVX0 Commented Mar 3, 2015 at 6:01

2 Answers 2

16

As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)

You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:

pub fn split(input: &str) -> Vec<String> { let mut bits: Vec<String> = vec![]; let mut iter = input.chars().peekable(); loop { match iter.peek().map(|c| *c) { None => return bits, Some(c) => if c.is_digit(10) { bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect()); } else { bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect()); }, } } } fn main() { println!("{:?}", split("123abc456def")) } 

Prints

["123", "bc", "56", "ef"] 

However, I imagine this is not correct.

I would actually recommend writing this as a normal for loop, using the char_indices iterator:

pub fn split(input: &str) -> Vec<String> { let mut bits: Vec<String> = vec![]; if input.is_empty() { return bits; } let mut is_digit = input.chars().next().unwrap().is_digit(10); let mut start = 0; for (i, c) in input.char_indices() { let this_is_digit = c.is_digit(10); if is_digit != this_is_digit { bits.push(input[start..i].to_string()); is_digit = this_is_digit; start = i; } } bits.push(input[start..].to_string()); bits } 

This form also allows for doing this with much fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:

pub fn split<'a>(input: &'a str) -> Vec<&'a str> { let mut bits = vec![]; if input.is_empty() { return bits; } let mut is_digit = input.chars().next().unwrap().is_digit(10); let mut start = 0; for (i, c) in input.char_indices() { let this_is_digit = c.is_digit(10); if is_digit != this_is_digit { bits.push(&input[start..i]); is_digit = this_is_digit; start = i; } } bits.push(&input[start..]); bits } 

All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.

One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<&'a str>.

Sign up to request clarification or add additional context in comments.

1 Comment

Oh, this by_ref saved me!
5

take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.

You cannot use take_while for what you are wanting for these reasons. You will need to manually unroll your take_while invocations.

Here is one of many possible ways of dealing with this:

pub fn split(input: &str) -> Vec<String> { let mut bits: Vec<String> = vec![]; let mut iter = input.chars().peekable(); loop { let seeking_digits = match iter.peek() { None => return bits, Some(c) => c.is_digit(10), }; if seeking_digits { bits.push(take_while(&mut iter, |c| c.is_digit(10))); } else { bits.push(take_while(&mut iter, |c| !c.is_digit(10))); } } } fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String where I: Iterator<Item = char>, F: Fn(&char) -> bool, { let mut out = String::new(); loop { match iter.peek() { Some(c) if predicate(c) => out.push(*c), _ => return out, } let _ = iter.next(); } } fn main() { println!("{:?}", split("test123test")); } 

This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.

1 Comment

I want to mention that the itertools crate provides a peeking_take_while method which does essentially what this answer's function does, but in a generic fashion.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.