4

I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.

The best I could do so far has been the following:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> { let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect(); vals.into_iter().map(|v: &String| v.to_owned()).collect() } 

However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.

I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.

So let me try and trace the types of each expression:

- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>> - vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U> - vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>) - vals was declared as Vec<&String>, therefore vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String. 

I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> { let a: Iter<Vec<String>> = vecs.iter(); let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter()); let c = b.collect(); print_type_of(&c); let vals : Vec<&String> = c; vals.into_iter().map(|v: &String| v.to_owned()).collect() } 

Clearly, U=Iter<String>... Please help me clear up this mess.

EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> { vecs.into_iter().flat_map(|x| x.into_iter()).collect() } 

My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.

EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> { let a: Iter<Vec<String>> = vecs.iter(); let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter(); let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f); let vals : Vec<&String> = b.collect(); vals.into_iter().map(|v: &String| v.to_owned()).collect() } 
1
  • - vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>: this is incorrect. There is no associated type named Item on struct Iter (only traits may have associated types). vecs.iter() is of type Iter<Vec<String>>, but this type implements Iterator<Item=&Vec<String>> (note the &). When you flat_map this, you turn &Vec<String> into &String. Commented Jul 1, 2015 at 2:19

1 Answer 1

6

I'd think about: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is actually crucial, so that we don't have to copy first the inner vectors, then the strings inside, we just receive ownership of them.

We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.

To minimize time spent growing and reallocating the vector, calculate the space needed up front.

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> { let size = vecs.iter().fold(0, |a, b| a + b.len()); vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| { acc.extend(v); acc }) } 

If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */


The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).

vecs.into_iter().flat_map(|x| x.into_iter()).collect()

If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)

vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()

Sign up to request clarification or add additional context in comments.

9 Comments

This gives me some errors: ` build.rs:104:9: 104:12 error: cannot borrow immutable local variable acc as mutable build.rs:104 acc.extend(v); v ^~~ note: in expansion of closure expansion build.rs:103:53: 105:6 note: expansion site build.rs:104:24: 104:25 error: use of moved value: v [E0382] build.rs:104 acc.extend(v); v`
maybe because I didn't test compile it until after I submitted, but then I fixed it
How can I independently learn about vecs.concat()? It's not on the API page for std::vec::Vec. I mean how am I supposed to find it if looking at the API doesn't give any hints? Even now that I empirically know it's there I can't figure out where it is defined.
As to your solution with fold(), thank you. Still, what I'm after is learning how to achieve the needed result independently, so I would really like to use my code as the starting point and see how it can be improved to only use one collect() rather than rewritten.
Well, this answers my main question, and I would really like to mark your answer as solution, but OTOH I also would like to get an answer to the meta-questions (how to figure out types by looking at the API). Perhaps my question is too broad?
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.