There's a lot of ground to cover, so buckle your seatbelt, and hop on!
Craftsmanship
No test
It's great to know that your code passes the tests in the mark scheme, but that's no substitute for the lack of unit tests of the various methods you have.
No documentation
Rust provides full support for great doc comments, and you don't use them at all, which is a shame.
A small blurb describing what the function does, its pre-conditions, post-conditions, edge-cases, etc... is always useful.
Also, in the context of Olympiads, a note on the algorithmic complexity, time-wise and space-wise, would not be amiss.
No Linting
The Rust toolchain comes with a built-in linter named Clippy: use it!
In particular, it should flag the non-idiomatic &String argument. An &str argument is just as powerful, and allows passing anything which can hold a str value, not just a String.
Non-Idiomatic item order (mild)
It's not very idiomatic in Rust code to put a private item (NUMERALS) at the top of the file, when it's nothing more than an implementation detail. In general, I would recommend organizing items from high-level to low-level, with high-level items at the top, so that readers are naturally guided towards them.
Strange use of standard input
It's a bit strange that the arguments are read from standard input, when the use is not interactive, and only 2 arguments are used.
One would expect to read them from the command line instead, using std::env::args(), which returns an Iterator<Item = String>.
Entangled
The algorithm clearly works in two steps:
- Identify the sub-sequences of common characters.
- "Say" them.
Your code, however, performs both steps in a single function -- with only a small helper to convert numbers to roman numerals -- and therefore makes it impossible to independently test them.
I would recommend extracting step 1 from look_and_say, and creating an iterator which returns (usize, char) when iterating over a string.
Nesting
Nesting is the enemy of the software developer: unlike a compiler our minds don't have an infinite stack of facts that we can hold onto.
As a result, it's better to keep nesting as short as possible, which means using a guard style instead:
for c in old.chars() { if c == current { length += 1; continue; } new.push_str(&to_roman(length)); new.push(current); length = 0; current = c; }
Algorithmic
Repeated work in command line splitting
Rather than using nth(0) and nth(1) to parse the command line, it would be more idiomatic to use next():
let mut iter = buf.split_whitespace(); let mut numeral = iter.next().unwrap().to_vec(); let n: u32 = iter.next().unwrap().parse().unwrap();
Pessimistic romanization
I wonder if iteration over NUMERALS in to_roman is not... pessimistic.
I would expect that most numbers pass to it will be small, because the very nature of look-and-say means that even if you have, once, a thousand I, then it'll be described as MI, and next time you'll have 1 M and 1 I.
Thus, I would expect it would be interesting to bias the implementation of to_roman for small numbers, and thus NOT to iterate from 1000, 900, etc... but instead to have a first pass, starting from the other end, to figure the biggest number smaller than n.
let bigger = NUMERALS.iter().rposition(|(k, _)| *k > n).unwrap_or(0); let start = NUMERALS.len() - bigger; for (key, value) in &NUMERALS[start..] { ... }
Performance
To String or not to String
The problem description naturally leads to thinking in terms of String, but String provides more than you need, here, being a UTF-8 encoded Unicode string type.
All the numerals that you read, or write, are pure ASCII. By using a UTF-8 oriented abstraction, you're giving up quite a bit of performance for manipulating strictly ASCII.
Instead, you should switch to the byte-representation underneath, where each ASCII character is a single byte, and manipulate (and generate) [u8].
One Byte by One Byte
Your look-up routine checks bytes one by one. That's about the least efficient way to go about it, though by Unicode it may be difficult to do any better.
Instead, you should implement the look up by advancing to the next different character; which is much easier when manipulating a slice of bytes:
fn look(bytes: &[u8]) -> impl Iterator<Item = (u32, u8)> + '_ { struct Looker<'a> { bytes: &'a [u8], } impl<'a> Iterator for Looker<'a> { type Item = (u32, u8); fn next(&mut self) -> Option<Self::Item> { let (&first, tail) = self.bytes.split_first()?; let index = tail.iter() .position(|b| *b != first) .unwrap_or(tail.len()); self.bytes = tail.split_at_checked(index).unwrap_or((&[], &[])).1; Some(((index + 1) as u32, first)) } } Looker { bytes } }
Note: for the future reader, the generator feature (gen) will make this less boilerplatey in the future.
Many, many, allocations
The algorithm creates... many, many, strings.
Each call to look_and_say will:
- Creates a new, empty,
String, which will be reallocated log(N) times. - Creates a new, empty,
String, which will be reallocated at least once or twice, for each call to to_roman.
That a lot of memory allocations.
It would make sense, instead, to tweak the signatures in order to enable buffer reuse.
let mut old = numeral.into_vec(); let mut new = Vec::new(); for _ in 0..n { q1::look_and_say(&old, &mut new); mem::swap(&mut old, &mut new); }
With the signatures changed to match:
pub fn look_and_say(old: &[u8], new: &mut Vec<u8>); pub fn to_roman(mut n: u32, out: &mut Vec<u8>);
Even better, both old and new could start pre-sized, if you have a good heuristic for the final size they could reach. A reasonable upper-bound -- which would guarantee the absence of reallocation -- would be great, though otherwise a conservative lower-bound triggering at most one reallocation would already be pretty good.
Repeated parsing/formatting
While the challenge is expressed as parsing a string and formatting a string at each round, an implementation doesn't, actually, need to do so.
Turning things on its head, one could instead:
- Parse the original numeral into a sequence of (count, character) tuples.
- Then, repeatedly, apply a look-and-say which produces a sequence of (count, character) from a sequence of (count, character).
- Finally produce the result from such a sequence.
This would entirely skip the parsing & formatting codes, and likely result in performance gains.
The only trick, there, would be in the "formatting" part of look-and-say, to consolidate the last element of the sequence with the element to be appended, if they have the same character.
(u32, &str)array?) \$\endgroup\$