Haskell breaking up words by first space

Question

Note this is not the same as using the words function.

I would like to convert from this:

"The quick brown fox jumped over the lazy dogs."

into this:

["The"," quick"," brown"," fox"," jumped"," over"," the"," lazy"," dogs."]

Note how the breaks are on the first space after each word.

The best I could come up with is this:

parts "" = []
parts s = if null a then (c ++ e):parts f else a:parts b
    where
        (a, b) = break isSpace s
        (c, d) = span isSpace s
        (e, f) = break isSpace d

It just looks a little inelegant. Can anyone think of a better way to express this?

What you want is obviously similar to the words function, so maybe you should look at how words is implemented and see if you can do something similar.
– Tyler, Commented Aug 16, 2011 at 5:02
... and you can see that implementation here: darcs.haskell.org/packages/base/Data/List.hs
– Tyler, Commented Aug 16, 2011 at 5:14

gatoatigrado · Accepted Answer · 2011-08-16 08:52:19Z

edit -- Sorry I didn't read the question. Hopefully this new answer does what you want.

> List.groupBy (\x y -> y /= ' ') "The quick brown fox jumped over the lazy dogs."
["The"," quick"," brown"," fox"," jumped"," over"," the"," lazy"," dogs."]

The library function groupBy takes a predicate that tells it whether to add the next element, y, to the current list, which starts with x, or to start a new list.

In this case, we don't care what the current list started with, we only want to start a new list (i.e. make the predicate evaluate to false) when the next element, y, is a space.
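The behaviour described above can be sketched using only base. The name splitBeforeSpace is made up for this illustration; groupBy itself is the standard Data.List function.

```haskell
import Data.List (groupBy)

-- splitBeforeSpace is a hypothetical name used only for this sketch.
-- Data.List.groupBy hands the predicate the FIRST element of the current
-- group as x and the candidate element as y. Ignoring x and returning
-- False for a space makes every space start a new group.
splitBeforeSpace :: String -> [String]
splitBeforeSpace = groupBy (\_ y -> y /= ' ')
```

In GHCi, splitBeforeSpace "The quick brown fox" should give ["The"," quick"," brown"," fox"]. Because the predicate never looks at x, each space in a run of spaces opens its own group, so "a  b" (two spaces) comes out as ["a"," "," b"].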

edit

n.m. points out that the handling of multiple spaces is not correct. In that case, you can switch to the groupBy from Data.List.HT, which has the semantics you'd want.

> import Data.List.HT as HT
> HT.groupBy (\x y -> y /= ' ' || x == ' ') "a b c d"
["a"," b"," c"," d"]

The difference in semantics that makes this work is that x is the last element of the previous list (the one you either add y to, or close off to start a new list).
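If you'd rather avoid the extra dependency, the same adjacent-element semantics can be sketched in a few lines. Both groupAdjacent and partsBySpace are hypothetical names introduced here, not library functions, and this is only one possible way to write it.

```haskell
-- groupAdjacent is a hypothetical helper, not a library function.
-- It compares each element with its immediate NEIGHBOUR (x is the element
-- just before y) and keeps them in the same group while the predicate holds.
groupAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupAdjacent p = foldr step []
  where
    -- x joins the group to its right when p x y holds for that group's head y
    step x ((y:ys):rest) | p x y = (x:y:ys) : rest
    -- otherwise x starts a new group
    step x acc = [x] : acc

-- Start a new piece only at a space that follows a non-space.
partsBySpace :: String -> [String]
partsBySpace = groupAdjacent (\x y -> y /= ' ' || x == ' ')
```

With this, partsBySpace "a  b   c" should give ["a","  b","   c"], so a run of spaces stays attached to the word that follows it. Note that the foldr pattern-matches on its accumulator, so this sketch is not meant for infinite lists.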

Looks a bit like this problem, from not so long ago! stackoverflow.com/questions/6966151/…

John L · Accepted Answer · 2011-08-16 07:45:48Z

If you're doing lots of slightly different types of splits, have a look at the split package. The package lets you define this split as split (onSublist [" "]).

Vagif Verdi · Accepted Answer · 2011-08-16 06:41:36Z

words2 xs = head w : (map (' ':) $ tail w) where w = words xs

And here it is with arrows and applicative (not recommended for practical use):

words3 = words >>> (:) <$> head <*> (map (' ':) . tail)

EDIT: My first solution is wrong, because it eats additional spaces. Here's the correct one:

words4 = foldr (\x acc -> if x == ' ' || head acc == "" || (head $ head acc) /= ' '
                          then (x : head acc) : tail acc
                          else [x] : acc) [""]

n. m. could be an AI · Accepted Answer · 2011-08-16 05:58:20Z

Here's my take

break2 :: (a->a->Bool) -> [a] -> ([a],[a])
break2 f (x:(xs@(y:ys))) = if f x y then ([x],xs) else (x:u,us)
    where (u,us) = break2 f xs
break2 f xs = (xs, [])

onSpace x y = not (isSpace x) && isSpace y

words2 "" = []
words2 xs = y : words2 ys
    where (y,ys) = break2 onSpace xs

Landei · Accepted Answer · 2011-08-16 07:33:04Z

parts xs = foldr spl [] xs where
    spl x [] = [[x]]
    spl ' ' (xs:xss) = (' ':xs):xss
    spl x xss@((' ':_):_) = [x]:xss
    spl x (xs:xss) = (x:xs):xss

Snoqual · Accepted Answer · 2011-08-16 23:01:45Z

I like the idea of the split package, but split (onSublist [" "]) doesn't do what I want, and I can't find a solution that splits on one-or-more spaces.

I also like the solution using Data.List.HT, but I'd like to stay away from dependencies if possible.

The cleanest I can come up with:

parts s
    | null s = []
    | null a = (c ++ e) : parts f
    | otherwise = a : parts b
    where
        (a, b) = break isSpace s
        (c, d) = span isSpace s
        (e, f) = break isSpace d
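One quick sanity check for a splitter like this is that no characters get lost: gluing the pieces back together should reproduce the input. The sketch below repeats the definition with the import it needs; roundTrips is a hypothetical helper name, not something from a library.

```haskell
import Data.Char (isSpace)

-- Same definition as above, repeated so the sketch is self-contained.
parts :: String -> [String]
parts s
  | null s    = []
  | null a    = (c ++ e) : parts f
  | otherwise = a : parts b
  where
    (a, b) = break isSpace s   -- leading word, if any
    (c, d) = span isSpace s    -- leading whitespace, if any
    (e, f) = break isSpace d   -- the word after that whitespace

-- roundTrips is a hypothetical helper: True when concatenating the
-- pieces gives back exactly the original string.
roundTrips :: String -> Bool
roundTrips s = concat (parts s) == s
```

For example, parts "The quick  brown" (two spaces before "brown") should give ["The"," quick","  brown"], and roundTrips should hold for inputs with leading or trailing whitespace as well.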

jvinkovic · Accepted Answer · 2013-11-10 13:00:20Z

Here it is. Enjoy! :D

words' :: String -> [String]
words' [] = []
words' te@(x:xs)
    | x==' ' || x=='\t' || x=='\n' = words' xs
    | otherwise = a : words' b
    where (a, b) = break isSpace te
