
Note that this is not the same as using the words function: the spaces must be kept.

I would like to convert from this:

"The quick brown fox jumped over the lazy dogs." 

into this:

["The"," quick"," brown"," fox"," jumped"," over"," the"," lazy"," dogs."] 

Note how each break falls at the first space after a word, so every piece after the first begins with the whitespace that precedes its word.

The best I could come up with is this:

    import Data.Char (isSpace)

    parts "" = []
    parts s = if null a then (c ++ e) : parts f else a : parts b
      where
        (a, b) = break isSpace s
        (c, d) = span isSpace s
        (e, f) = break isSpace d

It just looks a little inelegant. Can anyone think of a better way to express this?

  • What you want is obviously similar to the words function, so maybe you should look at how words is implemented and see if you can do something similar. Commented Aug 16, 2011 at 5:02
  • ... and you can see that implementation here: darcs.haskell.org/packages/base/Data/List.hs Commented Aug 16, 2011 at 5:14
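Following the suggestion to model this on words: the standard words skips leading whitespace with dropWhile isSpace and then takes a word with break isSpace. A minimal sketch of the same shape that keeps the whitespace instead of discarding it (splitKeepingSpaces is a hypothetical name, not a library function, and the sketch assumes the goal is exactly the output shown in the question):

```haskell
import Data.Char (isSpace)

-- Hypothetical name. Same shape as the standard `words`, except that the
-- leading whitespace is captured with `span` instead of thrown away with
-- `dropWhile`, and is prepended to the word that follows it.
splitKeepingSpaces :: String -> [String]
splitKeepingSpaces "" = []
splitKeepingSpaces s = (spaces ++ word) : splitKeepingSpaces rest
  where
    (spaces, afterSpaces) = span isSpace s           -- whitespace before the word
    (word, rest)          = break isSpace afterSpaces -- the word itself
```

Loaded into GHCi, splitKeepingSpaces "a  b" gives ["a","  b"]: a run of spaces stays attached to the word after it, and concatenating the pieces always gives back the original string. Each step consumes at least one character of a non-empty input, so the recursion terminates.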

7 Answers

Answer (score 7)

Edit: Sorry, I misread the question at first. Hopefully this new answer does what you want.

    > Data.List.groupBy (\x y -> y /= ' ') "The quick brown fox jumped over the lazy dogs."
    ["The"," quick"," brown"," fox"," jumped"," over"," the"," lazy"," dogs."]

The library function groupBy takes a predicate that decides whether the next element, y, is added to the current group (which starts with x) or starts a new group.

In this case we don't care what the current group started with; we only want to start a new group (i.e. make the predicate return False) when the next element, y, is a space.
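A small runnable version of this groupBy approach (splitBeforeSpaces is just an illustrative name). Data.List.groupBy compares each new element with the first element of the current group; since this predicate ignores that element entirely, every single space starts a new group, which is why consecutive spaces are not merged.

```haskell
import Data.List (groupBy)

-- Illustrative name. The predicate ignores the group's first element (_)
-- and only asks "is the next character a non-space?", so a space always
-- starts a new group.
splitBeforeSpaces :: String -> [String]
splitBeforeSpaces = groupBy (\_ y -> y /= ' ')
```

On the sentence from the question this gives the requested list, but splitBeforeSpaces "a  b" gives ["a"," "," b"]: the second space ends up in a group of its own.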

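Edit:

n.m. points out that consecutive spaces are not handled correctly: with Data.List.groupBy, each extra space becomes a group of its own. In that case you can switch to Data.List.HT (from the utility-ht package), whose groupBy has the semantics you want.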

    > import Data.List.HT as HT
    > HT.groupBy (\x y -> y /= ' ' || x == ' ') "a b c d"
    ["a"," b"," c"," d"]

The difference in semantics that makes this work is that x is now the last element of the current group (the one y may be added to), rather than its first element.
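If you would rather avoid the dependency, the adjacent-pair semantics described for Data.List.HT's groupBy take only a few lines to write by hand. This is a sketch of that behaviour, not the library's source; groupByAdjacent and partsAdjacent are hypothetical names.

```haskell
-- Hypothetical stand-in for a groupBy whose predicate sees ADJACENT
-- elements: p is applied to (previous element, next element), not to
-- (first element of the group, next element) as in Data.List.groupBy.
groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent _ [] = []
groupByAdjacent p (x:xs) = (x : grp) : groupByAdjacent p rest
  where
    (grp, rest) = spanFrom x xs
    -- Extend the current group while p holds for each neighbouring pair;
    -- when the guard fails (or the input ends), the last equation applies.
    spanFrom prev (y:ys)
      | p prev y = let (g, r) = spanFrom y ys in (y : g, r)
    spanFrom _ ys = ([], ys)

-- The predicate from the answer: a new group starts only when y is a
-- space that follows a non-space, so a run of spaces stays with the next word.
partsAdjacent :: String -> [String]
partsAdjacent = groupByAdjacent (\x y -> y /= ' ' || x == ' ')
```

With this, partsAdjacent "a  b c" gives ["a","  b"," c"], and partsAdjacent "a b c d" gives ["a"," b"," c"," d"] as in the session above.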


2 Comments

Looks a bit like this problem, from not so long ago! stackoverflow.com/questions/6966151/…
chains (\x y -> isSpace x || (not . isSpace) y) ?
Answer (score 3)

If you're doing lots of slightly different kinds of splits, have a look at the split package. It lets you build splits like this from combinators, for example split (onSublist " ").


Answer (score 1)
    words2 xs = head w : (map (' ':) $ tail w)
      where w = words xs

And here's a version using arrows and applicative (not recommended for practical use):

    import Control.Arrow ((>>>))

    words3 :: String -> [String]
    words3 = words >>> (:) <$> head <*> (map (' ':) . tail)

EDIT: My first solution is wrong because it eats extra spaces: words discards all the whitespace, and the code then puts back exactly one space before each word. Here's a correct one:

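    -- The signature is needed in a compiled module: foldr is Foldable-polymorphic,
    -- and without it the monomorphism restriction leaves the container type ambiguous.
    words4 :: String -> [String]
    words4 = foldr (\x acc -> if x == ' ' || head acc == "" || (head $ head acc) /= ' '
                                then (x : head acc) : tail acc
                                else [x] : acc)
                   [""]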
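To make the EDIT concrete, here are both definitions side by side on an input with two consecutive spaces (copied from this answer, with type signatures added so the file compiles on its own):

```haskell
-- The first attempt, built on the standard words.
words2 :: String -> [String]
words2 xs = head w : map (' ':) (tail w)
  where w = words xs

-- The corrected foldr version: walking from the right, a character is
-- prepended to the current chunk unless it is a non-space sitting in
-- front of a chunk that starts with a space, in which case it opens a new chunk.
words4 :: String -> [String]
words4 = foldr (\x acc -> if x == ' ' || head acc == "" || head (head acc) /= ' '
                            then (x : head acc) : tail acc
                            else [x] : acc)
               [""]
```

words2 "a  b" gives ["a"," b"] (one of the two spaces is lost), while words4 "a  b" gives ["a","  b"].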

1 Comment

This does not keep original spaces
Answer (score 0)

Here's my take:

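    import Data.Char (isSpace)

    -- Split the list between the first adjacent pair (x, y) for which f x y holds.
    break2 :: (a -> a -> Bool) -> [a] -> ([a], [a])
    break2 f (x:(xs@(y:ys))) = if f x y then ([x], xs) else (x:u, us)
      where (u, us) = break2 f xs
    break2 f xs = (xs, [])

    -- True at a word/space boundary: a non-space followed by a space.
    onSpace x y = not (isSpace x) && isSpace y

    words2 "" = []
    words2 xs = y : words2 ys
      where (y, ys) = break2 onSpace xs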


Answer (score 0)
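    -- Fold from the right: a space joins the chunk after it, and a non-space
    -- in front of a chunk that starts with a space begins a new chunk.
    parts xs = foldr spl [] xs
      where
        spl x [] = [[x]]
        spl ' ' (xs:xss) = (' ':xs):xss
        spl x xss@((' ':_):_) = [x]:xss
        spl x (xs:xss) = (x:xs):xss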


Answer (score 0)

I like the idea of the split package, but split (onSublist " ") doesn't do what I want, and I can't find a way to make it split on one or more spaces.

I also like the solution using Data.List.HT, but I'd like to avoid extra dependencies if possible.

The cleanest version I can come up with:

    import Data.Char (isSpace)

    parts s
      | null s    = []
      | null a    = (c ++ e) : parts f
      | otherwise = a : parts b
      where
        (a, b) = break isSpace s
        (c, d) = span isSpace s
        (e, f) = break isSpace d


Answer (score 0)

Here it is. Enjoy! :D

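    import Data.Char (isSpace)

    -- Skips a leading space, tab or newline, then takes the next run of
    -- non-space characters (the whitespace itself is discarded).
    words' :: String -> [String]
    words' [] = []
    words' te@(x:xs)
      | x == ' ' || x == '\t' || x == '\n' = words' xs
      | otherwise = a : words' b
      where (a, b) = break isSpace te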

