6

I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:

<"apple", "pear", "pineapple", "orange"> 

The code that I wrote to do that is:

collection :: Parser [String] collection = (char '<') *> (string `sepBy` char ',')) <* (char '>') string :: Parser String string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"' 

This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:

 collection :: Parser [String] collection = do char '<' value <- (string `sepBy` char ',')) char '>' return nub value 

But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.

6
  • 2
    return $ nub value - this is a simple typographical error. Right now you're trying to apply return to nub, instead of to nub value. Note that this will not make your parser only parse a unique list of items, it will merely filter out duplicates that exist in the list. Commented Aug 23, 2015 at 15:33
  • @Cubic Thank you for pointing this out, this indeed works and you are right in that this will not make my parser parse a unique list of items, but it is already pretty close to what I want to achieve. What I am wondering is whether what I want to achieve is possible using a parser or whether I should perhaps try to achieve this in another way. Commented Aug 23, 2015 at 15:46
  • 1
    Spoilers: yes, it's possible - but if you don't restrict the possible strings in that list you're entering context sensitive territory. You have to parse the list in a more manual way though, because you'll need to remember list items you've already looked at. Commented Aug 23, 2015 at 15:52
  • 2
    It should be possible but I think this requires the monadic interface. Using only Applicative I don't think this is achievable. Commented Aug 23, 2015 at 15:53
  • 1
    @Cubic, it would be possible to parse the list as in the current code, but then to =<< on a function checking that elements are unique. The downside is that failure won't be detected till the list is completely parsed, but the code will be simple. Commented Aug 23, 2015 at 16:10

1 Answer 1

9

The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:

guard :: MonadPlus m => Bool -> m () 

This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.

So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.

To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:

allUnique ls = length (nub ls) == length ls 

With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:

unique parser = do res <- parser guard (allUnique res) return res 

Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.

Here's how we could use it:

λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">" Right ["apple","pear","pineapple","orange"] λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">" Left "<interactive>" (line 1, column 46):unknown parse error 

This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.

unique parser = do res <- parser guard (allUnique res) <?> "unique elements" return res 

Ahhh, much better:

λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">" Left "<interactive>" (line 1, column 46): expecting unique elements 

All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.

Sign up to request clarification or add additional context in comments.

3 Comments

To improve efficiency redefine allUnique ls = size (fromList ls) == length ls where you import Data.Set(size, fromList). This should take O(nlogn) (which is optimal). It changes the constraint from Eq to Ord but this shouldn't be a problem.
With the Functor-Applicative-Monad proposal in effect, guard now has type Alternative f => Bool -> f (), but that's fine because Parsec parsers also have an Alternative instance.
guard gives the 'expecting ...' error message; this isn't always appropriate and then you can use when cond $ fail "message here" to get an error message that is not 'expecting'.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.