List processing in Haskell

Question

I am teaching myself Haskell and have run into a problem and need help.

Background:

    type AInfo = (Char, Int)
    type AList = [AInfo]   -- let's say [('a', 2), ('b', 5), ('a', 1), ('w', 21)]

    type BInfo = Char
    type BList = [BInfo]   -- let's say ['a', 'a', 'c', 'g', 'a', 'w', 'b']

One quick edit: The above information is for illustrative purposes only. The actual elements of the lists are a bit more complex. Also, the lists are not static; they are dynamic (hence the use of the IO monad), and I need to keep, pass, "return", access, and change the lists while the program is running.

I am looking to do the following:

For all elements of AList, check against all elements of BList; where the character of the AList element (pair) is equal to a character in BList, add one to the Int value of the AList element (pair) and remove the character from BList.

So what this means is after the first element of AList is checked against all elements of BList the values of the lists should be:

AList [('a', 5), ('b', 5), ('a', 1), ('w', 21)]

BList ['c', 'g', 'w', 'b']

And in the end, the lists values should be:

AList [('a', 5), ('b', 6), ('a', 1), ('w', 22)]

BList ['c', 'g']

Of course, all of this is happening in an IO monad.

Things I have tried:

Using mapM and a recursive helper function. I have looked at both:

    Every element of AList checked against every element of BList: mapM (myHelpF1 alist) blist
    Every element of BList checked against every element of AList: mapM (myHelpF2 alist) blist

Passing both lists to a function and using complicated if/then/else and helper function calls (feels like I am forcing Haskell to be iterative; messy, convoluted code; does not feel right).

I have thought about using filter with the character value of an AList element and BList to create a third list of Bool, then counting the number of True values and updating the Int value, then using filter on BList to remove the BList elements that …… (again, does not feel right; not very Haskell-like).

Things I think I know about the problem:

The solution may be exceedingly trivial. So much so that the more experienced Haskellers will be muttering under their breath "what a noob" as they type their response.

Any pointers would be greatly appreciated. (mutter away….)

luqui · Accepted Answer · 2013-01-25 21:57:17Z

A few pointers:

Don't use [(Char, Int)] for "AList". The data structure you are looking for is a finite map: Map Char Int. Particularly look at member and insertWith. toList and fromList convert from the representation you currently have for AList, so even if you are stuck with that representation, you can convert to a Map for this algorithm and convert back at the end. (This will be more efficient than staying in a list because you are doing so many lookups, and the finite map API is easier to work with than lists)
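As a rough illustration of the conversions mentioned above (a sketch, not part of the original answer): the names `alist`, `lastWins`, `summed`, `bumped` and `backToList` are made up for this example. Note that the question's sample AList has a duplicate key, and a Map holds one value per key, so the conversion has to pick a policy for duplicates.

```haskell
import qualified Data.Map as Map

-- The question's sample AList; note the duplicate key 'a'.
alist :: [(Char, Int)]
alist = [('a', 2), ('b', 5), ('a', 1), ('w', 21)]

-- fromList keeps the last value seen for a duplicate key: 'a' maps to 1.
lastWins :: Map.Map Char Int
lastWins = Map.fromList alist

-- fromListWith (+) merges duplicate keys by adding: 'a' maps to 3.
summed :: Map.Map Char Int
summed = Map.fromListWith (+) alist

-- insertWith (+) k 1 increments an existing count (or inserts 1 if k is absent).
bumped :: Map.Map Char Int
bumped = Map.insertWith (+) 'b' 1 summed

-- toList converts back to an association list, ordered by key.
backToList :: [(Char, Int)]
backToList = Map.toList bumped
```

Whether duplicates should be merged, dropped, or kept separately depends on what the real data means; the sample above only shows what each library call does.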

I'd approach the problem as two phases: (1) partition out the elements of blist by whether they are in the map, (2) insertWith the elements which are already in the map. Then you can return the resulting map and the other partition.

I would also get rid of the meaningless assumptions such as that keys are Char -- you can just say they are any type k (for "key") that satisfies the necessary constraints (that you can put it in a Map, which requires an Ord instance). You do this with lowercase type variables:

    import qualified Data.Map as Map

    sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])

Writing algorithms in greater generality helps catch bugs, because it makes sure that you don't use any assumptions you don't need.
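One possible body for that signature, following the two phases described above, might look like the sketch below. The answer itself only gives the type, so this implementation is an assumption: it uses `partition` for phase 1 and a strict left fold with `insertWith` for phase 2.

```haskell
import qualified Data.Map as Map
import Data.List (foldl', partition)

sieveList :: Ord k => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
sieveList counts xs = (counts', rest)
  where
    -- Phase 1: split xs into keys that are in the map and keys that are not.
    (hits, rest) = partition (`Map.member` counts) xs
    -- Phase 2: add one to the count for every hit.
    counts' = foldl' (\m k -> Map.insertWith (+) k 1 m) counts hits
```

With the question's data (and the duplicate `'a'` entry dropped), `sieveList (Map.fromList [('a',2),('b',5),('w',21)]) "aacgawb"` gives the counts a=5, b=6, w=22 and the leftover list `"cg"`.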

Oh, also this program has no business being in the IO monad. This is pure code.
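To make that concrete, here is a small sketch of calling a pure function from IO, either with `fmap` or inside a `do` block. Everything named here is hypothetical: `countAndRemove` is a stand-in pure function, and `fetchBList` stands in for wherever the real list comes from.

```haskell
import Data.List (partition)

-- Pure: count the occurrences of c in bs and return the list without them.
countAndRemove :: Char -> String -> (Int, String)
countAndRemove c bs = (length hits, rest)
  where
    (hits, rest) = partition (== c) bs

-- Hypothetical IO source; a real program might read this from a file or socket.
fetchBList :: IO String
fetchBList = pure "aacgawb"

-- Apply the pure function to an IO result with fmap ...
viaFmap :: IO (Int, String)
viaFmap = fmap (countAndRemove 'a') fetchBList

-- ... or bind the result in a do block and call the function directly.
viaDo :: IO (Int, String)
viaDo = do
  bs <- fetchBList
  pure (countAndRemove 'a' bs)
```

Both actions produce `(3, "cgwb")`; only the thin wrapper lives in IO, while the logic stays testable without it.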

Thanks for the response. I think you are on to something... Use a different Data Structure (Map instead of a List). Just might work... give me a day or so and I will tell you how I'm getting on....
Luqui, Thanks for the pointer to Data.Map. I changed my logic such that AList and BList only contain unique elements and was able to use existing Haskell methods to operate/transform/sequence the lists. That said, I have come back to looking at Data.Map as current performance degrades quickly as the length of the lists increases.

Father Deus · Accepted Answer · 2013-01-25 06:56:30Z

    import Data.List

    type AInfo = (Char, Int)
    type AList = [AInfo]

    type BInfo = Char
    type BList = [BInfo]

    process :: AList -> BList -> AList
    process [] _ = []
    process (a:as) b =
        if is_in a b
            then (fst a, snd a + 1) : process as (delete (fst a) b)
            else a : process as b
      where
        is_in f [] = False
        is_in f (s:ss) = if fst f == s then True else is_in f ss

    *Main> process [('a',5),('b',5),('a',1),('b',21)] ['c','b','g','w','b']
    [('a',5),('b',6),('a',1),('b',22)]
    *Main> process [('a',5),('b',5),('a',1),('w',21)] ['c','g','w','b']
    [('a',5),('b',6),('a',1),('w',22)]

Probably an important disclaimer: I'm rusty at Haskell to the point of ineptness, but as a relaxing midnight exercise I wrote this thing. It should do what you want, although it doesn't return a BList. With a bit of modification, you can get it to return an (AList,BList) tuple, but methinks you'd be better off using an imperative language if that kind of manipulation is required.

Alternately, there's an elegant solution and I'm too ignorant of Haskell to know it.
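For what it's worth, returning both lists can stay fairly short in a pure list version. The sketch below is not the code above: it is a different formulation (the name `processBoth` is made up) that threads the remaining BList through `mapAccumL` and, unlike `process`, absorbs every matching occurrence at once, as the question describes.

```haskell
import Data.List (mapAccumL, partition)

processBoth :: [(Char, Int)] -> [Char] -> ([(Char, Int)], [Char])
processBoth as bs = (as', bs')
  where
    -- mapAccumL carries the shrinking BList along as the accumulator.
    (bs', as') = mapAccumL step bs as
    -- For one AList pair: pull out all matching characters and add their count.
    step remaining (c, n) =
      let (hits, rest) = partition (== c) remaining
      in (rest, (c, n + length hits))
```

On the question's data, `processBoth [('a',2),('b',5),('a',1),('w',21)] "aacgawb"` evaluates to `([('a',5),('b',6),('a',1),('w',22)],"cg")`. It still scans the remaining BList once per AList element, so it is quadratic in the worst case.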

Thanks for the response. You have hit the same problem I am facing; I need to "return" both modified lists.

The Internet · Accepted Answer · 2013-01-25 07:04:42Z

While I am by no means a Haskell expert, I have a partial attempt that returns the result of the operation applied once. Maybe you can find out how to map it over the rest to get your solution. The idea behind addWhile is that you only want to update the first occurrence of an element in lista; if it exists twice, the later occurrence just gets 0 added to it. Code critiques are more than welcome.

    import Data.List

    type AInfo = (Char, Int)
    type AList = [AInfo]

    type BInfo = Char
    type BList = [BInfo]

    lista = ([('a', 2), ('b',5), ('a', 1), ('w', 21)] :: AList)
    listb = ['a','a','c','g','a','w','b']

    --step one, get the head, and its occurrences
    items list = (eleA, eleB)
      where
        eleA = length $ filter (\x -> x == (head list)) list
        eleB = head list

    --remove every occurrence of ele (dropWhile would only strip a leading run)
    getRidOfIt list ele = filter (\x -> x /= ele) list

    --add to lista
    addWhile :: [(Char, Int)] -> Char -> Int -> [(Char,Int)]
    addWhile [] _ _ = []
    addWhile ((x,y):xs) letter times =
        if x == letter
            then (x,y+times) : addWhile xs letter 0  --only the first match gets the count
            else (x,y) : addWhile xs letter times

    --first answer
    firstAnswer = addWhile lista (snd $ items listb) (fst $ items listb)
    --[('a',5),('b',5),('a',1),('w',21)]

Will Ness · Accepted Answer · 2013-01-31 06:42:36Z

The operation you describe is pure, as @luqui points out, so we just define it as a pure Haskell function. It can be used inside a monad (including IO) by means of fmap (or do).

    import Data.List

    combine alist blist = (reverse a, b4) where

First we sort and count the B list:

      b = map (\g->(head g,length g)) . group . sort $ blist

We need the import for group and sort to be available. Next, we roll along the alist and do our thing:

      (a,b2) = foldl g ([],b) alist
      g (acc,b) e@(x,c) = case pick x b of
          Nothing     -> (e:acc,b)
          Just (n,b2) -> ((x,c+n):acc,b2)
      b3 = map fst b2
      b4 = [ c | c <- blist, elem c b3 ]

Now pick, as used, must be

      pick x [] = Nothing
      pick x ((y,n):t)
        | x==y      = Just (n,t)
        | otherwise = case pick x t of
            Nothing    -> Nothing
            Just (k,r) -> Just (k, (y,n):r)

Of course pick performs a linear search, so if performance (speed) becomes a problem, b should be changed to allow for binary search (tree etc, like Map). The calculation of b4 which is filter (`elem` b3) blist is another potential performance problem with its repeated checks for presence in b3. Again, checking for presence in trees is faster than in lists, in general.
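As a sketch of that last point, the membership test can go through a `Data.Set` instead of a list. The helper name `keepOnly` is made up here; it would take the place of the `filter (`elem` b3) blist` step and needs an `Ord` instance on the elements.

```haskell
import qualified Data.Set as Set

-- Keep the elements of xs that occur in wanted, preserving the order of xs.
keepOnly :: Ord a => [a] -> [a] -> [a]
keepOnly wanted = filter (`Set.member` wantedSet)
  where
    -- Built once; each lookup is then logarithmic rather than linear.
    wantedSet = Set.fromList wanted
```

For example, `keepOnly "cg" "aacgawb"` gives `"cg"`.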

Test run:

    > combine [('a', 2), ('b',5), ('a', 1), ('w', 21)] "aacgawb"
    ([('a',5),('b',6),('a',1),('w',22)],"cg")

edit: you probably want it the other way around, rolling along the blist while updating the alist and producing (or not) the elements of blist in the result (b4 in my code). That way the algorithm will operate in a more local manner on long input streams (assuming your blist is long, though you didn't say that). As written above, it will have a space problem, consuming the input stream blist several times over. I'll keep it as is, as an illustration and food for thought.

So if you decide to go the 2nd route, first convert your alist into a Map (beware the duplicates!). Then, scanning (with scanl) over blist, make use of updateLookupWithKey to update the counts map and at the same time decide for each member of blist, one by one, whether to output it or not. The type of the accumulator will thus have to be (Maybe a, Map a Int), with a your element type (blist :: [a]):

    import Data.Map (fromList, updateLookupWithKey)

    -- for reference: scanl :: (acc -> a -> acc) -> acc -> [a] -> [acc]

    scanning = tail $ scanl g (Nothing, fromList $ reverse alist) blist
    g (_,cmap) a = case updateLookupWithKey (\_ c->Just(c+1)) a cmap of
        (Just _, m2) -> (Nothing, m2)   -- present in counts: bump it, emit nothing
        _            -> (Just a, cmap)  -- not present in counts: emit it
    new_b_list  = [ a | (Just a,_) <- scanning ]
    last_counts = snd $ last scanning   -- note: fails on an empty blist

You will have to combine the toList last_counts with the original alist if you have to preserve the old duplicates there (why would you?).
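A self-contained variant of this second route is sketched below. It is not the code above: it swaps `scanl` for `mapAccumL` (so an empty blist needs no special handling), and the name `sieveStream` is made up. It assumes the AList has already been turned into a Map, with whatever duplicate policy fits the data.

```haskell
import qualified Data.Map as Map
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)

sieveStream :: Ord k => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
sieveStream counts bs = (counts', catMaybes kept)
  where
    -- One pass over bs, carrying the counts map as the accumulator.
    (counts', kept) = mapAccumL step counts bs
    step m k = case Map.updateLookupWithKey (\_ c -> Just (c + 1)) k m of
      (Just _, m') -> (m', Nothing)  -- key is counted: bump it, drop k
      (Nothing, _) -> (m, Just k)    -- key is not counted: keep k in the output
```

Using the same conversion as above, `sieveStream (Map.fromList (reverse [('a',2),('b',5),('a',1),('w',21)])) "aacgawb"` yields the counts a=5, b=6, w=22 and the leftover list `"cg"`.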

Will, Thanks for the response. I did explore using Data.Map only to come to the conclusion that I had a fundamental error in my logic and I needed to have BList to only contain unique elements. This allows me to use existing Haskell list methods. That said I am currently sequencing AList and BList multiple times and there may be a performance gain if I have them as Maps (hash tables, dictionaries) instead of Lists.
you mean, AList to contain only unique elements - counts for each of BList's unique elements? As you've described it, your algorithm can be made on-line (produce and forget) for BList, as in my 2nd variant. If your BList contains only unique elements, then you don't need to count them (and it won't be very long, right?).
