2
\$\begingroup\$

Please see here for the general description.

Exercise 3: histogram :: [Integer] -> String

takes as input a list of Integers between 0 and 9 (inclusive), and outputs a vertical histogram showing how many of each number were in the input list.

histogram :: [Integer] -> String histogram xs = let (a,m) = getCountOfElements xs unpadded = getStrings a padded = pad m unpadded transposed = transp ([], padded) in mkOutput transposed mkOutput :: [String] -> String mkOutput xss = let nums = "0123456789\n" sep = "==========\n" t = foldl (\a b -> b++"\n"++a) "" xss in t ++ sep ++ nums pad :: Integer -> [String] -> [String] pad _ [] = [] pad n (xs:xss) = (xs ++ (replicate (fromInteger n - length xs) ' ')) : (pad n xss) getCountOfElements :: [Integer] -> ([Integer],Integer) getCountOfElements ys = let arr = foldr (\e acc -> count e acc) (replicate 10 0) ys where count :: Integer -> [Integer] -> [Integer] count n xs = let idx = fromInteger n in (take idx xs) ++ [(xs!!idx + 1)] ++ (drop (idx+1) xs) maxElt = maximum arr in (arr, maxElt) getStrings :: [Integer] -> [String] getStrings = map (\x -> replicate (fromInteger x) '*') -- good, descriptive name is lacking... f :: [[a]] -> ([a], [[a]]) f xss = foldr h ([],[]) xss where h [] (acc1, acc2) = (acc1, []) h (x:xs) (acc1, acc2) = (x:acc1, xs:acc2) transp :: ([[a]],[[a]]) -> [[a]] transp (xs:xss, [] ) = reverse xss transp (xss, yss) = let (r, l) = f yss in transp ([r] ++ xss, l) 
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

This seems overwrought. Let's start with getCountElements.

It's often a good idea to consider not just the problem at hand, but a more general statement of what it is you're trying to solve. In this case, you're trying to get a frequency count of each distinct element in a list. At this point don't worry about the particulars of your problem (like the fact that you're only supposed to be counting Integers with values [0..9]), frequency counts are a well-defined problem for arbitrary data sets and as such lend themselves to a generalized solution.

So think about what a general solution would look like. Well, to start, your function will be handed a [a] ->, just some list of arbitrary elements. But we need to group up which ones are equal to each other so that implies that we must have an equality constraint, (Eq a) =>. After we find and count the equal elements you'll need to return that count, so -> [Int]. But that doesn't make sense now given our generalized problem, the position of each Int in the list says nothing about which as it represents, so instead we'll return -> [(a, Int)], a list of pairs of an element and the count of that element in the original list. Putting it all together we get—

frequencyCount :: (Eq a) => [a] -> [(a, Int)] 

Now we could make this work as-is†, but there's an important optimization we can make by tightening our constraints further. By first sorting the list we are given we can then easily group together identical elements, improving our asymptotic performance to O(n log n). Thus—

frequencyCount :: (Eq a, Ord a) => [a] -> [(a, Int)] 

Jumping back to your version before showing the implementation, let's compare what's going on between both already.

getCountOfElements :: [Integer] -> ([Integer],Integer) 

Program specifications and logic are threaded all through, from the specification of the domain to Integers, to the encoding of element identity in the position of the list in the return type, through to the inclusion of the largest count in the returned pair (a particularly odd design choice given the name of the function).

Getting back to implementing frequencyCount our implementation should be almost self-evident after having given the type. First we'll sort the elements to efficiently group the similar ones, then we'll run-length encode the list.

frequencyCount :: (Eq a, Ord a) => [a] -> [(a, Int)] frequencyCount = runLength . sort where runLength :: (Eq a) => [a] -> [(a, Int)] runLength = map (\es -> (head es, length es)) . group 

Et, voilà! Let's now contrast some of the properties of the getCountOfElements implementation.

  1. Due to the use of !! for indexing, if the [Integer] we're passed is out of bounds an exception will be thrown. Better filter the whole list of elements beforehand.
  2. count uses take, drop, and of course !!, all of which are O(n) operations on Haskell lists. That's three traversals of the index-encoded list for each element, consider the worst case performer of getCountOfElements (replicate 100000 9). This would have to be cleaned up through use of splitAt.
  3. Using ++ in count will walk the left portion of the accumulator list again, consider again that worst case performer.

In the face of all that, getCountOfElements is an O(n) algorithm (albeit with a very large constant factor) whereas frequencyCount is O(n log n). So despite the constructive beauty of frequencyCount, to do better than getCountOfElements I'll have to steal your algorithm and use it with an appropriate Haskell data structure for constant time indexing and updating.

Data.Array from the array package provides such a data structure for us, and in fact the very first visible code example in the documentation is for a histogram function.

hist :: (Ix a, Num b) => (a,a) -> [a] -> Array a b hist bnds is = accumArray (+) 0 bnds [(i, 1) | i<-is, inRange bnds i] 

To use this version we have to provide a pair of bounds, in this question's particular case (0, 9).

count09Elements :: [Integer] -> [(Integer, Int)] count09Elements = assocs . hist (0, 9) 

In conclusion, Haskell is a land of contrasts. So, we learned today that there are many different ways to skin a cat. Depending on what you know about your problem domain many different complexity classes of algorithms may be available to you. And don't use linked lists for operations requiring constant time updating.


Appendix

† An inefficient but important frequencyCount implementation. Note that performance is O(n²) (consider e.g., frequencyCount [0..100000]).

import Data.List (partition) frequencyCount :: (Eq a) => [a] -> [(a, Int)] frequencyCount [] = [] frequencyCount xs@(e:_) = (head es, length es) : frequencyCount xs' where (es, xs') = partition (== e) xs 
\$\endgroup\$
1
  • \$\begingroup\$ Thank you for your in depth analysis. The name getCountOfElements is a remnant of earlier attempts... That may clear the unfitting name :) The return type is like that because I thought it would be better to reuse parts of the list instead of a new traversal (like with an array where you can do a frequency count linearly). Now that you mention it: is getCountOfElements not quadratic because of ++? \$\endgroup\$ Commented Mar 28, 2015 at 18:38

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.