Here I assume you just want to see how your current code can be improved without changing the algoritm.
First of all, give HLint tool a chance to suggest you obvious improvements. In your case the only improvement was that do in do readHeaderLine' "" was redundant, so not much.
Second, in my opinion many small top-level definitions are better than few large ones. You can still control namespace pollution by not exporting definitions local to the module:
import System.IO type HostName = String handleRequest :: HostName -> Handle -> IO () handleRequest host handle = do requestLine <- readHeaderLine handle putStrLn $ requestLine ++ "\n-------------------" -- FIXME: This code is bad, and its author should feel bad. readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = do chr <- hGetChar handle case chr of '\r' -> do nextChr <- hGetChar handle case nextChr of '\n' -> return s _ -> readHeaderLine' $ s ++ [chr, nextChr] _ -> readHeaderLine' $ s ++ [chr]
Next, your nextChr <- hGetChar handle ; case nextChr of limes appear twice, so you can extract a function and reduce code duplication.
Thanks to purity, you can do pretty mechanically:
foo handle quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz
Now replace the two fragments with calls to foo. Let's start with inner one:
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = do chr <- hGetChar handle case chr of '\r' -> do foo handle '\n' (return s) (readHeaderLine' $ s ++ [chr, nextChr]) _ -> readHeaderLine' $ s ++ [chr]
Heh, it didn't work because nextChr is only known inside foo. No problem, pass it as parameter to baz branch and use lambda to catch it:
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = do chr <- hGetChar handle case chr of '\r' -> do foo handle '\n' (return s) (\nextChr -> readHeaderLine' $ s ++ [chr, nextChr]) _ -> readHeaderLine' $ s ++ [chr] foo handle quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
So now you can do the outer one.
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = do foo handle '\r' (foo handle '\n' (return s) (\nextChr -> readHeaderLine' $ s ++ [chr, nextChr])) (\chr -> readHeaderLine' $ s ++ [chr])
No luck again as first chr is nowhere to get from. Fortunately we know it's always '\r', so
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = do foo handle '\r' (foo handle '\n' (return s) (\nextChr -> readHeaderLine' $ s ++ ['\r', nextChr])) (\chr -> readHeaderLine' $ s ++ [chr])
As line got too long, we can split it, removing another redundant do:
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = foo handle '\r' haveCR noCR where haveCR = foo handle '\n' (return s) haveCRnoLF noCR chr = readHeaderLine' $ s ++ [chr] haveCRnoLF nextChr = readHeaderLine' $ s ++ ['\r', nextChr]
Now there are more repeated patterns to eliminate: foo handle and readHeaderLine' $ s ++. To remove foo handle we move foo back in and remove its handle parameter both from applications and definition as it's now accessible from closure:
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = foo '\r' haveCR noCR where haveCR = foo '\n' (return s) haveCRnoLF noCR chr = readHeaderLine' $ s ++ [chr] haveCRnoLF nextChr = readHeaderLine' $ s ++ ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
To eliminate readHeaderline' repeated patterns we extract them into recurse local function:
readHeaderLine handle = readHeaderLine' "" where readHeaderLine' s = foo '\r' haveCR noCR where haveCR = foo '\n' (return s) haveCRnoLF noCR chr = recurse [chr] haveCRnoLF nextChr = recurse ['\r', nextChr] recurse x = readHeaderLine' $ s ++ x foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
This is how far you can get with mechanical code deduplication. Now it's time for heavier weapons. You can still:
- separate recursive code from non-recursive code
- separate monadic code from non-monadic code
The initial redHeaderLine' call can be implemented using recurse with an extra parameter:
readHeaderLine handle = recurse [] [] where recurse s x = readHeaderLine' $ s ++ x readHeaderLine' s = foo '\r' haveCR noCR where haveCR = foo '\n' (return s) haveCRnoLF noCR chr = recurse s [chr] haveCRnoLF nextChr = recurse s ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
Now we can inline readHeaderLine' as it is only applied once:
readHeaderLine handle = recurse [] [] where recurse s1 x = foo '\r' haveCR noCR where s = s1 ++ x haveCR = foo '\n' (return s) haveCRnoLF noCR chr = recurse s [chr] haveCRnoLF nextChr = recurse s ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
And we can remove duplication of recurse s:
readHeaderLine handle = recurse [] [] where recurse s1 x = foo '\r' haveCR noCR where s = s1 ++ x rf = recurse s haveCR = foo '\n' (return s) haveCRnoLF noCR chr = rf [chr] haveCRnoLF nextChr = rf ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
Now let's put return value of recurse into a local declaration g:
readHeaderLine handle = recurse [] [] where recurse s1 x = g where s = s1 ++ x rf = recurse s g = foo '\r' haveCR noCR haveCR = foo '\n' (return s) haveCRnoLF noCR chr = rf [chr] haveCRnoLF nextChr = rf ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
Our goal is to divorse g from recurse. You can do it by adding parameters to both recurse and g and localizing identifiers used only in recurse and used only in g:
readHeaderLine handle = recurse g [] [] where recurse g s1 x = g rf s where s = s1 ++ x rf = recurse g s
g rf s = foo '\r' haveCR noCR where haveCR = foo '\n' (return s) haveCRnoLF noCR chr = rf [chr] haveCRnoLF nextChr = rf ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
Now recurse is completely self-contained:
readHeaderLine handle = recurse g [] [] where g rf s = foo '\r' haveCR noCR where haveCR = foo '\n' (return s) haveCRnoLF noCR chr = rf [chr] haveCRnoLF nextChr = rf ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr
recurse g s1 x = g rf s where s = s1 ++ x rf = recurse g s
But g is still recursive: it has a nasty rf parameter which is an indirect recursive application. We need to move rf into recurse too. So here comes a trick: convert a function call into a constructor.
g can have only 3 return values: return s, rf [chr] and rf ['\r', nextChr]. We can represent them with a data type and return it instead of calling return or rf:
data Outcomes a b c = RF1 a | RF2 b | Return c readHeaderLine handle = recurse g [] [] where g rf s = foo '\r' haveCR noCR where haveCR = foo '\n' (return $ Return s) haveCRnoLF noCR chr = return $ RF1 [chr] haveCRnoLF nextChr = return $ RF2 ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr recurse g s1 x = analyzeOutcomes $ g rf s where s = s1 ++ x rf = recurse g s analyzeOutcomes outcomeM = do outcome <- outcomeM case outcome of RF1 a -> rf a RF2 a -> rf a Return a -> return a
Now rf parameter is unused, so we can clean the definitions of g and recurse:
data Outcomes a b c = RF1 a | RF2 b | Return c
readHeaderLine handle = recurse g [] [] where g s = foo '\r' haveCR noCR where haveCR = foo '\n' (return $ Return s) haveCRnoLF noCR chr = return $ RF1 [chr] haveCRnoLF nextChr = return $ RF2 ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr recurse g s1 x = analyzeOutcomes $ g s where s = s1 ++ x rf = recurse g s analyzeOutcomes outcomeM = do outcome <- outcomeM case outcome of RF1 a -> rf a RF2 a -> rf a Return a -> return a
Now two more improvements: a) RF1 and RF2 outcomes can be joined into one outcome as they are handled uniformly and have the same types; b) the only reason we pass s is to return it in Return outcome, so we can eliminate s argument of g too.
data Outcomes a = RF a | Return readHeaderLine handle = recurse g [] [] where g = foo '\r' haveCR noCR where haveCR = foo '\n' (return Return) haveCRnoLF noCR chr = return $ RF [chr] haveCRnoLF nextChr = return $ RF ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr recurse g s1 x = analyzeOutcomes g where s = s1 ++ x rf = recurse g s analyzeOutcomes outcomeM = do outcome <- outcomeM case outcome of RF a -> rf a Return -> return s
Now rf and analyzeOutcomes are used only once and Outcomes type became the same as Maybe. So:
readHeaderLine handle = recurse g [] [] where g = foo '\r' haveCR noCR where haveCR = foo '\n' (return Nothing) haveCRnoLF noCR chr = return $ Just [chr] haveCRnoLF nextChr = return $ Just ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr recurse outcomeM s1 x = do outcome <- outcomeM let s = s1 ++ x in case outcome of Just a -> recurse outcomeM s a Nothing -> return s
Now s1 and x are only used in recurse to construct s. We can then construct s outside of recurse and pass it. Also, g now can be inlined.
readHeaderLine handle = recurse (foo '\r' haveCR noCR) [] where haveCR = foo '\n' (return Nothing) haveCRnoLF noCR chr = return $ Just [chr] haveCRnoLF nextChr = return $ Just ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else baz chr recurse outcomeM s = do outcome <- outcomeM case outcome of Just a -> recurse outcomeM (s ++ a) Nothing -> return s
Now note that outcomeM is just a constant and it is not changed across recursive calls. So we can proceed further with our splitting of recursive and non-recursive code:
recurse outcomeM s = f s where f s = do outcome <- outcomeM case outcome of Just a -> f (s ++ a) Nothing -> return s
And duplicate return $ Just can be moved inside foo:
readHeaderLine handle = recurse (foo '\r' haveCR noCR) [] where haveCR = foo '\n' (return Nothing) haveCRnoLF noCR chr = [chr] haveCRnoLF nextChr = ['\r', nextChr] foo quux bar baz = do chr <- hGetChar handle if chr == quux then bar else return (Just $ baz chr)
After renaming of nonsense identifiers in definition of foo we get:
readHeaderLine handle = recurse (match '\r' haveCR noCR) [] where haveCR = match '\n' (return Nothing) haveCRnoLF noCR chr = [chr] haveCRnoLF nextChr = ['\r', nextChr] match expectedChar onSuccess onFailure = do actualChar <- hGetChar handle if actualChar == expectedChar then onSuccess else return (Just $ onFailure actualChar) recurse outcomeM s = f s where f s = do outcome <- outcomeM case outcome of Just a -> f (s ++ a) Nothing -> return s
hGetLine? There are many alternatives to your code. For example, you can just usehGetContentand parse the resulting list. \$\endgroup\$hGetContentwon’t work, since the length of the data isn’t known before the headers are parsed.hGetLinewon’t work if the headers contain a newline. \$\endgroup\$hGetContentdoesn't require length to be known in advance. \$\endgroup\$