Thursday, April 9, 2009

The perils of laziness

I've begun cargo culting my way into Haskell, in the hopes that I'll eventually grasp such mathematical mysteries as anamorphisms. In the mean time, I'd like to write some code that actually works.

To this end, I've written a program to digest the C. elegans transcriptome in silico, then check the resulting fragments against data from a mass spectroscopy experiment (Multidimensional Protein Identification Technology --"MudPIT"--if you're curious). I use regular expressions to break up the protein sequences, store the fragments in a big Data.Map, and then test my MudPIT fragments against the Map.

I've greatly simplified my function to be a simple list-to-map conversion -- which would be stupid, since Data.Mat.fromList does this quite well -- but the real function involves plenty of off-topic complexity, so here's the svelt form:
  add_frags (i:l) m =
let m' = M.insert i () m in
add_frags l $! m'
add_frags [] m = m
The important part is the $!, in bold. Given some function f and parameter x, f $! x is equivalent to x `seq` f x; the seq function, meanwhile, forces the evaluation of the first thunk. So f $! x applies f to x AFTER x HAS BEEN EVALUATED.

Without that forced evaluation, ghc will simply add more thunks to the stack until the stack is blown ... hours after the run has begun. You can pretty easily imagine this as a series of substitutions, which is exactly what we're talking about:
  import qualified Data.Map as M
m = M.empty
add_frags ["ACDEF","GHI","KLMNO","PQRST",...] m
-> M.insert "ABCDEF" () (add_frags ["GHI","KLMNO","PQRST",...] m)
-> M.insert "ABCDEF" () (M.insert "GHI" () (add_frags ["KLMNO","PQRST",...] m))
And off the stack grows, until pop.

No comments: