Tuesday, December 18, 2012

Using a web server for deferred computation

Bioinformatics lends itself to lengthy, offline computations on large datasets such as next-generation sequencing runs, but I imagine any number of problems would fit the submit-a-form, check-back-in-10-minutes approach.

Often, web frameworks make this hard to do without resorting to secondary processes and message queues, because the response isn't sent and the socket isn't closed until the response function returns.  (E.g., Django's view functions in views.py return HttpResponse objects.)  Python's BaseHTTPServer / BaseHTTPRequestHandler combo turns out to make this easy:
 
import BaseHTTPServer
from SocketServer import ForkingMixIn
import socket

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_POST(self):
        path = self.path
        msg = ...
        # Send the complete response first.
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(msg)
        self.wfile.close()
        # Release the client before starting the slow work.
        self.connection.shutdown(socket.SHUT_RDWR)
        self.connection.close()

        long_computation()


class ForkingHTTPServer(ForkingMixIn, BaseHTTPServer.HTTPServer):
  """
  Handle requests in a separate process.
  """
  pass

# host and port are whatever address you want to bind to
server = ForkingHTTPServer((host, port), Handler)
server.serve_forever()

The key is this bit:
   self.connection.shutdown(socket.SHUT_RDWR)
   self.connection.close()

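Once you shut down and close the socket, the HTTP client is released and you can run as lengthy a computation as you like.  The ForkingMixIn does the rest: each request is handled in its own process, so the server keeps accepting new requests while long_computation() runs.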
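
On Python 3 the same modules live under http.server and socketserver. Here is a minimal sketch of the same trick there, under a few assumptions: it swaps the forking server for the threaded ThreadingHTTPServer (Python 3.7+) so it also runs where fork() isn't available, and make_server, DeferredHandler and the sleep-based long_computation are stand-ins invented for illustration, not part of the code above.

```python
import socket
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def long_computation():
    # Stand-in for the real job; sleeps to simulate slow work.
    time.sleep(2)


class DeferredHandler(BaseHTTPRequestHandler):

    def do_POST(self):
        # Drain the request body so no unread data is left on the socket.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)

        msg = b"Job accepted; check back later.\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(msg)))
        self.end_headers()
        self.wfile.write(msg)
        self.wfile.flush()

        # Release the client before starting the slow work.
        self.connection.shutdown(socket.SHUT_RDWR)
        self.connection.close()

        long_computation()


def make_server(host="127.0.0.1", port=8000):
    # Each request gets its own thread, so slow jobs don't block new requests.
    return ThreadingHTTPServer((host, port), DeferredHandler)
```

Calling make_server().serve_forever() starts it; a POST should come back right away while the job carries on in the handler's thread. Threads share the server's process, so for CPU-heavy work the forking version above is probably still the better fit.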

Thursday, May 10, 2012

Through the JVM looking-glass.

So you've got a binary file full of integers in bog-standard x86 format, just a straight list of unsigned short ints.  And you need to read them into a Java ArrayList.  Sounds simple enough.

Now, you're an experienced C programmer, so you know you'll have to deal with endianness: the JVM is big-endian, and x86 is little-endian.  Every time you find 0xff00 in your file, you need to flip those bytes around to read 0x00ff.  So you write a simple loop -- in pseudocode,

int accumulator = 0;
for( int i = 0; i < 2; i++ ) {
   byte b = file.readByte();
   // little-endian: the low byte arrives first, so shift the i-th byte up by 8*i bits
   accumulator = accumulator | (b << (8 * i));
}

And lo and behold, 0xff00 becomes ... -1.

You might have guessed it already, but the problem is that Java's byte type is signed: that first byte, 0xff, is read as -1 per 2's complement.

Now, if you can visualize all that bit-shifting and OR'ing, you might not see this as a problem -- the 0xff will just land in the low eight bits of your accumulator and leave the rest alone.  This is not what happens.

Instead, our byte b, which has been interpreted as -1, is silently sign-extended to an int -- 0xff becomes 0xffffffff, which is then OR'ed with 0x00000000 to yield 0xffffffff, aka -1.  Our next byte, 0x00, is also promoted and OR'ed to no effect.
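
You can reproduce the effect outside the JVM. Here's a small Python sketch that uses struct's signed-byte format to mimic Java's byte; the two-byte sample and the variable names are made up for illustration:

```python
import struct

raw = b"\xff\x00"  # the value 255, stored little-endian

# "b" unpacks signed 8-bit values, the same interpretation Java's byte uses.
lo, hi = struct.unpack("bb", raw)

# Naive assembly: lo is -1, so every bit of the result ends up set.
naive = lo | (hi << 8)

# Masking each byte back to 0..255 before combining gives the intended value.
masked = (lo & 0xFF) | ((hi & 0xFF) << 8)

print(lo, naive, masked)
```

Here lo comes out as -1, so naive is -1 as well, while masked is 255.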

[Why anyone would want a signed byte is a whole 'nother question -- next time I want to store a value between -128 and 127, I guess I know what to use.]

The fix is to undo the 2's complement -- to turn -1 into 255, like so:

int accumulator = 0;
for( int i = 0; i < 2; i++ ) {
   int s = file.readByte();
   if( s < 0 ) {
      s = s + 0x100;   // -1 becomes 255; same effect as s & 0xff
   }
   // little-endian: the low byte arrives first
   accumulator = accumulator | (s << (8 * i));
}

That's how you read a single unsigned short in Java.  Just so intuitive, don't you think?  Compare with Python:

   (s,) = struct.unpack("<H", file.read(2))
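
To pull in the whole list rather than a single value, here's a minimal sketch, assuming the data holds nothing but little-endian unsigned 16-bit integers. The function name is hypothetical.

```python
import struct


def unpack_unsigned_shorts(data):
    """Decode bytes holding little-endian unsigned 16-bit integers into a list."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    count = len(data) // 2
    # "<" forces little-endian byte order; "H" is an unsigned 16-bit integer.
    return list(struct.unpack("<%dH" % count, data))
```

Feed it the result of open(path, "rb").read() and you get plain Python ints back, no sign fix-ups required.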

Thursday, February 2, 2012

Embedding IPython, redux

I've posted about IPython as an embedded debugger before; it turns out that the syntax has changed a bit:
  import IPython
  IPython.embed()
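
In practice the call goes wherever you want to stop and poke around. A small sketch, where the normalize function and its debug flag are invented for illustration and IPython is only imported when the flag is set:

```python
def normalize(values, debug=False):
    total = sum(values)
    if debug:
        # Opens an interactive shell at this point; exit it to resume.
        import IPython
        IPython.embed()
    return [v / total for v in values]
```

Calling normalize(data, debug=True) should drop you into a shell where values and total can be inspected.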

Friday, January 20, 2012

hdiutil and OS X data recovery

Your hard drive fails, so the first thing you do is grab dd and pull a byte-by-byte copy of the drive's contents:
   dd if=/dev/bad_drive of=bad_drive.img conv=noerror,sync

Problem is, if the drive was from a Mac and you want to use a recovery tool like Data Rescue or DiskWarrior, you'll probably need to attach your drive image as a loopback device.  Double-clicking the file to mount it doesn't work when the filesystem is fragged.

Enter hdiutil:
   hdiutil attach -nomount bad_drive.img