Tuesday, December 18, 2012

Using a web server for deferred computation

Bioinformatics lends itself to lengthy, offline computations on large datasets such as next-generation sequencing runs, and I imagine many other problems fit the same "submit a form, check back in 10 minutes" pattern.

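Web frameworks often make this hard to do without resorting to secondary processes and message queues, because the response is not sent and the connection is not closed until the view function returns. (For example, Django view functions in views.py return an HttpResponse object, which the framework sends only afterwards.) Python's BaseHTTPServer / BaseHTTPRequestHandler combo turns out to make this easy, because the handler writes directly to the socket and can close it whenever it likes: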
 
import BaseHTTPServer
from SocketServer import ForkingMixIn
import socket

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_POST(self):
        path = self.path
        msg = ...  # build the response body here (elided)
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(msg)
        self.wfile.close()
        # End the connection now, rather than when do_POST returns,
        # so the client has its full response immediately.
        self.connection.shutdown(socket.SHUT_RDWR)
        self.connection.close()

        # Runs in the forked child after the client has been released.
        long_computation()  # your own offline job


class ForkingHTTPServer(ForkingMixIn, BaseHTTPServer.HTTPServer):
    """
    Handle requests in a separate process.
    """
    pass

# host and port: the address you want to serve on
server = ForkingHTTPServer((host, port), Handler)
server.serve_forever()
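On Python 3, BaseHTTPServer became http.server and SocketServer became socketserver. Below is a minimal sketch of the same pattern under a few stated assumptions: long_computation, JOB_SECONDS and the response text are hypothetical placeholders, and the handler reads the posted form before replying. It also differs from the version above in one respect: it only calls shutdown() and leaves the final close to socketserver, because in Python 3 the request loop flushes wfile after do_POST returns, and flushing an already-closed wfile raises an error.

```python
import socket
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ForkingMixIn

JOB_SECONDS = 3  # hypothetical duration of the stand-in job


def long_computation(payload):
    """Hypothetical placeholder for the real offline job."""
    time.sleep(JOB_SECONDS)


class DeferredHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the submitted form so no request bytes are left unread.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)

        self.send_response(200)
        self.send_header("Content-type", "text/html")
        # No Content-Length: an HTTP/1.0 client reads the body until the
        # connection ends, so the shutdown below is what completes the reply.
        self.end_headers()
        self.wfile.write(b"<html><body>Job queued; check back later.</body></html>")
        self.wfile.flush()

        # Send FIN now: the client is released while this child keeps working.
        # socketserver closes the socket itself once do_POST returns.
        self.connection.shutdown(socket.SHUT_RDWR)

        long_computation(payload)


class ForkingHTTPServer(ForkingMixIn, HTTPServer):
    """Handle each request in a forked child process (Unix only)."""
```

To try it, run `ForkingHTTPServer(("127.0.0.1", 8000), DeferredHandler).serve_forever()` and POST to it, for example with `curl -d x http://127.0.0.1:8000/`; the reply should arrive at once while the child process carries on with the job.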

The key is this bit:
   self.connection.shutdown(socket.SHUT_RDWR)
   self.connection.close()

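Once you shut down and close the socket, the HTTP client is released: BaseHTTPRequestHandler speaks HTTP/1.0 by default and this response sends no Content-Length, so the client treats the end of the connection as the end of the body. The handler can then run as lengthy a computation as you like, and because ForkingMixIn handles each request in its own child process, the server keeps accepting new requests in the meantime.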
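Why shutdown() as well as close()? close() only releases this process's descriptor, and the connection ends only when the last descriptor referring to it is closed; shutdown() acts on the connection itself. The Python 3 sketch below shows the difference over a loopback TCP connection, using os.dup() to stand in for a second holder of the same socket (such as a copy inherited across a fork). The helper names loopback_pair and client_sees_eof are hypothetical, and this assumes a Unix-like system.

```python
import os
import socket


def loopback_pair():
    """Return (server_side, client_side): two connected TCP sockets on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return server, client


def client_sees_eof(client, timeout=0.5):
    """Drain the client socket; True if the server signalled end-of-stream."""
    client.settimeout(timeout)
    try:
        while True:
            if not client.recv(4096):
                return True   # b"" from recv(): the peer sent FIN
    except socket.timeout:
        return False          # nothing more arrived, but the connection is open


server, client = loopback_pair()
# A second descriptor for the same connection, standing in for a forked copy.
other = socket.socket(fileno=os.dup(server.fileno()))
server.sendall(b"HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\nqueued")

server.close()                    # releases one descriptor only
after_close = client_sees_eof(client)

other.shutdown(socket.SHUT_RDWR)  # ends the connection for every holder
after_shutdown = client_sees_eof(client)

other.close()
client.close()
print(after_close, after_shutdown)  # False True
```

After close() alone the client is still waiting, because the duplicate keeps the connection open; after shutdown() it sees end-of-stream immediately.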