Web Server Programming

John "Scooter" Morris

April 12, 2014

Portions Copyright © 2005-06 Python Software Foundation.

Web Programming

  • Web programming is a very broad topic
  • We're going to break it up into five parts:
    1. HTTP basics
    2. Client programming:
      • HTML Forms
      • Javascript
    3. Server as a client:
      • Downloading and analyzing content from the web
    4. Server programming:
      • CGI
    5. Putting it all together:
      • AJAX
  • Our focus is on web programming
    • See the Software Carpentry lecture for a more general discussion of Internet programming

The Server as a Client

  • The ability to fetch and parse content from the web is an essential part of modern bioinformatics
    • For example, using NCBI eutils to pull data from Entrez or PubMed.
  • In this context, your server program becomes a client to someone else's server

Fetching Pages

  • Opening sockets, constructing HTTP requests, and parsing responses is tedious
    • So most languages provide libraries to do the work for you
    • In Python, that library is called urllib
  • urllib.urlopen(URL) does what your browser would do if you gave it the URL
    • Parse it to figure out what server to connect to
    • Connect to that server
    • Send an HTTP request
    • Return an object that looks like a file, from which to read the response data

urllib Example

  • Read a page the easy way
    import urllib
    instream = urllib.urlopen("http://www.third-bit.com/greeting.html")
    lines = instream.readlines()
    for line in lines:
        print line,
  • Note: readlines wouldn't do the right thing if the thing being read was an image
    • Might try to convert “line endings”
    • Use read to grab the bytes in that case
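
In Python 3 the same call lives in urllib.request, and reading the response always yields bytes rather than text. The sketch below assumes the page is UTF-8 encoded; a real client should check the charset in the Content-Type header. fetch_bytes and fetch_lines are illustrative names, not library functions:

```python
from urllib.request import urlopen

def fetch_bytes(url):
    """Return the raw bytes of a resource (safe for images and other binary data)."""
    with urlopen(url) as response:
        return response.read()

def fetch_lines(url, encoding="utf-8"):
    """Return a text resource as a list of lines, decoded with an assumed encoding."""
    return fetch_bytes(url).decode(encoding).splitlines()
```

Usage: fetch_lines("http://www.third-bit.com/greeting.html") returns the page's lines without trailing newlines. Use fetch_bytes for images, so that no decoding or line-ending handling touches the data.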

Building A Spider

  • A web spider is a program that can explore the web on its own
    • Fetch a page, extract all the external links, visit those pages…
    • That, a search engine, and a few billion dollars, and you're Google
$ python spider.py http://www.google.ca
import sys, urllib, re

url = sys.argv[1]
instream = urllib.urlopen(url)
page = instream.read()

links = re.findall(r'href=\"[^\"]+\"', page)
temp = set()
for x in links:
    x = x[6:-1]    # strip off 'href="' and '"'
    if x.startswith('http://'):
        temp.add(x)    # keep absolute links only; the set removes duplicates
links = list(temp)
for x in links:
    print x
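
The regular expression above misses links written with single quotes or with an upper-case HREF. As an alternative, the link-extraction step can use Python 3's standard html.parser module instead of a regular expression. This is a sketch only: LinkCollector and extract_links are illustrative names, and fetching the page would still use urllib as before.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()  # a set removes duplicate links

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.add(value)

def extract_links(page):
    """Return the sorted absolute links found in an HTML string."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return sorted(collector.links)
```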

Passing Parameters

  • Sometimes want to provide extra information as part of a URL
    • Example: when searching on Google, have to specify what the search terms are
  • Could do this as part of the URL
    • Amazon puts ISBNs in URLs
  • More flexible to add parameters to the URL
    • http://www.google.ca/search?q=Python searches for pages related to Python
    • "?" separates the parameters from the rest of the URL
    • If there are multiple parameters, they are separated from each other by "&"
      • E.g., http://www.google.ca/search?q=Python&client=firefox
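
Python 3's urllib.parse module can take such a URL apart, which makes the roles of "?" and "&" easy to see. The sketch below uses split_url, an illustrative helper rather than a library function:

```python
from urllib.parse import parse_qs, urlsplit

def split_url(url):
    """Return (host, path, params), where params maps each name to a list of values."""
    parts = urlsplit(url)
    # parts.query is everything after "?"; parse_qs splits it on "&" and then on "="
    return parts.netloc, parts.path, parse_qs(parts.query)

host, path, params = split_url("http://www.google.ca/search?q=Python&client=firefox")
print(host, path, params)
```

The values are lists because the same parameter name may legitimately appear more than once.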

Special Characters

Table 3: URL Encoding
Character  Encoding
"#"        %23
"$"        %24
"%"        %25
"&"        %26
"+"        %2B
","        %2C
"/"        %2F
":"        %3A
";"        %3B
"="        %3D
"?"        %3F
"@"        %40
  • What if you want to include "?" or "&" in a parameter?
    • Same problem (and solution) as including a quote in a string, or <> in XML
  • URL encode special characters using "%" followed by a 2-digit hexadecimal code
    • And replace spaces with "+"

Encoding Example

  • To search Google for “grade = A+”, use http://www.google.ca/search?q=grade+%3D+A%2B
  • urllib has functions to make this easy
    • urllib.quote(str) replaces special characters in str with escape sequences
    • urllib.unquote(str) replaces escape sequences with characters
    • urllib.urlencode(params) takes a dictionary and constructs the entire query parameter string
    • import urllib
      print urllib.urlencode({'surname' : 'Von Neumann', 'forename' : 'John'})
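
In Python 3 these functions moved to urllib.parse. The module provides quote, unquote, and urlencode, plus quote_plus and unquote_plus, which also apply the space-to-"+" rule. The sketch below builds the search URL from the example above. search_url is an illustrative helper, and its default base URL is simply the Google address used earlier.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

def search_url(terms, base="http://www.google.ca/search"):
    """Build a query URL; urlencode applies quote_plus to each name and value."""
    return base + "?" + urlencode({"q": terms})

print(search_url("grade = A+"))        # http://www.google.ca/search?q=grade+%3D+A%2B
print(quote_plus("grade = A+"))        # grade+%3D+A%2B
print(unquote_plus("grade+%3D+A%2B"))  # grade = A+
print(quote("a/b c"))                  # a/b%20c  ("/" is left alone by default)
```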

Screen Scraping (And Why Not)

  • Suppose you want to write a script that actually does search Google
    • Construct a URL: easy
    • Send it and read the response: no problem
    • Parse the response: there's a lot of junk on the page…
  • Many first-generation web applications relied on screen scraping
    • “Parse” the HTML with regular expressions
  • Hard to get right if the page layout is complex
    • And whenever the layout changes, the application breaks
  • Now, there's a better way...

Web Services

Figure 5: Web Services

  • Modern web services separate data from presentation
    • When a client sends a request, it indicates that it wants machine-readable XML, rather than human-readable HTML
      • Much easier to parse
      • Much less likely to change over time
  • Many web services use the Simple Object Access Protocol (SOAP) standard
    • Despite its name, it's anything but simple
    • Luckily, there are libraries to hide the details for most widely-used web services

Web Services (REST)

  • Today, it's more common to use REST (REpresentational State Transfer).
    • REST Principles:
      • Give every "thing" an ID
      • Link things together
      • Use standard methods
      • Resources with multiple representations
      • Communicate statelessly
  • Basically, encode what you want in a URL:
    • http://example.com/customers/1234
    • http://example.com/orders/2007/10/776654
    • http://example.com/products/4554
    • http://example.com/processes/salary-increase-234
  • Then return data as XML or JSON
  • Generally much simpler than SOAP
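
Below is a minimal Python 3 sketch of the server side of such an interface, built on hypothetical in-memory data. Each path names one resource, the handler needs nothing but the path (it is stateless), and the result comes back as JSON. handle_get, ROUTES, and the sample records are illustrative only.

```python
import json
import re

# Hypothetical in-memory data standing in for a real database
CUSTOMERS = {"1234": {"id": "1234", "name": "Ada Lovelace"}}
PRODUCTS = {"4554": {"id": "4554", "name": "Pipette"}}

# Each route maps a URL pattern to the collection it names
ROUTES = [
    (re.compile(r"^/customers/(\w+)$"), CUSTOMERS),
    (re.compile(r"^/products/(\w+)$"), PRODUCTS),
]

def handle_get(path):
    """Answer a GET for a REST-style path with (HTTP status, JSON text)."""
    for pattern, collection in ROUTES:
        match = pattern.match(path)
        if match:
            item = collection.get(match.group(1))
            if item is None:
                return 404, json.dumps({"error": "not found"})
            return 200, json.dumps(item)
    return 404, json.dumps({"error": "unknown resource type"})
```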


Server Programming

  • Users want to make the web do different things
    • How to let them write programs that handle HTTP requests?
  • Option #1: Require them to write socket-level code
    • Complicated and error-prone
    • Only one program at a time can listen on a given port (such as port 80)
  • Option #2: have the web server accept the HTTP request, and then run the user's code
    • Recompiling the web server every time someone wants to add functionality would be a pain
    • So define a protocol that lets web servers run other programs

The CGI Protocol

  • The Common Gateway Interface (CGI) protocol specifies:
    • How a web server passes information to a program
    • How that program passes information back to the web server
  • CGI does not specify:
    • A particular language
      • You can use Fortran, the shell, C, Java, Perl, Python…
    • How the web server figures out what program to run
      • Each web server has its own rules
      • We'll (briefly) talk about Apache's

From Server To CGI

Figure 5: CGI Data Processing Cycle

    • Web server runs the CGI by creating a new process
    • Web server passes some information to the CGI process through environment variables
    • The web server may also send CONTENT_LENGTH bytes to the CGI on standard input
      • E.g., when a file is being uploaded
Table 4: Important CGI Environment Variables
Name            Purpose                                                     Example
REQUEST_METHOD  What kind of HTTP request is being handled                  GET or POST
SCRIPT_NAME     The path to the script that's executing                     /cgi-bin/post_photo.py
QUERY_STRING    The query parameters following "?" in the URL               name=mydog.jpg&expires=never
CONTENT_TYPE    The type of any extra data being sent with the request      image/jpeg
CONTENT_LENGTH  How much extra data is being sent with the request (bytes)  17290
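
The same division of labor can be sketched in Python 3. The method and query parameters come from the environment, and exactly CONTENT_LENGTH bytes come from standard input. read_cgi_request is an illustrative name; in a real CGI you would pass it os.environ and sys.stdin.buffer.

```python
from urllib.parse import parse_qs

def read_cgi_request(environ, stdin):
    """Collect the CGI inputs: (method, query parameters, request body bytes)."""
    method = environ.get("REQUEST_METHOD", "GET")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # CONTENT_LENGTH may be missing or empty when there is no body
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read exactly that many bytes rather than waiting for end-of-file
    body = stdin.read(length) if length > 0 else b""
    return method, params, body
```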

From CGI To Server

  • The CGI program sends data back to the web server by printing it to standard output
  • The web server then forwards this directly to the client
    • Which means that the CGI program is responsible for creating headers
  • Note: none of this works unless the web server has been configured to run the CGI
    • By default, modern servers won't do this unless they're told they can

MIME Types

  • Clients and servers need a way to specify data types to each other
    • Remember, bytes are just bytes: the browser doesn't magically know how to interpret them
  • The Multipurpose Internet Mail Extensions (MIME) standard specifies how to do this
    • Organizes data types into families, and provides a two-part name for each type
    • Use the "Content-Type" header to specify the MIME type of the data being sent
Table 5: Example MIME Types
Family                     Specific Type    Describes
Text                       text/html        Web pages
Image                      image/jpeg       JPEG-format image
Audio                      audio/mpeg       MP3 audio file
Video                      video/quicktime  Apple QuickTime video format
Application-specific data  application/pdf  Adobe PDF document
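
Python's standard mimetypes module maps file extensions to these names. The sketch below is written for Python 3, and content_type_header is a hypothetical helper. Because the guess is based only on the file extension, a server should not trust it for uploaded files.

```python
import mimetypes

def content_type_header(filename):
    """Guess a Content-Type header from a file name, falling back to generic bytes."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return "Content-Type: " + (mime_type or "application/octet-stream")
```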

Hello, CGI

  • Simplest possible CGI pays no attention to query parameters or extra data
    • Just prints HTML to standard output, to be relayed to the client
    • Along with a Content-Type header to tell the client to expect HTML…
    • …and a blank line to separate the headers from the data
#!/usr/bin/env python

# Headers and an extra blank line
print 'Content-type: text/html'
print

# Body
print '<html><body><p>Hello, CGI!</p></body></html>'

Invoking a CGI

Figure 6: Basic CGI Output

Generating Dynamic Content

Figure 7: Environment Variable Output

  • But the whole point of CGI is to generate content dynamically
    • E.g., show a list of environment variables and their values
  • You'll use this frequently when debugging…
#!/usr/bin/env python

import os, cgi

# Headers and an extra blank line
print 'Content-type: text/html'
print

# Body
print '<html><body>'
keys = sorted(os.environ.keys())
for k in keys:
    print '<p>%s: %s</p>' % (cgi.escape(k), cgi.escape(os.environ[k]))
print '</body></html>'
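
cgi.escape no longer exists in Python 3 (it was removed in 3.8); html.escape is its replacement. Here is a Python 3 sketch of the same page. The output is built as a string by environ_page, an illustrative name, so it can be checked without a web server.

```python
import html
import os
import sys

def environ_page(environ):
    """Return complete CGI output listing environment variables, HTML-escaped and sorted."""
    lines = ["Content-Type: text/html", "", "<html><body>"]  # "" is the blank separator line
    for key in sorted(environ):
        lines.append("<p>%s: %s</p>" % (html.escape(key), html.escape(environ[key])))
    lines.append("</body></html>")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    sys.stdout.write(environ_page(os.environ))
```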

A Simple Form (reprise)

Figure 4: A Simple Form

    <form action="/bmi219/cgi-bin/print_params.py" method="post">
      <p>Sequence: <input type="text" name="sequence" value="GATTACA"/>
      Search type:
      <select name="search_type">
        <option>Exact match</option>
        <option selected="selected">Similarity match</option>
      </select></p>
      <p><input type="checkbox" name="program" value="FROG-11" checked="checked"/>
        FROG (version 1.1)
      <input type="checkbox" name="program" value="FROG-20"/>
        FROG (2.0 beta)
      <input type="checkbox" name="program" value="Bayes-Hart" checked="checked"/>
        Bayes-Hart</p>
      <p><input type="submit" value="Submit Query"/>
        <input type="reset" value="Reset"/></p>
    </form>

Parameter Names

  • Each <input/> (or <select>) element has a name attribute
    • These become the names of the parameters that the client sends to the server
    • The elements' values are the parameters' values
  • Submitting the form shown above with its default values produces:
    • os.environ['REQUEST_METHOD']: "POST"
    • os.environ['SCRIPT_NAME']: "/bmi219/cgi-bin/print_params.py"
    • os.environ['CONTENT_TYPE']: "application/x-www-form-urlencoded"
    • os.environ['CONTENT_LENGTH']: "80"
    • Standard input: sequence=GATTACA&search_type=Similarity+match&program=FROG-11&program=Bayes-Hart

Handling Forms

  • Could handle form data directly
    • Read and parse environment variables
    • Read extra data from standard input
  • But the mechanics are the same each time, so use Python's cgi module instead
    • Defines a dictionary-like object called FieldStorage
      • Keys are parameter names
      • Values are either strings (if there's a single value associated with the parameter) or lists (if there are many)
  • When a FieldStorage object is created, it reads and stores the information contained in the URL and environment
    • For a POST request, it also reads the extra data from standard input
    • Which means that a CGI program should only ever create one
  • A program that needs the raw request data should read sys.stdin itself instead of creating a FieldStorage

Form Handling Example

  • Example: show the parameters sent to a script
    #!/usr/bin/env python
    import cgi
    print 'Content-type: text/html'
    print
    print '<html><body>'
    form = cgi.FieldStorage()
    for key in form.keys():
        value = form.getvalue(key)
        if isinstance(value, list):
            value = '[' + ', '.join(value) + ']'
        print '<p>%s: %s</p>' % (cgi.escape(key), cgi.escape(value))
    print '</body></html>'
Table 6: Example Parameter Values
URL                                                            Value of a    Value of b
http://www.third-bit.com/swc/show_params.py?a=0                "0"           None
http://www.third-bit.com/swc/show_params.py?a=0&b=hello        "0"           "hello"
http://www.third-bit.com/swc/show_params.py?a=0&b=hello&a=22   ["0", "22"]   "hello"

Development Tips

  • During development, add
    import cgitb; cgitb.enable()
    to the top of the program
    • cgitb is the CGI traceback module
    • When enabled, it will create a web page showing a stack trace when something goes wrong in your script
  • Testing whether a FieldStorage value is a string or a list is tedious
    • In almost all cases, you'll know whether to expect one value or many
    • Use FieldStorage.getfirst(name) to get the unique value
      • Returns the first, if there are many
    • FieldStorage.getlist(name) always returns a list of values
      • Empty list if there's no data associated with name
      • If there's only one value, get a single-item list
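
The cgi module was deprecated by PEP 594 and removed in Python 3.13, so newer code has to parse form data itself. Below is a sketch of a hypothetical stand-in class, FormData, that mimics only getfirst and getlist. It is built on urllib.parse.parse_qs and is fed the URL-encoded string (QUERY_STRING, or the body of a POST).

```python
from urllib.parse import parse_qs

class FormData:
    """Hypothetical minimal stand-in for cgi.FieldStorage's getfirst/getlist."""

    def __init__(self, encoded):
        # keep_blank_values=True keeps fields that were submitted empty
        self._params = parse_qs(encoded, keep_blank_values=True)

    def getfirst(self, name, default=None):
        """Return the first value for name, or default if there is none."""
        values = self._params.get(name)
        return values[0] if values else default

    def getlist(self, name):
        """Return every value for name (an empty list if there are none)."""
        return list(self._params.get(name, []))
```

Usage, with the form data from the earlier example: FormData("sequence=GATTACA&search_type=Similarity+match&program=FROG-11&program=Bayes-Hart").getlist("program") gives ["FROG-11", "Bayes-Hart"].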

Maintaining State

Figure 9: Three Tier Architecture

  • Often want to change the data a server is managing, as well as read it
    • Update a description of an experiment, change your preferred email address, etc.
  • The industrial-strength solution is to use a three-tier architecture
    • CGI program stuffs parameters from HTTP requests into SQL queries
    • Runs the queries
    • Translates results into HTML to send back to the client
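
Here is a Python 3 sketch of the middle tier, using the standard sqlite3 module. The experiment table and its columns are hypothetical. The parameter is passed through a "?" placeholder instead of being pasted into the SQL string, so a value such as x' OR '1'='1 is treated as data, not as SQL.

```python
import html
import sqlite3

def render_experiments(conn, owner):
    """Query the data tier with a parameterized query and render the rows as HTML."""
    # The "?" placeholder lets sqlite3 pass the value safely, preventing SQL injection
    rows = conn.execute(
        "SELECT name, description FROM experiment WHERE owner = ? ORDER BY name",
        (owner,),
    ).fetchall()
    # Escape values on the way out so stored text cannot inject HTML
    items = ["<li>%s: %s</li>" % (html.escape(n), html.escape(d)) for n, d in rows]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

In a CGI script, owner would come from the request, for example form.getfirst('owner').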

Maintaining State in Files

  • Simple programs can often get away with using files
    • The CGI program re-reads the file each time it processes a request
    • And re-writes it if there have been any updates
  • Example: append messages to a web page
    • Old messages are saved in a file, one per line
    • Hi, is anyone reading this site?
      I was wondering the same thing.
      I wasn't sure if we were supposed to post here.
      Good point.  Is there way to delete messages?
  • Script checks the incoming parameters to decide what to do
    • If newmessage is there, append it, and display results
    • If newmessage isn't there, someone's visiting the page, rather than submitting the form
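    • # Get existing messages.
      with open('messages.txt', 'r') as infile:
          lines = [x.rstrip() for x in infile.readlines()]
      # Add more data?
      form = cgi.FieldStorage()
      if form.has_key('newmessage'):
          lines.append(form.getfirst('newmessage'))
          with open('messages.txt', 'w') as outfile:
              for line in lines:
                  print >> outfile, line
      # Display all messages, HTML-escaped
      print 'Content-type: text/html'
      print
      print '<html><body>'
      for line in lines:
          print '<p>%s</p>' % cgi.escape(line)
      print '</body></html>'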

AJAX

  • AJAX (Asynchronous Javascript And XML) provides:
    • Increased usability (on-page server interaction)
    • Client-side state
    • Enhanced user experience (possibly)
  • Basic AJAX idea:
    1. Javascript in browser sends a request to server using XMLHttpRequest()
    2. Server CGI processes request and sends back response, usually as an XML document
    3. Javascript in browser receives response, parses the XML and (using DOM) extracts the information
    4. Results are presented to the user, or used to modify the interface in some way

XMLHttpRequest Example

var xmlhttp = null; // shared by sendRequest and getData

// Handle the XMLHttpRequest
function sendRequest(sql) {
  xmlhttp = new XMLHttpRequest();
  if (xmlhttp != null) {
    xmlhttp.onreadystatechange = getData; // getData is our callback method
    // Encode the SQL so that characters such as "&", "+" and spaces survive the trip
    xmlhttp.open("GET", "/cgi-bin/getBmi219Table.py?sql="+encodeURIComponent(sql), true);
    xmlhttp.send(null);
  }
}

// This method gets called whenever the object state changes.
function getData() {
  // Are we complete?
  if (xmlhttp.readyState == 4) {
    // Yes, do we have a good http status?
    if (xmlhttp.status == 200) {
      // yes, responseXML will hold the XML document, which we can address using the DOM
      // if we only wanted the raw text, we could get xmlhttp.responseText
      var response = xmlhttp.responseXML;

      // Use the DOM to get the results table from the server
      var newChild = response.getElementById("results_table");

      // Get a handle on the results div
      var tableDiv = document.getElementById("results_div");

      // Add in our results table (importNode copies it into this document)
      tableDiv.appendChild(document.importNode(newChild, true));
    } else {
      alert("Unable to contact AJAX server: "+xmlhttp.status);
    }
  }
}

XMLHttpRequest Methods

Table 7: XMLHttpRequest Methods
Method                                              Description
abort()                                             Cancels the current request
getAllResponseHeaders()                             Returns the complete set of HTTP headers as a string
getResponseHeader("headername")                     Returns the value of the specified HTTP header
open("method","URL",async,"username","password")    Specifies the method, URL, and other optional attributes of a request
send(content)                                       Sends the request
setRequestHeader("label", "value")                  Adds a label/value pair to the HTTP header to be sent

Notes on open():
  • The method parameter is typically "GET", "POST", or "PUT": use "GET" when requesting data and "POST" when sending data (especially if the data is longer than 512 bytes)
  • The URL parameter may be either a relative or a complete URL
  • The async parameter specifies whether the request should be handled asynchronously: true means that script processing carries on after the send() method without waiting for a response; false means that the script waits for the response before continuing

XMLHttpRequest Properties

Table 8: XMLHttpRequest Properties
Property             Description
onreadystatechange   An event handler for an event that fires at every state change
readyState           Returns the state of the object:
                       0 = uninitialized (constant UNSENT)
                       1 = loading (constant OPENED)
                       2 = loaded (constant HEADERS_RECEIVED)
                       3 = interactive (constant LOADING)
                       4 = complete (constant DONE)
responseText         Returns the response as a string
responseXML          Returns the response as an XML document object, which can be examined and parsed using W3C DOM node tree methods and properties
status               Returns the HTTP status as a number (e.g. 404 for "Not Found" or 200 for "OK")
statusText           Returns the HTTP status as a string (e.g. "Not Found" or "OK")

AJAX Server Side

  • The server implementation is just a CGI
  • Arguments can be handled using the python cgi module
  • Be sure to return a proper XML document if you want to use XMLHttpRequest.responseXML:
#! /usr/bin/python
import cgi
import sys

print "Content-type: text/xml"
print ""
# We want this to be interpreted as XHTML by the client
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

print '<html xmlns="http://www.w3.org/1999/xhtml">'
# ... the body of the response goes here ...
print '</html>'
  • Note that, like any CGI, we need to send the Content-type header, followed by a blank line
  • In this example we want the browser to parse the response as XHTML, but it could be any well-formed XML
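
When XML is printed by hand, a "<" or "&" in the data makes the document malformed. A Python 3 server could instead build the document with the standard xml.etree.ElementTree module, which escapes text automatically. This is a sketch under that assumption. build_results_xml and cgi_xml_response are illustrative names, and the results_table id is the one the JavaScript looks up.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def build_results_xml(columns, rows):
    """Return an XHTML document (as a str) holding a table with id="results_table"."""
    ET.register_namespace("", XHTML_NS)  # serialize XHTML as the default namespace
    html = ET.Element("{%s}html" % XHTML_NS)
    body = ET.SubElement(html, "{%s}body" % XHTML_NS)
    table = ET.SubElement(body, "{%s}table" % XHTML_NS, {"id": "results_table", "border": "1"})
    header = ET.SubElement(table, "{%s}tr" % XHTML_NS, {"class": "table-header"})
    for name in columns:
        ET.SubElement(header, "{%s}td" % XHTML_NS).text = str(name)
    for row in rows:
        tr = ET.SubElement(table, "{%s}tr" % XHTML_NS)
        for cell in row:
            # ElementTree escapes "<", ">" and "&" in text, so cell values cannot break the XML
            ET.SubElement(tr, "{%s}td" % XHTML_NS).text = str(cell)
    return ET.tostring(html, encoding="unicode")

def cgi_xml_response(columns, rows):
    """Prefix the document with the Content-Type header and the blank separator line."""
    return "Content-Type: text/xml\n\n" + build_results_xml(columns, rows)
```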

Putting it together

    • Consider the Example Application
    • It is using SVG for the graphics, HTML forms for the input, and AJAX to query the backend database and populate the tables
    • There is also a fair amount of JavaScript and CSS trickery going on
    • The application is made up of 4 files:
      • bmi219/bmi219.svg: the XHTML+SVG file that makes up the front-end
      • bmi219/css/bmi219.css: the stylesheet for both the XHTML and SVG
      • bmi219/js/bmi219.js: the JavaScript that drives the application
      • cgi-bin/getBmi219Table.py: the server-side component

bmi219/bmi219.svg

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xml:lang="en" lang="en">
<head>
<script type="text/javascript" src="js/bmi219.js"></script>
<link rel="stylesheet" type="text/css" href="css/bmi219.css"/>
</head>
<body>
<h3>BMI219 - AJAX Example</h3>
<svg:svg id="svg-root" width="100%" viewBox="0 0 800 100" version="1.1" >
  <!-- Surrounding Rectangle -->
  <svg:rect x="0" y="0" width="800" height="100" style="stroke: blue; fill: none;"/>
  <!-- Recipe Entity -->
  <svg:rect x="40" y="30" width="60" height="40" class="entity" onclick="showInput('recipe_input', this);"/>
  <svg:text x="50" y="52" class="label1">Recipe</svg:text>
  <svg:line x1="100" y1="50" x2="330" y2="50" stroke="yellow" stroke-width="2"/>
  <!-- Fragment Entity -->
  <svg:rect x="330" y="30" width="60" height="40" class="entity" onclick="showInput('fragment_input', this);"/>
  <svg:text x="334" y="52" class="label1">Fragment</svg:text>
  <svg:line x1="390" y1="50" x2="630" y2="50" stroke="yellow" stroke-width="2"/>
  <!-- Gene Entity -->
  <svg:rect x="630" y="30" width="60" height="40" class="entity" onclick="showInput('gene_input', this);"/>
  <svg:text x="647" y="52" class="label1">Gene</svg:text>
  <!-- Produces relationship -->
  <svg:rect x="200" y="30" width="40" height="40" class="relationship" transform="rotate(-45,220,50)" onclick="showInput('recipe_input_join', this);"/>
  <svg:text x="201" y="52" class="label2">Produces</svg:text>
  <!-- Contains relationship -->
  <svg:rect x="500" y="30" width="40" height="40" class="relationship" transform="rotate(-45,520,50)" onclick="showInput('gene_input_join', this);"/>
  <svg:text x="501" y="52" class="label2">Contains</svg:text>
  <!-- Links and orders -->
</svg:svg>

<!-- This is the form: Note that each <span> has an ID and a class that we will use to 
     control whether we show the containing input field or not.  Also note specifically
     the way we call getTable with the arguments we want. -->
  <span id="recipe_input" class="hidden">
    Recipe Name: <input type="text" onchange="getTable('RECIPE','RECIPE.NAME', this, 'Name,File,Owner',null);"/>
  </span>
  <span id="recipe_input_join" class="hidden">
    Recipe Name: <input type="text" onchange="getTable('RECIPE,PRODUCES,FRAG','RECIPE.NAME', this, 'RECIPE.Name,RECIPE.Owner,PRODUCES.Date,FRAG.Name,FRAG.Sequence','RECIPE.RCP=PRODUCES.RCP and PRODUCES.FRAG=FRAG.FRAG');"/>
  </span>
  <span id="fragment_input" class="hidden" style="position: absolute; left: 35%;">
    Fragment Name: <input type="text" onchange="getTable('FRAG','FRAG.NAME', this, 'Name,Sequence,Circular',null);"/>
  </span>
  <span id="gene_input_join" class="hidden">
    Gene Name: <input type="text" onchange="getTable('FRAG,CONTAINS,GENE','GENE.NAME', this, 'FRAG.Name,FRAG.Sequence,GENE.Name,CONTAINS.Start,CONTAINS.End','FRAG.FRAG=CONTAINS.FRAG and GENE.ID=CONTAINS.GENE');"/>
  </span>
  <span id="gene_input" class="hidden" style="position: absolute; left: 70%;">
    Gene Name: <input type="text" onchange="getTable('GENE','GENE.NAME', this, 'Name,Protein,StartNum',null);"/>
  </span>

<!-- We'll write a header into this <h3> when we get the data -->
<h3 id="table_header" class="table_header"> </h3>
<!-- We'll write the results table into this when we get the data -->
<div id="results_div">
</div>
</body>
</html>

bmi219/css/bmi219.css

rect.entity { fill: purple; stroke-width: 2px;}
rect.relationship { fill: lightgreen; stroke-width: 2px;}
text.label1 {fill:white; font-size:8pt; font-family: arial; font-weight: bold;}
text.label2 {fill:blue; font-size:6pt; font-family: arial; font-weight: bold;}
span.hidden {visibility: hidden; }
span.shown {visibility: visible; }
tr.table-header {font-weight: bold; text-align: center; color: green; font-family: arial;}
h3.table_header {font-family: arial; text-align: center;}
table {font-family: arial; font-size: 80%;}

bmi219/js/bmi219.js

var elementShown = null;
var xmlhttp = null;
var selectedRect = null;

// ShowInput just controls the presentation of the name
// of the row we are looking for
function showInput(elementID, rect) {
  // Get a pointer to the element that called us
  var element = document.getElementById(elementID);

  // Do we already have a text input element showing?
  if (elementShown != null)
    elementShown.className = "hidden"; // Yes, hide it

  // Do we already have a rectangle highlighted?
  if (selectedRect != null)
    selectedRect.setAttributeNS(null, "stroke", "none"); // Yes, hide it

  // Show the text input
  element.className = "shown";
  elementShown = element;

  // Outline the element the user clicked on
  // Note that we need to use setAttributeNS for SVG attributes
  rect.setAttributeNS(null, "stroke", "black");
  selectedRect = rect;
}

// This is the method that gets called when a text field is changed
function getTable(tableName, column, textField, fields, where) {
    var text = textField.value; // This contains the value the user entered

    // Now, create the SELECT statement
    var sql = 'SELECT '+fields+' from '+tableName;
    if (text.length >= 2 || where != null) {
      sql += ' where ';
      if (text.length >= 2) {
        sql += column+' = "'+text+'"';
        if (where != null) {
          sql += ' AND '+where;
        }
      } else {
        sql += where;
      }
    }
    sql += ';';
    // Uncomment the next line to see what we pulled together
    // alert(sql);
    // Issue the request.  Because our XMLHttpRequest call is
    // asynchronous, this will return immediately
    sendRequest(sql);
    // Clear the text field
    textField.value = "";
    // Add a header
    var header = document.getElementById("table_header");
    header.innerHTML = tableName;
    // Clear the old table
    var tableDiv = document.getElementById("results_div");
    while (tableDiv.firstChild) {
      tableDiv.removeChild(tableDiv.firstChild);
    }
}

// Handle the XMLHttpRequest
function sendRequest(sql) {
  xmlhttp = new XMLHttpRequest();
  if (xmlhttp != null) {
    xmlhttp.onreadystatechange = getData; // getData is our callback method
    // Encode the SQL so that characters such as "&", "+" and spaces survive the trip
    xmlhttp.open("GET", "/bmi219/cgi-bin/getBmi219Table.py?sql="+encodeURIComponent(sql), true);
    xmlhttp.send(null);
  }
}

// This method gets called whenever the object state changes
function getData() {
  // Are we complete?
  if (xmlhttp.readyState == 4) {
    // Yes, do we have a good http status?
    if (xmlhttp.status == 200) {
      // yes, responseXML will hold the XML document, which we can address using the DOM
      // if we only wanted the raw text, we could get xmlhttp.responseText
      var response = xmlhttp.responseXML;

      // Use the DOM to get the results table from the server
      var newChild = response.getElementById("results_table");

      // Get a handle on the results div
      var tableDiv = document.getElementById("results_div");

      // Add in our results table (importNode copies it into this document)
      tableDiv.appendChild(document.importNode(newChild, true));
    } else {
      alert("Unable to contact AJAX server: "+xmlhttp.status);
    }
  }
}

cgi-bin/getBmi219Table.py

#! /usr/local/bin/python

import cgi
import cgitb
import sys
import sqlite3

def returnError(errorString): 
  # Send a small XHTML document carrying the id the JavaScript looks for, then stop
  print """<html xmlns="http://www.w3.org/1999/xhtml">
    <body> <h3 id="results_table" style="color:red;">%s</h3> </body>
</html>""" % cgi.escape(errorString)
  sys.exit(0)


print "Content-type: text/xml"
print ""
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

# Get the form data
form = cgi.FieldStorage()
if not (form.has_key("sql")):
  returnError("No SQL string?")

# Note: this executes whatever SQL the client sends, so it is only
# appropriate for a throwaway teaching database
sqlStatement = form["sql"].value
rows = None

try:
  conn = sqlite3.connect ("/home/socr/b/bmi219/bmi219.db")
  cursor = conn.cursor()
  cursor.execute(sqlStatement)
  rows = cursor.fetchall()
except sqlite3.Error, e:
  returnError("SQL error: " + str(e))

print '<html xmlns="http://www.w3.org/1999/xhtml">'
print '<body>'
print   '<table id="results_table" border="1" width="80%" align="center">'
print     '<tr class="table-header">',
for column in cursor.description:
  print '<td>'+cgi.escape(column[0])+'</td>',
print     '</tr>'
for row in rows:
  print     '<tr>',
  for cell in row:
    print '<td>'+cgi.escape(str(cell))+'</td>',
  print     '</tr>'
print   '</table>'
print '</body>'
print '</html>'

AJAX - Questions?

  • Questions about CGI or AJAX?

HTML Templating

  • A lot of this program is devoted to copying values into an HTML template
    • There are lots of good systems out there, in many languages, for doing this
    • Kid in Python (now largely replaced by Genshi, Jinja2, and others)
    • JavaServer Pages (JSPs) in Java
    • Please do not write one of your own
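
Python's standard library does include a very small substitution facility, string.Template. It is not a full templating system like those listed above: it has no loops and no automatic escaping. Still, a minimal sketch shows the idea of keeping markup separate from values. PAGE and render_page are illustrative names.

```python
import html
from string import Template

# Markup lives in the template; $count and $name are filled in later
PAGE = Template("<html><body><p>Visits: $count</p><p>Hello, $name!</p></body></html>")

def render_page(name, count):
    """Fill the template, escaping user-supplied text because Template does not."""
    return PAGE.substitute(name=html.escape(name), count=int(count))
```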

What About Concurrency?

  • What happens if two users try to save messages at the same time?
    • I/O is typically slower than processing
    • So most web servers try to overlap operations
  • Race condition:
    • First instance of message_form.py opens messages.txt, reads lines, closes file
    • Second instance opens messages.txt, reads the same lines, closes file
    • First instance re-opens file, writes out original data plus one new line
    • Second instance re-opens file, writes out original plus a different new line
    • First instance's message has been lost!

File Locking

  • Solution is to lock the file
    • As the name implies, gives one process exclusive rights to the file
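    • After the first process acquires the lock, any other process that tries to lock the file is suspended until the first releases it
      • On Unix these locks are usually advisory: a process that never asks for the lock can still read or write the file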
  • Mechanics are different on different operating systems
    • But the Python Cookbook includes a generic file locking function that works on both Unix and Windows

Implementing Locking

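import cgi, fcntl

# Get existing messages.
msgfile = open('messages.txt', 'r+')
fcntl.flock(msgfile.fileno(), fcntl.LOCK_EX)
lines = [x.rstrip() for x in msgfile.readlines()]

# Add more data?
form = cgi.FieldStorage()
if form.has_key('newmessage'):
    lines.append(form.getfirst('newmessage'))
    # Rewind and truncate, so the file is overwritten rather than appended to
    msgfile.seek(0)
    msgfile.truncate()
    for line in lines:
        print >> msgfile, line
    # Make sure the data reaches the file before anyone else can get the lock
    msgfile.flush()

# Unlock and close.
fcntl.flock(msgfile.fileno(), fcntl.LOCK_UN)
msgfile.close()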
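
The same read-modify-write pattern can be written as a Python 3 function. This sketch is Unix only, since fcntl is not available on Windows, and add_message is an illustrative name. The lock is released in a finally clause, so an exception cannot leave the file locked.

```python
import fcntl
import os

def add_message(path, message):
    """Append one message to a one-message-per-line file while holding an exclusive lock."""
    # O_CREAT creates the file on first use; "r+" lets us read and then rewrite it
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as msgfile:
        fcntl.flock(msgfile.fileno(), fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            lines = msgfile.read().splitlines()
            lines.append(message)
            msgfile.seek(0)
            msgfile.truncate()
            msgfile.write("\n".join(lines) + "\n")
            msgfile.flush()  # push the data to the OS before releasing the lock
        finally:
            fcntl.flock(msgfile.fileno(), fcntl.LOCK_UN)
    return lines
```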

Who Are You?

  • How to maintain state on the client?
    • Need to know which shopping cart to display for a particular user
  • HTTP is a stateless protocol
    • If a client makes a second (or third, or fourth…) request, server has no reliable way of connecting it to the first one
  • Can guess based on client address, elapsed time, etc.
    • But it's just a guess

Cookies

Figure 10: Cookies

  • Solution is for the server to create a cookie
    • A string that is sent to the client in the Set-Cookie HTTP response header
  • Client saves it (either in memory or on disk)
  • The next time the client sends a request to the site, it sends the cookie back to the server in a Cookie request header
    • Like giving someone a claim check for their luggage

Creating Cookies

  • Represent cookies in Python using Cookie.SimpleCookie
    • Do not use SmartCookie: it is potentially insecure
  • When creating, add values to a cookie as if it were a dictionary
    • Convert it to a string (e.g., by printing it) to create the required HTTP header
  • When the cookie comes back:
    • Get the value of the environment variable "HTTP_COOKIE"
    • Create a SimpleCookie
    • Pass the "HTTP_COOKIE" value to the cookie's load method

Cookie Example

  • Example: count the number of times a user has visited a web site
    • If there's no cookie, create one with a count of 1
    • Otherwise, increment the count
    • Create a new cookie to send back to the user
    • Display the count
import os, Cookie

# Get old count.
count = 0
if os.environ.has_key('HTTP_COOKIE'):
    cookie = Cookie.SimpleCookie()
    cookie.load(os.environ['HTTP_COOKIE'])
    if cookie.has_key('count'):
        count = int(cookie['count'].value)

# Create new count.
count += 1
cookie = Cookie.SimpleCookie()
cookie['count'] = count

# Display (printing the cookie produces its Set-Cookie header line).
print 'Content-Type: text/html'
print cookie
print
print '<html><body>'
print '<p>Visits: %d</p>' % count
print '</body></html>'
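
In Python 3 the Cookie module is called http.cookies. Here is a sketch of the counting logic with the cookie handling separated from printing, so it can be checked on its own. next_visit is an illustrative name; in a CGI you would pass it os.environ.get('HTTP_COOKIE').

```python
from http.cookies import SimpleCookie

def next_visit(cookie_header):
    """Given the raw HTTP_COOKIE value (or None), return (count, Set-Cookie header line)."""
    count = 0
    if cookie_header:
        incoming = SimpleCookie()
        incoming.load(cookie_header)
        if "count" in incoming:
            count = int(incoming["count"].value)
    count += 1
    outgoing = SimpleCookie()
    outgoing["count"] = str(count)
    # output() renders the morsel as a complete "Set-Cookie: ..." header line
    return count, outgoing.output()
```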

Cookie Tips

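  • Can control how long a cookie is valid by setting an expiry value
    • Either the number of seconds from now (e.g., cookie['count']['expires'] = 3600)
    • Or the date and time at which it should expire, in GMT
      • Use email.utils.formatdate(timestamp, usegmt=True) to create a value in the required format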
  • Do not put sensitive information in cookies
    • Browsers store them in files on disk
    • Villains can watch network traffic, and steal data
  • Cookies should instead be random values that act as keys into server-side information
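
The random-key approach can be sketched in Python 3 with the standard secrets and http.cookies modules. SESSIONS, start_session, and lookup_session are hypothetical. Because every CGI request runs in a new process, a real site would keep SESSIONS in a file or database rather than in memory.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: session ID -> user data
SESSIONS = {}

def start_session(user_data, max_age_seconds=3600):
    """Store user_data under a new random key and return the Set-Cookie header line."""
    session_id = secrets.token_urlsafe(32)  # unguessable, and reveals nothing about the user
    SESSIONS[session_id] = user_data
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["max-age"] = max_age_seconds  # lifetime in seconds
    cookie["session"]["httponly"] = True  # hide the cookie from page JavaScript
    return cookie.output()

def lookup_session(cookie_header):
    """Return the stored data for the session named in an HTTP_COOKIE value, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "session" not in cookie:
        return None
    return SESSIONS.get(cookie["session"].value)
```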


  • Before class tomorrow, get started hooking your front-end to your back-end:
    1. Write stubs on the back-end to handle any CGI or AJAX calls
    2. Implement input forms or actions in your front-end that call your server stubs