Web Applications

John "Scooter" Morris

April 13, 2017

Web Programming

Web programming is a very broad topic
We're going to break it up into five parts:
1. HTTP basics
2. Client programming:
  - HTML Forms
  - Javascript
Our focus is on web programming

See the Software Carpentry lecture for a more general discussion of Internet programming

The Server as a Client

The ability to fetch and parse content from the web is an essential part of modern bioinformatics
- For example, using NCBI eutils to pull data from Entrez or PubMed.
In this context, your server program becomes a client to someone else's server

Fetching Pages

Opening sockets, constructing HTTP requests, and parsing responses is tedious
- So most languages provide libraries to do the work for you
- In Python, that library is called urllib
urllib.urlopen(URL) does what your browser would do if you gave it the URL
- Parse it to figure out what server to connect to
- Connect to that server
- Send an HTTP request
- Returns an object that looks like a file, from which to read response data

urllib Example

Read a page the easy way

import urllib

instream = urllib.urlopen("http://www.third-bit.com/greeting.html")
lines = instream.readlines()
instream.close()
for line in lines:
    print line,

Note: readlines wouldn't do the right thing if the thing being read was an image
- Might try to convert “line endings”
- Use read to grab the bytes in that case

Building A Spider

A web spider is a program that can explore the web on its own
- Fetch a page, extract all the external links, visit those pages…
- That, a search engine, and a few billion dollars, and you're Google

$ python spider.py http://www.google.ca
http://groups.google.ca/grphp?hl=en&tab=wg&ie=UTF-8
http://news.google.ca/nwshp?hl=en&tab=wn&ie=UTF-8
http://scholar.google.com/schhp?hl=en&tab=ws&ie=UTF-8
http://www.google.ca/fr

import sys, urllib, re

url = sys.argv[1]
instream = urllib.urlopen(url)
page = instream.read()
instream.close()

links = re.findall(r'href=\"[^\"]+\"', page)
temp = set()
for x in links:
    x = x[6:-1]    # strip off 'href="' and '"'
    if x.startswith('http://'):
        temp.add(x)
links = list(temp)
links.sort()
for x in links:
    print x
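
The regular expression above only matches double-quoted href attributes. A minimal sketch of the same link-extraction step using an HTML parser, written for Python 3 (where the parser lives in html.parser). The class name LinkCollector and the sample page are made up for illustration, https:// links are accepted as well, and the fetch step is left out so the snippet runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http:// and https:// links from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.add(value)


def extract_links(page):
    """Return the sorted, de-duplicated external links found in page."""
    collector = LinkCollector()
    collector.feed(page)
    return sorted(collector.links)


# Hypothetical page content, standing in for the result of instream.read().
sample = """<a href="http://news.example.org/">News</a>
<a href='/local/page.html'>Local</a>
<a href="http://news.example.org/">News again</a>
<a href="https://scholar.example.org/x?a=1&amp;b=2">Scholar</a>"""

for link in extract_links(sample):
    print(link)
```

The parser handles either quoting style and decodes entities such as &amp; inside attribute values, which the regular expression does not.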

Passing Parameters

Sometimes want to provide extra information as part of a URL
- Example: when searching on Google, have to specify what the search terms are
Could do this as part of the URL
- Amazon puts ISBNs in URLs
More flexible to add parameters to the URL
- http://www.google.ca?q=Python searches for pages related to Python
- "?" separates the parameters from the rest of the URL
- If there are multiple parameters, they are separated from each other by "&"
  - E.g., http://www.google.ca/search?q=Python&client=firefox
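
As a rough illustration of the "?" and "&" separators, the Python 3 sketch below pulls a URL apart with urllib.parse. The helper name split_url is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Return (address, parameters); parameters is a list of (name, value) pairs."""
    parts = urlsplit(url)
    address = parts.scheme + "://" + parts.netloc + parts.path
    return address, parse_qsl(parts.query)


address, params = split_url("http://www.google.ca/search?q=Python&client=firefox")
print(address)
print(params)
```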

Special Characters

Table 3: URL Encoding
Character	Encoding
`"#"`	`%23`
`"$"`	`%24`
`"%"`	`%25`
`"&"`	`%26`
`"+"`	`%2B`
`","`	`%2C`
`"/"`	`%2F`
`":"`	`%3A`
`";"`	`%3B`
`"="`	`%3D`
`"?"`	`%3F`
`"@"`	`%40`

What if you want to include "?" or "&" in a parameter?
- Same problem (and solution) as including a quote in a string, or <> in XML
URL encode special characters using "%" followed by a 2-digit hexadecimal code
- And replace spaces with "+"

Encoding Example

To search Google for “grade = A+”, use
http://www.google.ca/search?q=grade+%3D+A%2B
urllib has functions to make this easy
- urllib.quote(str) replaces special characters in str with escape sequences (urllib.quote_plus(str) also replaces spaces with "+")
- urllib.unquote(str) replaces escape sequences with characters
- urllib.urlencode(params) takes a dictionary and constructs the entire query parameter string
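
A short sketch of these functions applied to the search term above. It assumes Python 3, where they live in urllib.parse rather than directly in urllib.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

term = "grade = A+"

escaped = quote(term)            # percent-escapes only; spaces become %20
escaped_plus = quote_plus(term)  # same, but spaces become "+"
restored = unquote_plus(escaped_plus)
url = "http://www.google.ca/search?" + urlencode({"q": term})

print(escaped)
print(escaped_plus)
print(restored)
print(url)
```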

Screen Scraping (And Why Not)

Suppose you want to write a script that actually does search Google
- Construct a URL: easy
- Send it and read the response: no problem
- Parse the response: there's a lot of junk on the page…
Many first-generation web applications relied on screen scraping
- “Parse” the HTML with regular expressions
Hard to get right if the page layout is complex
- And whenever the layout changes, the application breaks
Now, there's a better way...

Web Services

Figure 5: Web Services

Modern web services separate data from presentation
- When a client sends a request, it indicates that it wants machine-readable XML, rather than human-readable HTML
  - Much easier to parse
  - Much less likely to change over time
Many web services use the Simple Object Access Protocol (SOAP) standard
- Despite its name, it's anything but simple
- Luckily, there are libraries to hide the details for most widely-used web services
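
To show why machine-readable XML is easier to handle than scraped HTML, here is a small Python 3 sketch. The reply text and its element names are invented for illustration; a real service defines its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML reply from a web service.
reply = """<searchResult>
  <count>2</count>
  <idList>
    <id>101</id>
    <id>202</id>
  </idList>
</searchResult>"""


def extract_ids(xml_text):
    """Return the text of every <id> element under <idList>."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./idList/id")]


print(extract_ids(reply))
```

No regular expressions are needed, and the code keeps working if the service later adds other elements to the reply.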

Web Services (REST)

Today, it's more common to use REST (REpresentational State Transfer).
- REST Principles:
  - Give every "thing" an ID
  - Link things together
  - Use standard methods
  - Resources with multiple representations
  - Communicate statelessly
Basically, encode what you want in a URL:
- http://example.com/customers/1234
- http://example.com/orders/2007/10/776654
- http://example.com/products/4554
- http://example.com/processes/salary-increase-234
Then return data as XML or JSON
Generally much simpler than SOAP
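
A minimal sketch of the REST idea in Python 3: build a resource URL from its parts, then decode a JSON representation. The helper resource_url and the JSON reply are assumptions for illustration; no request is actually sent.

```python
import json
from urllib.parse import quote

BASE = "http://example.com"  # placeholder host from the examples above


def resource_url(collection, *ids):
    """Build a URL such as http://example.com/orders/2007/10/776654."""
    segments = [collection] + [str(i) for i in ids]
    return BASE + "/" + "/".join(quote(s, safe="") for s in segments)


# Hypothetical JSON representation that a GET on the customer URL might return.
reply = '{"id": 1234, "name": "Ada", "orders": ["/orders/2007/10/776654"]}'
customer = json.loads(reply)

print(resource_url("customers", 1234))
print(resource_url("orders", 2007, 10, 776654))
print(customer["name"], customer["orders"])
```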

The Server As A Client

Questions?

Server Programming

Users want to make the web do different things
- How to let them write programs that handle HTTP requests?
Option #1: Require them to write socket-level code
- Complicated and error-prone
- Can only have one program listening to a socket at a time
Option #2: Have the web server accept the HTTP request, and then run the user's code
- Recompiling the web server every time someone wants to add functionality would be a pain
- So define a protocol that lets web servers run other programs

The CGI Protocol

The Common Gateway Interface (CGI) protocol specifies:
- How a web server passes information to a program
- How that program passes information back to the web server
CGI does not specify:
- A particular language
  - You can use Fortran, the shell, C, Java, Perl, Python…
- How the web server figures out what program to run
  - Each web server has its own rules
  - We'll (briefly) talk about Apache's

From Server To CGI

Figure 5: CGI Data Processing Cycle

Web server runs the CGI by creating a new process
Web server passes some information to the CGI process through environment variables
The web server may also send CONTENT_LENGTH bytes to the CGI on standard input
- E.g., when a file is being uploaded

Table 4: Important CGI Environment Variables
Name	Purpose	Example
`REQUEST_METHOD`	What kind of HTTP request is being handled	`GET` or `POST`
`SCRIPT_NAME`	The path to the script that's executing	`/cgi-bin/post_photo.py`
`QUERY_STRING`	The query parameters following `"?"` in the URL	`name=mydog.jpg&expires=never`
`CONTENT_TYPE`	The type of any extra data being sent with the request	`image/jpeg`
`CONTENT_LENGTH`	How much extra data is being sent with the request (in bytes)	`17290`
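
The sketch below shows, in Python 3, how a CGI program might collect these values. The function read_request is a hypothetical helper; it takes the environment and input stream as arguments so that it can be tried with simulated data instead of a real web server.

```python
import io


def read_request(environ, stdin):
    """Collect what the web server handed to a CGI process.

    environ is a mapping like os.environ; stdin is a binary stream like
    sys.stdin.buffer. Returns (method, query_string, body_bytes).
    """
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read exactly CONTENT_LENGTH bytes, never more.
    body = stdin.read(length) if length > 0 else b""
    return method, query, body


# Simulated POST: in a real CGI these values come from the web server.
fake_env = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "name=mydog.jpg&expires=never",
    "CONTENT_TYPE": "image/jpeg",
    "CONTENT_LENGTH": "4",
}
fake_stdin = io.BytesIO(b"\xff\xd8\xff\xe0 bytes past CONTENT_LENGTH")
result = read_request(fake_env, fake_stdin)
print(result)
```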

From CGI To Server

The CGI program sends data back to the web server by printing it to standard output
The web server then forwards this directly to the client
- Which means that the CGI program is responsible for creating headers
Note: none of this works unless the web server has been configured to run the CGI
- By default, modern servers won't do this unless they're told they can

MIME Types

Clients and servers need a way to specify data types to each other
- Remember, bytes are just bytes: the browser doesn't magically know how to interpret them
Multipurpose Internet Mail Extensions standard specifies how to do this
- Organizes data types into families, and provides a two-part name for each type
- Use the "Content-Type" header to specify the MIME type of the data being sent

Table 5: Example MIME Types
Family	Specific Type	Describes
Text	`text/html`	Web pages
Image	`image/jpeg`	JPEG-format image
Audio	`audio/x-mp3`	MP3 audio file
Video	`video/quicktime`	Apple Quicktime video format
Application-specific data	`application/pdf`	Adobe PDF document
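
Python's standard mimetypes module can guess the two-part name from a file extension. A small Python 3 sketch, using made-up file names:

```python
import mimetypes

guesses = {}
for filename in ["index.html", "photo.jpg", "paper.pdf"]:
    mime_type, _encoding = mimetypes.guess_type(filename)
    family, specific = mime_type.split("/")
    guesses[filename] = (family, specific)
    print(filename, "->", mime_type)
```

The guess is based only on the file name, so a CGI program that knows what it is sending should still set Content-Type explicitly.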

Hello, CGI

Simplest possible CGI pays no attention to query parameters or extra data
- Just prints HTML to standard output, to be relayed to the client
- Along with a Content-Type header to tell the client to expect HTML…
- …and a blank line to separate the headers from the data

#!/usr/bin/env python

# Headers and an extra blank line
print 'Content-type: text/html'
print

# Body
print '<html><body><p>Hello, CGI!</p></body></html>'

Invoking a CGI

Invoke it by going to
http://bmi219.rbvi.ucsf.edu/examples/html/cgi-bin/hello_cgi.py
- By convention, CGI programs are put in a cgi-bin directory
Browser displays the simple HTML page generated by the program

Figure 6: Basic CGI Output

Generating Dynamic Content

Figure 7: Environment Variable Output

But the whole point of CGI is to generate content dynamically
- E.g., show a list of environment variables and their values
You'll use this frequently when debugging…


#!/usr/bin/env python

import os, cgi

# Headers and an extra blank line
print 'Content-type: text/html'
print

# Body
print '<html><body>'
keys = os.environ.keys()
keys.sort()
for k in keys:
    print '<p>%s: %s</p>' % (cgi.escape(k), cgi.escape(os.environ[k]))
print '</body></html>'

A Simple Form (reprise)

Figure 4: A Simple Form


<html>
  <body>
    <form action="/bmi219/cgi-bin/print_params.py">
      <p>Sequence: <input type="text" name="sequence"/>
      Search type:
      <select name="match">
        <option>Exact match</option>
        <option>Similarity match</option>
        <option>Sub-match</option>
      </select>
      </p>
      <p>Programs: 
      <input type="checkbox" name="frog"/>
        FROG (version 1.1)
      <input type="checkbox" name="frog2"/>
        FROG (2.0 beta)
      <input type="checkbox" name="bayeshart"/>
        Bayes-Hart
      </p>
      <p>
        <input type="submit" value="Submit Query"/>
        <input type="reset" value="Reset"/>
      </p>
    </form>
  </body>
</html>

Parameter Names

Each <input/> element has a name attribute
- These become the names of the parameters that the client sends to the server
- The input elements' values are the parameters' values
Submitting the form shown above with default values produces:
- os.environ['REQUEST_METHOD']: "POST"
- os.environ['SCRIPT_NAME']: "/cgi-bin/simple_form.py"
- os.environ['CONTENT_TYPE']: "application/x-www-form-urlencoded"
- os.environ['CONTENT_LENGTH']: "80"
- Standard input: sequence=GATTACA&search_type=Similarity+match&program=FROG-11&program=Bayes-Hart
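
As a quick check of the values listed above, this Python 3 sketch decodes the standard-input string with urllib.parse.parse_qs and confirms its length.

```python
from urllib.parse import parse_qs

body = "sequence=GATTACA&search_type=Similarity+match&program=FROG-11&program=Bayes-Hart"

params = parse_qs(body)
print(len(body))
for name in sorted(params):
    print(name, params[name])
```

Every value comes back as a list, because a name such as program may appear more than once.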

Handling Forms

Could handle form data directly
- Read and parse environment variables
- Read extra data from standard input
But the mechanics are the same each time, so use Python's cgi module instead
- Defines a dictionary-like object called FieldStorage
  - Keys are parameter names
  - Values are either strings (if there's a single value associated with the parameter) or lists (if there are many)
When a FieldStorage object is created, it reads and stores information contained in the URL and environment
- Which means that a CGI program should only ever create one
Program can read extra data from sys.stdin

Form Handling Example

Example: show the parameters sent to a script

#!/usr/bin/env python
import cgi

print 'Content-type: text/html'
print
print '<html><body>'
form = cgi.FieldStorage()
for key in form.keys():
    value = form.getvalue(key)
    if isinstance(value, list):
        value = '[' + ', '.join(value) + ']'
    print '<p>%s: %s</p>' % (cgi.escape(key), cgi.escape(value))
print '</body></html>'

Table 6: Example Parameter Values
URL	Value of `a`	Value of `b`
`http://www.third-bit.com/swc/show_params.py?a=0`	`"0"`	None
`http://www.third-bit.com/swc/show_params.py?a=0&b=hello`	`"0"`	`"hello"`
`http://www.third-bit.com/swc/show_params.py?a=0&b=hello&a=22`	`["0", "22"]`	`"hello"`

Development Tips

During development, add
```
import cgitb; cgitb.enable()
```
to the top of the program
- cgitb is the CGI traceback module
- When enabled, it will create a web page showing a stack trace when something goes wrong in your script
Testing whether a FieldStorage value is a string or a list is tedious
- In almost all cases, you'll know whether to expect one value or many
- Use FieldStorage.getfirst(name) to get the unique value
  - Returns the first, if there are many
- FieldStorage.getlist(name) always returns a list of values
  - Empty list if there's no data associated with name
  - If there's only one value, get a single-item list
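
The behaviour of getfirst and getlist can be sketched without a web server. The class Params below is a hypothetical stand-in written for Python 3 on top of urllib.parse.parse_qs; it is not the cgi module's implementation, only an imitation of the two calls described above.

```python
from urllib.parse import parse_qs


class Params:
    """Tiny stand-in that mimics FieldStorage.getfirst and getlist."""

    def __init__(self, query_string):
        self._values = parse_qs(query_string)

    def getfirst(self, name, default=None):
        values = self._values.get(name)
        return values[0] if values else default

    def getlist(self, name):
        return list(self._values.get(name, []))


form = Params("a=0&b=hello&a=22")
print(form.getfirst("a"), form.getlist("a"))
print(form.getfirst("b"), form.getlist("b"))
print(form.getfirst("c"), form.getlist("c"))
```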

Maintaining State

Figure 9: Three Tier Architecture

Often want to change the data a server is managing, as well as read it
- Update a description of an experiment, change your preferred email address, etc.
The industrial-strength solution is to use a three-tier architecture
- CGI program stuffs parameters from HTTP requests into SQL queries
- Runs the queries
- Translates results into HTML to send back to the client
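
A minimal Python 3 sketch of the middle tier, assuming an invented experiment table held in an in-memory SQLite database. The request parameter is passed through a "?" placeholder instead of being pasted into the SQL text, and values are escaped before they go into the HTML.

```python
import html
import sqlite3


def experiments_page(conn, owner):
    """Run a parameterized query and render the rows as an HTML list."""
    # The "?" placeholder keeps the request parameter out of the SQL text.
    rows = conn.execute(
        "SELECT name FROM experiment WHERE owner = ? ORDER BY name", (owner,)
    ).fetchall()
    items = "".join("<li>%s</li>" % html.escape(name) for (name,) in rows)
    return "<html><body><ul>%s</ul></body></html>" % items


# Hypothetical in-memory database standing in for the third tier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (name TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [("PCR <run 2>", "ada"), ("Gel A", "ada"), ("Blot", "bob")],
)

print(experiments_page(conn, "ada"))
print(experiments_page(conn, "x' OR '1'='1"))
```

The second call shows that a parameter containing quote characters is treated as an ordinary (non-matching) value rather than as SQL.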

Maintaining State in Files

Simple programs can often get away with using files
- The CGI program re-reads the file each time it processes a request
- And re-writes it if there have been any updates

Example: append messages to a web page

Old messages are saved in a file, one per line

Hi, is anyone reading this site?
I was wondering the same thing.
I wasn't sure if we were supposed to post here.
Good point.  Is there way to delete messages?

Script checks the incoming parameters to decide what to do

If newmessage is there, append it, and display results
If newmessage isn't there, someone's visiting the page, rather than submitting the form

import cgi

# Get existing messages.
infile = open('messages.txt', 'r')
lines = [x.rstrip() for x in infile.readlines()]
infile.close()

# Add more data?
form = cgi.FieldStorage()
if form.has_key('newmessage'):
    lines.append(form.getfirst('newmessage'))
    outfile = open('messages.txt', 'w')
    for line in lines:
        print >> outfile, line
    outfile.close()

Questions on CGI?

AJAX

AJAX (Asynchronous Javascript And XML) provides:
- Increased usability (on-page server interaction)
- Client-side state
- Enhanced user experience (possibly)
Basic AJAX idea:
1. Javascript in browser sends a request to server using XMLHttpRequest()
2. Server CGI processes request and sends back response, usually as an XML document
3. Javascript in browser receives response, parses the XML and (using DOM) extracts the information
4. Results are presented to the user, or used to modify the interface in some way

XMLHttpRequest Example


// Handle the XMLHttpRequest
function sendRequest(sql)
{ 
  xmlhttp = new XMLHttpRequest();
  if (xmlhttp != null) {
    xmlhttp.onreadystatechange = getData; // getData is our callback method
    // encodeURIComponent escapes spaces, quotes, "=" and "&" in the query value
    xmlhttp.open("GET", "/cgi-bin/getBmi219Table.py?sql="+encodeURIComponent(sql), true);
    xmlhttp.send(null);
  }
}

// This method gets called whenever the object state changes.
function getData()
{ 
  // Are we complete? 
  if (xmlhttp.readyState == 4) {
    // Yes, do we have a good http status?
    if (xmlhttp.status == 200) {
      // yes, responseXML will hold the XML document, which we can address using the DOM
      // if we only wanted the raw text, we could get xmlhttp.responseText
      var response = xmlhttp.responseXML;

      // Use the DOM to get the results table from the server
      var newChild = response.getElementById("results_table");

      // Get a handle on the results div
      var tableDiv = document.getElementById("results_div");

      // Add in our results table
      tableDiv.appendChild(newChild);
    } else {
      alert("Unable to contact AJAX server: "+xmlhttp.status);
    }
  }
}

XMLHttpRequest Methods

Table 7: XMLHttpRequest Methods
Method	Description
`abort()`	Cancels the current request
`getAllResponseHeaders()`	Returns the complete set of http headers as a string
`getResponseHeader("headername")`	Returns the value of the specified http header
`open("method","URL",async,"username","password")`	Specifies the method, URL, and other optional attributes of a request. The method parameter can have a value of "GET", "POST", or "PUT" (use "GET" when requesting data and "POST" when sending data, especially if the length of the data is greater than 512 bytes). The URL parameter may be either a relative or complete URL. The async parameter specifies whether the request should be handled asynchronously or not. true means that script processing carries on after the send() method, without waiting for a response. false means that the script waits for a response before continuing script processing.
`send(content)`	Sends the request
`setRequestHeader("label", "value")`	Adds a label/value pair to the http header to be sent

XMLHttpRequest Properties

Table 8: XMLHttpRequest Properties
Property	Description
`onreadystatechange`	An event handler for an event that fires at every state change
`readyState`	Returns the state of the object: 0 = uninitialized 1 = loading 2 = loaded 3 = interactive 4 = complete
`responseText`	Returns the response as a string
`responseXML`	Returns the response as XML. This property returns an XML document object, which can be examined and parsed using W3C DOM node tree methods and properties
`status`	Returns the HTTP status as a number (e.g. 404 for "Not Found" or 200 for "OK")
`statusText`	Returns the HTTP status as a string (e.g. "Not Found" or "OK")

AJAX Server Side

The server implementation is just a CGI
Arguments can be handled using the Python cgi module
Be sure to return a proper XML document if you want to use XMLHttpRequest.responseXML:

#! /usr/bin/python
import cgi
import sys

print "Content-type: text/xml"
print ""
# We want this to be interpreted as HTML by the client
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

print '<html xmlns="http://www.w3.org/1999/xhtml">'

Note that like any CGI, we need to send the Content-type, followed by a blank line
In this example, we want the browser to parse the response as XHTML, but it could be any well-formed XML
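
One way to make sure the reply is well-formed is to build it with an XML library rather than by printing tags. The Python 3 sketch below uses xml.etree.ElementTree; the function name results_document and the sample row are assumptions, and the DOCTYPE line is omitted.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def results_document(columns, rows):
    """Return a well-formed XHTML document holding one results table."""
    ET.register_namespace("", XHTML_NS)  # serialize XHTML as the default namespace
    root = ET.Element("{%s}html" % XHTML_NS)
    body = ET.SubElement(root, "{%s}body" % XHTML_NS)
    table = ET.SubElement(body, "{%s}table" % XHTML_NS, {"id": "results_table"})
    for cells in [columns] + rows:
        tr = ET.SubElement(table, "{%s}tr" % XHTML_NS)
        for cell in cells:
            td = ET.SubElement(tr, "{%s}td" % XHTML_NS)
            td.text = str(cell)  # ElementTree escapes <, > and & on output
    return ET.tostring(root, encoding="unicode")


document = results_document(["Name", "Sequence"], [["frag<1>", "GATTACA"]])
print("Content-type: text/xml")
print()
print(document)
```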

Putting it together

Consider the Example Application
It is using SVG for the graphics, HTML forms for the input, and AJAX to query the backend database and populate the tables
There is also a fair amount of JavaScript and CSS trickery going on
The application is made up of 4 files:
- examples/html/bmi219.svg: the XHTML+SVG file that makes up the front-end
- examples/html/css/bmi219.css: the stylesheet for both the XHTML and SVG
- examples/html/js/bmi219.js: the JavaScript that drives the application
- examples/html/cgi-bin/getBmi219Table.py: the server-side component

bmi219.svg


<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xml:lang="en" lang="en">
<head>
<script type="text/javascript" src="js/bmi219.js"></script>
<link rel="stylesheet" type="text/css" href="css/bmi219.css"/>
</head>
<body>
<h3>BMI219 - AJAX Example</h3>
<svg:svg id="svg-root" width="100%" viewBox="0 0 800 100" version="1.1" >
  <!-- Surrounding Rectangle -->
  <svg:rect x="0" y="0" width="800" height="100" style="stroke: blue; fill: none;"/>
  <!-- Recipe Entity -->
  <svg:rect x="40" y="30" width="60" height="40" class="entity" onclick="showInput('recipe_input', this);"/>
  <svg:text x="50" y="52" class="label1">Recipe</svg:text>
  <svg:line x1="100" y1="50" x2="330" y2="50" stroke="yellow" stroke-width="2"/>
  <!-- Fragment Entity -->
  <svg:rect x="330" y="30" width="60" height="40" class="entity" onclick="showInput('fragment_input', this);"/>
  <svg:text x="334" y="52" class="label1">Fragment</svg:text>
  <svg:line x1="390" y1="50" x2="630" y2="50" stroke="yellow" stroke-width="2"/>
  <!-- Gene Entity -->
  <svg:rect x="630" y="30" width="60" height="40" class="entity" onclick="showInput('gene_input', this);"/>
  <svg:text x="647" y="52" class="label1">Gene</svg:text>
  <!-- Produces relationship -->
  <svg:rect x="200" y="30" width="40" height="40" class="relationship" transform="rotate(-45,220,50)" onclick="showInput('recipe_input_join', this);"/>
  <svg:text x="201" y="52" class="label2">Produces</svg:text>
  <!-- Contains relationship -->
  <svg:rect x="500" y="30" width="40" height="40" class="relationship" transform="rotate(-45,520,50)" onclick="showInput('gene_input_join', this);"/>
  <svg:text x="501" y="52" class="label2">Contains</svg:text>
  <!-- Links and orders -->
</svg:svg>

<!-- This is the form: Note that each <span> has an ID and a class that we will use to 
     control whether we show the containing input field or not.  Also note specifically
     the way we call getTable with the arguments we want. -->
<form>
  <span id="recipe_input" class="hidden">
    Recipe Name: <input type="text" onchange="getTable('RECIPE','RECIPE.NAME', this, 'Name,File,Owner',null);"/>
  </span>
  <span id="recipe_input_join" class="hidden">
    Recipe Name: <input type="text" onchange="getTable('RECIPE,PRODUCES,FRAG','RECIPE.NAME', this, 'RECIPE.Name,RECIPE.Owner,PRODUCES.Date,FRAG.Name,FRAG.Sequence','RECIPE.RCP=PRODUCES.RCP and PRODUCES.FRAG=FRAG.FRAG');"/>
  </span>
  <span id="fragment_input" class="hidden" style="position: absolute; left: 35%;">
    Fragment Name: <input type="text" onchange="getTable('FRAG','FRAG.NAME', this, 'Name,Sequence,Circular',null);"/>
  </span>
  <span id="gene_input_join" class="hidden">
    Gene Name: <input type="text" onchange="getTable('FRAG,CONTAINS,GENE','GENE.NAME', this, 'FRAG.Name,FRAG.Sequence,GENE.Name,CONTAINS.Start,CONTAINS.End','FRAG.FRAG=CONTAINS.FRAG and GENE.ID=CONTAINS.GENE');"/>
  </span>
  <span id="gene_input" class="hidden" style="position: absolute; left: 70%;">
    Gene Name: <input type="text" onchange="getTable('GENE','GENE.NAME', this, 'Name,Protein,StartNum',null);"/>
  </span>
</form>

<!-- We'll write a header into this <h3> when we get the data -->
<h3 id="table_header" class="table_header"> </h3>
<!-- We'll write the results table into this when we get the data -->
<div id="results_div">
</div>
</body>
</html>

bmi219.css


rect.entity { fill: purple; stroke-width: 2px;}
rect.relationship { fill: lightgreen; stroke-width: 2px;}
text.label1 {fill:white; font-size:8pt; font-family: arial; font-weight: bold;}
text.label2 {fill:blue; font-size:6pt; font-family: arial; font-weight: bold;}
span.hidden {visibility: hidden; }
span.shown {visibility: visible; }
tr.table-header {font-weight: bold; text-align: center; color: green; font-family: arial;}
h3.table_header {font-family: arial; text-align: center;}
table {font-family: arial; font-size: 80%;}

bmi219.js


var elementShown = null;
var xmlhttp = null;
var selectedRect = null;

// ShowInput just controls the presentation of the name
// of the row we are looking for
function showInput(elementID, rect) {
  // Get a pointer to the element that called us
  var element = document.getElementById(elementID);

  // Do we already have a text input element showing?
  if (elementShown != null)
    elementShown.className = "hidden"; // Yes, hide it

  // Do we already have a rectangle highlighted?
  if (selectedRect != null)
    selectedRect.setAttributeNS(null, "stroke", "none"); // Yes, hide it

  // Show the text input
  element.className = "shown";
  elementShown = element;

  // Outline the element the user clicked on
  // Note that we need to use setAttributeNS for SVG attributes
  rect.setAttributeNS(null, "stroke", "black");
  selectedRect = rect;
}


// This is the method that gets called when a text field is changed
function getTable(tableName, column, textField, fields, where) {
    var text = textField.value; // This contains the value the user entered

    // Now, create the SELECT statement
    var sql = 'SELECT '+fields+' from '+tableName;
    if (text.length >= 2 || where != null) {
      sql += ' where ';
      if (text.length >= 2) {
        sql += column+' = "'+text+'"';
        if (where != null) {
          sql += ' AND '+where;
        }
      } else {
        sql += where;
      }
    } 
    sql += ';';
  
    // Uncomment the next line to see what we pulled together
    // alert(sql);
  
    // Issue the request.  Because our XMLHttpRequest call is
    // asynchronous, this will return immediately
    sendRequest(sql);
  
    // Clear the text field
    textField.value = "";
  
    // Add a header
    var header = document.getElementById("table_header");
    header.innerHTML = tableName;
  
    // Clear the old table
    var tableDiv = document.getElementById("results_div");
    while (tableDiv.firstChild) {
      tableDiv.removeChild(tableDiv.firstChild);
    }
}

// Handle the XMLHttpRequest
function sendRequest(sql)
{
  xmlhttp = new XMLHttpRequest();
  if (xmlhttp != null) {
    xmlhttp.onreadystatechange = getData; // getData is our callback method
    // encodeURIComponent escapes spaces, quotes, "=" and "&" in the query value
    xmlhttp.open("GET", "/bmi219/cgi-bin/getBmi219Table.py?sql="+encodeURIComponent(sql), true);
    xmlhttp.send(null);
  }
}

// This method gets called whenever the object state changes
function getData()
{
  // Are we complete?
  if (xmlhttp.readyState == 4) {
    // Yes, do we have a good http status?
    if (xmlhttp.status == 200) {
      // yes, responseXML will hold the XML document, which we can address using the DOM
      // if we only wanted the raw text, we could get xmlhttp.responseText
      var response = xmlhttp.responseXML;

      // Use the DOM to get the results table from the server
      var newChild = response.getElementById("results_table");

      // Get a handle on the results div
      var tableDiv = document.getElementById("results_div");

      // Add in our results table
      tableDiv.appendChild(newChild);
    } else {
      alert("Unable to contact AJAX server: "+xmlhttp.status);
    }
  }
}

getBmi219Table.py


#! /usr/local/bin/python

import cgi
import cgitb
import sys
import sqlite3

def returnError(errorString): 
  print """<html xmlns="http://www.w3.org/1999/xhtml">
    <body> <h3 id="results_table" style="color:red;">%s</h3> </body>
  </html>"""%errorString

cgitb.enable()

print "Content-type: text/xml"
print ""
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

# Get the form data
form = cgi.FieldStorage()
if not (form.has_key("sql")):
  returnError("No SQL string?")
  sys.exit(0)

sqlStatement = form["sql"].value
rows = None

try:
  conn = sqlite3.connect ("/home/socr/b/bmi219/bmi219.db")
  cursor = conn.cursor()
  cursor.execute(sqlStatement)
  rows = cursor.fetchall()
  cursor.close()
  conn.commit()
  conn.close()

except sqlite3.Error, e:
  # sqlite3 exceptions carry their message in args[0]
  returnError(e.args[0])
  sys.exit(0)

print '<html xmlns="http://www.w3.org/1999/xhtml">'
print '<body>'
print   '<table id="results_table" border="1" width="80%" align="center">'
print     '<tr class="table-header">',
for column in cursor.description:
  print '<td>'+column[0]+'</td>',
print     '</tr>'
  
for row in rows:
  print     '<tr>',
  for cell in row:
    # Escape <, > and & so the reply stays well-formed XML
    print '<td>'+cgi.escape(str(cell))+'</td>',
  print     '</tr>'
print   '</table>'
print '</body>'
print '</html>'

AJAX - Questions?

Questions about CGI or AJAX?

HTML Templating

A lot of this program is devoted to copying values into an HTML template
- There are lots of good systems out there, in many languages, for doing this
- Django in Python
- Java Server Pages (JSPs) in Java
- New HTML5 <template> tag
- Please do not write one of your own
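
For very small pages, Python's standard library already includes a simple substitution mechanism, string.Template. A minimal Python 3 sketch with an invented page layout:

```python
import html
from string import Template

PAGE = Template("<html><body><h3>$title</h3><p>$message</p></body></html>")


def render(title, message):
    """Fill the template, escaping values so they cannot break the markup."""
    return PAGE.substitute(title=html.escape(title), message=html.escape(message))


print(render("Results", "2 rows for gene <BRCA1>"))
```

Full templating systems add loops, conditionals, and automatic escaping on top of this basic idea.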

Web Frameworks

A web framework (or web application framework) is a set of APIs and supporting software that makes developing rich web applications easier (at least in principle). Two of the most popular frameworks are:

- Django: a rich, Python-based framework that includes a templating mechanism, database interface, object-relational mapper, and many other features. Can be complicated to tune for large applications.
- jQuery: a client-side Javascript library that simplifies developing rich web applications. Features include easy traversal of the DOM, AJAX interaction, form manipulation, and event handling. Many frameworks use jQuery on the client side.

What About Concurrency?

What happens if two users try to save messages at the same time?
- I/O is typically slower than processing
- So most web servers try to overlap operations
Race condition:
- First instance of message_form.py opens messages.txt, reads lines, closes file
- Second instance opens messages.txt, reads the same lines, closes file
- First instance re-opens file, writes out original data plus one new line
- Second instance re-opens file, writes out original plus a different new line
- First instance's message has been lost!
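
The lost update can be replayed deterministically in a single Python 3 process by performing the reads and writes in the order listed above. The helper names and the temporary file are assumptions for illustration.

```python
import os
import tempfile


def read_messages(path):
    with open(path) as infile:
        return [line.rstrip("\n") for line in infile]


def write_messages(path, lines):
    with open(path, "w") as outfile:
        for line in lines:
            outfile.write(line + "\n")


path = os.path.join(tempfile.mkdtemp(), "messages.txt")
write_messages(path, ["Hi, is anyone reading this site?"])

# Both instances read before either one writes.
first = read_messages(path)
second = read_messages(path)

# Each instance writes back its own copy plus one new line.
write_messages(path, first + ["message from the first instance"])
write_messages(path, second + ["message from the second instance"])

final = read_messages(path)
print(final)
```

Only the second instance's message survives, because its copy of the file was read before the first instance wrote.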

File Locking

Solution is to lock the file
- As the name implies, gives one process exclusive rights to the file
- After the first process acquires the lock, any other process that tries to acquire it is suspended until the first releases it
Mechanics are different on different operating systems
- But the Python Cookbook includes a generic file locking function that works on both Unix and Windows

Implementing Locking

import cgi, fcntl

# Get existing messages while holding an exclusive lock.
msgfile = open('messages.txt', 'r+')
fcntl.flock(msgfile.fileno(), fcntl.LOCK_EX)
lines = [x.rstrip() for x in msgfile.readlines()]

# Add more data?
form = cgi.FieldStorage()
if form.has_key('newmessage'):
    lines.append(form.getfirst('newmessage'))
    msgfile.seek(0)
    for line in lines:
        print >> msgfile, line
    msgfile.truncate()    # discard any leftover bytes past the new end
    msgfile.flush()       # push buffered writes to the file before unlocking

# Unlock and close.
fcntl.flock(msgfile.fileno(), fcntl.LOCK_UN)
msgfile.close()

Who Are You?

How to maintain state on the client?
- Need to know which shopping cart to display for a particular user
HTTP is a stateless protocol
- If a client makes a second (or third, or fourth…) request, server has no reliable way of connecting it to the first one
Can guess based on client address, elapsed time, etc.
- But it's just a guess

Cookies

Figure 10: Cookies

Solution is for the server to create a cookie
- A string that is sent to the client in an HTTP response header
Client saves it (either in memory or on disk)
The next time the client sends a request to the site, it sends the cookie back to the server
- Like giving someone a claim check for their luggage

Creating Cookies

Represent cookies in Python using Cookie.SimpleCookie
- Do not use SmartCookie: it is potentially insecure
When creating, add values to a cookie as if it were a dictionary
- Convert it to a string (e.g., by printing it) to create the required HTTP header
When the cookie comes back:
- Get the value of the environment variable "HTTP_COOKIE"
- Create a SimpleCookie
- Pass the "HTTP_COOKIE" value to the cookie's load method

Cookie Example

Example: count the number of times a user has visited a web site
- If there's no cookie, create one with a count of 1
- Otherwise, increment the count
- Create a new cookie to send back to the user
- Display the count

import os, Cookie

# Get old count.
count = 0
if os.environ.has_key('HTTP_COOKIE'):
    cookie = Cookie.SimpleCookie()
    cookie.load(os.environ['HTTP_COOKIE'])
    if cookie.has_key('count'):
        count = int(cookie['count'].value)

# Create new count.
count += 1
cookie = Cookie.SimpleCookie()
cookie['count'] = count

# Display.
print 'Content-Type: text/html'
print cookie
print
print '<html><body>'
print '<p>Visits: %d</p>' % count
print '</body></html>'

Cookie Tips

Can control how long a cookie is valid by setting an expiry value
- Either the number of seconds until it expires
- Or the time it should expire (in UTC)
  - Use time.asctime(time.gmtime()) to create the value
Do not put sensitive information in cookies
- Browsers store them in files on disk
- Villains can watch network traffic, and steal data
Cookies should instead be random values that act as keys into server-side information
- Talk about this more in the next lecture
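
A rough Python 3 sketch of the random-key idea, using http.cookies (the Python 3 name of the Cookie module) and the secrets module. The SESSIONS dictionary and both function names are hypothetical; a real site would keep sessions in a database or similar store.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical server-side store, keyed by random session ID


def start_session(user_data, max_age=3600):
    """Store user_data on the server; return a cookie holding only a random key."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = user_data
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["max-age"] = max_age  # lifetime in seconds
    return cookie


def lookup_session(http_cookie):
    """Return the data for the session named in an HTTP_COOKIE value, or None."""
    cookie = SimpleCookie()
    cookie.load(http_cookie)
    if "session" not in cookie:
        return None
    return SESSIONS.get(cookie["session"].value)


cookie = start_session({"email": "user@example.com"})
print(cookie.output())
print(lookup_session("session=" + cookie["session"].value))
print(lookup_session("session=forged"))
```

The email address never leaves the server; the browser only ever sees the random key.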

Assignment

Before class tomorrow, get started hooking your front-end to your back-end:
1. Write stubs on the back-end to handle any CGI or AJAX calls
2. Implement input forms or actions in your front-end that call your server stubs