My CGI Script Is Not Working!

Either your script is not being executed, or
Apache does not like the output from your script.

My CGI Script Didn't Run

Your CGI directory has to be located in a path that looks something like /home/socr/b/bmi219/current/z/conrad/cgi-bin
- The directory must be exactly three levels below current (a.k.a., 2017).

Your script must be executable.

$ cd /home/socr/b/bmi219/current/z/conrad/cgi-bin
$ ls -l
total 4
-rw-rw-r-- 1 conrad bmi219 179 Apr 17 09:58 example.cgi
$ chmod +x example.cgi
$ ./example.cgi
???

Make sure the first line of your script looks like:
```
#!/usr/bin/python2
```

My AJAX Sees Nothing

Check to see if your script is being invoked:

$ cd /usr/local/www/logs/plato-test-httpd/bmi219
$ grep example.cgi bmi219-ssl-access_log
169.230.21.23 - - [17/Apr/2017:09:56:19 -0700] "GET /z/conrad/cgi-bin/example.cgi HTTP/1.1" 401 362 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0" 0/29321
169.230.21.23 - conrad [17/Apr/2017:09:56:22 -0700] "GET /z/conrad/cgi-bin/example.cgi HTTP/1.1" 500 613 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0" 0/117099
169.230.21.23 - conrad [17/Apr/2017:09:56:46 -0700] "GET /z/conrad/cgi-bin/example.cgi HTTP/1.1" 200 50 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0" 0/155331

401 is "Unauthorized", and is normal when the web site requires login
200 is "Success"
500 is "Internal Server Error"

Try going to the CGI URL directly in your browser

Internal Server Error

"View Source" in the browser to see how far your script got.

Check the logs to see what Apache is reporting

$ cd /usr/local/www/logs/plato-test-httpd/bmi219
$ grep example.cgi bmi219-ssl-error_log
[Mon Apr 17 09:56:22.793106 2017] [cgi:error] [pid 34932] [client 169.230.21.23:39853] End of script output before headers: example.cgi
[Mon Apr 17 09:58:34.034631 2017] [cgi:error] [pid 34932] [client 169.230.21.23:40281] AH01215:   File "/home/socr/b/bmi219/current/z/conrad/cgi-bin/example.cg ", line 7, in 
[Mon Apr 17 10:25:06.924289 2017] [cgi:error] [pid 55993] [client 169.230.21.23:40643] malformed header from script 'example.cgi': Bad header: 
[Mon Apr 17 10:25:06.925424 2017] [cgi:error] [pid 55993] [client 169.230.21.23:40643] AH01215:   File "/home/socr/b/bmi219/current/z/conrad/cgi-bin/example.cg ", line 7, in

Make sure your script output starts with HTTP headers.
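
A minimal sketch of well-formed CGI output, written for Python 3 (the scripts in these slides use Python 2). The helper name `build_response` is hypothetical; the only point is the order: header lines first, then one blank line, then the body.

```python
import sys

def build_response(body, content_type="text/plain"):
    """Return a complete CGI response: headers, a blank line, then the body."""
    headers = "Content-Type: %s\n" % content_type
    # The empty line tells the server where the headers end.
    return headers + "\n" + body

if __name__ == "__main__":
    # The web server reads the response from the script's standard output.
    sys.stdout.write(build_response("This is content\n"))
```

If anything is printed before the `Content-Type` line, the server cannot parse the headers, which is one way to get errors like those in the log above.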

Debugging Hack

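#!/usr/bin/python2

# Send a formatted traceback to the browser if the script raises an exception.
import cgitb; cgitb.enable()
import sys, StringIO
# Buffer all output so nothing is sent unless the script runs to completion.
orig_stdout = sys.stdout
sys.stdout = StringIO.StringIO()

print "Content-Type: text/xml"
print
print ""
print ""
# Deliberate error to show what happens when the script fails part-way.
raise ValueError("bad value")
print "This is content"
print ""
print ""

# Reached only if no exception was raised: emit the buffered output.
orig_stdout.write(sys.stdout.getvalue())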

Testing, Debugging and Optimization

Conrad Huang

April 17, 2017

Introduction

The more you invest in quality, the less time it takes to develop working software [Glass 2002]
Quality is not just testing
- “Trying to improve the quality of software by doing more testing is like trying to lose weight by weighing yourself more often.” (Steve McConnell)
Quality is:
- Designed in
- Monitored and maintained through the whole software lifecycle
This lecture looks at basic things every developer can do to maintain quality
- See [Whittaker 2003] for more information

Why Do We Test?

To make sure our programs are "working right"
To make sure they satisfy requirements
To make sure they match specifications

Test-Driven Design

Tests are actually specifications
- “Given these inputs, this code should behave the following way”
So write the tests first, then the application code
- Test-driven development (TDD)

Sounds backward, but:
- A great way to clarify specifications
  - I write the tests
  - “All” you have to do is write code that passes those tests
- Gives programmers a definite goal
  - Coding is finished when all tests pass
  - Particularly useful when trying to fix bugs in old code, as it forces you to figure out how to re-create the bug
  - Helps prevent the “one more feature” syndrome
- Ensures that tests actually get written
  - People are often too tired, or too rushed, to test after coding
- Helps clarify the Application Programming Interface (API) before it is set in stone
  - If something is awkward to test, it can be redesigned before it's written

TDD Example

I want you to write a function that calculates a running sum of the values in a list
- Doesn't specify whether to create a new list, or overwrite the input
- Doesn't specify how to handle errors

You'd probably prefer something like this:

Tests = [
    [[],        [],          'empty list'],
    [[1],       [1],         'single value'],
    [[1, 3],    [1, 4],      'two values'],
    [[1, 3, 7], [1, 4, 11],  'three values'],
    [[-1, 1],   [-1, 0],     'negative values'],
    [[1, 3.0],  [1, 4.0],    'mixed types'],
    ["string",  ValueError,  'non-list input'],
    [['a'],     ValueError,  'non-numeric value']
    ]

If the expected result is an exception, pass only if that exception is raised
If the test doesn't pass, print the comment so that the programmer knows what to look at
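
A sketch of how such a table could be run, in Python 3 syntax. The names `running_sum` and `run_tests` are hypothetical, and the implementation is only one reading of the table: it returns a new list and raises `ValueError` for bad input.

```python
def running_sum(values):
    """Return a new list of cumulative sums; raise ValueError on bad input."""
    if not isinstance(values, list):
        raise ValueError("input must be a list")
    result = []
    total = 0
    for v in values:
        if not isinstance(v, (int, float)):
            raise ValueError("non-numeric value: %r" % (v,))
        total += v
        result.append(total)
    return result

Tests = [
    [[],        [],          'empty list'],
    [[1],       [1],         'single value'],
    [[1, 3],    [1, 4],      'two values'],
    [[1, 3, 7], [1, 4, 11],  'three values'],
    [[-1, 1],   [-1, 0],     'negative values'],
    [[1, 3.0],  [1, 4.0],    'mixed types'],
    ["string",  ValueError,  'non-list input'],
    [['a'],     ValueError,  'non-numeric value'],
]

def run_tests(func, tests):
    """Return the comments of all tests that did not pass."""
    failed = []
    for fixture, expected, comment in tests:
        expects_exception = isinstance(expected, type)
        try:
            actual = func(fixture)
        except Exception as e:
            # Pass only if this is the exception the table asked for.
            ok = expects_exception and isinstance(e, expected)
        else:
            ok = (not expects_exception) and actual == expected
        if not ok:
            failed.append(comment)
    return failed

for comment in run_tests(running_sum, Tests):
    print("FAILED:", comment)
```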

Limits to Testing

Suppose you have a function that compares two 7-digit phone numbers, and returns True if the first is greater than the second
- (10⁷)² possible inputs
- At ten million tests per second, that's about 116 days
If they're 7-character alphabetic strings (26 letters), there are (26⁷)² possible inputs, and it's roughly 200,000 years
- Then you move on to the second function…
And how do you know that your tests are correct?
All a test can do is show that there may be a bug
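
The arithmetic behind these estimates can be checked with a few lines of Python 3. This is a sketch that assumes 26 possible letters per character and a 365-day year; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY
TESTS_PER_SECOND = 10 ** 7

def exhaustive_test_seconds(values_per_argument):
    """Seconds needed to try every pair of arguments exactly once."""
    return values_per_argument ** 2 / float(TESTS_PER_SECOND)

# Two 7-digit numbers: (10**7)**2 pairs, roughly 116 days.
phone_days = exhaustive_test_seconds(10 ** 7) / SECONDS_PER_DAY

# Two 7-letter strings: (26**7)**2 pairs, on the order of 200,000 years.
string_years = exhaustive_test_seconds(26 ** 7) / SECONDS_PER_YEAR

print("7-digit numbers:  %.0f days" % phone_days)
print("7-letter strings: %.0f years" % string_years)
```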

Terminology

A unit test exercises one component in isolation
- Developer-oriented: tests the program's internals
An integration test exercises the whole system
- User-oriented: tests the software's overall behavior
Regression testing is the practice of rerunning tests to check that the code still works
- i.e., make sure that today's changes haven't broken things that were working yesterday
- Programs that don't have regression tests are difficult (sometimes impossible) to maintain [Feathers 2005]

Test Results and Specifications

Any test can have one of three outcomes:
- Pass: the actual outcome matches the expected outcome
- Fail: the actual outcome is different from what was expected
- Error: something went wrong inside the test (i.e., the test contains a bug)
  - Don't know anything about the system being tested
A specification is something that tells you how to classify a test's result
- You can't test without some sort of specification

Structuring Tests

How to write tests so that:
- It's easy to add or change tests
- It's easy to see what's been tested, and what hasn't
A test consists of a fixture, an action, and an expected result
- A fixture is something that a test is run on
- Can be as simple as a single value, or as complex as a networked database

Every test should be independent
- I.e., the outcome of one test shouldn't depend on what happened in another test
- Otherwise, faults in early tests can distort the results of later ones
So each test:
- Creates a fresh instance of the fixture
- Performs the operation
- Checks and records the result
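
A minimal sketch of this structure in Python 3. The fixture and test names are hypothetical; the point is that each test builds its own copy of the fixture, so the order in which the tests run does not matter.

```python
def make_fixture():
    """Build a fresh fixture so that no test sees another test's changes."""
    return [3, 1, 2]

def test_sort():
    data = make_fixture()          # fresh fixture
    data.sort()                    # action
    return data == [1, 2, 3]       # check against the expected result

def test_append():
    data = make_fixture()          # unaffected by the sort in test_sort
    data.append(4)
    return data == [3, 1, 2, 4]

# Run every test and record its result by name.
results = {}
for test in (test_sort, test_append):
    results[test.__name__] = test()
print(results)
```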

A Simple Example

Test string.startswith
- Specification: returns True if the string starts with the given prefix, and False otherwise
- Hm… What if the prefix is the empty string?

Store the tests in a table

Easy to read and add to

Tests = [
# String  Prefix  Expected
['a',     'a',    True],
['a',     'b',    False],
['abc',   'a',    True],
['abc',   'ab',   True],
['abc',   'abc',  True],
['abc',   'abcd', False],
['abc',   '',     True]
]

The string and the prefix are the fixture

Now run them

passes = 0
failures = 0
for (s, p, expected) in Tests:
    actual = s.startswith(p)
    if actual == expected:
        passes += 1
    else:
        failures += 1
print 'passed', passes, 'out of', passes+failures, 'tests'

Hm… Where's the code to handle and report errors in the tests themselves?

Catching Errors

Python uses exceptions for error handling
- Separates normal operation from error handling
- Makes both easier to read
Structured like if/else
- Code for healthy case goes in a try block
- Error handling code goes in a matching except block
When something goes wrong in the try block, Python raises an exception
- This is caught by the matching except
- Python then executes the code in the exception handler
Can add an optional else block
- Executed when things don't go wrong inside the try block

Simple Exception Example

Try dividing by zero and some non-zero values:

for num in [-1, 0, 1]:
    try:
        inverse = 1/num
    except:
        print 'inverting', num, 'caused error'
    else:
        print 'inverse of', num, 'is', inverse

inverse of -1 is -1
inverting 0 caused error
inverse of 1 is 1

Flow of Control in Try/Except/Else
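
The order in which the blocks run can be traced with a short sketch (Python 3 syntax; the function name `trace` is hypothetical). It records each step in a list for one input that succeeds and one that fails.

```python
def trace(num):
    """Return the names of the blocks executed while inverting num."""
    steps = ["before try"]
    try:
        steps.append("try: start")
        inverse = 1.0 / num            # raises ZeroDivisionError when num == 0
        steps.append("try: end")
    except ZeroDivisionError:
        steps.append("except")         # runs only if the try block failed
    else:
        steps.append("else")           # runs only if the try block succeeded
    steps.append("after")
    return steps

print(trace(2))   # ['before try', 'try: start', 'try: end', 'else', 'after']
print(trace(0))   # ['before try', 'try: start', 'except', 'after']
```

When the exception is raised, the rest of the try block is skipped, which is why `'try: end'` is missing from the second trace.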

Exception Objects

When Python raises an exception, it creates an object to hold information about what went wrong
- Typically contains an error message

Can choose which errors to handle by specifying an exception type in the except statement

E.g., handle division by zero, but not out-of-bounds list index

# Note: mix of numeric and non-numeric values.
values = [0, 1, 'momentum']

# Note: top index will be out of bounds.
for i in range(4):
    try:
        print 'dividing by value', i
        x = 1.0 / values[i]
        print 'result is', x
    except ZeroDivisionError, e:
        print 'divide by zero:', e
    except IndexError, e:
        print 'index error:', e
    except:
        # Note: e here is left over from an earlier handler.
        print 'some other error:', e

dividing by value 0
divide by zero: float division
dividing by value 1
result is 1.0
dividing by value 2
some other error: float division
dividing by value 3
index error: list index out of range

The except blocks are tested in order—whichever matches first, wins
- If a “naked” except appears, it must come last (since it catches everything)
- Generally better to use except Exception as e so that you have the exception object

Exception Hierarchy

Exceptions are organized in a hierarchy
- e.g., ZeroDivisionError, OverflowError, and FloatingPointError are all types of ArithmeticError
- A handler for the general type catches all its specific sub-types

Name                        Purpose
`Exception`                 Root of exception hierarchy
  `ArithmeticError`         Illegal arithmetic operation
    `FloatingPointError`    Generic error in floating point calculation
    `OverflowError`         Result too large to represent
    `ZeroDivisionError`     Attempt to divide by zero
  `IndexError`              Bad index to sequence (out of bounds or illegal type)
  `TypeError`               Illegal type (e.g., trying to add integer and string)
  `ValueError`              Illegal value (e.g., `math.sqrt(-1)`)
  `EnvironmentError`        Error interacting with the outside world
    `IOError`               Unable to create or open file, read data, etc.
    `OSError`               No permissions, no such device, etc.
Table 11.1: Common Exception Types in Python
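
A short sketch (Python 3 syntax, with a hypothetical helper `classify`) showing that a handler for a general type also catches its sub-types:

```python
import math

def classify(func):
    """Call func and report which handler, if any, caught its exception."""
    try:
        func()
    except ArithmeticError as e:
        # Catches ZeroDivisionError, OverflowError, and FloatingPointError.
        return "arithmetic: " + type(e).__name__
    except Exception as e:
        return "other: " + type(e).__name__
    return "ok"

print(issubclass(ZeroDivisionError, ArithmeticError))   # True
print(classify(lambda: 1 / 0))             # arithmetic: ZeroDivisionError
print(classify(lambda: math.exp(1000)))    # arithmetic: OverflowError
print(classify(lambda: [][0]))             # other: IndexError
print(classify(lambda: 1 + 1))             # ok
```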

Functions and Exceptions

Stacking Exception Handlers

Each time Python enters a try/except block, it pushes the except handlers on a stack
- Just like the function call stack

When an exception is raised, Python searches this stack for the top-most matching handler

Often means jumping out of the middle of a function

def invert(vals, index):
    try:
        vals[index] = 10.0/vals[index]
    except ArithmeticError, e:
        print 'inner exception handler:', e

def each(vals, indices):
    try:
        for i in indices:
            invert(vals, i)
    except IndexError, e:
        print 'outer exception handler:', e

# Once again, the top index will be out of bounds.
values = [-1, 0, 1]
print 'values before:', values
each(values, range(4))
print 'values after:', values

values before: [-1, 0, 1]
inner exception handler: float division
outer exception handler: list index out of range
values after: [-10.0, 0, 10.0]

Raising Exceptions

Use raise to trigger exception processing

Specify the type of exception you're raising using raise Exception('this is an error message')
Please make your error messages more informative…

for i in range(4):
    try:
        if (i % 2) == 1:
            raise ValueError('index is odd')
        else:
            print 'not raising exception for %d' % i
    except ValueError, e:
        print 'caught exception for %d' % i, e

not raising exception for 0
caught exception for 1 index is odd
not raising exception for 2
caught exception for 3 index is odd

Exceptional Style

Always use exceptions to report errors instead of returning None, -1, False, or some other value
- Allows callers to separate normal code from error handling
- And sooner or later, your function will probably actually want to return that “special” value
- Note: Python's own str.find breaks this rule
  - Returns -1 if something can't be found
Throw low, catch high
- i.e., throw lots of very specific exceptions…
- …but only catch them where you can actually take corrective action
- Because every application handles errors differently
  - If someone is using your library in a GUI, you don't want to be printing to stderr
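
A small sketch (Python 3 syntax, hypothetical function names) of why a "special" return value is risky. `str.find` returns -1 when nothing is found, and -1 is also a valid index; `str.index` raises `ValueError` instead.

```python
def char_at_find(text, sub):
    """Uses the sentinel-returning call without checking for -1."""
    return text[text.find(sub)]

def char_at_index(text, sub):
    """Uses the exception-raising call; a missing substring cannot go unnoticed."""
    return text[text.index(sub)]

print(char_at_find("debugging", "x"))    # 'g': silently wrong, since text[-1] is the last character

try:
    char_at_index("debugging", "x")
except ValueError as e:
    print("not found:", e)
```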

Handling Errors in Tests

Now we know how to check for errors in tests

Wrap the test in try/except

Tests = [
    ['a',     'a',    False],    # wrong expected value
    ['a',     1,      False],    # wrong type
    ['abc',   'a',    True]      # everything legal
]

passes = failures = errors = 0
for (s, p, expected) in Tests:
    try:
        actual = s.startswith(p)
        if actual == expected:
            passes += 1
        else:
            failures += 1
    except:
        errors += 1

print 'tests:', passes + failures + errors
print 'passes:', passes
print 'failures:', failures
print 'errors:', errors

tests: 3
passes: 1
failures: 1
errors: 1

Note the deliberate errors in the test cases to exercise the testing code

Python Test Frameworks

There are many pre-built Python test frameworks: unittest, py.test, doctest, nose, etc.
unittest comes standard and is easy to use
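
As a sketch of what this looks like, here is the earlier startswith table rewritten for unittest (Python 3 syntax; the class and method names are hypothetical). The framework does the pass/fail/error bookkeeping.

```python
import unittest

class TestStartswith(unittest.TestCase):

    CASES = [
        # String  Prefix  Expected
        ['a',     'a',    True],
        ['a',     'b',    False],
        ['abc',   'ab',   True],
        ['abc',   'abcd', False],
        ['abc',   '',     True],
    ]

    def test_table(self):
        for s, p, expected in self.CASES:
            # The message identifies the failing row.
            self.assertEqual(s.startswith(p), expected,
                             "%r.startswith(%r)" % (s, p))

    def test_wrong_type(self):
        # A non-string prefix is expected to raise TypeError.
        with self.assertRaises(TypeError):
            'a'.startswith(1)

# Load and run the tests explicitly so that the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStartswith)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("run:", result.testsRun, "failures:", len(result.failures),
      "errors:", len(result.errors))
```

In a stand-alone test script, the last lines are usually replaced by a call to `unittest.main()`.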

Test Web Output

#!/usr/bin/python2

import base64
import getpass
import unittest
import urllib2

class TestExample(unittest.TestCase):

    def setUp(self):
        # Build an HTTP Basic Authorization header from the user's login.
        passwd = getpass.getpass()
        token = base64.encodestring("%s:%s" % (getpass.getuser(), passwd))
        self.headers = {"Authorization": "Basic %s" % token.replace('\n', '')}

    def test_example(self):
        url = "http://bmi219.rbvi.ucsf.edu/z/conrad/cgi-bin/example.cgi"
        data = None
        req = urllib2.Request(url, data, self.headers)
        f = urllib2.urlopen(req)
        data = f.read()
        f.close()
        # The request must succeed and the page must not contain a traceback.
        self.assertEqual(f.getcode(), 200)
        self.assertNotIn("Traceback", data)

if __name__ == "__main__":
    unittest.main()

Design by Contract

Functions ought to carry their specifications around with them
- Keeping specification and implementation together makes both easier to understand
- And improves the odds that programmers will keep them in sync
A function is defined by:
- Its pre-conditions: what must be true in order for the function to work correctly
- Its post-conditions: what the function guarantees will be true if its pre-conditions are met
- May also have invariants: things that are true throughout the execution of the function

Leads to a style of programming called design by contract
Pre- and post-conditions constrain how the function can evolve
- Can only ever relax pre-conditions (i.e., take a wider range of input)…
- …or tighten post-conditions (i.e., produce a narrower range of output)
- Tightening pre-conditions, or relaxing post-conditions, would violate the function's contract with its callers

Assertions

Normally specify pre- and post-conditions using assertions
- A statement that something is true at a particular point in a program
- If the assertion's condition is not met, Python raises an AssertionError exception

For example:

Pre-condition: input argument is a non-empty list
Post-condition: two values from the list such that the first is less than or equal to the second

def find_range(values):
    '''Find the non-empty range of values in the input sequence.'''
    assert (type(values) is list) and (len(values) > 0)
    left = min(values)
    right = max(values)
    assert (left in values) and (right in values) and (left <= right)
    return left, right

Note that the post-condition isn't as exacting as it should be
- Doesn't check that left is less than or equal to all other values, or that right is greater than or equal to all other values
- The code to check the condition exactly is as likely to contain errors as the function itself
- Which is one of the reasons design by contract isn't as popular as it might be
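
For comparison, a sketch of what a stricter post-condition might look like (Python 3 syntax). This is one possible formulation, not the only one. Note that the check walks the whole list a second time, duplicating the work of `min` and `max`.

```python
def find_range(values):
    '''Find the non-empty range of values in the input sequence.'''
    # Pre-condition: a non-empty list.
    assert isinstance(values, list) and len(values) > 0
    left = min(values)
    right = max(values)
    # Post-condition: both results come from the list...
    assert left in values and right in values
    # ...and every value lies between them.
    assert all(left <= v <= right for v in values)
    return left, right

print(find_range([3, 1, 2]))    # (1, 3)
```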

Defensive Programming

You can (and should) test for errors liberally
- Even if you don't practice design by contract
- Use assert or custom code that provides more information to help identify detected problems
Defensive programming is like defensive driving
- Program as if the rest of the world is out to get you
- “Fail early, fail often”
  - The less distance there is between the error and you detecting it, the easier it will be to find and fix

Good practice: every time you fix a bug, put in an error test and a comment

Because if you made the error, the right code can't be obvious
And you should protect yourself against someone “simplifying” the bug back in

def can_transmute(element):
  '''Can this element be turned into gold?'''

  # Bug #172: make sure the input is actually an element.
  assert is_valid_element(element)

  # Gold is trivial.
  if element is Gold:
      return True

  # Trans-uranic metals and halogens are impossible.
  if (element.atomic_number > Uranium.atomic_number) or \
     (element in Halogens):
      return False

  # Look for a sequence of steps that leads to gold.
  steps = search_transmutations(element, Gold)
  if steps == []:
      return False
  else:
      # Bug #201: must be at least two elements in sequence.
      assert len(steps) >= 2
      return True

Summary

The real goal of “quality assurance” isn't to find bugs: it's to figure out where they're coming from, so that they can be prevented
But without testing, no one (including you) has any right to rely on the program's output
Just because a program passes all the tests does not mean it is of high quality
Only way to ensure quality is to design it in

Questions?

Debugging

You're going to spend half your professional life debugging
- So you should learn how to do it systematically
Talk about some simple rules
Then two common debugging tools

Agans' Rules

Many people make debugging harder than it needs to be by:
- Not going about it systematically
- Becoming impatient
- Using inadequate tools
Agans' Rules [Agans 2002] describe how to apply the scientific method to debugging
- Observe a failure
- Invent a hypothesis explaining the cause
- Test the hypothesis by running an experiment (i.e., a test)
- Repeat until the bug has been found

Rule 0: Get It Right the First Time

The simplest bugs to fix are the ones that don't exist
Design, reflect, discuss, then code
- “A week of hard work can sometimes save you an hour of thought.”
Design and build your code with testing and debugging in mind
- Minimize the amount of “spooky action at a distance”
- Minimize the number of things programmers have to keep track of at any one time
- Train yourself to do things right, so that you'll code well even when you're tired, stressed, and facing a deadline

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?” (Brian Kernighan)

Rule 1: What Is It Supposed to Do?

First step is knowing what the problem is
- “It doesn't work” isn't good enough
- What exactly is going wrong?
- How do you know?
- You will learn a lot by following execution in a debugger and trying to anticipate what the program is going to do next
Requires you to know how the software is supposed to behave
- Is this case covered by the specification?
- If not:
  - Do you have enough knowledge to extrapolate?
  - Do you have the right to do so?
Try not to let what you want to see influence what you actually observe
- It's harder than you'd think [Hock 2004]

Rule 2: Is It Plugged In?

Are you actually exercising the problem that you think you are?
- Are you giving it the right test data?
- Is it configured the way you think it is?
- Is it the version you think it is?
- Has the feature actually been implemented yet?
- Why are you sure?
  - Maybe the reason you can't isolate the problem is that it's not there (I wouldn't use this one too often)
Another argument in favor of automatic regression tests
- Guaranteed to rerun the test the same way each time

Rule 3: Make It Fail

You can only debug things when they go wrong
So find a test case that makes the code fail every time
- Then try to find a simpler one
- Or start with a trivially simple test case that passes, then add complexity until it fails
Each experiment becomes a test case
- So that you can re-run all of them with a single command
- How else are you going to know that the bug has actually been fixed?
Use the scientific method
- Formulate a hypothesis, make a prediction, conduct an experiment, repeat
- Remember, it's computer science, not computer flip-a-coin

Alternatives

What if you can't make it fail reliably?
- Problem involves timing, network load, etc.
- Or you just don't know enough about the cause yet
Use post-mortem inspection
- But then you have to reason backwards to figure out why the program crashed
Or logging
- But this can distort the program's behavior
- And you'll have to wade through a lot of irrelevant information
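
A minimal logging sketch using Python's standard `logging` module (Python 3 syntax). The logger name and the example function are hypothetical, and the messages go to an in-memory buffer only to keep the sketch self-contained; a real program would more likely log to a file.

```python
import io
import logging

# Collect log records in memory for this sketch.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(funcName)s: %(message)s"))

log = logging.getLogger("running_sum_demo")
log.setLevel(logging.DEBUG)      # lower this to WARNING to silence the debug output
log.addHandler(handler)
log.propagate = False            # keep records out of any other configured handlers

def running_sum(values):
    result = []
    total = 0
    for v in values:
        total += v
        # Record intermediate state without changing the function's result.
        log.debug("added %r, total is now %r", v, total)
        result.append(total)
    return result

running_sum([1, 3, 7])
print(stream.getvalue())
```

Because the level can be changed in one place, the logging calls can stay in the code after the bug is fixed.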

Rule 4: Divide and Conquer

The smaller the gap between cause and effect, the easier the relationship is to see
So once you have a test that makes the system fail, use it to isolate the faulty subsystem
- Examine the input of the code that's failing
- If that's wrong, look at the preceding code's input, and so on
Use assert to check things that ought to be right
- “Fail early, fail often”
- A good way to stop yourself from introducing new bugs as you fix old ones

When you do fix the bug, see whether you can add assertions to prevent it reappearing
- If you made the mistake once, odds are that you, or someone, will make it again
Another argument against duplicated code
- Few things are as frustrating as fixing a bug, only to have it crop up again elsewhere

Rule 5: One Change at a Time, For a Reason

Replacing random chunks of code is unlikely to do much good
- If you got it wrong the first time, what makes you think you'll get it right the second? Or the ninth?
- So always have a hypothesis before making a change
Every time you make a change, re-run all of your tests immediately
- The more things you change at once, the harder it is to know what's responsible for what
- And the harder it is to keep track of what you've done, and what effect it had
- Changes can also often uncover (or introduce) new bugs

Rule 6: Write It Down

Science works because scientists keep records
- “Did left followed by right with an odd number of lines cause the crash? Or was it right followed by left? Or was I using an even number of lines?”
Records are particularly useful when getting help
- People are more likely to listen when you can explain clearly what you did

Rule 7: Be Humble

If you can't find it in 15 minutes, ask for help
- Just explaining the problem aloud is often enough
- “Never debug standing up.” (Gerald Weinberg)
Don't keep telling yourself why it should work: if it doesn't, it doesn't
- Never debug while grinding your teeth, either…
Keep track of your mistakes
- Just as runners keep track of their time for the 100 meter sprint
- “You cannot manage what you cannot measure.” (Bill Hewlett)
And read [Zeller 2006] to learn more

Debugging Tools

Print statement
- Easy to use, but…
Symbolic debugger
- Very powerful, but…

What's Wrong with Print Statements

Many people still debug by adding print statements to their programs
It's error-prone
- Adding print statements is a good way to add typos
- Particularly when you have to modify the block structure of your program
And time-consuming
- All that typing…
- And (if you're using Java, C++, or Fortran) all that recompiling…
And can be misleading
- Moves things around in memory, changes execution timing, etc.
- Common for bugs to hide when print statements are added, and reappear when they're removed

But print statements can be extremely effective
- May be added using the same tools as programming
- Are less likely to hide bugs in interpreted languages
- Can collect lots of data in a single run

Symbolic Debuggers

A debugger is a program that runs another program on your behalf
- Sometimes called a symbolic debugger because it shows you the source code you wrote, rather than raw machine code
While the target program (or debuggee) is running, the debugger can:
- Pause, resume, or restart the target
- Display or change values
- Watch for calls to particular functions, changes to particular variables, etc.

Do not need to modify the source of the target program!
- Depending on your language, you may need to compile it with different flags
And yes, the debugger modifies the target's layout in memory, and execution speed…
- …but a lot less than print statements… (maybe)
- …with a lot less effort from you
But you need to invest the time to learn to use it well

Debugging Summary

Debugging is not a black art
Like medical diagnosis, it's a skill that can be studied and improved
You're going to spend a lot of time doing it: you might as well learn how to do it well

Questions?

Optimization

Does my program work properly?
- Think about optimization during design
- Get your program to work before optimizing
How do I make my program run faster?
- Where is my program spending all its time?
- What can I do about it?
- Is it worth your time?
First rule of optimization: “Measure, measure, measure.”
- Don't guess!
- Performance bottlenecks are often in unexpected parts of the code
- It's not just how slow a particular function is, but also how many times that function is called
- If you speed up code that takes 10% of run time by a factor of ten, total run time drops by 9%; if you speed up code that takes 50% of run time by a factor of two, total run time drops by 25%
- Moral: optimize the right section of code
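
The two figures can be reproduced with a one-line formula (Python 3 syntax; the function name is hypothetical). The untouched part of the program still takes the same time, and only the optimized fraction shrinks.

```python
def new_run_time(fraction, speedup):
    """Run time after the change, as a fraction of the original run time."""
    return (1.0 - fraction) + fraction / speedup

for fraction, speedup in [(0.10, 10), (0.50, 2)]:
    saved = 1.0 - new_run_time(fraction, speedup)
    print("%.0f%% of run time made %dx faster: total run time drops by %.0f%%"
          % (fraction * 100, speedup, saved * 100))
```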

Execution Profile

The execution profile of a program is a description of its run-time behavior
- Different inputs generate different profiles
- Sections of code that consume more computation time than others are known as “hot spots”
Use a profiler to identify hot spots for optimization
- A profiler collects statistics on the execution profiles, e.g.,
  - counts the number of times a function is called
  - tracks how long the calls take
- Data collection makes profiled program run slower than normal, sometimes a lot slower
Python has built-in profilers, e.g., profile and cProfile; line_profiler is a third-party add-on
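
A minimal cProfile sketch (Python 3 syntax). The two functions being profiled are made up for illustration; the report lists, for each function, how many times it was called and how long the calls took.

```python
import cProfile
import io
import pstats

def square(x):
    return x * x

def slow_square_sum(n):
    """Calls square() once per value, so square() dominates the call counts."""
    total = 0
    for i in range(n):
        total += square(i)
    return total

# Collect statistics only while the code of interest is running.
profiler = cProfile.Profile()
profiler.enable()
answer = slow_square_sum(100000)
profiler.disable()

# Format the statistics, sorted by cumulative time, into a text report.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

A whole script can also be profiled from the command line with `python -m cProfile script.py`.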

Speeding Things Up

Replace hot spots with faster code

If a collection of data will be searched repeatedly, instead of a linear list, use a sorted list or a dictionary
There is usually a tradeoff between bookkeeping overhead and search speed when using “faster” data structures, e.g.,
- Keeping a list sorted
- Managing a dictionary
Which data structure is “best” depends on the data
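
A sketch comparing the three approaches on the same data (Python 3 syntax; the data and function names are hypothetical). Actual timings depend on the machine and on the data, so measure with your own inputs.

```python
import bisect
import timeit

values = list(range(0, 20000, 2))      # sorted list of even numbers
lookup = dict.fromkeys(values, True)   # the same keys stored in a dictionary
target = 19999                         # absent, so the linear scan checks every element

def in_list(x):
    """Linear search: examines elements one at a time."""
    return x in values

def in_sorted_list(x):
    """Binary search: relies on the list being kept sorted."""
    i = bisect.bisect_left(values, x)
    return i < len(values) and values[i] == x

def in_dict(x):
    """Hash lookup: costs extra memory for the dictionary."""
    return x in lookup

for func in (in_list, in_sorted_list, in_dict):
    seconds = timeit.timeit(lambda: func(target), number=200)
    print("%-15s %.6f seconds for 200 lookups" % (func.__name__, seconds))
```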

Restructure the entire program
- Take a different approach to solving your problem
- Throw hardware at “embarrassingly parallel” problems
  - If you have access to a computing cluster, and the problem can be partitioned into multiple jobs easily, use one CPU per job to improve performance
  - Embarrassingly parallel tasks include:
    - Processing multiple independent data sets
    - Repeating simulations with different initial conditions
  - Write a shell script to:
    - Partition big job into a bunch of little jobs
    - Run the jobs, either directly or by submitting them to a batch job queue
    - Wait for all jobs to complete
    - Collate results
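
The slide suggests a shell script and a cluster; the same four steps can be sketched on a single multi-core machine in Python 3 with the standard `concurrent.futures` module. This is a stand-in for a batch queue, and all function names and the toy job are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def partition(items, n_jobs):
    """Step 1: split the big job into n_jobs roughly equal little jobs."""
    return [items[i::n_jobs] for i in range(n_jobs)]

def run_job(chunk):
    """One independent little job: here, a sum of squares."""
    return sum(x * x for x in chunk)

def collate(partial_results):
    """Step 4: combine the results of the little jobs."""
    return sum(partial_results)

def run_all(items, n_jobs=4):
    chunks = partition(items, n_jobs)
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        # Steps 2 and 3: run one job per worker process and wait for all of them.
        partials = list(pool.map(run_job, chunks))
    return collate(partials)

if __name__ == "__main__":
    print(run_all(list(range(1000))))
```

This only works because the jobs share no state; if one job needed another's output, the problem would not be embarrassingly parallel.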

Optimization Summary

Measure, Measure, Measure
Use efficient data structures when possible
Take advantage of multiple processors