Friday, July 28, 2006

The 0.01% Solution

In an earlier post, The Tale of the Bathtub Drain, I talked about the difficulty of extracting the edge or fringe conditions.  The input consists of case alpha 99.99% of the time and case beta 0.01% of the time.  The problem is that each of these cases is probably going to take the same amount of effort to document, design, implement, test, and so on.  It doesn’t seem quite fair.  In fact, the effort to handle case beta is probably going to be even more difficult because it is obscure.  There probably are not very many people who fully understand case beta, and it will take you longer to get the full and complete picture.  In fact, the “powers that be” might not even be aware that this alternative processing path exists, and might well regard the effort to document and handle it as a distraction from the real work at hand.

In my mind, that is a very strong motivation for getting a partial solution in place as quickly as possible.  Having the system reject valid data provides the concrete motivation to understand and document how to process that data.  Developing this motivation early in the project lifecycle as opposed to at the very end is critical to developing a quality application.

I’ve been giving a lot of thought to how applications ought to be designed to handle these kinds of situations.  It seems to me that applications oftentimes work by applying a set of rules to the input to decide what kind of processing each item receives.  Oftentimes an application takes a binary approach to its input: either the input is “good” or it is “bad”.  The good input is processed and sent on its way; the bad input is rejected.

I once worked on an application that accepted data from a wide variety of sources.  This data would be dumped into a staging area, which was just a set of tables in the database with very relaxed requirements for format and content.  Periodically a set of automated processes would examine the data in the staging area and pull the data that passed the validation criteria into the mainstream of the processing.  The data that didn’t pass the validation requirements would stay in the staging area.  From time to time, human beings would bring up an application that allowed the “adult leadership” to review the data in the staging area and take appropriate action.  Most of the time they updated the input to correct the problems; the next time the automated processing ran, the data would be picked up and sent along its way.  Another significant part of the “appropriate action” was to formulate new rules for processing or to alter existing processing rules.
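The staging-area pattern is simple enough to sketch in a few lines.  This is just an illustration, not that system’s actual code; the row fields, the validation rule, and the function names are all invented for the example:

```python
# Sketch of the staging-area pattern: rows with relaxed requirements sit
# in staging until they pass validation, then move into the mainstream.
staging = [
    {"id": 1, "amount": "100.00"},  # valid: will be pulled on the first pass
    {"id": 2, "amount": "ten"},     # invalid: stays behind for human review
]
mainstream = []

def is_valid(row: dict) -> bool:
    # A stand-in validation rule: the amount must parse as a number.
    try:
        float(row["amount"])
        return True
    except ValueError:
        return False

def run_automated_pass() -> None:
    # Pull rows that pass validation into the mainstream; leave the rest.
    for row in list(staging):
        if is_valid(row):
            staging.remove(row)
            mainstream.append(row)

run_automated_pass()
# Later, the "adult leadership" corrects the bad row in place...
staging[0]["amount"] = "10.00"
run_automated_pass()  # ...and the next automated pass picks it up
```

The key property is that rejection is not final: bad rows wait where a human can fix them, and the same automated pass sweeps them up afterward.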

A very long time ago I took Latin in high school.  One of the works that I read in Latin was by Julius Caesar.  It began by saying “All Gaul is divided into three parts.”  I think applications should divide their input into three parts: data that is clearly broken because values are missing or mangled; data that is complete and the application knows how to process; and data that appears to be complete but the application does not know how to process explicitly.
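Those three parts can be captured in a small classifier.  This is only a sketch of the idea; the field names and the rule table are hypothetical:

```python
from enum import Enum

class Disposition(Enum):
    BROKEN = "broken"    # values missing or mangled; needs correction
    KNOWN = "known"      # complete, and a rule covers it; process automatically
    UNKNOWN = "unknown"  # complete, but no rule applies; suspend for review

# Hypothetical rule table: the inputs the application knows how to handle.
KNOWN_PRODUCTS = {"alpha"}

def classify(order: dict) -> Disposition:
    # Broken: required fields are missing or empty.
    if not order.get("product") or order.get("quantity") is None:
        return Disposition.BROKEN
    # Known: an explicit processing rule exists.
    if order["product"] in KNOWN_PRODUCTS:
        return Disposition.KNOWN
    # Complete data, but nothing tells the application what to do with it.
    return Disposition.UNKNOWN
```

The third bucket is the interesting one: it separates “we can’t read this” from “we can read this but don’t yet know what it means,” which a good/bad binary collapses into a single reject pile.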

Let’s do an example. Suppose that alpha and beta are products that we want to compute prices for.  When an order comes in for alpha, the application can perform the pricing automatically.  When an order comes in for beta, we mark the order for the attention of the “adult leadership”.  Every so often the “adult leadership” accesses the suspended data and performs the pricing computation by hand.  In an agile project with multiple releases of the application, each successive release would be able to handle more of the input automatically.  The goal would be to reduce the “pile” of exceptional input to as close to zero as was economically feasible.  It may well be the case that the exceptional pile never becomes completely empty.  Some special cases might be so special and so infrequent that they do not warrant automated processing.
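The suspend-and-retry loop across releases can be sketched as well.  Again, everything here is invented for illustration, including the prices and the rule table:

```python
# Hypothetical pricing rules; each release adds entries, shrinking the
# pile of suspended orders over time.
PRICING_RULES = {"alpha": lambda qty: 10.00 * qty}

processed = []  # orders priced automatically
suspended = []  # orders awaiting the "adult leadership"

def handle(order: dict) -> None:
    rule = PRICING_RULES.get(order["product"])
    if rule is None:
        suspended.append(order)  # no rule yet: park it for manual review
    else:
        order["price"] = rule(order["quantity"])
        processed.append(order)

handle({"product": "alpha", "quantity": 3})  # priced automatically
handle({"product": "beta", "quantity": 1})   # suspended for review

# A later release adds a rule for beta, then re-runs the suspended pile.
PRICING_RULES["beta"] = lambda qty: 25.00 * qty
for order in list(suspended):
    suspended.remove(order)
    handle(order)
```

Nothing about the input changed between releases; only the rule table grew, and the same `handle` function drained the pile.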

I think there is another interesting aspect of this approach.  The data processed by the system represents a rich source of requirements.  That is, we know what kinds of inputs exist, and we also know how the “adult leadership” handled those special inputs.  Rather than asking them how they would handle each kind of input, we can simply look in the database to see how they actually did it.


