Friday, August 18, 2006

I Walk the Line

In my previous entry, Busy Little Data Worker Bees, I wrote about the requirements for the class (or classes) that will hold individual data items.  We won't know whether it's a single class or several classes until later.  The intent is that these data objects are, for the most part, uncritical.  They hold the data and describe certain characteristics of it, but they do not validate the contents.  Validating the contents of the data object is the subject of this entry.

For the purposes of what I am doing here, a rule is something that, when applied, tells us up to three different things:

  • Do the contents of the data object comply with the rule?  In other words, are the contents valid with respect to this rule?  This is a Boolean result.
  • What error message, if any, should be displayed to the outside world about the contents of this data object?  If the contents are valid, I would expect the error message to come back as an empty string.
  • What are the details of the rule being applied?  This might be part of the error message.  I also think it's useful to have an explicit statement that you can display to the user ahead of time, before any error is signaled, telling them what we expect them to do.
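
Here is a minimal Python sketch of a rule that reports those three things.  The names `Rule`, `is_valid`, `error_message`, `description`, and the example `MaxLengthRule` are hypothetical and chosen for illustration only.  By the convention above, a valid value yields an empty error message.  The description is available before any value has been checked.

```python
from abc import ABC, abstractmethod
from typing import Any


class Rule(ABC):
    """Hypothetical base class: a rule answers three questions about a value."""

    @property
    @abstractmethod
    def description(self) -> str:
        """What the rule expects; can be shown to the user up front."""

    @abstractmethod
    def is_valid(self, value: Any) -> bool:
        """True if the value complies with this rule (the Boolean result)."""

    def error_message(self, value: Any) -> str:
        # Convention: valid contents produce an empty error message.
        return "" if self.is_valid(value) else f"Invalid value. {self.description}"


class MaxLengthRule(Rule):
    """Illustrative rule: the value, viewed as a string, may not exceed max_length."""

    def __init__(self, max_length: int) -> None:
        self.max_length = max_length

    @property
    def description(self) -> str:
        return f"Must be at most {self.max_length} characters long."

    def is_valid(self, value: Any) -> bool:
        return len(str(value)) <= self.max_length
```

For example, `MaxLengthRule(5).description` can be displayed next to the input field before the user types anything.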

Ideally, each rule is independent of all other rules.  Specifically, I do not want a rule to depend upon certain processing having been done before that rule is applied.  This allows me to change the order in which the rules are applied without having to change the definition of any rule.  In actual practice, it's not always possible to achieve total independence.  For example, most rules make no sense to even attempt if the contents of the data object are null.  Each rule could test the contents for a null value and report that fact itself.  It seems much easier, though, to always ensure, if only by convention, that a "must not be null" rule is applied before the other rules.
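
One way to enforce that convention is sketched below in Python.  `Rule`, `NOT_NULL`, `MIN_LENGTH_2`, and `with_not_null_first` are hypothetical names used only for illustration.  The second rule deliberately omits any None check.  It relies on the "must not be null" rule always being placed first in the list.

```python
from typing import Any, Callable, List, NamedTuple


class Rule(NamedTuple):
    name: str
    description: str
    check: Callable[[Any], bool]  # True when the value satisfies the rule


# The "must not be null" rule; by convention it always runs first.
NOT_NULL = Rule("not_null", "A value is required.", lambda v: v is not None)

# Written WITHOUT a None check: len(None) would raise TypeError,
# so this rule relies on NOT_NULL having been applied (and passed) first.
MIN_LENGTH_2 = Rule("min_length_2", "Must be at least 2 characters.",
                    lambda v: len(v) >= 2)


def with_not_null_first(rules: List[Rule]) -> List[Rule]:
    """Return the rules with NOT_NULL at the front, adding it if it is missing."""
    others = [r for r in rules if r.name != NOT_NULL.name]
    return [NOT_NULL, *others]
```

The other rules keep their relative order.  Only the null check is pinned to the front.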

That leads us to the notion that rules are applied in a specific order, and that we apply them in order only up to the point where a specific rule fails.  There is an interesting question of just how much information to share with the user about the contents of the data object.  It is certainly possible to apply all of the rules and convey every error message they generate back to the user.  While there might be some use to that, the typical situation I've encountered is that the user made a simple mistake.  We need to give them a simple error message, not a whole list of error messages that cascade from that one mistake.
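
A minimal Python sketch of "apply in order, stop at the first failure" follows.  It represents each rule as a hypothetical `(error message, check)` pair.  `first_error` is an illustrative name, not an established API.  Later rules never run once one fails.  That is why the length check does not need its own None guard.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical rule shape: (error message, check), where check returns True if valid.
Rule = Tuple[str, Callable[[Any], bool]]


def first_error(rules: List[Rule], value: Any) -> Optional[str]:
    """Apply rules in order and stop at the first failure.

    Returns that rule's error message, or None if every rule passed.
    """
    for message, check in rules:
        if not check(value):
            return message  # the remaining rules are never run
    return None


rules: List[Rule] = [
    ("A value is required.", lambda v: v is not None),
    ("Must be at least 5 characters.", lambda v: len(v) >= 5),
    ("Must contain only digits.", lambda v: v.isdigit()),
]
```

Consider `first_error(rules, "ab")`.  It reports only the length problem, even though "ab" would also fail the digits rule.  The user sees one simple message.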

Of course, as always, there are trade-offs to consider.  If a round trip to perform validation costs more than transporting all of those error messages, by all means apply all of the rules and convey all of the resulting error messages.  It may also be the case that the rules are truly independent of each other.  Then it makes sense to report the output of each rule regardless of the state of the other rules.  This might even lead you to design and implement a much more sophisticated "rules manager" that can maintain and execute a fairly complex dependency tree of rules.  While this sounds like an interesting programming exercise, I have never run across a situation in which I could justify that much sophistication.
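
When the rules really are independent, the alternative is to run every rule and return every message.  The Python sketch below does that.  `all_errors` and the example `password_rules` are hypothetical.  This only works when no rule can break on input that another rule would have rejected.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical rule shape: (error message, check), where check returns True if valid.
Rule = Tuple[str, Callable[[Any], bool]]


def all_errors(rules: List[Rule], value: Any) -> List[str]:
    """Apply every rule and collect every error message (an empty list means valid).

    Only sensible when the rules are independent of one another.
    """
    return [message for message, check in rules if not check(value)]


# Independent checks on a string: each is meaningful regardless of the others.
password_rules: List[Rule] = [
    ("Must be at least 8 characters.", lambda v: len(v) >= 8),
    ("Must contain a digit.", lambda v: any(c.isdigit() for c in v)),
    ("Must contain an upper-case letter.", lambda v: any(c.isupper() for c in v)),
]
```

This trades a longer response for fewer round trips.  The user can fix every problem at once.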

There can be many different kinds of rules, including:

  • Existence.  The rule only returns a valid state if the contents of the data object exist.  You might also have an additional rule to validate that the contents are not empty (for example, that they are not a zero-length string).
  • Minimum and Maximum Lengths.  The rule only returns a valid state if the contents of the data object, viewed as a string, fall within a minimum and maximum length.
  • Minimum and Maximum Values.  The rule only returns a valid state if the strongly-typed contents of the data object fall within a minimum and maximum value.  Variations can make the comparison to the minimum inclusive or exclusive, and likewise the comparison to the maximum.  There might also be separate versions of this rule for the different data types that the data object might hold.
  • Legal Values.  The rule only returns a valid state if the contents of the data object match one of a list of valid, or legal, values.  The list might be hard coded or drawn from a database table.
  • Regular Expression.  The rule only returns a valid state if the string contents of the data object match a regular expression.
  • Custom Rules.  A given application might have some very specific rules that would be custom coded.
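
Several of the rule kinds listed above can be sketched in Python as small classes sharing one base.  All class names here are hypothetical.  The legal-values list is hard coded, though it could just as well come from a database query.  The regular-expression rule uses `re.fullmatch`, so the whole string must match, not just a prefix.  Custom rules would simply be further subclasses of `Rule`.  The min/max value rule is left out here because it comes up again with rule parameters.

```python
import re
from typing import Any, Iterable, Optional


class Rule:
    """Hypothetical base: subclasses set `description` and implement `is_valid`."""
    description = ""

    def is_valid(self, value: Any) -> bool:
        raise NotImplementedError


class ExistsRule(Rule):
    description = "A value is required."

    def is_valid(self, value: Any) -> bool:
        return value is not None


class NotEmptyRule(Rule):
    description = "The value may not be empty."

    def is_valid(self, value: Any) -> bool:
        return str(value) != ""


class LengthRule(Rule):
    def __init__(self, min_len: int = 0, max_len: Optional[int] = None) -> None:
        self.min_len, self.max_len = min_len, max_len
        upper = "" if max_len is None else f" and at most {max_len}"
        self.description = f"Must be at least {min_len}{upper} characters long."

    def is_valid(self, value: Any) -> bool:
        n = len(str(value))  # the contents are viewed as a string
        return n >= self.min_len and (self.max_len is None or n <= self.max_len)


class LegalValuesRule(Rule):
    def __init__(self, legal: Iterable[Any]) -> None:
        # Hard coded here; a real application might load this from a table.
        self.legal = frozenset(legal)
        self.description = "Must be one of: " + ", ".join(sorted(map(str, self.legal)))

    def is_valid(self, value: Any) -> bool:
        return value in self.legal


class RegexRule(Rule):
    def __init__(self, pattern: str, description: str) -> None:
        self.pattern = re.compile(pattern)
        self.description = description

    def is_valid(self, value: Any) -> bool:
        # fullmatch: the entire string must match the pattern.
        return self.pattern.fullmatch(str(value)) is not None
```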

In my experience, the vast majority of rules apply to the individual data objects.  There can also be rules that apply to the container as a whole.  For instance, in our Address example, we might well have a rule that checks the state against the zip code to confirm that the two are consistent.  These rules follow much the same pattern as the data object rules: is it valid, if it is not valid why not, and what must be done to make it valid.
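
A container-level rule looks much the same; it simply receives the whole container.  The Python sketch below checks the state against the zip code of a hypothetical `Address` dataclass.  The prefix table is a deliberately tiny, simplified illustration and not authoritative ZIP data.  A real application would consult a proper ZIP-code database.  The rule returns the three pieces described above as a plain tuple.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Address:
    city: str
    state: str
    zip_code: str


# ILLUSTRATIVE ONLY: a tiny, simplified table of leading ZIP digits per state.
ZIP_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "MA": ("01", "02"),
    "NY": ("10", "11", "12", "13", "14"),
}


class StateMatchesZipRule:
    """Hypothetical rule that inspects the whole container rather than one field."""

    rule_message = "The zip code must belong to the selected state."

    def apply(self, address: Address) -> Tuple[bool, str, str]:
        """Return (is_valid, error_message, rule_message)."""
        prefixes = ZIP_PREFIXES.get(address.state)
        # Unknown states are left to a separate legal-values rule on the state field.
        if prefixes is None or address.zip_code.startswith(prefixes):
            return True, "", self.rule_message
        return (False,
                f"Zip code {address.zip_code} is not valid for {address.state}.",
                self.rule_message)
```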

In general, the application will instantiate a number of rules and add them to a "rules manager" that is focused upon either a specific data object or the container as a whole.  The logic that instantiates each rule will supply parameters that tailor the rule to the specific needs of the application.  For example, we could write a single rule that tests for minimum and maximum values, with the limits passed in to the constructor or set through public properties of the rule.  Once written, this rule could be applied, with different parameters, to a wide variety of situations.
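
A parameterized min/max rule might look like the Python sketch below.  `RangeRule` and its constructor parameters are hypothetical.  Either bound may be omitted, and each can be made inclusive or exclusive.  Because it only relies on `<`, `>`, and `==`, the same class works for any mutually comparable type: integers, floats, dates, and so on.

```python
from typing import Any, List, Optional


class RangeRule:
    """One parameterized min/max rule, reusable across many fields."""

    def __init__(self, minimum: Optional[Any] = None, maximum: Optional[Any] = None,
                 min_inclusive: bool = True, max_inclusive: bool = True) -> None:
        self.minimum, self.maximum = minimum, maximum
        self.min_inclusive, self.max_inclusive = min_inclusive, max_inclusive

    def is_valid(self, value: Any) -> bool:
        if self.minimum is not None:
            if value < self.minimum or (value == self.minimum and not self.min_inclusive):
                return False
        if self.maximum is not None:
            if value > self.maximum or (value == self.maximum and not self.max_inclusive):
                return False
        return True

    @property
    def description(self) -> str:
        parts: List[str] = []
        if self.minimum is not None:
            parts.append(f"{'>=' if self.min_inclusive else '>'} {self.minimum}")
        if self.maximum is not None:
            parts.append(f"{'<=' if self.max_inclusive else '<'} {self.maximum}")
        return ("Must be " + " and ".join(parts) + ".") if parts else "Any value is allowed."


# The same class, configured differently for different (hypothetical) fields:
age_rule = RangeRule(0, 130)
discount_rule = RangeRule(0.0, 1.0, max_inclusive=False)  # a 100% discount is not allowed
```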

I want to spend a paragraph on what comes back from applying one of these validation rules.  The initial or "naïve" approach would be to focus on the "is valid" aspect of the rule.  You would then only retrieve the error message or rule description if the "is valid" check reported a failure.  I want to propose to you that that's the wrong way to do it.  What I have done in the past is to create a specific class to hold the "validation result".  This "validation result" class contains the validity indicator, the error message, and the rule message, as well as anything else that might be appropriate.  This approach keeps the rule "clean" each time it is used: the rule does not need to retain any knowledge of what it did the last time it was applied.  I'll have more to say on this in a separate entry in this blog.
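
In Python, the "validation result" idea might be sketched as an immutable dataclass.  `ValidationResult` and `MaxLengthRule` are hypothetical names.  Each call to `apply` builds and returns a fresh result.  The rule object itself stores only its configuration (the maximum length) and nothing about earlier calls.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ValidationResult:
    """Everything one application of a rule produces, packaged together."""
    is_valid: bool
    error_message: str  # empty when is_valid is True
    rule_message: str   # what the rule expects, returned whether or not it failed


class MaxLengthRule:
    """A stateless rule: apply() returns a new result and remembers nothing."""

    def __init__(self, max_length: int) -> None:
        self.max_length = max_length

    def apply(self, value: Any) -> ValidationResult:
        text = str(value)
        rule_message = f"Enter no more than {self.max_length} characters."
        if len(text) <= self.max_length:
            return ValidationResult(True, "", rule_message)
        return ValidationResult(
            False, f"Too long ({len(text)} characters). {rule_message}", rule_message)
```

The caller gets the validity flag, the error message, and the rule message in one object.  Nothing forces a second call back to the rule.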

We need the ability to activate or deactivate a rule dynamically.  There are a couple of ways to achieve this.  One is to maintain a collection of only the active rules and add or remove specific members of that collection depending upon the needs of the moment.  I think a somewhat better way is to create all of the rules at one time and provide a mechanism to enable or disable a specific rule within the collection.  This implies that we need a way to explicitly identify each individual rule.
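
The "create everything once, then switch rules on and off by identifier" approach might be sketched in Python as follows.  `RuleSet`, `add`, `set_enabled`, and `first_failure` are hypothetical names.  Rules are keyed by a string id.  Python dicts preserve insertion order, so rules are applied in the order they were added.  Asking for an unknown id raises `KeyError` rather than silently creating an entry.

```python
from typing import Any, Callable, Dict, Optional


class RuleSet:
    """All rules are created up front; each has an id and an enabled flag."""

    def __init__(self) -> None:
        self._checks: Dict[str, Callable[[Any], bool]] = {}
        self._enabled: Dict[str, bool] = {}

    def add(self, rule_id: str, check: Callable[[Any], bool], enabled: bool = True) -> None:
        if rule_id in self._checks:
            raise ValueError(f"Duplicate rule id {rule_id!r}")
        self._checks[rule_id] = check
        self._enabled[rule_id] = enabled

    def set_enabled(self, rule_id: str, enabled: bool) -> None:
        if rule_id not in self._checks:
            raise KeyError(rule_id)
        self._enabled[rule_id] = enabled

    def first_failure(self, value: Any) -> Optional[str]:
        """Apply enabled rules in order; return the id of the first that fails, else None."""
        for rule_id, check in self._checks.items():
            if self._enabled[rule_id] and not check(value):
                return rule_id
        return None
```

As a usage example, a postal-code field could keep a "five digits" rule that is disabled when the address is outside the US.  The rule is switched off without ever being removed from the collection.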

In any case, we need a "rules manager" to maintain the collection, to handle the enabling and disabling of individual rules within that collection, and to apply the enabled rules to the contents of a given data object.  The logic in this "rules manager" can be fairly generic, depending upon how we define the data objects (that is, as interfaces, abstract classes, or generics).  One issue is where to place this logic.  If we select the abstract class as our implementation canvas, we could place the "rules manager" logic within that abstract class, making it available to each derived child class.  The same might be true if we select generics as our implementation canvas.  If we select the interface approach, we will probably end up creating a separate "rules manager" class that each implementation of the interface can use to manage its rules.
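
To illustrate the abstract-class option, here is a Python sketch where the rules-manager logic lives in a hypothetical abstract base class, `DataItem`.  Derived classes such as the illustrative `StringItem` only decide how their contents are stored.  They inherit rule registration, enabling and disabling, and first-failure validation unchanged.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Optional, Set, Tuple


class DataItem(ABC):
    """Abstract base class carrying the rules-manager logic for every derived item."""

    def __init__(self) -> None:
        self._rules: List[Tuple[str, Callable[[Any], bool], str]] = []  # (id, check, message)
        self._disabled: Set[str] = set()

    @property
    @abstractmethod
    def value(self) -> Any:
        """Each derived class decides how its contents are stored."""

    def add_rule(self, rule_id: str, check: Callable[[Any], bool], message: str) -> None:
        self._rules.append((rule_id, check, message))

    def enable_rule(self, rule_id: str) -> None:
        self._disabled.discard(rule_id)

    def disable_rule(self, rule_id: str) -> None:
        self._disabled.add(rule_id)

    def validate(self) -> Optional[str]:
        """Apply enabled rules in order; return the first error message, or None."""
        for rule_id, check, message in self._rules:
            if rule_id in self._disabled:
                continue
            if not check(self.value):
                return message
        return None


class StringItem(DataItem):
    """Illustrative derived class: inherits all of the rule handling above."""

    def __init__(self, text: Optional[str] = None) -> None:
        super().__init__()
        self.text = text

    @property
    def value(self) -> Optional[str]:
        return self.text
```

With the interface approach, the same methods would move into a separate manager class that each implementation holds as a member.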

There is one other area that deserves some additional consideration.  One could, of course, write a set of rules that are specific to a particular set of data object classes.  A more reasonable approach would be to write the rules against some generic "interface" that each of the data objects supports.  That is certainly the approach I've had in mind while writing this material.  The same consideration applies to the rules used to validate the parent container.  Ideally, the parent container would also implement some form of "interface" that supports polymorphic access to its data.  We'll come back to this in more detail when we talk about the trade-offs between interfaces, abstract classes, and generic classes.
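
Python's structural `typing.Protocol` can stand in for such an "interface" in a short sketch.  `HasValue`, `HasFields`, `not_empty`, `fields_agree`, and the example classes are all hypothetical.  The rules are written only against the protocols.  Unrelated concrete classes can therefore be validated by the same code as long as they expose the expected members.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Protocol


class HasValue(Protocol):
    """Hypothetical interface every data object would support."""
    @property
    def value(self) -> Any: ...


class HasFields(Protocol):
    """Hypothetical interface a parent container would support."""
    def fields(self) -> Mapping[str, Any]: ...


def not_empty(item: HasValue) -> bool:
    # Written against the interface: works for any object exposing `value`.
    return item.value not in (None, "")


def fields_agree(container: HasFields, a: str, b: str) -> bool:
    # A container-level rule that reaches fields by name, not by concrete type.
    f = container.fields()
    return f[a] == f[b]


@dataclass
class TextBox:
    value: str


@dataclass
class Quantity:
    value: int


@dataclass
class SignupForm:
    password: str
    confirm: str

    def fields(self) -> Mapping[str, Any]:
        return {"password": self.password, "confirm": self.confirm}
```

Here `not_empty` accepts both `TextBox` and `Quantity` without knowing either class.  `fields_agree` works on any container that can list its fields.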

