Tuesday, August 15, 2006

A Walk on the Data Side

The theme of this blog is "designing out loud".  In this entry we are going to talk about the data objects we will use to implement our laissez-faire data entry process.  We begin with the assumption that each attribute of the entity, in our case the Address class, is implemented as a separate object that is responsible for maintaining the value of that attribute.  Each of these objects has a responsibility to accept and retrieve the value of the attribute as a string.  Each has a responsibility to accept and retrieve the value as a particular type, such as a 32-bit integer or a Boolean.  Each has the responsibility to maintain a dirty status and to provide mechanisms to clear or force that status.  Each has a responsibility to maintain the null status (that is, whether the value has been specified or not).  Finally, each may or may not have the responsibility to determine, or at least to manage the process of determining, whether its value is valid.
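
To make that list of responsibilities concrete, here is a minimal sketch of what one such data object might look like.  I am using Python purely for illustration; the class name, the method names, and the decision to treat a null value as valid are all my assumptions for this sketch, not settled design.

```python
from typing import Optional


class IntegerDataObject:
    """Hypothetical data object holding one 32-bit integer attribute."""

    MIN_VALUE = -2**31
    MAX_VALUE = 2**31 - 1

    def __init__(self, name: str) -> None:
        self.name = name
        self._text: Optional[str] = None  # raw text exactly as entered
        self._dirty = False

    # String view: accept whatever was typed, without rejecting it.
    def set_text(self, text: Optional[str]) -> None:
        if text != self._text:
            self._text = text
            self._dirty = True

    def get_text(self) -> Optional[str]:
        return self._text

    # Typed view: None when the text is missing or is not a 32-bit integer.
    def get_value(self) -> Optional[int]:
        if self.is_null:
            return None
        try:
            value = int(self._text.strip())
        except ValueError:
            return None
        if self.MIN_VALUE <= value <= self.MAX_VALUE:
            return value
        return None

    def set_value(self, value: Optional[int]) -> None:
        self.set_text(None if value is None else str(value))

    @property
    def is_null(self) -> bool:
        return self._text is None or self._text.strip() == ""

    @property
    def is_dirty(self) -> bool:
        return self._dirty

    def clear_dirty(self) -> None:
        self._dirty = False

    def force_dirty(self) -> None:
        self._dirty = True

    @property
    def is_valid(self) -> bool:
        # Assumption: a null value is acceptable; non-null text must parse.
        return self.is_null or self.get_value() is not None
```

Note that the object keeps the text exactly as it was entered, even when it cannot be converted.  That is what lets the data entry stay laissez-faire: bad input is remembered and reported as invalid rather than refused.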


We have a bunch of design considerations that we should think about as we look at the possible alternatives.  These include:


  • The relationship of these objects to the container object (in our case this is the Address class).
    • The assumption is that the container class (for example, Address) will provide some kind of façade mechanism to connect the consumers within the client software with one or more of these data objects.  The question is how much of a façade is appropriate?  There are a couple of different ways that this could be done:
      • The container could provide a delegated linkage to the typed value.  That is, the Address would look like a typical domain object that exposes strongly typed attributes.
      • The container could provide multiple methods for each underlying attribute to permit the client software to have limited access to the functionality of these data objects.
      • The container could provide a read only attribute for each of these data objects to allow consumers in the client programs to access the underlying functionality of these data objects.
      • The container could provide a generic lookup mechanism using a key (most likely the name of the attribute) to retrieve a reference to a particular data object.
      • The container could provide a reference to a read-only collection that the client software could iterate over at will.
    • At some point in the processing the container object needs to understand whether it is dirty or not dirty to support the persistence of the contents to a data store.  I can think of three distinct ways by which the individual data objects can communicate their "dirtiness" to their parent container:
      • Each data object declares one or more events to signal the various changes that have occurred.  The most obvious of these is the "my dirty status has changed to the value of a supplied parameter" event.  One can also envision similar events for becoming valid or invalid, becoming null or not null, and so on.
      • Assuming that the individual data object is aware of its parent container (that is, it holds a reference to the parent container), then the individual data object could invoke methods on the parent class to signal each of these various kinds of changes.  This of course binds the child data object class to its parent container.  Some of the sting of this binding can be taken out by using an interface, but the binding is still there.
      • The data object might take a more passive stance and provide a set of methods which the container or anyone else could invoke to determine the status of the data object.  Think about functions such as is valid, is dirty, is null, and so on.
  • While we certainly want the ability to provide a very specific class to support our application, there may well be some parts of the application that could benefit from the use of a collection of polymorphic data objects.  One example is the dynamic construction of a web page or of a Windows form for data entry.  I would also think about supporting the dynamic construction of SQL to retrieve and update a database, the creation and consumption of XML files, and so on.  If we choose to support this kind of dynamic behavior, we may well want to include some form of explicit metadata within each of these data objects.  Taking this to its logical extreme, we would also want to be able to explicitly present the rules that are used to determine the validity of the contents of the data object.
  • We want to be able to dynamically control the rules that are applied to each data object.  Specifically, we want to be able to "turn on" and "turn off" the application of each of these rules.  As the data in the Address proceeds through the various stages, we would turn on more and more rules to ensure that the quality of the data increases monotonically over time.  This implies that we must be able to explicitly identify each rule, most likely through a specific rule identifier.  ("Now turning on rule 17 A.")
  • These data objects also serve as an "attractive nuisance".  By that I mean that they seem like a logical place to tack on additional functionality that may or may not be appropriate.  For example, I have built variations on what we're talking about in which each of the data objects contained a method that expressed the contents of the data object as an SQL value: numbers came out as numbers, strings came out with enclosing double quotes, dates came out with enclosing pound signs, Boolean values came out as zero and one, and so on.  One could also envision similar methods to produce XML nodes and to produce HTML for display.  While I don't think this is an entirely stupid thing to do, I suspect that it is a better approach to provide attributes on each of these data objects that identify whether it is numeric, string, Boolean, date, and so on, and to allow some external class that deals exclusively with SQL, XML, or HTML to query those characteristics to determine how the value should be expressed.
  • Again, in the area of attractive nuisances, it is also tempting to provide space within these individual data objects for "hints" as to how the data should be displayed.  For example, I might want to provide a place in each of these data objects for a reference to an object that controls formatting, or that selects the kind of control to be used on a Windows form, and so on.  Again, I don't think this is an entirely stupid thing to do, but it's awfully easy to junk up these data objects with extraneous attributes simply because they are present in the design.
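
The idea of turning individual rules on and off by identifier is easier to see in code.  What follows is only a sketch under my own assumptions: the Rule and RuleSet names, the "17A" and "17B" identifiers, and the two postal code checks are all invented for illustration, and Python is simply a convenient notation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    rule_id: str                            # explicit identifier, e.g. "17A"
    description: str
    check: Callable[[Optional[str]], bool]  # returns True when the text passes
    enabled: bool = False                   # every rule starts turned off


class RuleSet:
    """Hypothetical holder for the rules attached to one data object."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def add(self, rule: Rule) -> None:
        self._rules[rule.rule_id] = rule

    def turn_on(self, rule_id: str) -> None:
        self._rules[rule_id].enabled = True

    def turn_off(self, rule_id: str) -> None:
        self._rules[rule_id].enabled = False

    def failures(self, text: Optional[str]) -> List[str]:
        """Return the identifiers of the enabled rules that the text fails."""
        return [r.rule_id for r in self._rules.values()
                if r.enabled and not r.check(text)]


rules = RuleSet()
rules.add(Rule("17A", "postal code is present",
               lambda t: bool(t and t.strip())))
rules.add(Rule("17B", "postal code is five digits",
               lambda t: not t or (len(t) == 5 and t.isdigit())))

# Early stage: nothing is enforced, so an empty entry raises no complaints.
early = rules.failures("")

# Later stage: turn on rule 17A, and the same empty entry is now reported.
rules.turn_on("17A")
later = rules.failures("")
```

Because each failure comes back as a rule identifier, the container or the user interface can decide what to say about it, and the data object itself never has to refuse the input.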

There are three fundamental design approaches that we can take to constructing these data objects.  These are the canvases upon which we're going to implement the set of features that we ultimately decide on.  Our choices are:


  • Interface: we could define an interface that each of these data objects would have to implement.  While we are going to look at this in a lot of detail, we can say a few things up front about the use of interfaces: First, the pure use of an interface means that we're going to re-implement the logic for each of the methods defined by the interface; when there are a small number of methods, this might not be too painful, especially if you use "cut and paste" inheritance.  Second, the use of an interface allows us to create multiple data objects that are inherited from different base classes.  For example, we're going to see that the numeric data objects might be constructed from a single base class or generic template.  The implementation of the interface allows us to support a fairly high degree of polymorphism.
  • Inheritance from an abstract class: We could define an abstract class that provides the implementation of the common logic and then create derived child classes for each of the different types of data object.  One of the questions that we will look at is just how much common logic there is between these different data object types.  I think it is fair to say that the more functionality (drawn from our list above) that we add to the conceptual data object, the more likely it is that this approach will make sense.
  • Generic class definitions: some programming languages provide a mechanism by which one can define a "generic" class definition that can be tailored to support a "specific" type at compile time.  There are half a dozen or more "numeric" data types for which this approach looks very promising.
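
Here is a rough sketch of how the three canvases might be combined: an interface for polymorphism, an abstract class for the common logic, and a generic parameter for the typed value.  Every name in it is hypothetical.  I am also using Python only as notation, and its generics are type hints rather than the compile-time tailoring described above, so treat this as an approximation of the idea.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Optional, Protocol, TypeVar

T = TypeVar("T")


class DataObject(Protocol):
    """Interface canvas: what every data object must offer."""

    def set_text(self, text: Optional[str]) -> None: ...
    def get_text(self) -> Optional[str]: ...
    def is_dirty(self) -> bool: ...


class TypedData(ABC, Generic[T]):
    """Abstract-class canvas: the common logic is written once, here."""

    def __init__(self) -> None:
        self._text: Optional[str] = None
        self._dirty = False

    def set_text(self, text: Optional[str]) -> None:
        if text != self._text:
            self._text = text
            self._dirty = True

    def get_text(self) -> Optional[str]:
        return self._text

    def is_dirty(self) -> bool:
        return self._dirty

    def get_value(self) -> Optional[T]:
        # None when nothing was entered or the text cannot be converted.
        if self._text is None or not self._text.strip():
            return None
        try:
            return self.parse(self._text.strip())
        except ValueError:
            return None

    @abstractmethod
    def parse(self, text: str) -> T:
        """Convert text to the typed value, or raise ValueError."""


class IntegerData(TypedData[int]):
    def parse(self, text: str) -> int:
        return int(text)


class BooleanData(TypedData[bool]):
    def parse(self, text: str) -> bool:
        lowered = text.lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(text)


def dirty_count(objects: List[DataObject]) -> int:
    """Works on anything that satisfies the interface, whatever its base."""
    return sum(1 for obj in objects if obj.is_dirty())
```

In this arrangement each child class supplies only the conversion from text.  Whether that is really the only thing that differs between data object types is exactly the question raised above about how much common logic there is.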

We have a number of competing design forces that we must consider.  First, I have written in earlier entries about the design requirements that the client or consumer programming logic places upon our Address class.  These requirements have not gone away.  Second, one of the motivations for this approach is that we will create a set of reusable data objects that can be applied again and again.  There is the question of how much glue code we want to add to the parent container, in our case the Address class, to make these data objects usable from the client programming logic.  If we require too much "glue" code to make these data objects usable, we have erected a substantial barrier to reuse.  Third, we need to think about how the rules are implemented.  There are some cases in which the rules would benefit from having additional access to the contents of the data object.  The question is how we provide that access without opening up the data objects to external "meddling".


To make things even more complicated, we are not restricted to a single canvas; we could use the approaches described just above in any combination.


Obviously we have quite a bit of material to cover.  I expect that it is going to take a number of entries in this blog while I think out loud about each of these topics.

