Thursday, August 17, 2006

Busy Little Data Worker Bees

In A Walk on the Data Side I outlined the overall elements that we are going to put into our laissez-faire data solution.  In this entry I am going to focus on the worker bees of our solution: the objects that hold the individual data attributes of the entity, in our case the Address class.

I know from experience that it is all too easy to "junk up" these classes.  I also know that that is why refactoring was invented.  At some point you recognize that there's a pattern to the junk and you try to move it to a more appropriate place.  But it is even better not to put the junk there in the first place.

First, we need some form of identity for the individual data element.  However we implement the data class (for example, using interfaces, using abstract classes, or using generics), we will need to be able to refer to this object by name.  The problem here is which name to use.  Each of these data objects will be linked to or associated with an underlying database column in a table (or XML file or comma-separated values file or...).  We could use that external name, but the rules for forming these external names are typically somewhat different from the rules for forming names in a programming language such as C#.  If we tidy the external name up, we need to maintain a link back to the underlying external name.  I think that the simplest way of doing this is to merely include an additional property, the original name, which is stored as a string.
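
As a rough illustration of keeping both names, here is a minimal sketch in Python (rather than C#, purely for brevity).  The `tidy_name` rule and the `ElementName` class are hypothetical; the real tidying rule would depend on the naming conventions of the data store and of the programming language in use.

```python
import re
from dataclasses import dataclass


def tidy_name(external_name: str) -> str:
    """Turn an external column name into a PascalCase identifier.

    This particular rule is an assumption made for illustration only.
    """
    words = re.findall(r"[A-Za-z0-9]+", external_name)
    name = "".join(w[:1].upper() + w[1:].lower() for w in words)
    # Identifiers cannot start with a digit, so prefix an underscore if needed.
    return "_" + name if name[:1].isdigit() else name


@dataclass(frozen=True)
class ElementName:
    name: str           # tidy name used in program code
    original_name: str  # external name, kept verbatim as a string

    @classmethod
    def from_external(cls, external_name: str) -> "ElementName":
        return cls(tidy_name(external_name), external_name)
```

With this sketch, `ElementName.from_external("postal_code")` carries the tidy name `PostalCode` while still remembering `postal_code` for the trip back to the data store.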

Second, we need the actual value of the data object.  Ultimately we need this value in a strongly typed variable that has a distinct datatype associated with it.  But for a significant part of the lifetime of an individual Address class, the data has the potential to be in an amorphous format that is inconsistent with the underlying strong datatype.  That would suggest that we need to have the value stored both as a string (which is capable of being as amorphous as you like) and as the underlying strong data type.  [I am not sure that the word "underlying" is appropriate here.  It may be better to think of the string value as the underlying value and to think of the strongly typed data item as a view against the underlying string value.]
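
The idea of the string as the underlying value, with the typed item as a view over it, could be sketched as follows in Python.  The class name `StringBackedValue` and its `parse` parameter are assumptions made for this example, not a settled design.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class StringBackedValue(Generic[T]):
    """Stores the raw string; the typed value is computed from it on demand."""

    def __init__(self, parse: Callable[[str], T]) -> None:
        self._parse = parse                # hypothetical converter, e.g. int
        self._text: Optional[str] = None   # the underlying, amorphous value

    @property
    def as_string(self) -> Optional[str]:
        return self._text

    @as_string.setter
    def as_string(self, text: Optional[str]) -> None:
        # Any string is accepted here, even one that cannot be converted.
        self._text = text

    @property
    def value(self) -> T:
        """Strongly typed view; raises ValueError if there is nothing to view."""
        if self._text is None:
            raise ValueError("no value has been supplied")
        return self._parse(self._text)
```

The string setter never rejects anything, so a half-typed entry such as "forty-two" can sit in an integer element without being lost; only a request for the typed view fails.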

Third, we need a number of indicators that describe the state of the contents of the data class.  Here are a few that I have thought of:

  • Null.  We need to know whether or not the value has been specified.  We could simply return a null (VB.NET Nothing) value as the result of the "as string" function above, but I prefer to have an explicit function that returns a Boolean to indicate if the value is null or not.  I also prefer to have a convenience function which indicates that the value is not null.  These are cheap functions to implement and they have the advantage of explicitly conveying the intention of the logic.
  • Converted.  Although not as important as the null indicators above, it is also useful to have an indicator that specifies if a supplied string value was successfully converted to the strongly typed view of the data.  This indicator is fairly essential for the internal operation of the class and is useful enough outside of the class that we should expose it.  This brings up the question of what to do when the client logic requests the strongly typed value when the string could not be converted successfully.  My initial cut at this would be to throw an exception.  The indicator provides a guard function that the client can use to prevent this.
  • Dirtiness.  In most applications (and the one that we are considering is no different) we need to keep track of whether the data has changed.  If it has not changed we may be able to avoid all sorts of time-consuming trips to the database and back.  Depending upon how we implement the dirty flag, we may need to add the original value to our list of things to keep track of.  We do not necessarily have to expose this original value to the outside world; we could just assume that the first value that we had was the original one.  That does lead to a complication: knowing that the original value was null as opposed to not having been specified yet.  We could make that distinction by including the original value within the constructor, but it is very likely that we will create the individual data objects very early in the game, well before we've actually gotten around to reading any data.
  • Data Type.  Because we're going to use these data objects for various generic-oriented processing, we need to know what the data type is.  As with the name above, we have a couple of complications.  We could have the datatype as expressed in the programming language that we are using, or we could have the datatype that was used in the underlying persistent data store.  Both have their uses.  I want to point out that the data type is more than just the name of the data type; it also covers such things as the minimum and maximum lengths/values.  For example, we might want to know if this data object is linked to an underlying database column that is an auto-incrementing value; we might want to avoid setting this value and certainly want to avoid trying to write any values that we have set to the database.  We may have to expose any and all of these as separate properties.
  • Characterizations.  There are times in the processing of these data objects where I just want to know if the object is a number or a string or a Boolean or a date.  Numbers are particularly difficult because there are so many different data types that can be legitimately thought of as "numbers".  I would want to have a simple function provided by the data object that indicates if it is numeric.  Strings are perhaps a little less complicated, but it is possible that you might implement something that would have fixed length strings and variable length strings.  Having an "is string" function simplifies this as well.  In many ways, these are simply extensions of the concept of datatype.  Again, these are relatively cheap functions to implement and they go a long way toward exposing the intention of the logic.  Obviously, you can do without these, and because of that some people might regard them as "junk" methods, but I think they serve a legitimate and valuable purpose and I'm including them.
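
To make the indicators above a little more concrete, here is one possible sketch in Python.  It deliberately sidesteps the interfaces / abstract classes / generics question; every name in it (`DataElement`, `set_string`, `is_dirty`, and so on) is hypothetical, and the choice to treat the first supplied value as the original is just the assumption described under Dirtiness.

```python
from decimal import Decimal
from typing import Any, Callable, Optional

_UNSET = object()  # "no original value recorded yet", which is distinct from null


class DataElement:
    def __init__(self, data_type: type,
                 parse: Optional[Callable[[str], Any]] = None) -> None:
        self.data_type = data_type
        self._parse = parse or data_type   # default: call the type on the string
        self._text: Optional[str] = None
        self._value: Any = None
        self._converted = False
        self._original: Any = _UNSET

    def set_string(self, text: Optional[str]) -> None:
        if self._original is _UNSET:
            self._original = text          # first value supplied is the original
        self._text = text
        self._value, self._converted = None, False
        if text is not None:
            try:
                self._value = self._parse(text)
                self._converted = True
            except (ValueError, ArithmeticError):
                pass                       # keep the string; flag stays False

    @property
    def is_null(self) -> bool:
        return self._text is None

    @property
    def is_not_null(self) -> bool:
        return not self.is_null

    @property
    def is_converted(self) -> bool:
        return self._converted

    @property
    def is_dirty(self) -> bool:
        return self._original is not _UNSET and self._text != self._original

    @property
    def is_numeric(self) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (issubclass(self.data_type, (int, float, Decimal))
                and not issubclass(self.data_type, bool))

    @property
    def is_string(self) -> bool:
        return issubclass(self.data_type, str)

    @property
    def value(self) -> Any:
        """Typed view: None when null, ValueError when conversion failed."""
        if self.is_null:
            return None
        if not self._converted:
            raise ValueError(f"cannot convert {self._text!r} to {self.data_type.__name__}")
        return self._value
```

Client logic can then guard with `is_converted` before asking for `value`, and the persistence layer can skip any element whose `is_dirty` is false.  Because the sentinel is separate from null, an element whose first supplied value was null is not confused with one that has never been given a value.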

There are a few things that we could add to the mix but are not going to.  It may well turn out that you look at this and say "Ah ha, I'd like to have one of those".  In any case I think it's important to at least think about them and make a conscious decision to include or not to include.  Remember, all design is choice between considered alternatives.  Here are at least some of the possibilities:

  • Descriptive Material. This would include such things as descriptive titles, column headers for reports and grid displays, and desired sort orders. 
  • Default Values.  We could include, within the definition of the data object, the default value that should be used if no value is explicitly provided.  We could also include a method to define what the empty value for the data object is.
  • Rules.  Under this heading I include all of the various rules which describe what is acceptable behavior for the contents of the data object.  An obvious one is whether or not the value is permitted to be null.  I'm going to defer all of these kinds of rule-based restrictions to the entry on rules which will show up in this blog in the next couple of days.  This pretty much includes anything to do with validity and expressing any rules that have been broken or that need to be followed.
  • Linkages.  Under this heading I include all those things which refer to how this particular entity relates to other entities.  We might like to know whether the column in question is part of a key, or part of the primary key.  We might like to know that this column is related, in the sense of a "foreign key", to a column in some other entity.

I think all of these things are relatively straightforward (but that may be because I thought about them for a fairly long period of time).  I have not included a lot of detail because the choice between using interfaces, abstract classes and generics will influence the exact specification.  Choosing one approach over another will cause us to make some adjustments in how we specify the individual data object.  There doesn't seem to be a lot of value in writing explicit code that we have to come back and change later.

