

Data Standards and Data Models - Published in DM Review, January 2004

A standard defines a frame of reference that encourages confidence between interacting parties. For example, when you fill your car's tank at a gas station, the standard definition of a "gallon" of gas assures you that you are acquiring the amount of gas that you think you are. In turn, the standard definition of a "dollar" assures the gas station owner that you are paying the appropriate value for the gas that you are purchasing. In essence, a standard is an agreement between interacting parties on the context of the interaction.

Presuming that any two (or more) parties wish to share information, there must be a way to describe what that information "looks like" so that when a data set arrives at its target location, the receiving party can actually do something with it. A data standard provides the guidelines through which interacting parties can confidently exchange information.

The goal of a data standard is to enable the sharing or exchange of information between multiple parties in a way that guarantees that the interacting parties share the same understanding of what is represented within that information. When exchanged information consists of structured data, a data standard provides the description of that structure. A data standard, at the very least, defines entity names, data element names, descriptions, definitions and formatting rules. In addition, a data standard may include procedures, implementation guidelines and usage directives. As more information is exchanged across different operating environments, the need for defined data standards becomes more acute. Particularly in environments where many separate organizations (each with its own data definition peculiarities) have agreed to exchange data, there is a need to coordinate that information exchange in a way that provides the most benefit to all participants.
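To make the minimum contents of a data standard concrete, here is a small Python sketch of a single data element definition. The class name `ElementStandard`, its fields, and the YYYY-MM-DD formatting rule are assumptions chosen for illustration; they are not drawn from any published standard.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementStandard:
    """One data element as a standard might describe it (hypothetical layout)."""
    entity: str      # entity name, e.g. "Person"
    name: str        # data element name
    definition: str  # agreed meaning of the element
    pattern: str     # formatting rule, expressed here as a regular expression

    def conforms(self, value: str) -> bool:
        # A value conforms when the whole string matches the formatting rule.
        return re.fullmatch(self.pattern, value) is not None


# Assumed rule for illustration: birth dates are exchanged as YYYY-MM-DD.
BIRTH_DATE = ElementStandard(
    entity="Person",
    name="birth_date",
    definition="Calendar date on which the person was born",
    pattern=r"\d{4}-\d{2}-\d{2}",
)

print(BIRTH_DATE.conforms("1977-02-28"))  # True
print(BIRTH_DATE.conforms("02/28/77"))    # False
```

The regular expression checks only the shape of the value, not whether it names a real calendar date; a fuller standard would state that rule as well.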

Data models and data standards are related, yet they differ subtly. A data model is a formal structured representation of real-world entities focused on the definition of an object and its associated attributes. For example, a data model representing people might capture all attributes relevant to the description of a person: last name, first name, weight, height, birth date, hair color, eye color, etc. In addition, a data model captures how individual entities are related, such as documenting all line items associated with a customer's order.

The data model, however, is mostly concerned with the structure of the representation and not necessarily all the details associated with the content in that structure. We can say that an instance of a person object is attributed with that person's birth date and that the birth date attribute is represented using a character string, but the model does not specify whether that birth date is expressed using month names followed by the day of the month followed by a year (e.g., February 28, 1977), or whether it is expressed using the MM/DD/YY format (e.g., 02/28/77) or in any other date format.
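The point can be illustrated with a minimal Python sketch of such a person model. The class and field names are hypothetical; the only thing the sketch is meant to show is that the model fixes the type of the birth date (a character string) and says nothing about its format.

```python
from dataclasses import dataclass


@dataclass
class Person:
    last_name: str
    first_name: str
    birth_date: str  # the model fixes the type (a string), not the date format


# Two records that both satisfy the model, using the two formats from the text.
a = Person("Doe", "Jane", "February 28, 1977")
b = Person("Doe", "John", "02/28/77")

# Both values are valid strings, yet they do not compare as equal
# even though they name the same day.
print(a.birth_date == b.birth_date)  # False
```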

Regardless of the format used for that date, as long as the representation is valid within the operating context (i.e., meets the needs of those working with that data), the value will conform to the model's directive. This may be fine as long as the people using the data in that model understand this to be true. However, as soon as anyone wants to share the data stored using that model with someone in a different organization, the variety of date formats may negatively affect the ease with which the information may be migrated from its source to its next destination.

For example, in contrast to the laxness of the source data model with respect to date representation, the next user of that data set may have strict requirements about date formats. This apparent formatting dichotomy arises because each participant sharing the information may have its own data model, and embedded within each data model is information about the data types that populate each field. Therefore, while one data model (built using one vendor's database system) may allow dates to be stored as character strings, another data model (built using a different vendor's database system) might use a built-in system type for representing dates. When the target system attempts to load a record whose values do not conform to the specified type, an exception occurs that may prevent the participant from using the violating record (or the entire set of records).
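A short Python sketch shows the kind of failure described here. The loader below stands in for a target system with a strict date type; the function names and the MM/DD/YYYY requirement are assumptions made for the example.

```python
from datetime import date, datetime


def load_birth_date(raw: str) -> date:
    # Hypothetical target-side loader: accepts only MM/DD/YYYY.
    # datetime.strptime raises ValueError for any other layout.
    return datetime.strptime(raw, "%m/%d/%Y").date()


def load_records(raws):
    # Load what conforms and set aside what does not, rather than
    # letting one violating record abort the whole set.
    loaded, rejected = [], []
    for raw in raws:
        try:
            loaded.append(load_birth_date(raw))
        except ValueError:
            rejected.append(raw)
    return loaded, rejected


loaded, rejected = load_records(["02/28/1977", "February 28, 1977"])
print(loaded)    # [datetime.date(1977, 2, 28)]
print(rejected)  # ['February 28, 1977']
```

Whether a target system rejects one record or the entire set is a policy choice; this sketch takes the more forgiving route and collects the rejects.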

The solution to this problem is the use of a data standard for information exchange. The standard may correspond to the source data model or the target data model, or may provide for a format that is foreign to both models. The actual format selected is irrelevant; what is important is that the participants agree to use the selected format in any situation where they exchange data. This is not to say that a data standard must be kept separate from the data models associated with the applications that use the exchanged data. On the contrary, it is sometimes very important to develop the data standard in concert with the data model. However, it is important to be aware that there is a difference between a data model and a data standard.
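As a rough sketch of how an agreed exchange format works in practice, the Python below converts dates from a source's internal formats into one agreed format before sending, and parses that format on receipt. The choice of YYYY-MM-DD, the list of source formats, and the function names are all assumptions for illustration.

```python
from datetime import date, datetime

# Assumed agreement between the parties: dates travel as YYYY-MM-DD.
EXCHANGE_FORMAT = "%Y-%m-%d"

# Formats the source system is assumed to hold internally.
# "%B" matches English month names under the default C locale.
# "%y" follows Python's pivot: 69-99 map to 1969-1999, 00-68 to 2000-2068.
SOURCE_FORMATS = ("%B %d, %Y", "%m/%d/%y")


def to_exchange(raw: str) -> str:
    # Source side: try each known internal format, emit the agreed one.
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime(EXCHANGE_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def from_exchange(value: str) -> date:
    # Target side: only the agreed format is accepted.
    return datetime.strptime(value, EXCHANGE_FORMAT).date()


print(to_exchange("February 28, 1977"))  # 1977-02-28
print(to_exchange("02/28/77"))           # 1977-02-28
print(from_exchange("1977-02-28"))       # 1977-02-28
```

Neither side's internal model has to change; each party only needs a conversion to and from the agreed format.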

© 2005 Knowledge Integrity, Inc
