Background, Tutorials

What are data models?

Wikipedia offers the following definition of a Data Model:

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world. For instance, a data model may specify that a data element representing a car comprises a number of other elements which in turn represent the color, size, and owner of the car.

The Wikipedia entry goes on to say the following:

A data model explicitly determines the structure of data.

Data models describe the structure, manipulation and integrity aspects of the data stored in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

By analogy with computer programming languages, a data model can be compared with a data structure: both fix which elements exist and how they fit together.
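To make the analogy concrete, the car example from the definition above can be sketched as a data structure. This is only an illustration: the class names, field names, and sample values are assumptions chosen for the example, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class Owner:
    # Hypothetical element representing the owner of a car.
    name: str


@dataclass
class Car:
    # A car element is made up of other elements: color, size, and owner.
    color: str
    size: str
    owner: Owner


# Illustrative instance; the values are made up.
car = Car(color="red", size="compact", owner=Owner(name="A. Driver"))
```

The structure states explicitly which elements a car record contains and how they relate, which is what a data model does for a whole information system.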

Why are data models needed?

Data models of the kind described above facilitate the development of information systems by giving developers (not necessarily programmers) clear definitions of the data and its format. Beyond clarity, data models are a step towards standardizing data interchange and access policies. For instance, well-defined data models let information systems interface with one another more easily and efficiently, which reduces both cost and confusion.

ANSI Classification of data models

The ANSI study group on Data Base Management Systems identified three kinds of data model instances:

  1. Conceptual data model: describes the semantics of a domain, which is the scope of the model. For example, it may be a model of the interest area of an organization or industry. It consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' whose scope is limited by the scope of the model.

  2. Logical data model: describes the semantics as represented by a particular data manipulation technology. It consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.

  3. Physical data model: describes the physical means by which data are stored. It is concerned with partitions, CPUs, tablespaces, and the like.
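The three levels can be sketched side by side for the car example. This is a minimal illustration, assuming SQLite (through Python's standard sqlite3 module) as the data manipulation technology; the entity names, table names, columns, and sample values are all hypothetical.

```python
import sqlite3

# Conceptual level: entity classes and a relationship assertion
# between a pair of them, independent of any technology.
CONCEPTUAL_MODEL = {
    "entities": ["Owner", "Car"],
    "relationships": [("Owner", "owns", "Car")],
}

# Logical level: the same domain expressed as tables and columns.
LOGICAL_SCHEMA = """
CREATE TABLE owner (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE car (
    id       INTEGER PRIMARY KEY,
    color    TEXT NOT NULL,
    size     TEXT NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES owner(id)
);
"""


def build_database(path=":memory:"):
    # Physical level: the storage location is decided here
    # (an in-memory database by default, or a file on disk).
    conn = sqlite3.connect(path)
    conn.executescript(LOGICAL_SCHEMA)
    return conn


conn = build_database()
conn.execute("INSERT INTO owner (id, name) VALUES (1, 'A. Driver')")
conn.execute(
    "INSERT INTO car (color, size, owner_id) VALUES ('red', 'compact', 1)"
)
rows = conn.execute(
    "SELECT car.color, owner.name "
    "FROM car JOIN owner ON owner.id = car.owner_id"
).fetchall()
```

Changing the path argument alters only the physical level, and replacing the schema with classes or XML would alter only the logical level; the conceptual model stays the same in both cases.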

How does it all fit in with the CSH project?

We need a data model in order to construct a repository of digital objects from the CSH. These digital objects are surrogates for physical documents, photographs, microfilms, and perhaps other physical storage formats. Alongside the data itself, a major part of our data model is the metadata, which describes each object and its relationships with other objects. A well-defined, standards-based data model will help us create a repository that scholars can access easily and that can interface readily with other repositories.
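As a rough sketch of what this could mean in practice, a digital object might carry its descriptive metadata and its relationships to other objects as follows. Everything here is an assumption made for illustration: the class names, the field names, the "isPartOf" predicate, and the identifiers are hypothetical and do not describe the actual CSH data model.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    predicate: str   # hypothetical relationship name, e.g. "isPartOf"
    target_id: str   # identifier of the related object


@dataclass
class DigitalObject:
    object_id: str
    source_format: str  # format of the physical original, e.g. "microfilm"
    metadata: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def related(self, predicate):
        # Return the identifiers of objects linked by the given predicate.
        return [r.target_id for r in self.relationships
                if r.predicate == predicate]


# Illustrative objects; identifiers and titles are made up.
reel = DigitalObject("obj-001", "microfilm", {"title": "Sample reel"})
page = DigitalObject(
    "obj-002",
    "document",
    {"title": "Sample page"},
    [Relationship("isPartOf", "obj-001")],
)
```

Keeping relationships as explicit, named links is one possible design; it lets a repository answer questions such as "which objects belong to this reel" without inspecting the data itself.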