What are data models?

Wikipedia offers the following definition of a data model:

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world. For instance, a data model may specify that a data element representing a car comprises a number of other elements which in turn represent the color, size, and owner of the car.

The Wikipedia entry goes on to say the following:

A data model explicitly determines the structure of data.

Data models describe the structure, manipulation and integrity aspects of the data stored in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

By analogy with computer programming languages, data models can be compared with data structures.
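To make the analogy concrete, here is a minimal sketch, in Python, of the car example from the quoted definition above. The field names are assumptions chosen only to mirror the quoted text, not part of any CSH standard.

    from dataclasses import dataclass

    @dataclass
    class Owner:
        name: str

    @dataclass
    class Car:
        color: str
        size: str     # e.g., "compact", "full-size"
        owner: Owner

    # The data model constrains what a valid "car" looks like, just as a
    # type declaration constrains a value in a programming language.
    example = Car(color="blue", size="compact", owner=Owner(name="A. Example"))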

Why are data models needed?

Data models as described above are needed to facilitate the development of information systems by providing developers (not necessarily programmers) with clear definitions and formats for the data. In addition to offering clarity, data models are a step toward standardizing data interchange and access policies. For instance, well-defined data models enable information systems to interface with one another more easily and efficiently, thereby decreasing costs and confusion.

ANSI classification of data models

The ANSI study group on Data Base Management Systems identified three kinds of data model instances:

  1. Conceptual data model: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model.

  2. Logical data model: describes the semantics as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.

  3. Physical data model: describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.
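As a rough illustration of the three levels, using names invented for this sketch rather than any actual CSH schema:

    from dataclasses import dataclass

    # Conceptual level (documentation, not code): "A Document comprises
    # one or more Scans; each Scan belongs to exactly one Document."

    # Logical level: the same semantics expressed in a particular data
    # manipulation technology; here, Python classes stand in for tables
    # and columns.
    @dataclass
    class Document:
        document_id: int
        title: str

    @dataclass
    class Scan:
        scan_id: int
        document_id: int  # logical reference to the owning Document
        sequence: int     # position of this scan within the document

    # Physical level: partitions, tablespaces, and file layout are storage
    # configuration concerns, so they appear here only as this comment.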

How does it all fit in with the CSH project?

We need a data model in order to construct a repository of digital objects from the CSH. These digital objects are surrogates for actual physical documents, photographs, microfilms, and perhaps other formats of physical storage. One major part of our data model, beyond the actual data, is the metadata that provides information about each object and about its relationships with other objects. Having a well-defined, standards-based data model will help us create a repository of these objects that facilitates easy access by scholars and easy interfacing with other repositories.

This section includes metadata descriptions for the dark archive and the public-facing digital library. Both data models are based on broadly adopted, well-documented, community-supported standards, extended where necessary. We expect that no single data model is sufficient for both purposes.

Dark archive

The dark archive will be available only to project personnel and to the staff authorized by participating organizations, such as Central State Hospital and the State Library of Virginia.

The unit of archiving is a single scan. Metadata for the archive is designed for this level of granularity. The rationale for this decision is partly simplicity and partly operational facility.

Simplicity: The relationships between scans are difficult to ascertain without studying the scans themselves. Scans are numbered sequentially, and in some cases certain scans are more closely related to one another than the sequential order suggests.

Operational facility: Scan file sizes range between 20 MB and 250 MB. Bundling scans into larger groupings for archiving would require downloading several gigabytes at a time before accessing any information from the archive. Supporting granular access will enable future patrons or software to retrieve data quickly and then bundle it as needed for particular goals.

The metadata will use a hybrid, standards-based data model, with classes that cluster the metadata developed for different purposes: descriptive, administrative, technical, rights, and preservation.
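As a rough illustration, a per-scan metadata record under this model might cluster fields as sketched below in Python. All field names are assumptions for illustration; the real model would draw its elements from the underlying standards (for example, Dublin Core for description and PREMIS for preservation).

    from dataclasses import dataclass, field

    @dataclass
    class Descriptive:
        title: str = ""
        source_collection: str = ""

    @dataclass
    class Administrative:
        scan_operator: str = ""
        scan_date: str = ""           # ISO 8601 date string

    @dataclass
    class Technical:
        file_format: str = ""         # e.g., "image/tiff"
        file_size_bytes: int = 0

    @dataclass
    class Rights:
        access_restriction: str = ""  # e.g., "project personnel only"

    @dataclass
    class Preservation:
        checksum_sha256: str = ""     # fixity value for integrity checks

    @dataclass
    class ScanMetadata:               # one record per scan, the unit of archiving
        scan_id: str
        descriptive: Descriptive = field(default_factory=Descriptive)
        administrative: Administrative = field(default_factory=Administrative)
        technical: Technical = field(default_factory=Technical)
        rights: Rights = field(default_factory=Rights)
        preservation: Preservation = field(default_factory=Preservation)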

Digital library

The data model for the CSH digital library will be based on standards such as the Portland Common Data Model (PCDM); a sketch of its core classes follows the links below. Documentation for PCDM can be found at the following URIs:

  • Duraspace PCDM wiki: https://github.com/duraspace/pcdm/wiki
  • Portland Common Data Model (PCDM): Creating and Sharing Complex Digital Objects (slides): http://www.slideshare.net/kestlund/portland-common-data-model-pcdm-creating-and-sharing-complex-digital-objects
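To make the PCDM structure concrete, here is a minimal sketch, in Python, of its three core classes and their documented relationships: pcdm:Collection, pcdm:Object, and pcdm:File, linked by pcdm:hasMember and pcdm:hasFile. The class and instance names below are illustrative assumptions, not actual CSH repository content.

    from dataclasses import dataclass, field

    @dataclass
    class PCDMFile:                   # pcdm:File: a sequence of bytes
        filename: str

    @dataclass
    class PCDMObject:                 # pcdm:Object: an intellectual entity
        title: str
        files: list = field(default_factory=list)    # pcdm:hasFile
        members: list = field(default_factory=list)  # pcdm:hasMember (sub-objects)

    @dataclass
    class PCDMCollection:             # pcdm:Collection: a grouping of objects
        title: str
        members: list = field(default_factory=list)  # pcdm:hasMember

    # Hypothetical example: a multi-page document whose page images are
    # files, grouped into a collection.
    page = PCDMFile(filename="scan_0001.tiff")
    ledger = PCDMObject(title="Example ledger", files=[page])
    library = PCDMCollection(title="CSH digital library (example)", members=[ledger])

Modeling each archival document as a pcdm:Object with its scans attached as pcdm:File resources would let the digital library preserve the scan-level granularity chosen for the dark archive.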