Integrating Data and Information Management

Reagan W. Moore

Published 2004, SCEC Contribution #903

A common approach is emerging for the management, analysis, and preservation of digital data. The approach separates three layers: content management (the digital entities), context management (provenance, authenticity, and administrative metadata), and knowledge management (concept spaces for asserting relationships between metadata attributes). For each layer, virtualization mechanisms are being developed to manage distributed data. Storage repository virtualization allows data to be stored across a wide variety of storage systems. Once data is distributed across multiple systems, data virtualization imposes a common namespace for context management. Information repository virtualization allows the context to be stored in arbitrary database technology. Once the context is distributed across multiple databases, information virtualization is required to define relationships between the different attribute names chosen in each database. These relationships are stored in a knowledge repository as a concept space or ontology.
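The layering described above can be illustrated with a minimal Python sketch. It is not drawn from the poster or from any particular data grid software; every name in it (`InMemoryStore`, `LogicalNamespace`, `CONCEPTS`, `find`) is hypothetical, and the in-memory dictionaries merely stand in for real storage systems and metadata databases. The sketch assumes one uniform `put`/`get` interface per storage system (storage repository virtualization), a logical namespace that hides physical locations and carries the context (data virtualization), and a small concept map that relates differently named attributes (information virtualization).

```python
from dataclasses import dataclass, field


class InMemoryStore:
    """Hypothetical stand-in for one storage system behind a uniform interface."""

    def __init__(self):
        self._blobs = {}

    def put(self, physical_path, data):
        self._blobs[physical_path] = data

    def get(self, physical_path):
        return self._blobs[physical_path]


@dataclass
class LogicalNamespace:
    """Maps location-independent logical names to (store name, physical path)."""

    stores: dict = field(default_factory=dict)    # store name -> store object
    replicas: dict = field(default_factory=dict)  # logical name -> (store, path)
    context: dict = field(default_factory=dict)   # logical name -> metadata

    def register(self, logical_name, store_name, physical_path, data, **metadata):
        # Content goes to the chosen store; context stays with the logical name.
        self.stores[store_name].put(physical_path, data)
        self.replicas[logical_name] = (store_name, physical_path)
        self.context[logical_name] = dict(metadata)

    def read(self, logical_name):
        # Callers never see which store or physical path holds the data.
        store_name, physical_path = self.replicas[logical_name]
        return self.stores[store_name].get(physical_path)


# Assumed concept map: different attribute names that denote the same concept.
CONCEPTS = {"creator": "author", "author": "author"}


def find(namespace, concept, value):
    """Return logical names whose metadata matches a concept, whatever its local name."""
    return sorted(
        name
        for name, attrs in namespace.context.items()
        if any(CONCEPTS.get(key, key) == concept and val == value
               for key, val in attrs.items())
    )


ns = LogicalNamespace(stores={"disk": InMemoryStore(), "tape": InMemoryStore()})
ns.register("/proj/run1.dat", "disk", "/mnt/a/0001", b"abc", creator="moore")
ns.register("/proj/run2.dat", "tape", "vol7:42", b"xyz", author="moore")
matches = find(ns, "author", "moore")
```

Here the two entities live on different stores and describe their originator with different attribute names, yet both are read through the same logical namespace and both are found by a single query on the shared concept.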

This approach is being used to implement data grids for the sharing of data, digital libraries for the publication of data, and persistent archives for the preservation of data and the management of technology evolution. Data grids provide the fundamental virtualization mechanisms needed for managing distributed data, and both digital libraries and persistent archives can be built on top of them. Multiple examples will be given of projects managing distributed data. Research areas will be identified, including the federation of data grids and the integration of knowledge management with data management. The goal is to build a data management environment in which all aspects of data discovery, manipulation, and preservation can be automated and driven from an application.

Citation
Moore, R. W. (2004). Integrating Data and Information Management. Poster Presentation at International Supercomputing Conference.