An Architecture of a hypermedia DBMS supporting physical data independence

Authors
T. Prückler, M. Schrefl
Paper
Prue96c (1996)
Citation
(Preliminary version of "Achieving physical data independence in hypermedia databases")
Resources
Copy (To obtain a copy, please send an email with the subject Prue96c to dke.win@jku.at)

Abstract

As large amounts of hypermedia data are collected, the question of how to store this data in a hypermedia database arises. Hypermedia databases have to deal with a host of new problems, and problems already solved for conventional databases resurface in a different form or demand a different solution. The goal of the hypermedia database we are creating is to support typical hypermedia applications with highly interlinked pieces of information in different media. One of the problems encountered is how to separate applications from the data so that the same data can be used by many applications and changes to the data organization have no effect on applications other than on their run-time. In conventional databases this problem is solved by the concepts of physical and logical data independence.

In this article we argue that the concept of physical data independence can be applied beneficially to hypermedia databases. So far, while the demand for physical data independence has been voiced in a number of papers, only partial solutions have been proposed. We will describe the part of a hypermedia database management system which deals with physical data independence and show how an existing database management system can be extended in that direction. In particular, we will present a DataBlade for Illustra, extending Illustra with physical data independence for images. Illustra is a commercial version of Postgres [Stonebraker and Kemnitz, 1991] and provides extensibility by DataBlades, which group types, tables and functions.

Applying physical data independence helps to achieve the following goals:

*Easier reuse of multimedia objects
Often different applications use the same multimedia object (mm object), but each application requires a slightly different variation of the object. Variations needed include, e.g., different segments of the object, different encodings, and transformations (for images, e.g., gamma correction and effect filters). These variations are stored separately, which creates uncontrolled redundancy. A mechanism is needed to provide these variations in a consistent and efficient manner and to control the redundancy.
The term "variation" was chosen in contrast to "derivation" as used, e.g., in [Gibbs et al., 1994], where it describes both a view mechanism and a mechanism to achieve physical data independence.

*Higher performance
Operations needed to transform mm objects often take too long for interactive presentations. So even if several applications share an mm object to achieve reuse as described above, the need to store different variations to avoid transformations on demand remains. A mechanism is needed to instantiate and control the use of customized variations of an mm object.

The concepts described in the article achieve the goals presented above through the following means:

For each mm object there is exactly one conceptual representation. Applications use sequences of operators (called queries) on the conceptual object to derive a concrete representation of the mm object, e.g., an image of a certain size and resolution.

By default, such representations are created on the fly. To accelerate the execution of a query, instantiations of its result or of intermediate results (called "secondaries") can be created. These secondaries are either concrete variations which are expected to be used frequently or variations of the mm object to which certain (time-expensive) operators of the query language have already been applied.

Whenever a query is sent to the conceptual object, a query optimizer first reformulates the query into a canonical format and then selects the primary or secondary representation which can be converted to the desired output with the least effort. To bring a query into the canonical format, the operators of the query are divided into classes such that operators within a class are either commutative or mutually exclusive, i.e., only one operator of the class can be meaningfully applied. For images in Illustra, such classes are, for example, preparers (e.g., crop, trunc), enhancers (e.g., edge, enhance), transformers (e.g., scale, gamma, rotate), effect-filters (e.g., oil, bentley), and converters (e.g., to gif or postscript).
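The canonicalization step can be illustrated with a minimal Python sketch. It assumes that the classes are applied in the order in which they are listed above and that operators within a class may be sorted by name because they commute; neither the order nor the names `CLASS_ORDER`, `OPERATOR_CLASS` and `canonicalize` are taken from the paper.

```python
# Assumed class order: the abstract names the classes but does not
# state that this listing order is the canonical one.
CLASS_ORDER = ["preparer", "enhancer", "transformer", "effect-filter", "converter"]

# Operator names follow the abstract's examples; the mapping is illustrative.
OPERATOR_CLASS = {
    "crop": "preparer", "trunc": "preparer",
    "edge": "enhancer", "enhance": "enhancer",
    "scale": "transformer", "gamma": "transformer", "rotate": "transformer",
    "oil": "effect-filter", "bentley": "effect-filter",
    "to_gif": "converter", "to_postscript": "converter",
}


def canonicalize(query):
    """Return the operators of `query` ordered by class, then by name.

    `query` is a list of (operator_name, argument) pairs. Operators of the
    same class are assumed to commute, so reordering them is harmless.
    """
    def sort_key(op):
        name, arg = op
        # repr(arg) gives a deterministic tie-break for mixed argument types.
        return (CLASS_ORDER.index(OPERATOR_CLASS[name]), name, repr(arg))

    return sorted(query, key=sort_key)
```

Two queries that differ only in the order in which their operators were written then map to the same canonical sequence, which makes it possible to compare a query against the operator sequences of stored representations.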

Thus not only the application for which the secondary was created benefits from the secondary, but also every application which can use the secondary as an intermediate step in creating the desired variation. The performance gain is achieved simply by instantiating the secondary, without the need to change any application or the database schema. Likewise, deleting precomputed variations does not result in changes to applications or the schema, although the execution time of certain queries increases.

The following approach is used to extend Illustra with physical data independence: For each mm object there exists exactly one conceptual medium. Each conceptual medium has one or more internal media, which in turn contain one or more media chunks. Internal media thus conveniently model media that consist of several media chunks, which store the mm object.
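The containment hierarchy described above can be sketched as plain Python data classes. This is only an illustration of the structure, not the Illustra schema of the DataBlade; the class names, the `name` field and the helper `split_into_chunks` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaChunk:
    """One stored piece of media data."""
    data: bytes


@dataclass
class InternalMedium:
    """A stored representation made up of one or more media chunks."""
    chunks: List[MediaChunk]

    def content(self) -> bytes:
        # Reassemble the stored data from its chunks.
        return b"".join(chunk.data for chunk in self.chunks)


@dataclass
class ConceptualMedium:
    """The single conceptual representation of an mm object."""
    name: str  # hypothetical identifier, not from the paper
    internal_media: List[InternalMedium] = field(default_factory=list)


def split_into_chunks(data: bytes, chunk_size: int) -> List[MediaChunk]:
    """Hypothetical helper: cut raw data into fixed-size chunks."""
    return [MediaChunk(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```

For example, `ConceptualMedium("logo", [InternalMedium(split_into_chunks(raw, 4096))])` would model an image with a single internal medium, where `raw` holds the image bytes.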

Whenever a piece of multimedia data is inserted into the database, a conceptual medium is created. The media data is stored in media chunks belonging to an internal medium. The internal medium containing the original data is called the primary internal medium of the conceptual medium. To achieve higher performance, variations can be created which are derived from the primary internal medium and which are also stored as internal media with corresponding chunks. These derived internal media are called secondary internal media. The internal media belonging to a conceptual medium form a directed acyclic graph starting from the primary, where the edges are annotated with the operators which led to the creation of the secondary. The query optimizer uses this graph either to detect a precomputed variation which matches the query or to find the secondary which can be transformed into the result of the query with the least effort.
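The selection step can be sketched as follows. The sketch identifies each stored internal medium by the canonical operator sequence on its path from the primary (the empty sequence for the primary itself) and treats a stored medium as usable only if that sequence is a prefix of the query. Both this prefix rule and the per-operator costs in `COST` are simplifying assumptions made for illustration; the optimizer described in the paper may use a different matching rule and cost model.

```python
from typing import Dict, Sequence, Tuple

Query = Tuple[str, ...]

# Hypothetical per-operator costs in arbitrary units.
COST: Dict[str, float] = {"crop": 1.0, "scale": 5.0, "oil": 20.0, "to_gif": 2.0}


def remaining_cost(stored: Query, query: Query) -> float:
    """Cost of turning `stored` into `query`; infinite if it cannot be used."""
    if query[:len(stored)] != stored:
        return float("inf")
    return sum(COST[op] for op in query[len(stored):])


def choose_source(stored_media: Sequence[Query], query: Query) -> Query:
    """Pick the stored representation that needs the least additional work.

    `stored_media` must contain the primary, written as the empty tuple,
    so that every query has at least one usable source.
    """
    return min(stored_media, key=lambda stored: remaining_cost(stored, query))
```

A stored medium whose operator sequence equals the query has remaining cost zero, which corresponds to finding a precomputed variation that matches the query. Deleting a secondary from `stored_media` changes only the cost of the chosen source, not whether the query can be answered.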

There is no need to instantiate secondaries for all requested queries. If several queries share a common prefix followed by different additional operations, a secondary representing the intermediate result up to the end of this prefix can be used. This secondary can be shared by a number of queries to save space.
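As a small illustration of this idea, the following Python sketch computes the longest operator prefix shared by a set of canonical queries; a secondary materialized for that prefix could then serve all of them. The function name `common_prefix` and the representation of queries as tuples of operator names are assumptions made for the example.

```python
def common_prefix(queries):
    """Return the longest operator sequence that all queries start with."""
    prefix = []
    # zip stops at the shortest query, so the prefix never exceeds it.
    for operators in zip(*queries):
        if len(set(operators)) != 1:
            break
        prefix.append(operators[0])
    return tuple(prefix)
```

For the queries `("crop", "scale", "to_gif")` and `("crop", "scale", "oil")`, the shared prefix is `("crop", "scale")`, so one secondary holding the cropped and scaled image would be enough for both.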

We are implementing a proof-of-concept prototype supporting physical data independence using Illustra. As Illustra is an extensible database, we need not change its architecture; instead, we will add appropriate DataBlades for mm objects, starting with images. Further work includes the development of a design environment that also supports logical data independence.

Selected important references of the full paper are listed below:

[Campbell and Chung, 1995] Scott T. Campbell and Soon M. Chung. The role of database systems in the management of multimedia information. In Proceedings of the International Workshop on Multi-Media Database Management Systems'95, pages 4-11, Blue Mountain Lake, New York, August 1995. IEEE Computer Society Press.

[Campbell and Goodman, 1988] Brad Campbell and Joseph M. Goodman. HAM: A general-purpose hypertext abstract machine. Communications of the ACM, 31(7):856-861, July 1988.

[Gibbs et al., 1993] Simon Gibbs, Christian Breiteneder, and Dennis Tsichritzis. Audio/video databases: An object-oriented approach. In 9th Intl. Conf. on Data Engineering, pages 381-390, 1993.

[Gibbs et al., 1994] Simon Gibbs, Christian Breiteneder, and Dennis Tsichritzis. Data modeling of time-based media. In Proceedings ACM SIGMOD Conference on the Management of Data'94, pages 91-102. ACM, May 1994.

[Gu and Neuhold, 1993] Junzhong Gu and Erich J. Neuhold. A data model for multimedia information retrieval. In Proceedings of the First International Conference on Multi-Media Modeling, pages 113-127. World Scientific, Singapore, November 1993.

[Halasz and Schwartz, 1994] Frank Halasz and Mayer Schwartz. The Dexter hypertext reference model. Communications of the ACM, 37(2):30-39, 1994.

[Kacmar and Leggett, 1991] Charles J. Kacmar and John J. Leggett. PROXHY: A process-oriented extensible hypertext architecture. ACM Transactions on Information Systems, 9(4):399-419, 1991.

[Prückler and Schrefl, 1995] Thomas Prückler and Michael Schrefl. Modeling corresponding information content between multimedia data. In LIRMM Research Report Nr 95028 (preliminary proceedings of the International Workshop on Hypermedia Design 1995), pages 63-74, June 1995.

[Schnase et al., 1993] John L. Schnase, John J. Leggett, David L. Hicks, and Ron L. Szabo. Semantic data modeling of hypermedia associations. ACM Transactions on Information Systems, 11(1):27-50, 1993.

[Schek and Scholl, 1984] H.-J. Schek and M. Scholl. An algebra for the relational model with relation-valued attributes. Technical Report TR DVSI-1984-TI, Technical University of Darmstadt, Darmstadt, 1984.

[Stonebraker and Kemnitz, 1991] Michael Stonebraker and Greg Kemnitz. The POSTGRES next-generation database management system. Communications of the ACM, 34(10):78-93, October 1991.

[Tsichritzis and Klug, 1978] D. C. Tsichritzis and A. Klug. The ANSI/X3/SPARC DBMS framework report of the study group on database management systems. Information Systems, 3(3):173-191, 1978.

[Wiil, 1993] Uffe Kock Wiil. Experiences with hyperbase: A hypertext database supporting collaborative work. ACM SIGMOD RECORD, 22(4):19-25, December 1993.

[Woelk et al., 1986] Darrell Woelk, Won Kim, and Willis Luther. An object-oriented approach to multimedia databases. In Proceedings ACM SIGMOD Conference on the Management of Data, pages 311-325, May 1986.