Towards a Standardization of Database Metadata in XML
Wednesday, 30 August 2006
Having knowledge of how you database is
structured is not easy. It is often available through complex tools accessible
to DBA (Database Administrator), not even developers. Management may not have
access to those tools. This structure is data about the data, it is called metadata.
This article introduces the needs for a XML dialect to
store the database's metadata.
Why is it important?
In our times, people want to measure
impacts more closely than ever. What happens when my supplier delivers updates
of the software? What impacts has this change on your database? What impacts
has it on your data?
ITIL (IT Infrastructure Library) recommends
that changes should be tracked in the CMDB (Configuration Management Database).
Other recommendations, including Sarbanes-Oxley, impose such archives. And
simply, it is a best practice with a lot of common sense.
The problem is that there is no standard
way to access, store & exploit this information.
Why is it important to have a standard?
It is important to have a standard way of
storing those elements because most companies, those days, have multiple
databases and multiple vendors supplying the data servers itself. It is not
rare to find a shop with an ERP on Oracle, a retail application on Informix and
a few web sites handling data from MySQL. Nobody is shocked.
Tools to access all those databases exist
from Quest, Embarcadero, etc. But a lot of tools remain proprietary (like Command
Center of DB2) or dedicated such as AGS ServerStudio for Informix…
This is the purpose of an XML dialect for
storing database schema (aka database structure or database metadata).
Why not DDL (Data Definition Language)?
DDL uses SQL (Structured Query Language) and
is proprietary to every vendor, and may evolve over time. DDL is not easy to
parse, unless you are fluent in EBNF (Extended Backus–Naur Form).
Seeing differences between two versions of
DDL can be done by using the Unix diff command, which results is not very
user-intuitive. Just try to bring the result of a diff over 2 DDL files, to
your CIO (Chief Information Officer), for fun.
Requirements
Be extensible: all databases are different.
Storing the different capabilities is not the goal, but it is critical to see
where things differ and to be sure that you have a format that follows the
evolution and your data server assets.
Proposal
Based on all those constraints & needs,
we propose DBMXML, for DataBase
Metadata eXtensible Mark-up Language.
What benefits can we expect of it?
Offer
a vision of a database. Applying a simple XSL
(eXtensible Stylesheet Language) filter will give you HTML, PDF or even Word
reports of what your database looks like.
Comparing
two versions of the database. Going from
development to production? What impacts has the new development on my
production database (yes, all this should be documented, but real life is often
catching on us).
Keeping
track of the evolution. What is the difference
between my production, testing and development database?
Improve
consistency. If your customers’ last name is 32
characters long in your ERP and only 10 characters on the web, you have a
consistency problem which may result in data loss or alteration at some time.
Measure
the impact of changes. If the result of the
comparison of 2 version of your schema reflects that a field gets from 40 to 50
characters long has different consequences that adding a column that does not
allow NULL values.
More benefits
For
the CIO: have a permanent, consistent and historic
vision of its databases.
For
the project manager: have a consistent way to
discuss and share information about database structure to his developers.
For
the developer: Developers will be more cautious and aware about
changes at the database level and will adapt the source code accordingly in a
safer way.
For
the DBA: increased visibility of the database,
ability to build and use more consistent scripts, tools, etc.
Extensions
A variety of tools can be build on top of
DBMXML: code generator, documentators, graphers, productivity scripts,
administration tools…
Practical applications of such a language
XML2DDL or DdlUtils uses an XML dialect to
create the DDL for creating or altering a database. Awoma proposed a framework,
called ThinStructure, where it dynamically used an XML repository for building
records, SQL statements, checking integrity…
Why not reuse someone else’s work?
We are not affected by the NIH (Not
Invented Here) syndrome, but after very extensive search, over time, we did not
find anything that would fulfill the needs. DBMXML is being designed with
transformation in mind, meaning that others can use it.
Similar work
XML2DDL is a project to transform an XML file to DDL, currently for PostgreSQL,
MySQL, Oracle and Firebird. It does not define a standard XML dialect. Problem:
XML is not documented. Project lead: Scott Kirkwood, scott at forusers dot com.
More details at http://xml2ddl.berlios.de.
DdlUtils is an Apache project focusing on delivering an easy-to-use
component for working with Database Definition (DDL) files. These are XML files
that contain the definition of a database schema, e.g. tables and columns. Problem:
XML is documented by a DTD but limited (e.g. no information on stored
procedure, no “extension” possible…). Project lead: Thomas Dudziak, tomdzk at
gmail dot com. More details at http://db.apache.org/ddlutils.
What’s next?
The next steps will be to design the XML
dialect that would achieve DBMXML.