From: Arthur S Bridges <Arthur_S_Bridges@progressive.com>
To: "Peter Hunsberger" <peter.hunsberger@gmail.com>
Date: Thu, 26 Apr 2007 10:46:02 -0400
I didn't want to get too descriptive of the project as it involves security assets. Basically, this Security Classification Dictionary is designed to track security assets (servers, network IDs and resources, datasets, printer profiles, access control lists [ACLs], and other such resources), what they are (a description, and whether they are part of an application), any security model information, and who owns them (i.e., who can grant access to them).
It's also designed to interface and synchronize with existing lists in DB2 and other database types, and eventually to supersede most of them.
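To make that concrete, a single dictionary entry might look something like the sketch below; the element names and values are purely illustrative, not our actual schema:

    <asset id="A-00123" type="server">
      <name>payroll-db-01</name>
      <description>Production database server for payroll</description>
      <application>Payroll</application>
      <securityModel>ACL-based</securityModel>
      <owner canGrantAccess="true">
        <name>J. Smith</name>
        <personnelId>E12345</personnelId>
      </owner>
    </asset>

The owner element is the part that has to be kept in sync with personnel changes, and the id is what we would key on when synchronizing against the existing DB2 lists.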
Arthur Scott Bridges
IT Security — SCD
"Peter Hunsberger"
<peter.hunsberger@gmail.com>
04/26/2007 10:30 AM
To
"Arthur S Bridges" <Arthur_S_Bridges@progressive.com>
cc
xml-dev@lists.xml.org, arthursb73@gmail.com
Subject
Re: [xml-dev] XML and databases
On 4/26/07, Arthur S Bridges <Arthur_S_Bridges@progressive.com> wrote:
>
> I am working on a large data classification dictionary in XML and I am wondering at what point I need to use a database as a back end.
>
> Project Profile:
> - From a 'table' point of view, I have about a dozen fields and 25,000 records, which we project to grow to 250,000+.
> - Each record contains resource profiles and descriptions as well as ownership information, which needs to be kept up to date with personnel changes.
> - We plan to use XQuery/XSLT as well as C# for access/update programming.
>
> Has anyone out there had to deal with a project of this size?
>
> We are planning to run a web service for query and update functions.
>
From your description it's sort of hard to tell what you're doing. We have a large metadata-driven system for collecting medical research data which may match up well with your "data classification dictionary"? We use XSLT to customize the data for about 70 different medical protocols, each of which needs different presentation and different fields, but a web-service query and update layer would have a similar CRUD profile. The back end is Java with EJB.
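The per-protocol customization is conceptually simple: pick a stylesheet (or stylesheet parameters) out of the metadata at request time and apply it to the record XML. Here's a minimal sketch of that idea using the standard javax.xml.transform API; the file layout and names are made up for the example, not our actual code:

    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    import java.io.*;

    public class ProtocolRenderer {
        private final TransformerFactory factory = TransformerFactory.newInstance();

        // Transform one record for one protocol. In a real system the
        // stylesheet would come out of a metadata store rather than a
        // file-naming convention, and compiled Templates would be cached.
        public void render(String protocolId, File recordXml, Writer out)
                throws TransformerException {
            Source xslt = new StreamSource(new File("protocols/" + protocolId + ".xsl"));
            Transformer t = factory.newTransformer(xslt);
            t.setParameter("protocol", protocolId); // stylesheets can branch on this
            t.transform(new StreamSource(recordXml), new StreamResult(out));
        }
    }

Swap in a different .xsl and the same record renders with different fields and presentation; that's the whole trick.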
Currently we track what would be about 16,000 columns in a conventional schema and have the equivalent of about 5,000,000 active rows of data in a conventional schema. We expect to grow another order of magnitude in size over the next two years, and we may start managing gene and phenome data, in which case we expect perhaps another two orders of magnitude of growth. We actually use about 30 tables in a highly normalized, proprietary EAV-type pattern for the actual relational store behind the scenes. We dynamically assemble the presentation based on authorizations and the metadata at run time.
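If you haven't run into the EAV (entity-attribute-value) pattern: instead of one physical column per logical field, you keep the field definitions in metadata and the values in narrow rows keyed by entity and attribute. A stripped-down illustration with invented table and column names (a real design also needs typing, versioning, and authorization tables, which is roughly where our ~30 tables come from):

    import java.sql.*;

    public class EavExample {
        // Fetch every attribute/value pair for one entity by joining the
        // attribute metadata table to the narrow value table. Table and
        // column names here are invented for the sketch.
        static void dumpEntity(Connection con, long entityId) throws SQLException {
            String sql =
                "SELECT a.name, v.value_text " +
                "FROM attribute a JOIN entity_value v ON v.attribute_id = a.id " +
                "WHERE v.entity_id = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, entityId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " = " + rs.getString(2));
                    }
                }
            }
        }
    }

Adding a new field is then an insert into the attribute table rather than an ALTER TABLE, which is what lets the schema change daily without touching the physical database.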
For most of the conventional access, response times are subsecond, though some complex hierarchical searches can run into the 20-second range. Our actual system is highly resource intensive, but it seems necessary given that the presentation and schema requirements change daily (new protocols, protocol revisions, etc.). Bottom line: I can't see why you would have any problems managing the volumes you're talking about if you design the system properly.