
Re: [xml-dev] Expert's advice needed about XML Schema and defining some k


On Thu, 2003-12-04 at 14:00, Peter Glantschnig wrote:
<snip>
> I will try to explain the main problem. Let's say you have two XML
> files. One stores publications and the other one stores some names of
> persons. Now each person is responsible for a couple of publications.
> Now I want to make sure that this relation is always true by using XML
> Schema. So when you enter a new publication, you should not be able to
> assign a person to that publication, which can not be found in the
> persons XML file. So at least when you validate the publications XML
> file you should get an error.

If I had this problem, I would probably solve it using XLink.
Using XLink, or some other linking approach, it is not necessary to
change the publications schema every time a new person is added to, or
removed from, the persons document.

You may have one author associated with several publications, but I
assume the reverse is also true. While it is possible to create the
links using Simple XLink, Extended XLink would seem to offer a more
natural solution in this case, because Extended XLink supports both
multi-ended and out-of-line links.

The links could either be inline, i.e. the links are inside the two
files, or out of line, which means the links would be defined in a
separate document. (Topic maps use the latter approach.)

Personally, I prefer going with inline links when it is possible to edit
the source document at will. It is usually (but not always) a bit easier
to implement applications that way.
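
To make the out-of-line case a bit more concrete, here is a minimal
sketch of a separate link document that ties one person to two
publications. The element names, file names (persons.xml,
publications.xml) and IDs are all made up for illustration; only the
xlink:* attributes come from XLink. The Perl around it just parses the
link document and lists the remote ends of the locators.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical out-of-line link document: element names, file names
# and IDs are invented; only the xlink:* attributes are XLink.
my $links_xml = <<'XML';
<links xmlns:xlink="http://www.w3.org/1999/xlink">
  <responsibility xlink:type="extended">
    <person xlink:type="locator" xlink:label="person"
            xlink:href="persons.xml#p1"/>
    <publication xlink:type="locator" xlink:label="pub"
                 xlink:href="publications.xml#pub1"/>
    <publication xlink:type="locator" xlink:label="pub"
                 xlink:href="publications.xml#pub2"/>
    <go xlink:type="arc" xlink:from="person" xlink:to="pub"/>
  </responsibility>
</links>
XML

my $xlink_ns = 'http://www.w3.org/1999/xlink';
my $doc = XML::LibXML->load_xml(string => $links_xml);

# Bind the xlink prefix explicitly for the XPath expression.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('xlink', $xlink_ns);

# Collect the remote end of every locator in the link document.
my @hrefs = map { $_->getAttributeNS($xlink_ns, 'href') }
            $xpc->findnodes('//*[@xlink:type="locator"]');
print "$_\n" for @hrefs;
```

Neither persons.xml nor publications.xml has to change when a
responsibility is added or removed; only the link document does.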

Instead of validating against a schema, you would have to check the
links. This is fairly easy though: walk through the link elements, yank
the URIs and see if there is anything at the other end. Given Perl and
LibXML you could do something like this:

use LWP::Simple;
use URI;
use XML::LibXML;
...
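# Return every element carrying an xlink:href whose target cannot be found.
sub find_broken_locators {
  my $doc_element = shift;
  my $xlink_ns = 'http://www.w3.org/1999/xlink';
  # Bind the xlink prefix explicitly, so the XPath expression works
  # whatever prefix the document itself uses.
  my $xpc = XML::LibXML::XPathContext->new($doc_element);
  $xpc->registerNs('xlink', $xlink_ns);
  my @broken_locators;
  foreach my $locator ($xpc->findnodes('//*[@xlink:href]')) {
    my $uri = $locator->getAttributeNS($xlink_ns, 'href');
    find_link_end($uri) or push @broken_locators, $locator;
  }
  return @broken_locators;
}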

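my %document_cache;
# Return the element a URI points at, or undef if it cannot be found.
sub find_link_end {
  my $uri = URI->new(shift);
  # Everything except the fragment addresses the target document.
  my $base_uri = $uri->scheme().'://'.$uri->authority().$uri->path();
  # exists(), not defined(): failed downloads are cached as undef too.
  unless (exists($document_cache{$base_uri})) {
    my $document_string = get($base_uri);
    $document_cache{$base_uri} =
      eval{XML::LibXML->new()->parse_string($document_string)} ||
      undef;
  }
  return undef unless $document_cache{$base_uri};
  my $document = $document_cache{$base_uri};
  my $id = $uri->fragment();
  # A link without a fragment identifier is treated as broken here.
  return undef unless defined($id);
  # Only finds IDs the parser knows about (declared in a DTD, or xml:id).
  my $target_element = $document->getElementById($id);
  return $target_element;
}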
...

It has been quite some time since I did something like this, so I'm sure you can
find a bug or three. Also, it would be necessary to add a bit more error
handling in real life. I hope the principle is clear though.

find_broken_locators() iterates over a list of XLink elements and calls
find_link_end() for each one. If a link target is _not_ found, the link
element is added to a list of broken links. After checking all links,
the function returns a list of broken links.

find_link_end() extracts the base URI (well, URL, really) and downloads
the target document. Since we will want to check the same target
document many times, it is cached. This saves a lot of wear and tear on
get() and the parser. If we didn't find a target document, or could not
parse it, the function returns undef. If we did parse the target
document successfully, we get the fragment identifier, assumed to be the
value of an ID attribute, locate the element node with that ID value,
and return the element node.
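
To try the principle without a network connection, something like the
following sketch could be used. It is a simplified variant, not the
code above: the download with get() is replaced by a lookup in a hash
of in-memory documents (%source_for, hypothetical), the example.org
URLs and document contents are invented, and the persons use xml:id so
that the parser knows the IDs without a DTD.

```perl
use strict;
use warnings;
use URI;
use XML::LibXML;

my $xlink_ns = 'http://www.w3.org/1999/xlink';

# Stand-in for the downloaded persons document (hypothetical content).
my %source_for = (
  'http://example.org/persons.xml' => <<'XML',
<persons>
  <person xml:id="p1">Peter</person>
  <person xml:id="p2">Henrik</person>
</persons>
XML
);

my %document_cache;
sub find_link_end {
  my $uri = URI->new(shift);
  my $id = $uri->fragment();
  $uri->fragment(undef);    # what is left addresses the document
  my $base_uri = $uri->as_string();
  # Parse each target document at most once, even if parsing fails.
  unless (exists $document_cache{$base_uri}) {
    $document_cache{$base_uri} = eval {
      XML::LibXML->load_xml(string => $source_for{$base_uri});
    };
  }
  my $document = $document_cache{$base_uri};
  return undef unless $document && defined $id;
  return $document->getElementById($id);
}

# Publications pointing at persons; B and C are deliberately broken.
my $publications = XML::LibXML->load_xml(string => <<'XML');
<publications xmlns:xlink="http://www.w3.org/1999/xlink">
  <publication title="A" xlink:type="simple"
               xlink:href="http://example.org/persons.xml#p1"/>
  <publication title="B" xlink:type="simple"
               xlink:href="http://example.org/persons.xml#p9"/>
  <publication title="C" xlink:type="simple"
               xlink:href="http://example.org/missing.xml#p1"/>
</publications>
XML

my $xpc = XML::LibXML::XPathContext->new($publications);
$xpc->registerNs('xlink', $xlink_ns);
my @broken = grep { !find_link_end($_->getAttributeNS($xlink_ns, 'href')) }
             $xpc->findnodes('//*[@xlink:href]');
print 'Broken: ', $_->getAttribute('title'), "\n" for @broken;
```

Publication B points at a person that does not exist, and C at a
document that cannot be loaded, so those two should be the ones
reported.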

In real life I would probably go for a more object-oriented solution
(still Perl though, if I have a choice). I would not try to implement
link checking in XSLT, even though it is possible to do it.

/Henrik





 
