   A utility to make msxsl more useful

  • From: Andrew Bunner <bunner@massquantities.com>
  • To: xml-dev@ic.ac.uk
  • Date: Fri, 04 Sep 1998 14:39:12 -0700


  I wrote a small Perl script that can be used to preprocess XML files
before sending them to msxsl. Why might you want to do this? So you can
expand ENTITY references and do something like <INCLUDE
HREF="included_file.xml"/>

  It's very basic and very small so I just attached it to this message for
anyone who's interested.

  Here's the syntax for using it from the DOS command prompt...

C:\<your path to Perl>\Perl.exe expand.pl myfile.xml > temp.xml
msxsl -i temp.xml -s myfile.xsl -o output.html

  myfile.xml can define entities in its internal or external DTD with
<!ENTITY entityname 'VALUE'> or <!ENTITY entityname SYSTEM 'filepath'>. You
can use single or double quotes.
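
  To illustrate the two declaration forms, here is a minimal sketch (not the
attached script itself) that pulls entity names out of a sample document held
in a string. The sample document, the variable names, and the file name
chapter1.xml are all made up for the example, and SYSTEM entities are only
recorded as paths here rather than read from disk.

```perl
use strict;
use warnings;

# Hypothetical sample document with one internal and one SYSTEM entity.
my $doc = <<'XML';
<!DOCTYPE book [
<!ENTITY company "Acme">
<!ENTITY chapter1 SYSTEM 'chapter1.xml'>
]>
<book>&company; &chapter1;</book>
XML

my (%values, %paths);

# <!ENTITY name 'VALUE'> with single or double quotes
while ($doc =~ /<!ENTITY\s+(\w+)\s+(["'])(.*?)\2\s*>/gs) {
    $values{$1} = $3;
}

# <!ENTITY name SYSTEM 'filepath'> with single or double quotes
while ($doc =~ /<!ENTITY\s+(\w+)\s+SYSTEM\s+(["'])(.*?)\2\s*>/gs) {
    $paths{$1} = $3;
}

print "value: $_ = $values{$_}\n" for sort keys %values;
print "file:  $_ = $paths{$_}\n"  for sort keys %paths;
```

  The backreference \2 makes the closing quote match the opening one, so a
value quoted with double quotes may itself contain a single quote.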

  I also made it so you can include a file by saying <INCLUDE
HREF="filetoinclude"/>
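
  As a rough sketch of how such an include can be expanded, assuming the
included text may itself contain further INCLUDE elements: the %files table
and the fetch() helper below are hypothetical stand-ins for reading files from
disk, and the file names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for files on disk: maps a path to its text.
my %files = (
    'header.xml' => '<TITLE>Catalog</TITLE>',
    'body.xml'   => '<ITEM>one</ITEM><INCLUDE HREF="footer.xml"/>',
    'footer.xml' => '<NOTE>End</NOTE>',
);

# Hypothetical helper: return the text for a path, or die if it is unknown.
sub fetch {
    my ($path) = @_;
    die "No such file: $path\n" unless exists $files{$path};
    return $files{$path};
}

my $xml = q{<DOC><INCLUDE HREF="header.xml"/><INCLUDE HREF='body.xml'/></DOC>};

# Replace one <INCLUDE HREF="..."/> at a time and rescan from the start,
# so that includes nested inside included text are expanded too.
1 while $xml =~ s{<INCLUDE\s+HREF=(["'])(.*?)\1\s*/>}{fetch($2)}se;

print "$xml\n";
```

  Note that a file which includes itself, directly or indirectly, would make
this loop run forever; the sketch does not guard against that.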

  Basically, I'm trying to find ways to make msxsl usable now. I was sort
of hoping some Java programmers would leap to the rescue and turn msxml (or
some equivalent parser) into a type of preprocessor for msxsl but, failing
that, I worked up a quick and dirty way to do what I want. Hopefully someone
else will find it useful.


main();

sub main {
	$xml = (&readFile($ARGV[0]));
    %externalEntities = &parseExternalDTD($xml);
    %internalEntities = &parseInternalDTD($xml);
    my($moreToGo) = (1);
    while ($moreToGo) {
    	$moreToGo = &expandEntities(%externalEntities, %internalEntities) | &expandLinks(%externalEntities, %internalEntities);
	}
    print $xml;
}

# $_[0] = file name or path
# returns full text of file
sub readFile {
	my($contents);
	my(@fileInfo) = stat($_[0]);
	open(F, $_[0]) or die "Couldn't open $_[0]\n";
	read F, $contents, $fileInfo[7];
	close(F);
    return $contents;
}

# $_[0] full text of an XML document
# returns hash of external entities and what they reference
sub parseExternalDTD {
	# Looking for...  <!DOCTYPE foo SYSTEM 'bar.dtd'>
	unless ($_[0] =~ /<!DOCTYPE\s+\w+\s+SYSTEM\s+['"]([^"']+)/) {
    	return ();    # empty list, since the caller assigns the result to a hash
    }
    my($dtdPath) = ($1);
    my($dtd) = &readFile($dtdPath);
    my(%entities) = (&extractEntities($dtd));
    return %entities;
}

# $_[0] full text of XML document
# returns hash of internally defined entities and what they reference
sub parseInternalDTD {
	my(%entities) = (&extractEntities($_[0]));
    return %entities;
}

# $_[0] text, possibly containing <!ENTITY> declarations
# returns entity hash of names and values
sub extractEntities {
	my($text) = $_[0];
	my(%entities);
    my($entityName, $entityPath);
    # Looking for <!ENTITY foo 'bar'> or <!ENTITY foo SYSTEM 'bar'>
    while ($text =~ /<!ENTITY/) {
    	if ($text =~ s/<!ENTITY\s+(\w+)\s+['"]([^'"]*)['"]\s*>//s) {
        	$entities{$1} = $2;
		} elsif ($text =~ s/<!ENTITY\s+(\w+)\s+SYSTEM\s+['"]([^'"]+)['"]\s*>//s) {
        	($entityName, $entityPath) = ($1, $2);
            $entities{$entityName} = &readFile($entityPath);
		} else {
        	# Only unsupported declarations remain (e.g. parameter entities);
        	# stop here instead of looping forever
        	last;
		}
	}
    return %entities;
}

# @_ is a hash of entities and what they expand to
# works on global variable $xml searching for &foo; references
# returns true if it was able to make any replacements
sub expandEntities {
	my(%entities) = @_;
    my($gotOne) = (0);
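    # Only expand declared entities; leave others (e.g. &amp;) untouched
    foreach my $name (keys %entities) {
    	$gotOne = 1 if $xml =~ s/\&\Q$name\E;/$entities{$name}/g;
    }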
    return $gotOne;
}

sub expandLinks {
	my($gotOne) = (0);
	# We're looking for... <INCLUDE HREF="foo"/>
    # This is not a complete implementation! A real XML processor would
    # look for any type of link that's defined to have SHOW="EMBED" and ACTUATE="AUTO"
    # ...but that's too much work for what I'm after
    while ($xml =~ s/<INCLUDE\s+HREF=["']([^"']+)["']\/>/&readFile($1)/se) {
    	$gotOne = 1;
	}
    return $gotOne;
}

-- Andrew

   Andrew Bunner
   President, Founder Mass Quantities, Inc.
   Professional Supplements for the Perfect Physique
   http://www.massquantities.com 




 
