Re: [xml-dev] What tool can convert the HTML from an Outlook email message into XHTML such that the XHTML is valid in Outlook?

Hi Roger,

SgmlReader by Chris Lovett (downloaded from https://github.com/lovettchris/SgmlReader and built with Visual Studio) gives us a handy command-line tool for tasks like this.

I tried it on the HTML file you provided:

sgmlreader mail.html mail.xhtml

The output file, mail.xhtml, does contain the xmlns:v="urn:schemas-microsoft-com:vml" namespace declaration. Here is the complete output:

<?xml version="1.0" encoding="us-ascii"?><html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
            <head>
                        <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii" />
                                    <meta name="Generator" content="Microsoft Word 15 (filtered medium)" />
                                                <style><!--
/* Font Definitions */
@font-face
            {font-family:"Cambria Math";
            panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
            {font-family:Calibri;
            panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
            {margin:0in;
            margin-bottom:.0001pt;
            font-size:11.0pt;
            font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
            {mso-style-priority:99;
            color:#0563C1;
            text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
            {mso-style-priority:99;
            color:#954F72;
            text-decoration:underline;}
span.EmailStyle17
            {mso-style-type:personal-compose;
            font-family:"Calibri",sans-serif;
            color:windowtext;}
.MsoChpDefault
            {mso-style-type:export-only;
            font-family:"Calibri",sans-serif;}
@page WordSection1
            {size:8.5in 11.0in;
            margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
            {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="" />
</o:shapelayout></xml><![endif]--></head><body lang="EN-US" link="#0563C1" vlink="#954F72"><div class="WordSection1"><p class="MsoNormal">Hello, world<o:p></o:p></p></div></body></html>
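
For anyone who wants to check this without reading the output by eye, here is a minimal Python sketch that lists the namespace declarations an XML parser sees in a converted file. It is not part of SgmlReader; the helper name declared_namespaces and the abbreviated sample (a stand-in for mail.xhtml) are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def declared_namespaces(xml_bytes):
    """Return {prefix: uri} for every namespace declaration in the document."""
    declared = {}
    # "start-ns" events yield (prefix, uri) pairs; the default namespace
    # is reported with an empty-string prefix.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        declared[prefix] = uri
    return declared


# Abbreviated stand-in for mail.xhtml (hypothetical, shortened from the output above).
sample = b"""<?xml version="1.0" encoding="us-ascii"?>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns="http://www.w3.org/TR/REC-html40">
<head><!--[if gte mso 9]><xml><o:idmap v:ext="edit" data="" /></xml><![endif]--></head>
<body><p class="MsoNormal">Hello, world<o:p></o:p></p></body></html>"""

namespaces = declared_namespaces(sample)
print(namespaces.get("v"))  # None would mean the converter dropped xmlns:v
```

On the sample this prints urn:schemas-microsoft-com:vml. Passing the bytes of the real mail.xhtml instead of the sample should work the same way, provided the file is well-formed XML.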



Cheers,
Dimitre


On Mon, Sep 30, 2019 at 10:40 AM Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

 

At the bottom of this message I show HTML that was produced by an Outlook email message (a “Hello, world” email message). The HTML has some interesting features. For example, it has a comment containing namespace-qualified elements and attributes:

 

<!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="" />
</o:shapelayout></xml><![endif]-->

 

The v namespace prefix is used only in that comment and nowhere else. I have tried a couple of tools that convert HTML to XHTML, and apparently they don’t look inside the comment, because they remove the namespace declaration:

 

<html xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:w="urn:schemas-microsoft-com:office:word"
            xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
            xmlns="http://www.w3.org/TR/REC-html40">
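
A rough way to find out which declared prefixes are at risk is to list those that appear only inside comments. The Python sketch below does this with regular expressions rather than a real parser, so it is an approximation; the function name prefixes_only_in_comments and the shortened sample are assumptions for illustration.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# A prefixed name such as <o:p>, </o:p> or v:ext, preceded by '<', '/' or whitespace.
PREFIXED_NAME = re.compile(r"[<\s/]([A-Za-z_][\w.-]*):[A-Za-z_]")
DECLARATION = re.compile(r"xmlns:([A-Za-z_][\w.-]*)\s*=")


def prefixes_only_in_comments(html):
    """Return declared prefixes that are used inside comments but nowhere else."""
    declared = set(DECLARATION.findall(html))
    in_comments = set()
    for comment in COMMENT.findall(html):
        in_comments.update(PREFIXED_NAME.findall(comment))
    outside = set(PREFIXED_NAME.findall(COMMENT.sub("", html)))
    return sorted((declared & in_comments) - outside)


# Shortened version of the Outlook HTML (hypothetical sample).
sample = '''<html xmlns:v="urn:schemas-microsoft-com:vml"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns="http://www.w3.org/TR/REC-html40">
<head><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="" />
</o:shapelayout></xml><![endif]--></head>
<body><p class=MsoNormal>Hello, world<o:p></o:p></p></body></html>'''

comment_only = prefixes_only_in_comments(sample)
print(comment_only)
```

On this sample it prints ['v']: the o prefix also occurs in the body (as o:p), so only v depends on the comment being inspected.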

 

I want to import the XHTML back into Outlook, but unfortunately after removing namespace declarations the XHTML is not valid as far as Outlook is concerned.

 

Is there a tool that can convert the HTML generated by Outlook to XHTML, such that the XHTML can be reimported into Outlook?

 

If no such tool exists, I will create my own tool. Would XSLT be suitable for such a task?  /Roger
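
One possible shape for a home-grown fix, sketched in Python rather than XSLT: run any HTML-to-XHTML converter, then copy back whatever namespace declarations it dropped from the root start tag. This is only a text-level sketch under stated assumptions; the function name restore_namespace_declarations and both sample strings are hypothetical, and whether Outlook accepts the result has not been tested.

```python
import re

HTML_START_TAG = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# Matches xmlns="..." and xmlns:prefix="..." attributes with double-quoted values.
XMLNS_DECL = re.compile(r'\s(xmlns(?::[A-Za-z_][\w.-]*)?)\s*=\s*"([^"]*)"')


def restore_namespace_declarations(original_html, converted_xhtml):
    """Re-add xmlns declarations present on the original <html> tag but missing after conversion."""
    original_tag = HTML_START_TAG.search(original_html)
    converted_tag = HTML_START_TAG.search(converted_xhtml)
    if original_tag is None or converted_tag is None:
        return converted_xhtml  # nothing to compare; leave the input untouched
    wanted = XMLNS_DECL.findall(original_tag.group(0))
    present = {name for name, _uri in XMLNS_DECL.findall(converted_tag.group(0))}
    missing = "".join(f' {name}="{uri}"' for name, uri in wanted if name not in present)
    # Insert the missing declarations just before the closing '>' of the start tag.
    patched_tag = converted_tag.group(0)[:-1] + missing + ">"
    return converted_xhtml[:converted_tag.start()] + patched_tag + converted_xhtml[converted_tag.end():]


# Hypothetical inputs: the original root tag, and a conversion that dropped xmlns:v.
original = ('<html xmlns:v="urn:schemas-microsoft-com:vml" '
            'xmlns:o="urn:schemas-microsoft-com:office:office" '
            'xmlns="http://www.w3.org/TR/REC-html40"><body></body></html>')
converted = ('<html xmlns:o="urn:schemas-microsoft-com:office:office" '
             'xmlns="http://www.w3.org/TR/REC-html40"><body><p>Hello, world<o:p></o:p></p></body></html>')

fixed = restore_namespace_declarations(original, converted)
print(fixed)
```

The sketch leaves declarations that survived conversion alone and appends only the missing ones, so running it on already-correct output changes nothing.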

 

<html xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:w="urn:schemas-microsoft-com:office:word"
            xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
            xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
            {font-family:"Cambria Math";
            panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
            {font-family:Calibri;
            panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
            {margin:0in;
            margin-bottom:.0001pt;
            font-size:11.0pt;
            font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
            {mso-style-priority:99;
            color:#0563C1;
            text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
            {mso-style-priority:99;
            color:#954F72;
            text-decoration:underline;}
span.EmailStyle17
            {mso-style-type:personal-compose;
            font-family:"Calibri",sans-serif;
            color:windowtext;}
.MsoChpDefault
            {mso-style-type:export-only;
            font-family:"Calibri",sans-serif;}
@page WordSection1
            {size:8.5in 11.0in;
            margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
            {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal>Hello, world<o:p></o:p></p></div></body></html>

 

 

 




Copyright 1993-2007 XML.org. This site is hosted by OASIS