Thomas B. Passin wrote:
> It turns out that the page is hand-authored by someone who is not very
> expert in HTML. With every update the internal structure changes. It
> always looks the same in the browser, but certain key internal parts are
> actually invalid HTML, and the nature of the invalidity changes each
> time. Unfortunately we have to use those parts to extract indexes that
> point to the actual data we want to collect from other parts of the page.
>
> We cannot outguess all the changes, and so from time to time we get
> parse failures. We cannot influence the page design. Finally, we give
> up and use the text-only version that the agency also hosts. This has
> no markup, but the visual structure blocks out the information we need
> in a consistent way, and the visual structure matches the actual text
> format. I write a parser that emits SAX-like events to feed into the
> downstream process. Everything works nicely and robustly after this
> change.
So you had a contract with the client. The client did not stick to the
contract and you had to do a great deal of work to make your end work.
If it were me, I would tell the client that they simply have to conform
to the schema we agreed to; otherwise we have to do a great deal more
work, and we will have to charge for all of it. If the requirements
changed, then adjust the schema accordingly. That is much simpler than
doing the work you did. It is so simple to produce a valid document.
Hopefully you charged for it -- wait, I take that back, being a
California taxpayer. It almost sounds like negligence...
-Rob
>
>
> As Rusty says, that is the world of the internet.
>
> Cheers,
>
> Tom P
>
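
For readers wondering what the text-only approach Tom describes might look
like, here is a minimal sketch in Python. It is not his parser. The layout it
assumes (blank-line-separated blocks, a title line, then "key: value" lines)
and the names parse_text_report and Recorder are invented for illustration;
only the xml.sax classes are real standard-library API.

```python
import re
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class Recorder(ContentHandler):
    """Hypothetical downstream consumer: it just records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def characters(self, content):
        self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))


def parse_text_report(text, handler):
    """Emit SAX-like events for an ASSUMED plain-text layout.

    Assumption: blocks are separated by blank lines, the first line of a
    block is its title, and the remaining lines are 'key: value' fields.
    """
    handler.startDocument()
    handler.startElement("report", AttributesImpl({}))
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        handler.startElement("section", AttributesImpl({"title": lines[0]}))
        for line in lines[1:]:
            key, sep, value = line.partition(":")
            if not sep:
                continue  # skip lines that do not look like fields
            name = key.strip().lower().replace(" ", "_")
            handler.startElement(name, AttributesImpl({}))
            handler.characters(value.strip())
            handler.endElement(name)
        handler.endElement("section")
    handler.endElement("report")
    handler.endDocument()


# Made-up sample input, not data from the page discussed in the thread.
SAMPLE = "Station A\nLevel: 12.5\nFlow rate: 3\n\nStation B\nLevel: 7"
recorder = Recorder()
parse_text_report(SAMPLE, recorder)
```

Because the parser drives a standard ContentHandler, the downstream process
would not need to know whether the events came from markup or from plain text.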