Author: Mitchel Haas Date: May 9, 2008 07:43
Hello,
For anyone with a need to generate or parse (x)html, I'd like to
announce a relatively new lightweight library for generating xhtml and
parsing xhtml and html.
Xport, XHTML Parsing & Objective Reporting Toolkit, is a new open
source lightweight library for the purpose of generating and parsing
(x)html documents. Although Xport was created for reporting
purposes...

Author: marlow.andrew Date: May 10, 2008 04:20
On 9 May, 08:25, Mitchel Haas <...@datasoftsolutions.net> wrote:
> Hello,
>
> For anyone with a need to generate or parse (x)html, I'd like to
> announce a relatively new lightweight library for generating xhtml and
> parsing xhtml and html.
> If you have any need of xhtml/html generation or parsing, I hope you
> can find the library useful.
Thanks for making this available. I'm sure many will find it useful.
But I am not so sure about my case. My need is to parse HTML for use
by a screen scraper. The trouble is, most web pages, including the
ones I am scraping, have ill-formed HTML. How does your library cope
with that? I eventually gave up trying to do this in C++ and used
Python instead. It has a package called BeautifulSoup which is
designed specifically to cope with ill-formed HTML.
>
> Thanks,
>
> Mitchel Haas

Author: Ian Collins Date: May 10, 2008 18:26
{ Accepted as follow-up. Further discussion of general tools for HTML tidying
would be off-topic (as I see you're aware :-) ) unless there is some C++
content. -mod }
marlow.andrew@googlemail.com wrote:
> But I am not so sure about my case. My need is to parse HTML for use
> by a screen scraper. The trouble is, most web pages, including the
> ones I am scraping, have ill-formed HTML. How does your library cope
> with that? I eventually gave up trying to do this in C++ and used
> Python instead. It has a package called BeautifulSoup which is
> designed specifically to cope with ill-formed HTML.
>
htmltidy is your friend in this case. Your system may already have it
installed; otherwise, it is very easy to build and use.

Author: AnonMail2005 Date: May 10, 2008 18:25
{ Accepted as follow-up. Further discussion of general tools for HTML tidying
would be off-topic unless there is some C++ content. -mod }
On May 10, 8:14 am, marlow.and...@googlemail.com wrote:
> On 9 May, 08:25, Mitchel Haas <...@datasoftsolutions.net> wrote:
>
>> Hello,
>
>> For anyone with a need to generate or parse (x)html, I'd...

Author: Mitchel Haas Date: May 11, 2008 07:52
> On 9 May, 08:25, Mitchel Haas <...@datasoftsolutions.net> wrote:
>
>> Hello,
>
>> For anyone with a need to generate or parse (x)html, I'd like to
>> announce a relatively new lightweight library for generating xhtml and
>> parsing xhtml and html.
>> If you have any need of xhtml/html generation or parsing, I hope you
>> can find the library useful.
>
> Thanks for making this available. I'm sure many will find it useful.
> But I am not so sure about my case. My need is to parse HTML for use
> by a screen scraper. The trouble is, most web pages, including the
> ones I am scraping, have ill-formed HTML. How does your library cope
> with that? I eventually gave up trying to do this in C++ and used
> Python instead. It has a package called BeautifulSoup which is
> designed specifically to cope with ill-formed HTML.
>
> ...