Open source library for generating and parsing (x)html
comp.lang.c++.moderated



Author: Mitchel Haas
Date: May 9, 2008 07:43

Hello,

For anyone with a need to generate or parse (x)html, I'd like to
announce a relatively new lightweight library for generating xhtml and
parsing xhtml and html.

Xport (XHTML Parsing & Objective Reporting Toolkit) is a new, open
source, lightweight library for generating and parsing (x)html
documents. Although Xport was created for reporting
purposes...
Re: Open source library for generating and parsing (x)html         


Author: marlow.andrew
Date: May 10, 2008 04:20

On 9 May, 08:25, Mitchel Haas wrote:
> Hello,
>
> For anyone with a need to generate or parse (x)html, I'd like to
> announce a relatively new lightweight library for generating xhtml and
> parsing xhtml and html.
> If you have any need of xhtml/
> html generation or parsing, I hope you can find the library useful.

Thanks for making this available. I'm sure many will find it useful.
But I am not so sure about my case. My need is to parse HTML for use
by a screen scraper. The trouble is, most web pages, including the
ones I am scraping, have ill-formed HTML. How does your library cope
with that? I eventually gave up trying to do this in C++ and used
Python instead. It has a package called BeautifulSoup which is
designed specifically to cope with ill-formed HTML.
>
> Thanks,
>
> Mitchel Haas
Re: Open source library for generating and parsing (x)html         


Author: Ian Collins
Date: May 10, 2008 18:26

{ Accepted as follow-up. Further discussion of general tools for HTML tidying
would be off-topic (as I see you're aware :-) ) unless there is some C++
content. -mod }

marlow.andrew@googlemail.com wrote:
> But I am not so sure about my case. My need is to parse HTML for use
> by a screen scraper. The trouble is, most web pages, including the
> ones I am scraping, have ill-formed HTML. How does your library cope
> with that? I eventually gave up trying to do this in C++ and used
> Python instead. It has a package called BeautifulSoup which is
> designed specifically to cope with ill-formed HTML.
>

htmltidy is your friend in this case. Your system may already have it
installed; if not, it is very easy to build and use.


--
Ian Collins.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Re: Open source library for generating and parsing (x)html         


Author: AnonMail2005
Date: May 10, 2008 18:25

{ Accepted as follow-up. Further discussion of general tools for HTML tidying
would be off-topic unless there is some C++ content. -mod }

On May 10, 8:14 am, marlow.and...@googlemail.com wrote:
> On 9 May, 08:25, Mitchel Haas wrote:
>
>> Hello,
>
>> For anyone with a need to generate or parse (x)html, I'd...
Re: Open source library for generating and parsing (x)html         


Author: Mitchel Haas
Date: May 11, 2008 07:52

On May 10, 7:14 am, marlow.and...@googlemail.com wrote:
> On 9 May, 08:25, Mitchel Haas wrote:
>
>> Hello,
>
>> For anyone with a need to generate or parse (x)html, I'd like to
>> announce a relatively new lightweight library for generating xhtml and
>> parsing xhtml and html.
>> If you have any need of xhtml/
>> html generation or parsing, I hope you can find the library useful.
>
> Thanks for making this available. I'm sure many will find it useful.
> But I am not so sure about my case. My need is to parse HTML for use
> by a screen scraper. The trouble is, most web pages, including the
> ones I am scraping, have ill-formed HTML. How does your library cope
> with that? I eventually gave up trying to do this in C++ and used
> Python instead. It has a package called BeautifulSoup which is
> designed specifically to cope with ill-formed HTML.
>
> ...