parsing web pages with tDOM
a short note on how to parse web pages with tDOM
most web pages, especially now with HTML5, are rather "flexible" in
their adherence to standards, so one can't just parse them as something
XML-like anymore. reading the tDOM manual suggested that the -html5 and
-ignorexmlns switches of dom parse might help:
package require tdom
set f [open "foobar.html" r]
set dom [dom parse -html5 -ignorexmlns [read $f]]
close $f
they did: no more parsing errors :)
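once parsed, the document can be queried with XPath. here is a minimal sketch (not from the original note) that collects the href of every link. it assumes tDOM was built with HTML5 (gumbo) support, since -html5 isn't available otherwise; the proc name linkTargets and the sample markup are made up for illustration. my reading of the tDOM manual is that -ignorexmlns keeps element nodes out of any namespace, so a bare //a matches without a namespace prefix; treat that as an assumption worth checking against your tDOM version.

```tcl
package require tdom

# hypothetical helper: return the href values of all <a> elements in an
# HTML5 string. assumes tDOM was compiled with HTML5 (gumbo) support.
proc linkTargets {html} {
    # -ignorexmlns (assumed behaviour): element nodes are not placed in
    # the XHTML namespace, so plain XPath like //a works without a prefix
    set doc [dom parse -html5 -ignorexmlns $html]
    set hrefs {}
    foreach a [$doc selectNodes {//a[@href]}] {
        lappend hrefs [$a getAttribute href]
    }
    # free the DOM tree once we are done with it
    $doc delete
    return $hrefs
}

# usage with a small, sloppy snippet (note the unquoted attribute value)
set page {<!DOCTYPE html><p>see <a href="a.html">a</a> and <a href=b.html>b</a></p>}
puts [linkTargets $page]
```

for a real page, read the file as in the snippet above and pass the string to linkTargets.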