parsing web pages with tDOM

a short note on how to parse web pages with tDOM

most web pages, especially now with html5, are rather "flexible" in their adherence to standards, so one can't simply parse them as XML anymore. the tDOM manual suggested that the -html5 and -ignorexmlns options might help:

    package require tdom ;# -html5 needs a tDOM built with HTML5 support
    set f [open "foobar.html" r]
    set dom [dom parse -html5 -ignorexmlns [read $f]]
    close $f

they did: no more parsing errors :)
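once the page is parsed, the usual tDOM calls work on the result. below is a minimal sketch, assuming tDOM was built with HTML5 support: instead of foobar.html it parses a small made-up html string (placeholder example.com links and a deliberately unclosed <p>), then collects every href with an XPath query. the variable names are only for illustration. my assumption is that -ignorexmlns is what lets a plain //a path match here without any namespace prefix.

    package require tdom

    # a tiny, slightly sloppy page standing in for foobar.html
    set html {<!DOCTYPE html>
    <html><head><title>demo</title></head>
    <body>
    <p>some links: <a href="https://example.com/a">a</a>
    <a href="https://example.com/b">b</a>
    <p>an unclosed paragraph, which an html5 parser accepts
    </body></html>}

    # same options as in the note; -html5 requires HTML5 support in tDOM
    set dom [dom parse -html5 -ignorexmlns $html]
    set root [$dom documentElement]

    # collect the href attribute of every <a> element that has one
    set links {}
    foreach node [$root selectNodes {//a[@href]}] {
        lappend links [$node getAttribute href]
    }

    # free the DOM tree once we are done with it
    $dom delete

    puts $links

running it with tclsh should print the two example URLs separated by a space.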