... That's easily patched. Get the source and edit src/definitions/html.tssl. Then add "<memberOf group='M_HEAD'/>" after the line "<element name='noscript'...
Is there a way to keep the body of <script> intact? I have HTML that looks like this: ... <script ...> //<![CDATA[ ... if (myvalue && yourvalue){ //]]> ...
... It suppresses close-tags on empty elements -- so <hr>, not <hr></hr> -- and it uses minimized attributes in certain cases, so <input checked>, not <input...
Hi John, thanks for your reply, it helped me a lot! In addition, I also had to add <contains group='M_HEAD'/> to <element name='noscript' type='mixed'>. Without...
Martin Zdila
m.zdila@...
May 23, 2008 9:26 am
1117
Hello, if you run java -jar tagsoup-1.2.jar http://ppe.sk/news.htm you will see many nested <strong> tags that are not on the original page. Is it possible to fix that? ...
Martin Zdila
m.zdila@...
May 23, 2008 11:15 am
1119
... Thanks. Quite right. I've added this to the next release. -- In my last lifetime, John Cowan I believed in reincarnation;...
Hello, after parsing an (X)HTML document I am always getting null from Document.getDoctype(). Is that actually implemented? If not, could you please do that? It...
Martin Zdila
m.zdila@...
May 28, 2008 12:20 pm
1121
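A minimal sketch of why Document.getDoctype() can come back null when a DOM is built from SAX events: the DOCTYPE is reported only through LexicalHandler.startDTD, so a builder that listens to ContentHandler alone never sees it. DoctypeAwareDomBuilder below is a hypothetical illustration, not the poster's DOMBuilder and not part of TagSoup. It is driven by the JDK's own SAX parser as a stand-in, and whether a particular TagSoup setup reports DTD events is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical minimal DOM builder (assumption for illustration only).
final class DoctypeAwareDomBuilder extends DefaultHandler2 {
    private final DOMImplementation impl;
    private DocumentType doctype;
    private Document document;
    private Node current;

    DoctypeAwareDomBuilder(DOMImplementation impl) {
        this.impl = impl;
    }

    // Reported only through LexicalHandler; a plain ContentHandler never sees it.
    @Override
    public void startDTD(String name, String publicId, String systemId) {
        doctype = impl.createDocumentType(name, publicId, systemId);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (document == null) {
            // Creating the document with the doctype is what makes getDoctype() non-null.
            document = impl.createDocument(null, qName, doctype);
            current = document.getDocumentElement();
        } else {
            current = current.appendChild(document.createElement(qName));
        }
        for (int i = 0; i < atts.getLength(); i++) {
            ((Element) current).setAttribute(atts.getQName(i), atts.getValue(i));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        current = current.getParentNode();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        current.appendChild(document.createTextNode(new String(ch, start, length)));
    }

    static Document build(String markup) {
        try {
            DoctypeAwareDomBuilder builder = new DoctypeAwareDomBuilder(
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation());
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(builder);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", builder);
            // Answer any external DTD request with an empty stream so nothing is fetched.
            reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            reader.parse(new InputSource(new StringReader(markup)));
            return builder.document;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html><body>x</body></html>");
        System.out.println(doc.getDoctype().getPublicId());
    }
}
```

The sketch ignores namespaces and prefixed element names. A real builder would use createElementNS and the namespace URI passed to startElement.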
sorry, but my DOMBuilder didn't handle that. bad martin, bad martin :-) ... -- Martin Zdila CTO M-Way Solutions Slovakia s.r.o. Letna 27, 040 01 Kosice ...
Martin Zdila
m.zdila@...
May 28, 2008 1:12 pm
1125
... That's a known problem that has to do with tags opened in each of various cells of a table and never closed again. I will fix it in the next release. -- ...
Hello Group, I've been using TagSoup with some data for which I do not know the encoding ahead of time and playing around with auto detection of character...
Nitay Joffe
nitay@...
Jun 3, 2008 11:52 pm
1129
Hello, I hope the intent of TagSoup is to parse ugly HTML into a DOM (XML) such that both the original and the result look the same when displayed in a modern web browser. It means that...
Martin Zdila
m.zdila@...
Jun 9, 2008 7:54 am
1131
Hello, I found a page with the following structure: <html><head>...</head><noscript><body>...</body></noscript><frameset>...</frameset></html> The body was thrown out...
Martin Zdila
m.zdila@...
Jun 9, 2008 8:52 am
1132
... Yes and no. TagSoup does attempt to produce output similar to that of Web browsers, but only within the limits of its design model. It does not contain...
... Thanks. I'll add this to the next release. ... When I get time and energy to work on it enough to release it. ... Not at present. ... It's just me, except...
Hello ... What I need is a simple thing ;-) - let SAX generate the events: open table, open tr, open td, text "cell1", close td, open span, text "err1", close...
Martin Zdila
m.zdila@...
Jun 9, 2008 3:12 pm
1135
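To make the requested event order concrete, here is a small sketch that records SAX events exactly as a reader delivers them. It uses the JDK's own XML parser, which works only because the sample markup is well-formed. The sample string is an assumption: the original message is truncated after "close", so everything from that point on is a guess at a plausible input, and SaxEventTrace is a hypothetical helper name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventTrace {

    /** Records element and text events in the order the reader reports them. */
    static final class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // characters() may arrive in several chunks, so buffer until the next tag.
        private void flushText() {
            if (text.length() > 0) {
                events.add("text \"" + text + "\"");
                text.setLength(0);
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            events.add("open " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            events.add("close " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static List<String> trace(String markup) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            Recorder recorder = new Recorder();
            reader.setContentHandler(recorder);
            reader.parse(new InputSource(new StringReader(markup)));
            return recorder.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: a span placed directly inside a table row.
        String markup = "<table><tr><td>cell1</td><span>err1</span></tr></table>";
        for (String event : trace(markup)) {
            System.out.println(event);
        }
    }
}
```

A plain XML parser passes the misplaced span through untouched because it knows nothing about the HTML content model. The same Recorder could be attached to any other XMLReader to compare the event streams.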
... You want to modify html.tssl, not html.stml (which is about the lexer). The simplest change *for this specific problem* is probably to add <contains...
... Like John said, TagSoup operates at a lower level, "below" a DOM. So what you can do is to use a tree model such as XOM, and do additional fixing _you_...
Hello Tatu, thanks for the reaction ... I am actually using Xerces to build the DOM from TagSoup and Xalan for XPath processing, transformation and serialization....
Martin Zdila
m.zdila@...
Jun 9, 2008 8:14 pm
1138
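The pipeline described in this message (SAX events into a DOM, then XPath over the DOM) can be sketched with the JDK's built-in JAXP classes rather than standalone Xerces and Xalan. This is an illustration under assumptions, not the poster's code: the reader is the JDK's own XML parser used as a stand-in, and SaxToDomXPath is a hypothetical name. With TagSoup on the classpath, an instance of org.ccil.cowan.tagsoup.Parser, which is also an XMLReader, would be passed to toDom instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class SaxToDomXPath {

    // Stand-in reader from the JDK (assumption: swapped for TagSoup's reader in real use).
    static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Builds a DOM from whatever SAX events the given reader produces.
    static Node toDom(XMLReader reader, String markup) throws Exception {
        DOMResult result = new DOMResult();
        // A Transformer created without a stylesheet copies its input unchanged.
        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(reader, new InputSource(new StringReader(markup))), result);
        return result.getNode();
    }

    // Returns the string value of the first node matched by the XPath expression.
    static String select(String markup, String xpath) {
        try {
            Node dom = toDom(jdkReader(), markup);
            return XPathFactory.newInstance().newXPath().evaluate(xpath, dom);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String markup = "<html><body><p>hi</p></body></html>";
        // local-name() keeps the expression independent of any namespace on the elements.
        System.out.println(select(markup, "//*[local-name()='p']"));
    }
}
```

Matching on local-name() is a deliberate choice here: if the reader reports elements in a namespace, a bare path such as /html/body/p would match nothing unless a NamespaceContext is registered on the XPath object.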
Hello John ... Thanks for the hint, I'll try it. ... OK, thanks. Please see more in mail for Tatu. BTW there is one evil in the pages that even TagSoup can't...
Martin Zdila
m.zdila@...
Jun 9, 2008 8:19 pm
1139
... True. ... Unfortunately you can't, in the general case, have one without the other. In order to know when to close tags, you need to know what elements are...
... Yes, in cases when some tag is not closed properly. But in my table example all tags are closed properly, so it is well-formed XML :-). Best regards -- Martin...
Martin Zdila
m.zdila@...
Jun 10, 2008 6:05 am
1143
... In that case you are probably better off with an XML parser. Well-formedness is a global property of a document that TagSoup can't know about in advance. ...
... Unfortunately I have a mix of such constructs (which I would like to keep) and invalid tag nesting (which I'd like to fix), so I can't use an XML parser :-(. ...
Martin Zdila
m.zdila@...
Jun 10, 2008 1:49 pm
1145
... In that case, I think that the private patch I mentioned, and as many other patches as you need, are your only option at present. -- There is / One art...
Hello, currently TagSoup moves block elements out of SPAN, B, I, SUP, ... and other inline elements. This causes rendering problems in the following scenarios: ...
Martin Zdila
m.zdila@...
Jun 24, 2008 7:39 am
1150
If I am not mistaken, block-level elements, by definition, cannot be contained within inline elements. Therefore, it seems that TagSoup is correctly...
Hello Roger ... This is true. But for my use case I need TagSoup to create a DOM that, after serialization to XHTML, would be rendered in major browsers in...
Martin Zdila
m.zdila@...
Jun 24, 2008 8:32 am
1152
Hello, I found a page containing two "type" attributes with different values: <input tabindex="2" type="Password" value="" name="passw" maxlength='25' ...
Martin Zdila
m.zdila@...
Jun 24, 2008 12:09 pm
1153
Dear Friends of TagSoup! If you want to use TagSoup in XMLSpy as an external tool to convert your .html files into XHTML, this blog entry describes how you can...