| Ticket UUID: | 953854 | |||
| Title: | Errors when parsing HTML to a tree | |||
| Type: | Patch | Version: | None | |
| Submitter: | nobody | Created on: | 2004-05-14 10:11:51 | |
| Subsystem: | htmlparse | Assigned To: | andreas_kupries | |
| Priority: | 1 Zero | Severity: | ||
| Status: | Closed | Last Modified: | 2006-01-18 13:15:37 | |
| Resolution: | Accepted | Closed By: | andreas_kupries | |
| Closed on: | 2006-01-18 06:15:37 | |||
| Description: |
Hello,
In proc ::htmlparse::mapEscapes, the line:
    return [subst $new]
should change to:
    return [subst -nobackslashes -novariables $new]
Otherwise, if new contains a backslash (\), subst mangles the
string (especially noticeable in paths on Windows).
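
A minimal sketch of the behaviour described above, using a hypothetical Windows-style path as the value of new (the sample string is an assumption, not taken from the ticket). It only illustrates how the two subst flags change the result; command substitution stays enabled in both calls.

```tcl
# Hypothetical sample value; braces keep the backslashes literal.
set new {C:\temp\new.html}

# Plain subst interprets backslash sequences: \t becomes a tab, \n a newline.
set mangled [subst $new]

# With both flags, backslashes and $variables are left alone;
# only [command] substitution is still performed.
set kept [subst -nobackslashes -novariables $new]

puts [string equal $kept $new]      ;# 1: the path survives unchanged
puts [string equal $mangled $new]   ;# 0: the path was altered
```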
-------
In proc ::htmlparse::Reorder, the lines:
    if {
        $sibling == {} ||
        (![string compare $tp [$tree get $sibling type]])
    } {
        break
    }
should change to:
    if { $sibling == "" } { break }
    if { [lsearch "h1 h2 h3 h4 h5 h6 p li" [$tree get $sibling type]] != -1 } {
        break
    }
The second version is less aggressive when reordering tags.
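
The membership test in the proposed condition can be tried in isolation. The sketch below is only an illustration: isStopType is a hypothetical helper name, the tree lookup is replaced by a plain type string, and -exact is added to lsearch (the ticket's version uses the default glob matching).

```tcl
# Hypothetical helper: returns 1 when the given node type is one of the
# tags listed in the ticket, 0 otherwise.
proc isStopType {type} {
    set stopTypes {h1 h2 h3 h4 h5 h6 p li}
    # -exact is an assumption of this sketch; the ticket's code omits it.
    return [expr {[lsearch -exact $stopTypes $type] != -1}]
}

puts [isStopType p]    ;# 1: reordering would stop at a <p> sibling
puts [isStopType b]    ;# 0: a <b> sibling does not stop the loop
```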
Regards,
Ramon Ribó
ramsan@cimne.upc.es
| |||
| User Comments: |
andreas_kupries added on 2006-01-18 13:15:36:
Logged In: YES user_id=75003
Mostly accepted. The changes to mapEscapes are outdated; this was fixed in a different way, by an additional quoting step protecting Tcl's special characters. Reordering advice taken. Examples are in the testsuite, actually. The relevant testcases have been updated.

andreas_kupries added on 2004-09-30 04:46:10:
Logged In: YES user_id=75003
Do you have small examples which demonstrate the bad behaviour? They would also become test cases.

nobody added on 2004-05-14 17:11:52:
File Added - 87140: htmlparse.tcl
| |||
Attachments:
- htmlparse.tcl added by nobody on 2004-05-14 17:11:52.
