| Ticket UUID: | 861277 | |||
| Title: | htmlparse.tcl: backslashes in content | |||
| Type: | Bug | Version: | None | |
| Submitter: | jenglish | Created on: | 2003-12-16 21:15:56 | |
| Subsystem: | htmlparse | Assigned To: | jenglish | |
| Priority: | 5 Medium | Severity: | ||
| Status: | Closed | Last Modified: | 2004-10-05 01:58:23 | |
| Resolution: | Fixed | Closed By: | andreas_kupries | |
| Closed on: | 2004-10-04 18:58:23 | |||
| Description: |
htmlparse::parse fails if backslashes appear in content:
htmlparse::parse "<p>\\</p>"
==> error "Missing close-brace"
| User Comments: |
andreas_kupries added on 2004-10-05 01:58:23:
Logged In: YES user_id=75003
Ok. This has been fixed and committed to head.
andreas_kupries added on 2004-10-05 01:34:59:
Logged In: YES user_id=75003
I will add test cases as well and when I am done both bugs will be closed.
andreas_kupries added on 2004-10-05 01:34:29:
Logged In: YES user_id=75003
Yes, that is what Joe proposed to me a few minutes ago as well, on the tcler's chat. I am currently implementing that.
davygrvy added on 2004-10-05 01:31:36:
Logged In: YES
user_id=7549
Should be using the numeric entities. &ob;, &cb;, and &bsl;
aren't documented as entities in HTML 4.01
(Desktop) 7 % htmlparse::mapEscapes &#[scan \{ %c]\;
{
(Desktop) 8 % htmlparse::mapEscapes &#[scan } %c]\;
}
(Desktop) 9 % htmlparse::mapEscapes &#[scan \\ %c]\;
\
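The numeric references used in the session above can be derived with `scan` and `format` alone. The following is a minimal sketch of that idea, not the change that was committed: `escapeSpecials` is a hypothetical helper name and is not part of htmlparse.

```tcl
# Hypothetical helper (not part of htmlparse): replace the three characters
# that are special to Tcl with standard numeric character references.
proc escapeSpecials {text} {
    return [string map [list \{ "&#123;" \} "&#125;" \\ "&#92;"] $text]
}

# Derive the references from the character codes instead of hard-coding them.
foreach ch [list \{ \} \\] {
    puts [format "&#%d;" [scan $ch %c]]
}
```

The loop prints `&#123;`, `&#125;`, and `&#92;`, one per line. Unlike &ob;, &cb;, and &bsl;, these are ordinary HTML numeric character references.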
jenglish added on 2004-10-04 23:24:36: Logged In: YES
user_id=68433
Reopening -- this is back again.
htmlparse::PrepareHtml replaces "{", "}", and "\" (left
brace, right brace, and backslash) with the nonstandard
entity references &ob;, &cb;, and &bsl;.
htmlparse::mapEscapes used to change these back into braces
and backslashes, resp. It looks like this was changed in
r1.16 (bug #1018574); now backslashes in content come out as
"&bsl;" (that's "ampersand, b, s, l, semicolon" in case the
bug tracker mangles it).
jenglish added on 2003-12-17 04:24:42:
Logged In: YES user_id=68433
Patch committed.
jenglish added on 2003-12-17 04:20:51:
File Added - 70792: htmlparse-backslash.patch
jenglish added on 2003-12-17 04:20:30:
Logged In: YES
user_id=68433
Looks like this was introduced in r1.9:
regsub -all -- \\\\ $html {\&bsl;} html
changed to
return [string map [list [...] "\\\\" "&bsl;"] $html]
(i.e., retained one too many levels of \-escaping.)
Attached patch fixes the problem, and adds a test case.
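The difference in escaping levels can be reproduced in isolation. This is a sketch for illustration only, not the contents of the attached patch; the variable names are made up. `regsub` passes its pattern through the regexp engine, which removes one level of backslash quoting, while `string map` compares its keys literally.

```tcl
# Braces prevent substitution, so this content holds one literal backslash.
set html {<p>\</p>}

# regsub: Tcl reduces \\\\ to two backslashes, and the regexp engine reads
# those two as a single literal backslash. The match succeeds.
regsub -all -- \\\\ $html {\&bsl;} viaRegsub

# string map: "\\\\" is two literal backslashes, with no regexp layer to
# remove one. A lone backslash in the content is never matched.
set viaMapWrong [string map [list "\\\\" "&bsl;"] $html]

# One level of escaping is enough: "\\" is a single backslash.
set viaMapRight [string map [list "\\" "&bsl;"] $html]

puts $viaRegsub
puts $viaMapWrong
puts $viaMapRight
```

Only the middle result still contains the raw backslash, which is the character that later trips up the parser.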
Attachments:
- htmlparse-backslash.patch added by jenglish on 2003-12-17 04:20:50.
