| Ticket UUID: | 565051 | | |
| Title: | csv::split parsing bug | | |
| Type: | Bug | Version: | None |
| Submitter: | todolson | Created on: | 2002-06-05 21:41:47 |
| Subsystem: | None | Assigned To: | andreas_kupries |
| Priority: | 5 Medium | Severity: | |
| Status: | Closed | Last Modified: | 2002-06-25 06:17:38 |
| Resolution: | Fixed | Closed By: | andreas_kupries |
| Closed on: | 2002-06-24 23:17:38 | | |
| Description: |
csv::split handles a quoted empty field ("") incorrectly if
the field is neither the first nor the last field in a
line/record:
% info tclversion
8.3
% info patch
8.3.3
% package require csv
0.2
% ::csv::split {1 2 "" ""} { }
1 2 {"} {}
% ::csv::split {"" ""} { }
{} {}
% ::csv::split {"" "" ""} { }
{} {"} {}
A change to the character map seems to fix the problem:
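% proc ::csv::split {line {sepChar ,}} {
    # A quote at the very start or end of the line delimits a field: mark it as \0.
    regsub -all -- {(^\"|\"$)} $line \0 line
    # One pass over the line; at each position the keys are tried in order.
    # Delimiting quotes become \0, doubled quotes "" become a literal ",
    # and a quoted empty field "" followed by a separator leaves only the separator.
    set line [string map [list \
        $sepChar\"\"\" $sepChar\0\" \
        \"\"\"$sepChar \"\0$sepChar \
        \"\"$sepChar $sepChar \
        \"\" \" \
        \" \0 \
    ] $line]
    # Inside each \0...\0 (quoted) span, protect separators by turning them into \1.
    set end 0
    while {[regexp -indices -start $end -- \
            {(\0)[^\0]*(\0)} $line \
            -> start end]} {
        set start [lindex $start 0]
        set end [lindex $end 0]
        set range [string range $line $start $end]
        if {[string first $sepChar $range] >= 0} {
            set line [string replace $line $start $end \
                    [string map [list $sepChar \1] $range]]
        }
        incr end
    }
    # Unprotected separators become split points (\0), protected ones are
    # restored, and the quote markers are removed.
    set line [string map [list $sepChar \0 \1 $sepChar \
            \0 {}] $line]
    return [::split $line \0]
}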
% ::csv::split {"" "" "" "" "" ""} { }
{} {} {} {} {} {}
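The proposed change relies on how Tcl's `string map` treats overlapping keys: the input is scanned once, at each position the keys are tried in the order they appear in the mapping, and replaced text is not rescanned. That is why `""` has to be listed before `"` in the character map above. A minimal illustration, independent of the csv package (the variable names are only for this example):

```tcl
# At each position, string map tries the keys in list order and uses the
# first one that matches; replaced text is never rescanned.
set quotesFirst [string map [list \"\" Q \" D] {"""}]
set quoteFirst  [string map [list \" D \"\" Q] {"""}]
puts $quotesFirst  ;# prints: QD (the doubled-quote key wins at position 0)
puts $quoteFirst   ;# prints: DDD (the lone-quote key shadows the doubled one)
```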
| |||
| User Comments: |
andreas_kupries added on 2002-06-25 06:17:38:
Logged In: YES user_id=75003
Patch applied to head and committed.
andreas_kupries added on 2002-06-25 06:16:20:
File Added - 25748: 565051.diff
Logged In: YES user_id=75003
Here is a patch for the true problem. It contains an extended test suite.
andreas_kupries added on 2002-06-25 05:38:22:
Logged In: YES user_id=75003
There is a bug in the split parsing, but it is not in the handling of
inner fields. It is the outer fields, i.e. the first and the last, that
are handled wrongly.
The string {"" "" ""} parsed with <space> as the separator character
does not contain three empty fields. It contains three fields, each of
which consists of a single " (a doubled quote "" stands for one literal
quote character). See the definition of the CSV format. This may be
even clearer when parsing {"","",""} with the comma as the separator
character. An empty field is specified by two adjacent separator
characters with no other characters between them. With <space> as the
separator, this means a space directly following another space, or a
space at the beginning and/or end of the string.
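The field rules described above can be sketched as a small character scanner. This is a hypothetical helper (`splitSketch`), not tcllib's `::csv::split` and not the patch attached to this ticket. It assumes a single-character separator, reads every doubled quote `""` as one literal `"` (even at the start of a field, which is what makes `""` a one-character field), lets a lone `"` open or close a quoted section in which the separator does not end the field, and needs Tcl 8.4 or later for `eq`.

```tcl
# Hypothetical helper, not part of tcllib: a character-level reading of the
# rules above. Assumptions: sepChar is a single character, a doubled quote ""
# is always one literal ", and a lone " toggles a quoted section in which
# sepChar does not end the field.
proc splitSketch {line {sepChar ,}} {
    set fields {}
    set field ""
    set quoted 0
    set n [string length $line]
    for {set i 0} {$i < $n} {incr i} {
        set c [string index $line $i]
        if {$c eq "\""} {
            if {[string index $line [expr {$i + 1}]] eq "\""} {
                # Doubled quote: one literal quote character.
                append field \"
                incr i
            } else {
                # Lone quote: enter or leave a quoted section.
                set quoted [expr {!$quoted}]
            }
        } elseif {$c eq $sepChar && !$quoted} {
            # Unquoted separator: the current field ends here.
            lappend fields $field
            set field ""
        } else {
            append field $c
        }
    }
    # The text after the last separator is the final field.
    lappend fields $field
    return $fields
}

puts [splitSketch {"" "" ""} { }]  ;# three fields, each a single "
puts [splitSketch {a,,b} ,]         ;# prints: a {} b
```

Under these assumptions {"" "" ""} with a space separator yields three fields of one " each, and {,a,} with a comma yields an empty field at each end, matching the description of empty fields above.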
todolson added on 2002-06-06 04:47:32:
File Added - 24471: csv.tcl
Logged In: YES user_id=450877
A revised version of csv.tcl is attached. It is still csv package version 0.2. | |||