View Issue Details

IDProjectCategoryView StatusLast Update
0003377JEDI VCL00 JVCL Componentspublic2006-01-08 08:29
ReporterChrister FahlgrenAssigned Toobones 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
Product Version3.00 
Target VersionFixed in Version3.20 
Summary0003377: JVSimpleXML doesn't preserve preceding whitespace for text nodes
DescriptionAs discussed in the newsgroups:
When reading an xml-file with JVSimpleXML, the preceding whitespace for text nodes is removed. So a node like <test>___Hello!_</test> becomes the equivalent of <test>Hello!_</test> (exchange _ for a space).

I believe the preceding whitespace should be preserved.

I have attached a file with my proposed fix.

Best regards,
Christer
TagsNo tags attached.

Activities

2005-12-22 12:03

 

jvsimplexml.zip (16,410 bytes)

hasse42g

2006-01-03 05:39

reporter   ~0008267

Maybe the XML specification on white space handling can be of help:

http://www.w3.org/TR/2004/REC-xml-20040204/#sec-white-space

obones

2006-01-05 08:08

administrator   ~0008290

But then, what if you have this:

<root>
__<test>
____Hello_
__</test>
</root>

That's the main problem I see.

elahn

2006-01-05 08:25

developer   ~0008298

I never considered a text element would be structured like that... It doesn't make sense to me to use "formatting" for Text elements. Do you use or know of anything that uses a format like this?

Since only the preceeding whitespace is being ignored - the trailing whitespace is being included - it seems neither "use case" is being correctly supported.

obones

2006-01-05 08:47

administrator   ~0008302

Well, for instance, the TJvFormStorage may store a TEdit value. Now, the user entered ' hello ' into it. The XML will contain this:

<EditText>_hello_</EditText>

TJvSimpleXML does not format the text, just like many other XML "pretty printers" do. With your changes "on", then the first space is lost. I haven't looked yet, but is it implemented as an option?

elahn

2006-01-05 09:08

developer   ~0008303

They're not my changes...personally I don't have a vested interest, as I don't use Text elements at all - I use properties and encode/decode CRLF values.

Currently <EditText>_hello_</EditText> would result in the string 'hello '. I believe the idea of the changes is to have it result in ' hello '.

With the changes "on", then the first space is preserved. Currently, the first space is lost.

No, I don't think it's implemented as an option. I've only glanced at the code, but I think it could be optimized (just a little bit) before inclusion. I'm waiting for agreement on the change (and a little spare time) before actually doing it though. :)

Christer Fahlgren

2006-01-05 14:06

reporter   ~0008304

As the filer of this report (and having an application that depend on this change) I just thought I'd reiterate my reasons to "fix" this issue:

* Consistency
Currently whitespace is handled inconsistent for Text elements, i.e. only preceding whitespace is removed. If there would be an option to whitespace handling for text elements, I believe the option should be either to remove all whitespace (preceding and trailing as well as *extra* whitespace between words) as HTML handles whitespace. The other option should be to preserve all whitespace. The current behaviour is IMHO not logical nor useful.

* Conformance with W3C recommendations
The specification says that whitespace should be passed up to an XML processor (which JVSimpleXML is).


The argument against this change that I can see would be that there might be some applications that depend on the current behaviour.

I think we can handle this in a couple of different ways:

1. Leave code as is - which in my opinion would make JVSimpleXML less useful than it can be
2. Change to preserve preceding whitespace in text nodes (the attached change)
3. Restructure the change to provide an option - preservewhitespacefortextnodes - which enabled would do option 2 and disabled would implement the HTML interpretation for whitespace (which would be a change to the current behaviour as well)

What option would you prefer? (Or can you think about another option)

I'd be willing to help create option 3 if desired.

Best regards,
Christer

elahn

2006-01-05 22:49

developer   ~0008305

I don't see any benefit of implementing a HTML version of whitespace handling - I think this behaviour should be implemented by an application, not by TJvSimpleXML.

I believe that whitespace should be preserved in Text elements.

IMO, there is only one question: Should an option be added to keep the current inconsistent (and wrong?) behaviour. My vote is no.

obones

2006-01-06 05:34

administrator   ~0008310

Well, my vote is yes. But turn it off by default. So the situation would be this:

Add sxoTrimPrecedingTextWhitespace option. If turned on, it disables the new code that does not trim.
Do not put this new option in the default value.
Add a line in changelog.txt stating clearly that should users want the old behaviour they should use the new option.

With all this, I think we can satisfy everyone.

elahn

2006-01-06 07:02

developer   ~0008312

Fair enough, that will satisfy everyone. :)

2006-01-08 04:35

 

JvSimpleXml.pas and patch.zip (17,495 bytes)

elahn

2006-01-08 04:45

developer   ~0008326

I've tweaked Christer's implementation and added that option. It compiles fine inside Delphi 7 IDE, but the JVCL installer crashes with an AV upon starting - i.e. it finishes compiling but instead of the installer appearing, the AV appears.

obones

2006-01-08 06:50

administrator   ~0008329

Well, of course it crashes as there are MANY additional Text elements created by the new code. Those unrequired text elements don't even have a name. Basically, with this change we get a new text elemen for every indentation that is matched. The value even includes the carriage return, which to me is not acceptable.
Note that this is the reason why the installer crashes as it does not expect to find text element in the Items collection for the "models" node.

obones

2006-01-08 07:03

administrator   ~0008330

Well, more precisely, it creates a text element for every white space found in between two closing elements. This should not happen. I'm investigating a solution that should come around in a few hours.

obones

2006-01-08 08:29

administrator   ~0008331

This is now in CVS.

Issue History

Date Modified Username Field Change
2005-12-22 12:03 Christer Fahlgren New Issue
2005-12-22 12:03 Christer Fahlgren File Added: jvsimplexml.zip
2006-01-03 05:39 hasse42g Note Added: 0008267
2006-01-05 08:08 obones Note Added: 0008290
2006-01-05 08:08 obones Status new => feedback
2006-01-05 08:25 elahn Note Added: 0008298
2006-01-05 08:47 obones Note Added: 0008302
2006-01-05 09:08 elahn Note Added: 0008303
2006-01-05 14:06 Christer Fahlgren Note Added: 0008304
2006-01-05 22:49 elahn Note Added: 0008305
2006-01-06 05:34 obones Note Added: 0008310
2006-01-06 07:02 elahn Note Added: 0008312
2006-01-08 04:35 elahn File Added: JvSimpleXml.pas and patch.zip
2006-01-08 04:45 elahn Note Added: 0008326
2006-01-08 06:50 obones Note Added: 0008329
2006-01-08 07:03 obones Note Added: 0008330
2006-01-08 08:29 obones Status feedback => resolved
2006-01-08 08:29 obones Resolution open => fixed
2006-01-08 08:29 obones Assigned To => obones
2006-01-08 08:29 obones Note Added: 0008331