View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005912||JEDI Code Library||JclUnicode||public||2012-06-14 16:21||2012-09-03 13:57|
|Priority||normal||Severity||minor||Reproducibility||have not tried|
|Product Version||Version 2.4|
|Target Version||Fixed in Version|
|Summary||0005912: TURESearch to be dumped or turned into wrapper around PCRE ?|
|Tags||No tags attached.|
|Fixed in GIT commit|
|Fixed in SVN revision|
|IDE version||Delphi/C++Builder XE2|
I've just added a UCS2/UTF16 reg exp class in revision 3850. It is based on PCRE16, an extension of the 8-bit PCRE based on the same source code.
We now have an highway to the replacement of the bad-old TURESearch.
Cool. Hopefully PCRE16 is same reliable and passes same test suit as - let's call it that way for certainty - PCRE8
But then it probably means more restructuring is in the way :-)
I think that there should be separation between pre-Unicode and post-Unicode Delphi:
1) There should be class like TJclDefaultRegEx, that be alias either to TJclAnsiRegEx or to TJclWideRegEx, depending on what "string" type is alias for.
2) There should be both defines, PCRE_8 and PCRE_16
2.1) if neither of those present, then JclPRE unit should either fail compilation or be turned into totally empty stub. Just safety check.
3) Installer should have optional flag like PCRE_AddAuxImplementation
4) For non-Unicode versions the PCRE8 should always be compiled, for post-Unicode - JCRE-16.
5) when PCRE_AddAuxImplementation is active, for non-Unicode versions PCRE16 shpould be added and for post-Unicode ones - PCRE8. Practically i think that optionwould be used next to never.
6) Probably for post-Unicode Delphi it would be better to use UnicodeString rather than WideString. If only PCRE16 is not hardly dependent upon MS COM BSTR string type. But it seems from 1st look that PCRE is based on C Strings so it should not be the case. Then some alias like JvPCRE16String is to be made, pointing to either UnicodeString or WideString;
What do you think ?
Maybe the installer just should always compile PCRE8 or PCRE16 but never both ?
Frankly, i think that embedding both PCRE8 and PCRE16 has next to zero sense.
1) That makes sense. We already have several compiler defines for that (for now only used in containers).
2) Possible, but PCRE_8 should be defined by default for backward compatibility
2.1) Empty stub
3) not sure what you're talking about
4) except backward compatibility (#2)
5) Not sure what you're talking about
6) Using UnicodeString when available sounds good.
1) then let's decide upon the name. "Default" does sound too ambiguos to me.
Maybe just "TJclRegEx" ?
2) which compatibility ?
We basically interact with the class using just default Delphi string.
If PCRE16 implementation is correct, then it should work exactly the same as PCRE8.
If PCRE16 is not correct - it just not to be used at all.
If PCRE16 is correct - then i see no point in including PCRE8 when Delphi 2009+ is used.
I basically see almost no point (maybe some niche memory-tight or speed-tight programs?) in including BOTH the PCRE8 and PCRE16 engines. I believe by default we should have only one of those, depending what is default string type.
Practically, why should my program, written in XE2, have PCRE8 included at all, if we assume correctness of PCRE16 ?
I think that installer should choose either PCRE16 or PCRE8 engine, dependign upon Delphi version.
Then, optionally, if user insists, it should add PCRE16 to non-Unicode versions, and PCRE8 to post-Unicode versions.
Looking again i see that PCRE8 may already Unicode-aware. It just is used via converting to UTF-8.
Can we have TJclBaseRegEx have overloaded methods and properties for both AnsiString and Wide/UnicodeString ?
So that interface be "cloned" like AnsiStr*** functions have AnsiString and UnicodeString versions. And compiler would automatically choose which method to call, depending upon string type.
Ok for an option to enable/disable PCRE8 (enabled by default).
About Unicode PCRE8, yes you're right, UTF-8 was there, but it costs one string conversion when the data are pushed to the PCRE8 functions and one more when they are back. PCRE16 does not require any conversion on Unicode-Delphi.
For the AnsiString, WideString, UnicodeString overloads, that's definitively in my plans, but do not have the time right now... You may do this :)
Well, you stated in the newsgroup that PCRE16 did not matured yet.
That is exactly what i was wary about.
Practically it means to me that:
1) there is no real sense to include PCRE16 into JCL yet.
I'd made it some branch but not release.
2) PCRE8 is Unicode-aware. Switch between PCRE8 and PCRE16 is (ideally) only affecting speed: do we convert UTF16->UTF8 or not.
3) So PCRE8 can be used for TURESearch re-implementation (but maybe TURESearch can be just dropped ? is it needed really?)
4) Ideal structure would be to have the same class interface and only switch engines. Thus maybe we just should really have one and the same class for public API ? And use pcre lib via internal delegate ? Should there be those JclAnsiRegEx and JclWideRegEx ?
Frankly, judging by the name alone, i thought JclAnsiRegEx is simply uncapable of Unicode. Then having two public classes had sense.
But if both PCRE8 and PCRE16 are Unicode-capable, then at least thosie names seem misleading. And maybe on public API side, there is not sense to expose two classes at all. Let there be constructor with parameters, is it worse ?
I did not checked API so i cannot say if sub-classing is better than delegating or vice versa.
Since release is very close and we're both short of time, i thin better than rush with other desicion would be just postpone it and slowly discuss.
Personally, i'd not put PCRE16 into release at all, due to reasons above.
And i'd renamed JclAnsiRegEx to some "common" name. Be it JclRegEx or JclDefaultRegEx whatever.
Later with any decision regarding those libraries made, that class name would be re-used as alias or as wrapper, and no program depending on the name would be harmed.
Having both JclAnsiRegEx and JclWideRegEx as default API seem to me bad choice.
1) names are misleading
2) implementation details might be exposed by auxillary API, but are not to be made mainstream and required. By default a program should just use one main class and not give a dime about back-end engine. It should only dive into those details if necessary, not by default.
well, due to most datatypes are different between PCRE16 and PCRe8 - then subclassing is better probably.
The rest applies.
I think for release best option would be removing PCRE16 and renamiong PCRE8 to generic name without any reference to underlying format and engine.
There is probably just now haste with PCRE16 at all.
Well, PCRE8 and PCRE16 can perfectly live side-by-side for this release. PCRE8 was not altered by the new PCRE16.
Moreover, there was a big issue in PCRE8 revision 3690 (made in January 2012): UTF-8 was not working correctly at all. This was fixed in revision 3853.
Only few users should be interested in updating their code for PCRE16, which really seems to be working correctly, these users could give valuable feedback.
One good tradeoff between the complete removal of PCRE16 and its usage in TJclWideRegEx could be the disabling of PCRE16 by default. What do you think about that?
i see nothing good in such a tradeoff
except for exotic extremes, there is no point to carry both libs at once
either there is possibility and reliability to use PCRE16 for prime in 2009+
or there is not
||Please checkout revision 3855. 16-bit PCRE is now disabled by default and there are new options in the installer to enable/disable existing 8-bit PCRE and setup TJclRegEx accordingly to the available PCRE versions. This revision will go in the future release.|
as i already reported in the newsgroup - the demo can not be compiled.
There is no "System.Actions" unit in my XE2 Enterprise
And prior Delphi versions do not have namespaces for unit names at all.
I don't know why and what purpose it serves you on your box.
Suppressed those in revision: 3858
I am not native-speaking, yet how would you like
RsHintDefPCRE8 = 'Traditional PCRE library build, capable of ANSI and UTF-8 encoding'; ?
Current text looks to me as having bad connotations towards PCRE8 (historical ~ obsolete) and frankly not clear (ANSI and UTF-* are... are wut???)
Well, i guessed right what was that System.Actions.
Yet is it needed really even in XE3?
|2012-06-14 16:21||Arioch||New Issue|
|2012-06-14 16:21||Arioch||IDE version||=> Delphi/C++Builder XE2|
|2012-06-14 16:21||Arioch||Relationship added||child of 0005909|
|2012-08-28 16:45||outchy||Note Added: 0020129|
|2012-08-31 11:00||Arioch||Note Added: 0020136|
|2012-08-31 11:02||Arioch||Relationship added||related to 0005908|
|2012-08-31 11:05||Arioch||Note Edited: 0020136|
|2012-08-31 11:24||outchy||Note Added: 0020137|
|2012-08-31 14:01||Arioch||Note Added: 0020138|
|2012-08-31 15:05||outchy||Note Added: 0020139|
|2012-08-31 16:04||Arioch||Note Added: 0020142|
|2012-08-31 16:08||Arioch||Note Added: 0020143|
|2012-08-31 21:09||outchy||Note Added: 0020147|
|2012-09-01 13:53||Arioch||Note Added: 0020148|
|2012-09-03 00:30||outchy||Note Added: 0020150|
|2012-09-03 12:02||Arioch||Note Added: 0020156|
|2012-09-03 12:21||Arioch||Note Added: 0020157|
|2012-09-03 12:26||Arioch||Note Edited: 0020156|
|2012-09-03 13:57||Arioch||Note Added: 0020158|