
Looking for more examples of Unicode/Ansi oddities in Delphi 2009+


At the end of April 2014, Roman Yankovsky started a nice discussion on Google+ trying to get upvotes for QualityCentral Report #124402: Compiler bug when comparing chars.

His report comes down to this: when you use Ansi character literals like #255, the compiler treats them as single-byte encoded characters in the current code page of your Windows context, translates them to Unicode, and only then processes them.

The QC report has been dismissed as “Test Case Error” (within 15 minutes of stating “need more info”) by one of the compiler engineers, who pointed to the UsingCharacterLiterals section of Delphi in a Unicode World Part III: Unicodifying Your Code where – heaven forbid – they suggest replacing #128 with the Euro-Sign literal.

I disagree, as the issue happens without any hint or warning whatsoever, and causes code that compiles fine in Delphi <= 2007 to fail in subtle ways on Delphi >= 2009.

The compiler should issue a hint or warning whenever code like this can silently go wrong. Here, it doesn’t.

Quite a few knowledgeable Delphi people got involved in the discussion:

The consensus here is that this is at least confusing (especially as there are differences between the HIGHCHARUNICODE OFF and ON modes of the compiler) even though Embarcadero keeps insisting this is “as designed”.
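As a rough illustration of the difference between the two modes, the sketch below assigns the same decimal literal under each setting and prints the resulting code point. The program and variable names are made up for this example. It assumes the documented behaviour that decimal literals are parsed as AnsiChar when HIGHCHARUNICODE is OFF and as WideChar when it is ON; the OFF result depends on the code page in effect, so no fixed value is claimed for it.

```delphi
program HighCharUnicodeDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  AnsiParsed, WideParsed: Char; // Char is WideChar in Delphi 2009 and later
begin
{$HIGHCHARUNICODE OFF}
  // Parsed as an AnsiChar literal, then converted to Unicode through a
  // code page (for example U+20AC on Windows-1252).
  AnsiParsed := #128;
{$HIGHCHARUNICODE ON}
  // Parsed as a WideChar literal, so this should stay U+0080.
  WideParsed := #128;
{$HIGHCHARUNICODE OFF}

  Writeln('OFF: #128 -> $', IntToHex(Ord(AnsiParsed), 4));
  Writeln('ON : #128 -> $', IntToHex(Ord(WideParsed), 4));
end.
```

If the two lines print different values, existing code that relies on #128..#255 literals changes meaning depending on the directive, without any hint from the compiler.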

To prove it either way, I’ve started to write some unit tests to see what succeeds and what fails, but I need help, so:

Help needed: test cases

In my book, if existing Ansi based code fails in a Unicode Delphi compiler, and there was no hint or warning indicating potential failure, it is a compiler bug.

So I made some preliminary test cases, but need more.

There are two areas you can help with:

  1. Formulate simple tests (even a small console app or unit proving your point will do).
  2. Run the existing tests on various code pages (preferably outside USA and Western Europe) and/or Delphi versions so I can summarise results by codepage and Delphi version.

I’m specifically looking for code that works in Delphi <= 2007, and fails in Delphi >= 2009, where the compiler does not issue a hint or warning about failure.

I will also consider code that generates a hint/warning and still succeeds, or code that gives a hint/warning and then fails in a different way.

I know that Delphi <= 2007 had its own share of codepage problems as well, so I’m not looking for those.

Results of the help

I have written a unit test generator that – based on the above help – generates unit tests for all characters #0..#255 and #$00..#$FF, and maybe even #$0000..#$FFFF.
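To give an idea of what such a generator can look like, here is a minimal sketch that writes one check per decimal literal to the console. It is not the CodeGeneratorUnit.pas approach; it is just a plain loop. TTestChar is taken from the failure message further down and is assumed here to be an alias for Char; the emitted Check(...) call assumes a DUnit-style test case, and the program name is hypothetical.

```delphi
program GenerateCharTests; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  I: Integer;
  Hex: string;
begin
  // Emits Pascal source text: one line per literal #0..#255, comparing the
  // decimal literal against the same ordinal value cast to TTestChar.
  for I := 0 to 255 do
  begin
    Hex := IntToHex(I, 4);
    Writeln(Format(
      '  Check(#%d = TTestChar($%s), ''#%d should equal TTestChar($%s)'');',
      [I, Hex, I, Hex]));
  end;
end.
```

The output is meant to be pasted (or redirected) into the body of a test method. Covering #$00..#$FF would only need a second loop that emits the hexadecimal literal form instead.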

There is already my CodeGeneratorUnit.pas and demo CodeGenerator project that I demoed quite a while ago at the CodeRage session A Pragmatic & Powerful Code Generator with Generics and Anonymous Methods.

First result:

  • Currently there is one failure on CodePage 437: Character #128 fails:
    Expected #128 to equal TTestChar($0080), but equals TTestChar($20AC)., expected: <€> but was: <€>

    This is odd as #128 in CodePage 437 is an uppercase C cedilla (Ç). The Euro Sign is #128 in most Windows 125x encodings.
    This probably means that the way CHCP obtains the code page is not the one that Delphi uses. I will look into that soon; most likely, GetACP will work.
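A quick way to check that suspicion is to print the code pages Windows reports side by side. CHCP shows the console code page, which is usually the OEM one, while GetACP returns the ANSI code page. The sketch below uses only GetACP, GetOEMCP and GetConsoleOutputCP from the Windows API; the program name is hypothetical, and the unscoped unit name `Windows` assumes either Delphi 2009-XE or a project with the default unit scope names (otherwise use `Winapi.Windows`).

```delphi
program ShowCodePages; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  Windows; // Winapi.Windows when no unit scope names are configured

begin
  // ANSI code page: the usual candidate for converting AnsiChar values.
  Writeln('GetACP             : ', GetACP);
  // OEM code page: what a console normally starts with.
  Writeln('GetOEMCP           : ', GetOEMCP);
  // Active console output code page.
  Writeln('GetConsoleOutputCP : ', GetConsoleOutputCP);
end.
```

On a machine where this prints 1252 for GetACP and 437 for the console, the Euro Sign result for #128 would be consistent with the ANSI code page rather than the one CHCP reports.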

–jeroen

Related:

via [WayBack] Roman Yankovsky – Google+ – I think I found a compiler issue, could you please vote for…:

For “if AChar <= #255 then” compiler (XE6) gently generates the following code:

005D734A 66817DFA4F04     cmp word ptr [ebp-$06],$044f

$044f does not equal 255!

But it works fine when the Ord function is used:

if Ord(AChar) <= 255  then

005D7352 668B45FA         mov ax,[ebp-$06]
005D7356 663DFF00         cmp ax,$00ff
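The comparison above can be tried in a small console program. This is a sketch, not Roman's original test case: the program name and the chosen test value are made up. $044F is the code point of Cyrillic я, which is byte 255 in Windows-1251, so the disassembly is consistent with a Windows-1251 code page. The four-digit form #$00FF is assumed to be parsed as WideChar, as the HIGHCHARUNICODE documentation describes.

```delphi
program CompareCharDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  AChar: Char;
begin
  AChar := Char($0100); // U+0100, numerically above 255

  // Code-page dependent: #255 is converted to Unicode before the compare.
  // With #255 translated to $044F this is TRUE, because $0100 <= $044F.
  Writeln('AChar <= #255     : ', AChar <= #255);

  // Code-page independent: compares plain ordinal values.
  Writeln('Ord(AChar) <= 255 : ', Ord(AChar) <= 255);

  // Four hex digits: assumed to be a WideChar literal, so no conversion.
  Writeln('AChar <= #$00FF   : ', AChar <= #$00FF);
end.
```

If the first line disagrees with the other two, the literal was translated through a code page on the way in.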

PS: Somehow this post missed its schedule in 2014 (WordPress.com has a habit of that every now and then), but [WayBack] var c: Char; begin case c of Char(#$C0)..Char(#$D6): begin end; end; Why I get error: [dcc32 Error] E2011 Low bound exceeds high bound on Delphi Tokyo? – Jacek Laskowski – Google+ made me find it back.


Filed under: Ansi, ASCII, CP437/OEM 437/PC-8, Delphi, Delphi 2006, Delphi 2007, Delphi 2009, Delphi 2010, Delphi 7, Delphi XE, Delphi XE2, Delphi XE3, Delphi XE4, Delphi XE5, Delphi XE6, Development, Encoding, ISO-8859, QC, Software Development, Unicode, UTF-8, Windows-1252
