RE: SrPersist, sql-c-wchar, and unicode/wide-characters

To: "'Masoud Pirnazar'" <amp@apptek.com>, <plt-scheme@fast.cs.utah.edu>
Subject: RE: SrPersist, sql-c-wchar, and unicode/wide-characters
From: "Paul Steckler" <steck@ccs.neu.edu>
Date: Thu, 20 Sep 2001 18:17:35 -0400
Importance: Normal
In-Reply-To: <001a01c1421c$9253af80$850a0a0a@amp2k>
Reply-To: <steck@ccs.neu.edu>
Sender: owner-plt-scheme@fast.cs.utah.edu

> when i use sql-c-char, i get the english text ok (roman/ascii
> alphabet), but no korean, arabic, etc.  (they come back as question
> marks)

That makes sense, because sql-c-char represents 8-bit C characters.

> when i tried sql-c-wchar, the mzscheme interpreter said "illegal
> instruction" and exited.

I've never tried SrPersist with a Unicode database, so the wide-character code is untested.

Which primitive caused this problem?  Was it make-buffer, read-buffer, or write-buffer?

Even if this code worked perfectly, you might still have problems.  The MzScheme language does not support Unicode.  In SrPersist, if you read from a buffer that contains Unicode characters into a Scheme string, only the least significant 8 bits of each character are stuffed into the resulting Scheme string.  That strategy works (I think) if the Unicode text is ordinary Latin-1.  With Korean, Arabic, etc., it probably fails miserably.
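
To make the truncation concrete, here is a rough C++ sketch.  It is not the actual srpbuffer.cxx code: the name narrow_low_byte and the assumption that each wide character is one 16-bit unit are mine, for illustration only.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not SrPersist's real code): keep only the least
// significant 8 bits of each 16-bit character and drop the high byte.
std::string narrow_low_byte(const std::vector<std::uint16_t>& wide) {
    std::string out;
    out.reserve(wide.size());
    for (std::uint16_t unit : wide) {
        out.push_back(static_cast<char>(unit & 0xFF));
    }
    return out;
}
```

With this approach U+0048 and U+0069 come through as "Hi", but the Arabic letter U+0633 turns into the byte 0x33 (the digit "3") and the Hangul syllable U+D55C turns into 0x5C (a backslash), so the original text cannot be recovered.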

You might consider modifying the wide-character code in srpbuffer.cxx to use a different strategy, say, placing the two bytes of each Unicode character in two separate characters of the Scheme string.  Of course, it will look like garbage in the Scheme REPL.
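
A minimal sketch of that two-byte strategy, in C++.  The function names, the 16-bit character width, and the high-byte-first order are all assumptions for illustration; none of this is taken from srpbuffer.cxx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: store each 16-bit character as two 8-bit characters,
// high byte first, so no information is lost.
std::string pack_two_bytes(const std::vector<std::uint16_t>& wide) {
    std::string out;
    out.reserve(wide.size() * 2);
    for (std::uint16_t unit : wide) {
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));
        out.push_back(static_cast<char>(unit & 0xFF));
    }
    return out;
}

// Hypothetical inverse: rebuild the 16-bit characters from byte pairs.
std::vector<std::uint16_t> unpack_two_bytes(const std::string& packed) {
    assert(packed.size() % 2 == 0);  // input must be whole byte pairs
    std::vector<std::uint16_t> out;
    out.reserve(packed.size() / 2);
    for (std::size_t i = 0; i < packed.size(); i += 2) {
        const auto hi = static_cast<unsigned char>(packed[i]);
        const auto lo = static_cast<unsigned char>(packed[i + 1]);
        out.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return out;
}
```

The resulting string is twice as long as the character count and may contain zero bytes, so anything on the Scheme side that handles it would have to treat it as raw data and decode the pairs itself.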

-- Paul
