Upper case converter

Candamir

Upper case converter

Post by Candamir »

I'm now dealing with some string utilities, and I want to make an upper/lower case converter. How can I know which characters actually *have* an upper/lower case, and how do I get it?
blip

Re:Upper case converter

Post by blip »

Bit 5 (value 0x20) of the character determines case in ASCII: it is clear if the letter is uppercase and set if it is lowercase.
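A minimal C sketch of the bit-5 rule, assuming plain 7-bit ASCII. The names `ASCII_CASE_BIT`, `ascii_is_letter`, `ascii_to_upper` and `ascii_to_lower` are made up for illustration. The range check is there because bit 5 only encodes case for the 26 letters; clearing it on '1' (0x31), for example, would give 0x11, which is a control code.

```c
#define ASCII_CASE_BIT 0x20   /* bit 5: clear = uppercase, set = lowercase */

/* Nonzero if c is one of the 52 ASCII letters. */
static int ascii_is_letter(int c)
{
    int folded = c | ASCII_CASE_BIT;   /* maps 'A'..'Z' onto 'a'..'z' */
    return folded >= 'a' && folded <= 'z';
}

/* Clear bit 5 for letters, leave everything else untouched. */
int ascii_to_upper(int c)
{
    return ascii_is_letter(c) ? (c & ~ASCII_CASE_BIT) : c;
}

/* Set bit 5 for letters, leave everything else untouched. */
int ascii_to_lower(int c)
{
    return ascii_is_letter(c) ? (c | ASCII_CASE_BIT) : c;
}
```

Digits, punctuation and control codes pass through unchanged, so `ascii_to_upper('1')` is still `'1'`.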
Candamir

Re:Upper case converter

Post by Candamir »

And what happens to characters that don't have an upper/lower case counterpart?
Brendan

Re:Upper case converter

Post by Brendan »

Hi,

For ASCII:

Code: Select all

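; In: AL = character. Out: AL = uppercase equivalent (unchanged if not a-z).
toupper:
   cmp al,'a'
   jb .l1
   cmp al,'z'
   ja .l1
   sub al,'a'-'A'      ; 'a'-'A' = 0x20
.l1:
   ret

; In: AL = character. Out: AL = lowercase equivalent (unchanged if not A-Z).
tolower:
   cmp al,'Z'
   ja .l1
   cmp al,'A'
   jb .l1
   add al,'a'-'A'      ; 'a'-'A' = 0x20
.l1:
   ret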
Unicode is left as an exercise for the reader... :o


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Candamir

Re:Upper case converter

Post by Candamir »

ASCII conversion in C (I'm not a great fan of ASM ;)):

Code: Select all

char lower_case(char c)
{
   if (c >= 'A' && c <= 'Z')
   {
      c = c + 'a' - 'A';
   }
   return c;
}

char upper_case(char c)
{
   if (c >= 'a' && c <= 'z')
   {
      c = c - 'a' + 'A';
   }
   return c;
}
I haven't thought about Unicode yet, and I don't have a Unicode table at hand, so I'll have to think about it...
Solar

Re:Upper case converter

Post by Solar »

Candamir wrote: How can I know which characters actually *have* an upper/lower case and how do I get it?
The "char value arithmetic" shown by blip and Brendan here is indeed unable to handle anything but 7-bit ASCII, which makes it pretty useless for anyone whose language needs characters beyond the unaccented Latin letters A-Z (i.e., most of the world's population).

If you look at the C library header <ctype.h>, you will see a whole family of related functions: toupper(), tolower(), isspace(), ispunct() and so on. All of these are "locale dependent", i.e. when the program switches locales, these functions may return different values for the same character.
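For reference, this is roughly how the standard functions are used from a hosted C program. The helper names `str_to_upper` and `use_native_ctype` are made up for this sketch; `toupper()` and `setlocale()` are the standard ones. Which non-ASCII characters actually get converted depends on the locale that is active, so nothing is promised about those here.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Uppercase a NUL-terminated string in place, using the current LC_CTYPE locale. */
void str_to_upper(char *s)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        /* Cast first: passing a negative char value to toupper() is undefined. */
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}

/* Switch character handling to the user's native locale; returns 0 on failure. */
int use_native_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

A program starts in the "C" locale, where only a-z and A-Z are converted; after a successful `use_native_ctype()` the same `str_to_upper()` call may convert further 8-bit characters.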

The idea is to create a translation table for each locale. As long as we are talking about 8-bit characters, consider an array unsigned char __toupper[256] containing uppercase translation codes (unsigned, so that values above 127 are not returned as negative numbers):

Code: Select all

int toupper( int c )
{
    /* Pass EOF and other out-of-range values through unchanged. */
    if ( c < 0 || c > 255 )
        return c;
    return __toupper[ c ];
}
Add another array, __tolower, and a third array containing status flags for each character value (Is it whitespace? Is it printable? etc.), and you're set. When a program changes locale, you have to reload the three translation tables with the values for the new locale, which are usually read from disk.
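Here is a rough sketch of the three-table layout, assuming 8-bit characters and an ASCII-compatible character set. All names (`upper_tab`, `lower_tab`, `flag_tab`, the `CT_*` flags, `ctype_load_ascii`, `ctype_set_pair`, and the `my_` prefixed functions) are invented for illustration. Loading a locale file from disk is not shown; `ctype_set_pair` merely stands in for that step.

```c
#include <string.h>

/* Hypothetical flag bits for the status table. */
enum { CT_UPPER = 1, CT_LOWER = 2, CT_SPACE = 4, CT_PRINT = 8 };

static unsigned char upper_tab[256];   /* uppercase translation */
static unsigned char lower_tab[256];   /* lowercase translation */
static unsigned char flag_tab[256];    /* CT_* status flags     */

/* Load the 7-bit ASCII defaults into all three tables. */
void ctype_load_ascii(void)
{
    const char *spaces = " \t\n\v\f\r";
    int c;

    memset(flag_tab, 0, sizeof flag_tab);
    for (c = 0; c < 256; c++) {
        upper_tab[c] = lower_tab[c] = (unsigned char)c;   /* identity */
        if (c >= 0x20 && c <= 0x7E)
            flag_tab[c] |= CT_PRINT;
    }
    for (c = 'a'; c <= 'z'; c++) {
        int u = c - 'a' + 'A';
        upper_tab[c] = (unsigned char)u;
        lower_tab[u] = (unsigned char)c;
        flag_tab[c] |= CT_LOWER;
        flag_tab[u] |= CT_UPPER;
    }
    for (; *spaces != '\0'; spaces++)
        flag_tab[(unsigned char)*spaces] |= CT_SPACE;
}

/* Register one extra lower/upper pair (stand-in for reading locale data). */
void ctype_set_pair(unsigned char lower, unsigned char upper)
{
    upper_tab[lower] = upper;
    lower_tab[upper] = lower;
    flag_tab[lower] |= CT_LOWER | CT_PRINT;
    flag_tab[upper] |= CT_UPPER | CT_PRINT;
}

/* Lookups: values outside 0..255 (such as EOF) pass through or test false. */
int my_toupper(int c) { return (c < 0 || c > 255) ? c : upper_tab[c]; }
int my_tolower(int c) { return (c < 0 || c > 255) ? c : lower_tab[c]; }
int my_isspace(int c) { return c >= 0 && c <= 255 && (flag_tab[c] & CT_SPACE) != 0; }
```

For an ISO 8859-1 style locale you would call `ctype_load_ascii()` and then add pairs such as `ctype_set_pair(0xE4, 0xC4)` for ä/Ä; the lookup functions themselves never change.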

Two issues here:
  • When you're doing Unicode, using a "flat" translation table would be wasting lots of memory. You might want to implement a "folded" translation table, kind of like the page tables and directories used in virtual memory management.
  • Some alphabets contain characters that expand when converted to upper-/lowercase. A common example is the German 'ß', which expands to 'SS' in uppercase. Standard C has no way of handling this, so you'd better come up with some OS API function that does. ;-)
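Both issues can be sketched in C along these lines. This is only an illustration under some assumptions: code points are stored as 32-bit values, blocks are 256 entries wide, a NULL directory entry means "nothing in this block changes", and a 0 entry inside a block means "this code point is unchanged". Only basic Latin, part of Latin-1 and the basic Cyrillic letters are filled in. All names (`upper_dir`, `upper_table_init`, `upper_lookup`, `upper_full`) are hypothetical, and `upper_full` hard-codes the single 'ß' case rather than reading real expansion data.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BITS    8
#define BLOCK_SIZE    (1u << BLOCK_BITS)              /* 256 code points per block */
#define MAX_CODEPOINT 0x10FFFFu
#define NUM_BLOCKS    ((MAX_CODEPOINT >> BLOCK_BITS) + 1)

static uint32_t block_00[BLOCK_SIZE];                 /* U+0000..U+00FF */
static uint32_t block_04[BLOCK_SIZE];                 /* U+0400..U+04FF */

/* Directory: one pointer per block, NULL where no mapping exists. */
static const uint32_t *upper_dir[NUM_BLOCKS];

/* Fill in the few demo ranges and hook their blocks into the directory. */
void upper_table_init(void)
{
    uint32_t c;

    for (c = 'a'; c <= 'z'; c++)
        block_00[c] = c - 0x20;
    for (c = 0xE0; c <= 0xFE; c++)                    /* Latin-1 lowercase */
        if (c != 0xF7)                                /* U+00F7 is the division sign */
            block_00[c] = c - 0x20;
    for (c = 0x430; c <= 0x44F; c++)                  /* Cyrillic small a..ya */
        block_04[c & (BLOCK_SIZE - 1)] = c - 0x20;

    upper_dir[0x00] = block_00;
    upper_dir[0x04] = block_04;
}

/* One-to-one uppercase mapping via the two-level table. */
uint32_t upper_lookup(uint32_t cp)
{
    const uint32_t *block;
    uint32_t mapped;

    if (cp > MAX_CODEPOINT)
        return cp;
    block = upper_dir[cp >> BLOCK_BITS];
    if (block == NULL)
        return cp;
    mapped = block[cp & (BLOCK_SIZE - 1)];
    return mapped != 0 ? mapped : cp;
}

/* Expanding mapping: writes 1 or 2 code points to out, returns the count. */
size_t upper_full(uint32_t cp, uint32_t out[2])
{
    if (cp == 0x00DF) {                               /* German sharp s -> "SS" */
        out[0] = 'S';
        out[1] = 'S';
        return 2;
    }
    out[0] = upper_lookup(cp);
    return 1;
}
```

The point of returning a count from `upper_full` is that the caller has to size its output buffer for growth, which is exactly what the `int toupper(int)` signature in standard C cannot express.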
I hope this helps.
Every good solution is obvious once you've found it.