Upper case converter
I'm now dealing with some string utilities, and I want to make an upper/lower case converter. How can I know which characters actually *have* an upper/lower case, and how do I get it?
Re:Upper case converter
In ASCII, bit 5 (value 0x20) of a letter's code determines its case: it is clear if the letter is uppercase and set if it is lowercase.
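As a rough sketch of the bit-5 trick in C (the helper names here are made up for illustration, and this assumes plain 7-bit ASCII): the bit is only meaningful for the 26 letters, so a range check has to come first. Without it, clearing bit 5 would for example turn '{' into '['.

```c
/* Hypothetical helper names; plain 7-bit ASCII only. */
#define ASCII_CASE_BIT 0x20  /* bit 5: clear = uppercase, set = lowercase */

static int is_ascii_lower(int c) { return c >= 'a' && c <= 'z'; }
static int is_ascii_upper(int c) { return c >= 'A' && c <= 'Z'; }

/* Clear bit 5, but only for characters that really are lowercase letters. */
static int ascii_to_upper(int c)
{
    return is_ascii_lower(c) ? (c & ~ASCII_CASE_BIT) : c;
}

/* Set bit 5, but only for characters that really are uppercase letters. */
static int ascii_to_lower(int c)
{
    return is_ascii_upper(c) ? (c | ASCII_CASE_BIT) : c;
}
```

Anything outside 'a'..'z' or 'A'..'Z' (digits, punctuation, control codes) is returned unchanged.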
Re:Upper case converter
And what happens to characters that don't have an upper/lower case counterpart?
Re:Upper case converter
Hi,
Unicode is left as an exercise for the reader...
For ASCII:
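Code:
; in: AL = character, out: AL = converted character
toupper:
    cmp al,'a'          ; below 'a': not a lowercase letter
    jb .l1
    cmp al,'z'          ; above 'z': not a lowercase letter
    ja .l1
    sub al,'a'-'A'      ; 'a'..'z' -> 'A'..'Z'
.l1:
    ret

tolower:
    cmp al,'Z'          ; above 'Z': not an uppercase letter
    ja .l1
    cmp al,'A'          ; below 'A': not an uppercase letter
    jb .l1
    add al,'a'-'A'      ; 'A'..'Z' -> 'a'..'z'
.l1:
    ret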
Cheers,
Brendan
Re:Upper case converter
ASCII conversion in C (I'm not a great fan of ASM):
I haven't thought about Unicode yet, since I have no Unicode table at hand, but I'll think about it...
Code:
char lower_case(char c)
{
    if (c >= 'A' && c <= 'Z')
    {
        c = c + 'a' - 'A';
    }
    return c;
}

char upper_case(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        c = c - 'a' + 'A';
    }
    return c;
}
Re:Upper case converter
The "char value arithmetic" shown by blip and Brendan here is indeed unable to handle anything but 7-bit ASCII, which makes it pretty useless for anyone who is not British or American (i.e., about 95% of the world population).
Candamir wrote: How can I know which characters actually *have* an upper/lower case and how do I get it?
If you look at the C library header <ctype.h>, you will find a whole family of related functions: toupper(), tolower(), isspace(), ispunct() and so on. All of these are "locale dependent", i.e. when the program switches locales, these functions may return different values for the same character.
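To make that concrete, here is a minimal sketch using only the standard <ctype.h> and <locale.h> calls. The two helper names are invented for this example. Which non-ASCII characters actually get converted depends on the locales installed on the system, so no particular result is claimed beyond the default "C" locale.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: uppercase a NUL-terminated string in place,
   using whatever LC_CTYPE locale is currently active. */
static void str_to_upper(char *s)
{
    for (; *s != '\0'; ++s) {
        /* The cast matters: <ctype.h> functions take an int that must be
           EOF or representable as unsigned char. */
        *s = (char) toupper((unsigned char) *s);
    }
}

/* Hypothetical helper: adopt the user's environment locale for character
   classification and conversion. Returns nonzero on success. */
static int use_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

A program starts in the "C" locale, where only 'a'..'z' are converted. After a successful use_environment_ctype(), the same str_to_upper() call may also convert accented letters of a single-byte locale.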
The idea is to create a translation table for each locale. As long as we are talking about 8-bit characters, consider a char __toupper[256] containing uppercase translation codes:
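Code:
#include <stdio.h>   /* EOF */
int toupper( int c )
{
    /* EOF passes through; the casts keep codes >= 0x80 from becoming negative */
    if ( c == EOF ) return c;
    return (unsigned char) __toupper[ (unsigned char) c ];
}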
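For illustration, one way such a table could be filled for an ISO-8859-1 (Latin-1) locale is sketched below. The names latin1_toupper, init_latin1_toupper and table_toupper are assumptions made for this example, not part of any real C library.

```c
/* Hypothetical translation table for an ISO-8859-1 (Latin-1) locale. */
static unsigned char latin1_toupper[256];

static void init_latin1_toupper(void)
{
    int i;

    for (i = 0; i < 256; ++i)
        latin1_toupper[i] = (unsigned char) i;           /* default: no change */

    for (i = 'a'; i <= 'z'; ++i)
        latin1_toupper[i] = (unsigned char) (i - 0x20);  /* ASCII letters */

    for (i = 0xE0; i <= 0xFE; ++i)
        if (i != 0xF7)                                   /* 0xF7 is the division sign */
            latin1_toupper[i] = (unsigned char) (i - 0x20);

    /* 0xDF (sharp s) and 0xFF (y with diaeresis) have no single-byte
       uppercase form in Latin-1, so they stay as they are. */
}

/* Table lookup with a range check, so EOF and other out-of-range
   values are passed through unchanged. */
static int table_toupper(int c)
{
    return (c >= 0 && c <= 255) ? latin1_toupper[c] : c;
}
```

Switching locales then amounts to pointing the lookup at a different 256-byte table.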
Two issues here:
- When you're doing Unicode, using a "flat" translation table would waste lots of memory. You might want to implement a "folded" translation table, kind of like the page tables and directories used in virtual memory management.
- Some alphabets contain characters that expand when converted to upper-/lowercase. A common example is the German 'ß', which expands to 'SS' in uppercase. Standard C has no way of handling this, so you had better come up with some OS API function that does.
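Both points can be sketched in C. Everything below is a hypothetical illustration: the names are invented, only two demo pages are filled in (and only partly), and a real table would be generated from the Unicode character database. The first part is a two-level "folded" lookup where a NULL directory entry means "nothing on this page changes". The second part is a string-level function that is allowed to produce more characters than it consumes, here for Latin-1 text where 0xDF is 'ß'.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS  8
#define PAGE_SIZE  (1u << PAGE_BITS)           /* 256 code points per page */
#define PAGE_COUNT (0x110000u >> PAGE_BITS)    /* covers U+0000..U+10FFFF */

/* Directory: one pointer per page; NULL = no case mappings on that page. */
static const uint32_t *upper_dir[PAGE_COUNT];

/* Demo pages only (assumption: real data would be generated, not hand-written). */
static uint32_t page_00[PAGE_SIZE];            /* U+0000..U+00FF */
static uint32_t page_04[PAGE_SIZE];            /* U+0400..U+04FF, Cyrillic */

static void init_demo_pages(void)
{
    uint32_t i;

    for (i = 0; i < PAGE_SIZE; ++i) {          /* identity by default */
        page_00[i] = i;
        page_04[i] = 0x0400u + i;
    }
    for (i = 0x61u; i <= 0x7Au; ++i)           /* 'a'..'z' -> 'A'..'Z' */
        page_00[i] = i - 0x20u;
    for (i = 0xE0u; i <= 0xFEu; ++i)           /* Latin-1 accented letters */
        if (i != 0xF7u)                        /* U+00F7 is the division sign */
            page_00[i] = i - 0x20u;
    for (i = 0x0430u; i <= 0x044Fu; ++i)       /* U+0430..U+044F -> U+0410..U+042F */
        page_04[i - 0x0400u] = i - 0x20u;

    upper_dir[0x00] = page_00;
    upper_dir[0x04] = page_04;
}

/* Two-level lookup: directory entry first, then the offset within the page. */
static uint32_t folded_toupper(uint32_t cp)
{
    const uint32_t *page;

    if (cp >= 0x110000u)
        return cp;                             /* not a Unicode code point */
    page = upper_dir[cp >> PAGE_BITS];
    return page ? page[cp & (PAGE_SIZE - 1u)] : cp;
}

/* Hypothetical string API for Latin-1 text: 0xDF expands to "SS".
   Returns the length the full result needs (excluding the NUL), like
   snprintf, so the caller can detect truncation. */
static size_t upper_expand_latin1(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char) *src;
        char buf[2];
        size_t n = 1, k;

        if (c == 0xDF) {
            buf[0] = 'S'; buf[1] = 'S'; n = 2; /* one character becomes two */
        } else {
            buf[0] = (char) folded_toupper(c);
        }
        for (k = 0; k < n; ++k, ++out)
            if (out + 1 < dst_size)
                dst[out] = buf[k];
    }
    if (dst_size > 0)
        dst[out < dst_size ? out : dst_size - 1] = '\0';
    return out;
}
```

With this layout the directory has 0x1100 entries and only pages that contain case mappings need to exist, whereas a flat table would need one entry for each of the 0x110000 code points. The string function works on whole strings rather than single characters because a per-character interface like toupper() cannot return two characters.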