Java UTF-8 Encoding
Posted: Tue Feb 15, 2005 6:17 am
by Neo
I need to get UTF-8 encoding values of integer values. E.g. take the int 5. This gets encoded into the UTF-8 value of 53, and so when I try to verify the value after this conversion it fails (as I am looking for 5 when I get 53).
Does anyone know a good method to do this conversion (both back and forth) in Java?
BTW the values are ints.
Re:Java UTF-8 Encoding
Posted: Tue Feb 15, 2005 7:03 am
by Solar
UTF-8 is just another character table, like ASCII or EBCDIC.
There is no UTF-8 value of the integer 5. There is only a UTF-8 value for the digit 5, which - for all digits from 0 through 9 - is the same value as the ASCII value for the same digit (as up to 0x7f, UTF-8 is the same as ASCII-7 IIRC).
Luckily, conversion is built into Java:
Code:
Integer value = new Integer( "5" );
System.out.println( value.intValue() );
...prints 5 (integer), and
Code:
Integer value = new Integer( 5 );
System.out.println( value.toString() );
...prints 5 (string).
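For reference, a minimal sketch of the same round trip using the static helpers Integer.parseInt and Integer.toString, together with the digit-to-code relationship described above. The class name DigitDemo and its two methods are made up for illustration and assume a single decimal digit.

```java
class DigitDemo {
    // Character code of a single decimal digit (0..9).
    public static int digitToCode(int digit) {
        if (digit < 0 || digit > 9) {
            throw new IllegalArgumentException("not a single decimal digit: " + digit);
        }
        return '0' + digit;          // '0' is 48, so 5 becomes 53
    }

    // Inverse: numeric value of a digit character code (48..57).
    public static int codeToDigit(int code) {
        if (code < '0' || code > '9') {
            throw new IllegalArgumentException("not a digit code: " + code);
        }
        return code - '0';           // 53 becomes 5
    }

    public static void main(String[] args) {
        System.out.println(digitToCode(5));          // 53
        System.out.println(codeToDigit(53));         // 5
        System.out.println(Integer.parseInt("5"));   // 5 (as an int)
        System.out.println(Integer.toString(5));     // 5 (as a String)
    }
}
```

The static helpers avoid creating an Integer object just to convert a value.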
That should answer your question, right?
Re:Java UTF-8 Encoding
Posted: Tue Feb 15, 2005 7:35 am
by mystran
Actually, if one needs UTF-8 in particular, then one needs to do something fancier, because Java uses UTF-16 internally (before Java 5, effectively UCS-2, a subset of UTF-16). In most cases you shouldn't care though, and I definitely don't understand the original question!
Also remember that 7-bit ASCII is also valid UTF-8.
So one needs to do "int <-> String" conversion first, and then export/import a UTF-8 representation from/to a string. The conversion can be done with java.io.DataOutputStream and java.io.DataInputStream (using writeUTF/readUTF, which use Java's modified UTF-8 with a two-byte length prefix), or directly with java.nio.charset.CharsetEncoder.
The streams are probably what should be used if UTF-8 is needed, since I can't really think of any good reason to use UTF-8 internally.
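A rough sketch of the two-step conversion described above (int to String, then String to UTF-8 bytes, and back). It uses String.getBytes with StandardCharsets.UTF_8, which is a later addition to the JDK (Java 7); at the time of this thread one would have passed the charset name "UTF-8" instead. The class name Utf8IntCodec is hypothetical, and the sketch assumes the value travels as plain decimal text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8IntCodec {
    // int -> decimal string -> UTF-8 bytes
    public static byte[] encode(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes -> decimal string -> int
    // Throws NumberFormatException if the bytes are not a decimal number.
    public static int decode(byte[] bytes) {
        return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] wire = encode(14);
        System.out.println(Arrays.toString(wire));   // [49, 52]
        System.out.println(decode(wire));            // 14
    }
}
```

Because decimal digits are plain ASCII, the UTF-8 bytes here are identical to the ASCII bytes.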
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 5:45 am
by Neo
What I meant is that on the system where the value originates the value is an integer '5' which before transmission is converted to 53 (the value of the character 5).
I receive this '53' and need to get it as 5 to be compared with the original value.
Is there any simple way to do this comparison in Java?
<edit>
when I send 14 I get the equivalent of value 'E' (69)
</edit>
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 6:24 am
by Solar
Neo wrote:
What I meant is that on the system where the value originates the value is an integer '5' which before transmission is converted to 53 (the value of the character 5).
I receive this '53' and need to get it as 5 to be compared with the original value.
Ah, server - client communication... sorry but, when you intend to communicate an integer value and get a 53 instead of a 5, your protocol is borked, which the following...
<edit>
when I send 14 I get the equivalent of value 'E' (69)
</edit>
...seems to prove. You send 14 and get 69? If you got first 49 and then 52, that would at least be consistent. But 69? I fear you have a problem much deeper than UTF-8 / integer conversions...
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 6:50 am
by Candy
Solar wrote:
Neo wrote:
What I meant is that on the system where the value originates the value is an integer '5' which before transmission is converted to 53 (the value of the character 5).
I receive this '53' and need to get it as 5 to be compared with the original value.
Ah, server - client communication... sorry but, when you intend to communicate an integer value and get a 53 instead of a 5, your protocol is borked, which the following...
<edit>
when I send 14 I get the equivalent of value 'E' (69)
</edit>
...seems to prove. You send 14 and get 69? If you got first 49 and then 52, that would at least be consistent. But 69? I fear you have a problem much deeper than UTF-8 / integer conversions...
Three logical conversions:
14 -> '0' + 14
14 -> '1' '4'
14 -> 'E'
The first is dumb software converting a single number, the second is smart software converting a single number, the third is software converting a number to hex.
What sort of thing are you trying to achieve?
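The three conversions listed above can be sketched for the decimal value 14 as follows. This is only an illustration of the three possibilities, not a statement about what the sender in this thread actually does; the class and method names are invented.

```java
import java.util.Locale;

class ThreeConversions {
    // 1. Naive: add the value to the code of '0' (only meaningful for 0..9).
    public static int naive(int value) {
        return '0' + value;
    }

    // 2. Decimal text: one character per decimal digit.
    public static String decimal(int value) {
        return Integer.toString(value);
    }

    // 3. Hexadecimal text in upper case.
    public static String hex(int value) {
        return Integer.toHexString(value).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(naive(14));     // 62
        System.out.println(decimal(14));   // 14 (character codes 49, 52)
        System.out.println(hex(14));       // E  (character code 69)
    }
}
```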
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 7:16 am
by Solar
Candy wrote:
Three logical conversions:
14 -> '0' + 14
14 -> '1' '4'
14 -> 'E'
Erm...
'0' + 14 is 62.
'1', '4' is 49, 52. (Or 101, depending on SW dumbness.)
And the third I don't get.
I'm really confused, here. (And don't see any way to get a 69 out of 14...)
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 7:35 am
by Neo
That cheers me up
What can I do?
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 7:48 am
by Solar
Neo wrote:
What can I do?
The usual suspects: printing lots of trace messages, re-reading the descriptions of functions you just skimmed over, to check if they really do what you thought they would. Try slightly different approaches and see what changes.
Or dumping your code into this forum and see if someone else can figure it out, but that's cheating, of course.
Re:Java UTF-8 Encoding
Posted: Wed Feb 16, 2005 10:48 am
by zloba
Neo wrote:
What can I do?
at recipient, you can decode the received data (53) to obtain 5 (as you are likely to need the decoded value again later).
Neo wrote:
I need to get UTF-8 encoding values of integer values.
just convert the integer to string. since decimal digits (and even hex ones) are plain ascii, UTF-8 encoding won't be any different from ascii.
Neo wrote:
Does anyone know a good method to do this conversion (both back and forth) in Java?
integer to string: create an Integer object and use its toString() method
string to integer: there should be a constructor for Integer(String). then you can use intValue() to get back the int.
(as Solar said)
or, you can do these manually.
boring theory below:
you should review your conversions to ensure that you compare apples to apples, i.e. integers to integers, or encoded integers to encoded integers, but not integers to encoded integers.
(of course, you should have your encoding well-defined to start with)
so to compare value A to value B (integers), you can either compare:
- A to B (duh)
(at recipient)
- decode(encode(A)) to B (recommended)
- encode(A) to encode(B)
.. assuming your encoding and the network transmission preserve data correctly, that is, the recipient gets encode(A) and decode(encode(A))==A.
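the comparison options above could look roughly like this. it assumes, purely for illustration, that encode means "decimal digits as UTF-8 bytes"; the thread never establishes the sender's real encoding. CompareDemo and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CompareDemo {
    // Assumed encoding: decimal digits as UTF-8 bytes (hypothetical).
    public static byte[] encode(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static int decode(byte[] wire) {
        return Integer.parseInt(new String(wire, StandardCharsets.UTF_8));
    }

    // Recommended: decode what arrived, then compare integers.
    public static boolean matchesDecoded(byte[] received, int expected) {
        return decode(received) == expected;
    }

    // Alternative: encode the expected value, then compare bytes.
    public static boolean matchesEncoded(byte[] received, int expected) {
        return Arrays.equals(received, encode(expected));
    }

    public static void main(String[] args) {
        byte[] received = encode(5);   // the single byte 53
        int expected = 5;
        System.out.println(received[0] == expected);             // false: encoded vs. plain
        System.out.println(matchesDecoded(received, expected));  // true
        System.out.println(matchesEncoded(received, expected));  // true
    }
}
```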
apparently you're trying to compare encode(A) [=53] to B [=5], judging by:
Neo wrote:
(as I am looking for 5 when I get 53).
correct me if i'm confused.
Re:Java UTF-8 Encoding
Posted: Thu Feb 17, 2005 7:34 am
by Candy
Solar wrote:
Candy wrote:
Three logical conversions:
14 -> '0' + 14
14 -> '1' '4'
14 -> 'E'
'0' + 14 is 62.
'1', '4' is 49, 52. (Or 101, depending on SW dumbness.)
And the third I don't get.
I'm really confused, here. (And don't see any way to get a 69 out of 14...)
Converting a number to hex:
0...9: 48...57
10...15: 'A'-'F', 65...70
Hence, if you convert 14 decimal to a hex display, you get the code for 0x0E. Which should be E.
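If the sender really does emit one upper-case hex digit per value (which is only a guess at this point in the thread), the mapping in the table above can be sketched with Character.forDigit and Character.digit. The class name HexDigit and its methods are made up for this example.

```java
class HexDigit {
    // Character code of the upper-case hex digit for a value 0..15.
    public static int toCode(int value) {
        if (value < 0 || value > 15) {
            throw new IllegalArgumentException("not a single hex digit value: " + value);
        }
        // forDigit yields lower case ('e'), so convert to upper case ('E').
        return Character.toUpperCase(Character.forDigit(value, 16));
    }

    // Value 0..15 for a hex digit character code; lower-case 'a'..'f' also work.
    public static int fromCode(int code) {
        int value = Character.digit(code, 16);   // -1 if not a hex digit
        if (value < 0) {
            throw new IllegalArgumentException("not a hex digit code: " + code);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toCode(14));     // 69, the code of 'E'
        System.out.println(fromCode(69));   // 14
        System.out.println(fromCode(53));   // 5
    }
}
```

This only covers values 0 to 15; larger values would need more than one hex digit, and the thread does not say how the sender handles those.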
btw, making 69 out of a 14? If somebody from the police was here, they'd slam you against the wall and arrest you
Re:Java UTF-8 Encoding
Posted: Thu Feb 17, 2005 8:11 am
by Solar
Candy wrote:
0...9: 48...57
10...15: 'A'-'F', 65...70
Aaaaahhhh... *slaps forehead*
OK, Neo, here's your problem: On the sending side, you are "encoding" your integers into complete BS. I don't really know how you're doing it. Note that, depending on what it really is you're sending, you could just as well send integers or strings without any "encoding". But your 14 is mutilated into something horribly different.
Just as a reminder, integer to string is:
Code:
Integer i = new Integer( 14 );   // wrap the int
String s = i.toString();         // "14"
And back:
Code:
Integer j = new Integer( s );    // parse the decimal string
int k = j.intValue();            // 14
Re:Java UTF-8 Encoding
Posted: Thu Feb 17, 2005 8:38 am
by distantvoices
making 69 out of a 14 ... candy candy, you definitely *did* have a look of being innocent on you *rofl* How one can be lured .... *chuckle* Poor solar, slammed against the wall ...
as for the comparing apples to pears thing: Oy - sometimes one dares to compare a signed integer to an unsigned integer, which are almost the same thing - but only almost - so lots of weird and nearly unresolvable bugs show up. Oh, and listening to the compiler can help a lot in such a case. I think it spits out a warning: comparison between signed and unsigned.
maaaan, i'd like to go to bed and have a handful of sleep...
Re:Java UTF-8 Encoding
Posted: Fri Feb 18, 2005 5:49 am
by Neo
Solar wrote:
Candy wrote:
0...9: 48...57
10...15: 'A'-'F', 65...70
Aaaaahhhh... *slaps forehead*
OK, Neo, here's your problem: On the sending side, you are "encoding" your integers into complete BS. I don't really know how you're doing it. Note that, depending on what it really is you're sending, you could just as well send integers or strings without any "encoding". But your 14 is mutilated into something horribly different.
I'm not the one encoding them. I'm the poor guy :'( at the other end trying to decode the damn things.
Re:Java UTF-8 Encoding
Posted: Fri Feb 18, 2005 6:05 am
by Solar
Go and kick the sender in the backside, and ask for decent documentation of the protocol.