Java UTF-8 Encoding

Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Java UTF-8 Encoding

Post by Neo »

I need to get UTF-8 encoding values of integer values. For example, take the int 5. This gets encoded into the UTF-8 value of 53, so when I try to verify the value after this conversion it fails (as I am looking for 5 when I get 53).
Does anyone know a good method to do this conversion (both back and forth) in Java?
BTW, the values are ints.
Only Human
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:Java UTF-8 Encoding

Post by Solar »

UTF-8 is just another character encoding, like ASCII or EBCDIC.
There is no UTF-8 value of the integer 5. There is only a UTF-8 value for the digit 5 (53, or 0x35), which, like every digit from 0 through 9, is the same as the ASCII value for that digit (for code points up to 0x7F, UTF-8 is identical to 7-bit ASCII).
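A quick way to see this from Java is to encode the decimal text of the int and print the bytes. This is only a sketch: the class name DigitEncoding is made up, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DigitEncoding {
    // UTF-8 bytes of the decimal text form of an int, e.g. 5 -> "5" -> {53}.
    static byte[] utf8Digits(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // The same text encoded as 7-bit ASCII, for comparison.
    static byte[] asciiDigits(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println((int) '5');                                    // 53 (0x35)
        System.out.println(Arrays.toString(utf8Digits(5)));               // [53]
        System.out.println(Arrays.equals(utf8Digits(5), asciiDigits(5))); // true
    }
}
```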

Luckily, conversion is built into Java:

Code:

Integer value = Integer.valueOf( "5" );
System.out.println( value.intValue() );
...prints 5 (integer), and

Code:

Integer value = Integer.valueOf( 5 );
System.out.println( value.toString() );
...prints 5 (string).

That should answer your question, right?
Every good solution is obvious once you've found it.
mystran

Re:Java UTF-8 Encoding

Post by mystran »

Actually, if one needs UTF-8 in particular, then one needs to do something fancier, because Java strings are UTF-16 internally (UCS-2 before Java 5). In most cases you shouldn't care, though, and I definitely don't understand the original question!

Also remember that 7-bit ASCII is valid UTF-8.

So one needs to do the "int <-> String" conversion first, and then export/import a UTF-8 representation to/from a string. That step can be done with java.io.DataOutputStream and java.io.DataInputStream (using writeUTF/readUTF, which write a 2-byte length prefix followed by a "modified UTF-8"), or directly with java.nio.charset.CharsetEncoder and CharsetDecoder.

The streams are probably what should be used if UTF-8 is needed, since I can't really think of any good reason to use UTF-8 internally.
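Both routes can be sketched like this, assuming Java 8 or later (StandardCharsets, UncheckedIOException). IntUtf8 and its method names are illustrative, not an existing API. Because writeUTF prepends a 2-byte length, its output for 14 differs from the plain UTF-8 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class IntUtf8 {
    // int -> String -> standard UTF-8 bytes, using a CharsetEncoder.
    static byte[] encode(int value) {
        try {
            ByteBuffer buf = StandardCharsets.UTF_8.newEncoder()
                    .encode(CharBuffer.wrap(Integer.toString(value)));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e); // not expected for digit strings
        }
    }

    // UTF-8 bytes -> String -> int, using a CharsetDecoder (rejects malformed input).
    static int decode(byte[] bytes) {
        try {
            String text = StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes)).toString();
            return Integer.parseInt(text);
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    // DataOutputStream.writeUTF: 2-byte big-endian length, then modified UTF-8.
    static byte[] writeUtf(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(Integer.toString(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // DataInputStream.readUTF reads the length prefix back, then the characters.
    static int readUtf(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return Integer.parseInt(in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Which route fits depends on the peer: writeUTF/readUTF only make sense if the other side uses the same length-prefixed framing, while a sender emitting bare UTF-8 text needs the CharsetDecoder route.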
Neo

Re:Java UTF-8 Encoding

Post by Neo »

What I meant is that on the system where the value originates, the value is an integer 5, which before transmission is converted to 53 (the value of the character '5').
I receive this 53 and need to turn it back into 5 to compare it with the original value.
Is there any simple way to do this comparison in Java?

<edit>
when I send 14 I get the equivalent of value 'E' (69)
</edit>
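If each value really arrives as a single decimal digit character (an assumption; the 14 -> 69 case in the edit shows it does not always), decoding at the receiving end is just a subtraction of '0'. A sketch, with the hypothetical names DigitCompare, decodeDigit and matches:

```java
class DigitCompare {
    // Map a received character code back to the digit it stands for: 53 ('5') -> 5.
    // Returns -1 for anything that is not an ASCII decimal digit '0'..'9' (48..57).
    static int decodeDigit(int received) {
        if (received < '0' || received > '9') {
            return -1;
        }
        return received - '0';
    }

    // Compare integers with integers: decode first, then compare with the original.
    static boolean matches(int original, int received) {
        int decoded = decodeDigit(received);
        return decoded >= 0 && decoded == original;
    }

    public static void main(String[] args) {
        System.out.println(decodeDigit(53)); // 5
        System.out.println(matches(5, 53));  // true
        System.out.println(decodeDigit(69)); // -1: 'E' is not a decimal digit
    }
}
```

The -1 for 69 is a useful signal in itself: it shows the sender is not using plain decimal digits for every value.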
Solar

Re:Java UTF-8 Encoding

Post by Solar »

Neo wrote: What I meant is that on the system where the value originates, the value is an integer 5, which before transmission is converted to 53 (the value of the character '5').
I receive this 53 and need to turn it back into 5 to compare it with the original value.
Ah, server-client communication... Sorry, but when you intend to communicate an integer value and get 53 instead of 5, your protocol is borked, which the following...
<edit>
when I send 14 I get the equivalent of value 'E' (69)
</edit>
...seems to prove. You send 14 and get 69? If you got 49 and then 52, that would at least be consistent. But 69? I fear you have a problem much deeper than UTF-8 / integer conversions...
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Java UTF-8 Encoding

Post by Candy »

Solar wrote:
Neo wrote: What I meant is that on the system where the value originates, the value is an integer 5, which before transmission is converted to 53 (the value of the character '5').
I receive this 53 and need to turn it back into 5 to compare it with the original value.
Ah, server-client communication... Sorry, but when you intend to communicate an integer value and get 53 instead of 5, your protocol is borked, which the following...
<edit>
when I send 14 I get the equivalent of value 'E' (69)
</edit>
...seems to prove. You send 14 and get 69? If you got 49 and then 52, that would at least be consistent. But 69? I fear you have a problem much deeper than UTF-8 / integer conversions...
Three logical conversions:

14 -> '0' + 14
14 -> '1' '4'
14 -> 'E'

The first is dumb software converting a single number, the second is smart software converting a single number, and the third is software converting a number to a hex digit.
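The three conversions, applied to decimal 14, can be reproduced in Java as a sketch. ThreeConversions and its method names are hypothetical, and String.chars() needs Java 8 or later.

```java
import java.util.Arrays;

class ThreeConversions {
    // 1. "Dumb": add the value to '0' as if it were a single digit.
    static int dumb(int value) {
        return '0' + value; // 48 + 14 = 62, the character '>'
    }

    // 2. Decimal text: one character code per decimal digit.
    static int[] decimal(int value) {
        return Integer.toString(value).chars().toArray(); // "14" -> {49, 52}
    }

    // 3. Hex: the value written in base 16, upper case.
    static int[] hex(int value) {
        return Integer.toString(value, 16).toUpperCase().chars().toArray(); // "E" -> {69}
    }

    public static void main(String[] args) {
        System.out.println(dumb(14));                     // 62
        System.out.println(Arrays.toString(decimal(14))); // [49, 52]
        System.out.println(Arrays.toString(hex(14)));     // [69]
    }
}
```

Only the third one turns 14 into 69, which matches what Neo is seeing.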

What sort of thing are you trying to achieve?
Solar

Re:Java UTF-8 Encoding

Post by Solar »

Candy wrote: Three logical conversions:

14 -> '0' + 14
14 -> '1' '4'
14 -> 'E'
Erm...

'0' + 14 is 62.

'1', '4' is 49, 52. (Or 101, depending on SW dumbness.)

And the third I don't get.

I'm really confused, here. (And don't see any way to get a 69 out of 14...)
Neo

Re:Java UTF-8 Encoding

Post by Neo »

That cheers me up :)
What can I do?
Solar

Re:Java UTF-8 Encoding

Post by Solar »

Neo wrote: What can I do?
The usual suspects: print lots of trace messages, and re-read the descriptions of functions you just skimmed over, to check whether they really do what you thought they would. Try slightly different approaches and see what changes.

Or dump your code into this forum and see if someone else can figure it out, but that's cheating, of course. :-D
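For the trace-message approach, a small helper that prints every received byte in decimal, in hex and as a character can make a mystery like 69 for 14 obvious quickly. A sketch; ByteTrace and describe are made-up names, and it assumes you already have the raw bytes in a byte[]:

```java
class ByteTrace {
    // One line per byte: index, unsigned decimal value, hex, and the ASCII
    // character if printable ('.' otherwise).
    static String describe(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned 0..255
            char shown = (b >= 0x20 && b < 0x7F) ? (char) b : '.';
            sb.append(String.format("%3d: %3d 0x%02X '%c'%n", i, b, b, shown));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The values Neo reported for 5 and 14.
        System.out.print(describe(new byte[] {53, 69}));
    }
}
```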
zloba

Re:Java UTF-8 Encoding

Post by zloba »

Neo wrote: What can I do?
at the recipient, you can decode the received data (53) to obtain 5 (you are likely to need the decoded value again later anyway).
Neo wrote: I need to get UTF-8 encoding values of integer values.
just convert the integer to a string. since decimal digits (and even hex ones) are plain ASCII, their UTF-8 encoding won't be any different from ASCII.
Neo wrote: Does anyone know a good method to do this conversion (both back and forth) in Java?
integer to string: use Integer.toString(int), or create an Integer object and use its toString() method
string to integer: there is a constructor Integer(String) (deprecated since Java 9), after which intValue() gives you back the int; Integer.parseInt(String) does both steps at once.
(as Solar said)
or, you can do these conversions manually.
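Doing it manually means walking the decimal digits yourself. A minimal sketch with hypothetical names; it only handles non-negative values and skips overflow checks:

```java
class ManualDecimal {
    // int -> decimal digit characters, without Integer.toString.
    static String toDecimal(int value) {
        if (value < 0) throw new IllegalArgumentException("negative values not handled in this sketch");
        if (value == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append((char) ('0' + value % 10)); // lowest digit first
            value /= 10;
        }
        return sb.reverse().toString();
    }

    // decimal digit characters -> int, without Integer.parseInt (no sign, no overflow check).
    static int fromDecimal(String text) {
        if (text.isEmpty()) throw new IllegalArgumentException("empty string");
        int value = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') throw new IllegalArgumentException("not a digit: " + c);
            value = value * 10 + (c - '0'); // '5' - '0' == 5
        }
        return value;
    }
}
```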

boring theory below:

you should review your conversions to ensure that you compare apples to apples, i.e. integers to integers, or encoded integers to encoded integers, but never integers to encoded integers.
(of course, you should have your encoding well-defined to start with)

so to compare value A to value B (integers), you can compare:
- A to B (duh)
or, at the recipient:
- decode(encode(A)) to B (recommended)
- encode(A) to encode(B)

... assuming your encoding and the network transmission preserve data correctly, that is, the recipient gets encode(A) and decode(encode(A)) == A.

apparently you're trying to compare encode(A) [=53] to B [=5], judging by "as I am looking for 5 when I get 53".
correct me if i'm confused.
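The comparisons can be written out with a toy encoding. This sketch assumes a single-digit encoding encode(v) = '0' + v, purely for illustration; ApplesToApples, ENCODE and DECODE are hypothetical names:

```java
import java.util.function.IntUnaryOperator;

class ApplesToApples {
    // Hypothetical encoding: value 0..9 -> character code '0'..'9' (48..57).
    static final IntUnaryOperator ENCODE = v -> '0' + v;
    static final IntUnaryOperator DECODE = c -> c - '0';

    // The mistake: encoded value vs plain integer (53 vs 5), never equal.
    static boolean compareMixed(int a, int b) {
        return ENCODE.applyAsInt(a) == b;
    }

    // Recommended: decode at the recipient, then compare integers.
    static boolean compareDecoded(int a, int b) {
        return DECODE.applyAsInt(ENCODE.applyAsInt(a)) == b;
    }

    // Also valid: encoded vs encoded.
    static boolean compareEncoded(int a, int b) {
        return ENCODE.applyAsInt(a) == ENCODE.applyAsInt(b);
    }

    // The precondition: decode(encode(A)) == A for every value in the range.
    static boolean roundTrips(int from, int to) {
        for (int a = from; a <= to; a++) {
            if (DECODE.applyAsInt(ENCODE.applyAsInt(a)) != a) return false;
        }
        return true;
    }
}
```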
Candy

Re:Java UTF-8 Encoding

Post by Candy »

Solar wrote:
Candy wrote: Three logical conversions:

14 -> '0' + 14
14 -> '1' '4'
14 -> 'E'
'0' + 14 is 62.

'1', '4' is 49, 52. (Or 101, depending on SW dumbness.)

And the third I don't get.

I'm really confused, here. (And don't see any way to get a 69 out of 14...)
Converting a number to hex:

0...9: 48...57
10...15: 'A'-'F', 65...70

Hence, if you convert 14 decimal to a hex digit, you get 0x0E, which is displayed as 'E', character code 69.
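Assuming the sender really does emit one upper-case hex digit per value (Candy's reading of the 14 -> 69 case, not a confirmed protocol), the table translates directly into an encode/decode pair. HexDigit is a hypothetical name:

```java
class HexDigit {
    // 0..9 -> '0'..'9' (48..57), 10..15 -> 'A'..'F' (65..70), as in the table.
    static int encode(int value) {
        if (value < 0 || value > 15) throw new IllegalArgumentException("not a hex digit value: " + value);
        return value < 10 ? '0' + value : 'A' + (value - 10);
    }

    // Receiving side: 53 -> 5, 69 -> 14; -1 for any code outside the table.
    static int decode(int code) {
        if (code >= '0' && code <= '9') return code - '0';
        if (code >= 'A' && code <= 'F') return code - 'A' + 10;
        return -1;
    }
}
```

With this, the receiving side would compare decode(received) with the original value. Values above 15 do not fit in one digit under this scheme, so the real protocol must say what happens for larger values.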

btw, making 69 out of a 14? If somebody from the police was here, they'd slam you against the wall and arrest you ;)
Solar

Re:Java UTF-8 Encoding

Post by Solar »

Candy wrote: 0...9: 48...57
10...15: 'A'-'F', 65...70
Aaaaahhhh... *slaps forehead*

OK, Neo, here's your problem: On the sending side, you are "encoding" your integers into complete BS. I don't really know how you're doing it. Note that, depending on what it really is you're sending, you could just as well send integers or strings without any "encoding". But your 14 is mutilated into something horribly different.

Just as a reminder, integer to string is:

Code:

Integer i = Integer.valueOf( 14 );
String s = i.toString();
And back:

Code:

Integer j = Integer.valueOf( s );
int k = j.intValue();
distantvoices
Member
Posts: 1600
Joined: Wed Oct 18, 2006 11:59 am
Location: Vienna/Austria

Re:Java UTF-8 Encoding

Post by distantvoices »

making 69 out of a 14 ... candy candy, you definitely *did* have a look of being innocent on you *rofl* How one can be lured .... *chuckle* Poor solar, slammed against the wall ...

as for the comparing-apples-to-pears thing: oy, sometimes one dares to compare an integer to an unsigned integer, which are almost the same, but only almost, so lots of weird and nearly unresolvable bugs show up. Oh, and listening to the compiler can help a lot in such a case. I think it spits out a warning: "comparison between signed and unsigned".
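Java has no unsigned int, so that exact warning does not apply there, but a related trap shows up with bytes: Java's byte is signed, so a received value above 127 compares unequal to the int you expect unless you mask it. This only matters if the sender ever emits codes above 127 (53 and 69 are not); the class name is illustrative:

```java
class SignedBytes {
    // Java's byte is signed (-128..127); a raw 0xC8 arrives as -56.
    // Masking with 0xFF gives the unsigned value 0..255 for comparisons.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte raw = (byte) 200;
        System.out.println(raw == 200);              // false: raw is -56
        System.out.println(unsigned(raw) == 200);    // true
        System.out.println(Byte.toUnsignedInt(raw)); // 200 (Java 8+)
    }
}
```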

maaaan, i'd like to go to bed and have a handful of sleep...
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Neo

Re:Java UTF-8 Encoding

Post by Neo »

Solar wrote:
Candy wrote: 0...9: 48...57
10...15: 'A'-'F', 65...70
Aaaaahhhh... *slaps forehead*

OK, Neo, here's your problem: On the sending side, you are "encoding" your integers into complete BS. I don't really know how you're doing it. Note that, depending on what it really is you're sending, you could just as well send integers or strings without any "encoding". But your 14 is mutilated into something horribly different.
I'm not the one encoding them. I'm the poor guy :'( at the other end trying to decode the damn things.
Solar

Re:Java UTF-8 Encoding

Post by Solar »

Go and kick the sender in the backside, and ask for a decent documentation of the protocol. 8)