The built-in id() procedure in Python ?

distantvoices · Post by **distantvoices** » Thu Dec 16, 2004 3:21 am

It is a shot in the dark, but I think (not knowing anything about Python) that assigning an id to a value (a constant) means that the value is stored somewhere in memory, in a constant/string/value pool, and the location in memory (maybe relative to the start of the heap or pool) is returned as the id.

On each subsequent assignment, the other variable doesn't get a copy of the value itself but a pointer to the memory location where the actual value is stored. This is especially clever if you have loooots of strings and often assign them to local variables: you reserve space for the string once and only pass a reference to it around the program.
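For what it's worth, the part about assignment matches what Python does: binding a second name never copies the object. Where exactly the interpreter keeps the string is a CPython implementation detail and isn't shown here; this minimal sketch only checks that both names end up referring to one object.

```python
# Assignment binds a second name to the same object; no copy is made.
greeting = "a rather long string " * 1000
alias = greeting                       # no new string is created here

same_object = alias is greeting        # identity check
same_id = id(alias) == id(greeting)    # the same check, spelled with id()
print(same_object, same_id)            # True True
```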

I hope I have made myself clear enough, but I admit that someone who knows more about Python can give you a better answer.

stay safe.

Candy · Post by **Candy** » Thu Dec 16, 2004 3:24 am

My first guess is that the example is from Linux and that the id function returns the address of a variable. The idea behind this is that you learn that each "object" has its own address (or id), which lets you distinguish it from all others, and which should get you started on the road to languages like C and assembly, which assume you know what this is.

Edit: 134882108 hinted at Linux to me, since Linux addresses start at about 0x8004000, and 0x8004000 in decimal is 134217728 + 16384 = 134234112. Your number was not significantly above this, so I guessed Linux. The interpreter would be around half a megabyte?
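The arithmetic in that edit can be redone in Python. The base address 0x8004000 is the guess from the post and is not verified here; the sketch only checks the conversions.

```python
observed_id = 134882108      # the id() value from the original question
guessed_base = 0x8004000     # the Linux base address guessed in the post

print(hex(observed_id))                          # 0x80a233c
print(guessed_base)                              # 134234112
print(guessed_base == 134217728 + 0x4000)        # True, and 0x4000 is 16384
print(observed_id - guessed_base)                # 647996 bytes, roughly 0.6 MB
```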

distantvoices · Post by **distantvoices** » Thu Dec 16, 2004 3:31 am

That's of course possible, but I vaguely remember that Python is an interpreted language, so the constants/values/strings occurring in a script might be put somewhere in the heap/string pool memory of the Python interpreter process that is working its way through the script.

df · Post by df » Thu Dec 16, 2004 3:55 am

Python and Ruby have the same deal: everything in the language is more or less an object, so everything has an 'id'.

It's probably used internally for a lot of things, and it's not really something you would need to worry about.
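A quick way to see the "everything is an object" point in Python: the sketch below checks that a number, a string, None, a function, a class, a module and a list are all instances of `object` and all have an integer `id()`. The list of sample values is arbitrary.

```python
import math

things = [7, "text", 3.14, None, len, int, math, [1, 2], (lambda x: x)]

for thing in things:
    # Each of these is an object, so each one has an integer identity.
    assert isinstance(thing, object)
    print(type(thing).__name__, isinstance(id(thing), int))
```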

Perica · Post by **Perica** » Fri Dec 17, 2004 8:34 pm

Schol-R-LEA · Post by **Schol-R-LEA** » Sat Dec 18, 2004 1:31 am

From the Python Library Reference:

id(object)
Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. (Implementation note: this is the address of the object.)
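The two guarantees in that quote can be checked with a small sketch: the ids of objects that are alive at the same time differ, and an object's id stays the same while it is mutated. The id-reuse case is only a comment, because the documentation allows it but does not promise it.

```python
a = []
b = []
# Both lists are alive at the same time, so their ids must differ.
ids_differ = id(a) != id(b)

# An object's id does not change during its lifetime, even when it is mutated.
before = id(a)
a.append(1)
id_stable = id(a) == before

# Objects with non-overlapping lifetimes MAY share an id: in CPython,
# id([]) == id([]) is often True because the first list is freed before
# the second one is allocated. That is not guaranteed, so don't rely on it.
print(ids_differ, id_stable)  # True True
```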

As for why id(var) would be the same as id(7) when var = 7, that's simple: in both cases, the object in question is the literal 7. Variables in Python (and in most OOP languages, for that matter) are not objects; they are references to the objects they name. This is one of the reasons why a variable can hold objects of different classes: the variables themselves are (in most implementations) just typeless pointers, and properties such as class belong to the objects themselves.
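A minimal sketch of the var = 7 case, assuming CPython (which caches small integers, so every 7 in a program is the same object). The second half shows the "typeless pointer" point: the same name is rebound to an object of another class.

```python
var = 7
same = id(var) == id(7)   # both expressions refer to the same int object
print(same)               # True (in CPython)

# The name itself has no type, so it can be rebound to an object of another class.
var = "seven"
print(type(var).__name__)  # str
```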

df · Post by df » Sat Dec 18, 2004 3:33 am

In Ruby, literal numbers (e.g. 7) are objects complete with a set of attributes and methods... I thought it was the same in Python, but I guess not.

Andrew_Baker · Post by **Andrew_Baker** » Thu Jan 06, 2005 1:15 am

Y'know, I'm actually an intermediate Python programmer, and I have *never* used an id() function in any of my code. My guess is that it's pretty low level. Y'know, useful, but not really necessary.
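One place where id() does earn its keep is remembering which objects you have already visited, for example when walking a structure that may contain cycles. `copy.deepcopy` keeps its memo dictionary keyed by `id()` for a similar reason. `count_nodes` below is a hypothetical illustration, not a library routine. It relies on every visited object staying alive during the walk, since ids are only unique among live objects.

```python
def count_nodes(obj, seen=None):
    """Count the distinct list objects reachable from obj, tolerating cycles.

    Lists are unhashable, so they can't go into a set directly;
    recording their id() values instead tells us which ones we've seen.
    """
    if seen is None:
        seen = set()
    if not isinstance(obj, list) or id(obj) in seen:
        return 0
    seen.add(id(obj))
    return 1 + sum(count_nodes(item, seen) for item in obj)

a = [1, 2]
b = [a, 3]
a.append(b)            # a and b now contain each other: a cycle
print(count_nodes(b))  # 2
```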

Schol-R-LEA · Post by **Schol-R-LEA** » Thu Jan 06, 2005 2:24 pm

df wrote: in ruby literal numbers (eg: 7) is an object complete with a set of attributes and methods... i thought it was the same in python but I guess not.

AFAIK, they are. When I said that 'variables' aren't objects, I meant the actual identifiers; a literal constant, however, refers to the object directly, if I'm not mistaken. That is why `id(var)` and `id(7)` have the same value: because `var` is just a reference to `7`.
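In current Python an integer literal is indeed a full object with methods, which is easy to check. (`bit_length` is newer than this thread; it is used here only as a convenient example of an int method.)

```python
print(type(7).__name__)        # int
print(isinstance(7, object))   # True
print((7).bit_length())        # 3, because 7 is 0b111
print((7).__add__(1))          # 8; this is the method behind 7 + 1

# Note: writing 7.bit_length() without the parentheses is a syntax error,
# because the tokenizer reads "7." as the start of a float literal.
```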

mystran · Post by **mystran** » Sun Jan 09, 2005 1:22 pm

Can I cause some confusion? Great..

So we really have objects, types, values, variables. These are combined by different languages in different ways. Say, in Haskell you really never see objects, because all you have is equations describing the result of a future computation. On the other hand, in "statically typed languages" like C++, Java or Haskell, your variables have types.

Variables are really just names for something. In some languages like C++ they actually can hold the whole object. In other languages like Python they are simply references that hold the address of an object. In yet other languages like Haskell, they are simply names for typed values (in some cases, like the IO type in Haskell, there might not be any real object at all, yet you can still manipulate values of type IO).

But wait.. what is the difference between a value and an object? Well, the value of "1+2" is equal to the value of "3". On the other hand, "1+2" is definitely a different object than "3", although the former might evaluate to the latter, or at least appear to (as in referentially transparent languages like Haskell you can't really tell the difference).
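In Python the value/object distinction maps directly onto the `==` and `is` operators. A small sketch with lists, chosen because two list displays always create two separate objects:

```python
first = [1, 2]
second = [1, 2]

print(first == second)  # True: equal values
print(first is second)  # False: two distinct list objects

third = first
print(third is first)   # True: one object, two names
```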

How does this all relate to Python and id()? Well, your variables in Python are named slots that hold addresses of objects. You could say that variables are names for addresses of objects, although technically they are objects themselves, because they can be reassigned and as such there has to be a slot somewhere, at least in the general case. Your objects have types, but your variables don't. id() returns the address of the object it is given. If you call id() with a variable, what id gets is not the variable, but its contents, that is, the address of the variable. Believe it or not, all objects in Python are passed around as addresses.
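A short sketch of what "passed around as addresses" means in practice. A function parameter is bound to the caller's object, so id() inside the function matches id() outside, and mutations are shared. Note that id() reports the object a variable currently refers to, not the variable itself. The function names are hypothetical.

```python
def identity_inside(obj):
    # The parameter is bound to the very object the caller passed in.
    return id(obj)

def append_zero(items):
    # Mutating through the parameter changes the caller's list too.
    items.append(0)

data = [1, 2, 3]
same_inside = identity_inside(data) == id(data)
print(same_inside)  # True

append_zero(data)
print(data)  # [1, 2, 3, 0]

# Rebinding the name points it at a new object; the old list is untouched.
old = data
data = []
print(id(data) == id(old))  # False: two live objects never share an id
```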

Finally, what the hell is an object then? That depends on the language we are talking about, but generally one of two things is meant: either an object is anything that actually exists (as I'm using it above), or it is a special language construct like an Object in Java. As far as computer science is concerned, objects of primitive types like "int" in Java are objects just as much as those derived from the class "Object".

Fortunately, most dynamic languages (like Python) make it simple, because all objects are alike, instead of having primitives and "user-constructed" objects behave in different ways. In fact, I dare say that I consider having several different kinds of object bad language design.

Confused already? No? Great, you have potential for computer science.

Perica · Post by **Perica** » Sun Jan 23, 2005 9:17 pm

mystran · Post by **mystran** » Mon Jan 24, 2005 5:47 pm

If you call id() with a variable, what id gets is not the variable, but its contents, that is, the address of the variable

Oops, I made an error there: what I was trying to say is that you get "its contents, that is, the address of the object held in the variable". Oh well.

What Python does when it sees "7" in a file is create an integer object and set its value to 7. It might do this as soon as it reads the file, or it might do it in the code generated for the expression. I don't know, as I haven't looked at the Python implementation. In the case of small integers, it could even encode the number 7 into the address directly, so the address is not really an address (it's a common optimization), but IIRC Python doesn't actually do this.
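For CPython, at least, the integer object is created when the source is compiled: the literal ends up in the code object's `co_consts` tuple before any of the code runs. A minimal, CPython-specific sketch (other implementations may differ). It uses a larger literal on purpose, because CPython keeps a cache of small ints, which would blur the point.

```python
source = "x = 123456789\n"
code = compile(source, "<example>", "exec")

# The literal has already become an int object in the code object's
# table of constants, even though nothing has been executed yet.
print(123456789 in code.co_consts)  # True

namespace = {}
exec(code, namespace)     # running the code just loads that constant
print(namespace["x"])     # 123456789
```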

Remember that there aren't any "objects" in files. There are only descriptions of objects in files, and more often than not this description is NOT the same as object's description in memory when the program is running.

Anyway, I wrote the post as kind of a joke.

creichen · Post by **creichen** » Wed Feb 23, 2005 5:11 pm

Hi,

The reason why even numbers and other things that you could normally store directly in a machine register need to be able to have an "id" or "address" is actually a little involved; it involves the difference between two kinds of "semantics": so-called "value semantics" and so-called "reference semantics".
These describe different ways to attribute meaning to a piece of code.

Let's have a look at a brief example to illustrate the point: Let "x" and "y" be variables of type "array of integer". Now, observe the following piece of code:

y[ 0 ] := 0;
x := y;
x[ 0 ] := 1;
print y[ 0 ];

Here, we first zero out the first entry in the array associated with 'y'. Then we assign this array to 'x', change the first entry in the array associated with 'x' to '1', and print out the first entry in the array associated with 'y'.

Note the use of "associated with" in the above paragraph: I deliberately left out how precisely this "association" works. For a sane programming language, there are two possible outcomes: "1", and "0".

Depending on where you learned coding, either may need an explanation.

First, let's see why "0" seems like a reasonable answer: We obviously set the first element of 'y' to zero. Then we copy 'y' to 'x', do some stuff with x, and print out the first element of 'y' again. Since we never touched 'y', it should print out "0".

Now, let's see why "1" seems like a reasonable answer: While we do set the first element of y to something, we then assign y to x, meaning that "x" points to the same memory address pointed to by y. Thus, if we then set the first element of "x" to "1", it's obvious that we must print "1" in the next line.

Well, both of those seem fairly plausible, don't they? Indeed, both represent the aforementioned different schools of thought about how programs should behave. The first one represents /value semantics/: If you say "set a to be the same as b", then we're copying the /value/ of "b" to "a". The second is /reference semantics/: When saying "set a to be the same as b", we copy the /reference/ to what b points to over to a.
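Python sits firmly in the reference-semantics camp, so the example above prints "1" there. A Python translation of the pseudocode, with value semantics emulated by an explicit copy:

```python
import copy

# Reference semantics (what Python does): x and y name the same list.
y = [None]
y[0] = 0
x = y
x[0] = 1
ref_result = y[0]
print(ref_result)  # 1

# Value semantics, emulated by copying explicitly before assigning.
y = [None]
y[0] = 0
x = copy.copy(y)   # a shallow copy; list(y) or y[:] would do the same here
x[0] = 1
val_result = y[0]
print(val_result)  # 0
```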

Note that this only makes a difference with mutable datatypes; in purely functional languages like Haskell, the distinction does not arise, since functional purity requires value semantics (by referential transparency). But don't worry about that if you're not into functional programming.
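The "only with mutable datatypes" point is visible in Python too. With an immutable int, an augmented assignment builds a new object and rebinds the name; with a list, it mutates the shared object in place. A small sketch:

```python
# Immutable int: "a += 1" creates a new object and rebinds a; b is unaffected.
a = 5
b = a
a += 1
print(a, b)  # 6 5

# Mutable list: "s += [6]" extends the shared list in place, so t sees it.
s = [5]
t = s
s += [6]
print(t, t is s)  # [5, 6] True
```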

Now, most current languages have both kinds of semantics: Pointers and references in C and C++ use reference semantics, whereas plain values use value semantics (unless typedef'd to be pointers or containing pointers internally, of course); in Java, everything but a handful of builtin types (int, char, byte etc.) uses reference semantics, and in Eiffel, types can be used in either way depending on the user's whim.

In my personal opinion, this is a very, very confusing situation that is neither necessary nor generally explained well in classes, but that's beside the point, particularly since I'm sure that all of you here have mastered this distinction long ago, either explicitly or on an intuitive level.

Now that we know what "value semantics" and "reference semantics" are, let me sync my statements above to previous answers before returning to the original question: This "id" thing (a horrible name, as any functional programmer will attest here) in Python now is nothing other than a function which returns the reference of a value in a world of reference semantics.

(You probably knew this already, but I felt I should make it explicit anyway.)

Right. So what's the point in having things with /value semantics/ have an address now?

[to be continued...]

-- Christoph

creichen · Post by **creichen** » Wed Feb 23, 2005 5:12 pm

Hi again,

(apologies for splitting this up; for some reason, the message size here seems to be VERY limited)

The answer to that comes from the world of code re-use, actually.
Assume that you have some sort of container class (always good for a nice example), such as a vector. Recall that a vector is a parametric type, i.e. it takes a "type parameter" before it becomes a "manifest type"; e.g. it might take type "Object" to become "vector of Object", or it might take type "int" to become "vector of int".
Now, most programming languages try hard to avoid code duplication due to stupid language design decisions, so they try to ensure that if you specify something once, it's compiled into exactly one thing (modulo optimisations such as inlining). Thus, if you ask it to compile the "vector" class, you want it to compile into exactly one piece of code. Now, this piece of compiled code needs to be somewhat uniform, of course; after all, you want to be able to store "Object"s and "Window"s and "FileDescriptor"s and all other kinds of funny little things in them. This isn't too hard, of course: The compiler just uses a general-purpose pointer in the generated code, which can point to any memory address and thus to any of these guys.
Well, unless they don't have reference semantics, that is.
For things with value semantics, we first need to put them into a chunk of memory, which then has reference semantics; this chunk of memory (a wrapper class, or "box") can then be handled by the general-purpose vector code like a charm. And once we get our value out of there again, we simply remove the box (incidentally, this process is called "unboxing", just as the wrapping-into-a-memory-chunk is called "boxing") and proceed as planned.
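Python's own containers hold references to objects, but the standard `array` module is a handy place to watch boxing and unboxing happen. It stores raw C values in one buffer and builds an int object whenever you read an element. This illustration was picked for this thread; it is not from the post above.

```python
from array import array

# An array("i") stores raw C ints in one contiguous buffer, not int objects.
raw = array("i", [7, 1000, 30000])
print(raw.itemsize)  # size of one C int in bytes, typically 4

# Reading an element "boxes" the raw value into an ordinary int object,
# which (unlike the raw slot in the buffer) has an id() like any object.
value = raw[1]
print(type(value).__name__, isinstance(id(value), int))  # int True

# Storing writes the value back into the buffer as a raw C int ("unboxing").
raw[1] = value + 1
print(raw.tolist())  # [7, 1001, 30000]
```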

Thus, many languages come with implicit means for turning "value objects" into "reference objects" and the other way 'round (where appropriate). As a procedure like the Python "id" function will always require a "reference object", the run-time system simply boxes the value before passing it in, automatically, just for the sake of satisfying the conditions behind the "id" function's parameter, and thus you get your address.

Note that the process of boxing and unboxing is fairly easy to see in Java, which, until recently, did not do automatic boxing/unboxing but, instead, gave programmers classes like "java.lang.Integer" with which they could do this manually.

Also note that statically typed languages can sometimes get away without boxing and unboxing; for example, there was some work on this in the mid-90s for Standard ML (and I wouldn't be surprised if the current in-development MLton system also used this, and I'd be VERY surprised if GHC didn't do this for Haskell, at least when inlining). While it's impossible to avoid the boxing overhead for dynamically typed languages (because you have too little compile-time information to do anything better), this actually works quite nicely in statically typed languages like ML; the biggest problem usually is to have the garbage collector correctly figure out whether an "inlined value" is a real value or a reference. The most popular approach here appears to be tagging, which reserves a number of bits in each word to identify its type (used e.g. in SML/NJ and O'Caml; this explains why integers there use fewer than 32 bits), though I believe that there are smarter approaches to this that I don't know of.
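Here is a sketch of the low-bit tagging idea on a simulated 32-bit word. It roughly follows the convention OCaml uses for its int type: integer words have the low bit set, and aligned pointers have it clear. The constants and function names are hypothetical. A real runtime does this in its compiler and garbage collector, not in user code.

```python
WORD_MASK = 0xFFFFFFFF  # simulate a 32-bit machine word
INT_BITS = 31           # one bit of the word is reserved for the tag

def tag_int(n: int) -> int:
    """Encode n as a tagged word: shift left by one and set the low bit."""
    if not -(1 << (INT_BITS - 1)) <= n < (1 << (INT_BITS - 1)):
        raise OverflowError("does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK

def is_int(word: int) -> bool:
    # Aligned pointers always have a 0 low bit, so a 1 marks an integer.
    return word & 1 == 1

def untag_int(word: int) -> int:
    """Recover the integer: drop the tag bit, then sign-extend from 31 bits."""
    value = word >> 1
    if value >= 1 << (INT_BITS - 1):
        value -= 1 << INT_BITS
    return value

for n in (0, 7, -1, 2**30 - 1, -(2**30)):
    w = tag_int(n)
    print(n, hex(w), is_int(w), untag_int(w) == n)

print(is_int(0x1000))  # False: to the GC this looks like an aligned pointer
```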

HTH,
-- Christoph

OSDev.org
