OSDev.org

Posted: **Fri Nov 15, 2013 3:01 pm**

I am writing a compiler for CIL to native code. The underlying byte-code has the notion of an object, from which you can read or write fields with special instructions. In addition, arrays are also encapsulated as objects, and there are different instructions for loading and storing array elements.

My implementation uses object references (basically pointers) to identify objects. These can either have the value null (0) or be a valid pointer into the heap pointing to the start of an object of appropriate length - this is provable as the only instructions that can assign to object references are those which load null or those which create an object. References cannot be altered (pointer arithmetic is not allowed outside of protected kernel code).

My issue is trying to efficiently detect whether the pointer is valid (i.e. not null) before a field is loaded from it. I mark the first 4 KiB of the address space 'not present', so most null dereferences are caught by a page fault. The problem arises if, for example, a user declares a huge class whose last member is several megabytes in. They could then, in theory, assign null to the reference and use the last member to access memory beyond the first page but outside the heap, or even within another process's heap space (I use a single-address-space design).
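To make the failure case concrete, here is a minimal sketch of the arithmetic involved. The function name `faults_on_null` and the constant `GUARD_SIZE` are made up for illustration; only the 4 KiB 'not present' region comes from the post.

```python
GUARD_SIZE = 4 * 1024  # bytes marked 'not present' at address 0, as in the post


def faults_on_null(field_offset: int, guard_size: int = GUARD_SIZE) -> bool:
    """True if accessing this field through a null (0) reference hits the guard region."""
    effective_address = 0 + field_offset  # null reference plus field offset
    return 0 <= effective_address < guard_size


small_field = faults_on_null(16)               # field near the start of a small class
huge_field = faults_on_null(4 * 1024 * 1024)   # field 4 MiB into a huge class
print(small_field, huge_field)
```

Any field whose offset is at or beyond the guard size lands outside the unmapped page, so the page fault alone cannot be relied on for it.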

Given that most field references will actually be valid, I need a way to detect this problem with the least overhead possible. My current plan is to special-case access to fields (or array elements) that are more than 4 KiB from the start of the object. For these, I will first try a dummy read from the actual object address and catch a page fault there if the reference is invalid, otherwise continue. However, this introduces an unnecessary memory read. The other option is to compare the object reference with null using a CMP/TEST instruction and jump to a null reference handler if they are equal. This avoids the memory access but means I need two different code paths into the null reference handler (the first being from the page fault for small <4 KiB classes). Obviously I could make every field reference check the validity of the object first, but that really does introduce unnecessary delays.

Does anyone else have any suggestions how this could be done? Note I control the compiler so can output any opcodes required.

Regards,
John.

Posted: **Fri Nov 15, 2013 3:26 pm**

If you're using x86-64, you could use a non-canonical pointer for null instead of 0. For current iterations of x86-64 processors, which have a 48-bit virtual address space, 0x8000000000000000 would probably be the ideal; it will definitely raise an exception, and an object would have to be larger than the available virtual address space to cause problems.

However, it is possible that the virtual address space will be increased in the future - if the virtual address space was increased to 63 or 64 bits this would break. That said, artificially restricting the virtual address space by not mapping the appropriate memory should still work.
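A rough sketch of the canonical-address rule behind this suggestion follows. With 48-bit virtual addresses, bits 63 down to 47 must all be equal for an address to be canonical. The helper names `is_canonical` and `max_safe_offset` are invented for this example, and note that on x86-64 a non-canonical access normally raises a general-protection fault rather than a page fault, so the handler would need to cover that case.

```python
MASK64 = (1 << 64) - 1
NULL_REF = 0x8000000000000000  # the null value suggested above


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) of addr are all zeros or all ones."""
    top = (addr & MASK64) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def max_safe_offset(null_ref: int = NULL_REF, va_bits: int = 48) -> int:
    """Number of bytes from null_ref up to the first canonical high-half address."""
    first_high_canonical = (1 << 64) - (1 << (va_bits - 1))
    return first_high_canonical - null_ref


print(is_canonical(NULL_REF))                     # null itself is non-canonical
print(is_canonical(NULL_REF + 8 * 1024 * 1024))   # so is a field 8 MiB in
print(hex(max_safe_offset()))
```

Passing `va_bits=64` makes every address canonical, which is the case where the trick stops working unless the region is deliberately left unmapped.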

Posted: **Fri Nov 15, 2013 4:11 pm**

madanra wrote:if the virtual address space was increased to 63 or 64 bits

The architectural limit is 56 bits IIRC, so that shouldn't be a problem.

Does anyone else have any suggestions how this could be done?

My first thought is that throwing a hardware exception makes it potentially harder to respond to a null pointer with a catchable NPE - at least it would require support from outside the compiler-generated code.

If you end up going with software isolation in the end, you can use dataflow analysis to determine if a check for the object in question is necessary or not: if all the code paths reaching a certain accessor have dereferenced their parent before, then you can omit the check. Considering you're working with CIL, you should be able to extend such analysis across function boundaries as well.
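The dataflow idea can be sketched as a forward "must" analysis over a control-flow graph: a reference is known to be non-null at the start of a block only if every path into that block has already dereferenced it. The representation below (a dict of successor lists and a dict of per-block dereferenced variables) and the function name `non_null_on_entry` are assumptions made for this example, and it assumes references are never reassigned, as in SSA form.

```python
def non_null_on_entry(cfg, derefs, entry):
    """Map each block to the set of variables dereferenced on every path reaching it.

    cfg:    block name -> list of successor block names (every block is a key)
    derefs: block name -> set of variables dereferenced inside that block
    """
    all_vars = set().union(*derefs.values())
    preds = {block: [] for block in cfg}
    for block, successors in cfg.items():
        for succ in successors:
            preds[succ].append(block)

    # Start optimistic (everything proven) except at the entry block,
    # then shrink the sets until a fixed point is reached.
    known = {block: set(all_vars) for block in cfg}
    known[entry] = set()
    changed = True
    while changed:
        changed = False
        for block in cfg:
            if block == entry:
                continue
            outs = [known[p] | derefs.get(p, set()) for p in preds[block]]
            new = set.intersection(*outs) if outs else set()
            if new != known[block]:
                known[block] = new
                changed = True
    return known


# Diamond-shaped CFG: both branches dereference x, only one dereferences y.
cfg = {"entry": ["a", "b"], "a": ["join"], "b": ["join"], "join": []}
derefs = {"a": {"x"}, "b": {"x", "y"}}
result = non_null_on_entry(cfg, derefs, "entry")
print(result["join"])
```

In the example, an access to `x` in `join` needs no further check, while an access to `y` still does because the path through `a` never touched it.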

Posted: **Fri Nov 15, 2013 4:45 pm**

Combuster wrote:The architectural limit is 56 bits IIRC, so that shouldn't be a problem.

That was my recollection as well, but I couldn't find a reference for it. The physical address space is limited to 52 bits, but I can't find any similar limit for the virtual address space. The AMD64 manual says:

Long mode defines 64 bits of virtual address, but implementations of the AMD64 architecture may support fewer bits of virtual address. Although implementations might not use all 64 bits of the virtual address...

(emphasis mine), so I guess it could be increased to 64 bits in the future. Not that that really matters; you can still just not map e.g. 0x8000... - 0xc000... and artificially restrict userspace object sizes to 2^62 bytes in the compiler.

Posted: **Fri Nov 15, 2013 4:49 pm**

If I understood correctly, there are two issues: the null check, and out-of-bounds (OOB) access (for both null and non-null objects).

The null check can be handled with #PF, while OOB checks may need to be handled at the language level.
For a null object/array, the bound should then be zero.

Posted: **Fri Nov 15, 2013 5:11 pm**

I believe you will find this blog post useful and interesting:

http://blogs.msdn.com/b/oldnewthing/arc ... 40495.aspx

Posted: **Sat Nov 16, 2013 4:00 pm**

madanra wrote:If you're using x86-64, you could use a non-canonical pointer for null instead of 0. For current iterations of x86-64 processors, which have a 48-bit virtual address space, 0x8000000000000000 would probably be the ideal; it will definitely raise an exception, and an object would have to be larger than the available virtual address space to cause problems.

Thanks, that's a good idea to save me from having to mark the first x MiB not present in order to catch problems with large objects. However, it still has a problem if someone creates a really large object that could reach out of the non-canonical area. There is no upper limit on object size in the CLR specification, although I believe Microsoft enforces an implementation-defined limit of 2 GiB in its version, but this is more due to heap issues. It also has the drawback of not necessarily being portable (I also support IA-32 and 32-bit ARM, and potentially 64-bit ARM).

Combuster wrote:My first thought is that throwing a hardware exception makes it potentially harder to respond to a null pointer with a catchable NPE - at least it would require support from outside the compiler-generated code.

Yes, but I also require OS support to catch other exceptions too, including divide by zero, overflow and floating-point exceptions, so it's not too much of an issue. It also seems to be the most efficient way, as it's as close as possible to being a free operation in the default case (i.e. no exception).

Combuster wrote:If you end up going with software isolation in the end, you can use dataflow analysis to determine if a check for the object in question is necessary or not: if all the code paths reaching a certain accessor have dereferenced their parent before, then you can omit the check.

Thanks, that is certainly something I can do.

Combuster wrote:Considering you're working with CIL, you should be able to extend such analysis across function boundaries as well.

However, I am probably less likely to support this, as it rapidly becomes problematic with multiple paths into the same function, and with some functions being externally accessible (e.g. library functions) where the calling procedure is simply not known at compile time. I agree it's doable, I just don't see the small gains being worth the not inconsiderable effort to implement and test it.

bluemoon wrote:If I understood correctly, there are two issues: the null check, and out-of-bounds (OOB) access (for both null and non-null objects).

The null check can be handled with #PF, while OOB checks may need to be handled at the language level.
For a null object/array, the bound should then be zero.

The OOB check only applies to arrays, and is implemented as either a runtime or compile time check, depending on whether the state of the object reference is known at compile time. You cannot have OOB issues with objects - the field offset is guaranteed at compile time to be within the known size of the object. The issue is purely if the object reference is null, when the problem becomes how to prove that dereferencing [0 + field_offset] will always fail.

sortie wrote:I believe you will find this blog post useful and interesting:

http://blogs.msdn.com/b/oldnewthing/arc ... 40495.aspx

Thanks, that's exactly what I was looking for. It seems that Microsoft use a similar system to what I was proposing. There is also an interesting exchange in the comments as to whether cmp [object_addr], any_value; or cmp object_addr, 0; je null_ref_exception; is the more efficient (in terms of code speed and size and cache/TLB misses). I will probably go with the cmp [object_addr], any_value; method as it's cleaner, always ensures that null reference exceptions go via the #PF route, and avoids the possible need for trampoline code within a 32-bit jump of the JE instruction (in a 64-bit address space I cannot guarantee this won't be necessary).

My plan, therefore, is to have a distinction between 'small' and 'large' objects, with the threshold defaulting to 4 KiB for x64 but being adjustable by the user (with compiler switches) to support usage of the compiler for other systems. Small objects do not need an explicit check - the attempt to load/store from [0 + field_offset] will always fail. Large objects will, at some point in the code path, need a cmp [object_addr], any_value instruction to check that the reference is valid first, with this check being performed only once per logical variable in each function (I can do this at the SSA stage in the compiler).
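As a rough illustration of that plan for straight-line code, the sketch below decides where a probe would be emitted. Everything here is hypothetical: the function `lower_field_accesses`, the tuple-based pseudo-instructions, and the choice to key the decision on field offset rather than total object size. It also assumes that any earlier access through the same reference (which would already have faulted on null) counts as proof of validity.

```python
SMALL_OBJECT_THRESHOLD = 4 * 1024  # default threshold from the plan above


def lower_field_accesses(accesses, threshold=SMALL_OBJECT_THRESHOLD):
    """Turn (reference, field_offset) pairs into pseudo-instructions.

    ("probe", ref) stands in for: cmp [object_addr], any_value
    ("access", ref, offset) stands in for the actual load or store.
    """
    proven_non_null = set()
    output = []
    for ref, offset in accesses:
        if offset >= threshold and ref not in proven_non_null:
            output.append(("probe", ref))  # faults here if ref is null
        output.append(("access", ref, offset))
        # A completed access or probe means ref cannot have been null.
        proven_non_null.add(ref)
    return output


plan = lower_field_accesses([("a", 8), ("a", 8192), ("b", 8192), ("b", 16384)])
for instruction in plan:
    print(instruction)
```

Here `a` never needs a probe because its first access is within the guard page, while `b` is probed once before its first far-field access and not again.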

Many thanks again to everyone for the help.

Regards,
John.

null reference detection
