JVM Anatomy Quarks: Compressed References

But does that mean the size of Java reference is the same as the machine pointer width? Not necessarily. Java objects are usually quite reference-heavy, and there is pressure for runtimes to employ the optimizations that make the references smaller. The most ubiquitous trick is tocompress the references: make their representation smaller than the machine pointer width. In fact, the example above was executed with that optimization explicitly disabled.

Since Java runtime environment is in full control of internal representation, this can be done without changing any user programs. It is possible to do in other environments, but you would need to handle the leakage through ABIs, etc, see for example x86_32 ABI.

In Hotspot, due to a historical accident, the internal names had leaked to the VM options list that control this optimization. In Hotspot, the references to Java objects are called“ordinary object pointers”, or“oops”, which is why Hotspot VM options have these weird names:-XX:+UseCompressedOops,-XX:+PrintCompressedOopsMode,-Xlog:gc+heap+coops. In this post we would try to use the proper nomenclature, where possible.

“32-bit” Mode

On most heap sizes, the higher bits of 64-bit machine pointer are usually zero. On the heap that can be mapped over the first 4 GB of virtual memory, higher 32 bits are definitely zero. In that case, we can just use the lower 32-bit to store the reference in 32-bit machine pointer. In Hotspot, this is called “32-bit” mode, as can be seen with logging:

$ java -Xmx2g -Xlog:gc+heap+coops ...
[0.016s][info][gc,heap,coops] Heap address: 0x0000000080000000, size: 2048 MB, Compressed Oops mode: 32-bit

This whole shebang is obviously possible when heap size is less than 4 GB (or, 232bytes). Technically, the heap start address might be far away from zero address, and so the actual limit is lower than 4 GB. See the “Heap Address” in logging above. It says that heap starts at 0x0000000080000000 mark, closer to 2 GB.

Graphically, it can be sketched like this:

compressed refs 32 bit

Now, the reference field only takes 4 bytesandthe instance size is down to 16 bytes:

$ java -Xmx1g -jar ~/utils/jol-cli.jar internals -cp target/bench.jar org.openjdk.CompressedRefs
# Running 64-bit HotSpot VM.
# Using compressed oop with 0-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via default constructor.
org.openjdk.CompressedRefs object internals:
      0     4           (object header)    01 00 00 00
      4     4           (object header)    00 00 00 00
      8     4           (object header)    85 fd 01 f8
     12     4   MyClass CompressedRefs.o   (object)
Instance size: 16 bytes

In generated code, the access looks like this:

....[Hottest Region 2]...................................................
c2, level 4, org.openjdk.CompressedRefs::access, version 714 (35 bytes)
         [Verified Entry Point]
  0.87%    ...c0: mov    %eax,-0x14000(%rsp)  ; prolog
  6.90%    ...c7: push   %rbp
  0.35%    ...c8: sub    $0x10,%rsp
  1.74%    ...cc: mov    0xc(%rsi),%r11d      ; get field "o" to %r11
  5.86%    ...d0: mov    0xc(%r11),%eax       ; get field "o.x" to %eax
  7.43%    ...d4: add    $0x10,%rsp           ; epilog
  0.08%    ...d8: pop    %rbp
  0.54%    ...d9: mov    0x108(%r15),%r10     ; thread-local handshake
  0.98%    ...e0: test   %eax,(%r10)
  6.79%    ...e3: retq                        ; return %eax

See, the access is still in the same form, that is because the hardware itself just accepts the 32-bit pointer and extends it to 64 bits when doing the access. We have got this optimization for almost free.

“Zero-Based” Mode

But what if we cannot fit the untreated reference into 32 bits? There is a way out as well, and it exploits the fact that objects arealigned: objects always start at some multiple of alignment. So, the lowest bits of untreated reference representation are always zero. This opens up the way to use those bits for storingsignificantbits that did not fit into 32 bits. The easiest way to do that is tobit-shift-rightthe reference bits, and this gives us 2(32+shift)bytes of heap encodeable into 32 bits.

Graphically, it can be sketched like this:

compressed refs zero

With default object alignment of 8 bytes, shift is 3 (23=8), therefore we can represent the references to 235=32 GB heap. Again, the same problem with base heap address surfaces here and makes the actual limit a bit lower.

In Hotspot, this mode is called “zero based compressed oops”, see for example:

$ java -Xmx20g -Xlog:gc+heap+coops ...
[0.010s][info][gc,heap,coops] Heap address: 0x0000000300000000, size: 20480 MB, Compressed Oops mode: Zero based, Oop shift amount: 3

The access via the reference is now a bit more complicated:

....[Hottest Region 3].....................................................
c2, level 4, org.openjdk.CompressedRefs::access, version 715 (36 bytes)
         [Verified Entry Point]
  0.94%    ...40: mov    %eax,-0x14000(%rsp)    ; prolog
  7.43%    ...47: push   %rbp
  0.52%    ...48: sub    $0x10,%rsp
  1.26%    ...4c: mov    0xc(%rsi),%r11d        ; get field "o"
  6.08%    ...50: mov    0xc(%r12,%r11,8),%eax  ; get field "o.x"
  6.94%    ...55: add    $0x10,%rsp             ; epilog
  0.54%    ...59: pop    %rbp
  0.27%    ...5a: mov    0x108(%r15),%r10       ; thread-local handshake
  0.57%    ...61: test   %eax,(%r10)
  6.50%    ...64: retq

Getting the fieldo.xinvolves executingmov 0xc(%r12,%r11,8),%eax: “Taketh the ref’rence from %r11, multiplyeth the ref’rence by 8, addeth the heapeth base from %r12, and that wouldst be the objecteth that you can now readeth at offset0xc; putteth that value into%eax, please”. In other words, this instruction combines the decoding of the compressed reference with the access through it, and it is done in one sway. In zero-based mode,%r12is zero, but it is easier on code generator to emit the access involving%r12nevertheless. The fact that%r12is zero in this mode can be used by code generator in other places too.

To simplify the internal implementation, Hotspot usually carries only uncompressed references in registers, and that is why the access to fieldois just the plain access fromthis(that is in%rsi) at offset0xc.

“Non-Zero Based” Mode

But zero-based compressed references still rely on assumption that heap is mapped at lower addresses. If it is not, we can just make heap base address non-zero for decoding. This would basically do the same thing as zero-based mode, but now heap base would mean more and participate in actual encoding/decoding.

In Hotspot, this mode is called “Non-zero base” mode, and you can see it in logs like this:

$ java -Xmx20g -XX:HeapBaseMinAddress=100G -Xlog:gc+heap+coops
[0.015s][info][gc,heap,coops] Heap address: 0x0000001900400000, size: 20480 MB, Compressed Oops mode: Non-zero based: 0x0000001900000000, Oop shift amount: 3

Graphically, it can be sketched like this:

compressed refs non zero

As we suspected earlier, the access would look the same as in zero-based mode:

....[Hottest Region 1].....................................................
c2, level 4, org.openjdk.CompressedRefs::access, version 706 (36 bytes)
         [Verified Entry Point]
  0.08%    ...50: mov    %eax,-0x14000(%rsp)    ; prolog
  5.99%    ...57: push   %rbp
  0.02%    ...58: sub    $0x10,%rsp
  0.82%    ...5c: mov    0xc(%rsi),%r11d        ; get field "o"
  5.14%    ...60: mov    0xc(%r12,%r11,8),%eax  ; get field "o.x"
 28.05%    ...65: add    $0x10,%rsp             ; epilog
           ...69: pop    %rbp
  0.02%    ...6a: mov    0x108(%r15),%r10       ; thread-local handshake
  0.63%    ...71: test   %eax,(%r10)
  5.91%    ...74: retq                          ; return %eax

See, the same thing. Why wouldn’t it be. The only hidden difference here is that%r12is now carrying the non-zero heap base value.