Circular References - What are they and how can they be resolved?

Benjamin Foreback

In last week's edition of IDL Data Point, Dain discussed IDL's garbage collection mechanism, which is performed when the reference count of an object or pointer (collectively referred to as heap variables) reaches zero. There are cases, however, where it seems like this doesn't happen. When heap variables linger after you expect them to be freed, the code may contain a circular reference.

What is a circular reference?

A circular reference occurs when one heap variable contains a reference to a second heap variable, and the second one contains a reference back to the first. For instance, if A is an object, and somewhere in A, there is a reference to B, and within B is a reference back to A, there is a circular reference. 

Here is a simple example:

p1 = Ptr_New(/ALLOCATE_HEAP)
p2 = Ptr_New(p1)
*p1 = p2
help, /HEAP


Heap Variables:
    # Pointer: 2
    # Object : 0

<PtrHeapVar1>  refcount=2
                POINTER   = <PtrHeapVar2>
<PtrHeapVar2>  refcount=2
                POINTER   = <PtrHeapVar1>

In this example, there are two references to each pointer. One reference is held by the variable that I created (p1 or p2). The second reference to the first pointer is inside the second pointer, and vice versa. If I get rid of the references I am holding (by setting the variables to !NULL), IDL reduces the refcount of each pointer by one. From my perspective, these pointers are gone. However, they still reference each other, so neither refcount ever reaches zero and the pointers are never garbage collected.

p1 = !null
p2 = !null
help, /HEAP


Heap Variables:
    # Pointer: 2
    # Object : 0

<PtrHeapVar1>  refcount=1
                POINTER   = <PtrHeapVar2>
<PtrHeapVar2>  refcount=1
                POINTER   = <PtrHeapVar1>

A common case when this occurs is with parent/child relationships. The parent keeps track of its children, and sometimes the child needs to know who its parent is.

Circular references can be triangular as well, or the loop can extend through many objects and pointers. Issues related to these more complex circular references can be difficult to debug.
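
As an illustration of a triangular loop, here is a minimal sketch that extends the two-pointer example to three pointers. The variable names a, b and c are mine, and the outcome noted in the final comment is what I would expect by analogy with the example above rather than captured output.

```idl
; Three pointers, each holding a reference to the next: a -> b -> c -> a
a = Ptr_New(/ALLOCATE_HEAP)
b = Ptr_New(/ALLOCATE_HEAP)
c = Ptr_New(/ALLOCATE_HEAP)
*a = b
*b = c
*c = a

; Drop my own references; each pointer is still referenced by its neighbor
a = !null
b = !null
c = !null

; By analogy with the two-pointer case, all three pointers should still be
; listed here, each with one remaining reference
help, /HEAP
```

No single pair of pointers references each other here, which is part of why longer loops are harder to spot.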

Side note:

Although I no longer have a variable that references these pointers, I haven't lost them forever. As long as they are valid pointers and I know their heap identifiers, I can retrieve them using the PTR_VALID function with the /CAST keyword.

p1 = Ptr_Valid(1, /CAST)
help, p1


P1              POINTER   = <PtrHeapVar1>

Why are circular references a problem?

Circular references can be a problem for a number of reasons. The main one is unnecessary memory usage. If the variables fall out of scope but the underlying pointers or objects aren't cleaned up, the memory is "leaked." Too much leakage, especially with large objects, slows down processing and can eventually cause IDL to hang.

Additionally, if I call HELP, /HEAP as a form of debugging, I now have to sift through these "dead" heap variables before finding what I'm looking for.

How can circular references be resolved?

Manual Cleanup

If you're confident that you will never need a heap variable again, you can manage the memory by manually destroying it with OBJ_DESTROY or PTR_FREE. This is easier said than done, however. Destroying heap variables should be done with caution. Code that attempts to use a pointer or object that has been previously destroyed will halt with an error. Furthermore, in the pointer example above, freeing "p1" will also free the second pointer if I do not hold on to a reference to it. This is because the refcount for the second pointer reached zero and it was garbage collected. Implicit garbage collection often leads to unexpected results.
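
A minimal sketch of manual cleanup for the two-pointer example, assuming I still hold both p1 and p2. It frees both heap variables explicitly and then guards later use with PTR_VALID, since dereferencing a freed pointer halts with an error.

```idl
p1 = Ptr_New(/ALLOCATE_HEAP)
p2 = Ptr_New(p1)
*p1 = p2

; Free both heap variables explicitly while I still hold references to them
Ptr_Free, p1, p2

; p1 and p2 are now dangling references, so check validity before use
if Ptr_Valid(p1) then print, *p1 else print, 'p1 has been freed'
```

Freeing both pointers in one call avoids relying on the implicit collection of the second pointer.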

Side note: In the case of lists and hashes, implicit garbage collection is desired. If I have nested hashes, for instance from calling JSON_PARSE, and I manually destroy the root-level hash, the hashes inside it will fall out of scope and be garbage collected. This saves me from needing to recursively clean up every nested hash by hand.
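
A small sketch of that behavior. The JSON string is made up for illustration, and the second HELP call is there so you can check the result in your own session; I am not showing captured output.

```idl
; Parse a small JSON document into nested hashes
h = Json_Parse('{"outer": {"middle": {"inner": 1}}}')
help, /HEAP     ; the nested containers appear as object heap variables

; Destroy only the root; the nested hashes lose their last reference
Obj_Destroy, h
help, /HEAP     ; the nested hashes should no longer be listed
```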

Use Weak References

When I call p2 = Ptr_New(p1), my variable p2 is a strong reference to the new pointer. Additionally, the new pointer contains a strong reference to the first pointer (the one p1 refers to). IDL increments the refcount of a heap variable for every strong reference to it. If I do not wish to directly reference the first pointer from the second, but the second one needs to be aware of the first, I can use a weak reference.

A weak reference means that the heap identifier is used in place of the object or pointer reference. The heap identifier can be retrieved using the /GET_HEAP_IDENTIFIER keyword on OBJ_VALID or PTR_VALID, and, as mentioned above, the object/pointer can be retrieved from the identifier using the /CAST keyword.
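
A minimal sketch of that round trip with the pointers from the earlier example. It also shows why a weak reference should be validated after casting: the heap variable it names may already have been collected. The variable names id and stale are mine, and the result in the last comment is what I expect from PTR_VALID, not captured output.

```idl
p1 = Ptr_New(/ALLOCATE_HEAP)
p2 = Ptr_New(p1)                           ; strong: second pointer -> first

; Weak: the first pointer stores only the second pointer's heap identifier
*p1 = Ptr_Valid(p2, /GET_HEAP_IDENTIFIER)
id = *p1

; Dropping p2 removes the only strong reference to the second pointer
p2 = !null

; Casting a stale identifier should yield a null pointer, so test it
stale = Ptr_Valid(id, /CAST)
print, Ptr_Valid(stale)                    ; expected to print 0
```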

Follow Strict Ownership

Sometimes following strict ownership rules can help prevent confusing reference circles. For example, whoever created an object can be held responsible for that object's lifecycle. A parent/child relationship is a good case for observing ownership. The parent should contain a strong reference to all of its children, and it is a good idea for the parent to know if and when the children should be destroyed. For example, if a child becomes irrelevant to the program after the parent is destroyed, then the parent should manually destroy the child within its ::Cleanup method.

The parent should own the child and not the other way around (although if you ask my two year old daughter, she might disagree with that statement!). Therefore, if the child needs information about the parent for any reason, it should use a weak reference and not a strong reference.
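
Here is one way this ownership pattern could look in code. DemoParent and DemoChild are hypothetical classes invented for this sketch, and it assumes everything is saved in a single file and executed with .run. The parent holds strong references to its children in a list, and each child stores only the parent's heap identifier.

```idl
; Hypothetical classes, for illustration only

function DemoChild::Init, parent
  compile_opt idl2
  ; Weak reference: keep the parent's heap identifier, not the object itself
  self.parentID = Obj_Valid(parent, /GET_HEAP_IDENTIFIER)
  return, 1
end

function DemoChild::GetParent
  compile_opt idl2
  ; Yields a null object if the parent has already been destroyed
  return, Obj_Valid(self.parentID, /CAST)
end

pro DemoChild__define
  compile_opt idl2
  void = {DemoChild, inherits IDL_Object, parentID: 0L}
end

function DemoParent::Init
  compile_opt idl2
  self.children = List()
  return, 1
end

pro DemoParent::AddChild
  compile_opt idl2
  ; Strong reference: the parent owns the child it creates
  self.children->Add, Obj_New('DemoChild', self)
end

pro DemoParent::Cleanup
  compile_opt idl2
  ; The parent is responsible for destroying its children
  foreach child, self.children do Obj_Destroy, child
  Obj_Destroy, self.children
end

pro DemoParent__define
  compile_opt idl2
  void = {DemoParent, inherits IDL_Object, children: Obj_New()}
end

; Main-level usage
parent = Obj_New('DemoParent')
parent->AddChild
parent = !null     ; no child holds a strong reference, so Cleanup can run
help, /HEAP
end
```

Because the child never holds a strong reference to the parent, setting the variable to !NULL should be enough to trigger the parent's ::Cleanup, which then takes the children with it.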

Disable Refcounting (use with caution!)

There are a few instances when you may want to turn off IDL's automatic garbage collection. You can do so by calling the HEAP_REFCOUNT function with the /DISABLE keyword. (Called without keywords, this function returns the current refcount for a heap variable, which can be useful for debugging.)

If you set /DISABLE without providing a heap variable argument, garbage collection will be turned off globally.

I advise you to use this with caution because if garbage collection is turned off, then you as the programmer are fully responsible for the lifecycle of every object created within your program, including ones you may not immediately realize, such as with nested lists or hashes. The garbage can will get full very quickly if it isn't regularly emptied.
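
A short sketch of disabling refcounting for a single heap variable rather than globally. It assumes the IS_ENABLED keyword of HEAP_REFCOUNT for checking the state; the variable names id, void and enabled are mine.

```idl
p = Ptr_New(42)
id = Ptr_Valid(p, /GET_HEAP_IDENTIFIER)    ; remember the heap identifier

print, Heap_Refcount(p)                    ; current refcount, for debugging

; Disable reference counting for this one heap variable only
void = Heap_Refcount(p, /DISABLE)
void = Heap_Refcount(p, IS_ENABLED=enabled)
print, enabled                             ; 0 when refcounting is off

; With refcounting disabled, dropping the variable no longer frees the
; heap variable, so cleanup is now my responsibility
p = !null
p = Ptr_Valid(id, /CAST)
if Ptr_Valid(p) then Ptr_Free, p
```

Limiting /DISABLE to specific heap variables keeps the rest of the session under automatic garbage collection.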

4 comments on article "Circular References - What are they and how can they be resolved?"

Michael Galloy

Can you show resolving your p1 and p2 example using weak references?


Benjamin Foreback

With this example:

p1 = Ptr_New(/ALLOCATE_HEAP)

p2 = Ptr_New(p1)

p2 now contains a strong reference to p1. If p1 needs to keep track of p2, but you wish to use a weak reference instead, you would do so like this:

*p1 = Ptr_Valid(p2, /GET_HEAP_IDENTIFIER)

Now if you print the value contained within p1, you get the heap ID for p2, rather than a direct reference:

Print, *p1

IDL prints...

2

Now if we ever want to get p2 back out of p1, we use /CAST on Ptr_Valid(), like this:

p2 = Ptr_Valid(*p1, /CAST)

I hope that helps.

Ben


Benjamin Foreback

Now if we set p2 to !NULL and destroy the heap variable p1, p2 is gone because the use of a weak reference resulted in there not being a circular reference:

p2 = Ptr_Valid(*p1, /CAST)

p2 = !null

Ptr_Free, p1

help, /HEAP

Heap Variables:
    # Pointer: 0
    # Object : 0


Michael Galloy

Thanks! I thought you were saying *p1 would still be p2. A new WEAK keyword to PTR_NEW for weak references would be cool. This would make a regular pointer, but not increment the reference count.