Thursday, 6 August 2015

Garbage Collection Basics

Automatic memory management process, which attempts to reclaim garbage or memory occupied by objects that are no longer referenced by the program. It provides solution to major problems of dangling references, external fragmentation and also resolves memory leaks for you. Garbage collection can consume a significant amount of processing time of your program and thus have significant influence on performance. It can actually soak up a frame while rendering views or animations are running and may leads to a jank which most of the users might have experienced in Android devices.

The basic garbage collection strategy consists of determining which objects are root, live or garbage during program execution. Once the garbage is filtered out, it will become available for garbage collection during next phase of GC execution.

Garbage Collector World:
  1. Object: Unit of storage in heap memory segment.
  2. Object or reference graphs: Additional directed graph data structure required by GC to map objects and it's references which will be kept as nodes and edges shows the reference. 
  3. Roots: The objects that a program can access directly are those objects which are referenced by local variables on the processor stack as well as by any static variables that refer to objects. JNI calls are also be kept as the root.
  4. Garbage: There is no edge referencing from these nodes/objects.
  5. Frame: A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions. Once a method is invoked a frame is created and kept at the top of JVM stack of the created thread.
  6. Heap: Unlike primitive variables and references in the local variable array(in each frame) objects are always stored on heap segment so cannot be disposed off when a method ends, instead they will only be removed by garbage collector.


Basic GC Algorithms:
  1. Reference Counting,
  2. Tracing Algorithms - Naive mark and sweep
  3. Tracing Algorithms - Tri-color marking
Reference Counting: 
It's often called as former garbage collection strategy and it's not a very common algorithm. In this approach every object has a count of the number of objects referencing to it, the one with 0 references is identified as garbage. An object's reference count is incremented whenever a reference to is is created, and decremented when a reference is disposed or cleaned up.

Disadvantages and Alleviating techniques:
  1. Cycles : If two or more objects refer to each other it cannot be garbage collected as the reference count will never become zero. To solve this problem first cycle needs to be detected and then we can make sure the cycle end points are kept as weak references and will be easily available of garbage collection.
  2. Space Overhead(Reference count): We need extra space for storing reference count with each object, which can be kept adjacent to object's memory or in side table. Unsigned pointer is commonly used for this task, meaning that 32 or 64 bits of reference count storage is allocated for each object.
  3. Speed Overhead: Whenever an object is garbage collected more than one reference count is updated.
  4. Atomicity: In multi-threaded environment when objects are shared across threads, increment and decrement operation needs to be atomic. 

Tracing Garbage Collection:
Basic strategy for it consists of two phases :
1. Tracing which objects are reachable by a chain of references from certain root objects, considering rest of them as garbage.
2. After identifying garbage it sweeps them from the heap space.

These cycles are started for claiming memory and happens very often when the system runs low on memory. It waits for garbage to be accumulate and when system runs on low memory it starts the garbage collection and halts the execution of a program. For incremental garbage collector the collection happens in discrete phases and in between program can resume it's execution. It can be categorized under below two categories.

1. Naive Mark and Sweep Algorithm:
In this approach each object in memory has a flag(typically a reserved single bit) which helps to distinguish the object is garbage or live. The flag is always cleared except the collection cycle, during the execution it does a traversal of entire tree from root objects(accessible directly from variables on vm stack) . Finally, all memory is scanned to identify free and used blocks, those whose reserved bit are still clear are not reachable by any program object, and hence their memory are freed.

It has several disadvantages as the whole memory segment is examined twice and periodically halting the program execution.

2. Tri-color Marking:
Basic differentiation in tri-color marking is, it keeps three sets of objects namely white black and gray. 
Initially all objects are marked as white set of objects.
White Set: Objects that are candidates for garbage collection.
Gray Set: Contains all objects reachable from root and yet to be searched for references to white set. Since, they are directly reachable from roots they eventually end up being black set. These objects is set to be initialized if directly referenced from the root level.
Black Set: Directly/Indirectly reachable from root and those who don't hold any references to white set of objects.
It has major advantage - it performs iterations for gc "on-the-fly", without halting the process for significant periods of time. This is done by marking objects as they are allocated and during mutation, maintaining the various sets. 


Moving vs Non Moving Garbage Collection:
Once the garbage collector runs the marking phase it sorts out the unreachable or garbage objects, and can flush the marked garbage while moving the non reachable/live objects to heap different memory segments to avoid fragmentation. 
1. Additional efforts to reclaim the objects freed by dead objects; the entire region from where live objects were moved can be considered as empty memory region and thus saving the extra efforts to search memory segment in heap memory segment.
2.  Fragmentation can be avoided and easily memory can be allocated to newly created objects, moreover the fragmentation can lead to more gc pauses as it attempts to clear memory from garbage objects.
3. If used with proper techniques the objects that refer to each other can be allocated tightly in virtual address space and thus saving the time for moving the objects in heap memory.

Understanding Android Context:
In Android Runtime or Dalvik, every application has its own process which runs in its own VM. More Significantly, GC implementations are within a VM and thus every process has its own GC even though it is sharing data across application.

Android Runtime v/s Dalvik garbage collections:
The garbage collection strategies vary greatly in Dalvik and android runtime. One major difference is that Dalvik is not a moving collector. When a long pause in the application won't impact user experience (for eg when app is in background and is not playing audio) it compacts the heap memory in Android Runtime execution.

There is separate heap space for bitmaps, making it more easier to find space for large bitmap objects, making it faster to find memory for these large objects without wading the potentially fragmented regular heap, pauses in art are regularly in the realm of 2-3 ms.

Garbage collection is improved in post lollipop devices, it still should be avoided whenever possible, particularly during noticeable situations like animations, where a missed frame can result to jank.

1 comment:

  1. Awesome article, it was exceptionally helpful! I simply began in this and I'm becoming more acquainted with it better! Cheers, keep doing awesome! pittsburg california junk pickup

    ReplyDelete