Jan 15, 2013

Changes to the garbage collector

Changes made in .NET 4.0:

(http://geekswithblogs.net/sdorman/archive/2008/11/07/clr-4.0-garbage-collection-changes.aspx) 

The .NET garbage collector is one of the most important and probably least understood areas of the .NET Framework. A lot of articles have been written about it, and there have been relatively few changes since .NET 1.0 was first released. (There have been changes with almost every release, but they have been relatively minor.)
With .NET 4.0, however, there are some fairly substantial changes to the GC that will have some interesting performance implications (in a good way).
For a quick review, the GC in .NET is a generational garbage collector with 3 generations. Generation 0 and 1 collections are very fast since the segment they live in (called the ephemeral segment) is small, while Generation 2 collections can be relatively slow.
The GC in .NET also has two modes of operation: Server and Workstation. In Server GC, the algorithm maximizes overall throughput, but all managed code must be paused while a collection runs. Since .NET 3.5 SP1 you can also register to be notified before a Generation 2 or Large Object Heap collection occurs, as in the sample below:

using System;
using System.Threading;

public static void Main(string[] args)
{
  try
  {
    // Register for a set of notifications.
    // Parameters require tuning. First is
    // for Gen2, second, Large Object Heap
    GC.RegisterForFullGCNotification(10, 10);

    // Start a thread using WaitForFullGCProc
    Thread thWaitForFullGC = new Thread(new ThreadStart(WaitForFullGCProc));
    thWaitForFullGC.Start();
  }
  catch (InvalidOperationException invalidOp)
  {
    Console.WriteLine("GC Notifications are not supported while concurrent GC is enabled.\n" + invalidOp.Message);
  }
}

public static void WaitForFullGCProc()
{
  while (true)
  {
    // Wait for a notification that a full GC is approaching
    GCNotificationStatus s = GC.WaitForFullGCApproach();

    if (s == GCNotificationStatus.Succeeded)
    {
      // This call will direct new traffic
      // away from machine; wait for old
      // traffic to finish; then call
      // GC.Collect()
      OnFullGCApproachNotify();
    }

    // Wait for a notification that the full GC has completed
    s = GC.WaitForFullGCComplete();
    if (s == GCNotificationStatus.Succeeded)
    {
      OnFullGCCompleteEndNotify();
    }
  }
}
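
The two handlers referenced above are not shown in the original snippet; their bodies are application-specific. A minimal placeholder sketch of what they might contain (the redirect/resume logic here is purely illustrative):

// Hypothetical handlers referenced by WaitForFullGCProc above; the
// redirect/resume logic is application-specific and only sketched here.
public static void OnFullGCApproachNotify()
{
  // e.g. take this server out of the load balancer, wait for
  // in-flight requests to drain, then induce the collection now
  // while no traffic is being served.
  GC.Collect();
}

public static void OnFullGCCompleteEndNotify()
{
  // e.g. put the server back into rotation now that the full
  // collection has finished.
  Console.WriteLine("Full GC complete - resuming normal traffic.");
}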



The changes in the server GC will probably only affect a small number of applications. However, the changes to the workstation GC (which is the default mode) will affect almost all .NET applications.
In .NET 3.5 SP1 and earlier, the workstation GC used concurrent collection. This means the GC can do most, but not all, of a Generation 2 collection without pausing managed code. It cannot, however, do a Generation 0 or Generation 1 collection at the same time as a Generation 2 collection.
CLR 4.0 changes that to support background collection, which can perform a Generation 0 or Generation 1 collection at the same time as a Generation 2 collection. This means that only unusual circumstances should now lead to long latency times.
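
If you want to check which mode a given process is actually running in, the GCSettings class exposes this. A small sketch (not from the original article):

using System;
using System.Runtime;

class GcModeCheck
{
  static void Main()
  {
    // IsServerGC reports whether the process is using the server GC;
    // LatencyMode reflects the concurrent/background vs. batch behavior.
    Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
    Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
  }
}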

(Chart: http://geekswithblogs.net/images/geekswithblogs_net/sdorman/WindowsLiveWriter/NET4.0GarbageCollectionChanges_C830/image_2.png)

These charts, from performance testing Microsoft presented at PDC, show how the new background collection algorithm should greatly reduce latency times.



Background Garbage Collection in CLR 4.0

Concurrent GC is being replaced by Background GC in CLR 4.0
Concurrent GC is the mode of the GC that you typically use in desktop applications, for example. The goal of the concurrent GC is to minimize pause time, and it does so by allowing you to keep allocating while a GC is in progress (hence the concurrent part).
Concurrent GC is only available in workstation mode.
In server mode (which is what you use in ASP.NET, for example, when you have multiple processors/cores), put simply, all managed threads are paused while a GC is in progress, which means that you can't allocate anything. This means the process pauses slightly during a GC, but what you lose in pause time you gain in throughput, as collections are performed concurrently by x GC threads, where x is the number of logical processors (#procs * #cores).
In concurrent GC you are allowed to allocate while a GC is in progress, but you are not allowed to start another GC while one is already running. This in turn means that the most you can allocate during a GC is whatever space is left in the current segment (currently 16 MB in workstation mode), minus whatever is already allocated there.
The difference in background mode is that you are allowed to start a new GC (gen 0 + 1) while a full background GC is in progress, and this even allows a new segment to be created to allocate in if necessary. In short, the blocking that could previously occur once you had allocated all you could in one segment won't happen anymore.
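
As an aside on pause times: if an application has a short window during which it cannot tolerate a blocking full collection, the existing GCSettings.LatencyMode property can be set to GCLatencyMode.LowLatency, which asks the GC to avoid full (generation 2) collections while it is in effect. A rough sketch of the usual pattern (not something from the original post):

using System;
using System.Runtime;

class LowLatencyWindow
{
  static void DoLatencyCriticalWork()
  {
    GCLatencyMode oldMode = GCSettings.LatencyMode;
    try
    {
      // Ask the GC to avoid blocking (generation 2) collections
      // while the time-sensitive work runs.
      GCSettings.LatencyMode = GCLatencyMode.LowLatency;

      // ... latency-critical work goes here ...
    }
    finally
    {
      // Always restore the previous mode.
      GCSettings.LatencyMode = oldMode;
    }
  }
}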

Background GCs will be available in the Silverlight CLR as well
The CoreCLR uses the same GC as the regular CLR, so this means that Silverlight apps benefit from this as well…

As Server mode does not use concurrent GC, this will not be available in Server GC
Having this in server mode would be incredibly cool, as GCs can get pretty hefty, especially in 64-bit apps with very large heaps, but as Maoni mentions in the video and in the post, this work for the concurrent GC lays the foundation for the same work being done in the Server GC. Because of the complexities involved in doing this in Server GC, though, it is not included in v4.0.
If you do have a lot of latency due to heavy pause times during full garbage collections, there is a feature introduced in 3.5 SP1 that allows you to be notified when a full GC is about to occur. You can then, for example, redirect traffic to another server in a cluster while the GC occurs.

I just want to mention that the fact that this is not in the Server GC does not mean you should switch your server apps (ASP.NET etc.) to workstation mode with concurrent GC. Server GC is optimized for these scenarios and should still be used there.


Changes made in .NET 4.5:

(http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx)


Garbage collection is one of the premier features of the .NET managed coding platform. As the platform has become more capable, we're seeing developers allocate more and more large objects. Since large objects are managed differently than small objects, we've heard a lot of feedback requesting improvement. Today's post is by Surupa Biswas and Maoni Stephens from the garbage collection feature team. -- Brandon
The CLR manages two different heaps for allocation, the small object heap (SOH) and the large object heap (LOH). Any allocation of 85,000 bytes or more goes on the LOH. Copying large objects carries a performance penalty, so unlike the SOH, the LOH is not compacted. Another defining characteristic is that the LOH is only collected during a generation 2 collection. Together, these reflect the built-in assumption that large object allocations are infrequent.
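
A quick way to see the 85,000-byte threshold in practice is to check the reported generation of a freshly allocated array; objects on the LOH are reported as generation 2 right away. A small illustrative sketch (exact behavior can vary slightly by runtime):

using System;

class LohThresholdDemo
{
  static void Main()
  {
    byte[] small = new byte[1000];    // well under 85,000 bytes -> SOH
    byte[] large = new byte[100000];  // well over 85,000 bytes  -> LOH

    // LOH objects are reported as generation 2 immediately,
    // while a new small object starts in generation 0.
    Console.WriteLine("small: gen {0}", GC.GetGeneration(small));  // typically 0
    Console.WriteLine("large: gen {0}", GC.GetGeneration(large));  // typically 2
  }
}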
Because the LOH is not compacted, memory management is more like a traditional allocator. The CLR keeps a free list of available blocks of memory. When allocating a large object, the runtime first looks at the free list to see if it will satisfy the allocation request. When the GC discovers adjacent objects that died, it combines the space they used into one free block which can be used for allocation. Because a lot of interaction with the free list takes place at the time of allocation, there are tradeoffs between speed and optimal placement of memory blocks.
A condition known as fragmentation can occur when nothing on the free list can satisfy an allocation. This can result in an out-of-memory exception despite the fact that, collectively, there is enough free memory. For developers who work with a lot of large objects, this error condition may be familiar. We've received a lot of feedback requesting a solution to LOH fragmentation.

A Better LOH Allocator

In .NET 4.5, we made two improvements to the large object heap. First, we significantly improved the way the runtime manages the free list, thereby making more effective use of fragments. Now the memory allocator will revisit the memory fragments that earlier allocation couldn't use. Second, when in server GC mode, the runtime now balances LOH allocations across the heaps; prior to .NET 4.5, we only balanced the SOH. We've observed substantial improvements in some of our LOH allocation benchmarks as a result of both changes.
We’re also starting to collect telemetry about how the LOH is used. We’re tracking how often out-of-memory conditions in managed applications are due to LOH fragmentation. We’ll use this data to measure and improve memory management of real-world applications.
We still recommend some traditional techniques for getting the best performance from the LOH. Many large objects are quite similar in nature, which creates the opportunity for object pooling. Frequently, types allocated on the LOH are byte buffers that are filled by third-party libraries or devices. Rather than allocating and freeing the buffer each time, an object pool lets you reuse a previously allocated buffer. Since fewer allocations and collections take place on the LOH, fragmentation is less likely to occur and the program's performance is likely to improve.
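
As a rough illustration of the pooling idea (the pool type and buffer size below are arbitrary choices, not anything prescribed by the CLR):

using System.Collections.Concurrent;

// A minimal buffer pool sketch: reuse large byte[] buffers instead of
// repeatedly allocating them on the LOH and letting them die.
class BufferPool
{
  private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
  private readonly int _bufferSize;

  public BufferPool(int bufferSize) { _bufferSize = bufferSize; }

  public byte[] Rent()
  {
    byte[] buffer;
    return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
  }

  public void Return(byte[] buffer)
  {
    // Only keep buffers of the expected size in the pool.
    if (buffer != null && buffer.Length == _bufferSize)
      _buffers.Add(buffer);
  }
}

// Usage: rent a buffer, fill it from the device/library, then return it.
// var pool = new BufferPool(256 * 1024);
// byte[] buf = pool.Rent();
// ... use buf ...
// pool.Return(buf);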