Jan 3, 2013

.Net Profilers - A brief review


1.       Focus on portions of your .NET code that really require attention
Profiling allows you to focus on the critical sections of your .NET code. In a very large application it is difficult to identify the areas that need improvement. Profiling your application pinpoints the code sections that really require improvement or tuning.
2.       Identify code blocks with performance issues
Performance is hard to measure and trace, and identifying the code blocks or methods that have performance issues by hand is tedious. Profiling your .NET code helps you identify the lines, blocks or methods that need performance improvement.
3.       Compare alternative approaches
During development, you might come across alternative ways of achieving a task. For any given task you might have two different implementations that you wish to compare, in order to find out which is better in terms of performance, scalability, and resource usage. By comparing the implementations with a profiler, you can select the most efficient code block.
4.       Get accurate code execution response times
Often, when you examine the performance of your application, you will want to find out how long a line of code, a block of code, or a method takes to execute. By profiling your .NET code you can get accurate execution times.
5.       Avoid guessing performance issues
Have you ever looked at your application and felt that it is executing slower than usual? You cannot just go by instinct and start making changes. By profiling your application, you can confirm whether performance issues exist, and where they are.
6.       Visualize performance and memory usage
A visual depiction of execution times and memory usage helps you make informed decisions quickly. Once you can see a graph of execution times or memory usage, it is much easier and quicker to understand issues and fix them.
7.       Track the lifecycle of your .NET objects
You might be using a resource-intensive object in your .NET code. Tracking the lifecycle of that object allows you to optimize your code. For example, you might be creating the resource-intensive object too early in your application. Profiling will uncover these issues.
8.       Avoid unnecessary loading or initialization of your program
During development you might have added tests or helpers that are loaded and initialized at startup. Prior to deployment you will want to ensure that any unnecessary loading or initialization is removed. It would be very difficult to track down such portions of your application without profiling it.
9.       Optimize your looping constructs in .NET
Looping constructs are a common source of performance issues. Profiling your code allows you to find and eliminate unnecessary iterations and redundant work inside your loops. Improving your looping constructs will in turn improve the overall performance of your .NET application.
10.      Identify memory leaks in your application
Memory leaks can be very difficult to identify. Profiling your .NET code allows you to identify unnecessary memory usage and therefore optimize the memory usage of your .NET application.
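Before reaching for a full profiler, the points about comparing alternative approaches and getting execution times can be tried out with a few lines of C#. The following is only a minimal sketch using the standard System.Diagnostics.Stopwatch class; the class name TimingSketch and the two string-joining methods are hypothetical examples invented for illustration, not part of any profiler discussed here.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class TimingSketch
{
    // Runs the action `iterations` times and returns the total elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not counted
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Hypothetical implementation A: repeated string concatenation.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i.ToString();
        }
        return result;
    }

    // Hypothetical implementation B: the same output built with StringBuilder.
    public static string JoinWithBuilder(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i);
        }
        return builder.ToString();
    }

    // Prints the timings of both implementations; call this from your own Main.
    public static void Demo()
    {
        TimeSpan concat = Measure(() => JoinWithConcat(2000), 50);
        TimeSpan builder = Measure(() => JoinWithBuilder(2000), 50);
        Console.WriteLine("Concat:  {0} ms", concat.TotalMilliseconds);
        Console.WriteLine("Builder: {0} ms", builder.TotalMilliseconds);
    }
}
```

Hand-rolled timing like this only measures what you already suspect; a profiler earns its keep by finding the slow code you did not think to measure.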




Key players in the market:

| Sl | Name | Manufacturer | URL / Availability | Type | Licensing |
|----|------|--------------|--------------------|------|-----------|
| 1 | ANTS Profiler | Redgate | | Memory + Performance | Paid |
| 2 | SciTech Memory Profiler | SciTech Software | | Memory | Paid |
| 3 | dotTrace | JetBrains | | | Paid |
| 4 | EQATEC Profiler | EQATEC | | | Partly Free / Paid |
| 5 | CodeRush | DevExpress | | | Paid |
| 6 | windbg.exe | Microsoft | Part of Debugging Tools for Windows | Debugger + dump analyzer | Free |
| 7 | PerfMon.exe | Microsoft | Ships with Windows | Performance | Free |
| 8 | ClrProfiler | Microsoft | Free download | Memory | Free |
| 9 | Visual Studio > Analyze > Performance Profiling | Microsoft | Visual Studio Ultimate | | Paid |
| 10 | AQtime | Codework Solutions | | Time | Paid |
| 11 | XPO Profiler | DevExpress | | | Paid |
| 12 | JustTrace | Telerik | | | Paid |
| 13 | Slimtune | Open Source | | | Free |
| 14 | SharpDevelop | Open Source | http://www.icsharpcode.net/OpenSource/SD/ | IDE + Profiler | |
| 15 | YourKit Profiler | YourKit | http://www.yourkit.com/dotnet/download/index.jsp | Memory + Performance | Paid |
| 16 | Intel® VTune™ Performance Analyzer | Intel | http://software.intel.com/en-us/intel-vtune-amplifier-xe | CPU Cost | Paid |


Some features of popular profilers (source: StackOverflow):



The specialty of this profiler is its excellent ability to profile ASP.NET web applications.
This is one area where the other profilers, both commercial (dotTrace) and free (PerfMon/CLR Profiler), really fall flat.

It is especially good at performance profiling and is also decent at memory profiling (SciTech Profiler is the market leader here).
One positive point for JetBrains' dotTrace: since it is integrated with ReSharper, and hence the Visual Studio IDE, it can profile unit test cases very easily.

http://www.codeproject.com/Articles/5109/ANTS-Profiler-by-Red-Gate-Software

Common Features of ANTS and SciTech .NET Memory Profiler
·         Real-time analysis feature
·         Excellent how-to videos on their web sites
·         Easy to use
·         Reasonably performant (obviously slower than without the profiler attached, but not so much you become frustrated)
·         Show instances of leaking objects
·         Basically they both do the job pretty well
ANTS
·         One-click filters to find common leaks including: objects kept alive only by event handlers, objects that are disposed but still live and objects that are only being kept alive by a reference from a disposed object. This is probably the killer feature of ANTS - finding leaks is incredibly fast because of this. In my experience, the majority of leaks are caused by event handlers not being unhooked and ANTS just takes you straight to these objects. Awesome.
·         Object retention graph. While the same info is available in SciTech, it's much easier to interpret in ANTS.
·         Shows size with children in addition to size of the object itself (but only when an instance is selected unfortunately, not in the overall class list).
·         Better integration to Visual Studio (right-click on graph to jump to file)
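The event handler leak mentioned in the ANTS list is easy to reproduce. The sketch below is an illustration only: the Publisher and Subscriber classes are hypothetical names, and the HandlerCount property exists purely so the effect can be observed without a profiler. As long as the subscriber stays hooked, the publisher's delegate holds a reference to it, so the GC cannot reclaim it while the publisher is alive.

```csharp
using System;

public class Publisher
{
    public event EventHandler Changed;

    // Number of handlers currently attached to the event.
    public int HandlerCount
    {
        get { return Changed == null ? 0 : Changed.GetInvocationList().Length; }
    }

    public void RaiseChanged()
    {
        EventHandler handler = Changed;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public int Notifications { get; private set; }

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        // From here on the publisher's delegate references this subscriber.
        _publisher.Changed += OnChanged;
    }

    private void OnChanged(object sender, EventArgs e)
    {
        Notifications++;
    }

    // Unhooking removes that reference so the GC can reclaim the subscriber.
    public void Dispose()
    {
        _publisher.Changed -= OnChanged;
    }
}
```

Forgetting the call to Dispose (or the -= line inside it) is the pattern that a memory profiler's "kept alive only by event handlers" filter is designed to surface.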
SciTech .NET Memory Profiler
·         Shows stack trace when object was allocated. This is really useful for objects that are allocated in lots of different places. With ANTS it is difficult to determine exactly where the leaked object was created.
·         Shows count of disposable objects that were not disposed. While not indicative of a leak, it does identify opportunities to fix this problem and improve your application performance as a result of faster garbage collection.
·         More detailed filtering options (several columns can be filtered independently).
·         Presents info on total objects created (including those garbage collected). ANTS only shows 'live' object stats. This makes it easier to analyze and tune overall application performance (e.g. identify where lots of objects are being created unnecessarily that aren't necessarily leaking).
AQTime Profiler
·         Managed & Unmanaged Code Support
·         Profiling 32-bit and 64-bit Code
·         Profiling Scripts
·         Integration with Microsoft Visual Studio, Borland Developer Studio and CodeGear RAD Studio
·         Advanced Integration into Microsoft Visual Studio Team System
·         Comprehensive Analysis of Application Performance
·         Complete Top to Bottom Analysis
·         Setup and Test Management Interfaces (or, Control What to Profile and When)
·         Comparison and Merging of Results
·         Integrated Source Code Editor
·         Assistant
·         Easy to Use Interface for Optimum Efficiency
·         Automation Testing and Profiling Cycles
·         Open, Extensible Architecture
·         Benefits:
  1. Profile Scripts
  2. Trace References to Interface Objects
  3. Reveal Prematurely Freed Memory
  4. Explore .NET Code More Thoroughly
  5. Retrieve Results Even If the Application Crashed
  6. Perform Coverage Testing Faster
  7. Measure the Execution Time of SQL Queries and Stored Procedures
  8. Create Test Items in Microsoft Visual Studio 2008
  9. Filter Out Standard Units
  10. Export Results to a Database
  11. Inspect the Usage of GDI+ Resources
XPO Profiler: XPO Profiler is a profiling tool designed specifically for XPO-based business applications. The main goal of the profiler is to help you find performance bottlenecks and code issues, such as attempts to access a session from different threads or to execute requests via inappropriate data layers. Unlike server-side SQL query profilers, XPO Profiler tracks internal XPO events. Thus, besides SQL query executions, internal events such as session method calls are tracked. As a result, you are presented with a log of method calls, along with passed parameters and the corresponding SQL queries, side by side. This combined data can be much more helpful than just a list of executed SQL queries. The profiler is also purely XPO-based, so it works with all XPO-supported databases.

Other profiler reviews:
Tools and Techniques for .NET Code Profiling :
http://msdn.microsoft.com/en-us/magazine/hh288073.aspx



.NET Memory Profiling … (excerpted from Rick Leeks blog)


What happens to small objects?

Small .NET objects are allocated onto the Small Object Heaps (SOH). There are three of these: Generation 0, Generation 1, and Generation 2. Objects move up these Generations based on their age.
New objects are placed on Gen 0. When Gen 0 becomes full, the .NET Garbage Collector (GC) runs, disposing of objects which are no longer needed and moving everything else up to Gen 1. If Gen 1 becomes full the GC runs again, but also moves objects in Gen 1 up to Gen 2.
A full GC run happens when Gen 2 becomes full. This clears unneeded Gen 2 objects, moves Gen 1 objects to Gen 2, then moves Gen 0 objects to Gen 1, and finally clears anything which isn’t referenced. After each GC run, the affected heaps are compacted, to keep memory which is still in use together.
This generational approach keeps things running efficiently – the time-consuming compacting process only occurs when absolutely necessary.
Remember: if you see a high proportion of memory in Gen 2, it's an indicator that memory is being held on to for a long time, and may be a sign you have a memory problem. This is where a memory profiling tool comes in handy.
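The promotion of surviving objects through the generations can be observed from code with GC.GetGeneration. This is a small sketch for experimentation, not something to ship: the class name GenerationSketch is hypothetical, and forcing collections with GC.Collect is done here only to make promotion visible. The exact generations reported can vary with the runtime and build configuration.

```csharp
using System;

public static class GenerationSketch
{
    // Returns the generation of `obj` before and after two forced collections.
    public static int[] ObserveGenerations(object obj)
    {
        int[] seen = new int[3];
        seen[0] = GC.GetGeneration(obj);
        GC.Collect(); // objects that survive a collection are usually promoted
        seen[1] = GC.GetGeneration(obj);
        GC.Collect();
        seen[2] = GC.GetGeneration(obj);
        return seen;
    }

    // Prints what was observed; call this from your own Main.
    public static void Demo()
    {
        object data = new object();
        int[] seen = ObserveGenerations(data);
        Console.WriteLine("Generations observed: {0}", string.Join(", ", seen));
        Console.WriteLine("Gen 0 collections so far: {0}", GC.CollectionCount(0));
        Console.WriteLine("Gen 2 collections so far: {0}", GC.CollectionCount(2));
        GC.KeepAlive(data); // keep the object reachable until this point
    }
}
```

GC.CollectionCount is also a cheap way to check, without a profiler, whether a piece of code is triggering far more Gen 2 collections than expected.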

What happens to larger objects?

Objects of 85,000 bytes or more (roughly 85 KB) are allocated onto the Large Object Heap (LOH). They aren't compacted, because of the overhead of copying large chunks of memory. When a full GC takes place, the address ranges of LOH objects not in use are recorded in a free space allocation table instead.
When a new object is allocated, this free space table is checked for an address range large enough to hold the object. If one exists, the object is allocated there; if not, it's allocated at the next free space.
Because objects are unlikely to be the exact size of an empty address range, small chunks of memory will almost always be left between objects, resulting in fragmentation. If these chunks are less than 85 KB, there’s no possibility of reuse at all. Consequently, as allocation demand increases, new segments are reserved even though fragmented space is still available.
Furthermore, when a large object needs to be allocated, .NET tends to append the object to the end anyway, rather than run an expensive Gen 2 GC. This is good for performance but a significant cause of memory fragmentation.
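Whether an allocation went to the LOH can be checked indirectly: the runtime reports large objects as belonging to the oldest generation from the moment they are created. The sketch below assumes that behaviour (it holds for the Microsoft CLR as far as I know); LargeObjectSketch and its methods are hypothetical names used only for this illustration.

```csharp
using System;

public static class LargeObjectSketch
{
    // Allocates a byte array of the given length and reports its generation.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        byte[] buffer = new byte[lengthInBytes];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer); // keep the array reachable while it is inspected
        return generation;
    }

    // Prints the generation of a small and a large array; call from your own Main.
    public static void Demo()
    {
        Console.WriteLine("1,000 byte array is in generation {0}",
            GenerationOfNewArray(1000));
        Console.WriteLine("100,000 byte array is in generation {0}",
            GenerationOfNewArray(100000));
    }
}
```

If a hot code path keeps allocating buffers just over the threshold, reusing one buffer instead is a common way to avoid the fragmentation described above.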

The Garbage Collector can be run in different modes to optimize performance

.NET solves the trade-off between performance and heap efficiency by providing multiple modes for the GC.
Workstation mode gives maximum responsiveness to the user and cuts down pauses due to GC. It can run as ‘concurrent’ or ‘non-concurrent’, referring to the thread the GC runs on. The default is concurrent, which uses a separate thread for the GC so the application can continue execution while GC runs.
Server mode gives maximum throughput, scalability, and performance for server environments. Segment sizes and generation thresholds are typically much larger in Server mode than Workstation mode, reflecting the higher demands placed on servers.
Server mode runs garbage collection in parallel on multiple threads, allocating a separate SOH and LOH to each logical processor to prevent the threads from interfering with each other.
The .NET framework provides a cross-referencing mechanism so objects can still reference each other across the heaps. However, as application responsiveness isn't a direct goal of Server mode, all application threads are suspended for the duration of the GC.
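The GC mode a process is actually running in can be read at run time through the System.Runtime.GCSettings class. The sketch below only reports the mode; the mode itself is normally chosen in the application's configuration (for example the gcServer setting), not in code. The class name GcModeSketch is a hypothetical name for this illustration.

```csharp
using System;
using System.Runtime;

public static class GcModeSketch
{
    // Describes the GC mode and latency mode of the current process.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "Server" : "Workstation";
        return string.Format("{0} GC, latency mode {1}", mode, GCSettings.LatencyMode);
    }

    // Prints the description; call this from your own Main.
    public static void Demo()
    {
        Console.WriteLine(Describe());
    }
}
```

Logging this once at startup is a simple way to confirm that a server deployment is really running with the mode you configured.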

Weak references offer a compromise between performance and memory efficiency

Weak object references are an alternative source of GC roots, letting you keep hold of objects while allowing them to be collected if the GC needs to. They're a compromise between code performance and memory efficiency; creating an object takes CPU time, but keeping it loaded takes memory.
Weak references are particularly suitable for large data structures. For example, imagine you have an application that allows users to browse through large data structures, some of which they might return to. You could convert any strong references to the structures they have browsed into weak references. If users return to these structures, they’re available, but if not the GC can reclaim the memory if it needs to.
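The browsing scenario above might be sketched with the generic WeakReference<T> class (available from .NET 4.5). This is an assumption-laden illustration rather than a recommended cache design: WeakCache, its load delegate, and the LoadCount property are hypothetical names, and the sketch is not thread-safe.

```csharp
using System;

public sealed class WeakCache<T> where T : class
{
    private readonly Func<T> _load;
    private readonly WeakReference<T> _reference;

    // How many times the expensive load had to run.
    public int LoadCount { get; private set; }

    public WeakCache(Func<T> load)
    {
        _load = load;
        _reference = new WeakReference<T>(null); // nothing cached yet
    }

    // Returns the cached object if the GC has not reclaimed it,
    // otherwise rebuilds it and caches it weakly again.
    public T Get()
    {
        T value;
        if (!_reference.TryGetTarget(out value))
        {
            value = _load();
            LoadCount++;
            _reference.SetTarget(value);
        }
        return value;
    }
}
```

A caller would write something like `var cache = new WeakCache<byte[]>(() => new byte[1024]);` and call `cache.Get()` whenever the data is needed; the load delegate runs again only if the GC reclaimed the previous instance in the meantime.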

Object pinning can create references for passing between managed and unmanaged code

.NET uses a structure called GCHandle to keep track of heap objects. GCHandle can be used to pass object references between managed and unmanaged domains, and .NET maintains a table of GCHandles to achieve this. There are four types of GCHandle, including Pinned, which is used to fix an object at a specific address in memory.
The main problem with object pinning is that it can cause SOH fragmentation. If an object is pinned during a GC then, by definition, it can't be relocated. Depending on how you use pinning, it can reduce the efficiency of compaction, leaving gaps in the heap. The best policy to avoid this is to pin for a very short time and then release.
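The "pin briefly, then release" policy might look like the following sketch, which uses GCHandle.Alloc with GCHandleType.Pinned and frees the handle in a finally block. PinningSketch and its methods are hypothetical names; Marshal.ReadByte stands in here for whatever unmanaged call would really consume the address.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningSketch
{
    // Pins `buffer` only for the duration of `useAddress`, then releases it.
    public static void WithPinned(byte[] buffer, Action<IntPtr> useAddress)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            useAddress(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free(); // unpin as soon as possible so the GC can move the array again
        }
    }

    // Reads the first byte through the pinned address, as unmanaged code would.
    public static byte ReadFirstByte(byte[] buffer)
    {
        byte first = 0;
        WithPinned(buffer, address => first = Marshal.ReadByte(address));
        return first;
    }
}
```

Wrapping the pin in a helper like this keeps the pinned window as short as the callback, which is the point of the advice above.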

Profiling the Performance of a .NET Application

Introduction

Not much has changed in terms of users' requirements for a high-performing application over the past two years since I last reviewed ANTS Profiler. Certain aspects of development have become simpler, while others have introduced new degrees of complexity. One thing is for sure, though: developers still face considerable challenges when trying to determine and improve the performance characteristics of their applications.
Luckily, ANTS Profiler has evolved with the technology to provide a quick and easy way of identifying performance bottlenecks within your .NET applications. Version 3 is the latest version of ANTS Profiler and provides more features and full support of the current .NET runtime.

What to Profile

The CLRProfiler concentrates only on the profiling of memory within your application, whereas ANTS Profiler can profile both memory and time-related performance in your application. The standard version of ANTS Profiler that I will be reviewing here provides metrics around duration and time-based performance. It's worth noting that ANTS Profiler Professional edition also deals with memory profiling.
In my review of a previous version of ANTS Profiler, I compared the tool with the CLRProfiler. Rather than perform another comparison here, I can confirm that the results of that comparison are still valid. That is, ANTS Profiler provides a quicker and easier method to identify performance bottlenecks within your application. This is not to say that the CLRProfiler should not be used, but rather that it should be used in conjunction with ANTS Profiler. The slowest areas in your application can be identified very quickly with ANTS Profiler. The CLRProfiler can then be used to 'deep dive' into the convoluted aspects of memory management within that particular area of your application. Memory usage and performance are very tightly related in .NET applications, and one should not be investigated without considering the other.

The Basics

ANTS Profiler provides a concise summary of which methods and lines of code took the longest to execute during the profiling process. A summary of these methods is the first thing you see when you profile an application, and this is the real advantage and power of ANTS Profiler. An example best demonstrates this.
Let's examine the screenshot below of the summary screen within ANTS Profiler. Without even looking at the code, we immediately know which pieces of code took the longest to execute and where to start our examination to increase the performance of the application.
Figure 1: Initial Screen after profiling an application
This particular application is not very large, nor very complex. We can see this within the Slowest Lines of code section of the immediate areas of interest.
Figure 2: Slowest lines of code
The slowest line is initially the 'Main' entry point of the application, which stands to reason, as it will execute for as long as the program is running. The second slowest line of code is the line that executes the EncryptData method. This is verified by the Source code view at the bottom of the screen, clearly showing the line being executed and its duration of execution.
Figure 3: Source code view
The Source code view shows the line of code that executes the EncryptData method. The view also indicates that this is the slowest line of code with a full red bar. In addition, we can see that it took 0.259 seconds (259 milliseconds) to execute.
By clicking on the EncryptData entry in the Slowest lines of code window, the Source code view is updated to reflect the execution time within that method.
Figure 4: Updated Source code view
Within the EncryptData method, we can see that the GetEncryptor method was the slowest to execute (as indicated by the full red bar), and that it took 0.250 seconds (250 milliseconds) to execute. Again, this is verified within the Slowest methods window display.
Figure 5: Slowest methods
Finally, clicking on the GetEncryptor method within the Slowest methods window updates the source code display to show us the slowest executing line within that method.
Figure 6: Updated Source code view - GetEncryptor
Here we can see that the line:
TripleDES crypto = TripleDESCryptoServiceProvider.Create();
took 0.244 seconds (244 milliseconds) to execute. This represents a majority of the time and is effectively the slowest part of the application.
So with a few mouse clicks, we have already narrowed down the slowest portion of an application we know nothing about. While this is not a realistic scenario, as typically we would have some context around the application we are profiling, it highlights the ease and speed with which ANTS Profiler can identify the performance areas of concern in our application.

And That's Not All…

The previous section showed a brief introduction into ANTS Profiler's functionality and touched on the performance summary screen, where the most value in terms of performance information is typically gleaned.
Now let's take a broader look at the features provided by ANTS Profiler.
ANTS Profiler can profile a range of application types. The wizard dialog best illustrates this where the type of application is specified.
Figure 7: Type of Application to Profile wizard dialog
All the different types of .NET applications can be specified, including ASP.NET web applications. New in version 3 is the support for profiling applications using the ASP.NET development web server that comes with Visual Studio® .NET (commonly referred to by its code name 'Cassini').
ANTS Profiler allows easy profiling via a simple wizard that initiates the profiling operation. All the settings specified can be saved as a profiling project, then loaded and initiated. This is good for recording an original performance profile of your application, called a baseline, and saving those results. Typically, we would then modify the application to address any performance issues that were identified, reload the profile project, re-run the profiling operation, and save the new results. The results generated can be saved and reloaded as required; they can then be used to compare previous runs with existing runs to see if the modifications were effective. The results themselves are listed in ANTS Profiler's Results window. This window holds links to each set of results and is shown in the following diagram:
Figure 8: Results window
To demonstrate this, we shall use our previous example at the beginning of this article. In that example, we identified that the following line:
TripleDES crypto = TripleDESCryptoServiceProvider.Create();
was taking the longest to execute and was the major factor of the performance profile of our sample application. Clearly, the code is instantiating a cryptographic provider based on the TripleDES algorithm. To make this a little faster (although a little less secure), we will instead use a simple DES algorithm which is not as comprehensive as TripleDES, but neither is it as computationally intensive. The modified line then becomes:
DES crypto = DESCryptoServiceProvider.Create();
Now, when we re-run the profiling project, we can see from the following diagram:
Figure 9: Modified performance run - Source Code Window
that the instantiation of a cryptographic provider based on the DES algorithm takes 0.204 seconds (204 milliseconds), compared to 0.244 seconds (244 milliseconds) previously. While not much of a benefit on its own (a 40 millisecond difference is imperceptible to a single user), in a high throughput application, perhaps dealing with hundreds of thousands of users, any improvement is beneficial. Additionally, this kind of performance benefit is easily identified. It would certainly take much more time to identify it without a profiling tool like ANTS Profiler.

The Bigger Picture

To round out the information provided by ANTS Profiler, we can select the All methods tab to show all the methods executed as part of the profile run. This view provides a summary of all the methods executed within the profile run:
  • Time: how long the method took singularly (not including child methods).
  • Time with children: how long the method took to execute, including child methods.
  • Hit count: number of times the method was executed.
  • Source file: the source code file that the method is in.
Screenshot - Figure10.gif
Each column can be sorted by simply clicking on the associated column header to allow for very simple grouping. This is a really good way to obtain a complete overview of your application's performance and also a complete view of every single method executed within our application's profile run. Combined with the initial summary view, we can create a comprehensive performance profile of our application for very little effort.
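To make the difference between the Time, Time with children and Hit count columns concrete, here is a rough sketch of how such numbers could be gathered by hand. It is not how ANTS Profiler works internally; ManualProfiler, MethodStats and the Enter/Exit calls are hypothetical names, and the sketch is single-threaded and does not handle recursion.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public sealed class MethodStats
{
    public int HitCount;            // number of times the method was executed
    public long TicksWithChildren;  // time including child methods
    public long ChildTicks;         // time spent in child methods only

    // Time spent in the method itself, not including child methods.
    public long OwnTicks
    {
        get { return TicksWithChildren - ChildTicks; }
    }
}

public sealed class ManualProfiler
{
    private sealed class Frame
    {
        public string Name;
        public long Start;
        public long ChildTicks;
    }

    private readonly Dictionary<string, MethodStats> _stats =
        new Dictionary<string, MethodStats>();
    private readonly Stack<Frame> _frames = new Stack<Frame>();
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    public IDictionary<string, MethodStats> Stats
    {
        get { return _stats; }
    }

    // Call at the start of a method being measured.
    public void Enter(string name)
    {
        _frames.Push(new Frame { Name = name, Start = _clock.ElapsedTicks });
    }

    // Call at the end of the method most recently entered.
    public void Exit()
    {
        Frame frame = _frames.Pop();
        long elapsed = _clock.ElapsedTicks - frame.Start;

        MethodStats stats;
        if (!_stats.TryGetValue(frame.Name, out stats))
        {
            stats = new MethodStats();
            _stats[frame.Name] = stats;
        }
        stats.HitCount++;
        stats.TicksWithChildren += elapsed;
        stats.ChildTicks += frame.ChildTicks;

        // Charge this call's time to the caller as child time.
        if (_frames.Count > 0)
        {
            _frames.Peek().ChildTicks += elapsed;
        }
    }
}
```

A method that calls Enter("Outer"), then runs an instrumented inner method three times, then calls Exit() would end up with a hit count of 1 for Outer and 3 for the inner method, with Outer's own time being its total minus the time charged by the inner calls.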

New Features

Almost all the features discussed so far have been available in previous versions of ANTS Profiler. So what about the new features in version 3?

Enhanced Graphical User Interface

The organization of windows and interface elements has been made more flexible. The GUI also includes a new docking ability to make it more like Visual Studio in terms of behavior and ease of management.

Integration with Visual Studio

When ANTS Profiler is installed, a new menu is available within Visual Studio that invokes ANTS Profiler and immediately launches the profiling wizard. Conveniently, the wizard will automatically bring up the application being developed within Visual Studio.
Figure 11: ANTS Menu within Visual Studio

Support for the ASP.NET Development Server

ANTS Profiler supports profiling of applications using the ASP.NET Development Server, code-named 'Cassini'. In a similar way to the integration feature, this is another facet of support that ANTS Profiler provides with the latest version of Visual Studio.

Fast Mode Performance Profiling

This method of profiling allows faster profiling of applications than standard. Typically, an application's performance can worsen when being profiled, due to the intrusive nature of profiling. Fast mode alleviates some of this, but this feature is only available within the ANTS Profiler Professional edition.

Profiling API

ANTS Profiler now exposes an API (Application Programming Interface) that allows profiling actions of ANTS Profiler to be accessed and controlled by your own applications.

Support for 64-bit Applications

64-bit applications are more prevalent and will be more so in the future. ANTS Profiler fully supports profiling these types of applications.

Support for All the Latest .NET 3.0 and Related Features

ANTS Profiler has been updated to fully support the Microsoft® Windows® Vista® operating system and the Internet Information Server 7 that comes with Vista. In addition, there is support for the latest series of .NET features, such as .NET 3.0 Framework, Windows Communication Foundation, Windows Workflow Foundation, Windows Presentation Foundation, and XBAP technologies.

Viewing and Exporting Profiling Results After the Trial Period Has Expired

One of the more interesting features of the latest version of ANTS Profiler is the ability to view and export profiling results after the expiry of the trial version period. Clearly, this is a show of support from Red Gate, effectively acknowledging that some people will utilize the trial version of the software to profile their applications and could then go on to purchase a copy. And because Red Gate knows that these people will find the software so useful for their requirements, Red Gate doesn't want to hamper profiling efforts just because you used the trial version. At least that's my interpretation anyway....

Books on Performance profiling:
Practical Performance Profiling: Improving the Efficiency of .NET Code : http://www.simple-talk.com/books/.net-books/practical-performance-profiling-improving-the-efficiency-of-.net-code/




Creating a Custom .NET Profiler

http://www.codeproject.com/Articles/15410/Creating-a-Custom-NET-Profiler