Analyze MPI performance, speed up parallel application runs, locate hotspots and bottlenecks, and compare trace files graphically with extensively detailed analysis and aligned timelines.
Why Intel® Trace Analyzer and Collector 7.1?
- Supported on Linux* and Microsoft* Windows* Compute Cluster Server 2003
- Intuitive, customizable, full-color GUI with many drill-down view options
- Highly scalable, with low overhead and efficient memory usage
- Easy run-time loading, or instrumentation of an MPI application executable
- MPI Correctness Checking Library detects many types of errors in communication
- Integrated online help
- Easy installation and full documentation
- Full tracing and/or light-weight statistics gathering
Many features, many options, major performance improvements
- PIN-based binary instrumentation
- Runtime behavior displayed by function, process, thread, timeline, cluster, or node
- Multiple types of filtering (functions, processes, messages) and aggregation
- Performance counter data can be recorded and displayed as a timeline
- Trace data is cached to reduce runtime overhead and memory consumption
- Event-based tracing of multi-threaded MPI applications as well as non-MPI applications
- Fail-safe tracing
- Support for MPI-1, SHMEM, MPI-IO, ROMIO
- Distributed memory checking with the MPI Correctness Checking library (Linux)
Trace Collector
- Automated MPI tracing and MPI correctness checking
- Generic distributed (non-MPI) and single-process tracing
- Thread-level tracing, with traces created even if the application crashes
- HPM data collection (PAPI, rusage, OS counters)
- Configurable tracefile parameters
- Feature enabling/disabling
- Tuning parameters
- Distributed memory checking with Valgrind
- Binary runtime instrumentation
- Compiler instrumentation (see the sketch after this list):
  1. icc/ifort/icpc -tcollect
  2. gcc/g++ -finstrument-functions
- API: source code instrumentation (counter, function, message, and collective operation logging)
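The following minimal MPI program is a sketch of how the compiler-instrumentation routes above might be used. The compile lines in the leading comment mirror the -tcollect and -finstrument-functions options listed above; the wrapper names, include paths, and link options needed to build an MPI program vary by installation, and the file name ping.c is only illustrative.

/*
 * Minimal MPI program illustrating the compiler-instrumentation routes
 * listed above. The compile lines below are illustrative only; exact
 * wrapper names and link options depend on the local MPI installation.
 *
 *   icc -tcollect              -g ping.c -o ping   (Intel compilers)
 *   gcc -finstrument-functions -g ping.c -o ping   (GNU compilers)
 *
 * With -tcollect, function entry/exit events are recorded in the trace
 * alongside the MPI calls below.
 */
#include <mpi.h>
#include <stdio.h>

static void do_work(int rank)
{
    /* An ordinary function: compiler instrumentation records its entry
       and exit so it shows up in the timeline and function profiles. */
    printf("rank %d working\n", rank);
}

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);                 /* MPI calls are traced automatically */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    do_work(rank);

    if (size > 1) {
        if (rank == 0) {
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();                         /* trace data is written out here */
    return 0;
}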
Trace Analyzer
- Event, Quantitative, Qualitative, and Counter Timelines
- Flexible message and collective operation profiles
- Function Profile (call graph, call tree, flat, and load balance)
- Detailed comparison of two traces
- Multi-level source code visualization with a full text browser
- Flexible and powerful event tagging and filtering
- Hierarchical grouping and aggregation of data across functions or processes
- Large set of configuration parameters per chart
- Export of profiling data as text; export of charts to graphics formats or a printer
- Command-line interface
MPI Checking
Included in the Intel Trace Analyzer and Collector is a unique MPI correctness checker that detects deadlocks, data corruption, and errors in MPI parameters, data types, buffers, communicators, point-to-point messages, and collective operations. Because checks are performed at run time and errors are reported as they are detected, the debugging process is greatly expedited. The correctness checker also supports debugger breakpoints for in-place analysis, yet its footprint is small enough to allow use during production runs.
The true benefit of the Trace Analyzer and Collector correctness checker is its potential to scale to extremely large systems and its ability to detect errors even among a large number of processes. The checker can be configured through profiles (implementation-specific settings) and detects deadlocks regardless of fabric type.
By tracking data types and wrapping MPI calls, the checker reuses the request and communicator handling of the trace collector (the checking library is compiled from the source code of the performance data collection library). The Analyzer can unwind the call stack extremely rapidly and use debug information to map instruction addresses to source code, with or without frame pointers.
With both command-line and GUI interfaces, the user can set up batch runs or debug interactively. The timeline view shows actual function calls and process interactions, highlighting excessive delays and errors that stem from improper execution ordering.
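As a concrete illustration of the kind of defect the correctness checker reports at run time, the minimal sketch below shows a classic two-rank deadlock in which both ranks post a blocking receive before their send. The program is an illustrative example, not tool output, and assumes it is launched with exactly two ranks.

/* Classic send/recv ordering deadlock: both ranks block in MPI_Recv and
 * never reach MPI_Send. A run-time correctness check of the kind described
 * above can report the deadlock as soon as it is detected, instead of the
 * job hanging silently. (Illustrative example only.)
 */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, other, in = 0, out = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                /* assumes exactly two ranks */

    /* Both ranks wait for a message that the peer has not sent yet. */
    MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}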
Instrumentation and Tracing
The Intel Trace Analyzer and Collector specializes in low-intrusion binary instrumentation (for the IA-32 and Intel® 64 architectures). It can create and add this instrumentation to existing statically and dynamically linked executables, allowing automatic monitoring of MPI calls as well as function entry and exit points. This includes the capability of tracing and recording performance data from parallel threads in C, C++, and Fortran.
The Intel Trace Analyzer and Collector supports both MPI applications and distributed non-MPI applications in C, C++, and Fortran. For applications running with the Intel® MPI Library, this includes tracing of internal MPI states.
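To illustrate what function entry/exit instrumentation produces, the sketch below uses the generic gcc/g++ -finstrument-functions mechanism listed earlier. The __cyg_profile_func_enter/__cyg_profile_func_exit hook names are standard GCC conventions, not the Trace Collector's internal interface, and the fprintf calls merely stand in for the event recording a tracing tool performs.

/* Generic illustration of compiler-driven entry/exit instrumentation.
 * Build (illustrative): gcc -finstrument-functions demo.c -o demo
 * The compiler emits a call to these hooks at every function entry and
 * exit; a tracing tool would record a timestamped event here instead of
 * printing. Not the Trace Collector's internal code.
 */
#include <stdio.h>

void __cyg_profile_func_enter(void *fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *fn, void *call_site)
    __attribute__((no_instrument_function));

void __cyg_profile_func_enter(void *fn, void *call_site)
{
    /* "Entered function at address fn, called from call_site." */
    fprintf(stderr, "enter %p (from %p)\n", fn, call_site);
}

void __cyg_profile_func_exit(void *fn, void *call_site)
{
    /* "Left function at address fn." */
    fprintf(stderr, "exit  %p\n", fn);
}

static int square(int x) { return x * x; }   /* gets instrumented */

int main(void)
{
    return square(3) == 9 ? 0 : 1;
}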
Interface and Displays
Timeline Views and Parallelism Display
- Displays concurrent behavior of parallel applications
- Displays application activities, event source code locations, and message passing along the time axis (see Figure 1)

Figure 1. Timeline Views and Parallel Displays
Advanced GUI
- Manages displays and statistics with a fast interface
- Uses point-and-zoom on objects for enhanced detail, with context-sensitive sub-menus
- Couples displays to allow focused analysis
- Supplies automatic updates of recomputed statistics
- Provides timeline displays, call graphs, and performance profiles for function groups and communication in a specific phase of parallel execution
Display Scalability
- Navigates through the levels of abstraction in trace data: cluster, node, process, thread, and function hierarchies (classes)
Detailed and Aggregate Views
- Examines aspects of application runtime behavior, grouped by functions or processes
- Easily identifies the amount of time spent, for example, in MPI communication
- Easily shows the performance differences between two program runs (see Figure 2)

Figure 2. New comparison displays for comparing two trace files
Ease of Use
- Offers a user-friendly application programming interface (API) to control tracing or record user events
- Adds versatile recording and analysis of counter data (see Figure 3)

Figure 3. New counter timeline display
Metrics Tracking
Communication Statistics
- Displays communication patterns of parallel applications
- Displays metrics for an arbitrary time interval
- Tracks, through comparison, the performance gain from an algorithm change
- Figure 4 shows the same algorithm with synchronous communication (left) and asynchronous communication (right), with communication overhead shown in red

Figure 4. Synchronous and Asynchronous Communication
Execution Statistics
- Provides subroutine execution metrics down to the level of call-tree characteristics
- Through binary, compiler-driven, and source-code instrumentation, it offers very detailed analysis beyond MPI
Profiling Library
- Records distributed, event-based trace data
Statistics Readability
- Examines aspects of application runtime behavior, grouped by functions or processes
- Logs information for function calls, sent messages, and collective operations