Intel® Software Development Products for Intel® Platforms and Technologies
Intel® VTune™ Performance Analyzer 9.1 For Linux*
Overview
Intel® VTune™ Performance Analyzer 9.1 For Linux*

Is your application too fast? I didn't think so.

Intel® VTune™ Performance Analyzer for Linux* is a fully Linux-based solution indispensable for making your software run its fastest on single and multi-core systems. It analyzes applications without recompilation or linking on handheld through supercomputer systems. It is robust with large applications (over 1 GB of source code¹) and multi-core, multiprocessor, and NUMA systems using the latest Intel® processors.

Easy to Use
VTune analyzer makes application performance tuning easier with a graphical user interface (GUI) based on the Eclipse* development environment.‡


Many developers want to maximize application performance. VTune analyzer gives the developer a view of what's happening as the application is running. It identifies areas that take an inordinate amount of processor time. It also helps identify critical paths in an application where adjustments have maximum benefit. Without VTune analyzer, performance tuning is a guessing game.

Finding Your Bottleneck Is Easier Than Ever Before
Complete one simple dialog box to get a list of the top five time-consuming functions.


It's fast and easy to find your performance bottlenecks with a list of the most active functions. Click on a function name to display the source and show what is taking all the time.
See the Answers to Your Source
Source and disassembly views show you exactly which lines of code are taking the most time.

Quickly find the data you need.
Click an icon to:

1. View source (shown)
2. View mixed source and assembly
3. View assembly
4. Go to next function
5. Go to hottest line for the selected event
6. Go to the next hottest line for the selected event
7. View compiler tuning advice


Find the Critical Path Using Call Graph Profiling
Call Graph determines calling sequences and graphically displays the critical path. It also shows you the context of the bottleneck. To be effective, you often need to know not only where the application is spending its time, but how it got there.

View the critical path in red
Quickly locate the critical path and navigate the profiling results easily using both a table and graph view. Click a table entry to highlight the function in the graph, or click the graph to find the table entry.
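
As a rough illustration of what "critical path" means, the Python sketch below finds the most expensive root-to-leaf call chain in a small call graph. It is not VTune analyzer's algorithm or API. The function names, timings, and the `critical_path` helper are hypothetical, and the sketch assumes an acyclic call graph with a known self time per function.

```python
# Hypothetical profile data: self time per function (ms) and caller -> callees edges.
SELF_TIME_MS = {
    "main": 5,
    "parse_input": 40,
    "simulate": 50,
    "update_grid": 120,
    "stencil_kernel": 600,
    "exchange_halo": 200,
    "write_output": 60,
}
CALLS = {
    "main": ["parse_input", "simulate", "write_output"],
    "simulate": ["update_grid", "exchange_halo"],
    "update_grid": ["stencil_kernel"],
}


def critical_path(self_time, calls, root):
    """Return (total_ms, path) for the most expensive root-to-leaf call chain.

    Assumes the call graph is acyclic (no recursion).
    """
    memo = {}

    def best(fn):
        if fn not in memo:
            # Pick the callee whose own best chain costs the most, if any.
            tails = [best(callee) for callee in calls.get(fn, [])]
            tail_ms, tail_path = max(tails, key=lambda t: t[0], default=(0, ()))
            memo[fn] = (self_time[fn] + tail_ms, (fn,) + tail_path)
        return memo[fn]

    return best(root)


total_ms, path = critical_path(SELF_TIME_MS, CALLS, "main")
print(" -> ".join(path), f"({total_ms} ms)")
```

In this made-up data the chain through `simulate` and `update_grid` dominates, which is the kind of context a flat list of hot functions alone does not show.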

Unlike other offerings, VTune analyzer provides both sampling and call graph analysis. Even if you plan to do mostly call graph analysis, running sampling first lets you identify the modules that need it so you only pay Call Graph's larger overhead for the modules that need to be analyzed. This can be vital on large projects. Sampling is great for analysis of "loopy" code. Call Graph is usually better for "branchy" code. You need both to get the job done right.

Low Overhead Sampling Profiling

Event-based sampling finds your bottleneck with very low overhead (typically less than 5 percent). Identify problems such as cache misses, branch mis-predictions, and bus bandwidth. Because it is system-wide, event-based sampling can be used to tune libraries and drivers as well as application programs.

Filter the data to find your answers
The table and bar chart views of sampling results let you filter your data in many different ways to find what you need. View by thread (shown) for load balancing.
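
To make the idea concrete, here is a minimal Python sketch of how sampling results might be summarized: counting samples per function to rank hotspots, and per thread to check load balance. It is an illustration only, not VTune analyzer's implementation or data format. The sample records, function names, and helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical sample records: (thread id, function running when the sample fired).
SAMPLES = (
    [(1, "stencil_kernel")] * 50
    + [(2, "stencil_kernel")] * 20
    + [(1, "exchange_halo")] * 15
    + [(2, "memcpy")] * 10
    + [(2, "parse_input")] * 5
)


def top_functions(samples, n=5):
    """Rank functions by sample count; return (name, hits, percent of all samples)."""
    counts = Counter(fn for _tid, fn in samples)
    total = sum(counts.values())
    return [(fn, hits, 100.0 * hits / total) for fn, hits in counts.most_common(n)]


def samples_per_thread(samples):
    """Count samples per thread, a rough view of how evenly work is spread."""
    return dict(Counter(tid for tid, _fn in samples))


for name, hits, percent in top_functions(SAMPLES):
    print(f"{name:16s} {hits:4d} samples  {percent:5.1f}%")
print(samples_per_thread(SAMPLES))
```

A large gap between per-thread counts, as in this invented data, would suggest that one thread is doing most of the work.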

 

Features and Benefits

All Architectures:

Low Overhead
Accurately identify where the program spends time. Sampling is system wide with negligible overhead (typically less than 5 percent).

Find the Critical Path
Determine function calling sequences and find the critical path using Call Graph.

No Recompile Required
Unlike traditional instrumented profilers that make you recompile or modify your build script, just use your production executables.

Compatibility
VTune™ Performance Analyzer supports the latest Intel® processors (Intel® 64 processors, Intel® Itanium® processors, multi-core processors...) and a wide variety of Linux* distributions.

Programming Language and Compiler Independent
VTune analyzer supports all compilers that follow industry standards (ELF, STABS, DWARF).

Mixed Java* and Native Code
Unlike Java*-only analyzers, VTune analyzer tunes mixed Java and native code².

Minimal Memory Footprint
Remote profiling lets you monitor a production server under workload. The remote agent minimizes the performance impact on the target system while the local user interface makes it easy to review the results.

Command Line Capability
Automate batch operations.

Large Applications Welcome
VTune analyzer is a robust solution even with large executables³. If you have a large application with hundreds of thousands of functions, bring it to VTune analyzer.

Listen to the Compiler's Advice
An optimizing compiler can do a lot better with just a few tips from you. We've integrated the Intel® Compilers with VTune analyzer to make this easy and very effective.


Large Enterprise and HPC Systems:

Minimize Bus Traffic in Non-uniform Memory Architecture (NUMA) Systems
VTune analyzer stores sampling data in local CPU memory. This is critical to avoid saturating the interconnect bus and slowing the system under test.

Designed for High Performance Computing
Large High Performance Computing (HPC) systems have unique requirements supported by VTune analyzer.

•  Sampling is supported on systems with up to 4096 processors⁴, with local buffering per CPU for maximum accuracy and minimum inter-node contention. For maximum accuracy and to minimize the amount of data collected, we recommend selecting a maximum of 128 CPUs for simultaneous data collection.

•  Multiple users can share a large system for concurrent Call Graph performance analysis.


Intel® Itanium® Architecture:

Eclipse* Based Graphical User Interface
The easy-to-use Eclipse* based graphical user interface in VTune analyzer is now native on Itanium® architecture.

Instruction Filtered Events Pinpoint Bottleneck Locations
Isolate problems like poor pre-fetch and poor memory alignment. Sometimes just choosing an event is not selective enough, because the event can occur both at critical and non-critical times. On Intel Itanium architecture, instruction filtering allows you to collect events only when they occur with a specified op-code.

Minimize Data Collection with CPU Selection
Collect only the data you need. CPU selection lets you control exactly where data is collected: from all the processors, only those in your allocation, or only the processors you specify. This greatly reduces the amount of data you need to collect.

 

What's New in This Release

Profile JavaScript* and Flash* Code

New profiling support in emerging internet browsers and other script-oriented products allows developers working with new JavaScript* or Flash* JIT technologies to analyze their code. Use the VTune analyzer to optimize for scalable performance of this code on Windows* and Linux* to ensure the best end user experience with your application. VTune analyzer supports profiling JIT-compiled code when browser vendors add the required support. This enables deep performance analysis of these additional languages:

•  JavaScript / AJAX

•  Flash (ActionScript)

Check with your browser supplier for details on when their browser will enable support.

 

Profile Dynamically Generated Code

Many applications today emit their own runtime-generated or just-in-time (JIT) code. New profiling APIs in the VTune analyzer enable performance analysis of dynamic code and allow you to view annotated source code directly from the analysis results.

 

Access to VTune Analyzer's Open Data Model

VTune analyzer can now support many different software platforms with performance sampling analysis.  Use the new open data model APIs to combine the VTune analyzer's powerful GUI on Windows* or Linux* with data from your custom collector to analyze any application on a wide range of platforms.

•  Collect data on operating systems not directly supported by the VTune analyzer.

•  Supported Windows* Operating Systems

•  Supported Linux* Distributions

•  Collect data on embedded platforms based on Intel hardware.

 

Access to the latest Experimental Technologies

VTune analyzer users have access to the latest experimental performance tuning technologies Intel has to offer. Visit whatif.intel.com and look for Intel® Performance Tuning Utility and Intel® Platform Modeling with Machine Learning. These tools offer a number of capabilities, including:

•  Statistical Call Tree - profiles with low overhead to detect where time is spent in your application

•  Basic Block Analysis - displays hotspots with basic block granularity and generates a control flow graph for advanced analysis of application, even without the source code

•  Data Access Profiling - identifies memory hotspots and relates them to code hotspots

•  Dependency Plots - visualize the relationships between metrics

•  Event Rank - view the list of best predictors of performance using machine learning

VTune Analyzer Displays What the Compiler Knows
An optimizing compiler can do a lot better with just a few tips from you. We've integrated the Intel® compilers with Intel® VTune™ Performance Analyzer to make this easy.

The Intel compiler optimization reports contain a wealth of information to make your application faster. VTune analyzer locates your critical, time consuming "hot spot" and filters the compiler optimization report to show only the lines that apply to the code selected. Now you can see what the compiler optimized and choose pragmas to further improve performance.

For example, a single click tells you that the compiler didn't optimize your critical loop because of an assumed vector dependency. If you know there is no dependency, you can insert a pragma telling the compiler to ignore it, which allows the optimization and makes the loop faster.

Currently, optimization report filtering works exclusively with Intel® C++ and Fortran Compilers 9.1 and higher, but it utilizes a standard format open to other compilers.

 

 
After you find your hotspot using Intel® VTune™ analyzer, select the hot lines of code in the source view and click an icon to see the compiler's tuning advice.

New, More Effective Tuning Methodology Supported
Pipeline stall accounting radically improves tuning by focusing the user on the instances of possible issues (like cache misses) that actually end up mattering. Intel® Core™2 Duo and Core™2 Quad processors have greatly enhanced performance analysis capabilities. These processors support more events, higher precision in event location correlation, and new pipeline stall accounting.

New Events for Tuning Multi-Core Intel® Processors
New events measure parallelism, core sharing of the bus and cache, and modified data sharing by threads. These identify opportunities to improve threading, tune multi-core sharing of the bus and cache, and optimize cache-line usage.

New Linux Distributions!
Check out the details on the latest supported distributions.

Faster Call Graph - Selective Instrumentation for Java* and Native Code
Now you can selectively instrument Java* or native code to improve runtime performance. By gathering data only on the modules being tuned, overhead is reduced and runtime is improved.
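
One way to decide which modules are worth instrumenting is to start from an earlier sampling run and keep only the modules that account for a meaningful share of the samples. The Python sketch below shows that selection step under stated assumptions: the module names, sample counts, the 10 percent threshold, and the `modules_to_instrument` helper are hypothetical and are not part of VTune analyzer.

```python
# Hypothetical per-module sample counts from an earlier sampling run.
MODULE_SAMPLES = {
    "libsolver.so": 700,
    "app": 200,
    "libc.so.6": 60,
    "libz.so.1": 40,
}


def modules_to_instrument(sample_counts, min_share=0.10):
    """Return modules holding at least min_share of all samples, hottest first."""
    total = sum(sample_counts.values())
    if total == 0:
        return []
    hot = [m for m, hits in sample_counts.items() if hits / total >= min_share]
    return sorted(hot, key=lambda m: sample_counts[m], reverse=True)


print(modules_to_instrument(MODULE_SAMPLES))
```

With this invented data only two of the four modules pass the threshold, so call graph overhead would be paid only where tuning is likely to matter.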

Supports the Latest Intel® Processors
Supports the latest Intel® quad-core processors (see Supported Processors for details).


Powerful User Interface Improvements

Tune Inline Functions
Tune your inlined code with instance-specific event counts on the source and assembly views. Performance can vary by context, i.e., by where a function is called. VTune analyzer provides event data for each occurrence of an inlined function.

Supports Intel and GNU compilers:
  ICC 8.1 or higher
  GCC 3.2 or higher **
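
To illustrate why per-instance counts matter, the following Python sketch groups event counts for an inlined function by the place it was inlined, rather than lumping them together. This is only a conceptual sketch, not VTune analyzer's data model. The record layout, function names, line numbers, and the `counts_per_inline_instance` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (inlined function, caller, caller source line, event count).
RECORDS = [
    ("clamp", "blur", 42, 900),
    ("clamp", "blur", 42, 300),
    ("clamp", "resize", 17, 50),
]


def counts_per_inline_instance(records):
    """Sum event counts separately for each place a function was inlined."""
    totals = defaultdict(int)
    for fn, caller, line, count in records:
        totals[(fn, caller, line)] += count
    return dict(totals)


for (fn, caller, line), count in counts_per_inline_instance(RECORDS).items():
    print(f"{fn} inlined in {caller} at line {line}: {count} events")
```

Here the same small function is expensive in one caller and cheap in another, which a single combined count would hide.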

One-click Hotspot Navigation
With event counts next to each source line, you can easily see how hot each line is. But in a large source file, how do you find the hottest spot? Or jump to the next hottest line which may be a thousand lines away? Easy, just select the event you want to navigate by clicking in its column, and then click the Min, Max, Next and Previous icons to quickly browse through your hot spots.

Branch and Call Navigation Made Easy
Instantly follow a branch in disassembly by clicking a menu. No more hunting for the destination, just choose "Go to target" to scroll the display.

Create Meaningful Event Labels
Name your custom events using event aliasing. When you create a custom event, it is often difficult to remember exactly what you did. Event aliasing creates a custom label that is meaningful to you. VTune analyzer then uses this label in all event displays.

Click the Max icon to scroll to the hottest line in the current source view. The Next, Previous and Min buttons quickly take you through the list of hotspots. To navigate by a different event, just click the desired column.

 

Large Enterprise and HPC Systems

Minimize Bus Traffic in Non-uniform Memory Architecture (NUMA) Systems
VTune analyzer stores sampling data in local CPU memory. This is critical to avoid saturating the interconnect bus and slowing the system under test.

 

New for Itanium® Architecture!

Eclipse* Based Graphical User Interface
VTune analyzer's easy to use Eclipse* based graphical user interface is now native on Itanium® architecture.

Instruction Filtered Events Pinpoint Bottleneck Location
Itanium® architecture exclusive!
Isolate problems like poor pre-fetch and poor memory alignment. Sometimes, just choosing an event is not selective enough, because the event can occur both at critical and non-critical times. On Intel® Itanium® architecture, instruction filtering allows you to collect events only when they occur with a specified op-code.

Minimize Data Collection with CPU Selection
Itanium architecture exclusive!
Collect only the data you need. CPU selection lets you control exactly where data is collected: from all the processors, only those in your allocation, or only the processors you specify. This greatly reduces the amount of data you need to collect.

Eclipse* based graphical user interface is now native on Itanium® architecture.



   

Note: Features listed as "New" are new since the last major release 8.0. Some have been previewed in minor updates and beta releases.

**GCC uses the older DWARF2 format. In some cases there is not enough information to associate the inlined instance with the correct caller source line. In this case VTune analyzer will guess and associate the contribution of the inlined instance with the nearest caller source line. This may create an event mismatch between Source and Function Views. The newer DWARF3 format used by ICC 8.1 and higher eliminates this problem by unambiguously associating inlined instances with the caller source line. GCC 4.0.2 may partially support DWARF3, but it is not complete enough to help with this problem.

Performance

Powerful User Interface Improvements

VTune Performance Analyzer for Linux offers you a view into the Linux application, exposing bottlenecks and hotspots in the code, allowing you to easily pinpoint areas in the code that can be improved. Your Linux application can gain outstanding performance, providing a competitive advantage. Improve application performance on Intel architectures with these optimization features:
•  Event-Based, System-Wide Sampling
•  Call Graph Profiling
•  Rich, easy-to-use Graphical User Interface
•  Hotspot Analysis
•  Memory-Intensive Application Support
•  Static Analysis and Disassembly
•  VTune Performance Analyzer Driver Kit

Compatibility

Supports Intel® Architecture-Compatible Processors

Use VTune Performance Analyzer for Linux to speed applications for a variety of supported environments:

•  Fully Linux-based solution with Eclipse integration for IA-32
•  Support for the latest Intel® processors, including those with Intel® 64 architecture
•  Multiple programming languages
•  Java application profiling
•  Command-line capability
•  Support for Linux 2.6 kernels and MontaVista Linux* Carrier Grade Edition

System Requirements
 

Usage Models

Intel® VTune™ Performance Analyzer for Linux supports two usage models:

•  Single System: Analyze the performance of software running locally on the same system as VTune analyzer.

•  Two Systems (Host and Remote Target): A host computer running VTune analyzer can control event-based sampling and call graph data analysis on a remote target machine running a remote data collector (RDC). The connection between the target and host computers is TCP/IP. This is useful for analysis of embedded systems or to lower the measurement overhead on the target system by using fewer resources.

Hardware Requirements

Memory
•  Integrated Eclipse* Environment, Single System or Host Only: 512 MB of RAM
•  Command Line or Target Only: 256 MB of RAM
•  Your application: When running your application with the Call Graph collector, it will require more memory than usual.

Disk Space
•  Integrated Eclipse Environment, Single System or Host Only: 1 GB during installation, 500 MB after installation
•  Command Line or Target Only: 600 MB during installation, 100 MB after installation
•  Data Space: This varies greatly, but for a single or dual processor system, an additional 1 GB on the target system is a good starting point.
•  Swap Space: At least double the minimum RAM requirements.

Supported Processors

Processor Requirements for Host (Eclipse* or command line interface)

•  IA-32 architecture-based processor - Intel® Pentium® 4 performance level or better
•  Intel® 64 architecture-based processor or equivalent
•  Intel® Itanium® 2 processor
•  AMD Athlon* or Opteron* processor

Processors Supported by Data Collector (Target)

Note: We are constantly adding new processors. Be sure you have the latest software - check for updates.

Intel® Core™ processors
•  Intel® Core™2 Extreme Processor
•  Intel® Core™2 Quad processor
•  Intel® Core™2 Duo processor
•  Intel® Core™2 Solo processor

Intel® Pentium® processors
•  Intel® Pentium® 4 processor
•  Intel® Pentium® 4 processor Extreme Edition
•  Mobile Intel® Pentium® 4 Processor - M
•  Intel® Pentium® D processor
•  Intel® Pentium® D processor 900 Sequence
•  Intel® Pentium® processor Extreme Edition
•  Intel® Pentium® M processor

Intel® Celeron® processors
•  Intel® Celeron® processor
•  Mobile Intel® Celeron® processor
•  Intel® Celeron® M processor
•  Intel® Celeron® D processor

Xeon® processors
•  Quad-Core Intel® Xeon® processor 5300 Series
•  Dual-Core Intel® Xeon® processor 5100 Series
•  Dual-Core Intel® Xeon® processor 5000 Sequence
•  Dual-Core Intel® Xeon® processor LV
•  Intel® Xeon® processor MP
•  Dual-Core Intel® Xeon® processor 7100 Series
•  Dual-Core Intel® Xeon® processor 7000 Sequence
•  Intel® Xeon® processor

Itanium® 2 processors
•  Dual-Core Intel® Itanium® 2 processor 9000 Sequence
•  Intel® Itanium® 2 processor
•  Low Voltage Intel® Itanium® 2 processor


Supported Linux Distributions (Host and Target)

Please check the release notes for details on the exact OS update and kernel version supported by the latest software. The information on this page is for software version 9.0. Software updates may support newer distributions, so please check for updates.

Operating System                            IA-32   Intel® 64 Architecture   Itanium® Architecture
Red Flag Linux 5.0 (DataCenter)             Yes     Yes                      Yes
Red Hat Enterprise Linux* 3.0               Yes     Yes                      Yes
Red Hat Enterprise Linux 4.0                Yes     Yes                      Yes
Red Hat Fedora* Core 5                      Yes     Yes                      Yes
SGI Pro Pack* 4.0                           Yes     Yes                      -
SGI Pro Pack 5.0                            -       -                        Yes
SuSE Linux 10                               -       -                        Yes
SuSE Linux Enterprise Server* (SLES) 9.0    Yes     Yes                      -
SuSE Linux Enterprise Server (SLES) 10.0    Yes     Yes                      Yes

Notes:
If you are not using a default kernel on the supported distributions listed in the table above, use the included driver kit to compile drivers for your kernel.


 

Software Requirements

Host Software Requirements for Graphical User Interface Integrated with Eclipse*

•  Eclipse* Development Environment: Eclipse 3.2.1
•  Supported Java* Development Kit (JDK): BEA JRockit* 5.0

VTune™ Analyzer Has Been Tested for Profiling Under the Following JDKs*

IA-32 architecture
•  BEA JRockit* 1.4.2 and 5.0
•  Sun J2SE* 5.0
•  IBM JDK 1.4.2 and 1.5

Intel® 64 architecture
•  BEA JRockit 5.0
•  Sun J2SE 5.0

Itanium® architecture
•  BEA JRockit 1.4.2 and 5.0
•  Sun J2SE 5.0
•  IBM JDK 1.4.2


Java Tuning Support
VTune™ Performance Analyzer for Linux* supports tuning Java 1.5 and JVMTI.

Web Browser Requirements
A modern, HTML 4.0-compliant web browser is required for reading documentation.

Note: All of the information above is for version 9.0 of VTune Performance Analyzer for Linux and is subject to change. While we make every effort to make sure this information is correct, the most current and accurate details are found in the release notes that ship with the product. Be sure you have the latest software - check for updates.

Due to the unique requirements for supporting large systems, if the software will be used on systems with more than 128 cores please contact us before purchase to make special arrangements.


Intel® Premier Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides access to Intel® Premier Support and all product releases during that time. You receive online access to our expert engineering support staff and additional technical documentation.


¹ Large applications are welcome! For example, the source distribution tree of one large application including the tools and predefined libraries required to do a build (but not the build itself) is about 1.85 GB with over 62,700 files. The execution tree alone is about 870 MB with over 8,200 files.

² Sampling only.

³ Large applications are welcome! For example, the source distribution tree of one large application including the tools and predefined libraries required to do a build (but not the build itself) is about 1.85 GB with over 62,700 files. The execution tree alone is about 870 MB with over 8,200 files.

⁴ Due to the unique requirements for supporting large systems, if the software will be used on systems with more than 128 cores please contact us before purchase to make special arrangements.

‡ Technical support for Eclipse is not provided by Intel. For more information on Eclipse, please visit the Eclipse Foundation* Web site.


Intel provides both the tools and support to enhance the performance, functionality and efficiency of software applications.
Compatible with leading Windows* and Linux* development environments, Intel® Software Development Products are the fastest and easiest way to take advantage of the latest features of Intel processors. Intel Software Development Products are designed for use in the full development cycle, and include Intel® Performance Libraries, Intel® Compilers (C++, Fortran for Windows, Linux, and Mac OS*), Intel® VTune™ Analyzer, Intel® Threading Tools and Intel® Cluster Tools.
The Intel® Premier Customer Support Web site provides expert technical support for all Intel software products, product updates and related downloads. For additional product information visit: www.intel.com/software/products.
Intel, the Intel logo, Itanium, Pentium, Intel Centrino, Intel Xeon, Intel XScale, VTune, Celeron, Intel NetBurst, and MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other brands and names may be claimed as the property of others. Visit our Legal Information Web site for more information.
Copyright © 2006, Intel Corporation