perf is a performance analysis tool in Linux. It provides a performance analysis architecture to applications for performance statistics collection based on tracepoint and kernel counters. You can use these performance statistics to analyze performance issues and locate performance bottlenecks.
perf record
You can use perf record to collect sampling information and record it in a data file.
CLI mode:
perf record [-e <EVENT> | --event=EVENT] [-a] <command> [<options>]
Parameters:
Valid values of EVENT:
- u: specifies to count performance events triggered in the user space.
- k: specifies to count performance events triggered in the kernel state.
- h: specifies to count performance events triggered by a management program.
- G: specifies to count performance events triggered by a client on a KVM virtual machine.
- H: specifies to count performance events triggered by a host on a KVM virtual machine.
-a: specifies to collect statistics from all CPUs.
Valid values of options:
- -F: specifies the number of sampling counts per second. Value range: [1,+ ∞). Default value: 4000. We recommend that you set the value to 99.
- -p: specifies the PID of the process that needs to be analyzed.
- -g: specifies the call stack used for recording.
- sleep: specifies the duration, in seconds. Value range: [1,+ ∞).
perf stat
You can use perf stat to analyze the performance profile of a specified program.
CLI mode:
perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
Parameters:
Valid values of EVENT:
- u: specifies to count performance events triggered in the user space.
- k: specifies to count performance events triggered in the kernel state.
- h: specifies to count performance events triggered by a management program.
- G: specifies to count performance events triggered by a client on a KVM virtual machine.
- H: specifies to count performance events triggered by a host on a KVM virtual machine.
-a: specifies to collect statistics from all CPUs.
Options:
- -p
: specifies the process ID. - -v: displays more data information.
- -r: returns the average value.
- -C: collects performance statistics from a specified CPU.
- -p
You can run the perf stat -p 120228 command to display the performance profile of the program with process ID 120228. Example code:
$perf stat -p 120228
Performance counter stats for process id '120228':
1212814.537787 task-clock (msec) # 77.290 CPUs utilized (100.00%)
13,932,119 context-switches # 0.011 M/sec (100.00%)
4,504,797 cpu-migrations # 0.004 M/sec (100.00%)
709,321 page-faults # 0.585 K/sec
2,956,658,594,250 cycles # 2.438 GHz (100.00%)
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
2,397,178,336,412 instructions # 0.81 insns per cycle (100.00%)
542,664,260,703 branches # 447.442 M/sec (100.00%)
3,346,269,384 branch-misses # 0.62% of all branches
15.691803125 seconds time elapsed
Field description:
task-clock: the time that the processor is occupied by the task, in ms.
context-switches: switches the context.
CPUs utilized: the CPU utilization. CPUs utilized = task-clock / time elapsed.
cpu-migrations: the number of processor migrations. In Linux systems, a task is migrated from one CPU to another under specific conditions to maintain load balancing among multiple processors.
page-faults: the number of page faults.
cycles: the number of processor cycles consumed. A CPU cycle refers to the smallest time unit that can be identified by the CPU.
instructions: the number of instructions executed.
insns per cycle: the average number of instructions executed per CPU cycle.
branches: the number of branch instructions encountered.
branch-misses: the number of branch instructions that were mis-predicted.
time elapsed: the execution time of the command.
perf list
You can use perf list to view the performance events supported by perf.
CLI mode:
perf list [hw | sw | cache | tracepoint | event_glob]
Parameters:
hw: specifies to list hardware events triggered by the PMU component.
sw: specifies to list software events triggered by a kernel.
cache: specifies to list hardware cache events in the CPU.
tracepoint: specifies to list events triggered by static tracepoints in the kernel.
event_glob: specifies to list events filtered by event_glob.
perf top
You can use perf top to display the functions or instructions with the top resource consumption in a specified performance event (in the CPU cycle by default). This command analyzes the popularity of functions in real time when a performance event occurs, and allows you to quickly locate high-profile functions, such as application functions, module functions, kernel functions, and even high-profile instructions.
CLI mode:
perf top [-e <EVENT> | --event=EVENT] [<options>]
Parameters:
Valid values of EVENT:
- u: specifies to count performance events triggered in the user space.
- k: specifies to count performance events triggered in the kernel state.
- h: specifies to count performance events triggered by a management program.
- G: specifies to count performance events triggered by a client on a KVM virtual machine.
- H: specifies to count performance events triggered by a host on a KVM virtual machine.
Valid values of options:
- -p
: specifies the process ID. - -K: specifies to hide kernel or module symbols.
- -U: specifies to hide user symbols.
- -d
: specifies the refresh interval of the interface. The default value is 2s because perf top reads data from the mmap memory area every 2s. - -G: obtains the call graph of the function.
- --call-graph: specifies to record all information required for DWARF-based stack backtrace. This option is available only after you use the -g option to enable call-graph recording during compilation.
- -p
You can run the perf top command to display the functions or instructions with the top resource consumption in a specified performance event. Sample code:
$perf top
Samples: 366K of event 'cycles', Event count (approx.): 23340587417
Overhead Shared Object Symbol
10.40% [vdso] [.] __vdso_gettimeofday
4.70% observer [.] oceanbase::storage::ObPartitionGroupIterator::get_next
4.23% observer [.] oceanbase::common::ObLatch::unlock
4.02% observer [.] oceanbase::common::ObLatch::rdlock
3.08% observer [.] oceanbase::common::ObLatch::wrlock
2.26% observer [.] oceanbase::common::ObLinearHashMap<oceanbase::clog::ObPartitionLogInfo, oceanbase::clog::ObLogCursorExt, ocea
1.97% observer [.] oceanbase::memtable::ObLockWaitMgr::check_timeout
1.82% observer [.] oceanbase::common::hash::ObPreAllocLinkHashMap<oceanbase::storage::ObITable::TableKey, oceanbase::storage::Ob
1.68% observer [.] oceanbase::omt::ObThWorker::worker
Field description:
Overhead: the percentage of performance events caused by symbols, in terms of occupied CPU cycles by default.
Shared Object: the dynamic shared object (DSO) where the symbol is located. It can be an application, kernel, dynamic link library, or module.
Symbol: the DSO type and symbol name. For the DSO type, [ . ] indicates that the symbol belongs to an ELF file in user mode, including executable files and dynamic link libraries, and [ k ] indicates that this symbol belongs to a kernel or module. For the symbol name, some symbols cannot be parsed as function names and therefore can only be expressed as addresses.
vmstat
vmstat stands for Virtual Memory Statistics. It can monitor the virtual memory, processes, and CPU activities of the operating system.
CLI mode:
vmstat [-a] [delay [count]] [<options>]
Parameters:
-a: specifies to display both active and inactive memory.
delay: specifies the refresh interval.
count: specifies the number of refreshes.
options: specifies other performance metrics that can be viewed.
You can run the vmstat 10 2 command to view the virtual memory, processes, and CPU activities of the operating system. Sample code:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
28 0 0 415902272 283432 201305696 0 0 387 1054 0 0 19 3 78 0 0
38 0 0 415039424 283432 201309824 0 0 907308 1009947 211462 242965 31 2 67 0 0
The first value is the current system value, and the second value is the average value within 10 seconds.
Field description:
procs
- r: the number of runnable processes running or waiting for run time.
- b: the number of blocked processes waiting for resources, such as I/O processes.
Description
Each CPU core has a runtime queue. When the CPU provides services for processes, the processes first need to enter the runtime queue to wait for CPU time. The runtime queue includes runnable processes and blocked processes.
memory
- swpd: the amount of memory switched to the swap area, in KB.
- free: the total idle physical memory.
- buff: the size of the cached memory. The read/write cache for common block devices.
- cache: the memory cache. Files that are frequently accessed will be cached. If the value of cached is large, it indicates that the number of cached files is large and disk access is relatively small. In this case, you need to improve the file system efficiency.
- inact: the size of the inactive memory. The value is displayed when the -a option is appended.
- active: the size of the active memory. The value is displayed when the -a option is appended.
swap
- si: the amount of memory swapped in from the disk.
- so: the amount of memory swapped to the disk.
If both values are greater than 0, the system memory is insufficient.
io
- bi: the number of disk blocks read per second.
- bo: the number of disk blocks written per second.
You can evaluate system I/O performance bottlenecks based on disk metrics and the percentage of CPU time consumed for I/O waiting.
system
- in: the number of interrupts per second.
- cs: the number of context switches per second.
The larger these two values are, the more CPU time the memory consumes. Under normal circumstances, the value of cs should be smaller than that of in. If the value of cs is larger than that of in, the system may need to process a lot of I/O requests or high-intensity system calls for a long period.
cpu
- us: the percentage of CPU time consumed by user processes.
- sy: the percentage of CPU time consumed by the kernel processes.
- id: the percentage of CPU idle time.
- wa: the percentage of CPU time consumed for I/O waiting.
- st: the CPU time occupied by virtual machines.