.gear/perf-tools.spec | 83 +++++++++++++++++++++++++++++++++++++++++ .gear/rules | 3 ++ .gear/tags/list | 1 + README.md | 2 +- examples/execsnoop_example.txt | 22 +++++------ examples/funcgraph_example.txt | 3 +- examples/funcslower_example.txt | 1 + examples/functrace_example.txt | 3 +- examples/kprobe_example.txt | 5 ++- examples/opensnoop_example.txt | 7 ++-- examples/tpoint_example.txt | 5 ++- examples/uprobe_example.txt | 5 ++- execsnoop | 2 +- iolatency | 2 +- kernel/funcgraph | 31 ++++++++++----- kernel/funcslower | 27 ++++++++++---- kernel/functrace | 30 +++++++++++---- kernel/kprobe | 26 +++++++++---- man/man8/funcgraph.8 | 5 ++- man/man8/funcslower.8 | 5 ++- man/man8/functrace.8 | 5 ++- man/man8/kprobe.8 | 5 ++- man/man8/opensnoop.8 | 5 ++- man/man8/tpoint.8 | 5 ++- man/man8/uprobe.8 | 5 ++- opensnoop | 44 ++++++++++++++-------- system/tpoint | 27 ++++++++++---- user/uprobe | 27 ++++++++++---- 28 files changed, 297 insertions(+), 94 deletions(-) diff --git a/.gear/perf-tools.spec b/.gear/perf-tools.spec new file mode 100644 index 0000000..6aaa415 --- /dev/null +++ b/.gear/perf-tools.spec @@ -0,0 +1,83 @@ +Summary:A miscellaneous collection of in-development and unsupported performance analysis tools for Linux ftrace and perf_events (aka the "perf" command). +Name: perf-tools +Version: 1.0 +Release: alt2 +License: GPLv2 +Group: Monitoring +URL: https://github.com/brendangregg/perf-tools +Source0: %name-%version.tar +Patch0: %name-%version-%release.patch +BuildArch: noarch + +BuildRequires: sh4 + +%description +A miscellaneous collection of in-development and unsupported +performance analysis tools for Linux ftrace and perf_events (aka the +"perf" command). Both ftrace and perf are core Linux tracing tools, +included in the kernel source. Your system probably has ftrace +already, and perf is often just a package add (see Prerequisites). + +These tools are designed to be easy to install (fewest dependencies), +provide advanced performance observability, and be simple to use: do +one thing and do it well. This collection was created by Brendan Gregg +(author of the DTraceToolkit). + +Many of these tools employ workarounds so that functionality is +possible on existing Linux kernels. Because of this, many tools have +caveats (see man pages), and their implementation should be considered +a placeholder until future kernel features, or new tracing subsystems, +are added. + +These are intended for Linux 3.2 and newer kernels. For Linux 2.6.x, +see Warnings. + +%prep +%setup +%patch0 -p1 + +%build +%install +mkdir -p %buildroot%_man8dir %buildroot%_bindir +cp -va man/man8/*.8 %buildroot%_man8dir +for i in disk/bitesize fs/cachestat execsnoop kernel/funccount kernel/funcgraph kernel/funcslower kernel/functrace iolatency iosnoop killsnoop kernel/kprobe opensnoop misc/perf-stat-hist tools/reset-ftrace syscount net/tcpretrans system/tpoint user/uprobe;do +cp -va $i %buildroot%_bindir +done + +%files +%_bindir/bitesize +%_bindir/cachestat +%_bindir/execsnoop +%_bindir/funccount +%_bindir/funcgraph +%_bindir/funcslower +%_bindir/functrace +%_bindir/iolatency +%_bindir/iosnoop +%_bindir/killsnoop +%_bindir/kprobe +%_bindir/opensnoop +%_bindir/perf-stat-hist +%_bindir/reset-ftrace +%_bindir/syscount +%_bindir/tcpretrans +%_bindir/tpoint +%_bindir/uprobe +%_man8dir/*.8.* +%doc README.md examples deprecated + +%changelog +* Sun Dec 24 2017 Terechkov Evgenii 1.0-alt2 +- Fix "iolatency -T" (https://github.com/brendangregg/perf-tools/issues/30) + +* Fri Dec 22 2017 Terechkov Evgenii 1.0-alt1 +- v1.0-17-g98d42a2 + +* Fri Jan 29 2016 Terechkov Evgenii 0.1-alt3 +- git-20160129 + +* Thu Jul 9 2015 Terechkov Evgenii 0.1-alt2 +- git-20150709 + +* Tue Mar 24 2015 Terechkov Evgenii 0.1-alt1 +- Initial build for ALT Linux Sisyphus diff --git a/.gear/rules b/.gear/rules new file mode 100644 index 0000000..c042214 --- /dev/null +++ b/.gear/rules @@ -0,0 +1,3 @@ +tar: v@version@:. +diff: v@version@:. . +spec: .gear/perf-tools.spec diff --git a/.gear/tags/list b/.gear/tags/list new file mode 100644 index 0000000..b8b548a --- /dev/null +++ b/.gear/tags/list @@ -0,0 +1 @@ +5a511f5f775cfbc0569e6039435361cecd22dd86 v1.0 diff --git a/README.md b/README.md index 5be501d..acb7e29 100644 --- a/README.md +++ b/README.md @@ -183,7 +183,7 @@ Some of these tools are intended as short-term workarounds until more kernel cap As my main target is a fleet of Linux 3.2 servers that do not have debuginfo, these tools try not to require it. At times, this makes the tool more brittle than it needs to be, as I'm employing workarounds (that may be kernel version and platform specific) instead of using debuginfo information (which can be generic). See the man page for detailed prerequisites for each tool. -I've tried to use perf_events ("perf") where possible, since that interface has been developed for multi-user use. For various reasons I've often needed to use ftrace instead. ftrace is suprisingly powerful (thanks Steven Rostedt!), and not all of its features are exposed via perf, or in common usage. This tool collection is in some ways a demonstration of hidden Linux features using ftrace. +I've tried to use perf_events ("perf") where possible, since that interface has been developed for multi-user use. For various reasons I've often needed to use ftrace instead. ftrace is surprisingly powerful (thanks Steven Rostedt!), and not all of its features are exposed via perf, or in common usage. This tool collection is in some ways a demonstration of hidden Linux features using ftrace. Since things are changing, it's very possible you may find some tools don't work on your Linux kernel version. Some expertise and assembly will be required to fix them. diff --git a/examples/execsnoop_example.txt b/examples/execsnoop_example.txt index 3a8888b..5540111 100644 --- a/examples/execsnoop_example.txt +++ b/examples/execsnoop_example.txt @@ -62,17 +62,17 @@ Cpu(s): 15.7%us, 34.9%sy, 0.0%ni, 49.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st Mem: 7629464k total, 7537216k used, 92248k free, 1376492k buffers Swap: 0k total, 0k used, 0k free, 5432356k cached - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND - 7225 bgregg-t 20 0 29480 6196 2128 S 3 0.1 0:02.64 ec2rotatelogs - 1 root 20 0 24320 2256 1340 S 0 0.0 0:01.23 init - 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd - 3 root 20 0 0 0 0 S 0 0.0 1:19.61 ksoftirqd/0 - 4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0 - 5 root 20 0 0 0 0 S 0 0.0 0:00.01 kworker/u:0 - 6 root RT 0 0 0 0 S 0 0.0 0:16.00 migration/0 - 7 root RT 0 0 0 0 S 0 0.0 0:17.29 watchdog/0 - 8 root RT 0 0 0 0 S 0 0.0 0:15.85 migration/1 - 9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0 + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 7225 bgregg-t 20 0 29480 6196 2128 S 3 0.1 0:02.64 ec2rotatelogs + 1 root 20 0 24320 2256 1340 S 0 0.0 0:01.23 init + 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd + 3 root 20 0 0 0 0 S 0 0.0 1:19.61 ksoftirqd/0 + 4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0 + 5 root 20 0 0 0 0 S 0 0.0 0:00.01 kworker/u:0 + 6 root RT 0 0 0 0 S 0 0.0 0:16.00 migration/0 + 7 root RT 0 0 0 0 S 0 0.0 0:17.29 watchdog/0 + 8 root RT 0 0 0 0 S 0 0.0 0:15.85 migration/1 + 9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0 [...] See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this diff --git a/examples/funcgraph_example.txt b/examples/funcgraph_example.txt index e240bcb..f7c06c9 100644 --- a/examples/funcgraph_example.txt +++ b/examples/funcgraph_example.txt @@ -2155,7 +2155,7 @@ performance investigation. Use -h to print the USAGE message: # ./funcgraph -h -USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring +USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-L TID] [-d secs] funcstring -a # all info (same as -HPt) -C # measure on-CPU time only -d seconds # trace duration, and use buffers @@ -2164,6 +2164,7 @@ USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring -H # include column headers -m maxdepth # max stack depth to show -p PID # trace when this pid is on-CPU + -L TID # trace when this thread is on-CPU -P # show process names & PIDs -t # show timestamps -T # comment function tails diff --git a/examples/funcslower_example.txt b/examples/funcslower_example.txt index 2a969f0..fb5a0d8 100644 --- a/examples/funcslower_example.txt +++ b/examples/funcslower_example.txt @@ -101,6 +101,7 @@ USAGE: funcslower [-aChHPt] [-p PID] [-d secs] funcstring latency_us -h # this usage message -H # include column headers -p PID # trace when this pid is on-CPU + -L TID # trace when this thread is on-CPU -P # show process names & PIDs -t # show timestamps eg, diff --git a/examples/functrace_example.txt b/examples/functrace_example.txt index 5a2f9b9..6db51a4 100644 --- a/examples/functrace_example.txt +++ b/examples/functrace_example.txt @@ -325,11 +325,12 @@ and events were missed. Use -h to print the USAGE message: # ./functrace -h -USAGE: functrace [-hH] [-p PID] [-d secs] funcstring +USAGE: functrace [-hH] [-p PID] [-L TID] [-d secs] funcstring -d seconds # trace duration, and use buffers -h # this usage message -H # include column headers -p PID # trace when this pid is on-CPU + -L TID # trace when this thread is on-CPU eg, functrace do_nanosleep # trace the do_nanosleep() function functrace '*sleep' # trace functions ending in "sleep" diff --git a/examples/kprobe_example.txt b/examples/kprobe_example.txt index 47199c8..a970fdc 100644 --- a/examples/kprobe_example.txt +++ b/examples/kprobe_example.txt @@ -346,10 +346,11 @@ This makes use of the kernel options/stacktrace feature. Use -h to print the USAGE message: # ./kprobe -h -USAGE: kprobe [-FhHsv] [-d secs] [-p PID] kprobe_definition [filter] +USAGE: kprobe [-FhHsv] [-d secs] [-p PID] [-L TID] kprobe_definition [filter] -F # force. trace despite warnings. -d seconds # trace duration, and use buffers - -p PID # PID to match on I/O issue + -p PID # PID to match on events + -L TID # thread id to match on events -v # view format file (don't trace) -H # include column headers -s # show kernel stack traces diff --git a/examples/opensnoop_example.txt b/examples/opensnoop_example.txt index afa54fa..133578c 100644 --- a/examples/opensnoop_example.txt +++ b/examples/opensnoop_example.txt @@ -28,10 +28,11 @@ The first several lines show opensnoop catching itself initializing. Use -h to print the USAGE message: # ./opensnoop -h -USAGE: opensnoop [-htx] [-d secs] [-p PID] [-n name] [filename] +USAGE: opensnoop [-htx] [-d secs] [-p PID] [-T TID] [-n name] [filename] -d seconds # trace duration, and use buffers - -n name # process name to match on I/O issue - -p PID # PID to match on I/O issue + -n name # process name to match on open + -p PID # PID to match on open + -L TID # thread id to match on open -t # include time (seconds) -x # only show failed opens -h # this usage message diff --git a/examples/tpoint_example.txt b/examples/tpoint_example.txt index 7dba97c..fecdba2 100644 --- a/examples/tpoint_example.txt +++ b/examples/tpoint_example.txt @@ -184,10 +184,11 @@ making it a read: Use -h to print the USAGE message: # ./tpoint -h -USAGE: tpoint [-hHsv] [-d secs] [-p PID] tracepoint [filter] +USAGE: tpoint [-hHsv] [-d secs] [-p PID] [-L TID] tracepoint [filter] tpoint -l -d seconds # trace duration, and use buffers - -p PID # PID to match on I/O issue + -p PID # PID to match on events + -L TID # thread id to match on events -v # view format file (don't trace) -H # include column headers -l # list all tracepoints diff --git a/examples/uprobe_example.txt b/examples/uprobe_example.txt index 830ebfa..4384a93 100644 --- a/examples/uprobe_example.txt +++ b/examples/uprobe_example.txt @@ -286,12 +286,13 @@ a different method of stack walking. Use -h to print the USAGE message: # ./uprobe -h -USAGE: uprobe [-FhHsv] [-d secs] [-p PID] {-l target | +USAGE: uprobe [-FhHsv] [-d secs] [-p PID] [-L TID] {-l target | uprobe_definition [filter]} -F # force. trace despite warnings. -d seconds # trace duration, and use buffers -l target # list functions from this executable - -p PID # PID to match on I/O issue + -p PID # PID to match on events + -L TID # thread id to match on events -v # view format file (don't trace) -H # include column headers -s # show user stack traces diff --git a/execsnoop b/execsnoop index 2a96a57..ed42e2e 100755 --- a/execsnoop +++ b/execsnoop @@ -73,7 +73,7 @@ function usage { execsnoop # watch exec()s live (unbuffered) execsnoop -d 1 # trace 1 sec (buffered) execsnoop grep # trace process names containing grep - execsnoop 'log$' # filenames ending in "log" + execsnoop 'udevd$' # process names ending in "udevd" See the man page and example file for more info. END diff --git a/iolatency b/iolatency index d646193..0917ee5 100755 --- a/iolatency +++ b/iolatency @@ -1,4 +1,4 @@ -#!/bin/bash +#!/bin/sh4 # # iolatency - summarize block device I/O latency as a histogram. # Written using Linux ftrace. diff --git a/kernel/funcgraph b/kernel/funcgraph index e16d629..b796e33 100755 --- a/kernel/funcgraph +++ b/kernel/funcgraph @@ -9,7 +9,7 @@ # using other, lower overhead tools. This is a proof of concept using Linux # ftrace capabilities on older kernels. # -# USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring +# USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-L TID] [-d secs] funcstring # # Run "funcgraph -h" for full usage. # @@ -68,14 +68,14 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock -opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0 -opt_proc=0; opt_time=0; opt_tail=0; opt_nodur=0; opt_cpu=0 +opt_duration=0; duration=; opt_pid=0; pid=; opt_tid=0; tid=; pidtext= +opt_headers=0; opt_proc=0; opt_time=0; opt_tail=0; opt_nodur=0; opt_cpu=0 opt_max=0; max=0 trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring + USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-L TID] [-d secs] funcstring -a # all info (same as -HPt) -C # measure on-CPU time only -d seconds # trace duration, and use buffers @@ -84,6 +84,7 @@ function usage { -H # include column headers -m maxdepth # max stack depth to show -p PID # trace when this pid is on-CPU + -L TID # trace when this thread is on-CPU -P # show process names & PIDs -t # show timestamps -T # comment function tails @@ -118,7 +119,7 @@ function end { (( opt_cpu )) && warn "echo sleep-time > trace_options" warn "echo nop > current_tracer" - (( opt_pid )) && warn "echo > set_ftrace_pid" + (( opt_pid || opt_tid )) && warn "echo > set_ftrace_pid" (( opt_max )) && warn "echo 0 > max_graph_depth" warn "echo > set_graph_function" warn "echo > trace" @@ -140,7 +141,7 @@ function edie { } ### process options -while getopts aCd:DhHm:p:PtT opt +while getopts aCd:DhHm:p:L:PtT opt do case $opt in a) opt_headers=1; opt_proc=1; opt_time=1 ;; @@ -149,6 +150,7 @@ do D) opt_nodur=1; ;; m) opt_max=1; max=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; H) opt_headers=1; ;; P) opt_proc=1; ;; t) opt_time=1; ;; @@ -160,8 +162,10 @@ shift $(( $OPTIND - 1 )) ### option logic (( $# == 0 )) && usage +(( opt_pid && opt_tid )) && edie "ERROR: You can use -p or -L but not both." funcs="$1" (( opt_pid )) && pidtext=" for PID $pid" +(( opt_tid )) && pidtext=" for TID $tid" if (( opt_duration )); then echo "Tracing \"$funcs\"$pidtext for $duration seconds..." else @@ -187,9 +191,18 @@ if (( opt_max )); then fi fi if (( opt_pid )); then - if ! echo $pid > set_ftrace_pid; then - edie "ERROR: setting -p $pid (PID exist?). Exiting." - fi + echo > set_ftrace_pid + # ftrace expects kernel pids, which are thread ids + for tid in /proc/$pid/task/*; do + if ! echo ${tid##*/} >> set_ftrace_pid; then + edie "ERROR: setting -p $pid (PID exist?). Exiting." + fi + done +fi +if (( opt_tid )); then + if ! echo $tid > set_ftrace_pid; then + edie "ERROR: setting -L $tid (TID exist?). Exiting." + fi fi if ! echo > set_ftrace_filter; then edie "ERROR: writing to set_ftrace_filter. Exiting." diff --git a/kernel/funcslower b/kernel/funcslower index 4fd986b..fd7fc3e 100755 --- a/kernel/funcslower +++ b/kernel/funcslower @@ -53,19 +53,20 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock -opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0 -opt_proc=0; opt_time=0; opt_cpu=0 +opt_duration=0; duration=; opt_pid=0; pid=; opt_tid=0; tid= +pidtext=; opt_headers=0; opt_proc=0; opt_time=0; opt_cpu=0 trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: funcslower [-aChHPt] [-p PID] [-d secs] funcstring latency_us + USAGE: funcslower [-aChHPt] [-p PID] [-L TID] [-d secs] funcstring latency_us -a # all info (same as -HPt) -C # measure on-CPU time only -d seconds # trace duration, and use buffers -h # this usage message -H # include column headers -p PID # trace when this pid is on-CPU + -L TID # trace when this thread is on-CPU -P # show process names & PIDs -t # show timestamps eg, @@ -116,13 +117,14 @@ function edie { } ### process options -while getopts aCd:hHp:Pt opt +while getopts aCd:hHp:L:Pt opt do case $opt in a) opt_headers=1; opt_proc=1; opt_time=1 ;; C) opt_cpu=1; ;; d) opt_duration=1; duration=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; H) opt_headers=1; ;; P) opt_proc=1; ;; t) opt_time=1; ;; @@ -133,10 +135,12 @@ shift $(( $OPTIND - 1 )) ### option logic (( $# < 2 )) && usage +(( opt_pid && opt_tid )) && edie "ERROR: You can use -p or -L but not both." funcs="$1" shift thresh=$1 (( opt_pid )) && pidtext=" for PID $pid" +(( opt_tid )) && pidtext=" for TID $tid" printf "Tracing \"$funcs\"$pidtext slower than $thresh us" if (( opt_duration )); then echo " for $duration seconds..." @@ -175,9 +179,18 @@ if ! echo $thresh > tracing_thresh; then edie "ERROR: setting tracing_thresh to $thresh. Exiting." fi if (( opt_pid )); then - if ! echo $pid > set_ftrace_pid; then - edie "ERROR: setting -p $pid (PID exist?). Exiting." - fi + echo '' > set_ftrace_pid + # ftrace expects kernel pids, which are thread ids + for tid in /proc/$pid/task/*; do + if ! echo ${tid##*/} >> set_ftrace_pid; then + edie "ERROR: setting -p $pid (PID exist?). Exiting." + fi + done +fi +if (( opt_tid )); then + if ! echo $tid > set_ftrace_pid; then + edie "ERROR: setting -L $tid (TID exist?). Exiting." + fi fi if ! echo "$funcs" > set_ftrace_filter; then edie "ERROR: enabling \"$funcs\" filter. Function exist? Exiting." diff --git a/kernel/functrace b/kernel/functrace index 3abf4a0..0736392 100755 --- a/kernel/functrace +++ b/kernel/functrace @@ -5,7 +5,7 @@ # # This is a proof of concept using Linux ftrace capabilities on older kernels. # -# USAGE: functrace [-hH] [-p PID] [-d secs] funcstring +# USAGE: functrace [-hH] [-p PID] [-L TID] [-d secs] funcstring # eg, # functrace '*sleep' # trace all functions ending in "sleep" # @@ -59,16 +59,18 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock -opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0 +opt_duration=0; duration=; opt_pid=0; pid=; opt_tid=0; tid=; pidtext= +opt_headers=0 trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: functrace [-hH] [-p PID] [-d secs] funcstring + USAGE: functrace [-hH] [-p PID] [-L TID] [-d secs] funcstring -d seconds # trace duration, and use buffers -h # this usage message -H # include column headers -p PID # trace when this pid is on-CPU + -L TID # trace when this thread is on-CPU eg, functrace do_nanosleep # trace the do_nanosleep() function functrace '*sleep' # trace functions ending in "sleep" @@ -93,7 +95,7 @@ function end { echo "Ending tracing..." 2>/dev/null cd $tracing warn "echo nop > current_tracer" - (( opt_pid )) && warn "echo > set_ftrace_pid" + (( opt_pid || opt_tid )) && warn "echo > set_ftrace_pid" warn "echo > set_ftrace_filter" warn "echo > trace" (( wroteflock )) && warn "rm $flock" @@ -113,11 +115,12 @@ function edie { } ### process options -while getopts d:hHp: opt +while getopts d:hHp:L: opt do case $opt in d) opt_duration=1; duration=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; H) opt_headers=1; ;; h|?) usage ;; esac @@ -126,8 +129,10 @@ shift $(( $OPTIND - 1 )) ### option logic (( $# == 0 )) && usage +(( opt_pid && opt_tid )) && edie "ERROR: You can use -p or -L but not both." funcs="$1" (( opt_pid )) && pidtext=" for PID $pid" +(( opt_tid )) && pidtext=" for TID $pid" if (( opt_duration )); then echo "Tracing \"$funcs\"$pidtext for $duration seconds..." else @@ -148,9 +153,18 @@ sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status read mode < current_tracer [[ "$mode" != "nop" ]] && edie "ERROR: ftrace active (current_tracer=$mode)" if (( opt_pid )); then - if ! echo $pid > set_ftrace_pid; then - edie "ERROR: setting -p $pid (PID exist?). Exiting." - fi + echo > set_ftrace_pid + # ftrace expects kernel pids, which are thread ids + for tid in /proc/$pid/task/*; do + if ! echo ${tid##*/} >> set_ftrace_pid; then + edie "ERROR: setting -p $pid (PID exist?). Exiting." + fi + done +fi +if (( opt_tid )); then + if ! echo $tid > set_ftrace_pid; then + edie "ERROR: setting -L $tid (TID exist?). Exiting." + fi fi if ! echo "$funcs" > set_ftrace_filter; then edie "ERROR: enabling \"$funcs\". Exiting." diff --git a/kernel/kprobe b/kernel/kprobe index efedeae..b4523d8 100755 --- a/kernel/kprobe +++ b/kernel/kprobe @@ -9,7 +9,7 @@ # the probe alias is optional (it will become to kprobe: if not # specified). # -# USAGE: ./kprobe [-FhHsv] [-d secs] [-p pid] kprobe_definition [filter] +# USAGE: ./kprobe [-FhHsv] [-d secs] [-p pid] [-L tid] kprobe_definition [filter] # # Run "kprobe -h" for full usage. # @@ -50,16 +50,18 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock; wroteflock=0 -opt_duration=0; duration=; opt_pid=0; pid=; opt_filter=0; filter= -opt_view=0; opt_headers=0; opt_stack=0; dmesg=2; debug=0; opt_force=0 +opt_duration=0; duration=; opt_pid=0; pid=; opt_tid=0; tid= +opt_filter=0; filter=; opt_view=0; opt_headers=0; opt_stack=0; dmesg=2 +debug=0; opt_force=0 trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: kprobe [-FhHsv] [-d secs] [-p PID] kprobe_definition [filter] + USAGE: kprobe [-FhHsv] [-d secs] [-p PID] [-L TID] kprobe_definition [filter] -F # force. trace despite warnings. -d seconds # trace duration, and use buffers -p PID # PID to match on events + -L TID # thread id to match on events -v # view format file (don't trace) -H # include column headers -s # show kernel stack traces @@ -125,12 +127,13 @@ function edie { } ### process options -while getopts Fd:hHp:sv opt +while getopts Fd:hHp:L:sv opt do case $opt in F) opt_force=1 ;; d) opt_duration=1; duration=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; H) opt_headers=1 ;; s) opt_stack=1 ;; v) opt_view=1 ;; @@ -147,12 +150,21 @@ if (( $# )); then fi ### option logic -(( opt_pid && opt_filter )) && die "ERROR: use either -p or a filter." +(( opt_pid + opt_filter + opt_tid > 1 )) && \ + die "ERROR: use at most one of -p, -L, or filter." (( opt_duration && opt_view )) && die "ERROR: use either -d or -v." if (( opt_pid )); then # convert to filter opt_filter=1 - filter="common_pid == $pid" + # ftrace common_pid is thread id from user's perspective + for tid in /proc/$pid/task/*; do + filter="$filter || common_pid == ${tid##*/}" + done + filter=${filter:3} # trim leading ' || ' (four characters) +fi +if (( opt_tid )); then + opt_filter=1 + filter="common_pid == $tid" fi if [[ "$kprobe" != p:* && "$kprobe" != r:* ]]; then echo >&2 "ERROR: invalid kprobe definition (should start with p: or r:)" diff --git a/man/man8/funcgraph.8 b/man/man8/funcgraph.8 index 01d4c97..348c68e 100644 --- a/man/man8/funcgraph.8 +++ b/man/man8/funcgraph.8 @@ -3,7 +3,7 @@ funcgraph \- trace kernel function graph, showing child function calls and times. Uses Linux ftrace. .SH SYNOPSIS .B funcgraph -[\-aCDhHPtT] [\-m maxdepth] [\-p PID] [\-d secs] funcstring +[\-aCDhHPtT] [\-m maxdepth] [\-p PID] [\-L TID] [\-d secs] funcstring .SH DESCRIPTION This is an exploratory tool that shows the graph of child function calls for a given kernel function. This can cost moderate overhead to execute, and @@ -66,6 +66,9 @@ available for newer Linux kernel versions. \-p PID Only trace kernel functions when this process ID is on-CPU. .TP +\-L TID +Only trace kernel functions when this thread ID is on-CPU. +.TP \-P Show process names and process IDs with every line of output. .TP diff --git a/man/man8/funcslower.8 b/man/man8/funcslower.8 index b1a3a65..1c8f697 100644 --- a/man/man8/funcslower.8 +++ b/man/man8/funcslower.8 @@ -3,7 +3,7 @@ funcslower \- trace kernel functions slower than a threshold (microseconds). Uses Linux ftrace. .SH SYNOPSIS .B funcslower -[\-aChHPt] [\-p PID] [\-d secs] funcstring latency_us +[\-aChHPt] [\-p PID] [\-L TID] [\-d secs] funcstring latency_us .SH DESCRIPTION This uses the Linux ftrace function graph profiler to time kernel functions and filter them based on a latency threshold. Latency outliers can be studied @@ -49,6 +49,9 @@ Print column headers. \-p PID Only trace kernel functions when this process ID is on-CPU. .TP +\-L TID +Only trace kernel functions when this thread ID is on-CPU. +.TP \-P Show process names and process IDs with every line of output. .TP diff --git a/man/man8/functrace.8 b/man/man8/functrace.8 index a9b9904..bd7f08c 100644 --- a/man/man8/functrace.8 +++ b/man/man8/functrace.8 @@ -3,7 +3,7 @@ functrace \- trace kernel function calls matching specified wildcards. Uses Linux ftrace. .SH SYNOPSIS .B functrace -[\-hH] [\-p PID] [\-d secs] funcstring +[\-hH] [\-p PID] [\-L TID] [\-d secs] funcstring .SH DESCRIPTION This tool provides a quick way to capture the execution of kernel functions, showing basic details including as the process ID, timestamp, and calling @@ -45,6 +45,9 @@ Print column headers. \-p PID Only trace kernel functions when this process ID is on-CPU. .TP +\-L TID +Only trace kernel functions when this thread ID is on-CPU. +.TP funcstring A function name to trace, which may include file glob style wildcards ("*") at the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed diff --git a/man/man8/kprobe.8 b/man/man8/kprobe.8 index 58ac243..477224d 100644 --- a/man/man8/kprobe.8 +++ b/man/man8/kprobe.8 @@ -3,7 +3,7 @@ kprobe \- trace a given kprobe definition. Kernel dynamic tracing. Uses Linux ftrace. .SH SYNOPSIS .B kprobe -[\-FhHsv] [\-d secs] [\-p PID] kprobe_definition [filter] +[\-FhHsv] [\-d secs] [\-p PID] [\-L TID] kprobe_definition [filter] .SH DESCRIPTION This will create, trace, then destroy a given kprobe definition. See Documentation/trace/kprobetrace.txt in the Linux kernel source for the @@ -60,6 +60,9 @@ for use in a custom filter. \-p PID Only trace kernel functions when this process ID is on-CPU. .TP +\-L TID +Only trace kernel functions when this thread ID is on-CPU. +.TP kprobe_definition A full kprobe definition, as documented by Documentation/trace/kprobetrace.txt in the Linux kernel source. Note that the probe alias name is optional with diff --git a/man/man8/opensnoop.8 b/man/man8/opensnoop.8 index 7a3635f..a09eda3 100644 --- a/man/man8/opensnoop.8 +++ b/man/man8/opensnoop.8 @@ -3,7 +3,7 @@ opensnoop \- trace open() syscalls with file details. Uses Linux ftrace. .SH SYNOPSIS .B opensnoop -[\-htx] [\-d secs] [\-p pid] [\-n name] [filename] +[\-htx] [\-d secs] [\-p pid] [\-L tid] [\-n name] [filename] .SH DESCRIPTION This traces open() syscalls, showing the file name (pathname) and returned file descriptor number (or \-1, for error). @@ -41,6 +41,9 @@ expressions are allowed. This is post-filtered using awk. \-p PID Only trace this process ID. This is filtered in-kernel. .TP +\-L TID +Only trace this thread ID. This is filtered in-kernel. +.TP \-t Include timestamps, in seconds. .TP diff --git a/man/man8/tpoint.8 b/man/man8/tpoint.8 index d0a95e2..a3e8c93 100644 --- a/man/man8/tpoint.8 +++ b/man/man8/tpoint.8 @@ -3,7 +3,7 @@ tpoint \- trace a given tracepoint. Static tracing. Uses Linux ftrace. .SH SYNOPSIS .B tpoint -[\-hHsv] [\-d secs] [\-p PID] tracepoint [filter] +[\-hHsv] [\-d secs] [\-p PID] [\-L TID] tracepoint [filter] .B tpoint \-l @@ -53,6 +53,9 @@ for use in a custom filter. \-p PID Only trace kernel functions when this process ID is on-CPU. .TP +\-L TID +Only trace kernel functions when this thread ID is on-CPU. +.TP tracepoint A tracepoint name. Eg, block:block_rq_issue. See the EXAMPLES section. .TP diff --git a/man/man8/uprobe.8 b/man/man8/uprobe.8 index 807c77e..67f4675 100644 --- a/man/man8/uprobe.8 +++ b/man/man8/uprobe.8 @@ -3,7 +3,7 @@ uprobe \- trace a given uprobe definition. User-level dynamic tracing. Uses Linux ftrace. EXPERIMENTAL. .SH SYNOPSIS .B uprobe -[\-FhHsv] [\-d secs] [\-p PID] {\-l target | uprobe_definition [filter]} +[\-FhHsv] [\-d secs] [\-p PID] [\-L TID] {\-l target | uprobe_definition [filter]} .SH DESCRIPTION This will create, trace, then destroy a given uprobe definition. See Documentation/trace/uprobetracer.txt in the Linux kernel source for the @@ -70,6 +70,9 @@ for use in a custom filter. \-p PID Only trace user-level functions when this process ID is on-CPU. .TP +\-L TID +Only trace user-level functions when this thread ID is on-CPU. +.TP uprobe_definition A full uprobe definition, as documented by Documentation/trace/uprobetracer.txt in the Linux kernel source. Note that the probe alias name is optional with diff --git a/opensnoop b/opensnoop index 7e0bb48..1d69a98 100755 --- a/opensnoop +++ b/opensnoop @@ -13,7 +13,7 @@ # It is a workaround, and proof of concept for ftrace, until more kernel tracing # functionality is available. # -# USAGE: ./opensnoop [-htx] [-d secs] [-p pid] [-n name] [filename] +# USAGE: ./opensnoop [-htx] [-d secs] [-p pid] [-L tid] [-n name] [filename] # # Run "opensnoop -h" for full usage. # @@ -48,16 +48,17 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock; wroteflock=0 -opt_duration=0; duration=; opt_name=0; name=; opt_pid=0; pid=; ftext= -opt_time=0; opt_fail=0; opt_file=0; file= +opt_duration=0; duration=; opt_name=0; name=; opt_pid=0; pid=; opt_tid=0; tid= +ftext=; opt_time=0; opt_fail=0; opt_file=0; file= trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: opensnoop [-htx] [-d secs] [-p PID] [-n name] [filename] + USAGE: opensnoop [-htx] [-d secs] [-p PID] [-L TID] [-n name] [filename] -d seconds # trace duration, and use buffers - -n name # process name to match on I/O issue - -p PID # PID to match on I/O issue + -n name # process name to match on open + -p PID # PID to match on open + -L TID # PID to match on open -t # include time (seconds) -x # only show failed opens -h # this usage message @@ -87,7 +88,7 @@ function end { cd $tracing warn "echo 0 > events/kprobes/getnameprobe/enable" warn "echo 0 > events/syscalls/sys_exit_open/enable" - if (( opt_pid )); then + if (( opt_pid || opt_tid )); then warn "echo 0 > events/kprobes/getnameprobe/filter" warn "echo 0 > events/syscalls/sys_exit_open/filter" fi @@ -110,12 +111,13 @@ function edie { } ### process options -while getopts d:hn:p:tx opt +while getopts d:hn:p:L:tx opt do case $opt in d) opt_duration=1; duration=$OPTARG ;; n) opt_name=1; name=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; t) opt_time=1 ;; x) opt_fail=1 ;; h|?) usage ;; @@ -130,8 +132,10 @@ fi (( $# )) && usage ### option logic -(( opt_pid && opt_name )) && die "ERROR: use either -p or -n." +(( opt_pid + opt_name + opt_tid > 1 )) && \ + die "ERROR: use at most one of -p, -n, -L." (( opt_pid )) && ftext=" issued by PID $pid" +(( opt_tid )) && ftext=" issued by TID $tid" (( opt_name )) && ftext=" issued by process name \"$name\"" (( opt_file )) && ftext="$ftext for filenames containing \"$file\"" if (( opt_duration )); then @@ -167,12 +171,24 @@ if ! echo $kprobe >> kprobe_events; then edie "ERROR: adding a kprobe for getname(). Exiting." fi if (( opt_pid )); then - if ! echo "common_pid==$pid" > events/kprobes/getnameprobe/filter || \ - ! echo "common_pid==$pid" > events/syscalls/sys_exit_open/filter + filter= + for tid in /proc/$pid/task/*; do + filter="$filter || common_pid == ${tid##*/}" + done + filter=${filter:3} # trim leading ' || ' (four characters) + if ! echo $filter > events/kprobes/getnameprobe/filter || \ + ! echo $filter > events/syscalls/sys_exit_open/filter then edie "ERROR: setting -p $pid. Exiting." fi fi +if (( opt_tid )); then + if ! echo "common_pid == $tid" > events/kprobes/getnameprobe/filter || \ + ! echo "common_pid == $tid" > events/syscalls/sys_exit_open/filter + then + edie "ERROR: setting -L $tid. Exiting." + fi +fi if ! echo 1 > events/kprobes/getnameprobe/enable; then edie "ERROR: enabling kprobe for getname(). Exiting." fi @@ -222,10 +238,8 @@ fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \ # # eg: ... (do_sys_open+0xc3/0x220 <- getname) arg1="file1" # - filename = $NF - sub(/"$/, "", filename) - sub(/.*"/, "", filename) - lastfile[pid] = filename + match($0, /arg1=\"(.+)\"/, m) + lastfile[pid] = m[1] } # sys_open() diff --git a/system/tpoint b/system/tpoint index 9cd7916..dbc8572 100755 --- a/system/tpoint +++ b/system/tpoint @@ -8,7 +8,7 @@ # printing live tracepoint events only. Wildcards are currently not supported. # If this is insufficient for any reason, use the perf command instead. # -# USAGE: ./tpoint [-hHsv] [-d secs] [-p pid] tracepoint [filter] +# USAGE: ./tpoint [-hHsv] [-d secs] [-p pid] [-L tid] tracepoint [filter] # ./tpoint -l # # Run "tpoint -h" for full usage. @@ -53,16 +53,17 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock; wroteflock=0 -opt_duration=0; duration=; opt_pid=0; pid=; opt_filter=0; filter= -opt_view=0; opt_headers=0; opt_stack=0; dmesg=2 +opt_duration=0; duration=; opt_pid=0; pid=; opt_tid=0; tid= +opt_filter=0; filter=; opt_view=0; opt_headers=0; opt_stack=0; dmesg=2 trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: tpoint [-hHsv] [-d secs] [-p PID] tracepoint [filter] + USAGE: tpoint [-hHsv] [-d secs] [-p PID] [-L TID] tracepoint [filter] tpoint -l -d seconds # trace duration, and use buffers - -p PID # PID to match on I/O issue + -p PID # PID to match on event + -L TID # thread id to match on event -v # view format file (don't trace) -H # include column headers -l # list all tracepoints @@ -117,11 +118,12 @@ function edie { } ### process options -while getopts d:hHlp:sv opt +while getopts d:hHlp:L:sv opt do case $opt in d) opt_duration=1; duration=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; H) opt_headers=1 ;; l) opt_list=1 ;; s) opt_stack=1 ;; @@ -141,12 +143,21 @@ if (( !opt_list )); then fi ### option logic -(( opt_pid && opt_filter )) && die "ERROR: use either -p or -f." +(( opt_pid + opt_filter + opt_tid > 1 )) && \ + die "ERROR: use at most one of -p, -L, or filter." (( opt_duration && opt_view )) && die "ERROR: use either -d or -v." if (( opt_pid )); then # convert to filter opt_filter=1 - filter="common_pid == $pid" + # ftrace common_pid is thread id from user's perspective + for tid in /proc/$pid/task/*; do + filter="$filter || common_pid == ${tid##*/}" + done + filter=${filter:3} # trim leading ' || ' (four characters) +fi +if (( opt_tid )); then + opt_filter=1 + filter="common_pid == $tid" fi if (( !opt_view && !opt_list )); then if (( opt_duration )); then diff --git a/user/uprobe b/user/uprobe index baa11ee..ff407ec 100755 --- a/user/uprobe +++ b/user/uprobe @@ -8,7 +8,7 @@ # syntax of a uprobe definition, and "uprobe -h" for examples. With this tool, # the probe alias is optional (it will default to something meaningful). # -# USAGE: ./uprobe [-FhHsv] [-d secs] [-p pid] {-l target | +# USAGE: ./uprobe [-FhHsv] [-d secs] [-p pid] [-L tid] {-l target | # uprobe_definition [filter]} # # Run "uprobe -h" for full usage. @@ -63,20 +63,21 @@ ### default variables tracing=/sys/kernel/debug/tracing flock=/var/tmp/.ftrace-lock; wroteflock=0 -opt_duration=0; duration=; opt_pid=0; pid=; opt_filter=0; filter= -opt_view=0; opt_headers=0; opt_stack=0; dmesg=2; debug=0; opt_force=0 -opt_list=0; target= +opt_duration=0; duration=; opt_pid=0; pid=; opt_tid=0; tid= +opt_filter=0; filter=; opt_view=0; opt_headers=0; opt_stack=0; dmesg=2 +debug=0; opt_force=0; opt_list=0; target= PATH=$PATH:/usr/bin:/sbin # ensure we find objdump, ldconfig trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section function usage { cat <<-END >&2 - USAGE: uprobe [-FhHsv] [-d secs] [-p PID] {-l target | + USAGE: uprobe [-FhHsv] [-d secs] [-p PID] [-L TID] {-l target | uprobe_definition [filter]} -F # force. trace despite warnings. -d seconds # trace duration, and use buffers -l target # list functions from this executable -p PID # PID to match on events + -L TID # thread id to match on events -v # view format file (don't trace) -H # include column headers -s # show user stack traces @@ -192,12 +193,13 @@ function set_addr { } ### process options -while getopts Fd:hHl:p:sv opt +while getopts Fd:hHl:p:L:sv opt do case $opt in F) opt_force=1 ;; d) opt_duration=1; duration=$OPTARG ;; p) opt_pid=1; pid=$OPTARG ;; + L) opt_tid=1; tid=$OPTARG ;; l) opt_list=1; target=$OPTARG ;; H) opt_headers=1 ;; s) opt_stack=1 ;; @@ -248,12 +250,21 @@ done ### option logic [[ "$uprobe" == "" ]] && usage -(( opt_pid && opt_filter )) && die "ERROR: use either -p or a filter." +(( opt_pid + opt_filter + opt_tid > 1 )) && \ + die "ERROR: use at most one of -p, -L, or filter." (( opt_duration && opt_view )) && die "ERROR: use either -d or -v." if (( opt_pid )); then # convert to filter opt_filter=1 - filter="common_pid == $pid" + # ftrace common_pid is thread id from user's perspective + for tid in /proc/$pid/task/*; do + filter="$filter || common_pid == ${tid##*/}" + done + filter=${filter:3} # trim leading ' || ' (four characters) +fi +if (( opt_tid )); then + opt_filter=1 + filter="common_pid == $tid" fi if [[ "$uprobe" != p:* && "$uprobe" != r:* ]]; then echo >&2 "ERROR: invalid uprobe definition (should start with p: or r:)"