|
| 1 | +# Tracing |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +Firecracker implements a framework for instrumentation based tracing. |
| 6 | + |
| 7 | +Instrumentation based tracing as defined by |
| 8 | +[Sheng Liang on usenix.org](https://www.usenix.org/legacy/publications/library/proceedings/coots99/full_papers/liang/liang_html/node9.html): |
| 9 | + |
| 10 | +> There are two ways to obtain profiling information: either statistical |
| 11 | +> sampling or code instrumentation. Statistical sampling is less |
| 12 | +> disruptive to program execution, but cannot provide completely |
| 13 | +> accurate information. Code instrumentation, on the other hand, may be |
| 14 | +> more disruptive, but allows the profiler to record all the events it |
| 15 | +> is interested in. Specifically in CPU time profiling, statistical |
| 16 | +> sampling may reveal, for example, the relative percentage of time |
| 17 | +> spent in frequently-called methods, whereas code instrumentation can |
| 18 | +> report the exact number of time each method is invoked. |
| 19 | +
|
| 20 | +The focus with tracing in Firecracker is to improve debug-ability. |
| 21 | + |
| 22 | +Enabling tracing adds logs output on each functions entry and exit. |
| 23 | +This assists debugging problems that relate to deadlocks or high |
| 24 | +latencies by quickly identifying elongated function calls. |
| 25 | + |
| 26 | +## Implementation |
| 27 | + |
| 28 | +Firecracker implements instrumentation based tracing via |
| 29 | +[`log`](https://github.com/rust-lang/log) and |
| 30 | +[`log_instrument`](https://github.com/JonathanWoollett-Light/log-instrument), |
| 31 | +outputting a `Trace` level log when entering and exiting every function. |
| 32 | + |
| 33 | +It is disabled by default at compile-time. Tracing functionality has no |
| 34 | +impact on the release binary. |
| 35 | + |
| 36 | +To enable tracing in Firecracker, add instrumentation with: |
| 37 | + |
| 38 | +``` |
| 39 | +clippy-tracing \ |
| 40 | +--action fix \ |
| 41 | +--path ./src \ |
| 42 | +--exclude vmm_config/logger.rs,virtio/gen,bindings.rs,net/gen,benches,logger/,signal_handler.rs,time.rs |
| 43 | +``` |
| 44 | + |
| 45 | +and re-compile with `--features tracing`: |
| 46 | + |
| 47 | +``` |
| 48 | +cargo build --features tracing |
| 49 | +``` |
| 50 | + |
| 51 | +This will result in an increase in the binary size (~100kb) and a |
| 52 | +significant regression in performance (>10x). To mitigate the |
| 53 | +performance impact you can filter the tracing output as described in the |
| 54 | +next section. |
| 55 | + |
| 56 | +## Filtering |
| 57 | + |
| 58 | +You can filter tracing output both at run-time and compile-time. |
| 59 | +This can be used to mitigate the performance impact of logging many |
| 60 | +traces. |
| 61 | + |
| 62 | +Run-time filtering is implemented with the `/logger` API call, this can |
| 63 | +significantly mitigate the impact on execution time but cannot mitigate |
| 64 | +the impact on memory usage. Execution time impact is mitigated by |
| 65 | +avoiding constructing and writing the trace log, it still needs to check |
| 66 | +the condition in every place it would output a log. Memory usage impact |
| 67 | +is not mitigated as the instrumentation remains in the binary unchanged. |
| 68 | + |
| 69 | +Compile-time filtering is a manual process using the |
| 70 | +[`clippy-tracing`](https://github.com/JonathanWoollett-Light/clippy-tracing) |
| 71 | +tool. This can almost entirely mitigate the impact on execution time and |
| 72 | +the impact on memory usage. |
| 73 | + |
| 74 | +### Run-time |
| 75 | + |
| 76 | +You can filter by module path and/or file path at runtime, e.g.: |
| 77 | + |
| 78 | +```bash |
| 79 | +curl -X PUT --unix-socket "${API_SOCKET}" \ |
| 80 | + --data "{ |
| 81 | + \"level\": \"Trace\", |
| 82 | + \"module\": \"api_server::request\", |
| 83 | + }" \ |
| 84 | + "http://localhost/logger" |
| 85 | +``` |
| 86 | + |
| 87 | +Instrumentation logs are `Trace` level logs, at runtime the level must |
| 88 | +be set to `Trace` to see them. The module filter applied here ensures |
| 89 | +only logs from the `request` modules within the `api_server` crate will |
| 90 | +be output. |
| 91 | + |
| 92 | +This will mitigate most of the performance regression. |
| 93 | + |
| 94 | +### Compile-time |
| 95 | + |
| 96 | +Specific environments can restrict run-time configuration. In these |
| 97 | +environments it becomes necessary to support targeted tracing without |
| 98 | +run-time re-configuration, for this compile-time filtering must be used. |
| 99 | + |
| 100 | +To reproduce the same filtering as run-time at compile-time, you can use |
| 101 | +[`clippy-tracing`](../src/clippy-tracing) |
| 102 | +at compile-time like: |
| 103 | + |
| 104 | +```bash |
| 105 | +# Remove all instrumentation. |
| 106 | +clippy-tracing --action strip --path ./src |
| 107 | +# Adds instrumentation to the specific file/s. |
| 108 | +clippy-tracing --action fix --path ./src/api_server/src/request |
| 109 | +# Build Firecracker. |
| 110 | +cargo build --features tracing |
| 111 | +``` |
| 112 | + |
| 113 | +Then at run-time: |
| 114 | + |
| 115 | +```bash |
| 116 | +curl -X PUT --unix-socket "${API_SOCKET}" \ |
| 117 | + --data "{ |
| 118 | + \"level\": \"Trace\", |
| 119 | + }" \ |
| 120 | + "http://localhost/logger" |
| 121 | +``` |
| 122 | + |
| 123 | +The instrumentation has been stripped from all files other than those at |
| 124 | +`./src/api_server/src/request` so we do not need to apply a run-time |
| 125 | +filter. Runtime filtering could be applied but in this case it yields no |
| 126 | +additional benefit. |
| 127 | + |
| 128 | +## Example |
| 129 | + |
| 130 | +In this example we start Firecracker with tracing then make a simple API |
| 131 | +call. |
| 132 | + |
| 133 | +### API call |
| 134 | + |
| 135 | +``` |
| 136 | +~/Projects/firecracker$ sudo curl -X GET --unix-socket "/run/firecracker.socket" "http://localhost/" |
| 137 | +{"id":"anonymous-instance","state":"Not started","vmm_version":"1.6.0-dev","app_name":"Firecracker"} |
| 138 | +``` |
| 139 | + |
| 140 | +### Firecracker |
| 141 | + |
| 142 | +``` |
| 143 | +~/Projects/firecracker$ sudo ./firecracker/build/cargo_target/release/firecracker --level Trace |
| 144 | +2023-10-13T14:15:38.851263983 [anonymous-instance:main] Running Firecracker v1.6.0-dev |
| 145 | +2023-10-13T14:15:38.851316122 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 146 | +2023-10-13T14:15:38.851322264 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 147 | +2023-10-13T14:15:38.851325119 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 148 | +2023-10-13T14:15:38.851328776 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 149 | +2023-10-13T14:15:38.851331351 [anonymous-instance:main] ThreadId(1)::main::main_exec>>flag_present |
| 150 | +2023-10-13T14:15:38.851335809 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present>>value_of |
| 151 | +2023-10-13T14:15:38.851338254 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present<<value_of |
| 152 | +2023-10-13T14:15:38.851342091 [anonymous-instance:main] ThreadId(1)::main::main_exec<<flag_present |
| 153 | +2023-10-13T14:15:38.851345638 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 154 | +2023-10-13T14:15:38.851349245 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 155 | +2023-10-13T14:15:38.851352721 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 156 | +2023-10-13T14:15:38.851355827 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 157 | +2023-10-13T14:15:38.851359444 [anonymous-instance:main] ThreadId(1)::main::main_exec>>from_args |
| 158 | +2023-10-13T14:15:38.851362931 [anonymous-instance:main] ThreadId(1)::main::main_exec<<from_args |
| 159 | +2023-10-13T14:15:38.851366207 [anonymous-instance:main] ThreadId(1)::main::main_exec>>get_filters |
| 160 | +2023-10-13T14:15:38.851368401 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters>>get_default_filters |
| 161 | +2023-10-13T14:15:38.851372068 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters>>deserialize_binary |
| 162 | +2023-10-13T14:15:38.851380033 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters<<deserialize_binary |
| 163 | +2023-10-13T14:15:38.851383990 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters>>filter_thread_categories |
| 164 | +2023-10-13T14:15:38.851388098 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters<<filter_thread_categories |
| 165 | +2023-10-13T14:15:38.851391845 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters<<get_default_filters |
| 166 | +2023-10-13T14:15:38.851394360 [anonymous-instance:main] ThreadId(1)::main::main_exec<<get_filters |
| 167 | +2023-10-13T14:15:38.851398077 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 168 | +2023-10-13T14:15:38.851400462 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 169 | +2023-10-13T14:15:38.851403507 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 170 | +2023-10-13T14:15:38.851410961 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 171 | +2023-10-13T14:15:38.851414107 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 172 | +2023-10-13T14:15:38.851417955 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 173 | +2023-10-13T14:15:38.851420650 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 174 | +2023-10-13T14:15:38.851426130 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 175 | +2023-10-13T14:15:38.851428434 [anonymous-instance:main] ThreadId(1)::main::main_exec>>flag_present |
| 176 | +2023-10-13T14:15:38.851430949 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present>>value_of |
| 177 | +2023-10-13T14:15:38.851434766 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present<<value_of |
| 178 | +2023-10-13T14:15:38.851438133 [anonymous-instance:main] ThreadId(1)::main::main_exec<<flag_present |
| 179 | +2023-10-13T14:15:38.851440577 [anonymous-instance:main] ThreadId(1)::main::main_exec>>flag_present |
| 180 | +2023-10-13T14:15:38.851444575 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present>>value_of |
| 181 | +2023-10-13T14:15:38.851447941 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present<<value_of |
| 182 | +2023-10-13T14:15:38.851450005 [anonymous-instance:main] ThreadId(1)::main::main_exec<<flag_present |
| 183 | +2023-10-13T14:15:38.851453772 [anonymous-instance:main] ThreadId(1)::main::main_exec>>arguments |
| 184 | +2023-10-13T14:15:38.851456488 [anonymous-instance:main] ThreadId(1)::main::main_exec<<arguments |
| 185 | +2023-10-13T14:15:38.851458902 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 186 | +2023-10-13T14:15:38.851462679 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 187 | +2023-10-13T14:15:38.851466587 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 188 | +2023-10-13T14:15:38.851470324 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>as_single_value |
| 189 | +2023-10-13T14:15:38.851473239 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<as_single_value |
| 190 | +2023-10-13T14:15:38.851476896 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 191 | +2023-10-13T14:15:38.851479521 [anonymous-instance:main] ThreadId(1)::main::main_exec>>arguments |
| 192 | +2023-10-13T14:15:38.851485062 [anonymous-instance:main] ThreadId(1)::main::main_exec<<arguments |
| 193 | +2023-10-13T14:15:38.851488398 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 194 | +2023-10-13T14:15:38.851491925 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 195 | +2023-10-13T14:15:38.851494900 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 196 | +2023-10-13T14:15:38.851496934 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 197 | +2023-10-13T14:15:38.851499689 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 198 | +2023-10-13T14:15:38.851502374 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 199 | +2023-10-13T14:15:38.851504629 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 200 | +2023-10-13T14:15:38.851507234 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>as_single_value |
| 201 | +2023-10-13T14:15:38.851508897 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<as_single_value |
| 202 | +2023-10-13T14:15:38.851510630 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 203 | +2023-10-13T14:15:38.851513576 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 204 | +2023-10-13T14:15:38.851515559 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 205 | +2023-10-13T14:15:38.851517503 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 206 | +2023-10-13T14:15:38.851520068 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 207 | +2023-10-13T14:15:38.851521731 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 208 | +2023-10-13T14:15:38.851525628 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 209 | +2023-10-13T14:15:38.851529045 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 210 | +2023-10-13T14:15:38.851533153 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 211 | +2023-10-13T14:15:38.851536339 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value |
| 212 | +2023-10-13T14:15:38.851538883 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of |
| 213 | +2023-10-13T14:15:38.851542330 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of |
| 214 | +2023-10-13T14:15:38.851544704 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value |
| 215 | +2023-10-13T14:15:38.851548572 [anonymous-instance:main] ThreadId(1)::main::main_exec>>run_with_api |
| 216 | +2023-10-13T14:15:38.851664621 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api>>new |
| 217 | +2023-10-13T14:15:38.851672586 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api<<new |
| 218 | +2023-10-13T14:15:38.851677876 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api>>init |
| 219 | +2023-10-13T14:15:38.851684739 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api<<init |
| 220 | +2023-10-13T14:15:38.851724064 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api>>build_microvm_from_requests |
| 221 | +2023-10-13T14:15:38.851728171 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests>>default |
| 222 | +2023-10-13T14:15:38.851731888 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests<<default |
| 223 | +2023-10-13T14:15:38.851734634 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests>>new |
| 224 | +2023-10-13T14:15:38.851737830 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests<<new |
| 225 | +2023-10-13T14:15:38.851748550 [anonymous-instance:fc_api] ThreadId(2)>>new |
| 226 | +2023-10-13T14:15:38.851761404 [anonymous-instance:fc_api] ThreadId(2)<<new |
| 227 | +2023-10-13T14:15:38.851764861 [anonymous-instance:fc_api] ThreadId(2)>>run |
| 228 | +2023-10-13T14:15:38.851775200 [anonymous-instance:fc_api] ThreadId(2)::run>>apply_filter |
| 229 | +2023-10-13T14:15:38.851823462 [anonymous-instance:fc_api] ThreadId(2)::run<<apply_filter |
| 230 | +2023-10-13T14:15:55.422397039 [anonymous-instance:fc_api] ThreadId(2)::run>>handle_request |
| 231 | +2023-10-13T14:15:55.422417909 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>try_from |
| 232 | +2023-10-13T14:15:55.422420554 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from>>describe |
| 233 | +2023-10-13T14:15:55.422424551 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from<<describe |
| 234 | +2023-10-13T14:15:55.422426395 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from>>log_received_api_request |
| 235 | +2023-10-13T14:15:55.422429270 [anonymous-instance:fc_api] The API server received a Get request on "/". |
| 236 | +2023-10-13T14:15:55.422431354 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from<<log_received_api_request |
| 237 | +2023-10-13T14:15:55.422433298 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from>>parse_get_instance_info |
| 238 | +2023-10-13T14:15:55.422435211 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info>>new_sync |
| 239 | +2023-10-13T14:15:55.422437165 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info::new_sync>>new |
| 240 | +2023-10-13T14:15:55.422439289 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info::new_sync<<new |
| 241 | +2023-10-13T14:15:55.422441123 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info<<new_sync |
| 242 | +2023-10-13T14:15:55.422444459 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from<<parse_get_instance_info |
| 243 | +2023-10-13T14:15:55.422446833 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<try_from |
| 244 | +2023-10-13T14:15:55.422448837 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>into_parts |
| 245 | +2023-10-13T14:15:55.422450921 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<into_parts |
| 246 | +2023-10-13T14:15:55.422453967 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>serve_vmm_action_request |
| 247 | +2023-10-13T14:15:55.422472552 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests>>handle_preboot_request |
| 248 | +2023-10-13T14:15:55.422480477 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests<<handle_preboot_request |
| 249 | +2023-10-13T14:15:55.422488963 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request>>convert_to_response |
| 250 | +2023-10-13T14:15:55.422492289 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response>>success_response_with_data |
| 251 | +2023-10-13T14:15:55.422493983 [anonymous-instance:fc_api] The request was executed successfully. Status code: 200 OK. |
| 252 | +2023-10-13T14:15:55.422498331 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data>>serialize |
| 253 | +2023-10-13T14:15:55.422501387 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data::serialize>>fmt |
| 254 | +2023-10-13T14:15:55.422506086 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data::serialize<<fmt |
| 255 | +2023-10-13T14:15:55.422509171 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data<<serialize |
| 256 | +2023-10-13T14:15:55.422511776 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response<<success_response_with_data |
| 257 | +2023-10-13T14:15:55.422514371 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request<<convert_to_response |
| 258 | +2023-10-13T14:15:55.422516385 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<serve_vmm_action_request |
| 259 | +2023-10-13T14:15:55.422518719 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>take_deprecation_message |
| 260 | +2023-10-13T14:15:55.422520533 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<take_deprecation_message |
| 261 | +2023-10-13T14:15:55.422522847 [anonymous-instance:fc_api] ThreadId(2)::run<<handle_request |
| 262 | +2023-10-13T14:15:55.422525422 [anonymous-instance:fc_api] Total previous API call duration: 132 us. |
| 263 | +
|
| 264 | +``` |
0 commit comments