-
Notifications
You must be signed in to change notification settings - Fork 188
Expand Kineto profiler support (part 1) #193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request was exported from Phabricator. Differential Revision: D27873669 |
11 similar comments
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
This pull request was exported from Phabricator. Differential Revision: D27873669 |
Summary: Pull Request resolved: #193 Expanding Kineto support to more platforms Reviewed By: gdankel Differential Revision: D27873669 fbshipit-source-id: 7c557e1b0025ba994f922ce4feef10400f371b8f
This pull request was exported from Phabricator. Differential Revision: D27873669 |
Summary: Pull Request resolved: pytorch/kineto#193 Expanding Kineto support to more platforms Test Plan: CI and OSS CI: pytorch#56323 Reviewed By: gdankel Differential Revision: D27873669 fbshipit-source-id: 6c3d415e984c8f42ab009f692dd82dc0f52d55c8
This pull request has been merged in 2caeb94. |
Summary: Pull Request resolved: #57333 Pull Request resolved: pytorch/kineto#193 Expanding Kineto support to more platforms Test Plan: CI and OSS CI: #56323 Reviewed By: gdankel Differential Revision: D27873669 fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7
Summary: Pull Request resolved: pytorch#57333 Pull Request resolved: pytorch/kineto#193 Expanding Kineto support to more platforms Test Plan: CI and OSS CI: pytorch#56323 Reviewed By: gdankel Differential Revision: D27873669 fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7
* Put HeartbeatMonitor in a detail namespace (#143) Summary: Pull Request resolved: #143 I'm not actually sure there's any potential negative consequence here, but it's potentially harmful to use an anonymous namespace in a header, since it creates a unique symbol for each translation unit which can lead to ODR violations. I happened to notice a warning for this while building OSS pytorch :-) Reviewed By: gdankel Differential Revision: D27383716 fbshipit-source-id: 0fffc93478bc044a543b44011943652dc13c75cd * Fix C++14 compile error (#144) Summary: Pull Request resolved: #144 Accidentally used C++17 feature, and PyTorch still only supports C++14. Reviewed By: ilia-cher Differential Revision: D27422287 fbshipit-source-id: d5b72571f0272cb0cb25a395caf916ac3c16a4d3 * add JSON-formatted metadata for ClientTraceActivity (#164) Summary: Pull Request resolved: #164 Plan is to deprecate ClientTraceActivity and use a single concrete data structure to log all client side activities :`GenericTraceActivity`. CTA has a lot of explicit fields for metadata that are rarely used which results in sparse definition of activities. Instead of adding a new field, encode the key/value as a string. Note, more type-safe and rich alternative is `std::map` but we are not using it for performance reason. Reviewed By: gdankel Differential Revision: D27298829 fbshipit-source-id: b7a82081f983c5502f6d53104fc2d01ddee77b08 * deprecate metdata args from ClientTraceActivity (#55988) Summary: Pull Request resolved: pytorch/pytorch#55988 Pull Request resolved: #165 as part of the ClientTraceActivity -> GenericTraceActivity migration, move all the metadata fields to JSON encoded string Reviewed By: gdankel Differential Revision: D27340314 fbshipit-source-id: f55b77a779e4bda1fb8667cb4e0f4252b93af5ea * deprecate pthreadid (#56209) Summary: Pull Request resolved: pytorch/pytorch#56209 Pull Request resolved: #172 in this diff of the stack, we remove the threadId field from the ClientTraceActivity as our step towards the deprecation Reviewed By: ilia-cher Differential Revision: D27662747 fbshipit-source-id: 040ba040390680a0fc63ddc8149c6fad940439fc * Fix empty metadata invalid json (#173) Summary: Pull Request resolved: #173 Fix a case of invalid json when metadata is empty Reviewed By: chaekit Differential Revision: D27837639 fbshipit-source-id: 24c80671d8a15ac2d39096d401a5ec05b3339317 * Fix OSS compilation error (#176) Summary: Pull Request resolved: #176 In OSS not all builds support nodiscard https://app.circleci.com/pipelines/github/pytorch/pytorch/305130/workflows/1885277b-9d54-4637-ab9f-7234be2f2ee2/jobs/12588162 Reviewed By: dzhulgakov Differential Revision: D27872203 fbshipit-source-id: 711712f5b819fd4faf21da1c363b70960acf9518 * Support for trace metadata (#192) Summary: Pull Request resolved: #192 Add user-defined metadata api Reviewed By: gdankel Differential Revision: D27906278 fbshipit-source-id: 601de1e6f7ce61ed2cc62dd7d9a4bd3d4c955286 * Fix TraceActivity forward declaration (#199) Summary: Types defined as `struct` should be forward declared as struct https://github.com/pytorch/kineto/blob/5bc9386b6d60c3b34b77961ea2900947103304b9/libkineto/include/TraceActivity.h#L21 Find while working on making PyTorch clang-tidy compliant Pull Request resolved: #199 Reviewed By: ilia-cher Differential Revision: D28092540 Pulled By: malfet fbshipit-source-id: 428a5e762002f763f636957f8c207122f411f3c0 * Deprecate ClientTraceActivity and merge it with GenericTraceActivity (#56743) Summary: Pull Request resolved: pytorch/pytorch#56743 Pull Request resolved: #184 as part of the migration to ClientTraceActivity -> GenericTraceActivity, now that all CTA mirrors GTA's data structure, we can safely swap out the symbol name. Reviewed By: gdankel Differential Revision: D27353973 fbshipit-source-id: 7012c6524c3c75079029ac290c1dd722ac187ec5 * Expand Kineto profiler support (part 1) (#57333) Summary: Pull Request resolved: pytorch/pytorch#57333 Pull Request resolved: #193 Expanding Kineto support to more platforms Reviewed By: gdankel Differential Revision: D27873669 fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7 * Refactor CuptiActivityBuffer Summary: Refactor for better readability. Also add locks to ensure CUPTI threads and profiler threads don't interfere. Reviewed By: chaekit Differential Revision: D28051462 fbshipit-source-id: c8a4f75b47c20f8467145728d4b38da8bdfcdccd * Fix a couple of trace collection corner cases Summary: Fixes two issues: 1) Back-to-back collections hit concurrency issue in the observer. 2) Early exit due to running out of buffers during warmup is broken due to missing break statement. Reviewed By: leitian, ilia-cher Differential Revision: D28051464 fbshipit-source-id: 7ddc46a344af2a922ee8f7e904b1b4e48bfac773 * Minor refactoring of per-op metadata Summary: I hit some jemalloc aborts (size mismatch) occasionally when freeing metadata strings. Not entirely sure I understand why but I have simplified the code and the issue has disappeared. Reviewed By: chaekit Differential Revision: D28051465 fbshipit-source-id: 699515409d6600f7aea14a96c1dac6fb9cad311e * Allow specifying CPU_OP as activity type via config Summary: CPU_OP was missing from the available activity types. It should be possible to profile only CPU ops, or to exclude them. Reviewed By: chaekit Differential Revision: D28051461 fbshipit-source-id: 2cfc3e4262b39ad08e38d7bf79541ce0cca97e60 * Shorten default warmup period Summary: The current warmup time is extremely conservative and not needed for the vast majority of workloads. Change the default to something more reasonable: 5s. Reviewed By: chaekit Differential Revision: D28051463 fbshipit-source-id: 3919342978e7137c0bd9d57536ee1e59eeb2f704 * Remove distinction between active and inactive profiler interval (#201) Summary: Pull Request resolved: #201 Previously tracing buffers were being cleared frequently during warmup in order to stay under the buffer memory limit. But most use cases need at least 1-2s active trace time to capture multiple iterations and so clearing the buffers every 1s seems sufficient. Reviewed By: chaekit Differential Revision: D28094676 fbshipit-source-id: a6a99ff02c710587a467d1d607964939bf23f39e * add 2 kernel metrics (#185) Summary: 1. Update "warps per sm" to "blocks per sm". 2. Add "occupancy" per kernel. Pull Request resolved: #185 Reviewed By: chaekit Differential Revision: D28120846 Pulled By: gdankel fbshipit-source-id: c7ce33b1421b60ae4323c66d38bba5eb175b105b * Fix empty metadata invalid json (#205) Summary: Pull Request resolved: #205 Fix invalid json in case of empty metadata Reviewed By: rohan-varma Differential Revision: D28239004 fbshipit-source-id: 3e2930b1b80a5d5f6c6f92de7eee899729a048dc * set the correct device id for GenericTraceActivity Summary: while merging ClientTraceActivity and GenericTraceActivity, we accidentally adopted CTA's behavior of returning process id over its `device`. This causes GTA to show up in CPU timeline rather than associated GPU's Reviewed By: gdankel Differential Revision: D28196723 fbshipit-source-id: eb8330c14e7c43a470bb4df4811b80754d96535b * Fix clock discrepancy (#207) Summary: Pull Request resolved: #207 Different platforms may use different aliases for clocks, fixing the discrepancy by specifying system_clock directly Reviewed By: gdankel Differential Revision: D28272403 fbshipit-source-id: 79df6c3e67cf883ca73146a15a9de2ce226891ae * Support for memory allocs/deallocs in the traces (#57835) Summary: Pull Request resolved: pytorch/pytorch#57835 Pull Request resolved: #208 Adding ability to save memory allocs/deallocs into the trace Reviewed By: gdankel Differential Revision: D28260915 fbshipit-source-id: d7905d38d7fac9750754ac1b293d3a1951590b5f * Rename theoretical occupancy to est. achieved occupancy (#210) Summary: Pull Request resolved: #210 Theoretical occupancy is used in Nvidia docs to mean the occupancy a kernel could achieve given sufficient input parallelism (grid size). It's more interesting in a trace to know what occupancy is actually achieved. Then if it's low, theoretical occupancy can be studied for each kernel to estimate improvement from increasing grid size. But that's something we can do as part of a recommendation or tutorial. Reviewed By: ilia-cher Differential Revision: D28327605 fbshipit-source-id: 80b54a955fa2885dbbd45006c45f2fb0d9fca2d2 * Log compute properties to trace file (#211) Summary: Pull Request resolved: #211 Device properties are useful in any case, but especially for performing analysis on traces such as occupancy. This patch is a re-implementation of #209 Reviewed By: ilia-cher Differential Revision: D28337067 fbshipit-source-id: 87267d53bdc1c257db452319339518539b81efed * Kineto cmake fix (#216) Summary: Pull Request resolved: #216 Check the variable value before adding dep on cupti Reviewed By: gdankel, malfet Differential Revision: D28363222 fbshipit-source-id: 09f7da1e756ee35559e79ec7a2e5018223f5a12f * Call [Get|Set]ThreadDescription via runtime linking (#224) Summary: SetThreadDescription is not available on some older runtimes/older versions of Windows, which can lead to to following linker errors: ``` kineto.lib(ThreadUtil.cpp.obj) : error LNK2019: unresolved external symbol __imp_SetThreadDescription referenced in function "bool __cdecl libkineto::setThreadName(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?setThreadName@libkineto@YA_NAEBV?$basic_string@DU?$char_traits@D@std@V?$allocator@D@2@std@@z) ``` Use runtime linking method recommended in See https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreaddescription Pull Request resolved: #224 Reviewed By: ilia-cher Differential Revision: D28404596 Pulled By: malfet fbshipit-source-id: 01937821a131aeff562aed5ebf274eac74bd816e * Get rid of tautological check (#225) Summary: `daemonConfigLoaderFactory` is a function and therefore always non-null Fixes following clang warning: ``` ../third_party/kineto/libkineto/src/ConfigLoader.cpp:161:7: warning: address of function 'daemonConfigLoaderFactory' will always evaluate to 'true' [-Wpointer-bool-conversion] if (daemonConfigLoaderFactory && daemonConfigLoaderFactory()) { ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~ ``` Pull Request resolved: #225 Reviewed By: ilia-cher Differential Revision: D28405389 Pulled By: malfet fbshipit-source-id: 3930ab1acd6f2909602eada246d9ad01d9a923e3 * Remove dependency on CuptiInterface from loggers Summary: Now that we're extracting SM count from CUDA API in the chrome trace logger, there is no longer any need for passing in the CUPTI API wrapper. Reviewed By: ilia-cher Differential Revision: D28579642 fbshipit-source-id: 75004e092d3bc9c26435a02ce1a2034c0ea70004 * Add support for multiple protocols in client API (#251) Summary: Pull Request resolved: #251 Add a way to add support for multiple protocols and associated logger objects. Reviewed By: ilia-cher Differential Revision: D28579660 fbshipit-source-id: 497c45eca0529ba0620d635c6605fc0f20172d9c * Invert ConfigLoader -> profiler dependency (#271) Summary: Pull Request resolved: #271 Change direction of dependency from ConfigLoader -> profiler to profiler -> ConfigLoader. This way, the profiler is able to lazily initialize config loader and also move towards the pluggable profiler design. Reviewed By: xw285cornell Differential Revision: D28802798 fbshipit-source-id: 56e88b223fe8fc5276500e1b9a19c602450ae6dc * Revert verbose logging Summary: Verbose logging was accidentally left enabled, revert. Reviewed By: ilia-cher Differential Revision: D28916054 fbshipit-source-id: 9dd3bd831190f246990f1fe17c04ea505ce219e5 * Back out "Invert ConfigLoader -> profiler dependency" Summary: S233925 Reviewed By: satgera, xw285cornell Differential Revision: D28907829 fbshipit-source-id: 2224720c178fb885d8dccb8a38f50c856e48bdd7 * Refactor trace activities (#59360) Summary: Pull Request resolved: pytorch/pytorch#59360 Pull Request resolved: #206 Replace ClientTraceActivity with GenericActivity. In addition: * Add a couple of new activity types for user annotations * Simplify code for GPU-side user annotations * Add accessor to containing trace span object in activities. Later we can replace this with a trace context / trace session object. * Simplified MemoryTraceLogger * Added early exit for cupti push/pop correlation ID Reviewed By: ilia-cher Differential Revision: D28231675 fbshipit-source-id: 7129f2493016efb4d3697094f24475e2c39e6e65 * add activity profiler plugin interface and test (#202) Summary: Pull Request resolved: #202 # Activity Profiler Interface Adds the child Activiity profiler interface and implementation. This interface can be used by libraries and frameworks to supply trace events to Kineto. The first version only consolidates trace events and does not handle correlation yet. ## Details * Add Activity Profiler interface header that includes both profiler and the profiler session. A session manages all the trace event data captured by the respective profiler * Also had to move GenericTrace activity to includes dir as it used in the profiler session interface. * Creates sessions and starts and stops them in the primary ActivityProfilers flow. * First use-case is to integrate with Glow (Graph Lowering Compiler) , adding a glow_runtime event type for it. https://github.com/pytorch/glow/blob/master/docs/Tracing.md Reviewed By: gdankel Differential Revision: D27601906 fbshipit-source-id: 38e608a3a1a6e1b9e69f80d07312e388ed7ad339 * Do not leak file descriptors in processName() (#281) Summary: Each `fopen()` must be followed by `fclose()` Pull Request resolved: #281 Reviewed By: seemethere Differential Revision: D29008280 Pulled By: malfet fbshipit-source-id: 2171740a4f95bb980fe41cae1080f38458069836 * Merge Plugin/0.2 (#284) Summary: Pull Request resolved: #284 Reviewed By: leitian, chaekit Differential Revision: D29052618 Pulled By: gdankel fbshipit-source-id: 9f38cdfc7c7e73f5f62844ef857ebe6fed46f30a * Read base config via daemon (#291) Summary: Pull Request resolved: #291 In container setups, libkineto may not be able to read config file from the host file system, so add a path to retrieve it from the daemon. Reviewed By: briancoutinho Differential Revision: D28877610 fbshipit-source-id: a5f513524ac7afea14a7089677c5eff96b3530c6 Co-authored-by: Bert Maher <[email protected]> Co-authored-by: Gisle Dankel <[email protected]> Co-authored-by: Jay Chae <[email protected]> Co-authored-by: Ilia Cherniavskii <[email protected]> Co-authored-by: Nikita Shulga <[email protected]> Co-authored-by: Teng Gao <[email protected]> Co-authored-by: Chris Cai <[email protected]> Co-authored-by: Brian Coutinho <[email protected]>
Summary: Expanding Kineto support to more platforms
Differential Revision: D27873669