Skip to content

Expand Kineto profiler support (part 1) #193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Expand Kineto profiler support (part 1) #193

wants to merge 1 commit into from

Conversation

ilia-cher
Copy link

Summary: Expanding Kineto support to more platforms

Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

11 similar comments
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

Summary:
Pull Request resolved: #193

Expanding Kineto support to more platforms

Reviewed By: gdankel

Differential Revision: D27873669

fbshipit-source-id: 7c557e1b0025ba994f922ce4feef10400f371b8f
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D27873669

ilia-cher pushed a commit to ilia-cher/pytorch that referenced this pull request Apr 30, 2021
Summary:
Pull Request resolved: pytorch/kineto#193

Expanding Kineto support to more platforms

Test Plan:
CI and OSS CI:
pytorch#56323

Reviewed By: gdankel

Differential Revision: D27873669

fbshipit-source-id: 6c3d415e984c8f42ab009f692dd82dc0f52d55c8
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 2caeb94.

facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request Apr 30, 2021
Summary:
Pull Request resolved: #57333

Pull Request resolved: pytorch/kineto#193

Expanding Kineto support to more platforms

Test Plan:
CI and OSS CI:
#56323

Reviewed By: gdankel

Differential Revision: D27873669

fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#57333

Pull Request resolved: pytorch/kineto#193

Expanding Kineto support to more platforms

Test Plan:
CI and OSS CI:
pytorch#56323

Reviewed By: gdankel

Differential Revision: D27873669

fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7
guyang3532 added a commit that referenced this pull request Jun 15, 2021
* Put HeartbeatMonitor in a detail namespace (#143)

Summary:
Pull Request resolved: #143

I'm not actually sure there's any potential negative consequence here,
but it's potentially harmful to use an anonymous namespace in a header, since
it creates a unique symbol for each translation unit which can lead to ODR
violations.

I happened to notice a warning for this while building OSS pytorch :-)

Reviewed By: gdankel

Differential Revision: D27383716

fbshipit-source-id: 0fffc93478bc044a543b44011943652dc13c75cd

* Fix C++14 compile error (#144)

Summary:
Pull Request resolved: #144

Accidentally used C++17 feature, and PyTorch still only supports C++14.

Reviewed By: ilia-cher

Differential Revision: D27422287

fbshipit-source-id: d5b72571f0272cb0cb25a395caf916ac3c16a4d3

* add JSON-formatted metadata for ClientTraceActivity (#164)

Summary:
Pull Request resolved: #164

Plan is to deprecate ClientTraceActivity and use a single concrete data structure to log all client side activities :`GenericTraceActivity`. CTA has a lot of explicit fields for metadata that are rarely used which results in sparse definition of activities. Instead of adding a new field, encode the key/value as a string. Note, more type-safe and rich alternative is `std::map` but we are not using it for performance reason.

Reviewed By: gdankel

Differential Revision: D27298829

fbshipit-source-id: b7a82081f983c5502f6d53104fc2d01ddee77b08

* deprecate metdata args from ClientTraceActivity (#55988)

Summary:
Pull Request resolved: pytorch/pytorch#55988

Pull Request resolved: #165

as part of the ClientTraceActivity -> GenericTraceActivity migration, move all the metadata fields to JSON encoded string

Reviewed By: gdankel

Differential Revision: D27340314

fbshipit-source-id: f55b77a779e4bda1fb8667cb4e0f4252b93af5ea

* deprecate pthreadid (#56209)

Summary:
Pull Request resolved: pytorch/pytorch#56209

Pull Request resolved: #172

in this diff of the stack, we remove the threadId field from the ClientTraceActivity as our step towards the deprecation

Reviewed By: ilia-cher

Differential Revision: D27662747

fbshipit-source-id: 040ba040390680a0fc63ddc8149c6fad940439fc

* Fix empty metadata invalid json (#173)

Summary:
Pull Request resolved: #173

Fix a case of invalid json when metadata is empty

Reviewed By: chaekit

Differential Revision: D27837639

fbshipit-source-id: 24c80671d8a15ac2d39096d401a5ec05b3339317

* Fix OSS compilation error (#176)

Summary:
Pull Request resolved: #176

In OSS not all builds support nodiscard
https://app.circleci.com/pipelines/github/pytorch/pytorch/305130/workflows/1885277b-9d54-4637-ab9f-7234be2f2ee2/jobs/12588162

Reviewed By: dzhulgakov

Differential Revision: D27872203

fbshipit-source-id: 711712f5b819fd4faf21da1c363b70960acf9518

* Support for trace metadata (#192)

Summary:
Pull Request resolved: #192

Add user-defined metadata api

Reviewed By: gdankel

Differential Revision: D27906278

fbshipit-source-id: 601de1e6f7ce61ed2cc62dd7d9a4bd3d4c955286

* Fix TraceActivity forward declaration (#199)

Summary:
Types defined as `struct` should be forward declared as struct

https://github.com/pytorch/kineto/blob/5bc9386b6d60c3b34b77961ea2900947103304b9/libkineto/include/TraceActivity.h#L21

Find while working on making PyTorch clang-tidy compliant

Pull Request resolved: #199

Reviewed By: ilia-cher

Differential Revision: D28092540

Pulled By: malfet

fbshipit-source-id: 428a5e762002f763f636957f8c207122f411f3c0

* Deprecate ClientTraceActivity and merge it with GenericTraceActivity (#56743)

Summary:
Pull Request resolved: pytorch/pytorch#56743

Pull Request resolved: #184

as part of the migration to ClientTraceActivity -> GenericTraceActivity, now that all CTA mirrors GTA's data structure, we can safely swap out the symbol name.

Reviewed By: gdankel

Differential Revision: D27353973

fbshipit-source-id: 7012c6524c3c75079029ac290c1dd722ac187ec5

* Expand Kineto profiler support (part 1) (#57333)

Summary:
Pull Request resolved: pytorch/pytorch#57333

Pull Request resolved: #193

Expanding Kineto support to more platforms

Reviewed By: gdankel

Differential Revision: D27873669

fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7

* Refactor CuptiActivityBuffer

Summary:
Refactor for better readability.
Also add locks to ensure CUPTI threads and profiler threads don't interfere.

Reviewed By: chaekit

Differential Revision: D28051462

fbshipit-source-id: c8a4f75b47c20f8467145728d4b38da8bdfcdccd

* Fix a couple of trace collection corner cases

Summary:
Fixes two issues:
1) Back-to-back collections hit concurrency issue in the observer.
2) Early exit due to running out of buffers during warmup is broken due to missing break statement.

Reviewed By: leitian, ilia-cher

Differential Revision: D28051464

fbshipit-source-id: 7ddc46a344af2a922ee8f7e904b1b4e48bfac773

* Minor refactoring of per-op metadata

Summary: I hit some jemalloc aborts (size mismatch) occasionally when freeing metadata strings. Not entirely sure I understand why but I have simplified the code and the issue has disappeared.

Reviewed By: chaekit

Differential Revision: D28051465

fbshipit-source-id: 699515409d6600f7aea14a96c1dac6fb9cad311e

* Allow specifying CPU_OP as activity type via config

Summary: CPU_OP was missing from the available activity types. It should be possible to profile only CPU ops, or to exclude them.

Reviewed By: chaekit

Differential Revision: D28051461

fbshipit-source-id: 2cfc3e4262b39ad08e38d7bf79541ce0cca97e60

* Shorten default warmup period

Summary: The current warmup time is extremely conservative and not needed for the vast majority of workloads. Change the default to something more reasonable: 5s.

Reviewed By: chaekit

Differential Revision: D28051463

fbshipit-source-id: 3919342978e7137c0bd9d57536ee1e59eeb2f704

* Remove distinction between active and inactive profiler interval (#201)

Summary:
Pull Request resolved: #201

Previously tracing buffers were being cleared frequently during warmup in order to stay under the buffer memory limit. But most use cases need at least 1-2s active trace time to capture multiple iterations and so clearing the buffers every 1s seems sufficient.

Reviewed By: chaekit

Differential Revision: D28094676

fbshipit-source-id: a6a99ff02c710587a467d1d607964939bf23f39e

* add 2 kernel metrics (#185)

Summary:
1. Update "warps per sm" to "blocks per sm".
2. Add "occupancy" per kernel.

Pull Request resolved: #185

Reviewed By: chaekit

Differential Revision: D28120846

Pulled By: gdankel

fbshipit-source-id: c7ce33b1421b60ae4323c66d38bba5eb175b105b

* Fix empty metadata invalid json (#205)

Summary:
Pull Request resolved: #205

Fix invalid json in case of empty metadata

Reviewed By: rohan-varma

Differential Revision: D28239004

fbshipit-source-id: 3e2930b1b80a5d5f6c6f92de7eee899729a048dc

* set the correct device id for GenericTraceActivity

Summary: while merging ClientTraceActivity and GenericTraceActivity, we accidentally adopted CTA's behavior of returning process id over its `device`. This causes GTA to show up in CPU timeline rather than associated GPU's

Reviewed By: gdankel

Differential Revision: D28196723

fbshipit-source-id: eb8330c14e7c43a470bb4df4811b80754d96535b

* Fix clock discrepancy (#207)

Summary:
Pull Request resolved: #207

Different platforms may use different aliases for clocks, fixing the
discrepancy by specifying system_clock directly

Reviewed By: gdankel

Differential Revision: D28272403

fbshipit-source-id: 79df6c3e67cf883ca73146a15a9de2ce226891ae

* Support for memory allocs/deallocs in the traces (#57835)

Summary:
Pull Request resolved: pytorch/pytorch#57835

Pull Request resolved: #208

Adding ability to save memory allocs/deallocs into the trace

Reviewed By: gdankel

Differential Revision: D28260915

fbshipit-source-id: d7905d38d7fac9750754ac1b293d3a1951590b5f

* Rename theoretical occupancy to est. achieved occupancy (#210)

Summary:
Pull Request resolved: #210

Theoretical occupancy is used in Nvidia docs to mean the occupancy a kernel could achieve given sufficient input parallelism (grid size).

It's more interesting in a trace to know what occupancy is actually achieved. Then if it's low, theoretical occupancy can be studied for each kernel to estimate improvement from increasing grid size. But that's something we can do as part of a recommendation or tutorial.

Reviewed By: ilia-cher

Differential Revision: D28327605

fbshipit-source-id: 80b54a955fa2885dbbd45006c45f2fb0d9fca2d2

* Log compute properties to trace file (#211)

Summary:
Pull Request resolved: #211

Device properties are useful in any case, but especially for performing analysis on traces such as occupancy.

This patch is a re-implementation of #209

Reviewed By: ilia-cher

Differential Revision: D28337067

fbshipit-source-id: 87267d53bdc1c257db452319339518539b81efed

* Kineto cmake fix (#216)

Summary:
Pull Request resolved: #216

Check the variable value before adding dep on cupti

Reviewed By: gdankel, malfet

Differential Revision: D28363222

fbshipit-source-id: 09f7da1e756ee35559e79ec7a2e5018223f5a12f

* Call [Get|Set]ThreadDescription via runtime linking (#224)

Summary:
SetThreadDescription is not available on some older runtimes/older
versions of Windows, which can lead to to following linker errors:
```
kineto.lib(ThreadUtil.cpp.obj) : error LNK2019: unresolved external symbol __imp_SetThreadDescription referenced in function "bool __cdecl libkineto::setThreadName(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?setThreadName@libkineto@YA_NAEBV?$basic_string@DU?$char_traits@D@std@V?$allocator@D@2@std@@z)
```

Use runtime linking method recommended in
See https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreaddescription

Pull Request resolved: #224

Reviewed By: ilia-cher

Differential Revision: D28404596

Pulled By: malfet

fbshipit-source-id: 01937821a131aeff562aed5ebf274eac74bd816e

* Get rid of tautological check (#225)

Summary:
`daemonConfigLoaderFactory` is a function and therefore always non-null

Fixes following clang warning:
```
../third_party/kineto/libkineto/src/ConfigLoader.cpp:161:7: warning: address of function 'daemonConfigLoaderFactory' will always evaluate to 'true' [-Wpointer-bool-conversion]
  if (daemonConfigLoaderFactory && daemonConfigLoaderFactory()) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~
```

Pull Request resolved: #225

Reviewed By: ilia-cher

Differential Revision: D28405389

Pulled By: malfet

fbshipit-source-id: 3930ab1acd6f2909602eada246d9ad01d9a923e3

* Remove dependency on CuptiInterface from loggers

Summary: Now that we're extracting SM count from CUDA API in the chrome trace logger, there is no longer any need for passing in the CUPTI API wrapper.

Reviewed By: ilia-cher

Differential Revision: D28579642

fbshipit-source-id: 75004e092d3bc9c26435a02ce1a2034c0ea70004

* Add support for multiple protocols in client API (#251)

Summary:
Pull Request resolved: #251

Add a way to add support for multiple protocols and associated logger objects.

Reviewed By: ilia-cher

Differential Revision: D28579660

fbshipit-source-id: 497c45eca0529ba0620d635c6605fc0f20172d9c

* Invert ConfigLoader -> profiler dependency (#271)

Summary:
Pull Request resolved: #271

Change direction of dependency from ConfigLoader -> profiler to profiler -> ConfigLoader. This way, the profiler is able to lazily initialize config loader and also move towards the pluggable profiler design.

Reviewed By: xw285cornell

Differential Revision: D28802798

fbshipit-source-id: 56e88b223fe8fc5276500e1b9a19c602450ae6dc

* Revert verbose logging

Summary: Verbose logging was accidentally left enabled, revert.

Reviewed By: ilia-cher

Differential Revision: D28916054

fbshipit-source-id: 9dd3bd831190f246990f1fe17c04ea505ce219e5

* Back out "Invert ConfigLoader -> profiler dependency"

Summary: S233925

Reviewed By: satgera, xw285cornell

Differential Revision: D28907829

fbshipit-source-id: 2224720c178fb885d8dccb8a38f50c856e48bdd7

* Refactor trace activities (#59360)

Summary:
Pull Request resolved: pytorch/pytorch#59360

Pull Request resolved: #206

Replace ClientTraceActivity with GenericActivity.
In addition:
* Add a couple of new activity types for user annotations
* Simplify code for GPU-side user annotations
* Add accessor to containing trace span object in activities. Later we can replace this with a trace context / trace session object.
* Simplified MemoryTraceLogger
* Added early exit for cupti push/pop correlation ID

Reviewed By: ilia-cher

Differential Revision: D28231675

fbshipit-source-id: 7129f2493016efb4d3697094f24475e2c39e6e65

* add activity profiler plugin interface and test (#202)

Summary:
Pull Request resolved: #202

# Activity Profiler Interface
Adds the child Activiity profiler interface and implementation. This interface can be used by libraries and frameworks to supply trace events to Kineto.
The first version only consolidates trace events and does not handle correlation yet.

## Details
* Add Activity Profiler interface header that includes both profiler and the profiler session.
   A session manages all the trace event data captured by the respective profiler
* Also had to move GenericTrace activity to includes dir as it used in the profiler session interface.
* Creates sessions and starts and stops them in the primary ActivityProfilers flow.
* First use-case is to integrate with Glow (Graph Lowering Compiler) , adding a glow_runtime event type for it.
https://github.com/pytorch/glow/blob/master/docs/Tracing.md

Reviewed By: gdankel

Differential Revision: D27601906

fbshipit-source-id: 38e608a3a1a6e1b9e69f80d07312e388ed7ad339

* Do not leak file descriptors in processName() (#281)

Summary:
Each `fopen()` must be followed by `fclose()`

Pull Request resolved: #281

Reviewed By: seemethere

Differential Revision: D29008280

Pulled By: malfet

fbshipit-source-id: 2171740a4f95bb980fe41cae1080f38458069836

* Merge Plugin/0.2 (#284)

Summary: Pull Request resolved: #284

Reviewed By: leitian, chaekit

Differential Revision: D29052618

Pulled By: gdankel

fbshipit-source-id: 9f38cdfc7c7e73f5f62844ef857ebe6fed46f30a

* Read base config via daemon (#291)

Summary:
Pull Request resolved: #291

In container setups, libkineto may not be able to read config file from the host file system, so add a path to retrieve it from the daemon.

Reviewed By: briancoutinho

Differential Revision: D28877610

fbshipit-source-id: a5f513524ac7afea14a7089677c5eff96b3530c6

Co-authored-by: Bert Maher <[email protected]>
Co-authored-by: Gisle Dankel <[email protected]>
Co-authored-by: Jay Chae <[email protected]>
Co-authored-by: Ilia Cherniavskii <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Teng Gao <[email protected]>
Co-authored-by: Chris Cai <[email protected]>
Co-authored-by: Brian Coutinho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants