Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiler improvements #2724

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andresovela
Copy link
Contributor

@andresovela andresovela commented Oct 18, 2024

This PR adds a few improvements to the MicroProfiler class.

bug=#2725

First, I made it a template so that it's possible to adjust the maximum number of events the profiler can handle via a template parameter.

The motivation for this change is to save memory. #1835 increased the maximum number of events from 1024 to 4096, which is a huge waste of RAM. The MicroProfiler class has these buffers:

const char* tags_[kMaxEvents];
uint32_t start_ticks_[kMaxEvents];
uint32_t end_ticks_[kMaxEvents];

struct TicksPerTag {
  const char* tag;
  uint32_t ticks;
};

TicksPerTag total_ticks_per_tag[kMaxEvents];

We need 20 bytes per event, so an instance of MicroProfiler uses 80 KB of RAM, even if the model you're profiling has only a few nodes.

This PR also adds the following enum to specify the logging format and remove some duplicated code and clean the API.

enum class MicroProfilerLogFormat {
  HumanReadable,
  Csv,
};

I replaced LogTicksPerTagCsv() with LogGrouped(), which does essentially the same thing, but also prints in a human readable format which looks like this:

Cumulative event times:
Count    Tag                              Ticks        Time
6        DEPTHWISE_CONV_2D                3085702      5.194 ms
2        SUM                              1281206      2.156 ms
66       FULLY_CONNECTED                  934142       1.572 ms
10       CONV_2D                          765357       1.288 ms
67       QUANTIZE                         731923       1.232 ms
20       CONCATENATION                    362212       0.609 ms
76       STRIDED_SLICE                    335828       0.565 ms
3        RELU                             245643       0.413 ms
4        DEQUANTIZE                       156682       0.263 ms
6        PAD                              145834       0.245 ms
29       ADD                              130229       0.219 ms
65       RESHAPE                          49593        0.083 ms
13       MUL                              48852        0.082 ms
4        SUB                              16371        0.027 ms
6        LOGISTIC                         10336        0.017 ms
3        TANH                             4761         0.008 ms
3        UNPACK                           2738         0.004 ms

Total time: 13.985 ms (8307409 ticks)

Finally I also added TicksToUs() to micro_time.h for convenience.

@andresovela andresovela requested a review from a team as a code owner October 18, 2024 13:11
// sections. This can be used in conjunction with running the relevant micro
// benchmark to evaluate end-to-end performance.
template <int MAX_EVENTS>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be template <int MAX_EVENTS = 4096> so that all existing applications using the profiler do not need to be changed.

Comment on lines +107 to +118
case MicroProfilerLogFormat::Csv:
MicroPrintf("\"Event\",\"Tag\",\"Ms\",\"Ticks\"");
for (int i = 0; i < num_events_; ++i) {
#if defined(HEXAGON) || defined(CMSIS_NN)
int ticks = end_ticks_[i] - start_ticks_[i];
MicroPrintf("%d,%s,%u,%d", i, tags_[i], TicksToMs(ticks), ticks);
#else
uint32_t ticks = end_ticks_[i] - start_ticks_[i];
uint64_t us = TicksToUs(ticks);
MicroPrintf("%d,%s,%" PRIu64 ".%" PRIu64 ",%" PRIu32, i, tags_[i],
us / 1000, us % 1000, ticks);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the CSV format probably won't pass review, as it most likely breaks countless internal Google, and external third-party scripts.

const char* tag;
uint32_t ticks;
uint32_t tag_count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only used by the "human readable" log format.

Comment on lines +209 to +219
void SortTagGroups() {
for (int i = 0; i < num_tag_groups_ - 1; ++i) {
for (int j = i + 1; j < num_tag_groups_; ++j) {
if (tag_groups_[j].ticks > tag_groups_[i].ticks) {
TagGroup temp_tag_group = tag_groups_[i];
tag_groups_[i] = tag_groups_[j];
tag_groups_[j] = temp_tag_group;
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use standard C++ library algorithms (as long as they don't require memory allocation). Here std::sort should be used.

Comment on lines -36 to +49
virtual uint32_t BeginEvent(const char* tag) override;
uint32_t BeginEvent(const char* tag) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an override of the abstract base class, and should remain as such.

Comment on lines -45 to +71
virtual void EndEvent(uint32_t event_handle) override;
void EndEvent(uint32_t event_handle) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an override of the abstract base class, and should remain as such.


// Prints the profiling information of each of the events in human readable
// form.
void Log() const;
void Log(MicroProfilerLogFormat format) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a default argument of MicroProfilerLogFormt::Csv so that all existing applications continue to compile without modification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants