i#6643: Add finalize_interval_snapshots API to analysis_tool_t (#6664)

Adds a new finalize_interval_snapshots API to analysis_tool_t. In parallel mode it is invoked separately for each shard with that shard's list of shard-local interval state snapshots; in serial mode it is invoked with the whole-trace snapshots. This gives the tool an opportunity to make any required holistic adjustments now that all of the snapshots can be observed together, e.g., computing diffs with the prior snapshot. It is invoked before the shard-local snapshots are possibly combined to create whole-trace snapshots, and before the snapshots are passed to print_interval_results.

Adds unit tests for the new API to the existing tool.drcacheoff.trace_interval_analysis_unit_tests tests.

Refactors some existing code to accumulate the interval snapshots in an std::vector instead of an std::queue. This adds some complexity to the merge_shard_interval_results implementation, but is better because more usages now need an std::vector (and we want to avoid a back-and-forth conversion between a queue and a vector).

Augments various documentation to provide more details about intended usages of the interval APIs. Notably: documents the new finalize_interval_snapshots API, and that modifications made after the combine_interval_snapshots API has been invoked do not have any effect.

Issue: #6643, #6020
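
The "computing diffs with the prior snapshot" use case mentioned above could look roughly like the following. This is only a minimal sketch, not the code from this commit: snapshot_base_t is a hypothetical stand-in for interval_state_snapshot_t so the example compiles without the drmemtrace headers, and counter_snapshot_t, its fields, and demo_deltas() are invented for illustration. A real tool would instead override finalize_interval_snapshots() in its analysis_tool_t subclass.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for analysis_tool_t::interval_state_snapshot_t. The
// real base class carries framework-set private members, omitted here.
struct snapshot_base_t {
    virtual ~snapshot_base_t() = default;
};

// Hypothetical tool snapshot: a cumulative counter captured when the interval
// ended, plus a delta that is only filled in at finalize time.
struct counter_snapshot_t : snapshot_base_t {
    explicit counter_snapshot_t(uint64_t cumulative)
        : cumulative_count(cumulative)
    {
    }
    uint64_t cumulative_count;
    uint64_t delta_count = 0;
};

// Mirrors the shape of the new callback: it sees the whole list at once, so
// each snapshot can be diffed against the one before it.
bool
finalize_interval_snapshots(std::vector<snapshot_base_t *> &interval_snapshots)
{
    uint64_t prev = 0;
    for (snapshot_base_t *base : interval_snapshots) {
        auto *snap = dynamic_cast<counter_snapshot_t *>(base);
        // Fail on a foreign snapshot type or a counter that went backwards.
        if (snap == nullptr || snap->cumulative_count < prev)
            return false;
        snap->delta_count = snap->cumulative_count - prev;
        prev = snap->cumulative_count;
    }
    return true;
}

// Illustration-only driver (not part of any real API): builds snapshots from
// cumulative counts, finalizes them, and returns the per-interval deltas.
// Returns an empty vector if finalization fails.
std::vector<uint64_t>
demo_deltas(const std::vector<uint64_t> &cumulative)
{
    std::vector<counter_snapshot_t> storage;
    storage.reserve(cumulative.size());
    for (uint64_t count : cumulative)
        storage.emplace_back(count);
    std::vector<snapshot_base_t *> list;
    for (counter_snapshot_t &snap : storage)
        list.push_back(&snap);
    std::vector<uint64_t> deltas;
    if (!finalize_interval_snapshots(list))
        return deltas;
    for (const counter_snapshot_t &snap : storage)
        deltas.push_back(snap.delta_count);
    return deltas;
}
```

The sketch only modifies tool-owned fields and leaves the list itself untouched, which stays within the documented constraints: framework-set members are read-only, and adding snapshots to the list is undefined behavior.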
DynamoRIO · Feb 22, 2024 · 41b55f2
1 parent 4bf1163
commit 41b55f2

Showing 5 changed files with 227 additions and 120 deletions.
diff --git a/api/docs/release.dox b/api/docs/release.dox
@@ -142,11 +142,11 @@ changes:
    refers to timestamps and direct switches, which is what most users should want.
  - Rename the macro INSTR_CREATE_mul_sve to INSTR_CREATE_mul_sve_imm to
    differentiate it from the other SVE MUL instructions.
- - Added a new drmemtrace analyzer option \p -interval_instr_count that enables trace
-   analyzer interval results for every given count of instrs in each shard. This mode
-   does not support merging the shard interval snapshots to output the whole-trace
-   interval snapshots. Instead, the print_interval_results() API is called separately
-   for each shard with the interval state snapshots of that shard.
+ - Renamed a protected data member in #dynamorio::drmemtrace::analyzer_tmpl_t from
+   merged_interval_snapshots_ to whole_trace_interval_snapshots_ (may be relevant for
+   users sub-classing analyzer_tmpl_t).
+ - Converted #dynamorio::drmemtrace::analysis_tool_tmpl_t::interval_state_snapshot_t
+   into a class with all its data members marked private with public accessor functions.
 
 Further non-compatibility-affecting changes include:
  - Added DWARF-5 support to the drsyms library by linking in 4 static libraries
@@ -203,11 +203,15 @@ Further non-compatibility-affecting changes include:
  - Added #dynamorio::drmemtrace::TRACE_MARKER_TYPE_VECTOR_LENGTH marker to indicate the
    current vector length for architectures with a hardware defined or runtime changeable
    vector length (such as AArch64's SVE scalable vectors).
- - Renamed a protected data member in #dynamorio::drmemtrace::analyzer_tmpl_t from
-   merged_interval_snapshots_ to whole_trace_interval_snapshots_ (may be relevant for
-   users sub-classing analyzer_tmpl_t).
- - Converted #dynamorio::drmemtrace::analysis_tool_tmpl_t::interval_state_snapshot_t
-   into a class with all its data members marked private with public accessor functions.
+ - Added a new drmemtrace analyzer option \p -interval_instr_count that enables trace
+   analyzer interval results for every given count of instrs in each shard. This mode
+   does not support merging the shard interval snapshots to output the whole-trace
+   interval snapshots. Instead, the print_interval_results() API is called separately
+   for each shard with the interval state snapshots of that shard.
+ - Added a new finalize_interval_snapshots() API to
+   #dynamorio::drmemtrace::analysis_tool_t to allow the tool to make holistic
+   adjustments to the interval snapshots after all have been generated, and before
+   they are used for merging across shards (potentially), and printing the results.
 
 **************************************************
 <hr>

diff --git a/clients/drcachesim/analysis_tool.h b/clients/drcachesim/analysis_tool.h
@@ -189,12 +189,13 @@ template <typename RecordType> class analysis_tool_tmpl_t {
     print_results() = 0;
 
     /**
-     * Struct that stores details of a tool's state snapshot at an interval. This is
+     * Type that stores details of a tool's state snapshot at an interval. This is
      * useful for computing and combining interval results. Tools should inherit from
-     * this struct to define their own state snapshot structs. Tools do not need to
-     * supply any values to construct this base struct; they can simply use the
+     * this type to define their own state snapshot types. Tools do not need to
+     * supply any values to construct this base class; they can simply use the
      * default constructor. The members of this base class will be set by the
-     * framework automatically.
+     * framework automatically, and must not be modified by the tool at any point.
+     * XXX: Perhaps this should be a class with private data members.
      */
     class interval_state_snapshot_t {
         // Allow the analyzer framework access to private data members to set them
@@ -220,6 +221,10 @@ template <typename RecordType> class analysis_tool_tmpl_t {
             , instr_count_delta_(instr_count_delta)
         {
         }
+        // This constructor should be used by tools that subclass
+        // interval_state_snapshot_t. The data members will be set by the framework
+        // automatically when the tool returns a pointer to their created object from
+        // generate_*interval_snapshot or combine_interval_snapshots.
         interval_state_snapshot_t()
         {
         }
@@ -257,8 +262,9 @@ template <typename RecordType> class analysis_tool_tmpl_t {
         // The following fields are set automatically by the analyzer framework after
         // the tool returns the interval_state_snapshot_t* in the
         // generate_*interval_snapshot APIs. So they'll be available to the tool in
-        // the combine_interval_snapshots (for the parameter snapshots) and
-        // print_interval_results APIs via the above public accessor functions.
+        // the finalize_interval_snapshots(), combine_interval_snapshots() (for the
+        // parameter snapshots), and print_interval_results() APIs via the above
+        // public accessor functions.
 
         // Identifier for the shard to which this interval belongs. Currently, shards
         // map only to threads, so this is the thread id. Set to WHOLE_TRACE_SHARD_ID
@@ -280,23 +286,26 @@ template <typename RecordType> class analysis_tool_tmpl_t {
     };
     /**
      * Notifies the analysis tool that the given trace \p interval_id has ended so
-     * that it can generate a snapshot of its internal state in a struct derived
+     * that it can generate a snapshot of its internal state in a type derived
      * from \p interval_state_snapshot_t, and return a pointer to it. The returned
-     * pointer will be provided to the tool in later combine_interval_snapshots()
+     * pointer will be provided to the tool in later finalize_interval_snapshots(),
      * and print_interval_result() calls.
      *
      * \p interval_id is a positive ordinal of the trace interval that just ended.
-     * Trace intervals have a length equal to the \p -interval_microseconds specified
-     * to the framework. Trace intervals are measured using the value of the
-     * #TRACE_MARKER_TYPE_TIMESTAMP markers. The provided \p interval_id
-     * values will be monotonically increasing but may not be continuous,
-     * i.e. the tool may not see some \p interval_id if the trace did not have
-     * any activity in that interval.
+     * Trace intervals have a length equal to either \p -interval_microseconds or
+     * \p -interval_instr_count. Time-based intervals are measured using the value
+     * of the #TRACE_MARKER_TYPE_TIMESTAMP markers. Instruction count intervals are
+     * measured in terms of shard-local instrs.
      *
-     * The returned \p interval_state_snapshot_t* will be passed to the
-     * combine_interval_snapshots() API which is invoked by the framework to merge
-     * multiple \p interval_state_snapshot_t from different shards in the parallel
-     * mode of the analyzer.
+     * The provided \p interval_id values will be monotonically increasing. For
+     * \p -interval_microseconds intervals, these values may not be continuous,
+     * i.e. the tool may not see some \p interval_id if the trace did not have any
+     * activity in that interval.
+     *
+     * After all interval state snapshots are generated, the list of all returned
+     * \p interval_state_snapshot_t* is passed to finalize_interval_snapshots()
+     * to allow the tool the opportunity to make any holistic adjustments to the
+     * snapshots.
      *
      * Finally, the print_interval_result() API is invoked with a list of
      * \p interval_state_snapshot_t* representing interval snapshots for the
@@ -313,6 +322,40 @@ template <typename RecordType> class analysis_tool_tmpl_t {
     {
         return nullptr;
     }
+    /**
+     * Finalizes the interval snapshots in the given \p interval_snapshots list.
+     * This callback provides an opportunity for tools to make any holistic
+     * adjustments to the snapshot list now that we have all of them together. This
+     * may include, for example, computing the diff with the previous snapshot.
+     *
+     * Tools can modify the individual snapshots and also the list of snapshots itself.
+     * If some snapshots are removed, release_interval_snapshot() will not be invoked
+     * for them and the tool is responsible to de-allocate the resources. Adding new
+     * snapshots to the list is undefined behavior; tools should operate only on the
+     * provided snapshots which were generated in prior generate_*interval_snapshot
+     * calls.
+     *
+     * Tools cannot modify any data set by the framework in the base
+     * \p interval_state_snapshot_t; note that only read-only access is allowed anyway
+     * to those private data members via public accessor functions.
+     *
+     * In the parallel mode, this is invoked for each list of shard-local snapshots
+     * before they are possibly merged to create whole-trace snapshots using
+     * combine_interval_snapshots() and passed to print_interval_result(). In the
+     * serial mode, this is invoked with the list of whole-trace snapshots before it
+     * is passed to print_interval_results().
+     *
+     * This is an optional API. If a tool chooses to not override this, the snapshot
+     * list will simply continue unmodified.
+     *
+     * Returns whether it was successful.
+     */
+    virtual bool
+    finalize_interval_snapshots(
+        std::vector<interval_state_snapshot_t *> &interval_snapshots)
+    {
+        return true;
+    }
     /**
      * Invoked by the framework to combine the shard-local \p interval_state_snapshot_t
      * objects pointed at by \p latest_shard_snapshots, to create the combined
@@ -338,6 +381,10 @@ template <typename RecordType> class analysis_tool_tmpl_t {
      *   \p interval_end_timestamp)
      * - or if the tool mixes cumulative and delta metrics: some field-specific logic that
      *   combines the above two strategies.
+     *
+     * Note that after the given snapshots have been combined to create the whole-trace
+     * snapshot using this API, any change made by the tool to the snapshot contents will
+     * not have any effect.
      */
     virtual interval_state_snapshot_t *
     combine_interval_snapshots(
@@ -350,14 +397,14 @@ template <typename RecordType> class analysis_tool_tmpl_t {
      * Prints the interval results for the given series of interval state snapshots in
      * \p interval_snapshots.
      *
-     * This is currently invoked with the list of whole-trace interval snapshots (for
-     * the parallel mode, these are the snapshots created by merging the shard-local
-     * snapshots).
+     * This is invoked with the list of whole-trace interval snapshots (for the
+     * parallel mode, these are the snapshots created by merging the shard-local
+     * snapshots). For the \p -interval_instr_count snapshots in parallel mode, this is
+     * invoked separately for the snapshots of each shard.
      *
      * The framework should be able to invoke this multiple times, possibly with a
      * different list of interval snapshots. So it should avoid free-ing memory or
-     * changing global state. This is to keep open the possibility of the framework
-     * printing interval results for each shard separately in future.
+     * changing global state.
      */
     virtual bool
     print_interval_results(
@@ -370,6 +417,10 @@ template <typename RecordType> class analysis_tool_tmpl_t {
      * by \p interval_snapshot is no longer needed by the framework. The tool may
      * de-allocate it right away or later, as it needs. Returns whether it was
      * successful.
+     *
+     * Note that if the tool removed some snapshot from the list passed to
+     * finalize_interval_snapshots(), then release_interval_snapshot() will not be
+     * invoked for that snapshot.
      */
     virtual bool
     release_interval_snapshot(interval_state_snapshot_t *interval_snapshot)
@@ -476,10 +527,10 @@ template <typename RecordType> class analysis_tool_tmpl_t {
     /**
      * Notifies the analysis tool that the given trace \p interval_id in the shard
      * represented by the given \p shard_data has ended, so that it can generate a
-     * snapshot of its internal state in a struct derived from \p
+     * snapshot of its internal state in a type derived from \p
      * interval_state_snapshot_t, and return a pointer to it. The returned pointer will
-     * be provided to the tool in later combine_interval_snapshots() and
-     * print_interval_result() calls.
+     * be provided to the tool in later combine_interval_snapshots(),
+     * finalize_interval_snapshots(), and print_interval_result() calls.
      *
      * Note that the provided \p interval_id is local to the shard that is
      * represented by the given \p shard_data, and not the whole-trace interval. The
@@ -488,30 +539,22 @@ template <typename RecordType> class analysis_tool_tmpl_t {
      * shard-local \p interval_state_snapshot_t corresponding to that whole-trace
      * interval.
      *
-     * \p interval_id is a positive ordinal of the trace interval that just ended.
-     * Trace intervals have a length equal to the \p -interval_microseconds specified
-     * to the framework. Trace intervals are measured using the value of the
-     * #TRACE_MARKER_TYPE_TIMESTAMP markers. The provided \p interval_id
-     * values will be monotonically increasing but may not be continuous,
-     * i.e. the tool may not see some \p interval_id if the trace shard did not
-     * have any activity in that interval.
+     * The \p interval_id field is defined similar to the same field in
+     * generate_interval_snapshot().
      *
-     * The returned \p interval_state_snapshot_t* will be passed to the
-     * combine_interval_snapshot() API which is invoked by the framework to merge
-     * multiple \p interval_state_snapshot_t from different shards in the parallel
-     * mode of the analyzer.
+     * The returned \p interval_state_snapshot_t* is treated in the same manner as
+     * the same in generate_interval_snapshot(), with the following additions:
      *
-     * Finally, the print_interval_result() API is invoked with a list of
-     * \p interval_state_snapshot_t* representing interval snapshots for the
-     * whole trace. In the parallel mode of the analyzer, this list is computed by
-     * combining the shard-local \p interval_state_snapshot_t using the tool's
-     * combine_interval_snapshot() API.
+     * In case of \p -interval_microseconds in the parallel mode: after
+     * finalize_interval_snapshots() has been invoked, the \p interval_state_snapshot_t*
+     * objects generated at the same time period across different shards are passed to
+     * the combine_interval_snapshot() API by the framework to merge them to create the
+     * whole-trace interval snapshots. The print_interval_result() API is then invoked
+     * with the list of whole-trace \p interval_state_snapshot_t* thus obtained.
      *
-     * The tool must not de-allocate the state snapshot until
-     * release_interval_snapshot() is invoked by the framework.
-     *
-     * An example use case of this API is to create a time series of some output
-     * metric over the whole trace.
+     * In case of \p -interval_instr_count in the parallel mode: no merging across
+     * shards is done, and the print_interval_results() API is invoked for each list
+     * of shard-local \p interval_state_snapshot_t*.
      */
     virtual interval_state_snapshot_t *
     generate_shard_interval_snapshot(void *shard_data, uint64_t interval_id)