Expose conv2d weights/biases as ops #14566
base: main
Conversation
Clang-Tidy found issue(s) with the introduced code (1/1)
#include <sys/types.h>
#include <cstdint>

using namespace tt;
do not use namespace using-directives; use using-declarations instead
#include "ttnn/cpp/ttnn/operations/data_movement/reshape_view/reshape.hpp"
#include "ttnn/tensor/tensor.hpp"

using namespace tt;
do not use namespace using-directives; use using-declarations instead
@@ -104,8 +176,16 @@ def conv2d(
memory_config: ttnn.MemoryConfig = None,  # memory config overrides by user
conv_op_cache={},  # basic conv object caching in python needed for intermediate refactoring. Not needed after full op refactoring in C++.
debug=False,  # ignored
return_output_size=False,
return_prepared_device_weights=False,
Why complicate the API like this?
If the values are computed then there's no additional cost to returning them.
)


def prepare_conv_bias(
Shouldn't new ops use the new op framework?
If you use that, then bind_registered_operation should remove the need for this extra code.
@@ -30,6 +30,8 @@ def Conv1d(
conv_config: Conv1dConfig = None,  # config overrides by user
conv_op_cache={},  # basic conv object caching in python needed for intermediate refactoring. Not needed after full op refactoring in C++.
debug=False,
return_output_length=False,
return_prepared_device_weights=False,
Why complicate the API like this?
If the values are computed then there's no additional cost to returning them.
Same comment below for 2d.
Force-pushed from c368626 to 638c23e
Clang-Tidy found issue(s) with the introduced code (1/2)
        conv_config.act_block_h_override = constants::TILE_HEIGHT;
    }
}

template <typename T>
Result conv2d(
unknown type name Result
Result Conv2dOperation::invoke(
    uint8_t queue_id,
template std::tuple<ttnn::Tensor, uint32_t, uint32_t, ttnn::Tensor, std::optional<ttnn::Tensor>> conv2d<Device>(
explicit template instantiation cannot have a definition; if this definition is meant to be an explicit specialization, add <> after the template keyword
Suggested change:
- template std::tuple<ttnn::Tensor, uint32_t, uint32_t, ttnn::Tensor, std::optional<ttnn::Tensor>> conv2d<Device>(
+ template<> std::tuple<ttnn::Tensor, uint32_t, uint32_t, ttnn::Tensor, std::optional<ttnn::Tensor>> conv2d<Device>(
Result Conv2dOperation::invoke(
    uint8_t queue_id,
template std::tuple<ttnn::Tensor, uint32_t, uint32_t, ttnn::Tensor, std::optional<ttnn::Tensor>> conv2d<Device>(
no function template matches function template specialization conv2d
template <typename T>
std::pair<ttnn::Tensor, std::optional<ttnn::Tensor>> prepare_conv_weights_biases_and_move_to_device(const ttnn::Tensor& weight_tensor, std::optional<const ttnn::Tensor>& bias_tensor, uint32_t input_channels_alignment, DataType weights_bias_dtype, uint32_t weight_block_h_ntiles, uint32_t weight_block_w_ntiles, const sliding_window::ParallelConfig& parallel_config, T * device, uint32_t groups, uint32_t act_block_h_ntiles, uint32_t input_width);

template <typename T>
Result conv2d(
unknown type name Result
@@ -215,7 +45,6 @@
std::optional<const Conv2dConfig> conv_config_ = std::nullopt,
const std::optional<const MemoryConfig> memory_config = std::nullopt);


struct Conv2dOperation{
static Result invoke(
unknown type name Result
if (num_cores_nhw < compute_grid_size.x && out_nhw_ntiles > compute_grid_size.x) {
    num_cores_nhw = find_closest_largest_divisor_with_num_padding(out_nhw_ntiles, compute_grid_size.x);
}
grid = num_cores_to_corerange_set(num_cores_nhw, compute_grid_size, true);
use of undeclared identifier num_cores_to_corerange_set; did you mean num_cores_to_corerangeset?
Suggested change:
- grid = num_cores_to_corerange_set(num_cores_nhw, compute_grid_size, true);
+ grid = num_cores_to_corerangeset(num_cores_nhw, compute_grid_size, true);
Clang-Tidy found issue(s) with the introduced code (2/2)
} else if (shard_layout == TensorMemoryLayout::WIDTH_SHARDED) {
    num_cores_nhw = 1;
    uint32_t num_cores_c = find_closest_common_largest_divisor(out_c_ntiles, std::ceil((float)input_channels / effective_tile_width), max_num_cores);
    grid = num_cores_to_corerange_set(num_cores_c, compute_grid_size, true);
use of undeclared identifier num_cores_to_corerange_set; did you mean num_cores_to_corerangeset?
Suggested change:
- grid = num_cores_to_corerange_set(num_cores_c, compute_grid_size, true);
+ grid = num_cores_to_corerangeset(num_cores_c, compute_grid_size, true);
Added new weight and bias preparation ops, and a new conv op that can only take pre-prepared weights:
- Working conv test with pre-prepared weights
- Added return weight/output dims kwargs
- Only auto-shard if shard_layout not specified
- Pass input memory config to prepare functions
- Organize utility functions into their own files
Force-pushed from 638c23e to f22c36d
Tickets
#13219, #12793
Problem description
Modelling convolutions in MLIR is difficult when you need to execute conv2d once in order to get the proper inputs for the next iteration.
What's changed
Moving weight and bias preparation to dedicated ops. The logic that prepares the weights on host already exists and is functional; the issue is that it is done within conv2d. Most of the code changes are copy/pasted.

Checklist