Yes, some ops support int8 input and int8 output directly.
However, we suggest that users specify which ops should use int8 precision via "quant.json". The remaining operators will not store quantization information and will use float16 by default.
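For illustration only, here is a minimal sketch of the idea behind such a per-op config: ops listed in the file carry int8 quant parameters, and anything not listed falls back to float16. The key and field names below (`quant_info`, `precision`, `scale`, `zero_point`) are assumptions made for this sketch, not ppl.nn's actual quant.json schema; please refer to the ppl.nn docs for the real format.

```json
{
  "_note": "illustrative sketch only; field names are assumptions, not the real quant.json schema",
  "quant_info": {
    "Conv_0":  { "precision": "int8", "scale": 0.0235, "zero_point": 0 },
    "Gemm_12": { "precision": "int8", "scale": 0.0078, "zero_point": 0 }
  }
}
```

Under this scheme, an operator absent from `quant_info` (say, a `Relu_3`) would have no quant entry and would run in float16.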
As you described in the docs, operators without quant information should fall back to float16. But in the code, it seems that you handle int8 as a special case. Can you tell me why you do this?
https://github.com/openppl-public/ppl.nn/blob/252e7f27eec3976a3be48bb21f15c660cddec6af/src/ppl/nn/engines/cuda/optimizer/opt_kernel.h#L264