Yes, some ops support int8 input and int8 output directly.
However, we suggest that users specify which ops should use int8 precision via "quant.json". The remaining operators will not store quantization information and will use float16 by default.
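For illustration only, here is a minimal sketch of the idea behind such a per-op config: ops listed in the file carry int8 quant parameters, and anything not listed falls back to float16. The key and field names below (`quant_info`, `precision`, `scale`, `zero_point`) are assumptions made for this sketch, not ppl.nn's actual quant.json schema; please refer to the ppl.nn docs for the real format.

```json
{
  "_note": "illustrative sketch only; field names are assumptions, not the real quant.json schema",
  "quant_info": {
    "Conv_0":  { "precision": "int8", "scale": 0.0235, "zero_point": 0 },
    "Gemm_12": { "precision": "int8", "scale": 0.0078, "zero_point": 0 }
  }
}
```

Under this scheme, an operator absent from `quant_info` (say, a `Relu_3`) would have no quant entry and would run in float16.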
As you described in the docs, operators without quant information should fall back to float16. But in the code, it seems that you handle int8 as a special case. Can you tell me why you do this?
https://github.com/openppl-public/ppl.nn/blob/252e7f27eec3976a3be48bb21f15c660cddec6af/src/ppl/nn/engines/cuda/optimizer/opt_kernel.h#L264