On gemma3n with decode batch > 1, this happens when the embedding is coupled with PLE via einsum.
The export steps are:
1) Initial: BMM([b,2048]x[2048,7680] -> [b,7680])
2) FuseInputReshape_BatchMatMulWithFlattenedRhsDims: BMM([b,2048]x[2048,7680] -> [b,7680])
3) ConvertBatchMatMulOp2FullyConnectedOp_Rank2ConstantRhs: FC([b,2048]x[2048,7680] -> [b,7680])
4) StrictQuantizationPattern(by IsDrqTensor): FC([b,1,2048]x[2048,7680] -> [b,7680])
When the FC's keep_num_dims is false and the op is followed by a reshape op (as in gemma3n), EnableFullyConnectedKeepNumDimsBeforeReshape later sets keep_num_dims to true with the correct shapes.
LiteRT-Converter-PiperOrigin-RevId: 847813526
Imported from GitHub PR https://github.com/google-ai-edge/LiteRT/pull/4837
Copybara import of the project:
--
12b5bfe82e5d1575df2b49e7dc819a88b5313b61 by chunhsue-qti <chunhsue@qti.qualcomm.com>:
Qualcomm AI Engine Direct - Add QNN E-wise Max & Div INT16 tests.
Co-Authored-By: William Lin <chengwl@qti.qualcomm.com>
Merging this change closes #4837
COPYBARA_INTEGRATE_REVIEW=https://github.com/google-ai-edge/LiteRT/pull/4837 from graham0824:dev/chunhsue/add_op_test 12b5bfe82e5d1575df2b49e7dc819a88b5313b61
LiteRT-PiperOrigin-RevId: 846884452
The GoogleTensorCompileFlatbuffer API now receives the SOC model information embedded within the GoogleTensorOptions proto instead of as a separate argument. The compiler_plugin populates the GoogleTensorCompilerConfig within the options proto based on the provided SOC model string.
LiteRT-PiperOrigin-RevId: 846789694
- The check for the existence of Google Tensor options was incorrectly using `GetOpaqueOptions()` instead of `GetGoogleTensorOptions()`.
LiteRT-PiperOrigin-RevId: 846543576
Also introduces an API (but not ABI) modification to thrRegisterBuffer when type == kThrBufferTypeDmaBuf: buffer is now a pointer to an fd rather than the fd value itself. This prevents clients from storing an int in a void*, which is implementation-defined behavior.
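The calling convention can be sketched as follows. This is an illustration only: `BufferTypeSketch`, `ExtractDmaBufFd`, and `CallerSketch` are hypothetical names, and the real thrRegisterBuffer signature is not shown in this change description, so nothing here should be read as the actual API.

```cpp
#include <cassert>

// Hypothetical stand-in for the buffer type enum; not the real definition.
enum class BufferTypeSketch { kHostMemory, kDmaBuf };

// Hypothetical callee side: for dma-buf buffers, `buffer` points at an int
// holding the fd, so it is dereferenced instead of being cast to an int.
// Returns -1 when the buffer is not a dma-buf or the pointer is null.
int ExtractDmaBufFd(BufferTypeSketch type, void* buffer) {
  if (type != BufferTypeSketch::kDmaBuf || buffer == nullptr) return -1;
  return *static_cast<const int*>(buffer);
}

// Hypothetical caller side: pass the address of the fd, never the fd value
// converted to a pointer.
void CallerSketch() {
  int fd = 42;  // stand-in value; a real caller would hold an open dma-buf fd
  assert(ExtractDmaBufFd(BufferTypeSketch::kDmaBuf, &fd) == 42);
}
```

The fd variable only needs to stay alive for the duration of the call in this sketch; whether the real API reads it later is not stated in the change description.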
LiteRT-PiperOrigin-RevId: 846468851
Also, add a `numThreads` option.
Additionally, fix a bug where the wrong options were passed when compiling the model (hence the changes to other tests).
LiteRT-PiperOrigin-RevId: 846355706
This does not add the `pre_compiled` test, as it only contains the Qualcomm model.
Additionally, `single_op_device_tests` is failing and may brick your device, so it is excluded for now.
LiteRT-PiperOrigin-RevId: 846021174