Provides a common interface for CUDA kernels.
Runs the kernel with the given inputs and outputs.
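As a rough illustration, a common kernel interface of this kind might look like the C++ sketch below. Several things in it are hypothetical and do not come from the original API: the names `Kernel`, `AddKernel`, and `dispatch`, and the element-count parameter `n`. Also, `run` does its work in a host-side loop rather than launching a real CUDA kernel, so the sketch compiles and runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical common interface for kernels. In a real CUDA project, run()
// would launch a __global__ function on device memory; this sketch keeps the
// body on the host so it compiles and runs without a GPU.
class Kernel {
public:
    virtual ~Kernel() = default;

    // Runs the kernel with the given inputs and outputs.
    // `n` (elements per buffer) is an assumption of this sketch.
    virtual void run(const std::vector<const float*>& inputs,
                     const std::vector<float*>& outputs,
                     std::size_t n) = 0;
};

// Hypothetical concrete kernel: element-wise addition of two inputs.
class AddKernel : public Kernel {
public:
    void run(const std::vector<const float*>& inputs,
             const std::vector<float*>& outputs,
             std::size_t n) override {
        // This kernel expects exactly two inputs and one output.
        assert(inputs.size() == 2 && outputs.size() == 1);
        const float* a = inputs[0];
        const float* b = inputs[1];
        float* c = outputs[0];
        // Host-side stand-in for something like add<<<blocks, threads>>>(a, b, c, n).
        for (std::size_t i = 0; i < n; ++i) {
            c[i] = a[i] + b[i];
        }
    }
};

// Hypothetical caller that depends only on the Kernel interface.
void dispatch(Kernel& kernel,
              const std::vector<const float*>& inputs,
              const std::vector<float*>& outputs,
              std::size_t n) {
    kernel.run(inputs, outputs, n);
}
```

Because callers such as `dispatch` see only the abstract `Kernel`, a new kernel can be added by subclassing and overriding `run` without changing the calling code.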