Runs the kernel with the given inputs and outputs.
An array of CUDABuffer objects, one for each dependency of the operation used to construct this kernel.
The destination buffer that receives the kernel's output.
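The calling contract described above (one input buffer per dependency, with the result written to a destination buffer) can be sketched in a few lines of host-side Python. This is only an illustration under stated assumptions: `Buffer`, `Kernel`, `num_dependencies`, `body`, and `elementwise_add` are hypothetical stand-ins, not names from this library, and no GPU work is performed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Buffer:
    """Hypothetical host-side stand-in for a CUDABuffer."""
    data: List[float] = field(default_factory=list)


@dataclass
class Kernel:
    """Hypothetical kernel built from an operation with a fixed number of dependencies."""
    num_dependencies: int
    body: Callable[..., List[float]]

    def run(self, inputs: Sequence[Buffer], output: Buffer) -> None:
        # One input buffer is expected per dependency of the operation.
        if len(inputs) != self.num_dependencies:
            raise ValueError(
                f"expected {self.num_dependencies} input buffers, got {len(inputs)}"
            )
        # The result is written into the destination buffer, not returned.
        output.data = self.body(*(buf.data for buf in inputs))


def elementwise_add(a: List[float], b: List[float]) -> List[float]:
    # Hypothetical operation with two dependencies.
    return [x + y for x, y in zip(a, b)]


add_kernel = Kernel(num_dependencies=2, body=elementwise_add)
dst = Buffer()
add_kernel.run([Buffer([1.0, 2.0]), Buffer([3.0, 4.0])], dst)
```

In this sketch the caller allocates the destination buffer up front and reads the result from it after `run` returns. Passing the wrong number of input buffers raises an error.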
See Implementation