array.fill_ is currently implemented as a host-to-device (HtoD) copy followed by a memtile kernel. During graph capture, the address of the source argument is recorded and used for the HtoD copies in the subsequent ...
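
To make the address-recording behaviour concrete, here is a minimal toy model in plain Python. It is only a sketch under stated assumptions: no GPU or real graph API is involved, the memtile step is left out, and `ToyGraph`, `fill_captured`, `host_value` and `device_value` are hypothetical names invented for illustration, not part of any library. The "graph" stores a reference to the source buffer (standing in for its address) instead of a snapshot of its contents, so each replay copies whatever that memory holds at replay time.

```python
# Toy model only: no GPU or real graph API is involved.
# ToyGraph, fill_captured and the buffers below are hypothetical names.


class ToyGraph:
    """Records copy nodes by source *reference* (standing in for an address)."""

    def __init__(self):
        self.nodes = []

    def record_copy(self, dst, src):
        # Only the reference to src is stored, not a snapshot of its bytes.
        self.nodes.append((dst, src))

    def replay(self):
        for dst, src in self.nodes:
            # The copy reads whatever src holds at replay time.
            dst[:len(src)] = src


def fill_captured(graph, device_buf, host_src):
    """Stand-in for a fill: a captured host-to-device copy of the fill value."""
    graph.record_copy(device_buf, host_src)


host_value = bytearray(b"\x07")  # stand-in for the host-side source argument
device_value = bytearray(1)      # stand-in for device memory

graph = ToyGraph()
fill_captured(graph, device_value, host_value)

graph.replay()
first_replay = bytes(device_value)   # value seen by the first replay

host_value[0] = 0x2A                 # memory at the recorded "address" changes
graph.replay()
second_replay = bytes(device_value)  # value seen by the second replay
```

In this model the second replay picks up the modified host contents even though the fill was captured with a different value. With a real host pointer the memory could also have been freed or reused by then, which this sketch does not attempt to model.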