Fig. 3
From: Scalable non-negative matrix tri-factorization

Computational and data transfer workflow for the block-wise update of factor matrix U on an architecture with four processing units, with the data matrix X partitioned into 2×2 blocks. Each vertical band represents a processing unit (PU0 to PU3). Stages at which all data are available for the next wave of asynchronous operations are horizontally aligned and marked with t_i, i ∈ {0, 1, …, 11}.
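
The block-wise update of U described in the caption can be illustrated with a minimal NumPy sketch. It assumes the common unconstrained multiplicative rule for X ≈ U S Vᵀ, namely U ← U ∘ (X V Sᵀ) / (U S Vᵀ V Sᵀ); this rule is an assumption and may differ from the one used in the article. The function names `update_U_full` and `update_U_blockwise` are hypothetical. The sketch runs sequentially on one device, so it shows only the arithmetic that a 2×2 partition makes possible, not the assignment of blocks to PU0 to PU3, the data transfers, or the synchronization stages t_i.

```python
import numpy as np

def update_U_full(X, U, S, V, eps=1e-12):
    """Assumed multiplicative update of U on the unpartitioned matrices."""
    num = X @ V @ S.T                  # shape (n, k1)
    den = U @ S @ (V.T @ V) @ S.T      # shape (n, k1)
    return U * num / (den + eps)

def update_U_blockwise(X_blocks, U_blocks, S, V_blocks, eps=1e-12):
    """Same update, computed from row blocks of U, row blocks of V,
    and the matching grid of blocks X_blocks[i][j]."""
    # Reduction over column blocks: V^T V is the sum of V_j^T V_j.
    VtV = sum(Vj.T @ Vj for Vj in V_blocks)
    new_U = []
    for i, Ui in enumerate(U_blocks):
        # Partial products X_ij V_j are summed over j before applying S^T.
        XV = sum(X_blocks[i][j] @ Vj for j, Vj in enumerate(V_blocks))
        num = XV @ S.T
        den = Ui @ S @ VtV @ S.T
        new_U.append(Ui * num / (den + eps))
    return new_U
```

Each partial product `X_blocks[i][j] @ V_blocks[j]` depends on a single block of X, which is what would allow the four blocks to be processed on separate units before the partial results are combined.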