Adaptive Sparse Tiling for Sparse Matrix Multiplication (ASpT)
Adaptive Sparse Tiling; SpMM && SDDMM; 2D tiling; partitions the matrix into row panels and classifies column segments as either "dense" or "sparse"; reordering to group dense columns contiguously
3
3
2
2022
PPoPP
China University of Petroleum-Beijing
TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs
divide sparse matrices into fixed-size sparse tiles; SpGEMM; determining the tile structure of the result matrix via symbolic SpGEMM; generating the nonzero structure and row pointers for each tile using bitmask operations and binary search; performing numeric computation with an adaptive sparse or dense accumulator based on tile density
4
3
3
2024
SC
Indiana University && UIUC
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
Tall-and-Skinny Matrix Multiplication; SpGEMM; distributed-memory algorithm; 1D row partitioning combined with a virtual 2D tiling strategy; adaptively chooses between local and remote computation modes based on the sparsity pattern of each tile
3
2
3
2024
PPoPP
THU
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs (RoDe)
Row Decomposition; SpMM && SDDMM; decomposes each row into a regular part (containing a multiple of 32 nonzeros) and a residual part; block splitting technique to achieve load balancing; sub-block pipelining technique to overlap computation and memory access
3
3
2
2025
TACO
HUST
ApSpGEMM: Accelerating Large-scale SpGEMM with Heterogeneous Collaboration and Adaptive Panel
heterogeneous collaboration methods; SpGEMM && SpMM && SDDMM; lightweight analysis to extract matrix features; assigns work of varying sparsity levels to either the CPU or the GPU using core affinity analysis; synchronous overlapping of computation and data transfer
Georgia Institute of Technology && University of Delaware
Improving SpGEMM Performance Through Matrix-Reordering and Cluster-wise Computation
hierarchical clustering; SpGEMM; new format called CSR_Cluster; identifies similar rows via a single SpGEMM operation A×Aᵀ; processes each cluster collectively to improve data reuse of B
spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication Through Lightweight Analysis
a lightweight, multi-level analysis framework that dynamically selects and tunes the best algorithm; chooses between hashing and dense accumulation; direct referencing for each row based on real-time matrix characteristics
3
3
2
2025
SC
China University of Petroleum-Beijing
KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU
tensor cores as compute units; registers for local storage; shared memory for communication; 1D & 2D & 3D partitioning strategies to optimize data locality and reduce communication overhead
2
2
2
2025
HPCA
Hunan University && Arizona State University
HSMU-SpGEMM: Achieving High Shared Memory Utilization for Parallel Sparse General Matrix-Matrix Multiplication on Modern GPUs
a binary search-based accumulator design; pre-generating a sorted column index array during the symbolic stage; incorporates tailored symbolic processing for matrices of different scales
3
3
1
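The row-decomposition idea tagged for the row-decomposition paper above (a regular part containing a multiple of 32 nonzeros plus a residual part) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `decompose_rows`, the `width` parameter, and the tuple layout are hypothetical, and the sketch covers only the splitting step over a CSR row-pointer array, not block splitting or sub-block pipelining.

```python
# Hypothetical helper (not from the paper): split each CSR row into a
# "regular" part whose length is a multiple of `width` and a "residual"
# part holding the leftover nonzeros. Each part is (row, start, end),
# where start/end index into the CSR column-index and value arrays.
def decompose_rows(row_ptr, width=32):
    regular, residual = [], []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        nnz = end - start
        # Largest prefix of the row whose length is a multiple of `width`.
        split = start + (nnz // width) * width
        if split > start:
            regular.append((row, start, split))
        if end > split:
            residual.append((row, split, end))
    return regular, residual


# Three rows with 70, 5, and 64 nonzeros respectively.
row_ptr = [0, 70, 75, 139]
regular, residual = decompose_rows(row_ptr)
print(regular)
print(residual)
```

With `width=32`, the regular segments all have lengths that are multiples of the warp size, so they can be handed out in uniform chunks, while the short residual segments can be handled by a separate, lighter-weight path.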
Using Tensor Core: Hardware adaptation and computation granularity optimization
Year
Venue
Authors
Title
Tags
P
E
N
2021
SC
University of California
Efficient Tensor Core-Based GPU Kernels for Structured Sparsity under Reduced Precision