Solution: an algorithm is a well-defined, finite sequence of steps that solves a specific problem or accomplishes a particular task. We focus on algorithms designed to solve such problems.
Solution: Dynamic graph algorithms efficiently update solutions to graph problems as the graph evolves, addressing the challenge of frequent changes in structure and data.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2023 | ASPLOS | UCR | CommonGraph: Graph Analytics on Evolving Data | convert deletions to additions; common graph concept; Triangular Grid (TG) for work sharing; mutation-free representation | | | |
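The deletion-to-addition conversion in the tags can be illustrated with a toy sketch (my reconstruction from the tags, not the paper's implementation): the edge intersection of all snapshots forms a mutation-free common graph, and every snapshot is then reachable from it by edge additions only, so deletions never have to be executed.

```python
def common_graph(snapshots):
    """Edge intersection of all snapshots (each snapshot: a set of edges)."""
    base = set.intersection(*snapshots)
    # Per-snapshot delta contains only additions relative to the common base.
    deltas = [s - base for s in snapshots]
    return base, deltas

s1 = {(1, 2), (2, 3), (3, 4)}
s2 = {(1, 2), (3, 4), (4, 5)}          # (2, 3) deleted, (4, 5) added
base, deltas = common_graph([s1, s2])
assert base == {(1, 2), (3, 4)}
assert base | deltas[0] == s1 and base | deltas[1] == s2
```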
Solution: ML algorithms are fundamental tools that enable computers to learn from data and make predictions or decisions without being explicitly programmed.
Solution: the Transformer is a mature architecture with known limitations, such as the quadratic complexity of attention. These limitations motivate new algorithms that improve on the original design.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2019 | arXiv | Google | Fast Transformer Decoding: One Write-Head is All You Need | MQA; share same KV cache for all heads; multi-query attention | 1 | 4 | 3 |
| 2024 | NeuroComputing | ZhuiYi | RoFormer: Enhanced Transformer with Rotary Position Embedding | use rotary position embedding to fix the problem of long context; inter-word dependencies decay gradually with the increase of relative distance | 3 | 4 | 3 |
| 2025 | arXiv | Qwen | Parallel Scaling Law for Language Models | improve performance by scaling parallel computation instead of model size; run multiple parallel streams and aggregate them into one output | | | |
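The MQA row above is easy to show concretely. This is a minimal sketch (shapes and names are mine, not the paper's code): all H query heads share a single K/V head, shrinking the KV cache by a factor of H.

```python
import numpy as np

def mqa(x, Wq, Wk, Wv, n_heads):
    """Multi-query attention: per-head queries, one shared K/V head."""
    T, d = x.shape
    hd = d // n_heads
    q = (x @ Wq).reshape(T, n_heads, hd)   # per-head queries
    k, v = x @ Wk, x @ Wv                  # ONE shared K/V head: shape (T, hd)
    out = np.empty_like(q)
    for h in range(n_heads):               # every head attends to the same k, v
        scores = q[:, h, :] @ k.T / np.sqrt(hd)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        out[:, h, :] = w @ v
    return out.reshape(T, d)

rng = np.random.default_rng(0)
T, d, H = 4, 8, 4
y = mqa(rng.normal(size=(T, d)), rng.normal(size=(d, d)),
        rng.normal(size=(d, d // H)), rng.normal(size=(d, d // H)), H)
assert y.shape == (T, d)
```

The cache saving is visible in the shapes: K and V are `(T, hd)` instead of `(T, n_heads, hd)`.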
Challenge: diffusion models generate output from noise, unlike the traditional autoregressive (AR) paradigm. Diffusion LLMs must therefore reconcile the logical order of text with the random order in which diffusion fills in token positions.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2025 | arXiv | RUC | Large Language Diffusion Models | first large-scale diffusion LLM for text generation; diffusion models excel at reversal reasoning; inter-block AR with in-block diffusion | 3 | 4 | 3 |
| 2025 | arXiv | HKU | Dream 7B: Diffusion Large Language Models | initialized from an AR model's pre-training; inter-block diffusion | 3 | 3 | 3 |
| 2025 | arXiv | THU | Survey on Diffusion Language Models | survey on training strategies, inference optimization, multimodal and applications of diffusion language models | 4 | 2 | 2 |
| 2025 | arXiv | ByteDance | Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference | two-stage training with mask-based and edit-based noise; constrained-order training by filtering optimal generation paths; direct training to reduce generation steps | 3 | 3 | 5 |
| 2025 | arXiv | RUC | UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models | Diffusion-aware NTK extrapolation for RoPE; long-context post-training with adaptive attention masking | | | |
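The decoding pattern shared by the masked-diffusion models above can be sketched as a confidence-based iterative unmasking loop (my sketch, not any paper's code; `predict` stands in for the denoiser and here is just a fixed logit table):

```python
import numpy as np

MASK = -1

def diffusion_decode(predict, length, steps):
    """Start fully masked; each step, commit the most confident predictions."""
    seq = np.full(length, MASK)
    for _ in range(steps):
        masked = np.flatnonzero(seq == MASK)
        if masked.size == 0:
            break
        logits = predict(seq)                        # (length, vocab)
        probs = np.exp(logits)
        probs /= probs.sum(-1, keepdims=True)
        conf = probs[masked].max(-1)
        k = max(1, masked.size // 2)                 # unmask half per step
        keep = masked[np.argsort(conf)[-k:]]         # most confident positions
        seq[keep] = probs[keep].argmax(-1)           # commit; the rest stay masked
    return seq

table = np.random.default_rng(0).normal(size=(8, 5))  # fixed fake logits
out = diffusion_decode(lambda s: table, 8, steps=8)
assert (out != MASK).all()
```

This is where the "order of generation" challenge shows up: positions are committed by model confidence, not left to right.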
Solution: LLM alignment aims to make LLM outputs more consistent with user intent. Its challenges are ensuring safety, addressing multi-modal complexities, and balancing inference ability with alignment.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2024 | arXiv | SJTU | Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation | social scene simulation; emulate realistic multiparty interactions and consequences; monopolylogue | | | |
| 2025 | ICLR | Princeton | Safety Alignment Should Be Made More Than Just a Few Tokens Deep | AI-safety-centered alignment; enhance safety on deeper tokens and data | 3 | 3 | 3 |
| 2025 | ACL | PKU | Language Models Resist Alignment: Evidence From Data Compression | LLMs have an inner resistance to alignment; larger pre-training scale increases the resistance | | | |
Solution: RL learns from rewards or penalties rather than labeled data, taking actions that interact with the environment. It can learn optimal policies in very large configuration spaces.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2015 | Nature | DeepMind | Human-level control through deep reinforcement learning | deep reinforcement learning; human-level control; playing Atari games | | | |
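The core backup behind the DQN row above is a single TD target computed from a frozen copy of the Q-function, one of the stabilizers the Nature paper introduced. A sketch with the network replaced by a table for clarity (values are illustrative):

```python
import numpy as np

def td_target(q_target, s_next, reward, done, gamma=0.99):
    """One-step TD target using a frozen target Q-function."""
    return reward + (0.0 if done else gamma * q_target[s_next].max())

q = np.zeros((3, 2))             # Q(s, a) "network", 3 states x 2 actions
q_frozen = q.copy()              # periodically-synced target network
s, a, r, s_next = 0, 1, 1.0, 2
y = td_target(q_frozen, s_next, r, done=False)
q[s, a] += 0.5 * (y - q[s, a])   # gradient step ~ tabular update
assert abs(q[0, 1] - 0.5) < 1e-9
```

Freezing `q_frozen` between syncs keeps the regression target from chasing its own updates.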
Solution: Investigate and optimize critical implementation details and hyperparameters, such as network initialization and observation normalization, to ensure training stability and maximize the agent’s final performance.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2020 | ICML | MitsubishiElectric | Can Increasing Input Dimensionality Improve Deep Reinforcement Learning | | | | |
Solution: DNN training algorithms are essential for optimizing deep neural networks, enabling them to learn from data and improve their performance on various tasks. They address challenges like convergence speed, generalization, and robustness.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2017 | ICLR | Stanford | DSD: Dense-Sparse-Dense Training for Deep Neural Networks | 3-step dense-sparse-dense training | 3 | 5 | 4 |
| 2020 | NeurIPS | MIT | Differentiable Augmentation for Data-Efficient GAN Training | Differentiable Augmentation to improve data efficiency in generative adversarial network training | 3 | 4 | 4 |
| 2020 | CVPR | NTHU | Robust Processing-In-Memory Neural Networks via Noise-Aware Normalization | noise-aware calibration in BatchNorm statistics | 3 | 3 | 3 |
| 2025 | ASPLOS | Nvidia&CMU&MIT | GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism | graph pipeline parallelism; topology-aware stage partitioning and scheduling algorithm | 4 | 3 | 2 |
| 2025 | arXiv | ZhiCun | Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware | extension of STE for complex noise environments; STE-based gradient approximation strategy | | | |
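The straight-through estimator (STE) behind the last row above is simple enough to show in a toy scalar setup (my sketch, not any paper's training loop): the forward pass uses the quantized weight, while the backward pass pretends quantization was the identity, so the gradient flows straight to the full-precision master copy.

```python
import numpy as np

def quantize(w, step=0.25):
    """Uniform round-to-nearest quantizer with a fixed step size."""
    return step * np.round(w / step)

w = 0.6                          # full-precision master weight; target is 1.0
for _ in range(20):
    wq = quantize(w)             # forward: quantized weight
    loss_grad = 2 * (wq - 1.0)   # d/dwq of (wq - 1)^2
    w -= 0.1 * loss_grad         # backward: STE passes the grad straight to w
assert abs(quantize(w) - 1.0) < 1e-9
```

Without STE the gradient of `round` is zero almost everywhere and the weight could never move.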
Solution: Multi-task learning (MTL) is a machine learning paradigm where multiple related tasks are learned simultaneously, leveraging shared representations to improve performance across tasks.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2018 | NeurIPS | Intel | Multi-Task Learning as Multi-Objective Optimization | Frank-Wolfe-based optimizer that scales to high-dimensional problems; provides an upper bound for the MGDA (multiple-gradient descent algorithm) optimization objective | 3 | 4 | 4 |
| 2019 | NeurIPS | CUHK | Pareto Multi-Task Learning | method to decompose an MTL problem into multiple subproblems; scalable optimization algorithm to solve all constrained subproblems | 3 | 4 | 4 |
| 2021 | NeurIPS | UTexas | Conflict-Averse Gradient Descent for Multi-task Learning | Conflict-Averse Gradient Descent (CAGrad); reduces conflict among gradients while provably converging to the minimum average loss | | | |
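The two-task instance of the min-norm direction behind MGDA (first row above) has a closed form, which makes the idea concrete. This is a sketch, not the Frank-Wolfe solver the paper uses for many tasks: for gradients g1, g2, the convex combination a·g1 + (1-a)·g2 with minimal norm is a descent direction for both losses when a is interior.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Min-norm point of the segment between two task gradients (MGDA, 2 tasks)."""
    diff = g1 - g2
    denom = diff @ diff
    a = 0.5 if denom == 0 else np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return a * g1 + (1 - a) * g2

g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
g = min_norm_direction(g1, g2)
assert np.allclose(g, [0.5, 0.5])    # balanced update for a symmetric conflict
assert g @ g1 > 0 and g @ g2 > 0     # descent direction for both tasks
```

CAGrad (last row) starts from the same average-gradient picture but explicitly bounds the worst-case task conflict.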
Solution: Graph Neural Network (GNN) is a model that leverages the relationships between nodes and edges in graph-structured data to perform feature propagation and representation learning, enabling the capture of complex topological dependencies and structural patterns.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2020 | TITS | CSU | T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction | GCN for spatial dependence; GRU for temporal dependence; noise-based perturbation analysis | 4 | 4 | 2 |
| 2020 | ICLR | Walmart | Inductive Representation Learning on Temporal Graphs | functional time encoding; temporal graph attention layer | 3 | 4 | 2 |
| 2020 | AAAI | MIT | EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs | EvolveGCN-H / EvolveGCN-O architectures; evolving graph convolution unit; model adaptation via parameter evolution | 4 | 3 | 2 |
| 2023 | ICSE | CUHK | Eadro: An End-to-End Troubleshooting Framework for Microservices on Multi-source Data | Hawkes process; dilated causal convolution; joint detection and localization via multi-task learning | | | |
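The spatial half of T-GCN (first row above) is a standard graph-convolution step, which can be sketched as follows (names are mine, not the paper's): features are averaged over the symmetrically normalized neighborhood with self-loops, then linearly transformed.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # path graph 0-1-2
X = np.eye(3)                                 # one-hot node features
H = gcn_layer(A, X, np.eye(3))
assert H[0, 1] > 0                            # node 0 aggregates from neighbor 1
assert H[0, 2] == 0                           # but not from 2 in a single hop
```

T-GCN then feeds these spatial embeddings into a GRU to capture the temporal dependence.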
Solution: quantization focuses on the trade-off between accuracy and computation/memory cost. The challenge is running models at high performance with low memory and compute cost.
Solution: Adaptive datatypes aim to optimize numerical representation by dynamically adjusting to the precision and range requirements of data. The challenge lies in balancing computational efficiency, memory usage, and accuracy across diverse tasks and hardware constraints.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| | | | OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization | outlier-victim pair that sacrifices the colocated normal values to accommodate the outliers; OVP-based quantization framework and architectural implementation | 4 | 4 | 2 |
| 2023 | ICLR | ETH Zurich | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Arbitrary Order Insight; Lazy Batch-Updates; Cholesky Reformulation | 4 | 4 | 3 |
| 2024 | MLSys | MIT | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | preserving 1% salient weights; protecting salient weights by activation-aware scaling; searching to scale | 4 | 4 | 4 |
| 2025 | arXiv | Rice | 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | dynamic-length float; preserving bit-for-bit identical outputs; BFloat16 exponents carry significantly less information than their allocated bit width | 4 | 4 | 4 |
| 2025 | HPCA | SJTU | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type | group-wise quantization for both weight and KV cache; new encoding paradigm to improve information utilization in group-wise quantization; specific processing element for encoding paradigm | | | |
| | | | AdaBits: Neural Network Quantization With Adaptive Bit-Widths | joint-quantization method applied in training; Switchable Clipping Level (SCL) between layers | 4 | 3 | 3 |
| 2022 | ICLR | Snap Inc. | F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization | variance-based fixed-point format selection for weights and activations; training algorithm for fixed-point models | 3 | 3 | 2 |
| 2022 | MICRO | SJTU | ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization | fixed-length adaptive numerical data type; combines the advantages of float and int for adapting to the importance of different values within a tensor; adaptive framework that selects the best type for each tensor | | | |
| 2024 | TCAD | HKU | DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference | adaptive data representation with variable-length encoding; hardware-aware quantization framework | | | |
| 2024 | arXiv | Harvard | Nanoscaling Floating-Point (NxFP): NanoMantissa, Adaptive Microexponents, and Code Recycling for Direct-Cast Compression of Large Language Models | | | | |
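The common substrate of several rows above (GPTQ, AWQ, and M-ANT all start from something like this before their respective tricks) is group-wise symmetric low-bit quantization, with one scale per small group of weights. A minimal sketch with illustrative group size and bit-width:

```python
import numpy as np

def quant_dequant_groupwise(w, group=4, bits=4):
    """Symmetric round-to-nearest quantization with one scale per group."""
    qmax = 2 ** (bits - 1) - 1                      # int4 symmetric range: [-7, 7]
    g = w.reshape(-1, group)
    scale = np.abs(g).max(1, keepdims=True) / qmax  # one scale per group
    q = np.clip(np.round(g / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)             # dequantized reconstruction

rng = np.random.default_rng(0)
w = rng.normal(size=16)
wq = quant_dequant_groupwise(w)
err = np.abs(w - wq).max()
assert wq.shape == w.shape
assert err <= np.abs(w).max() / 14 + 1e-12          # error bounded by half a step
```

Smaller groups give tighter per-group scales (less error) at the cost of more scale metadata, which is exactly the overhead the adaptive-datatype papers above try to spend more cleverly.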
Solution: General quantization methods aim to optimize the trade-off between model accuracy and computational efficiency. Challenges include addressing layer-specific quantization errors, enhancing fault tolerance, and finding optimal bit-width configurations.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| | | | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | method without dequantization; homomorphic quantization method for matrix multiplication; requantization elimination | 2 | 2 | 3 |
| 2025 | arXiv | SJTU | MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | a non-uniform quantization algorithm based on product quantization; leverages sparse computation and asynchronous quantization; distributes quantization power unevenly across channels | | | |
| | | | | measurement to estimate the effect of parameter quantization errors in individual layers; optimization process for finding optimal quantization bit-width for each layer | 3 | 3 | 4 |
| 2020 | ISCA | SJTU | DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration | | | | |
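The layer-sensitivity idea in the rows above can be sketched directly (a purely illustrative ranking, not any paper's exact metric): measure each layer's quantization error at a candidate bit-width, then give more bits to the layers that degrade most.

```python
import numpy as np

def quant_error(w, bits):
    """Mean absolute reconstruction error of symmetric round-to-nearest."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.abs(w - scale * np.round(w / scale)).mean()

rng = np.random.default_rng(0)
# Hypothetical layers: a wide-spread "attn" and a narrow-spread "mlp"
layers = {"attn": rng.normal(0, 1.0, 256), "mlp": rng.normal(0, 0.1, 256)}
sens = {name: quant_error(w, bits=4) for name, w in layers.items()}
# The wider-spread layer suffers larger absolute error at the same bit-width,
# so a bit-width allocator would assign it more bits.
assert sens["attn"] > sens["mlp"]
```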
Solution: Quantization-aware training (QAT) is a technique that simulates the effects of quantization during the training process, allowing the model to learn to adapt to the quantization noise.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2018 | arXiv | IBM | PACT: Parameterized Clipping Activation for Quantized Neural Networks | activation quantization scheme for finding the optimal quantization scale during training | 3 | 4 | 3 |
| 2020 | ICLR | IBM | Learned Step Size Quantization | approximate the gradient to the quantizer step size; heuristic to bring the magnitude of step-size updates into better balance with weight updates | 3 | 4 | 3 |
| 2022 | CVPR | HKUST | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation | Nonuniform-to-Uniform Quantizer (N2UQ) via learning input thresholds; Generalized Straight-Through Estimator (GSTE) to tackle intractable gradient computation in N2UQ | | | |
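The forward pass of PACT-style fake quantization (first row above) is a clip-then-quantize step; in training, the clipping level alpha is a learnable parameter updated through its subgradient. A sketch with a fixed alpha (values illustrative):

```python
import numpy as np

def pact_fake_quant(x, alpha, bits=4):
    """PACT-style fake quantization: clip to [0, alpha], quantize uniformly."""
    y = np.clip(x, 0.0, alpha)              # ReLU-style clipped range
    step = alpha / (2 ** bits - 1)          # uniform step over [0, alpha]
    return step * np.round(y / step)        # quantize-dequantize

x = np.array([-0.5, 0.3, 0.9, 2.0])
q = pact_fake_quant(x, alpha=1.0)
assert q[0] == 0.0                          # clipped below zero
assert abs(q[3] - 1.0) < 1e-9               # clipped above alpha
assert abs(q[1] - 0.3) < 0.04               # in-range value lands on a nearby level
```

Making alpha trainable is what lets the model trade clipping error against rounding error per layer.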
Solution: model and search the parameter space of collective communication libraries, dynamically selecting optimal configurations under real training and analysis workloads.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2024 | SIGCOMM | UPenn&Microsoft | Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem | traffic-engineering-based collective communication optimization; mixed-integer linear program; A* technique for scaling | 4 | 3 | 2 |
| 2025 | NSDI | USTC | AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training | | | | |
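One choice a collective-communication tuner searches over can be shown with a standard alpha-beta cost model (the constants and the two-algorithm candidate space are invented for illustration; real tuners such as AutoCCL measure rather than assume): ring all-reduce is bandwidth-optimal but latency-heavy, recursive doubling is the reverse.

```python
import math

def t_ring(n, p, alpha, beta):
    """Ring all-reduce: 2(p-1) latency terms, near-optimal bandwidth term."""
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * n * beta

def t_recursive_doubling(n, p, alpha, beta):
    """Recursive doubling: log2(p) steps, each moving the full message."""
    return math.log2(p) * (alpha + n * beta)

def pick(n, p, alpha=5e-6, beta=1e-9):
    """Choose the modeled-faster algorithm for an n-byte all-reduce on p ranks."""
    return ("ring" if t_ring(n, p, alpha, beta) < t_recursive_doubling(n, p, alpha, beta)
            else "recursive-doubling")

assert pick(1 << 10, 16) == "recursive-doubling"   # latency-bound small message
assert pick(1 << 28, 16) == "ring"                 # bandwidth-bound large message
```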
Solution: investigating how modern distributed systems detect and mitigate fail-slow behaviors, focusing on the design of detection mechanisms, threshold policies, and recovery strategies to improve system resilience under partial performance degradation
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2019 | ATC | UChicago | IASO: A Fail-Slow Detection and Mitigation Framework for Distributed Storage Services | slowdown detection based on peer score; sub-root causes for five kinds of root causes | | | |
| 2023 | FAST | SJTU & Alibaba | PERSEUS: A Fail-Slow Detection Framework for Cloud Storage Systems | outlier data detection; regression model for detection threshold; risk evaluating algorithm | 4 | 4 | 3 |
| 2025 | ASPDAC | Xiamen University | A Fail-Slow Detection Framework for HBM Devices | outlier data detection; regression model for detection threshold; risk evaluating algorithm | 2 | 4 | 2 |
| 2025 | NSDI | SJTU & UMich | One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems | adaptive detection at runtime (ADR); slow-fault injection pipeline; danger zone analysis | | | |
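Peer-comparison slowdown detection in the spirit of IASO's peer score (first row above) reduces to flagging nodes whose latency is far above their peers' median. A sketch; the threshold factor and latencies are invented for illustration:

```python
import statistics

def suspects(latency_ms, factor=2.0):
    """Flag nodes whose latency exceeds `factor` times the peer median."""
    med = statistics.median(latency_ms.values())
    return {node for node, lat in latency_ms.items() if lat > factor * med}

# n4 serves the same workload as its peers but responds ~4x slower
lat = {"n1": 10.2, "n2": 9.8, "n3": 10.5, "n4": 41.0}
assert suspects(lat) == {"n4"}
```

Using the peer median rather than a fixed threshold is what makes the detector robust to workload shifts that slow every node equally.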
Solution: Use software-level techniques to manage or exploit the on-chip heterogeneity caused by process variation; use adaptive resource allocation and workload management to improve overall performance, power efficiency, and reliability.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2014 | DATE | UW-Madison | Process Variation-Aware Workload Partitioning Algorithms for GPUs Supporting Spatial-Multitasking | per-SM clocking (PSMC); process variation-aware SM-to-application assignment | 3 | 3 | 2 |
| 2016 | CSUR | ORNL | A Survey of Architectural Techniques for Managing Process Variation | PV-aware processor management; specific component-targeted management | | | |
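The partitioning idea in the first row above can be sketched simply (frequencies and the proportional rule are my illustration, not the paper's algorithm): give each SM a share of the work proportional to its attainable clock under process variation, so fast SMs do not idle waiting for slow ones.

```python
def partition(total_items, sm_freq_mhz):
    """Split work items across SMs proportionally to each SM's clock."""
    total_f = sum(sm_freq_mhz)
    shares = [round(total_items * f / total_f) for f in sm_freq_mhz]
    shares[-1] += total_items - sum(shares)   # absorb rounding drift
    return shares

freqs = [900, 1100, 1000, 1000]               # per-SM clocks under variation
work = partition(4000, freqs)
assert sum(work) == 4000                      # all work assigned
assert work[1] > work[0]                      # faster SM gets more work
```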