Solution: an algorithm is a well-defined, finite sequence of steps that solves a specific problem or accomplishes a particular task. We focus on algorithms designed to solve such problems.
Solution: Dynamic graph algorithms efficiently update solutions to graph problems as the graph evolves, addressing the challenge of frequent changes in structure and data.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2023 | ASPLOS | UCR | CommonGraph: Graph Analytics on Evolving Data | convert deletions to additions; common graph concept; Triangular Grid (TG) for work sharing; mutation-free representation | | | |
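The deletion-to-addition conversion in the tags can be illustrated with a toy sketch (my reconstruction from the tags, not the paper's implementation): the edge intersection of all snapshots forms a mutation-free common graph, and every snapshot is then reachable from it by edge additions only, so deletions never have to be executed.

```python
def common_graph(snapshots):
    """Edge intersection of all snapshots (each snapshot: a set of edges)."""
    base = set.intersection(*snapshots)
    # Per-snapshot delta contains only additions relative to the common base.
    deltas = [s - base for s in snapshots]
    return base, deltas

s1 = {(1, 2), (2, 3), (3, 4)}
s2 = {(1, 2), (3, 4), (4, 5)}          # (2, 3) deleted, (4, 5) added
base, deltas = common_graph([s1, s2])
assert base == {(1, 2), (3, 4)}
assert base | deltas[0] == s1 and base | deltas[1] == s2
```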
Solution: ML algorithms are fundamental tools that enable computers to learn from data and make predictions or decisions without being explicitly programmed.
Solution: the Transformer is a mature architecture with known limitations, such as the quadratic complexity of attention. These limitations motivate new algorithms that improve on the original design.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2019 | arXiv | Google | Fast Transformer Decoding: One Write-Head is All You Need | MQA; share same KV cache for all heads; multi-query attention | 1 | 4 | 3 |
| 2024 | NeuroComputing | ZhuiYi | RoFormer: Enhanced Transformer with Rotary Position Embedding | use rotary position embedding to fix the problem of long context; inter-word dependencies decay gradually with the increase of relative distance | 3 | 4 | 3 |
| 2025 | arXiv | Qwen | Parallel Scaling Law for Language Models | improve performance by scaling parallel computation instead of model size; run multiple parallel streams and aggregate them into one output | | | |
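The MQA row above is easy to show concretely. This is a minimal sketch (shapes and names are mine, not the paper's code): all H query heads share a single K/V head, shrinking the KV cache by a factor of H.

```python
import numpy as np

def mqa(x, Wq, Wk, Wv, n_heads):
    """Multi-query attention: per-head queries, one shared K/V head."""
    T, d = x.shape
    hd = d // n_heads
    q = (x @ Wq).reshape(T, n_heads, hd)   # per-head queries
    k, v = x @ Wk, x @ Wv                  # ONE shared K/V head: shape (T, hd)
    out = np.empty_like(q)
    for h in range(n_heads):               # every head attends to the same k, v
        scores = q[:, h, :] @ k.T / np.sqrt(hd)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        out[:, h, :] = w @ v
    return out.reshape(T, d)

rng = np.random.default_rng(0)
T, d, H = 4, 8, 4
y = mqa(rng.normal(size=(T, d)), rng.normal(size=(d, d)),
        rng.normal(size=(d, d // H)), rng.normal(size=(d, d // H)), H)
assert y.shape == (T, d)
```

The cache saving is visible in the shapes: K and V are `(T, hd)` instead of `(T, n_heads, hd)`.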
Challenge: diffusion models generate output from noise, unlike the traditional autoregressive (AR) paradigm. Diffusion LLMs must therefore reconcile the logical order of text with the random order in which diffusion fills in token positions.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2025 | arXiv | RUC | Large Language Diffusion Models | first large-scale diffusion LLM for text generation; diffusion models excel at reversal reasoning; inter-block AR with in-block diffusion | 3 | 4 | 3 |
| 2025 | arXiv | HKU | Dream 7B: Diffusion Large Language Models | initialized from an AR model's pre-training; inter-block diffusion | 3 | 3 | 3 |
| 2025 | arXiv | THU | Survey on Diffusion Language Models | survey on training strategies, inference optimization, multimodal and applications of diffusion language models | 4 | 2 | 2 |
| 2025 | arXiv | ByteDance | Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference | two-stage training with mask-based and edit-based noise; constrained-order training by filtering optimal generation paths; direct training to reduce generation steps | 3 | 3 | 5 |
| 2025 | arXiv | RUC | UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models | Diffusion-aware NTK extrapolation for RoPE; long-context post-training with adaptive attention masking | | | |
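The decoding pattern shared by the masked-diffusion models above can be sketched as a confidence-based iterative unmasking loop (my sketch, not any paper's code; `predict` stands in for the denoiser and here is just a fixed logit table):

```python
import numpy as np

MASK = -1

def diffusion_decode(predict, length, steps):
    """Start fully masked; each step, commit the most confident predictions."""
    seq = np.full(length, MASK)
    for _ in range(steps):
        masked = np.flatnonzero(seq == MASK)
        if masked.size == 0:
            break
        logits = predict(seq)                        # (length, vocab)
        probs = np.exp(logits)
        probs /= probs.sum(-1, keepdims=True)
        conf = probs[masked].max(-1)
        k = max(1, masked.size // 2)                 # unmask half per step
        keep = masked[np.argsort(conf)[-k:]]         # most confident positions
        seq[keep] = probs[keep].argmax(-1)           # commit; the rest stay masked
    return seq

table = np.random.default_rng(0).normal(size=(8, 5))  # fixed fake logits
out = diffusion_decode(lambda s: table, 8, steps=8)
assert (out != MASK).all()
```

This is where the "order of generation" challenge shows up: positions are committed by model confidence, not left to right.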
Solution: LLM alignment aims to make LLM outputs more consistent with user intent. Its challenges are ensuring safety, addressing multi-modal complexities, and balancing inference ability with alignment.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2024 | arXiv | SJTU | Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation | social scene simulation; emulate realistic multiparty interactions and consequences; monopolylogue | | | |
| 2025 | ICLR | Princeton | Safety Alignment Should Be Made More Than Just a Few Tokens Deep | AI-safety-centered alignment; enhance safety on deeper tokens and data | 3 | 3 | 3 |
| 2025 | ACL | PKU | Language Models Resist Alignment: Evidence From Data Compression | LLMs have an inner resistance to alignment; larger pre-training scale increases the resistance | | | |
Solution: RL learns from rewards or penalties rather than labeled data, taking actions that interact with the environment. It can learn optimal policies in very large configuration spaces.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2015 | Nature | DeepMind | Human-level control through deep reinforcement learning | deep reinforcement learning; human-level control; playing Atari games | | | |
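The core backup behind the DQN row above is a single TD target computed from a frozen copy of the Q-function, one of the stabilizers the Nature paper introduced. A sketch with the network replaced by a table for clarity (values are illustrative):

```python
import numpy as np

def td_target(q_target, s_next, reward, done, gamma=0.99):
    """One-step TD target using a frozen target Q-function."""
    return reward + (0.0 if done else gamma * q_target[s_next].max())

q = np.zeros((3, 2))             # Q(s, a) "network", 3 states x 2 actions
q_frozen = q.copy()              # periodically-synced target network
s, a, r, s_next = 0, 1, 1.0, 2
y = td_target(q_frozen, s_next, r, done=False)
q[s, a] += 0.5 * (y - q[s, a])   # gradient step ~ tabular update
assert abs(q[0, 1] - 0.5) < 1e-9
```

Freezing `q_frozen` between syncs keeps the regression target from chasing its own updates.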
Solution: Investigate and optimize critical implementation details and hyperparameters, such as network initialization and observation normalization, to ensure training stability and maximize the agent’s final performance.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2020 | ICML | MitsubishiElectric | Can Increasing Input Dimensionality Improve Deep Reinforcement Learning | | | | |
Solution: DNN training algorithms are essential for optimizing deep neural networks, enabling them to learn from data and improve their performance on various tasks. They address challenges like convergence speed, generalization, and robustness.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2017 | ICLR | Stanford | DSD: Dense-Sparse-Dense Training for Deep Neural Networks | 3-step dense-sparse-dense training | 3 | 5 | 4 |
| 2020 | NeurIPS | MIT | Differentiable Augmentation for Data-Efficient GAN Training | Differentiable Augmentation to improve data efficiency in generative adversarial network training | 3 | 4 | 4 |
| 2020 | CVPR | NTHU | Robust Processing-In-Memory Neural Networks via Noise-Aware Normalization | noise-aware calibration in BatchNorm statistics | 3 | 3 | 3 |
| 2025 | ASPLOS | Nvidia&CMU&MIT | GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism | graph pipeline parallelism; topology-aware stage partitioning and scheduling algorithm | 4 | 3 | 2 |
| 2025 | arXiv | ZhiCun | Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware | extension of STE for complex noise environments; STE-based gradient approximation strategy | | | |
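The straight-through estimator (STE) behind the last row above is simple enough to show in a toy scalar setup (my sketch, not any paper's training loop): the forward pass uses the quantized weight, while the backward pass pretends quantization was the identity, so the gradient flows straight to the full-precision master copy.

```python
import numpy as np

def quantize(w, step=0.25):
    """Uniform round-to-nearest quantizer with a fixed step size."""
    return step * np.round(w / step)

w = 0.6                          # full-precision master weight; target is 1.0
for _ in range(20):
    wq = quantize(w)             # forward: quantized weight
    loss_grad = 2 * (wq - 1.0)   # d/dwq of (wq - 1)^2
    w -= 0.1 * loss_grad         # backward: STE passes the grad straight to w
assert abs(quantize(w) - 1.0) < 1e-9
```

Without STE the gradient of `round` is zero almost everywhere and the weight could never move.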
Solution: Multi-task learning (MTL) is a machine learning paradigm where multiple related tasks are learned simultaneously, leveraging shared representations to improve performance across tasks.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2018 | NeurIPS | Intel | Multi-Task Learning as Multi-Objective Optimization | Frank-Wolfe-based optimizer that scales to high-dimensional problems; provides an upper bound for the MGDA (multiple-gradient descent algorithm) optimization objective | 3 | 4 | 4 |
| 2019 | NeurIPS | CUHK | Pareto Multi-Task Learning | method to decompose an MTL problem into multiple subproblems; scalable optimization algorithm to solve all constrained subproblems | 3 | 4 | 4 |
| 2021 | NeurIPS | UTexas | Conflict-Averse Gradient Descent for Multi-task Learning | Conflict-Averse Gradient Descent (CAGrad); reduces conflict among gradients while provably converging to the minimum average loss | | | |
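The two-task instance of the min-norm direction behind MGDA (first row above) has a closed form, which makes the idea concrete. This is a sketch, not the Frank-Wolfe solver the paper uses for many tasks: for gradients g1, g2, the convex combination a·g1 + (1-a)·g2 with minimal norm is a descent direction for both losses when a is interior.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Min-norm point of the segment between two task gradients (MGDA, 2 tasks)."""
    diff = g1 - g2
    denom = diff @ diff
    a = 0.5 if denom == 0 else np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return a * g1 + (1 - a) * g2

g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
g = min_norm_direction(g1, g2)
assert np.allclose(g, [0.5, 0.5])    # balanced update for a symmetric conflict
assert g @ g1 > 0 and g @ g2 > 0     # descent direction for both tasks
```

CAGrad (last row) starts from the same average-gradient picture but explicitly bounds the worst-case task conflict.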
Solution: Graph Neural Network (GNN) is a model that leverages the relationships between nodes and edges in graph-structured data to perform feature propagation and representation learning, enabling the capture of complex topological dependencies and structural patterns.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2020 | TITS | CSU | T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction | GCN for spatial dependence; GRU for temporal dependence; noise-based perturbation analysis | 4 | 4 | 2 |
| 2020 | ICLR | Walmart | Inductive Representation Learning on Temporal Graphs | functional time encoding; temporal graph attention layer | 3 | 4 | 2 |
| 2020 | AAAI | MIT | EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs | EvolveGCN-H / EvolveGCN-O architectures; evolving graph convolution unit; model adaptation via parameter evolution | 4 | 3 | 2 |
| 2023 | ICSE | CUHK | Eadro: An End-to-End Troubleshooting Framework for Microservices on Multi-source Data | Hawkes process; dilated causal convolution; joint detection and localization via multi-task learning | | | |
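The spatial half of T-GCN (first row above) is a standard graph-convolution step, which can be sketched as follows (names are mine, not the paper's): features are averaged over the symmetrically normalized neighborhood with self-loops, then linearly transformed.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # path graph 0-1-2
X = np.eye(3)                                 # one-hot node features
H = gcn_layer(A, X, np.eye(3))
assert H[0, 1] > 0                            # node 0 aggregates from neighbor 1
assert H[0, 2] == 0                           # but not from 2 in a single hop
```

T-GCN then feeds these spatial embeddings into a GRU to capture the temporal dependence.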
Solution: quantization focuses on the trade-off between accuracy and computation/memory cost. The challenge is running models at high performance with low memory and compute cost.
Solution: Adaptive datatypes aim to optimize numerical representation by dynamically adjusting to the precision and range requirements of data. The challenge lies in balancing computational efficiency, memory usage, and accuracy across diverse tasks and hardware constraints.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| | | | OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization | outlier-victim pair that sacrifices the colocated normal values to accommodate the outliers; OVP-based quantization framework and architectural implementation | 4 | 4 | 2 |
| 2023 | ICLR | ETH Zurich | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Arbitrary Order Insight; Lazy Batch-Updates; Cholesky Reformulation | 4 | 4 | 3 |
| 2024 | MLSys | MIT | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | preserving 1% salient weights; protecting salient weights by activation-aware scaling; searching to scale | 4 | 4 | 4 |
| 2025 | arXiv | Rice | 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | dynamic-length float; preserving bit-for-bit identical outputs; BFloat16 exponents carry significantly less information than their allocated bit width | 4 | 4 | 4 |
| 2025 | HPCA | SJTU | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type | group-wise quantization for both weight and KV cache; new encoding paradigm to improve information utilization in group-wise quantization; specific processing element for encoding paradigm | | | |
| | | | AdaBits: Neural Network Quantization With Adaptive Bit-Widths | joint-quantization method applied in training; Switchable Clipping Level (SCL) between layers | 4 | 3 | 3 |
| 2022 | ICLR | Snap Inc. | F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization | variance-based fixed-point format selection for weights and activations; training algorithm for fixed-point models | 3 | 3 | 2 |
| 2022 | MICRO | SJTU | ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization | fixed-length adaptive numerical data type; combines the advantages of float and int for adapting to the importance of different values within a tensor; adaptive framework that selects the best type for each tensor | | | |
| 2024 | TCAD | HKU | DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference | adaptive data representation with variable-length encoding; hardware-aware quantization framework | | | |
| 2024 | arXiv | Harvard | Nanoscaling Floating-Point (NxFP): NanoMantissa, Adaptive Microexponents, and Code Recycling for Direct-Cast Compression of Large Language Models | | | | |
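The common substrate of several rows above (GPTQ, AWQ, and M-ANT all start from something like this before their respective tricks) is group-wise symmetric low-bit quantization, with one scale per small group of weights. A minimal sketch with illustrative group size and bit-width:

```python
import numpy as np

def quant_dequant_groupwise(w, group=4, bits=4):
    """Symmetric round-to-nearest quantization with one scale per group."""
    qmax = 2 ** (bits - 1) - 1                      # int4 symmetric range: [-7, 7]
    g = w.reshape(-1, group)
    scale = np.abs(g).max(1, keepdims=True) / qmax  # one scale per group
    q = np.clip(np.round(g / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)             # dequantized reconstruction

rng = np.random.default_rng(0)
w = rng.normal(size=16)
wq = quant_dequant_groupwise(w)
err = np.abs(w - wq).max()
assert wq.shape == w.shape
assert err <= np.abs(w).max() / 14 + 1e-12          # error bounded by half a step
```

Smaller groups give tighter per-group scales (less error) at the cost of more scale metadata, which is exactly the overhead the adaptive-datatype papers above try to spend more cleverly.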
Solution: General quantization methods aim to optimize the trade-off between model accuracy and computational efficiency. Challenges include addressing layer-specific quantization errors, enhancing fault tolerance, and finding optimal bit-width configurations.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| | | | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | method without dequantization; homomorphic quantization method for matrix multiplication; requantization elimination | 2 | 2 | 3 |
| 2025 | arXiv | SJTU | MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | a non-uniform quantization algorithm based on product quantization; leverages sparse computation and asynchronous quantization; distributes quantization power unevenly across channels | | | |
| | | | | measurement to estimate the effect of parameter quantization errors in individual layers; optimization process for finding optimal quantization bit-width for each layer | 3 | 3 | 4 |
| 2020 | ISCA | SJTU | DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration | | | | |
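The layer-sensitivity idea in the rows above can be sketched directly (a purely illustrative ranking, not any paper's exact metric): measure each layer's quantization error at a candidate bit-width, then give more bits to the layers that degrade most.

```python
import numpy as np

def quant_error(w, bits):
    """Mean absolute reconstruction error of symmetric round-to-nearest."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.abs(w - scale * np.round(w / scale)).mean()

rng = np.random.default_rng(0)
# Hypothetical layers: a wide-spread "attn" and a narrow-spread "mlp"
layers = {"attn": rng.normal(0, 1.0, 256), "mlp": rng.normal(0, 0.1, 256)}
sens = {name: quant_error(w, bits=4) for name, w in layers.items()}
# The wider-spread layer suffers larger absolute error at the same bit-width,
# so a bit-width allocator would assign it more bits.
assert sens["attn"] > sens["mlp"]
```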
Solution: Quantization-aware training (QAT) is a technique that simulates the effects of quantization during the training process, allowing the model to learn to adapt to the quantization noise.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2018 | arXiv | IBM | PACT: Parameterized Clipping Activation for Quantized Neural Networks | activation quantization scheme for finding the optimal quantization scale during training | 3 | 4 | 3 |
| 2020 | ICLR | IBM | Learned Step Size Quantization | approximate the gradient to the quantizer step size; heuristic to bring the magnitude of step-size updates into better balance with weight updates | 3 | 4 | 3 |
| 2022 | CVPR | HKUST | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation | Nonuniform-to-Uniform Quantizer (N2UQ) via learning input thresholds; Generalized Straight-Through Estimator (GSTE) to tackle intractable gradient computation in N2UQ | | | |
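The forward pass of PACT-style fake quantization (first row above) is a clip-then-quantize step; in training, the clipping level alpha is a learnable parameter updated through its subgradient. A sketch with a fixed alpha (values illustrative):

```python
import numpy as np

def pact_fake_quant(x, alpha, bits=4):
    """PACT-style fake quantization: clip to [0, alpha], quantize uniformly."""
    y = np.clip(x, 0.0, alpha)              # ReLU-style clipped range
    step = alpha / (2 ** bits - 1)          # uniform step over [0, alpha]
    return step * np.round(y / step)        # quantize-dequantize

x = np.array([-0.5, 0.3, 0.9, 2.0])
q = pact_fake_quant(x, alpha=1.0)
assert q[0] == 0.0                          # clipped below zero
assert abs(q[3] - 1.0) < 1e-9               # clipped above alpha
assert abs(q[1] - 0.3) < 0.04               # in-range value lands on a nearby level
```

Making alpha trainable is what lets the model trade clipping error against rounding error per layer.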
Solution: model and search the parameter space of collective communication libraries, dynamically selecting optimal configurations under real training and analysis workloads.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2024 | SIGCOMM | UPenn&Microsoft | Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem | traffic-engineering-based collective communication optimization; mixed-integer linear program; A* technique for scaling | 4 | 3 | 2 |
| 2025 | NSDI | USTC | AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training | | | | |
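One choice a collective-communication tuner searches over can be shown with a standard alpha-beta cost model (the constants and the two-algorithm candidate space are invented for illustration; real tuners such as AutoCCL measure rather than assume): ring all-reduce is bandwidth-optimal but latency-heavy, recursive doubling is the reverse.

```python
import math

def t_ring(n, p, alpha, beta):
    """Ring all-reduce: 2(p-1) latency terms, near-optimal bandwidth term."""
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * n * beta

def t_recursive_doubling(n, p, alpha, beta):
    """Recursive doubling: log2(p) steps, each moving the full message."""
    return math.log2(p) * (alpha + n * beta)

def pick(n, p, alpha=5e-6, beta=1e-9):
    """Choose the modeled-faster algorithm for an n-byte all-reduce on p ranks."""
    return ("ring" if t_ring(n, p, alpha, beta) < t_recursive_doubling(n, p, alpha, beta)
            else "recursive-doubling")

assert pick(1 << 10, 16) == "recursive-doubling"   # latency-bound small message
assert pick(1 << 28, 16) == "ring"                 # bandwidth-bound large message
```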
Solution: investigating how modern distributed systems detect and mitigate fail-slow behaviors, focusing on the design of detection mechanisms, threshold policies, and recovery strategies to improve system resilience under partial performance degradation
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2019 | ATC | UChicago | IASO: A Fail-Slow Detection and Mitigation Framework for Distributed Storage Services | slowdown detection based on peer score; sub-root causes for five kinds of root causes | | | |
| 2023 | FAST | SJTU & Alibaba | PERSEUS: A Fail-Slow Detection Framework for Cloud Storage Systems | outlier data detection; regression model for detection threshold; risk evaluating algorithm | 4 | 4 | 3 |
| 2025 | ASPDAC | Xiamen University | A Fail-Slow Detection Framework for HBM Devices | outlier data detection; regression model for detection threshold; risk evaluating algorithm | 2 | 4 | 2 |
| 2025 | NSDI | SJTU & UMich | One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems | adaptive detection at runtime (ADR); slow-fault injection pipeline; danger zone analysis | | | |
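Peer-comparison slowdown detection in the spirit of IASO's peer score (first row above) reduces to flagging nodes whose latency is far above their peers' median. A sketch; the threshold factor and latencies are invented for illustration:

```python
import statistics

def suspects(latency_ms, factor=2.0):
    """Flag nodes whose latency exceeds `factor` times the peer median."""
    med = statistics.median(latency_ms.values())
    return {node for node, lat in latency_ms.items() if lat > factor * med}

# n4 serves the same workload as its peers but responds ~4x slower
lat = {"n1": 10.2, "n2": 9.8, "n3": 10.5, "n4": 41.0}
assert suspects(lat) == {"n4"}
```

Using the peer median rather than a fixed threshold is what makes the detector robust to workload shifts that slow every node equally.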
Solution: Use software-level techniques to manage or exploit the on-chip heterogeneity caused by process variation; use adaptive resource allocation and workload management to improve overall performance, power efficiency, and reliability.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2014 | DATE | UW-Madison | Process Variation-Aware Workload Partitioning Algorithms for GPUs Supporting Spatial-Multitasking | per-SM clocking (PSMC); process variation-aware SM-to-application assignment | 3 | 3 | 2 |
| 2016 | CSUR | ORNL | A Survey of Architectural Techniques for Managing Process Variation | PV-aware processor management; specific component-targeted management | | | |
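The partitioning idea in the first row above can be sketched simply (frequencies and the proportional rule are my illustration, not the paper's algorithm): give each SM a share of the work proportional to its attainable clock under process variation, so fast SMs do not idle waiting for slow ones.

```python
def partition(total_items, sm_freq_mhz):
    """Split work items across SMs proportionally to each SM's clock."""
    total_f = sum(sm_freq_mhz)
    shares = [round(total_items * f / total_f) for f in sm_freq_mhz]
    shares[-1] += total_items - sum(shares)   # absorb rounding drift
    return shares

freqs = [900, 1100, 1000, 1000]               # per-SM clocks under variation
work = partition(4000, freqs)
assert sum(work) == 4000                      # all work assigned
assert work[1] > work[0]                      # faster SM gets more work
```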