Programming Languages and Software Engineering¶

Language design and semantics¶

Solution: user-friendly, resource-efficient, and secure programming languages

Compiler construction and optimization¶

Solution: improving performance, reducing resource usage, and ensuring correctness

Year	Venue	Authors	Title	Tags	P	E	N
2024	MICRO	Georgia Tech	Unleashing CPU Potential for Executing GPU Programs through Compiler/Runtime Optimizations	anti-coalescing transformation; block size invariant analysis; tail block adaptive synchronization; GPU-block dynamic tiling	2	4	3

Program Optimization and Rewriting Frameworks¶

Challenge: Traditional sequential compiler optimizations suffer from the phase-ordering problem, while existing equational reasoning tools are often too slow or rigid for domain-specific, non-syntactic analyses.

Solution: Utilize equality saturation with efficient data structures (e.g., e-graphs) and extensible analysis mechanisms to explore the space of equivalent programs without strict ordering.
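
The core data structure behind this approach can be sketched in a few lines. The following is a minimal illustration, not egg's API: the `EGraph` class and its `add`, `union`, and `rebuild` methods are hypothetical names, and the rebuild loop is a naive fixpoint rather than egg's optimized deferred rebuilding. It shows how merging two equivalent terms and then restoring congruence makes their parent expressions equivalent as well.

```python
class EGraph:
    """Toy e-graph: union-find over e-class ids plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointer, one per e-class id
        self.hashcons = {}  # canonical e-node (op, child ids...) -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canonical(self, node):
        op, *children = node
        return (op, *(self.find(c) for c in children))

    def add(self, op, *children):
        node = self.canonical((op, *children))
        if node not in self.hashcons:
            self.parent.append(len(self.parent))
            self.hashcons[node] = len(self.parent) - 1
        return self.find(self.hashcons[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        """Restore congruence: re-canonicalize nodes until no new merges occur."""
        changed = True
        while changed:
            changed = False
            new = {}
            for node, cls in self.hashcons.items():
                node, cls = self.canonical(node), self.find(cls)
                if node in new and self.find(new[node]) != cls:
                    self.union(new[node], cls)  # congruent nodes share a class
                    changed = True
                new.setdefault(node, cls)
            self.hashcons = new


g = EGraph()
a, one, two = g.add("a"), g.add("1"), g.add("2")
mul = g.add("*", a, two)         # a * 2
shl = g.add("<<", a, one)        # a << 1
div_mul = g.add("/", mul, two)   # (a * 2) / 2
div_shl = g.add("/", shl, two)   # (a << 1) / 2

before = g.find(div_mul) == g.find(div_shl)
g.union(mul, shl)                # record the rewrite x * 2 == x << 1
g.rebuild()                      # invariants are repaired in one later pass
after = g.find(div_mul) == g.find(div_shl)
print(before, after)
```

Because the union only records the equality and `rebuild` repairs the invariants afterwards, many rewrites can be batched before paying the cost of congruence closure, which is the intuition behind deferring the rebuild.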

Year	Venue	Authors	Title	Tags	P	E	N
2021	POPL	UW	egg: Fast and Extensible Equality Saturation	equality saturation algorithm;e-graphs (Equality Graphs);deferred rebuilding technique	3	4	4
2025	ASPLOS	Cornell	SmoothE: Differentiable E-Graph Extraction	differentiable e-graph extraction;probabilistic continuous relaxation;learning-based (MLP) cost model	4	5	5
2026	ASPLOS	PKU	Finding Reusable Instructions via E-Graph Anti-Unification	e-graph anti-unification algorithm for custom instruction identification;pattern vectorization;hardware-aware cost model	4	4	4
2026	ASPLOS	KAIST	Trinity: Three-Dimensional Tensor Program Optimization via Tile-level Equality Saturation	stateful equality saturation via expression propagation; context-aware two-pass extraction algorithm	3	5	4

Deep Learning Compilers¶

Solution: Graph transformations, kernel fusion, and tensor optimization for compute and memory

Foundational DSLs & Compilers¶

Challenge: Bridging high-level algorithmic expression with low-level hardware performance for array/tensor computations.

Solution: Decouple algorithm specification from its execution schedule to enable performance portability and automated optimization.
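
A small sketch may help make the decoupling concrete. It is not Halide or TVM code; the function names (`algorithm`, `schedule_naive`, `schedule_tiled`, `run`) are hypothetical. The algorithm states what each output element accumulates, and each schedule only decides the order in which the same iteration space is visited.

```python
from itertools import product


def algorithm(A, B, i, j, k):
    """What to compute: one multiply-accumulate term of C[i][j]."""
    return A[i][k] * B[k][j]


def schedule_naive(n, m, p):
    """When to compute it: a plain i-j-k loop nest."""
    yield from product(range(n), range(m), range(p))


def schedule_tiled(n, m, p, tile=2):
    """Same iteration space, visited tile by tile for better locality."""
    for i0, j0, k0 in product(range(0, n, tile),
                              range(0, m, tile),
                              range(0, p, tile)):
        yield from product(range(i0, min(i0 + tile, n)),
                           range(j0, min(j0 + tile, m)),
                           range(k0, min(k0 + tile, p)))


def run(A, B, schedule):
    """Execute the algorithm under a given schedule."""
    n, p, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i, j, k in schedule(n, m, p):
        C[i][j] += algorithm(A, B, i, j, k)
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
C_naive = run(A, B, schedule_naive)
C_tiled = run(A, B, schedule_tiled)
print(C_naive == C_tiled)
```

Swapping the schedule never changes the result, only the traversal order, which is what lets an autotuner or search procedure explore schedules without touching the algorithm.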

Year	Venue	Authors	Title	Tags	P	E	N
2013	PLDI	MIT	Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines	algorithm-schedule decoupling; image processing DSL; explicit scheduling language	4	4	4
2018	OSDI	UW	TVM: An Automated End-to-End Optimizing Compiler for Deep Learning	operator fusion; graph-level DL compiler; automatic code generation; tensor expression simplification	4	4	4
2020	OSDI	UCB	Ansor: Generating High-Performance Tensor Programs for Deep Learning	hierarchical search space program sampler; evolutionary fine-tuning and learned cost model; gradient-descent-based task scheduler	4	4	3
2023	ASPLOS	CMU	TensorIR: An Abstraction for Automatic Tensorized Program Optimization	use block abstraction for computation isolation and loop transformations; tensor intrinsic matching algorithm; evolutionary search-based automatic scheduling	4	4	4
2023	ASPLOS	UToronto	Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs	task-mapping based programming; hardware-centric schedule space; post-scheduling operator fusion	4	4	3

Tensor & Pipeline Optimization¶

Challenge: Manually writing high-performance kernels for diverse tensor operations is difficult and not portable.

Solution: Use MLIR-based abstractions, novel programming models, and automatic tuning to generate efficient code for tensor programs and pipelines.

Year	Venue	Authors	Title	Tags	P	E	N
2023	OSDI	Microsoft Research	WELDER: Scheduling Deep Learning Memory Access via Tile-graph	Tile-graph abstraction for fine-grained memory management; inter-layer independence for optimization space decoupling; tile traffic-based cost model	3	4	4
2025	PPoPP	THU	FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property	dataflow-centered code recognition and optimization; two-stage heuristic algorithm to optimize tensor computation; kernel fusion	4	4	2
2026	ICLR	PKU	TILELANG: Bridge Programmability and Performance in Modern Neural Kernels	TVM-based compiler; more flexible than Triton	4	4	3
2025	OSDI	PKU	PipeThreader: Software-Defined Pipelining for Efficient DNN Execution	Pipeline programming abstraction and orchestration mechanism for heterogeneous computing units; tile size and pipeline stage number tradeoff	4	4	4
2025	arXiv	CMU	Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs	mega-kernelization; SM level parallel optimization; SM level representation	3	3	3

Graph-Level Transformation and Optimization¶

Challenge: Large DNN topologies and massive tensors easily exceed hardware memory capacity and limit execution efficiency, so isolated operator-level optimizations are insufficient.

Solution: Apply coordinated graph-level transformations (such as operator fission and fusion) and holistic topological scheduling to systematically optimize the entire computation graph for peak memory and execution latency.
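
To illustrate why topological scheduling matters for peak memory, here is a minimal sketch under simplifying assumptions: every operator produces one tensor, a tensor stays live from the step that produces it until its last consumer runs, and the graph, sizes, and the `peak_memory` helper are all hypothetical. It is not the cost model of any listed paper.

```python
def peak_memory(order, sizes, inputs):
    """Peak live-tensor footprint of executing `order` (a topological order).

    sizes:  tensor name -> size in arbitrary units
    inputs: tensor name -> list of tensors its producer consumes
    """
    pos = {op: t for t, op in enumerate(order)}
    last_use = dict(pos)  # a tensor with no consumers dies at its own step
    for op, deps in inputs.items():
        for d in deps:
            assert pos[d] < pos[op], "order must be topological"
            last_use[d] = max(last_use[d], pos[op])
    return max(
        sum(sizes[v] for v in order if pos[v] <= t <= last_use[v])
        for t in range(len(order))
    )


# Two branches that each expand x into a large intermediate, then reduce it.
sizes = {"x": 1, "a": 4, "b": 1, "c": 4, "d": 1, "out": 1}
inputs = {"a": ["x"], "b": ["a"], "c": ["x"], "d": ["c"], "out": ["b", "d"]}

breadth_first = ["x", "a", "c", "b", "d", "out"]  # both large tensors live at once
depth_first = ["x", "a", "b", "c", "d", "out"]    # finish one branch first

peak_bf = peak_memory(breadth_first, sizes, inputs)
peak_df = peak_memory(depth_first, sizes, inputs)
print(peak_bf, peak_df)
```

Both orders compute the same graph, yet the second keeps only one large intermediate alive at a time. Graph transformations such as fission enlarge the set of valid orders and thereby give a scheduler more room to lower this peak.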

Year	Venue	Authors	Title	Tags	P	E	N
2021	MLSys	MIT	IOS: Inter-Operator Scheduler for CNN Acceleration	inter-operator parallelism; dynamic programming based scheduler; concurrent execution; operator fusion	4	4	2
2022	OSDI	Stanford	Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization	parallel computation graph; graph substitution; three level joint optimization algorithm	4	4	2
2023	DAC	PKU	Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC	dataflow grouping; location-aware accelerator mapping; hybrid scheduling algorithm	4	3	2
2024	DAC	THU	GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference	flow-based graph partition; joint optimization computational graph(JOCG); joint cost model; backtracking search algorithm	4	4	2
2024	ASPLOS	PKU	MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN	Fission Transformation (F-Trans) for graph splitting; Dimension Graph (D-Graph) representation; incremental scheduling algorithm for fast evaluation	4	4	4

TinyML & Edge Compilers¶

Solution: address extreme memory and compute constraints on microcontrollers through specialized compilation strategies like fine-grained memory planning and redundancy-free tensor splitting.
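
As a rough illustration of lifetime-aware memory planning, the sketch below assigns arena offsets so that tensors whose lifetimes never overlap can share the same bytes. It is a generic greedy first-fit heuristic written for this page, not the planner from any listed paper; `plan_memory` and the tensor table are hypothetical.

```python
def plan_memory(tensors):
    """Greedy first-fit offset assignment.

    tensors: name -> (size, first_step, last_step), lifetimes inclusive.
    Returns (name -> offset, total arena size).
    """
    placed = {}  # name -> (offset, size, first_step, last_step)
    # Place large tensors first; they are the hardest to fit.
    for name, (size, first, last) in sorted(tensors.items(),
                                            key=lambda kv: -kv[1][0]):
        # Address ranges held by tensors alive at the same time as this one.
        busy = sorted((off, off + sz)
                      for off, sz, f, l in placed.values()
                      if f <= last and first <= l)
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break                 # fits in the gap before this range
            offset = max(offset, hi)  # otherwise skip past it
        placed[name] = (offset, size, first, last)
    offsets = {n: p[0] for n, p in placed.items()}
    arena = max(p[0] + p[1] for p in placed.values())
    return offsets, arena


# Hypothetical activation tensors of a four-layer model.
tensors = {
    "t0": (4, 0, 1),
    "t1": (2, 1, 2),
    "t2": (4, 2, 3),
    "t3": (1, 3, 4),
}
offsets, arena = plan_memory(tensors)
naive = sum(size for size, _, _ in tensors.values())
print(offsets, arena, naive)
```

Here `t0` and `t2` never coexist, so they share offset 0, and the arena is smaller than the sum of all tensor sizes. On a microcontroller with a few hundred kilobytes of SRAM, this kind of reuse can decide whether a model fits at all.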

Year	Venue	Authors	Title	Tags	P	E	N
2024	HPCA	NYCU	TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers	dependency-free tensor splitting model; virtual feature map (VFP) for zero-copy concatenation; fine-grained life-cycle aware memory planner	4	4	3
2025	HPCA	NYCU	EDA: Energy-Efficient Inter-Layer Model Compilation for Edge DNN Inference Acceleration	Inter-Layer Operator Scheduling;SRAM-Constrained Tiling	4	3	3

Compiler for Accelerators¶

Challenge: The semantic gap and astronomically large scheduling space of spatial accelerators make traditional compilers and manual tuning ineffective.

Solution: High-level programming model, automatic code generation, performance optimization for specialized hardware

Year	Venue	Authors	Title	Tags	P	E	N
2021	ISCA	UCB	CoSA: Scheduling by Constrained Optimization for Spatial Accelerators	mixed-integer programming (MIP) for scheduling; prime-factor allocation; constant binary matrices for algorithm-hardware constraints	5	3	4
2022	ISCA	PKU	AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction	hardware abstraction; constraints based mapping generation; binary matrix based validation; hierarchical level performance model	3	4	2
2022	PLDI	MIT	Exocompilation for Productive Programming of Hardware Accelerators	Exocompilation; externalized accelerator specification; user-defined instructions; rewrite-based scheduling; effect analysis for safety	4	4	3
2024	HPCA	Stanford	Revet: A Language and Compiler for Dataflow Threads	dataflow threads execution model for vRDA; structured-link tensor format (SLTF) for control flow encoding; compiler lowering from imperative control flow to streaming dataflow	4	3	4
2026	ASPLOS	Stanford	Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism	asynchronous dataflow streaming abstraction;symbolic shape notation for dynamic tiling;dynamic hardware configuration time-multiplexing for MoE	4	3	4

Embedded DSL Compiler Frameworks¶

Challenge: Developing high-performance parallel DSLs for heterogeneous hardware requires significant repetitive effort in building IRs, optimizers, and code generators from scratch.

Year	Venue	Authors	Title	Tags	P	E	N
2011	IEEE Micro	Stanford & EPFL	Implementing Domain-Specific Languages for Heterogeneous Parallel Computing	Delite compiler framework; language virtualization; multi-view IR (Generic/Parallel/Domain-Specific); lightweight modular staging (LMS); heterogeneous code generation	4	4	3
2011	PPoPP	Stanford	A Domain-Specific Approach To Heterogeneous Parallelism	Delite runtime; deferred execution model; dynamic task graph; Delite op archetypes; GPU memory manager; run-ahead model	4	4	4

Sparse Tensor Compilers¶

Challenge: Compared to dense tensors, sparse tensors have more complex data structures and computation patterns.

Year	Venue	Authors	Title	Tags	P	E	N
2023	ASPLOS	UW	SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning	composable formats for expressing sparse matrices; computation split into stages that reuse existing optimizations; TensorIR-based sparse compiler	4	3	3
2024	OOPSLA	Cornell	UniSparse: An Intermediate Language for General Sparse Format Customization	language-based holistic format abstraction; decoupled logical (data structure) vs. physical (memory layout) representation; index map & orthogonal primitives (mutation/layout/query) for customization	4	4	4

Graph Mining Compilers¶

Challenge: Generic runtime graph-mining algorithms are inefficient; high-level specifications should instead be compiled automatically into efficient code.

Year	Venue	Authors	Title	Tags	P	E	N
2019	SOSP	CSM	AutoMine: Harmonizing High-Level Abstraction and High Performance for Graph Mining	automatic algorithm generation; set-based embedding representation; schedule generation via graph tournament	4	4	4

Domain-Specific Languages¶

Solution: formal semantics definition, tool generation automation, cross-domain generalization

Machine Learning DSLs¶

Challenge: Using general-purpose languages for ML requires explicit, complex parallelization for heterogeneous hardware, limiting productivity and performance portability.

Year	Venue	Authors	Title	Tags	P	E	N
2011	ICML	Stanford	OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning	OptiML; implicitly parallel DSL; best-effort computing; relaxed dependencies; domain-specific intermediate representation	4	4	4

Graph DSLs¶

Challenge: balancing expressiveness, usability, and performance in graph DSL design

Year	Venue	Authors	Title	Tags	P	E	N
2012	ASPLOS	Stanford	Green-Marl: A DSL for Easy and Efficient Graph Analysis	domain-specific breadth-first/depth-first ordered traversal primitives; deferred data-parallel assignment for bulk synchronous consistency; architecture-independent loop fusion and reduction bounds relaxation	4	4	4
2018	OOPSLA	MIT	GraphIt: A High-Performance Graph DSL	algorithm-schedule decoupling for graph; Graph Iteration Space (GIS); scheduling language for traversal strategies; compiler-guided autotuning	4	4	4

Sparse Tensor Algebra Compilers¶

Solution: multi-format iteration efficiency, format combination optimization, architecture-agnostic code generation

Format Abstraction and Conversion¶

Challenge: Sparse tensor algebra compilers need a compact way to describe many tensor formats and generate format-specific iteration code without hand-writing every format combination.

Solution: Use tensor index notation, merge-based sparse iteration, and format-level abstractions to generate code for many sparse and dense layouts.
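
Merge-based sparse iteration can be sketched with two sorted coordinate lists. The code below is hand-written to resemble the kind of loops such compilers generate; it is not output from TACO, and `merge_mul` and `merge_add` are hypothetical names. Multiplication only needs coordinates present in both operands (an intersection merge), while addition needs coordinates present in either (a union merge).

```python
def merge_mul(a, b):
    """Element-wise product of two sparse vectors given as sorted (index, value) lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        (ia, va), (ib, vb) = a[i], b[j]
        if ia == ib:              # coordinate present in both operands
            out.append((ia, va * vb))
            i += 1
            j += 1
        elif ia < ib:             # a is behind: its entry meets an implicit zero
            i += 1
        else:
            j += 1
    return out


def merge_add(a, b):
    """Element-wise sum: keep coordinates present in either operand."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        (ia, va), (ib, vb) = a[i], b[j]
        if ia == ib:
            out.append((ia, va + vb))
            i += 1
            j += 1
        elif ia < ib:
            out.append((ia, va))
            i += 1
        else:
            out.append((ib, vb))
            j += 1
    out.extend(a[i:])             # one operand is exhausted; copy the rest
    out.extend(b[j:])
    return out


a = [(0, 1), (2, 2), (5, 3)]
b = [(2, 4), (3, 5), (5, 6)]
product_ab = merge_mul(a, b)
sum_ab = merge_add(a, b)
print(product_ab, sum_ab)
```

The two functions differ only in which cases emit output and whether the tails are copied. A merge lattice generalizes exactly that choice to arbitrary expressions and format combinations, so the loops need not be written by hand.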

Year	Venue	Authors	Title	Tags	P	E	N
2017	OOPSLA	MIT	The Tensor Algebra Compiler	TACO compiler; iteration graph; merge lattice; compound tensor algebra codegen	4	4	5
2018	OOPSLA	MIT	Format Abstraction for Sparse Tensor Algebra Compilers	coordinate hierarchies; level formats abstraction; property-based merge lattice optimizations; level iterator conversion	4	4	3
2020	PLDI	MIT	Automatic Generation of Efficient Sparse Tensor Format Conversion Routines	coordinate remapping notation; attribute query language; tensor assembly abstract interface; three-phase conversion decomposition	4	4	4

Sparse Tensor Scheduling and Assembly¶

Scheduling and Interoperability¶

Challenge: Sparse tensor programs require schedule choices and library interoperation that depend on sparse formats, nonzero structure, and external optimized kernels.

Solution: Expose sparse iteration transformations, asymptotic schedule selection, and verified external-function binding as compiler-level mechanisms.

Year	Venue	Authors	Title	Tags	P	E	N
2020	OOPSLA	Reservoir Labs & Stanford	A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra	sparse iteration space transformations; derived iteration spaces; nonzero-space tiling; sparse scheduling API	4	4	4
2022	PLDI	MIT	Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model	automatic asymptotic scheduler; asymptotic cost model; Pareto frontier schedule pruning; TACO schedule search	4	4	4
2023	PLDI	Stanford	Mosaic: An Interoperable Compiler for Tensor Algebra	verified external-function binding; Mosaic function interface; automatic binding search; heterogeneous tensor algebra codegen	4	4	4

Dynamic Assembly and Workspaces¶

Challenge: Sparse tensor computations often need dynamic updates or scattered writes into sparse results whose formats do not support efficient random insertion.

Solution: Generate update-friendly dynamic formats and intermediate sparse workspaces that adapt scatter-heavy computation to sparse result assembly.
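
The role of a workspace can be illustrated with row-wise sparse matrix multiplication in CSR form. This is a hand-written sketch, not compiler output: `spgemm_csr` is a hypothetical name, and a Python dict stands in for the sparse workspace. Products arrive in scattered column order, so they are accumulated in the workspace first and only then appended to the compressed result, which supports appends but not random insertion.

```python
def spgemm_csr(a, b):
    """C = A @ B for CSR matrices given as (pos, crd, val) triples."""
    a_pos, a_crd, a_val = a
    b_pos, b_crd, b_val = b
    c_pos, c_crd, c_val = [0], [], []
    for i in range(len(a_pos) - 1):
        workspace = {}  # column -> partial sum for row i (insert-friendly)
        for pa in range(a_pos[i], a_pos[i + 1]):
            k = a_crd[pa]
            for pb in range(b_pos[k], b_pos[k + 1]):
                j = b_crd[pb]
                # Scattered update: j arrives in no particular order.
                workspace[j] = workspace.get(j, 0) + a_val[pa] * b_val[pb]
        # Assemble the finished row into the append-only CSR result.
        for j in sorted(workspace):
            c_crd.append(j)
            c_val.append(workspace[j])
        c_pos.append(len(c_crd))
    return c_pos, c_crd, c_val


# A = [[1, 2], [0, 0]],  B = [[0, 3], [4, 5]]
A = ([0, 2, 2], [0, 1], [1, 2])
B = ([0, 1, 3], [1, 0, 1], [3, 4, 5])
C = spgemm_csr(A, B)
print(C)
```

A dense array indexed by column would also work as the workspace. A sparse one avoids allocating and clearing memory proportional to the full row width, which matters when rows are long and results are very sparse.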

Year	Venue	Authors	Title	Tags	P	E	N
2022	OOPSLA	MIT	Compilation of Dynamic Sparse Tensor Algebra	node schema language; assembly abstract interface; map function generation; iterator optimization; dynamic tensor format composition	4	4	3
2024	PLDI	Stanford	Compilation of Modular and General Sparse Workspaces	sparse workspace insertion; workspace insertion algorithm template; sparse scattering detection; modular workspace implementations	4	4	4

Transpilers¶

Solution: automatic, correct, and performant source-to-source code translation across different hardware ecosystems

Year	Venue	Authors	Title	Tags	P	E	N
2025	OSDI	CAS	QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach	neural-symbolic synthesis; LLM-assisted transcompilation; SMT-based code repair; hierarchical auto-tuning	3	4	2

Hardware Description Languages¶

Solution: expressive hardware specification, efficient simulation and synthesis, robust verification methodologies

HDL Language Design¶

Challenge: balancing expressiveness, usability, and synthesis efficiency in HDL design

Year	Venue	Authors	Title	Tags	P	E	N
2024	FPGA	PKU	Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis	incorporates an event layer and the ctrl sub-language; event-based extension; cycle-level timing analysis and control synthesis techniques	4	4	4

Streaming Computation Models¶

Solution: high-throughput data processing, real-time analytics, efficient resource utilization for continuous data

Year	Venue	Authors	Title	Tags	P	E	N
2020	ASPLOS	Stanford	Fleet: A Framework for Massively Parallel Streaming on FPGAs	users write serial code for parallel execution; multi-stream parallelism; ready-valid signaling	3	4	3
2020	PLDI	Stanford	Type-Directed Scheduling of Streaming Accelerators	SSeq/TSeq space-time types; static throughput matching via types; invalid-bubble encoding in type system; type-directed scheduling	4	4	3

HLS Code Generation and Automation¶

Solution: bridging high-level languages to hardware, design space exploration, QoR improvement automation

Predictable HLS Programming Models¶

Challenge: Legacy software languages repurposed for HLS rely on complex, unsystematic heuristics, leading to unpredictable area-performance trade-offs.

Solution: Use formal type systems and language constraints (like time-sensitive affine types) to guarantee predictable and optimal hardware generation.
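
The intuition behind treating hardware resources as consumable can be sketched with a toy checker. This is not Dahlia's type system, which works statically on source programs; `check_program` is a hypothetical helper that inspects an explicit list of accesses, and it assumes single-ported banks with cyclic banking (`index % banks`). Each bank may be consumed at most once per logical time step and is replenished when the next step begins.

```python
def check_program(steps, banks):
    """Return an error message for the first bank conflict, or None if the program is accepted.

    steps: list of logical time steps, each a list of (memory, index) accesses
    banks: memory name -> number of banks (assumed cyclic banking)
    """
    for t, accesses in enumerate(steps):
        used = set()  # banks consumed so far in this time step
        for mem, idx in accesses:
            port = (mem, idx % banks[mem])
            if port in used:
                return f"step {t}: bank {port[1]} of {mem} already consumed"
            used.add(port)
        # Leaving the loop body starts a new step: all banks are available again.
    return None


banks = {"A": 2}

# A[0] and A[1] live in different banks, so one step may touch both.
ok_result = check_program([[("A", 0), ("A", 1)], [("A", 0)]], banks)

# A[0] and A[2] map to the same bank and cannot be accessed in the same step.
bad_result = check_program([[("A", 0), ("A", 2)]], banks)
print(ok_result, bad_result)
```

Rejecting the second program up front, rather than letting a tool silently serialize the accesses, is what makes the area and latency of the accepted programs predictable. As a simplification, this sketch also rejects two accesses to the very same index in one step.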

Year	Venue	Authors	Title	Tags	P	E	N
2020	PLDI	Cornell	Predictable Accelerator Design with Time-Sensitive Affine Types	Dahlia language; time-sensitive affine type system representing consumable hardware resources; logical time steps encoded in types; memory views for decoupling iteration from memory banking	3	3	3

General HLS Optimizations and Techniques¶

Solution: Develop general techniques to enhance HLS QoR by optimizing key aspects like timing, resource management, and code structure.

Year	Venue	Authors	Title	Tags	P	E	N
2020	DAC	UCLA	Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency	systematic classification of HLS implicit broadcasts (data/control/pipeline); broadcast-aware scheduling with calibrated delay models; skid-buffer-based pipeline flow control to eliminate stall signal broadcasting	4	4	4
2022	ASPLOS	UCLA	HeteroGen: Transpiling C to Heterogeneous HLS Code with Automated Test Generation and Program Repair	automated test generation; dependence-guided search space pruning; early candidate rejection using coding styles	3	4	3
2022	FPGA	Cornell	HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs	Decoupled data placement; Unified data placement primitive; Multi-level memory hierarchy optimization	4	4	4
2024	DATE	UIUC	Subgraph Extraction-Based Feedback-Guided Iterative Scheduling for HLS	ISDC iterative SDC scheduling; subgraph extraction-based low-level feedback; fanout and window-based subgraph extraction mechanism	4	4	4
2025	FPGA	University of Glasgow	Dynamic Loop Fusion in High-Level Synthesis	Dynamic loop fusion; HLS; Irregular memory access; Address monotonicity; Decoupled Access/Execute (DAE); Program-order schedule; Data Unit (DU)	4	4	4

High-Level Language to HLS Abstractions¶

Solution: Raise the abstraction level by enabling HLS code generation from high-level languages like Python, simplifying hardware design for non-experts.

Year	Venue	Authors	Title	Tags	P	E	N
2019	FPGA	Cornell & UCLA	HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing	algorithm-schedule decoupling; Python DSL; tensor-based computation; quantitative data types; design space exploration	4	4	4
2021	TC	UIUC	PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow	Python-based HLS flow; algorithm-centric operators (map/dot); automatic hardware type inference; automatic HLS pragma insertion	4	4	3
2024	MICRO	HUST	A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs	DMM as graph analytics; request-guided graph traversal; data-centric concurrent traversal; shortcut-assisted fast traversal	4	4	3

MLIR-based HLS Compiler Frameworks¶

Solution: Leverage the MLIR infrastructure to build modular, extensible, and reusable HLS frameworks for better analysis, transformation, and code generation.

Year	Venue	Authors	Title	Tags	P	E	N
2022	ICCAD	PKU	HECTOR: A Multi-level Intermediate Representation for Hardware Synthesis Methodologies	high-level topological representation; low-level hierarchical elastic component; time graph transformation	4	4	3
2022	HPCA	UIUC	ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation	multi-level IR for HLS; HLS-dedicated analysis/transform library; MLIR-based HLS framework	4	4	3
2023	ASPLOS	IISc	HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description	MLIR-based hardware IR; datapath + schedule model; explicit scheduling via time variables; automatic FSM synthesis	4	4	4
2024	HPCA	SJTU	An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation	polyhedral-based dependence analysis for loop transformations; bottleneck-oriented design space exploration; Dependence/Polyhedral/Affine IR hierarchy for FPGA HLS	4	4	4
2024	PLDI	Cornell	Allo: A Programming Model for Composable Accelerator Design	composable programming model; decoupled hardware customizations; bottom-up type-safe composition; hierarchical dataflow graph; memory layout composition via type inference	4	4	3

HLS Verification and Debugging¶

Solution: Develop automated tools for bug detection, formal verification, and cross-level debugging to ensure the correctness of HLS designs.

Year	Venue	Authors	Title	Tags	P	E	N
2024	FPGA	Cornell	Formal Verification of Source-to-Source Transformations for HLS	hybrid verification via concrete interpretation of control-flow and symbolic analysis of dataflow; Computation Directed Acyclic Graph (CDAG) as a syntax-agnostic semantic representation; formal equivalence proof for Statically Interpretable Control-Flow (SICF) programs	4	4	4
2024	MICRO	PKU	Hestia: An Efficient Cross-level Debugger for High-level Synthesis	inspection at multiple granularities; cross-level correspondence tracking; multi-level interpreter covering three levels	4	4	4
2024	LAD	UIUC	An Iteratively-refined Dataset for High-Level Synthesis Functional Verification through LLM-Aided Bug Injection	Chrysalis dataset with bug injection; ICL+RAG+CoT bug injection methodology; iteratively-refined HLS verification dataset	4	4	4

Dataflow-centric HLS Acceleration¶

Solution: Exploit task-level parallelism by automatically transforming and scheduling designs as dataflow graphs to maximize pipeline throughput and resource utilization.

Year	Venue	Authors	Title	Tags	P	E	N
2024	ASPLOS	UIUC	HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis	hierarchical dataflow IR (HIDA-IR); multi-level dataflow optimizer (HIDA-OPT); pattern-driven task fusion	3	5	4
2025	FPGA	UCLA	Stream-HLS: Towards Automatic Dataflow Acceleration	automatic dataflow HLS; global scheduling for streaming; MINLP for HLS optimization	4	4	4
2025	MICRO	UIUC	StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs	itensor iterative tensor type system;automatic stream-based kernel fusion;LP-based FIFO sizing;unified dataflow component generation	4	4	3

HLS for Systolic Arrays and AI Engines¶

Solution: Provide automated compilation flows and programming models to efficiently map algorithms, especially for AI workloads, onto specialized compute fabrics like systolic arrays and AI Engines.

Year	Venue	Authors	Title	Tags	P	E	N
2017	DAC	PKU & Falcon & UCLA	Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs	2D systolic array architecture; analytical model for performance and resource; two-phase design space exploration; end-to-end C-to-FPGA automation flow	3	4	3
2019	ISCAS	PKU & UCLA	Frequency Improvement of Systolic Array-Based CNNs on FPGAs	front-end accumulation chain segmentation; back-end topology-aware floorplanning constraints; frequency optimization for systolic arrays	3	4	3
2025	FPGA	Brown University	ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines	MLIR-based AIE compilation; Unified AIE+PL IR; Tile-based parallelism; ADF dialect; Automated AIE placement	4	5	4

HLS for Advanced Memory/Packaging¶

Solution: Develop HLS methodologies and co-design techniques to effectively utilize advanced hardware features like High-Bandwidth Memory (HBM), multi-die packaging, and direct storage access.

Year	Venue	Authors	Title	Tags	P	E	N
2022	FPGA	Cornell	High-Performance Sparse Linear Algebra on HBM-Equipped FPGAs Using HLS: A Case Study on SpMV	HLS methodology; HBM FPGA; SpMV accelerator; split-kernel design; microarchitecture in HLS; load-store forwarding; pipelined arbiter	4	4	4
2023	FPGA	UoP	DONGLE: Direct FPGA-Orchestrated NVMe Storage for HLS	HLS direct NVMe access; FPGA-orchestrated storage; Unified HLS storage interface; Single-source HLS for storage; DONGLE architecture	4	4	4
2023	FPGA	HKUST	FADO: Floorplan-Aware Directive Optimization for High-Level Synthesis Designs on Multi-Die FPGAs	Floorplan-aware HLS; Multi-die FPGA optimization; Directive-floorplan co-optimization; Incremental floorplanning for HLS; MMBP for HLS DSE	3	4	4
2024	ASPLOS	UCLA	TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs	two-layer ILP-based inter/intra-FPGA floorplanning coupled with interconnect pipelining during HLS; topology-aware communication cost model; latency-insensitive cross-FPGA partitioning; automatic multi-FPGA design partitioning with cut-set pipeline balancing	3	4	3

Program analysis¶

Solution: statically or dynamically analyzing programs to understand their behavior, detect errors, and optimize performance

Domain-specific program analysis¶

Solution: leveraging domain knowledge for precise analysis, specialized bug detection, targeted optimization insights

Year	Venue	Authors	Title	Tags	P	E	N
2024	PPoPP	Information Engineering University	A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs	holistic code generation and tuning; polyhedral model for mixed-precision; model-driven autotuning	4	5	4
2024	PPoPP	University of Delaware	Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts	recurrence analysis for parallelization; subscripted subscript analysis; intermittent monotonicity detection	3	4	3

HLS program analysis¶

Solution: verifying functional correctness of HLS, analyzing performance bottlenecks, ensuring interface compatibility

Year	Venue	Authors	Title	Tags	P	E	N
2025	FPGA	UoE	Latency Insensitivity Testing for Dataflow HLS Designs	Automated Latency Insensitivity Testing; Parallel Hardware-Accelerated Testing Platform; Test space reduction; Stalling Units (SU)	4	4	4