ICNCE 2026

Name: ICNCE 2026
Start: 2026-06-28T08:00:00+02:00
End: 2026-07-02T20:00:00+02:00
Location: Eurogress Aachen

June 28, 2026 to July 2, 2026

Eurogress Aachen

Time zone: Europe/Berlin

The Organizing Committee of ICNCE 2026

info@icnce-2026.de

Memory-Centric Devices and Architectures for Efficient Attention Computation and Continual Learning at the Edge

July 1, 2026, 14:20

40m

Brussel Hall (Eurogress Aachen)

Oral (Invited) S2 Technical Session (Brussel Hall)

Tania Roy

Abstract: The explosive growth of transformer-based AI models and the push toward adaptive intelligence at the edge have exposed fundamental limits of conventional von Neumann hardware, where data movement, not computation, dominates energy and latency. This talk presents recent progress from our group on memory-centric co-design spanning devices, circuits, and architectures to address these challenges for two key AI primitives.
First, we reformulate attention score computation as massively parallel in-memory similarity search using Flash-based Content-Addressable Memory (FlashCAM). High-uniformity amorphous oxide semiconductor Flash devices (>95% yield, 4 V memory window) with optimized speed–retention–endurance characteristics have been realized and integrated into 16×16 CAM arrays. A custom PCB measurement platform with Arduino/Jetson control has been developed to demonstrate matchline discharge dynamics that directly encode similarity scores.
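The idea that matchline discharge encodes a similarity score can be illustrated with a minimal Python sketch. It is not the FlashCAM circuit from the talk: it assumes binary keys, an idealized model in which every mismatching cell adds one identical discharge path, and hypothetical parameters (`v_precharge`, `t_sense`, `tau_cell`) chosen only for illustration.

```python
import math

def hamming_distance(stored_bits, query_bits):
    """Count the cells whose stored bit differs from the query bit."""
    return sum(s != q for s, q in zip(stored_bits, query_bits))

def matchline_voltage(stored_bits, query_bits,
                      v_precharge=1.0, t_sense=1.0, tau_cell=4.0):
    """Idealized matchline voltage at the sensing time.

    Assumption: each mismatching cell opens one identical pull-down path,
    so the line decays as v_precharge * exp(-mismatches * t_sense / tau_cell).
    """
    mismatches = hamming_distance(stored_bits, query_bits)
    return v_precharge * math.exp(-mismatches * t_sense / tau_cell)

def cam_attention_weights(keys, query):
    """Normalize the sensed voltages of all rows so that they sum to 1."""
    voltages = [matchline_voltage(key, query) for key in keys]
    total = sum(voltages)
    return [v / total for v in voltages]

# Three stored keys (CAM rows) searched in parallel with one query.
keys = [
    [1, 0, 1, 1],  # identical to the query: no discharge
    [1, 0, 1, 0],  # one mismatch
    [0, 1, 0, 0],  # four mismatches: fastest discharge
]
query = [1, 0, 1, 1]
weights = cam_attention_weights(keys, query)
print(weights)
```

Under this exponential-decay assumption, the normalized voltages coincide with a softmax over the number of matching bits, with `tau_cell / t_sense` acting as the temperature. Real matchline dynamics depend on device and circuit details that the sketch ignores.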
Second, we introduce a family of CMOS-compatible non-filamentary memristors (from graphene-based to metal-insulator-metal stacks) engineered for BEOL monolithic 3D integration and edge continual learning. Latest devices achieve 100 ns switching at 2.5 V while maintaining >100 s retention, high uniformity through via-hole structures, and low cycle-to-cycle variation that enables verification-free programming. We experimentally validate a deterministic outer-product parallel programming scheme on 6×6 subarrays within 32×32 crossbars, achieving O(1) weight updates. Supported by generalizable compact models and macro architectures that emulate floating-point operations for BF16-quantized LoRA adapters, these primitives enable in-situ LLM fine-tuning with minimal accuracy loss.
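The outer-product update mentioned above can be sketched in a few lines of Python. This is an illustrative model only, assuming an ideal linear and deterministic device response; the function name, pulse amplitudes, subarray position, and `step` factor are all hypothetical and not taken from the talk.

```python
def outer_product_update(conductance, row_pulses, col_pulses,
                         row0=0, col0=0, step=1.0):
    """Add step * row_pulses[i] * col_pulses[j] to each subarray cell, in place.

    The nested loops only simulate the result. In a crossbar, the row and
    column pulses would overlap at every selected cell during the same
    programming step, which is why the update cost does not grow with size.
    """
    for i, r in enumerate(row_pulses):
        for j, c in enumerate(col_pulses):
            conductance[row0 + i][col0 + j] += step * r * c
    return conductance

# 32x32 crossbar with all cells starting at zero (arbitrary units).
crossbar = [[0.0] * 32 for _ in range(32)]

# Hypothetical pulse amplitudes for a 6x6 subarray whose corner is at (4, 8).
row_pulses = [1, 2, 3, 4, 5, 6]
col_pulses = [1, 0, -1, 1, 0, -1]
outer_product_update(crossbar, row_pulses, col_pulses, row0=4, col0=8, step=0.5)

print(crossbar[4][8], crossbar[9][13])
```

Cells outside the selected 6×6 window stay untouched, and a zero column pulse leaves its whole column unchanged, so a rank-1 weight change is written without addressing cells one at a time.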
Together, these results demonstrate practical hardware pathways that dramatically reduce data movement for attention mechanisms and enable efficient on-device adaptation, offering a cohesive device-to-architecture framework for next-generation AI accelerators.

There are currently no materials.
