WoNDP: 3rd Workshop on Near-Data Processing
In conjunction with the
Date: Saturday, December 5th, 2015
Waikiki, Hawaii
Program:
- 1:30-1:40: Introduction: Rajeev Balasubramonian, University of Utah
- 1:40-2:10: Keynote I: Mircea Stan, University of Virginia, Automata Computing
Abstract:
The Automata Processor (AP) is a programmable silicon device
capable of performing very high-speed, comprehensive search and analysis of
complex, unstructured data streams. The AP is a hardware implementation of
non-deterministic finite automata (NFA) and it consists of a massively
parallel, scalable, reconfigurable, two-dimensional fabric comprising
~50,000 simple pattern-matching elements per chip. This talk will introduce
the main concepts and features of the AP, as well as provide some ideas
about the challenges and opportunities encountered when trying to map
parallel algorithms and applications to this non-von Neumann
Multiple-Instruction Single-Data (MISD) architecture.
Bio:
Mircea R. Stan received the Ph.D. (1996) and the M.S. (1994) degrees in
Electrical and Computer Engineering from the University of Massachusetts at
Amherst and the Diploma (1984) in Electronics and Communications from
"Politehnica" University in Bucharest, Romania. Since 1996 he has been with
the Charles L. Brown Department of Electrical and Computer Engineering at
the University of Virginia, where he is now a professor. Prof. Stan
teaches and conducts research in the areas of high-performance low-power VLSI,
temperature-aware circuits and architecture, embedded systems, spintronics,
and nanoelectronics. He leads the High-Performance Low-Power (HPLP) lab and
is a co-director of the Center for Automata Processing (CAP). He has more
than eight years of industrial experience and was a visiting faculty member at UC
Berkeley in 2004-2005, at IBM in 2000, and at Intel in 2002 and 1999. He
received the NSF CAREER award in 1997 and was a co-author on best paper
awards at ISQED 2008, GLSVLSI 2006, ISCA 2003 and SHAMAN 2002. Prof. Stan is
a fellow of the IEEE, a member of ACM, and of Eta Kappa Nu, Phi Kappa Phi
and Sigma Xi. His h-index is 43 and his i10-index is 105.
- 2:10-2:25: Implementing Radix Sort on MemoryWeb, Marco Minutoli (Pacific Northwest National Laboratory), Shannon Kuntz (EMU Solutions Inc.), Antonino Tumeo (Pacific Northwest National Laboratory), and Peter Kogge (EMU Solutions Inc.)
- 2:25-2:40: Scaling Deep Learning on Multiple In-Memory Processors, Lifan Xu (U. Delaware), Dongping Zhang (AMD), and Nuwan Jayasena (AMD)
- 2:40-2:55: Dataflow based Near Data Processing using Coarse Grain Reconfigurable Logic, Charles Shelor (U. North Texas), Krishna Kavi (U. North Texas), and Shashank Adavally (U. North Texas)
- 2:55-3:15: Break
- 3:15-3:45: Keynote II: Naveen Muralimanohar, HPE Labs, In-Situ Computing Through an Analog Dot-Product Engine
Abstract:
Due to current technology trends, there is growing interest in the area of
accelerators, especially with a focus on reducing communication overhead
to memory. However, to achieve the next big leap in performance, it is
also critical to embrace new hardware primitives and integrate them into
traditional systems. Although research thrusts in this area are gaining
momentum, we have barely scratched the surface of this field. A promising
step in this direction is to extend memory beyond a simple load/store unit,
and perform more complex operations in-situ. Such an approach has the
potential to break the memory bandwidth wall faced by conventional
von Neumann architectures and help unleash innovations in memory. This
talk will introduce a memristor-based Dot-Product Engine (DPE) that can
perform analog matrix-vector computation at high speed and low power. The
talk will focus on design aspects of a DPE, challenges in getting high
precision out of it, and potential opportunities for architects in
integrating it in future systems.
Bio:
Naveen Muralimanohar is a Principal Researcher at Hewlett-Packard Labs, and his research focuses on accelerators and memory system architecture for future servers. Currently, he is working on memristor technology, architecting it for both storage and computation. He is the primary developer of CACTI 6.5 and has been maintaining it since 2008. He has co-authored a book titled "Multi-Core Cache Hierarchies". Naveen has been granted 10 patents and has over 35 under submission. He received his Ph.D. in computer science from the University of Utah in 2009.
- 3:45-4:00: A 3D-Stacked Memory Manycore Stencil Accelerator System, Jiyuan Zhang (CMU), Tze Meng Low (CMU), Qi Guo (CMU), and Franz Franchetti (CMU)
- 4:00-4:15: Realizing the Full Potential of Heterogeneity through Processing in Memory, Nuwan Jayasena (AMD), Dongping Zhang (AMD), Amin Farmahini-Farahani (AMD), and Mike Ignatowski (AMD)
- 4:15-4:45: Keynote III: Scott Klasky, Oak Ridge National Laboratory, Near Real-time Processing of Scientific Data: Can we keep up?
Abstract:
As we progress further in science, we understand that the fundamental techniques used to process data in near-real-time must change in order for scientists to make significant progress. The overarching goal of my research has been to address fundamental barriers faced by scientists, thus enabling them to make Near-Real-Time and Quality-based decisions. Most experiments (computational and physical) are seeing data growth rates that outpace those dictated by Moore's law. To address this, we must look at new ways to extract and process the information contained in the raw data. This means that we must move into processing, moving, storing, and retrieving information and not just data. In this presentation I will focus on six key areas of research, which are driven by DOE application requirements: 1) Science-driven Data Management for Multi-tiered storage, 2) Data-in-motion techniques, 3) Data-At-Rest (Storage), 4) Data Representation and Abstraction, 5) Analytics and Visualization Tools and Technologies, and 6) Scientific Data Indexing and Queries. Throughout my presentation I will focus on our collaborators' software artifacts that have been used throughout the world, including ADIOS, a winner of the R&D 100 Award in 2013, pbdR, and VTK-m. These technologies are leading us to a new revolution in exascale computing by allowing us to work with a large collaborative group comprising applied mathematicians, domain scientists, and computer scientists.
Bio:
Scott A. Klasky is a distinguished scientist and the group leader for Scientific Data in the Computer Science and Mathematics Division at the Oak Ridge National Laboratory. He holds appointments at the University of Tennessee, Georgia Tech University, and North Carolina State University. He obtained his Ph.D. in Physics from the University of Texas at Austin (1994), and has previously worked at the University of Texas at Austin, Syracuse University, and the Princeton Plasma Physics Laboratory. Dr. Klasky is a world expert in scientific computing and scientific data management, co-authoring over 190 papers. He is also the team leader of the Adaptable I/O System (ADIOS) project, which won an R&D 100 Award in 2013.