Applied Reconfigurable Computing: 11th International Symposium, ARC 2015, Bochum, Germany. Kentaro Sano, Dimitrios Soudris, Michael Hübner, Pedro C. Diniz (Eds.). Lecture Notes in Computer Science 9040.


LNCS 9040

Kentaro Sano · Dimitrios Soudris
Michael Hübner · Pedro C. Diniz (Eds.)

Applied Reconfigurable
Computing
11th International Symposium, ARC 2015
Bochum, Germany, April 13–17, 2015
Proceedings


Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zürich, Zürich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany


Editors
Kentaro Sano
Tohoku University
Sendai
Japan

Michael Hübner
Ruhr-Universität Bochum
Bochum
Germany

Dimitrios Soudris
National Technical University of Athens
Athens
Greece

Pedro C. Diniz
University of Southern California
Marina del Rey
California
USA

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-16213-3
ISBN 978-3-319-16214-0 (eBook)
DOI 10.1007/978-3-319-16214-0

Library of Congress Control Number: 2015934029
LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known
or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media
(www.springer.com)

Preface

Reconfigurable computing provides a wide range of opportunities to increase performance and energy
efficiency by exploiting spatial/temporal and fine/coarse-grained parallelism with custom hardware
structures for processing, movement, and storage of data. Over the last several decades,
reconfigurable devices such as FPGAs have evolved from simple and small programmable logic devices
into large-scale, fully programmable systems-on-chip that integrate not only a huge number of
programmable logic elements, but also various hard macros such as multipliers, memory blocks,
standard I/O blocks, and powerful microprocessors. Such devices, fabricated in state-of-the-art
silicon technology, are now among the prominent actors in the semiconductor industry, whereas they
were no more than supporting actors as glue logic in the 1980s. The capability and flexibility of
present reconfigurable devices are attracting application developers from new fields, e.g., big-data
processing at data centers. This means that custom computing based on reconfigurable technology has
recently been recognized as an important and effective means to achieve efficient and/or
high-performance computing in wider application domains, spanning from highly specialized custom
controllers to general-purpose high-end programmable computing systems.
The new computing paradigm brought by reconfigurability increasingly requires research and
engineering efforts to connect the capabilities of devices and technologies with real and profitable
applications. The foremost challenges that we are still facing today include: appropriate
architectures and structures that allow innovative hardware resources and their reconfigurability to
be exploited for individual applications; languages and tools that enable highly productive design
and implementation; and system-level platforms with standard abstractions to generalize
reconfigurable computing. In particular, the productivity issue is considered key for reconfigurable
computing to be accepted by wider communities, including software engineers.
The International Applied Reconfigurable Computing (ARC) symposium series provides a forum for
dissemination and discussion of ongoing research efforts in this transformative research area. The
first edition of the series was held in 2005 in Algarve, Portugal. The second edition of the
symposium (ARC 2006) took place in Delft, The Netherlands, during March 1–3, 2006, and was the first
edition of the symposium to have selected papers published as a Springer LNCS (Lecture Notes in
Computer Science) volume. Subsequent editions of the symposium have been held in Rio de Janeiro,
Brazil (ARC 2007), London, UK (ARC 2008), Karlsruhe, Germany (ARC 2009), Bangkok, Thailand
(ARC 2010), Belfast, UK (ARC 2011), Hong Kong, China (ARC 2012), Los Angeles, USA (ARC 2013), and
Algarve, Portugal (ARC 2014).
This LNCS volume includes the papers selected for the 11th edition of the symposium (ARC 2015), held
in Bochum, Germany, during April 13–17, 2015. The symposium attracted many very good papers
describing interesting work on reconfigurable computing-related subjects. A total of 85 papers were
submitted to the symposium from 22 countries: Germany (20), USA (10), Japan (10), Brazil (9),
Greece (6), Canada (3), Iran (3), Portugal (3), China (3), India (2), France (2), Italy (2),
Singapore (2), Egypt (2), Austria (1), Finland (1), The Netherlands (1), Nigeria (1), Norway (1),
Pakistan (1), Spain (1), and Switzerland (1). Submitted papers were evaluated by at least three
members of the Technical Program Committee. After careful selection, 23 papers were accepted as full
papers (acceptance rate of 27.1%) for oral presentation and 20 as short papers (global acceptance
rate of 50.6%) for poster presentation. These accepted papers allowed us to organize a very
interesting symposium program, and they constitute a representative overview of ongoing research
efforts in reconfigurable computing, a rapidly evolving and maturing field.
Many people contributed to the success of the 2015 edition of the symposium. We would like to
acknowledge the support of all the members of this year’s symposium Steering and Program Committees
in reviewing papers, in helping with the paper selection, and in giving valuable suggestions. Special
thanks also go to the additional researchers who contributed to the reviewing process, to all the
authors who submitted papers to the symposium, and to all the symposium attendees. Last but not
least, we are especially indebted to Mr. Alfred Hoffmann and Mrs. Anna Kramer from Springer for
their support and work in publishing this book, and to Jürgen Becker from the University of
Karlsruhe for his strong support regarding the publication of the proceedings as part of the LNCS
series.

January 2015

Kentaro Sano
Dimitrios Soudris

Organization

The 2015 Applied Reconfigurable Computing Symposium (ARC 2015) was organized
by the Ruhr-University Bochum (RUB) in Bochum, Germany.

Organization Committee

General Chairs
Michael Hübner, Ruhr-Universität, Bochum, Germany
Pedro C. Diniz, University of Southern California/Information Sciences Institute, USA

Program Chairs
Kentaro Sano, Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Dimitrios Soudris, National Technical University of Athens, Greece

Finance Chair
Maren Arndt, Ruhr-Universität, Bochum, Germany

Publicity Chair
Ricardo Reis, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

Web Chairs
Farina Fabricius, Ruhr-Universität, Bochum, Germany
Daniela Horn, Ruhr-Universität, Bochum, Germany

Proceedings Chair
Pedro C. Diniz, University of Southern California/Information Sciences Institute, USA

Special Journal Edition Chairs
Kentaro Sano, Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Pedro C. Diniz, University of Southern California/Information Sciences Institute, USA
Michael Hübner, Ruhr-Universität, Bochum, Germany

Local Arrangements Chairs
Maren Arndt, Ruhr-Universität, Bochum, Germany
Horst Gass, Ruhr-Universität, Bochum, Germany

Steering Committee
Hideharu Amano, Keio University, Japan
Jürgen Becker, Karlsruhe Institute of Technology, Germany
Mladen Berekovic, Braunschweig University of Technology, Germany
Koen Bertels, Delft University of Technology, The Netherlands
João M. P. Cardoso, Faculdade de Engenharia da Universidade do Porto, Portugal
George Constantinides, Imperial College of Science, Technology and Medicine, UK
Pedro C. Diniz, University of Southern California/Information Sciences Institute, USA
Philip H.W. Leong, University of Sydney, Australia
Katherine (Compton) Morrow, University of Wisconsin-Madison, USA
Walid Najjar, University of California Riverside, USA
Roger Woods, The Queen’s University of Belfast, UK
In memory of Stamatis Vassiliadis, Delft University of Technology, The Netherlands

Program Committee
Zack Backer, Los Alamos National Laboratory, Los Alamos, USA
Jürgen Becker, Karlsruhe Institute of Technology, Germany
Mladen Berekovic, Braunschweig University of Technology, Germany
Koen Bertels, Delft University of Technology, The Netherlands
Matthias Birk, Karlsruhe Institute of Technology, Germany
João Bispo, Instituto Superior Técnico/Universidade Técnica de Lisboa, Portugal
Stephen Brown, Altera and University of Toronto, Canada
João Canas Ferreira, Faculdade de Engenharia da Universidade do Porto, Portugal
João M. P. Cardoso, Faculdade de Engenharia da Universidade do Porto, Portugal
Cyrille Chavet, Université de Bretagne-Sud, France
Ray Cheung, City University of Hong Kong, China
Daniel Chillet, Inria Rennes, France
Kiyoung Choi, Seoul National University, South Korea
Paul Chow, University of Toronto, Canada
René Cumplido, National Institute for Astrophysics, Optics, and Electronics, Mexico
Florent de Dinechin, INSA Lyon, France
Steven Derrien, Université de Rennes 1, France
Pedro C. Diniz, University of Southern California/Information Sciences Institute, USA
António Ferrari, Universidade de Aveiro, Portugal
Carlo Galuzzi, Delft University of Technology, The Netherlands
Diana Göhringer, Ruhr-Universität, Bochum, Germany
Frank Hannig, Friedrich-Alexander University Erlangen-Nürnberg, Germany
Jim Harkin, University of Ulster, Northern Ireland, UK
Reiner Hartenstein, Technische Universität Kaiserslautern, Germany
Dominic Hillenbrand, Karlsruhe Institute of Technology, Germany
Christian Hochberger, Technische Universität Dresden, Germany
Michael Hübner, Ruhr-Universität, Bochum, Germany
Waqar Hussain, Tampere University of Technology, Finland
Tomonori Izumi, Ritsumeikan University, Japan
Ricardo Jacobi, Universidade de Brasília, Brazil
Krzysztof Kepa, Virginia Bioinformatics Institute, USA
Andreas Koch, Technische Universität Darmstadt, Germany
Dimitrios Kritharidis, Intracom Telecom, Greece
Vianney Lapotre, LIRMM-CNRS, Montpellier, France
Philip H.W. Leong, University of Sydney, Australia
Gabriel M. Almeida, Leica Biosystems/Danaher, Germany
Eduardo Marques, University of São Paulo, Brazil
Konstantinos Masselos, Imperial College of Science, Technology and Medicine, UK
Antonio Miele, Politecnico di Milano, Italy
Takefumi Miyoshi, e-trees Inc., Japan
Horácio Neto, Instituto Superior Técnico, Portugal
Smail Niar, University of Valenciennes, France
Seda O. Memik, Northwestern University, Illinois, USA
Monica M. Pereira, University Federal do Rio Grande do Norte, Brazil
Christian Pilato, Columbia University, USA
Thilo Pionteck, University of Lübeck, Germany
Marco Platzner, Universität Paderborn, Germany
Dan Poznanovic, Cray Inc., USA
Kyle Rupnow, Nanyang Technological University, Singapore
Kentaro Sano, Tohoku University, Sendai, Japan
Marco D. Santambrogio, Politecnico di Milano, Italy
Yukinori Sato, Japan Advanced Institute of Science and Technology, Japan
Pete Sedcole, Celoxica, Paris, France
Yuichiro Shibata, Nagasaki University, Japan
Dimitrios Soudris, National Technical University of Athens, Greece
David Thomas, Imperial College of Science, Technology and Medicine, UK
Tim Todman, Imperial College of Science, Technology and Medicine, UK
Pedro Trancoso, University of Cyprus, Cyprus
Chao Wang, University of Science and Technology of China, China
Markus Weinhardt, Hochschule Osnabrück, Germany
Theerayod Wiangtong, Mahanakorn University of Technology, Thailand
Yoshiki Yamaguchi, University of Tsukuba, Japan
Peter Zipf, Universität Kassel, Germany

Additional Reviewers
Andreas Agne, Universität Paderborn, Germany
Ihsen Alouani, University of Valenciennes, France
Jecel Assumpção Jr., University of São Paulo, Brazil
Cristiano Bacelar de Oliveira, University of São Paulo, Brazil
Rico Backasch, Technische Universität Dresden, Germany
Mouna Baklouti, École Nationale d’Ingénieurs de Sfax, Tunisia
Davide B. Bartolini, Politecnico di Milano, Italy
Cristopher Blochwitz, Universität zu Lübeck, Germany
Anthony Brandon, Delft University of Technology, The Netherlands
Jae Min Cho, Seoul National University, South Korea
David de La Chevallerie, Technische Universität Darmstadt, Germany
Gianluca Durelli, Politecnico di Torino, Italy
Andreas Engel, University of Basel, Switzerland
Peter Figuli, Karlsruhe Institute of Technology, Germany
Philip Gottschling, Technische Universität Darmstadt, Germany
Adib Haron, Delft University of Technology, The Netherlands
Jan Heisswolf, Karlsruhe Institute of Technology, Germany
Gerald Hempel, Technische Universität Dresden, Germany
Rainer Hoeckmann, Osnabrück University of Applied Sciences, Germany
Matei Istoan, Inria, France
Moritz Joseph, Universität zu Lübeck, Germany
Lukas Jung, Technische Universität Darmstadt, Germany
Jehangir Khan, University of Valenciennes, France
Kyounghoon Kim, Seoul National University, South Korea
Jinho Lee, Seoul National University, South Korea
Jinx Liu, University of Ulster, Northern Ireland, UK
Charles Lo, University of Toronto, Canada
Thomas Marconi, Delft University of Technology, The Netherlands
Fernando Martin del Campo, University of Toronto, Canada
Luiz Martins, Federal University of Uberlandia, Brazil
Joachim Meyer, Karlsruhe Institute of Technology, Germany
Alessandro A. Nacci, Politecnico di Milano, Italy
Sancta Pandit, University of Toronto, Canada
Lazaros Papadopoulos, Democritus University of Thrace, Greece
Erinaldo Pereira, University of São Paulo, Brazil
Michael Raitza, Technische Universität Dresden, Germany
Simon Reder, Karlsruhe Institute of Technology, Germany
Guillaume Salagnac, INSA-Lyon, France
Shimpei Sato, Japan Advanced Institute of Science and Technology, Japan
Ali Asgar Sohanghpurwala, Virginia Polytechnic Institute and State University, USA
Hyunjik Song, Seoul National University, South Korea
Florian Stock, Technische Universität Darmstadt, Germany
Berna Torun, Delft University of Technology, The Netherlands
Erik Vermij, IBM Research, The Netherlands
Alex Weiss, Accemic gmbh, Germany
Tobias Wiersema, Universität Paderborn, Germany
Bartosz Wojciechowski, Wroclaw University of Technology, Poland
Jianfeng Zhang, University of Toronto, Canada

Contents

Architecture and Modeling

Reducing Storage Costs of Reconfiguration Contexts by Sharing Instruction
Memory Cache Blocks
Thiago Baldissera Biazus and Mateus Beck Rutzig

A Vector Caching Scheme for Streaming FPGA SpMV Accelerators
Yaman Umuroglu and Magnus Jahre

Hierarchical Dynamic Power-Gating in FPGAs
Rehan Ahmed, Steven J.E. Wilton, Peter Hallschmid, and Richard Klukas

Tools and Compilers I

Hardware Synthesis from Functional Embedded Domain-Specific Languages:
A Case Study in Regular Expression Compilation
Ian Graves, Adam Procter, William L. Harrison, Michela Becchi,
and Gerard Allwein

ArchHDL: A Novel Hardware RTL Design Environment in C++
Shimpei Sato and Kenji Kise

Operand-Value-Based Modeling of Dynamic Energy Consumption of Soft
Processors in FPGA
Zaid Al-Khatib and Samar Abdi

Systems and Applications I

Preemptive Hardware Multitasking in ReconOS
Markus Happe, Andreas Traber, and Ariane Keller

A Fully Parallel Particle Filter Architecture for FPGAs
Fynn Schwiegelshohn, Eugen Ossovski, and Michael Hübner

TEAChER: TEach AdvanCEd Reconfigurable Architectures and Tools
Kostas Siozios, Peter Figuli, Harry Sidiropoulos, Carsten Tradowsky,
Dionysios Diamantopoulos, Konstantinos Maragos, Shalina Percy Delicia,
Dimitrios Soudris, and Jürgen Becker

Tools and Compilers II

Dynamic Memory Management in Vivado-HLS for Scalable
Many-Accelerator Architectures
Dionysios Diamantopoulos, S. Xydis, K. Siozios, and D. Soudris

SET-PAR: Place and Route Tools for the Mitigation of Single Event
Transients on Flash-Based FPGAs
Luca Sterpone and Boyang Du

Advanced SystemC Tracing and Analysis Framework for Extra-Functional
Properties
Philipp A. Hartmann, Kim Grüttner, and Wolfgang Nebel

Run-Time Partial Reconfiguration Simulation Framework
Based on Dynamically Loadable Components
Xerach Peña, Fernando Rincon, Julio Dondo, Julian Caba,
and Juan Carlos Lopez

Network-on-a-Chip

Architecture Virtualization for Run-Time Hardware Multithreading
on Field Programmable Gate Arrays
Michael Metzner, Jesus A. Lizarraga, and Christophe Bobda

Centralized and Software-Based Run-Time Traffic Management Inside
Configurable Regions of Interest in Mesh-Based Networks-on-Chip
Philipp Gorski, Tim Wegner, and Dirk Timmermann

Survey on Real-Time Network-on-Chip Architectures
Salma Hesham, Jens Rettkowski, Diana Göhringer,
and Mohamed A. Abd El Ghany

Cryptography Applications

Efficient SR-Latch PUF
Bilal Habib, Jens-Peter Kaps, and Kris Gaj

Hardware Benchmarking of Cryptographic Algorithms Using High-Level
Synthesis Tools: The SHA-3 Contest Case Study
Ekawat Homsirikamol and Kris Gaj

Dual CLEFIA/AES Cipher Core on FPGA
João Carlos Resende and Ricardo Chaves

Systems and Applications II

An Efficient and Flexible FPGA Implementation of a Face
Detection System
Hichem Ben Fekih, Ahmed Elhossini, and Ben Juurlink

A Flexible Software Framework for Dynamic Task Allocation on MPSoCs
Evaluated in an Automotive Context
Jens Rettkowski, Philipp Wehner, Marc Schülper, and Diana Göhringer

A Dynamically Reconfigurable Mixed Analog-Digital Filter Bank
Hiroki Nakahara, Hideki Yoshida, Shin-ich Shioya, Renji Mikami,
and Tsutomu Sasao

The Effects of System Hyper Pipelining on Three Computational Benchmarks
Using FPGAs
Tobias Strauch

Extended Abstracts (Posters)

A Timing Driven Cycle-Accurate Simulation for Coarse-Grained
Reconfigurable Architectures
Anupam Chattopadhyay and Xiaolin Chen

Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy
Consumption on the Layers CGRA
Zoltán Endre Rákossy, Dominik Stengele, Axel Acosta-Aponte,
Saumitra Chafekar, Paolo Bientinesi, and Anupam Chattopadhyay

A Novel Concept for Adaptive Signal Processing
on Reconfigurable Hardware
Peter Figuli, Carsten Tradowsky, Jose Martinez,
Harry Sidiropoulos, Kostas Siozios, Holger Stenschke,
Dimitrios Soudris, and Jürgen Becker

Evaluation of High-Level Synthesis Techniques for Memory and Datapath
Tradeoffs in FPGA Based SoC Architectures
Efstathios Sotiriou-Xanthopoulos, Dionysios Diamantopoulos,
and George Economakos

Measuring Failure Probability of Coarse and Fine Grain TMR Schemes
in SRAM-based FPGAs Under Neutron-Induced Effects
Lucas A. Tambara, Felipe Almeida, Paolo Rech,
Fernanda L. Kastensmidt, Giovanni Bruni, and Christopher Frost

Modular Acquisition and Stimulation System for Timestamp-Driven
Neuroscience Experiments
Paulo Matias, Rafael T. Guariento, Lirio O.B. de Almeida,
and Jan F.W. Slaets

DRAM Row Activation Energy Optimization for Stride Memory Access
on FPGA-Based Systems
Ren Chen and Viktor K. Prasanna

Acceleration of Data Streaming Classification using Reconfigurable
Technology
Pavlos Giakoumakis, Grigorios Chrysos, Apostolos Dollas,
and Ioannis Papaefstathiou

On-The-Fly Verification of Reconfigurable Image Processing Modules
Based on a Proof-Carrying Hardware Approach
Tobias Wiersema, Sen Wu, and Marco Platzner

Partial Reconfiguration for Dynamic Mapping of Task Graphs
onto 2D Mesh Platform
Mansureh S. Moghaddam, M. Balakrishnan, and Kolin Paul

A Challenge of Portable and High-Speed FPGA Accelerator
Takuma Usui, Ryohei Kobayashi, and Kenji Kise

Total Ionizing Dose Effects of Optical Components on an Optically
Reconfigurable Gate Array
Retsu Moriwaki, Hiroyuki Ito, Kouta Akagi, Minoru Watanabe,
and Akifumi Ogiwara

Exploring Dynamic Reconfigurable CORDIC Co-Processors Tightly Coupled
with a VLIW-SIMD Soft-Processor Architecture
Stephan Nolting, Guillermo Payá-Vayá, Florian Giesemann,
and Holger Blume

Mesh of Clusters FPGA Architectures: Exploration Methodology
and Interconnect Optimization
Sonda Chtourou, Zied Marrakchi, Vinod Pangracious, Emna Amouri,
Habib Mehrez, and Mohamed Abid

DyAFNoC: Dynamically Reconfigurable NoC Characterization
Using a Simple Adaptive Deadlock-Free Routing Algorithm
with a Low Implementation Cost
Ernesto Castillo, Gabriele Miorandi, Davide Bertozzi,
and Wang Jiang Chau

A Flexible Multilayer Perceptron Co-processor for FPGAs
Zeyad Aklah and David Andrews

Reconfigurable Hardware Assist for Linux Process Scheduling
in Heterogeneous Multicore SoCs
Maikon Bueno, Carlos R.P. Almeida, José A.M. de Holanda,
and Eduardo Marques

Towards Performance Modeling of 3D Memory Integrated FPGA
Architectures
Shreyas G. Singapura, Anand Panangadan, and Viktor K. Prasanna

Pyverilog: A Python-Based Hardware Design Processing Toolkit
for Verilog HDL
Shinya Takamaeda-Yamazaki

Special Session 1: Funded R&D Running and Completed Projects
(Invited Papers)

Towards Unification of Accelerated Computing and Interconnection
For Extreme-Scale Computing
Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Hideharu Amano,
Hitoshi Murai, Masayuki Umemura, and Mitsuhisa Sato

SPARTAN/SEXTANT/COMPASS: Advancing Space Rover Vision
via Reconfigurable Platforms
George Lentaris, Ioannis Stamoulias, Dionysios Diamantopoulos,
Konstantinos Maragos, Kostas Siozios, Dimitrios Soudris,
Marcos Aviles Rodrigalvarez, Manolis Lourakis, Xenophon Zabulis,
Ioannis Kostavelis, Lazaros Nalpantidis, Evangelos Boukas,
and Antonios Gasteratos

Hardware Task Scheduling for Partially Reconfigurable FPGAs
George Charitopoulos, Iosif Koidis, Kyprianos Papadimitriou,
and Dionisios Pnevmatikatos

SWAN-iCARE Project: On the Efficiency of FPGAs Emulating Wearable
Medical Devices for Wound Management and Monitoring
Vasileios Tsoutsouras, Sotirios Xydis, Dimitrios Soudris,
and Leonidas Lymperopoulos

Special Session 2: Horizon 2020 Funded Projects (Invited Papers)

DynamIA: Dynamic Hardware Reconfiguration in Industrial Applications
Nele Mentens, Jochen Vandorpe, Jo Vliegen, An Braeken, Bruno da Silva,
Abdellah Touhafi, Alois Kern, Stephan Knappmann, Jens Rettkowski,
Muhammed Soubhi Al Kadi, Diana Göhringer, and Michael Hübner

Robots in Assisted Living Environments as an Unobtrusive, Efficient,
Reliable and Modular Solution for Independent Ageing:
The RADIO Perspective
Christos Antonopoulos, Georgios Keramidas, Nikolaos S. Voros,
Michael Hübner, Diana Göhringer, Maria Dagioglou,
Theodore Giannakopoulos, Stasinos Konstantopoulos,
and Vangelis Karkaletsis

Reconfigurable Computing for Analytics Acceleration of Big Bio-Data:
The AEGLE Approach
Andreas Raptopoulos, Sotirios Xydis, and Dimitrios Soudris

COSSIM: A Novel, Comprehensible, Ultra-Fast, Security-Aware
CPS Simulator
Ioannis Papaefstathiou, Gregory Chrysos, and Lambros Sarakis

Author Index

Architecture and Modeling

Reducing Storage Costs of Reconfiguration Contexts
by Sharing Instruction Memory Cache Blocks
Thiago Baldissera Biazus and Mateus Beck Rutzig
Federal University of Santa Maria, Santa Maria, RS, Brazil

Abstract. Reconfigurable architectures have emerged as an energy-efficient
solution to increase the performance of current embedded systems.
However, the employment of such architectures causes area and power
overhead, mainly due to the mandatory attachment of a memory structure
responsible for storing the reconfiguration contexts, called the context memory.
Moreover, most reconfigurable architectures, besides the context memory,
employ a cache memory to store regular instructions, which causes
needless redundancy. In this work, we propose a Demand-based Cache Memory
Block Manager (DCMBM) that allows regular instructions and
reconfigurable contexts to be stored in a single memory structure. At runtime,
depending on the application requirements, the proposed approach manages the
ratio of memory blocks allocated to each type of information. Results show that
the DCMBM-DIM spends, on average, 43.4% less energy while maintaining the same
performance as split memory structures with the same storage capacity.

1 Introduction

Nowadays, the increasing complexity of embedded systems, such as tablets and
smartphones, is widely acknowledged. One reason for this complexity is the growing
number of applications with different behaviors running on a single device,
most of which are not foreseen at design time. Thus, designers of such devices must
handle severe power and energy constraints, since battery capacity does not
scale with the performance requirements.
Companies conceive their embedded platforms with a few general-purpose
processors surrounded by dozens of ASICs to deal with the power and performance
challenges of such embedded devices. General Purpose Processors (GPPs) are
responsible for interface control and operating system processing, while
ASICs are employed to execute applications that would overload the general-purpose
processor. Due to their specialization, ASICs achieve better performance and energy
consumption than GPPs when executing applications that belong to their domain. Thus,
video, audio, and telecommunication standards are implemented as ASICs. However, as
technology evolves, the constant release of new standards becomes a drawback,
since each new standard must be incorporated into the platform as an ASIC. Besides
making the design increasingly complex, this approach affects the time to market,
since new tools and compilers must be made available to support the new ASICs.
© Springer International Publishing Switzerland 2015
K. Sano et al. (Eds.): ARC 2015, LNCS 9040, pp. 3–14, 2015.
DOI: 10.1007/978-3-319-16214-0_1


Reconfigurable architectures have emerged as an energy-efficient solution to increase
the performance of current embedded systems due to the adaptability
offered by these architectures [1][2][3]. Thanks to their adaptive capability, reconfigurable
architectures could emulate the behavior of the ASICs employed in current embedded
platforms, making them candidates to replace those ASICs.
Typically, a reconfigurable architecture works by moving the execution of portions
of code from the general-purpose processor to reconfigurable logic, offering a positive
tradeoff between performance and energy at the cost of area and power consumption
penalties. This area and power overhead stems mainly from two
structures: the reconfigurable logic and the context memory. The context memory is
responsible for storing contexts. A context represents the execution behavior of a
portion of code in the reconfigurable logic, where the execution actually takes place.
Several techniques have been proposed to decrease the impact of the
reconfigurable logic [4][12], but few approaches have addressed the
context memory overhead [5]. However, the efficiency of reconfigurable systems
relies on this storage component, since application speedup is directly proportional to
the context memory hit rate.
Most dynamic reconfigurable architectures employ, besides the context memory, a
cache memory to store regular instructions, which introduces a needless
redundancy [1][2][3]. This redundancy follows from the way these architectures
normally execute. When execution starts, most memory accesses fetch
regular instructions, since during this period these instructions are being
translated into contexts. After some execution time, as the reconfigurable
architecture is used more and more, the memory access pattern changes: accesses
that fetch contexts increase while accesses to regular instructions decrease.
In this work, we propose a demand-based allocation cache memory that holds
regular instructions and reconfigurable contexts in a single memory structure. Exploiting
the aforementioned memory access pattern, the proposed approach
determines, at runtime, the best allocation ratio of cache memory blocks between
contexts and regular instructions according to the demand for each data type. To
achieve this goal, we propose the Demand-based Cache Memory Block Manager
(DCMBM), which supports the allocation of both data types and decides which data type
is replaced in the single cache memory structure.

This paper is organized as follows. Section 2 reviews research
on context memory exploitation. Section 3 presents the proposed cache
architecture. Section 4 presents the methodology used to gather data about the proposed
approach and the results. Section 5 presents the final remarks.

2 Related Work

Several studies have proposed different partitioning strategies aiming to increase
the hit rate of cache memories. Most of them focus on sharing cache memory blocks
among several threads that run concurrently in multiprocessor
systems. In [6], a Gradient-based Cache Partitioning Algorithm is proposed to improve

Reducing Storage Costs of Reconfiguration Contexts


the cache hit rate by dynamically monitoring thread references and giving extra cache
space to the threads that require it. The cache memory is divided into regions and an
algorithm calculates the affinity of each thread for a certain cache region.
The proposal shown in [7] works on the premise that cache resources
should be given not to the applications that demand the most, but to the
applications that benefit the most from them. A runtime monitor constantly tracks the
misses of each running application and partitions the
ways of a set-associative cache among them. After each modification of
the partitioning, the algorithm compares the miss rate of the threads
with that of the previous partitioning and acts to minimize the global miss rate
by varying the number of ways given to each application. The approaches presented in
[8][9] propose strategies to switch off ways depending on the cache miss rate, aiming
at saving energy.
Although several researchers have proposed techniques to partition the cache
memory among several threads or processes, to the best of our knowledge there is no
work considering cache partitioning in the field of reconfigurable architectures.
To illustrate the importance of optimizing the storage components of
reconfigurable architectures, Table 1 shows the number of bytes required to
configure the reconfigurable fabric
of three different architectures. As can be seen in the table, these architectures need
a significant number of bytes to store a single configuration. For instance, GARP
[2], a traditional reconfigurable architecture, requires 786 KB to hold 128
configurations (128 × 6,144 bytes = 786,432 bytes); such an amount of memory would
certainly have a considerable
impact on the power consumption of the entire system.
Table 1. Bytes per Configuration Required by Different Reconfigurable Architectures

Rec. Architecture          GARP[1]    DIM[3]    Piperench[2]
Bytes per Configuration    6,144      4,261     21,504
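
The storage figures discussed above can be checked with a few lines of arithmetic. The sketch below uses the per-configuration sizes from Table 1; the count of 128 configurations is the one the text quotes for GARP, and applying the same count to DIM and Piperench is an assumption made here only for comparison. The function and variable names are illustrative.

```python
# Bytes per configuration, copied from Table 1.
BYTES_PER_CONFIGURATION = {"GARP": 6144, "DIM": 4261, "Piperench": 21504}


def context_memory_bytes(bytes_per_config: int, num_configs: int) -> int:
    """Total storage needed to hold num_configs configurations."""
    return bytes_per_config * num_configs


if __name__ == "__main__":
    # 128 configurations is the figure quoted for GARP; reusing it for the
    # other two architectures is an assumption for comparison only.
    for name, size in BYTES_PER_CONFIGURATION.items():
        total = context_memory_bytes(size, 128)
        print(f"{name}: {total} bytes for 128 configurations")
```

For GARP this gives 786,432 bytes, which matches the 786 KB quoted in the text.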

In this work, we propose a cache partitioning technique for coarse-grained
reconfigurable architectures in which regular instructions and reconfigurable contexts share
the same cache structure. Since the need for a large storage volume for each
type of information arises at different periods of the execution, a Demand-based
Cache Memory Block Manager (DCMBM) is proposed to handle this behavior by
partitioning the cache memory blocks according to the demand for each type of
information.

3 Demand-Based Cache Memory Block Manager (DCMBM)

3.1 The Structure of the Cache Memory

Figure 1 shows the structure of the cache memory of the Demand-based Cache
Memory Block Manager (DCMBM). As can be seen in the figure, the DCMBM has
almost the same structure as a traditional cache, being composed of valid, tag and data
fields. The valid bit indicates whether the stored data is valid, and the tag is used to
check whether the stored address matches the requested address. The data field
holds the information itself. Additionally, every block of the DCMBM has a field
named Type (t) that identifies whether the stored information is a regular instruction or a
context. The DCMBM works as a traditional cache memory: if a cache miss happens
in a certain line of the cache memory, the replacement algorithm chooses, in the case
of a set-associative cache, one of the blocks of the target set to be replaced.
[Figure: the Address selects a set between Index 0 and Index N; each block holds V, T, Tag and Data fields; tag comparators (=) drive the Type, Requested Data and Hit outputs.]

Fig. 1. Circuit of the DCMBM
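
The block layout described in this subsection can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Block`, `BlockKind`, `split_address` and `lookup`, and the way an address is split into tag and index, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class BlockKind(Enum):
    """The Type (t) field: what a block currently stores."""
    INSTRUCTION = 0
    CONTEXT = 1


@dataclass
class Block:
    valid: bool = False                       # V field
    kind: BlockKind = BlockKind.INSTRUCTION   # T field
    tag: int = 0                              # Tag field
    data: bytes = b""                         # Data field


def split_address(address: int, num_sets: int, block_size: int) -> Tuple[int, int]:
    """Split an address into (tag, set index). The split itself is an assumption."""
    block_number = address // block_size
    return block_number // num_sets, block_number % num_sets


def lookup(cache_set: List[Block], tag: int) -> Optional[Block]:
    """Return the valid block whose tag matches, or None on a miss."""
    for block in cache_set:
        if block.valid and block.tag == tag:
            return block
    return None
```

On a hit, the returned block carries its `kind`, which plays the role of the Type output in Figure 1 and tells the requester whether the data is a regular instruction or a context.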

3.2 Block Allocation Hardware

The Block Allocation Hardware (BAH) is responsible for managing the ratio of
blocks allocated to each type of information. Its algorithm is based on
a threshold and operates on the cache associativity. Based on the demand for each
type of information, the BAH uses the threshold to decide, when a write to the cache
happens, which type of information should be replaced.
The BAH is implemented as a 4-bit circuit, so its values range from
0 to 15. Each cache set has a 4-bit register whose value
indicates whether a block containing a context or a regular instruction is to be replaced.
When a new context is created, meaning that it must be stored in the cache
memory (a write to the cache), and the value of the register of the target set is lower
than a certain threshold (defined at design time), a block holding a regular instruction is
selected as the victim to be replaced. However, when the value is greater than the
threshold and a regular instruction causes a cache miss (a write to the cache), a
block holding a context is chosen as the victim to be replaced.
There are two scenarios in which the value of the register of the set is updated:
• When a context must be stored in the cache memory, the BAH algorithm
  decrements the value of the target set by one unit. This strategy aims at
  increasing the number of blocks that store contexts instead of regular
  instructions, since the lower the value, the more blocks become available to
  store contexts in the set. Following the memory access pattern of dynamic
  reconfigurable architectures, there are periods of the application
  execution in which the translation of regular instructions into contexts intensifies,
  so the number of requests to store contexts increases. Therefore, more
  cache blocks must be given to contexts to maximize the context hit ratio
  and, consequently, to speed up the application.
• When neither a regular instruction nor a context generates a hit for a certain
  address (a cache miss caused by a regular instruction), the BAH
  algorithm increments the value of the target set by one unit. This strategy aims
  to increase the number of blocks that store regular instructions, since the higher
  the value, the more blocks become available to store regular instructions in the
  set. There is a high probability that a miss for both regular
  instruction and context is due to the first execution of a certain portion of code.
  This means that the dynamic reconfigurable architecture is only starting to translate
  that portion of code and will not soon request a block to store the corresponding
  context. Meanwhile, as a new portion of code is being
  executed, more blocks for regular instructions are necessary to increase
  the hit rate and to avoid penalties in the execution time of the application.
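
The two update scenarios above amount to a small per-set counter. The sketch below traces how such a counter drifts as the mix of events changes. It assumes the register saturates at 0 and 15 rather than wrapping around, which the text does not state; the event names and the helper `trace` are illustrative.

```python
REGISTER_MAX = 15  # a 4-bit register holds values from 0 to 15


def update_register(value: int, event: str) -> int:
    """Return the register value of a set after one cache event.

    "context_store": a context must be written to the cache (decrement).
    "miss": a regular-instruction miss, with no context hit either (increment).
    "hit": the register is left unchanged.
    Saturating at 0 and 15 is an assumption made for this sketch.
    """
    if event == "context_store":
        return max(value - 1, 0)
    if event == "miss":
        return min(value + 1, REGISTER_MAX)
    if event == "hit":
        return value
    raise ValueError(f"unknown event: {event}")


def trace(start: int, events):
    """Return the list of register values seen while replaying events."""
    values = [start]
    for event in events:
        values.append(update_register(values[-1], event))
    return values
```

Replaying a burst of misses followed by a burst of context stores shows the value first rising, which favours regular instructions, and then falling, which favours contexts.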

The following items summarize how the BAH handles each possible
cache memory access:
1) When both the regular instruction and the context lookups miss and the value
of the register of the target set is:
   a. lower than the threshold, a block holding a regular instruction is
      selected as the victim and the value of the register is incremented by
      one unit.
   b. greater than the threshold, a block holding a context is selected as the
      victim and the value of the register is incremented by one unit.
2) When the reconfigurable architecture finishes a new context (meaning
that it must be stored in the cache memory) and the value of the register of
the target set is:
   a. lower than the threshold, a block holding a regular instruction is
      selected as the victim and the value of the register is decremented by
      one unit.
   b. greater than the threshold, a block holding a context is selected as the
      victim and the value of the register is decremented by one unit.
3) When a hit happens, either for a regular instruction or for a context, the values of
the registers are not updated.
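
The three cases can be written as one decision function. This is a minimal sketch, not the authors' hardware. It assumes that the register is compared with the threshold before it is updated, that a value equal to the threshold is treated like a greater value (the text leaves that case open), and that the register saturates at 0 and 15. The names `handle_access`, `"miss"` and `"new_context"` are illustrative.

```python
from typing import Optional, Tuple

REGISTER_MAX = 15  # 4-bit register


def handle_access(value: int, event: str, threshold: int) -> Tuple[Optional[str], int]:
    """Return (victim type, new register value) for one access to a set.

    event is "miss" (case 1), "new_context" (case 2) or "hit" (case 3).
    The victim type is "instruction", "context", or None when nothing is replaced.
    """
    if event == "hit":
        return None, value  # case 3: no replacement, register unchanged

    # Assumption: a value equal to the threshold selects a context as victim.
    victim = "instruction" if value < threshold else "context"

    if event == "miss":          # case 1: increment, saturating at 15
        return victim, min(value + 1, REGISTER_MAX)
    if event == "new_context":   # case 2: decrement, saturating at 0
        return victim, max(value - 1, 0)
    raise ValueError(f"unknown event: {event}")
```

For instance, with a hypothetical threshold of 8, a miss at register value 5 selects a regular instruction as victim and moves the register to 6, while a new context at value 10 selects a context as victim and moves the register to 9.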

3.3 Replacement Algorithm

As the DCMBM is based on the cache associativity, a replacement algorithm must
be implemented to select which block of the target set is the victim to be
replaced. We have selected Least Recently Used (LRU) as the replacement
algorithm since it is widely employed in current commercial processors (e.g.
ARM Cortex, Intel Core, etc.). We have implemented a modified LRU that works
together with the BAH. Unlike the original version of LRU, where any of the blocks
of the target set may be the victim, the DCMBM algorithm works only
over the blocks of the target set that match the type of information
chosen as victim by the BAH. It is implemented by simply comparing the type of
information that should be replaced (provided by the BAH) with the type of information of
every block of the target set (provided by the field t (type of data)).

4 Case Study

In this section, we show how the Demand-based Cache Memory Block Manager
(DCMBM) works together with a reconfigurable system. As a case study, we have
selected Dynamic Instruction Merging (DIM) [3]. This architecture
was selected because it has already been shown to be energy efficient when accelerating a wide
range of application behaviors [3]. In addition, this reconfigurable system has two
memory structures (instruction memory and context memory) and would take
advantage of the proposed approach, since it is based on hardware that builds
contexts at runtime.

4.1 DIM Architecture

As shown in Figure 2, the entire reconfigurable system is divided into six blocks: the
DIM hardware; the Reconfigurable Data Path; the MIPS R3000 processor; the context
memory; and the instruction and data memories. The next subsections give a brief overview
of each block.

Fig. 2. The Reconfigurable System
