A power efficient neural network implementation on heterogeneous FPGA and GPU Devices

Yuexuan Tu, Saad Sadiq, Yudong Tao, Mei Ling Shyu, Shu Ching Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep neural networks (DNNs) have seen tremendous industrial successes in various applications, including image recognition, machine translation, audio processing, etc. However, they require massive amounts of computation and long processing times. This quickly becomes a problem on mobile and handheld devices, where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to utilize specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate deep learning computations under power-efficiency constraints. We investigate an efficient DNN implementation that uses the FPGA for fully-connected layers and the GPU for floating-point operations. This requires the deep neural network architecture to be implemented as a model-parallel system in which the DNN model is partitioned and processed in a distributed fashion. The proposed heterogeneous framework is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework achieves faster computation and much lower power consumption.
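The abstract describes a model-parallel split in which the GPU runs the floating-point feature-extraction layers and the FPGA executes the fully-connected layer. The sketch below illustrates one way such a partition could look in a PyTorch-style front end; the `fpga_fc_forward` offload hook, the fixed-point scale factor, and the layer sizes are illustrative assumptions (not the paper's actual interface), and the FPGA call is emulated on the host rather than driving real hardware.

```python
import numpy as np
import torch
import torch.nn as nn


class FPGAFullyConnected(nn.Module):
    """Fully-connected layer whose matrix multiply is offloaded to the FPGA."""

    def __init__(self, in_features, out_features, scale=256):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scale = scale  # fixed-point scaling factor (an assumed Q8.8-style format)

    def fpga_fc_forward(self, x_fixed, w_fixed):
        # Stand-in for the real driver call that would stream data to the Artix-7;
        # emulated here with integer arithmetic on the host CPU.
        return np.matmul(x_fixed, w_fixed.T)

    def forward(self, x):
        # Quantize activations and weights to integers before the offload.
        x_fixed = np.round(x.detach().cpu().numpy() * self.scale).astype(np.int32)
        w_fixed = np.round(self.weight.detach().cpu().numpy() * self.scale).astype(np.int32)
        y_fixed = self.fpga_fc_forward(x_fixed, w_fixed)
        # Dequantize and add the bias back in floating point.
        return torch.from_numpy(y_fixed).float() / (self.scale ** 2) + self.bias.detach().cpu()


class HeterogeneousNet(nn.Module):
    """Floating-point feature extraction on the GPU, final FC layer on the FPGA."""

    def __init__(self, device="cuda" if torch.cuda.is_available() else "cpu"):
        super().__init__()
        self.device = device
        self.gpu_part = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2), nn.Flatten(),
        ).to(device)
        self.fpga_fc = FPGAFullyConnected(8 * 14 * 14, 10)

    def forward(self, x):
        feats = self.gpu_part(x.to(self.device))  # floating-point ops stay on the GPU
        return self.fpga_fc(feats)                # fully-connected layer goes to the FPGA


if __name__ == "__main__":
    net = HeterogeneousNet().eval()
    with torch.no_grad():
        logits = net(torch.randn(4, 1, 28, 28))
    print(logits.shape)  # torch.Size([4, 10])
```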

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science, IRI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 193-199
Number of pages: 7
ISBN (Electronic): 9781728113371
DOIs: https://doi.org/10.1109/IRI.2019.00040
State: Published - Jul 2019
Event: 20th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2019 - Los Angeles, United States
Duration: Jul 30, 2019 - Aug 1, 2019

Publication series

Name: Proceedings - 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science, IRI 2019

Conference

Conference: 20th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2019
Country: United States
City: Los Angeles
Period: 7/30/19 - 8/1/19

Fingerprint

  • Field programmable gate arrays (FPGA)
  • Neural networks
  • Closed circuit television systems
  • Image recognition
  • Face recognition
  • Network architecture
  • Disasters
  • Electric power utilization
  • Graphics processing unit
  • Deep neural networks
  • Processing

Keywords

  • FPGA
  • GPU
  • Heterogeneous Computing
  • Low Powered Devices

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Human-Computer Interaction
  • Information Systems

Cite this

Tu, Y., Sadiq, S., Tao, Y., Shyu, M. L., & Chen, S. C. (2019). A power efficient neural network implementation on heterogeneous FPGA and GPU Devices. In Proceedings - 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science, IRI 2019 (pp. 193-199). [8843495] (Proceedings - 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science, IRI 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2019.00040
