A vector floating point processing unit design



Bibliographic Details
Main Author: Chen, Shi, 1976-
Other Authors: Memorial University of Newfoundland. Faculty of Engineering and Applied Science
Format: Thesis
Language: English
Published: 2008
Online Access: http://collections.mun.ca/cdm/ref/collection/theses4/id/29765
Description
Summary: Thesis (M.Eng.)--Memorial University of Newfoundland, 2008. Engineering and Applied Science. Includes bibliographical references (leaves 101-104).

The main contribution of this thesis is the successful development of a vector floating point processing unit for high-accuracy scientific computing. For these numerically intensive applications, vector processing offers simple and straightforward parallelism by executing mathematical operations on multiple data elements simultaneously. The simple control and datapath structures of vector processing enable the embedded computing system to attain high performance at low power.

This vector floating point processing unit includes a vector register file, vector floating point arithmetic units, and vector memory units. The central module, the vector register file, is divided into twelve lanes. Each lane contains 16 vector registers, each holding 32 elements of 32 bits, and is connected to a floating point adder and a floating point multiplier. Because the multi-port register file is modeled with configurable block RAM on a Field Programmable Gate Array (FPGA) target, the vector register file can efficiently obtain data from external memory and feed data to different arithmetic units simultaneously. By using the fast carry-out path and the embedded multiplier macro units, the vector floating point arithmetic units can run at over 200 MHz. A flag register indicates the calculation sequence for the specific computing model. Moreover, the embedded PowerPC processor not only controls the calculation flow easily but can also support an embedded operating system, extending the unit to a broad range of applications. The prototype is implemented on Xilinx Virtex-II Pro devices and achieves a peak performance of 4.530 GFLOPS at 188.768 MHz.

First, we present a brief introduction to floating point arithmetic operations, including addition, multiplication, and fused multiply-add. Second, we introduce the architecture of the vector processing unit and describe the vector function units in detail. Moreover, for a specific computing application, we discuss the appropriate overlapped execution scheme. Finally, we analyze the performance of each component and provide a timing and area analysis of the whole system.
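The reported peak performance and the register-file capacity follow from the lane configuration stated in the summary. The sketch below is a hypothetical Python model: the `VfpuConfig` class and its field names are illustrative and do not come from the thesis. It assumes that each lane's adder and multiplier each complete one single-precision operation per cycle. That assumption is consistent with the reported figure, since 12 lanes × 2 operations × 188.768 MHz = 4.530 GFLOPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VfpuConfig:
    """Hypothetical model of the vector FPU configuration described in the summary."""

    lanes: int = 12                  # vector register file is divided into twelve lanes
    registers_per_lane: int = 16     # 16 vector registers per lane
    elements_per_register: int = 32  # 32 elements per vector register
    element_bits: int = 32           # single-precision (32-bit) elements
    # Assumption: one floating point adder and one multiplier per lane,
    # each completing one operation per clock cycle.
    flops_per_lane_per_cycle: int = 2

    def register_file_bits(self) -> int:
        """Total storage across all lanes of the vector register file, in bits."""
        return (self.lanes * self.registers_per_lane
                * self.elements_per_register * self.element_bits)

    def peak_gflops(self, clock_mhz: float) -> float:
        """Peak throughput in GFLOPS at the given clock frequency (MHz)."""
        return self.lanes * self.flops_per_lane_per_cycle * clock_mhz / 1000.0


if __name__ == "__main__":
    cfg = VfpuConfig()
    print(cfg.register_file_bits())               # 196608 bits
    print(cfg.register_file_bits() // 8 // 1024)  # 24 KiB
    print(f"{cfg.peak_gflops(188.768):.3f}")      # 4.530
```

The dataclass is frozen so that a configuration behaves as a fixed hardware description. A different lane count or clock can be explored by constructing a new instance rather than mutating an existing one.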