As the previous section on the image processing task showed, some algorithms are not suited to acceleration on a cluster or other parallel computing architectures because their parallelism is too fine-grained.
The innermost loop of the image processing task requires up to nine additions and one division. If the filter operation uses additional coefficients, another nine multiplications have to be computed. An optimal solution for executing this innermost loop would be a parallel architecture in which each operation is performed by an individual processor. All these processors must be connected according to the image filtering algorithm so that together they perform the complete innermost loop operation.
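To make the operation count concrete, the following is a minimal sketch of such an innermost loop in Python. It assumes a 3x3 filter window, integer pixel values, and a list-of-lists image layout. The function name `filter_pixel` and the parameters `coeffs` and `divisor` are illustrative and not taken from the earlier section.

```python
def filter_pixel(image, row, col, coeffs=None, divisor=9):
    """Compute one output pixel of a 3x3 filter and count the operations.

    Assumes (row, col) is not on the image border, so the full 3x3
    neighbourhood exists. Returns (result, additions, multiplications,
    divisions).
    """
    total = 0
    additions = 0
    multiplications = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            if coeffs is not None:
                # Optional weighting: one multiplication per pixel.
                value = value * coeffs[dr + 1][dc + 1]
                multiplications += 1
            # Accumulate: one addition per pixel (nine in total,
            # because the accumulator starts at zero).
            total += value
            additions += 1
    # Normalise the sum: one division per output pixel.
    result = total // divisor
    return result, additions, multiplications, 1
```

Calling `filter_pixel(image, 1, 1)` on a 3x3 image reports nine additions and one division. Passing a 3x3 `coeffs` matrix adds nine multiplications, matching the counts given above. On a general-purpose processor these operations run largely one after another for every output pixel.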
A general-purpose processor is inappropriate for implementing such a parallel architecture because it offers only a limited number of parallel execution units. The optimum solution would therefore be an application-specific hardware architecture that fulfils these requirements and executes each operation in an independent processor.
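The idea of one independent processor per operation can be illustrated with a small software model. This is only a sketch under stated assumptions: the nine multiplications are treated as one parallel stage, the additions are arranged as a tree of two-input adders, and a single divider finishes the computation. The names `adder_tree` and `filter_pixel_dataflow` are hypothetical. Note that a tree needs only eight adders to sum nine values; the ninth addition in a sequential loop comes from adding into an accumulator that starts at zero.

```python
def adder_tree(values):
    """Sum values pairwise, level by level.

    Each level models a row of two-input adders that could all work at
    the same time. Returns (sum, number_of_adders, number_of_levels).
    """
    values = list(values)
    adders = 0
    levels = 0
    while len(values) > 1:
        # Every pair at this level gets its own adder.
        next_level = [values[i] + values[i + 1]
                      for i in range(0, len(values) - 1, 2)]
        adders += len(next_level)
        if len(values) % 2:
            # An odd element passes through to the next level unchanged.
            next_level.append(values[-1])
        values = next_level
        levels += 1
    return values[0], adders, levels


def filter_pixel_dataflow(window, coeffs, divisor):
    """Model one 3x3 filter step as connected processing units.

    window and coeffs are flat sequences of nine values each.
    """
    # Stage 1: nine multipliers, one per pixel/coefficient pair.
    products = [p * c for p, c in zip(window, coeffs)]
    # Following stages: the adder tree reduces nine products to one sum.
    total, adders, levels = adder_tree(products)
    # Final stage: a single divider normalises the result.
    return total // divisor, adders, levels
```

In this model the nine products are reduced in four adder levels instead of nine sequential additions. This is the kind of structure that dedicated hardware can realise directly, while a general-purpose processor has to emulate it with its few execution units.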
One possible technology for implementing such a parallel architecture is the application-specific integrated circuit (ASIC). An ASIC allows individual and independent processors to be implemented on one chip and connected to execute the specific algorithm. The technology is powerful: only the operations that are actually needed are implemented, in the form of specific processors, so no area on the integrated circuit is wasted. All operations are implemented directly in the hardware circuit and execute without any instruction decoding steps. The two most important disadvantages of ASIC technology are the long design process and the total cost of the fabrication setup. Furthermore, the parallel architecture is implemented as an inflexible structure of transistors and connections that is highly optimized for the selected ASIC manufacturing process. This optimization, together with the inflexibility of the structure, is what makes the design process for such a chip so long. For this reason, ASIC technology is usually restricted to applications that are needed at high volumes. Good examples are dedicated image processing chips, such as those in digital cameras, and network chips.
An alternative technology for implementing application-specific parallel architectures is the field-programmable gate array (FPGA) [8-10]. FPGA technology provides a flexible network of processing units that can be programmed and connected to implement a parallel architecture. This relatively new technology avoids the long design times and the inflexibility of ASICs, and it is a viable alternative for applications that are needed at low or medium volumes.
FPGA devices are the basis for the execution of parallel applications. FPGAs are programmable logic devices consisting of a matrix of processors connected by an interconnection network. This matrix and its interconnects are surrounded by I/O cells, which are used for all kinds of data transfers. Figure 28.2 shows the schematic layout of the internal FPGA architecture.