October 26, 2007

PRODUCT HOW-TO: Optimizing GPS, audio/video streaming algorithm designs with Atmel's customizable CAP MCUs

Atmel's CAP MCU-based SoCs provide the basic processing capacity plus a high-density block of metal-programmable (MP) digital logic that can be personalized to provide DSP-like or other dedicated function execution hardware.

Various applications, ranging from GPS to audio/video stream processing, require complex algorithms to be executed in real time. Many of these algorithms follow industry standards that are upgraded periodically.

Engineers developing such applications face a challenge: optimizing the execution of these algorithms within tight constraints on the unit cost, physical size and power consumption of a device that is often manufactured in high volume, while also meeting strict limits on development cost and time. The end product must also be adaptable to upgrades in the processing algorithms at a reasonable cost.

For optimum algorithm execution, the basic rule of thumb is hardware for performance and software for flexibility. In practice, this rule is difficult to apply. Hardware choices are limited to the basic arithmetic functions of the MCU core, the multiply/accumulate and linear function processing of a DSP core, or the wider flexibility of an FPGA with its downside of physical size, power consumption and unit cost in volume.
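To make the multiply/accumulate workload concrete, the following is a minimal sketch in C of one output of a FIR filter, a typical kernel that a DSP core or dedicated logic would accelerate. It is illustrative only and does not come from Atmel's libraries; the function name fir_mac and the 16-bit sample format are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: one FIR filter output computed as a chain of
 * multiply/accumulate (MAC) steps, one per tap. On an MCU core each
 * step costs several instructions; DSP-like hardware typically
 * performs one MAC per clock cycle. */
int64_t fir_mac(const int16_t *coeff, const int16_t *samples, size_t n_taps)
{
    assert(coeff != NULL && samples != NULL);

    int64_t acc = 0; /* wide accumulator so long filters cannot overflow */
    for (size_t i = 0; i < n_taps; ++i) {
        acc += (int32_t)coeff[i] * (int32_t)samples[i];
    }
    return acc;
}
```

In software, the loop runs once per tap for every output sample, which is why kernels of this kind are natural candidates for the hardware side of the partition.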

The alternative of a standard-cell ASIC can give a higher level of performance, but at a development time and cost that is often prohibitive. Software is ported onto the MCU or MCU/DSP combination that has been selected for the hardware implementation.

Once the hardware/software (HW/SW) partitioning has been made, altering it is extremely difficult and time-consuming, unless the application will go into volume production on an FPGA. Often, it is only in the final stages of application development that the software can be run on the target hardware, and only then can it be determined whether the implementation of the processing algorithm is optimal.

Implementation flow
Atmel's CAP is an MCU-based SoC that provides the basic processing capacity and a high-density block of metal-programmable (MP) digital logic that can be personalized to provide DSP-like or other dedicated function execution hardware.

It provides a reasonable development cycle time and cost. The development flow for an application-specific CAP includes an emulation step based on a development board that uses a high-density FPGA to emulate the algorithm execution functionality that will subsequently be hardened into the MP block.

CAP enables an application developer to get the best of the FPGA and ASIC worlds. The first phase of the CAP application development cycle uses FPGA-based libraries and tools to make an initial HW/SW partition of the algorithm and then map the hardware-based functions onto DSP-like structures or other processing elements implemented in the FPGA.

In parallel, the software-based algorithm processing is compiled for execution by the MCU that sees the FPGA/MP block in its address space, with a distributed DMA architecture to optimize data flows between the functional and memory blocks.
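Because the MCU sees the FPGA/MP block in its address space, driver code can reach a hardware function through memory-mapped registers. The sketch below shows the general pattern in C. The register layout, bit definitions and the name accel_run are all hypothetical, since the article does not describe the actual CAP register map, and the DMA path is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block of a hardware function implemented in
 * the FPGA/MP logic. volatile forces a real bus access on every read
 * and write. */
typedef struct {
    volatile uint32_t ctrl;    /* write START to launch the function */
    volatile uint32_t status;  /* DONE is set when the result is valid */
    volatile uint32_t operand; /* input value */
    volatile uint32_t result;  /* output value */
} accel_regs_t;

#define ACCEL_CTRL_START  (1u << 0)
#define ACCEL_STATUS_DONE (1u << 0)

/* Write the operand, start the block, then poll for completion.
 * Returns 0 and stores the result on success, or -1 if DONE is not
 * seen within max_polls status reads. */
int accel_run(accel_regs_t *regs, uint32_t operand,
              uint32_t max_polls, uint32_t *result)
{
    assert(regs != NULL && result != NULL);

    regs->operand = operand;
    regs->ctrl = ACCEL_CTRL_START;

    for (uint32_t i = 0; i < max_polls; ++i) {
        if (regs->status & ACCEL_STATUS_DONE) {
            *result = regs->result;
            return 0;
        }
    }
    return -1;
}
```

On the target, regs would point at the base address assigned to the block in the MCU's memory map. Passing the pointer as a parameter also lets the same code run on a host against an ordinary struct for testing.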

Figure 1: HW/SW partitioning involves implementing an algorithm using a library of IP blocks containing hardware modules and their associated software drivers.

Figure 1 above shows the overall steps of the HW/SW partitioning and implementation of an algorithm using a library of IP blocks that contains both hardware modules and their associated software drivers.

On the hardware side, the algorithm modules are first synthesized using tools available from the IP library or FPGA supplier. These are then synthesized with the required DSP or similar function processing blocks from a library provided by the FPGA supplier. The final step is to map these high-level constructs onto the basic FPGA structure to configure the FPGA in the CAP development board.

On the software side, the IP blocks required for the algorithm are compiled, and then linked with Atmel's library of low-level device drivers that handles the detailed operation of the multiple peripherals and external interfaces of the CAP SoC. If required, this code is then linked to the OS, user interface and top-level control modules for the operation of the entire system. The complete code set is loaded into the program memory for the MCU core, which is the central architectural element of the CAP.

The basic architecture of the CAP development board is shown in Figure 2 below. The fixed portion of the device is in the CAP chip that is implemented as a standard MCU together with its on-chip memories, peripherals and interfaces, all of which are brought out to the external connections shown.

Figure 2: Shown is the basic architecture of the CAP development board.

A wide choice of memories can be connected to the external bus interface. The hardware part of the algorithm under development is mapped into the FPGA via its configuration memory, and the software is loaded into the selected program memory (external or internal) of the MCU.

Once configured, the development board emulates the operation of the final CAP device at close to operational speed, including aspects such as multitasking, inter-process communication and interrupts that are almost impossible to simulate.

This emulation step enables the algorithm implementation to be thoroughly debugged under realistic conditions of use. It also enables metrics to be applied to determine whether the initial HW/SW partitioning and the subsequent synthesis/compilation of the various modules are optimal. If improvements are required, these can be implemented using the same design flow, as described previously, at no additional cost other than that of extended development time.

Multiple iterations of the HW/SW partitioning and implementation of the HW/SW modules are possible in order to achieve an optimal implementation.
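The article does not say which metrics are applied during emulation. As one illustration, the C sketch below compares candidate partitions by total cycle count, assuming each module's cost has been measured on the development board both in software and in hardware. The type module_cost_t, the function partition_cycles and the assumption that modules run one after another with no overlap are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module measurements taken during emulation. */
typedef struct {
    uint32_t sw_cycles; /* cycles when the module runs on the MCU */
    uint32_t hw_cycles; /* cycles in the FPGA, including data transfer */
    int      in_hw;     /* nonzero if this partition puts it in hardware */
} module_cost_t;

/* Total cycles for one partition, assuming the modules execute
 * sequentially with no overlap between hardware and software. */
uint64_t partition_cycles(const module_cost_t *m, size_t n)
{
    assert(m != NULL || n == 0);

    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += m[i].in_hw ? m[i].hw_cycles : m[i].sw_cycles;
    }
    return total;
}
```

Flipping the in_hw flag of a module and recomputing the total gives a quick estimate of whether moving it across the HW/SW boundary is worth another iteration. A real evaluation would also weigh gate count, power and concurrency.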

MP, fabrication flow
Once the functionality of the device under development has been frozen, the final RTL code that has been used to program the FPGA is mapped (by Atmel or an accredited third-party design house) onto the metal layers that personalize the CAP MP block. Rigorous post-layout simulation ensures that the functionality of the metal-programmed CAP is identical to that of the emulated version.

Prototypes are rapidly fabricated from blanks that have been staged in the fab before the metal layers are applied. They enable the application developer to do a final verification of the device's HW/SW functionality and, in particular, to check that the processing of the algorithm is optimal.

In the worst case, if the prototypes are not satisfactory, the additional cost and time of a re-spin starting from the emulation phase are reasonable and much lower than those of a full mask replacement for a standard-cell ASIC.

When the prototypes have been approved, volume fabrication of the personalized CAP devices commences, using the same flow as for the prototypes. Based on field feedback and in response to any upgrades of the data processing algorithm, subsequent incremental versions of the CAP-based device can be developed more rapidly and at lower cost than the initial version, basing the modifications on the final FPGA configuration of the development board before metal programming.

Peter Bishop is Communications Manager at Atmel Corp.