October 26, 2007
PRODUCT HOW-TO: Optimizing GPS, audio/video streaming algorithm designs with Atmel's customizable CAP MCUs
Atmel's CAP MCU-based SoCs provide the basic processing capacity and a high density block of metal-programmable (MP) digital logic that can be personalized to provide DSP-like or other dedicated function execution hardware.
By Peter Bishop, Atmel Corp.
Various applications, ranging from GPS to audio/video stream
processing, require complex algorithms to be executed in real time.
Many of these algorithms follow industry standards that are upgraded
periodically.
Engineers developing such applications face a challenge: to optimize
the execution of these algorithms within tight constraints on the unit
cost, physical size and power consumption of a device that is often
manufactured in high volume, and within strict limits on development
cost and time. The end product must also be adaptable to upgrades in
the processing algorithms at a reasonable cost.
For optimum algorithm execution, the basic rule of thumb is hardware
for performance and software for flexibility. In practice, this rule is
difficult to apply. Hardware choices are limited to the basic
arithmetic functions of the MCU core, the multiply/accumulate and
linear function processing of a DSP core, or the wider flexibility of
an FPGA with its downside of physical size, power consumption and unit
cost in volume.
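To make the multiply/accumulate workload concrete, here is a minimal sketch in C of a fixed-point FIR filter kernel, the kind of inner loop that is a typical candidate for DSP-like hardware. The function name `fir_q15`, the Q15 number format and the saturation behavior are assumptions made for illustration; they do not come from any Atmel library.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative Q15 FIR kernel: computes one output sample from n_taps
 * coefficients and the n_taps most recent input samples.
 * Name and number format are assumptions made for this sketch. */
int16_t fir_q15(const int16_t *coeffs, const int16_t *samples, size_t n_taps)
{
    int64_t acc = 0;  /* wide accumulator so partial sums cannot overflow */

    for (size_t i = 0; i < n_taps; i++) {
        /* One multiply/accumulate (MAC) operation per tap. */
        acc += (int32_t)coeffs[i] * (int32_t)samples[i];
    }

    acc /= 32768;  /* scale the Q30 sum back to Q15, truncating toward zero */

    /* Saturate to the int16_t range instead of wrapping around. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

On an MCU core each tap costs at least one multiply and one add in sequence, whereas a DSP core or dedicated logic can typically issue the MAC in a single cycle or run several taps in parallel, which is why loops of this shape dominate the HW/SW trade-off.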
The alternative of a standard-cell ASIC can give a higher level of
performance, but at a development time and cost that is often
prohibitive. Software is ported onto the MCU or MCU/DSP combination
that has been selected for the hardware implementation.
Once the hardware/software (HW/SW) partitioning has been made,
altering it is extremely difficult and time-consuming, unless the
application will go into volume based on an FPGA. Often, it is only in
the final stages of application development that the software can be
run on the target hardware and that it can be determined whether the
implementation of the processing algorithm is optimal.
Implementation flow
Atmel's CAP is an MCU-based SoC that provides the basic processing
capacity and a high-density block of metal-programmable (MP) digital
logic that can be personalized to provide DSP-like or other dedicated
function execution hardware.
It provides a reasonable development cycle time and cost. The
development flow for an application-specific CAP includes an emulation
step based on a development board that uses a high-density FPGA to
emulate the algorithm execution functionality that will subsequently be
hardened into the MP block.
CAP enables an application developer to get the best of the FPGA and
ASIC worlds. The first phase of the CAP application development cycle
uses FPGA-based libraries and tools to make an initial HW/SW partition
of the algorithm and then map the hardware-based functions onto
DSP-like structures or other processing elements implemented in the
FPGA.
In parallel, the software-based algorithm processing is compiled for
execution by the MCU that sees the FPGA/MP block in its address space,
with a distributed DMA architecture to optimize data flows between the
functional and memory blocks.
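How software on the MCU might drive a block that appears in its address space can be sketched as follows. The register layout `accel_regs_t`, the bit definitions and the function names are hypothetical, invented for this illustration; a real CAP design would define its own register map and base address. The register block is passed in as a pointer so that the same code can be exercised against an ordinary structure in RAM.

```c
#include <stdint.h>

/* Hypothetical register map for a custom processing block in the
 * FPGA/MP logic. Field names and bit positions are assumptions. */
typedef struct {
    volatile uint32_t ctrl;      /* bit 0: start processing */
    volatile uint32_t status;    /* bit 0: processing done */
    volatile uint32_t src_addr;  /* source buffer address for DMA */
    volatile uint32_t dst_addr;  /* destination buffer address for DMA */
    volatile uint32_t length;    /* transfer length in bytes */
} accel_regs_t;

#define ACCEL_CTRL_START  (1u << 0)
#define ACCEL_STATUS_DONE (1u << 0)

/* Program the transfer parameters, then set the start bit. */
void accel_start(accel_regs_t *regs, uint32_t src, uint32_t dst, uint32_t len)
{
    regs->src_addr = src;
    regs->dst_addr = dst;
    regs->length = len;
    /* Written last so the block sees valid parameters when it starts. */
    regs->ctrl = ACCEL_CTRL_START;
}

/* Return nonzero once the block reports completion. */
int accel_is_done(const accel_regs_t *regs)
{
    return (regs->status & ACCEL_STATUS_DONE) != 0;
}
```

On target hardware the pointer would be obtained by casting the base address that the design's memory map assigns to the block. Because the driver only sees a register map, the same code could in principle run unchanged whether the block is emulated in the FPGA or hardened in the MP logic, provided the map is kept identical.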
Figure 1: HW/SW partitioning involves implementing an algorithm using a
library of IP blocks containing hardware modules and their associated
software drivers.
Figure 1 above shows the
overall steps of the HW/SW partitioning and implementation of an
algorithm using a library of IP blocks that contains both hardware
modules and their associated software drivers.
On the hardware side, the algorithm modules are first synthesized
using tools available from the IP library or FPGA supplier. These are
then synthesized with the required DSP or similar function processing
blocks from a library provided by the FPGA supplier. The final step is
to map these high-level constructs onto the basic FPGA structure to
configure the FPGA in the CAP development board.
On the software side, the IP blocks required for the algorithm are
compiled, and then linked with Atmel's library of low-level device
drivers that handles the detailed operation of the multiple peripherals
and external interfaces of the CAP SoC. If required, this code is then
linked to the OS, user interface and top-level control modules for the
operation of the entire system. The complete code set is loaded into
the program memory for the MCU core, which is the central architectural
element of the CAP.
The basic architecture of the CAP development board is shown in
The configured development board emulates the operation of the final
CAP device at close to operational speed, including aspects such as
multitasking, inter-process communication and interrupts that are
almost impossible to simulate.
This emulation step enables the algorithm implementation to be
thoroughly debugged under realistic conditions of use. It also enables
metrics to be applied to determine whether the initial HW/SW
partitioning and the subsequent synthesis/compilation of the various
modules are optimal. If improvements are required, these can be
implemented using the same design flow, as described previously, at no
additional cost other than that of extended development time.
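The article does not say which metrics are applied. One simple candidate is the overall speedup obtained when part of the processing moves from software to hardware, computed from cycle counts measured on the development board. The sketch below is illustrative only: the function name and its inputs are assumptions, and it treats the hardware and software portions as running one after the other, with no overlap.

```c
/* Estimated overall speedup when part of an algorithm moves to hardware.
 *
 * sw_cycles_total: cycles for the all-software implementation
 * sw_cycles_moved: cycles of the portion being moved to hardware
 * hw_cycles_moved: cycles the same portion takes once in hardware
 *
 * Returns 0.0 for inconsistent input. Assumes no overlap between the
 * hardware and software portions (a simplification for this sketch). */
double partition_speedup(double sw_cycles_total,
                         double sw_cycles_moved,
                         double hw_cycles_moved)
{
    if (sw_cycles_total <= 0.0 || sw_cycles_moved < 0.0 ||
        hw_cycles_moved < 0.0 || sw_cycles_moved > sw_cycles_total) {
        return 0.0;
    }

    /* Cycles left in software plus cycles now spent in hardware. */
    double new_total = (sw_cycles_total - sw_cycles_moved) + hw_cycles_moved;
    if (new_total <= 0.0) {
        return 0.0;
    }
    return sw_cycles_total / new_total;
}
```

For example, if 800 of 1,000 software cycles move to a hardware block that needs 100 cycles, the new total is 300 cycles, a speedup of about 3.3. Comparing this figure across candidate partitions gives one rough basis for deciding whether another iteration is worthwhile.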
Multiple iterations of the HW/SW partitioning and implementation of
the HW/SW modules are possible in order to achieve an optimal
implementation.
Prototypes are rapidly fabricated from blanks that have been staged
in the fab prior to the metal layers. They enable the application
developer to do a final verification of the device's HW/SW
functionality and, in particular, to check that the processing of the
algorithm is optimal.
In the worst case, if the prototypes are not satisfactory, the
additional cost and time of a re-spin starting from the emulation phase
are reasonable and much lower than those of a full mask replacement for
a standard-cell ASIC.
When the prototypes have been approved, volume fabrication of the
personalized CAP devices commences, using the same flow as for the
prototypes. Based on field feedback and in response to any upgrades of
the data processing algorithm, subsequent incremental versions of the
CAP-based device can be developed more rapidly and at lower cost than
the initial version, basing the modifications on the final FPGA
configuration of the development board before metal programming.