
DSP performance tuning: Part 2 - Buses, DMA, and interrupts

Here's how to tune internal and external memory for maximum performance. We show how to manage external bus access, DMA transfers, and interrupts, using Blackfin as a case study.


Courtesy of DSP DesignLine

Part 1 shows how to build a software framework, choose between cache and DMA, and partition instructions and data for optimal performance. Part 3 explains how to optimize code placement and data buffering.

To put the discussion from Part 1 into perspective, we'll now describe the Blackfin memory architecture. The Blackfin memory system also provides some "knobs" that the developer can turn for added system performance. We'll discuss how best to use these knobs.

Blackfin memory hierarchy
The Blackfin features a 3-level memory hierarchy, as shown in Figure 10. Level 1 (L1) memory is the memory located closest to the core. It operates at the core clock frequency (typically 600 MHz for Blackfin), and provides single-cycle access for instructions and data. Typically, L1 memory contains tens of kilobytes and is configurable as either SRAM or cache.



Figure 10. Blackfin Memory Hierarchy.

On-chip Level 2 (L2) memory is located on the chip but further away from the core. Accessing instructions and data from this memory can take several cycles. On-chip L2 sizes are generally in the hundreds of kilobytes—sizes of 128 KB and 256 KB are typical.

Off-chip L2 memory provides the slowest access. It operates at the system clock frequency, which is usually 133 MHz. But off-chip L2 memory is usually huge: sizes of hundreds of megabytes are typical.

Internal memory accesses in the Blackfin
In a single core clock cycle, the Blackfin allows one instruction fetch of 64 bits. It can also perform two 32-bit data fetches, or one 32-bit data fetch and one 32-bit data store, in the same clock cycle.

The internal instruction and data memory of the Blackfin are divided into sub-banks. This allows the core and the DMA to access multiple pieces of instructions or data in the same clock cycle. Figure 11 shows how this works. The left side shows an un-optimized arrangement of data. Here, the buffers and coefficients are arranged such that the core and DMA require access to the same memory sub-bank at the same time, resulting in memory stalls. On the right side, the buffers and coefficients are arranged to allow core fetches and DMA accesses to occur in parallel.


Figure 11. Sub-bank memory scheme in the Blackfin. The core and DMA can operate in parallel on different sub-banks.

Blackfin processors typically feature 16 KB banks of instruction and data memory, configurable as either SRAM or cache. As illustrated in Figure 12, each bank is made up of 4 KB sub-banks. The core (or DMA) can fetch instructions or data from different sub-banks without incurring a memory stall. On Blackfin, the core can also access an odd and an even memory location within a sub-bank without incurring a stall.


Figure 12. Sub-banks in Blackfin data memory.

One way to optimize instruction fetches is to configure L1 instruction memory as SRAM. This allows the core single-cycle access to the instructions. The instruction fetch unit can be made to fetch instructions from one sub-bank, while a DMA controller brings instructions into another sub-bank. (This is the "memory overlay" discussed earlier.)

Similarly, to optimize data fetches, you could have a DMA controller operate on one sub-bank, while the core accesses data in other sub-banks.
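
As a rough illustration of the sub-bank rule, the sketch below checks whether two buffers placed in the same 16 KB bank touch a common 4 KB sub-bank. The bank and sub-bank sizes come from the text; the helper names (subbank_of, buffers_share_subbank) and the offset-based interface are hypothetical and are not part of any Blackfin toolchain. The check is also deliberately conservative: it ignores the odd/even access case mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the text: 16 KB banks, each made up of 4 KB sub-banks. */
#define BANK_SIZE    (16u * 1024u)
#define SUBBANK_SIZE (4u * 1024u)

/* Hypothetical helper: sub-bank index (0..3) for a byte offset measured
 * from the start of a 16 KB bank. */
static unsigned subbank_of(uint32_t offset)
{
    assert(offset < BANK_SIZE);
    return offset / SUBBANK_SIZE;
}

/* Hypothetical helper: true if two buffers in the same bank touch at least
 * one common sub-bank, so a core access to one could collide with a DMA
 * access to the other. Offsets and lengths are in bytes; lengths are
 * non-zero and each buffer must fit inside the bank. */
static bool buffers_share_subbank(uint32_t off_a, uint32_t len_a,
                                  uint32_t off_b, uint32_t len_b)
{
    assert(len_a > 0 && off_a < BANK_SIZE && len_a <= BANK_SIZE - off_a);
    assert(len_b > 0 && off_b < BANK_SIZE && len_b <= BANK_SIZE - off_b);

    unsigned first_a = subbank_of(off_a);
    unsigned last_a  = subbank_of(off_a + len_a - 1);
    unsigned first_b = subbank_of(off_b);
    unsigned last_b  = subbank_of(off_b + len_b - 1);

    /* Two closed index ranges overlap if each starts before the other ends. */
    return first_a <= last_b && first_b <= last_a;
}
```

For example, a 2 KB coefficient table at offset 0 and a 1 KB DMA buffer at offset 2048 both land in sub-bank 0, so the check reports a potential conflict. Moving the DMA buffer to offset 4096 puts it in sub-bank 1 and clears the conflict.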

External memory accesses in the Blackfin
The "cheap, slow" external memory we referred to earlier is usually SDRAM. Just as Blackfin's internal memory is divided into sub-banks, SDRAM is divided into banks. SDRAM typically contains four internal banks. Each bank is further divided into rows. Before a row is accessed it must first be opened—a process that consumes multiple system clock cycles. (Remember, the system clock operates at a much lower frequency than the core clock.) Once a row is open, further accesses to the row proceed without delay. However, only one row per bank may be open at a time.

Blackfin has a feature that enables the external bus interface unit to keep track of up to four open rows, one in each of the four SDRAM banks. You can use this feature to optimize accesses to external memory, as shown in Figure 13. The left side shows an un-optimized arrangement of data and code. Here, the video frames and code are packed into two memory banks. With this arrangement, the SDRAM has to open new rows with almost every access. For example, suppose the DMA is transferring data out to Video Frame 1 while reading in from the Ref Frame. Because both buffers are in the same bank, the SDRAM will constantly open new rows as it switches back and forth between the two buffers. On the right side, the buffers and code are arranged in separate banks so that the SDRAM only has to open a new row when it reaches the end of an "old" row.


Figure 13. Amortizing row activation cycles across multiple accesses.
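
To make the cost of a poor layout concrete, here is a minimal simulation of the "one open row per bank" rule. It is a sketch under stated assumptions, not a model of the actual Blackfin bus interface: the geometry (four contiguous 16 MB banks, 1 KB rows), the address decoding, and all names are hypothetical. The access pattern, which alternates between source and destination on every 16-bit word, is a deliberate worst case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry, for illustration only. */
#define NUM_BANKS  4u
#define BANK_BYTES (16u * 1024u * 1024u)  /* each bank is a contiguous 16 MB */
#define ROW_BYTES  1024u                  /* bytes per SDRAM row */

typedef struct {
    bool     open[NUM_BANKS];  /* does this bank have an open row? */
    uint32_t row[NUM_BANKS];   /* which row is open in each bank */
    unsigned activations;      /* number of row-open operations so far */
} sdram_model;

/* Record one access; a row must be opened if the bank has no open row
 * or a different row is currently open. */
static void sdram_access(sdram_model *m, uint32_t addr)
{
    unsigned bank = (addr / BANK_BYTES) % NUM_BANKS;
    uint32_t row  = (addr % BANK_BYTES) / ROW_BYTES;

    if (!m->open[bank] || m->row[bank] != row) {
        m->open[bank] = true;
        m->row[bank]  = row;
        m->activations++;
    }
}

/* Copy `bytes` from src to dst one 16-bit word at a time, alternating
 * read and write, and return how many row activations were needed. */
static unsigned count_activations(uint32_t src, uint32_t dst, uint32_t bytes)
{
    sdram_model m = {0};

    assert(bytes % 2 == 0);
    for (uint32_t i = 0; i < bytes; i += 2) {
        sdram_access(&m, src + i);  /* read from the source buffer */
        sdram_access(&m, dst + i);  /* write to the destination buffer */
    }
    return m.activations;
}
```

Under these assumptions, copying 4 KB between two buffers in the same bank (for instance, addresses 0 and 0x100000) opens a row on every one of the 4096 accesses. Placing the destination in the next bank (address BANK_BYTES) cuts that to 8 activations, one per row crossed in each bank.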

The bottom line is this: To get the best performance, you must lay out your memory carefully, and you must use DMA whenever possible. This is especially true of external memory accesses for Blackfin. Accessing a 16-bit SDRAM value from the Blackfin core consumes eight system clock cycles. In contrast, DMA allows a 16-bit value to be read or written every system clock cycle. (For a comprehensive listing of Blackfin memory access benchmarks, refer to the Blackfin hardware reference manuals.)
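
The arithmetic behind that comparison can be sketched as follows. The cycle counts (eight system clocks per 16-bit core access, one per DMA access) are from the text; the function names are hypothetical, and the estimate ignores row activation, refresh, and bus arbitration, so treat the results as a rough bound rather than a benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle counts quoted in the text, per 16-bit SDRAM access. */
#define CORE_SCLKS_PER_WORD 8u
#define DMA_SCLKS_PER_WORD  1u

/* System clock cycles needed to move `bytes` of 16-bit data. */
static uint64_t transfer_sclks(uint64_t bytes, unsigned sclks_per_word)
{
    assert(bytes % 2 == 0);  /* whole 16-bit words only */
    return (bytes / 2) * sclks_per_word;
}

/* Convert a cycle count to milliseconds at a given system clock rate. */
static double sclks_to_ms(uint64_t sclks, double sclk_hz)
{
    assert(sclk_hz > 0.0);
    return 1000.0 * (double)sclks / sclk_hz;
}
```

As a usage example, take a hypothetical 720 x 480 frame of 16-bit pixels (691,200 bytes) and the 133 MHz system clock mentioned earlier. Calling transfer_sclks(691200, CORE_SCLKS_PER_WORD) gives 2,764,800 cycles, or about 20.8 ms, while the DMA figure is 345,600 cycles, or about 2.6 ms.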









