Trace port on PowerPC 405 cores
01 June 2007
The use of one or more embedded processor cores in FPGA-based digital design is common. Cost and integration are both reasons to move away from a separate microprocessor chip.

This is especially true given new tools that enable easy debug of systems with certain embedded processors. For example, the Xilinx Virtex II Pro and Virtex 4 families of FPGAs offer embedded IBM PowerPC 405 processors with a built-in trace port. When used in conjunction with an IBM 405 Trace Port inverse assembler on a logic analyser, this port allows the user to minimise the number of pins required on the FPGA while still providing visibility of program flow on the embedded microprocessor. Without the Trace Port, observing the same program flow would mean routing out some 50 signals.
A typical system
A simple digital system might utilise a Xilinx Virtex II Pro FPGA, external SDRAM, and custom I/O for external system control. The FPGA design might include a state machine to drive memory controller operations associated with the capture of external data, and the IBM 405 processor could be involved in driving some DSP operations. A common scenario would be that the external data is being corrupted once it reaches the memory but it is not obvious why.
The Trace Port captures raw bits in real time that contain the minimum information necessary to reconstruct the flow of the code. This information is then used in a non-real-time process of program flow reconstruction. Using the Trace Port information in combination with the target executable (.elf file) and a pointer to the source files, the reconstructed flow of the code can be displayed in both assembly and source mode. The number of cycles taken to execute an instruction can also be determined.
Counting control and address lines alone, an IBM 405 processor involves upwards of 50 signals, which is more than a designer would usually consider routing out to FPGA pins in order to see them. Through this reconstruction process, the Trace Port can work with only eight signals. A 32-bit address, for example, is sent as four 8-bit bytes. Information is clocked in on both the rising and falling edges of the clock and is also encoded, so it takes work to capture this Trace Port data and make sense of it.
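As a rough illustration of how a 32-bit address can be rebuilt from four 8-bit transfers, the sketch below joins four byte values into one address. It is not the actual 405 trace encoding: the function name `assemble_address` is hypothetical, and the most-significant-byte-first ordering is an assumption made only for this example.

```python
def assemble_address(byte_values):
    """Join four 8-bit values into one 32-bit address.

    Assumption for illustration only: bytes arrive most significant first.
    The real Trace Port encoding is not described in the article.
    """
    if len(byte_values) != 4:
        raise ValueError("expected exactly four bytes")
    address = 0
    for value in byte_values:
        if not 0 <= value <= 0xFF:
            raise ValueError("each value must fit in 8 bits")
        # Shift what has been collected so far and append the new byte.
        address = (address << 8) | value
    return address


if __name__ == "__main__":
    # Four 8-bit captures that together form the address 0x1000.
    print(hex(assemble_address([0x00, 0x00, 0x10, 0x00])))
```

A real decoder would also have to undo the encoding and pair up the rising-edge and falling-edge captures before a step like this could be applied.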
Interpreting the data
Interpreting trace port data requires a smart trace port decoder and inverse assembler connected to a logic analyser. There are also advantages in laying out a target board with particular connections for the logic analyser to simplify setup and to take advantage of pre-configured channel setups. There are several options for how to access JTAG (for processor run control) and logic analyser trace signals with some trade-offs to consider.
When tracing 405 program execution with the Trace Port and Trace Port decoder, the process begins with an initial condition in which the Trace Port sends out the 32-bit memory address at which execution starts. The Trace Port is only 8 bits wide, so it takes multiple clock cycles to get this full address out to the logic analyser.
The Trace Port decoder determines that the microprocessor is executing a program instruction at an address. With the 32-bit address, the decoder looks into the target executable file (the .elf file) and sees the program instruction that should be at that address. Because the data bus is not probed, in order to conserve pins, the contents of memory at that address cannot be read directly; the .elf file must be used to determine what is at that address and therefore what was executed. Once the type of program instruction that was executed is known, the inverse assembler can write that assembly instruction out for display on the logic analyser.
That process continues, and the processor might execute an instruction repeatedly and then execute a branch instruction. If there is no branch instruction, the decoder looks at the sequential locations in memory to see what was executed. If there is a direct branch instruction, the Trace Port outputs signals that say there was a direct branch. Because the decoder can find the destination of that branch by looking in the .elf file, it can report that a branch was executed and that the next address is, say, 0x1000, and then determine the next executed instruction.
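The reconstruction described above can be sketched in a few lines. This is a toy model, not the real decoder: the `PROGRAM` table stands in for the contents of the .elf file, and the event names `EXEC` and `BRANCH` are hypothetical labels for "an instruction executed without a taken branch" and "a direct branch was taken". The only fact relied on is that PowerPC instructions are 4 bytes long.

```python
# Hypothetical stand-in for the .elf file:
# address -> (mnemonic, direct branch target or None)
PROGRAM = {
    0x0100: ("addi r3,r3,1", None),
    0x0104: ("cmpwi r3,10", None),
    0x0108: ("b 0x1000", 0x1000),
    0x1000: ("stw r3,0(r4)", None),
}

INSTRUCTION_SIZE = 4  # PowerPC instructions are 32 bits wide


def reconstruct(start_address, events):
    """Rebuild the executed instruction list from a start address and events."""
    pc = start_address
    listing = []
    for event in events:
        mnemonic, target = PROGRAM[pc]  # look up what "should be" at this address
        listing.append((pc, mnemonic))
        if event == "BRANCH":
            if target is None:
                raise ValueError(f"no direct branch at {pc:#x}")
            pc = target  # destination comes from the program image, not the port
        elif event == "EXEC":
            pc += INSTRUCTION_SIZE  # fall through to the next sequential location
        else:
            raise ValueError(f"unknown event {event!r}")
    return listing


if __name__ == "__main__":
    for address, text in reconstruct(0x0100, ["EXEC", "EXEC", "BRANCH", "EXEC"]):
        print(f"{address:#06x}  {text}")
```

Note that the branch destination never has to cross the port in this model, which is the reason so few pins are needed for direct branches.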
In the pipeline
If there is an indirect branch through a register, the Trace Port must output some portion of that branch address and the decoder works with that address. The Trace Port operates directly from the processor execution unit, so it is not affected by caching or pipelining.
Two separate streams of data come from the 8 bits of the Trace Port. The first is real-time execution status, which shows, for example, that the processor executed an instruction repeatedly and then executed a branch instruction. The other stream is the address information, such as an indirect branch address output on the Trace Port so that the decoder knows where the branch went.
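To make the two-stream idea concrete, the sketch below keeps execution status and branch addresses in separate sequences and consumes an address only when the status stream reports an indirect branch. The event names, the `PROGRAM` table and the function `follow` are hypothetical, and the sketch assumes a complete 32-bit target is available for each indirect branch, whereas the real port may output only a portion of it.

```python
from collections import deque

# Hypothetical stand-in for the .elf file: address -> mnemonic
PROGRAM = {
    0x2000: "mtctr r5",
    0x2004: "bctr",          # indirect branch through the count register
    0x3000: "lwz r6,0(r7)",
}

INSTRUCTION_SIZE = 4  # PowerPC instructions are 32 bits wide


def follow(start_address, status_events, branch_addresses):
    """Walk the status stream, pulling targets from the address stream as needed."""
    addresses = deque(branch_addresses)
    pc = start_address
    listing = []
    for event in status_events:
        listing.append((pc, PROGRAM[pc]))
        if event == "INDIRECT":
            if not addresses:
                raise ValueError("status stream expects an address that was not captured")
            pc = addresses.popleft()  # target is only known from the address stream
        elif event == "EXEC":
            pc += INSTRUCTION_SIZE
        else:
            raise ValueError(f"unknown event {event!r}")
    return listing


if __name__ == "__main__":
    for address, text in follow(0x2000, ["EXEC", "INDIRECT", "EXEC"], [0x3000]):
        print(f"{address:#06x}  {text}")
```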
Physical probing
A logic analyser can be connected to a Trace Port with flying leads, a Mictor connector, SoftTouch connector-less probing, and so on. Configuration files simplify setup of the logic analyser, so the best approach is to bring the Trace Port signals out in the way specified in the documentation for the trace tools. For example, to use a Mictor connector, the signals would be routed to it.
It is usual to use a debugger and run/control (requiring connection to the JTAG chain) while making logic analyser (Trace Port) measurements. There are three different options for how to bring both JTAG and Trace Port signals off the board. The first two require two connectors (one for JTAG and the other for the Trace Port signals) while the third brings both JTAG and Trace Port signals out through a single connector, which requires a splitter board.
Time correlation
It is often helpful during debug to get a system view of what is going on, which requires probing other signals, both internal to the FPGAs and on their periphery or in other parts of the system. A number of options exist here. Trace cores can be placed inside FPGAs to provide access to other internal FPGA signals in conjunction with the Trace Port measurements.
The recommended approach is to use the Xilinx Embedded Development Kit (EDK) from Xilinx Platform Studio to insert the microprocessor core, then invoke Core Inserter from Xilinx ISE and ChipScope Pro to insert logic analyser measurement cores. If time-correlated measurements are desired on signals external to the FPGA, such as the interface to external memory, it is important to turn off the cache. Good correlation can then be expected between the 405 trace and those external measurements.
The logic analyser can be split into two analysers, each tied to a clock domain or able to run from an internal logic analyser clock in timing mode, providing greater flexibility to handle these signals from different parts of the system. The 16800 series portable logic analyser was used in the 405 trace example, and has capabilities for system level measurements as well.
Validation
Debug and validation of FPGA-based systems that include an embedded IBM PowerPC 405 processor are greatly aided by the ability to trace program execution via the Trace Port and a logic analyser. By planning ahead and routing a few signals from the FPGA in a prescribed way, the process is greatly simplified and automated.
Since the Trace Port probes behind the cache, there is good visibility despite any caching and pipelining that might be taking place. Full correlation is possible between a source view and disassembly view, even with the cache turned on. It is also possible to get time-correlated measurements between the 405 trace, other internal FPGA signals, and signals external to the FPGA, but only once the cache is turned off.
BRAD FRIEDEN is applications development engineer, Digital Verification Solutions, Agilent