asickrishna

Tuesday, May 22, 2012

FPGA Design tips

1. Reduce clock skew
2. Clock dividers
3. Avoid glitches on clocks and asynchronous set/reset signals
4. The Global Set/Reset network
5. Select a state machine encoding scheme
6. Access carry logic
7. Build efficient counters

Why Synchronous Design?

1. Synchronous circuits are more reliable
2. Events are triggered by clock edges, which occur at well-defined
intervals
3. Outputs from one logic stage have a full clock cycle to
propagate to the next stage
4. Skew between data arrival times is tolerated within the same
clock period
5. Asynchronous circuits are less reliable
6. A delay may need to be a specific amount (e.g. 12ns)
7. Multiple delays may need to hold a specific relationship (e.g.
DATA arrives 5ns before SELECT)

Monday, January 10, 2011

What's the difference between retry and split in AHB?

The SPLIT and RETRY responses provide a mechanism for slaves to release the bus when they are unable to supply data for a transfer immediately. Both mechanisms allow the transfer to finish on the bus and therefore allow a higher-priority master to get access to the bus.

When a master initiates a transaction on the AMBA bus and the target slave detects that the transfer will take a large number of cycles to perform, the slave can issue a SPLIT response. The arbiter can then grant the bus to other masters before the SPLIT transaction is complete. The master that received the SPLIT response has to wait until it is granted the bus again and then re-attempt the transfer.

During the address phase of a transfer the arbiter generates a tag, or bus master number, on HMASTER[3:0] which identifies the master that is performing the transfer. Any slave issuing a SPLIT response must be capable of indicating that it can complete the transfer, and it does this by making a note of the master number on the HMASTER[3:0] signals.

Later, when the slave can complete the transfer, it asserts the appropriate bit, according to the master number, on the HSPLITx[15:0] signals from the slave to the arbiter. The arbiter then uses this information to unmask the request signal from the master and in due course the master will be granted access to the bus to retry the transfer. The arbiter samples the HSPLITx bus every cycle and therefore the slave only needs to assert the appropriate bit for a single cycle in order for the arbiter to recognize it.

The basic stages of a SPLIT transaction are:
1. The master starts the transfer in an identical way to any other transfer and
issues address and control information
2. If the slave is able to provide data immediately it may do so. If the slave
decides that it may take a number of cycles to obtain the data it gives a SPLIT
transfer response. During every transfer the arbiter broadcasts a number, or
tag, showing which master is using the bus. The slave must record this number,
to use it to restart the transfer at a later time.
3. The arbiter grants other masters use of the bus and the action of the SPLIT
response allows bus master handover to occur. If all other masters have also
received a SPLIT response then the default master is granted.
4. When the slave is ready to complete the transfer it asserts the appropriate bit of
the HSPLITx bus to the arbiter to indicate which master should be regranted
access to the bus.
5. The arbiter observes the HSPLITx signals on every cycle, and when any bit of
HSPLITx is asserted the arbiter restores the priority of the appropriate master.
6. Eventually the arbiter will grant the master so it can re-attempt the transfer. This
may not occur immediately if a higher priority master is using the bus.
7. When the transfer eventually takes place the slave finishes with an OKAY
transfer response.

For a SPLIT transfer the arbiter will adjust the priority scheme so that any other
master requesting the bus will get access, even if it is a lower priority. In order
for a SPLIT transfer to complete the arbiter must be informed when the slave has
the data available.

For RETRY the arbiter will continue to use the normal priority scheme and
therefore only masters having a higher priority will gain access to the bus.
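
The difference between the two responses can be sketched with a toy arbiter model. This is only an illustration, not an implementation of any real arbiter: the class and method names are hypothetical, a fixed-priority scheme is assumed in which a lower master number means a higher priority, and master 0 is assumed to be the dummy master.

```python
class ToyArbiter:
    """Toy fixed-priority arbiter (assumption: lower number = higher priority)."""

    DUMMY_MASTER = 0  # assumed position of the dummy master (IDLE transfers only)

    def __init__(self, num_masters=16):
        self.num_masters = num_masters
        self.split_mask = 0  # bit n set => requests from master n are masked

    def split_response(self, hmaster):
        # SPLIT: mask this master's request until the slave signals HSPLITx.
        self.split_mask |= 1 << hmaster

    def retry_response(self, hmaster):
        # RETRY: nothing is masked, the normal priority scheme keeps applying.
        pass

    def sample_hsplit(self, hsplit):
        # Sampled every cycle, so a single-cycle pulse unmasks the master.
        self.split_mask &= ~hsplit

    def grant(self, requests):
        # requests: bit n set => master n is requesting the bus.
        eligible = requests & ~self.split_mask
        for n in range(1, self.num_masters):
            if eligible & (1 << n):
                return n
        return self.DUMMY_MASTER


arb = ToyArbiter()
requests = 0b0110                  # masters 1 and 2 both request the bus
print(arb.grant(requests))         # master 1 wins on priority
arb.retry_response(1)
print(arb.grant(requests))         # after RETRY, master 1 still wins
arb.split_response(1)
print(arb.grant(requests))         # after SPLIT, lower-priority master 2 gets in
arb.sample_hsplit(1 << 1)
print(arb.grant(requests))         # slave is ready, master 1 is unmasked again
```

In this sketch a RETRY leaves the grant decision unchanged, so only a higher-priority master could take the bus, while a SPLIT lets any other requesting master in until the matching HSPLITx bit is seen.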

AHB LOCKED Transfer

LOCK tells the arbiter to keep the current master granted, while SPLIT tells the arbiter to grant another master. The only possible action the arbiter can take for these contradictory requests is to grant the dummy master, which must exist in any system with SPLIT-capable slaves.

The dummy master will only perform IDLE transfers (i.e. no data transfers), so cannot corrupt the LOCKed sequence that is ongoing.

When the original slave is able to complete the SPLIT transfer, it will signal this to the arbiter on HSPLIT and the arbiter can then re-grant the original master, and the LOCKed sequence can then continue.

However as the slave has an HMASTLOCK input telling it that the current transfer is part of a LOCKed sequence, it should know that there is no system advantage in returning a SPLIT.

So yes, a slave can return a SPLIT response to a LOCKed transfer, and the arbiter must then grant the dummy master, but the slave should use the HMASTLOCK input to see that a SPLIT response is not useful at this time.

AMBA AHB FAQ

AMBA AHB - Arbitration
----------------------

1. When should a master assert and deassert the HLOCK signal for a locked
transfer?

The HLOCK signal must be asserted at least one cycle before the start of
the address phase of a locked transfer. This is required so that the
arbiter can sample the HLOCK signal as high at the start of the address
phase.

The master should deassert the HLOCK signal when the address phase of the
last transfer in the locked sequence has started.

2. Can an arbiter be designed to always allow bursts to complete?

A SPLIT, RETRY or ERROR response from a slave can always cause a burst to
be terminated early. This is outside the control of the Arbiter and so must
be supported.

Undefined length INCR bursts cannot have their end point predicted, so
there is no efficient way that an Arbiter design can allow the burst to
complete before granting another master. INCR bursts must be arbitrated on
a cycle by cycle basis.

Defined length INCRx and WRAPx bursts can have their beats counted, and so
allowed to complete by the Arbiter. However because of the AHB arbitration
synchronous timing, there is no way to avoid possibly terminating a burst
immediately after the first transfer of the burst has been indicated.

The Arbiter only knows that a defined length burst is in progress by
sampling the HBURST bus. However the first point at which HBURST can be
sampled is after the first clock cycle of the first burst beat, by which
time the Arbiter may already have decided to grant another master and will
have changed the HGRANT outputs accordingly. Only a combinatorial path from
HBURST to HGRANT would allow the burst to be detected in time to avoid
early termination in this scenario, but combinatorial paths in the AHB bus
are not allowed.

3. Why is HADDR sometimes shown as an input to the arbiter?

The address bus, HADDR, is not required as an input to the arbiter but in
some system designs it may be useful to use the address bus to determine a
good point to change over between bus masters. For example, the arbiter
could be designed to change bus ownership when a burst of transfers reaches
a quad word boundary.

4. When can the HGRANT signal change?

The HGRANT signal can change in any cycle and the following cases are
possible:

* It is possible that the HGRANT signal may be asserted and then removed
before the current transfer completes. This is acceptable because the
HGRANT signal is only sampled by masters when HREADY is high.

* A master can be granted the bus without requesting it.

* The above point also means that a master can be granted the bus in the
same cycle that it requests it, if the arbiter happens to grant that
master in that cycle.

5. What is the relationship between the HLOCK signal and the HMASTLOCK
signal?

At the start of the address phase of every transfer the arbiter will sample
the HLOCK signal of the master that is about to start driving the address
bus and if HLOCK is asserted at this point then HMASTLOCK will be asserted
by the arbiter for the duration of the address phase of the transfer.

6. When should a master deassert its HBUSREQ signal?

For an undefined length burst (INCR) a master must keep its HBUSREQ signal
asserted until it has started the address phase of the last transfer in the
burst. This will mean that if the penultimate transfer in the burst is zero
wait state then the master may be granted the bus for an additional
transfer at the end of an undefined length burst.

For a defined length burst the master can deassert the HBUSREQ signal once
the master has been granted the bus for the first transfer. This can be
done because the arbiter is able to count the transfers in the burst and
keep the master granted until the burst completes.

However it is not a mandatory requirement for an Arbiter to allow a burst
to complete, so the master will have to re-assert HBUSREQ if the Arbiter
removes HGRANT before the burst has been completed.

7. When will the arbiter grant another master after a locked transfer?

The arbiter will always grant the master an extra transfer at the end of a
locked sequence, so the master is guaranteed to perform one transfer with
the HMASTLOCK signal low at the end of the locked sequence. This coincides
with the data phase of the last transfer in the locked sequence.

During this time the arbiter can change the HGRANT signals to a new bus
master, but if the data phase of the last locked transfer receives either a
SPLIT or RETRY response then the arbiter will drive the HGRANT signals to
ensure that either the master performing the locked sequence remains
granted on the bus for a RETRY response, or the Dummy master is granted the
bus for the SPLIT response.

8. Can a master deassert HLOCK during a burst?

The AHB specification requires that all address phase timed control signals
(other than HADDR and HTRANS) remain constant for the duration of a burst.

Although HLOCK is not an address phase timed signal, it does directly
control the HMASTLOCK signal which is address phase timed.

Therefore HLOCK must remain high for the duration of a burst, and can only
be deasserted such that the following HMASTLOCK signal changes after the
final address phase of the burst.

9. If a master is currently granted the bus by default, how many cycles before
starting a non-IDLE transfer does it have to assert HBUSREQ?

None. It can start a non-IDLE transfer immediately.

10. Can a master perform transfers other than IDLE when the bus was granted to
it, but not requested by the master?

Yes. A master can perform transfers other than IDLE when it had not
requested the bus. Please note that in this case it is still recommended
that the master asserts its request signal so that the arbiter does not
change ownership of the bus to a lower priority master while the transfers
are in progress.

_______________________________________________________________________________

AMBA 2 AHB - General
--------------------
1. The specification recommends that only 16 wait states are used. What should
you do if more than 16 cycles are needed?

For some slaves it is acceptable to insert more than 16 wait states. For
example, a serial boot ROM which is only ever accessed at initial power up
could insert a larger number of wait states and it would not affect the
calculation of the system performance and latency once system power up has
been completed.

For other slaves a number of options exist. A SPLIT or RETRY response could
be used to indicate that the slave is not yet able to perform the requested
data transfer, or the slave could be accessed either in response to
interrupts or after polling a status register, in either case indicating
that the slave is now able to respond in an acceptable number of cycles.

2. Why is a burst not allowed to cross a 1 kilobyte boundary?

If an AHB slave samples HSELx at the start of a burst transaction, it knows
it will be selected for the duration of the burst. Also, a slave which is
not selected at the start of a burst will know that it will not become
selected until a new burst is started.

1 kilobyte is the smallest area an AHB slave may occupy in the memory map.
Therefore, if a burst did cross a 1 kilobyte boundary, the access could
start accessing one slave at the beginning of the burst and then switch to
another on the boundary, which must not happen for the above reason.

The 1 kilobyte boundary has been chosen as it is large enough to allow
reasonable length bursts, but small enough that peripherals can be aligned
to the 1 kilobyte boundary without using up too much of the available
memory map.
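
As a rough illustration of the rule, a master (or a bus monitor) could check an incrementing burst before issuing it. The function name and parameters below are hypothetical, and the sketch only covers incrementing bursts of a known length.

```python
KB_BOUNDARY = 1024  # smallest region an AHB slave may occupy, in bytes


def crosses_1kb_boundary(start_addr, size_bytes, beats):
    """Return True if an incrementing burst would cross a 1 kilobyte boundary.

    start_addr: address of the first beat
    size_bytes: bytes per beat (as implied by HSIZE)
    beats:      number of beats in the burst
    """
    last_addr = start_addr + size_bytes * beats - 1  # last byte touched
    return (start_addr // KB_BOUNDARY) != (last_addr // KB_BOUNDARY)


print(crosses_1kb_boundary(0x3F0, 4, 4))  # ends at 0x3FF, stays inside
print(crosses_1kb_boundary(0x3F4, 4, 4))  # last beat lands at 0x400
```

A master that finds a burst would cross the boundary can split it into two bursts, the second starting with a NONSEQ transfer at the boundary address.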

3. Can an AHB master be connected directly to an AHB slave?

Any slave which does not use SPLIT responses can be connected directly to
an AHB master. If the slave does use SPLIT responses then a simplified
version of the arbiter is also required.

If an AHB master is connected directly to an AHB slave it is important to
ensure that the slave drives HREADY high during reset and that the select
signal HSEL for the slave is tied permanently high.

4. What is the state of the AHB signals during reset?

The specification states that during reset the bus signals should be at
valid levels. This simply means that the signals should be logic '0' or
'1', but not Hi-Z. The actual logic levels driven are left up to the
designer. HTRANS is the only signal specified during reset, with a
mandatory value of IDLE.

It is important that HREADY is high during reset. If all slaves in the
system drive HREADY high during reset then this will ensure that this is
the case. However, if slaves are used which do not drive HREADY high
during reset it should be ensured that a slave which does drive HREADY high
is selected at reset.

5. Can a BUSY transfer occur at the end of a burst?

A BUSY transfer can only occur at the end of an undefined length burst
(INCR). A BUSY transfer cannot occur at the end of a fixed length burst
(SINGLE, INCR4, WRAP4, INCR8, WRAP8, INCR16, WRAP16).

6. What is a default slave?

If the memory map of a system does not define the full 4 gigabyte address
space then a default slave is required, which is selected when an access is
attempted to the empty areas of the memory map. The default slave should
use an OKAY response for IDLE/BUSY transfers and an ERROR response sequence
for NONSEQ/SEQ transfers.
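
The response rule for the default slave is small enough to write down directly. This is a behavioural sketch only: the function name is hypothetical, and the two-cycle timing of the ERROR response is not modelled.

```python
def default_slave_response(htrans):
    """Return the HRESP a default slave gives for a transfer type."""
    if htrans in ("IDLE", "BUSY"):
        return "OKAY"   # no real transfer is being attempted
    if htrans in ("NONSEQ", "SEQ"):
        return "ERROR"  # real access to an undefined address
    raise ValueError(f"unknown HTRANS value: {htrans!r}")


for t in ("IDLE", "BUSY", "NONSEQ", "SEQ"):
    print(t, default_slave_response(t))
```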

7. Is a default slave really necessary?

If the entire 4 gigabyte address space is defined then a default slave is
not required. If, however, there are undefined areas in the memory map then
it is important to ensure that a spurious access to a non-existent address
location will not lock up the system. The functionality of the default
slave is extremely simple and it will often make sense to implement this
within the decoder.

8. Is a dummy master really necessary?

A dummy master is necessary in any system which has a slave that can give
SPLIT transfer responses. The dummy master is required so that something
can be granted the bus if all the other masters have received a SPLIT
response.

No logic is required for the dummy master and it can be implemented by
simply tying off the inputs to the master address/control multiplexer for
the dummy master position. The requirements for a dummy master are that
HTRANS is driven to IDLE, HLOCK is driven low, and all other master outputs
are driven to legal values.

9. Is it specified that HPROT, HSIZE and HWRITE remain constant throughout a
burst?

Yes, the control signals must remain constant throughout the duration of a
burst.

10. What default state should be used for the HREADY and HRESP outputs from a
slave?

It is recommended that the default value for HREADY is high and the default
value for HRESP is OKAY. This combination ensures that the slave will
respond correctly to IDLE transfers to the slave, even if the slave is in
some form of power saving mode.

11. Is HREADY an input or an output from slaves?

An AHB slave must have the HREADY signal as both an input and an output.

HREADY is required as an output from a slave so that the slave can extend
the data phase of a transfer.

HREADY is also required as an input so that the slave can determine when
the previously selected slave has completed its final transfer and the
first data phase transfer for this slave is about to commence.

Each AHB Slave should have an HREADY output signal (conventionally named
HREADYOUT) which is connected to the Slave-to-Master Multiplexer. The
output of this multiplexer is the global HREADY signal which is routed to
all masters on the AHB and is also fed back to all slaves as the HREADY
input.

12. How many masters can there be in an AHB system?

The AHB specification caters for up to 16 masters. However, allowing for a
dummy bus master means the maximum number of real bus masters is actually
15. By convention bus master number 0 is allocated to the dummy bus
master.

13. Can a master change the address/control signals during a waited transfer?

Yes. If the address/control signals are indicating an IDLE transfer then
the master can change to a real transfer (NONSEQ) when HREADY is low.

However, if a master is indicating a real transfer (NONSEQ or SEQ) then it
cannot cancel this during a waited transfer unless it receives a SPLIT,
RETRY or ERROR response.

14. When a master rebuilds a burst which has been terminated early are there
any limitations on how it rebuilds the burst?

The only limitation is that the master uses legal burst combinations to
rebuild the burst. For example, if a master was performing an 8 beat burst,
but had only completed 3 transfers before losing control of the bus, then
the remaining 5 transfers could be performed either by using a 1 beat
SINGLE burst followed by a 4 beat INCR4 burst, or it could be performed
using a 5 beat undefined length INCR burst.

For simplicity it is recommended that masters use INCR bursts to rebuild
the remaining transfers.
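
A minimal sketch of the recommended INCR rebuild, assuming the original burst was an incrementing one (a wrapping burst would need the wrap address calculation as well). The function name and argument names are hypothetical.

```python
def rebuild_as_incr(start_addr, size_bytes, total_beats, completed_beats):
    """Return the (HTRANS, address) pairs for the beats still outstanding.

    The remaining beats are issued as one undefined-length INCR burst,
    so the first one is NONSEQ and the rest are SEQ.
    """
    remaining = total_beats - completed_beats
    first = start_addr + completed_beats * size_bytes
    return [
        ("NONSEQ" if i == 0 else "SEQ", first + i * size_bytes)
        for i in range(remaining)
    ]


# 8-beat word burst from 0x20 that lost the bus after 3 transfers
for htrans, addr in rebuild_as_incr(0x20, 4, 8, 3):
    print(htrans, hex(addr))
```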

15. What is the recommended default value for HPROT?

Many bus masters will not be able to generate accurate protection
information and for these bus masters it is recommended that the HPROT
encoding shows Non-cacheable, Non-bufferable, Privileged, Data Access,
which corresponds to HPROT[3:0] = 4'b0011.
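
The bit assignment behind this value can be sketched as a small decoder. The bit meanings follow the AMBA 2 AHB encoding (bit 0 data access, bit 1 privileged, bit 2 bufferable, bit 3 cacheable); the function name and dictionary keys are hypothetical.

```python
def decode_hprot(hprot):
    """Decode a 4-bit HPROT value into its four protection attributes."""
    return {
        "data_access": bool(hprot & 0b0001),  # 0 = opcode fetch
        "privileged":  bool(hprot & 0b0010),  # 0 = user access
        "bufferable":  bool(hprot & 0b0100),
        "cacheable":   bool(hprot & 0b1000),
    }


print(decode_hprot(0b0011))
```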

16. Do all slaves have to support the BUSY transfer type?

Yes. All slaves must support the BUSY transfer type to ensure they are
compatible with any bus master.

17. What system support is required if a slave can be powered down or have its
clock stopped?

If a slave access is attempted while that slave is in a power down state or
has had its clock stopped, you must ensure that an access will cause the
power/clock to be restored, or else configure the AHB decoder to redirect
any such accesses to the default slave so that the system does not hang
forever when an access is made to the device while it is disabled.

Redirecting the access in this way will ensure that random "IDLE" addresses
are treated with the HREADY high and HRESP=OKAY default response, but real
accesses (NONSEQ or SEQ) will be detected with an ERROR response.

18. When can Early Burst Termination occur?

Bursts can be early terminated either as a result of the Arbiter removing
the HGRANT to a master part way through a burst, or after a slave returns a
non-OKAY response to any beat of a burst. Note however that a master cannot
decide to terminate a defined length burst unless prompted to do so by the
Arbiter or Slave responses.

All AHB Masters, Slaves and Arbiters must be designed to support Early
Burst Termination.

19. Does the address have to be aligned, even for IDLE transfers?

Yes. The address should be aligned according to the transfer size (HSIZE)
even for IDLE transfers. This will prevent spurious warnings from bus
monitors used during simulation.

20. What is the difference between a dummy bus master and a default bus
master?

The term default bus master is used to describe the master that is granted
when none of the masters in the system are requesting access to the bus.
Usually the bus master which is most likely to request the bus is made the
default master.

The dummy bus master is a master which only performs IDLE transfers. It is
required in a system so the arbiter can grant a master which is guaranteed
not to perform any real transfers. The two cases when the arbiter would
need to do this are when a SPLIT response is given to a locked transfer and
when a SPLIT response is given and all other masters have already been
SPLIT.

21. Is it legal for a master to change HADDR when a transfer is extended?

If a master is indicating that it wants to do a NONSEQ, SEQ or BUSY
transfer then it cannot change the address during an extended transfer
(when HREADY is low) unless it receives an ERROR, RETRY or SPLIT response.
If the master is indicating that it wants to do an IDLE transfer then it
may change the address.

22. Can HTRANS change whilst HREADY is low?

In general, an AHB master should not change control signals whilst HREADY
is low. However it is allowable to change HTRANS in the following
conditions:

* HTRANS = IDLE
The AHB master is performing internal operations and has not yet
committed to a bus transfer. However during the AHB wait states (HREADY
low) the master may determine that a bus transfer is required and change
HTRANS on the next cycle to NONSEQ.

* HTRANS = BUSY
HTRANS is being used to give the master time to complete internal
operations, which may be entirely independent of HREADY (i.e. wait states
on the AHB). Therefore HTRANS can change on the next cycle to any legal
value, i.e. SEQ if the burst is to continue, IDLE if the burst has
completed, NONSEQ if a separate burst is to begin.

* HRESP = SPLIT/RETRY
As stated in the AHB specification, a master must assert IDLE on HTRANS
during the second cycle of the two-cycle SPLIT or RETRY slave response so
HTRANS will change value from the first cycle to the second cycle of the
response.

* HRESP = ERROR
The master is permitted to change HTRANS in reaction to an ERROR response
in the same way as in reaction to a SPLIT/RETRY response and cancel any
further beats in the current burst (even if HBURST is indicating a
defined-length burst). In this case HTRANS changes to IDLE on the second
cycle of the response. Alternatively, the master is permitted to continue
with the current transfers.

23. What are the different bursts used for?

Typically a master would use wrapping bursts for cache line fills where the
master wants to access the data it requires first and then it completes the
burst to fetch the remaining data it requires for the cache line fill.
Incrementing bursts are used by masters, such as DMA controllers, that are
filling a buffer in memory which may not be aligned to a particular address
boundary.
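
To make the difference concrete, the beat addresses of the two burst types can be generated as follows. This is a sketch with hypothetical names; it assumes the start address is aligned to the transfer size and that a wrapping burst wraps at a boundary of (beats x bytes per beat).

```python
def burst_addresses(start_addr, size_bytes, beats, wrap):
    """Return the list of beat addresses for an incrementing or wrapping burst."""
    if not wrap:
        return [start_addr + i * size_bytes for i in range(beats)]
    block = size_bytes * beats               # size of the wrap region in bytes
    base = (start_addr // block) * block     # aligned start of the wrap region
    offset = start_addr - base
    return [base + (offset + i * size_bytes) % block for i in range(beats)]


# 4-beat word bursts starting at 0x38
print([hex(a) for a in burst_addresses(0x38, 4, 4, wrap=True)])
print([hex(a) for a in burst_addresses(0x38, 4, 4, wrap=False)])
```

The wrapping burst fetches the requested word first and then the rest of the same 16-byte line, while the incrementing burst simply carries on past the line boundary.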

24. What sequences of transfer types (HTRANS) can occur on the bus?

The following examples show some of the sequences of HTRANS that can occur
on the bus:

A normal burst of four transfers followed by an IDLE.
N - S - S - S - I

A normal burst of four transfers which includes BUSY transfers.
N - S - B - S - B - S - I

A burst of four transfers followed by another burst.
N - S - S - S - N - S - S - S - I

A single transfer followed by a burst of four transfers.
N - N - S - S - S - I

A single transfer followed by an IDLE.
N - I

An undefined length burst which concludes with a BUSY transfer.
N - B - S - B - S - B - I

An undefined length burst which concludes with a BUSY transfer and is followed
immediately by another burst.

N - B - S - B - S - B - N - S
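
The sequences above can be run through a very small sanity check. This sketch deliberately checks only one rule, that a SEQ or BUSY transfer never directly follows an IDLE (a burst has to begin with NONSEQ). It is not a full protocol checker: it knows nothing about HBURST, so it cannot tell whether a BUSY at the end of a burst is legal. The function name is hypothetical, and the bus is assumed to be idle before the first transfer.

```python
def plausible_htrans_sequence(seq):
    """Check a 'N - S - S - I' style string against the one rule above."""
    prev = "I"  # assumption: the bus is idle before the sequence starts
    for t in seq.replace(" ", "").split("-"):
        if t not in {"N", "S", "B", "I"}:
            raise ValueError(f"unknown transfer type: {t!r}")
        if t in {"S", "B"} and prev == "I":
            return False  # SEQ/BUSY without a burst in progress
        prev = t
    return True


print(plausible_htrans_sequence("N - S - B - S - B - S - I"))
print(plausible_htrans_sequence("N - I - S"))
```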

25. How should AHB to APB bridges handle accesses that are not 32-bits?

The bridge should simply pass the entire 32-bit data bus through the
bridge. Please note that when transfers less than 32-bits are performed to
an APB slave it is important to ensure that the peripheral is located on
the appropriate bits of the APB data bus.

_______________________________________________________________________________

AHB - Split/Retry
-----------------

1. What value should be used for HTRANS when an AHB master gets a RETRY
response from a slave in the middle of burst?

Whenever a transfer is restarted it must use HTRANS set to NONSEQ and it
may also be necessary to adjust the HBURST information (usually just to
indicate INCR).

2. What address should be on the bus during the IDLE cycle after a SPLIT or
RETRY?

It does not matter what address is driven onto the bus during this cycle.
The slave selected by the driven address should not take any action and
must respond with a zero wait state OKAY response.

In many cases it will be simpler for the master to leave the address
unaltered during this cycle, so that it remains at the address of the next
transfer that the master wishes to perform and only in the following cycle
does the master return the address to that of the transfer that must be
repeated because of the SPLIT or RETRY response.

In some designs it may be possible for the master to return the address to
that required to repeat the previous transfer during the IDLE cycle and
this behaviour is also perfectly acceptable.

3. Do all masters have to support SPLIT and RETRY?

Yes. All masters must support SPLIT and RETRY responses to ensure they are
compatible with any bus slave. A master will handle both SPLIT and RETRY
responses in an identical manner.

4. Can a SPLIT or RETRY response be given at any point during a burst?

Yes. A SPLIT, RETRY or ERROR response can be given by a slave to any
transfer during a burst. The slave is not restricted to only giving these
responses to the first transfer.

5. Will a master always lose the bus after a SPLIT response?

Yes. A slave must not assert the relevant bit of the HSPLIT bus in the same
cycle that it gives the SPLIT response and therefore the master will always
lose the bus.

6. Can a slave assert HSPLITx in the same cycle that it gives a SPLIT
response?

No. The specification requires that HSPLITx can only be asserted after the
slave has given a SPLIT response.

7. Do all slaves have to support the SPLIT and RETRY responses?

No. A slave is only required to support the response types that it needs to
use. For example, a simple on-chip memory block which can respond to all
transfers in just a few wait states does not need to use either the SPLIT
or RETRY responses.

8. Can a slave use both SPLIT and RETRY responses?

Normally a slave will not use both the SPLIT and RETRY responses. The SPLIT
response should be used by any slave that may be accessed by many different
masters at the same time. The RETRY response is intended to be used by
peripherals that are only accessed by one bus master.

9. What is the difference between SPLIT and RETRY responses?

Both the Split and Retry responses are used by slaves which require a large
number of cycles to complete a transfer. These responses allow a data phase
transfer to appear completed to avoid stalling the bus, but at the same
time indicate that the transfer should be re-attempted when the master is
next granted the bus.

The difference between them is that a SPLIT response tells the Arbiter to
give priority to all other masters until the SPLIT transfer can be
completed (effectively ignoring all further requests from this master until
the SPLIT slave indicates it can complete the SPLIT transfer), whereas the
RETRY response only tells the Arbiter to give priority to higher priority
masters.

A SPLIT response is more complicated to implement than a RETRY, but has the
advantage that it allows the maximum efficiency to be made of the bus
bandwidth.

The master behaviour is identical for both SPLIT and RETRY responses: the
master has to cancel the next access and re-attempt the current failed
access.

_______________________________________________________________________________

AMBA 2 APB - General
--------------------
1. Why is there no wait signal on the APB?

The APB has been designed to implement as simple an interface as possible.
Having this simple design makes it much easier to connect new APB
peripherals and makes the analysis of the system performance easier to
calculate.

Although many APB peripherals are slow devices, such as UARTs, they are
normally accessed via control registers. Typically the driver software will
first access a status register to determine that data is available and only
then access the data register. Both of these accesses are possible without
the addition of wait states and therefore the peripheral can easily be
accessed as an APB device.

Peripherals which do require wait states can be designed as AHB slaves and
in the rare case that a design does include a large number of these
peripherals then a secondary stub AHB can be used to reduce the loading on
the main system bus.

2. How should AHB to APB bridges handle accesses that are not 32-bits?

The bridge should simply pass the entire 32-bit data bus through the
bridge. Please note that when transfers less than 32-bits are performed to
an APB slave it is important to ensure that the peripheral is located on
the appropriate bits of the APB data bus.

_______________________________________________________________________________

Migrating from AHB to AXI based SoC Designs

This article describes the most important AMBA bus architectures and how they evolved to accommodate the ever-increasing complexity of SoC technology. Digital designers will learn about the differences between common bus-based and recent transaction-based interconnection architectures.

AHB revisited

AHB (Advanced High-performance Bus) first appeared to the public as part of AMBA 2.0 Specification and set out to replace ASB (Advanced System Bus) as the basis for ARM based System on Chip (SoC) interconnect fabrics between processor(s), internal/external memory controllers, and other high-bandwidth peripherals.

Both being traditional bus systems, AHB and ASB are fairly similar in concept. The newer AHB, with only unidirectional (multiplexed, rather than tri-stated) signals, has been specifically aimed at synthesizable, DFT-friendly ASIC designs.

AHB supports single data access and various types of burst accesses (including wrapping bursts to support cache line fill operations). Each transfer is defined by an address and a data phase where the address phase of one transfer occurs during the data phase of the previous transfer.

Underlying AHB is a traditional bus architecture with arbitration between multiple masters. The protocol supports advanced features such as SPLIT and RETRY signaling in cases where a slave is not able to respond immediately. The master that had been granted the bus will back off and other masters will get a turn.

[Figure: Multiplexed bus]

Multi-layer AHB and AHB-Lite

Although traditional multiplexed multi-master systems are still quite common, a little over a decade ago the ARM SoC world started shifting towards crossbar-switched interconnects, in the form of multi-layer busses. This was a rather important initial step which led over time to some critical improvements:

Each layer of the bus is an independent single-master AHB system. Instead of a rather complex monolithic multiplexing scheme, a multi-layer AHB bus architecture with M masters and S slaves is structured as M 1:S master multiplexers plus S M:1 slave multiplexers, all connected to separate arbitration and decoding logic.

[Figure: Multilayer bus]

Consequently, multiple masters can talk to multiple slaves concurrently, as long as no two masters try to access the same slave at the same time. Think of a DMA controller moving data from a receiver into a memory region, while the processor continues to execute code in a different memory region.
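
The concurrency condition can be sketched as a simple check over the accesses that masters attempt in one cycle. The function name and the master and slave names used here are hypothetical; a real interconnect would arbitrate between conflicting masters rather than just report them.

```python
from collections import defaultdict


def slave_conflicts(accesses):
    """Return the slaves targeted by more than one master in the same cycle.

    accesses: dict mapping master name -> slave name it wants to access
    """
    by_slave = defaultdict(list)
    for master, slave in accesses.items():
        by_slave[slave].append(master)
    return {s: ms for s, ms in by_slave.items() if len(ms) > 1}


# DMA and CPU use different memories: both proceed concurrently
print(slave_conflicts({"DMA": "SRAM0", "CPU": "FLASH"}))
# Both target the same slave: one of them has to wait
print(slave_conflicts({"DMA": "SRAM0", "CPU": "SRAM0"}))
```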

All arbitration and protocol complexity moves into the fabric. The interface implementation becomes simpler as a number of unneeded signals, most notably HGRANT and HBUSREQ, can be removed along with their associated protocol. Although not a necessary consequence of the multi-layer architecture, getting rid of the unpopular SPLIT and RETRY handshaking mechanism was another advantage.

With the advent of AMBA3, this AHB subset was standardized as AHB-Lite.

AXI3

With modern Systems on Chip including multi-core clusters, additional DSPs, graphics controllers and other sophisticated peripherals, the system fabric becomes a critical performance bottleneck. The AHB protocol, even in its multi-layer configuration, cannot keep up with the demands of today's SoCs. The reasons for this include:

AHB is transfer-oriented. With each transfer, an address will be submitted and a single data item will be written to or read from the selected slave. All transfers will be initiated by the master. If the slave cannot respond immediately to a transfer request the master will be stalled. Each master can have only one outstanding transaction.
Sequential accesses (bursts) consist of consecutive transfers which indicate their relationship by asserting HTRANS/HBURST accordingly.
Although AHB systems are multiplexed and thus have independent read and write data busses², they cannot operate in full-duplex mode.

An AXI interface consists of up to five channels (write address, write data, write response, read address, read data/response) which can operate largely independently of each other. Each channel uses the same trivial handshaking between source and destination (master or slave, depending on channel direction), which simplifies the interface design.

AXI channel handshake
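In AXI the handshake on every channel is a VALID/READY pair: the source raises VALID when it has something to send, the destination raises READY when it can accept it, and a transfer takes place in any cycle where both are high. A minimal sketch of that rule, with a made-up helper name and made-up waveforms:

```python
def transfer_cycles(valid, ready):
    """Return the cycle numbers in which a transfer takes place.

    valid, ready: per-cycle signal levels (0/1) sampled at the clock edge.
    A transfer happens only in cycles where VALID and READY are both high.
    """
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


# The source asserts VALID in cycle 1 and holds it until the destination
# finally raises READY in cycle 3.
valid = [0, 1, 1, 1, 0]
ready = [1, 0, 0, 1, 1]
print(transfer_cycles(valid, ready))
```

Because the same rule applies to all five channels, one small piece of handshake logic can be reused throughout an interface.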

Unlike AHB(-Lite), in the new AXI (Advanced eXtensible Interface) the P-t-P (point-to-point) concept is not an afterthought but is the central focus of the protocol design.

In AXI3 all transactions are bursts of lengths between 1 and 16. The addition of byte enable signals for the data bus supports unaligned memory accesses and store merging.
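On the write data channel these byte enables are the write strobes (WSTRB), one bit per byte lane. As a rough sketch, the strobe pattern for a single beat can be derived from the low address bits; the helper name and the 4-byte default bus width are assumptions for illustration.

```python
def write_strobe(address, num_bytes, bus_bytes=4):
    """Return the byte-enable mask for one beat as an integer.

    Bit i of the result enables byte lane i of the data bus.
    Assumption: the access fits inside one beat, i.e. it does not spill
    over the end of the data bus.
    """
    first_lane = address % bus_bytes
    if num_bytes < 1 or first_lane + num_bytes > bus_bytes:
        raise ValueError("access does not fit into a single beat")
    return ((1 << num_bytes) - 1) << first_lane


print(format(write_strobe(0x1000, 4), "04b"))  # full-width aligned write
print(format(write_strobe(0x1001, 2), "04b"))  # two bytes starting at lane 1
print(format(write_strobe(0x1003, 1), "04b"))  # single byte in the top lane
```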

The communication between master and slave is transaction-oriented, where each transaction consists of address, data, and response transfers on their corresponding channels. Apart from rather liberal ordering rules there is no strict protocol-enforced timing relation between individual phases of a transaction. Instead every transfer identifies itself as part of a specific transaction by its transaction ID tag. Transactions may complete out-of-order and transfers belonging to different transactions may be interleaved. Thanks to the ID that every transfer carries, out-of-order transactions can be sorted out at the destination.

AXI write burst
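How the ID tag lets a receiver untangle interleaved transfers can be sketched as follows. The example models read data beats as `(id, data, last)` tuples, loosely corresponding to the RID, RDATA and RLAST signals; the function name and the sample data are made up.

```python
from collections import defaultdict


def collect_by_id(beats):
    """Reassemble interleaved beats into complete transactions.

    beats: iterable of (txn_id, data, last) tuples in arrival order.
    Returns a list of (txn_id, [data, ...]) in order of completion.
    Beats that share an ID are assumed to arrive in order.
    """
    pending = defaultdict(list)
    completed = []
    for txn_id, data, last in beats:
        pending[txn_id].append(data)
        if last:                       # final beat closes the transaction
            completed.append((txn_id, pending.pop(txn_id)))
    return completed


beats = [
    (1, "a0", False),
    (0, "b0", False),
    (1, "a1", True),   # transaction 1 finishes first
    (0, "b1", True),
]
print(collect_by_id(beats))
```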

This flexibility requires all components in an AXI system to agree on certain parameters, such as write acceptance capability, read data reordering depth and many others.

Due to the vast number of signals that make up a read/write AXI connection, routing a large AXI fabric can be rather challenging. However, the independent channels in an AXI fabric make it possible to choose a different routing structure for each channel, depending on the data volume expected on it. If the majority of transactions transfer more than one data item, the data channels should be routed via a crossbar so that different streams can be processed at the same time. The address and response channels see comparatively little traffic and could perhaps be multiplexed.

Some experts consider it an advantage to provide AXI only at the interface level, while a special packetized routing protocol is used inside the fabric, a so-called Network-on-Chip (NoC).

AXI4

AXI4 is the latest revision of the AXI protocol described above. Functionality has been added and several known issues in AXI3 have been addressed to ensure that AMBA busses remain the dominant standard in SoC connectivity. Some key points:

The maximum burst length has been increased from 16 to 256 transfers for certain types of bursts (INCR, non-exclusive).
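The effect of the longer burst limit is easy to see by counting how many bursts a large block transfer needs. The sketch below only applies the length limit; it ignores other protocol constraints such as address boundary rules, and the helper name is hypothetical.

```python
AXI3_MAX_BEATS = 16        # maximum burst length in AXI3
AXI4_INCR_MAX_BEATS = 256  # maximum INCR burst length in AXI4


def split_into_bursts(total_beats, max_beats):
    """Split a transfer of total_beats data beats into legal burst lengths."""
    bursts = []
    remaining = total_beats
    while remaining > 0:
        beats = min(remaining, max_beats)
        bursts.append(beats)
        remaining -= beats
    return bursts


print(len(split_into_bursts(1000, AXI3_MAX_BEATS)))   # number of AXI3 bursts
print(split_into_bursts(1000, AXI4_INCR_MAX_BEATS))   # AXI4 INCR bursts
```

Fewer bursts mean fewer address transfers and less arbitration overhead for the same amount of data.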

Additional Quality-of-Service signaling has been added, where the finer details of the interpretation are implementation defined.

AXI4 defines address regions for slaves, which allows implementations of memory perspectives on the bus level. No doubt this will be used at some point in the future to break the 4GB address boundary. [Update: Since this article was written, Cortex-A15 was announced with this very functionality]

Some ordering requirements and transfer dependencies have been refined, as have the meanings of the cache policy signals AxCACHE. Abstract memory types as defined by ARMv6/v7 architectures and multicore architectures are much better represented by these changes.

Implementation-defined per-channel sideband signals are now officially supported as AxUSER.

Legacy (AHB) locked transfers are no longer supported. The entire concept that a master can request exclusive access to the entire bus doesn't fit within the idea of a switched interconnect. The one-and-only ARM instruction causing this signal to be asserted is no longer supported in the v7 architectures¹.

A rather significant change seems to be the banning of write interleaving, a feature that could help improve system throughput. In practice, removing write interleaving from this part of the AMBA standard makes certain aspects of the AXI protocol easier to handle. Write interleaving is hardly used by regular masters but can be useful for fabrics that gather streams from different sources. With the new AXI4-Stream protocol (see below), interleaving is still available for fabrics.

AXI4-Stream

The new AXI4-Stream protocol was designed for streaming data to destinations that are not memory mapped internally. Display controllers, transmitters, but also routing fabrics are among the target applications for this new protocol.

Building upon the proven, simple AXI channel handshake, AXI4-Stream is essentially an AXI write data channel with additional control signals and a slightly modified protocol. The burst (packet) length is not restricted, and the width of the data signal TDATA can be an arbitrary integer number of bytes, including zero.
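As an illustration of the packet idea, the following sketch chops a byte string into stream beats. TKEEP (which byte lanes carry valid data) and TLAST (end of packet) are AXI4-Stream signals; the function name, the 4-byte TDATA width and the zero padding of unused lanes are assumptions made for this example.

```python
def packetize(payload, bus_bytes=4):
    """Split payload into a list of (tdata, tkeep, tlast) beats.

    tdata: bytes of length bus_bytes (unused lanes padded with zeros here)
    tkeep: integer mask, bit i set if byte lane i carries payload data
    tlast: True on the final beat of the packet
    """
    beats = []
    for offset in range(0, len(payload), bus_bytes):
        chunk = payload[offset:offset + bus_bytes]
        tkeep = (1 << len(chunk)) - 1
        tdata = chunk + bytes(bus_bytes - len(chunk))
        tlast = offset + bus_bytes >= len(payload)
        beats.append((tdata, tkeep, tlast))
    return beats


for tdata, tkeep, tlast in packetize(b"hello!"):
    print(tdata, format(tkeep, "04b"), tlast)
```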

AXI4-Lite

As described so far, the focus of AXI has been on high-performance data transfer, but what about the low end: hardware registers, configuration, and so on? With good old APB there is an established, robust interface, which received an upgrade in AMBA3 extending it with slave response signaling (PSLVERR, PREADY), a feature that was missed dearly by designers.

You may ask what exactly was the issue with APB anyway? The answer is the bridge. In a traditional system, including AMBA3, one or more of the slaves are bridges between the main system protocol (AHB, AXI) and APB. The intention was that with many small peripherals on a "real" bus including the multi-layer variant, the fan-out of multidrop signals (HWDATA, HADDR in AHB) would be too high. A typical bridge supports up to sixteen slaves, which are assigned fragments of the address region occupied by the bridge itself so that all APB peripherals connected to this bridge are in one contiguous address region.
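The address split performed by such a bridge might look like the sketch below. The base address, the 4KB slot size and the function name are invented for illustration; only the limit of sixteen slaves in one contiguous region comes from the description above.

```python
def decode_apb(address, bridge_base=0x4000_0000, slot_size=0x1000, num_slots=16):
    """Map a system bus address to (slave_index, register_offset).

    Assumptions (not taken from any specification): the bridge occupies one
    contiguous region starting at bridge_base, divided into num_slots
    equally sized slots, one per APB slave.
    Returns None if the address is outside the bridge's region.
    """
    offset = address - bridge_base
    if not 0 <= offset < slot_size * num_slots:
        return None
    return offset // slot_size, offset % slot_size


print(decode_apb(0x4000_3010))   # slave 3, register offset 0x10
print(decode_apb(0x4001_0000))   # first address past the bridge region
```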

In modern interconnects, you may find built-in 1:1 bridges which connect between system bus and a single APB slave, enabling higher flexibility. Still a bridge though.

AXI4-Lite addresses this last issue by defining certain restrictions that would allow a slave to be connected directly to an AXI fabric. In AXI4-Lite, you might say that AXI gets "dumbed" down to a few basic transaction types. The burst length is fixed to one data transfer, transfers are non-cacheable and non-bufferable, exclusive access is not allowed and access width must always be the same as data bus width. This is supposed to make the interface design simple enough to be implemented quickly in custom IP.
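The restrictions listed above can be written down as a simple checker. This is a sketch built only from the rules in the preceding paragraph; the `Transaction` class, its field names and the 4-byte bus width are hypothetical and do not mirror actual AXI signal encodings.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    burst_beats: int          # number of data transfers in the burst
    bytes_per_beat: int       # access width in bytes
    cacheable: bool = False
    bufferable: bool = False
    exclusive: bool = False


def axi4_lite_violations(txn, bus_bytes=4):
    """Return a list of reasons why txn is not a legal AXI4-Lite access."""
    problems = []
    if txn.burst_beats != 1:
        problems.append("burst length must be one data transfer")
    if txn.cacheable or txn.bufferable:
        problems.append("transfers must be non-cacheable and non-bufferable")
    if txn.exclusive:
        problems.append("exclusive access is not allowed")
    if txn.bytes_per_beat != bus_bytes:
        problems.append("access width must equal the data bus width")
    return problems


print(axi4_lite_violations(Transaction(burst_beats=1, bytes_per_beat=4)))
print(axi4_lite_violations(Transaction(burst_beats=4, bytes_per_beat=2, exclusive=True)))
```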

Summary

Over the years, AMBA has continued to provide state-of-the-art solutions for SoC interconnects. With the relatively recent addition of the AXI4 protocol family, ARM maintains a competitive advantage in the field of high-performance SoC, while at the same time AHB-Lite is still available for less demanding architectures.

References

AMBA v2.0 Specification (IHI 0011A)
AMBA3 AHB-Lite v1.0 Specification (IHI 0033A)
AMBA AXI v2.0 Specification (IHI 0022C)
AXI4-Stream v1.0 Specification (IHI 0051A)
Cortex-A9 Technical Reference Manual (DDI 0388F)