Raspberry Pi Secondary Memory Interface (SMI)

Colour video signal captured at 25 MS/s

The Secondary Memory Interface (SMI) is a parallel I/O interface that is included in all the Raspberry Pi versions. It is rarely used due to the acute lack of publicly-available documentation; the only information I can find is in the source code to an external memory device driver here, and an experimental IDE interface here.

However, it is a very useful general-purpose high-speed parallel interface, that deserves wider usage; in this post I’m testing it with digital-to-analogue and analogue-to-digital converters (DAC and ADC) but there are many other parallel-bus devices that would be suitable.

To take advantage of the high data rates, I’ll be using the C language, and Direct Memory Access (DMA); if you are unfamiliar with DMA on the RPi, I suggest you read my previous 2 posts on the subject, here and here.

Parallel interface

Raspberry Pi SMI signals

The SMI interface has up to 18 bits of data, 6 address lines, and read & write select lines. Transfers can be initiated internally, or externally via read & write request lines, which can take over the uppermost 2 bits of the data bus. Transfer data widths are 8, 9, 16 or 18 bits, and are fully supported by First In First Out (FIFO) buffers, and DMA; this makes for efficient memory usage when driving an 8-bit peripheral, since a single 32-bit DMA transfer can automatically be converted into four 8-bit accesses.

If you have ever worked with the classic bus-interfaces of the original microprocessors, you’ll feel quite at home with SMI, but no need to worry about timing problems, because the setup, strobe & hold times are fully programmable in steps as small as 2 nanoseconds; what luxury!

The SMI functions are assigned to specific GPIO pins:

The GPIO pins to be included in the parallel interface are selected by setting their mode to ALT1; there is no requirement to set all the SMI pins in this way, so the I2C, SPI and PWM interfaces are still quite usable.

Parallel DAC

Hardware

The simplest device to drive from the parallel bus is a digital-to-analogue converter (DAC), using resistors from each data line to a common output. This arrangement is commonly known as an R-2R ladder, due to the resistor values needed.

I’ve used a pre-built device from Digilent (details here, or newer version here) but it is easy to make your own using discrete resistors; the least-significant is connected to GPIO8 (SD0), and the most-significant to GPIO15 (SD7).

Software

I’ll be making extensive use of the dma_utils functions that were created for my previous DMA projects, but before diving into the complication of SMI, it is helpful to test the hardware using simpler GPIO commands:

#define DAC_D0_PIN      8
#define DAC_NPINS       8

extern MEM_MAP gpio_regs;
map_periph(&gpio_regs, (void *)GPIO_BASE, PAGE_SIZE);

// Output value to resistor DAC (without SMI)
void dac_ladder_write(int val)
{
    *REG32(gpio_regs, GPIO_SET0) = (val & 0xff) << DAC_D0_PIN;
    *REG32(gpio_regs, GPIO_CLR0) = (~val & 0xff) << DAC_D0_PIN;
}

// Initialise resistor DAC
void dac_ladder_init(void)
{
    int i;
    
    for (i=0; i<DAC_NPINS; i++)
        gpio_mode(DAC_D0_PIN+i, GPIO_OUT);
}

// Output sawtooth waveform
dac_ladder_init();
while (1)
{
    i = (i + 1) % 256;
    dac_ladder_write(i);
    usleep(10);
}

This is less-than-ideal because we have to use one command to set some I/O pins to 1, and another command to clear the rest to 0, so in the gap between them the I/O state will be incorrect; also we won’t get accurate timing with the usleep command.

To my surprise, when I ran this code on a Pi Zero, and viewed the output on an oscilloscope, it didn’t look too bad; however, as soon as I moved the mouse, there were very significant gaps in the output, so clearly we need to do better.
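The set-then-clear gap described above can be illustrated without any hardware. This is only a minimal sketch that models the 8 DAC pins as a plain variable; ladder_transient() and ladder_final() are hypothetical helpers, not part of dma_utils.

```c
#include <stdint.h>

// Hypothetical model of the 8 DAC pins: returns the value that is briefly
// present after the GPIO_SET0 write, but before the GPIO_CLR0 write
uint8_t ladder_transient(uint8_t old_val, uint8_t new_val)
{
    // The SET write can only turn pins on, so the old 1-bits are still there
    return (uint8_t)(old_val | new_val);
}

// Final pin state, once the CLR write has turned off the unwanted bits
uint8_t ladder_final(uint8_t old_val, uint8_t new_val)
{
    uint8_t pins = ladder_transient(old_val, new_val);
    uint8_t clr_mask = (uint8_t)~new_val;   // Bits written to GPIO_CLR0

    return (uint8_t)(pins & (uint8_t)~clr_mask);
}
```

For the sawtooth step from 0x7F to 0x80, the transient value is 0xFF, so the output momentarily jumps to full scale before settling at mid-scale.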

SMI register definitions

To use SMI, we first need to define the control registers, and the bit-values within them. The primary reference is bcm2835_smi.h from the Broadcom external memory driver, but I found this difficult to use in my code, so converted the definitions into C bitfields; this makes the code a bit less portable, but a lot simpler and easier to read.

Also, when learning about a new peripheral, it is helpful if the bitfield values can be printed on the console. This normally requires the tedious copying of register field names into string constants, but with a small amount of macro processing, this can be done with a single definition, for example the SMI CS register:

#define REG_DEF(name, fields) typedef union {struct {volatile uint32_t fields;}; volatile uint32_t value;} name

#define SMI_CS_FIELDS \
    enable:1, done:1, active:1, start:1, clear:1, write:1, _x1:2,\
    teen:1, intd:1, intt:1, intr:1, pvmode:1, seterr:1, pxldat:1, edreq:1,\
    _x2:8, _x3:1, aferr:1, txw:1, rxr:1, txd:1, rxd:1, txe:1, rxf:1  
REG_DEF(SMI_CS_REG, SMI_CS_FIELDS);

volatile SMI_CS_REG  *smi_cs;

smi_cs  = (SMI_CS_REG *) REG32(smi_regs, SMI_CS);

The last bit of code is needed so that smi_cs points to the register in virtual memory; if you don’t understand why, I suggest you read my post on RPi DMA programming here. Anyway, the upshot of all this code is that we can access the whole 32-bit value of the register as smi_cs->value, and also individual bits such as smi_cs->enable, smi_cs->done, etc.

To print out the bit values, we use macros to convert the register definition to a string, then have a simple C parser:

#define STRS(x)     STRS_(x) ","
#define STRS_(...)  #__VA_ARGS__

char *smi_cs_regstrs = STRS(SMI_CS_FIELDS);

// Display bit values in register
void disp_reg_fields(char *regstrs, char *name, uint32_t val)
{
    char *p=regstrs, *q, *r=regstrs;
    uint32_t nbits, v;
    
    printf("%s %08X", name, val);
    while ((q = strchr(p, ':')) != 0)
    {
        p = q + 1;
        nbits = 0;
        while (*p>='0' && *p<='9')
            nbits = nbits * 10 + *p++ - '0';
        v = val & ((1 << nbits) - 1);
        val >>= nbits;
        if (v && *r!='_')
            printf(" %.*s=%X", (int)(q-r), r, v);
        while (*p==',' || *p==' ')
            p = r = p + 1;
    }
    printf("\n");
}

Now we can display all the non-zero bit values using:

disp_reg_fields(smi_cs_regstrs, "CS", *REG32(smi_regs, SMI_CS));

..which produces a display like..

CS 54000025 enable=1 active=1 write=1 txw=1 txd=1 txe=1
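Since the layout of C bitfields depends on the compiler, it can be useful to cross-check a printed value by hand. The sketch below extracts fields with shifts and masks instead; reg_field() is a hypothetical helper, and the bit positions are simply counted from the SMI_CS_FIELDS list, assuming the first field occupies bit 0 (as it does with gcc on the RPi).

```c
#include <stdint.h>

// Hypothetical helper: extract 'width' bits starting at bit 'pos'
uint32_t reg_field(uint32_t val, int pos, int width)
{
    uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1);

    return (val >> pos) & mask;
}

// Bit positions counted from the SMI_CS_FIELDS list (first field = bit 0)
#define CS_ENABLE_POS   0
#define CS_DONE_POS     1
#define CS_ACTIVE_POS   2
#define CS_WRITE_POS    5
#define CS_TXW_POS      26
#define CS_TXD_POS      28
#define CS_TXE_POS      30
```

Applying this to the example value 0x54000025 gives 1 for each of the six fields shown in the console output, and 0 for a field such as done.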

SMI registers

The SMI registers are:

CS:  control and status
L:   data length (number of transfers)
A:   address and device number
D:   data FIFO
DMC: DMA control
DSR: device settings for read
DSW: device settings for write
DCS: direct control and status
DCA: direct control address and device number
DCD: direct control data

You can specify up to 4 unique timing settings for read & write, making 8 settings in total. The settings are specified by giving a 2-bit device number for each transaction; this selects 1 of the 4 descriptors for read or write. I’ve only used one pair of settings, and the ADC & DAC don’t have address lines, so the address & device register remains at zero.

Direct mode is a simple way of doing accesses using the appropriate timings, but without DMA; it has separate address, data and control registers.

Some notable fields in the control & status register are:

Enable: it is obvious that this bit must be set for SMI to work, but it is less obvious when that should be done. Initially, I assumed it was necessary to enable the interface before any other initialisation, but then it responded with the ‘settings error’ bit set. So now I do most of the configuration with the device disabled, then enable it before clearing the FIFOs and enabling DMA, otherwise the transfers go through immediately.

Start: set this bit to start the transfer; the SMI controller will perform the number of transfers in the length register, using the timing parameters specified in DSR (for read) or DSW (for write). If there is a backlog of data (FIFO is full) the transaction may stall.

Pxldat: when this ‘pixel data’ bit is set, the 8- or 16-bit data is packed into 32-bit words.

Pvmode: I have no idea what this ‘pixel valve’ mode should do; any information would be gratefully received.

Direct Mode

As the name implies, SMI Direct Mode allows you to perform a single I/O transfer without DMA. However, it is still necessary to specify the timing parameters of the transfer, specifically:

  • The clock period, that will be used for the following timing:
    • The setup time, that is used by the peripheral to decode the address value
    • The width of the strobe pulse, that triggers the transfer
    • The hold time, that keeps the signals stable after the transfer

To add to the complication, the SMI controller can drive 4 peripheral devices, each with its own individual read & write settings, so there are a total of 8 timing registers. I’m keeping this simple by always using the first register pair (for device zero) but it is worth remembering that you can define more than one set of timings, and quickly switch between them by setting the device number.

Likewise, I’m ignoring the address field since it is also redundant for my DAC; for safety, I clear all the SMI registers on startup, in case there are any residual unwanted values.

As it happens, this setup/strobe/hold timing is largely redundant for our simple resistor DAC (since it doesn’t latch the data) but we still need to specify something, for example if we want the overall cycle time to be 1 microsecond, this can be achieved with a clock period of 10 nanoseconds, setup 25, strobe 50, and hold 25, since (25 + 50 + 25) * 10 = 1000 nanoseconds. This is the code I use to set the timing:

// Width values
#define SMI_8_BITS  0
#define SMI_16_BITS 1
#define SMI_18_BITS 2
#define SMI_9_BITS  3

// Initialise SMI interface, given time step, and setup/strobe/hold counts
// Clock period is in nanoseconds: even numbers, 2 to 30
void init_smi(int width, int ns, int setup, int strobe, int hold)
{
    int divi = ns/2;

    smi_cs->value = smi_l->value = smi_a->value = 0;
    smi_dsr->value = smi_dsw->value = smi_dcs->value = smi_dca->value = 0;
    if (*REG32(clk_regs, CLK_SMI_DIV) != divi << 12)
    {
        *REG32(clk_regs, CLK_SMI_CTL) = CLK_PASSWD | (1 << 5);
        usleep(10);
        while (*REG32(clk_regs, CLK_SMI_CTL) & (1 << 7)) ;
        usleep(10);
        *REG32(clk_regs, CLK_SMI_DIV) = CLK_PASSWD | (divi << 12);
        usleep(10);
        *REG32(clk_regs, CLK_SMI_CTL) = CLK_PASSWD | 6 | (1 << 4);
        usleep(10);
        while ((*REG32(clk_regs, CLK_SMI_CTL) & (1 << 7)) == 0) ;
        usleep(100);
    }
    if (smi_cs->seterr)
        smi_cs->seterr = 1;
    smi_dsr->rsetup = smi_dsw->wsetup = setup; 
    smi_dsr->rstrobe = smi_dsw->wstrobe = strobe;
    smi_dsr->rhold = smi_dsw->whold = hold;
    smi_dsr->rwidth = smi_dsw->wwidth = width;
}

The clock-frequency-setting code is similar to that I used to set the PWM frequency for my DMA pacing; that peripheral did seem to be really sensitive to any glitches in the clock, so I’ve been a bit over-cautious in adding extra time-delays, which may not really be necessary.

The seterr flag is supposed to indicate an error if the settings have been changed while the SMI device is active; the easiest way to avoid this error is to do most of the settings while the device is disabled, then enable it just before starting; the flag is also cleared on startup, by writing a 1 to it.

Once the timing is set, the following code can be used to initiate a single direct-control write-cycle:

// Initialise resistor DAC
void dac_ladder_init(void)
{
    smi_cs->clear = 1;
    smi_cs->aferr = 1;
    smi_dcs->enable = 1;
}

// Output value to resistor DAC
void dac_ladder_write(int val)
{
    smi_dcs->done = 1;
    smi_dcs->write = 1;
    smi_dcd->value = val & 0xff;
    smi_dcs->start = 1;
}

The code clears the FIFO, in case there is any data left over from a previous transaction (which isn’t unusual, if you have been using DMA), and the FIFO error flag, then enables the device. The transfer is initiated by clearing the completion flag, setting write mode, loading the value into the Direct Mode data register, then starting the cycle.

The transfer then proceeds using the specified timing, and the completion flag is set when complete. If we run this code with usleep for timing, there is very little difference in the DAC output; it is still susceptible to other events, such as mouse movement, as shown in the oscilloscope trace below.

To gain maximum benefit from SMI, we have to use DMA.

SMI and DMA

When using SMI with DMA, the fundamental question is where the DMA requests will be coming from.

They can be triggered by an external signal, in ‘DMA passthrough’ mode. The data lines SD16 (for read) or SD17 (for write) can be used as triggers, reducing the maximum data width from 18 to 16 bits. It is important to note that they are level-sensitive signals (not edge-triggered) so if held low, the transfers will carry on at the maximum rate; see the oscilloscope trace below, where a 500 ns request is sufficient to trigger 2 transfers.

Oscilloscope trace of DMA passthrough (200 ns/div)

So DMA passthrough is designed for use with peripherals that assert the request when they have data to send, and negate it when the transfer has gone through. I have experimented with the PWM controller to generate narrow pulses, and it does seem possible to trigger single transfers this way, but more tests are needed to make sure this method is 100% reliable, so for the time being I won’t use it.

Instead, the requests will originate from the SMI controller itself; the transfer will proceed at the maximum speed defined by the setup, strobe & hold times, with DMA keeping the FIFOs topped up with data. This places a lower limit on the rate at which the transfers go through; the longest clock period is 30 ns, and the maximum setup, strobe & hold values are 63, 127 and 63, giving a slowest cycle time of (63 + 127 + 63) * 30 = 7590 nanoseconds, or about 7.6 microseconds.
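To check whether a wanted cycle time is achievable within these limits, a small search over the clock periods can help. This is only a sketch: smi_fit_timing() is a hypothetical helper, it assumes the clock period is an even number from 2 to 30 ns (as in init_smi), and it assumes each count must be at least 1, which is my guess rather than a documented fact.

```c
// Field limits quoted in the text
#define MAX_SETUP   63
#define MAX_STROBE  127
#define MAX_HOLD    63
#define MAX_CLK_NS  30
#define MAX_COUNTS  (MAX_SETUP + MAX_STROBE + MAX_HOLD)

// Hypothetical helper: find SMI timing values for a given cycle time.
// Returns 1 and fills in the values if an exact match is found, else 0
int smi_fit_timing(int cycle_ns, int *ns, int *setup, int *strobe, int *hold)
{
    int clk, total, rem;

    // Try the shortest clock period first, for the finest resolution
    for (clk = 2; clk <= MAX_CLK_NS; clk += 2)
    {
        if (cycle_ns % clk)
            continue;
        total = cycle_ns / clk;
        if (total < 4 || total > MAX_COUNTS)
            continue;
        // Strobe gets half the cycle (rounded up), setup & hold share the rest
        *ns = clk;
        *strobe = (total + 1) / 2;
        rem = total - *strobe;
        *setup = rem / 2;
        *hold = rem - *setup;
        return 1;
    }
    return 0;
}
```

For a 1000 ns cycle this finds a 4 ns clock with counts of 62, 125 and 63, which is a different but equivalent split to the 10 ns, 25/50/25 settings used earlier; a request for 8000 ns is rejected, as it is beyond the slowest possible cycle.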

The DMA Control Block is similar to those in my previous projects; it just needs a data source in uncached memory, data destination as the SMI FIFO, and length:

#define NCYCLES 4

// DMA values to resistor DAC
void dac_ladder_dma(MEM_MAP *mp, uint8_t *data, int len, int repeat)
{
    DMA_CB *cbs=mp->virt;
    uint8_t *txdata=(uint8_t *)(cbs+1);
    
    memcpy(txdata, data, len);
    enable_dma(DMA_CHAN_A);
    cbs[0].ti = DMA_DEST_DREQ | (DMA_SMI_DREQ << 16) | DMA_CB_SRCE_INC;
    cbs[0].tfr_len = NSAMPLES;  // One waveform cycle; the CB loops to repeat it
    cbs[0].srce_ad = MEM_BUS_ADDR(mp, txdata);
    cbs[0].dest_ad = REG_BUS_ADDR(smi_regs, SMI_D);
    cbs[0].next_cb = repeat ? MEM_BUS_ADDR(mp, &cbs[0]) : 0;
    start_dma(mp, DMA_CHAN_A, &cbs[0], 0);
}

smi_dsr->rwidth = SMI_8_BITS; 
smi_l->len = NSAMPLES * REPEATS;
smi_cs->pxldat = 1;
smi_dmc->dmaen = 1;
smi_cs->write = 1;
smi_cs->enable = 1;
smi_cs->clear = 1;
dac_ladder_dma(&vc_mem, sample_buff, sample_count, NCYCLES>1);
smi_cs->start = 1;

A convenient way of outputting a repeating waveform is to create one cycle in memory, and set the control block to that length. Then the SMI length is set to the total number of bytes to be sent, assuming the pixel mode flag ‘pxldat’ has been set; this instructs the SMI controller to unpack the 32-bit DMA & FIFO values into 4 sequential output bytes.

The following trace was generated by a 256-byte ramp, repeated 6 times, using a 1 microsecond cycle time.

Oscilloscope trace of DAC output (200 us/div)

The SMI interface can generate much faster waveforms, but unfortunately they aren’t rendered very well by the DAC as it uses 10K resistors; when these are combined with the oscilloscope probe input capacitance, the resulting rise time is around 500 nanoseconds. So for faster waveforms, you need a faster DAC.
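A minimal sketch of how one cycle of such a ramp might be prepared, and how long the repeated output lasts; make_ramp() and waveform_ns() are hypothetical helpers for illustration, not functions from the source files.

```c
#include <stdint.h>
#include <stddef.h>

// Fill a buffer with one cycle of a rising 8-bit ramp (sawtooth)
void make_ramp(uint8_t *buff, size_t nsamples)
{
    size_t i;

    for (i = 0; i < nsamples; i++)
        buff[i] = (uint8_t)((i * 256) / nsamples);
}

// Total output time in nanoseconds, for a repeated waveform
uint32_t waveform_ns(uint32_t nsamples, uint32_t repeats, uint32_t cycle_ns)
{
    return nsamples * repeats * cycle_ns;
}
```

With 256 samples, 6 repeats and a 1000 ns cycle time, the total is 1,536,000 ns, so the whole burst lasts roughly 1.5 milliseconds.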

Read cycle test

The last DAC test I’m going to do will seem a bit crazy: a read cycle. The settings are the same as the write-cycle, with the following changes:

smi_cs->write = 0;

cbs[0].ti = DMA_SRCE_DREQ | (DMA_SMI_DREQ << 16) | DMA_CB_DEST_INC;
cbs[0].srce_ad = REG_BUS_ADDR(smi_regs, SMI_D);
cbs[0].dest_ad = MEM_BUS_ADDR(mp, txdata);

The scope has been set to additionally show the SOE signal as the top trace:

The DAC output starts at 3.3V which was the final value of the previous output cycle. It then drops to 1.2V during the read cycles, as this is the value it floats to when the I/O lines aren’t being driven. At the end of the last read cycle, the output is driven back to 3.3V.

This is a very important result; as soon as the input cycles stop, SMI drives the bus. This is because memory chips don’t like a floating data bus; a halfway-on voltage can cause excessive power dissipation, and even damage the chip in extreme cases. So it is a sensible precaution that the data bus is always driven, though this is about to cause a major problem…

AD9226 ADC

Searching the Internet for a fast low-cost analogue-to-digital (ADC) module with a parallel interface, I found very few; the best one featured the 12-bit AD9226, with a maximum throughput of 65 megasamples per second. It requires a 5 volt supply, but has a 3.3V logic interface, so is compatible with the Raspberry Pi.

Having worked with the module for a few days, I’ve found it to be less than ideal, for various reasons that’ll be given later, but it is still useful to demonstrate high-speed parallel input with SMI.

Connecting to the RPi isn’t difficult, but as we’re dealing with high-speed signals, it is necessary to keep the wiring short, preferably under 50 mm (2 inches), especially the power, ground & clock signals.

One minor confusion is that the pin marked D0 is the most-significant bit, and D11 the least-significant; I wanted to leave the SPI0 pins free, so adopted the following connection scheme, which puts the data in the top 12 bits of a 16-bit SMI read cycle:

H/W pin   Function      AD9226
-------   -----------   ---------
31        GPIO06 SOE    CLK
16        GPIO23 SD15   D0 (MSB)
15        GPIO22 SD14   D1
40        GPIO21 SD13   D2
38        GPIO20 SD12   D3
35        GPIO19 SD11   D4
12        GPIO18 SD10   D5
11        GPIO17 SD9    D6
36        GPIO16 SD8    D7
10        GPIO15 SD7    D8
8         GPIO14 SD6    D9
33        GPIO13 SD5    D10
32        GPIO12 SD4    D11 (LSB)
2         5V            +5V
6         GND           GND

Direct mode

We’ll start by using Direct Mode to obtain a sample without DMA. The ADC is designed to work with a continuous clock signal, but ours is derived from the SMI Output Enable (SOE) line, so only changes state during data transfers.

The AD9226 data sheet describes how it stabilises the clock signal, and suggests it may require over 100 cycles when adapting to a new frequency. In practice, when starting up there seems to be a major data glitch after 8 cycles, but after that the conversions appear to have stabilised, so I allow for 10 cycles before taking a reading.

It is necessary to choose timing values for the SMI cycles; my default settings are 10 nanosecond time interval, with a setup of 25, strobe 50, hold 25, so the total cycle time is 10 * (25 + 50 + 25) = 1000 nanoseconds, or 1 megasample/sec.

for (i=0; i<ADC_NPINS; i++)
    gpio_mode(ADC_D0_PIN+i, GPIO_IN);
gpio_mode(SMI_SOE_PIN, GPIO_ALT1);

init_smi(SMI_16_BITS, 10, 25, 50, 25); // 1 MS/s

smi_start(10, 1);
usleep(20);
val = adc_gpio_val();
printf("%4u %1.3f\n", val, val_volts(val));

Voltage value

The ADC has an op-amp input circuit that can accommodate positive and negative voltages. Converting the ADC value to a voltage is a bit fraught; I determined the following values by experimentation with one module, but suspect they are subject to quite wide component tolerances, so won’t be the same for all modules.

#define ADC_ZERO        2080
#define ADC_SCALE       410.0

// Convert ADC value to voltage
float val_volts(int val)
{
    return((ADC_ZERO - val) / ADC_SCALE);
}

// Return ADC value, using GPIO inputs
int adc_gpio_val(void)
{
    int v = *REG32(gpio_regs, GPIO_LEV0);

    return((v>>ADC_D0_PIN) & ((1 << ADC_NPINS)-1));
}
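Since these constants will differ between modules, one way to derive them is a two-point calibration against known input voltages. This is only a sketch: adc_calibrate() and ADC_CAL are hypothetical names, and the readings in the usage note are made-up illustrative values, chosen to reproduce the constants given here.

```c
// Hypothetical two-point calibration for val_volts(), which uses
//   volts = (zero - val) / scale
typedef struct {
    double zero;    // ADC reading at 0 volts
    double scale;   // ADC counts per volt
} ADC_CAL;

// The two voltages must be different, to avoid a division by zero
ADC_CAL adc_calibrate(int val1, double volts1, int val2, double volts2)
{
    ADC_CAL cal;

    // Readings fall as the voltage rises, hence (val1 - val2)
    cal.scale = (val1 - val2) / (volts2 - volts1);
    cal.zero = val1 + volts1 * cal.scale;
    return cal;
}
```

For example, readings of 2490 at -1 volt and 1670 at +1 volt would give a scale of 410 and a zero value of 2080.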

It is important to note that the module has a 50-ohm input, so imposes a very heavy loading on any circuit it is monitoring. It can’t cope with significant voltages for any period of time; for example, if you apply 5 volts, the input resistor will dissipate half a watt, heat up rapidly, and probably burn out.

So, although the ADC is excellent for fast data acquisition, the module isn’t really suitable for general purpose measurement, and would benefit from a redesign with a high-impedance input.

Avoiding bus conflicts

The module doesn’t have a chip-select or chip-enable input, so the data is always being output; the 28-pin version of the AD9226 doesn’t have the facility for disabling its output drivers. In the above code I avoided the possibility of bus conflicts by doing a GPIO register read, but for high speeds we have to use SMI read cycles. This is potentially a major problem; when the read cycles are complete, the SMI controller and the ADC will both try to drive the data bus at the same time, causing significant current draw, only limited by the 100 ohm resistors on the module: they are insufficient to keep the current below the maximum values (16 mA per pin, 50 mA total for all I/O) in the Broadcom data sheet.

I’ve experimented with various software solutions, basically using a DMA Control Block to set the ADC pins to SMI mode (ALT1), then the second CB for the data transfer, then a third to set the pins back to GPIO inputs. The problem with this approach is that at the higher transfer rates the DMA controller is only just keeping up with the incoming data, and there is a sizeable backlog that has to be cleared before the DMA completes. So there is a significant delay before the SMI pins are set back to inputs, and in that time, there is a bus conflict.

For this reason (and to avoid any concerns about hardware damage when debugging new code) I added a resistor in series with each data line, to reduce the current flow when a bus conflict occurs. The value is a compromise; the resistance needs to be high enough to block excessive current, but not so high that it will slow down the I/O transitions too much, when combined with the stray capacitance of the GPIO inputs.

I chose 330 ohms, which combines with the 100 ohms already on the module, to produce a maximum current of 7.7 mA per line. This is well within the per-pin limit of the Broadcom device, but if all the lines are in conflict, the total will actually exceed the maximum chip I/O current, so it is inadvisable to leave the hardware in this state for a significant period of time.
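The arithmetic behind these figures can be sketched as follows, using the worst-case assumption that the full 3.3 volt supply appears across the series resistance; conflict_ma() is a hypothetical helper.

```c
// Current in milliamps through one data line during a bus conflict,
// assuming the full supply voltage appears across the series resistance
double conflict_ma(double volts, double module_ohms, double series_ohms)
{
    return 1000.0 * volts / (module_ohms + series_ohms);
}

// Total current in milliamps if 'nlines' data lines conflict at once
double conflict_total_ma(double line_ma, int nlines)
{
    return line_ma * nlines;
}
```

With 100 and 330 ohms this gives about 7.7 mA per line, and roughly 92 mA if all 12 data lines conflict at once; with the module’s 100 ohm resistors alone, the figure is 33 mA per line.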

ADC code

If you’ve read my previous blogs on fast ADC data capture, the DMA code will seem quite familiar, with control blocks to set the GPIO pins to SMI mode, capture the data, and restore the pins:

// Get GPIO mode value into 32-bit word
void mode_word(uint32_t *wp, int n, uint32_t mode)
{
    uint32_t mask = 7 << (n * 3);
    *wp = (*wp & ~mask) | (mode << (n * 3));
}

// Start DMA for SMI ADC, return Rx data buffer
uint32_t *adc_dma_start(MEM_MAP *mp, int nsamp)
{
    DMA_CB *cbs=mp->virt;
    uint32_t *data=(uint32_t *)(cbs+4), *pindata=data+8, *modes=data+0x10;
    uint32_t *modep1=data+0x18, *modep2=modep1+1, *rxdata=data+0x20, i;

    // Get current mode register values
    for (i=0; i<3; i++)
        modes[i] = modes[i+3] = *REG32(gpio_regs, GPIO_MODE0 + i*4);
    // Get mode values with ADC pins set to SMI
    for (i=ADC_D0_PIN; i<ADC_D0_PIN+ADC_NPINS; i++)
        mode_word(&modes[i/10], i%10, GPIO_ALT1);
    // Copy mode values into 32-bit words
    *modep1 = modes[1];
    *modep2 = modes[2];
    *pindata = 1 << TEST_PIN;
    enable_dma(DMA_CHAN_A);
    // Control blocks 0 and 1: enable SMI I/P pins
    cbs[0].ti = DMA_SRCE_DREQ | (DMA_SMI_DREQ << 16) | DMA_WAIT_RESP;
    cbs[0].tfr_len = 4;
    cbs[0].srce_ad = MEM_BUS_ADDR(mp, modep1);
    cbs[0].dest_ad = REG_BUS_ADDR(gpio_regs, GPIO_MODE0+4);
    cbs[0].next_cb = MEM_BUS_ADDR(mp, &cbs[1]);
    cbs[1].tfr_len = 4;
    cbs[1].srce_ad = MEM_BUS_ADDR(mp, modep2);
    cbs[1].dest_ad = REG_BUS_ADDR(gpio_regs, GPIO_MODE0+8);
    cbs[1].next_cb = MEM_BUS_ADDR(mp, &cbs[2]);
    // Control block 2: read data
    cbs[2].ti = DMA_SRCE_DREQ | (DMA_SMI_DREQ << 16) | DMA_CB_DEST_INC;
    cbs[2].tfr_len = (nsamp + PRE_SAMP) * SAMPLE_SIZE;
    cbs[2].srce_ad = REG_BUS_ADDR(smi_regs, SMI_D);
    cbs[2].dest_ad = MEM_BUS_ADDR(mp, rxdata);
    cbs[2].next_cb = MEM_BUS_ADDR(mp, &cbs[3]);
    // Control block 3: disable SMI I/P pins
    cbs[3].ti = DMA_CB_SRCE_INC | DMA_CB_DEST_INC;
    cbs[3].tfr_len = 3 * 4;
    cbs[3].srce_ad = MEM_BUS_ADDR(mp, &modes[3]);
    cbs[3].dest_ad = REG_BUS_ADDR(gpio_regs, GPIO_MODE0);
    start_dma(mp, DMA_CHAN_A, &cbs[0], 0);
    return(rxdata);
}

When DMA is complete, we have a data buffer in uncached memory, containing left-justified 16-bit samples packed into 32-bit words; they are shifted and copied into the sample buffer. The first few samples are discarded as they are erratic; the ADC needs several clock cycles before its internal logic is stable.

// ADC DMA is complete, get data
int adc_dma_end(void *buff, uint16_t *data, int nsamp)
{
    uint16_t *bp = (uint16_t *)buff;
    int i;
    
    for (i=0; i<nsamp+PRE_SAMP; i++)
    {
        if (i >= PRE_SAMP)
            *data++ = bp[i] >> 4;
    }
    return(nsamp);
}

ADC speed tests

The important question is: how fast can we run the SMI interface? Here are the settings for some tests:

// RPi v0-3
#define SMI_NUM_BITS    SMI_16_BITS
#define SMI_TIMING      SMI_TIMING_25M
#define SMI_TIMING_1M   10, 25, 50, 25  // 1 MS/s
#define SMI_TIMING_20M   2,  6, 13,  6  // 20 MS/s
#define SMI_TIMING_25M   2,  5, 10,  5  // 25 MS/s
#define SMI_TIMING_31M   2,  4,  8,  4  // 31.25 MS/s
#define SMI_TIMING_50M   2,  3,  5,  2  // 50 MS/s

init_smi(SMI_16_BITS,  SMI_TIMING);

The SMI clock is 1 GHz; the first number is the clock divisor, followed by the setup, strobe & hold counts, so 1000 / (10 * (25+50+25)) = 1 MS/s. Where possible, I’ve tried to keep the waveform symmetrical by making setup + hold = strobe, but that isn’t essential; the ADC can handle asymmetric clock signals.

RPi v3

25 MS/s capture of a video test waveform

Running on a Raspberry Pi 3B v1.2, the fastest continuous rate that produces consistent results is 25 megasamples per second. The following trace shows a data line and the SOE (ADC clock) line, with a 40-byte transfer at 25 MS/s:

Scope trace 500 ns/div, 2 volts/div

The data line is being measured on the ADC module connector, so when there is a bus conflict, the 100 ohm resistor on the module combines with the 330 ohms on the data line to form a potential divider, that makes the conflict easy to see. It is inevitable that there will be a brief conflict as the read cycles end, and the SMI controller takes control of the bus, but it only lasts 900 nanoseconds, which shouldn’t be an issue, given the resistor values I’m using.

However, increasing the rate to 31.25 MS/s does cause a problem:

Scope trace 5 us/div, 2 volts/div

The system seems able to handle this rate fine for about 13 microseconds (400 samples), then it all goes wrong; there is a gap in the transfers, followed by continuous bus conflicts. Zooming in to that area, the SMI controller seems to transition from continuous evenly-paced cycles to bursts of 8, with a continuous conflict:

Scope trace 200 ns/div, 2 volts/div

In the absence of any documentation on the SMI controller, it is difficult to speculate on the reasons for this, but it does emphasise the need for caution when working with high-speed transfers.

Since 16-bit transfers work at 25 MS/s, it should be possible to run 8-bit transfers at 50 MS/s. This can be tested using the following settings:

#define SMI_NUM_BITS    SMI_8_BITS
#define SMI_TIMING      SMI_TIMING_50M
#define SAMPLE_SIZE     1

With the ADC connections I’m using, this doesn’t produce useful data (just the 4 least-significant bits from the ADC, on SD4 to SD7), but the waveforms look fine on an oscilloscope, so there doesn’t seem to be a problem running 50 megabyte-per-second SMI transfers on an RPi v3.

Pi ZeroW

Switching to a Pi ZeroW, the results are remarkably good; here is a 500 kHz triangle wave, captured at 41.7 megasamples per second:

Capture of 500 kHz triangle wave

This does seem to be the top speed for a Pi ZeroW, as increasing the transfer rate to 50 MS/s causes some errors in the data. However, being able to transfer over 83 megabytes per second is a remarkably good result for this low-cost computer.

The question is whether this transfer rate is completely reliable; for example, is it disrupted by network activity? The easiest way to generate a lot of network traffic is using ‘flood pings’ from a Linux PC to the RPi; I did a few data captures with pings running, and they didn’t seem to have any effect on the data, but more testing is needed.

RPi v4

The first test of an RPi v4 at 1 MS/s actually produced 1.5 MS/s, so the base SMI clock for RPi v4 must be 1.5 GHz. This means a new set of speed definitions:

// RPi v4
#define SMI_TIMING_1M   10, 38, 74, 38  // 1 MS/s
#define SMI_TIMING_10M   6,  6, 13,  6  // 10 MS/s
#define SMI_TIMING_20M   4,  5,  9,  5  // 19.74 MS/s
#define SMI_TIMING_25M   4,  3,  8,  4  // 25 MS/s
#define SMI_TIMING_31M   4,  3,  6,  3  // 31.25 MS/s

As before, the first number is the clock divisor, followed by the setup, strobe & hold counts, so 1500 / (10 * (38+74+38)) = 1 MS/s.
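When trying out new settings, it is handy to check the resulting rate before running the hardware. This is a minimal sketch of the calculation; smi_rate_msps() is a hypothetical helper, and the base clock values are the ones deduced in this post, not from any official documentation.

```c
// Sample rate in megasamples per second, from the SMI base clock in MHz
// (1000 for RPi v0-3, 1500 for RPi v4, as deduced in this post)
// and the four values in an SMI_TIMING definition
double smi_rate_msps(double base_mhz, int div, int setup, int strobe, int hold)
{
    return base_mhz / (div * (setup + strobe + hold));
}
```

For example, the RPi v4 values 4, 5, 9, 5 give 1500 / (4 * 19) = 19.74 MS/s, and the RPi v3 values 2, 5, 10, 5 give 1000 / (2 * 20) = 25 MS/s.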

Unfortunately the maximum throughput with the current code is quite poor; the following trace is for 500 samples at 25 MS/s, and you can see the bus contention towards the end, similar to that I experienced on the RPi v3.

Scope trace 5 usec/div, 2 volts/div

The upper trace is the most significant ADC bit (measured at the module pin), and the analogue input is a 500 kHz sine wave, hence the regular bit transitions.

The key question is: why does the throughput get worse with a faster processor? I’d guess that this is a memory bandwidth issue; with a single core, the DMA controller can effectively monopolise the memory, always getting the data through. On a multi-core processor, it has to cooperate with all the cores that are active during the data capture.

Clearly more work is needed to understand this phenomenon, for example by manipulating the cores and process priorities; alternatively, for maximum performance, just use a Pi Zero!

Running the code

The source code is on Github here. The main files for DAC and ADC are rpi_smi_dac_test.c and rpi_smi_adc_test.c; the other files needed are rpi_dma_utils.c, rpi_dma_utils.h and rpi_smi_defs.h.

It is necessary to edit the top of rpi_dma_utils.h depending on which RPi hardware you are using:

// Location of peripheral registers in physical memory
#define PHYS_REG_BASE   PI_23_REG_BASE
#define PI_01_REG_BASE  0x20000000  // Pi Zero or 1
#define PI_23_REG_BASE  0x3F000000  // Pi 2 or 3
#define PI_4_REG_BASE   0xFE000000  // Pi 4

There are other settings at the top of the main files, that can be changed as required. The code can then be compiled with gcc, optionally with the -O2 option to optimise the code (which isn’t really necessary), and the -pedantic option if you want to check for extra warnings:

gcc -Wall -pedantic -o rpi_smi_adc rpi_smi_adc_test.c rpi_dma_utils.c

The code is run using sudo, optionally with the CSV output piped to a file:

sudo ./rpi_smi_adc
..or..
sudo ./rpi_smi_adc > test6.csv

The CSV file can be imported into a spreadsheet, or plotted using Gnuplot from the RPi command line, e.g.

 gnuplot -e "set term png size 420,240 font 'sans,8'; \
  set title '41.7 Msample/s'; set grid; set key noautotitle; \
  set output 'test6.png'; plot 'test6.csv' every ::10 with lines"

You may have read elsewhere that it is necessary to enable SMI in /boot/config.txt:

dtoverlay=smi    # Not needed!

This sets the GPIO mode of the SMI pins on startup; it isn’t necessary for my code, which does its own GPIO configuration, with the added advantage that the unused pins are unchanged, so are free for use by other I/O functions.

If you want to see an example of SMI being used as a multi-channel pulse generator, see my 16 channel NeoPixel smart LED example here.

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.