Fast data capture with the Raspberry Pi

Video signal captured at 2.6 megasamples per second

Adding an Analog-to-Digital Converter (ADC) to the Raspberry Pi isn’t difficult, and there is ample support for reading a single voltage value, but what about getting a block of samples, in order to generate an oscilloscope-like trace, as shown above?

By careful manipulation of the Linux environment, it is possible to read the voltage samples in at a decent rate, but the big problem is the timing of the readings; the CPU is frequently distracted by other high-priority tasks, so there is a lot of jitter in the timing, which makes the analysis & display of the waveforms a lot more difficult – even a fast board such as the RPi 4 can suffer from this problem.

We need a way of grabbing the data samples at regular intervals without any CPU intervention; that means using Direct Memory Access, which operates completely independently of the processor, so even the cheapest Pi Zero board delivers rock-solid sample timing.

Direct Memory Access

Direct Memory Access (DMA) can be set up to transfer data between memory and peripherals, without any CPU intervention. It is a very powerful technique, and as a result, can easily cause havoc if programmed incorrectly. I strongly recommend you read my previous post on the subject, which includes some simple demonstrations of DMA in action, but here is a simplified summary:

The CPU has three memory spaces: virtual, bus and physical. DMA accesses use bus memory addresses, but a user program employs virtual addresses, so it is necessary to translate between the two.
When writing to memory, the CPU is actually writing to an on-chip cache, and sometime later the data is written to main memory. If the DMA controller tries to fetch the data before the cache has been emptied, it will get incorrect values. So it is necessary for all DMA data to be in uncached memory.
If compiler optimisation is enabled, it can bypass some memory read operations, giving a false picture of what is actually in memory. The qualifier ‘volatile’ might be needed to make sure that variables changed by DMA are correctly read by the processor.
The DMA controller receives its next instruction via a Control Block (CB) which specifies the source & destination addresses, and the number of bytes to be transferred. Control Blocks can be chained, so as to create a sequence of actions.
DMA transactions are normally triggered by a data request from a peripheral, otherwise they run through at full speed without stopping.
If the DMA controller receives incorrect data, it can overwrite any area of memory, or any peripheral, without warning. This can cause unusual malfunctions, system crashes or file corruption, so care is needed.

For this project, I’ve abstracted the DMA and I/O functions into the new files rpi_dma_utils.c and rpi_dma_utils.h. The handling of the memory spaces has also been improved, with a single structure for each peripheral or memory area:

// Structure for mapped peripheral or memory
typedef struct {
    int fd,         // File descriptor
        h,          // Memory handle
        size;       // Memory size
    void *bus,      // Bus address
        *virt,      // Virtual address
        *phys;      // Physical address
} MEM_MAP;

To access a peripheral, the structure is initialised with the physical address:

#define SPI0_BASE       (PHYS_REG_BASE + 0x204000)

// Use mmap to obtain virtual address, given physical
void *map_periph(MEM_MAP *mp, void *phys, int size)
{
    mp->phys = phys;
    mp->size = PAGE_ROUNDUP(size);
    mp->bus = phys - PHYS_REG_BASE + BUS_REG_BASE;
    mp->virt = map_segment(phys, mp->size);
    return(mp->virt);
}

MEM_MAP spi_regs;
map_periph(&spi_regs, (void *)SPI0_BASE, PAGE_SIZE);

Then a macro is used to access a specific register:

#define REG32(m, x) ((volatile uint32_t *)((uint32_t)(m.virt)+(uint32_t)(x)))
#define SPI_DLEN        0x0c

*REG32(spi_regs, SPI_DLEN) = 0;

The advantage of this approach is that it is easy to set or clear individual bits within a register, e.g.

*REG32(spi_regs, SPI_CS) |= 1;

Note that the REG32 macro uses the ‘volatile’ qualifier to ensure that the register access will still be executed if compiler optimisation is enabled.

Analog-to-Digital Converters (ADCs)

There are 3 ways an ADC can be linked to the Raspberry Pi (RPi):

Inter-Integrated Circuit (I2C) serial bus
Serial Peripheral Interface (SPI) serial bus
Parallel bus

The I2C interface is the simplest from a hardware point of view, since it only has 2 connections: clock and data. However, these devices tend to be a bit slow, and the RPi I2C interface doesn’t support DMA, so we won’t be using this method.

The parallel interface is the fastest but also the most complicated, as it has one wire for each data bit, plus one or more clock lines: the best way to drive it is using the RPi Secondary Memory Interface (SMI), read more here.

This leaves the SPI interface, which is a good compromise between complexity and speed; it has only 4 connections (clock, data out, data in and chip select) but is capable of achieving over 1 megasample per second.

In this post we’ll be using 2 SPI ADC chips; the Microchip MCP3008 which is specified as 100 Ksamples/sec maximum (though I’ve only achieved 80 KS/s, for reasons I’ll discuss later), and the Texas Instruments ADS7884 which can theoretically achieve 3 Msample/s; I’ve run that at 2.6 MS/s. Both chips are 10-bit, so return a value of 0 to 1023, when measuring 0 to 3.3 volts.

MCP3008

The RasPiO Analog Zero board ( https://rasp.io/analogzero/ ) has the Microchip MCP3008 ADC on it, and very little else.

It is in the same form-factor as the RPi Zero, but I used a version 3 CPU board for most of my testing. There are 8 analogue input channels, but only a single ADC, that has to be switched to the appropriate channel prior to conversion. The voltage reference is taken from the RPi 3.3 volt rail; if you need greater stability & accuracy, a standalone voltage reference can be used instead.

SPI interface

The board is tied to the SPI0 interface on the RPi, using 4 connections

GPIO8 CE0: SPI 0 Chip Enable 0
GPIO11 SCLK: Clock signal
GPIO10 MOSI: data output to ADC
GPIO9 MISO: data input from ADC

The Chip Enable (or Chip Select as it is often known) is used to frame the overall transfer; it is normally high, then is set low to start the analog-to-digital conversion, and is held low while the data is transferred to & from the device.

Getting a single sample from the ADC is really easy in Python:

from gpiozero import MCP3008
adc = MCP3008(channel=0)
print(adc.value * 3.3)
adc.close()

We’ll be diving a bit deeper into the way the SPI interface works, so here is the same operation in Python, but direct-driving the SPI interface:

import spidev
spi = spidev.SpiDev()
spi.open(0, 0)
spi.max_speed_hz = 500000
spi.mode = 0
msg = [0x01,0x80,0x00]
rsp = spi.xfer2(msg)
val = ((rsp[1]*256 + rsp[2]) & 0x3ff) * 3.3 / 1.024
print(val)

The most useful diagnostic method is to view the signals on an oscilloscope, so here are the corresponding traces; the scale is 20 microseconds per division (per square) horizontally, and 5 volts per division vertically:

You can see the Chip Select frames the transaction, but remains active (low) for about 120 microseconds after the transfer is finished; that is something we’ll need to improve to get better speeds. The clock is 500 kHz as specified in the code, but this can be up to 2 MHz. The MOSI (CPU output) data is as specified in the data sheet, a value of 01 80 hex has a ‘1’ start bit, followed by another ‘1’ to select single-ended mode (not differential). MISO (CPU input) data reflects the voltage value measured by the ADC. The data is always sent most-significant-bit first, and the first return byte is ignored (since the ADC hadn’t started the conversion), so the second byte has to be multiplied by 256, and added to the third byte.

You’ll see there is a downward curve at the end of the MISO trace; this shows that the line isn’t being driven high or low, and is floating. It is worth watching out for signals like this, since they can cause problems as they drift between 1 and 0; in this case the transition is harmless as the transfer cycle has already finished.

MCP3008 software

Here is the C equivalent of the Python code:

// Set / clear SPI chip select
void spi_cs(int set)
{
    uint32_t csval = *REG32(spi_regs, SPI_CS);

    *REG32(spi_regs, SPI_CS) = set ? csval | 0x80 : csval & ~0x80;
}

// Transfer SPI bytes
void spi_xfer(uint8_t *txd, uint8_t *rxd, int len)
{
    while (len--)
    {
        *REG8(spi_regs, SPI_FIFO) = *txd++;
        while((*REG32(spi_regs, SPI_CS) & (1<<17)) == 0) ;
        *rxd++ = *REG32(spi_regs, SPI_FIFO);
    }
}

// Fetch single 10-bit sample from ADC
int adc_get_sample(int chan)
{

    uint8_t txdata[3]={0x01,0x80|(chan<<4),0}, rxdata[3];
    
    spi_cs(1);
    spi_xfer(txdata, rxdata, sizeof(txdata));
    spi_cs(0);
    return(((rxdata[1]<<8) | rxdata[2]) & 0x3ff);
}

This takes 3 bytes to transfer 10 data bits, which is a bit wasteful. It is worth reading the MCP3008 data sheet, which explains that the leading ‘1’ of the outgoing data is used to trigger the conversion, so the whole cycle can be compressed into 16 bits, if you ignore the last data bit:

// Fetch 9-bit sample from ADC
int adc_get_sample(int chan)
{
    uint8_t txdata[2]={0xc0|(chan<<3),0}, rxdata[2];
    
    spi_cs(1);
    spi_xfer(txdata, rxdata, sizeof(txdata));
    spi_cs(0);
    return((((int)rxdata[0] << 9) | ((int)rxdata[1] << 1)) & 0x3ff);
}

You’ll see that the transmit bytes 0x01,0x80 have been shifted left by 7 bits to make one byte 0xc0, and this results in the response data being shifted left by the same amount.

A single transfer can easily be done using DMA, since the SPI controller has an auto-chip-select mode that handles the CE signal for us. We just need to launch 2 DMA instances, the first to read the data from the ADC interface, and the second to write the trigger data to the ADC. This may appear to be the wrong way round (wouldn’t it be more logical to do the write-cycle first?), but the reason is that the read-cycle will stall, waiting for incoming data, until that is provided by the write-cycle:

// Fetch single sample from MCP3008 ADC using DMA
int adc_dma_sample_mcp3008(MEM_MAP *mp, int chan)
{
    DMA_CB *cbs=mp->virt;
    uint32_t dlen, *txd=(uint32_t *)(cbs+2);
    uint8_t *rxdata=(uint8_t *)(txd+0x10);

    enable_dma(DMA_CHAN_A);
    enable_dma(DMA_CHAN_B);
    dlen = 4;
    txd[0] = (dlen << 16) | SPI_TFR_ACT;
    mcp3008_tx_data(&txd[1], chan);
    cbs[0].ti = DMA_SRCE_DREQ | (DMA_SPI_RX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_DEST_INC;
    cbs[0].tfr_len = dlen;
    cbs[0].srce_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    cbs[0].dest_ad = MEM_BUS_ADDR(mp, rxdata);
    cbs[1].ti = DMA_DEST_DREQ | (DMA_SPI_TX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_SRCE_INC;
    cbs[1].tfr_len = dlen + 4;
    cbs[1].srce_ad = MEM_BUS_ADDR(mp, txd);
    cbs[1].dest_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    *REG32(spi_regs, SPI_DC) = (8 << 24) | (4 << 16) | (8 << 8) | 4;
    *REG32(spi_regs, SPI_CS) = SPI_TFR_ACT | SPI_DMA_EN | SPI_AUTO_CS | SPI_FIFO_CLR | SPI_CSVAL;
    *REG32(spi_regs, SPI_DLEN) = 0;
    start_dma(mp, DMA_CHAN_A, &cbs[0], 0);
    start_dma(mp, DMA_CHAN_B, &cbs[1], 0);
    dma_wait(DMA_CHAN_A);
    return(mcp3008_rx_value(rxdata));
}

// Return Tx data for MCP3008
int mcp3008_tx_data(void *buff, int chan)
{
    uint8_t txd[3]={0x01, 0x80|(chan<<4), 0x00};
    memcpy(buff, txd, sizeof(txd));
    return(sizeof(txd));
}

// Return value from ADC Rx data
int mcp3008_rx_value(void *buff)
{
    uint8_t *rxd=buff;
    return(((int)(rxd[1]&3)<<8) | rxd[2]);
}

When testing new DMA code, it is not unusual for there to be an error such that the DMA cycle never completes, so the dma_wait function has a timeout:

// Wait until DMA is complete
void dma_wait(int chan)
{
    int n = 1000;

    do {
        usleep(100);
    } while (dma_transfer_len(chan) && --n);
    if (n == 0)
        printf("DMA transfer timeout\n");
}

So we have code to do a single transfer, can’t we use the same idea to grab multiple samples in one transfer? The problem is the CS line; this has to be toggled for each value, and the auto-chip-select mode only works for a single transfer; despite a lot of experimentation, I couldn’t find any way of getting the SPI controller to pulse CS low for each ADC cycle in a multi-cycle capture.

The solution to this problem comes in treating the transmit and receive DMA operations very differently. The receive operation simply keeps copying the 32-bit data from the SPI FIFO into memory, until all the required data has been captured. In contrast, the transmit side is repeatedly sending the same trigger message to the ADC (0x01, 0x80, 0x00 in the above example). Since the same message is repeating, we could set up a small sequence of DMA Control Blocks (CBs):

CB1: set chip select high
CB2: set chip select low
CB3: write next 32-bit word to the FIFO

The controller is normally executing CB3, waiting for the next SPI data request. When this arrives, it executes CB1 then CB2, briefly setting the chip select high & low to start a new data capture. It then stops in CB3 again, waiting for the next data request. Using this method, the typical width of the CS high pulse is 330 nanoseconds, which is more than adequate to trigger the ADC.

The bulk of code is the same as the previous example, here are the control block definitions:

    // Control block 0: read data from SPI FIFO
    cbs[0].ti = DMA_SRCE_DREQ | (DMA_SPI_RX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_DEST_INC;
    cbs[0].tfr_len = dlen;
    cbs[0].srce_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    cbs[0].dest_ad = MEM_BUS_ADDR(mp, rxdata);
    // Control block 1: CS high
    cbs[1].srce_ad = cbs[2].srce_ad = MEM_BUS_ADDR(mp, pindata);
    cbs[1].dest_ad = REG_BUS_ADDR(gpio_regs, GPIO_SET0);
    cbs[1].tfr_len = cbs[2].tfr_len = cbs[3].tfr_len = 4;
    cbs[1].ti = cbs[2].ti = DMA_DEST_DREQ | (DMA_SPI_TX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_SRCE_INC;
    // Control block 2: CS low
    cbs[2].dest_ad = REG_BUS_ADDR(gpio_regs, GPIO_CLR0);
    // Control block 3: write data to Tx FIFO
    cbs[3].ti = DMA_DEST_DREQ | (DMA_SPI_TX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_SRCE_INC;
    cbs[3].srce_ad = MEM_BUS_ADDR(mp, &txd[1]);
    cbs[3].dest_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    // Link CB1, CB2 and CB3 in endless loop
    cbs[1].next_cb = MEM_BUS_ADDR(mp, &cbs[2]);
    cbs[2].next_cb = MEM_BUS_ADDR(mp, &cbs[3]);
    cbs[3].next_cb = MEM_BUS_ADDR(mp, &cbs[1]);

A disadvantage of this approach is that we’re transferring 32 bits in order to get 10 bits of ADC data, which is quite wasteful; if the DMA controller could be persuaded to transfer 16 bits at a time, we’d be able to double the speed, but all my attempts to do this have failed.

However, on the positive side, it does produce an accurately-timed data capture with no CPU intervention:

Raspberry Pi MCP3008 ADC input using DMA

The oscilloscope trace just shows 4 transfers, but the technique works just as well with larger data blocks; here is a trace of 500 samples at 80 Ksample/s

To be honest, the ADC was overclocked to achieve this sample rate; the data sheet implies that the maximum SPI clock should be around 2 MHz with a 3.3V supply voltage, and the actual value I’ve used is 2.55 MHz, so don’t be surprised if this doesn’t work reliably in a different setup.

ADS7884

In the title of this blog post I promised ‘fast’ data capture, and I don’t think 80 Ksample/s really qualifies as fast; the generally accepted definition is at least 10 Msample/s, but that would require an SPI clock over 100MHz, which is quite unrealistic.

The ADS7884 is a fast single-channel SPI ADC; it can acquire 3 Msample/s, with an SPI clock of 48 MHz, but you do have to be quite careful when dealing with signals this fast; a small amount of stray capacitance or inductance can easily distort the signals so that the transfers are unreliable. All connections must be kept short, especially the clock, power and ground, which ideally should be less than 50 mm (2 inches) long.

The ADC chip is in a very small 6-pin package (0.95 mm pin spacing) so I soldered it to a Dual-In-Line (DIL) adaptor, with 1 uF and 10 nF decoupling capacitors as close to the power & ground pins as possible. This arrangement is then mounted on a solder prototyping board (not stripboard) with very short wires soldered to the RPi I/O connector.

You may think that the ADC should still work correctly in a poor layout, if the clock frequency is reduced. This may not be true as, generally speaking, the faster the device, the more sensitive it is to the quality of the external signals. If they aren’t clean enough, the ADC will still malfunction, no matter how slow the clock is.

The device pins are:

1  Supply (3.3V)
2  Ground
3  VIN (voltage to be measured)
4  SCLK (SPI clock)
5  SDO (SPI data output)
6  CS (chip select, active low)

You’ll see that there is no data input line; this is because, unlike the MCP3008, there is nothing to control; just set CS low, toggle the clock 16 times, then set CS high, and you’ll have the data.

This can be demonstrated by a Python program:

import spidev
bus, device = 0, 0
spi = spidev.SpiDev()
spi.open(bus, device)
spi.max_speed_hz = 1000000
spi.mode = 0
msg = [0x00,0x00]
spi.xfer2(msg)
res = spi.xfer2(msg)
val = (res[0] * 256 + res[1]) >> 6
print("%1.3f" % val * 3.3 / 1024.0)

You’ll see that I’ve discarded the first sample from the ADC; that is because it always returns the data from the previous sample, i.e. it outputs the last sample while obtaining the next.

When creating the DMA software, it is tempting to use the same technique I employed on the MCP3008, but I want really fast sampling, and using a 32-bit word to carry 10 bits of data seems much too wasteful.

Since the SPI transmit line is unused (as the ADS7884 doesn’t have a data input) we can use it for another purpose, so why not use it to drive the chip select line? This means we can drive CS high or low whenever we want, just by setting the transmit data.

So the connections between the ADC and RPi are:

Pin 1: 3.3V supply 
Pin 2: ground 
Pin 3: voltage to be measured
Pin 4: SPI0 clock, GPIO11
Pin 5: SPI0 MISO,  GPIO9
Pin 6: SPI0 MOSI,  GPIO10 (ADC chip select)

If you are driving other SPI devices, the absence of a proper chip select could be a major problem. The solution would be to invert the transmitted data, add a NAND gate between the MOSI line and the ADC chip select, and drive the other NAND input with a spare I/O line, to enable (when high) or disable (when low) the ADC transfers. You’d just need to keep an eye on the additional delay in the CS line, which could alter the phase shift between the transmitted and received data.

ADS7884 software

Driving the chip-select line from the SPI data output makes the software quite a bit simpler, just repeat the same 16-bit pattern on the transmit side, and save the received data in a buffer. This is the code:

// Fetch samples from ADS7884 ADC using DMA
int adc_dma_samples_ads7884(MEM_MAP *mp, int chan, uint16_t *buff, int nsamp)
{
    DMA_CB *cbs=mp->virt;
    uint32_t i, dlen, shift, *txd=(uint32_t *)(cbs+3);
    uint8_t *rxdata=(uint8_t *)(txd+0x10);

    enable_dma(DMA_CHAN_A); // Enable DMA channels
    enable_dma(DMA_CHAN_B);
    dlen = (nsamp+3) * 2;   // 2 bytes/sample, plus 3 dummy samples
    // Control block 0: store Rx data in buffer
    cbs[0].ti = DMA_SRCE_DREQ | (DMA_SPI_RX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_DEST_INC;
    cbs[0].tfr_len = dlen;
    cbs[0].srce_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    cbs[0].dest_ad = MEM_BUS_ADDR(mp, rxdata);
    // Control block 1: continuously repeat last Tx word (pulse CS low)
    cbs[1].srce_ad = MEM_BUS_ADDR(mp, &txd[2]);
    cbs[1].dest_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    cbs[1].tfr_len = 4;
    cbs[1].ti = DMA_DEST_DREQ | (DMA_SPI_TX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_SRCE_INC;
    cbs[1].next_cb = MEM_BUS_ADDR(mp, &cbs[1]);
    // Control block 2: send first 2 Tx words, then switch to CB1 for the rest
    cbs[2].srce_ad = MEM_BUS_ADDR(mp, &txd[0]);
    cbs[2].dest_ad = REG_BUS_ADDR(spi_regs, SPI_FIFO);
    cbs[2].tfr_len = 8;
    cbs[2].ti = DMA_DEST_DREQ | (DMA_SPI_TX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_SRCE_INC;
    cbs[2].next_cb = MEM_BUS_ADDR(mp, &cbs[1]);
    // DMA request every 4 bytes, panic if 8 bytes
    *REG32(spi_regs, SPI_DC) = (8 << 24) | (4 << 16) | (8 << 8) | 4;
    // Clear SPI length register and Tx & Rx FIFOs, enable DMA
    *REG32(spi_regs, SPI_DLEN) = 0;
    *REG32(spi_regs, SPI_CS) = SPI_TFR_ACT | SPI_DMA_EN | SPI_AUTO_CS | SPI_FIFO_CLR | SPI_CSVAL;
    // Data to be transmited: 32-bit words, MS bit of LS byte is sent first
    txd[0] = (dlen << 16) | SPI_TFR_ACT;// SPI config: data len & TI setting
    txd[1] = 0xffffffff;                // Set CS high
    txd[2] = 0x01000100;                // Pulse CS low
    // Enable DMA, wait until complete
    start_dma(mp, DMA_CHAN_A, &cbs[0], 0);
    start_dma(mp, DMA_CHAN_B, &cbs[2], 0);
    dma_wait(DMA_CHAN_A);
    // Check whether Rx data has 1 bit delay with respect to Tx
    shift = rxdata[4] & 0x80 ? 3 : 4;
    // Convert raw data to 16-bit unsigned values, ignoring first 3
    for (i=0; i<nsamp; i++)
        buff[i] = ((rxdata[i*2+6]<<8 | rxdata[i*2+7]) >> shift) & 0x3ff;
    return(nsamp);
}

There are a few points that need clarification:

When using DMA, the first word sent to the SPI controller isn’t the data to be transmitted; it is a configuration word that sets the SPI data length, and other parameters. In the MCP3008 implementation I sent it by direct-writing to the FIFO before DMA starts, but at high speed this can cause occasional glitches. So I send the initial SPI configuration using DMA Control Block 2; once that is sent, CB1 performs the main data output.
The phase relationship between the outgoing (chip-select) data and the incoming (ADC value) data isn’t immediately obvious, and as the sampling rate gets faster, this phase relationship changes by 1 bit. To detect this, I first send an all-ones word to keep CS high, then set it low, and check which bit goes low in the received data. This is also done in control block 2, and when that is complete, control block 1 takes over for the remaining transmissions.
The data decoder shifts the raw data depending on the detected phase value, then saves it as 16-bit values in the output array (which has been created in virtual memory using a conventional memory allocation call).
The ADC always returns the result of the previous conversion, so the first sample has to be discarded. Also, the chip select (SPI output) defaults to being low, so the first conversion is usually spurious, and the phase-detection method mentioned above also results in incorrect data. So it is necessary to discard the first 3 samples.

Here is an oscilloscope trace when running at 2.6 megasample/s:

Running the code

The software is in 3 files on Github here.

rpi_adc_dma_test.c
rpi_dma_utils.c
rpi_dma_utils.h

The definition at the top of rpi_adc_dma_test.c needs to be edited to select the ADC (MCP3008 or ADS7884), also rpi_dma_utils.h must be changed to reflect the CPU board you are using (RPi 0/1, 2/3, or 4) and the master clock frequency that will used to determine the SPI clock. Bizarrely, the RPi zero has a 400 MHz master clock, while the later boards use 250 MHz. If you neglect to make this change when using the Pi Zero, the SPI interface will run 1.6 times too fast; I once made this mistake, and to my surprise the ADC still seemed to work fine, even though the resulting 5.76 MS/s data rate is way beyond the values in the ADC data sheet. So if you are an overclocking enthusiast, there is plenty of scope for experimentation.

The code is compiled on the Rasberry Pi using gcc, then run with root privileges using ‘sudo’:

gcc -Wall -o rpi_adc_dma_test rpi_adc_dma_test.c rpi_dma_utils.c
sudo ./rpi_adc_dma_test

The usual security warnings apply when running code with root privileges; the operating system won’t protect you against any undesired operations.

The response will depend on which ADC and processor is in use, but should show the current ADC input value, and the corresponding voltage. This is the Pi Zero:

SPI ADC test v0.03
VC mem handle 5, phys 0xde510000, virt 0xb6f00000
SPI frequency 160000 Hz, 10000 sample/s
ADC value 212 = 0.683V
Closing

There are 2 command-line parameters:

-r to set sample rate        e.g. -r 100000 to set 100 Ksample/s
-n to set number of samples  e.g. -n 500 to fetch 500 samples.

The software reports the actual sample rate; on Pi 3 & 4 boards it generally won’t be the same as the requested value, due to the awkward divisor values to scale down 250 MHz into a suitable SPI clock.

There will be a limit as to how many samples can be gathered, as the raw data is stored in uncached memory. This limit can be increased by allocating more of the RAM to the graphics processor, see the gpu_mem option in config.txt. Alternatively, you could change the code to use cached memory (obtained with mmap) for the raw data buffer, and accept that there will be a delay while the CPU cache is emptied into it.

The output is just a list of voltages, with one sample per line; this can conveniently be piped to a CSV file for plotting in a spreadsheet, for example:

sudo ./rpi_adc_dma_test -r 3000000 -n 500 > test1.csv

The graphs in this post were actually produced using gnuplot, running on the RPi. It is easy to install using ‘sudo apt install gnuplot’, and here is a sample command line, with the graph it produces; I’ve split the commands into multiple lines for clarity:

gnuplot -e "set term png size 420,240 font 'sans,8'; \
  set title '2.5 Msample/s'; set grid; set key noautotitle; \
  set output 'test1.png'; plot 'test1.csv' every ::4 with lines"

This capture (of a composite video signal) was done on a Pi ZeroW, proving that you don’t need an expensive processor to perform fast & accurate data acquisition.

I have subsequently refined the DMA code to allow for a continuous streamed output, with the option of microsecond-accurate timestamps, see this post for details.

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.

11 thoughts on “Fast data capture with the Raspberry Pi”

ADRIANO says:

October 9, 2021 at 8:39 pm

Hi how are you?
Thank you immensely for sharing your work in the two posts (https://iosoft.blog/2020/11/16/streaming-analog-data-raspberry-pi/ and https://iosoft.blog/2020/06/11/fast -data-capture-raspberry-pi/), I have understood your code in parts. However, I have the following problem and maybe you can give me a tip or point out a way, I apologize for my English.
I intend to read the Texas ADC084S021 (8bits) with my Raspberry Pi3. Read all four channels at the same time and save everything in a .CSV file at a sample rate higher than 5kHz, in a stable and reliable way, when I see your code for the MCP3008 (10bits) I didn’t understand how to send the comments of channel selection, i made a code in python but i can’t achieve a satisfactory sample rate, could you help me better understand your code and adapt it to my application?

Thank you in advance and a strong hug.

LikeLike

1. iosoftcode says:
  
  October 10, 2021 at 8:29 am
  
  Sadly I have too many requests for assistance, and too little time to respond in detail.
  
  However, I’ve skimmed the ADC084S021 data sheet, and it seems to work very differently to the MCP3008, in that it uses the CS line to start conversion, rather than a ‘start’ bit on the MOSI line.
  
  So I think you can just assert CS and send a 16-bit all-zero value to the ADC, and you should receive a 16-bit value from channel 0. Then negate CS, ready for the next cycle.
  
  The returned 16-bit binary value should have the ADC result shifted right by 5 bits (since the ADC starts outputting data on the 5th clock cycle), but I strongly recommend you take the time to read & understand the data sheet; it really does contain all the information you need to know.
  
  LikeLike
  
CodingRPI says:

January 18, 2022 at 3:25 am

Hi, I try your code on my RPI4 and the dma can never complete even though the code has used your solution.
I think that the DMA address in the code is not the same as the RPI4 datasheet(0xFE007000). So I change it but dma still fails. Do you know what other reasons may lead to the dma timeout?

LikeLike

1. iosoftcode says:
  
  January 18, 2022 at 9:44 am
  
  I’m not aware of the datasheet address being wrong, but you do need to be clear whether any address value is in the bus, virtual or physical address space. Have you changed the definition at the top of rpi_dma_utils.h to match the board you are using?
  
  LikeLike
  
2. Oriol Castillo says:
  
  February 24, 2022 at 11:27 pm
  
  Hi, I had the same problem on RPI 4, but it was solved changing the following definitions in the h-file
  
  #define DMA_CHAN_A 0 // 10
  #define DMA_CHAN_B 1 // 11
  
  I do not why channels 10 and 11 do not work.
  I hope it was useful.
  
  LikeLike
  
Zdenek says:

April 8, 2023 at 3:12 pm

Hello, I try use this code with modification for my HW with ADS868xA, but code every time terminated on the begin in function

f = init_spi(spi_freq);
….
function call
gpio_set(SPI0_CE0_PIN, GPIO_ALT0, GPIO_NOPULL);

returns segmentation Fault

our HW: Raspberry Pi 4 Model B Rev 1.2
Linux raspberrypi 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux

Any Help?

Kinds Regards,
Zdenek

LikeLike

1. iosoftcode says:
  
  April 9, 2023 at 9:44 am
  
  I’m afraid my code only works on the 32-bit OS
  
  LikeLike
  
Sarah says:

January 11, 2024 at 8:23 am

Hi,
thanks for your blog and this interesting post! Would you mind sharing your code for the two files rpi_dma_utils.h and rpi_dma_utils.c? I transferred your code boxes from this post into the files and tried my best to add all the required definitions and to include the necessary libraries. Unfortunately I still haven’t been able to get rid of all the errors when compiling the code.
Thank you in advance!

LikeLike

1. iosoftcode says:
  
  January 11, 2024 at 11:05 am
  
  See https://github.com/jbentham/rpi
  
  LikeLike
  
Sarah says:

January 24, 2024 at 9:03 am

Hi, thanks for this interesting post!
I have encountered several Segmentation fault errors with my Raspberry Pi 4 Model B and the latest Debian GNU/Linux 12 (bookworm). Here are some examples:
rpi_adc_dma_test.c: lines 456, 170, 163, 47, 186, 134, 417, 140
rpi_dma_utils.c: lines 107, 87
What OS version did you use on your Raspi 3? Is there a possibility that the program could function correctly if a different Raspbian OS was installed or do you have any other suggestions for resolving this issue?
Thank you very much!

LikeLike

1. iosoftcode says:
  
  January 24, 2024 at 10:11 am
  
  Are you using a 64-bit OS? The current code only runs on 32-bit, I’ll modify the text to make that clear
  
  LikeLike