Web display for Pi Pico oscilloscope

Web oscilloscope display

In part 1 of this series, I added WiFi connectivity to the Pi Pico using an ESP32 module and MicroPython. Part 2 showed how Direct Memory Access (DMA) can be used to get analog samples at regular intervals from the Pico on-board Analog-to-Digital Converter (ADC).

I’m now combining these two techniques with some HTML and Javascript code to create a Web display in a browser, but since this code will be quite complicated, first I’ll sort out how the data is fetched from the Pico Web server.

Data request

The oscilloscope display will require user controls to alter the sample rate, number of samples, and any other settings we’d like to change. These values must be sent to the Web server, along with a filename that will trigger the acquisition. To fetch 1000 samples at 10000 samples per second, the request received by the server might look like:

GET /capture.csv?nsamples=1000&xrate=10000

If you avoid any fancy characters, the Python code in the server that extracts the filename and parameters isn’t at all complicated:

ADC_SAMPLES, ADC_RATE = 20, 100000
parameters = {"nsamples":ADC_SAMPLES, "xrate":ADC_RATE}

# Get HTTP request, extract filename and parameters
req = esp.get_http_request()
if req:
    line = req.split("\r")[0]
    fname = get_fname_params(line, parameters)

# Get filename & parameters from the first line of an HTTP request
def get_fname_params(line, params):
    fname = ""
    parts = line.split()
    if len(parts) > 1:
        p = parts[1].partition('?')
        fname = p[0]
        query = p[2].split('&')
        for param in query:
            p = param.split('=')
            if len(p) > 1:
                if p[0] in params:
                    try:
                        params[p[0]] = int(p[1])
                    except ValueError:
                        pass
    return fname

The default parameter names & values are stored in a dictionary, and when the URL is decoded, any names that match those in the dictionary will have their values updated. Then the data is fetched using the parameter values, and returned in the form of a comma-delimited (CSV) file:

if CAPTURE_CSV in fname:
    vals = adc_capture()
    esp.put_http_text(vals, "text/csv", esp32.DISABLE_CACHE)

The name ‘comma-delimited’ is a bit of a misnomer in this case; we just return the given number of lines, with one floating-point voltage value per line.
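The request parsing described above can be checked on a PC before loading it onto the Pico. This is a minimal sketch that repeats get_fname_params and feeds it hand-written request lines; the ‘bogus’ parameter and the non-numeric rate are made-up test inputs, not anything the browser sends.

```python
# Desktop check of the request-line parsing (runs under CPython or MicroPython)
def get_fname_params(line, params):
    fname = ""
    parts = line.split()
    if len(parts) > 1:
        p = parts[1].partition('?')
        fname = p[0]
        for param in p[2].split('&'):
            kv = param.split('=')
            # Only update names that already exist in the dictionary
            if len(kv) > 1 and kv[0] in params:
                try:
                    params[kv[0]] = int(kv[1])
                except ValueError:
                    pass  # Keep the previous value if the number is malformed
    return fname

parameters = {"nsamples": 20, "xrate": 100000}
fname = get_fname_params(
    "GET /capture.csv?nsamples=1000&xrate=10000&bogus=1 HTTP/1.1", parameters)
print(fname, parameters)

# A malformed value leaves the existing setting untouched
get_fname_params("GET /capture.csv?xrate=fast HTTP/1.1", parameters)
print(parameters)
```

Unknown names are silently dropped, and a value that isn’t an integer leaves the previous setting in place, so a badly formed request can’t introduce new keys or non-numeric values.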

Requesting the data

Before diving into the complexities of graphical display and Javascript, it is worth creating a simple Web page to fetch this data.

The standard way of specifying parameters with a file request is to define a ‘form’ that will be submitted to the server. The parameter values can be constrained using ‘select’, to avoid the user entering incompatible numbers:

<!DOCTYPE html><html lang="en">
<head><meta charset="utf-8"/></head><body>
  <form action="/capture.csv">
    <label for="nsamples">Number of samples</label>
    <select name="nsamples" id="nsamples">
      <option value=100>100</option>
      <option value=200>200</option>
      <option value=500>500</option>
      <option value=1000>1000</option>
    </select>
    <label for="xrate">Sample rate</label>
    <select name="xrate" id="xrate">
      <option value=1000>1000</option>
      <option value=2000>2000</option>
      <option value=5000>5000</option>
      <option value=10000>10000</option>
    </select>
    <input type="submit" value="Submit">
  </form>
</body></html>

This generates a very simple display on the browser:

Form to request ADC samples

On submitting the form, we get back a raw list of values:

CSV data

Since the file we have requested is pure CSV data, that is all we get; the controls have vanished, and we’ll have to press the browser ‘back’ button if we want to retry the transaction. This is quite unsatisfactory, and to improve it there are various techniques, for example using a template system to always add the controls at the top of the data. However, we also want the browser to display the data graphically, which means a sizeable amount of Javascript, so we might as well switch to a full-blown AJAX implementation, as mentioned in the first part.

AJAX

To recap, AJAX originally stood for ‘Asynchronous JavaScript and XML’, where the Javascript on the browser would request an XML file from the server, then display data within that file on the browser screen. However, there is no necessity that the file must be XML; for simple unstructured data, CSV is adequate.

The HTML page is similar to the previous one; the main changes are that we have specified a button that’ll call a Javascript function when clicked, and there is a defined area to display the response data. This area is tagged as ‘preformatted’ so the text will be displayed in a plain monospaced style.

  <form id="captureForm">
    <label for="nsamples">Number of samples</label>
    <select name="nsamples" id="nsamples">
      <option value=100>100</option>
      <option value=200>200</option>
      <option value=500>500</option>
      <option value=1000>1000</option>
    </select>
    <label for="xrate">Sample rate</label>
    <select name="xrate" id="xrate">
      <option value=1000>1000</option>
      <option value=2000>2000</option>
      <option value=5000>5000</option>
      <option value=10000>10000</option>
    </select>
    <button onclick="doSubmit(event)">Submit</button>
  </form>
  <pre id="responseText"></pre>

The button calls the Javascript function ‘doSubmit’ when clicked, with the click event as an argument. As this button is in a form, by default the browser would attempt to re-fetch the current document using the form data, so we need to block this behaviour and substitute the action we want, which is to wait until the response is obtained, and display it in the area we have allocated. This is ‘asynchronous’ (using a callback function) so that the browser doesn’t stall waiting for the response.

function doSubmit(event) {
  // Eliminate default action for button click
  // (only necessary if button is in a form)
  event.preventDefault();

  // Create request
  var req = new XMLHttpRequest();

  // Define action when response received
  req.addEventListener( "load", function(event) {
    document.getElementById("responseText").innerHTML = event.target.responseText;
  } );

  // Create FormData from the form
  var formdata = new FormData(document.getElementById("captureForm"));

  // Collect form data and add to request
  var params = [];
  for (var entry of formdata.entries()) {
    params.push(entry[0] + '=' + entry[1]);
  }
  req.open( "GET", "/capture.csv?" + encodeURI(params.join("&")));
  req.send();
}

The resulting request sent by the browser looks something like:

GET /capture.csv?nsamples=100&xrate=1000

This is created by looping through the items in the form, and adding them to the base filename. When doing this, there is a limited range of characters we can use, in order not to wreck the HTTP request syntax. I have used the ‘encodeURI’ function to encode any of these unusable characters; this isn’t necessary with simple parameters that are just alphanumeric values, but if I’d included a parameter with free-form text, this would be needed. For example, if one parameter was a page title that might include spaces, then the title “Test page” would be encoded as

GET /capture.csv?nsamples=100&xrate=1000&title=Test%20page
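If a free-form parameter like this were ever added, the server would have to reverse the encoding, and the parsing code shown earlier doesn’t do that. The following is a minimal sketch of a decoder; url_unquote is a hypothetical helper name, not part of the server code, and it only handles %XX escapes.

```python
# Hypothetical helper: decode %XX escapes in a URL query value
HEX_DIGITS = b"0123456789abcdefABCDEF"

def url_unquote(text):
    raw = text.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(raw):
        # A valid escape is '%' followed by exactly two hex digits
        if (raw[i] == 0x25 and i + 3 <= len(raw)
                and raw[i + 1] in HEX_DIGITS and raw[i + 2] in HEX_DIGITS):
            out.append(int(raw[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            out.append(raw[i])  # Copy anything else unchanged
            i += 1
    return bytes(out).decode("utf-8")

print(url_unquote("Test%20page"))
```

A ‘%’ that isn’t followed by two hex digits is passed through unchanged, which seems the least surprising behaviour for a small server.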

You may wonder why I am looping through the form entries, when in theory they can just be attached to the HTTP request in one step:

// Insert form data into request - doesn't work!
req.open("GET", "/capture.csv");
req.send(formdata);

I haven’t been able to get this method to work; XMLHttpRequest ignores any body passed to send() when the method is GET, so the form data never reaches the server. In the end it isn’t difficult to iterate over the form entries and add them directly to the request.

The resulting browser display is a minor improvement over the previous version, in that it isn’t necessary to use the ‘back’ button to re-fetch the data, but still isn’t very pretty.

Partial display of CSV data

Graphical display

There are many ways to display graphic content within a browser. The first decision is whether to use vector graphics, or a bitmap; I prefer the former, since it allows the display to be resized without the lines becoming jagged.

There is a vector graphics language for browsers, namely Scalable Vector Graphics (SVG) and I have experimented with this, but find it easier to use Javascript commands to directly draw on a specific area of the screen, known as an ‘HTML canvas’, that is defined within the HTML page:

<div><canvas id="canvas1"></canvas></div>

To draw on this, we create a ‘2D context’ in Javascript:

var ctx1 = document.getElementById("canvas1").getContext("2d");

We can now use commands such as ‘moveTo’ and ‘lineTo’ to draw on this context; a useful first exercise is to draw a grid across the display.

var ctx1, xdivisions=10, ydivisions=10, winxpad=10, winypad=30;
var grid_bg="#d8e8d8", grid_fg="#40f040";
window.addEventListener("load", function() {
  ctx1 = document.getElementById("canvas1").getContext("2d");
  resize();
  window.addEventListener('resize', resize, false);
} );

// Draw grid
function drawGrid(ctx) {
  var w=ctx.canvas.clientWidth, h=ctx.canvas.clientHeight;
  var dw = w/xdivisions, dh=h/ydivisions;
  ctx.fillStyle = grid_bg;
  ctx.fillRect(0, 0, w, h);
  ctx.lineWidth = 1;
  ctx.strokeStyle = grid_fg;
  ctx.strokeRect(0, 1, w-1, h-1);
  ctx.beginPath();
  for (var n=0; n<xdivisions; n++) {
    var x = n*dw;
    ctx.moveTo(x, 0);
    ctx.lineTo(x, h);
  }
  for (var n=0; n<ydivisions; n++) {
    var y = n*dh;
    ctx.moveTo(0, y);
    ctx.lineTo(w, y);
  }
  ctx.stroke();
}

// Respond to window being resized
function resize() {
  ctx1.canvas.width = window.innerWidth - winxpad*2;
  ctx1.canvas.height = window.innerHeight - winypad*2;
  drawGrid(ctx1);
}

I’ve included a function that resizes the canvas to fit within the window, which is particularly convenient when getting a screen-grab for inclusion in a blog post.

All that remains is to issue a request, wait for the response callback, and plot the CSV data onto the canvas.

var running=false, capfile="/capture.csv"

// Do a single capture (display is done by callback)
function capture() {
  var req = new XMLHttpRequest();
  req.addEventListener( "load", display);
  var params = formParams()
  req.open( "GET", capfile + "?" + encodeURI(params.join("&")));
  req.send();
}

// Display data (from callback event)
function display(event) {
  drawGrid(ctx1);
  plotData(ctx1, event.target.responseText);
  if (running) {
    window.requestAnimationFrame(capture);
  }
}

// Get form parameters
function formParams() {
  var formdata = new FormData(document.getElementById("captureForm"));
  var params = [];
  for (var entry of formdata.entries()) {
    params.push(entry[0]+ '=' + entry[1]);
  }
  return params;
}
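The plotData function called by display isn’t listed here, but the arithmetic it needs is simple: split the CSV text into values, spread them evenly across the canvas width, and invert the vertical axis so that higher voltages are drawn nearer the top. The sketch below shows that mapping in Python rather than Javascript, purely to illustrate the calculation; csv_to_points is a made-up name, and the 3.3 V full-scale value is an assumption.

```python
# Illustrative only: map CSV voltage text to canvas (x, y) coordinates
def csv_to_points(text, width, height, vmax=3.3):
    values = [float(v) for v in text.split()]
    if len(values) < 2:
        return []
    dx = width / (len(values) - 1)      # Horizontal spacing between samples
    points = []
    for n, v in enumerate(values):
        v = min(max(v, 0.0), vmax)      # Clip to the displayable range
        y = height - (v / vmax) * height  # Canvas y axis runs downwards
        points.append((n * dx, y))
    return points

pts = csv_to_points("0.000\r\n1.650\r\n3.300", 200, 100)
print(pts)
```

In the browser, the first point would be passed to moveTo, and the remaining points to lineTo, followed by a single stroke call.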

A handy feature is to have the display auto-update when the current data has been displayed; I’ve done this by using requestAnimationFrame to trigger another capture cycle, if the global ‘running’ variable is set. Then we just need some buttons to control this feature:

<button id="single" onclick="doSingle(event)">Single</button>
<button id="run"  onclick="doRun(event)">Run</button>

// Handle 'single' button press
function doSingle(event) {
  event.preventDefault();
  running = false;
  capture();
}

// Handle 'run' button press
function doRun(event) {
  event.preventDefault();
  running = !running;
  capture();
}

The end result won’t win any prizes for style or speed, but it does serve as a useful basis for acquiring & displaying data in a Web browser.

100 Hz sine wave

You’ll see that the controls have been rearranged slightly, and I’ve also added a ‘simulate’ checkbox; this invokes MicroPython code in the Pico Web server that doesn’t use the ADC. Instead it uses a simple incremental rotation (in the spirit of the CORDIC algorithm) to generate sine & cosine values, which are multiplied, with some random noise added:

# Simulate ADC samples: sine wave plus noise
def adc_sim():
    nsamp = parameters["nsamples"]
    buff = array.array('f', (0 for _ in range(nsamp)))
    f, s, c = nsamp/20.0, 1.0, 0.0
    for n in range(0, nsamp):
        s += c / f
        c -= s / f
        val = ((s + 1) * (c + 1)) + random.randint(0, 100) / 300.0
        buff[n] = val
    return "\r\n".join([("%1.3f" % val) for val in buff])

Distorted sine wave with random noise added
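To see what the recurrence in adc_sim does on its own, it can be run on a PC without the noise or the multiplication. This is a sketch using the same update equations; quadrature is a made-up function name. Each step rotates the (s, c) pair by roughly 1/f radians, so with f = nsamp/20 the whole buffer covers about 20 radians, a little over three cycles.

```python
# The adc_sim recurrence in isolation: a slowly rotating sine/cosine pair
def quadrature(nsamp):
    f, s, c = nsamp / 20.0, 1.0, 0.0
    out = []
    for _ in range(nsamp):
        s += c / f      # Each value is nudged by a fraction of the other
        c -= s / f
        out.append((s, c))
    return out

pairs = quadrature(1000)
# Count sign changes of s to estimate how many half-cycles were generated
crossings = sum(1 for (a, _), (b, _) in zip(pairs, pairs[1:])
                if (a < 0) != (b < 0))
peak = max(abs(s) for s, _ in pairs)
print(crossings, round(peak, 3))
```

The amplitude stays close to 1 without any explicit normalisation, which is why the method is attractive on a processor with no fast trigonometric functions.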

Running the code

If you haven’t done so before, I suggest you run the code given in the first and second parts, to check the hardware is OK.

Load rp_devices.py and rp_esp32.py onto the MicroPython filesystem, not forgetting to modify the network name (SSID) and password at the top of that file. Then load the HTML files rpscope_capture, rpscope_ajax and rpscope_display, and run the MicroPython server rp_adc_server.py using Thonny. The files are on Github here.

You should then be able to display the pages as shown above, using the IP address that is displayed on the Thonny console; I’ve used 10.1.1.11 in the examples above.

When experimenting with alternative Web pages, I found it useful to run a Web server on my PC, as this allows a much faster development process. There are many ways to do this; the simplest is probably to use the server that is included as standard in Python 3:

python -m http.server 8000

This makes the server available on port 8000. If the Web browser is running on the same PC as the server, use the ‘localhost’ address in the browser, e.g.

http://127.0.0.1:8000/rpscope_display.html

This assumes the HTML file is in the same directory that you used to invoke the Web server. If you also include a CSV file named ‘capture.csv’, then it will be displayed as if the data came from the Pico server.

However, there is one major problem with this approach: the CSV file will be cached by the browser, so if you change the file, the display won’t change. This isn’t a problem on the Pico Web server, as it adds do-not-cache headers in the HTTP response. The standard Python Web server doesn’t do that, so the browser will use the cached data, even after the file has changed.
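One possible workaround is a few lines of Python that wrap the standard server and add the same do-not-cache headers as the Pico. This is a sketch, not part of the project files; NoCacheHandler and make_server are names I’ve made up, and the header values simply mirror those used by the Pico server.

```python
# Development Web server that tells the browser not to cache anything
from http.server import HTTPServer, SimpleHTTPRequestHandler

class NoCacheHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Add the do-not-cache headers to every response
        self.send_header("Cache-Control", "no-cache, no-store, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")
        super().end_headers()

def make_server(port=8000, host="127.0.0.1"):
    # Serves files from the current directory, like 'python -m http.server'
    return HTTPServer((host, port), NoCacheHandler)

# To run: make_server().serve_forever()
```

Save this in the directory containing the HTML and CSV files, add the serve_forever call at the end, and run it in place of the http.server command.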

One other issue is worthy of mention; in my setup, the ESP32 network interface sometimes locks up after it has transferred a significant amount of data, which means the Web server becomes unresponsive. This isn’t an issue with the MicroPython code, since the ESP32 doesn’t respond to pings when it is in this state. I’m using ESP32 Nina firmware v 1.7.3; hopefully, by the time you read this, there is an update that fixes the problem.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Pi Pico ADC input using DMA and MicroPython

Analog data capture using DMA

This is the second part of my Web-based Pi Pico oscilloscope project. In the first part I used an Espressif ESP32 to add WiFi connectivity to the Pico, and now I’m writing code to grab analog data from the on-chip Analog-to-Digital Converter (ADC), which can potentially provide up to 500k samples/sec.

High-speed transfers like this normally require code written in C or assembly-language, but I’ve decided to use MicroPython, which is considerably slower, so I need to use hardware acceleration to handle the data rate, specifically Direct Memory Access (DMA).

MicroPython ‘uctypes’

MicroPython does not have built-in functions to support DMA, and doesn’t provide any simple way of accessing the registers that control the ADC, DMA and I/O pins. However it does provide a way of defining these registers, using a new mechanism called ‘uctypes’. This is vaguely similar to ‘ctypes’ in standard Python, which is used to define Python interfaces for ‘foreign’ functions, but defines hardware registers, using a very compact (and somewhat obscure) syntax.

To give a specific example, the DMA controller has multiple channels, and according to the RP2040 datasheet section 2.5.7, each channel has 4 registers, with the following offsets:

0x000 READ_ADDR
0x004 WRITE_ADDR
0x008 TRANS_COUNT
0x00c CTRL_TRIG

The first three of these require simple 32-bit values, but the fourth has a complex bitfield:

Bit 31:   AHB_ERROR
Bit 30:   READ_ERROR
..and so on until..
Bits 3-2: DATA_SIZE
Bit 1:    HIGH_PRIORITY
Bit 0:    EN

With MicroPython uctypes, we can define the registers, and individual bitfields within those registers, e.g.

from uctypes import BF_POS, BF_LEN, UINT32, BFUINT32
# The bitfields must be defined before the register map that refers to them
DMA_CTRL_TRIG_FIELDS = {
    "AHB_ERROR":   31<<BF_POS | 1<<BF_LEN | BFUINT32,
    "READ_ERROR":  30<<BF_POS | 1<<BF_LEN | BFUINT32,
..and so on until..
    "DATA_SIZE":    2<<BF_POS | 2<<BF_LEN | BFUINT32,
    "HIGH_PRIORITY":1<<BF_POS | 1<<BF_LEN | BFUINT32,
    "EN":           0<<BF_POS | 1<<BF_LEN | BFUINT32
}
DMA_CHAN_REGS = {
    "READ_ADDR_REG":       0x00|UINT32,
    "WRITE_ADDR_REG":      0x04|UINT32,
    "TRANS_COUNT_REG":     0x08|UINT32,
    "CTRL_TRIG_REG":       0x0c|UINT32,
    "CTRL_TRIG":          (0x0c,DMA_CTRL_TRIG_FIELDS)
}

The UINT32, BF_POS and BF_LEN entries may look strange, but they are just a way of encapsulating the data type, bit position & bit count into a single variable, and once that has been defined, you can easily read or write any element of the bitfield, e.g.

# Set DMA data source to be ADC FIFO
dma_chan.READ_ADDR_REG = ADC_FIFO_ADDR

# Set transfer size as 16-bit words
dma_chan.CTRL_TRIG.DATA_SIZE = 1
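Behind the scenes, a bitfield assignment like this is a read-modify-write of the 32-bit register. The sketch below imitates that in plain Python so it can be run on a PC; set_bitfield and get_bitfield are illustrative helpers, not uctypes functions, and they work on an ordinary integer rather than real hardware.

```python
# Plain-Python model of a bitfield write within a 32-bit register value
def set_bitfield(reg, pos, length, value):
    mask = ((1 << length) - 1) << pos       # Bits occupied by the field
    return (reg & ~mask & 0xFFFFFFFF) | ((value << pos) & mask)

def get_bitfield(reg, pos, length):
    return (reg >> pos) & ((1 << length) - 1)

ctrl = 0
ctrl = set_bitfield(ctrl, 2, 2, 1)   # DATA_SIZE (bits 3-2) = 1: 16-bit words
ctrl = set_bitfield(ctrl, 0, 1, 1)   # EN (bit 0) = 1
print(hex(ctrl), get_bitfield(ctrl, 2, 2))
```

The position and length arguments are the same numbers that appear with BF_POS and BF_LEN in the field definitions.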

You may wonder why there are 2 definitions for one register: CTRL_TRIG and CTRL_TRIG_REG. Although it is useful to be able to manipulate individual bitfields (as in the above code) sometimes you need to write the whole register at one time, for example to clear all fields to zero:

# Clear the CTRL_TRIG register
dma_chan.CTRL_TRIG_REG = 0

An additional complication is that there are 12 DMA channels, so we need to define all 12, then select one of them to work on:

DMA_CHAN_WIDTH  = 0x40
DMA_CHAN_COUNT  = 12
DMA_CHANS = [struct(DMA_BASE + n*DMA_CHAN_WIDTH, DMA_CHAN_REGS)
    for n in range(0,DMA_CHAN_COUNT)]

DMA_CHAN = 0
dma_chan = DMA_CHANS[DMA_CHAN]
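The address arithmetic can be checked on a PC. In this sketch the DMA_BASE value of 0x50000000 is taken from the RP2040 datasheet rather than from the code above, and chan_reg_addr is a made-up helper name.

```python
# Where each DMA channel's registers sit in the address map
DMA_BASE = 0x50000000       # Assumed value, from the RP2040 datasheet
DMA_CHAN_WIDTH = 0x40
DMA_CHAN_COUNT = 12
CTRL_TRIG_OFFSET = 0x0c

def chan_reg_addr(chan, offset=0):
    return DMA_BASE + chan * DMA_CHAN_WIDTH + offset

for n in range(DMA_CHAN_COUNT):
    print(n, hex(chan_reg_addr(n)), hex(chan_reg_addr(n, CTRL_TRIG_OFFSET)))
```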

To add even more complication, the DMA controller also has a single block of registers that are not channel specific, e.g.

DMA_REGS = {
    "INTR":               0x400|UINT32,
    "INTE0":              0x404|UINT32,
    "INTF0":              0x408|UINT32,
    "INTS0":              0x40c|UINT32,
    "INTE1":              0x414|UINT32,
..and so on until..
    "FIFO_LEVELS":        0x440|UINT32,
    "CHAN_ABORT":         0x444|UINT32
}

So to cancel all DMA transactions on all channels:

DMA_DEVICE = struct(DMA_BASE, DMA_REGS)
dma = DMA_DEVICE
dma.CHAN_ABORT = 0xffff

Single ADC sample

MicroPython has a function for reading the ADC, but we’ll be using DMA to grab multiple samples very quickly, so this function can’t be used; we need to program the hardware from scratch. A useful first step is to check that we can produce sensible values for a single ADC sample. Firstly the I/O pin needs to be set as an analog input, using the uctypes definitions. There are 3 analog input channels, numbered from 0 to 2:

import rp_devices as devs
ADC_CHAN = 0
ADC_PIN  = 26 + ADC_CHAN
adc = devs.ADC_DEVICE
pin = devs.GPIO_PINS[ADC_PIN]
pad = devs.PAD_PINS[ADC_PIN]
pin.GPIO_CTRL_REG = devs.GPIO_FUNC_NULL
pad.PAD_REG = 0

Then we clear down the control & status register, and the FIFO control & status register; this is only necessary if they have previously been programmed:

adc.CS_REG = adc.FCS_REG = 0

Then enable the ADC, and select the channel to be converted:

adc.CS.EN = 1
adc.CS.AINSEL = ADC_CHAN

Now trigger the ADC for one capture cycle, and read the result:

adc.CS.START_ONCE = 1
print(adc.RESULT_REG)

These two lines can be repeated to get multiple samples.

If the input pin is floating (not connected to anything) then the value returned is impossible to predict, but generally it seems to be around 50 to 80 units. The important point is that the value fluctuates between samples; if several samples have exactly the same value, then there is a problem.

Multiple ADC samples

Since MicroPython isn’t fast enough to handle the incoming data, I’m using DMA, so that the ADC values are copied directly into memory without any software intervention.

However, we don’t always want the ADC to run at maximum speed (500k samples/sec) so need some way of triggering it to fetch the next sample after a programmable delay. The RP2040 designers have anticipated this requirement, and have equipped it with a programmable timer, driven from a 48 MHz clock. There is also a mechanism that allows the ADC to automatically sample 2 or 3 inputs in turn; refer to the RP2040 datasheet for details.

Assuming the ADC has been set up as described above, some additional code is required. First we define the DMA channel, the number of samples, and the rate (samples per second).

DMA_CHAN = 0
NSAMPLES = 10
RATE = 100000
dma_chan = devs.DMA_CHANS[DMA_CHAN]
dma = devs.DMA_DEVICE

We now have to enable the ADC FIFO, create a 16-bit buffer to hold the samples, and set the sample rate:

adc.FCS.EN = adc.FCS.DREQ_EN = 1
adc_buff = array.array('H', (0 for _ in range(NSAMPLES)))
adc.DIV_REG = (48000000 // RATE - 1) << 8
adc.FCS.THRESH = adc.FCS.OVER = adc.FCS.UNDER = 1
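The value written to DIV_REG is worth a closer look. The integer part of the divisor occupies bits 8 to 23 of the register, with a fractional part in the bottom 8 bits, and the ADC is triggered once every (integer part + 1) cycles of the 48 MHz clock. The sketch below repeats the calculation on a PC; adc_div_reg and actual_rate are made-up helper names.

```python
# Worked example of the ADC sample-rate divisor calculation
ADC_CLOCK_HZ = 48000000

def adc_div_reg(rate):
    # Integer divisor goes in bits 23:8; fractional bits 7:0 are left at zero
    return (ADC_CLOCK_HZ // rate - 1) << 8

def actual_rate(div_reg):
    # One conversion is triggered every (integer part + 1) clock cycles
    return ADC_CLOCK_HZ / ((div_reg >> 8) + 1)

for rate in (1000, 10000, 100000):
    div = adc_div_reg(rate)
    print(rate, hex(div), actual_rate(div))
```

Since the integer part is a 16-bit field, this simple formula can’t produce rates below about 733 samples per second.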

The DMA controller is configured with the source & destination addresses, and sample count:

dma_chan.READ_ADDR_REG = devs.ADC_FIFO_ADDR
dma_chan.WRITE_ADDR_REG = uctypes.addressof(adc_buff)
dma_chan.TRANS_COUNT_REG = NSAMPLES

The DMA destination is set to auto-increment, with a data size of 16 bits; the data request comes from the ADC. Then DMA is enabled, waiting for the first request.

dma_chan.CTRL_TRIG_REG = 0
dma_chan.CTRL_TRIG.CHAIN_TO = DMA_CHAN
dma_chan.CTRL_TRIG.INCR_WRITE = dma_chan.CTRL_TRIG.IRQ_QUIET = 1
dma_chan.CTRL_TRIG.TREQ_SEL = devs.DREQ_ADC
dma_chan.CTRL_TRIG.DATA_SIZE = 1
dma_chan.CTRL_TRIG.EN = 1

Before starting the sampling, it is important to clear down the ADC FIFO, by reading out any existing samples – if this step is omitted, the data you get will be a mix of old & new, which can be very confusing.

while adc.FCS.LEVEL:
    x = adc.FIFO_REG

We can now set the START_MANY bit, and the ADC will start generating samples, which will be loaded into its FIFO, then transferred by DMA to the RAM buffer. Once the buffer is full (i.e. the DMA transfer count has been reached, and its BUSY bit is cleared) the DMA transfers will stop, but the ADC will keep trying to put samples in the FIFO until the START_MANY bit is cleared.

adc.CS.START_MANY = 1
while dma_chan.CTRL_TRIG.BUSY:
    time.sleep_ms(10)
adc.CS.START_MANY = 0
dma_chan.CTRL_TRIG.EN = 0

We can now print the results, converted into a voltage reading:

vals = [("%1.3f" % (val*3.3/4096)) for val in adc_buff]
print(vals)

As with the single-value test, the displayed values should show some dithering; if the input is floating, you might see something like:

['0.045', '0.045', '0.047', '0.046', '0.045', '0.046', '0.045', '0.046', '0.046', '0.041']
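The conversion is just a scaling of the 12-bit ADC count by the 3.3 V full-scale value. A minimal sketch, using made-up raw counts of the size you’d expect from a floating input:

```python
# Convert raw 12-bit ADC counts to volts
ADC_VREF = 3.3       # Full-scale voltage, as used in the code above
ADC_COUNTS = 4096    # 12-bit converter

def adc_volts(raw):
    return raw * ADC_VREF / ADC_COUNTS

raw_samples = [56, 57, 58, 51]    # Hypothetical counts, for illustration
print([("%1.3f" % adc_volts(v)) for v in raw_samples])
```

So the readings of around 0.045 V correspond to raw counts in the mid fifties, which agrees with the 50 to 80 units seen in the single-sample test.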

Running the code

If you are unfamiliar with the process of loading MicroPython onto the Pico, or loading files into the MicroPython filesystem, I suggest you read my previous post.

The source files are available on Github here; you need to load the library file rp_devices.py onto the MicroPython filesystem, then run rp_adc_test.py; I normally run this using Thonny, as it simplifies the process of editing, running and debugging the code.

In the next part I combine the ADC sampling and the network interface to create a networked oscilloscope with a browser interface.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Pi Pico wireless Web server using ESP32 and MicroPython

There are various ways that the Pi Pico (RP2040) can be given a wireless interface; in a previous post I used a Microchip ATWINC1500, now I’m using the Espressif ESP32 WROOM-32. The left-hand photo above shows an Adafruit Airlift ESP32 co-processor board which must be hand-wired to the Pico, whilst the right-hand is a Pimoroni Wireless Pack that plugs directly into the Pico, either back-to-back, or (as in the photo) on an Omnibus baseboard that allows multiple I/O boards to be attached to a single Pico.

So you can add WiFi connectivity to your Pico without any additional wiring.

The resulting hardware would normally be programmed in C, but I really like the simplicity of MicroPython, so have chosen that language. This raises an additional question: do I use the Pimoroni MicroPython that is derived from the Pi Foundation version, or CircuitPython, which is a derivative created by Adafruit, with various changes?

CircuitPython includes a lot of I/O libraries as standard, but does lack some important features (such as direct access to memory) that are useful to the more advanced developer. So I’ll try to support both, but I do prefer working with MicroPython.

SPI interface

The ESP32 does all the hard work of connecting to the WiFi network and handling TCP/IP sockets; it is just necessary to send the appropriate commands over the SPI link. In addition to the usual clock, data and chip-select lines, there is a ‘reset’ signal from the Pico to the ESP, and a ‘ready’ signal back from the ESP to the Pico. This is necessary because the Pico spends much of its time waiting for the ESP to complete a command; instead of continually polling for a result, the Pico can wait until ‘ready’ is signalled then fetch the data.

My server code uses the I/O pins defined by the Pimoroni Pico Wireless Pack:

Function            GPIO  Pin num
Clock               18    24
Pico Tx data (MOSI) 19    25
Pico Rx data (MISO) 16    21
Chip select (CS)     7    10
ESP32 ready         10    14
ESP32 reset         11    15

Software components

Pico software modules and ESP32 interface

ESP32 code

The ESP32 code takes low-level commands over the SPI interface, such as connecting and disconnecting from the wireless network, opening TCP sockets, sending and receiving data. The same ESP32 firmware works with both the MicroPython and CircuitPython code, and I suggest you buy an ESP32 module with the firmware pre-loaded, as the re-building & re-flashing process is a bit complicated; see here for the code, and here for a guide to the upgrade process. I’m using version 1.7.3; you can check the version in CircuitPython using:

import board
import busio
from digitalio import DigitalInOut
from adafruit_esp32spi import adafruit_esp32spi

# Control lines between the Pico and the ESP32
esp32_cs = DigitalInOut(board.GP7)
esp32_ready = DigitalInOut(board.GP10)
esp32_reset = DigitalInOut(board.GP11)
# SPI bus: clock, MOSI, MISO
spi = busio.SPI(board.GP18, board.GP19, board.GP16)
esp = adafruit_esp32spi.ESP_SPIcontrol(spi, esp32_cs, esp32_ready, esp32_reset)
print("Firmware version", esp.firmware_version.decode('ascii'))

Note that some ESP32 modules are preloaded with firmware that provides a serial interface instead of SPI, using modem-style ‘AT’ commands; this is incompatible with my code, so the firmware will need to be re-flashed.

MicroPython or CircuitPython

This has to be loaded onto the Pico before anything else. There are detailed descriptions of the loading process on the Web, but basically you hold down the Pico pushbutton while making the USB connection. The Pico will then appear as a disk drive in your filesystem, and you just copy (drag-and-drop) the appropriate UF2 file onto that drive. The Pico will then automatically reboot, and run the file you have loaded.

The standard Pi Foundation MicroPython lacks the necessary libraries to interface with the ESP32, so we have to use the Pimoroni version. At the time of writing, the latest ‘MicroPython with Pimoroni Libs’ version is 0.26, available on Github here. This includes all the necessary driver code for the ESP32.

If you are using CircuitPython, the installation is a bit more complicated; the base UF2 file (currently 7.0.0) is available here, but you will also need to create a directory in the CircuitPython filesystem called adafruit_esp32spi, and load adafruit_esp32spi.py and adafruit_esp32spi_socket.py into it. The files are obtained from here, and the loading process is as described below.

Loading files through REPL

A common source of confusion is the way that files are loaded onto the Pico. I have already described the loading of MicroPython or CircuitPython UF2 images, but it is important to note that this method only applies to the base Python code; if you want to add files that are accessible to your software (e.g. CircuitPython add-on modules, or Web pages for the server) they must be loaded by a completely different method.

When Python runs, it gives you an interactive console, known as REPL (Read Evaluate Print Loop). This is normally available as a serial interface over USB, but can also be configured to use a hardware serial port. You can directly execute commands using this interface, but more usefully you can use a REPL-aware editor to prepare your files and upload them to the Pico. I use Thonny: click Run, then Select Interpreter, and choose either MicroPython (Raspberry Pi Pico) or CircuitPython (Generic), and Thonny will search your serial ports to try and connect to Python running on the Pico. You can then select View | Files, and you get a window that shows your local (PC) filesystem, and also the remote Python files. You can then transfer files to & from the PC, and create subdirectories.

At the time of writing, Thonny can’t handle drag-and-drop between the local & remote directories; you have to right-click on a file, then select ‘upload’ to copy it to the currently-displayed remote directory. Do not attempt a transfer while the remote MicroPython program is running; hit the ‘stop’ button first.

In the case of CircuitPython, you need to create a subdirectory called adafruit_esp32spi, containing adafruit_esp32spi.py and adafruit_esp32spi_socket.py.

Server code

To accommodate the differences between the two MicroPython versions, I have created an ESP32 class, with functions for connecting to the wireless network, and handling TCP sockets; it is just a thin wrapper around the MicroPython functions which send SPI commands to the ESP32, and read the responses.

Connecting to the WiFi network just requires a network name (SSID) and password; all the complication is handled by the ESP32. Then a socket is opened to receive the HTTP requests; this is normally on port 80.

def start_server(self, port):
    self.server_sock = picowireless.get_socket()
    picowireless.server_start(port, self.server_sock, 0)

There are significant differences between conventional TCP sockets, and those provided by the ESP32; there is no ‘bind’ command, and the client socket is obtained by a strangely-named ‘avail_server’ call, which also returns the data length for a client socket – a bit confusing. This is a simplified version of the code:

def get_client_sock(self, server_sock):
    return picowireless.avail_server(server_sock)

def recv_length(self, sock):
    return picowireless.avail_server(sock)

def recv_data(self, sock):
    return picowireless.get_data_buf(sock)

def get_http_request(self):
    self.client_sock = self.get_client_sock(self.server_sock)
    client_dlen = self.recv_length(self.client_sock)
    if self.client_sock != 255 and client_dlen > 0:
        req = b""
        while len(req) < client_dlen:
            req += self.recv_data(self.client_sock)
        request = req.decode("utf-8")
        return request
    return None

When the code runs, the IP address is printed on the console:

Connecting to testnet...
WiFi status: connecting
WiFi status: connected
Server socket 0, 10.1.1.11:80

Entering the IP address (10.1.1.11 in the above example) into a Web browser means that the server receives something like the following request:

GET / HTTP/1.1
Host: 10.1.1.11
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Most of this information is irrelevant to a tiny Web server, since there is little choice over the information it returns. The first line has the important information, namely the resource that is being requested, so instead of decoding the whole message, we can do simple tests to match the line to known resources:

DIRECTORY = "/"
INDEX_FNAME   = "rpscope.html"
DATA_FNAME    = "data.csv"
ICON_FNAME    = "favicon.ico"

DISABLE_CACHE = "Cache-Control: no-cache, no-store, must-revalidate\r\n"
DISABLE_CACHE += "Pragma: no-cache\r\nExpires: 0\r\n"

req = esp.get_http_request()
if req:
    r = req.split("\r")[0]
    if ICON_FNAME in r:
        esp.put_http_404()
    elif DATA_FNAME in r:
        esp.put_http_file(DIRECTORY+DATA_FNAME, "text/csv", DISABLE_CACHE)
    else:
        esp.put_http_file(DIRECTORY+INDEX_FNAME)

Since we are dealing with live data, that may change every time it is fetched, the browser’s caching mechanism must be disabled, hence the DISABLE_CACHE response, which aims to do so regardless of which browser version is in use.

Sending the response back to the browser should be easy; it just needs to be split into chunks of at most 4095 bytes, so as not to overflow the SPI buffers. However, I had reliability problems with both the MicroPython and CircuitPython implementations; sometimes the network transfers would just stall. The solution seems to be to drastically reduce the SPI block size; some CircuitPython code uses 64-byte blocks, but I’ve found that 128 bytes works OK. Further work is needed to establish the source of the problem, but this workaround is sufficient for now.

MAX_SPI_DLEN = const(128)
HTTP_OK = "HTTP/1.1 200 OK\r\n"
CONTENT_LEN = "Content-Length: %u\r\n"
CONTENT_TYPE = "Content-Type: %s\r\n"
HEAD_END = "\r\n"

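def put_http_file(self, fname, content="text/html; charset=utf-8", hdr=""):
    # Try to open the file; a missing file results in a 404 response
    try:
        f = open(fname)
    except OSError:
        f = None
    if not f:
        self.put_http_404()
    else:
        # Element 6 of the stat tuple is the file size in bytes
        flen = os.stat(fname)[6]
        resp = HTTP_OK + CONTENT_LEN%flen + CONTENT_TYPE%content + hdr + HEAD_END
        self.send_data(self.client_sock, resp)
        # Send the file in small blocks, to avoid stalling the SPI link
        n = 0
        while n < flen:
            data = f.read(MAX_SPI_DLEN)
            if not data:
                break
            self.send_data(self.client_sock, data)
            n += len(data)
        f.close()
        self.send_end(self.client_sock)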
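
The splitting of a response into small blocks can be tried out on a PC without any hardware. The following is a minimal sketch in plain Python, not part of the server code; the helper names 'chunks' and 'build_header' are invented for this example, and the 128-byte limit is simply the value quoted above.

```python
MAX_SPI_DLEN = 128  # block size quoted in the text above

def chunks(data, size=MAX_SPI_DLEN):
    """Yield successive blocks of at most 'size' bytes."""
    for n in range(0, len(data), size):
        yield data[n:n + size]

def build_header(length, content="text/html; charset=utf-8", extra=""):
    """Build a minimal HTTP response header for a body of 'length' bytes."""
    return ("HTTP/1.1 200 OK\r\n"
            "Content-Length: %u\r\n"
            "Content-Type: %s\r\n" % (length, content)) + extra + "\r\n"

body = b"x" * 300
blocks = list(chunks(body))
print([len(b) for b in blocks])   # [128, 128, 44]
```

Joining the blocks back together gives the original data, so the receiver sees an unbroken stream regardless of the block size chosen.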

Dynamic web server

A simple Web server could just receive a page request from a browser, match it with a file in the Pico filesystem, and return the page text to the browser. However, I’d like to report back some live data that has been captured by the Pico, so we need a method to return dynamically-changing values.

There are three main ways of doing this; server-side includes (SSI), AJAX, and on-demand page creation.

Server-side includes

A Web page that is stored in the server filesystem may include tags that trigger the server to perform specific actions, for example when the tag ‘$time’ is reached, the server replaces that text with the current time value. A slightly more sophisticated version embeds the tag in an HTML comment, so the page can be displayed without a Pico server, albeit with no dynamic data.
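
As an illustration of the tag substitution, here is a minimal sketch in Python. It assumes the simple '$name' tag style mentioned above; the function name 'expand_tags' and the dictionary of tag functions are hypothetical, and a real implementation would need to guard against one tag name being a prefix of another.

```python
import time

def expand_tags(page, values):
    """Replace each '$name' tag in the page text with its current value."""
    for name, func in values.items():
        page = page.replace("$" + name, str(func()))
    return page

PAGE = "<html><body><p>Time now: $time</p></body></html>"
TAGS = {"time": lambda: time.strftime("%H:%M:%S")}

print(expand_tags(PAGE, TAGS))
```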

The great merit of this approach is its simplicity, and I used it extensively in my early projects. However, there is one major snag; the data is embedded in an HTML page, so is difficult to extract. For example, you may have a Web page that contains the temperature data for a 24-hour period, and you want to aggregate that into weekly and monthly reports; you could write a script that strips out the HTML and returns pure data, but it’d be easier if the Web server could just provide a data file for each day.

AJAX

Web pages routinely include Javascript code to perform various display functions, and one of these functions can fetch a data file from the server, and display its contents. This is commonly known as AJAX (Asynchronous Javascript and XML) though in reality there is no necessity for the data to be in XML format; any format will do.

For example, to display a graph of daily temperatures, the browser loads a Web page with the main formatting, and Javascript code that requests a comma-delimited (CSV) data file. The server prepares that file dynamically using the current data, and returns it to the browser. The Javascript on the browser decodes the data, and displays it as a graph; it can also perform calculations on the data, such as reporting minimum and maximum values.

The key advantage is that the data file can be made available to any other applications, so a logging application can ignore all the Javascript processing, and just fetch the data file directly from the server.

With regard to the data file format, I prefer not to use XML if I can possibly avoid it, so use Javascript Object Notation (JSON) for structured data, and comma-delimited (CSV) for unstructured values, such as data tables.
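
To make the two formats concrete, here is a short sketch of how a server might encode its data, using standard Python. The function names and the choice of three decimal places are assumptions made for this example; MicroPython has a similar 'json' module, but this has only been written with desktop Python in mind.

```python
import json

def samples_to_csv(samples):
    """Return the sample values as a single comma-delimited line."""
    return ",".join("%1.3f" % v for v in samples)

def settings_to_json(nsamples, xrate):
    """Return the capture settings as a JSON object."""
    return json.dumps({"nsamples": nsamples, "xrate": xrate})

volts = [0.0, 1.65, 3.3]
print(samples_to_csv(volts))          # 0.000,1.650,3.300
print(settings_to_json(3, 10000))
```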

The first ‘A’ in AJAX stands for Asynchronous, and this deserves some explanation. When the Javascript fetches the data file from the server, there will be a delay, and if the server is heavily loaded, this might be a substantial delay. This could result in the code becoming completely unresponsive, as it waits for data that may never arrive. To avoid this, the data fetching function XMLHttpRequest() returns immediately, but with a callback function that is triggered when the data actually arrives from the server – this is the asynchronous behaviour.

There is now a more modern approach using a ‘fetch’ function that substitutes a ‘promise’ for the callback, but the net effect is the same; keeping the Javascript code running while waiting for data to arrive from the server.

On-demand page creation

The above two techniques rely on the Web page being stored in a filesystem on the server, but it is possible for the server code to create a page from scratch every time it is requested.

Due to the complexity of hand-coding HTML, this approach is normally used with page templates, that are converted on-the-fly into HTML for the browser. However, a template system would add significant complexity to this simple demonstration, so I have used the hand-coding approach to create a basic display of analog input voltages, as shown below.

This data table is created from scratch by the server code, every time the page is loaded:

ADC_PINS = 26, 27, 28
ADC_SCALE = 3.3 / 65536
TEST_PAGE = '''<!DOCTYPE html><html>
    <head><style>table, th, td {border: 1px solid black; margin: 5px;}</style></head>
    <body><h2>Pi Pico web server</h2>%s</body></html>'''

adcs = [machine.ADC(pin) for pin in ADC_PINS]

heads = ["GP%u" % pin for pin in ADC_PINS]
vals = [("%1.3f" % (adc.read_u16() * ADC_SCALE)) for adc in adcs]
th = "<tr><th>" + "</th><th>".join(heads) + "</th></tr>"
tr = "<tr><td>" + "</td><td>".join(vals) + "</td></tr>"
table = "<table><caption>ADC voltages</caption>%s</table>" % (th+tr)
esp.put_http_text(TEST_PAGE % table)

Even in this trivial example, there is significant work in ensuring that the HTML tags are nested correctly, so for pages with any degree of complexity, I’d normally use the AJAX approach as described earlier.

Running the code

The steps are:

  • Connect the wireless module
  • Load the appropriate UF2 file
  • In the case of CircuitPython, load the add-on libraries
  • Get the Web server Python code from Github here. The file rp_esp32.py is for MicroPython with Pimoroni Libraries, and rp_esp32_cp.py is for CircuitPython.
  • Edit the top of the server code to set the network name (SSID) and password for your WiFi network.
  • Run the code using Thonny or any other MicroPython REPL interface; the console should show something like:
Connecting to testnet...
WiFi status: connecting
WiFi status: connected
Server socket 0, 10.1.1.11:80
  • Run a Web browser, and access test.html at the given IP address, e.g. 10.1.1.11/test.html. The console should indicate the socket number, data length, the first line of the HTTP request, and the type of request, e.g.
Client socket 1 len 466: GET /test.html HTTP/1.1 [test page]

The browser should show the voltages of the first 3 ADC channels, e.g.

The Web pages produced by the MicroPython and CircuitPython versions are very similar; the only difference is in the table headers, which either reflect the I/O pin numbers, or the analog channel numbers.

If a file called ‘index.html’ is loaded into the root directory of the MicroPython filesystem, it will be displayed in the browser by default, when no filename is entered in the browser address bar. A minimal index page might look like:

<!doctype html><html><head></head>
  <body>
    <h2>Pi Pico web server</h2>
    <a href="test.html">ADC test</a>
  </body>
</html>

So far, I have only presented very simple Web pages; in the next post I’ll show how to fetch high-speed analog samples using DMA, then combine these with more sophisticated AJAX functionality to create a Web-based oscilloscope.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

RP2040 WiFi using Microchip ATWINC1500 module: part 2

Server sockets

In part 1, we got as far as connecting an ATWINC1500 or 1510 module to a WiFi network; now it is time to do something vaguely useful with it.

Sockets

A network interface is frequently referred to as a ‘socket’ interface, so first I’d better define what that is.

A socket is a logical endpoint for network communication, and consists of an IP address, and a port number. The IP address is often assigned automatically at boot-time, using Dynamic Host Configuration Protocol (DHCP) as in part 1, but can also be pre-programmed into the unit (a ‘static’ address).

The 16-bit port number further subdivides the functionality within that IP address, so one address can support multiple simultaneous conversations (‘connections’). Furthermore, specific port numbers below 1024 are generally associated with specific functions; for example, an un-encrypted Web server normally uses port 80, whilst an encrypted server is on port 443. Port numbers 1024 and above are generally for user programs.

Clients and servers, UDP and TCP

A server runs all the time, waiting for a client to contact it; the client is responsible for initiating the contact and providing some data, probably in the form of a request. The server returns an appropriate response, then either side may terminate the connection; the client may end it because it has received enough data, or the server because there are limits on the maximum number of simultaneous clients it can service.

There are 2 fundamental communication methods in TCP/IP: User Datagram Protocol (UDP) and Transmission Control Protocol (TCP).

UDP is the simpler of the two, and involves sending a block of data to a given socket (port and IP address) with no guarantee that it will arrive. TCP involves sending a stream of data to a socket; it includes sophisticated retry mechanisms to ensure that the data arrives.

There are those in the networking community who shun UDP, because they think the unreliability makes it useless; I disagree, and think there are various use-cases where the simple block-based transfer is perfectly adequate, possibly overlaid with a simple retry mechanism, so we’ll start with a simple UDP server.
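
A simple retry mechanism of the kind mentioned above might look like the following sketch, written for a PC using the standard Python socket module. The function name 'udp_request', the retry count and the timeout are all assumptions chosen for illustration.

```python
import socket

def udp_request(sock, addr, msg, retries=3, timeout=0.2):
    """Send msg to addr, resending if no reply arrives within the timeout.
    Returns (reply, attempts), or (None, retries) if every attempt failed."""
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(msg, addr)
        try:
            reply, _ = sock.recvfrom(1000)
            return reply, attempt
        except socket.timeout:
            pass  # no response yet, so try again
    return None, retries

# Example usage (address and port are placeholders):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   reply, attempts = udp_request(sock, ("10.1.1.11", 1025), b"Test 1")
```

This only works safely if the request is harmless when repeated, since the server may act on a datagram whose reply was lost.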

UDP server

The simplest UDP server is stateless, i.e. it doesn’t store any information about the client; it just responds to any request it receives. This means that a single socket can handle multiple clients, unlike TCP which requires a unique socket for each client it is communicating with.

For a classic C socket interface, the steps would be:

  1. Create a datagram socket using socket()
  2. Bind the socket to a specific port using bind()
  3. When a message is received on that port, get the data, return address and port number using recvfrom()
  4. Send response data to the remote address and port number using sendto()
  5. Go to step 3
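
The steps above can be sketched using the standard Python socket module, which follows the classic C interface closely. This is an illustration intended for a PC or Raspberry Pi, not code for the Pico; the function names 'make_udp_server' and 'echo_one' are invented for this example.

```python
import socket

def make_udp_server(port, host=""):
    """Steps 1 and 2: create a datagram socket and bind it to a port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def echo_one(sock):
    """Steps 3 and 4: wait for a datagram, then return it to the sender."""
    data, addr = sock.recvfrom(1000)
    sock.sendto(data, addr)
    return data, addr

# Step 5 would be a loop, for example:
#   server = make_udp_server(1025)
#   while True:
#       echo_one(server)
```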

The code driving the ATWINC1500 module does the same job, but the function calls are a bit different, as they reflect the messages sent to & received from the WiFi module:

  1. Initialise a socket structure for UDP
  2. Send a BIND command to the module, with the port number
  3. Receive a BIND response
  4. Send a RECVFROM command to the module
  5. Wait until a RECVFROM response is received, get the data, return address & port number
  6. Send a SENDTO command to the module with the response data, return address & port number
  7. Go to step 4

Note that there may be a very long wait between steps 4 and 5, if there are no clients contacting the server. Fortunately the module will signal the arrival of a message by asserting the interrupt request (IRQ) line, so the RP2040 CPU can proceed with other tasks while waiting.

UDP Reception

There are 4 steps when the module receives a packet (‘datagram’) from a remote client:

  1. Get the group ID and operation. This identifies the message type; for UDP it will generally be a response to a RECVFROM request, but it could be something completely different. My software combines the group ID and operation into a single 16-bit number.
  2. Get the operation-specific header. This is generally 16 bytes or less, and in the case of RECVFROM, gives the IP address and port number of the sender, also a pointer & offset to the user data in the buffer.
  3. Get the user data. The application doesn’t need to fetch all the incoming data; for example, in the case of a Web server, it might just get the first line of the page request, and discard all the other information.
  4. Handle socket errors. If there is an error, the data length-value will be negative, and the code must take appropriate action, such as closing and re-opening the server socket. Since a UDP socket is connectionless, it generally won’t see many errors, but a TCP socket will flag an error every time a client closes an active connection.

For RECVFROM, the step 1 & 2 headers are:

// HIF message header
typedef struct {
    uint8_t gid, op;
    uint16_t len;
} HIF_HDR;

// Operation-specific header
typedef struct {
    SOCK_ADDR addr;
    int16_t dlen; // (status)
    uint16_t oset;
    uint8_t sock, x;
    uint16_t session;
} RECV_RESP_MSG;
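
For anyone wanting to decode captured messages on a PC, the two structures can be unpacked with the Python 'struct' module, as in the sketch below. Several things here are assumptions rather than facts taken from the module documentation: that the fields outside the address are little-endian, that the port and IP address are sent most significant byte first, and that the structures are packed with no padding. The address family is left as raw bytes, and the function names are hypothetical.

```python
import struct

HIF_HDR = struct.Struct("<BBH")             # gid, op, len
RECV_RESP = struct.Struct("<2s2s4shHBBH")   # addr (8 bytes), dlen, oset, sock, x, session

def parse_hif_hdr(data):
    """Return the combined group/operation value and the message length."""
    gid, op, length = HIF_HDR.unpack_from(data)
    return (gid << 8) | op, length

def parse_recv_resp(data):
    """Decode the operation-specific header of a receive response."""
    family, port, ip, dlen, oset, sock, _, session = RECV_RESP.unpack_from(data)
    return {
        "port": int.from_bytes(port, "big"),
        "ip": ".".join(str(b) for b in ip),
        "dlen": dlen,          # a negative value signals a socket error
        "oset": oset,
        "sock": sock,
        "session": session,
    }

# Made-up example message: port 1025, address 10.1.1.12, 6 data bytes
sample = bytes.fromhex("000204010a01010c0600100007000100")
print(RECV_RESP.size, parse_recv_resp(sample))
```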

Having fetched these two blocks, control is passed to a state machine that takes appropriate action. If we’ve just received an indication that DHCP has succeeded, then we bind the server sockets.

    if (gop==GOP_DHCP_CONF)
    {
        for (sock=MIN_SOCKET; sock<MAX_SOCKETS; sock++)
        {
            sp = &sockets[sock];
            if (sp->state==STATE_BINDING)
                put_sock_bind(fd, sock, sp->localport);
        }
    }

When we get a message indicating the binding has succeeded, then if it is a TCP socket, we need to send a LISTEN command. If UDP, we can just send a RECVFROM, and wait for data to arrive. We can tell whether the socket is TCP or UDP by looking at the socket number; the lower numbers are TCP, and higher are UDP.

    else if (gop==GOP_BIND && (sock=rmp->bind.sock)<MAX_SOCKETS &&
             sockets[sock].state==STATE_BINDING)
    {
        sock_state(sock, STATE_BOUND);
        if (sock < MIN_UDP_SOCK)
            put_sock_listen(fd, sock);
        else
            put_sock_recvfrom(fd, sock);
    }

If a UDP server, we may now get a RECVFROM response, indicating that a packet (‘datagram’) has arrived. If so, we save the return socket address (IP and port number), call a handler function, then send another RECVFROM request.

    else if (gop==GOP_RECVFROM && (sock=rmp->recv.sock)<MAX_SOCKETS &&
             (sp=&sockets[sock])->state==STATE_BOUND)
    {
        memcpy(&sp->addr, &rmp->recv.addr, sizeof(SOCK_ADDR));
        if (sp->handler)
            sp->handler(fd, sock, rmp->recv.dlen);
        put_sock_recvfrom(fd, sock);
    }
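
The flow of events through the state machine can be modelled in a few lines of Python, which may help when following the C code. This is only a sketch: the class name, the event strings and the value of MIN_UDP_SOCK are hypothetical, and the commands are recorded in a list instead of being sent to a module.

```python
STATE_BINDING, STATE_BOUND = "binding", "bound"
MIN_UDP_SOCK = 7    # hypothetical boundary between TCP and UDP socket numbers

class ServerSockets:
    def __init__(self, ports):
        # ports: dictionary mapping socket number to local port number
        self.ports = dict(ports)
        self.state = {sock: STATE_BINDING for sock in ports}
        self.sent = []      # commands that would be sent to the module

    def handle(self, event, sock=None, dlen=0):
        """Process one response from the module."""
        if event == "DHCP_CONF":
            for s in sorted(self.state):
                if self.state[s] == STATE_BINDING:
                    self.sent.append(("BIND", s, self.ports[s]))
        elif event == "BIND" and self.state.get(sock) == STATE_BINDING:
            self.state[sock] = STATE_BOUND
            cmd = "LISTEN" if sock < MIN_UDP_SOCK else "RECVFROM"
            self.sent.append((cmd, sock))
        elif event == "RECVFROM" and self.state.get(sock) == STATE_BOUND:
            if dlen > 0:
                self.sent.append(("SENDTO", sock))   # reply from an echo handler
            self.sent.append(("RECVFROM", sock))     # wait for the next datagram

s = ServerSockets({0: 1025, 7: 1025})
for args in [("DHCP_CONF",), ("BIND", 0), ("BIND", 7), ("RECVFROM", 7, 6)]:
    s.handle(*args)
print(s.sent)
```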

A very simple handler just echoes back the incoming data:

uint8_t databuff[MAX_DATALEN];

// Handler for UDP echo
void udp_echo_handler(int fd, uint8_t sock, int rxlen)
{
    if (rxlen>0 && get_sock_data(fd, sock, databuff, rxlen))
        put_sock_sendto(fd, sock, databuff, rxlen);
}

UDP client for testing

For testing, I use a simple UDP client written in Python, that can run on a Raspberry Pi, or any PC running Linux or Windows. It sends a message every second, and checks for a response. You’ll need to change the IP address to match the DHCP value given by the module.

# Simple Python client for testing UDP server
import socket, time

ADDR = "10.1.1.11"
PORT = 1025
MESSAGE = b"Test %u"
DELAY = 1

def hex_str(bytes):
    return " ".join([("%02x" % int(b)) for b in bytes])

print("Send to UDP %s:%s" % (ADDR, PORT))
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(0.2)
count = 1
while True:
    msg = MESSAGE % count
    sock.sendto(msg, (ADDR, PORT))
    print("Tx %u: %s" % (len(msg), hex_str(msg)))
    count += 1
    try:
        data = sock.recvfrom(1000)
    except:
        data = None
    if data:
        bytes, addr = data
        s = hex_str(bytes)
        print("Rx %u: %s\n" % (len(bytes), s))
    time.sleep(DELAY)

TCP server

A TCP connection is more complex than UDP, since the module firmware must keep track of the data that is sent & received, in order to correct any errors. The steps for a classic C socket interface would be:

  1. Create a ‘stream’ socket using socket()
  2. Bind the socket to a specific port using bind()
  3. Set the socket to wait for incoming connections using listen()
  4. When a connection request is received on the main socket, open a new socket for the data using accept()
  5. When data arrives on the new socket, get it using recv()
  6. Send response data using send()
  7. If socket error or transfer complete, close socket. Otherwise go to step 5
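
For comparison, the classic steps above can be sketched with the standard Python socket module, which mirrors the C calls. This is an illustration for a PC or Raspberry Pi, handling a single client at a time; the function names 'make_tcp_server' and 'echo_client' are invented for this example.

```python
import socket

def make_tcp_server(port, host=""):
    """Steps 1 to 3: create a stream socket, bind it, and listen."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock

def echo_client(server):
    """Steps 4 to 7: accept one connection and echo data until it closes.
    Returns the total number of bytes echoed."""
    conn, addr = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(1000)
            if not data:        # empty result: the client has closed
                break
            conn.sendall(data)
            total += len(data)
    return total

# Example usage:
#   server = make_tcp_server(1025)
#   while True:
#       echo_client(server)
```

Note that the listening socket stays open throughout; only the per-connection socket returned by accept() is closed when the client disconnects.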

The corresponding operations for the WiFi module are:

  1. Initialise a socket structure for TCP
  2. Send a BIND command to the module, with the port number
  3. Receive a BIND response
  4. Send a LISTEN command
  5. Receive a LISTEN response
  6. Receive an ACCEPT notification when a connection request arrives on the main socket. Save the new socket number.
  7. Send a RECV command on the new socket.
  8. Receive a RECV response when data arrives on the new socket
  9. Send response data using SEND
  10. Go to step 7, or close the new socket

TCP reception

The first step (binding a socket to a port number) is the same as for UDP, but then we send a LISTEN command, which activates the socket to receive incoming connections. When a client connects, we get an ACCEPT response containing 2 socket numbers; the first is the one that we used for the original BIND command, and the second is a new socket that will be used for the data transfer; we need to issue a RECV on this socket to get the user data.

    else if (gop==GOP_ACCEPT &&
             (sock=rmp->accept.listen_sock)<MAX_SOCKETS &&
             (sock2=rmp->accept.conn_sock)<MAX_SOCKETS &&
             sockets[sock].state==STATE_BOUND)
    {
        memcpy(&sockets[sock2].addr, &rmp->recv.addr, sizeof(SOCK_ADDR));
        sockets[sock2].handler = sockets[sock].handler;
        sock_state(sock2, STATE_CONNECTED);
        put_sock_recv(fd, sock2);
    }

When data is available, the RECV command will return, and we can call a data handler function, then send another RECV for more data. Alternatively, if the data length is negative, then there is an error, and the socket needs to be closed. This isn’t necessarily as bad as it sounds; the most common reason is that the client has closed the connection, and we just need to close the second socket, so that it can be re-used.

    else if (gop==GOP_RECV && (sock=rmp->recv.sock)<MAX_SOCKETS &&
            (sp=&sockets[sock])->state==STATE_CONNECTED)
    {
        if (sp->handler)
            sp->handler(fd, sock, rmp->recv.dlen);
        if (rmp->recv.dlen > 0)
            put_sock_recv(fd, sock);
    }

The TCP data handler is again a simple echo-back of the incoming data, but with an added complication: if the data length is negative, there has been an error. This isn’t as bad as it sounds; the most common error is that the client has closed the TCP connection, so the server must also close its data socket, to allow it to be re-used for a new connection.

// Handler for TCP echo
void tcp_echo_handler(int fd, uint8_t sock, int rxlen)
{
    if (rxlen < 0)
        put_sock_close(fd, sock);
    else if (rxlen>0 && get_sock_data(fd, sock, databuff, rxlen))
        put_sock_send(fd, sock, databuff, rxlen);
}

TCP client for testing

This is similar to the UDP client; it can run on a Raspberry Pi or PC, running Linux or Windows:

# Simple Python client for testing TCP server
import socket, time

ADDR = "10.1.1.11"
PORT = 1025
MESSAGE = b"Test %u"
DELAY = 1

def hex_str(bytes):
    return " ".join([("%02x" % int(b)) for b in bytes])

print("Send to TCP %s:%s" % (ADDR, PORT))
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(0.5)
sock.connect((ADDR, PORT))
count = 1
while True:
    msg = MESSAGE % count
    sock.sendall(msg)
    print("Tx %u: %s" % (len(msg), hex_str(msg)))
    count += 1
    try:
        data = sock.recv(1000)
    except:
        data = None
    if data:
        s = hex_str(data)
        print("Rx %u: %s\n" % (len(data), s))
    time.sleep(DELAY)
# EOF

Source files

The C source files are in the ‘part2’ directory on Github here.

The default network name and passphrase are “testnet” and “testpass”; these must be changed to match your network, then the code will need to be rebuilt & run using the standard Pico development environment.

The default TCP & UDP port numbers are 1025, and the Python programs I’ve provided can be used to perform simple echo tests, provided the IP address is modified to match that given when the Pico joins the network.

python tcp_tx.py
Send to TCP 10.1.1.11:1025
Tx 6: 54 65 73 74 20 31
Rx 6: 54 65 73 74 20 31

Tx 6: 54 65 73 74 20 32
Rx 6: 54 65 73 74 20 32
..and so on..

python udp_tx.py
Send to UDP 10.1.1.11:1025
Tx 6: 54 65 73 74 20 31
Rx 6: 54 65 73 74 20 31

Tx 6: 54 65 73 74 20 32
Rx 6: 54 65 73 74 20 32
..and so on..

RP2040 WiFi using Microchip ATWINC1500 module

Part 1: joining a network

WINC1500 modules

The Raspberry Pi Pico is an incredibly useful low-cost micro-controller module based on the RP2040 CPU, but at the time of writing, there is a major omission: there is no networking capability.

This project adds low-cost wireless networking to the Pi Pico, and any other RP2040 boards. There are various modules on the market that could be used for this purpose; I have chosen the Microchip ATWINC1500 or 1510 modules as they are low-cost, have an easy hardware interface (4-wire SPI), and feature a built-in TCP/IP software stack, which significantly reduces the amount of software needed on the RP2040.

The photo above shows the module mounted on an Adafruit breakout board, and the module itself; this is the variant with a built-in antenna, but there is also a version with an antenna connector, that allows an external antenna to be used.

The only difference between the ATWINC1500 and 1510 modules is that the latter has a larger flash memory (1 MB, as opposed to 0.5 MB). There is also an earlier series of low-level interface modules named ATWILC; I’m not using them, as the built-in TCP/IP software of the ATWINC saves a lot of code complication on the RP2040.

Hardware connections

Pi Pico and WiFi module

For simplicity, I have used the Adafruit breakout board, but it is possible to directly connect the module to the Pico, powered from its 3.3V supply.

Wiring Pico to Adafruit WINC1500 breakout
Signal  Pico pin (GPIO)  Function
SCK     18               SPI clock
MOSI    19               SPI data out
MISO    16               SPI data in
CS      17               SPI chip select
WAKE    20               Module wake
EN      20               Module enable
RESET   21               Module reset
IRQ     22               Module interrupt request

No extra components are needed, if the wiring to the module is kept short, i.e. 3 inches (76 mm).

SPI on the RP2040

Initialising the SPI interface on the RP2040 just involves a list of API function calls:

#define SCK_PIN     18
#define MOSI_PIN    19
#define MISO_PIN    16
#define CS_PIN      17
#define WAKE_PIN    20
#define RESET_PIN   21
#define IRQ_PIN     22

// Initialise SPI interface
void spi_setup(int fd)
{
    stdio_init_all();
    spi_init(SPI_PORT, SPI_SPEED);
    spi_set_format(SPI_PORT, 8, SPI_CPOL_0, SPI_CPHA_0, SPI_MSB_FIRST);
    gpio_init(MISO_PIN);
    gpio_set_function(MISO_PIN, GPIO_FUNC_SPI);
    gpio_set_function(CS_PIN,   GPIO_FUNC_SIO);
    gpio_set_function(SCK_PIN,  GPIO_FUNC_SPI);
    gpio_set_function(MOSI_PIN, GPIO_FUNC_SPI);
    gpio_init(CS_PIN);
    gpio_set_dir(CS_PIN, GPIO_OUT);
    gpio_put(CS_PIN, 1);
    gpio_init(WAKE_PIN);
    gpio_set_dir(WAKE_PIN, GPIO_OUT);
    gpio_put(WAKE_PIN, 1);
    gpio_init(IRQ_PIN);
    gpio_set_dir(IRQ_PIN, GPIO_IN);
    gpio_pull_up(IRQ_PIN);
    gpio_init(RESET_PIN);
    gpio_set_dir(RESET_PIN, GPIO_OUT);
    gpio_put(RESET_PIN, 0);
    sleep_ms(1);
    gpio_put(RESET_PIN, 1);
    sleep_ms(1);
}

When using the standard SPI transfer API function, I found that occasionally the last data bit wasn’t being received correctly. The reason was that the API function returns before the transfer is complete; the clock signal is still high, and needs to go low to finish the transaction. To fix this, I inserted a loop that waits for the clock to go low, before negating the chip-select line.

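// Do SPI transfer, return number of bytes transferred
int spi_xfer(int fd, uint8_t *txd, uint8_t *rxd, int len)
{
    int n;

    gpio_put(CS_PIN, 0);
    n = spi_write_read_blocking(SPI_PORT, txd, rxd, len);
    // Wait for the clock to go low before negating chip-select
    while (gpio_get(SCK_PIN)) ;
    gpio_put(CS_PIN, 1);
    return(n);
}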

Interface method

The WiFi module has its own processor, running proprietary code; it is supplied with a suitable binary image already installed, so will start running as soon as the module is enabled.

Pico WINC1500 block diagram

The module has a Host Interface (HIF) that the Pico uses for all communications; it is a Serial Peripheral Interface (SPI) that consists of a clock signal, incoming & outgoing data lines (MOSI and MISO), and a Chip Select, also known as a Chip Enable. The Pico initiates and controls all the HIF transfers, but the module can request a transfer by asserting an Interrupt Request (IRQ) line.

The module is powered up by asserting the ‘enable’ line, then briefly pulsing the reset line. This ensures that there is a clean startup, without any complications caused by previous settings.

There are 2 basic methods to transfer data between the Pico and the module; simple 32-bit configuration values can be transferred as register read/write cycles; there is a specific format for these, which includes an acknowledgement that a write cycle has succeeded. The following logic analyser trace shows a 32-bit value of 0x51 being read from register 0x1070; the output from the CPU is MOSI, and the input from the module is MISO.

ATWINC1500 register read cycle

Now the corresponding write cycle, where the CPU is writing back a value of 0x51 to the same 32-bit register.

ATWINC1500 register write cycle

There are a few unusual features about these transfers.

  • The chip-select (CS) line doesn’t have to be continuously asserted during the transfer, it need only be asserted whilst a byte is actually being read or written.
  • The command value is CA hex for a read cycle, and C9 for a write.
  • The module echoes back the command value plus 2 bytes for a read (CA 00 F3), or plus 1 byte for a write (C9 00), to indicate it has been accepted.
  • The register address is 24-bit, big-endian (most significant byte first)
  • The data value is 32-bit, little-endian in the read cycle (51 00 00 00), and big-endian in the write cycle (00 00 00 51).

The last point is quite remarkable, and when starting on the code development, I had great difficulty believing it could be true. The likely reason is that the SPI transfer is big-endian as defined in the Secure Digital (SD) card specification, but the CPU in the module is little-endian. So the firmware has to either do a byte-swap on every response message, or return everything using the native byte-order, with this result.
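
The mixed byte order is easy to get wrong, so here is a small Python sketch of the byte sequences involved, based only on the points listed above. It is a simplification: it assumes that the command byte is followed directly by the address (and, for a write, the data), with no other framing bytes, and the function names are hypothetical.

```python
CMD_READ, CMD_WRITE = 0xCA, 0xC9    # command values quoted above

def read_command(addr):
    """Command byte followed by the 24-bit register address, MSbyte first."""
    return bytes([CMD_READ]) + addr.to_bytes(3, "big")

def decode_read_data(data):
    """The 4 data bytes of a read cycle arrive LSbyte first."""
    return int.from_bytes(data[:4], "little")

def write_command(addr, value):
    """Command byte, 24-bit address, then the 32-bit value, MSbyte first."""
    return bytes([CMD_WRITE]) + addr.to_bytes(3, "big") + value.to_bytes(4, "big")

print(read_command(0x1070).hex())                        # ca001070
print(hex(decode_read_data(bytes.fromhex("51000000"))))  # 0x51
print(write_command(0x1070, 0x12345678).hex())           # c900107012345678
```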

In addition to reading & writing single-word registers, the software must read & write blocks of data. This involves some negotiation with the module firmware, since that manages the allocation & freeing of the necessary storage space in the module. For example, the procedure for a block write is:

  1. Request a buffer of the required size
  2. Receive the address of the buffer from the module
  3. Write one or more data blocks to the buffer
  4. Signal that the transfer is complete

Reading is similar, except that the first step isn’t needed, as the buffer is already available with the required data.

Operations

The above transfer mechanism is used to send commands to the module, and receive responses back from it; there is generally a one-to-one correspondence between the command and response, but there may be a significant delay between the two. For example, the ‘receive’ command requests a data block that has been received over the network, but if there is none, there will be no response, and the command will remain active until something does arrive.

The commands are generally referred to as ‘operations’, and they are split into groups:

  1. Main
  2. Wireless (WiFi)
  3. Internet Protocol (IP)
  4. Host Interface (HIF)
  5. Over The Air update (OTA)
  6. Secure Socket Layer (SSL)
  7. Cryptography (Crypto)

Each operation is assigned a number, and there is some re-use of numbers within different groups, for example a value of 70 in the WiFi group is used to enable Access Point (AP) mode, but the same value in the IP group is a socket receive command. To avoid this possible source of confusion, my code combines the group and operation into a single 16-bit value, e.g.

// Host Interface (HIF) Group IDs
#define GID_MAIN        0
#define GID_WIFI        1
#define GID_IP          2
#define GID_HIF         3

// Host Interface operations with Group ID (GID)
#define GIDOP(gid, op) (((gid) << 8) | (op))
#define GOP_STATE_CHANGE    GIDOP(GID_WIFI, 44)
#define GOP_DHCP_CONF       GIDOP(GID_WIFI, 50)
#define GOP_CONN_REQ_NEW    GIDOP(GID_WIFI, 59)
#define GOP_BIND            GIDOP(GID_IP,   65)
..and so on..
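
The same combination can be expressed in Python, which is handy when decoding log output on a PC. This sketch uses the group numbers from the definitions above; the function names 'gidop' and 'split_gidop' are invented for this example.

```python
GID_MAIN, GID_WIFI, GID_IP, GID_HIF = 0, 1, 2, 3

def gidop(gid, op):
    """Combine an 8-bit group ID and 8-bit operation into one 16-bit value."""
    return ((gid & 0xFF) << 8) | (op & 0xFF)

def split_gidop(gop):
    """Recover the (group ID, operation) pair from a combined value."""
    return gop >> 8, gop & 0xFF

# Operation 70 gives a different combined value in each group
print(hex(gidop(GID_WIFI, 70)))   # 0x146
print(hex(gidop(GID_IP, 70)))     # 0x246
```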

To invoke an operation on the module, you must first send a 4-byte header that gives an 8-bit group ID, 8-bit operation number, and 16-bit message length.

typedef struct {
    uint8_t gid, op;
    uint16_t len;
} HIF_HDR;

The next 4 bytes of the message are unused, so can either be sent as zeros, or just skipped. Then there is the command header, which varies depending on the operation being performed, but is often 16 bytes or less, for example the IP ‘bind’ command:

// Address field for socket, network order (MSbyte first)
typedef struct {
    uint16_t family, port;
    uint32_t ip;
} SOCK_ADDR;

// Socket bind command, 12 bytes
typedef struct {
    SOCK_ADDR saddr;
    uint8_t sock, x;
    uint16_t session;
} BIND_CMD;
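
As a check on the 12-byte size, the bind command can be packed using the Python 'struct' module. This sketch takes the 'network order' comment at face value for all three address fields, and assumes the session value is little-endian with no padding in the structure; neither assumption has been checked against the module documentation. The family value of 2 and the other argument values are placeholders.

```python
import struct

SOCK_ADDR = struct.Struct(">HHI")   # family, port, ip: MSbyte first (assumed)
BIND_TAIL = struct.Struct("<BBH")   # sock, x, session: little-endian (assumed)

def bind_cmd(family, port, ip, sock, session):
    """Pack the 12-byte socket bind command."""
    return SOCK_ADDR.pack(family, port, ip) + BIND_TAIL.pack(sock, 0, session)

cmd = bind_cmd(2, 1025, 0, 7, 1)
print(len(cmd), cmd.hex())   # 12 000204010000000007000100
```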

I’ll be discussing the IP operations in detail in the next part.

The interrupt request (IRQ) line is pulled low by the module to indicate that a response is available; for simplicity, my code polls this line, and calls an interrupt handler.

if (read_irq() == 0)
    interrupt_handler();

Joining a network

I’ll start with the most common use-case; joining a network that uses WiFi Protected Access (WPA or WPA2), and obtaining an IP address using Dynamic Host Configuration Protocol (DHCP). This is remarkably painless, since the module firmware does all of the hard work, but first we have to tackle the issue of firmware versions.

As previously explained, the module comes pre-loaded with firmware; at the time of writing, this is generally version 19.5.2 or 19.6.1. There is a provision for re-flashing the firmware to the latest version, but for the time being I’d like to avoid that complication, so the code I’ve written is compatible with both versions.

The reason that this matters is that 19.6.1 introduced a new method for joining a network, with a new operation number (59, as opposed to 40). Fortunately the newer software can still handle the older method, so that is what I’ll be using by default, though there is a compile-time option to use the new one, if you’re sure the module has the newer firmware.

The code to join the network is remarkably brief, just involving some data preparation, then calling a host interface transfer function to send the data. It searches across all channels to find a signal that matches the given Service Set Identifier (SSID, or network name). A password string (WPA passphrase) is also given; if this is a null value, the module will attempt to join an ‘open’ (insecure) network, but there are very obvious security risks with this, so it is not recommended.

// Join a WPA network, or open network if null password
bool join_net(int fd, char *ssid, char *pass)
{
#if NEW_JOIN
    CONN_HDR ch = {pass?0x98:0x2c, CRED_STORE, ANY_CHAN, strlen(ssid), "",
                   pass?AUTH_PSK:AUTH_OPEN, {0,0,0}};
    PSK_DATA pd;

    strcpy(ch.ssid, ssid);
    if (pass)
    {
        memset(&pd, 0, sizeof(PSK_DATA));
        strcpy(pd.phrase, pass);
        pd.len = strlen(pass);
        return(hif_put(fd, GOP_CONN_REQ_NEW|REQ_DATA, &ch, sizeof(CONN_HDR),
               &pd, sizeof(PSK_DATA), sizeof(CONN_HDR)));
    }
    return(hif_put(fd, GOP_CONN_REQ_NEW, &ch, sizeof(CONN_HDR), 0, 0, 0));
#else
    OLD_CONN_HDR och = {"", pass?AUTH_PSK:AUTH_OPEN, {0,0}, ANY_CHAN, "", 1, {0,0}};

    strcpy(och.ssid, ssid);
    strcpy(och.psk, pass ? pass : "");
    return(hif_put(fd, GOP_CONN_REQ_OLD, &och, sizeof(OLD_CONN_HDR), 0, 0, 0));
#endif
}

Running the code

There are 3 source files in the ‘part1’ directory on GitHub here:

  • winc_pico_part1.c: main program, with RP2040-specific code
  • winc_wifi.c: module interface
  • winc_wifi.h: module interface definitions

The default network name and passphrase are “testnet” and “testpass”; these will have to be changed to match your network.

Normally I’d provide a simple Pi command-line to compile & run the files, but this is considerably more complex on the Pico; you’ll have to refer to the official documentation for setting up the development tools. I’ve provided a simple cmakelists file, that may need to be altered to suit your environment.

There is a compile-time ‘verbose’ setting, which regulates the amount of diagnostic information that is displayed on the console (serial link). Level 1 shows the following:

Firmware 19.5.2, OTP MAC address F8:F0:05:xx.xx.xx
Connecting...........
Interrupt gid 1 op 44 len 12 State change connected
Interrupt gid 1 op 50 len 28 DHCP conf 10.1.1.11 gate 10.1.1.101

[or if the network can't be found]
Interrupt gid 1 op 44 len 12 State change fail

Verbose level 2 lists all the register settings as well, e.g.

Rd reg 1000: 001003a0
Rd reg 13f4: 00000001
Rd reg 1014: 807c082d
Rd reg 207bc: 00003f00
Rd reg c000c: 00000000
Rd reg c000c: 10add09e
Wr reg 108c: 13521330
Wr reg 14a0: 00000102
..and so on..

Level 3 also includes hex dumps of the data transfers.

Socket interface

Part 2 describes the socket interface, with TCP and UDP servers here.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

PicoReg: real-time diagnostics for the Pi Pico using SWD

PicoReg PyQt display

The Raspberry Pi Pico CPU (RP2040) has a remarkably complex set of peripherals, and this is reflected in the very large number of control registers (1,116).

To debug a C or Python application, it can be very helpful to know the values in these registers; instead of adding ‘print’ calls, or using a heavyweight debugger, you can use a 3-wire connection between the Pico and a Raspberry Pi, and view the state of any register without modifying or disrupting the software. This magic is achieved using the Serial Wire Debug (SWD) interface; it is mainly used for reprogramming the Flash memory of the Pico, but can do a lot more – it acts as a transparent window into the I/O subsystems, and is completely independent of the CPU.

This capability can be accessed using software tools such as OpenOCD, GDB, and Eclipse, but I wanted to create something much simpler, and easier to use. The end-result is PicoReg, which is a pure-Python program that runs on any Raspberry Pi. In addition to the SWD interface, it can access the standard System View Description (SVD) file, to give a description of every register in considerable detail.

It is simple to use, and gives a valuable insight into the inner workings of the Pico.

Installation

Hardware

Pi-to-Pico SWD connections

Only 3 wires are needed; a ground connection, SWCLK and SWDIO.

Any I/O pins on the Pi could be used; for ease of identification, I’ve chosen BCM pin numbers 20 and 21. These should be connected to the SWCLK and SWDIO pins on the edge of the Pico; keep the wires as short as possible, ideally 150 mm (6 inches) or less.

The pins are defined at the top of picoreg_gpio.py, so can easily be changed to any others, but you need to be sure there is no conflict with other device drivers, such as serial, SPI etc.

# Clock and data BCM pin numbers
CLK_PIN         = 20
DAT_PIN         = 21

The Pico will need to be powered as normal, for example using a 5V supply into the USB socket, or if the Pico USB interface is not connected, by linking the 5V pin on the Pi to the VSYS pin on the Pico.

Software

There are 2 Python files, and one database file:

  • picoreg_gpio.py. The low-level interface code, with a very simple command-line interface.
  • picoreg_qt.py. A PyQt application with full GUI, that uses picoreg_gpio as a low-level interface.
  • rp2040.svd. The System View Description file provided by the Raspberry Pi organisation, that describes all the peripheral registers in XML format

The GUI translates register names (such as TIMER.ALARM3) into physical addresses (such as 0x4005401c) using the SVD file; if it is missing, the GUI won’t work. The software performs best on a Pi 3 or 4; it will work on earlier devices, but is a bit slower.
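The name-to-address translation can be sketched in a few lines of Python. This is not the PicoReg code; it is a minimal illustration that assumes the standard CMSIS-SVD element names (peripheral, baseAddress, register, addressOffset) and uses a tiny hand-written extract in place of the real file. The function name reg_address is hypothetical, and the sketch ignores complications such as peripherals that are derived from others.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written extract in CMSIS-SVD layout, for illustration only;
# the real rp2040.svd file is far larger.
SVD_SAMPLE = """
<device>
  <peripherals>
    <peripheral>
      <name>TIMER</name>
      <baseAddress>0x40054000</baseAddress>
      <registers>
        <register>
          <name>ALARM3</name>
          <addressOffset>0x001c</addressOffset>
        </register>
      </registers>
    </peripheral>
  </peripherals>
</device>
"""

def reg_address(root, name):
    """Translate 'PERIPHERAL.REGISTER' into a physical address."""
    periph_name, reg_name = name.split(".")
    for periph in root.iter("peripheral"):
        if periph.findtext("name") == periph_name:
            base = int(periph.findtext("baseAddress"), 0)
            for reg in periph.iter("register"):
                if reg.findtext("name") == reg_name:
                    return base + int(reg.findtext("addressOffset"), 0)
    return None     # Name not found in the database

root = ET.fromstring(SVD_SAMPLE)
print(hex(reg_address(root, "TIMER.ALARM3")))
```

The physical address is simply the peripheral base address plus the register offset, which is why TIMER.ALARM3 ends up at 0x4005401c.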

The files can be found on github here; copy them to any convenient directory, then install PyQt5 on the Pi. The current version at the time of writing is 5.11.3:

sudo apt update
sudo apt install python3-pyqt5

That is all you need to do!

Running the GUI

Start the GUI from a console:

python3 picoreg_qt.py

The application will respond “Loading rp2040.svd”, then after a short delay, will show the GUI. You may also see some warning messages from PyQt, but these are harmless.

Picoreg initial display

The controls are:

  • Core number. The Pico is a 2-core device, and this allows you to select core 0 or 1. There is no practical difference when using Picoreg to look at I/O, since the peripheral registers are common to both cores.
  • Verbose. This allows you to see the SWD messages that Picoreg is sending, and the responses obtained; useful when there are problems with the link.
  • Connect. This starts communication with the Pico, and verifies the identity of the RP2040 CPU. When the link is broken, it is necessary to re-connect before doing any register accesses.
  • Single. When connected, this button takes a single reading from the highlighted register.
  • Run. When connected, this button starts a continuous cycle of reading from the highlighted register, at 5 readings per second.

Note that this initial display is not Raspberry-Pi-specific; it will run on any PC, even under Windows. This allows you to browse the register database on any convenient machine, though the low-level I/O code is Pi-specific (using the RPi.GPIO module), so the SWD code only runs on a Pi.

The upper display is a tree structure, containing the peripherals, registers, and fields within the registers. By default it is alphabetically sorted; click on the ‘base’ header to sort by address. The lower display shows general information, and a description of the selected register.

If the SVD database file is damaged or missing, Picoreg will report a syntax error; check that the file is in the same directory as picoreg_qt.py, and restart.

Click on ‘connect’ and if all is well, the following will be displayed.

SWD connection restart
DPIDR 0x0bc12477

The Debug Port Identification Register value shows that Picoreg connection has succeeded. The software makes 3 attempts to do this, and if unsuccessful, the most likely cause is incorrect wiring, or the Pico board not being powered up.

You can navigate around the register display using mouse (single or double-click) or keyboard, as is usual for such tree displays.

If you have a new un-programmed Pico, try navigating to the TIMER.TIMELR register, and hit ‘Run’.

Viewing timer value

You should see a rapidly-changing display of the current timer value; you can then move the cursor to other registers, whilst the data collection is still running; the register value under the cursor will be updated, while previously-accessed register values are static. There is a flashing indication in the bottom-left corner of the window to show that data collection is running.

Debugging an application

Load the following MicroPython application onto the Pico:

# Simple test of O/P and ADC
from machine import Pin, Timer, ADC
led = Pin(25, Pin.OUT)
temperature = ADC(4)
timer = Timer()

def blink(timer):
    led.toggle()
    print(temperature.read_u16())

timer.init(freq=2, mode=Timer.PERIODIC, callback=blink)

The on-board LED will flash at 1 Hz, and the console will report the ADC value on the temperature channel.

You can use PicoReg to display the raw ADC value, in the ADC.RESULT register:

The state of the I/O pins is in the SIO.GPIO_IN register:


You can even see activity on the USB link, as the data pins toggle high and low:

Potential issues

Styling

This proved to be a surprising headache; it has been remarkably difficult to get consistent styling. Some of the screenshots were taken on a Pi 3 running Qt 5.11.3, remote-controlled from a PC, using SSH:

export DISPLAY=:0.0
python3 picoreg_qt.py

The others are run directly from a Pi console, and look quite different (and not as nice, in my opinion).

I’ve already used some very limited styling to customise the tree display:

TREE_STYLE    = ("QTreeView{selection-color:#FF0000;} " +
                 "QTreeView{selection-background-color:#FFEEEE;} ")

self.tree.setStyleSheet(TREE_STYLE)

However, after many hours of experimentation, I still haven’t found a way of getting a consistent appearance – feel free to tackle this problem yourself!

Errors

PicoReg uses pure-Python code; on the plus side, you get the convenience of not having to install any device drivers, but on the minus side the SWD timing is a bit unpredictable, and can be stretched out when a higher-priority task takes control of the CPU. This is particularly noticeable when resizing or repositioning the display window; the software makes 3 attempts to re-establish connection with the Pico, but may time out and disconnect.

This behaviour is harmless, and it is only necessary to click the ‘Connect’ button to re-establish communications.

Console interface

If there are problems running the graphical interface, the low-level drivers in picoreg_gpio.py can be run directly from the command line:

python3 picoreg_gpio.py [options] [address]
    options: -v Verbose mode
             -r Repeated access 
    address: hexadecimal address to be monitored

The default address to be monitored is the GPIO input 0xD0000004. The low-level driver can’t access the SVD database, so the address has to be specified in hexadecimal, e.g. to display the timer value:

python3 picoreg_gpio.py -r 0x4005400c
SWD connection restart
DPIDR 0x0bc12477
0x4005400c: 0xefa1b0d0
0x4005400c: 0xefa26046
0x4005400c: 0xefa30fc8
0x4005400c: 0xefa3bf32
0x4005400c: 0xefa46ec3
..and so on, use ctrl-C to exit
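The readings in this listing also give a rough measure of the polling rate. The RP2040 timer counts in microseconds, so the difference between successive values is the time between register reads. A short Python sketch of the arithmetic, using the values copied from the listing:

```python
# Timer values copied from the console output above
readings = [0xefa1b0d0, 0xefa26046, 0xefa30fc8, 0xefa3bf32, 0xefa46ec3]

# Interval between successive readings; the mask handles 32-bit wrap-around
intervals = [(b - a) & 0xFFFFFFFF for a, b in zip(readings, readings[1:])]
mean_us = sum(intervals) / len(intervals)

print(intervals)
print("about %.0f readings per second" % (1e6 / mean_us))
```

The intervals come out at roughly 45 milliseconds each, so in this capture the console interface was reading the register a little over 20 times per second.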

How it works

The SWD link has 2 wires: a clock line, and a bi-directional data line. All transactions are initiated by the Pi, the Pico only transmits on the data line when requested. The data is sent LSB (Least Significant Bit) first.

Each message starts with an 8-bit header, and the Pico responds with a 3-bit acknowledgement; if this is OK (bit value 1, 0, 0) then a 32-bit data value is transferred (sent or received), followed by a parity bit.
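As a minimal sketch of how the header and parity values can be calculated, the following Python follows the request format in the Arm ADI specification: start bit, APnDP, RnW, two address bits, parity, stop and park. The function names are illustrative, and are not taken from the PicoReg source.

```python
def swd_header(ap, read, addr):
    """Return the 8-bit SWD request header as an integer (bit 0 sent first).
    ap:   True for an Access Port register, False for Debug Port
    read: True for a read, False for a write
    addr: register address (only bits 2 and 3 are used)"""
    a2, a3 = (addr >> 2) & 1, (addr >> 3) & 1
    parity = (int(ap) + int(read) + a2 + a3) & 1   # even parity over 4 bits
    bits = [1, int(ap), int(read), a2, a3, parity, 0, 1]  # start .. park
    return sum(b << n for n, b in enumerate(bits))

def data_parity(value):
    """Parity bit that follows the 32-bit data word (1 if odd number of 1s)."""
    return bin(value & 0xFFFFFFFF).count("1") & 1

ACK_OK = [1, 0, 0]     # acknowledgement bits, in the order they are received

print(hex(swd_header(False, True, 0)))   # header for reading DPIDR
print(data_parity(0x0bc12477))           # parity for the DPIDR value
```

Because the bits are sent LSB first, building the header as a list in transmission order and then shifting each bit into place keeps the code close to the timing diagram.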

As there is only one data line, the Pi has to switch its direction from transmit to receive to get the acknowledgement, and then switch back to transmit (if it is a write cycle) or allow the Pico to send 32 bits of data (if it is a read cycle).

An extra complication that isn’t emphasised in the ARM documentation, is that the active edge changes as well; the Pi sends data that is stable on the rising clock edge, but the Pico data is stable on the falling clock edge, as shown in the following oscilloscope trace. [The response from the Pico has been amplified for clarity; normally it is the same magnitude as the outgoing signal from the Pi.]

SWD header transfer; blue trace is clock (1 MHz), red trace is data

The unusual elements of this protocol make it quite tricky to implement in pure Python, and I must admit the above trace wasn’t generated by my code; it comes from OpenOCD, with an FTDI USB adaptor. The following trace is generated by my code, the frequency is a bit lower (approximately 100 kHz) and is less symmetrical.

SWD header, using Python GPIO

The asymmetric waveform isn’t a problem, but a disadvantage of the pure-Python approach is that there are occasions when the CPU is performing other tasks, and it stops driving the SWD interface.

The trace below shows a 300 microsecond pause in the middle of a transfer, and unsurprisingly the Pico doesn’t like this, and returns an error response. Occasional errors like this aren’t a problem, as they are easily handled by the retry mechanism in the PicoReg code.

SWD transfer with gap in transmission

Error handling

When there is an error, the Pico completely stops communicating over the SWD interface, and ignores all subsequent commands; it is necessary to reset the SWD interface, to re-establish communication.

The reset process is deliberately quite complex, involving the transmission of:

  • 8 all-1 bits (FF hex)
  • 16 bytes of a specific polynomial
  • 4 all-0 bits (0 hex)
  • 1 byte to activate the SW-DP interface
  • At least 50 all-1 bits

If there are any errors in the process, the Pico will not respond, which makes debugging a bit tricky. Once reset, the connection has to be re-established by transmitting the ID of the core to be accessed; the RP2040 processor has two cores, which are selected using a value of 0x01002927 or 0x11002927. When browsing the peripheral registers, it doesn’t matter which core is selected; the results will be the same.
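The reset steps listed above can be sketched as a function that builds the bit stream to be clocked out. This is an illustration rather than the PicoReg implementation: the selection alert bytes and the activation code are the values given in the Arm ADI specification as best understood here, and should be checked against that document before use. The names reset_sequence and byte_bits are hypothetical.

```python
# Selection alert: the 128-bit value from the Arm ADI specification, in
# transmission order (LSB of the first byte is sent first). Check before use.
ALERT = bytes([0x92, 0xf3, 0x09, 0x62, 0x95, 0x2d, 0x85, 0x86,
               0xe9, 0xaf, 0xdd, 0xe3, 0xa2, 0x0e, 0xbc, 0x19])
ACTIVATE_SWD = 0x1a     # activation code for SW-DP, per the ADI specification

def byte_bits(value):
    """Split a byte into 8 bits, least-significant bit first."""
    return [(value >> n) & 1 for n in range(8)]

def reset_sequence(line_reset_bits=50):
    """Return the list of bits to clock out on the SWD data line."""
    bits = [1] * 8                      # 8 all-1 bits
    for b in ALERT:                     # 16 bytes of selection alert
        bits += byte_bits(b)
    bits += [0] * 4                     # 4 all-0 bits
    bits += byte_bits(ACTIVATE_SWD)     # 1 byte to activate SW-DP
    bits += [1] * line_reset_bits       # at least 50 all-1 bits
    return bits

seq = reset_sequence()
print(len(seq), "bits")
```

Keeping the sequence as a flat list of bits makes it easy to check the total length, and to feed the same data to whatever routine toggles the clock and data pins.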

Higher-level protocol

You might think that, having established contact with the CPU, it would be easy to read the register values, but in reality the interface has several extra levels of complication; some of these I’ve already documented in a previous project, but if you are seeking more information, you really need to read the Arm Debug Interface Architecture Specification (ADI). At the time of writing the latest version (v6.0) is available here.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Remote logic waveform display using WebGL

Logic analyser display using WebGL

In a previous post, I used hardware-accelerated graphics to display oscilloscope waveforms in a browser; the next step is to plot logic waveforms, as would normally be displayed on a logic analyser.

We could potentially use exactly the same techniques as before, but this becomes quite inefficient when displaying a large number of logic channels. A real-time display needs a fast redraw-rate, which means maximising the work done by the Graphics Processor (GPU), since it can perform parallel calculations much faster than the Central Processor (CPU), even when running on a low-cost PC, tablet or mobile phone.

Looking at the above graphic, you can see that a major computation step is converting each 16-bit sample value into 16 ‘high’ or ‘low’ trace values. The trace has 20,000 samples, so 320,000 such calculations are needed, which is still quite do-able on the CPU, but it is not unusual for the sample count to be a million or more, which can result in a very sluggish display, making it difficult to scroll through the data, when looking for a specific data pattern.

Fortunately WebGL version 2 supports a technique called ‘instanced rendering’, which we can use to generate 16 or more trace values from a single data word. So we can feed the GPU with a block of raw sample data, and it will split that word into bits, and draw the appropriate vectors on the display, applying the necessary scaling, offsets and colouring.

Shader programming

Graphics Processor (GPU) programming is generally known as shader programming because the main computational elements are the ‘vertex’ and ‘fragment’ shaders, that take in a stream of data, and output a stream of pixels.

To optimise the shader operation, all the attributes for all the lines are usually stored in a single buffer, which is passed to the shaders as a single block; this allows the shaders to work at maximum speed, without having to interact with the main CPU.

In addition to this data, there are ‘uniform’ variables, that allow slowly-changing data values to be passed from the CPU to the GPU, for example specifying the scale, offset and colour of the individual traces.

Instanced rendering

Traditionally there is a one-to-one relationship between the incoming vertex attributes, and an object plotted on the screen; for example, the attributes may specify a line in 3 dimensions (2 points, each with x, y and z dimensions) and the shaders will draw that line on the display, by colouring in the necessary pixels.

Instanced rendering (in WebGL v2) allows a single set of vertex attributes to generate several copies (‘instances’) of an object. For example, one set of attributes can generate 100 identical objects, which is much more efficient than adding 100 sets of attributes to the buffer.

At first sight, it is difficult to understand how this can be of use; the instances share the same coordinate values, so won’t they all be plotted on top of each other? The answer is yes, they will all be at the same location, unless we use the ‘instance number’ provided by the shader to space them out. So if I’m creating 16 instances from one sample value, they will be numbered 0 to 15. This allows me to assign a specific y-value to each instance, ensuring they are stacked in the display without overlap – and I can also use the instance number as a mask to select the appropriate bit-value to be plotted:

// Simplified vertex shader code

// Array with input data
in float a_data;

// Array with y scale & offset values
uniform vec2 u_scoffs[MAX_CHANS];

// Number of data points
uniform int u_npoints;

// Main shader operation
void main(void) {
    // Convert sample value from float to integer
    int d = int(a_data);

    // Get data bit for this instance (this trace)
    bool hi = (d & (1<<gl_InstanceID)) != 0;

    // Get x-position from vertex num, normalised to +/- 1.0
    float x = float(gl_VertexID) * (2.0/float(u_npoints)) - 1.0;

    // Get y-position from scale & offset array
    float y = u_scoffs[gl_InstanceID][1];

    // Adjust y-value if bit is high
    if (hi)
        y += u_scoffs[gl_InstanceID][0];

    // Set xyz position of this point
    gl_Position = vec4(x, y, 0, 1);

    // Set colour of this point (red if high)
    v_colour = hi ? vec4(0.8, 0.2, 0.2, 1.0) : vec4(0.3, 0.3, 0.3, 1.0);
}

A notable feature is that I’ve specified the input data as being 32-bit floating-point numbers, when the incoming data samples are really 16-bit integers. This is because I had difficulty in persuading the shader code to compile with 16 or 32-bit integer inputs; only floating-point numbers seem to be acceptable for the vertex attributes. There is evidence on the Web to suggest that this is a feature of the shader hardware, and not a shortcoming of the compiler, but regardless of the reason, I’m doing the conversion to and from floating-point values, until I can be sure that integer attributes won’t cause problems.
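The bit selection performed by the shader can be checked on the CPU. The following Python sketch emulates what the 16 instances do with a single sample value, including the float-to-integer conversion; the evenly stacked scale and offset values are hypothetical, and are not the ones calculated by the web page.

```python
def trace_points(sample, scoffs):
    """Emulate the vertex shader for one sample: return one (y, high) pair
    per instance, where scoffs[i] is the (scale, offset) pair for trace i."""
    d = int(float(sample))              # float-to-integer, as in the shader
    points = []
    for instance, (scale, offset) in enumerate(scoffs):
        hi = (d & (1 << instance)) != 0     # instance number used as bit mask
        y = offset + (scale if hi else 0.0)
        points.append((y, hi))
    return points

# Hypothetical layout: 16 traces evenly stacked between -1.0 and +1.0
NCHANS = 16
scoffs = [(0.1, -1.0 + n * 2.0 / NCHANS) for n in range(NCHANS)]
pts = trace_points(0x8005, scoffs)
print([int(hi) for y, hi in pts])
```

With a sample value of 0x8005, only instances 0, 2 and 15 report a high bit, and each instance gets its own y-position from the offset array, so the traces do not overlap.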

The Javascript command to draw the traces is:

gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawArraysInstanced(gl.LINE_STRIP, 0, trace_data.length, num_traces);

The lines are plotted in ‘strip’ mode, so plotting 20,000 line segments requires 20,001 coordinate points, as the end of one line segment is the same as the start of the next line. This has the consequence of making all the rising and falling edges non-vertical, as seen below.

Zoomed-in analyser trace

This display is highly unconventional; logic analysers normally only draw vertical & horizontal lines. However, I quite like this feature, since it acts as a reminder of the limitations due to the sample rate; if a pulse is shown as a triangle, it only changed state for one sample-period.

Canvasses

We need a way of adding annotation to the WebGL display, for example the trace identification boxes on the left-hand side. It is possible to draw these using WebGL, but the process is a bit complex; it is much easier to just overlay a second canvas with a 2D rendering context, and use that API to superimpose lines & text onto the WebGL image, e.g.

<style>
      .container {
         position: relative;
      }
      #graph_canvas {
         position: relative;
         left: 0; top: 0; z-index: 1;
      }
      #text_canvas {
         position: absolute;
         left: 0; top: 0; z-index: 10;
      }
</style></head>
   <body>
     <div class="container">
         <canvas id="graph_canvas"></canvas>
         <canvas id="text_canvas"></canvas>
     </div>
  </body>

The text canvas will remain transparent until we draw on it, for example:

// Get 2D context
var text_ctx = text_canvas.getContext("2d");

// Clear the text canvas
function text_clear() {
    text_ctx.clearRect(0, 0, text_ctx.canvas.width, text_ctx.canvas.height);
}
// Draw a text box, given bottom-left corner & size
function text_box(x, y, w, h, txt) {
    text_ctx.font = h + 'px sans-serif';
    text_ctx.beginPath();
    text_ctx.textBaseline = "bottom";
    text_ctx.rect(x-1, y-h-1, w+2, h);
    text_ctx.stroke();
    text_ctx.fillText(txt, x, y, w);
}

Zoom

An analyser trace may have a million or more 16-bit samples, so we need the ability to magnify an area of interest. In theory I could extract the relevant section of the data, and feed it to the shader as a smaller buffer, but the main thrust of this project is to let the shader do all the hard work, so I always give it the full set of samples, and two ‘uniform’ variables are used to specify an offset into the data, and the number of samples to be displayed.

bool hi = (d & (1<<gl_InstanceID)) != 0;
float x = float(gl_VertexID-u_disp_oset) * (2.0/float(u_disp_nsamp)) - 1.0;
float y = u_scoffs[gl_InstanceID][1];
if (hi)
    y += u_scoffs[gl_InstanceID][0];
gl_Position = vec4(x, y, 0, 1);

To calculate the x-position, gl_VertexID is used, which is an incrementing counter that keeps track of how many samples have been processed; this is combined with u_disp_oset and u_disp_nsamp, which are the offset (in samples) of the start of the data to be displayed, and the number of samples to be displayed. The resulting x-value is normalised, i.e. scaled to within -1.0 and +1.0 for all the points that should be displayed; any points outside that range will be suppressed.

The y-position of a ‘0’ bit is derived from the offset array for each channel; for a ‘1’ bit, a scale value is added on. This allows the main Javascript code to control the position & amplitude of each trace, ensuring they don’t overlap.
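The x-position calculation described above is easy to check with a few numbers. This Python sketch repeats the shader arithmetic for a hypothetical zoom setting of 1000 samples starting at sample 5000; the function names are illustrative only.

```python
def x_position(vertex_id, disp_oset, disp_nsamp):
    """Normalised x-position, as calculated in the vertex shader."""
    return float(vertex_id - disp_oset) * (2.0 / float(disp_nsamp)) - 1.0

def is_visible(x):
    """Points outside the range -1.0 to +1.0 are not displayed."""
    return -1.0 <= x <= 1.0

# Example: display 1000 samples starting at sample 5000
oset, nsamp = 5000, 1000
for vid in (4999, 5000, 5500, 6000, 6001):
    x = x_position(vid, oset, nsamp)
    print(vid, round(x, 3), is_visible(x))
```

The first displayed sample lands on the left-hand edge (-1.0), the last on the right-hand edge (+1.0), and the samples on either side fall just outside the visible range.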

16-channel display

Number of channels

I’ve imposed an arbitrary maximum of 16 channels (i.e. 16 traces in the display). The GPU is capable of plotting many more than that, but there is a limitation due to the fact that we’re sending floating-point values to the vertex shader. A 32-bit float has a 24-bit mantissa, so if we feed in values with more than 24 bits, there is the risk that the floating-point value will be an approximation of the sample data, which is useless for this application, since the precise value of each bit is important. So take care when increasing the number of channels beyond 16; make sure the data being plotted is the same as the data being gathered.
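The precision limit can be demonstrated without a GPU, by converting integers to 32-bit floating-point and back again. This Python sketch uses the standard struct module for the conversion, on the assumption that the shader uses the same IEEE 754 single-precision format; via_float32 is a hypothetical helper name.

```python
import struct

def via_float32(n):
    """Convert an integer to a 32-bit float and back again."""
    return int(struct.unpack("<f", struct.pack("<f", float(n)))[0])

# Every 16-bit sample value survives the round trip unchanged
assert all(via_float32(n) == n for n in range(1 << 16))

# 24-bit values are still exact, but a 25-bit value may not be
print(via_float32((1 << 24) - 1) == (1 << 24) - 1)
print(via_float32((1 << 24) + 1) == (1 << 24) + 1)
```

A similar round-trip test on real capture data is a simple way of confirming that a wider sample format is safe before increasing the channel count.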

The web page includes selection boxes to set the number of channels, and zoom level. The Javascript code automatically adjusts the scale & offset values for each channel; you can resize the window to anything convenient, and they will automatically fill the given canvas size.

8 channel display with zoom

Browser compatibility

The code should run on relatively modern browsers that support WebGL v2, regardless of operating system; I have had success running Chrome 87 and Firefox 61 on Windows 7 & 10, Linux and Android.

With regard to Apple products, older browsers such as Safari 10 don’t work at all, and I’ve had no success with an iPhone 8. However, Safari 14 does work if you specifically enable WebGL 2 in the developer ‘experimental features’, and Chrome on a Macbook Pro runs fine.

Running the code

There is a single HTML file webgl_logic.html, which has all the page formatting, Javascript, and GLSL v2 shader code; it is available on Github here.

When loaded into a suitable browser, it will display 16 waveforms, representing 20,000 samples with a 16-bit binary value incrementing from zero. The display can be animated once by clicking ‘single’, or continuously using ‘run’. You can adjust the zoom level at any time by clicking the selection box, or pressing ‘+’ or ‘-’ on the keyboard. The offset of the zoomed area is modified by clicking on the display area, then using the left and right-arrow keys.

To experiment with larger data sets, you can increase the NSAMP value in the Javascript code. Don’t forget that by default every data sample generates 16 traces, so 1 million samples means the GPU will be generating 16 million vectors; the speed with which it can do this will depend on the specification of your hardware, and size of the display window, but you can get reasonable performance on a typical office PC; an expensive graphics card isn’t necessary.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Remote oscilloscope display using WebGL

Using WebGL for a remote oscilloscope display

In a previous post, I gathered analog data on the Raspberry Pi, and used OpenGL to plot a fast oscilloscope-type graphic. However, in many cases it is inconvenient to have a local display; it would be much better to send the data over the network, to be displayed remotely.

Separating the data-gathering from the data-display has many other advantages. Modern PCs can draw graphics at very high speed, so offloading this task from the Pi makes a lot of sense; it is also possible to view the real-time image on a tablet or mobile phone, removing the need for bulky equipment and cabling.

If you want to display logic waveforms instead of analogue traces, see my other post.

WebGL programming

WebGL works within a Web browser, to provide hardware-accelerated graphics, similar to OpenGL. It is essentially OpenGL ES, with a Javascript wrapper for compatibility with other browser code.

If you are a newcomer to OpenGL, I suggest you read one of the many tutorials on the Web; there are also ‘live’ interactive sites, where you can experiment with code, and immediately see the result.

There are two fundamental approaches to using WebGL graphics; you can treat it as a simple graphics card, and issue sequential commands to individually draw the display items you want, but this approach can be quite slow, as the repeated handovers between Javascript and OpenGL take a significant amount of time.

Alternatively, you can pre-prepare the object to be drawn as an array of 3D data points (known as ‘vertex attributes’), then issue a single call to OpenGL to render those points. This is much faster, as the graphics hardware can work at top speed processing the points.

WebGL vertex processing

Each vertex attribute can have multiple dimensions; I’m using 3 dimensions for each point. The x and y values are normalised to +/- 1.0, so the bottom left corner of the graph is -1, -1, and the top right is +1.0, +1.0. The shaders have a very powerful matrix-arithmetic capability, so it is easy to transform these values into a user-defined scale, but I’ve found this can be very confusing, so I’ve kept the normalised defaults.

The z-value is used to indicate which trace is being drawn; the background grid is z = zero, the first trace is z = 1.0, and so on.

The vertex shader must be instructed what to do with the array of vertices. My previous Raspberry Pi code used ‘strip’ mode (GL_LINE_STRIP), which meant that the vertices form a single continuous polyline, the end of one line segment being used as the start of the next. This had the advantage of minimising the amount of data needed, but meant that I had to perform some z-value tricks to handle the transition from the end of one trace to the start of the next.

To avoid that problem, in this project I’ve used ‘line’ mode (gl.LINES) whereby each line is drawn individually, with a pair of vertices to mark its start & end. This almost doubles the amount of data we have to send to the shader, but does simplify the code.
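To show why ‘line’ mode almost doubles the data, here is a minimal Python sketch that builds the vertex list for one trace, with x values normalised to +/- 1.0 and the trace number in the z-value. It is an illustration of the data layout only; the function name and test data are hypothetical, and the real page builds its arrays in Javascript.

```python
def line_vertices(samples, trace_num):
    """Build a flat [x, y, z, ...] list for 'line' mode: each segment has its
    own start and end vertex, and z carries the trace number."""
    n = len(samples)
    verts = []
    for i in range(n - 1):
        for j in (i, i + 1):                    # start and end of segment
            x = 2.0 * j / (n - 1) - 1.0         # normalise to +/- 1.0
            verts += [x, samples[j], float(trace_num)]
    return verts

# Hypothetical test data: 5 samples already normalised to +/- 1.0
samples = [0.0, 0.5, -0.5, 1.0, -1.0]
verts = line_vertices(samples, 1)
print(len(verts) // 3, "vertices for", len(samples), "samples")
```

Five samples give four line segments and so eight vertices, whereas ‘strip’ mode would need only five; every interior point is stored twice, once as the end of a segment and once as the start of the next.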

WebGL versions

Modern browsers can support WebGL version 1.0 and 2.0. The former is based on OpenGL ES 2.0, the latter on OpenGL ES 3.0. To add to the confusion, the shader language has its own numbering: GLSL ES 1.00 goes with OpenGL ES 2.0, and GLSL ES 3.00 with OpenGL ES 3.0. For brevity, I’ll refer to these as GLSL v1 and v2, matching the WebGL version.

Why does this matter? Unfortunately there are significant differences between the two shader language versions; you can’t create a single source-file that works with both. Many of the examples on the Web don’t specify which version they are targeting, but it is quite easy to tell: if some variables are defined using ‘in’ and ‘out’, then the language is GLSL v2, e.g.

// GLSL v2 definitions:
  in vec3 a_coords;
  out vec4 v_colour;

// GLSL v1 definitions:
  attribute vec3 a_coords;
  varying vec4 v_colour;

I have provided both versions, with a boolean Javascript constant (WEBGL2) to switch between them.

Shader programming

The core of our application is the shader code, that processes the attributes. We are using two shaders; one to convert the vertices into ‘fragment’ data, and the other to convert the fragments into pixels. Since the shader programs are quite short, they can be included as ‘template literal’ strings in the Javascript code. These begin and end with a back-tick (instead of a single or double quote character) and can have multiple lines, with the possibility of variable substitution. The vertex shader code is

vert_code = `#version 300 es
    #define MAX_CHANS ${MAX_CHANS}
    in vec3 a_coords;
    out vec4 v_colour;
    uniform vec4 u_colours[MAX_CHANS];
    uniform vec2 u_scoffs[MAX_CHANS];
    vec2 scoff;
    void main(void) {
        int zint = int(a_coords.z);
        scoff = u_scoffs[zint];
        gl_Position = vec4(a_coords.x, a_coords.y*scoff.x + scoff.y, 0, 1);
        v_colour = u_colours[zint];
    }`;

The 300 ES version definition must be on the first line, or it will be rejected.

The strange-looking MAX_CHANS definition takes the value of a Javascript constant, and creates a matching GLSL definition.

The z-value of each vertex is used to determine the drawing colour by indexing into a ‘uniform’ array, i.e. an array that has been preset in advance by the Javascript code. The z-value is also used to determine the y-value scale and offset (‘scoff’), obtained from another uniform array. So the magnitude and offset of each trace can be individually controlled, by changing the constants loaded into the ‘scoff’ array.
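The effect of the two uniform arrays can be seen by emulating the vertex shader on the CPU. In this Python sketch the scale, offset and colour settings are hypothetical values chosen for illustration, with colours shown as names rather than 4-element vectors.

```python
def vertex_shader(coords, scoffs, colours):
    """Emulate the vertex shader: coords is (x, y, z); returns the output
    position (x, y) and the colour selected by the z-value."""
    x, y, z = coords
    zint = int(z)                       # z-value selects the trace
    scale, offset = scoffs[zint]
    return (x, y * scale + offset), colours[zint]

# Hypothetical settings: index 0 is the grid, 1 and 2 are traces that
# occupy the upper and lower halves of the display
scoffs  = [(1.0, 0.0), (0.5, 0.5), (0.5, -0.5)]
colours = ["grey", "red", "green"]

print(vertex_shader((0.0, 1.0, 1.0), scoffs, colours))
print(vertex_shader((0.0, 1.0, 2.0), scoffs, colours))
```

The same input y-value of 1.0 is plotted at the top of the display for the first trace and at the centre line for the second, purely as a result of the values loaded into the scale and offset array.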

The fragment shader doesn’t do much; it just copies the colour of the fragment into the colour of the pixel:

frag_code = `#version 300 es
    precision mediump float;
    in vec4 v_colour;
    out vec4 o_colour;
    void main() {
        o_colour = v_colour;
    }`;

These programs need to be compiled before use, and there is some simple Javascript code to do this, but that raises the question: if there is an error, how is it displayed, since we’re in a browser? I’ve taken an easy way out and thrown an exception:

// Compile a shader
function compile_shader(typ, source) {
    var s = gl.createShader(typ);
    gl.shaderSource(s, source);
    gl.compileShader(s);
    if (!gl.getShaderParameter(s, gl.COMPILE_STATUS))
        throw "Could not compile " +
              (typ==gl.VERTEX_SHADER ? "vertex" : "fragment") +
              " shader:\n\n"+gl.getShaderInfoLog(s);
    return(s);
}

The error messages are sometimes a bit simplistic, and you may need to be a bit creative when working out what is wrong, so it is recommended that you do frequent re-compiles, to quickly identify any issues.

The shader code and Javascript is all in a single HTML file, that can be directly loaded into a browser from the filesystem; on startup, it displays 2 channels of static test data, to prove that WebGL is working.

HTML

The GLSL shader code, and the associated Javascript program, are encapsulated in a single HTML file, webgl_graph.html. It defines the ‘canvas’ area that will house the WebGL graphic, and some buttons and selectors for the user to control the display, with a preformatted text line to display status information.

  <canvas id = "graph_canvas"></canvas>
  <button id="single_btn"   onclick="run_single(this)">Single</button>
  <button id="run_stop_btn" onclick="run_stop(this)"  >Run</button>
  <select id="sel_nchans" onchange="sel_nchans()"></select>
  <select id="sel_srce"   onchange="sel_srce()"  ></select>
  <pre id="status" style="font-size: 14px; margin: 8px"></pre>

I’ve kept the HTML simple, since the primary focus of this project is to demonstrate the high-speed graphics; there is plenty of scope for adding decoration, style sheets etc. to make a better-looking display.

Data acquisition

The Javascript program must repeatedly load in the data to be displayed, to create an animated graphic. There are security safeguards on the extent to which a browser can load files from a user’s hard disk; it is much easier if the files are loaded from a Web server. So I’m running a Python-based Web server (‘cherrypy’) to provide the HTML & data files. This works equally well on a Windows or Linux PC, as on a Raspberry Pi, so it is possible to test the WebGL rendering on any convenient machine, using simulated data.

In a previous post, I used DMA to stream data in from an analog-to-digital converter (ADC); the output is comma-delimited (CSV) data, sent to a Linux first-in-first-out (FIFO) buffer. There is a single text line, containing floating-point text strings, with interleaved channels, so if there are 2 channels, and channel 1 has a value floating around zero, channel 2 has a value around 1, then the data might look like:

0.000,1.000,0.001,1.000,0.000,1.001 ..and so on until..<LF>

There is an easy way to load in this data using Javascript:

    // Decode CSV string into floating-point array
    function csv_decode(s) {
        var data = s.trim().split(',');
        return data.map(x => parseFloat(x));
    }

For simplicity, it is assumed that there is no whitespace padding between the values; if the file has been generated by an external application (e.g. spreadsheet) some extra string processing will be needed.

The total number of data values is split between the channels, so if there are 1000 samples and 2 channels, each channel has 500 samples.
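The interleaving can be illustrated with a short Python sketch, which decodes one line of CSV and separates it into per-channel lists. This is not the Javascript used in the page; decode_channels() is a hypothetical name, and the handling of empty fields is my own addition.

```python
# Sketch: decode one line of interleaved CSV into per-channel lists.
# decode_channels() is a hypothetical name, not part of the project code.

def decode_channels(line, nchans):
    # float() ignores whitespace around each value; empty fields are skipped
    values = [float(v) for v in line.strip().split(",") if v.strip()]
    # Slice with a stride of nchans to pick out each channel in turn
    return [values[c::nchans] for c in range(nchans)]

chans = decode_channels("0.000,1.000,0.001,1.000,0.000,1.001\n", 2)
print(chans)    # [[0.0, 0.001, 0.0], [1.0, 1.0, 1.001]]
```

With 1000 values and 2 channels, each of the returned lists has 500 entries.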

Web server

The cherrypy Web server is Python-based, and is easy to install. We need to provide a small Python configuration program, which has a class definition, with methods for the resources that are provided, for example:

# Oscilloscope-type ADC data display
class Grapher(object):

    # Index: show oscilloscope display
    @cherrypy.expose
    def index(self):
        return cherrypy.lib.static.serve_file(directory + "/webgl_graph.html")

if __name__ == '__main__':
    cherrypy.config.update({"server.socket_port": portnum, "server.socket_host": "0.0.0.0"})
    conf = {
        '/': {
            'tools.staticdir.root': os.path.abspath(os.getcwd())
        }
    }
    cherrypy.quickstart(Grapher(), '/', conf)

The default index page is defined as webgl_graph.html in the current directory, and the socket_host definition allows that page to be accessed by any system on the network.

When this page is loaded from the Web server, it shows a simulated 2-channel display, and the status line reports the address and port number of the server.

Graph page loaded from Web server

Hitting the ‘single’ button will load 1000 simulated samples in 2 channels from /sim. The server code is:

    @cherrypy.expose
    def sim(self):
        global nresults
        cherrypy.response.headers['Content-Type'] = 'text/plain'
        data = npoints * [0]
        for c in range(0, npoints, nchans):
            data[c] = (math.sin((nresults*2 + c) / 20.0) + 1.2) * ymax / 4.0
            if nchans > 1:
                data[c+1] = (math.cos((nresults*2 + c) / 200.0) + 0.8) * data[c]
                data[c+1] += random.random() / 4.0
        nresults += 1
        rsp = ",".join([("%1.3f" % d) for d in data])
        return rsp

The lower channel is a pure sine wave, the upper is amplitude-modulated with added random noise, resulting in the following display:

Display of simulated data

The above tests will work with the web server & client running on any Linux or Windows PC. Loading real-time data from a hardware source (such as an ADC) is done using a Linux FIFO; the Javascript code is the same as reading from a file, but each read cycle will produce new data:

    # FIFO data source
    @cherrypy.expose
    def fifo(self):
        cherrypy.response.headers['Content-Type'] = 'text/plain'
        try:
            f = open(fifo_name, "r")
            rsp = f.readline()
            f.close()
        except:
            rsp = "No data"
        return rsp

Running the code

Install the cherrypy server using:

# For Python 2
pip install cherrypy
# ..or for Python 3 on the Raspberry Pi
pip3 install cherrypy

Fetch the files adc_server.py and webgl_graph.html from github and load them into any spare directory. Run the server file using Python or Python3, and point your browser at port 8080 of the server, e.g. 192.168.1.197:8080.

If all is well you should see the displays I’ve given above. If access is denied, you may have a firewall issue; if the HTML content is displayed, but the WebGL content isn’t, then check that WebGL is available on the browser, and perhaps set the Javascript WEBGL2 variable false, in order to try using version 1.

On the Raspberry Pi, my previous ADC streaming project can be used to feed real-time data into the FIFO, for example using the following command-line in one console:

# Stream 1000 samples from 2 ADC channels, 10000 samples/sec
sudo rpi_adc_stream -r 10000 -s /tmp/adc.fifo -n 1000 -i 2

Then run the Web server in a second console. If ADC channel 1 is fed with a modulated 1 kHz signal, and channel 2 is the 100 Hz sine-wave modulation, the display might look like:

Display of 2 ADC channels

The data transfer method (continuously re-reading a file on the server) isn’t the most efficient, and does impose an upper limit on the rate at which data can be fetched from the Raspberry Pi. It would probably be better to use an alternative technique such as WebSockets, as I’ve previously explored here.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Fast oscilloscope display using OpenGL on the Raspberry Pi

Pi 4 OpenGL oscilloscope display, 1000 samples, 40k sample/sec

In a previous post, I was reading in a continuous stream of data from an ADC, but found it difficult to display; what I wanted was a real-time animated graph, similar to an oscilloscope display.

A quick search on the Internet suggested that the best way to achieve a good update speed (at least 30 updates per second) is to use the Videocore graphics processing unit (GPU), which is included on all models of the Raspberry Pi.

A high-speed display is useful for spotting noise & glitches in fast-changing data, and allows for the creation of high-resolution displays; for example, the above 10-channel display can be resized into a 1024 x 768 pixel window, whilst retaining a frame-rate around 56 FPS, which is more than adequate.

There are various ways the Videocore GPU can be programmed; unfortunately many of them have complex dependencies, making them difficult to install and use. I’m using FreeGLUT; a simple open-source OpenGL Utility Toolkit (GLUT), that can easily be installed from the latest OS distribution.

There are a very large number of OpenGL tutorials on the Web, and if you are thinking of writing your own code, I strongly recommend you take a look at them; the GPU hardware imposes unique constraints on the programming environment, so although some of the OpenGL code seems to be similar to conventional C programs, in reality there are major differences.

If you’d prefer to have a remote Web-based display, see my WebGL display project.

Shader operation

The process of programming the GPU is generally known as ‘shader programming’, as the two key components are the vertex & fragment shaders.

Put very simply, the vertex shader receives a constant stream of data (‘attributes’) describing the objects to be drawn; this is combined with some static values (‘uniforms’), under the control of the shader program, to produce a stream of pixel information (‘fragments’).

The stream of fragments is fed to the fragment shader, where it is combined with some more ‘uniforms’, under control of the fragment program, to produce the final image on the screen.

In my graphing application, the vertex attributes are a list of points to be plotted; the hardware has native support for 3-dimensional arrays, so I feed in a stream of x, y & z vertex coordinates. You may wonder why I bother with a z coordinate, since the graph is 2-dimensional, but it comes in handy to identify the individual traces. The first trace has a z-value of 1, the next is 2 and so on; this information is combined with some constant ‘uniform’ data, to control the position, scale and colour of each trace. In this way, one large block of xyz data can contain all the information for plotting several traces, without having to stop & restart the shader for each trace.

OpenGL versions

The OpenGL specification has changed a lot over the years, with some very significant differences in the programming. To add to the complication, there are different version numbers for the OpenGL Shading Language (GLSL) and the OpenGL ES Shading Language (also known as GLSL); the latter is a somewhat reduced-functionality version designed to run on simpler hardware.

My code works on OpenGL v2.1 or OpenGLES v3.0, which is available as standard on the ‘Buster’ software distribution. In terms of hardware, the code works well on v3 and v4 boards, but is very slow on earlier versions, or the Pi Zero.

Shader programming

Normally it is necessary to write 3 separate programs; the main C program which is compiled using gcc as usual, and the two GLSL shader programs. These are written in a C-like syntax, but are compiled and linked using the OpenGL tools.

Rather than having 3 inter-dependent files, I’ve included the shader code as strings in the main C program; for example, the first 4 lines of the ES vertex shader code are:

#version 300 es
precision mediump float;
in vec3 coord3d;
flat out vec4 f_color;

These are converted to a string, so they can be included in the main program:

#define SL(s) s "\n"
char vert_shader[] =
    SL("#version 300 es")
    SL("precision mediump float;")
    SL("in vec3 coord3d;")
    SL("flat out vec4 f_color;")
    ..and so on until..
    SL("}");

An additional advantage of this approach is that defined constants can be shared between the main program and shader code. For example, the main code defines a constant with the maximum number of traces to be drawn:

#define MAX_TRACES 17

This definition can be made available in the shader code by using a macro:

// In the main program..
#define VALSTR(s) #s
#define SL_DEF(s) "#define " #s " " VALSTR(s) "\n"

// In the GLSL code string..
SL_DEF(MAX_TRACES)

The rest of the vertex shader program string looks like this:

    SL_DEF(MAX_TRACES)
    SL("uniform vec4 u_colours[MAX_TRACES];")
    SL("uniform vec2 u_scoffs[MAX_TRACES];")
    SL("vec2 scoff;")
    SL("int zint;")
    SL("bool zen;")
    SL("void main(void) {")
    SL("    zint = int(coord3d.z);")
    SL("    zen = fract(coord3d.z) > 0.0;")
    SL("    scoff = u_scoffs[zint];")
    SL("    gl_Position = vec4(coord3d.x, coord3d.y*scoff.x + scoff.y, 0, 1);\n")
    SL("    f_color = zen && zint<MAX_TRACES ? u_colours[zint] : vec4(0, 0, 0, 0);")
    SL("};");

You can see how the integer z-value is used to select the correct scale and offset (‘scoff’) value for each trace data point. The fractional part is used to enable or disable drawing (by setting the alpha value to 1 or 0), allowing the movement between one trace and another without being visible.

The fragment shader doesn’t do much; it just copies the colour value:

char frag_shader[] =
    SL("#version 300 es")
    SL("precision mediump float;")
    SL("flat in vec4 f_color;")
    SL("layout(location = 0) out vec4 fragColor;")
    SL("void main(void) {")
    SL("    fragColor = f_color;")
    SL("}");

The create_shader() function in the main program compiles this code; if there are any problems, a report is produced which goes some way towards identifying the issue, though the error reporting isn’t quite as robust and effective as one would expect from a modern C compiler.

Main program

Pi 3 OpenGL oscilloscope display, 1000 samples

Aside from compiling the shader code, the primary function of the main program is to prepare the list of coordinates that are to be fed into the vertex shader. The coordinates are loaded into a single Vertex Buffer Object (VBO), so that when the shader operation begins, it can access this data at maximum speed.

The shader uses ‘normalised’ coordinates, with the bottom-left corner having the x,y value of -1, -1, and the top right 1, 1, but it is easy to use any other coordinate values, due to the strong support for matrix arithmetic.

First the background grid is drawn using individual lines. Drawing a single line in isolation requires plotting 4 points; a movement to the starting point (with alpha value zero), then setting the alpha value to 1 to start plotting, movement to the end point, then setting the alpha value back to 0. This is a bit inefficient when plotting joined-up lines, but the grid is quite simple, so this doesn’t add much to the overall plotting time.

#define ZEN(z)          ((z) + 0.1)

typedef struct {
    GLfloat x;
    GLfloat y;
    GLfloat z;
} POINT;

// Set x, y and z values for single point
void set_point(POINT *pp, float x, float y, float z)
{
    pp->x = x;
    pp->y = y;
    pp->z = z;
}

// Move, then draw line between 2 points
int move_draw_line(POINT *p, float x1, float y1, float x2, float y2, int z)
{
    set_point(p++, x1, y1, z);
    set_point(p++, x1, y1, ZEN(z));
    set_point(p++, x2, y2, ZEN(z));
    set_point(p++, x2, y2, z);
    return(4);
}
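
The way the fractional z-value switches the drawing on and off can be modelled in a few lines of Python. The names mirror the C code, but this is only an illustrative sketch; is_visible() is a hypothetical stand-in for the fract() test in the vertex shader.

```python
# Sketch of the 4-point 'move then draw' scheme, and the shader's
# decision on whether a vertex is drawn.
import math

def zen(z):
    return z + 0.1                  # Fractional part marks 'drawing enabled'

def move_draw_line(x1, y1, x2, y2, z):
    return [(x1, y1, z),            # Move to start, drawing disabled
            (x1, y1, zen(z)),       # Enable drawing at start
            (x2, y2, zen(z)),       # Draw to end
            (x2, y2, z)]            # Disable drawing at end

def is_visible(z):
    # Equivalent of the shader test: fract(coord3d.z) > 0.0
    return (z - math.floor(z)) > 0.0

pts = move_draw_line(-1.0, 0.0, 1.0, 0.0, 0)
print([is_visible(p[2]) for p in pts])   # [False, True, True, False]
```

The integer part of z is unchanged by zen(), so it still selects the correct colour and scale for the trace.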

Building the software

The FreeGLUT package can be installed from the latest (Buster) distro using:

sudo apt update
sudo apt install freeglut3-dev libglew-dev

There is a single C source file rpi_opengl_graph.c, that is available on Github here. The file can be compiled using:

gcc rpi_opengl_graph.c -Wall -lm -lglut -lGLEW -lGL -o rpi_opengl_graph

The top of the file has some definitions that you might like to change before compiling:

  • LINE_WIDTH: width of plot line (2)
  • GRID_DIVS: the number of x and y divisions in the grid (10,8)
  • MAX_VALS: the maximum number of values that can be displayed (10000)
  • trace_colours: the normalised colour of the grid, and the channels
  • trace_scoffs: the scale & offset values for each trace (set by init_scale_offset)

The normalised colours have floating-point values of 0.0 to 1.0 for red, green and blue; I have provided a COLR macro that normalises the conventional hex colour values that are used on the Web.
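
As an illustration of the normalisation, here is a Python sketch that converts a 24-bit Web colour to floating-point components. It is not the COLR macro itself; the function name and the fixed alpha value of 1.0 are my assumptions.

```python
# Sketch: normalise a 24-bit 0xRRGGBB colour to floats in the range 0.0 to 1.0.
# normalise_colour() is a hypothetical name; alpha of 1.0 is an assumption.

def normalise_colour(rgb, alpha=1.0):
    r = (rgb >> 16) & 0xFF          # Red is the top byte
    g = (rgb >> 8) & 0xFF           # Green is the middle byte
    b = rgb & 0xFF                  # Blue is the bottom byte
    return (r / 255.0, g / 255.0, b / 255.0, alpha)

print(normalise_colour(0xFF0000))   # (1.0, 0.0, 0.0, 1.0)
```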

There are also some command-line options:

-i <num>        Number of input channels: default 2, maximum 16
-n <num>        Number of data values per block: default 1000
-s <name>       Name of input FIFO: default /tmp/adc.fifo
-v              Verbose display for debugging
-y <num>        Maximum y-value for each trace: default 2.0

-display  <val> Standard X display selector
-geometry <val> Standard X display resolution and position

It is important to realise that the given number of data values is split between the number of channels, so if there are 1000 samples and 4 channels, each channel has 250 samples.

The data for the traces is read from a Linux FIFO (as described in a previous post on ADC streaming), in the form of comma-delimited floating-point values. Each line of text represents one set of data for all the channels, so for example there may be 1000 values from 2 channels on one line, in the order ch1, ch2, ch1, ch2, etc. The maximum number of values per line is currently defined in the code as 10,000 and the maximum number of display channels (i.e. oscilloscope traces) is currently 16, though both of these could be increased.

Running the application

The code has been tested on Pi v3 and v4 hardware; it will run on a Pi Zero or 1, but has a really low frame-rate, so isn’t really usable on that platform.

If no data is available (i.e. the Linux FIFO doesn’t exist) the application will plot some static sample traces.

./rpi_opengl_graph
# ..or to specify the display if running remotely..
./rpi_opengl_graph -display :0.0

By default, 1000 points in two traces are plotted in a 300 x 300 pixel window; note the Frames Per Second (FPS) value in the title bar.

You can resize the window by specifying width & height in the standard X command-line format, e.g. for a 640 x 480 pixel window:

./rpi_opengl_graph -geometry 640x480

There is a simple console interface with 2 case-insensitive commands: ‘q’ to quit the application, and ‘p’ (or space-bar) to pause or resume the display updates.

My rpi_adc_stream application from a previous post can be used to supply the data, for example a single channel with 1000 points at 30k sample/s:

In one console:
   sudo ../dma/rpi_adc_stream -r 30000 -s /tmp/adc.fifo -i 1 -n 1000
In a second console:
  ./rpi_opengl_graph -geometry 1024x768 -i 1 -n 1000

The data source has to be run first, otherwise it won’t be detected by the graph utility.

If you don’t have access to this ADC, here is a simple Python program that generates 1000 samples in 2 channels, 50 times a second.

# Simple simulation of ADC feeding Linux FIFO

import math, time, os, signal, sys, random

fifo_name = "/tmp/adc.fifo"
ymax = 2.0
delay = 0.02
nchans = 2
npoints = 1000
running = True
f = None

def remove(fname):
    if os.path.exists(fname):
        os.remove(fname)

def shutdown(sig=None, frame=None):
    print("\nClosing..")
    if f:
        try:
            f.close()
        except OSError:
            pass    # Reader has already closed its end of the FIFO
    remove(fifo_name)
    sys.exit(0)

print("%u samples, %u channels, %3.0f S/s" % (npoints, nchans, npoints/delay))
remove(fifo_name)
data = npoints * [0]
n = 0
signal.signal(signal.SIGINT, shutdown)
os.mkfifo(fifo_name)
try:
    f = open(fifo_name, "w")
except:
    running = False
while running:
    for c in range(0, npoints, nchans):
        data[c] = (math.sin((n*2 + c) / 10.0) + 1.2) * ymax / 4.0
        if nchans > 1:
            data[c+1] = (math.cos((n*2 + c) / 100.0) + 0.8) * data[c]
            data[c+1] += random.random() / 4.0
    n += 1
    s = ",".join([("%1.3f" % d) for d in data])
    try:
        f.write(s + "\n")
        f.flush()
    except:
        running = False
    sys.stdout.write('.')
    sys.stdout.flush()
    time.sleep(delay)
shutdown()

Run this script in one console, then the display application in another console, specifying a suitable window size, e.g.

./rpi_opengl_graph -geometry 640x480

The display shows two traces, one with added noise to illustrate the fast update rate.

rpi_opengl_graph display with adc_sim input

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.

Streaming analog data from a Raspberry Pi

Analog to Digital Converter (ADC) driver software usually captures a single block of samples; if a larger dataset (or continuous stream) is required, it can be very difficult to merge multiple blocks without leaving any gaps.

In this post I describe a utility that runs from the command-line, and performs continuous data capture to a Linux First In First Out (FIFO) buffer, that can be accessed by another Pi program, written in any language. The software also captures a microsecond time-stamp for each data block, that can be used to validate the timing, making sure there are no gaps.
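
To show how the time-stamps might be used, here is a hedged Python sketch that compares the interval between two successive blocks with the expected block duration. The function name and the tolerance are hypothetical; I'm assuming the time-stamp is a free-running 32-bit microsecond count, so the subtraction has to allow for wrap-around.

```python
# Sketch: check successive block timestamps for gaps.
# check_block_gap() and its tolerance are hypothetical; the timestamp is
# assumed to be a free-running 32-bit microsecond count that wraps round.

def check_block_gap(t_prev, t_now, nsamples, sample_rate, tol_usec=50):
    expected = nsamples * 1000000 // sample_rate   # Block duration in usec
    elapsed = (t_now - t_prev) & 0xFFFFFFFF        # Handles 32-bit wrap-around
    return abs(elapsed - expected) <= tol_usec

# 1000 samples at 10000 samples/sec should take 100000 usec
print(check_block_gap(5000, 105000, 1000, 10000))           # True
print(check_block_gap(5000, 205000, 1000, 10000))           # False: a block was missed
print(check_block_gap(0xFFFFFF00, 0x185A0, 1000, 10000))    # True: counter wrapped
```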

To achieve this performance, I’m heavily reliant on Direct Memory Access (DMA) as described in a previous post; if you are a newcomer to the technique, I suggest you experiment with that code first, since it is much simpler.

ADC hardware

AB Electronics ADC DAC Zero on a Pi 3B

For this demonstration I’m using the ‘ADC-DAC Pi Zero’ from AB Electronics; despite the name, it is compatible with the full range of RPi boards. It uses an MCP3202 12-bit ADC with 2 analog inputs, measuring 0 to 3.3 volts at up to 60K samples per second. It also has 2 analog outputs from an MCP4822 DAC; I had planned to include these in the current software, but ran out of time – they may well feature in a future post.

As is common with mid-range ADC boards, it uses the Serial Peripheral Interface zero (SPI0) for data transfers. It has a 4-wire interface (plus ground) comprising transmit & receive data, a clock line, and Chip Enable zero (CE0).

ADC serial protocol

To get a sample from the ADC, it is necessary to drive the Chip Enable (CE) line low, clock in a command, clock out the data, and drive CE high. The SPI clock signal isn’t just used for data transmission, it also controls the internal logic of the ADC, so there is a limit on how fast it can be toggled; the data sheet is a bit vague on this subject (only specifying a limit of 1.8 MHz with 5V supply, and 0.9 MHz with 2.7V), so I’ve used a conservative value of 1 MHz. The data format is a 4-bit command, a null bit, and 12-bit response, making an awkward size of 17 bits. My software ignores the least-significant bit, so uses more convenient 16-bit transfers, with a maximum rate of 60K samples/sec. The command and response format is:

COMMAND:
  Start bit:                 1
  Single-ended mode          1
  Channel number             0 or 1
  M.S. bit first             1
  Dummy bits for response    0 0 0 0 0 0 0 0 0 0 0 0

RESPONSE:
  Undefined bits (floating)  x x x x
  Null bit                   0
  Data bits 11 to 0          x x x x x x x x x x x x

So the command for channel 0 is D0 hex, channel 1 is F0 hex. The following oscilloscope trace shows 2 transfers at 50,000 samples per second; you can see that the CE line goes low one clock cycle before the start of the transaction, and goes high on the last clock edge. This is because I’ve used the automatic-CE capability of the SPI interface, which provides very accurate timings.

ADC readings on a Pi Zero

The voltage is calculated by taking the value from the lower 11 bits, multiplying by the reference voltage, and dividing by the full-scale value, so 0x2AC * 3.3 / 2048 = 1.102 volts.
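
The command and voltage arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and I'm assuming the 16-bit response has already been assembled into a single word, with the first-received bit as the most significant.

```python
# Sketch of the MCP3202 command byte and voltage calculation described above.
# Function names are hypothetical; the 16-bit word is assumed to have been
# assembled with the first-received bit as the most significant.

VREF = 3.3          # Reference voltage
FULL_SCALE = 2048   # 11 bits, since the least-significant ADC bit is ignored

def adc_command(chan):
    # Start bit, single-ended mode, channel number, M.S. bit first
    bits = (1 << 3) | (1 << 2) | ((chan & 1) << 1) | 1
    return bits << 4            # Command occupies the top 4 bits of the byte

def adc_voltage(word):
    value = word & 0x7FF        # Lower 11 bits hold the data
    return value * VREF / FULL_SCALE

print("%02X %02X" % (adc_command(0), adc_command(1)))   # D0 F0
print("%1.3f" % adc_voltage(0x2AC))                     # 1.102
```

Masking with 0x7FF also discards the undefined bits and the null bit at the top of the word, so they can't corrupt the result.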

Raspberry Pi SPI

The SPI controller has the following 32-bit registers:

  • CS (control & status): configuration settings, and status information
  • FIFO (first-in-first-out): 16-word buffers for transmit & receive data
  • CLK (clock divisor): set the clock rate of the SPI interface
  • DLEN (data length): the transmit/receive length in bytes (see below)
  • LTOH (LOSSI output hold delay): not used
  • DC (DMA configuration): set the trigger levels for DMA data requests

The bit fields within these registers are described in the BCM2835 ARM Peripherals document available here, and the errata here; I’ll be concentrating on aspects that aren’t fully described in that document.

CS bits 0 & 1: select chip enable. The terms Chip Enable (CE) and Chip Select (CS) are used interchangeably to describe the hardware line that enables communication with the ADC or DAC chip, but CS is confusing as there is a CS (Control & Status) register as well, so I prefer to use CE. Bits 0 & 1 of that register control which CE line is used; the ADC is on CE0, and the DAC is on CE1.

CS bits 4 & 5: Tx and Rx FIFO clear. When debugging, it is quite common for there to be data left in the FIFOs, so it is a good idea to clear the FIFOs on startup.

CS bit 7: transfer active. When in DMA mode, set this bit to enable the SPI interface for data transfers. The transfer will start when there is data to be transmitted in the FIFO; after the specified length of data has been transferred, this bit will be cleared.

CS bit 8: DMAEN. This does not enable DMA, it just configures the SPI interface to be more DMA-friendly, as I’ll describe below. It isn’t necessary to use DMA when DMAEN is set; when trying to understand how this mode works, I used simple polled code.

CS bit 11: automatically deassert chip select. When set, the SPI interface can automatically frame each 16-bit transfer with the CE line; setting it low before the start, and high at the end, as shown in the oscilloscope trace above.

There is a confusing interaction between the Transfer Active bit (TA), and the Data Length register (DLEN). Basically there are 2 very different ways of setting the data length at the start of a transfer:

  1. If TA is clear, the length (in bytes) must first be set in the DLEN register. Then TA is set, and the transaction will start when there is data in the transmit FIFO.
  2. If TA is set, the DLEN register is ignored. The length (in bytes) must first be written into the FIFO, together with some of the CS register settings, then the transfer will start when data is written to the transmit FIFO.

I generally use the first method, but either is workable providing you have a clear idea of whether the transfer is active or not – don’t forget that TA is automatically cleared when the length becomes zero.

An additional complication comes from the fact that DMA transfers and FIFO registers are 4 bytes wide, but we’re only doing 2-byte transfers to the ADC. The remaining 2 bytes aren’t automatically discarded; they stay in the FIFO to be used by the next transaction. It is possible to use this fact, and economise on memory by having 2 transmit words in one 4-byte memory location, but this can get really confusing (particularly with method 2) so I use a clear-FIFO command in each transfer to remove the extra. This means that the transmit & receive data only uses 16 bits in every 32-bit word.

SPI, PWM and DMA initialisation

To initialise the SPI & PWM controllers, we need to know what master clock frequency they are getting, in order to calculate the divisor values that’ll produce the required output frequencies. The frequencies (in MHz) depend on which Pi hardware version we’re using:

Version   PWM   SPI   REG_BASE     DMA channels used by OS
ZeroW     250   400   0x20000000   0, 2, 4, 6
Zero2     250   250   0x3F000000   0, 2, 3, 4, 6
1         250   250   0x20000000   0, 2, 4, 6
2         250   250   0x3F000000   0, 2, 4, 6
3         250   250   0x3F000000   0, 2, 4, 6
4 or 400  375   200   0xFE000000   2, 11, 12, 13, 14

The channel usage was determined by running my rpi_disp_dma utility, and the PWM & SPI clock values were checked using the rpi_adc_stream application in test mode, as described later in this post.

Sadly, this table isn’t telling the whole truth with regard to the values for SPI master clock. These are the values in normal operation, however if the CPU temperature is too high, its clock frequency is scaled back, and so is the SPI master clock. Mercifully the PWM frequency remains constant, so the sample rate of our code is unaffected, but as you’ll see from the oscilloscope trace above, if we’re running at 50K samples per second, there isn’t a lot of spare time, so if the SPI clock slows down, the transfers could fail to complete, causing garbage data and/or DMA timeouts.

This will only be a problem if you’re working close to the maximum sample rate, and if necessary, there are various workarounds you can use; for example, increase the SPI frequency, since the ADC does seem to tolerate values greater than 1 MHz, or fix the CPU clock frequency by changing the settings in /boot/config.txt.
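
A rough Python sketch of the arithmetic shows how little margin there is. The helper names are hypothetical, the master clock value is taken from the table above, and the calculation ignores the extra clock cycle used for CE framing, so the real margin is slightly smaller.

```python
# Sketch: SPI clock divisor and per-sample time budget.
# Helper names are hypothetical; master clock values come from the table above.

def spi_divisor(master_hz, spi_hz):
    # Round up, so the resulting SPI clock never exceeds the requested value
    return -(-master_hz // spi_hz)

def spare_time_usec(sample_rate, spi_hz, nbits=16):
    period = 1e6 / sample_rate          # Time between samples
    transfer = nbits * 1e6 / spi_hz     # Time to clock one transfer
    return period - transfer

print(spi_divisor(250000000, 1000000))      # 250
print(spare_time_usec(50000, 1000000))      # 4.0
```

If the SPI clock were to drop from 1 MHz to 750 kHz, a 16-bit transfer would take over 21 microseconds, and spare_time_usec(50000, 750000) becomes negative; the transfer can no longer complete within the 20 microsecond sample period.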

The table also includes a list of active DMA channels, obtained by my rpi_disp_dma utility, as described later. Based on this result, I generally use channels 7, 8 & 9 in my code but of course there is no guarantee these will remain unused in any future OS release. If in doubt, run the utility for yourself.

Using DMA

The only way of getting ADC samples at accurately-controlled intervals is to use Direct Memory Access (DMA). Once set up, this acts completely independently of the CPU, transferring data to & from the SPI interface. We probably don’t want to run the ADC flat out, so need a method of triggering it after a specific time delay. In the absence of any hardware timers (surprisingly, the RPi CPU doesn’t have any conventional counter/timers) we’re using the Pulse Width Modulation (PWM) interface for timed triggering (which is generally known as ‘pacing’).

So we need to set up 3 DMA channels; one for transmit data, one for receive data, and one for pacing. I’ve tried to make the process of doing this as simple as possible, with a very clean structure. The DMA Control Blocks (CBs) and data must be in un-cached memory, as described in my previous post, so I’ve simplified the program steps to:

  1. Prepare the CBs and data in user memory.
  2. Copy the CBs and data across to uncached memory.
  3. Start the DMA controllers.
  4. Start the DMA pacing.

To keep the organisation of the variables very clear, they are in a structure that can be overlaid onto both the user and the uncached memory. Here is the code for steps 1 and 2:

typedef struct {
    DMA_CB cbs[NUM_CBS];
    uint32_t samp_size, pwm_val, adc_csd, txd[2];
    volatile uint32_t usecs[2], states[2], rxd1[MAX_SAMPS], rxd2[MAX_SAMPS];
} ADC_DMA_DATA;

void adc_dma_init(MEM_MAP *mp, int nsamp, int single)
{
    ADC_DMA_DATA *dp=mp->virt;
    ADC_DMA_DATA dma_data = {
        .samp_size = 2, .pwm_val = pwm_range, .txd={0xd0, in_chans>1 ? 0xf0 : 0xd0},
        .adc_csd = SPI_TFR_ACT | SPI_AUTO_CS | SPI_DMA_EN | SPI_FIFO_CLR | ADC_CE_NUM,
        .usecs = {0, 0}, .states = {0, 0}, .rxd1 = {0}, .rxd2 = {0},
        .cbs = {
        // Rx input: read data from usec clock and SPI, into 2 ping-pong buffers
            {SPI_RX_TI, REG(usec_regs, USEC_TIME), MEM(mp, &dp->usecs[0]),  4, 0, CBS(1), 0}, // 0
            {SPI_RX_TI, REG(spi_regs, SPI_FIFO),   MEM(mp, dp->rxd1), nsamp*4, 0, CBS(2), 0}, // 1
            {SPI_RX_TI, REG(spi_regs, SPI_CS),     MEM(mp, &dp->states[0]), 4, 0, CBS(3), 0}, // 2
            {SPI_RX_TI, REG(usec_regs, USEC_TIME), MEM(mp, &dp->usecs[1]),  4, 0, CBS(4), 0}, // 3
            {SPI_RX_TI, REG(spi_regs, SPI_FIFO),   MEM(mp, dp->rxd2), nsamp*4, 0, CBS(5), 0}, // 4
            {SPI_RX_TI, REG(spi_regs, SPI_CS),     MEM(mp, &dp->states[1]), 4, 0, CBS(0), 0}, // 5
        // Tx output: 2 data writes to SPI for chan 0 & 1, or both chan 0
            {SPI_TX_TI, MEM(mp, dp->txd),          REG(spi_regs, SPI_FIFO), 8, 0, CBS(6), 0}, // 6
        // PWM ADC trigger: wait for PWM, set sample length, trigger SPI
            {PWM_TI,    MEM(mp, &dp->pwm_val),     REG(pwm_regs, PWM_FIF1), 4, 0, CBS(8), 0}, // 7
            {PWM_TI,    MEM(mp, &dp->samp_size),   REG(spi_regs, SPI_DLEN), 4, 0, CBS(9), 0}, // 8
            {PWM_TI,    MEM(mp, &dp->adc_csd),     REG(spi_regs, SPI_CS),   4, 0, CBS(7), 0}, // 9
        }
    };
    if (single)                                 // If single-shot, stop after first Rx block
        dma_data.cbs[2].next_cb = 0;
    memcpy(dp, &dma_data, sizeof(dma_data));    // Copy DMA data into uncached memory

The initialised values are assembled in dma_data, then copied into uncached memory at dp. The control blocks are at the start of the structure, to be sure they’re aligned to a 32-byte boundary. Then there is the data to be transmitted, followed by storage for the timestamps, completion states and received samples, which is marked as ‘volatile’ since it will be modified by DMA.

The format of a control block is:

  • Transfer Information (TI): address increment, trigger signal (data request), etc.
  • Source address
  • Destination address
  • Transfer length (in bytes)
  • Stride: skip unused values (not used)
  • Next Control Block: zero if last block
  • Debug: additional diagnostics
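
As a rough illustration of that layout, here is a small Python sketch that packs the seven fields into a single block. It assumes each field is a 32-bit little-endian word, and that an eighth unused word pads the block out to 32 bytes; the helper name pack_cb and all the example values are made up for the demonstration, they are not taken from the C source.

```python
import struct

# 7 fields plus 1 padding word, all 32-bit little-endian (assumed layout)
CB_FORMAT = "<8I"

def pack_cb(ti, srce_ad, dest_ad, tfr_len, stride=0, next_cb=0, debug=0):
    """Pack one DMA control block into bytes (illustrative helper)."""
    return struct.pack(CB_FORMAT, ti, srce_ad, dest_ad, tfr_len,
                       stride, next_cb, debug, 0)

# Dummy values; on real hardware the addresses must be bus addresses
cb = pack_cb(ti=0x00000001, srce_ad=0x00001000, dest_ad=0x00002000,
             tfr_len=4, next_cb=0x00003020)

print(len(cb))                              # Size of one block in bytes
print(struct.unpack(CB_FORMAT, cb)[3])      # Transfer length field
```

With eight 32-bit words per block, an array of them stays on 32-byte boundaries provided the first one starts on such a boundary.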

Looking at the first control block (CB 0) in detail:

#define SPI_RX_TI       (DMA_SRCE_DREQ | (DMA_SPI_RX_DREQ << 16) | DMA_WAIT_RESP | DMA_CB_DEST_INC)

{SPI_RX_TI, REG(usec_regs, USEC_TIME), MEM(mp, &dp->usecs[0]),  4, 0, CBS(1), 0}, // 0

Transfer info:       wait for data request from SPI receiver
Source address:      microsecond counter register
Destination address: memory
Transfer length:     4 bytes
Stride:              not used
Next control block:  CB 1
Debug:               not used

The source and destination addresses are more complex than usual, since they must be bus address values, created using a macro that takes a pointer to a block of mapped memory, and the offset within that block.
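
To show the idea behind that address translation, here is a minimal Python sketch. It assumes the conversion is a simple offset calculation (pointer minus the start of the mapped block, plus the base address that DMA sees); the function name is hypothetical, and the two base values are just borrowed from the sample program output later in this post.

```python
def bus_address(virt_base, bus_base, virt_ptr):
    """Translate a pointer inside a mapped block to the address DMA would use."""
    offset = virt_ptr - virt_base
    if offset < 0:
        raise ValueError("pointer is below the start of the mapped block")
    return bus_base + offset

# Base addresses borrowed from the sample output (assumed to be a matching pair)
VIRT_BASE = 0xb6f5f000
BUS_BASE = 0xde50f000

# A pointer 32 bytes into the block, e.g. the second of two 32-byte items
print(hex(bus_address(VIRT_BASE, BUS_BASE, VIRT_BASE + 32)))
```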

For this application, we need to keep re-transmitting the same bytes to request the data, but reception is in the form of long blocks of data; I’ve specified 2 blocks, that form a ‘ping-pong’ buffer, with the microsecond timestamp being stored at the start of each block, and a completion flag at the end. Ideally, the user code will be emptying one buffer while the other is being filled by DMA, but if the code is too slow, the overrun condition can be detected, and the data discarded.
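
The consumer side of a ping-pong buffer can be sketched in a few lines of Python. This is only a model of the idea, not the C code from the streamer: the function name is hypothetical, and it assumes that finding both completion flags set at once means an overrun, in which case everything is discarded.

```python
def next_block(states, rxd, last):
    """Return (index, samples) for the next filled buffer, or (last, None).

    states: two completion flags, non-zero when a buffer is full
    rxd:    the two sample buffers
    last:   index of the buffer that was read most recently
    """
    if states[0] and states[1]:
        # Both full: the reader fell behind, so discard the data (assumed policy)
        states[0] = states[1] = 0
        return last, None
    n = 1 - last                    # Buffers are expected to fill alternately
    if states[n]:
        samples = list(rxd[n])      # Copy out before releasing the buffer
        states[n] = 0
        return n, samples
    return last, None               # Nothing new yet

# Buffer 0 has just been filled, buffer 1 was read previously
states = [1, 0]
rxd = [[685, 686], [0, 0]]
idx, data = next_block(states, rxd, last=1)
print(idx, data)
```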

Starting DMA

When we start the 3 DMA channels, they will all remain idle until the condition specified in TI is fulfilled:

    init_pwm(PWM_FREQ, pwm_range, PWM_VALUE);   // Initialise PWM, with DMA
    *REG32(pwm_regs, PWM_DMAC) = PWM_DMAC_ENAB | PWM_ENAB;
    *REG32(spi_regs, SPI_DC) = (8<<24) | (1<<16) | (8<<8) | 1;  // Set DMA priorities
    *REG32(spi_regs, SPI_CS) = SPI_FIFO_CLR;                    // Clear SPI FIFOs
    start_dma(mp, DMA_CHAN_C, &dp->cbs[6], 0);  // Start SPI Tx DMA
    start_dma(mp, DMA_CHAN_B, &dp->cbs[0], 0);  // Start SPI Rx DMA
    start_dma(mp, DMA_CHAN_A, &dp->cbs[7], 0);  // Start PWM DMA, for SPI trigger

To set the data-gathering in motion, we just enable PWM.

// Start ADC data acquisition
void adc_stream_start(void)
{
    start_pwm();
}

This sends a data request, which is fulfilled by DMA channel A (CB7), and nothing else happens; the SPI interface remains idle. However, on the next PWM timeout, CBs 8 & 9 are executed, which loads a value of 2 into the DLEN register, and sets the SPI transfer active. This triggers a request for Tx data from DMA channel C (CB6); when the first 2 bytes have been transferred, DMA channel B is triggered to store the microsecond timestamp (CB0), and the data (CB1). Since the transfer is no longer active, the DMA channels will all wait for their trigger signals, and the cycle will repeat, except that CB1 is storing the incoming ADC data in a single block.

Once the required number of samples have been received, CB2 sets a flag to indicate the buffer is full, then CB3 stores a new timestamp, and CB4 starts filling the other buffer.
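
If you find the chaining hard to follow, it can help to model the ‘next control block’ links as a simple table and walk through them. The Python sketch below does just that; the link values are copied from the control block list above, but the walk function is purely illustrative and has nothing to do with how the hardware is programmed.

```python
# next_cb links copied from the control block table: CB index -> next CB index
NEXT_CB = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 0,   # Rx ping-pong chain
           6: 6,                                  # Tx block repeats itself
           7: 8, 8: 9, 9: 7}                      # PWM pacing chain

def walk(start, links, max_steps=12):
    """List the CBs a channel executes, stopping at a null link or a repeat."""
    seen, cb = [], start
    while cb is not None and cb not in seen and len(seen) < max_steps:
        seen.append(cb)
        cb = links.get(cb)
    return seen

print("Rx :", walk(0, NEXT_CB))
print("Tx :", walk(6, NEXT_CB))
print("PWM:", walk(7, NEXT_CB))

# Single-shot mode clears the link after CB 2, so the Rx chain stops there
single = dict(NEXT_CB)
single[2] = None
print("Rx single-shot:", walk(0, single))
```

The three chains never share a control block, which is why each DMA channel can be started on its own CB and left to run.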

Compiling and running the code

The C source code for the streaming application rpi_adc_stream and the DMA detection application rpi_disp_dma are on github here. You’ll also need the utility files rpi_dma_utils.c and rpi_dma_utils.h from the same directory.

Edit the top of rpi_dma_utils.h to indicate which hardware version you are using (0 to 4, or 2 for the Zero2). The applications are compiled using a minimal command line:

gcc -Wall -o rpi_disp_dma rpi_disp_dma.c rpi_dma_utils.c
gcc -Wall -o rpi_adc_stream rpi_adc_stream.c rpi_dma_utils.c

You can add extra compiler options such as -O2 for code optimisation, but this isn’t really necessary.

Both of the utilities have to be run using ‘sudo’, as they require root privileges.

DMA channel scan

The DMA scan is run as follows:

Command:
  sudo ./rpi_disp_dma
Response (Pi ZeroW):
  DMA channels in use: 0 2 4 6

There is only one command line option, ‘-v’ for verbose operation, which prints out all the DMA register values.

By default, DMA_CHAN_A, B and C are defined in rpi_dma_utils.h as channels 7, 8 and 9, so should not conflict with those used by the OS.

ADC streaming

There are various command-line options, but it is suggested that you start by using the -t option to check the SPI and PWM interfaces are running correctly:

Command:
  sudo ./rpi_adc_stream -t
Response:
  RPi ADC streamer v0.20
  VC mem handle 5, phys 0xde50f000, virt 0xb6f5f000
  Testing 1.000 MHz SPI frequency:   1.000 MHz
  Testing   100 Hz  PWM frequency: 100.000 Hz
  Closing

A small error in the reading (e.g. 100.010 Hz) doesn’t indicate a fault; it is just due to the limited resolution of the timer that is making the measurement.

The command-line options are case-insensitive:

-F <num>    Output format, default 0. Set to 1 to enable microsecond timestamps.
-I <num>    Number of input channels, default 1. Set to 2 if both channels required.
-L          Lockstep mode. Only output streaming data when the Linux FIFO is empty.
-N <num>    Number of samples per block, default 1.
-R <num>    Sample rate, in samples per second, default 100.
-S <name>   Enable streaming mode, using the given FIFO name.
-T          Test mode
-V          Verbose mode. Enable hexadecimal data display.

Running the utility with no arguments will perform a single conversion on the first ADC channel (marked ‘IN1’):

Command:
  sudo ./rpi_adc_stream
Response:
  RPi ADC streamer v0.20
  VC mem handle 5, phys 0xde50f000, virt 0xb6fd1000
  SPI frequency 1000000 Hz
  ADC value 686 = 1.105V
  Closing

If the input isn’t connected to anything, you will get a random result; either short-circuit the input pins, or connect them to a known voltage source (less than 3.3V) to get a proper reading.

To stream the voltage values, it is necessary to specify the number of samples per block, the sample rate, and a Linux FIFO name; you can choose (almost) any name you like, but it is recommended to put the FIFO in the /tmp directory, e.g.

Command:
  sudo ./rpi_adc_stream -n 10 -r 20 -s /tmp/adc.fifo
Response:
  RPi ADC streamer v0.20
  VC mem handle 5, phys 0xde50f000, virt 0xb6f7e000
  Created FIFO '/tmp/adc.fifo'
  Streaming 10 samples per block at 20 S/s

The software is now waiting for another application to open the Linux FIFO, before it will start streaming. The FIFO is very similar to a conventional file, so some of the standard file utilities can be used, e.g. ‘cat’ to print the file. Open a second Linux console, and in it type:

Command:
  cat /tmp/adc.fifo
Response (with 1.1V on ADC 'IN1'):
  1.102,1.104,1.104,1.102,1.104,1.104,1.110,1.104,1.102,1.102
  1.105,1.104,1.104,1.104,1.105,1.102,1.102,1.104,1.104,1.104
  ..and so on, at 2 blocks per second..

Hit ctrl-C to stop this command, and you’ll see that the streamer can detect that there is nothing reading the FIFO, so reports ‘stopped streaming’, though it does continue to fetch data using DMA, since this has minimal impact on any other applications.

You’ll note that it hasn’t been necessary to run the data display command using ‘sudo’; it works fine from a normal user account. It is important to limit the amount of code that has to run with root privileges, and the Linux FIFO interface is a handy way of achieving this.
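
The same FIFO can be read from your own program, without root privileges. Here is a minimal Python sketch that parses each line of comma-separated voltages into a list of numbers; the function name is my own invention, and the demonstration uses an in-memory string in place of the FIFO so it can be tried without any hardware.

```python
import io

def read_blocks(stream):
    """Yield each line of comma-separated voltages as a list of floats."""
    for line in stream:
        line = line.strip()
        if line:                    # Skip any blank lines
            yield [float(v) for v in line.split(",")]

# Stand-in for the FIFO, using a few of the sample values shown above
demo = io.StringIO("1.102,1.104,1.104\n1.105,1.104,1.104\n")
for block in read_blocks(demo):
    print(len(block), "samples, mean %.3f V" % (sum(block) / len(block)))
```

To read live data, pass open("/tmp/adc.fifo") to read_blocks instead of the demo string; the loop will then wait for each new block from the streamer.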

There is a ‘-f’ format option that controls the way the data is output. Currently there is only one possibility, ‘-f 1’, which enables a microsecond timestamp on each block of data, e.g.

Command in console 1:
  sudo ./rpi_adc_stream -n 1 -r 10 -f 1 -s /tmp/adc.fifo
Response:
  Streaming 1 samples per block at 10 S/s

Command in console 2:
  cat /tmp/adc.fifo
Response in console 2 (with 1.1 volt input):
  0,1.102
  100000,1.104
  200000,1.102
  300001,1.105
  400001,1.104
  ..and so on, at 10 lines per second

The timestamp started at zero, then incremented by 100,000 microseconds every block. It is a 32-bit number, which wraps around after 2^32 microseconds, so if you want to measure times longer than 71 minutes, you will need to detect when the value has wrapped around.
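
Detecting the wrap-around is straightforward if you process the timestamps in order. The following Python sketch shows one way to do it; the function name is hypothetical, and it assumes that consecutive blocks are never more than one full wrap period apart, so any decrease in value must be a rollover.

```python
WRAP = 1 << 32      # A 32-bit microsecond counter wraps at 2**32

def unwrap(timestamps):
    """Convert wrapping 32-bit timestamps into ever-increasing values."""
    out, offset, prev = [], 0, None
    for t in timestamps:
        if prev is not None and t < prev:
            offset += WRAP          # Counter rolled over since the last block
        out.append(t + offset)
        prev = t
    return out

print("Wrap period: %.1f minutes" % (WRAP / 1e6 / 60))

# Three blocks 100,000 microseconds apart, straddling the wrap point
print(unwrap([4294900000, 32704, 132704]))
```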

If 2 input channels are enabled using ‘-i 2’, then the overall sample rate remains unchanged, so each channel gets half the samples. In the following example, I’ve also enabled verbose mode, to see the ADC binary data:

Command in console 1:
  sudo ./rpi_adc_stream -n 2 -i 2 -r 10 -f 1 -s /tmp/adc.fifo -v
Response in console 1:
  Streaming 2 samples per block at 10 S/s
Response when streaming starts:
  Started streaming to FIFO '/tmp/adc.fifo'
  F2 AD 00 00 F0 01 00 00
  F2 AE 00 00 F0 01 00 00
  F2 AE 00 00 F0 01 00 00
  F2 AE 00 00 F0 00 00 00
  ..and so on..

Command in console 2:
  cat /tmp/adc.fifo
Response in console 2 (IN1 is 1.1 volts, IN2 is zero):
  1.104,0.002
  1.105,0.002
  1.105,0.002
  1.105,0.000
  ..and so on..
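
The relationship between the hex bytes and the voltages can be checked with a few lines of Python. Be warned that the scaling here is my inference from the sample output above, not something copied from the C source: I have assumed each sample occupies 4 bytes, with the ADC value in the first two (most-significant byte first), masked to 11 bits and scaled to a 3.3 volt reference. Both function names are hypothetical.

```python
def raw_to_volts(msb, lsb, bits=11, vref=3.3):
    """Convert the first two bytes of a 4-byte sample to a voltage (assumed scaling)."""
    value = ((msb << 8) | lsb) & ((1 << bits) - 1)
    return value * vref / (1 << bits)

def decode_line(hex_line):
    """Decode one line of the verbose hex dump into a list of voltages."""
    data = bytes.fromhex(hex_line)      # Spaces between bytes are ignored
    return [raw_to_volts(data[i], data[i + 1]) for i in range(0, len(data), 4)]

print(["%.3f" % v for v in decode_line("F2 AD 00 00 F0 01 00 00")])
```

With these assumptions, the first hex line decodes to 1.104 and 0.002 volts, which agrees with the first line of FIFO output.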

Displaying streaming data

It’d be nice to view the streaming data in a continually-updated graph, similar to an oscilloscope display, but surprisingly few graphing utilities can handle a continuous flow of data – or they can only handle it at a very low rate.

Here are a few graphing utilities I’ve tried; they perform reasonably well on fast hardware, but struggle to maintain a good-quality graph on slower boards such as the Pi Zero – there is no problem with the data acquisition, it is just that the graphical display is very demanding.

Trend display

There is a Linux utility called ‘trend’, that can dynamically plot streaming data.

Trend display of a 50 Hz analog signal, 5000 samples per second

It has a wide range of options, and keyboard shortcuts, that I haven’t yet explored. The above graph was generated on a Pi 4 using the following command in one console:

sudo ./rpi_adc_stream -n 1 -l -r 5000 -s /tmp/adc.fifo

Then in a second console, the application is installed and run:

sudo apt install trend
cat /tmp/adc.fifo | trend -A f0f0f0 -I ff0000 -E 0 -s -v - 1200 600

This application is quite demanding on CPU resources, so if you are using a Pi 3, you’ll probably need to drop the sample rate to 2000.

Termeter display

Termeter is a really useful text-based dynamic display utility, written in the Go language.

You may wonder why I’m using a text-based console application to produce a graph, but it has two key advantages; it is very fast, and works on any Pi console. So if you are running the Pi ‘headless’ (i.e. remotely, with no local display) and you want to look at your streaming data, you can run termeter on a remote console (e.g. ‘putty’ on Windows) without the complexity of setting up an X display server.

It is installed using:

cd ~
sudo apt install golang
go get github.com/atsaki/termeter/cmd/termeter

The above data (1 sample per block, 5000 samples per second) was generated on a Pi 4 by running in one console:

sudo ./rpi_adc_stream -n 1 -r 5000 -s /tmp/adc.fifo

Then the display is started in a second console:

cat /tmp/adc.fifo | ~/go/bin/termeter

On a Pi 3, you might have to drop the sample rate to 2000, and even further on a Pi Zero.

Plotting in Python

Python plot of streaming data

Here is a very simple example that uses NumPy and Matplotlib to create a dynamically-updated graph of ADC data (a 10 Hz sine wave, at 200 samples per second, on a Pi 4). In one terminal, the data is generated by running:

sudo ./rpi_adc_stream -n 100 -r 200 -l -s /tmp/adc.fifo

Then run the following program in a second terminal (assuming you’ve installed Matplotlib and NumPy):

import numpy as np
from matplotlib import pyplot, animation

fifo_name = "/tmp/adc.fifo"
npoints  = 100
interval = 500
xlim     = (0, 1)
ylim     = (0, 3.5)

fifo = open(fifo_name, "r")
fig = pyplot.figure()
ax = pyplot.axes(xlim=xlim, ylim=ylim)
line, = ax.plot([], [], lw=1)

def init():
    line.set_data([], [])
    return line,

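def animate(i):
    # Read one block (one line of CSV) from the FIFO, and convert to an array
    x = np.linspace(0, 1, npoints)
    y = np.fromstring(fifo.readline(), sep=',')
    if len(y) == npoints:           # Ignore incomplete blocks
        line.set_data(x, y)
    return line,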

anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=npoints, interval=interval, blit=True)
pyplot.show()

The ‘readline’ function fetches a single line of comma-delimited data, which ‘fromstring’ converts to a NumPy array.

The ‘animate’ function is used to continuously refresh the graph, but this approach is only suitable for low update rates. The time taken to do the plot is quite significant, and there is an inherent conflict between the data rate set by the streamer and the display rate set by the animation, which causes the display to stall, especially on a single-core Pi Zero. A multi-threaded program is needed to coordinate the display updates with the incoming data.
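
To give an idea of what such a multi-threaded arrangement might look like, here is a minimal Python sketch in which a background thread reads the incoming blocks, and the display code just asks for the newest one whenever it is ready to redraw. This is not the approach I ended up using; the class name and its methods are hypothetical, and an in-memory string stands in for the FIFO.

```python
import io
import threading

class LatestBlock:
    """Background reader that keeps only the most recent block of samples."""
    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()
        self._block = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        for line in self._stream:
            values = [float(v) for v in line.split(",") if v.strip()]
            with self._lock:
                self._block = values    # Older unread blocks are dropped

    def join(self, timeout=None):
        self._thread.join(timeout)

    def take(self):
        """Return the newest block once, or None if nothing new has arrived."""
        with self._lock:
            block, self._block = self._block, None
        return block

# Stand-in for the FIFO; a real program would pass open("/tmp/adc.fifo")
reader = LatestBlock(io.StringIO("1.102,1.104\n1.105,1.104\n")).start()
reader.join()                   # Demo only: wait for the reader to finish
print(reader.take())            # Newest block
print(reader.take())            # Nothing new since the last call
```

The display loop would call take() at its own pace and skip the redraw when it returns None, so a slow plot drops blocks instead of stalling the reader.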

Update

The display problem has been solved by creating a fast oscilloscope-type viewer for the streaming data, using OpenGL.

WebGL oscilloscope display

Full details and source code are here, and there is a WebGL version that works remotely in a browser here.

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.