RP2040 – Lean2

Accurate frequency & timing measurement on the Pi Pico RP2040 using Python

In a previous post, I looked at ways of measuring frequency accurately using an RP2040 CPU programmed in C. In this project, I have re-coded those functions in MicroPython, and provided a few enhancements.

I use Direct Memory Access (DMA) to minimise the CPU workload, and avoid the need for interrupts, but this raises a problem: when programming in C, there is a substantial Application Programming Interface (API) that allows everything on the RP2040 chip to be configured using simple function calls. However, MicroPython aims to be more-or-less compatible with a wide range of CPUs, so not much CPU-specific code has been included. This means that we have to do some low-level programming if we want to use DMA, or the more obscure functions of the RP2040 Pulse Width Modulation (PWM) peripheral.

My first attempt at ADC and DMA low-level programming used MicroPython ‘uctypes’ to create a model of the peripherals, and if you are interested in the nuts-and-bolts of this method, I suggest you browse the post here.

This approach produced excellent results, but wasn’t very ‘pythonic’, so I’ve encapsulated common API functions in Python classes; when choosing function names, I’ve copied those in the C API when possible. So for example, setting the PWM peripheral in C is:

#define PWM_GPIO_PIN 3
uint slice = pwm_gpio_to_slice_num(PWM_GPIO_PIN);
pwm_config cfg = pwm_get_default_config();
pwm_config_set_clkdiv_mode(&cfg, PWM_DIV_B_RISING);
// ..and any other settings, then..
pwm_init(slice, &cfg, false);

In MicroPython this is done by instantiating a device from the PWM class:

import pico_devices as devs
PWM_GPIO_PIN = 3
pwm = devs.PWM(PWM_GPIO_PIN)
pwm.set_clkdiv_mode(devs.PWM_DIV_B_RISING)
# ..and any other settings..

Not only are the function names very similar between C and Python, but the underlying code is similar too; for example, the set_clkdiv_mode function is defined as:

class PWM:
    def set_clkdiv_mode(self, mode):
        self.slice.CSR.DIVMODE = mode

where ‘slice’ is the (somewhat unusual) name for a PWM channel, CSR is the Control and Status Register, and DIVMODE is a 2-bit field within that register. If you want to learn more about the internal structure of the PWM peripheral, I suggest you take a look at the C language post.

The Python classes have been kept very ‘lean’, with no error-checking of the parameters; when time permits, I intend to create a sub-class that provides comprehensive parameter-checking.

Test signal

We need a waveform for the tests, which the RP2040 PWM peripherals can easily provide. The following code generates a 100 kHz square wave on GPIO pin 4:

PWM_OUT_PIN = 4             # GPIO pin for output
PWM_DIV = 125               # 125e6 / 125 = 1 MHz
PWM_WRAP = 9                # 1 MHz / (9 + 1) = 100 kHz
PWM_LEVEL = (PWM_WRAP+1)//2 # 50% PWM

# Start a PWM output
def pwm_out(pin, div, level, wrap): 
    devs.gpio_set_function(pin, devs.GPIO_FUNC_PWM)
    pwm = devs.PWM(pin)
    pwm.set_clkdiv_int_frac(div, 0)
    pwm.set_wrap(wrap)
    pwm.set_chan_level(pwm.gpio_to_channel(pin), level)
    pwm.set_enabled(1)
    return pwm

test_signal = pwm_out(PWM_OUT_PIN, PWM_DIV, PWM_LEVEL, PWM_WRAP)
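The comments in the settings above follow the usual RP2040 PWM arithmetic: the clock is divided by the prescaler, then by (wrap + 1). As a quick sanity check that runs on any Python interpreter, here is a minimal sketch; the helper name pwm_output_freq and the 125 MHz clock constant are illustrative assumptions, and are not part of pico_devices:

```python
SYS_CLOCK_HZ = 125000000     # Default RP2040 system clock (assumed)

# Illustrative helper (not part of pico_devices): PWM output frequency
def pwm_output_freq(div, wrap, clock_hz=SYS_CLOCK_HZ):
    return clock_hz / (div * (wrap + 1))

PWM_DIV = 125
PWM_WRAP = 9
PWM_LEVEL = (PWM_WRAP + 1) // 2

freq = pwm_output_freq(PWM_DIV, PWM_WRAP)
duty = PWM_LEVEL / (PWM_WRAP + 1)
print("Frequency %.1f kHz, duty %.0f%%" % (freq / 1e3, duty * 100))
```

Changing PWM_DIV or PWM_WRAP here is a convenient way to check a new test frequency before loading it onto the Pico.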

The choice of GPIO pin number is quite arbitrary; you just need to be aware when programming PWM that there are 16 channels (technically 8 slices, each with 2 channels) mapped onto 30 GPIO pins, so most channels appear on two pins. If I’m using GPIO pin 4 for this signal, I can’t use PWM on GPIO 20. However, that pin is still available for any other purpose, so the limitation usually isn’t a problem.
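The pin-sharing can be seen from the way pin numbers map onto slices and channels. The sketch below mirrors the mapping used by the C API (slice from the pin number divided by 2, channel from the bottom bit); the function names slice_of and channel_of are stand-ins for illustration, not the pico_devices methods:

```python
# Illustrative stand-ins for the pin-to-PWM mapping
def slice_of(pin):
    return (pin >> 1) & 7      # 8 slices, repeating from GPIO 16 onwards

def channel_of(pin):
    return pin & 1             # 0 = channel A, 1 = channel B

for pin in (4, 20, 3):
    print("GPIO %2u: slice %u channel %s" % (pin, slice_of(pin), "AB"[channel_of(pin)]))
```

GPIO 4 and GPIO 20 both resolve to slice 2 channel A, which is why they can’t carry independent PWM signals.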

16-bit counter

The RP2040 PWM peripheral can also act as a pulse counter, and by counting the number of edges in a given time, we can establish the frequency. This pulse input capability is only available on odd GPIO pin numbers (i.e. channel B of a PWM ‘slice’).

The code is quite simple; it is only necessary to set the mode, and the frequency divisor:

# Initialise PWM as a pulse counter (gpio must be odd number)
def pulse_counter_init(pin, rising=True):
    devs.gpio_set_function(pin, devs.GPIO_FUNC_PWM)
    ctr = devs.PWM(pin)
    ctr.set_clkdiv_mode(devs.PWM_DIV_B_RISING if rising else devs.PWM_DIV_B_FALLING)
    ctr.set_clkdiv(1)
    return ctr

Now we can make a simple frequency meter, by clearing the count to zero, then enabling the counter for a specific period of time:

# Enable or disable pulse counter
def pulse_counter_enable(ctr, en):
    if en:
        ctr.set_ctr(0)
    ctr.set_enabled(en)

# Get value of pulse counter
def pulse_counter_value(ctr):
    return ctr.get_counter()

pulse_counter_enable(counter, True)
time.sleep(0.1)
pulse_counter_enable(counter, False)
val = pulse_counter_value(counter)
print("Sleep 0.1s, count %u" % val)

The count-value for a 100 kHz signal and a 100 millisecond sleep is theoretically 10000, but is actually around 10015, due to the time-delays associated with the Python instructions. I’ll be describing ways to eliminate these delays later on in this post.
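To put that discrepancy into perspective, the count can be converted to a frequency and a percentage error. This is only a sketch of the arithmetic, with an illustrative helper name:

```python
# Illustrative helper: convert an edge count to a frequency
def count_to_freq(count, gate_sec):
    return count / gate_sec

ideal = count_to_freq(10000, 0.1)     # Expected count for 100 kHz
actual = count_to_freq(10015, 0.1)    # Typical count using sleep()
error_pct = 100.0 * (actual - ideal) / ideal
print("Measured %.0f Hz, error %.2f%%" % (actual, error_pct))
```

An excess of 15 counts at 100 kHz corresponds to the counter being enabled for roughly 150 microseconds longer than intended.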

32-bit counter

Unfortunately the PWM peripheral has only a 16-bit counter, which can be too short for high-frequency signals or long sample-times. In the C code I polled the count value to count the number of times it rolled over past 65535, then added 65536 for each rollover to the final result.
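The polling technique can be modelled in plain Python. This is a simplified sketch with an invented class name, not the code from the C post; it only works if the counter is polled at least once in every 65536 counts:

```python
# Illustrative model of extending a 16-bit counter by polling for rollover
class RolloverCounter:
    def __init__(self):
        self.last = 0        # Previous 16-bit reading
        self.overflows = 0   # Number of rollovers seen so far

    # Call with each new 16-bit reading
    def poll(self, value):
        if value < self.last:
            self.overflows += 1
        self.last = value

    # Full count, including the rollovers
    def total(self):
        return self.overflows * 65536 + self.last

ctr = RolloverCounter()
for true_count in range(0, 200001, 20000):
    ctr.poll(true_count & 0xffff)    # Hardware only exposes the low 16 bits
print("Overflows %u, total %u" % (ctr.overflows, ctr.total()))
```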

An alternative method is to take advantage of the fact that the DMA controller has a 32-bit down-counter that decrements every time a DMA cycle is completed. So we just need to set the DMA count to a large number, and set the PWM peripheral to trigger a DMA cycle on every rising or falling edge of the input signal.

This prompts the question “what data should the DMA transfer?” and the answer is “anything: it doesn’t matter”, but we still have to be careful to ensure the transfers don’t over-write some random area in memory. We need to very carefully specify the DMA source & destination addresses, using the binary ‘array’ data type, which occupies a fixed area of memory, without all the complexities of a Python ‘object’. The syntax to create the array is somewhat unusual; rather than just specifying a size, we have to provide the initial data using an iterator. The pico_devices library provides a helper function to simplify the process:

# Create 32-bit array (to receive DMA data)
def array32(size):
    return array.array('I', (0 for _ in range(size)))

To use this array with DMA, we have to get its address in memory, using ‘uctypes.addressof’:

# Get address of variable (for DMA)
def addressof(var):
    return uctypes.addressof(var)

Armed with these functions, we can initialise the DMA controller:

ext_data = devs.array32(1)  # Dummy array for extended counter

# Use DMA to extend pulse counter to 32 bits
def pulse_counter_ext_init(ctr):
    ctr.set_enabled(False)
    ctr.set_wrap(0)
    ctr.set_ctr(0)
    ctr_dma = devs.DMA()
    ctr_dma.set_transfer_data_size(devs.DMA_SIZE_8)
    ctr_dma.set_read_increment(False)
    ctr_dma.set_write_increment(False)
    ctr_dma.set_read_addr(devs.addressof(ext_data))
    ctr_dma.set_write_addr(devs.addressof(ext_data))
    ctr_dma.set_dreq(ctr.get_dreq())
    ctr.set_enabled(True)
    return ctr_dma

We’re modifying the PWM 16-bit counter to wrap around to zero on every input edge, and initialising its starting value to zero, which is essential. Then a DMA channel is instantiated, and configured to copy 8 bits from the data array back to the data array.

It is important to note that enabling the DMA counter does not necessarily start the transfer from scratch; if a transfer has already started, the controller will resume counting where it left off. So it is necessary to clear out any existing transfer, before starting a new one:

# Start the extended pulse counter
def pulse_counter_ext_start(ctr_dma):
    ctr_dma.abort()
    ctr_dma.set_trans_count(0xffffffff, True)
 
# Stop the extended pulse counter
def pulse_counter_ext_stop(ctr_dma):
    ctr_dma.set_enable(False)
 
# Return value from extended pulse counter
def pulse_counter_ext_value(ctr_dma):
    return 0xffffffff - ctr_dma.get_trans_count()

Using the extended pulse counter is similar to the 16-bit counter; we just need to remember the limitation that if the 32-bit value is exceeded, the counter will stop, and not wrap around.

counter_dma = pulse_counter_ext_init(counter)
pulse_counter_ext_start(counter_dma)
time.sleep(1.0)
val = pulse_counter_ext_value(counter_dma)
pulse_counter_ext_stop(counter_dma)
print("Sleep 1.0s, ext count %u" % val)

A typical response is:

Sleep 1.0s, ext count 100011

This shows that the count is not limited to 16 bits.
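Since the DMA transfer count runs downwards from 0xffffffff, the edge count is the difference from that starting value, and the counter stops when it reaches zero. A small sketch of the arithmetic (helper names are illustrative) shows how long that takes for the test signal:

```python
DMA_MAX_COUNT = 0xffffffff   # Initial transfer count loaded into the DMA channel

# Illustrative helpers (not part of pico_devices)
def edges_counted(remaining):
    return DMA_MAX_COUNT - remaining

def seconds_until_full(freq_hz):
    return DMA_MAX_COUNT / freq_hz

print(edges_counted(DMA_MAX_COUNT - 100011))
print("%.1f hours at 100 kHz" % (seconds_until_full(100e3) / 3600))
```

So for the 100 kHz test signal, the 32-bit limit is only reached after many hours of continuous counting.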

Gating a counter

To accurately count pulses within a specific time-frame, it is necessary for the timing to be accurate, and as we have seen, the ‘sleep’ function introduces a significant error. To eliminate this, we need a hardware mechanism whereby a timer directly enables and disables the counter (a process called ‘gating’) without requiring any intervention from the CPU.

This involves programming another PWM channel to act as a timer, then when the time has expired, using DMA to modify the counter’s register to stop counting. The default PWM clock is 125 MHz, the prescaler is 8 bits, and the counter register is 16 bits, effectively 17 bits if we engage ‘phase correct’ mode. So the slowest gate frequency is 125e6 / (256 * 65536 * 2) = 3.725 Hz, a gate-time of 0.268 seconds; I’ve opted for 0.25 seconds.

GATE_TIMER_PIN = 0          # Used to define PWM slice
GATE_PRESCALE = 250         # 125e6 / 250 = 500 kHz
GATE_WRAP = 125000          # 500 kHz / 125000 = 4 Hz (250 ms)

# Initialise PWM as a gate timer
def gate_timer_init(pin):
    pwm = devs.PWM(pin)
    pwm.set_clkdiv_int_frac(GATE_PRESCALE, 0)
    pwm.set_wrap(int(GATE_WRAP/2 - 1))
    pwm.set_chan_level(pwm.gpio_to_channel(pin), int(GATE_WRAP/4))
    pwm.set_phase_correct(True)
    return pwm

This code uses GPIO pin 0 to identify which PWM ‘slice’ is being used. Since we aren’t enabling PWM I/O on that pin, it can still be used for any other function, such as serial output.
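The gate-time arithmetic can be checked with a few lines of Python. This sketch assumes the 125 MHz default clock, and that the period in phase-correct mode is twice (wrap + 1) counts; the helper name gate_time is illustrative:

```python
SYS_CLOCK_HZ = 125000000    # Default PWM clock (assumed)

# Illustrative helper: period of a PWM slice used as a gate timer
def gate_time(prescale, wrap_reg, phase_correct=False, clock_hz=SYS_CLOCK_HZ):
    counts = (wrap_reg + 1) * (2 if phase_correct else 1)
    return prescale * counts / clock_hz

GATE_PRESCALE = 250
GATE_WRAP = 125000
wrap_reg = GATE_WRAP // 2 - 1           # 62499, which fits in 16 bits
print("Gate time %.3f s" % gate_time(GATE_PRESCALE, wrap_reg, True))
print("Longest   %.3f s" % gate_time(256, 65535, True))
```

The second line reproduces the limiting case quoted above, using a divisor of 256 and the largest 16-bit wrap value.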

We need to trigger a DMA cycle when the gate PWM times out; that cycle will be used to disable the counter PWM. So when initialising the DMA channel, we need to capture the counter’s CSR value while it is still in the non-enabled state, since this is the value that will be written back into the CSR to stop the counter. It is only one 32-bit value, but I’m using a fixed array to hold it, since DMA can’t handle the complexities of Python object storage.

gate_data = devs.array32(1)

# Initialise gate timer DMA
def gate_dma_init(ctr, gate):
    dma = devs.DMA()
    dma.set_transfer_data_size(devs.DMA_SIZE_32)
    dma.set_read_increment(False)
    dma.set_write_increment(False)
    dma.set_dreq(gate.get_dreq())
    gate_data[0] = ctr.slice.CSR_REG
    dma.set_read_addr(devs.addressof(gate_data))
    dma.set_write_addr(ctr.get_csr_address())
    return dma

To start the frequency measurement, it is necessary to set the DMA count (since this is reset by a DMA cycle), enable DMA, then enable the gate & counter PWM devices simultaneously.

# Start frequency measurement using gate
def freq_gate_start(ctr, gate, dma):
    ctr.set_ctr(0)
    gate.set_ctr(0)
    dma.set_trans_count(1, True)
    ctr.set_enables((1<<ctr.slice_num) | (1<<gate.slice_num), True)

If all is well, the test signal frequency should be reported correctly on the console:

Gate 250.0 ms, count 25000, freq 100.0 kHz
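The conversion from gated count to frequency is just a division by the gate time. The helper below is a sketch of how a report line in that style could be produced; the function name and format string are assumptions, not taken from the source files:

```python
# Illustrative helper: format a gated count as a frequency report
def gate_report(count, gate_sec):
    freq = count / gate_sec
    return "Gate %.1f ms, count %u, freq %.1f kHz" % (gate_sec * 1e3, count, freq / 1e3)

print(gate_report(25000, 0.25))
```

With a 250 ms gate, each count represents 4 Hz, which sets the resolution of this method.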

Edge timer

An alternative method for measuring frequency is to measure the times between the rising or falling edges, producing values that are the reciprocal of the frequency. This is particularly useful when dealing with slow signals, or if you want to implement a method to eliminate ‘rogue’ pulses, since you get one measurement for each cycle of the input signal, and so can reject pulses that are outside the expected frequency range.
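As an illustration of the rogue-pulse idea, the following deliberately simple sketch discards any interval that falls outside an expected frequency band. The function name, the band limits and the sample data are all invented for the example:

```python
# Illustrative filter: keep only intervals within an expected frequency range
def filter_intervals(intervals_us, min_hz, max_hz):
    shortest = 1e6 / max_hz     # Period of the fastest acceptable signal
    longest = 1e6 / min_hz      # Period of the slowest acceptable signal
    return [t for t in intervals_us if shortest <= t <= longest]

# Intervals in microseconds; the 20 us value is a rogue pulse
intervals = [100000, 100000, 20, 99980, 100000]
good = filter_intervals(intervals, 9, 11)
freq = 1e6 * len(good) / sum(good) if good else 0
print("%u of %u intervals kept, freq %.1f Hz" % (len(good), len(intervals), freq))
```

Note that the interval following the rogue pulse is slightly short, so a more careful filter might merge it with the rejected value rather than keep it as-is.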

By now, you won’t be surprised to learn that I use DMA to capture the time-value for each edge; there is a convenient 32-bit microsecond-value that can be copied into a suitably-sized array on each positive or negative edge; the sole function of the PWM peripheral is to detect the edge, and trigger a DMA cycle.

We’ll be generating a test signal as before, but this time it is 10 Hz:

PWM_DIV = 250               # 125e6 / 250 = 500 kHz
PWM_WRAP = 50000 - 1        # 500 kHz / 50000 = 10 Hz

The PWM peripheral initialisation is similar to previous functions:

# Initialise PWM as a timer (gpio must be odd number)
def timer_init(pin, rising=True):
    devs.gpio_set_function(pin, devs.GPIO_FUNC_PWM)
    pwm = devs.PWM(pin)
    pwm.set_clkdiv_mode(devs.PWM_DIV_B_RISING if rising else devs.PWM_DIV_B_FALLING)
    pwm.set_clkdiv(1)
    pwm.set_wrap(0)
    return pwm

The DMA controller is also similar, but the destination is set to auto-increment with every transfer:

# Initialise timer DMA
def timer_dma_init(timer):
    dma = devs.DMA()
    dma.set_transfer_data_size(devs.DMA_SIZE_32)
    dma.set_read_increment(False)
    dma.set_write_increment(True)
    dma.set_dreq(timer.get_dreq())
    dma.set_read_addr(devs.TIMER_RAWL_ADDR)
    return dma

Starting the DMA is a bit different; firstly we need to use the ‘abort’ command to stop any previous transfer that might still be in progress. If we didn’t do that, and just enabled DMA, the transfer would resume where it left off – the new settings would be ignored. Secondly we need to set both the transfer count and the destination address, since these will have been changed by a previous transfer.

# Start edge-time capture
def timer_start(timer, dma):
    timer.set_ctr(0)
    timer.set_enabled(True)
    dma.abort()
    dma.set_write_addr(devs.addressof(time_data))
    dma.set_trans_count(NTIMES, True)

The main program needs to perform some simple maths to derive the frequency, guarding against the possibility that there may be insufficient time-values (one or fewer):

NTIMES = 9                       # Number of time samples
time_data = devs.array32(NTIMES) # Time data

timer_pwm = timer_init(PWM_IN_PIN)
timer_dma = timer_dma_init(timer_pwm)
    
timer_start(timer_pwm, timer_dma)
time.sleep(1.0)
timer_stop(timer_pwm)
    
count = NTIMES - timer_dma.get_trans_count()
data = time_data[0:count]
diffs = [data[n]-data[n-1] for n in range(1, len(data))]
total = sum(diffs)
freq = (1e6 * len(diffs) / total) if total else 0
print("%u samples, total %u us, freq %3.1f Hz" % (count, total, freq))

The frequency of the test signal should be displayed on the console:

9 samples, total 800000 us, freq 10.0 Hz
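One detail worth guarding against is the 32-bit microsecond value rolling over between samples, which happens roughly every 71.6 minutes. A possible refinement, shown here as a standalone sketch rather than the code in pico_timer.py, is to mask each difference to 32 bits:

```python
# Illustrative helper: frequency from 32-bit microsecond edge times
def edge_freq(times):
    # Mask each difference to 32 bits, so a timer rollover between samples is harmless
    diffs = [(times[n] - times[n-1]) & 0xffffffff for n in range(1, len(times))]
    total = sum(diffs)
    return (1e6 * len(diffs) / total) if total else 0

# Nine edges at 100 ms intervals, without and with a rollover part-way through
normal = [1000 + n * 100000 for n in range(9)]
wrapped = [(0xfffffff0 + n * 100000) & 0xffffffff for n in range(9)]
print("%.1f Hz, %.1f Hz" % (edge_freq(normal), edge_freq(wrapped)))
```

The same function returns zero if there are fewer than two samples, so no separate check is needed.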

Running the code

The source files are available on Github here. It is necessary to load the library file ‘pico_devices.py’ onto the target system; if you are using the Thonny editor, this is done by right-clicking the filename, and selecting ‘Upload to /’. You can then run one of the program files to measure frequency:

pico_counter.py: use simple pulse counting and sleep timing
pico_freq.py: use DMA to accurately gate the pulse counter
pico_timer.py: use time-interval (reciprocal) measurement

Don’t forget to link GPIO pin 4 (the test signal output) to pin 3 (the measurement input) for these tests.

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

Accurate frequency measurement on the Pi Pico RP2040 using C

This project describes the creation of the software for a vehicle speedometer. It measures the frequency of a signal that is proportional to speed, but I’m also using the opportunity to explore two possible techniques for measuring frequency with high accuracy using an RP2040 processor.

If you’d prefer Python code, see this post.

Counting transitions using PWM

Unusually, I’m going to use the RP2040 PWM (Pulse Width Modulation) peripheral to count the clock pulses. PWM is normally used to generate signals, not measure them, but the RP2040 peripheral has an unusual feature; it can be used to count pulse-edges, namely the number of low-to-high or high-to-low transitions of an input signal. If we start this counter, and stop after a specific time (which is commonly known as ‘gating’ the counter) then we obtain the frequency by dividing the count by the time.

The RP2040 documentation uses the unusual term ‘slice’ to denote one of the 8 PWM units, where each slice has 2 ‘channels’ (A and B) that are associated with specific I/O pins, as follows:

Slice 0 channel A uses GPIO 0, Slice 0 channel B uses GPIO 1
Slice 1 channel A uses GPIO 2, Slice 1 channel B uses GPIO 3
Slice 2 channel A uses GPIO 4, Slice 2 channel B uses GPIO 5
..and so on..

When generating PWM signals both channels can be used, but when counting edges we must use the ‘B’ channels, so an odd GPIO pin number.

The PWM counter isn’t difficult to set up, using the standard Pico hardware interface library:

uint counter_slice;

// Initialise frequency pulse counter
void freq_counter_init(int pin) 
{
    assert(pwm_gpio_to_channel(pin) == PWM_CHAN_B);
    counter_slice = pwm_gpio_to_slice_num(pin);

    gpio_set_function(pin, GPIO_FUNC_PWM);
    pwm_config cfg = pwm_get_default_config();
    pwm_config_set_clkdiv_mode(&cfg, PWM_DIV_B_RISING);
    pwm_config_set_clkdiv(&cfg, 1);
    pwm_init(counter_slice, &cfg, false);
}

Starting, stopping and reading the count is quite easy:

// Get count 
uint16_t freq_counter_read(uint msec)
{
    pwm_set_counter(counter_slice, 0);
    pwm_set_enabled(counter_slice, true);
    sleep_ms(msec);
    pwm_set_enabled(counter_slice, false);    
    return (uint16_t) pwm_get_counter(counter_slice);
}

This code works fine, with a few limitations:

1. The sleep_ms() call isn’t very accurate; its timing will depend on other activities the RP2040 is performing (such as USB interrupts).
2. During the sleep_ms() call the CPU is just wasting time, doing nothing; if we want to do something else (such as scanning a button input) then we’ll have to use a timer interrupt for that, which will further reduce the accuracy of the sleep timing.
3. The count is a 16-bit number, so we must choose a gate-time that ensures the counter won’t overflow.

To fix points 1 & 2 we need the gating timer to be implemented in hardware, and we can use another PWM ‘slice’ for this purpose; it is basically acting as an up-counter fed from a known clock, and when it times out, we halt the counter PWM, so its value is fixed.

To achieve good accuracy, there should also be a hardware link between the timeout of the timer PWM and the stopping of the counter PWM, so it won’t matter what code the CPU is executing at the time. Fortunately the timer PWM can trigger a DMA (Direct Memory Access) cycle when it times out, and we can use this cycle to update the counter PWM control register, stopping the counter. This means that once the two PWM slices are started (counter & timer) no more CPU intervention is required, until the data capture is finished.

Initialising the gate time & DMA is a little more complicated:

#define TIMER_PRESCALE      250     // 8-bit value
#define TIMER_WRAP          125000  // 17-bit value
#define SAMPLE_FREQ         (125000000 / (TIMER_PRESCALE*TIMER_WRAP))

uint gate_slice, gate_dma_chan, timer_dma_dreq, csr_stopval;

// Initialise gate timer, and DMA to control the counter
void gate_timer_init(int pin)
{
    gate_slice = pwm_gpio_to_slice_num(pin);
    io_rw_32 *counter_slice_csr = &pwm_hw->slice[counter_slice].csr;
    
    gpio_set_function(pin, GPIO_FUNC_PWM);
    pwm_set_clkdiv_int_frac(gate_slice, TIMER_PRESCALE, 0);
    pwm_set_wrap(gate_slice, TIMER_WRAP/2 - 1);
    pwm_set_chan_level(gate_slice, PWM_CHAN_B, TIMER_WRAP/4);
    pwm_set_phase_correct(gate_slice, true);
        
    gate_dma_chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(gate_dma_chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_32);
    channel_config_set_read_increment(&cfg, false);
    channel_config_set_dreq(&cfg, pwm_get_dreq(gate_slice));
    csr_stopval = *counter_slice_csr;
    dma_channel_configure(gate_dma_chan, &cfg, counter_slice_csr, &csr_stopval, 1, false);
    pwm_set_enabled(gate_slice, true);
}

The PWM wrap settings deserve some explanation; since the hardware register is 16 bits, the obvious choice is to set the timer wrap value to 65536 or less, but this means the longest gate-time is 65536 * 256 / 125 MHz = 134 msec. However, by selecting ‘phase correct’ mode, the PWM device counts up to the wrap value, then back down again, before triggering the DMA. So the wrap value effectively becomes 17 bits wide, and we can set a gate time of 250 msec, as in the code above.

To run a capture cycle, it is only necessary to re-arm the DMA channel (giving it the address of the stop value that will be written to the counter PWM ‘csr’ register) and start both PWM slices simultaneously.

// Start pulse counter
void counter_start(void)
{
    dma_channel_transfer_from_buffer_now(gate_dma_chan, &csr_stopval, 1);
    pwm_set_counter(counter_slice, 0);
    pwm_set_counter(gate_slice, 0);
    pwm_set_mask_enabled((1 << counter_slice) | (1 << gate_slice));
}

To read the counter value, we need to wait until the DMA cycle is complete, then stop the timer and access the count register.

// Get pulse count
int counter_value(void)
{
    while (dma_channel_is_busy(gate_dma_chan)) ;
    pwm_set_enabled(gate_slice, false);    
    return((uint16_t)pwm_get_counter(counter_slice));
}

There is still the problem of the counter potentially overflowing if there is a high-frequency input; this could be solved using a PWM overflow interrupt, but personally I prefer to poll the count value to check for overflow, for example:

uint counter_lastval, counter_overflow;

// Check for overflow, and check if capture complete
bool counter_value_ready(void)
{
    uint n = pwm_get_counter(counter_slice);
    
    if (n < counter_lastval)
        counter_overflow++;
    counter_lastval = n;
    return (!dma_channel_is_busy(gate_dma_chan));
}

A capture cycle that is protected against counter overflow could now look like:

counter_start();
while (!counter_value_ready())
{
    // Insert code here to be executed while waiting for value
}
printf("%u Hz\n", frequency_value());

Reciprocal measurement

The ‘count-the-edges’ technique described above works well for reasonably fast signals (e.g. 1 kHz and above), but at lower frequencies it is quite inaccurate, so an alternative technique is used, which is known as ‘time-interval’ or ‘reciprocal’ measurement.

Instead of counting the number of transitions in a given time, the new technique measures the time between the transitions, for one or more cycles; the inverse of this time is the frequency.

We have already seen how the RP2040 PWM peripheral can be used to count pulses and generate DMA requests; if the ‘wrap’ value is set to zero, then the PWM will generate a DMA request for every positive-going transition of the input signal. This DMA cycle can be used to copy the 32-bit microsecond value from a timer register into an array. So we end up with an array of microsecond timing values, and the inverse of the interval between successive values is the frequency.

The modifications to the previously-described code aren’t substantial; we just need to modify the counter ‘wrap’ value, and set up the DMA transfers of the timing values.

#define NUM_EDGE_TIMES      11
#define EDGE_WAIT_USEC      2000001
uint edge_times[NUM_EDGE_TIMES]; 

// Initialise DMA to store the edge times
void edge_timer_init(void) 
{
    timer_dma_chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(timer_dma_chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_32);
    channel_config_set_read_increment(&cfg, false);
    channel_config_set_write_increment(&cfg, true);
    channel_config_set_dreq(&cfg, pwm_get_dreq(counter_slice));
    dma_channel_configure(timer_dma_chan, &cfg, edge_times, &timer_hw->timerawl, NUM_EDGE_TIMES, false);
    pwm_set_wrap(counter_slice, 0);
}

The first 2 definitions give the number of edges to be captured, and the length of time to wait for the edges to arrive. These may be tuned as required; a slow signal will require a long capture time, for example a 1 Hz signal needs at least 2 seconds to be sure of capturing 2 edges. Conversely, achieving high accuracy on a fast signal will require a large array, since the result is calculated from the average of all the values.

// Get average of the edge times
int edge_timer_value(void)
{
    uint i=1, n;
    int total=0;

    dma_channel_abort(timer_dma_chan);
    pwm_set_enabled(counter_slice, false);    
    while (i<NUM_EDGE_TIMES && edge_times[i])
    {
        n = edge_times[i] - edge_times[i-1];
        total += n;
        i++;
    }
    return(i>1 ? total / (i - 1) : 0);
}

// Get frequency value from edge timer
float edge_freq_value(void)
{
    int val = edge_timer_value();
    return(val ? 1e6 / val : 0);
}

The main loop now looks like:

edge_timer_init();
while (true) 
{
    memset(edge_times, 0, sizeof(edge_times));
    edge_timer_start();
    sleep_ms(EDGE_WAIT_USEC / 1000);
    printf("Frequency %8.6f Hz\n", edge_freq_value());
}

Since DMA has been used to capture the data, the sleep timing is completely non-critical; it can be increased to accommodate slower input signals, or reduced to provide a quicker answer with faster signals.
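When choosing the wait time, a rough rule is that capturing a given number of edges can take up to that many signal periods, since the first edge may arrive almost a full period after the capture starts. The sketch below is in Python for convenience, and the helper name is illustrative:

```python
# Illustrative helper: worst-case wait (in microseconds) to capture a number of edges
def min_wait_usec(num_edges, freq_hz):
    # The first edge may arrive up to one full period after starting,
    # then each further edge takes one more period
    return int(num_edges * 1e6 / freq_hz)

print(min_wait_usec(2, 1.0))      # Two edges of a 1 Hz signal
print(min_wait_usec(11, 100.0))   # All 11 edges of a 100 Hz signal
```

The EDGE_WAIT_USEC value of 2000001 used above is just over the bound for two edges of a 1 Hz signal.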

However, as mentioned above, the use of ‘sleep’ does render the CPU unresponsive for that duration, which is a problem if you want it to do other things, e.g. scan for button-presses. A simple way of fixing this issue (without resorting to interrupts) is to create a polled timer, that keeps track of the elapsed time, and returns a ‘true’ value when there is a timeout.

// Return non-zero if timeout
bool ustimeout(uint *tickp, uint usec)
{
    uint t = time_us_32(), dt = t - *tickp;

    if (usec == 0 || dt >= usec)
    {
        *tickp = t;
        return (1);
    }
    return (0);
}

Before using this timer, we call the function with a pointer to a ‘uint’ variable, and a zero microsecond value; this initialises the variable with the current time. Thereafter, we just call the function with the desired timeout value, and it will return ‘true’ on timeout. The main loop becomes:

uint edge_ticks;

while (true) 
{
    memset(edge_times, 0, sizeof(edge_times));
    edge_timer_start();
    ustimeout(&edge_ticks, 0);
    while (!ustimeout(&edge_ticks, EDGE_WAIT_USEC))
    {
        // Insert code here to be executed while waiting for value
    }
    printf("Frequency %5.3f Hz\n", edge_freq_value());
}

This code can be used to measure very low frequencies; for example, with a wait time of over 2 seconds, it is possible to measure the 1 PPS (1 Pulse Per Second) signal from a GPS module, to within 6 decimal places. This is useful for checking the accuracy of the Pico microsecond timer, since the PPS signal is locked to the very accurate satellite clocks.
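The timer accuracy is conveniently expressed in parts per million (ppm), derived from the measured PPS period. A sketch of the calculation, in Python for convenience; the helper name is illustrative, and the reading of 1000012 microseconds is an invented value, not a measured result:

```python
# Illustrative helper: timer error in parts per million, from a measured 1 PPS period
def timer_error_ppm(measured_period_us, true_period_us=1000000):
    return 1e6 * (measured_period_us - true_period_us) / true_period_us

print("%+.1f ppm" % timer_error_ppm(1000012))   # Hypothetical reading
```

A positive result means the Pico timer counted more microseconds than actually elapsed, so it is running fast.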

Running the code

The source code is available on Github here. There is a single C source file (picofreq.c), with a definition at the top to choose between edge-counter and edge-timer (reciprocal) measurements, then some definitions for the two methods:

// Set zero to use edge-counter, 1 to use edge-timer
#define USE_EDGE_TIMER      0

// Definitions for edge-counter
#define TIMER_PRESCALE      250     // 8-bit value
#define TIMER_WRAP          125000  // 17-bit value
#define SAMPLE_FREQ         (125000000 / (TIMER_PRESCALE * TIMER_WRAP))

// Definitions for edge-timer
#define NUM_EDGE_TIMES      11
#define EDGE_WAIT_USEC      200001

In edge-counter mode, you need to select values that give you the required gate time, bearing in mind the prescaler value is 8 bits, and the wrap value is 17 bits. The settings above give a gate-time of 250 * 125000 / 125000000 = 0.25 seconds, which is close to the maximum.

For the edge-timer, the definitions are the number of edges to be captured, and the overall time to wait. In the example above, the frequency is calculated using the average of 10 time-difference values, with a waiting time of 0.2 seconds. To gain maximum accuracy from fast signals, it will be necessary to increase the number of edges; conversely, if you want to measure a slow signal such as the 1 Hz output from a GPS, then the waiting-time needs to be increased to over 2 seconds.

In either mode, the program prints the frequency on the default serial console (115k baud on GPIO pin 0), and toggles the Pico on-board LED. If you are using the Pico-W wireless variant then the LED is driven by the CYW43439 WiFi chip, and it is much more complex to control, so if you want to retain this feature, an external LED may be used on any convenient pin number.

The GPIO pin definitions are:

#define FREQ_IN_PIN         7
#define GATE_TIMER_PIN      0

The frequency input can be any odd-numbered GPIO pin, as discussed above. The gate timer pin definition is just a convenient way of specifying a PWM slice, so the above definition selects slice 0 channel A. Since GPIO pin 0 is defined as a serial output, there is no clash between the serial & PWM signals; the PWM output signal is discarded.

To build and run the code, I have included a minimal CMakeLists.txt, the only addition being the enabling of all warnings:

cmake_minimum_required(VERSION 3.12)
include(pico_sdk_import.cmake)
project(picofreq C CXX ASM)
pico_sdk_init()
add_compile_options(-Wall)
add_executable(picofreq picofreq.c)
target_link_libraries(picofreq pico_stdlib hardware_pwm hardware_dma)
pico_add_extra_outputs(picofreq)

This generates a uf2 file in the ‘build’ directory that can be programmed into the Pico using the normal method (holding the pushbutton down while powering up, then drag-and-drop using the USB filesystem), but personally I find it much more convenient to use the Pico debug probe. In case you are using the Windows VS Code front-end, I have included the standard launch.json and settings.json files in a .vscode subdirectory, since I’ve found these to be essential for using VS Code. To download & debug the code, just hit ctrl-shift-D to bring up the debug window, then F5. If you are using an alternative debug adaptor, it will probably be necessary to modify launch.json, but the variety of options is too complex to be described here.


Picowi part 10: Web camera

A Web camera is quite a demanding application, since it requires a continuous stream of data to be sent over the network at high speed. The data volume is determined by the image size, and the compression method; the raw data for a single VGA-size (640 x 480 pixel) image is over 600K bytes, so some compression is desirable. Some cameras have built-in JPEG compression, which can compress the VGA image down to roughly 30K bytes, and it is possible to send a stream of still images to the browser, which will display them as if they came from a video-format file. This approach (known as motion-JPEG, or MJPEG) has a disadvantage in terms of inter-frame compression; since each frame is compressed in isolation, the compressor can’t reduce the filesize by taking advantage of any similarities between adjacent frames, as is done in protocols such as MPEG. However, MJPEG has the great advantage of simplicity, which makes it suitable for this demonstration.
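The figures quoted above can be reproduced with some simple arithmetic. This sketch assumes 2 bytes per pixel for the raw image (as in formats such as RGB565), which is an assumption on my part, and takes ‘30K’ to mean 30 * 1024 bytes:

```python
# Illustrative calculation: raw image size, assuming 2 bytes per pixel (e.g. RGB565)
def raw_image_bytes(width, height, bytes_per_pixel=2):
    return width * height * bytes_per_pixel

raw = raw_image_bytes(640, 480)
jpeg = 30 * 1024                      # Approximate JPEG size quoted above
print("Raw %u bytes, JPEG %u bytes, ratio %.0f:1" % (raw, jpeg, raw / jpeg))
```

So JPEG compression reduces the data volume by a factor of around 20 for a VGA image.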

Camera

The standard cameras for the full-size Raspberry Pi boards have a CSI (Camera Serial Interface) conforming to the specification issued by the MIPI (Mobile Industry Processor Interface) alliance. This high-speed connection is unsuitable for use with the Pico; we need something with a slower-speed SPI (Serial Peripheral Interface), and JPEG compression ability.

The camera I used is the 2 megapixel Arducam, which uses the OV2640 sensor, combined with an image processing chip. It has I2C and SPI interfaces; the former is primarily for configuring the sensor, with the latter being for data transfer. Sadly the maximum SPI frequency is specified as 8 MHz, which compares unfavourably with the 60 MHz SPI we are using to communicate with the network.
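To see why the SPI speed matters, it helps to estimate how long one compressed image takes to transfer. This rough sketch ignores all command and protocol overhead, and assumes a JPEG size of about 30K bytes, so the real figures will be somewhat worse:

```python
# Illustrative calculation: time to move one image over SPI, ignoring protocol overhead
def spi_transfer_ms(num_bytes, spi_hz):
    return 1000.0 * num_bytes * 8 / spi_hz

jpeg_bytes = 30 * 1024                       # Assumed size of one JPEG image
t_cam = spi_transfer_ms(jpeg_bytes, 8e6)     # Camera SPI at 8 MHz
t_net = spi_transfer_ms(jpeg_bytes, 60e6)    # Network SPI at 60 MHz
print("Camera %.1f ms, network %.1f ms" % (t_cam, t_net))
```

On these assumptions, fetching each image from the camera takes several times longer than passing it to the network interface, so the camera link is the likely bottleneck on frame rate.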

The connections specified by Arducam are:

SPI SCK  GPIO pin 2
SPI MOSI          3
SPI MISO          4
SPI CS            5
I2C SDA           8
I2C SCL           9
Power             3.3V
Ground            GND

In addition, GPIO pin 0 is used as a serial console output; the data rate is 115200 baud by default.

I2C and SPI tests

The first step is to check that the i2c interface is connected correctly, by checking an ID register value:

#define CAM_I2C         i2c0
#define CAM_I2C_ADDR    0x30
#define CAM_I2C_FREQ    100000
#define CAM_PIN_SDA     8
#define CAM_PIN_SCL     9

i2c_init(CAM_I2C, CAM_I2C_FREQ);
gpio_set_function(CAM_PIN_SDA, GPIO_FUNC_I2C);
gpio_set_function(CAM_PIN_SCL, GPIO_FUNC_I2C);
gpio_pull_up(CAM_PIN_SDA);
gpio_pull_up(CAM_PIN_SCL);

WORD w = ((WORD)cam_sensor_read_reg(0x0a) << 8) | cam_sensor_read_reg(0x0b);
if (w != 0x2640 && w != 0x2641 && w != 0x2642)
    printf("Camera i2c error: ID %04X\n", w);

// Read camera sensor i2c register
BYTE cam_sensor_read_reg(BYTE reg)
{
    BYTE b;
    
    i2c_write_blocking(CAM_I2C, CAM_I2C_ADDR, &reg, 1, true);
    i2c_read_blocking(CAM_I2C, CAM_I2C_ADDR, &b, 1, false);
    return (b);
}

Then we can check the SPI interface by writing values to a register, and reading them back:

#define CAM_SPI         spi0
#define CAM_SPI_FREQ    8000000
#define CAM_PIN_SCK     2
#define CAM_PIN_MOSI    3
#define CAM_PIN_MISO    4
#define CAM_PIN_CS      5

spi_init(CAM_SPI, CAM_SPI_FREQ);
gpio_set_function(CAM_PIN_MISO, GPIO_FUNC_SPI);
gpio_set_function(CAM_PIN_SCK, GPIO_FUNC_SPI);
gpio_set_function(CAM_PIN_MOSI, GPIO_FUNC_SPI);
gpio_init(CAM_PIN_CS);
gpio_set_dir(CAM_PIN_CS, GPIO_OUT);
gpio_put(CAM_PIN_CS, 1);

if ((cam_write_reg(0, 0x55), cam_read_reg(0) != 0x55) || (cam_write_reg(0, 0xaa), cam_read_reg(0) != 0xaa))
    printf("Camera SPI error\n");

Initialisation

The sensors require a large number of i2c register settings in order to function correctly. These are just ‘magic numbers’ copied across from the Arducam source code. The last block of values specifies the sensor resolution, which is set at compile-time. The options are 320 x 240 (QVGA), 640 x 480 (VGA), 1024 x 768 (XGA) and 1600 x 1200 (UXGA), e.g.

// Horizontal resolution: 320, 640, 1024 or 1600 pixels
#define CAM_X_RES 640

Capturing a frame

A single frame is captured by writing to a few registers, then waiting for the camera to signal that the capture (and JPEG compression) is complete. The size of the image varies from shot to shot, so it is necessary to read some register values to determine the actual image size. In reality, the camera has a tendency to round up the size, and pad the end of the image with some nulls, but this doesn’t seem to be a problem when displaying the image.

// Read single camera frame
int cam_capture_single(void)
{
    int tries = 1000, ret=0, n=0;
    
    cam_write_reg(4, 0x01);
    cam_write_reg(4, 0x02);
    while ((cam_read_reg(0x41) & 0x08) == 0 && tries)
    {
        usdelay(100);
        tries--;
    }
    if (tries)
        n = cam_read_fifo_len();
    if (n > 0 && n <= sizeof(cam_data))
    {
        cam_select();
        spi_read_blocking(CAM_SPI, 0x3c, cam_data, 1);
        spi_read_blocking(CAM_SPI, 0x3c, cam_data, n);
        cam_deselect();
        ret = n;
    }
    return (ret);
}

Reading the picture from the camera just requires the reading of a single dummy byte, then the whole block that represents the image; it is a complete JFIF-format picture, so no further processing needs to be done. If the browser has requested a single still image, we just send the whole block as-is to the client, with an HTTP header specifying “Content-Type: image/jpeg”.
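The null padding mentioned earlier does not seem to upset browsers, but if it ever needed to be removed, the idea can be sketched in a few lines of desktop Python. The function name trim_jpeg is hypothetical, and is not part of the picowi code; it relies only on the fact that a JPEG file ends with the end-of-image marker FF D9.

```python
def trim_jpeg(data):
    # Remove trailing null padding, then check for the JPEG end-of-image marker
    trimmed = bytes(data).rstrip(b"\x00")
    if trimmed.endswith(b"\xff\xd9"):
        return trimmed
    return bytes(data)      # Unexpected ending: return the block unchanged
```

If the block does not end with the marker after trimming, it is returned untouched, on the assumption that it is safer to send the original data than to guess.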

The following image was captured by the camera at 640 x 480 resolution:

MJPEG video

As previously mentioned, the Web server can stream video to the browser, in the form of a continuous flow of JPEG images. This requires a few special steps:

In the response header, the server defines the content-type as “multipart/x-mixed-replace”
To enable the browser to detect when one image ends, and another starts, we need a unique marker. This can be anything that isn’t likely to occur in the data stream; I’ve specified “boundary=mjpeg_boundary”
Before each image, the boundary marker must be sent, followed by the content-type (“image/jpeg”) and a blank line to mark the end of the header.
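To make the framing in the steps above more concrete, here is a minimal sketch in desktop Python. The function names are my own invention for illustration (the real server is written in C), and I am assuming the usual MIME convention that each boundary marker is sent with a ‘--’ prefix.

```python
# Sketch of MJPEG framing; function names are illustrative assumptions
BOUNDARY = "mjpeg_boundary"

def mjpeg_response_header():
    # Sent once, at the start of the stream
    return ("HTTP/1.1 200 OK\r\n"
            "Content-Type: multipart/x-mixed-replace; boundary=%s\r\n"
            "\r\n" % BOUNDARY).encode()

def mjpeg_part(jpeg):
    # Sent for each image: boundary marker, content-type,
    # blank line to end the part header, then the JPEG data
    hdr = "--%s\r\nContent-Type: image/jpeg\r\n\r\n" % BOUNDARY
    return hdr.encode() + jpeg + b"\r\n"
```

The server would send mjpeg_response_header() once, then mjpeg_part() for every captured frame until the browser closes the connection.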

Timing

The timing will be quite variable, since it depends on the image complexity and network configuration, but here are the results of some measurements when fetching a single JPEG image over a small local network, using binary (not base64) mode:

Resolution (pixels)	Image capture time (ms)	Image size (kbyte)	TCP transfer time (ms)	TCP speed (kbyte/s)
320 x 240	153	10.2	4.4	2310
640 x 480	292	25.6	10.9	2350
1024 x 768	321	49.1	21.5	2285
1600 x 1200	420	97.3	42.4	2292

Web camera timings

The webcam code triggers an image capture, then after the data has been fetched into the CPU RAM buffer, it is sent to the network stack for transmission. There would be some improvement in the timings if the next image were fetched while the current image is being transmitted, however the improvement will be quite small, since the overall time is dominated by the time taken for the camera to capture and compress the image.
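A rough calculation shows why the gain from overlapping is small. The sketch below uses the figures from the table above, and assumes that capture and TCP transfer are the only significant steps; the function name frame_rates is just for illustration.

```python
# Timing figures copied from the table above: (capture ms, TCP transfer ms)
timings = {
    "320 x 240":   (153, 4.4),
    "640 x 480":   (292, 10.9),
    "1024 x 768":  (321, 21.5),
    "1600 x 1200": (420, 42.4),
}

def frame_rates(capture_ms, transfer_ms):
    # Sequential: capture then transmit; overlapped: limited by the slower step
    sequential = 1000.0 / (capture_ms + transfer_ms)
    overlapped = 1000.0 / max(capture_ms, transfer_ms)
    return sequential, overlapped

for res, (cap, tx) in timings.items():
    seq, ovl = frame_rates(cap, tx)
    print("%-12s %.2f -> %.2f frames/s" % (res, seq, ovl))
```

At 640 x 480, for example, this estimate only rises from about 3.3 to about 3.4 frames per second.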

Using the Web camera

There is only one setting at the top of camera/cam_2640.h, namely the horizontal resolution:

// Horizontal resolution: 320, 640, 1024 or 1600 pixels
#define CAM_X_RES 640

Then the binary is built and the CPU is programmed in the usual way:

make web_cam
./prog web_cam

At boot-time the IP address will be reported on the serial console; use this to access the camera or video Web pages in a browser, e.g.

http://192.168.1.240/camera.jpg
http://192.168.1.240/video

It is important to note that a new image capture is triggered every time the Web page is accessed, so any attempt to simultaneously access the pages from more than one browser will fail. To allow simultaneous access by multiple clients, a double-buffering scheme needs to be implemented.

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

PicoWi part 6: DHCP

In part 5, we joined a WiFi network, and used ‘ping’ to contact another unit on that network, but this was achieved by setting the IP address manually, which is generally known as using a ‘static’ IP.

The alternative is to use a ‘dynamic’ IP, that a central server (such as the WiFi Access Point) allocates from a pool of available addresses, using Dynamic Host Configuration Protocol (DHCP); this also provides other information such as a netmask & router address, to allow our unit to communicate with the wider Internet.

IP addresses and routing

So far, I’ve just said that an IP address consists of 4 bytes, that are usually expressed as decimal values with dotted notation, e.g. 192.168.1.2, but there is some extra complication.

Firstly it is important to note I’m using version 4 of the protocol (IPv4); there is a newer version (IPv6) with a much wider address range, but the older version is sufficient for our purposes, and easier to implement.

Next it is important to distinguish between a public and private IP address.

Public: an address that is accessible from the Internet, generally assigned by an Internet Service Provider (ISP)
Private: an address used locally within an organisation, that is not unique; generally assigned from the blocks 192.168.x.x, 172.16.x.x to 172.31.x.x, or 10.x.x.x

The address we’ll be getting from the DHCP server is probably private; if we are accessing the Internet, there will be one or more network devices (‘routers’) that perform public-to-private translation, and also security functions (‘firewalls’) to block malicious data.

If our unit has an IP address it wishes to contact, how does it know what to do? It just has to determine if the target address is local or remote by applying a netmask. For example if our unit is given the address 192.168.1.1 with netmask 255.255.255.0, then a logical AND of the two values means that our local network (known as a ‘subnet’) is 192.168.1. If the unit we’re contacting is on that subnet (i.e. the address begins with 192.168.1) then we just send out a local ARP request to convert their IP address into a MAC address, and start communicating.

If the target address isn’t on the same subnet (e.g. 192.168.2.1, 11.22.33.44, or anything else) then our unit contacts a router (using the address given in the DHCP response) and relies on the router to forward the data appropriately.

In the diagram above, there are networks with public addresses 11.22.33.44 and 22.33.44.55, and they both have private addresses in 192.168.1.x subnetworks; the job of the router is to move the data between these subnetworks by performing Network Address Translation (NAT) between them.

If unit 192.168.1.3 wants to contact 22.33.44.55 it will check the netmask, and because the target isn’t on the same subnetwork, the data will be sent to the router 192.168.1.1, which will forward it over the Internet.

If 192.168.1.3 wants to contact 192.168.1.2, ANDing with the netmask will show that they are both on the same subnet, so the data will be sent directly, bypassing the router.
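The local-or-remote decision described in the last few paragraphs can be sketched in Python as follows. This is only an illustration of the netmask logic; the function names ip_to_int and next_hop are hypothetical, and the C code handles addresses as byte arrays rather than strings.

```python
def ip_to_int(addr):
    # Convert dotted-decimal IPv4 string to a 32-bit integer
    a, b, c, d = (int(x) for x in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def next_hop(my_ip, netmask, router, target):
    # Same subnet if the masked addresses match; otherwise use the router
    mask = ip_to_int(netmask)
    if (ip_to_int(my_ip) & mask) == (ip_to_int(target) & mask):
        return target
    return router
```

For unit 192.168.1.3 with netmask 255.255.255.0 and router 192.168.1.1, a target of 192.168.1.2 is returned unchanged (send directly), while a target of 22.33.44.55 returns the router address.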

However, if 192.168.1.3 wants to send the data to 192.168.1.1 on the remote network, how does the router know what to do? The simple answer is “it doesn’t”, as addresses on the 192.168.1.x subnet aren’t unique, and there will be thousands (or millions!) of units with that same address around the world. Also the netmask clearly indicates that 192.168.1.1 must be on the same subnet as 192.168.1.3, so the data will be sent locally to 192.168.1.1, whether it exists or not; if it doesn’t exist, that’ll be flagged up by the ARP request failing.

There are various workarounds for this ‘NAT traversal’ problem, for example 192.168.1.3 sends the data to the router 22.33.44.55, which is configured to copy incoming data to 192.168.1.1, but there are major security risks associated with opening up a system to unfiltered Internet traffic, so for the purposes of this blog, I’m assuming that our unit will only be communicating with other units on the same subnetwork, or publicly-available systems on the Internet.

The above example assumes there is a single router for all outgoing traffic, and this is generally the case on a WiFi network, where the Access Point also acts as a router. However, on more complex networks there can be multiple routers to provide alternative routes to other networks or the Internet.

Client and server

The most common model for communication between two systems is client-server. The server runs continuously, waiting for a client to get in contact. The client uses a specific communications format (a ‘protocol’) to establish a link (‘connection’) to the server. The connection persists for as long as is needed to exchange the data, then it is closed by both sides.

Simpler protocols can dispense with the connection, but still retain the client-server model; for example, to fetch the time with Network Time Protocol (NTP) you just send a single message to a time server, and get a single message back with the time. This ‘connectionless’ approach means that a single ‘stateless’ server can handle very large numbers of clients, since it doesn’t have to track the state of its clients; an incoming request has all the information needed to send the response.

UDP message format

So there are two distinct ways for a client to communicate with a server; one creates a persistent connection, with both sides tracking the flow of data, and re-sending any data that is lost in transit: this is Transmission Control Protocol (TCP). The other way is User Datagram Protocol (UDP), which has no such tracking, or error correction; just send a block of data and hope it arrives.

This uncertainty means that, if faced with a choice, many programmers reject UDP as being too unreliable; however, it does have a very important place in the suite of TCP/IP protocols, not least because it is used for DHCP.

A DHCP transmission consists of the following:

Ethernet header
IP header
UDP header
DHCP header
DHCP option data

We’ve already used the Ethernet and IP headers when sending an ICMP (ping) message; this time we’re stacking on a UDP header.

/* ***** UDP (User Datagram Protocol) header ***** */
typedef struct udph
{
    WORD  sport,            /* Source port */
          dport,            /* Destination port */
          len,              /* Length of datagram + this header */
          check;            /* Checksum of data, header + pseudoheader */
} UDPHDR;

There is a 16-bit length, which shows the total length of the header plus any data that follows, and a 16-bit checksum, which is calculated in an unusual manner; it incorporates the UDP header, parts of the IP header, and all the data that follows. The way this is calculated is to create a pseudo-header containing the relevant IP parts:

/* ***** Pseudo-header for UDP or TCP checksum calculation ***** */
/* The integers must be in hi-lo byte order for checksum */
typedef struct              /* Pseudo-header... */
{
    IPADDR sip,             /* Source IP address */
          dip;              /* Destination IP address */
    BYTE  z,                /* Zero */
          pcol;             /* Protocol byte */
    WORD  len;              /* UDP length field */
} PHDR;

So the UDP code has to prepare two headers, though the pseudo-header is only used for checksum calculation, and can be discarded after that is done.

// Add UDP header to buffer, return byte count
int ip_add_udp(BYTE *buff, WORD sport, WORD dport, void *data, int dlen)
{
    UDPHDR *udp=(UDPHDR *)buff;
    IPHDR *ip=(IPHDR *)(buff-sizeof(IPHDR));
    WORD len=sizeof(UDPHDR), check;
    PHDR ph;

    udp->sport = htons(sport);
    udp->dport = htons(dport);
    udp->len = htons(sizeof(UDPHDR) + dlen);
    udp->check = 0;
    len += ip_add_data(&buff[sizeof(UDPHDR)], data, dlen);
    check = add_csum(0, udp, len);
    IP_CPY(ph.sip, ip->sip);
    IP_CPY(ph.dip, ip->dip);
    ph.z = 0;
    ph.pcol = PUDP;
    ph.len = udp->len;
    udp->check = 0xffff ^ add_csum(check, &ph, sizeof(PHDR));
    return(len);
}
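For anyone wanting to cross-check the checksum on a PC, the same calculation can be sketched in desktop Python. This is not a translation of the C helpers; the function bodies are my own, add_csum here takes only two arguments, and the protocol value 17 is the standard IP protocol number for UDP (PUDP in the C code).

```python
import struct

def add_csum(total, data):
    # Sum 16-bit big-endian words; an odd trailing byte is padded with zero
    if len(data) & 1:
        data += b"\x00"
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold carries back into the low 16 bits (one's complement addition)
    while total >> 16:
        total = (total & 0xffff) + (total >> 16)
    return total

def udp_checksum(sip, dip, sport, dport, payload):
    # sip and dip are 4-byte IP addresses; the checksum field is zero while summing
    length = 8 + len(payload)
    udp = struct.pack("!HHHH", sport, dport, length, 0) + payload
    pseudo = sip + dip + struct.pack("!BBH", 0, 17, length)
    check = 0xffff ^ add_csum(add_csum(0, udp), pseudo)
    return check or 0xffff    # A computed zero is transmitted as all ones
```

A useful property for testing is that summing a complete datagram (including the pseudo-header and the filled-in checksum) always gives 0xffff.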

Port numbers

Another notable feature of the UDP header is the source & destination port numbers, and these deserve some explanation.

A port number can identify a specific service on a server; for example port 80 identifies an HTTP web server, and 67 is a DHCP server. These are ‘well-known’ port numbers and are in the range 0 to 1023. Ports numbered 1024 to 49151 are also used for specific server functionality that isn’t part of the original set, so are known as ‘registered’. The remaining numbers 49152 to 65535 are ‘dynamic’ ports, that are used temporarily by client applications.

When a client wishes to communicate with a server, it will obtain a dynamic port from its operating system, and use that port for the duration of a transaction, releasing it when the transaction is complete. In contrast, a server will generally monopolise a well-known or registered port on a permanent basis, though some servers additionally open up a dynamic port on a short-term basis to handle a specific interaction with the client, such as a file transfer.

Unusually, the DHCP server & client are both assigned well-known numbers, namely UDP 67 and 68. You may see these identified as BOOTP ports, since DHCP is based on the older BOOTP protocol, with some additions.
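The three port ranges can be summarised in a short Python function; the name port_class is purely illustrative.

```python
def port_class(port):
    # Classify a port number using the ranges given above
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"
```

Both DHCP ports (67 and 68) fall in the ‘well-known’ range, whereas a typical client port obtained from an operating system would be classed as ‘dynamic’.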

DHCP message format

DHCP is a 4-step process:

Discover: the unit broadcasts a request asking for network parameters, such as an IP address it can use, also a router address, and subnet mask.
Offer: the server responds with some proposed values, that the unit can accept or reject.
Request: the unit signifies its acceptance of the proposed values.
ACK: the server acknowledges the request, indicating that the parameters have been assigned to the unit.

Once the parameters have been assigned, the server will generally attempt to keep them unchanged, such that every time the unit boots, it will get the same IP address. However, this is not guaranteed, and a busy server with a lot of temporary clients will be forced to re-use addresses from units that haven’t been active for a while.

The message format is based on the older protocol BOOTP:

typedef struct {
    BYTE  opcode;               /* Message opcode/type. */
    BYTE  htype;                /* Hardware addr type (net/if_types.h). */
    BYTE  hlen;                 /* Hardware addr length. */
    BYTE  hops;                 /* Number of relay agent hops from client. */
    DWORD trans;                /* Transaction ID. */
    WORD  secs;                 /* Seconds since client started looking. */
    WORD  flags;                /* Flag bits. */
    IPADDR ciaddr,              /* Client IP address (if already in use). */
           yiaddr,              /* 'Your' IP address (assigned to client by server). */
           siaddr,              /* Server IP address. */
           giaddr;              /* Relay agent IP address. */
    BYTE  chaddr[16];           /* Client hardware address. */
    char  sname[SNAME_LEN];     /* Server name. */
    char  bname[BOOTF_LEN];     /* Boot filename. */
    BYTE  cookie[DHCP_COOKIE_LEN];  /* Magic cookie. */
} DHCPHDR;

When making the initial discovery request, many of these values are unused; the ‘cookie’ is filled in with a specific 4-byte value (99, 130, 83, 99) that signals this is a DHCP request, not BOOTP. Then there is a data field with ‘option’ values; each entry has one byte indicating the option type, one byte indicating data length, and that number of data bytes. The options I use in the discovery request are a byte value of 1, indicating it is a discovery message, and 4 parameter values, indicating what should be provided by the server (1 for subnet mask, 3 for router address, 6 for nameserver address and 15 for network name).

// DHCP message options
typedef struct {
    BYTE typ1, len1, opt;
    BYTE typ2, len2, data[4];
    BYTE end;
} DHCP_MSG_OPTS;

// DHCP discover options
DHCP_MSG_OPTS dhcp_disco_opts = 
   {53, 1, 1,               // Msg len 1 type 1: discover
    55, 4, {1, 3, 6, 15},   // Param len 4: mask, router, DNS, name
    255};                   // End
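The type-length-data layout of the options is easy to reproduce in Python, which can be handy for generating test data on a PC. The helper names below are hypothetical; the byte values are those in the C structure above.

```python
DHCP_COOKIE = bytes([99, 130, 83, 99])   # Marks the message as DHCP, not BOOTP

def dhcp_option(typ, data):
    # Each option is: type byte, length byte, then the data bytes
    return bytes([typ, len(data)]) + bytes(data)

def dhcp_discover_options():
    opts = dhcp_option(53, [1])              # Message type 1: discover
    opts += dhcp_option(55, [1, 3, 6, 15])   # Parameter request list
    return opts + bytes([255])               # End marker
```

The result is 10 bytes long, which matches the size of the C structure, since it consists only of byte values and so has no padding.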

The resulting offer from the server probably includes much more than we asked for; this is what my server returns:

    Option: (53) DHCP Message Type (Offer)
    Option: (54) DHCP Server Identifier (192.168.1.254)
    Option: (51) IP Address Lease Time (7 days)
    Option: (58) Renewal Time Value (3 days, 12 hours)
    Option: (59) Rebinding Time Value (6 days, 3 hours)
    Option: (1) Subnet Mask (255.255.255.0)
    Option: (28) Broadcast Address (192.168.1.255)
    Option: (15) Domain Name ("home")
    Option: (6) Domain Name Server (192.168.1.254)
    Option: (3) Router (192.168.1.254)
    Option: (255) End

You’ll see that the Access Point 192.168.1.254 is acting as a router and nameserver; we’ll be looking at the Domain Name System (DNS) in the next part of this blog.

If the unit wants to accept these proposed settings, it must send a request containing the proposed IP address. This can have the same format as the discovery, with a byte value of 3, indicating it is a request message, and the 4-byte address value:

// DHCP request options
DHCP_MSG_OPTS dhcp_req_opts = 
   {53, 1, 3,               // Msg len 1 type 3: request
    50, 4, {0, 0, 0, 0},    // Address len 4 (copied from offer)
    255};                   // End

Assuming all is OK, the ACK response from the server will be similar to the offer, maybe with more values added (such as vendor-specific information), so an important part of the receiver code is the scanning of the parameters to find the values that are needed.
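The scanning process might be sketched in Python as follows. This is an illustration of the technique rather than the C receiver code; the function names are hypothetical, and I am assuming the standard option conventions that type 255 marks the end and type 0 is a single pad byte with no length field.

```python
def parse_dhcp_options(data):
    # Return a dictionary mapping option type to its data bytes
    opts, i = {}, 0
    while i < len(data):
        typ = data[i]
        if typ == 255:          # End marker
            break
        if typ == 0:            # Pad byte has no length field
            i += 1
            continue
        if i + 1 >= len(data):
            break               # Truncated: no length byte
        n = data[i + 1]
        opts[typ] = bytes(data[i + 2 : i + 2 + n])
        i += 2 + n
    return opts

def ip_str(b):
    # Convert 4 address bytes to dotted-decimal notation
    return ".".join(str(x) for x in b)
```

Having obtained the dictionary, the caller just picks out the entries it needs (e.g. 1 for the subnet mask, 3 for the router) and ignores the rest.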

State machine

If we were in a multi-tasking environment, the DHCP process might basically consist of a sequence of 4 function calls, each function stopping (‘blocking’) until it is complete:

send_discovery()
receive_offer()
send_request()
receive_ack()

Since we don’t currently have multi-tasking, we can’t adopt this approach, as it would block any other code from running, and in the event of an error, one of these functions might stall indefinitely. Instead, we have to adopt a ‘polled’ approach, where we keep on re-visiting this process to see what (if anything) has changed. The key to this is to have a single ‘state’ variable that reflects what has happened, e.g. it has a value of 1 when we have sent the discovery, 2 when we have received an offer, and so on.

// Poll DHCP state machine
void dhcp_poll(void)
{
    static uint32_t dhcp_ticks=0;
    
    if (dhcp_state == 0 ||              // Send DHCP Discover
       (dhcp_state != DHCPT_ACK && ustimeout(&dhcp_ticks, DHCP_TIMEOUT)))
    {
        ustimeout(&dhcp_ticks, 0);
        IP_ZERO(my_ip);
        ip_tx_dhcp(bcast_mac, bcast_ip, DHCP_REQUEST, 
                   &dhcp_disco_opts, sizeof(dhcp_disco_opts));
        dhcp_state = DHCPT_DISCOVER;
    }
    else if (dhcp_state == DHCPT_OFFER) // Received Offer, send Request
    {
        ustimeout(&dhcp_ticks, 0);
        IP_CPY(dhcp_req_opts.data, offered_ip);
        ip_tx_dhcp(host_mac, bcast_ip, DHCP_REQUEST, 
                   &dhcp_req_opts, sizeof(dhcp_req_opts));
        dhcp_state = DHCPT_REQUEST;
    }
}

The polling of the DHCP state also incorporates a timeout, that is triggered in the event of an error; with a simple 4-step protocol like this, we can just restart the process from the beginning, rather than trying to work out where the error occurred.
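The same polled structure can be tried out on a PC with a few lines of Python. This is a simplified sketch, not a translation of the C code: the class and method names are my own, the timeout value is arbitrary, and I am assuming that the state variable simply holds the DHCP message type (1 for discover, 2 for offer, 3 for request, 5 for ACK).

```python
DISCOVER, OFFER, REQUEST, ACK = 1, 2, 3, 5   # DHCP message types, reused as states
TIMEOUT = 2.0                                 # Seconds; arbitrary value for this sketch

class DhcpClient:
    def __init__(self, send):
        self.send = send        # Callback that transmits a message
        self.state = 0
        self.start = 0.0

    def poll(self, now):
        # Called repeatedly from the main loop; never blocks
        timed_out = self.state != ACK and now - self.start >= TIMEOUT
        if self.state == 0 or timed_out:
            self.start = now
            self.send(DISCOVER)
            self.state = DISCOVER
        elif self.state == OFFER:
            self.start = now
            self.send(REQUEST)
            self.state = REQUEST

    def receive(self, msg_type):
        # Record an incoming Offer or ACK; the next poll acts on it
        if msg_type == OFFER and self.state == DISCOVER:
            self.state = OFFER
        elif msg_type == ACK and self.state == REQUEST:
            self.state = ACK
```

Passing the current time into poll() as an argument makes the timeout behaviour easy to test, since no real delays are needed.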

Example program

There is one example program dhcp.c that fetches IP addresses and netmask from a DHCP server, and prints the result:

Joining network
Joined network
Tx DHCP DISCOVER
Rx DHCP OFFER 192.168.1.240
Tx DHCP REQUEST
Rx DHCP OFFER 192.168.1.240
Rx DHCP OFFER 192.168.1.240
Rx DHCP ACK 192.168.1.240 mask 255.255.255.0 router 192.168.1.254 DNS 192.168.1.254
DHCP complete, IP address 192.168.1.240 router 192.168.1.254
192.168.1.254->192.168.1.240 ARP request
192.168.1.240->192.168.1.254 ARP response

The display mode is set to include DHCP:

set_display_mode(DISP_INFO|DISP_JOIN|DISP_ARP|DISP_DHCP);

This allows you to see the message-passing; it isn’t unusual to receive duplicate messages, as with the DHCP OFFER above. The ARP display is also enabled so you can see the router using ARP to check the newly-assigned address.

It will be necessary to change the default SSID and PASSWD to match your network; for details on how to build & load the application, see the introduction.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

Web display for Pi Pico oscilloscope

In part 1 of this series, I added WiFi connectivity to the Pi Pico using an ESP32 module and MicroPython. Part 2 showed how Direct Memory Access (DMA) can be used to get analog samples at regular intervals from the Pico on-board Analog-to-Digital Converter (ADC).

I’m now combining these two techniques with some HTML and Javascript code to create a Web display in a browser, but since this code will be quite complicated, first I’ll sort out how the data is fetched from the Pico Web server.

Data request

The oscilloscope display will require user controls to alter the sample rate, number of samples, and any other settings we’d like to change. These values must be sent to the Web server, along with a filename that will trigger the acquisition. To fetch 1000 samples at 10000 samples per second, the request received by the server might look like:

GET /capture.csv?nsamples=1000&xrate=10000

If you avoid any fancy characters, the Python code in the server that extracts the filename and parameters isn’t at all complicated:

ADC_SAMPLES, ADC_RATE = 20, 100000
parameters = {"nsamples":ADC_SAMPLES, "xrate":ADC_RATE}

# Get HTTP request, extract filename and parameters
req = esp.get_http_request()
if req:
    line = req.split("\r")[0]
    fname = get_fname_params(line, parameters)

# Get filename & parameters from HTTP request line
def get_fname_params(line, params):
    fname = ""
    parts = line.split()
    if len(parts) > 1:
        p = parts[1].partition('?')
        fname = p[0]
        query = p[2].split('&')
        for param in query:
            p = param.split('=')
            if len(p) > 1:
                if p[0] in params:
                    try:
                        params[p[0]] = int(p[1])
                    except ValueError:
                        # Ignore non-numeric values, keep the default
                        pass
    return fname

The default parameter names & values are stored in a dictionary, and when the URL is decoded, any names that match those in the dictionary will have their values updated. Then the data is fetched using the parameter values, and returned in the form of a comma-delimited (CSV) file:

if CAPTURE_CSV in fname:
    vals = adc_capture()
    esp.put_http_text(vals, "text/csv", esp32.DISABLE_CACHE)

The name ‘comma-delimited’ is a bit of a misnomer in this case; we just send the given number of lines, with one floating-point voltage value per line.
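As an illustration of that file format, here is a minimal sketch of how raw ADC samples might be converted to text. The function name samples_to_csv is hypothetical (it is not the adc_capture code), and I am assuming 12-bit samples with a 3.3 volt reference.

```python
ADC_BITS, ADC_VREF = 12, 3.3   # Assumed ADC resolution and reference voltage

def samples_to_csv(samples):
    # One voltage per line, with 3 decimal places
    scale = ADC_VREF / ((1 << ADC_BITS) - 1)
    return "\n".join("%1.3f" % (s * scale) for s in samples)
```

So a full-scale sample of 4095 would appear in the file as a line containing 3.300.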

Requesting the data

Before diving into the complexities of graphical display and Javascript, it is worth creating a simple Web page to fetch this data.

The standard way of specifying parameters with a file request is to define a ‘form’ that will be submitted to the server. The parameter values can be constrained using ‘select’, to avoid the user entering incompatible numbers:

<!DOCTYPE html><html lang="en">
<head><meta charset="utf-8"/></head><body>
  <form action="/capture.csv">
    <label for="nsamples">Number of samples</label>
    <select name="nsamples" id="nsamples">
      <option value=100>100</option>
      <option value=200>200</option>
      <option value=500>500</option>
      <option value=1000>1000</option>
    </select>
    <label for="xrate">Sample rate</label>
    <select name="xrate" id="xrate">
      <option value=1000>1000</option>
      <option value=2000>2000</option>
      <option value=5000>5000</option>
      <option value=10000>10000</option>
    </select>
    <input type="submit" value="Submit">
  </form>
</body></html>

This generates a very simple display on the browser:

On submitting the form, we get back a raw list of values:

Since the file we have requested is pure CSV data, that is all we get; the controls have vanished, and we’ll have to press the browser ‘back’ button if we want to retry the transaction. This is quite unsatisfactory, and to improve it there are various techniques, for example using a template system to always add the controls at the top of the data. However, we also want the browser to display the data graphically, which means a sizeable amount of Javascript, so we might as well switch to a full-blown AJAX implementation, as mentioned in the first part.

AJAX

To recap, AJAX originally stood for ‘Asynchronous JavaScript and XML’, where the Javascript on the browser would request an XML file from the server, then display data within that file on the browser screen. However, there is no necessity that the file must be XML; for simple unstructured data, CSV is adequate.

The HTML page is similar to the previous one; the main changes are that we have specified a button that’ll call a Javascript function when clicked, and there is a defined area to display the response data. This area is tagged as ‘preformatted’ so the text will be displayed in a plain monospaced style.

  <form id="captureForm">
    <label for="nsamples">Number of samples</label>
    <select name="nsamples" id="nsamples">
      <option value=100>100</option>
      <option value=200>200</option>
      <option value=500>500</option>
      <option value=1000>1000</option>
    </select>
    <label for="xrate">Sample rate</label>
    <select name="xrate" id="xrate">
      <option value=1000>1000</option>
      <option value=2000>2000</option>
      <option value=5000>5000</option>
      <option value=10000>10000</option>
    </select>
    <button onclick="doSubmit(event)">Submit</button>
  </form>
  <pre><p id="responseText"></p></pre>

The button calls the Javascript function ‘doSubmit’ when clicked, with the click event as an argument. As this button is in a form, by default the browser would attempt to re-fetch the current document using the form data, so we need to block this behaviour and substitute the action we want, which is to wait until the response is obtained, and display it in the area we have allocated. This is ‘asynchronous’ (using a callback function) so that the browser doesn’t stall waiting for the response.

function doSubmit(event) {
  // Eliminate default action for button click
  // (only necessary if button is in a form)
  event.preventDefault();

  // Create request
  var req = new XMLHttpRequest();

  // Define action when response received
  req.addEventListener( "load", function(event) {
    document.getElementById("responseText").innerHTML = event.target.responseText;
  } );

  // Create FormData from the form
  var formdata = new FormData(document.getElementById("captureForm"));

  // Collect form data and add to request
  var params = [];
  for (var entry of formdata.entries()) {
    params.push(entry[0] + '=' + entry[1]);
  }
  req.open( "GET", "/capture.csv?" + encodeURI(params.join("&")));
  req.send();
}

The resulting request sent by the browser looks something like:

GET /capture.csv?nsamples=100&xrate=1000

This is created by looping through the items in the form, and adding them to the base filename. When doing this, there is a limited range of characters we can use, in order not to wreck the HTTP request syntax. I have used the ‘encodeURI’ function to encode any of these unusable characters; this isn’t necessary with simple parameters that are just alphanumeric values, but if I’d included a parameter with free-form text, this would be needed. For example, if one parameter was a page title that might include spaces, then the title “Test page” would be encoded as

GET /capture.csv?nsamples=100&xrate=1000&title=Test%20page
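The encoding can also be checked from desktop Python, which is useful when testing the server without a browser. This sketch uses the standard urllib.parse module, which may not be available in MicroPython; the function names are hypothetical.

```python
from urllib.parse import quote, unquote

def build_request(fname, params):
    # Percent-encode each name and value, then join with '&'
    query = "&".join(quote(str(k), safe="") + "=" + quote(str(v), safe="")
                     for k, v in params.items())
    return fname + "?" + query

def decode_value(value):
    # Server-side step: turn %20 etc. back into the original characters
    return unquote(value)
```

Note that this encodes each name and value separately, so an ‘&’ or ‘=’ inside a value is also protected; encodeURI applied to the whole query string leaves those two characters unchanged. The server would also need a decoding step like decode_value before using any free-form text parameter.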

You may wonder why I am looping through the form entries, when in theory they can just be attached to the HTTP request in one step:

// Insert form data into request - doesn't work!
req.open("GET", "/capture.csv");
req.send(formdata);

I haven’t been able to get this method to work; the underlying problem is that XMLHttpRequest ignores any body passed to send() when the request method is GET, so the form data never reaches the server. In the end it isn’t difficult to iterate over the form entries and add them directly to the request.

The resulting browser display is a minor improvement over the previous version, in that it isn’t necessary to use the ‘back’ button to re-fetch the data, but still isn’t very pretty.

Graphical display

There are many ways to display graphic content within a browser. The first decision is whether to use vector graphics, or a bitmap; I prefer the former, since it allows the display to be resized without the lines becoming jagged.

There is a vector graphics language for browsers, namely Scalable Vector Graphics (SVG) and I have experimented with this, but find it easier to use Javascript commands to directly draw on a specific area of the screen, known as an ‘HTML canvas’, that is defined within the HTML page:

<div><canvas id="canvas1"></canvas></div>

To draw on this, we create a ‘2D context’ in Javascript:

var ctx1 = document.getElementById("canvas1").getContext("2d");

We can now use commands such as ‘moveTo’ and ‘lineTo’ to draw on this context; a useful first exercise is to draw a grid across the display.

var ctx1, xdivisions=10, ydivisions=10, winxpad=10, winypad=30;
var grid_bg="#d8e8d8", grid_fg="#40f040";
window.addEventListener("load", function() {
  ctx1 = document.getElementById("canvas1").getContext("2d");
  resize();
  window.addEventListener('resize', resize, false);
} );

// Draw grid
function drawGrid(ctx) {
  var w=ctx.canvas.clientWidth, h=ctx.canvas.clientHeight;
  var dw = w/xdivisions, dh=h/ydivisions;
  ctx.fillStyle = grid_bg;
  ctx.fillRect(0, 0, w, h);
  ctx.lineWidth = 1;
  ctx.strokeStyle = grid_fg;
  ctx.strokeRect(0, 1, w-1, h-1);
  ctx.beginPath();
  for (var n=0; n<xdivisions; n++) {
    var x = n*dw;
    ctx.moveTo(x, 0);
    ctx.lineTo(x, h);
  }
  for (var n=0; n<ydivisions; n++) {
    var y = n*dh;
    ctx.moveTo(0, y);
    ctx.lineTo(w, y);
  }
  ctx.stroke();
}

// Respond to window being resized
function resize() {
  ctx1.canvas.width = window.innerWidth - winxpad*2;
  ctx1.canvas.height = window.innerHeight - winypad*2;
  drawGrid(ctx1);
}

I’ve included a function that resizes the canvas to fit within the window, which is particularly convenient when getting a screen-grab for inclusion in a blog post:

All that remains is to issue a request, wait for the response callback, and plot the CSV data onto the canvas.

var running=false, capfile="/capture.csv";

// Do a single capture (display is done by callback)
function capture() {
  var req = new XMLHttpRequest();
  req.addEventListener( "load", display);
  var params = formParams();
  req.open( "GET", capfile + "?" + encodeURI(params.join("&")));
  req.send();
}

// Display data (from callback event)
function display(event) {
  drawGrid(ctx1);
  plotData(ctx1, event.target.responseText);
  if (running) {
    window.requestAnimationFrame(capture);
  }
}

// Get form parameters
function formParams() {
  var formdata = new FormData(document.getElementById("captureForm"));
  var params = [];
  for (var entry of formdata.entries()) {
    params.push(entry[0]+ '=' + entry[1]);
  }
  return params;
}

A handy feature is to have the display auto-update when the current data has been displayed; I’ve done this by using requestAnimationFrame to trigger another capture cycle, if the global ‘running’ variable is set. Then we just need some buttons to control this feature:

<button id="single" onclick="doSingle()">Single</button>
<button id="run"  onclick="doRun()">Run</button>

// Handle 'single' button press
function doSingle() {
  event.preventDefault();
  running = false;
  capture();
}

// Handle 'run' button press
function doRun() {
  event.preventDefault();
  running = !running;
  capture();
}

The end result won’t win any prizes for style or speed, but it does serve as a useful basis for acquiring & displaying data in a Web browser.

You’ll see that the controls have been rearranged slightly, and I’ve also added a ‘simulate’ checkbox; this invokes MicroPython code in the Pico Web server that doesn’t use the ADC; instead it uses a simple incremental rotation (in the spirit of the CORDIC algorithm) to generate sine & cosine values, which are multiplied, with some random noise added:

# Simulate ADC samples: sine wave plus noise
def adc_sim():
    nsamp = parameters["nsamples"]
    buff = array.array('f', (0 for _ in range(nsamp)))
    f, s, c = nsamp/20.0, 1.0, 0.0
    for n in range(0, nsamp):
        s += c / f
        c -= s / f
        val = ((s + 1) * (c + 1)) + random.randint(0, 100) / 300.0
        buff[n] = val
    return "\r\n".join([("%1.3f" % val) for val in buff])
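The sine & cosine recurrence in adc_sim can be tried on its own in desktop Python. This is a sketch for experimentation only; the function name incremental_sincos is my own, and the step value of 5.0 corresponds to nsamp/20.0 for 100 samples.

```python
import math

def incremental_sincos(nsamp, f):
    """Generate nsamp (sine-like, cosine-like) pairs using the recurrence
    from adc_sim: each value is nudged by a fraction 1/f of the other."""
    s, c = 1.0, 0.0
    pairs = []
    for _ in range(nsamp):
        s += c / f
        c -= s / f      # uses the updated s, which keeps the amplitude bounded
        pairs.append((s, c))
    return pairs

pairs = incremental_sincos(100, 5.0)
peak = max(max(abs(s), abs(c)) for s, c in pairs)
radius_err = max(abs(math.sqrt(s * s + c * c) - 1.0) for s, c in pairs)
print("peak %.3f, radius error %.3f" % (peak, radius_err))
```

The values trace a slightly squashed circle rather than a perfect one, but they don't grow or decay over time, which is all that is needed for a test signal.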

Distorted sine wave with random noise added

Running the code

If you haven’t done so before, I suggest you run the code given in the first and second parts, to check the hardware is OK.

Load rp_devices.py and rp_esp32.py onto the MicroPython filesystem, not forgetting to modify the network name (SSID) and password at the top of that file. Then load the HTML files rpscope_capture, rpscope_ajax and rpscope_display, and run the MicroPython server rp_adc_server.py using Thonny. The files are on Github here.

You should then be able to display the pages as shown above, using the IP address that is displayed on the Thonny console; I’ve used 10.1.1.11 in the examples above.

When experimenting with alternative Web pages, I found it useful to run a Web server on my PC, as this allows a much faster development process. There are many ways to do this; the simplest is probably to use the server that is included as standard in Python 3:

python -m http.server 8000

This makes the server available on port 8000. If the Web browser is running on the same PC as the server, use the ‘localhost’ address in the browser, e.g.

http://127.0.0.1:8000/rpscope_display.html

This assumes the HTML file is in the same directory that you used to invoke the Web server. If you also include a CSV file named ‘capture.csv’, then it will be displayed as if the data came from the Pico server.

However, there is one major problem with this approach: the CSV file will be cached by the browser, so if you change the file, the display won’t change. This isn’t a problem on the Pico Web server, as it adds do-not-cache headers in the HTTP response. The standard Python Web server doesn’t do that, so the browser will use its cached data, even after the file has changed.
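One possible workaround, which is not part of the project files, is a small wrapper around the standard Python server that adds do-not-cache headers to every response. This is a sketch using the standard http.server module; the class name NoCacheHandler and the function run are my own, and the choice of headers is an assumption about what is enough for your browser.

```python
import http.server

class NoCacheHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from the current directory, telling the browser not to cache."""
    def end_headers(self):
        # Add the extra headers just before the header block is closed
        self.send_header("Cache-Control", "no-store, no-cache, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")
        super().end_headers()

def run(port=8000):
    """Start the server on the given port; stop it with ctrl-C."""
    server = http.server.ThreadingHTTPServer(("", port), NoCacheHandler)
    print("Serving on port %u" % port)
    server.serve_forever()
```

Calling run() from the directory containing the HTML and CSV files serves them on port 8000, in the same way as the command line above.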

One other issue is worthy of mention; in my setup, the ESP32 network interface sometimes locks up after it has transferred a significant amount of data, which means the Web server becomes unresponsive. This isn’t an issue with the MicroPython code, since the ESP32 doesn’t respond to pings when it is in this state. I’m using ESP32 Nina firmware v 1.7.3; hopefully, by the time you read this, there is an update that fixes the problem.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Pi Pico ADC input using DMA and MicroPython

This is the second part of my Web-based Pi Pico oscilloscope project. In the first part I used an Espressif ESP32 to add WiFi connectivity to the Pico, and now I’m writing code to grab analog data from the on-chip Analog-to-Digital Converter (ADC), which can potentially provide up to 500k samples/sec.

High-speed transfers like this normally require code written in C or assembly-language, but I’ve decided to use MicroPython, which is considerably slower, so I need to use hardware acceleration to handle the data rate, specifically Direct Memory Access (DMA).

MicroPython ‘uctypes’

MicroPython does not have built-in functions to support DMA, and doesn’t provide any simple way of accessing the registers that control the ADC, DMA and I/O pins. However it does provide a way of defining these registers, using a new mechanism called ‘uctypes’. This is vaguely similar to ‘ctypes’ in standard Python, which is used to define Python interfaces for ‘foreign’ functions, but defines hardware registers, using a very compact (and somewhat obscure) syntax.

To give a specific example, the DMA controller has multiple channels, and according to the RP2040 datasheet section 2.5.7, each channel has 4 registers, with the following offsets:

0x000 READ_ADDR
0x004 WRITE_ADDR
0x008 TRANS_COUNT
0x00c CTRL_TRIG

The first three of these require simple 32-bit values, but the fourth has a complex bitfield:

Bit 31:   AHB_ERROR
Bit 30:   READ_ERROR
..and so on until..
Bits 3-2: DATA_SIZE
Bit 1:    HIGH_PRIORITY
Bit 0:    EN

With MicroPython uctypes, we can define the registers, and individual bitfields within those registers, e.g.

from uctypes import BF_POS, BF_LEN, UINT32, BFUINT32
# The bitfields are defined first, as the register map refers to them
DMA_CTRL_TRIG_FIELDS = {
    "AHB_ERROR":   31<<BF_POS | 1<<BF_LEN | BFUINT32,
    "READ_ERROR":  30<<BF_POS | 1<<BF_LEN | BFUINT32,
..and so on until..
    "DATA_SIZE":    2<<BF_POS | 2<<BF_LEN | BFUINT32,
    "HIGH_PRIORITY":1<<BF_POS | 1<<BF_LEN | BFUINT32,
    "EN":           0<<BF_POS | 1<<BF_LEN | BFUINT32
}
DMA_CHAN_REGS = {
    "READ_ADDR_REG":       0x00|UINT32,
    "WRITE_ADDR_REG":      0x04|UINT32,
    "TRANS_COUNT_REG":     0x08|UINT32,
    "CTRL_TRIG_REG":       0x0c|UINT32,
    "CTRL_TRIG":          (0x0c,DMA_CTRL_TRIG_FIELDS)
}

The UINT32, BF_POS and BF_LEN entries may look strange, but they are just a way of encapsulating the data type, bit position & bit count into a single variable, and once that has been defined, you can easily read or write any element of the bitfield, e.g.

# Set DMA data source to be ADC FIFO
dma_chan.READ_ADDR_REG = ADC_FIFO_ADDR

# Set transfer size as 16-bit words
dma_chan.CTRL_TRIG.DATA_SIZE = 1

You may wonder why there are 2 definitions for one register: CTRL_TRIG and CTRL_TRIG_REG. Although it is useful to be able to manipulate individual bitfields (as in the above code) sometimes you need to write the whole register at one time, for example to clear all fields to zero:

# Clear the CTRL_TRIG register
dma_chan.CTRL_TRIG_REG = 0
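The bitfield accesses above boil down to shift-and-mask operations on the 32-bit register value. The following plain-Python sketch shows the equivalent arithmetic, so it can be run without any hardware; the helper names get_field and set_field are my own, and are not part of uctypes.

```python
def get_field(reg, pos, length):
    """Extract a bitfield of the given length, starting at bit 'pos'."""
    mask = (1 << length) - 1
    return (reg >> pos) & mask

def set_field(reg, pos, length, value):
    """Return a copy of 'reg' with the bitfield replaced by 'value'."""
    mask = ((1 << length) - 1) << pos
    return (reg & ~mask) | ((value << pos) & mask)

ctrl = 0
ctrl = set_field(ctrl, 2, 2, 1)     # DATA_SIZE (bits 3-2) = 1 for 16-bit words
ctrl = set_field(ctrl, 0, 1, 1)     # EN (bit 0) = 1
print(hex(ctrl), get_field(ctrl, 2, 2))
```

The uctypes definitions do the same job, with the advantage that the position and length of each field are written down only once.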

An additional complication is that there are 12 DMA channels, so we need to define all 12, then select one of them to work on:

DMA_CHAN_WIDTH  = 0x40
DMA_CHAN_COUNT  = 12
DMA_CHANS = [struct(DMA_BASE + n*DMA_CHAN_WIDTH, DMA_CHAN_REGS)
    for n in range(0,DMA_CHAN_COUNT)]

DMA_CHAN = 0
dma_chan = DMA_CHANS[DMA_CHAN]

To add even more complication, the DMA controller also has a single block of registers that are not channel specific, e.g.

DMA_REGS = {
    "INTR":               0x400|UINT32,
    "INTE0":              0x404|UINT32,
    "INTF0":              0x408|UINT32,
    "INTS0":              0x40c|UINT32,
    "INTE1":              0x414|UINT32,
..and so on until..
    "FIFO_LEVELS":        0x440|UINT32,
    "CHAN_ABORT":         0x444|UINT32
}

So to cancel all DMA transactions on all channels:

DMA_DEVICE = struct(DMA_BASE, DMA_REGS)
dma = DMA_DEVICE
dma.CHAN_ABORT = 0xffff

Single ADC sample

MicroPython has a function for reading the ADC, but we’ll be using DMA to grab multiple samples very quickly, so this function can’t be used; we need to program the hardware from scratch. A useful first step is to check that we can produce sensible values for a single ADC sample. Firstly the I/O pin needs to be set as an analog input, using the uctypes definitions. There are 3 analog input channels, numbered from 0 to 2:

import rp_devices as devs
ADC_CHAN = 0
ADC_PIN  = 26 + ADC_CHAN
adc = devs.ADC_DEVICE
pin = devs.GPIO_PINS[ADC_PIN]
pad = devs.PAD_PINS[ADC_PIN]
pin.GPIO_CTRL_REG = devs.GPIO_FUNC_NULL
pad.PAD_REG = 0

Then we clear down the control & status register, and the FIFO control & status register; this is only necessary if they have previously been programmed:

adc.CS_REG = adc.FCS_REG = 0

Then enable the ADC, and select the channel to be converted:

adc.CS.EN = 1
adc.CS.AINSEL = ADC_CHAN

Now trigger the ADC for one capture cycle, and read the result:

adc.CS.START_ONCE = 1
print(adc.RESULT_REG)

These two lines can be repeated to get multiple samples.

If the input pin is floating (not connected to anything) then the value returned is impossible to predict, but generally it seems to be around 50 to 80 units. The important point is that the value fluctuates between samples; if several samples have exactly the same value, then there is a problem.

Multiple ADC samples

Since MicroPython isn’t fast enough to handle the incoming data, I’m using DMA, so that the ADC values are copied directly into memory without any software intervention.

However, we don’t always want the ADC to run at maximum speed (500k samples/sec) so need some way of triggering it to fetch the next sample after a programmable delay. The RP2040 designers have anticipated this requirement, and have equipped it with a programmable timer, driven from a 48 MHz clock. There is also a mechanism that allows the ADC to automatically sample 2 or 3 inputs in turn; refer to the RP2040 datasheet for details.

Assuming the ADC has been set up as described above, the following additional code is required. First we define the DMA channel, the number of samples, and the rate (samples per second).

DMA_CHAN = 0
NSAMPLES = 10
RATE = 100000
dma_chan = devs.DMA_CHANS[DMA_CHAN]
dma = devs.DMA_DEVICE

We now have to enable the ADC FIFO, create a 16-bit buffer to hold the samples, and set the sample rate:

adc.FCS.EN = adc.FCS.DREQ_EN = 1
adc_buff = array.array('H', (0 for _ in range(NSAMPLES)))
adc.DIV_REG = (48000000 // RATE - 1) << 8
adc.FCS.THRESH = adc.FCS.OVER = adc.FCS.UNDER = 1
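The divisor arithmetic can be checked on a PC before loading it onto the Pico. In this sketch the function names adc_div_reg and actual_rate are my own; it assumes the register layout in the RP2040 datasheet, where the integer part of the divisor is in bits 23 to 8, the fractional part is in bits 7 to 0, and the ADC is triggered every (1 + INT + FRAC/256) cycles of the 48 MHz clock.

```python
ADC_CLOCK_HZ = 48000000

def adc_div_reg(rate):
    """Return the DIV register value for a sample rate (integer part only)."""
    return (ADC_CLOCK_HZ // rate - 1) << 8

def actual_rate(div_reg):
    """Return the sample rate that a DIV register value will produce."""
    int_part = div_reg >> 8
    frac_part = div_reg & 0xff
    return ADC_CLOCK_HZ / (1 + int_part + frac_part / 256)

for rate in (100000, 44100, 1000):
    reg = adc_div_reg(rate)
    print("%6u samples/sec: DIV 0x%06x, actual rate %.1f" %
          (rate, reg, actual_rate(reg)))
```

If the requested rate doesn't divide exactly into 48 MHz, the integer division means the actual rate will be slightly higher than the one requested.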

The DMA controller is configured with the source & destination addresses, and sample count:

dma_chan.READ_ADDR_REG = devs.ADC_FIFO_ADDR
dma_chan.WRITE_ADDR_REG = uctypes.addressof(adc_buff)
dma_chan.TRANS_COUNT_REG = NSAMPLES

The DMA destination is set to auto-increment, with a data size of 16 bits; the data request comes from the ADC. Then DMA is enabled, waiting for the first request.

dma_chan.CTRL_TRIG_REG = 0
dma_chan.CTRL_TRIG.CHAIN_TO = DMA_CHAN
dma_chan.CTRL_TRIG.INCR_WRITE = dma_chan.CTRL_TRIG.IRQ_QUIET = 1
dma_chan.CTRL_TRIG.TREQ_SEL = devs.DREQ_ADC
dma_chan.CTRL_TRIG.DATA_SIZE = 1
dma_chan.CTRL_TRIG.EN = 1

Before starting the sampling, it is important to clear down the ADC FIFO, by reading out any existing samples – if this step is omitted, the data you get will be a mix of old & new, which can be very confusing.

while adc.FCS.LEVEL:
    x = adc.FIFO_REG

We can now set the START_MANY bit, and the ADC will start generating samples, which will be loaded into its FIFO, then transferred by DMA to the RAM buffer. Once the buffer is full (i.e. the DMA transfer count has been reached, and its BUSY bit is cleared) the DMA transfers will stop, but the ADC will keep trying to put samples in the FIFO until the START_MANY bit is cleared.

adc.CS.START_MANY = 1
while dma_chan.CTRL_TRIG.BUSY:
    time.sleep_ms(10)
adc.CS.START_MANY = 0
dma_chan.CTRL_TRIG.EN = 0

We can now print the results, converted into a voltage reading:

vals = [("%1.3f" % (val*3.3/4096)) for val in adc_buff]
print(vals)

As with the single-value test, the displayed values should show some dithering; if the input is floating, you might see something like:

['0.045', '0.045', '0.047', '0.046', '0.045', '0.046', '0.045', '0.046', '0.046', '0.041']

Running the code

If you are unfamiliar with the process of loading MicroPython onto the Pico, or loading files into the MicroPython filesystem, I suggest you read my previous post.

The source files are available on Github here; you need to load the library file rp_devices.py onto the MicroPython filesystem, then run rp_adc_test.py; I normally run this using Thonny, as it simplifies the process of editing, running and debugging the code.

In the next part I combine the ADC sampling and the network interface to create a networked oscilloscope with a browser interface.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

RP2040 WiFi using Microchip ATWINC1500 module

Part 1: joining a network

The Raspberry Pi Pico is an incredibly useful low-cost micro-controller module based on the RP2040 CPU, but at the time of writing, there is a major omission: there is no networking capability.

This project adds low-cost wireless networking to the Pi Pico, and any other RP2040 boards. There are various modules on the market that could be used for this purpose; I have chosen the Microchip ATWINC1500 or 1510 modules as they are low-cost, have an easy hardware interface (4-wire SPI), and feature a built-in TCP/IP software stack, which significantly reduces the amount of software needed on the RP2040.

The photo above shows the module mounted on an Adafruit breakout board, and the module itself; this is the variant with a built-in antenna, but there is also a version with an antenna connector, that allows an external antenna to be used.

The only difference between the ATWINC1500 and 1510 modules is that the latter has a larger flash memory (1 MB, as opposed to 0.5 MB). There is also an earlier series of low-level interface modules named ATWILC; I’m not using them, as the built-in TCP/IP software of the ATWINC saves a lot of code complication on the RP2040.

Hardware connections

For simplicity, I have used the Adafruit breakout board, but it is possible to directly connect the module to the Pico, powered from its 3.3V supply.

Wiring Pico to Adafruit WINC1500 breakout

Pi Pico pins (module signal, GPIO number, function)
SCK     18     SPI clock
MOSI    19     SPI data out
MISO    16     SPI data in
CS      17     SPI chip select
WAKE    20     Module wake
EN      20     Module enable
RESET   21     Module reset
IRQ     22     Module interrupt request

No extra components are needed, if the wiring to the module is kept short, i.e. 3 inches (76 mm).

SPI on the RP2040

Initialising the SPI interface on the RP2040 just involves a list of API function calls:

#define SCK_PIN     18
#define MOSI_PIN    19
#define MISO_PIN    16
#define CS_PIN      17
#define WAKE_PIN    20
#define RESET_PIN   21
#define IRQ_PIN     22

// Initialise SPI interface
void spi_setup(int fd)
{
    stdio_init_all();
    spi_init(SPI_PORT, SPI_SPEED);
    spi_set_format(SPI_PORT, 8, SPI_CPOL_0, SPI_CPHA_0, SPI_MSB_FIRST);
    gpio_init(MISO_PIN);
    gpio_set_function(MISO_PIN, GPIO_FUNC_SPI);
    gpio_set_function(CS_PIN,   GPIO_FUNC_SIO);
    gpio_set_function(SCK_PIN,  GPIO_FUNC_SPI);
    gpio_set_function(MOSI_PIN, GPIO_FUNC_SPI);
    gpio_init(CS_PIN);
    gpio_set_dir(CS_PIN, GPIO_OUT);
    gpio_put(CS_PIN, 1);
    gpio_init(WAKE_PIN);
    gpio_set_dir(WAKE_PIN, GPIO_OUT);
    gpio_put(WAKE_PIN, 1);
    gpio_init(IRQ_PIN);
    gpio_set_dir(IRQ_PIN, GPIO_IN);
    gpio_pull_up(IRQ_PIN);
    gpio_init(RESET_PIN);
    gpio_set_dir(RESET_PIN, GPIO_OUT);
    gpio_put(RESET_PIN, 0);
    sleep_ms(1);
    gpio_put(RESET_PIN, 1);
    sleep_ms(1);
}

When using the standard SPI transfer API function, I found that occasionally the last data bit wasn’t being received correctly. The reason was that the API function returns before the transfer is complete; the clock signal is still high, and needs to go low to finish the transaction. To fix this, I inserted a loop that waits for the clock to go low, before negating the chip-select line.

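// Do SPI transfer, return number of bytes transferred
int spi_xfer(int fd, uint8_t *txd, uint8_t *rxd, int len)
{
    int n;

    gpio_put(CS_PIN, 0);
    n = spi_write_read_blocking(SPI_PORT, txd, rxd, len);
    // Wait for the clock to go low before negating chip-select
    while (gpio_get(SCK_PIN)) ;
    gpio_put(CS_PIN, 1);
    return(n);
}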

Interface method

The WiFi module has its own processor, running proprietary code; it is supplied with a suitable binary image already installed, so will start running as soon as the module is enabled.

The module has a Host Interface (HIF) that the Pico uses for all communications; it is a Serial Peripheral Interface (SPI) that consists of a clock signal, incoming & outgoing data lines (MOSI and MISO), and a Chip Select, also known as a Chip Enable. The Pico initiates and controls all the HIF transfers, but the module can request a transfer by asserting an Interrupt Request (IRQ) line.

The module is powered up by asserting the ‘enable’ line, then briefly pulsing the reset line. This ensures that there is a clean startup, without any complications caused by previous settings.

There are 2 basic methods to transfer data between the Pico and the module; simple 32-bit configuration values can be transferred as register read/write cycles; there is a specific format for these, which includes an acknowledgement that a write cycle has succeeded. The following logic analyser trace shows a 32-bit value of 0x51 being read from register 0x1070; the output from the CPU is MOSI, and the input from the module is MISO.

Now the corresponding write cycle, where the CPU is writing back a value of 0x51 to the same 32-bit register.

There are a few unusual features about these transfers.

The chip-select (CS) line doesn’t have to be continuously asserted during the transfer; it need only be asserted whilst a byte is actually being read or written.
The command value is CA hex for a read cycle, and C9 for a write.
The module echoes back the command value plus 2 bytes for a read (CA 00 F3), or plus 1 byte for a write (C9 00), to indicate it has been accepted.
The register address is 24-bit, big-endian (most significant byte first)
The data value is 32-bit, little-endian in the read cycle (51 00 00 00), and big-endian in the write cycle (00 00 00 51).

The last point is quite remarkable, and when starting on the code development, I had great difficulty believing it could be true. The likely reason is that the SPI transfer is big-endian as defined in the Secure Digital (SD) card specification, but the CPU in the module is little-endian. So the firmware has to either do a byte-swap on every response message, or return everything using the native byte-order, with this result.
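The mixed byte order is easier to see in a few lines of Python. This sketch only covers the command, address and data bytes described in the list above; the function names are my own, and the exact framing of a complete transfer (including the echoed response bytes) is not modelled.

```python
CMD_READ = 0xCA     # command value for a register read cycle
CMD_WRITE = 0xC9    # command value for a register write cycle

def read_command(addr):
    """Command byte, then the 24-bit register address, MSbyte first."""
    return bytes([CMD_READ]) + addr.to_bytes(3, "big")

def decode_read_data(data):
    """The 4 data bytes of a read cycle arrive LSbyte first."""
    return int.from_bytes(data, "little")

def write_command(addr, value):
    """Command byte, 24-bit address and 32-bit data, all MSbyte first."""
    return (bytes([CMD_WRITE]) + addr.to_bytes(3, "big") +
            value.to_bytes(4, "big"))

print(read_command(0x1070).hex())
print(hex(decode_read_data(bytes([0x51, 0, 0, 0]))))
print(write_command(0x1070, 0x51).hex())
```

The same 32-bit value therefore appears as 51 00 00 00 when it is read, but has to be sent as 00 00 00 51 when it is written.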

In addition to reading & writing single-word registers, the software must read & write blocks of data. This involves some negotiation with the module firmware, since that manages the allocation & freeing of the necessary storage space in the module. For example, the procedure for a block write is:

Request a buffer of the required size
Receive the address of the buffer from the module
Write one or more data blocks to the buffer
Signal that the transfer is complete

Reading is similar, except that the first step isn’t needed, as the buffer is already available with the required data.

Operations

The above transfer mechanism is used to send commands to the module, and receive responses back from it; there is generally a one-to-one correspondence between the command and response, but there may be a significant delay between the two. For example, the ‘receive’ command requests a data block that has been received over the network, but if there is none, there will be no response, and the command will remain active until something does arrive.

The commands are generally referred to as ‘operations’, and they are split into groups:

Main
Wireless (WiFi)
Internet Protocol (IP)
Host Interface (HIF)
Over The Air update (OTA)
Secure Socket Layer (SSL)
Cryptography (Crypto)

Each operation is assigned a number, and there is some re-use of numbers within different groups, for example a value of 70 in the WiFi group is used to enable Access Point (AP) mode, but the same value in the IP group is a socket receive command. To avoid this possible source of confusion, my code combines the group and operation into a single 16-bit value, e.g.

// Host Interface (HIF) Group IDs
#define GID_MAIN        0
#define GID_WIFI        1
#define GID_IP          2
#define GID_HIF         3

// Host Interface operations with Group ID (GID)
#define GIDOP(gid, op) (((gid) << 8) | (op))
#define GOP_STATE_CHANGE    GIDOP(GID_WIFI, 44)
#define GOP_DHCP_CONF       GIDOP(GID_WIFI, 50)
#define GOP_CONN_REQ_NEW    GIDOP(GID_WIFI, 59)
#define GOP_BIND            GIDOP(GID_IP,   65)
..and so on..
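The combined value is easy to experiment with in Python. This is just an illustration of the arithmetic in the GIDOP macro; the function names gidop and split_gidop are my own.

```python
GID_MAIN, GID_WIFI, GID_IP, GID_HIF = 0, 1, 2, 3

def gidop(gid, op):
    """Combine a group ID and operation number into one 16-bit value."""
    return (gid << 8) | op

def split_gidop(value):
    """Split a combined value back into (group ID, operation number)."""
    return value >> 8, value & 0xff

GOP_STATE_CHANGE = gidop(GID_WIFI, 44)
GOP_BIND = gidop(GID_IP, 65)
print(hex(GOP_STATE_CHANGE), hex(GOP_BIND), split_gidop(GOP_BIND))
```

Since the group is in the upper byte, operations with the same number in different groups get distinct values, so they can be used as cases in a single switch statement.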

To invoke an operation on the module, you must first send a 4-byte header that gives an 8-bit group, 8-bit operation number, and 16-bit message length.

typedef struct {
    uint8_t gid, op;
    uint16_t len;
} HIF_HDR;

The next 4 bytes of the message are unused, so can either be sent as zeros, or just skipped. Then there is the command header, which varies depending on the operation being performed, but is often 16 bytes or less, for example the IP ‘bind’ command:

// Address field for socket, network order (MSbyte first)
typedef struct {
    uint16_t family, port;
    uint32_t ip;
} SOCK_ADDR;

// Socket bind command, 12 bytes
typedef struct {
    SOCK_ADDR saddr;
    uint8_t sock, x;
    uint16_t session;
} BIND_CMD;
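The sizes of these structures can be checked using the Python struct module. This is a sketch under stated assumptions: the 16-bit length is packed LSbyte first, as a C compiler would lay out HIF_HDR on the little-endian RP2040, and the length value of 12 is only an illustration, since I haven't described here exactly what the length field covers.

```python
import struct

HIF_HDR_FORMAT = "<BBH"         # gid, op, len
BIND_CMD_FORMAT = "<HHIBBH"     # family, port, ip, sock, x, session

def hif_header(gid, op, length):
    """Pack the 4-byte header: group, operation, 16-bit length."""
    return struct.pack(HIF_HDR_FORMAT, gid, op, length)

hdr = hif_header(2, 65, 12)     # IP group, 'bind' operation
print(hdr.hex(), struct.calcsize(HIF_HDR_FORMAT),
      struct.calcsize(BIND_CMD_FORMAT))
```

The format string for the bind command only confirms its 12-byte size; the fields in SOCK_ADDR are sent MSbyte first, so they would need a separate big-endian format if the command were being packed for real.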

I’ll be discussing the IP operations in detail in the next part.

The interrupt request (IRQ) line is pulled low by the module to indicate that a response is available; for simplicity, my code polls this line, and calls an interrupt handler.

if (read_irq() == 0)
    interrupt_handler();

Joining a network

I’ll start with the most common use-case; joining a network that uses WiFi Protected Access (WPA or WPA2), and obtaining an IP address using Dynamic Host Configuration Protocol (DHCP). This is remarkably painless, since the module firmware does all of the hard work, but first we have to tackle the issue of firmware versions.

As previously explained, the module comes pre-loaded with firmware; at the time of writing, this is generally version 19.5.2 or 19.6.1. There is a provision for re-flashing the firmware to the latest version, but for the time being I’d like to avoid that complication, so the code I’ve written is compatible with both versions.

The reason that this matters is that 19.6.1 introduced a new method for joining a network, with a new operation number (59, as opposed to 40). Fortunately the newer software can still handle the older method, so that is what I’ll be using by default, though there is a compile-time option to use the new one, if you’re sure the module has the newer firmware.

The code to join the network is remarkably brief, just involving some data preparation, then calling a host interface transfer function to send the data. It searches across all channels to find a signal that matches the given Service Set Identifier (SSID, or network name). A password string (WPA passphrase) is also given; if this is a null value, the module will attempt to join an ‘open’ (insecure) network, but there are very obvious security risks with this, so it is not recommended.

// Join a WPA network, or open network if null password
bool join_net(int fd, char *ssid, char *pass)
{
#if NEW_JOIN
    CONN_HDR ch = {pass?0x98:0x2c, CRED_STORE, ANY_CHAN, strlen(ssid), "",
                   pass?AUTH_PSK:AUTH_OPEN, {0,0,0}};
    PSK_DATA pd;

    strcpy(ch.ssid, ssid);
    if (pass)
    {
        memset(&pd, 0, sizeof(PSK_DATA));
        strcpy(pd.phrase, pass);
        pd.len = strlen(pass);
        return(hif_put(fd, GOP_CONN_REQ_NEW|REQ_DATA, &ch, sizeof(CONN_HDR),
               &pd, sizeof(PSK_DATA), sizeof(CONN_HDR)));
    }
    return(hif_put(fd, GOP_CONN_REQ_NEW, &ch, sizeof(CONN_HDR), 0, 0, 0));
#else
    OLD_CONN_HDR och = {"", pass?AUTH_PSK:AUTH_OPEN, {0,0}, ANY_CHAN, "", 1, {0,0}};

    strcpy(och.ssid, ssid);
    strcpy(och.psk, pass ? pass : "");
    return(hif_put(fd, GOP_CONN_REQ_OLD, &och, sizeof(OLD_CONN_HDR), 0, 0, 0));
#endif
}

Running the code

There are 3 source files in the ‘part1’ directory on Github here:

winc_pico_part1.c: main program, with RP2040-specific code
winc_wifi.c: module interface
winc_wifi.h: module interface definitions

The default network name and passphrase are “testnet” and “testpass”; these will have to be changed to match your network.

Normally I’d provide a simple Pi command-line to compile & run the files, but this is considerably more complex on the Pico; you’ll have to refer to the official documentation for setting up the development tools. I’ve provided a simple CMakeLists.txt file that may need to be altered to suit your environment.

There is a compile-time ‘verbose’ setting, which regulates the amount of diagnostic information that is displayed on the console (serial link). Level 1 shows the following:

Firmware 19.5.2, OTP MAC address F8:F0:05:xx.xx.xx
Connecting...........
Interrupt gid 1 op 44 len 12 State change connected
Interrupt gid 1 op 50 len 28 DHCP conf 10.1.1.11 gate 10.1.1.101

[or if the network can't be found]
Interrupt gid 1 op 44 len 12 State change fail

Verbose level 2 lists all the register settings as well, e.g.

Rd reg 1000: 001003a0
Rd reg 13f4: 00000001
Rd reg 1014: 807c082d
Rd reg 207bc: 00003f00
Rd reg c000c: 00000000
Rd reg c000c: 10add09e
Wr reg 108c: 13521330
Wr reg 14a0: 00000102
..and so on..

Level 3 also includes hex dumps of the data transfers.

Socket interface

Part 2 describes the socket interface, with TCP and UDP servers here.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.