iosoftcode

SIAD: Simple Isolated Analog-to-Digital converter

Circuit diagram of Simple Isolated Analog-to-Digital converter

This project was prompted by the need for a circuit to provide a galvanically-isolated measurement of battery voltage. Existing designs tend to be complicated, expensive or both, so a new approach was adopted, that keeps the circuitry simple, and relies on the timing capability of a CPU. I’m using a Pi Pico RP2040, but almost any microcontroller would do, so long as it has a simple pulse-timing capability.

Why does the measurement have to be isolated? In my case, the equipment is in an environment with a lot of electrical noise, so good isolation is essential, but the technique has other advantages:

Measurement of voltages that are floating with respect to ground
Avoidance of any ground-loops
Protection against transients on the lines being measured
Simple hardware that can be implemented using surface-mount or robust through-hole devices
Low-cost readily-available parts
Negligible load-current when not measuring

The last point is quite important since in battery-powered systems, we don’t want the monitoring system to act as a continuous drain on the battery itself.

Operating method

Charge / discharge of 1 uF capacitor with 1K resistor and 10V supply

A standard method for converting an analogue voltage to digital is the dual-slope ADC, whereby the time taken to charge a capacitor to the unknown voltage is compared with the time taken for a known ‘reference’ voltage; the ratio of the two indicates the value of the unknown voltage.

One obvious simplification would be to just use a single slope; measure the time taken to charge the capacitor to a known voltage, and since capacitor-charging follows a known exponential curve, a simple equation would yield the voltage value.

The big problem with this approach is the capacitor itself; there are many types of capacitor, but they all share the same characteristic to a greater-or-lesser extent: their capacitance value is not very accurately defined. If you look through a typical electronics catalogue, there aren’t many capacitors below 10% tolerance, and many are far worse than that. Then there is the issue of the capacitor value changing with temperature & age, and even in some cases, the applied voltage.

So it is important to minimise the effect of a change in capacitance, by basically comparing the time taken to charge up using the unknown voltage with the time taken to discharge from a known voltage. The charge & discharge times can easily be measured by a microcontroller, using digital signals that have been passed through an opto-coupler to provide isolation.

Shunt voltage reference

The TL431 is known as a ‘shunt voltage reference’ since its primary function is to clamp a voltage to a known value. It is also known as a ‘programmable zener diode’ which accounts for the surprising terminology, whereby the anode is the negative terminal, and the cathode is the positive terminal. The anode-cathode junction is essentially open-circuit until the voltage of the reference pin is around 2.5 volts (2.470 to 2.520 V according to the TL432A datasheet), when that junction will conduct. So in the above circuit diagram, if R2 and R3 are equal, and the input voltage is over 5V, then the output voltage will be clamped to 5 volts.

However the cathode line doesn’t necessarily have to go to the R1 R2 junction; in my design it goes to an optocoupler diode. This means that the opto will be energised when the gate pin reaches 2.5 volts. An additional side effect (not explicitly mentioned in the datasheet) is that the gate line is clamped to 2.5V; this becomes relevant when simulating the circuit.

Charge cycle

To satisfy the requirement for near-zero current when not measuring, a digital signal (through an opto-coupler) is used to start the measurement. I’ve used a FET opto-coupler, since its close-to-zero on-resistance is useful for simplifying the design, but a conventional Bipolar Junction Transistor (BJT) type could be used instead, taking into account the voltage drop across its emitter-collector junction.

Capacitor C1 is charged through R1, with resistors R2 & R3 acting as a load across the capacitor. Once the capacitor is fully charged, the voltage across it will be:

VCmax = VIN * (R2+R3) / (R1+R2+R3)

So if the input is 12V, the final voltage across C1 will be 9.717 volts. The standard value for measuring the charge-time of a capacitor is the ‘time constant’, which is the time taken to charge it to 63.2% of the final voltage, and if R2 and R3 weren’t there, this would be equal to C1 * R1, which is 4.7 milliseconds. The effect of C2 and C3 is to reduce the time-constant as if R2 + R3 were placed in parallel with R1, so the time-constant becomes:

TC = C1 * (R1 * (R2 + R3) / (R1 + (R2 + R3)))

This reduces the time-constant to 3.806 ms. The equation for the voltage across C1 is:

V = VS * (1 - exp(-T / TC))
Where:
    VS is supply voltage
    T is time in seconds
    TC is time-constant (C * R)

It follows that the equation to calculate the time to reach a given voltage is:

T = -log(1 - (V / VS)) * TC

The way we’ll measure charge-time is start a clock when the opto-coupler is energised, and stop that clock when the TL431 device switches on; since R2 has the same value as R3, the switch-on will happen when the voltage across C1 is 5V. So the time to switch on for a 12V supply will be:

-log(1 - (5 / VCmax)) * TC = 0.002751 seconds

Discharge cycle

Once the voltage across C1 has stabilised, the CPU can switch off the supply to the U1 optocoupler, and the voltage across C1 will decay; when it falls below 5V, the TL431 will switch off, switching off U2. So the discharge time is measured between U1 and U2 being switched off.

Charge / discharge simulation for 10V supply

The discharge time would be quite easy to calculate were it not for the fact that the TL431 clamps the reference pin to 2.5 volts with respect to the anode, so the discharge isn’t a simple exponential curve.

The best way to calibrate the ADC is to use known supply voltages, and use the microcontroller to report the charge & discharge times, and the ratio between them, e.g.

Supply Charge  Disch	Ratio
20	   1.784   16.682	0.1069
19	   1.891   15.981	0.1183
18	   2.013   15.254	0.1320
17	   2.154   14.487	0.1487
16	   2.318   13.662	0.1697
15	   2.513   12.793	0.1964
14	   2.746   11.854	0.2317
13	   3.031   10.841	0.2796
12	   3.391    9.740	0.3482
11	   3.859    8.533	0.4522
10.5   4.144    7.864	0.5270
10	   4.495    7.195	0.6247
9.5	   4.897    6.448	0.7595
9	   5.420    5.680	0.9542
8.5	   6.046    4.842	1.2487
8	   6.916    3.950	1.7509
7.5	   8.100    2.977	2.7209
7	  10.048    1.915	5.2470

The tests have been done in descending order, so the time ratios are in ascending order, which simplifies the process of converting a measured time ratio into a voltage using linear interpolation:

#define NUM_CAL_VALS 16
float cal_ratios[NUM_CAL_VALS] = { 0.1069, 0.1183, 0.1320, 0.1487,
                   0.1697, 0.1964, 0.2317, 0.2796, 0.3482, 0.4522,
                   0.5270, 0.6247, 0.7595, 0.9542, 1.2487, 1.7509 };
float cal_volts[NUM_CAL_VALS] = { 20,     19,     18,     17,
                  16,     15,     14,     13,     12,     11,
                  10.5,   10,     9.5,    9,      8.5,    8 };

float chg_t, dis_t;  // Charge & discharge times

float val = dis_t > 0 ? chg_t / dis_t : 0;
volts = interp(val, cal_ratios, cal_volts, NUM_CAL_VALS);

// Return interpolated value
float interp(float val, float *invals, float *outvals, int size)
{
    float diff, indiff, outdiff, ret = 0.0;
    int i = 0;
    
    while (i < size-1 && ret == 0.0)
    {
        if (invals[i+1] > val)
        {
            diff = val - invals[i];
            indiff = invals[i + 1] - invals[i];
            outdiff = outvals[i + 1] - outvals[i];
            ret = outvals[i] + outdiff * diff / indiff;
        }
        i++;
    }    
    return (ret);
}

Design changes

There is plenty of scope for changes or improvements to this design; for example, the current component values give a useful measurement range of 8 to 20 volts, which can be increased by changing R1, though it is necessary to bear in mind the ‘absolute maximum’ anode-cathode voltage of the TL431, which is 37 volts.

The capacitor value can be increased to give longer charge & discharge times, but the overall measurement time will increase as well, as the discharge cycle should only be started when the capacitor is fully charged. A suitable candidate would be a 10 uF tantalum device, but don’t use a wide-tolerance part such as an aluminium electrolytic, if you want the measurements to have long-term repeatability.

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

Accurate frequency & timing measurement on the Pi Pico RP2040 using Python

In a previous post, I looked at ways of measuring frequency accurately using an RP2040 CPU programmed in C. In this project, I have re-coded those functions in MicroPython, and provided a few enhancements.

I use Direct Memory Access (DMA) to minimise the CPU workload, and avoid the need for interrupts, but this raises a problem: when programming in C, there is a substantial Application Programming Interface (API) that allows everything on the RP2040 chip to be configured using simple function calls. However, MicroPython aims to be more-or-less compatible with a wide range of CPUs, so not much CPU-specific code has been included. This means that we have to do some low-level programming if we want to use DMA, or the more obscure functions of the RP2040 Pulse Width Modulation (PWM) peripheral.

My first attempt at ADC and DMA low-level programming used MicroPython ‘uctypes’ to create a model of the peripherals, and if you are interested in the nuts-and-bolts of this method, I suggest you browse the post here.

This approach produced excellent results, but wasn’t very ‘pythonic’, so I’ve encapsulated common API functions in Python classes; when choosing function names, I’ve copied those in the C API when possible. So for example, setting the PWM peripheral in C is:

#define PWM_GPIO_PIN 3
uint slice = pwm_gpio_to_slice_num(PWM_GPIO_PIN);
pwm_config cfg = pwm_get_default_config();
pwm_config_set_clkdiv_mode(&cfg, PWM_DIV_B_RISING);
// ..and any other settings, then..
pwm_init(slice, &cfg, false);

In MicroPython this is done by instantiating a device from the PWM class:

import pico_devices as devs
PWM_GPIO_PIN = 3
pwm = devs.PWM(PWM_GPIO_PIN)
pwm.set_clkdiv_mode(devs.PWM_DIV_B_RISING)
# ..and any other settings..

Not only are the function call names very similar between C and Python, but also the underlying code is similar; for example, the set_clkdiv function is defined as:

class PWM:
    def set_clkdiv_mode(self, mode):
        self.slice.CSR.DIVMODE = mode

where ‘slice’ is the (somewhat unusual) name for a PWM channel, CSR is the Control and Status Register, and DIVMODE is a 2-bit field within that register. If you want to learn more about the internal structure of the PWM peripheral, I suggest you take a look at the C language post.

The Python classes have been kept very ‘lean’, with no error-checking of the parameters; when time permits, I intend to create a sub-class that provides comprehensive parameter-checking .

Test signal

We need a waveform for the tests, which the RP2040 PWM peripherals can easily provide. The following code generates a 100 kHz square wave on GPIO pin 4:

PWM_OUT_PIN = 4             # GPIO pin for output
PWM_DIV = 125               # 125e6 / 125 = 1 MHz
PWM_WRAP = 9                # 1 MHz / (9 + 1) = 100 kHz
PWM_LEVEL = (PWM_WRAP+1)//2 # 50% PWM

# Start a PWM output
def pwm_out(pin, div, level, wrap): 
    devs.gpio_set_function(pin, devs.GPIO_FUNC_PWM)
    pwm = devs.PWM(pin)
    pwm.set_clkdiv_int_frac(div, 0)
    pwm.set_wrap(wrap)
    pwm.set_chan_level(pwm.gpio_to_channel(pin), level)
    pwm.set_enabled(1)
    return pwm

test_signal = pwm_out(PWM_OUT_PIN, PWM_DIV, PWM_LEVEL, PWM_WRAP)

The choice of GPIO pin number is quite arbitrary, you just need to be aware when programming PWM that there are are 16 channels (technically 8 slices, each with 2 channels) mapped onto 32 pins, so if I’m using GPIO pin 4 for this signal, I can’t use PWM on GPIO 20. However, that pin is still available for any other purpose, so the limitation usually isn’t a problem.

16-bit counter

The RP2040 PWM peripheral can also act as a pulse counter, and by counting the number of edges in a given time, we can establish the frequency. This pulse input capability is only available on odd GPIO pin numbers (i.e. channel B of a PWM ‘slice’).

The code is quite simple; it is only necessary to set the mode, and the frequency divisor:

# Initialise PWM as a pulse counter (gpio must be odd number)
def pulse_counter_init(pin, rising=True):
    devs.gpio_set_function(pin, devs.GPIO_FUNC_PWM)
    ctr = devs.PWM(pin)
    ctr.set_clkdiv_mode(devs.PWM_DIV_B_RISING if rising else devs.PWM_DIV_B_FALLING)
    ctr.set_clkdiv(1)
    return ctr

Now we can make a simple frequency meter, by clearing the count to zero, then enabling the counter for a specific period of time:

# Enable or disable pulse counter
def pulse_counter_enable(ctr, en):
    if en:
        ctr.set_ctr(0)
    ctr.set_enabled(en)

# Get value of pulse counter
def pulse_counter_value(ctr):
    return ctr.get_counter()

pulse_counter_enable(counter, True)
time.sleep(0.1)
pulse_counter_enable(counter, False)
val = pulse_counter_value(counter)
print("Sleep 0.1s, count %u" % val)

The count-value for a 100 kHz signal and a 100 millisecond sleep is theoretically 10000, but is actually around 10015, due to the time-delays associated with the Python instructions. I’ll be describing ways to eliminate these delays later on in this post.

32-bit counter

Unfortunately the PWM peripheral has only a 16-bit counter, which can be too short for high-frequency signals or long sample-times. In the C code I polled the count value, to count the number of times it rolled over past 65535, then add on that number to the final result.

An alternative method is to take advantage of the fact that the DMA controller has a 32-bit down-counter that decrements every time a DMA cycle is completed. So we just need to set the DMA count to a large number, and set the PWM peripheral to trigger a DMA cycle on every rising or falling edge of the input signal.

This prompts the question “what data should the DMA transfer?” and the answer is “anything: it doesn’t matter”, but we still have to be careful to ensure the transfers don’t over-write some random area in memory. We need to very carefully specify the DMA source & destination addresses, using the binary ‘array’ data type, which occupies a fixed area of memory, without all the complexities of a Python ‘object’. The syntax to create the array is somewhat unusual; rather than just specifying a size, we have to provide the initial data using an iterator. The pico_devices library provides a helper function to simplify the process:

# Create 32-bit array (to receive DMA data)
def array32(size):
    return array.array('I', (0 for _ in range(size)))

To use this array with DMA, we have to get its address in memory, using ‘uctypes.addressof’:

# Get address of variable (for DMA)
def addressof(var):
    return uctypes.addressof(var)

Armed with these functions, we can initialise the DMA controller:

ext_data = devs.array32(1)  # Dummy array for extended counter

# Use DMA to extend pulse counter to 32 bits
def pulse_counter_ext_init(ctr):
    ctr.set_enabled(False)
    ctr.set_wrap(0)
    ctr.set_ctr(0)
    ctr_dma = devs.DMA()
    ctr_dma.set_transfer_data_size(devs.DMA_SIZE_8)
    ctr_dma.set_read_increment(False)
    ctr_dma.set_write_increment(False)
    ctr_dma.set_read_addr(devs.addressof(ext_data))
    ctr_dma.set_write_addr(devs.addressof(ext_data))
    ctr_dma.set_dreq(ctr.get_dreq())
    ctr.set_enabled(True)
    return ctr_dma

We’re modifying the PWM 16-bit counter to wrap around to zero on every input edge, and initialising its starting value to zero, which is essential. Then a DMA channel is instantiated, and configured to copy 8 bits from the data array back to the data array.

It is important to note that enabling the DMA counter does not necessarily start the transfer from scratch; if a transfer has already started, the controller will resume counting where it left off. So it is necessary to clear out any existing transfer, before stating a new one:

# Start the extended pulse counter
def pulse_counter_ext_start(ctr_dma):
    ctr_dma.abort()
    ctr_dma.set_trans_count(0xffffffff, True)
 
# Stop the extended pulse counter
def pulse_counter_ext_stop(ctr_dma):
    ctr_dma.set_enable(False)
 
# Return value from extended pulse counter
def pulse_counter_ext_value(ctr_dma):
    return 0xffffffff - ctr_dma.get_trans_count()

Using the extended pulse counter is similar to the 16-bit counter, we just need to remember the limitation that if the 32-bit value is exceeded, the counter will stop, and not wrap around.

counter_dma = pulse_counter_ext_init(counter)
pulse_counter_ext_start(counter_dma)
time.sleep(1.0)
val = pulse_counter_ext_value(counter_dma)
pulse_counter_ext_stop(counter_dma)
print("Sleep 1.0s, ext count %u" % val)

A typical response is:

Sleep 1.0s, ext count 100011

This shows that the count is not limited to 16 bits.

Gating a counter

To accurately count pulses with a specific time-frame, it is necessary for the timing to be accurate, and as we have seen, the ‘sleep’ function introduces a significant error. To eliminate this, we need a hardware mechanism whereby a timer directly enables and disables the counter (a process called ‘gating’) without requiring any intervention from the CPU.

This involves programming another PWM channel to act as a timer, then when the time has expired, using DMA to modify the counter’s register to stop counting. The default PWM clock is 125 MHz, the prescaler is 8 bits, and the counter register is 16 bits, effectively 17 bits if we engage ‘phase correct’ mode. So the slowest gate frequency is 125e6 / (256 * 65536 * 2) = 3.725 Hz, a gate-time of 0.268 seconds; I’ve opted for 0.25 seconds.

GATE_TIMER_PIN = 0          # Used to define PWM slice
GATE_PRESCALE = 250         # 125e6 / 250 = 500 kHz
GATE_WRAP = 125000          # 500 kHz / 125000 = 4 Hz (250 ms)

# Initialise PWM as a gate timer
def gate_timer_init(pin):
    pwm = devs.PWM(pin)
    pwm.set_clkdiv_int_frac(GATE_PRESCALE, 0)
    pwm.set_wrap(int(GATE_WRAP/2 - 1))
    pwm.set_chan_level(pwm.gpio_to_channel(pin), int(GATE_WRAP/4))
    pwm.set_phase_correct(True)
    return pwm

This code uses GPIO pin 0 to identify which PWM ‘slice’ is being used. Since we aren’t enabling PWM I/O on that pin, it can still be used for any other function, such as serial output.

We need to trigger a DMA cycle when the gate PWM times out; that cycle will be used to disable the counter PWM. So when initialising the DMA channel, we need to capture the non-enabled state, that will be written into the counter CSR register; this is only one 32-bit value, but I’m using a fixed array for that value, since DMA can’t handle the complexities of Python object storage.

gate_data = devs.array32(1)

# Initialise gate timer DMA
def gate_dma_init(ctr, gate):
    dma = devs.DMA()
    dma.set_transfer_data_size(devs.DMA_SIZE_32)
    dma.set_read_increment(False)
    dma.set_write_increment(False)
    dma.set_dreq(gate.get_dreq())
    gate_data[0] = ctr.slice.CSR_REG
    dma.set_read_addr(devs.addressof(gate_data))
    dma.set_write_addr(ctr.get_csr_address())
    return dma

To start the frequency measurement, it is necessary to set the DMA count (since this is reset by a DMA cycle), enable DMA, then enable the gate & counter PWM devices simultaneously.

# Start frequency measurment using gate
def freq_gate_start(ctr, gate, dma):
    ctr.set_ctr(0)
    gate.set_ctr(0)
    dma.set_trans_count(1, True)
    ctr.set_enables((1<<ctr.slice_num) | (1<<gate.slice_num), True)

If all is well, the test signal frequency should be reported correctly on the console:

Gate 250.0 ms, count 25000, freq 100.0 kHz

Edge timer

An alternative method for measuring frequency is to measure the times between the rising or falling edges, producing values that are the reciprocal of the frequency. This is particularly useful when dealing with slow signals, or if you want to implement a method to eliminate ‘rogue’ pulses, since you get one measurement for each cycle of the input signal, so could reject pulses that are outside the expected frequency range.

By now, you won’t be surprised to learn that I use DMA to capture the time-value for each edge; there is a convenient 32-bit microsecond-value that can be copied into a suitably-sized array on edge positive or negative edge; the sole function of the PWM peripheral is to detect the edge, and trigger a DMA cycle.

We’ll be generating a test signal as before, but this time it is 10 Hz:

PWM_DIV = 250               # 125e6 / 125 = 500 kHz
PWM_WRAP = 50000 - 1        # 500 kHz / 50000 = 10 Hz

The PWM peripheral initialisation is similar to previous functions:

# Initialise PWM as a timer (gpio must be odd number)
def timer_init(pin, rising=True):
    devs.gpio_set_function(pin, devs.GPIO_FUNC_PWM)
    pwm = devs.PWM(pin)
    pwm.set_clkdiv_mode(devs.PWM_DIV_B_RISING if rising else devs.PWM_DIV_B_FALLING)
    pwm.set_clkdiv(1)
    pwm.set_wrap(0);
    return pwm

The DMA controller is also similar, but the destination is set to auto-increment with every transfer:

# Initialise timer DMA
def timer_dma_init(timer):
    dma = devs.DMA()
    dma.set_transfer_data_size(devs.DMA_SIZE_32)
    dma.set_read_increment(False)
    dma.set_write_increment(True)
    dma.set_dreq(timer.get_dreq())
    dma.set_read_addr(devs.TIMER_RAWL_ADDR)
    return dma

Starting the DMA is a bit different; firstly we need to use the ‘abort’ command to stop any previous transfer that might still be in progress. If we didn’t do that, and just enabled DMA, the transfer would resume where it left off – the new settings would be ignored. Secondly we need to set both the transfer count and the destination address, since these will have been changed by a previous transfer.

# Start frequency measurment using gate
def timer_start(timer, dma):
    timer.set_ctr(0)
    timer.set_enabled(True)
    dma.abort()
    dma.set_write_addr(devs.addressof(time_data))
    dma.set_trans_count(NTIMES, True)

The main program needs to perform some simple maths to derive the frequency, guarding against the possibility that there may be insufficient time-values (1 or less):

NTIMES = 9                       # Number of time samples
time_data = devs.array32(NTIMES) # Time data

timer_pwm = timer_init(PWM_IN_PIN)
timer_dma = timer_dma_init(timer_pwm)
    
timer_start(timer_pwm, timer_dma)
time.sleep(1.0)
timer_stop(timer_pwm)
    
count = NTIMES - timer_dma.get_trans_count()
data = time_data[0:count]
diffs = [data[n]-data[n-1] for n in range(1, len(data))]
total = sum(diffs)
freq = (1e6 * len(diffs) / total) if total else 0
print("%u samples, total %u us, freq %3.1f Hz" % (count, total, freq))

The frequency of the test signal should be displayed on the console:

9 samples, total 800000 us, freq 10.0 Hz

Running the code

The source files are available on Github here. It is necessary to load the library file ‘pico_devices.py’ onto the target system; if you are using the Thonny editor, this is done by right-clicking the filename, and selecting ‘Upload to /’. You can then run one of the program files to measure frequency:

pico_counter.py: use simple pulse counting and sleep timing
pico_freq.py: use DMA to accurately gate the pulse counter
pico_timer.py: use time-interval (reciprocal) measurement

Don’t forget to link GPIO pin 4 (the test signal output) to pin 3 (the measurement input) for these tests.

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

Accurate frequency measurement on the Pi Pico RP2040 using C

This project describes the creation of the software for a vehicle speedometer. It measures the frequency of a signal that is proportional to speed, but I’m also using the opportunity to explore two possible techniques for measuring frequency with high accuracy using an RP2040 processor.

If you’d prefer Python code, see this post.

Counting transitions using PWM

Unusually, I’m going to use the RP2040 PWM (Pulse Width Modulation) peripheral to count the clock pulses. PWM is normally used to generate signals, not measure them, but the RP2040 peripheral has an unusual feature; it can be use to count pulse-edges, namely the number of low-to-high or high-to-low transitions of an input signal. If we start this counter, and stop after a specific time (which is commonly known as ‘gating’ the counter) then we obtain the frequency by dividing the count by the time.

The RP2040 documentation uses the unusual term ‘slice’ to denote one of the PWM channels, where each slice has 2 ‘channels’ (A and B) that are associated with specific I/O pins, as follows:

Slice 0 channel A uses GPIO 0, Slice 0 channel B uses GPIO 1
Slice 1 channel A uses GPIO 2, Slice 1 channel B uses GPIO 3
Slice 2 channel A uses GPIO 4, Slice 2 channel B uses GPIO 5
..and so on..

When generating PWM signals both channels can be used, but when counting edges we must use the ‘B’ channels, so an odd GPIO pin number.

The PWM counter isn’t difficult to set up, using the standard Pico hardware interface library:

uint counter_slice;

// Initialise frequency pulse counter
void freq_counter_init(int pin) 
{
    assert(pwm_gpio_to_channel(pin) == PWM_CHAN_B);
    counter_slice = pwm_gpio_to_slice_num(pin);

    gpio_set_function(pin, GPIO_FUNC_PWM);
    pwm_config cfg = pwm_get_default_config();
    pwm_config_set_clkdiv_mode(&cfg, PWM_DIV_B_RISING);
    pwm_config_set_clkdiv(&cfg, 1);
    pwm_init(counter_slice, &cfg, false);
}

Starting, stopping and reading the count is quite easy:

// Get count 
uint16_t freq_counter_read(uint msec)
{
    pwm_set_counter(counter_slice, 0);
    pwm_set_enabled(counter_slice, true);
    sleep_ms(msec);
    pwm_set_enabled(counter_slice, false);    
    return (uint16_t) pwm_get_counter(counter_slice);
}

This code works fine, with a few limitations:

The sleep_msec() call isn’t very accurate, its timing will depend on other activities the RP2040 is performing (such as USB interrupts).
During the sleep_ms() call the CPU is just wasting time, doing nothing; if we want to do something else (such as scanning a button input) then we’ll have to use a timer interrupt for that, which will further reduce the accuracy of the sleep timing.
The count is a 16-bit number, so we must choose a gate-time that ensures the counter won’t overflow.

To fix points 1 & 2 we need the gating timer to be implemented in hardware, and we can use another PWM ‘slice’ for this purpose; it is basically acting as an up-counter fed from a known clock, and when it times out, we halt the counter PWM, so its value is fixed.

To achieve good accuracy, there should also be a hardware link between the timeout of the timer PWM and the stopping of the counter PWM, so it won’t matter what code the CPU is executing at the time. Fortunately the timer PWM can trigger a DMA (Direct Memory Access) cycle when it times out, and we can use this cycle to update the counter PWM control register, stopping the counter. This means that once the two PWM slices are started (counter & timer) no more CPU intervention is required, until the data capture is finished.

Initialising the gate time & DMA is a little more complicated:

#define TIMER_PRESCALE      250     // 8-bit value
#define TIMER_WRAP          125000  // 17-bit value
#define SAMPLE_FREQ         (125000000 / (TIMER_PRESCALE*TIMER_WRAP))

uint gate_slice, gate_dma_chan, timer_dma_dreq, csr_stopval;

// Initialise gate timer, and DMA to control the counter
void gate_timer_init(int pin)
{
    gate_slice = pwm_gpio_to_slice_num(pin);
    io_rw_32 *counter_slice_csr = &pwm_hw->slice[counter_slice].csr;
    
    gpio_set_function(pin, GPIO_FUNC_PWM);
    pwm_set_clkdiv_int_frac(gate_slice, TIMER_PRESCALE, 0);
    pwm_set_wrap(gate_slice, TIMER_WRAP/2 - 1);
    pwm_set_chan_level(gate_slice, PWM_CHAN_B, TIMER_WRAP/4);
    pwm_set_phase_correct(gate_slice, true);
        
    gate_dma_chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(gate_dma_chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_32);
    channel_config_set_read_increment(&cfg, false);
    channel_config_set_dreq(&cfg, pwm_get_dreq(gate_slice));
    csr_stopval = *counter_slice_csr;
    dma_channel_configure(gate_dma_chan, &cfg, counter_slice_csr, &csr_stopval, 1, false);
    pwm_set_enabled(gate_slice, true);
}

The PWM wrap settings deserve some explanation; since the hardware register is 16 bits, the obvious choice is to set the timer wrap value to 65536 or less, but this means the longest gate-time is 65536 * 256 / 125 MHz = 134 msec. However, by selecting ‘phase correct’ mode, the PWM device counts up to the wrap value, then back down again, before triggering the DMA. So the wrap value effectively becomes 17 bits wide, and we can set a gate time of 250 msec, as in the code above.

To run a capture cycle, it is only necessary to give the DMA controller the address of the register to be modified (the counter PWM ‘csr’ register) and start both PWM slices simultaneously.

// Start pulse counter
void counter_start(void)
{
    dma_channel_transfer_from_buffer_now(timer_dma_chan, &csr_stopval, 1);
    pwm_set_counter(counter_slice, 0);
    pwm_set_counter(gate_slice, 0);
    pwm_set_mask_enabled((1 << counter_slice) | (1 << gate_slice));
}

To get read the counter value, we need to wait until the DMA cycle is complete, then stop the timer and access the count register.

// Get pulse count
int counter_value(void)
{
    while (dma_channel_is_busy(gate_dma_chan)) ;
    pwm_set_enabled(gate_slice, false);    
    return((uint16_t)pwm_get_counter(counter_slice));
}

There is still the problem of the counter potentially overflowing if there is a high-frequency input; this could be solved using a PWM overflow interrupt, but personally I prefer to poll the count value to check for overflow, for example:

uint counter_lastval, counter_overflow;

// Check for overflow, and check if capture complete
bool counter_value_ready(void)
{
    uint n = pwm_get_counter(counter_slice);
    
    if (n < counter_lastval)
        counter_overflow++;
    counter_lastval = n;
    return (!dma_channel_is_busy(timer_dma_chan));
}

A capture cycle that is protected against counter overflow could now look like:

counter_start();
while (!counter_value_ready())
{
    // Insert code here to be executed while waiting for value
}
printf("%u Hz\n", frequency_value());

Reciprocal measurement

The ‘count-the-edges’ technique described above works well for reasonably fast signals (e.g. 1 kHz and above), but at lower frequencies it is quite inaccurate, so an alternative technique is used, which is known as ‘time-interval’ or ‘reciprocal’ measurement.

Instead of counting the number of transitions in a given time, the new technique measures the time between the transitions, for one or more cycles; the inverse of this time is the frequency.

We have already seen how the RP2040 PWM peripheral can be used to count pulses and generate DMA requests; if the ‘wrap’ value is set to zero, then the PWM will generate a DMA request for every positive-going transition of the input signal. This DMA cycle can be used to copy the 32-bit microsecond value from a timer register into an array. So we end up with an array of microsecond timing values, and the inverse of these is the frequency.

The modifications to the previously-described code aren’t substantial, we just need to modify the counter ‘wrap’ value, and set up the DMA transfers of the timing values.

#define NUM_EDGE_TIMES      11
#define EDGE_WAIT_USEC      2000001
uint edge_times[NUM_EDGE_TIMES]; 

// Initialise DMA to store the edge times
void edge_timer_init(void) 
{
    timer_dma_chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(timer_dma_chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_32);
    channel_config_set_read_increment(&cfg, false);
    channel_config_set_write_increment(&cfg, true);
    channel_config_set_dreq(&cfg, pwm_get_dreq(counter_slice));
    dma_channel_configure(timer_dma_chan, &cfg, edge_times, &timer_hw->timerawl, NUM_EDGE_TIMES, false);
    pwm_set_wrap(counter_slice, 0);
}

The first 2 definitions give the number of cycles to be captured, and the length of time to wait for the edges to arrive. These may be tuned as required; a slow signal will require a long capture time, for example a 1 Hz signal needs at least 2 seconds to be sure of capturing 2 edges. Conversely, achieving high accuracy on a fast signal will require a large array, since the result is calculated from the average of all the values.

// Get average of the edge times
int edge_timer_value(void)
{
    uint i=1, n;
    int total=0;

    dma_channel_abort(timer_dma_chan);
    pwm_set_enabled(counter_slice, false);    
    while (i<NUM_EDGE_TIMES && edge_times[i])
    {
        n = edge_times[i] - edge_times[i-1];
        total += n;
        i++;
    }
    return(i>1 ? total / (i - 1) : 0);
}

// Get frequency value from edge timer
float edge_freq_value(void)
{
    int val = edge_timer_value();
    return(val ? 1e6 / val : 0);
}

The main loop now looks like:

edge_timer_init();
while (true) 
{
    memset(edge_times, 0, sizeof(edge_times));
    edge_timer_start();
    sleep_ms(EDGE_WAIT_USEC / 1000);
    printf("Frequency %8.6f Hz\n", edge_freq_value());
}

Since DMA has been used to capture the data, the sleep timing is completely non-critical; it can be increased to accommodate slower input signals, or reduced to provide a quicker answer with faster signals.

However, as mentioned above, the use of ‘sleep’ does render the CPU unresponsive for that duration, which is a problem if you want it to do other things, e.g. scan for button-presses. A simple way of fixing this issue (without resorting to interrupts) is to create a polled timer, that keeps track of the elapsed time, and returns a ‘true’ value when there is a timeout.

// Return non-zero if timeout
bool ustimeout(uint *tickp, uint usec)
{
    uint t = time_us_32(), dt = t - *tickp;

    if (usec == 0 || dt >= usec)
    {
        *tickp = t;
        return (1);
    }
    return (0);
}

Before using this timer, we call the function with a pointer to a ‘uint’ variable, and a zero microsecond value; this initialises the variable with the current time. Thereafter, we just call the function with the desired timeout value, and it will return ‘true’ on timeout. The main loop becomes:

uint edge_ticks;

while (true) 
{
    memset(edge_times, 0, sizeof(edge_times));
    edge_timer_start();
    ustimeout(&edge_ticks, 0);
    while (!ustimeout(&edge_ticks, EDGE_WAIT_USEC))
    {
        // Insert code here to be executed while waiting for value
    }
    printf("Frequency %5.3f Hz\n", edge_freq_value());

This code can be used to measure very low frequencies; for example, with the gate time of over 2 seconds, it is possible to measure the 1 PPS (1 Pulse Per Second) signal from a GPS module, to within 6 decimal places. This is useful for checking the accuracy of the Pico microsecond timer, since the PPS signal is locked to the very accurate satellite clocks.

Running the code

The source code is available on Github here. There is a single C source file (picofreq.c), with a definition at the top to choose between edge-counter and edge-timer (reciprocal) measurements, then some definitions for the two methods:

// Set zero to use edge-counter, 1 to use edge-timer
#define USE_EDGE_TIMER      0

// Definitions for edge-counter
#define TIMER_PRESCALE      250     // 8-bit value
#define TIMER_WRAP          125000  // 17-bit value
#define SAMPLE_FREQ         (125000000 / (TIMER_PRESCALE * TIMER_WRAP))

// Definitions for edge-timer
#define NUM_EDGE_TIMES      11
#define EDGE_WAIT_USEC      200001

In edge-counter mode, you need to select values that give you the required gate time, bearing in mind the prescaler value is 8 bits, and the wrap value is 17 bits. The settings above give a gate-time of 250 * 12500 / 125000000 = 0.25 seconds, which is close to the maximum.

For the edge-timer, the definitions are the number of edges to captured, and the overall time to wait. In the example above, the frequency is calculated using the average of 10 time-difference values, with a waiting time of 0.2 seconds. To gain maximum accuracy from fast signals, it will be necessary to increase the number of edges; conversely, if you want to measure a slow signal such as the 1 Hz output from a GPS, then the waiting-time needs to be increased to over 2 seconds.

In either mode, the program prints the frequency on the default serial console (115k baud on GPIO pin 0), and toggles the Pico on-board LED. If you are using the Pico-W wireless variant then the LED is driven by the CYW43439 WiFi chip, and it is much more complex to control, so if you want to retain this feature, an external LED may be used on any convenient pin number.

The GPIO pin definitions are:

#define FREQ_IN_PIN         7
#define GATE_TIMER_PIN      0

The frequency input can be any odd-numbered GPIO pin, as discussed above. The gate timer pin definition is just a convenient way of specifying a PWM slice, so the above definition selects slice 0 channel A. Since GPIO pin 0 is defined as a serial output, there is no clash between the serial & PWM signals; the PWM output signal is discarded.

To build and run the code, I have included a minimal CMakeLists.txt, the only addition being the enabling of all warnings:

cmake_minimum_required(VERSION 3.12)
include(pico_sdk_import.cmake)
project(picofreq C CXX ASM)
pico_sdk_init()
add_compile_options(-Wall)
add_executable(picofreq picofreq.c)
target_link_libraries(picofreq pico_stdlib hardware_pwm hardware_dma)
pico_add_extra_outputs(picofreq)

This generates a uf2 file in the ‘build’ directory that can be programmed into the Pico using the normal method (holding the pushbutton down while powering up, then drag-and-drop using the USB filesystem) but personally I find it much more convenient to use the the Pico debug probe, and in case you are using the Windows VS Code front-end, I have included the standard launch.json and settings.json files in a .vscode subdirectory, since I’ve found these to be essential for using VS Code. To download & debug the code, just hit ctrl-shift-D to bring up the debug window, then F5. If you are using and alternative debug adaptor, it will probably be necessary to modify launch.json, but the variety of options are too complex to be described here.

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

Picowi part 10: Web camera

A Web camera is quite a demanding application, since it requires a continuous stream of data to be sent over the network at high speed. The data volume is determined by the image size, and the compression method; the raw data for a single VGA-size (640 x 480 pixel) image is over 600K bytes, so some compression is desirable. Some cameras have built-in JPEG compression, which can compress the VGA image down to roughly 30K bytes, and it is possible to send a stream of still images to the browser, which will display them as if they came from a video-format file. This approach (known as motion-JPEG, or MJPEG) has a disadvantage in terms of inter-frame compression; since each frame is compressed in isolation, the compressor can’t reduce the filesize by taking advantage of any similarities between adjacent frames, as is done in protocols such as MPEG. However, MJPEG has the great advantage of simplicity, which makes it suitable for this demonstration.

Camera

The standard cameras for the full-size Raspberry Pi boards have a CSI (Camera Serial Interface) conforming to the specification issued by the MIPI (Mobile Industry Processor Interface) alliance. This high-speed connection is unsuitable for use with the Pico, we need something with a slower-speed SPI (Serial Peripheral Interface), and JPEG compression ability.

The camera I used is the 2 megapixel Arducam, which is uses the OV2640 sensor, combined with an image processing chip. It has I2C and SPI interfaces; the former is primarily for configuring the sensor, with the latter being for data transfer. Sadly the maximum SPI frequency is specified as 8 MHz, which compares unfavourably with the 60 MHz SPI we are using to communicate with the network.

The connections specified by Arducam are:

SPI SCK  GPIO pin 2
SPI MOSI          3
SPI MISO          4
SPI CS            5
I2C SDA           8
I2C SCL           9
Power             3.3V
Ground            GND

In addition, GPIO pin 0 is used as a serial console output, the data rate is 115200 baud by default.

I2C and SPI tests

The first step is to check that the i2c interface is connected correctly, by checking an ID register value:

#define CAM_I2C         i2c0
#define CAM_I2C_ADDR    0x30
#define CAM_I2C_FREQ    100000
#define CAM_PIN_SDA     8
#define CAM_PIN_SCL     9

i2c_init(CAM_I2C, CAM_I2C_FREQ);
gpio_set_function(CAM_PIN_SDA, GPIO_FUNC_I2C);
gpio_set_function(CAM_PIN_SCL, GPIO_FUNC_I2C);
gpio_pull_up(CAM_PIN_SDA);
gpio_pull_up(CAM_PIN_SCL);

WORD w = ((WORD)cam_sensor_read_reg(0x0a) << 8) | cam_sensor_read_reg(0x0b);
if (w != 0x2640 && w != 0x2641 && w != 0x2642)
    printf("Camera i2c error: ID %04X\n", w);

/ Read camera sensor i2c register
BYTE cam_sensor_read_reg(BYTE reg)
{
    BYTE b;
    
    i2c_write_blocking(CAM_I2C, CAM_I2C_ADDR, &reg, 1, true);
    i2c_read_blocking(CAM_I2C, CAM_I2C_ADDR, &b, 1, false);
    return (b);
}

Then we can check the SPI interface by writing values to a register, and reading them back:

#define CAM_SPI         spi0
#define CAM_SPI_FREQ    8000000
#define CAM_PIN_SCK     2
#define CAM_PIN_MOSI    3
#define CAM_PIN_MISO    4
#define CAM_PIN_CS      5

spi_init(CAM_SPI, CAM_SPI_FREQ);
gpio_set_function(CAM_PIN_MISO, GPIO_FUNC_SPI);
gpio_set_function(CAM_PIN_SCK, GPIO_FUNC_SPI);
gpio_set_function(CAM_PIN_MOSI, GPIO_FUNC_SPI);
gpio_init(CAM_PIN_CS);
gpio_set_dir(CAM_PIN_CS, GPIO_OUT);
gpio_put(CAM_PIN_CS, 1);

if ((cam_write_reg(0, 0x55), cam_read_reg(0) != 0x55) || (cam_write_reg(0, 0xaa), cam_read_reg(0) != 0xaa))
    printf("Camera SPI error\n");

Initialisation

The sensors require a large number of i2c register settings in order to function correctly. These are just ‘magic numbers’ copied across from the Arducam source code. The last block of values specify the sensor resolution, which is set at compile-time. The options are 320 x 240 (QVGA) 640 x 480 (VGA) 1024 x 768 (XGA) 1600 x 1200 (UXGA), e.g.

// Horizontal resolution: 320, 640, 1024 or 1600 pixels
#define CAM_X_RES 640

Capturing a frame

A single frame is captured by writing to a few registers, then waiting for the camera to signal that the capture (and JPEG compression) is complete. The size of the image varies from shot to shot, so it is necessary to read some register values to determine the actual image size. In reality, the camera has a tendency to round up the size, and pad the end of the image with some nulls, but this doesn’t seem to be a problem when displaying the image.

// Read single camera frame
int cam_capture_single(void)
{
    int tries = 1000, ret=0, n=0;
    
    cam_write_reg(4, 0x01);
    cam_write_reg(4, 0x02);
    while ((cam_read_reg(0x41) & 0x08) == 0 && tries)
    {
        usdelay(100);
        tries--;
    }
    if (tries)
        n = cam_read_fifo_len();
    if (n > 0 && n <= sizeof(cam_data))
    {
        cam_select();
        spi_read_blocking(CAM_SPI, 0x3c, cam_data, 1);
        spi_read_blocking(CAM_SPI, 0x3c, cam_data, n);
        cam_deselect();
        ret = n;
    }
    return (ret);
}

Reading the picture from the camera just requires the reading of a single dummy byte, then the whole block that represents the image; it is a complete JFIF-format picture, so no further processing needs to be done. If the browser has requested a single still image, we just send the whole block as-is to the client, with an HTTP header specifying “Content-Type: image/jpeg”

The following image was captured by the camera at 640 x 480 resolution:

MJPEG video

As previously mentioned, the Web server can stream video to the browser, in the form of a continuous flow of JPEG images. The requires a few special steps:

In the response header, the server defines the content-type as “multipart/x-mixed-replace”
To enable the browser to detect when one image ends, and another starts, we need a unique marker. This can be anything that isn’t likely to occur in the data stream; I’ve specified “boundary=mjpeg_boundary”
Before each image, the boundary marker must be sent, followed by the content-type (“image/jpeg”) and a blank line to mark the end of the header.

Timing

The timing will be quite variable, since it depends on the image complexity and network configuration, but here are the results of some measurements when fetching a single JPEG image over a small local network, using binary (not base64) mode:

Resolution (pixels)	Image capture time (ms)	Image size (kbyte)	TCP transfer time (ms)	TCP speed (kbyte/s)
320 x 240	153	10.2	4.4	2310
640 x 480	292	25.6	10.9	2350
1024 x 768	321	49.1	21.5	2285
1600 x 1200	420	97.3	42.4	2292

Web camera timings

The webcam code triggers an image capture, then after the data has been fetched into the CPU RAM buffer, it is sent to the network stack for transmission. There would be some improvement in the timings if the next image were fetched while the current image is being transmitted, however the improvement will be quite small, since the overall time is dominated by the time taken for the camera to capture and compress the image.

Using the Web camera

There is only one setting at the top of camera/cam_2640.h, namely the horizontal resolution:

// Horizontal resolution: 320, 640, 1024 or 1600 pixels
#define CAM_X_RES 640

Then the binary is built and the CPU is programmed in the usual way:

make web_cam
./prog web_cam

At boot-time the IP address will be reported on the serial console; use this to access the camera or video Web pages in a browser, e.g.

http://192.168.1.240/camera.jpg
http://192.168.1.240/video

It is important to note that a new image capture is triggered every time the Web page is accessed, so any attempt to simultaneously access the pages from more than one browser will fail. To allow simultaneous access by multiple clients, a double-buffering scheme needs to be implemented.

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

PicoWi part 9: TCP Web server

Transmission Control Protocol (TCP) is an important next step in the PioWi protocol stack; it opens the way to various network applications, such as the ubiquitous Web server.

In this post I’ll be introducing a fast Web server, that can be used for intensive data-transmission duties; in the next post it’ll be used to implement a Web camera with still- and video-image capabilities.

TCP

At first sight, TCP may look quite simple to implement; it adds reliability to the network transmissions by establishing a ‘connection’ between 2 systems, with each side tracking the other’s transmissions, and acknowledging receipt. However, there are various subtleties to the TCP protocol that make it very challenging to implement, namely:

Out-of-order arrival. Since there is no fixed path for the data blocks to move across the network, a newer block may arrive after an older one.
Data flow. A unit that sends data must regulate its flow so as to not overwhelm the receiver.
Disorderly shutdown. When the data transfer is complete, sender and receiver will attempt to shut down the connection in an orderly fashion, but sometimes this will fail, leaving a connection half-open.
Buffering. The data sender won’t know if its data has been received until an acknowledgement is received from the receiver, so it must buffer the data just in case it has to be resent.
Symmetry. Although one system ( the ‘client’) initiates communication with another (the ‘server’), once the connection is established it is completely symmetrical, with either side being able to send & receive the data, or terminate the connection.
Multiple connections. Servers are usually required to handle multiple simultaneous connections, from multiple clients.

It is well worth reading the TCP specification RFC9293; implementing a full-scale TCP stack is a complex task, so the initial focus of this post will be on a server that primarily sends data – it receives requests from clients, but is optimised for the sending of bulk data from server to client.

State machine

The behaviour of the TCP stack is controlled by a sate machine, that processes open/close/acknowledgement signals from the remote system, and open/close signals from a higher-level application, and decides what to do next. The signals from the remote system are in the form of binary flags, most notably:

SYN: open a connection (synchronise)
ACK: acknowledge a transmission
FIN: close a connection (finish)
RST: reject a transmission (reset)

Since the connection is symmetrical, both sides have to receive a SYN to make the connection, and a FIN to close that connection. Here is a sample transaction, showing how a Web client and server might transfer a small Web page:

The server has a permanently-open port (‘passive’ open) that is ready to accept incoming connection requests. The client application sets up the connection by sending a SYN to the server, which responds with SYN + ACK, then the client sends an ACK to confirm. Both sides now consider the connection to be ‘established’, and either side can start sending data. In the case of a Web browser, the client sends an HTTP-format request for a Web page; the details of that request will be explained later.

After the server acknowledges the request, it sends 2 data blocks as a response, which the client acknowledges using a single ACK. In this example I’ve then shown the client closing the connection by sending a FIN, which is acknowledged, then confirmed by the server sending a FIN, however in many cases there will be a more sizeable exchange of data, and the connection might be kept open for further requests and responses, to avoid the (very significant) overhead of opening and closing the connection.

TCP sequence number and window size

Both sides of the TCP connection need to keep track of the data sent & received; this is done with a ‘sequence number’, that essentially points to the current starting position of the data within a virtual data buffer, with 3 extra complications:

The SYN and FIN markers each count as 1 extra data byte.
The first transmission doesn’t have a sequence number of zero; a pseudo-random value is used, to reduce the likelihood of the current data blocks being confused with blocks that might be left over from a previous transaction.
The number is 32 bits wide. and wraps around when the maximum value is exceeded.

To avoid congestion, there must some way for a unit to signal how much buffer space it has left; this is done by the ‘window size’ parameter in the TCP message. This value isn’t necessarily a reflection of the actual space available, as there is a danger that a small value will cause a lot of small data blocks to be generated (which is very inefficient), rather than waiting for a good-sized space to be available.

Message format

The protocol header has source & destination port numbers (similar to UDP), also the sequence & acknowledgement numbers and window size, that are needed for error handling and flow control. The flags are a bit-field containing SYN, ACK, FIN, RST and other indications.

/* ***** TCP (Transmission Control Protocol) header ***** */
typedef struct tcph
{
    WORD  sport,            /* Source port */
          dport;            /* Destination port */
    DWORD seq,              /* Sequence number */
          ack;              /* Ack number */
    BYTE  hlen,             /* TCP header len (num of bytes << 2) */
          flags;            /* Option flags */
    WORD  window,           /* Flow control credit (num of bytes) */
          check,            /* Checksum */
          urgent;           /* Urgent data pointer */
} TCPHDR;

#define TCP_DATA_OFFSET (sizeof(ETHERHDR) + sizeof(IPHDR) + sizeof(TCPHDR))

#define TCP_FIN     0x01    /* Option flags: no more data */
#define TCP_SYN     0x02    /*           sync sequence nums */
#define TCP_RST     0x04    /*           reset connection */
#define TCP_PUSH    0x08    /*           push buffered data */
#define TCP_ACK     0x10    /*           acknowledgement */
#define TCP_URGE    0x20    /*           urgent */

The checksum is similar to UDP in that it includes a ‘pseudo-header’ with source & destination IP addresses.

/* ***** Pseudo-header for UDP or TCP checksum calculation ***** */
/* The integers must be in hi-lo byte order for checksum */
typedef struct              /* Pseudo-header... */
{
    IPADDR sip,             /* Source IP address */
          dip;              /* Destination IP address */
    BYTE  z,                /* Zero */
          pcol;             /* Protocol byte */
    WORD  len;              /* UDP length field */
} PHDR;

HTTP

TCP can be used to carry a wide variety of higher-level protocols, but a frequent choice is Hypertext Transfer Protocol (HTTP), that is used by a Web browser to request data from a Web server.

An HTTP request consists of:

A request line, specifying the method to be used, the resource to be accessed, and the HTTP version number. A query string may optionally be appended to the resource name, to provide additional requirements.
Optional HTTP headers, or header fields, specifying additional parameters
A blank line, marking the end of the header
A message body, if needed

The server responds with:

A status line, with a status code and reason phrase, indicating if the resource is available.
HTTP headers, or header fields, giving information about the resource, and the server that is providing it.
A blank line
A message body, containing the resource data

Some of the headers used in the Web server code:

#define HTTP_200_OK         "HTTP/1.1 200 OK\r\n"
#define HTTP_404_FAIL       "HTTP/1.1 404 Not Found\r\n"
#define HTTP_SERVER         "Server: picowi\r\n"
#define HTTP_NOCACHE        "Cache-Control: no-cache, no-store, must-revalidate\r\n"
#define HTTP_CONTENT_HTML   "Content-Type: text/html; charset=ISO-8859-1\r\n"
#define HTTP_CONTENT_JPEG   "Content-Type: image/jpeg\r\n"
#define HTTP_CONTENT_TEXT   "Content-Type: text/plain\r\n"
#define HTTP_CONTENT_BINARY "Content-Type: application/octet-stream\r\n"
#define HTTP_CONTENT_LENGTH "Content-Length: %d\r\n"
#define HTTP_ORIGIN_ANY     "Access-Control-Allow-Origin: *\r\n"
#define HTTP_MULTIPART      "Content-Type: multipart/x-mixed-replace; boundary=mjpeg_boundary\r\n"
#define HTTP_BOUNDARY       "\r\n--mjpeg_boundary\r\n"

Web browsers have a create tendency to store (‘cache’) and re-use Web pages, which is a major problem if we are trying to display ‘live’ data, so the NOCACHE header can be used to tell the browser not to cache the resource data.

A browser can handle a wide range of data formats, but only if it is informed which format has been used. The CONTENT headers clarify this, and are essential for displaying the data correctly.

A feature of modern Web browsers is that they block ‘cross-site scripting’ by default. This means that the browser can’t insert data from one server, whilst displaying a page from another. This is very important when dealing with high-security applications such as banking, to prevent a rogue site from impersonating a legitimate site by displaying portions of its pages. It also forces all the pages and data to be hosted on a single Web server, which can be a nuisance for embedded systems with limited capabilities; it is better to host the static Web pages on another site, so the embedded system just has to provide the sensor data to be displayed on those pages. The ORIGIN_ANY header enables this, by allowing the data to be used by any other Web site.

The MULTIPART definition is useful for defining a video stream, that consists of a sequence of still images. The video server I’m creating uses Motion JPEG (MJPEG) which is just a stream of JPEG images, so the browser needs an indication as to where one image ends, and the next begins. So MULTIPART specifies a (hopefully unique) marker that can be sent after each frame as a delimiter, that triggers the browser to display the last-received frame, and prepare to receive a new one. The end-result is that the still images are displayed as a continuous stream, emulating a conventional video file, albeit with a larger file-size, due to the absence of inter-frame compression.

Web server API

Programming a Web server in C can get quite complicated, especially when we’re not running a multi-tasking operating system. The usual model is for each connection to ‘block’ (i.e. stall) until data is available, but that isn’t feasible in a single-tasking system.

So instead I’ve created an event-oriented system, where a callback function is registered for each Web page:

web_page_handler("GET /test.txt", web_test_handler);
web_page_handler("GET /data.txt", web_data_handler);

These handler functions are only called if the relevant Web page is requested, so they don’t consume any resources until that happens.

If the page just has some simple static text, that is loaded into a buffer, and a socket closure is requested:

// Handler for test page
int web_test_handler(int sock, char *req, int oset)
{
    static int count = 1;
    int n = 0;
    
    if (req)
    {
        printf("TCP socket %d Rx %s\n", sock, req);
        sprintf(temps, "<html><pre>Test %u</pre></html>", count++);
        n = web_resp_add_str(sock,
            HTTP_200_OK HTTP_SERVER HTTP_NOCACHE HTTP_CONNECTION_CLOSE
            HTTP_CONTENT_HTML HTTP_HEADER_END) + 
            web_resp_add_str(sock, temps);
        tcp_sock_close(sock);
    }
    return (n);
}

The ‘req’ parameter is the browser text requesting the resource, which can be parsed to extract the parameter values from a query string.

The ‘oset’ parameter is used when the Web response doesn’t fit into a single response message. It tracks the current position within the data buffer, which normally is equal to the total amount of data so far, but under error conditions, it will step back to an earlier value. A typical usage is to use the value as an index into a data buffer, not forgetting the HTTP response header, which is at the start of the first data block. The following code returns a stream of blocks for each image, until all the image has been sent, which triggers a new multipart header and image capture:

#define TCP_MAXDATA   1400

// Handler for single camera image
int web_cam_handler(int sock, char *req, int oset)
{
    int n = 0, diff;
    static int startime = 0, hlen = 0, dlen = 0;
    
    if (req)
    {
        hlen = n = web_resp_add_str(sock,
            HTTP_200_OK HTTP_SERVER HTTP_NOCACHE HTTP_CONNECTION_CLOSE
            HTTP_CONTENT_JPEG HTTP_HEADER_END);
        dlen = cam_capture_single();
        n += web_resp_add_data(sock, cam_data, TCP_MAXDATA - n);
    }
    else
    {
        n = MIN(TCP_MAXDATA, dlen + hlen - oset);
        if (n > 0)
            web_resp_add_data(sock, &cam_data[oset - hlen], n);
        else
            tcp_sock_close(sock);
    }
    return (n);
}

Test Web server

web_server.c is a test program to demonstrates the ability of a Web server to return large amounts of data at high speed (over 20 megabits per second). It transfers dummy binary data in text format (base-64) or in raw binary.

It has the following Web pages:

status.txt

This is a simple status message with dummy parameters in JSON format, e.g.

{"state":0,"nsamp":0,"xsamp":10000,"xrate":100000}.

These values are taken from a structure:

typedef struct {
    char name[16];
    int val;
} SERVER_ARG;

SERVER_ARG server_args[] = {
    { "state", 0 },
    { "nsamp", 0 },
    { "xsamp", 10000 },
    { "xrate", 100000 },
    { "" }
};

This demonstrates how dynamic numeric values could be propagated from server to client.

data.txt

This demonstrates the transfer of a large binary block using base-64 encoding, which converts every 3 bytes into 4 ASCII characters. This technique is used when we want to avoid the complication of handling raw binary.

The microsecond timer on the Pico is used to record the start & ending times of the data transfer, so as to print the data rate on the Pico serial console.

data.bin

This transfers blocks of data in pure binary format, and the throughput rate is reported.

default

The default Web page just returns the string ‘test’ and a number that increments with every access, to show that the page isn’t being cached.

Running the Web server

By default, the server will try to connect to the default Wifi network (‘testnet’) so you will probably need to change the definition at the top of the server code to match your network name and password.

If you are using the Pi development environment described in the introduction, then compiling and running the Web server requires just 2 steps:

make web_server
./prog web_server

When it boots, the server will report its IP address on the serial console, e.g. 192. 168.1.240. Enter that address into a Web browser, to see the (really simple) default page, with a number that increments every time you re-fetch the page:

To see the raw JSON-format status page, access status.txt:

You can also view the start of the base-64 format data, though this isn’t very enlightening:

To demonstrate how the status & data information can be decoded and displayed, I have included an HTML file web/display.html, which is an adaptation of my ‘EDLA’ logic analyser code.

Before loading this file, you need to edit the IP address at the top to match your Pico server, and also select binary or base-64 mode for the transfer:

const remip = "192.168.1.240", bin_mode = true;

The resulting display shows a logic-analyser-style display of the incoming data, with the text from the status file underneath. The graphic is much too dense to be any use, but it does show how a large block of data can be transferred and displayed with remarkable speed.

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2023. Please credit this blog if you use the information or software in it.

PicoWi part 8: UDP server socket

So far I have used a large number of custom functions to configure and control the WiFi networking, but before adding yet more functionality, I need to offer a simpler (and more standard) way of doing all this programming.

When it comes to network programming on Linux or Windows systems, there is only one widely-used Application Programming Interface (API), namely Network Sockets, also known as Internet Sockets. A socket is an endpoint on the network, defined by an IP address & port number; in previous parts, I have used sockets to construct DHCP & DNS clients, and now will use them in a general-purpose programming interface.

UDP & TCP, Client & Server

To recap on the difference between User Datagram Protocol (UDP) & Transmission Control Protocol (TCP), also Client & Server:

UDP allows you to send blocks of data across the network, the maximum block size generally being around 1.5 kBytes. There is no synchronisation between sender & receiver, and no error handling, so there is no guarantee that the data will arrive at the destination.
TCP starts with an exchange of packets to synchronise the two endpoints, and thereafter the transfer is bidirectional and stream-orientated; it acts like a transparent pipe between the sockets, carrying arbitrary numbers of bytes from one socket to the other. When the transfer is complete, either side can close the connection, but the closure will only be complete when both sides acknowledge it.
A client initiates contact with a server, by sending UDP or TCP traffic to a well-known port on that system.
A server takes possession of (‘binds to’) a specific UDP or TCP port number, and runs continuously, waiting for a client to initiate contact.

I’m starting with the simplest of these options, namely a UDP server.

UDP server

To create a server socket, it is just necessary to issue a socket() call, followed by bind() to take possession of a specific port, by providing an initialised structure. The same code applies for both Linux, and my RP2040 socket API:

#define PORTNUM  8080    // Server port number
#define MAXLEN   1024    // Maximum data length

int server_sock; 
struct sockaddr_in server_addr, client_addr; 
    
server_sock = socket(AF_INET, SOCK_DGRAM, 0);
if (server_sock < 0)
{ 
    perror("socket creation failed"); 
    return(1); 
} 
memset(&server_addr, 0, sizeof(server_addr)); 
server_addr.sin_family    = AF_INET;
server_addr.sin_addr.s_addr = INADDR_ANY; 
server_addr.sin_port = htons(PORTNUM); 
if (bind(server_sock, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) 
{ 
    perror("bind failed"); 
    return(1); 
}

You may be surprised at the use of 2 structures, namely sockaddr_in, and sockaddr. The former is specific to Internet Protocol, whereas the latter is general-purpose, capable of being used with other network protocols. The structure has to be filled in with the address family (Internet Protocol) the desired IP address (zero for any), and the port number, byte-swapped to big-endian.

The code then enters an endless loop, waiting for a a datagram to be received, printing the data (on the assumption it is text), and returning a text string in response:

printf("UDP server on port %u\n", PORTNUM);
while (1)
{
    memset(&client_addr, 0, sizeof(client_addr)); 
    addrlen = sizeof(client_addr);
    n = recvfrom(server_sock, buffer, MAXLEN-1, 0, 
        (struct sockaddr *)&client_addr, &addrlen);
    if (n > 0)
    {
        buffer[n] = 0; 
        printf("Client %s: %s\n", inet_ntoa(client_addr.sin_addr), buffer); 
        n = sprintf(buffer, "Test %u\n", testnum++);
        sendto(server_sock, buffer, n, MSG_CONFIRM, 
            (struct sockaddr *)&client_addr, addrlen);
    }
}

It may seem strange to set ‘addrlen’ every time we go round the loop; this is for safety, as recvfrom() can potentially modify its value. Also, there is no guarantee that the incoming string is null-terminated, so a null is added at the end of the datagram, just in case.

Blocking

The above code ‘blocks’ on the receive function, that is to say it waits indefinitely until the data arrives, which is standard practice on Linux or Windows sockets – it is sensible when we have a multi-tasking operating system, but inconvenient if we are restricted to a single task, as the CPU can’t run any other code while waiting for incoming data.

There are various ways round this limitation, for example Linux has a ‘select’ function that allows you to poll the interface, checking if any incoming data is available. Alternatively, we can set a timeout on the socket read, for example 0.1 seconds (100,000 microseconds):

struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 100000;
setsockopt(server_sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

To demonstrate this functionality, we can add a useful feature whereby the LED is flashed every time an access to the server is made; if we set the timeout to 1000 microseconds, then the timeouts give us a convenient millisecond timer. The following code does a rapid flash of the LED until the unit joins the Wifi network, then a quick blink every 2 seconds, plus a flash every time the server is accessed.

sock_set_timeout(server_sock, 1000);
while (1)
{
    memset(&client_addr, 0, sizeof(client_addr)); 
    addrlen = sizeof(client_addr);
    n = recvfrom(server_sock, buffer, MAXLEN, 0, 
        (struct sockaddr *)&client_addr, &addrlen);
    if (n > 0)
    {
        buffer[n] = 0; 
        printf("Client %s: %s\n", inet_ntoa(client_addr.sin_addr), buffer); 
        n = sprintf(buffer, "Test %u\n", testnum++);
        sendto(server_sock, buffer, n, 0, (struct sockaddr *)&client_addr, addrlen);
        msec = 0;
    }
    else
    {
        msec++;
        if (msec == 1)
            wifi_set_led(1);
        else if (msec == 100)
            wifi_set_led(0);
        else if (msec == 200 && !link_check())
            msec = 0;
        else if (msec == 2000)
            msec = 0;
    }
}

Test program

See the introduction for details on how to rebuild and program the udp_socket_server application into the Pico.

The simplest way to test the server is using the netcat application, that can easily be installed on the Raspberry Pi using ‘apt install’; you just need to note the IP address of the Pico as reported on the serial console, and enter it into the netcat command line, e.g. for an address of 192.168.1.240:

echo -n "hello" | nc -4u -w1 192.168.1.240 8080

The server will respond with an incrementing test number, e.g.

Test 1

Alternatively, here is a simple Python client program:

# Simple UDP client example
import time, socket

addr = "192.168.1.240", 8080
data = 'test'

while True:  
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1)
    client.sendto(str.encode(data), addr)
    print("Tx: " + data);        
    try:
        resp, server = client.recvfrom(1024)
    except socket.timeout:
        resp = None
    print("Rx: " + "timeout" if not resp else resp.decode('utf-8'))
    time.sleep(2)
# EOF

Or in C:

// Simple UDP client for Linux

#include <stdio.h> 
#include <string.h> 
#include <sys/socket.h> 
#include <arpa/inet.h> 
#include <netinet/in.h> 
#include <unistd.h>

#define SERVER_ADDR     "192.168.1.240"
#define SERVER_PORT     8080
#define MAXLEN          1024 

void sock_set_timeout(int sock, int usec);

int main(int argc, char **argv)
{
    int sock, n;
    struct sockaddr_in server_addr;
    char txdata[MAXLEN] = "test", rxdata[MAXLEN];

    sock = socket(AF_INET, SOCK_DGRAM, 0);
    sock_set_timeout(sock, 100000);
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;
    server_addr.sin_port = htons(SERVER_PORT);
    while (1)
    {
        printf("Tx %s:%u: %s\n", SERVER_ADDR, SERVER_PORT, txdata);
        sendto(sock, txdata, strlen(txdata), 0,
              (struct sockaddr *)&server_addr, sizeof(server_addr));
        n = recvfrom(sock, rxdata, MAXLEN-1, 0, NULL, NULL);
        if (n < 0)
            printf("Rx timeout\n");
        else
        {
            rxdata[n] = 0;
            printf("Rx %s\n", rxdata);
        }
        sleep(2);
    }
    return(0);
}

// Set socket timeout
void sock_set_timeout(int sock, int usec)
{
    struct timeval tv;
    
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 7: DNS

The Domain Name System (DNS) is the top layer of the 3-layer addressing scheme, as described in the previous part of this blog. To recap:

The Domain Name System translates a Domain Name such as http://www.google.com into an Internet Protocol (IP) address, consisting of 4 decimal numbers 0-255 with ‘.’ separators, e.g. 142.250.187.228.

The Address Resolution Protocol (ARP) translates an IP address into a Medium Access and Control (MAC) address, consisting of 6 hexadecimal values with ‘:’ separators, e.g. 28:CD:C1:00:D9:20

The MAC address is used for all communication within a network, to identify a sender or intended recipient.

In the early days of the Internet there was a simple one-to-one relationship between a domain name and an IP address, so DNS just required a simple lookup on a remote database. Nowadays the name-to-address mapping is much more complex, but we can still take a simple approach, by providing DNS with a name, and it will return an IP address, or several IP addresses if there are some alternatives.

DNS specification

The specifications for DNS, and all other aspects of TCP/IP communication, are in the form of documents called Request For Comments (RFC); DNS is RFC-1034 and RFC-1035. These documents are freely available, and don’t just describe the protocol, but also give background information on the rationale for the design decisions that were made. RFCs are generally very clearly written, with a minimum of jargon, and are well worth a read.

A DNS lookup consists of a single UserDatagram Protocol (UDP) message sent to a server, and a single matching UDP response. UDP is an ‘unreliable’ protocol, so there is no guarantee that a response will be received; if not, the process can just be repeated.

DNS request

The request consists of a fixed-format header, plus variable-format data. The fixed section has just 6 2-byte values:

typedef struct
{
	WORD ident,   // Message identification number
        flags,    // Option flags
	    n_query,  // Number of queries
		n_ans,    // Number of answers
		n_auth,   // Number of authority records
		n_rr;     // Number of additional records
} DNS_HDR;

All values are in ‘network’ byte-order, so are most-significant-byte first. The meaning of the flags can best be described by looking at the decode of a typical request in Wireshark:

Domain Name System (query)
    Transaction ID: 0x1234
    Flags: 0x0100 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data: Unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0

This is followed by a variable-length Resource Record (RR) field with the domain name, then 16-bit name type & name class, which normally have a value of 1. Each part of the name is prefixed by a length byte, and the name is terminated by a zero length, e.g.

LEN w w w  LEN r a s p b e r r y p i  LEN o r g  NUL Type Class
03  777777 0b  7261737062657272797069 03  6f7267 00  0001 0001

The code to create the header and data is relatively simple:

// Format DNS request data, given name string
int dns_add_hdr_data(BYTE *buff, char *s) 
{
	BYTE *p, *q;
    DNS_HDR *dhp = (DNS_HDR *)buff;
    int len = sizeof(DNS_HDR);
    static int ident = 1;
    
    memset(dhp, 0, sizeof(DNS_HDR));
    dhp->ident = htons(ident++);
    dhp->flags = htons(0x100);  // Recursion desired
    dhp->n_query = htons(1);
    p = q = &buff[len];
	while (*s)	                // Prefix each part with length byte
	{
		p++;
		while (*s && *s != '.')
			*p++ = (BYTE)*s++;
		*q = (BYTE)(p - q - 1);
		q = p;
    	if (*s)
        	s++;
	}
    *p++ = 0;   // Null terminator
    *p++ = 0;	// Type A (host address)
    *p++ = 1;
    *p++ = 0;	// Class IN
    *p++ = 1;
	return (p - buff);
}

Transmitting this message is just a question of adding on the Ethernet, IP and UDP headers; this is done in a slightly strange order, since the UDP header must include a checksum calculated from the subsequent (DNS) data.

// Transmit DNS request
int dns_tx(MACADDR mac, IPADDR dip, WORD sport, char *s)
{
    char temps[300];
    int oset = 0;
	int len = ip_add_eth(txbuff, mac, my_mac, PCOL_IP);
	int dlen = dns_add_hdr_data(&txbuff[len + sizeof(IPHDR) + sizeof(UDPHDR)], s);
	
	len += ip_add_hdr(&txbuff[len], dip, PUDP, sizeof(UDPHDR) + dlen);
	len += udp_add_hdr_data(&txbuff[len], sport, DNS_SERVER_PORT, 0, dlen);
 	return (ip_tx_eth(txbuff, len));
}

DNS response

The response (if any) will arrive from the WiFi chip as an ‘event’, and it has to go through IP and UDP pre-processing before arriving at the DNS decoder, or any other protocol handler that matches the incoming data. I’ve already established an ‘add_event_handler’ mechanism for distributing events to various handlers, and am using a similar mechanism for distributing incoming UDP traffic, by giving the standard DNS port number:

#define DNS_SERVER_PORT	53
add_event_handler(udp_event_handler);
udp_sock_init(udp_dns_handler, zero_ip, 0, DNS_SERVER_PORT);

// Return number of DNS responses
int dns_num_resps(BYTE *buff, int len)
{
    DNS_HDR *dhp = (DNS_HDR *)&buff[sizeof(ETHERHDR) + sizeof(IPHDR) + sizeof(UDPHDR)];
    return (len > sizeof(ETHERHDR)+sizeof(IPHDR)+sizeof(UDPHDR)+sizeof(DNS_HDR) ?
        htons(dhp->n_ans) : 0);
}

// Handler for UDP DNS response
int udp_dns_handler(UDP_SOCKET *usp)
{    
    char temps[300];
    IPADDR addr;
    int oset = 0;
    
    if (display_mode & DISP_DNS)
    {
        printf("Rx %s: ", dns_hdr_str(temps, usp->data, usp->dlen));
        printf("%s\n", dns_name_str(temps, usp->data, usp->dlen, &oset, 0, 0));
        for (int n = 0; n < dns_num_resps(usp->data, usp->dlen); n++)
            printf("%s\n", dns_name_str(temps, usp->data, usp->dlen, &oset, 0, addr));
    }
    return (1);
}

The response handler just iterates through the responses and prints them out:

Tx DNS 1 query: www.raspberrypi.org type A
Tx UDP 192.168.1.139:1234->192.168.1.254:53 len 37
Rx UDP 192.168.1.254:53->192.168.1.139:1234 len 85
Rx DNS 1 query, 3 resp: www.raspberrypi.org type A
  www.raspberrypi.org type A 104.22.1.43
  www.raspberrypi.org type A 104.22.0.43
  www.raspberrypi.org type A 172.67.36.98

There are 3 responses, but the way they are structured is surprising; this is the binary data:

0040                                 03 77 77 77 0b 72   ...........www.r
0050   61 73 70 62 65 72 72 79 70 69 03 6f 72 67 00 00   aspberrypi.org..
0060   01 00 01 c0 0c 00 01 00 01 00 00 00 df 00 04 68   ...............h
0070   16 01 2b c0 0c 00 01 00 01 00 00 00 df 00 04 68   ..+............h
0080   16 00 2b c0 0c 00 01 00 01 00 00 00 df 00 04 ac   ..+.............
0090   43 24 62                                          C$b

The first entry has the same format as the request, with the domain name http://www.raspberrypi.org having its delimiters replaced by length-bytes. However the subsequent responses have the length byte replaced by the value 0xc0, followed by a value of 0x0c. This 16-bit value is the result of a compression scheme; the two most-significant bits are set to indicate a compressed entry, and the remaining 14 bits are an offset value pointing at the duplicate text (calculated from the start of the DNS header).

The domain name (or compression pointer) is followed by 16-bit type & class words (usually both 1), a 32-bit time-to-live value, then the IP address length and the 4 address bytes.

This makes the decoder quite complex; typically the item most of interest is the IP address, but it is necessary to decode the 3 parts of the name (or the compression pointer) first; see the function dns_name_str() for my version of this.

Misaligned data values

A major issue that arose while debugging the message decoder is the fact that the 16- and 32-bit values in the response may not be aligned on a 2- or 4-byte boundary. This is an issue that is relatively unique to protocol decoding on small embedded systems, but does have a really bad outcome (crashing the CPU) so some explanation is in order. Here is a test program, with a simplified version of the DNS response, just a null-terminated string followed by a 4-byte IP address:

#include <stdio.h>
#include <string.h>

char data[] = {'a', 'b', 'c', 0, 0x11, 0x22, 0x33, 0x44};

typedef unsigned int IPADDR;

int main(int argc, char *argv[])
{
    char *p = &data[strlen(data) + 1];
    IPADDR addr = *(IPADDR *)p;
    printf("%s %X\n", data, addr);
}

When this program is run on the Pico board, or a little-endian Linux system, the result is printed out as expected:

abc 44332211

The program can now be changed so the string is 1 character longer:

char data[] = {'a', 'b', 'c', 'd', 0, 0x11, 0x22, 0x33, 0x44};

The Linux system works as expected, printing out:

abcd 44332211

However the program fails on the Pico, and prints nothing out. If run under a debugger, a ‘hard fault’ trap is reported, with the call stack showing that the problem occurs when the address value is set. This is because the address is no longer on a 4-byte boundary, and the simpler RP2040 processor can’t handle this misaligned transfer, whereas the PC processor can. So it is quite easy to write some code that works fine on a PC, and sometimes fails on the Pico; for example, my early DNS tests used the domain name ‘pool.ntp.org’ and when the response is obtained, all the 16-bit values happen to all be on 2-byte boundaries so the code worked fine; switching to ‘www.raspberrypi.org’ these values became misaligned, so the code failed.

There are various workarounds for this problem, the simplest being not to use casts; my newer code defines the IP address as a byte array, which can be copied byte-by-byte to avoid any misalignment issues. It isn’t sensible to use this approach on 16-bit port values, but these are generally in a byte array and need to be swapped from from big-endian to little-endian, so we can use a byte pointer, e.g.

BYTE *buff;
WORD val = htonsp(buff);

// Convert byte-order in a 'short' variable, given byte pointer
WORD htonsp(BYTE *p)
{
    WORD w = (WORD)(*p++) << 8;
    return(w | *p);
}

Test program

The ‘dns’ program joins a network using the name ‘testname’ and password ‘testpass’ that need to be changed. It uses DHCP to fetch an IP address, and the addresses of a router and DNS server. ARP is then used to resolve the server IP address to a MAC address, then that is used to contact the nameserver, asking to resolve the name http://www.raspberrypi.org, and printing the result.

For more information on compiling and running the code, see the introduction.

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 6: DHCP

In part 5, we joined a WiFi network, and used ‘ping’ to contact another unit on that network, but this was achieved by setting the IP address manually, which is generally known as using a ‘static’ IP.

The alternative is to use a ‘dynamic’ IP, that a central server (such as the WiFi Access Point) allocates from a pool of available addresses, using Dynamic Host Configuration Protocol (DHCP); this also provides other information such as a netmask & router address, to allow our unit to communicate with the wider Internet.

IP addresses and routing

So far, I’ve just said that an IP address consists of 4 bytes, that are usually expressed as decimal values with dotted notation, e.g. 192.168.1.2, but there is some extra complication.

Firstly it is important to note I’m using version 4 of the protocol (IPv4); there is a newer version (IPv6) with a much wider address range, but the older version is sufficient for our purposes, and easier to implement.

Next it is important to distinguish between a public and private IP address.

Public: an address that is accessible from the Internet, generally assigned by an Internet Service Provider (ISP)
Private: an address used locally within an organisation, that is not unique; generally assigned from the blocks 192.168.x.x, 172.16.x.x or 10.x.x.x

The address we’ll be getting from the DHCP server is probably private; if we are accessing the Internet, there will be one or more network devices (‘routers’) that perform public-to-private translation, and also security functions (‘firewalls’) to block malicious data.

If our unit has an IP address it wishes to contact, how does it know what to do? It just has to determine if the target address is local or remote by applying a netmask. For example if our unit is given the address 192.168.1.1 with netmask 255.255.255.0, then a logical AND of the two values means that our local network (known as a ‘subnet’) is 192.168.1. If the unit we’re contacting is on that subnet (i.e. the address begins with 192.168.1) then we just send out a local ARP request to convert their IP address into a MAC address, and start communicating.

If the target address isn’t on the same subnet (e.g. 192.168.2.1, 11.22.33.44, or anything else) then our unit contacts a router (using the address given in the DHCP response) and relies on the router to forward the data appropriately.

In the diagram above, there are networks with public addresses 11.22.33.44 and 22.33.44.55, and they both have private addresses in 192.168.1.x subnetworks; the job of the router is to move the data between these subnetworks by performing Network Address Translation (NAT) between them.

If unit 192.168.1.3 wants to contact 22.33.44.55 it will check the netmask, and because the target isn’t on the same subnetwork, the data will be sent to the router 192.168.1.1, which will forward it over the Internet.

If 192.168.1.3 wants to contact 192.168.1.2, ANDing with the netmask will show that they are both on the same subnet, so the data will be sent directly, bypassing using the router.

However, if 192.168.1.3 wants to send the data to 192.168.1.1 on the remote network, how does the router know what to do? The simple answer is “it doesn’t”, as addresses on the 192.168.1.x subnet aren’t unique, and there will be thousands (or millions!) of units with that same address around the world. Also the netmask clearly indicates that 192.168.1.1 must be on the same subnet as 192.168.1.3, so the data will be sent locally to 192.168.1.1, whether it exists or not; if it doesn’t exist, that’ll be flagged up by the ARP request failing.

There are various workarounds for this ‘NAT traversal’ problem, for example 192.168.1.3 sends the data to the router 22.33.44.55, which is configured to copy incoming data to 192.168.1.1, but there are major security risks associated with opening up a system to unfiltered Internet traffic, so for the purposes of this blog, I’m assuming that our unit will only be communicating with other units on the same subnetwork, or publicly-available systems on the Internet.

The above example assumes there is a single router for all outgoing traffic, and this is generally the case on a WiFi network, where the Access Point also acts as a router. However, on more complex networks there can be multiple routers to provide alternative routes to other networks or the Internet.

Client and server

The most common model for communication between two systems is client-server. The server runs continuously, waiting for a client to get in contact. The client uses a specific communications format (a ‘protocol’) to establish a link (‘connection’) to the server. The connection persists for as long as is needed to exchange the data, then it is closed by both sides.

Simpler protocols can dispense with the connection, but still retain the client-server model; for example, to fetch the time with Network Time Protocol (NTP) you just send a single message to a time server, and get a single message back with the time. This ‘connectionless’ approach means that a single ‘stateless’ server can handle very large numbers of clients, since it doesn’t have to track the state of its clients; an incoming request has all the information needed to send the response.

UDP message format

So there are two distinct ways for a client to communicate with a server; one creates a persistent connection, with both sides tracking the flow of data, and re-sending any data that is lost in transit: this is Transmission Control Protocol (TCP). The other way is User Datagram Protocol (UDP), which has no such tracking, or error correction; just send a block of data and hope it arrives.

This uncertainty means that, if faced with a choice, many programmers reject UDP as being too unreliable, however it does have a very important place in the suite of TCP/IP protocols, not least because it is used for DHCP.

A DHCP transmission consists of the following:

Ethernet header
IP header
UDP header
DHCP header
DHCP option data

We’ve already used the Ethernet and IP headers when sending an ICMP (ping) message, this time we’re stacking on a UDP header.

/* ***** UDP (User Datagram Protocol) header ***** */
typedef struct udph
{
    WORD  sport,            /* Source port */
          dport,            /* Destination port */
          len,              /* Length of datagram + this header */
          check;            /* Checksum of data, header + pseudoheader */
} UDPHDR;

There is a 16-bit length, which shows the total length of the header plus any data that follows, and a 16-bit checksum, which is calculated in an unusual manner; it incorporates the UDP header, parts of the IP header, and all the data that follows. The way this is calculated is to create a pseudo-header containing the relevant IP parts:

/* ***** Pseudo-header for UDP or TCP checksum calculation ***** */
/* The integers must be in hi-lo byte order for checksum */
typedef struct              /* Pseudo-header... */
{
    IPADDR sip,             /* Source IP address */
          dip;              /* Destination IP address */
    BYTE  z,                /* Zero */
          pcol;             /* Protocol byte */
    WORD  len;              /* UDP length field */
} PHDR;

So the UDP code has to prepare two headers, though the pseudo-header is only used for checksum calculation, and can be discarded after that is done.

// Add UDP header to buffer, return byte count
int ip_add_udp(BYTE *buff, WORD sport, WORD dport, void *data, int dlen)
{
    UDPHDR *udp=(UDPHDR *)buff;
    IPHDR *ip=(IPHDR *)(buff-sizeof(IPHDR));
    WORD len=sizeof(UDPHDR), check;
    PHDR ph;

    udp->sport = htons(sport);
    udp->dport = htons(dport);
    udp->len = htons(sizeof(UDPHDR) + dlen);
    udp->check = 0;
    len += ip_add_data(&buff[sizeof(UDPHDR)], data, dlen);
    check = add_csum(0, udp, len);
    IP_CPY(ph.sip, ip->sip);
    IP_CPY(ph.dip, ip->dip);
    ph.z = 0;
    ph.pcol = PUDP;
    ph.len = udp->len;
    udp->check = 0xffff ^ add_csum(check, &ph, sizeof(PHDR));
    return(len);
}

Port numbers

Another notable feature of the UDP header is the source & destination port numbers, and these deserve some explanation.

A port number can identify a specific service on a server; for example port 80 identifies an HTTP web server, and 67 is a DHCP server. These are ‘well-known’ port numbers and are in the range 0 to 1023. Ports numbered 1024 to 49151 are also used for specific server functionality that isn’t part of the original set, so are known as ‘registered’. The remaining numbers 49152 to 65535 are ‘dynamic’ ports, that are used temporarily by client applications.

When a client wishes to communicate with a server, it will obtain a dynamic port from its operating system, and use that port for the duration of a transaction, releasing it when the transaction is complete. In contrast, a server will generally monopolise a well-known or registered port on a permanent basis, though some servers additionally open up a dynamic port on a short-term basis to handle a specific interaction with the client, such as a file transfer.

Unusually, the DHCP server & client are both assigned well-known numbers, namely UDP 67 and 68. You may see these identified as BOOTP ports, since DHCP is based on the older BOOTP protocol, with some additions.

DHCP message format

DHCP is a 4-step process:

Discover: the unit broadcasts a request asking for network parameters, such as an IP address it can use, also a router address, and subnet mask.
Offer: the server responds with some proposed values, that the unit can accept or reject.
Request: the unit signifies its acceptance of the proposed values
ACK: the server acknowledges the request, indicating that the parameters have been assigned to the unit.

Once the parameters have been assigned, the server will generally attempt to keep them unchanged, such that every time the unit boots, it will get the same IP address. However, this is not guaranteed, and a busy server with a lot of temporary clients will be forced to re-use addresses from units that haven’t been active for a while.

The message format is based on the older protocol BOOTP:

typedef struct {
  	BYTE  opcode;   			/* Message opcode/type. */
	BYTE  htype;				/* Hardware addr type (net/if_types.h). */
	BYTE  hlen;					/* Hardware addr length. */
	BYTE  hops;					/* Number of relay agent hops from client. */
	DWORD trans;				/* Transaction ID. */
	WORD secs;					/* Seconds since client started looking. */
	WORD flags;					/* Flag bits. */
	IPADDR ciaddr,				/* Client IP address (if already in use). */
           yiaddr,				/* Client IP address. */
           siaddr,				/* Server IP address */
           giaddr;				/* Relay agent IP address. */
	BYTE chaddr [16];		    /* Client hardware address. */
	char sname[SNAME_LEN];	    /* Server name. */
	char bname[BOOTF_LEN];		/* Boot filename. */
	BYTE cookie[DHCP_COOKIE_LEN];   /* Magic cookie */
} DHCPHDR;

When making the initial discovery request, many of these values are unused; the ‘cookie’ is filled in with a specific 4-byte value (99, 130, 83, 99) that signal this is a DHCP request, not BOOTP. Then there is a data field with ‘option’ values; each entry has one byte indicating the option type, one byte indicating data length, and that number of data bytes. The options I use in the discovery request are a byte value of 1, indicating it is a discovery message, and 4 parameter values, indicating what should be provided by the server (1 for subnet mask, 3 for router address, 6 for nameserver address and 15 for network name).

// DHCP message options
typedef struct {
    BYTE typ1, len1, opt;
    BYTE typ2, len2, data[4];
    BYTE end;
} DHCP_MSG_OPTS;

// DHCP discover options
DHCP_MSG_OPTS dhcp_disco_opts = 
   {53, 1, 1,               // Msg len 1 type 1: discover
    55, 4, {1, 3, 6, 15},   // Param len 4: mask, router, DNS, name
    255};                   // End

The resulting offer from the server probably includes much more than we asked for; this is what my server returns:

    Option: (53) DHCP Message Type (Offer)
    Option: (54) DHCP Server Identifier (192.168.1.254)
    Option: (51) IP Address Lease Time (7 days)
    Option: (58) Renewal Time Value (3 days, 12 hours)
    Option: (59) Rebinding Time Value (6 days, 3 hours)
    Option: (1) Subnet Mask (255.255.255.0)
    Option: (28) Broadcast Address (192.168.1.255)
    Option: (15) Domain Name ("home")
    Option: (6) Domain Name Server (192.168.1.254)
    Option: (3) Router (192.168.1.254)
    Option: (255) End

You’ll see that the Access Point 192.168.1.254 is acting as a router and nameserver; we’ll be looking at the Domain Name System (DNS) in the next part of this blog.

If the unit wants to accept these proposed settings, it must send a request containing the proposed IP address. This can have the same format as the discovery, with a byte value of 3, indicating it is a request message, and a the 4-byte address value:

// DHCP request options
DHCP_MSG_OPTS dhcp_req_opts = 
   {53, 1, 3,               // Msg len 1 type 3: request
    50, 4, {0, 0, 0, 0},    // Address len 4 (copied from offer)
    255};                   // End

Assuming all is OK, the ACK response from the server will be similar to the offer, maybe with more values added (such as vendor-specific information), so an important part of the receiver code is the scanning of the parameters to find the values that are needed.

State machine

If we were in a multi-tasking environment, the DHCP process might basically consist of a sequence of 4 function calls, each function stopping (‘blocking’) until it is complete:

send_discovery()
receive_offer()
send_request()
receive_ack()

Since we don’t currently have multi-tasking, we can’t adopt this approach, as it would block any other code from running, and in the event of an error, one of these functions might stall indefinitely. Instead, we have to adopt a ‘polled’ approach, where we keep on re-visiting this process to see what (if anything) has changed. The key to this is to have a single ‘state’ variable that reflects what has happened, e.g. it has a value of 1 when we have sent the discovery, 2 when we have received an offer, and so on.

// Poll DHCP state machine
void dhcp_poll(void)
{
    static uint32_t dhcp_ticks=0;
    
    if (dhcp_state == 0 ||              // Send DHCP Discover
       (dhcp_state != DHCPT_ACK && ustimeout(&dhcp_ticks, DHCP_TIMEOUT)))
    {
        ustimeout(&dhcp_ticks, 0);
        IP_ZERO(my_ip);
        ip_tx_dhcp(bcast_mac, bcast_ip, DHCP_REQUEST, 
                   &dhcp_disco_opts, sizeof(dhcp_disco_opts));
        dhcp_state = DHCPT_DISCOVER;
    }
    else if (dhcp_state == DHCPT_OFFER) // Received Offer, send Request
    {
        ustimeout(&dhcp_ticks, 0);
        IP_CPY(dhcp_req_opts.data, offered_ip);
        ip_tx_dhcp(host_mac, bcast_ip, DHCP_REQUEST, 
                   &dhcp_req_opts, sizeof(dhcp_req_opts));
        dhcp_state = DHCPT_REQUEST;
    }
}

The polling of the DHCP state also incorporates a timeout, that is triggered in the event of an error; with a simple 4-step protocol like this, we can just restart the process from the beginning, rather than trying to work out where the error occurred.

Example program

There is one example program dhcp.c that fetches IP addresses and netmask from a DHCP server, and prints the result:

Joining network
Joined network
Tx DHCP DISCOVER
Rx DHCP OFFER 192.168.1.240
Tx DHCP REQUEST
Rx DHCP OFFER 192.168.1.240
Rx DHCP OFFER 192.168.1.240
Rx DHCP ACK 192.168.1.240 mask 255.255.255.0 router 192.168.1.254 DNS 192.168.1.254
DHCP complete, IP address 192.168.1.240 router 192.168.1.254
192.168.1.254->192.168.1.240 ARP request
192.168.1.240->192.168.1.254 ARP response

The display mode is set to include DHCP:

set_display_mode(DISP_INFO|DISP_JOIN|DISP_ARP|DISP_DHCP);

This allows you to see the message-passing; it isn’t unusual to receive duplicate messages, and in the DHCP OFFER above. The ARP display is also enabled so you can see the router using ARP to check the newly-assigned address.

It will be necessary to change the default SSID and PASSWD to match your network; for details on how to build & load the application, see the introduction.

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 5: ARP, IP and ICMP

In part 4, the wireless chip was connected to a WiFi network, so it can now send & receive data on that network, but we still have to encode the data for transmission, and decode it for reception.

We’re using a ‘full MAC’ chip, so all the low-level WiFi interfacing is handled within the chip. When transmitting, it encrypts our data, and adds the necessary 802.11 headers so that it will accepted by the network access point; when receiving, the headers are stripped off and the data is decrypted before being passed over to the Pico CPU.

This doesn’t just make our encoding & decoding tasks easier, it also ensures that the transmissions fully conform to the (exceedingly complex) 802.11 rules; if your interest is in creating non-standard wireless transmissions, then I’m afraid this project will be of no help.

TCP/IP

The suite of protocols used for data transmission over the Internet are generally known as Transmission Control Protocol / Internet Protocol, or TCP/IP. We’ll only be using a small subset of these protocols, and the initial task is just to handle Address Resolution Protocol (ARP) and Internet Control Message Protocol (ICMP). This will allow us to send & receive diagnostic ‘ping’ messages, and do some simple benchmarks by communicating with another system.

TCP/IP uses a three-tier addressing system; at the highest level, there are names with dotted notation, such as iosoft.blog or http://www.google.com. To access the computer at this address, two further steps are required:

a Domain Name System (DNS) database lookup is used to convert the name into an Internet Protocol (IP) address, which has 4 numeric values in dotted notation, for example 192.168.5.1
an Address Resolution Protocol (ARP) message is sent out on the network, with a request to convert the remote unit’s IP address into a Media Access and Control (MAC) address, which has 6 bytes, that are normally printed with a colon separator, e.g. 28:CD:C1:00:12:34

The first of these will be tackled in the next part of this project; for now, I’m assuming that the unit has obtained an IP address from somewhere, and knows the IP address of another unit it wishes to communicate with, for example the WiFi access point.

Address Resolution Protocol (ARP)

This is probably the simplest of all TCP/IP protocols; the unit broadcasts a request in a specific format, giving the IP address it wants to contact, and if any unit on the same ‘subnet’ has that address, then it will respond with its 6-byte MAC address. That is used for outgoing messages, but for incoming messages our unit must listen out for ARP broadcasts, and if a request matches its IP address, it should respond with the MAC address.

The ARP message format can be encapsulated within a C structure:

typedef unsigned char  BYTE;
typedef unsigned short WORD;
typedef unsigned int   DWORD;
typedef BYTE MACADDR[MACLEN];
typedef unsigned int IPADDR;

/* ***** ARP (Address Resolution Protocol) packet ***** */
typedef struct
{
    WORD hrd,           /* Hardware type */
         pro;           /* Protocol type */
    BYTE  hln,          /* Len of h/ware addr (6) */
          pln;          /* Len of IP addr (4) */
    WORD op;            /* ARP opcode */
    MACADDR  smac;      /* Source MAC addr */
    IPADDR   sip;       /* Source IP addr */
    MACADDR  dmac;      /* Destination MAC addr */
    IPADDR   dip;       /* Destination IP addr */
} ARPKT;

This is the first of many C structures for TCP/IP, and I’ve chosen to define 8, 16 and 32-bit values as BYTE, WORD and DWORD for clarity.

To broadcast this message, we need to add on Ethernet header, giving a source MAC address (the MAC address of our unit, as reported by the WiFi chip) the destination MAC address (broadcast, which is all-ones, i.e. FF:FF:FF:FF:FF:FF) and a protocol ID, which indicates that we’re sending an ARP packet.

/* Ethernet (DIX) header */
typedef struct {
    MACADDR dest;               /* Destination MAC address */
    MACADDR srce;               /* Source MAC address */
    WORD    ptype;              /* Protocol type or length */
} ETHERHDR;
#define PCOL_ARP    0x0806      /* Protocol type: ARP */
#define PCOL_IP     0x0800      /*                IP */

There are a lot of similarities between the higher level of wired (Ethernet) and wireless (802.11) protocols, so it makes sense that both use the same network address structure.

Creating an ARP request is really just a fill-in-the-blanks exercise:

#define HTYPE       0x0001  /* Hardware type: ethernet */
#define ARPPRO      0x0800  /* Protocol type: IP */
#define ARPREQ      0x0001  /* ARP request */
#define ARPRESP     0x0002  /* ARP response */

// Add Ethernet header to buffer, return byte count
WORD ip_add_eth(BYTE *buff, MACADDR dmac, MACADDR smac, WORD pcol)
{
    ETHERHDR *ehp = (ETHERHDR *)buff;

    MAC_CPY(ehp->dest, dmac);
    MAC_CPY(ehp->srce, smac);
    ehp->ptype = htons(pcol);
    return(sizeof(ETHERHDR));
}

// Create an ARP frame, return length
int ip_make_arp(BYTE *buff, MACADDR mac, IPADDR addr, WORD op)
{
    int n = ip_add_eth(buff, op==ARPREQ ? bcast_mac : mac, my_mac, PCOL_ARP);
    ARPKT *arp = (ARPKT *)&buff[n];

    MAC_CPY(arp->smac, my_mac);
    MAC_CPY(arp->dmac, op==ARPREQ ? bcast_mac : mac);
    arp->hrd = htons(HTYPE);
    arp->pro = htons(ARPPRO);
    arp->hln = MACLEN;
    arp->pln = sizeof(DWORD);
    arp->op  = htons(op);
    arp->dip = addr;
    arp->sip = my_ip;
    if (display_mode & DISP_ARP)
        ip_print_arp(arp);
    return(n + sizeof(ARPKT));
}

// Convert byte-order in a 'short' variable
WORD htons(WORD w)
{
    return(w<<8 | w>>8);
}

All network data is in big-endian format (most-significant byte first), but the RP2040 processor is little-endian, so the 16-bit values need to be byte-swapped.

To transmit the message, all that is needed is to add on the SDPCM layer for the WiFi chip, and copy it into an outgoing message buffer:

// Transmit an ARP frame
int ip_tx_arp(MACADDR mac, IPADDR addr, WORD op)
{
    int n = ip_make_arp(txbuff, mac, addr, op);
    
    return(ip_tx_eth(txbuff, n));
 }

// Send transmit data
int ip_tx_eth(BYTE *buff, int len)
{
    return(event_net_tx(buff, len));
}
// Transmit network data
int event_net_tx(void *data, int len)
{
    TX_MSG *txp = &tx_msg;
    int txlen = sizeof(SDPCM_HDR)+2+sizeof(BDC_HDR)+len;
    
    display(DISP_DATA, "Tx_DATA len %d\n", len);
    disp_bytes(DISP_DATA, data, len);
    display(DISP_DATA, "\n");
    txp->sdpcm.len = txlen;
    txp->sdpcm.notlen = ~txp->sdpcm.len;
    txp->sdpcm.seq = sd_tx_seq++;
    memcpy(txp->data, data, len);
    if (!wifi_reg_val_wait(10, SD_FUNC_BUS, SPI_STATUS_REG, 
            SPI_STATUS_F2_RX_READY, SPI_STATUS_F2_RX_READY, 4))
        return(0);
    return(wifi_data_write(SD_FUNC_RAD, 0, (uint8_t *)txp, (txlen+3)&~3));
}

The transmit data length is rounded up to the nearest 4 bytes, as the WiFi DMA controller only works handles complete 4-byte words.

ARP reception

An incoming message will arrive as an ‘event’ from the WiFi chip, and a handler function first checks that it is valid:

// Handler for incoming ARP frame
int arp_event_handler(EVENT_INFO *eip)
{
    ETHERHDR *ehp=(ETHERHDR *)eip->data;

    if (eip->chan == SDPCM_CHAN_DATA &&
        eip->dlen >= sizeof(ETHERHDR)+sizeof(ARPKT) &&
        htons(ehp->ptype) == PCOL_ARP &&
        (MAC_IS_BCAST(ehp->dest) ||
         MAC_CMP(ehp->dest, my_mac)))
    {
        return(ip_rx_arp(eip->data, eip->dlen));
    }
    return(0);
}

If the incoming message is an ARP request, then the receiver function transmits an appropriate response. If it is a response, the resulting MAC address is saved, for use in future transmissions:

// Receive incoming ARP data
int ip_rx_arp(BYTE *data, int dlen)
{
    ETHERHDR *ehp=(ETHERHDR *)data;
    ARPKT *arp = (ARPKT *)&data[sizeof(ETHERHDR)];
    WORD op = htons(arp->op);

    if (arp->dip == my_ip)
    {
        if (op == ARPREQ)
            ip_tx_arp(ehp->srce, arp->sip, ARPRESP);
        else if (op == ARPRESP)
            ip_save_arp(arp->smac, arp->sip);
        return(1);
    }
    return(0);
}

Ping

Having obtained the 6-byte MAC address of a unit we wish to communicate with, what can we send to it? The obvious choice is a diagnostic ‘ping’, that echoes back the data we send, and measures the round-trip time.

Ping uses the Internet Control Message Protocol (ICMP), with an IP header for the address information:

/* ***** ICMP (Internet Control Message Protocol) header ***** */
typedef struct
{
    BYTE  type,         /* Message type */
          code;         /* Message code */
    WORD  check,        /* Checksum */
          ident,        /* Identifier */
          seq;          /* Sequence number */
} ICMPHDR;
#define ICREQ           8   /* Message type: echo request */
#define ICREP           0   /*               echo reply */

/* ***** IP (Internet Protocol) header ***** */
typedef struct
{
    BYTE   vhl,         /* Version and header len */
           service;     /* Quality of IP service */
    WORD   len,         /* Total len of IP datagram */
           ident,       /* Identification value */
           frags;       /* Flags & fragment offset */
    BYTE   ttl,         /* Time to live */
           pcol;        /* Protocol used in data area */
    WORD   check;       /* Header checksum */
    IPADDR sip,         /* IP source addr */
           dip;         /* IP dest addr */
} IPHDR;
#define PICMP   1           /* Protocol type: ICMP */
#define PTCP    6           /*                TCP */
#define PUDP   17           /*                UDP */

Creating an ICMP request largely consists of filling in the values within these structures, and adding some arbitrary data on the end, but there are some issues to bear in mind:

As with ARP, all values are big-endian (most significant byte first) so byte-swaps are needed
Potentially the IP message (known as a ‘datagram’) may travel very long distances, with a large number of ‘hops’ between computers, and each of these hops will have a maximum data size it can accommodate, which is known as a Maximum Transmission Unit (MTU). To allow a large datagram to be sent across a link with a smaller MTU, there is a technique called ‘IP fragmentation’, whereby the transmission is chopped up into smaller parts, and the parts are reassembled at the receiving end. For simplicity, we won’t initially support fragmentation, which means we have an MTU of around 1.5K bytes.
There is a checksum across the IP header and ICMP data, and this is calculated using a method that performs identically on big-endian and little-endian processors.

/* Calculate TCP-style checksum, add to old value */
WORD add_csum(WORD sum, void *dp, int count)
{
    WORD n=count>>1, *p=(WORD *)dp, last=sum;

    while (n--)
    {
        sum += *p++;
        if (sum < last)
            sum++;
        last = sum;
    }
    if (count & 1)
        sum += *p & 0x00ff;
    if (sum < last)
        sum++;
    return(sum);
}

Ping reception

If the unit has received a unicast ICMP request, then it should return a response to the sender that basically copies everything in the request, but with the source & destination addresses swapped, and the message type changed from request to reply. Theoretically, the ICMP checksum needs to be re-computed, but as it is just a sum of 16-bit words, it isn’t affected by the address swap. So we can just re-use the existing checksum, adjusted for the change from the request value of 8 to the response value of 0:

// Receive incoming ICMP data
int ip_rx_icmp(BYTE *data, int dlen)
{
    ETHERHDR *ehp=(ETHERHDR *)data;
    IPHDR *ip = (IPHDR *)&data[sizeof(ETHERHDR)];
    ICMPHDR *icmp = (ICMPHDR *)&data[sizeof(ETHERHDR)+sizeof(IPHDR)];
    int n;

    if (display_mode & DISP_ICMP)
        ip_print_icmp(ip);
    if (icmp->type == ICREQ)
    {
        ip_add_eth(data, ehp->srce, my_mac, PCOL_IP);
        ip->dip = ip->sip;
        ip->sip = my_ip;
        icmp->check = add_csum(icmp->check, &icmp->type, 1);
        icmp->type = ICREP;
        n = htons(ip->len);
        return(ip_tx_eth(data, sizeof(ETHERHDR)+n+sizeof(ICMPHDR)));
    }
    else if (icmp->type == ICREP)
    {
        ping_rx_time = ustime();
    }
    return(0);
}

Example program: ping.c

This program generates pins every 2 seconds, and responds to incoming ping requests. It uses a hard-coded IP addresses, for itself and the target of the outgoing pings:

IPADDR myip   = IPADDR_VAL(192,168,1,165);
IPADDR hostip = IPADDR_VAL(192,168,1,1);

‘myip’ should be set to a suitable unused IP address on your subnet (e.g. 182.168.1 in the above example); you can check if an address is unused by pinging it.

‘hostip’ should be set to the address of another unit on the network that can accept pings, or the address of the WiFi Access Point.

You’ll also need to set the name & password for the WiFi network you are using:

// The hard-coded password is for test purposes only!!!
#define SSID                "testnet"
#define PASSWD              "testpass"

See the PicoWi introduction for a description of the build process, and the connection of a serial console to see the diagnostic messages.

The LED will flash rapidly for a few seconds until the device is connected to the network; it will then switch on when a ping is sent, and off when it is received; on my network, the ping time is generally quite short, so only a brief flash is visible if everything is working correctly.

Ping times

On an Ethernet network, it is usual to see fast & repeatable values for the ping round-trip time. However wireless networks aren’t as predictable, since all units that are on the same radio channels will be competing for air-time, not just with your network, but any other networks within range.

So the response time will vary, depending on the activity of any networks sharing the same WiFi channels; here is a a typical example of 20 pings, using the time reported on the Pico console:

A ping time of 1.2 milliseconds is quite respectable, considering that a Pi 4 on the same network takes a minimum of 1.9 ms.

The graph was plotted using GNUplot; if you want to replicate it, the console output is captured to pings.txt, then pre-processed using awk:

awk -F [=\ ] '/time/ { print $(NF-1) }' pings.txt > pings.csv

This script should also work with the console output of Linux pings. The result is then fed to GNUplot; the command-line has been split into 4 for clarity:

gnuplot -e "set term png size 420,240 font 'sans,8'; \
  set title 'Ping time'; set grid; set key noautotitle; \
  set ylabel 'Time (ms)' offset 2; set output 'pings.png'; \
  plot 'pings.csv' with boxes"

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 4: scan and join a network

By the end of part 3, the WiFi chip was up & running, and as a simple test of WiFi operation, we’ll next scan the neighbourhood for WiFi networks, then attempt to join a network.

Scanning a network

As a quick check of wireless functionality, it can be useful to scan for WiFi networks within range. Before starting that, we need to send some IOCTL commands to configure various parameters, such as the network band.

The main problem with IOCTL calls is their sheer variety, that might require data in a specific format, or maybe no data at all. I haven’t been able to find a document that describes them, the only publicly-available documentation seems to be the source code . So when developing, it is quite possible to use the wrong IOCTL command, or send it the wrong data, and we need a way of reporting the error, without adding a lot of print function calls.

All my IOCTL functions return 0 if there wasn’t a reply, and -1 if the response indicated an error, so we can just chain commands using the short-circuit AND functionality to ensure execution will stop when an error occurs, and print the last IOCTL command that was executed:

#define IOCTL_WAIT  30 // Time to wait for ioctl response (msec)

const EVT_STR escan_evts[] = {EVT(WLC_E_ESCAN_RESULT), EVT(WLC_E_SET_SSID), EVT(-1)};

// Start a network scan
int scan_start(void)
{
    int ret;
    
    events_enable(escan_evts);
    ret = ioctl_wr_int32(WLC_SET_SCAN_CHANNEL_TIME, IOCTL_WAIT, SCAN_CHAN_TIME) > 0 &&
        ioctl_set_uint32("pm2_sleep_ret", IOCTL_WAIT, 0xc8) > 0 &&
        ioctl_set_uint32("bcn_li_bcn", IOCTL_WAIT, 1) > 0 &&
        ioctl_set_uint32("bcn_li_dtim", IOCTL_WAIT, 1) > 0 &&
        ioctl_set_uint32("assoc_listen", IOCTL_WAIT, 0x0a) > 0 &&
        ioctl_wr_int32(WLC_SET_BAND, IOCTL_WAIT, WIFI_BAND_ANY) > 0 &&
        ioctl_wr_int32(WLC_UP, IOCTL_WAIT, 0) > 0;
    ioctl_err_display(ret);
    return(ret);
}

// Display last IOCTL if error
void ioctl_err_display(int retval)
{
    IOCTL_MSG *msgp = &ioctl_txmsg;
    IOCTL_HDR *iohp = (IOCTL_HDR *)&msgp->data[msgp->cmd.sdpcm.hdrlen];
    char *cmds = iohp->cmd==WLC_GET_VAR ? "GET" : 
                 iohp->cmd==WLC_SET_VAR ? "SET" : "";
    char *data, *name;
    
    if (retval <= 0)
    {
        data = (char *)&msgp->data[msgp->cmd.sdpcm.hdrlen+sizeof(IOCTL_HDR)];
        name = iohp->cmd==WLC_GET_VAR || iohp->cmd==WLC_SET_VAR ? data : "";
        printf("IOCTL error: cmd %lu %s %s\n", iohp->cmd, cmds, name);
    }
}

We can check the code functioning by forcing an error, e.g. temporarily reducing the timeout value for a command such as ‘bcn_li_dtim’ to zero, in which case the code reports the following which, although somewhat terse, does indicate the source of the problem:

IOCTL error: cmd 263 SET bcn_li_dtim

To start a scan, we need one more IOCTL, with an data structure that sets some more parameters:

#define SSID_MAXLEN         32
#define SCANTYPE_ACTIVE     0
#define SCANTYPE_PASSIVE    1

#pragma pack(1)
typedef struct {
    uint32_t version;
    uint16_t action,
             sync_id;
    uint32_t ssidlen;
    uint8_t  ssid[SSID_MAXLEN],
             bssid[6],
             bss_type,
             scan_type;
    uint32_t nprobes,
             active_time,
             passive_time,
             home_time;
    uint16_t nchans,
             nssids;
    uint8_t  chans[14][2],
             ssids[1][SSID_MAXLEN];
} SCAN_PARAMS;

SCAN_PARAMS scan_params = {
    .version=1, .action=1, .sync_id=0x1, .ssidlen=0, .ssid={0},
    .bssid={0xff,0xff,0xff,0xff,0xff,0xff}, .bss_type=2,
    .scan_type=SCANTYPE_PASSIVE, .nprobes=~0, .active_time=~0,
    .passive_time=~0, .home_time=~0, .nchans=0
};

ioctl_set_data("escan", IOCTL_WAIT, &scan_params, sizeof(scan_params));

After that command is sent, we should receive several responses in the form of events; at least one from each WiFi network in range. The scan event handler has to byte-swap any 16 or 32-bit values, since they are in ‘network’ byte-order (big-endian); the handler function was described in the previous part of this blog.

It isn’t unusual for the same network to be reported more than once, e.g.

8C:59:73:xx:xx:xx 'Post_Office' chan 3
E8:65:D4:xx:xx:xx 'Court Hotel' chan 1
8C:59:73:xx:xx:xx 'Post_Office' chan 3
00:11:22:xx:xx:xx 'Virginia' chan 6
20:B0:01:xx:xx:xx 'vodafone' chan 6
6A:A2:22:xx:xx:xx '[hidden]' chan 6
..and so on..

In the tests I have done, the total time from power-up to receiving the last scan entry is around 2.1 seconds, which is surprisingly fast, considering how much chip-initialisation has been required.

Joining a network

This requires a large number of IOCTL commands to set up the WiFi interface, and there is little point in my listing all of them here, so I’m concentrating on specific settings of interest.

Country: this is required in order to set domain-specific parameters. I’m taking the easy way out, and specifying a country code of ‘XX’, which is a common set of world-wide characteristics.
Multicast: there is one MAC address set to 01:00:5E:00:00:FB which is the standard for IP v4
Power saving: this is disabled by default, but can be compiled in if required, though it does significantly increase WiFi response times, as the device will sleep when idle, and takes some time to wake up & respond.
Authentication: this uses a WPA2 pre-shared key, stored in plaintext, which is a major weakness in network security.
Network name: the SSID is also stored as plaintext.

Once the network join has been initiated, we receive a stream of events to show progress. These can be viewed by calling set_display_mode with DISP_EVENT. A typical joining sequence might be:

Join secure network:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  ..then Rx_DATA for broadcast/multicast network traffic..

Automatic reassociation after joining a network:
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 14
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
 ..then Rx DATA flow continues..

Join open network (no security):
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  ..then Rx_DATA for broadcast/multicast network traffic..

SSID not found:
  Rx_EVT   0 SET_SSID,      flags 0, status 3, reason 0

Password incorrect:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 8, reason 15
  Rx_EVT  46 PSK_SUP,       flags 0, status 8, reason 14
  ..then the same sequence repeated..

The ‘status’ values are common to all the events:

0: success
3: no networks
6: unsolicited
8: partial

The ‘reason’ values are specific to an event, for example in PSK_SUP, 14 means that a de-authentication request has been received, and 15 indicates that a timeout of the pre-shared key handshake has occurred.

Also there is no guarantee that the events will arrive in this order; for example, when I tested on a different Access Point, the last 3 events were PSK_SUP, JOIN, and SET_SSID.

I have also tested the responses to network events:

Orderly shutdown of WiFi at the access point:
  Rx_EVT  12 DISASSOC_IND,  flags 0, status 0, reason 8
  Rx_EVT   3 AUTH,          flags 0, status 5, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT  16 LINK,          flags 0, status 0, reason 2

Restore WiFi after orderly shutdown:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  ..then the data flow resumes..

Power-down of the access point:
  Rx_EVT  16 LINK,          flags 0, status 0, reason 1

Restore power to the access point:
  Rx_EVT  16 LINK,          flags 0, status 0, reason 1
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  ..then the data flow resumes..

Network unavailable on startup:
  Rx_EVT   0 SET_SSID,      flags 0, status 3, reason 0

Network becomes available after startup:
  Nothing!

Try to join a secure network, using no security 
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0

So the good news is that the WiFi chip can automatically reconnect to the network under some circumstances, but the bad news is that it will not always reconnect, and I can find no single event showing if the device is connected or not. Rather than attempting to decode the events in detail, I’ve used an overall timeout for joining a network (default 10 seconds); if that fails there is a rest period (currently also 10 seconds) before the next re-connection attempt.

Example programs

There are two examples; see the introduction for details of how to re-build and run the code.

scan.c does a single scan, and returns a list of networks found. The result returned by the WiFi chip is displayed as-is, so may contain duplicates.

join.c joins a given network, reporting on progress; the network name and password must be entered in the source code:

#define SSID                "testnet"
#define PASSWD              "testpass"

The on-board LED flashes at 5 Hz prior to connection, and at 1 Hz when connected.

In the next part I’ll start using TCP/IP protocols.

Project links
Introduction	Project overview
Part 1	Low-level interface; hardware & software
Part 2	Initialisation; CYW43xxx chip setup
Part 3	IOCTLs and events; driver communication
Part 4	Scan and join a network; WPA security
Part 5	ARP, IP and ICMP; IP addressing, and ping
Part 6	DHCP; fetching IP configuration from server
Part 7	DNS; domain name lookup
Part 8	UDP server socket
Part 9	TCP Web server
Part 10	Web camera
Source code	Full C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.