WiFi – Page 2

EDLA part 2: firmware for remote logic analyser

This is the second part of a 3-part blog post describing a low-cost WiFi-based logic analyser, that can be used for monitoring equipment in remote or hazardous locations. Part 1 described the hardware, this post now describes the firmware within the logic analyser unit.

Development environment

There are two main development environments for the ESP32 processor; ESP-IDF and Arduino-compatible. The former is much more comprehensive, but a lot of those features aren’t needed, so to save time, I have used the latter.

There are two ways of developing Arduino code; using the original Arduino IDE, or using Microsoft Visual Studio Code (VS Code) with a build system called PlatformIO. I originally tried to support both, but found the Arduino IDE too restrictive, so opted for VS Code and PlatformIO.

Installing this on Windows is remarkably easy, see these posts on PlatformIO installation or PlatformIO development

Then it is just necessary to open a directory containing the project files, and after a suitable pause while the necessary files are downloaded, the source files can be compiled, and the resulting binary downloaded onto the ESP32 module.

The code has two main areas: driving the custom hardware that captures the samples, and the network interface.

Hardware driver

As described in the previous post, the main hardware elements driven by the CPU are:

16-bit data bus for the RAM chips and the comparator outputs
Clock & chip select for RAM chips
SPI interface for the DAC that sets the threshold

Data bus

The sample memory consists of four 23LC1024 serial RAM chips, each storing 1 Mbit in quad-SPI (4-bit) mode. They are arranged to form a 16-bit data bus; it would be really convenient if this could be assigned to 16 consecutive I/O bits on the CPU, but the ESP32 hardware does not permit this. The assignment is:

Data line 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
GPIO      4  5 12 13 14 15 16 17 18 19 21 22 23 25 26 27

There is an obvious requirement to handle the data bus as a single 16-bit value within the code, so it is necessary to provide functions that convert that 16-bit data into a 32-bit value to be fed to the I/O pins, and vice-versa, and it’d be helpful if this was done in an easy-to-understand manner, to simplify any changes when a new CPU is used that has a different pin assignment.

After having tried the usual mess of shift-and-mask operations, I hit upon the idea of creating a bitfield for each group of consecutive GPIO pins, and a matching bitfield for the same group in the 16-bit word; then it is only necessary to equate each field to its partner, to produce the required conversion.

// Data bus pin definitions
// z-variables are unused pins
typedef struct {
    uint32_t z1:4, d0_1:2, z2:6, d2_9:8, z3:1, d10_12:3, z4:1, d13_15:3;
} BUSPINS;
typedef union {
    uint32_t val;
    BUSPINS pins;
} BUSPINVAL;

// Matching elements in 16-bit word
typedef struct {
    uint32_t d0_1:2, d2_9:8, d10_12:3, d13_15:3;
} BUSWORD;
typedef union {
    uint16_t val;
    BUSWORD bits;
} BUSWORDVAL;

// Return 32-bit bus I/O value, given 16-bit word
inline uint32_t word_busval(uint16_t val) {
    BUSWORDVAL w = { .val = val };
    BUSPINVAL  p = { .pins = { 0, w.bits.d0_1,   0, w.bits.d2_9,
                               0, w.bits.d10_12, 0, w.bits.d13_15 } };
    return (p.val);
}

// Return 16-bit word, given 32-bit bus I/O value
inline uint16_t bus_wordval(uint32_t val) {
    BUSPINVAL  p = { .val = val };
    BUSWORDVAL w = { .bits = { p.pins.d0_1, p.pins.d2_9, 
                               p.pins.d10_12, p.pins.d13_15 } };
    return (w.val);
}

An additional complication is that the 16-bit value is going to 4 RAM chips, and each chip needs to receive the same command, and the bit-pattern of that command changes depending on whether the chip is in SPI or quad-SPI (QSPI, also known as SQI) mode. So the command to send a command to all 4 RAM chips in SPI mode is:

#define RAM_SPI_DOUT    1
#define MSK_SPI_DOUT    (1 << RAM_SPI_DIN)
#define ALL_RAM_WORD(b) ((b) | (b)<<4 | (b)<<8 | (b)<<12)
uint32_t spi_dout_pins = word_busval(ALL_RAM_WORD(MSK_SPI_DOUT));

// Send byte command to all RAMs using SPI
// Toggles SPI clock at around 7 MHz
void bus_send_spi_cmd(byte *cmd, int len) {
    GPIO.out_w1ts = spi_hold_pins;
    while (len--) {
        byte b = *cmd++;
        for (int n = 0; n < 8; n++) {
            if (b & 0x80) GPIO.out_w1ts = spi_dout_pins;
            else GPIO.out_w1tc = spi_dout_pins;
            SET_SCK;
            b <<= 1;
            CLR_SCK;
        }
    }
}

I have used a ‘bit-bashing’ technique (i.e. manually driving the I/O pins high or low) since I’m emulating 4 SPI transfers in parallel, and as you can see from the comment, the end-result is reasonably fast.

When the RAMS are in QSPI mode, instead of doing eight single-bit transfers, we must do two four-bit transfers:

// Send a single command to all RAMs using QSPI
void bus_send_qspi_cmd(byte *cmd, int len) {
    while (len--) {
        uint32_t b1=*cmd>>4, b2=*cmd&15;
        uint32_t val=word_busval(ALL_RAM_WORD(b1));
        gpio_out_bus(val);
        SET_SCK;
        val = word_busval(ALL_RAM_WORD(b2));
        CLR_SCK;
        gpio_out_bus(val);
        SET_SCK;
        cmd++;
        CLR_SCK;
    }
}

The above code assumes that the appropriate I/O pin-directions (input or output) have been set, but that too depends on which mode the RAMs are in; for SPI each RAM chip has 2 data inputs (DIN and HOLD) and 1 output (DOUT), whilst in QSPI mode all 4 RAM data pins are inputs or outputs depending on whether the RAM is being written to, or read from.

There are 4 commands that the software sends to the RAM chips, each is a single byte:

0x38: enter quad-SPI (QSPI) mode
0xff: leave QPSI mode, enter SPI mode
0x02: write data
0x03: read data

The read & write commands are followed by a 3-byte address value, that dictates the starting-point for the transfer. So if the RAMs are already in QSPI mode, the sequence for capturing samples is:

Set bus pins as outputs, so bus is controlled by CPU
Assert RAM chip select
Send command byte, with a value of 2 (write)
Send 3 address bytes (all zero when starting data capture)
Set bus pins as inputs, so bus is controlled by comparators
Start RAM clock
When capture is complete, stop RAM clock
Negate RAM chip select

The steps for recovering the captured data are:

Set bus pins as outputs, so bus is controlled by CPU
Assert RAM chip select
Send command byte, with a value of 3 (read)
Send 3 address bytes
Set bus pins as inputs, so bus is controlled by the RAM chips
Toggle clock line, and read data from the 16-bit bus
When readout is complete, negate RAM chip select

RAM clock and chip select

When the CPU is directly accessing the RAM chips (to send commands, or read back data samples) it is most convenient to ‘bit-bash’ the clock and I/O signals, as described above. It is possible that incoming interrupts can cause temporary pauses in the clock transitions, but this doesn’t matter: the RAM chips use ‘static’ memory, which won’t change its state even if there is a very long pause in a transfer cycle.

However, when capturing data, it is very important that the RAMs receive a steady clock at the required sample rate, with no interruptions. This is easily achieved on the ESP32 by using the LED PWM peripheral:

#define PIN_SCK         33
#define PWM_CHAN        0

// Initialise PWM output
void pwm_init(int pin, int freq) {
    ledcSetup(PWM_CHAN, freq, 1);
    ledcAttachPin(pin, PWM_CHAN);
}

// Start PWM output
void pwm_start(void) {
    ledcWrite(PWM_CHAN, 1);
}
// Stop PWM output
void pwm_stop(void) {
    ledcWrite(PWM_CHAN, 0);
}

In addition, the CPU must count the number of pulses that have been output, so that it knows which memory address is currently being written – there is no way to interrogate the RAM chip to establish its current address value. Surprisingly, the ESP32 doesn’t have a general-purpose 32-bit counter, so we have to use the 16-bit pulse-count peripheral instead, and detect overflows in order to produce a 32-bit value.

volatile uint16_t pcnt_hi_word;

// Handler for PCNT interrupt
void IRAM_ATTR pcnt_handler(void *x) {
    uint32_t intr_status = PCNT.int_st.val;
    if (intr_status) {
        pcnt_hi_word++;
        PCNT.int_clr.val = intr_status;
    }
}

// Initialise PWM pulse counter
void pcnt_init(int pin) {
    pcnt_intr_disable(PCNT_UNIT);
    pcnt_config_t pcfg = { pin, PCNT_PIN_NOT_USED, PCNT_MODE_KEEP, PCNT_MODE_KEEP,
        PCNT_COUNT_INC, PCNT_COUNT_DIS, 0, 0, PCNT_UNIT, PCNT_CHAN };
    pcnt_unit_config(&pcfg);
    pcnt_counter_pause(PCNT_UNIT);
    pcnt_event_enable(PCNT_UNIT, PCNT_EVT_THRES_0);
    pcnt_set_event_value(PCNT_UNIT, PCNT_EVT_THRES_0, 0);
    pcnt_isr_register(pcnt_handler, 0, 0, 0);
    pcnt_intr_enable(PCNT_UNIT);
    pcnt_counter_pause(PCNT_UNIT);
    pcnt_counter_clear(PCNT_UNIT);
    pcnt_counter_resume(PCNT_UNIT);
    pcnt_hi_word = 0;
}

// Return sample counter value (mem addr * 2), extended to 32 bits
uint32_t pcnt_val32(void) {
    uint16_t hi = pcnt_hi_word, lo = PCNT.cnt_unit[PCNT_UNIT].cnt_val;
    if (hi != pcnt_hi_word)
        lo = PCNT.cnt_unit[PCNT_UNIT].cnt_val;
    return(((uint32_t)hi<<16) | lo);
}

When writing this code, I came across some strange features of the PCNT interrupt, such as multiple interrupts for a single event, and misleading values when reading the count value inside the interrupt handler, so be careful when doing any modifications.

The pulse count does not equal the RAM address; is the RAM address multiplied by 2. This is because it takes two 4-bit write cycles to create one byte in RAM (bits 4-7, then 0-3), so the memory chip increments its RAM address once for every 2 samples.

All the RAMs share a single clock line and chip select; the select line is driven low at the start of a command, and must remain low for the duration of the command and data transfer; when it goes high, the transfer is terminated.

Setting threshold value

The comparators compare the incoming signal with a threshold value, to determine if the value is 1 or 0 (above or below threshold). The threshold is derived from a digital-to-analog converter (DAC), the part I’ve chosen is the Microchip MCP4921; it was necessary to use a part with an SPI interface, since there is only 1 spare output pin, which serves as the chip select for this device; the clock and data pins are shared with the RAM chips.

This means that the DAC control code can use the same drivers as the RAM chips by negating the RAM chip select, and asserting the DAC chip select:

#define PIN_DAC_CS      2
#define DAC_SELECT      GPIO.out_w1tc = 1<<PIN_DAC_CS
#define DAC_DESELECT    GPIO.out_w1ts = 1<<PIN_DAC_CS

// Output voltage from DAC; Vout = Vref * n / 4096
void dac_out(int mv) {
    uint16_t w = 0x7000 + ((mv * 4096) / 3300);
    byte cmd[2] = { (byte)(w >> 8), (byte)(w & 0xff) };
    RAM_DESELECT;
    DAC_SELECT;
    bus_send_spi_cmd(cmd, 2);
    DAC_DESELECT;
}

Triggering

Triggering is achieved by using the ESP32 pin-change interrupt, as this can capture quite a narrow pulses. There will be a delay before the interrupt is serviced, which means that we don’t get an accurate indication of which sample caused the trigger, but that isn’t a problem in practice.

int trigchan, trigflag;

// Handler for trigger interrupt
void IRAM_ATTR trig_handler(void) {
    if (!trigflag) {
        trigsamp = pcnt_val32();
        trigflag = 1;
    }
}

// Enable or disable the trigger interrupt for channels 1 to 16
void set_trig(bool en) {
    int chan=server_args[ARG_TRIGCHAN].val, mode=server_args[ARG_TRIGMODE].val;
    if (trigchan) {
        detachInterrupt(busbit_pin(trigchan-1));
        trigchan = 0;
    }
    if (en && chan && mode) {
        attachInterrupt(busbit_pin(chan-1), trig_handler, 
            mode==TRIG_FALLING ? FALLING : RISING);
        trigchan = chan;
    }
    trigflag = 0;
}

This interrupt handler sets a flag, that is actioned by the main state machine. There is a ‘trig_pos’ parameter that sets how many tenths of the data should be displayed prior to triggering; it is normally set to 1, which means that (approximately) 1 tenth will be displayed before the trigger, and 9 tenths after.

It is possible that there may be a considerable delay before the trigger event is encountered. In this case, the unit continues to capture samples, and the RAM address counter will wrap around every time it reaches the maximum value. This means that the pre-trigger data won’t necessarily begin at address zero; the firmware has to fetch the trigger RAM address, then jump backwards to find the start of the data.

State machine

This handles the whole capture process. There are 6 states:

Idle: no data, and not capturing data
Ready: data has been captured, ready to be uploaded
Preload: capturing data, before looking for trigger
PreTrig: capturing data, looking for trigger
PostTrig: capturing data after trigger
Upload: transferring data over the network

The Preload state is needed to ensure there is some data prior to the trigger. If triggering is disabled, then as soon as the capture is started, the software goes directly to the PostTrig state, checking the sample count to detect when it is greater than the requested number.

// Check progress of capture, return non-zero if complete
bool web_check_cap(void) {
    uint32_t nsamp = pcnt_val32(), xsamp = server_args[ARG_XSAMP].val;
    uint32_t presamp = (xsamp/10) * server_args[ARG_TRIGPOS].val;
    STATE_VALS state = (STATE_VALS)server_args[ARG_STATE].val;
    server_args[ARG_NSAMP].val = nsamp;
    if (state == STATE_PRELOAD) {
        if (nsamp > presamp)
            set_state(STATE_PRETRIG);
    }
    else if (state == STATE_PRETRIG) {
        if (trigflag) {
            startsamp = trigsamp - presamp;
            set_state(STATE_POSTTRIG);
        }
    }
    else if (state == STATE_POSTTRIG) {
        if (nsamp-startsamp > xsamp) {
            cap_end();
            set_state(STATE_READY);
            return(true);
        }
    }
    return (false);
}

Network interface

A detailed description of network operation will be found in part 3 of this project; for now, it is sufficient to say that the unit acts as a wireless client, connecting to a pre-defined WiFi access point; it has a simple Web server with all requests & responses using HTTP.

Wireless connection

The first step is to join a wireless network, using a predefined network name (‘SSID’) and password. The code must also try to re-establish the link to he Access Point if the connection fails, so there is a polling function that checks for connectivity.

// Begin WiFi connection
void net_start(void) {
    DEBUG.print("Connecting to ");
    DEBUG.println(ssid);
    WiFi.begin(ssid, password);
    WiFi.setSleep(false);
}

// Check network is connected
bool net_check(void) {
    static int lastat=0;
    int stat = WiFi.status();
    if (stat != lastat) {
        if (stat<=WL_DISCONNECTED) {
            DEBUG. printf("WiFi status: %s\r\n", wifi_states[stat]);
            lastat = stat;
        }
        if (stat == WL_DISCONNECTED)
            WiFi.reconnect();
    }
    return(stat == WL_CONNECTED);
}

Web server

The Web pages are very simple and only contain data; the HTML layout and Javascript code to display the data is fetched from a different server.

The server is initialised with callbacks for three pages:

#define STATUS_PAGENAME "/status.txt"
#define DATA_PAGENAME   "/data.txt"
#define HTTP_PORT       80

WebServer server(HTTP_PORT);

// Check if WiFi & Web server is ready
bool net_ready(void) {
    bool ok = (WiFi.status() == WL_CONNECTED);
    if (ok) {
        DEBUG.print("Connected, IP ");
        DEBUG.println(WiFi.localIP());
        server.enableCORS();
        server.on("/", web_root_page);
        server.on(STATUS_PAGENAME, web_status_page);
        server.on(DATA_PAGENAME, web_data_page);
        server.onNotFound(web_notfound);
        DEBUG.print("HTTP server on port ");
        DEBUG.println(HTTP_PORT);
        delay(100);
    }
    return (ok);
}

The root page returns a simple text string, and is mainly used to check that the Web server is functioning:

#define HEADER_NOCACHE  "Cache-Control", "no-cache, no-store, must-revalidate"

// Return root Web page
void web_root_page(void) {
    server.sendHeader(HEADER_NOCACHE);
    sprintf((char *)txbuff, "%s, attenuator %u:1", version, THRESH_SCALE);
    server.send(200, "text/plain", (char *)txbuff);
}

All the Web pages are sent with a header that disables browser caching; this is necessary to ensure that the most up-to-date data is displayed.

The status page returns a JSON (Javascript Object Notation) formatted string, containing the current settings; a typical response might be:

{"state":1,"nsamp":10010,"xsamp":10000,"xrate":100000,"thresh":10,"trig_chan":0,"trig_mode":0,"trig_pos":1}

This indicates that 10000 samples were requested at 100 KS/s, 10010 were actually collected, using a threshold of 10 volts. The ‘state’ value of 1 indicates that data collection is complete, and the data is ready to be uploaded.

The individual arguments are stored in an array of structures, which is converted into the JSON string:

typedef struct {
    char name[16];
    int val;
} SERVER_ARG;

SERVER_ARG server_args[] = {
    {"state",       STATE_IDLE},
    {"nsamp",       0},
    {"xsamp",       10000},
    {"xrate",       100000},
    {"thresh",      THRESH_DEFAULT},
    {"trig_chan",   0},
    {"trig_mode",   0},
    {"trig_pos",    1},
    {""}
};

// Return server status as json string
int web_json_status(char *buff, int maxlen) {
    SERVER_ARG *arg = server_args;
    int n=sprintf(buff, "{");
    while (arg->name[0] && n<maxlen-20) {
        n += sprintf(&buff[n], "%s\"%s\":%d", n>2?",":"", arg->name, arg->val);
        arg++;
    }
    return(n += sprintf(&buff[n], "}"));
}

The HTTP request for the status page can also include a query string with parameters that reflect the values the user has entered in a Web form. If a ‘cmd’ parameter is included, it is interpreted as a command; the following query includes ‘cmd=1’, which starts a new capture:

GET /status.txt?unit=1&thresh=10&xsamp=10000&xrate=100000&trig_mode=0&trig_chan=0&zoom=1&cmd=1

The software matches the parameters with those in the server_args array, and stores the values in that array; unmatched parameters (such as the zoom level) are ignored.

// Return status Web page
void web_status_page(void) {
    web_set_args();
    web_do_command();
    web_json_status((char *)txbuff, TXBUFF_LEN);
    server.sendHeader(HEADER_NOCACHE);
    server.setContentLength(CONTENT_LENGTH_UNKNOWN);
    server.send(200, "application/json");
    server.sendContent((char *)txbuff);
    server.sendContent("");
}

// Get command from incoming Web request
int web_get_cmd(void) {
    for (int i=0; i<server.args(); i++) {
        if (!strcmp(server.argName(i).c_str(), "cmd"))
            return(atoi(server.arg(i).c_str()));
    }
    return(0);
}

// Get arguments from incoming Web request
void web_set_args(void) {
    for (int i=0; i<server.args(); i++) {
        int val = atoi(server.arg(i).c_str());
        web_set_arg(server.argName(i).c_str(), val);
    }
}

Data transfer

The captured data is transferred using an HTTP GET request to the page data.txt. The binary data is encoded using the base64 method, which converts 3 bytes into 4 ASCII characters, so it can be sent as a text block. There is insufficient RAM in the ESP32 to store the sample data, so it is transferred on-the-fly from the RAM chips to a network buffer.

// Return data Web page
void web_data_page(void) {
    web_set_args();
    web_do_command();
    server.sendHeader(HEADER_NOCACHE);
    server.setContentLength(CONTENT_LENGTH_UNKNOWN);
    server.send(200, "text/plain");
    cap_read_start(startsamp);
    int count=0, nsamp=server_args[ARG_XSAMP].val;
    size_t outlen = 0;
    while (count < nsamp) {
        size_t n = min(nsamp - count, TXBUFF_NSAMP);
        cap_read_block(txbuff, n);
        byte *enc = base64_encode((byte *)txbuff, n * 2, &outlen);
        count += n;
        server.sendContent((char *)enc);
        free(enc);
    }
    server.sendContent("");
    cap_read_end();
}

The ‘unknown’ content length means that the software can send an arbitrary number of text blocks, without having to specify the total length in advance. The transfer is terminated by calling sendContent with a null string.

Diagnostics

There is a single red LED, but due to pin constraints, it is shared with the RAM chip select. So it will always illuminate when the RAM is being accessed, but in addition:

Rapid flashing (5 Hz) if the unit is not connected to the WiFi network
Brief flash (100 ms every 2 seconds) when the unit is connected to the network.
Solid on when the unit is capturing data, and is waiting for a trigger, or until the required amount of data has been collected.

There is also the ESP32 USB interface that emulates a serial console at 115 Kbaud:

#define DEBUG_BAUD  115200
#define DEBUG       Serial      // Debug on USB serial link

DEBUG.begin(DEBUG_BAUD);

// 'print' 'println' and 'printf' functions are supported, e.g.
DEBUG.print("Connecting to ");
DEBUG.println(ssid);

To view the console display, you can use your favourite terminal emulator (e.g. TeraTerm on Windows) connected to the USB serial port, however you will have to break that connection every time you re-program the ESP32, since it is needed for re-flashing the firmware. The VS Code IDE does have its own terminal emulator, which generally auto-disconnects for re-programming, but I have had occasional problems with this feature, for reasons that are a bit unclear.

Modifications

There are a few compile-time options that need to be set before compiling the source code:

SW_VERSION (in main.cpp): a string indicating the current software version number
ssid & password (in esp32_web.cpp): must be changed to match your wireless network
THRESH_SCALE (in esp32_la.h): the scaling factor for the threshold value, that is used to program the DAC.

The threshold scaling will depend on the values of the attenuator resistors. The unit was originally designed for input voltages up to 50V, with a possible overload to 250V, so the input attenuation was 101 (100K series resistor, 1K shunt resistor). If using the unit with, say, 5 volt logic, then the series resistor will need to be much lower (and maybe the shunt resistance a bit higher) so the threshold scaling value will need to be adjusted accordingly. Since the threshold value sent from the browser is an integer value (currently 0 – 50) you might choose the redefine that value when working with lower voltages, for example represent 0 – 7 volts as a value of 0 – 70, in tenths of a volt. This change will need to be made in the firmware, and both Web interfaces.

An important note, when creating a new unit. Since I’m using all the available I/O pins on the ESP32, I’ve had to use GPIO12, even though this does (by default) determine the Flash voltage at startup.

To use the pin for I/O, it is essential that this behaviour is changed by modifying the parameters in the ESP32 one-time-programmable memory. This is done using the Python espefuse program that is provided in the IDE. To summarise the current settings, navigate to the directory containing that file, and execute:

python espefuse.py --port COM4 summary

..assuming the USB serial link is on Windows COM port 4. Then to modify the setting, execute:

python espefuse.py --port COM4 set_flash_voltage 3.3V

You will be prompted to confirm that the change should be made, since it is irreversible. Then if you re-run the summary, the last line should be:

Flash voltage (VDD_SDIO) set to 3.3V by efuse.

Part 1 of this project looked at the hardware, part 3 the Web interface and Python API. The source files are on Github.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA part 1: hardware for remote logic analyser

This is the first post in a series describing a low-cost WiFi-based logic analyser, that can be used for monitoring equipment in remote or hazardous locations. For an overview of the project, see this post. The hardware specification is:

Digital inputs: 16 for each unit
Input threshold: programmable
Sample rate: up to 20 megasamples per second
Sample store: up to 250 kilosamples
Network interface: WiFi

Wireless networking is an essential component of this project, and at the time of writing, the most logical hardware choice is the Espressif ESP32, which is a microcontroller with up to 34 GPIO pins, an Xtensa dual-core 32-bit LX6 processor, and built-in WiFi.

Sample storage

There are ESP32 variants with differing amounts RAM & flash ROM, but currently the most common type is the ESP32-S2 with 320 KiB SRAM and 128 KiB ROM. A significant portion of this is taken up by the WiFi code, so there is insufficient RAM to store the required number of samples.

When using external memory for sample storage, the standard practice is to employ one or more RAM chips and an address counter that is fed from a constant clock that increments when a sample is stored.

Unfortunately, the 16 data lines plus 18 address lines make this arrangement quite bulky, even when implemented using surface-mount parts. One way of simplifying the circuit is to embed the logic within a programmable logic device, such as a Field-Programmable Gate Array (FPGA), but the programming & debugging of such a device can be quite complex.

Ideally, what we want is a RAM device that has a built-in address counter, that will auto-increment on each sample. Such devices do exist, they are known as ‘Serial SRAM’; they have a 1-bit, 2-bit or 4-bit clocked serial interface for sending commands and data. You may be familiar with 1-bit SPI (Serial Peripheral Interface), as it is used in a wide variety of devices, and the RAM chip is in this mode on startup. Less well-known are the 2-bit (SDI or DSPI) and 4-bit (SQI or QSPI) interfaces that use the same hardware lines unidirectionally; commands are sent to read or write data in these modes, and thereafter the RAM chip transfers the data with an auto-incrementing address counter that wraps around at the end of the RAM.

Each RAM chip handles 4 input channels, so 4 chips are needed for the 16-channel input.

The following steps are needed for data capture:

Send command to switch RAM from SPI to SQI mode
Send a ‘write’ command, with the desired starting address
Assert the chip select line, and start the clock signal.
When capture is complete, stop the clock signal
Negate chip select, which completes the ‘write’ command

Readback of the captured data is similar, except that a ‘read’ command is used.

Unfortunately there is no way to read back the address counter within the RAM chip, so to keep track of the sampling process, it is necessary to attach a pulse counter to the clock line; a counter/timer within the microcontroller can be used for this purpose.

Another issue is the fact that the RAM data lines serve two purposes; to receive data from the comparators, and commands from the CPU. Ideally the comparators would have an ‘enable’ pin to tri-state their outputs, but I couldn’t find a suitable device with this feature. Failing that, the conventional approach would be to use multiplexer chips to switch between the two data sources, but I’ve taken a much simpler approach, using resistors in the output of the comparators. When the CPU is in control, it just sets the data lines high or low as required, overriding the comparator outputs; when capturing data, the CPU sets its pins as inputs, so the comparators control the data going into the RAM, albeit with a small delay due to the 1K series resistance interacting with the circuit capacitance, but this hasn’t proved a problem in practice.

Triggering

An important feature of logic analysers is triggering; the ability to continuously capture data until a specific condition is met, carry on capturing for a specific number of samples after the trigger, then stop.

The logic to support this operation can be quite complex, and the addition of digital comparators (or their equivalent in programmable logic) would be a major complication. However, it is worth bearing in mind two things:

If there is a small time-delay between the trigger condition being detected, and the hardware reacting to the trigger, then it is no problem; if we are capturing tens of thousands of samples, a trigger delay of 10 or 20 samples is of no great concern.
The CPU is largely idle while data is being captured; it only has to respond to network requests, which are largely handled by the 2nd CPU in a dual-CPU device.

In common with most modern microcontrollers, the ESP32 has the ability to generate an interrupt on the state-change of any I/O pin, and this interrupt can be used for triggering, since it can capture very short pulses (under 50 nanoseconds). In theory it is possible to chain several edge-interrupts together, to give more complex triggering, but personally I’ve found a single edge-trigger to be sufficient for most purposes.

ESP32

The decision to use an ESP32 processor was largely driven by its built-in WiFi interface, and the ready availability of complete low-cost modules with a built-in antenna (or connector for an external antenna). The module used is the ESP32-S2-DevKitC with 38 pins, and an ESP32-WROOM-32D or -32E processor; take care not not to be confuse it with similar-looking modules.

This module has a few features that make it an excellent choice:

Fast dual-CPU architecture.
Easy-to-use C software development environment based on the VScode IDE, PlatformIO configuration, and the Arduino run-time environment.
A pin multiplexer that removes a lot of constraints as to which pins can be used for which internal functions
Simple PWM generator that can generate the required clock frequencies
Edge-detection interrupts on any I/O pin.

However, there are some less-than-ideal features:

Gaps in the I/O pin assignments, so it is impossible to assign 8 consecutive bits to form a single byte-wide input, or 16 consecutive bits to form a word-wide input.
Absence of a general-purpose 32-bit pulse counter; only 16 bits are available.
Usage of some I/O pins to specify boot-time settings.
Some GPIO pins are input-only.

These issues can be resolved in software, as will be described in the next blog post, but the final design does use all the input/output pins, with none spare, which forces some economies. For example, it is highly desirable to have a diagnostic LED controlled by the CPU, but there is no O/P pin to drive it, so it has been put on the RAM chip-select line, which slightly reduces the flexibility of the LED indications.

Analog inputs

Each analog input requires an attenuator to reduce the input voltage down to something manageable, and a comparator that compares the attenuated signal with a programmable reference voltage produced by a DAC (Digital-Analog Converter).

The attenuator is just a resistive potential divider; the resistors have been arranged in groups of 8, such that dual-in-line (DIL) plug-in resistor packs can be used in place of discrete resistors. This means that the board can handle very a wide range of input voltages by plugging in different resistor packs.

It proved quite difficult to find a comparator that is readily available, fast enough, and with a push-pull output (not open-drain). An early candidate was the MAX942, but this has back-to-back diodes between the inverting and non-inverting inputs, which would cause major problems if the voltage difference was sufficiently high to make them conduct. In the end, TS3022 devices in SO-8 packages were selected, and they perform really well; provision had been made for adding positive feedback to provide hysteresis (by adding resistors to the DIL-footprint through-holes), but in practice this has not been necessary.

Programming

The ESP32 module has a micro USB connector to provide power to the unit, and a programming interface. As a backup, the PCB also includes a JTAG programming interface, but this uses some of the data pins, so is only usable on a bare depopulated PCB.

The USB interface also emulates a serial console, that is compatible with standard PC terminal emulators; the ESP32 firmware makes extensive use of this for diagnostic reporting.

PCB design

The circuit diagram, PCB manufacturing files (Gerbers) and parts list are in the project repository; do check the README file for the latest information.

The PCB has dual-footprints (DIL & SO-8) for the memory chips and the comparators. I have used DIL sockets for the RAM chips so they can be upgraded at a future date, but as mentioned above, none of the DIL-packaged comparators were suitable, so surface-mount TS3022 parts were used – they have a relatively generous pin spacing (1.27 mm) so shouldn’t be difficult for anyone to assemble who has reasonable soldering skills.

The photo above shows socketed resistor packs for the input attenuators; if using these (as opposed to individual resistors) make sure you buy the type with 8 individual resistors, not commoned.

The ESP32 module requires two 19-way sockets with square pins; I had to use 20-pin parts, with one pin cut off. To help with hardware diagnostics, I have included convenient 2.54 mm pitch headers for the RAM clock, chip select and data lines. These only need to be populated if you are using a logic analyser to trace the board’s operation, or if you wish to remove the ESP32 module and drive the board from some other CPU.

Power (5 volts, with a current capacity of at least 250 mA) is either applied on the USB connector, or on P14, in which case there needs to be an on/off switch connected to the terminals of P7, or those pins must be bonded across. The module is programmed over USB; do not use the JTAG interface unless the board is de-populated.

Part 2 of this project looks at the ESP32 firmware, part 3 the Web interface and Python API. The circuit diagram and PCB files are on Github.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA: remote logic analyzer using ESP32 and Web protocols

There are plenty of low-cost logic analysers but they all share a common characteristic; a USB link is used to transfer the data into a PC for analysis.

If the equipment is in a safe & comfortable office environment, then this isn’t a problem, but in many cases it is operating in an distant, inaccessible or hostile location, so remote monitoring is desirable. If the analyser unit is small and low-cost, it can remain attached on a semi-permanent basis, enabling long-term monitoring & diagnosis of remote equipment

The initial specification of the logic analyser unit is

Digital inputs: 16 for each unit
Input threshold: programmable
Sample rate: up to 20 megasamples per second
Sample store: up to 250 kilosamples
Network interface: WiFi
Network protocols: TCP and HTTP
Control method: full remote control
Display method: Web pages with Javascript
Remote API: Python class

The project is fully open-source, and is documented in the following posts:

Part 1, hardware The data acquisition and networking hardware
Part 2, firmware C source code that drives the capture hardware
Part 3, networking Web interface and Python API
Source files for the hardware and software

PEDLA is a variant of this project, using a Pi Pico RP2040 in place of the ESP32.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

Pi Pico wireless Web server using ESP32 and MicroPython

There are various ways that the Pi Pico (RP2040) can be given a wireless interface; in a previous post I used a Microchip ATWINC1500, now I’m using the Espressif ESP32 WROOM-32. The left-hand photo above shows an Adafruit Airlift ESP32 co-processor board which must be hand-wired to the Pico, whilst the right-hand is a Pimorini Wireless Pack that plugs directly into the Pico, either back-to-back, or (as in the photo) on an Omnibus baseboard that allows multiple I/O boards to be attached to a single Pico.

So you can add WiFi connectivity to your Pico without any additional wiring.

The resulting hardware would normally be programmed in C, but I really like the simplicity of MicroPython, so have chosen that language, but this raises an additional question; do I use the Pimorini MicroPython that is derived from the Pi Foundation version, or CircuitPython, which is a derivative created by Adafruit, with various changes?

CircuitPython includes a lot of I/O libraries as standard, but does lack some important features (such as direct access to memory) that are useful to the more advanced developer. So I’ll try to support both, but I do prefer the working with MicroPython.

SPI interface

The ESP32 does all the hard work of connection to the WiFi network and handling TCP/IP sockets, it is just necessary to send the appropriate commands over the SPI link. In addition to the usual clock, data and chip-select lines, there is a ‘reset’ signal from the Pico to the ESP, and a ‘ready’ signal back from the ESP to the Pico. This is necessary because the Pico spends much of its time waiting for the ESP to complete a command; instead of continually polling for a result, the Pico can wait until ‘ready’ is signalled then fetch the data.

My server code uses the I/O pins defined by the Adafruit Pico Wireless Pack:

Function            GPIO  Pin num
Clock               18    24
Pico Tx data (MOSI) 19    25
Pico Rx data (MISO) 16    21
Chip select (CS)     7    10
ESP32 ready         10    14
ESP32 reset         11    15

Software components

Pico software modules and ESP32 interface

ESP32 code

The ESP32 code takes low-level commands over the SPI interface, such as connecting and disconnecting from the wireless network, opening TCP sockets, sending and receiving data. The same ESP32 firmware works with both the MicroPython and CircuitPython code and I suggest you buy an ESP32 module with the firmware pre-loaded, as the re-building & re-flashing process is a bit complicated, see here for the code, and here for a guide to the upgrade process. I’m using 1.7.3, you can check the version in CircuitPython using:

import board
from digitalio import DigitalInOut
esp32_cs = DigitalInOut(board.GP7)
esp32_ready = DigitalInOut(board.GP10)
esp32_reset = DigitalInOut(board.GP11)
spi = busio.SPI(board.GP18, board.GP19, board.GP16)
esp = adafruit_esp32spi.ESP_SPIcontrol(spi, esp32_cs, esp32_ready, esp32_reset)
print("Firmware version", esp.firmware_version.decode('ascii'))

Note that some ESP32 modules are preloaded with firmware that provides a serial interface instead of SPI, using modem-style ‘AT’ commands; this is incompatible with my code, so the firmware will need to be re-flashed.

MicroPython or CircuitPython

This has to be loaded onto the Pico before anything else. There are detailed descriptions of the loading process on the Web, but basically you hold down the Pico pushbutton while making the USB connection. The Pico will then appear as a disk drive in your filesystem, and you just copy (drag-and-drop) the appropriate UF2 file onto that drive. The Pico will then automatically reboot, and run the file you have loaded.

The standard Pi Foundation MicroPython lacks the necessary libraries to interface with the ESP32, so we have to use the Pimorini version. At the time of writing, the latest ‘MicroPython with Pimoroni Libs’ version is 0.26, available on Github here. This includes all the necessary driver code for the ESP32.

If you are using CircuitPython, the installation is a bit more complicated; the base UF2 file (currently 7.0.0) is available here, but you will also need to create a directory in the MicroPython filesystem called adafruit_esp32spi, and load adafruit_esp32spi.py and adafruit_esp32spi_socket.py into it. The files are obtained from here, and the loading process is as described below.

Loading files through REPL

A common source of confusion is the way that files are loaded onto the Pico. I have already described the loading of MicroPython or CircuitPython UF2 images, but it is important to note that this method only applies to the base Python code; if you want to add files that are accessible to your software (e.g. CircuitPython add-on modules, or Web pages for the server) they must be loaded by a completely different method.

When Python runs, it gives you an interactive console, known as REPL (Read Evaluate Print Loop). This is normally available as a serial interface over USB, but can also be configured to use a hardware serial port. You can directly execute commands using this interface, but more usefully you can use a REPL-aware editor to prepare your files and upload them to the Pico. I use Thonny; Click Run and Select Interpreter, and choose either MicroPython (Raspberry Pi Pico) or CircuitPython (Generic) and Thonny will search your serial port to try and connect to Python running on the Pico. You can then select View | Files, and you get a window that shows your local (PC) filesystem, and also the remote Python files. You can then transfer files to & from the PC, and create subdirectories.

At the time of writing, Thonny can’t handle drag-and-drop between the local & remote directories; you have to right-click on a file, then select ‘upload’ to copy it to the currently-displayed remote directory. Do not attempt a transfer while the remote MicroPython program is running; hit the ‘stop’ button first.

In the case of CircuitPython, you need to create a subdirectory called adafruit_esp32spi, containing adafruit_esp32spi.py and adafruit_esp32socket.py.

Server code

To accommodate the differences between the two MicroPython versions, I have created an ESP32 class, with functions for connecting to the wireless network, and handling TCP sockets; it is just a thin wrapper around the MicroPython functions which send SPI commands to the ESP32, and read the responses.

Connecting to the WiFi network just requires a network name (SSID) and password; all the complication is handled by the ESP32. Then a socket is opened to receive the HTTP requests; this is normally on port 80.

def start_server(self, port):
    self.server_sock = picowireless.get_socket()
    picowireless.server_start(port, self.server_sock, 0)

There are significant differences between conventional TCP sockets, and those provided by the ESP32; there is no ‘bind’ command, and the client socket is obtained by a strangely-named ‘avail_server’ call, which also returns the data length for a client socket – a bit confusing. This is a simplified version of the code:

def get_client_sock(self, server_sock):
    return picowireless.avail_server(server_sock)

def recv_length(self, sock):
    return picowireless.avail_server(sock)

def recv_data(self, sock):
    return picowireless.get_data_buf(sock)

def get_http_request(self):
    self.client_sock = self.get_client_sock(self.server_sock)
    client_dlen = self.recv_length(self.client_sock)
    if self.client_sock != 255 and client_dlen > 0:
        req = b""
        while len(req) < client_dlen:
            req += self.recv_data(self.client_sock)
        request = req.decode("utf-8")
        return request
    return None

When the code runs, the IP address is printed on the console

Connecting to testnet...
WiFi status: connecting
WiFi status: connected
Server socket 0, 10.1.1.11:80

Entering the IP address (10.1.1.11 in the above example) into a Web browser means that the server receives something like the following request:

GET / HTTP/1.1
Host: 10.1.1.11
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Most of this information is irrelevant to a tiny Web server, since there is little choice over the information it returns. The first line has the important information, namely the resource that is being requested, so instead of decoding the whole message, we can do simple tests to match the line to known resources:

DIRECTORY = "/"
INDEX_FNAME   = "rpscope.html"
DATA_FNAME    = "data.csv"
ICON_FNAME    = "favicon.ico"

DISABLE_CACHE = "Cache-Control: no-cache, no-store, must-revalidate\r\n"
DISABLE_CACHE += "Pragma: no-cache\r\nExpires: 0\r\n"

req = esp.get_http_request()
if req:
    r = req.split("\r")[0]
    if ICON_FNAME in r:
        esp.put_http_404()
    elif DATA_FNAME in r:
        esp.put_http_file(DIRECTORY+DATA_FNAME, "text/csv", DISABLE_CACHE)
    else:
        esp.put_http_file(DIRECTORY+INDEX_FNAME)

Since we are dealing with live data, that may change every time it is fetched, the browser’s caching mechanism must be disabled, hence the DISABLE_CACHE response, which aims to do so regardless of which browser version is in use.

Sending the response back to the browser should be easy, it just needs to be split into chunks of maximum 4095 bytes so as to not overflow the SPI buffers. However I had problems with unreliability of both the MicroPython and CircuitPython implementations; sometimes the network transfers would just stall. The solution seems to be to drastically reduce the SPI block size; some CircuitPython code uses 64-byte blocks, but I’ve found 128 bytes works OK. Further work is needed to establish the source of the problem, but this workaround is sufficient for now.

MAX_SPI_DLEN = const(128)
HTTP_OK = "HTTP/1.1 200 OK\r\n"
CONTENT_LEN = "Content-Length: %u\r\n"
CONTENT_TYPE = "Content-type %s\r\n"
HEAD_END = "\r\n"

def put_http_file(self, fname, content="text/html; charset=utf-8", hdr=""):
    try:
        f = open(fname)
    except:
        f = None
    if not f:
        esp.put_http_404()
    else:
        flen = os.stat(fname)[6]
        resp = HTTP_OK + CONTENT_LEN%flen + CONTENT_TYPE%content + hdr + HEAD_END
        self.send_data(self.client_sock, resp)
        n = 0
        while n < flen:
            data = f.read(MAX_SPI_DLEN)
            self.send_data(self.client_sock, data)
            n += len(data)
        self.send_end(self.client_sock)

Dynamic web server

A simple Web server could just receive a page request from a browser, match it with a file in the Pico filesystem, and return the page text to the browser. However, I’d like to report back some live data that has been captured by the Pico, so we need a method to return dynamically-changing values.

There are three main ways of doing this; server-side includes (SSI), AJAX, and on-demand page creation.

Server-side includes

A Web page that is stored in the server filesystem may include tags that trigger the server to perform specific actions, for example when the tag ‘$time’ is reached, the server replaces that text with the current time value. A slightly more sophisticated version embeds the tag in an HTML comment, so the page can be displayed without a Pico server, albeit with no dynamic data.

The great merit of this approach is its simplicity, and I used it extensively in my early projects. However, there is one major snag; the data is embedded in an HTML page, so is difficult to extract. For example, you may have a Web page that contains the temperature data for a 24-hour period, and you want to aggregate that into weekly and monthly reports; you could write a script that strips out the HTML and returns pure data, but it’d be easier if the Web server could just provide a data file for each day.

AJAX

Web pages routinely include Javascript code to perform various display functions, and one of these functions can fetch a data file from the server, and display its contents. This is commonly known as AJAX (Asynchronous Javascript and XML) though in reality there is no necessity for the data to be in XML format; any format will do.

For example, to display a graph of daily temperatures, the Browser loads a Web page with the main formatting, and Javascript code that requests a comma-delimited (CSV) data file. The server prepares that file dynamically using the current data, and returns it to the browser. The Javascript on the browser decodes the data, and displays it as a graph; it can also perform calculations on the data, such as reporting minimum and maximum values.

The key advantage is that the data file can be made available to any other applications, so a logging application can ignore all the Javascript processing, and just fetch the data file directly from the server.

With regard to the data file format, I prefer not to use XML if I can possibly avoid it, so use Javascript Object Notation (JSON) for structured data, and comma-delimited (CSV) for unstructured values, such as data tables.

The first ‘A’ in AJAX stands for Asynchronous, and this deserves some explanation. When the Javascript fetches the data file from the server, there will be a delay, and if the server is heavily loaded, this might be a substantial delay. This could result in the code becoming completely unresponsive, as it waits for data that may never arrive. To avoid this, the data fetching function XMLHttpRequest() returns immediately, but with a callback function that is triggered when the data actually arrives from the server – this is the asynchronous behaviour.

There is now a more modern approach using a ‘fetch’ function that substitutes a ‘promise’ for the callback, but the net effect is the same; keeping the Javascript code running while waiting for data to arrive from the server.

On-demand page creation

The above two techniques rely on the Web page being stored in a filesystem on the server, but it is possible for the server code to create a page from scratch every time it is requested.

Due to the complexity of hand-coding HTML, this approach is normally used with page templates, that are converted on-the-fly into HTML for the browser. However, a template system would add significant complexity to this simple demonstration, so I have used the hand-coding approach to create a basic display of analog input voltages, as shown below.

This data table is created from scratch by the server code, every time the page is loaded:

ADC_PINS = 26, 27, 28
ADC_SCALE = 3.3 / 65536
TEST_PAGE = '''<!DOCTYPE html><html>
    <head><style>table, th, td {border: 1px solid black; margin: 5px;}</style></head>
    <body><h2>Pi Pico web server</h2>%s</body></html>'''

adcs = [machine.ADC(pin) for pin in ADC_PINS]

heads = ["GP%u" % pin for pin in ADC_PINS]
vals = [("%1.3f" % (adc.read_u16() * ADC_SCALE)) for adc in adcs]
th = "<tr><th>" + "</th><th>".join(heads) + "</th></tr>"
tr = "<tr><td>" + "</td><td>".join(vals) + "</td></tr>"
table = "<table><caption>ADC voltages</caption>%s</table>" % (th+tr)
esp.put_http_text(TEST_PAGE % table)

Even in this trivial example, there is significant work in ensuring that the HTML tags are nested correctly, so for pages with any degree of complexity, I’d normally use the AJAX approach as described earlier.

Running the code

The steps are:

Connect the wireless module
Load the appropriate UF2 file
In the case of CircuitPython, load the add-on libraries
Get the Web server Python code from Github here. The file rp_esp32.py is for MicroPython with Pimoroni Libraries, and rp_esp32_cp.py is for CircuitPython.
Edit the top of the server code to set the network name (SSID) and password for your WiFi network.
Run the code using Thonny or any other MicroPython REPL interface; the console should show something like:

Connecting to testnet...
WiFi status: connecting
WiFi status: connected
Server socket 0, 10.1.1.11:80

Run a Web browser, and access test.html at the given IP address, e.g. 10.1.1.11/test.html. The console should indicate the socket number, data length, the first line of the HTTP request, and the type of request, e.g.

Client socket 1 len 466: GET /test.html HTTP/1.1 [test page]

The browser should show the voltages of the first 3 ADC channels, e.g.

The Web pages produced by the MicroPython and CircuitPython versions are very similar; the only difference is in the table headers, which either reflect the I/O pin numbers, or the analog channel numbers.

If a file called ‘index.html’ is loaded into the root directory of the MicroPython filesystem, it will be displayed in the browser by default, when no filename is entered in the browser address bar. A minimal index page might look like:

<!doctype html><html><head></head>
  <body>
    <h2>Pi Pico web server</h2>
    <a href="test.html">ADC test</a>
  </body>
</html>

So far, I have only presented very simple Web pages; in the next post I’ll show how to fetch high-speed analog samples using DMA, then combine these with a more sophisticated AJAX functionality to create a Web-based oscilloscope.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

RP2040 WiFi using Microchip ATWINC1500 module: part 2

Server sockets

In part 1, we got as far as connecting an ATWINC1500 or 1510 module to a WiFi network; now it is time to do something vaguely useful with it.

Sockets

A network interface is frequently referred to as a ‘socket’ interface, so first I’d better define what that is.

A socket is a logical endpoint for network communication, and consists of an IP address, and a port number. The IP address is often assigned automatically at boot-time, using Dynamic Host Configuration Protocol (DHCP) as in part 1, but can also be pre-programmed into the unit (a ‘static’ address).

The 16-bit port number further subdivides the functionality within that IP address, so one address can support multiple simultaneous conversations (‘connections’). Furthermore, specific port numbers below 1024 are generally associated with specific functions; for example, an un-encrypted Web server normally uses port 80, whilst an encrypted server is on port 443. Port numbers 1024 and above are generally for user programs.

Clients and servers, UDP and TCP

A sever runs all the time, waiting for a client to contact it; the client is responsible for initiating the contact and providing some data, probably in the form of a request. The server returns an appropriate response, then either side may terminate the connection; the client may end it because it has received enough data, or the server because there are limits on the maximum number of simultaneous clients it can service.

There are 2 fundamental communication methods in TCP/IP: User Datagram Protocol (UDP) and Transmission Control Protocol (TCP).

UDP is the simpler of the two, and involves sending a block of data to a given socket (port and IP address) with no guarantee that it will arrive. TCP involves sending a stream of data to a socket; it includes sophisticated retry mechanisms to ensure that the data arrives.

There are those in the networking community who shun UDP, because they think the unreliability makes is useless; I disagree, and think there are various use-cases where the simple block-based transfer is perfectly adequate, possibly overlaid with a simple retry mechanism, so we’ll start with a simple UDP server.

UDP server

The simplest UDP server is stateless, i.e. it doesn’t store any information about the client; it just responds to any request it receives. This means that a single socket can handle multiple clients, unlike TCP which requires a unique socket for each client it is communicating with.

For a classic C socket interface, the steps would be:

Create a datagram socket using socket()
Bind to the socket to a specific port using bind()
When a message is received on that port, get the data, return address and port number using recvfrom()
Send response data to the remote address and port number using sendto()
Go to step 3

The code driving the ATWINC1500 module does the same job, but the function calls are a bit different, as they reflect the messages sent to & received from the WiFi module:

Initialise a socket structure for UDP
Send a BIND command to the module, with the port number
Receive a BIND response
Send a RECVFROM command to the module
Wait until a RECVFROM response is received, get the data, return address & port number
Send a SENDTO command to the module with the response data, return address & port number
Go to step 4

Note that there may be a very long wait between steps 4 and 5, if there are no clients contacting the server. Fortunately the module will signal the arrival of a message by asserting the interrupt request (IRQ) line, so the RP2040 CPU can proceed with other tasks while waiting.

UDP Reception

There are 4 steps when the module receives a packet (‘datagram’) from a remote client:

Get the group ID and operation. This identifies the message type; for UDP it will generally be a response to a RECVFROM request, but it could be something completely different. My software combines the group ID and operation into a single 16-bit number.
Get the operation-specific header. This is generally 16 bytes or less, and in the case of RECVFROM, gives the IP address and port number of the sender, also a pointer & offset to the user data in the buffer.
Get the user data. The application doesn’t need to fetch all the incoming data; for example, in the case of a Web server, it might just get the first line of the page request, and discard all the other information.
Handle socket errors. If there is an error, the data length-value will be negative, and the code must take appropriate action, such as closing and re-opening the server socket. Since a UDP socket is connectionless, it generally won’t see many errors, but a TCP socket will flag an error every time a client closes an active connection.

For RECVFROM, the step 1 & 2 headers are:

// HIF message header
typedef struct {
    uint8_t gid, op;
    uint16_t len;
} HIF_HDR;

// Operation-specific header
typedef struct {
    SOCK_ADDR addr;
    int16_t dlen; // (status)
    uint16_t oset;
    uint8_t sock, x;
    uint16_t session;
} RECV_RESP_MSG;

Having fetched these two blocks, control is passed to a state machine that takes appropriate action. If we’ve just received an indication that DHCP has succeeded, then we bind the server sockets.

    if (gop==GOP_DHCP_CONF)
    {
        for (sock=MIN_SOCKET; sock<MAX_SOCKETS; sock++)
        {
            sp = &sockets[sock];
            if (sp->state==STATE_BINDING)
                put_sock_bind(fd, sock, sp->localport);
        }
    }

When we get a message indicating the binding has succeeded, then if it is a TCP socket, we need to send a LISTEN command. If UDP, we can just send a RECVFROM, and wait for data to arrive. We can tell whether the socket is TCP or UDP by looking at the socket number; the lower numbers are TCP, and higher are UDP.

    else if (gop==GOP_BIND && (sock=rmp->bind.sock)<MAX_SOCKETS &&
             sockets[sock].state==STATE_BINDING)
    {
        sock_state(sock, STATE_BOUND);
        if (sock < MIN_UDP_SOCK)
            put_sock_listen(fd, sock);
        else
            put_sock_recvfrom(fd, sock);
    }

If a UDP server, we may now get a RECVFROM response, indicating that a packet (‘datagram’) has arrived. If so, we save the return socket address (IP and port number), call a handler function, then send another RECVFROM request.

    else if (gop==GOP_RECVFROM && (sock=rmp->recv.sock)<MAX_SOCKETS &&
             (sp=&sockets[sock])->state==STATE_BOUND)
    {
        memcpy(&sp->addr, &rmp->recv.addr, sizeof(SOCK_ADDR));
        if (sp->handler)
            sp->handler(fd, sock, rmp->recv.dlen);
        put_sock_recvfrom(fd, sock);
    }

A very simple handler just echoes back the incoming data:

uint8_t databuff[MAX_DATALEN];

// Handler for UDP echo
void udp_echo_handler(int fd, uint8_t sock, int rxlen)
{
    if (rxlen>0 && get_sock_data(fd, sock, databuff, rxlen))
        put_sock_sendto(fd, sock, databuff, rxlen);
}

UDP client for testing

For testing, I use a simple UDP client written in Python, that can run on a Raspberry Pi, or any PC running Linux or Windows. It sends a message every second, and checks for a response. You’ll need to change the IP address to match the DCHP value given by the module.

# Simple Python client for testing UDP server
import socket, time

ADDR = "10.1.1.11"
PORT = 1025
MESSAGE = b"Test %u"
DELAY = 1

def hex_str(bytes):
    return " ".join([("%02x" % int(b)) for b in bytes])

print("Send to UDP %s:%s" % (ADDR, PORT))
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(0.2)
count = 1
while True:
    msg = MESSAGE % count
    sock.sendto(msg, (ADDR, PORT))
    print("Tx %u: %s" % (len(msg), hex_str(msg)))
    count += 1
    try:
        data = sock.recvfrom(1000)
    except:
        data = None
    if data:
        bytes, addr = data
        s = hex_str(bytes)
        print("Rx %u: %s\n" % (len(bytes), s))
    time.sleep(DELAY)

TCP server

A TCP connection is more complex than UDP, since the module firmware must keep track of the data that is sent & received, in order to correct any errors. The steps for a classic C socket interface would be:

Create a ‘stream’ socket using socket()
Bind to the socket to a specific port using bind()
Set the socket to wait for incoming connections using listen()
When a connection request is received on the main socket, open a new socket for the data using accept()
When data arrives on the new socket, get it using recv()
Send response data using send()
If socket error or transfer complete, close socket. Otherwise go to step 5

The corresponding operations for the WiFi module are:

Initialise a socket structure for TCP
Send a BIND command to the module, with the port number
Receive a BIND response
Send a LISTEN command
Receive a LISTEN response
Receive an ACCEPT notification when a connection request arrives on the main socket. Save the new socket number.
Send a RECV command on the new socket.
Receive a RECV response when data arrives on the new socket
Send response data using SEND
Go to step 7, or close the new socket

TCP reception

The first step (binding a socket to a port number) is the same as for UDP, but then we send a LISTEN command, which activates the socket to receive incoming connections. When a client connects, we get an ACCEPT response containing 2 socket numbers; the first is the one that we used for the original BIND command, and the second is a new socket that will be used for the data transfer; we need to issue a RECV on this socket to get the user data.

    else if (gop==GOP_ACCEPT &&
             (sock=rmp->accept.listen_sock)<MAX_SOCKETS &&
             (sock2=rmp->accept.conn_sock)<MAX_SOCKETS &&
             sockets[sock].state==STATE_BOUND)
    {
        memcpy(&sockets[sock2].addr, &rmp->recv.addr, sizeof(SOCK_ADDR));
        sockets[sock2].handler = sockets[sock].handler;
        sock_state(sock2, STATE_CONNECTED);
        put_sock_recv(fd, sock2);
    }

When data is available, the RECV command will return, and we can call a data handler function, then send another RECV for more data. Alternatively, if the data length is negative, then there is an error, and the socket needs to be closed. This isn’t necessarily as bad as it sounds; the most common reason is that the client has closed the connection, and we just need to erase the 2nd socket for future use.

    else if (gop==GOP_RECV && (sock=rmp->recv.sock)<MAX_SOCKETS &&
            (sp=&sockets[sock])->state==STATE_CONNECTED)
    {
        if (sp->handler)
            sp->handler(fd, sock, rmp->recv.dlen);
        if (rmp->recv.dlen > 0)
            put_sock_recv(fd, sock);
    }

The TCP data handler is again a simple echo-back of the incoming data, but with an added complication: if the data length is negative, there has been an error. This isn’t as bad as it sounds; the most common error is that the client has closed the TCP connection, so the server must also close its data socket, to allow it to be re-used for a new connection.

// Handler for TCP echo
void tcp_echo_handler(int fd, uint8_t sock, int rxlen)
{
    if (rxlen < 0)
        put_sock_close(fd, sock);
    else if (rxlen>0 && get_sock_data(fd, sock, databuff, rxlen))
        put_sock_send(fd, sock, databuff, rxlen);
}

TCP client for testing

This is similar to the UDP client; it can run on a Raspberry Pi or PC, running Linux or Windows:

# Simple Python client for testing TCP server
import socket, time

ADDR = "10.1.1.11"
PORT = 1025
MESSAGE = b"Test %u"
DELAY = 1

def hex_str(bytes):
    return " ".join([("%02x" % int(b)) for b in bytes])

print("Send to TCP %s:%s" % (ADDR, PORT))
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(0.5)
sock.connect((ADDR, PORT))
count = 1
while True:
    msg = MESSAGE % count
    sock.sendall(msg)
    print("Tx %u: %s" % (len(msg), hex_str(msg)))
    count += 1
    try:
        data = sock.recv(1000)
    except:
        data = None
    if data:
        s = hex_str(data)
        print("Rx %u: %s\n" % (len(data), s))
    time.sleep(DELAY)
# EOF

Source files

The C source files are in the ‘part2’ directory on Github here

The default network name and passphrase are “testnet” and “testpass”; these must be changed to match your network, then the code will need to be rebuilt & run using the standard Pico devlopment environment.

The default TCP & UDP port numbers are 1025, and the Python programs I’ve provided can be used to perform simple simple echo tests, providing the IP address is modified to match that given when the Pico joins the network.

python tcp_tx.py
Send to TCP 10.1.1.11:1025
Tx 6: 54 65 73 74 20 31
Rx 6: 54 65 73 74 20 31

Tx 6: 54 65 73 74 20 32
Rx 6: 54 65 73 74 20 32
..and so on..

python udp_tx.py
Send to UDP 10.1.1.11:1025
Tx 6: 54 65 73 74 20 31
Rx 6: 54 65 73 74 20 31

Tx 6: 54 65 73 74 20 32
Rx 6: 54 65 73 74 20 32
..and so on..

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

RP2040 WiFi using Microchip ATWINC1500 module

Part 1: joining a network

The Raspberry Pi Pico is an incredibly useful low-cost micro-controller module based on the RP2040 CPU, but at the time of writing, there is a major omission: there is no networking capability.

This project adds low-cost wireless networking to the Pi Pico, and any other RP2040 boards. The There are various modules on the market that could be used for this purpose; I have chosen the Microchip ATWINC1500 or 1510 modules as they low-cost, have an easy hardware interface (4-wire SPI), and feature a built-in TCP/IP software stack, which significantly reduces the amount of software needed on the RP2040.

The photo above shows the module mounted on an Adafruit breakout board, and the module itself; this is the variant with a built-in antenna, but there is also a version with an antenna connector, that allows an external antenna to be used.

The only difference between the ATWINC1500 and 1510 modules is that the latter have larger flash memory size (1 MB, as opposed to 0.5 MB). There is also an earlier series of low-level interface modules named ATWILC; I’m not using them, as the built-in TCP/IP software of the ATWINC saves a lot of code complication on the RP2040.

Hardware connections

For simplicity, I have used the Adafruit breakout board, but it is possible to directly connect the module to the Pico, powered from its 3.3V supply.

Wiring Pico to Adafruit WINC1500 breakout

Pi Pico pins
SCK     18     SPI clock
MOSI    19     SPI data out
MISO    16     SPI data in
CS      17     SPI chip select
WAKE    20     Module wake
EN      20     Module enable
RESET   21     Module reset
IRQ     22     Module interrupt request

No extra components are needed, if the wiring to the module is kept short, i.e. 3 inches (76 mm).

SPI on the RP2040

Initialising the SPI interface on the RP2040 just involves a list of API function calls:

#define SCK_PIN     18
#define MOSI_PIN    19
#define MISO_PIN    16
#define CS_PIN      17
#define WAKE_PIN    20
#define RESET_PIN   21
#define IRQ_PIN     22

// Initialise SPI interface
void spi_setup(int fd)
{
    stdio_init_all();
    spi_init(SPI_PORT, SPI_SPEED);
    spi_set_format(SPI_PORT, 8, SPI_CPOL_0, SPI_CPHA_0, SPI_MSB_FIRST);
    gpio_init(MISO_PIN);
    gpio_set_function(MISO_PIN, GPIO_FUNC_SPI);
    gpio_set_function(CS_PIN,   GPIO_FUNC_SIO);
    gpio_set_function(SCK_PIN,  GPIO_FUNC_SPI);
    gpio_set_function(MOSI_PIN, GPIO_FUNC_SPI);
    gpio_init(CS_PIN);
    gpio_set_dir(CS_PIN, GPIO_OUT);
    gpio_put(CS_PIN, 1);
    gpio_init(WAKE_PIN);
    gpio_set_dir(WAKE_PIN, GPIO_OUT);
    gpio_put(WAKE_PIN, 1);
    gpio_init(IRQ_PIN);
    gpio_set_dir(IRQ_PIN, GPIO_IN);
    gpio_pull_up(IRQ_PIN);
    gpio_init(RESET_PIN);
    gpio_set_dir(RESET_PIN, GPIO_OUT);
    gpio_put(RESET_PIN, 0);
    sleep_ms(1);
    gpio_put(RESET_PIN, 1);
    sleep_ms(1);
}

When using the standard SPI transfer API function, I found that occasionally the last data bit wasn’t being received correctly. The reason was that the API function returns before the transfer is complete; the clock signal is still high, and needs to go low to finish the transaction. To fix this, I inserted a loop that waits for the clock to go low, before negating the chip-select line.

// Do SPI transfer
int spi_xfer(int fd, uint8_t *txd, uint8_t *rxd, int len)
{
    gpio_put(CS_PIN, 0);
    spi_write_read_blocking(SPI_PORT, txd, rxd, len);
    while (gpio_get(SCK_PIN)) ;
    gpio_put(CS_PIN, 1);
}

Interface method

The WiFi module has its own processor, running proprietary code; it is supplied with a suitable binary image already installed, so will start running as soon as the module is enabled.

The module has a Host Interface (HIF) that the Pico uses for all communications; it is a Serial Peripheral Interface (SPI) that consists of a clock signal, incoming & outgoing data lines (MOSI and MISO), and a Chip Select, also known as a Chip Enable. The Pico initiates and controls all the HIF transfers, but the module can request a transfer by asserting an Interrupt Request (IRQ) line.

The module is powered up by asserting the ‘enable’ line, then briefly pulsing the reset line. This ensures that there is a clean startup, without any complications caused by previous settings.

There are 2 basic methods to transfer data between the PICO and the module; simple 32-bit configuration values can be transferred as register read/write cycles; there is a specific format for these, which includes an acknowledgement that a write cycle has succeeded. The following logic analyser trace shows a 32-bit value of 0x51 being read from register 0x1070; the output from the CPU is MOSI, and the input from the module is MISO.

Now the corresponding write cycle, where the CPU is writing back a value of 0x51 to the same 32-bit register.

There are a few unusual features about these transfers.

The chip-select (CS) line doesn’t have to be continuously asserted during the transfer, it need only be asserted whilst a byte is actually being read or written.
The command value is CA hex for a read cycle, and C9 for a write.
The module echoes back the command value plus 2 bytes for a read (CA 00 F3), or plus 1 byte for a write (C9 00), to indicate it has been accepted.
The register address is 24-bit, big-endian (most significant byte first)
The data value is 32-bit, little-endian in the read cycle (51 00 00 00), and big-endian in the write cycle (00 00 00 50).

The last point is quite remarkable, and when starting on the code development, I had great difficulty believing it could be true. The likely reason is that the SPI transfer is is big-endian as defined in the Secure Digital (SD) card specification, but the CPU in the module is little-endian. So the firmware has to either do a byte-swap on every response message, or return everything using the native byte-order, with this result.

In addition to reading & writing single-word registers, the software must read & write blocks of data. This involves some negotiation with the module firmware, since that manages the allocation & freeing of the necessary storage space in the module. For example, the procedure for a block write is:

Request a buffer of the required size
Receive the address of the buffer from the module
Write one or more data blocks to the buffer
Signal that the transfer is complete

Reading is similar, except that the first step isn’t needed, as the buffer is already available with the required data.

Operations

The above transfer mechanism is used to send commands to the module, and receive responses back from it; there is generally a one-to-one correspondence between the command and response, but there may be a significant delay between the two. For example, the ‘receive’ command requests a data block that has been received over the network, but if there is none, there will be no response, and the command will remain active until something does arrive.

The commands are generally referred to as ‘operations’, and they are split into groups:

Main
Wireless (WiFi)
Internet Protocol (IP)
Host Interface (HIF)
Over The Air update (OTA)
Secure Socket Layer (SSL)
Cryptography (Crypto)

Each operation is assigned a number, and there is some re-use of numbers within different groups, for example a value of 70 in the WiFi group is used to enable Acess Point (AP) mode, but the same value in the IP group is a socket receive command. To avoid this possible source of confusion, my code combines the group and operation into a single 16-bit value, e.g.

// Host Interface (HIF) Group IDs
#define GID_MAIN        0
#define GID_WIFI        1
#define GID_IP          2
#define GID_HIF         3

// Host Interface operations with Group ID (GID)
#define GIDOP(gid, op) ((gid << 8) | op)
#define GOP_STATE_CHANGE    GIDOP(GID_WIFI, 44)
#define GOP_DHCP_CONF       GIDOP(GID_WIFI, 50)
#define GOP_CONN_REQ_NEW    GIDOP(GID_WIFI, 59)
#define GOP_BIND            GIDOP(GID_IP,   65)
..and so on..

To invoke an operation on the module, you must first send a 4-byte header that gives an 8-bit operation number, 8-bit group, and 16-bit message length.

typedef struct {
    uint8_t gid, op;
    uint16_t len;
} HIF_HDR;

The next 4 bytes of the message are unused, so can either be sent as zeros, or just skipped. Then there is the command header, which varies depending on the operation being performed, but are often 16 bytes or less, for example the IP ‘bind’ command:

// Address field for socket, network order (MSbyte first)
typedef struct {
    uint16_t family, port;
    uint32_t ip;
} SOCK_ADDR;

// Socket bind command, 12 bytes
typedef struct {
    SOCK_ADDR saddr;
    uint8_t sock, x;
    uint16_t session;
} BIND_CMD;

I’ll be discussing the IP operations in detail in the next part.

The interrupt request (IRQ) line is pulled low by the module to indicate that a response is available; for simplicity, my code polls this line, and calls an interrupt handler.

if (read_irq() == 0)
    interrupt_handler();

Joining a network

I’ll start with the most common use-case; joining a network that uses WiFi Protected Access (WPA or WPA2), and obtaining an IP address using Dynamic Host Configuration Protocol (DHCP). This is remarkably painless, since the module firmware does all of the hard work, but first we have to tackle the issue of firmware versions.

As previously explained, the module comes pre-loaded with firmware; at the time of writing, this is generally version 19.5.2 or 19.6.1. There is a provision for re-flashing the firmware to the latest version, but for the time being I’d like to avoid that complication, so the code I’ve written is compatible with both versions.

The reason that this matters is that 19.6.1 introduced a new method for joining a network, with a new operation number (59, as opposed to 40). Fortunately the newer software can still handle the older method, so that is what I’ll be using by default, though there is a compile-time option to use the new one, if you’re sure the module has the newer firmware.

The code to join the network is remarkably brief, just involving some data preparation, then calling a host interface transfer function to send the data. It searches across all channels to find a signal that matches the given Service Set Identifier (SSID, or network name). A password string (WPA passphrase) is also given; if this is a null value, the module will attempt to join an ‘open’ (insecure) network, but there are very obvious security risks with this, so it is not recommended.

// Join a WPA network, or open network if null password
bool join_net(int fd, char *ssid, char *pass)
{
#if NEW_JOIN
    CONN_HDR ch = {pass?0x98:0x2c, CRED_STORE, ANY_CHAN, strlen(ssid), "",
                   pass?AUTH_PSK:AUTH_OPEN, {0,0,0}};
    PSK_DATA pd;

    strcpy(ch.ssid, ssid);
    if (pass)
    {
        memset(&pd, 0, sizeof(PSK_DATA));
        strcpy(pd.phrase, pass);
        pd.len = strlen(pass);
        return(hif_put(fd, GOP_CONN_REQ_NEW|REQ_DATA, &ch, sizeof(CONN_HDR),
               &pd, sizeof(PSK_DATA), sizeof(CONN_HDR)));
    }
    return(hif_put(fd, GOP_CONN_REQ_NEW, &ch, sizeof(CONN_HDR), 0, 0, 0));
#else
    OLD_CONN_HDR och = {"", pass?AUTH_PSK:AUTH_OPEN, {0,0}, ANY_CHAN, "", 1, {0,0}};

    strcpy(och.ssid, ssid);
    strcpy(och.psk, pass ? pass : "");
    return(hif_put(fd, GOP_CONN_REQ_OLD, &och, sizeof(OLD_CONN_HDR), 0, 0, 0));
#endif
}

Running the code

There are 3 source files in the ‘part1’ directory on Github here:

winc_pico_part1.c: main program, with RP2040-specific code
winc_wifi.c: module interface
winc_wifi.h: module interface definitions

The default network name and passphrase are “testnet” and “testpass”; these will have to be changed to match your network.

Normally I’d provide a simple Pi command-line to compile & run the files, but this is considerably more complex on the Pico; you’ll have to refer to the official documentation for setting up the development tools. I’ve provided a simple cmakelists file, that may need to be altered to suit your environment.

There is a compile-time ‘verbose’ setting, which regulates the amount of diagnostic information that is displayed on the console (serial link). Level 1 shows the following:

Firmware 19.5.2, OTP MAC address F8:F0:05:xx.xx.xx
Connecting...........
Interrupt gid 1 op 44 len 12 State change connected
Interrupt gid 1 op 50 len 28 DHCP conf 10.1.1.11 gate 10.1.1.101

[or if the network can't be found]
Interrupt gid 1 op 44 len 12 State change fail

Verbose level 2 lists all the register settings as well, e.g.

Rd reg 1000: 001003a0
Rd reg 13f4: 00000001
Rd reg 1014: 807c082d
Rd reg 207bc: 00003f00
Rd reg c000c: 00000000
Rd reg c000c: 10add09e
Wr reg 108c: 13521330
Wr reg 14a0: 00000102
..and so on..

Level 3 also includes hex dumps of the data transfers.

Socket interface

Part 2 describes the socket interface, with TCP and UDP servers here.

Copyright (c) Jeremy P Bentham 2021. Please credit this blog if you use the information or software in it.

Zerowi bare-metal WiFi driver part 6: joining a network

It has been a long & complicated journey to get to this point, and you might be thinking that it’ll take a lot more effort to join a network, particularly if it involves WPA security.

However, this isn’t true; because we’re dealing with an intelligent network interface, all the complexities of WPA are handled by the on-chip firmware, so it only takes a few simple IOCTL commands to join a secure network.

For the purposes of this blog, I’m assuming there is an existing infrastructure network with an Access Point (AP) using a Service Set Identifier (SSID) of ‘testnet’, and a WPA or WPA2 password of ‘testpass’ – this is purely for test purposes, and must be changed.

IOCTL calls

IOCTLs were described in detail in the previous part, but to recap: there are over 300 possible calls, covering all aspects of the configuration and monitoring of the network interface. Some calls can take a significant amount of time to process, so the host CPU has to poll the SDIO interface, checking for a response.

In the previous part, I put a ‘while’ loop round some of the calls to account for this delay, but this was getting rather messy, so I’ve expanded the IOCTL calls to include a millisecond retry time; if this is non-zero, the code will keep polling for a response until the time is exceeded.

An example of this is the way the country is set:

    if (!ioctl_set_data("country", 100, &country_struct, sizeof(country_struct)))
        printf("Can't set country\n");

After sending the command, the code waits for up to 100 milliseconds for a response, returning its length; zero if there was an error or no response.

There isn’t a standard data format for all the IOCTLS, so it is very easy to have a mismatch between the command & data; ideally we’d attach a diagnostic function to each call, to report any errors. To do this, I’ve created a simple macro to invoke the IOCTL function, with token-pasting to report the function name and first argument in the event of an error. For example, the code to set the pre-shared key is:

#define CHECK(f, a, ...) {if (!f(a, __VA_ARGS__)) \
                          printf("Error: %s(%s ...)\n", #f, #a);}

CHECK(ioctl_wr_data, WLC_SET_WSEC_PMK, 0, &wsec_pmk, sizeof(wsec_pmk));

When all is OK, this acts the same as:

ioctl_wr_data(WLC_SET_WSEC_PMK, 0, &wsec_pmk, sizeof(wsec_pmk));

If the data is incorrect, for example the key string is too short, an error will be reported on the console at run-time:

Error: ioctl_wr_data(WLC_SET_WSEC_PMK ...)

I’ve found this really helpful in tracking down IOCTL programming errors and timing issues.

Joining an insecure network

We’ll start with a simple test network that has security disabled; at the risk of stating the obvious, this mode presents a major security risk, so should be used with great caution.

All we need to set is a two-letter country, and an SSID (network name), which are used to populate the IOCTL structures, e.g.

#define SSID            "testnet"
#define COUNTRY         "GB"
#define COUNTRY_REV     -1
#define SECURITY        0  // Zero to disable security
wlc_ssid_t ssid={sizeof(SSID)-1, SSID};
wl_country_t country_struct = {.ccode=COUNTRY, .country_abbrev=COUNTRY, .rev=COUNTRY_REV};

The country defines which radio channels may be used. A few countries have multiple revisions of their channel settings; to use the default revision, a value of -1 is entered.

Joining the network just involves setting ‘infrastructure’ mode, disabling the various forms of security, then sending a ‘SET_SSID’ IOCTL command to join the network:

    CHECK(ioctl_wr_int32, WLC_SET_INFRA, 50, 1);
    CHECK(ioctl_wr_int32, WLC_SET_AUTH, 0, 0);
    CHECK(ioctl_wr_int32, WLC_SET_WSEC, 0, SECURITY);
    CHECK(ioctl_wr_int32, WLC_SET_WPA_AUTH, 0, 0);
    ioctl_enable_evts(join_evts);
    CHECK(ioctl_wr_data, WLC_SET_SSID, 100, &ssid, sizeof(ssid));

The joining process takes a few seconds, then success or failure is signalled by asynchronous events, hence the ‘enable_evts’ line – this specifies which events we’d like to receive when the device is connecting, and after it has connected. The event messages are described below.

Joining a secure network

There are IOCTL commands covering a wide range of security options; I’ve tested WPA-TKIP and WPA2 in pre-shared key (PSK) mode as they are in widespread use.

The settings aren’t complex; the hard work is done by the chip firmware. I’ve used a compile-time definition to select the security mode:

// Security settings: 0 for none, 1 for WPA_TKIP, 2 for WPA2
// The hard-coded password is for test purposes only!!!
#define SECURITY        2
#define PASSPHRASE      "testpass"
wsec_pmk_t wsec_pmk = {sizeof(PASSPHRASE)-1, WSEC_PASSPHRASE, PASSPHRASE};

#if SECURITY
CHECK(ioctl_wr_int32, WLC_SET_WSEC, 0, SECURITY==2 ? 6 : 2);
CHECK(ioctl_set_intx2, "bsscfg:sup_wpa", 0, 0, 1);
CHECK(ioctl_set_intx2, "bsscfg:sup_wpa2_eapver", 0, 0, -1);
CHECK(ioctl_set_intx2, "bsscfg:sup_wpa_tmo", 0, 0, 2500);
CHECK(ioctl_wr_data, WLC_SET_WSEC_PMK, 0, &wsec_pmk, sizeof(wsec_pmk));
CHECK(ioctl_wr_int32, WLC_SET_WPA_AUTH, 0, SECURITY==2 ? 0x80 : 4);
#endif

ioctl_enable_evts(join_evts);
CHECK(ioctl_wr_data, WLC_SET_SSID, 100, &ssid, sizeof(ssid));

Some of the IOCTL calls require two 32-bit integer arguments, the first being an interface number that is zero in this implementation. The function ‘set_intx2’ was created to simplify the handling of the 2 arguments:

// Set 2 integers in IOCTL variable
int ioctl_set_intx2(char *name, int wait_msec, int val1, int val2)
{
    int data[2] = {val1, val2};

    return(ioctl_cmd(WLC_SET_VAR, name, wait_msec, 1, data, 8));
}

Response when joining network

The status reports are in the form of asynchronous event messages; prior to joining, we did an ioctl_enable_events() call to specify which events we’re interested in receiving. This call also stores a string value for each enabled event, to simplify decoding.

The messages are obtained by polling the SDIO interface with a command 53. If one is available, the first 2 bytes are a (little-endian) length, the next 2 are the bitwise inverse of that length; otherwise the returned values are zero if there is no message.

My code first loads and decodes the 12-byte event header, then the remainder of the message. For example, this is the response if the Access Point (AP) doesn’t support the requested security scheme:

67 00 98 ff 10 01 00 0e 00 20 00 00
len=67 seq=10 chan=01 hdrlen=0E flow=00 credit=20

00 00 20 00 00 01 00 01 00 00 b8 27 eb 6b 3d 7c ba 27 eb 6b 3d 7c 88 6c 80 01 00 3f 00 00 10 18
00 01 00 02 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 07 61 79 54 65 6b 20
77 6c 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74 65 73 74 6e 65 74 00 00
dest=B827EB6B3D7C srce=BA27EB6B3D7C type=886C sub=8001 len=3F oui=1018 usr=01
ver=02 flags=00 type=00 status=01 reason=00 auth=00 dlen=07 addr=617954656B20
SET_SSID FAIL

The total length (including the 12-byte header) is 67 hex, 103 decimal; the bitwise inverse of this is 98FF hex. Some care is needed when doing the decoding; the first part of the message is in the usual little-endian byte order, while the later parts are in network (big-endian) format. The most important part of the decode is the last line, giving the name of the event (0, SET_SSID) and the status code (1, FAIL).

It would be nice if all the responses were as easy to understand as this, but they can be a bit misleading. This is the response when the password is incorrect:

AUTH SUCCESS
LINK SUCCESS
SET_SSID SUCCESS
PSK_SUP PARTIAL
PSK_SUP PARTIAL
DEAUTH_IND SUCCESS

I don’t understand why so much success is reported, when there is an obvious error. In contrast, a successful network join looks like:

AUTH SUCCESS
LINK SUCCESS
SET_SSID SUCCESS
PSK_SUP UNSOLICITED

Another indication of success is that you may see some blocks of binary data, such as:

52 00 ad ff 14 02 00 0e 00 20 00 00
len=52 seq=14 chan=02 hdrlen=0E flow=00 credit=20

00 00 20 00 00 01 00 01 f5 00 ff ff ff ff ff ff 68 17 29 f6 b8 54 08 06 00 01 08 00 06 04 00 01
68 17 29 f6 b8 32 0a 01 01 cc 00 00 00 00 00 00 0a 01 01 c7 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00
dest=FFFFFFFFFFFF srce=681729F6B854 type=806

The event fields haven’t been decoded because this isn’t really an event; it is a network frame. The all-1s destination address shows that it is a broadcast transmission; it is quite usual for a network to carry lots of broadcasts, that have to be decoded by all the attached systems, so the wireless interface is dutifully passing them on to our software for processing.

These network frames are a handy way of showing that our wireless interface is connected to a real network, but to understand them we’ll need to tackle the TCP/IP protocols that underpin all our network communications; that’ll be in the next part.

If you want to run the code so far, see the end of the previous part for instructions; there is a batch file ‘make_join.bat’ for Windows, and ‘make_join’ for Linux.

Interrupts

Thinking ahead to the TCP/IP stack, we’ll need a quick way of detecting when network data (a network ‘frame’) has arrived; continually sending out 12-byte SDIO data requests, in the hope of receiving something back, is really inefficient.

If you search the CYW43438 data sheet for the word ‘interrupt’, you’ll see that SDIO data line 1 also acts as an interrupt line – there is no separate pin for that function. This works because generally the 4 data lines are just floating (pulled high by resistors); they’re only driven high or low during block transfers. So while they are idle, the wireless chip can pull one of them (DATA1) low to request a data transfer; as soon as the transfer starts, the line reverts to carrying data bits.

It is quite easy to set this up, just write to the Interrupt Enable register:

    sdio_cmd52_writes(SD_FUNC_BUS, BUS_INTEN_REG, 0x07, 1);

The value of 7 includes a master interrupt enable, plus enabling functions 1 & 2. [I’m not clear on what these 2 functions are: the network traffic seems to use function 1 only].

Handling the interrupt should just be a case of checking the I/O pin, then doing the necessary read cycle:

if (gpio_in(SD_D1_PIN) == 0)
{
    while ((n=ioctl_get_event(&ieh, eventbuff, sizeof(eventbuff))) > 0)
        ..process the network data..
}

Unfortunately this approach doesn’t work very well; the interrupt line has a tendency to remain low, causing lots of unnecessary attempts to get non-existent data. The reason is that the interrupt needs to be explicitly acknowledged, so that the I/O pin goes high again. The Broadcom driver handles this situation by disabling asynchronous messaging; when the interrupt pin is asserted, it does read & write cycles to the interrupt status register, to check for the interrupt, and acknowledge it. Here is my interpretation of this method, which seems to work well:

// After security is set up, enable async events
ioctl_enable_evts(join_evts);

// Start joining a network
CHECK(ioctl_wr_data, WLC_SET_SSID, 100, &ssid, sizeof(ssid));

// Enable interrupts
sdio_cmd52_writes(SD_FUNC_BUS, BUS_INTEN_REG, 0x07, 1);

// ..wait until the events show we have joined the network, then..

// Disable events
ioctl_enable_evts(no_evts);

// Fetch a network frame
if (gpio_in(SD_D1_PIN) == 0)
{
    sdio_bak_read32(SB_INT_STATUS_REG, &val);
    if (val & 0xff)
    {
        sdio_bak_write32(SB_INT_STATUS_REG, val);
        if ((n=ioctl_get_event(&ieh, eventbuff, sizeof(eventbuff))) > 0)
            ...process the network data...
    }
}

In the early stages of code development, I tend to use polling in place of interrupts, as it makes debugging much easier. When the code is debugged, I’ll switch to using real CPU interrupts: any I/O pin can be configured to trigger an interrupt, so the change shouldn’t be difficult.

[Overview] [Previous part]

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.

Zerowi bare-metal WiFi driver part 5: IOCTLs

It has been a long haul, but we are now getting close to doing something useful with the WiFi chip; we just need to tackle the issue of IOCTLs.

You may already be familiar with these from configuring a serial link, or network hardware; they provide a programming interface into a vendor-specific driver. Since the BCM/CTW43xxx chips are intelligent (they have their own CPU) the IOCTL calls are handled directly by the firmware we’ve programmed into the chip. So even though we’re in ‘bare metal’ mode, without an operating system, we still need to handle IOCTLs.

The IOCTL calls are listed in the wwd_wlioctl.h in WICED or WiFi Host Driver, or wlioctl_defs.h, and there are over 300 of them; this post will concentrate on the code that sends IOCTL requests and handles the responses, and we’ll check they are working by doing a quick network scan – more interesting things, like transmission & reception, will have to wait for the next part.

Message structure

When using IOCTL calls, you are essentially writing a data packet to the WiFi RAM, waiting for an acknowledgement, then reading back the response. As you’d expect, there is a specific data format for the requests and responses, though it does have some strange features:

#define IOCTL_MAX_DLEN  256

typedef struct {
    uint8_t  seq,       // sdpcm_sw_header
             chan,
             nextlen,
             hdrlen,
             flow,
             credit,
             reserved[2];
    uint32_t cmd;       // CDC header
    uint16_t outlen,
             inlen;
    uint32_t flags,
             status;
    uint8_t data[IOCTL_MAX_DLEN];
} IOCTL_CMD;

typedef struct {
    uint16_t len;
    uint8_t  reserved1,
             flags,
             reserved2[2],
             pad[2];
} IOCTL_GLOM_HDR;

The best feature of the IOCTL data is that it always starts with a 16-bit length word, followed by the bitwise inverse of that length (least-significant byte first). For example, here is the decode of a request to set a variable ‘bus:rxglom’ to a value of 1:

19.290643 * Cmd 53 A500002C Wr WLAN 08000 len 44
19.290669 * Rsp 53 00001000 Flags 10
  Data  44 bytes: 2b 00 d4 ff 00 00 00 0c 00 00 00 00 07 01 00 00 0f 00 00 00 02 00 02 00 00 00 00 00 62 75 73 3a 72 78 67 6c 6f 6d 00 01 00 00 00 00 *
 IOC_W  44 bytes: seq=0 chan=0 nextlen=0 hdrlen=C flow=0 credit=0 cmd=107 outlen=F inlen=0 flags=20002 status=0 set 'bus:rxglom'
19.290769   Ack 2F FF

You can check this is an IOCTL message by adding the first two bytes to the second two: 002B + FFD4 = FFFF. It uses a command 53 to send a 44-byte request (actually 43 bytes, rounded up to nearest 4-byte value) to the RAD function, containing a header of mostly zeros with an IOCTL number of 107 hex (263 decimal) to set a variable, a null-terminated variable name, then the binary value.

It is then necessary to poll the WiFi chip to check when the response is available, and if so, acknowledge it:

19.291055 * Cmd 53 15404004 Rd BAK  180000:A020 len 4
  Data   4 bytes: 40 00 80 00 *
19.291081 * Rsp 53 00001000 Flags 10
19.291179 * Cmd 53 95404004 Wr BAK  180000:A020 len 4
19.291205 * Rsp 53 00001000 Flags 10
  Data   4 bytes: 40 00 00 00 *
19.291259   Ack 28 3F

The value of 40 hex in backplane register 2020 (A020 for a 32-bit value) shows there is a response, which is acknowledged by writing 40 hex to that register, then the response is read:

19.291377 * Cmd 53 21000040 Rd WLAN 08000 len 64
19.291403 * Rsp 53 00001000 Flags 10
  Data  64 bytes: 2b 00 d4 ff 02 00 00 0c 00 11 00 00 07 01 00 00 0f 00 00 00 00 00 02 00 00 00 00 00 62 75 73 3a 72 78 67 6c 6f 6d 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *
 IOC_R  64 bytes: seq=2 chan=0 nextlen=0 hdrlen=C flow=0 credit=11 cmd=107 outlen=F inlen=0 flags=20000 status=0 set 'bus:rxglom'

The response is the same length as the request, and when writing to a variable it is largely a copy of the command. When receiving the response, it is important to check that it matches the request, as the two can easily get out of step. Unfortunately the sequence number can’t be used for this purpose (in this example the response is 2, and request is 0) instead the most-significant 16 bits of the ‘flags’ are a ‘request ID’ that should be the same for request & response, while the lower 16 bits are set to 2 for a write cycle, 0 for a read.

Equally strange is that the command length always seems to be the same as the response, so for a short command with a long response (such as ‘ver’) the command is 296 bytes long, just to carry a 3-character name. This is a bit crazy; sometime I’ll experiment with the header fields to see if there is a way round it.

Glom

I’ll admit this word wasn’t in my vocabulary until I encountered it in the WiFi drivers, and I’m still not entirely clear what it means. The transaction above sets ‘rxglom’ to 1, which enables ‘glom’ mode for incoming commands (‘rx’ refers to the WiFi chip command reception, not the host).

After this is set, another header is introduced into commands sent to the WiFi chip; I have accommodated this using another structure, and a union to cover both.

typedef struct {
    uint16_t len;
    uint8_t  reserved1,
             flags,
             reserved2[2],
             pad[2];
} IOCTL_GLOM_HDR;

typedef struct {
    IOCTL_GLOM_HDR glom_hdr;
    IOCTL_CMD  cmd;
} IOCTL_GLOM_CMD;

typedef struct
{
    uint16_t len,           // sdpcm_header.frametag
             notlen;
    union 
    {
        IOCTL_CMD cmd;
        IOCTL_GLOM_CMD glom_cmd;
    };
} IOCTL_MSG;

The good news is that the first 4 bytes of any message remain the same (16-bit length, and its bitwise inverse); the bad news is that the new header is shoehorned in after that, pushing the other headers out by 8 bytes.

Here is an example: getting ‘cur_ethaddr’ which is the 6-byte MAC address:

19.291837 * Cmd 53 A5000038 Wr WLAN 08000 len 56
19.291863 * Rsp 53 00001000 Flags 10
  Data  56 bytes: 38 00 c7 ff 34 00 00 01 00 00 00 00 01 00 00 14 00 00 00 00 06 01 00 00 14 00 00 00 00 00 03 00 00 00 00 00 63 75 72 5f 65 74 68 65 72 61 64 64 72 00 00 00 00 00 00 00 *
 IOC_W  56 bytes: seq=1 chan=0 nextlen=0 hdrlen=14 flow=0 credit=0 cmd=106 outlen=14 inlen=0 flags=30000 status=0 get 'cur_etheraddr'
19.291973   Ack 2F FF

I’m sure there must be some point to the extended header, but right now I’m not at all sure what it is. There doesn’t seem to be any official marker in the glom header to show that it has been included, which makes life difficult for any software attempting to decode the IOCTLs. For the time being, I’m hedging my bets by using a global variable to enable or disable this option, and leaving it disabled; hopefully its true purpose will be clear soon.

Partial data read

If the IOCTL command has a long response, and the software doesn’t read it all, the remainder will still be available for the next read. This can be demonstrated by the version (‘ver’) command; even though it is sent as a single 296-byte block, the Linux driver receives it as one block of 64 bytes, then another of 224:

19.295186 * Cmd 53 A5000128 Wr WLAN 08000 len 296
19.295212 * Rsp 53 00001000 Flags 10
  Data 296 bytes: 28 01 d7 fe 24 01 00 01 00 00 00 00 03 00 00 14 00 00 00 00 06 01 00 00 04 01 00 00 00 00 05 00 00 00 00 00 76 65 72 00 76 65 72 00 00 ..and so on..
 IOC_W 296 bytes: seq=3 chan=0 nextlen=0 hdrlen=14 flow=0 credit=0 cmd=106 outlen=104 inlen=0 flags=50000 status=0 get 'ver'
19.295583   Ack 2F FF
19.295980 * Cmd 52 00000A00 Rd BUS  00005
19.296006 * Rsp 52 00001002 Flags 10 data 02
19.296178 * Cmd 53 15404004 Rd BAK  180000:A020 len 4
  Data   4 bytes: 40 00 80 00 *
19.296204 * Rsp 53 00001000 Flags 10
19.296321 * Cmd 53 95404004 Wr BAK  180000:A020 len 4
19.296347 * Rsp 53 00001000 Flags 10
  Data   4 bytes: 40 00 00 00 *
19.296404   Ack 28 3F
19.296563 * Cmd 53 21000040 Rd WLAN 08000 len 64
19.296589 * Rsp 53 00001000 Flags 10
  Data  64 bytes: 20 01 df fe 05 00 00 0c 00 14 00 00 06 01 00 00 04 01 00 00 00 00 05 00 00 00 00 00 77 6c 30 3a 20 4f 63 74 20 32 33 20 32 30 31 37 20 30 33 3a 35 35 3a 35 33 20 76 65 72 73 69 6f 6e 20 37 2e *
 IOC_R  64 bytes: seq=5 chan=0 nextlen=0 hdrlen=C flow=0 credit=14 cmd=106 outlen=104 inlen=0 flags=50000 status=0 get 'wl0: Oct 23 2017 03:55:53 version 7.'
19.296841 * Cmd 53 210000E0 Rd WLAN 08000 len 224
19.296867 * Rsp 53 00001000 Flags 10
  Data 224 bytes: 34 35 2e 39 38 2e 33 38 20 28 72 36 37 34 34 34 32 20 43 59 29 20 46 57 49 44 20 30 31 2d 65 35 38 64 32 31 39 66 0a 00 00 ..and so on..

This serves to emphasise the important of reading all the data from every response, and checking that the Request ID matches that of the response; it’d be all to easy for the network driver to lose track.

Events

So far, we’ve dealt had a strict one-to-one matching between request and response, but how does the WiFi chip indicate when it has extra data available? For example, a single network scan may generate 10 or 20 data blocks (one for every access point), how does the host know when this data is available? There is mention of an interrupt pin (which we’ll save for a future blog) but how can the driver software check for data pending?

I puzzled over this for some time, on the assumption there must be a special register to indicate this, but in the end it seems that the driver just issues a normal data read; if data is available it can be recognised by the length header, if not zeros are returned.

The WiFi chip has a finite amount of buffer space to queue up such events; this is the ‘credit’ value in the IOCTL header; presumably the network driver should check this to see if events have been lost due to running out of buffers.

Network scan

Finally, we get to do something vaguely useful; scan for WiFi networks. There are 2 types: ‘iscan’ and ‘escan’. The first is an incremental scan, that seems easier to use, but is marked as ‘deprecated’ in some source code. The second is supposed to be more versatile (i.e. more complicated) but is the preferred option, so that is what we’ll be using.

We need to fill in a structure with the scan parameters; due to the large number of networks in the vicinity, I usually scan a single channel:

// WiFi channel number to scan (0 for all channels)
#define SCAN_CHAN       1

typedef struct {
    uint32_t version;
    uint16_t action,
             sync_id;
    uint32_t ssidlen;
    uint8_t  ssid[32],
             bssid[6],
             bss_type,
             scan_type;
    uint32_t nprobes,
             active_time,
             passive_time,
             home_time;
    uint16_t nchans,
             nssids;
    uint8_t  chans[14][2],
             ssids[1][32];
} SCAN_PARAMS;

SCAN_PARAMS scan_params = {
    .version=1, .action=1, .sync_id=0x1234, .ssidlen=0, .ssid={0}, 
    .bssid={0xff,0xff,0xff,0xff,0xff,0xff}, .bss_type=2, .scan_type=1, 
    .nprobes=~0, .active_time=~0, .passive_time=~0, .home_time=~0, 
#if SCAN_CHAN == 0
    .nchans=14, .nssids=0, 
    .chans={{1,0x2b},{2,0x2b},{3,0x2b},{4,0x2b},{5,0x2b},{6,0x2b},{7,0x2b},
      {8,0x2b},{9,0x2b},{10,0x2b},{11,0x2b},{12,0x2b},{13,0x2b},{14,0x2b}},
#else
    .nchans=1, .nssids=0, .chans={{SCAN_CHAN,0x2b}}, .ssids={{0}}
#endif
};

The scan is triggered by sending this data in an ‘escan’ IOCTL call, but first we must tell the chip that we’re interested in the response events. This is done by sending a very large bitfield, with a bit set for each event you want to receive; there are over 140 possible events, so you need to pick the right one. I got the list from whd_events_int.h which is part of the Cypress WiFi Host Driver project; if you don’t know what that is, please refer to part 1 of this blog, which describes all the resources I’m using.

So the code to trigger the scan becomes:

#define EVENT_ESCAN_RESULT  69
#define EVENT_MAX           160
#define SET_EVENT(e)        event_msgs[e/8] = 1 << (e & 7)
uint8_t event_msgs[EVENT_MAX / 8];

SET_EVENT(EVENT_ESCAN_RESULT);
ioctl_set_data("event_msgs", event_msgs, sizeof(event_msgs));
ioctl_set_data("escan", &scan_params, sizeof(scan_params));

Surprisingly easy, until we get back the results of the scan, which has one varying-length record for every WiFi access point found. There is a lot of data, around 300 to 600 bytes per record, so we need to do some heavyweight decoding.

Decoding the scan data

So far, I’ve avoided including any of the standard Cypress / Broadcom header files in my project. This is because any one header file often depends on another 2, which then depends on another 5, and so on… Quite rapidly, you’re including a large chunk of the Operating System which isn’t at all necessary; it just makes the decoding process much harder to follow.

Fortunately for this project, there is a way to avoid these major OS dependencies; use header files that were created for use in embedded systems, namely the Cypress WiFi Host Driver described in part 1 of this blog. Here are the structures that are needed for decoding the scan response data, and the files they’re in:

whd_types.h:
	whd_security, whd_scan_type, whd_bss_type, whd_802_11_band, whd_mac, whd_ssid, whd_bss_type, 
	whd_event_header [-> whd_event_msg], wl_bss_info
whd_events.h:
	whd_event_ether_header, whd_event_eth_hdr, whd_event_msg, whd_event
whd_wlioctl.h:
	wl_escan_result

Additional dependencies for these files are in:

cy_result.h, cyhal_hw_types.h, whd.h

So only 6 extra files need to be included at this stage, and we’ve avoided the unnecessary complexity of an Operating System interface – after all, this driver is supposed to be bare-metal code.

The code to print the MAC address, channel number and SSID (network name) is:

// Escan result event (excluding 12-byte IOCTL header)
typedef struct {
    uint8_t pad[10];
    whd_event_t event;
    wl_escan_result_t escan;
} escan_result;

escan_result *erp = (escan_result *)eventbuff;

n = ioctl_get_event(eventbuff, sizeof(eventbuff));
if (n > sizeof(escan_result))
{
    printf("%u bytes\n", n);
    disp_mac_addr((uint8_t *)&erp->event.whd_event.addr);
    printf(" %2u ", SWAP16(erp->escan.bss_info->chanspec));
    disp_ssid(&erp->escan.bss_info->SSID_len);
}

The scan result data fields are in network-standard byte-order (big endian) so the channel number needs to be byte-swapped.

Running the code

If you want to try out the code so far, you’ll need a Pi ZeroW with a USB-serial cable attached, the arm-none-eabi-gcc compiler and gdb debugger. You can find full details and a simple test program here; it is worth running this before attempting the Zerowi project.

The source code is at https://github.com/jbentham/zerowi, ‘make_scan.bat’ will create zerowi.elf on windows, which is downloaded into the target using the ‘run’ batch file. This executes alpha_speedup.py to accelerate the serial link from 115200 to 921600 baud, then runs Arm gdb using the setup commands in run.gdb.

I have provided Linux scripts ‘make_scan’ and ‘run’, these need to be made executable using ‘chmod +x’. The Alpha debugger does require arm-none-eabi-gdb, which isn’t included in many Linux distributions (including Raspbian Buster) so may need to be built from source.

My Windows system uses serial port COM7, and Linux uses /dev/ttyUSB0; yours may well be different, so you’ll need to change scripts accordingly. If the Cypress firmware is included in the build image (i.e. ‘INCLUDE_FIRMWARE’ is non-zero) then it will take around 10 seconds to load the executable image onto the ZeroW. When the code runs you should see a list of access points; to keep the number of entries low, I only scan a single channel, by default channel 1:

360 bytes
7A:30:D9:96:DA:xx  1 BTWifi-X
460 bytes
84:A4:23:04:81:xx  1 PLUSNET
360 bytes
BC:30:D9:96:DA:xx  1 BTHub6
456 bytes
20:E5:2A:0E:A1:xx  1 Virginia Drive
312 bytes
7A:30:D9:96:DA:xx  1 BTWifi-with-FON
312 bytes
00:1D:AA:C1:75:xx  1 testnet

The last of the these is a special test network I’ll be using in subsequent parts of this blog.

To select another channel, change SCAN_CHAN at the top of zerowi.c; if set to zero, all channels will be scanned.

[Overview] [Previous part] [Next part]

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.

Zerowi bare-metal WiFi driver part 4: loading firmware

In the previous post we sent some commands to the WiFi chip, and got a response. To make the chip do anything useful, we need to program its internal CPU, as it doesn’t have code in ROM.

It does have configuration tables in ROM, that indicate what resources it possesses, and their locations, so the chip variants can all be programmed by a single driver. However, parsing these tables isn’t easy; the simplest code I’ve found is in the the Plan9 driver (see part 1 of this blog for details), and that is moderately impenetrable; here is the parser output for the ZeroW (values in hex):

chip ID A9A6 hex, 43430 decimal
coreid 800, corerev 31
  chipcommon 18000000
coreid 812, corerev 27
  d11ctl 18101000
coreid 829, corerev 15
  sdregs 18002000
  sdiorev 15
coreid 82a, corerev 9
  armcore 82a (ARMcm3)
  armregs 18003000
coreid 80e, corerev 16
  socramregs 18004000

I think these are the Intellectual Property (IP) cores within the chip, and the locations they occupy in the memory map, but in the absence of documentation, a lot of guesswork is required. So I decided to ignore the configuration tables, and just use the same addresses as the Linux driver, after it has done the decode. This makes my driver a lot less flexible, as the addresses will have to be changed for each new chip, but there aren’t many of them, so only a handful of definitions will need changing.

The most important number is the chip ID; it should be A9A6 hex, so it’d be a good idea to check our chip matches that. In part 3 my code did some preliminary SDIO initialisation, now to follow on from that:

// SD function numbers
#define SD_FUNC_BUS     0
#define SD_FUNC_BAK     1
#define SD_FUNC_RAD     2

// Maximum block sizes
#define SD_BAK_BLK_BYTES    64
#define SD_RAD_BLK_BYTES    512

// [0.243831] Set bus interface
sdio_cmd52_writes(SD_FUNC_BUS, BUS_SPEED_CTRL_REG, 0x03, 1);
sdio_cmd52_writes(SD_FUNC_BUS, BUS_BI_CTRL_REG, 0x42, 1);
// [17.999101] Set block sizes
sdio_cmd52_writes(SD_FUNC_BUS, BUS_BAK_BLKSIZE_REG, SD_BAK_BLK_BYTES, 2);
sdio_cmd52_writes(SD_FUNC_BUS, BUS_RAD_BLKSIZE_REG, SD_RAD_BLK_BYTES, 2);

The SD function numbers allow Command 52 & 53 to access 3 different interfaces within the chip: think ‘hardware functions’ rather than ‘software functions’. The SDIO bus interface is configured using the ‘bus’ function, and is set into high-speed mode (as discussed in part 2). Then the block sizes for the backplane (‘BAK’) and radio (‘RAD’) functions are set; these are limited to 64 & 512 bytes by the hardware. These will be used by command 53 when operating in multi-block mode.

#define BAK_BASE_ADDR           0x18000000              // CHIPCOMMON_BASE_ADDRESS

// [17.999944] Enable I/O 
sdio_cmd52_writes(SD_FUNC_BUS, BUS_IOEN_REG, 1<<SD_FUNC_BAK, 1);
if (!sdio_cmd52_reads_check(SD_FUNC_BUS, BUS_IORDY_REG, 0xff, 2, 1))
    log_error(0, 0);
// [18.001750] Set backplane window
sdio_bak_window(BAK_BASE_ADDR);
// [18.001905] Read chip ID 
sdio_cmd53_read(SD_FUNC_BAK, SB_32BIT_WIN, u32d.bytes, 4);

We now use the ‘bus’ function to enable the ‘backplane’ interface; by default, the IP cores in the chip are switched off to conserve power, and they need to be enabled; the second line of code checks that the core has actually powered up (I/O enabled -> I/O ready). Once the backplane function is enabled, we set a window pointing to the common base address (‘chipcommon’ in the Plan9 driver) then do a read, and we get hex values A6 A9 41 15, which is correct. However, some explanation is needed with regard to the backplane window.

Backplane window

You may recall that the commands we’re using here, CMD52 and CMD53, only have a 17-bit address range, yet the chip uses 32-bit addresses internally. The way this is handled is by writing a 24-bit value to 3 of the backplane registers, to act as an offset within the internal space.

// Backplane window
#define SB_32BIT_WIN    0x8000
#define SB_ADDR_MASK    0x7fff
#define SB_WIN_MASK     (~SB_ADDR_MASK)

// Set backplane window, don't set if already OK
void sdio_bak_window(uint32_t addr)
{
    static uint32_t lastaddr=0;
    
    addr &= SB_WIN_MASK;
    if (addr != lastaddr)
        sdio_cmd52_writes(SD_FUNC_BAK, BAK_WIN_ADDR_REG, addr>>8, 3);
    lastaddr = addr;
}
// Do 1 - 4 CMD52 writes to successive addresses
int sdio_cmd52_writes(int func, int addr, uint32_t data, int nbytes)
{
    int n=0;

    while (nbytes--)
    {
        n += sdio_cmd52(func, addr++, (uint8_t)data, SD_WR, 0, 0);
        data >>= 8;
    }
    return(n);
}

It is important to realise that this is a simple windowing scheme where the bottom 15 bits are provided by the offset, and the top 17 bits by the window: the two values aren’t added together. An additional complication (yes, really) is that there are 2 copies of the lower 15-byte address space; 0 – 7fff hex is for byte accesses, and 8000 – ffff hex is for 32-bit word accesses (offset SB_32BIT_WIN).

To give a concrete example, here is the analysis of the RPi driver fetching the CPU ID:

18.001455 * Cmd 52 92001400 Wr BAK  1000A 00
18.001481 * Rsp 52 00001000 Flags 10 data 00
18.001618 * Cmd 52 92001600 Wr BAK  1000B 00
18.001644 * Rsp 52 00001000 Flags 10 data 00
18.001750 * Cmd 52 92001818 Wr BAK  1000C 18 Bak Win 180000
18.001777 * Rsp 52 00001018 Flags 10 data 18
18.001905 * Cmd 53 15000004 Rd BAK  180000:8000 len 4
  Data   4 bytes: a6 a9 41 15 *

You can see the 3 CMD52 write cycles to set the window address, then the 4-byte read cycle, with the offset into the 32-bit area. The ‘win 180000’ and ‘180000:8000’ labels are my analysis code trying to be helpful, by saving the window value, and repeating it at the subsequent read cycle.

Firmware file

There are various firmware versions that could be used (see Cypress WICED) but I’m using the same version as the RPi driver, available here. It is around 300K bytes; eventually, it’ll be stored in the SD card filesystem, but for the time being I wanted a simpler storage mechanism, so attached an external SPI memory device, that can be programmed by a standard RPi utility, and is really easy to read back.

This extra hardware isn’t compulsory; there is an INCLUDE_FIRMWARE option in the source code to link the firmware file into the binary image; the functionality is the same, it just takes longer to load over the target serial link.

The device I used is an EN25Q80B, which has a megabyte of serial flash memory. MikroElektronika sell a small flash click board that is simple to connect to the ZeroW, as follows:

MicroE pi   RPi pin
Gnd          25
3V3          17
SDI          19
SDO          21
SCK          23
CS           24

This can be programmed using the following utilities that are included in the standard Linux distribution:

objcopy -F binary brcmfmac43430-sdio.bin flash.bin --pad-to 0x100000
sudo apt install flashrom
sudo modprobe spi_bcm2835
flashrom -p linux_spi:dev=/dev/spidev0.0,spispeed=1000 -w flash.bin

The version of flashrom I used does issue a warning that the Eon chip isn’t fully supported, but still programs it OK. Reading the chip is really easy:

#define SPI0_BASE       (REG_BASE + 0x204000)
#define SPI0_CS         (uint32_t *)SPI0_BASE
#define SPI0_FIFO       (uint32_t *)(SPI0_BASE + 0x04)
#define SPI0_CLK        (uint32_t *)(SPI0_BASE + 0x08)
#define SPI0_DLEN       (uint32_t *)(SPI0_BASE + 0x0c)
#define SPI0_DC         (uint32_t *)(SPI0_BASE + 0x14)

#define SPI0_CE0_PIN    8
#define SPI0_MISO_PIN   9
#define SPI0_MOSI_PIN   10
#define SPI0_SCLK_PIN   11

// Initialise flash interface (SPI0)
void flash_init(int khz)
{
    gpio_set(SPI0_CE0_PIN, GPIO_ALT0, GPIO_NOPULL);
    gpio_set(SPI0_MISO_PIN, GPIO_ALT0, GPIO_PULLUP);
    gpio_set(SPI0_MOSI_PIN, GPIO_ALT0, GPIO_NOPULL);
    gpio_set(SPI0_SCLK_PIN, GPIO_ALT0, GPIO_NOPULL);
    *SPI0_CS = 0x30;
    *SPI0_CLK = CLOCK_KHZ / khz;
}

// Set / clear SPI chip select
void spi0_cs(int set)
{
    *SPI0_CS = set ? *SPI0_CS | 0x80 : *SPI0_CS & ~0x80;
}

// Start a flash read cycle (EN25Q80 device)
void flash_open_read(int addr)
{
    uint8_t rxdata[4], txdata[4]={3, (uint8_t)(addr>>16), (uint8_t)(addr>>8), (uint8_t)(addr)};
    
    spi0_cs(1);
    spi0_xfer(txdata, rxdata, 4);
}
// Read next block
void flash_read(uint8_t *dp, int len)
{
    while (len--)
    {
        *SPI0_FIFO = 0;
        while((*SPI0_CS & (1<<17)) == 0) ;
        *dp++ = *SPI0_FIFO;
    }
}
// End a flash cycle
void flash_close(void)
{
    spi0_cs(0);
}

If you don’t want to bother with this, just set the INCLUDE_FIRMWARE option in the source code, which links the firmware file into the main executable.

File upload

Before we can upload the code, there is a lot more initialisation to be done; another 34 commands that I won’t be describing here, mainly because I’m having difficulty understanding them in the absence of documentation; for now, the source code is the only explanation you’ll get.

The process of transferring the file is made a bit more complicated by the windowing scheme I described earlier; we have to move that along after every 32K. Command 53 is used in multi-block mode, so one command is issued for multiple data blocks.

// Upload blocks of firmware from flash to chip RAM
int write_firmware(void)
{
    int len, n=0, nbytes=0, nblocks;
    uint32_t addr;

    flash_open_read(0);
    while (nbytes < FIRMWARE_LEN)
    {
        addr = sdio_bak_addr(nbytes);
        len = MIN(sizeof(txbuffer), FIRMWARE_LEN-nbytes);
        nblocks = len / SD_BAK_BLK_BYTES;
		if (nblocks > 0)
        {
            flash_read(txbuffer, nblocks*SD_BAK_BLK_BYTES);
            n = sdio_write_blocks(SD_FUNC_BAK, SB_32BIT_WIN+addr, txbuffer, nblocks);
            if (!n)
                break;
            nbytes += nblocks * SD_BAK_BLK_BYTES;
        }
        else
        {
            flash_read(txbuffer, len);
            txbuffer[len++] = 1;
            sdio_cmd53_write(SD_FUNC_BAK, SB_32BIT_WIN+addr, txbuffer, len);
            nbytes += len;
        }
    }
    flash_close();
    return(nbytes);
}
// Write multiple 64-byte command 53 blocks (max 32K in total)
int sdio_write_blocks(int func, int addr, uint8_t *dp, int nblocks)
{
    int n=0;
    SDIO_MSG rspx, cmd={.cmd53 = {.start=0, .cmd=1, .num=53,
        .wr=1, .func=func, .blk=1, .inc=1, .addrh=(uint8_t)(addr>>15)&3,
        .addrm=(uint8_t)(addr>>7), .addrl=(uint8_t)(addr&0x7f),
        .lenh=(uint8_t)(nblocks>>8)&1, .lenl=(uint8_t)nblocks, .crc=0, .stop=1}};

    clk_0(1);
    add_crc7(cmd.data);
    log_msg(&cmd);
    sdio_cmd_write(cmd.data, MSG_BITS);
    if (sdio_rsp_read(rspx.data, MSG_BITS, SD_CMD_PIN))
    {
        gpio_write(SD_D0_PIN, 4, 0xf);
        gpio_mode(SD_D0_PIN, GPIO_OUT);
        gpio_mode(SD_D1_PIN, GPIO_OUT);
        gpio_mode(SD_D2_PIN, GPIO_OUT);
        gpio_mode(SD_D3_PIN, GPIO_OUT);
        while (n++ < nblocks)
        {
            sdio_block_out(dp, SD_BAK_BLK_BYTES);
            sdio_rsp_read(rspx.data, BLOCK_ACK_BITS, SD_D0_PIN);
            dp += SD_BAK_BLK_BYTES;
            clk_0(2);
        }
        gpio_mode(SD_D0_PIN, GPIO_IN);
        gpio_mode(SD_D1_PIN, GPIO_IN);
        gpio_mode(SD_D2_PIN, GPIO_IN);
        gpio_mode(SD_D3_PIN, GPIO_IN);
    }
    clk_0(1);
    return(n);
}

Once that is complete, we must load in the configuration data, which is available here. A small amount of pre-processing is required, namely removing the comment lines, and replacing all the newline characters with nulls. Since the file is small, command 53 is used in single-block mode.

// Upload blocks of config data to chip NVRAM
int write_nvram(void)
{
    int nbytes=0, len;

    sdio_bak_window(0x078000);
    while (nbytes < config_len)
    {
        len = MIN(config_len-nbytes, SD_BAK_BLK_BYTES);
        sdio_cmd53_write(SD_FUNC_BAK, 0xfd54+nbytes, &config_data[nbytes], len);
        nbytes += len;
    }
    return(nbytes);
}

After another 12 initialisation commands, we can check if the code was loaded OK:

usdelay(50000);
if (!sdio_cmd52_reads(SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, &u32d.uint32, 1) || u32d.uint8!=0xd0)
    log_error(0, 0);
// [19.190728]
sdio_cmd52_writes(SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, 0xd2, 1);
sdio_bak_write32(SB_TO_SB_MBOX_DATA_REG, 0x40000);
sdio_cmd52_writes(SD_FUNC_BUS, BUS_IOEN_REG, (1<<SD_FUNC_BAK) | (1<<SD_FUNC_RAD), 1);
sdio_cmd52_reads(SD_FUNC_BUS, BUS_IORDY_REG, &u32d.uint32, 1);
usdelay(100000);
if (!sdio_cmd52_reads(SD_FUNC_BUS, BUS_IORDY_REG, &u32d.uint32, 1) || u32d.uint8!=0x06)
    log_error(0, 0);

If the first value is D0 hex, and the second is 6, then all is well, and after another 21 initialisation commands, we can think about doing something useful with the chip…

[Overview] [Previous part] [Next part]

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.

Zerowi bare-metal WiFi driver part 3: initialisation

This diagram shows the internals of the CYW43xxx chip in simplified form; the important point is that the chip has its own CPU, RAM and ROM; it is a computer-within-a-computer.

In part 2 I mentioned the lack of documentation, and now this becomes a major issue; how to program this complex chip, with no data on its internals. Cypress have partially solved this problem by issuing a standard binary ‘blob’ (roughly 300K bytes) that contains all the code for the embedded CPU; we’ll just be feeding data and control messages into that program, not knowing (or caring) what it is doing to the chip hardware.

I say the problem is ‘partially’ solved because we have to set up the chip to receive this program, upload the code into its RAM, then configure the chip to run it.. a sizeable task, as I’ve discovered.

First steps

The first task is to get the WiFi chip to respond to our commands. The SD and SDIO specifications offer plenty of flowcharts that describe how such a device might be initialised, but I had minimal success with these; they may be applicable to older chips, but maybe the later incarnations of the CYW43xxx just treat the SDIO bus as a convenient parallel interface, rather than slavishly following a specification that was designed for plug-in SD cards.

Then there are the timing issues; a quick glance at the existing code shows many instances where SDIO commands are artificially delayed; after you change something within the chip, it needs time to react before receiving the next command – and if that is sent too quickly, the chip just ignores it, with no error response.

The best way I could find to tackle these problems is to capture all the SDIO commands, responses & data when the Linux driver is running. The capture method is described in the previous part of this blog, and it results in a sizeable file: over 5 gigabytes as a CSV. It contains over 2,200 commands and 13,000 data blocks, so I wrote an application (‘sd_decoder’) to decode the file and display the commands. It turns out that there is a lot of redundancy (for example, the driver loads the 300K binary into the chip, then reads it all back again) so by focusing purely on one WiFi chip, we can make major simplifications.

Fragments of the simplified command sequence can then be replayed into the chip, and eventually, it starts to respond – the first time you get a response from a new chip is a happy day! Here are the first few commands the Linux driver uses to start up the chip:

 0.000331 74 00 00 0c 00 39 * Cmd 52 00000C00 Rd BUS  00006
 0.002759 74 80 00 0c 08 9f * Cmd 52 80000C08 Wr BUS  00006 08
 0.025217 40 00 00 00 00 95 * Cmd  0 00000000
 0.028130 48 00 00 01 aa 87 * Cmd  8 000001AA
 0.028527 45 00 00 00 00 5b * Cmd  5 00000000
 0.028660 3f 20 ff ff 00 ff ? Rsp 63 20FFFF00
 0.028875 45 00 20 00 00 3d * Cmd  5 00200000
 0.029007 3f a0 ff ff 00 ff ? Rsp 63 A0FFFF00
 0.029228 43 00 00 00 00 21 * Cmd  3 00000000
 0.029353 03 00 01 00 00 eb * Rsp  3 00010000
 0.029690 47 00 01 00 00 dd * Cmd  7 00010000
 0.029815 07 00 00 1e 00 a1 * Rsp  7 00001E00

To explain the format; firstly the time in seconds, then 6 command/response bytes, ‘*’ to indicate CRC is correct, or ‘?’ if incorrect, then a partial decode.

A few points to note:

The first 4 commands produce no responses.
There are jumps in the timestamp, presumably caused by intentional delays.
CMD5 is described in the SDIO specification as enabling I/O mode, but the response we get has an incorrect CRC.
The CMD3 – CMD7 sequence is described in the SD specification; it is the way that a host selects one card from multiple card slots; CMD3 gets a Relative Card Address (RCA), then command 7 selects the card using that address.

It’d be nice to understand why two of the commands have failed CRCs, but the Linux driver ignores that error, so I will as well. The above sequence is implemented in my code as follows:

int rca;
sdio_cmd52(SD_FUNC_BUS, 0x06, 0, SD_RD, 0, 0);   
usdelay(20000);
sdio_cmd52(SD_FUNC_BUS, 0x06, 8, SD_WR, 0, 0);   
usdelay(20000);
sdio_cmd(0, 0, 0);
sdio_cmd(8, 0x1aa, 0);
// Enable I/O mode
sdio_cmd(5, 0, 0);
sdio_cmd(5, 0x200000, 0);
// Assert SD device
sdio_cmd(3, 0, &resp);
rca = SWAP16(resp.rsp3.rcax);
sdio_cmd7(rca, 0);

Note the time delays; it is tempting to reduce them, but I have found from bitter experience that this can result in major problems much later on (as a critical setting has been ignored), so I wouldn’t recommend doing that.

Raspberry Pi I/O

Now it is necessary to translate the SDIO commands into hardware I/O cycles. The good news about bare-metal programming is that is isn’t necessary to use a fancy driver, or seek permission from the operating system; we can just control the I/O directly.

The primary source of information is the ‘BCM2835 ARM Peripherals’ document; armed with that and knowledge of the I/O base address (0x20000000 for the Pi ZeroW) we can create suitable low-level functions.

// Addresses
#define REG_BASE    0x20000000      // Pi Zero (0x3F000000 for Pi 3)
#define GPIO_BASE       (REG_BASE + 0x200000)
#define GPIO_MODE0      (uint32_t *)GPIO_BASE
#define GPIO_SET0       (uint32_t *)(GPIO_BASE + 0x1c)
#define GPIO_CLR0       (uint32_t *)(GPIO_BASE + 0x28)
#define GPIO_REG(a)     ((uint32_t *)a)

// Mode values
#define GPIO_IN         0
#define GPIO_OUT        1
#define GPIO_ALT0       4
#define GPIO_ALT1       5
#define GPIO_ALT2       6
#define GPIO_ALT3       7

// Configure pin as input or output
void gpio_mode(int pin, int mode)
{
    uint32_t *reg = GPIO_REG(GPIO_MODE0) + pin / 10;
    int shift = (pin % 10) * 3;

    *reg = (*reg & ~(7 << shift)) | (mode << shift);
}

// Set an O/P pin
void gpio_out(int pin, int val)
{
    uint32_t *reg = (val ? GPIO_REG(GPIO_SET0) : GPIO_REG(GPIO_CLR0)) + pin/32;

    *reg = 1 << (pin % 32);
}

// Get an I/P pin value
uint8_t gpio_in(int pin)
{
    uint32_t *reg = GPIO_REG(GPIO_LEV0) + pin/32;

    return (((*reg) >> (pin % 32)) & 1);
}

Configuring a pin as an input or output is done by setting a 3-bit values. Writing 1 or 0 to a pin is (sadly) done using separate set & clear registers; this does make any speed-optimisations (e.g. direct DMA to I/O ports) significantly harder, so I haven’t tried this yet.

Running under Linux

Although the whole purpose of this project is to run without Linux, after I’d written the above code, I did wonder whether it’d speed up development if I ran it under Linux with the WiFi interface shut down, using mmap() to gain access to the devices at low level.

This experiment failed; I never got reliable communications with the WiFi chip, and the operating system had a tendency to crash after my code was run. This isn’t too surprising, since the whole point of the OS is to control the hardware, and having a user-mode program controlling it as well, is really asking for trouble.

Timer

In addition to I/O cycles, we need a microsecond timing reference, that can be used to provide accurate delays. Fortunately there is a 32-bit register clocked at 1 MHz that is ideal for the purpose.

#define USEC_BASE       (REG_BASE + 0x3000)
#define USEC_REG()      ((uint32_t *)(USEC_BASE+4))

// Delay given number of microseconds
void usdelay(int usec)
{
    int ticks;

    ustimeout(&ticks, 0);
    while (!ustimeout(&ticks, usec)) ;
}

// Return non-zero if timeout
int ustimeout(int *tickp, int usec)
{
    int t = *USEC_REG();

    if (usec == 0 || t - *tickp >= usec)
    {
        *tickp = t;
        return (1);
    }
    return (0);
}

SDIO output

The Raspberry Pi CPU is sufficiently fast that we can easily toggle the clock line at 500 kHz, while shifting the command bits out.

#define SD_CLK_DELAY    1   // Clock on/off time in usec

// Write command to SD interface
void sdio_cmd_write(uint8_t *data, int nbits)
{
   uint8_t b, n;

    gpio_mode(SD_CMD_PIN, GPIO_OUT);
    for (n=0; n<nbits; n++)
    {
        if (n%8 == 0)
            b = *data++;
        gpio_out(SD_CMD_PIN, b & 0x80);
        b <<= 1;
        usdelay(SD_CLK_DELAY);
        gpio_out(SD_CLK_PIN, 1);
        usdelay(SD_CLK_DELAY);
        gpio_out(SD_CLK_PIN, 0);
    }
    gpio_mode(SD_CMD_PIN, GPIO_IN);
}

This code could be made much faster, by eliminating the delays. When writing the data output function I took a more aggressive approach to the timing; the transfers are error-free with a 2 MHz clock (8 Mbit/s of data) and could go faster with some optimisation.

Reception is a lot more tricky, handling command responses, data on CMD53 read-cycles, and acknowledgements on write-cycles. This requires multiple state-machines, triggered by the clock edges and ‘start’ bit detection; see the source code for details.

The 7- and 16-bit CRC calculations for commands & data have already been explained in part 2 of this blog.

[Overview] [Previous part] [Next part]

Copyright (c) Jeremy P Bentham 2020. Please credit this blog if you use the information or software in it.