PicoWi part 5: ARP, IP and ICMP

In part 4, the wireless chip was connected to a WiFi network, so it can now send & receive data on that network, but we still have to encode the data for transmission, and decode it for reception.

We’re using a ‘full MAC’ chip, so all the low-level WiFi interfacing is handled within the chip. When transmitting, it encrypts our data, and adds the necessary 802.11 headers so that it will accepted by the network access point; when receiving, the headers are stripped off and the data is decrypted before being passed over to the Pico CPU.

This doesn’t just make our encoding & decoding tasks easier, it also ensures that the transmissions fully conform to the (exceedingly complex) 802.11 rules; if your interest is in creating non-standard wireless transmissions, then I’m afraid this project will be of no help.

TCP/IP

The suite of protocols used for data transmission over the Internet are generally known as Transmission Control Protocol / Internet Protocol, or TCP/IP. We’ll only be using a small subset of these protocols, and the initial task is just to handle Address Resolution Protocol (ARP) and Internet Control Message Protocol (ICMP). This will allow us to send & receive diagnostic ‘ping’ messages, and do some simple benchmarks by communicating with another system.

TCP/IP uses a three-tier addressing system; at the highest level, there are names with dotted notation, such as iosoft.blog or http://www.google.com. To access the computer at this address, two further steps are required:

  • a Domain Name System (DNS) database lookup is used to convert the name into an Internet Protocol (IP) address, which has 4 numeric values in dotted notation, for example 192.168.5.1
  • an Address Resolution Protocol (ARP) message is sent out on the network, with a request to convert the remote unit’s IP address into a Media Access and Control (MAC) address, which has 6 bytes, that are normally printed with a colon separator, e.g. 28:CD:C1:00:12:34

The first of these will be tackled in the next part of this project; for now, I’m assuming that the unit has obtained an IP address from somewhere, and knows the IP address of another unit it wishes to communicate with, for example the WiFi access point.

Address Resolution Protocol (ARP)

This is probably the simplest of all TCP/IP protocols; the unit broadcasts a request in a specific format, giving the IP address it wants to contact, and if any unit on the same ‘subnet’ has that address, then it will respond with its 6-byte MAC address. That is used for outgoing messages, but for incoming messages our unit must listen out for ARP broadcasts, and if a request matches its IP address, it should respond with the MAC address.

The ARP message format can be encapsulated within a C structure:

typedef unsigned char  BYTE;
typedef unsigned short WORD;
typedef unsigned int   DWORD;
typedef BYTE MACADDR[MACLEN];
typedef unsigned int IPADDR;

/* ***** ARP (Address Resolution Protocol) packet ***** */
typedef struct
{
    WORD hrd,           /* Hardware type */
         pro;           /* Protocol type */
    BYTE  hln,          /* Len of h/ware addr (6) */
          pln;          /* Len of IP addr (4) */
    WORD op;            /* ARP opcode */
    MACADDR  smac;      /* Source MAC addr */
    IPADDR   sip;       /* Source IP addr */
    MACADDR  dmac;      /* Destination MAC addr */
    IPADDR   dip;       /* Destination IP addr */
} ARPKT;

This is the first of many C structures for TCP/IP, and I’ve chosen to define 8, 16 and 32-bit values as BYTE, WORD and DWORD for clarity.

To broadcast this message, we need to add on Ethernet header, giving a source MAC address (the MAC address of our unit, as reported by the WiFi chip) the destination MAC address (broadcast, which is all-ones, i.e. FF:FF:FF:FF:FF:FF) and a protocol ID, which indicates that we’re sending an ARP packet.

/* Ethernet (DIX) header */
typedef struct {
    MACADDR dest;               /* Destination MAC address */
    MACADDR srce;               /* Source MAC address */
    WORD    ptype;              /* Protocol type or length */
} ETHERHDR;
#define PCOL_ARP    0x0806      /* Protocol type: ARP */
#define PCOL_IP     0x0800      /*                IP */

There are a lot of similarities between the higher level of wired (Ethernet) and wireless (802.11) protocols, so it makes sense that both use the same network address structure.

Creating an ARP request is really just a fill-in-the-blanks exercise:

#define HTYPE       0x0001  /* Hardware type: ethernet */
#define ARPPRO      0x0800  /* Protocol type: IP */
#define ARPREQ      0x0001  /* ARP request */
#define ARPRESP     0x0002  /* ARP response */

// Add Ethernet header to buffer, return byte count
WORD ip_add_eth(BYTE *buff, MACADDR dmac, MACADDR smac, WORD pcol)
{
    ETHERHDR *ehp = (ETHERHDR *)buff;

    MAC_CPY(ehp->dest, dmac);
    MAC_CPY(ehp->srce, smac);
    ehp->ptype = htons(pcol);
    return(sizeof(ETHERHDR));
}

// Create an ARP frame, return length
int ip_make_arp(BYTE *buff, MACADDR mac, IPADDR addr, WORD op)
{
    int n = ip_add_eth(buff, op==ARPREQ ? bcast_mac : mac, my_mac, PCOL_ARP);
    ARPKT *arp = (ARPKT *)&buff[n];

    MAC_CPY(arp->smac, my_mac);
    MAC_CPY(arp->dmac, op==ARPREQ ? bcast_mac : mac);
    arp->hrd = htons(HTYPE);
    arp->pro = htons(ARPPRO);
    arp->hln = MACLEN;
    arp->pln = sizeof(DWORD);
    arp->op  = htons(op);
    arp->dip = addr;
    arp->sip = my_ip;
    if (display_mode & DISP_ARP)
        ip_print_arp(arp);
    return(n + sizeof(ARPKT));
}

// Convert byte-order in a 'short' variable
WORD htons(WORD w)
{
    return(w<<8 | w>>8);
}

All network data is in big-endian format (most-significant byte first), but the RP2040 processor is little-endian, so the 16-bit values need to be byte-swapped.

To transmit the message, all that is needed is to add on the SDPCM layer for the WiFi chip, and copy it into an outgoing message buffer:

// Transmit an ARP frame
int ip_tx_arp(MACADDR mac, IPADDR addr, WORD op)
{
    int n = ip_make_arp(txbuff, mac, addr, op);
    
    return(ip_tx_eth(txbuff, n));
 }

// Send transmit data
int ip_tx_eth(BYTE *buff, int len)
{
    return(event_net_tx(buff, len));
}
// Transmit network data
int event_net_tx(void *data, int len)
{
    TX_MSG *txp = &tx_msg;
    int txlen = sizeof(SDPCM_HDR)+2+sizeof(BDC_HDR)+len;
    
    display(DISP_DATA, "Tx_DATA len %d\n", len);
    disp_bytes(DISP_DATA, data, len);
    display(DISP_DATA, "\n");
    txp->sdpcm.len = txlen;
    txp->sdpcm.notlen = ~txp->sdpcm.len;
    txp->sdpcm.seq = sd_tx_seq++;
    memcpy(txp->data, data, len);
    if (!wifi_reg_val_wait(10, SD_FUNC_BUS, SPI_STATUS_REG, 
            SPI_STATUS_F2_RX_READY, SPI_STATUS_F2_RX_READY, 4))
        return(0);
    return(wifi_data_write(SD_FUNC_RAD, 0, (uint8_t *)txp, (txlen+3)&~3));
}

The transmit data length is rounded up to the nearest 4 bytes, as the WiFi DMA controller only works handles complete 4-byte words.

ARP reception

An incoming message will arrive as an ‘event’ from the WiFi chip, and a handler function first checks that it is valid:

// Handler for incoming ARP frame
int arp_event_handler(EVENT_INFO *eip)
{
    ETHERHDR *ehp=(ETHERHDR *)eip->data;

    if (eip->chan == SDPCM_CHAN_DATA &&
        eip->dlen >= sizeof(ETHERHDR)+sizeof(ARPKT) &&
        htons(ehp->ptype) == PCOL_ARP &&
        (MAC_IS_BCAST(ehp->dest) ||
         MAC_CMP(ehp->dest, my_mac)))
    {
        return(ip_rx_arp(eip->data, eip->dlen));
    }
    return(0);
}

If the incoming message is an ARP request, then the receiver function transmits an appropriate response. If it is a response, the resulting MAC address is saved, for use in future transmissions:

// Receive incoming ARP data
int ip_rx_arp(BYTE *data, int dlen)
{
    ETHERHDR *ehp=(ETHERHDR *)data;
    ARPKT *arp = (ARPKT *)&data[sizeof(ETHERHDR)];
    WORD op = htons(arp->op);

    if (arp->dip == my_ip)
    {
        if (op == ARPREQ)
            ip_tx_arp(ehp->srce, arp->sip, ARPRESP);
        else if (op == ARPRESP)
            ip_save_arp(arp->smac, arp->sip);
        return(1);
    }
    return(0);
}

Ping

Ping request & response format

Having obtained the 6-byte MAC address of a unit we wish to communicate with, what can we send to it? The obvious choice is a diagnostic ‘ping’, that echoes back the data we send, and measures the round-trip time.

Ping uses the Internet Control Message Protocol (ICMP), with an IP header for the address information:

/* ***** ICMP (Internet Control Message Protocol) header ***** */
typedef struct
{
    BYTE  type,         /* Message type */
          code;         /* Message code */
    WORD  check,        /* Checksum */
          ident,        /* Identifier */
          seq;          /* Sequence number */
} ICMPHDR;
#define ICREQ           8   /* Message type: echo request */
#define ICREP           0   /*               echo reply */

/* ***** IP (Internet Protocol) header ***** */
typedef struct
{
    BYTE   vhl,         /* Version and header len */
           service;     /* Quality of IP service */
    WORD   len,         /* Total len of IP datagram */
           ident,       /* Identification value */
           frags;       /* Flags & fragment offset */
    BYTE   ttl,         /* Time to live */
           pcol;        /* Protocol used in data area */
    WORD   check;       /* Header checksum */
    IPADDR sip,         /* IP source addr */
           dip;         /* IP dest addr */
} IPHDR;
#define PICMP   1           /* Protocol type: ICMP */
#define PTCP    6           /*                TCP */
#define PUDP   17           /*                UDP */

Creating an ICMP request largely consists of filling in the values within these structures, and adding some arbitrary data on the end, but there are some issues to bear in mind:

  • As with ARP, all values are big-endian (most significant byte first) so byte-swaps are needed
  • Potentially the IP message (known as a ‘datagram’) may travel very long distances, with a large number of ‘hops’ between computers, and each of these hops will have a maximum data size it can accommodate, which is known as a Maximum Transmission Unit (MTU). To allow a large datagram to be sent across a link with a smaller MTU, there is a technique called ‘IP fragmentation’, whereby the transmission is chopped up into smaller parts, and the parts are reassembled at the receiving end. For simplicity, we won’t initially support fragmentation, which means we have an MTU of around 1.5K bytes.
  • There is a checksum across the IP header and ICMP data, and this is calculated using a method that performs identically on big-endian and little-endian processors.
/* Calculate TCP-style checksum, add to old value */
WORD add_csum(WORD sum, void *dp, int count)
{
    WORD n=count>>1, *p=(WORD *)dp, last=sum;

    while (n--)
    {
        sum += *p++;
        if (sum < last)
            sum++;
        last = sum;
    }
    if (count & 1)
        sum += *p & 0x00ff;
    if (sum < last)
        sum++;
    return(sum);
}

Ping reception

If the unit has received a unicast ICMP request, then it should return a response to the sender that basically copies everything in the request, but with the source & destination addresses swapped, and the message type changed from request to reply. Theoretically, the ICMP checksum needs to be re-computed, but as it is just a sum of 16-bit words, it isn’t affected by the address swap. So we can just re-use the existing checksum, adjusted for the change from the request value of 8 to the response value of 0:

// Receive incoming ICMP data
int ip_rx_icmp(BYTE *data, int dlen)
{
    ETHERHDR *ehp=(ETHERHDR *)data;
    IPHDR *ip = (IPHDR *)&data[sizeof(ETHERHDR)];
    ICMPHDR *icmp = (ICMPHDR *)&data[sizeof(ETHERHDR)+sizeof(IPHDR)];
    int n;

    if (display_mode & DISP_ICMP)
        ip_print_icmp(ip);
    if (icmp->type == ICREQ)
    {
        ip_add_eth(data, ehp->srce, my_mac, PCOL_IP);
        ip->dip = ip->sip;
        ip->sip = my_ip;
        icmp->check = add_csum(icmp->check, &icmp->type, 1);
        icmp->type = ICREP;
        n = htons(ip->len);
        return(ip_tx_eth(data, sizeof(ETHERHDR)+n+sizeof(ICMPHDR)));
    }
    else if (icmp->type == ICREP)
    {
        ping_rx_time = ustime();
    }
    return(0);
}

Example program: ping.c

This program generates pins every 2 seconds, and responds to incoming ping requests. It uses a hard-coded IP addresses, for itself and the target of the outgoing pings:

IPADDR myip   = IPADDR_VAL(192,168,1,165);
IPADDR hostip = IPADDR_VAL(192,168,1,1);

‘myip’ should be set to a suitable unused IP address on your subnet (e.g. 182.168.1 in the above example); you can check if an address is unused by pinging it.

‘hostip’ should be set to the address of another unit on the network that can accept pings, or the address of the WiFi Access Point.

You’ll also need to set the name & password for the WiFi network you are using:

// The hard-coded password is for test purposes only!!!
#define SSID                "testnet"
#define PASSWD              "testpass"

See the PicoWi introduction for a description of the build process, and the connection of a serial console to see the diagnostic messages.

The LED will flash rapidly for a few seconds until the device is connected to the network; it will then switch on when a ping is sent, and off when it is received; on my network, the ping time is generally quite short, so only a brief flash is visible if everything is working correctly.

Ping times

On an Ethernet network, it is usual to see fast & repeatable values for the ping round-trip time. However wireless networks aren’t as predictable, since all units that are on the same radio channels will be competing for air-time, not just with your network, but any other networks within range.

So the response time will vary, depending on the activity of any networks sharing the same WiFi channels; here is a a typical example of 20 pings, using the time reported on the Pico console:

Round-trip time for PicoWi ping

A ping time of 1.2 milliseconds is quite respectable, considering that a Pi 4 on the same network takes a minimum of 1.9 ms.

The graph was plotted using GNUplot; if you want to replicate it, the console output is captured to pings.txt, then pre-processed using awk:

awk -F [=\ ] '/time/ { print $(NF-1) }' pings.txt > pings.csv

This script should also work with the console output of Linux pings. The result is then fed to GNUplot; the command-line has been split into 4 for clarity:

gnuplot -e "set term png size 420,240 font 'sans,8'; \
  set title 'Ping time'; set grid; set key noautotitle; \
  set ylabel 'Time (ms)' offset 2; set output 'pings.png'; \
  plot 'pings.csv' with boxes"

Links to the parts of this project:
Introduction
Part 1: low-level interface
Part 2: initialisation
Part 3: IOCTLs and events
Part 4: scan and join a network
Part 5: ARP, IP and ICMP
Source code

I’ll be releasing updates with more TCP/IP functionality.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 4: scan and join a network

By the end of part 3, the WiFi chip was up & running, and as a simple test of WiFi operation, we’ll next scan the neighbourhood for WiFi networks, then attempt to join a network.

Scanning a network

As a quick check of wireless functionality, it can be useful to scan for WiFi networks within range. Before starting that, we need to send some IOCTL commands to configure various parameters, such as the network band.

The main problem with IOCTL calls is their sheer variety, that might require data in a specific format, or maybe no data at all. I haven’t been able to find a document that describes them, the only publicly-available documentation seems to be the source code . So when developing, it is quite possible to use the wrong IOCTL command, or send it the wrong data, and we need a way of reporting the error, without adding a lot of print function calls.

All my IOCTL functions return 0 if there wasn’t a reply, and -1 if the response indicated an error, so we can just chain commands using the short-circuit AND functionality to ensure execution will stop when an error occurs, and print the last IOCTL command that was executed:

#define IOCTL_WAIT  30 // Time to wait for ioctl response (msec)

const EVT_STR escan_evts[] = {EVT(WLC_E_ESCAN_RESULT), EVT(WLC_E_SET_SSID), EVT(-1)};

// Start a network scan
int scan_start(void)
{
    int ret;
    
    events_enable(escan_evts);
    ret = ioctl_wr_int32(WLC_SET_SCAN_CHANNEL_TIME, IOCTL_WAIT, SCAN_CHAN_TIME) > 0 &&
        ioctl_set_uint32("pm2_sleep_ret", IOCTL_WAIT, 0xc8) > 0 &&
        ioctl_set_uint32("bcn_li_bcn", IOCTL_WAIT, 1) > 0 &&
        ioctl_set_uint32("bcn_li_dtim", IOCTL_WAIT, 1) > 0 &&
        ioctl_set_uint32("assoc_listen", IOCTL_WAIT, 0x0a) > 0 &&
        ioctl_wr_int32(WLC_SET_BAND, IOCTL_WAIT, WIFI_BAND_ANY) > 0 &&
        ioctl_wr_int32(WLC_UP, IOCTL_WAIT, 0) > 0;
    ioctl_err_display(ret);
    return(ret);
}

// Display last IOCTL if error
void ioctl_err_display(int retval)
{
    IOCTL_MSG *msgp = &ioctl_txmsg;
    IOCTL_HDR *iohp = (IOCTL_HDR *)&msgp->data[msgp->cmd.sdpcm.hdrlen];
    char *cmds = iohp->cmd==WLC_GET_VAR ? "GET" : 
                 iohp->cmd==WLC_SET_VAR ? "SET" : "";
    char *data, *name;
    
    if (retval <= 0)
    {
        data = (char *)&msgp->data[msgp->cmd.sdpcm.hdrlen+sizeof(IOCTL_HDR)];
        name = iohp->cmd==WLC_GET_VAR || iohp->cmd==WLC_SET_VAR ? data : "";
        printf("IOCTL error: cmd %lu %s %s\n", iohp->cmd, cmds, name);
    }
}

We can check the code functioning by forcing an error, e.g. temporarily reducing the timeout value for a command such as ‘bcn_li_dtim’ to zero, in which case the code reports the following which, although somewhat terse, does indicate the source of the problem:

IOCTL error: cmd 263 SET bcn_li_dtim

To start a scan, we need one more IOCTL, with an data structure that sets some more parameters:

#define SSID_MAXLEN         32
#define SCANTYPE_ACTIVE     0
#define SCANTYPE_PASSIVE    1

#pragma pack(1)
typedef struct {
    uint32_t version;
    uint16_t action,
             sync_id;
    uint32_t ssidlen;
    uint8_t  ssid[SSID_MAXLEN],
             bssid[6],
             bss_type,
             scan_type;
    uint32_t nprobes,
             active_time,
             passive_time,
             home_time;
    uint16_t nchans,
             nssids;
    uint8_t  chans[14][2],
             ssids[1][SSID_MAXLEN];
} SCAN_PARAMS;

SCAN_PARAMS scan_params = {
    .version=1, .action=1, .sync_id=0x1, .ssidlen=0, .ssid={0},
    .bssid={0xff,0xff,0xff,0xff,0xff,0xff}, .bss_type=2,
    .scan_type=SCANTYPE_PASSIVE, .nprobes=~0, .active_time=~0,
    .passive_time=~0, .home_time=~0, .nchans=0
};

ioctl_set_data("escan", IOCTL_WAIT, &scan_params, sizeof(scan_params));

After that command is sent, we should receive several responses in the form of events; at least one from each WiFi network in range. The scan event handler has to byte-swap any 16 or 32-bit values, since they are in ‘network’ byte-order (big-endian); the handler function was described in the previous part of this blog.

It isn’t unusual for the same network to be reported more than once, e.g.

8C:59:73:xx:xx:xx 'Post_Office' chan 3
E8:65:D4:xx:xx:xx 'Court Hotel' chan 1
8C:59:73:xx:xx:xx 'Post_Office' chan 3
00:11:22:xx:xx:xx 'Virginia' chan 6
20:B0:01:xx:xx:xx 'vodafone' chan 6
6A:A2:22:xx:xx:xx '[hidden]' chan 6
..and so on..

In the tests I have done, the total time from power-up to receiving the last scan entry is around 2.1 seconds, which is surprisingly fast, considering how much chip-initialisation has been required.

Joining a network

This requires a large number of IOCTL commands to set up the WiFi interface, and there is little point in my listing all of them here, so I’m concentrating on specific settings of interest.

  • Country: this is required in order to set domain-specific parameters. I’m taking the easy way out, and specifying a country code of ‘XX’, which is a common set of world-wide characteristics.
  • Multicast: there is one MAC address set to 01:00:5E:00:00:FB which is the standard for IP v4
  • Power saving: this is disabled by default, but can be compiled in if required, though it does significantly increase WiFi response times, as the device will sleep when idle, and takes some time to wake up & respond.
  • Authentication: this uses a WPA2 pre-shared key, stored in plaintext, which is a major weakness in network security.
  • Network name: the SSID is also stored as plaintext.

Once the network join has been initiated, we receive a stream of events to show progress. These can be viewed by calling set_display_mode with DISP_EVENT. A typical joining sequence might be:

Join secure network:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  ..then Rx_DATA for broadcast/multicast network traffic..

Automatic reassociation after joining a network:
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 14
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
 ..then Rx DATA flow continues..

Join open network (no security):
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  ..then Rx_DATA for broadcast/multicast network traffic..

SSID not found:
  Rx_EVT   0 SET_SSID,      flags 0, status 3, reason 0

Password incorrect:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 8, reason 15
  Rx_EVT  46 PSK_SUP,       flags 0, status 8, reason 14
  ..then the same sequence repeated..

The ‘status’ values are common to all the events:

  • 0: success
  • 3: no networks
  • 6: unsolicited
  • 8: partial

The ‘reason’ values are specific to an event, for example in PSK_SUP, 14 means that a de-authentication request has been received, and 15 indicates that a timeout of the pre-shared key handshake has occurred.

Also there is no guarantee that the events will arrive in this order; for example, when I tested on a different Access Point, the last 3 events were PSK_SUP, JOIN, and SET_SSID.

I have also tested the responses to network events:

Orderly shutdown of WiFi at the access point:
  Rx_EVT  12 DISASSOC_IND,  flags 0, status 0, reason 8
  Rx_EVT   3 AUTH,          flags 0, status 5, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT  16 LINK,          flags 0, status 0, reason 2

Restore WiFi after orderly shutdown:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  ..then the data flow resumes..

Power-down of the access point:
  Rx_EVT  16 LINK,          flags 0, status 0, reason 1

Restore power to the access point:
  Rx_EVT  16 LINK,          flags 0, status 0, reason 1
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  ..then the data flow resumes..

Network unavailable on startup:
  Rx_EVT   0 SET_SSID,      flags 0, status 3, reason 0

Network becomes available after startup:
  Nothing!

Try to join a secure network, using no security 
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0

So the good news is that the WiFi chip can automatically reconnect to the network under some circumstances, but the bad news is that it will not always reconnect, and I can find no single event showing if the device is connected or not. Rather than attempting to decode the events in detail, I’ve used an overall timeout for joining a network (default 10 seconds); if that fails there is a rest period (currently also 10 seconds) before the next re-connection attempt.

Example programs

There are two examples; see the introduction for details of how to re-build and run the code.

scan.c does a single scan, and returns a list of networks found. The result returned by the WiFi chip is displayed as-is, so may contain duplicates.

join.c joins a given network, reporting on progress; the network name and password must be entered in the source code:

#define SSID                "testnet"
#define PASSWD              "testpass"

The on-board LED flashes at 5 Hz prior to connection, and at 1 Hz when connected.

In the next part I’ll start using TCP/IP protocols.

Links to the parts of this project:
Introduction
Part 1: low-level interface
Part 2: initialisation
Part 3: IOCTLs and events
Part 4: scan and join a network
Part 5: ARP, IP and ICMP
Source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 3: IOCTLs and events

Part 2 described how the CYW43439 WiFi chip is initialised, but used an IOCTL call and an event check without explaining what these are, or how they work, so now is the time to rectify that deficiency.

An IOCTL (Input/Output Control) call is sent by the Pico host CPU (RP2040) to the ARM CPU in the WiFi chip, to read or write configuration data, or send a specific command. An event is an unsolicited block of data sent from the WiFi CPU to the host; it can be a notification that an action is complete, or some data that has arrived over the WiFi network.

IOCTLs

A simple example of an IOCTL is a request for the 6-byte WiFi MAC address.

uint8_t mac[6];
ioctl_get_data("cur_etheraddr", 10, mac, 6);

This sends the IOCTL command GET_VAR, with a string to identify the item of interest, and a timeout in milliseconds.

#define WLC_GET_VAR 262

// Get data block from IOCTL variable
int ioctl_get_data(char *name, int wait_msec, uint8_t *data, int dlen)
{
    return(ioctl_cmd(WLC_GET_VAR, name, strlen(name)+1, wait_msec, false, data, dlen));
}

The request must be packed into a structure, for transmission the the WiFi CPU; this has 2 headers, the first is an ‘SDIO/SPI Bus Layer’ (SDPCM) header, followed by an IOCTL header:

// SDPCM header
typedef struct {
    uint16_t len,       // sdpcm_header.frametag
             notlen;
    uint8_t  seq,       // sdpcm_sw_header
             chan,
             nextlen,
             hdrlen,
             flow,
             credit,
             reserved[2];
} SDPCM_HDR;

// IOCTL header
typedef struct {
    uint32_t cmd;       // cdc_header
    uint16_t outlen,
             inlen;
    uint32_t flags,
             status;
} IOCTL_HDR;

// IOCTL command with SDPCM and IOCTL headers
typedef struct
{
    SDPCM_HDR sdpcm;
    IOCTL_HDR ioctl;
    uint8_t data[IOCTL_MAX_BLKLEN];
} IOCTL_CMD;

The first two 16-bit words of the SDPCM header contain the data length, and its bitwise inverse, then the most important fields are:

  • Chan: a number identifying which ‘channel’ is associated with the data: IOCTL channel is 0, event is 1, and data is 2.
  • Hdrlen: the length of the SDPCM header plus any padding. My code doesn’t use any padding, but the response from the WiFi chip often has a lot of padding.
  • Flow & Credit: used to track the WiFi buffer utilisation

This is followed by the IOCTL header, with a command number (262 for GET_VAR) and a data length value.

The whole message plus data is written to the SPI interface:

// Do an IOCTL transaction, get response
// Return 0 if timeout, -1 if error response
int ioctl_cmd(int cmd, char *name, int namelen, int wait_msec, int wr, void *data, int dlen)
{
    IOCTL_CMD *cmdp = &ioctl_txmsg.cmd;
    int txdlen = ((namelen + dlen + 3) / 4) * 4, ret = 0;
    int hdrlen = sizeof(SDPCM_HDR) + sizeof(IOCTL_HDR);
    int txlen = hdrlen + txdlen;

    memset(cmdp, 0, sizeof(ioctl_txmsg));
    cmdp->sdpcm.notlen = ~(cmdp->sdpcm.len = txlen);
    cmdp->sdpcm.seq = sd_tx_seq++;
    cmdp->sdpcm.chan = SDPCM_CHAN_CTRL;
    cmdp->sdpcm.hdrlen = sizeof(SDPCM_HDR);
    cmdp->ioctl.cmd = cmd;
    cmdp->ioctl.outlen = txdlen;
    cmdp->ioctl.flags = ((uint32_t)ioctl_reqid++ << 16) | (wr ? 2 : 0);
    if (namelen)
        memcpy(cmdp->data, name, namelen);
    if (wr && dlen>0)
        memcpy(&cmdp->data[namelen], data, dlen);
    wifi_data_write(SD_FUNC_RAD, 0, (void *)cmdp, txlen);
    ..continued below..

The code now waits for a response, but it is important to note that the first response it receives may be associated with a completely different request, or network data. So it is essential to check that the response matches the command, and if not, keep on checking for a matching response.

    ..continued from above..
    while (wait_msec>=0 && !(ret=ioctl_resp_match(cmd, data, dlen)))
    {
        wait_msec -= IOCTL_POLL_MSEC;        
        usdelay(IOCTL_POLL_MSEC * 1000);
    }
    return(ret);
}

// Read an ioctl response, match the given command, any command if 0
// Return 0 if no response, -1 if error response
int ioctl_resp_match(int cmd, void *data, int dlen)
{
    int rxlen=0, n=0, hdrlen;
    IOCTL_MSG *rsp = &ioctl_rxmsg;
    IOCTL_HDR *iohp;
    
    if ((rxlen = event_read(rsp, 0, 0)) > 0)
    {
        iohp = (IOCTL_HDR *)&rsp->data[rsp->cmd.sdpcm.hdrlen]; 
        hdrlen = rsp->cmd.sdpcm.hdrlen + sizeof(IOCTL_HDR);
        if (rsp->rsp.chan==SDPCM_CHAN_CTRL && 
            (cmd==0 || cmd==iohp->cmd))
        {
            n = MIN(dlen, rxlen-hdrlen);
            if (data && n>0)
                memcpy(data, &rsp->data[hdrlen], n);
            if (cmd)
            {
                if (iohp->status)
                    n = -1;
            }
        }
    }
    return(cmd==0 ? rxlen : n>0 ? n : 0);
}

You’ll note that the response has been obtained using the ‘event_read’ function, which handles all incoming data (solicited or unsolicited) from the WiFi interface; it will be described in detail below.

The IOCTL response has a similar format to the request, except that it generally has a lot of padding after the SDPCM header. This means that (unlike the transmit message) the receiver has to decode the SDPCM header ‘hdrlen’ value, in order to know how much padding has been added in front of the IOCTL header.

In addition to the IOCTL GET_VAR call that reads the value of a variable, given its name as a string, and its partner SET_VAR that writes a new value to that variable, there nearly 300 other IOCTL calls, such as SET_ANTDIV (command 64) which controls the antenna diversity, or UP (command 2) which is used to activate the WiFi interface.

Events

The WiFi chip signals an event when it has something to report to the host processor, for example it has succeeded in joining a WiFi network, or it has just received a data packet from that network.

As discussed above, there is a time-delay associated with any IOCTL command, so the IOCTL response might arrive within a stream of other events. So my code treats any incoming message as a potential event, and establishes its purpose by decoding the SDPCM header.

This raises the question of how the host CPU knows that there is an incoming event; the answer is that it can poll the BUS_SPI_STATUS_REG, to see if the ‘function 2 packet available’ flag is set. Alternatively, to avoid excessive polling cycles, the host can just check the IRQ line (described in part 1) and if that is high, there is an event pending. I use a combined approach; check the IRQ line, but is there hasn’t been any event for 10 milliseconds, check the status register:

#define SPI_STATUS_LEN_SHIFT            9
#define SPI_STATUS_LEN_MASK             0x7ff

// Get ioctl response, async event, or network data.
int event_get_resp(void *data, int maxlen)
{
    uint32_t val=0;
    int rxlen=0;
    
    val = wifi_reg_read(SD_FUNC_BUS, SPI_STATUS_REG, 4);
    if ((val != ~0) && (val & SPI_STATUS_PKT_AVAIL))
    {
        rxlen = (val >> SPI_STATUS_LEN_SHIFT) & SPI_STATUS_LEN_MASK;
        rxlen = MIN(rxlen, maxlen);
        // Read event data if present
        if (data && rxlen>0)
            wifi_data_read(SD_FUNC_RAD, 0, data, rxlen);
        // ..or clear interrupt, and discard data
        else
        {
            val = wifi_reg_read(SD_FUNC_BUS, SPI_INTERRUPT_REG, 2);
            wifi_reg_write(SD_FUNC_BUS, SPI_INTERRUPT_REG, val, 2);
            wifi_reg_write(SD_FUNC_BAK, SPI_FRAME_CONTROL, 0x01, 1);
        }
    }
    return(rxlen);
}

The status register has a flag to indicate data is available on function 2 (the radio interface), and also a length value, indicating how many bytes there are to read. Once that has been read in, the SDPCM header is checked, and the data after that header is copied into a buffer.

// Get ioctl response, async event, or network data
// Optionally copy data after SDPCM & BDC headers into a buffer, return its length
int event_read(IOCTL_MSG *rsp, void *data, int dlen)
{
    int rxlen=0, n=0, hdrlen;
    SDPCM_HDR *sdp=&rsp->cmd.sdpcm;
    BDC_HDR *bdcp;
    
    if ((rxlen = event_get_resp(rsp, sizeof(IOCTL_MSG))) >= sizeof(SDPCM_HDR)+sizeof(BDC_HDR))
    {
        if ((sdp->len ^ sdp->notlen) == 0xffff)
        {
            hdrlen = sdp->hdrlen;
            bdcp = (BDC_HDR *)&rsp->data[hdrlen];
            hdrlen += sizeof(BDC_HDR) + bdcp->offset*4;
            n = MIN(dlen, rxlen-hdrlen);
            if (data && n>0)
                memcpy(data, &rsp->data[hdrlen], n);
        }
    }
    return(dlen>0 ? (n>0 ? n : 0) : rxlen);
}

At the top of these function calls is the polling function, which stores the SDPCM values in a local structure (EVENT_INFO), and takes appropriate action with the data. The reason why a local structure is used is that the event header is in ‘network’ byte-order, which is big-endian (most-significant byte first), so the data is byte-swapped before being stored locally.

Since there may be multiple event handlers, and the the the polling function can’t know which one is the correct destination for the event, it calls each one in turn, stopping when one returns a non-zero value, indicating that it has accepted the event.

// Poll for async event, put results in info structure
int event_poll(void)
{
    EVENT_INFO *eip = &event_info;
    IOCTL_MSG *iomp = &ioctl_rxmsg;
    ESCAN_RESULT *erp=(ESCAN_RESULT *)rxdata;
    EVENT_HDR *ehp = &erp->eventh;
    int n = event_read(iomp, rxdata, sizeof(rxdata));
    
    if (n > 0)
    {
        eip->chan = iomp->rsp.sdpcm.chan;
        eip->flags = SWAP16(ehp->flags);
        eip->event_type = SWAP32(ehp->event_type);
        eip->status = SWAP32(ehp->status);
        eip->reason = SWAP32(ehp->reason);
        eip->data = rxdata;
        eip->dlen = n;
        if (eip->chan == SDPCM_CHAN_CTRL)
            display(DISP_EVENT, "\n");
        else if ((eip->chan==SDPCM_CHAN_EVT || eip->chan==SDPCM_CHAN_DATA) &&
            n >= sizeof(ETHER_HDR)+sizeof(BCMETH_HDR)+sizeof(EVENT_HDR))
            ok = event_handle(eip);
    }
    return(ok);
}

Handling an event

The code calls handler functions in turn, until one returns a non-zero value, indicating it has accepted the event.

#define MAX_HANDLERS    10
typedef int (*event_handler_t)(EVENT_INFO *eip);
event_handler_t event_handlers[MAX_HANDLERS];
int num_handlers;

// Run event handlers, until one returns non-zero
int event_handle(EVENT_INFO *eip)
{
    int ret=0;
    
    for (int i=0; i<num_handlers && !ret; i++)
        ret = event_handlers[i](eip);
    return(ret);
}

An event handler is called with a pointer to the EVENT_INFO structure, which basically contains a copy of the SDPCM header information (in the correct byte-order) and a pointer to the data after that header. The function must return zero if it hasn’t recognised the event. As an example, here is a simple handler that displays the result of a network scan:

// Handler for scan events
int scan_event_handler(EVENT_INFO *eip)
{
    ESCAN_RESULT *erp=(ESCAN_RESULT *)eip->data;
    int ret = eip->chan==SDPCM_CHAN_EVT && eip->event_type==WLC_E_ESCAN_RESULT;
    
    if (ret)
    {
        if (erp->eventh.status == 0)
        {
            printf("Scan complete\n");
            ret = -1;
        }
        else
        {
            printf("%s '", mac_addr_str(erp->info.bssid));
            disp_ssid(&erp->info.ssid_len);
            printf("' chan %d\n", SWAP16(erp->info.channel));
        }
    }
    return(ret);
}

Note that the ESCAN_RESULT data is in ‘network’ byte-order, so needs to be byte-swapped before being displayed.

This handler has to be added to the array of handlers using a function call:

add_event_handler(scan_event_handler);

This allows you to implement your own event handlers, in addition to, or instead of, the functions I have provided.

Enabling events

There are over 140 possible events, and by default they are disabled; we need to enable those we are interested in, such as network authentication & joining, so we can detect any problems.

The enabling process uses a (very large) bitfield, each bit indicating whether an event is enabled or disabled; the resulting byte array is sent to the WiFi CPU using an IOCTL call.

#define EVENT_MAX           208
#define SET_EVENT(msk, e)   msk[4 + e/8] |= 1 << (e & 7)

uint8_t event_mask[EVENT_MAX / 8];

// Enable events
int events_enable(const EVT_STR *evtp)
{
    memset(event_mask, 0, sizeof(event_mask));
    while (evtp->num >= 0)
    {
        if (evtp->num / 8 < sizeof(event_mask))
            SET_EVENT(event_mask, evtp->num);
        evtp++;
    }
    return(ioctl_set_data("bsscfg:event_msgs", 10, event_mask, sizeof(event_mask)));
}

I have used an unusual method to specify the events that are to be enabled; a macro is used to store the event number, and a string corresponding to the event name. This means that I can display event names (instead of numbers) on a diagnostic console, which is very useful to show any problems.

// Storage for event number, and string for diagnostics
typedef struct {
    int num;
    char *str;
} EVT_STR;
#define EVT(e)      {e, #e}

const EVT_STR join_evts[]={EVT(WLC_E_JOIN), EVT(WLC_E_ASSOC), EVT(WLC_E_REASSOC), 
    EVT(WLC_E_ASSOC_REQ_IE), EVT(WLC_E_ASSOC_RESP_IE), EVT(WLC_E_SET_SSID),
    EVT(WLC_E_LINK), EVT(WLC_E_AUTH), EVT(WLC_E_PSK_SUP),  EVT(WLC_E_EAPOL_MSG),
    EVT(WLC_E_DISASSOC_IND), EVT(WLC_E_DISASSOC_IND), EVT(-1)};

In the next part of this project we’ll be scanning and joining a network.

Links to the parts of this project:
Introduction
Part 1: low-level interface
Part 2: initialisation
Part 3: IOCTLs and events
Part 4: scan and join a network
Part 5: ARP, IP and ICMP
Source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 2: initialisation

PicoWi initialisation steps

In part 1, I described the low-level hardware & software interface to the Broadcom / Cypress / Infineon CYW43439 WiFi chip, and its close relative, the CYW4343W.

Now we need to initialise the chip, so it is ready to receive network commands and data. This involves sending three files to the chip, and unfortunately there is no simple Application Programming Interface (API) to do this; it is necessary to send a detailed sequence of commands, and if there are any errors, you usually end up with a completely unresponsive WiFi chip.

The first step is to check the Active Low Power (ALP) clock; this involves setting a register, then waiting for the WiFi chip to acknowledge that setting.

bool wifi_init(void)
{
    wifi_reg_write(SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, SD_ALP_REQ, 1);
    if (!wifi_reg_val_wait(10, SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, 
                           SD_ALP_AVAIL, SD_ALP_AVAIL, 1))
        return(false);

This wait-and-loop scenario is quite common, so I’ve created a specific function for it:

// Check register value every msec, until correct or timeout
bool wifi_reg_val_wait(int ms, int func, int addr, uint32_t mask, uint32_t val, int nbytes)
{
    bool ok;
    
    while (!(ok=wifi_reg_read_check(func, addr, mask, val, nbytes)) && ms--)
        usdelay(1000);
    return(ok);
}

// Read & check a masked value, return zero if incorrect
bool wifi_reg_read_check(int func, int addr, uint32_t mask, uint32_t val, int nbytes)
{
    return((wifi_reg_read(func, addr, nbytes) & mask) == val);
}

A value is obtained from the register, masked with an AND-function, then compared with the required value. If the comparison is false, the code delays for one millisecond, then tries again, until the given time (in milliseconds) has expired.

This raises the question as to what the code should do when it encounters an error such as this; should it try to re-send the command? In practice, the timeout generally means that the internal state of the chip is incorrect; for example, there may have been a bug in the code, or a power glitch, and the only way to correct this situation is to re-power the chip, and start again – fortunately the initialisation process is quite fast (it only takes a few seconds) so this isn’t a major problem.

Assuming the ALP check passes, there are some more register write cycles that I can’t explain in detail, as I don’t have access to any information about the chip that isn’t publicly available.

We then make 2 writes to registers in banked memory, that do deserve more explanation.

#define BAK_BASE_ADDR           0x18000000
#define SRAM_BASE_ADDR          (BAK_BASE_ADDR+0x4000)
#define SRAM_BANKX_IDX_REG      (SRAM_BASE_ADDR+0x10)
#define SRAM_BANKX_PDA_REG      (SRAM_BASE_ADDR+0x44)

wifi_bak_reg_write(SRAM_BANKX_IDX_REG, 0x03, 4);
wifi_bak_reg_write(SRAM_BANKX_PDA_REG, 0x00, 4);

The backplane function can only access a 32K block in the WiFi RAM, so the two addresses we’re writing (18004010 and 18004044 hex) are outside its access range, and we have to use bank-switching. There is a simple check to see if the bank has changed since the last access, in which case no switching is needed:

#define SB_32BIT_WIN    0x8000
#define SB_ADDR_MASK    0x7fff
#define SB_WIN_MASK     (~SB_ADDR_MASK)

// Set backplane window if address has changed
void wifi_bak_window(uint32_t addr)
{
    static uint32_t lastaddr=0;

    addr &= SB_WIN_MASK;
    if (addr != lastaddr)
        wifi_reg_write(SD_FUNC_BAK, BAK_WIN_ADDR_REG, addr>>8, 3);
    lastaddr = addr;
}

// Write a 1 - 4 byte value via the backplane window
int wifi_bak_reg_write(uint32_t addr, uint32_t val, int nbytes)
{
    wifi_bak_window(addr);
    return(wifi_reg_write(SD_FUNC_BAK, addr, val, nbytes));
}

We can now load the binary ARM firmware into the WiFi processor; it is in a file that is unique to the specific Wifi chip, so different files are needed for the CYW43439 and CYW4343w; these are the only differences in the way the chips are programmed.

const unsigned char fw_firmware_data[] = {
0x00,0x00,0x00,0x00,0x65,0x14,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,0x91,0x13,0x00,0x00,
..and so on..
}
const unsigned int fw_firmware_len = sizeof(fw_firmware_data);

wifi_data_load(SD_FUNC_BAK, 0, fw_firmware_data, fw_firmware_len);

As previously mentioned, the backplane can only access a 32K block, and each SPI access to the backplane is limited to 64 bytes, so the data loading function walks though the memory in 64-byte blocks, and when 32K is reached, the access window is moved up, and the loading resumes at address 0.

#define MAX_BLOCKLEN    64

// Load data block into WiFi chip (CPU firmware or NVRAM file)
int wifi_data_load(int func, uint32_t dest, const unsigned char *data, int len)
{
    int nbytes=0, n;
    uint32_t oset=0;
    
    wifi_bak_window(dest);
    dest &= SB_ADDR_MASK;
    while (nbytes < len)
    {
        if (oset >= SB_32BIT_WIN)
        {
            wifi_bak_window(dest+nbytes);
            oset -= SB_32BIT_WIN;
        }
        n = MIN(MAX_BLOCKLEN, len-nbytes);
        wifi_data_write(func, dest+oset, (uint8_t *)&data[nbytes], n);
        nbytes += n;
        oset += n;
    }
    return(nbytes);
}

After a delay to allow the WiFi chip to settle, the next item to be loaded is the non-volatile RAM (NVRAM) data. This is in the form of a C character array, with each entry being null-terminated, e.g.

const unsigned char fw_nvram_data[0x300] = {
    "manfid=0x2d0"  "\x00"
    "prodid=0x0727" "\x00"
    "vendid=0x14e4" "\x00"
    ..and so on..
}
const unsigned int fw_nvram_len = sizeof(fw_nvram_data);

Loading this file uses the same function as the firmware, with a different base, and a write-cycle to confirm the file length:

#define NVRAM_BASE_ADDR     0x7FCFC

wifi_data_load(SD_FUNC_BAK, NVRAM_BASE_ADDR, fw_nvram_data, fw_nvram_len);
n = ((~(fw_nvram_len / 4) & 0xffff) << 16) | (fw_nvram_len / 4);
wifi_reg_write(SD_FUNC_BAK, SB_32BIT_WIN | (SB_32BIT_WIN-4), n, 4);

Now it is necessary to reset the WiFi processor core, wait for the indication that the High Throughput (HT) clock is available, then wait for an ‘event’ that signals the device is ready. The most common fault I experienced when developing the code was that it gets stuck at this point, waiting for a confirmation that never comes.

// Reset, and wait for High Throughput (HT) clock ready
wifi_core_reset(false);
if (!wifi_reg_val_wait(50, SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, 
                              SD_HT_AVAIL, SD_HT_AVAIL, 1))
    return(false);
// Wait for backplane ready
if (!wifi_rx_event_wait(100, SPI_STATUS_F2_RX_READY))
    return(false);

Events are the main way that the WiFi chip sends signals or data asynchronously to the RP2040; for a detailed description of how they work, see the next part.

Once the system has signalled it is ready, the Country Locale Matrix (CLM) file has to be loaded. This binary file limits the WiFi parameters (e.g. transmit power level) to be within the regulatory constraints for the specific RF hardware and locale.

const unsigned char fw_clm_data[] = {
	0x42,0x4C,0x4F,0x42,0x3C,0x00,0x00,0x00,0xFA,0x69,0xE0,0xBB,
	0x01,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    ..and so on..
};
const unsigned int fw_clm_len = sizeof(fw_clm_data);

wifi_clm_load(fw_clm_data, fw_clm_len);

The loading function uses a specific structure to send IOCTL data blocks to the WiFi chip, with flags to mark the beginning & end of the sequence:

#define MAX_LOAD_LEN        512

typedef struct {
	uint16_t flag;
	uint16_t type;
	uint32_t len;
	uint32_t crc;
} CLM_LOAD_HDR;

typedef struct {
    char req[8];
    CLM_LOAD_HDR hdr;
} CLM_LOAD_REQ;

// Load CLM
int wifi_clm_load(const unsigned char *data, int len)
{
    int nbytes=0, oset=0, n;
    CLM_LOAD_REQ clr = {.req="clmload", .hdr={.type=2, .crc=0}};
    
    while (nbytes < len)
    {
        n = MIN(MAX_LOAD_LEN, len-nbytes);
        clr.hdr.flag = 1<<12 | (nbytes?0:2) | (nbytes+n>=len?4:0);
        clr.hdr.len = n;
        ioctl_set_data2((void *)&clr, sizeof(clr), 1000, (void *)&data[oset], n);
        nbytes += n;
        oset += n;
    }
    return(nbytes);
}

Example program

To show all this code in action, we can run our first complete program;

// PicoWi blinking LED test

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include "picowi_pico.h"
#include "picowi_spi.h"
#include "picowi_init.h"

int main() 
{
    uint32_t led_ticks;
    bool ledon=false;
    
    io_init();
    printf("PicoWi LED blink\n");
    set_display_mode(DISP_INFO);
    if (!wifi_setup())
        printf("Error: SPI communication\n");
    else if (!wifi_init())
        printf("Error: can't initialise WiFi\n");
    else
    {
        ustimeout(&led_ticks, 0);
        while (1)
        {
            if (ustimeout(&led_ticks, 500000))
                wifi_set_led(ledon = !ledon);
        }
    }
}

This sets up the WiFi chip as described above, prints the 6-byte MAC address, then just loops, flashing the LED that is attached to the Wifi chip at 1 Hz.

The display_mode function controls how much diagnostic information you might want to see, using a bitfield so you can combine multiple options:

// Display mask values
#define DISP_NOTHING    0       // No display
#define DISP_INFO       0x01    // General information
#define DISP_SPI        0x02    // SPI transfers
#define DISP_REG        0x04    // Register read/write
#define DISP_SDPCM      0x08    // SDPCM transfers
#define DISP_IOCTL      0x10    // IOCTL read/write
#define DISP_EVENT      0x20    // Event reception
#define DISP_DATA       0x40    // Data transfers

This can potentially provide a lot of diagnostic information, the main limitation being the speed of the console display – I use a serial link at 460800 baud, since the default of 9600 is much too slow. To see some of the internal workings of PicoWi, try:

set_display_mode(DISP_INFO|DISP_REG|DISP_IOCTL);

You can call this function multiple times with different mode values, to concentrate the diagnostic information on a specific area of interest, and avoid displaying a lot of unwanted information while the WiFi chip is being initialised.

The other unusual feature is the use of the ustimeout function, which I’ve used it in place of the more conventional delay function call, as I don’t want the delay to block all other CPU activity. In a simple program this isn’t an issue, but in later examples I want to do other things (such as checking for events) while waiting for the LED to blink, so can’t use a simple delay.

The ustimeout function takes two arguments; a pointer to a variable, and a timeout value in microseconds (zero if immediate). When the specified time has elapsed, the function returns a non-zero value and reloads the variable with the current time. So you can add extra function calls to the main loop, without affecting the LED blinking.

The code to control the LED uses a single Device I/O Control (IOCTL) call with 2 arguments; the first is a bit-mask, and the second is the value:

#define SD_LED_GPIO     0

// Set WiFi LED on or off
void wifi_set_led(bool on)
{
    ioctl_set_intx2("gpioout", 10, 1<<SD_LED_GPIO, on ? 1<<SD_LED_GPIO : 0);
}

For details on how to build & run this example program, see the introduction.

IOCTL calls are the primary mechanism for high-level communication with the WiFi chip; see the next part for a detailed description.

Links to the parts of this project:
Introduction
Part 1: low-level interface
Part 2: initialisation
Part 3: IOCTLs and events
Part 4: scan and join a network
Part 5: ARP, IP and ICMP
Source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 1: low-level interface

Pi Pico W wireless architecture

The WiFi interface on the Pico W uses the Broadcom/Cypress/Infineon CYW43439; this is a ‘full’ Media Access and Control (MAC) chip, so in theory you can just tell it to join a network, or send a block of data, and it’ll handle all the low-level operations.

However, in practice there is a lot more complication than that, and it takes a very large number of carefully-timed commands before the chip will start up, let alone do anything useful. This is because it actually contains two processors (ARM M3 and D11), each with their own memory and I/O, and they both have to be programmed before any network operations can start.

The CYW43439 is part of a large family of 43xxx wireless interfaces; most use PCI, USB or SDIO interface for communications with a host processor, but in this case communications is via a Serial Peripheral Interface (SPI) that is half-duplex, i.e. a single wire is used to carry commands & data to the WiFi chip, and also the responses from that chip, as described in the device datasheet.

SPI interface

Pi Pico-W interface to CYW43439

This excerpt from the Pico-W circuit diagram shows the interface between the CYW43439 WiFi chip and the RP2040 CPU. The connections on the WiFi chip are labelled as if there were an SDIO interface, since they are dual-function:

SDIO functionSPI functionRP2040 pin
WL_REG_ONPower up (ON)GP23
SDIO_CLKClock (CLK)GP29
SDIO_CMDData in (MOSI)GP24
SDIO_DATA0Data out (MISO)GP24 via 470R
SDIO_DATA1Interrupt (IRQ)GP24 via 10K
SDIO_DATA2Mode select (SEL)GP24
SDIO_DATA3Chip select (CS)GP25

On chips with dual interfaces, the state of DATA2 at power-up determines which interface is to be used; for SPI, this pin must be held low, before REG_ON is set high to power up the chip.

A single data line is shared between CMD for commands & data going to the WiFi chip, and DATA0 for the returned responses. Just in case there is a clash of I/O (e.g. both the CPU and WiFi chip transmitting at the same time) there is a 470 ohm protection resistor in series with DATA0.

The chip-select line is as usual for SPI interfaces; when it is high, the WiFi interface is disconnected from the data lines. This allows the over-worked data line to be used for a third purpose, namely interrupt request (IRQ) to the RP2040 CPU; when the interface is idle, IRQ is normally low, but goes high when the WiFi chip has some data to send (e.g. a new data packet has been received). To ensure that the IRQ line doesn’t interfere with communications, it is connected via a 10K resistor.

Debug with CYW4343W

When developing this software, there was a major problem; the Pico-W components and PCB tracks are so fine that I couldn’t attach an oscilloscope or logic analyser to the SPI connections. This makes debugging the low-level drivers very difficult, especially when programming the PIO peripheral in the RP2040.

The solution I adopted was to add a second CYW43xxx interface to the Pico; unfortunately I couldn’t find a convenient CYW43439 module, but Avnet sell an FPGA add-on board (part number AES-PMOD-MUR-1DX-G) with a Murata 1DX module containing a CYW4343W. This is sufficiently similar to the 43439 that no code modifications are required, just a different firmware file, as described in part 2 of this post.

Circuitry to add Murata 1DX (CYW4343W) module to Pi Pico

I’ve had to tweak the resistor values slightly; this is because D0 – D4 pins on the module have 10K pullup resistors, so R2 has to be lower to compensate.

The choice of GPIO pins is completely arbitrary, since the code can work with any pin taking any function; I chose D0 – D3 as GP16 – GP19, since it will allow me to experiment with an SDIO interface at a future date. If you are only interested in emulating the standard Pico-W interface, then the connections to GP16 – GP18 can be omitted, since the SPI select line (D2) has a pull-down resistor.

The resulting circuitry fits very neatly onto a Pico prototyping board; by keeping the connections short, it works fine with SPI speeds up to 16 MHz. The only problems I encountered in constructing the hardware were that the PMOD connector has an unusual pin-numbering, and is mis-labelled as Bluetooth.

Pi Pico with Murata 1DX (CYW4343W) add-on module

Using this setup, it is easy to capture the SPI waveforms; here is an oscilloscope trace using a (relatively leisurely) 2 MHz clock for clarity.

SPI transfer waveforms

This shows a 4-byte command on the MOSI line, followed by a 4-byte MISO response, which also appears on the MOSI line, due to the 470 ohm resistor linking the two.

SPI software

Normally we’d just use the RP2040 built-in SPI controller to access the WiFi chip, but that has specific sets of pins it can use, which differ from those that are connected to the chip. This isn’t a major problem, as we can use the built-in Programmble I/O (PIO) to do the transfers, but initially I’d just like to check that the hardware works, before diving into PIO programming. So the first step is to write a ‘bit-bashed’ driver, that uses direct access to the I/O bits.

The write-cycle is quite conventional, where a bit (most-significant bit first) is put onto the data line, and the clock is toggled high then low:

// Write data to SPI interface
void spi_write(uint8_t *data, int nbits)
{
    uint8_t b=0;
    int n=0;

    io_mode(SD_CMD_PIN, IO_OUT);
    usdelay(SD_CLK_DELAY);
    b = *data++;
    while (n < nbits)
    {
        IO_WR(SD_CMD_PIN, b & 0x80);
        IO_WR(SD_CLK_PIN, 1);
        b <<= 1;
        if ((++n & 7) == 0)
            b = *data++;
        IO_WR(SD_CLK_PIN, 0);
    }
    usdelay(SD_CLK_DELAY);
    io_mode(SD_CMD_PIN, IO_IN);
    usdelay(SD_CLK_DELAY);
}

If the command is a data-read, the data is read immediately, then the clock is toggled high and low. This is unusual, and can cause confusion in some protocol decoders:

// Read data from SPI interface
void spi_read(uint8_t *data, int nbits)
{
    uint8_t b;
    int n=0;

    data--;
    while (n < nbits)
    {
        b = IO_RD(SD_DIN_PIN);
        IO_WR(SD_CLK_PIN, 1);
        if ((n++ & 7) == 0)
            *++data = 0;        
        *data = (*data << 1) | b;
        IO_WR(SD_CLK_PIN, 0);
    }
 }

A write-cycle to the WiFi chip involves creating a command message as described in the data sheet, setting chip-select (CS) low, and transferring that message, followed by the data.

#define SWAP16_2(x) ((((x)&0xff000000)>>8) | (((x)&0xff0000)<<8) | \
                    (((x)&0xff00)>>8)      | (((x)&0xff)<<8))
#define SD_FUNC_BUS         0
#define SD_FUNC_BAK         1
#define SD_FUNC_RAD         2
#define SD_FUNC_SWAP        4
#define SD_FUNC_BUS_SWAP    (SD_FUNC_BUS | SD_FUNC_SWAP)
#define SD_FUNC_MASK        (SD_FUNC_SWAP - 1)

typedef struct
{
    uint32_t len:11, addr:17, func:2, incr:1, wr:1;
} SPI_MSG_HDR;

// SPI message
typedef union
{
    SPI_MSG_HDR hdr;
    uint32_t vals[2];
    uint8_t bytes[2048];
} SPI_MSG;

// Write a data block using SPI
int wifi_data_write(int func, int addr, uint8_t *dp, int nbytes)
{
    SPI_MSG msg={.hdr = {.wr=1, .incr=1, .func=func&SD_FUNC_MASK,
                         .addr=addr, .len=nbytes}};
    if (func & SD_FUNC_SWAP)
        msg.vals[0] = SWAP16_2(msg.vals[0]);
    io_out(SD_CS_PIN, 0);
    usdelay(SD_CLK_DELAY);
    if (nbytes <= 4)
    {
        memcpy(&msg.bytes[4], dp, nbytes);
        spi_write((uint8_t *)&msg, 64);
    }
    else
    {
        spi_write((uint8_t *)&msg, 32);
        usdelay(SD_CLK_DELAY);
        spi_write(dp, nbytes*8);
    }
    usdelay(SD_CLK_DELAY);
    io_out(SD_CS_PIN, 1);
    usdelay(SD_CLK_DELAY);
    return(nbytes);
}

The command header contains:

  • Length: byte-count of data
  • Address: location to receive the data
  • Function number: destination for the transfer
  • Increment: flag to enable address auto-increment
  • Write: flag to indicate a write-cycle

The unusual item is the function number, that selects which peripheral within the WiFi chip will receive the data; a value of 0 selects the SPI interface, 1 the backplane, and 2 the radio. Functions 1 & 2 are limited to a maximum size of 64 bytes, and are generally used for device configuration, whilst function 3 is used for transferring network data, which can be up to 2048 bytes. I’ve also created a dummy function 4, which is used to signal that a word-swap is required, when the chip is uninitialised.

The read cycle has a similar structure:

// Read data block using SPI
int wifi_data_read(int func, int addr, uint8_t *dp, int nbytes)
{
    SPI_MSG msg={.hdr = {.wr=0, .incr=1, .func=func&SD_FUNC_MASK,
                         .addr=addr, .len=nbytes}};
    uint8_t data[4];

    if (func & SD_FUNC_SWAP)
        msg.vals[0] = SWAP16_2(msg.vals[0]);
    else if (func == SD_FUNC_BAK)
        msg.hdr.len += 4;
    io_out(SD_CS_PIN, 0);
    usdelay(SD_CLK_DELAY);
    spi_write((uint8_t *)&msg, 32);
    io_mode(SD_CMD_PIN, IO_IN);
    usdelay(SD_CLK_DELAY);
    if (func == SD_FUNC_BAK)
        spi_read(data, 32);
    usdelay(SD_CLK_DELAY);
    spi_read(dp, nbytes*8);
    usdelay(SD_CLK_DELAY);
    io_mode(SD_CMD_PIN, IO_OUT);
    io_out(SD_CS_PIN, 1);
    return(nbytes);
}

When making a ‘backplane’ read, the first 4 return bytes are discarded; they are padding to give the remote peripheral time to respond.

Now that we have the necessary read/write functions, we can perform a simple check to see if the WiFi chip is responding. The data sheet describes several ‘gSPI registers’ and the ‘test read-only’ register at address 0x14 has the defined constant 0xFEEDBEAD. The first attempt to read this register generally fails, but subsequent reads should return the desired value:

#define SPI_TEST_VALUE 0xfeedbead
bool ok=0;
for (int i=0; i<4 && !ok; i++)
{
    usdelay(2000);
    val = wifi_reg_read(SD_FUNC_BUS_SWAP, 0x14, 4);
    ok = (val == SPI_TEST_VALUE);
}
if (!ok)
    printf("Error: SPI test pattern %08lX\n", val);

Next we need to configure the SPI interface to our preferences, using register 0, as described in the datasheet. The main change is to eliminate the awkward byte-swapping, but to do that, we need to send a byte-swapped command:

// Write a register using SPI
int spi_reg_write(int func, uint32_t addr, uint32_t val, int nbytes)
{
    if (func&SD_FUNC_SWAP && nbytes>1)
        val = SWAP16_2(val);
    return(wifi_data_write(func, addr, (uint8_t *)&val, nbytes));
}

wifi_reg_write(SD_FUNC_BUS_SWAP, SPI_BUS_CONTROL_REG, 0x204b3, 4);

Now we can re-read the test register without the awkward byte-swapping:

wifi_reg_read(SD_FUNC_BUS, 0x14, 4);

Another parameter we’ve set is ‘high-speed mode’, which means that reading & writing occur on the rising clock edge.

Using RP2040 PIO

To maximise the speed of SPI transfers, we need to use a peripheral within the RP2040 CPU. Normally this would be an SPI controller, but this can not control the pins that are connected to the WiFi chip, so we have to use the Programmable I/O (PIO) peripheral instead.

There are plenty of online tutorials explaining how PIO works; it is basically a small state-machine that is programmed in assembly-language. It operates in a highly deterministic fashion, at a rate of up to 125M instructions per second, so is ideally suited to handling the SPI interface.

I wanted to use the PIO as a direct replacement for the bit-bashed spi_read and spi_write functions described above, so the PIO program is:

; Pico PIO program for half-duplex SPI transfers
.program picowi_pio
.side_set 1
.wrap_target
.origin 0
public stall:               ; Stall here when transfer complete
    pull            side 0  ; Get byte to transmit from FIFO
loop1:
    nop             side 0  ; Idle with clock low
    in pins, 1      side 0  ; Fetch next Rx bit
    out pins, 1     side 0  ; Set next Tx bit 
    nop             side 1  ; Idle high
    jmp !osre loop1 side 1  ; Loop if data in shift reg
    push            side 0  ; Save Rx byte in FIFO
.wrap
; EOF

The origin statement ensures the program is loaded at address zero, rather than the default, which is to load it at the top of program memory.

The ‘pull’ instruction fetches an 8-bit value from the transmit first-in first-out (FIFO) buffer, then there is a loop to output & input the individual bits of that byte until the transmit shift register is empty, and the receive register is full, so the latter can be pushed onto the receive FIFO.

So for SPI write, the associated C code just needs to keep the 4-entry transmit FIFO topped up with the outgoing data, and discard the incoming data, so the receive FIFO doesn’t overflow.

static PIO my_pio = pio0;
uint my_sm = pio_claim_unused_sm(my_pio, true);
io_rw_8 *my_txfifo = (io_rw_8 *)&my_pio->txf[0];

// Write data block to SPI interface,
// When complete, set data pin as I/P (so it is available for IRQ)
void pio_spi_write(unsigned char *data, int len)
{
    config_output(1);
    pio_sm_clear_fifos(my_pio, my_sm);
    while (len)
    {
        if (!pio_sm_is_tx_fifo_full(my_pio, my_sm))
        {
            *my_txfifo = *data++;
            len --;
        }
        if (!pio_sm_is_rx_fifo_empty(my_pio, my_sm))
            pio_sm_get(my_pio, my_sm);
    }
    while (!pio_sm_is_tx_fifo_empty(my_pio, my_sm) || !pio_complete())
    {
        while (!pio_sm_is_rx_fifo_empty(my_pio, my_sm))
            pio_sm_get(my_pio, my_sm);
    }
    pio_sm_get(my_pio, my_sm);
    config_output(0);
}

The SPI read code fills the transmit FIFO with null bytes, and fetches the incoming data from the receive FIFO:

// Read data block from SPI interface
void pio_spi_read(unsigned char *data, int rxlen)
{
    int txlen=rxlen;
    pio_sm_clear_fifos(my_pio, my_sm);
    while (rxlen > 0 || !pio_complete())
    {
        if (txlen>0 && !pio_sm_is_tx_fifo_full(my_pio, my_sm))
        {
            *my_txfifo = 0;
            txlen--;
        }
        if (!pio_sm_is_rx_fifo_empty(my_pio, my_sm))
        {
            *data++ = pio_sm_get(my_pio, my_sm);
            rxlen--;
        }
    }
}

Since the reading of a WiFi register involves an SPI write cycle closely followed by a read cycle, it is important that the write cycle is complete before the read cycle starts. This issue proved to be the biggest problem with the PIO code; it is easy to detect when the transmit FIFO is empty, but the code must carry on waiting until the last bit of the last byte has been shifted out. This means that I have to use an explicitly-coded loop in the assembly language, with a check of shift-register-empty, rather than using the auto-load capability of the input & output instructions.

The other tricky issue was how the assembly-language program should signal to the C program that the output-shift is complete. In theory, I can use an IRQ flag to do this signalling, but in practice I could not make that work reliably – the technique would only work at specific clock frequencies, which suggested that there might be a critical race between the two sets of code. The problem with timing-sensitive code is that a small unrelated change to the main program (e.g. addition of an interrupt) can cause the code to fail in a manner that is very difficult to diagnose, so it is essential that the code works reliably over a wide range of SPI frequencies.

The solution I adopted is encapsulated in the pio_complete function:

// Check to see if PIO transfer complete (stalled at FIFO pull)
static inline int pio_complete(void)
{
    return(my_pio->sm[my_sm].addr == picowi_pio_offset_stall);
}

This compares the current PIO execution address with the ‘stall’ label in the PIO code; if the transmit FIFO is empty, and this comparison is true, then the PIO is stalled waiting for more data, having shifted out everything it was given.

In a single-threaded program, it won’t be too difficult to keep the transmit FIFO topped up, and the receive FIFO emptied, so there is no risk of an overflow or underflow causing problems. However, this is more difficult when the program is multi-tasking, so it will be necessary to add Direct Memory Access (DMA) transfers to the current code.

In the next part we’ll initialise the WiFi chip.

Links to the parts of this project:
Introduction
Part 1: low-level interface
Part 2: initialisation
Part 3: IOCTLs and events
Part 4: scan and join a network
Part 5: ARP, IP and ICMP
Source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi: standalone WiFi driver for the Pi Pico W

Introduction

Raspberry Pi Pico W

The aim of this project is to provide a fast WiFi driver for the CYW43439 chip on the Pi Pico W module, with C code running on the RP2040 processor; it can also be used with similar Broadcom / Cypress / Infineon chips that have an SPI interface, such as the CYW4343W.

It is based on my Zerowi project which performs a similar function on the Pi Zero device CYW43438, which has an SDIO interface. However, due to myriad difficulties getting the code running, the code has been restructured and simplified to emphasise the various stages in setting up the chip, and to provide copious run-time diagnostics.

The structured approach of the WiFi drivers is mirrored in the example programs, and the individual parts of this blog; they range from a simple LED-flash program, to one that provides some TCP/IP functionality.

A major problem with debugging Pico-W code is the difficulty attaching any hardware diagnostic tools, such as an oscilloscope or logic analyser; this has been addressed by supporting an add-on board with a Murata 1DX module and CYW4343W chip; full details are given in part 1 of this project.

Development environment

For simplicity, I use a Raspberry Pi 4 to build the code and program the Pico, with two I/O lines connected to the Pico SWD interface. This is really easy to set up, using a single script that installs the SDK and all the necessary software tools on the Pi 4:

wget https://raw.githubusercontent.com/raspberrypi/pico-setup/master/pico_setup.sh
chmod +x pico_setup.sh
./pico_setup.sh

For SWD programming, the Pico must be connected to the I/O pins on the Pi as follows:

Pico SWCLK   Pi pin 22 (GPIO 25)
Pico GND     Pi pin 20
Pico SWDIO   Pi pin 18 (GPIO 24)

The serial interface is used extensively for displaying diagnostic information; pin 1 of the Pico is the serial output, and pin 3 is ground. I use a 3.3 volt FTDI USB-serial adaptor to display the serial output on a PC, but the Pi 4 serial input could be used instead.

The PicoWi source code is on github:

cd ~
git clone http://github.com/jbentham/picowi
cd ~/picowi/build
chmod +x prog
chmod +x reset

I’ve included the necessary CMakeLists.txt, so the project can be built using the following commands:

cd ~/picowi/build
cmake ..      # create the makefiles
make picowi   # make the picowi library
make blink    # make the 'blink' application
./prog blink  # program the RP2040, and run the application

When building the current pico-SDK, there are some ‘parameter passing’ warnings when the pio assembler is compiled; these can be ignored.

Compilation is reasonably fast on the Pi 4; once the SDK libraries have been built, you can do a complete re-build of the PicoWi library and application within 10 seconds, and reprogram the RP2040 in under 3 seconds.

The ‘reset’ command is useful when you just want to restart the Pico, without loading any new code.

If OpenOCD reports ‘read incorrect DLIPDR’ then there is a problem with the wiring. I’ve set the SWD speed to 2 MHz, which should work error-free, providing the wires are sufficiently short (e.g. under 6 inches or 150 mm) and there are good power & ground connections between the Pi & Pico. I use a short USB cable to power the Pico from the Pi, and this is generally problem-free, though sometimes the Pi won’t boot with the Pico connected; this appears to be a USB communication problem.

Compile-time settings

There is currently only one setting in the CMakeLists.txt file, to choose between the on-board CYW43439 device, or an external CYW4343W module:

# Set to 0 for Pico-W CYW43439, 1 for Murata 1DX (CYW4343W)
set (CHIP_4343W 0)

The Pico-specific settings are in picowi_pico.h:

#define USE_GPIO_REGS   0       // Set non-zero for direct register access
                                // (boosts SPI from 2 to 5.4 MHz read, 7.8 MHz write)
#define SD_CLK_DELAY    0       // Clock on/off delay time in usec
#define USE_PIO         1       // Set non-zero to use Pico PIO for SPI
#define PIO_SPI_FREQ    8000000 // SPI frequency if using PIO

These affect the way the SPI interface is driven; the default is to use the Pico PIO (programmable I/O) with the given clock frequency; 8 MHz is a conservative value, I have run it at 12MHz, and higher speeds should be possible with some tweaking of the I/O settings.

Setting USE_PIO to zero will enable a ‘bit-bashed’ (or ‘bit-banged’) driver; this can run over 7 MHz if using direct register writes, or 2 MHz if using normal function calls.

You’ll note that I haven’t included a driver for the SPI peripheral inside the RP2040; this would have been easier to use than the PIO peripheral, but the on-board CYW43439 chip isn’t connected to suitable I/O pins. The actual pins used are defined in picowi_pico.h:

#define SD_ON_PIN       23
#define SD_CMD_PIN      24
#define SD_DIN_PIN      24
#define SD_D0_PIN       24
#define SD_CS_PIN       25
#define SD_CLK_PIN      29
#define SD_IRQ_PIN      24

You’ll see that pin 24 is performing multiple functions; this hardware configuration is discussed in detail in the next part of this blog. If you are using an external module, the pin definitions can be modified to use any of the RP2040 I/O pins.

Diagnostic settings

My code makes extensive use of a serial console for diagnostic purposes, and I generally use an FTDI USB-serial adaptor connected to pin 1 of the Pico module to monitor this. In view of the large amount of information that can be produced, I modify the serial interface to run at 460800 baud:

Edit pico-sdk/src/rp2_common/hardware_uart/include/hardware/uart.h
Change definition to:
  #define PICO_DEFAULT_UART_BAUD_RATE 460800

I originally tried running the interface at 921600 baud, but this resulted in occasional characters being lost, which is a major problem when trying to understand what is going wrong with the code.

You can use the Pico USB link instead; it must be enabled in the CMakeLists.txt, using the name of the main file, for example to enable it for the ‘ping’ example program:

pico_enable_stdio_usb(ping 1)

Then you can use a terminal program such as minicom on the Pi 4 to view the console:

# Run minicom, press ctrl-A X to exit. 
minicom -D /dev/ttyACM0 -b 460800

The only disadvantage of this approach is that when the Pico is reprogrammed, its CPU is reset, which causes a failure of the USB link. After a few seconds, the link is re-established, but there will be a gap in the console display, which can be misleading.

You can control the extent to which diagnostic data is reported on the console; this is done by inserting function calls, rather than using compile-time definitions, to give fine-grained control. The display options are in a bitfield, so can be individually enabled or disabled, for example:

// Display SPI traffic details
set_display_mode(DISP_INFO|DISP_EVENT|DISP_SDPCM|
                 DISP_REG|DISP_JOIN|DISP_DATA);

// Display nothing
set_display_mode(DISP_NOTHING);

// Display ARP and ICMP data transfers
set_display_mode(DISP_ARP|DISP_IP|DISP_ICMP);

WiFi network

For the time being, the code does not support the Access Point functionality within the WiFi chip. It can only join a network that is unencrypted, or with WPA1 or WPA2 encryption, as set in the file picowi_join.h:

// Security settings: 0 for none, 1 for WPA_TKIP, 2 for WPA2
#define SECURITY            2

The network name (SSID) and password are defined in the ‘main’ file for each application, e.g. join.c or ping.c, which means they are insecure, as they can be seen by anyone with access to the source code or binary executable:

// Insecure password for test purposes only!!!
#define SSID                "testnet"
#define PASSWD              "testpass"

Other resources

The data sheets for the CYW43439 and CYW4343W are well worth a read, as they contain a good description of the low-level SPI interface, but contain nothing on the inner workings of these incredibly complicated chips. The Infineon WICED development environment has very comprehensive coverage of the WiFi chips, though it would take some work to port this code to the RP2040. The Pi Pico SDK contains the full source code to drive the CYW43439, with the lwIP (lightweight IP) open-source TCP/IP stack.

I’m using a different approach, with a completely new low-level driver, and a built-in IP stack to maximise throughput, as described in the following parts:

Introduction
Part 1: low-level interface
Part 2: initialisation
Part 3: IOCTLs and events
Part 4: scan and join a network
Part 5: ARP, IP and ICMP
Source code

I’ll be releasing updates with more TCP/IP functionality.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA part 3: browser display and Python API for remote logic analyser

This is the third part of a 3-part blog post describing a low-cost WiFi-based logic analyser, that can be used for monitoring equipment in remote or hazardous locations. Part 1 described the hardware, part 2 the unit firmware, now this post describes the Web interface that controls the logic analyser units, and displays the captured data, also a Python class that can be used to remote-control the units for data analysis.

In a previous post, I experimented with shader hardware (via WebGL) for quickly displaying the logic analyser traces in a Web page. Whilst this technique can provide really fast display updates, there were some browser compatibility problems, and also a pure-javascript version proved to be fast enough, given that the main constraint is the time taken to transfer the data over the network.

So the current solution just used HTML and Javascript, with no hardware acceleration.

Network topology

REMLA network topology

In part 2, I described how the analyser units return data in response to Web page requests; the status information is in the form of a JSON string, and the sample data is Base64 encoded. So each unit has a built-in Web server, and it is tempting to load the HTML display files onto them. However, I chose not to do that, for the following reasons:

  • The analyser units use microcontrollers with finite resources, and not much spare storage space.
  • Every time the display software is updated, it would have to be loaded onto all the units individually.
  • It is easier to keep a single central server up-to-date with all the necessary security & access control measures.

So I’m assuming that there is a Web server somewhere on the system that serves the display file, and any necessary library files. This is a bit inconvenient for development, so when debugging I run a Web server on my development PC, for example using Python 3:

python -m http.server 8000

This launches a server on port 8000; if the display file is in a subdirectory ‘test’, its URL would look like:

http://127.0.0.1:8000/test/remla.html

There is also a question how the display program knows the addresses of the units, so it can access the right one. I had intended to use Multicast DNS (MDNS) for this purpose, but it proved to be a bit unreliable, so I assigned static IP addresses to the units instead.

Data display

The waveforms are drawn as vectors (as opposed to bitmaps), so the display can be re-sized to suit any size of screen. There are two basic drawing methods that can be used: an HTML canvas, or SVG (Scalable Vector Graphics). After some experimentation, I adopted the former, as it seemed to be a more flexible solution; the canvas is just an area of the screen that responds to simple line- and text-drawing commands, for example to draw & label the display grid:

var ctx1 = document.getElementById("canvas1").getContext("2d");
drawGrid(ctx1);

// Draw grid in display area
function drawGrid(ctx) {
  var w=ctx.canvas.clientWidth, h=ctx.canvas.clientHeight;
  var dw = w/xdivisions, dh=h/ydivisions;
  ctx.fillStyle = grid_bg;
  ctx.fillRect(0, 0, w, h);
  ctx.lineWidth = 1;
  ctx.strokeStyle = grid_fg;
  ctx.strokeRect(0, 1, w-1, h-1);
  ctx.beginPath();
  for (var n=0; n<xdivisions; n++) {
    var x = n*dw;
    ctx.moveTo(x, 0);
    ctx.lineTo(x, h);
    ctx.fillStyle = 'blue';
    if (n)
        drawXLabel(ctx, x, h-5);
    }
    for (var n=0; n<ydivisions; n++) {
      var y = n*dh;
      ctx.moveTo(0, y);
      ctx.lineTo(w, y);
    }
    ctx.stroke();
  }

Drawing the logic traces uses a similar method; begin a path, add line drawing commands to it, then invoke the stroke method.

Controls

The various control buttons and list boxes need to be part of a form, to simplify the process of sending their values to the analyser unit. So they are implemented as pure HTML:

  <form id="captureForm">
    <fieldset><legend>Unit</legend>
      <select name="unit" id="unit" onchange="unitChange()">
        <option value=1>1</option><option value=2>2</option><option value=3>3</option>
        <option value=4>4</option><option value=5>5</option><option value=6>6</option>
      </select>
    </fieldset>
    <fieldset><legend>Capture</legend>
      <button id="load" onclick="doLoad()">Load</button>
      <button id="single" onclick="doSingle()">Single</button>
      <button id="multi" onclick="doMulti()">Multi</button>
      <label for="simulate">Sim</label>
      <input type="checkbox" id="simulate" name="simulate">
    </fieldset>
..and so on..

To update the parameters on the unit, they are gathered from the form, and sent along with an optional command, e.g. cmd=1 to start a capture.

// Get form parameters
function formParams(cmd) {
  var formdata = new FormData(document.getElementById("captureForm"));
  var params = [];
  for (var entry of formdata.entries()) {
    params.push(entry[0]+ '=' + entry[1]);
  }
  if (cmd != null)
    params.push("cmd=" + cmd);
  return params;
}

// Get status from unit, optionally send command
function get_status(cmd=null) {
  http_request = new XMLHttpRequest();
  http_request.addEventListener("load", status_handler);
  http_request.addEventListener("error", status_fail);
  http_request.addEventListener("timeout", status_fail);
  var params = formParams(cmd), statusfile=remote_ip()+'/'+statusname;
  http_request.open( "GET", statusfile + "?" + encodeURI(params.join("&")));
  http_request.timeout = 2000;
  http_request.send();
}

The result of this HTTP request is handled by callbacks, for example if the request fails, there is a retry mechanism:

// Handle failure to fetch status page
function status_fail(e) {
  var evt = e || event;
  evt.preventDefault();
  if (retry_count < RETRIES) {
    addStatus(retry_count ? "." : " RETRYING")
    get_status();
    retry_count++;
  }
  else {
    doStop();
    redraw(ctx1);
  }
}

This mechanism was found to be necessary since very occasionally the remote unit fails to respond, for no apparent reason; if there is a real reason (e.g. it has been powered down) then the transfer is halted after 3 attempts.

If the status information has been returned OK, then a suitable action is taken; if a capture has been triggered, and the status page indicates that the capture is complete, then the data is fetched:

// Decode status response
function status_handler(e) {
  var evt = e || event;
  var remote_status = JSON.parse(evt.target.responseText);
  var state = remote_status.state;
  if (state != last_state) {
    dispStatus(state_strs[state]);
    last_state = state;
  }
  addStatus(".");
  if (state==STATE_IDLE || state==STATE_PRELOAD || state==STATE_PRETRIG || state==STATE_POSTTRIG) {
    repeat_timer = setTimeout(get_status, 500);
  }
  else if (remote_status.state == STATE_READY) {
    loadData();
  }
  else {
    doStop();
  }
}

Fetching data

Fetching the data is similar to fetching the status page, since it is a text file containing base64-encoded bytes. The callback converts the text into bytes, then pairs of bytes into an array of numeric values:

// Read captured data (display is done by callback)
function loadData() {
  dispStatus("Reading from " + remote_ip());
  http_request = new XMLHttpRequest();
  http_request.addEventListener("progress", capfile_progress_handler);
  http_request.addEventListener( "load", capfile_load_handler);
  var params = formParams(), capfile=remote_ip()+'/'+capname;
  http_request.open( "GET", capfile + "?" + encodeURI(params.join("&")));
  http_request.send();
}

// Display data (from callback event)
function capfile_load_handler(event) {
  sampledata = getData(event.target.responseText);
  doZoomReset();
  if (command == CMD_MULTI)
    window.requestAnimationFrame(doStart);
  else
    doStop();
}

// Get data from HTTP response
function getData(resp) {
  var d = resp.replaceAll("\n", "");
  return strbin16(atob(d));
}

// Convert string of 16-bit values to binary array
function strbin16(s) {
  var vals = [];
  for (var n=0; n<s.length;) {
    var v = s.charCodeAt(n++);
    vals.push(v | s.charCodeAt(n++) << 8);
  }
  return vals;
}

It is probable that this process could be streamlined somewhat, but currently the main speed restriction is the transfer of data from the ESP to the PC over the wireless network, so improving the byte-decoder wouldn’t give a noticeable speed improvement.

Saving the data

There needs to be some way of saving the sample data for further analysis; as it happens, the initial users of the system were already using the open-source Sigrok Pulseview utility for capturing data from small USB pods, so it was decided to save the data in the Sigrok file format.

This a basically a zipfile, with 3 components:

  • Metadata, identifying the channels, sample rate, etc.
  • Version, giving the file format version (currently 2)
  • Logic file, containing the binary data

The metadata format is quite easy to replicate, e.g.

[global]
sigrok version=0.5.1

[device 1]
capturefile=logic-1
total probes=16
samplerate=5 MHz
total analog=0
probe1=D1
probe2=D2
probe3=D3
..and so on until..
probe16=D16
unitsize=2

The dummy labels D1, D2 etc. are normally replaced with meaningful descriptions of the signals, followed by the unitsize parameter which gives the byte-width of the data, and marks the end of the labels.

The JSZip library is used to zip the various components together in a single file with the ‘sr’ extension:

function write_srdata(fname) {
  var meta = encodeMeta(), zip = new JSZip();
  var samps = new Uint16Array(sampledata);
  zip.file("metadata", meta);
  zip.file("version", "2");
  zip.file("logic-1-1", samps.buffer);
  zip.generateAsync({type:"blob", compression:"DEFLATE"})
  .then(function(content) {
    writeFile(fname, "application/zip", content);
  });
}

// Encode Sigrok metadata
function encodeMeta() {
  var meta=[], rate=elem("xrate").value + " Hz";
  for (var key in sr_dict) {
    var val = key=="samplerate" ? rate : sr_dict[key];
    meta.push(val[0]=='[' ? ((meta.length ? "\n" : "") + val) : key+'='+val);
  }
  for (var n=0; n<nchans; n++) {
    meta.push("probe"+(n+1) + "=" + (probes.length?probes[n]:n+1));
  }
  meta.push("unitsize=2");
  return meta.join("\n");
}

Configuration

So far, the only way the units can be configured is by using the browser controls, to set the sample rate, number of samples, threshold etc. Whilst this might be acceptable for a portable system, a semi-permanent installation needs some way of storing the configuration, including the naming of input channels on the display. Since there is a central Web server for the display files, can’t this also be used to store configuration files? The answer is ‘yes’, but there is then a question how these files can be modified in a browser-friendly way.

This is a bit difficult, since there are numerous security protections for the files on a server, to make sure they can’t be modified by a Web client. However, there is an extension to the HTTP protocol known as WebDAV (Web Distributed Authoring and Versioning), which does provide a mechanism for writing to files. Basically you need a general-purpose Web server that can be configured to support Web DAV (such as lighttpd, see this page), or alternatively a special-purpose server, such as wsgidav (see this page).

Assuming you already have a working lighttpd server, the additional configuration file may look something like this, with some_path, dav_username and dav_password being customised for your installation:

File lighttpd/conf.d/30-webdav.conf:

server.modules += ( "mod_webdav" )
$HTTP["url"] =~ "^/dav($|/)" {
  webdav.activate = "enable"
  webdav.sqlite-db-name = "/some_path/webdav.db"
  server.document-root = "/www/"
  auth.backend = "plain"
  auth.backend.plain.userfile = "/some_path/webdav.shadow"
  auth.require = ("" => ("method" => "basic", "realm" => "webdav", "require" => "valid-user"))
}

File /some_path/webdav.shadow
  dav_username:dav_password
Create directory www/dav for files

Instead, you can use wsgidav to act as a Web and DAV server, run using the Windows command line:

wsgidav.exe --host 0.0.0.0 --port=8000 -c wsgidav.json

The JSON-format configuration file I’m using is:

{
    "host": "0.0.0.0",
    "port": 8080,
    "verbose": 3,
    "provider_mapping": {
        "/": "/projects/remla/test",
        "/test": "/projects/remla/test",
    },
    "http_authenticator": {
        "domain_controller": null,
        "accept_basic": true,
        "accept_digest": true,
        "default_to_digest": true,
        "trusted_auth_header": null
    },
    "simple_dc": {
        "user_mapping": {
            "*": {
                "dav_username": {
                    "password": "dav_password"
                }
            }
        }
    },
    "dir_browser": {
        "enable": true,
        "response_trailer": "",
        "davmount": true,
        "davmount_links": false,
        "ms_sharepoint_support": true,
        "htdocs_path": null
    }
}

Again, this will need to be customised for your environment, and you also need to be mindful that the configurations I’ve shown for lighttpd and wsgidav are quite insecure, for example the password isn’t encrypted, so it can easily be captured by anyone snooping on network traffic.

Configuration Web page

I created a simple Web page to handle the configuration, with list boxes for most options, and text boxes to allow the input channels to be named.

At the bottom of the page there are buttons to submit the new configuration to the server, and exit back to the waveform display page.

The key Javascript function to save the configuration on the server uses the ‘davclient’ library, and is quite simple, but it does need to know the host IP address and port number to receive the data. This code attempts to fetch that information using the DOM Location object:

// Save the config file
function saveConfig() {
  var fname = CONFIG_UNIT.replace('$', String(unitNum()));
  var ip = location.host.split(':')
  var host = ip[0], port = ip[1];
  port = !port ? 80 : parseInt(port);
  var davclient = new davlib.DavClient();
  davclient.initialize(host, port, 'http', DAVUSER, DAVPASS);
  davclient.PUT(fname, JSON.stringify(getFormData()), saveHandler)
 }

For simplicity, the DAV username and password are stored as plain text in the Javascript, which means that anyone viewing the page source can see what they are. This makes the server completely insecure, and must be improved.

Python interface

Although some data analysis can be done in Javascript, it is much more convenient to use Python and its numerical library numpy. I have written a Python class EdlaUnit that provides an API for remote control and data analysis, and a program edla_sweep that demonstrates this functionality.

It repeatedly captures a data block, whilst stepping up the threshold voltage. Then for each block, the number of transitions for each channel is counted and displayed.

import edla_utils as edla, base64, numpy as np

edla.verbose_mode(False)
unit = edla.EdlaUnit(1, "192.168.8")
unit.set_sample_rate(10000)
unit.set_sample_count(10000)

MIN_V, MAX_V, STEP_V = 0, 50, 5

def get_data():
    ok = False
    data = None
    status = unit.fetch_status()
    if status:
        ok = unit.do_capture()
    else:
        print("Can't fetch status from %s" % unit.status_url)
    if ok:
        data = unit.do_load()
    if data == None:
        print("Can't load data")
    return data

for v in range(MIN_V, MAX_V, STEP_V):
    unit.set_threshold(v)
    d = get_data()
    byts = base64.b64decode(d)
    samps = np.frombuffer(byts, dtype=np.uint16)
    diffs = np.diff(samps)
    edges = np.where(diffs != 0)[0]
    totals = np.zeros(16, dtype=int)
    for edge in edges:
        bits = samps[edge] ^ samps[edge+1]
        for n in range(0, 15):
            if bits & (1<<n):
                totals[n] += 1
    s = "%4u," % v
    s += ",".join([("%4u" % val) for val in totals])
    print(s)

The idea is to give a quick overview of the logic levels the analyser is seeing, to make sure they are within reasonable bounds. An example output is:

Volts Ch1  Ch2  Ch3  Ch4  Ch5  Ch6  Ch7  Ch8
0,      0,   0,   0,   0,   0,   0,   0,   0
5,    564, 384, 620, 454, 548, 550, 572, 552
10,   328, 286, 326, 288, 302, 318, 326, 314
15,   260, 246, 262, 244, 260, 254, 260, 250
20,   216, 192, 216, 198, 202, 202, 208, 206
25,    92,   0, 122,   0,  60,  30, 106,  44
30,     0,   0,   0,   0,   0,   0,   0,   0
35,     0,   0,   0,   0,   0,   0,   0,   0
40,     0,   0,   0,   0,   0,   0,   0,   0
45,     0,   0,   0,   0,   0,   0,   0,   0

The absolute count isn’t necessarily very important, since it will vary depending on the signal that is being monitored. What is interesting is the way it changes as the threshold voltage increases. If the number dramatically increases as the ‘1’ logic voltage is approached, one might suspect that there is a noise problem, causing spurious edges. Conversely, if the value declines rapidly before the ‘1’ voltage is reached, the logic level is probably too low.

There is a tendency to assume that all logic signals are a perfect ‘1’ or ‘0’, with nothing in between; this technique allows you to look beyond that, and check whether your signals really are that perfect – and of course you can use the power of Python and numpy to do other analytical tests, or protocol decoding, specific to the signals being monitored.

Part 1 of this project looked at the hardware, part 2 the ESP32 firmware. The source files are on Github.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA part 2: firmware for remote logic analyser

Remote logic analyser system

This is the second part of a 3-part blog post describing a low-cost WiFi-based logic analyser, that can be used for monitoring equipment in remote or hazardous locations. Part 1 described the hardware, this post now describes the firmware within the logic analyser unit.

Development environment

There are two main development environments for the ESP32 processor; ESP-IDF and Arduino-compatible. The former is much more comprehensive, but a lot of those features aren’t needed, so to save time, I have used the latter.

There are two ways of developing Arduino code; using the original Arduino IDE, or using Microsoft Visual Studio Code (VS Code) with a build system called PlatformIO. I originally tried to support both, but found the Arduino IDE too restrictive, so opted for VS Code and PlatformIO.

Installing this on Windows is remarkably easy, see these posts on PlatformIO installation or PlatformIO development

Then it is just necessary to open a directory containing the project files, and after a suitable pause while the necessary files are downloaded, the source files can be compiled, and the resulting binary downloaded onto the ESP32 module.

Visual Studio Code IDE

The code has two main areas: driving the custom hardware that captures the samples, and the network interface.

Hardware driver

As described in the previous post, the main hardware elements driven by the CPU are:

  • 16-bit data bus for the RAM chips and the comparator outputs
  • Clock & chip select for RAM chips
  • SPI interface for the DAC that sets the threshold

Data bus

The sample memory consists of four 23LC1024 serial RAM chips, each storing 1 Mbit in quad-SPI (4-bit) mode. They are arranged to form a 16-bit data bus; it would be really convenient if this could be assigned to 16 consecutive I/O bits on the CPU, but the ESP32 hardware does not permit this. The assignment is:

Data line 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
GPIO      4  5 12 13 14 15 16 17 18 19 21 22 23 25 26 27

There is an obvious requirement to handle the data bus as a single 16-bit value within the code, so it is necessary to provide functions that convert that 16-bit data into a 32-bit value to be fed to the I/O pins, and vice-versa, and it’d be helpful if this was done in an easy-to-understand manner, to simplify any changes when a new CPU is used that has a different pin assignment.

After having tried the usual mess of shift-and-mask operations, I hit upon the idea of creating a bitfield for each group of consecutive GPIO pins, and a matching bitfield for the same group in the 16-bit word; then it is only necessary to equate each field to its partner, to produce the required conversion.

// Data bus pin definitions
// z-variables are unused pins
typedef struct {
    uint32_t z1:4, d0_1:2, z2:6, d2_9:8, z3:1, d10_12:3, z4:1, d13_15:3;
} BUSPINS;
typedef union {
    uint32_t val;
    BUSPINS pins;
} BUSPINVAL;

// Matching elements in 16-bit word
typedef struct {
    uint32_t d0_1:2, d2_9:8, d10_12:3, d13_15:3;
} BUSWORD;
typedef union {
    uint16_t val;
    BUSWORD bits;
} BUSWORDVAL;

// Return 32-bit bus I/O value, given 16-bit word
inline uint32_t word_busval(uint16_t val) {
    BUSWORDVAL w = { .val = val };
    BUSPINVAL  p = { .pins = { 0, w.bits.d0_1,   0, w.bits.d2_9,
                               0, w.bits.d10_12, 0, w.bits.d13_15 } };
    return (p.val);
}

// Return 16-bit word, given 32-bit bus I/O value
inline uint16_t bus_wordval(uint32_t val) {
    BUSPINVAL  p = { .val = val };
    BUSWORDVAL w = { .bits = { p.pins.d0_1, p.pins.d2_9, 
                               p.pins.d10_12, p.pins.d13_15 } };
    return (w.val);
}

An additional complication is that the 16-bit value is going to 4 RAM chips, and each chip needs to receive the same command, and the bit-pattern of that command changes depending on whether the chip is in SPI or quad-SPI (QSPI, also known as SQI) mode. So the command to send a command to all 4 RAM chips in SPI mode is:

#define RAM_SPI_DOUT    1
#define MSK_SPI_DOUT    (1 << RAM_SPI_DIN)
#define ALL_RAM_WORD(b) ((b) | (b)<<4 | (b)<<8 | (b)<<12)
uint32_t spi_dout_pins = word_busval(ALL_RAM_WORD(MSK_SPI_DOUT));

// Send byte command to all RAMs using SPI
// Toggles SPI clock at around 7 MHz
void bus_send_spi_cmd(byte *cmd, int len) {
    GPIO.out_w1ts = spi_hold_pins;
    while (len--) {
        byte b = *cmd++;
        for (int n = 0; n < 8; n++) {
            if (b & 0x80) GPIO.out_w1ts = spi_dout_pins;
            else GPIO.out_w1tc = spi_dout_pins;
            SET_SCK;
            b <<= 1;
            CLR_SCK;
        }
    }
}

I have used a ‘bit-bashing’ technique (i.e. manually driving the I/O pins high or low) since I’m emulating 4 SPI transfers in parallel, and as you can see from the comment, the end-result is reasonably fast.

When the RAMS are in QSPI mode, instead of doing eight single-bit transfers, we must do two four-bit transfers:

// Send a single command to all RAMs using QSPI
void bus_send_qspi_cmd(byte *cmd, int len) {
    while (len--) {
        uint32_t b1=*cmd>>4, b2=*cmd&15;
        uint32_t val=word_busval(ALL_RAM_WORD(b1));
        gpio_out_bus(val);
        SET_SCK;
        val = word_busval(ALL_RAM_WORD(b2));
        CLR_SCK;
        gpio_out_bus(val);
        SET_SCK;
        cmd++;
        CLR_SCK;
    }
}

The above code assumes that the appropriate I/O pin-directions (input or output) have been set, but that too depends on which mode the RAMs are in; for SPI each RAM chip has 2 data inputs (DIN and HOLD) and 1 output (DOUT), whilst in QSPI mode all 4 RAM data pins are inputs or outputs depending on whether the RAM is being written to, or read from.

There are 4 commands that the software sends to the RAM chips, each is a single byte:

  • 0x38: enter quad-SPI (QSPI) mode
  • 0xff: leave QPSI mode, enter SPI mode
  • 0x02: write data
  • 0x03: read data

The read & write commands are followed by a 3-byte address value, that dictates the starting-point for the transfer. So if the RAMs are already in QSPI mode, the sequence for capturing samples is:

  • Set bus pins as outputs, so bus is controlled by CPU
  • Assert RAM chip select
  • Send command byte, with a value of 2 (write)
  • Send 3 address bytes (all zero when starting data capture)
  • Set bus pins as inputs, so bus is controlled by comparators
  • Start RAM clock
  • When capture is complete, stop RAM clock
  • Negate RAM chip select

The steps for recovering the captured data are:

  • Set bus pins as outputs, so bus is controlled by CPU
  • Assert RAM chip select
  • Send command byte, with a value of 3 (read)
  • Send 3 address bytes
  • Set bus pins as inputs, so bus is controlled by the RAM chips
  • Toggle clock line, and read data from the 16-bit bus
  • When readout is complete, negate RAM chip select

RAM clock and chip select

When the CPU is directly accessing the RAM chips (to send commands, or read back data samples) it is most convenient to ‘bit-bash’ the clock and I/O signals, as described above. It is possible that incoming interrupts can cause temporary pauses in the clock transitions, but this doesn’t matter: the RAM chips use ‘static’ memory, which won’t change its state even if there is a very long pause in a transfer cycle.

However, when capturing data, it is very important that the RAMs receive a steady clock at the required sample rate, with no interruptions. This is easily achieved on the ESP32 by using the LED PWM peripheral:

#define PIN_SCK         33
#define PWM_CHAN        0

// Initialise PWM output
void pwm_init(int pin, int freq) {
    ledcSetup(PWM_CHAN, freq, 1);
    ledcAttachPin(pin, PWM_CHAN);
}

// Start PWM output
void pwm_start(void) {
    ledcWrite(PWM_CHAN, 1);
}
// Stop PWM output
void pwm_stop(void) {
    ledcWrite(PWM_CHAN, 0);
}

In addition, the CPU must count the number of pulses that have been output, so that it knows which memory address is currently being written – there is no way to interrogate the RAM chip to establish its current address value. Surprisingly, the ESP32 doesn’t have a general-purpose 32-bit counter, so we have to use the 16-bit pulse-count peripheral instead, and detect overflows in order to produce a 32-bit value.

volatile uint16_t pcnt_hi_word;

// Handler for PCNT interrupt
void IRAM_ATTR pcnt_handler(void *x) {
    uint32_t intr_status = PCNT.int_st.val;
    if (intr_status) {
        pcnt_hi_word++;
        PCNT.int_clr.val = intr_status;
    }
}

// Initialise PWM pulse counter
void pcnt_init(int pin) {
    pcnt_intr_disable(PCNT_UNIT);
    pcnt_config_t pcfg = { pin, PCNT_PIN_NOT_USED, PCNT_MODE_KEEP, PCNT_MODE_KEEP,
        PCNT_COUNT_INC, PCNT_COUNT_DIS, 0, 0, PCNT_UNIT, PCNT_CHAN };
    pcnt_unit_config(&pcfg);
    pcnt_counter_pause(PCNT_UNIT);
    pcnt_event_enable(PCNT_UNIT, PCNT_EVT_THRES_0);
    pcnt_set_event_value(PCNT_UNIT, PCNT_EVT_THRES_0, 0);
    pcnt_isr_register(pcnt_handler, 0, 0, 0);
    pcnt_intr_enable(PCNT_UNIT);
    pcnt_counter_pause(PCNT_UNIT);
    pcnt_counter_clear(PCNT_UNIT);
    pcnt_counter_resume(PCNT_UNIT);
    pcnt_hi_word = 0;
}

// Return sample counter value (mem addr * 2), extended to 32 bits
uint32_t pcnt_val32(void) {
    uint16_t hi = pcnt_hi_word, lo = PCNT.cnt_unit[PCNT_UNIT].cnt_val;
    if (hi != pcnt_hi_word)
        lo = PCNT.cnt_unit[PCNT_UNIT].cnt_val;
    return(((uint32_t)hi<<16) | lo);
}

When writing this code, I came across some strange features of the PCNT interrupt, such as multiple interrupts for a single event, and misleading values when reading the count value inside the interrupt handler, so be careful when doing any modifications.

The pulse count does not equal the RAM address; is the RAM address multiplied by 2. This is because it takes two 4-bit write cycles to create one byte in RAM (bits 4-7, then 0-3), so the memory chip increments its RAM address once for every 2 samples.

All the RAMs share a single clock line and chip select; the select line is driven low at the start of a command, and must remain low for the duration of the command and data transfer; when it goes high, the transfer is terminated.

Setting threshold value

The comparators compare the incoming signal with a threshold value, to determine if the value is 1 or 0 (above or below threshold). The threshold is derived from a digital-to-analog converter (DAC), the part I’ve chosen is the Microchip MCP4921; it was necessary to use a part with an SPI interface, since there is only 1 spare output pin, which serves as the chip select for this device; the clock and data pins are shared with the RAM chips.

This means that the DAC control code can use the same drivers as the RAM chips by negating the RAM chip select, and asserting the DAC chip select:

#define PIN_DAC_CS      2
#define DAC_SELECT      GPIO.out_w1tc = 1<<PIN_DAC_CS
#define DAC_DESELECT    GPIO.out_w1ts = 1<<PIN_DAC_CS

// Output voltage from DAC; Vout = Vref * n / 4096
void dac_out(int mv) {
    uint16_t w = 0x7000 + ((mv * 4096) / 3300);
    byte cmd[2] = { (byte)(w >> 8), (byte)(w & 0xff) };
    RAM_DESELECT;
    DAC_SELECT;
    bus_send_spi_cmd(cmd, 2);
    DAC_DESELECT;
}

Triggering

Triggering is achieved by using the ESP32 pin-change interrupt, as this can capture quite a narrow pulses. There will be a delay before the interrupt is serviced, which means that we don’t get an accurate indication of which sample caused the trigger, but that isn’t a problem in practice.

int trigchan, trigflag;

// Handler for trigger interrupt
void IRAM_ATTR trig_handler(void) {
    if (!trigflag) {
        trigsamp = pcnt_val32();
        trigflag = 1;
    }
}

// Enable or disable the trigger interrupt for channels 1 to 16
void set_trig(bool en) {
    int chan=server_args[ARG_TRIGCHAN].val, mode=server_args[ARG_TRIGMODE].val;
    if (trigchan) {
        detachInterrupt(busbit_pin(trigchan-1));
        trigchan = 0;
    }
    if (en && chan && mode) {
        attachInterrupt(busbit_pin(chan-1), trig_handler, 
            mode==TRIG_FALLING ? FALLING : RISING);
        trigchan = chan;
    }
    trigflag = 0;
}

This interrupt handler sets a flag, that is actioned by the main state machine. There is a ‘trig_pos’ parameter that sets how many tenths of the data should be displayed prior to triggering; it is normally set to 1, which means that (approximately) 1 tenth will be displayed before the trigger, and 9 tenths after.

It is possible that there may be a considerable delay before the trigger event is encountered. In this case, the unit continues to capture samples, and the RAM address counter will wrap around every time it reaches the maximum value. This means that the pre-trigger data won’t necessarily begin at address zero; the firmware has to fetch the trigger RAM address, then jump backwards to find the start of the data.

State machine

This handles the whole capture process. There are 6 states:

  • Idle: no data, and not capturing data
  • Ready: data has been captured, ready to be uploaded
  • Preload: capturing data, before looking for trigger
  • PreTrig: capturing data, looking for trigger
  • PostTrig: capturing data after trigger
  • Upload: transferring data over the network

The Preload state is needed to ensure there is some data prior to the trigger. If triggering is disabled, then as soon as the capture is started, the software goes directly to the PostTrig state, checking the sample count to detect when it is greater than the requested number.

// Check progress of capture, return non-zero if complete
bool web_check_cap(void) {
    uint32_t nsamp = pcnt_val32(), xsamp = server_args[ARG_XSAMP].val;
    uint32_t presamp = (xsamp/10) * server_args[ARG_TRIGPOS].val;
    STATE_VALS state = (STATE_VALS)server_args[ARG_STATE].val;
    server_args[ARG_NSAMP].val = nsamp;
    if (state == STATE_PRELOAD) {
        if (nsamp > presamp)
            set_state(STATE_PRETRIG);
    }
    else if (state == STATE_PRETRIG) {
        if (trigflag) {
            startsamp = trigsamp - presamp;
            set_state(STATE_POSTTRIG);
        }
    }
    else if (state == STATE_POSTTRIG) {
        if (nsamp-startsamp > xsamp) {
            cap_end();
            set_state(STATE_READY);
            return(true);
        }
    }
    return (false);
}

Network interface

A detailed description of network operation will be found in part 3 of this project; for now, it is sufficient to say that the unit acts as a wireless client, connecting to a pre-defined WiFi access point; it has a simple Web server with all requests & responses using HTTP.

Wireless connection

The first step is to join a wireless network, using a predefined network name (‘SSID’) and password. The code must also try to re-establish the link to he Access Point if the connection fails, so there is a polling function that checks for connectivity.

// Begin WiFi connection
void net_start(void) {
    DEBUG.print("Connecting to ");
    DEBUG.println(ssid);
    WiFi.begin(ssid, password);
    WiFi.setSleep(false);
}

// Check network is connected
bool net_check(void) {
    static int lastat=0;
    int stat = WiFi.status();
    if (stat != lastat) {
        if (stat<=WL_DISCONNECTED) {
            DEBUG. printf("WiFi status: %s\r\n", wifi_states[stat]);
            lastat = stat;
        }
        if (stat == WL_DISCONNECTED)
            WiFi.reconnect();
    }
    return(stat == WL_CONNECTED);
}

Web server

The Web pages are very simple and only contain data; the HTML layout and Javascript code to display the data is fetched from a different server.

The server is initialised with callbacks for three pages:

#define STATUS_PAGENAME "/status.txt"
#define DATA_PAGENAME   "/data.txt"
#define HTTP_PORT       80

WebServer server(HTTP_PORT);

// Check if WiFi & Web server is ready
bool net_ready(void) {
    bool ok = (WiFi.status() == WL_CONNECTED);
    if (ok) {
        DEBUG.print("Connected, IP ");
        DEBUG.println(WiFi.localIP());
        server.enableCORS();
        server.on("/", web_root_page);
        server.on(STATUS_PAGENAME, web_status_page);
        server.on(DATA_PAGENAME, web_data_page);
        server.onNotFound(web_notfound);
        DEBUG.print("HTTP server on port ");
        DEBUG.println(HTTP_PORT);
        delay(100);
    }
    return (ok);
}

The root page returns a simple text string, and is mainly used to check that the Web server is functioning:

#define HEADER_NOCACHE  "Cache-Control", "no-cache, no-store, must-revalidate"

// Return root Web page
void web_root_page(void) {
    server.sendHeader(HEADER_NOCACHE);
    sprintf((char *)txbuff, "%s, attenuator %u:1", version, THRESH_SCALE);
    server.send(200, "text/plain", (char *)txbuff);
}

All the Web pages are sent with a header that disables browser caching; this is necessary to ensure that the most up-to-date data is displayed.

The status page returns a JSON (Javascript Object Notation) formatted string, containing the current settings; a typical response might be:

{"state":1,"nsamp":10010,"xsamp":10000,"xrate":100000,"thresh":10,"trig_chan":0,"trig_mode":0,"trig_pos":1}

This indicates that 10000 samples were requested at 100 KS/s, 10010 were actually collected, using a threshold of 10 volts. The ‘state’ value of 1 indicates that data collection is complete, and the data is ready to be uploaded.

The individual arguments are stored in an array of structures, which is converted into the JSON string:

typedef struct {
    char name[16];
    int val;
} SERVER_ARG;

SERVER_ARG server_args[] = {
    {"state",       STATE_IDLE},
    {"nsamp",       0},
    {"xsamp",       10000},
    {"xrate",       100000},
    {"thresh",      THRESH_DEFAULT},
    {"trig_chan",   0},
    {"trig_mode",   0},
    {"trig_pos",    1},
    {""}
};

// Return server status as json string
int web_json_status(char *buff, int maxlen) {
    SERVER_ARG *arg = server_args;
    int n=sprintf(buff, "{");
    while (arg->name[0] && n<maxlen-20) {
        n += sprintf(&buff[n], "%s\"%s\":%d", n>2?",":"", arg->name, arg->val);
        arg++;
    }
    return(n += sprintf(&buff[n], "}"));
}

The HTTP request for the status page can also include a query string with parameters that reflect the values the user has entered in a Web form. If a ‘cmd’ parameter is included, it is interpreted as a command; the following query includes ‘cmd=1’, which starts a new capture:

GET /status.txt?unit=1&thresh=10&xsamp=10000&xrate=100000&trig_mode=0&trig_chan=0&zoom=1&cmd=1

The software matches the parameters with those in the server_args array, and stores the values in that array; unmatched parameters (such as the zoom level) are ignored.

// Return status Web page
void web_status_page(void) {
    web_set_args();
    web_do_command();
    web_json_status((char *)txbuff, TXBUFF_LEN);
    server.sendHeader(HEADER_NOCACHE);
    server.setContentLength(CONTENT_LENGTH_UNKNOWN);
    server.send(200, "application/json");
    server.sendContent((char *)txbuff);
    server.sendContent("");
}

// Get command from incoming Web request
int web_get_cmd(void) {
    for (int i=0; i<server.args(); i++) {
        if (!strcmp(server.argName(i).c_str(), "cmd"))
            return(atoi(server.arg(i).c_str()));
    }
    return(0);
}

// Get arguments from incoming Web request
void web_set_args(void) {
    for (int i=0; i<server.args(); i++) {
        int val = atoi(server.arg(i).c_str());
        web_set_arg(server.argName(i).c_str(), val);
    }
}

Data transfer

The captured data is transferred using an HTTP GET request to the page data.txt. The binary data is encoded using the base64 method, which converts 3 bytes into 4 ASCII characters, so it can be sent as a text block. There is insufficient RAM in the ESP32 to store the sample data, so it is transferred on-the-fly from the RAM chips to a network buffer.

// Return data Web page
void web_data_page(void) {
    web_set_args();
    web_do_command();
    server.sendHeader(HEADER_NOCACHE);
    server.setContentLength(CONTENT_LENGTH_UNKNOWN);
    server.send(200, "text/plain");
    cap_read_start(startsamp);
    int count=0, nsamp=server_args[ARG_XSAMP].val;
    size_t outlen = 0;
    while (count < nsamp) {
        size_t n = min(nsamp - count, TXBUFF_NSAMP);
        cap_read_block(txbuff, n);
        byte *enc = base64_encode((byte *)txbuff, n * 2, &outlen);
        count += n;
        server.sendContent((char *)enc);
        free(enc);
    }
    server.sendContent("");
    cap_read_end();
}

The ‘unknown’ content length means that the software can send an arbitrary number of text blocks, without having to specify the total length in advance. The transfer is terminated by calling sendContent with a null string.

Diagnostics

There is a single red LED, but due to pin constraints, it is shared with the RAM chip select. So it will always illuminate when the RAM is being accessed, but in addition:

  • Rapid flashing (5 Hz) if the unit is not connected to the WiFi network
  • Brief flash (100 ms every 2 seconds) when the unit is connected to the network.
  • Solid on when the unit is capturing data, and is waiting for a trigger, or until the required amount of data has been collected.

There is also the ESP32 USB interface that emulates a serial console at 115 Kbaud:

#define DEBUG_BAUD  115200
#define DEBUG       Serial      // Debug on USB serial link

DEBUG.begin(DEBUG_BAUD);

// 'print' 'println' and 'printf' functions are supported, e.g.
DEBUG.print("Connecting to ");
DEBUG.println(ssid);

To view the console display, you can use your favourite terminal emulator (e.g. TeraTerm on Windows) connected to the USB serial port, however you will have to break that connection every time you re-program the ESP32, since it is needed for re-flashing the firmware. The VS Code IDE does have its own terminal emulator, which generally auto-disconnects for re-programming, but I have had occasional problems with this feature, for reasons that are a bit unclear.

Modifications

There are a few compile-time options that need to be set before compiling the source code:

  • SW_VERSION (in main.cpp): a string indicating the current software version number
  • ssid & password (in esp32_web.cpp): must be changed to match your wireless network
  • THRESH_SCALE (in esp32_la.h): the scaling factor for the threshold value, that is used to program the DAC.

The threshold scaling will depend on the values of the attenuator resistors. The unit was originally designed for input voltages up to 50V, with a possible overload to 250V, so the input attenuation was 101 (100K series resistor, 1K shunt resistor). If using the unit with, say, 5 volt logic, then the series resistor will need to be much lower (and maybe the shunt resistance a bit higher) so the threshold scaling value will need to be adjusted accordingly. Since the threshold value sent from the browser is an integer value (currently 0 – 50) you might choose the redefine that value when working with lower voltages, for example represent 0 – 7 volts as a value of 0 – 70, in tenths of a volt. This change will need to be made in the firmware, and both Web interfaces.

An important note, when creating a new unit. Since I’m using all the available I/O pins on the ESP32, I’ve had to use GPIO12, even though this does (by default) determine the Flash voltage at startup.

To use the pin for I/O, it is essential that this behaviour is changed by modifying the parameters in the ESP32 one-time-programmable memory. This is done using the Python espefuse program that is provided in the IDE. To summarise the current settings, navigate to the directory containing that file, and execute:

python espefuse.py --port COM4 summary

..assuming the USB serial link is on Windows COM port 4. Then to modify the setting, execute:

python espefuse.py --port COM4 set_flash_voltage 3.3V

You will be prompted to confirm that the change should be made, since it is irreversible. Then if you re-run the summary, the last line should be:

Flash voltage (VDD_SDIO) set to 3.3V by efuse.

Part 1 of this project looked at the hardware, part 3 the Web interface and Python API. The source files are on Github.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA part 1: hardware for remote logic analyser

This is the first post in a series describing a low-cost WiFi-based logic analyser, that can be used for monitoring equipment in remote or hazardous locations. For an overview of the project, see this post. The hardware specification is:

  • Digital inputs: 16 for each unit
  • Input threshold: programmable
  • Sample rate: up to 20 megasamples per second
  • Sample store: up to 250 kilosamples
  • Network interface: WiFi

Wireless networking is an essential component of this project, and at the time of writing, the most logical hardware choice is the Espressif ESP32, which is a microcontroller with up to 34 GPIO pins, an Xtensa dual-core 32-bit LX6 processor, and built-in WiFi.

Sample storage

There are ESP32 variants with differing amounts RAM & flash ROM, but currently the most common type is the ESP32-S2 with 320 KiB SRAM and 128 KiB ROM. A significant portion of this is taken up by the WiFi code, so there is insufficient RAM to store the required number of samples.

When using external memory for sample storage, the standard practice is to employ one or more RAM chips and an address counter that is fed from a constant clock that increments when a sample is stored.

Unfortunately, the 16 data lines plus 18 address lines make this arrangement quite bulky, even when implemented using surface-mount parts. One way of simplifying the circuit is to embed the logic within a programmable logic device, such as a Field-Programmable Gate Array (FPGA), but the programming & debugging of such a device can be quite complex.

Ideally, what we want is a RAM device that has a built-in address counter, that will auto-increment on each sample. Such devices do exist, they are known as ‘Serial SRAM’; they have a 1-bit, 2-bit or 4-bit clocked serial interface for sending commands and data. You may be familiar with 1-bit SPI (Serial Peripheral Interface), as it is used in a wide variety of devices, and the RAM chip is in this mode on startup. Less well-known are the 2-bit (SDI or DSPI) and 4-bit (SQI or QSPI) interfaces that use the same hardware lines unidirectionally; commands are sent to read or write data in these modes, and thereafter the RAM chip transfers the data with an auto-incrementing address counter that wraps around at the end of the RAM.

Each RAM chip handles 4 input channels, so 4 chips are needed for the 16-channel input.

The following steps are needed for data capture:

  • Send command to switch RAM from SPI to SQI mode
  • Send a ‘write’ command, with the desired starting address
  • Assert the chip select line, and start the clock signal.
  • When capture is complete, stop the clock signal
  • Negate chip select, which completes the ‘write’ command

Readback of the captured data is similar, except that a ‘read’ command is used.

Unfortunately there is no way to read back the address counter within the RAM chip, so to keep track of the sampling process, it is necessary to attach a pulse counter to the clock line; a counter/timer within the microcontroller can be used for this purpose.

Another issue is the fact that the RAM data lines serve two purposes; to receive data from the comparators, and commands from the CPU. Ideally the comparators would have an ‘enable’ pin to tri-state their outputs, but I couldn’t find a suitable device with this feature. Failing that, the conventional approach would be to use multiplexer chips to switch between the two data sources, but I’ve taken a much simpler approach, using resistors in the output of the comparators. When the CPU is in control, it just sets the data lines high or low as required, overriding the comparator outputs; when capturing data, the CPU sets its pins as inputs, so the comparators control the data going into the RAM, albeit with a small delay due to the 1K series resistance interacting with the circuit capacitance, but this hasn’t proved a problem in practice.

Triggering

An important feature of logic analysers is triggering; the ability to continuously capture data until a specific condition is met, carry on capturing for a specific number of samples after the trigger, then stop.

The logic to support this operation can be quite complex, and the addition of digital comparators (or their equivalent in programmable logic) would be a major complication. However, it is worth bearing in mind two things:

  • If there is a small time-delay between the trigger condition being detected, and the hardware reacting to the trigger, then it is no problem; if we are capturing tens of thousands of samples, a trigger delay of 10 or 20 samples is of no great concern.
  • The CPU is largely idle while data is being captured; it only has to respond to network requests, which are largely handled by the 2nd CPU in a dual-CPU device.

In common with most modern microcontrollers, the ESP32 has the ability to generate an interrupt on the state-change of any I/O pin, and this interrupt can be used for triggering, since it can capture very short pulses (under 50 nanoseconds). In theory it is possible to chain several edge-interrupts together, to give more complex triggering, but personally I’ve found a single edge-trigger to be sufficient for most purposes.

ESP32

The decision to use an ESP32 processor was largely driven by its built-in WiFi interface, and the ready availability of complete low-cost modules with a built-in antenna (or connector for an external antenna). The module used is the ESP32-S2-DevKitC with 38 pins, and an ESP32-WROOM-32D or -32E processor; take care not not to be confuse it with similar-looking modules.

This module has a few features that make it an excellent choice:

  • Fast dual-CPU architecture.
  • Easy-to-use C software development environment based on the VScode IDE, PlatformIO configuration, and the Arduino run-time environment.
  • A pin multiplexer that removes a lot of constraints as to which pins can be used for which internal functions
  • Simple PWM generator that can generate the required clock frequencies
  • Edge-detection interrupts on any I/O pin.

However, there are some less-than-ideal features:

  • Gaps in the I/O pin assignments, so it is impossible to assign 8 consecutive bits to form a single byte-wide input, or 16 consecutive bits to form a word-wide input.
  • Absence of a general-purpose 32-bit pulse counter; only 16 bits are available.
  • Usage of some I/O pins to specify boot-time settings.
  • Some GPIO pins are input-only.

These issues can be resolved in software, as will be described in the next blog post, but the final design does use all the input/output pins, with none spare, which forces some economies. For example, it is highly desirable to have a diagnostic LED controlled by the CPU, but there is no O/P pin to drive it, so it has been put on the RAM chip-select line, which slightly reduces the flexibility of the LED indications.

Analog inputs

Each analog input requires an attenuator to reduce the input voltage down to something manageable, and a comparator that compares the attenuated signal with a programmable reference voltage produced by a DAC (Digital-Analog Converter).

The attenuator is just a resistive potential divider; the resistors have been arranged in groups of 8, such that dual-in-line (DIL) plug-in resistor packs can be used in place of discrete resistors. This means that the board can handle very a wide range of input voltages by plugging in different resistor packs.

It proved quite difficult to find a comparator that is readily available, fast enough, and with a push-pull output (not open-drain). An early candidate was the MAX942, but this has back-to-back diodes between the inverting and non-inverting inputs, which would cause major problems if the voltage difference was sufficiently high to make them conduct. In the end, TS3022 devices in SO-8 packages were selected, and they perform really well; provision had been made for adding positive feedback to provide hysteresis (by adding resistors to the DIL-footprint through-holes), but in practice this has not been necessary.

Programming

The ESP32 module has a micro USB connector to provide power to the unit, and a programming interface. As a backup, the PCB also includes a JTAG programming interface, but this uses some of the data pins, so is only usable on a bare depopulated PCB.

The USB interface also emulates a serial console, that is compatible with standard PC terminal emulators; the ESP32 firmware makes extensive use of this for diagnostic reporting.

PCB design

Assembled logic analyser unit

The circuit diagram, PCB manufacturing files (Gerbers) and parts list are in the project repository; do check the README file for the latest information.

The PCB has dual-footprints (DIL & SO-8) for the memory chips and the comparators. I have used DIL sockets for the RAM chips so they can be upgraded at a future date, but as mentioned above, none of the DIL-packaged comparators were suitable, so surface-mount TS3022 parts were used – they have a relatively generous pin spacing (1.27 mm) so shouldn’t be difficult for anyone to assemble who has reasonable soldering skills.

The photo above shows socketed resistor packs for the input attenuators; if using these (as opposed to individual resistors) make sure you buy the type with 8 individual resistors, not commoned.

The ESP32 module requires two 19-way sockets with square pins; I had to use 20-pin parts, with one pin cut off. To help with hardware diagnostics, I have included convenient 2.54 mm pitch headers for the RAM clock, chip select and data lines. These only need to be populated if you are using a logic analyser to trace the board’s operation, or if you wish to remove the ESP32 module and drive the board from some other CPU.

Power (5 volts, with a current capacity of at least 250 mA) is either applied on the USB connector, or on P14, in which case there needs to be an on/off switch connected to the terminals of P7, or those pins must be bonded across. The module is programmed over USB; do not use the JTAG interface unless the board is de-populated.

Part 2 of this project looks at the ESP32 firmware, part 3 the Web interface and Python API. The circuit diagram and PCB files are on Github.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA: remote logic analyzer using ESP32 and Web protocols

Remote logic analyser

There are plenty of low-cost logic analysers but they all share a common characteristic; a USB link is used to transfer the data into a PC for analysis.

If the equipment is in a safe & comfortable office environment, then this isn’t a problem, but in many cases it is operating in an distant, inaccessible or hostile location, so remote monitoring is desirable. If the analyser unit is small and low-cost, it can remain attached on a semi-permanent basis, enabling long-term monitoring & diagnosis of remote equipment

The initial specification of the logic analyser unit is

  • Digital inputs: 16 for each unit
  • Input threshold: programmable
  • Sample rate: up to 20 megasamples per second
  • Sample store: up to 250 kilosamples
  • Network interface: WiFi
  • Network protocols: TCP and HTTP
  • Control method: full remote control
  • Display method: Web pages with Javascript
  • Remote API: Python class

The project is fully open-source, and is documented in the following posts:

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.