PicoWi part 8: UDP server socket

So far I have used a large number of custom functions to configure and control the WiFi networking, but before adding yet more functionality, I need to offer a simpler (and more standard) way of doing all this programming.

When it comes to network programming on Linux or Windows systems, there is only one widely-used Application Programming Interface (API), namely Network Sockets, also known as Internet Sockets. A socket is an endpoint on the network, defined by an IP address & port number; in previous parts, I have used sockets to construct DHCP & DNS clients, and now will use them in a general-purpose programming interface.

UDP & TCP, Client & Server

To recap on the difference between User Datagram Protocol (UDP) & Transmission Control Protocol (TCP), also Client & Server:

  • UDP allows you to send blocks of data across the network, the maximum block size generally being around 1.5 kBytes. There is no synchronisation between sender & receiver, and no error handling, so there is no guarantee that the data will arrive at the destination.
  • TCP starts with an exchange of packets to synchronise the two endpoints, and thereafter the transfer is bidirectional and stream-orientated; it acts like a transparent pipe between the sockets, carrying arbitrary numbers of bytes from one socket to the other. When the transfer is complete, either side can close the connection, but the closure will only be complete when both sides acknowledge it.
  • A client initiates contact with a server, by sending UDP or TCP traffic to a well-known port on that system.
  • A server takes possession of (‘binds to’) a specific UDP or TCP port number, and runs continuously, waiting for a client to initiate contact.

I’m starting with the simplest of these options, namely a UDP server.

UDP server

To create a server socket, it is just necessary to issue a socket() call, followed by bind() to take possession of a specific port, by providing an initialised structure. The same code applies for both Linux, and my RP2040 socket API:

#define PORTNUM  8080    // Server port number
#define MAXLEN   1024    // Maximum data length

int server_sock; 
struct sockaddr_in server_addr, client_addr; 
server_sock = socket(AF_INET, SOCK_DGRAM, 0);
if (server_sock < 0)
    perror("socket creation failed"); 
memset(&server_addr, 0, sizeof(server_addr)); 
server_addr.sin_family    = AF_INET;
server_addr.sin_addr.s_addr = INADDR_ANY; 
server_addr.sin_port = htons(PORTNUM); 
if (bind(server_sock, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) 
    perror("bind failed"); 

You may be surprised at the use of 2 structures, namely sockaddr_in, and sockaddr. The former is specific to Internet Protocol, whereas the latter is general-purpose, capable of being used with other network protocols. The structure has to be filled in with the address family (Internet Protocol) the desired IP address (zero for any), and the port number, byte-swapped to big-endian.

The code then enters an endless loop, waiting for a a datagram to be received, printing the data (on the assumption it is text), and returning a text string in response:

printf("UDP server on port %u\n", PORTNUM);
while (1)
    memset(&client_addr, 0, sizeof(client_addr)); 
    addrlen = sizeof(client_addr);
    n = recvfrom(server_sock, buffer, MAXLEN-1, 0, 
        (struct sockaddr *)&client_addr, &addrlen);
    if (n > 0)
        buffer[n] = 0; 
        printf("Client %s: %s\n", inet_ntoa(client_addr.sin_addr), buffer); 
        n = sprintf(buffer, "Test %u\n", testnum++);
        sendto(server_sock, buffer, n, MSG_CONFIRM, 
            (struct sockaddr *)&client_addr, addrlen);

It may seem strange to set ‘addrlen’ every time we go round the loop; this is for safety, as recvfrom() can potentially modify its value. Also, there is no guarantee that the incoming string is null-terminated, so a null is added at the end of the datagram, just in case.


The above code ‘blocks’ on the receive function, that is to say it waits indefinitely until the data arrives, which is standard practice on Linux or Windows sockets – it is sensible when we have a multi-tasking operating system, but inconvenient if we are restricted to a single task, as the CPU can’t run any other code while waiting for incoming data.

There are various ways round this limitation, for example Linux has a ‘select’ function that allows you to poll the interface, checking if any incoming data is available. Alternatively, we can set a timeout on the socket read, for example 0.1 seconds (100,000 microseconds):

struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 100000;
setsockopt(server_sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

To demonstrate this functionality, we can add a useful feature whereby the LED is flashed every time an access to the server is made; if we set the timeout to 1000 microseconds, then the timeouts give us a convenient millisecond timer. The following code does a rapid flash of the LED until the unit joins the Wifi network, then a quick blink every 2 seconds, plus a flash every time the server is accessed.

sock_set_timeout(server_sock, 1000);
while (1)
    memset(&client_addr, 0, sizeof(client_addr)); 
    addrlen = sizeof(client_addr);
    n = recvfrom(server_sock, buffer, MAXLEN, 0, 
        (struct sockaddr *)&client_addr, &addrlen);
    if (n > 0)
        buffer[n] = 0; 
        printf("Client %s: %s\n", inet_ntoa(client_addr.sin_addr), buffer); 
        n = sprintf(buffer, "Test %u\n", testnum++);
        sendto(server_sock, buffer, n, 0, (struct sockaddr *)&client_addr, addrlen);
        msec = 0;
        if (msec == 1)
        else if (msec == 100)
        else if (msec == 200 && !link_check())
            msec = 0;
        else if (msec == 2000)
            msec = 0;

Test program

See the introduction for details on how to rebuild and program the udp_socket_server application into the Pico.

The simplest way to test the server is using the netcat application, that can easily be installed on the Raspberry Pi using ‘apt install’; you just need to note the IP address of the Pico as reported on the serial console, and enter it into the netcat command line, e.g. for an address of

echo -n "hello" | nc -4u -w1 8080

The server will respond with an incrementing test number, e.g.

Test 1

Alternatively, here is a simple Python client program:

# Simple UDP client example
import time, socket

addr = "", 8080
data = 'test'

while True:  
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(str.encode(data), addr)
    print("Tx: " + data);        
        resp, server = client.recvfrom(1024)
    except socket.timeout:
        resp = None
    print("Rx: " + "timeout" if not resp else resp.decode('utf-8'))

Or in C:

// Simple UDP client for Linux

#include <stdio.h> 
#include <string.h> 
#include <sys/socket.h> 
#include <arpa/inet.h> 
#include <netinet/in.h> 
#include <unistd.h>

#define SERVER_ADDR     ""
#define SERVER_PORT     8080
#define MAXLEN          1024 

void sock_set_timeout(int sock, int usec);

int main(int argc, char **argv)
    int sock, n;
    struct sockaddr_in server_addr;
    char txdata[MAXLEN] = "test", rxdata[MAXLEN];

    sock = socket(AF_INET, SOCK_DGRAM, 0);
    sock_set_timeout(sock, 100000);
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;
    server_addr.sin_port = htons(SERVER_PORT);
    while (1)
        printf("Tx %s:%u: %s\n", SERVER_ADDR, SERVER_PORT, txdata);
        sendto(sock, txdata, strlen(txdata), 0,
              (struct sockaddr *)&server_addr, sizeof(server_addr));
        n = recvfrom(sock, rxdata, MAXLEN-1, 0, NULL, NULL);
        if (n < 0)
            printf("Rx timeout\n");
            rxdata[n] = 0;
            printf("Rx %s\n", rxdata);

// Set socket timeout
void sock_set_timeout(int sock, int usec)
    struct timeval tv;
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 7: DNS

The Domain Name System (DNS) is the top layer of the 3-layer addressing scheme, as described in the previous part of this blog. To recap:

The Domain Name System translates a Domain Name such as http://www.google.com into an Internet Protocol (IP) address, consisting of 4 decimal numbers 0-255 with ‘.’ separators, e.g.

The Address Resolution Protocol (ARP) translates an IP address into a Medium Access and Control (MAC) address, consisting of 6 hexadecimal values with ‘:’ separators, e.g. 28:CD:C1:00:D9:20

The MAC address is used for all communication within a network, to identify a sender or intended recipient.

In the early days of the Internet there was a simple one-to-one relationship between a domain name and an IP address, so DNS just required a simple lookup on a remote database. Nowadays the name-to-address mapping is much more complex, but we can still take a simple approach, by providing DNS with a name, and it will return an IP address, or several IP addresses if there are some alternatives.

DNS specification

The specifications for DNS, and all other aspects of TCP/IP communication, are in the form of documents called Request For Comments (RFC); DNS is RFC-1034 and RFC-1035. These documents are freely available, and don’t just describe the protocol, but also give background information on the rationale for the design decisions that were made. RFCs are generally very clearly written, with a minimum of jargon, and are well worth a read.

A DNS lookup consists of a single UserDatagram Protocol (UDP) message sent to a server, and a single matching UDP response. UDP is an ‘unreliable’ protocol, so there is no guarantee that a response will be received; if not, the process can just be repeated.

DNS request

The request consists of a fixed-format header, plus variable-format data. The fixed section has just 6 2-byte values:

typedef struct
	WORD ident,   // Message identification number
        flags,    // Option flags
	    n_query,  // Number of queries
		n_ans,    // Number of answers
		n_auth,   // Number of authority records
		n_rr;     // Number of additional records

All values are in ‘network’ byte-order, so are most-significant-byte first. The meaning of the flags can best be described by looking at the decode of a typical request in Wireshark:

Domain Name System (query)
    Transaction ID: 0x1234
    Flags: 0x0100 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data: Unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0

This is followed by a variable-length Resource Record (RR) field with the domain name, then 16-bit name type & name class, which normally have a value of 1. Each part of the name is prefixed by a length byte, and the name is terminated by a zero length, e.g.

LEN w w w  LEN r a s p b e r r y p i  LEN o r g  NUL Type Class
03  777777 0b  7261737062657272797069 03  6f7267 00  0001 0001

The code to create the header and data is relatively simple:

// Format DNS request data, given name string
int dns_add_hdr_data(BYTE *buff, char *s) 
	BYTE *p, *q;
    DNS_HDR *dhp = (DNS_HDR *)buff;
    int len = sizeof(DNS_HDR);
    static int ident = 1;
    memset(dhp, 0, sizeof(DNS_HDR));
    dhp->ident = htons(ident++);
    dhp->flags = htons(0x100);  // Recursion desired
    dhp->n_query = htons(1);
    p = q = &buff[len];
	while (*s)	                // Prefix each part with length byte
		while (*s && *s != '.')
			*p++ = (BYTE)*s++;
		*q = (BYTE)(p - q - 1);
		q = p;
    	if (*s)
    *p++ = 0;   // Null terminator
    *p++ = 0;	// Type A (host address)
    *p++ = 1;
    *p++ = 0;	// Class IN
    *p++ = 1;
	return (p - buff);

Transmitting this message is just a question of adding on the Ethernet, IP and UDP headers; this is done in a slightly strange order, since the UDP header must include a checksum calculated from the subsequent (DNS) data.

// Transmit DNS request
int dns_tx(MACADDR mac, IPADDR dip, WORD sport, char *s)
    char temps[300];
    int oset = 0;
	int len = ip_add_eth(txbuff, mac, my_mac, PCOL_IP);
	int dlen = dns_add_hdr_data(&txbuff[len + sizeof(IPHDR) + sizeof(UDPHDR)], s);
	len += ip_add_hdr(&txbuff[len], dip, PUDP, sizeof(UDPHDR) + dlen);
	len += udp_add_hdr_data(&txbuff[len], sport, DNS_SERVER_PORT, 0, dlen);
 	return (ip_tx_eth(txbuff, len));

DNS response

The response (if any) will arrive from the WiFi chip as an ‘event’, and it has to go through IP and UDP pre-processing before arriving at the DNS decoder, or any other protocol handler that matches the incoming data. I’ve already established an ‘add_event_handler’ mechanism for distributing events to various handlers, and am using a similar mechanism for distributing incoming UDP traffic, by giving the standard DNS port number:

#define DNS_SERVER_PORT	53
udp_sock_init(udp_dns_handler, zero_ip, 0, DNS_SERVER_PORT);

// Return number of DNS responses
int dns_num_resps(BYTE *buff, int len)
    DNS_HDR *dhp = (DNS_HDR *)&buff[sizeof(ETHERHDR) + sizeof(IPHDR) + sizeof(UDPHDR)];
    return (len > sizeof(ETHERHDR)+sizeof(IPHDR)+sizeof(UDPHDR)+sizeof(DNS_HDR) ?
        htons(dhp->n_ans) : 0);

// Handler for UDP DNS response
int udp_dns_handler(UDP_SOCKET *usp)
    char temps[300];
    IPADDR addr;
    int oset = 0;
    if (display_mode & DISP_DNS)
        printf("Rx %s: ", dns_hdr_str(temps, usp->data, usp->dlen));
        printf("%s\n", dns_name_str(temps, usp->data, usp->dlen, &oset, 0, 0));
        for (int n = 0; n < dns_num_resps(usp->data, usp->dlen); n++)
            printf("%s\n", dns_name_str(temps, usp->data, usp->dlen, &oset, 0, addr));
    return (1);

The response handler just iterates through the responses and prints them out:

Tx DNS 1 query: www.raspberrypi.org type A
Tx UDP> len 37
Rx UDP> len 85
Rx DNS 1 query, 3 resp: www.raspberrypi.org type A
  www.raspberrypi.org type A
  www.raspberrypi.org type A
  www.raspberrypi.org type A

There are 3 responses, but the way they are structured is surprising; this is the binary data:

0040                                 03 77 77 77 0b 72   ...........www.r
0050   61 73 70 62 65 72 72 79 70 69 03 6f 72 67 00 00   aspberrypi.org..
0060   01 00 01 c0 0c 00 01 00 01 00 00 00 df 00 04 68   ...............h
0070   16 01 2b c0 0c 00 01 00 01 00 00 00 df 00 04 68   ..+............h
0080   16 00 2b c0 0c 00 01 00 01 00 00 00 df 00 04 ac   ..+.............
0090   43 24 62                                          C$b

The first entry has the same format as the request, with the domain name http://www.raspberrypi.org having its delimiters replaced by length-bytes. However the subsequent responses have the length byte replaced by the value 0xc0, followed by a value of 0x0c. This 16-bit value is the result of a compression scheme; the two most-significant bits are set to indicate a compressed entry, and the remaining 14 bits are an offset value pointing at the duplicate text (calculated from the start of the DNS header).

The domain name (or compression pointer) is followed by 16-bit type & class words (usually both 1), a 32-bit time-to-live value, then the IP address length and the 4 address bytes.

This makes the decoder quite complex; typically the item most of interest is the IP address, but it is necessary to decode the 3 parts of the name (or the compression pointer) first; see the function dns_name_str() for my version of this.

Misaligned data values

A major issue that arose while debugging the message decoder is the fact that the 16- and 32-bit values in the response may not be aligned on a 2- or 4-byte boundary. This is an issue that is relatively unique to protocol decoding on small embedded systems, but does have a really bad outcome (crashing the CPU) so some explanation is in order. Here is a test program, with a simplified version of the DNS response, just a null-terminated string followed by a 4-byte IP address:

#include <stdio.h>
#include <string.h>

char data[] = {'a', 'b', 'c', 0, 0x11, 0x22, 0x33, 0x44};

typedef unsigned int IPADDR;

int main(int argc, char *argv[])
    char *p = &data[strlen(data) + 1];
    IPADDR addr = *(IPADDR *)p;
    printf("%s %X\n", data, addr);

When this program is run on the Pico board, or a little-endian Linux system, the result is printed out as expected:

abc 44332211

The program can now be changed so the string is 1 character longer:

char data[] = {'a', 'b', 'c', 'd', 0, 0x11, 0x22, 0x33, 0x44};

The Linux system works as expected, printing out:

abcd 44332211

However the program fails on the Pico, and prints nothing out. If run under a debugger, a ‘hard fault’ trap is reported, with the call stack showing that the problem occurs when the address value is set. This is because the address is no longer on a 4-byte boundary, and the simpler RP2040 processor can’t handle this misaligned transfer, whereas the PC processor can. So it is quite easy to write some code that works fine on a PC, and sometimes fails on the Pico; for example, my early DNS tests used the domain name ‘pool.ntp.org’ and when the response is obtained, all the 16-bit values happen to all be on 2-byte boundaries so the code worked fine; switching to ‘www.raspberrypi.org’ these values became misaligned, so the code failed.

There are various workarounds for this problem, the simplest being not to use casts; my newer code defines the IP address as a byte array, which can be copied byte-by-byte to avoid any misalignment issues. It isn’t sensible to use this approach on 16-bit port values, but these are generally in a byte array and need to be swapped from from big-endian to little-endian, so we can use a byte pointer, e.g.

BYTE *buff;
WORD val = htonsp(buff);

// Convert byte-order in a 'short' variable, given byte pointer
WORD htonsp(BYTE *p)
    WORD w = (WORD)(*p++) << 8;
    return(w | *p);

Test program

The ‘dns’ program joins a network using the name ‘testname’ and password ‘testpass’ that need to be changed. It uses DHCP to fetch an IP address, and the addresses of a router and DNS server. ARP is then used to resolve the server IP address to a MAC address, then that is used to contact the nameserver, asking to resolve the name http://www.raspberrypi.org, and printing the result.

For more information on compiling and running the code, see the introduction.

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 6: DHCP

In part 5, we joined a WiFi network, and used ‘ping’ to contact another unit on that network, but this was achieved by setting the IP address manually, which is generally known as using a ‘static’ IP.

The alternative is to use a ‘dynamic’ IP, that a central server (such as the WiFi Access Point) allocates from a pool of available addresses, using Dynamic Host Configuration Protocol (DHCP); this also provides other information such as a netmask & router address, to allow our unit to communicate with the wider Internet.

IP addresses and routing

So far, I’ve just said that an IP address consists of 4 bytes, that are usually expressed as decimal values with dotted notation, e.g., but there is some extra complication.

Firstly it is important to note I’m using version 4 of the protocol (IPv4); there is a newer version (IPv6) with a much wider address range, but the older version is sufficient for our purposes, and easier to implement.

Next it is important to distinguish between a public and private IP address.

  • Public: an address that is accessible from the Internet, generally assigned by an Internet Service Provider (ISP)
  • Private: an address used locally within an organisation, that is not unique; generally assigned from the blocks 192.168.x.x, 172.16.x.x or 10.x.x.x

The address we’ll be getting from the DHCP server is probably private; if we are accessing the Internet, there will be one or more network devices (‘routers’) that perform public-to-private translation, and also security functions (‘firewalls’) to block malicious data.

If our unit has an IP address it wishes to contact, how does it know what to do? It just has to determine if the target address is local or remote by applying a netmask. For example if our unit is given the address with netmask, then a logical AND of the two values means that our local network (known as a ‘subnet’) is 192.168.1. If the unit we’re contacting is on that subnet (i.e. the address begins with 192.168.1) then we just send out a local ARP request to convert their IP address into a MAC address, and start communicating.

If the target address isn’t on the same subnet (e.g.,, or anything else) then our unit contacts a router (using the address given in the DHCP response) and relies on the router to forward the data appropriately.

In the diagram above, there are networks with public addresses and, and they both have private addresses in 192.168.1.x subnetworks; the job of the router is to move the data between these subnetworks by performing Network Address Translation (NAT) between them.

If unit wants to contact it will check the netmask, and because the target isn’t on the same subnetwork, the data will be sent to the router, which will forward it over the Internet.

If wants to contact, ANDing with the netmask will show that they are both on the same subnet, so the data will be sent directly, bypassing using the router.

However, if wants to send the data to on the remote network, how does the router know what to do? The simple answer is “it doesn’t”, as addresses on the 192.168.1.x subnet aren’t unique, and there will be thousands (or millions!) of units with that same address around the world. Also the netmask clearly indicates that must be on the same subnet as, so the data will be sent locally to, whether it exists or not; if it doesn’t exist, that’ll be flagged up by the ARP request failing.

There are various workarounds for this ‘NAT traversal’ problem, for example sends the data to the router, which is configured to copy incoming data to, but there are major security risks associated with opening up a system to unfiltered Internet traffic, so for the purposes of this blog, I’m assuming that our unit will only be communicating with other units on the same subnetwork, or publicly-available systems on the Internet.

The above example assumes there is a single router for all outgoing traffic, and this is generally the case on a WiFi network, where the Access Point also acts as a router. However, on more complex networks there can be multiple routers to provide alternative routes to other networks or the Internet.

Client and server

The most common model for communication between two systems is client-server. The server runs continuously, waiting for a client to get in contact. The client uses a specific communications format (a ‘protocol’) to establish a link (‘connection’) to the server. The connection persists for as long as is needed to exchange the data, then it is closed by both sides.

Simpler protocols can dispense with the connection, but still retain the client-server model; for example, to fetch the time with Network Time Protocol (NTP) you just send a single message to a time server, and get a single message back with the time. This ‘connectionless’ approach means that a single ‘stateless’ server can handle very large numbers of clients, since it doesn’t have to track the state of its clients; an incoming request has all the information needed to send the response.

UDP message format

So there are two distinct ways for a client to communicate with a server; one creates a persistent connection, with both sides tracking the flow of data, and re-sending any data that is lost in transit: this is Transmission Control Protocol (TCP). The other way is User Datagram Protocol (UDP), which has no such tracking, or error correction; just send a block of data and hope it arrives.

This uncertainty means that, if faced with a choice, many programmers reject UDP as being too unreliable, however it does have a very important place in the suite of TCP/IP protocols, not least because it is used for DHCP.

A DHCP transmission consists of the following:

  • Ethernet header
  • IP header
  • UDP header
  • DHCP header
  • DHCP option data

We’ve already used the Ethernet and IP headers when sending an ICMP (ping) message, this time we’re stacking on a UDP header.

/* ***** UDP (User Datagram Protocol) header ***** */
typedef struct udph
    WORD  sport,            /* Source port */
          dport,            /* Destination port */
          len,              /* Length of datagram + this header */
          check;            /* Checksum of data, header + pseudoheader */

There is a 16-bit length, which shows the total length of the header plus any data that follows, and a 16-bit checksum, which is calculated in an unusual manner; it incorporates the UDP header, parts of the IP header, and all the data that follows. The way this is calculated is to create a pseudo-header containing the relevant IP parts:

/* ***** Pseudo-header for UDP or TCP checksum calculation ***** */
/* The integers must be in hi-lo byte order for checksum */
typedef struct              /* Pseudo-header... */
    IPADDR sip,             /* Source IP address */
          dip;              /* Destination IP address */
    BYTE  z,                /* Zero */
          pcol;             /* Protocol byte */
    WORD  len;              /* UDP length field */

So the UDP code has to prepare two headers, though the pseudo-header is only used for checksum calculation, and can be discarded after that is done.

// Add UDP header to buffer, return byte count
int ip_add_udp(BYTE *buff, WORD sport, WORD dport, void *data, int dlen)
    UDPHDR *udp=(UDPHDR *)buff;
    IPHDR *ip=(IPHDR *)(buff-sizeof(IPHDR));
    WORD len=sizeof(UDPHDR), check;
    PHDR ph;

    udp->sport = htons(sport);
    udp->dport = htons(dport);
    udp->len = htons(sizeof(UDPHDR) + dlen);
    udp->check = 0;
    len += ip_add_data(&buff[sizeof(UDPHDR)], data, dlen);
    check = add_csum(0, udp, len);
    IP_CPY(ph.sip, ip->sip);
    IP_CPY(ph.dip, ip->dip);
    ph.z = 0;
    ph.pcol = PUDP;
    ph.len = udp->len;
    udp->check = 0xffff ^ add_csum(check, &ph, sizeof(PHDR));

Port numbers

Another notable feature of the UDP header is the source & destination port numbers, and these deserve some explanation.

A port number can identify a specific service on a server; for example port 80 identifies an HTTP web server, and 67 is a DHCP server. These are ‘well-known’ port numbers and are in the range 0 to 1023. Ports numbered 1024 to 49151 are also used for specific server functionality that isn’t part of the original set, so are known as ‘registered’. The remaining numbers 49152 to 65535 are ‘dynamic’ ports, that are used temporarily by client applications.

When a client wishes to communicate with a server, it will obtain a dynamic port from its operating system, and use that port for the duration of a transaction, releasing it when the transaction is complete. In contrast, a server will generally monopolise a well-known or registered port on a permanent basis, though some servers additionally open up a dynamic port on a short-term basis to handle a specific interaction with the client, such as a file transfer.

Unusually, the DHCP server & client are both assigned well-known numbers, namely UDP 67 and 68. You may see these identified as BOOTP ports, since DHCP is based on the older BOOTP protocol, with some additions.

DHCP message format

DHCP is a 4-step process:

  • Discover: the unit broadcasts a request asking for network parameters, such as an IP address it can use, also a router address, and subnet mask.
  • Offer: the server responds with some proposed values, that the unit can accept or reject.
  • Request: the unit signifies its acceptance of the proposed values
  • ACK: the server acknowledges the request, indicating that the parameters have been assigned to the unit.

Once the parameters have been assigned, the server will generally attempt to keep them unchanged, such that every time the unit boots, it will get the same IP address. However, this is not guaranteed, and a busy server with a lot of temporary clients will be forced to re-use addresses from units that haven’t been active for a while.

The message format is based on the older protocol BOOTP:

typedef struct {
  	BYTE  opcode;   			/* Message opcode/type. */
	BYTE  htype;				/* Hardware addr type (net/if_types.h). */
	BYTE  hlen;					/* Hardware addr length. */
	BYTE  hops;					/* Number of relay agent hops from client. */
	DWORD trans;				/* Transaction ID. */
	WORD secs;					/* Seconds since client started looking. */
	WORD flags;					/* Flag bits. */
	IPADDR ciaddr,				/* Client IP address (if already in use). */
           yiaddr,				/* Client IP address. */
           siaddr,				/* Server IP address */
           giaddr;				/* Relay agent IP address. */
	BYTE chaddr [16];		    /* Client hardware address. */
	char sname[SNAME_LEN];	    /* Server name. */
	char bname[BOOTF_LEN];		/* Boot filename. */
	BYTE cookie[DHCP_COOKIE_LEN];   /* Magic cookie */

When making the initial discovery request, many of these values are unused; the ‘cookie’ is filled in with a specific 4-byte value (99, 130, 83, 99) that signal this is a DHCP request, not BOOTP. Then there is a data field with ‘option’ values; each entry has one byte indicating the option type, one byte indicating data length, and that number of data bytes. The options I use in the discovery request are a byte value of 1, indicating it is a discovery message, and 4 parameter values, indicating what should be provided by the server (1 for subnet mask, 3 for router address, 6 for nameserver address and 15 for network name).

// DHCP message options
typedef struct {
    BYTE typ1, len1, opt;
    BYTE typ2, len2, data[4];
    BYTE end;

// DHCP discover options
DHCP_MSG_OPTS dhcp_disco_opts = 
   {53, 1, 1,               // Msg len 1 type 1: discover
    55, 4, {1, 3, 6, 15},   // Param len 4: mask, router, DNS, name
    255};                   // End

The resulting offer from the server probably includes much more than we asked for; this is what my server returns:

    Option: (53) DHCP Message Type (Offer)
    Option: (54) DHCP Server Identifier (
    Option: (51) IP Address Lease Time (7 days)
    Option: (58) Renewal Time Value (3 days, 12 hours)
    Option: (59) Rebinding Time Value (6 days, 3 hours)
    Option: (1) Subnet Mask (
    Option: (28) Broadcast Address (
    Option: (15) Domain Name ("home")
    Option: (6) Domain Name Server (
    Option: (3) Router (
    Option: (255) End

You’ll see that the Access Point is acting as a router and nameserver; we’ll be looking at the Domain Name System (DNS) in the next part of this blog.

If the unit wants to accept these proposed settings, it must send a request containing the proposed IP address. This can have the same format as the discovery, with a byte value of 3, indicating it is a request message, and a the 4-byte address value:

// DHCP request options
DHCP_MSG_OPTS dhcp_req_opts = 
   {53, 1, 3,               // Msg len 1 type 3: request
    50, 4, {0, 0, 0, 0},    // Address len 4 (copied from offer)
    255};                   // End

Assuming all is OK, the ACK response from the server will be similar to the offer, maybe with more values added (such as vendor-specific information), so an important part of the receiver code is the scanning of the parameters to find the values that are needed.

State machine

If we were in a multi-tasking environment, the DHCP process might basically consist of a sequence of 4 function calls, each function stopping (‘blocking’) until it is complete:


Since we don’t currently have multi-tasking, we can’t adopt this approach, as it would block any other code from running, and in the event of an error, one of these functions might stall indefinitely. Instead, we have to adopt a ‘polled’ approach, where we keep on re-visiting this process to see what (if anything) has changed. The key to this is to have a single ‘state’ variable that reflects what has happened, e.g. it has a value of 1 when we have sent the discovery, 2 when we have received an offer, and so on.

// Poll DHCP state machine
void dhcp_poll(void)
    static uint32_t dhcp_ticks=0;
    if (dhcp_state == 0 ||              // Send DHCP Discover
       (dhcp_state != DHCPT_ACK && ustimeout(&dhcp_ticks, DHCP_TIMEOUT)))
        ustimeout(&dhcp_ticks, 0);
        ip_tx_dhcp(bcast_mac, bcast_ip, DHCP_REQUEST, 
                   &dhcp_disco_opts, sizeof(dhcp_disco_opts));
        dhcp_state = DHCPT_DISCOVER;
    else if (dhcp_state == DHCPT_OFFER) // Received Offer, send Request
        ustimeout(&dhcp_ticks, 0);
        IP_CPY(dhcp_req_opts.data, offered_ip);
        ip_tx_dhcp(host_mac, bcast_ip, DHCP_REQUEST, 
                   &dhcp_req_opts, sizeof(dhcp_req_opts));
        dhcp_state = DHCPT_REQUEST;

The polling of the DHCP state also incorporates a timeout, that is triggered in the event of an error; with a simple 4-step protocol like this, we can just restart the process from the beginning, rather than trying to work out where the error occurred.

Example program

There is one example program dhcp.c that fetches IP addresses and netmask from a DHCP server, and prints the result:

Joining network
Joined network
Rx DHCP ACK mask router DNS
DHCP complete, IP address router> ARP request> ARP response

The display mode is set to include DHCP:


This allows you to see the message-passing; it isn’t unusual to receive duplicate messages, and in the DHCP OFFER above. The ARP display is also enabled so you can see the router using ARP to check the newly-assigned address.

It will be necessary to change the default SSID and PASSWD to match your network; for details on how to build & load the application, see the introduction.

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 5: ARP, IP and ICMP

In part 4, the wireless chip was connected to a WiFi network, so it can now send & receive data on that network, but we still have to encode the data for transmission, and decode it for reception.

We’re using a ‘full MAC’ chip, so all the low-level WiFi interfacing is handled within the chip. When transmitting, it encrypts our data, and adds the necessary 802.11 headers so that it will accepted by the network access point; when receiving, the headers are stripped off and the data is decrypted before being passed over to the Pico CPU.

This doesn’t just make our encoding & decoding tasks easier, it also ensures that the transmissions fully conform to the (exceedingly complex) 802.11 rules; if your interest is in creating non-standard wireless transmissions, then I’m afraid this project will be of no help.


The suite of protocols used for data transmission over the Internet are generally known as Transmission Control Protocol / Internet Protocol, or TCP/IP. We’ll only be using a small subset of these protocols, and the initial task is just to handle Address Resolution Protocol (ARP) and Internet Control Message Protocol (ICMP). This will allow us to send & receive diagnostic ‘ping’ messages, and do some simple benchmarks by communicating with another system.

TCP/IP uses a three-tier addressing system; at the highest level, there are names with dotted notation, such as iosoft.blog or http://www.google.com. To access the computer at this address, two further steps are required:

  • a Domain Name System (DNS) database lookup is used to convert the name into an Internet Protocol (IP) address, which has 4 numeric values in dotted notation, for example
  • an Address Resolution Protocol (ARP) message is sent out on the network, with a request to convert the remote unit’s IP address into a Media Access and Control (MAC) address, which has 6 bytes, that are normally printed with a colon separator, e.g. 28:CD:C1:00:12:34

The first of these will be tackled in the next part of this project; for now, I’m assuming that the unit has obtained an IP address from somewhere, and knows the IP address of another unit it wishes to communicate with, for example the WiFi access point.

Address Resolution Protocol (ARP)

This is probably the simplest of all TCP/IP protocols; the unit broadcasts a request in a specific format, giving the IP address it wants to contact, and if any unit on the same ‘subnet’ has that address, then it will respond with its 6-byte MAC address. That is used for outgoing messages, but for incoming messages our unit must listen out for ARP broadcasts, and if a request matches its IP address, it should respond with the MAC address.

The ARP message format can be encapsulated within a C structure:

typedef unsigned char  BYTE;
typedef unsigned short WORD;
typedef unsigned int   DWORD;
typedef unsigned int IPADDR;

/* ***** ARP (Address Resolution Protocol) packet ***** */
typedef struct
    WORD hrd,           /* Hardware type */
         pro;           /* Protocol type */
    BYTE  hln,          /* Len of h/ware addr (6) */
          pln;          /* Len of IP addr (4) */
    WORD op;            /* ARP opcode */
    MACADDR  smac;      /* Source MAC addr */
    IPADDR   sip;       /* Source IP addr */
    MACADDR  dmac;      /* Destination MAC addr */
    IPADDR   dip;       /* Destination IP addr */

This is the first of many C structures for TCP/IP, and I’ve chosen to define 8, 16 and 32-bit values as BYTE, WORD and DWORD for clarity.

To broadcast this message, we need to add on Ethernet header, giving a source MAC address (the MAC address of our unit, as reported by the WiFi chip) the destination MAC address (broadcast, which is all-ones, i.e. FF:FF:FF:FF:FF:FF) and a protocol ID, which indicates that we’re sending an ARP packet.

/* Ethernet (DIX) header */
typedef struct {
    MACADDR dest;               /* Destination MAC address */
    MACADDR srce;               /* Source MAC address */
    WORD    ptype;              /* Protocol type or length */
#define PCOL_ARP    0x0806      /* Protocol type: ARP */
#define PCOL_IP     0x0800      /*                IP */

There are a lot of similarities between the higher level of wired (Ethernet) and wireless (802.11) protocols, so it makes sense that both use the same network address structure.

Creating an ARP request is really just a fill-in-the-blanks exercise:

#define HTYPE       0x0001  /* Hardware type: ethernet */
#define ARPPRO      0x0800  /* Protocol type: IP */
#define ARPREQ      0x0001  /* ARP request */
#define ARPRESP     0x0002  /* ARP response */

// Add Ethernet header to buffer, return byte count
WORD ip_add_eth(BYTE *buff, MACADDR dmac, MACADDR smac, WORD pcol)
    ETHERHDR *ehp = (ETHERHDR *)buff;

    MAC_CPY(ehp->dest, dmac);
    MAC_CPY(ehp->srce, smac);
    ehp->ptype = htons(pcol);

// Create an ARP frame, return length
int ip_make_arp(BYTE *buff, MACADDR mac, IPADDR addr, WORD op)
    int n = ip_add_eth(buff, op==ARPREQ ? bcast_mac : mac, my_mac, PCOL_ARP);
    ARPKT *arp = (ARPKT *)&buff[n];

    MAC_CPY(arp->smac, my_mac);
    MAC_CPY(arp->dmac, op==ARPREQ ? bcast_mac : mac);
    arp->hrd = htons(HTYPE);
    arp->pro = htons(ARPPRO);
    arp->hln = MACLEN;
    arp->pln = sizeof(DWORD);
    arp->op  = htons(op);
    arp->dip = addr;
    arp->sip = my_ip;
    if (display_mode & DISP_ARP)
    return(n + sizeof(ARPKT));

// Convert byte-order in a 'short' variable
WORD htons(WORD w)
    return(w<<8 | w>>8);

All network data is in big-endian format (most-significant byte first), but the RP2040 processor is little-endian, so the 16-bit values need to be byte-swapped.

To transmit the message, all that is needed is to add on the SDPCM layer for the WiFi chip, and copy it into an outgoing message buffer:

// Transmit an ARP frame
int ip_tx_arp(MACADDR mac, IPADDR addr, WORD op)
    int n = ip_make_arp(txbuff, mac, addr, op);
    return(ip_tx_eth(txbuff, n));

// Send transmit data
int ip_tx_eth(BYTE *buff, int len)
    return(event_net_tx(buff, len));
// Transmit network data
int event_net_tx(void *data, int len)
    TX_MSG *txp = &tx_msg;
    int txlen = sizeof(SDPCM_HDR)+2+sizeof(BDC_HDR)+len;
    display(DISP_DATA, "Tx_DATA len %d\n", len);
    disp_bytes(DISP_DATA, data, len);
    display(DISP_DATA, "\n");
    txp->sdpcm.len = txlen;
    txp->sdpcm.notlen = ~txp->sdpcm.len;
    txp->sdpcm.seq = sd_tx_seq++;
    memcpy(txp->data, data, len);
    if (!wifi_reg_val_wait(10, SD_FUNC_BUS, SPI_STATUS_REG, 
    return(wifi_data_write(SD_FUNC_RAD, 0, (uint8_t *)txp, (txlen+3)&~3));

The transmit data length is rounded up to the nearest 4 bytes, as the WiFi DMA controller only works handles complete 4-byte words.

ARP reception

An incoming message will arrive as an ‘event’ from the WiFi chip, and a handler function first checks that it is valid:

// Handler for incoming ARP frame
int arp_event_handler(EVENT_INFO *eip)
    ETHERHDR *ehp=(ETHERHDR *)eip->data;

    if (eip->chan == SDPCM_CHAN_DATA &&
        eip->dlen >= sizeof(ETHERHDR)+sizeof(ARPKT) &&
        htons(ehp->ptype) == PCOL_ARP &&
        (MAC_IS_BCAST(ehp->dest) ||
         MAC_CMP(ehp->dest, my_mac)))
        return(ip_rx_arp(eip->data, eip->dlen));

If the incoming message is an ARP request, then the receiver function transmits an appropriate response. If it is a response, the resulting MAC address is saved, for use in future transmissions:

// Receive incoming ARP data
int ip_rx_arp(BYTE *data, int dlen)
    ETHERHDR *ehp=(ETHERHDR *)data;
    ARPKT *arp = (ARPKT *)&data[sizeof(ETHERHDR)];
    WORD op = htons(arp->op);

    if (arp->dip == my_ip)
        if (op == ARPREQ)
            ip_tx_arp(ehp->srce, arp->sip, ARPRESP);
        else if (op == ARPRESP)
            ip_save_arp(arp->smac, arp->sip);


Ping request & response format

Having obtained the 6-byte MAC address of a unit we wish to communicate with, what can we send to it? The obvious choice is a diagnostic ‘ping’, that echoes back the data we send, and measures the round-trip time.

Ping uses the Internet Control Message Protocol (ICMP), with an IP header for the address information:

/* ***** ICMP (Internet Control Message Protocol) header ***** */
typedef struct
    BYTE  type,         /* Message type */
          code;         /* Message code */
    WORD  check,        /* Checksum */
          ident,        /* Identifier */
          seq;          /* Sequence number */
#define ICREQ           8   /* Message type: echo request */
#define ICREP           0   /*               echo reply */

/* ***** IP (Internet Protocol) header ***** */
typedef struct
    BYTE   vhl,         /* Version and header len */
           service;     /* Quality of IP service */
    WORD   len,         /* Total len of IP datagram */
           ident,       /* Identification value */
           frags;       /* Flags & fragment offset */
    BYTE   ttl,         /* Time to live */
           pcol;        /* Protocol used in data area */
    WORD   check;       /* Header checksum */
    IPADDR sip,         /* IP source addr */
           dip;         /* IP dest addr */
#define PICMP   1           /* Protocol type: ICMP */
#define PTCP    6           /*                TCP */
#define PUDP   17           /*                UDP */

Creating an ICMP request largely consists of filling in the values within these structures, and adding some arbitrary data on the end, but there are some issues to bear in mind:

  • As with ARP, all values are big-endian (most significant byte first) so byte-swaps are needed
  • Potentially the IP message (known as a ‘datagram’) may travel very long distances, with a large number of ‘hops’ between computers, and each of these hops will have a maximum data size it can accommodate, which is known as a Maximum Transmission Unit (MTU). To allow a large datagram to be sent across a link with a smaller MTU, there is a technique called ‘IP fragmentation’, whereby the transmission is chopped up into smaller parts, and the parts are reassembled at the receiving end. For simplicity, we won’t initially support fragmentation, which means we have an MTU of around 1.5K bytes.
  • There is a checksum across the IP header and ICMP data, and this is calculated using a method that performs identically on big-endian and little-endian processors.
/* Calculate TCP-style checksum, add to old value */
WORD add_csum(WORD sum, void *dp, int count)
    WORD n=count>>1, *p=(WORD *)dp, last=sum;

    while (n--)
        sum += *p++;
        if (sum < last)
        last = sum;
    if (count & 1)
        sum += *p & 0x00ff;
    if (sum < last)

Ping reception

If the unit has received a unicast ICMP request, then it should return a response to the sender that basically copies everything in the request, but with the source & destination addresses swapped, and the message type changed from request to reply. Theoretically, the ICMP checksum needs to be re-computed, but as it is just a sum of 16-bit words, it isn’t affected by the address swap. So we can just re-use the existing checksum, adjusted for the change from the request value of 8 to the response value of 0:

// Receive incoming ICMP data
int ip_rx_icmp(BYTE *data, int dlen)
    ETHERHDR *ehp=(ETHERHDR *)data;
    IPHDR *ip = (IPHDR *)&data[sizeof(ETHERHDR)];
    ICMPHDR *icmp = (ICMPHDR *)&data[sizeof(ETHERHDR)+sizeof(IPHDR)];
    int n;

    if (display_mode & DISP_ICMP)
    if (icmp->type == ICREQ)
        ip_add_eth(data, ehp->srce, my_mac, PCOL_IP);
        ip->dip = ip->sip;
        ip->sip = my_ip;
        icmp->check = add_csum(icmp->check, &icmp->type, 1);
        icmp->type = ICREP;
        n = htons(ip->len);
        return(ip_tx_eth(data, sizeof(ETHERHDR)+n+sizeof(ICMPHDR)));
    else if (icmp->type == ICREP)
        ping_rx_time = ustime();

Example program: ping.c

This program generates pins every 2 seconds, and responds to incoming ping requests. It uses a hard-coded IP addresses, for itself and the target of the outgoing pings:

IPADDR myip   = IPADDR_VAL(192,168,1,165);
IPADDR hostip = IPADDR_VAL(192,168,1,1);

‘myip’ should be set to a suitable unused IP address on your subnet (e.g. 182.168.1 in the above example); you can check if an address is unused by pinging it.

‘hostip’ should be set to the address of another unit on the network that can accept pings, or the address of the WiFi Access Point.

You’ll also need to set the name & password for the WiFi network you are using:

// The hard-coded password is for test purposes only!!!
#define SSID                "testnet"
#define PASSWD              "testpass"

See the PicoWi introduction for a description of the build process, and the connection of a serial console to see the diagnostic messages.

The LED will flash rapidly for a few seconds until the device is connected to the network; it will then switch on when a ping is sent, and off when it is received; on my network, the ping time is generally quite short, so only a brief flash is visible if everything is working correctly.

Ping times

On an Ethernet network, it is usual to see fast & repeatable values for the ping round-trip time. However wireless networks aren’t as predictable, since all units that are on the same radio channels will be competing for air-time, not just with your network, but any other networks within range.

So the response time will vary, depending on the activity of any networks sharing the same WiFi channels; here is a a typical example of 20 pings, using the time reported on the Pico console:

Round-trip time for PicoWi ping

A ping time of 1.2 milliseconds is quite respectable, considering that a Pi 4 on the same network takes a minimum of 1.9 ms.

The graph was plotted using GNUplot; if you want to replicate it, the console output is captured to pings.txt, then pre-processed using awk:

awk -F [=\ ] '/time/ { print $(NF-1) }' pings.txt > pings.csv

This script should also work with the console output of Linux pings. The result is then fed to GNUplot; the command-line has been split into 4 for clarity:

gnuplot -e "set term png size 420,240 font 'sans,8'; \
  set title 'Ping time'; set grid; set key noautotitle; \
  set ylabel 'Time (ms)' offset 2; set output 'pings.png'; \
  plot 'pings.csv' with boxes"
Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 4: scan and join a network

By the end of part 3, the WiFi chip was up & running, and as a simple test of WiFi operation, we’ll next scan the neighbourhood for WiFi networks, then attempt to join a network.

Scanning a network

As a quick check of wireless functionality, it can be useful to scan for WiFi networks within range. Before starting that, we need to send some IOCTL commands to configure various parameters, such as the network band.

The main problem with IOCTL calls is their sheer variety, that might require data in a specific format, or maybe no data at all. I haven’t been able to find a document that describes them, the only publicly-available documentation seems to be the source code . So when developing, it is quite possible to use the wrong IOCTL command, or send it the wrong data, and we need a way of reporting the error, without adding a lot of print function calls.

All my IOCTL functions return 0 if there wasn’t a reply, and -1 if the response indicated an error, so we can just chain commands using the short-circuit AND functionality to ensure execution will stop when an error occurs, and print the last IOCTL command that was executed:

#define IOCTL_WAIT  30 // Time to wait for ioctl response (msec)

const EVT_STR escan_evts[] = {EVT(WLC_E_ESCAN_RESULT), EVT(WLC_E_SET_SSID), EVT(-1)};

// Start a network scan
int scan_start(void)
    int ret;
    ret = ioctl_wr_int32(WLC_SET_SCAN_CHANNEL_TIME, IOCTL_WAIT, SCAN_CHAN_TIME) > 0 &&
        ioctl_set_uint32("pm2_sleep_ret", IOCTL_WAIT, 0xc8) > 0 &&
        ioctl_set_uint32("bcn_li_bcn", IOCTL_WAIT, 1) > 0 &&
        ioctl_set_uint32("bcn_li_dtim", IOCTL_WAIT, 1) > 0 &&
        ioctl_set_uint32("assoc_listen", IOCTL_WAIT, 0x0a) > 0 &&
        ioctl_wr_int32(WLC_SET_BAND, IOCTL_WAIT, WIFI_BAND_ANY) > 0 &&
        ioctl_wr_int32(WLC_UP, IOCTL_WAIT, 0) > 0;

// Display last IOCTL if error
void ioctl_err_display(int retval)
    IOCTL_MSG *msgp = &ioctl_txmsg;
    IOCTL_HDR *iohp = (IOCTL_HDR *)&msgp->data[msgp->cmd.sdpcm.hdrlen];
    char *cmds = iohp->cmd==WLC_GET_VAR ? "GET" : 
                 iohp->cmd==WLC_SET_VAR ? "SET" : "";
    char *data, *name;
    if (retval <= 0)
        data = (char *)&msgp->data[msgp->cmd.sdpcm.hdrlen+sizeof(IOCTL_HDR)];
        name = iohp->cmd==WLC_GET_VAR || iohp->cmd==WLC_SET_VAR ? data : "";
        printf("IOCTL error: cmd %lu %s %s\n", iohp->cmd, cmds, name);

We can check the code functioning by forcing an error, e.g. temporarily reducing the timeout value for a command such as ‘bcn_li_dtim’ to zero, in which case the code reports the following which, although somewhat terse, does indicate the source of the problem:

IOCTL error: cmd 263 SET bcn_li_dtim

To start a scan, we need one more IOCTL, with an data structure that sets some more parameters:

#define SSID_MAXLEN         32
#define SCANTYPE_ACTIVE     0

#pragma pack(1)
typedef struct {
    uint32_t version;
    uint16_t action,
    uint32_t ssidlen;
    uint8_t  ssid[SSID_MAXLEN],
    uint32_t nprobes,
    uint16_t nchans,
    uint8_t  chans[14][2],

SCAN_PARAMS scan_params = {
    .version=1, .action=1, .sync_id=0x1, .ssidlen=0, .ssid={0},
    .bssid={0xff,0xff,0xff,0xff,0xff,0xff}, .bss_type=2,
    .scan_type=SCANTYPE_PASSIVE, .nprobes=~0, .active_time=~0,
    .passive_time=~0, .home_time=~0, .nchans=0

ioctl_set_data("escan", IOCTL_WAIT, &scan_params, sizeof(scan_params));

After that command is sent, we should receive several responses in the form of events; at least one from each WiFi network in range. The scan event handler has to byte-swap any 16 or 32-bit values, since they are in ‘network’ byte-order (big-endian); the handler function was described in the previous part of this blog.

It isn’t unusual for the same network to be reported more than once, e.g.

8C:59:73:xx:xx:xx 'Post_Office' chan 3
E8:65:D4:xx:xx:xx 'Court Hotel' chan 1
8C:59:73:xx:xx:xx 'Post_Office' chan 3
00:11:22:xx:xx:xx 'Virginia' chan 6
20:B0:01:xx:xx:xx 'vodafone' chan 6
6A:A2:22:xx:xx:xx '[hidden]' chan 6
..and so on..

In the tests I have done, the total time from power-up to receiving the last scan entry is around 2.1 seconds, which is surprisingly fast, considering how much chip-initialisation has been required.

Joining a network

This requires a large number of IOCTL commands to set up the WiFi interface, and there is little point in my listing all of them here, so I’m concentrating on specific settings of interest.

  • Country: this is required in order to set domain-specific parameters. I’m taking the easy way out, and specifying a country code of ‘XX’, which is a common set of world-wide characteristics.
  • Multicast: there is one MAC address set to 01:00:5E:00:00:FB which is the standard for IP v4
  • Power saving: this is disabled by default, but can be compiled in if required, though it does significantly increase WiFi response times, as the device will sleep when idle, and takes some time to wake up & respond.
  • Authentication: this uses a WPA2 pre-shared key, stored in plaintext, which is a major weakness in network security.
  • Network name: the SSID is also stored as plaintext.

Once the network join has been initiated, we receive a stream of events to show progress. These can be viewed by calling set_display_mode with DISP_EVENT. A typical joining sequence might be:

Join secure network:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  ..then Rx_DATA for broadcast/multicast network traffic..

Automatic reassociation after joining a network:
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 14
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
 ..then Rx DATA flow continues..

Join open network (no security):
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  ..then Rx_DATA for broadcast/multicast network traffic..

SSID not found:
  Rx_EVT   0 SET_SSID,      flags 0, status 3, reason 0

Password incorrect:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   7 ASSOC,         flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 8, reason 15
  Rx_EVT  46 PSK_SUP,       flags 0, status 8, reason 14
  ..then the same sequence repeated..

The ‘status’ values are common to all the events:

  • 0: success
  • 3: no networks
  • 6: unsolicited
  • 8: partial

The ‘reason’ values are specific to an event, for example in PSK_SUP, 14 means that a de-authentication request has been received, and 15 indicates that a timeout of the pre-shared key handshake has occurred.

Also there is no guarantee that the events will arrive in this order; for example, when I tested on a different Access Point, the last 3 events were PSK_SUP, JOIN, and SET_SSID.

I have also tested the responses to network events:

Orderly shutdown of WiFi at the access point:
  Rx_EVT  12 DISASSOC_IND,  flags 0, status 0, reason 8
  Rx_EVT   3 AUTH,          flags 0, status 5, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT  16 LINK,          flags 0, status 0, reason 2

Restore WiFi after orderly shutdown:
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  ..then the data flow resumes..

Power-down of the access point:
  Rx_EVT  16 LINK,          flags 0, status 0, reason 1

Restore power to the access point:
  Rx_EVT  16 LINK,          flags 0, status 0, reason 1
  Rx_EVT  87 ASSOC_REQ_IE,  flags 0, status 0, reason 0
  Rx_EVT   3 AUTH,          flags 0, status 0, reason 0
  Rx_EVT  88 ASSOC_RESP_IE, flags 0, status 0, reason 0
  Rx_EVT   9 REASSOC,       flags 0, status 0, reason 0
  Rx_EVT  16 LINK,          flags 1, status 0, reason 0
  Rx_EVT  46 PSK_SUP,       flags 0, status 6, reason 0
  Rx_EVT   1 JOIN,          flags 0, status 0, reason 0
  ..then the data flow resumes..

Network unavailable on startup:
  Rx_EVT   0 SET_SSID,      flags 0, status 3, reason 0

Network becomes available after startup:

Try to join a secure network, using no security 
  Rx_EVT   0 SET_SSID,      flags 0, status 0, reason 0

So the good news is that the WiFi chip can automatically reconnect to the network under some circumstances, but the bad news is that it will not always reconnect, and I can find no single event showing if the device is connected or not. Rather than attempting to decode the events in detail, I’ve used an overall timeout for joining a network (default 10 seconds); if that fails there is a rest period (currently also 10 seconds) before the next re-connection attempt.

Example programs

There are two examples; see the introduction for details of how to re-build and run the code.

scan.c does a single scan, and returns a list of networks found. The result returned by the WiFi chip is displayed as-is, so may contain duplicates.

join.c joins a given network, reporting on progress; the network name and password must be entered in the source code:

#define SSID                "testnet"
#define PASSWD              "testpass"

The on-board LED flashes at 5 Hz prior to connection, and at 1 Hz when connected.

In the next part I’ll start using TCP/IP protocols.

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 3: IOCTLs and events

Part 2 described how the CYW43439 WiFi chip is initialised, but used an IOCTL call and an event check without explaining what these are, or how they work, so now is the time to rectify that deficiency.

An IOCTL (Input/Output Control) call is sent by the Pico host CPU (RP2040) to the ARM CPU in the WiFi chip, to read or write configuration data, or send a specific command. An event is an unsolicited block of data sent from the WiFi CPU to the host; it can be a notification that an action is complete, or some data that has arrived over the WiFi network.


A simple example of an IOCTL is a request for the 6-byte WiFi MAC address.

uint8_t mac[6];
ioctl_get_data("cur_etheraddr", 10, mac, 6);

This sends the IOCTL command GET_VAR, with a string to identify the item of interest, and a timeout in milliseconds.

#define WLC_GET_VAR 262

// Get data block from IOCTL variable
int ioctl_get_data(char *name, int wait_msec, uint8_t *data, int dlen)
    return(ioctl_cmd(WLC_GET_VAR, name, strlen(name)+1, wait_msec, false, data, dlen));

The request must be packed into a structure, for transmission the the WiFi CPU; this has 2 headers, the first is an ‘SDIO/SPI Bus Layer’ (SDPCM) header, followed by an IOCTL header:

// SDPCM header
typedef struct {
    uint16_t len,       // sdpcm_header.frametag
    uint8_t  seq,       // sdpcm_sw_header

// IOCTL header
typedef struct {
    uint32_t cmd;       // cdc_header
    uint16_t outlen,
    uint32_t flags,

// IOCTL command with SDPCM and IOCTL headers
typedef struct
    SDPCM_HDR sdpcm;
    IOCTL_HDR ioctl;
    uint8_t data[IOCTL_MAX_BLKLEN];

The first two 16-bit words of the SDPCM header contain the data length, and its bitwise inverse, then the most important fields are:

  • Chan: a number identifying which ‘channel’ is associated with the data: IOCTL channel is 0, event is 1, and data is 2.
  • Hdrlen: the length of the SDPCM header plus any padding. My code doesn’t use any padding, but the response from the WiFi chip often has a lot of padding.
  • Flow & Credit: used to track the WiFi buffer utilisation

This is followed by the IOCTL header, with a command number (262 for GET_VAR) and a data length value.

The whole message plus data is written to the SPI interface:

// Do an IOCTL transaction, get response
// Return 0 if timeout, -1 if error response
int ioctl_cmd(int cmd, char *name, int namelen, int wait_msec, int wr, void *data, int dlen)
    IOCTL_CMD *cmdp = &ioctl_txmsg.cmd;
    int txdlen = ((namelen + dlen + 3) / 4) * 4, ret = 0;
    int hdrlen = sizeof(SDPCM_HDR) + sizeof(IOCTL_HDR);
    int txlen = hdrlen + txdlen;

    memset(cmdp, 0, sizeof(ioctl_txmsg));
    cmdp->sdpcm.notlen = ~(cmdp->sdpcm.len = txlen);
    cmdp->sdpcm.seq = sd_tx_seq++;
    cmdp->sdpcm.chan = SDPCM_CHAN_CTRL;
    cmdp->sdpcm.hdrlen = sizeof(SDPCM_HDR);
    cmdp->ioctl.cmd = cmd;
    cmdp->ioctl.outlen = txdlen;
    cmdp->ioctl.flags = ((uint32_t)ioctl_reqid++ << 16) | (wr ? 2 : 0);
    if (namelen)
        memcpy(cmdp->data, name, namelen);
    if (wr && dlen>0)
        memcpy(&cmdp->data[namelen], data, dlen);
    wifi_data_write(SD_FUNC_RAD, 0, (void *)cmdp, txlen);
    ..continued below..

The code now waits for a response, but it is important to note that the first response it receives may be associated with a completely different request, or network data. So it is essential to check that the response matches the command, and if not, keep on checking for a matching response.

    ..continued from above..
    while (wait_msec>=0 && !(ret=ioctl_resp_match(cmd, data, dlen)))
        wait_msec -= IOCTL_POLL_MSEC;        
        usdelay(IOCTL_POLL_MSEC * 1000);

// Read an ioctl response, match the given command, any command if 0
// Return 0 if no response, -1 if error response
int ioctl_resp_match(int cmd, void *data, int dlen)
    int rxlen=0, n=0, hdrlen;
    IOCTL_MSG *rsp = &ioctl_rxmsg;
    IOCTL_HDR *iohp;
    if ((rxlen = event_read(rsp, 0, 0)) > 0)
        iohp = (IOCTL_HDR *)&rsp->data[rsp->cmd.sdpcm.hdrlen]; 
        hdrlen = rsp->cmd.sdpcm.hdrlen + sizeof(IOCTL_HDR);
        if (rsp->rsp.chan==SDPCM_CHAN_CTRL && 
            (cmd==0 || cmd==iohp->cmd))
            n = MIN(dlen, rxlen-hdrlen);
            if (data && n>0)
                memcpy(data, &rsp->data[hdrlen], n);
            if (cmd)
                if (iohp->status)
                    n = -1;
    return(cmd==0 ? rxlen : n>0 ? n : 0);

You’ll note that the response has been obtained using the ‘event_read’ function, which handles all incoming data (solicited or unsolicited) from the WiFi interface; it will be described in detail below.

The IOCTL response has a similar format to the request, except that it generally has a lot of padding after the SDPCM header. This means that (unlike the transmit message) the receiver has to decode the SDPCM header ‘hdrlen’ value, in order to know how much padding has been added in front of the IOCTL header.

In addition to the IOCTL GET_VAR call that reads the value of a variable, given its name as a string, and its partner SET_VAR that writes a new value to that variable, there nearly 300 other IOCTL calls, such as SET_ANTDIV (command 64) which controls the antenna diversity, or UP (command 2) which is used to activate the WiFi interface.


The WiFi chip signals an event when it has something to report to the host processor, for example it has succeeded in joining a WiFi network, or it has just received a data packet from that network.

As discussed above, there is a time-delay associated with any IOCTL command, so the IOCTL response might arrive within a stream of other events. So my code treats any incoming message as a potential event, and establishes its purpose by decoding the SDPCM header.

This raises the question of how the host CPU knows that there is an incoming event; the answer is that it can poll the BUS_SPI_STATUS_REG, to see if the ‘function 2 packet available’ flag is set. Alternatively, to avoid excessive polling cycles, the host can just check the IRQ line (described in part 1) and if that is high, there is an event pending. I use a combined approach; check the IRQ line, but is there hasn’t been any event for 10 milliseconds, check the status register:

#define SPI_STATUS_LEN_SHIFT            9
#define SPI_STATUS_LEN_MASK             0x7ff

// Get ioctl response, async event, or network data.
int event_get_resp(void *data, int maxlen)
    uint32_t val=0;
    int rxlen=0;
    val = wifi_reg_read(SD_FUNC_BUS, SPI_STATUS_REG, 4);
    if ((val != ~0) && (val & SPI_STATUS_PKT_AVAIL))
        rxlen = (val >> SPI_STATUS_LEN_SHIFT) & SPI_STATUS_LEN_MASK;
        rxlen = MIN(rxlen, maxlen);
        // Read event data if present
        if (data && rxlen>0)
            wifi_data_read(SD_FUNC_RAD, 0, data, rxlen);
        // ..or clear interrupt, and discard data
            val = wifi_reg_read(SD_FUNC_BUS, SPI_INTERRUPT_REG, 2);
            wifi_reg_write(SD_FUNC_BUS, SPI_INTERRUPT_REG, val, 2);
            wifi_reg_write(SD_FUNC_BAK, SPI_FRAME_CONTROL, 0x01, 1);

The status register has a flag to indicate data is available on function 2 (the radio interface), and also a length value, indicating how many bytes there are to read. Once that has been read in, the SDPCM header is checked, and the data after that header is copied into a buffer.

// Get ioctl response, async event, or network data
// Optionally copy data after SDPCM & BDC headers into a buffer, return its length
int event_read(IOCTL_MSG *rsp, void *data, int dlen)
    int rxlen=0, n=0, hdrlen;
    SDPCM_HDR *sdp=&rsp->cmd.sdpcm;
    BDC_HDR *bdcp;
    if ((rxlen = event_get_resp(rsp, sizeof(IOCTL_MSG))) >= sizeof(SDPCM_HDR)+sizeof(BDC_HDR))
        if ((sdp->len ^ sdp->notlen) == 0xffff)
            hdrlen = sdp->hdrlen;
            bdcp = (BDC_HDR *)&rsp->data[hdrlen];
            hdrlen += sizeof(BDC_HDR) + bdcp->offset*4;
            n = MIN(dlen, rxlen-hdrlen);
            if (data && n>0)
                memcpy(data, &rsp->data[hdrlen], n);
    return(dlen>0 ? (n>0 ? n : 0) : rxlen);

At the top of these function calls is the polling function, which stores the SDPCM values in a local structure (EVENT_INFO), and takes appropriate action with the data. The reason why a local structure is used is that the event header is in ‘network’ byte-order, which is big-endian (most-significant byte first), so the data is byte-swapped before being stored locally.

Since there may be multiple event handlers, and the the the polling function can’t know which one is the correct destination for the event, it calls each one in turn, stopping when one returns a non-zero value, indicating that it has accepted the event.

// Poll for async event, put results in info structure
int event_poll(void)
    EVENT_INFO *eip = &event_info;
    IOCTL_MSG *iomp = &ioctl_rxmsg;
    ESCAN_RESULT *erp=(ESCAN_RESULT *)rxdata;
    EVENT_HDR *ehp = &erp->eventh;
    int n = event_read(iomp, rxdata, sizeof(rxdata));
    if (n > 0)
        eip->chan = iomp->rsp.sdpcm.chan;
        eip->flags = SWAP16(ehp->flags);
        eip->event_type = SWAP32(ehp->event_type);
        eip->status = SWAP32(ehp->status);
        eip->reason = SWAP32(ehp->reason);
        eip->data = rxdata;
        eip->dlen = n;
        if (eip->chan == SDPCM_CHAN_CTRL)
            display(DISP_EVENT, "\n");
        else if ((eip->chan==SDPCM_CHAN_EVT || eip->chan==SDPCM_CHAN_DATA) &&
            n >= sizeof(ETHER_HDR)+sizeof(BCMETH_HDR)+sizeof(EVENT_HDR))
            ok = event_handle(eip);

Handling an event

The code calls handler functions in turn, until one returns a non-zero value, indicating it has accepted the event.

#define MAX_HANDLERS    10
typedef int (*event_handler_t)(EVENT_INFO *eip);
event_handler_t event_handlers[MAX_HANDLERS];
int num_handlers;

// Run event handlers, until one returns non-zero
int event_handle(EVENT_INFO *eip)
    int ret=0;
    for (int i=0; i<num_handlers && !ret; i++)
        ret = event_handlers[i](eip);

An event handler is called with a pointer to the EVENT_INFO structure, which basically contains a copy of the SDPCM header information (in the correct byte-order) and a pointer to the data after that header. The function must return zero if it hasn’t recognised the event. As an example, here is a simple handler that displays the result of a network scan:

// Handler for scan events
int scan_event_handler(EVENT_INFO *eip)
    ESCAN_RESULT *erp=(ESCAN_RESULT *)eip->data;
    int ret = eip->chan==SDPCM_CHAN_EVT && eip->event_type==WLC_E_ESCAN_RESULT;
    if (ret)
        if (erp->eventh.status == 0)
            printf("Scan complete\n");
            ret = -1;
            printf("%s '", mac_addr_str(erp->info.bssid));
            printf("' chan %d\n", SWAP16(erp->info.channel));

Note that the ESCAN_RESULT data is in ‘network’ byte-order, so needs to be byte-swapped before being displayed.

This handler has to be added to the array of handlers using a function call:


This allows you to implement your own event handlers, in addition to, or instead of, the functions I have provided.

Enabling events

There are over 140 possible events, and by default they are disabled; we need to enable those we are interested in, such as network authentication & joining, so we can detect any problems.

The enabling process uses a (very large) bitfield, each bit indicating whether an event is enabled or disabled; the resulting byte array is sent to the WiFi CPU using an IOCTL call.

#define EVENT_MAX           208
#define SET_EVENT(msk, e)   msk[4 + e/8] |= 1 << (e & 7)

uint8_t event_mask[EVENT_MAX / 8];

// Enable events
int events_enable(const EVT_STR *evtp)
    memset(event_mask, 0, sizeof(event_mask));
    while (evtp->num >= 0)
        if (evtp->num / 8 < sizeof(event_mask))
            SET_EVENT(event_mask, evtp->num);
    return(ioctl_set_data("bsscfg:event_msgs", 10, event_mask, sizeof(event_mask)));

I have used an unusual method to specify the events that are to be enabled; a macro is used to store the event number, and a string corresponding to the event name. This means that I can display event names (instead of numbers) on a diagnostic console, which is very useful to show any problems.

// Storage for event number, and string for diagnostics
typedef struct {
    int num;
    char *str;
#define EVT(e)      {e, #e}


In the next part of this project we’ll be scanning and joining a network.

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 2: initialisation

PicoWi initialisation steps

In part 1, I described the low-level hardware & software interface to the Broadcom / Cypress / Infineon CYW43439 WiFi chip, and its close relative, the CYW4343W.

Now we need to initialise the chip, so it is ready to receive network commands and data. This involves sending three files to the chip, and unfortunately there is no simple Application Programming Interface (API) to do this; it is necessary to send a detailed sequence of commands, and if there are any errors, you usually end up with a completely unresponsive WiFi chip.

The first step is to check the Active Low Power (ALP) clock; this involves setting a register, then waiting for the WiFi chip to acknowledge that setting.

bool wifi_init(void)
    wifi_reg_write(SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, SD_ALP_REQ, 1);
    if (!wifi_reg_val_wait(10, SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, 
                           SD_ALP_AVAIL, SD_ALP_AVAIL, 1))

This wait-and-loop scenario is quite common, so I’ve created a specific function for it:

// Check register value every msec, until correct or timeout
bool wifi_reg_val_wait(int ms, int func, int addr, uint32_t mask, uint32_t val, int nbytes)
    bool ok;
    while (!(ok=wifi_reg_read_check(func, addr, mask, val, nbytes)) && ms--)

// Read & check a masked value, return zero if incorrect
bool wifi_reg_read_check(int func, int addr, uint32_t mask, uint32_t val, int nbytes)
    return((wifi_reg_read(func, addr, nbytes) & mask) == val);

A value is obtained from the register, masked with an AND-function, then compared with the required value. If the comparison is false, the code delays for one millisecond, then tries again, until the given time (in milliseconds) has expired.

This raises the question as to what the code should do when it encounters an error such as this; should it try to re-send the command? In practice, the timeout generally means that the internal state of the chip is incorrect; for example, there may have been a bug in the code, or a power glitch, and the only way to correct this situation is to re-power the chip, and start again – fortunately the initialisation process is quite fast (it only takes a few seconds) so this isn’t a major problem.

Assuming the ALP check passes, there are some more register write cycles that I can’t explain in detail, as I don’t have access to any information about the chip that isn’t publicly available.

We then make 2 writes to registers in banked memory, that do deserve more explanation.

#define BAK_BASE_ADDR           0x18000000
#define SRAM_BASE_ADDR          (BAK_BASE_ADDR+0x4000)

wifi_bak_reg_write(SRAM_BANKX_IDX_REG, 0x03, 4);
wifi_bak_reg_write(SRAM_BANKX_PDA_REG, 0x00, 4);

The backplane function can only access a 32K block in the WiFi RAM, so the two addresses we’re writing (18004010 and 18004044 hex) are outside its access range, and we have to use bank-switching. There is a simple check to see if the bank has changed since the last access, in which case no switching is needed:

#define SB_32BIT_WIN    0x8000
#define SB_ADDR_MASK    0x7fff
#define SB_WIN_MASK     (~SB_ADDR_MASK)

// Set backplane window if address has changed
void wifi_bak_window(uint32_t addr)
    static uint32_t lastaddr=0;

    addr &= SB_WIN_MASK;
    if (addr != lastaddr)
        wifi_reg_write(SD_FUNC_BAK, BAK_WIN_ADDR_REG, addr>>8, 3);
    lastaddr = addr;

// Write a 1 - 4 byte value via the backplane window
int wifi_bak_reg_write(uint32_t addr, uint32_t val, int nbytes)
    return(wifi_reg_write(SD_FUNC_BAK, addr, val, nbytes));

We can now load the binary ARM firmware into the WiFi processor; it is in a file that is unique to the specific Wifi chip, so different files are needed for the CYW43439 and CYW4343w; these are the only differences in the way the chips are programmed.

const unsigned char fw_firmware_data[] = {
..and so on..
const unsigned int fw_firmware_len = sizeof(fw_firmware_data);

wifi_data_load(SD_FUNC_BAK, 0, fw_firmware_data, fw_firmware_len);

As previously mentioned, the backplane can only access a 32K block, and each SPI access to the backplane is limited to 64 bytes, so the data loading function walks though the memory in 64-byte blocks, and when 32K is reached, the access window is moved up, and the loading resumes at address 0.

#define MAX_BLOCKLEN    64

// Load data block into WiFi chip (CPU firmware or NVRAM file)
int wifi_data_load(int func, uint32_t dest, const unsigned char *data, int len)
    int nbytes=0, n;
    uint32_t oset=0;
    dest &= SB_ADDR_MASK;
    while (nbytes < len)
        if (oset >= SB_32BIT_WIN)
            oset -= SB_32BIT_WIN;
        n = MIN(MAX_BLOCKLEN, len-nbytes);
        wifi_data_write(func, dest+oset, (uint8_t *)&data[nbytes], n);
        nbytes += n;
        oset += n;

After a delay to allow the WiFi chip to settle, the next item to be loaded is the non-volatile RAM (NVRAM) data. This is in the form of a C character array, with each entry being null-terminated, e.g.

const unsigned char fw_nvram_data[0x300] = {
    "manfid=0x2d0"  "\x00"
    "prodid=0x0727" "\x00"
    "vendid=0x14e4" "\x00"
    ..and so on..
const unsigned int fw_nvram_len = sizeof(fw_nvram_data);

Loading this file uses the same function as the firmware, with a different base, and a write-cycle to confirm the file length:

#define NVRAM_BASE_ADDR     0x7FCFC

wifi_data_load(SD_FUNC_BAK, NVRAM_BASE_ADDR, fw_nvram_data, fw_nvram_len);
n = ((~(fw_nvram_len / 4) & 0xffff) << 16) | (fw_nvram_len / 4);
wifi_reg_write(SD_FUNC_BAK, SB_32BIT_WIN | (SB_32BIT_WIN-4), n, 4);

Now it is necessary to reset the WiFi processor core, wait for the indication that the High Throughput (HT) clock is available, then wait for an ‘event’ that signals the device is ready. The most common fault I experienced when developing the code was that it gets stuck at this point, waiting for a confirmation that never comes.

// Reset, and wait for High Throughput (HT) clock ready
if (!wifi_reg_val_wait(50, SD_FUNC_BAK, BAK_CHIP_CLOCK_CSR_REG, 
                              SD_HT_AVAIL, SD_HT_AVAIL, 1))
// Wait for backplane ready
if (!wifi_rx_event_wait(100, SPI_STATUS_F2_RX_READY))

Events are the main way that the WiFi chip sends signals or data asynchronously to the RP2040; for a detailed description of how they work, see the next part.

Once the system has signalled it is ready, the Country Locale Matrix (CLM) file has to be loaded. This binary file limits the WiFi parameters (e.g. transmit power level) to be within the regulatory constraints for the specific RF hardware and locale.

const unsigned char fw_clm_data[] = {
    ..and so on..
const unsigned int fw_clm_len = sizeof(fw_clm_data);

wifi_clm_load(fw_clm_data, fw_clm_len);

The loading function uses a specific structure to send IOCTL data blocks to the WiFi chip, with flags to mark the beginning & end of the sequence:

#define MAX_LOAD_LEN        512

typedef struct {
	uint16_t flag;
	uint16_t type;
	uint32_t len;
	uint32_t crc;

typedef struct {
    char req[8];
    CLM_LOAD_HDR hdr;

// Load CLM
int wifi_clm_load(const unsigned char *data, int len)
    int nbytes=0, oset=0, n;
    CLM_LOAD_REQ clr = {.req="clmload", .hdr={.type=2, .crc=0}};
    while (nbytes < len)
        n = MIN(MAX_LOAD_LEN, len-nbytes);
        clr.hdr.flag = 1<<12 | (nbytes?0:2) | (nbytes+n>=len?4:0);
        clr.hdr.len = n;
        ioctl_set_data2((void *)&clr, sizeof(clr), 1000, (void *)&data[oset], n);
        nbytes += n;
        oset += n;

Example program

To show all this code in action, we can run our first complete program;

// PicoWi blinking LED test

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include "picowi_pico.h"
#include "picowi_spi.h"
#include "picowi_init.h"

int main() 
    uint32_t led_ticks;
    bool ledon=false;
    printf("PicoWi LED blink\n");
    if (!wifi_setup())
        printf("Error: SPI communication\n");
    else if (!wifi_init())
        printf("Error: can't initialise WiFi\n");
        ustimeout(&led_ticks, 0);
        while (1)
            if (ustimeout(&led_ticks, 500000))
                wifi_set_led(ledon = !ledon);

This sets up the WiFi chip as described above, prints the 6-byte MAC address, then just loops, flashing the LED that is attached to the Wifi chip at 1 Hz.

The display_mode function controls how much diagnostic information you might want to see, using a bitfield so you can combine multiple options:

// Display mask values
#define DISP_NOTHING    0       // No display
#define DISP_INFO       0x01    // General information
#define DISP_SPI        0x02    // SPI transfers
#define DISP_REG        0x04    // Register read/write
#define DISP_SDPCM      0x08    // SDPCM transfers
#define DISP_IOCTL      0x10    // IOCTL read/write
#define DISP_EVENT      0x20    // Event reception
#define DISP_DATA       0x40    // Data transfers

This can potentially provide a lot of diagnostic information, the main limitation being the speed of the console display – I use a serial link at 460800 baud, since the default of 9600 is much too slow. To see some of the internal workings of PicoWi, try:


You can call this function multiple times with different mode values, to concentrate the diagnostic information on a specific area of interest, and avoid displaying a lot of unwanted information while the WiFi chip is being initialised.

The other unusual feature is the use of the ustimeout function, which I’ve used it in place of the more conventional delay function call, as I don’t want the delay to block all other CPU activity. In a simple program this isn’t an issue, but in later examples I want to do other things (such as checking for events) while waiting for the LED to blink, so can’t use a simple delay.

The ustimeout function takes two arguments; a pointer to a variable, and a timeout value in microseconds (zero if immediate). When the specified time has elapsed, the function returns a non-zero value and reloads the variable with the current time. So you can add extra function calls to the main loop, without affecting the LED blinking.

The code to control the LED uses a single Device I/O Control (IOCTL) call with 2 arguments; the first is a bit-mask, and the second is the value:

#define SD_LED_GPIO     0

// Set WiFi LED on or off
void wifi_set_led(bool on)
    ioctl_set_intx2("gpioout", 10, 1<<SD_LED_GPIO, on ? 1<<SD_LED_GPIO : 0);

For details on how to build & run this example program, see the introduction.

IOCTL calls are the primary mechanism for high-level communication with the WiFi chip; see the next part for a detailed description.

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi part 1: low-level interface

Pi Pico W wireless architecture

The WiFi interface on the Pico W uses the Broadcom/Cypress/Infineon CYW43439; this is a ‘full’ Media Access and Control (MAC) chip, so in theory you can just tell it to join a network, or send a block of data, and it’ll handle all the low-level operations.

However, in practice there is a lot more complication than that, and it takes a very large number of carefully-timed commands before the chip will start up, let alone do anything useful. This is because it actually contains two processors (ARM M3 and D11), each with their own memory and I/O, and they both have to be programmed before any network operations can start.

The CYW43439 is part of a large family of 43xxx wireless interfaces; most use PCI, USB or SDIO interface for communications with a host processor, but in this case communications is via a Serial Peripheral Interface (SPI) that is half-duplex, i.e. a single wire is used to carry commands & data to the WiFi chip, and also the responses from that chip, as described in the device datasheet.

SPI interface

Pi Pico-W interface to CYW43439

This excerpt from the Pico-W circuit diagram shows the interface between the CYW43439 WiFi chip and the RP2040 CPU. The connections on the WiFi chip are labelled as if there were an SDIO interface, since they are dual-function:

SDIO functionSPI functionRP2040 pin
WL_REG_ONPower up (ON)GP23
SDIO_DATA0Data out (MISO)GP24 via 470R
SDIO_DATA1Interrupt (IRQ)GP24 via 10K
SDIO_DATA2Mode select (SEL)GP24
SDIO_DATA3Chip select (CS)GP25

On chips with dual interfaces, the state of DATA2 at power-up determines which interface is to be used; for SPI, this pin must be held low, before REG_ON is set high to power up the chip.

A single data line is shared between CMD for commands & data going to the WiFi chip, and DATA0 for the returned responses. Just in case there is a clash of I/O (e.g. both the CPU and WiFi chip transmitting at the same time) there is a 470 ohm protection resistor in series with DATA0.

The chip-select line is as usual for SPI interfaces; when it is high, the WiFi interface is disconnected from the data lines. This allows the over-worked data line to be used for a third purpose, namely interrupt request (IRQ) to the RP2040 CPU; when the interface is idle, IRQ is normally low, but goes high when the WiFi chip has some data to send (e.g. a new data packet has been received). To ensure that the IRQ line doesn’t interfere with communications, it is connected via a 10K resistor.

Debug with CYW4343W

When developing this software, there was a major problem; the Pico-W components and PCB tracks are so fine that I couldn’t attach an oscilloscope or logic analyser to the SPI connections. This makes debugging the low-level drivers very difficult, especially when programming the PIO peripheral in the RP2040.

The solution I adopted was to add a second CYW43xxx interface to the Pico; unfortunately I couldn’t find a convenient CYW43439 module, but Avnet sell an FPGA add-on board (part number AES-PMOD-MUR-1DX-G) with a Murata 1DX module containing a CYW4343W. This is sufficiently similar to the 43439 that no code modifications are required, just a different firmware file, as described in part 2 of this post.

Circuitry to add Murata 1DX (CYW4343W) module to Pi Pico

I’ve had to tweak the resistor values slightly; this is because D0 – D4 pins on the module have 10K pullup resistors, so R2 has to be lower to compensate.

The choice of GPIO pins is completely arbitrary, since the code can work with any pin taking any function; I chose D0 – D3 as GP16 – GP19, since it will allow me to experiment with an SDIO interface at a future date. If you are only interested in emulating the standard Pico-W interface, then the connections to GP16 – GP18 can be omitted, since the SPI select line (D2) has a pull-down resistor.

The resulting circuitry fits very neatly onto a Pico prototyping board; by keeping the connections short, it works fine with SPI speeds up to 16 MHz. The only problems I encountered in constructing the hardware were that the PMOD connector has an unusual pin-numbering, and is mis-labelled as Bluetooth.

Pi Pico with Murata 1DX (CYW4343W) add-on module

Using this setup, it is easy to capture the SPI waveforms; here is an oscilloscope trace using a (relatively leisurely) 2 MHz clock for clarity.

SPI transfer waveforms

This shows a 4-byte command on the MOSI line, followed by a 4-byte MISO response, which also appears on the MOSI line, due to the 470 ohm resistor linking the two.

SPI software

Normally we’d just use the RP2040 built-in SPI controller to access the WiFi chip, but that has specific sets of pins it can use, which differ from those that are connected to the chip. This isn’t a major problem, as we can use the built-in Programmble I/O (PIO) to do the transfers, but initially I’d just like to check that the hardware works, before diving into PIO programming. So the first step is to write a ‘bit-bashed’ driver, that uses direct access to the I/O bits.

The write-cycle is quite conventional, where a bit (most-significant bit first) is put onto the data line, and the clock is toggled high then low:

// Write data to SPI interface
void spi_write(uint8_t *data, int nbits)
    uint8_t b=0;
    int n=0;

    io_mode(SD_CMD_PIN, IO_OUT);
    b = *data++;
    while (n < nbits)
        IO_WR(SD_CMD_PIN, b & 0x80);
        IO_WR(SD_CLK_PIN, 1);
        b <<= 1;
        if ((++n & 7) == 0)
            b = *data++;
        IO_WR(SD_CLK_PIN, 0);
    io_mode(SD_CMD_PIN, IO_IN);

If the command is a data-read, the data is read immediately, then the clock is toggled high and low. This is unusual, and can cause confusion in some protocol decoders:

// Read data from SPI interface
void spi_read(uint8_t *data, int nbits)
    uint8_t b;
    int n=0;

    while (n < nbits)
        b = IO_RD(SD_DIN_PIN);
        IO_WR(SD_CLK_PIN, 1);
        if ((n++ & 7) == 0)
            *++data = 0;        
        *data = (*data << 1) | b;
        IO_WR(SD_CLK_PIN, 0);

A write-cycle to the WiFi chip involves creating a command message as described in the data sheet, setting chip-select (CS) low, and transferring that message, followed by the data.

#define SWAP16_2(x) ((((x)&0xff000000)>>8) | (((x)&0xff0000)<<8) | \
                    (((x)&0xff00)>>8)      | (((x)&0xff)<<8))
#define SD_FUNC_BUS         0
#define SD_FUNC_BAK         1
#define SD_FUNC_RAD         2
#define SD_FUNC_SWAP        4
#define SD_FUNC_MASK        (SD_FUNC_SWAP - 1)

typedef struct
    uint32_t len:11, addr:17, func:2, incr:1, wr:1;

// SPI message
typedef union
    SPI_MSG_HDR hdr;
    uint32_t vals[2];
    uint8_t bytes[2048];

// Write a data block using SPI
int wifi_data_write(int func, int addr, uint8_t *dp, int nbytes)
    SPI_MSG msg={.hdr = {.wr=1, .incr=1, .func=func&SD_FUNC_MASK,
                         .addr=addr, .len=nbytes}};
    if (func & SD_FUNC_SWAP)
        msg.vals[0] = SWAP16_2(msg.vals[0]);
    io_out(SD_CS_PIN, 0);
    if (nbytes <= 4)
        memcpy(&msg.bytes[4], dp, nbytes);
        spi_write((uint8_t *)&msg, 64);
        spi_write((uint8_t *)&msg, 32);
        spi_write(dp, nbytes*8);
    io_out(SD_CS_PIN, 1);

The command header contains:

  • Length: byte-count of data
  • Address: location to receive the data
  • Function number: destination for the transfer
  • Increment: flag to enable address auto-increment
  • Write: flag to indicate a write-cycle

The unusual item is the function number, that selects which peripheral within the WiFi chip will receive the data; a value of 0 selects the SPI interface, 1 the backplane, and 2 the radio. Functions 0 & 1 are limited to a maximum size of 64 bytes, and are generally used for device configuration, whilst function 2 is used for transferring network data, which can be up to 2048 bytes. I’ve also created a dummy function 4, which is used to signal that a word-swap is required, when the chip is uninitialised.

The read cycle has a similar structure:

// Read data block using SPI
int wifi_data_read(int func, int addr, uint8_t *dp, int nbytes)
    SPI_MSG msg={.hdr = {.wr=0, .incr=1, .func=func&SD_FUNC_MASK,
                         .addr=addr, .len=nbytes}};
    uint8_t data[4];

    if (func & SD_FUNC_SWAP)
        msg.vals[0] = SWAP16_2(msg.vals[0]);
    else if (func == SD_FUNC_BAK)
        msg.hdr.len += 4;
    io_out(SD_CS_PIN, 0);
    spi_write((uint8_t *)&msg, 32);
    io_mode(SD_CMD_PIN, IO_IN);
    if (func == SD_FUNC_BAK)
        spi_read(data, 32);
    spi_read(dp, nbytes*8);
    io_mode(SD_CMD_PIN, IO_OUT);
    io_out(SD_CS_PIN, 1);

When making a ‘backplane’ read, the first 4 return bytes are discarded; they are padding to give the remote peripheral time to respond.

Now that we have the necessary read/write functions, we can perform a simple check to see if the WiFi chip is responding. The data sheet describes several ‘gSPI registers’ and the ‘test read-only’ register at address 0x14 has the defined constant 0xFEEDBEAD. The first attempt to read this register generally fails, but subsequent reads should return the desired value:

#define SPI_TEST_VALUE 0xfeedbead
bool ok=0;
for (int i=0; i<4 && !ok; i++)
    val = wifi_reg_read(SD_FUNC_BUS_SWAP, 0x14, 4);
    ok = (val == SPI_TEST_VALUE);
if (!ok)
    printf("Error: SPI test pattern %08lX\n", val);

Next we need to configure the SPI interface to our preferences, using register 0, as described in the datasheet. The main change is to eliminate the awkward byte-swapping, but to do that, we need to send a byte-swapped command:

// Write a register using SPI
int spi_reg_write(int func, uint32_t addr, uint32_t val, int nbytes)
    if (func&SD_FUNC_SWAP && nbytes>1)
        val = SWAP16_2(val);
    return(wifi_data_write(func, addr, (uint8_t *)&val, nbytes));

wifi_reg_write(SD_FUNC_BUS_SWAP, SPI_BUS_CONTROL_REG, 0x204b3, 4);

Now we can re-read the test register without the awkward byte-swapping:

wifi_reg_read(SD_FUNC_BUS, 0x14, 4);

Another parameter we’ve set is ‘high-speed mode’, which means that reading & writing occur on the rising clock edge.

Using RP2040 PIO

To maximise the speed of SPI transfers, we need to use a peripheral within the RP2040 CPU. Normally this would be an SPI controller, but this can not control the pins that are connected to the WiFi chip, so we have to use the Programmable I/O (PIO) peripheral instead.

There are plenty of online tutorials explaining how PIO works; it is basically a small state-machine that is programmed in assembly-language. It operates in a highly deterministic fashion, at a rate of up to 125M instructions per second, so is ideally suited to handling the SPI interface.

I wanted to use the PIO as a direct replacement for the bit-bashed spi_read and spi_write functions described above, so the PIO program is:

; Pico PIO program for half-duplex SPI transfers
.program picowi_pio
.side_set 1
.origin 0
public stall:               ; Stall here when transfer complete
    pull            side 0  ; Get byte to transmit from FIFO
    nop             side 0  ; Idle with clock low
    in pins, 1      side 0  ; Fetch next Rx bit
    out pins, 1     side 0  ; Set next Tx bit 
    nop             side 1  ; Idle high
    jmp !osre loop1 side 1  ; Loop if data in shift reg
    push            side 0  ; Save Rx byte in FIFO

The origin statement ensures the program is loaded at address zero, rather than the default, which is to load it at the top of program memory.

The ‘pull’ instruction fetches an 8-bit value from the transmit first-in first-out (FIFO) buffer, then there is a loop to output & input the individual bits of that byte until the transmit shift register is empty, and the receive register is full, so the latter can be pushed onto the receive FIFO.

So for SPI write, the associated C code just needs to keep the 4-entry transmit FIFO topped up with the outgoing data, and discard the incoming data, so the receive FIFO doesn’t overflow.

static PIO my_pio = pio0;
uint my_sm = pio_claim_unused_sm(my_pio, true);
io_rw_8 *my_txfifo = (io_rw_8 *)&my_pio->txf[0];

// Write data block to SPI interface,
// When complete, set data pin as I/P (so it is available for IRQ)
void pio_spi_write(unsigned char *data, int len)
    pio_sm_clear_fifos(my_pio, my_sm);
    while (len)
        if (!pio_sm_is_tx_fifo_full(my_pio, my_sm))
            *my_txfifo = *data++;
            len --;
        if (!pio_sm_is_rx_fifo_empty(my_pio, my_sm))
            pio_sm_get(my_pio, my_sm);
    while (!pio_sm_is_tx_fifo_empty(my_pio, my_sm) || !pio_complete())
        while (!pio_sm_is_rx_fifo_empty(my_pio, my_sm))
            pio_sm_get(my_pio, my_sm);
    pio_sm_get(my_pio, my_sm);

The SPI read code fills the transmit FIFO with null bytes, and fetches the incoming data from the receive FIFO:

// Read data block from SPI interface
void pio_spi_read(unsigned char *data, int rxlen)
    int txlen=rxlen;
    pio_sm_clear_fifos(my_pio, my_sm);
    while (rxlen > 0 || !pio_complete())
        if (txlen>0 && !pio_sm_is_tx_fifo_full(my_pio, my_sm))
            *my_txfifo = 0;
        if (!pio_sm_is_rx_fifo_empty(my_pio, my_sm))
            *data++ = pio_sm_get(my_pio, my_sm);

Since the reading of a WiFi register involves an SPI write cycle closely followed by a read cycle, it is important that the write cycle is complete before the read cycle starts. This issue proved to be the biggest problem with the PIO code; it is easy to detect when the transmit FIFO is empty, but the code must carry on waiting until the last bit of the last byte has been shifted out. This means that I have to use an explicitly-coded loop in the assembly language, with a check of shift-register-empty, rather than using the auto-load capability of the input & output instructions.

The other tricky issue was how the assembly-language program should signal to the C program that the output-shift is complete. In theory, I can use an IRQ flag to do this signalling, but in practice I could not make that work reliably – the technique would only work at specific clock frequencies, which suggested that there might be a critical race between the two sets of code. The problem with timing-sensitive code is that a small unrelated change to the main program (e.g. addition of an interrupt) can cause the code to fail in a manner that is very difficult to diagnose, so it is essential that the code works reliably over a wide range of SPI frequencies.

The solution I adopted is encapsulated in the pio_complete function:

// Check to see if PIO transfer complete (stalled at FIFO pull)
static inline int pio_complete(void)
    return(my_pio->sm[my_sm].addr == picowi_pio_offset_stall);

This compares the current PIO execution address with the ‘stall’ label in the PIO code; if the transmit FIFO is empty, and this comparison is true, then the PIO is stalled waiting for more data, having shifted out everything it was given.

In a single-threaded program, it won’t be too difficult to keep the transmit FIFO topped up, and the receive FIFO emptied, so there is no risk of an overflow or underflow causing problems. However, this is more difficult when the program is multi-tasking, so it will be necessary to add Direct Memory Access (DMA) transfers to the current code.

In the next part we’ll initialise the WiFi chip.

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

PicoWi: standalone WiFi driver for the Pi Pico W


Raspberry Pi Pico W

The aim of this project is to provide a fast WiFi driver for the CYW43439 chip on the Pi Pico W module, with C code running on the RP2040 processor; it can also be used with similar Broadcom / Cypress / Infineon chips that have an SPI interface, such as the CYW4343W.

It is based on my Zerowi project which performs a similar function on the Pi Zero device CYW43438, which has an SDIO interface. However, due to myriad difficulties getting the code running, the code has been restructured and simplified to emphasise the various stages in setting up the chip, and to provide copious run-time diagnostics.

The structured approach of the WiFi drivers is mirrored in the example programs, and the individual parts of this blog; they range from a simple LED-flash program, to one that provides some TCP/IP functionality.

A major problem with debugging Pico-W code is the difficulty attaching any hardware diagnostic tools, such as an oscilloscope or logic analyser; this has been addressed by supporting an add-on board with a Murata 1DX module and CYW4343W chip; full details are given in part 1 of this project.

Development environment

For simplicity, I use a Raspberry Pi 4 to build the code and program the Pico, with two I/O lines connected to the Pico SWD interface. This is really easy to set up, using a single script that installs the SDK and all the necessary software tools on the Pi 4:

wget https://raw.githubusercontent.com/raspberrypi/pico-setup/master/pico_setup.sh
chmod +x pico_setup.sh

For SWD programming, the Pico must be connected to the I/O pins on the Pi as follows:

Pico SWCLK   Pi pin 22 (GPIO 25)
Pico GND     Pi pin 20
Pico SWDIO   Pi pin 18 (GPIO 24)

The serial interface is used extensively for displaying diagnostic information; pin 1 of the Pico is the serial output, and pin 3 is ground. I use a 3.3 volt FTDI USB-serial adaptor to display the serial output on a PC, but the Pi 4 serial input could be used instead.

If you want to develop using a Windows PC, you can find several online guides as to how this might be set up, but I experienced problems with 2 of the methods I tried. So far, the only good setup for Windows I’ve found is VisualGDB, which has a wide range of very useful features, but is not free.

The PicoWi source code is on github. To load it on the Pi 4:

cd ~
git clone http://github.com/jbentham/picowi
cd ~/picowi/build
chmod +x prog
chmod +x reset

I’ve included the necessary CMakeLists.txt, so the project can be built using the following commands:

cd ~/picowi/build
cmake ..      # create the makefiles
make picowi   # make the picowi library
make blink    # make the 'blink' application
./prog blink  # program the RP2040, and run the application

When building the current pico-SDK, there are some ‘parameter passing’ warnings when the pio assembler is compiled; these can be ignored.

Compilation is reasonably fast on the Pi 4; once the SDK libraries have been built, you can do a complete re-build of the PicoWi library and application within 10 seconds, and reprogram the RP2040 in under 3 seconds.

The ‘reset’ command is useful when you just want to restart the Pico, without loading any new code.

If OpenOCD reports ‘read incorrect DLIPDR’ then there is a problem with the wiring. I’ve set the SWD speed to 2 MHz, which should work error-free, providing the wires are sufficiently short (e.g. under 6 inches or 150 mm) and there are good power & ground connections between the Pi & Pico. I use a short USB cable to power the Pico from the Pi, and this is generally problem-free, though sometimes the Pi won’t boot with the Pico connected; this appears to be a USB communication problem.

Compile-time settings

There is currently only one setting in the CMakeLists.txt file, to choose between the on-board CYW43439 device, or an external CYW4343W module:

# Set to 0 for Pico-W CYW43439, 1 for Murata 1DX (CYW4343W)
set (CHIP_4343W 0)

The Pico-specific settings are in picowi_pico.h:

#define USE_GPIO_REGS   0       // Set non-zero for direct register access
                                // (boosts SPI from 2 to 5.4 MHz read, 7.8 MHz write)
#define SD_CLK_DELAY    0       // Clock on/off delay time in usec
#define USE_PIO         1       // Set non-zero to use Pico PIO for SPI
#define PIO_SPI_FREQ    8000000 // SPI frequency if using PIO

These affect the way the SPI interface is driven; the default is to use the Pico PIO (programmable I/O) with the given clock frequency; 8 MHz is a conservative value, I have run it at 12MHz, and higher speeds should be possible with some tweaking of the I/O settings.

Setting USE_PIO to zero will enable a ‘bit-bashed’ (or ‘bit-banged’) driver; this can run over 7 MHz if using direct register writes, or 2 MHz if using normal function calls.

You’ll note that I haven’t included a driver for the SPI peripheral inside the RP2040; this would have been easier to use than the PIO peripheral, but the on-board CYW43439 chip isn’t connected to suitable I/O pins. The actual pins used are defined in picowi_pico.h:

#define SD_ON_PIN       23
#define SD_CMD_PIN      24
#define SD_DIN_PIN      24
#define SD_D0_PIN       24
#define SD_CS_PIN       25
#define SD_CLK_PIN      29
#define SD_IRQ_PIN      24

You’ll see that pin 24 is performing multiple functions; this hardware configuration is discussed in detail in the next part of this blog. If you are using an external module, the pin definitions can be modified to use any of the RP2040 I/O pins.

Diagnostic settings

My code makes extensive use of a serial console for diagnostic purposes, and I generally use an FTDI USB-serial adaptor connected to pin 1 of the Pico module to monitor this. In view of the large amount of information that can be produced, I modify the serial interface to run at 460800 baud:

Edit pico-sdk/src/rp2_common/hardware_uart/include/hardware/uart.h
Change definition to:

I originally tried running the interface at 921600 baud, but this resulted in occasional characters being lost, which is a major problem when trying to understand what is going wrong with the code.

You can use the Pico USB link instead; it must be enabled in the CMakeLists.txt, using the name of the main file, for example to enable it for the ‘ping’ example program:

pico_enable_stdio_usb(ping 1)

Then you can use a terminal program such as minicom on the Pi 4 to view the console:

# Run minicom, press ctrl-A X to exit. 
minicom -D /dev/ttyACM0 -b 460800

A disadvantage of this approach is that when the Pico is reprogrammed, its CPU is reset, which causes a failure of the USB link. After a few seconds, the link is re-established, but there will be a gap in the console display, which can be misleading. Also, there is the possibility that the extra workload of maintaining a (potentially very busy) USB connection might cause timing problems, so if you are making extensive use of the diagnostics, it might be necessary to use a hard-wired serial interface.

You can control the extent to which diagnostic data is reported on the console; this is done by inserting function calls, rather than using compile-time definitions, to give fine-grained control. The display options are in a bitfield, so can be individually enabled or disabled, for example:

// Display SPI traffic details

// Display nothing

// Display ARP and ICMP data transfers

WiFi network

For the time being, the code does not support the Access Point functionality within the WiFi chip. It can only join a network that is unencrypted, or with WPA1 or WPA2 encryption, as set in the file picowi_join.h:

// Security settings: 0 for none, 1 for WPA_TKIP, 2 for WPA2
#define SECURITY            2

The network name (SSID) and password are defined in the ‘main’ file for each application, e.g. join.c or ping.c, which means they are insecure, as they can be seen by anyone with access to the source code or binary executable:

// Insecure password for test purposes only!!!
#define SSID                "testnet"
#define PASSWD              "testpass"

Other resources

The data sheets for the CYW43439 and CYW4343W are well worth a read, as they contain a good description of the low-level SPI interface, but contain nothing on the inner workings of these incredibly complicated chips. The Infineon WICED development environment has very comprehensive coverage of the WiFi chips, though it would take some work to port this code to the RP2040. The Pi Pico SDK contains the full source code to drive the CYW43439, with the lwIP (lightweight IP) open-source TCP/IP stack.

I’m using a different approach, with a completely new low-level driver, and a built-in IP stack to maximise throughput, as described in the following parts:

Project links
IntroductionProject overview
Part 1Low-level interface; hardware & software
Part 2Initialisation; CYW43xxx chip setup
Part 3IOCTLs and events; driver communication
Part 4Scan and join a network; WPA security
Part 5ARP, IP and ICMP; IP addressing, and ping
Part 6DHCP; fetching IP configuration from server
Part 7DNS; domain name lookup
Part 8UDP server socket
Source codeFull C source code

I’ll be releasing updates with more TCP/IP functionality.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.

EDLA part 3: browser display and Python API for remote logic analyser

This is the third part of a 3-part blog post describing a low-cost WiFi-based logic analyser, that can be used for monitoring equipment in remote or hazardous locations. Part 1 described the hardware, part 2 the unit firmware, now this post describes the Web interface that controls the logic analyser units, and displays the captured data, also a Python class that can be used to remote-control the units for data analysis.

In a previous post, I experimented with shader hardware (via WebGL) for quickly displaying the logic analyser traces in a Web page. Whilst this technique can provide really fast display updates, there were some browser compatibility problems, and also a pure-javascript version proved to be fast enough, given that the main constraint is the time taken to transfer the data over the network.

So the current solution just used HTML and Javascript, with no hardware acceleration.

Network topology

REMLA network topology

In part 2, I described how the analyser units return data in response to Web page requests; the status information is in the form of a JSON string, and the sample data is Base64 encoded. So each unit has a built-in Web server, and it is tempting to load the HTML display files onto them. However, I chose not to do that, for the following reasons:

  • The analyser units use microcontrollers with finite resources, and not much spare storage space.
  • Every time the display software is updated, it would have to be loaded onto all the units individually.
  • It is easier to keep a single central server up-to-date with all the necessary security & access control measures.

So I’m assuming that there is a Web server somewhere on the system that serves the display file, and any necessary library files. This is a bit inconvenient for development, so when debugging I run a Web server on my development PC, for example using Python 3:

python -m http.server 8000

This launches a server on port 8000; if the display file is in a subdirectory ‘test’, its URL would look like:

There is also a question how the display program knows the addresses of the units, so it can access the right one. I had intended to use Multicast DNS (MDNS) for this purpose, but it proved to be a bit unreliable, so I assigned static IP addresses to the units instead.

Data display

The waveforms are drawn as vectors (as opposed to bitmaps), so the display can be re-sized to suit any size of screen. There are two basic drawing methods that can be used: an HTML canvas, or SVG (Scalable Vector Graphics). After some experimentation, I adopted the former, as it seemed to be a more flexible solution; the canvas is just an area of the screen that responds to simple line- and text-drawing commands, for example to draw & label the display grid:

var ctx1 = document.getElementById("canvas1").getContext("2d");

// Draw grid in display area
function drawGrid(ctx) {
  var w=ctx.canvas.clientWidth, h=ctx.canvas.clientHeight;
  var dw = w/xdivisions, dh=h/ydivisions;
  ctx.fillStyle = grid_bg;
  ctx.fillRect(0, 0, w, h);
  ctx.lineWidth = 1;
  ctx.strokeStyle = grid_fg;
  ctx.strokeRect(0, 1, w-1, h-1);
  for (var n=0; n<xdivisions; n++) {
    var x = n*dw;
    ctx.moveTo(x, 0);
    ctx.lineTo(x, h);
    ctx.fillStyle = 'blue';
    if (n)
        drawXLabel(ctx, x, h-5);
    for (var n=0; n<ydivisions; n++) {
      var y = n*dh;
      ctx.moveTo(0, y);
      ctx.lineTo(w, y);

Drawing the logic traces uses a similar method; begin a path, add line drawing commands to it, then invoke the stroke method.


The various control buttons and list boxes need to be part of a form, to simplify the process of sending their values to the analyser unit. So they are implemented as pure HTML:

  <form id="captureForm">
      <select name="unit" id="unit" onchange="unitChange()">
        <option value=1>1</option><option value=2>2</option><option value=3>3</option>
        <option value=4>4</option><option value=5>5</option><option value=6>6</option>
      <button id="load" onclick="doLoad()">Load</button>
      <button id="single" onclick="doSingle()">Single</button>
      <button id="multi" onclick="doMulti()">Multi</button>
      <label for="simulate">Sim</label>
      <input type="checkbox" id="simulate" name="simulate">
..and so on..

To update the parameters on the unit, they are gathered from the form, and sent along with an optional command, e.g. cmd=1 to start a capture.

// Get form parameters
function formParams(cmd) {
  var formdata = new FormData(document.getElementById("captureForm"));
  var params = [];
  for (var entry of formdata.entries()) {
    params.push(entry[0]+ '=' + entry[1]);
  if (cmd != null)
    params.push("cmd=" + cmd);
  return params;

// Get status from unit, optionally send command
function get_status(cmd=null) {
  http_request = new XMLHttpRequest();
  http_request.addEventListener("load", status_handler);
  http_request.addEventListener("error", status_fail);
  http_request.addEventListener("timeout", status_fail);
  var params = formParams(cmd), statusfile=remote_ip()+'/'+statusname;
  http_request.open( "GET", statusfile + "?" + encodeURI(params.join("&")));
  http_request.timeout = 2000;

The result of this HTTP request is handled by callbacks, for example if the request fails, there is a retry mechanism:

// Handle failure to fetch status page
function status_fail(e) {
  var evt = e || event;
  if (retry_count < RETRIES) {
    addStatus(retry_count ? "." : " RETRYING")
  else {

This mechanism was found to be necessary since very occasionally the remote unit fails to respond, for no apparent reason; if there is a real reason (e.g. it has been powered down) then the transfer is halted after 3 attempts.

If the status information has been returned OK, then a suitable action is taken; if a capture has been triggered, and the status page indicates that the capture is complete, then the data is fetched:

// Decode status response
function status_handler(e) {
  var evt = e || event;
  var remote_status = JSON.parse(evt.target.responseText);
  var state = remote_status.state;
  if (state != last_state) {
    last_state = state;
  if (state==STATE_IDLE || state==STATE_PRELOAD || state==STATE_PRETRIG || state==STATE_POSTTRIG) {
    repeat_timer = setTimeout(get_status, 500);
  else if (remote_status.state == STATE_READY) {
  else {

Fetching data

Fetching the data is similar to fetching the status page, since it is a text file containing base64-encoded bytes. The callback converts the text into bytes, then pairs of bytes into an array of numeric values:

// Read captured data (display is done by callback)
function loadData() {
  dispStatus("Reading from " + remote_ip());
  http_request = new XMLHttpRequest();
  http_request.addEventListener("progress", capfile_progress_handler);
  http_request.addEventListener( "load", capfile_load_handler);
  var params = formParams(), capfile=remote_ip()+'/'+capname;
  http_request.open( "GET", capfile + "?" + encodeURI(params.join("&")));

// Display data (from callback event)
function capfile_load_handler(event) {
  sampledata = getData(event.target.responseText);
  if (command == CMD_MULTI)

// Get data from HTTP response
function getData(resp) {
  var d = resp.replaceAll("\n", "");
  return strbin16(atob(d));

// Convert string of 16-bit values to binary array
function strbin16(s) {
  var vals = [];
  for (var n=0; n<s.length;) {
    var v = s.charCodeAt(n++);
    vals.push(v | s.charCodeAt(n++) << 8);
  return vals;

It is probable that this process could be streamlined somewhat, but currently the main speed restriction is the transfer of data from the ESP to the PC over the wireless network, so improving the byte-decoder wouldn’t give a noticeable speed improvement.

Saving the data

There needs to be some way of saving the sample data for further analysis; as it happens, the initial users of the system were already using the open-source Sigrok Pulseview utility for capturing data from small USB pods, so it was decided to save the data in the Sigrok file format.

This a basically a zipfile, with 3 components:

  • Metadata, identifying the channels, sample rate, etc.
  • Version, giving the file format version (currently 2)
  • Logic file, containing the binary data

The metadata format is quite easy to replicate, e.g.

sigrok version=0.5.1

[device 1]
total probes=16
samplerate=5 MHz
total analog=0
..and so on until..

The dummy labels D1, D2 etc. are normally replaced with meaningful descriptions of the signals, followed by the unitsize parameter which gives the byte-width of the data, and marks the end of the labels.

The JSZip library is used to zip the various components together in a single file with the ‘sr’ extension:

function write_srdata(fname) {
  var meta = encodeMeta(), zip = new JSZip();
  var samps = new Uint16Array(sampledata);
  zip.file("metadata", meta);
  zip.file("version", "2");
  zip.file("logic-1-1", samps.buffer);
  zip.generateAsync({type:"blob", compression:"DEFLATE"})
  .then(function(content) {
    writeFile(fname, "application/zip", content);

// Encode Sigrok metadata
function encodeMeta() {
  var meta=[], rate=elem("xrate").value + " Hz";
  for (var key in sr_dict) {
    var val = key=="samplerate" ? rate : sr_dict[key];
    meta.push(val[0]=='[' ? ((meta.length ? "\n" : "") + val) : key+'='+val);
  for (var n=0; n<nchans; n++) {
    meta.push("probe"+(n+1) + "=" + (probes.length?probes[n]:n+1));
  return meta.join("\n");


So far, the only way the units can be configured is by using the browser controls, to set the sample rate, number of samples, threshold etc. Whilst this might be acceptable for a portable system, a semi-permanent installation needs some way of storing the configuration, including the naming of input channels on the display. Since there is a central Web server for the display files, can’t this also be used to store configuration files? The answer is ‘yes’, but there is then a question how these files can be modified in a browser-friendly way.

This is a bit difficult, since there are numerous security protections for the files on a server, to make sure they can’t be modified by a Web client. However, there is an extension to the HTTP protocol known as WebDAV (Web Distributed Authoring and Versioning), which does provide a mechanism for writing to files. Basically you need a general-purpose Web server that can be configured to support Web DAV (such as lighttpd, see this page), or alternatively a special-purpose server, such as wsgidav (see this page).

Assuming you already have a working lighttpd server, the additional configuration file may look something like this, with some_path, dav_username and dav_password being customised for your installation:

File lighttpd/conf.d/30-webdav.conf:

server.modules += ( "mod_webdav" )
$HTTP["url"] =~ "^/dav($|/)" {
  webdav.activate = "enable"
  webdav.sqlite-db-name = "/some_path/webdav.db"
  server.document-root = "/www/"
  auth.backend = "plain"
  auth.backend.plain.userfile = "/some_path/webdav.shadow"
  auth.require = ("" => ("method" => "basic", "realm" => "webdav", "require" => "valid-user"))

File /some_path/webdav.shadow
Create directory www/dav for files

Instead, you can use wsgidav to act as a Web and DAV server, run using the Windows command line:

wsgidav.exe --host --port=8000 -c wsgidav.json

The JSON-format configuration file I’m using is:

    "host": "",
    "port": 8080,
    "verbose": 3,
    "provider_mapping": {
        "/": "/projects/remla/test",
        "/test": "/projects/remla/test",
    "http_authenticator": {
        "domain_controller": null,
        "accept_basic": true,
        "accept_digest": true,
        "default_to_digest": true,
        "trusted_auth_header": null
    "simple_dc": {
        "user_mapping": {
            "*": {
                "dav_username": {
                    "password": "dav_password"
    "dir_browser": {
        "enable": true,
        "response_trailer": "",
        "davmount": true,
        "davmount_links": false,
        "ms_sharepoint_support": true,
        "htdocs_path": null

Again, this will need to be customised for your environment, and you also need to be mindful that the configurations I’ve shown for lighttpd and wsgidav are quite insecure, for example the password isn’t encrypted, so it can easily be captured by anyone snooping on network traffic.

Configuration Web page

I created a simple Web page to handle the configuration, with list boxes for most options, and text boxes to allow the input channels to be named.

At the bottom of the page there are buttons to submit the new configuration to the server, and exit back to the waveform display page.

The key Javascript function to save the configuration on the server uses the ‘davclient’ library, and is quite simple, but it does need to know the host IP address and port number to receive the data. This code attempts to fetch that information using the DOM Location object:

// Save the config file
function saveConfig() {
  var fname = CONFIG_UNIT.replace('$', String(unitNum()));
  var ip = location.host.split(':')
  var host = ip[0], port = ip[1];
  port = !port ? 80 : parseInt(port);
  var davclient = new davlib.DavClient();
  davclient.initialize(host, port, 'http', DAVUSER, DAVPASS);
  davclient.PUT(fname, JSON.stringify(getFormData()), saveHandler)

For simplicity, the DAV username and password are stored as plain text in the Javascript, which means that anyone viewing the page source can see what they are. This makes the server completely insecure, and must be improved.

Python interface

Although some data analysis can be done in Javascript, it is much more convenient to use Python and its numerical library numpy. I have written a Python class EdlaUnit that provides an API for remote control and data analysis, and a program edla_sweep that demonstrates this functionality.

It repeatedly captures a data block, whilst stepping up the threshold voltage. Then for each block, the number of transitions for each channel is counted and displayed.

import edla_utils as edla, base64, numpy as np

unit = edla.EdlaUnit(1, "192.168.8")

MIN_V, MAX_V, STEP_V = 0, 50, 5

def get_data():
    ok = False
    data = None
    status = unit.fetch_status()
    if status:
        ok = unit.do_capture()
        print("Can't fetch status from %s" % unit.status_url)
    if ok:
        data = unit.do_load()
    if data == None:
        print("Can't load data")
    return data

for v in range(MIN_V, MAX_V, STEP_V):
    d = get_data()
    byts = base64.b64decode(d)
    samps = np.frombuffer(byts, dtype=np.uint16)
    diffs = np.diff(samps)
    edges = np.where(diffs != 0)[0]
    totals = np.zeros(16, dtype=int)
    for edge in edges:
        bits = samps[edge] ^ samps[edge+1]
        for n in range(0, 15):
            if bits & (1<<n):
                totals[n] += 1
    s = "%4u," % v
    s += ",".join([("%4u" % val) for val in totals])

The idea is to give a quick overview of the logic levels the analyser is seeing, to make sure they are within reasonable bounds. An example output is:

Volts Ch1  Ch2  Ch3  Ch4  Ch5  Ch6  Ch7  Ch8
0,      0,   0,   0,   0,   0,   0,   0,   0
5,    564, 384, 620, 454, 548, 550, 572, 552
10,   328, 286, 326, 288, 302, 318, 326, 314
15,   260, 246, 262, 244, 260, 254, 260, 250
20,   216, 192, 216, 198, 202, 202, 208, 206
25,    92,   0, 122,   0,  60,  30, 106,  44
30,     0,   0,   0,   0,   0,   0,   0,   0
35,     0,   0,   0,   0,   0,   0,   0,   0
40,     0,   0,   0,   0,   0,   0,   0,   0
45,     0,   0,   0,   0,   0,   0,   0,   0

The absolute count isn’t necessarily very important, since it will vary depending on the signal that is being monitored. What is interesting is the way it changes as the threshold voltage increases. If the number dramatically increases as the ‘1’ logic voltage is approached, one might suspect that there is a noise problem, causing spurious edges. Conversely, if the value declines rapidly before the ‘1’ voltage is reached, the logic level is probably too low.

There is a tendency to assume that all logic signals are a perfect ‘1’ or ‘0’, with nothing in between; this technique allows you to look beyond that, and check whether your signals really are that perfect – and of course you can use the power of Python and numpy to do other analytical tests, or protocol decoding, specific to the signals being monitored.

Part 1 of this project looked at the hardware, part 2 the ESP32 firmware. The source files are on Github.

Copyright (c) Jeremy P Bentham 2022. Please credit this blog if you use the information or software in it.