Sunday, February 7, 2010

ctypes is awesome, but only for Python

I've been working on some low-level message processing in C. I struggled to figure out all the rules for overlaying structs of bitfields on messages and getting the right answer on both big-endian and little-endian machines.

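The first problem you run into is byte order. Network byte order is big-endian, so when you overlay a struct of bitfields built on 16-bit words onto a message on a little-endian machine, the two bytes of every word come out swapped. You have to use something like ntohs to swap them back. This isn't so bad.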
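Here's a tiny sketch of that swap. It's in Python rather than C only to keep every example in one language. The two bytes are a made-up 16-bit field, and socket.ntohs is the standard library's counterpart of the C function of the same name:

```python
import socket
import sys

# A made-up 16-bit field exactly as it arrives off the wire.
# Network byte order is big-endian: most significant byte first.
raw = b"\x12\x34"

# What a C program sees if it reads those two bytes as a native unsigned short:
# 0x3412 on a little-endian host, 0x1234 on a big-endian one.
native = int.from_bytes(raw, sys.byteorder)

# ntohs swaps the two bytes on little-endian hosts and does nothing on
# big-endian ones, so the result is the same on every machine.
host_value = socket.ntohs(native)

print(hex(native), hex(host_value))
```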

The real struggle comes in dealing with bitfield ordering. The C standard leaves the order of bitfields within a storage unit implementation-defined. In practice, on big-endian machines the first bitfield in a struct starts at the MSB and works toward the LSB, and on little-endian machines the first bitfield starts at the LSB and works toward the MSB. So you end up with nasty #if blocks in your structure definitions, keyed on your platform's byte order. I'm pretty sure I'm not crazy on this: on my box, /usr/include/netinet/ip.h contains this for the definition of an IPv4 header:



/*
 * Structure of an internet header, naked of options.
 */
struct ip
  {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned int ip_hl:4;   /* header length */
    unsigned int ip_v:4;    /* version */
#endif
#if __BYTE_ORDER == __BIG_ENDIAN
    unsigned int ip_v:4;    /* version */
    unsigned int ip_hl:4;   /* header length */
#endif
    /* ... remaining fields omitted ... */
  };


Yuck! In the structures I'm dealing with, that adds up to a lot of duplicated, cryptic code.
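A common way to avoid depending on the compiler's bitfield order at all is to pull fields out of the raw bytes with shifts and masks. This is a different technique from the bitfield structs in ip.h. Here's a minimal Python sketch for the first byte of an IPv4 header. The helper names are made up for illustration:

```python
def split_version_ihl(first_byte: int):
    """Split the first IPv4 header byte into (version, header length).

    On the wire, version is the high nibble and header length is the low
    nibble. Shifts and masks operate on values, not memory layout, so this
    gives the same answer on any host, whatever a compiler does with bitfields.
    """
    version = (first_byte >> 4) & 0x0F
    ihl = first_byte & 0x0F
    return version, ihl


def join_version_ihl(version: int, ihl: int) -> int:
    """Inverse of split_version_ihl: pack two 4-bit values into one byte."""
    return ((version & 0x0F) << 4) | (ihl & 0x0F)


header = bytes([0x45, 0x00])  # first two bytes of an IPv4 header with no options
version, ihl = split_version_ihl(header[0])
print(version, ihl)                  # 4 5
print(hex(join_version_ihl(4, 10)))  # 0x4a
```

It works, but you're back to hand-writing every accessor, which is exactly the tedium a struct overlay is supposed to save.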

While looking at this, I stumbled onto the ctypes module in Python's standard library. From what I can tell, ctypes rocks. It lets you define structures and unions that correspond to C types. You can then overlay these structures on data you get from a file, socket, or whatever and access the data field by field. You can also create a structure in Python, assign the fields, and convert it to a raw buffer.
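To make the overlay idea concrete, here's a sketch of a full 20-byte IPv4 header (no options) as a ctypes.BigEndianStructure. It overlays the structure on raw bytes with from_buffer_copy, then goes the other way with bytes(). The class name, the field names (borrowed loosely from ip.h) and the sample packet are illustrative only. The sample's checksum is zeroed rather than a real checksum:

```python
import ctypes
import ipaddress

class IPv4Header(ctypes.BigEndianStructure):
    # Every field is naturally aligned, so no _pack_ is needed: 20 bytes total.
    _fields_ = [
        ("ip_v", ctypes.c_ubyte, 4),   # version (high nibble of byte 0)
        ("ip_hl", ctypes.c_ubyte, 4),  # header length in 32-bit words
        ("ip_tos", ctypes.c_ubyte),    # type of service
        ("ip_len", ctypes.c_uint16),   # total length
        ("ip_id", ctypes.c_uint16),    # identification
        ("ip_off", ctypes.c_uint16),   # flags + fragment offset
        ("ip_ttl", ctypes.c_ubyte),    # time to live
        ("ip_p", ctypes.c_ubyte),      # protocol
        ("ip_sum", ctypes.c_uint16),   # checksum
        ("ip_src", ctypes.c_uint32),   # source address
        ("ip_dst", ctypes.c_uint32),   # destination address
    ]

# Made-up ICMP packet header, 192.168.0.1 -> 192.168.0.199, checksum zeroed.
raw = bytes.fromhex("45000054 1c464000 40010000 c0a80001 c0a800c7")

# Overlay: copy the raw bytes into a new structure and read it field by field.
hdr = IPv4Header.from_buffer_copy(raw)
print(hdr.ip_v, hdr.ip_hl, hdr.ip_len, hdr.ip_ttl, hdr.ip_p)  # 4 5 84 64 1
print(ipaddress.IPv4Address(hdr.ip_src), ipaddress.IPv4Address(hdr.ip_dst))

# The other direction: build a header field by field and dump its raw bytes.
out = IPv4Header(ip_v=4, ip_hl=5, ip_ttl=64, ip_p=17)
out.ip_len = 28
print(bytes(out).hex())
```

Because the class is big-endian, the multi-byte fields come out in network order without any ntohs calls.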

One of the coolest features is that you can choose whether a struct has big-endian or little-endian behavior, and it will do the appropriate thing on whatever box you're on. If your structure inherits from ctypes.BigEndianStructure, bitfields and multiple-byte fields start at the MSB. If you need the smoking-crack-on-another-planet behavior of little endian, just make your structure inherit from ctypes.LittleEndianStructure. Here's the equivalent of the first two fields in the IPv4 header that will do the right thing on any machine that runs Python:



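#!/usr/bin/env python3
import ctypes

class TestStructure(ctypes.BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ('ip_v', ctypes.c_ubyte, 4),   # version: upper 4 bits
        ('ip_hl', ctypes.c_ubyte, 4),  # header length: lower 4 bits
    ]

t = TestStructure()
t.ip_v = 4
t.ip_hl = 10
# Copy the structure's raw bytes out of memory and show them in hex.
s = ctypes.string_at(ctypes.addressof(t), ctypes.sizeof(t))
print(["0x%02x" % b for b in s])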


$ ./test-ip.py
['0x4a']
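To see what switching the base class changes, here's a sketch that declares the same field list on ctypes.BigEndianStructure and on ctypes.LittleEndianStructure. The class names are hypothetical, and an ip_tos byte and a 16-bit ip_len field are added so that multi-byte ordering shows up too. The expected bytes in the comments don't depend on the host's own byte order:

```python
import ctypes

# One field list, two interpretations. ip_len is set to 0x1234 so the
# byte order of a multi-byte field is easy to spot.
FIELDS = [
    ("ip_v", ctypes.c_ubyte, 4),
    ("ip_hl", ctypes.c_ubyte, 4),
    ("ip_tos", ctypes.c_ubyte),
    ("ip_len", ctypes.c_uint16),
]

class BigHeader(ctypes.BigEndianStructure):
    _fields_ = FIELDS  # first bitfield in the MSBs, multi-byte fields MSB first

class LittleHeader(ctypes.LittleEndianStructure):
    _fields_ = FIELDS  # first bitfield in the LSBs, multi-byte fields LSB first

def dump(cls):
    """Build a header of the given class and return its raw bytes in hex."""
    h = cls(ip_v=4, ip_hl=10, ip_len=0x1234)
    return ["0x%02x" % b for b in bytes(h)]

print(dump(BigHeader))     # ['0x4a', '0x00', '0x12', '0x34']
print(dump(LittleHeader))  # ['0xa4', '0x00', '0x34', '0x12']
```

Only the base class differs. The field list is written once, with no #if blocks.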


Wow! If only all languages had features like this, the world of message processing would be a better place.