Sunday, May 16, 2010

boost::asio

The more I play with boost, the more impressed I am by it. Lately I've been experimenting with boost::asio. I have a fair amount of experience with Apache MINA for Java, and after playing with boost::asio I think I've found its rough equivalent in C++.

Here is the code I've been playing with, an implementation of a TCP proxy (accept connections on 1 to N local endpoints, forward them to some remote endpoint). This implementation uses asynchronous operations for all socket I/O, and allows for multi-threading - by default it creates a thread pool sized by the number of hardware threads the machine supports.
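To give a feel for the accept-and-forward idea, here is a minimal sketch using Python's asyncio rather than boost::asio. It is an illustration only, not the asio implementation: the names pipe, start_proxy and demo are made up for this example, and asyncio runs everything on a single event-loop thread, so there is no thread pool here.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(4096)
            if not data:  # EOF: the peer closed its side
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # treat a reset connection like EOF
    finally:
        writer.close()


async def start_proxy(listen_host, listen_port, remote_host, remote_port):
    """Listen locally and forward every accepted connection to the remote."""

    async def handle_client(client_reader, client_writer):
        try:
            remote_reader, remote_writer = await asyncio.open_connection(
                remote_host, remote_port)
        except OSError:
            client_writer.close()  # remote unreachable: drop the client
            return
        # Relay both directions concurrently until both sides are closed.
        await asyncio.gather(
            pipe(client_reader, remote_writer),
            pipe(remote_reader, client_writer),
        )

    return await asyncio.start_server(handle_client, listen_host, listen_port)


async def demo():
    """Round-trip a message through the proxy to a local echo server."""
    # pipe() fed its own connection's reader and writer is an echo handler.
    echo_server = await asyncio.start_server(pipe, "127.0.0.1", 0)
    echo_port = echo_server.sockets[0].getsockname()[1]
    proxy = await start_proxy("127.0.0.1", 0, "127.0.0.1", echo_port)
    proxy_port = proxy.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
    writer.write(b"hello")
    await writer.drain()
    reply = await asyncio.wait_for(reader.readexactly(5), timeout=5)
    writer.close()
    proxy.close()
    echo_server.close()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Closing the writer when one direction hits EOF tears down the whole connection, so this sketch does not support half-closed sockets; a real proxy would need to decide how to handle that.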

Saturday, March 6, 2010

Torn

At work I'm constantly researching how to do various technical things using Google. Usually I'm trying to figure out a feature of some software component that has little or no official documentation on the subject. More often than not I end up on someone's blog reading a post saying they spent hours and hours looking into whatever I'm trying to figure out, and they provide the answer.

That's one of the reasons I started this blog - I had some vague idea that when I figured out some interesting technical thing I would post about it, and maybe someday someone would find it useful. Plus there's some egotistical satisfaction in writing things when you pretend somebody might actually be interested in reading them.

But I started realizing that the things I'm researching are almost always for work purposes, and maybe it's a bad idea to share that knowledge with others. I'm not sure any one thing I find out is especially important or secret, but if over time I share lots of things I learn I'm sort of giving away what work is paying me to do.

So what to do? If everybody had this concern, most regular working people like me would never post things on their blogs and my source of technical knowledge would dry up. From that perspective I want to add to this community. But on the other hand, I don't want to give away too much.

So, I wonder: who is it who posts technical knowledge on their blogs? Is it really just people who are doing things in their free time, or am I using research others were paid to do?

Sunday, February 7, 2010

ctypes is awesome, but only for Python

I've been working on some low-level message processing in C. I struggled to figure out all the rules for overlaying structs of bitfields on messages and getting the right answer on both big-endian and little-endian machines.

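The first problem you run into is that a little-endian machine stores the bytes of each 16-bit word in the opposite order from the wire, so when you overlay a struct made up entirely of bitfields, every pair of bytes comes out swapped. You have to use something like ntohs to swap the bytes of each 16-bit word back. This isn't so bad.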
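The byte-swapping part is easy to see in a few lines of Python (used here only because it is quick to run; the variable names are made up). This is a small sketch of what happens to a 16-bit word, with struct's "!" network-order format standing in for ntohs:

```python
import struct
import sys

# Two bytes as they arrive off the wire, in network (big-endian) order.
wire = bytes([0x12, 0x34])

# Read the same two bytes as a 16-bit word under each byte order.
as_big = int.from_bytes(wire, "big")        # the value the sender meant
as_little = int.from_bytes(wire, "little")  # the same bytes, swapped

# struct's "!" prefix means network order, so this decodes the same way on
# any host; it plays the role ntohs plays in C.
(decoded,) = struct.unpack("!H", wire)

# "=H" uses the host's native order, so it is only right on big-endian hosts.
(native,) = struct.unpack("=H", wire)

print(hex(as_big), hex(as_little), hex(decoded), sys.byteorder, hex(native))
```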

The real struggle comes in dealing with bitfield ordering. The C standard leaves the order up to the implementation, but in practice it goes like this: on big-endian, the first bitfield in a struct starts at the MSB and works toward the LSB. On little-endian, the first bitfield starts at the LSB and works toward the MSB. So you end up with nasty #ifs in structures based on your platform. I'm pretty sure I'm not crazy on this: on my box /usr/include/netinet/ip.h contains this for the definition of an IPv4 header:



/*
 * Structure of an internet header, naked of options.
 */
struct ip
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned int ip_hl:4;   /* header length */
    unsigned int ip_v:4;    /* version */
#endif
#if __BYTE_ORDER == __BIG_ENDIAN
    unsigned int ip_v:4;    /* version */
    unsigned int ip_hl:4;   /* header length */
#endif


Yuck! In the structures I'm dealing with that's a lot of duplicated cryptic code.

While looking at this, I stumbled into the ctypes library in Python. From what I can tell, ctypes rocks. It lets you define structures and unions to correspond to C types. You can then overlay these structures on data you get from a file/socket/whatever and access the data field-by-field. You can also create a structure in Python, assign the fields, and convert it to a raw buffer.
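As a quick sketch of both directions, here is a hypothetical 4-byte message header (MsgHeader and its field names are invented for illustration, not taken from any real protocol). Every field is a single byte, so byte order does not come into it yet:

```python
import ctypes


class MsgHeader(ctypes.Structure):
    """Hypothetical 4-byte message header, invented for illustration."""
    _fields_ = [
        ("msg_type", ctypes.c_ubyte),
        ("flags", ctypes.c_ubyte),
        ("seq", ctypes.c_ubyte),
        ("length", ctypes.c_ubyte),
    ]


# Overlay the structure on bytes received from a file or socket.
raw = bytes([0x01, 0x80, 0x07, 0x10])
hdr = MsgHeader.from_buffer_copy(raw)
print(hdr.msg_type, hdr.flags, hdr.seq, hdr.length)

# Go the other way: fill in the fields, then convert to a raw buffer.
out = MsgHeader(msg_type=2, flags=0, seq=8, length=32)
encoded = bytes(out)
print(encoded.hex())
```

from_buffer_copy makes its own copy of the data; from_buffer shares memory with a writable buffer such as a bytearray instead, which is closer to a true overlay.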

One of the coolest features is you can define whether you want a struct to have big-endian or little-endian behavior, and it will do the appropriate thing on whatever box you're on. If your structure inherits from ctypes.BigEndianStructure, bitfields and multi-byte fields start at the MSB. If you need the smoking-crack-on-another-planet behavior of little endian, just make your structure inherit from ctypes.LittleEndianStructure. Here's the equivalent of the first two fields in the IPv4 header that will do the right thing on any machine that runs Python:



import ctypes

class TestStructure(ctypes.BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ('ip_v', ctypes.c_ubyte, 4),   # version: high nibble
        ('ip_hl', ctypes.c_ubyte, 4),  # header length: low nibble
    ]

t = TestStructure()
t.ip_v = 4
t.ip_hl = 10
# Copy the structure's raw bytes out of memory.
s = ctypes.string_at(ctypes.addressof(t), ctypes.sizeof(t))
print(["0x%02x" % b for b in s])


$ ./test-ip.py
['0x4a']
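To see the difference the base class makes, this sketch declares the same two bitfields under both ctypes.BigEndianStructure and ctypes.LittleEndianStructure and assigns the same values. BigHeader, LittleHeader and encode are names made up for the comparison:

```python
import ctypes

FIELDS = [
    ("ip_v", ctypes.c_ubyte, 4),   # first bitfield declared
    ("ip_hl", ctypes.c_ubyte, 4),  # second bitfield declared
]


class BigHeader(ctypes.BigEndianStructure):
    _fields_ = FIELDS


class LittleHeader(ctypes.LittleEndianStructure):
    _fields_ = FIELDS


def encode(cls):
    """Set the same field values and return the resulting raw byte."""
    h = cls()
    h.ip_v = 4
    h.ip_hl = 10
    return bytes(h)


big = encode(BigHeader)        # first field lands in the high nibble
little = encode(LittleHeader)  # first field lands in the low nibble
print(big.hex(), little.hex())
```

The declarations are identical and only the base class changes, which is what replaces the #if blocks from ip.h.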


Wow! If only all languages had features like this, the world of message processing would be a better place.

Sunday, January 10, 2010

Playing with boost::thread

Was bored today so I started playing with boost::thread. Made a little producer/consumer example with 2 threads and a thread-safe BlockingQueue, sort of a simple version of what Java provides in the standard API. I had fun using as many boost features as I could squeeze in - boost::shared_ptr, boost::posix_time, boost::bind (really cool by the way and much more powerful than std::mem_fun/std::bind1st, etc).
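The general shape of such a queue is a container guarded by a lock plus a condition variable that consumers wait on. Here is a minimal sketch of that pattern in Python's threading module rather than boost::thread; it is not the boost code, and the names BlockingQueue, put, get and run_demo are chosen for illustration only:

```python
import threading
from collections import deque


class BlockingQueue:
    """Minimal unbounded thread-safe queue; get() blocks while it is empty."""

    def __init__(self):
        self._items = deque()
        self._not_empty = threading.Condition()

    def put(self, item):
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()  # wake one waiting consumer

    def get(self, timeout=None):
        """Return the next item, or raise TimeoutError if none arrives."""
        with self._not_empty:
            # wait_for re-checks the predicate after every wakeup, which
            # guards against spurious wakeups.
            has_item = self._not_empty.wait_for(
                lambda: len(self._items) > 0, timeout)
            if not has_item:
                raise TimeoutError("queue stayed empty")
            return self._items.popleft()


def run_demo(count=5):
    """One producer thread and one consumer thread sharing a BlockingQueue."""
    queue = BlockingQueue()
    received = []
    done = object()  # sentinel telling the consumer to stop

    def producer():
        for i in range(count):
            queue.put(i)
        queue.put(done)

    def consumer():
        while True:
            item = queue.get(timeout=5)
            if item is done:
                break
            received.append(item)

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Python already ships queue.Queue for real use; the point here is only to show the lock-plus-condition structure, which maps onto boost::mutex and boost::condition_variable in C++.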

I'm surprised boost hasn't yet added thread-safe containers to their libraries. According to Google, lots of people seem to think they're a bad idea, even going so far as to say they were a mistake in Java. Really? Is everybody expected to reinvent the wheel here and start from scratch? Doesn't make sense to me. Here's my version of the wheel if anybody doesn't already have one.