Dave Beckett

Raptor and web libraries

2003-03-18 23:34

Unlike in Java, Perl, Python and other higher-level languages, in C there is a lot more work to do when you want to do something like retrieve a web page. There aren't any stdurl or stdweb libraries that you can assume are always available. Since Raptor is a parser for an XML language, libxml is one likely candidate, and it has a tiny HTTP implementation that is sufficient for GET. libcurl is the de facto portable web library, so I make that configurable too, along with the W3C libwww, which is common but rather large. So, problem solved.

Or so I thought. It turns out that all those APIs except the W3C libwww are push: they take the thread of control from the caller and return data to it via callbacks. I wanted the more I/O-stream-like pull style instead, where the user application does while(...) { get stuff; do stuff }. You can wrap a push API around a pull one quite easily and efficiently, but not the other way around: to offer pull on top of push, you need to store all the pushed content and then deliver it pull by pull. So I'm going to have to live with that: provide both, and warn users that the pull interface will suck up memory.