Tuesday, October 27, 2009

"Article 9 and 10"

Yesterday evening I checked my real mailbox and I found the following surprise:
My first instinct was to call the police and tell them that some drug dealer must have stashed his stuff in my mailbox. Then I cooled down and checked the hard-to-see-in-the-picture red label and it had my address written on it; I was sure someone was playing a practical joke on me. Then I checked out the back of the label and it had the magic number 85.83 written on it. Mystery solved. While 85.83 might not mean anything to you, it was the cost of some books I ordered from Amazon a few weeks back.
Inside the dubious bag there was a normal Amazon paper box and inside the box my "*-at-work" books in good condition:On the label of the blue bag there was an yellow sticker written in a language resembling English which reads "Goods Do Not Meet The Requirement Of Article 9 and 10 Of The Contract For The Foundation Of The European Community" (verbatim). Googling for this mysterious "Contract" only returns blogposts about the same sticker. If anyone has any idea what is the deal with the weird blue bag please let me know.
Meta-blogging aside, I've started reading the interview with Jamie Zawinski in "Coders at Work" and it seems pretty good. I'll be commenting on it on the blog later on. I can't wait to read the three books.

Tuesday, September 1, 2009

Adventures in C++ land

I have recently discovered an interesting bug in a prototype implementation that I did for my research. For the current discussion, what exactly the implementation is about is irrelevant. It suffices to say that the code is C++ code.

I thought I'd share the experience so that others can avoid the same pitfall. The code in question is slightly complicated, but the problem is well illustrated by the following abstracted version:


std::vector<Foo> foos;

void bar(const Foo &foo)
{
while (*) {
foos.push_back(foo);
}
}

int main()
{
// ...
assert(foos.size() > 0);
while (*) {
bar(foos[0]);
}
}


Here "Foo" is a very well behaved user defined class, such as the favorite class of every C++ book author, the "ComplexNumber" class. The tests in the while loops are irrelevant, so I replaced them by nondeterministic tests (denoted by "*"; so just imagine for example that each loop runs for a small number of iterations).

The above code can crash. I experienced two types of crashes occurring: either the program terminates with an uncaught std::bad_alloc, or simply the program is terminated by a segfault.

So let me describe why the crash happens. The problem is that the "foo" argument is passed by reference. The std::vector class implements a variable-sized array. Therefore, when the underlying implementation of the vector runs out of space, the std::vector class
decides to allocate a bigger piece of memory, copies the old values of the vector into the newly allocated memory, and deletes the old memory.

So this entire thing could very well happen during the push_back call. But what is "foo"? Well, "foo" is exactly a reference to memory that might be deleted during the push_back call. Ooops. Accessing such memory is of course undefined behavior which means very bad things could happen. I solved the problem by passing the argument by value.

So the next time you do the classic "best practice" of passing arguments by const reference you might want to be more careful. It is interesting of course to analyze what led to this bug. One of the problems is having a global variable ("foos") which is used in an imperative fashion. This leads to the surprising sharing of state between the global variable "foos" and the local variable "foo".

The second problem is the leaky mental model that I had of the vector class. I thought of it as an array that can grow. But in fact it's not, it's an array that is sometimes replaced by a bigger array...

Also note that the bug lies entirely in the "bar" function, which is not correct in the particular case where "foo" is a reference to an element of "foos". However, the syntax of passing arguments by reference (indistinguishable from passing by value) does not help in detecting the possible bug at the point of call.

I scratched my head for a long while without finding the source of the bug. I finally installed valgrind, which is a great piece of software of course, and which I recommend whole-heartedly.

Sunday, February 1, 2009

How to Hack Closures in your C Code (or "The Closure Design-Pattern in C").

A lot of languages have support for various forms of closures.

For example, in most functional languages (Common Lisp, Scheme, Ocaml, Haskell), closures are first class citizens. In Java, you can implement them using inner classes. You can use defunctionalization to simulate closures in almost any programming language. However, for several reasons described at the end of the post, this can be impractical in some scenarios.

In this post, I'm not going to talk about any theoretical stuff. I'm going to present a hackish, evil and practical way to dynamically construct closures in C.

C already contains a restricted form of closures that come in the shape of function pointers. Essentially, a function pointer can be though of as a closure without the data. With a little care, you can use function pointers in a type unsafe (platform dependant, ready to blow up in your face) way that gets you closures.

Let's take the classical example of a closure that adds a number to another number. You might implement such a closure in Haskell in the following way:

let add x y = x + y

let adder3 = add 3

There you go: adder3 is exactly our closure: when you evaluate 'adder3 10' you get 13.

How can you do this in C?

Well, we'll start with a simple adder:

int adder3(int x)
{
 int data = 3;
 return data + x;
}

Not much of a closure, is it? What if we want to use something else instead of 3? What we would need is a function

typedef int (*adder)(int);

adder createAdder(int data)
{
  // ...
}

that returns such an adder. But you can't do this in C, because functions are not first class objects. Oh wait! Actually, in C, memory is a first class citizen, and therefore you are free to do whatever you want with it. Including, wait for this, placing code in memory and returning a pointer to it.

Without further discussion, we need the code for a generic adder function. This is not difficult to get, and we'll kindly ask the compiler to provide to us so we don't actually have to write machine code:

int adderGeneric(int x)
{
 int data = MAGIC;
 return data + x;
}

void endOfAdderGeneric() {}

There you go. The machine code for the generic adder function is now between the starting addresses of the adderGeneric and endOfAdderGeneric function. Now our createAdder function only needs to copy this code, modify the data, and return the result:

adder createAdder(int data)
{
 char *start = (char *)adderGeneric;
 char *end = (char *)endOfAdderGeneric;

 char *result = (char *)malloc(end - start);
 memcpy(result, start, end - start);
 
 char *p;
 for (p = result; ; ++p) {
  int *q = (int *)p;
  if (*q == MAGIC) {
  *q = data;
  break;
  }
 }

 return (adder)result;
}

void deleteAdder(adder a)
{
 char *m = (char *)a;
 free(m);
}

The createAdder first allocates enough memory for our adder. It knows how much it needs because (usually) C functions are laid out sequentially in memory; therefore the starting address of function endOfAdderFunction should match the end of the adderFunction.

Then it copies the code of the adderGeneric function into the newly allocated location. Now all it needs to do is set the internal data. An unsafe, platform dependant, ready to blow up in your face way to do this is to search for the constant MAGIC in the function code. When it finds this magic number, it replaces it by the actual data.

I defined MAGIC as follows:

#define MAGIC 0xAEAEAEAE

but you're free to set it to anything reasonable for your platform.

Then the pointer to the newly created block of code is returned, which should work as well as the generic adder.

To complete the example, check out the main code:

int main()
{
 adder add3 = createAdder(3);
 adder add5 = createAdder(5);
 printf("%d %d\n", add3(10), add5(10));
 deleteAdder(add3);
 deleteAdder(add5);
 return 0;
}

There you go: you have closures in C. The only question that remains is when you would want to use such a trick. The answer is: very sparingly. The above code can send you and your team in debugging hell. Any new version of the compiler might decide to optimize stuff in a different way, changing the way functions are laid out in memory and any other number of things can go wrong.

However, you might have to do this when working with legacy code. The above code was inspired by the following question asked on stackoverflow:

http://stackoverflow.com/questions/499153/

The person asking the question wants a way to register a callback function that is a member of an object. Unfortunately, there is not way to explicitly pass around the this pointer. Being unable to change the interface of the library, a quick hack like the one I described can do wonders.

I have uploaded the full code sample to pastie 376806.

Friday, May 30, 2008

Bots and animals

Even though it is available since 2007, I haven't heard of Asirra until today, when I read someone's blogpost on spam.

Asirra is a database of pictures of cats and dogs. Each picture comes labeled with the type of animal it contains. You can't see the database directly; but you can access it through their webservice. The idea is to help fight bot spam by requiring the user to flag 12 images as either cats or dogs. I encourage you to test it out here.

I tried their demo and it is kind of cool. I didn't make any unintentional mistake in about ten tries and it is really quite a joy to look at the pets. Their academic paper is available here and it is a light read if you are interested in the details. In my opinion, Microsoft's researchers did quite a good job of analyzing the technical advantages and disadvantages of their scheme.

However, I believe Asirra will only remain safe if no high-profile website starts using it. Three million pictures is really not a lot, if you have the financial incentive to label them all. Furthermore, the database servers themselves could be compromised. Also, it is a poor business decision to rely on a single service that, at any time, may close shop or decide to charge a possibly absurd amount of money for their service.

All in one, I believe Asirra is a good alternative to CAPTCHAs for smaller sites or amateur projects and I hope the state of the art in the field advances well beyond the annoying text recognition systems we have today.

Sunday, March 16, 2008

"How I invented the personal computer, co-founded Apple, and had fun doing it"

It's been about nine months since I posted my last entry. In the meantime, a lot has happened. I graduated the Faculty of Computer Science from Iasi, Romania, I had a great summer holiday (except I fucked up my Ocaml summer project because of a lack of time), I got a scholarship for a computer science research master at ENS Cachan and I started my internship at LSV.

I now live close to Paris and, of course, it's quite nice. This blog post is about a book I bought and read more than a year ago (and which I'm re-reading right now). I bought it in the US, at the time when I was doing my second internship at Microsoft.

Steve Wozniak was on the Microsoft campus in Redmond to give a talk (I think it was part of the process of promoting his book). I went there, but the room was too small. It seemed that every other person in the campus decided to participate (ok, this is an exaggeration).

Anyway, frustrated I couldn't find a place, I decided to buy the book anyway, and try to see the cast online (which I couldn't, due to technical problems). I'm sorry even now that I didn't have the inspiration to go earlier, to get a seat.

Anyway, the book is simply awesome. It's written by Steve Wozniak (with Gina Smith). I'm not sure about the "official" title, but the cover reads something like this:

iWoz

How I invented the personal computer, co-founded Apple, and had fun doing it

COMPUTER GEEK TO CULT ICON

I knew a little history about Apple and Steve Wozniak and Steve Jobs, but to read it from Steve himself was really cool.

He starts by describing how his father would teach him electronics at a very early age and how much this mattered to him:
"He pulled out a blackboard from time to time, a tiny little blackboard we had in our house on Edmonton Avenue, and when I asked, he would answer anything and make diagrams for it. I remember how he showed me what happened if you put a plus voltage into a transistor and got a minus voltage out the other end of the transistor. There must have been an inverter, a type of logical gate. And he even physically taught me how to make an AND gate and an OR gate out of parts he got -- parts called diodes and resistors. And he showed me how they needed a transistor in between to amplify the signal and connect the output of one gate to the input of the other.

To this very moment, that is the way every single digital device on the planet works at its most basic level.

[...]

It's amazing, really, to think that my dad taught me about transistors back when almost no one saw anything but vacuum tubes. So he was at the top of the state of the art, probably because his secret job put him in touch with such advanced technology. So I ended up at the state of the art, too."

There are books you read, and they forever change the way you think. For me, this was one of those books. Aside from the historically interesting technical content, the book also excels in entertainment value. Wozniak describes some of the pranks he loved to pull:

"I got the idea to build a little electronic metronome -- you know, the thing that goes tick, tick, tick, to keep time when people take piano lessons. I built it, heard the ticking, and thought: Hey this sounds like a bomb. So I took some batteries, took the labels off the batteries so they looked like plain metal canisters, and I taped them together. And then I wrote in big letters on it: CONTACT EXPLOSIVE.

I thought: Oh, this will be funny. I'll stick it in Bill Werner's locker. I just happened to know his locker code. Bill's locker was near mine so I put my so-called electronic metronome in. Now, this was in the morning before school, and after I put it in there, I could barely hear it ticking. Nobody was going to be tricked by this if they couldn't even hear it! I'm thinking: What a bummer and what a waste if this thing isn't going to work. But when I came out of my last final that day, my counselor walked up to me and said: "Steve, the vice principal wants to see you in his office." This was a bad sign. [...]

And the principal starts telling me how the English teacher, Mr, Stottlemeier, had heard a ticking sound in the locker. The principal, Mr. Bryld, told me how he opened the locker, clutched the device to his chest, and then ran all the way out to the football field and dismantled it!

I started laughing, even though I was trying not to, so then I tried to cough to cover it up. But I couldn't even do that, because I knew I had rigged the metronome with a switched resistor to start ticking faster when someone opened up the locker door."

I'm not going to ruin the pleasure you'll get reading this book by quoting more stuff. I'll let you enjoy the chapters on TV jamming, phone phreaking, creating the Apple I and II, crashing a plane and other delicious topics ;-)

Even if you do not understand computer science or maths or physics, I think you will appreciate this book and I hope you enjoy it as much as I do.

Thursday, June 7, 2007

Firefox 3 URL "Feature"

On the programing subreddit I got to this link, which describes some new features proposed for Firefox 3.

The one feature that received my attention (in a bad way), is the highlighting of the domain of the current site. For example (taken from here), the URL bar would look like this: http://www.fred.blogspot.com/archive/2007/04/06/mypost.

I do not like this. The first problem is that the shade of gray is not readable. The next wrong thing, which is not that obvious, but more problematic, is that this could actually help phishers. I can only image an innocent looking Firefox user: "But the site was really Bank Example. I'm sure. The browser even highlighted www.example.com for me (where one of the letters is not what it seems)".

Please, please, drop this "feature". There is no use case for it. (Ok. there is a small use case, where if the address is long, with may sub-domains, you don't immediately see the top-level domain...)

Thursday, May 31, 2007

Google Gears

Just today I've seen this link on reddit. and I'm very excited about this. Remember an earlier post about off-line web applications Well, they're closer than ever. Google released Gears, which is a Firefox/IE extension that provides an API for off-line web applications.

On the developer page, google posted 4 demos showing off their API.
  • the first one shows how an offline database can be used to keep track of stuff on the client side, without it going away the second time you visit that webpage
  • the second demo shows how google gears can remember an webpage and display it even when you are offline and the cache is emptied
  • I didn't understand the usefulness of the third demo, but I'm sure there are scenarios which use it
  • the last demo is about asynchronous methods, but I'm not sure this belongs in gears.
I am very excited about Gears, as it seems to work for both Microsoft Internet Explorer and Mozilla Firefox. Google says Safari support is work in progress.

For the moment, I'm quite busy with my thesis, but I'm dying to get my code snippet web application to use Gears.

Also, I can't wait to see a geared up version of Blogger and Gmail. Imagine preparing your posts without the need to access the internet and saving them to the local database. Next time you go online, Blogger will upload whatever you've been working on. Sounds cool, right?

I like Gears because the API seems simple enough and is super useful. However, in the long term, Gears (or whatever other framework for off-line web apps will prevail) will make web applications even more difficult to create. Also, I suppose a whole new range of security issues will be uncovered by this new type of applications.