Sunday, February 1, 2009

How to Hack Closures in your C Code (or "The Closure Design-Pattern in C").

A lot of languages have support for various forms of closures.

For example, in most functional languages (Common Lisp, Scheme, Ocaml, Haskell), closures are first class citizens. In Java, you can implement them using inner classes. You can use defunctionalization to simulate closures in almost any programming language. However, for several reasons described at the end of the post, this can be impractical in some scenarios.

In this post, I'm not going to talk about any theoretical stuff. I'm going to present a hackish, evil and practical way to dynamically construct closures in C.

C already contains a restricted form of closures that come in the shape of function pointers. Essentially, a function pointer can be though of as a closure without the data. With a little care, you can use function pointers in a type unsafe (platform dependant, ready to blow up in your face) way that gets you closures.

Let's take the classical example of a closure that adds a number to another number. You might implement such a closure in Haskell in the following way:

let add x y = x + y

let adder3 = add 3

There you go: adder3 is exactly our closure: when you evaluate 'adder3 10' you get 13.

How can you do this in C?

Well, we'll start with a simple adder:

int adder3(int x)
{
 int data = 3;
 return data + x;
}

Not much of a closure, is it? What if we want to use something else instead of 3? What we would need is a function

typedef int (*adder)(int);

adder createAdder(int data)
{
  // ...
}

that returns such an adder. But you can't do this in C, because functions are not first class objects. Oh wait! Actually, in C, memory is a first class citizen, and therefore you are free to do whatever you want with it. Including, wait for this, placing code in memory and returning a pointer to it.

Without further discussion, we need the code for a generic adder function. This is not difficult to get, and we'll kindly ask the compiler to provide to us so we don't actually have to write machine code:

int adderGeneric(int x)
{
 int data = MAGIC;
 return data + x;
}

void endOfAdderGeneric() {}

There you go. The machine code for the generic adder function is now between the starting addresses of the adderGeneric and endOfAdderGeneric function. Now our createAdder function only needs to copy this code, modify the data, and return the result:

adder createAdder(int data)
{
 char *start = (char *)adderGeneric;
 char *end = (char *)endOfAdderGeneric;

 char *result = (char *)malloc(end - start);
 memcpy(result, start, end - start);
 
 char *p;
 for (p = result; ; ++p) {
  int *q = (int *)p;
  if (*q == MAGIC) {
  *q = data;
  break;
  }
 }

 return (adder)result;
}

void deleteAdder(adder a)
{
 char *m = (char *)a;
 free(m);
}

The createAdder first allocates enough memory for our adder. It knows how much it needs because (usually) C functions are laid out sequentially in memory; therefore the starting address of function endOfAdderFunction should match the end of the adderFunction.

Then it copies the code of the adderGeneric function into the newly allocated location. Now all it needs to do is set the internal data. An unsafe, platform dependant, ready to blow up in your face way to do this is to search for the constant MAGIC in the function code. When it finds this magic number, it replaces it by the actual data.

I defined MAGIC as follows:

#define MAGIC 0xAEAEAEAE

but you're free to set it to anything reasonable for your platform.

Then the pointer to the newly created block of code is returned, which should work as well as the generic adder.

To complete the example, check out the main code:

int main()
{
 adder add3 = createAdder(3);
 adder add5 = createAdder(5);
 printf("%d %d\n", add3(10), add5(10));
 deleteAdder(add3);
 deleteAdder(add5);
 return 0;
}

There you go: you have closures in C. The only question that remains is when you would want to use such a trick. The answer is: very sparingly. The above code can send you and your team in debugging hell. Any new version of the compiler might decide to optimize stuff in a different way, changing the way functions are laid out in memory and any other number of things can go wrong.

However, you might have to do this when working with legacy code. The above code was inspired by the following question asked on stackoverflow:

http://stackoverflow.com/questions/499153/

The person asking the question wants a way to register a callback function that is a member of an object. Unfortunately, there is not way to explicitly pass around the this pointer. Being unable to change the interface of the library, a quick hack like the one I described can do wonders.

I have uploaded the full code sample to pastie 376806.

Friday, May 30, 2008

Bots and animals

Even though it is available since 2007, I haven't heard of Asirra until today, when I read someone's blogpost on spam.

Asirra is a database of pictures of cats and dogs. Each picture comes labeled with the type of animal it contains. You can't see the database directly; but you can access it through their webservice. The idea is to help fight bot spam by requiring the user to flag 12 images as either cats or dogs. I encourage you to test it out here.

I tried their demo and it is kind of cool. I didn't make any unintentional mistake in about ten tries and it is really quite a joy to look at the pets. Their academic paper is available here and it is a light read if you are interested in the details. In my opinion, Microsoft's researchers did quite a good job of analyzing the technical advantages and disadvantages of their scheme.

However, I believe Asirra will only remain safe if no high-profile website starts using it. Three million pictures is really not a lot, if you have the financial incentive to label them all. Furthermore, the database servers themselves could be compromised. Also, it is a poor business decision to rely on a single service that, at any time, may close shop or decide to charge a possibly absurd amount of money for their service.

All in one, I believe Asirra is a good alternative to CAPTCHAs for smaller sites or amateur projects and I hope the state of the art in the field advances well beyond the annoying text recognition systems we have today.

Sunday, March 16, 2008

"How I invented the personal computer, co-founded Apple, and had fun doing it"

It's been about nine months since I posted my last entry. In the meantime, a lot has happened. I graduated the Faculty of Computer Science from Iasi, Romania, I had a great summer holiday (except I fucked up my Ocaml summer project because of a lack of time), I got a scholarship for a computer science research master at ENS Cachan and I started my internship at LSV.

I now live close to Paris and, of course, it's quite nice. This blog post is about a book I bought and read more than a year ago (and which I'm re-reading right now). I bought it in the US, at the time when I was doing my second internship at Microsoft.

Steve Wozniak was on the Microsoft campus in Redmond to give a talk (I think it was part of the process of promoting his book). I went there, but the room was too small. It seemed that every other person in the campus decided to participate (ok, this is an exaggeration).

Anyway, frustrated I couldn't find a place, I decided to buy the book anyway, and try to see the cast online (which I couldn't, due to technical problems). I'm sorry even now that I didn't have the inspiration to go earlier, to get a seat.

Anyway, the book is simply awesome. It's written by Steve Wozniak (with Gina Smith). I'm not sure about the "official" title, but the cover reads something like this:

iWoz

How I invented the personal computer, co-founded Apple, and had fun doing it

COMPUTER GEEK TO CULT ICON

I knew a little history about Apple and Steve Wozniak and Steve Jobs, but to read it from Steve himself was really cool.

He starts by describing how his father would teach him electronics at a very early age and how much this mattered to him:
"He pulled out a blackboard from time to time, a tiny little blackboard we had in our house on Edmonton Avenue, and when I asked, he would answer anything and make diagrams for it. I remember how he showed me what happened if you put a plus voltage into a transistor and got a minus voltage out the other end of the transistor. There must have been an inverter, a type of logical gate. And he even physically taught me how to make an AND gate and an OR gate out of parts he got -- parts called diodes and resistors. And he showed me how they needed a transistor in between to amplify the signal and connect the output of one gate to the input of the other.

To this very moment, that is the way every single digital device on the planet works at its most basic level.

[...]

It's amazing, really, to think that my dad taught me about transistors back when almost no one saw anything but vacuum tubes. So he was at the top of the state of the art, probably because his secret job put him in touch with such advanced technology. So I ended up at the state of the art, too."

There are books you read, and they forever change the way you think. For me, this was one of those books. Aside from the historically interesting technical content, the book also excels in entertainment value. Wozniak describes some of the pranks he loved to pull:

"I got the idea to build a little electronic metronome -- you know, the thing that goes tick, tick, tick, to keep time when people take piano lessons. I built it, heard the ticking, and thought: Hey this sounds like a bomb. So I took some batteries, took the labels off the batteries so they looked like plain metal canisters, and I taped them together. And then I wrote in big letters on it: CONTACT EXPLOSIVE.

I thought: Oh, this will be funny. I'll stick it in Bill Werner's locker. I just happened to know his locker code. Bill's locker was near mine so I put my so-called electronic metronome in. Now, this was in the morning before school, and after I put it in there, I could barely hear it ticking. Nobody was going to be tricked by this if they couldn't even hear it! I'm thinking: What a bummer and what a waste if this thing isn't going to work. But when I came out of my last final that day, my counselor walked up to me and said: "Steve, the vice principal wants to see you in his office." This was a bad sign. [...]

And the principal starts telling me how the English teacher, Mr, Stottlemeier, had heard a ticking sound in the locker. The principal, Mr. Bryld, told me how he opened the locker, clutched the device to his chest, and then ran all the way out to the football field and dismantled it!

I started laughing, even though I was trying not to, so then I tried to cough to cover it up. But I couldn't even do that, because I knew I had rigged the metronome with a switched resistor to start ticking faster when someone opened up the locker door."

I'm not going to ruin the pleasure you'll get reading this book by quoting more stuff. I'll let you enjoy the chapters on TV jamming, phone phreaking, creating the Apple I and II, crashing a plane and other delicious topics ;-)

Even if you do not understand computer science or maths or physics, I think you will appreciate this book and I hope you enjoy it as much as I do.

Thursday, June 7, 2007

Firefox 3 URL "Feature"

On the programing subreddit I got to this link, which describes some new features proposed for Firefox 3.

The one feature that received my attention (in a bad way), is the highlighting of the domain of the current site. For example (taken from here), the URL bar would look like this: http://www.fred.blogspot.com/archive/2007/04/06/mypost.

I do not like this. The first problem is that the shade of gray is not readable. The next wrong thing, which is not that obvious, but more problematic, is that this could actually help phishers. I can only image an innocent looking Firefox user: "But the site was really Bank Example. I'm sure. The browser even highlighted www.example.com for me (where one of the letters is not what it seems)".

Please, please, drop this "feature". There is no use case for it. (Ok. there is a small use case, where if the address is long, with may sub-domains, you don't immediately see the top-level domain...)

Thursday, May 31, 2007

Google Gears

Just today I've seen this link on reddit. and I'm very excited about this. Remember an earlier post about off-line web applications Well, they're closer than ever. Google released Gears, which is a Firefox/IE extension that provides an API for off-line web applications.

On the developer page, google posted 4 demos showing off their API.
  • the first one shows how an offline database can be used to keep track of stuff on the client side, without it going away the second time you visit that webpage
  • the second demo shows how google gears can remember an webpage and display it even when you are offline and the cache is emptied
  • I didn't understand the usefulness of the third demo, but I'm sure there are scenarios which use it
  • the last demo is about asynchronous methods, but I'm not sure this belongs in gears.
I am very excited about Gears, as it seems to work for both Microsoft Internet Explorer and Mozilla Firefox. Google says Safari support is work in progress.

For the moment, I'm quite busy with my thesis, but I'm dying to get my code snippet web application to use Gears.

Also, I can't wait to see a geared up version of Blogger and Gmail. Imagine preparing your posts without the need to access the internet and saving them to the local database. Next time you go online, Blogger will upload whatever you've been working on. Sounds cool, right?

I like Gears because the API seems simple enough and is super useful. However, in the long term, Gears (or whatever other framework for off-line web apps will prevail) will make web applications even more difficult to create. Also, I suppose a whole new range of security issues will be uncovered by this new type of applications.

Wednesday, May 9, 2007

To the Java refactoring maniacs

Every once in a while I have a virtual fight with some Java programmer over refactoring tools. They are not available for C++, Python, whatever. How can you even use an IDE without refactoring tools? Do you know how much productivity you gain? The Java language is the best language in the world; it was designed to be "refactorable". Bullshit!

Let's see. Java is known to be a poor language from expressiveness' point of view. Tools save it somehow, and whenever I try to tell someone there are better languages out there, IDEs are thrown into my face and the refactoring tools they provide are the ultimate arguments for Java.

So I set out to prove that the most simple refactoring tool one can think of is not totally automatable, even in a language as simple as Java. Let's wait no more. I clicked Eclipse in my Applications/Programming menu, went and grabbed a cup of tea while it was loading, came back just to see it's asking about the workspace (first time use), grabbed another cup of tea while waiting, and set out to test my theory.

I entered this code:


import java.lang.Class;
import java.lang.reflect.InvocationTargetException;

public class MainClass {
public static void poorName() {
System.out.println("This is a stupid method.");
}

public static void main(String[] argv) {
try {
MainClass.class.getMethod("poorName", new Class [] {}).invoke(null, new Object[]{});
}
catch (NoSuchMethodException e1) {
System.out.println("there is no method with name poorName.");
}
catch (IllegalAccessException e2) {
System.out.println("there is an illegal access.");
}
catch (InvocationTargetException e3) {
System.out.println("invocation target exception.");
}
}
}


I was initially aiming at fewer lines of code, but then I decided to handle all the exceptions. Simply put, this program looks for a method called poorName in MainClass, using introspection. It then tries to call that method. If the method is not found or whatever, an error message is printed.

As I run this program, I get the nice message:


This is a stupid method.


Then, because I don't like the name of the method, I set out to rename it. Of course, what better way of renaming a method than right-clicking on it's name, and selecting "Rename" from the "Refactor" menu?

Tadam, no error message, method was renamed. Resulting code?


import java.lang.Class;
import java.lang.reflect.InvocationTargetException;

public class MainClass {
public static void goodName() {
System.out.println("This is a stupid method.");
}

public static void main(String[] argv) {
try {
MainClass.class.getMethod("poorName", new Class [] {}).invoke(null, new Object[]{});
}
catch (NoSuchMethodException e1) {
System.out.println("there is no method with name poorName.");
}
catch (IllegalAccessException e2) {
System.out.println("there is an illegal access.");
}
catch (InvocationTargetException e3) {
System.out.println("invocation target exception.");
}
}
}


When I run this code, guess what? It changed behavior. It now prints:


there is no method with name poorName.


which is true, but sad. There you go: method rename is not computable (anyone care for the mathematical proof?), even in a language as simple as Java (well, the language is not as simple as it used to be, but still).

To all Java refactoring maniacs out there: the next time you want to tell me you're not looking at another language because it doesn't provide automated refactoring tools, please think hard about why that happens in the first place.

Friday, March 2, 2007

Language feature request: define by renaming

A few weeks back, I was looking at Prism. Prism is a probabilistic model checker. This means it can answer interesting questions about your model. For example, when designing a protocol, you can ask it: what is the probability that a message is lost? Or, when building a model of a soft realtime system, you can ask it: what is the probability that an operation will take longer than 100ms. etc.

Anyway, that's besides the point. What I want to talk about today is a feature that I've seen in Prism's model description language (which looks a lot like Promela). Here is the second part of the Prism tutorial. It introduces a language feature that I loved instantly. It is called module renaming and it rocks.

Module renaming basically works in the following way: take this function, which does this. Then create another function, which is exactly the same, but replace this symbol by another.

I can imagine how this can work for a general-purpose programming language. Define a function which sums two numbers:

pseudocode: sum x y = x + y


Then define the product to be like the sum, except replace + by *.

pseudocode: product = sum where + = *


That's nice. So I thought I'd try and see if I can implement this using lisp macros. I don't have much experience with lisp, but here's how I sort-of did it:

First thing's first. Create a function that takes a list and replaces every "source" with "target".

(defun rename-symbol (list source target)
(if (eq list nil)
nil
(cons
(if (listp (first list))
(rename-symbol (first list) source target)
(progn
(if (eq (first list) source)
target
(first list))))
(rename-symbol (rest list) source target))))


Here's how it works:

CL-USER> (rename-symbol `(+ 1 2) '+ '*)
(* 1 2)


Well, I actually want to perform multiple renamings, so I replaced source and target with an association list (sort of like a hash table):

(defun association (item alist)
(rest (assoc item alist)))

(defun rename-symbols (list alist)
(if (eq list nil)
nil
(cons
(if (listp (first list))
(rename-symbols (first list) alist)
(if (not (eq (association (first list) alist) nil))
(association (first list) alist)
(first list)))
(rename-symbols (rest list) alist))))


Which does the same thing, except it takes an association list and can replace multiple symbols. Here's how it works:

CL-USER> (rename-symbols `(defun sum (a b) (+ a b)) '((+ . *) (sum . product)))
(DEFUN PRODUCT (A B) (* A B))


See where we're going? Now let's write a simple macro:

(defun rename (list alist)
(eval (rename-symbols list alist)))


And here's how Lisp's syntax can be extended to allow renaming:

(setq my-function `(defun sum (a b) (+ a b)))

(rename my-function nil)
(rename my-function '((+ . *) (sum . product)))


When evaluating the above, you'll end up with two functions, sum and product, which do what you expect.

Disclaimer: I am a Lisp beginner and I've probably made some mistakes.

I welcome any and all corrections/improvements/suggestions. I would be especially interested in creating a rename macro that works as follows:

(defun sum (a b) (+ a b))
(rename sum product '((+ . *)))


If there is any Lisp guru reading this, please let me know ;-)

Also, I am curios about how this could be implemented in languages such as Ocaml or Haskell. Some of the readers will be quick to point out that this feature is not really required, because you can get the same benefit by creating a function which is more generic that product and sum, which would look like this in Haskell:

combine op x y = op x y
sum = combine (+)
product = combine (*)


But I find the rename feature extremely geeky and cool. So let me have it :D.