Writing a Quine in C

A Quine is a computer program that outputs its own source code. I've always been under the impression that in order to really call yourself a hacker (the good kind; not a cracker, the evil kind) you need to be able to write a quine.

For more enlightenment about what a quine is, I refer you to the Quine article on the Wikipedia.

I came across Russ Cox's article Zip Files All The Way Down recently, which really motivated me to take a stab at writing my own quine. At the beginning of the article he describes a short quine in Python, which uses the repr() function to format a string for the print statement. My quine builds on this idea.

So let's get started. Here's a first attempt (we'll fix the formatting later):

char*s="char*s=\"%s\";\n"
 "int main() \n{\n\tprintf(s,s);\n}";
int main() {
 printf(s,s);
}
The code is built around the printf() statement in the main function. We want to write the string s (the char*s at the top) in such a way that it is at the same time the format string for the printf() and the program itself.

If you compile and run it, ignoring the warnings for now, you get this output:

char*s="char*s="%s";
int main()
{
 printf(s,s);
}",
int main()
{
 printf(s,s);
}
which is definitely not valid output. But it should be obvious that the problem is that the lexical analyser of the C compiler is the enemy here, because it changes all the escape sequences in s (changing "\n" to a newline and "\"" to a "). So to fix it, we need to do the opposite of what the lexical analyser does: change the newlines to "\n" sequences and so on. To do that, I introduce the function destrip():
char*s="char*s=\"%s\";\n"
 "int main() \n{\n\treturn printf(s,destrip(s));\n}";
char*destrip(char*s){
  static char b[512];
  char *p;
  for(p=b;*s;s++)
    if(*s == '\n') {*p++='\\';*p++='n';}
    else if(*s == '\t') {*p++='\\';*p++='t';}
    else if(*s == '\"') {*p++='\\';*p++='\"';}
    else if(*s == '\\') {*p++='\\';*p++='\\';}
    else *p++=*s;
  *p=0;
  return b;
}
int main() {
  return printf(s,destrip(s));
}
Compile and run it again. This is the output:
char*s="char*s=\"%s\";\nint main() \n{\n\treturn printf(s,destrip(s));\n}";
int main()
{
return printf(s,destrip(s));
}
Better, but now the problem is to get the code for destrip() into the output as well. So I introduce a new string t that will contain the string with the code for destrip(). First I need to make some small modifications:
char*s="char*s=\"%s\",\n"
"*t=\"%s\";\n"
"%s\n"
"int main() \n{\n\treturn printf(s,destrip(s),destrip(t),t);\n}",
*t="";
char*destrip(char*s){
static char b[2][512];
static int c = 0;
char *p;
for(p=b1;*s;s++)
if(*s == '\n') {*p++='\\';*p++='n';}
else if(*s == '\t') {*p++='\\';*p++='t';}
else if(*s == '\"') {*p++='\\';*p++='\"';}
else if(*s == '\\') {*p++='\\';*p++='\\';}
else *p++=*s;
*p=0;
return b[c-1];
}
int main() {
return printf(s,destrip(s),destrip(t),t);
}
The big modification here is that the program will now output the variable t as well. The problem is that the original destrip() uses a static buffer wherein it writes the output it returns, which is a problem when you try to call it more than once, as I do in the new version of the main() function. The easiest fix is to just use two static buffers, like I do

It produces this output:

char*s="char*s=\"%s\",\n*t=\"%s\";\n%s\nint main() \n{\n\treturn printf(s,destrip(s),destrip(t),t);\n}",
*t="";
 
int main()
{
 return printf(s,destrip(s),destrip(t),t);
}
Almost there. The tricky part is now to escape the code of destrip() properly to fill in the variable t. Here is the final version:
#include <stdio.h>
char*s="#include <stdio.h>\nchar*s=\"%s\",\n"
"*t=\"%s\";\n"
"%s\n"
"int main() \n{\n\treturn printf(s,destrip(s),destrip(t),t);\n}",
*t="char*destrip(char*s){"
"\n\tstatic char b[2][512];"
"\n\tstatic int c = 0;"
"\n\tchar *p;"
"\n\tfor(p=b1;*s;s++)"
"\n\t\tif(*s == '\\n') {*p++='\\\\';*p++='n';}"
"\n\t\telse if(*s == '\\t') {*p++='\\\\';*p++='t';}"
"\n\t\telse if(*s == '\"') {*p++='\\\\';*p++='\"';}"
"\n\t\telse if(*s == '\\\\') {*p++='\\\\';*p++='\\\\';}"
"\n\t\telse *p++=*s;"
"\n\t*p=0;"
"\n\treturn b[c-1];\n"
"}";
char*destrip(char*s){
	static char b[2][512];
	static int c = 0;
	char *p;
	for(p=b1;*s;s++)
		if(*s == '\n') {*p++='\\';*p++='n';}
	else if(*s == '\t') {*p++='\\';*p++='t';}
	else if(*s == '\"') {*p++='\\';*p++='\"';}
	else if(*s == '\\') {*p++='\\';*p++='\\';}
	else *p++=*s;
	*p=0;
	return b[c-1];
}
int main() {
	return printf(s,destrip(s),destrip(t),t);
}

And we're done! Actually, not quite. Here's the output the program now produces:

#include <stdio.h>
char*s="#include <stdio.h>\nchar*s=\"%s\",\n*t=\"%s\";\n%s\nint main() \n{\n\treturn printf(s,destrip(s),destrip(t),t);\n}",
*t="char*destrip(char*s){\n\tstatic char b[2][512];\n\tstatic int c = 0;\n\tchar *p;\n\tfor(p=b1;*s;s++)\n\t\tif(*s == '\\n') {*p++='\\\\';*p++='n';}\n\t\telse if(*s == '\\t') {*p++='\\\\';*p++='t';}\n\t\telse if(*s == '\"') {*p++='\\\\';*p++='\"';}\n\t\telse if(*s == '\\\\') {*p++='\\\\';*p++='\\\\';}\n\t\telse *p++=*s;\n\t*p=0;\n\treturn b[c-1];\n}";
char*destrip(char*s){
	static char b[2][512];
	static int c = 0;
	char *p;
	for(p=b1;*s;s++)
	if(*s == '\n') {*p++='\\';*p++='n';}
	else if(*s == '\t') {*p++='\\';*p++='t';}
	else if(*s == '"') {*p++='\\';*p++='"';}
	else if(*s == '\\') {*p++='\\';*p++='\\';}
	else *p++=*s;
	*p=0;
	return b[c-1];
}
int main()
{
	return printf(s,destrip(s),destrip(t),t);
}
If you look at it, it is equivalent to the input. The only real difference is that my original code was formatted slightly differently and I made use of the fact that multiple string literals following each other are combined into one by the C compiler to keep my input code readable, so the strings in the input and the output are equivalent. If you save the above output to out.c and repeat the following commands in your shell, you'll find that it always produces the same output.
$ gcc -o out -Wall out.c ; ./out > out.c ; cat out.c
Which concludes my quine.

The code can be cleaned up a bit: The strings s and t can be combined into one, so that you don't need to call destrip() twice in the printf() which in turn allows you to use a single static buffer within destrip(). I'm not worrying about that at the moment.

Where do you go from here? I highly recommend the above mentioned Zip Files All The Way Down article. Although it starts with a Python quine, it is actually about the Zip equivalent of a Quine: If you view a zip file as a small program that's interpreted by the extractor, then it should be possible to write the zip file as a quine (and in the article he proves that indeed it is). The final result of his article is a zip file that when you extract it produces the same zip file. It is really cool to read, and I think it separates him as a master hacker from the rest of us.