The TUNES C translator subproject

"C"

C is a truely ineffectual language which won't be argued here. It's been more reknowned as the portable assembler (with buggy specifications), but is not even good at that.

However, it is the One Standard for system programming, and lots of software, free or not, have been or are being written in C.

That is why we must provide tools to allow smooth transition from C to better languages, and reuse of old code. Particularly, lots of device drivers for many different kind of hardware are written in C, and may be reused if such a project succeeds.

Such tools should enable to compile "C" code to other languages. "other languages" may mean some low-level language to be compiled or interpreted by a compiler-backend, run-time or cpu, and thus the C translator would be used just as yet another C compiler. But even if you're only interested in efficiently compiling C, straightforward translation from C to low-level language is not possible, if the target architecture does not resemble the C programming model of a sequential Von Neuman Machine with linear memory (take for instance, a cluster of MISC).

To allow full reuse of existing "C" sources, more must be possible: these tools should be able to understand the semantics of C routines inside their context, and to generate (with human help and selection) human-understandable programs in a higher-level language, that would express the same general algorithm, and include the C specificities too, but only as annotations that allow to reverse the translation, or to give additional information about ways to efficiently implement the algorithm in low-level contexts like the one from which it was extracted. This way, recycled programs should perform at least as well as the original. That is, the translator will isolate and render back the very subtance of the C program, stripped from its crippling C dependencies, limitations and bogus "optimizations"; it would factor the C program into a generic algorithm and its low-level C'ish bindings.

The C translator could even be used as a C-to-C translator, that'd allow easier manipulation, debugging and maintenance of C programs! By keeping track of all modifications, patches from convergent sources could be merged, whether they are expressed at a low- or high-level; this means that patches should be first-class, and can also be analyzed and factored into various levels of abstraction. The translator may also be used to generate interfaces between code produced by a C compiler and code written in other languages. Applications of such a translator are infinite, and would benefit the universe of C programmers at least as much as it would benefit those who want to get rid of C.

Examples about translating C junk into real programs

You may want to see again the Sieve of Eratosthene; the C translator should eventually be able to find or recognize the most generic form of the sieve program from any lame C version. That's hard, but that's doable (especially with human help).

Side effects

C is a stubborn language which (like Pascal) does not know dynamic creation of objects, and thus forces people to statically define variables that will be modified instead. Being used to having that badly implemented, people often reuse such variables to put conceptually different values inside, to optimize "space". How it sucks!

int t ;
...
t = expr1 ;
...
t = expr2 ;
...
should be translated into something like
...
let t1 = expr1 in
...
let t2 = expr2 in
...

Calling conventions

C allows only return of atomic types or explicit pointers. Even C++, without clear GC policy, makes returning of structured values difficult. Thus, we often see such horrors as:

error_return_type fun_name (argument_type arg,return_type * return_location);
or
return_type fun_name (argument_type * pointer_to_arg);

in C programs.

That's only stupid C dependencies. The C translator should be able to handle that into using exceptions, structured return values, and object arguments.

Preprocessor

C has a braindead preprocessor, with crippled macro-expansion. This makes C programming (and maintaining) quite harder, but C translating easier than would have been if the preprocessor was more powerful but as bogus designed as C.

#define's should be translated into constant (and higher-level if using parameters or concatenation) constructs, and lists of #define's as well as enum's should be translated into high-level case constructs. A few #define rules that do not produce a unique grammatical unit, would require more hacking. All definitions in standard headers ought to be nice, though.

#if's may be expanded at translation time to achieve actual compilation, or kept as higher-level constructs to achieve separate translation of trickier modules.

Typing

Because the C type system is so unexpressive, many programs use hacks that may be translated into well-typed programs with a richer type system. Sometime, the same portion of "C" code can be translated into many differently typed HLL programs.

Some low-level access to objects through pointers should be translated into explicit typecasting through "regular" function call. The HLL may or may not implement it in the straightforward C way, according to its finding a better way or not. But in any case, the typecast is made well-documented and well-specified in the translated, so it is ready for higher-order transformations.

Recycling typing information from such programs as lclint might be a good thing.

To Do