*** The PQC reduced-C compiler *** Copyright (C) Pekka Paalanen 2003 Course project on course "Languages, compilers and interpreters", 2003 spring, Lappeenranta University of Technology http://www.lut.fi http://www.it.lut.fi/kurssit/02-03/010730001/projects.html Author Pekka Paalanen pq@iki.fi http://www.iki.fi/pq/ Compiling A simple 'make' will make the compiler. Other makeables are 'clean', removes intermediate files, and 'mrclean' which removes all executables, .s and .tac files. Usage ./pqc [-s] [-t] [-n] [-O#] file.c It produces file.s from file.c and uses GCC to compile file.s. Without any options pqc prints help. Options -s Create file file.tac and print symbol table contents there. -t Create file file.tac and print TAC list there. -n Optimization feature, see optimization level 3. -O# Set Optimization level to #, values 0..3, default 3. Optimization level 0 No optimizations. level 1 Constant expressions and comparisons are evaluated compile time. level 2 Level 1 and jump/label optimizations. Removes unnecessary jumps and labels and removes dead code. level 3 Level 2 and unused expressions optimization. Operations, from which result is not used, are deleted recursively. By default all user defined (explicit) variables are protected and will never be deleted. However this can be overridden with -n. The algorithm is not supposed to work well with this. Don't use it. The included test.c was compiled with each of the optimization levels and the number of lines in resulting assembly files were: level 0: 356 test.s level 1: 301 test.s level 2: 247 test.s level 3: 201 test.s The -n option was not included since it produces broken code. Compiler internals -------------------- The compiler passes Valgrind test cleanly. Memory handling is perfect :-) Source files parser.y, 782 lines Bison (YACC) source. Home of the grammar and main(). lexer.l, 204 lines Flex source. Strips comments, identifies tokens from input. hashtable.c, 431 lines hashtable.h, 62 lines A generic hash table implementation, for both string and integer type keys. symtable.c, 118 lines symtable.h, 45 lines Symbol table implementation. Uses hash table. labeltable.c, 326 lines labeltable.h, 28 lines Jump/label optimization code. Contains a hash table for storing jump from -instruction addresses and the optimization routines. Backtrace function (debug mode). tac.c, 236 lines tac.h, 77 lines Three address code (TAC) related code. Tac, taclist and tacstack (of taclists) handling. Also an integer stack implementation. asmgen_x86.c, 404 lines asmgen_x86.h, 35 lines Dumb TAC to assembly converter. trace.gdb Gnu Debugger script used to create stack traces. Required by backtrace(). test.c C source code used to test the compiler. Makefile Debug mode can be turned on by setting USE_DEBUG=yes. It is very verbose on parser and optimizers. Known bugs The alredy mentioned -n switch. TAC instructions are scanned in reverse order without taking jumps into account. This might lead to removing too many instructions. Jump/label optimizer has a theoretical weakness. It cannot cope with TAC like this: ... JMP .L01 ... .L02: ... .L03: ... JMP .L02 .L01: JMP .L03 What it does is that it sees the jump to .L01 instruction and starts to delete dead code right after the jump. It sees the .L02 label, but because the jump instruction to that label is between .L01 jump and target, it decides it is dead code. It continues deleting code and then sees label .L03. Now the jump to .L03 is after label .L01 so it is regarded as live code and deleting stops. As we can see, the label .L02 and everything between it and label .L03 is actually live code. I do not know how serious this weakness is because I do not know how to procude TAC like that with accepted source grammar. If it's a problem jump/label optimization can be disabled as a whole. Lexer/parser can't do string concatenation. The compiler cannot compile itself. $Id: readme.txt,v 1.4 2003/07/12 07:54:09 pq Exp $