C Programming/Basics of compilation
Having covered the basic concepts of C programming, we can now briefly discuss the process of compilation.
Like any high-level programming language, C by itself is completely incomprehensible to a microprocessor. Its purpose is to give humans an intuitive way to write instructions that can be easily converted into machine code, which a microprocessor does understand. The compiler is the program that translates our human-readable source code into machine code.
To those new to programming, this seems fairly simple. A naive compiler might read in every source file, translate everything into machine code, and write out an executable. That could work, but it has two serious problems. First, for a large project, the computer may not have enough memory to read all of the source code at once. Second, if you change a single source file, you have to recompile the entire application.
To deal with these problems, compilers break the job into steps. For each source file (each .c file), the compiler reads the file, reads the files it references via the #include directive, and translates the result to machine code. The output of this step is an "object file" (conventionally .o, or .obj with some Windows compilers). After all the object files have been created, a "linker" program collects them and writes the actual executable program. That way, if you change one source file, only that file needs to be recompiled, after which the application needs to be re-linked.
Without going into details, it can be beneficial to have a superficial understanding of the compilation process.
Preprocessor
The preprocessor provides header file inclusion, macro expansion, conditional compilation, and line control. These features are accessed by inserting the appropriate preprocessor directives into your code. Before the source code is compiled, a special program called the preprocessor scans it for directives, which start with #, and replaces them with other code or data according to specific rules. You can think of the preprocessor as a non-interactive text editor that modifies your code to prepare it for compilation.
You can see one preprocessor directive in the Hello world program:
#include <stdio.h>

This directive causes the contents of stdio.h to be included in your source code, as if they had been copied and pasted into your file.
If we remove this line and try to recompile the program, the compiler will complain that puts has not been declared. The C programming language itself doesn't have I/O abilities built into it. It does, however, provide us with a standard library containing, among other things, I/O functions which we can link into our program. stdio.h is one of that library's headers. stdio.h doesn't contain any code for puts itself; instead, it contains information that tells the C compiler how to call that function. The standard library code for puts will be 'glued', or linked, to your program later in the compilation process.
Syntax checking
This step ensures that the code is valid C that can be translated into an executable program. Most compilers can also issue warnings about potential problems in your program, such as a conditional expression that is always true or always false.
When an error is detected in the program, the compiler will normally report the file name and line number where the problem was found.
Object code
The compiler produces machine code equivalent to the source code, in a form that can be linked into the final program. At this point the code cannot yet be executed; it must first be linked.
Compilation is a "one way street". That is, compiling a C source file into machine code is easy, but "decompiling" (turning machine code back into the C source that produced it) is not. Decompilers for C do exist, but the code they create is hard to understand and is mainly useful for reverse engineering.
Linking
Linking combines the separate object files and any required libraries into one complete program, producing either an executable program or a library. Linking is performed by a linker program, which is often part of a compiler suite.
The code for any C standard library functions your program uses is 'glued' with the object code for your program in this stage. Linking is a complex topic, but know that there are two types of linking: static linking, where executable code from the 'glued' library is copied directly into your program, and dynamic linking, where executable code from the 'glued' library lives in a different file on your computer. Most C compilers will automatically link the C standard library with your program in one of these ways.
Common errors at this stage are missing function definitions and duplicate definitions of the same function.
Automation
For large C projects, many programmers choose to automate compilation, both to reduce the manual work involved and to speed up the process by recompiling only modified files. Most IDEs have some kind of project management which makes such automation very easy. However, the project management files are often usable only within the same integrated development environment, so anyone wishing to modify the project would need to use the same IDE.
On Unix-like systems, make and Makefiles are often used to accomplish the same thing. make is a simple build system and is available as one of the standard developer tools on most Unix-like systems. Other popular build systems for programs written in C include Meson, GNU Autotools, CMake and Waf.
A simple Makefile for a one-file program using GCC looks like this:
SOURCE=source.c
PROGRAM=program

$(PROGRAM): $(SOURCE)
	gcc -Wall -o $(PROGRAM) $(SOURCE)

Note: Makefiles require hard tabs; spaces can't be used for indentation, unlike in C!
At the command line, run make from the source code directory, and your compiled executable will be created with the name program.