17. A: Coding Style
This appendix is not about indenting and placement of parentheses and curly braces, although that will be mentioned. It is about the general guidelines used in this book for organizing the code listings.
Although many of these issues have been introduced throughout the book, this appendix appears at the end so it can be assumed that every topic is fair game, and if you don't understand something you can look it up in the appropriate section.
All the decisions about coding style in this book have been deliberately considered and made, sometimes over a period of years. Of course, everyone has their reasons for organizing code the way they do, and I'm just trying to tell you how I arrived at mine and the constraints and environmental factors that brought me to those decisions.
General
In the text of this book, identifiers (function, variable, and class names) are set in bold. Most keywords will also be set in bold, except for those keywords that are used so much that the bolding can become tedious, such as “class” and “virtual.”
I use a particular coding style for the examples in this book. It was developed over a number of years, and was partially inspired by Bjarne Stroustrup's style in his original The C++ Programming Language.(64) The subject of formatting style is good for hours of hot debate, so I'll just say I'm not trying to dictate correct style via my examples; I have my own motivation for using the style that I do. Because C++ is a free-form programming language, you can continue to use whatever style you're comfortable with.
That said, I will note that it is important to have a consistent formatting style within a project. If you search the Internet, you will find a number of tools that can be used to reformat all the code in your project to achieve this valuable consistency.
The programs in this book are files that are automatically extracted from the text of the book, which allows them to be tested to ensure that they work correctly. Thus, the code files printed in the book should all work without compile-time errors when compiled with an implementation that conforms to Standard C++ (note that not all compilers support all language features). The errors that should cause compile-time error messages are commented out with the comment //! so they can be easily discovered and tested using automatic means. Errors discovered and reported to the author will appear first in the electronic version of the book (at www.BruceEckel.com) and later in updates of the book.
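As a minimal sketch of the //! convention (the names LIMIT and clampToLimit( ) are hypothetical and are not taken from the book's listings), a line that is supposed to produce a compile-time error stays in the listing but is commented out, so the file still builds while an automatic tool can find the line and re-enable it:

```cpp
// Hypothetical example, not one of the
// book's listings.
const int LIMIT = 10;

int clampToLimit(int value) {
  //! LIMIT = 20; // Error: LIMIT is const
  return value > LIMIT ? LIMIT : value;
}
```

Removing the //! from that line should make a conforming compiler reject the file, which is the behavior such a test is meant to confirm.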
One of the standards in this book is that all programs will compile and link without errors (although they will sometimes cause warnings). To this end, some of the programs, which demonstrate only a coding example and don't represent stand-alone programs, will have empty main( ) functions, like this:
int main() {}
This allows the linker to complete without an error.
The standard for main( ) is to return an int, but Standard C++ states that if there is no return statement inside main( ), the compiler will automatically generate code to return 0. This option (no return statement in main( )) will be used in this book (some compilers may still generate warnings for this, but those are not compliant with Standard C++).
File names
In C, it has been traditional to name header files (containing declarations) with an extension of .h and implementation files (that cause storage to be allocated and code to be generated) with an extension of .c. C++ went through an evolution. It was first developed on Unix, where the operating system was aware of upper and lower case in file names. The original file names were simply capitalized versions of the C extensions: .H and .C. This of course didn't work for operating systems that didn't distinguish upper and lower case, such as DOS. DOS C++ vendors used extensions of hxx and cxx for header files and implementation files, respectively, or hpp and cpp. Later, someone figured out that the only reason you needed a different extension for a file was so the compiler could determine whether to compile it as a C or C++ file. Because the compiler never compiled header files directly, only the implementation file extension needed to be changed. The custom, across virtually all systems, has now become to use cpp for implementation files and h for header files. Note that when including Standard C++ header files, the option of having no file name extension is used, i.e.: #include <iostream>.
Begin and end comment tags
A very important issue with this book is that all code that you see in the book must be verified to be correct (with at least one compiler). This is accomplished by automatically extracting the files from the book. To facilitate this, all code listings that are meant to be compiled (as opposed to code fragments, of which there are few) have comment tags at the beginning and end. These tags are used by the code-extraction tool ExtractCode.cpp in Volume 2 of this book (which you can find on the Web site www.BruceEckel.com) to pull each code listing out of the plain-ASCII text version of this book.
The end-listing tag simply tells ExtractCode.cpp that it's the end of the listing, but the begin-listing tag is followed by information about what subdirectory the file belongs in (generally organized by chapters, so a file that belongs in Chapter 8 would have a tag of C08), followed by a colon and the name of the listing file.
Because ExtractCode.cpp also creates a makefile for each subdirectory, information about how a program is made and the command-line used to test it is also incorporated into the listings. If a program is stand-alone (it doesn't need to be linked with anything else) it has no extra information. This is also true for header files. However, if it doesn't contain a main( ) and is meant to be linked with something else, then it has an {O} after the file name. If this listing is meant to be the main program but needs to be linked with other components, there's a separate line that begins with //{L} and continues with all the files that need to be linked (without extensions, since those can vary from platform to platform).
You can find examples throughout the book.
If a file should be extracted but the begin- and end-tags should not be included in the extracted file (for example, if it's a file of test data) then the begin-tag is immediately followed by a '!'.
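To make the tag conventions concrete, here is a sketch of a listing that has no main( ) and is meant to be linked with something else. The file name Doubler.cpp and the function doubleIt( ) are hypothetical, and the tag spellings (//: to begin a listing and ///:~ to end it) are assumed from the listings elsewhere in the book rather than taken from ExtractCode.cpp itself:

```cpp
//: C08:Doubler.cpp {O}
// Hypothetical listing: it has no main(), so
// it is marked {O} and is meant to be linked
// into another program.
int doubleIt(int x) {
  return x * 2;
}
///:~
```

A main program that uses this file would carry its own begin tag, followed by a separate line such as //{L} Doubler that names the file to link, without its extension.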
Parentheses, braces, and indentation
You may notice the formatting style in this book is different from many traditional C styles. Of course, everyone thinks their own style is the most rational. However, the style used here has a simple logic behind it, which will be presented here mixed in with ideas on why some of the other styles developed.
The formatting style is motivated by one thing: presentation, both in print and in live seminars. You may feel your needs are different because you don't make a lot of presentations. However, working code is read much more than it is written, and so it should be easy for the reader to perceive. My two most important criteria are “scannability” (how easy it is for the reader to grasp the meaning of a single line) and the number of lines that can fit on a page. This latter may sound funny, but when you are giving a live presentation, it's very distracting for the audience if the presenter must shuffle back and forth between slides, and a few wasted lines can cause this.
Everyone seems to agree that code inside braces should be indented. What people don't agree on - and the place where there's the most inconsistency within formatting styles - is this: Where does the opening brace go? This one question, I think, is what causes such variations among coding styles. (For an enumeration of coding styles, see C++ Programming Guidelines, by Tom Plum and Dan Saks, Plum Hall 1991.) I'll try to convince you that many of today's coding styles come from pre-Standard C constraints (before function prototypes) and are thus inappropriate now.
First, my answer to that key question: the opening brace should always go on the same line as the “precursor” (by which I mean “whatever the body is about: a class, function, object definition, if statement, etc.”). This is a single, consistent rule I apply to all of the code I write, and it makes formatting much simpler. It makes the “scannability” easier - when you look at this line:
int func(int a);
you know, by the semicolon at the end of the line, that this is a declaration and it goes no further, but when you see the line:
int func(int a) {
you immediately know it's a definition because the line finishes with an opening brace, not a semicolon. By using this approach, there's no difference in where you place the opening brace for a multi-line definition:
int func(int a) {
  int b = a + 1;
  return b * 2;
}
and for a single-line definition that is often used for inlines:
int func(int a) { return (a + 1) * 2; }
Similarly, for a class:
class Thing;
is a class name declaration, and
class Thing {
is a class definition. You can tell by looking at the single line in all cases whether it's a declaration or definition. And of course, putting the opening brace on the same line, instead of a line by itself, allows you to fit more lines on a page.
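Putting these fragments together, the following sketch shows how the one rule reads in a complete file. The function scale( ) and the members of Thing are only illustrative. At the start of each construct, a trailing semicolon marks a declaration and a trailing opening brace marks a definition:

```cpp
class Thing;          // Class name declaration
int scale(int a);     // Function declaration

class Thing {         // Class definition
  int value;
public:
  Thing(int v) : value(v) {}
  // Single-line definition, as used for inlines:
  int get() const { return value; }
};

int scale(int a) {    // Multi-line definition
  int b = a + 1;
  return b * 2;
}
```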
So why do we have so many other styles? In particular, you'll notice that most people create classes following the style above (which Stroustrup uses in all editions of his book The C++ Programming Language from Addison-Wesley) but create function definitions by putting the opening brace on a single line by itself (which also engenders many different indentation styles). Stroustrup does this except for short inline functions. With the approach I describe here, everything is consistent - you name whatever it is (class, function, enum, etc.) and on that same line you put the opening brace to indicate that the body for this thing is about to follow. Also, the opening brace is the same for short inlines and ordinary function definitions.
I assert that the style of function definition used by many folks comes from pre-function-prototyping C, in which you didn't declare the arguments inside the parentheses, but instead between the closing parenthesis and the opening curly brace (this shows C's assembly-language roots):
void bar(x, y)
  int x;
  float y;
{
  /* body here */
}
Here, it would be quite ungainly to put the opening brace on the same line, so no one did it. However, they did make various decisions about whether the braces should be indented with the body of the code or whether they should be at the level of the “precursor.” Thus, we got many different formatting styles.
There are other arguments for placing the brace on the line immediately following the declaration (of a class, struct, function, etc.). The following came from a reader, and is presented here so you know what the issues are:
Experienced 'vi' (vim) users know that typing the ']' key twice will take the user to the next occurrence of '{' (or ^L) in column 0. This feature is extremely useful in navigating code (jumping to the next function or class definition). [My comment: when I was initially working under Unix, GNU Emacs was just appearing and I became enmeshed in that. As a result, 'vi' has never made sense to me, and thus I do not think in terms of “column 0 locations.” However, there is a fair contingent of 'vi' users out there, and they are affected by this issue.]
Placing the '{' on the next line eliminates some confusing code in complex conditionals, aiding in the scannability. Example:
if (cond1
    && cond2
    && cond3) {
    statement;
}
The above [asserts the reader] has poor scannability. However,
if (cond1
    && cond2
    && cond3)
{
    statement;
}
breaks up the 'if' from the body, resulting in better readability. [Your opinions on whether this is true will vary depending on what you're used to.]
Finally, it's much easier to visually align braces when they are aligned in the same column. They visually "stick out" much better. [End of reader comment]
The issue of where to put the opening curly brace is probably the most discordant issue. I've learned to scan both forms, and in the end it comes down to what you've grown comfortable with. However, I note that the official Java coding standard (found on Sun's Java Web site) is effectively the same as the one I present here - since more folks are beginning to program in both languages, the consistency between coding styles may be helpful.
The approach I use removes all the exceptions and special cases, and logically produces a single style of indentation as well. Even within a function body, the consistency holds, as in:
for(int i = 0; i < 100; i++) {
  cout << i << endl;
  cout << x * i << endl;
}
The style is easy to teach and to remember - you use a single, consistent rule for all your formatting, not one for classes, two for functions (one-line inlines vs. multi-line), and possibly others for for loops, if statements, etc. The consistency alone, I think, makes it worthy of consideration. Above all, C++ is a newer language than C, and although we must make many concessions to C, we shouldn't be carrying too many artifacts with us that cause problems in the future. Small problems multiplied by many lines of code become big problems. For a thorough examination of the subject, albeit in C, see C Style: Standards and Guidelines, by David Straker (Prentice-Hall 1992).
The other constraint I must work under is the line width, since the book has a limitation of 50 characters. What happens when something is too long to fit on one line? Well, again I strive to have a consistent policy for the way lines are broken up, so they can be easily viewed. As long as something is part of a single definition, argument list, etc., continuation lines should be indented one level in from the beginning of that definition, argument list, etc.
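As a sketch of that line-breaking policy (the function describeCone( ) and its parameters are hypothetical), both the argument list and the long output expression below are continued on lines indented one level in from where they begin, and every line stays under 50 characters:

```cpp
#include <sstream>
#include <string>

// Hypothetical function used only to show how
// long lines are broken.
std::string describeCone(
  const std::string& flavor, int scoops,
  bool sprinkles) {
  std::ostringstream out;
  out << flavor << " cone with " << scoops
    << " scoop(s)";
  if(sprinkles)
    out << " and sprinkles";
  return out.str();
}
```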
Identifier names
Those familiar with Java will notice that I have switched to using the standard Java style for all identifier names. However, I cannot be completely consistent here because identifiers in the Standard C and C++ libraries do not follow this style.
The style is quite straightforward. The first letter of an identifier is only capitalized if that identifier is a class. If it is a function or variable, then the first letter is lowercase. The rest of the identifier consists of one or more words, run together but distinguished by capitalizing each word. So a class looks like this:
class FrenchVanilla : public IceCream {
an object identifier looks like this:
FrenchVanilla myIceCreamCone(3);
and a function looks like this:
void eatIceCreamCone();
(for either a member function or a regular function).
The one exception is for compile-time constants (const or #define), in which all of the letters in the identifier are uppercase.
The value of the style is that capitalization has meaning - you can see from the first letter whether you're talking about a class or an object/method. This is especially useful when static class members are accessed.
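The naming rules can be seen working together in a small sketch. Everything here beyond the names FrenchVanilla, IceCream, and eatIceCreamCone (the constant MAX_SCOOPS, the members, and the signatures) is invented for illustration:

```cpp
// Compile-time constant: all uppercase
const int MAX_SCOOPS = 3;

class IceCream {    // Class: leading capital
  int scoops;
  static int conesMade;
public:
  IceCream(int n)
    : scoops(n > MAX_SCOOPS ? MAX_SCOOPS : n) {
    ++conesMade;
  }
  // Member functions: leading lowercase
  int scoopCount() const { return scoops; }
  static int totalCones() { return conesMade; }
};

int IceCream::conesMade = 0;

class FrenchVanilla : public IceCream {
public:
  FrenchVanilla(int n) : IceCream(n) {}
};

// Regular function: leading lowercase
int eatIceCreamCone(const FrenchVanilla& cone) {
  return cone.scoopCount();
}
```

With these definitions, a call such as IceCream::totalCones() is visibly a static member reached through a class name, while myIceCreamCone.scoopCount() is visibly a call on an object.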
Order of header inclusion
Headers are included in order from “the most specific to the most general.” That is, any header files in the local directory are included first, then any of my own “tool” headers, such as require.h, then any third-party library headers, then the Standard C++ Library headers, and finally the C library headers.
The justification for this comes from John Lakos in Large-Scale C++ Software Design (Addison-Wesley, 1996):
Latent usage errors can be avoided by ensuring that the .h file of a component parses by itself - without externally-provided declarations or definitions... Including the .h file as the very first line of the .c file ensures that no critical piece of information intrinsic to the physical interface of the component is missing from the .h file (or, if there is, that you will find out about it as soon as you try to compile the .c file).
If the order of header inclusion goes “from most specific to most general,” then it's more likely that if your header doesn't parse by itself, you'll find out about it sooner and prevent annoyances down the road.
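The ordering might look like the following sketch. The local and third-party header names are hypothetical, so they are left commented out to keep the example compilable on its own; totalLength( ) exists only to give the file something to compile:

```cpp
// 1. Headers in the local directory
//    (hypothetical, so commented out):
//    #include "IceCream.h"
// 2. "Tool" headers:
//    #include "require.h"
// 3. Third-party library headers
//    (hypothetical):
//    #include <thirdparty/Widget.h>
// 4. Standard C++ Library headers:
#include <string>
#include <vector>
// 5. C library headers:
#include <cstddef>
#include <cstring>

// Adds up the lengths of all the names.
std::size_t totalLength(
  const std::vector<std::string>& names) {
  std::size_t total = 0;
  for(std::size_t i = 0; i < names.size(); i++)
    total += std::strlen(names[i].c_str());
  return total;
}
```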
Include guards on header files
Include guards are always used inside header files to prevent multiple inclusion of a header file during the compilation of a single .cpp file. The include guards are implemented using a preprocessor #define and checking to see that a name hasn't already been defined. The name used for the guard is based on the name of the header file, with all letters of the file name uppercase and replacing the '.' with an underscore. For example:
// IncludeGuard.h
#ifndef INCLUDEGUARD_H
#define INCLUDEGUARD_H
// Body of header file here...
#endif // INCLUDEGUARD_H
The identifier on the last line is included for clarity. Although some preprocessors ignored any characters after an #endif, that isn't standard behavior and so the identifier is commented.
Use of namespaces
In header files, any “pollution” of the namespace in which the header is included must be scrupulously avoided. That is, if you change the namespace outside of a function or class, you will cause that change to occur for any file that includes your header, resulting in all kinds of problems. No using declarations of any kind are allowed outside of function definitions, and no global using directives are allowed in header files.
In cpp files, any global using directives will only affect that file, and so in this book they are generally used to produce more easily-readable code, especially in small programs.
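The two rules can be contrasted in one sketch. For the sake of a self-contained example, the code that would belong in a header and the code that would belong in a cpp file are shown together and separated by comments; the functions joinWords( ) and shout( ) are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// --- What would live in a header file ---
// Names are fully qualified at namespace scope.
inline std::string joinWords(
  const std::vector<std::string>& words) {
  // A using declaration inside a function
  // definition affects only this function:
  using std::string;
  string result;
  for(std::size_t i = 0; i < words.size(); i++) {
    if(i > 0)
      result += ' ';
    result += words[i];
  }
  return result;
}

// --- What would live in a cpp file ---
// A global using directive here affects only
// the file in which it appears:
using namespace std;

string shout(const vector<string>& words) {
  return joinWords(words) + "!";
}
```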
Use of require( ) and assure( )
The require( ) and assure( ) functions defined in require.h are used consistently throughout most of the book, so that they may properly report problems. If you are familiar with the concepts of preconditions and postconditions (introduced by Bertrand Meyer) you will recognize that the use of require( ) and assure( ) more or less provides preconditions (usually) and postconditions (occasionally). Thus, at the beginning of a function, before any of the “core” of the function is executed, the preconditions are checked to make sure that all of the necessary conditions hold. Then the “core” of the function is executed, and sometimes some postconditions are checked to make sure that the new state of the data is within defined parameters. You'll notice that the postcondition checks are rare in this book, and assure( ) is primarily used to make sure that files were opened successfully.
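The pattern can be sketched as follows. The two functions in namespace sketch are stand-ins written for this example, not the definitions from require.h; in particular, reporting a failure by throwing an exception is an assumption made to keep the sketch easy to test. The functions clampIndex( ) and countLines( ) are likewise hypothetical:

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Stand-ins for illustration only; these are
// NOT the definitions from require.h. Throwing
// on failure is an assumption of this sketch.
namespace sketch {

inline void require(bool condition,
  const std::string& msg = "Requirement failed") {
  if(!condition)
    throw std::logic_error(msg);
}

inline void assure(std::ifstream& in,
  const std::string& filename = "") {
  if(!in)
    throw std::runtime_error(
      "Could not open file " + filename);
}

} // namespace sketch

// Forces index into the range [0, size).
int clampIndex(int index, int size) {
  // Precondition, checked before the "core":
  sketch::require(size > 0, "size must be > 0");
  // The "core" of the function:
  int result = index;
  if(result < 0)
    result = 0;
  if(result >= size)
    result = size - 1;
  // Postcondition on the new state:
  sketch::require(result >= 0 && result < size,
    "result out of range");
  return result;
}

// Counts lines; assure() verifies the open.
int countLines(const std::string& filename) {
  std::ifstream in(filename.c_str());
  sketch::assure(in, filename);
  int lines = 0;
  std::string line;
  while(std::getline(in, line))
    lines++;
  return lines;
}
```

The shape matters more than the details: the precondition guards the entry to the function, the occasional postcondition guards its result, and the file check happens immediately after the attempt to open.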