Thinking in C++ - Volume 2

Publication date: 25/01/2007, last updated: 25/01/2007


2.1. Strings in Depth
2.1.1. What's in a string?  
2.1.2. Creating and initializing C++ strings
2.1.3. Operating on strings
2.1.3.1. Appending, inserting, and concatenating strings
2.1.3.2. Replacing string characters
2.1.3.3. Concatenation using nonmember overloaded operators
2.1.4. Searching in strings
2.1.4.1. Finding in reverse
2.1.4.2. Finding first/last of a set of characters
2.1.4.3. Removing characters from strings
2.1.4.4. Comparing strings
2.1.4.5. Strings and character traits
2.1.5. A string application
2.1.6. Summary
2.1.7. Exercises


2.1. Strings in Depth

String processing with character arrays is one of the biggest time-wasters in C. Character arrays require the programmer to keep track of the difference between static quoted strings and arrays created on the stack and the heap, and the fact that sometimes you're passing around a char* and sometimes you must copy the whole array.

Especially because string manipulation is so common, character arrays are a great source of misunderstandings and bugs. Despite this, creating string classes remained a common exercise for beginning C++ programmers for many years. The Standard C++ library string class solves the problem of character array manipulation once and for all, keeping track of memory even during assignments and copy-constructions. You simply don't need to think about it.

This chapter(31) examines the Standard C++ string class, beginning with a look at what constitutes a C++ string and how the C++ version differs from a traditional C character array. You'll learn about operations and manipulations using string objects, and you'll see how C++ strings accommodate variation in character sets and string data conversion.

Handling text is one of the oldest programming applications, so it's not surprising that the C++ string draws heavily on the ideas and terminology that have long been used in C and other languages. As you begin to acquaint yourself with C++ strings, this fact should be reassuring. No matter which programming idiom you choose, there are three common things you want to do with a string:

  • Create or modify the sequence of characters stored in the string.
  • Detect the presence or absence of elements within the string.
  • Translate between various schemes for representing string characters.
You'll see how each of these jobs is accomplished using C++ string objects.


2.1.1. What's in a string?

In C, a string is simply an array of characters that always includes a binary zero (often called the null terminator) as its final array element. There are significant differences between C++ strings and their C progenitors. First, and most important, C++ strings hide the physical representation of the sequence of characters they contain. You don't need to be concerned about array dimensions or null terminators. A string also contains certain “housekeeping” information about the size and storage location of its data. Specifically, a C++ string object knows its starting location in memory, its content, its length in characters, and the length in characters to which it can grow before the string object must resize its internal data buffer. C++ strings thus greatly reduce the likelihood of making three of the most common and destructive C programming errors: overwriting array bounds, trying to access arrays through uninitialized or incorrectly valued pointers, and leaving pointers “dangling” after an array ceases to occupy the storage that was once allocated to it.
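The housekeeping information described above is visible through ordinary member functions. The following is a minimal sketch rather than anything from the library itself: the helper names invariantsHold and withEmbeddedNull are ours, and the actual capacity value depends on the implementation.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: checks relationships that a string object
// maintains about its own storage.
bool invariantsHold(const std::string& s) {
  // The object knows its own length; size() and length() agree.
  bool lengthOk = s.size() == s.length();
  // capacity() is how far the string can grow before reallocating.
  bool capacityOk = s.capacity() >= s.size();
  // c_str() exposes a null-terminated view for C functions; strlen()
  // stops at the first zero byte, so it can never exceed size().
  bool cViewOk = std::strlen(s.c_str()) <= s.size();
  return lengthOk && capacityOk && cViewOk;
}

// Unlike a C string, a std::string can hold embedded zero bytes,
// because its length is stored rather than inferred from a terminator.
std::string withEmbeddedNull() {
  return std::string("ab\0cd", 5);  // length given explicitly
}
```

Calling withEmbeddedNull( ) yields a string whose size( ) is 5, even though a C function such as strlen( ) sees only the first two characters of its c_str( ) view.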

The exact implementation of memory layout for the string class is not defined by the C++ Standard. This architecture is intended to be flexible enough to allow differing implementations by compiler vendors, yet guarantee predictable behavior for users. In particular, the exact conditions under which storage is allocated to hold data for a string object are not defined. String allocation rules were formulated to allow but not require a reference-counted implementation, but whether or not the implementation uses reference counting, the semantics must be the same. To put this a bit differently, in C, every char array occupies a unique physical region of memory. In C++, individual string objects may or may not occupy unique physical regions of memory, but if reference counting avoids storing duplicate copies of data, the individual objects must look and act as though they exclusively own unique regions of storage. For example:
//: C03:StringStorage.h
#ifndef STRINGSTORAGE_H
#define STRINGSTORAGE_H
#include <iostream>
#include <string>
#include "../TestSuite/Test.h"
using std::cout;
using std::endl;
using std::string;

class StringStorageTest : public TestSuite::Test {
public:
  void run() {
    string s1("12345");
    // This may copy the first to the second or
    // use reference counting to simulate a copy:
    string s2 = s1;
    test_(s1 == s2);
    // Either way, this statement must ONLY modify s1:
    s1[0] = '6';
    cout << "s1 = " << s1 << endl;  // 62345
    cout << "s2 = " << s2 << endl;  // 12345
    test_(s1 != s2);
  }
};
#endif // STRINGSTORAGE_H ///:~
 
//: C03:StringStorage.cpp
//{L} ../TestSuite/Test
#include "StringStorage.h"

int main() {
  StringStorageTest t;
  t.run();
  return t.report();
} ///:~
We say that an implementation that only makes unique copies when a string is modified uses a copy-on-write strategy. This approach saves time and space when strings are used only as value parameters or in other read-only situations.

Whether a library implementation uses reference counting or not should be transparent to users of the string class. Unfortunately, this is not always the case. In multithreaded programs, it is practically impossible to use a reference-counting implementation safely.(32)


2.1.2. Creating and initializing C++ strings

Creating and initializing strings is a straightforward proposition and fairly flexible. In the SmallString.cpp example below, the first string, imBlank, is declared but contains no initial value. Unlike a C char array, which would contain a random and meaningless bit pattern until initialization, imBlank does contain meaningful information. This string object is initialized to hold “no characters” and can properly report its zero length and absence of data elements using class member functions.

The next string, heyMom, is initialized by the literal argument “Where are my socks?” This form of initialization uses a quoted character array as a parameter to the string constructor. By contrast, standardReply is simply initialized with an assignment. The last string of the group, useThisOneAgain, is initialized using an existing C++ string object. Put another way, this example illustrates that string objects let you do the following:

  • Create an empty string and defer initializing it with character data.
  • Initialize a string by passing a literal, quoted character array as an argument to the constructor.
  • Initialize a string using the equal sign (=).
  • Use one string to initialize another.
//: C03:SmallString.cpp
#include <string>
using namespace std;

int main() {
  string imBlank;
  string heyMom("Where are my socks?");
  string standardReply = "Beamed into deep "
    "space on wide angle dispersion?";
  string useThisOneAgain(standardReply);
} ///:~
These are the simplest forms of string initialization, but variations offer more flexibility and control. You can do the following:

  • Use a portion of either a C char array or a C++ string.
  • Combine different sources of initialization data using operator+.
  • Use the string object's substr( ) member function to create a substring.
Here's a program that illustrates these features:
//: C03:SmallString2.cpp
#include <string>
#include <iostream>
using namespace std;

int main() {
  string s1("What is the sound of one clam napping?");
  string s2("Anything worth doing is worth overdoing.");
  string s3("I saw Elvis in a UFO");
  // Copy the first 8 chars:
  string s4(s1, 0, 8);
  cout << s4 << endl;
  // Copy 6 chars from the middle of the source:
  string s5(s2, 15, 6);
  cout << s5 << endl;
  // Copy from middle to end:
  string s6(s3, 6, 15);
  cout << s6 << endl;
  // Copy many different things:
  string quoteMe = s4 + "that" +
  // substr() copies 10 chars at element 20
  s1.substr(20, 10) + s5 +
  // substr() copies up to either 100 char
  // or eos starting at element 5
  "with" + s3.substr(5, 100) +
  // OK to copy a single char this way
  s1.substr(37, 1);
  cout << quoteMe << endl;
} ///:~
The string member function substr( ) takes a starting position as its first argument and the number of characters to select as the second argument. Both arguments have default values. If you say substr( ) with an empty argument list, you produce a copy of the entire string, so this is a convenient way to duplicate a string.
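The default arguments and the behavior at the edges can be sketched as follows. The helper names tail and substrThrowsPastEnd are hypothetical; the only library facts relied on are that the count is cut off at the end of the string and that a starting position greater than size( ) raises std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Everything from pos to the end: the count argument is defaulted.
std::string tail(const std::string& s, std::size_t pos) {
  return s.substr(pos);
}

// Returns true if substr() rejects a starting position past the end.
bool substrThrowsPastEnd(const std::string& s) {
  try {
    std::string unused = s.substr(s.size() + 1);  // pos > size()
    (void)unused;
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

A count that runs past the end, as in the substr(5, 100) call above, is not an error; only the starting position is checked.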

Here's the output from the program:
What is
doing
Elvis in a UFO
What is that one clam doing with Elvis in a UFO?
Notice the final line of the example. C++ allows string initialization techniques to be mixed in a single statement, a flexible and convenient feature. Also notice that the last initializer copies just one character from the source string.

Another slightly more subtle initialization technique involves the use of the string iterators string::begin( ) and string::end( ). This technique treats a string like a container object (which you've seen primarily in the form of vector so far—you'll see many more containers in Chapter 7), which uses iterators to indicate the start and end of a sequence of characters. In this way you can hand a string constructor two iterators, and it copies from one to the other into the new string:
//: C03:StringIterators.cpp
#include <string>
#include <iostream>
#include <cassert>
using namespace std;

int main() {
  string source("xxx");
  string s(source.begin(), source.end());
  assert(s == source);
} ///:~
The iterators are not restricted to begin( ) and end( ); you can increment, decrement, and add integer offsets to them, allowing you to extract a subset of characters from the source string.
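As a small sketch of that idea, the hypothetical helper below (the name middle is ours) offsets both iterators before handing them to the constructor. It assumes the caller never skips more characters than the source contains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies source, dropping skipFront characters from the start and
// skipBack characters from the end.
// Precondition: skipFront + skipBack <= source.size().
std::string middle(const std::string& source, std::size_t skipFront,
                   std::size_t skipBack) {
  assert(skipFront + skipBack <= source.size());
  typedef std::string::difference_type diff;
  return std::string(source.begin() + static_cast<diff>(skipFront),
                     source.end() - static_cast<diff>(skipBack));
}
```

With this helper, middle("[payload]", 1, 1) strips the surrounding brackets.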

C++ strings may not be initialized with single characters or with ASCII or other integer values. You can initialize a string with a number of copies of a single character, however:
//: C03:UhOh.cpp
#include <string>
#include <cassert>
using namespace std;

int main() {
  // Error: no single char inits
  //! string nothingDoing1('a');
  // Error: no integer inits
  //! string nothingDoing2(0x37);
  // The following is legal:
  string okay(5, 'a');
  assert(okay == string("aaaaa"));
} ///:~
The first argument indicates the number of copies of the second argument to place in the string. The second argument can only be a single char, not a char array.
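If what you actually want is a one-character string, the count-and-character constructor with a count of 1 does the job. The sketch below also relies on the fact that assignment and operator+= accept a single char even though construction does not; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Builds a one-character string with the (count, char) constructor.
std::string fromChar(char c) {
  return std::string(1, c);
}

// Assignment and += both accept a lone char.
std::string assignThenAppend(char first, char second) {
  std::string s;
  s = first;
  s += second;
  return s;
}
```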


2.1.3. Operating on strings

If you've programmed in C, you are accustomed to the family of functions that write, search, modify, and copy char arrays. There are two unfortunate aspects of the Standard C library functions for handling char arrays. First, there are two loosely organized families of them: the “plain” group, and the ones that require you to supply a count of the number of characters to be considered in the operation at hand. The roster of functions in the C char array library shocks the unsuspecting user with a long list of cryptic, mostly unpronounceable names. Although the type and number of arguments to the functions are somewhat consistent, to use them properly you must be attentive to details of function naming and parameter passing.

The second inherent trap of the standard C char array tools is that they all rely explicitly on the assumption that the character array includes a null terminator. If by oversight or error the null is omitted or overwritten, there's little to keep the C char array functions from manipulating the memory beyond the limits of the allocated space, sometimes with disastrous results.

C++ provides a vast improvement in the convenience and safety of string objects. For purposes of actual string handling operations, there are about the same number of distinct member function names in the string class as there are functions in the C library, but because of overloading the functionality is much greater. Coupled with sensible naming practices and the judicious use of default arguments, these features combine to make the string class much easier to use than the C library char array functions.


2.1.3.1. Appending, inserting, and concatenating strings

One of the most valuable and convenient aspects of C++ strings is that they grow as needed, without intervention on the part of the programmer. Not only does this make string-handling code inherently more trustworthy, it also almost entirely eliminates a tedious “housekeeping” chore: keeping track of the bounds of the storage where your strings live. For example, if you create a string object and initialize it with a string of 50 copies of 'X', and later store in it 50 copies of “Zowie”, the object itself will reallocate sufficient storage to accommodate the growth of the data. Perhaps nowhere is this property more appreciated than when the strings manipulated in your code change size and you don't know how big the change is. The string member functions append( ) and insert( ) transparently reallocate storage when a string grows:
//: C03:StrSize.cpp
#include <string>
#include <iostream>
using namespace std;

int main() {
  string bigNews("I saw Elvis in a UFO. ");
  cout << bigNews << endl;
  // How much data have we actually got?
  cout << "Size = " << bigNews.size() << endl;
  // How much can we store without reallocating?
  cout << "Capacity = " << bigNews.capacity() << endl;
  // Insert this string in bigNews immediately
  // before bigNews[1]:
  bigNews.insert(1, " thought I");
  cout << bigNews << endl;
  cout << "Size = " << bigNews.size() << endl;
  cout << "Capacity = " << bigNews.capacity() << endl;
  // Make sure that there will be this much space
  bigNews.reserve(500);
  // Add this to the end of the string:
  bigNews.append("I've been working too hard.");
  cout << bigNews << endl;
  cout << "Size = " << bigNews.size() << endl;
  cout << "Capacity = " << bigNews.capacity() << endl;
} ///:~
Here is the output from one particular compiler:
I saw Elvis in a UFO.
Size = 22
Capacity = 31
I thought I saw Elvis in a UFO.
Size = 32
Capacity = 47
I thought I saw Elvis in a UFO. I've been working too hard.
Size = 59
Capacity = 511
This example demonstrates that even though you can safely relinquish much of the responsibility for allocating and managing the memory your strings occupy, C++ strings provide you with several tools to monitor and manage their size. Notice the ease with which we changed the size of the storage allocated to the string. The size( ) function returns the number of characters currently stored in the string and is identical to the length( ) member function. The capacity( ) function returns the size of the current underlying allocation, meaning the number of characters the string can hold without requesting more storage. The reserve( ) function is an optimization mechanism that indicates your intention to specify a certain amount of storage for future use; immediately after a call to reserve( ), capacity( ) returns a value at least as large as the argument you passed. A resize( ) function appends null characters (the value char( )) if the new size is greater than the current string size or truncates the string otherwise. (An overload of resize( ) can specify a different character to append.)
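The difference between changing the size and changing the capacity can be sketched with a few small helpers. The names padRight, resized, and reserveKeepsContents are ours. Note that the one-argument resize( ) fills with the null character, so pass a fill character explicitly if you want spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pads s on the right with fill up to width; longer strings are untouched.
std::string padRight(std::string s, std::size_t width, char fill) {
  if (s.size() < width) s.resize(width, fill);
  return s;
}

// One-argument resize(): truncates, or grows with char() ('\0').
std::string resized(std::string s, std::size_t n) {
  s.resize(n);
  return s;
}

// reserve() affects capacity only; size and contents stay the same.
bool reserveKeepsContents(std::string s, std::size_t n) {
  const std::string before = s;
  s.reserve(n);
  return s == before && s.capacity() >= n;
}
```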

The exact fashion that the string member functions allocate space for your data depends on the implementation of the library. When we tested one implementation with the previous example, it appeared that reallocations occurred on even word (that is, full-integer) boundaries, with one byte held back. The architects of the string class have endeavored to make it possible to mix the use of C char arrays and C++ string objects, so it is likely that figures reported by StrSize.cpp for capacity reflect that, in this particular implementation, a byte is set aside to easily accommodate the insertion of a null terminator.


2.1.3.2. Replacing string characters

The insert( ) function is particularly nice because it absolves you from making sure the insertion of characters in a string won't overrun the storage space or overwrite the characters immediately following the insertion point. Space grows, and existing characters politely move over to accommodate the new elements. Sometimes this might not be what you want. If you want the size of the string to remain unchanged, use the replace( ) function to overwrite characters. There are a number of overloaded versions of replace( ), but the simplest one takes three arguments: an integer indicating where to start in the string, an integer indicating how many characters to eliminate from the original string, and the replacement string (which can be a different number of characters than the eliminated quantity). Here's a simple example:
//: C03:StringReplace.cpp
// Simple find-and-replace in strings.
#include <cassert>
#include <string>
using namespace std;

int main() {
  string s("A piece of text");
  string tag("$tag$");
  s.insert(8, tag + ' ');
  assert(s == "A piece $tag$ of text");
  // find() returns an unsigned position type, not int:
  string::size_type start = s.find(tag);
  assert(start == 8);
  assert(tag.size() == 5);
  s.replace(start, tag.size(), "hello there");
  assert(s == "A piece hello there of text");
} ///:~
The tag is first inserted into s (notice that the insert happens before the value indicating the insert point and that an extra space was added after tag), and then it is found and replaced.

You should check to see if you've found anything before you perform a replace( ). The previous example replaces with a char*, but there's an overloaded version that replaces with a string. Here's a more complete demonstration of replace( ):
//: C03:Replace.cpp
#include <cassert>
#include <cstddef>  // For size_t
#include <string>
using namespace std;

void replaceChars(string& modifyMe,
  const string& findMe, const string& newChars)
{
  // Look in modifyMe for the "find string"
  // starting at position 0:
  size_t i = modifyMe.find(findMe, 0);
  // Did we find the string to replace?
  if(i != string::npos)
    // Replace the find string with newChars:
    modifyMe.replace(i, findMe.size(), newChars);
}

int main() {
  string bigNews = "I thought I saw Elvis in a UFO. "
                   "I have been working too hard.";
  string replacement("wig");
  string findMe("UFO");
  // Find "UFO" in bigNews and overwrite it:
  replaceChars(bigNews, findMe, replacement);
  assert(bigNews == "I thought I saw Elvis in a "
         "wig. I have been working too hard.");
} ///:~
If find( ) doesn't locate the search string, it returns string::npos, which is why replaceChars( ) tests the result before calling replace( ). The npos data member is a static constant member of the string class that represents a nonexistent character position.(33)
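The test against npos is worth wrapping once and reusing. Here is a minimal sketch; the name contains is hypothetical. Because npos is the largest value of the unsigned type string::size_type, the result of find( ) should be stored in that type (or size_t) rather than in an int.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when needle occurs somewhere in haystack.
bool contains(const std::string& haystack, const std::string& needle) {
  return haystack.find(needle) != std::string::npos;
}
```

An empty needle is reported as found, since find( ) matches the empty string at position 0.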

Unlike insert( ), replace( ) won't grow the string's storage space when the new characters are no more numerous than the ones they overwrite in the middle of an existing series of array elements. However, it will grow the storage space if needed, for example, when you make a “replacement” that would expand the original string beyond the end of the current allocation. Here's an example:
//: C03:ReplaceAndGrow.cpp
#include <cassert>
#include <string>
using namespace std;

int main() {
  string bigNews("I have been working the grave.");
  string replacement("yard shift.");
  // Start at the last character and ask to replace more
  // characters than remain in the existing string:
  bigNews.replace(bigNews.size() - 1,
    replacement.size(), replacement);
  assert(bigNews == "I have been working the "
         "graveyard shift.");
} ///:~
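The call to replace( ) starts at the last character of the existing array (the period) and asks to replace more characters than remain. The count is cut off at the end of the string, so the period is overwritten and the rest of the replacement is in effect appended. Notice that in this example replace( ) expands the array accordingly.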
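The way the position and count arguments interact can be sketched with a pair of hypothetical helpers (replaced and replaceThrows are our names). The sketch relies on two library rules: a count that runs past the end is cut off at the end of the string, and a starting position greater than size( ) raises std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Replaces count characters of s starting at pos; returns the result.
std::string replaced(std::string s, std::size_t pos, std::size_t count,
                     const std::string& with) {
  s.replace(pos, count, with);
  return s;
}

// Returns true if replace() rejects the starting position.
bool replaceThrows(std::string s, std::size_t pos) {
  try {
    s.replace(pos, 1, "x");
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

A starting position equal to size( ) is legal and behaves like append( ); only a position beyond that is rejected.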

You may have been hunting through this chapter trying to do something relatively simple such as replace all the instances of one character with a different character. Upon finding the previous material on replacing, you thought you found the answer, but then you started seeing groups of characters and counts and other things that looked a bit too complex. Doesn't string have a way to just replace one character with another everywhere?

You can easily write such a function using the find( ) and replace( ) member functions as follows:
//: C03:ReplaceAll.h
#ifndef REPLACEALL_H
#define REPLACEALL_H
#include <string>

std::string& replaceAll(std::string& context,
  const std::string& from, const std::string& to);
#endif // REPLACEALL_H ///:~
//: C03:ReplaceAll.cpp {O}
#include <cstddef>
#include "ReplaceAll.h"
using namespace std;

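string& replaceAll(string& context, const string& from,
  const string& to) {
  // An empty search string matches at every position
  // and would make the loop below run forever:
  if(from.empty()) return context;
  size_t lookHere = 0;
  size_t foundHere;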
  while((foundHere = context.find(from, lookHere))
    != string::npos) {
    context.replace(foundHere, from.size(), to);
    lookHere = foundHere + to.size();
  }
  return context;
} ///:~
The version of find( ) used here takes as a second argument the position to start looking in and returns string::npos if it doesn't find it. It is important to advance the position held in the variable lookHere past the replacement string, in case from is a substring of to. The following program tests the replaceAll function:
//: C03:ReplaceAllTest.cpp
//{L} ReplaceAll
#include <cassert>
#include <iostream>
#include <string>
#include "ReplaceAll.h"
using namespace std;

int main() {
  string text = "a man, a plan, a canal, Panama";
  replaceAll(text, "an", "XXX");
  assert(text == "a mXXX, a plXXX, a cXXXal, PXXXama");
} ///:~
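To see why the search position must move past the replacement text, consider replacing "a" with "aa". If the search resumed at the position of the match, it would keep finding the characters it had just written. The sketch below repeats the loop from replaceAll( ) in a by-value form so that the result is easy to compare; the name replacedAll and the guard against an empty search string are our additions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same loop as replaceAll(), but working on a copy of the context.
std::string replacedAll(std::string context, const std::string& from,
                        const std::string& to) {
  if (from.empty()) return context;  // empty pattern matches everywhere
  std::size_t lookHere = 0;
  std::size_t foundHere;
  while ((foundHere = context.find(from, lookHere)) != std::string::npos) {
    context.replace(foundHere, from.size(), to);
    lookHere = foundHere + to.size();  // skip the text just written
  }
  return context;
}
```

Each original "a" is replaced exactly once, so "aaa" becomes six characters long and the loop terminates.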
As you can see, the string class by itself doesn't solve all possible problems. Many solutions have been left to the algorithms in the Standard library(34) because the string class can look just like an STL sequence (by virtue of the iterators discussed earlier). All the generic algorithms work on a “range” of elements within a container. Usually that range is just “from the beginning of the container to the end.” A string object looks like a container of characters: to get the beginning of the range you use string::begin( ), and to get the end of the range you use string::end( ). The following example shows the use of the replace( ) algorithm to replace all the instances of the single character 'X' with 'Y':
//: C03:StringCharReplace.cpp
#include <algorithm>
#include <cassert>
#include <string>
using namespace std;

int main() {
  string s("aaaXaaaXXaaXXXaXXXXaaa");
  replace(s.begin(), s.end(), 'X', 'Y');
  assert(s == "aaaYaaaYYaaYYYaYYYYaaa");
} ///:~
Notice that this replace( ) is not called as a member function of string. Also, unlike the string::replace( ) functions that only perform one replacement, the replace( ) algorithm replaces all instances of one character with another.

The replace( ) algorithm only works with single objects (in this case, char objects) and will not replace quoted char arrays or string objects. Since a string behaves like an STL sequence, a number of other algorithms can be applied to it, which might solve other problems that are not directly addressed by the string member functions.
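A few other generic algorithms applied to strings, as a rough sketch. The wrapper names are hypothetical; std::count, std::reverse, and std::remove are the Standard library algorithms from <algorithm>.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Counts occurrences of a single character.
std::string::difference_type countChar(const std::string& s, char c) {
  return std::count(s.begin(), s.end(), c);
}

// Returns a reversed copy.
std::string reversed(std::string s) {
  std::reverse(s.begin(), s.end());
  return s;
}

// Returns a copy with every occurrence of c removed. std::remove only
// shifts the kept characters forward; erase() then trims the leftover tail.
std::string without(std::string s, char c) {
  s.erase(std::remove(s.begin(), s.end(), c), s.end());
  return s;
}
```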


2.1.3.3. Concatenation using nonmember overloaded operators

One of the most delightful discoveries awaiting a C programmer learning about C++ string handling is how simply strings can be combined and appended using operator+ and operator+=. These operators make combining strings syntactically similar to adding numeric data:
//: C03:AddStrings.cpp
#include <string>
#include <cassert>
using namespace std;

int main() {
  string s1("This ");
  string s2("That ");
  string s3("The other ");
  // operator+ concatenates strings
  s1 = s1 + s2;
  assert(s1 == "This That ");
  // Another way to concatenate strings
  s1 += s3;
  assert(s1 == "This That The other ");
  // You can index the string on the right
  s1 += s3 + s3[4] + "ooh lala";
  assert(s1 == "This That The other The other oooh lala");
} ///:~
Using the operator+ and operator+= operators is a flexible and convenient way to combine string data. On the right side of the statement, you can use almost any type that evaluates to a group of one or more characters.
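One caveat is that at least one operand of each operator+ must be a string object. An expression such as "Hello, " + "world" does not compile, because both operands are plain character arrays. The following sketch, with hypothetical function names, shows two ways to stay on the safe side.

```cpp
#include <cassert>
#include <string>

// The leftmost operand is made a string, so every + that follows
// has a string on its left side.
std::string greet(const std::string& name) {
  return std::string("Hello, ") + name + '!';
}

// A char or a literal may appear on the left as long as the
// other operand of that + is a string.
std::string quoted(const std::string& s) {
  return '"' + s + "\"";
}
```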


2.1.4. Searching in strings

The find family of string member functions locates a character or group of characters within a given string. Here are the members of the find family and their general usage:

string find member function: what/how it finds
  • find( ): Searches a string for a specified character or group of characters and returns the starting position of the first occurrence found or npos if no match is found.
  • find_first_of( ): Searches a target string and returns the position of the first match of any character in a specified group. If no match is found, it returns npos.
  • find_last_of( ): Searches a target string and returns the position of the last match of any character in a specified group. If no match is found, it returns npos.
  • find_first_not_of( ): Searches a target string and returns the position of the first element that doesn't match any character in a specified group. If no such element is found, it returns npos.
  • find_last_not_of( ): Searches a target string and returns the position of the element with the largest subscript that doesn't match any character in a specified group. If no such element is found, it returns npos.
  • rfind( ): Searches a string from end to beginning for a specified character or group of characters and returns the starting position of the match if one is found. If no match is found, it returns npos.
The simplest use of find( ) searches for one or more characters in a string. This overloaded version of find( ) takes a parameter that specifies the character(s) for which to search and optionally a parameter that tells it where in the string to begin searching for the occurrence of a substring. (The default position at which to begin searching is 0.) By setting the call to find inside a loop, you can easily move through a string, repeating a search to find all the occurrences of a given character or group of characters within the string.
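The loop described above might look like the following sketch. The function name countOccurrences is ours, and restarting the search one position after each match (so that overlapping matches are counted) is a design choice of this sketch, not something the text prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many times pattern occurs in text, overlaps included.
std::size_t countOccurrences(const std::string& text,
                             const std::string& pattern) {
  if (pattern.empty()) return 0;  // avoid counting empty matches
  std::size_t count = 0;
  std::size_t pos = text.find(pattern);
  while (pos != std::string::npos) {
    ++count;
    // Resume one character after the start of the last match:
    pos = text.find(pattern, pos + 1);
  }
  return count;
}
```

To count only non-overlapping matches, resume at pos + pattern.size( ) instead.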

The following program uses the Sieve of Eratosthenes to find prime numbers less than 50. This method starts with the number 2, marks all subsequent multiples of 2 as not prime, and repeats the process for the next prime candidate. The SieveTest constructor initializes sieveChars by setting the initial size of the character array and writing the value 'P' to each of its members.
//: C03:Sieve.h
#ifndef SIEVE_H
#define SIEVE_H
#include <cmath>
#include <cstddef>
#include <string>
#include "../TestSuite/Test.h"
using std::size_t;
using std::sqrt;
using std::string;

class SieveTest : public TestSuite::Test {
  string sieveChars;
public:
  // Create a 50 char string and set each
  // element to 'P' for Prime:
  SieveTest() : sieveChars(50, 'P') {}
  void run() {
    findPrimes();
    testPrimes();
  }
  bool isPrime(int p) {
    if(p == 0 || p == 1) return false;
    int root = int(sqrt(double(p)));
    for(int i = 2; i <= root; ++i)
      if(p % i == 0) return false;
    return true;
  }
  void findPrimes() {
    // By definition neither 0 nor 1 is prime.
    // Change these elements to "N" for Not Prime:
    sieveChars.replace(0, 2, "NN");
    // Walk through the array:
    size_t sieveSize = sieveChars.size();
    int root = int(sqrt(double(sieveSize)));
    for(int i = 2; i <= root; ++i)
      // Find all the multiples:
      for(size_t factor = 2; factor * i < sieveSize;
           ++factor)
        sieveChars[factor * i] = 'N';
  }
  void testPrimes() {
    size_t i = sieveChars.find('P');
    while(i != string::npos) {
      test_(isPrime(i++));
      i = sieveChars.find('P', i);
    }
    i = sieveChars.find_first_not_of('P');
    while(i != string::npos) {
      test_(!isPrime(i++));
      i = sieveChars.find_first_not_of('P', i);
    }
  }
};
#endif // SIEVE_H ///:~
//: C03:Sieve.cpp
//{L} ../TestSuite/Test
#include "Sieve.h"

int main() {
  SieveTest t;
  t.run();
  return t.report();
} ///:~
The find( ) function can walk forward through a string, detecting multiple occurrences of a character or a group of characters, and find_first_not_of( ) locates the characters that are not in the set you specify.
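The same pair of ideas, finding characters that are in a set and characters that are not, is enough to split text into words. This is only a sketch under our own naming (splitWords and its delims parameter are hypothetical); it alternates find_first_not_of( ) to locate the start of a word with find_first_of( ) to locate its end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits text into the runs of characters that are not in delims.
std::vector<std::string> splitWords(const std::string& text,
                                    const std::string& delims) {
  std::vector<std::string> words;
  std::size_t start = text.find_first_not_of(delims);
  while (start != std::string::npos) {
    std::size_t stop = text.find_first_of(delims, start);
    // If stop is npos the count is huge, and substr() cuts it off
    // at the end of the string.
    words.push_back(text.substr(start, stop - start));
    // Searching from npos finds nothing, which ends the loop.
    start = text.find_first_not_of(delims, stop);
  }
  return words;
}
```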

There are no functions in the string class to change the case of a string, but you can easily create these functions using the Standard C library functions toupper( ) and tolower( ), which change the case of one character at a time. The following example illustrates a case-insensitive search:
//: C03:Find.h
#ifndef FIND_H
#define FIND_H
#include <cctype>
#include <cstddef>
#include <string>
#include "../TestSuite/Test.h"
using std::size_t;
using std::string;
using std::tolower;
using std::toupper;

// Make an uppercase copy of s
inline string upperCase(const string& s) {
  string upper(s);
  for(size_t i = 0; i < s.length(); ++i)
    upper[i] = toupper(upper[i]);
  return upper;
}

// Make a lowercase copy of s
inline string lowerCase(const string& s) {
  string lower(s);
  for(size_t i = 0; i < s.length(); ++i)
    lower[i] = tolower(lower[i]);
  return lower;
}

class FindTest : public TestSuite::Test {
  string chooseOne;
public:
  FindTest() : chooseOne("Eenie, Meenie, Miney, Mo") {}
  void testUpper() {
    string upper = upperCase(chooseOne);
    const string LOWER = "abcdefghijklmnopqrstuvwxyz";
    test_(upper.find_first_of(LOWER) == string::npos);
  }
  void testLower() {
    string lower = lowerCase(chooseOne);
    const string UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    test_(lower.find_first_of(UPPER) == string::npos);
  }
  void testSearch() {
    // Case sensitive search
    size_t i = chooseOne.find("een");
    test_(i == 8);
    // Search lowercase:
    string test = lowerCase(chooseOne);
    i = test.find("een");
    test_(i == 0);
    i = test.find("een", ++i);
    test_(i == 8);
    i = test.find("een", ++i);
    test_(i == string::npos);
    // Search uppercase:
    test = upperCase(chooseOne);
    i = test.find("EEN");
    test_(i == 0);
    i = test.find("EEN", ++i);
    test_(i == 8);
    i = test.find("EEN", ++i);
    test_(i == string::npos);
  }
  void run() {
    testUpper();
    testLower();
    testSearch();
  }
};
#endif // FIND_H ///:~
//: C03:Find.cpp
//{L} ../TestSuite/Test
#include "Find.h"
#include "../TestSuite/Test.h"

int main() {
  FindTest t;
  t.run();
  return t.report();
} ///:~
Both the upperCase( ) and lowerCase( ) functions follow the same form: they make a copy of the argument string and change the case. The Find.cpp program isn't the best solution to the case-sensitivity problem, so we'll revisit it when we examine string comparisons.
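One alternative that avoids copying either string, offered here only as a sketch and not as the approach this chapter develops, is to hand the generic std::search algorithm a comparison function that ignores case. The names sameLetter and findNoCase are hypothetical. The cast to unsigned char is there because passing a negative char value to toupper( ) is undefined.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compares two characters, ignoring case.
bool sameLetter(char a, char b) {
  return std::toupper(static_cast<unsigned char>(a)) ==
         std::toupper(static_cast<unsigned char>(b));
}

// Position of the first case-insensitive match of pattern in text,
// or string::npos if there is none.
std::size_t findNoCase(const std::string& text,
                       const std::string& pattern) {
  if (pattern.empty()) return 0;  // mirror find() for an empty pattern
  std::string::const_iterator it = std::search(
      text.begin(), text.end(), pattern.begin(), pattern.end(), sameLetter);
  if (it == text.end()) return std::string::npos;
  return static_cast<std::size_t>(it - text.begin());
}
```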


2.1.4.1. Finding in reverse

If you need to search through a string from end to beginning (to find the data in “last in / first out” order), you can use the string member function rfind( ):
//: C03:Rparse.h
#ifndef RPARSE_H
#define RPARSE_H
#include <cstddef>
#include <string>
#include <vector>
#include "../TestSuite/Test.h"
using std::size_t;
using std::string;
using std::vector;

class RparseTest : public TestSuite::Test {
  // To store the words:
  vector<string> strings;
public:
  void parseForData() {
    // The ';' characters will be delimiters
    string s("now.;sense;make;to;going;is;This");
    // The last element of the string:
    int last = s.size();
    // The beginning of the current word:
    size_t current = s.rfind(';');
    // Walk backward through the string:
    while(current != string::npos) {
      // Push each word into the vector.
      // Current is incremented before copying
      // to avoid copying the delimiter:
      ++current;
      strings.push_back(s.substr(current, last - current));
      // Back over the delimiter we just found,
      // and set last to the end of the next word:
      current -= 2;
      last = current + 1;
      // Find the next delimiter:
      current = s.rfind(';', current);
    }
    // Pick up the first word -- it's not
    // preceded by a delimiter:
    strings.push_back(s.substr(0, last));
  }
  void testData() {
    // Test them in the new order:
    test_(strings[0] == "This");
    test_(strings[1] == "is");
    test_(strings[2] == "going");
    test_(strings[3] == "to");
    test_(strings[4] == "make");
    test_(strings[5] == "sense");
    test_(strings[6] == "now.");
    string sentence;
    for(size_t i = 0; i < strings.size() - 1; i++)
      sentence += strings[i] += " ";
    // Manually put last word in to avoid an extra space:
    sentence += strings[strings.size() - 1];
    test_(sentence == "This is going to make sense now.");
  }
  void run() {
    parseForData();
    testData();
  }
};
#endif // RPARSE_H ///:~
//: C03:Rparse.cpp
//{L} ../TestSuite/Test
#include "Rparse.h"

int main() {
  RparseTest t;
  t.run();
  return t.report();
} ///:~
The string member function rfind( ) backs through the string looking for tokens and reports the array index of matching characters or string::npos if it is unsuccessful.
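The optional second argument to rfind( ) is the position at which the backward search begins, which is what lets the loop in Rparse.h step from one delimiter to the previous one. The following is a minimal sketch of that behavior in isolation; the names lastField and rfindDemo are ours and do not appear elsewhere in the book.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text after the last
// occurrence of delim, or all of s if delim is absent.
inline std::string lastField(const std::string& s, char delim) {
  std::size_t pos = s.rfind(delim);
  if(pos == std::string::npos)
    return s;
  return s.substr(pos + 1);
}

inline void rfindDemo() {
  std::string s("a;b;c");
  // With no position, the search starts at the end:
  assert(s.rfind(';') == 3);
  // A match may begin at the given position itself:
  assert(s.rfind(';', 3) == 3);
  // Otherwise the search continues toward the front:
  assert(s.rfind(';', 2) == 1);
  assert(s.rfind(';', 0) == std::string::npos);
  assert(lastField(s, ';') == "c");
}
```

Because a match may begin exactly at the starting position, Rparse.h backs up past the delimiter it has just found before searching again.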


2.1.4.2. Finding first/last of a set of characters

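The find_first_of( ) and find_last_of( ) member functions search for any character that belongs to a set; their complements, find_first_not_of( ) and find_last_not_of( ), search for any character outside the set. The latter pair can be conveniently put to work to create a little utility that will strip whitespace characters from both ends of a string. Notice that it doesn't touch the original string, but instead returns a new string: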
//: C03:Trim.h
// General tool to strip spaces from both ends.
#ifndef TRIM_H
#define TRIM_H
#include <string>
#include <cstddef>

inline std::string trim(const std::string& s) {
  if(s.length() == 0)
    return s;
  std::size_t beg = s.find_first_not_of(" \a\b\f\n\r\t\v");
  std::size_t end = s.find_last_not_of(" \a\b\f\n\r\t\v");
  if(beg == std::string::npos) // No non-spaces
    return "";
  return std::string(s, beg, end - beg + 1);
}
#endif // TRIM_H ///:~
The first test checks for an empty string; in that case, no tests are made, and a copy is returned. Notice that once the end points are found, the string constructor builds a new string from the old one, giving the starting count and the length.

Testing such a general-purpose tool needs to be thorough:
//: C03:TrimTest.h
#ifndef TRIMTEST_H
#define TRIMTEST_H
#include "Trim.h"
#include "../TestSuite/Test.h"

class TrimTest : public TestSuite::Test {
  enum {NTESTS = 11};
  static std::string s[NTESTS];
public:
  void testTrim() {
    test_(trim(s[0]) == "abcdefghijklmnop");
    test_(trim(s[1]) == "abcdefghijklmnop");
    test_(trim(s[2]) == "abcdefghijklmnop");
    test_(trim(s[3]) == "a");
    test_(trim(s[4]) == "ab");
    test_(trim(s[5]) == "abc");
    test_(trim(s[6]) == "a b c");
    test_(trim(s[7]) == "a b c");
    test_(trim(s[8]) == "a \t b \t c");
    test_(trim(s[9]) == "");
    test_(trim(s[10]) == "");
  }
  void run() {
    testTrim();
  }
};
#endif // TRIMTEST_H ///:~
//: C03:TrimTest.cpp {O}
#include "TrimTest.h"

// Initialize static data
std::string TrimTest::s[TrimTest::NTESTS] = {
  " \t abcdefghijklmnop \t ",
  "abcdefghijklmnop \t ",
  " \t abcdefghijklmnop",
  "a", "ab", "abc", "a b c",
  " \t a b c \t ", " \t a \t b \t c \t ",
  "\t \n \r \v \f",
  "" // Must also test the empty string
}; ///:~
//: C03:TrimTestMain.cpp
//{L} ../TestSuite/Test TrimTest
#include "TrimTest.h"

int main() {
  TrimTest t;
  t.run();
  return t.report();
} ///:~
In the array of strings, you can see that the character arrays are automatically converted to string objects. This array provides cases to check the removal of spaces and tabs from both ends, as well as ensuring that spaces and tabs are not removed from the middle of a string.
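The trim( ) function relies on find_first_not_of( ) and find_last_not_of( ). Their counterparts without "not" locate the first or last character that does belong to a given set. As a minimal sketch, the hypothetical helpers below (baseName and firstDigit are our names, not library functions) use them to split a path at either kind of separator and to find the first digit in a string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the part of path after the
// last '/' or '\\', or all of path if neither appears.
inline std::string baseName(const std::string& path) {
  std::size_t sep = path.find_last_of("/\\");
  if(sep == std::string::npos)
    return path;
  return path.substr(sep + 1);
}

// Hypothetical helper: index of the first decimal digit
// in s, or string::npos if s contains none.
inline std::size_t firstDigit(const std::string& s) {
  return s.find_first_of("0123456789");
}

inline void firstLastOfDemo() {
  assert(baseName("C03/Trim.h") == "Trim.h");
  assert(baseName("C03\\Trim.h") == "Trim.h");
  assert(baseName("Trim.h") == "Trim.h");
  assert(firstDigit("C03") == 1);
  assert(firstDigit("abc") == std::string::npos);
}
```

In both functions the argument is treated as a set of individual characters, not as a substring to match.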


2.1.4.3. Removing characters from strings

Removing characters is easy and efficient with the erase( ) member function, which takes two arguments: where to start removing characters (which defaults to 0), and how many to remove (which defaults to string::npos). If you specify more characters than remain in the string, the remaining characters are all erased anyway (so calling erase( ) without any arguments removes all characters from a string). Sometimes it's useful to take an HTML file and strip its tags and special characters so that you have something approximating the text that would be displayed in the Web browser, only as a plain text file. The following example uses erase( ) to do the job:
//: C03:HTMLStripper.cpp {RunByHand}
//{L} ReplaceAll
// Filter to remove html tags and markers.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include "ReplaceAll.h"
#include "../require.h"
using namespace std;

string& stripHTMLTags(string& s) {
  static bool inTag = false;
  bool done = false;
  while(!done) {
    if(inTag) {
      // The previous line started an HTML tag
      // but didn't finish. Must search for '>'.
      size_t rightPos = s.find('>');
      if(rightPos != string::npos) {
        inTag = false;
        s.erase(0, rightPos + 1);
      }
      else {
        done = true;
        s.erase();
      }
    }
    else {
      // Look for start of tag:
      size_t leftPos = s.find('<');
      if(leftPos != string::npos) {
        // See if tag close is in this line:
        size_t rightPos = s.find('>');
        if(rightPos == string::npos) {
          inTag = done = true;
          s.erase(leftPos);
        }
        else
          s.erase(leftPos, rightPos - leftPos + 1);
      }
      else
        done = true;
    }
  }
  // Remove all special HTML characters
  replaceAll(s, "&lt;", "<");
  replaceAll(s, "&gt;", ">");
  replaceAll(s, "&amp;", "&");
  replaceAll(s, "&nbsp;", " ");
  // Etc...
  return s;
}

int main(int argc, char* argv[]) {
  requireArgs(argc, 1,
    "usage: HTMLStripper InputFile");
  ifstream in(argv[1]);
  assure(in, argv[1]);
  string s;
  while(getline(in, s))
    if(!stripHTMLTags(s).empty())
      cout << s << endl;
} ///:~
This example will even strip HTML tags that span multiple lines.(35) This is accomplished with the static flag, inTag, which is true whenever the start of a tag is found, but the accompanying tag end is not found in the same line. All forms of erase( ) appear in the stripHTMLTags( ) function.(36) The version of getline( ) we use here is a (global) function declared in the <string> header and is handy because it stores an arbitrarily long line in its string argument. You don't need to worry about the dimension of a character array as you do with istream::getline( ). Notice that this program uses the replaceAll( ) function from earlier in this chapter. In the next chapter, we'll use string streams to create a more elegant solution.
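Seen in isolation, the different ways of calling erase( ) behave as in the following sketch. The function name eraseForms and the sample text are ours, chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical demonstration of the default arguments of
// string::erase(pos = 0, n = npos).
inline void eraseForms() {
  std::string s("<b>bold</b> text");
  s.erase(0, 3);     // Start and count: drops "<b>"
  assert(s == "bold</b> text");
  s.erase(4, 100);   // A count past the end is clamped
  assert(s == "bold");
  s.erase(2);        // Start only: erases to the end
  assert(s == "bo");
  s.erase();         // No arguments: erases everything
  assert(s.empty());
}
```

The starting position itself is not clamped: a position greater than the string's size causes erase( ) to throw out_of_range.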


2.1.4.4. Comparing strings

Comparing strings is inherently different from comparing numbers. Numbers have constant, universally meaningful values. To evaluate the relationship between the magnitudes of two strings, you must make a lexical comparison. Lexical comparison means that when you test a character to see if it is “greater than” or “less than” another character, you are actually comparing the numeric representation of those characters as specified in the collating sequence of the character set being used. Most often this will be the ASCII collating sequence, which assigns the printable characters for the English language numbers in the range 32 through 126 decimal. In the ASCII collating sequence, the first “character” in the list is the space, followed by several common punctuation marks, the digits, and then uppercase and lowercase letters. With respect to the alphabet, this means that the letters nearer the front have lower ASCII values than those nearer the end. With these details in mind, it becomes easier to remember that when a lexical comparison reports that s1 is “greater than” s2, it simply means that when the two were compared, the first differing character in s1 came later in the collating sequence than the character in that same position in s2.

C++ provides several ways to compare strings, and each has advantages. The simplest to use are the nonmember, overloaded operator functions: operator ==, operator !=, operator >, operator <, operator >=, and operator <=.
//: C03:CompStr.h
#ifndef COMPSTR_H
#define COMPSTR_H
#include <string>
#include "../TestSuite/Test.h"
using std::string;

class CompStrTest : public TestSuite::Test {
public:
  void run() {
    // Strings to compare
    string s1("This");
    string s2("That");
    test_(s1 == s1);
    test_(s1 != s2);
    test_(s1 > s2);
    test_(s1 >= s2);
    test_(s1 >= s1);
    test_(s2 < s1);
    test_(s2 <= s1);
    test_(s1 <= s1);
  }
};
#endif // COMPSTR_H ///:~
//: C03:CompStr.cpp
//{L} ../TestSuite/Test
#include "CompStr.h"

int main() {
  CompStrTest t;
  t.run();
  return t.report();
} ///:~
The overloaded comparison operators are useful for comparing both full strings and individual string character elements.
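One consequence of lexical comparison is worth seeing concretely: because every uppercase letter precedes every lowercase letter in ASCII, the operators do not produce dictionary order for mixed-case text. The sketch below assumes an ASCII execution character set; sortsBefore and lexicalOrder are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if a sorts before b in a
// plain lexical (character code) comparison.
inline bool
sortsBefore(const std::string& a, const std::string& b) {
  return a < b;
}

inline void lexicalOrder() {
  // Assumes ASCII, where 'Z' is 90 and 'a' is 97:
  assert(sortsBefore("Zebra", "apple"));
  // The first differing character decides ('a' < 'i'):
  assert(sortsBefore("That", "This"));
  // A proper prefix sorts before the longer string:
  assert(sortsBefore("This", "This is"));
  // Equal strings are not "less than" each other:
  assert(!sortsBefore("This", "This"));
}
```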

Notice in the following example the flexibility of argument types on both the left and right side of the comparison operators. For efficiency, the string class provides overloaded operators for the direct comparison of string objects, quoted literals, and pointers to C-style strings without having to create temporary string objects.
//: C03:Equivalence.cpp
#include <iostream>
#include <string>
using namespace std;

int main() {
  string s2("That"), s1("This");
  // The left operand is a quoted literal
  // and the right operand is a string:
  if("That" == s2)
    cout << "A match" << endl;
  // The left operand is a string and the right is
  // a pointer to a C-style null terminated string:
  if(s1 != s2.c_str())
    cout << "No match" << endl;
} ///:~
The c_str( ) function returns a const char* that points to a C-style, null-terminated string equivalent to the contents of the string object. This comes in handy when you want to pass a string to a standard C function, such as atoi( ) or any of the functions defined in the <cstring> header. It is an error to use the value returned by c_str( ) as a non-const argument to any function.
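As a brief sketch of this kind of interoperation, the hypothetical helpers below (toInt and cLength are our names) hand the result of c_str( ) to atoi( ) and strlen( ). The second one also shows a limitation: the C library sees only the characters up to the first null.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts the leading digits of s
// to an int by passing its C-style view to std::atoi().
inline int toInt(const std::string& s) {
  return std::atoi(s.c_str());
}

// Hypothetical helper: length as seen by the C library,
// which stops at the first null character.
inline std::size_t cLength(const std::string& s) {
  return std::strlen(s.c_str());
}

inline void cStrDemo() {
  assert(toInt("1234") == 1234);
  assert(cLength("That") == 4);
  // A string object may hold embedded nulls:
  std::string z("ab\0cd", 5);
  assert(z.size() == 5);
  assert(cLength(z) == 2);
}
```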

You won't find the logical not (!) or the logical comparison operators (&& and ||) among operators for a string. (Neither will you find overloaded versions of the bitwise C operators &, |, ^, or ~.) The overloaded nonmember comparison operators for the string class are limited to the subset that has clear, unambiguous application to single characters or groups of characters.

The compare( ) member function offers far more sophisticated and precise comparison than the nonmember operator set. It provides overloaded versions to compare:

  • Two complete strings.
  • Part of either string to a complete string.
  • Subsets of two strings.
The following example compares complete strings:
//: C03:Compare.cpp
// Demonstrates compare() and swap().
#include <cassert>
#include <string>
using namespace std;

int main() {
  string first("This");
  string second("That");
  assert(first.compare(first) == 0);
  assert(second.compare(second) == 0);
  // Which is lexically greater?
  assert(first.compare(second) > 0);
  assert(second.compare(first) < 0);
  first.swap(second);
  assert(first.compare(second) < 0);
  assert(second.compare(first) > 0);
} ///:~
The swap( ) function in this example does what its name implies: it exchanges the contents of its object and argument. To compare a subset of the characters in one or both strings, you add arguments that define where to start the comparison and how many characters to consider. For example, we can use the following overloaded version of compare( ):

s1.compare(s1StartPos, s1NumberChars, s2, s2StartPos, s2NumberChars);

Here's an example:
//: C03:Compare2.cpp
// Illustrate overloaded compare().
#include <cassert>
#include <string>
using namespace std;

int main() {
  string first("This is a day that will live in infamy");
  string second("I don't believe that this is what "
                "I signed up for");
  // Compare "his is" in both strings:
  assert(first.compare(1, 7, second, 22, 7) == 0);
  // Compare "his is a" to "his is w":
  assert(first.compare(1, 9, second, 22, 9) < 0);
} ///:~
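Comparing part of one string against a complete string, the second case listed earlier, uses the three-argument form compare(pos, n, str). A minimal sketch follows, built around a hypothetical helper named startsWith.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if s begins with prefix.
// Compares the first prefix.size() characters of s
// against all of prefix.
inline bool
startsWith(const std::string& s, const std::string& prefix) {
  return s.size() >= prefix.size() &&
    s.compare(0, prefix.size(), prefix) == 0;
}

inline void startsWithDemo() {
  assert(startsWith("This is a day", "This"));
  assert(!startsWith("This", "That"));
  assert(!startsWith("Th", "This"));
  assert(startsWith("anything", ""));
}
```

The explicit size check is there for clarity; compare( ) would also report a mismatch when s is shorter than prefix, because the character count is clamped to the end of s.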
In the examples so far, we have used C-style array indexing syntax to refer to an individual character in a string. C++ strings provide an alternative to the s[n] notation: the at( ) member. These two indexing mechanisms produce the same result in C++ if all goes well:
//: C03:StringIndexing.cpp
#include <cassert>
#include <string>
using namespace std;

int main() {
  string s("1234");
  assert(s[1] == '2');
  assert(s.at(1) == '2');
} ///:~
There is one important difference, however, between [ ] and at( ). When you try to reference an array element that is out of bounds, at( ) will do you the kindness of throwing an exception, while ordinary [ ] subscripting syntax will leave you to your own devices:
//: C03:BadStringIndexing.cpp
#include <exception>
#include <iostream>
#include <string>
using namespace std;

int main() {
  string s("1234");
  // at() saves you by throwing an exception:
  try {
    s.at(5);
  } catch(exception& e) {
    cerr << e.what() << endl;
  }
} ///:~
Responsible programmers will not use errant indexes, but should you want the benefits of automatic index checking, using at( ) in place of [ ] will give you a chance to gracefully recover from references to array elements that don't exist. Execution of this program on one of our test compilers gave the following output:
invalid string position
The at( ) member throws an object of class out_of_range, which derives (ultimately) from std::exception. By catching this object in an exception handler, you can take appropriate remedial actions such as recalculating the offending subscript or growing the array. Using string::operator[ ]( ) gives no such protection and is as dangerous as char array processing in C.(37)
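One possible form of such a recovery is sketched below: the handler catches out_of_range specifically and substitutes a fallback character. The name charAtOr is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns s.at(i), or fallback when
// i is out of bounds, instead of letting the exception
// propagate to the caller.
inline char
charAtOr(const std::string& s, std::size_t i, char fallback) {
  try {
    return s.at(i);
  } catch(const std::out_of_range&) {
    return fallback;
  }
}

inline void atDemo() {
  std::string s("1234");
  assert(charAtOr(s, 1, '?') == '2');
  // Index 4 equals size(), so at() throws:
  assert(charAtOr(s, 4, '?') == '?');
  assert(charAtOr(s, 5, '?') == '?');
}
```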


2.1.4.5. Strings and character traits

The program Find.cpp earlier in this chapter leads us to ask the obvious question: Why isn't case-insensitive comparison part of the standard string class? The answer provides interesting background on the true nature of C++ string objects.

Consider what it means for a character to have “case.” Written Hebrew, Farsi, and Kanji don't use the concept of upper- and lowercase, so for those languages this idea has no meaning. It would seem that if there were a way to designate some languages as “all uppercase” or “all lowercase,” we could design a generalized solution. However, some languages that employ the concept of “case” also change the meaning of particular characters with diacritical marks, for example: the cedilla in Spanish, the circumflex in French, and the umlaut in German. For this reason, any case-sensitive collating scheme that attempts to be comprehensive will be nightmarishly complex to use.

Although we usually treat the C++ string as a class, this is really not the case. The string type is a specialization of a more general constituent, the basic_string< > template. Observe how string is declared in the Standard C++ header file:(38)
typedef basic_string<char> string;
To understand the nature of the string class, look at the basic_string< > template:
template<class charT, class traits = char_traits<charT>,
  class allocator = allocator<charT> > class basic_string;
In Chapter 5, we examine templates in great detail (much more than in Chapter 16 of Volume 1). For now, just notice that the string type is created when the basic_string template is instantiated with char. Inside the basic_string< > template declaration, the line:
class traits = char_traits<charT>,
tells us that the behavior of the class made from the basic_string< > template is specified by a class based on the template char_traits< >. Thus, the basic_string< > template produces string-oriented classes that manipulate types other than char (wide characters, for example). To do this, the char_traits< > template controls the content and collating behaviors of a variety of character sets using the character comparison functions eq( ) (equal), ne( ) (not equal), and lt( ) (less than). The basic_string< > string comparison functions rely on these.

This is why the string class doesn't include case-insensitive member functions: that's not in its job description. To change the way the string class treats character comparison, you must supply a different char_traits< > template because that defines the behavior of the individual character comparison member functions.
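The character-level functions in char_traits< > are ordinary static members, so you can call them directly to see what basic_string< > relies on. The following sketch uses the standard char specialization and assumes an ASCII character set; traitsDemo is our own name.

```cpp
#include <cassert>
#include <string>

inline void traitsDemo() {
  typedef std::char_traits<char> Tr;
  assert(Tr::eq('a', 'a'));
  assert(!Tr::eq('a', 'A'));  // Case matters by default
  assert(Tr::lt('A', 'a'));   // Assumes ASCII ordering
  // compare() examines exactly n characters:
  assert(Tr::compare("This", "That", 2) == 0);
  assert(Tr::compare("This", "That", 4) > 0);
  // find() returns a pointer to the match, or 0:
  const char* text = "This";
  assert(Tr::find(text, 4, 'i') == text + 2);
  assert(Tr::find(text, 4, 'x') == 0);
}
```

Replacing these functions in a traits class of your own is what changes how a basic_string< > instantiation compares and searches.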

You can use this information to make a new type of string class that ignores case. First, we'll define a new case-insensitive char_traits< > template that inherits from the existing template. Next, we'll override only the members we need to change to make character-by-character comparison case insensitive. (In addition to the three lexical character comparison members mentioned earlier, we'll also supply a new implementation for the char_traits functions find( ) and compare( ).) Finally, we'll typedef a new class based on basic_string, but using the case-insensitive ichar_traits template for its second argument:
//: C03:ichar_traits.h
// Creating your own character traits.
#ifndef ICHAR_TRAITS_H
#define ICHAR_TRAITS_H
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <string>
using std::allocator;
using std::basic_string;
using std::char_traits;
using std::ostream;
using std::size_t;
using std::string;
using std::toupper;
using std::tolower;

struct ichar_traits : char_traits<char> {
  // We'll only change character-by-
  // character comparison functions
  static bool eq(char c1st, char c2nd) {
    return toupper(c1st) == toupper(c2nd);
  }
  static bool ne(char c1st, char c2nd) {
    return !eq(c1st, c2nd);
  }
  static bool lt(char c1st, char c2nd) {
    return toupper(c1st) < toupper(c2nd);
  }
  static int
  compare(const char* str1, const char* str2, size_t n) {
    for(size_t i = 0; i < n; ++i) {
      if(str1 == 0)
        return -1;
      else if(str2 == 0)
        return 1;
      else if(tolower(*str1) < tolower(*str2))
        return -1;
      else if(tolower(*str1) > tolower(*str2))
        return 1;
      assert(tolower(*str1) == tolower(*str2));
      ++str1; ++str2; // Compare the other chars
    }
    return 0;
  }
  static const char*
  find(const char* s1, size_t n, char c) {
    while(n-- > 0)
      if(toupper(*s1) == toupper(c))
        return s1;
      else
        ++s1;
    return 0;
  }
};

typedef basic_string<char, ichar_traits> istring;

inline ostream& operator<<(ostream& os,
  const istring& s) {
  return os << string(s.c_str(), s.length());
}
#endif // ICHAR_TRAITS_H ///:~
We provide a typedef named istring so that our class will act like an ordinary string in every way, except that it will make all comparisons without respect to case. For convenience, we've also provided an overloaded operator<<( ) so that you can print istrings. Here's an example:
//: C03:ICompare.cpp
#include <cassert>
#include <iostream>
#include "ichar_traits.h"
using namespace std;

int main() {
  // The same letters except for case:
  istring first = "tHis";
  istring second = "ThIS";
  cout << first << endl;
  cout << second << endl;
  assert(first.compare(second) == 0);
  assert(first.find('h') == 1);
  assert(first.find('I') == 2);
  assert(first.find('x') == string::npos);
} ///:~
This is just a toy example. To make istring fully equivalent to string, we'd have to create the other functions necessary to support the new istring type.

The <string> header provides a wide string class via the following typedef:
typedef basic_string<wchar_t> wstring;
Wide string support also reveals itself in wide streams (wostream in place of ostream, also defined in <iostream>) and in the header <cwctype>, a wide-character version of <cctype>. This along with the wchar_t specialization of char_traits in the standard library allows us to do a wide-character version of ichar_traits:
//: C03:iwchar_traits.h {-g++}
// Creating your own wide-character traits.
#ifndef IWCHAR_TRAITS_H
#define IWCHAR_TRAITS_H
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cwctype>
#include <ostream>
#include <string>

using std::allocator;
using std::basic_string;
using std::char_traits;
using std::size_t;
using std::towlower;
using std::towupper;
using std::wostream;
using std::wstring;

struct iwchar_traits : char_traits<wchar_t> {
  // We'll only change character-by-
  // character comparison functions
  static bool eq(wchar_t c1st, wchar_t c2nd) {
    return towupper(c1st) == towupper(c2nd);
  }
  static bool ne(wchar_t c1st, wchar_t c2nd) {
    return towupper(c1st) != towupper(c2nd);
  }
  static bool lt(wchar_t c1st, wchar_t c2nd) {
    return towupper(c1st) < towupper(c2nd);
  }
  static int compare(
    const wchar_t* str1, const wchar_t* str2, size_t n) {
    for(size_t i = 0; i < n; i++) {
      if(str1 == 0)
        return -1;
      else if(str2 == 0)
        return 1;
      else if(towlower(*str1) < towlower(*str2))
        return -1;
      else if(towlower(*str1) > towlower(*str2))
        return 1;
      assert(towlower(*str1) == towlower(*str2));
      ++str1; ++str2; // Compare the other wchar_ts
    }
    return 0;
  }
  static const wchar_t*
  find(const wchar_t* s1, size_t n, wchar_t c) {
    while(n-- > 0)
      if(towupper(*s1) == towupper(c))
        return s1;
      else
        ++s1;
    return 0;
  }
};

typedef basic_string<wchar_t, iwchar_traits> iwstring;

inline wostream& operator<<(wostream& os,
  const iwstring& s) {
  return os << wstring(s.c_str(), s.length());
}
#endif // IWCHAR_TRAITS_H  ///:~
As you can see, this is mostly an exercise in placing a 'w' in the appropriate place in the source code. The test program looks like this:
//: C03:IWCompare.cpp {-g++}
#include <cassert>
#include <iostream>
#include "iwchar_traits.h"
using namespace std;

int main() {
  // The same letters except for case:
  iwstring wfirst = L"tHis";
  iwstring wsecond = L"ThIS";
  wcout << wfirst << endl;
  wcout << wsecond << endl;
  assert(wfirst.compare(wsecond) == 0);
  assert(wfirst.find('h') == 1);
  assert(wfirst.find('I') == 2);
  assert(wfirst.find('x') == wstring::npos);
} ///:~
Unfortunately, some compilers still do not provide robust support for wide characters.


2.1.5. A string application

If you've looked at the sample code in this book closely, you've noticed that certain tokens in the comments surround the code. These are used by a Python program that Bruce wrote to extract the code into files and set up makefiles for building the code. For example, a double-slash followed by a colon at the beginning of a line denotes the first line of a source file. The rest of the line contains information describing the file's name and location and whether it should be only compiled rather than fully built into an executable file. For example, the first line of the preceding program contains the string C03:IWCompare.cpp, indicating that the file IWCompare.cpp should be extracted into the directory C03.

The last line of a source file contains a triple-slash followed by a colon and a tilde. If the first line has an exclamation point immediately after the colon, the first and last lines of the source code are not to be output to the file (this is for data-only files). (If you're wondering why we're avoiding showing you these tokens, it's because we don't want to break the code extractor when applied to the text of the book!)

Bruce's Python program does a lot more than just extract code. If the token “{O}” follows the file name, its makefile entry will only be set up to compile the file and not to link it into an executable. (The Test Framework in Chapter 2 is built this way.) To link such a file with another source example, the target executable's source file will contain an “{L}” directive, as in:
//{L} ../TestSuite/Test
This section will present a program to just extract all the code so that you can compile and inspect it manually. You can use this program to extract all the code in this book by saving the document file as a text file(39) (let's call it TICV2.txt) and by executing something like the following on a shell command line:
C:> extractCode TICV2.txt /TheCode
This command reads the text file TICV2.txt and writes all the source code files in subdirectories under the top-level directory /TheCode. The directory tree will look like the following:
TheCode/
   C0B/
   C01/
   C02/
   C03/
   C04/
   C05/
   C06/
   C07/
   C08/
   C09/
   C10/
   C11/
   TestSuite/
The source files containing the examples from each chapter will be in the corresponding directory.

Here's the program:
//: C03:ExtractCode.cpp {-edg} {RunByHand}
// Extracts code from text.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// Legacy non-standard C header for mkdir()
#if defined(__GNUC__) || defined(__MWERKS__)
#include <sys/stat.h>
#elif defined(__BORLANDC__) || defined(_MSC_VER) \
  || defined(__DMC__)
#include <direct.h>
#else
#error Compiler not supported
#endif

// Check to see if directory exists
// by attempting to open a new file
// for output within it.
bool exists(string fname) {
  size_t len = fname.length();
  if(fname[len-1] != '/' && fname[len-1] != '\\')
    fname.append("/");
  fname.append("000.tmp");
  ofstream outf(fname.c_str());
  bool existFlag = outf.is_open();
  if(outf) {
    outf.close();
    remove(fname.c_str());
  }
  return existFlag;
}

int main(int argc, char* argv[]) {
  // See if input file name provided
  if(argc == 1) {
    cerr << "usage: extractCode file [dir]" << endl;
    exit(EXIT_FAILURE);
  }
  // See if input file exists
  ifstream inf(argv[1]);
  if(!inf) {
    cerr << "error opening file: " << argv[1] << endl;
    exit(EXIT_FAILURE);
  }
  // Check for optional output directory
  string root("./");  // current is default
  if(argc == 3) {
    // See if output directory exists
    root = argv[2];
    if(!exists(root)) {
      cerr << "no such directory: " << root << endl;
      exit(EXIT_FAILURE);
    }
    size_t rootLen = root.length();
    if(root[rootLen-1] != '/' && root[rootLen-1] != '\\')
      root.append("/");
  }
  // Read input file line by line
  // checking for code delimiters
  string line;
  bool inCode = false;
  bool printDelims = true;
  ofstream outf;
  while(getline(inf, line)) {
    size_t findDelim = line.find("//" "/:~");
    if(findDelim != string::npos) {
      // Output last line and close file
      if(!inCode) {
        cerr << "Lines out of order" << endl;
        exit(EXIT_FAILURE);
      }
      assert(outf);
      if(printDelims)
        outf << line << endl;
      outf.close();
      inCode = false;
      printDelims = true;
    } else {
      findDelim = line.find("//" ":");
      if(findDelim == 0) {
        // Check for '!' directive
        if(line[3] == '!') {
          printDelims = false;
          ++findDelim;  // To skip '!' for next search
        }
        // Extract subdirectory name, if any
        size_t startOfSubdir =
          line.find_first_not_of(" \t", findDelim+3);
        findDelim = line.find(':', startOfSubdir);
        if(findDelim == string::npos) {
          cerr << "missing filename information\n" << endl;
          exit(EXIT_FAILURE);
        }
        string subdir;
        if(findDelim > startOfSubdir)
          subdir = line.substr(startOfSubdir,
                               findDelim - startOfSubdir);
        // Extract file name (better be one!)
        size_t startOfFile = findDelim + 1;
        size_t endOfFile =
          line.find_first_of(" \t", startOfFile);
        if(endOfFile == startOfFile) {
          cerr << "missing filename" << endl;
          exit(EXIT_FAILURE);
        }
        // We have all the pieces; build fullPath name
        string fullPath(root);
        if(subdir.length() > 0)
          fullPath.append(subdir).append("/");
        assert(fullPath[fullPath.length()-1] == '/');
        if(!exists(fullPath))
#if defined(__GNUC__) || defined(__MWERKS__)
          mkdir(fullPath.c_str(), 0755);  // Create subdir (rwxr-xr-x)
#else
          mkdir(fullPath.c_str());  // Create subdir
#endif
        fullPath.append(line.substr(startOfFile,
                        endOfFile - startOfFile));
        outf.open(fullPath.c_str());
        if(!outf) {
          cerr << "error opening " << fullPath
               << " for output" << endl;
          exit(EXIT_FAILURE);
        }
        inCode = true;
        cout << "Processing " << fullPath << endl;
        if(printDelims)
          outf << line << endl;
      }
      else if(inCode) {
        assert(outf);
        outf << line << endl;  // Output middle code line
      }
    }
  }
  exit(EXIT_SUCCESS);
} ///:~
First, you'll notice some conditional compilation directives. The mkdir( ) function, which creates a directory in the file system, is defined by the POSIX(40) standard in the header <sys/stat.h>. Unfortunately, many compilers still use a different header (<direct.h>). The respective signatures for mkdir( ) also differ: POSIX specifies two arguments, the older versions just one. For this reason, there is more conditional compilation later in the program to choose the right call to mkdir( ). We normally don't use conditional compilation in the examples in this book, but this particular program is too useful not to put a little extra work into, since you can use it to extract all the code.

The exists( ) function in ExtractCode.cpp tests whether a directory exists by opening a temporary file in it. If the open fails, the directory doesn't exist. You remove a file by sending its name as a char* to std::remove( ).

The main program validates the command-line arguments and then reads the input file a line at a time, looking for the special source code delimiters. The Boolean flag inCode indicates that the program is in the middle of a source file, so lines should be output. The printDelims flag will be true if the opening token is not followed by an exclamation point; otherwise the first and last lines are not written. It is important to check for the closing delimiter first, because the start token is a subset, and searching for the start token first would return a successful find for both cases. If we encounter the closing token, we verify that we are in the middle of processing a source file; otherwise, something is wrong with the way the delimiters are laid out in the text file. If inCode is true, all is well, and we (optionally) write the last line and close the file. When the opening token is found, we parse the directory and file name components and open the file. The following string-related functions were used in this example: length( ), append( ), getline( ), find( ) (two versions), find_first_not_of( ), substr( ), find_first_of( ), c_str( ), and, of course, operator<<( ).


2.1.6. Summary

C++ string objects provide developers with a number of great advantages over their C counterparts. For the most part, the string class makes referring to strings with character pointers unnecessary. This eliminates an entire class of software defects that arise from the use of uninitialized and incorrectly valued pointers.

C++ strings dynamically and transparently grow their internal data storage space to accommodate increases in the size of the string data. When the data in a string grows beyond the limits of the memory initially allocated to it, the string object will make the memory management calls that take space from and return space to the heap. Consistent allocation schemes prevent memory leaks and have the potential to be much more efficient than “roll your own” memory management.
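You can watch this growth happen through the capacity( ) member function. The exact growth pattern is left to the implementation, so the sketch below only counts how often the capacity changes and does not assume particular sizes; countGrowths is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time
// and returns how many times the capacity changed.
inline std::size_t
countGrowths(std::string& s, std::size_t n) {
  std::size_t growths = 0;
  std::size_t cap = s.capacity();
  for(std::size_t i = 0; i < n; ++i) {
    s += 'x';
    // The storage is always at least as large as the data:
    assert(s.capacity() >= s.size());
    if(s.capacity() != cap) {
      ++growths;
      cap = s.capacity();
    }
  }
  return growths;
}
```

On typical implementations, appending a thousand characters to an empty string reports only a handful of capacity changes, because each reallocation reserves room for further growth.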

The string class member functions provide a fairly comprehensive set of tools for creating, modifying, and searching in strings. String comparisons are always case sensitive, but you can work around this by copying string data to C-style null-terminated strings and using case-insensitive string comparison functions, temporarily converting the data held in string objects to a single case, or by creating a case-insensitive string class that overrides the character traits used to create the basic_string object.
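The second of these workarounds, temporarily converting the data to a single case, might be sketched as follows. It assumes single-byte characters and the current C locale, and the names lowerChar, toLowerCopy, and equalNoCase are ours.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// tolower() requires a value representable as unsigned
// char, so convert before and after the call.
inline char lowerChar(char c) {
  return static_cast<char>(
    std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: returns a lowercase copy of s.
inline std::string toLowerCopy(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), lowerChar);
  return s;
}

// Hypothetical helper: case-insensitive equality.
inline bool
equalNoCase(const std::string& a, const std::string& b) {
  return toLowerCopy(a) == toLowerCopy(b);
}

inline void noCaseDemo() {
  assert(equalNoCase("tHis", "ThIS"));
  assert(!equalNoCase("This", "That"));
  assert(toLowerCopy("EEN") == "een");
}
```

This approach copies both strings on every comparison, which is the price paid for leaving the string type itself unchanged.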


2.1.7. Exercises

Solutions to selected exercises can be found in the electronic document The Thinking in C++ Volume 2 Annotated Solution Guide, available for a small fee from www.MindView.net.

  1. Write and test a function that reverses the order of the characters in a string.
  2. A palindrome is a word or group of words that read the same forward and backward. For example “madam” or “wow.” Write a program that takes a string argument from the command line and, using the function from the previous exercise, prints whether the string was a palindrome or not.
  3. Make your program from Exercise 2 return true even if symmetric letters differ in case. For example, “Civic” would still return true although the first letter is capitalized.
  4. Change your program from Exercise 3 to ignore punctuation and spaces as well. For example “Able was I, ere I saw Elba.” would report true.
  5. Using the following string declarations and only chars (no string literals or magic numbers):

    string one("I walked down the canyon with the moving mountain bikers.");
    string two("The bikers passed by me too close for comfort.");
    string three("I went hiking instead.");

    produce the following sentence:

    I moved down the canyon with the mountain bikers. The mountain bikers passed by me too close for comfort. So I went hiking instead.
  6. Write a program named replace that takes three command-line arguments representing an input text file, a string to replace (call it from), and a replacement string (call it to). The program should write a new file to standard output with all occurrences of from replaced by to.
  7. Repeat the previous exercise but replace all instances of from regardless of case.
  8. Make your program from Exercise 3 take a filename from the command line, and then display all words that are palindromes (ignoring case) in the file. Do not display duplicates (even if their case differs). Do not try to look for palindromes that are larger than a word (unlike in Exercise 4).
  9. Modify HTMLStripper.cpp so that when it encounters a tag, it displays the tag's name, then displays the file's contents between the tag and the file's ending tag. Assume no nesting of tags, and that all tags have ending tags (denoted with </TAGNAME>).
  10. Write a program that takes three command-line arguments (a filename and two strings) and displays to the console all lines in the file that have both strings in the line, either string, only one string, or neither string, based on user input at the beginning of the program (the user will choose which matching mode to use). For all but the “neither string” option, highlight the input string(s) by placing an asterisk (*) at the beginning and end of each string's occurrence when it is displayed.
  11. Write a program that takes two command-line arguments (a filename and a string) and counts the number of times the string occurs in the file, even as a substring (but ignoring overlaps). For example, an input string of “ba” would match twice in the word “basketball,” but an input string of “ana” would match only once in the word “banana.” Display to the console the number of times the string is matched in the file, as well as the average length of the words where the string occurred. (If the string occurs more than once in a word, only count the word once in figuring the average.)
  12. Write a program that takes a filename from the command line and profiles the character usage, including punctuation and spaces (all character values of 0x21 [33] through 0x7E [126], as well as the space character). That is, count the number of occurrences of each character in the file, then display the results sorted either sequentially (space, then !, ", #, etc.) or by ascending or descending frequency based on user input at the beginning of the program. For space, display the word “Space” instead of the character ' '. A sample run might look something like this:
    Format sequentially, ascending, or descending (S/A/D): D
    t:  526
    r:  490
    etc.
  13. Using find( ) and rfind( ), write a program that takes two command-line arguments (a filename and a string) and displays the first and last words (and their indexes) not matching the string, as well as the indexes of the first and last instances of the string. Display “Not Found” if any of the searches fail.
  14. Using the find_first_of “family” of functions (but not exclusively), write a program that will remove all non-alphanumeric characters except spaces and periods from a file, then capitalize the first letter following a period.
  15. Again using the find_first_of “family” of functions, write a program that accepts a filename as a command-line argument and then formats all numbers in the file to currency. Ignore decimal points after the first until a non-numeric character is found, and round to the nearest hundredth. For example, the string 12.399abc29.00.6a would be formatted (in the USA) to $12.40abc$29.01a.
  16. Write a program that accepts two command-line arguments (a filename and a number) and scrambles each word in the file by randomly switching two of its letters the number of times specified in the second argument. (That is, if 0 is passed into your program from the command line, the words should not be scrambled; if 1 is passed in, one pair of randomly chosen letters should be swapped; for an input of 2, two random pairs should be swapped; and so on.)
  17. Write a program that accepts a filename from the command line and displays the number of sentences (defined as the number of periods in the file), average number of characters per sentence, and the total number of characters in the file.
  18. Prove to yourself that the at( ) member function really will throw an exception if an attempt is made to go out of bounds, and that the indexing operator ([ ]) won't.
 
(31) Some of the material in this chapter was originally created by Nancy Nicolaisen.
(32) It's difficult to make reference-counting implementations thread safe. (See Herb Sutter, More Exceptional C++, pp. 104–14.) See Chapter 10 for more on programming with multiple threads.
(33) It is an abbreviation for “no position,” and is the largest value that can be represented by the string allocator's size_type (std::size_t by default).
(34) Discussed in depth in Chapter 6.
(35) To keep the exposition simple, this version does not handle nested tags, such as comments.
(36) It is tempting to use mathematics here to factor out some of these calls to erase( ), but since in some cases one of the operands is string::npos (the largest unsigned integer available), integer overflow occurs and wrecks the algorithm.
(37) For the safety reasons mentioned, the C++ Standards Committee is considering a proposal to redefine string::operator[] to behave identically to string::at( ) for C++0x.
(38) Your implementation can define all three template arguments here. Because the last two template parameters have default arguments, such a declaration is equivalent to what we show here.
(39) Beware that some versions of Microsoft Word erroneously replace single quote characters with an extended ASCII character when you save a document as text, which causes a compile error. We have no idea why this happens. Just replace the character manually with an apostrophe.
(40) POSIX, an IEEE standard, stands for “Portable Operating System Interface” and is a generalization of many of the low-level system calls found in UNIX systems.


This document comes from http://www.developpez.com and remains the exclusive property of its author. Copying, modification, and/or distribution by any means whatsoever requires the author's prior authorization.