Thinking in C++ - Volume 2
Publication date: 25/01/2007, Last updated: 25/01/2007
2.1. Strings in Depth
2.1.1. What's in a string?
2.1.2. Creating and initializing C++ strings
2.1.3. Operating on strings
2.1.3.1. Appending, inserting, and concatenating strings
2.1.3.2. Replacing string characters
2.1.3.3. Concatenation using nonmember overloaded operators
2.1.4. Searching in strings
2.1.4.1. Finding in reverse
2.1.4.2. Finding first/last of a set of characters
2.1.4.3. Removing characters from strings
2.1.4.4. Comparing strings
2.1.4.5. Strings and character traits
2.1.5. A string application
2.1.6. Summary
2.1.7. Exercises
2.1. Strings in Depth
String processing with character arrays is one of the biggest
time-wasters in C. Character arrays require the programmer to keep track of the
difference between static quoted strings and arrays created on the stack and
the heap, and the fact that sometimes you're passing around a char* and
sometimes you must copy the whole array.
Especially because string manipulation is so common,
character arrays are a great source of misunderstandings and bugs. Despite
this, creating string classes remained a common exercise for beginning C++ programmers
for many years. The Standard C++ library string class solves the problem of character array manipulation once and for all, keeping track of memory even during
assignments and copy-constructions. You simply don't need to think about it.
This chapter (31) examines the Standard C++ string class, beginning with a look
at what constitutes a C++ string and how the C++ version differs from a
traditional C character array. You'll learn about operations and manipulations
using string objects, and you'll see how C++ strings accommodate variation in
character sets and string data conversion.
Handling text is one of the oldest programming applications,
so it's not surprising that the C++ string draws heavily on the ideas and
terminology that have long been used in C and other languages. As you begin to
acquaint yourself with C++ strings, this fact should be reassuring. No
matter which programming idiom you choose, there are three common things you
want to do with a string:
- Create or modify the sequence of characters stored in the string.
- Detect the presence or absence of elements within the string.
- Translate between various schemes for representing string
characters.
You'll see how each of these jobs is accomplished using C++ string
objects.
2.1.1. What's in a string?
In C, a string is simply an array of characters that always
includes a binary zero (often called the null terminator) as its final
array element. There are significant differences between C++ strings and
their C progenitors. First, and most important, C++ strings hide the
physical representation of the sequence of characters they contain. You don't need
to be concerned about array dimensions or null terminators. A string
also contains certain “housekeeping” information about the size and storage
location of its data. Specifically, a C++ string object knows its
starting location in memory, its content, its length in characters, and the
length in characters to which it can grow before the string object must
resize its internal data buffer. C++ strings thus greatly reduce the likelihood
of making three of the most common and destructive C programming errors:
overwriting array bounds, trying to access arrays through uninitialized or
incorrectly valued pointers, and leaving pointers “dangling” after an array
ceases to occupy the storage that was once allocated to it.
The exact implementation of memory layout for the string
class is not defined by the C++ Standard. This architecture is intended to be
flexible enough to allow differing implementations by compiler vendors, yet
guarantee predictable behavior for users. In particular, the exact conditions
under which storage is allocated to hold data for a string object are not
defined. String allocation rules were formulated to allow but not require a
reference-counted implementation, but whether or not the implementation uses reference counting, the semantics must be the same. To put this a bit differently,
in C, every char array occupies a unique physical region of memory. In
C++, individual string objects may or may not occupy unique physical
regions of memory, but if reference counting avoids storing duplicate copies of
data, the individual objects must look and act as though they exclusively own unique
regions of storage. For example:
#ifndef STRINGSTORAGE_H
#define STRINGSTORAGE_H
#include <iostream>
#include <string>
#include "../TestSuite/Test.h"
using std::cout;
using std::endl;
using std::string;

class StringStorageTest : public TestSuite::Test {
public:
  void run() {
    string s1("12345");
    // This may copy the first to the second or
    // use reference counting to simulate a copy:
    string s2 = s1;
    test_(s1 == s2);
    // Either way, this statement must ONLY modify s1:
    s1[0] = '6';
    cout << "s1 = " << s1 << endl;
    cout << "s2 = " << s2 << endl;
    test_(s1 != s2);
  }
};
#endif // STRINGSTORAGE_H
#include "StringStorage.h"

int main() {
  StringStorageTest t;
  t.run();
  return t.report();
}
We say that an implementation that only makes unique copies
when a string is modified uses a copy-on-write strategy. This approach
saves time and space when strings are used only as value parameters or in other
read-only situations.
Whether a library implementation uses reference counting or not should be
transparent to users of the string class. Unfortunately, this is not always the
case. In multithreaded programs, it is practically impossible to use a
reference-counting implementation safely. (32)
2.1.2. Creating and initializing C++ strings
Creating and initializing strings is a straightforward
proposition and fairly flexible. In the SmallString.cpp example below,
the first string, imBlank, is declared but contains no initial
value. Unlike a C char array, which would contain a random and
meaningless bit pattern until initialization, imBlank does contain
meaningful information. This string object is initialized to hold “no
characters” and can properly report its zero length and absence of data
elements using class member functions.
The next string, heyMom, is initialized by the
literal argument “Where are my socks?” This form of initialization uses a
quoted character array as a parameter to the string constructor. By
contrast, standardReply is simply initialized with an assignment. The
last string of the group, useThisOneAgain, is initialized using an
existing C++ string object. Put another way, this example illustrates
that string objects let you do the following:
- Create an empty string and defer initializing it with
character data.
- Initialize a string by passing a literal, quoted character
array as an argument to the constructor.
- Initialize a string using the equal sign (=).
- Use one string to initialize another.
#include <string>
using namespace std;

int main() {
  string imBlank;
  string heyMom("Where are my socks?");
  string standardReply = "Beamed into deep "
    "space on wide angle dispersion?";
  string useThisOneAgain(standardReply);
}
These are the simplest forms of string
initialization, but variations offer more flexibility and control. You can do
the following:
- Use a portion of either a C char array or a C++ string.
- Combine different sources of initialization data using operator+.
- Use the string object's substr( ) member function to create a substring.
Here's a program that illustrates
these features:
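#include <string>
#include <iostream>
using namespace std;

int main() {
  string s1("What is the sound of one clam napping?");
  string s2("Anything worth doing is worth overdoing.");
  string s3("I saw Elvis in a UFO");
  // Copy the first 8 chars:
  string s4(s1, 0, 8);
  cout << s4 << endl;
  // Copy 6 chars from the middle of the source:
  string s5(s2, 15, 6);
  cout << s5 << endl;
  // Copy from the middle to the end:
  string s6(s3, 6, 15);
  cout << s6 << endl;
  // Combine several sources in one initializer:
  string quoteMe = s4 + "that" +
    s1.substr(20, 10) + s5 +
    // Count is clamped at the end of s3:
    "with" + s3.substr(5, 100) +
    // A single character copied as a substring:
    s1.substr(37, 1);
  cout << quoteMe << endl;
}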
The string member function substr( )
takes a starting position as its first argument and the number of characters to
select as the second argument. Both arguments have default values. If you say substr( )
with an empty argument list, you produce a copy of the entire string, so
this is a convenient way to duplicate a string.
Here's the output from the program:
What is
doing
Elvis in a UFO
What is that one clam doing with Elvis in a UFO?
Notice the final line of the example. C++ allows string
initialization techniques to be mixed in a single statement, a flexible and
convenient feature. Also notice that the last initializer copies just one
character from the source string.
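The defaults and the clamping behavior of substr( ) described above can be checked directly. The following is a minimal sketch rather than part of the original program: safeSubstr( ) and substrDemo( ) are hypothetical names introduced only for illustration. It also shows one case the text does not mention, namely that a starting position greater than size( ) throws std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the book): like substr( ), but returns
// an empty string instead of throwing when pos is past the end.
inline std::string safeSubstr(const std::string& s, std::size_t pos,
                              std::size_t count = std::string::npos) {
  if(pos > s.size())
    return std::string();
  return s.substr(pos, count);
}

inline void substrDemo() {
  const std::string s("I saw Elvis in a UFO");
  assert(s.substr() == s);                  // Both defaults: full copy
  assert(s.substr(6) == "Elvis in a UFO");  // Default count: to the end
  assert(s.substr(6, 5) == "Elvis");
  assert(s.substr(17, 100) == "UFO");       // Count clamped to what remains
  assert(s.substr(s.size()).empty());       // pos == size() is allowed
  bool threw = false;
  try {
    (void)s.substr(s.size() + 1);           // pos > size() throws
  } catch(const std::out_of_range&) {
    threw = true;
  }
  assert(threw);
  assert(safeSubstr(s, 100).empty());
}
```

Calling substrDemo( ) from main( ) runs the checks. Whether a wrapper such as safeSubstr( ) is preferable to catching the exception depends on how the calling code wants to treat a bad position.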
Another slightly more subtle initialization technique
involves the use of the string iterators string::begin( )
and string::end( ). This technique treats a string like a container
object (which you've seen primarily in the form of vector so far—you'll
see many more containers in Chapter 7), which uses iterators to indicate
the start and end of a sequence of characters. In this way you can hand a string
constructor two iterators, and it copies from one to the other into the new string:
#include <string>
#include <iostream>
#include <cassert>
using namespace std;

int main() {
  string source("xxx");
  string s(source.begin(), source.end());
  assert(s == source);
}
The iterators are not restricted to begin( ) and
end( ); you can increment, decrement, and add integer offsets to
them, allowing you to extract a subset of characters from the source string.
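As a small sketch of extracting a subset with offset iterators, the code below builds strings from partial ranges. The names dropEnds( ) and iteratorRangeDemo( ) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copy of s without its first and last characters.
inline std::string dropEnds(const std::string& s) {
  if(s.size() < 2)
    return std::string();
  return std::string(s.begin() + 1, s.end() - 1);
}

inline void iteratorRangeDemo() {
  const std::string source("[payload]");
  assert(dropEnds(source) == "payload");
  // The first four characters only:
  std::string head(source.begin(), source.begin() + 4);
  assert(head == "[pay");
}
```

The size check in dropEnds( ) matters: with fewer than two characters, begin( ) + 1 would pass end( ) - 1 and the range would be invalid.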
C++ strings may not be initialized with single
characters or with ASCII or other integer values. You can initialize a string
with a number of copies of a single character, however:
#include <string>
#include <cassert>
using namespace std;

int main() {
  string okay(5, 'a');
  assert(okay == string("aaaaa"));
}
The first argument indicates the number of copies of the
second argument to place in the string. The second argument can only be a
single char, not a char array.
2.1.3. Operating on strings
If you've programmed in C, you are accustomed to the family
of functions that write, search, modify, and copy char arrays. There are
two unfortunate aspects of the Standard C library functions for handling char
arrays. First, there are two loosely organized families of them: the “plain”
group, and the ones that require you to supply a count of the number of
characters to be considered in the operation at hand. The roster of functions
in the C char array library shocks the unsuspecting user with a long
list of cryptic, mostly unpronounceable names. Although the type and number of
arguments to the functions are somewhat consistent, to use them properly you
must be attentive to details of function naming and parameter passing.
The second inherent trap of the standard C char array
tools is that they all rely explicitly on the assumption that the character
array includes a null terminator. If by oversight or error the null is omitted
or overwritten, there's little to keep the C char array functions from
manipulating the memory beyond the limits of the allocated space, sometimes
with disastrous results.
C++ provides a vast improvement in the convenience and
safety of string objects. For purposes of actual string handling
operations, there are about the same number of distinct member function names
in the string class as there are functions in the C library, but because
of overloading the functionality is much greater. Coupled with sensible naming
practices and the judicious use of default arguments, these features combine to
make the string class much easier to use than the C library char
array functions.
2.1.3.1. Appending, inserting, and concatenating strings
One of the most valuable and convenient aspects of C++ strings
is that they grow as needed, without intervention on the part of the
programmer. Not only does this make string-handling code inherently more
trustworthy, it also almost entirely eliminates a tedious “housekeeping”
chore—keeping track of the bounds of the storage where your strings live. For
example, if you create a string object and initialize it with a string of 50
copies of 'X', and later store in it 50 copies of “Zowie”, the object itself
will reallocate sufficient storage to accommodate the growth of the data.
Perhaps nowhere is this property more appreciated than when the strings
manipulated in your code change size and you don't know how big the change is. The
string member functions append( ) and insert( )
transparently reallocate storage when a string grows:
#include <string>
#include <iostream>
using namespace std;

int main() {
  string bigNews("I saw Elvis in a UFO. ");
  cout << bigNews << endl;
  // How much data do we actually have?
  cout << "Size = " << bigNews.size() << endl;
  // How much can we store without reallocating?
  cout << "Capacity = " << bigNews.capacity() << endl;
  // Insert this string immediately before bigNews[1]:
  bigNews.insert(1, " thought I");
  cout << bigNews << endl;
  cout << "Size = " << bigNews.size() << endl;
  cout << "Capacity = " << bigNews.capacity() << endl;
  // Make sure that there will be this much space:
  bigNews.reserve(500);
  // Add this to the end of the string:
  bigNews.append("I've been working too hard.");
  cout << bigNews << endl;
  cout << "Size = " << bigNews.size() << endl;
  cout << "Capacity = " << bigNews.capacity() << endl;
}
Here is the output from one particular compiler:
I saw Elvis in a UFO.
Size = 22
Capacity = 31
I thought I saw Elvis in a UFO.
Size = 32
Capacity = 47
I thought I saw Elvis in a UFO. I've been working too hard.
Size = 59
Capacity = 511
This example demonstrates that even though you can safely
relinquish much of the responsibility for allocating and managing the memory
your strings occupy, C++ strings provide you with several tools
to monitor and manage their size. Notice the ease with which we changed the
size of the storage allocated to the string. The size( ) function returns the
number of characters currently stored in the string and is identical to the
length( ) member function. The capacity( ) function returns
the size of the current underlying allocation, meaning the number of characters
the string can hold without requesting more storage. The reserve( )
function is an optimization mechanism that indicates your intention to set
aside a certain amount of storage for future use; immediately after a call to
reserve( ), capacity( ) returns a value at least as large as the amount
requested. A resize( ) function truncates the string if the new size is smaller
than the current string size; otherwise it appends characters, which are null
characters (char( )) by default. (An overload of resize( )
can specify a different character to append.)
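The relationships among size( ), capacity( ), reserve( ), and resize( ) can be sketched with a few assertions. The function name sizeCapacityDemo( ) is hypothetical. The exact capacity values are implementation-dependent, so the sketch checks only lower bounds.

```cpp
#include <cassert>
#include <string>

inline void sizeCapacityDemo() {
  std::string s("abc");
  assert(s.size() == 3 && s.length() == 3);
  assert(s.capacity() >= s.size());

  s.reserve(100);          // Request room; the contents are unchanged
  assert(s.capacity() >= 100);
  assert(s == "abc");

  s.resize(5);             // Grows with char(), the null character
  assert(s.size() == 5);
  assert(s[3] == '\0' && s[4] == '\0');

  s.resize(3);             // Truncates
  assert(s == "abc");

  s.resize(6, '.');        // Grows with the given fill character
  assert(s == "abc...");
}
```

Note that reserve( ) changes only the capacity, while resize( ) changes the number of characters the string actually holds.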
The exact fashion in which the string member functions
allocate space for your data depends on the implementation of the library. When
we tested one implementation with the previous example, it appeared that
reallocations occurred on even word (that is, full-integer) boundaries, with
one byte held back. The architects of the string class have endeavored
to make it possible to mix the use of C char arrays and C++ string
objects, so it is likely that figures reported by StrSize.cpp for capacity
reflect that, in this particular implementation, a byte is set aside to easily
accommodate the insertion of a null terminator.
2.1.3.2. Replacing string characters
The insert( ) function is particularly
nice because it absolves you from making sure the insertion of characters in a
string won't overrun the storage space or overwrite the characters immediately
following the insertion point. Space grows, and existing characters politely
move over to accommodate the new elements. Sometimes this might not be what you
want. If you want the size of the string to remain unchanged, use the replace( ) function to overwrite characters. There are a number of
overloaded versions of replace( ), but the simplest one takes three
arguments: an integer indicating where to start in the string, an integer
indicating how many characters to eliminate from the original string, and the
replacement string (which can be a different number of characters than the
eliminated quantity). Here's a simple example:
#include <cassert>
#include <cstddef>
#include <string>
using namespace std;

int main() {
  string s("A piece of text");
  string tag("$tag$");
  s.insert(8, tag + ' ');
  assert(s == "A piece $tag$ of text");
  size_t start = s.find(tag);
  assert(start == 8);
  assert(tag.size() == 5);
  s.replace(start, tag.size(), "hello there");
  assert(s == "A piece hello there of text");
}
The tag is first inserted into s (notice that
the insert happens before the value indicating the insert point and that
an extra space was added after tag), and then it is found and replaced.
You should check to see if you've found anything before you
perform a replace( ). The previous example replaces with a char*,
but there's an overloaded version that replaces with a string. Here's
a more complete demonstration of replace( ):
#include <cassert>
#include <cstddef>
#include <string>
using namespace std;

void replaceChars(string& modifyMe,
  const string& findMe, const string& newChars) {
  // Look in modifyMe for the "find string"
  // starting at position 0:
  size_t i = modifyMe.find(findMe, 0);
  // Did we find the string to replace?
  if(i != string::npos)
    // Replace the find string with newChars:
    modifyMe.replace(i, findMe.size(), newChars);
}

int main() {
  string bigNews = "I thought I saw Elvis in a UFO. "
                   "I have been working too hard.";
  string replacement("wig");
  string findMe("UFO");
  // Find "UFO" in bigNews and overwrite it:
  replaceChars(bigNews, findMe, replacement);
  assert(bigNews == "I thought I saw Elvis in a "
         "wig. I have been working too hard.");
}
If find( ) doesn't locate the search string, it returns string::npos, and
replaceChars( ) leaves modifyMe untouched. The npos data member is a static
constant member of the string class that represents a nonexistent character
position. (33)
Unlike insert( ), replace( ) won't
grow the string's storage space if you copy new characters into the
middle of an existing series of array elements. However, it will grow the storage space if needed, for example, when you make a “replacement” that would
expand the original string beyond the end of the current allocation. Here's an
example:
#include <cassert>
#include <string>
using namespace std;

int main() {
  string bigNews("I have been working the grave.");
  string replacement("yard shift.");
  // Start at the final period and ask for more
  // characters than remain in the string:
  bigNews.replace(bigNews.size() - 1,
    replacement.size(), replacement);
  assert(bigNews == "I have been working the "
         "graveyard shift.");
}
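The call to replace( ) begins “replacing” at the last character of the
existing string (the period) and asks for more characters than remain, so
everything from that point on is replaced and the rest of the new text is
effectively appended. Notice that in this example replace( ) expands the
string accordingly.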
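The way replace( ) shrinks, grows, and clamps can be seen in a short sketch. The function name replaceExtentDemo( ) is hypothetical. The last case goes beyond the text above: a starting position greater than size( ) is an error, reported by throwing std::out_of_range.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

inline void replaceExtentDemo() {
  std::string s("0123456789");
  s.replace(2, 3, "ab");        // Shorter replacement: the string shrinks
  assert(s == "01ab56789");
  s.replace(2, 2, "WXYZ");      // Longer replacement: the string grows
  assert(s == "01WXYZ56789");
  s.replace(9, 100, "!");       // Count is clamped at the end of s
  assert(s == "01WXYZ567!");
  s.replace(s.size(), 0, "?");  // pos == size(): behaves like an append
  assert(s == "01WXYZ567!?");
  bool threw = false;
  try {
    s.replace(s.size() + 1, 0, "x");  // pos > size(): out_of_range
  } catch(const std::out_of_range&) {
    threw = true;
  }
  assert(threw);
}
```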
You may have been hunting through this chapter trying to do
something relatively simple such as replace all the instances of one character
with a different character. Upon finding the previous material on replacing,
you thought you found the answer, but then you started seeing groups of
characters and counts and other things that looked a bit too complex. Doesn't string
have a way to just replace one character with another everywhere?
You can easily write such a function using the find( )
and replace( ) member functions as follows:
#ifndef REPLACEALL_H
#define REPLACEALL_H
#include <string>

std::string& replaceAll(std::string& context,
  const std::string& from, const std::string& to);
#endif // REPLACEALL_H
#include <cstddef>
#include "ReplaceAll.h"
using namespace std;

string& replaceAll(string& context, const string& from,
  const string& to) {
  size_t lookHere = 0;
  size_t foundHere;
  while((foundHere = context.find(from, lookHere))
    != string::npos) {
    context.replace(foundHere, from.size(), to);
    lookHere = foundHere + to.size();
  }
  return context;
}
The version of find( ) used here takes as a
second argument the position to start looking in and returns string::npos
if it doesn't find it. It is important to advance the position held in the
variable lookHere past the replacement string, in case from is a
substring of to. The following program tests the replaceAll
function:
#include <cassert>
#include <iostream>
#include <string>
#include "ReplaceAll.h"
using namespace std;

int main() {
  string text = "a man, a plan, a canal, Panama";
  replaceAll(text, "an", "XXX");
  assert(text == "a mXXX, a plXXX, a cXXXal, PXXXama");
}
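To see why lookHere must skip the replacement text, consider replacing "a" with "aa": a search that resumed inside the new text would keep matching forever. The sketch below is a hypothetical variant, named replaceAllGuarded( ) here, that keeps the same loop and also returns early when from is empty, a case in which find( ) matches at every position. It is an illustration under those assumptions, not the book's function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of replaceAll( ): same loop, plus an early
// return when from is empty.
inline std::string& replaceAllGuarded(std::string& context,
    const std::string& from, const std::string& to) {
  if(from.empty())
    return context;
  std::size_t lookHere = 0;
  std::size_t foundHere;
  while((foundHere = context.find(from, lookHere))
        != std::string::npos) {
    context.replace(foundHere, from.size(), to);
    lookHere = foundHere + to.size();  // Skip the text just inserted
  }
  return context;
}

inline void replaceAllGuardedDemo() {
  std::string s("a-a");
  // "a" is a substring of "aa"; skipping the replacement keeps this finite:
  replaceAllGuarded(s, "a", "aa");
  assert(s == "aa-aa");
  replaceAllGuarded(s, "", "x");  // No-op rather than an endless loop
  assert(s == "aa-aa");
}
```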
As you can see, the string class by itself doesn't solve all possible problems.
Many solutions have been left to the algorithms in the Standard library (34)
because the string class can look just like an STL sequence (by virtue of the
iterators discussed earlier). All the generic algorithms work on a “range” of
elements within a container. Usually that range is just “from the beginning of
the container to the end.” A string object looks like a container of
characters: to get the beginning of the range you use string::begin( ), and to
get the end of the range you use string::end( ). The following example shows
the use of the replace( ) algorithm to replace all the instances of the single
character 'X' with 'Y':
#include <algorithm>
#include <cassert>
#include <string>
using namespace std;

int main() {
  string s("aaaXaaaXXaaXXXaXXXXaaa");
  replace(s.begin(), s.end(), 'X', 'Y');
  assert(s == "aaaYaaaYYaaYYYaYYYYaaa");
}
Notice that this replace( ) is not called
as a member function of string. Also, unlike the string::replace( )
functions that only perform one replacement, the replace( )
algorithm replaces all instances of one character with another.
The replace( ) algorithm only works with single
objects (in this case, char objects) and will not replace quoted char
arrays or string objects. Since a string behaves like an STL
sequence, a number of other algorithms can be applied to it, which might solve
other problems that are not directly addressed by the string member
functions.
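As a brief sketch of other algorithms applied to a string, the following uses std::count, std::reverse, and std::find from <algorithm>. The helper names countChar( ), reversed( ), and algorithmDemo( ) are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of occurrences of ch in s.
inline std::size_t countChar(const std::string& s, char ch) {
  return static_cast<std::size_t>(std::count(s.begin(), s.end(), ch));
}

// Hypothetical helper: reversed copy of s.
inline std::string reversed(std::string s) {
  std::reverse(s.begin(), s.end());
  return s;
}

inline void algorithmDemo() {
  const std::string s("aaaXaaaXXaaXXXaXXXXaaa");
  assert(countChar(s, 'X') == 10);
  assert(countChar(s, 'a') == 12);
  assert(reversed("abc") == "cba");
  // std::find returns an iterator rather than an index:
  std::string::const_iterator it = std::find(s.begin(), s.end(), 'X');
  assert(it - s.begin() == 3);
}
```

The algorithms report positions as iterators, whereas the string member functions report them as indexes; subtracting begin( ) converts one to the other.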
2.1.3.3. Concatenation using nonmember overloaded operators
One of the most delightful discoveries awaiting a C
programmer learning about C++ string handling is how simply strings
can be combined and appended using operator+ and operator+=. These
operators make combining strings syntactically similar to adding numeric
data:
#include <string>
#include <cassert>
using namespace std;

int main() {
  string s1("This ");
  string s2("That ");
  string s3("The other ");
  // operator+ concatenates strings:
  s1 = s1 + s2;
  assert(s1 == "This That ");
  // Another way to concatenate strings:
  s1 += s3;
  assert(s1 == "This That The other ");
  // You can index the string on the right:
  s1 += s3 + s3[4] + "ooh lala";
  assert(s1 == "This That The other The other oooh lala");
}
Using operator+ and operator+= is a flexible and convenient way to combine
string data. On the right side of the statement, you can use almost any type
that evaluates to a group of one or more characters.
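One limit on "almost any type" is worth a sketch: at least one operand of each operator+ must be a string object, so two quoted literals cannot be added directly. The names joinWith( ) and concatDemo( ) below are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins two words with one separator character.
inline std::string joinWith(const std::string& left, char sep,
                            const std::string& right) {
  return left + sep + right;
}

inline void concatDemo() {
  std::string name("Elvis");
  // string + literal + char: each step has a string on one side
  std::string s = name + " in a UFO" + '!';
  assert(s == "Elvis in a UFO!");
  // A literal on the left is fine when the other operand is a string:
  std::string t = "I saw " + name;
  assert(t == "I saw Elvis");
  // "I saw " + "Elvis" would not compile: neither operand is a string.
  // Converting one literal first makes the expression valid:
  std::string u = std::string("I saw ") + "Elvis";
  assert(u == t);
  assert(joinWith("wide", '-', "angle") == "wide-angle");
}
```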
2.1.4. Searching in strings
The find family of string member functions
locates a character or group of characters within a given string. Here are the
members of the find family and their general usage:
| string find member function | What/how it finds |
|---|---|
| find( ) | Searches a string for a specified character or group of characters and returns the starting position of the first occurrence found or npos if no match is found. |
| find_first_of( ) | Searches a target string and returns the position of the first match of any character in a specified group. If no match is found, it returns npos. |
| find_last_of( ) | Searches a target string and returns the position of the last match of any character in a specified group. If no match is found, it returns npos. |
| find_first_not_of( ) | Searches a target string and returns the position of the first element that doesn't match any character in a specified group. If no such element is found, it returns npos. |
| find_last_not_of( ) | Searches a target string and returns the position of the element with the largest subscript that doesn't match any character in a specified group. If no such element is found, it returns npos. |
| rfind( ) | Searches a string from end to beginning for a specified character or group of characters and returns the starting position of the match if one is found. If no match is found, it returns npos. |
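The differences among the members of the find family are easiest to see when all of them are applied to the same string. The following is a small sketch; the names containsAny( ) and findFamilyDemo( ) and the sample text are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s holds at least one character from set.
inline bool containsAny(const std::string& s, const std::string& set) {
  return s.find_first_of(set) != std::string::npos;
}

inline void findFamilyDemo() {
  const std::string s("key = value = 42");
  const std::string digits("0123456789");
  assert(s.find('=') == 4);                  // First '='
  assert(s.rfind('=') == 12);                // Last '='
  assert(s.find("value") == 6);              // Substring match
  assert(s.find_first_of(digits) == 14);     // First digit
  assert(s.find_last_of(digits) == 15);      // Last digit
  assert(s.find_first_not_of("key") == 3);   // First char not k, e, or y
  assert(s.find_last_not_of(digits) == 13);  // Last non-digit
  assert(s.find('#') == std::string::npos);  // No match
}
```

Note that find( ) treats its argument as a whole substring, while the "of" functions treat it as a set of individual characters.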
The simplest use of find( )
searches for one or more characters in a string. This overloaded
version of find( ) takes a parameter that specifies the
character(s) for which to search and optionally a parameter that tells it where
in the string to begin searching for the occurrence of a substring. (The default
position at which to begin searching is 0.) By setting the call to find inside
a loop, you can easily move through a string, repeating a search to find all
the occurrences of a given character or group of characters within the string.
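The loop described above can be sketched as a function that collects every match position. The names findAll( ) and findAllDemo( ) are hypothetical. This version resumes one position past the start of each match, so it also reports overlapping occurrences; resuming at pos + what.size( ) instead would skip them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: starting positions of every (possibly
// overlapping) occurrence of what in s.
inline std::vector<std::size_t> findAll(const std::string& s,
                                        const std::string& what) {
  std::vector<std::size_t> positions;
  if(what.empty())
    return positions;
  std::size_t pos = s.find(what);
  while(pos != std::string::npos) {
    positions.push_back(pos);
    pos = s.find(what, pos + 1);  // Resume just past the last match start
  }
  return positions;
}

inline void findAllDemo() {
  std::vector<std::size_t> p =
    findAll("a man, a plan, a canal, Panama", "an");
  assert(p.size() == 4);
  assert(p[0] == 3 && p[1] == 11 && p[2] == 18 && p[3] == 25);
  assert(findAll("aaa", "aa").size() == 2);  // Overlapping matches
}
```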
The following program uses the Sieve of Eratosthenes to find prime numbers
less than 50. This method starts with
the number 2, marks all subsequent multiples of 2 as not prime, and repeats the
process for the next prime candidate. The SieveTest constructor
initializes sieveChars by setting the initial size of the character
array and writing the value 'P' to each of its members.
#ifndef SIEVE_H
#define SIEVE_H
#include <cmath>
#include <cstddef>
#include <string>
#include "../TestSuite/Test.h"
using std::size_t;
using std::sqrt;
using std::string;

class SieveTest : public TestSuite::Test {
  string sieveChars;
public:
  // Create a 50 char string and set each
  // element to 'P' for Prime:
  SieveTest() : sieveChars(50, 'P') {}
  void run() {
    findPrimes();
    testPrimes();
  }
  bool isPrime(int p) {
    if(p == 0 || p == 1) return false;
    int root = int(sqrt(double(p)));
    for(int i = 2; i <= root; ++i)
      if(p % i == 0) return false;
    return true;
  }
  void findPrimes() {
    // By definition neither 0 nor 1 is prime.
    // Change these elements to "N" for Not Prime:
    sieveChars.replace(0, 2, "NN");
    // Walk through the array:
    size_t sieveSize = sieveChars.size();
    int root = int(sqrt(double(sieveSize)));
    for(int i = 2; i <= root; ++i)
      // Mark all the multiples of i:
      for(size_t factor = 2; factor * i < sieveSize; ++factor)
        sieveChars[factor * i] = 'N';
  }
  void testPrimes() {
    size_t i = sieveChars.find('P');
    while(i != string::npos) {
      test_(isPrime(i++));
      i = sieveChars.find('P', i);
    }
    i = sieveChars.find_first_not_of('P');
    while(i != string::npos) {
      test_(!isPrime(i++));
      i = sieveChars.find_first_not_of('P', i);
    }
  }
};
#endif // SIEVE_H
#include "Sieve.h"

int main() {
  SieveTest t;
  t.run();
  return t.report();
}
The find( ) function can walk forward through a string,
detecting multiple occurrences of a character or a group of characters, and find_first_not_of( )
finds other characters or substrings.
There are no functions in the string class to change
the case of a string, but you can easily create these functions using the
Standard C library functions toupper( ) and tolower( ),
which change the case of one character at a time. The following example
illustrates a case-insensitive search:
#ifndef FIND_H
#define FIND_H
#include <cctype>
#include <cstddef>
#include <string>
#include "../TestSuite/Test.h"
using std::size_t;
using std::string;
using std::tolower;
using std::toupper;

// Make an uppercase copy of s.
// The cast keeps negative char values out of toupper( ):
inline string upperCase(const string& s) {
  string upper(s);
  for(size_t i = 0; i < s.length(); ++i)
    upper[i] = toupper(static_cast<unsigned char>(upper[i]));
  return upper;
}

// Make a lowercase copy of s:
inline string lowerCase(const string& s) {
  string lower(s);
  for(size_t i = 0; i < s.length(); ++i)
    lower[i] = tolower(static_cast<unsigned char>(lower[i]));
  return lower;
}

class FindTest : public TestSuite::Test {
  string chooseOne;
public:
  FindTest() : chooseOne("Eenie, Meenie, Miney, Mo") {}
  void testUpper() {
    string upper = upperCase(chooseOne);
    const string LOWER = "abcdefghijklmnopqrstuvwxyz";
    test_(upper.find_first_of(LOWER) == string::npos);
  }
  void testLower() {
    string lower = lowerCase(chooseOne);
    const string UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    test_(lower.find_first_of(UPPER) == string::npos);
  }
  void testSearch() {
    // Case sensitive search
    size_t i = chooseOne.find("een");
    test_(i == 8);
    // Search lowercase:
    string test = lowerCase(chooseOne);
    i = test.find("een");
    test_(i == 0);
    i = test.find("een", ++i);
    test_(i == 8);
    i = test.find("een", ++i);
    test_(i == string::npos);
    // Search uppercase:
    test = upperCase(chooseOne);
    i = test.find("EEN");
    test_(i == 0);
    i = test.find("EEN", ++i);
    test_(i == 8);
    i = test.find("EEN", ++i);
    test_(i == string::npos);
  }
  void run() {
    testUpper();
    testLower();
    testSearch();
  }
};
#endif // FIND_H
#include "Find.h"
#include "../TestSuite/Test.h"

int main() {
  FindTest t;
  t.run();
  return t.report();
}
Both the upperCase( ) and lowerCase( )
functions follow the same form: they make a copy of the argument string
and change the case. The Find.cpp program isn't the best solution to the
case-sensitivity problem, so we'll revisit it when we examine string
comparisons.
2.1.4.1. Finding in reverse
If you need to search through a string from end to
beginning (to find the data in “last in / first out” order), you can use the
string member function rfind( ):
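#ifndef RPARSE_H
#define RPARSE_H
#include <cstddef>
#include <string>
#include <vector>
#include "../TestSuite/Test.h"
using std::size_t;
using std::string;
using std::vector;

class RparseTest : public TestSuite::Test {
  // To store the words:
  vector<string> strings;
public:
  void parseForData() {
    // The ';' characters will be delimiters
    string s("now.;sense;make;to;going;is;This");
    // One past the last element of the string:
    int last = s.size();
    // The beginning of the current word:
    size_t current = s.rfind(';');
    // Walk backward through the string:
    while(current != string::npos) {
      // Push each word into the vector. current is incremented
      // before copying to avoid copying the delimiter:
      ++current;
      strings.push_back(s.substr(current, last - current));
      // Back over the delimiter we just found,
      // and set last to the end of the next word:
      current -= 2;
      last = current + 1;
      // Find the next delimiter:
      current = s.rfind(';', current);
    }
    // Pick up the first word; it's not preceded by a delimiter:
    strings.push_back(s.substr(0, last));
  }
  void testData() {
    // Test them in the new order:
    test_(strings[0] == "This");
    test_(strings[1] == "is");
    test_(strings[2] == "going");
    test_(strings[3] == "to");
    test_(strings[4] == "make");
    test_(strings[5] == "sense");
    test_(strings[6] == "now.");
    string sentence;
    for(size_t i = 0; i < strings.size() - 1; i++)
      sentence += strings[i] += " ";
    // Manually put the last word in to avoid an extra space:
    sentence += strings[strings.size() - 1];
    test_(sentence == "This is going to make sense now.");
  }
  void run() {
    parseForData();
    testData();
  }
};
#endif // RPARSE_H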
#include "Rparse.h"

int main() {
  RparseTest t;
  t.run();
  return t.report();
}
The string member function rfind( ) backs
through the string looking for tokens and reports the array index of matching
characters or string::npos if it is unsuccessful.
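A common small use of rfind( ) is picking off whatever follows the last delimiter. The sketch below assumes a single-character delimiter; lastField( ) and rfindDemo( ) are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: text after the last occurrence of delim,
// or the whole string when delim does not occur.
inline std::string lastField(const std::string& s, char delim) {
  std::size_t pos = s.rfind(delim);
  if(pos == std::string::npos)
    return s;
  return s.substr(pos + 1);
}

inline void rfindDemo() {
  const std::string s("now.;sense;make;to;going;is;This");
  assert(s.rfind(';') == 27);
  assert(s.rfind(';', 26) == 24);  // Search backward starting at index 26
  assert(lastField(s, ';') == "This");
  assert(lastField("no delimiter", ';') == "no delimiter");
}
```

The second argument to rfind( ) is the position at which the backward search begins, which is how the parsing loop above steps from one delimiter to the previous one.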
2.1.4.2. Finding first/last of a set of characters
The find_first_not_of( ) and find_last_not_of( )
member functions can be conveniently put to work to create a little utility
that will strip whitespace characters from both ends of a string. Notice that
it doesn't touch the original string, but instead returns a new string:
#ifndef TRIM_H
#define TRIM_H
#include <string>
#include <cstddef>

inline std::string trim(const std::string& s) {
  if(s.length() == 0)
    return s;
  std::size_t beg = s.find_first_not_of(" \a\b\f\n\r\t\v");
  std::size_t end = s.find_last_not_of(" \a\b\f\n\r\t\v");
  if(beg == std::string::npos) // No non-whitespace characters
    return "";
  return std::string(s, beg, end - beg + 1);
}
#endif // TRIM_H
The first test checks for an empty string; in that
case, no tests are made, and a copy is returned. Notice that once the end
points are found, the string constructor builds a new string from
the old one, giving the starting count and the length.
Testing such a general-purpose tool needs to be thorough:
#ifndef TRIMTEST_H
#define TRIMTEST_H
#include "Trim.h"
#include "../TestSuite/Test.h"

class TrimTest : public TestSuite::Test {
  enum {NTESTS = 11};
  static std::string s[NTESTS];
public:
  void testTrim() {
    test_(trim(s[0]) == "abcdefghijklmnop");
    test_(trim(s[1]) == "abcdefghijklmnop");
    test_(trim(s[2]) == "abcdefghijklmnop");
    test_(trim(s[3]) == "a");
    test_(trim(s[4]) == "ab");
    test_(trim(s[5]) == "abc");
    test_(trim(s[6]) == "a b c");
    test_(trim(s[7]) == "a b c");
    test_(trim(s[8]) == "a \t b \t c");
    test_(trim(s[9]) == "");
    test_(trim(s[10]) == "");
  }
  void run() {
    testTrim();
  }
};
#endif // TRIMTEST_H
#include "TrimTest.h"

// Initialize static data
std::string TrimTest::s[TrimTest::NTESTS] = {
  " \t abcdefghijklmnop \t ",
  "abcdefghijklmnop \t ",
  " \t abcdefghijklmnop",
  "a", "ab", "abc",
  "a b c",
  " \t a b c \t ", " \t a \t b \t c \t ",
  "\t \n \r \v \f",
  "" // Must also test the empty string
};
#include "TrimTest.h"

int main() {
  TrimTest t;
  t.run();
  return t.report();
}
In the array of strings, you can see that the
character arrays are automatically converted to string objects. This
array provides cases to check the removal of spaces and tabs from both ends, as
well as ensuring that spaces and tabs are not removed from the middle of a string.
2.1.4.3. Removing characters from strings
Removing characters is easy and efficient with the erase( ) member function, which takes two arguments: where to start
removing characters (which defaults to 0), and how many to remove (which
defaults to string::npos). If you specify more characters than remain in
the string, the remaining characters are all erased anyway (so calling erase( )
without any arguments removes all characters from a string). Sometimes it's
useful to take an HTML file and strip its tags and special characters so that
you have something approximating the text that would be displayed in the Web
browser, only as a plain text file. The following example uses erase( )
to do the job:
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include "ReplaceAll.h"
#include "../require.h"
using namespace std;
string& stripHTMLTags(string& s) {
static bool inTag = false;
bool done = false;
while(!done) {
if(inTag) {
size_t rightPos = s.find('>');
if(rightPos != string::npos) {
inTag = false;
s.erase(0, rightPos + 1);
}
else {
done = true;
s.erase();
}
}
else {
size_t leftPos = s.find('<');
if(leftPos != string::npos) {
size_t rightPos = s.find('>');
if(rightPos == string::npos) {
inTag = done = true;
s.erase(leftPos);
}
else
s.erase(leftPos, rightPos - leftPos + 1);
}
else
done = true;
}
}
replaceAll(s, "&lt;", "<");
replaceAll(s, "&gt;", ">");
replaceAll(s, "&amp;", "&");
replaceAll(s, "&nbsp;", " ");
return s;
}
int main(int argc, char* argv[]) {
requireArgs(argc, 1,
"usage: HTMLStripper InputFile");
ifstream in(argv[1]);
assure(in, argv[1]);
string s;
while(getline(in, s))
if(!stripHTMLTags(s).empty())
cout << s << endl;
}
This example will even strip HTML tags that span multiple lines. (35) This is
accomplished with the static flag, inTag, which is true whenever the start of a
tag is found, but the accompanying tag end is not found in the same line. All
forms of erase( ) appear in the stripHTMLTags( ) function. (36) The version of
getline( ) we use here is a (global) function declared in the <string> header
and is handy because it stores an arbitrarily long line in its string argument.
You don't need to worry about the dimension of a character array as you do with
istream::getline( ). Notice that this program uses the replaceAll( ) function
from earlier in this chapter. In the next chapter, we'll use string streams to
create a more elegant solution.
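To see the default arguments of erase( ) in isolation, here is a minimal sketch. The helper names eraseRange( ), eraseTail( ), and eraseAll( ) are hypothetical and exist only for this illustration; each takes its string by value so that the caller's copy is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers, named for this illustration only.

// Erase `count` characters starting at `pos`. Asking for more characters
// than remain simply erases through the end of the string.
std::string eraseRange(std::string s, std::size_t pos, std::size_t count) {
  s.erase(pos, count);
  return s;
}

// Erase from `pos` to the end (count defaults to string::npos).
std::string eraseTail(std::string s, std::size_t pos) {
  s.erase(pos);
  return s;
}

// Erase everything (pos defaults to 0, count to string::npos).
std::string eraseAll(std::string s) {
  s.erase();
  return s;
}
```

Note that the starting position is still checked: a position greater than the string's size makes erase( ) throw out_of_range, whereas an oversized count is silently clamped.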
2.1.4.4. Comparing strings
Comparing strings is inherently different from comparing
numbers. Numbers have constant, universally meaningful values. To evaluate the
relationship between the magnitudes of two strings, you must make a lexical
comparison. Lexical comparison means that when you test a character to see
if it is “greater than” or “less than” another character, you are actually
comparing the numeric representation of those characters as specified in the
collating sequence of the character set being used. Most often this will be the
ASCII collating sequence, which assigns the printable characters for the
English language numbers in the range 32 through 126 decimal. In the ASCII
collating sequence, the first “character” in the list is the space, followed by
punctuation marks and digits, then uppercase letters, and then lowercase letters.
With respect to the alphabet, this means that the letters nearer the front have
lower ASCII values than those nearer the end. With these details in mind, it
becomes easier to remember that when a lexical comparison reports that s1
is “greater than” s2, it simply means that when the two were compared,
the first differing character in s1 came later in the collating sequence than the
character in that same position in s2.
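The following sketch makes the rule concrete. It assumes an ASCII-compatible execution character set, and the helper names firstDifference( ) and sortsBefore( ) are hypothetical, introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first position where the two strings
// differ, or the length of the shorter string if one is a prefix of the other.
std::size_t firstDifference(const std::string& a, const std::string& b) {
  std::size_t n = std::min(a.size(), b.size());
  for(std::size_t i = 0; i < n; ++i)
    if(a[i] != b[i])
      return i;
  return n;
}

// Hypothetical helper: true when `a` sorts before `b` under the default
// character-by-character ordering used by std::string.
bool sortsBefore(const std::string& a, const std::string& b) {
  return a < b;
}
```

With ASCII, sortsBefore("Zebra", "apple") is true because every uppercase letter has a lower code than every lowercase letter, which is rarely what a human reader means by alphabetical order.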
C++ provides several ways to compare strings, and each has
advantages. The simplest to use are the nonmember, overloaded operator
functions: operator ==, operator !=, operator >, operator
<, operator >=, and operator <=.
#ifndef COMPSTR_H
#define COMPSTR_H
#include <string>
#include "../TestSuite/Test.h"
using std::string;
class CompStrTest : public TestSuite::Test {
public:
void run() {
string s1("This");
string s2("That");
test_(s1 == s1);
test_(s1 != s2);
test_(s1 > s2);
test_(s1 >= s2);
test_(s1 >= s1);
test_(s2 < s1);
test_(s2 <= s1);
test_(s1 <= s1);
}
};
#endif
#include "CompStr.h"
int main() {
CompStrTest t;
t.run();
return t.report();
}
The overloaded comparison operators are useful for comparing
both full strings and individual string character elements.
Notice in the following example the flexibility of argument
types on both the left and right side of the comparison operators. For
efficiency, the string class provides overloaded operators for the
direct comparison of string objects, quoted literals, and pointers to C-style
strings without having to create temporary string objects.
#include <iostream>
#include <string>
using namespace std;
int main() {
string s2("That"), s1("This");
if("That" == s2)
cout << "A match" << endl;
if(s1 != s2.c_str())
cout << "No match" << endl;
}
The c_str( ) function returns a const char*
that points to a C-style, null-terminated string equivalent to the contents of
the string object. This comes in handy when you want to pass a string to
a standard C function, such as atoi( ) or any of the functions
defined in the <cstring> header. It is an error to use the value
returned by c_str( ) as a non-const argument to any function, because the
characters it points to must not be modified through that pointer.
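As a small sketch of handing a string to C library functions, the hypothetical wrappers toInt( ) and cLength( ) below pass the result of c_str( ) to atoi( ) and strlen( ). Both C functions take a const char*, so no cast is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical wrapper: parse the leading integer in s with the C library.
int toInt(const std::string& s) {
  return std::atoi(s.c_str());
}

// Hypothetical wrapper: length as seen by a C function, which stops at the
// first null character even if the string object holds more data.
std::size_t cLength(const std::string& s) {
  return std::strlen(s.c_str());
}
```

The second wrapper shows a subtlety: a string object may contain embedded null characters, so cLength( ) can report fewer characters than size( ) does.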
You won't find the logical not (!) or the logical
comparison operators (&& and ||) among operators for a
string. (Neither will you find overloaded versions of the bitwise C operators &,
|, ^, or ~.) The overloaded nonmember comparison operators
for the string class are limited to the subset that has clear, unambiguous
application to single characters or groups of characters.
The compare( ) member function offers more sophisticated and precise
comparison than the nonmember operator set. It provides overloaded versions
to compare:
- Two complete strings.
- Part of either string to a complete string.
- Subsets of two strings.
The following example compares complete strings:
#include <cassert>
#include <string>
using namespace std;
int main() {
string first("This");
string second("That");
assert(first.compare(first) == 0);
assert(second.compare(second) == 0);
assert(first.compare(second) > 0);
assert(second.compare(first) < 0);
first.swap(second);
assert(first.compare(second) < 0);
assert(second.compare(first) > 0);
}
The swap( ) function in this example does what
its name implies: it exchanges the contents of its object and argument. To
compare a subset of the characters in one or both strings, you add arguments
that define where to start the comparison and how many characters to consider.
For example, we can use the following overloaded version of compare( ):
s1.compare(s1StartPos, s1NumberChars, s2, s2StartPos,
s2NumberChars);
Here's an example:
#include <cassert>
#include <string>
using namespace std;
int main() {
string first("This is a day that will live in infamy");
string second("I don't believe that this is what "
"I signed up for");
assert(first.compare(1, 7, second, 22, 7) == 0);
assert(first.compare(1, 9, second, 22, 9) < 0);
}
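The remaining case from the list above, comparing part of one string to a complete string, uses the three-argument overload compare(pos, n, str). A minimal sketch follows; the helper names startsWith( ) and endsWith( ) are hypothetical and not part of the string class discussed here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: does s begin with prefix?
// Compares the first prefix.size() characters of s to all of prefix.
bool startsWith(const std::string& s, const std::string& prefix) {
  return s.size() >= prefix.size() &&
         s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: does s end with suffix?
// Compares the last suffix.size() characters of s to all of suffix.
bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

The size check comes first so that the starting position passed to compare( ) is never computed from a negative difference.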
In the examples so far, we have used C-style array indexing
syntax to refer to an individual character in a string. C++ strings provide an
alternative to the s[n] notation: the at( ) member. These two indexing mechanisms produce the same result in C++ if all goes well:
#include <cassert>
#include <string>
using namespace std;
int main() {
string s("1234");
assert(s[1] == '2');
assert(s.at(1) == '2');
}
There is one important difference, however, between [ ]
and at( ). When you try to reference an array element that is out
of bounds, at( ) will do you the kindness of throwing an exception,
while ordinary [ ] subscripting syntax will leave you to your own
devices:
#include <exception>
#include <iostream>
#include <string>
using namespace std;
int main() {
string s("1234");
try {
s.at(5);
} catch(exception& e) {
cerr << e.what() << endl;
}
}
Responsible programmers will not use errant indexes, but
should you want the benefits of automatic index checking, using at( ) in
place of [ ] will give you a chance to gracefully recover from
references to array elements that don't exist. Executing this program prints
the message carried by the exception; the exact text returned by what( ) is
implementation-defined, so it varies from one compiler to another.
The at( ) member throws an object of class out_of_range, which derives
(ultimately) from std::exception. By catching this object in an exception
handler, you can take appropriate remedial actions such as recalculating the
offending subscript or growing the array. Using string::operator[ ]( ) gives no
such protection and is as dangerous as char array processing in C. (37)
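One possible remedial action is to substitute a default value. The sketch below catches out_of_range specifically (declared in <stdexcept>); the helper name charOr( ) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the character at index i, or `fallback`
// when i is not a valid index into s.
char charOr(const std::string& s, std::size_t i, char fallback) {
  try {
    return s.at(i);      // Throws std::out_of_range if i >= s.size()
  } catch(const std::out_of_range&) {
    return fallback;     // Recover instead of reading past the end
  }
}
```

Note that an index equal to the string's length is already out of range for at( ).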
2.1.4.5. Strings and character
traits
The program Find.cpp earlier in this chapter leads us
to ask the obvious question: Why isn't case-insensitive comparison part of the
standard string class? The answer provides interesting background on the
true nature of C++ string objects.
Consider what it means for a character to have “case.”
Written Hebrew, Farsi, and Kanji don't use the concept of upper- and lowercase,
so for those languages this idea has no meaning. It would seem that if there
were a way to designate some languages as “all uppercase” or “all lowercase,”
we could design a generalized solution. However, some languages that employ the
concept of “case” also change the meaning of particular characters with
diacritical marks, for example: the cedilla in Spanish, the circumflex in
French, and the umlaut in German. For this reason, any case-sensitive collating
scheme that attempts to be comprehensive will be nightmarishly complex to use.
Although we usually treat the C++ string as a class, this is really not the
case. The string type is a specialization of a more general constituent, the
basic_string< > template. Observe how string is declared in the Standard C++
header file: (38)
typedef basic_string<char> string;
To understand the nature of the string class, look at the basic_string< >
template:
template<class charT, class traits = char_traits<charT>,
class allocator = allocator<charT> > class basic_string;
In Chapter 5, we examine templates in great detail (much
more than in Chapter 16 of Volume 1). For now, just notice that the string
type is created when the basic_string template is instantiated with char.
Inside the basic_string< > template declaration, the
line:
class traits = char_traits<charT>,
tells us that the behavior of the class made from the basic_string< >
template is specified by a class based on the template char_traits< >.
Thus, the basic_string< > template produces
string-oriented classes that manipulate types other than char (wide
characters, for example). To do this, the char_traits< > template
controls the content and collating behaviors of a variety of character sets
using the character comparison functions eq( ) (equal) and lt( ) (less than).
The basic_string< > string comparison functions rely on these. (The example
that follows also defines ne( ), not equal, although the standard
char_traits< > interface does not include it.)
This is why the string class doesn't include
case-insensitive member functions: that's not in its job description. To change
the way the string class treats character comparison, you must supply a
different char_traits< > template because that defines
the behavior of the individual character comparison member functions.
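To see what the default traits do before replacing them, the following sketch calls the static members of std::char_traits<char> directly. The wrapper names sameChar( ), charLess( ), and compareN( ) are hypothetical, and the expected ordering assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef std::char_traits<char> CT;  // The traits class behind std::string

// Hypothetical wrappers around the static traits members.
bool sameChar(char a, char b) { return CT::eq(a, b); }
bool charLess(char a, char b) { return CT::lt(a, b); }

// Compare the first n characters of two arrays; the result is negative,
// zero, or positive, just as with string::compare( ).
int compareN(const char* a, const char* b, std::size_t n) {
  return CT::compare(a, b, n);
}
```

Because eq( ) treats 'a' and 'A' as different characters, every string operation built on these traits is case sensitive.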
You can use this information to make a new type of string
class that ignores case. First, we'll define a new case-insensitive char_traits< >
template that inherits from the existing template. Next, we'll override only
the members we need to change to make character-by-character comparison case
insensitive. (In addition to the three lexical character comparison members
mentioned earlier, we'll also supply a new implementation for the char_traits
functions find( ) and compare( ).) Finally, we'll typedef
a new class based on basic_string, but using the case-insensitive ichar_traits
class for its second argument:
#ifndef ICHAR_TRAITS_H
#define ICHAR_TRAITS_H
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <string>
using std::allocator;
using std::basic_string;
using std::char_traits;
using std::ostream;
using std::size_t;
using std::string;
using std::toupper;
using std::tolower;
struct ichar_traits : char_traits<char> {
static bool eq(char c1st, char c2nd) {
return toupper(c1st) == toupper(c2nd);
}
static bool ne(char c1st, char c2nd) {
return !eq(c1st, c2nd);
}
static bool lt(char c1st, char c2nd) {
return toupper(c1st) < toupper(c2nd);
}
static int
compare(const char* str1, const char* str2, size_t n)
{
for(size_t i = 0; i < n; ++i) {
if(str1 == 0)
return -1;
else if(str2 == 0)
return 1;
else if(tolower(*str1) < tolower(*str2))
return -1;
else if(tolower(*str1) > tolower(*str2))
return 1;
assert(tolower(*str1) == tolower(*str2));
++str1; ++str2;
}
return 0;
}
static const char*
find(const char* s1, size_t n, char c) {
while(n-- > 0)
if(toupper(*s1) == toupper(c))
return s1;
else
++s1;
return 0;
}
};
typedef basic_string<char, ichar_traits> istring;
inline ostream& operator<<(ostream& os,
const istring& s) {
return os << string(s.c_str(), s.length());
}
#endif
We provide a typedef named istring so that our
class will act like an ordinary string in every way, except that it will
make all comparisons without respect to case. For convenience, we've also
provided an overloaded operator<<( ) so that you can print istrings.
Here's an example:
#include <cassert>
#include <iostream>
#include "ichar_traits.h"
using namespace std;
int main() {
istring first = "tHis";
istring second = "ThIS";
cout << first << endl;
cout << second << endl;
assert(first.compare(second) == 0);
assert(first.find('h') == 1);
assert(first.find('I') == 2);
assert(first.find('x') == string::npos);
}
This is just a toy example. To make istring fully
equivalent to string, we'd have to create the other functions necessary
to support the new istring type.
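One example of such supporting functions is conversion: a basic_string with different traits is an unrelated type, so it cannot be assigned to or compared with an ordinary string directly. The sketch below is self-contained and therefore repeats a compact traits class; the names ci_traits, ci_string, toStd( ), toCi( ), and equalIgnoringCase( ) are all hypothetical, and the code is an illustration under those assumptions rather than a complete solution.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compact case-insensitive traits, hypothetical and for illustration only.
struct ci_traits : std::char_traits<char> {
  static char up(char c) {
    // Convert through unsigned char so toupper never sees a negative value
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  static bool eq(char a, char b) { return up(a) == up(b); }
  static bool lt(char a, char b) { return up(a) < up(b); }
  static int compare(const char* a, const char* b, std::size_t n) {
    for(std::size_t i = 0; i < n; ++i) {
      if(lt(a[i], b[i])) return -1;
      if(lt(b[i], a[i])) return 1;
    }
    return 0;
  }
  static const char* find(const char* s, std::size_t n, char c) {
    for(; n > 0; --n, ++s)
      if(eq(*s, c)) return s;
    return 0;
  }
};
typedef std::basic_string<char, ci_traits> ci_string;

// The two string types share a character type, so the raw data can be copied.
std::string toStd(const ci_string& s) {
  return std::string(s.data(), s.size());
}
ci_string toCi(const std::string& s) {
  return ci_string(s.data(), s.size());
}

// Compare two ordinary strings without regard to case.
bool equalIgnoringCase(const std::string& a, const std::string& b) {
  return toCi(a) == toCi(b);
}
```

Converting copies the characters unchanged; only the comparison and search behavior differs between the two types.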
The <string> header provides a wide string
class via the following typedef:
typedef basic_string<wchar_t> wstring;
Wide string support also reveals itself in wide streams
(wostream in place of ostream, also defined in <iostream>)
and in the header <cwctype>, a wide-character version of <cctype>.
This along with the wchar_t specialization of char_traits in the
standard library allows us to do a wide-character version of ichar_traits:
#ifndef IWCHAR_TRAITS_H
#define IWCHAR_TRAITS_H
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cwctype>
#include <ostream>
#include <string>
using std::allocator;
using std::basic_string;
using std::char_traits;
using std::size_t;
using std::towlower;
using std::towupper;
using std::wostream;
using std::wstring;
struct iwchar_traits : char_traits<wchar_t> {
static bool eq(wchar_t c1st, wchar_t c2nd) {
return towupper(c1st) == towupper(c2nd);
}
static bool ne(wchar_t c1st, wchar_t c2nd) {
return towupper(c1st) != towupper(c2nd);
}
static bool lt(wchar_t c1st, wchar_t c2nd) {
return towupper(c1st) < towupper(c2nd);
}
static int compare(
const wchar_t* str1, const wchar_t* str2, size_t n)
{
for(size_t i = 0; i < n; i++) {
if(str1 == 0)
return -1;
else if(str2 == 0)
return 1;
else if(towlower(*str1) < towlower(*str2))
return -1;
else if(towlower(*str1) > towlower(*str2))
return 1;
assert(towlower(*str1) == towlower(*str2));
++str1; ++str2;
}
return 0;
}
static const wchar_t*
find(const wchar_t* s1, size_t n, wchar_t c) {
while(n-- > 0)
if(towupper(*s1) == towupper(c))
return s1;
else
++s1;
return 0;
}
};
typedef basic_string<wchar_t, iwchar_traits> iwstring;
inline wostream& operator<<(wostream& os,
const iwstring& s) {
return os << wstring(s.c_str(), s.length());
}
#endif
As you can see, this is mostly an exercise in placing a 'w'
in the appropriate place in the source code. The test program looks like this:
#include <cassert>
#include <iostream>
#include "iwchar_traits.h"
using namespace std;
int main() {
iwstring wfirst = L"tHis";
iwstring wsecond = L"ThIS";
wcout << wfirst << endl;
wcout << wsecond << endl;
assert(wfirst.compare(wsecond) == 0);
assert(wfirst.find('h') == 1);
assert(wfirst.find('I') == 2);
assert(wfirst.find('x') == wstring::npos);
}
Unfortunately, some compilers still do not provide robust
support for wide characters.
2.1.5. A string application
If you've looked at the sample code
in this book closely, you've noticed that certain tokens in the comments
surround the code. These are used by a Python program that Bruce wrote to
extract the code into files and set up makefiles for building the code. For
example, a double-slash followed by a colon at the beginning of a line denotes
the first line of a source file. The rest of the line contains information
describing the file's name and location and whether it should be only compiled
rather than fully built into an executable file. For example, the first line in
the previous program contains the string C03:IWCompare.cpp,
indicating that the file IWCompare.cpp should be extracted into the
directory C03.
The last line of a source file contains a triple-slash
followed by a colon and a tilde. If the first line has an exclamation point
immediately after the colon, the first and last lines of the source code are
not to be output to the file (this is for data-only files). (If you're
wondering why we're avoiding showing you these tokens, it's because we don't
want to break the code extractor when applied to the text of the book!)
Bruce's Python program does a lot more than just extract
code. If the token “{O}” follows the file name, its makefile entry will
only be set up to compile the file and not to link it into an executable. (The
Test Framework in Chapter 2 is built this way.) To link such a file with
another source example, the target executable's source file will contain an “{L}”
directive, as in:
This section will present a program to just extract all the
code so that you can compile and inspect it manually. You can use this program
to extract all the code in this book by saving the document file as a text file
(39) (let's call it TICV2.txt) and by executing something like the following on
a shell command line:
C:> extractCode TICV2.txt /TheCode
This command reads the text file TICV2.txt and writes
all the source code files in subdirectories under the top-level directory /TheCode.
The directory tree will look like the following:
TheCode/
C0B/
C01/
C02/
C03/
C04/
C05/
C06/
C07/
C08/
C09/
C10/
C11/
TestSuite/
The source files containing the examples from each chapter
will be in the corresponding directory.
Here's the program:
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
#if defined(__GNUC__) || defined(__MWERKS__)
#include <sys/stat.h>
#elif defined(__BORLANDC__) || defined(_MSC_VER) \
|| defined(__DMC__)
#include <direct.h>
#else
#error Compiler not supported
#endif
bool exists(string fname) {
size_t len = fname.length();
if(fname[len-1] != '/' && fname[len-1] != '\\')
fname.append("/");
fname.append("000.tmp");
ofstream outf(fname.c_str());
// Ask the stream directly; it does not convert to bool implicitly in C++11
bool existFlag = outf.is_open();
if(outf) {
outf.close();
remove(fname.c_str());
}
return existFlag;
}
int main(int argc, char* argv[]) {
if(argc == 1) {
cerr << "usage: extractCode file [dir]" << endl;
exit(EXIT_FAILURE);
}
ifstream inf(argv[1]);
if(!inf) {
cerr << "error opening file: "
<< argv[1] << endl;
exit(EXIT_FAILURE);
}
string root("./");
if(argc == 3) {
root = argv[2];
if(!exists(root)) {
cerr << "no such directory: "
<< root << endl;
exit(EXIT_FAILURE);
}
size_t rootLen = root.length();
if(root[rootLen-1] != '/' &&
root[rootLen-1] != '\\')
root.append("/");
}
string line;
bool inCode = false;
bool printDelims = true;
ofstream outf;
while(getline(inf, line)) {
size_t findDelim = line.find("//" "/:~");
if(findDelim != string::npos) {
if(!inCode) {
cerr << "Lines out of order"
<< endl;
exit(EXIT_FAILURE);
}
assert(outf);
if(printDelims)
outf << line << endl;
outf.close();
inCode = false;
printDelims = true;
} else {
findDelim = line.find("//" ":");
if(findDelim == 0) {
if(line[3] == '!') {
printDelims = false;
++findDelim;
}
size_t startOfSubdir =
line.find_first_not_of(" \t", findDelim+3);
findDelim = line.find(':', startOfSubdir);
if(findDelim == string::npos) {
cerr << "missing filename information\n" << endl;
exit(EXIT_FAILURE);
}
string subdir;
if(findDelim > startOfSubdir)
subdir = line.substr(startOfSubdir,
findDelim -
startOfSubdir);
size_t startOfFile = findDelim + 1;
size_t endOfFile =
line.find_first_of(" \t",
startOfFile);
if(endOfFile == startOfFile) {
cerr << "missing filename"
<< endl;
exit(EXIT_FAILURE);
}
string fullPath(root);
if(subdir.length() > 0)
fullPath.append(subdir).append("/");
assert(fullPath[fullPath.length()-1] == '/');
if(!exists(fullPath))
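#if defined(__GNUC__) || defined(__MWERKS__)
// A mode of 0 would create a directory nobody can write to;
// 0777 is still narrowed by the process umask
mkdir(fullPath.c_str(), 0777);
#else
mkdir(fullPath.c_str());
#endif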
fullPath.append(line.substr(startOfFile,
endOfFile - startOfFile));
outf.open(fullPath.c_str());
if(!outf) {
cerr << "error opening "
<< fullPath
<< " for output"
<< endl;
exit(EXIT_FAILURE);
}
inCode = true;
cout << "Processing " <<
fullPath << endl;
if(printDelims)
outf << line << endl;
}
else if(inCode) {
assert(outf);
outf << line << endl; // Output middle code line
}
}
}
exit(EXIT_SUCCESS);
}
First, you'll notice some conditional compilation directives. The mkdir( )
function, which creates a directory in the file system, is defined by the POSIX
(40) standard in the header <sys/stat.h>. Unfortunately, many compilers still
use a different header (<direct.h>). The respective signatures for mkdir( ) also
differ: POSIX specifies two arguments, the older versions just one. For this
reason, there is more conditional compilation later in the program to choose
the right call to mkdir( ). We normally don't use conditional compilation in
the examples in this book, but this particular program is too useful not to put
a little extra work into, since you can use it to extract all the code.
The exists( ) function in ExtractCode.cpp
tests whether a directory exists by opening a temporary file in it. If the open
fails, the directory doesn't exist. You remove a file by sending its name as a char*
to std::remove( ).
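If your compiler supports C++17, the standard <filesystem> library offers a portable alternative to both the temporary-file probe and the conditionally compiled mkdir( ) calls. This is a sketch of a different technique from the one used in ExtractCode.cpp, not a description of it; the helper names directoryExists( ) and ensureDirectory( ) are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;  // Requires C++17

// Hypothetical helper: true if path names an existing directory.
// The error_code overload reports failure without throwing.
bool directoryExists(const std::string& path) {
  std::error_code ec;
  return fs::is_directory(path, ec);
}

// Hypothetical helper: create path (and any missing parents) if needed,
// then report whether the directory is there.
bool ensureDirectory(const std::string& path) {
  std::error_code ec;
  fs::create_directories(path, ec);
  return !ec && fs::is_directory(path, ec);
}
```

With these two functions the program would need no compiler-specific headers, at the cost of requiring a newer language standard than the rest of the examples in this chapter.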
The main program validates the command-line arguments and
then reads the input file a line at a time, looking for the special source code
delimiters. The Boolean flag inCode indicates that the program is in the
middle of a source file, so lines should be output. The printDelims flag
will be true if the opening token is not followed by an exclamation point;
otherwise the first and last lines are not written. It is important to check
for the closing delimiter first, because the start token is a subset, and
searching for the start token first would return a successful find for both
cases. If we encounter the closing token, we verify that we are in the middle
of processing a source file; otherwise, something is wrong with the way the
delimiters are laid out in the text file. If inCode is true, all is
well, and we (optionally) write the last line and close the file. When the
opening token is found, we parse the directory and file name components and
open the file. The following string-related functions were used in this
example: length( ), append( ), getline( ), find( )
(two versions), find_first_not_of( ), substr( ), find_first_of( ),
c_str( ), and, of course, operator<<( ).
2.1.6. Summary
C++ string objects provide developers with a number
of great advantages over their C counterparts. For the most part, the string
class makes referring to strings with character pointers unnecessary. This
eliminates an entire class of software defects that arise from the use of
uninitialized and incorrectly valued pointers.
C++ strings dynamically and transparently grow their
internal data storage space to accommodate increases in the size of the string
data. When the data in a string grows beyond the limits of the memory initially
allocated to it, the string object will make the memory management calls that
take space from and return space to the heap. Consistent allocation schemes
prevent memory leaks and have the potential to be much more efficient than
“roll your own” memory management.
The string class member functions provide a fairly
comprehensive set of tools for creating, modifying, and searching in strings.
String comparisons are always case sensitive, but you can work around this by
copying string data to C-style null-terminated strings and using
case-insensitive string comparison functions, temporarily converting the data
held in string objects to a single case, or by creating a case-insensitive
string class that overrides the character traits used to create the basic_string
object.
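Of the workarounds just listed, converting to a single case is the shortest to write. Here is a minimal sketch, assuming single-byte characters and the current C locale; the helper names lowerChar( ), toLowerCopy( ), and equalNoCase( ) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: lowercase one character. The conversion through
// unsigned char keeps tolower( ) from receiving a negative value.
char lowerChar(char c) {
  return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: return a lowercase copy, leaving the original intact.
std::string toLowerCopy(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), lowerChar);
  return s;
}

// Hypothetical helper: case-insensitive equality on ordinary strings.
bool equalNoCase(const std::string& a, const std::string& b) {
  return toLowerCopy(a) == toLowerCopy(b);
}
```

This approach copies both strings on every comparison, so a traits-based string type may be the better choice when comparisons are frequent.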
2.1.7. Exercises
Solutions
to selected exercises can be found in the electronic document The Thinking
in C++ Volume 2 Annotated Solution Guide, available for a small fee from www.MindView.net.
- Write and test a function that reverses the order of the
characters in a string.
- A palindrome is a word or group of words that read the same
forward and backward. For example “madam” or “wow.” Write a program that takes
a string argument from the command line and, using the function from the
previous exercise, prints whether the string was a palindrome or not.
- Make your program from Exercise 2 return true even if
symmetric letters differ in case. For example, “Civic” would still return true
although the first letter is capitalized.
- Change your program from Exercise 3 to ignore punctuation and
spaces as well. For example “Able was I, ere I saw Elba.” would report true.
- Using the following string declarations and only chars (no
string literals or magic numbers):
string one("I walked down the canyon with the moving mountain bikers.");
string two("The bikers passed by me too close for comfort.");
string three("I went hiking instead.");
produce the following sentence:
I moved down the canyon with the mountain bikers. The mountain bikers passed by
me too close for comfort. So I went hiking instead.
- Write a program named replace that takes three
command-line arguments representing an input text file, a string to replace
(call it from), and a replacement string (call it to). The
program should write a new file to standard output with all occurrences of from
replaced by to.
- Repeat the previous exercise but replace all instances of from
regardless of case.
- Make your program from Exercise 3 take a filename from the command-line,
and then display all words that are palindromes (ignoring case) in the file. Do
not display duplicates (even if their case differs). Do not try to look for
palindromes that are larger than a word (unlike in Exercise 4).
- Modify HTMLStripper.cpp so that when it encounters a tag,
it displays the tag's name, then displays the file's contents between the tag
and the file's ending tag. Assume no nesting of tags, and that all tags have
ending tags (denoted with </TAGNAME>).
- Write a program that takes three command-line arguments (a
filename and two strings) and displays to the console all lines in the file
that have both strings in the line, either string, only one string, or neither
string, based on user input at the beginning of the program (the user will
choose which matching mode to use). For all but the “neither string” option,
highlight the input string(s) by placing an asterisk (*) at the beginning and
end of each string's occurrence when it is displayed.
- Write a program that takes two command-line arguments (a filename
and a string) and counts the number of times the string occurs in the file,
even as a substring (but ignoring overlaps). For example, an input string of “ba”
would match twice in the word “basketball,” but an input string of “ana” would
match only once in the word “banana.” Display to the console the number of
times the string is matched in the file, as well as the average length of the
words where the string occurred. (If the string occurs more than once in a
word, only count the word once in figuring the average.)
- Write a program that takes a filename from the command line and
profiles the character usage, including punctuation and spaces (all character
values of 0x21 [33] through 0x7E [126], as well as the space character). That is,
count the number of occurrences of each character in the file, then display the
results sorted either sequentially (space, then !, ", #, etc.) or by
ascending or descending frequency based on user input at the beginning of the
program. For space, display the word “Space” instead of the character ' '. A
sample run might look something like this:
Format sequentially, ascending, or descending
(S/A/D): D
t: 526
r: 490
etc.
- Using find( ) and rfind( ), write a
program that takes two command-line arguments (a filename and a string) and
displays the first and last words (and their indexes) not matching the string,
as well as the indexes of the first and last instances of the string. Display “Not
Found” if any of the searches fail.
- Using the find_first_of “family” of functions (but not
exclusively), write a program that will remove all non-alphanumeric characters
except spaces and periods from a file, then capitalize the first letter
following a period.
- Again using the find_first_of “family” of functions, write
a program that accepts a filename as a command-line argument and then formats
all numbers in the file to currency. Ignore decimal points after the first
until a non-numeric character is found, and round to the nearest hundredth. For
example, the string 12.399abc29.00.6a would be formatted (in the USA) to
$12.40abc$29.01a.
- Write a program that accepts two command-line arguments (a
filename and a number) and scrambles each word in the file by randomly
switching two of its letters the number of times specified in the second
argument. (That is, if 0 is passed into your program from the command-line, the
words should not be scrambled; if 1 is passed in, one pair of randomly-chosen
letters should be swapped, for an input of 2, two random pairs should be
swapped, etc.).
- Write a program that accepts a filename from the command line and
displays the number of sentences (defined as the number of periods in the
file), average number of characters per sentence, and the total number of
characters in the file.
- Prove to yourself that the at( ) member function
really will throw an exception if an attempt is made to go out of bounds, and
that the indexing operator ([ ]) won't.
(31) Some of the material in this chapter was originally created by Nancy
Nicolaisen.
(32) It's difficult to make reference-counting implementations thread safe.
(See Herb Sutter, More Exceptional C++, pp. 104–14). See Chapter 10 for more on
programming with multiple threads.
(33) It is an abbreviation for “no position,” and is the largest value that can
be represented by the string allocator's size_type (std::size_t by default).
(34) Discussed in depth in Chapter 6.
(35) To keep the exposition simple, this version does not handle nested tags,
such as comments.
(36) It is tempting to use mathematics here to factor out some of these calls
to erase( ), but since in some cases one of the operands is string::npos (the
largest unsigned integer available), integer overflow occurs and wrecks the
algorithm.
(37) For the safety reasons mentioned, the C++ Standards Committee is
considering a proposal to redefine string::operator[] to behave identically to
string::at( ) for C++0x.
(38) Your implementation can define all three template arguments here. Because
the last two template parameters have default arguments, such a declaration is
equivalent to what we show here.
(39) Beware that some versions of Microsoft Word erroneously replace single
quote characters with an extended ASCII character when you save a document as
text, which causes a compile error. We have no idea why this happens. Just
replace the character manually with an apostrophe.
(40) POSIX, an IEEE standard, stands for “Portable Operating System Interface”
and is a generalization of many of the low-level system calls found in UNIX
systems.


This document comes from http://www.developpez.com and remains the exclusive property of its author. Copying, modification, and/or distribution by any means whatsoever requires the prior authorization of the author.