Dr. Dobb's TechNetcast  





technetcast 980227
technetcast 980313

C++

with Nathan Myers, Member, ISO/ANSI C++ Standard committee since 1993. Among other contributions, Nathan designed the localization (chapter 22) components of the draft standard. His articles have appeared in the Dr. Dobb's Journal and the C++ Report.

In this two-part series, Nathan discusses the ANSI/ISO standardization process and describes some of the more recent additions to the draft standard: exceptions, namespaces, the Standard Template Library, localization and more. Part 1 is a general overview of these additions. Part 2 focuses more specifically on the STL, generic programming and RTTI.



part 1 transcript:

TNC: Welcome to the Dr. Dobb's TechNetCast here on the Pseudo Online Network. This week we're talking about C++ with Nathan Myers. Nathan has been a member on the ANSI/ISO C++ standards committee since 1993. He's also the creator and publisher of www.cantrip.org - his personal site and also a jump-site for other C++ resources on the web. It also features links to some of Nathan's articles from the Dr. Dobb's Journal. Nathan was brought up in Hawaii. Now, why would anyone leave Hawaii, Nathan?

NM: Tropical paradise is wasted on the young.

Standardization

TNC: Let's talk about standardization. You're part of the standardization committee. The entire standardization process is confusing to many people.

What is a proposal, what is a draft, what is a standard, and where does the C++ standard stand right now?

NM: Okay. I got on the committee when I was working at Rogue Wave Software, and after I left Rogue Wave I stayed on it. The C++ committee was kind of unusual because it's two committees: the American National Standards Institute (ANSI) committee and the International Organization for Standardization (ISO) committee, meeting jointly. So we had representatives from ANSI, and anybody can join that. It costs a few hundred dollars a year, and the main thing is that you need to come to the meetings, and that's probably most of the expense. Then there's ISO, which sends national body representatives.

So we have a meeting with between 40 and 80 ANSI people there, and then half a dozen ISO representatives, people that -- somebody would come from Canada, and somebody would come from England, and somebody would come from Sweden, and Germany.

We meet three times a year: twice in the United States, and once overseas for international political purposes. And, you know, in the overseas meetings -- well, we had one in London, there was one in Munich. We had one in Hawaii. We had one in Tokyo. And what was really frustrating was that we'd go to these meetings, we'd be at them for a week and really never get to go outside because we'd work 16 hours a day.

TNC: What is the ultimate objective of the entire process?

NM: The ultimate objective is an ISO document that is a standard. And this is something that has an importance in that it's mentioned in legal documents. In other words, somebody's writing a compiler or selling a compiler and they will certify to a customer that, yes, this compiler satisfies or conforms to, as we say, the international standard. And somebody buying a compiler looks around at what's available and, you know, pretty much after there is a standard then everybody has to conform to it to stay in business. And it becomes important when there's something wrong and you can say, "Well, you guys contracted to provide me with something that conforms to the standard and this doesn't." And this allows the users to --

TNC: Have some legal footing.

NM: Yes -- to get something that actually does allow them to write portable code.

TNC: The process actually started for C++ a while ago in the late 80s. Where are we now?

NM: Well, we have what we call a final draft international standard, FDIS. And that's something that all the people who attended the meetings have voted on. It was passed unanimously in November of last year.

And we're now waiting for all the member bodies of ISO to vote on it.

TNC: Okay. So we have, after ten years we have this document. So we go through proposals --

NM: No, eight years.

TNC: Eight years. We have proposals, drafts, and now we have a final draft.

NM: Uh-huh.

TNC: So this is now going to be approved by all the different member constituents?

NM: That's right. And all we're doing is just waiting for the ISO brass to organize a vote. And then in March the committee will get together and mostly talk about what will happen next.

TNC: Okay. So at this point the standard is pretty much finalized. Does that mean that no new features will be added to the language, or does it simply mean that no new features that will be part of the standard will be added to the language?

NM: Actually, you know, the meeting, the committee comes back together every five years, or after five years. So when we standardize this then things kind of stay stable for five years. And then the committee gets back together and talks about whether anything needs to be added or changed or even removed. And it's pretty much carte blanche. If the members want to remove something, they can. But, of course, the members aren't going to want to remove things because it will break their own code.

TNC: C++ has been around for a while. Most programmers have been coding according to whatever was specified in the original draft and the Annotated Reference Manual.

NM: Yes.

TNC: Now, since the original draft, there have been additions to the draft proposals. And they're going to make it into the standard, is that correct?

NM: Yes. The largest changes really were in the library. But there are a number of very important language features that really have changed the landscape of C++ programming. It's possible to do things now that not only were not possible before in C++, they really have never been possible in any other language. And it's pretty exciting.

Exceptions

TNC: I did quite a lot of C++ programming a few years ago and then got involved in a project where I did mostly C. And now I look at C++ and it really looks like a different language, mostly because of the prevalence of templates.

NM: Yes. It is a different programming experience. Exceptions turn out to make a big difference as well.

TNC: Although some compilers already were implementing exceptions. So you were able to use them without them being part of the language.

NM: That's right -- they have been a part of the language for a very long time. They weren't specified very clearly in a number of areas and that made it difficult to write portable code.

But the main change that results from exceptions is that we've learned more about how to use them and how to use them safely.

TNC: Okay. Very quickly, can you give us a quick rundown of exceptions? Try, catch...

NM: Yes. Exceptions are -- we have a block headed with try. So you say "try {", and a bunch of code, and then a close "}", and then you can catch things that are thrown from it. So you can have a throw statement somewhere, maybe you call a function and it says throw. And this, for people who are familiar with C, is similar in operation to setjmp/longjmp, except that the language knows about it and it knows that when you throw something that you've got objects around that need to have destructors called and it takes care of that. So it's a safe --

TNC: And when you throw an exception, you throw an exception object of a given type.

NM: And then the catch, and then where you say catch, you list what you're going to catch just as if it was a function argument. So you can catch things by pointer or by reference.

The standard library provides a set of standard exception classes that you're probably best off deriving from and throwing those, just because it makes it easier for people reading the code to understand what's going on.
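The mechanics described above can be sketched as follows. This is a minimal illustration, not code from the interview: `ParseError`, `Tracer`, `parse`, `try_parse` and the `events` log are hypothetical names introduced only to make the order of operations visible.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical log of events, used only to show what runs and in what order.
std::vector<std::string> events;

// A local object whose destructor records that cleanup ran.
struct Tracer {
    ~Tracer() { events.push_back("destructor"); }
};

// Hypothetical exception type derived from a standard exception class.
struct ParseError : std::runtime_error {
    explicit ParseError(const std::string& what) : std::runtime_error(what) {}
};

void parse(int value) {
    Tracer t;                              // destroyed during stack unwinding
    if (value < 0)
        throw ParseError("negative value");
    events.push_back("parsed");
}

// Returns the message carried by the exception, or "ok" if nothing was thrown.
std::string try_parse(int value) {
    try {
        parse(value);
        return "ok";
    } catch (const std::exception& e) {    // caught by reference, like a function argument
        events.push_back("caught");
        return e.what();
    }
}
```

Calling `try_parse(-1)` records "destructor" before "caught": the language destroys `t` on the way out of `parse`, which is the part that setjmp/longjmp cannot do.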

And what has changed in programming as a result of exceptions is that you have to, when you're writing code and, say, you call a function, you don't know if somebody's going to throw an exception out of that. So you have to be a little more careful about leaving things in an intermediate state. Somebody may throw an exception somewhere down lower in the call tree and you have to be careful that everything will be in a stable condition on the way through, or that you have placed a catch clause or a destructor somewhere that will clean up on the way out.

And it's a lot like coding for threads.
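One common way to avoid leaving things in an intermediate state is to do the risky work on a private copy and commit it with an operation that cannot throw. The sketch below assumes a hypothetical `checked_double` that may throw partway through a loop; it is one possible technique, not necessarily the one the speaker had in mind.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical operation that may throw partway through a sequence.
int checked_double(int x) {
    if (x < 0) throw std::invalid_argument("negative element");
    return 2 * x;
}

// Doubles every element, or leaves `values` untouched if any element is rejected.
void double_all(std::vector<int>& values) {
    std::vector<int> scratch;                 // work on a private copy
    scratch.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        scratch.push_back(checked_double(values[i]));   // may throw
    values.swap(scratch);                     // commit; swap does not throw
}
```

If `checked_double` throws on the second element, `scratch` is destroyed during unwinding and the caller's vector is exactly as it was before the call.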

TNC: This encourages good programming habits. Now, there is a performance hit associated with exception handling. Overhead.

NM: Yes. People are worried about overhead on exceptions, there have been reports that there is a one or two or five percent overhead in certain compilers for having exceptions turned on. And the answer for that is that there is an overhead compared to doing no error checking at all. But if you don't have exceptions that means that you've got to return error codes and check them everywhere if you want to have a reliable system.

And exceptions are more efficient than code that returns error codes and has to test them at every point up and down the call chain.

TNC: It's also a cleaner design.

NM: Right. If you actually did check error codes everywhere, your logic would just be entirely obscured. So another result is that people don't generally do complete error checking and they have less reliable systems as a result.
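To illustrate how error-code checks spread through the call chain, here is the same small task written both ways. All function names and the `-1` failure code are hypothetical; the sketch shows only the shape of the code and makes no measurement of speed.

```cpp
#include <stdexcept>

// Style 1: error codes. Every level must test the result and pass it on.
int read_field_rc(int raw, int& out) {
    if (raw < 0) return -1;          // hypothetical failure code
    out = raw;
    return 0;
}

int read_record_rc(int a, int b, int& sum) {
    int x = 0, y = 0;
    int rc = read_field_rc(a, x);
    if (rc != 0) return rc;
    rc = read_field_rc(b, y);
    if (rc != 0) return rc;
    sum = x + y;
    return 0;
}

// Style 2: exceptions. Intermediate levels contain only the main logic.
int read_field(int raw) {
    if (raw < 0) throw std::out_of_range("negative field");
    return raw;
}

int read_record(int a, int b) { return read_field(a) + read_field(b); }

// Only the caller that can actually handle the failure mentions it.
int read_record_or(int a, int b, int fallback) {
    try {
        return read_record(a, b);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

`read_record` stays a one-line function, while `read_record_rc` spends most of its body forwarding failures it cannot handle itself.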

TNC: Okay. What is the unexpected keyword?

NM: Okay. Unexpected is the exception that's thrown [rather, function that's called] if you have declared what exceptions your function can throw, and then you throw something that's not on the list. Then it calls unexpected.

And that's, you know, once you get into unexpected, you're in pretty deep trouble. Most programs won't have anything to do with unexpected. But it's there for those cases where a program just absolutely cannot shut down and needs to do something, do something else. It's kind of like having a signal handler on segment violation, if you're in UNIX. I don't know what you call that in the other environments.

TNC: An access violation.

Namespaces

TNC: Okay, let's move down the list of newer items in the C++ draft. One that I find very interesting is namespaces.

Now, namespaces are actually a way to partition variables, objects from different modules when they get statically linked together.

NM: That's right. It splits up your global name space. The old C++ global namespace was basically a swamp. It was just full of all kinds of names. You don't know what names you might get in various headers.

TNC: Let's just give an example here. For example, okay, Nathan, you write a library and you have a list class.

NM: Uh-huh.

TNC: And you call it "list".

NM: Right.

TNC: Then you use it and link in somebody else's library. And that product also exports a list class called "list". You have a problem there.

NM: That's right. The names will collide. And maybe everything works now and then I get a new version of a library or I get a new version of the operating system and they've added a new name. And suddenly my program doesn't work anymore.

Now, if I'm selling this library that uses this name, I can't just go change that name, you know, to something that the operating system isn't using, because my customers are depending on that name being the same.

And so namespaces partition all that up: you can put all your names in a namespace and then they won't collide with whatever some C header happens to mention somewhere.

TNC: Okay. You partition your stuff into a separate space. How do you do that, actually?

NM: Okay. You just, you pick a name for the name space. You can pick a fairly long name because people can alias to something shorter. I usually alias them to one letter so I'll have a library for file and I'll alias it to just F in my implementation [".C"] file.

TNC: And actually by aliasing, you can also switch from library to library pretty easily.

NM: That's right. If you want to work with the standard library, which is in the "std::" name space, and then you decide that you want to use lists from a special version of the library, such as one that supports persistence, for instance, you can go to the persistent list if you've got it aliased without changing much code.
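A namespace alias along these lines might look like the sketch below. `LibFilePlain` and `LibFilePersistent` are hypothetical library namespaces invented for the illustration; `F` is the one-letter alias mentioned above.

```cpp
#include <string>

// Hypothetical library namespaces with deliberately long names.
namespace LibFilePlain {
    std::string describe() { return "plain file"; }
}
namespace LibFilePersistent {
    std::string describe() { return "persistent file"; }
}

// One alias selects the library; change this line to switch implementations.
namespace F = LibFilePlain;

std::string which_library() { return F::describe(); }
```

Pointing `F` at `LibFilePersistent` instead would switch every `F::` use in the file without touching the rest of the code, provided the two libraries offer the same names.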

TNC: So what does the code look like to set up a variable in a specified namespace?

NM: It's a block. So you say, namespace, open bracket, and then you just put your code in there just as if it was in a regular header file, and then close bracket.

Now, the thing about namespaces is that you can reopen them. So you can have several header files and each one says "namespace LibFile {" and defines a bunch of stuff, and close "}".

Nowadays there's no reason for any name to be global, except for the namespace itself.
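As a rough sketch of reopening a namespace, the two blocks below stand in for two separate header files. The header names in the comments and the contents of `LibFile` are assumptions; definitions are written in place for brevity, whereas real headers would normally hold only declarations.

```cpp
// What a first header, say "file_open.h" (hypothetical name), might contain:
namespace LibFile {
    int open_count = 0;
    int open() { return ++open_count; }
}

// A second header, "file_close.h", reopens the same namespace and adds to it:
namespace LibFile {
    int close() { return --open_count; }   // sees names from the first block
}
```

The only name this adds to the global namespace is `LibFile` itself.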

TNC: Let's talk about things that are part of the standard library or part of the operating system. Where do they go?

NM: All the names in the standard library, that is, the library defined by the ISO standard, are in the namespace "std". And that's a pretty big namespace. It's got something like 800 names in it. And the committee was completely free about using up good names, like list and map and sort and copy, because they're all in this namespace and users can use those names for their own code as well.

TNC: Do you need to explicitly specify the standard namespace, or is it special in that you do not need to bring it in?

NM: Okay. The "std" namespace is just a namespace. There are some rules in the standard that say that you can't define other things in it, besides what's standard. In other words, if you wanted to define a list you'd be better off making your own name space for that. And there's no shortage of name spaces. You can make them up as you go. Although having too many namespaces would probably be as much a mistake as not having any.

TNC: Okay. So we know how to declare a namespace. How do we use it? How do we reference a variable that's in the name space?

NM: The language provides several ways. The clearest way is if you say, for instance, the standard name space you'd say std:: and the name. But there are a number of shortcuts provided.

One of them, probably the most important, is that we have a using-declaration. You can say "using std::list;", and then after that you can say "list". And the compiler then knows that you're talking about the standard list.

Now, there's something else called a using-directive, and this is the hard part for beginners to remember the difference between a using-declaration and using-directive, because they sound so much alike.

A using-declaration is just like an ordinary declaration. You mention the name and then it's declared in your scope. A using-directive is a kind of an odd beast. What that does is, it dissolves the name space boundary. So if you said "using namespace std;", then all the names that are in std:: would suddenly become global, for practical purposes.
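The three forms can be put side by side as below. The function names are hypothetical; each function returns the size of a small container so the effect can be checked.

```cpp
#include <cstddef>
#include <list>
#include <vector>

std::size_t with_qualification() {
    std::list<int> items(3);        // fully qualified: always unambiguous
    return items.size();
}

std::size_t with_using_declaration() {
    using std::list;                // declares exactly one name in this scope
    list<int> items(4);
    return items.size();
}

std::size_t with_using_directive() {
    using namespace std;            // makes every name in std visible to lookup here
    vector<int> items(5);
    return items.size();
}
```

Keeping the using-directive inside a function body, as here, at least limits how far the dissolved boundary reaches.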

TNC: So it's almost like a pragma for some compilers. It's sort of an instruction to the compiler.

NM: Right. It changes the way name look-up happens. And so those of us who are careful and concerned about our code being correct and usable are very, very leery of this using-directive.

What it's good for is in transitioning between not using namespaces, and using them. Because you'll have a library that has everything in the global name space, and you wrap it up in a namespace, but then you've got users who are still using the old interface. And so they can use that to get back to the old way of doing it -- temporarily, one hopes.

There's something really interesting called Koenig look-up. It's named after Andrew Koenig. He's the author most recently of "Ruminations on C++," which I recommend, and he has a regular column in the Journal of Object Oriented Programming.

Koenig look-up says that if you use a type that's defined in a namespace when you call a function, the language will look up that function in the name space where the type is defined.

Now, a good example of that is if you have a type defined, say a date class defined in your LibDate namespace and -- or maybe your LibDateTime namespace -- and you define operator<< for it so that you could write these things to ostreams.

Well, if you didn't have this Koenig look-up, then when you tried to call operator<< it would say, well, we don't see any operator<< to call. But since Koenig look-up is there, it will look in the namespace where you define the date type --

TNC: Isn't that sort of confusing? I mean, wouldn't it be preferable just to explicitly state which operator<< you want to use?

NM: Well, you could, but then you'd have to say something like LibDate::operator<<. And that's not syntax you want to use for operator<<.

So it opens up a channel between namespaces that can cause some confusion if you've defined names that could overload in confusing ways. So there are some things to be careful about. There will be articles about how to use this.
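A minimal sketch of the date example follows. The `Date` layout and the `to_text` helper are assumptions made for the illustration; the point is only that `operator<<` lives in `LibDate` and is still found from outside it.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace LibDate {
    struct Date { int year, month, day; };

    // Defined in the same namespace as Date; not declared at global scope.
    std::ostream& operator<<(std::ostream& os, const Date& d) {
        return os << d.year << '-' << d.month << '-' << d.day;
    }
}

// Hypothetical helper written outside LibDate, with no using-declaration.
std::string to_text(const LibDate::Date& d) {
    std::ostringstream out;
    out << d;      // found through Koenig (argument-dependent) look-up in LibDate
    return out.str();
}
```

Without this look-up rule the caller would have to write `LibDate::operator<<(out, d)`, which is the syntax the speaker says nobody wants.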

TNC: This behavior is part of the standard.

NM: It's absolutely standard, although very few compilers have implemented it yet, even the ones that actually claim to implement namespaces.

STL

TNC: I want to move on and talk about the STL, Standard Template Library.

NM: Okay.

TNC: A big piece of some of the newer stuff in the draft.

NM: STL is probably the most important thing in the library, so much so that a lot of people think there isn't anything else in the library but STL.

STL is a framework. It's something that Alex Stepanov worked on for 20 years, and which he had tried to do in Common Lisp and Ada.

He finally found a language where it could be implemented, and that's C++, and there still aren't any other languages that can really meet the goals that STL was approaching.

TNC: I guess one way to start with the STL is that it is based on templates. Basically it describes a number of container classes and algorithms. These algorithms apply to all objects in the library. So it's really containers plus algorithms.

NM: The key feature of STL really is the iterator. And that's interesting, because what -- when Stepanov started out, he wanted to find a formalism for graph theory and the graph algorithms. And he found that there wasn't even a formalism for all the basic algorithms that we have in, say, in Knuth. Every time you want to write a sort routine, you end up having to recode it or do something inefficient like pass function pointers around.

The revolutionary feature of STL is actually this iterator formalism that allows you to define sequential access to any data structure and then define algorithms that will work on any data structure where you can define an iterator. He defined a formalism for containers, which allows any algorithm, any sequential algorithm to work on them.
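To make the iterator idea concrete, here is a small algorithm written only in terms of iterators. `count_descents` is a hypothetical algorithm, not one from the standard library; it needs nothing beyond forward iteration and `operator<` on the elements.

```cpp
#include <list>
#include <vector>

// Hypothetical algorithm: counts adjacent pairs that are out of order.
// It needs only forward iteration, so any sequence with iterators will do.
template <class Iter>
int count_descents(Iter first, Iter last) {
    int n = 0;
    if (first == last) return n;
    Iter next = first;
    for (++next; next != last; ++first, ++next)
        if (*next < *first) ++n;
    return n;
}
```

The same template works unchanged on a built-in array (`count_descents(a, a + 5)`), a `std::vector`, or a `std::list`, because each supplies iterators with the operations the loop uses.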

And so the things that are in the standard library, the algorithms, there's a long list of algorithms. I think there's 80-something of them.

TNC: Just for our viewers, can you just give some examples of algorithms: sorting, searching?

NM: Okay. There's a sort. There's a stable sort. There are various kinds of search. There are filtering kind of algorithms that walk through a sequence and apply some function to each element in the sequence.

They're basically all the things that you find in Knuth. And, you know, as basic as copy. Here's an iterator or here's a source, here's a destination. Copy everything from here to there. And this copy function has exactly the same syntax and semantics regardless of where things come from.

Are they coming from a string or an array, or are they coming from input stream? Are you walking through a hash table? It doesn't matter.

TNC: So you can walk through a string and a hash table and all these different types of containers but the interface is the same.

NM: That's right. So you only have to learn copy once and when you see it used you always know exactly what it's doing.
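As an illustration, the sketch below calls `std::copy` with three different kinds of source. The wrapper function names are hypothetical; the `std::copy` call has the same shape in each.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copy from a built-in array.
std::vector<int> from_array() {
    int raw[] = {1, 2, 3};
    std::vector<int> out;
    std::copy(raw, raw + 3, std::back_inserter(out));
    return out;
}

// Copy from a string: a different source, the same call.
std::vector<char> from_string(const std::string& s) {
    std::vector<char> out;
    std::copy(s.begin(), s.end(), std::back_inserter(out));
    return out;
}

// Copy from an input stream, using stream iterators as the source range.
std::vector<int> from_stream(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> out;
    std::copy(std::istream_iterator<int>(in), std::istream_iterator<int>(),
              std::back_inserter(out));
    return out;
}
```

In every case the arguments are "start of source, end of source, destination", which is the sense in which copy only has to be learned once.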

And the same thing applies to sort and there's partition and there's something to build a heap. There are hundreds or dozens and dozens of these things. And you can look on the Web to find out what's in there.
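A few of the algorithms just named, applied to a `std::vector`, might be used as follows. The wrapper functions, `Entry`, `by_key` and `is_even` are hypothetical names chosen for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

bool is_even(int x) { return x % 2 == 0; }

// Sorts a copy in ascending order.
std::vector<int> sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Hypothetical record used to show that stable_sort keeps equal keys in order.
struct Entry { int key; char tag; };
bool by_key(const Entry& a, const Entry& b) { return a.key < b.key; }

void stable_by_key(std::vector<Entry>& v) {
    std::stable_sort(v.begin(), v.end(), by_key);
}

// Moves even numbers to the front; returns how many there are.
std::ptrdiff_t count_evens_by_partition(std::vector<int>& v) {
    return std::partition(v.begin(), v.end(), is_even) - v.begin();
}

// Arranges a non-empty copy as a max-heap and returns its largest element.
int largest_via_heap(std::vector<int> v) {
    std::make_heap(v.begin(), v.end());
    return v.front();
}
```

Each call takes an iterator range, so the same functions could be pointed at an array or another random-access container with no change to the algorithm calls.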

The most important thing about the STL is that the things that are in there, the algorithms, the containers, are just examples. And so the biggest, the most frequent question we get is how come there are no hash tables.

TNC: How come there are no hash tables?

NM: How come there are no hash tables? They weren't proposed until it was too late to put them in. But you can get hash tables on the net easily enough. And probably in five or ten years there will be hash tables. But that doesn't stop you from using them, yourself.

TNC: Are there any other big holes like that or things that are really obvious that are not there?

NM: Well, you know, people complain there's no singly linked list and the list that's in the STL is a doubly linked list.

TNC: So it takes up another object space, reference space.

NM: Well, in practice the amount of space taken up by the extra back link in list doesn't really seem to cost people very much.

Somebody did some statistics and found that in real programs it might amount to 12 percent.

TNC: And it also depends on allocation sizes and granularity of allocation.

NM: That's right. If you have a list of pointers, it's going to make a lot more difference than if you have a list of customers. The size of a customer record is going to swamp the size of a link anyway.

TNC: Now, it seems that in general, the STL did not generate as many conflicts as other aspects of the language.

NM: Yes. The STL was interesting because it was proposed very very late in the cycle, but it looked so important that it went in with remarkably little objection.

TNC: I guess that's also a testament to its design.

NM: It's a very good design. It has needed a great deal of fiddling to get everything right. And so we've had a lot of work to do. The most important piece of work was getting exception safety. And exception safety was probably the most important thing that's happened at the last two meetings.

There's an article about exception safety in the library in C++ Report, I think in the January issue.

TNC: Now, how about implementations of the STL? What implementations are available? And what is their genealogy?

NM: Okay. The original implementation was done by Stepanov at HP and was released into the public domain. Stepanov has since moved over to Silicon Graphics, and SGI has continued this policy of releasing their version in the public domain.

The SGI version is the most up to date and complete and bug free. And it forms the basis for all the best commercial implementations as well.

TNC: And what are some of the better ones? Or maybe you want to tackle what are some of the worst ones?

NM: I don't want to get into too much that's invidious here. There's one done by Bill Plauger which is the basis for the Microsoft library.

TNC: That's the Dinkumware library.

NM: That's the Dinkumware library. And then there are subsets of the standard library, such as the ObjectSpace and --

TNC: I guess what I'm trying to get at, practically is there anything that a developer should avoid or, you know, maybe look at?

NM: What I would recommend is that, for at least the next year or so, you avoid using the one that comes with the compiler, if it does, and go directly to SGI and get theirs.

The GNU egcs compiler comes with an STL as well. And that's based on a version of the SGI library.

None of the commercial ones have exception safety, and none of the commercial ones have very good memory management yet.

Localization

TNC: Okay. Let's maybe move away from the STL because we only have a few minutes left.

NM: Okay.

TNC: You worked on localization and internationalization.

NM: Internationalization is a huge part of the library. But it's not very well understood. It allows a programmer to write a program, and write it once and have it be portable for users in other countries using other languages, without having to rewrite the code. And this includes dates and messages. Elements that programmers need to customize without having access to the source code.

TNC: The C library has limited localization. The locale.

NM: Yes. The locale in the standard C library is not really very usable in practice. And so one of the goals was to try to get something that actually was practically usable so we don't have to keep rewriting this in every program that we write.

And that was how I got into it, because I really detest the whole thing. And I figured that if I didn't put it in the standard I'd have to keep rewriting it in every program I ever worked on.
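As a small taste of the C++ locale facilities, the sketch below formats the same number under two locales without changing the formatting code. `EuroPunct`, `euro_locale` and `format_number` are hypothetical names; the facet is defined in the program so that the example does not depend on which locales happen to be installed on the system.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes with '.' and uses ',' for decimals.
struct EuroPunct : std::numpunct<char> {
    char do_thousands_sep() const { return '.'; }
    char do_decimal_point() const { return ','; }
    std::string do_grouping() const { return "\3"; }
};

// A locale that is the classic "C" locale except for number punctuation.
std::locale euro_locale() {
    // The locale takes ownership of the facet and deletes it when done.
    return std::locale(std::locale::classic(), new EuroPunct);
}

// Formats n according to whatever locale the caller supplies.
std::string format_number(long n, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);
    out << n;
    return out.str();
}
```

`format_number` itself contains no knowledge of any country's conventions; those arrive entirely through the locale object passed in.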

explicit

TNC: You're also responsible for the explicit keyword.

NM: Yes. In the conventional C++ approach, an implicit conversion can occur where a constructor takes just one argument. This turns out to be a bit of a problem, because it means that it's entirely too easy to make automatic conversions. And conversions occur where they were not intended. And so explicit means, "don't make this a conversion." And I wish it was the default but...

TNC: Just a quick example, Nathan. You create a list and initialize it by passing "3" to the constructor. And that creates a list with three elements.

NM: That's right. If it wasn't declared explicit, then a function that took both a float and a list might be ambiguous.
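The difference can be sketched with two hypothetical container types, identical except for the `explicit` keyword. `LooseList`, `StrictList` and `length` are names invented for the illustration and are not the standard `list`.

```cpp
#include <cstddef>

// Hypothetical container whose one-argument constructor is NOT explicit.
struct LooseList {
    std::size_t count;
    LooseList(std::size_t n) : count(n) {}
};

// The same container with the constructor marked explicit.
struct StrictList {
    std::size_t count;
    explicit StrictList(std::size_t n) : count(n) {}
};

std::size_t length(const LooseList& l)  { return l.count; }
std::size_t length(const StrictList& l) { return l.count; }

std::size_t demo_implicit() {
    return length(3);               // compiles: 3 is silently converted to LooseList(3)
}

std::size_t demo_explicit() {
    return length(StrictList(3));   // the conversion has to be spelled out
}
```

In `demo_implicit` only the `LooseList` overload is a candidate, because an explicit constructor is never used for an implicit conversion; if both constructors were non-explicit, the call `length(3)` would be ambiguous.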

TNC: And this has your name written all over it.

NM: I did the work on getting that into the core language.

Compilers

TNC: Compilers. What compilers are available and how do they differ in their support of the new features?

NM: Okay. There's been a lot of progress in compiler support for the new standard features. Probably the easiest compilers to get that have the best support for the language as a whole are based on the EDG front end. And you can look them up at their web site, edg.com. And they list the compilers that use their front end.

You can buy an EDG compiler that drops into Developer Studio and avoid Microsoft C++ compiler crashes.

There's another compiler that's up and coming. It's the egcs compiler, which is a free product, and you can download that from Cygnus. The major feature it's lacking at this point is namespaces, but that's being fixed. It supports template overloading and partial specialization. These are some very important features.
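For readers unfamiliar with the two features just mentioned, here is a brief sketch. `TypeName` and `overload_rank` are hypothetical names; the first shows partial specialization of a class template, the second shows overloaded function templates.

```cpp
#include <string>

// Primary class template.
template <class T>
struct TypeName {
    static std::string get() { return "value"; }
};

// Partial specialization: applies to every pointer type T*.
template <class T>
struct TypeName<T*> {
    static std::string get() { return "pointer to " + TypeName<T>::get(); }
};

// Function template overloading: two templates with the same name.
template <class T>
int overload_rank(T) { return 0; }

// The more specialized overload is chosen when the argument is a pointer.
template <class T>
int overload_rank(T*) { return 1; }
```

`TypeName<int**>::get()` peels off one pointer level per instantiation, which hints at how a library can compute things at compile time using nothing but these language rules.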

There's a library called Blitz++ and there's a link to it on my web page, that does numeric array processing. It's a library for numeric work that allows you to write programs that are just as fast as an extreme optimizing Fortran compiler can do, without needing any special support in the compiler, itself. And this is very exciting. It means that the language design is a success because it means you can write things in libraries that in other languages have had to be folded into the compiler itself.

What else haven't we talked about? The question of what happens next with the standard and with things that are broken in the standard or things that are missing. There will be a period when defect reports are being processed. So if you find a bug, and... they're going to publish the draft international standard very soon, and you can buy it. That is probably this month and certainly, if not this month, then early next month. It has to be released before the next meeting. And if you find a mistake there, there are procedures that will be published about how to report those. And it's still possible to fix a few things.

TNC: We're going to have to wrap up, Nathan. Thanks very much for being on the show.

NM: Thanks for having me.