http://www.signitek.com/Articles/CPP/MFC/Contain
http://www.signitek.com/Articles/CPP/MFC/Contain
http://www.signitek.com/Articles/CPP/MFC/Contain
http://www.signitek.com/Articles/CPP/MFC/Contain
Copyright 1998, All Rights Reserved.
Draft
Comments may be sent to sh@signitek.com .
This article discusses the containers defined in the Microsoft Foundation Class Library and the tecniques to serialize (save and restore) them to and from flat files.
Normally when programmers discuss the MFC (Microsoft Foundation Class) library, they think of it as a means for implementing graphical user interfaces. In this article, however, we are going to be exploring two important aspects of MFC that are of interest to a broader set of programmers: containers and serialization. Containers store objects (such as strings, dates, rectangles and inventory items) and serialization allows to implement a vital feature in our programs: the ability to modify and save our data restored from the previous session.
MFC offers two approaches: the newer type safe template based containers that can store objects directly and the legacy approach of necessarily storing pointers to ancestor classes. This article will focus largely on the former approach. See the side bar at the end of this article for a comparison of the two approaches.
The focus of this article is the remarkable capabilities and dreadful perils of template based container classes and their serialization. These are advanced topics. To accommodate any new comers however, we will initially digress enough to provide a minimal review of the basics. Then after you listen to me bemoan the pitfalls, we will identify how they might be overcome by creating descendant classes. Finally, we will see the true and unique power of MFC: to easily store and retrieve arbitrarily complex data structures that may contain instances of ancestors as well as descendants.
MFC, like the STL (standard template library) implements the common structures: maps, lists and vectors. I'm hoping vectors and lists are familiar terms: a vector is a set of cells in memory that can be indexed with an integer. The favorite implementation of this class is the array feature of the underlying C language. Unfortunately, Microsoft seems to use the two terms vector and array interchangeably and to be consistent with Microsoft, I will abandon any further use of the term vector. A list is another set elements whose favorite implementation (and only implementation in MFC) is a doubly linked list. While inserting in the middle of a linked list is much more efficient than inserting in the middle of an array, linked lists cannot be indexed as efficiently as arrays.
So what is a map? I like to think of maps as two column tables where the first column is an optimized index for uniquely identifying elements in the second column. My favorite implementation of a map is a tree because that keeps my rows in alphabetical order. However, Microsoft uses a hash table instead for speed.
Class library designers face dilemma when using the template feature to implement classes. Should they force the user to always pass by reference or force them to pass by value? Needless to say, the latter is not acceptable for text strings for which we like to avoid superfluous copy operations - they are just too expensive. Microsoft resolves this dilemma with additional template arguments. The latter argument tells the compiler how to pass instances of the element. For example:
001 #include <wtypes.h> 002 #include <afxtempl.h> 003 CArray<WORD, WORD> x; /* an array of unsigned short integers passed by value*/ 004 CList<CString, CString&> y; /* a doubly linked list of text strings*/
Listing 1
Microsoft is allowing us to shave off a few nanoseconds by passing our integers by value instead of by reference. Later we will find that passing by value is rarely attractive.
Here is a two column table where the first column is a string that is used to uniquely identify an unsigned short integer:
CMap<CString, WORD, CString&, WORD&> z;
Personally, I find these extra template parameters very cumbersome - especially when I start nesting maps inside of maps.
Unfortunately, for the above to work, we have to define our own hashing function because their generic hashing function does not accommodate CString. I'm dismayed by this: one would think that Microsoft would supply a hashing function for class CString since that is one of the most popular data types to use with a CMap. (Actually, there is such a hashing function for char*, but that has some problems so I always end up using my hashing function for CString).
001 template<>
002 inline UINT AFXAPI HashKey<CString&>(CString& key){
003 DWORD sum = 0;
004 for(int ii = key.GetLength()-1; ii >= 0; ii--)
005 sum+= key[ii];
006 return ((UINT)(void*)(DWORD)sum) >> 4;
007 }
Listing 2
Let's start with a little example using class Inventory Item or II for short. From this we can implement a recipe as a map, array or list of inventory items. Note that it would be is implemented with pointers which will cause problems that we will address in a moment. In the mean time, let's discuss how we may save, restore and display a recipe as implemented with an array, list or map.
Let's take a simple example: we want to implement recipes as collections of inventory items. As it turns out, we don't need a class recipe, we can just use the MFC container classes. All we need is a simple inventory item class:
001 struct II {
002 II(const char* d, WORD a): amount(a){ strcpy(desc,d); }
003 char desc[60];
004 WORD amount;
005 };
006 inline ostream& operator<<(ostream& os, const II& ii){
007 return os << desc << " " << amount << " ";
008 }
009
Listing 3
Take special note that we are not using any pointers here. I dislike pointers and try to avoid them when possible as they make our code much more complex and often slower.
Note further we are not using int. Why? Well, suppose we want to exchange data between a 16 bit and 32 bit operating system? int is ambiguous. By using Microsoft's definition of WORD, we are assured to always get exactly 16 bits of unsigned integer. For example, when we eventually switch to 64 bit C++ compilers, there will be some uncertainty as to how to generate a 16 bit unsigned integer. Microsoft assures us that they will adjust the header files so WORD will always generate a 16 bit unsigned integer.
Now, traversing arrays is easy: you start at zero and go to the length:
001 template<class V>
002 inline ostream& operator<<(ostream& os, const CArray<V, V&>& a){
003 os << "{";
004 for(int ii = a.GetUpperBound(); ii >=0; ii--)
005 os << a[ii];
006 return os << "}";
007 }
008 CArray<II, II&> chili;
009 chili.Add(II("beans", 2));
010 chili.Add(II("tomatoes", 3));
011 chili.Add(II("onions", 6));
012 chili.Add(II("beef", 6));
013 // display
014 cout << chili << endl;
Listing 4
Lists are only slightly less obvious:
001 template<class T>
002 inline ostream& operator<<(ostream& os, const CList<T,T&>& l){
003 POSITION ii = l.GetHeadPosition();
004 while(ii)
005 os << l.GetNext(ii) << " ";
006 return os;
007 }
008 CList<II, II&> chili;
009 chili.AddTail(II("beans", 2));
010 chili.AddTail(II("tomatoes", 3));
011 chili.AddTail(II("onions", 6));
012 chili.AddTail(II("beef", 6));
013 // display
014 cout << chili << endl;
Listing 5
Maps are a little less obvious:
001 template<class K, class V>
002 inline ostream& operator<<(ostream& os, const CMap<K, K&, V, V&>& m){
003 K k; V v;
004 POSITION kk = m.GetStartPosition();
005 while(kk){
006 m.GetNextAssoc(kk, k, v);
007 os << "[" << k << "] "<< v << " " << endl;
008 }
009 return os; // <<"}";
010 }
011 CMap<LPCSTR, LPCSTR&, WORD, WORD&> chili;
012 chili["beans"] = 2;
013 chili["tomatoes"] = 5;
014 chili["onions"] = 6;
015 chili["beef"] = 6;
016 // display
017 cout << chili << endl;
Listing 6
Personally, the CMap class is my favorite. It requires less code to do the same job because I don't have to define a class calledII.
At this point, saving any one of these definitions is easy.
001 CFile file("data",
002 CFile::modeCreate|CFile::modeWrite);
003 CArchive out(&file, CArchive::store);
004 chili.Serialize(out);
Listing 7
Reading is just as easy.
001 CFile in("data", CFile::modeRead);
002 CArchive ar(&in, CArchive::load);
003 chili.Serialize(ar);
Listing 8
So what is this archive class? Do we need it? Well, if we were re-inventing MFC, we could allow programmers to write directly to the CFile class and thus abandon class CArchive. Later, however, we shall find merit in MFC's decision to have a separate class CArchive.
Well, so far so good. It is easy so long as our maps or arrays don't contain elements that are implemented with pointers. The CArray::Serialize, CMap::Serialize and CList::Serialize functions call a generic function called SerializeElements (provided by MFC) which merely traverses the list and performs a sizeof operation on each element (or, in the case of a map, each element and each key) and performs a verbatim write. The technique we have described thus far will be called Strategy #1.
This strategy, of course, works for chars and WORDs but does not work well if there are pointers to be dereferenced. Writing RAM pointers to disk is not very effective. An exception is class CString because MFC provides a special instance of the template function SerializeElements to appropriately dereference the pointer used to implement a CString object.
So let's define Strategy #2 and consider container elements implemented with pointers. To do that, however, we must first learn how to serialize a scalar. So lets get a little more realistic and modify our definition.
001 struct II : public CObject{
002 DECLARE_SERIAL();
003 II(const CString& d, WORD a): desc(d), amount(a){}
004 CString desc;
005 WORD amount;
006 void Serialize(CArchive& ar){
007 if(ar.IsLoading()) ar >> desc >> amount;
008 else ar << desc << amount;
009 }
010 };
011 IMPLEMENT_SERIAL(II,CObject);
012 void SerializeElements(CArchive& ar, II* pE, int nCount){
013 for(int ii = 0; ii < nCount; ii++)
014 pE[ii].Serialize(ar);
015 }
Listing 9
As you can see, we are providing our own SerializeElements function. It is unfortunate that the default version provided by MFC does not call the member function Serialize!
But how, you ask, do these changes solve our problem? Why could we not have previously had a data member of type CString? The reason is MFC, by default, does not dereference the pointer internal to instances of CString and we would never store the actual characters in the file - just a useless RAM pointer instead.
I'm expecting someone to ask about the serialize function at this point. Clearly to serialize an instance of II we call a virtual Serialize function. But why don't we call a serialize function for CString and WORD? The answer is simple: there are no Serialize member functions for these types. On lines 7 and 8 of listing 9, we use the overload shift operators provided by MFC to the fundamental types (which includes CString).
So Strategy #2 requires we write a lot more code to accommodate pointers. This is one reason I like to avoid pointers.
This concludes our brief review of the basics. I refer you to the online help for more details. Let's consider some of the remarkable capabilities of MFC and some resolutions of some truly sinister bugs.
Clearly there is a disadvantage to our approach storing objects in our array cells. What if we have a descendant of class II? Can instances of a descendant of class II be stored amongst the II objects in the same map, list or array? No! Not using the present technique. We might be tempted to use pointers instead:
001 CList<II*, II*> chili;
002 CFile in("data", CFile::modeRead);
003 CArchive ar(&in, CArchive::load);
004 chili.Serialize(ar);
Listing 10
Now we can not only store objects of class II, but any descendant of class II. However, we have just introduced two bugs, one is extremely subtle. If you don't see either bug, consider the following piece of code which has one of the same problems:
001 void readMoney(float x){
002 cin >> x;
003 }
004 main(){
005 float money;
006 cout << "How much money do you have?";
007 readMoney(money);
008 cout << money;
009 return 0;
010 }
Listing 11
Can you see the common bug in these two fragments? Remember that the second argument to the template classes, in the above definition of chili tells us to always pass by value where possible. When we read object chili or object x, we are passing by value and making a copy so the actual read operation only modifies the copy (which is immediately destroyed) instead of the original. The resolution is cryptic but simple:
001 CList<II*, II*&> chili;
Listing 12
Here we are passing pointers by reference. (This may be a little confusing to the beginner but necessary). What does this mean? It means that on the very rare occasions that you might want to pass, say WORDs by value instead of by reference to shave off a few nanoseconds, you would still want to pass by reference so you could restore them from a file.
So let's consider the second bug: When we store ingredients in our chili object, are we going to store floats and character strings, or pointers to II objects? Well, given the above code, we would be storing pointers values. Trying to restore such an object would be disastrous! The problem, of course, is that the MFC implementation of the function SerializeElements does not dereference pointers. However, we could supply such a function that does dereference pointers and our problems would be solved:
001 void SerializeElements(CArchive& ar, II** ppE, int nCount){
002 for(int ii = 0; ii < nCount; ii++)
003 if(ar.IsStoring())
004 ar << ppE[ii];
005 else
006 ar >> ppE[ii];
007 }
008
Listing 13
Using this approach, our sample code looks like this now:
001 CList<II*, II*&> chili;
002 // Read
003 {
004 CFile in("recipes.dat", CFile::modeRead);
005 CArchive ar ( &in, CArchive::load);
006 chili.Serialize(ar);
007 }
008 //Populate
009 chili.AddTail(new II("beans", 2));
010 chili.AddTail(new II("tomatoes", 3));
011 chili.AddTail(new II("Bean pot", 3));
012 // Write
013 {
014 CFile out("recipes.dat",
015 CFile::modeCreate|CFile::modeWrite);
016 CArchive ar(&out, CArchive::store);
017 chili.Serialize(ar);
018 }
019 // Delete
020 POSITION ii = chili.GetHeadPosition();
021 while(ii)
022 delete chili.GetNext(ii) << " ";
Listing 14
There are lots of other reasons I don't like pointers. For example, I like to minimize the number of calls to new and delete because they are extremely slow compared to memory allocated statically or on the stack. (Using pointers often means you are calling new and delete). Another problem is that lines 24-26 (of listing 14) cannot be implemented in a destructor. Why cannot we have put 24-26 in destructor? Let's focus on lines 9 and 10 of listing 14 as we consider an example:
001 CList<II*, II*&> chili, burritos;
002 //Populate
003 chili.AddTail(new II("beans", 2));
004 burritos.AddTail(new II("beans", 4));
005 // Use the same bean pot twice
006 II* pot = new II("Bean pot", 3);
007 chili.AddTail(pot);
008 burritos.AddTail(pot);
009 // Write
010 {
011 CFile out("recipes.dat",
012 CFile::modeCreate|CFile::modeWrite);
013 CArchive ar(&out, CArchive::store);
014 buritos.Serialize(ar);
015 chili.Serialize(ar);
016 }
017 // Delete
018 POSITION ii = chili.GetHeadPosition();
019 while(ii)
020 delete chili.GetNext(ii);
021 // Delete
022 POSITION jj = burritos.GetHeadPosition();
023 while(jj)
024 delete burritos.GetNext(jj);
Listing 15
Do you see the problem here? Both recipes share the bean pot and in line 22, we erroneously delete it a second time. Had we put lines 18-20 in a destructor to a descendant of class CList, we would have to be very careful to not commit the same offense. As a matter of general principle then, containers of pointers generally require that you manually insert the code to delete the objects. Had we not been using pointers and storing objects directly, we would not need a destructor to delete the inventory items. (Of course, a more sophisticated approach would be to completely encapsulate the pointers in a reference counting class for which copy constructors and overloaded assignment operators would bump the usage count for that object and destructors would decrement the usage count -- but that is beyond the scope of this article).
However, having an element be contained by two different objects (such as recipe burritos and recipe chili) can be very powerful. For example, we might need to model the fact that both recipes might simultaneously share the same pot or, both recipes might need to be scheduled to avoid using the same bean pot at the same time because the pot is not big enough. Fortunately, MFC recognizes this in its implementation of serialization.
001 CList<II*, II*&> chili, burritos;
002 // Read
003 {
004 CFile in("recipes.dat", CFile::modeRead);
005 CArchive ar ( &in, CArchive::load);
006 buritos.Serialize(ar);
007 chili.Serialize(ar);
008 }
Listing 16
In this example, we again only have a single bean pot object shared between the two recipes and MFC correctly serializes this once when writing it out and correctly reconstructs a shared bean pot when reconstructing (deserializing) the lists.
So you have heard me complain about pointers. Yeah, I don't like them: they make are code slow, buggy and complex. But I have to admit, you cannot really exploit inheritance without them. (Actually, pointers are tolerable if they are encapsulated as sugggested previously). And you will be hearing me complain more about pointers and MFC. But presently, lets take a look at where MFC really shines: storing instances of descendant classes.
So suppose you store pointers to objects instead of objects and you implement the Serialize, SerializeElements functions correctly. Then suppose you need to enhance some of your objects with additional data members. What now? Well you are in luck!
001 CObject* p = new II;
002 CFile out("inventory.dat",
003 CFile::modeCreate|CFile::modeWrite);
004 CArchive ar(&out, CArchive::store);
005 p->Serialize(ar);
006 delete p;
Listing 17
Now, six months later, you want to read the object back. You wisely use pointers!
001 CObject* p = 0;
002 CFile in("recipes.dat", CFile::modeRead);
003 CArchive ar ( &in, CArchive::load);
004 p->Serialize(ar); // Ooops! Error!
Listing 18
So why was it so wise to use pointers? Because one year later, we don't know if there is still only an II object, or a descendant of II that requires room for many additional data members. But the above code doesn't work! Here's the solution:
001 CObject* p = 0;
002 CFile in("recipes.dat", CFile::modeRead);
003 CArchive ar ( &in, CArchive::load);
004 ar >> p;
Listing 19
Ah Hah! Magic! Now we see the power of the CArchive class! It stored the class information with the object so even if there was a descendant of class II stored in the file, we could continue to treat it as an instance of II. MFC allows us to implement a key aspect of object-oriented programming sometimes known as the Liskov Substitution Principle: an instance of any descendant can be treated as an ancestor. This was no simple feat on behalf of MFC.
Oh, and while we are here, lets ask that question again: "When do we use the serialize member function and when do we use the overloaded extraction operator?" Simple: In listings 18 and 19 there was no serialize member function because there was no object to be a member of! Therefor, we use the extraction operator which passes the pointer by reference.
When I teach C++, I am very emphatic that copy constructors and assignment operators should be used as needed. So another reason I like to avoid pointers is that I can often avoid the extra code for copy constructors and overloaded assignment operators. In our examples in listings 3 and 9 we carefully chose data members that were not implemented (directly) with pointers. This eliminates the need for us to implement copy constructors and assignment operator. Careful examination of line 5 of listing 5 and line 6 of listing 6 reveals that MFC will make copies of our objects and if we don't want shallow copies, we may need to implement copy constructors and assignment operators. I dearly regret this takes a very sharp eye.
Well, I wish that the authors of MFC had attended my C++ class. While MFC requires that we class users implement copy constructors and assignment operators for the classes to be stored in the MFC containers, guess who forgot to implement copy constructors and assignment operators themselves!
Let's consider these examples:
001 CMap<WORD, WORD, CMap<WORD, CPoint>, CMap<WORD, CPoint>&> picture, p2; 002 CList<CList<CString, CString&>, CList<CString, CString&>&> tasks, t2; 003 CArray<CArray<Piece, Piece&>, CArray<Piece, Piece&>&> chessboard, b2;
Listing 20
Aside from being a bit tedious, these definitions don't work. Even if we could compile and execute listing 20, would listing 21 work? No!
001 picture = p2; 002 tasks = t2; 003 chessboard = b2;
Listing 21
This ability to nest containers is extremely useful. How can we overcome these limitations? I tried implementing the CopyElements function, but this did not solve my problem.
A solution that did work was to create descendant classes List, Array and Map that have copy constructors and assignment operators implemented. And, since I really wasn't impressed with all of those superfluous template arguments, I did away with them too. Listing 22 shows an example of using my new classes.
001 Map<WORD, Map<WORD, CPoint> > picture, p2; 002 List<List<CString> > tasks, t2; 003 Array<Array<Piece> > chessboard, b2; 004 picture = p2; 005 tasks = t2; 006 chessboard = b2; 007 CFile f(...); CArchive ar(&f, ...); 008 p2.Serialize(ar); t2.Serialize(ar);
Listing 22
Incidentally, the Standard Template Library (STL) allows one to implement nested containers as well. However, I ran into difficulties when going beyond two dimensions.
Let's suppose we want to store airline routes where row gives the origin, the column gives the destination and layer gives the airline name.
Layer 1
Layer 2
What happens if we want to test to see if a certain route exists? Here is one approach that does not work!
001 Map<CString,Map<CString,Map<CString, WORD> > > routes;
002 if(routes["SetOPants"]["Timbuktu"].Lookup( "Katmandu",m)){...}
Listing 23
Well this does not work because it will add a row for Timbuktu even if it does not exist. Here is the ugly solution:
001 Map<CString, WORD> z;
002 Map<CString, Map<CString, WORD> > y;
003 Map<Cstring, Map<CString, Map<CString, WORD> > > x;
004 if(routes.Lookup("SetOPants", x)){
005 if(routes["SetOPants"].Lookup("Timbuktu",y))
006 if(routes["SetOPants"]["Timbuktu"].Lookup(
007 "Katmandu",z))
008 cout << routes["SetOPants"]["Timbuktu"]
009 ["Katmandu"] << endl;
010 }
011 else cout << "No such route"<<endl;
Listing 24
This is not only ugly, it's slow too!
Listing 25 is a prettier solution, but it requires some enhancements to our descendant class.
001 try{
002 const Map<CString, Map<CString, Map<CString, DWORD> > >& tmp=route;
003 cout << "Distance is " <<
004 tmp["SetOPants"]["Timbuktu"]["Katmandu"] << endl;
005 }
006 catch(KeyNotFound&){
007 cout << "No such route." << endl;
008 }
Listing 25
Here we are creating a constant reference to an existing object. We should be able to access this like the original since it is the same type. However, should we try to index into a layer, row or column that does not exist, we should get an exception or some sort of error. Unfortunately, we do not. However, listing 25 shows how we can enhance our descendant class to correct this problem.
Technically, listing 25 is illegal because exceptions are reserved for exceptional cases, not code in the critical path. I believe this is to give compiler vendors an excuse to not optimize exception handling code. Therefor, we might be doomed to using the technique in listing 24 if we are concerned about this. Personally, however, I find the ugliness of listing 24 making me lean to towards listing 25 - especially when you consider how slow listing 24 is with all of its superfluous copies. Listing 26 shows how listing 25 can be implemented:
001 struct KeyNotFound{};
002 template <class K, class V>
003 class Map : public CMap<K,K&,V,V&>{
004 public:
005 V& operator[](const K& k){ return CMap<K,K&,V,V&>::operator[]((K&)k); }
006 const V& operator[](const K& k)const{
007 static V v;
008 if( !Lookup((K&)k, v) )
009 throw KeyNotFound();
010 return v;
011 }
012 ...
013 };
Listing 26
Note that function on line 5. It looks superfluous but it is necessary to accommodate the fact that C++ requires all overloading must be done on the same generation.
Needless to say, there is a price to pay for this feature. On line 5 I am using a static variable, and this is not thread safe. However, one could use the new __declspec(thread) feature (a Microsoft specific extension to the C language) to make line 8 of listing 26 thread safe. Another option would be to remove the static declaration and remove the first const from line 6. This could be very expensive, however.
Now we see the true power of MFC's serialization for both the legacy inheritance approach and the new template based approach: We can arbitrarily nest maps inside of maps inside of lists inside of arrays ad infinitum and save and restore our data with a single and simple call to the serialize member function. This is wonderful and I think it makes all our pain worthwhile. Hopefully, some library like STL will become mainstream and provide the serialization feature in a much less perilous way. In the meantime however, MFC will be the choice of tools for many developers.
Well, nesting containers inside of containers is very powerful but there is another problem: those copy constructors and assignment operators we had to add to our descendants Map, List, Array really slow us down - especially when we have three and four dimensional containers. The culprits are, of course, our old friends:
001 template<class K, k, V, v> 002 POSITION CMap::GetNextAssoc(POSITION&, k, v); 003 template<class V, v> 004 V CList::GetNext(POSITION&);
Listing 27
The fact that these require superfluous copies be made can really slow us down. So what is the solution? Enhance the descendant classes to include these overloaded functions:
001 template<class K, V> 002 POSITION Map::GetNextAssoc(POSITION&, K*, V*); 003 template<class V, v> 004 V* List::GetNext(POSITION&);
Listing 28
Using this approach of passing by pointers instead of by reference, these functions need not call the assignment operator or the copy constructor to create superfluous copies. Needless to say, when you implement these functions you will have to access the protected data members and hope that MFC never changes the implementation of CList and CMap. Fortunately, MFC did not declare any of the data members to be private so this approach is quite feasible.
Well, it's been a long and strange trip to the land of MFC serialization and containers. Clearly, there are things Microsoft could have done better. You've heard me rant and rave. MFC has some serious shortcomings. Clearly, other class libraries such as the Standard Template Library (STL) do many things better. However, one feature keeps me coming back to MFC: serialization. STL does not have it. Other class libraries that do implement serialization must be purchased and do not have the market share that Microsoft does.