狭路相逢勇者胜: Inside the C++ Object Model

Inside The C++ Object Model by Stanley B. Lippman

Chapter 1. Object Lessons
Chapter 2. The Semantics of Constructors
Chapter 3. The Semantics of Data
Chapter 4. The Semantics of Functions
Chapter 5. The Semantics of Construction, Destruction, and Copy

Chapter 1. Object Lessons

Stroustrup's original (and still prevailing) C++ Object Model is derived from the simple object model by optimizing for space and access time. Nonstatic data members are allocated directly within each class object. Static data members are stored outside the individual class object. Static and nonstatic function members are also hoisted outside the class object. Virtual functions are supported in two steps:

A table of pointers to virtual functions is generated for each class (this is called the virtual table).

A single pointer to the associated virtual table is inserted within each class object (traditionally, this has been called the vptr). The setting, resetting, and not setting of the vptr is handled automatically through code generated within each class constructor, destructor, and copy assignment operator. The type_info object associated with each class in support of runtime type identification (RTTI) is also addressed within the virtual table, usually within the table's first slot.
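One rough way to observe this model is to compare object sizes. The sketch below uses hypothetical classes (PlainPoint, StaticPoint, and VirtualPoint are illustrative names, not from the book). Exact sizes are implementation-dependent, but on mainstream compilers static members and member functions add nothing to the object, while virtual functions add a hidden vptr.

```cpp
#include <cstddef>

// Only nonstatic data members occupy space inside the object.
struct PlainPoint {
    float x;
    float y;
};

// Static data and ordinary member functions are hoisted outside the object.
struct StaticPoint {
    float x;
    float y;
    static int point_count;
    float sum() const { return x + y; }
};
int StaticPoint::point_count = 0;

// Virtual functions make the compiler add a hidden vptr to each object.
struct VirtualPoint {
    float x;
    float y;
    virtual ~VirtualPoint() {}
    virtual float sum() const { return x + y; }
};

// Extra bytes per object attributable to virtual support on this implementation.
std::size_t virtual_overhead() {
    return sizeof(VirtualPoint) - sizeof(PlainPoint);
}
```

On a typical 64-bit implementation the overhead is one pointer, possibly plus alignment padding.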

The data members within a single access section are guaranteed within C++ to be laid out in the order of their declaration. The layout of data contained in multiple access sections, however, is left undefined. Similarly, the layout of data members of the base and derived classes is left undefined.

Polymorphism, the potential to be of more than one type, is not physically possible in directly accessed objects. Only the indirect manipulation of the object through a pointer or reference supports the polymorphism necessary for OO programming. The actual type of the object addressed is not resolved in principle until runtime at each particular point of execution. A pointer and a reference support polymorphism because they do not involve any type-dependent commitment of resources.

The memory requirements to represent a class object in general are the following:
1. The accumulated size of its nonstatic data members
2. Plus any padding (between members or on the aggregate boundary itself) due to alignment constraints (or simple efficiency)
3. Plus any internally generated overhead to support the virtuals.

The memory requirement to represent a pointer, however, is a fixed size regardless of the type it addresses.
ZooAnimal * px;
int * pi;
Array <> *pta;

In terms of memory requirements, there is generally no difference: all three need to be allocated sufficient memory to hold a machine address (usually a machine word). The type of a pointer instructs the compiler as to how to interpret the memory found at a particular address and also just how much memory that interpretation should span.
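The point can be checked with sizeof. In this sketch ZooAnimal and Array are minimal hypothetical stand-ins for the classes named above, and Array<int> is an assumed instantiation. Equal pointer sizes hold on mainstream platforms but are not guaranteed by the language.

```cpp
#include <cstddef>

// Hypothetical stand-in for the book's ZooAnimal class.
class ZooAnimal {
public:
    virtual ~ZooAnimal() {}
    int loc;
};

// Hypothetical minimal template, for illustration only.
template <typename T>
class Array {
public:
    T* data;
    std::size_t size;
};

// True when all three object pointers occupy the same number of bytes.
bool pointers_share_size() {
    return sizeof(ZooAnimal*) == sizeof(int*) &&
           sizeof(int*) == sizeof(Array<int>*);
}

// The pointee type decides how many bytes an access through the pointer spans.
std::size_t span_of_zoo_animal() { return sizeof(ZooAnimal); }
std::size_t span_of_int()        { return sizeof(int); }
```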

What address space does the object addressed by a void* pointer span? We don't know. That's why a pointer of type void* can only hold an address and not actually operate on the object it addresses.

So a cast in general is a kind of compiler directive. In most cases, it does not alter the actual address a pointer contains. Rather, it alters only the interpretation of the size and composition of the memory being addressed.
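The "in most cases" qualifier matters. Under multiple inheritance, converting a derived pointer to one of its base types may add an offset to the address. A minimal sketch, using hypothetical classes Left, Right, and Both:

```cpp
#include <cstddef>

struct Left  { int left_value; };
struct Right { int right_value; };
struct Both : Left, Right { int own_value; };

// Byte offset of the Left subobject inside a Both object.
std::ptrdiff_t left_offset(const Both& both) {
    const Left* as_left = static_cast<const Left*>(&both);
    return reinterpret_cast<const char*>(as_left) -
           reinterpret_cast<const char*>(&both);
}

// Byte offset of the Right subobject inside a Both object.
std::ptrdiff_t right_offset(const Both& both) {
    const Right* as_right = static_cast<const Right*>(&both);
    return reinterpret_cast<const char*>(as_right) -
           reinterpret_cast<const char*>(&both);
}
```

The two base subobjects cannot share an address, so at least one of these conversions has to change the pointer value. On common implementations it is the conversion to Right.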

The compiler intercedes in the initialization and assignment of one class object with another. The compiler must ensure that if an object contains one or more vptrs, those vptr values are not initialized or changed by the source object.

Any attempt to alter the actual size of the object, however, violates the contracted resource requirements of its definition. When a base class object is directly initialized or assigned with a derived class object, the derived object is sliced to fit into the available memory resources of the base type.
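Slicing and the vptr rule can be seen together in a small sketch. ZooAnimal and Bear follow the book's naming, but the members and the kind() function here are assumptions made for illustration.

```cpp
#include <string>

struct ZooAnimal {
    virtual ~ZooAnimal() {}
    virtual std::string kind() const { return "ZooAnimal"; }
};

struct Bear : ZooAnimal {
    std::string kind() const override { return "Bear"; }
    int cave_count = 0;   // hypothetical derived-class member, lost when sliced
};

// Indirect access: the call is resolved at runtime through the vptr.
std::string kind_via_reference(const ZooAnimal& animal) {
    return animal.kind();
}

// Direct initialization of a base object: only the ZooAnimal part is copied,
// and the new object's vptr addresses ZooAnimal's virtual table, not Bear's.
std::string kind_after_slicing(const Bear& bear) {
    ZooAnimal sliced = bear;
    return sliced.kind();
}
```

Given a Bear object, kind_via_reference yields "Bear" while kind_after_slicing yields "ZooAnimal".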

Chapter 2. The Semantics of Constructors (key chapter for close study)

The keyword explicit, in fact, was introduced into the language in order to give the programmer a method by which to suppress application of a single argument constructor as a conversion operator. "Behind the back" type of activities are much more likely to occur in support of memberwise initialization or in the application of what is referred to as the named return value optimization (NRV).

2.1 Default Constructor Construction

"Default constructors...are generated (by the compiler) where needed..." Needed by whom? To do what?

Global objects are guaranteed to have their associated memory "zeroed out" at program start-up. Local objects allocated on the program stack and heap objects allocated on the free-store do not have their associated memory zeroed out; rather, the memory retains the arbitrary bit pattern of its previous use.
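A short sketch of the distinction, using a hypothetical Point3d type. Reading an uninitialized local is undefined behavior, so the local case is shown only in its safe, explicitly initialized form.

```cpp
struct Point3d { float x, y, z; };

// Objects with static storage duration are zero-initialized at program start-up.
Point3d global_origin;
int global_counter;

bool globals_start_zeroed() {
    return global_origin.x == 0.0f && global_origin.y == 0.0f &&
           global_origin.z == 0.0f && global_counter == 0;
}

// A local object gets no such guarantee, so it is initialized explicitly
// before any of its members are read.
Point3d make_local_origin() {
    Point3d local = {0.0f, 0.0f, 0.0f};
    return local;
}
```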

When is a default constructor synthesized, then? Only when the implementation needs it.

The standard states the following:

If there is no user-defined constructor for class X, a default constructor is implicitly declared... A constructor is trivial if it is an implicitly declared default constructor...

A nontrivial default constructor is one that is needed by the implementation and, if necessary, is synthesized by the compiler. There are four conditions under which the default constructor is nontrivial.

1) Member Class Object with Default Constructor
2) Base Class with Default Constructor
3) Class with a Virtual function
4) Class with a Virtual Base Class

1) Member Class Object with Default Constructor

If a class without any constructors contains a member object of a class with a default constructor, the implicit default constructor of the class is nontrivial and the compiler needs to synthesize a default constructor for the containing class.

class Foo { public: Foo(); Foo(int); ... };
class Bar { public: Foo foo; char *str; };
void foo_bar()
{
Bar bar; // Bar::foo must be initialized here...
if ( bar.str ) { } ...
}

The synthesized default constructor contains the code necessary to invoke the class Foo default constructor on the member object Bar::foo, but it does not generate any code to initialize Bar::str. Initialization of Bar::foo is the compiler's responsibility; initialization of Bar::str is the programmer's.

inline Bar::Bar()
{
//Pseudo C++ code
foo.Foo::Foo();
}

The synthesized default constructor meets only the needs of the implementation, not the needs of the program. Suppose the programmer provides for the initialization of str via the following default constructor:

//Programmer defined default constructor
Bar::Bar () { str = 0;}

Because the default constructor is explicitly defined, the compiler cannot synthesize a second instance to do its work. In this case, the compiler augments the existing constructors, inserting code that invokes the necessary default constructors prior to the execution of the user code.

//Augment default constructor
Bar::Bar ()
{
foo.Foo::Foo(); //augmented compiler code
str = 0; //explicit user code
}

For multiple class members requiring constructor initialization, the compiler will insert code within each constructor, invoking the associated default constructors for each member in the order of member declaration. This code is inserted just prior to the explicitly supplied user code.
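The ordering can be observed by logging from each constructor. The sketch below is an illustration, not compiler output: the class names and the construction_log() helper are hypothetical.

```cpp
#include <string>

// Shared log that records the order in which constructors run.
std::string& construction_log() {
    static std::string log;
    return log;
}

struct Dopey  { Dopey()  { construction_log() += "Dopey;";  } };
struct Sneezy { Sneezy() { construction_log() += "Sneezy;"; } };

struct SnowWhite {
    Dopey  dopey;    // declared first, so constructed first
    Sneezy sneezy;   // declared second
    int    mumble;

    // The user code only sets mumble; the calls to the member default
    // constructors are inserted by the compiler ahead of the body.
    SnowWhite() : mumble(2048) { construction_log() += "SnowWhite body;"; }
};

std::string build_snow_white() {
    construction_log().clear();
    SnowWhite queen;
    (void)queen;
    return construction_log();
}
```

build_snow_white() returns "Dopey;Sneezy;SnowWhite body;", matching declaration order followed by the user code.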

2) Base Class with Default Constructor

The synthesized default constructor of the derived class invokes the default constructor of each of its immediate base classes in the order of their declaration.

What if the designer provides multiple constructors but no default constructor? The compiler will augment each constructor with the code necessary to invoke all required default constructors.

3) Class with a Virtual function

There are two additional cases in which a synthesized default constructor is needed:
3.1 The class either declares (or inherits) a virtual function
3.2 The class is derived from an inheritance chain in which one or more base classes are virtual.

The following two class "augmentations" occur during compilation:

1. A virtual function table is generated and populated with the addresses of the active virtual functions for that class. (Open question: how is a function determined to be "active"?)

2. Within each class object, an additional pointer member (the vptr) is synthesized to hold the address of the associated class vtbl.

// widget.flip() is transformed into (pseudo C++ code):
( *widget.vptr[ 1 ] )( &widget );

In classes that do not declare any constructors, the compiler synthesizes a default constructor in order to correctly initialize the vptr of each class object.
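A minimal sketch of why this matters: none of the classes below declares a constructor, yet virtual dispatch works because each object's vptr has been set by the implicit constructor. Widget, Bell, and Whistle follow the book's names; the string return values and the call_flip() helper are assumptions added for illustration.

```cpp
#include <string>

struct Widget {
    virtual ~Widget() {}
    virtual std::string flip() const { return "Widget::flip"; }
};

struct Bell : Widget {
    std::string flip() const override { return "Bell::flip"; }
};

struct Whistle : Widget {
    std::string flip() const override { return "Whistle::flip"; }
};

// The call is resolved through the vptr stored in whatever object
// the reference addresses.
std::string call_flip(const Widget& widget) {
    return widget.flip();
}
```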

4) Class with a Virtual Base Class

We need to make the virtual base class location within each derived class object available at runtime. All reference and pointer access of a virtual base class is achieved through the associated pointer. For each constructor the class defines, the compiler inserts code that permits runtime access of each virtual base class.
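A small diamond hierarchy illustrates the problem. In this sketch, modeled loosely on the book's X, A, B, C example, the helper function names are hypothetical. An A* may address either a complete A object or the A part of a C object, so the location of X::i cannot be fixed at compile time.

```cpp
struct X { int i = 0; };
struct A : virtual X { int j = 0; };
struct B : virtual X { int d = 0; };
struct C : A, B      { int k = 0; };

// pa may address an A or the A subobject of a C; the position of the
// shared X subobject is looked up at runtime.
void set_through_a(A* pa, int value) { pa->i = value; }
int  read_through_b(const B* pb)     { return pb->i; }

// Both paths reach the single shared X subobject of a C object.
bool virtual_base_is_shared() {
    C c;
    set_through_a(&c, 1024);
    return read_through_b(&c) == 1024 && c.i == 1024;
}
```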

There are four characteristics of a class under which the compiler needs to synthesize a default constructor for classes that declare no constructor at all. The Standard refers to these as implicit, nontrivial default constructors. The synthesized constructor fulfills only an implementation need. It does this by invoking member object or base class default constructors or initializing the virtual function or virtual base class mechanism for each object. Classes that do not exhibit these characteristics and that declare no constructor at all are said to have implicit, trivial default constructors. In practice, these trivial default constructors are not synthesized.

Within the synthesized default constructor, only the base class subobjects and member class objects are initialized. All other nonstatic data members, such as integers, pointers to integers, arrays of integers, and so on, are not initialized. These initializations are needs of the program, not of the implementation. If there is a program need for a default constructor, such as initializing a pointer to 0, it is the programmer's responsibility to provide it in the course of the class implementation.

Programmers new to C++ often have two common misunderstandings:

That a default constructor is synthesized for every class that does not define one

That the compiler-synthesized default constructor provides explicit default initializers for each data member declared within the class

As you have seen, neither of these is true.
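The four conditions can be checked with the standard type traits. This is a sketch under the assumption that std::is_trivially_default_constructible is an adequate proxy for the book's "trivial default constructor". All class names are hypothetical.

```cpp
#include <string>
#include <type_traits>

struct PlainData      { int value; char* str; };            // none of the four conditions
struct HasMember      { std::string name; int value; };     // 1) member with a default constructor
struct Base           { Base() {} };
struct HasBase        : Base { int value; };                // 2) base with a default constructor
struct HasVirtual     { virtual void f() {} int value; };   // 3) virtual function
struct VBase          { int v; };
struct HasVirtualBase : virtual VBase { int value; };       // 4) virtual base class

// True when the implicit default constructor has real work to do.
template <typename T>
constexpr bool has_nontrivial_default_ctor() {
    return !std::is_trivially_default_constructible<T>::value;
}

static_assert(!has_nontrivial_default_ctor<PlainData>(), "trivial: nothing to synthesize");
static_assert(has_nontrivial_default_ctor<HasMember>(), "member object case");
static_assert(has_nontrivial_default_ctor<HasBase>(), "base class case");
static_assert(has_nontrivial_default_ctor<HasVirtual>(), "virtual function case");
static_assert(has_nontrivial_default_ctor<HasVirtualBase>(), "virtual base case");
```

Even in the nontrivial cases, the int members are left uninitialized by the synthesized constructor.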

2.2 Copy Constructor Construction

A copy constructor is a constructor requiring a single argument of its own class type: X::X (const X &);

There are three program instances in which a class object is initialized with another object of its class.

1) an object's explicit initialization
class X {...};
X x;
X xx = x; //explicit initialization of one class object with another

2) an object is passed as an argument to a function

3) when a function returns a class object.

This may result in the generation of a temporary class object or the actual transformation of program code (or both).

Default memberwise Initialization

What if the class does not provide an explicit copy constructor?
Default memberwise initialization copies the value of each built-in or derived data member (such as a pointer or an array) from one class object to another. A member class object, however, is not copied; rather, memberwise initialization is recursively applied.

In practice, a good compiler can generate bitwise copies for most class objects since they have bitwise copy semantics.

The standard states that:
A class object can be copied in two ways, by initialization and by assignment. Conceptually, these two operations are implemented by a copy constructor and a copy assignment operator.

The standard distinguishes between a trivial and a nontrivial copy constructor. It is only the nontrivial instance that in practice is synthesized within the program. The criterion for determining whether a copy constructor is trivial is whether the class exhibits bitwise copy semantics. A default copy constructor need not be synthesized if the declaration exhibits bitwise copy semantics, and the initialization need not result in a function call.

There are four instances in which bitwise copy semantics are not exhibited by a class:
1) When the class contains a member object of a class for which a copy constructor exists (either explicitly declared by the class designer or synthesized by the compiler).
2) When the class is derived from a base class for which a copy constructor exists
3) When the class declares one or more virtual functions (the vptr must be initialized correctly)
4) When the class is derived from an inheritance chain in which one or more base classes are virtual.

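Resetting the Virtual Table Pointer: when a base class object is explicitly initialized with a derived class object, the vptr must be set correctly so that it points to the base class's virtual table.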

Handling the Virtual Base Class Subobject: The problem is not when one object of a class is initialized with a second object of the same exact class. It is when an object is initialized with an object of one of its derived classes.
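The four cases where bitwise copy semantics fail can be checked in the same spirit. The sketch assumes std::is_trivially_copy_constructible is the closest standard notion to the book's "bitwise copy semantics". Word follows the book's example; the other class names are hypothetical.

```cpp
#include <string>
#include <type_traits>

struct Word            { int occurs; char* str; };         // bitwise copy is enough
struct WordWithString  { int occurs; std::string str; };   // 1) member with a copy constructor
struct Base            { Base() {} Base(const Base&) {} };
struct DerivedFromBase : Base { int value; };              // 2) base with a copy constructor
struct WithVirtual     { virtual ~WithVirtual() {} int value; };   // 3) virtual function
struct VirtualBase     { int v; };
struct WithVirtualBase : virtual VirtualBase { int value; };       // 4) virtual base class

// True when the copy constructor is trivial, so no function needs to be
// synthesized or called to copy the object.
template <typename T>
constexpr bool copies_bitwise() {
    return std::is_trivially_copy_constructible<T>::value;
}

static_assert(copies_bitwise<Word>(), "plain data copies bitwise");
static_assert(!copies_bitwise<WordWithString>(), "member object case");
static_assert(!copies_bitwise<DerivedFromBase>(), "base class case");
static_assert(!copies_bitwise<WithVirtual>(), "virtual function case");
static_assert(!copies_bitwise<WithVirtualBase>(), "virtual base case");
```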

2.3 Program Transformation Semantics

Application of the copy constructor requires the compiler to more or less transform portions of your program.

Explicit Initialization

X x0;
void foo_bar()
{
X x1 (x0);
X x2 = x0;
X x3 = X(x0);
}

The required program transformation is two-fold:
1. Each definition is rewritten with the initialization stripped out
2. An invocation of the class copy constructor is inserted

void foo_bar()
{
X x1;
X x2;
X x3;

// Compiler inserted invocations of copy constructor for X
x1.X::X(x0);
x2.X::X(x0);
x3.X::X(x0);
}
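The inserted copy constructor calls can be made visible with a counter. In this sketch the copy_count member and the copies_in_foo_bar() helper are hypothetical additions to the X class of the example.

```cpp
struct X {
    static int copy_count;
    X() {}
    X(const X&) { ++copy_count; }
};
int X::copy_count = 0;

// Counts the copy constructor invocations for the three forms of
// explicit initialization.
int copies_in_foo_bar() {
    X x0;
    X::copy_count = 0;

    X x1(x0);       // direct initialization
    X x2 = x0;      // copy initialization
    X x3 = X(x0);   // initialization from an explicit temporary

    (void)x1; (void)x2; (void)x3;
    return X::copy_count;
}
```

The count is at least three. It is exactly three when the temporary in the third form is elided, which C++17 requires and earlier compilers typically do anyway.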

Argument Initialization

The standard states that passing a class object as an argument to a function ( or as that function's return value) is equivalent to the following form of initialization:

X xx = arg;

where xx represents the formal argument (or return value) and arg represents the actual argument.

void foo (X x0);
X xx;
//...
foo (xx);

One implementation strategy is to introduce a temporary object, initialize it with a call of the copy constructor, and then pass that temporary object to the function. The previous code fragment would be transformed as follows:

X __temp0;
__temp0.X::X(xx);
foo (__temp0); // the declaration of foo() must also be transformed as void foo( X& x0);

Another implementation is to copy construct the actual argument directly onto its place within the function's activation record on the program stack.
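Whichever strategy the compiler picks, the observable effect is one copy constructor call per by-value argument. A minimal sketch, with a hypothetical copy_count member and helper functions added for illustration:

```cpp
struct X {
    static int copy_count;
    X() {}
    X(const X&) { ++copy_count; }
};
int X::copy_count = 0;

void foo_by_value(X x0)            { (void)x0; }
void foo_by_reference(const X& x0) { (void)x0; }

// Passing an existing object by value copy constructs the parameter.
int copies_when_passed_by_value() {
    X xx;
    X::copy_count = 0;
    foo_by_value(xx);
    return X::copy_count;
}

// Passing by reference binds to the argument; no copy is made.
int copies_when_passed_by_reference() {
    X xx;
    X::copy_count = 0;
    foo_by_reference(xx);
    return X::copy_count;
}
```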

Return Value Initialization

X bar()
{
X xx;
// process xx ...
return xx;
}

How might bar()'s return value be copy constructed from its local object xx? Stroustrup's solution in cfront is a two-fold transformation:
1. Add an additional argument of type reference to the class object. This argument will hold the copy constructed "return value."
2. Insert an invocation of the copy constructor prior to the return statement to initialize the added argument with the value of the object being returned.

void bar (X& __result)
{
X xx;
xx.X::X(); // compiler generated invocation of default constructor
//...processing xx
__result.X::X(xx); //compiler generated invocation of copy constructor
return;
}

Optimization at the Compiler Level

Named Return Value (NRV) optimization: substituting the __result for the named return value xx. (All return statements return the same named value)

void bar(X& __result)
{
__result.X::X(); // default constructor invoked directly on __result; we save a copy constructor
//...processing in __result directly
return;
}

X xx = bar(); will be transformed into:
//note: no default constructor applied
X xx;
bar(xx);

bar().memfunc(); will be transformed into:
X __temp0; // compiler generated temporary
(bar(__temp0), __temp0).memfunc();

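Lippman describes the NRV optimization as an obligatory Standard C++ compiler optimization; strictly speaking, the Standard permits this form of copy elision but does not require it. In the implementation the book describes, the presence of a copy constructor "turns on" the NRV optimization.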
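Whether a particular compiler applies the NRV optimization can be observed by counting copies. In the sketch below, the value and copy_count members and the two helper functions are hypothetical. The count depends on the compiler and its flags, so no exact figure is claimed.

```cpp
struct X {
    static int copy_count;
    int value;
    X() : value(0) {}
    X(const X& other) : value(other.value) { ++copy_count; }
};
int X::copy_count = 0;

X bar() {
    X xx;
    xx.value = 42;   // stands in for "process xx"
    return xx;       // single named return value: a candidate for NRV
}

// Number of copy constructor calls needed to get bar()'s result into a
// local object. Zero means the NRV optimization was applied.
int copies_made_by_bar() {
    X::copy_count = 0;
    X result = bar();
    (void)result;
    return X::copy_count;
}

int value_returned_by_bar() {
    return bar().value;
}
```

With the optimization applied, xx is constructed directly in the storage of result and copies_made_by_bar() returns 0.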

The following three initializations are semantically equivalent:

X xxx0(1024);
//in the book's unoptimized model, the following two result in two constructor invocations, a temporary object, and a call to the destructor of class X on the temporary object (modern compilers elide the temporary, and C++17 requires it).
X xxx1 = X(1024);
X xxx2 = (X) 1024;

The second and third will be transformed into (shown here for xxx1):
X __temp0;
__temp0.X::X(1024);
xxx1.X::X(__temp0);
__temp0.X::~X();

The copy constructor: To Have or To Have Not?
If the default trivial copy constructor is efficient, there is no need to provide your own copy constructor. If you envision returning by value, then it makes excellent sense to provide an explicit inline instance of the copy constructor - that is, provided your compiler provides the NRV optimization.


Sunday, June 10, 2007
