Efficient Representation of Shared Data

by Kimberley Burchett, September 18, 2002

Update May 15, 2003: I mentioned this article in a thread on Lambda the Ultimate, and Oleg came up with an alternate technique that is not described here, and will usually be a better solution. The technique is described here (the second of the two alternatives is the better one; the first won't work).

Introduction

There are many programming problems that involve several objects all of which want to share some data that they have in common. This document describes a technique for achieving this with minimal memory usage. To be precise, if you have N objects, the overhead will be

N + sizeof (int) + sizeof (pointer) bytes.

This compares favorably with the traditional solution, which requires

N * sizeof (pointer) bytes.

This technique can only be used in languages that permit raw pointer arithmetic. In C++, it can be completely isolated behind a regular class interface.

Traditional Technique #1

Nearly every non-trivial program requires sharing data between multiple objects. The traditional way of approaching this problem is to store a pointer to the shared data, as illustrated by the following C code:

	struct SharedData {
		/* shared data */
	};

	struct MyObject {
		SharedData *sharedData;
		/* non-shared data */
	};

This technique clearly requires overhead equal to one pointer for every object. Perhaps the most pervasive use of this technique is the implementation of C++ virtual function pointers, where the shared data is the virtual function table.

Traditional Technique #2

An alternate technique, but one which loses some flexibility, is to "reverse" the direction of the pointer, and store it in the shared data instead:

	struct SharedData {
		/* shared data */
		MyObject *objects;
		int numObjects;
	};

	struct MyObject {
		/* non-shared data */
	};

This technique is clearly more efficient in terms of memory usage; it has a fixed overhead of one pointer and one integer, amortized over several objects.

The problem with is technique that you lose the ability to refer directly to objects, since it is impossible to get from an object to its shared data. If the shared data is essential for supporting the API of the object, then the object can no longer implement its own API, and must instead rely on the shared data object to implement the API. For this reason, this technique is more appropriate for implementing containers than for true shared data.

On the other hand, this technique has an advantage that the first technique does not have: it is possible to refer to the entire collection of objects, simply by referring to the "container".

Efficient Technique

This technique has the advantages of both of the previous techniques: objects are individually addressable, and it is possible to refer to an entire collection of objects with a single pointer.

The two traditional techniques discussed above can be thought of as being variations on the following picture:

In the first traditional technique the pointers go from right to left, while in the second they go from left to right. The second traditional technique also allocates all the objects contiguously in memory, so that a single pointer is sufficient to refer to all of them.

The "efficient" technique effectively has pointers in both directions. It does this by making it possible to recover the address of the shared data, given a pointer to the non-shared data.

Memory is layed out like so:

In C:

	struct SharedData {
		/* ... shared data ... */
		int numObjects;
		ObjectData *objData;
		unsigned char objectIndices[0];
	};
	struct ObjectData {
		/* ... non-shared data ... */
	};

There are a couple tricks going on here. First, the SharedData object is variable-sized. It contains one additional byte for every object that shares it, and these bytes are given consecutive values starting at zero. This implies that a given SharedData object can have at most 256 objects referring to it (if this is a real limitation, then you can simply make the SharedData object contain a pointer to the actual shared data, which can then be reused an arbitrary number of times).

The second trick is that a "pointer to an object" doesn't actually point directly to the data for that object. Instead, it points into the objectIndices array in the SharedData object. Since the elements in the objectIndices array are given consecutive values, we can use address arithmetic to get from the object pointer to the SharedData object:

	SharedData *get_shared_data(Object *object) {
		unsigned char *index = (unsigned char *)object;
		return (SharedData *)
			(index - *index - sizeof(SharedData));
	}
	ObjectData *get_nonshared_data(Object *object) {
		unsigned char *index = (unsigned char *)object;
		return &get_shared_data(object)->objData[*index];
	}

Note that the call to get_shared_data() from within get_nonshared_data() should be inlined (either by the compiler or by hand). The extra level of indirection (from object pointer, to shared data, to object data) can also cause a small performance penalty, and should be considered when deciding whether to use the technique.

Advantages and Disadvantages

  Traditional #1 Traditional #2 Efficient
Objects individually addressable? yes no yes
Objects collectively addressable? no yes yes
Objects can access shared data? yes no yes
Objects allocatable one at a time? yes no no
Simple? yes yes no
Objects can be variable sized? yes no no
Levels of indirection 1 1 2
Memory overhead sizeof(ptr) * N sizeof(ptr) + sizeof(int) N + sizeof(ptr) + sizeof(int)

Source Code

SharedData *create_shared_data(int numObjects) {
	SharedData *result = (SharedData *)
		malloc(sizeof(SharedData)+numObjects);
	result->numObjects = numObjects;
	for (int i=0; i < numObjects; i++) {
		result->objectIndices[i] = i;
	}
	/* other initialization code ... */
}

Object *get_object(SharedData *shared, int objectIndex) {
	return (Object *)&shared->objectIndices[objectIndex];
}

SharedData *get_shared_data(Object *object) {
	unsigned char *index = (unsigned char *)object;
	return (SharedData *)
		(index - *index - sizeof(SharedData));
}

ObjectData *get_nonshared_data(Object *object) {
	unsigned char *index = (unsigned char *)object;
	return &get_shared_data(object)->objData[*index];
}

Feedback

You can email the author at: kimbly -at- kimbly -dot- com.
Back to Kimberley's Code.
Back to Kimberley's Home Page.