SPECIALIZATION OF LIBRARY CODE Peter Sestoft, 2006-10-03 BACKGROUND: We have a library of generic collection classes for C#/.NET, called C5 (http://www.itu.dk/research/c5/). It comprises 27,000 lines of C# source code and 28,000 lines of unit tests. The library offers comprehensive functionality such as update event listeners, persistent (snapshottable) tree-based sets and bags, updatable slidable sublist views, hash-indexed linked lists, cached hashcodes of collections, and much more. PROBLEM: Library users pay for all this great functionality, at least in memory consumption, even when they don't use it. VISION: A user should be able to specify a specialized version of the library, or specialized versions of a class: "I want a linked list class with hash index and constant-time Count property, but no slidable views, no event listeners, no support for enumeration, and no caching of the list's hashcode". A tool should check the consistency of these choices and then generate an implementation that carries no superfluous data and performs no superfluous runtime actions. CHALLENGE: There are many techniques that can help achieve this vision (see below), but do we have the notations and tools that can do it in a practical manner, without getting in the way of maintenance and further development of the library source code? DESIRABLE PROPERTIES: - Presumably some form of annotation of the source code is needed for the tool to operate. Such annotation should not preclude maintenance and evolution of the library source code. - The consistency of such source code annotations should be checkable prior to any user choices being made. For instance, if a field is left out, it must be used nowhere in the specialized code; and if the field is included, all uses must be consistent with its type. - The library user's selection of properties should be checked for consistency, so that the generated code compiles and does not fail at run time. - It should be possible to specialize (or cut down) the unit test cases to suit a particular specialization of the library. RELEVANT TECHNIQUES: - Conditional compilation directives (#ifdef) - Backwards slicing to compute only the desired output - Partial evaluation, specialization and forwards slicing - Aspect-oriented programming - C# specific techniques: Conditional methods, partial classes - Larsen, Larsen and Wasowski's work on statechart specialization relative to restricted environments - Smaragdakis et al.'s Java type conditions -- do not reduce the memory consumption, though - Doug McIlroy's 1968 paper on "Software Components" seems to describe/dream about a scenario much like the one we want. EXAMPLE 1: SPECIALIZATIONS OF C5'S HASHED LINKED LISTS The C5 implementation of LinkedList and HashedLinkedList provides an example of the overhead and complications caused by combinations of the rich functionality. Below we show all the fields of a LinkedList instance, classifying them as follows: (c) fields support an efficient Count property (e) fields are used for event handling (f) fields support fail-early enumeration (e.g. foreach) (h) fields are used only for caching the hash code of the list (v) fields support updatable slidable sublist views. Unclassified fields support the core linked list functionality. Fields inherited from CollectionValueBase: (e) EventBlock events Fields inherited from CollectionBase: bool isSortedType bool isReadOnlyBase SCG.IEqualityComparer itemequalityComparer (h) int iUnSequencedHashCode (h) int iUnSequencedHashCodeStamp (c) int size (f) int stamp Fields inherited from SequencedBase: (h) int iSequencedHashCode (h) int iSequencedHashCodeStamp Fields declared in LinkedList itself: Node startsentinel Node endsentinel bool fifo (v) bool isValid (v) int offset (v) List underlying (v) WeakViewList> views (v) WeakViewList>.Node myWeakReference A LinkedList.Node is a reference type with three fields: T item Node prev Node next An instance of class HashedLinkedList has the following fields in addition to those inherited from CollectionValueBase, CollectionBase and SequencedBase: Fields declared in HashedLinkedList itself: HashDictionary dict (v) int taggroups Node startsentinel Node endsentinel bool fifo (v) bool isValid (v) int offset (v) List underlying (v) WeakViewList> views (v) WeakViewList>.Node myWeakReference A HashedLinkedList.Node is a reference type with five fields: T item Node prev Node next (v) int tag (v) TagGroup taggroup Hence in the user selection scenario outlined above, an implementation specialized to the user's needs can save 11 out of 19 word-size fields in each instance of hash-indexed linked list, and can save 2 out of 5 word-size fields in each item in such a list. Moreover, the runtime overhead of maintaining and checking the fields can be avoided. EXAMPLE 2: SPECIALIZATION OF C5'S TREE-BASED SETS The C5 implementation of persistent (snapshottable) tree-based sets presents another case of the memory and runtime costs of rich functionality. This overhead can be avoided if it is known in advance that a snapshot will never be made from a set instance. Below we show all the fields of a TreeSet instance (except those inherited from CollectionValueBase, CollectionBase, and SequencedBase), where (s) fields support constant-time snapshots Unclassified fields support the core tree set functionality. Fields declared in TreeSet itself: int blackdepth SCG.IComparer comparer int[] dirs (stack for iterative Add and Remove) (s) int generation (s) bool isSnapShot bool isValid Node[] path (stack for iterative Add and Remove) Node root (s) SnapRef snapList A TreeSet.Node instance has the following fields: (s) int generation T item (s) lastgeneration Node left (s) bool leftnode (s) Node oldref bool red Node right int size Hence if a tree-based set need not support snapshots, we can leave out 3 out of 19 word-size fields per tree set, and 4 out of 9 word-size fields per member of the set. Moreover, the runtime overhead of maintaining and checking the fields can be avoided. ------------------------------------------------------------