|
The Erwin library is a data structure template library for C and C++.
To avoid the C++ template mechanism (it should work for C, too), a Perl script generates C files that are compiled into a library.
The library is heavily used by projects of our research group and in commercial applications. It can be included in other libraries or in applications, works as a C and/or C++ and runs under Unix and Windows.
The implemented data types are vectors (dynamic array) and lists both of arbitrary element types and hash tables (maps) of arbitrary key and value types. The implementation is very efficient and also provides a sophisticated, easy to learn interface with a lot of useful functions, e.g. fprintf-like printing into vectors of char.
http://www.theiling.de/downloads/?name=erwin
http://www.theiling.de/refman/erwin2/
http://www.theiling.de/erwin-html/ (old version)
provide data structures for arbitrary key and value types in C and C++
be as type-safe as C/C++ allows
provide interoperability between C and C++ data types
provide the fastest possible implementation
hide the implementation details from the user
be extremely customisable
use as few as necessary features from C++ to be compilable on any C++ compiler and as many as needed to make life easy
have a stable interface -- don't change it with every release
have a stable implementation -- priority is on bug fixes now. New features are implemented only when they don't change the interface or implementation of existing features
allow development under Linux (or at least a GNU-infected Unix) and cross-compilation under other Unices and MS Windows and whatever with their native compilers.
make applications independent from the library (C files are generated from the templates)
never let the user get in touch with the ugly, ugly type void* and use as few type casts as possible
conform to the Linux file system standard
provide compile-time and run-time checks to warn about errors
be free for everyone
It is not designed to do the following: - be a pure C++ library not using #define
Wrt. the above goals it aims to be the ultimate implementation. :-) To achieve the goals, an instantiation mechanism is used for each data structure.
The library currently has over 60,000 lines of code where around 30,000 are for the data structures themselves, the rest for customisation, instantiation and configuration tools.
a C/C++ compiler (gcc is tested most extensively)
Perl 5 (the applications developed with Erwin do not need this)
GNU M4
autoconf, autoheader
this generates template instantiations for the data structures
a simple tool for extracting documentation from source files. It is adapted to Erwin template files. In the basic mode, it can extract documentation without relying on hot characters marking relevant comments, so you can possibly generate HTML docu from packages where no-one thought about docu when it was kicked off.
Further, this script can help you generate C++ interfaces more NULL safe and more type-safe.
Finally, it can generate a C interface for a C++ library.
generates macros in a form similar to GTKs g_return_val_if_fail. Only this provides a lot of features, e.g. printing variables in case of an error. These macros even work in the Linux kernel.
extracts identifiers of the form sym_... from source code and generates an include file for use with a symbol table implementation. This way, you can use hashed strings (=symbols) natively in C and C++.
There are NULL-safe versions of many POSIX functions found in <string.h>. This was because the author did not like that e.g. strcmp crashes in strcmp ("", NULL) on some OSs, but not on others.
There is a stable sort implementation: Global_erwin_merge_sort.
The library unifies access to library functions and language features that are different under Unix and Windos.
allow a user-defined way a data structure is copied, freed, hashed and compared
simple data types (int, char, double, pointers) can be stored un-boxed without ugly casts (e.g. void* to int or something like that)
structs and classes can directly be inserted into the structures (thus, without the need of a pointer to them, namely un-boxed)
in C++, data structures can be nested without restriction: boxed or un-boxed, hashed by the pointer or by recursive hashing the data structure, with deep-copying or flat only, etc.
there are sensible default values in order to keep customisation effort at a minimum. E.g. the library knows how to hash an int, a char, a pointer, a map of integers to char*, etc.
the data types of the user interface and the internal data types may be different. This enables complete hiding of copy/free actions.
type info is optionally available for all Erwin generated types.
types can be made deterministic (this is not automatically the case with optimally hashed maps)
C++: Dynamic String class:
FILE *f= fopen (VChar().format ("%s.txt", argv[0]), "rt");
Quotation:
FILE *f= popen ( VChar().format (FO_QUOTE_SHELL, "find . -name %s.txt", argv[0]), "r");
Iterators:
map_forall (m, k, v) { ... } map_forall_keys_sorted_by_values (m, k) { ... } vector_forall_values_ptr (v, k) { ... }
Automatically resized, arbitrary length arrays.
Some Special Features:
For vectors of char, a lot of special things are implemented, e.g., formatted printing into the vector is provided to prevent buffer overflow bugs.
This implementation has NULL-safe %s printing, and some nice features of glibc, e.g. %m (the text of the error message in errno). Of course, this is also available when compiling for other OSs.
Vectors can directly be used as heaps, so they implement priority queues. Vector_head_sort is using this to implement another sort algorithm.
hash tables
singly or doubly linked, with an element counter (faster) or without (smaller).
Please note that vectors are generally more efficient than lists. The only exception being O(1) deletion without changing the order of the list. Almost anything else is slower with lists. That's the reason why this is the youngest data structure.
wrappers around vector, map, list: a data structure header file that uses another, full implementation to implement the same structure for different types. A generated header file contains only trampoline routines encapsulating the necessary casts to invoke the corresponding functions of the real data structure implementation. E.g. to implement
vector oType='unsigned *'
an existing data structure
vector oType='int *'
can be used without any byte of code to be generated.
I had a lot of other data structures in mind, but either were they easy and fast to implement for a given purpose (e.g. union-find), or they had too many shades to be implemented generically (e.g. trees).
BTW: The name 'Erwin' comes from German 'Erweiterung' (extension) plus another syllable 'in' to make it sound good. This name has a long tradition. :-)