ColdMudDatabase

ColdMUD's database is a good example of an ObjectDatabase using object serialization.

All objects live in a single on-disk store, with an in-memory cache in front of it to reduce the amount of disk I/O involved.

The disk storage component of ColdMUD is basically a block storage system, storing serialized objects and an object number key by which the object is known.

The main file on disk is the 'objects' file, which contains all of the object data. This file is divided into blocks of 256 bytes. Each object is stored in a contiguous run of blocks, and a block never holds data from more than one object. While this leads to some wasted space, it makes it easy to read and write whole objects.
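As a rough illustration of the block arithmetic above, the following sketch computes how many 256-byte blocks an object occupies and how many bytes are left unused in its last block. The function names are made up for this example and are not taken from the ColdMUD source.

```python
BLOCK_SIZE = 256  # block size of the 'objects' file, as described above


def blocks_needed(size: int) -> int:
    """Whole blocks occupied by a serialized object of `size` bytes."""
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division


def wasted_bytes(size: int) -> int:
    """Unused bytes at the end of the object's last block."""
    return blocks_needed(size) * BLOCK_SIZE - size


if __name__ == "__main__":
    for size in (100, 256, 300):
        print(size, blocks_needed(size), wasted_bytes(size))
```

A 300-byte object, for instance, takes two blocks and leaves 212 bytes of the second block unused.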

Allocation within the backing store is tracked using a bitmap.
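A minimal sketch of such a bitmap, with one bit per block, might look like the following. The class and method names are hypothetical, and the simple first-fit search for a free run is an assumption made for the example rather than a description of ColdMUD's actual search strategy.

```python
class BlockBitmap:
    """One bit per block: 1 means in use, 0 means free."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def is_used(self, i: int) -> bool:
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def mark(self, start: int, count: int, used: bool = True) -> None:
        """Mark `count` blocks starting at `start` as used or free."""
        for i in range(start, start + count):
            if used:
                self.bits[i >> 3] |= 1 << (i & 7)
            else:
                self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def find_free_run(self, count: int):
        """Return the first block of the first free run of `count` blocks,
        or None if no such run exists (first-fit, assumed for this sketch)."""
        run = 0
        for i in range(self.nblocks):
            run = 0 if self.is_used(i) else run + 1
            if run == count:
                return i - count + 1
        return None
```

Because objects need contiguous blocks, the allocator has to look for a run of free bits rather than for any single free bit.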

There are 2 lookup indexes:

  • Object name to object number: This maps the programmer-visible object names like $root to their internal counterparts (#0).
  • Object number to size/offset: This maps the object number to the offset in the flat disk file as well as the size of the object on disk.
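The two indexes can be pictured as a pair of key/value tables. In the sketch below, plain dictionaries stand in for the DBM or BerkeleyDB files, and the offset and size values are invented for illustration.

```python
from typing import NamedTuple


class Location(NamedTuple):
    offset: int  # byte offset of the object in the flat 'objects' file
    size: int    # size of the serialized object in bytes


# Index 1: object name -> object number (illustrative data only)
name_to_objnum = {"$root": 0}

# Index 2: object number -> offset and size on disk (illustrative data only)
objnum_to_location = {0: Location(offset=0, size=300)}


def locate(name: str):
    """Resolve a name to its object number and on-disk location."""
    objnum = name_to_objnum[name]
    return objnum, objnum_to_location[objnum]
```

Only the second index is needed when code already refers to an object by number.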

These lookup indexes use either a DBM implementation or BerkeleyDB. BDB support was added after it was found that some O(N^2) algorithms in DBM made creating large databases with a million objects a very time-consuming operation.

Optimizations:

Storing the size allows the server to read in the entire object with a single disk read. It also provides a hint for how large a buffer may be needed the next time the object is serialized, which avoids realloc overhead.
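The single-read idea can be sketched as follows: given the offset and size from the index, one seek and one read fetch the whole serialized object. The function name is hypothetical, and an in-memory buffer stands in for the 'objects' file in the usage example.

```python
import io


def read_object(store, offset: int, size: int) -> bytes:
    """Fetch one serialized object with a single seek and a single read."""
    store.seek(offset)
    data = store.read(size)
    if len(data) != size:
        raise IOError("short read: index and store disagree")
    return data


if __name__ == "__main__":
    # A fake two-block store with a 5-byte object in the second block.
    fake = io.BytesIO(b"\0" * 256 + b"hello" + b"\0" * 251)
    print(read_object(fake, 256, 5))
```

Without the stored size, the reader would have to parse the object incrementally or read a fixed amount and hope it was enough.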

The largest part of start-up time right now is reading in the entire object number to size/offset index to calculate the bitmap of the free space within the storage file. This is fast for small DBs, but can take up to 30 seconds or so on a 2 GB DB that has roughly 1.2 million objects. If the bitmap were stored on disk between invocations of the server, startup time could be kept nearly constant regardless of the size of the database.
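To make the trade-off concrete, here is a sketch of both approaches: rebuilding the bitmap by walking every index entry, and packing it into bytes so that it could be written out at shutdown and read back at startup. The persisted form is only the proposal described above, not something the server is known to do, and all names here are invented for the example.

```python
BLOCK_SIZE = 256


def rebuild_bitmap(index_entries, nblocks: int):
    """Walk every (offset, size) index entry and mark its blocks as used.
    Cost grows with the number of objects in the database."""
    used = [False] * nblocks
    for offset, size in index_entries:
        first = offset // BLOCK_SIZE
        count = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
        for block in range(first, first + count):
            used[block] = True
    return used


def pack_bitmap(used) -> bytes:
    """Pack the bitmap into bytes suitable for saving at shutdown."""
    out = bytearray((len(used) + 7) // 8)
    for i, flag in enumerate(used):
        if flag:
            out[i >> 3] |= 1 << (i & 7)
    return bytes(out)


def unpack_bitmap(data: bytes, nblocks: int):
    """Restore the bitmap from its packed form without touching the index."""
    return [bool(data[i >> 3] & (1 << (i & 7))) for i in range(nblocks)]
```

A saved bitmap would have to be discarded and rebuilt after an unclean shutdown, since it could then disagree with the index.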

Concerns:

Having 256-byte blocks is nice from the perspective of the MUD administrator, since relatively little disk space is wasted compared to a larger block size. However, with the actual block size of the disk being 512 bytes or 1k, this can lead to extra I/O at the OS level, since the OS will read in a whole block from disk, modify the 256 bytes, and then write it back. It can have a similarly detrimental effect on seeks. However, ColdMUD's disk backend has performed well enough for all usages to date, and these issues haven't caused any problems.
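The read-modify-write concern comes down to alignment arithmetic, sketched below. A write forces the OS to read first whenever it does not cover whole device blocks. The function names and the sector sizes in the usage example are assumptions for illustration.

```python
def sectors_touched(offset: int, length: int, sector_size: int):
    """Device sectors overlapped by a write of `length` bytes at `offset`."""
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return list(range(first, last + 1))


def needs_read_modify_write(offset: int, length: int, sector_size: int) -> bool:
    """True if the write covers some device sector only partially."""
    return offset % sector_size != 0 or length % sector_size != 0


if __name__ == "__main__":
    # One 256-byte ColdMUD block written at offset 256 on a 512-byte-sector disk.
    print(sectors_touched(256, 256, 512))          # sector 0 only
    print(needs_read_modify_write(256, 256, 512))  # partial sector write
```

With 256-byte blocks on a 512-byte device, every single-block write is a partial write, whatever its offset.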

-- BruceMitchener - 06 Sep 2001

A long time ago, I measured the actual fragmentation of the ColdMudDatabase allocation scheme. The blocks are allocated using a next-fit algorithm (IIRC). After a while, the fragmentation stabilises at about 30-40%. This means that the stable-state database size is roughly 1.3 to 1.4 times the database size after recompilation from text format.
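For readers unfamiliar with next-fit, here is a small sketch of the idea: each search starts where the previous allocation ended and wraps around, instead of always starting from block 0. The class is hypothetical, and the fragmentation measure (free blocks below the high-water mark, relative to live blocks) is one possible definition assumed for this example, not necessarily the one used for the figures above.

```python
class NextFitAllocator:
    def __init__(self, nblocks: int):
        self.used = [False] * nblocks
        self.cursor = 0  # where the next search begins

    def allocate(self, count: int):
        """Find `count` contiguous free blocks, scanning from the cursor
        and wrapping around once. Returns the start block or None."""
        n = len(self.used)
        for step in range(n):
            s = (self.cursor + step) % n
            if s + count <= n and not any(self.used[s:s + count]):
                self.used[s:s + count] = [True] * count
                self.cursor = (s + count) % n
                return s
        return None

    def free(self, start: int, count: int) -> None:
        self.used[start:start + count] = [False] * count


def fragmentation(used) -> float:
    """Free blocks below the high-water mark divided by live blocks
    (assumed definition: 0.3 means the file is 1.3x its compact size)."""
    high = len(used)
    while high and not used[high - 1]:
        high -= 1
    live = sum(used[:high])
    return (high - live) / live if live else 0.0
```

Because the cursor keeps moving forward, recently freed space near the start of the file is not reused until the search wraps, which is one reason holes accumulate.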

-- MiroslavSilovic - 06 Sep 2001

TheEternalCity has been running with a fragmentation of around 20% or so. But there are really 2 separate concerns:

  • Fragmentation of free space, or the wasted blocks between objects.
  • Wasted space due to the blocksize and parts of blocks remaining unused.

It was the latter that I was referring to earlier. Using a blocksize that matched the actual blocksize the OS uses when writing to the disk would be more efficient for I/O, but would increase the amount of space that was wasted.
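The space side of that trade-off can be estimated with a few lines of arithmetic. The sketch below totals the unused tail bytes for a set of object sizes under different block sizes. The three sample sizes are invented for illustration and are not measurements from any real database.

```python
def internal_waste(sizes, block_size: int) -> int:
    """Total unused bytes in the last block of each object."""
    # -size % block_size is the padding needed to reach the next block boundary.
    return sum(-size % block_size for size in sizes)


if __name__ == "__main__":
    sample_sizes = [100, 300, 700]  # hypothetical object sizes in bytes
    for block_size in (256, 512, 1024):
        print(block_size, internal_waste(sample_sizes, block_size))
```

For these sample sizes the waste grows from 436 bytes at a 256-byte blocksize to 948 bytes at 512 and 1972 bytes at 1024.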

-- BruceMitchener - 08 Sep 2001

ColdMUD's database can also do asynchronous backups. What this means is that the server doesn't need to halt and cease normal processing for the full duration of the backup. Instead, it halts briefly to write the objects in memory to disk and to copy the lookup database.

After that, it allows the driver to return to normal operation while the actual object file is copied. This is usually the bulk of the time spent, especially once the DB has grown past a few megabytes in size. To keep objects that are written out during the copy from corrupting the backup database, the server determines where each object will be written and checks whether that area has already been backed up. If not, it backs up that area of the database immediately, and then writes out the object.

As a result, the time that a server halts corresponds to the amount of data in memory, rather than the amount of data on disk.
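The check-before-write step described above can be sketched roughly as follows. In-memory byte arrays stand in for the object file and its backup copy, the store is assumed not to grow during the backup, and every name here is hypothetical rather than taken from the ColdMUD source.

```python
CHUNK = 256  # copy granularity for this sketch


class BackupInProgress:
    def __init__(self, store: bytearray):
        self.store = store
        self.backup = bytearray(len(store))
        self.copied = set()  # indexes of chunks already backed up
        self.nchunks = (len(store) + CHUNK - 1) // CHUNK
        self.next_chunk = 0

    def _copy_chunk(self, i: int) -> None:
        if i not in self.copied:
            lo = i * CHUNK
            self.backup[lo:lo + CHUNK] = self.store[lo:lo + CHUNK]
            self.copied.add(i)

    def step(self) -> bool:
        """Background copier: back up the next uncopied chunk.
        Returns False once the whole store has been copied."""
        while self.next_chunk < self.nchunks and self.next_chunk in self.copied:
            self.next_chunk += 1
        if self.next_chunk >= self.nchunks:
            return False
        self._copy_chunk(self.next_chunk)
        return True

    def write(self, offset: int, data: bytes) -> None:
        """Normal object write: back up the target area first if needed."""
        first = offset // CHUNK
        last = (offset + len(data) - 1) // CHUNK
        for i in range(first, last + 1):
            self._copy_chunk(i)
        self.store[offset:offset + len(data)] = data
```

The backup ends up as a consistent image of the store as it was when the copy began, even though writes continued while it ran.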

-- BruceMitchener - 16 Sep 2001


   

      Revision r1.5 - 24 Nov 2001 - 08:55 GMT - BruceMitchener

   
