Thursday, January 22, 2015

Faster, more memory efficient and more ordered dictionaries on PyPy

Hello everyone!

As of today, we merged the latest branch that brings better dictionaries to PyPy by default. The work is based on an idea by Raymond Hettinger on python-dev, with prior work done notably in Java.  It was done by Maciej Fijałkowski and Armin Rigo, with Laurence Tratt recently prodding us to finish it.  (Earlier work going in a similar direction include Alex Gaynor's work on ordered dicts in Topaz, which was also used in the Hippy VM.  Each of these pieces of work is itself based on the original dict implementation in RPython, whose origins fade in the Subversion prehistory of PyPy.)  Coincidentally, a very similar idea has been implemented in Zend PHP very recently. Zend implementation description.

This post covers the basics of design and implementation as well as some basic benchmarks.

Dictionaries are now ordered!

One surprising part is that the new design, besides being more memory efficient, is ordered by design: it preserves the insertion order.  This is not forbidden by the Python language, which allows any order.  It makes the collections.OrderedDict subclass much faster than before: it is now a thin subclass of dict.  Obviously, we recommend that any portable Python program continues to use OrderedDict when ordering is important.  Note that a non-portable program might rely on more: for example, a **keywords argument now receives the keywords in the same order as the one in which they were given in the call.  (Whether such a thing might be called a language design change or not is a bit borderline.)  The point is that Python programs that work on CPython or previous versions of PyPy should continue to work on PyPy.

There is one exception, though.  The iterators of the OrderedDict subclass are now working just like the ones of the dict builtin: they will raise RuntimeError when iterating if the dictionary was modified.  In the CPython design, the class OrderedDict explicitly doesn't worry about that, and instead you get some result that might range from correct to incorrect to crashes (i.e. random Python exceptions).

Original PyPy dictionary design

Originally, PyPy dictionaries, as well as CPython dictionaries are implemented as follows (simplified view):

struct dict {
   long num_items;
   dict_entry* items;   /* pointer to array */
}

struct dict_entry {
   long hash;
   PyObject* key;
   PyObject* value;
}

Where items is a sparse array, with 1/3 to 1/2 of the items being NULL. The average space occupied by a dictionary is 3 * WORD * 12/7 plus some small constant (the smallest dict has 8 entries, which is 8 * 3 * WORD + 2 * WORD = 26 WORDs).

New PyPy dictionary design

The new PyPy dictionary is split in two arrays:

struct dict {
    long num_items;
    variable_int *sparse_array;
    dict_entry* compact_array;
}

struct dict_entry {
    long hash;
    PyObject *key;
    PyObject *value;
}

Here, compact_array stores all the items in order of insertion, while sparse_array is a 1/2 to 2/3 full array of integers. The integers themselves are of the smallest size necessary for indexing the compact_array. So if compact_array has less than 256 items, then sparse_array will be made of bytes; if less than 2^16, it'll be two-byte integers; and so on.

This design saves quite a bit of memory. For example, on 64bit systems we can, but almost never, use indexing of more than 4 billion elements; and for small dicts, the extra sparse_array takes very little space.  For example a 100 element dict, would be on average for the original design on 64bit: 100 * 12/7 * WORD * 3 =~ 4100 bytes, while on new design it's 100 * 12/7 + 3 * WORD * 100 =~ 2600 bytes, quite a significant saving.

GC friendliness

The obvious benefit of having more compact dictionaries is an increased cache friendliness. In modern CPUs cache misses are much more costly than doing additional simple work, like having an additional level of (in-cache) indirection. Additionally, there is a GC benefit coming from it. When doing a minor collection, the GC has to visit all the GC fields in old objects that can point to young objects. In the case of large arrays, this can prove problematic since the array grows and with each minor collection we need to visit more and more GC pointers. In order to avoid it, large arrays in PyPy employ a technique called "card marking" where the GC only visits "cards" or subsets of arrays that were modified between collections. The problem with dictionaries was that by design modifications in a dictionary occur randomly, hence a lot of cards used to get invalidated. In the new design, however, new items are typically appended to the compact_array, hence invalidate much fewer cards --- which improves GC performance.  (The new sparse_array is an array of integers, so it does not suffer from the same problems.)

Deletion

Deleting entries from dictionaries is not very common, but important in a few use cases.  To preserve order, when we delete an entry, we mark the entry as removed but don't otherwise shuffle the remaining entries.  If we repeat this operation often enough, there will be a lot of removed entries in the (originally compact) array.  At this point, we need to do a "packing" operation, which moves all live entries to the start of the array (and then reindexes the sparse array, as the positions changed).  This works well, but there are use cases where previously no reindexing was ever needed, so it makes these cases a bit slower (for example when repeatedly adding and removing keys in equal number).

Benchmarks

The PyPy speed benchmarks show mostly small effect, see changes. The microbenchmarks that we did show large improvements on large and very large dictionaries (particularly, building dictionaries of at least a couple 100s of items is now twice faster) and break-even on small ones (between 20% slower and 20% faster depending very much on the usage patterns and sizes of dictionaries). The new dictionaries enable various optimization possibilities which we're going to explore in the near future.

Cheers,
fijal, arigo and the PyPy team


Thursday, January 15, 2015

Leysin Winter Sprint (20-28th February 2015)

The next PyPy sprint will be in Leysin, Switzerland, for the tenth time. This is a fully public sprint: newcomers and topics other than those proposed below are welcome.

Goals and topics of the sprint

The details depend on who is here and ready to work. We might touch topics such as:

  • cleaning up the optimization step in the JIT, change the register allocation done by the JIT's backend, or improvements to the warm-up time

  • STM (Software Transaction Memory), notably: try to come up with benchmarks, and measure them carefully in order to test and improve the conflict reporting tools, and more generally to figure out how practical it is in large projects to avoid conflicts

  • vmprof - a statistical profiler for CPython and PyPy work, including making it more user friendly.

  • Py3k (Python 3.x support), NumPyPy (the numpy module)

  • added: cffi 1.0, trying out pygame+cffi on Raspberry Pi devices

  • And as usual, the main side goal is to have fun in winter sports :-) We can take a day off for ski.

Exact times

For a change, and as an attempt to simplify things, I specified the dates as 20-28 Februrary 2015, where 20 and 28 are travel days. We will work full days between the 21 and the 27. You are of course allowed to show up for a part of that time only, too.

Location and Accomodation

Leysin, Switzerland, "same place as before". Let me refresh your memory: both the sprint venue and the lodging will be in a very spacious pair of chalets built specifically for bed & breakfast: Ermina. The place has a good ADSL Internet connection with wireless installed. You can of course arrange your own lodging anywhere (as long as you are in Leysin, you cannot be more than a 15 minutes walk away from the sprint venue), but I definitely recommend lodging there too -- you won't find a better view anywhere else (though you probably won't get much worse ones easily, either :-)

Please confirm that you are coming so that we can adjust the reservations as appropriate. In the past, the rates were around 60 CHF a night all included in 2-person rooms, with breakfast. Now, the rooms available are either single-person (or couple), or rooms for 3 persons. The latter choice is recommended and should be under 60 CHF per person.

Please register by Mercurial, or on the pypy-dev mailing list if you do not yet have check-in rights.

You need a Swiss-to-(insert country here) power adapter. There will be some Swiss-to-EU adapters around, and at least one EU-format power strip.