Title | Rehashing large hash tables breaks nursery collection strategy |
Status | closed |
Priority | essential |
Assigned user | Nick Barnes |
Organization | Ravenbrook |
Description | Simple use of location dependencies in flat hash tables has a very bad performance conflict with our generational collection scheduling strategy, once a hash table is large enough for the main hash table array to be a significant fraction of the nursery size (assuming that at least some of the objects in the hash table are new or nearly new). |
Analysis | The hash table LD refset includes (at least) the refset of the nursery (because objects in the hash table are new). On a rehash, the hash table array is reallocated in the nursery, adding to the nursery generation's "new size". A nursery collection is scheduled once this "new size" exceeds the nursery's capacity (or will do so "soon"). That collection makes the hash table's LD stale (for all references) because the collection puts the nursery refset into the arena's LD history. Then the next time the hash table's stale test is taken (e.g. on the next insert or miss), another rehash will be triggered. For a hash table which is a significant fraction of the nursery size, this causes a lot of rehashes. Hash table code might even go into an endless loop (because a rehash involves a lot of insert operations, which - if not written with this possibility in mind - could trigger another rehash). There are several possible solutions. A quick-and-dirty hack being applied on a custom branch is for buffer fills greater than a fraction of the size of the containing generation not to count towards that generation's "new size". |
How found | customer |
Evidence | http://info.ravenbrook.com/mail/2013/02/22/14-04-36/0.txt |
Observed in | 1.110.0 |
Created by | Nick Barnes |
Created on | 2013-03-07 19:50:31 |
Last modified by | Richard Brooksby |
Last modified on | 2013-05-01 16:36:17 |
History | 2013-03-07 NB Created. |