This is a list of questions that represent the problems people often have with memory management. Some answers appear below, with links to helpful supporting material, such as the Memory Management Glossary, the Bibliography, and external sites. For a full explanation of any terms used, see the glossary.
Yes. Various conservative garbage collectors for C exist as add-on libraries.
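For example, here is a minimal sketch of using one such collector, the Boehm–Weiser collector mentioned below. The header name and the -lgc link flag vary between installations, so treat this as illustrative:

#include <gc.h>      /* Boehm–Weiser conservative collector */
#include <stdio.h>
#include <string.h>

int main(void)
{
    GC_INIT();       /* initialize the collector before the first allocation */

    for (int i = 0; i < 1000000; i++) {
        /* Allocate from the collected heap; there is no matching free.
           Blocks that become unreachable are reclaimed automatically. */
        char *s = GC_MALLOC(64);
        if (s != NULL)
            strcpy(s, "temporary data");
    }

    printf("collected heap size: %lu bytes\n",
           (unsigned long)GC_get_heap_size());
    return 0;
}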
For small programs, and during light testing, it is true that malloc usually succeeds. Unfortunately, there are all sorts of unpredictable reasons why malloc might fail one day: the machine your program runs on may have less memory than you expected, other processes may be consuming more of it, or someone may apply your program to a far larger data set than you anticipated.
In this case, malloc will return NULL, and your program will attempt to store data through the null pointer. This might cause your program to exit immediately with a helpful message, but it is more likely to provoke mysterious problems later on.
If you want your code to be robust, and to stand the test of time, you must check all error or status codes that may be returned by functions you call, especially those in other libraries, such as the C run-time library.
If you really don’t want to check the return value from malloc, and you don’t want your program to behave mysteriously when out of memory, wrap malloc in something like this:
#include <stdio.h>
#include <stdlib.h>

void *my_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fputs("Out of memory.\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}
Undefined behavior is worth eliminating even in small programs.
Manual memory management, such as malloc and free(2), forces the programmer to keep track of which memory is still required, and who is responsible for freeing it. This works for small programs without internal interfaces, but becomes a rich source of bugs in larger programs, and is a serious problem for interface abstraction.
Automatic memory management frees the programmer from these concerns, making it easier to code in the language of the problem rather than in the tedious details of the implementation.
Malloc provides a very basic manual memory management service. However, it does not provide many of the facilities that may be desirable in your memory manager, such as automatic reclamation of unused blocks, debugging support, or control over locality.
Many of these can be added on top of malloc, but not with full performance.
Yes. The C++ specification has always permitted garbage collection. Bjarne Stroustrup (C++’s designer) has proposed that this be made explicit in the standard. There exist various conservative and semi-conservative garbage collectors for C++.
Often delete must perform a more complex task than simply freeing the memory associated with an object; this is known as finalization. Finalization typically involves releasing any resources indirectly associated with the object, such as files that must be closed or ancillary objects that must be finalized themselves. This may involve traversing memory that has been unused for some time and hence is paged out.
With manual memory management (such as new and delete), it is perfectly possible for the deallocation operation to vary in complexity. Some systems do quite a lot of processing on freed blocks to coalesce adjacent blocks, sort free blocks by size (in a buddy system, say), or sort the free list by address. In the last case, deallocating blocks in address order (or sometimes reverse address order) can result in poor performance.
In C++, it may be that class libraries expect you to call delete on objects they create, to invoke the destructor(2). Check the interface documentation.
Failing this, if there is a genuine memory leak in a class library for which you don’t have the source, then the only thing you can try is to add a garbage collector. The Boehm–Weiser collector will work with C++.
Carefully designed C++ constructors(2) and destructors(2) can go a long way towards easing the pain of manual memory management. Objects can know how to deallocate all their associated resources, including dependent objects (by recursive destruction). This means that clients of a class library do not need to worry about how to free resources allocated on their behalf.
Unfortunately, they still need to worry about when to free such resources. Unless all objects are allocated for precisely one purpose, and referred to from just one place (or from within one compound data structure that will be destroyed atomically), a piece of code that has finished with an object cannot determine that it is safe to call the destructor; it cannot be certain (especially when working with other people’s code) that there is not another piece of code that will try to use the object subsequently.
This is where garbage collection has the advantage, because it can determine when a given object is no longer of interest to anyone (or at least when there are no more references to it). This neatly avoids the problems of having multiple copies of the same data or complex conditional destruction. The program can construct objects and store references to them anywhere it finds convenient; the garbage collector will deal with all the problems of data sharing.
Java, C#, Python, Lisp, ML, … the list goes on. It surprises many people to learn that many implementations of BASIC use garbage collection to manage character strings efficiently.
C++ is sometimes characterized as the last holdout against garbage collection, but this is not accurate. See Can I use garbage collection in C++?
The notion of automatic memory management has stood the test of time and is becoming a standard part of modern programming environments. Some will say “the right tool for the right job”, rejecting automatic memory management in some cases; but few today are bold enough to suggest that there is never a place for garbage collection among the tools of the modern programmer, whether as part of a language or as an add-on component.
Garbage collection frees you from having to keep track of which part of your program is responsible for the deallocation of which memory. This freedom from tedious and error-prone bookkeeping allows you to concentrate on the problem you are trying to solve, without introducing additional problems of implementation.
This is particularly important in large-scale or highly modular programs, especially libraries, because the problems of manual memory management often dominate interface complexity. Additionally, garbage collection can reduce the amount of memory used because the interface problems of manual memory management are often solved by creating extra copies of data.
In terms of performance, garbage collection is often faster than manual memory management. It can also improve performance indirectly, by increasing locality of reference and hence reducing the size of the working set, and decreasing paging.
While it is true that the major advantages of garbage collection are only seen in complex systems, there is no reason for garbage collection to introduce any significant overhead at any scale. The data structures associated with garbage collection compare favorably in size with those required for manual memory management.
Some older systems gave garbage collection a bad name in terms of space or time overhead, but many modern techniques exist that make such overheads a thing of the past. Additionally, some garbage collectors are designed to work best in certain problem domains, such as large programs; these may perform poorly outside their target environment.
While early garbage collectors had to complete without interruption and hence would pause observably, many techniques are now available to ensure that modern collectors can be unobtrusive.
No, updating reference counts is quite expensive, and they have a couple of well-known problems: they cannot reclaim cyclic data structures, because objects that refer to each other but are unreachable from the rest of the program never drop to a count of zero, and the counts themselves cost space in every object and time on every pointer assignment.
There are many systems that use reference counts, and avoid the problems described above by using a conventional garbage collector to complement them. This is usually done for real-time benefits. Unfortunately, experience shows that this is generally less efficient than implementing a proper real-time garbage collector, except in the case where most reference counts are one.
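To illustrate the bookkeeping involved (the names here are illustrative, not taken from any particular system), an intrusive reference count in C might look like the sketch below. Every new reference must be paired with an increment, every dropped reference with a decrement, and a cycle of such objects never reaches a count of zero:

#include <stdlib.h>

typedef struct node {
    size_t refcount;        /* number of references to this object */
    struct node *child;     /* a reference this object owns */
    /* ... payload ... */
} node;

static node *node_new(void)
{
    node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->refcount = 1;    /* the caller holds the first reference */
    return n;
}

static node *node_retain(node *n)
{
    if (n != NULL)
        n->refcount++;      /* every new reference must be counted */
    return n;
}

static void node_release(node *n)
{
    if (n != NULL && --n->refcount == 0) {
        node_release(n->child);   /* drop the references we own, recursively */
        free(n);
        /* A cycle (n->child->...->child == n) never reaches zero,
           so it is never freed: this is the classic leak. */
    }
}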
Garbage collectors usually have to manipulate vulnerable data structures and must often use poorly-documented, low-level interfaces. Additionally, any garbage collection problems may not be detected until some time later. These factors combine to make most garbage collection bugs severe in effect, hard to reproduce, and difficult to work around.
On the other hand, commercial garbage collection code will generally be heavily tested and widely used, which implies it must be reliable. It will be hard to match that reliability in a manual memory manager written for one program, especially given that manual memory management doesn’t scale as well as the automatic variety.
In addition, bugs in the compiler or run-time (or application if the language is as low-level as C) can corrupt the heap in ways that only the garbage collector will detect later. The collector is blamed because it found the corruption. This is a classic case of shooting the messenger.
This may be true of primitive collectors (like the two-space collector), but this is not generally true of garbage collection. The data structures used for garbage collection need be no larger than those for manual memory management.
No. Benjamin Zorn (1992) found that:
the CPU overhead of conservative garbage collection is comparable to that of explicit storage management techniques. […] Conservative garbage collection performs faster than some explicit algorithms and slower than others, the relative performance being largely dependent on the program.
Note also that the version of the conservative collector used in this paper is now rather old and the collector has been much improved since then.
It is possible for manual memory management to pause for considerable periods, either on allocation or deallocation. It certainly gives no guarantees about performance, in general.
With automatic memory management, such as garbage collection, modern techniques can give guarantees about interactive pause times, and so on.
When you are using a virtual memory system, the computer may have to fetch pages of memory from disk before they can be accessed. If the total working set of your active programs exceeds the physical memory(1) available, paging will happen continually, your disk will rattle, and performance will degrade significantly. The only solutions are to install more physical memory, run fewer programs at the same time, or tune the memory requirements of your programs.
The problem is aggravated because virtual memory systems approximate the theoretical working set with the set of pages on which the working set lies. If the actual working set is spread out onto a large number of pages, then the working page-set is large.
When objects that refer to each other are distant in memory, this is known as poor locality of reference. This happens either because the program’s designer did not worry about locality, or because the memory manager used in the program doesn’t permit the designer to do anything about it.
Note that copying garbage collection can dynamically organize your data according to the program’s reference patterns and thus mitigate this problem.
Many modern languages have garbage collection built in, and the language documentation should give details. For some other languages, garbage collection can be added, for example via the Boehm–Weiser collector.
The Boehm–Weiser collector is suitable for C or C++. The best way to get a garbage collector, however, is to program in a language that provides garbage collection.
If you are using manual memory management (for example, malloc and free(2) in C), it is likely that your program is failing to free memory blocks after it stops using them. When your code allocates memory on the heap, there is an implied responsibility to free that memory. If a function uses heap memory for returning data, you must decide who takes on that responsibility. Pay special attention to the interfaces between functions and modules. Remember to check what happens to allocated memory in the event of an error or an exception.
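As an illustration (the function names are made up for this sketch), one common convention is that a function returning heap-allocated data passes ownership to its caller, who must then free the block on every path, including error paths:

#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL on failure.
   Ownership passes to the caller, who must free() the result. */
char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

int process(const char *input)
{
    char *work = copy_string(input);
    if (work == NULL)
        return -1;          /* nothing was allocated, nothing to free */

    if (strlen(work) == 0) {
        free(work);         /* remember to free on the error path too */
        return -1;
    }

    /* ... use work ... */
    free(work);             /* and on the normal path */
    return 0;
}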
If you are using automatic memory management (almost certainly garbage collection), it is probable that your code is remembering some blocks that it will never use in future. This is known as the difference between liveness and reachability. Consider clearing variables that refer to large blocks or networks of blocks, when the data structure is no longer required.
If you are using manual memory management, it is likely that the library is allocating data structures on the heap every time it is used, but that they are not being freed. Check the interface documentation for the library; it may expect you to take some action when you have finished with returned data. It may be necessary to close down the library and re-initialize it to recover allocated memory.
Unfortunately, it is all too possible that the library has a memory management bug. In this case, unless you have the source code, there is little you can do except report the problem to the supplier. It may be possible to add a garbage collector to your language, and this might solve your problems.
With a garbage collector, sometimes objects are retained because there is a reference to them from some global data structure. Although the library might not make any further use of the objects, the collector must retain the objects because they are still reachable.
If you know that a particular reference will never be used in future, it can be worthwhile to overwrite it. This means that the collector will not retain the referred object because of that reference. Other references to the same object will keep it alive, so your program doesn’t need to determine whether the object itself will ever be accessed in future. This should be done judiciously, using the garbage collector’s tools to find what objects are being retained and why.
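A sketch of the idea, using a conservative add-on collector in C (the global variable here is hypothetical):

#include <gc.h>

/* A cache that the rest of the program no longer needs, but which is
   still reachable from this global variable. */
static void *big_cache;

void build_cache(void)
{
    big_cache = GC_MALLOC(16 * 1024 * 1024);
    /* ... fill and use the cache ... */
}

void done_with_cache(void)
{
    /* Overwrite the reference: if nothing else refers to the cache,
       the collector is now free to reclaim those 16 MB. */
    big_cache = NULL;
}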
If your garbage collector is generational, it is possible that you are suffering from premature tenuring, which can often be solved by tuning the collector or using a separate memory area for the library.
If you are sure that your program is spending a large proportion of its time in memory management, and you know what you’re doing, then it is certainly possible to improve performance by writing a suballocator. On the other hand, advances in memory management technology make it hard to keep up with software written by experts. In general, improvements to memory management don’t make as much difference to performance as improvements to the program algorithms.
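For concreteness, the kind of type-specific suballocator in question is often just a free list of fixed-size blocks carved out of larger chunks obtained from malloc. A minimal sketch (illustrative only, with arbitrary sizes) might look like this:

#include <stdlib.h>

#define POOL_OBJECT_SIZE 64     /* size of the one object type we serve */
#define POOL_CHUNK_OBJECTS 256  /* objects fetched from malloc at a time */

typedef union block { union block *next; char payload[POOL_OBJECT_SIZE]; } block;

static block *free_list;        /* singly-linked list of free blocks */

void *pool_alloc(void)
{
    if (free_list == NULL) {
        /* Out of blocks: get a whole chunk from malloc and thread it
           onto the free list. Chunks are never returned to malloc. */
        block *chunk = malloc(POOL_CHUNK_OBJECTS * sizeof(block));
        if (chunk == NULL)
            return NULL;
        for (size_t i = 0; i < POOL_CHUNK_OBJECTS; i++) {
            chunk[i].next = free_list;
            free_list = &chunk[i];
        }
    }
    block *b = free_list;
    free_list = b->next;
    return b;
}

void pool_free(void *p)
{
    block *b = p;
    b->next = free_list;        /* deallocation is constant time */
    free_list = b;
}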
Benjamin Zorn (1992) found that:
In four of the programs investigated, the programmer felt compelled to avoid using the general-purpose storage allocator by writing type-specific allocation routines for the most common object types in the program. […] The general conclusion […] is that programmer optimizations in these programs were mostly unnecessary. […] simply using a different algorithm appears to improve the performance even more.
and concluded:
programmers, instead of spending time writing domain-specific storage allocators, should consider using other publicly-available implementations of storage management algorithms if the one they are using performs poorly.
Global, or static, data is fixed size; it cannot grow in response to the size or complexity of the data set received by a program. Stack-allocated data doesn’t exist once you leave the function (or program block) in which it was declared.
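A small sketch of the stack-lifetime point (the function names are made up):

#include <stdlib.h>
#include <string.h>

/* Broken: buf ceases to exist when the function returns, so the
   caller receives a dangling pointer. */
const char *greeting_broken(void)
{
    char buf[32];
    strcpy(buf, "hello");
    return buf;              /* undefined behavior to use this later */
}

/* Heap allocation survives the call; the caller now owns the block
   and must free it (or a garbage collector must reclaim it). */
char *greeting_heap(void)
{
    char *p = malloc(32);
    if (p != NULL)
        strcpy(p, "hello");
    return p;
}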
If your program’s memory requirements are entirely predictable and fixed at compile-time, or you can structure your program to rely on stack data only while it exists, then you can entirely avoid using heap allocation. Note that, with some compilers, use of large global memory blocks can bloat the object file size.
It may often seem simpler to allocate a global block that seems “probably large enough” for any plausible data set, but this simplification will almost certainly cause trouble sooner or later.
While virtual memory can greatly increase your capacity to store data, there are three problems typically experienced with it: accessing data that has been paged out to disk is very much slower than accessing physical memory; if the working set of your programs exceeds the physical memory available, the system pages continually (thrashes) and performance collapses; and the address space is still finite, so virtual memory raises the limit on memory use rather than removing it.