=== Memory allocation ===
Plain '''malloc()''' can become a serious bottleneck in parallel programs, because all threads contend for the same allocator. Many alternatives are available, for example:
* [http://threadingbuildingblocks.org Intel Threading Building Blocks] has a good allocator.
* The [http://www.hoard.org/ Hoard] memory allocator.
* Google's [http://goog-perftools.sourceforge.net/doc/tcmalloc.html TCMalloc].
All of these allocators are designed for multithreaded use, and avoid common pitfalls on modern multi-core hardware.
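One technique such allocators commonly rely on is keeping a cache of free blocks per thread, so that most allocations never touch shared state. The following is a minimal toy sketch of that idea in C++11, not the design of any of the libraries listed above. The names <code>ThreadLocalPool</code> and <code>local_pool()</code>, and the fixed block size of 64 bytes, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy pool for fixed-size blocks; real allocators are far more elaborate.
class ThreadLocalPool {
public:
    explicit ThreadLocalPool(std::size_t block_size) : block_size_(block_size) {}

    // Return all cached blocks to the system allocator when the thread exits.
    ~ThreadLocalPool() {
        for (void* p : free_blocks_) std::free(p);
    }

    ThreadLocalPool(const ThreadLocalPool&) = delete;
    ThreadLocalPool& operator=(const ThreadLocalPool&) = delete;

    // Reuse a cached block if one is available; otherwise fall back to malloc().
    void* allocate() {
        if (!free_blocks_.empty()) {
            void* p = free_blocks_.back();
            free_blocks_.pop_back();
            return p;
        }
        return std::malloc(block_size_);
    }

    // Keep the block in this thread's cache instead of handing it back to free().
    void deallocate(void* p) {
        if (p != nullptr) free_blocks_.push_back(p);
    }

    // Number of blocks currently cached by this pool.
    std::size_t cached() const { return free_blocks_.size(); }

private:
    std::size_t block_size_;
    std::vector<void*> free_blocks_;
};

// One pool per thread: no lock is needed because no other thread touches it.
// The block size of 64 bytes is an arbitrary value chosen for this sketch.
inline ThreadLocalPool& local_pool() {
    thread_local ThreadLocalPool pool(64);
    return pool;
}
```

A thread would call <code>local_pool().allocate()</code> and <code>local_pool().deallocate(p)</code> in place of '''malloc()''' and '''free()''' for blocks of that size. Only a cache miss reaches the shared allocator, which is where the contention described above occurs.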
=== CPU memory caches ===
Concurrent programming introduces new problems for making good use of CPU memory caches. Two things are particularly important for performance:
* Keep unrelated data, in particular data written by different threads, on separate cache lines. CPUs move memory in units of whole cache lines, and a core must own a line exclusively before it can write to it, so threads that write to different variables on the same cache line contend with each other as if they shared a lock. This problem is commonly referred to as '''false sharing''', and it should be avoided.
* Keep related data as close together as possible. This improves cache locality, and reduces the number of cache lines that a particular thread or task touches.
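The first point can be illustrated with per-thread counters. The following is a minimal C++ sketch, assuming a cache line size of 64 bytes (common on x86-64, but not universal). The names <code>kCacheLine</code>, <code>PaddedCounter</code> and <code>count_in_parallel</code> are hypothetical. Each counter is padded to a full cache line, so no two threads write to the same line.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache line size; check the target CPU before relying on this value.
constexpr std::size_t kCacheLine = 64;

// alignas rounds the struct size up to a full cache line, so adjacent
// counters in an array are always kCacheLine bytes apart.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Each thread increments only its own counter; the results are summed
// after all threads have been joined, so no atomics or locks are needed.
std::uint64_t count_in_parallel(unsigned num_threads,
                                std::uint64_t increments_per_thread) {
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                counters[t].value += 1;  // touches only this thread's cache line
            }
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Without the <code>alignas</code>, eight 8-byte counters would fit on a single 64-byte line and every increment could force that line to move between cores. The padding trades memory for the absence of that traffic, which is why it should be applied only to data that is actually written by different threads.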
=== Memory barriers ===
TBA