merge(third_party/git): Merge squashed git subtree at v2.23.0
Merge commit '1b593e1ea4' as 'third_party/git'
commit
7ef0d62730
3629 changed files with 1139935 additions and 0 deletions
third_party/git/compat/nedmalloc/Readme.txt (vendored, normal file, 136 lines)
@@ -0,0 +1,136 @@
nedalloc v1.05 15th June 2008:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Niall Douglas (http://www.nedprod.com/programs/portable/nedmalloc/)

Enclosed is nedalloc, an alternative malloc implementation based on
dlmalloc v2.8.4 and designed for multiple threads without lock contention.
It is more or less a newer implementation of ptmalloc2, the standard
allocator in Linux (which is based on dlmalloc v2.7.0), but it also
contains a per-thread cache for maximum CPU scalability.

It is licensed under the Boost Software License, which basically means
you can do anything you like with it. This does not apply to the
malloc.c.h file, which remains copyright to others.

It has been tested on win32 (x86), win64 (x64), Linux (x64), FreeBSD (x64)
and Apple Mac OS X (x86). It works well on all of these platforms and is
significantly faster than the system allocator on each of them.

By dropping in this allocator as a replacement for your system allocator,
you can see real-world speed improvements of up to three times in normal
code!

To use:
-=-=-=-
Drop nedmalloc.h, nedmalloc.c and malloc.c.h into your project.
Configure using the instructions in nedmalloc.h. Run and enjoy.

To test, compile test.c. It runs a comparison between your system
allocator and nedalloc and tells you how much faster nedalloc is. It also
serves as an example of usage.

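As a rough illustration of the drop-in step, the sketch below routes one
helper through nedmalloc() and nedfree(). USE_NEDMALLOC, demo_malloc,
demo_free and demo_copy_string are names made up for this example. Without
USE_NEDMALLOC the sketch falls back to the system allocator so that it
builds on its own. It assumes a plain C build in which nedmalloc.h declares
nedmalloc() and nedfree() directly.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* USE_NEDMALLOC is a hypothetical build switch for this sketch only. */
#ifdef USE_NEDMALLOC
#include "nedmalloc.h"
#define demo_malloc(n) nedmalloc(n)
#define demo_free(p)   nedfree(p)
#else
/* Fallback so the sketch also builds without the nedmalloc sources. */
#define demo_malloc(n) malloc(n)
#define demo_free(p)   free(p)
#endif

/* Hypothetical helper: returns a heap copy of s, or NULL on failure.
   The caller releases the copy with demo_free(). */
static char *demo_copy_string(const char *s)
{
    size_t len = strlen(s) + 1;              /* include the terminating NUL */
    char *copy = (char *)demo_malloc(len);

    if (copy)
        memcpy(copy, s, len);
    return copy;
}
```

Memory obtained through demo_malloc() must go back through demo_free(), so
that blocks never cross between the two allocators.
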
Notes:
-=-=-=
If you want the very latest version of this allocator, get it from the
TnFOX SVN repository at svn://svn.berlios.de/viewcvs/tnfox/trunk/src/nedmalloc

Because nedalloc allocates an mspace per thread, it can cause severe
bloating of memory usage under certain allocation patterns. You can
substantially reduce this wastage by setting MAXTHREADSINPOOL, or the
threads parameter to nedcreatepool(), to a fraction of the number of
threads which would normally be in a pool at once. This reduces bloating
at the cost of an increase in lock contention. If the allocated size is
less than THREADCACHEMAX, locking is avoided 90-99% of the time, so if
most of your allocations are below this value you can safely set
MAXTHREADSINPOOL to one.

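The threads parameter can be sketched as follows. The divisor of four is
an arbitrary choice for illustration, not a recommendation from nedalloc,
and reduced_thread_hint(), create_reduced_pool() and USE_NEDMALLOC are
hypothetical names. The stand-in definitions exist only so the sketch
compiles without the nedmalloc sources.

```c
#include <stddef.h>
#include <stdlib.h>

/* USE_NEDMALLOC is a hypothetical build switch for this sketch only. */
#ifdef USE_NEDMALLOC
#include "nedmalloc.h"
#else
/* Stand-ins with the same shape as the nedmalloc calls used below. */
typedef struct nedpool_t { int threads; } nedpool;

static nedpool *nedcreatepool(size_t capacity, int threads)
{
    nedpool *p = (nedpool *)malloc(sizeof *p);

    (void)capacity;
    if (p)
        p->threads = threads;
    return p;
}

static void neddestroypool(nedpool *p) { free(p); }
#endif

/* Hypothetical policy: size the pool for a quarter of the threads
   expected to use it at once, but never for fewer than one. */
static int reduced_thread_hint(int expected_threads)
{
    int hint = expected_threads / 4;

    return hint < 1 ? 1 : hint;
}

/* Creates a pool with no up-front capacity and a reduced thread count.
   Release it with neddestroypool() when finished. */
static nedpool *create_reduced_pool(int expected_threads)
{
    return nedcreatepool(0, reduced_thread_hint(expected_threads));
}
```

A smaller hint trades memory for lock contention, so it is worth measuring
both before settling on a value.
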
You will leak memory unless every thread calls neddisablethreadcache()
once per pool before it exits. This is because nedalloc cannot portably
know when a thread exits and thus when its thread cache can be returned
for use by other code. Don't forget pool zero, the system pool.

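One way to arrange this is a small cleanup helper that each thread calls
as the last step of its entry function. release_thread_caches() and
USE_NEDMALLOC are hypothetical names, and the stand-in
neddisablethreadcache() below only counts calls so that the sketch
compiles without nedmalloc.

```c
#include <stddef.h>

/* USE_NEDMALLOC is a hypothetical build switch for this sketch only. */
#ifdef USE_NEDMALLOC
#include "nedmalloc.h"
#else
typedef struct nedpool_t nedpool;
static int disable_calls;   /* counts stand-in calls, for illustration */

static void neddisablethreadcache(nedpool *p)
{
    (void)p;
    disable_calls++;
}
#endif

/* Hypothetical helper: hands the calling thread's caches back for every
   pool the thread has used, then for the system pool. */
static void release_thread_caches(nedpool **pools, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        neddisablethreadcache(pools[i]);

    /* Pool zero is the system pool, selected by passing a null pointer. */
    neddisablethreadcache(NULL);
}
```

The call has to run on the exiting thread itself, because the cache being
released belongs to the calling thread.
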
For C++ type allocation patterns (where the same sizes of memory are
regularly allocated and deallocated as objects are created and destroyed),
the threadcache always benefits performance. If, however, your allocation
patterns are different, searching the threadcache may significantly slow
down your code. As a rule of thumb, if cache utilisation is below 80%
(see the source for neddisablethreadcache() for how to enable debug
printing in release mode) then you should disable the thread cache for
that thread. You can compile out the threadcache code by setting
THREADCACHEMAX to zero.

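The rule of thumb can be written down as a small check. This sketch
assumes that utilisation means the share of allocation requests served
from the thread cache. The function names and the two counters are
hypothetical, so take the real figures from the debug output mentioned
above.

```c
/* Share of allocation requests served from the cache, as a percentage. */
static double cache_utilisation_percent(unsigned successes, unsigned mallocs)
{
    if (mallocs == 0)
        return 0.0;     /* no requests yet, so nothing to report */
    return 100.0 * (double)successes / (double)mallocs;
}

/* Returns 1 when utilisation is below the 80% rule of thumb, else 0.
   A thread that has made no requests is left alone. */
static int should_disable_threadcache(unsigned successes, unsigned mallocs)
{
    return mallocs > 0
        && cache_utilisation_percent(successes, mallocs) < 80.0;
}
```

When the check returns 1 for a thread, that thread would call
neddisablethreadcache() for the pool in question.
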
Speed comparisons:
-=-=-=-=-=-=-=-=-=
See Benchmarks.xls for details.

The enclosed test.c can do two things: it can be a torture test or a speed
test. The speed test is designed to be a representative synthetic
memory allocator test. It works by randomly mixing allocations with frees.
Half of the allocation sizes are a power of two below 512 bytes (to mimic
C++ stack-instantiated objects) and the other half are a simple random
value below 16 KB.

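A workload of that shape might look like the sketch below. It is not the
code from test.c: the xorshift generator, the slot count and all function
names are assumptions made for illustration, and the small sizes are taken
to be exact powers of two from 1 to 256 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOT_COUNT 256

/* Hypothetical helper: one xorshift32 step. The state must be non-zero. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Half of the sizes are powers of two below 512 bytes, the other half
   are uniform random values below 16 KB. */
static size_t pick_alloc_size(uint32_t *state)
{
    uint32_t r = next_random(state);

    if (r & 1u)
        return (size_t)1 << ((r >> 1) % 9u);    /* 1, 2, 4, ..., 256 */
    return (size_t)1 + (r >> 1) % 16383u;       /* 1 .. 16383 */
}

/* Randomly mixes allocations with frees over a fixed set of slots and
   returns the number of successful allocations. */
static unsigned run_mixed_workload(unsigned steps, uint32_t seed)
{
    void *slots[SLOT_COUNT] = { 0 };
    uint32_t state = seed ? seed : 1u;
    unsigned allocations = 0;
    unsigned i;
    size_t s;

    for (i = 0; i < steps; i++) {
        size_t slot = next_random(&state) % SLOT_COUNT;

        if (slots[slot]) {
            free(slots[slot]);          /* occupied slot: free it */
            slots[slot] = NULL;
        } else {
            slots[slot] = malloc(pick_alloc_size(&state));
            if (slots[slot])
                allocations++;
        }
    }
    for (s = 0; s < SLOT_COUNT; s++)
        free(slots[s]);                 /* free(NULL) is a no-op */
    return allocations;
}
```

Timing run_mixed_workload() once with malloc()/free() and once with the
nedalloc equivalents gives a crude version of the comparison.
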
The real-world code results are from Tn's TestIO benchmark. This is a
heavily multithreaded and memory-intensive benchmark with a lot of
branching and other things modern processors don't like so much. As you'll
note, the test doesn't show the benefits of the threadcache, mostly
because saturation of the memory bus is the limiting factor.

ChangeLog:
-=-=-=-=-=
v1.05 15th June 2008:
 * { 1042 } Added error check for TLSSET() and TLSFREE() macros. Thanks to
   Markus Elfring for reporting this.
 * { 1043 } Fixed a segfault when freeing memory allocated using
   nedindependent_comalloc(). Thanks to Pavel Vozenilek for reporting this.

v1.04 14th July 2007:
 * Fixed a bug with the new optimised implementation that failed to lock
   on a realloc under certain conditions.
 * Fixed lack of thread synchronisation in InitPool() causing pool
   corruption.
 * Fixed a memory leak of thread cache contents on disabling. Thanks to Earl
   Chew for reporting this.
 * Added a sanity check for freed blocks being valid.
 * Reworked test.c into being a torture test.
 * Fixed GCC assembler optimisation misspecification.

v1.04alpha_svn915 7th October 2006:
 * Fixed failure to unlock thread cache list if allocating a new list failed.
   Thanks to Dmitry Chichkov for reporting this. Further thanks to Aleksey
   Sanin.
 * Fixed realloc(0, <size>) segfaulting. Thanks to Dmitry Chichkov for
   reporting this.
 * Made config defines #ifndef so they can be overridden by the build system.
   Thanks to Aleksey Sanin for suggesting this.
 * Fixed deadlock in nedprealloc() due to unnecessary locking of preferred
   thread mspace when mspace_realloc() always uses the original block's
   mspace anyway. Thanks to Aleksey Sanin for reporting this.
 * Made some speed improvements by hacking mspace_malloc() to no longer lock
   its mspace, thus allowing the recursive mutex implementation to be removed
   with an associated speed increase. Thanks to Aleksey Sanin for suggesting
   this.
 * Fixed a bug where allocating mspaces could overrun the maximum limit.
   Thanks to Aleksey Sanin for reporting this.

v1.03 10th July 2006:
 * Fixed memory corruption bug in threadcache code which only appeared with
   >4 threads and in heavy use of the threadcache.

v1.02 15th May 2006:
 * Integrated dlmalloc v2.8.4, fixing the win32 memory release problem and
   improving performance still further. Speed is now up to twice the speed
   of v1.01 (average is 67% faster).
 * Fixed win32 critical section implementation. Thanks to Pavel Kuznetsov
   for reporting this.
 * Fixed a failure to lock an mspace when all mspaces were already locked.
   Thanks to Pavel Kuznetsov for reporting this.
 * Added Apple Mac OS X support.

v1.01 24th February 2006:
 * Fixed multiprocessor scaling problems by removing sources of cache
   sloshing.
 * Earl Chew <earl_chew <at> agilent <dot> com> sent patches for the
   following:
   1. size2binidx() wasn't working for the default code path (non-x86).
   2. Fixed failure to release mspace lock under certain circumstances which
      caused a deadlock.

v1.00 1st January 2006:
 * First release