I was planning to write notes about ptmalloc, tcmalloc, and jemalloc all at once, but that is clearly too much for one pass. So I decided to start with jemalloc, since it is the first malloc library I ran into while reading the redis source code.


  • TSD, tsd: thread specific data
  • TLS, tls: thread local storage
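
Since both terms come up constantly in the tsd code, here is a quick generic illustration of the difference (a hypothetical example, not jemalloc code): TSD is the pthread-key API, while TLS is compiler/linker-backed thread-local storage.

#include <pthread.h>

static pthread_key_t counter_key;   /* TSD: data reached through a pthread key */
static __thread long counter_tls;   /* TLS: a plain per-thread variable */

static void counter_init(void) {
	pthread_key_create(&counter_key, NULL);   /* call once, e.g. via pthread_once */
}

static void bump(void) {
	/* TSD: explicit get/set through the pthreads API. */
	long v = (long)pthread_getspecific(counter_key);
	pthread_setspecific(counter_key, (void *)(v + 1));

	/* TLS: ordinary variable access, resolved by the compiler/linker. */
	counter_tls++;
}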

Jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. For further information please check the links below. I will focus on je_malloc, the main entry point that overrides libc's malloc.

The version I am reading is HEAD commit fb56766ca9b398d07e2def5ead75a021fc08da03, which carries a new fast-path implementation of je_malloc for better performance.

To enable static linking with glibc, jemalloc must also provide the glibc-specific malloc entry points, so that glibc's own malloc.o is never pulled into the link. The aliases live in jemalloc.c:

void *__libc_malloc(size_t size) PREALIAS(je_malloc);
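
As far as I can tell, PREALIAS boils down to GCC's alias attribute: the glibc entry point becomes a second name for the jemalloc symbol rather than a wrapper. A minimal standalone sketch of the same trick, with made-up names:

#include <stddef.h>
#include <stdio.h>

void *toy_malloc(size_t size) {
	printf("toy_malloc(%zu)\n", size);
	return NULL;
}

/* alias_malloc is not a call-through wrapper; it is another name for the same symbol. */
void *alias_malloc(size_t size) __attribute__((alias("toy_malloc")));

int main(void) {
	alias_malloc(32);   /* ends up in toy_malloc */
	return 0;
}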

Tracing into je_malloc: the je_malloc on this dev branch is not the same as the released version. It adds a fast path to improve performance, built on two ideas:

  • caching by tcache (thread cache)
  • tail-calling the old je_malloc



This function returns a bool whose value depends on the platform, as do tsd_boot0, tsd_boot1, tsd_boot, tsd_booted_get, tsd_get_allocates, tsd_get, and tsd_set. The backend is selected in tsd.h:

#include "jemalloc/internal/tsd_malloc_thread_cleanup.h"
#elif (defined(JEMALLOC_TLS))
#include "jemalloc/internal/tsd_tls.h"
#elif (defined(_WIN32))
#include "jemalloc/internal/tsd_win.h"
#include "jemalloc/internal/tsd_generic.h"

unlikely, likely

These macros give the compiler static branch-prediction hints: it lays out the code so that the expected (likely) path falls through in a straight line, while the unlikely path is moved out of line.
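
On GCC/Clang, jemalloc's hints are, as far as I know, thin wrappers around __builtin_expect. A minimal equivalent, with an illustrative function:

#include <stdbool.h>

#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), true)
#  define unlikely(x) __builtin_expect(!!(x), false)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

int classify(int err) {
	if (unlikely(err != 0)) {
		return -1;   /* cold path, pushed out of the straight-line code */
	}
	return 0;        /* hot path falls through */
}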

je_malloc(size_t size) {
	LOG("core.malloc.entry", "size: %zu", size);

	if (tsd_get_allocates() && unlikely(!malloc_initialized())) {
		return malloc_default(size);
	}

	tsd_t *tsd = tsd_get(false);
	if (unlikely(!tsd || !tsd_fast(tsd) || (size > SC_LOOKUP_MAXCLASS))) {
		return malloc_default(size);
	}

	tcache_t *tcache = tsd_tcachep_get(tsd);

	if (unlikely(ticker_trytick(&tcache->gc_ticker))) {
		return malloc_default(size);
	}

	szind_t ind = sz_size2index_lookup(size);
	size_t usize;
	if (config_stats || config_prof) {
		usize = sz_index2size(ind);
	}
	/* Fast path relies on size being a bin. I.e. SC_LOOKUP_MAXCLASS < SC_SMALL_MAXCLASS */
	assert(ind < SC_NBINS);
	assert(size <= SC_SMALL_MAXCLASS);

	if (config_prof) {
		int64_t bytes_until_sample = tsd_bytes_until_sample_get(tsd);
		bytes_until_sample -= usize;
		tsd_bytes_until_sample_set(tsd, bytes_until_sample);

		if (unlikely(bytes_until_sample < 0)) {
			/*
			 * Avoid a prof_active check on the fastpath.
			 * If prof_active is false, set bytes_until_sample to
			 * a large value.  If prof_active is set to true,
			 * bytes_until_sample will be reset.
			 */
			if (!prof_active) {
				tsd_bytes_until_sample_set(tsd, SSIZE_MAX);
			}
			return malloc_default(size);
		}
	}

	cache_bin_t *bin = tcache_small_bin_get(tcache, ind);
	bool tcache_success;
	void* ret = cache_bin_alloc_easy(bin, &tcache_success);

	if (tcache_success) {
		if (config_stats) {
			*tsd_thread_allocatedp_get(tsd) += usize;
		}
		if (config_prof) {
			tcache->prof_accumbytes += usize;
		}

		LOG("core.malloc.exit", "result: %p", ret);

		/* Fastpath success */
		return ret;
	}

	return malloc_default(size);
}
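
So the whole fast path hinges on cache_bin_alloc_easy: pop a pointer off a per-thread stack of cached objects for the requested size class, or report a miss and fall back to malloc_default. In simplified form (my own field names, not jemalloc's exact cache_bin_t layout), a hit is just a decrement and a load, with no locks and no atomics:

#include <stdbool.h>
#include <stddef.h>

typedef struct {
	void **avail;    /* stack of cached objects for one size class */
	int    ncached;  /* how many are currently cached */
} toy_cache_bin_t;

static inline void *toy_cache_bin_alloc_easy(toy_cache_bin_t *bin, bool *success) {
	if (bin->ncached == 0) {
		*success = false;   /* miss: caller falls back to malloc_default() */
		return NULL;
	}
	bin->ncached--;
	*success = true;
	return bin->avail[bin->ncached];
}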