Corralx

My slice of reality

A simple Tagged Pointer

So, since you gotta start somewhere, I thought it would have been a lot more interesting to actually begin this new adventure with an actual post, albeit simple, leaving the presentation and every personal information for the About page.

In this post I am going to present a very simple implementation of a tagged pointer in C++11, exploring the design choices and rationale behind it. The complete source can be found on my GitHub here. The only requirement is a fairly recent C++11 compiler with variadic templates and constexpr support. If you encounter any problem or bug using it don’t hesitate contacting me using the social buttons on top or by opening a GitHub issue!

So, what is it?

Generally speaking, a tagged pointer is just a raw pointer to a memory address with any type of additional data (hereafter called tag) added to it, which carries some special meaning for us. We can actually think of three ways to implement this generic idea:

  • Use a struct to store both the pointer and the tags. While this is the simpler approach and it poses no restriction on the amount of data associated to it, it’s just a plain object (and we are obviously not going down this road)
struct tagged_pointer
{
  object_type* pointer;
  bool dirty_flag;
  bool writable_flag;
};
  • Set the value of the pointer to some sentinel value, and test for it accordingly in your code. This is similar to check for null pointers using 0 as an invalid address, but it is not really useful since you can use this approach only when you don’t care anymore about the value of the pointer itself
  • Exploit the alignment of data in memory and reuse the lower bits of pointers to save our tags. This is clearly architecture/compiler dependent and a lot of things may break. For this reason we’ll obviously go down this road.

How does it work?

We call a memory address aligned if it’s a multiple of the size of the object stored at that address (note that we are not referring to the OOP notion of objects here, but just to some data stored in memory, like an integer or a float). This is often called natural alignment. When the compiler decides how and where to put your data in memory it must consider two things:

  • adhere to the language standard if it poses any requirement on the alignment
  • respect the target architecture requirements (and possibly any performance hint)

These two can give us the opportunity to do what we want. The C++ standard however, does not impose any restriction on the alignment of data so the former is a no go for us. But we can do something about the latter, since different architectures force different requirements for data alignment.

The x86 architecture, both Intel and AMD, can issue a load instruction on unaligned memory for scalar data types, but this can cause a performance hit as stated on the architecture manual and therefore the compiler will tipically try to store the data respecting the alignment (note that on the latest Intel processors there seems to be no penalty anymore for unaligned access as the fixup is done directly by the hardware). This is not true however for SIMD data types, like the SSE/AVX extensions, because those have stricter requirements and the data must be aligned to the size of the object.
On the ARM side the things are a bit different, since only the latest architecture updates added the requirement to support unaligned memory access (with a performance penalty to fixup the load operation in most cases), while on older processor any unaligned access will just cause a trap with a consequent memory access violation.
Given all the possible combination of architectures, requirements and performance considerations the default behaviour of all major compilers, if not specifically asked otherwise, is to align the data in memory respecting the size of the objects, eventually inserting padding where needed.
How much data can we save on an aligned pointer then? Given a pointer aligned to n bytes, the least significant log (n) bit will always be 0, and that is exactly how much space we have and where we are going to save our tag.

Up until this point though, we have only considered basic types like integer or float.
What happens when we have a pointer to an aggregate type, like a real OOP object? The standard again does not require any alignment per se, but we can infer that the alignment of an object must be at least the maximum between the alignment of its member. This happens because, while the compiler can add padding between members, it cannot do the same between array elements. Since an array can have any aggregate type as elements, the compiler must align the object’s member to their size using paddings, to ensure it can pack the array elements tightly. A more in-depth explanation can be found on this great stack overflow answer.
Luckily the C++11 standard added the alignof operator which tells us exactly the alignment in bytes, so we are not required to compute (or even worst, guess) the alignment ourselves.

What we are going to build then, is a smart pointer with the following requirements:

  • It must closely follow the semantic and behaviour of raw pointers
  • It can save a tag for us without using any additional memory
  • The behaviour should not depend on the value of the tag itself

But enough talking, lets write some code.

Implementation

This will be our basic declaration:

template<typename T>
class tag_ptr
{
private:
  using pointer = T*;

  union
  {
    pointer _ptr;
    uintptr_t _ptr_bits;
  };
};

The only interesting to note, is the use of an anonymous union, to access the raw pointer bits using a uintptr_t which is an unsigned integer type big enough to hold a pointer on the target architecture.

The next thing we need is a way to calculate how much space (namely bits) we have at our disposal, given the pointed type and its alignment. Since this information is available at compile time because it only depends on the alignment, we are going to use some basic template metaprogramming coupled with a constexpr variable to hold the result.

template<size_t V>
struct log2
{
  enum { value = log2<V / 2>::value + 1 };
};

template<>
struct log2<1>
{
  enum { value = 0 };
};

template<typename T>
class tag_ptr
{
public:
  static constexpr uint8_t tag_bits = log2<alignof(T)>::value;
};

But wait, why aren’t we using the log2 function of the stl math library? Well, this is because nearly no math function has a constexpr version in C++11, mainly because of side effects. This as changed in C++14, but we are stickying with C++11 for this and a little metaprogramming has never hurt anyone, right?

The last important bit missing (pun intended), is to provide a way to the user to extract the raw pointer and use the tag. This can be easily done with a mask and some basic bit manipulation (that’s why we used uintptr_t in the first place).

/* Inside class declaration ... */

static constexpr uint8_t tag_mask = alignof(T) - static_cast<uint8_t>(1);

T* get() const
{
  return reinterpret_cast<pointer>(_ptr_bits & ~tag_mask);
}

uint8_t tag() const
{
  return static_cast<uint8_t>(_ptr_bits & static_cast<uintptr_t>(tag_mask));
}

void tag(uint8_t value)
{
  assert((value & tag_mask) == value);
  _ptr_bits = reinterpret_cast<uintptr_t>(get()) | static_cast<uintptr_t>(value & tag_mask);
}

Note the use of reinterpret_cast when casting between pointer and uintptr_t. We couldn’t use static_cast here, because pointers and integers are not related types and the conversion is forbidden.

Now the rest of the class declaration is reported for completeness. As stated before the rationale here is to behave as a raw pointer, so some operators are needed.

/* Inside class declaration ... */

tag_ptr() : _ptr(nullptr) {}
explicit tag_ptr(pointer ptr, uint8_t value = 0) : _ptr(ptr) { tag(value); }
tag_ptr(const tag_ptr& o) : _ptr(o._ptr) {}
~tag_ptr() = default;

tag_ptr& operator=(const tag_ptr& o)
{
  _ptr = o._ptr;
  return *this;
}

T& operator*() const { return *get(); }
pointer operator->() const { return get(); }

operator bool() const
{
  return static_cast<bool>(_ptr_bits & ~static_cast<uintptr_t>(tag_mask));
}
  • The explicit specifier is used on the single parameter constructor, to avoid assigning a raw pointer thus losing the current tag. This can happen because a new tag_ptr can be constructed implicitly from a raw pointer and then used as a parameter for the copy assignment operator
  • The implicit bool conversion, deference and arrow operators are implemented following the semantics of pointers, therefore ignoring the tag
  • Move constructor and move assignment operator are not declared and this is because the default move semantic on pointers is the copy

Final Remarks

There are a couple more things to note about this implementation and what you will find in the complete source on GitHub:

  • The standard comparison operators are implemented by comparing the pointer without the tag, again behaving as a raw pointer for the outside world
  • The swap and make_tag functions are provided and implemented trivially
  • No std::hash overload is provided because it is debatable whether we want to consider or ignore the tag when computing the hash
  • This is working under the assumption that the pointed data is aligned, which is the common case. Note however that when custom packing or pointer arithmetic is involved things may break, so always check if the alignment and bit twiddling works for you before using it!

An example of usage can be found on the README of the project on my GitHub.