C++ For C# Developers: Part 47 – Containers Library Wrapup
We’ve covered all the individual container types, so let’s take a step back and look at the containers library as a whole. A lot of features, like support for range-based for
loops and allocator customization, are supported in all container types. Today we’ll take a look at the commonalities between the containers and see what ties them together into a cohesive library.
Table of Contents
- Part 1: Introduction
- Part 2: Primitive Types and Literals
- Part 3: Variables and Initialization
- Part 4: Functions
- Part 5: Build Model
- Part 6: Control Flow
- Part 7: Pointers, Arrays, and Strings
- Part 8: References
- Part 9: Enumerations
- Part 10: Struct Basics
- Part 11: Struct Functions
- Part 12: Constructors and Destructors
- Part 13: Initialization
- Part 14: Inheritance
- Part 15: Struct and Class Permissions
- Part 16: Struct and Class Wrap-up
- Part 17: Namespaces
- Part 18: Exceptions
- Part 19: Dynamic Allocation
- Part 20: Implicit Type Conversion
- Part 21: Casting and RTTI
- Part 22: Lambdas
- Part 23: Compile-Time Programming
- Part 24: Preprocessor
- Part 25: Intro to Templates
- Part 26: Template Parameters
- Part 27: Template Deduction and Specialization
- Part 28: Variadic Templates
- Part 29: Template Constraints
- Part 30: Type Aliases
- Part 31: Destructuring and Attributes
- Part 32: Thread-Local Storage and Volatile
- Part 33: Alignment, Assembly, and Language Linkage
- Part 34: Fold Expressions and Elaborated Type Specifiers
- Part 35: Modules, The New Build Model
- Part 36: Coroutines
- Part 37: Missing Language Features
- Part 38: C Standard Library
- Part 39: Language Support Library
- Part 40: Utilities Library
- Part 41: System Integration Library
- Part 42: Numbers Library
- Part 43: Threading Library
- Part 44: Strings Library
- Part 45: Array Containers Library
- Part 46: Other Containers Library
- Part 47: Containers Library Wrapup
- Part 48: Algorithms Library
- Part 49: Ranges and Parallel Algorithms
- Part 50: I/O Library
- Part 51: Missing Library Features
- Part 52: Idioms and Best Practices
- Part 53: Conclusion
Iterators
C# containers like List
implement the IEnumerable<T>
interface. This means they provide a GetEnumerator
method that returns an IEnumerator<T>
. This in turn provides a Current
property to access the current element and a MoveNext
method to move to the next element.
C++ containers provide the same support for this abstract form of traversing a collection, but they call the object keeping track of the traversal an “iterator” instead of an “enumerator.” Regardless of the container type, it’ll have two member type aliases: iterator
and const_iterator
. Here’s how we use them the way we’d manually traverse a collection vith IEnumerator<T>
in C#:
#include <vector> // Create a vector with three elements std::vector<int> v{ 1, 2, 3 }; // Call begin() to get an iterator that's at the first element: 1 // Call end() to get an iterator that's one past the last element: 3 // Use the overloaded pre-increment operator to advance to the next element for (std::vector<int>::iterator it{ v.begin() }; it != v.end(); ++it) { // Use the overloaded dereference operator to get the element DebugLog(*it); // 1 then 2 then 3 }
It’s worth noting that while both the pre-increment and post-increment operators are overloaded for iterator types, the pre-increment operator is always at least as fast as the post-increment operator. They may be equally fast, but it’s a good habit to use the pre-increment operator when manually using iterators.
There are a few variations of the above canonical loop that are commonly seen:
#include <vector> std::vector<int> v{ 1, 2, 3 }; // Use the free functions begin() and end() instead of the member functions for (std::vector<int>::iterator it{ std::begin(v) }; it != std::end(v); ++it) { DebugLog(*it); // 1 then 2 then 3 } // Use cbegin() and cend() to get constant (i.e. read-only) iterators for ( std::vector<int>::const_iterator it{ v.cbegin() }; it != v.cend(); ++it) { DebugLog(*it); // 1 then 2 then 3 } // Use auto to avoid the long type name for (auto it{ v.cbegin() }; it != v.cend(); ++it) { DebugLog(*it); // 1 then 2 then 3 }
The design of iterators and functions like begin
and end
make all of the container types compatible with range-based for loops. Since they were introduced in C++11 it’s now far less common to see these verbose loops that manually control iterators in the simple, and most common, “start to end” fashion. Instead, we just use a for
loop and let the compiler generate the same code as the manual version:
#include <vector> std::vector<int> v{ 1, 2, 3 }; // Non-const copies of every element for (int x : v) { DebugLog(x); // 1 then 2 then 3 } // Non-const references to every element for (int& x : v) { DebugLog(x); // 1 then 2 then 3 } // const copies of every element for (const int x : v) { DebugLog(x); // 1 then 2 then 3 } // const references to every element for (const int& x : v) { DebugLog(x); // 1 then 2 then 3 }
Note that auto
can be used instead of int
in all of the above examples.
Besides the basics of traversing a collection from start to end, many iterator types support more functionality. For example, the iterators of a std::vector
support random access via the overloaded it[N]
operator. We’ll go into these in-depth in the next article.
Allocators
C++ containers allow customization of how they allocate memory in two ways. First, the classic way is to pass an allocator type as a template argument. This defaults to the std::allocator
class template we saw in the System Integration Library. We can create our own type of allocator though and use it to improve performance, add safety checks, allocate memory out of special pools such as the file system or VRAM, or anything else we’d like to do:
#include <list> // Custom allocator using malloc() and free() from the C Standard Library template <class T> struct MallocFreeAllocator { // This allocator allocates objects of type T using value_type = T; // Default constructor MallocFreeAllocator() noexcept = default; // Converting copy constructor template<class U> MallocFreeAllocator(const MallocFreeAllocator<U>&) noexcept { } // Allocate enough memory for n objects of type T T* allocate(const size_t n) const { DebugLog("allocate"); return reinterpret_cast<T*>(malloc(n * sizeof(T))); } // Deallocate previously-allocated memory void deallocate(T* const p, size_t) const noexcept { DebugLog("deallocate"); free(p); } }; void Foo() { // Use the custom allocator to allocate the list's memory std::list<int, MallocFreeAllocator<int>> li{ 1, 2, 3 }; for (int x : li) { DebugLog(x); } }
What exactly this prints, besides at least one “allocate”, then “1”, “2”, and “3”, then at least one “deallocate” depends on the implementation of the std::list
type, but it’s likely to look something like this:
allocate allocate allocate allocate allocate 1 2 3 deallocate deallocate deallocate deallocate deallocate
If we’d rather construct the allocator ourselves, perhaps to customize it in some way, rather than letting the container construct it, then we can pass it to the container’s constructor:
// Create an allocator MallocFreeAllocator<int> alloc{}; // Pass it to the constructor to be used std::list<int, MallocFreeAllocator<int>> li{ {1, 2, 3}, alloc };
The good thing about this kind of customization is that no virtual functions are required so we don’t take a performance hit compared to direct function calls. The bad part is that the allocator becomes part of the type of the container. The above variable has type std::list<int, MallocFreeAllocator<int>>
which is different from the default type std::list<int, std::allocator<int>>
. This disables certain functionality such as using the overloaded =
operator to assign all the elements of a list to another list or to mix and match iterator types.
The other form of memory allocation customization is designed to make the opposite trade-off. Available since C++17, it takes the performance hit of virtual function calls rather than create discrete types that include the allocator. We’ve seen this before with std::pmr::basic_string and it’s an option that’s also available on the other container types. There are class templates such as std::pmr::vector
and std::pmr::list
that mirror their non-pmr
versions nearly identically:
#include <list> #include <memory_resource> // A std::pmr::memory_resource that uses the new and delete operators struct NewDeleteMemoryResource : public std::pmr::memory_resource { // Allocate bytes, not a particular type virtual void* do_allocate( std::size_t numBytes, std::size_t alignment) override { return new uint8_t[numBytes]; } // Deallocate bytes virtual void do_deallocate( void* p, std::size_t numBytes, std::size_t alignment) override { delete[](uint8_t*)p; } // Check if this resource's allocation and deallocation are compatible // with that of another resource virtual bool do_is_equal( const std::pmr::memory_resource& other) const noexcept override { // Compatible if the same type return typeid(other) == typeid(NewDeleteMemoryResource); } }; void Foo() { // Make the memory resource NewDeleteMemoryResource mr{}; // Make the polymorphic allocator backed by the memory resource std::pmr::polymorphic_allocator<int> polyalloc{ &mr }; // Pass the polymorphic allocator to the constructor to be used std::pmr::list<int> li1{ {1, 2, 3}, polyalloc }; for (int x : li1) { DebugLog(x); // 1 then 2 then 3 } // Class template instantiations are compatible // This has a different polymorphic allocator: the default std::pmr::list<int> li2 = li1; for (int x : li2) { DebugLog(x); // 1 then 2 then 3 } }
C#’s managed memory model means there’s no customization of any container’s allocation strategy.
Free Functions
A number of free functions, i.e. those that aren’t a member of any class, work on all the container types. We’ve already seen std::begin
and std::end
above to get iterators. We’ve also seen C++20’s std::erase
and std::erase_if
for several array and non-array container types.
There’s one more free function that works on containers to talk about: std::swap
. This function swaps the contents of one container with the contents of another container. This can often be performed very efficiently, such as swapping pointers. For example, a simple array type might overload std::swap
like this:
// Forward-declare the array type template <typename T> class SimpleArray; // Forward-declare the std::swap function overload namespace std { template<class T> void swap(SimpleArray<T>& a, SimpleArray<T>& b) noexcept; } // Define the array type template <typename T> class SimpleArray { private: T* Elements; int Length; // Allow the std::swap overload to access private members friend void std::swap(SimpleArray<T>& a, SimpleArray<T>& b) noexcept; public: SimpleArray(int length) { Elements = new T[length]; Length = length; } ~SimpleArray() { delete[] Elements; } int GetLength() { return Length; } int& operator[](int index) { if (index < 0 || index >= Length) { throw std::out_of_range{"Index out of bounds"}; } return Elements[index]; } }; // Define the std::swap function overload namespace std { template<class T> void swap(SimpleArray<T>& a, SimpleArray<T>& b) noexcept { // Swap pointers instead of copying all the elements T* elements = a.Elements; a.Elements = b.Elements; b.Elements = elements; // Swap lengths int length = a.Length; a.Length = b.Length; b.Length = length; } } void Foo() { SimpleArray<int> a{ 3 }; a[0] = 1; a[1] = 2; a[2] = 3; SimpleArray<int> b{ 5 }; b[0] = 10; b[1] = 20; b[2] = 30; b[3] = 40; b[4] = 50; for (int i = 0; i < a.GetLength(); ++i) { DebugLog(a[i]); // 1, 2, 3 } for (int i = 0; i < b.GetLength(); ++i) { DebugLog(b[i]); // 10, 20, 30, 40, 50 } // Swap the contents of a and b std::swap(a, b); for (int i = 0; i < a.GetLength(); ++i) { DebugLog(a[i]); // 10, 20, 30, 40, 50 } for (int i = 0; i < b.GetLength(); ++i) { DebugLog(b[i]); // 1, 2, 3 } }
Overloads like the one we created above for SimpleArray
exist for all the C++ Standard Library containers. It’s even typical to create overloads like the above for any custom container types we create. Usage with types like std::unordered_set
looks just the same:
#include <unordered_set> // Create collections std::unordered_set<int> a{ 1, 2, 3 }; std::unordered_set<int> b{ 10, 20, 30, 40, 50 }; // Print initial contents for (int x : a) { DebugLog(x); // 1, 2, 3 } for (int x : b) { DebugLog(x); // 10, 20, 30, 40, 50 } // Swap contents std::swap(a, b); // Print swapped contents for (int x : a) { DebugLog(x); // 10, 20, 30, 40, 50 } for (int x : b) { DebugLog(x); // 1, 2, 3 }
Exceptions
C++ supports a wide variety of error-handling techniques ranging from simple return codes to exceptions, std::optional, and even the C Standard Library’s errno. Most of the C++ Standard Library uses exceptions though. This includes all the container types, such as detecting out-of-bounds conditions like SimpleArray
did above:
#include <vector> std::vector<int> v{ 1, 2, 3 }; try { int x = v[1000]; DebugLog(x); // Does not get printed } catch (const std::out_of_range& ex) { DebugLog(ex.what()); // Maybe "vector subscript out of range" }
The same is true for myriad other erroneous operations. For example, calling front
on an empty std::vector
throws an exception because there’s no way it could possibly return a valid T&
or a T&
indicating an error that would work with all types of T
:
#include <vector> std::vector<int> v{}; try { int x = v.front(); DebugLog(x); // Does not get printed } catch (const std::out_of_range& ex) { DebugLog(ex.what()); // Maybe "front() called on empty vector" }
This behavior is very similar to C# where exceptions are used as the preferred error-handling mechanism. Still, some C++ codebases and style guides, especially in video games, forbid using exceptions. C++ compilers typically provide a way to disable this language feature and even to change the error-handling mechanism of the C++ Standard Library so that it doesn’t use exceptions. Some Standard Library variants, notably EASTL by Electronic Arts, also support enabling and disabling exceptions.
In-Place Construction
We’ve seen the use of functions with “emplace” in their names, but not yet delved into what that means. So far they’ve behaved just like other functions such as those with “push” in their names. Now we’ll see how they’re different:
#include <vector> // A class that logs its lifecycle struct Noisy { int Val; Noisy(int val) { Val = val; DebugLog(Val, "val ctor"); } Noisy(const Noisy& other) { Val = other.Val; DebugLog(Val, "copy ctor"); } ~Noisy() { DebugLog(Val, "dtor"); } }; std::vector<Noisy> v{}; v.reserve(2); v.push_back(123); // Prints: // 123, val ctor // 123, copy ctor // 123, dtor v.emplace_back(456); // Prints: // 456, val ctor
Both push_back
and emplace_back
add an element to the end of a std::vector
, but they do it in different ways. The “push” version takes a reference to an element, a Noisy&
or Noisy&&
in this case, and copies it to the array that std::vector
contains. That means this humble function call actually performs several steps.
First, implicitly convert 123
, which is an rvalue because it doesn’t have a name, to a Noisy&&
rvalue reference by inserting a call to the Noisy(int)
constructor. This is why we see the “123, val ctor” log message. The call now looks like this:
v.push_back(Noisy{123});
Second, the implementation of push_back
copies the parameter to its array of Noisy
objects. It’s as though push_back
included a line that used “placement new
” to call the copy constructor with this
being the appropriate location in its array. This is why we see the “123, copy ctor” log message. Here’s a pseudo-code version of how that might look:
void push_back(T&& value) { new (&Elements[EndIndex]) Noisy{ value }; }
Finally, the temporary Noisy
created by the implicit conversion in the first step is destroyed at the end of the statement. As a result, we see the “123, dtor” log message.
On the other hand, emplace_back
does not take the element to add to the std::vector
. Instead, it takes the arguments to pass to the “placement new
” operator. It’s as though emplace_back
is implemented like this:
void emplace_back(int val) { new (&Elements[EndIndex]) Noisy{ val }; }
This means there’s no implicit conversion and therefore only one object is created. We see this with the “456, val ctor” log message. Of course the real implementation of emplace_back
is more complicated so it can work on the arbitrary numbers of parameters required by various element types T
.
There’s no equivalent to this in C# as all element additions use the “push” style of copying. In the case of managed types, a managed reference is copied. The size of the copy is just a pointer, but another reference to the object is created and must be tracked for future garbage collection. Unmanaged types such as primitives and structs are simply copied and the cost depends on their size. Copying large structs may be expensive.
Span
Similar to std::basic_string_view
, C++20 introduces a new “view” type that works with any container: std::span
. This is very similar to the Span
and ReadOnlySpan
types in C#. A ReadOnlySpan
type is unnecessary in C++ as const
can be used to disable mutating member functions and overloaded operators. Like std::basic_string_view
, a std::span
also doesn’t “own” the underlying memory that holds the contained values but rather references it like a pointer would. Here’s the basic usage:
#include <vector> #include <span> // Create a container std::vector<int> v{ 1, 2, 3 }; // Create a span to view the container std::span<int> s{ v }; // Use the span to get the contents of the container for (int x : s) { DebugLog(x); // 1 then 2 then 3 }
One benefit of std::span
is that it abstracts the underlying collection type in a similar way to using IEnumerable<T>
in C#. It uses templates to do this at compile-time rather than runtime and thus avoids the overhead of interfaces’ virtual functions. Its non-explicit
constructor combined with C++’s implicit conversion allows for any type of container to be passed to a function taking a std::span
:
#include <vector> #include <span> void Print(std::span<int> s) { for (int x : s) { DebugLog(x); } } std::vector<int> v{ 1, 2, 3 }; Print(v); // 1 then 2 then 3 int a[] = { 1, 2, 3 }; Print(a); // 1 then 2 then 3
std::span
is able to behave like a container because it contains all the usual member functions: begin
, end
, size
, front
, back
, etc. We can call them directly or indirectly via features like the range-based for
loop above. There’s also an additional function to get a std::span
that views just a portion of an existing std::span
:
#include <vector> #include <span> std::vector<int> v{ 1, 2, 3, 4, 5 }; // A span covering the whole container std::span<int> s{ v }; DebugLog(s.size()); // 5 for (int x : s) { DebugLog(x); // 1, 2, 3, 4, 5 } // A sub-span of the middle 3 elements std::span<int> ss{ s.subspan(1, 3) }; DebugLog(ss.size()); // 3 for (int x : ss) { DebugLog(x); // 2, 3, 4 }
Conclusion
The containers library is designed in such a way that containers are very compatible with each other and with the C++ language itself. They all have iterators that can be used directly or via range-based for
loops just like C# container types can be used with foreach
loops. They consistently handle errors with exceptions, just like C# containers do, but often allow disabling those exceptions. The std::span
type provides an abstract view of any container just like Span
does. All in all, there’s a lot of similarity between C++ and C#.
Some divergence appears when we start to look at advanced operations. Swapping the contents of containers is implemented very efficiently in C++ via template specialization, which C# lacks. Allocation customization is available in two forms, but unavailable in C#. In-place construction of elements is only possible in C++ due to the ability to “forward” constructor parameters and perform “placement new
” construction.
Finally, here’s a comparison table between each of the container types in the two languages:
Container | C++ | C# |
---|---|---|
Dynamic Array | vector |
List |
Static Array | array |
N/A |
Bit Array | vector<bool> |
BitArray |
Value Array | valarray |
N/A |
Double-Ended Queue | deque |
N/A |
Queue | queue |
Queue |
Stack | stack |
Stack |
Hash Map | unordered_map |
Dictionary |
Hash Multimap | unordered_multimap |
N/A |
Map | map |
OrderedDictionary |
Multimap | multimap |
N/A |
Hash Set | unordered_set |
HashSet |
Hash Multiset | unordered_multiset |
N/A |
Set | set |
SortedSet |
Multiset | multiset |
N/A |
Doubly-Linked List | list |
LinkedList |
Singly-Linked List | forward_list |
N/A |
#1 by mactyr on July 29th, 2023 ·
I can’t believe this resource exists and I’m so grateful that it does!
I think there are a few places where you’re actually talking about emplace_back but refer to it as “push_back” (including in the pseudo implementation).
#2 by jackson on July 30th, 2023 ·
I’m glad you’re enjoying the series! Thanks for letting me know about the typos. I’ve updated the article with some fixes.