| 1 | +--- |
| 2 | +authors: [gaurav] |
| 3 | +layout: post |
| 4 | +title: Copy Semantics |
| 5 | +categories: [cpp] |
| 6 | +tags: [c++, c++98, c++03, copy, copy-constructor, copy-semantics] |
| 7 | +--- |
| 8 | + |
| 9 | +Copy semantics refers to the rules and mechanisms by which an object is copied when it is used to
| 10 | +initialize another object, assigned to another object, or passed to a function by value. A copy
| 11 | +produces objects that are `equivalent` and `independent`, i.e.
| 12 | +
| 13 | +1. `source == destination`
| 14 | +2. a modification to one object does not modify the other
| 15 | + |
| 16 | +{: file="copy.cpp" } |
| 17 | + |
| 18 | +```c++ |
| 19 | +#include <cassert> |
| 20 | + |
| 21 | +struct Rectangle {    // plain old data (POD) type
| 22 | +    int length;
| 23 | +    int breadth;
| 24 | +};
| 25 | +
| 26 | +int area(Rectangle r) {    // pass by value, implicit copy
| 27 | +    return r.length * r.breadth;
| 28 | +}
| 29 | +
| 30 | +int main() {
| 31 | +    int x = 10;
| 32 | +    int y = x;    // 1. construct, implicit copy
| 33 | +    assert(x == y);
| 34 | +
| 35 | +    Rectangle rect {10, 20};
| 36 | +    assert(area(rect) == 200);    // 2. call area, implicit copy of rect into r
| 37 | +}
| 38 | +``` |
| 39 | +
| 40 | +1. `int y = x;` copies the value `10` from variable `x` to `y`.
| 41 | +2. The call `area(rect)` copies the values stored in `rect.length` and `rect.breadth` to
| 42 | +the parameter's members `r.length` and `r.breadth`.
| 43 | +
| 44 | +If a class does not declare them, the compiler implicitly generates a copy constructor and a copy
| 45 | +assignment operator when needed. Both perform a `member-wise copy`, i.e. they copy each member from source to destination.
| 46 | +
| 47 | +For basic datatypes such as `int` this just copies a single value (`y = x`), and for
| 48 | +plain old datatypes (POD) it copies each member variable from source to destination (`r = rect`).
| 49 | +
| 50 | +This is also known as a `shallow copy`. For basic datatypes and PODs without pointer members this is
| 51 | +not an issue. But for user-defined classes/structures that own memory through pointer member variables, it can be problematic.
| 52 | +
| 53 | +## Shallow Copy |
| 54 | +
| 55 | +{: file="shallow_copy.cpp" }
| 56 | +
| 57 | +```c++ |
| 58 | +#include <cstddef>
| 59 | +
| 60 | +struct DynamicArray {
| 61 | +    DynamicArray(std::size_t size)
| 62 | +        : m_size {size}
| 63 | +        , m_ptr {new int[m_size]}
| 64 | +    {}
| 65 | +
| 66 | +    ~DynamicArray() {
| 67 | +        delete[] m_ptr;
| 68 | +    }
| 69 | +
| 70 | +    std::size_t m_size;
| 71 | +    int* m_ptr;
| 72 | +};
| 73 | +
| 74 | +int main() {
| 75 | +    DynamicArray arr(10);
| 76 | +    {
| 77 | +        DynamicArray arr_copy(arr);    // implicit copy constructor: copies arr member by member
| 78 | +    }    // arr_copy is destroyed here and deletes the shared block
| 79 | +}    // arr is destroyed here and deletes the same block again
| 80 | +``` |
| 81 | + |
| 82 | +{: file="output" } |
| 83 | +{: .nolineno } |
| 84 | + |
| 85 | +```bash |
| 86 | +g++ -std=c++11 -fsanitize=address shallow_copy.cpp && ./a.out |
| 87 | + |
| 88 | +================================================================= |
| 89 | +==11528==ERROR: AddressSanitizer: attempting double-free on 0x604000000010 in thread T0: |
| 90 | + #0 0x7ff4a3010780 in operator delete[](void*) |
| 91 | +``` |
| 92 | + |
| 93 | +As described earlier, the compiler-provided copy constructor performs a member-by-member copy, i.e.
| 94 | +`arr_copy.m_ptr = arr.m_ptr`.
| 95 | +
| 96 | +So now the `m_ptr` members of both objects point to the same address, which is bad for three reasons.
| 97 | +
| 98 | +1. `No independent modifications`: Changes made through `arr.m_ptr` are visible through
| 99 | +   `arr_copy.m_ptr` and vice versa.
| 100 | +2. `Undefined behavior`: Because of its narrower scope, `arr_copy` is destroyed first, which deletes the
| 101 | +   memory pointed to by `arr_copy.m_ptr`. Any later access through `arr.m_ptr` is
| 102 | +   undefined behavior, since the memory block that `arr.m_ptr` points to has already been deleted.
| 103 | +   A pointer to deleted memory is known as a `dangling pointer`.
| 104 | +3. `Double free`: At the end of `main`, `arr` goes out of scope and is destroyed as well. Since
| 105 | +   both pointers point to the same memory, its destructor deletes that memory a
| 106 | +   second time, which is the error reported in the output.
| 107 | + |
| 108 | +To deal with such issues we need a `deep copy`.
| 109 | + |
| 110 | +## Deep Copy |
| 111 | + |
| 112 | +The issue with a shallow copy is that it copies member values directly, even for pointer members.
| 113 | +A deep copy fixes this by first allocating separate memory for each pointer member and then
| 114 | +copying the contents from the source's memory into the newly allocated memory.
| 115 | +Each copy then owns its own set of data, even if that data includes references or
| 116 | +pointers to other objects.
| 117 | +
| 118 | +A deep copy is implemented explicitly by the programmer by providing a user-defined `copy constructor`
| 119 | +and `copy assignment operator`.
| 120 | + |
| 121 | +{: width="800" } |
| 122 | + |
| 123 | +As the image shows, after the copy operation the members of `destination` and `source` point to
| 124 | +different addresses but hold the same contents (shapes).
| 125 | + |
| 126 | +### Copy Constructor |
| 127 | + |
| 128 | +{: file="copy_constructor.cpp" } |
| 129 | + |
| 130 | +```c++ |
| 131 | +#include <iostream>
| 132 | +#include <algorithm>
| 133 | +
| 134 | +DynamicArray(const DynamicArray& o)    // member of struct DynamicArray from shallow_copy.cpp
| 135 | +    : m_size {o.m_size}
| 136 | +    , m_ptr {new int[m_size]}    // 1. memory allocation
| 137 | +{
| 138 | +    std::copy(o.m_ptr, o.m_ptr + o.m_size, m_ptr);    // 2. copy values from source to dest
| 139 | +}
| 140 | +
| 141 | +void printArray(DynamicArray arr) {
| 142 | +    for (std::size_t i = 0; i < arr.m_size; ++i) {
| 143 | +        std::cout << i << std::endl;    // prints the index; the elements were never initialized
| 144 | +    }
| 145 | +}
| 146 | +
| 147 | +int main() {
| 148 | +    DynamicArray arr(10);
| 149 | +    {
| 150 | +        DynamicArray arr_copy(arr);    // invokes the copy constructor
| 151 | +    }
| 152 | +    printArray(arr);    // pass by value: the parameter is copy-constructed from arr
| 153 | +}
| 154 | +``` |
| 155 | +
| 156 | +1. `m_ptr {new int[m_size]}`: separate memory is allocated for `arr_copy.m_ptr`.
| 157 | +2. `std::copy(o.m_ptr, o.m_ptr + o.m_size, m_ptr);`: copies the values stored at the memory address pointed to
| 158 | +   by `arr.m_ptr` into the newly allocated memory at `arr_copy.m_ptr`.
| 159 | +
| 160 | +### Why Copy Constructor Takes Argument By Reference |
| 161 | +
| 162 | +The canonical signature of a copy constructor is `DynamicArray(const DynamicArray& o)`.
| 163 | +
| 164 | +If it received its argument by value, then invoking the copy constructor would first require
| 165 | +a copy of the argument, which would itself invoke the copy constructor, which would again need a
| 166 | +copy, and so on without end. C++ rules this out: a constructor declared as `DynamicArray(DynamicArray)` is ill-formed and is rejected by the compiler.
| 167 | +
| 168 | +So it takes its parameter by reference.
| 169 | +
| 170 | +### Why Copy Constructor Takes Const Argument |
| 171 | +
| 172 | +Taking the source as `const` prevents accidental modifications to it. A const reference `const &` also
| 173 | +allows the copy constructor to receive `const` objects and `temporary objects`.
| 174 | +
| 175 | +### Copy Assignment |
| 176 | +
| 177 | +A copy constructor solves only half of the problem; we run into the same issues if a copy assignment
| 178 | +operator is not provided.
| 179 | +
| 180 | +```c++ |
| 181 | +int main() { |
| 182 | +    DynamicArray arr(10);
| 183 | +    DynamicArray other(5);
| 184 | +    other = arr;    // invokes the compiler-provided copy assignment
| 185 | +} |
| 186 | +``` |
| 187 | + |
| 188 | +Since both objects already exist, `other = arr` invokes the compiler-provided assignment operator,
| 189 | +which performs a member-by-member copy. This again leaves two pointers pointing to
| 190 | +the same memory, and it also leaks the block that `other.m_ptr` originally pointed to.
| 191 | + |
| 192 | +{: file="copy_assignment.cpp" } |
| 193 | + |
| 194 | +```c++ |
| 195 | +DynamicArray& operator=(const DynamicArray& o)    // member of struct DynamicArray
| 196 | +{
| 197 | +    if (this == &o) {    // 1. guard against self-assignment, arr = arr
| 198 | +        return *this;
| 199 | +    }
| 200 | +    delete[] m_ptr;    // 2. release the memory currently owned
| 201 | +    m_size = o.m_size;
| 202 | +    m_ptr = new int[m_size];
| 203 | +    std::copy(o.m_ptr, o.m_ptr + o.m_size, m_ptr);
| 204 | +    return *this;    // 3. return a reference to allow chaining
| 205 | +}
| 206 | +
| 207 | +int main() {
| 208 | +    DynamicArray arr(10);
| 209 | +    DynamicArray other(1);
| 210 | +    arr = other;    // since arr already exists, this invokes copy assignment
| 211 | +}
| 212 | +``` |
| 213 | + |
| 214 | +The implementations of the copy constructor and the copy assignment operator are almost the same, with three small but
| 215 | +important differences.
| 216 | +
| 217 | +1. Self-assignment check `if (this == &o)`: A statement such as `arr = arr` is a self-assignment. If
| 218 | +   the operator does not check for it, it deletes `m_ptr` first
| 219 | +   and then allocates a fresh block, so the original contents are lost.
| 220 | +2. Delete previously allocated memory `delete[] m_ptr;`: In an assignment both objects already exist and
| 221 | +   `m_ptr` may point to valid memory. Allocating new memory without deleting the old block results in a
| 222 | +   memory leak.
| 223 | +3. `return *this;`: Returning a reference to self is not mandatory, but without it you cannot chain
| 224 | +   assignments `(a = b = c)`.
| 225 | + |
| 226 | +## Note |
| 227 | + |
| 228 | +> If you find the need to provide a custom implementation for either the `copy constructor`, |
| 229 | + `copy assignment operator`, or `destructor`, it's a strong indication that you should consider |
| 230 | + providing custom implementations for all three of them. This principle is commonly referred to as |
| 231 | + the `Rule of Three`. |
| 232 | +{: .prompt-info } |
| 233 | + |
| 234 | +## Conclusion |
| 235 | + |
| 236 | +Understanding copy semantics is crucial for managing the behavior of your C++ programs and
| 237 | +controlling how user-defined structures are copied, especially when pointer member variables are
| 238 | +involved.
| 239 | +
| 240 | +Copy semantics is an important concept, but it has its downsides. For small objects the cost of
| 241 | +copying is tolerable, but for large ones it can lead to noticeable performance degradation due to the
| 242 | +creation of numerous temporary copies.
| 243 | +
| 244 | +To address this inefficiency, C++11 introduced the concept of `move semantics`. If you're a
| 245 | +technical enthusiast looking to optimize your code and understand the inner workings of move |
| 246 | +semantics, dive into our next blog on the topic. |