generated from lorossi/latex-notes-template
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdata-bases-2-notes.tex
executable file
·4154 lines (3301 loc) · 204 KB
/
data-bases-2-notes.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[english]{article}
\usepackage[title={Data Bases 2 notes}, date={2022/2023}]{notestemplate}
\usepackage{db2}
\begin{document}
\makecover
\section{Introduction}
\subsection{Architecture of the \dbms}
A \textbf{\dbms} \textit{(short for Data Base Management System)} is a system \textit{(or software product)} capable of managing data collections that can be:
\begin{itemize}
\item \textbf{Large}
\begin{itemize}[label=\(\rightarrow\)]
\item much larger than the central memory available on the computer that runs the software
\item often data must be stored on secondary storage devices
\end{itemize}
\item \textbf{Persistent}
\begin{itemize}[label=\(\rightarrow\)]
\item its lifespan is longer than the lifespan of the software that accesses it
\end{itemize}
\item \textbf{Shared}
\begin{itemize}[label=\(\rightarrow\)]
\item used by several applications at the same time
\item various users must be able to gain access to the same data
\end{itemize}
\item \textbf{Reliable}
\begin{itemize}[label=\(\rightarrow\)]
\item ensures tolerance to hardware and software failures
\item the \dbms provides backup and recovery capabilities
\end{itemize}
\item \textbf{Data-ownership respectful}
\begin{itemize}[label=\(\rightarrow\)]
\item the data access is disciplined and controlled by the \dbms
\item users can only access the data they are authorized to
\end{itemize}
\end{itemize}
\begin{figure}[htbp]
\centering
\bigskip
\tikzfig{figure-1.tikz}
\bigskip
\caption{Architecture of the \dbms}
\label{fig:architecture}
\end{figure}
\bigskip
\textbf{Capabilities} of the \dbms:
\begin{itemize}
\item \textbf{Transaction} management
\begin{itemize}[label=\(\rightarrow\)]
\item \textit{ACID} properties \textit{(Section~\ref{sec:ACID-properties})} make sure that a set of operations is performed as a single unit
\end{itemize}
\item \textbf{Concurrency} control
\begin{itemize}[label=\(\rightarrow\)]
\item pessimistic and optimistic locking prevent data corruption in presence of concurrent accesses
\item also known as \cc theory
\end{itemize}
\item \textbf{Reliability} control
\begin{itemize}[label=\(\rightarrow\)]
\item log and recover protocols prevent data loss in case of failures
\end{itemize}
\item \textbf{Buffer} and secondary memory management
\begin{itemize}[label=\(\rightarrow\)]
\item paging and caching techniques improve performance by reducing the number of disk accesses
\end{itemize}
\item \textbf{Physical data structures} and \textbf{access} structures
\begin{itemize}[label=\(\rightarrow\)]
\item sequential, hash-based and tree-based structures are some of the low-level data structures used by the \dbms
\end{itemize}
\item \textbf{Query} management
\begin{itemize}[label=\(\rightarrow\)]
\item cost-based query optimization techniques are used to find the best execution plan for a given query
\end{itemize}
\end{itemize}
\subsection{Database-Application integration}
\begin{itemize}
\item \textbf{Impedance mismatch} handling
\begin{itemize}[label=\(\rightarrow\)]
\item differences between database and application models are solved with code-level procedures and object-relational mapping
\end{itemize}
\item Database \textbf{communication}
\begin{itemize}[label=\(\rightarrow\)]
\item \dbms provides call level interfaces, \texttt{ODBC}-\texttt{JDBC}, and \jpa persistence provider
\item the state of an object and the state of the persistent data that corresponds to it is synchronized by the \dbms via \jpa manage entities
\end{itemize}
\item Data \textbf{ranking}
\begin{itemize}[label=\(\rightarrow\)]
\item questions regarding all kinds of data preference and ranking are solved by the simultaneous optimization of several criteria
\end{itemize}
\end{itemize}
\clearpage
\section{Transactions}
A \textbf{transaction} is an elementary, atomic unit of work performed by an application.
The need for a transaction arises when multiple operations must be performed in a single step or when the data in the application can be manipulated between multiple users at the same time \textit{(properties of reliability and isolation)}.
Each transaction is encapsulated in a \textbf{transaction boundary}, defined by the commands:
\begin{enumerate}
\item \texttt{begin transaction} or \texttt{bot}
\item \texttt{end transaction} or \texttt{eot}
\end{enumerate}
Within a transaction, one of two commands \textbf{is executed exactly once} to signal the end of the transaction:
\begin{enumerate}
\item \texttt{commit-work}, the transaction is committed
\item \texttt{rollback-work}, the transaction is aborted and rolled back
\end{enumerate}
A transaction is defined as \textbf{well formed} if it fulfils the following conditions:
\begin{enumerate}
\item It \textbf{begins} its execution with a \texttt{begin transaction} command
\item It \textbf{ends} its execution with a \texttt{commit-work} or \texttt{rollback-work} command
\item It \textbf{includes} only one command between \texttt{commit-work} and \texttt{rollback-work}
\end{enumerate}
An application is normally composed of multiple transactions, which are executed in a sequence.
\subsection{ACID properties}
\label{sec:ACID-properties}
A transaction must possess \(4\) peculiar properties \textit{(called \textbf{ACID})}:
\begin{itemize}
\item \textbf{Atomicity}
\begin{itemize}
\item a transaction is an indivisible unit of execution: it either \textbf{succeeds} or \textbf{fails} completely
\begin{itemize}
\item if it fails, the data is \textbf{rolled back} to the state it was before the transaction started
\item an error after the end does not alter the effect of the transaction
\end{itemize}
\item if a transaction fails, the \dbms must restore the database to its state before the transaction started
\end{itemize}
\item \textbf{Consistency}
\begin{itemize}
\item the carrying out of the transaction \textbf{does not violate any integrity constraint} defined on the database
\begin{itemize}
\item if that happens, the transaction itself is aborted by the \dbms
\end{itemize}
\item immediate constraints can be checked by the \dbms before the transaction is committed, while deferred constraints can be checked only after
\item if the initial state \(S_0\) is consistent then the final state \(S_f\) is also consistent, while intermediate state \(S_i\) may not be consistent
\end{itemize}
\item \textbf{Isolation}
\begin{itemize}
\item the execution of a transaction is independent of the simultaneous execution of other transactions
\item the parallel execution of a set of transactions gives the result that the same transaction would obtain by carrying them out singularly
\item isolation impacts performance and trade-offs can be defined between isolation and performance
\end{itemize}
\item \textbf{Durability}
\begin{itemize}
\item the effects of a correctly committed transaction are permanent
\item no piece of data is ever lost, for any reason
\end{itemize}
\end{itemize}
\newpage
\subsection{\dbms and ACID properties}
The mechanisms provided by the \dbms, conform to the \textit{ACID} properties, are:
\begin{itemize}
\item \textbf{Atomicity}
\begin{itemize}
\item \textbf{abort}: the transaction is aborted and rolled back
\item \textbf{rollback}: the transaction is rolled back to the state it was before the transaction started
\item \textbf{restart}: the transaction is restarted
\item \textbf{reliability manager}: the \dbms provides a reliability manager that manages the execution of the transaction
\end{itemize}
\item \textbf{Consistency}
\begin{itemize}
\item \textbf{integrity checking of the \dbms}: the \dbms checks the integrity of the database before the transaction is committed
\item \textbf{integrity control system at query execution time}: the \dbms checks the integrity of the database at query execution time
\end{itemize}
\item \textbf{Isolation}
\begin{itemize}
\item \textbf{concurrency control}: the \dbms provides a concurrency control system that manages the execution of the transaction
\end{itemize}
\item \textbf{Durability}
\begin{itemize}
\item \textbf{recovery management}: the \dbms provides a reliability manager that manages the execution of the transaction
\end{itemize}
\end{itemize}
\clearpage
\section{Concurrency}
Since multiple applications can access the same data at the same time, the \dbms must provide a mechanism to control the concurrent access to the data.
The application load of a \dbms can be measured using the number of transactions per second \textit{(TPS)};
by exploiting the parallelism, the \textit{TPS} can be increased.
The \textbf{Concurrency Control System} \textit{(or \cc system)} manages the execution of transactions, avoiding the insurgence of anomalies while ensuring performances.
The anomalies can be:
\begin{itemize}
\item \textbf{Update loss}
\begin{itemize}
\item two transactions try to modify the same data, resulting in the loss of one of the updates
\end{itemize}
\item \textbf{Dirty read}
\begin{itemize}
\item a transaction reads data that has been modified by another transaction that aborts
\item this is a problem with a difficult solution
\end{itemize}
\item \textbf{Inconsistent read} - \textit{(phantom read)}
\begin{itemize}
\item a transaction reads data that has been modified by another transaction that commits
\end{itemize}
\item \textbf{Phantom insert} - \textit{(phantom update)}
\begin{itemize}
\item a transaction writes data that has been read by another transaction that commits
\end{itemize}
\end{itemize}
\subsection{Concurrency control theory}
\textbf{Model}: an abstraction of a system, an object of process, which purposely ignores some details to focus on the relevant aspects.
The concurrency theory builds upon a model of transaction and concurrency control principles that helps understand real-world systems;
they exploit implementation-level mechanisms \textit{(like locks and snapshots)} that help achieve some of the desirable properties postulated by the theory.
For the sake of simplicity, the concurrency theory is based on the following assumptions:
\begin{itemize}
\item A \textbf{transaction} is a \textbf{syntactical object}, of which only the \textit{input} and \textit{output} actions are known
\item All transactions are \textbf{initiated} by the \texttt{begin-transaction} command
\item All transactions are \textbf{terminated} by the \texttt{end-transaction} command
\item The concurrency control system \textbf{accepts} or \textbf{refuses} concurrent executions during the evolution of the transactions, without knowing their outcome \textit{(either with a \texttt{commit} or \texttt{abort} command)}
\item An \textbf{operation} is a \textbf{read} or \textbf{write} of a specific datum by a specific transaction
\item A \textbf{schedule} is a \textbf{sequence of operations} performed by concurrent transactions that respect the order of operations of each transaction
\end{itemize}
\subparagraph*{Transactions notation}
\begin{itemize}
\item A \textbf{transaction} is denoted by \(T_i\), where \(i\) is a number
\item A \texttt{READ} operation on data \(x\) is denoted by \(r_i(x)\)
\item A \texttt{WRITE} operation on data \(x\) is denoted by \(w_i(x)\)
\item A \textbf{schedule} is denoted by the letter \(S\)
\end{itemize}
\subsubsection{Schedules}
Let \(N_S\) and \(N_D\) be respectively the number of \textbf{serial schedules} and \textbf{distinct schedules} for \(n\) transactions \(\langle T_1, \dots, T_n \rangle\) each with \(k_i\) operations.
Then:
\begin{align*}
N_S & = n! \quad & \text{number of permutations of } n \text{ transactions} \\[0.5ex]
N_D & = \dfrac{\displaystyle \left(\sum_{i=1}^{n}\right)!}{\displaystyle \prod_{i=1}^{n} (k_i!)} & \text{number of permutations of all operations}
\end{align*}
\subsubsection{Principles of Concurrency Control}
The goal of the \textbf{Concurrency Control} is to \textbf{reject schedules that cause anomalies}.
To enable Concurrency Control, two components are needed:
\begin{enumerate}
\item \textbf{scheduler}, a component that accepts or rejects the operations requested by the transactions
\item \textbf{serial schedule}, a schedule in which the actions of each transaction occur in a contiguous sequence
\end{enumerate}
\bigskip
A \textbf{serializable schedule} is a schedule that leaves the database in the same state as some serial schedule of the same transactions;
this property is commonly accepted as a notion of \textbf{schedule correctness}.
To identify classes of schedules that ensure serializability, the notion of \textbf{schedule equivalence} is needed;
However, there's a difference between real life and theory:
\begin{itemize}
\item \textbf{in theory}, all transactions are observed \textit{a posteriori} and limited to those that have committed
\begin{itemize}
\item this technique is called \textbf{commit projection}
\item the observed schedule is admissible if the transactions lead to a valid state
\end{itemize}
\item \textbf{in practice}, all schedulers must make decisions while the transactions are still running
\end{itemize}
\begin{figure}[htbp]
\bigskip
\centering
\tikzfig[0.8]{figure-2.tikz}
\caption{Schedules equivalence}
\label{fig:schedules-equivalence}
\bigskip
\end{figure}
Finally, the following definitions are introduced:
\begin{itemize}
\item \textbf{Reads-from} relation: \(r_i(x)\) reads from \(w_j(x)\) in a schedule \(S\) when \(w_j(x)\) precedes \(r_i(x)\) in \(S\) and there's no \texttt{WRITE} operation between them
\begin{itemize}
\item this relation is independent of the time at which the commit \(T_j\) occurs
\end{itemize}
\item \textbf{Final write}: \(w_i(x)\) in a schedule \(S\) is a final write if it is the last write on \(x\) that occurs in \(S\)
\item \textbf{Blind write}: \(w_i(x)\) in a schedule \(S\) is not a final on \(x\) and the following operation on \(x\) is also a write
\item \textbf{View equivalence}: two schedules \(S_i\) and \(S_j\) are view equivalent \textit{(\(S_i \approx_v S_j\))} if they have the same operations, the same final writes and the same final reads
\item \textbf{View serializability}: a schedule is view serializable if it is view-equivalent to a serial schedule of the same transactions
\begin{itemize}
\item the class of view-serializable schedules is called \vsr
\end{itemize}
\end{itemize}
\paragraph{Complexity of View Serializability}
Deciding whether two given schedules are view equivalent is done in polynomial time and space;
deciding whether a generic schedule is in \vsr is a \NPC problem, as it requires considering the \textit{reads-from} and \textit{final writes} of all possible serial schedules with the same operations \textit{(a combinatorial problem)}.
However, by giving up some accuracy, it's possible to increase the performance: a stricter definition of view equivalence is introduced.
This simplification may lead to the rejection of the schedules that are view-serializable under the broader definition.
\subsubsection{Conflict Serializability}
First, the notion of \textbf{conflict} is introduced:
\begin{itemize}
\item Two operations \(o_i\) and \(o_j\), with \(i \neq j\), are \textbf{in conflict} if they address the same resource and at least one of them is a \texttt{WRITE}
\begin{itemize}
\item \textit{read-write} conflicts \textit{R-W} or \textit{W-R}
\item \textit{write-write} conflicts \textit{W-W}
\end{itemize}
\end{itemize}
Then, the notion of \textbf{conflict serializability} is defined:
\begin{itemize}
\item Two schedules \(S_i\) and \(S_j\) are \textbf{conflict-equivalent} \textit{(\(S_i \approx_C S_k\))} if they contain the \textbf{same operations} and in all the conflicting pairs the transactions occur in the \textbf{same order}
\item A schedule is \textbf{conflict-serializable} if it is \textbf{conflict-equivalent to a serial schedule} of the same transactions
\item The class of conflict-serializable schedules is called \textbf{\csr}
\end{itemize}
\paragraph{Relation between VSR and CSR}
First of all, it's immediate to establish that \(\csr \subseteq \vsr\)
\begin{itemize}
\item \textbf{Proof}: there are \vsr schedules that are not \csr because they contain operations that are not in conflict.
\item \textbf{Counter example}: consider the schedule \(r_1(x) w_2(x) w_1(x) w_3(x)\)
\begin{itemize}
\item it's view-serializable, as it's view-equivalent to the schedule \(T_1 T_2 T_3 = r_1(x) w_1(x) w_2(x) w_3(x)\)
\item it's not conflict-serializable, as it contains \textit{R-W} and \textit{W-W} conflicts
\item there is no conflict-equivalent serial schedule
\end{itemize}
\end{itemize}
\bigskip
Therefore it can be deducted that \(\csr \Rightarrow \vsr\): conflict-equivalence \(\approx_C\) implies view-equivalence \(\approx_v\), by assuming that \(S_1 \approx_C S_2\) and \(S_2 \approx_v S_3\).
To achieve this assumption, \(S_1\) and \(S_2\) must have:
\begin{itemize}
\item The \textbf{same final writes}
\begin{itemize}
\item if they didn't, there would be at least two writes in a different order, and therefore they would not be conflict-equivalent
\end{itemize}
\item The \textbf{same reads-from} relations
\begin{itemize}
\item if they didn't, there would be at least two reads in a different order, and therefore they would not be conflict-equivalent
\end{itemize}
\end{itemize}
\begin{figure}[htbp]
\bigskip
\centering
\tikzfig[0.8]{figure-3.tikz}
\caption{Relation between \vsr and \csr}
\label{fig:relation-between-vsr-and-csr}
\bigskip
\end{figure}
\paragraph{Testing \csr}
As already said, determining the conflict serializability of two generic schedules is a \NPC problem.
To test the conflict-serializability of a schedule, it's necessary to build a \textbf{conflict graph} \textit{(or \cg)} that has:
\begin{itemize}
\item One \textbf{node} for each transaction \(T_i\)
\item One \textbf{arc} from \(T_i\) to \(T_j\) if exists at least one conflict between an operation \(o_i\) of \(T_i\) and an operation \(o_j\) of \(T_j\) such that \(o_i\) precedes \(o_j\)
\end{itemize}
The schedule is conflict-serializable \textit{(in \csr)} if and only if the conflict graph is acyclic.
\bigskip
Finally, each schedule \(S \in \vsr\) and \(S \notin \csr\) has cycles in its \cg due to the presence of blind writes.
Such pairs can be swapped without affecting the reads-from and final-write relationships creating a new schedule \(S^\text{mod}\) that is view equivalent to \(S\);
its \cg is acyclic and can be used to find serial schedules that are transitively view equivalent to \(S\).
\paragraph{\csr implies acyclicity of the \cg}
This Paragraph is going to provide a simple explanation of the previous statement.
Consider a schedule \(S\) in \csr.
As such, it is \(\approx_c\) to a serial schedule \(S'\).
Without loss of generality, the transaction of \(S\) can be renamed such that their order is \(T_1, T_2, \dots, T_n\).
Since the serial schedule has all conflicting pairs in the same order as schedule \(S\), the \cg will have arcs only connecting the pairs \((i, j)\) with \(i < j\).
As a direct consequence, the \cg will be acyclic \textit{(a cycle requires at least an arc \((i, j)\) with \(i > j\))}.
\paragraph{Acyclicity of the \cg implies \csr}
Consider once again the schedule \(S\) already explored in the previous Paragraph.
If the \cg is acyclic, then it induces a \textbf{topological ordering} on its nodes \textit{(an ordering such that the graph only contains arcs from \(i\) to \(j\) if \(i < j\))}.
The same partial order exists on the transactions of \(S\).
Generally speaking, any serial schedule whose transactions are ordered according to the partial order is conflict-equivalent to \(S\), because for all conflicting pairs \(i, j\) the transaction \(T_i\) precedes \(T_j\) in both schedules.
\subsection{Concurrency control in practice}
In the real world, the concurrency control methods explained so far are not used directly: \csr checking would be efficient if it was possible to know the graph in the beginning, but it's not the case.
Additionally, as stated, the problem is \NPC, so it's not possible to solve it in a reasonable time.
A scheduler must work \inlinequote{online} \textit{(or \inlinequote{on the fly})}: it must be able to decide for each operation if it can be executed or not, without knowing the whole schedule in advance;
if it's not possible to maintain the conflict graph, then it has to be updated and its acyclicity checked after each operation.
The assumption that concurrency control can work only with the commit-projection of the schedule is not realistic as transactions can be aborted at any time.
To solve this issue, a simple decision criterion is required for the scheduler.
It must:
\begin{itemize}
\item \textbf{Avoid} as many \textbf{anomalies} as possible
\item Have \textbf{negligible overhead}
\end{itemize}
So far, the notation \(r_1(x) w_2(x) w_1(x) w_3(x)\) has been used to represent a schedule, or a posteriori view of the execution of concurrent transactions in the \dbms \textit{(also sometimes called history)}.
A schedule represents \inlinequote{what has happened} or, with more detail, \inlinequote{which operations have been executed by which transactions in which order}.
They can be further restricted by the commit-projection hypothesis so operations are executed by committed transactions.
When dealing with concurrency control, it's important to consider \textbf{arrival sequences}: sequences of operation requests emitted in order by transactions.
With abuse of notation, the arrival schedule will be denoted in the same way as the a posteriori schedule.
The distinction will be clear from the context.
The \cc system maps an arrival sequence to an a posteriori schedule, and it must guarantee that the a posteriori schedule is conflict-serializable.
\bigskip
To implement a real \cc system, two main approaches are used in the real world:
\begin{itemize}
\item \textbf{Pessimistic}
\begin{itemize}
\item based on \textbf{locks} or resource access control
\item if a resource is being used, no other transaction can access it
\item \textbf{prevents the errors from happening}
\end{itemize}
\item \textbf{Optimistic}
\begin{itemize}
\item based on \textbf{timestamping} and versioning
\item serve as many requests as possible, possibly using out-of-date data
\item \textbf{solves the error after it has happened}
\end{itemize}
\end{itemize}
Both families of approaches will be compared later.
% TODO add a reference to the section where this does happen
\subsubsection{Locking}
The concurrency control mechanism implemented by most \dbms is called \textbf{locking}.
It works on a simple principle: a transaction can access a data item \textit{(either through a \texttt{WRITE} or a \texttt{READ} operation)} only if it has acquired a lock on it.
Three primitive operations are defined:
\begin{itemize}[label=\textbf{\texttt{>}}]
\item \texttt{r\_lock(\(x\))}, used to acquire a read lock on \(x\)
\item \texttt{w\_lock(\(x\))}, used to acquire a write lock on \(x\)
\item \texttt{unlock(\(x\))}, used to release the lock on \(x\)
\end{itemize}
The \textbf{scheduler} \textit{(also called lock manager in this context)} receives those requests and decides if they can be executed or not by checking an adequate data structure with minimal computational cost and overhead.
During the execution of \texttt{READ} and \texttt{WRITE} operations, the following rules must be respected:
\begin{itemize}
\item Each \texttt{READ} operation should be preceded by an \texttt{r\_lock} and followed by an \texttt{unlock}
\begin{itemize}
\item this type of lock is \textbf{shared}, as many transactions can acquire it at the same time
\item this lock can be upgraded into a \texttt{w\_lock} via \textbf{lock escalation}
\end{itemize}
\item Each \texttt{WRITE} operation should be preceded by a \texttt{w\_lock} and followed by an \texttt{unlock}
\begin{itemize}
\item this type of lock is \textbf{exclusive}, as only one transaction can acquire it at a time
\end{itemize}
\end{itemize}
When a transaction follows these rules, it's called \textbf{well formed with regard to locking}.
The object can then be in one of \(3\) possible states:
\begin{enumerate}
\item \texttt{free} or \texttt{unlocked}: no transaction has acquired a lock on it
\item \texttt{r-locked}: at least one transaction has acquired a \texttt{READ} lock on it
\item \texttt{w-locked}: exactly one transaction has acquired a \texttt{WRITE} lock on it
\end{enumerate}
The lock manager receives the primitives from the transaction and grants resources according to the conflict table (shown in Table~\ref{tab:conflict-table}).
\begin{table}[htbp]
\bigskip
\centering
\begin{tblr}{colspec={c|c|c|c}, cells={font=\ttfamily}, cell{5}{3}={font=\itshape}}
{status \(\rightarrow\) \\ request \(\downarrow\)} & free & r-locked & w-locked \\
\hline
r\_lock & \colorcmark r-locked & \colorcmark r-locked(n++) & \colorxmark w-locked \\
w\_lock & \colorcmark w-locked & \colorxmark r-locked & \colorxmark w-locked \\
unlock & \colorxmark & \colorcmark/\colorxmark depends on \texttt{n} & \colorcmark free
\end{tblr}
\bigskip
\caption{Conflict table for locking: \texttt{n} is the number of concurrent readers on the object, incremented by \texttt{r\_lock} and decremented by \texttt{unlock}.}
\label{tab:conflict-table}
\end{table}
\paragraph{Predicate locks}
To prevent phantom reads and inserts, a lock should also be placed on \inlinequote{future} data \textit{(inserted data that would satisfy previously issued queries)};
to achieve this goal, a new type of lock is introduced: the \textbf{predicate lock}, represented by the symbol \(\phi\).
Predicate locks extend the notion of data locks to future data:
\begin{itemize}
\item the \textbf{lock manager} can \textbf{acquire} a predicate lock on a predicate \(\phi\)
\item other \textbf{transactions cannot insert, delete or upgrade} any tuple satisfying \(\phi\)
\item if the predicate lock is not supported, the transaction must acquire a lock on all the tuples satisfying \(\phi\); otherwise, the lock is managed via the help of \textbf{indexes}
\end{itemize}
\paragraph{Locks implementation}
Typically, locks are implemented via \textbf{lock tables}: hash tables index the lockable items via hashing.
Each locked item has a linked list of transactions that have required a lock on it.
Every new lock request for the data item is appended as a new node to the list;
locks can be applied to both data and index items.
An illustration of the lock table is shown in Figure~\ref{fig:lock-table}.
\begin{figure}[htbp]
\centering
\bigskip
\tikzfig[0.75]{figure-4.tikz}
\caption{Illustration of a lock table}
\label{fig:lock-table}
\bigskip
\end{figure}
Resources can be \textit{free}, \textit{read-locked}, or \textit{write-locked}.
To keep track of the number of readers, a counter is used.
Transactions requesting locks are either granted the lock or suspended and queued \textit{(handled via a \texttt{FIFO} policy)}.
This technique may cause:
\begin{itemize}
\item \textbf{Deadlocks}
\begin{itemize}
\item two or more transactions are stuck in endless \textbf{mutual wait}
\item typically occurs due to transactions \textbf{waiting for each other} to release a lock
\item explained in Section~\ref{sec:deadlocks}
\end{itemize}
\item \textbf{Starvation}
\begin{itemize}
\item a transaction is waiting for a lock \textbf{for a long time}
\item typically occurs due to write transactions \textbf{waiting for resources} that are continuously \textit{read-locked} by other transactions
\end{itemize}
\end{itemize}
\subsection{Two-Phase Locking - \tpl}
The locking method ensures that the writing actions are exclusive while reading actions can occur concurrently;
however, it doesn't guarantee that the reading actions are consistent with the writing actions.
The schedule mapped by the \cc system can be \textbf{inconsistent} if the following conditions are met:
\begin{itemize}
\item a transaction \(T_1\) reads a data item \(x\) and then writes it
\item a transaction \(T_2\) reads \(x\) before \(T_1\) writes it
\end{itemize}
To avoid this situation, the locking method can be extended with a \textbf{two-phase locking} mechanism \textit{(also called
\tpl for brevity)}.
This method introduces a new restriction: \textbf{a transaction, after having released a lock, cannot acquire another lock}.
As a consequence of this principle, two phases can be distinguished:
\begin{enumerate}
\item \textbf{Growing phase}
\begin{itemize}
\item the transaction \textbf{acquires} all the locks it needs to execute its operations
\item the transfer of an \texttt{r\_lock} into a \texttt{w\_lock} can only appear in this phase
\end{itemize}
\item \textbf{Shrinking phase}
\begin{itemize}
\item the transaction \textbf{releases} all the locks it has acquired
\end{itemize}
\end{enumerate}
An illustration of the resources used in the two phases is shown in Figure~\ref{fig:resources-use-phases-two-phases-locking}.
\begin{figure}[htbp]
\centering
\bigskip
\tikzfig{figure-5.tikz}
\caption{Illustration of the resources used in the two phases of the \tpl}
\label{fig:resources-use-phases-two-phases-locking}
\bigskip
\end{figure}
This extension is sufficient to \textbf{prevent non-repeatable reads and phantom reads}, but it \textbf{doesn't prevent dirty reads};
however, it does ensure serializability.
\bigskip
Finally, a \textbf{scheduler} that:
\begin{itemize}
\item only processes \textbf{well-formed transactions}
\item grants locks according to the \textbf{conflict table}
\item checks that all transactions apply the \textbf{two-phase locking}
\end{itemize}
generates schedules in the \tpl class;
as soon will be shown, those schedules are both \textbf{view-serializable} and \textbf{conflict-serializable}.
It can be noted that:
\[ \tpl \subset \csr \subset \vsr \]
\subsubsection{\tpl implies \csr}
If \(\tpl \subseteq \csr\), then \textbf{every \tpl schedule is also conflict serializable}.
\bigskip
First, let's suppose that a \tpl schedule is not \csr.
Then, there exists a transaction \(T_i\) that has acquired a lock on a data item \(x\) and then released it (a cycle \(T_i \rightarrow T_j \rightarrow T_i\)).
Therefore there must be a pair of conflicting operations in reverse order, such that:
\begin{enumerate}
\item \(\text{OP}_{i}^{h}(x), \text{OP}_{j}^{k}(x) \ldots \text{OP}_{j}^{u}, \ldots \text{OP}_{i}^{w}(x)\) where:
\begin{itemize}
\item at least one of \(\text{OP}_{i}^{h}, \text{OP}_{j}^{k}\) is a \texttt{WRITE} operation
\item at least one of \(\text{OP}_{j}^{u}, \text{OP}_{i}^{v}\) is a \texttt{READ} operation
\end{itemize}
\item when the instructions \(\text{OP}_{i}^{h}, \text{OP}_{j}^{k}\) are executed, the transaction \(T_i\) has acquired a lock on \(x\)
\item later in the schedule, when the instructions \(\text{OP}_{j}^{u}, \text{OP}_{i}^{v}\) are executed,
\begin{itemize}
\item for a conflict to occur, the transaction must have acquired a lock on \(x\)
\item this is a contradiction on the \tpl
\end{itemize}
\end{enumerate}
The inclusion of \tpl in \csr is therefore proved;
this proves also that all \tpl schedules are view-serializable too.
In the same way, the inclusion of \tpl in \vsr can be proved \textit{(showing that \(\csr \subset \vsr\))}.
\paragraph{Alternate proof}
An alternate proof of the same relation can be found in the following steps.
\begin{itemize}
\item Consider a generic conflict \(o_i \rightarrow o_j\) in \(S^\prime\) with \(o_i \in T_i, o_j \in T_J, i < j\)
\begin{itemize}
\item by definition, \(o_i\) and \(o_j\) address the same resource \(r\) and at least one of them is a \texttt{WRITE}
\end{itemize}
\item Is it possible that \(o_i\) and \(o_j\) execute in the reverse order?
\begin{itemize}
\item no, because then \(T_j\) would have released the lock on \(r\) before \(T_i\) acquired it
\item this would be a contradiction of the ordering criterion in the \tpl
\end{itemize}
\end{itemize}
\bigskip
This also proves that all \tpl schedules are view-serializable too and that they can be also checked with negligible overhead.
\paragraph{\tpl is smaller than \csr}
It can be proven that \(\tpl \subset \csr\): every \tpl schedule is also conflict serializable, while the opposite is not always true.
\bigskip
Figure~\ref{fig:relation-between-tpl-csr-vsr-schedules} shows the relation between the three classes of schedules.
\begin{figure}[htbp]
\centering
\bigskip
\tikzfig[0.8]{figure-6.tikz}
\caption{Relation between \tpl, \csr and \vsr schedules}
\label{fig:relation-between-tpl-csr-vsr-schedules}
\bigskip
\end{figure}
\subsection{Strict \tpl}
In the previous Sections, the hypothesis of commit-projection \textit{(no transactions in the schedule are aborted)} was used;
however, the \tpl does not protect against dirty reads \textit{(caused by uncommitted transactions)}, as releasing locks before rollbacks expose dirty data.
In the same way, neither \vsr nor \csr protects against dirty reads.
To remove this hypothesis, an additional constraint to \tpl is added, therefore obtaining the \textbf{\stpl}:
\textbf{locks held by a transaction can be released only after commit or rollback.}
This version of \tpl is used in most of the modern \dbms whenever a level of higher isolation is required.
\bigskip
In practice:
\begin{itemize}
\item \stpl locks are also called \textbf{long duration locks}, while \tpl locks are called \textbf{short duration locks}
\item Real systems may apply \tpl policies differently, depending on the type of the lock:
\begin{itemize}
\item \textbf{write locks} are usually released after commit or rollback \textit{(long duration)}
\item \textbf{read locks} are usually implemented with mixed policies \textit{(both short and long duration)}
\end{itemize}
\item Long-duration read locks are costly in terms of performance and real systems replace them with more complex mechanisms
\end{itemize}
\subsubsection{Use of long duration write locks}
Consider the following schedule \textit{(admissible if write locks are of short duration)} of two transactions \(T_1, T_2\) without the hypothesis that aborts are not possible:
\begin{enumerate}
\item \(w_1(x), \ldots, w_2(x), \ldots, \left( \left( c_1 \lor a_1 \right) \land \left( c_2 \lor a_2 \right) \text{ in any order }\right)\)
\begin{itemize}[label=\(\rightarrow\)]
\item \(T_2\) is allowed to write over the same object updated by \(T_1\) which has not yet completed
\end{itemize}
\item Transaction \(T_1\) aborts
\begin{itemize}
\item the schedule looks like \(w_1(x), \ldots, w_2(x), \ldots, a_1, \left( c_2 \lor a_2 \right)\)
\item there are \(2\) ways to process \(a_1\):
\begin{enumerate}[label=\arabic*.]
\item if \(x\) is restored to the state before \(T_1\), then \(T_2\)'s update is lost; if it commits, \(x\) has a stale value
\item if \(x\) is not restored and \(T_2\) aborts, then its previous state cannot be reinstalled
\end{enumerate}
\end{itemize}
\end{enumerate}
To solve this problem, write locks are held until the completion of the transaction to enable proper processing of aborts.
The anomalies of the above non commit-projection schedule are called \textbf{dirty write} \textit{(or dirty write anomaly)}.
\subsubsection{Isolation levels in \texttt{SQL}}
The \texttt{SQL} standard defines transaction isolation levels which specify the anomalies that should be prevented by the \dbms;
however, the level does not affect write locks.
A transaction is always able to get \textit{(and hold)} an exclusive lock on any data it modifies until its commit point, regardless of the isolation level.
For read operations, levels define the degree of protection against modifications made by other transactions.
Furthermore, different systems offer different guarantees about lost updates.
The different isolation levels are defined in Table~\ref{tab:sql-isolation-levels}.
\bigskip
\texttt{SQL} isolation levels may be implemented with the appropriate use of locks: Table~\ref{tab:sql-isolation-locks} shows the locks that are held by the \dbms for each level.
Normally, commercial systems use locks and timestamp-based concurrency control mechanisms.
\begin{table}[htbp]
\centering
\bigskip
\begin{tblr}{colspec={r|c|c|c}, row{1}={font=\itshape}, column{1}={font=\itshape}}
{error \(\rightarrow\) \\ isolation level \(\downarrow\)} & dirty read & non repeatable read & phantoms \\
\hline
read uncommitted & \colorcmark & \colorcmark & \colorcmark \\
read committed & \colorxmark & \colorcmark & \colorcmark \\
repeatable read & \colorxmark & \colorxmark & \colorcmark \\
snapshot & \colorxmark & \colorxmark & \colorxmark \\
serializable & \colorxmark & \colorxmark & \colorxmark \\
\end{tblr}
\caption{\texttt{SQL} isolation levels}
\label{tab:sql-isolation-levels}
\bigskip
\end{table}
\begin{table}[htbp]
\centering
\bigskip
\begin{tblr}{colspec={r|c|c|c|c|c}, cells={valign=m}, row{1}={font=\itshape}, cell{2-4}{2-6}={font=\itshape}}
& no isolation & read uncommitted & read committed & repeatable read & serializable \\
\hline
\texttt{READ} & none & none & RL & RL & FS \\
\texttt{WRITE} - \texttt{UPDATE} & none or RU & RU & RU & RU & FU or IX \\
\texttt{SELECT} & - & - & - & - & IX \\
\end{tblr}
\caption{\texttt{SQL} isolation levels and locks. Legend: \textit{RL} = record lock; \textit{FS} = file lock; \textit{RU} = record update lock; \textit{FU} = file update lock; \textit{IX} = intent lock}
\label{tab:sql-isolation-locks}
\end{table}
\bigskip
It's important to notice that \textbf{serializable transactions don't necessarily execute serially}:
the requirement is that transactions can only commit if the result would be as if they had been executed serially in any order.
The locking requirements to achieve this level of isolation can frequently lead to deadlocks where one of the transactions needs to be rolled back.
Because of this problem, the \textit{serializable} level is used sparingly and it's not the default in most \dbms.
\bigskip
The \texttt{SQL} code to set the isolation level is shown in Code~\ref{lst:sql-set-isolation-level}.
\begin{minipage}{\textwidth}
\bigskip
\begin{lstlisting}[language=SQL, caption={\texttt{SQL} statement to set the isolation level of a transaction}, label={lst:sql-set-isolation-level}]
SET TRANSACTION ISOLATION LEVEL <isolation level>;
<set transaction statement> ::=
SET [LOCAL] TRANSACTION <transaction characteristics>
<transaction characteristics> ::=
[ <transaction mode> [{, <transaction mode>}...]]
<transaction mode> ::=
<isolation level> | <transaction access mode> | <diagnostics size>
<transaction access mode> ::=
READ WRITE | READ ONLY
<isolation level> ::=
ISOLATION LEVEL <isolation level name>
<isolation level name> ::=
READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE
\end{lstlisting}
\end{minipage}
\subsection{Deadlocks}
\label{sec:deadlocks}
A \textbf{deadlock} is a situation where two or more transactions are waiting for each other to complete.
The presence of a deadlock in a set of transactions can be shown via:
\begin{itemize}
\item \textbf{Lock graphs}: a bipartite graph in which nodes are resources or transactions and arcs represent lock assignments or requests
\item \textbf{Wait-for graphs}: a directed graph in which nodes are transactions and arcs represent requests for locks
\end{itemize}
In both cases, a cycle in the graph represents a deadlock.
Representation of a deadlock via a lock graph and a wait-for graph is shown respectively in Figure~\ref{subfig:lock-graph} and Figure~\ref{subfig:wait-for-graph}.
\begin{figure}[htbp]
\centering
\bigskip
\begin{subfigure}[b]{0.495\textwidth}
\centering
\tikzfig{figure-8.tikz}
\caption{Lock graph}
\label{subfig:lock-graph}
\end{subfigure}
\bigskip
\begin{subfigure}[b]{0.495\textwidth}
\centering
\tikzfig{figure-9.tikz}
\caption{Wait for graph}
\label{subfig:wait-for-graph}
\end{subfigure}
\bigskip
\caption{Representations of a deadlock}
\label{fig:deadlock-representations}
\end{figure}
\bigskip
A deadlock can be created in the following way:
\begin{minipage}{0.496\textwidth}
\bigskip
\begin{itemize}
\item \(T_1: \ r_1(x), w_1(x), c_1\)
\item \(T_2: \ r_2(y) w_2(x), c_2\)
\item[\(\Rightarrow\)] \(T_1\) is waiting for \(T_2\) to release the lock on \(x\)
\item[\(\Rightarrow\)] a deadlock occurs
\end{itemize}
\end{minipage}
\begin{minipage}{0.496\textwidth}
\centering
\tikzfig{figure-8.tikz}
\end{minipage}
\subsection{Deadlock Resolution}
Many different techniques exist to solve deadlocks, such as:
\begin{itemize}
\item \textbf{Timeout}
\item Deadlock \textbf{prevention}
\item Deadlock \textbf{detection}
\end{itemize}
The deadlock resolution techniques will be discussed in more detail in the next Sections.
\subsubsection{Timeout}
A transaction is killed and restarted if it has been \textbf{waiting for a lock for a certain amount of time}, as it's assumed that the wait it's caused by a deadlock.
Determining the maximum time interval is a difficult task, as it can vary greatly;
as such, it must be determined for each system and sometimes can be altered by the database administrator.
The timeout value must be chosen carefully, keeping in mind the following tradeoff:
\begin{itemize}
\item an \textbf{high value} can lead to \textbf{long delays} whenever a deadlock occurs
\item a \textbf{low value} can lead to \textbf{frequent and unneeded restarts} of transactions
\end{itemize}
This is the most used technique to solve deadlocks, as it's the simplest to implement and it's the one that requires the least amount of resources.
\subsubsection{Deadlock Prevention}
The deadlock prevention techniques work by \textbf{killing transactions} that are likely to cause a deadlock.
This solution is implemented mainly in \(2\) ways:
\begin{itemize}
\item \textbf{Resource-based} prevention
\begin{itemize}
\item introduces \textbf{restrictions on lock requests}
\item resources are globally sorted and must be \textbf{locked in a specific order}
\item since not all transactions know beforehand which resources they will need, \textbf{this technique is not very effective}
\end{itemize}
\item \textbf{Transaction-based} prevention
\begin{itemize}
\item introduces \textbf{restrictions} based on the transactions themselves
\item to each transaction an ID is assigned incrementally, \textbf{giving the transaction a priority}
\begin{itemize}
\item the ID assignment can be implemented via \textbf{timestamps}
\end{itemize}
\item \inlinequote{older} transaction don't have to wait for \inlinequote{younger} transactions
\item there are two ways of determining which transaction to kill:
\begin{itemize}
\item \textit{preemptive} - the holding transaction is killed \textit{(wound-wait)}
\item \textit{non-preemptive} - the requesting transaction is killed \textit{(wait-die)}
\end{itemize}
\item the problem with this technique is that \textbf{too many transactions get killed}
\end{itemize}
\end{itemize}
\subsubsection{Deadlock Detection}
This technique requires \textbf{controlling the contents of the lock tables} to reveal possible concurrency issues.
The discovery of a deadlock requires the analysis of the \textit{wait-for graph}, determining if it contains a cycle;
to do so, the \textbf{Obermarck algorithm} is used.
Assumptions of the algorithm:
\begin{itemize}
\item transactions execute on a \textbf{single main node} \textit{(one locus of control)}
\item transactions may be \textbf{decomposed in sub-transactions} running on other nodes
\item when a transaction spawns a sub-transaction, it \textbf{waits} for the latter to complete
\item two kinds of wait-for relationships exist:
\begin{itemize}
\item \(T_i\) waits for \(T_j\) to release a resource on the same node
\item a sub-transaction of \(T_i\) waits for another sub transaction of \(T_i\) running on a different node
\end{itemize}
\end{itemize}
This is proven to be a \textbf{feasible solution} and it's implemented by most \dbms.
\paragraph{Obermarck Algorithm}
\subparagraph*{Goal}
\textbf{Detection of a potential deadlock} by looking only at the local view of a node.
\subparagraph*{Method}
Establishment of a \textbf{communication protocol} whereby each node has a local projection of the global dependencies.
Nodes exchange information and update their local graph based on the received information;
communication has to be further optimized to avoid situations in which multiple nodes detect the same \textit{(potential)} deadlock.
\bigskip
Node \texttt{A} \textbf{sends} its local info to a node \texttt{B} if:
\begin{itemize}
\item \texttt{A} \textbf{contains a transaction} \(T_i\) that is waited from another remote transaction and waits for a transaction \(T_j\) active on \texttt{B}
\item \(i > j\) \textit{(the transaction with the highest ID is the oldest)}, ensuring a \textbf{message forwarding along a node path} where node \texttt{A} precedes node \texttt{B}
\end{itemize}
\bigskip
The algorithm runs periodically at each node and consists of \(4\) steps:
\begin{enumerate}
\item \textbf{get graph info} \textit{(wait for dependencies among transactions and external calls)} from the previous nodes
\begin{itemize}[label=\(\rightarrow\)]
\item sequences contain only node and top-level transaction identifiers
\end{itemize}
\item \textbf{update the local graph} by adding the received info
\item \textbf{check the existence of cycles} among transactions; if found, select one transaction in the cycle and kill it
\item \textbf{send updated graph} info to the next nodes
\end{enumerate}
The algorithm contains the \textbf{arbitrary} choice of:
\begin{itemize}
\item \textbf{sending} the message to the \textbf{following} node (\(i > j\))
\item \textbf{sending} the message to the \textbf{previous} node (\(i < j\))
\end{itemize}
Therefore \(4\) variants of the algorithm exist, each one with a different behaviour;
each of them sends messages in different orders, while every one of them can detect the same deadlocks.
\paragraph{Distributed Deadlock Detection}
To implement the aforementioned algorithm in the context of a distributed system, a \textbf{distributed dependency graph} is created:
external call nodes represent a sub-transaction activating another sub-transaction at a different node.
An illustration of such a graph is shown in Figure~\ref{fig:distributed-dep-graph}.
\begin{figure}[htbp]
\centering
\bigskip
\tikzfig{figure-10.tikz}
\caption{Distributed dependency graph}
\label{fig:distributed-dep-graph}