This repository shares my learning materials from studying Concurrency in Go,
written by Katherine Cox-Buday. The book is excellent, and using this
repository alongside studying the book can add some extra benefit.
The book in question can be purchased to support Katherine via O'Reilly
at Concurrency in Go
- 🔎 Introduction Materials
- 👀 Caveats
- ⛺ Race Conditions
- ⛺ Atomicity
- ⛺ Deadlocking & Starvation
- ⛺ Communicating Sequential Processes
- ⛺ Concurrency Building Blocks
- ⛺ Synchronisation Primitives
- ⛺ Pipelining
- ⛺ Patterns
The following materials are critical to understand when learning about concurrency in Go:
- Placeholder
- Placeholder
- Placeholder
Important
Please read Go Memory Model
Understanding the Go memory model is important when dealing with concurrency: it explains why certain things may cause subtle, head-scratching bugs. Let's take one example:
```go
package main

var a, b int

// f writes to a and b with no synchronisation.
func f() {
	a = 1
	b = 2
}

// g reads b and a with no synchronisation.
func g() {
	print(b)
	print(a)
}

func main() {
	go f()
	g()
}
```
Seems relatively straightforward? It is entirely possible here that, from the reading perspective
of one goroutine (call it goroutine 2), the writes made in another goroutine (goroutine 1, which
invoked `f()` asynchronously) are not visible at all, OR the write assigning `b` is visible but the
write assigning `a` is not! This code can print `20`, where `b` is `2` and `a` is `0`. You may be
very confused by that, as `b` was assigned after `a`; however, with no synchronisation in place
the compiler is free to reorder those lines, and the processor may reorder them again at runtime.
Always use synchronisation primitives when required to ensure correctness.
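As a hedged sketch of one possible fix (my own, not taken from the book), a channel can be used to establish a happens-before relationship between the writes in `f` and the reads in `g`:

```go
package main

var a, b int

// f writes to a and b, then closes done. Per the Go memory model,
// the close happens-after both writes.
func f(done chan<- struct{}) {
	a = 1
	b = 2
	close(done)
}

// g blocks on the receive, which happens-after the close, so both
// writes are guaranteed to be visible here.
func g(done <-chan struct{}) {
	<-done
	print(b)
	print(a) // always prints 2 then 1
}

func main() {
	done := make(chan struct{})
	go f(done)
	g(done)
}
```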
A race condition
occurs when code has a naive expectation of execution order: the developer
expects the code to execute in the order it is written.
These kinds of bugs can often be hard(er) to debug and can lie hidden until
things are scaled up.
Caution
Attempting to manually force goroutine scheduling / context switching is considered an anti-pattern and should be strongly avoided.
Race Conditions: A Basic Introduction
Race Conditions: Causing a context switch
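The linked examples cover this in depth; as a minimal hedged sketch of such a naive expectation on execution order (the classic data race), consider:

```go
package main

import "fmt"

// The main goroutine may read data before, after, or while the spawned
// goroutine writes it. Running with `go run -race` flags this as a data race.
func main() {
	var data int
	go func() {
		data++ // write in the spawned goroutine
	}()
	if data == 0 { // racy read in the main goroutine
		fmt.Printf("the value is %v\n", data) // may print 0 or 1, or not run at all
	}
}
```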
Atomicity
is the concept that something is indivisible or uninterruptible within
a particular context
. Context is very important here: something that is
atomic
within your process (such as an atomic add leveraging the CPU's atomic instructions)
is not atomic
in the context of the operating system.
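As a brief hedged sketch of such a process-level atomic operation, Go's sync/atomic package can be used like so:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomic.AddInt64 is indivisible within the context of this process,
// so concurrent increments are never lost.
func main() {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1)
		}()
	}
	wg.Wait()
	fmt.Println(counter) // always 1000
}
```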
Performance
plays a vital part in managing the parallelism of code; when using
various primitives to guard against race conditions, a performance penalty must be
considered.
Note
Having an 'opt-in' convention for using an API that requires users to remember to guard the critical sections is error prone, try and build this into your APIs and have function docstrings articulate when this is (or isn't) the case
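As a hedged sketch of what 'building it in' might look like (the type and method names here are illustrative, not from the book), the type below guards its own critical section so callers cannot forget to lock:

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter guards its own critical section internally; callers never
// need to remember to take the lock themselves.
type SafeCounter struct {
	mu sync.Mutex
	n  int
}

// Inc is safe for concurrent use by multiple goroutines.
func (c *SafeCounter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value is safe for concurrent use by multiple goroutines.
func (c *SafeCounter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	var c SafeCounter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.Inc() }()
	}
	wg.Wait()
	fmt.Println(c.Value()) // always 100
}
```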
Atomicity: A Basic Introduction
At a basic level, ensuring atomicity by locking critical sections is not the be-all and end-all. Even with correct locking in place you can still run into other problems, such as multiple goroutines blocking on locks. Go's mutexes are not re-entrant by default, so a goroutine re-acquiring a lock it already holds is another case that needs to be considered.
This section covers cases of deadlocking, livelocking and starvation.
In order to understand where deadlocking can occur, there are a few conditions we can
evaluate; these are known as the Coffman Conditions:

- Mutual Exclusion: A concurrent process holds exclusive rights to a resource at any time.
- Wait-For Condition: A concurrent process must simultaneously hold a resource and wait for another.
- No Preemption: A resource held by a concurrent process can only be released by that process itself.
- Circular Wait: A concurrent process (P1) must be waiting on a chain of other concurrent processes (P2, ..., PN), which are in turn waiting on it (P1).
Note
Preventing even one of the four conditions above can help prevent deadlocking!
Tip
Try to limit the scope of locking to critical sections to start, rather than locking broadly (see the starvation example). It is much easier to widen locking later than to reduce it.
Go is modelled (but not entirely) on the ideas of Tony Hoare's
CSP
(Communicating Sequential Processes).
Note
Don't communicate by sharing memory, share memory by communicating!
Note
Go channels/select are powerful CSP primitives; Go still offers traditional mutexes etc. via the sync package.
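As a small hedged sketch of that philosophy, ownership of a value is handed over a channel rather than guarded by a lock:

```go
package main

import "fmt"

// Instead of both goroutines locking a shared variable, the producer
// hands its results to the consumer over a channel.
func main() {
	results := make(chan int)
	go func() {
		defer close(results)
		for i := 0; i < 3; i++ {
			results <- i * i // share memory by communicating
		}
	}()
	for r := range results { // only this goroutine ever touches r
		fmt.Println(r)
	}
}
```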
The goroutine
is the core building block behind Go's excellent concurrency model. A goroutine is a function that
is running concurrently
(maybe in parallel).
Note
A goroutine is not guaranteed to be running in parallel; you may have a single-core machine!
The goroutine
in Go is NOT an OS thread, and it is also not considered a green
thread (a thread
managed by the language's runtime). Goroutines are in fact a higher level of abstraction known as coroutines
.
A coroutine
is simply a concurrent subroutine (think functions, methods, closures etc.) that is not
preemptive
(it cannot be interrupted). Instead, there are various points in its lifecycle where
it can be suspended or re-entered.
One of the beauties of Go is that goroutines are heavily managed by the Go runtime and abstracted away from the user. The Go runtime can inspect goroutines at runtime, detect operations that would block (or unblock), and suspend or re-enter goroutines accordingly.
Go's mechanism for hosting goroutines is an implementation of what's called an M:N scheduler
: it maps M
green threads onto N
operating system threads, and goroutines are scheduled onto the green threads. When more goroutines are spawned than there
are green threads available, the Go scheduler handles distributing the goroutines across the available
threads, ensuring that when some goroutines block, others can still run.
Note
More on the Go scheduler later; see the topic -> The Go Runtime Scheduler.
Note
Go uses the fork-join model of concurrency.
Caution
time.Sleep does NOT create a join point; avoid attempting to use it as one.
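A hedged sketch of a proper join point under the fork-join model (the fork is the `go` statement; the join is `Wait`), as opposed to sleeping and hoping:

```go
package main

import (
	"fmt"
	"sync"
)

// The go statement forks; wg.Wait() is the join point. time.Sleep
// gives no such guarantee that the child has finished.
func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // fork
		defer wg.Done()
		fmt.Println("child goroutine ran")
	}()
	wg.Wait() // join: main cannot proceed until the child is done
	fmt.Println("joined")
}
```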
Goroutines
operate in the same address space in which they were created. This means they can
access and read variables in their enclosing scope when running closures.
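For example, here is a minimal sketch (my own, not the repository's linked example) of a goroutine closing over a variable from its enclosing scope:

```go
package main

import (
	"fmt"
	"sync"
)

// The goroutine's closure reads and writes salutation, a variable
// declared in the enclosing (main) scope.
func main() {
	var wg sync.WaitGroup
	salutation := "hello"
	wg.Add(1)
	go func() {
		defer wg.Done()
		salutation = "welcome" // mutates the outer variable directly
	}()
	wg.Wait()
	fmt.Println(salutation) // prints "welcome"
}
```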
Go's compiler takes care of keeping captured variables in scope, so goroutines never access memory that has already been garbage collected.
Goroutines
are extremely efficient: a fresh goroutine is given only a few kilobytes (2KB at
the time of writing), which in most cases is enough for the task. When it isn't, the
Go runtime grows (and shrinks) the memory backing the goroutine's stack automatically.
This allows Go programs to use very little memory in comparison, and it is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were threads, the system would be completely overloaded much faster.
In the example above, we created 10k goroutines on our system, a simple 64-bit CPU with 32GB
of memory. The 10k goroutines took up roughly 2.59KB
per goroutine. Without using swap space
on this system, it would in theory be possible (if their stacks didn't need to grow, of course) to
fit 32GB / 2.59KB ≈ 12,350,000
(yes, 12.3 million) goroutines in RAM without
using swap space!
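A hedged sketch of how such a per-goroutine figure might be measured (my own approximation, not necessarily the repository's exact benchmark):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// memConsumed samples memory obtained from the OS after forcing a GC.
func memConsumed() uint64 {
	runtime.GC()
	var s runtime.MemStats
	runtime.ReadMemStats(&s)
	return s.Sys
}

// Compare memory before and after spawning goroutines that block
// forever, then divide by the goroutine count.
func main() {
	const numGoroutines = 10_000
	var wg sync.WaitGroup
	block := make(chan struct{}) // never closed: keeps every goroutine alive
	before := memConsumed()
	wg.Add(numGoroutines)
	for i := 0; i < numGoroutines; i++ {
		go func() {
			wg.Done()
			<-block
		}()
	}
	wg.Wait() // every goroutine is now running (and blocked)
	after := memConsumed()
	fmt.Printf("%.3fKB per goroutine\n", float64(after-before)/numGoroutines/1000)
}
```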
Caution
Because you can, doesn't mean you should! Switching between this many routines will have a heavy penalty!
In the world of OS threads, context switching can be pretty costly. That's not to say suspending and re-entering coroutines cannot also hurt goroutines, but it is another win for Go in that it is far less costly.
Here we can see a performance improvement of over 90%
compared to switching OS threads on my machine on Linux:
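The repository links the actual benchmark; as a hedged sketch of how the goroutine side of such a measurement might look (my own code), two goroutines ping-pong over an unbuffered channel so that each send/receive pair forces a context switch:

```go
package main

import (
	"sync"
	"testing"
)

// Place this in a file ending in _test.go and run: go test -bench=.
// ns/op approximates the cost of a goroutine context switch, since
// each send/receive on the unbuffered channel forces the scheduler
// to switch between the two goroutines.
func BenchmarkContextSwitch(b *testing.B) {
	var wg sync.WaitGroup
	begin := make(chan struct{})
	c := make(chan struct{})

	var token struct{}
	sender := func() {
		defer wg.Done()
		<-begin // don't start until timing begins
		for i := 0; i < b.N; i++ {
			c <- token
		}
	}
	receiver := func() {
		defer wg.Done()
		<-begin
		for i := 0; i < b.N; i++ {
			<-c
		}
	}

	wg.Add(2)
	go sender()
	go receiver()
	b.ResetTimer()
	close(begin)
	wg.Wait()
}
```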
A WaitGroup is an atomic counter that allows waiting for a collection of goroutines to finish. This is useful only if you do not care about the goroutines' return values, or if you have another mechanism for collecting them:
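A minimal hedged sketch of `sync.WaitGroup` usage:

```go
package main

import (
	"fmt"
	"sync"
)

// Add increments the counter before each fork, Done decrements it,
// and Wait blocks until the counter hits zero.
func main() {
	var wg sync.WaitGroup
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fmt.Printf("worker %d finished\n", id)
		}(i)
	}
	wg.Wait() // blocks until all three workers have called Done
	fmt.Println("all workers finished")
}
```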
Another synchronisation primitive, most familiar to those coming from languages that handle synchronisation outside of CSP.
A `sync.Cond`
is a mechanism for creating a rendezvous point for multiple goroutines that can be alerted/awoken
by a particular event
. By event, we essentially mean a signal of some sort that it is 'ok to proceed'.
The `sync.Cond`
type can signal a single goroutine (of potentially all the ones waiting) or tell them
all that it is ok to proceed.
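A hedged sketch of such a rendezvous (illustrative only; `Signal` wakes one waiter, `Broadcast` would wake them all):

```go
package main

import (
	"fmt"
	"sync"
)

// A waiter blocks on c.Wait() until the condition (queue non-empty)
// holds; the main goroutine signals once it has enqueued an item.
func main() {
	c := sync.NewCond(&sync.Mutex{})
	queue := make([]int, 0, 10)

	removeFromQueue := func() {
		c.L.Lock()
		for len(queue) == 0 {
			c.Wait() // releases the lock while waiting, re-acquires on wake
		}
		queue = queue[1:]
		fmt.Println("removed from queue")
		c.L.Unlock()
	}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		removeFromQueue()
	}()

	c.L.Lock()
	queue = append(queue, 1)
	c.Signal() // wake a single waiting goroutine
	c.L.Unlock()
	wg.Wait()
}
```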
The `sync`
package offers primarily low-level synchronisation primitives.
Some of the common usages of the `sync`
package are `WaitGroup`
, various
flavours of Mutex (`Locker`)
types and the `Once`
variants, as well as other
low-level primitives.
While pipelining is considered just another pattern, I feel it warrants an individual
topic of its own. A pipeline is a concurrent program that has multiple sequential
stages which are parallelised internally. Both a basic and a complex example of pipelining
exist in:
Typically within a pipeline, the first and last stages each have a single channel endpoint: the first stage only returns an outbound channel (e.g. generating numbers) and the last stage only consumes an inbound one (receiving the transformed results). Interim stages typically take an upstream inbound channel and yield their results to an outbound one.
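As a small hedged sketch of that shape (generator → transform → consume, with my own stage names, not the repository's):

```go
package main

import "fmt"

// generate is the first stage: it only returns an outbound channel.
func generate(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// square is an interim stage: upstream channel in, transformed results out.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * n
		}
	}()
	return out
}

func main() {
	// The final stage consumes the pipeline's single outbound channel.
	for result := range square(generate(1, 2, 3, 4)) {
		fmt.Println(result)
	}
}
```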
A collection of patterns with explanations can be found below:

| Pattern | Summary |
|---|---|
| 01 Basic Goroutine | A simple introduction to goroutines. |
| 02 Basic Channel | A simple introduction to channels. |
| 03 Generator | A Python-like generator. |
| 04 Fan In | Fan in multiple goroutines. |
| 05 Restore Sequence | Fan in multiple goroutines with equal yielding. |
| 06 Select Timeout | Cause a goroutine to terminate conditionally. |
| 07 Quit Signal | Cancel a goroutine with a channel send. |
| 08 Daisy Chain | A simulation of Chinese whispers with goroutines. |
| 09 Basic Pipeline | A simple mathematical example of pipelining. |
| 10 Advanced Pipeline | A smarter parallel pipeline. |