You're going to use frequency analysis to crack a Caesar cipher to recover the key and the plaintext.
These are methods of encryption where you take the plaintext (the unencrypted text) and encrypt it by substituting one letter for another to produce the ciphertext (the encrypted text).
For example, we might have the following mapping (which is the key for unlocking this cipher, not to be confused with a hash table key):
A -> H B -> Z C -> Y D -> W E -> O
F -> R G -> J H -> D I -> P J -> T
K -> I L -> G M -> L N -> C O -> E
P -> X Q -> K R -> U S -> N T -> F
U -> A V -> M W -> B X -> Q Y -> V
Z -> S
So if you have plaintext like HELLO, WORLD!
, use the above table and
H
becomes D
, E
becomes O
, and so on to produce ciphertext
DOGGE, BEUGW!
To decode, just do the reverse, D
becomes H
, etc.
But what if you evesdrop on some ciphertext, but don't know the key (the mapping). How can you decode it?
Turns out, letters occur in the English language with a known frequency.
The letter A
is 8.46% of all letters, for example.
(Disclaimer: these are not the actual frequencies in general english prose. They're contrived for this specific challenge so that you get a decent result, but they're quite close to the real percentages.)
Letter | Percentage |
---|---|
E | 11.53 |
T | 9.75 |
A | 8.46 |
O | 8.08 |
H | 7.71 |
N | 6.73 |
R | 6.29 |
I | 5.84 |
S | 5.56 |
D | 4.74 |
L | 3.92 |
W | 3.08 |
U | 2.59 |
G | 2.48 |
F | 2.42 |
B | 2.19 |
M | 2.18 |
Y | 2.02 |
C | 1.58 |
P | 1.08 |
K | 0.84 |
V | 0.59 |
Q | 0.17 |
J | 0.07 |
X | 0.07 |
Z | 0.03 |
In other words, ordered from most frequently used to least, the letters are:
'E', 'T', 'A', 'O', 'H', 'N', 'R', 'I', 'S', 'D', 'L', 'W', 'U',
'G', 'F', 'B', 'M', 'Y', 'C', 'P', 'K', 'V', 'Q', 'J', 'X', 'Z'
E
is the most frequent letter. Z
is the least frequent. And M
is
somewhere in the middle.
So if you have a large enough block of ciphertext, you can analyze the
frequency of letters in there. And if X
is the most frequent, then
it's a safe bet that the key includes this mapping:
E -> X
Write a program that automatically finds the key for the ciphertext in
the file ciphertext.txt
, then decodes it and shows
the plaintext.
(All non-letters should pass through the decoding as-is, i.e. spaces and punctuation should be preserved. The input will not contain any lowercase letters.)
No tests are provided for this one, but the result should be readable, with at most a handful of incorrect letters.