Cryptography Part 3 - Frequency and Pattern Analysis - Security Series #16.2

In my first entry in this series I challenged you to do a little bit of cryptanalysis. This was, hopefully, a fun exercise to get you thinking about how cryptography works and how easily it can be broken if it is not implemented properly.

Of course the examples I am using in this series are extremely simple and I hope that no one would consider using any of them in production code. The point of showing you these things is partly for fun and partly to get you thinking about how cryptanalysts work on cracking more complicated algorithms using crazy mathematics and powerful computers.

Cracking the simple ciphers we've been looking at could probably be done in many of your heads. But I want to take a minute to look at the process a little more formally. So let's walk through the steps of cracking the Caesar Cipher using simple frequency and pattern analysis.

Cracking the Caesar Cipher - The Hard Way

Yes, this exercise is more work than it needs to be for solving a simple ROT cipher, but these are the basic skills of cryptanalysis and exercises like these are fun (for geeks).

I chose a random passage out of a research paper I wrote on cryptography a few months ago. I then encrypted it using the Caesar Cipher. Now if you really want to, you can just use the function I showed in my last post to decrypt this message, but I wanted to look at the process that we'd use if the algorithm were more complicated or if we were unfamiliar with the Caesar Cipher.

Here is the ciphertext of the message:


BAR BS GUR SRNGHERF BS GUR PNRFNE PVCURE BE INEVNGVBAF BS VG GUNG ZNXRF VG RNFVYL PBZCEBZVFRQ VF GUNG RNPU PUNENPGRE AB ZNGGRE UBJ ZNAL CBFVGVBAF VG VF FUVSGRQ VF PBAFVFGRAGYL ERCERFRAGRQ OL GUR FNZR PBEERFCBAQVAT PUNENPGRE VS GUR PVCUREGRKG PUNENPGRE SBE N VF A GURA VG VF NYJNLF A GUEBHTUBHG GUR RAPELCGRQ ZRFFNTR GUVF ZRNAF GUNG PELCGNANYLFVF PNA HFR PUNENPGRE SERDHRAPL NANYLFVF GB ZNXR THRFFRF NG JUVPU PVCUREGRKG PUNENPGREF PBEERFCBAQ GB JUVPU CYNVAGRKG PUNENPGREF

The most commonly used letters in the English language are E, T, O, A, N, I, R, S, and H, in that order. In the ciphertext above, the most common letters are R, G, N, F, E, U, P, V, and B.
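The letter tally described above can also be done mechanically. Here is a minimal Python sketch, not something from the original exercise; the function name letter_frequencies and the short sample string are my own choices for illustration, and you would paste in the full ciphertext to reproduce the ranking.

```python
from collections import Counter

def letter_frequencies(text):
    """Count A-Z letters in text, ignoring spaces and anything else."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters)

# Only the first few words of the ciphertext; use the whole message for a real ranking.
sample = "BAR BS GUR SRNGHERF BS GUR PNRFNE PVCURE"

# Print the five most frequent letters with their counts.
for letter, count in letter_frequencies(sample).most_common(5):
    print(letter, count)
```

On a sample this short the ranking is noisy, which is why frequency analysis works best on longer messages.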

The cryptanalyst could start by replacing the letter R in the ciphertext with the letter E, and the letter G with the letter T.


**E ** T*E *E*T**E* ** T*E **E*** ****E* ** *****T**** ** *T T**T ***E* *T E***** *********E* ** T**T E*** ******TE* ** **TTE* *** **** ****T**** *T ** ****TE* ** ******TE*T** *E**E*E*TE* ** T*E ***E ****E******** ******TE* ** T*E ****E*TE*T ******TE* *** * ** * T*E* *T ** ****** * T********T T*E E*****TE* *E****E T*** *E*** T**T ****T******** *** **E ******TE* **E**E*** ******** T* ***E **E**E* *T ***** ****E*TE*T ******TE** ****E***** T* ***** *****TE*T ******TE**

This looks like a reasonable start. Nothing stands out as inappropriate (like three T characters in a row). The cryptanalyst might then move on to replacing the letter N with the letter O, but first, she might notice that the third word is currently T*E. There is almost no doubt that it should be the word THE. In the ciphertext, the letter U is in that position. So the cryptanalyst might next replace the letter U with the letter H.

The cryptanalyst might also notice that the two-letter word *T appears several times. Looking at the ciphertext, four of those occurrences are VG (the fifth is NG), so those four are the same word. There are only two common English words that are two letters long and end with T: "it" and "at". It is more likely for the word "it" to appear four times in a paragraph than the word "at", so the cryptanalyst also decides to replace the letter V with the letter I.
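Spotting repeated short words can be sketched in code as well. This is only an illustration under my own assumptions: word_counts is a hypothetical helper, and it relies on the ciphertext keeping its word spacing, as the message in this post does.

```python
from collections import Counter

def word_counts(ciphertext, length):
    """Count how often each word of the given length occurs."""
    words = ciphertext.upper().split()
    return Counter(w for w in words if len(w) == length)

# A short excerpt from the ciphertext above.
sample = "BS VG GUNG ZNXRF VG RNFVYL"

# List the two-letter words, most frequent first.
print(word_counts(sample, 2).most_common())
```

Running it against the full message lists every two-letter ciphertext word with its count, which makes repeated words like VG easy to pick out.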


**E ** THE *E*T**E* ** THE **E*** *I*HE* ** ***I*TI*** ** IT TH*T ***E* IT E**I** *******I*E* I* TH*T E**H *H****TE* ** **TTE* H** **** ***ITI*** IT I* *HI*TE* I* ****I*TE*T** *E**E*E*TE* ** THE ***E ****E*****I** *H****TE* I* THE *I*HE*TE*T *H****TE* *** * I* * THE* IT I* ****** * TH****H**T THE E*****TE* *E****E THI* *E*** TH*T ****T******I* *** **E *H****TE* **E**E*** ******I* T* ***E **E**E* *T *HI*H *I*HE*TE*T *H****TE** ****E***** T* *HI*H ***I*TE*T *H****TE**

Finally, the cryptanalyst might notice two more two-letter words in the ciphertext that share a first letter, BS and BE, and assume that they are "of", "on" or "or". All are common two-letter words and all begin with O. Looking at the ciphertext, she sees that there are three instances of BS and one instance of BE. On a hunch, she decides that OF is a more common word than OR or ON and that OR is more common than ON, so she replaces B with O, S with F, and E with R.


O*E OF THE FE*T*RE* OF THE **E**R *I*HER OR **RI*TIO** OF IT TH*T ***E* IT E**I** *O**RO*I*E* I* TH*T E**H *H*R**TER *O **TTER HO* **** *O*ITIO** IT I* *HIFTE* I* *O**I*TE*T** RE*RE*E*TE* ** THE ***E *ORRE**O**I** *H*R**TER IF THE *I*HERTE*T *H*R**TER FOR * I* * THE* IT I* ****** * THRO**HO*T THE E**R**TE* *E****E THI* *E*** TH*T *R**T******I* *** **E *H*R**TER FRE**E*** ******I* TO ***E **E**E* *T *HI*H *I*HERTE*T *H*R**TER* *ORRE**O** TO *HI*H ***I*TE*T *H*R**TER*
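The starred views shown at each stage could be produced with a small helper like the one below. It is a sketch of my own, not the tool used to write this post; apply_guesses and the guesses dictionary are illustrative names, with the dictionary mapping each ciphertext letter to its guessed plaintext letter.

```python
def apply_guesses(ciphertext, guesses):
    """Replace guessed letters; show every unguessed letter as '*'."""
    out = []
    for ch in ciphertext.upper():
        if ch.isalpha():
            out.append(guesses.get(ch, "*"))
        else:
            out.append(ch)  # keep spaces so word lengths stay visible
    return "".join(out)

# The seven guesses made so far: ciphertext letter -> plaintext letter.
guesses = {"R": "E", "G": "T", "U": "H", "V": "I", "B": "O", "S": "F", "E": "R"}

print(apply_guesses("BAR BS GUR SRNGHERF", guesses))  # prints O*E OF THE FE*T*RE*
```

Adding one guess at a time to the dictionary and re-running makes it easy to back out a guess that produces something unlikely.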

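From here, the cryptanalyst can easily start seeing word patterns and fill in spaces until she solves the cipher. More likely, though, she will first notice that the substitutions are reciprocal: R stands for E and E stands for R, and N stands for A while A stands for N. In a shift cipher, a reciprocal pairing like that points to a 13-position shift (ROT13), which gives away the rest of the alphabet.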


ONE OF THE FEATURES OF THE CAESAR CIPHER OR VARIATIONS OF IT THAT MAKES IT EASILY COMPROMISED IS THAT EACH CHARACTER NO MATTER HOW MANY POSITIONS IT IS SHIFTED IS CONSISTENTLY REPRESENTED BY THE SAME CORRESPONDING CHARACTER IF THE CIPHERTEXT CHARACTER FOR A IS N THEN IT IS ALWAYS N THROUGHOUT THE ENCRYPTED MESSAGE THIS MEANS THAT CRYPTANALYSIS CAN USE CHARACTER FREQUENCY ANALYSIS TO MAKE GUESSES AT WHICH CIPHERTEXT CHARACTERS CORRESPOND TO WHICH PLAINTEXT CHARACTERS
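Because a Caesar cipher shifts every letter by the same amount, a handful of confirmed letter pairs is enough to check for a single consistent shift and then decrypt everything at once. The sketch below assumes uppercase A-Z text; shift_from_pair and caesar_decrypt are names I made up for this example and are not the function from my earlier post.

```python
def shift_from_pair(cipher_letter, plain_letter):
    """Return the shift (0-25) that turns plain_letter into cipher_letter."""
    return (ord(cipher_letter.upper()) - ord(plain_letter.upper())) % 26

def caesar_decrypt(ciphertext, shift):
    """Undo a Caesar shift on A-Z letters; leave other characters alone."""
    out = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

# (ciphertext letter, guessed plaintext letter) pairs from the walkthrough.
pairs = [("R", "E"), ("G", "T"), ("U", "H"), ("V", "I")]
shifts = {shift_from_pair(c, p) for c, p in pairs}
print(shifts)  # prints {13}: every pair agrees on the same shift

if len(shifts) == 1:
    print(caesar_decrypt("BAR BS GUR SRNGHERF", shifts.pop()))  # prints ONE OF THE FEATURES
```

If the pairs had disagreed on the shift, that would suggest either a wrong guess or a cipher that is not a simple shift.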

Conclusion

Like I said, this is more work than it needs to be. But I think it is a fun exercise. We were able to take a bit of gibberish and decrypt it by using English language letter frequency knowledge and by matching patterns in English language words. And we did it without the help of programming.
