Cryptography Part 2 - Modular Mathematics - Security Series #16.1

OK, first, let's get this out of the way. I am not a math guy. Not even close. I enjoyed "Math for the Liberal Arts Major" back in my community college days, but I never even completed college algebra (though it is on my list of things to go back and do). So I am about to explain some math, but there is a GOOD chance that I am going to butcher this. If so, please correct me.

Now with my disclaimer out of the way, I can say, "MATH IS COOL"! I really do enjoy the little bit of math that I know and while researching cryptography, I came across a little more. In Cryptography: A very short Introduction I was introduced to modular arithmetic.

As the book states, "Modular arithmetic is only concerned with integers... If N is a positive integer, then arithmetic modulo N uses only the integers 0,1,2,3,... N-1". This means that for modulo N we are only using the numbers 0 through N-1. So in the case of the Caesar Cipher, which uses 26 characters (the English alphabet) we are only going to deal with the integers 0-25.

Now with a shift cipher, like Caesar, we should be able to shift the alphabet any number of spaces. We don't need to worry about shift left vs right, a shift of one to the left is the same as a shift of 25 to the right. So I will only be shifting in one direction.

So I wrote a function that will try all combinations of shift, from 1 through 25 (a shift of zero would always produce the plaintext, so not very good encryption). We'll look at the function in a few, but let's first look at why modulo math works for this.

Modulo Math and 0-25

So we only want to deal with numbers 0-25 which represent two things. The numbers first represent the letters A-Z, but then a single number 0-25 will represent the shift.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

So, as you can see in this table, A is represented by 0, Z is represented by 25 and each number in between is appropriately numbered in order.

Now we need to assign a shift. For simplicity sake, we do a shift of 1. This we can do in our head. So we will add one number to each value. Now A will be 1, B will be 2, C will be 3 and Z will be 26.

Wait! 26? I thought we only wanted to deal with 0-25.

That's what you were just thinking, right? We need Z to be represented by 0.

Now the hacky solution would be to say, "Well, I will just check to see what the value is and if it is 26, then I will make it 0. But that's not cool math, that is bad programming.

Let's look at cool math. We need to find a formula where we can apply the shift and have A end up equal to 1 and Z equal to 0. How can we do that?

MODULO!

Hopefully we are all familiar with the MOD operator in ColdFusion. We all use it when we need to apply an operation to every-other-something.


<cfif something MOD 2 EQ 0>...</cfif>

But now we are going to take MOD to another level.

So if A starts out at 0 and Z starts out at 25 and we apply the shift and then MOD 26 them:


<cfset shift = 1 />
<cfset A = (0 + shift) MOD 26 />
<cfset Z = (25 + shift) MOD 26 />

<cfoutput>A: #a#</cfoutput><br />
<cfoutput>Z: #z#</cfoutput>

Then we end up with output of:


A: 1
Z: 0

Perfect!

Now why did this work?

The MOD operator gives us the remainder of a division performed on the two numbers. So if we look at simple examples, 10 mod 5 = 0 and 5 mod 2 = 1. This is because 5 goes into ten evenly with no remainder, so the MOD result is 0, and 2 goes into 5 twice (4) and leaves a remainder of 1.

Now in our A example, (0 + 1) MOD 26 is the same as 1 MOD 26. In modulo math, when the divisor is a higher number than the dividend, then the remainder is equal to the dividend. So 1 MOD 26 = 1 and 2 MOD 26 = 2.

So now, you might be wondering "Well is 1 MOD 26 = 1, then why not just say 1 = 1"? The point here, remember, is that when we get up to Z (25) and apply the shift to it, that we end up with 26. We don't want 26, we want 0. Well, it turns out, as we already saw that 26 MOD 26 = 0. Because 26 goes into 26 evenly with no remainder. Sweet!

UPDATE: I should have pointed out that even in this simple situation of the Caesar cipher, we still have to worry about numbers much higher than even 26. If the shift is, for example 23 instead of just one then when we apply the shift A will equal 22 and X will equal 46, Y will be 47 and Z will be 48. But as you will see here, the math still works.


<cfset shift = 23 />
<cfset A = (0 + shift) MOD 26 />
<cfset B = (1 + shift) MOD 26 />
<cfset X = (23 + shift) MOD 26 />
<cfset Y = (24 + shift) MOD 26 />
<cfset Z = (25 + shift) MOD 26 />

<cfoutput>A: #a#</cfoutput><br />
<cfoutput>B: #b#</cfoutput><br />
<cfoutput>X: #x#</cfoutput><br />
<cfoutput>Y: #y#</cfoutput><br />
<cfoutput>Z: #z#</cfoutput>

Output:

A: 23
B: 24
X: 20
Y: 21
Z: 22

So with one algorithm we can apply our shift and get the result we need without having to do any checks to see if X is greater than 25 or less than 0, etc.

This may seem like a lot of work for the Caesar cipher, but this math can be applied to much more complicated encryption and decryption algorithms. Math that is well beyond my understanding.

My Caesar Brute Force Decryption Function

As part of my research I mad a simple brute force decryption function for shift ciphers. This will run the ciphertext through all 25 possible shifts and output the plain text. I can then look through each decryption and hopefully spot the right one.



<cffunction name="rot">
    <cfargument name="inputString" />
    <cfargument name="rotation" required="false" type="numeric" default="3" />

    <!--- Upper case the characters so we are only dealign with ASCII 65-90 --->
    <cfset var input = ucase(arguments.inputString) />
    <cfset var return = "" />
    <cfset var currentChar = "" />
    <cfset var cipherASCII = "" />
    <cfset var cipherVal = "" />

    <cfloop from="1" to="#len(input)#" index="listIndex">
        <!--- Get the current char --->
        <cfset currentChar = mid(input, listIndex, 1) />

        <!--- If it is a space, skip it --->
        <cfif currentChar EQ " ">
            <cfset return = return & currentChar />
            <cfcontinue />
        </cfif>

        <!--- Get the ASCII value and subtract 65 so that we are only dealing with 0-25--->
        <cfset current = (asc(currentChar) - 65) />

        <!--- If it is not A-Z, then we are not going to do anything with it --->
        <cfif current GTE 0 AND current LTE 25>

            <!--- Perform the modulo math after applying the shift --->
            <cfset cipherASCII = (current + arguments.rotation) mod 26 />

            <!--- Add 65 to get back to the appropriate ASCII val --->
            <cfset cipherVal = chr(cipherASCII + 65) />

            <!--- Append to return variable --->
            <cfset return = return & cipherVal />
        </cfif>
    </cfloop>

    <!--- Return --->
    <cfreturn return />
</cffunction>

<!--- Output the ciphertext to the screen and assign it to a variabel to work with --->
<cfset ciphertext="WKH SHDUO LV LQ WKH ULYHU" />
<cfoutput><strong>#ciphertext#</strong></cfoutput>

<br /><br />

<!--- Loop over shift 1-25 and send through the rotation --->
<cfoutput>
<cfloop from="1" to="25" index="shift">
#rot(ciphertext, shift)#<br />
</cfloop>
</cfoutput>

I will grant that this is a lot of code and could be done in much less, as Andy showed us in my previous post. But with mine, I wanted to ensure that we were working with 0-25, not 65-90 (ASCII values) and I wanted to make sure that we were using modulo math so that we only ever had results of 0-25, not results which had ASCII values ranging above 90 and below 65. Andy's function, in the long run, works just as well as mine, but I wanted to do cool math.

Comments
Barney's Gravatar I always think about clocks (which are, of course, modulo 12). "9 + 4 % 12 == 1" translates very gracefully into the English prose "starting at 9, add 4 hours on a analog clock and you'll get 1", which in turn, is easily understood by anyone old enough to tell time.

The biggest gotcha is that modulo arithmetic implies zero-based indexing. So doing MOD in CFML usually requires some extra +/-1 operations in the mix. In this particular case it's not needed (because ASCII is zero-indexed), but for CFML data structures it always is.
# Posted By Barney | 5/4/10 2:08 PM
Jason Dean's Gravatar Thanks for the comment Barney.

Actually the book I was learning from did use the clock as an example as well. Thanks for bringing that up.

I had not considered the zero-indexed issue. That's a great point.
# Posted By Jason Dean | 5/4/10 2:21 PM
BlogCFC was created by Raymond Camden. This blog is running version 5.9.1. Contact Blog Owner