-
-
- Goal: Encrypt a Message
- A message can be viewed many ways:
- - a bunch of words
- - a string of characters
- - a string (or series) of bytes
- - a string of bits
- For the approach in our solution we'll view it as a string of bytes (i.e. 8-bit 'groups'). Ultimately, it doesn't matter what the file is actually meant to represent. If we treat the contents of the file as simply a string of bytes, we can manipulate the contents, regardless of it's meaning. We can mess with a file whose bytes represent characters just as easily as we can if the bytes represent instructions, or part of an mp3 file.
- Reading a file character by character amounts to the same operation as reading the file byte by byte. There is really no magic in characters - they are just bytes: bunches of 8-bits and once we have one, we are free to interpret those bits in a number of ways. (Question: what if we are working with unicode characters?)
-
- The field of cryptography, of which encription is a part, is an advanced toipic with a lot of math. We can only expose you to a superficial introduction here. There are numerous approaches to encrypting a file. What follows is a few of the simpler approaches.
- Reverse, Switch, Exchange
- One approach is to reverse, switch, or exchange parts of a byte, entire bytes or even sets of bytes. If we are considering simply switching parts of one byte with other parts of that same byte, we still have choices. For example, we could view that byte as a decimal unsigned integer. Then, we could mess with the byte (number) in such a way as to effect a switch of its representative digits, like so:
- eg.
- 'A' = 065 (ASCII) ==> 560
- 'z' = 122 ==> 221
- '0' = 048 ==> 840
- ' ' = 032 ==> 230
- Note: we are looking at the base 10 interpretations of the bit strings for each character. To produce the above results, we start with the base 10 ASCII value for the character, 'pull out' the digits of the number:
- char C = 'A'; /* for example */
-
- /* using the integer representation, get the 3rd digit
- there can't be more than three digits */
- first = (int)C / 100;
- /* get the second digit */
- second = ((int)C -(first*100)) / 10;
- /* get the third digit */
-
- third = (int)c % 10;
- /* 'build' them back up in reverse order */
- newval = third*100 + second*10 + first;
- If our goal is to keep the message size the same (so a file of size 937 bytes produces an encrypted file of the same size), the approach above WON'T WORK. The maximum value in 8 bits is 11111111 (binary) = 255 (base 10). the values 560 and 840 above won't fit into 8 bits.
- What if we look at our work in octal instead of decimal? (we are still dealing with the same bit patterns, we're just interpreting them differently)
-
- 'A' = 101 (ASCII) ==> 101 [convert to octal by dividing by 64, then 8, then 1]
- 'z' = 172 ==> 271
- '0' = 060 ==> 060
- ' ' = 040 ==> 040
- '7' = 067 ==> 760
- Still won't work because the last value (760) still requires 9 bits to represent and we only have 8.
- What about HEX?
- 'A' = 61 (ASCII) ==> 16 [convert to hex by dividing by 16, then 1]
- 'z' = 7A ==> A7
- '0' = 30 ==> 03
- ' ' = 20 ==> 02
- '7' = 37 ==> 73
- TA DA!! This will work.
- What we need to do is take our 8-bit character and treat it as a number:
- use unsigned char:
- unsigned char C;
- C = fgetc(source);
- We can manipulate the unsighed char as a number quite easily. We will "seperate" the two hex digits (i.e. get the high 4 bts and the low 4 bits of the byte seperately):
- high = (int)C / 16;
- low = (int)C % 16;
- and then put them back together in reverse order:
- newchar = (low * 16) + high;
- Alternative approaches:
- In any encryption approach we must remember that we will need to be able to DECODE the message. Anything we do must be applicable to ALL characters in the same way AND each character must "translate" to one unique encrypted code (if it's not unique, we will have no way of deciding which character it used to be).
- Changing the Bytes:
- Caesar Cypher
- Another possibility is to actually change the bytes so they no longer look like they did, but so we can still get the original back. A caesar cypher is a simple example.
- In its simplest form, a Caesar Cypher is an encryption scheme that uses a number N and replaces each character in the message with the character found N positions farther along in the alphabet. For example, if N = 1; then A becomes B, B becomes C, etc. If we are only using the alphabet, then when we wrap, Z becomes A. One variation uses the entire ASCII table (all 128 codes). To make it a little harder to 'crack', we can also use the character's position in the message to further modify N.
-
- For example, if N = 1, then the original message: "AAAAA" would be encoded as: "BCDEF".
Cyclic Encryption
- The encrypted form of a character c is c^key[i] (^ is the XOR operator) where key is a string passed as a command-line argument
- The program uses the characters in key in a cyclic manner until all the input has been read. If the length of the key is the same as the length of the entire message, it's very close to a "one-time pad". To be a proper one-time pad, the key must be truly random, only the encoder and decoder have copies of it, and it will only be used to encode a single message.
- Re-encrypting encoded text with the same key produces the original text.
- One Time Pad
- A one-time pad, sometimes called the Vernam cipher, uses a string of bits that is generated completely at random. The keystream is the same length as the plaintext message and the random string is combined using bitwise exclusive-or with the plaintext to produce the ciphertext. Since the entire keystream is random, an opponent with infinite computational resources can only guess the plaintext if he sees the ciphertext. Such a cipher is said to offer perfect secrecy and the analysis of the one-time pad is seen as one of the cornerstones of modern cryptography.
- While the one-time pad saw use during wartime, over diplomatic channels requiring exceptionally high security, the fact that the secret key (which can be used only once) is as long as the message introduces severe key-management problems. While perfectly secure, the one-time pad is impractical.
- Stream ciphers were developed as an approximation to the action of the one-time pad, and while contemporary stream ciphers are unable to provide the satisfying theoretical security of the one-time pad, they are at least practical.
- One Time Pad Encryption - 0.9.4. www.vidwest.com/otp/
- One-Time Pad Generator www.fourmilab.ch/onetime/otpjs.html
- One-Time Pad Generators www.fourmilab.ch/onetime/
- Why Are One-Time Pads Perfectly Secure? http://world.std.com/~franl/crypto/one-time-pad.html
- ONE-TIME PAD CRYPTOGRAPHY www.contestcen.com/crypt005.htm
- One Time Pad www.cypherspace.org/~adam/rsa/otp.html
- One-time pad - Wikipedia www.wikipedia.org/wiki/One-time_pad
- One-Time-Pad Frequently Asked Questions www.ranum.com/pubs/otpfaq/
- Testing & Verification
- We can verify our manipulation is working the way we want by echoing our output and input in hex form:
- printf("[%c] %02X",c,c);
- Since we don't want the I/O echoed all the time we want to be able to turn echo on or off without having to re-compile the program. This can be done with a simple command line debug switch: '-d'. If included in the command line, it will cause all characters to be echoed before and after 'conversion'; if this option is absent, there will be no echo.
- ./encrypt -d orig.txt new.txt