### Table of Contents

# Rijndael

## Overview

Rijndael is a cipher created by Joan Daemen and Vincent Rijmen for the Advanced Encryption Standard competition. It operates on a 128-bit (4×4 byte) array of data (the state), and a 128-bit (4×4 byte), 192-bit (4×6 byte), or 256-bit (4×8 byte) key. ^{1)} Rijndael consists of two major routines: the key schedule, and the cipher itself. The cipher can be further broken down into four subroutines, called SubBytes, ShiftRows, MixColumns, and AddRoundKey. If the key size is 128 bits, the cipher is repeated 10 times (10 rounds). If the key size is 192 bits, there are 12 rounds, and for 256 bits, 14 rounds.

Before any rounds are done, the AddRoundKey step is performed. This is sometimes considered “Round 0”, but note that it does not count as a round. Then, during each round except the final, SubBytes, ShiftRows, MixColumns, and AddRoundKey are performed in that order. For the final round, the MixColumns step is omitted.

## Encryption

### SubBytes

The SubBytes step is rather simple. Each byte is replaced with its corresponding entry in a lookup table. In C, this can be performed using a simple array, and replacing a byte by using it as an index into the array. An example in C is below.

unsigned char sbox[256] = { 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76, 0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0, 0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15, 0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75, 0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84, 0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf, 0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8, 0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2, 0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73, 0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb, 0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79, 0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08, 0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a, 0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e, 0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf, 0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16 }; void subBytes(unsigned char *state) { int i = 16; // state is always 128 bit while(i--) state[i] = sbox[state[i]]; return; }

### ShiftRows

The ShiftRows step is also simple. Each row in the state is rotated x number of places to the right, x being its row number, counting from zero. Row 0 is therefore not rotated at all.

void shiftRows(unsigned char *state) { unsigned char temp; // Shifted by row number, row 0 is obviously not shifted temp = state[1]; state[1] = state[5]; state[5] = state[9]; state[9] = state[13]; state[13] = temp; // Rotate second row two columns to the the left temp = state[2]; state[2] = state[10]; state[10] = temp; temp = state[6]; state[6] = state[14]; state[14] = temp; // Rotate third row... temp = state[3]; state[3] = state[15]; state[15] = state[11]; state[11] = state[7]; state[7] = temp; return; }

### MixColumns

This is where it gets complex. Each column in the state is multiplied by a matrix. If that were all, it would be simple, but it's not. This matrix multiplication is performed inside a finite field. This changes the rules for multiplication. The steps to multiply two numbers (a and b) inside a finite field (2^8) follow. The variable p is set to zero at the start, and 0x1B is hexadecimal, in decimal it is 27.

Do this eight times:

- If lowest bit of b is set do p = p XOR a
- Keep track of whether highest bit of a is set
- Rotate a one bit to the left
- If a's highest bit was one PRIOR to the shift, do a = a XOR 0x1B
- Rotate b one bit to the right

An example in C follows:

unsigned char galoismul(unsigned char a, int b) { int i = 0; unsigned char p = 0, hibit; for(i = 0; i < 8; i++) // Do eight times { if((b & 1) == 1) // 1. If low bit of b is set p ^= a; // Do p XOR a hibit = (a & 0x80); // 2. Keep track of whether high bit of a is set a <<= 1; // 3. Rotate a one bit to the left if(hibit == 0x80) // 4. If a's high bit was one PRIOR to the shift a ^= 0x1B; // XOR a with 0x1B b >>= 1; // 5. Rotate b one bit to the right } return(p); // After 8 cycles, p is the answer }

Addition is also modified, but the algorithm is not as complex as that for multiplication. Instead of regular addition, XOR is done.

void mixColumns(unsigned char *state) { int i = 0; unsigned char a, b, c, d; for(; i < 16; i + =4) { a = state[i]; b = state[i + 1]; c = state[i + 2]; d = state[i + 3]; state[i] = galoismul(a, 2) ^ galoismul(b, 3) ^ galoismul(c, 1) ^ galoismul(d, 1); state[i + 1] = galoismul(a, 1) ^ galoismul(b, 2) ^ galoismul(c, 3) ^ galoismul(d, 1); state[i + 2] = galoismul(a, 1) ^ galoismul(b, 1) ^ galoismul(c, 2) ^ galoismul(d, 3); state[i + 3] = galoismul(a, 3) ^ galoismul(b, 1) ^ galoismul(c, 1) ^ galoismul(d, 2); } return; }

### AddRoundKey

In the final step, the state is combined with the round key with XOR.

void addRoundKey(unsigned char *state, int round) { int x = 0; int i = round * 16; while(x < 16) state[x++] ^= keys[i++]; return; }

The array keys is a global unsigned char array that holds the expanded key after the key schedule has been run. The key schedule will be covered later.

## Decryption

### SubBytes

This is just like the encryption version of SubBytes, except that the inverse table is used. A C implementation follows.

unsigned char invSbox[256] = { 0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB, 0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB, 0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E, 0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25, 0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92, 0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84, 0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06, 0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02, 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B, 0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA, 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73, 0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E, 0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B, 0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4, 0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F, 0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF, 0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0, 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61, 0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26, 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D }; void invSubBytes(unsigned char *state) { int i = 16; while(i--) state[i] = invSbox[state[i]]; return; }

### ShiftRows

The inverse ShiftRows step is fairly simple and obvious. Instead of rotating each row left by its row number, we shift it right by its row number.

void invShiftRows(unsigned char *state) { unsigned char temp; // Rotate first row 1 columns to right temp = state[13]; state[13] = state[9]; state[9] = state[5]; state[5] = state[1]; state[1] = temp; // Rotate second row 2 columns to right temp = state[2]; state[2] = state[10]; state[10] = temp; temp = state[6]; state[6] = state[14]; state[14] = temp; // Rotate third row 3 columns to right temp = state[3]; state[3] = state[7]; state[7] = state[11]; state[11] = state[15]; state[15] = temp; return; }

### MixColumns

The inverse of MixColumns isn't so simple. It's like MixColumns, except we're multiplying by a different matrix this time.

void invMixColumns(unsigned char *state) { int i = 0; unsigned char a, b, c, d; for(; i < 16; i += 4) { a = state[i]; b = state[i + 1]; c = state[i + 2]; d = state[i + 3]; state[i] = galoismul(a, 0x0E) ^ galoismul(b, 0x0B) ^ galoismul(c, 0x0D) ^ galoismul(d, 0x09); state[i + 1] = galoismul(a, 0x09) ^ galoismul(b, 0x0E) ^ galoismul(c, 0x0B) ^ galoismul(d, 0x0D); state[i + 2] = galoismul(a, 0x0D) ^ galoismul(b, 0x09) ^ galoismul(c, 0x0E) ^ galoismul(d, 0x0B); state[i + 3] = galoismul(a, 0x0B) ^ galoismul(b, 0x0D) ^ galoismul(c, 0x09) ^ galoismul(d, 0x0E); } return; }

### AddRoundKey

Finally, AddRoundKey is EXACTLY the same, since XOR is its own inverse, so this is the easiest bit.

## Key Schedule

The key schedule takes the key and expands it so that there is enough key for each round to be XOR'd with different values.

First, the actual key becomes round key 0. Then, the last column of the current round key is rotated, making the top value the new bottom value, everything else being shifted up by 1 place. Then, that column is run through the sbox, and after that, the top byte in the column is XOR'd with the first column of the current round key and the round constant, called rcon. The other three bytes are just XOR'd with the first column of the current round key. This column is the first column of the next round key. The second column is created by XORing the first column of the new round key with the second column of the current one, the third column by XORing the second of the new round key with the third column of the current one, and so on. Code implementing the entire key schedule is below. Note that keys is a pointer to an unsigned char, and a global variable.

unsigned char sbox[256] = { 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76, 0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0, 0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15, 0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75, 0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84, 0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf, 0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8, 0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2, 0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73, 0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb, 0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79, 0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08, 0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a, 0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e, 0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf, 0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16 }; unsigned char rcon(int i) { unsigned char rcon[52] = { 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab, 0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d, 0xfa, 0xef, 0xc5, 0x91, 0x39, 0x72, 0xe4, 0xd3, 0xbd, 0x61, 0xc2, 0x9f, 0x25, 0x4a, 0x94, 0x33, 0x66, 0xcc, 0x83, 0x1d, 0x3a, 0x74, 0xe8, 0xcb, 0x8d }; return(rcon[i % 51]); } void expandKeyCore(unsigned char *key, int bits, int roundnum) { int i; unsigned char temp[4], x; unsigned char *oldkey = (unsigned char *)malloc(sizeof(unsigned char) * (bits / 8) + 1); // back up the key, we'll need it memcpy(oldkey, key, (bits / 8)); // take the last column, RotWord() i = (bits / 8) - 4; x = key[i]; key[i] = key[i + 1]; key[i+1] = key[i + 2]; key[i+2] = key[i + 3]; key[i+3] = x; // run it through the sbox key[i] = sbox[key[i]]; key[i+1] = sbox[key[i + 1]]; key[i+2] = sbox[key[i + 2]]; key[i+3] = sbox[key[i + 3]]; // xor with first column and rcon, this is the first column of our expanded key key[0] = key[i] ^ oldkey[0] ^ rcon(roundnum); key[1] = key[i+1] ^ oldkey[1]; key[2] = key[i+2] ^ oldkey[2]; key[3] = key[i+3] ^ oldkey[3]; // take that, and xor it with the second column of the original key for the second column key[4] = oldkey[4] ^ key[0]; key[5] = oldkey[5] ^ key[1]; key[6] = oldkey[6] ^ key[2]; key[7] = oldkey[7] ^ key[3]; // take THAT, and xor it with the third column of the original for the third column key[8] = oldkey[8] ^ key[4]; key[9] = oldkey[9] ^ key[5]; key[10] = oldkey[10] ^ key[6]; key[11] = oldkey[11] ^ key[7]; // I think you get the idea key[12] = oldkey[12] ^ key[8]; key[13] = oldkey[13] ^ key[9]; key[14] = oldkey[14] ^ key[10]; key[15] = oldkey[15] ^ key[11]; if(bits == 256) { temp[0] = key[12]; temp[1] = key[13]; temp[2] = key[14]; temp[3] = key[15]; temp[0] = sbox[temp[0]]; temp[1] = sbox[temp[1]]; temp[2] = sbox[temp[2]]; temp[3] = sbox[temp[3]]; key[16] = temp[0] ^ oldkey[16]; key[17] = temp[1] ^ oldkey[17]; key[18] = temp[2] ^ oldkey[18]; key[19] = temp[3] ^ oldkey[19]; } if(bits == 192) { key[16] = key[12] ^ oldkey[16]; key[17] = key[13] ^ oldkey[17]; key[18] = key[14] ^ oldkey[18]; key[19] = key[15] ^ oldkey[19]; key[20] = key[16] ^ oldkey[20]; key[21] = key[17] ^ oldkey[21]; key[22] = key[18] ^ oldkey[22]; key[23] = key[19] ^ oldkey[23]; } else if(bits == 256) { key[20] = key[16] ^ oldkey[20]; key[21] = key[17] ^ oldkey[21]; key[22] = key[18] ^ oldkey[22]; key[23] = key[19] ^ oldkey[23]; key[24] = key[20] ^ oldkey[24]; key[25] = key[21] ^ oldkey[25]; key[26] = key[22] ^ oldkey[26]; key[27] = key[23] ^ oldkey[27]; key[28] = key[24] ^ oldkey[28]; key[29] = key[25] ^ oldkey[29]; key[30] = key[26] ^ oldkey[30]; key[31] = key[27] ^ oldkey[31]; } memset(oldkey, 0x00, (bits / 8)); free(oldkey); return; } void expandKey(unsigned char *key, int bits, int rounds) { int i = 1, round = 1; keys = (unsigned char *)malloc(sizeof(unsigned char) * ((bits / 8) * (rounds + 1))); memcpy(keys, key, (bits / 8)); while (i < rounds + 1) { expandKeyCore(key, bits, round++); memcpy(keys + (i++ * (bits / 8)), key, (bits / 8)); } return; }

^{1)}