Abstract:
Modern cryptography relies on the difficulty of certain problems arising in pure mathematics. I will describe how group theory and modular arithmetic give an easy to describe "one-way function" that can be used in a variety of ways, including for spam prevention and for creating a digital currency requiring no central authority. No background in group theory or cryptography will be assumed.
Outline:
$ \newcommand{\Z}{\mathbb{Z}} \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\Zp}{\Z/p} \newcommand{\Zpx}{\Zp^\times} $
Groups are mathematical structures that are used to encode symmetries as well as properties of arithmetic. Here is the formal definition.
Definition: A group is a set $G$ with a binary operation $m : G \times G \to G$ satisfying the conditions below, where we write $a b$ for $m(a,b)$:
When $G$ is a group, the identity $e$ and the inverses are all unique.
For cryptography, we're going to use some other groups.
Choose a prime number $p$, like $p = 7$. Write $\Zp$ for the set $\{0, 1, \ldots, p-1\}$.
This is a group under addition modulo $p$, where you take the remainder modulo $p$ after adding. E.g.
(4+5)%7 # 9 = 1*7 with 2 remainder
$0$ is the identity element:
(4+0)%7
Associativity holds because you can delay taking the remainder until the end:
print ((4+5)%7+6)%7, (4+5+6)%7, (4+(5+6)%7)%7
The negative (additive inverse) of 4 is 3:
(4+3)%7
In general, the negative of $k$ is $p-k$, since $k + (p-k) = p \equiv 0$ modulo $p$.
The group $\Zp$ is called the cyclic group of order $p$. It is called "cyclic" because if you start with $1$ and keep adding it to itself, you go through all the elements of the group and eventually end up where you started.
The group we really need is $\Zpx = \{1, 2, \ldots, p-1\}$ under multiplication modulo $p$. E.g.
(3*4)%7
$1$ is the identity:
print (1*3)%7, (3*1)%7
Associativity is like with addition. But inverses are less clear!
print (1*3)%7, (2*3)%7, (3*3)%7, (4*3)%7, (5*3)%7, (6*3)%7
So, in $\Zpx$, the (multiplicative) inverse of $3$ is $5$! That is, $3^{-1} = 5$.
But why does every number have an inverse? This follows from the Euclidean algorithm, using that $p$ is prime.
For us, the important tool in $\Zpx$ is going to be exponentiation:
for i in range(7): print '3 to the', i, 'is', (3**i)%7
It loops around, giving every element of $\Z/7^\times$!
In other words, $\Z/7^\times$ is also a cyclic group, and is isomorphic to the cyclic group $\Z/6$ using the correspondence sending $i$ in $\Z/6$ to $3^i$ in $\Z/7^\times$.
This happens for any prime $p$: you can always find an element $g$ such that \begin{align} \Z/(p-1) &\cong \Zpx \ i &\mapsto g^i \ \end{align} is an isomorphism. Such a $g$ is called a generator.
Switching gears to cryptography, a one-way function is a function $f : A \to B$ such that:
More precisely, $f(a)$ should be computable in polynomial time in the length of $a$, while any polynomial time algorithm should have negligible probability of finding preimages.
It is an open problem whether any one-way function exists, and if true, implies $P \neq NP$, one of the biggest open problems in theoretical computer science.
But it is widely believed to be true, and in fact, it is believed that the exponentiation map sending $i$ to $g^i$ in a cyclic group is one-way for large primes $p$.
Let's explore this. First, is exponentiation quick to compute?
def exp1(g,i,p):
result = 1.
for j in range(i):
result = (result * g)%p
return result
p1 = 191; p2 = 1234577; p3 = 123456791; p4 = 12345678923
for p in [p1, p2, p3]:
print '%12d: ' % p,
time2('exp1(3, p-2, p)')
That method takes $p-2$ steps, which is exponential in the number of bits!
The trick is to use Horner's method.
For example,
$
3^{45} = 3^{32+8+4+1} = (3^{32}) (3^8) (3^4) (3^1),
$
and we compute the powers by repeated squaring:
$3^{32} = ((((3^2)^2)^2)^2)^2$ only requires five multiplications!
def exp2(g,i,p):
result = 1
power = g
while i != 0:
if i%2 == 1:
result = (result * power)%p
power = (power * power)%p
i = i/2
return result
for p in [p1, p2, p3, p4]: # even p4 is included!
print '%12d: ' % p,
time2('exp2(3, p-2, p)')
The inverse to this exponentiation map is called the discrete logarithm: given $a$ in $\Zpx$, find $i$ such that $g^i = a$. We'll write $i = \log_g a$.
To compute the discrete logarithm, we can naively search for the answer:
def dlog(g,a,p):
result = 1
i = 0
while result != a:
result = (result * g)%p
i = i + 1
return i
time2('dlog(3,81,11113)')
a = exp2(3, p3-2, p3)
time2('dlog(3, a, p3)')
We believe that there is no method of computing discrete logarithms that is substantially faster than this!
A one-way function can be used for proof of work.
An example of this is the Hashcash anti-spam tool.
Each message I send has a header that looks like this:
X-Hashcash: 1:24:161021:[email protected]::kaw+sLa1OykzF+pL:01LFRH
This header proves that I used some cpu time before sending the message to generate this stamp. It only took a second or two, but spammers can't afford to spend one second per message, so a spam filter is more likely to let my message through.
Here's how we could do this with the discrete logarithm.
We decide on a large prime $p$ and a element $g$ in $\Zpx$ so that the discrete log problem takes about a second to solve on average.
Then, we combine the current date and the recipient's address into a string of bits that we interpret as a number $a$ (modulo $p$).
Then we compute $i = \log_g a$ in $\Zpx$, and include $i$ in the header. The spam filter just has to check that $g^i = a$, which is fast.
Proof of work can also be used to create a digital currency like Bitcoin.
One way to create a digital currency is to create a ledger that records all of the transactions.
This is easy if you have a trusted central authority. E.g., I could create Dancoin right now (if time).
But how to do it in a distributed way, with untrusted participants?
We make it so that adding new transactions to the ledger requires such a large amount of work that it takes our combined computing power about 10 minutes.
Then it's unlikely that one person can influence the ledger.
This is how Bitcoin and most other digital currencies work.
The method solves the Byzantine generals problem, the problem of coordinating information among distributed, untrusted agents with unreliable communication links.
Even better than a one-way function is a cryptographic hash function, which must in addition have the property that the output $f(a)$ appears to be essentially random.
It should not be possible to deduce anything about $a$ from $f(a)$, and changing $a$ slightly should change $f(a)$ in an unpredictable way.
This is not true for exponentiation, since increasing $i$ by $1$ multiplies $g^i$ by $g$.
An example of such a function is SHA:
print sha('This is a test')
print sha('This is a test!')
print sha(42)
print sha(43)
This kind of cryptographic hash function is used for:
from time import time, sleep
def time2(code):
st = time()
ret = eval(code)
print duration(st)
return ret
def duration(st):
dur = time() - st
if dur >= 1:
return '%.3gs' % (dur,)
dur *= 1000
if dur >= 1:
return '%.3gms' % (dur,)
dur *= 1000
if dur >= 1:
return '%.3gµs' % (dur,)
return '0'
time2('sleep(0.15)')
from hashlib import sha512, sha256
def sha(s):
return sha256(str(s)).hexdigest()[:50]
sha('Testing'), sha(17)