Group theory, one-way functions, spam prevention and digital currency

Dan Christensen, October 24, 2016

Abstract:

Modern cryptography relies on the difficulty of certain problems arising in pure mathematics. I will describe how group theory and modular arithmetic give an easy to describe "one-way function" that can be used in a variety of ways, including for spam prevention and for creating a digital currency requiring no central authority. No background in group theory or cryptography will be assumed.

Outline:

  • group theory
  • one-way functions
  • proof of work
  • spam prevention
  • digital cash

$ \newcommand{\Z}{\mathbb{Z}} \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\Zp}{\Z/p} \newcommand{\Zpx}{\Zp^\times} $

Groups

Groups are mathematical structures that are used to encode symmetries as well as properties of arithmetic. Here is the formal definition.

Definition: A group is a set $G$ with a binary operation $m : G \times G \to G$ satisfying the conditions below, where we write $a b$ for $m(a,b)$:

  • There is an identity element $e$ in $G$. That is, $e a = a = a e$ for every $a$ in $G$.
  • The operation is associative. That is, $a(b c) = (a b)c$ for every $a$, $b$ and $c$ in $G$.
  • For every element $g$ in $G$, there is an element $h$ in $G$ such that $g h = e = h g$. We call $h$ an inverse for $g$ and write $h = g^{-1}$.

When $G$ is a group, the identity $e$ and the inverses are all unique.

Examples:

  • The collection $\Z$ of integers under addition.
  • The collection $\R$ of real numbers under addition.
  • The collection $\R^\times$ of non-zero real numbers under multiplication.
  • The collection of rotations of the plane about the origin, under composition.

Non-examples:

  • The collection $\N$ of natural numbers under addition.
  • The collection $\R^\times$ of non-zero real numbers under addition.
  • The collection $\Z$ of integers under subtraction: $(4-2)-1 \neq 4-(2-1)$.

Cyclic groups

For cryptography, we're going to use some other groups.

Choose a prime number $p$, like $p = 7$. Write $\Zp$ for the set $\{0, 1, \ldots, p-1\}$.

This is a group under addition modulo $p$, where you take the remainder modulo $p$ after adding. E.g.

In [154]:
(4+5)%7   # 9 = 1*7 with 2 remainder
Out[154]:
2

$0$ is the identity element:

In [155]:
(4+0)%7
Out[155]:
4

Associativity holds because you can delay taking the remainder until the end:

In [156]:
print ((4+5)%7+6)%7, (4+5+6)%7, (4+(5+6)%7)%7
1 1 1

The negative (additive inverse) of 4 is 3:

In [157]:
(4+3)%7
Out[157]:
0

In general, the negative of $k$ is $p-k$, since $k + (p-k) = p \equiv 0$ modulo $p$.

The group $\Zp$ is called the cyclic group of order $p$. It is called "cyclic" because if you start with $1$ and keep adding it to itself, you go through all the elements of the group and eventually end up where you started.

The group we really need is $\Zpx = \{1, 2, \ldots, p-1\}$ under multiplication modulo $p$. E.g.

In [158]:
(3*4)%7
Out[158]:
5

$1$ is the identity:

In [159]:
print (1*3)%7, (3*1)%7
3 3

Associativity is like with addition. But inverses are less clear!

In [160]:
print (1*3)%7, (2*3)%7, (3*3)%7, (4*3)%7, (5*3)%7, (6*3)%7
3 6 2 5 1 4

So, in $\Zpx$, the (multiplicative) inverse of $3$ is $5$!  That is, $3^{-1} = 5$.

But why does every number have an inverse? This follows from the Euclidean algorithm, using that $p$ is prime.

For us, the important tool in $\Zpx$ is going to be exponentiation:

In [161]:
for i in range(7): print '3 to the', i, 'is', (3**i)%7
3 to the 0 is 1
3 to the 1 is 3
3 to the 2 is 2
3 to the 3 is 6
3 to the 4 is 4
3 to the 5 is 5
3 to the 6 is 1

It loops around, giving every element of $\Z/7^\times$!

In other words, $\Z/7^\times$ is also a cyclic group, and is isomorphic to the cyclic group $\Z/6$ using the correspondence sending $i$ in $\Z/6$ to $3^i$ in $\Z/7^\times$.

This happens for any prime $p$: you can always find an element $g$ such that \begin{align} \Z/(p-1) &\cong \Zpx \ i &\mapsto g^i \ \end{align} is an isomorphism. Such a $g$ is called a generator.

One-way functions

Switching gears to cryptography, a one-way function is a function $f : A \to B$ such that:

  • for each $a$ in $A$, $f(a)$ is easy to compute;
  • given $b$ in the image of $f$, it is difficult to find an $a$ in $A$ such that $f(a) = b$.

More precisely, $f(a)$ should be computable in polynomial time in the length of $a$, while any polynomial time algorithm should have negligible probability of finding preimages.

It is an open problem whether any one-way function exists, and if true, implies $P \neq NP$, one of the biggest open problems in theoretical computer science.

But it is widely believed to be true, and in fact, it is believed that the exponentiation map sending $i$ to $g^i$ in a cyclic group is one-way for large primes $p$.

Let's explore this. First, is exponentiation quick to compute?

In [ ]:
def exp1(g,i,p):
    result = 1.
    for j in range(i):
        result = (result * g)%p
    return result
In [ ]:
p1 = 191; p2 = 1234577; p3 = 123456791; p4 = 12345678923

for p in [p1, p2, p3]:
    print '%12d: ' % p,
    time2('exp1(3, p-2, p)')
         191:  85.8µs
     1234577:  177ms
   123456791: 

That method takes $p-2$ steps, which is exponential in the number of bits!

The trick is to use Horner's method. For example, $ 3^{45} = 3^{32+8+4+1} = (3^{32}) (3^8) (3^4) (3^1), $
and we compute the powers by repeated squaring: $3^{32} = ((((3^2)^2)^2)^2)^2$ only requires five multiplications!

In [ ]:
def exp2(g,i,p):
    result = 1
    power = g
    while i != 0:
        if i%2 == 1:
            result = (result * power)%p
        power = (power * power)%p
        i = i/2
    return result
In [ ]:
for p in [p1, p2, p3, p4]:  # even p4 is included!
    print '%12d: ' % p,
    time2('exp2(3, p-2, p)')

The inverse to this exponentiation map is called the discrete logarithm: given $a$ in $\Zpx$, find $i$ such that $g^i = a$. We'll write $i = \log_g a$.

To compute the discrete logarithm, we can naively search for the answer:

In [ ]:
def dlog(g,a,p):
    result = 1
    i = 0
    while result != a:
        result = (result * g)%p
        i = i + 1
    return i
In [ ]:
time2('dlog(3,81,11113)')
In [ ]:
a = exp2(3, p3-2, p3)
time2('dlog(3, a, p3)')

We believe that there is no method of computing discrete logarithms that is substantially faster than this!

Proof of work

A one-way function can be used for proof of work.

An example of this is the Hashcash anti-spam tool.
Each message I send has a header that looks like this:

X-Hashcash: 1:24:161021:[email protected]::kaw+sLa1OykzF+pL:01LFRH

This header proves that I used some cpu time before sending the message to generate this stamp. It only took a second or two, but spammers can't afford to spend one second per message, so a spam filter is more likely to let my message through.

Here's how we could do this with the discrete logarithm.

We decide on a large prime $p$ and a element $g$ in $\Zpx$ so that the discrete log problem takes about a second to solve on average.

Then, we combine the current date and the recipient's address into a string of bits that we interpret as a number $a$ (modulo $p$).

Then we compute $i = \log_g a$ in $\Zpx$, and include $i$ in the header. The spam filter just has to check that $g^i = a$, which is fast.

Digital currency

Proof of work can also be used to create a digital currency like Bitcoin.

One way to create a digital currency is to create a ledger that records all of the transactions.

This is easy if you have a trusted central authority. E.g., I could create Dancoin right now (if time).

But how to do it in a distributed way, with untrusted participants?

We make it so that adding new transactions to the ledger requires such a large amount of work that it takes our combined computing power about 10 minutes.

Then it's unlikely that one person can influence the ledger.

This is how Bitcoin and most other digital currencies work.

The method solves the Byzantine generals problem, the problem of coordinating information among distributed, untrusted agents with unreliable communication links.

Cryptographic hash functions (if time)

Even better than a one-way function is a cryptographic hash function, which must in addition have the property that the output $f(a)$ appears to be essentially random.

It should not be possible to deduce anything about $a$ from $f(a)$, and changing $a$ slightly should change $f(a)$ in an unpredictable way.

This is not true for exponentiation, since increasing $i$ by $1$ multiplies $g^i$ by $g$.

An example of such a function is SHA:

In [ ]:
print sha('This is a test')
print sha('This is a test!')
print sha(42)
print sha(43)

This kind of cryptographic hash function is used for:

  • Hashcash
  • Digital currency
  • Electronic signatures
  • Verification of file integrity (e.g. downloads)
  • Password storage and checking
  • Proof of knowledge (e.g. publish the SHA of a document, without revealing the contents until later)



Thanks for listening!










In [ ]:
from time import time, sleep

def time2(code):
    st = time()
    ret = eval(code)
    print duration(st)
    return ret
    
def duration(st):
    dur = time() - st
    if dur >= 1:
        return '%.3gs' % (dur,)
    dur *= 1000
    if dur >= 1:
        return '%.3gms' % (dur,)
    dur *= 1000
    if dur >= 1:
        return '%.3gµs' % (dur,)
    return '0'
In [ ]:
time2('sleep(0.15)')
In [ ]:
from hashlib import sha512, sha256
def sha(s):
    return sha256(str(s)).hexdigest()[:50]
In [ ]:
sha('Testing'), sha(17)
In [ ]: