# Knapsack Cryptosystems (Part 2)

In this post, I continue the discussion from the previous post on knapsack cryptosystems. As discussed before, knapsack cryptosystems base their hardness on the difficulty of the subset-sum problem. However, as we have seen, in the case where the set of integers is superincreasing, the subset-problem can be solved efficiently. We can exploit this fact to construct our cryptosystem. First, we describe how we can encode messages using the subset-sum problem.

Let $X=(x_1,\ldots,x_n)$ be a set of numbers. Suppose we want to encode an $n$-bit message with bits $b_1,\ldots,b_n.$ To encrypt $m$, we compute

$c:=\sum_{i=1}^n b_i x_i.$

Observe that to recover the bits of the message $b_1,\ldots,b_n$ given $m$ and $X$ amounts to solving a subset-sum problem, which is intractable for large $n$. Note that for this scheme to work, the subset summing to a particular $c$ must be unique. Note that this property is satisfied if $X$ is superincreasing.

With this preparation, the construction proceeds as follows:

1. Begin with some superincreasing sequence $X$.
2. Hide the superincreasing sequence using a linear modular transformation to produce a new sequence $X'$ that is not superincreasing.
3. Publish $X'$ as the public key to the scheme. Using the secret key (used to specify the transformation in step 2), we can transform the subset-sum problem relative to $X'$ to a subset-sum problem relative to the set $X$ that share the same solution. Since $X$ is superincreasing, this is an easy problem to solve and we recover the original message.

Now, we describe the details of the scheme.

Key Generation. Let $X=(x_1,\ldots,x_n)$ be a superincreasing sequence of numbers. Choose values $A$ and $B$ such that $B>2x_n$ and $\gcd(A,B)=1$. Compute $x'_i=A x_i\pmod{B}$ for all $i$. The public key is the sequence $X'=(x'_1,\ldots,x'_n)$ and the secret key is the tuple $(A,B,X).$

Encryption. To encrypt a binary-valued message $b \in \{0,1\}^n$, compute the ciphertext $c':=b^TX'.$

Decryption. To decrypt a ciphertext $c'$, compute $c:=A^{-1}c'\pmod{B}$. Solve the subset-sum problem using the superincreasing set $X$ with target value $c$ to obtain the message $b\in\{0,1\}^n.$

We show that the above scheme is correct. This is easy to see; simply observe the following:

$c=A^{-1}c'=A^{-1}\sum_{i=1}^n b_i x'_i=\sum_{i=1}^n b_i A^{-1}(A x_i)=\sum_{i=1}^n b_i x_i=b^TX\pmod{B}.$

The solution to the subset-sum problem over set $X$ with target $c$ is precisely the solution to the original subset-sum problem with set $X'$ and target $c'$. Finally, observe that since $X$ is superincreasing,

$b^TX=\sum_{i=1}^n b_i x_i < 2\cdot b_n < B,$

by construction of $B$. Thus, the statement $c = b^T X$ holds over the integers, and not just modulo $B$. This proves the correctness of the above scheme. As a final note, $A^{-1} \pmod{B}$ exists since we required $\gcd(A,B)=1.$ Security of the scheme is based on the fact that the subset-sum problem is NP-complete.

The advantage of knapsack cryptosystems is their efficiency. For instance, in the scheme described here, encryption is effectively free while decryption requires a single modular multiplication followed by an efficient linear-time algorithm for solving the superincreasing subset-sum problem. Even in variants of the knapsack cryptosystems, decryption still requires only a few modular multiplications. In comparison to RSA of Diffie-Hellman, which require many expensive modular exponentiations, the Merkle-Hellman knapsack cryptosystem is much more computationally efficient. Unfortunately, however, there are efficient lattice-based attacks on knapsack cryptosystems. As a result, knapsack cryptosystems require the use of very large parameters in order to achieve security. Consequently, secure knapsack systems are quite impractical and not used in practice. Nevertheless, their construction is still remarkable in both simplicity as well as elegance.

# Knapsack Cryptosystems (Part 1)

When people think of public key cryptosystems today, the classic algorithms of RSA and Diffie-Hellman jump to mind. However, an oft-forgotten public key cryptosystem developed around the same time as RSA and Diffie-Hellman was the knapsack-based cryptosystem by Ralph Merkel and Martin Hellman. While knapsack cryptosystems have been broken by lattice-based techniques (to be discussed in a future post), their construction is still quite elegant and noteworthy. In this two-part sequence, I will outline the construction of the knapsack cryptosystem.

Knapsack cryptosystems represent one of the first examples of using an NP-complete problem to develop a secure public key scheme. In this case, we use the subset-set problem. In the subset-sum problem, we are given a list of positive integers $X=(x_1,x_2,\ldots,x_n)$ and a target value $y$. The problem is to find a subset $X'=(x'_1,x'_2,\ldots,x'_k)\subseteq X$ such that $\sum_{i=1}^k x'_i=y$. As a small example, let $X=(3,5,6,10)$. If $y=9$, then there exists a subset of $X$ whose sum is $y=9$, namely the subset $X'=(3,6)$. If instead however, $y=12$, then there is no subset of $X$ that sums to $y$. Note that in this case, we are not allowed to use the 6 twice. (As a side note, the subset-sum problem can be viewed as a special case of the 0-1 knapsack problem where the weights and values of each item are equal. This is where the cryptosystem derives its name.)

While the subset-sum problem is NP-complete in general, for certain sets $X$, the problem can be solved efficiently. One particular case is when the set of numbers $X$ is superincreasing.

Definition. Let $X=(x_1,x_2,\ldots,x_n)$ be a sequence of positive numbers where $x_1 \le x_2 \le \cdots \le x_n$. Then, we say that $X$ is superincreasing if for all $1\le i\le n-1$, we have $2\cdot x_i< x_{i+1}$.

For instance, the set $(1,3,7,20,50,120)$ is superincreasing (each number is more than twice the previous number).

A useful property of superincreasing sets is that $\sum_{i=1}^k x_i< x_{k+1}$ for all $k=1,\ldots, n-1$.

Lemma. Let $X=(x_1,x_2,\ldots,x_n)$ be a superincreasing sequence of positive numbers where $x_1 \le x_2 \le \cdots \le x_n$. Then, for all $1\le k.

Proof. This is easily seen via induction on $k$. For $k=1$, we have that $x_1 < 2\cdot x_1 < \cdot x_2$ using the fact that $x_1$ is positive and the sequence is superincreasing. Now, for the inductive step, we have

$\sum_{i=1}^{k+1}x_i=\sum_{i=1}^k x_i+x_{k+1}

using the inductive hypothesis and the fact that the sequence of $x_i$ is superincreasing.

It is not difficult to see that if $X$ is a superincreasing set, the subset-sum problem becomes easy. Specifically, given a superincreasing set $X$ and a target value $y$, we proceed as follows:

1. Without loss of generality, let $X=(x_1,x_2,\ldots,x_n)$ where $x_1 < x_2 < \cdots < x_n$. Initialize $X'=()$.
2. If $x_n \le y$, then add $x_n$ to the set $X'$ and subtract $x_n$ from $y$. If $y=0$, then we have found our subset $X'$. Repeat this process for each of $x_{n-1},x_{n-2},\ldots, x_1$.

At the conclusion of the algorithm, either we will have a subset $X'$, or we will not. If we do not, then we conclude that no such subset exists. Observe that the above algorithm is linear in the cardinality of $X$, so this is quite efficient.

In words, at each step, the algorithm selects the largest element in the set $X$ that does not exceed the target $y$ and adds it to the subset. The value of the target $y$ is then updated. To see that this works, suppose that we have $x_1<\cdots. We claim that $x_k$ must be in any subset that sums to $y$. First, since $x_{k+1},\ldots,x_n$ are greater than $y$, they cannot be in the subset. If $x_k$ is not in the subset, then the maximum sum of the subset is given by

$\sum_{i=1}^{k-1} x_i

using the above lemma. Thus, any subset that does not contain $x_k$ cannot sum to $y$. Thus, we see that if a subset summing to $y$ exists, then it must contain the maximum element in the set that is less than $y$. To summarize, we have demonstrated that the subset problem is easy over superincreasing sets.

In the next post, I will show how we can exploit this property to construct a public key cryptosystem based on the hardness of the general subset-sum problem. The idea here is that we can choose the set $X$ such that we can solve the subset-set problem over $X$ by converting it to an equivalent subset-sum problem over a superincreasing set $X'$ using some secret information (e.g., the secret key).