# Knapsack Cryptosystems (Part 1)

When people think of public key cryptosystems today, the classic algorithms of RSA and Diffie-Hellman jump to mind. However, an oft-forgotten public key cryptosystem developed around the same time as RSA and Diffie-Hellman was the knapsack-based cryptosystem by Ralph Merkel and Martin Hellman. While knapsack cryptosystems have been broken by lattice-based techniques (to be discussed in a future post), their construction is still quite elegant and noteworthy. In this two-part sequence, I will outline the construction of the knapsack cryptosystem.

Knapsack cryptosystems represent one of the first examples of using an NP-complete problem to develop a secure public key scheme. In this case, we use the subset-set problem. In the subset-sum problem, we are given a list of positive integers $X=(x_1,x_2,\ldots,x_n)$ and a target value $y$. The problem is to find a subset $X'=(x'_1,x'_2,\ldots,x'_k)\subseteq X$ such that $\sum_{i=1}^k x'_i=y$. As a small example, let $X=(3,5,6,10)$. If $y=9$, then there exists a subset of $X$ whose sum is $y=9$, namely the subset $X'=(3,6)$. If instead however, $y=12$, then there is no subset of $X$ that sums to $y$. Note that in this case, we are not allowed to use the 6 twice. (As a side note, the subset-sum problem can be viewed as a special case of the 0-1 knapsack problem where the weights and values of each item are equal. This is where the cryptosystem derives its name.)

While the subset-sum problem is NP-complete in general, for certain sets $X$, the problem can be solved efficiently. One particular case is when the set of numbers $X$ is superincreasing.

Definition. Let $X=(x_1,x_2,\ldots,x_n)$ be a sequence of positive numbers where $x_1 \le x_2 \le \cdots \le x_n$. Then, we say that $X$ is superincreasing if for all $1\le i\le n-1$, we have $2\cdot x_i< x_{i+1}$.

For instance, the set $(1,3,7,20,50,120)$ is superincreasing (each number is more than twice the previous number).

A useful property of superincreasing sets is that $\sum_{i=1}^k x_i< x_{k+1}$ for all $k=1,\ldots, n-1$.

Lemma. Let $X=(x_1,x_2,\ldots,x_n)$ be a superincreasing sequence of positive numbers where $x_1 \le x_2 \le \cdots \le x_n$. Then, for all $1\le k.

Proof. This is easily seen via induction on $k$. For $k=1$, we have that $x_1 < 2\cdot x_1 < \cdot x_2$ using the fact that $x_1$ is positive and the sequence is superincreasing. Now, for the inductive step, we have

$\sum_{i=1}^{k+1}x_i=\sum_{i=1}^k x_i+x_{k+1}

using the inductive hypothesis and the fact that the sequence of $x_i$ is superincreasing.

It is not difficult to see that if $X$ is a superincreasing set, the subset-sum problem becomes easy. Specifically, given a superincreasing set $X$ and a target value $y$, we proceed as follows:

1. Without loss of generality, let $X=(x_1,x_2,\ldots,x_n)$ where $x_1 < x_2 < \cdots < x_n$. Initialize $X'=()$.
2. If $x_n \le y$, then add $x_n$ to the set $X'$ and subtract $x_n$ from $y$. If $y=0$, then we have found our subset $X'$. Repeat this process for each of $x_{n-1},x_{n-2},\ldots, x_1$.

At the conclusion of the algorithm, either we will have a subset $X'$, or we will not. If we do not, then we conclude that no such subset exists. Observe that the above algorithm is linear in the cardinality of $X$, so this is quite efficient.

In words, at each step, the algorithm selects the largest element in the set $X$ that does not exceed the target $y$ and adds it to the subset. The value of the target $y$ is then updated. To see that this works, suppose that we have $x_1<\cdots. We claim that $x_k$ must be in any subset that sums to $y$. First, since $x_{k+1},\ldots,x_n$ are greater than $y$, they cannot be in the subset. If $x_k$ is not in the subset, then the maximum sum of the subset is given by

$\sum_{i=1}^{k-1} x_i

using the above lemma. Thus, any subset that does not contain $x_k$ cannot sum to $y$. Thus, we see that if a subset summing to $y$ exists, then it must contain the maximum element in the set that is less than $y$. To summarize, we have demonstrated that the subset problem is easy over superincreasing sets.

In the next post, I will show how we can exploit this property to construct a public key cryptosystem based on the hardness of the general subset-sum problem. The idea here is that we can choose the set $X$ such that we can solve the subset-set problem over $X$ by converting it to an equivalent subset-sum problem over a superincreasing set $X'$ using some secret information (e.g., the secret key).