When people think of public key cryptosystems today, the classic algorithms of RSA and Diffie-Hellman jump to mind. However, an oft-forgotten public key cryptosystem developed around the same time as RSA and Diffie-Hellman was the knapsack-based cryptosystem by Ralph Merkel and Martin Hellman. While knapsack cryptosystems have been broken by lattice-based techniques (to be discussed in a future post), their construction is still quite elegant and noteworthy. In this two-part sequence, I will outline the construction of the knapsack cryptosystem.

Knapsack cryptosystems represent one of the first examples of using an NP-complete problem to develop a secure public key scheme. In this case, we use the subset-set problem. In the subset-sum problem, we are given a list of positive integers and a target value . The problem is to find a subset such that . As a small example, let . If , then there exists a subset of whose sum is , namely the subset . If instead however, , then there is no subset of that sums to . Note that in this case, we are not allowed to use the 6 twice. (As a side note, the subset-sum problem can be viewed as a special case of the 0-1 knapsack problem where the weights and values of each item are equal. This is where the cryptosystem derives its name.)

While the subset-sum problem is NP-complete in general, for certain sets , the problem can be solved efficiently. One particular case is when the set of numbers is *superincreasing*.

*Definition*. Let be a sequence of positive numbers where . Then, we say that is *superincreasing* if for all , we have .

For instance, the set is superincreasing (each number is more than twice the previous number).

A useful property of superincreasing sets is that for all .

*Lemma*. Let be a superincreasing sequence of positive numbers where . Then, for all .

*Proof. *This is easily seen via induction on . For , we have that using the fact that is positive and the sequence is superincreasing. Now, for the inductive step, we have

using the inductive hypothesis and the fact that the sequence of is superincreasing.

It is not difficult to see that if is a superincreasing set, the subset-sum problem becomes easy. Specifically, given a superincreasing set and a target value , we proceed as follows:

- Without loss of generality, let where . Initialize .
- If , then add to the set and subtract from . If , then we have found our subset . Repeat this process for each of .

At the conclusion of the algorithm, either we will have a subset , or we will not. If we do not, then we conclude that no such subset exists. Observe that the above algorithm is *linear* in the cardinality of , so this is quite efficient.

In words, at each step, the algorithm selects the largest element in the set that does not exceed the target and adds it to the subset. The value of the target is then updated. To see that this works, suppose that we have . We claim that must be in any subset that sums to . First, since are greater than , they cannot be in the subset. If is not in the subset, then the maximum sum of the subset is given by

using the above lemma. Thus, any subset that does not contain cannot sum to . Thus, we see that if a subset summing to exists, then it must contain the maximum element in the set that is less than . To summarize, we have demonstrated that the subset problem is easy over superincreasing sets.

In the next post, I will show how we can exploit this property to construct a public key cryptosystem based on the hardness of the general subset-sum problem. The idea here is that we can choose the set such that we can solve the subset-set problem over by converting it to an equivalent subset-sum problem over a superincreasing set using some secret information (e.g., the secret key).