Variational Characterization of Eigenvalues

For a real symmetric matrix in {\mathbb{R}^{n\times n}}, the spectral decomposition theorem states that there exist eigenvalues {\lambda_{1},\ldots,\lambda_{n}\in\mathbb{R}} with corresponding eigenvectors {v_{1},\ldots,v_{n}\in\mathbb{R}^{n}} such that {\left\{ v_{1},\ldots,v_{n}\right\} } is an orthonormal basis for {\mathbb{R}^{n}.} In this post, we will prove the spectral theorem for real symmetric matrices and then develop a variational characterization of {\lambda_{1},\ldots,\lambda_{n}.} In our exposition, we will freely use the isomorphism between linear operators on {\mathbb{R}^{n}} and matrices in {\mathbb{R}^{n\times n},} depending on which representation is more convenient.

Lemma 1. Let {M\in\mathbb{R}^{n\times n}} be an {n\times n} matrix. The eigenvalues of {M} are then precisely the roots {\lambda} of the characteristic polynomial {\det(M-\lambda I).}

Proof. By definition, {\lambda} is an eigenvalue if and only if there exists a non-zero vector {v\in\mathbb{R}^{n}} such that {Mv=\lambda v,} that is, {(M-\lambda I)v=0.} A nonzero {v} satisfying this relation exists if and only if {M-\lambda I} is singular, which holds if and only if {\det(M-\lambda I)=0.} Thus, the eigenvalues of {M} precisely coincide with the roots of the characteristic polynomial {\det(M-\lambda I).} For an {n\times n} matrix, the characteristic polynomial has degree {n}, so {M} has exactly {n} eigenvalues (not necessarily distinct) over the complex numbers.
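As a quick numerical sanity check (a NumPy sketch, not part of the proof), we can compare the roots of the characteristic polynomial against the eigenvalues computed directly:

```python
import numpy as np

# A small symmetric matrix to illustrate Lemma 1 (any square matrix works).
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.poly on a square matrix returns the coefficients of det(M - lambda*I),
# highest degree first.
char_poly = np.poly(M)

# The roots of the characteristic polynomial...
roots = np.sort(np.roots(char_poly))

# ...coincide with the eigenvalues of M.
eigenvalues = np.sort(np.linalg.eigvals(M))

assert np.allclose(roots, eigenvalues)
```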

Lemma 2. Let {M\in\mathbb{R}^{n\times n}} be a symmetric {n\times n} matrix. Then, all eigenvalues of {M} are real.

Proof. Let {\lambda} be a (possibly complex) eigenvalue of {M} with associated eigenvector {v\in\mathbb{C}^{n}.} Working with the complex inner product, we have

\displaystyle \left\langle v,Mv\right\rangle =\left\langle v,\lambda v\right\rangle =\bar{\lambda}\left\langle v,v\right\rangle =\bar{\lambda}\left\Vert v\right\Vert _{2}^{2},

where {\bar{\lambda}} denotes the complex conjugate of {\lambda.} Next, since {M} is real and symmetric, and thus Hermitian, we have

\displaystyle \left\langle v,Mv\right\rangle =\left\langle Mv,v\right\rangle =\left\langle \lambda v,v\right\rangle =\lambda\left\langle v,v\right\rangle =\lambda\left\Vert v\right\Vert _{2}^{2}.

Thus, {\bar{\lambda}\left\Vert v\right\Vert _{2}^{2}=\lambda\left\Vert v\right\Vert _{2}^{2}} and since {v\ne0,} it follows that {\lambda=\bar{\lambda},} so {\lambda} is real.

Lemma 3. Let {M\in\mathbb{R}^{n\times n}} be symmetric and let {\lambda} be an eigenvalue of {M} with associated (normalized) eigenvector {v.} Then, for all {u\perp v,} {Mu\perp v.}

Proof. We have

\displaystyle \left\langle Mu,v\right\rangle =\left\langle u,Mv\right\rangle =\lambda\left\langle u,v\right\rangle =0.

Theorem 4 (Spectral Theorem). Let {M\in\mathbb{R}^{n\times n}} be a symmetric {n\times n} matrix. Then, there exist eigenvalues {\lambda_{1},\ldots,\lambda_{n}} of {M} with corresponding eigenvectors {\{v_{1},\ldots,v_{n}\}} such that {\{v_{1},\ldots,v_{n}\}} constitutes an orthonormal basis for {\mathbb{R}^{n}.}

Proof. It will be easier to work with linear transformations. Let {T:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}} be the linear transformation that corresponds to {M.} Take {\lambda_{1}} to be an eigenvalue of {T} (which exists and is real by Lemmas 1 and 2) and let {v_{1}\in\mathbb{R}^{n}} be the associated (normalized) eigenvector. Let {W} be the orthogonal complement of {\mathrm{span}\{v_{1}\}} in {\mathbb{R}^{n}.} Consider the restricted map {\left.T\right|_{W}.} By Lemma 3, {\left.T\right|_{W}} is a linear operator on {W.} Since {\left.T\right|_{W}} is a restriction of {T,} eigenvalues of {\left.T\right|_{W}} are also eigenvalues of {T.} Thus, let {\lambda_{2}} be an eigenvalue of {\left.T\right|_{W}} with associated eigenvector {v_{2}.} By construction, {v_{2}\in W,} so {v_{2}\perp v_{1}.} Now, we can repeat this process by defining {W'} to be the orthogonal complement of {\mathrm{span}\left\{ v_{1},v_{2}\right\} } and considering {\left.T\right|_{W'}.} Repeating this process yields eigenvalues {\lambda_{1},\ldots,\lambda_{n}} with an orthonormal set of eigenvectors {\left\{ v_{1},\ldots,v_{n}\right\} ,} which must necessarily constitute a basis for {\mathbb{R}^{n}.}
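Numerically, NumPy's `eigh` routine computes exactly this decomposition for symmetric matrices. The following sketch verifies, on a random symmetric matrix, that the eigenvectors form an orthonormal basis and that {M} diagonalizes in that basis:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2  # symmetrize to obtain a real symmetric matrix

# np.linalg.eigh is specialized for symmetric matrices: it returns real
# eigenvalues and an orthonormal matrix of eigenvectors (as columns).
eigenvalues, V = np.linalg.eigh(M)

# The eigenvectors form an orthonormal basis: V^T V = I.
assert np.allclose(V.T @ V, np.eye(5))

# And M diagonalizes in this basis: M = V diag(lambda) V^T.
assert np.allclose(M, V @ np.diag(eigenvalues) @ V.T)
```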

Theorem 5 (Variational Characterization). Let {M\in\mathbb{R}^{n\times n}} be a symmetric {n\times n} matrix with eigenvalues {\lambda_{1}\ge\lambda_{2}\ge\cdots\ge\lambda_{n}.} Then, for all {1\le k\le n,}

\displaystyle \lambda_{k}=\max_{\substack{U\subseteq\mathbb{R}^{n}\\  \dim(U)=k  }  }\min_{\substack{x\in U\\  \left\Vert x\right\Vert _{2}=1  }  }x^{T}Mx.

Proof. By the spectral theorem, there exists an orthonormal basis of eigenvectors {v_{1},\ldots,v_{n}} corresponding to {\lambda_{1},\ldots,\lambda_{n}.} Given any {x\in\mathbb{R}^{n},} write {x} as a linear combination of the eigenvectors: {x=\sum_{i}\alpha_{i}v_{i}.} Then,

\displaystyle   x^{T}Mx=\left\langle x,Mx\right\rangle =\left\langle \sum_{i=1}^{n}\alpha_{i}v_{i},\sum_{i=1}^{n}\alpha_{i}\lambda_{i}v_{i}\right\rangle =\sum_{i=1}^{n}\alpha_{i}^{2}\lambda_{i},

since {v_{i}\perp v_{j}} for all {i\ne j.} It is not hard to see that

\displaystyle   \lambda_{k}\le\max_{\substack{U\subseteq\mathbb{R}^{n}\\  \dim(U)=k  }  }\min_{\substack{x\in U\\  \left\Vert x\right\Vert _{2}=1  }  }x^{T}Mx=\max_{\substack{U\subseteq\mathbb{R}^{n}\\  \dim(U)=k  }  }\min_{\substack{x\in U\\  \left\Vert x\right\Vert _{2}=1  }  }\sum_{i=1}^{n}\alpha_{i}^{2}\lambda_{i},

with {x=\sum_{i}\alpha_{i}v_{i}.} To see this, take {U=\mathrm{span}\left\{ v_{1},\ldots,v_{k}\right\}.} By construction, for all {x\in U,} {\alpha_{k+1}=\cdots=\alpha_{n}=0.} Since {\lambda_{1}\ge\cdots\ge\lambda_{k},} to minimize {\sum_{i}\alpha_{i}^{2}\lambda_{i}} subject to {\sum_{i}\alpha_{i}^{2}=1,} we would simply choose {\alpha_{1}=\cdots=\alpha_{k-1}=0} and {\alpha_{k}=1.} In this case, the objective value is precisely {\lambda_{k}.} Since we are maximizing over all subspaces {U} of dimension {k,} this particular choice of {U} shows that {\lambda_{k}} is upper-bounded by the objective. Next, we show that {\lambda_{k}} is lower-bounded by the objective. To see this, take any subspace {U} of dimension {k.} Since {\left\{ v_{1},\ldots,v_{k-1}\right\} } spans a subspace of dimension {k-1,} its orthogonal complement has dimension {n-k+1,} and since {\dim(U)=k,} the two subspaces must intersect non-trivially. Hence, there is some unit vector {x\in U} such that {x\perp v_{1},\ldots,v_{k-1}.} Thus, when we minimize over {x\in U,} there is some {x} such that

\displaystyle   \sum_{i=1}^{n}\alpha_{i}^{2}\lambda_{i}=\sum_{i=k}^{n}\alpha_{i}^{2}\lambda_{i}\le\lambda_{k},

and so the minimum over unit vectors {x\in U} is at most {\lambda_{k}.} Since this holds for every subspace {U} of dimension {k,} the objective is upper-bounded by {\lambda_{k}}. Thus, we conclude that

\displaystyle   \lambda_{k}=\max_{\substack{U\subseteq\mathbb{R}^{n}\\  \dim(U)=k  }  }\min_{\substack{x\in U\\  \left\Vert x\right\Vert _{2}=1  }  }x^{T}Mx.
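We can check the characterization numerically. The NumPy sketch below (on an arbitrary random symmetric matrix) uses the fact that, restricted to {U=\mathrm{span}\{v_{1},\ldots,v_{k}\},} the quadratic form {x^{T}Mx} is represented by the {k\times k} matrix {V_{k}^{T}MV_{k},} whose smallest eigenvalue is the minimum of {x^{T}Mx} over unit vectors in {U:}

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
M = (A + A.T) / 2

# eigh returns eigenvalues in ascending order; flip to match the convention
# lambda_1 >= lambda_2 >= ... >= lambda_n, along with the eigenvectors.
w, V = np.linalg.eigh(M)
lam = w[::-1]    # lam[0] = lambda_1 (largest)
V = V[:, ::-1]   # V[:, i] is the eigenvector for lam[i]

k = 3
# Take U = span{v_1, ..., v_k}. Restricted to U, x^T M x becomes
# y^T (V_k^T M V_k) y for y in R^k, so the minimum over unit vectors
# in U is the smallest eigenvalue of V_k^T M V_k.
Vk = V[:, :k]
restricted = Vk.T @ M @ Vk
min_over_U = np.linalg.eigvalsh(restricted).min()

# This minimum is exactly lambda_k, the value attained by the max-min.
assert np.isclose(min_over_U, lam[k - 1])
```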


Pairings and Three-Party Diffie-Hellman

Over the last ten years, pairings have served as the basis for many cryptographic constructions and protocols for which no efficient alternative constructions are known. Pairings were first used in the construction of Joux’s single-round three-party Diffie-Hellman protocol in 2000, as well as the Boneh-Franklin identity-based encryption (IBE) scheme in 2001. Since then, pairings, specifically in the form of bilinear maps, have been successfully applied to predicate encryption, attribute-based encryption, broadcast encryption, traitor tracing, group signatures, and many other schemes. In this post, I will describe pairings at a very high level and illustrate how they can be used for single-round tripartite Diffie-Hellman.

Definition. Let {\mathbb{G}_{1}} and {\mathbb{G}_{2}} be groups of prime order {p}, and let {g_{1}\in\mathbb{G}_{1}} and {g_{2}\in\mathbb{G}_{2}} be generators of their respective groups. A pairing is a mapping {e:\mathbb{G}_{1}\times\mathbb{G}_{2}\rightarrow\mathbb{G}_{T},} where {\mathbb{G}_{T}} is a target group, also of order {p}, such that the following two properties hold:

  1. (Bilinearity). For all {a,b\in\mathbb{Z}_{p}^{*}}, {e(g_{1}^{a},g_{2}^{b})=e(g_{1},g_{2})^{ab}}.
  2. (Non-degeneracy). {e(g_{1},g_{2})\ne1_{T}}, where {1_{T}} denotes the identity element in {\mathbb{G}_{T}}.

While not strictly part of the definition, we often require that the pairing {e} be efficiently computable. In the cases that we will consider, the pairing is symmetric, that is, {\mathbb{G}_{1}=\mathbb{G}_{2}}. For notational simplicity, we will denote both {\mathbb{G}_{1}} and {\mathbb{G}_{2}} by {\mathbb{G}}, so the pairing {e} is a mapping {\mathbb{G}\times\mathbb{G}\rightarrow\mathbb{G}_{T}}. From algebraic geometry, the Weil and Tate pairings are concrete examples of pairings that satisfy the above properties. In both of these cases, the group {\mathbb{G}} is an elliptic-curve group and {\mathbb{G}_{T}} is a multiplicative subgroup of a finite field. Thus, cryptographic schemes involving pairings can be instantiated using the Weil and Tate pairings, as appropriate.

Before we proceed to the construction of the single-round, three-party Diffie-Hellman protocol, we review the standard two-party Diffie-Hellman protocol, beginning with the decisional Diffie-Hellman (DDH) assumption.

Definition (Decisional Diffie-Hellman Assumption). Let {\mathbb{G}} be a cyclic group of order {q}, and let {g\in\mathbb{G}} be a generator of {\mathbb{G}}. Then, sample {a,b} uniformly at random from {\mathbb{Z}_{q}}. Then, we say that the DDH assumption holds in {\mathbb{G}} if given the values of {g^{a},g^{b}}, it is difficult to distinguish the element {g^{ab}} from a random element in {\mathbb{G}}. More formally, we say that the distributions given by the tuples {(g^{a},g^{b},g^{ab})} and {(g^{a},g^{b},g^{r})} where {a,b,r} are randomly sampled from {\mathbb{Z}_{q}} are computationally indistinguishable.

With this in mind, we describe the standard two-party Diffie-Hellman protocol. We have two parties, say Alice and Bob, whose objective is to derive a shared key. To do so, Alice and Bob first agree on a cyclic group {\mathbb{G}} of order {q} with generator {g} such that the DDH assumption holds in {\mathbb{G}}. Then, Alice and Bob each choose random values {a,b\in\mathbb{Z}_{q}}, respectively. Alice sends {g^{a}} to Bob, and Bob replies with {g^{b}}. Finally, Alice computes, using her secret value {a} along with Bob’s message, the quantity {\left(g^{b}\right)^{a}=g^{ab}}. Similarly, observe that Bob can take his secret value {b} along with Alice’s message and compute {\left(g^{a}\right)^{b}=g^{ab}}. Thus, Alice and Bob have successfully agreed on a shared key. To argue that this is secure, consider an adversary that eavesdrops on Alice and Bob’s communication. From the messages that Alice and Bob exchange, the adversary learns the group {\mathbb{G}}, the generator {g}, and the values {g^{a}} and {g^{b}}, and its objective is to construct the element {g^{ab}}. But the DDH assumption rules this out! The distribution {(g^{a},g^{b},g^{ab})} is computationally indistinguishable from the distribution {(g^{a},g^{b},g^{r})} for a random {r\in\mathbb{Z}_{q}}; in other words, the values of {g^{a}} and {g^{b}} leak no (computationally useful) information about {g^{ab}}.
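The two-party protocol is short enough to sketch in a few lines of Python. This is a toy illustration only: the modulus below is the Mersenne prime {2^{127}-1} and the generator is chosen arbitrarily, whereas a real deployment would use a standardized group of appropriate size and a generator of a large prime-order subgroup:

```python
import secrets

# Toy Diffie-Hellman in the multiplicative group mod a prime.
p = 2 ** 127 - 1  # a known (Mersenne) prime; illustration only
g = 3             # arbitrary base; correctness does not depend on it

# Alice and Bob each sample a random secret exponent...
a = secrets.randbelow(p - 2) + 1  # Alice's secret
b = secrets.randbelow(p - 2) + 1  # Bob's secret

# ...and exchange the corresponding public values.
A = pow(g, a, p)  # Alice -> Bob
B = pow(g, b, p)  # Bob -> Alice

# Each side combines its own secret with the other's public value;
# both arrive at the same shared key g^(ab).
alice_key = pow(B, a, p)
bob_key = pow(A, b, p)

assert alice_key == bob_key
```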

Now, consider the scenario where we have three parties, Alice, Bob, and Carol, who all want to agree on a shared key. Is it possible to generalize the above construction? It turns out the answer is yes, but we will need to rely on pairings. Observe that a pairing takes two arguments (say, messages) and outputs a new element in {\mathbb{G}_{T}}. Thus, we can naturally generalize the two-party Diffie-Hellman protocol from above. As setup, the three parties agree on a pairing {e:\mathbb{G}\times\mathbb{G}\rightarrow\mathbb{G}_{T}} along with a generator {g\in\mathbb{G}}. Let {q} denote the order of {\mathbb{G}}. We will examine the necessary security assumption shortly. As before, Alice, Bob, and Carol each choose a random element {a,b,c\in\mathbb{Z}_{q}}, respectively, and compute {g^{a},g^{b},g^{c}}, respectively. Each party broadcasts its value to the other two parties. To complete the protocol and derive a shared key, Alice takes Bob’s value {g^{b}} and Carol’s value {g^{c}} and computes, using her secret value {a}, the quantity {e(g^{b},g^{c})^{a}=e(g,g)^{abc}}, where we have exploited the bilinearity property of the pairing. Similarly, Bob and Carol take their incoming messages, apply the pairing, and raise the result to their secret values to obtain the shared key {e(g,g)^{abc}}. Thus, once again, we see that the three parties have succeeded in deriving a shared key, this time in a single round of communication.
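To see the algebra in action, here is a toy Python simulation. The "pairing" below is built directly from exponents (modeling {\mathbb{G}} as the additive group {\mathbb{Z}_{q}} with generator {g=1}), so discrete logs in it are trivial and it offers no security whatsoever; it only illustrates how bilinearity lets all three parties arrive at {e(g,g)^{abc}}. Real schemes would use the Weil or Tate pairings on elliptic curves:

```python
import secrets

# Toy symmetric pairing, for illustration ONLY. We model G as the additive
# group Z_q (so "g^a" is just a mod q, with g = 1), and G_T as the order-q
# subgroup of Z_p^* where p = 2q + 1 and both p and q are prime.
q = 1019          # prime order of G and G_T
p = 2 * q + 1     # 2039, also prime, so Z_p^* has a subgroup of order q
gT = 4            # 4 = 2^2 lies in the order-q subgroup of squares mod p

def pairing(u, v):
    # e(u, v) = gT^(u*v); bilinear by construction.
    return pow(gT, (u * v) % q, p)

# Alice, Bob, and Carol pick secrets and broadcast their "g^x" values.
a, b, c = (secrets.randbelow(q - 1) + 1 for _ in range(3))
ga, gb, gc = a % q, b % q, c % q  # in this toy model, g = 1, so g^x = x

# Each party pairs the two values it received and raises the result to
# its own secret: e.g., Alice computes e(g^b, g^c)^a = e(g,g)^(abc).
key_alice = pow(pairing(gb, gc), a, p)
key_bob = pow(pairing(ga, gc), b, p)
key_carol = pow(pairing(ga, gb), c, p)

assert key_alice == key_bob == key_carol  # shared key e(g,g)^(abc)
```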

The natural question now is what security assumption is needed to ensure that an eavesdropper cannot also learn the shared key. First, does the DDH assumption suffice in this case? It turns out that the answer is no. In fact, it is easy to see that the existence of a bilinear map on {\mathbb{G}} means that DDH cannot hold in {\mathbb{G}}. To see this, suppose there exists a pairing {e:\mathbb{G}\times\mathbb{G}\rightarrow\mathbb{G}_{T}}. In the DDH challenge, we are given the group {\mathbb{G}} with a generator {g} and three additional elements {g^{a},g^{b},T\in\mathbb{G}}, where either {T=g^{ab}} or {T} is random in {\mathbb{G}}. The goal is to decide whether {T=g^{ab}}. To do so, we apply the bilinear map. First, we compute {e(g^{a},g^{b})=e(g,g)^{ab}}. Next, we compute {e(g,T)}. Now, observe what happens when {T=g^{ab}}. Then, by bilinearity, {e(g,T)=e(g,g^{ab})=e(g,g)^{ab}=e(g^{a},g^{b})}. Conversely, if {T} were random, then {e(g,T)} is a random element of {\mathbb{G}_{T}} that coincides with {e(g,g)^{ab}} only with negligible probability. Thus, using the pairing, we can easily win the DDH challenge, and so the standard two-party DDH assumption is not a valid security assumption when dealing with groups that admit bilinear maps.
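The same toy pairing model as before (repeated here so the snippet is self-contained, and, as before, completely insecure) also illustrates this DDH distinguisher:

```python
import secrets

# Toy symmetric pairing: G is Z_q additive (g = 1, "g^x" = x mod q), and
# G_T is the order-q subgroup of Z_p^* with p = 2q + 1; both p, q prime.
q = 1019
p = 2 * q + 1   # 2039
gT = 4

def pairing(u, v):
    return pow(gT, (u * v) % q, p)

def ddh_distinguisher(ga, gb, T):
    # Decide whether T = g^(ab) by checking e(g^a, g^b) == e(g, T).
    return pairing(ga, gb) == pairing(1, T)  # g = 1 in this toy model

a = secrets.randbelow(q - 1) + 1
b = secrets.randbelow(q - 1) + 1
real = (a * b) % q          # the element "g^(ab)"
fake = secrets.randbelow(q)  # a random group element

# The real triple is always recognized...
assert ddh_distinguisher(a % q, b % q, real)
# ...and a random element fools the test only if it happens to equal g^(ab).
assert ddh_distinguisher(a % q, b % q, fake) == (fake == real)
```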

Instead, we require a slight generalization of the DDH assumption in our setting. In particular, we would like to assume that given {g,g^{a},g^{b},g^{c}}, it is difficult to distinguish {e(g,g)^{abc}} from a random element of {\mathbb{G}_{T}}. More formally, given cyclic groups {\mathbb{G}} and {\mathbb{G}_{T}} of order {q}, a generator {g\in\mathbb{G}}, and a pairing {e:\mathbb{G}\times\mathbb{G}\rightarrow\mathbb{G}_{T}}, the following two distributions are computationally indistinguishable: {(g^{a},g^{b},g^{c},e(g,g)^{abc})} and {(g^{a},g^{b},g^{c},e(g,g)^{r})}, where {a,b,c,r} are drawn uniformly at random from {\mathbb{Z}_{q}}. We call this the (decisional) Bilinear Diffie-Hellman (BDH) assumption. Observe that if BDH holds, then the three-party Diffie-Hellman protocol described above is secure in the sense that a passive eavesdropper cannot derive the shared key.

Garbled Circuits and Secure Function Evaluation

An important and recurring problem in cryptography is the problem of secure function evaluation (SFE). For simplicity, we just consider the 2-party setting. In the 2-party setting, the two agents, Alice and Bob, each have a private input, {x} and {y}, respectively, and they want to compute some function {f(x,y)} of their inputs without revealing the inputs themselves. The SFE problem was first introduced by Andrew Yao [1] in 1982, where he introduced what has come to be known as the millionaire problem.

Millionaire Problem. Alice and Bob are millionaires and are interested in knowing who is richer. However, they wish to do so without revealing their actual wealth.

If we view the above problem through the SFE lens, then {x} and {y} would coincide with Alice’s and Bob’s wealth, respectively. The function {f(x,y)} they want to compute would be the comparator function:

\displaystyle f(x,y)=\mathbb{I}\left\{ x>y\right\} =\begin{cases} 1 & x>y\\ 0 & \mathrm{otherwise}. \end{cases}

Note that {\mathbb{I}\{\mathrm{predicate}\}} denotes an indicator function that evaluates to 1 if the predicate is satisfied and 0 otherwise.

More generally, we can take {f(x,y)=(f_{a}(x,y),f_{b}(x,y))}, and at the conclusion of the protocol, Alice learns {f_{a}(x,y)} and Bob learns {f_{b}(x,y)}. Alice and Bob should not learn anything about the other’s input except what is explicitly leaked by the result. Observe that the millionaire problem described above is just a special case where

{f_{a}(x,y)=\mathbb{I}\{x>y\}=f_{b}(x,y)}.

Moreover, observe that we can take {f_{a}(x,y)=\perp}, where {\perp} denotes the null function. In other words, following the SFE protocol, Alice does not learn anything about Bob’s input, whereas Bob learns the value of {f_{b}(x,y)}.

One elegant construction of an SFE protocol is through garbled circuits. In this model, we will use a Boolean circuit to model the function {f}. A Boolean circuit consists of a series of logic gates: AND, OR, NOT, and so on. In our setup, a logic gate takes several input bits (0s and 1s) and produces a single output bit. We can characterize the operation of a logic gate by writing down a truth table, that is, the gate’s output bit for each assignment to its inputs. As an example, consider an AND gate with two inputs. The output of an AND gate is 1 if both of its inputs are 1, and 0 otherwise. Thus, the truth table for an AND gate looks like the following:

Input 1 Input 2 Output
0 0 0
0 1 0
1 0 0
1 1 1

Similarly, we can write down truth tables for OR gates, NOT gates, and so on. Then, given a function {f(x,y)}, we can construct a Boolean circuit that evaluates {f}. A Boolean circuit consists of a series of logic gates with wires connecting the inputs and outputs of the different gates. From complexity theory, we know that if there exists a polynomial-time algorithm for computing {f}, then there exists a polynomial-sized circuit that evaluates {f}. Thus, the circuit model is quite flexible and allows us to efficiently model many functions of interest.

Thus, given a function {f}, we can construct a circuit that evaluates {f}. The final ingredient we need to obtain SFE is a way of evaluating the circuit without revealing any information about the inputs. The idea here is to take a circuit for {f}, and construct a garbled circuit {\mathcal{C}} that can be used to securely evaluate {f}. In particular, we seek a circuit that may be evaluated without revealing any information about the input or intermediate bits. To do so, we transform the circuit as follows:

  1. Observe that each wire in the circuit will carry a single bit. Then, for each wire {w}, we associate two random values {k_{w}^{0}} and {k_{w}^{1}} with the wire, depending on the bit carried by the wire. In other words, if the wire is carrying the bit 0, we assign it the value {k_{w}^{0}}, and if the wire is carrying the bit 1, we assign it the value {k_{w}^{1}}. Note that because {k_{w}^{0}} and {k_{w}^{1}} are chosen uniformly at random, seeing {k_{w}^{b}} alone yields no information on whether {b=0} or {b=1}.
  2. Having scrambled the values of each wire, we still need a way to evaluate each gate. Recall that the behavior of a logic gate is entirely specified by its truth table. Thus, we prescribe a method for constructing a garbled truth table for each gate in our circuit. Consider a logic gate with input wires {w_{1},\ldots,w_{n}} and output wire {z}. Let {b_{1},\ldots,b_{n}} be the bits carried by the wires {w_{1},\ldots,w_{n}}, respectively. Then, let {h(b_{1},\ldots,b_{n})} denote the operation of the logic gate, that is, {z=h(b_{1},\ldots,b_{n})}. For notational convenience, we will write {k_{i}} to denote {k_{w_{i}}^{b_{i}}}, that is, the value of the {i^{\mathrm{th}}} wire if it is carrying bit {b_{i}}. Then, let {h'} denote the operation of our garbled gate. One candidate definition for {h'} is to simply yield {k_{z}^{h(b_{1},\ldots,b_{n})}}, that is, the value of the output wire carrying the corresponding output bit:

    \displaystyle h'\left(k_{1},\ldots k_{n}\right)=k_{z}^{h(b_{1},\ldots,b_{n})}.

    Thus, we can simply construct a table mapping each assignment {\left(k_{1},\ldots,k_{n}\right)} to the corresponding value {k_{z}^{h(b_{1},\ldots,b_{n})}}. However, in the SFE setting, the circuit evaluator should only learn the value of {k_{z}^{h(b_{1},\ldots,b_{n})}} that corresponds to his particular set of inputs and nothing else about the circuit. Given a fixed set of input bits, the evaluator should only be able to extract one set of intermediate and output values from the circuit. If both possible values of a given wire were known, the evaluator could evaluate the circuit using different intermediate states, and in doing so, potentially learn additional information about the hidden inputs to the circuit. We want the evaluator to obtain exactly one of {k_{z}^{0}} or {k_{z}^{1}}, depending on the value of {h(b_{1},\ldots,b_{n})}. To do this, we treat {k_{1},\ldots,k_{n}} as encryption keys and encrypt the value of {k_{z}^{h(b_{1},\ldots,b_{n})}} using {k_{1},\ldots,k_{n}}:

    \displaystyle h'\left(k_{1},\ldots,k_{n}\right)=\mathsf{Enc}_{k_{1}}\left(\mathsf{Enc}_{k_{2}}\left(\cdots\left(\mathsf{Enc}_{k_{n}}\left(k_{z}^{h(b_{1},\ldots,b_{n})}\right)\right)\right)\right).

    In the above, {\mathsf{Enc}_{k}(m)} refers to the encryption of a message {m} using key {k}. In words, we successively encrypt the value of {k_{z}^{h(b_{1},\ldots,b_{n})}} with keys {k_{1},\ldots,k_{n}}. Observe that in this case, if we are given the input values {k_{1},\ldots,k_{n}}, then we can look up {h'(k_{1},\ldots,k_{n})} in the truth table to obtain an encryption of {k_{z}^{h(b_{1},\ldots,b_{n})}} with respect to keys {k_{1},\ldots,k_{n}}. Since we have the keys {k_{1},\ldots,k_{n}}, we can now decrypt this ciphertext to obtain {k_{z}^{h(b_{1},\ldots,b_{n})}}. Moreover, because we only have the keys {k_{1},\ldots,k_{n}}, we can only decrypt the ciphertext encrypting {k_{z}^{h(b_{1},\ldots,b_{n})}}, and in particular, we are not able to learn any information about the other value of the output wire {z}.

  3. For the output wires {w_{\mathsf{output}}} of the circuit, we set {k_{w_{\mathsf{output}}}^{b}=b}. Specifically, the value of an output wire carrying a given bit is just the value of the bit itself. Thus, when the evaluator decrypts the output wires on the circuit, he obtains the bits corresponding to the output of the circuit.

The above description is a bit abstract, so let’s consider a concrete example and construct the garbled equivalent of the AND gate from before. First, we assign values for each of the input and output wires. Let {k_{1}^{b}} and {k_{2}^{b}} for {b\in\{0,1\}} be the values corresponding to the two input wires. Let {k_{3}^{b}} be the value corresponding to the output wire. Then, using our construction above, the truth table for the garbled AND gate will be the following:

Input 1 Input 2 Output
{k_{1}^{0}} {k_{2}^{0}} {\mathsf{Enc}_{k_{1}^{0}}\left(\mathsf{Enc}_{k_{2}^{0}}\left(k_{3}^{0}\right)\right)}
{k_{1}^{0}} {k_{2}^{1}} {\mathsf{Enc}_{k_{1}^{0}}\left(\mathsf{Enc}_{k_{2}^{1}}\left(k_{3}^{0}\right)\right)}
{k_{1}^{1}} {k_{2}^{0}} {\mathsf{Enc}_{k_{1}^{1}}\left(\mathsf{Enc}_{k_{2}^{0}}\left(k_{3}^{0}\right)\right)}
{k_{1}^{1}} {k_{2}^{1}} {\mathsf{Enc}_{k_{1}^{1}}\left(\mathsf{Enc}_{k_{2}^{1}}\left(k_{3}^{1}\right)\right)}

Observe that given one value per input wire, the evaluator is able to recover just a single output value. In practice, we randomly permute the entries in the table, so the table does not leak any information about the computation performed by a given gate. This concludes the construction of a garbled circuit. What remains is to show how we can use these garbled circuits to perform SFE. Recall our previous setup: Alice and Bob each have an input, {x} and {y}, respectively, and would like to securely evaluate some function {f(x,y)=\left(f_{a}(x,y),f_{b}(x,y)\right)} without revealing their private inputs. We arbitrarily designate Alice as the sender and Bob as the receiver. The protocol then proceeds as follows:

  1. Alice generates some secret key {s} and constructs a garbled circuit {\mathcal{C}} for the function {\tilde{f}(x,y)=\left(\mathsf{Enc}_{s}(f_{a}(x,y)),f_{b}(x,y)\right)}. Alice sends the garbled circuit {\mathcal{C}} along with her garbled inputs to Bob.
  2. Next, in order to evaluate the circuit, Bob needs to obtain the values corresponding to his input. However, since Alice constructed the circuit, only she knows which values correspond to which bits. Thus, Bob must obtain the keys corresponding to his input from Alice, but without revealing to Alice the value of his input. We can accomplish this using a 1-out-of-2 oblivious transfer (OT) protocol. In 1-out-of-2 OT, Alice holds two keys {k_{0}} and {k_{1}}, and Bob holds a bit {b\in\{0,1\}}. At the end of the OT protocol, Bob learns the value of {k_{b}} and Alice learns nothing. I will defer the discussion of 1-out-of-2 oblivious transfer protocols to a future post. For now, it is sufficient to assume that such a protocol exists. Then, for each bit of his input, Bob engages in a 1-out-of-2 OT protocol with Alice to learn the value corresponding to his particular bit. In this way, for each of his input wires {w}, Bob learns the value {k_{w}^{b}} where {b} is the value of that particular input bit.
  3. Since Bob now has all of the garbled inputs to {\mathcal{C}}, he evaluates {\mathcal{C}} and obtains the result {\left(\mathsf{Enc}_{s}(f_{a}(x,y)),f_{b}(x,y)\right)}. Thus, Bob learns the value of {f_{b}(x,y)}. He sends {\mathsf{Enc}_{s}(f_{a}(x,y))} to Alice.
  4. Alice decrypts {\mathsf{Enc}_{s}(f_{a}(x,y))} to obtain the result {f_{a}(x,y)}.
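To make the construction concrete, here is a toy Python sketch that garbles and evaluates the single AND gate from above, omitting the oblivious-transfer step. The hash-based cipher and the zero-byte redundancy tag (which lets the evaluator recognize the one decryptable row of the shuffled table) are simplifications I am assuming for illustration; real garbling schemes use proper authenticated encryption:

```python
import hashlib
import secrets

def enc(key, msg):
    """Toy cipher: XOR with a SHAKE-256-derived pad under a fresh nonce.
    Illustration only; NOT a secure encryption scheme."""
    nonce = secrets.token_bytes(16)
    pad = hashlib.shake_256(key + nonce).digest(len(msg))
    return nonce + bytes(x ^ y for x, y in zip(msg, pad))

def dec(key, ct):
    nonce, body = ct[:16], ct[16:]
    pad = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(x ^ y for x, y in zip(body, pad))

# Wire labels: two random 16-byte values per wire, plus a redundancy tag
# appended to plaintexts so the evaluator can recognize a correct decryption.
TAG = b"\x00" * 8
k1 = {b: secrets.token_bytes(16) for b in (0, 1)}  # input wire 1
k2 = {b: secrets.token_bytes(16) for b in (0, 1)}  # input wire 2
k3 = {b: secrets.token_bytes(16) for b in (0, 1)}  # output wire

# Garbled AND gate: doubly encrypt the output label under the two input
# labels, then shuffle the rows so their order leaks nothing.
table = [enc(k1[b1], enc(k2[b2], k3[b1 & b2] + TAG))
         for b1 in (0, 1) for b2 in (0, 1)]
secrets.SystemRandom().shuffle(table)

def evaluate(label1, label2):
    """Given one label per input wire, recover the single decryptable row."""
    for row in table:
        candidate = dec(label2, dec(label1, row))
        if candidate.endswith(TAG):
            return candidate[:-len(TAG)]
    raise ValueError("no row decrypted")

# Evaluating with the labels for bits (1, 1) yields the label for output
# bit 1; any input containing a 0 yields the label for output bit 0.
assert evaluate(k1[1], k2[1]) == k3[1]
assert evaluate(k1[0], k2[1]) == k3[0]
```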

At the end of the above protocol, observe that Alice learns the value of {f_{a}(x,y)} and Bob learns the value of {f_{b}(x,y)}. Alice does not have access to the internal state of the garbled circuit (she does not evaluate {\mathcal{C}}), so she does not learn any information about Bob’s input. From Bob’s perspective, the circuit is garbled, so Bob does not learn any information regarding Alice’s input. Finally, we note that while Bob learns {\mathsf{Enc}_{s}(f_{a}(x,y))}, this does not leak any information about {f_{a}(x,y)} by semantic security of the underlying encryption scheme.

Note that the above protocol is only secure against honest-but-curious adversaries. In particular, we assume that Alice and Bob follow the protocol as described but may inspect the messages that are sent between them in order to learn something about the other’s input or response. We do not protect against malicious adversaries that may send malformed or malicious messages. Naturally, there are constructions that are secure even against malicious adversaries, but that’s a topic for a later time.

[1] Andrew Yao. Protocols for Secure Computations. FOCS 1982.

Zero-Knowledge Proofs (Informal)

A common conversation amongst kids might go something like this (for lack of better names, I’ll call them Alice and Bob):

Alice: Hey Bob, I have a secret!
Bob: Ooh secret! I like secrets. Tell me! Tell me!
Alice: No! If I told you, it wouldn’t be a secret anymore!
Bob: Ha, I bet you don’t have a secret!
Alice: I do!
Bob: Well then, prove it!

After a few more exchanges, Alice relents and tells Bob the secret. Now, Bob goes and tells Charlie, and pretty soon, everyone learns what Alice’s secret is. Well, that was rather unfortunate for Alice.

So, the natural question is whether we can do better. This brings us to the concept of a zero-knowledge proof. Informally, zero-knowledge proofs allow Alice to convince Bob that her claim is true without revealing any information that would allow Bob to then convince someone else that her claim is true. In particular, we want to avoid the above scenario where Bob learns Alice’s secret and can then broadcast it to the world. At the same time, we want Bob to be absolutely convinced that Alice is not lying when she says she has some secret.

We consider a more concrete example of this. Alice and Bob discover a cave deep in the mountains. The cave is very small and there is only a single path leading into the cave. At some point, the path branches off into two separate paths, one to the left, and one to the right. Both paths seemingly lead to a dead end. But this is no ordinary cave, for there is actually a portal at each of the dead ends. If one speaks the magic incantation, the portals will activate, allowing one to cross from one end of the cave to the other.

One day, Alice somehow discovers the words that open the portal. Excited, she runs over to tell Bob of her discovery. But from her past experience, she knows that Bob cannot keep a secret, so she doesn’t want to directly tell Bob the magic words. Consider what would happen if she did. Then, Bob can very easily go to the cave, speak the magic words, and check that either the portal opens and Alice was telling the truth, or nothing happens, and Alice just lied to him. If Alice was indeed telling the truth, Bob can now tell all of his friends the magic words and they too can be convinced that the words are correct simply by trying to open the portal in the cave. Thus, simply telling Bob the secret is not a zero-knowledge proof.

Being a very clever person, Alice thinks of a way to let Bob know that she has discovered the secret of the cave without explicitly telling him the magic words. When Bob asks Alice to tell him the secret, Alice instead proposes that they play the following game:

  1. Alice and Bob travel to the cave. Alice tells Bob to wait at the entrance. She enters the cave and walks down one of the paths.
  2. Alice calls out to Bob to tell him to enter the cave. Bob walks to the fork in the path. At the fork in the path, he announces “Left” or “Right.”
  3. Upon hearing Bob’s selection, Alice exits the cave from the path that Bob announced.

Let’s analyze this scheme in some more detail. First, suppose that Alice does indeed know the magic words. Then, when Bob makes his announcement, either Alice comes out on the same path she took when entering the cave, or she walks through the portal and emerges from the other path. Regardless of which direction Bob chooses, Alice can exit the cave on the path that Bob announced. Now consider what happens if Alice does not in fact know the magic words, that is, she lied to Bob. In this case, when Bob announces the direction, she has only a 50% chance of coming out on the designated path. Suppose Bob chooses “left.” If Alice went down the “left” path, she can exit the cave on the path Bob selected, but if she went down the “right” path instead, then she has no choice but to exit from the same path, namely the “right” path.

Is Bob convinced at the end of all this? Well, no. If Alice did in fact emerge from the direction he selected, he might just think that Alice got lucky. And if Alice fails to come out on the designated path, she can just claim to Bob that she wasn’t able to hear him announce the path. But what happens if Alice and Bob continue to play this game? Surely, Alice cannot keep getting lucky, and she cannot keep claiming that she is unable to hear Bob properly whenever she comes out on the wrong path. In fact, if after k rounds of the game Bob observes that Alice always succeeds at exiting through the designated path, he will be more and more convinced that Alice is telling the truth. In particular, if Alice has a \frac{1}{2} chance of getting “lucky” on any given round, the chance that she gets lucky k rounds in a row is \frac{1}{2^k}. For sufficiently large k, this probability becomes very small, and Bob will be convinced that Alice has indeed discovered the secret of the cave. If Alice has not discovered the secret, then after k rounds, she will have succeeded only roughly \frac{k}{2} times; if Bob observes that Alice consistently fails to exit on the designated path, he will remain unconvinced that Alice is telling the truth. But observe the crucial fact here: if Alice is telling the truth, she will always be able to convince Bob.
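The arithmetic behind “sufficiently large k” is easy to make concrete with a few lines of Python:

```python
# Soundness of the cave game: a cheating Alice wins each round with
# probability 1/2, so she survives all k rounds with probability 2^-k.
def rounds_needed(target):
    """Smallest k for which 2^-k falls below the target probability."""
    k = 0
    while 2.0 ** -k >= target:
        k += 1
    return k

# 20 rounds push the cheating probability below one in a million,
# and 30 rounds push it below one in a billion.
assert rounds_needed(1e-6) == 20
assert rounds_needed(1e-9) == 30
```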

The next question then is whether all this is really “zero-knowledge.” We know that in playing this game, Bob does not learn what the magic words are. But can he still convince his friend Charlie (who has not played the game), that Alice has discovered the secret? Bob tries the following: when he and Alice play the game, he takes a video recorder and records everything he sees while he plays the game. In particular, he records the following sequences:

  1. Alice walking into the cave.
  2. Bob walking into the cave and arriving at the fork.
  3. Bob announcing his choice of direction.
  4. Alice exiting the cave.

Suppose Alice and Bob play this game 10 times and Alice successfully exits the cave from the designated direction each time. Bob captures all this on video and shows it to Charlie. Is Charlie convinced that Alice does indeed know the secret? Not surprisingly, the answer is no! In fact, after watching the video, Charlie tells Bob: “So what? I bet you and Alice prearranged the entire sequence ahead of time. Not fooling me!” Indeed, Bob’s video is not convincing – who knows, Bob could have edited the video before showing it, or he may have colluded with Alice to make it seem like Alice knew the secret. Thus, given only what he has seen, Bob is unable to convince anyone else of Alice’s claim. This brings us to the second important aspect of zero-knowledge proofs: using the information he has gained (i.e., the video of his interactions with Alice), Bob is unable to convince anyone else that Alice’s claim is true. He has not gained any knowledge or information from Alice about her secret, and yet, he is unequivocally convinced that Alice possesses the secret!

Knapsack Cryptosystems (Part 2)

In this post, I continue the discussion from the previous post on knapsack cryptosystems. As discussed before, knapsack cryptosystems base their hardness on the difficulty of the subset-sum problem. However, as we have seen, in the case where the set of integers is superincreasing, the subset-sum problem can be solved efficiently. We can exploit this fact to construct our cryptosystem. First, we describe how we can encode messages using the subset-sum problem.

Let X=(x_1,\ldots,x_n) be a set of numbers. Suppose we want to encrypt an n-bit message m with bits b_1,\ldots,b_n. To encrypt m, we compute

c:=\sum_{i=1}^n b_i x_i.

Observe that recovering the bits b_1,\ldots,b_n of the message given c and X amounts to solving a subset-sum problem, which is intractable for large n in general. Note that for this scheme to work, the subset summing to a particular c must be unique; this property is satisfied if X is superincreasing.

With this preparation, the construction proceeds as follows:

  1. Begin with some superincreasing sequence X.
  2. Hide the superincreasing sequence using a linear modular transformation to produce a new sequence X' that is not superincreasing.
  3. Publish X' as the public key to the scheme. Using the secret key (which specifies the transformation in step 2), we can transform the subset-sum problem relative to X' to a subset-sum problem relative to the set X that shares the same solution. Since X is superincreasing, this is an easy problem to solve and we recover the original message.

Now, we describe the details of the scheme.

Key Generation. Let X=(x_1,\ldots,x_n) be a superincreasing sequence of numbers. Choose values A and B such that B>2x_n and \gcd(A,B)=1. Compute x'_i=A x_i\pmod{B} for all i. The public key is the sequence X'=(x'_1,\ldots,x'_n) and the secret key is the tuple (A,B,X).

Encryption. To encrypt a binary-valued message b \in \{0,1\}^n, compute the ciphertext c':=b^TX'.

Decryption. To decrypt a ciphertext c', compute c:=A^{-1}c'\pmod{B}. Solve the subset-sum problem using the superincreasing set X with target value c to obtain the message b\in\{0,1\}^n.
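The three procedures above can be sketched end-to-end as follows (a toy sketch with illustrative, insecure parameters; the function names are mine, and `pow(a, -1, b)` computes the modular inverse in Python 3.8+):

```python
from math import gcd

def keygen(x, a, b):
    """x: superincreasing sequence; b > 2*x[-1]; gcd(a, b) == 1.
    Returns the public sequence X' and the secret key (A, B, X)."""
    assert b > 2 * x[-1] and gcd(a, b) == 1
    x_pub = [(a * xi) % b for xi in x]
    return x_pub, (a, b, x)

def encrypt(bits, x_pub):
    # c' = b^T X': a subset sum over the public (non-superincreasing) set.
    return sum(bi * xi for bi, xi in zip(bits, x_pub))

def decrypt(c_pub, secret):
    a, b, x = secret
    # c = A^{-1} c' mod B equals b^T X over the integers since b^T X < B.
    c = (pow(a, -1, b) * c_pub) % b
    # Greedy solver over the superincreasing secret sequence.
    bits = [0] * len(x)
    for i in range(len(x) - 1, -1, -1):
        if x[i] <= c:
            bits[i] = 1
            c -= x[i]
    return bits if c == 0 else None
```

For example, with the superincreasing sequence (3, 11, 24, 50, 115), A = 41, and B = 257, a five-bit message encrypts to a single integer and decrypts back to the same bits.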

We show that the above scheme is correct. This is easy to see; simply observe the following:

c=A^{-1}c'=A^{-1}\sum_{i=1}^n b_i x'_i=\sum_{i=1}^n b_i A^{-1}(A x_i)=\sum_{i=1}^n b_i x_i=b^TX\pmod{B}.

The solution to the subset-sum problem over set X with target c is precisely the solution to the original subset-sum problem with set X' and target c'. Finally, observe that since X is superincreasing,

b^TX=\sum_{i=1}^n b_i x_i \le \sum_{i=1}^n x_i < 2\cdot x_n < B,

by construction of B. Thus, the statement c = b^T X holds over the integers, and not just modulo B. This proves the correctness of the above scheme. As a final note, A^{-1} \pmod{B} exists since we required \gcd(A,B)=1. Security of the scheme is based on the fact that the subset-sum problem is NP-complete.

The advantage of knapsack cryptosystems is their efficiency. For instance, in the scheme described here, encryption is effectively free, while decryption requires a single modular multiplication followed by an efficient linear-time algorithm for solving the superincreasing subset-sum problem. Even in variants of the knapsack cryptosystem, decryption still requires only a few modular multiplications. In comparison to RSA or Diffie-Hellman, which require many expensive modular exponentiations, the Merkle-Hellman knapsack cryptosystem is much more computationally efficient. Unfortunately, however, there are efficient lattice-based attacks on knapsack cryptosystems. As a result, knapsack cryptosystems require the use of very large parameters in order to achieve security. Consequently, secure knapsack systems are quite impractical and not used in practice. Nevertheless, their construction is still remarkable in both its simplicity and elegance.

Knapsack Cryptosystems (Part 1)

When people think of public key cryptosystems today, the classic algorithms of RSA and Diffie-Hellman jump to mind. However, an oft-forgotten public key cryptosystem developed around the same time as RSA and Diffie-Hellman was the knapsack-based cryptosystem by Ralph Merkle and Martin Hellman. While knapsack cryptosystems have been broken by lattice-based techniques (to be discussed in a future post), their construction is still quite elegant and noteworthy. In this two-part sequence, I will outline the construction of the knapsack cryptosystem.

Knapsack cryptosystems represent one of the first examples of using an NP-complete problem to develop a secure public key scheme. In this case, we use the subset-sum problem. In the subset-sum problem, we are given a list of positive integers X=(x_1,x_2,\ldots,x_n) and a target value y. The problem is to find a subset X'=(x'_1,x'_2,\ldots,x'_k)\subseteq X such that \sum_{i=1}^k x'_i=y. As a small example, let X=(3,5,6,10). If y=9, then there exists a subset of X whose sum is y=9, namely the subset X'=(3,6). If instead y=12, then there is no subset of X that sums to y. Note that in this case, we are not allowed to use the 6 twice. (As a side note, the subset-sum problem can be viewed as a special case of the 0-1 knapsack problem where the weight and value of each item are equal. This is where the cryptosystem derives its name.)

While the subset-sum problem is NP-complete in general, for certain sets X, the problem can be solved efficiently. One particular case is when the set of numbers X is superincreasing.

Definition. Let X=(x_1,x_2,\ldots,x_n) be a sequence of positive numbers where x_1 \le x_2 \le \cdots \le x_n. Then, we say that X is superincreasing if for all 1\le i\le n-1, we have 2\cdot x_i< x_{i+1}.

For instance, the set (1,3,7,20,50,120) is superincreasing (each number is more than twice the previous number).
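The condition in the definition is easy to check mechanically (a small sketch; the function name `is_superincreasing` is mine):

```python
def is_superincreasing(x):
    """Return True if each element of x is more than twice its
    predecessor, per the definition of a superincreasing sequence."""
    return all(2 * a < b for a, b in zip(x, x[1:]))
```

For instance, `is_superincreasing([1, 3, 7, 20, 50, 120])` returns `True`, while `is_superincreasing([3, 5, 6, 10])` returns `False` (since 2·3 > 5).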

A useful property of superincreasing sequences is that each element exceeds the sum of all preceding elements. We state this precisely in the following lemma.

Lemma. Let X=(x_1,x_2,\ldots,x_n) be a superincreasing sequence of positive numbers where x_1 \le x_2 \le \cdots \le x_n. Then, for all 1\le k<n, \sum_{i=1}^k x_i< x_{k+1}.

Proof. This is easily seen via induction on k. For k=1, we have x_1 < 2\cdot x_1 < x_2, using the fact that x_1 is positive and the sequence is superincreasing. Now, for the inductive step, we have

\sum_{i=1}^{k+1}x_i=\sum_{i=1}^k x_i+x_{k+1}<x_{k+1}+x_{k+1}=2\cdot x_{k+1}<x_{k+2}

using the inductive hypothesis and the fact that the sequence of x_i is superincreasing.

It is not difficult to see that if X is a superincreasing set, the subset-sum problem becomes easy. Specifically, given a superincreasing set X and a target value y, we proceed as follows:

  1. Without loss of generality, let X=(x_1,x_2,\ldots,x_n) where x_1 < x_2 < \cdots < x_n. Initialize X'=().
  2. For each of x_n, x_{n-1},\ldots, x_1 in turn: if the current element does not exceed y, add it to the set X' and subtract it from y. If at any point y=0, then we have found our subset X'.

At the conclusion of the algorithm, either y has been reduced to 0 and X' is the desired subset, or y>0, in which case we conclude that no such subset exists. Observe that the above algorithm runs in time linear in the cardinality of X, so it is quite efficient.
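The two steps above can be sketched directly in code (a sketch; this version returns the subset itself rather than a bit vector, and `None` when no subset sums to y):

```python
def solve_superincreasing(x, y):
    """Greedy subset-sum solver for a superincreasing sequence x,
    given in increasing order. Scans from the largest element down,
    taking each element that does not exceed the remaining target."""
    subset = []
    for xi in reversed(x):
        if xi <= y:
            subset.append(xi)
            y -= xi
    # y == 0 iff the collected elements sum to the original target.
    return sorted(subset) if y == 0 else None
```

For example, `solve_superincreasing([1, 3, 7, 20, 50, 120], 78)` returns `[1, 7, 20, 50]`, while a target of 5 yields `None`.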

In words, at each step, the algorithm selects the largest element in the set X that does not exceed the target y and adds it to the subset. The value of the target y is then updated. To see that this works, suppose that we have x_1<\cdots<x_k<y<x_{k+1}<\cdots < x_n. We claim that x_k must be in any subset that sums to y. First, since x_{k+1},\ldots,x_n are greater than y, they cannot be in the subset. If x_k is not in the subset, then the maximum sum of the subset is given by

\sum_{i=1}^{k-1} x_i<x_k<y

using the above lemma. Hence, any subset that does not contain x_k cannot sum to y. Thus, we see that if a subset summing to y exists, then it must contain the maximum element in the set that does not exceed y. To summarize, we have demonstrated that the subset-sum problem is easy over superincreasing sets.

In the next post, I will show how we can exploit this property to construct a public key cryptosystem based on the hardness of the general subset-sum problem. The idea here is that we can choose the set X such that we can solve the subset-sum problem over X by converting it to an equivalent subset-sum problem over a superincreasing set X' using some secret information (namely, the secret key).

Pseudorandom Musings

Recently, I’ve started spending some time reading and learning more about cryptography and mathematics. As a way of condensing and organizing my thoughts, I’ve decided to perform an experiment with this whole “blogging” idea. Up until now, my notes have been scrawled on pieces of scratch paper, or occasionally, written down in a notebook; hopefully, this will be a better way for me to organize everything. Perhaps you might find the content useful too!

Also, as the name of the blog suggests, the content I post here will be a smattering of whatever happens to be in my stream of consciousness on any given day. Not likely to be completely random, but probably not structured either!

I’ll leave you with a quote from mathematician G.H. Hardy:

“A mathematician, like a painter or poet, is a maker of patterns. If his patterns are more permanent than theirs, it is because they are made with ideas.” ~ G.H. Hardy