7 The spectral theorem and Hadamard’s determinant problem
We start with some preliminary facts. Let \(O(n) \subseteq \mathbb {R}^{n \times n}\) be the set of real orthogonal matrices of order \(n\), that is, of the matrices \(Q\) with \(Q^T Q = I_n\). Since
\[ (PQ)^T (PQ) \;=\; Q^T P^T P Q \;=\; Q^T Q \;=\; I_n \]
for \(P, Q \in O(n)\), and since \(Q^T Q = I_n\) gives \(Q^{-1} = Q^T \in O(n)\), we see that the set \(O(n)\) is a group. Regarding any matrix in \(\mathbb {R}^{n \times n}\) as a vector in \(\mathbb {R}^{n^2}\), we find that \(O(n)\) is a compact set. Indeed, as the columns of an orthogonal matrix \(Q = (q_{ij})\) are unit vectors, we have \(|q_{ij}| \le 1\) for all \(i\) and \(j\), thus \(O(n)\) is bounded. Furthermore, the set \(O(n)\) is defined as a subset of \(\mathbb {R}^{n^2}\) by the equations
\[ \sum_{k=1}^{n} q_{ki}\, q_{kj} \;=\; \delta_{ij} \qquad (1 \le i \le j \le n), \]
hence it is closed, and thus compact.
For any real square matrix \(A\) let \(\operatorname{Od}(A) = \sum _{i \ne j} a_{ij}^2\) be the sum of the squares of the off-diagonal entries.
If \(A\) is a real symmetric \(n \times n\) matrix that is not diagonal, that is, \(\operatorname{Od}(A) > 0\), then there exists \(U \in O(n)\) such that \(\operatorname{Od}(U^T A U) < \operatorname{Od}(A)\).
We use a very clever method attributed to Carl Gustav Jacob Jacobi. Suppose that \(a_{rs} \ne 0\) for some \(r \ne s\). Then we claim that the matrix \(U\) that agrees with the identity matrix except that \(u_{rr} = u_{ss} = \cos \vartheta \), \(u_{rs} = \sin \vartheta \), \(u_{sr} = -\sin \vartheta \) does the job, for some choice of the (real) angle \(\vartheta \):
\[ U \;=\; \begin{pmatrix} 1 & & & & \\ & \cos\vartheta & & \sin\vartheta & \\ & & \ddots & & \\ & -\sin\vartheta & & \cos\vartheta & \\ & & & & 1 \end{pmatrix}, \]
where the rotation entries appear in rows and columns \(r\) and \(s\).
Clearly, \(U\) is orthogonal for any \(\vartheta \).
Now let us compute the \((k, \ell )\)-entry \(b_{k\ell }\) of \(U^T A U\). We have
\[ b_{k\ell} \;=\; \sum_{m,\,p} u_{mk}\, a_{mp}\, u_{p\ell}. \]
For \(k, \ell \notin \{ r, s\} \) we get \(b_{k\ell } = a_{k\ell }\). Furthermore, for \(\ell \ne r, s\) we have
\[ b_{r\ell} \;=\; \cos\vartheta\, a_{r\ell} - \sin\vartheta\, a_{s\ell}. \]
Similarly, one computes
\[ b_{s\ell} \;=\; \sin\vartheta\, a_{r\ell} + \cos\vartheta\, a_{s\ell} \qquad (\ell \ne r, s). \]
It follows that
\[ b_{r\ell}^2 + b_{s\ell}^2 \;=\; a_{r\ell}^2 + a_{s\ell}^2 \qquad (\ell \ne r, s), \]
and by symmetry
\[ b_{kr}^2 + b_{ks}^2 \;=\; a_{kr}^2 + a_{ks}^2 \qquad (k \ne r, s). \]
We conclude that the function \(\operatorname{Od}\), which sums the squares of the off-diagonal values, agrees for \(A\) and \(U^T A U\) except for the entries at \((r, s)\) and \((s, r)\), for any \(\vartheta \). To conclude the proof we now show that \(\vartheta _0\) can be chosen suitably so as to make \(b_{rs} = 0\), which will result in
\[ \operatorname{Od}(U^T A U) \;=\; \operatorname{Od}(A) - 2a_{rs}^2 \;<\; \operatorname{Od}(A), \]
as required.
Using the formula for \(b_{k\ell}\) above we find
\[ b_{rs} \;=\; (\cos^2\vartheta - \sin^2\vartheta)\, a_{rs} + \sin\vartheta \cos\vartheta\, (a_{rr} - a_{ss}). \]
For \(\vartheta = 0\) this becomes \(a_{rs}\), while for \(\vartheta = \pi /2\) it is \(-a_{rs}\). Hence by the intermediate value theorem there is some \(\vartheta _0\) between \(0\) and \(\pi /2\) such that \(b_{rs} = 0\), and we are through.
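To see the rotation step in action, here is a minimal numerical sketch (in Python with NumPy; the helper names `od` and `jacobi_step` are ours, not from any source). It zeroes one off-diagonal entry and confirms that \(\operatorname{Od}\) drops by exactly \(2a_{rs}^2\):

```python
import numpy as np

def od(A):
    """Sum of the squares of the off-diagonal entries."""
    return np.sum(A**2) - np.sum(np.diag(A)**2)

def jacobi_step(A, r, s):
    """One Jacobi rotation zeroing the (r, s)-entry of a symmetric matrix A."""
    n = A.shape[0]
    # b_rs = (cos^2 - sin^2) a_rs + sin*cos (a_rr - a_ss) = 0
    # is solved by tan(2*theta) = 2 a_rs / (a_ss - a_rr).
    theta = 0.5 * np.arctan2(2 * A[r, s], A[s, s] - A[r, r])
    U = np.eye(n)
    U[r, r] = U[s, s] = np.cos(theta)
    U[r, s], U[s, r] = np.sin(theta), -np.sin(theta)
    return U.T @ A @ U

A = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 0.3],
              [0.5, 0.3, 1.0]])
print(od(A), od(jacobi_step(A, 0, 1)))  # 2.68 and 0.68: a drop of 2*a_rs^2 = 2
```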
For every real symmetric matrix \(A\) there is a real orthogonal matrix \(Q\) such that \(Q^{T}AQ\) is diagonal.
The theorem follows in three quick steps. Let \(A\) be a real symmetric \(n \times n\) matrix.
Consider the map \(f_A : O(n) \to \mathbb {R}^{n \times n}\) with \(f_A(P) := P^T A P\). The map \(f_A\) is continuous on the compact set \(O(n)\), and so the image \(f_A(O(n))\) is compact.
The function \(\operatorname{Od}: f_A(O(n)) \to \mathbb {R}\) is continuous, hence it assumes a minimum, say at \(D = Q^T A Q \in f_A(O(n))\).
The value \(\operatorname{Od}(D)\) must be zero, and hence \(D\) is a diagonal matrix as required.
Indeed, if \(\operatorname{Od}(D) > 0\), then applying the Lemma we find \(U \in O(n)\) with \(\operatorname{Od}(U^T D U) < \operatorname{Od}(D)\). But
\[ U^T D U \;=\; U^T Q^T A Q\, U \;=\; (QU)^T A\, (QU) \]
is in \(f_A(O(n))\) (remember \(O(n)\) is a group!) with \(\operatorname{Od}\)-value smaller than that of \(D\), a contradiction, and end of proof.
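The compactness argument is elegantly non-constructive, but iterating the rotation of the Lemma on the currently largest off-diagonal entry is precisely the strategy of the classical Jacobi eigenvalue algorithm. Here is a sketch continuing the Python fragment above (it reuses our `od` and `jacobi_step`):

```python
def jacobi_diagonalize(A, tol=1e-12, max_iter=1000):
    """Repeat the Lemma's rotation until Od(A) is negligible."""
    A = A.copy()
    for _ in range(max_iter):
        if od(A) < tol:
            break
        # pick the off-diagonal entry of largest absolute value
        M = np.abs(A - np.diag(np.diag(A)))
        r, s = np.unravel_index(np.argmax(M), M.shape)
        A = jacobi_step(A, r, s)
    return A  # approximately diagonal; its diagonal holds the eigenvalues

D = jacobi_diagonalize(A)
print(np.sort(np.diag(D)))    # matches...
print(np.linalg.eigvalsh(A))  # ...NumPy's symmetric eigensolver
```

Since the largest off-diagonal entry carries at least the average off-diagonal mass, each step shrinks \(\operatorname{Od}\) by a fixed factor, which is why the loop terminates quickly in practice.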
For any real \(n \times n\) matrix \(A = (a_{ij})\) with \(|a_{ij}| \le 1\),
\[ |\det A| \;\le\; n^{n/2}. \]
The problem of finding the maximum value of \(\det A\) over all real \(n \times n\) matrices \(A = (a_{ij})\) with \(|a_{ij}| \le 1\) is unsolved. Since the determinant is a continuous function of the \(a_{ij}\) (considered as variables) and the matrices form a compact set in \(\mathbb {R}^{n^2}\), this maximum must exist. Furthermore, the maximum is attained for some matrix all of whose entries are \(+1\) or \(-1\), because the function \(\det A\) is linear in each single entry \(a_{ij}\) (if we keep all other entries fixed). Thus we can start with any matrix \(A\) and move one entry after the other to \(+1\) or to \(-1\), in every single step not decreasing the determinant, until we arrive at a \(\pm 1\)-matrix. In the search for the largest determinant we may thus assume that all entries of \(A\) are \(\pm 1\).
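The rounding argument of the last paragraph is effectively an algorithm. Here is a sketch (the function names are ours): for each entry we simply try both endpoints \(+1\) and \(-1\) and keep whichever does not decrease the determinant, which is legitimate because \(\det A\) is affine-linear in that entry.

```python
import numpy as np

def det_with(A, i, j, v):
    """Determinant of A with the (i, j)-entry replaced by v."""
    B = A.copy()
    B[i, j] = v
    return np.linalg.det(B)

def push_to_pm1(A):
    """Round every entry to +1 or -1 without ever decreasing det(A)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            # det is affine-linear in A[i, j], so one endpoint is optimal
            A[i, j] = max((1.0, -1.0), key=lambda v: det_with(A, i, j, v))
    return A

rng = np.random.default_rng(1)
A = rng.uniform(-1, 1, size=(4, 4))
B = push_to_pm1(A)
assert np.linalg.det(B) >= np.linalg.det(A) - 1e-9
print(B, np.linalg.det(B))
```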
Here is the trick: Instead of \(A\) we consider the matrix \(B = A^T A = (b_{ij})\). That is, if \(c_j = (a_{1j}, a_{2j}, \dots , a_{nj})^T\) denotes the \(j\)-th column vector of \(A\), then \(b_{ij} = \langle c_i, c_j \rangle \), the inner product of \(c_i\) and \(c_j\). In particular,
\[ b_{ii} \;=\; \langle c_i, c_i \rangle \;=\; \sum_{k=1}^{n} a_{ki}^2 \;=\; n \qquad \text{for all } i, \]
and
\[ \operatorname{trace} B \;=\; n \cdot n \;=\; n^2, \]
which will come in handy in a moment.
Now we can go to work. First of all, from \(B = A^T A\) we get \(|\det A| = \sqrt{\det B}\). Since multiplication of a column of \(A\) by \(-1\) turns \(\det A\) into \(-\det A\), we see that the maximum problem for \(\det A\) is the same as for \(\det B\). Furthermore, we may assume that \(A\) is nonsingular, and hence that \(B\) is nonsingular as well.
Since \(B = A^T A\) is a symmetric matrix the spectral theorem tells us that for some \(Q \in O(n)\),
\[ Q^T B Q \;=\; \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}, \]
where the \(\lambda _i\) are the eigenvalues of \(B\). Now, if \(d_j\) denotes the \(j\)-th column vector of \(AQ\) (which is nonzero since \(A\) is nonsingular), then
\[ \lambda_j \;=\; (Q^T B Q)_{jj} \;=\; \big((AQ)^T (AQ)\big)_{jj} \;=\; \langle d_j, d_j \rangle \;>\; 0. \]
Thus \(\lambda _1, \dots , \lambda _n\) are positive real numbers and
\[ \lambda_1 \cdots \lambda_n \;=\; \det B, \qquad \lambda_1 + \cdots + \lambda_n \;=\; \operatorname{trace} B \;=\; n^2. \]
Whenever such a product and sum of positive numbers turn up, it is always a good idea to try the arithmetic-geometric mean inequality. In our case, together with the product and trace formulas above, this gives
\[ \det B \;=\; \prod_{i=1}^{n} \lambda_i \;\le\; \Big( \frac{1}{n} \sum_{i=1}^{n} \lambda_i \Big)^{\! n} \;=\; \Big( \frac{n^2}{n} \Big)^{\! n} \;=\; n^n, \]
and out comes Hadamard's upper bound
\[ |\det A| \;=\; \sqrt{\det B} \;\le\; n^{n/2}. \]
When do we have equality in Hadamard's bound or, what is the same, in the arithmetic-geometric mean inequality? Easy enough: if and only if the geometric mean of the \(\lambda _i\)'s equals the arithmetic mean, or equivalently, if and only if \(\lambda _1 = \dots = \lambda _n = \lambda \). But then \(\operatorname{trace}B = n\lambda = n^2\), and so \(\lambda _1 = \dots = \lambda _n = n\). Looking at the diagonalization above this means \(Q^T B Q = n I_n\), where \(I_n\) is the \(n \times n\) identity matrix. Now recall \(Q^T = Q^{-1}\), multiply by \(Q\) on the left, by \(Q^{-1}\) on the right, to obtain
\[ B \;=\; Q\,(n I_n)\,Q^{-1} \;=\; n I_n. \]
Going back to \(A\) this means that
\[ A^T A \;=\; n I_n. \]
Matrices \(A\) with \(\pm 1\)-entries that achieve equality in Hadamard's bound are aptly called Hadamard matrices. So an \(n \times n\) matrix \(A\) with \(\pm 1\)-entries is a Hadamard matrix if and only if
\[ A^T A \;=\; n I_n. \]
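As a sanity check, this characterization is easy to test numerically; a tiny sketch with the standard \(2 \times 2\) example:

```python
import numpy as np

H2 = np.array([[1,  1],
               [1, -1]])
print(H2.T @ H2)                         # 2 * I_2, so H2 is Hadamard
print(abs(np.linalg.det(H2)), 2**(2/2))  # both 2.0: the bound is attained
```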
If a Hadamard matrix of size \(n \times n\) exists for \(n > 2\), then \(n\) must be a multiple of \(4\).
A short argument shows that if \(n\) is greater than \(2\), then it must be a multiple of \(4\). Indeed, suppose that \(A\) is an \(n \times n\) Hadamard matrix, \(n \ge 2\), whose rows are the vectors \(r_1, \dots , r_n\). Clearly, multiplication of any row or column by \(-1\) gives another Hadamard matrix. So we may assume that the first row consists of \(1\)’s only. Since \(\langle r_1, r_i \rangle = 0\) for \(i \ne 1\), every other row must contain \(n/2\) \(1\)’s and \(n/2\) \(-1\)’s; in particular, \(n\) must be even. Assume now that \(n > 2\) and consider rows \(r_2\) and \(r_3\), and denote by \(a, b, c, d\) the numbers of columns that have \(\begin{pmatrix} +1 \\ +1 \end{pmatrix}\), \(\begin{pmatrix} +1 \\ -1 \end{pmatrix}\), \(\begin{pmatrix} -1 \\ +1 \end{pmatrix}\), and \(\begin{pmatrix} -1 \\ -1 \end{pmatrix}\) in rows \(2\) and \(3\), respectively. Then from \(\langle r_1, r_2 \rangle = 0\) and \(\langle r_1, r_3 \rangle = 0\) we get
\[ a + b \;=\; c + d \qquad \text{and} \qquad a + c \;=\; b + d, \]
which gives \(b = c, a = d\). But from \(\langle r_2, r_3 \rangle = 0\) we also have \(a+d = b+c\), resulting in \(2a = 2b\). We conclude that \(a = b = c = d = n/4\). Thus the order of the Hadamard matrix is either \(n = 1\) or \(n = 2\), or \(n = a+b+c+d = 4a\), a multiple of \(4\).
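One can watch this counting argument happen on a concrete example; here is a small sketch using the standard \(4 \times 4\) Sylvester-type Hadamard matrix (our choice of example):

```python
import numpy as np

# A 4x4 Hadamard matrix with an all-ones first row.
H4 = np.array([[ 1,  1,  1,  1],
               [ 1, -1,  1, -1],
               [ 1,  1, -1, -1],
               [ 1, -1, -1,  1]])
r2, r3 = H4[1], H4[2]
# count the four column patterns (+1,+1), (+1,-1), (-1,+1), (-1,-1)
a = int(np.sum((r2 == 1) & (r3 == 1)))
b = int(np.sum((r2 == 1) & (r3 == -1)))
c = int(np.sum((r2 == -1) & (r3 == 1)))
d = int(np.sum((r2 == -1) & (r3 == -1)))
print(a, b, c, d)  # 1 1 1 1, i.e. a = b = c = d = n/4
```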
Hadamard matrices exist for all \(n = 2^m\).
Consider an \(m\)-set \(X\) and index the \(2^m\) subsets \(C \subseteq X\) in any way \(C_1, \dots , C_{2^m}\). The matrix \(A = (a_{ij})\) is defined as
\[ a_{ij} \;=\; (-1)^{|C_i \cap C_j|}. \]
We want to verify \(\langle r_i, r_j \rangle = 0\) for \(i \ne j\). From the definition,
\[ \langle r_i, r_j \rangle \;=\; \sum_{k=1}^{2^m} (-1)^{|C_i \cap C_k|} (-1)^{|C_j \cap C_k|} \;=\; \sum_{C \subseteq X} (-1)^{|C_i \cap C| + |C_j \cap C|}. \]
Now, as \(C_i \ne C_j\) there exists an element \(a \in X\) with \(a \in C_i \setminus C_j\) or \(a \in C_j \setminus C_i\); suppose \(a \in C_i \setminus C_j\). Half the subsets of \(X\) contain \(a\), and half do not. Letting \(C\) run through all subsets that contain \(a\), the pairs \(\{ C, C \setminus \{ a\} \} \) comprise all subsets of \(X\). But for each such pair \(\{ C, C \setminus \{ a\} \} \), the exponents \(|C_i \cap C| + |C_j \cap C|\) and \(|C_i \cap (C \setminus \{ a\} )| + |C_j \cap (C \setminus \{ a\} )|\) have different parity (removing \(a\) changes the first summand by \(1\) and leaves the second unchanged), and so the corresponding two terms cancel to \(0\). But then the whole sum is \(0\), as required.
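The construction can be carried out verbatim on a computer; here is a sketch in which subsets of the \(m\)-set are encoded as bitmasks \(0, \dots , 2^m - 1\), so that \(|C_i \cap C_j|\) is the popcount of `i & j` (the function name is ours):

```python
import numpy as np

def hadamard_subsets(m):
    """The 2^m x 2^m matrix with a_ij = (-1)^{|C_i ∩ C_j|}."""
    n = 2**m
    return np.array([[(-1)**bin(i & j).count("1") for j in range(n)]
                     for i in range(n)])

A = hadamard_subsets(3)
n = A.shape[0]
assert (A.T @ A == n * np.eye(n, dtype=int)).all()  # rows orthogonal
print(abs(round(np.linalg.det(A))), n**(n // 2))    # both 4096 = 8^4
```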
There exists an \(n \times n\) matrix with entries \(\pm 1\) whose determinant is at least \(\sqrt{n!}\).
Let us look at all \(2^{n^2}\) matrices with \(\pm 1\)-entries and consider some averages of the determinant. The arithmetic mean \(\frac{1}{2^{n^2}} \sum _A \det A\) is \(0\) (clear?), so this is no big help. But if we consider the mean square average instead,
\[ D_n^2 \;:=\; \frac{1}{2^{n^2}} \sum_A (\det A)^2, \]
then things brighten up. Clearly,
\[ \max_A \,(\det A)^2 \;\ge\; D_n^2, \]
so this will give us a lower bound for the maximum.
The following stunningly simple calculation of \(D_n^2\) probably appeared first in an article by George Szekeres and Paul Turán. We learnt it from a beautiful paper of Herb Wilf who heard it from Mark Kac. In the words of Mark Kac: “Just write \((\det A)^2\) out twice, interchange summation, and everything simplifies.” So we want to do just that.
From the definition of the determinant we get
\[ D_n^2 \;=\; \frac{1}{2^{n^2}} \sum_A \Big( \sum_{\sigma} \operatorname{sign}\sigma \prod_{i=1}^{n} a_{i\sigma(i)} \Big) \Big( \sum_{\tau} \operatorname{sign}\tau \prod_{i=1}^{n} a_{i\tau(i)} \Big), \]
where \(\sigma \) and \(\tau \) run independently through all permutations of \(\{ 1, \dots , n\} \). Interchange of summation yields
\[ D_n^2 \;=\; \frac{1}{2^{n^2}} \sum_{\sigma, \tau} \operatorname{sign}\sigma \, \operatorname{sign}\tau \, \sum_A \prod_{i=1}^{n} a_{i\sigma(i)}\, a_{i\tau(i)}. \]
This doesn’t look too promising, but wait. Look at a fixed pair \((\sigma , \tau )\). The inner sum \(\sum _A\) is really a summation over \(n^2\) variables, one for each \(a_{ij}\):
\[ \sum_A \prod_{i=1}^{n} a_{i\sigma(i)}\, a_{i\tau(i)} \;=\; \sum_{a_{11} = \pm 1} \cdots \sum_{a_{nn} = \pm 1} \, \prod_{i=1}^{n} a_{i\sigma(i)}\, a_{i\tau(i)}. \]
Suppose \(\sigma (i) = k \ne \tau (i)\). Then every summand contains \(a_{ik}\) exactly once, and therefore the whole sum has the factor \(\sum _{a_{ik} = \pm 1} a_{ik} = 0\), and hence is \(0\) as well. The only way that the sum fails to be \(0\) is when \(\sigma = \tau \), and everything simplifies indeed: For \(\sigma = \tau \), the product \(\prod_i a_{i\sigma(i)}^2\) is \(1\), as is the term \((\operatorname{sign}\sigma )^2\). The double sum above is therefore
\[ \sum_{\sigma} \, \sum_A 1 \;=\; n! \, 2^{n^2}, \]
and wrapping things up we obtain
\[ D_n^2 \;=\; \frac{1}{2^{n^2}} \, n! \, 2^{n^2} \;=\; n!, \]
and thus the result.
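For small \(n\) the identity \(D_n^2 = n!\) can be confirmed by brute force over all \(2^{n^2}\) sign matrices; a closing sketch:

```python
import numpy as np
from itertools import product
from math import factorial

def mean_square_det(n):
    """Average of (det A)^2 over all 2^(n^2) matrices with entries ±1."""
    total = sum(np.linalg.det(np.array(signs).reshape(n, n)) ** 2
                for signs in product((1, -1), repeat=n * n))
    return total / 2**(n * n)

for n in (2, 3):
    print(n, round(mean_square_det(n)), factorial(n))  # 2 = 2! and 6 = 3!
```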