4 Vector Spaces
“Matrices act. They don’t just sit there.”
– Gilbert Strang
Thus far, we have developed efficient techniques to solve linear systems of equations without a thorough discussion of their solvability. We only said: if a unique solution exists, methods like Gaussian elimination and \mathbf{LU} factorization go through seamlessly. When pivoting falls apart, however, our techniques have not been mature enough to
- detect whether the system is inconsistent (no solution possible)
- return at least one solution when the system has many solutions.
In this chapter, we take a brief diversion to build the theory of vector spaces and their subspaces, which enables a more concrete discussion of linear systems and a better understanding of the nature of their solutions. Our ultimate goal in this chapter is to improve classical Gaussian elimination to make it more intelligent in the face of missing pivots.
4.1 Vectors and Subspaces
Our story begins with vectors and the spaces they live in!
A vector, written as a column, is a collection of real numbers. For example, \mathbf{x}=\begin{bmatrix}1\\-2\\0\end{bmatrix} is a vector with 3 components. We say that the vector \mathbf{x} belongs to \mathbb{R}^3, written as \mathbf{x}\in\mathbb{R}^3. So, \mathbb{R}^3 is the space where all vectors with 3 components live.
More generally, an n-component vector \mathbf{x}=\begin{bmatrix}x_1\\\vdots\\ x_n\end{bmatrix} is an element of the vector space \mathbb{R}^n.
You are probably wondering why we call \mathbf{x} a vector when it is basically a matrix of size n\times1. Well, vectors are, in fact, special matrices. However, for our theory, it is more convenient to designate them as vectors and to view an n\times m matrix as an (ordered) collection of m columns or vectors. In other words, an n\times m matrix merely holds m vectors from the vector space \mathbb{R}^n as its columns.
In any vector space \mathbb{R}^n, the following are true:
- we can add any two vectors to get another vector in the same vector space (component-wise addition), and
- we can multiply or scale a vector by a scalar c to get another vector in the same vector space (component-wise multiplication).
In other words, linear combinations of vectors from a space belong to the same vector space.
The spaces \mathbb{R}^n are some very special vector spaces. The concept of a vector space is much more general. Any set can be called a vector space if addition of elements and multiplication by a scalar are well-defined operations and the set is closed under these operations. In other words, linear combinations of any elements from the set still remain in the set. Examples include the set of all matrices of a given size, the set of real-valued functions, etc.
We will only discuss vector spaces that are found inside \mathbb{R}^n. We will call them vector subspaces.
Definition 4.1 (Subspace) A non-empty subset S of a vector space V is called a (vector) subspace if it satisfies the requirements of a vector space:
- \mathbf{x}, \mathbf{y}\in S implies \mathbf{x} + \mathbf{y}\in S, and
- \mathbf{x}\in S and c\in\mathbb{R} imply c\mathbf{x}\in S.
Exercise 4.1 Is the single element subset \{\mathbf{0}\} containing the zero vector of \mathbb{R}^2 a subspace of \mathbb{R}^2?
Exercise 4.2 Let us take the set of all vectors of \mathbb{R}^3 that have a zero third component. Is this set a subspace of \mathbb{R}^3?
Exercise 4.3 Let us take the set of all vectors of \mathbb{R}^4 that have a positive third component. Is this set a subspace of \mathbb{R}^4?
Exercise 4.4 Justify why a subspace must contain the zero vector.
4.2 Linear Combinations
A linear combination of vectors \mathbf{x_1}, \mathbf{x_2}, \ldots, \mathbf{x_n}\in\mathbb{R}^m is c_1\mathbf{x_1}+c_2\mathbf{x_2}+\ldots+c_n\mathbf{x_n} for some scalars c_1, c_2, \ldots, c_n\in\mathbb{R}.
Example 4.1 (Linear Combination) Let us consider the vectors \begin{bmatrix}1\\ 5\\ 2\end{bmatrix},\begin{bmatrix}0\\ 4\\ 4\end{bmatrix}\in\mathbb{R}^3. For scalars c_1, c_2\in\mathbb{R}, the linear combination c_1\begin{bmatrix}1\\ 5\\ 2\end{bmatrix}+c_2\begin{bmatrix}0\\ 4\\ 4\end{bmatrix} can also be written as the following matrix multiplication: \begin{bmatrix}1&0\\ 5&4\\ 2&4\end{bmatrix}\begin{bmatrix}c_1\\c_2\end{bmatrix}.
From the previous example, we note that the multiplication of an m\times n matrix \mathbf{A} by a vector \mathbf{x} produces the linear combination of the n columns of \mathbf{A} with the scalars sitting in the vector \mathbf{x}. That is, \mathbf{A}\mathbf{x}=\begin{bmatrix}\begin{matrix}\Big|\\\mathbf{a}_1\\\Big|\end{matrix} & \begin{matrix}\Big|\\\mathbf{a}_2\\\Big|\end{matrix} &\cdots& \begin{matrix}\Big|\\\mathbf{a}_n\\\Big|\end{matrix}\end{bmatrix} \begin{bmatrix}x_1\\x_2\\\vdots\\ x_n\end{bmatrix}=x_1\mathbf{a}_1+x_2\mathbf{a}_2+\cdots+x_n\mathbf{a}_n is the combination of the columns \mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n from the vector space \mathbb{R}^m.
Therefore, solving a linear system \mathbf{A}\mathbf{x}=\mathbf{b} of m equations in n unknowns amounts to finding scalars or weights \mathbf{x} so that the linear combination of the columns of \mathbf{A} is \mathbf{b}.
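To see this in code, here is a minimal NumPy sketch: the matrix is the one from Example 4.1, and the weights are an arbitrary choice. It verifies that \mathbf{A}\mathbf{x} matches the explicit combination of columns.

```python
import numpy as np

# Columns from Example 4.1, assembled into a 3x2 matrix A.
A = np.array([[1, 0],
              [5, 4],
              [2, 4]])
x = np.array([3, -1])  # weights c1 = 3, c2 = -1 (arbitrary choice)

# A @ x computes the same vector as the explicit column combination.
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(A @ x)   # [ 3 11  2]
print(combo)   # [ 3 11  2]
print(np.array_equal(A @ x, combo))  # True
```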
4.2.1 Linear Span
We now define a special subspace called linear span.
Definition 4.2 (Linear Span) The set of all possible linear combinations of vectors \mathbf{x_1}, \mathbf{x_2}, \ldots, \mathbf{x_n}\in\mathbb{R}^m, across all scalars c_1, c_2, \ldots, c_n\in\mathbb{R}, is a subspace of \mathbb{R}^m. This subspace is called the (linear) span of the vectors \mathbf{x_1}, \mathbf{x_2}, \ldots, \mathbf{x_n}, written as \mathcal{S}(\mathbf{x_1}, \mathbf{x_2}, \ldots, \mathbf{x_n}).
Exercise 4.5 Find a set of vectors that span \mathbb R^2.
Exercise 4.6 Describe (pictorially) the span of the vectors \begin{bmatrix}1\\0\\1\end{bmatrix} and \begin{bmatrix}1\\1\\1\end{bmatrix} in \mathbb{R}^3.
Exercise 4.7 Would the span from the previous exercise change if we added another vector \begin{bmatrix}0\\1\\0\end{bmatrix} into the pile?
4.2.2 Column Space and Nullspace
Definition 4.3 (Column Space) The span of the columns of an m\times n matrix \mathbf{A} is called the column space of \mathbf A. The column space is denoted by \mathcal{C}(\mathbf A) and is a subspace of \mathbb R^m.
Definition 4.4 (Nullspace) The nullspace of an m\times n matrix \mathbf{A} consists of all vectors \mathbf{x}\in\mathbb{R}^n such that \mathbf{A}\mathbf{x}=\mathbf{0}. The nullspace is denoted by \mathcal{N}(\mathbf A) and is a subspace of \mathbb{R}^n.
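Numerically, SciPy can produce an orthonormal basis for the nullspace. A small sketch, using a hypothetical rank-deficient matrix of our own choosing:

```python
import numpy as np
from scipy.linalg import null_space

# A rank-deficient 2x2 matrix: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

N = null_space(A)              # columns form an orthonormal basis of N(A)
print(N.shape)                 # (2, 1): the nullspace is a line in R^2
print(np.allclose(A @ N, 0))   # True: every basis vector maps to zero
```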
Exercise 4.8 Describe the column space and nullspace of the matrix \begin{bmatrix}1 & -1\\0 & 0\end{bmatrix}.
4.3 Linear Dependence
A set of vectors \mathbf{x_1}, \mathbf{x_2}, \ldots, \mathbf{x_n}\in\mathbb{R}^m is called linearly dependent if one can find scalars c_1,c_2,\ldots,c_n (not all zero) such that the linear combination c_1\mathbf{x_1}+c_2\mathbf{x_2}+\ldots+c_n\mathbf{x_n}=\mathbf{0}.
The vectors are called linearly independent if they are not linearly dependent.
A set of vectors is linearly dependent if and only if one of the vectors is a linear combination of the rest. Can you prove it?
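Anticipating the computational machinery developed later in this chapter, a quick numerical test is to stack the vectors as columns and compare the rank of the resulting matrix with the number of vectors. A sketch; the helper name and test vectors are our own:

```python
import numpy as np

def linearly_independent(*vectors):
    """True exactly when the given 1-D arrays are linearly independent."""
    A = np.column_stack(vectors)
    # Independent iff the rank equals the number of columns.
    return np.linalg.matrix_rank(A) == A.shape[1]

v1 = np.array([1, 2, 0])
v2 = np.array([0, 1, 1])
print(linearly_independent(v1, v2))           # True
print(linearly_independent(v1, v2, v1 + v2))  # False: the third is v1 + v2
```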
Exercise 4.9 If the columns of a matrix are linearly independent, what can you say about the nullspace of the matrix?
Exercise 4.10 Do you think the vectors \begin{bmatrix}1\\ 5\\ 2\end{bmatrix},\begin{bmatrix}0\\ 4\\ 4\end{bmatrix}\in\mathbb{R}^3 are linearly dependent?
Exercise 4.11 Do you think the vectors \begin{bmatrix}1\\ 5\\ 2\\-1\end{bmatrix},\begin{bmatrix}-2\\ 4\\ 4\\0\end{bmatrix},\begin{bmatrix}-3\\ 13\\ 10\\-1\end{bmatrix}\in\mathbb{R}^4 are linearly dependent?
Exercise 4.12 Are the columns of the matrix \begin{bmatrix}1&1\\ 2&3\\1&2\end{bmatrix} linearly independent?
We now see the need to develop a computational scheme to detect linear independence of vectors. The following section gives a satisfying answer.
4.4 Solutions to General Systems
In order to make our discussion more general, we consider a rectangular system: \mathbf{A}_{m\times n}\mathbf{x}_{n\times 1}=\mathbf{b}_{m\times 1} with m equations in n unknowns. Keep in mind, however, that a rectangular system is often more problematic than a square one.
In light of the discussion of the theory above, we can rephrase our solvability question in the following manner.
The rectangular system \mathbf{A}\mathbf{x}=\mathbf{b} is solvable (with at least one solution) if and only if the vector \mathbf{b} can be expressed as a linear combination of the columns of \mathbf{A}.
In other words, it is solvable if and only if the right-hand side \mathbf{b} belongs to the column space \mathcal{C}(\mathbf A).
In order to make the intuition more concrete, we look at the following example.
Example 4.2 Consider the following system of three equations in two unknowns: \begin{bmatrix} 1 & 0\\ 1 & 2\\ 0 & 4 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}. The column space of the coefficient matrix is a plane in the three-dimensional vector space \mathbb{R}^3. The system is solvable only when the right-hand side vector \begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix} sits exactly in that plane. Indeed, the set of vectors \mathbf{b} that make the system solvable is thin. A right-hand side that is just a tiny bit off the plane makes a solution unattainable.
Exercise 4.13 For the above system, find an attainable right-hand side \mathbf{b}, i.e., one for which a solution exists. Also, find an unattainable right-hand side \mathbf{b} for which no solution exists.
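This membership test can also be run numerically: appending \mathbf{b} to \mathbf{A} as an extra column leaves the rank unchanged exactly when \mathbf{b} already lies in the column space. A sketch, with a hypothetical matrix whose column space is the plane spanned by the first two coordinate axes of \mathbb{R}^3:

```python
import numpy as np

def solvable(A, b):
    """True iff b lies in the column space of A (augmented-rank test)."""
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

A = np.array([[1, 0],
              [0, 1],
              [0, 0]])                   # column space: the x1x2-plane in R^3
print(solvable(A, np.array([2, 3, 0])))  # True:  b sits in the plane
print(solvable(A, np.array([2, 3, 1])))  # False: b is off the plane
```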
When a rectangular system \mathbf{A}\mathbf{x}=\mathbf{b} admits a solution, the solution is unique if and only if the nullspace \mathcal{N}(\mathbf{A}) contains only the zero vector.
More precisely, if \mathbf{x}_p is a solution, then \mathbf{x}_p+\mathbf{x}_n is also a solution for any \mathbf{x}_n in the nullspace (prove it!). If \mathcal{N}(\mathbf{A}) is non-trivial, there will be infinitely many solutions.
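A quick numerical illustration of this structure, with a hypothetical rank-one matrix: once one particular solution \mathbf{x}_p is known, shifting it by any multiple of a nullspace vector leaves \mathbf{A}\mathbf{x} unchanged.

```python
import numpy as np

A = np.array([[1, 2],
              [2, 4]])       # rank 1: nullspace spanned by (2, -1)
b = np.array([3, 6])         # b = 3 * (first column), so the system is solvable

x_p = np.array([3, 0])       # a particular solution: A @ x_p == b
x_n = np.array([2, -1])      # a nullspace vector:    A @ x_n == 0

for t in (0.0, 1.0, -2.5):   # x_p + t * x_n solves the system for every t
    print(A @ (x_p + t * x_n))  # always [3. 6.]
```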
Exercise 4.14 Consider the square system: \begin{bmatrix} 1 & 1\\ -2 & -2 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} b_1\\ b_2 \end{bmatrix}. Under what condition does the system admit a solution? Assuming the condition is met for a right-hand side of your choice, find all the solutions.
4.5 Generalized Gaussian Elimination
We now modify Gaussian elimination to also handle the bad cases. In case of a missing pivot, we simply move on to the next column, signalling a warning to the user. Take the following rectangular coefficient matrix as an example: \mathbf{A}=\begin{bmatrix} 1 & 3 & 3 & 2\\ 2 & 6 & 9 & 7\\ -1 & -3 & 3 & 4\\ \end{bmatrix}. The first pivot is a_{11}=1; eliminating the entries below this pivot, we get: \mathbf{A}\mapsto\begin{bmatrix} \mathbf{1} & 3 & 3 & 2\\ 0 & 0 & 3 & 3\\ 0 & 0 & 6 & 6\\ \end{bmatrix}. Our implementation of Gaussian elimination (and \mathbf{LU} factorization) would give up at this point, since the pivot in the second column is missing. If \mathbf{A} were a square matrix, this would mean that the matrix was singular, i.e., \det(\mathbf{A})=0.
Nonetheless, we go on to the next column in search of the next pivot, with a note that the second column does not have a pivot! Our next pivot is a_{23}=3. As usual, we eliminate the entries below it to get: \mathbf{U}=\begin{bmatrix} \mathbf{1} & 3 & 3 & 2\\ 0 & 0 & \mathbf{3} & 3\\ 0 & 0 & 0 & 0\\ \end{bmatrix}. Similarly, there is no pivot for the fourth column either. With that note, our elimination process stops here, producing an upper triangular matrix \mathbf{U} as before. The only difference is that the pivots are not all on the principal diagonal. However, \mathbf{U} is still in echelon form.
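A minimal sketch of this column-skipping elimination (a bare-bones version of what Exercise 4.17 asks for; variable names are our own). It returns the echelon form together with the indices of the columns where pivots were found:

```python
import numpy as np

def echelon(A, tol=1e-12):
    """Reduce A to echelon form, skipping columns that offer no pivot.
    Returns the echelon matrix U and the list of pivot-column indices."""
    U = A.astype(float)
    m, n = U.shape
    pivot_cols, row = [], 0
    for col in range(n):
        # Look for a usable (nonzero) entry at or below the current row.
        nonzero = np.nonzero(np.abs(U[row:, col]) > tol)[0]
        if nonzero.size == 0:
            continue                  # no pivot in this column: move on
        p = row + nonzero[0]
        U[[row, p]] = U[[p, row]]     # swap the pivot row into place
        for r in range(row + 1, m):   # eliminate entries below the pivot
            U[r] -= (U[r, col] / U[row, col]) * U[row]
        pivot_cols.append(col)
        row += 1
        if row == m:                  # ran out of rows: we are done
            break
    return U, pivot_cols

A = np.array([[1, 3, 3, 2],
              [2, 6, 9, 7],
              [-1, -3, 3, 4]])
U, pivots = echelon(A)
print(U)        # the echelon matrix computed by hand above
print(pivots)   # [0, 2]: pivots sit in the first and third columns
```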
The columns of \mathbf{A} can be partitioned into two groups:
- pivot columns containing a pivot, and
- free columns without a pivot.
The column space \mathcal{C}(\mathbf{A}) is spanned by the pivot columns alone. Moreover, the pivot columns are linearly independent.
In the example above, the first and third columns are pivot columns; the second and fourth columns are free columns.
Definition 4.5 (Rank and Nullity of \mathbf{A}) The number of pivot columns of a matrix \mathbf{A} is called its rank. The number of free columns is called its nullity.
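For the example matrix of this section, the rank is 2 (the first and third columns carry pivots) and the nullity is 2 (two free columns). NumPy's rank routine confirms this, and the nullity follows by subtraction:

```python
import numpy as np

A = np.array([[1, 3, 3, 2],
              [2, 6, 9, 7],
              [-1, -3, 3, 4]])

rank = np.linalg.matrix_rank(A)   # number of pivot columns
nullity = A.shape[1] - rank       # number of free columns
print(rank, nullity)              # 2 2
```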
Exercise 4.15 Find the pivot and free columns of the matrix, and find its rank and nullity: \mathbf{A}=\begin{bmatrix} 1 & 0 & 2 & 1\\ 3 & -1 & 6 & 2\\ 5 & 2 & 10 & 7\\ \end{bmatrix}.
Exercise 4.16 Prove that a square matrix of size n \times n is invertible iff its rank is n.
One thing to note, even in this degenerate (and rectangular) case, is that we still have an \mathbf{A}=\mathbf{LU} factorization with \mathbf{L}=\begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ -1 & 2 & 1\\ \end{bmatrix}.
Any m\times n matrix \mathbf{A} can be factored into matrices \mathbf{P}_{m\times m}, \mathbf{L}_{m\times m}, \mathbf{U}_{m\times n} using Gaussian elimination so that \mathbf{PA=LU} and
- \mathbf{P} is a permutation matrix, which is just the identity if no row exchanges are required,
- \mathbf{L} is an invertible lower triangular matrix having 1s along its diagonal, and
- \mathbf{U} is an upper triangular matrix in echelon form.
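SciPy ships such a factorization; note that its convention is \mathbf{A}=\mathbf{P}\mathbf{L}\mathbf{U}, so the \mathbf{P} of the statement above is the transpose of SciPy's, and SciPy's partial pivoting may pick different pivots than our hand computation. A quick sanity check on the example matrix:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[1.0, 3, 3, 2],
              [2, 6, 9, 7],
              [-1, -3, 3, 4]])

P, L, U = lu(A)                     # SciPy convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))  # True: P^T plays the role of our P
print(U)                            # echelon form (pivots chosen by partial pivoting)
```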
4.5.1 Pseudocode
Exercise 4.17 Write code for generalized Gaussian elimination. Try to (optionally) incorporate partial pivoting.
Exercise 4.18 Write a solver using the generalized \mathbf{PA}=\mathbf{LU} factorization.