We are going to define a way to multiply certain matrices together. After that we will see several different ways to understand this definition, and we will see how the definition arises as a kind of function composition.
Definition 3.2.1.
Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix. Then the matrix product $AB$ is defined to be the $m \times p$ matrix whose $(i,j)$ entry is

(3.1)  $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$
Before we even start thinking about this definition we record one key point about it. There are two $n$s in the definition above: one is the number of columns of $A$ and the other is the number of rows of $B$. These really must be the same. We only define the matrix product $AB$ when the number of columns of $A$ equals the number of rows of $B$. The reason for this will become clear when we interpret matrix multiplication in terms of function composition later.
Example 3.2.1.
The $(1,2)$ entry of a matrix product $AB$ is obtained by putting $i = 1$ and $j = 2$ in the formula (3.1). If $A$ is $m \times n$ and $B$ is $n \times p$ then this is

$(AB)_{12} = a_{11} b_{12} + a_{12} b_{22} + \cdots + a_{1n} b_{n2}.$
You can see that we are multiplying each entry in the first row of $A$ by the corresponding entry in the second column of $B$ and adding up the results. In general, the $(i,j)$ entry of $AB$ is obtained by multiplying the entries of row $i$ of $A$ with the entries of column $j$ of $B$ and adding them up.
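To make the entry-by-entry rule concrete, formula (3.1) can be sketched in a few lines of Python. This is an illustrative aside, not part of the original notes; the function name and the example matrices are made up.

```python
def matmul(A, B):
    """Matrix product via formula (3.1): (AB)_ij = sum over k of A[i][k] * B[k][j].

    A is m x n, B is n x p, both stored as lists of rows.
    """
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

The assertion enforces the compatibility condition above: the product is only defined when the number of columns of the first matrix equals the number of rows of the second.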
Example 3.2.2.
Let’s look at an abstract example first. Let

$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.$

The number of columns of $A$ equals the number of rows of $B$, so the matrix product $AB$ is defined, and since (in the notation of the definition) $m = n = p = 2$, the size of $AB$ is $m \times p$, which is $2 \times 2$. From the formula, we get

$AB = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}.$
Example 3.2.3.
Making the previous example concrete: for specific $2 \times 2$ matrices $A$ and $B$, the matrix product $AB$ is defined and will be another $2 \times 2$ matrix, computed by the formula above.
Matrix multiplication is so important that it is helpful to have several different ways of looking at it. The formula above is useful when we want to prove general properties of matrix multiplication, but we can get further insight when we examine the definition carefully from different points of view.
3.2.1 Matrix multiplication happens columnwise
A very important special case of matrix multiplication is when we multiply an $m \times n$ matrix by an $n \times 1$ column vector. Let

$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$

Then we have

$A\mathbf{x} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \end{pmatrix}.$

Another way to write the result of this matrix multiplication is

$A\mathbf{x} = x_1 \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} + x_3 \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix},$

showing that the result is obtained by adding up scalar multiples of the columns of $A$. If we write $\mathbf{a}_i$ for the $i$th column of $A$ then the expression

$x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + x_3 \mathbf{a}_3,$

where we add up scalar multiples of the $\mathbf{a}_i$s, is called a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. Linear combinations are a fundamental idea and we will return to them again and again in the rest of MATH0005.
This result is true whenever we multiply an $m \times n$ matrix and an $n \times 1$ column vector, not just in the example above.
Proposition 3.2.1.
Let $A$ be an $m \times n$ matrix and $\mathbf{x}$ an $n \times 1$ column vector with entries $x_1, \ldots, x_n$. If $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are the columns of $A$ then

$A\mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n.$
Proof.
From the matrix multiplication formula (3.1) we get

$(A\mathbf{x})_i = \sum_{j=1}^{n} a_{ij} x_j = \sum_{j=1}^{n} x_j a_{ij}.$

The column vector whose entries are $a_{1j}$, $a_{2j}$, …, $a_{mj}$ is exactly the $j$th column $\mathbf{a}_j$ of $A$, so this completes the proof. ∎
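As a numerical sanity check of the proposition, we can compute $A\mathbf{x}$ both directly from formula (3.1) and as a linear combination of the columns of $A$, and confirm the two agree. This is an illustrative sketch; the matrix and vector are made up.

```python
A = [[1, 2, 3],
     [4, 5, 6]]           # a 2x3 matrix stored as a list of rows
x = [10, 20, 30]          # entries of a 3x1 column vector

# Ax computed directly from formula (3.1)
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]

# the same thing computed as x1*a1 + x2*a2 + x3*a3, a linear
# combination of the columns of A
cols = [[A[i][j] for i in range(2)] for j in range(3)]  # cols[j] = column j
combo = [sum(x[j] * cols[j][i] for j in range(3)) for i in range(2)]

assert Ax == combo
print(Ax)  # [140, 320]
```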
Definition 3.2.2.
For a fixed $n$, the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the vectors

$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$

The vector $\mathbf{e}_i$ with a 1 in position $i$ and zeroes elsewhere is called the $i$th standard basis vector.
For example, if $n = 3$ then there are three standard basis vectors:

$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$
The special case of the proposition above when we multiply a matrix by a standard basis vector is often useful, so we’ll record it here.
Corollary 3.2.2.
Let $A$ be an $m \times n$ matrix and $\mathbf{e}_i$ the $i$th standard basis vector of height $n$. Then $A\mathbf{e}_i$ is equal to the $i$th column of $A$.
Proof.
According to Proposition 3.2.1 we have $A\mathbf{e}_i = x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n$, where $x_j$ is the $j$th entry of $\mathbf{e}_i$ and $\mathbf{a}_j$ is the $j$th column of $A$. The entries of $\mathbf{e}_i$ are all zero except for the $i$th, which is 1, so $A\mathbf{e}_i = \mathbf{a}_i$, the $i$th column of $A$. ∎
Example 3.2.4.
Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. You should verify that $A\mathbf{e}_1$ equals the first column of $A$ and $A\mathbf{e}_2$ equals the second column of $A$.
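The same verification can be done numerically; here is a small illustrative sketch (the matrix and helper name are made up) checking that $A\mathbf{e}_i$ picks out column $i$.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list of entries)."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[1, 2],
     [3, 4]]              # an illustrative 2x2 matrix
e1, e2 = [1, 0], [0, 1]   # standard basis vectors for n = 2

assert mat_vec(A, e1) == [1, 3]   # the first column of A
assert mat_vec(A, e2) == [2, 4]   # the second column of A
```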
Proposition 3.2.1 is important because it lets us show that when we do any matrix multiplication $AB$, we can do the multiplication column-by-column.
Theorem 3.2.3.
Let $A$ be an $m \times n$ matrix and $B$ an $n \times p$ matrix with columns $\mathbf{b}_1, \ldots, \mathbf{b}_p$. Then

$AB = \begin{pmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_p \end{pmatrix}.$
The notation means that the first column of $AB$ is equal to what you get by multiplying $A$ into the first column of $B$, the second column of $AB$ is what you get by multiplying $A$ into the second column of $B$, and so on. That’s what it means to say that matrix multiplication works columnwise.
Proof.
From the matrix multiplication formula (3.1), the $j$th column of $AB$ has entries

(3.2)  $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad 1 \le i \le m.$

The entries $b_{kj}$ for $1 \le k \le n$ are exactly the entries in column $j$ of $B$, so (3.2) is the $i$th entry of $A\mathbf{b}_j$, as claimed. ∎
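The columnwise theorem can be checked numerically: in this illustrative sketch (matrices made up), each column of $AB$ is compared with $A$ times the corresponding column of $B$.

```python
def matmul(A, B):
    """Matrix product via formula (3.1), for matrices stored as lists of rows."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
AB = matmul(A, B)

for j in range(2):
    b_j = [[B[k][j]] for k in range(2)]   # column j of B as a 2x1 matrix
    Ab_j = matmul(A, b_j)                 # A times that single column
    # column j of AB equals A * (column j of B)
    assert [row[j] for row in AB] == [row[0] for row in Ab_j]
```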
Corollary 3.2.4.
Every column of $AB$ is a linear combination of the columns of $A$.
Proof.
Theorem 3.2.3 tells us that each column of $AB$ equals $A\mathbf{b}_j$ for one of the columns $\mathbf{b}_j$ of $B$, and Proposition 3.2.1 tells us that any such vector $A\mathbf{b}_j$ is a linear combination of the columns of $A$. ∎
Example 3.2.5.
Let’s look at how the Proposition and the Theorem in this section apply to Example 3.2.3, where $A$ and $B$ were $2 \times 2$ and the columns of $B$ were $\mathbf{b}_1$ and $\mathbf{b}_2$. You can check that computing $A\mathbf{b}_1$ and $A\mathbf{b}_2$ gives exactly the columns of $AB$ we computed before.
3.2.2 Matrix multiplication happens rowwise
There are analogous results when we multiply a $1 \times m$ row vector and an $m \times n$ matrix.
Proposition 3.2.5.
Let $\mathbf{x}$ be a $1 \times m$ row vector with entries $x_1, \ldots, x_m$ and let $A$ be an $m \times n$ matrix with rows $\mathbf{r}_1, \ldots, \mathbf{r}_m$. Then $\mathbf{x} A = x_1 \mathbf{r}_1 + x_2 \mathbf{r}_2 + \cdots + x_m \mathbf{r}_m$.
Proof.
From the matrix multiplication formula (3.1) we get

$(\mathbf{x} A)_j = \sum_{i=1}^{m} x_i a_{ij},$

which is the $j$th entry of $x_1 \mathbf{r}_1 + \cdots + x_m \mathbf{r}_m$. ∎

In particular, $\mathbf{x} A$ is a linear combination of the rows of $A$.
Theorem 3.2.6.
Let $A$ be an $m \times n$ matrix with rows $\mathbf{r}_1, \ldots, \mathbf{r}_m$ and let $B$ be an $n \times p$ matrix. Then

$AB = \begin{pmatrix} \mathbf{r}_1 B \\ \mathbf{r}_2 B \\ \vdots \\ \mathbf{r}_m B \end{pmatrix}.$

The notation is supposed to indicate that the first row of $AB$ is equal to $\mathbf{r}_1 B$, the second row is equal to $\mathbf{r}_2 B$, and so on.
Proof.
From the matrix multiplication formula (3.1), the $i$th row of $AB$ has entries

(3.6)  $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad 1 \le j \le p.$

Row $i$ of $A$ is $\mathbf{r}_i = (a_{i1}, \ldots, a_{in})$, so $\mathbf{r}_i B$ agrees with (3.6) by Proposition 3.2.5. ∎
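Just as for the columnwise theorem, the rowwise statement can be checked numerically; this is an illustrative sketch with made-up matrices.

```python
def matmul(A, B):
    """Matrix product via formula (3.1), for matrices stored as lists of rows."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
AB = matmul(A, B)

for i in range(2):
    r_i = [A[i]]                        # row i of A as a 1x2 matrix
    assert matmul(r_i, B)[0] == AB[i]   # r_i * B is row i of AB
```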
The theorem combined with the proposition before it shows that in general the rows of $AB$ are always linear combinations of the rows of $B$.
Example 3.2.6.
Returning to Example 3.2.3, where $A$ and $B$ were $2 \times 2$: write $\mathbf{r}_1$ and $\mathbf{r}_2$ for the rows of $A$, and $\mathbf{s}_1$ and $\mathbf{s}_2$ for the rows of $B$. You can check that $\mathbf{r}_1 B$ and $\mathbf{r}_2 B$ are the rows of the matrix product $AB$ we computed before.
Example 3.2.7.
When the result of a matrix multiplication is a $1 \times 1$ matrix we will usually just think of it as a number. This is like a dot product, if you’ve seen those before.
Example 3.2.8.
Let $\mathbf{x}$ be a $1 \times n$ row vector and $\mathbf{y}$ an $n \times 1$ column vector. The number of columns of $\mathbf{x}$ and the number of rows of $\mathbf{y}$ are equal, so we can compute $\mathbf{x}\mathbf{y}$: it is the $1 \times 1$ matrix whose single entry is $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.
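A row-times-column product of this kind can be sketched in one line of Python (the vectors here are made up for illustration):

```python
x = [[1, 2, 3]]          # a 1x3 row vector, stored as a list of rows
y = [[4], [5], [6]]      # a 3x1 column vector

# xy is the 1x1 matrix whose single entry is x1*y1 + x2*y2 + x3*y3
xy = sum(x[0][k] * y[k][0] for k in range(3))
print(xy)  # 32
```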
Example 3.2.9.
Let $A$ be an $m \times 3$ matrix with columns $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$, and let $\mathbf{x}$ be a $3 \times 1$ column vector with entries $x_1$, $x_2$, $x_3$. $A$ is $m \times 3$ and $\mathbf{x}$ is $3 \times 1$, so the matrix product $A\mathbf{x}$ is defined, and is an $m \times 1$ matrix. The columns of $A$ are $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. The product is therefore

$A\mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + x_3 \mathbf{a}_3.$
Example 3.2.10.
Let $A$ and $B$ be square matrices of the same size. Then the number of columns of $A$ equals the number of rows of $B$, so the matrix product $AB$ is defined and will be another square matrix of the same size.
3.2.3 Matrix multiplication motivation
In this section we’ll try to answer two questions: where does this strange-looking notion of matrix multiplication come from? Why can we only multiply $A$ and $B$ if the number of columns of $A$ equals the number of rows of $B$?
Definition 3.2.3.
Let $A$ be an $m \times n$ matrix. Then $T_A \colon \mathbb{R}^n \to \mathbb{R}^m$ is the function defined by $T_A(\mathbf{x}) = A\mathbf{x}$.
Notice that this definition really does make sense. If $\mathbf{x} \in \mathbb{R}^n$ then it is an $n \times 1$ column vector, so the matrix product $A\mathbf{x}$ exists and has size $m \times 1$, so it is an element of $\mathbb{R}^m$.
Now suppose we have an $m \times n$ matrix $A$ and a $p \times q$ matrix $B$, so that $T_A \colon \mathbb{R}^n \to \mathbb{R}^m$ and $T_B \colon \mathbb{R}^q \to \mathbb{R}^p$. Can we form the composition $T_A \circ T_B$? The answer is no, unless $n = p$, that is, unless the number of columns of $A$ equals the number of rows of $B$. So let’s assume that $n = p$, so that $B$ is $n \times q$ and the composition

$T_A \circ T_B \colon \mathbb{R}^q \to \mathbb{R}^m$

makes sense. What can we say about it?
Theorem 3.2.7.
If $A$ is $m \times n$ and $B$ is $n \times q$ then $T_{AB} = T_A \circ T_B$.
You will prove this on a problem sheet.
The theorem shows that matrix multiplication is related to composition of functions. That’s useful because it suggests something: we know that function composition is always associative, so can we use that to show matrix multiplication is associative too? That is, if the products $AB$ and $BC$ make sense, is $A(BC)$ equal to $(AB)C$? This is not exactly obvious if you just write down the horrible formulas for the $(i,j)$ entries of both matrices. If we believe the theorem though it’s easy: we know

$T_A \circ (T_B \circ T_C) = (T_A \circ T_B) \circ T_C$

because function composition is associative, and so

$T_{A(BC)} = T_A \circ T_{BC} = T_A \circ (T_B \circ T_C) = (T_A \circ T_B) \circ T_C = T_{AB} \circ T_C = T_{(AB)C}.$
If $T_M = T_N$ then $M = N$ (for example, you could evaluate at the standard basis vector $\mathbf{e}_i$ to see that the $i$th column of $M$ equals the $i$th column of $N$ for any $i$), so we get $A(BC) = (AB)C$.
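Both facts — that $T_{AB}$ agrees with $T_A \circ T_B$, and the associativity that follows — can be checked numerically. This is an illustrative sketch with made-up matrices, a sanity check rather than a proof:

```python
def matmul(A, B):
    """Matrix product via formula (3.1), for matrices stored as lists of rows."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

def T(A):
    """The function T_A sending a column vector x (as a list of entries) to Ax."""
    return lambda x: [sum(A[i][j] * x[j] for j in range(len(x)))
                      for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [0, 2]]
x = [5, 7]

# T_{AB}(x) equals (T_A o T_B)(x)
assert T(matmul(A, B))(x) == T(A)(T(B)(x))

# associativity: (AB)C = A(BC)
assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
```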
Since we didn’t prove the theorem here, we’ll prove the associativity result in a more pedestrian way in the next section.