[Mathematics for Machine Learning] Linear Algebra (1)

1. Linear Algebra - (1)

Matrices

์ด๋ฒˆ ์‹œ๊ฐ„์—๋Š” ๊ธฐ๊ณ„ํ•™์Šต์„ ์œ„ํ•œ ์ˆ˜ํ•™ ์ค‘, ์„ ํ˜•๋Œ€ ์ˆ˜ํ•™์˜ Matrix ์— ๋Œ€ํ•œ ๊ธฐ๋ณธ์ ์ธ ๊ฐœ๋…๋“ค์„ ์•Œ์•„๋ณธ๋‹ค.

Inverse

Definition of Inverse:

  • Consider a square matrix $A\in\mathbb{R}^{n\times n}$. Let matrix $B\in\mathbb{R}^{n\times n}$ have the property that $AB=I_n=BA$. $B$ is called the inverse of $A$ and denoted by $A^{-1}$.

That is, for an $n\times n$ square matrix $A$, the square matrix $B$ whose matrix product with $A$ yields the identity matrix is called the inverse matrix of $A$.

์—ญํ–‰๋ ฌ์€ ์ฃผ์–ด์ง„ ํ–‰๋ ฌ์ด ๊ฐ€์—ญ์ (invertible)์ผ ๋•Œ๋งŒ ์กด์žฌ.

๊ฐ€์—ญ์„ฑ์€ ํ–‰๋ ฌ์‹(Determinant) ๋ฅผ ํ†ตํ•ด์„œ ํŒ๋‹จ ๊ฐ€๋Šฅํ•˜๊ณ  (det(A)!=0det(A) !=0 โ†’ invertible), ์ด๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ–‰๋ ฌ์ด ์ •๋ฐฉํ–‰๋ ฌ์ด์—ฌ์•ผํ•จ.

(p.s. For a non-square matrix, the Moore-Penrose pseudo-inverse can be used to approximate a solution.)

  • Not every matrix has an inverse matrix.
  • If the inverse matrix exists, the matrix is called regular/invertible/nonsingular.
  • If it does not exist, the matrix is called singular/noninvertible.
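As a quick numerical sanity check of the definitions above (a NumPy sketch; the matrices here are made-up examples, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# det(A) != 0  ->  A is invertible (regular / nonsingular)
assert abs(np.linalg.det(A)) > 1e-12

A_inv = np.linalg.inv(A)

# A @ A_inv = I_n = A_inv @ A (up to floating-point error)
assert np.allclose(A @ A_inv, np.eye(2))
assert np.allclose(A_inv @ A, np.eye(2))

# For a non-square matrix, the Moore-Penrose pseudo-inverse
# (np.linalg.pinv) acts as a least-squares substitute.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
assert np.allclose(np.linalg.pinv(C) @ C, np.eye(2))
```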

Transpose

Definition of Transpose:

  • For $A\in\mathbb{R}^{m\times n}$, the matrix $B\in\mathbb{R}^{n\times m}$ with $b_{ij} = a_{ji}$ is called the transpose of $A$. We write $B = A^T$.

That is, the transpose is the matrix with the rows and columns of the original swapped.

  • If $A = A^T$ for a square matrix $A\in\mathbb{R}^{n\times n}$, then $A$ is called symmetric.

In vanilla Python we can implement a simple in-place transpose by visiting only the upper triangle of a square matrix (or just use numpy.array.T):


# In-place transpose of an n x n matrix (list of lists):
# swap each upper-triangle entry with its mirror below the diagonal.
for i in range(n):             # row
    for j in range(i + 1, n):  # columns to the right of the diagonal
        matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]

Compact representation of system of linear equations

We can represent a system of linear equations compactly using matrices and vectors.

์•„๋ž˜์™€ ๊ฐ™์€ Linear equation์€

image.png

์•„๋ž˜์™€ ๊ฐ™์ด Ax=bAx=b ํ˜•ํƒœ์˜ Matrix์™€ Vector ๋กœ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ๋‹ค.

image.png

For example,

$\begin{bmatrix} 1 & 0 & 8 & -4 \\ 0 & 1 & 2 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 42 \\ 8 \end{bmatrix}$

์œ„์˜ System์€ 2๊ฐœ์˜ equation๊ณผ 4๊ฐœ์˜ unknown variable์ด ์กด์žฌํ•œ๋‹ค. (์ฆ‰ ๋ฌด์ˆ˜ํ•œ ํ•ด(solutions)๊ฐ€ ์กด์žฌ)

This problem can equivalently be written as $\sum_{i=1}^{4} x_i c_i = b$, where $c_i$ is the $i$-th column of the matrix and $b = [42, 8]^T$.

๊ทธ๋Ÿผ ์ด ์‹์„ ์–ด๋–ป๊ฒŒ ํ’€ ์ˆ˜ ์žˆ์„๊นŒ?


Particular Solution

A particular solution is, as the name suggests, one specific solution that satisfies the given linear equation system.

For example, in the preceding linear equation $\sum_{i=1}^{4} x_i c_i = b$,

$\begin{bmatrix} 42 \\ 8 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} 8 \\ 2 \end{bmatrix} + x_4\begin{bmatrix} -4 \\ 12 \end{bmatrix}$

a solution is easy to read off from the columns that contain a single 1; in this case it is $x = [42, 8, 0, 0]^T$.
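A quick check of this particular solution against the example system (NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 0.0, 8.0, -4.0],
              [0.0, 1.0, 2.0, 12.0]])
b = np.array([42.0, 8.0])

# x_p = [42, 8, 0, 0]^T picks 42 copies of c1 and 8 copies of c2
x_p = np.array([42.0, 8.0, 0.0, 0.0])
assert np.allclose(A @ x_p, b)
```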


General Solution

The general solution is the set of all solutions that satisfy the linear equation system.

The simple idea: adding 0 to the particular solution we already found has no effect on the equality.

๋”ฐ๋ผ์„œ ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ Particular solution์—์„œ ์‚ฌ์šฉ๋˜์ง€ ์•Š๋Š” ๋ณ€์ˆ˜ (์œ„์˜ ์˜ˆ์ œ์—์„œ x3,x4x_3, x_4) ๋ฅผ linear combination์œผ๋กœ ํ‘œํ˜„ํ•˜๊ณ , ์ž์œ ๋ณ€์ˆ˜ ฮปi\lambda_i ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ homogeneous solution ์„ ์ถ”๊ฐ€ํ•ด ๋” ๋งŽ์€ ํ•ด๋“ค์˜ ์ง‘ํ•ฉ ฮป\lambda ๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

In the example above, the third column $[8, 2]^T$ (the coefficient of $x_3$) can be written as $8\times[1, 0]^T$ (which is $c_1$) $+\ 2\times[0, 1]^T$ (which is $c_2$).

$\therefore\ \lambda_1(8c_1 + 2c_2 - c_3) = 0$

Likewise, the fourth column can be expressed in the same way:

$\therefore\ \lambda_2(-4c_1 + 12c_2 - c_4) = 0$

์ตœ์ข…์ ์œผ๋กœ general solution์€ ์•„๋ž˜์ฒ˜๋Ÿผ ํ‘œํ˜„ ๊ฐ€๋Šฅํ•˜๋‹ค.

$\left\{ x \in \mathbb{R}^4 : x = \begin{bmatrix} 42 \\ 8 \\ 0 \\ 0 \end{bmatrix} + \lambda_1 \begin{bmatrix} 8 \\ 2 \\ -1 \\ 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} -4 \\ 12 \\ 0 \\ -1 \end{bmatrix},\ \lambda_1, \lambda_2 \in \mathbb{R} \right\}$
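We can verify numerically that every member of this family solves the system (NumPy sketch; the $\lambda$ values tried are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 0.0, 8.0, -4.0],
              [0.0, 1.0, 2.0, 12.0]])
b = np.array([42.0, 8.0])

x_p = np.array([42.0, 8.0, 0.0, 0.0])   # particular solution
v1 = np.array([8.0, 2.0, -1.0, 0.0])    # from 8c1 + 2c2 - c3 = 0
v2 = np.array([-4.0, 12.0, 0.0, -1.0])  # from -4c1 + 12c2 - c4 = 0

# The homogeneous directions lie in the null space of A ...
assert np.allclose(A @ v1, 0.0)
assert np.allclose(A @ v2, 0.0)

# ... so x_p + lam1*v1 + lam2*v2 solves Ax = b for any lam1, lam2
for lam1, lam2 in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]:
    assert np.allclose(A @ (x_p + lam1 * v1 + lam2 * v2), b)
```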


Row-Echelon Form

Linear equation์˜ Solution์„ ๊ตฌํ• ๋•Œ์—๋Š” Matrix๋ฅผ Reduced Row-Echelon Form์œผ๋กœ ๋งŒ๋“ค๋ฉด ๋งค์šฐ ์œ ์šฉํ•˜๋‹ค.

So what is (reduced) row-echelon form?

Definition of Row-Echelon Form (REF):

  • All rows that contain only zeros are at the bottom of the matrix; correspondingly, all rows that contain at least one nonzero element are on top of rows that contain only zeros.
  • Looking at nonzero rows only, the first nonzero number from the left (also called the pivot) is always strictly to the right of the pivot of the row above it.

Note: the definition of the pivot value in REF differs between sources.

*Technically, the leading coefficient can be any number. However, the majority of Linear Algebra textbooks do state that the leading coefficient must be the number 1. To add to the confusion, some definitions of row echelon form state that there must be zeros both above and below the leading coefficient. It's therefore best to follow the definition given in the textbook you're following (or the one given to you by your professor). ref:* statisticshowto

For example,

image.png

์œ„์˜ Matrix๋Š” ๋ชจ๋“  Pivot ์ด ์ด์ „ row ์˜ Pivot ์œ„์น˜๋ณด๋‹ค ์˜ค๋ฅธ์ชฝ์— ์žˆ๊ณ ,

๋ชจ๋“  ๊ฐ’์ด 0์ธ Row๊ฐ€ ์ œ์ผ ๋ฐ‘์— ์กด์žฌํ•˜๋ฏ€๋กœ Row-Echelon Form ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

Definition of Reduced Row-Echelon Form (RREF):

  • It is in row-echelon form.
  • Every pivot is 1.
  • The pivot is the only nonzero entry in its column.

For example,

$\left[\begin{array}{cccc|c} 1 & -2 & 0 & 0 & -2 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -2 \end{array}\right]$

์œ„์˜ Matrix๋Š” 1) ๋ชจ๋“  Pivot value๊ฐ€ 1์ด๊ณ , 2) ์ด์ „ row์˜ Pivot๋ณด๋‹ค ์˜ค๋ฅธ์ชฝ์— ์žˆ์œผ๋ฉฐ, 3) ๊ฐ Pivot์˜ Column์€ pivot์ด์™ธ์˜ ๊ฐ’๋“ค์ด ๋ชจ๋‘ 0์ด๊ธฐ ๋•Œ๋ฌธ์— Reduced Row-Echelon Form ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

Reduced Row-Echelon Form์—์„œ๋Š” Linear equation์˜ ํ•ด๋ฅผ ๋งค์šฐ ์‰ฝ๊ฒŒ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

๊ฐ๊ฐ์˜ ii ๋ฒˆ์งธ Pivot์ด xix_i ์˜ ํ•ด๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฏ€๋กœ

x1=โˆ’2,ย x3=1,ย x4=โˆ’2x_1 = -2,\ x_3=1,\ x_4=-2 ์ด๊ณ  Pivot์ด ์กด์žฌํ•˜์ง€ ์•Š๋Š” 2๋ฒˆ์งธ Column์€ free variable์ด ๋˜๋ฏ€๋กœ ๋‹จ์ˆœํžˆ 0 ์„ ๋Œ€์ž…ํ•ด x2=0x_2=0 ์„ ์–ป์œผ๋ฉด ํ•˜๋‚˜์˜ Particular Solution์„ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

Free variable์„ ํ™œ์šฉํ•œ General Solution๋„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์•„๋ž˜์™€ ๊ฐ™์ด ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค

$x_1 = 2x_2 - 2,\quad x_3 = 1,\quad x_4 = -2$

Since $x_2$ is a free variable, set $x_2 = \lambda$; then

$(x_1, x_2, x_3, x_4) = (2\lambda - 2,\ \lambda,\ 1,\ -2)$

That is, $\{x \in \mathbb{R}^4 : x = [-2, 0, 1, -2]^T + \lambda[2, 1, 0, 0]^T,\ \lambda \in \mathbb{R}\}$.
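A numerical check of this solution set against the RREF system (NumPy sketch; the augmented matrix follows from the pivot relations $x_1 - 2x_2 = -2$, $x_3 = 1$, $x_4 = -2$):

```python
import numpy as np

# Coefficient part and right-hand side of the RREF system
R = np.array([[1.0, -2.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 0.0],
              [0.0,  0.0, 0.0, 1.0]])
d = np.array([-2.0, 1.0, -2.0])

# Every choice of the free variable x2 = lam yields a solution
for lam in (-1.0, 0.0, 2.5):
    x = np.array([2.0 * lam - 2.0, lam, 1.0, -2.0])
    assert np.allclose(R @ x, d)
```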

ํ˜น์€ Minus one Trick์„ ์‚ฌ์šฉํ•˜์—ฌ ์ง๊ด€์ ์œผ๋กœ ๊ตฌํ•  ์ˆ˜๋„ ์žˆ๋‹ค.

Free variable์ด ์œ„์น˜ํ•˜๋Š” Column์— -1 ์„ ๋„ฃ์–ด์ฃผ๋ฉด

$\tilde{A} = \left[\begin{array}{cccc|c} 1 & -2 & 0 & 0 & -2 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -2 \end{array}\right]$

We likewise obtain $x \in \mathbb{R}^4 : x = [-2, 0, 1, -2]^T + \lambda[-2, -1, 0, 0]^T$, where the homogeneous direction is simply the column containing the $-1$ (the same solution set as before, with the sign of $\lambda$ flipped).

Reduced Row-Echelon Form์„ ๋งŒ๋“œ๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” Gaussian elimination algorithm์„ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.

Ref: Wikipedia


Ref:

  • POSTECH CSED343 (Prof. Dongwoo Kim)
  • Mathematics for Machine Learning, Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Cambridge University Press, 2020