Foundation of Neural Networks
1. Neural Network Concept
As discussed by IBM, neural networks are a fundamental component of deep learning models. They consist of layers of nodes, where each node is assigned a weight and a threshold. As shown below, the input layer is connected to the output layer via multiple hidden layers:
From paper “Using Deep Learning to Localize Gravitational Wave Sources”
2. Neural Model
As shown in the above image, a neuron (node) is fed inputs, and each input connection carries a weight ($w$). Multiplying each input by its corresponding weight and then adding a bias ($b$) yields an expression for the output of a neuron.
Output of a Neuron
\[O_{i,j} = x_{i,k} w_{k,j} + b_{j}\]
We can deduce from this equation (which sums over the repeated index $k$) that neuron output is inherently a matrix multiplication followed by the addition of a bias.
from Fast AI Textbook - Chapter 17
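The equation above can be sketched directly in NumPy. This is a minimal illustration, not a full layer implementation; the input values, weights, and layer sizes below are arbitrary choices for demonstration.

```python
import numpy as np

# One sample (i = 1) with k = 4 input features.
x = np.array([[1.0, 2.0, 3.0, 4.0]])            # shape (1, 4)

# Weight matrix connecting k = 4 inputs to j = 3 neurons,
# and one bias per neuron (illustrative values).
w = np.random.default_rng(0).normal(size=(4, 3))  # shape (4, 3)
b = np.zeros(3)                                   # shape (3,)

# O_{i,j} = x_{i,k} w_{k,j} + b_j is exactly a matrix product plus a bias.
output = x @ w + b
print(output.shape)  # (1, 3): one activation per neuron
```

Note that `@` performs the summation over the shared index $k$, so the whole layer is computed in a single matrix multiplication.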
3. Broadcasting
Tensors (matrices) will not always have the same rank. In a situation where two tensors in an operation do not have the same shape, broadcasting is used to make sure they are compatible with one another.
There are several types of broadcasting:
3.1 Broadcasting with a Scalar
Consider when an array and scalar are both involved in an operation.
import numpy as np

a = np.array([1, 2, 3])
b = 2
c = a * b   # array([2, 4, 6])
In this situation, broadcasting rules stipulate the scalar must be extended into an array of the same size as the matrix/array, as shown in the conceptual diagram below:
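This "stretching" of the scalar is conceptual only; NumPy never materialises the expanded array. A quick sketch (with the same illustrative values as above) confirms the two views are equivalent:

```python
import numpy as np

a = np.array([1, 2, 3])
b = 2

# Broadcasting: the scalar behaves as if stretched to [2, 2, 2].
c = a * b
print(c)  # [2 4 6]

# Explicit expansion produces the identical result.
assert np.array_equal(c, a * np.array([2, 2, 2]))
```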
3.2 Vector to Matrix
3.2.1 Matrix & Matrix Broadcasting Rules
3.2.1.1 Rules
Tensors are compatible for arithmetic operations when:
- the dimensions are equal, or
- one of the dimensions is 1
Provided these conditions are met, the output matrix:
“Will have same number of dimensions as the input array with the greatest number of dimensions” (Numpy, 2023)
3.2.1.2 Example:
We can demonstrate this principle in the following example, with the multiplication of a 4D matrix and 3D matrix:
Matrix A: 4 x 1 x 3 x 1
Matrix B: ? x 2 x 1 x 5
Result C: 4 x 2 x 3 x 5
As highlighted in green, the size of each dimension in the resulting matrix corresponds to the largest size of that dimension among the input matrices.
Also take note of the ‘?’ in Matrix B. When one tensor has fewer dimensions than the other, the missing leading dimensions are treated as size 1.
A visual depiction of this broadcasting technique has been shown below:
Image from Numpy Documentation
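The worked example above can be verified in NumPy. The arrays below are filled with ones purely for illustration; only their shapes matter here.

```python
import numpy as np

# Shapes from the worked example; values are arbitrary.
A = np.ones((4, 1, 3, 1))   # Matrix A: 4 x 1 x 3 x 1
B = np.ones((2, 1, 5))      # Matrix B: ? x 2 x 1 x 5 (missing dim treated as 1)

# Each dimension pair is either equal or contains a 1, so they broadcast.
C = A * B
print(C.shape)  # (4, 2, 3, 5)
```

Each size-1 dimension is stretched to match its counterpart, giving the 4 x 2 x 3 x 5 result predicted above.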
Reference List
- NumPy Documentation, Broadcasting: https://numpy.org/doc/stable/user/basics.broadcasting.html
- Fast AI Textbook, Chapter 17: https://nbviewer.org/github/fastai/fastbook/blob/master/17_foundations.ipynb
- “Using Deep Learning to Localize Gravitational Wave Sources”: https://www.researchgate.net/publication/335855384_Using_Deep_Learning_to_Localize_Gravitational_Wave_Sources/figures?lo=1
- IBM, What Are Neural Networks?: https://www.ibm.com/topics/neural-networks