The Neural Network Used
For our pattern recognition task, we shall consider a multilayer, feedforward (i.e. with no
feedback loops) neural network, as depicted in the figure below. It has an input layer
(where the external input vector is applied to the network), an output layer (of neurons,
whose outputs constitute the output of the network) and intermediate hidden layers of
neurons. We will consider a fully connected network, in which the output of each neuron
in a hidden layer is applied to every neuron in the next layer. Such a network is also
called a Multilayer Perceptron.
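
To make this structure concrete, here is a minimal sketch in Python of the forward pass
through such a network. The layer sizes, the random initial parameters and the sigmoid
activation are illustrative assumptions, not choices made above:

    import numpy as np

    def sigmoid(v):
        # A common choice of activation function for each neuron
        return 1.0 / (1.0 + np.exp(-v))

    def forward(x, weights, biases):
        # Propagate the input vector x layer by layer: in a fully
        # connected network, each neuron's output is applied to
        # every neuron of the next layer.
        a = x
        for W, b in zip(weights, biases):
            a = sigmoid(W @ a + b)
        return a

    # Illustrative network: 4 inputs, one hidden layer of 5 neurons, 1 output
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((5, 4)), rng.standard_normal((1, 5))]
    biases = [rng.standard_normal(5), rng.standard_normal(1)]
    y = forward(np.array([0.1, 0.2, 0.3, 0.4]), weights, biases)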

It is, at this stage, beyond us to justify why such a structure is employed for our
pattern recognition task. All we can say is that such networks are known to do a good
job, given a suitable choice of synaptic weights and bias parameters, and that
algorithms for evaluating these parameters optimally are known. We shall discuss one
such training algorithm in detail a little later; first, however, let us make clear what
these algorithms do for us. We start off with a set of input vectors and the corresponding
responses we expect from the system. For instance, with forex forecasting, if we have
determined that factors a, b, c and d determine tomorrow's forex rate, we start off with a
set (typically large) of previously observed values of a, b, c and d and the
correspondingly observed forex rate of the next day. This constitutes what is called the
training data. The learning algorithms adjust the free parameters of the network so that
the responses of the network to the input samples in the training data are close to the
corresponding desired responses.
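
Reusing the forward pass sketched earlier, the training data and the role of a learning
algorithm might look as follows. The numbers are made up, and adjust_parameters is a
hypothetical placeholder for the actual update rule:

    # Each sample pairs previously observed values of the factors
    # a, b, c and d with the forex rate observed on the following day
    # (purely illustrative numbers).
    training_data = [
        (np.array([0.9, 1.2, 0.4, 0.7]), np.array([1.31])),
        (np.array([1.1, 1.0, 0.5, 0.6]), np.array([1.28])),
        # ... typically a large number of such samples
    ]

    def train(weights, biases, training_data, epochs, rate):
        # The generic shape of a learning algorithm: repeatedly nudge
        # the free parameters so that the network's response to each
        # training input moves closer to the desired response.
        for _ in range(epochs):
            for x, desired in training_data:
                error = forward(x, weights, biases) - desired
                # Hypothetical placeholder for the update rule, e.g.
                # the back propagation algorithm discussed later.
                adjust_parameters(weights, biases, x, error, rate)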
Now, once the training is done, what happens when the network is presented with an
input vector not encountered before during training? It is observed that under certain
conditions, the response of the neural network is actually close to the response we would
expect for that input. This property of the trained neural network is called generalization.
It is this generalization we rely on when we input, say, today's values of the factors a, b, c
and d to get a forecast of tomorrow's forex rate. Neural networks are able to pick up very
complex dependencies of the dependent variable(s) on the independent variables (which
constitute the components of the input vector), and hence can do a better job at function
interpolation/extrapolation than conventional regression techniques.
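
Continuing the sketch, generalization is exercised simply by running the trained
network's forward pass on an input it never saw during training (again with made-up
numbers):

    # Today's observed values of the factors a, b, c and d, fed to the
    # trained network to forecast tomorrow's forex rate.
    todays_factors = np.array([1.0, 1.1, 0.45, 0.65])
    forecast = forward(todays_factors, weights, biases)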
A natural thought arises: surely this kind of generalization must demand certain
conditions on the function involved and the training data used. The answer is yes, and
some of the intuitive conditions are:
1. The components of the input vector must be chosen so that a dependence does in fact
exist between them and the dependent variable. You obviously cannot expect the
neural network to generalize well if the inputs capture the dependence only partially.
For example, to say the forex rate depends only on the inflation rates and price indices
in the two countries is of course grossly crude. Hence, if you train a network with only
these as input fields, you can't expect it to predict forex rates accurately.
2. The data samples must be large in number and well spread over the input space.
3. The function involved must be smooth, i.e. a small change in one of the factors should
cause only a small change in the dependent variable. An arbitrary function obviously
can't be estimated from its samples!
Now, we move on to a study of the Back Propagation Algorithm, one of the algorithms
used to train a multilayer feedforward neural network.