
Artificial Neural Networks in NetLogo

Last modified: December 26, 2018 (3514 views)


To continue with AI algorithms implemented in NetLogo, in this post we will see how to build a simple model for investigating Artificial Neural Networks (ANN). We will restrict ourselves to the most common and classical ones, the Multilayer Perceptron Networks. My apologies to those of you waiting for something about Deep Neural Networks (DNN, such as those used in AlphaGo from Google or CaptionBot from Microsoft). Maybe in a later post I will try to extract the main features of some convolutional neural network and test it on a very simple NetLogo model, but I am afraid that we would need too many computational resources to obtain anything of interest with this tool... Who knows? I will keep thinking.

The Multilayer Perceptron is one of the main ANN structures in use today. Indeed, it is the foundation of the new DNN methods and still appears in some of their phases. Although it lacks some desirable features (for example, the learning process becomes hard to control when we need several hidden layers to model more complex problems, because the Gradient Descent Method used for tuning the weights composes badly across layers), it can solve a lot of interesting problems and offers a very visual way to understand the fundamentals of this area of Machine Learning.

As you can find plenty of good resources about the basics of ANN out there (better than anything I can write here), this post focuses only on the NetLogo implementation of a more or less flexible multilayer perceptron network, hoping that the use of agents and links in the model helps the reader understand the central ideas of how it works.

The model we will prepare here is a slight variant of the one that comes with the official NetLogo distribution. As a first approach, we will use it for computing boolean functions,

\[f:\{0,1\}^n \to \{0,1\}^k\]

that is, the neurons of the input layer take only boolean values (\(0\)/\(1\)) and the network returns a boolean string. These restrictions can easily be adapted to other requirements, and the main learning process need not change, only the setup. In fact, our networks return values in the interval \([0,1]\), which we can transform into boolean values (for example, with an appropriate step function). The NetLogo model comes with two example boolean functions:

  • Majority (it returns \(1\) if the input has more \(1\)'s than \(0\)'s), and
  • Even (it returns \(1\) if the number of \(1\)'s is even).

Again, this affects only auxiliary procedures, not the main ones for the learning process, and it is easy to add more functions, or adapt the code to work with different kinds of functions.

We will train the network to compute a function from a sample set of pairs \((input, output)\), where each output is the correct value that the network must return (computed with the function we are trying to learn). As usual in this learning model:

  1. Starting from a random value of weights
  2. Repeat a number of times:
    1. Take every sample from the samples dataset
    2. Propagate: Compute the value of the network for the sample
    3. Compute the error between the expected value and the obtained one
    4. Back-Propagate: Adjust the weights in some way to reduce this error

With this in mind, let's see the data structures to be used in the implementation.

NetLogo Implementation

The global variables needed to operate the model are (together with those from the interface controls):

globals [
    data-list    ; List of pairs [Input Output] to train the network
    inputs       ; List with the binary inputs in the training
    outputs      ; List with the binary output in the training
    epoch-error  ; error in every epoch during training
]

We will define several breeds of agents in order to ease the model:

  • one for every type of layer neuron (input, hidden and output), so that it is simpler to give them different behaviours, and
  • one for the bias neuron. We use only one bias neuron because the information needed for every other neuron is stored in the weight of the link connecting it with the bias.

All of them will have properties for storing the activation value of the neuron (output) and the backpropagated gradient that must act on the links reaching it. Also, we add a weight property for the links:

breed [bias-neurons bias-neuron]
bias-neurons-own [activation grad]
breed [input-neurons input-neuron]
input-neurons-own [activation grad]
breed [output-neurons output-neuron]
output-neurons-own [activation grad]
breed [hidden-neurons hidden-neuron]
hidden-neurons-own [activation grad]
links-own [weight]

After deciding the number of neurons in every layer (through the interface controls), the setup procedure prepares the geometry of the network (note the different shapes of the neurons in each layer), initializes the global variables for the learning process, and creates a sample set of pairs according to the boolean function we want to learn:

to setup
  ; Start from an empty world (assumed step: the who-based layout in
  ; setup-neurons needs turtle numbering to begin at 0)
  clear-all
  ; Building the Front Panel
  ask patches [ set pcolor 39 ]
  ask patches with [pxcor > -2] [set pcolor 38]
  ask patches with [pxcor > 4] [set pcolor 37]
  ask patch -6 10 [ set plabel-color 32 set plabel "Input"]
  ask patch  3 10 [ set plabel-color 32 set plabel "Hidden Layer"]
  ask patch  8 10 [ set plabel-color 32 set plabel "Output"]
  ; Building the network
  setup-neurons
  setup-links
  ; Recolor of neurons and links
  recolor
  ; Initializing global variables
  set epoch-error 0
  set data-list []
  set inputs []
  set outputs []
  ; Create samples
  create-samples
  ; Reset timer
  reset-ticks
end

; Auxiliary Procedure to setup neurons
to setup-neurons
  set-default-shape input-neurons "square"
  set-default-shape output-neurons "neuron-node"
  set-default-shape hidden-neurons "hidden-neuron"
  set-default-shape bias-neurons "bias-node"
  ; Create Input neurons
  repeat Neurons-Input-Layer [
    create-input-neurons 1 [
      set size min (list (10 / Neurons-Input-Layer) 1)
      setxy -6 (-19 / Neurons-Input-Layer * ( who - (Neurons-Input-Layer / 2) + 0.5))
      set activation random-float 0.1 ] ]
  ; Create Hidden neurons
  repeat Neurons-Hidden-Layer [
    create-hidden-neurons 1 [
      set size min (list (10 / Neurons-Hidden-Layer) 1)
      setxy 2 (-19 / Neurons-Hidden-Layer * ( who - Neurons-Input-Layer - (Neurons-Hidden-Layer / 2) + 0.5))
      set activation random-float 0.1 ] ]
  ; Create Output neurons
  repeat Neurons-Output-Layer [
    create-output-neurons 1 [
      set size min (list (10 / Neurons-Output-Layer) 1)
      setxy 7 (-19 / Neurons-Output-Layer * ( who - Neurons-Input-Layer - Neurons-Hidden-Layer - (Neurons-Output-Layer / 2) + 0.5))
      set activation random-float 0.1 ] ]
  ; Create Bias Neurons
  create-bias-neurons 1 [ setxy -1.5 9 ]
  ask bias-neurons [ set activation 1 ]
end

; Auxiliary Procedure to create connections between neurons
to setup-links
  connect input-neurons hidden-neurons
  connect hidden-neurons output-neurons
  connect bias-neurons hidden-neurons
  connect bias-neurons output-neurons
end

; Auxiliary procedure to totally connect two groups of neurons
to connect [neurons1 neurons2]
  ask neurons1 [
    create-links-to neurons2 [
      set weight random-float 0.2 - 0.1
    ]
  ]
end

The procedure to create samples modifies some global variables:

  • inputs, to store the list of inputs of every sample,
  • outputs, to store the list of outputs of every sample, in the same order, and
  • data-list, to store the pairs of [input output] of every sample.

to create-samples
  set inputs (n-values num-samples [ (n-values Neurons-input-layer [one-of [0 1]])])
  set outputs map [ x -> (list evaluate Function x)] inputs
  set data-list (map [ [x y] -> (list x y)] inputs outputs)
end

In order to show the dynamics of the network while it is learning, we have a procedure, recolor, that recolors neurons and links (and adjusts the thickness of the links) to show their values:

  • neurons: $0$-white, $1$-yellow. It uses the step function to discretize the value,
  • links: negative-red, positive-blue, value-thickness.

to recolor
  ask turtles [
    set color item (step activation) [white yellow]
  ]
  let MaxP max [abs weight] of links
  ask links [
    set thickness 0.05 * abs weight
    ifelse weight > 0
      [ set color lput (255 * abs weight / MaxP) [0 0 255]]
      [ set color lput (255 * abs weight / MaxP) [255 0 0]]
  ]
end

; Step Function
to-report step [x]
  ifelse x > 0.5
    [ report 1 ]
    [ report 0 ]
end

The propagation procedure is very simple when working with agents. Every neuron computes its activation value by applying the sigmoid function to the weighted sum of the activations of the neurons feeding it:

; Forward Propagation of the signal along the network
to Forward-Propagation
  ask hidden-neurons [ set activation compute-activation ]
  ask output-neurons [ set activation compute-activation ]
end

; Weighted sum of the incoming activations (bias link included), squashed by the sigmoid
to-report compute-activation
  report sigmoid (sum [ [activation] of end1 * weight] of my-in-links)
end

; Sigmoid Function
to-report sigmoid [x]
  report 1 / (1 + e ^ (- x))
end
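
In symbols, what compute-activation evaluates can be sketched as follows. The notation is an assumption introduced here and does not appear in the model: \(a_i\) is the activation of a neuron \(i\) feeding neuron \(j\) (the bias neuron included, with \(a_i = 1\)), and \(w_{ij}\) is the weight of the link from \(i\) to \(j\).

\[a_j = \sigma\Big(\sum_{i} w_{ij}\, a_i\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}\]

Because the bias neuron always has activation \(1\), the weight of its link plays the role of a per-neuron offset inside the sum.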

With all the previous procedures we have a functional model that uses an ANN to compute functions. Now we will provide the backpropagation procedure, which allows the network to approximate functions from a sample set of correct values. The calculations in the next procedure are the standard ones derived from the Gradient Descent Method for minimizing the output error by varying the weights of the links.

Of course, the error is computed only from the differences between the output of output-neurons and the correct values we know they must provide (it is a supervised learning algorithm):

to Back-propagation
  let error-sample 0
  ; Compute error and gradient of every output neurons
  (foreach (sort output-neurons) outputs [
    [n y] ->
      ask n [ set grad activation * (1 - activation) * (y - activation) ]
      set error-sample error-sample + ( (y - [activation] of n) ^ 2 )
  ])
  ; Average error of the output neurons in this epoch
  set epoch-error epoch-error + (error-sample / count output-neurons)
  ; Compute gradient of hidden layer neurons
  ask hidden-neurons [
    set grad activation * (1 - activation) * sum [weight * [grad] of end2] of my-out-links
  ]
  ; Update link weights
  ask links [
    set weight weight + Learning-rate * [grad] of end2 * [activation] of end1
  ]
  set epoch-error epoch-error / 2
end
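
As a cross-check of the update rule, here is a minimal Python sketch (Python rather than NetLogo, so that it can be run outside the model). All names and the tiny 3-2-1 architecture are assumptions for illustration, not part of the model. It computes the same grad values as Back-propagation for one sample and compares "grad of end2 times activation of end1" with a finite-difference slope of the error \(\tfrac{1}{2}\sum (y - activation)^2\). If the rule is a gradient descent step, the two should cancel.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hid, w_out):
    """Forward pass; the last weight of each row multiplies the bias (fixed at 1)."""
    h = [sigmoid(sum(w * a for w, a in zip(row, x + [1.0]))) for row in w_hid]
    o = [sigmoid(sum(w * a for w, a in zip(row, h + [1.0]))) for row in w_out]
    return h, o

def half_squared_error(x, y, w_hid, w_out):
    _, o = forward(x, w_hid, w_out)
    return 0.5 * sum((t - a) ** 2 for t, a in zip(y, o))

def deltas(x, y, w_hid, w_out):
    """The grad values of the NetLogo procedure, for hidden and output neurons."""
    h, o = forward(x, w_hid, w_out)
    g_out = [a * (1 - a) * (t - a) for a, t in zip(o, y)]
    g_hid = [h[j] * (1 - h[j]) * sum(w_out[k][j] * g_out[k] for k in range(len(o)))
             for j in range(len(h))]
    return h, g_hid, g_out

# Hypothetical tiny network: 3 inputs, 2 hidden neurons, 1 output
random.seed(0)
n_in, n_hid, n_out = 3, 2, 1
w_hid = [[random.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_out = [[random.uniform(-0.1, 0.1) for _ in range(n_hid + 1)] for _ in range(n_out)]
x, y = [1.0, 0.0, 1.0], [1.0]

h, g_hid, g_out = deltas(x, y, w_hid, w_out)

def numeric_slope(weights, r, c, eps=1e-6):
    """Central finite difference of the error with respect to weights[r][c]."""
    old = weights[r][c]
    weights[r][c] = old + eps
    up = half_squared_error(x, y, w_hid, w_out)
    weights[r][c] = old - eps
    down = half_squared_error(x, y, w_hid, w_out)
    weights[r][c] = old
    return (up - down) / (2 * eps)

# grad(end2) * activation(end1) should equal minus the slope of the error,
# so their sum should vanish up to rounding for every link.
max_gap = 0.0
for j in range(n_hid):
    for i, a in enumerate(x + [1.0]):
        max_gap = max(max_gap, abs(g_hid[j] * a + numeric_slope(w_hid, j, i)))
for k in range(n_out):
    for j, a in enumerate(h + [1.0]):
        max_gap = max(max_gap, abs(g_out[k] * a + numeric_slope(w_out, k, j)))
print(max_gap)
```

The printed gap should be tiny (rounding error only), which suggests that adding Learning-rate * grad * activation to a weight moves it downhill on the squared error of that sample.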

If we had more hidden layers (in this model we have only one), we would need to go back from the output layer to the input layer one layer at a time, computing the grad value of all the neurons of every layer. After that, we can update the weights of all the links. At this point we can see one of the problems of this method when working with a great number of layers (deep networks): the farther a layer is from the output layer, the smaller the effect of grad on its connection weights, so the model can't learn the correct weights in layers distant from the output one (where the error to be reduced can be computed).
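
To get a feeling for how fast that effect shrinks, here is a small Python sketch under a deliberately simplified assumption that is not part of the model: a chain with one neuron per layer, joined by links of weight 1. Each layer multiplies the backpropagated value by activation * (1 - activation), the slope of the sigmoid, which is never larger than 0.25.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Derivative of the sigmoid: s * (1 - s), the factor each layer contributes to grad."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The slope peaks at x = 0 with value 0.25, so in the hypothetical chain
# (one neuron per layer, all weights equal to 1) the value reaching a neuron
# `depth` layers before the output is at most 0.25 ** depth.
bounds = [sigmoid_slope(0.0) ** depth for depth in range(1, 7)]
for depth, bound in enumerate(bounds, start=1):
    print(depth, bound)
```

Real networks have weights of any size, so this is only an upper bound for the simplified chain, but it shows why a few extra sigmoid layers are enough to make the weight updates near the input very small.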

With this per-sample learning procedure we can now code the training procedure that learns from all the samples in the set. It goes through all the samples only once (one epoch, in ANN terminology), so it can be called from a Forever button, for example. In order to randomize the epoch and avoid memorizing the order of the dataset, we shuffle the set in every epoch.

During the training, the procedure will plot the average error reached in the current epoch, so we can evaluate if the network is learning in a correct way. Observe that this will depend on the complexity of the function, as well as the structure of the network (maybe it is too simple for the function) and the sample dataset generated for the training process.

to train
  set epoch-error 0
  ; For every training data
  foreach (shuffle data-list) [
    d ->
      ; Take the input and correct output
      set inputs first d
      set outputs last d
      ; Load input on input-neurons
      (foreach (sort input-neurons) inputs [
        [n x] -> ask n [set activation x] ])
      ; Forward Propagation of the signal
      Forward-Propagation
      ; Back Propagation from the output error
      Back-propagation
  ]
  plotxy ticks epoch-error ; Plot the error
  tick
end

After the training you can test whether the network has approximated the function by trying random inputs (usually, some reserved test data) and comparing against the correct value computed with the real function (this is an advantage of toy models over real ones, where we usually have no idea about the "real" function). The next procedure may help you with the test: it creates a random input and returns a pair [correct-output network-output] (where network-output is obtained from the continuous activations of the output layer):

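to-report test
  let inp n-values Neurons-input-layer [one-of [0 1]]
  let out (list evaluate Function inp)
  set inputs inp
  ; Load the input and propagate it (assumed calls: without them the
  ; report would use the activations left by the previous sample)
  active-inputs
  Forward-Propagation
  report (list out [activation] of output-neurons)
end

; Activate input neurons with read inputs
to active-inputs
  (foreach (sort input-neurons) inputs [
    [n x] -> ask n [set activation x] ])
  recolor
end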

If you want, you can prepare a procedure to repeat some tests and calculate the average error of the set:

to-report multi-test [n]
  let er 0
  repeat n [
    let t test
    set er er + ((first first t) - (first last t)) ^ 2]
  report er / n
end

Finally, we provide the functions you can find in the NetLogo Model to make some experiments:

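to-report evaluate [f x]
  ; Build and run a string such as "Majority [0 1 1]"
  report runresult (word f " " x)
end

to-report Majority [x]
  ; The anonymous reporters use b so they do not reuse the name of the input x
  let ones length filter [ b -> b = 1 ] x
  let ceros length filter [ b -> b = 0 ] x
  report ifelse-value (ones > ceros) [1] [0]
end

to-report Even [x]
  let ones length filter [ b -> b = 1 ] x
  ; 1 when the number of 1's is even, as described above
  report ifelse-value (ones mod 2 = 0) [1] [0]
end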

If you want to add any other function, simply add the code as a reporter and add the name of the function to the Chooser control (Function) in the interface.
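
When adding or checking a function, it can help to have its truth table at hand. The following Python sketch (names are illustrative assumptions, not part of the model) follows the description of the two example functions given earlier in the text and lists their values for every 3-bit input.

```python
from itertools import product

def majority(bits):
    """1 if the input has more 1's than 0's, else 0."""
    ones = sum(bits)
    return 1 if ones > len(bits) - ones else 0

def even(bits):
    """1 if the number of 1's is even (zero counts as even), else 0."""
    return 1 if sum(bits) % 2 == 0 else 0

# Truth table over all 3-bit inputs: input -> (majority, even)
table = {bits: (majority(bits), even(bits)) for bits in product([0, 1], repeat=3)}
for bits, (maj, ev) in table.items():
    print(bits, maj, ev)
```

Comparing such a table with the pairs stored in data-list is a quick way to confirm that the samples fed to the network encode the function you intended.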

In this link you can play with a web version of the model (NetLogoWeb doesn't allow runresult on strings, so the model running on the web treats the evaluation functions slightly differently):

Note that it runs on NetLogoWeb, so it is much slower than the normal version, which you can find here.

You can play around with all the parameters and try to discover how they affect the quality of the solution (remember to repeat the experiments, since the training sample set is randomly created and can affect the accuracy). It may be of interest to change the number of neurons in the hidden layer.

To learn more...

Wikipedia Neural Networks

Michael Nielsen's Neural Networks

The Nature of Code (Ch. 10)

ANN for Beginners

NetLogo Book
