School of Science and Technology 科技學院
Computing Programmes 電腦學系

On Solving the Local Minimum Problem in Feed Forward Neural Networks

TSE Hau Ting Eric

Programme Bachelor of Computing with Honours in Internet Technology
Supervisor Dr. Vanessa Ng
Areas Neural Networks, Algorithms
Year of Completion 2011
Awards Received IEEE HK Section Student Paper Contest 2011 Undergraduate Champion

Objectives

The aim of this project is to investigate the local minimum problem in neural networks and to design an algorithm to solve it. Neural networks are widely used in pattern recognition and data classification, capabilities that are acquired through training. Training mainly involves comparing the output of the neural network with the desired output and using the difference to adjust the parameters so as to minimize the error. However, the performance of training is often degraded by the local minimum problem, and training may even fail outright: the error stops decreasing while it is still far from zero, leaving the network at a suboptimal solution.
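
To make this concrete, the following is a minimal sketch (not the project's code) of sigmoid backpropagation on the XOR problem; the network size, learning rate and random initialisation are illustrative assumptions. A run that stalls with the error well above the target and no longer decreasing is exhibiting the local minimum problem.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # XOR problem: 2 inputs, 2 hidden units, 1 output (sizes are illustrative).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)
    W2 = rng.normal(scale=0.5, size=(2, 1)); b2 = np.zeros(1)
    lr = 0.5

    for epoch in range(100000):
        H = sigmoid(X @ W1 + b1)        # hidden activations
        O = sigmoid(H @ W2 + b2)        # network output
        err = T - O                     # difference from the desired output
        # Backpropagate: the difference is used to adjust the parameters.
        dO = err * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 += lr * H.T @ dO; b2 += lr * dO.sum(0)
        W1 += lr * X.T @ dH; b1 += lr * dH.sum(0)
        sse = float((err ** 2).sum())
        if sse < 0.001:                 # converged
            break
    # If the loop ends with sse well above 0.001 but no longer
    # decreasing, training is stuck in a local minimum.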

Background and Methodology

In this project, two approaches are used to solve the local minimum problem. The first approach is output monitoring and modification. In the early experiments, certain patterns observed in the output indicate the occurrence of a local minimum; for example, a wrong output is produced and then remains unchanged for a period of time. Two such patterns are distinguished: the similar error group and the extreme error group.
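
A minimal sketch of such a monitor is shown below; the window length, tolerance and error threshold are hypothetical parameters, not values from the project.

    from collections import deque

    class StagnationMonitor:
        """Flags an output that stays wrong and unchanged, a local-minimum symptom."""
        def __init__(self, window=500, tol=1e-4, err_threshold=0.4):
            self.window = window                    # iterations to watch (assumed)
            self.tol = tol                          # "unchanged" tolerance (assumed)
            self.err_threshold = err_threshold      # "wrong" threshold (assumed)
            self.history = deque(maxlen=window)

        def update(self, output, target):
            self.history.append(output)
            if len(self.history) < self.window:
                return False
            drift = max(self.history) - min(self.history)
            wrong = abs(target - output) > self.err_threshold
            # A wrong output that has not moved for a whole window
            # suggests the training is trapped in a local minimum.
            return wrong and drift < self.tol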

Since training can escape from a local minimum by monitoring the output of the neural network and making the corresponding modification, the second approach is based on Meta training: the training learns to escape the local minimum by restarting with a different set of parameters, so that it reaches its destination by a different path. With these approaches, training achieves a success rate of over 90 percent on most of the problem sets. The methodology is detailed below.
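
A minimal sketch of the restart idea behind Meta training, assuming a hypothetical train() function that returns the trained weights and the final error; the retry limit and the reinitialisation scheme are illustrative.

    import numpy as np

    def meta_train(train, n_weights, max_restarts=10, target_error=0.001, seed=0):
        """Restart training from a different starting point whenever it gets stuck."""
        rng = np.random.default_rng(seed)
        for attempt in range(max_restarts):
            weights = rng.normal(scale=0.5, size=n_weights)  # a different parameter set
            weights, error = train(weights)                  # hypothetical trainer
            if error < target_error:
                return weights, attempt + 1                  # escaped via a different path
        raise RuntimeError("all restarts trapped in local minima")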

Implementation

The similar error group problem occurs when a group of two or more output nodes have similar error magnitudes but target values of opposite sign, so the corresponding weight updates tend to cancel one another. According to the experimental results, this problem is solved using the Weight Update Rule (MGF); a sketch is given below.
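
MGF here plausibly refers to the Magnified Gradient Function rule; the following is a minimal sketch under the assumption that the sigmoid derivative term o(1-o) in the output delta is magnified by raising it to the power 1/S with S >= 1, so that the gradient remains usable where the plain derivative is nearly zero. With S = 1 this reduces to standard backpropagation.

    def mgf_delta(output, target, S=2.0):
        """Output-layer delta with the sigmoid derivative magnified.

        Assumption: MGF raises o*(1-o) to the power 1/S (S >= 1); the
        value of S here is illustrative, not taken from the project.
        """
        deriv = (output * (1.0 - output)) ** (1.0 / S)
        return (target - output) * deriv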

The extreme error problem occurs when one or more output nodes have an error of extreme magnitude. Since the sigmoid function is used as the activation function, the outputs of the network lie in the range 0 to 1, and the endpoints 0 and 1 are known as extreme values. An extreme error arises when an output node has settled at one extreme value while its target is the other extreme; at such a point the sigmoid derivative is almost zero, so the weight change nearly vanishes. A new method is proposed specifically to solve this type of error using output monitoring and modification: it first identifies the extreme-error output through monitoring, and then substitutes a temperate value to make the weight change significant through modification. A sketch is given below.
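
A minimal sketch of the monitoring-and-modification step, with the extremeness threshold and the substituted temperate value as illustrative assumptions:

    def moderated_delta(output, target, extreme=0.05, moderate=0.5):
        """Detect an extreme-error output and moderate it before the update.

        extreme: how close to 0 or 1 counts as an extreme value (assumed).
        moderate: temperate value used in the derivative term (assumed).
        """
        at_zero = output < extreme and target > 1.0 - extreme
        at_one = output > 1.0 - extreme and target < extreme
        if at_zero or at_one:
            # Extreme error: o*(1-o) ~ 0 would freeze learning, so a
            # temperate value is used to make the weight change significant.
            o = moderate
        else:
            o = output
        return (target - output) * o * (1.0 - o)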

The extreme error is thus handled by the integrated algorithm. In the experiments, Rprop is used as the base learning algorithm to learn the patterns and is compared with Meta training on the 5-bit counting problem; a sketch of the Rprop update follows.
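
For reference, a minimal sketch of an Rprop step (the iRprop- variant with the standard constants; the project's exact configuration is not given in the source):

    import numpy as np

    def rprop_step(w, grad, prev_grad, step,
                   eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
        """One Rprop update: adapt each step size from the sign of the gradient."""
        sign_change = grad * prev_grad
        step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
        grad = np.where(sign_change < 0, 0.0, grad)   # skip update after a sign flip
        w = w - np.sign(grad) * step                  # move against the gradient
        return w, grad, step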

Evaluation

The integrated algorithm is used to learn all the problem sets, and its performance is compared with other learning algorithms, including MGF, Quickprop and Rprop. The results are based on 100 different weight files, with 100000 iterations allowed per run, ensuring that a failed run can only be attributed to the local minimum problem rather than to an insufficient number of iterations. A training run is considered successful if the error falls below 0.001. A sketch of this protocol and the tabulated results follow.
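
A minimal sketch of the evaluation protocol, assuming a hypothetical run(seed, max_iters) trainer that returns the final error and the iteration count:

    def evaluate(run, n_weight_files=100, max_iters=100000, target_error=0.001):
        """Aggregate convergence statistics over many initial weight files."""
        iterations = []
        for seed in range(n_weight_files):
            error, iters = run(seed, max_iters)
            if error < target_error:              # successful training
                iterations.append(iters)
        convergence_pct = 100.0 * len(iterations) / n_weight_files
        avg_rate = sum(iterations) / len(iterations) if iterations else None
        return avg_rate, convergence_pct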

Problem set       Learning algorithm     Average convergence rate (iterations)   Convergence percentage
XOR               MGF                    1925.91                                  99%
                  Quickprop              68.36                                    55%
                  Rprop                  53.15                                    46%
                  Integrated algorithm   1970.23                                  100%
Parity            MGF                    6699.48                                  100%
                  Quickprop              84.9                                     100%
                  Rprop                  117.38                                   71%
                  Integrated algorithm   2522.97                                  100%
Regression        MGF                    11850.62                                 100%
                  Quickprop              6603.11                                  99%
                  Rprop                  2961.55                                  96%
                  Integrated algorithm   3413.8                                   100%
5-Bit Counting    MGF                    684.5                                    2%
                  Quickprop              446.19                                   41%
                  Rprop                  441.62                                   8%
                  Integrated algorithm   1195.97                                  100%
Wine              MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  735.91                                   49%
                  Integrated algorithm   9433.65                                  100%
Breast Cancer     MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  1130.0                                   1%
                  Integrated algorithm   1653.02                                  100%
Iris              MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  3286.81                                  11%
                  Integrated algorithm   16257.91                                 100%
Thyroid           MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  327.07                                   98%
                  Integrated algorithm   366.36                                   100%

With the integrated algorithm, every problem set reaches 100% convergence, since the local minimum problem is handled explicitly. The average convergence rate (in iterations) is relatively large because some of the iterations during training are spent locating the local minimum problem and applying the corresponding optimization.

The integrated algorithm is further evaluated by testing. Since a training run may be completed by Rprop alone without the algorithm ever intervening, only the weight files for which the algorithm was actually used to complete the training are considered. The Wine, Breast Cancer and Iris testing sets are used, and the results are measured by the agreement between the target output and the actual output (a sketch of the correctness measure follows the table). The results are shown below.

Testing set      Percentage of correctness
Wine             98.14%
Breast Cancer    96.0%
Iris             97.03%
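
A minimal sketch of the correctness measure, assuming one-hot class targets compared by the winning output node (the exact decision rule is not given in the source):

    import numpy as np

    def percentage_correct(actual, target):
        """Share of test patterns whose predicted class matches the target class."""
        predicted = np.argmax(actual, axis=1)   # winning output node per pattern
        expected = np.argmax(target, axis=1)
        return 100.0 * np.mean(predicted == expected)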

From these results, it can be concluded that the integrated algorithm is a valid algorithm for neural networks that helps to improve the performance of the learning process.

Conclusion and Future Development

An integrated algorithm consisting of Meta training, solutions for both types of local minimum, and deterministic optimization is proposed. Since each type of local minimum is handled individually, the training process achieves a success rate of over 90 percent.

As the causes of the local minimum problem have been explored, future development of neural networks is recommended to focus on a weight update rule that is inherently free of the local minimum problem. This could be done by preventing the calculation errors that cause the training error to be neglected during the training process. In addition, the relationship between the patterns in the problem sets and the local minimum problem can be investigated in order to predict which type of local minimum problem will occur.

Copyright Tse Hau Ting and Vanessa Ng 2011