School of Science and Technology 科技學院
Computing Programmes 電腦學系

On Solving the Local Minimum Problem in Feed Forward Neural Networks

TSE Hau Ting Eric

Programme Bachelor of Computing with Honours in Internet Technology
Supervisor Dr. Vanessa Ng
Areas Neural Networks, Algorithms
Year of Completion 2011
Awards Received IEEE HK Section Student Paper Contest 2011 Undergraduate Champion

Objectives

The aim of this project is to investigate the local minimum problem in neural networks and to design an algorithm to solve it. Neural networks are widely used in pattern recognition and data classification, capabilities that are acquired through training. Training mainly involves comparing the output of the neural network with the desired output and using the difference to adjust the parameters so as to minimize the error. However, the performance of training is often degraded by the local minimum problem, and training may even fail outright: the error stops decreasing while it is still far from zero, leaving the network at a suboptimal solution.
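
To make this concrete, the following is a minimal sketch (not the project's code) of sigmoid backpropagation on the XOR problem; the network size, learning rate and random initialisation are illustrative assumptions. A run that stalls with the error well above the target and no longer decreasing is exhibiting the local minimum problem.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # XOR problem: 2 inputs, 2 hidden units, 1 output (sizes are illustrative).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)
    W2 = rng.normal(scale=0.5, size=(2, 1)); b2 = np.zeros(1)
    lr = 0.5

    for epoch in range(100000):
        H = sigmoid(X @ W1 + b1)        # hidden activations
        O = sigmoid(H @ W2 + b2)        # network output
        err = T - O                     # difference from the desired output
        # Backpropagate: the difference is used to adjust the parameters.
        dO = err * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 += lr * H.T @ dO; b2 += lr * dO.sum(0)
        W1 += lr * X.T @ dH; b1 += lr * dH.sum(0)
        sse = float((err ** 2).sum())
        if sse < 0.001:                 # converged
            break
    # If the loop ends with sse well above 0.001 but no longer
    # decreasing, training is stuck in a local minimum.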

Background and Methodology

In this project, two approaches are used to solve the local minimum problem. The first approach is output monitoring and modification. In the early experiments, certain patterns observed in the output indicate the occurrence of a local minimum; for example, a wrong output is produced and then remains unchanged for a period of time. Two such patterns are distinguished: the similar error group and the extreme error group.
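
A minimal sketch of such a monitor is shown below; the window length, tolerance and error threshold are hypothetical parameters, not values from the project.

    from collections import deque

    class StagnationMonitor:
        """Flags an output that stays wrong and unchanged, a local-minimum symptom."""
        def __init__(self, window=500, tol=1e-4, err_threshold=0.4):
            self.window = window                    # iterations to watch (assumed)
            self.tol = tol                          # "unchanged" tolerance (assumed)
            self.err_threshold = err_threshold      # "wrong" threshold (assumed)
            self.history = deque(maxlen=window)

        def update(self, output, target):
            self.history.append(output)
            if len(self.history) < self.window:
                return False
            drift = max(self.history) - min(self.history)
            wrong = abs(target - output) > self.err_threshold
            # A wrong output that has not moved for a whole window
            # suggests the training is trapped in a local minimum.
            return wrong and drift < self.tol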

Since training can escape from a local minimum by monitoring the output of the neural network and making the corresponding modification, the second approach is based on Meta training: the training learns to escape the local minimum by restarting with a different set of parameters, so that it reaches its destination by a different path. With these approaches, training achieves a success rate of over 90 percent on most of the problem sets. The methodology is detailed below.
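
A minimal sketch of the restart idea behind Meta training, assuming a hypothetical train() function that returns the trained weights and the final error; the retry limit and the reinitialisation scheme are illustrative.

    import numpy as np

    def meta_train(train, n_weights, max_restarts=10, target_error=0.001, seed=0):
        """Restart training from a different starting point whenever it gets stuck."""
        rng = np.random.default_rng(seed)
        for attempt in range(max_restarts):
            weights = rng.normal(scale=0.5, size=n_weights)  # a different parameter set
            weights, error = train(weights)                  # hypothetical trainer
            if error < target_error:
                return weights, attempt + 1                  # escaped via a different path
        raise RuntimeError("all restarts trapped in local minima")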

Implementation

The similar error group problem occurs when a group of two or more output nodes have similar error magnitudes but target values of opposite sign, so the corresponding weight updates tend to cancel one another. According to the experimental results, this problem is solved using the Weight Update Rule (MGF); a sketch is given below.
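
MGF here plausibly refers to the Magnified Gradient Function rule; the following is a minimal sketch under the assumption that the sigmoid derivative term o(1-o) in the output delta is magnified by raising it to the power 1/S with S >= 1, so that the gradient remains usable where the plain derivative is nearly zero. With S = 1 this reduces to standard backpropagation.

    def mgf_delta(output, target, S=2.0):
        """Output-layer delta with the sigmoid derivative magnified.

        Assumption: MGF raises o*(1-o) to the power 1/S (S >= 1); the
        value of S here is illustrative, not taken from the project.
        """
        deriv = (output * (1.0 - output)) ** (1.0 / S)
        return (target - output) * deriv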

The extreme error problem occurs when one or more output nodes have an error of extreme magnitude. Since the sigmoid function is used as the activation function, the outputs of the network lie in the range 0 to 1, and the endpoints 0 and 1 are known as extreme values. An extreme error arises when an output node has settled at one extreme value while its target is the other extreme; at such a point the sigmoid derivative is almost zero, so the weight change nearly vanishes. A new method is proposed specifically to solve this type of error using output monitoring and modification: it first identifies the extreme-error output through monitoring, and then substitutes a temperate value to make the weight change significant through modification. A sketch is given below.
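
A minimal sketch of the monitoring-and-modification step, with the extremeness threshold and the substituted temperate value as illustrative assumptions:

    def moderated_delta(output, target, extreme=0.05, moderate=0.5):
        """Detect an extreme-error output and moderate it before the update.

        extreme: how close to 0 or 1 counts as an extreme value (assumed).
        moderate: temperate value used in the derivative term (assumed).
        """
        at_zero = output < extreme and target > 1.0 - extreme
        at_one = output > 1.0 - extreme and target < extreme
        if at_zero or at_one:
            # Extreme error: o*(1-o) ~ 0 would freeze learning, so a
            # temperate value is used to make the weight change significant.
            o = moderate
        else:
            o = output
        return (target - output) * o * (1.0 - o)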

The extreme error is thus handled by the integrated algorithm. In the experiments, Rprop is used as the base learning algorithm to learn the patterns and is compared with Meta training on the 5-bit counting problem; a sketch of the Rprop update follows.
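
For reference, a minimal sketch of an Rprop step (the iRprop- variant with the standard constants; the project's exact configuration is not given in the source):

    import numpy as np

    def rprop_step(w, grad, prev_grad, step,
                   eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
        """One Rprop update: adapt each step size from the sign of the gradient."""
        sign_change = grad * prev_grad
        step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
        grad = np.where(sign_change < 0, 0.0, grad)   # skip update after a sign flip
        w = w - np.sign(grad) * step                  # move against the gradient
        return w, grad, step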

Evaluation

The integrated algorithm is used to learn all the problem sets, and its performance is compared with other learning algorithms, including MGF, Quickprop and Rprop. The results are based on 100 different weight files, with 100000 iterations allowed per run, ensuring that a failed run can only be attributed to the local minimum problem rather than to an insufficient number of iterations. A training run is considered successful if the error falls below 0.001. A sketch of this protocol and the tabulated results follow.
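
A minimal sketch of the evaluation protocol, assuming a hypothetical run(seed, max_iters) trainer that returns the final error and the iteration count:

    def evaluate(run, n_weight_files=100, max_iters=100000, target_error=0.001):
        """Aggregate convergence statistics over many initial weight files."""
        iterations = []
        for seed in range(n_weight_files):
            error, iters = run(seed, max_iters)
            if error < target_error:              # successful training
                iterations.append(iters)
        convergence_pct = 100.0 * len(iterations) / n_weight_files
        avg_rate = sum(iterations) / len(iterations) if iterations else None
        return avg_rate, convergence_pct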

Problem set       Learning algorithm     Average convergence rate (iterations)   Convergence percentage
XOR               MGF                    1925.91                                  99%
                  Quickprop              68.36                                    55%
                  Rprop                  53.15                                    46%
                  Integrated algorithm   1970.23                                  100%
Parity            MGF                    6699.48                                  100%
                  Quickprop              84.9                                     100%
                  Rprop                  117.38                                   71%
                  Integrated algorithm   2522.97                                  100%
Regression        MGF                    11850.62                                 100%
                  Quickprop              6603.11                                  99%
                  Rprop                  2961.55                                  96%
                  Integrated algorithm   3413.8                                   100%
5-Bit Counting    MGF                    684.5                                    2%
                  Quickprop              446.19                                   41%
                  Rprop                  441.62                                   8%
                  Integrated algorithm   1195.97                                  100%
Wine              MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  735.91                                   49%
                  Integrated algorithm   9433.65                                  100%
Breast Cancer     MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  1130.0                                   1%
                  Integrated algorithm   1653.02                                  100%
Iris              MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  3286.81                                  11%
                  Integrated algorithm   16257.91                                 100%
Thyroid           MGF                    N/A                                      0%
                  Quickprop              N/A                                      0%
                  Rprop                  327.07                                   98%
                  Integrated algorithm   366.36                                   100%

With the integrated algorithm, every problem set reaches 100% convergence, since the local minimum problem is handled explicitly. The average convergence rate (in iterations) is relatively large because some of the iterations during training are spent locating the local minimum problem and applying the corresponding optimization.

The integrated algorithm is further evaluated by testing. Since a training run may be completed by Rprop alone without the algorithm ever intervening, only the weight files for which the algorithm was actually used to complete the training are considered. The Wine, Breast Cancer and Iris testing sets are used, and the results are measured by the agreement between the target output and the actual output (a sketch of the correctness measure follows the table). The results are shown below.

Testing set      Percentage of correctness
Wine             98.14%
Breast Cancer    96.0%
Iris             97.03%
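
A minimal sketch of the correctness measure, assuming one-hot class targets compared by the winning output node (the exact decision rule is not given in the source):

    import numpy as np

    def percentage_correct(actual, target):
        """Share of test patterns whose predicted class matches the target class."""
        predicted = np.argmax(actual, axis=1)   # winning output node per pattern
        expected = np.argmax(target, axis=1)
        return 100.0 * np.mean(predicted == expected)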

From these results, it can be concluded that the integrated algorithm is a valid algorithm for neural networks that helps to improve the performance of the learning process.

Conclusion and Future Development

An integrated algorithm consisting of Meta training, solutions for both types of local minimum, and deterministic optimization is proposed. Since each type of local minimum is handled individually, the training process achieves a success rate of over 90 percent.

As the causes of the local minimum problem have been explored, future development of neural networks is recommended to focus on a weight update rule that is inherently free of the local minimum problem. This could be done by preventing the calculation errors that cause the training error to be neglected during the training process. In addition, the relationship between the patterns in the problem sets and the local minimum problem can be investigated in order to predict which type of local minimum problem will occur.

Copyright Tse Hau Ting and Vanessa Ng 2011