Complex Neural Fuzzy Prediction Using Multi-Swarm Continuous Ant Colony Optimization

Prediction of time series is one of the major research subjects in data science. This paper proposes a novel approach to the problem of multiple-target prediction. The proposed approach is composed of three main parts: the complex neuro-fuzzy system (CNFS) built using complex fuzzy sets, a two-stage feature selection method for multiple targets, and a hybrid machine learning method that combines multi-swarm continuous ant colony optimization (MCACO) with recursive least squares estimation (RLSE). The CNFS predictive model is responsible for prediction after training. During training, the parameters are updated by the MCACO method and the RLSE method, which work cooperatively as a single machine learning procedure. In the predictive model, complex fuzzy sets (CFSs) have complex-valued membership degrees within the unit disk of the complex plane, which benefits the nonlinear mapping ability of the CNFS model for multiple-target prediction. This CFS property stands in contrast to the real-valued membership degrees in the unit interval [0,1] of traditional fuzzy sets. The two-stage feature selection is applied to select significant features as the inputs to the model for multiple-target prediction. Experiments for the prediction of multiple targets have been conducted using real-world data sets obtained from stock markets. Based on the results and performance comparison, the proposed approach shows outstanding performance over the compared methods.


Introduction
Prediction of time series is one of the major research subjects in data science. As data grows massively, effective prediction becomes much more important. A novel approach is therefore proposed to make multiple-target prediction effectively. It uses a complex neuro-fuzzy system whose parameters are determined by the proposed MCACO-RLSE method, which integrates the multi-swarm continuous ant colony optimization (MCACO) algorithm and the well-known recursive least squares estimation (RLSE) method.
In addition, a prediction system is developed to demonstrate the performance. The prediction system adopts a neuro-fuzzy system composed of several fuzzy if-then rules embedded in a neural network structure, where complex fuzzy sets and Takagi-Sugeno linear functions are used to define the premises and the consequents of the rules, respectively. The adopted system is termed the Takagi-Sugeno complex neuro-fuzzy system (TS-CNFS for short). All parameters in the if parts are optimized by the MCACO method and those in the then parts are estimated by the RLSE method. MCACO is a form of continuous ant colony optimization (CACO) in which multiple ant colonies are used. Furthermore, a two-stage feature selection is applied to select the input data that most affect the multiple targets. The MCACO-RLSE method associates the two methods so that they determine all involved parameters separately and cooperatively. As a result, MCACO-RLSE performs the optimization effectively in a greatly reduced solution space.
The computation time is also decreased significantly. In addition, the two-stage feature selection presented in this paper selects only a few inputs to make multiple-target predictions.

In this paper, practical stock prices and indices are chosen as case studies. Stock market forecasting is an important investment issue. In 1982, Engle [1] proposed the ARCH model, which successfully estimated the variance of economic inflation. In 1986, Bollerslev [2] proposed the GARCH model, which generalized the ARCH model. There are many implicit and complex factors that make stock market forecasting difficult, so machine learning technologies were introduced. In 1943, McCulloch and Pitts proposed modeling a neuron as a switch that connects to other neurons when turned on and disconnects otherwise. In 1959, Widrow & Hoff [3] developed the adaptive linear neuron (ADALINE) model; after training, this model is capable of solving practical problems such as weather forecasting. In 1990, Kimoto et al. [4] applied neural networks to stock market prediction. In 2000, Kim & Han [5] predicted the stock market using a neural system optimized by genetic algorithms. In 2007, Roh [6] applied neural networks to forecasting the volatility of a stock price index.

Stock market forecasting is a real-valued problem, although early simple neural networks dealt with binary data. In 1965, Zadeh [7] proposed the concept of fuzzy sets, converting two-valued membership in {0,1} to continuous membership in [0,1] through membership functions. In 2002, Ramot [8] proposed the advanced concept of complex fuzzy sets, expanding membership from the continuous interval [0,1] to the unit disk of the complex plane; the difference between the two lies in the phase dimension of the complex-valued membership introduced by complex fuzzy sets. The adaptive TS-CNFS is based on the Takagi-Sugeno fuzzy system first proposed by Takagi & Sugeno [9] in 1985. In 2013, Li & Chiang [10] developed an advanced CNFS system with ARIMA (denoted CNFS-ARIMA) that successfully implemented dual-output forecasting and performed excellently. The CNFS-ARIMA system adopted Gaussian functions and their derivatives to implement the complex-valued membership functions of complex fuzzy sets, and the same design is therefore used in this paper.

The MCACO method is an algorithm with multiple ant colonies based on continuous ant colony optimization (CACO). In 1991, Dorigo [11] proposed ant colony optimization (ACO) for the traveling salesman problem, whose solution space is discrete. To deal with problems in a continuous solution space, the method of CACO was further developed in 2008 [12]. The CACO method uses probability densities to describe how ants select their own routes in a continuous space when searching for food. This paper presents the MCACO algorithm hybridized with the RLSE method to boost the machine learning ability of searching for solutions for the proposed TS-CNFS, which is used to forecast the stock market. Data collected in the real world may contain redundant or noisy records that can easily cause false predictions. Feature selection is a way to select the most relevant data as the inputs for effective prediction. There are several methods for feature selection; this paper presents a two-stage feature selection based on the theory of Shannon information entropy.
In 1949, Shannon [13] proposed the theory of information entropy, which quantifies the level of uncertainty of information. When the data take real values, probability density is used instead of probability. In this paper, the method of kernel density estimation is adopted for evaluating the probability densities [14][15][16]. A brief sketch of such an estimation is given below.
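For concreteness, the following minimal Python sketch shows how a probability density and the resulting differential entropy could be approximated from samples with a Gaussian kernel density estimate; the function name, grid settings, and synthetic data are illustrative only and are not taken from the paper.

import numpy as np
from scipy.stats import gaussian_kde

def differential_entropy(samples, n_grid=512):
    """Approximate H(Y) = -integral p(y) log p(y) dy from 1-D samples via
    Gaussian kernel density estimation and numerical integration."""
    samples = np.asarray(samples, dtype=float)
    kde = gaussian_kde(samples)                       # bandwidth chosen by Scott's rule
    pad = 3.0 * samples.std()                         # extend the grid beyond the data range
    grid = np.linspace(samples.min() - pad, samples.max() + pad, n_grid)
    p = np.clip(kde(grid), 1e-300, None)              # avoid log(0)
    return -np.trapz(p * np.log(p), grid)

# Example: entropy of daily index differences (synthetic stand-in data)
rng = np.random.default_rng(0)
print(differential_entropy(rng.normal(0.0, 1.0, 1000)))   # close to 1.42 for a unit Gaussian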
The rest of the paper is organized as follows. Section 2 describes the proposed TS-CNFS predictive system and the novel MCACO-RLSE method for machine learning. Section 3 describes the proposed two-stage feature selection across multiple targets. Section 4 demonstrates the experimentation on real-world data sets using the proposed approach, and compares the proposed approach with other methods in the literature.
Finally, the paper is summarized with conclusions.

Predictive Model: TS-CNFS
The kernel of the prediction system is a Takagi-Sugeno complex neuro-fuzzy system (TS-CNFS) used as the predictive model. When expanded, it contains six neural-network layers, as illustrated in Figure 1: the layers of input, complex fuzzy sets, activation (if parts), normalization, consequents (then parts), and output. Each layer is formed by artificial neurons (termed nodes for simplicity). Nodes in each layer have specific functionality, explained below.

Layer 1 receives the inputs of the predictive model. Layer 2 contains the nodes of the complex fuzzy sets; these nodes are termed the complex fuzzy-set nodes. Each input to the predictive model corresponds to a certain number of nodes (that is, a group of nodes) in this layer. More specifically, the group size of nodes for one input can differ from that of another. Note that to form an if part (also termed a premise) with M conditions, each input provides one node from its group of nodes in Layer 2. An if part is regarded as a node in Layer 3.
Each of the M conditions of an if part can be expressed by a simple proposition of the form "x_i is A_i^j," where x_i is a linguistic variable and A_i^j is a linguistic value. As a result, an if part is expressed by a compound proposition of the form "x_1 is A_1^j and x_2 is A_2^j and ... and x_M is A_M^j," where A_i^j is the jth linguistic value of the ith variable and x_i ∈ R is its generic numerical (base) variable. This paper presents the following membership function design for complex fuzzy sets.
where {m, σ, λ} are the parameters {center, spread, phase factor} of a complex fuzzy set. The class of complex fuzzy sets defined by equation (2) is termed Gaussian complex fuzzy sets [17], because their membership functions share the same basis, using a Gaussian function and its first derivative with respect to x. A minimal illustrative sketch of such a membership function is given below.
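The following Python sketch shows one plausible realization of a Gaussian complex fuzzy set. The exact form of equation (2) is not reproduced in this text, so the way the phase factor λ scales the first derivative of the Gaussian is an assumption made here for illustration; the function and variable names are illustrative only.

import numpy as np

def gaussian_cfs(x, m, sigma, lam):
    """One plausible Gaussian complex fuzzy set (illustrative, not the paper's
    exact equation (2)): amplitude = Gaussian(x; m, sigma), phase driven by the
    Gaussian's first derivative scaled by the phase factor lam."""
    g = np.exp(-0.5 * ((x - m) / sigma) ** 2)        # Gaussian amplitude in [0, 1]
    dg = -(x - m) / sigma ** 2 * g                    # first derivative with respect to x
    omega = lam * dg                                  # phase term
    return g * np.exp(1j * omega)                     # complex membership within the unit disk

mu = gaussian_cfs(x=1.2, m=1.0, sigma=0.5, lam=2.0)
print(abs(mu), np.angle(mu))   # the amplitude never exceeds 1, so the value stays in the unit disk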
In general, an if part together with a then part (consequent) makes an if-then rule. The nodes of Layer 3 represent the K if parts in total, and therefore K if-then rules are formed. The Takagi-Sugeno (TS) type of if-then rule is adopted in this paper. The kth TS rule (for k = 1, 2, ..., K) can be expressed as in equation (4), where x_i is the ith input linguistic variable whose base (numerical) variable is also denoted x_i; A_i^k is the linguistic value of x_i in the if part of the kth rule and is defined using a complex fuzzy set, whose complex-valued membership function is a function of x_i; A_i^k is a linguistic value from the term set given in equation (1). Following the design of equation (2), this membership function is denoted as given below.

where {m_i^k, σ_i^k, λ_i^k} are the parameters of the complex fuzzy set of A_i^k. The then part of the kth rule in equation (5) is arranged as a linear combination of the input information, or other possible forms, depending on the application. We apply the fuzzy-and operation, using the "product" implementation, to aggregate the propositions of the if part of the kth rule in (4).
The output of the kth node of layer 3, which is the activation level vector of the kth if part (or regarded as the rule's firing strength vector for multiple outputs), is given in a vector form below.
where the qth component of the firing strength vector β_k corresponds to the qth output target; all or part of its components are complex-valued.

Layer 4: Performs the normalization of the firing strength vectors.

There are K nodes performing the normalization in this layer, where each node receives all K firing strength vectors from Layer 3. The output of the kth node is in a vector form given below.

The resulting vector β̄_k is the normalized firing strength vector of the kth rule; all of its components remain complex after the normalization of β_k. A brief computational sketch of Layers 3 and 4 is given below.
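The following minimal Python sketch illustrates the product fuzzy-AND aggregation of Layer 3 and one common way to normalize the firing strengths in Layer 4. For brevity it uses a single (scalar) firing strength per rule instead of the paper's per-output firing strength vector, and all names are illustrative.

import numpy as np

def firing_strengths(memberships):
    """Layer 3: product fuzzy-AND over the M complex membership degrees of each rule.
    memberships: complex array of shape (K, M), one row per rule."""
    return np.prod(memberships, axis=1)               # shape (K,)

def normalize(beta):
    """Layer 4: one common normalization, dividing each firing strength by the
    sum over all K rules (the values are kept complex)."""
    return beta / np.sum(beta)

K, M = 3, 2
mem = np.exp(1j * 0.1) * np.random.rand(K, M)          # toy complex membership degrees
beta_bar = normalize(firing_strengths(mem))
print(np.sum(beta_bar))                                # sums to 1 (up to rounding)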

Layer 5:
Performs the TS then parts. There are K nodes for this purpose in this layer. The output of the kth node is in a vector form given below.

Layer 6: Produces the model output of equation (10), where ŷ_q is the qth component of the model output vector ŷ. As shown in equation (10), the model output aggregates the normalized then parts of all K rules. The flowchart of the prediction system is shown in Figure 2.

Multi-swarm continuous ant colony optimization (MCACO)

The MCACO method searches the premise parameter space with multiple ant colonies. There are in total h ants in a colony, and q is a learning parameter between 0 and 1. The larger the weighting of the lth ant, the better the solution that this ant represents. The selection probability is then calculated from these weightings, as in equation (8).
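The weighting and selection scheme described above follows the continuous ACO of [12]. A minimal Python sketch is given below, assuming the standard rank-based Gaussian weighting of that scheme; the paper's exact equations are not reproduced, and the names are illustrative.

import numpy as np

def acor_weights(h, q):
    """Rank-based weights of a continuous-ACO solution archive: the l-th best of
    h solutions receives a Gaussian weight controlled by the learning parameter q."""
    l = np.arange(1, h + 1)                                   # ranks 1..h (1 = best)
    return np.exp(-((l - 1) ** 2) / (2.0 * (q * h) ** 2)) / (q * h * np.sqrt(2 * np.pi))

def selection_probabilities(h, q):
    """Probability of choosing the l-th ranked solution to guide the sampling of new ants."""
    w = acor_weights(h, q)
    return w / w.sum()

p = selection_probabilities(h=10, q=0.2)
print(p[:3], p.sum())   # better-ranked solutions get larger probabilities; the probabilities sum to 1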

Recursive least squares estimation (RLSE) for multiple targets
The RLSE method is devoted to estimating all consequent parameters, where x is the input to the model and y is the target. In matrix notation, equation (18) can be rewritten in the matrix form given below, where the consequent parameters of the predictive model are arranged into a vector that is regarded as the vector θ. The transpose of b(i+1) in the RLSE equations is the ith row block of the regressor matrix. Note that the RLSE given above deals with multiple targets simultaneously and recursively. A minimal sketch of this recursion is given below.
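For concreteness, the following Python sketch shows the standard RLSE recursion, handling multiple targets by stacking them as columns of the target matrix. It assumes a consequent model that is linear in its parameters; the variable names and the toy data are illustrative rather than the paper's notation.

import numpy as np

def rlse(B, Y, alpha=1e6):
    """Recursive least squares estimation of Theta in B @ Theta ~ Y.
    B: (N, d) matrix whose rows are the regressor vectors b(t)^T.
    Y: (N, n_out) targets, one column per target."""
    d, n_out = B.shape[1], Y.shape[1]
    P = alpha * np.eye(d)                    # large initial covariance
    Theta = np.zeros((d, n_out))
    for t in range(B.shape[0]):
        b = B[t:t + 1].T                     # column vector, shape (d, 1)
        y = Y[t:t + 1].T                     # shape (n_out, 1)
        P = P - (P @ b @ b.T @ P) / (1.0 + b.T @ P @ b)
        Theta = Theta + P @ b @ (y.T - b.T @ Theta)
    return Theta

# Toy check: recover a known linear map for two targets simultaneously.
rng = np.random.default_rng(1)
B = rng.normal(size=(200, 4))
true = rng.normal(size=(4, 2))
print(np.allclose(rlse(B, B @ true), true, atol=1e-3))   # expected: True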
For the implementation of the RLSE on the predictive model with the training data set Ω in the learning process, the regressor vector b(t+1) is given below.

where the index k indicates the kth if part (k = 1, 2, ..., K) and the index (t+1) indicates the (t+1)th recursive iteration; x_j is the jth component (j = 1, 2, ..., M) of the input vector given in (11); {M, Q, K} are the numbers of inputs, outputs, and if-then rules of the predictive model, respectively. Note that when equation (26) is compared with (27) for the dimension of b(t+1), the equality holds. The vector of consequent parameters of the predictive model is given as follows.

The hybrid MCACO-RLSE learning procedure is summarized in the following steps.

Step 1. Initialize the setting of MCACO-RLSE and start the MCACO.
Step 2. Calculate membership degree vectors of complex fuzzy sets in layer 2. Obtain the firing strength vectors in layer 3 and do the normalization in layer 4 for these vectors of the predictive complex neuro-fuzzy model.
Step 3. Update the consequent parameters in layer 5 of the model by the RLSE.
Step 4. Compute the model output and evaluate the cost of the current parameter setting.
Step 5. Update the premise parameters in layer 2 of the model by the MCACO.
Step 6. Check the stopping conditions; if any is satisfied, stop. Otherwise go back to Step 2 and continue the procedure.

Feature Selection Crossing Multiple Targets
Feature selection is a method of selecting a compact set of important features from candidate features. These selected features are used as the inputs to the prediction system with multiple targets. In this paper, the inputs and outputs of the prediction system are the difference values of real observations. For multiple-target prediction, a two-stage feature selection is proposed: useful features for each single target are determined explicitly in the first stage, and, based on those determined in stage 1, the features significant to all targets are selected in the second stage.
Suppose there is a set of candidate features to be selected in the procedure of feature selection in which multiple targets are considered. Let the candidate features be placed in a pool called the candidate feature pool, denoted C_P, given below.
where x_k is the kth candidate feature of C_P and y_i is the ith target in the set T, whose size is denoted |T|. All the candidate features of C_P and the targets of T are regarded as random variables, and p(y_i) is the probability density distribution of y_i. From the candidate feature pool C_P, a feature X is considered. Let the feature X be separated into two parts X+ and X−, where X+ and X− denote the positive and negative parts of X, respectively; that is, the values of X+ are positive and those of X− are non-positive. Let H_X+(y_i) denote the entropy of target y_i conditioned by feature X+. According to Shannon's information theory [13], the mutual information between X+ and y_i is given as I(X+, y_i) = H(y_i) − H_X+(y_i).

Now another notion of information, called the influence information, is defined for the influence made by feature X on target y_i. This influence information is denoted I_{X→y_i} and is given in equation (31) below, where p(x) is the probability density distribution of x, a generic event of feature X. Note that the distributions p(x) and p(y_i) can be estimated by kernel density estimation [14]. Note also that the integrations indicate the probability of (X+, y_i) and the probability of (X−, y_i), respectively. In contrast to the symmetry of Shannon's mutual information, the influence information given in (31) is asymmetric, that is, I_{X→y_i} is in general not equal to I_{y_i→X}. A computational sketch of this quantity is given below.
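The following minimal Python sketch shows one way the influence information could be estimated from samples. Because equation (31) is not reproduced in this text, the weighting of the two parts by their empirical probabilities is an assumption made for illustration, and the entropy estimator is simply passed in (for example, the KDE-based sketch given earlier); all names are illustrative.

import numpy as np

def influence_information(x, y, entropy_fn):
    """Illustrative estimate of the influence information of feature X on target y:
    mutual information is computed on the X>0 and X<=0 parts of the feature and
    weighted by their empirical probabilities (an assumption, not the paper's
    exact equation (31)).
    entropy_fn: an entropy estimator, e.g. the KDE-based sketch given earlier."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    h_y = entropy_fn(y)
    info = 0.0
    for mask in (x > 0, x <= 0):
        if mask.sum() > 1:
            info += mask.mean() * (h_y - entropy_fn(y[mask]))   # P(part) * [H(y) - H_part(y)]
    return info

# Asymmetry: influence_information(x, y, f) generally differs from
# influence_information(y, x, f), unlike Shannon's mutual information.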
The first-stage procedure of feature selection is given below.

Step 1. Prepare for each target of T a pool to contain the features selected from C_P. For target y_i of T, this pool is termed its selected-features pool, denoted S_yi, for i = 1, 2, ..., |T|. Initially S_yi is empty.
Step 2. Let C_yi be the pool containing all the candidate features for target y_i; initially C_yi = C_P.
Step 3. Calculate the mutual information for y_i, from which the influence information of each candidate feature of C_yi is computed. For each candidate feature, calculate its selection gain. The selection gain of a feature of C_yi, measuring its contribution to target y_i, is defined as follows.
Step 4. Check for the candidate feature whose selection gain is the largest over all other ones in C_yi. If this gain is positive, update the pools C_yi and S_yi as follows.
Step 5. If C_yi is empty, go to Step 6. Otherwise, go back to Step 3.
Step 6. If i ≠ |T|, update i = i + 1 to get the next target and go back to Step 2. Otherwise, the first stage is finished and the second stage begins.

In the second stage, let Ω denote the union of the selected-features pools of all targets. Test the features in Ω for the following condition.
where the condition collects each feature of F_P that contributes to all the targets given in the set T. The features of F_P are selected as the inputs to the prediction system. In the experiment, the first selected feature comes from the features of TAIEX and the other two from SSECI. A sketch of this second-stage test is given below.
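A minimal Python sketch of the second-stage test follows, under the assumption that a feature passes the condition only if it appears in the first-stage pool of every target; the pool contents shown are hypothetical.

def second_stage(selected_pools):
    """Second stage of feature selection: keep only the features that appear in the
    selected pool of every target (one plausible reading of the condition tested
    over the union Omega of the first-stage pools)."""
    pools = [set(pool) for pool in selected_pools.values()]
    omega = set().union(*pools)                       # union of all first-stage pools
    return sorted(f for f in omega if all(f in p for p in pools))

# Example with hypothetical first-stage results for two targets:
print(second_stage({"TAIEX": {"x1", "x3", "x7"}, "SSECI": {"x3", "x7", "x9"}}))
# -> ['x3', 'x7']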

Results and Discussion
The prediction system thus has a total of 27 Takagi-Sugeno if-then rules with one complex-valued output. Furthermore, this successful performance uses only three inputs, obtained by the proposed feature selection. Note that each of the compared methods made the prediction of TAIEX or SSECI individually. In contrast, the proposed TS-CNFS performed the prediction of TAIEX and SSECI at the same time.

Conclusion
The prediction system using complex fuzzy sets has been presented and evaluated on real-world stock market data. Note that each of the compared methods made the prediction of SSECI, S&P 500 or Dow Jones individually. In contrast, the proposed TS-CNFS performed the prediction of SSECI, S&P 500 and Dow Jones simultaneously.