Stock Market Trend Prediction Model Using Data Mining Techniques

Stock market prediction is essential and of great interest because successful prediction of stock prices may bring significant benefits. The task is highly complex and very difficult. Many researchers have made valiant attempts in data mining to devise an efficient system for stock market movement analysis. This research has developed an efficient approach to stock market trend prediction by employing the Frequent Pattern Growth (FP-Growth) and Fuzzy C-means clustering algorithms. The research was motivated by the need to predict the stock market in order to help investors decide when to buy, sell or hold a stock to make a profit. Firstly, the original stock market data were converted into interpreted historical (financial) data via technical indicators. Based on these technical indicators, the datasets required for analysis were created. Subsequently, the FP-Growth algorithm was used to generate frequent patterns. Based on these frequent patterns, the Fuzzy C-means clustering technique was used to formulate the prediction model. Finally, a classification technique, the K-Nearest Neighbor classifier, was employed to predict stock market trends. The trend predictions were validated with the Hit Ratio evaluation metric to estimate prediction accuracy. A comparative analysis was carried out in which a neural network model was used to benchmark the proposed model. The results showed that the proposed model achieved better accuracy than the neural network model. This paper provides a novel approach which combines the FP-Growth, Fuzzy C-means and K-Nearest Neighbor algorithms for stock market trend prediction.


Introduction
Stock market prediction is one of the most active topics in finance and business. However, the unpredictable nature of the stock market makes it difficult for investors to make profitable investments. Several research efforts have tried to predict the market for profit, using techniques ranging from statistical analysis to technical and fundamental analysis, among others, with varying results. These techniques, however, cannot provide the deeper analysis that is required and are therefore not effective in predicting stock market trends. Finding patterns in the stock market, on the other hand, can provide insight into market behavior, buying or selling habits and the co-movement of stock shares.
This research aims to create an approach that discovers inference knowledge from the relationships among stock index indicators, in order to provide useful information about market trends for investment decisions. Ehsan et al. [1] defined the stock market as a private or public market for trading company stock and derivatives of company stock at an agreed price; these include securities listed on a stock exchange as well as those traded only privately. The stock market is very volatile, and stock prices change almost instantly. It depends strongly on demand and supply: prices are high when demand is high and low when demand is low [2]. Uncertainty about its future state is the main characteristic of every stock market. This feature is undesirable but unavoidable for the investor whenever the stock market is chosen as the investment vehicle. Predicting the stock market is the best option for reducing this uncertainty. Stock market prediction includes uncovering market trends, planning investments and investment strategies, and determining the perfect time to … Due to developments in information technology and software capabilities, gathering sufficient data on stocks traded by the hour and even by the minute has become much easier. This is why data mining techniques have attracted the attention of investors interested in predicting stock market trends.
According to Ravindranath [4], data mining is the process of selecting, exploring and modeling large quantities of data to uncover previously unknown patterns for business and commercial advantage. It allows users to analyze data, categorize it, and summarize the relationships within it. Ehsan et al. [1] defined data mining as the science and technology of exploring data in order to discover previously unknown patterns. Finding frequent patterns plays an essential role in mining associations, correlations and many other interesting relationships among data, and it also supports data indexing, classification, clustering and other data mining tasks. Frequent pattern mining is an important data mining task and a focused theme in data mining research. It was first introduced by Agrawal et al. [5] to find items frequently purchased together by customers in "market basket analysis". It became more popular due to its wide applicability in data analysis fields such as DNA pattern recognition, web data mining, clinical data mining, software bug analysis and stock market analysis. Apriori and FP-Growth are the two most widely used frequent pattern mining algorithms in the literature [6]. The Apriori algorithm, introduced by Agrawal and Srikant [7], uses the apriori property and a candidate generation process to generate association rules. FP-Growth, introduced by Han et al. [8], is a tree-based approach that follows a two-step procedure: it first scans the database to build a Frequent Pattern tree (FP-tree), and then it discovers frequent patterns from the tree without candidate generation. Both Apriori and FP-Growth generate a huge number of frequent patterns, which do not by themselves produce direct knowledge or inference. Fuzzy logic can play an essential role in discovering inference knowledge from this large number of frequent patterns through fuzzy inference, which is the process of formulating a mapping from a given input to an output using fuzzy logic.
This paper adopts the fuzzy c-means clustering approach for matching facts with frequent patterns; the approach performs well for both exact and partial matching when generating inference knowledge. The inference knowledge generated helps investors know when to buy, sell or hold a stock. The main purpose of applying the fuzzy inference approach to association rule mining on stock market data is therefore to discover inference knowledge by analyzing random data and to use that knowledge as a reference during decision making.

Related Works
Stock market prediction is considered a challenging task by both investors and researchers, owing to its profitability and intricate complexity. The predictability of the market has been much discussed by researchers and academics. Several studies have predicted the price direction of a stock using different data mining techniques, with varying results. Sachin et al. [2] worked on an association rule mining model for finding interesting patterns in a stock market dataset. The objective was to predict whether the stock prices of a specific company would go up or down on the next day. The work used the Apriori algorithm to generate frequent patterns from the stock data, and rules were generated from these frequent patterns. However, the predictions were not accurate because the generated rules were too ambiguous.
Krittithee and Pakorn [9] applied association rule mining to stock index indicators. Their work presented a performance study that used association rule mining to model index indicators and trading volume and their effect on price change. The work only described the relationship between stock indexes and how they affect price change, and did not go further to draw conclusions about the state of a particular stock. Shubhangi and Nandgaonkar [10] proposed stock market event prediction from financial news using association rule mining. Their work considered only the closing price of stocks, which was used to calculate the stock index indicators and to define the rules. A Naive Bayes algorithm was trained on the technical index indicators, and association rule mining was used to generate buy, sell or hold signals. However, the accuracy of the model was low because only one of the five fields of the stock data, the closing price, was considered. Moreover, relations among stock market data alone are not sufficient for predicting market trends. This paper improves on the reviewed works by modeling a fuzzy clustering mechanism for predicting the future price direction of a stock.

Methodology of the Proposed Model
The proposed model adopts the frequent pattern growth algorithm to find frequent patterns among stock index indicators, and the Fuzzy C-means clustering approach to develop a prediction model from the frequent patterns dataset. Three years of historical data for a Nigerian bank were obtained online from a stock broking firm with the Nigerian Stock Exchange. The dataset, represented in Table 1, contains fields such as the open price, close price, high price, low price and volume for each trading day.

Moving average convergence divergence (MACD)
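The MACD is the difference between two exponential moving averages (EMA) of the closing price, plotted against a centerline. In this work, the most common moving average periods are used in the calculation: the 12-day and 26-day EMAs. The signal line used is zero. The MACD is calculated as

$$\mathrm{MACD}_t = \mathrm{EMA}_{n_1}(C)_t - \mathrm{EMA}_{n_2}(C)_t$$

where $n_1$ is 12 days, $n_2$ is 26 days and $C$ is the closing price. The trading rules to be used are:
i. IF MACD is above the signal then buy.
ii. IF MACD is below the signal then sell.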
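The MACD calculation and its zero-line rules can be sketched in Python as follows. This is a minimal sketch, assuming the common EMA recursion with smoothing factor alpha = 2 / (n + 1), seeded with the first closing price; the paper does not state its EMA seeding. The buy rule above the zero line mirrors the sell rule, and returning "hold" for a MACD exactly on the line is a hypothetical tie rule not given in the text. The function names are illustrative.

```python
def ema(prices, n):
    """Exponential moving average with smoothing factor alpha = 2 / (n + 1).

    Assumption: the average is seeded with the first price in the series.
    """
    alpha = 2.0 / (n + 1)
    averages = [prices[0]]
    for price in prices[1:]:
        averages.append(alpha * price + (1 - alpha) * averages[-1])
    return averages


def macd(closes, n1=12, n2=26):
    """MACD_t = EMA_n1(C)_t - EMA_n2(C)_t for every trading day t."""
    fast = ema(closes, n1)
    slow = ema(closes, n2)
    return [f - s for f, s in zip(fast, slow)]


def macd_rule(value, signal_line=0.0):
    """Above the zero signal line -> buy, below -> sell.

    Hypothetical tie rule: a value exactly on the signal line -> hold.
    """
    if value > signal_line:
        return "buy"
    if value < signal_line:
        return "sell"
    return "hold"


# Usage: a steadily rising series gives a positive MACD and a buy signal.
closes = [float(p) for p in range(100, 140)]
print(macd_rule(macd(closes)[-1]))  # buy
```

For a strictly rising price series the 12-day EMA always sits above the 26-day EMA, so every MACD value after the first day is positive.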

Relative strength index (RSI)
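The Relative Strength Index (RSI) considers whether an asset is overbought or oversold. RSI is described in equation 3.4:

$$\mathrm{RSI} = 100 - \frac{100}{1 + \dfrac{\left(\sum_{i=0}^{n-1} Up_{t-i}\right)/n}{\left(\sum_{i=0}^{n-1} Dw_{t-i}\right)/n}}$$

where $n$ is the number of days, $Up_t = \max(C_t - C_{t-1}, 0)$ is the price gain on day $t$ and $Dw_t = \max(C_{t-1} - C_t, 0)$ is the price loss on day $t$. The default number of days used by analysts is 14, which is adopted in this work. The trading rules to be used are:
i. IF RSI increases to above 70 (implies overbought) then sell.
ii. IF RSI is between 30 and 70 (implies normal) then hold.
iii. IF RSI decreases to below 30 (implies oversold) then buy.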
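The RSI and its three trading rules can be sketched as below. This is a minimal sketch, assuming simple averages of gains and losses over the last n day-to-day changes (Wilder's smoothed averages are a common alternative); the return values for windows with no losses are a hypothetical convention, and the function names are illustrative.

```python
def rsi(closes, n=14):
    """RSI = 100 - 100 / (1 + RS), RS = average gain / average loss over n changes.

    Assumption: simple (not Wilder-smoothed) averages of the last n changes.
    """
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    window = closes[-(n + 1):]
    changes = [today - yesterday for yesterday, today in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        # Hypothetical convention: only gains -> 100, completely flat -> 50.
        return 100.0 if avg_gain > 0 else 50.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def rsi_rule(value):
    """Trading rules from the text: above 70 sell, below 30 buy, otherwise hold."""
    if value > 70:
        return "sell"  # overbought
    if value < 30:
        return "buy"   # oversold
    return "hold"      # normal range


# Usage with a 4-day window: three gains of 1 and one loss of 1 give RS = 3.
value = rsi([1, 2, 3, 4, 3], n=4)
print(value, rsi_rule(value))  # 75.0 sell
```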

Rate of change (ROC)
ROC is a momentum oscillator that measures the percentage change in price. It calculates the percentage difference between the closing price of the current day and the closing price $n$ days ago:

$$\mathrm{ROC} = \frac{C_t - C_n}{C_n} \times 100$$

where $n$ is the number of trading days (12 days), $C_t$ is the closing price of the current day and $C_n$ is the closing price 12 days ago. The trading rules to be used are:
i. IF ROC is negative then sell.
ii. IF ROC is positive then buy.
iii. IF ROC is zero then hold.
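The ROC formula and its sign-based rules translate directly into a short sketch; the function names are illustrative and the input is assumed to be a list of daily closing prices in chronological order.

```python
def roc(closes, n=12):
    """ROC = (C_t - C_n) / C_n * 100, with C_n the close n trading days earlier."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    c_t = closes[-1]       # closing price of the current day
    c_n = closes[-1 - n]   # closing price n trading days ago
    return (c_t - c_n) / c_n * 100.0


def roc_rule(value):
    """Trading rules from the text: negative -> sell, positive -> buy, zero -> hold."""
    if value < 0:
        return "sell"
    if value > 0:
        return "buy"
    return "hold"


# Usage: the close rose from 100 to 110 over 12 trading days, roughly +10%.
closes = [100.0] + [101.0] * 11 + [110.0]
print(roc_rule(roc(closes)))  # buy
```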

Stochastic oscillator (STO)
The Stochastic Oscillator gives an indication of the stock's last closing price relative to the stock's recent trading range [11]. The STO is plotted within a range of 0 to 100 and signals overbought conditions above 80 and oversold conditions below 20 [12]. STO is described in the equation below:

$$\mathrm{STO} = \frac{C_t - L_n}{H_n - L_n} \times 100$$

where $n$ is the number of trading days (14 days), $C_t$ is the closing price of the current day, $L_n$ is the lowest price over the last 14 days and $H_n$ is the highest price over the last 14 days. The trading rules of the stochastic oscillator are summarized as follows.
i. IF STO increases above 80 (implies overbought) then sell.
ii. IF STO is between 20 and 80 (implies normal) then hold.
iii. IF STO is below 20 (implies oversold) then buy.
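The stochastic oscillator can be sketched as follows. This is a minimal sketch, assuming parallel lists of daily closes, highs and lows; the "hold" band between 20 and 80 mirrors the RSI rules, and the value returned for a completely flat range is a hypothetical convention. The function names are illustrative.

```python
def stochastic(closes, highs, lows, n=14):
    """STO = (C_t - L_n) / (H_n - L_n) * 100 over the last n trading days."""
    low_n = min(lows[-n:])    # lowest price over the last n days
    high_n = max(highs[-n:])  # highest price over the last n days
    if high_n == low_n:
        return 50.0  # hypothetical convention for a completely flat range
    return (closes[-1] - low_n) / (high_n - low_n) * 100.0


def sto_rule(value):
    """Above 80 sell (overbought), below 20 buy (oversold), otherwise hold."""
    if value > 80:
        return "sell"
    if value < 20:
        return "buy"
    return "hold"


# Usage with a 3-day window: close 13 within the range [8, 14] gives about 83.3.
value = stochastic([10.0, 11.0, 13.0], [12.0, 14.0, 13.0], [9.0, 10.0, 8.0], n=3)
print(round(value, 1), sto_rule(value))  # 83.3 sell
```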

On-balance volume (OBV)
The On-Balance Volume indicator reflects the movement in the volume of stocks. The OBV is calculated by taking the total volume for the trading period and assigning it a positive or negative value depending on whether the price is up or down during the trading period. OBV is described in the equation below:

$$\mathrm{OBV}_t = \begin{cases} \mathrm{OBV}_{t-1} + V_t & \text{if } C_t > C_{t-1} \\ \mathrm{OBV}_{t-1} - V_t & \text{if } C_t < C_{t-1} \\ \mathrm{OBV}_{t-1} & \text{if } C_t = C_{t-1} \end{cases}$$

where $V_t$ is the trading volume on day $t$. The indicators for each trading day are calculated, and a sample is presented in Table 2 together with the trading volume for each day. These indicators are referred to as features and serve as the input dataset.
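The cumulative OBV can be sketched as a running total over the trading days. This is a minimal sketch, assuming the series starts at 0 on the first day (the paper does not state a starting value); the function name is illustrative.

```python
def obv(closes, volumes):
    """On-Balance Volume: add the day's volume on up days, subtract it on down days.

    Assumption: the running total starts at 0 on the first trading day.
    """
    series = [0]
    for t in range(1, len(closes)):
        if closes[t] > closes[t - 1]:
            series.append(series[-1] + volumes[t])
        elif closes[t] < closes[t - 1]:
            series.append(series[-1] - volumes[t])
        else:
            series.append(series[-1])  # unchanged price: OBV unchanged
    return series


print(obv([10, 11, 10.5, 10.5, 12], [100, 200, 150, 80, 300]))  # [0, 200, 50, 50, 350]
```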

Data Preprocessing
Time series data are difficult to manipulate, but when they are treated as symbols (item units) instead of data points, interesting patterns can be discovered and mining them becomes easier. It is therefore suggested to convert the basic unit into symbols, i.e., a numeric-to-symbolic conversion, to reduce the large number of distinct values. The numeric-to-symbolic conversion transforms the available features (e.g. MACD, RSI, ROC, STO and OBV) of a financial instrument into a string of symbols. Each indicator is divided into three ranges (a, b, c), as shown in Table 3.
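The numeric-to-symbolic conversion can be sketched as a threshold lookup that turns each trading day into a transaction of symbols. The actual ranges are given in Table 3, which is not reproduced here, so the thresholds below are hypothetical placeholders borrowed from the trading rules of the indicator subsections; OBV is omitted because no ranges for it are stated in the text. The names `to_symbol`, `THRESHOLDS` and `day_to_transaction` are illustrative.

```python
# Hypothetical (low, high) bounds per indicator; the paper's real ranges are in Table 3.
THRESHOLDS = {
    "RSI": (30, 70),  # borrowed from the RSI trading rules
    "STO": (20, 80),  # borrowed from the STO trading rules
    "MACD": (0, 0),   # below / on / above the zero signal line
    "ROC": (0, 0),    # negative / zero / positive
}


def to_symbol(name, value, low, high):
    """Map a numeric value to 'name_a' (below low), 'name_b' (low..high) or 'name_c'."""
    if value < low:
        return f"{name}_a"
    if value <= high:
        return f"{name}_b"
    return f"{name}_c"


def day_to_transaction(indicators):
    """Convert one trading day's indicator values into a list of symbols."""
    return [to_symbol(name, value, *THRESHOLDS[name])
            for name, value in indicators.items() if name in THRESHOLDS]


print(day_to_transaction({"RSI": 75.0, "STO": 15.0, "MACD": -0.2, "ROC": 0.0}))
# ['RSI_c', 'STO_a', 'MACD_a', 'ROC_b']
```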

Frequent Pattern Mining
In order to mine the mapped dataset for frequently occurring features, frequent pattern mining is performed on the preprocessed data. In this process, the frequent pattern growth algorithm reads the mapped dataset, generates frequent patterns according to a predefined minimum support threshold and stores them in the frequent pattern base.
The step-by-step procedure for constructing an FP-tree is stated below.
Step 1: The mapped dataset is scanned once to determine the support count of each feature. Infrequent features are discarded, and the frequent features are sorted in decreasing order of support count.
Step 2: Create the root of an FP-tree T and label it as null. For each trading day in the database, select the frequent items and sort them according to the order of the frequent item list.
Step 3: Let the sorted frequent item list be $[p \mid P]$, where $p$ is the first element and $P$ is the remaining list, and call insert_tree($[p \mid P]$, T). The function is performed as follows. If T has a child N such that N.item-name = p.item-name, then increment N's count by 1; else create a new node N with its count initialized to 1, its parent link linked to T, and its node-link linked to the nodes with the same item-name via the node-link structure. If P is nonempty, call insert_tree(P, N) recursively.
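The FP-tree construction in Steps 1 to 3, together with the recursive mining of frequent itemsets from the tree, can be sketched as follows. This is a minimal sketch in the spirit of Han et al.'s FP-growth, not the authors' code: it assumes each trading day is a list of symbols such as `RSI_b`, breaks support ties alphabetically, writes the recursive insert_tree call of Step 3 as a loop, and uses 0.6 as the default minimum support to match the value chosen in this paper. The names `FPNode`, `build_fp_tree`, `fp_growth` and `mine_frequent_patterns` are hypothetical.

```python
import math
from collections import defaultdict


class FPNode:
    """FP-tree node: item name, count, parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (root, node_links, supports)."""
    # Step 1: count the support of every item and drop infrequent ones.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: c for item, c in support.items() if c >= min_count}

    root = FPNode(None, None)       # Step 2: root labelled null
    node_links = defaultdict(list)  # item -> every node carrying that item
    for items, count in transactions:
        # Keep frequent items, ordered by decreasing support (ties broken by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:        # Step 3: insert_tree([p | P], T), as a loop
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                node_links[item].append(child)
            child.count += count
            node = child
    return root, node_links, frequent


def fp_growth(transactions, min_count, suffix=()):
    """Mine all frequent itemsets by recursing on conditional pattern bases."""
    _, node_links, frequent = build_fp_tree(transactions, min_count)
    patterns = {}
    for item, item_support in frequent.items():
        itemset = (item,) + suffix
        patterns[frozenset(itemset)] = item_support
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in node_links[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, itemset))
    return patterns


def mine_frequent_patterns(days, min_support=0.6):
    """days: one list of symbols per trading day; min_support as a fraction of days."""
    min_count = math.ceil(min_support * len(days))
    return fp_growth([(day, 1) for day in days], min_count)
```

For example, `mine_frequent_patterns([["A", "B", "C"], ["A", "B"], ["A", "C"], ["A", "B", "C"], ["B", "D"]])` keeps only itemsets present on at least 3 of the 5 days: {A}, {B}, {C}, {A, B} and {A, C}. Rebuilding a small conditional tree at each recursion level is what lets FP-growth avoid Apriori-style candidate generation.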
After the FP-tree is constructed, frequent itemsets are mined from it. Minimum support values from 0.1 to 1 were tested. Values from 0.1 to 0.5 produced many uninteresting patterns. Values from 0.6 to 1 produced concise and interesting patterns, but the numbers of patterns generated for each minimum support between 0.6 and 1 were almost equal and had little effect on the result. Therefore, 0.6 was selected as the minimum support for this paper.

Fuzzy C-Means Clustering Algorithm
FCM performs the following steps during clustering.
Step 1: The cluster membership value $u_{ij}$ is initialized to 0.
Step 2: The predefined number of clusters N is assigned. The number of clusters is 3, and each cluster represents one trend: uptrend, downtrend or static trend.
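Step 3: The center of each cluster is calculated as follows:

$$c_j = \frac{\sum_{i} u_{ij}^{m}\, x_i}{\sum_{i} u_{ij}^{m}}$$

where $x_i$ is the $i$th data point and $m > 1$ is the fuzziness exponent.
Step 4: The degree of membership of the $i$th data point in each cluster is calculated and updated as follows:

$$u_{ij} = \frac{1}{\sum_{k=1}^{N} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}$$

Step 5: The objective function $O_m$ is calculated:

$$O_m = \sum_{i} \sum_{j=1}^{N} u_{ij}^{m} \lVert x_i - c_j \rVert^{2}$$

Step 6: Steps 3 to 5 are repeated until the improvement in $O_m$ is less than the specified minimum threshold.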
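The FCM steps can be sketched in plain Python as below. This is a minimal sketch, not the authors' implementation: it assumes the standard fuzzy c-means updates with fuzziness exponent m = 2 and, unlike the all-zero initialization stated in Step 1, it starts from random memberships normalized per data point, because all-zero memberships would leave the cluster centers undefined. The function name and its parameters (`tol`, `max_iter`, `seed`) are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, n_clusters=3, m=2.0, tol=1e-6, max_iter=1000, seed=0):
    """Fuzzy C-means on a list of equal-length numeric vectors.

    Returns (centers, memberships) where memberships[i][j] is u_ij.
    """
    rng = random.Random(seed)
    # Step 1 (assumption): random memberships, each row normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])

    dim = len(points[0])
    previous = None
    for _ in range(max_iter):
        # Step 3: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            w_sum = sum(weights)
            centers.append([sum(w * x[d] for w, x in zip(weights, points)) / w_sum
                            for d in range(dim)])

        # Step 4: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        dist = [[math.dist(x, c) for c in centers] for x in points]
        for i, row in enumerate(dist):
            zeros = [d == 0.0 for d in row]
            if any(zeros):
                # The point coincides with a center: share membership among those centers.
                count = sum(zeros)
                u[i] = [1.0 / count if z else 0.0 for z in zeros]
            else:
                u[i] = [1.0 / sum((row[j] / row[k]) ** (2.0 / (m - 1))
                                  for k in range(n_clusters))
                        for j in range(n_clusters)]

        # Step 5: O_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2
        objective = sum(u[i][j] ** m * dist[i][j] ** 2
                        for i in range(len(points)) for j in range(n_clusters))

        # Step 6: stop once O_m improves by less than the threshold.
        if previous is not None and abs(previous - objective) < tol:
            break
        previous = objective
    return centers, u


# Usage: three well-separated one-dimensional groups yield three centers.
pts = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [10.0], [10.1], [10.2]]
centers, memberships = fuzzy_c_means(pts)
print(sorted(round(c[0], 2) for c in centers))
```

Which cluster index ends up representing which trend depends on the data; the mapping of clusters to static, up and down trends is made afterwards, as the Model Implementation section describes.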

Model Implementation
This section describes the analysis and implementation results of the Fuzzy clustering-based model for stock market prediction.
The model is implemented in the following steps.

1. Frequent patterns are generated from the pre-processed dataset.
2. The frequent patterns are clustered into three clusters.
3. A K-Nearest Neighbour classifier assigns new data points to the cluster centers.

The cluster centres obtained are presented in Table 4. Each cluster centre is a data point and corresponds to a trend. Data points that fall into clusters 1, 2 and 3 have a static trend, an uptrend and a downtrend respectively.
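The classification step can be sketched as a plain k-nearest-neighbour vote. This is a minimal sketch assuming Euclidean distance; `knn_classify` is a hypothetical name, and the centre coordinates in the usage example are made up, since the actual centres (Table 4) are not reproduced here. With the cluster centres as the only reference points and k = 1, the classifier reduces to assigning each new data point to its nearest centre; the cluster-to-trend mapping follows the text.

```python
import math
from collections import Counter

# Cluster-to-trend mapping as stated in the text.
TREND_OF_CLUSTER = {1: "static trend", 2: "uptrend", 3: "downtrend"}


def knn_classify(x, reference_points, reference_labels, k=1):
    """Label x by majority vote among its k nearest reference points (Euclidean)."""
    order = sorted(range(len(reference_points)),
                   key=lambda i: math.dist(x, reference_points[i]))
    votes = Counter(reference_labels[i] for i in order[:k])
    # On ties, most_common keeps first-seen order, so the nearer label wins.
    return votes.most_common(1)[0][0]


# Usage with hypothetical 2-D cluster centres labelled 1, 2 and 3.
centers = [[0.0, 0.0], [5.0, 5.0], [-5.0, -5.0]]
cluster = knn_classify([4.0, 6.0], centers, [1, 2, 3])
print(cluster, TREND_OF_CLUSTER[cluster])  # 2 uptrend
```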

Result and Discussion
The performance of the model was studied using financial data of three banks for the period from September 2017 to January 2018.
Each dataset comprises the opening, high, low and closing prices as well as the traded volume of one Nigerian bank stock. Thus, a total of 300 data points, that is, 100 data points for each bank, were …
However, when the trend is static, that is, the price neither rose nor fell, the best decision to take at that point is to hold the stock.
• For the GTB dataset, 83 data points were classified correctly while 17 were misclassified. Therefore, the prediction accuracy is 83%.
• For the UBA dataset, 89 data points were classified correctly while 11 were misclassified. Therefore, the accuracy is 89%.
• For the ZENITH dataset, 82 data points were classified correctly while 18 were misclassified. Therefore, the accuracy is 82%.

Performance Evaluation
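The evaluation criterion used in this paper is the Hit Ratio, whose expression is shown in equation 13:

$$\mathrm{HR} = \frac{1}{M} \sum_{i=1}^{M} P_i, \qquad P_i = \begin{cases} 1 & \text{if the predicted trend of data point } i \text{ is correct} \\ 0 & \text{otherwise} \end{cases}$$

where $M$ is the number of test data points. Figure 5 shows the hit ratio of each bank as well as the number of 1's and 0's for each bank. Averaging the three banks' accuracies, (83% + 89% + 82%) / 3, the model has an average accuracy of 84.67%.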
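The hit ratio and the reported average can be checked with a short sketch. `hit_ratio` is a hypothetical helper that scores 1 for each correct trend prediction and 0 otherwise; the per-bank figures are the accuracies reported in the results above.

```python
def hit_ratio(predicted, actual):
    """HR = (1/M) * sum_i P_i, with P_i = 1 for a correct trend prediction, else 0."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = [1 if p == a else 0 for p, a in zip(predicted, actual)]
    return sum(hits) / len(hits)


# Per-bank hit ratios (%) reported in the results section.
per_bank = {"GTB": 83, "UBA": 89, "ZENITH": 82}
average = sum(per_bank.values()) / len(per_bank)
print(f"{average:.2f}%")  # 84.67%
```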

Conclusion
This paper has proposed a novel approach for predicting stock market trends using frequent pattern mining, fuzzy clustering and K-Nearest Neighbor classification. The prediction model showed an average prediction accuracy of 84.67%.
Results from the prediction can help individual investors decide when to buy, sell or hold a stock.

Future Work
Based on the results obtained in this paper, the model has shown an efficient way of predicting stock trends and assisting the decision making of investors. In future research, more technical indicators can be included in the model. The experiment reported in this paper was based on modeling; future work can concentrate on developing a real-time prediction system that is easy for individual investors to use.