Even though I have never worked as a data engineer, I have been playing with machine learning for a while now. At first, I was more interested in the mathematics behind it; I even made a repository where I tried implementing it.
Having been in the world of Bitcoin and, more generally, cryptocurrencies for a while, I wondered whether a complex algorithm could be used to predict the price.
That turned out to be harder than I thought.
After all the models I have built, I decided to try to prove a point: without very complex models, insider data, or a lot of money, you can't predict it. I came to the conclusion that, taken on their own, stock prices and the Bitcoin price are just noise. I believe more precise predictions are possible, but not from the price or the RSI alone.
How will I show you the difference between a predictable time series and an unpredictable one?
Using a predictable trigonometric function
The predictable time series I am going to use is a sum of sines and cosines:
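import math

import matplotlib.pyplot as plt
import numpy as np

n = 10000
# Sum of three sinusoids with different frequencies, sampled at i = 1 .. n - 1
array = np.array([
    math.sin(i * 0.02) + math.cos(i * 0.05) - math.sin(i * 0.01)
    for i in range(1, n)
])

fig, ax = plt.subplots()
ax.plot(range(1, n), array, linewidth=0.75)
plt.show()
That gives us this curve: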
Most algorithms you will find on the internet try to predict the next value from the last 150. We are going to do something a bit different. Let's take all our values and associate each of them with a sell index, which is 1 when the best action is to sell (the price sits in the top 20% of the range spanned by the 150 values starting at that point) and 0 otherwise.
We can do this with a simple algorithm:
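SELL_INDEX = np.zeros((len(array), 1))
for index, row in enumerate(array):
    # Skip the tail, where a full 150-value forward window is not available
    if index > len(array) - 150:
        continue
    max_price = np.amax(array[index:index + 150])
    min_price = np.amin(array[index:index + 150])
    # Position of the current price within the window's range: 0 is the lowest, 1 the highest
    current_sell_index = (row - min_price) / (max_price - min_price)
    SELL_INDEX[index] = 1 if current_sell_index > 0.8 else 0

# Columns: price, sell index, position in the series
data_with_sell_index = np.hstack((array.reshape(-1, 1), SELL_INDEX))
data_final = np.hstack(
    (data_with_sell_index, np.arange(len(data_with_sell_index)).reshape(-1, 1))
)
# Drop the last 150 rows (the tail skipped above was left at 0)
data_final = data_final[:len(data_final) - 150]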
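The labelling loop above can also be written without an explicit Python loop. The following is only a sketch: the function name `label_sell_index` and its parameters are made up for illustration, it relies on `sliding_window_view` (NumPy 1.20 or later), and unlike the loop it guards against a flat window where the maximum equals the minimum.

```python
import numpy as np


def label_sell_index(prices, horizon=150, threshold=0.8):
    """Return 1 where the price sits in the top part of its forward-looking range."""
    prices = np.asarray(prices, dtype=float)
    # Row i holds prices[i : i + horizon]: the current price and the values after it.
    windows = np.lib.stride_tricks.sliding_window_view(prices, horizon)
    low = windows.min(axis=1)
    high = windows.max(axis=1)
    span = high - low
    # Guard against a flat window, where the position in the range is undefined.
    safe_span = np.where(span == 0, 1.0, span)
    position = (prices[: len(windows)] - low) / safe_span
    return ((position > threshold) & (span > 0)).astype(int)


# Toy series with a single peak, using a horizon of 3 instead of 150.
toy = [1.0, 2.0, 5.0, 4.0, 3.0, 2.0]
labels = label_sell_index(toy, horizon=3, threshold=0.8)
print(labels.tolist())  # [0, 0, 1, 1]
```

Only positions with a full forward window get a label, so the result is shorter than the input by `horizon - 1` values.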
Applying it to our sum of sines and cosines gives the following:
OK, we are all set. The idea of the model is to predict the nᵗʰ sell index using the 150 previous prices, that is, those from n - 150 to n - 1.
The model looks like this:
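The step that turns the series into training samples is not shown in this article, so here is a minimal sketch of how it could look. The helper name `make_windows` is hypothetical; it pairs the label at position n with the prices from n - 150 to n - 1 and returns arrays in the (samples, timesteps, features) layout that Keras LSTM layers expect.

```python
import numpy as np


def make_windows(prices, labels, lookback=150):
    """Pair each label n with the `lookback` prices that precede it (n - lookback .. n - 1)."""
    prices = np.asarray(prices, dtype=float)
    labels = np.asarray(labels)
    x, y = [], []
    for n in range(lookback, len(prices)):
        x.append(prices[n - lookback:n])
        y.append(labels[n])
    # Keras LSTM layers expect (samples, timesteps, features).
    return np.array(x).reshape(-1, lookback, 1), np.array(y)


# Tiny demo with a lookback of 3 instead of 150.
demo_prices = np.arange(6, dtype=float)  # 0, 1, 2, 3, 4, 5
demo_labels = np.array([0, 0, 0, 1, 0, 1])
x_demo, y_demo = make_windows(demo_prices, demo_labels, lookback=3)
print(x_demo.shape, y_demo.tolist())  # (3, 3, 1) [1, 0, 1]
```

With the arrays built earlier, the call would presumably be `make_windows(data_final[:, 0], data_final[:, 1])`.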
import keras
from keras.layers import LSTM, Dense, Dropout, Input
from keras.models import Model

# Input: 150 time steps with one feature (the price) each
input_layer = Input(shape=(150, 1))
layer_1_lstm = LSTM(50, return_sequences=True)(input_layer)
dropout_1 = Dropout(0.1)(layer_1_lstm)
layer_2_lstm = LSTM(50, return_sequences=True)(dropout_1)
dropout_2 = Dropout(0.1)(layer_2_lstm)
layer_3_lstm = LSTM(50)(dropout_2)
# Single sigmoid unit: probability that the sell index is 1
output_sell_index_proba = Dense(1, activation='sigmoid')(layer_3_lstm)

model = Model(inputs=input_layer, outputs=output_sell_index_proba)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[keras.metrics.BinaryAccuracy()])
model.summary()
As I am getting either 1 or 0 as a result, I decided to use the BinaryAccuracy metric and the binary_crossentropy loss function.
For the optimizer, Adam is often a good default choice; it is the one that has given me the best results.
After training for ten epochs, I end up with a loss of about 0.03 and a binary accuracy of almost 0.99:
Epoch: 8. Reducing Learning Rate from 0.0009227448608726263 to 0.000913517433218658
Epoch 10/10
125/125 [==============================] - 28s 221ms/step - loss: 0.0389 - binary_accuracy: 0.9853 - val_loss: 0.0310 - val_binary_accuracy: 0.9864
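The training call and its callbacks are not shown here, so what follows is only a guess at what produces the "Reducing Learning Rate" line in the log above. The two values differ by a factor of roughly 0.99, and the first one is close to 0.001 × 0.99⁸, which would be consistent with Adam's default learning rate in Keras (0.001) being multiplied by 0.99 after every epoch. The function name `decayed_learning_rate` and both default values are assumptions made for this sketch.

```python
def decayed_learning_rate(epoch, initial=0.001, decay=0.99):
    """Learning rate after `epoch` multiplicative decay steps."""
    return initial * decay ** epoch


# ASSUMPTION: initial=0.001 (the Keras default for Adam) and decay=0.99 are
# inferred from the log, not taken from the training code.
for epoch in (8, 9):
    print(epoch, decayed_learning_rate(epoch))
```

Under these assumptions, the values for epochs 8 and 9 agree with the two numbers in the log to about seven significant digits.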
Let's now try to predict on new data that the model did not see during training.
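# Prices from position 9000 onwards
data = np.array(data_final[:, 0][9000:])
results = np.array([])
# Stop at the end of the data so that every window holds exactly 150 prices
for i in range(150, len(data)):
    result = model.predict(data[i - 150:i].reshape(1, 150, 1))
    # Append in chronological order
    results = np.append(results, result)
Here is the chart: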
The accuracy on unseen data is pretty good, even though there are some inconsistencies.
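To put a number on "pretty good", the predicted probabilities can be thresholded and compared with the known labels. This is a minimal sketch: `binary_accuracy` is a hypothetical helper, and the 0.5 threshold is assumed to match the default of the Keras BinaryAccuracy metric. The example values are made up for illustration and are not model output.

```python
import numpy as np


def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    predicted = (np.asarray(probabilities).ravel() > threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels).ravel()))


# Made-up numbers purely for illustration, not model output.
example_probabilities = [0.05, 0.20, 0.90, 0.70, 0.40]
example_labels = [0, 0, 1, 1, 1]
print(binary_accuracy(example_probabilities, example_labels))  # 0.8
```

With the real predictions, the labels to compare against would be the sell-index column of `data_final` at the same positions as the predicted values.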
Using Bitcoin price
Let's plot the last 10,000 closing prices of BTC, using 60-second candles.
I apply the same labelling algorithm I used for the trigonometric function, and here is the result.
Now that I have everything I need, I can reapply the same process as in the previous section.
The only difference is that I will use a MinMaxScaler, so that the input values only vary from 0 to 1; neural networks have a hard time working with inputs that vary a lot (here between 20k and 40k).
from sklearn.preprocessing import MinMaxScaler

# Rescale the inputs linearly to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
fitter = scaler.fit(x)
x = fitter.transform(x)
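For reference, min-max scaling to the range 0 to 1 is just the linear map (x - min) / (max - min). The sketch below spells it out with plain NumPy; `min_max_scale` is a hypothetical helper and the prices are invented values in the 20k to 40k range mentioned above.

```python
import numpy as np


def min_max_scale(values, low, high):
    """Map values linearly so that `low` becomes 0 and `high` becomes 1."""
    return (np.asarray(values, dtype=float) - low) / (high - low)


# Hypothetical prices in the 20k-40k range.
train_prices = np.array([20_000.0, 25_000.0, 40_000.0])
later_prices = np.array([30_000.0, 42_000.0])

# Take the bounds from the training part only, then reuse them for later data.
low, high = train_prices.min(), train_prices.max()
print(min_max_scale(train_prices, low, high).tolist())  # [0.0, 0.25, 1.0]
print(min_max_scale(later_prices, low, high).tolist())  # [0.5, 1.1]
```

One design choice to be aware of: the sketch takes the bounds from the training part only, so later prices can fall outside the 0 to 1 range, whereas the snippet above fits the scaler on all of `x`.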
The first difference I noticed is the loss: it is stuck at around 0.5.
Epoch: 8. Reducing Learning Rate from 0.000817907159216702 to 0.0008097281097434461
Epoch 10/10
125/125 [==============================] - 27s 218ms/step - loss: 0.5074 - binary_accuracy: 0.7952 - val_loss: 0.4372 - val_binary_accuracy: 0.8475
And here is a prediction chart:
There is nothing we can rely on.
The model does not find any pattern in the price; it treats it as noise.
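One way to sanity-check the Bitcoin numbers is to compare them with a model that has learned nothing beyond how often the label is 1 and always outputs that frequency as its probability. The share of positive labels in the Bitcoin data is not reported here, so the 20% used below is purely an illustrative assumption, and `constant_predictor_scores` is a hypothetical helper.

```python
import math


def constant_predictor_scores(positive_rate):
    """Loss and accuracy of a model that always outputs the positive rate as its probability."""
    p = positive_rate
    loss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    # Thresholding a constant probability at 0.5 predicts the majority class everywhere.
    accuracy = max(p, 1 - p)
    return loss, accuracy


# ASSUMPTION: 20% positive labels is an illustrative figure, not measured from the data.
loss, accuracy = constant_predictor_scores(0.2)
print(round(loss, 4), accuracy)  # 0.5004 0.8
```

If the real share of positive labels is close to that figure, a binary cross-entropy of about 0.5 together with an accuracy of about 0.8 is what a model ignoring its input entirely would score, which would fit the flat prediction chart.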
People have been trying to predict stock prices for a very long time, and they have invented many ways to do it:
- Technical analysis, such as RSI or Ichimoku
- Artificial intelligence techniques
- Sentiment analysis
I think that no one can accurately predict the stock market without a solid understanding of the asset or massive investments.
You can find the code on my GitHub repository: github.com/mathias-vandaele/keras-research.