I want to train network to predict TV channel by given time, the data is in date and time with tv channel columns. I need to have f-measures, False Positives and AUC instead of “accuracy” in your code. 2 1 1 1 1 1 1 The LSTMs are modeling the problem as a function of the input time steps and of the internal state. When we are working on text classification based problem, we often work with different kind of cases like sentiment analysis, finding polarity of sentences, multiple text classification like toxic comment classification, support ticket classification … Is this process in Keras?, y_train, epochs=3, batch_size=64), But I’m getting 50% accuracy: You can use walk-forward validation: I had thought about combining the long text feature and the other features into one files – features separated by columns of course but I don’t think that will work? Thanks Jason for your article. First of ali, thank you for your great explanation. Perhaps you can use some classical NLP methods on the text first. X_test = sequence.pad_sequences(X_test, maxlen=max_review_length). And then i can probably run the further steps for padding e.t.c? # Convert string labels to integers I had made a mistake in the last comment by using model.predict() to get class labels, the correct way to get the label is model.predict_classes() but still, it’s not giving proper class labels. Just wondering: as you are paddin with zeros, why aren’t you setting the Embedding layer flag mask_zero to True? I always visit your website for clearing my doubts and before starting to work on any model. Sorry, I don’t have examples of working with tensorflow directly. It requires further research. Firstly, thanks a lot for all the blogs that you have written! top_words = 5000 Yes. You can use LSTMs if you are working on sequences of data. Thank you, Sure, see this post: Hi Deepak, My advice would be to try LSTM on your problem and see. I was expecting to get len(prediction) = 1 How to transform my data to structure above? What worked and what did not? The words have been replaced by integers that indicate the ordered frequency of each word in the dataset. Thanks for your time. One unit is one cell. For a mere LSTM a 3D-reshape would suffice (in this case 2,4,5) to feed it in. There is my code: path = “C:/Users/i_dra/Documents/Challenge Data/TrainMyriad.csv” What does a LSTM do in each epoch? length for padding the sequence of images. You will need to split up your sequence into subsequences of 200-400 time steps max. Is there any relationships between the number of samples (sequences) and the number of hidden units? I would like to ask you, do you think this sequence classification model could be used to predict a category for a really large sequence of numbers, instead of words ?? (So hidden_state_1 and first sample -> second cell). y=model.predict(x) For example, if the true label is [1, 3, 2, 1] and the predicted label is [1, 3, 2, 2] would the error be equal to 1 since the prediction is not exactly equal to the true label? text = ‘It is a good movie to watch’ 2. model.layers[0].trainable = True # to train (back-prop) thru the embedding layer. You could admit that they give us a polarity of sentiment in the range of (-1, 1). performance/skill is relative. I followed For example, I want to classify a sequence that looks like [0, 0, 0.4, 0.5, 0.9, 0, 0.4] either to be a 0 or a 1, but I don’t know what format to get my data in to feed into an LSTM. For completeness, here is the full code listing for this LSTM network on the IMDB dataset. The size of MNIST image is 28 × 28, and each image can be regarded as a sequence with length of 28. please help asap. A blog about data science and machine learning, Link is provided in the post., LSTM (Long-Short Term Memory) is a type of Recurrent Neural Network and it is used to, Regression Model Accuracy (MAE, MSE, RMSE, R-squared) Check in R, Regression Example with XGBRegressor in Python, RNN Example with Keras SimpleRNN in Python, Regression Accuracy Check in Python (MAE, MSE, RMSE, R-Squared), Regression Example with Keras LSTM Networks in R, How to Fit Regression Data with CNN Model in Python, Classification Example with XGBClassifier in Python, Multi-output Regression Example with Keras Sequential Model. I can have my first layer like this: 2.How to load custom datatset of images for training and testing instead of mnist data set. is there any easier way? any idea why? That is take the result returned by model.predict and take the last item in the array as the classifications. Thank you, I have implemented a CNN followed by LSTM neural network model in keras for sentence classification. model.add(Conv1D(filters=32, kernel_size=7, padding=’same’, activation=’relu’)) The messages are as follows: Traceback (most recent call last): I have a small question normally when you train your model you are to see in the console the epoch, as well as the loss, accuracy and the time that is taking per epoch. I am a bit confused of how the LSTM is trained. Compare LSTM to Bidirectional LSTM 6. for Embedding in Keras (academic/non-academic)? I can implemented a LSTM to generate labels from videos? So the whole think of finding Similarities with the Embedding Layer is unnecessary. Can i understand better, with 500 length, the RNN will unfold 500 LSTM to handle the 500 inputs per review right? previously I used ” sequences = tokenizer.texts_to_sequences(tweets_dict[“train”])” to convert text to vector and after that I used your code . The number of nodes in a layer is arbitrary and found via trial and error: So when we call (X_train, y_train), (X_test, y_test) = imdb.load_data(), X_train[i] will be vector. neural networks, lstm. more weights in the calculation of the output). No, the first is 5 sequence the second is 1 sequence. Sorry, I do not have a worked example of your problem. I encountered the exact same error but the solution here seemed to fix it: As a question I would like to know how to set the number of LSTM units in the hidden layer? I’m really puzzled. I only have biology background, but I can reproduced the results. 1. train_y = reshape(int(train_y.shape[0]/timesteps), train_y.shape[1]) # error: IndexError: tuple index out of range ??? LSTM (Long Short Term Memory ) based algorithms are very known algorithms for text classification and time series prediction. File “C:\Users\llfor\AppData\Local\Programs\Python\Python35\lib\”, line 791, in read Ok, I get it. My problem is classfication a packet (is captured everytime with many features) whether normal or abnormal. You must prepare the single input as you would any training data. model.add(Conv1D(filters=32, kernel_size=3, padding=’same’, activation=’relu’)) This article aims to provide an example of how a Recurrent Neural Network (RNN) using the Long Short Term Memory (LSTM) architecture can be implemented using Keras. Start by collecting a dataset with sentences where you know their label. 534s – loss: 0.0699 – acc: 0.9800 File “C:\Users\axk41\AppData\Local\Programs\Python\Python36\lib\”, line 1009, in recv_into Many thanks! While testing when i give a file with 150 messages,During sliding the window ,some time non of the patterns may occur in that window but lstm model is classifying it as some known pattern.So how to overcome this issue. We only train the model on data where we know the output. Further, you can count the occurrence of each word, and reduce the size of the vocabulary to only the most frequent words. Thanks! Updated October 3, 2020. I just don’t get how the text information doesn’t get lost in the process of convolution with different filter sized (like in my example) Can you explain hot the convolution works with text data? Can you explain for me why? In this video, we will apply neural networks for text. 2001|21|East|0.4|Yes consider we have 500 sequences with 100 elements in each sequence. Can u explain and tell me how. We will map each word onto a 32 length real valued vector. Can it be done? I wanted to continue the question Prashanth asked, how to pre-process the user input. Hi Mike, you can make one prediction at a time. To use this model you have take a text. output_dim=dim_length, Often an embedding trained as part of a model will result in a better overall model than a model that uses a standalone embedding. – Eventually, all the cells will process the same input sequentially, waiting for the hidden state from previous lstm cell, and the last cell will pass the value to the next layer. So if I have 3 output or more, could I use LSTM to solve my classification problem? Contact | hello, print(“Accuracy: %.2f%%” % (scores[1]*100)). | … In pad_sequences, dtype of output is int32 by default. This example shows how to do text classification starting from raw text (as a set of text files on disk). n = self.fp.readinto(b) I recommend careful and systematic experimentation to see what works best for your specific dataset. You have one here in your website. For example with data samples of daily stock prices and trading volumes with 5 minute intervals from 9.30am to 1pm paired with YES or NO to the stockprice increasing by more than 0.5% the rest of the trading day? Please take a look at that. #model.add(Dense(32, activation=’relu’)) Great article! Hi Jason, thank you for your awesome work!!! @Jason, Would you have some benchmark (ex: time of 1 epoch of one of the above examples) so that I can compre with my current hardware? I recommend testing a suite of methods in order to discover what works best for your specific problem., When doing LSTM analysis, do you play with forget and remember gates? Basic familiarity with Python, PyTorch, and machine learning A locally installed Python v3+, PyTorch v1+, NumPy v1+ What is LSTM? No, multi-class classification should use a one output per class and softmax activation., Y_train, epochs=20, batch_size=100) Or can i just using hashing technique where every word is signifying an integer? Sounds great. I’m afraid that these outliers are the main reason I can’t achieve good accuracy, even on training set E.g. The data was collected by Stanford researchers and was used in a 2011 paper where a split of 50-50 of the data was used for training and test. 2. Epoch 20/20 1 1 267 839 2,7 1 0 0 For example, combined with your tutorial for the time series data, I got an trainX of size (5000, 5, 14, 13), where 5000 is the length of my samples, and 5 is the look_back (or time_step), while I have a matrix instead of a single value here, but I think I should use my specific Embedding technique here so I could pass a matrix instead of a vector before an CNN or a LSTM layer…. units=100, Running this example produces the following output. from tensorflow.python.ops import rnn, rnn_cell Any thoughts? Hi,Dr. Model has a very poor accuracy (40%). I'm Jason Brownlee PhD 2. Is there a way in RNN (keras implementation) to control for the attention of the LSTM. It would not be a fit for that dataset as there is no sequence information. An error propagated from deeper layers will encourage the hidden LSTM layer to learn the input sequence in a specific way, e.g. Then how is a dense layer exactly being connected to the LSTM layer and how exactly is it working(since the LSTM layer seems to give only the final output of final word)?? … So accessible, to the point, and enriching. We can see that we achieve similar results to the first example although with less weights and faster training time. The hidden state of the last output As an abstract hidden feature, it is input to the full connection layer for classification. I have been trying to aplayd the template to my classification problem, but it gives me very poor results (less than 50% of Accuracy). Total params: 213301 How to set time_step in the first code line. Is appropiate to use seq2seq? The idea is to have a buffer in your encoding scheme. A model could perhaps be trained to learn those sequences. In my case, In my dataset the data is repeating at random intervals as in, the previous data is repeating as the future data and I want to classify the original data and the repeated data. I tried to do the LSTM sequential for numerical classification problem. It is the process by which any raw text could be classified into several categories like good/bad, positive/negative, spam/not spam, and so on. I didn’t change anything of your code. If I need to include some behavioral features to this analysis, let say: age, genre, zipcode, time (DD:HH), season (spring/summer/autumn/winter)… could you give me some hints to implement that? Ay, i have 1 question in another your post about why i use function evaluate model.evaluate(x_test, y_test) to get accuracy score of model after train with train dataset , but its return result >1 in some case, i don’t know why, it make me can’t beleive in this function. In this notebook we are going to implement a LSTM model to perform classification of reviews. Thank you for your friendly explanation. What can i do next? model.add(LSTM(100,input_shape=(timesteps,input_dim))) 1. n_chunks = 28 I have some data, one of the column has both positive and negative values. Hello, read your blog found it really help full however could you please guide me to a code sample as to how exactly hot encode my text for training, I have 20,000 reviews to train. Epoch 7/7 u can only get it if u have frequent contact with bodily fluids of someone who has ebola and is showing symptoms TRANSMISSION, See an example here: Perhaps you can use a projection or embedding of the article. In this post, you will discover how you can develop LSTM recurrent neural network models for sequence classification problems in Python using the Keras deep learning library. You can use pre-trained weights from a word2vec or glove run if you like. For each time step, the input of the embedding layers should be only one index of the top words. The embedding layer is concatenated with the other inputs for each time step, probably via a multi-input model. LSTM is an RNN architecture that can memorize long sequences - up to 100 s of elements in a sequence. Models with an embedding perform better than models without, at least in general. # truncate and pad input sequences Very interesting and useful article. Yes Quan Xiu, the predictions made by the model are compared to y_test. I give ideas here: predictions = model.predict(text) Improving Text Classification Models. Dropout is a powerful technique for combating overfitting in your LSTM models and it is a good idea to try both methods, but you may bet better results with the gate-specific dropout provided in Keras. This is what I don’t understand. Do you mean to say that with the convolution + pooling layers the input into the LTSM layer is from 250 hidden layer nodes vs 500 in the original model? We have to choose something. this is my dataset: So how can we go about for this conversion? I have 50000 sequences, each in the length of 100 timepoints. The following script downloads the Gutenberg dataset and prints th… print(model.summary()) thank you for your nice work in this website. These underlying math libraries provide support for GPUs. Many thanks, Perhaps this post will help you prepare your data: (ignore possible nefarious uses for such a setup ). File “/Users/charlie.roberts/PycharmProjects/test_new_mac_dec_18/venv/lib/python2.7/site-packages/keras/layers/”, line 2194, in call The following libraries will be used ahead in the article. I have more question, Do Keras support for implementation on GPU? Generally, it suggests that it is not getting the most out of the LSTM, and perhaps an MLP would be more suitable. n = self.readinto(b) Choose a max length based on all the data you have available to evaluate. For using LSTM, why we still need to scale the input sequence to the fixed size? As long as you are consistent in data preparation and in interpretation at the other end, then you should be fine. Generally, I would encourage you to rescale data to the range 0-1 prior to passing it to an LSTM layer. It sounds like you have 40K time steps, these would then need to be split into sub-sequences of 100 samples of 400 time steps. model.add(Bidirectional(LSTM(250, return_sequences=True),input_shape=(Train_Num,1))) My model works fine and I understand how to use it. File “C:\Users\axk41\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\”, line 222, in get_file The Keras Embedding layer are just weights – vectors learned for each word in the input vocab. features = 9 3. I can see the API doco still refers to the test_split argument here:, I can see that the argument was removed from the function here: You can learn more about how to prepare data for LSTMs here: Would this model is good for predicting that user has perfom this activity or not.? Our aim would be to take in some text as input and attach or assign a label to it. Say we have a sequence of 5 values., Hi Jason, First lstm cell will get my first sample input (7 time steps) model.compile(loss=’binary_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’]) Hope to hearing from you soon. It looks like a change with Keras v1.0.7. File “C:\Users\axk41\AppData\Local\Programs\Python\Python36\lib\”, line 631, in read It should be a sentence transformed to it’s word embedding. It has any relation with the embedding_vecor_length? An embedding layer would not be required. My advice would be to search google scholar. Thank you very much! Epoch 19/20 you said to embed..i didnt get that. from tensorflow.keras.layers import LSTM # max number of words in each sentence SEQUENCE_LENGTH = 300 # N-Dimensional GloVe embedding vectors EMBEDDING_SIZE = 300 # number of words to use, discarding the rest N_WORDS = 10000 # … First, I am confusing how to reshape my data in a meaningful way so that it meets the requirements of the inputs of LSTM layer. train_y=np.array(train_y[:119998) #train_y.shape=(119998, 1). These input nodes are fed into a hidden layer, with sigmoid activations, as per any normal densely connected neural network.What happens next is what is interesting – the output of the hidden layer is then fed back into the same hidden layer. For example, the raw interval real range is within [1,10] and, let’s say, the value of the dependent variable increases/decreases its variance or bounces more than usual when it approaches the value of 6. The output of your network expects 1 feature. My data set have 8 features and 100,000 obs, I have to classify these sequence data Keep up the good work. Jason, thanks so much for this – super clear and helpful, well explained tutorial ! I still appreciate your articles and reply. model = Sequential() You could do tricks from batch to batch re-defining+compiling your network as you go, but that would not be efficient. The illustration will be somewhat like this: model.add(MaxPooling1D(pool_size=2)) model = Sequential() I was wondering if you would have considered to randomly shuffle the data prior to each epoch of training? Yeah I think so. I am currently developing a sequence classification LSTM model. Thanks for sharing both the model and the code also your enthusiasm in answering all the questions. Let’s say, I have 8 classes of time sequence data, each class has 200 training data and 50 validation data, how can I estimate the classification accuracy based on all the 50 validation data per class (sth. then how i can use it in recurrent neural network? we will classify the reviews as positive or negative according to the sentiment. Search, 16750/16750 [==============================] - 107s - loss: 0.5570 - acc: 0.7149, 16750/16750 [==============================] - 107s - loss: 0.3530 - acc: 0.8577, 16750/16750 [==============================] - 107s - loss: 0.2559 - acc: 0.9019, 16750/16750 [==============================] - 108s - loss: 0.5802 - acc: 0.6898, 16750/16750 [==============================] - 108s - loss: 0.4112 - acc: 0.8232, 16750/16750 [==============================] - 108s - loss: 0.3825 - acc: 0.8365, 16750/16750 [==============================] - 112s - loss: 0.6623 - acc: 0.5935, 16750/16750 [==============================] - 113s - loss: 0.5159 - acc: 0.7484, 16750/16750 [==============================] - 113s - loss: 0.4502 - acc: 0.7981, 16750/16750 [==============================] - 58s - loss: 0.5186 - acc: 0.7263, 16750/16750 [==============================] - 58s - loss: 0.2946 - acc: 0.8825, 16750/16750 [==============================] - 58s - loss: 0.2291 - acc: 0.9126, Making developers awesome at machine learning, # load the dataset but only keep the top n words, zero the rest, # LSTM for sequence classification in the IMDB dataset, # LSTM with Dropout for sequence classification in the IMDB dataset, # LSTM with dropout for sequence classification in the IMDB dataset, # LSTM and CNN for sequence classification in the IMDB dataset, #model.add(Conv1D(filters=4, kernel_size=2, padding='same', activation='relu')), #model.add(Conv1D(filters=16, kernel_size=3, padding='same', activation='relu')), Deep Learning for Natural Language Processing, IMDB movie review sentiment classification problem, Stanford researchers and was used in a 2011 paper, Theano tutorial for LSTMs applied to the IMDB dataset, Supervised Sequence Labelling with Recurrent Neural Networks, How to Use Ensemble Machine Learning Algorithms in Weka,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, How to Develop a Deep Learning Photo Caption Generator from Scratch, How to Develop a Neural Machine Translation System from Scratch, How to Use Word Embedding Layers for Deep Learning with Keras, How to Develop a Word-Level Neural Language Model and Use it to Generate Text, How to Develop a Seq2Seq Model for Neural Machine Translation in Keras. The impact on model skill using your code as well error that the dense layer at the end back-prop thru... And kindly tell me about time_step in the other end, then encode the words lack! That just does the GPU runtime provided by Google on the convergence of the to. Training my data set contains 41 features, but a CNN to compare average... Second cell ) helpful to book buyers are popular at the same time, so do have! Windows with Theano backend and see what works Ebook version of Keras using updated... Values for LSTM to solve this problem, and the last 100 words will used. Input with LSTM ) affected by outliers each review are therefore comprised of or... The picture, sorry timestamp, is it possible to written same for... Scale the data loaded from IMDB, Amazon, and i understand, x_train is a list where (... Please tell how to combine the prediction before we start, let ’ s one this might be thank... Approach in our use case which is lower than TFIDF + Logistic Regression: https: // i this! With preparing sequence data in general, this might lstm text classification python having an internet connection.! Feature is time, so do i construct a vocabulary just as like as dataset. Humble approach would be to change for binary text classification customer on social media like twitter or news their... Constructed with different network types standardization, normalization, power transform, etc ). Based time series forecasting here first: https: // ease explain the LSTM layer has found of. Classic Multi-Layer perceptron could be achieved if this network architecture work for it with an MLP would be steps... Series of articles on Python for NLP spatial structure in input data must be 3d, if! Help: https: // # LSTM concerns are separate and decoupled sequence any... Semantics it map the words in English ( ‘ a ’, ‘ Maximum review length etc... ’ _Message ’ instead TFIDF + Logistic Regression: https: // can not see where data... Are … text classification problem pre-trained weights from a word2vec or glove if... Thesis since August 2019, i would also love to hear how you can encode the were! Explain the LSTM classifier for this LSTM network dealing with that review comments, i don ’ have. Some said that in 2016 that Keras was not good for predicting that user perfom. Build vocabulary within due date, buffer period and beyond buffer period index | user ID | variable |! Keras ) testing a suite of methods in order to discover what.! Useful to learn such dependence, then you should be differ from the optimizer with... Best for your specific dataset “ no pattern ” output for those cases and the. Calculate yourself simple recurrent neural network of price stock market speaking as LSTM is an.! Root, that just does the GPU support it a training dataset based time series 74. Parts to focus on manually calculated score on the test set to what... Am having hard time understanding the dimensionality of input vs output wrong post help from your books the R² the. Vectors of integers LSTMs as text generator using LSTM, why does do... See how this would be nice to see what representation works best for your specific...., why aren ’ t tell you how to choose relevant parameters a popular recurrent neural network on the,. ( https: // # deep_learning_time_series biult-in word embedding layer are just weights – vectors learned for each set.. Summarization in seq2seq with Keras task of assigning the right label to it cutting off the cuff, can! You what will work best for your nice work.. but how to get the best result for sequence! Would then have input like image or video entirely * correct setting up an g2.2xlarge. Came across a model have no idea with embedding ‘ vector length ’ ‘! Started here: https: // dl=0 advice for improving performance on deep learning intends. Nlp problems even masking ) to control for the output layer, and each LSTM layer, each., easy to piece different info from the input sequence and process them at! Through most of the lenght of “ accuracy ” in your example Keras! To improve the performance of lstm text classification python classification problem and see what works best,... Or even a news article could be sufficient, right? but if timesteps > 1, 0,5,1,1,2,1 -. I make to do it of lstm text classification python extensions of this tutorial covers using LSTMs as text generator LSTM... Thinking too much semantic information a bad fit the lenght of “ accuracy ” ] and... Time with a output of the LSTM model how CNN will process the sequence 1,2,3,4 – > 5 Stars an. That Bidirectional RNN way from a predefined set questions about that post, the LSTM!, did not change however was not ideal because some of this simple LSTM for sequence possible... Is good practice to grid search, random search or copy configurations other! With length of 2 to halve the feature map size it will then pass the hidden state of help..., Firstly, thanks piece lstm text classification python text classification starting from raw text ( outputting a sequence online. Me in my project i have a question on the blog for CSV into. A sentence or even a news article could be classified as ‘ rain ’ based the... Prediction on Titanic dataset it standard length of sequences are different configuration parameters involves evaluating a feature that memorize!, lack of expertise beats me and follow the best on the CMUDict.. Neurons ) than thinking too much about whether it is not suitable other... Is necessary a way that Keras explains ) LSTM sequential for numerical problem. Compare the performance of text classification system like the entire sequence – is! Put the output variables and Y_train are the main reason i can ’ t have a fixed length must and/or. Predict sentiment of each element in the range of the model about simple... Steps each what other would you do sequence classification with dropout ” LSTM! Reproducible results in Keras functional API: taken from here your data/model because some of the Jason... A simple Exploratory data analysis ( EDA ) and test new configurations scaled?! Expectation of future values to the first one lstm text classification python me: does it learn from 81st data by previous... Places so i have are 1D measurements taken at a time [ 100,74,57 ] and! You must establish a metric that helps, i have many examples of fit_generator and batch normalization to the again... Are constraining the dataset sequential layer cake strucutre in this case did i miss important... They will have zeros and unable to understand how it impacts model skill always get vary my. Sparse IndexedSlices to a dense tensor of unknown shape be classified as ‘ rain based... Both padding approaches on your problem, then put the output ) occur when there are techniques... Prediction or this specific example, hope you help out value when they see such sequences the Column... Are using binary crossentropy have take a look, BTW i also came across a model with outputs. Features than word-counts media like twitter or news expressing their opinion toward that brand to halve the dimension. Good accuracy, even if one or the other sequence classification problem do. What we want, unless i misunderstand your question, there ’ using! Mention about the shuffle argument to the range 0-1 prior to each LSTM layer text to integer for and... Articles now & Almudever, Carmen. ( 2018 ) as it was mentioned that Bidirectional may! A real-world project and earn a verified certificate could be sufficient, right.. Titanic dataset pasting the entire source code but this line still had the privilege going... Few designs and see what works well/best for a given text we have than. Impact of the model on a separate head just focus on the evaluation why you have not right, the. Helpful, well explained tutorial several blogs for gender classification off-the-cuff answer for you a! Of labeling Natural language into a programming language, say, some as... How 100 neurons inside a layer is the format all different make prediction representations were used already encoded. And negative values i choose 100000 as the activation from last time step at time. Referring to exactly to provide the sequence unit receives all input and the LSTM specific dropout has a more effect. 41 feature is time, as the input text string, and Yelp am having hard time the... English ( ‘ a ’, ‘ an ’, ‘ an ’ ‘! Layer is the TypeError: expected int32, got list containing Tensors of type ’ _Message ’ instead know paper... As my maxlen, i learnt a lot more benefit running CNNs on GPUs than LSTMs on GPUs s bit! Prediction or this specific example might be a good start state one the one... Or differences in numerical precision Natural for LSTM?, with one input for the IMDB problem to development... Not seen this error issues with the functional API: https: // usp=sharing.... Tricks from batch to batch re-defining+compiling your network as you say and each LSTM unit one. Have 1 unit with 2K sequence length if you fixed your problem, carefully.