Closed
Description
I want to pre-train an LSTM policy with some example data. My current approach is to train it like a normal feed-forward network (plugging the observations in at one end and comparing the output with my ground truth), hoping that your LSTM implementation handles the rest (hidden state management) for me. But before I find out that it is not so easy and spend the next two weeks of my life digging through code and managing hidden states, I thought I would simply ask you guys: is there anything I need to keep in mind when I train the LSTM policy directly?
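For context, here is a minimal sketch of the kind of supervised pre-training (behavior cloning) loop I have in mind, written in plain PyTorch rather than against your API. The `LSTMPolicy` class, the dimensions, and the MSE loss are just placeholders for illustration; the point is that I pass whole episodes through the LSTM and let the hidden state reset per batch of episodes:

```python
# Hypothetical sketch, NOT this library's API: supervised pre-training of an
# LSTM policy on expert sequences, resetting the hidden state per episode batch.
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state carries (h, c) between calls
        out, state = self.lstm(obs_seq, state)
        return self.head(out), state

# Placeholder expert data: a batch of full episodes, shape (batch, time, dim)
obs_dim, act_dim = 8, 2
expert_obs = torch.randn(16, 50, obs_dim)
expert_act = torch.randn(16, 50, act_dim)

policy = LSTMPolicy(obs_dim, act_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    # state=None resets the hidden state at the start of each episode batch,
    # so the loss is computed over whole sequences, not isolated time steps
    pred_act, _ = policy(expert_obs, state=None)
    loss = loss_fn(pred_act, expert_act)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```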