Note in the diagram above that the weight matrices U, V, W are shared across all time steps of the network.
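To make the weight sharing concrete, here is a minimal sketch of a vanilla RNN forward pass in plain Python. It assumes the common convention s_t = tanh(U x_t + W s_{t-1}) and o_t = V s_t (raw output scores). The notes do not confirm which convention the diagram uses, so treat the roles of U, W, V, the helper names and the toy numbers as illustrative assumptions.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, W, V):
    """Run a vanilla RNN over the input sequence xs.

    Assumed convention (not confirmed by the notes):
      s_t = tanh(U x_t + W s_{t-1}),  o_t = V s_t
    The same U, W, V are reused at every time step.
    """
    s = [0.0] * len(W)  # initial hidden state is all zeros
    states, outputs = [], []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]
        states.append(s)
        outputs.append(matvec(V, s))
    return states, outputs

# Toy sizes: 2 inputs, 3 hidden units, 2 outputs (arbitrary values).
U = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]                  # hidden x input
W = [[0.0, 0.1, -0.1], [0.2, 0.0, 0.3], [-0.3, 0.1, 0.0]]   # hidden x hidden
V = [[0.5, -0.4, 0.1], [-0.2, 0.3, 0.6]]                    # output x hidden

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # a sequence of three input vectors
states, outputs = rnn_forward(xs, U, W, V)
```

The loop never creates per-step copies of the weights, so the number of parameters does not grow with the sequence length.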

What is the difference between an RNN and an LSTM? What exactly is an LSTM (Long Short-Term Memory network)? There’s a good article below.


There’s a good video here

The course (MIT 6.S191)


Tricks For Training Networks

Embeddings

Word2vec is a group of related models that are used to produce word embeddings.

Word2vec


ACTIVATION FUNCTIONS:

Above Identity

f(x) = x
Derivative f’(x) = 1


Above ReLU

f(x) = 0 for x < 0
f(x) = x for x >= 0

Derivative
f’(x) = 0 for x < 0
f’(x) = 1 for x >= 0
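The identity and ReLU definitions above translate directly into code. This is a minimal sketch in plain Python; the function names are illustrative. Strictly speaking the ReLU derivative is undefined at x = 0, and using 1 there simply follows the convention in these notes.

```python
def identity(x):
    """Identity activation: f(x) = x."""
    return x

def identity_grad(x):
    """Derivative of the identity: f'(x) = 1 everywhere."""
    return 1.0

def relu(x):
    """ReLU: 0 for x < 0, x for x >= 0."""
    return x if x >= 0 else 0.0

def relu_grad(x):
    """ReLU derivative: 0 for x < 0, 1 for x >= 0.

    The value at x = 0 is a convention, taken from the notes.
    """
    return 1.0 if x >= 0 else 0.0

# Evaluate both functions and their derivatives at a few sample points.
for x in (-2.0, 0.0, 3.0):
    print(x, identity(x), identity_grad(x), relu(x), relu_grad(x))
```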


Above Softsign
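The formula for softsign did not survive in these notes, so the sketch below uses the standard definition, f(x) = x / (1 + |x|) with derivative f’(x) = 1 / (1 + |x|)^2. The function names are illustrative.

```python
def softsign(x):
    """Softsign: f(x) = x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))

def softsign_grad(x):
    """Softsign derivative: f'(x) = 1 / (1 + |x|)^2."""
    return 1.0 / (1.0 + abs(x)) ** 2

# Evaluate the function and its derivative at a few sample points.
for x in (-3.0, 0.0, 1.0):
    print(x, softsign(x), softsign_grad(x))
```

Like tanh, softsign squashes its input into (-1, 1), but it approaches the limits polynomially rather than exponentially.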



See a lot more here: https://en.wikipedia.org/wiki/Activation_function




SOFTMAX
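As a quick reminder of what softmax computes: it maps a vector of scores z to probabilities exp(z_i) / sum_j exp(z_j). The sketch below is a plain Python version. Subtracting the maximum before exponentiating is a common numerical-stability trick and is an addition here, not something stated in these notes.

```python
import math

def softmax(z):
    """Softmax over a list of floats: exp(z_i) / sum_j exp(z_j).

    Subtracting max(z) leaves the result unchanged but avoids
    overflow in math.exp for large scores.
    """
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The outputs are positive and sum to 1, and the ordering of the scores is preserved.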




The following notes still need to be sorted



TensorFlow


video

One of the first things to do is add meaningful names.



Good t-SNE video