@@ -47,9 +47,8 @@ where \f$t,l\f$ are the indices of the timestamp and the layer of the cell being

And here is the equation for LSTM cells:

- \f[ \begin{equation*}
+ \f[
(h_{t, l},c_{t,l}) = Cell(h_{t, l-1}, h_{t-1, l}, c_{t-1,l})
- \end{equation*}
\f]
where \f$t,l\f$ are the indices of the timestamp and the layer of the cell being executed.

@@ -84,10 +83,8 @@ functions. The following equations defines the mathematical operation
performed by the Vanilla RNN cell for the forward pass:

\f[
- \begin{align}
a_t &= W \cdot h_{t,l-1} + U \cdot h_{t-1, l} + B \\
h_t &= activation(a_t)
- \end{align}
\f]
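
As an illustration of the Vanilla RNN equations above, here is a minimal pure-Python sketch of a single forward step. The helper names (`matvec`, `vanilla_rnn_cell`), the list-of-rows weight layout, and the choice of `tanh` as the default activation are assumptions made for this example only; they are not the library's API or memory format.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vanilla_rnn_cell(W, U, B, h_below, h_prev, activation=math.tanh):
    """One forward step of a_t = W.h_{t,l-1} + U.h_{t-1,l} + B, h_t = activation(a_t).

    h_below is h_{t,l-1} (same timestamp, previous layer);
    h_prev is h_{t-1,l} (previous timestamp, same layer).
    """
    wx = matvec(W, h_below)  # contribution of the layer below
    uh = matvec(U, h_prev)   # contribution of the previous timestamp
    a_t = [x + y + b for x, y, b in zip(wx, uh, B)]
    return [activation(a) for a in a_t]

# Hypothetical 2-channel example: identity W, zero U and zero bias.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.0, 0.0], [0.0, 0.0]]
B = [0.0, 0.0]
h_t = vanilla_rnn_cell(W, U, B, h_below=[0.0, 1.0], h_prev=[0.5, 0.5])
```

With these inputs the recurrent term vanishes, so each output channel is simply the activation applied to the corresponding channel of `h_below`.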
### LSTM
@@ -111,7 +108,6 @@ following equation gives the mathematical description of these gates and output
for the forward pass:

\f[
- \begin{align}
i_t &= \sigma(W_i \cdot h_{t,l-1} + U_i \cdot h_{t-1, l} + B_i) \\
f_t &= \sigma(W_f \cdot h_{t,l-1} + U_f \cdot h_{t-1, l} + B_f) \\
\\
@@ -120,7 +116,6 @@ c_t &= f_t * c_{t-1} + i_t * \tilde c_t \\
\\
o_t &= \sigma(W_o \cdot h_{t,l-1} + U_o \cdot h_{t-1, l} + B_o) \\
h_t &= \tanh(c_t) * o_t
- \end{align}
\f]

where \f$W_*\f$ are stored in \weightslayer, \f$U_*\f$ are stored in
@@ -151,7 +146,6 @@ on the gates. For peephole weights, the gates order is `i`, `f`,
and output for the forward pass:

\f[
- \begin{align}
i_t &= \sigma(W_i \cdot h_{t,l-1} + U_i \cdot h_{t-1, l} + P_i \cdot c_{t-1} + B_i) \\
f_t &= \sigma(W_f \cdot h_{t,l-1} + U_f \cdot h_{t-1, l} + P_f \cdot c_{t-1} + B_f) \\
\\
@@ -160,7 +154,6 @@ c_t &= f_t * c_{t-1} + i_t * \tilde c_t \\
\\
o_t &= \sigma(W_o \cdot h_{t,l-1} + U_o \cdot h_{t-1, l} + P_o \cdot c_t + B_o) \\
h_t &= \tanh(c_t) * o_t
- \end{align}
\f]

where \f$P_*\f$ are stored in `weights_peephole`, and the other parameters are
@@ -192,7 +185,6 @@ description of these gates and output for the forward pass (for simplicity,
LSTM without peephole is shown):

\f[
- \begin{align}
i_t &= \sigma(W_i \cdot h_{t,l-1} + U_i \cdot h_{t-1,l} + B_i) \\
f_t &= \sigma(W_f \cdot h_{t,l-1} + U_f \cdot h_{t-1,l} + B_f) \\
& \\
@@ -201,7 +193,6 @@ LSTM without peephole is shown):
& \\
o_t &= \sigma(W_o \cdot h_{t,l-1} + U_o \cdot h_{t-1,l} + B_o) \\
h_t &= R \cdot (\tanh(c_t) * o_t)
- \end{align}
\f]

where \f$R\f$ is stored in `weights_projection`, and the other parameters are
@@ -230,12 +221,10 @@ implicitly require the order of these gates to be `u`, `r`, and `o`. The
following equation gives the mathematical definition of these gates.

\f[
- \begin{align}
u_t &= \sigma(W_u \cdot h_{t,l-1} + U_u \cdot h_{t-1, l} + B_u) \\
r_t &= \sigma(W_r \cdot h_{t,l-1} + U_r \cdot h_{t-1, l} + B_r) \\
o_t &= \tanh(W_o \cdot h_{t,l-1} + U_o \cdot (r_t * h_{t-1, l}) + B_o) \\
h_t &= u_t * h_{t-1, l} + (1 - u_t) * o_t
- \end{align}
\f]

where \f$W_*\f$ are in \weightslayer, \f$U_*\f$ are in
@@ -264,12 +253,10 @@ The following equation describes the mathematical behavior of the
Linear-Before-Reset GRU cell.

\f[
- \begin{align}
u_t &= \sigma(W_u \cdot h_{t,l-1} + U_u \cdot h_{t-1, l} + B_u) \\
r_t &= \sigma(W_r \cdot h_{t,l-1} + U_r \cdot h_{t-1, l} + B_r) \\
o_t &= \tanh(W_o \cdot h_{t,l-1} + r_t * (U_o \cdot h_{t-1, l} + B_{u'}) + B_o) \\
h_t &= u_t * h_{t-1, l} + (1 - u_t) * o_t
- \end{align}
\f]
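
To make the Linear-Before-Reset ordering concrete, the following is a small pure-Python sketch of one forward step. The function names, the dictionaries keyed by gate name (`"u"`, `"r"`, `"o"`), and the list-of-rows weight layout are hypothetical choices for this illustration and do not reflect how the library stores its tensors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lbr_gru_cell(W, U, B, B_u_prime, h_below, h_prev):
    """One forward step of the Linear-Before-Reset GRU equations.

    W, U, B are dicts keyed by gate name "u", "r", "o" (assumed layout).
    h_below is h_{t,l-1}; h_prev is h_{t-1,l}.
    """
    def sigmoid_gate(name):
        wx = matvec(W[name], h_below)
        uh = matvec(U[name], h_prev)
        return [sigmoid(x + y + b) for x, y, b in zip(wx, uh, B[name])]

    u_t = sigmoid_gate("u")
    r_t = sigmoid_gate("r")

    wx_o = matvec(W["o"], h_below)
    uh_o = matvec(U["o"], h_prev)  # linear part computed before the reset gate
    o_t = [math.tanh(x + r * (y + bp) + b)
           for x, r, y, bp, b in zip(wx_o, r_t, uh_o, B_u_prime, B["o"])]

    # Blend the previous state and the candidate with the update gate.
    return [u * hp + (1.0 - u) * o for u, hp, o in zip(u_t, h_prev, o_t)]

# Hypothetical single-channel example with all-zero weights.
zeros = {"u": [[0.0]], "r": [[0.0]], "o": [[0.0]]}
bias = {"u": [0.0], "r": [0.0], "o": [0.0]}
h_t = lbr_gru_cell(zeros, zeros, bias, B_u_prime=[2.0], h_below=[1.0], h_prev=[2.0])
```

The only difference from the regular GRU step is the candidate `o_t`: here the reset gate multiplies the already-computed product \f$U_o \cdot h_{t-1, l}\f$ plus the extra bias \f$B_{u'}\f$, rather than being applied to \f$h_{t-1, l}\f$ before the multiplication.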
Note that for all tensors with a dimension depending on the gate number, except
@@ -300,13 +287,11 @@ implicitly require the order of these gates to be `u`, `r`, and `o`. The
following equation gives the mathematical definition of these gates.

\f[
- \begin{align}
u_t &= \sigma(W_u \cdot h_{t,l-1} + U_u \cdot h_{t-1, l} + B_u) \\
r_t &= \sigma(W_r \cdot h_{t,l-1} + U_r \cdot h_{t-1, l} + B_r) \\
o_t &= \tanh(W_o \cdot h_{t,l-1} + U_o \cdot (r_t * h_{t-1, l}) + B_o) \\
\tilde u_t &= (1 - a_t) * u_t \\
h_t &= \tilde u_t * h_{t-1, l} + (1 - \tilde u_t) * o_t
- \end{align}
\f]

where \f$W_*\f$ are in \weightslayer, \f$U_*\f$ are in
@@ -330,13 +315,11 @@ The following equation describes the mathematical behavior of the
Linear-Before-Reset AUGRU cell.

\f[
- \begin{align}
u_t &= \sigma(W_u \cdot h_{t,l-1} + U_u \cdot h_{t-1, l} + B_u) \\
r_t &= \sigma(W_r \cdot h_{t,l-1} + U_r \cdot h_{t-1, l} + B_r) \\
o_t &= \tanh(W_o \cdot h_{t,l-1} + r_t * (U_o \cdot h_{t-1, l} + B_{u'}) + B_o) \\
\tilde u_t &= (1 - a_t) * u_t \\
h_t &= \tilde u_t * h_{t-1, l} + (1 - \tilde u_t) * o_t
- \end{align}
\f]
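
The last two lines of the equations above are what distinguish the AUGRU variants from their GRU counterparts. The sketch below isolates that step in pure Python. It assumes that the gates \f$u_t\f$ and \f$o_t\f$ have already been computed and that \f$a_t\f$ is a single attention coefficient shared by all channels; the function name `augru_blend` is hypothetical.

```python
def augru_blend(u_t, o_t, h_prev, a_t):
    """Apply the attention-scaled update gate and blend states.

    u_t, o_t, h_prev are per-channel lists; a_t is a scalar (assumed).
    """
    # \tilde u_t = (1 - a_t) * u_t
    u_tilde = [(1.0 - a_t) * u for u in u_t]
    # h_t = \tilde u_t * h_{t-1,l} + (1 - \tilde u_t) * o_t
    return [ut * hp + (1.0 - ut) * o for ut, hp, o in zip(u_tilde, h_prev, o_t)]

# With a_t = 0 the step reduces to the plain GRU blend;
# with a_t = 1 the previous state is ignored and h_t equals o_t.
h_plain = augru_blend([0.5], [0.0], [2.0], a_t=0.0)
h_half = augru_blend([0.5], [0.0], [2.0], a_t=0.5)
h_full = augru_blend([0.5], [0.25], [2.0], a_t=1.0)
```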
Note that for all tensors with a dimension depending on the gate number, except