10.1 Unfolding Computational Graphs

A computational graph is a way to formalize the structure of a set of computations, such as those involved in mapping inputs and parameters to outputs and loss. Please refer to section 6.5.1 for a general introduction. In this section we explain the idea of unfolding a recursive or recurrent computation into a computational graph that has a repetitive structure, typically corresponding to a chain of events. Unfolding this graph results in the sharing of parameters across a deep network structure.
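As an illustration (mine, not the book's), the following NumPy sketch makes the unfolding concrete: a single transition function with one shared parameter set $\theta = (W, U, b)$ is applied once per time step, so the unrolled computation is a depth-$T$ chain in which every "layer" reuses the same parameters. The dimensions and initialization here are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared parameter set theta = (W, U, b), reused at every time step.
W = rng.standard_normal((3, 3)) * 0.1   # state-to-state weights
U = rng.standard_normal((3, 2)) * 0.1   # input-to-state weights
b = np.zeros(3)

def f(s_prev, x):
    """Transition function: s(t) = f(s(t-1), x(t); theta)."""
    return np.tanh(W @ s_prev + U @ x + b)

# Unfolding: a length-T input sequence becomes a depth-T chain of
# applications of the *same* f with the *same* parameters.
T = 5
xs = rng.standard_normal((T, 2))        # arbitrary input sequence
s = np.zeros(3)                          # initial state s(0)
for x_t in xs:
    s = f(s, x_t)                        # each step shares W, U, b
print(s)                                 # final state s(T)
```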

Recurrent neural networks can be built in many different ways. Much as almost any function can be considered a feedforward neural network, essentially any function involving recurrence can be considered a recurrent neural network.

Many recurrent neural networks use equation 10.5 or a similar equation to define the values of their hidden units. To indicate that the state is the hidden units of the network, we now rewrite equation 10.4 using the variable $h$ to represent the state:

$$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta). \tag{10.5}$$

Typical RNNs will add extra architectural features such as output layers that read information out of the state $h$ to make predictions.
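A minimal sketch of this setup (my own, under stated assumptions, not the book's implementation): the recurrence of equation 10.5 with $\tanh$ standing in for $f$, plus an output layer with hypothetical parameters $V$ and $c$ that reads a softmax prediction out of the state at each step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_h, n_x, n_o = 4, 3, 2                    # hypothetical dimensions

# Recurrence parameters (shared across time) and a readout layer.
W = rng.standard_normal((n_h, n_h)) * 0.1
U = rng.standard_normal((n_h, n_x)) * 0.1
b = np.zeros(n_h)
V = rng.standard_normal((n_o, n_h)) * 0.1  # output layer: an added architectural feature
c = np.zeros(n_o)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(h_prev, x):
    """h(t) = f(h(t-1), x(t); theta) (eq. 10.5), with tanh as f."""
    h = np.tanh(W @ h_prev + U @ x + b)
    o = softmax(V @ h + c)                 # prediction read out of the state
    return h, o

h = np.zeros(n_h)
for x_t in rng.standard_normal((6, n_x)):  # a length-6 toy sequence
    h, o = step(h, x_t)
print(o)                                   # prediction at the final time step
```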

When the recurrent network is trained to perform a task that requires predicting the future from the past, the network typically learns to use $h^{(t)}$ as a kind of lossy summary of the task-relevant aspects of the past sequence of inputs up to $t$. This summary is in general necessarily lossy, since it maps an arbitrary-length sequence $(x^{(t)}, x^{(t-1)}, x^{(t-2)}, \ldots, x^{(2)}, x^{(1)})$ to a fixed-length vector $h^{(t)}$. Depending on the training criterion, this summary might selectively keep some aspects of the past sequence with more precision than other aspects. For example, if the RNN is used in statistical language modeling, typically to predict the next word given previous words, it may not be necessary to store all of the information in the input sequence up to time $t$, but rather only enough information to predict the rest of the sentence. The most demanding situation is when we ask $h^{(t)}$ to be rich enough to allow one to approximately recover the input sequence, as in autoencoder frameworks (chapter 14).
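To make the lossiness concrete, here is a toy check of my own (not from the text): the state produced by a recurrence like the one above has the same fixed dimension no matter how long the sequence is, so for long enough inputs it cannot preserve all of the information they contain.

```python
import numpy as np

rng = np.random.default_rng(2)
n_h, n_x = 4, 3                            # hypothetical dimensions
W = rng.standard_normal((n_h, n_h)) * 0.1
U = rng.standard_normal((n_h, n_x)) * 0.1

def summarize(xs):
    """Run the recurrence over a whole sequence and return h(t)."""
    h = np.zeros(n_h)
    for x in xs:
        h = np.tanh(W @ h + U @ x)
    return h

# Sequences of very different lengths map to the same fixed-size vector:
short = summarize(rng.standard_normal((2, n_x)))
long_ = summarize(rng.standard_normal((1000, n_x)))
print(short.shape, long_.shape)  # (4,) (4,): 3000 input numbers squeezed into 4
```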

THINKING: What is the relationship between the state and the lossy summary?