Tensor Types in TensorFlow¶
In the previous post, we read about the concepts of Graph and Session, which describe the way data flows in TensorFlow. One of the first questions you might have while learning a new framework is whether it introduces any new data structure you should use. TensorFlow does have its own data structure, for the sake of performance and ease of use. The tensor is that data structure (remember, TensorFlow is the flow of tensors through a computational graph), and it is at the core of TensorFlow. TensorFlow programs use a tensor data structure to represent all data; only tensors are passed between operations in the computation graph. You can think of a TensorFlow tensor as an n-dimensional array or list.
In this tutorial, we'll take a look at some of the tensor types used in TensorFlow. The special ones commonly used in creating neural network models are Constant, Variable, and Placeholder.
This will also help us to shed some light on some of the points and questions left unanswered in the previous post.
Remember that we need to import the TensorFlow library at the very beginning of our code using the line:
import tensorflow as tf
1. Constant¶
As the name speaks for itself, constants are used as constants. They create a node that takes a value and does not change it. You can simply create a constant tensor using tf.constant. It accepts five arguments:
tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)
Now let's take a look at a very simple example.
Example 1:¶
Let's create two constants and add them together. Constant tensors can simply be defined with a value:
# create graph
a = tf.constant(2)
b = tf.constant(3)
c = a + b
# launch the graph in a session
with tf.Session() as sess:
    print(sess.run(c))
Perfect! Now let's look at the created graph and the generated data types:
Fig1. Left: generated graph visualized in Tensorboard, Right: generated variables (screenshot captured from PyCharm debugger when running in debug mode)
In the figure, we created 3 tensors with "Python-names" a, b, and c. As we didn't define any "TensorFlow-name" for them, TensorFlow assigns some default names to them which are observed in the graph: const and const_1 for the input constants and add for the output of the addition operation. We can easily modify it and define custom names as shown below:
# create graph
a = tf.constant(2, name='A')
b = tf.constant(3, name='B')
c = tf.add(a, b, name='Sum')
# launch the graph in a session
with tf.Session() as sess:
    print(sess.run(c))
This time the graph is created with the required tensor names:
Fig2. generated graph (Left) and variables (Right) with the modified names
Constants can also be defined with different types (integer, float, etc.) and shapes (vectors, matrices, etc.). The next example has one constant with type 32-bit float and another constant with shape 2x2.
Example 2:¶
s = tf.constant(2.3, name='scalar', dtype=tf.float32)
m = tf.constant([[1, 2], [3, 4]], name='matrix')
# launch the graph in a session
with tf.Session() as sess:
    print(sess.run(s))
    print(sess.run(m))
2. Variable¶
Variables are stateful nodes that output their current value, meaning they can retain their value over multiple executions of a graph. They have a number of useful features:
- They can be saved to your disk during and after training. This allows people from different companies and groups to collaborate, as they can save, restore, and send their model parameters to each other (a small saving/restoring sketch appears at the end of this section).
- By default, gradient updates (used in all neural networks) will apply to all variables in your graph. In fact, variables are the things that you want to tune in order to minimize the loss.
- Constants are (guess what!) constants: as their name states, their value doesn't change. We usually need our network parameters to be updated, and that's where variables come into play.
- Constants are stored in the graph definition, which makes them memory-expensive. In other words, constants with millions of entries make the graph slower and more resource-intensive.
2.1. Create Variables¶
To create a variable, we should use tf.Variable as:
# Create a variable.
w = tf.Variable(<initial-value>, name=<optional-name>)
Some examples of creating scalar and matrix variables are as follows:
s = tf.Variable(2, name="scalar")
m = tf.Variable([[1, 2], [3, 4]], name="matrix")
W = tf.Variable(tf.zeros([784,10]))
The variable W defined above creates a matrix with 784 rows and 10 columns, initialized with zeros. This can be used as the weight matrix of a feed-forward neural network (or even a linear regression model) connecting a layer with 784 neurons to a layer with 10 neurons. We'll see more of this later in this tutorial.
*Note: We write tf.Variable with an uppercase "V" and tf.constant with a lowercase "c". You don't necessarily need to know the reason; it's simply because tf.constant is an op, while tf.Variable is a class with multiple ops.
*IMPORTANT Note: Calling tf.Variable to create a variable is the older way of creating a variable. TensorFlow instead recommends the wrapper tf.get_variable, which accepts the name, shape, dtype, initializer, and several other arguments:
tf.get_variable(name,
                shape=None,
                dtype=None,
                initializer=None,
                regularizer=None,
                trainable=True,
                collections=None,
                caching_device=None,
                partitioner=None,
                validate_shape=True,
                use_resource=None,
                custom_getter=None,
                constraint=None)
Some examples are as follows:
s = tf.get_variable("scalar", initializer=tf.constant(2))
m = tf.get_variable("matrix", initializer=tf.constant([[0, 1], [2, 3]]))
W = tf.get_variable("weight_matrix", shape=(784, 10), initializer=tf.zeros_initializer())
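*Note: Unlike tf.Variable, tf.get_variable is strict about names. If you run the same tf.get_variable call twice in the same graph (for example, by re-running the same code in an interactive session), TensorFlow typically raises a ValueError saying the variable already exists. A minimal sketch of the usual workaround while experimenting is to clear the default graph first:
# clear the default graph before re-creating variables with the same names
tf.reset_default_graph()
s = tf.get_variable("scalar", initializer=tf.constant(2))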
2.2. Initialize Variables¶
Just like in most programming languages, variables need to be initialized before being used. TensorFlow, while not a programming language per se, is no exception to this rule. To initialize variables, we have to invoke a variable initializer operation and run that operation in the session. This is the easiest way to initialize all variables at once.
The following toy example shows how we can add an op to initialize the variables.
Example 3:¶
Create two variables and add them together. Then print out their values and the summation result.
a = tf.get_variable(name="var_1", initializer=tf.constant(2))
b = tf.get_variable(name="var_2", initializer=tf.constant(3))
c = tf.add(a, b, name="Add1")
# launch the graph in a session
with tf.Session() as sess:
    # now let's evaluate their value
    print(sess.run(a))
    print(sess.run(b))
    print(sess.run(c))
FailedPreconditionError: Attempting to use uninitialized value
Upon executing the program, we run into FailedPreconditionError: Attempting to use uninitialized value. This is because we tried to evaluate the variables before initializing them. Let's correct the code by first initializing all the variables and only then evaluating them.
# create graph
a = tf.get_variable(name="A", initializer=tf.constant(2))
b = tf.get_variable(name="B", initializer=tf.constant(3))
c = tf.add(a, b, name="Add")
# add an Op to initialize global variables
init_op = tf.global_variables_initializer()
# launch the graph in a session
with tf.Session() as sess:
    # run the variable initializer operation
    sess.run(init_op)
    # now let's evaluate their value
    print(sess.run(a))
    print(sess.run(b))
    print(sess.run(c))
Let's take a quick look at the graph and generated variables:
Fig3. generated graph (Left) and variables (Right)
The figure above shows the two blue boxes that were generated; they represent the variables (compare them with the constant nodes in Fig. 2). These two variables are added together by the "Add" operation.
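*Side note: tf.global_variables_initializer() (used in the corrected code above) initializes all variables at once. If you only need to initialize a specific variable, you can run that variable's own initializer op instead; a minimal sketch (with an arbitrary variable name "var_c"):
c_var = tf.get_variable(name="var_c", initializer=tf.constant(5))
with tf.Session() as sess:
    # initialize only this one variable
    sess.run(c_var.initializer)
    print(sess.run(c_var))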
*Note: Variables are usually used for weights and biases in neural networks.
- Weights are usually initialized from a normal distribution using tf.truncated_normal_initializer().
- Biases are usually initialized to zeros using tf.zeros_initializer().
Let's look at a very simple example of creating weight and bias variables with proper initialization:
Example 4:¶
Create the weight and bias matrices for a fully-connected layer with 2 neurons connected to another layer with 3 neurons. In this scenario, the weight and bias variables must be of size [2, 3] and [3], respectively.
# create graph
weights = tf.get_variable(name="W", shape=[2,3], initializer=tf.truncated_normal_initializer(stddev=0.01))
biases = tf.get_variable(name="b", shape=[3], initializer=tf.zeros_initializer())
# add an Op to initialize global variables
init_op = tf.global_variables_initializer()
# launch the graph in a session
with tf.Session() as sess:
    # run the variable initializer
    sess.run(init_op)
    # now we can run our operations
    W, b = sess.run([weights, biases])
    print('weights = {}'.format(W))
    print('biases = {}'.format(b))
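Before moving on, recall the first feature in the list at the beginning of this section: variables can be saved to disk and restored later. A minimal sketch of how this typically looks with tf.train.Saver is shown below; the checkpoint path "./my_model.ckpt" is just an illustrative choice:
# create a saver for all variables in the graph (e.g. the weights and biases above)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # write the current variable values to disk
    saver.save(sess, "./my_model.ckpt")
    # ... later (or in another program), load them back
    saver.restore(sess, "./my_model.ckpt")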
3. Placeholder¶
Placeholders are more basic than variables. A placeholder is simply a variable that we will assign data to at a later time; placeholders are nodes whose values are fed in at execution time. If our network has inputs that depend on some external data, and we don't want the graph to depend on any real value while we are building it, placeholders are the datatype we need. In fact, we can build the graph without any data. Therefore, placeholders don't need an initial value; they only need a datatype (such as float32) and a tensor shape, so the graph still knows what to compute even though it doesn't hold any stored values yet.
Some examples of creating placeholders are as follows:
a = tf.placeholder(tf.float32, shape=[5])
b = tf.placeholder(dtype=tf.float32, shape=None, name=None)
X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')
Let's run a simple example.
Example 5:¶
Create a constant vector and a placeholder and add them together.
a = tf.constant([5, 5, 5], tf.float32, name='A')
b = tf.placeholder(tf.float32, shape=[3], name='B')
c = tf.add(a, b, name="Add")
with tf.Session() as sess:
    print(sess.run(c))
InvalidArgumentError: You must feed a value for placeholder tensor 'B' with dtype float and shape [3]
Executing the above code runs into an error. You might have guessed it: it's simply because the placeholder is empty, and there is no way to add an empty tensor to a constant tensor in the add operation. To solve this, we need to feed an input value to the tensor "b". This is done by creating a dictionary ("d" in the following code) whose keys are the placeholders and whose values are the desired values to be passed to those placeholders, and passing it to the "feed_dict" argument of sess.run. In our example, say we want to pass [1, 2, 3] to the placeholder; the code needs to be modified as:
a = tf.constant([5, 5, 5], tf.float32, name='A')
b = tf.placeholder(tf.float32, shape=[3], name='B')
c = tf.add(a, b, name="Add")
with tf.Session() as sess:
    # create a dictionary:
    d = {b: [1, 2, 3]}
    # feed it to the placeholder
    print(sess.run(c, feed_dict=d))
The generated graph and variables are as follows:
Fig4. generated graph (Left) and variables (Right)
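*Note: In the placeholder examples earlier, one placeholder was defined with shape=None. This means a tensor of any shape will be accepted; it is convenient, but specifying the real shape (when you know it) helps TensorFlow catch shape mismatches early. A small illustrative sketch of the same placeholder being fed a scalar and then a matrix:
p = tf.placeholder(tf.float32, shape=None, name='P')
with tf.Session() as sess:
    # fetching the placeholder simply returns whatever we feed it
    print(sess.run(p, feed_dict={p: 5.0}))
    print(sess.run(p, feed_dict={p: [[1., 2.], [3., 4.]]}))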
So far so good? To make it more interesting and challenging, let's get our hands dirty!
Creating a Neural Network¶
Now we have all the required material to start building a toy feed-forward neural network with one hidden layer and 200 hidden units (neurons). The computational graph in TensorFlow will be:
Fig5. Schematic of the graph for one layer of the neural network
How many operations (or nodes) do you see in this graph? Six, right? The three circles (X, W, b) and the three rectangles. We'll go through each of them and discuss the best way to implement them.
Let's start with the input, X. This can be an input of any type, such as images, signals, etc. The general approach is to feed all inputs to the network and train the trainable parameters (here, W and b) by backpropagating the error signal. Ideally, you need to feed all inputs together, compute the error, and update the parameters. This process is called "Gradient Descent".
*Side Note: In real-world problems, we have thousands or millions of inputs, which makes gradient descent computationally expensive. That's why we split the input set into shorter pieces of B inputs (called mini-batches, with B the mini-batch size) and feed them one by one, as sketched below. This is called "Stochastic Gradient Descent". The process of feeding one mini-batch of size B to the network, back-propagating the errors, and updating the parameters (weights and biases) is called an iteration.
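As a rough illustration of the side note (not part of the original example), slicing a training set into mini-batches of size B might look like the following, where train_images and B are assumed, illustrative names:
num_iterations = len(train_images) // B   # number of mini-batches per pass over the data
for i in range(num_iterations):
    # take the i-th mini-batch of B inputs; feeding it to the network,
    # back-propagating the error, and updating W and b is one iteration
    x_batch = train_images[i * B:(i + 1) * B]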
We generally use placeholders for inputs so that we can build the graph without any real data. The only point is that you need to choose the proper size for the input. Here, we have a feed-forward neural network; let's assume inputs of size 784 (similar to the 28x28 images of the MNIST data). The input placeholder can be written as:
# create the input placeholder
X = tf.placeholder(tf.float32, shape=[None, 784], name="X")
You might wonder why the shape is [None, 784].
Well, that's the tricky part! Please read the side note above again. We need to feed B images of size 784 to the network in each training iteration as one batch, so the placeholder needs to be of shape=[B, 784]. Defining the placeholder shape as [None, 784] means that we can feed any number of images of size 784 (not necessarily B images). This is especially helpful at evaluation time, where we need to feed all validation or test images to the network and compute the performance on all of them.
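As a quick sanity check of the [None, 784] idea (this snippet is illustrative, not part of the original example), the same placeholder X happily accepts batches of different sizes; the batch sizes 32 and 100 below are arbitrary:
import numpy as np
with tf.Session() as sess:
    # feed a batch of 32 random "images", then a batch of 100, to the same placeholder
    print(sess.run(tf.shape(X), feed_dict={X: np.random.rand(32, 784)}))   # prints [ 32 784]
    print(sess.run(tf.shape(X), feed_dict={X: np.random.rand(100, 784)}))  # prints [100 784]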
Enough with the placeholders. Let's move on to the network parameters, W and b. As explained in the Variable section above, they have to be defined as variables, because in TensorFlow, gradient updates are applied to the graph variables by default. As mentioned, variables need to be initialized.
*Note: Generally, weights (W) are initialized randomly, in the simplest case from a normal distribution, say one with zero mean and a standard deviation of 0.01. Biases (b) can be initialized with small constant values, such as 0.
Since the input dimension is 784 and we have 200 hidden units, the weight matrix will be of size [784, 200]. We also need 200 biases, one for each hidden unit. The code will look like:
# create the weight matrix, initialized randomly from N(0, 0.01)
weight_initer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
W = tf.get_variable(name="Weight", dtype=tf.float32, shape=[784, 200], initializer=weight_initer)
# create the bias vector of size 200, all initialized to zero
bias_initer = tf.constant(0., shape=[200], dtype=tf.float32)
b = tf.get_variable(name="Bias", dtype=tf.float32, initializer=bias_initer)
Now let's move on to the rectangle operations. We must multiply the input X_{[None, 784]} by the weight matrix W_{[784, 200]}, which gives a tensor of size [None, 200], then add the bias vector b_{[200]}, and eventually pass the final tensor through a ReLU non-linearity:
# create MatMul node
x_w = tf.matmul(X, W, name="MatMul")
# create Add node
x_w_b = tf.add(x_w, b, name="Add")
# create ReLU node
h = tf.nn.relu(x_w_b, name="ReLU")
Fig6. Data flow graph of the neural network created in Tensorflow
But how can we visualize this graph? How do you create this figure? That's the magic of Tensorboard. It's thoroughly explained in our next article.
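For the impatient, the key step is roughly the following sketch: tf.summary.FileWriter writes the graph to a log directory (here the arbitrary path './graphs'), which you can then open with the command tensorboard --logdir=./graphs. The details are left for the next article.
with tf.Session() as sess:
    # write the default graph to disk so TensorBoard can render it
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    writer.close()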
Before closing, let's run a session on this graph (using 100 images generated from random pixel values) and get the output of the hidden units (h). Below is the complete code:
# import the tensorflow library
import tensorflow as tf
import numpy as np
# create the input placeholder
X = tf.placeholder(tf.float32, shape=[None, 784], name="X")
weight_initer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
# create network parameters
W = tf.get_variable(name="Weight", dtype=tf.float32, shape=[784, 200], initializer=weight_initer)
bias_initer = tf.constant(0., shape=[200], dtype=tf.float32)
b = tf.get_variable(name="Bias", dtype=tf.float32, initializer=bias_initer)
# create MatMul node
x_w = tf.matmul(X, W, name="MatMul")
# create Add node
x_w_b = tf.add(x_w, b, name="Add")
# create ReLU node
h = tf.nn.relu(x_w_b, name="ReLU")
# Add an Op to initialize variables
init_op = tf.global_variables_initializer()
# launch the graph in a session
with tf.Session() as sess:
    # initialize variables
    sess.run(init_op)
    # create the dictionary:
    d = {X: np.random.rand(100, 784)}
    # feed it to placeholder X via the dict
    print(sess.run(h, feed_dict=d))
Running this code will print out h_{[100, 200]}, which is the output of the 200 hidden units in response to the 100 images; i.e., 200 features extracted from each of the 100 images.
We'll continue constructing the loss function and creating the optimizer operations in the next articles. However, we need to learn Tensorboard first to use its amazing features in our neural network code.
I hope this post has helped you understand how to use the different tensor types in TensorFlow. Thank you so much for reading! If you have any questions, feel free to leave a comment on our webpage. You can also send us feedback through the contacts page.