Build a SMAUG model with the Python API
SMAUG’s Python frontend provides easy-to-use APIs for building DL models. The created model is a computational graph, which is serialized into two protobuf files, one for the model topology and the other for the parameters. These two files are inputs to SMAUG’s C++ runtime that performs the actual simulation. In this tutorial, we will use the SMAUG Python APIs to build new DL models.
Before building a model, we need to create a Graph context in which we will add operators.
import smaug as sg
with sg.Graph(name="my_model", backend="SMV") as graph:
  # Any operators instantiated within the context will be added to `graph`.
When using the smaug.Graph API to create a graph context, we need to give the model a name and select the backend to use when running the model through the C++ runtime. A backend in SMAUG is a logical combination of hardware blocks that implements all the SMAUG operators; refer to the C++ docs for more details of backend implementations in the C++ runtime. Here, we choose the SMV backend that comes with SMAUG, which is modeled after the NVDLA architecture. SMAUG also has another backend named Reference, a reference implementation without backend-specific performance optimizations. Also, refer to smaug.Graph for a detailed description of the parameters.
Now we can start adding operators to the graph context. The following gives an example of building a simple 3-layer model.
import numpy as np
import smaug as sg

def generate_random_data(shape):
  r = np.random.RandomState(1234)
  return (r.rand(*shape) * 0.005).astype(np.float16)

with sg.Graph(name="my_model", backend="SMV") as graph:
  input_tensor = sg.Tensor(
      data_layout=sg.NHWC, tensor_data=generate_random_data((1, 28, 28, 1)))
  conv_weights = sg.Tensor(
      data_layout=sg.NHWC, tensor_data=generate_random_data((32, 3, 3, 1)))
  fc_weights = sg.Tensor(
      data_layout=sg.NC, tensor_data=generate_random_data((10, 6272)))
  # Shape of act: [1, 28, 28, 1].
  act = sg.input_data(input_tensor)
  # After the convolution, shape of act: [1, 28, 28, 32].
  act = sg.nn.convolution(
      act, conv_weights, stride=[1, 1], padding="same", activation="relu")
  # After the max pooling, shape of act: [1, 14, 14, 32].
  act = sg.nn.max_pool(act, pool_size=[2, 2], stride=[2, 2])
  # After the matrix multiply, shape of act: [1, 10].
  act = sg.nn.mat_mul(act, fc_weights)
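As a quick sanity check on the tensor shapes in this pipeline (in NHWC order, matching what the printed model summary reports), the per-layer output shapes can be computed with a few lines of plain Python and no SMAUG installation. The helper names here are purely illustrative:

```python
import math

def conv2d_same_out_shape(n, h, w, c_in, num_filters, stride):
    # With "same" padding, each spatial dim becomes ceil(dim / stride).
    return (n, math.ceil(h / stride), math.ceil(w / stride), num_filters)

def max_pool_out_shape(n, h, w, c, pool, stride):
    # Standard non-padded pooling arithmetic.
    return (n, (h - pool) // stride + 1, (w - pool) // stride + 1, c)

shape = conv2d_same_out_shape(1, 28, 28, 1, 32, 1)
print(shape)  # (1, 28, 28, 32)
shape = max_pool_out_shape(*shape, 2, 2)
print(shape)  # (1, 14, 14, 32)
print(shape[1] * shape[2] * shape[3])  # 6272, the mat_mul input width
```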
To create the first operator of the model, we first need to prepare the input tensor and the weight tensors it uses. Tensors are represented by the smaug.Tensor class. In the example, we create an input tensor input_tensor using this API. Here, we specify the data_layout as sg.NHWC, which denotes a 4D tensor shape with the channels-last layout. We also set the tensor_data parameter to a randomly generated NumPy array of shape [1, 28, 28, 1]. While we use random data here, the user can instead supply real weights extracted from a pretrained model. Likewise, we create two weight tensors that will be used by a convolution operator and a matrix multiply operator, respectively.
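For instance, real weights might have been exported from another framework as NumPy arrays on disk. A minimal sketch of substituting them for the random data (the file name is hypothetical, and the array must match the (32, 3, 3, 1) filter shape used above):

```python
import numpy as np

# Hypothetical: a filter array previously exported by another framework,
# here stood in for by zeros of the expected (32, 3, 3, 1) shape.
np.save("conv_weights.npy", np.zeros((32, 3, 3, 1), dtype=np.float32))

# Load and cast to the float16 precision used elsewhere in this example.
weights = np.load("conv_weights.npy").astype(np.float16)

# These could then be wrapped exactly like the random data:
#   conv_weights = sg.Tensor(data_layout=sg.NHWC, tensor_data=weights)
```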
A smaug.input_data() operator, which simply forwards an input tensor to its output, is required for any tensor that is not the output of another operator. Here, act is a reference to input_tensor. Then, act is fed to a convolution operator that also takes conv_weights as its filter input. As detailed in smaug.nn.convolution(), it computes a 3D convolution given the 4D input and filter tensors; here we use 1x1 strides, the same padding and a ReLU activation fused with the convolution operation. Its output then goes through a max pooling operator with a 2x2 filter size, which in turn feeds its output into the final matrix multiply operator. Note that since the output of the max pooling operator is a 4D tensor while smaug.nn.mat_mul() expects a 2D input tensor, SMAUG automatically inserts a layout transformation operator, smaug.tensor.reorder(), in between to make the data layout formats compatible. Thus, the 4D tensor of shape [1, 14, 14, 32] is flattened into a 2D tensor of shape [1, 6272] before running the matrix multiply. Similarly, SMAUG also performs the NHWC-to-NCHW layout transformation, or vice versa, as required by the backend's expected layout format.
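The flattening step can be pictured with a small NumPy analogue. This is an illustration only, showing one plausible element ordering (channels moved ahead of the spatial dims before collapsing); SMAUG's actual reorder operator may order elements differently:

```python
import numpy as np

# Rough NumPy analogue of the automatically inserted reorder: flatten an
# NHWC tensor to NC by transposing to NCHW, then collapsing everything
# except the batch dimension.
def flatten_nhwc_to_nc(t):
    n, h, w, c = t.shape
    return t.transpose(0, 3, 1, 2).reshape(n, c * h * w)

x = np.zeros((1, 14, 14, 32), dtype=np.float16)
print(flatten_nhwc_to_nc(x).shape)  # (1, 6272)
```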
After we finish adding operators to the model, we can take a look at a summary of it using the smaug.Graph.print_summary() API.
graph.print_summary()
This prints model-level information and operator-specific properties as below:
=================================================================
Summary of the network: my_model (SMV)
=================================================================
Host memory access policy: AllDma.
-----------------------------------------------------------------
Name: data (Data)
Parents:
Children:conv
Input tensors:
data/input0 Float16 [1, 28, 28, 1] NHWC alignment(8)
Output tensors:
data/output0 Float16 [1, 28, 28, 1] NHWC alignment(8)
-----------------------------------------------------------------
Name: data_1 (Data)
Parents:
Children:conv
Input tensors:
data_1/input0 Float16 [32, 3, 3, 1] NHWC alignment(8)
Output tensors:
data_1/output0 Float16 [32, 3, 3, 1] NHWC alignment(8)
-----------------------------------------------------------------
Name: conv (Convolution3d)
Parents:data data_1
Children:max_pool
Input tensors:
data/output0 Float16 [1, 28, 28, 1] NHWC alignment(8)
data_1/output0 Float16 [32, 3, 3, 1] NHWC alignment(8)
Output tensors:
conv/output0 Float16 [1, 28, 28, 32] NHWC alignment(8)
-----------------------------------------------------------------
Name: max_pool (MaxPooling)
Parents:conv
Children:reorder
Input tensors:
conv/output0 Float16 [1, 28, 28, 32] NHWC alignment(8)
Output tensors:
max_pool/output0 Float16 [1, 14, 14, 32] NHWC alignment(8)
-----------------------------------------------------------------
Name: reorder (Reorder)
Parents:max_pool
Children:mat_mul
Input tensors:
max_pool/output0 Float16 [1, 14, 14, 32] NHWC alignment(8)
Output tensors:
reorder/output0 Float16 [1, 6272] NC alignment(8)
-----------------------------------------------------------------
Name: data_2 (Data)
Parents:
Children:mat_mul
Input tensors:
data_2/input0 Float16 [10, 6272] NC alignment(8)
Output tensors:
data_2/output0 Float16 [10, 6272] NC alignment(8)
-----------------------------------------------------------------
Name: mat_mul (InnerProduct)
Parents:reorder data_2
Children:
Input tensors:
reorder/output0 Float16 [1, 6272] NC alignment(8)
data_2/output0 Float16 [10, 6272] NC alignment(8)
Output tensors:
mat_mul/output0 Float16 [1, 10] NC alignment(8)
-----------------------------------------------------------------
Finally, we can export the model files using the smaug.Graph.write_graph() API.
graph.write_graph()
This produces two files named my_model_topo.pbtxt and my_model_params.pb: the former stores all the model information except for the parameters, which are stored in the latter. This separation lets us quickly inspect the human-readable topology file while still compressing the oftentimes large parameters as much as possible.
We can now move on to the C++ side tutorials, which explain how to use these two files to run the model.