SMAUG
Simulating Machine Learning Applications on gem5-Aladdin
smaug Namespace Reference

The smaug namespace is the parent namespace of all C++ code in SMAUG. More...

Namespaces

 gem5
 Contains utility functions for interacting with gem5.
 
 ref
 The ref namespace contains all code specific to the Reference backend.
 
 smv
 The smv namespace contains all code specific to the Smv backend.
 

Classes

class  AvgPoolingOp
 Implements the arithmetic-average-pooling operator. More...
 
class  BatchNormOp
 Implements the batch normalization layer. More...
 
class  ConcatOp
 Concatenates N Tensors along a specified axis. More...
 
class  ConvolutionOp
 The base class for all 4D spatial convolution operators. More...
 
class  DataflowGraphWriter
 DataflowGraphWriter writes the current network as a dot-graph file to the given ostream. More...
 
class  DataOp
 Exposes a Tensor as its only output. More...
 
class  DebugStream
 A stream class to consume debug logs. More...
 
class  DepthwiseConvolutionOp
 Implements the depthwise convolution operator. More...
 
class  EltwiseAddOp
 Adds two Tensors elementwise. More...
 
class  EltwiseMulOp
 Multiplies two Tensors elementwise. More...
 
class  EltwiseOp
 The base class of all elementwise operators. More...
 
class  EluOp
 Implements the exponential linear unit function. More...
 
class  FlattenOp
 Flattens each batch of a Tensor. More...
 
struct  FromDataType
 Provides compile-time conversion from SMAUG DataType to C type. More...
 
struct  FromDataType< Bool >
 
struct  FromDataType< Float16 >
 
struct  FromDataType< Float32 >
 
struct  FromDataType< Float64 >
 
struct  FromDataType< Int32 >
 
struct  FromDataType< Int64 >
 
class  FusedActivationOp
 An Operator fused with an activation function. More...
 
class  GreaterEqualOp
 Implements an elementwise greater than or equal to operator. More...
 
class  GreaterOp
 Implements an elementwise greater than operator. More...
 
class  HardTanhOp
 Implements the hard tanh operator, which bounds the min and max value of the tanh operator. More...
 
class  InnerProductOp
 Implements the inner product operator. More...
 
class  LessEqualOp
 Implements an elementwise less-than-or-equal-to operator. More...
 
class  LessOp
 Implements an elementwise less-than operator. More...
 
class  MaxPoolingOp
 Implements the max-pooling operator. More...
 
class  MergeOp
 Forwards the first live input to its output. More...
 
class  Network
 Network encapsulates all of the information SMAUG will use during execution: the overall computation graph of the model, all the operators and tensors, various housekeeping structures, and simulation information. More...
 
class  Operator
 Operator is the base class for all graph operators supported by SMAUG. More...
 
class  PaddingOp
 Pad a given tensor in any number of dimensions with arbitrary size. More...
 
class  PoolingOp
 Implements a pooling operator. More...
 
class  ReferenceBackend
 ReferenceBackend provides reference implementations of all operators supported by SMAUG. More...
 
class  ReluOp
 Implements the rectified linear unit operator: max(slope * x, 0). More...
 
class  ReorderOp
 Implements a Tensor reordering operation to convert between different DataLayouts. More...
 
class  RepeatOp
 Replicates a Tensor's data among all dimensions. More...
 
class  ReshapeOp
 Changes the Tensor's shape while retaining the number of elements. More...
 
class  Scheduler
 Scheduler is responsible for running the Network. More...
 
class  SeluOp
 Implements the scaled exponential linear unit function. More...
 
class  SigmoidOp
 Implements the sigmoid operator, defined as 1/(1 + exp(-input)). More...
 
class  SmaugTest
 The Catch2 test fixture used by all C++ unit tests. More...
 
class  SmvAcceleratorPool
 Implements a pool of worker accelerators. More...
 
class  SmvAvgPoolingOp
 Average pooling operator on SMV. More...
 
class  SmvBackend
 SmvBackend implements a set of models of optimized DL kernels that were taped out on a machine learning SoC by the Harvard Architecture, Circuits, and Compilers group. More...
 
class  SmvBatchNormOp
 SMV backend implementation of batch normalization. More...
 
class  SmvConvolutionOp
 SMV backend implementation of convolution. More...
 
class  SmvEltwiseAddOp
 Elementwise addition on SMV. More...
 
class  SmvEltwiseMulOp
 Elementwise multiplication on SMV. More...
 
class  SmvEluOp
 Elementwise exponential linear unit on SMV. More...
 
class  SmvGreaterEqualOp
 Elementwise greater-than-or-equal-to operator on SMV. More...
 
class  SmvGreaterOp
 Elementwise greater-than operator on SMV. More...
 
class  SmvHardTanhOp
 Hard tanh operator on SMV. More...
 
class  SmvInnerProductOp
 Inner product operator on SMV. More...
 
class  SmvLessEqualOp
 Elementwise less-than-or-equal-to operator on SMV. More...
 
class  SmvLessOp
 Elementwise less-than operator on SMV. More...
 
class  SmvMaxPoolingOp
 Max-pooling operator on SMV. More...
 
class  SmvPoolingOp
 Base class for SMV pooling operators. More...
 
class  SmvReluOp
 Rectified linear-unit operator on SMV. More...
 
class  SmvSeluOp
 Elementwise scaled exponential linear unit on SMV. More...
 
class  SmvSigmoidOp
 Sigmoid operator on SMV. More...
 
class  SmvSoftmaxOp
 Softmax operator on SMV. More...
 
class  SmvTanhOp
 Tanh operator on SMV. More...
 
class  SoftmaxOp
 Implements the softmax operator. More...
 
class  SplitOp
 Implements the split operator, which divides a Tensor into N output Tensors along a specified dimension. More...
 
class  SwitchOp
 Conditionally forwards an input to one of two outputs. More...
 
class  TanhOp
 Implements the tanh operator. More...
 
class  Tensor
 Tensor represents a single multi-dimensional array of data. More...
 
class  TensorBase
 The base class of all Tensor objects. More...
 
class  TensorIndexIterator
 An iterator over a multidimensional tensor's indices, accounting for data alignment padding. More...
 
struct  TensorIndices
 Additional metadata for edges in the graph. More...
 
class  TensorRegionIndexIterator
 A tensor index iterator that stays within a specified rectangular region. More...
 
class  TensorShape
 TensorShape describes the shape of a Tensor. More...
 
class  ThreadPool
 A user-space cooperative thread pool implementation designed for gem5 in SE mode. More...
 
class  TiledTensor
 A multidimensional container of Tensors. More...
 
struct  ToDataType
 Provides compile-time conversion from C types to SMAUG DataTypes. More...
 
struct  ToDataType< bool >
 
struct  ToDataType< double >
 
struct  ToDataType< float >
 
struct  ToDataType< float16 >
 
struct  ToDataType< int32_t >
 
struct  ToDataType< int64_t >
 
struct  ToDataType< uint32_t >
 
struct  ToDataType< uint64_t >
 
class  UnaryOp
 Base class for all operators with one input. More...
 
class  Workspace
 Workspace is the container and owner of all Tensors and Operators in the Network. More...
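
The ToDataType and FromDataType structs listed above convert between C types and SMAUG DataTypes at compile time. A minimal sketch of how such a pair of traits could be written follows. The DataType enum below is a stand-in defined only for this illustration (a subset of enumerators inferred from the specialization names), and everything lives in a hypothetical sketch namespace; it is not SMAUG's own definition.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for SMAUG's DataType enum (subset, for illustration).
enum DataType { UnknownDataType, Int32, Int64, Float32, Float64, Bool };

// Primary template: C types without a specialization map to UnknownDataType.
template <typename T>
struct ToDataType {
    static constexpr DataType dataType = UnknownDataType;
};

// Primary template: DataTypes without a specialization expose no `type`.
template <DataType D>
struct FromDataType {};

// One specialization per supported C type.
template <> struct ToDataType<int32_t> { static constexpr DataType dataType = Int32; };
template <> struct ToDataType<int64_t> { static constexpr DataType dataType = Int64; };
template <> struct ToDataType<float>   { static constexpr DataType dataType = Float32; };
template <> struct ToDataType<double>  { static constexpr DataType dataType = Float64; };
template <> struct ToDataType<bool>    { static constexpr DataType dataType = Bool; };

// The reverse mapping, from enumerator to C type.
template <> struct FromDataType<Int32>   { using type = int32_t; };
template <> struct FromDataType<Int64>   { using type = int64_t; };
template <> struct FromDataType<Float32> { using type = float; };
template <> struct FromDataType<Float64> { using type = double; };
template <> struct FromDataType<Bool>    { using type = bool; };

}  // namespace sketch
```

Because both directions are resolved at compile time, a templated kernel can check that its element type matches a tensor's declared DataType without any runtime lookup table.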
 

Typedefs

using float16 = uint16_t
 
typedef void(* FillTensorDataFunc) (Tensor *tensor)
 Any function that accepts a Tensor, fills it with data, and returns nothing.
 

Enumerations

enum  BackendName { Reference = REFERENCE, Smv = SMVBACKEND, UnknownBackend }
 The list of all hardware backends in the system. More...
 

Functions

Network * buildNetwork (const std::string &modelTopoFile, const std::string &modelParamsFile, SamplingInfo &sampling, Workspace *workspace)
 buildNetwork reads the specified model topology and parameters protobufs and simulation sampling directives and returns a populated Network that can be run. More...
 
float16 fp16 (float fp32_data)
 This converts a float32 into a float16.
 
float fp32 (float16 fp16_data)
 This converts a float16 into a float32.
 
Tensor * convertFp16ToFp32Tensor (Tensor *fp16Tensor, Workspace *workspace)
 This creates a tensor with float32 data type and fills it with data converted from a source tensor with float16 data.
 
Tensor * convertFp32ToFp16Tensor (Tensor *fp32Tensor, Workspace *workspace)
 This creates a tensor with float16 data type and fills it with data converted from a source tensor with float32 data.
 
template<>
void printTensorElement< float16 > (std::ostream &os, const float16 *data, int index)
 
std::ostream & operator<< (std::ostream &os, const TensorShape &shape)
 
std::ostream & operator<< (std::ostream &os, const TensorIndexIterator &iter)
 
std::ostream & operator<< (std::ostream &os, const Tensor &tensor)
 
void copyTensorRegion (Tensor *dest, Tensor *src, std::vector< int > destOrigin, std::vector< int > srcOrigin, std::vector< int > regionSize)
 Copies a region of a source Tensor to a corresponding region in a destination Tensor. More...
 
void copyTensorData (Tensor *dest, Tensor *src, std::vector< int > destOffset, std::vector< int > srcOffset, int copySize)
 Similar to copyTensorRegion, but the region is a contiguous block of memory.
 
void copyRawTensorData (Tensor *dest, Tensor *src, int destOffset, int srcOffset, int copySize)
 Directly copies a linear region of memory from src to dest, without taking dimensions/padding into account. More...
 
TiledTensor generateTiledTensorPerBatchNC (Tensor *tensor, const TensorShape &tileShape, Operator *op, bool copyData=true)
 Tile the provided NC Tensor per batch. More...
 
TiledTensor generateTiledTensorWithStrideAndPadding (Tensor *tensor, const TensorShape &tileShape, Operator *op, int fieldRows, int fieldCols, int rowStride, int colStride, PaddingType paddingType, bool copyData=false)
 Generates a TiledTensor from a source Tensor with the specified tile shape. More...
 
TiledTensor generateTiledTensor (Tensor *tensor, const TensorShape &tileShape, Operator *op, bool copyData=false)
 Generates a TiledTensor from a source Tensor. More...
 
void flattenTiledTensor (TiledTensor &tiledTensor, Tensor *destTensor)
 Copies the data from each tile in a TiledTensor into a destination Tensor as a contiguous block of memory, as if only one dimension ever existed.
 
Tensor * concatTensors (std::vector< Tensor * > inputTensors, int concatDim, Workspace *workspace)
 Concatenates Tensors on the specified dimension into one single tensor.
 
template<typename DType >
void printTensorElement (std::ostream &os, const DType *data, int index)
 
template<typename DType >
void writeTensorToOstream (std::ostream &os, const Tensor &tensor)
 Pretty-print a Tensor's name, shape, and contents to the provided ostream.
 
std::string getTraceName (int accelIdx)
 Return the name of the dynamic trace for this accelerator. More...
 
void mapArrayToAccel (unsigned reqCode, const char *arrayName, void *baseAddr, size_t size)
 Maps an array of data to the accelerator. More...
 
void setArrayMemTypeIfSimulating (unsigned reqCode, const char *arrayName, MemoryType memType)
 Sets what memory access mechanism the accelerator will use when accessing this array. More...
 
template<typename Kernel , typename... Args>
void invokeKernel (int accelIdx, unsigned reqCode, const Kernel &kernel, Args &&... args)
 The generic blocking interface for all accelerator kernel functions. More...
 
template<typename Kernel , typename... Args>
void invokeKernel (unsigned reqCode, const Kernel &kernel, Args &&... args)
 A generic interface for all accelerator kernel functions. More...
 
template<typename Kernel , typename... Args>
std::unique_ptr< volatile int > invokeKernelNoBlock (int accelIdx, unsigned reqCode, const Kernel &kernel, Args &&... args)
 A generic non-blocking interface to accelerated kernel functions. More...
 
void convertNchwToNhwc (Tensor *input, Tensor *output)
 
void convertNhwcToNchw (Tensor *input, Tensor *output)
 
void flatten (Tensor *input, Tensor *output)
 
void transpose3D (Tensor *input, Tensor *output)
 
void transpose2D (Tensor *input, Tensor *output)
 
template<typename DType >
void convertNchwToNhwcImpl (Tensor *input, Tensor *output)
 
template<typename DType >
void convertNhwcToNchwImpl (Tensor *input, Tensor *output)
 
template<typename DType >
void flattenImpl (Tensor *input, Tensor *output)
 
template<typename DType >
void transpose3DImpl (Tensor *input, Tensor *output)
 
template<typename DType >
void transpose2DImpl (Tensor *input, Tensor *output)
 
std::normal_distribution< float > normalDist (kMean, kVar)
 
void fillTensorWithRandomData (Tensor *tensor)
 This fills the Tensor with normally distributed random values.
 
void fillTensorWithFixedData (Tensor *tensor)
 This fills the Tensor with a fixed data pattern. More...
 
void verifyTensorWithFixedData (Tensor *tensor, int valueOffset)
 Verify that the provided Tensor's data matches the fixed pattern produced by fillTensorWithFixedData, with the provided offset to each value.
 
void initDebugStream (int debugLevel)
 Initializes the global debug stream for the given debug level.
 
const DebugStream & dout (int debugLevel)
 Returns a DebugStream instance for the given debug level.
 
void * malloc_aligned (size_t size, bool zeroOut=false)
 Return heap-allocated cacheline-aligned memory.
 
std::string dataLayoutToStr (DataLayout layout)
 Get the string version of DataLayout.
 
int calc_padding (int value, unsigned alignment)
 Return the difference between value and the next multiple of alignment.
 
template<typename T >
int product (std::vector< T > array)
 
template<typename T >
std::vector< T > sum (std::vector< T > array0, std::vector< T > array1)
 Returns the elementwise-sum of the two arrays, which must be of the same size.
 
template<typename T >
void variadicToVector (std::vector< T > &vector, T elem)
 
template<typename T , typename... Args>
void variadicToVector (std::vector< T > &vector, T e, Args... elems)
 Populates a std::vector with an arbitrary number of elements.
 
template<typename T , typename... Args>
std::array< T, sizeof...(Args)+1 > variadicToArray (T i, Args... elems)
 Returns a std::array populated with the given elements. More...
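
As an illustration of the alignment helper calc_padding listed above, here is a minimal sketch of the arithmetic its description implies. It assumes that a value which is already a multiple of the alignment (or an alignment of zero) needs no padding; the name calcPadding and the sketch namespace are hypothetical and exist only for this example.

```cpp
#include <cassert>

namespace sketch {

// Returns how much must be added to `value` to reach the next multiple of
// `alignment`. Assumption: an already-aligned value, or an alignment of
// zero, yields 0.
inline int calcPadding(int value, unsigned alignment) {
    assert(value >= 0);
    const int align = static_cast<int>(alignment);
    if (align == 0 || value % align == 0)
        return 0;
    return align - (value % align);
}

}  // namespace sketch
```

For example, under these assumptions a row of 10 elements aligned to 8 would need 6 elements of padding, while a row of 16 would need none.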
 

Variables

bool runningInSimulation
 This is true if the user chooses to run the network in gem5 simulation.
 
bool fastForwardMode = true
 True if we are simulating in fast-forward mode.
 
int numAcceleratorsAvailable
 The actual number of accelerator complexes currently in use.
 
ThreadPool * threadPool = nullptr
 The user-space thread pool used by SMAUG to run multithreaded tasks.
 
bool useSystolicArrayWhenAvailable
 If true, uses the systolic array for applicable operators when backend support exists.
 
constexpr const int maxNumAccelerators = 8
 The maximum number of accelerators an operator's work can be split across. More...
 
constexpr const char * kLayerFormat = "%-40s %-25s %=15d\n"
 
constexpr float kMargin = 0.001
 Sets the absolute value by which a result can differ from Approx's expected value.
 
constexpr float kEpsilon = 0.01
 Sets the percentage by which a result can differ from Approx's expected value.
 
constexpr float kMean = 0.0
 
constexpr float kVar = 0.1
 
std::default_random_engine generator
 
constexpr float kFraction = 0.1
 

Detailed Description

The smaug namespace is the parent namespace of all C++ code in SMAUG.

Enumeration Type Documentation

◆ BackendName

The list of all hardware backends in the system.

Enumerator
Reference: Reference backend.
Smv: SMV backend.
UnknownBackend: Invalid backend.

Definition at line 22 of file backend.h.

Function Documentation

◆ buildNetwork()

Network * smaug::buildNetwork ( const std::string &  modelTopoFile,
const std::string &  modelParamsFile,
SamplingInfo &  sampling,
Workspace *  workspace 
)

buildNetwork reads the specified model topology and parameters protobufs and simulation sampling directives and returns a populated Network that can be run.

Parameters
modelTopoFile: The path to the model topology protobuf.
modelParamsFile: The path to the model parameters protobuf, which contains values for all tensors in the network (weights and inputs).
sampling: Level of simulation sampling to apply to applicable kernels.
workspace: Pointer to the global Workspace holding all tensors and operators.

Definition at line 370 of file network_builder.cpp.

◆ copyRawTensorData()

void smaug::copyRawTensorData ( Tensor *  dest,
Tensor *  src,
int  destOffset,
int  srcOffset,
int  copySize 
)

Directly copies a linear region of memory from src to dest, without taking dimensions/padding into account.

Parameters
dest: Destination Tensor.
src: Source Tensor.
destOffset: The linear offset into the destination where data will be copied to.
srcOffset: The linear offset into the source where data will be copied from.
copySize: The size of the region in elements.

Definition at line 138 of file tensor_utils.cpp.

◆ copyTensorRegion()

void smaug::copyTensorRegion ( Tensor *  dest,
Tensor *  src,
std::vector< int >  destOrigin,
std::vector< int >  srcOrigin,
std::vector< int >  regionSize 
)

Copies a region of a source Tensor to a corresponding region in a destination Tensor.

The two Tensors are expected to share the same layout. Region origins and sizes are all specified in elements (not bytes) and in accordance with the data layout.

For example, given tensorA (4x4) and tensorB (3x3), to copy the upper-left 2x2 block of tensorA to the lower-right 2x2 block of tensorB: copyTensorRegion(tensorB, tensorA, {1,1}, {0,0}, {2,2})

Parameters
dest: Destination Tensor.
src: Source Tensor.
destOrigin: The start of the copied region in the destination.
srcOrigin: The start of the copied region in the source.
regionSize: The size of the region.

Definition at line 65 of file tensor_utils.cpp.
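
To make the origin and size arguments of copyTensorRegion concrete, here is a minimal two-dimensional sketch of a region copy. It does not use SMAUG's Tensor class: Matrix is a hypothetical row-major container without alignment padding, and copyRegion2D is an illustrative name, so this shows only the indexing idea, not the library's implementation.

```cpp
#include <array>
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a 2D tensor: row-major storage, no padding.
struct Matrix {
    int rows;
    int cols;
    std::vector<float> data;
    Matrix(int r, int c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(int r, int c) { return data[r * cols + c]; }
};

// Copies a regionSize[0] x regionSize[1] block that starts at srcOrigin in
// `src` to the block that starts at destOrigin in `dest`. Origins and sizes
// are given in elements, as {row, column}.
inline void copyRegion2D(Matrix& dest, Matrix& src,
                         std::array<int, 2> destOrigin,
                         std::array<int, 2> srcOrigin,
                         std::array<int, 2> regionSize) {
    // The region must fit inside both matrices.
    assert(destOrigin[0] + regionSize[0] <= dest.rows);
    assert(destOrigin[1] + regionSize[1] <= dest.cols);
    assert(srcOrigin[0] + regionSize[0] <= src.rows);
    assert(srcOrigin[1] + regionSize[1] <= src.cols);
    for (int r = 0; r < regionSize[0]; r++) {
        for (int c = 0; c < regionSize[1]; c++) {
            dest.at(destOrigin[0] + r, destOrigin[1] + c) =
                    src.at(srcOrigin[0] + r, srcOrigin[1] + c);
        }
    }
}

}  // namespace sketch
```

With a 4x4 source and a 3x3 destination, calling copyRegion2D(b, a, {1,1}, {0,0}, {2,2}) writes rows 1 to 2 and columns 1 to 2 of the destination and leaves its first row and first column untouched.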

◆ fillTensorWithFixedData()

void smaug::fillTensorWithFixedData ( Tensor *  tensor)

This fills the Tensor with a fixed data pattern.

The Tensor should be in NHWC data layout. Each channel is initialized with a different value, but every batch/row/column shares this same pattern.

Definition at line 22 of file smv_test_common.cpp.

◆ generateTiledTensor()

TiledTensor smaug::generateTiledTensor ( Tensor *  tensor,
const TensorShape &  tileShape,
Operator *  op,
bool  copyData = false 
)

Generates a TiledTensor from a source Tensor.

This does not support generating tiles with overlap, striding, or padding options.

Parameters
tensor: The Tensor to tile.
tileShape: The maximum size of each tile.
op: The Operator that will be consuming this TiledTensor.
copyData: Whether to copy data from the source tensor into the tiles.

Definition at line 335 of file tensor_utils.cpp.
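
Since tileShape is described as the maximum size of each tile, a non-overlapping tiling implies a ceiling division per dimension, with a possibly smaller tile at the end. The sketch below shows that arithmetic on plain integer vectors. The function names and the sketch namespace are hypothetical, and the code makes no claim about how generateTiledTensor itself computes its tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Number of tiles needed along each dimension: ceil(shape[i] / maxTile[i]).
inline std::vector<int> tilesPerDim(const std::vector<int>& shape,
                                    const std::vector<int>& maxTileShape) {
    assert(shape.size() == maxTileShape.size());
    std::vector<int> counts;
    for (std::size_t i = 0; i < shape.size(); i++) {
        assert(maxTileShape[i] > 0);
        counts.push_back((shape[i] + maxTileShape[i] - 1) / maxTileShape[i]);
    }
    return counts;
}

// Extent of tile number `idx` along one dimension: full-size tiles first,
// then a smaller remainder tile if the dimension does not divide evenly.
inline int tileExtent(int dim, int maxTile, int idx) {
    assert(idx >= 0 && idx * maxTile < dim);
    return std::min(maxTile, dim - idx * maxTile);
}

}  // namespace sketch
```

For instance, a dimension of 10 elements with a maximum tile size of 4 would be covered by three tiles of extents 4, 4, and 2.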

◆ generateTiledTensorPerBatchNC()

TiledTensor smaug::generateTiledTensorPerBatchNC ( Tensor *  tensor,
const TensorShape &  tileShape,
Operator *  op,
bool  copyData = true 
)

Tile the provided NC Tensor per batch.

The only requirement is to tile the Tensor in contiguous blocks of tileShape, without concern for strides, overlap, or padding. Thus, this is usually useful only for unary and elementwise operators.

Parameters
tensor: The Tensor to tile.
tileShape: The maximum size of each tile.
op: The Operator that will be consuming this TiledTensor.
copyData: Whether to copy data from the source tensor into the tiles.

Definition at line 199 of file tensor_utils.cpp.

◆ generateTiledTensorWithStrideAndPadding()

TiledTensor smaug::generateTiledTensorWithStrideAndPadding ( Tensor *  tensor,
const TensorShape &  tileShape,
Operator *  op,
int  fieldRows,
int  fieldCols,
int  rowStride,
int  colStride,
PaddingType  paddingType,
bool  copyData = false 
)

Generates a TiledTensor from a source Tensor with the specified tile shape.

Depending on the operator that needs this TiledTensor, tiles may need to overlap each other (e.g. for a convolutional filter window).

Parameters
tensor: The Tensor to tile.
tileShape: The maximum size of each tile.
op: The Operator that will be consuming this TiledTensor.
fieldRows: Number of rows of a filter applied, if any.
fieldCols: Number of columns of a filter applied, if any.
rowStride: The row stride of a filter applied, if any.
colStride: The column stride of a filter applied, if any.
paddingType: The type of additional zero-padding applied on the Tensor by the Operator, if any.
copyData: Whether to copy data from the source tensor into the tiles.

Definition at line 233 of file tensor_utils.cpp.
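
To show why tiles may need to overlap for a filter window, the following sketch splits only the row dimension into tiles so that every filter window falls entirely inside one tile. It assumes no zero-padding and a maximum tile height at least as large as the filter. RowTile and tileRowsWithOverlap are hypothetical names for this illustration, not part of SMAUG.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical description of one tile along the row dimension.
struct RowTile {
    int start;  // First input row covered by the tile.
    int rows;   // Number of input rows in the tile.
};

// Splits `inputRows` rows into tiles of at most `maxTileRows` rows such that
// each filter window (fieldRows tall, advancing by rowStride) lies entirely
// within one tile. Assumes no zero-padding is applied.
inline std::vector<RowTile> tileRowsWithOverlap(int inputRows, int maxTileRows,
                                                int fieldRows, int rowStride) {
    assert(fieldRows > 0 && rowStride > 0 && maxTileRows >= fieldRows);
    std::vector<RowTile> tiles;
    int start = 0;
    while (inputRows - start >= fieldRows) {
        int available = std::min(maxTileRows, inputRows - start);
        // Filter windows that fit completely in the available rows.
        int windows = (available - fieldRows) / rowStride + 1;
        // Keep only the rows that those windows actually read.
        int rows = (windows - 1) * rowStride + fieldRows;
        tiles.push_back({ start, rows });
        // The next tile begins where the next window would begin.
        start += windows * rowStride;
    }
    return tiles;
}

}  // namespace sketch
```

With 8 input rows, tiles of at most 4 rows, a 3-row filter, and a stride of 1, this yields tiles starting at rows 0, 2, and 4. Adjacent tiles share fieldRows - rowStride = 2 rows, which is the overlap the description refers to.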

◆ getTraceName()

std::string smaug::getTraceName ( int  accelIdx)

Return the name of the dynamic trace for this accelerator.

Parameters
accelIdx: The ID of this accelerator.

Definition at line 6 of file common.cpp.

◆ invokeKernel() [1/2]

template<typename Kernel , typename... Args>
void smaug::invokeKernel ( int  accelIdx,
unsigned  reqCode,
const Kernel &  kernel,
Args &&...  args 
)

The generic blocking interface for all accelerator kernel functions.

All accelerated kernels should be called via this interface, and different things will happen based on how the program is being run:

  • As a native binary: the kernel function is directly called.
  • As an LLVM-Tracer instrumented binary: sets the file name of the dynamic trace being generated, then calls the kernel function.
  • In gem5-Aladdin: invokes the Aladdin model of the specified accelerator.

This is a blocking call: in gem5-Aladdin mode, the thread will wait until the accelerator finishes. For a non-blocking call, use invokeKernelNoBlock.

Parameters
accelIdx: Sets the suffix of the dynamic trace to XXX_acc[accelIdx]. Used if you want to generate multiple independent traces to simulate multiple accelerators.
reqCode: The ID of the accelerator to invoke.
kernel: The kernel function to invoke in native/LLVM-Tracer mode.
args: The arguments to the kernel function.

Definition at line 72 of file common.h.
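
The three execution modes described for invokeKernel can be pictured as a small dispatcher that forwards its arguments to the kernel. The sketch below is illustrative only: the Mode enum, the DispatchLog struct, the trace-name string, and the name invokeKernelSketch are all hypothetical, and the simulation branch merely records that it was taken instead of talking to gem5-Aladdin.

```cpp
#include <string>
#include <utility>

namespace sketch {

// Hypothetical execution modes mirroring the three cases in the description.
enum class Mode { Native, Trace, Simulation };

// Hypothetical record of what the dispatcher did, for illustration only.
struct DispatchLog {
    std::string traceName;
    bool simulated = false;
};

template <typename Kernel, typename... Args>
void invokeKernelSketch(Mode mode, DispatchLog& log, int accelIdx,
                        unsigned reqCode, const Kernel& kernel,
                        Args&&... args) {
    (void)reqCode;  // Only a real simulator would consume the request code.
    if (mode == Mode::Simulation) {
        // A real implementation would invoke the accelerator model here and
        // wait for it to finish; the kernel function itself is not called.
        log.simulated = true;
        return;
    }
    if (mode == Mode::Trace) {
        // Hypothetical naming scheme: one trace per accelerator index.
        log.traceName = "dynamic_trace_acc" + std::to_string(accelIdx);
    }
    // Native and trace modes both call the kernel directly, forwarding the
    // arguments unchanged.
    kernel(std::forward<Args>(args)...);
}

}  // namespace sketch
```

Taking the kernel as a template parameter and perfectly forwarding the arguments lets one wrapper serve kernels with any signature, which is presumably why the documented interface is a variadic template.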

◆ invokeKernel() [2/2]

template<typename Kernel , typename... Args>
void smaug::invokeKernel ( unsigned  reqCode,
const Kernel &  kernel,
Args &&...  args 
)

A generic interface for all accelerator kernel functions.

This is a convenience function that sets accelIdx = 0, so only one dynamic trace file will be generated.

Definition at line 93 of file common.h.

◆ invokeKernelNoBlock()

template<typename Kernel , typename... Args>
std::unique_ptr<volatile int> smaug::invokeKernelNoBlock ( int  accelIdx,
unsigned  reqCode,
const Kernel &  kernel,
Args &&...  args 
)

A generic non-blocking interface to accelerated kernel functions.

The only difference between this and invokeKernel is that in gem5-Aladdin mode, the thread will start Aladdin and then return immediately. The calling thread is responsible for checking the status of the accelerator and taking action appropriately.

Definition at line 106 of file common.h.

◆ mapArrayToAccel()

void smaug::mapArrayToAccel ( unsigned  reqCode,
const char *  arrayName,
void *  baseAddr,
size_t  size 
)

Maps an array of data to the accelerator.

This enables the accelerator to access host memory via DMA or caching memory accesses.

Parameters
reqCode: The ID of the accelerator.
arrayName: The name of the array as it appears in the top-level accelerator function signature.
baseAddr: The base address of the array (e.g. &array[0]).
size: The size of the array.

Definition at line 12 of file common.cpp.

◆ setArrayMemTypeIfSimulating()

void smaug::setArrayMemTypeIfSimulating ( unsigned  reqCode,
const char *  arrayName,
MemoryType  memType 
)

Sets what memory access mechanism the accelerator will use when accessing this array.

This lets the user decide at runtime whether to access a host array over DMA, hardware caching, or ACP.

Parameters
reqCode: The ID of the accelerator.
arrayName: The name of the array as it appears in the accelerator's function signature.
memType: The memory access mechanism.

Definition at line 21 of file common.cpp.

◆ variadicToArray()

template<typename T , typename... Args>
std::array<T, sizeof...(Args) + 1> smaug::variadicToArray ( T  i,
Args...  elems 
)

Returns a std::array populated with the given elements.

Must contain at least one element.

Parameters
i: The first element.
elems: All the remaining elements.

Definition at line 57 of file utils.h.
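
The return type std::array<T, sizeof...(Args) + 1> explains the "at least one element" rule: the first argument fixes the element type T, and the parameter pack supplies the rest. A minimal sketch with the same signature shape follows. It sits in a hypothetical sketch namespace, and the static_cast on the remaining elements is an assumption made here to avoid narrowing errors, not something the documentation states.

```cpp
#include <array>

namespace sketch {

// Builds a std::array from one mandatory element plus any number of extra
// elements. The array size is known at compile time: sizeof...(Args) + 1.
template <typename T, typename... Args>
std::array<T, sizeof...(Args) + 1> variadicToArray(T first, Args... rest) {
    // Assumption: remaining elements are converted to T explicitly.
    return { { first, static_cast<T>(rest)... } };
}

}  // namespace sketch
```

A call such as sketch::variadicToArray(1, 2, 3) produces a std::array<int, 3>, so the size can be checked with a static_assert instead of at runtime.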

Variable Documentation

◆ maxNumAccelerators

constexpr const int smaug::maxNumAccelerators = 8

The maximum number of accelerators an operator's work can be split across.

This limit exists to keep Aladdin simulation time and resources in check.

Definition at line 25 of file globals.h.