The smaug namespace is the parent namespace of all C++ code in SMAUG. More...

Namespaces
	gem5
	Contains utility functions for interacting with gem5.

	ref
	The ref namespace contains all code specific to the Reference backend.

	smv
	The smv namespace contains all code specific to the Smv backend.

Classes
class	AvgPoolingOp
	Implements the arithmetic-average-pooling operator. More...

class	BatchNormOp
	Implements the batch normalization layer. More...

class	ConcatOp
	Concatenates N Tensors along a specified axis. More...

class	ConvolutionOp
	The base class for all 4D spatial convolution operators. More...

class	DataflowGraphWriter
	DataflowGraphWriter writes the current network as a dot-graph file to the given ostream. More...

class	DataOp
	Exposes a Tensor as its only output. More...

class	DebugStream
	A stream class to consume debug logs. More...

class	DepthwiseConvolutionOp
	Implements the depthwise convolution operator. More...

class	EltwiseAddOp
	Adds two Tensors elementwise. More...

class	EltwiseMulOp
	Multiplies two Tensors elementwise. More...

class	EltwiseOp
	The base class of all elementwise operators. More...

class	EluOp
	Implements the exponential linear unit function. More...

class	FlattenOp
	Flattens each batch of a Tensor. More...

struct	FromDataType
	Provides compile-time conversion from SMAUG DataType to C type. More...

struct	FromDataType< Bool >

struct	FromDataType< Float16 >

struct	FromDataType< Float32 >

struct	FromDataType< Float64 >

struct	FromDataType< Int32 >

struct	FromDataType< Int64 >

class	FusedActivationOp
	An Operator fused with an activation function. More...

class	GreaterEqualOp
	Implements an elementwise greater-than-or-equal-to operator. More...

class	GreaterOp
	Implements an elementwise greater-than operator. More...

class	HardTanhOp
	Implements the hard tanh operator, which bounds the min and max value of the tanh operator. More...

class	InnerProductOp
	Implements the inner product operator. More...

class	LessEqualOp
	Implements an elementwise less-than-or-equal-to operator. More...

class	LessOp
	Implements an elementwise less-than operator. More...

class	MaxPoolingOp
	Implements the max-pooling operator. More...

class	MergeOp
	Forwards the first live input to its output. More...

class	Network
	Network encapsulates all of the information SMAUG will use during execution: the overall computation graph of the model, all the operators and tensors, various housekeeping structures, and simulation information. More...

class	Operator
	Operator is the base class for all graph operators supported by SMAUG. More...

class	PaddingOp
	Pads a given tensor in any number of dimensions with padding of arbitrary size. More...

class	PoolingOp
	Implements a pooling operator. More...

class	ReferenceBackend
	ReferenceBackend provides reference implementations of all operators supported by SMAUG. More...

class	ReluOp
	Implements the rectified linear unit operator: max(slope * x, 0). More...

class	ReorderOp
	Implements a Tensor reordering operation to convert between different DataLayouts. More...

class	RepeatOp
	Replicates a Tensor's data among all dimensions. More...

class	ReshapeOp
	Changes the Tensor's shape while retaining the number of elements. More...

class	Scheduler
	Scheduler is responsible for running the Network. More...

class	SeluOp
	Implements the scaled exponential linear unit function. More...

class	SigmoidOp
	Implements the sigmoid operator, defined as 1/(1 + exp(-input)). More...

class	SmaugTest
	The Catch2 test fixture used by all C++ unit tests. More...

class	SmvAcceleratorPool
	Implements a pool of worker accelerators. More...

class	SmvAvgPoolingOp
	Average pooling operator on SMV. More...

class	SmvBackend
	SmvBackend implements a set of models of optimized DL kernels that were taped out on a machine learning SoC by the Harvard Architecture, Circuits, and Compilers group. More...

class	SmvBatchNormOp
	SMV backend implementation of batch normalization. More...

class	SmvConvolutionOp
	SMV backend implementation of convolution. More...

class	SmvEltwiseAddOp
	Elementwise addition on SMV. More...

class	SmvEltwiseMulOp
	Elementwise multiplication on SMV. More...

class	SmvEluOp
	Elementwise exponential linear unit on SMV. More...

class	SmvGreaterEqualOp
	Elementwise greater-than-or-equal-to operator on SMV. More...

class	SmvGreaterOp
	Elementwise greater-than operator on SMV. More...

class	SmvHardTanhOp
	Hard tanh operator on SMV. More...

class	SmvInnerProductOp
	Inner product operator on SMV. More...

class	SmvLessEqualOp
	Elementwise less-than-or-equal-to operator on SMV. More...

class	SmvLessOp
	Elementwise less-than operator on SMV. More...

class	SmvMaxPoolingOp
	Max-pooling operator on SMV. More...

class	SmvPoolingOp
	Base class for SMV pooling operators. More...

class	SmvReluOp
	Rectified linear-unit operator on SMV. More...

class	SmvSeluOp
	Elementwise scaled exponential linear unit on SMV. More...

class	SmvSigmoidOp
	Sigmoid operator on SMV. More...

class	SmvSoftmaxOp
	Softmax operator on SMV. More...

class	SmvTanhOp
	Tanh operator on SMV. More...

class	SoftmaxOp
	Implements the softmax operator. More...

class	SplitOp
	Implements the split operator, which divides a Tensor into N output Tensors along a specified dimension. More...

class	SwitchOp
	Conditionally forwards an input to one of two outputs. More...

class	TanhOp
	Implements the tanh operator. More...

class	Tensor
	Tensor represents a single multi-dimensional array of data. More...

class	TensorBase
	The base class of all Tensor objects. More...

class	TensorIndexIterator
	An iterator over a multidimensional tensor's indices, accounting for data alignment padding. More...

struct	TensorIndices
	Additional metadata for edges in the graph. More...

class	TensorRegionIndexIterator
	A tensor index iterator that stays within a specified rectangular region. More...

class	TensorShape
	TensorShape describes the shape of a Tensor. More...

class	ThreadPool
	A user-space cooperative thread pool implementation designed for gem5 in SE mode. More...

class	TiledTensor
	A multidimensional container of Tensors. More...

struct	ToDataType
	Provides compile-time conversion from C types to SMAUG DataTypes. More...

struct	ToDataType< bool >

struct	ToDataType< double >

struct	ToDataType< float >

struct	ToDataType< float16 >

struct	ToDataType< int32_t >

struct	ToDataType< int64_t >

struct	ToDataType< uint32_t >

struct	ToDataType< uint64_t >

class	UnaryOp
	Base class for all operators with one input. More...

class	Workspace
	Workspace is the container and owner of all Tensors and Operators in the Network. More...
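
The FromDataType and ToDataType structs listed above convert between a data type enum and a C type at compile time. The block below is a minimal sketch of that pattern using template specialization. The sketch namespace, the reduced enum, and the member names type and dataType are assumptions made for illustration and may not match SMAUG's actual definitions.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical subset of a data type enum, for illustration only.
enum DataType { Int32, Int64, Float32, Float64, Bool };

// Enum value -> C type. The primary template is declared but not defined,
// so using an unsupported enum value fails at compile time.
template <DataType DT> struct FromDataType;
template <> struct FromDataType<Int32> { using type = std::int32_t; };
template <> struct FromDataType<Int64> { using type = std::int64_t; };
template <> struct FromDataType<Float32> { using type = float; };
template <> struct FromDataType<Float64> { using type = double; };
template <> struct FromDataType<Bool> { using type = bool; };

// C type -> enum value, again one specialization per supported type.
template <typename T> struct ToDataType;
template <> struct ToDataType<std::int32_t> {
    static constexpr DataType dataType = Int32;
};
template <> struct ToDataType<std::int64_t> {
    static constexpr DataType dataType = Int64;
};
template <> struct ToDataType<float> {
    static constexpr DataType dataType = Float32;
};
template <> struct ToDataType<double> {
    static constexpr DataType dataType = Float64;
};
template <> struct ToDataType<bool> {
    static constexpr DataType dataType = Bool;
};

}  // namespace sketch
```

With this layout, sketch::FromDataType<sketch::Float32>::type names float, and sketch::ToDataType<double>::dataType evaluates to sketch::Float64, both without any runtime cost.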

Typedefs
using	float16 = uint16_t

typedef void(*	FillTensorDataFunc) (Tensor *tensor)
	Any function that accepts a Tensor, fills it with data, and returns nothing.

Enumerations
enum	BackendName { Reference = REFERENCE, Smv = SMVBACKEND, UnknownBackend }
	The list of all hardware backends in the system. More...

Functions
Network *	buildNetwork (const std::string &modelTopoFile, const std::string &modelParamsFile, SamplingInfo &sampling, Workspace *workspace)
	buildNetwork reads the specified model topology and parameters protobufs and simulation sampling directives and returns a populated Network that can be run. More...

float16	fp16 (float fp32_data)
	This converts a float32 into a float16.

float	fp32 (float16 fp16_data)
	This converts a float16 into a float32.

Tensor *	convertFp16ToFp32Tensor (Tensor *fp16Tensor, Workspace *workspace)
	This creates a tensor with float32 data type and fills it with data converted from a source tensor with float16 data.

Tensor *	convertFp32ToFp16Tensor (Tensor *fp32Tensor, Workspace *workspace)
	This creates a tensor with float16 data type and fills it with data converted from a source tensor with float32 data.

template<>
void	printTensorElement< float16 > (std::ostream &os, const float16 *data, int index)

std::ostream &	operator<< (std::ostream &os, const TensorShape &shape)

std::ostream &	operator<< (std::ostream &os, const TensorIndexIterator &iter)

std::ostream &	operator<< (std::ostream &os, const Tensor &tensor)

void	copyTensorRegion (Tensor *dest, Tensor *src, std::vector< int > destOrigin, std::vector< int > srcOrigin, std::vector< int > regionSize)
	Copies a region of a source Tensor to a corresponding region in a destination Tensor. More...

void	copyTensorData (Tensor *dest, Tensor *src, std::vector< int > destOffset, std::vector< int > srcOffset, int copySize)
	Similar to copyTensorRegion, but the region is a contiguous block of memory.

void	copyRawTensorData (Tensor *dest, Tensor *src, int destOffset, int srcOffset, int copySize)
	Directly copies a linear region of memory from src to dest, without taking dimensions/padding into account. More...

TiledTensor	generateTiledTensorPerBatchNC (Tensor *tensor, const TensorShape &tileShape, Operator *op, bool copyData=true)
	Tile the provided NC Tensor per batch. More...

TiledTensor	generateTiledTensorWithStrideAndPadding (Tensor *tensor, const TensorShape &tileShape, Operator *op, int fieldRows, int fieldCols, int rowStride, int colStride, PaddingType paddingType, bool copyData=false)
	Generates a TiledTensor from a source Tensor with the specified tile shape. More...

TiledTensor	generateTiledTensor (Tensor *tensor, const TensorShape &tileShape, Operator *op, bool copyData=false)
	Generates a TiledTensor from a source Tensor. More...

void	flattenTiledTensor (TiledTensor &tiledTensor, Tensor *destTensor)
	Copies the data from each tile in a TiledTensor into a destination Tensor as a contiguous block of memory, as if only one dimension ever existed.

Tensor *	concatTensors (std::vector< Tensor * > inputTensors, int concatDim, Workspace *workspace)
	Concatenates Tensors on the specified dimension into one single tensor.

template<typename DType >
void	printTensorElement (std::ostream &os, const DType *data, int index)

template<typename DType >
void	writeTensorToOstream (std::ostream &os, const Tensor &tensor)
	Pretty-print a Tensor's name, shape, and contents to the provided ostream.

std::string	getTraceName (int accelIdx)
	Return the name of the dynamic trace for this accelerator. More...

void	mapArrayToAccel (unsigned reqCode, const char *arrayName, void *baseAddr, size_t size)
	Maps an array of data to the accelerator. More...

void	setArrayMemTypeIfSimulating (unsigned reqCode, const char *arrayName, MemoryType memType)
	Sets what memory access mechanism the accelerator will use when accessing this array. More...

template<typename Kernel , typename... Args>
void	invokeKernel (int accelIdx, unsigned reqCode, const Kernel &kernel, Args &&... args)
	The generic blocking interface for all accelerator kernel functions. More...

template<typename Kernel , typename... Args>
void	invokeKernel (unsigned reqCode, const Kernel &kernel, Args &&... args)
	A generic interface for all accelerator kernel functions. More...

template<typename Kernel , typename... Args>
std::unique_ptr< volatile int >	invokeKernelNoBlock (int accelIdx, unsigned reqCode, const Kernel &kernel, Args &&... args)
	A generic non-blocking interface to accelerated kernel functions. More...

void	convertNchwToNhwc (Tensor *input, Tensor *output)

void	convertNhwcToNchw (Tensor *input, Tensor *output)

void	flatten (Tensor *input, Tensor *output)

void	transpose3D (Tensor *input, Tensor *output)

void	transpose2D (Tensor *input, Tensor *output)

template<typename DType >
void	convertNchwToNhwcImpl (Tensor *input, Tensor *output)

template<typename DType >
void	convertNhwcToNchwImpl (Tensor *input, Tensor *output)

template<typename DType >
void	flattenImpl (Tensor *input, Tensor *output)

template<typename DType >
void	transpose3DImpl (Tensor *input, Tensor *output)

template<typename DType >
void	transpose2DImpl (Tensor *input, Tensor *output)

std::normal_distribution< float >	normalDist (kMean, kVar)

void	fillTensorWithRandomData (Tensor *tensor)
	This fills the Tensor with normally distributed random values.

void	fillTensorWithFixedData (Tensor *tensor)
	This fills the Tensor with a fixed data pattern. More...

void	verifyTensorWithFixedData (Tensor *tensor, int valueOffset)
	Verify that the provided Tensor's data matches the fixed pattern produced by fillTensorWithFixedData, with the provided offset to each value.

void	initDebugStream (int debugLevel)
	Initializes the global debug stream for the given debug level.

const DebugStream &	dout (int debugLevel)
	Returns a DebugStream instance for the given debug level.

void *	malloc_aligned (size_t size, bool zeroOut=false)
	Return heap-allocated cacheline-aligned memory.

std::string	dataLayoutToStr (DataLayout layout)
	Get the string version of DataLayout.

int	calc_padding (int value, unsigned alignment)
	Return the difference between value and the next multiple of alignment.

template<typename T >
int	product (std::vector< T > array)

template<typename T >
std::vector< T >	sum (std::vector< T > array0, std::vector< T > array1)
	Returns the elementwise-sum of the two arrays, which must be of the same size.

template<typename T >
void	variadicToVector (std::vector< T > &vector, T elem)

template<typename T , typename... Args>
void	variadicToVector (std::vector< T > &vector, T e, Args... elems)
	Populates a std::vector with an arbitrary number of elements.

template<typename T , typename... Args>
std::array< T, sizeof...(Args)+1 >	variadicToArray (T i, Args... elems)
	Returns a std::array populated with the given elements. More...
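
The small helpers calc_padding, product and sum listed above are described only briefly. The following is a hedged sketch of what they compute, written against std::vector only. The names calcPaddingSketch, productSketch and sumSketch are hypothetical, and the choice to return 0 for an already aligned value is an assumption, since the description does not state that case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Padding needed to round `value` up to a multiple of `alignment`.
// Assumption: an already aligned value (or an alignment of 0) needs none.
inline int calcPaddingSketch(int value, unsigned alignment) {
    assert(value >= 0);
    if (alignment == 0)
        return 0;
    int remainder = value % static_cast<int>(alignment);
    return remainder == 0 ? 0 : static_cast<int>(alignment) - remainder;
}

// Product of all elements, e.g. the element count implied by a shape.
template <typename T>
int productSketch(const std::vector<T>& array) {
    int result = 1;
    for (const T& value : array)
        result *= static_cast<int>(value);
    return result;
}

// Elementwise sum of two arrays, which must be of the same size.
template <typename T>
std::vector<T> sumSketch(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    std::vector<T> result(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
        result[i] = a[i] + b[i];
    return result;
}

}  // namespace sketch
```

For instance, under these assumptions a dimension of 30 elements with an alignment of 8 needs 2 elements of padding, and the shape {2, 3, 4} has a product of 24.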

Variables
bool	runningInSimulation
	This is true if the user chooses to run the network in gem5 simulation.

bool	fastForwardMode = true
	True if we are simulating in fast-forward mode.

int	numAcceleratorsAvailable
	The actual number of accelerator complexes currently in use.

ThreadPool *	threadPool = nullptr
	The user-space thread pool used by SMAUG to run multithreaded tasks.

bool	useSystolicArrayWhenAvailable
	If true, uses the systolic array for applicable operators when backend support exists.

constexpr const int	maxNumAccelerators = 8
	The maximum number of accelerators an operator's work can be split across. More...

constexpr const char *	kLayerFormat = "%-40s %-25s %=15d\n"

constexpr float	kMargin = 0.001
	Sets the absolute value by which a result can differ from Approx's expected value.

constexpr float	kEpsilon = 0.01
	Sets the percentage by which a result can differ from Approx's expected value.

constexpr float	kMean = 0.0

constexpr float	kVar = 0.1

std::default_random_engine	generator

constexpr float	kFraction = 0.1

Detailed Description

The smaug namespace is the parent namespace of all C++ code in SMAUG.

Enumeration Type Documentation

◆ BackendName

enum smaug::BackendName

The list of all hardware backends in the system.

Enumerator
Reference	Reference backend.
Smv	SMV backend.
UnknownBackend	Invalid backend.

Definition at line 22 of file backend.h.

Function Documentation

◆ buildNetwork()

Network * smaug::buildNetwork	(	const std::string &	modelTopoFile,
		const std::string &	modelParamsFile,
		SamplingInfo &	sampling,
		Workspace *	workspace
	)

buildNetwork reads the specified model topology and parameters protobufs and simulation sampling directives and returns a populated Network that can be run.

Parameters

modelTopoFile	The path to the model topology protobuf.
modelParamsFile	The path to the model parameters protobuf, which contains values for all tensors in the network (weights and inputs).
sampling	Level of simulation sampling to apply to applicable kernels.
workspace	Pointer to the global Workspace holding all tensors and operators.

Definition at line 370 of file network_builder.cpp.

◆ copyRawTensorData()

void smaug::copyRawTensorData	(	Tensor *	dest,
		Tensor *	src,
		int	destOffset,
		int	srcOffset,
		int	copySize
	)

Directly copies a linear region of memory from src to dest, without taking dimensions/padding into account.

Parameters

dest	Destination Tensor
src	Source Tensor
destOffset	The linear offset into the destination where data will be copied to.
srcOffset	The linear offset into the source where data will be copied from.
copySize	The size of the region in elements.

Definition at line 138 of file tensor_utils.cpp.
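
To make the offset semantics concrete, here is a minimal sketch of a raw linear copy between two flat buffers. It uses std::vector<float> as a stand-in for Tensor storage, and copyRawSketch is a hypothetical name, so this illustrates the parameters above rather than the actual implementation in tensor_utils.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies copySize elements from src (starting at srcOffset) into dest
// (starting at destOffset). All offsets and sizes are in elements, and
// no dimension or padding information is consulted.
inline void copyRawSketch(std::vector<float>& dest,
                          const std::vector<float>& src,
                          int destOffset,
                          int srcOffset,
                          int copySize) {
    assert(destOffset >= 0 && srcOffset >= 0 && copySize >= 0);
    assert(static_cast<std::size_t>(srcOffset) + copySize <= src.size());
    assert(static_cast<std::size_t>(destOffset) + copySize <= dest.size());
    std::copy_n(src.begin() + srcOffset, copySize, dest.begin() + destOffset);
}
```

Calling copyRawSketch(dest, src, 1, 3, 2) writes src[3] and src[4] into dest[1] and dest[2] and leaves every other element of dest untouched.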

◆ copyTensorRegion()

void smaug::copyTensorRegion	(	Tensor *	dest,
		Tensor *	src,
		std::vector< int >	destOrigin,
		std::vector< int >	srcOrigin,
		std::vector< int >	regionSize
	)

Copies a region of a source Tensor to a corresponding region in a destination Tensor.

The two Tensors are expected to share the same layout. Region origins and sizes are all specified in elements (not bytes) and in accordance with the data layout.

For example, given tensorA (4x4) and tensorB (3x3), to copy the upper-left 2x2 block of tensorA to the lower-right 2x2 block of tensorB:

copyTensorRegion(tensorB, tensorA, {1,1}, {0,0}, {2,2})

Parameters

dest	Destination Tensor
src	Source Tensor
destOrigin	The start of the copied region in the destination.
srcOrigin	The start of the copied region in the source.
regionSize	The size of the region.

Definition at line 65 of file tensor_utils.cpp.
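
The origin and size arguments can be illustrated with a two-dimensional sketch. The Matrix struct and copyRegionSketch below are hypothetical stand-ins that assume a row-major layout with {row, column} ordering. They mirror the parameter list above but are not the SMAUG implementation, which handles arbitrary dimensions and alignment padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major 2D container used only for this sketch.
struct Matrix {
    int rows;
    int cols;
    std::vector<int> data;
    Matrix(int r, int c)
            : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0) {}
    int& at(int r, int c) { return data[r * cols + c]; }
    int at(int r, int c) const { return data[r * cols + c]; }
};

// Copies a regionSize block starting at srcOrigin in src to the block
// starting at destOrigin in dest. All values are in elements.
inline void copyRegionSketch(Matrix& dest,
                             const Matrix& src,
                             std::vector<int> destOrigin,
                             std::vector<int> srcOrigin,
                             std::vector<int> regionSize) {
    assert(destOrigin.size() == 2 && srcOrigin.size() == 2 &&
           regionSize.size() == 2);
    assert(srcOrigin[0] + regionSize[0] <= src.rows);
    assert(srcOrigin[1] + regionSize[1] <= src.cols);
    assert(destOrigin[0] + regionSize[0] <= dest.rows);
    assert(destOrigin[1] + regionSize[1] <= dest.cols);
    for (int r = 0; r < regionSize[0]; r++) {
        for (int c = 0; c < regionSize[1]; c++) {
            dest.at(destOrigin[0] + r, destOrigin[1] + c) =
                    src.at(srcOrigin[0] + r, srcOrigin[1] + c);
        }
    }
}
```

With a 4x4 source and a 3x3 destination, copyRegionSketch(dest, src, {1,1}, {0,0}, {2,2}) reads rows 0-1 and columns 0-1 of the source and writes them to rows 1-2 and columns 1-2 of the destination.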

◆ fillTensorWithFixedData()

void smaug::fillTensorWithFixedData ( Tensor * tensor )

This fills the Tensor with a fixed data pattern.

The Tensor should be in NHWC data layout. Each channel is initialized with a different value, and every batch/row/column shares the same pattern.

Definition at line 22 of file smv_test_common.cpp.

◆ generateTiledTensor()

TiledTensor smaug::generateTiledTensor	(	Tensor *	tensor,
		const TensorShape &	tileShape,
		Operator *	op,
		bool	copyData = false
	)

Generates a TiledTensor from a source Tensor.

This does not support generating tiles with overlap, striding, or padding options.

Parameters

tensor	The Tensor to tile.
tileShape	The maximum size of each tile.
op	The Operator that will be consuming this TiledTensor.
copyData	Whether to copy data from the source tensor into the tiles.

Definition at line 335 of file tensor_utils.cpp.
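
Because tileShape is a maximum size, non-overlapping tiling comes down to a ceiling division per dimension, with the last tile in each dimension possibly smaller. The sketch below shows only that arithmetic. The function names are hypothetical and shapes are plain std::vector<int> values rather than TensorShape objects, so it should be read as an illustration of the idea, not as the code in tensor_utils.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of tiles needed along each dimension: ceil(dim / maxTileDim).
inline std::vector<int> tilesPerDimSketch(const std::vector<int>& shape,
                                          const std::vector<int>& maxTileShape) {
    assert(shape.size() == maxTileShape.size());
    std::vector<int> counts(shape.size());
    for (std::size_t i = 0; i < shape.size(); i++) {
        assert(shape[i] > 0 && maxTileShape[i] > 0);
        counts[i] = (shape[i] + maxTileShape[i] - 1) / maxTileShape[i];
    }
    return counts;
}

// Extent of tile number tileIdx along one dimension. Every tile has the
// maximum extent except possibly the last, which takes the remainder.
inline int tileExtentSketch(int dim, int maxTileDim, int tileIdx) {
    int start = tileIdx * maxTileDim;
    assert(start >= 0 && start < dim);
    return std::min(maxTileDim, dim - start);
}
```

For a shape of {1, 8, 8, 96} and a maximum tile shape of {1, 8, 8, 32}, this yields {1, 1, 1, 3} tiles per dimension. A dimension of 10 with a maximum tile extent of 4 yields tiles of extent 4, 4 and 2.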

◆ generateTiledTensorPerBatchNC()

TiledTensor smaug::generateTiledTensorPerBatchNC	(	Tensor *	tensor,
		const TensorShape &	tileShape,
		Operator *	op,
		bool	copyData = true
	)

Tile the provided NC Tensor per batch.

The only requirement is to tile the Tensor in contiguous blocks of tileShape, without concern for strides, overlap, or padding. Thus, this is usually useful only for unary and elementwise operators.

Parameters

tensor	The Tensor to tile.
tileShape	The maximum size of each tile.
op	The Operator that will be consuming this TiledTensor.
copyData	Whether to copy data from the source tensor into the tiles.

Definition at line 199 of file tensor_utils.cpp.

◆ generateTiledTensorWithStrideAndPadding()

TiledTensor smaug::generateTiledTensorWithStrideAndPadding	(	Tensor *	tensor,
		const TensorShape &	tileShape,
		Operator *	op,
		int	fieldRows,
		int	fieldCols,
		int	rowStride,
		int	colStride,
		PaddingType	paddingType,
		bool	copyData = false
	)

Generates a TiledTensor from a source Tensor with the specified tile shape.

Depending on the operator that needs this TiledTensor, tiles may need to overlap each other (e.g. for a convolutional filter window).

Parameters

tensor	The Tensor to tile.
tileShape	The maximum size of each tile.
op	The Operator that will be consuming this TiledTensor.
fieldRows	Number of rows of a filter applied, if any.
fieldCols	Number of columns of a filter applied, if any.
rowStride	The row stride of a filter applied, if any.
colStride	The column stride of a filter applied, if any.
paddingType	The type of additional zero-padding applied on the Tensor by the Operator, if any.
copyData	Whether to copy data from the source tensor into the tiles.

Definition at line 233 of file tensor_utils.cpp.
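
The reason tiles may overlap can be shown along a single dimension. When a window of fieldRows rows slides with rowStride, a tile that ends mid-window forces the next tile to re-read some rows. The sketch below computes tile start rows under the simplifying assumption of no zero-padding. tileStartRowsSketch is a hypothetical helper, not part of SMAUG, and the real function also handles columns, channels and padding.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the first input row of each tile so that every window position
// is covered by exactly one tile. Assumes no zero-padding.
inline std::vector<int> tileStartRowsSketch(int inputRows,
                                            int maxTileRows,
                                            int fieldRows,
                                            int rowStride) {
    assert(fieldRows > 0 && rowStride > 0);
    assert(maxTileRows >= fieldRows && inputRows >= fieldRows);
    std::vector<int> starts;
    int start = 0;
    // Add tiles while at least one full window still fits from `start`.
    while (start + fieldRows <= inputRows) {
        starts.push_back(start);
        int tileRows = std::min(maxTileRows, inputRows - start);
        // Window positions that fit entirely inside this tile.
        int windows = (tileRows - fieldRows) / rowStride + 1;
        // The next tile begins at the first window this tile did not cover.
        start += windows * rowStride;
    }
    return starts;
}
```

With 10 input rows, tiles of at most 4 rows, a 3-row window and a stride of 1, the tiles start at rows 0, 2, 4 and 6, so adjacent tiles share fieldRows - rowStride = 2 rows.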

◆ getTraceName()

std::string smaug::getTraceName ( int accelIdx )

Return the name of the dynamic trace for this accelerator.

Parameters

accelIdx	The ID of this accelerator.

Definition at line 6 of file common.cpp.

◆ invokeKernel() [1/2]

template<typename Kernel , typename... Args>

void smaug::invokeKernel	(	int	accelIdx,
		unsigned	reqCode,
		const Kernel &	kernel,
		Args &&...	args
	)

The generic blocking interface for all accelerator kernel functions.

All accelerated kernels should be called via this interface, and different things will happen based on how the program is being run:

As a native binary: the kernel function is directly called.
As an LLVM-Tracer instrumented binary: sets the file name of the dynamic trace being generated, then calls the kernel function.
In gem5-Aladdin: invokes the Aladdin model of the specified accelerator.

This is a blocking call: in gem5-Aladdin mode, the thread will wait until the accelerator finishes. For a non-blocking call, use invokeKernelNoBlock.

Parameters

accelIdx	Sets the suffix of the dynamic trace to XXX_acc[accelIdx]. Used if you want to generate multiple independent traces to simulate multiple accelerators.
reqCode	The ID of the accelerator to invoke.
kernel	The kernel function to invoke in native/LLVM-Tracer mode.
args	The arguments to the kernel function.

Definition at line 72 of file common.h.
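
The variadic template signature can be illustrated with a sketch of the native and tracing paths only: record a trace name, then forward the arguments to the kernel. Everything in the sketch namespace is hypothetical, including the trace name prefix and the lastTraceName helper. The gem5-Aladdin path is omitted because it depends on simulator calls not shown on this page.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical record of the most recently selected trace name.
inline std::string& lastTraceName() {
    static std::string name;
    return name;
}

// Blocking call: notes which trace the kernel belongs to, then invokes
// the kernel directly with perfectly forwarded arguments.
template <typename Kernel, typename... Args>
void invokeKernelSketch(int accelIdx,
                        unsigned reqCode,
                        const Kernel& kernel,
                        Args&&... args) {
    (void)reqCode;  // Only meaningful when a simulator is attached.
    lastTraceName() = "dynamic_trace_acc" + std::to_string(accelIdx);
    kernel(std::forward<Args>(args)...);
}

// Convenience overload that uses accelerator index 0.
template <typename Kernel, typename... Args>
void invokeKernelSketch(unsigned reqCode, const Kernel& kernel, Args&&... args) {
    invokeKernelSketch(0, reqCode, kernel, std::forward<Args>(args)...);
}

}  // namespace sketch
```

A call such as sketch::invokeKernelSketch(2, 7u, kernel, 3, 4) runs kernel(3, 4) and records a trace name ending in acc2, while the three-argument form records acc0.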

◆ invokeKernel() [2/2]

template<typename Kernel , typename... Args>

void smaug::invokeKernel	(	unsigned	reqCode,
		const Kernel &	kernel,
		Args &&...	args
	)

A generic interface for all accelerator kernel functions.

This is a convenience function that sets accelIdx = 0, so only one dynamic trace file will be generated.

Definition at line 93 of file common.h.

◆ invokeKernelNoBlock()

template<typename Kernel , typename... Args>

std::unique_ptr<volatile int> smaug::invokeKernelNoBlock	(	int	accelIdx,
		unsigned	reqCode,
		const Kernel &	kernel,
		Args &&...	args
	)

A generic non-blocking interface to accelerated kernel functions.

The only difference between this and invokeKernel is that in gem5-Aladdin mode, the thread will start Aladdin and then return immediately. The calling thread is responsible for checking the status of the accelerator and taking action appropriately.

Definition at line 106 of file common.h.

◆ mapArrayToAccel()

void smaug::mapArrayToAccel	(	unsigned	reqCode,
		const char *	arrayName,
		void *	baseAddr,
		size_t	size
	)

Maps an array of data to the accelerator.

This enables the accelerator to access host memory via DMA or caching memory accesses.

Parameters

reqCode	The ID of the accelerator
arrayName	The name of the array as it appears in the top-level accelerator function signature.
baseAddr	The base address of the array (e.g. &array[0]).
size	The size of the array.

Definition at line 12 of file common.cpp.

◆ setArrayMemTypeIfSimulating()

void smaug::setArrayMemTypeIfSimulating	(	unsigned	reqCode,
		const char *	arrayName,
		MemoryType	memType
	)

Sets what memory access mechanism the accelerator will use when accessing this array.

This lets the user decide at runtime whether to access a host array over DMA, hardware caching, or ACP.

Parameters

reqCode	The ID of the accelerator
arrayName	The name of the array as it appears in the accelerator's function signature.
memType	The memory access mechanism.

Definition at line 21 of file common.cpp.

◆ variadicToArray()

template<typename T , typename... Args>

std::array<T, sizeof...(Args) + 1> smaug::variadicToArray	(	T	i,
		Args...	elems
	)

Returns a std::array populated with the given elements.

Must contain at least one element.

Parameters

i	The first element.
elems	All the remaining elements.

Definition at line 57 of file utils.h.
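
The return type std::array<T, sizeof...(Args) + 1> follows from taking one mandatory element plus a parameter pack. A minimal sketch of a function with that shape is shown below. variadicToArraySketch and exampleUsage are hypothetical names, and the explicit static_cast is an assumption made here so that mixed argument types compile under brace initialization.

```cpp
#include <array>
#include <cassert>

namespace sketch {

// Builds a std::array from one required element and any number of extras.
// The array size is known at compile time: the pack size plus one.
template <typename T, typename... Args>
std::array<T, sizeof...(Args) + 1> variadicToArraySketch(T first, Args... rest) {
    // Brace initialization rejects narrowing, so convert each extra
    // element to T explicitly.
    return { { first, static_cast<T>(rest)... } };
}

// Example: describe a 4D shape without spelling out the array size.
inline void exampleUsage() {
    auto dims = variadicToArraySketch(1, 8, 8, 32);
    assert(dims.size() == 4);
    assert(dims[3] == 32);
}

}  // namespace sketch
```

Requiring the first element as a separate parameter is what enforces the "at least one element" rule at compile time, and it also fixes the element type T.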

Variable Documentation

◆ maxNumAccelerators

constexpr const int smaug::maxNumAccelerators = 8

The maximum number of accelerators an operator's work can be split across.

This limit exists to keep Aladdin simulation time and resources in check.

Definition at line 25 of file globals.h.
