SMAUG
Simulating Machine Learning Applications on gem5-Aladdin
The smaug namespace is the parent namespace of all C++ code in SMAUG.
Namespaces
gem5
Contains utility functions for interacting with gem5.
ref
The ref namespace contains all code specific to the Reference backend.
smv
The smv namespace contains all code specific to the Smv backend.
Classes
class | AvgPoolingOp
Implements the arithmetic-average-pooling operator.
class | BatchNormOp
Implements the batch normalization layer.
class | ConcatOp
Concatenates N Tensors along a specified axis.
class | ConvolutionOp
The base class for all 4D spatial convolution operators.
class | DataflowGraphWriter
DataflowGraphWriter writes the current network as a dot-graph file to the given ostream.
class | DataOp
Exposes a Tensor as its only output.
class | DebugStream
A stream class to consume debug logs.
class | DepthwiseConvolutionOp
Implements the depthwise convolution operator.
class | EltwiseAddOp
Adds two Tensors elementwise.
class | EltwiseMulOp
Multiplies two Tensors elementwise.
class | EltwiseOp
The base class of all elementwise operators.
class | EluOp
Implements the exponential linear unit function.
class | FlattenOp
Flattens each batch of a Tensor.
struct | FromDataType
Provides compile-time conversion from SMAUG DataType to C type.
struct | FromDataType< Bool >
struct | FromDataType< Float16 >
struct | FromDataType< Float32 >
struct | FromDataType< Float64 >
struct | FromDataType< Int32 >
struct | FromDataType< Int64 >
class | FusedActivationOp
An Operator fused with an activation function.
class | GreaterEqualOp
Implements an elementwise greater-than-or-equal-to operator.
class | GreaterOp
Implements an elementwise greater-than operator.
class | HardTanhOp
Implements the hard tanh operator, which clamps its input to a configurable [min, max] range.
class | InnerProductOp
Implements the inner product operator.
class | LessEqualOp
Implements an elementwise less-than-or-equal-to operator.
class | LessOp
Implements an elementwise less-than operator.
class | MaxPoolingOp
Implements the max-pooling operator.
class | MergeOp
Forwards the first live input to its output.
class | Network
Network encapsulates all of the information SMAUG will use during execution: the overall computation graph of the model, all the operators and tensors, various housekeeping structures, and simulation information.
class | Operator
Operator is the base class for all graph operators supported by SMAUG.
class | PaddingOp
Pads a given Tensor in any number of dimensions by an arbitrary amount.
class | PoolingOp
Implements a pooling operator.
class | ReferenceBackend
ReferenceBackend provides reference implementations of all operators supported by SMAUG.
class | ReluOp
Implements the rectified linear unit operator: x for x > 0, and slope * x otherwise (slope = 0 gives the standard max(x, 0)).
class | ReorderOp
Implements a Tensor reordering operation to convert between different DataLayouts.
class | RepeatOp
Replicates a Tensor's data along all dimensions.
class | ReshapeOp
Changes the Tensor's shape while retaining the number of elements.
class | Scheduler
Scheduler is responsible for running the Network.
class | SeluOp
Implements the scaled exponential linear unit function.
class | SigmoidOp
Implements the sigmoid operator, defined as 1/(1 + exp(-input)).
class | SmaugTest
The Catch2 test fixture used by all C++ unit tests.
class | SmvAcceleratorPool
Implements a pool of worker accelerators.
class | SmvAvgPoolingOp
Average pooling operator on SMV.
class | SmvBackend
SmvBackend implements a set of models of optimized DL kernels that were taped out on a machine learning SoC by the Harvard Architecture, Circuits, and Compilers group.
class | SmvBatchNormOp
SMV backend implementation of batch normalization.
class | SmvConvolutionOp
SMV backend implementation of convolution.
class | SmvEltwiseAddOp
Elementwise addition on SMV.
class | SmvEltwiseMulOp
Elementwise multiplication on SMV.
class | SmvEluOp
Elementwise exponential linear unit on SMV.
class | SmvGreaterEqualOp
Elementwise greater-than-or-equal-to operator on SMV.
class | SmvGreaterOp
Elementwise greater-than operator on SMV.
class | SmvHardTanhOp
Hard tanh operator on SMV.
class | SmvInnerProductOp
Inner product operator on SMV.
class | SmvLessEqualOp
Elementwise less-than-or-equal-to operator on SMV.
class | SmvLessOp
Elementwise less-than operator on SMV.
class | SmvMaxPoolingOp
Max-pooling operator on SMV.
class | SmvPoolingOp
Base class for SMV pooling operators.
class | SmvReluOp
Rectified linear-unit operator on SMV.
class | SmvSeluOp
Elementwise scaled exponential linear unit on SMV.
class | SmvSigmoidOp
Sigmoid operator on SMV.
class | SmvSoftmaxOp
Softmax operator on SMV.
class | SmvTanhOp
Tanh operator on SMV.
class | SoftmaxOp
Implements the softmax operator.
class | SplitOp
Implements the split operator, which divides a Tensor into N output Tensors along a specified dimension.
class | SwitchOp
Conditionally forwards an input to one of two outputs.
class | TanhOp
Implements the tanh operator.
class | Tensor
Tensor represents a single multi-dimensional array of data.
class | TensorBase
The base class of all Tensor objects.
class | TensorIndexIterator
An iterator over a multidimensional tensor's indices, accounting for data alignment padding.
struct | TensorIndices
Additional metadata for edges in the graph.
class | TensorRegionIndexIterator
A tensor index iterator that stays within a specified rectangular region.
class | TensorShape
TensorShape describes the shape of a Tensor.
class | ThreadPool
A user-space cooperative thread pool implementation designed for gem5 in SE mode.
class | TiledTensor
A multidimensional container of Tensors.
struct | ToDataType
Provides compile-time conversion from C types to SMAUG DataTypes.
struct | ToDataType< bool >
struct | ToDataType< double >
struct | ToDataType< float >
struct | ToDataType< float16 >
struct | ToDataType< int32_t >
struct | ToDataType< int64_t >
struct | ToDataType< uint32_t >
struct | ToDataType< uint64_t >
class | UnaryOp
Base class for all operators with one input.
class | Workspace
Workspace is the container and owner of all Tensors and Operators in the Network.
Typedefs
using | float16 = uint16_t
typedef void(* | FillTensorDataFunc) (Tensor *tensor)
Any function that accepts a Tensor, fills it with data, and returns nothing.
Enumerations
enum | BackendName { Reference = REFERENCE, Smv = SMVBACKEND, UnknownBackend }
The list of all hardware backends in the system.
Functions
Network * | buildNetwork (const std::string &modelTopoFile, const std::string &modelParamsFile, SamplingInfo &sampling, Workspace *workspace)
buildNetwork reads the specified model topology and parameters protobufs and simulation sampling directives and returns a populated Network that can be run.
float16 | fp16 (float fp32_data)
This converts a float32 into a float16.
float | fp32 (float16 fp16_data)
This converts a float16 into a float32.
Tensor * | convertFp16ToFp32Tensor (Tensor *fp16Tensor, Workspace *workspace)
This creates a tensor with float32 data type and fills it with data converted from a source tensor with float16 data.
Tensor * | convertFp32ToFp16Tensor (Tensor *fp32Tensor, Workspace *workspace)
This creates a tensor with float16 data type and fills it with data converted from a source tensor with float32 data.
template<>
void | printTensorElement< float16 > (std::ostream &os, const float16 *data, int index)
std::ostream & | operator<< (std::ostream &os, const TensorShape &shape)
std::ostream & | operator<< (std::ostream &os, const TensorIndexIterator &iter)
std::ostream & | operator<< (std::ostream &os, const Tensor &tensor)
void | copyTensorRegion (Tensor *dest, Tensor *src, std::vector< int > destOrigin, std::vector< int > srcOrigin, std::vector< int > regionSize)
Copies a region of a source Tensor to a corresponding region in a destination Tensor.
void | copyTensorData (Tensor *dest, Tensor *src, std::vector< int > destOffset, std::vector< int > srcOffset, int copySize)
Similar to copyTensorRegion, but the region is a contiguous block of memory.
void | copyRawTensorData (Tensor *dest, Tensor *src, int destOffset, int srcOffset, int copySize)
Directly copies a linear region of memory from src to dest, without taking dimensions/padding into account.
TiledTensor | generateTiledTensorPerBatchNC (Tensor *tensor, const TensorShape &tileShape, Operator *op, bool copyData=true)
Tiles the provided NC Tensor per batch.
TiledTensor | generateTiledTensorWithStrideAndPadding (Tensor *tensor, const TensorShape &tileShape, Operator *op, int fieldRows, int fieldCols, int rowStride, int colStride, PaddingType paddingType, bool copyData=false)
Generates a TiledTensor from a source Tensor with the specified tile shape.
TiledTensor | generateTiledTensor (Tensor *tensor, const TensorShape &tileShape, Operator *op, bool copyData=false)
Generates a TiledTensor from a source Tensor.
void | flattenTiledTensor (TiledTensor &tiledTensor, Tensor *destTensor)
Copies the data from each tile in a TiledTensor into a destination Tensor as a contiguous block of memory, as if only one dimension ever existed.
Tensor * | concatTensors (std::vector< Tensor * > inputTensors, int concatDim, Workspace *workspace)
Concatenates Tensors on the specified dimension into one single tensor.
template<typename DType >
void | printTensorElement (std::ostream &os, const DType *data, int index)
template<typename DType >
void | writeTensorToOstream (std::ostream &os, const Tensor &tensor)
Pretty-print a Tensor's name, shape, and contents to the provided ostream.
std::string | getTraceName (int accelIdx)
Return the name of the dynamic trace for this accelerator.
void | mapArrayToAccel (unsigned reqCode, const char *arrayName, void *baseAddr, size_t size)
Maps an array of data to the accelerator.
void | setArrayMemTypeIfSimulating (unsigned reqCode, const char *arrayName, MemoryType memType)
Sets what memory access mechanism the accelerator will use when accessing this array.
template<typename Kernel , typename... Args>
void | invokeKernel (int accelIdx, unsigned reqCode, const Kernel &kernel, Args &&... args)
The generic blocking interface for all accelerator kernel functions.
template<typename Kernel , typename... Args>
void | invokeKernel (unsigned reqCode, const Kernel &kernel, Args &&... args)
A generic interface for all accelerator kernel functions.
template<typename Kernel , typename... Args>
std::unique_ptr< volatile int > | invokeKernelNoBlock (int accelIdx, unsigned reqCode, const Kernel &kernel, Args &&... args)
A generic non-blocking interface to accelerated kernel functions.
void | convertNchwToNhwc (Tensor *input, Tensor *output)
void | convertNhwcToNchw (Tensor *input, Tensor *output)
void | flatten (Tensor *input, Tensor *output)
void | transpose3D (Tensor *input, Tensor *output)
void | transpose2D (Tensor *input, Tensor *output)
template<typename DType >
void | convertNchwToNhwcImpl (Tensor *input, Tensor *output)
template<typename DType >
void | convertNhwcToNchwImpl (Tensor *input, Tensor *output)
template<typename DType >
void | flattenImpl (Tensor *input, Tensor *output)
template<typename DType >
void | transpose3DImpl (Tensor *input, Tensor *output)
template<typename DType >
void | transpose2DImpl (Tensor *input, Tensor *output)
std::normal_distribution< float > | normalDist (kMean, kVar)
void | fillTensorWithRandomData (Tensor *tensor)
This fills the Tensor with normally distributed random values.
void | fillTensorWithFixedData (Tensor *tensor)
This fills the Tensor with a fixed data pattern.
void | verifyTensorWithFixedData (Tensor *tensor, int valueOffset)
Verify that the provided Tensor's data matches the fixed pattern produced by fillTensorWithFixedData, with the provided offset to each value.
void | initDebugStream (int debugLevel)
Initializes the global debug stream for the given debug level.
const DebugStream & | dout (int debugLevel)
Returns a DebugStream instance for the given debug level.
void * | malloc_aligned (size_t size, bool zeroOut=false)
Return heap-allocated cacheline-aligned memory.
std::string | dataLayoutToStr (DataLayout layout)
Get the string version of DataLayout.
int | calc_padding (int value, unsigned alignment)
Return the difference between value and the next multiple of alignment.
template<typename T >
int | product (std::vector< T > array)
template<typename T >
std::vector< T > | sum (std::vector< T > array0, std::vector< T > array1)
Returns the elementwise-sum of the two arrays, which must be of the same size.
template<typename T >
void | variadicToVector (std::vector< T > &vector, T elem)
template<typename T , typename... Args>
void | variadicToVector (std::vector< T > &vector, T e, Args... elems)
Populates a std::vector with an arbitrary number of elements.
template<typename T , typename... Args>
std::array< T, sizeof...(Args)+1 > | variadicToArray (T i, Args... elems)
Returns a std::array populated with the given elements.
Variables
bool | runningInSimulation
This is true if the user chooses to run the network in gem5 simulation.
bool | fastForwardMode = true
True if we are simulating in fast-forward mode.
int | numAcceleratorsAvailable
The actual number of accelerator complexes currently in use.
ThreadPool * | threadPool = nullptr
The user-space thread pool used by SMAUG to run multithreaded tasks.
bool | useSystolicArrayWhenAvailable
If true, uses the systolic array for applicable operators when backend support exists.
constexpr const int | maxNumAccelerators = 8
The maximum number of accelerators an operator's work can be split across.
constexpr const char * | kLayerFormat = "%-40s %-25s %=15d\n"
constexpr float | kMargin = 0.001
Sets the absolute amount by which a result can differ from Approx's expected value.
constexpr float | kEpsilon = 0.01
Sets the relative tolerance, as a fraction of the expected value (0.01 = 1%), by which a result can differ from Approx's expected value.
BatchNormOp | |
ReferenceBackend | |
ConvolutionOp | |
DepthwiseConvolutionOp | |
EltwiseAddOp | |
EltwiseMulOp | |
EluOp | |
SeluOp | |
GreaterOp | |
GreaterEqualOp | |
InnerProductOp | |
LessOp | |
LessEqualOp | |
MaxPoolingOp | |
AvgPoolingOp | |
ReluOp | |
SigmoidOp | |
constexpr float | kMean = 0.0
constexpr float | kVar = 0.1
std::default_random_engine | generator
constexpr float | kFraction = 0.1
SoftmaxOp | |
TanhOp | |
HardTanhOp | |
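Several of the small helpers listed above can be illustrated without the rest of SMAUG. The block below is a minimal sketch of calc_padding, product, and variadicToArray written only from their summary entries; the bodies are assumptions rather than SMAUG's source, and the Sketch suffix marks every name as hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Sketch of calc_padding: how far value is from the next multiple of
// alignment (0 when value is already aligned). Assumes value >= 0.
inline int calcPaddingSketch(int value, unsigned alignment) {
    assert(alignment > 0 && value >= 0);
    const int a = static_cast<int>(alignment);
    const int rem = value % a;
    return rem == 0 ? 0 : a - rem;
}

// Sketch of product: multiplies all elements, e.g. to turn a shape into an
// element count. An empty vector yields 1.
template <typename T>
int productSketch(const std::vector<T>& array) {
    int result = 1;
    for (const T& v : array) result *= static_cast<int>(v);
    return result;
}

// Sketch of variadicToArray: packs one or more arguments into a std::array
// whose element type is taken from the first argument.
template <typename T, typename... Args>
std::array<T, sizeof...(Args) + 1> variadicToArraySketch(T first, Args... rest) {
    return {{first, static_cast<T>(rest)...}};
}
```

For example, a 10-element row aligned to 8 elements needs calcPaddingSketch(10, 8) == 6 extra elements, for a padded length of 16, and variadicToArraySketch(1, 2, 3) yields a std::array<int, 3>.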
The smaug namespace is the parent namespace of all C++ code in SMAUG.
enum smaug::BackendName
The list of all hardware backends in the system.
Network* smaug::buildNetwork(const std::string& modelTopoFile,
                             const std::string& modelParamsFile,
                             SamplingInfo& sampling,
                             Workspace* workspace)
buildNetwork reads the specified model topology and parameters protobufs and simulation sampling directives and returns a populated Network that can be run.
modelTopoFile | The path to the model topology protobuf.
modelParamsFile | The path to the model parameters protobuf, which contains values for all tensors in the network (weights and inputs).
sampling | Level of simulation sampling to apply to applicable kernels.
workspace | Pointer to the global Workspace holding all tensors and operators.
Definition at line 370 of file network_builder.cpp.
void smaug::copyRawTensorData(Tensor* dest,
                              Tensor* src,
                              int destOffset,
                              int srcOffset,
                              int copySize)
Directly copies a linear region of memory from src to dest, without taking dimensions/padding into account.
dest | Destination Tensor.
src | Source Tensor.
destOffset | The linear offset into the destination where data will be copied to.
srcOffset | The linear offset into the source where data will be copied from.
copySize | The size of the region in elements.
Definition at line 138 of file tensor_utils.cpp.
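Because copyRawTensorData ignores shape and padding, its behavior reduces to a plain element copy. The sketch below models each Tensor's storage as a std::vector; copyRawSketch is a hypothetical stand-in, and treating the offsets and copySize as element counts follows the parameter list above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for copyRawTensorData: copies copySize elements from
// src[srcOffset...] into dest[destOffset...], ignoring shape and padding.
template <typename T>
void copyRawSketch(std::vector<T>& dest, const std::vector<T>& src,
                   int destOffset, int srcOffset, int copySize) {
    assert(destOffset >= 0 && srcOffset >= 0 && copySize >= 0);
    assert(static_cast<std::size_t>(srcOffset + copySize) <= src.size());
    assert(static_cast<std::size_t>(destOffset + copySize) <= dest.size());
    std::copy_n(src.begin() + srcOffset, copySize, dest.begin() + destOffset);
}
```

Padding elements, if present in the source buffer, are copied like any other element; that is the practical difference from copyTensorRegion.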
void smaug::copyTensorRegion(Tensor* dest,
                             Tensor* src,
                             std::vector<int> destOrigin,
                             std::vector<int> srcOrigin,
                             std::vector<int> regionSize)
Copies a region of a source Tensor to a corresponding region in a destination Tensor.
The two Tensors are expected to share the same layout. Region origins and sizes are all specified in elements (not bytes) and in accordance with the data layout.
For example, with tensorA of shape 4x4 and tensorB of shape 3x3, the upper-left 2x2 block of tensorA is copied to the lower-right 2x2 block of tensorB by:
copyTensorRegion(tensorB, tensorA, {1,1}, {0,0}, {2,2})
dest | Destination Tensor.
src | Source Tensor.
destOrigin | The start of the copied region in the destination.
srcOrigin | The start of the copied region in the source.
regionSize | The size of the region.
Definition at line 65 of file tensor_utils.cpp.
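The region copy can be sketched on dense row-major buffers, which is enough to check the 4x4 to 3x3 example: a destination origin of {1,1} in a 3x3 tensor places the 2x2 block in its lower-right corner. This sketch assumes no alignment padding (real SMAUG Tensors may pad the innermost dimension), and copyRegionSketch and linearIndexSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major linear offset of a multidimensional index within `dims`.
inline std::size_t linearIndexSketch(const std::vector<int>& dims,
                                     const std::vector<int>& idx) {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < dims.size(); ++d)
        offset = offset * dims[d] + idx[d];
    return offset;
}

// Sketch of copyTensorRegion over dense row-major buffers (no alignment
// padding). Origins and sizes are in elements, one entry per dimension.
template <typename T>
void copyRegionSketch(std::vector<T>& dest, const std::vector<int>& destDims,
                      const std::vector<T>& src, const std::vector<int>& srcDims,
                      const std::vector<int>& destOrigin,
                      const std::vector<int>& srcOrigin,
                      const std::vector<int>& regionSize) {
    const std::size_t rank = regionSize.size();
    assert(rank > 0 && destDims.size() == rank && srcDims.size() == rank);
    for (int extent : regionSize)
        if (extent <= 0) return;  // empty region: nothing to copy
    std::vector<int> pos(rank, 0), destIdx(rank), srcIdx(rank);
    bool done = false;
    while (!done) {
        for (std::size_t d = 0; d < rank; ++d) {
            destIdx[d] = destOrigin[d] + pos[d];
            srcIdx[d] = srcOrigin[d] + pos[d];
        }
        dest[linearIndexSketch(destDims, destIdx)] =
            src[linearIndexSketch(srcDims, srcIdx)];
        // Advance pos like an odometer, innermost dimension fastest.
        int d = static_cast<int>(rank) - 1;
        while (d >= 0 && ++pos[d] == regionSize[d]) {
            pos[d] = 0;
            --d;
        }
        done = d < 0;
    }
}
```

Calling copyRegionSketch(b, {3, 3}, a, {4, 4}, {1, 1}, {0, 0}, {2, 2}) with a holding the values 1 through 16 leaves b as {0, 0, 0, 0, 1, 2, 0, 5, 6}.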
void smaug::fillTensorWithFixedData(Tensor* tensor)
This fills the Tensor with a fixed data pattern.
The Tensor should be in NHWC data layout. Each channel is initialized with a different value, but every batch, row, and column shares this same pattern.
Definition at line 22 of file smv_test_common.cpp.
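A minimal sketch of the fixed-pattern idea, assuming the intended layout is NHWC (channels innermost), so an element's channel is its linear index modulo the channel count. The concrete value given to each channel (here, the channel index itself) and both function names are assumptions; the pairing with a valueOffset check mirrors verifyTensorWithFixedData.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills an NHWC buffer so that every element of channel c holds the value c.
// In NHWC the channel index varies fastest, so it is simply i % channels.
// The value-per-channel rule (value == c) is an illustrative assumption.
void fillFixedNhwcSketch(std::vector<float>& data, int n, int h, int w,
                         int channels) {
    assert(channels > 0);
    assert(data.size() == static_cast<std::size_t>(n) * h * w * channels);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = static_cast<float>(i % static_cast<std::size_t>(channels));
}

// Returns true if every element equals its channel's pattern value plus
// valueOffset, e.g. after an operator expected to add a constant.
bool verifyFixedNhwcSketch(const std::vector<float>& data, int channels,
                           int valueOffset) {
    assert(channels > 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        const float expected =
            static_cast<float>(i % static_cast<std::size_t>(channels)) +
            static_cast<float>(valueOffset);
        if (data[i] != expected) return false;
    }
    return true;
}
```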
TiledTensor smaug::generateTiledTensor(Tensor* tensor,
                                       const TensorShape& tileShape,
                                       Operator* op,
                                       bool copyData = false)
Generates a TiledTensor from a source Tensor.
This does not support generating tiles with overlap, striding, or padding options.
tensor | The Tensor to tile.
tileShape | The maximum size of each tile.
op | The Operator that will be consuming this TiledTensor.
copyData | Whether to copy data from the source tensor into the tiles.
Definition at line 335 of file tensor_utils.cpp.
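Without overlap, striding, or padding, tiling reduces to a ceiling division per dimension, with edge tiles truncated to whatever remains. The sketch below only computes where tiles fall; it does not model the op or copyData arguments, and TileSketch and planTilesSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TileSketch {
    std::vector<int> origin;  // start index of the tile in the source tensor
    std::vector<int> shape;   // extent of the tile (edge tiles may be smaller)
};

// Splits a tensor of shape `dims` into non-overlapping tiles of at most
// `tileDims` elements per dimension, enumerated in row-major tile order.
std::vector<TileSketch> planTilesSketch(const std::vector<int>& dims,
                                        const std::vector<int>& tileDims) {
    assert(!dims.empty() && dims.size() == tileDims.size());
    const std::size_t rank = dims.size();
    std::vector<int> counts(rank);
    for (std::size_t d = 0; d < rank; ++d) {
        assert(dims[d] > 0 && tileDims[d] > 0);
        counts[d] = (dims[d] + tileDims[d] - 1) / tileDims[d];  // ceiling division
    }
    std::vector<TileSketch> tiles;
    std::vector<int> t(rank, 0);  // current tile coordinate
    int d;
    do {
        TileSketch tile{std::vector<int>(rank), std::vector<int>(rank)};
        for (std::size_t k = 0; k < rank; ++k) {
            tile.origin[k] = t[k] * tileDims[k];
            const int remaining = dims[k] - tile.origin[k];
            tile.shape[k] = remaining < tileDims[k] ? remaining : tileDims[k];
        }
        tiles.push_back(tile);
        // Advance the tile coordinate like an odometer.
        d = static_cast<int>(rank) - 1;
        while (d >= 0 && ++t[d] == counts[d]) {
            t[d] = 0;
            --d;
        }
    } while (d >= 0);
    return tiles;
}
```

For a {1, 5, 5, 8} tensor and a {1, 2, 5, 8} tile shape it produces three tiles along the rows, the last one only a single row tall.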
TiledTensor smaug::generateTiledTensorPerBatchNC(Tensor* tensor,
                                                 const TensorShape& tileShape,
                                                 Operator* op,
                                                 bool copyData = true)
Tiles the provided NC Tensor per batch.
The only requirement is to tile the Tensor in contiguous blocks of tileShape, without concern for strides, overlap, or padding. Thus, this is usually useful only for unary and elementwise operators.
tensor | The Tensor to tile.
tileShape | The maximum size of each tile.
op | The Operator that will be consuming this TiledTensor.
copyData | Whether to copy data from the source tensor into the tiles.
Definition at line 199 of file tensor_utils.cpp.
TiledTensor smaug::generateTiledTensorWithStrideAndPadding(Tensor* tensor,
                                                           const TensorShape& tileShape,
                                                           Operator* op,
                                                           int fieldRows,
                                                           int fieldCols,
                                                           int rowStride,
                                                           int colStride,
                                                           PaddingType paddingType,
                                                           bool copyData = false)
Generates a TiledTensor from a source Tensor with the specified tile shape.
Depending on the operator that needs this TiledTensor, tiles may need to overlap each other (e.g. for a convolutional filter window).
tensor | The Tensor to tile.
tileShape | The maximum size of each tile.
op | The Operator that will be consuming this TiledTensor.
fieldRows | Number of rows of a filter applied, if any.
fieldCols | Number of columns of a filter applied, if any.
rowStride | The row stride of a filter applied, if any.
colStride | The column stride of a filter applied, if any.
paddingType | The type of additional zero-padding applied on the Tensor by the Operator, if any.
copyData | Whether to copy data from the source tensor into the tiles.
Definition at line 233 of file tensor_utils.cpp.
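The overlap mentioned above follows from requiring every filter window to fit entirely inside one tile. The sketch below handles only the row dimension under valid (no) padding: it packs as many output rows as fit into each tile and derives the input rows those windows touch. It is one possible scheme, not SMAUG's algorithm, which also covers columns, channels, and paddingType; RowRangeSketch and planRowTilesSketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct RowRangeSketch {
    int start;  // first input row covered by the tile
    int rows;   // number of input rows in the tile
};

// Splits inputRows rows into tiles of at most maxTileRows rows so that each
// filter window (fieldRows tall, advancing rowStride rows at a time) lies
// entirely inside one tile. Assumes valid (no) padding.
std::vector<RowRangeSketch> planRowTilesSketch(int inputRows, int maxTileRows,
                                               int fieldRows, int rowStride) {
    assert(fieldRows > 0 && rowStride > 0);
    assert(fieldRows <= maxTileRows && fieldRows <= inputRows);
    const int totalOutRows = (inputRows - fieldRows) / rowStride + 1;
    const int outPerTile = (maxTileRows - fieldRows) / rowStride + 1;
    std::vector<RowRangeSketch> tiles;
    for (int out = 0; out < totalOutRows; out += outPerTile) {
        const int outHere = std::min(outPerTile, totalOutRows - out);
        const int start = out * rowStride;                     // top of first window
        const int rows = (outHere - 1) * rowStride + fieldRows;  // rows the windows touch
        tiles.push_back({start, rows});
    }
    return tiles;
}
```

With 8 input rows, 4-row tiles, a 3-row filter, and stride 1, the tiles cover rows [0,4), [2,6), and [4,8), so neighbors overlap by fieldRows - rowStride = 2 rows.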
std::string smaug::getTraceName(int accelIdx)
Return the name of the dynamic trace for this accelerator.
accelIdx | The ID of this accelerator.
Definition at line 6 of file common.cpp.
template<typename Kernel, typename... Args>
void smaug::invokeKernel(int accelIdx,
                         unsigned reqCode,
                         const Kernel& kernel,
                         Args&&... args)
The generic blocking interface for all accelerator kernel functions.
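All accelerated kernels should be called via this interface. What happens depends on how the program is being run: in native and LLVM-Tracer modes, the kernel function is called directly (under LLVM-Tracer, its execution goes to the dynamic trace selected by accelIdx); in gem5-Aladdin mode, the accelerator identified by reqCode is invoked instead.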
This is a blocking call: in gem5-Aladdin mode, the thread will wait until the accelerator finishes. For a non-blocking call, use invokeKernelNoBlock.
accelIdx | Sets the suffix of the dynamic trace to XXX_acc[accelIdx]. Used if you want to generate multiple independent traces to simulate multiple accelerators.
reqCode | The ID of the accelerator to invoke.
kernel | The kernel function to invoke in native/LLVM-Tracer mode.
args | The arguments to the kernel function.
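In native mode, the blocking contract amounts to calling the kernel with the forwarded arguments and returning when it returns. The sketch below shows only that path, with perfect forwarding of the variadic arguments; invokeKernelNativeSketch and scaleKernel are hypothetical, and the tracing and gem5-Aladdin paths (where accelIdx and reqCode matter) are not modeled.

```cpp
#include <cassert>
#include <utility>

// Hypothetical native-mode stand-in for invokeKernel: there is no
// accelerator to start, so the kernel is simply called on the current thread
// and the function returns only after the kernel has finished.
template <typename Kernel, typename... Args>
void invokeKernelNativeSketch(int accelIdx, unsigned reqCode,
                              const Kernel& kernel, Args&&... args) {
    (void)accelIdx;  // would select the trace suffix under LLVM-Tracer
    (void)reqCode;   // would identify the accelerator under gem5-Aladdin
    kernel(std::forward<Args>(args)...);
}

// Example kernel: scales an array in place.
inline void scaleKernel(float* data, int size, float factor) {
    assert(size >= 0);
    for (int i = 0; i < size; ++i) data[i] *= factor;
}
```

For instance, invokeKernelNativeSketch(0, 0x100u, scaleKernel, data, 3, 2.0f) doubles the first three elements of data before returning.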
template<typename Kernel, typename... Args>
void smaug::invokeKernel(unsigned reqCode,
                         const Kernel& kernel,
                         Args&&... args)
A generic interface for all accelerator kernel functions.
template<typename Kernel, typename... Args>
std::unique_ptr<volatile int> smaug::invokeKernelNoBlock(int accelIdx,
                                                         unsigned reqCode,
                                                         const Kernel& kernel,
                                                         Args&&... args)
A generic non-blocking interface to accelerated kernel functions.
The only difference between this and invokeKernel is that in gem5-Aladdin mode, the thread will start Aladdin and then return immediately. The calling thread is responsible for checking the status of the accelerator and taking action appropriately.
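The non-blocking contract (start the work, return immediately, let the caller check for completion) has a close native analogue in std::async. The sketch below swaps in std::future for SMAUG's std::unique_ptr<volatile int> status flag, so it illustrates the calling pattern rather than the real mechanism; it assumes the kernel returns void, and both function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>

// Native analogue of a non-blocking kernel launch, using std::async instead
// of SMAUG's status flag: the kernel runs on another thread and the caller
// decides when to check for (or wait on) completion.
template <typename Kernel, typename... Args>
std::future<void> launchKernelAsyncSketch(const Kernel& kernel, Args&&... args) {
    return std::async(std::launch::async, kernel, std::forward<Args>(args)...);
}

// Non-blocking completion check, analogous to polling the status flag.
inline bool kernelFinished(const std::future<void>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

A caller can poll kernelFinished(f) between other work, or call f.wait() when it must block. Because std::async copies its arguments, passing pointers is fine, but the pointed-to buffers must outlive the kernel.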
void smaug::mapArrayToAccel(unsigned reqCode,
                            const char* arrayName,
                            void* baseAddr,
                            size_t size)
Maps an array of data to the accelerator.
This enables the accelerator to access host memory via DMA or caching memory accesses.
reqCode | The ID of the accelerator.
arrayName | The name of the array as it appears in the top-level accelerator function signature.
baseAddr | The base address of the array (e.g. &array[0]).
size | The size of the array in bytes.
Definition at line 12 of file common.cpp.
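To illustrate what a mapping consists of, the mock below records, per accelerator ID, the kernel-parameter name together with a host address range. It is not the gem5-Aladdin mechanism: MappedArray, AccelArrayMap, and mapArrayToAccelMock are hypothetical, and reading size as a byte count is an assumption (so a float16 buffer is passed as its element count times sizeof(uint16_t)).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side record of one mapping: the kernel parameter named
// arrayName on accelerator reqCode refers to [baseAddr, baseAddr + size).
struct MappedArray {
    void* baseAddr;
    std::size_t sizeInBytes;
};
using AccelArrayMap = std::map<std::pair<unsigned, std::string>, MappedArray>;

// Mock of the mapping call: just remembers the binding in a registry.
void mapArrayToAccelMock(AccelArrayMap& registry, unsigned reqCode,
                         const char* arrayName, void* baseAddr,
                         std::size_t size) {
    assert(arrayName != nullptr && baseAddr != nullptr);
    registry[{reqCode, arrayName}] = MappedArray{baseAddr, size};
}
```

The array name must match the parameter name in the accelerator's top-level function, which is why the mock keys on it.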
void smaug::setArrayMemTypeIfSimulating(unsigned reqCode,
                                        const char* arrayName,
                                        MemoryType memType)
Sets what memory access mechanism the accelerator will use when accessing this array.
This lets the user decide at runtime whether to access a host array over DMA, hardware caching, or ACP.
reqCode | The ID of the accelerator.
arrayName | The name of the array as it appears in the accelerator's function signature.
memType | The memory access mechanism.
Definition at line 21 of file common.cpp.
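Judging from its name, setArrayMemTypeIfSimulating only has an effect when running under gem5. The sketch below shows that guard pattern with a stand-in flag for runningInSimulation; MemoryTypeSketch and its enumerators are hypothetical and need not match SMAUG's MemoryType, and issuedRequests simply records what would have been sent to the simulator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enum; SMAUG's own MemoryType enumerators may differ.
enum class MemoryTypeSketch { Dma, Cache, Acp };

// Stand-in for the runningInSimulation global.
bool runningInSimulationSketch = false;

struct MemTypeRequest {
    unsigned reqCode;
    std::string arrayName;
    MemoryTypeSketch memType;
};
std::vector<MemTypeRequest> issuedRequests;  // what would reach the simulator

// Forwards the request only when simulating; in a native run the call is a
// no-op because there is no simulated memory system to configure.
void setArrayMemTypeIfSimulatingSketch(unsigned reqCode, const char* arrayName,
                                       MemoryTypeSketch memType) {
    assert(arrayName != nullptr);
    if (!runningInSimulationSketch) return;
    issuedRequests.push_back({reqCode, arrayName, memType});
}
```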
template<typename T, typename... Args>
std::array<T, sizeof...(Args) + 1> smaug::variadicToArray(T i, Args... elems)