CUTLASS: Fast Linear Algebra in CUDA C++

in machinelearning •  7 years ago 

Hi! I came across an interesting new library for matrix multiplication on the NVIDIA site, called CUTLASS.

They have done a great job on performance, which should benefit several applications, especially machine learning.

The library is versatile: it is purposely built so that you can implement detailed specializations for your own application.
Tiling and software pipelining are used to increase performance while taking into account the architecture and the memory hierarchy.
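
To give a rough idea of what tiling means, here is a minimal sketch in plain C++ running on the CPU. It is not CUTLASS code and does not use the CUTLASS API: the function name tiled_gemm and the tile size TILE are made up for illustration. It only shows the blocking idea, where the big multiplication is cut into small sub-problems that fit in a fast level of the memory hierarchy. Software pipelining (overlapping the load of the next tile with the computation on the current one) is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed tile edge length, chosen for illustration; tune it per machine.
constexpr std::size_t TILE = 32;

// Hypothetical helper (not part of CUTLASS): computes C = A * B for
// row-major matrices, where A is M x K, B is K x N and C is M x N.
std::vector<float> tiled_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);

    // Outer loops walk over tiles of C (i0, j0) and over the K dimension (k0).
    for (std::size_t i0 = 0; i0 < M; i0 += TILE) {
        for (std::size_t j0 = 0; j0 < N; j0 += TILE) {
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                // Clamp the tile bounds so partial tiles at the edges work.
                const std::size_t i1 = std::min(i0 + TILE, M);
                const std::size_t j1 = std::min(j0 + TILE, N);
                const std::size_t k1 = std::min(k0 + TILE, K);

                // Inner loops accumulate one small tile product into C.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < j1; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

For example, with a 2 x 3 matrix A = {1, 2, 3, 4, 5, 6} and a 3 x 2 matrix B = {7, 8, 9, 10, 11, 12}, calling tiled_gemm(A, B, 2, 2, 3) returns {58, 64, 139, 154}. On a GPU the same idea is applied at several levels (thread block, warp, thread), with each level mapped to a different kind of memory.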


Check out this other post too about Tensor Cores: https://devblogs.nvidia.com/parallelforall/programming-tensor-cores-cuda-9/

It will probably be integrated into a few major libraries in the coming weeks or months.

The links are below:


so nice educational informations
