NineToothed

NineToothed is a Triton-based domain-specific language (DSL). By introducing tensor-oriented meta-programming (TOM), it makes writing high-performance GPU kernels easier.

Installation

Install ninetoothed with pip:

pip install ninetoothed

This command installs ninetoothed itself. To use it fully, you also need a deep learning framework that ninetoothed supports. For trial purposes, we recommend installing torch.
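To confirm that both packages are visible to the current Python environment, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name check_installed is hypothetical and not part of ninetoothed.

```python
import importlib.util


def check_installed(packages=("ninetoothed", "torch")):
    """Return a mapping from package name to whether it can be imported.

    Hypothetical helper for illustration; it only looks the packages up
    and does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry usually means the package was installed into a different environment than the one running the script.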

Usage

Thanks to tensor-oriented meta-programming, NineToothed kernels can be written using the arrange-and-apply paradigm: you define the arrangement, the application, and the tensors separately, and then combine them with ninetoothed.make to generate the kernel.
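The separation between arrangement and application can be illustrated without a GPU. The following is a conceptual sketch in plain Python, not the NineToothed API: the functions operate on ordinary lists, make_sketch is a hypothetical stand-in for the combining step, and the blocks are processed sequentially where a real kernel would process them in parallel.

```python
def arrangement(x, y, out, block_size):
    """Split each 1-D list into blocks; block i of every list goes to program i."""

    def tile(values):
        return [values[i:i + block_size] for i in range(0, len(values), block_size)]

    return tile(x), tile(y), tile(out)


def application(x_block, y_block, out_block):
    """Compute one output block from the matching input blocks."""
    for i in range(len(out_block)):
        out_block[i] = x_block[i] + y_block[i]


def make_sketch(arrangement, application):
    """Combine the two pieces into a callable 'kernel' (hypothetical helper)."""

    def kernel(x, y, out, block_size):
        x_blocks, y_blocks, out_blocks = arrangement(x, y, out, block_size)
        # A GPU would run these iterations in parallel; here they run in order.
        for x_block, y_block, out_block in zip(x_blocks, y_blocks, out_blocks):
            application(x_block, y_block, out_block)
        # List slices are copies in Python, so write the blocks back explicitly.
        out[:] = [value for block in out_blocks for value in block]

    return kernel


add = make_sketch(arrangement, application)
out = [0] * 5
add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], out, block_size=2)
print(out)  # [11, 22, 33, 44, 55]
```

The point of the split is that the application only ever sees one block at a time, so how the data is divided among programs can change without touching the computation.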

Matrix Multiplication

Here is the code we need for matrix multiplication:

import ninetoothed
import ninetoothed.language as ntl
from ninetoothed import Tensor, block_size

BLOCK_SIZE_M = block_size()
BLOCK_SIZE_N = block_size()
BLOCK_SIZE_K = block_size()


def arrangement(input, other, output):
    output_arranged = output.tile((BLOCK_SIZE_M, BLOCK_SIZE_N))

    input_arranged = input.tile((BLOCK_SIZE_M, BLOCK_SIZE_K))
    input_arranged = input_arranged.tile((1, -1))
    input_arranged = input_arranged.expand((-1, output_arranged.shape[1]))
    input_arranged.dtype = input_arranged.dtype.squeeze(0)

    other_arranged = other.tile((BLOCK_SIZE_K, BLOCK_SIZE_N))
    other_arranged = other_arranged.tile((-1, 1))
    other_arranged = other_arranged.expand((output_arranged.shape[0], -1))
    other_arranged.dtype = other_arranged.dtype.squeeze(1)

    return input_arranged, other_arranged, output_arranged


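def application(input, other, output):
    # Accumulate in float32, independent of the input data types.
    accumulator = ntl.zeros(output.shape, dtype=ntl.float32)

    # input[k] and other[k] are the k-th pair of blocks along the K dimension.
    for k in range(input.shape[0]):
        accumulator += ntl.dot(input[k], other[k])

    # Assigning to the output parameter writes the result block.
    output = accumulator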


tensors = (Tensor(2), Tensor(2), Tensor(2))

kernel = ninetoothed.make(arrangement, application, tensors)
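As we read the arrangement above, each program owns one BLOCK_SIZE_M by BLOCK_SIZE_N tile of the output and is paired with the matching row of input blocks and column of other blocks, which the application then reduces over k. The plain-Python reference below sketches that computation under this reading. It is not NineToothed code: blocked_matmul and its parameters are hypothetical names, and nested lists stand in for tensors.

```python
def blocked_matmul(a, b, block_m, block_n, block_k):
    """Multiply a (M x K) by b (K x N) one output block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    # Each (i0, j0) pair plays the role of one program owning one output tile.
    for i0 in range(0, m, block_m):
        for j0 in range(0, n, block_n):
            # Walk the row of input blocks and the column of other blocks,
            # as the loop over k does in the application above.
            for k0 in range(0, k, block_k):
                for i in range(i0, min(i0 + block_m, m)):
                    for j in range(j0, min(j0 + block_n, n)):
                        for p in range(k0, min(k0 + block_k, k)):
                            out[i][j] += a[i][p] * b[p][j]
    return out


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b, block_m=2, block_n=2, block_k=2))
# [[58.0, 64.0], [139.0, 154.0]]
```

The min(...) bounds handle edge blocks when a dimension is not a multiple of the block size, as with K = 3 and block_k = 2 here.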

License

This project is distributed under the Apache-2.0 license. See the included LICENSE file for details.
