jcuda.utils
Class KernelLauncher

java.lang.Object
  extended by jcuda.utils.KernelLauncher

public class KernelLauncher
extends java.lang.Object

This is a utility class that simplifies the setup and launching of CUDA kernels using the JCuda Driver API.

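Instances of this class may be created using one of the following static methods:

    compile(String, String, String...) compiles a kernel from a string containing the CUDA source code
    create(String, String, String...) creates a kernel from a .CU CUDA source file, compiling it to PTX if necessary
    create(String, String, boolean, String...) does the same, optionally forcing a rebuild of the PTX file
    load(String, String) loads a kernel from a PTX- or CUBIN file
    load(InputStream, String) loads a kernel from PTX- or CUBIN data that is read from an input stream
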
These instances may then be used to call a kernel function with the call(Object...) method. The actual kernel function arguments which are passed to this method will be set up automatically, and aligned appropriately for their respective size.
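
The alignment mentioned above can be illustrated with a minimal sketch. It assumes that each argument is aligned to its own size, which is the usual rule for primitive kernel parameters; how KernelLauncher performs this internally is not stated here. The class ArgumentLayoutSketch and its methods are hypothetical and not part of JCuda.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of JCuda: lays out kernel arguments in a parameter buffer. */
class ArgumentLayoutSketch {

    /** Rounds offset up to the next multiple of alignment (alignment must be a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    /**
     * Returns the byte offset of each argument, assuming that every argument
     * is aligned to its own size (sizes must be powers of two).
     */
    static int[] offsetsFor(int... sizesInBytes) {
        int[] offsets = new int[sizesInBytes.length];
        int offset = 0;
        for (int i = 0; i < sizesInBytes.length; i++) {
            offset = alignUp(offset, sizesInBytes[i]);
            offsets[i] = offset;
            offset += sizesInBytes[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        // An int (4 bytes), a double (8), a byte (1) and a float (4), in that order.
        int[] offsets = offsetsFor(Integer.BYTES, Double.BYTES, Byte.BYTES, Float.BYTES);
        System.out.println(Arrays.toString(offsets)); // prints [0, 8, 16, 20]
    }
}
```

The double cannot start at offset 4, so four padding bytes are skipped before it. Callers of call(Object...) do not have to perform this computation themselves.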

The execution may be set up much like a kernel invocation with the Runtime API in C. Such a call has the form
    kernel<<<gridDim, blockDim, sharedMemorySize, stream>>>(...);
Similarly, the KernelLauncher allows these parameters to be specified with the setup(dim3, dim3, int, CUstream) method:

    kernelLauncher.setup(gridDim, blockDim, sharedMemorySize, stream).call(...);

When default values for some of the parameters should be used, one of the overloaded versions of the setup method may be called:

    kernelLauncher.setup(gridDim, blockDim).call(...);

The parameters may also be set individually:

    kernelLauncher.setGridSize(gridSizeX, gridSizeY);
    kernelLauncher.setBlockSize(blockSizeX, blockSizeY, blockSizeZ);
    kernelLauncher.call(...);


Method Summary
 void call(java.lang.Object... args)
          Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments.
static KernelLauncher compile(java.lang.String sourceCode, java.lang.String functionName, java.lang.String... nvccArguments)
          Create a new KernelLauncher for the function with the given name, that is defined in the given source code.
static KernelLauncher create(java.lang.String cuFileName, java.lang.String functionName, boolean forceRebuild, java.lang.String... nvccArguments)
          Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
static KernelLauncher create(java.lang.String cuFileName, java.lang.String functionName, java.lang.String... nvccArguments)
          Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
 KernelLauncher forFunction(java.lang.String functionName)
          Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function.
 jcuda.driver.CUmodule getModule()
          Returns the module that was created from the PTX- or CUBIN file, and which contains the function that should be executed.
static KernelLauncher load(java.io.InputStream moduleInputStream, java.lang.String functionName)
          Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN data that is read from the given input stream.
static KernelLauncher load(java.lang.String moduleFileName, java.lang.String functionName)
          Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN (CUDA binary) file with the given name.
 KernelLauncher setBlockSize(int x, int y, int z)
          Set the block size (number of threads per block) for the function call.
static void setCompilerPath(java.lang.String path)
          Set the path to the NVCC compiler.
static void setDeviceNumber(int number)
          Set the number (index) of the device which should be used by the KernelLauncher.
 KernelLauncher setGridSize(int x, int y)
          Set the grid size (number of blocks per grid) for the function call.
 KernelLauncher setGridSize(int x, int y, int z)
          Set the grid size (number of blocks per grid) for the function call.
 KernelLauncher setSharedMemSize(int sharedMemSize)
          Set the size of the shared memory for the function call.
 KernelLauncher setStream(jcuda.driver.CUstream stream)
          Set the stream for the function call.
 KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
          Set the given grid size and block size for this KernelLauncher.
 KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
          Set the given grid size and block size and shared memory size for this KernelLauncher.
 KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
          Set the given grid size and block size, shared memory size and stream for this KernelLauncher.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

setCompilerPath

public static void setCompilerPath(java.lang.String path)
Set the path to the NVCC compiler. For example:
setCompilerPath("C:/CUDA/bin");
By default, this path is empty, assuming that the compiler is in a path that is visible via an environment variable.

Parameters:
path - The path to the NVCC compiler.

setDeviceNumber

public static void setDeviceNumber(int number)
Set the number (index) of the device which should be used by the KernelLauncher.

Parameters:
number - The number of the device to use
Throws:
jcuda.CudaException - If number<0 or number>=deviceCount

compile

public static KernelLauncher compile(java.lang.String sourceCode,
                                     java.lang.String functionName,
                                     java.lang.String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is defined in the given source code.

The source code is stored in a temporary .CU CUDA source file, and a PTX file is compiled from this source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.

The NVCC has to be in a visible directory. E.g. for Windows, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
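
The compiler invocation described above may be pictured as follows. This is only a sketch: the exact command line that KernelLauncher builds is not documented here, so the argument order and the use of the NVCC options -ptx and -o are assumptions. The class NvccCommandSketch and the file names are hypothetical and not part of JCuda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of JCuda: assembles an NVCC command line. */
class NvccCommandSketch {

    /**
     * Builds the command for compiling a .cu file to PTX.
     * compilerPath may be empty, in which case "nvcc" is resolved via PATH.
     */
    static List<String> buildCommand(String compilerPath, String cuFileName,
                                     String ptxFileName, String... nvccArguments) {
        String nvcc = "nvcc";
        if (!compilerPath.isEmpty()) {
            String separator = compilerPath.endsWith("/") ? "" : "/";
            nvcc = compilerPath + separator + "nvcc";
        }
        List<String> command = new ArrayList<>();
        command.add(nvcc);
        // Optional user arguments are placed before the fixed ones (an assumption).
        command.addAll(Arrays.asList(nvccArguments));
        command.add("-ptx");
        command.add(cuFileName);
        command.add("-o");
        command.add(ptxFileName);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand("", "kernel.cu", "kernel.ptx", "-arch=sm_50");
        System.out.println(String.join(" ", command));
        // Passing the list to new ProcessBuilder(command).start() would run
        // the compiler in a separate process.
    }
}
```

With an empty compiler path the executable name is left for the operating system to resolve, which matches the default described for setCompilerPath(String).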

Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:

extern "C"
__global__ void functionName(...)
{
...
}

Parameters:
sourceCode - The source code containing the function
functionName - The name of the function.
nvccArguments - Optional arguments for the NVCC
Returns:
The KernelLauncher for the specified function
Throws:
jcuda.CudaException - If the creation of the CU- or PTX file fails, or the PTX cannot be loaded, or the specified function cannot be obtained.
See Also:
create(String, String, String...), create(String, String, boolean, String...), load(String, String)

create

public static KernelLauncher create(java.lang.String cuFileName,
                                    java.lang.String functionName,
                                    java.lang.String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.

Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:

extern "C"
__global__ void functionName(...)
{
...
}

The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.

The NVCC has to be in a visible directory. E.g. for Windows, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
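
The replacement of the file extension described above could look like the following sketch. The class PtxFileNameSketch is hypothetical and not part of JCuda, and the handling of a file name without any extension (appending ".ptx") is an assumption.

```java
/** Hypothetical helper, not part of JCuda: derives the PTX file name from a .cu file name. */
class PtxFileNameSketch {

    /** Replaces the extension of the given file name with "ptx" (appends it if there is none). */
    static String toPtxFileName(String cuFileName) {
        int lastSeparator = Math.max(cuFileName.lastIndexOf('/'), cuFileName.lastIndexOf('\\'));
        int lastDot = cuFileName.lastIndexOf('.');
        // Only a dot inside the last path component marks an extension.
        if (lastDot > lastSeparator) {
            return cuFileName.substring(0, lastDot + 1) + "ptx";
        }
        return cuFileName + ".ptx";
    }

    public static void main(String[] args) {
        System.out.println(toPtxFileName("kernels/vectorAdd.cu")); // prints kernels/vectorAdd.ptx
    }
}
```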

Parameters:
cuFileName - The name of the source file.
functionName - The name of the function.
nvccArguments - Optional arguments for the NVCC
Returns:
The KernelLauncher for the specified function
Throws:
jcuda.CudaException - If the creation of the PTX file fails, or the PTX cannot be loaded, or the specified function cannot be obtained.
See Also:
compile(String, String, String...), create(String, String, boolean, String...), load(String, String), load(InputStream, String)

create

public static KernelLauncher create(java.lang.String cuFileName,
                                    java.lang.String functionName,
                                    boolean forceRebuild,
                                    java.lang.String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.

Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:

extern "C"
__global__ void functionName(...)
{
...
}

The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.

If the forceRebuild flag is true, the PTX file is recompiled from the given source file and any existing PTX file is overwritten, even if that file is newer than the source file.

The NVCC has to be in a visible directory. E.g. for Windows, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
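
The rebuild rule described above, including the forceRebuild flag, can be summarized in a short sketch. The class RebuildCheckSketch and the file names are hypothetical and not part of JCuda. Comparing the values of File.lastModified() is an assumption about how "older than" is determined.

```java
import java.io.File;

/** Hypothetical helper, not part of JCuda: decides whether a PTX file must be recompiled. */
class RebuildCheckSketch {

    /** Pure decision rule on existence, modification times and the forceRebuild flag. */
    static boolean needsRebuild(boolean ptxExists, long cuLastModified,
                                long ptxLastModified, boolean forceRebuild) {
        if (forceRebuild) {
            return true;
        }
        if (!ptxExists) {
            return true;
        }
        return ptxLastModified < cuLastModified;
    }

    /** Applies the rule to files on disk. */
    static boolean needsRebuild(File cuFile, File ptxFile, boolean forceRebuild) {
        return needsRebuild(ptxFile.exists(), cuFile.lastModified(),
                            ptxFile.lastModified(), forceRebuild);
    }

    public static void main(String[] args) {
        File cuFile = new File("kernel.cu");   // hypothetical file names
        File ptxFile = new File("kernel.ptx");
        System.out.println(needsRebuild(cuFile, ptxFile, false));
    }
}
```

Keeping the decision in a method without file access makes the three cases (forced, missing, outdated) easy to check in isolation.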

Parameters:
cuFileName - The name of the source file.
functionName - The name of the function.
forceRebuild - Whether the PTX file should be recompiled and overwritten if it already exists.
nvccArguments - Optional arguments for the NVCC
Returns:
The KernelLauncher for the specified function
Throws:
jcuda.CudaException - If the creation of the PTX file fails, or the PTX cannot be loaded, or the specified function cannot be obtained.
See Also:
compile(String, String, String...), create(String, String, String...), load(String, String), load(InputStream, String)

load

public static KernelLauncher load(java.lang.String moduleFileName,
                                  java.lang.String functionName)
Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN (CUDA binary) file with the given name.

Parameters:
moduleFileName - The name of the PTX- or CUBIN file
functionName - The name of the function
Returns:
The KernelLauncher for the specified function
Throws:
jcuda.CudaException - If the PTX- or CUBIN file cannot be loaded, or the specified function cannot be obtained.
See Also:
compile(String, String, String...), create(String, String, String...), create(String, String, boolean, String...), load(InputStream, String)

load

public static KernelLauncher load(java.io.InputStream moduleInputStream,
                                  java.lang.String functionName)
Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN data that is read from the given input stream.

Parameters:
moduleInputStream - The stream for the PTX- or CUBIN data
functionName - The name of the function
Returns:
The KernelLauncher for the specified function
Throws:
jcuda.CudaException - If the PTX- or CUBIN data cannot be loaded, or the specified function cannot be obtained.
See Also:
compile(String, String, String...), create(String, String, String...), create(String, String, boolean, String...), load(String, String)

forFunction

public KernelLauncher forFunction(java.lang.String functionName)
Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function. All parameters (grid size, block size, shared memory size and stream) of the returned KernelLauncher will be independent of 'this' one and initially contain the default values.

Parameters:
functionName - The name of the function
Returns:
The KernelLauncher for the specified function
Throws:
jcuda.CudaException - If the specified function cannot be obtained from the module of this KernelLauncher.

getModule

public jcuda.driver.CUmodule getModule()
Returns the module that was created from the PTX- or CUBIN file, and which contains the function that should be executed. This module may also be used to access symbols and texture references. However, clients should not modify or unload the module.

Returns:
The CUmodule

setGridSize

public KernelLauncher setGridSize(int x,
                                  int y)
Set the grid size (number of blocks per grid) for the function call.

This corresponds to the first parameter in the runtime call:

kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

The default grid size is (1,1,1).

Parameters:
x - The number of blocks per grid in x-direction
y - The number of blocks per grid in y-direction
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int), setup(dim3, dim3, int, CUstream)

setGridSize

public KernelLauncher setGridSize(int x,
                                  int y,
                                  int z)
Set the grid size (number of blocks per grid) for the function call.

This corresponds to the first parameter in the runtime call:

kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

The default grid size is (1,1,1).

Parameters:
x - The number of blocks per grid in x-direction
y - The number of blocks per grid in y-direction
z - The number of blocks per grid in z-direction
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int), setup(dim3, dim3, int, CUstream)
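
Since the grid size is given in blocks, it is commonly derived from the problem size and the block size by a rounded-up division. The following sketch shows this computation. The class GridSizeSketch and the problem sizes are hypothetical and not part of JCuda, and the KernelLauncher calls appear only as comments, so that the sketch runs without JCuda.

```java
/** Hypothetical helper, not part of JCuda: computes how many blocks cover a problem size. */
class GridSizeSketch {

    /** Smallest number of blocks such that blocks * threadsPerBlock >= elementCount. */
    static int blocksFor(int elementCount, int threadsPerBlock) {
        if (elementCount < 0 || threadsPerBlock <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        // The intermediate sum is computed as long to avoid an int overflow.
        return (int) (((long) elementCount + threadsPerBlock - 1) / threadsPerBlock);
    }

    public static void main(String[] args) {
        int width = 1000;   // hypothetical 2D problem size
        int height = 600;
        int blockX = 16;
        int blockY = 16;
        int gridX = blocksFor(width, blockX);
        int gridY = blocksFor(height, blockY);
        System.out.println(gridX + " x " + gridY); // prints 63 x 38
        // With JCuda on the classpath, these values could be passed on:
        //   kernelLauncher.setGridSize(gridX, gridY);
        //   kernelLauncher.setBlockSize(blockX, blockY, 1);
    }
}
```

Because the grid may cover more threads than there are elements, the kernel itself should check its indices against the problem size.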

setBlockSize

public KernelLauncher setBlockSize(int x,
                                   int y,
                                   int z)
Set the block size (number of threads per block) for the function call.

This corresponds to the second parameter in the runtime call:

kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

The default block size is (1,1,1).

Parameters:
x - The number of threads per block in x-direction
y - The number of threads per block in y-direction
z - The number of threads per block in z-direction
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int), setup(dim3, dim3, int, CUstream)

setSharedMemSize

public KernelLauncher setSharedMemSize(int sharedMemSize)
Set the size of the shared memory for the function call.

This corresponds to the third parameter in the runtime call:

kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

The default shared memory size is 0.

Parameters:
sharedMemSize - The size of the shared memory, in bytes
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int), setup(dim3, dim3, int, CUstream)

setStream

public KernelLauncher setStream(jcuda.driver.CUstream stream)
Set the stream for the function call.

This corresponds to the fourth parameter in the runtime call:

kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

The default stream is null (0).

Parameters:
stream - The stream for the function call
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int), setup(dim3, dim3, int, CUstream)

setup

public KernelLauncher setup(jcuda.runtime.dim3 gridSize,
                            jcuda.runtime.dim3 blockSize)
Set the given grid size and block size for this KernelLauncher.

Parameters:
gridSize - The grid size (number of blocks per grid)
blockSize - The block size (number of threads per block)
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3, int), setup(dim3, dim3, int, CUstream)

setup

public KernelLauncher setup(jcuda.runtime.dim3 gridSize,
                            jcuda.runtime.dim3 blockSize,
                            int sharedMemSize)
Set the given grid size and block size and shared memory size for this KernelLauncher.

Parameters:
gridSize - The grid size (number of blocks per grid)
blockSize - The block size (number of threads per block)
sharedMemSize - The size of the shared memory, in bytes
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int, CUstream)

setup

public KernelLauncher setup(jcuda.runtime.dim3 gridSize,
                            jcuda.runtime.dim3 blockSize,
                            int sharedMemSize,
                            jcuda.driver.CUstream stream)
Set the given grid size and block size, shared memory size and stream for this KernelLauncher.

Parameters:
gridSize - The grid size (number of blocks per grid)
blockSize - The block size (number of threads per block)
sharedMemSize - The size of the shared memory, in bytes
stream - The stream for the kernel invocation
Returns:
This instance
See Also:
call(Object...), setup(dim3, dim3), setup(dim3, dim3, int)

call

public void call(java.lang.Object... args)
Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments.

The given arguments must all be either of the type Pointer, or of a primitive type except boolean. Otherwise, a CudaException will be thrown.
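
The type restriction described above can be illustrated with a minimal sketch. It runs without JCuda, so the nested class PointerStandIn stands in for jcuda.Pointer and IllegalArgumentException stands in for CudaException. The class ArgumentCheckSketch is hypothetical. Accepting char and rejecting null are assumptions that are derived only from the wording "a primitive type except boolean".

```java
/** Hypothetical helper, not part of JCuda: checks kernel argument types. */
class ArgumentCheckSketch {

    /** Stand-in for jcuda.Pointer, so that the sketch compiles without JCuda. */
    static class PointerStandIn {
    }

    /** Returns true for the stand-in pointer and for boxed primitives other than Boolean. */
    static boolean isSupported(Object arg) {
        return arg instanceof PointerStandIn
            || arg instanceof Byte
            || arg instanceof Short
            || arg instanceof Integer
            || arg instanceof Long
            || arg instanceof Float
            || arg instanceof Double
            || arg instanceof Character;
    }

    /** Throws if any argument has an unsupported type. */
    static void checkArguments(Object... args) {
        for (int i = 0; i < args.length; i++) {
            if (!isSupported(args[i])) {
                String type = (args[i] == null) ? "null" : args[i].getClass().getName();
                throw new IllegalArgumentException(
                    "Argument " + i + " has unsupported type " + type);
            }
        }
    }

    public static void main(String[] args) {
        // Primitive values are boxed when they are passed as Object... arguments.
        checkArguments(new PointerStandIn(), 1024, 0.5f);
        try {
            checkArguments(true); // a boolean argument is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A flag that would naturally be a boolean may be passed as an int (0 or 1) instead, provided that the kernel declares the parameter accordingly.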

Parameters:
args - The arguments for the function call
Throws:
jcuda.CudaException - If an argument with an invalid type was given, or one of the internal functions for setting up and executing the kernel failed.