jcuda.runtime
Class JCuda

java.lang.Object
  extended by jcuda.runtime.JCuda

public class JCuda
extends java.lang.Object

Java bindings for the NVIDIA CUDA runtime API.

Most comments are taken from the CUDA reference manual.


Field Summary
static int cudaDeviceBlockingSync
          Device flag - Use blocking synchronization
static int cudaDeviceLmemResizeToMax
          Device flag - Keep local memory allocation after launch
static int cudaDeviceMapHost
          Device flag - Support mapped pinned allocations
static int cudaDeviceMask
          Device flags mask
static int cudaDeviceScheduleAuto
          Device flag - Automatic scheduling
static int cudaDeviceScheduleSpin
          Device flag - Spin default scheduling
static int cudaDeviceScheduleYield
          Device flag - Yield default scheduling
static int cudaEventBlockingSync
          Event uses blocking synchronization
static int cudaEventDefault
          Default event flag
static int cudaHostAllocDefault
          Default page-locked allocation flag
static int cudaHostAllocMapped
          Map allocation into device space
static int cudaHostAllocPortable
          Pinned memory accessible by all CUDA contexts
static int cudaHostAllocWriteCombined
          Write-combined memory
 
Method Summary
static int cudaBindTexture(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long size)
          Binds size bytes of the memory area pointed to by devPtr to texture reference texref.
static int cudaBindTexture2D(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long width, long height, long pitch)
          Binds the 2D memory area pointed to by devPtr to the texture reference texref.
static int cudaBindTextureToArray(textureReference texref, cudaArray array, cudaChannelFormatDesc desc)
          Binds the CUDA array array to texture reference texref.
static int cudaChooseDevice(int[] device, cudaDeviceProp prop)
          Select compute-device which best matches criteria.
static int cudaConfigureCall(dim3 gridDim, dim3 blockDim, long sharedMem, cudaStream_t stream)
          Configure a device-launch.
static cudaChannelFormatDesc cudaCreateChannelDesc(int x, int y, int z, int w, int cudaChannelFormatKind_f)
          Returns a channel descriptor.
static int cudaDriverGetVersion(int[] driverVersion)
          Returns in driverVersion the version number of the installed CUDA driver.
static int cudaEventCreate(cudaEvent_t event)
          Creates an event-object.
static int cudaEventCreateWithFlags(cudaEvent_t event, int flags)
          Creates an event object with the specified flags.
static int cudaEventDestroy(cudaEvent_t event)
          Destroys an event-object.
static int cudaEventElapsedTime(float[] ms, cudaEvent_t start, cudaEvent_t end)
          Computes the elapsed time between events.
static int cudaEventQuery(cudaEvent_t event)
          Query if an event has been recorded.
static int cudaEventRecord(cudaEvent_t event, cudaStream_t stream)
          Records an event.
static int cudaEventSynchronize(cudaEvent_t event)
          Wait for an event to be recorded.
static int cudaFree(Pointer devPtr)
          Frees memory on the GPU.
static int cudaFreeArray(cudaArray array)
          Frees an array on the GPU.
static int cudaFreeHost(Pointer ptr)
          Frees page-locked memory.
static int cudaFuncGetAttributes(cudaFuncAttributes attr, java.lang.String func)
          This function obtains the attributes of a function specified via func.
static int cudaGetChannelDesc(cudaChannelFormatDesc desc, cudaArray array)
          Low-level texture API.
static int cudaGetDevice(int[] device)
          Returns which device is currently being used.
static int cudaGetDeviceCount(int[] count)
          Returns the number of compute-capable devices.
static int cudaGetDeviceProperties(cudaDeviceProp prop, int device)
          Returns information on the compute-device.
static java.lang.String cudaGetErrorString(int error)
          Returns the message string from an error.
static int cudaGetLastError()
          Returns the last error from a run-time call.
static int cudaGetSymbolAddress(Pointer devPtr, java.lang.String symbol)
          Finds the address associated with a CUDA symbol.
static int cudaGetSymbolSize(long[] size, java.lang.String symbol)
          Finds the size of the object associated with a CUDA symbol.
static int cudaGetTextureAlignmentOffset(long[] offset, textureReference texref)
          Returns in *offset the offset that was returned when texture reference texref was bound.
static int cudaGetTextureReference(textureReference texref, java.lang.String symbol)
          Returns in *texref the structure associated with the texture reference defined by symbol symbol.
static int cudaGLMapBufferObject(Pointer devPtr, int bufObj)
          Deprecated. As of CUDA 3.0
static int cudaGLMapBufferObjectAsync(Pointer devPtr, int bufObj, cudaStream_t stream)
          Deprecated. As of CUDA 3.0
static int cudaGLRegisterBufferObject(int bufObj)
          Deprecated. As of CUDA 3.0
static int cudaGLSetBufferObjectMapFlags(int bufObj, int flags)
          Deprecated. As of CUDA 3.0
static int cudaGLSetGLDevice(int device)
          Sets the CUDA device for use with GL Interoperability.
static int cudaGLUnmapBufferObject(int bufObj)
          Deprecated. As of CUDA 3.0
static int cudaGLUnmapBufferObjectAsync(int bufObj, cudaStream_t stream)
          Deprecated. As of CUDA 3.0
static int cudaGLUnregisterBufferObject(int bufObj)
          Deprecated. As of CUDA 3.0
static int cudaGraphicsGLRegisterBuffer(cudaGraphicsResource resource, int buffer, int Flags)
          Registers the buffer object specified by buffer for access by CUDA.
static int cudaGraphicsGLRegisterImage(cudaGraphicsResource resource, int image, int target, int Flags)
          Registers the texture or renderbuffer object specified by image for access by CUDA. target must match the type of the object.
static int cudaGraphicsMapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream)
          Maps the count graphics resources in resources for access by CUDA.
static int cudaGraphicsResourceGetMappedPointer(Pointer devPtr, long[] size, cudaGraphicsResource resource)
          Returns in *devPtr a pointer through which the mapped graphics resource resource may be accessed.
static int cudaGraphicsResourceSetMapFlags(cudaGraphicsResource resource, int flags)
          Set flags for mapping the graphics resource resource.
static int cudaGraphicsSubResourceGetMappedArray(cudaArray arrayPtr, cudaGraphicsResource resource, int arrayIndex, int mipLevel)
          Returns in *arrayPtr an array through which the subresource of the mapped graphics resource resource which corresponds to array index arrayIndex and mipmap level mipLevel may be accessed.
static int cudaGraphicsUnmapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream)
          Unmaps the count graphics resources in resources.
static int cudaGraphicsUnregisterResource(cudaGraphicsResource resource)
          Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.
static int cudaHostAlloc(Pointer ptr, long size, int flags)
          Allocates size bytes of host memory that is page-locked and accessible to the device.
static int cudaHostGetDevicePointer(Pointer pDevice, Pointer pHost, int flags)
          Passes back the device pointer corresponding to the mapped, pinned host buffer allocated by cudaHostAlloc().
static int cudaLaunch(java.lang.String symbol)
          Launches a device function.
static int cudaMalloc(Pointer devPtr, long size)
          Allocate memory on the GPU.
static int cudaMalloc3D(cudaPitchedPtr pitchDevPtr, cudaExtent extent)
          Allocates logical 1D, 2D, or 3D memory objects on the GPU.
static int cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent)
          Allocate an array on the GPU.
static int cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height)
          Allocate an array on the GPU.
static int cudaMallocHost(Pointer ptr, long size)
          Allocates page-locked memory on the host.
static int cudaMallocPitch(Pointer devPtr, long[] pitch, long width, long height)
          Allocates memory on the GPU.
static int cudaMemcpy(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind)
          Copies data between GPU and host.
static int cudaMemcpy2D(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpy2DArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long width, long height, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpy2DAsync(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpy2DFromArray(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpy2DFromArrayAsync(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpy2DToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpy2DToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpy3D(cudaMemcpy3DParms p)
          Copies data between 3D objects.
static int cudaMemcpy3DAsync(cudaMemcpy3DParms p, cudaStream_t stream)
           
static int cudaMemcpyArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long count, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpyAsync(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpyFromArray(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpyFromArrayAsync(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpyFromSymbol(Pointer dst, java.lang.String symbol, long count, long offset, int cudaMemcpyKind_kind)
          Copies data from GPU to host memory.
static int cudaMemcpyFromSymbolAsync(Pointer dst, java.lang.String symbol, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpyToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind)
          Copies data between host and device.
static int cudaMemcpyToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemcpyToSymbol(java.lang.String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind)
          Copies data from host memory to GPU.
static int cudaMemcpyToSymbolAsync(java.lang.String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream)
           
static int cudaMemset(Pointer mem, int c, long count)
          Initializes or sets GPU memory to a value.
static int cudaMemset2D(Pointer mem, long pitch, int c, long width, long height)
          Initializes or sets GPU memory to a value.
static int cudaMemset3D(cudaPitchedPtr pitchDevPtr, int value, cudaExtent extent)
          Initializes or sets GPU memory to a value.
static int cudaRuntimeGetVersion(int[] runtimeVersion)
          Returns in runtimeVersion the version number of the installed CUDA Runtime.
static int cudaSetDevice(int device)
          Sets device to be used for GPU executions.
static int cudaSetDeviceFlags(int flags)
          Records flags as the flags to use when the active host thread executes device code.
static int cudaSetupArgument(Pointer arg, long size, long offset)
          Configure a device-launch.
static int cudaSetValidDevices(int[] device_arr, int len)
          Sets a list of devices for CUDA execution in priority order using device_arr.
static int cudaStreamCreate(cudaStream_t stream)
          Create an async stream.
static int cudaStreamDestroy(cudaStream_t stream)
          Destroys and cleans-up a stream object.
static int cudaStreamQuery(cudaStream_t stream)
          Queries a stream for completion-status.
static int cudaStreamSynchronize(cudaStream_t stream)
          Waits for stream tasks to complete.
static int cudaThreadExit()
          Exit and clean-up from CUDA launches.
static int cudaThreadSynchronize()
          Wait for compute-device to finish.
static int cudaUnbindTexture(textureReference texref)
          Unbinds the texture bound to texture reference texref.
static void initialize()
          Initializes the native library.
static void setEmulation(boolean emulation)
          Deprecated. The emulation mode has been deprecated in CUDA 3.0. This function no longer has any effect, and will be removed in the next release.
static void setExceptionsEnabled(boolean enabled)
          Enables or disables exceptions.
static void setLogLevel(LogLevel logLevel)
          Set the specified log level for the JCuda runtime library.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cudaHostAllocDefault

public static final int cudaHostAllocDefault
Default page-locked allocation flag

See Also:
Constant Field Values

cudaHostAllocPortable

public static final int cudaHostAllocPortable
Pinned memory accessible by all CUDA contexts

See Also:
Constant Field Values

cudaHostAllocMapped

public static final int cudaHostAllocMapped
Map allocation into device space

See Also:
Constant Field Values

cudaHostAllocWriteCombined

public static final int cudaHostAllocWriteCombined
Write-combined memory

See Also:
Constant Field Values

cudaEventDefault

public static final int cudaEventDefault
Default event flag

See Also:
Constant Field Values

cudaEventBlockingSync

public static final int cudaEventBlockingSync
Event uses blocking synchronization

See Also:
Constant Field Values

cudaDeviceScheduleAuto

public static final int cudaDeviceScheduleAuto
Device flag - Automatic scheduling

See Also:
Constant Field Values

cudaDeviceScheduleSpin

public static final int cudaDeviceScheduleSpin
Device flag - Spin default scheduling

See Also:
Constant Field Values

cudaDeviceScheduleYield

public static final int cudaDeviceScheduleYield
Device flag - Yield default scheduling

See Also:
Constant Field Values

cudaDeviceBlockingSync

public static final int cudaDeviceBlockingSync
Device flag - Use blocking synchronization

See Also:
Constant Field Values

cudaDeviceMapHost

public static final int cudaDeviceMapHost
Device flag - Support mapped pinned allocations

See Also:
Constant Field Values

cudaDeviceLmemResizeToMax

public static final int cudaDeviceLmemResizeToMax
Device flag - Keep local memory allocation after launch

See Also:
Constant Field Values

cudaDeviceMask

public static final int cudaDeviceMask
Device flags mask

See Also:
Constant Field Values
Method Detail

setEmulation

public static void setEmulation(boolean emulation)
Deprecated. The emulation mode has been deprecated in CUDA 3.0. This function no longer has any effect, and will be removed in the next release.

Set the flag which indicates whether a call to any function should initialize the JCuda runtime API in emulation mode.


setLogLevel

public static void setLogLevel(LogLevel logLevel)
Set the specified log level for the JCuda runtime library.

Currently supported log levels:
LOG_QUIET: Never print anything
LOG_ERROR: Print error messages
LOG_TRACE: Print a trace of all native function calls

Parameters:
logLevel - The log level to use.

setExceptionsEnabled

public static void setExceptionsEnabled(boolean enabled)
Enables or disables exceptions. By default, the methods of this class only return the cudaError error code from the underlying CUDA function. If exceptions are enabled, a CudaException with a detailed error message is thrown whenever a method is about to return a result code other than cudaError.cudaSuccess.

Parameters:
enabled - Whether exceptions are enabled

initialize

public static void initialize()
Initializes the native library. Note that this method does not have to be called explicitly by the user of the library: The library will automatically be initialized with the first function call.


cudaGetDeviceCount

public static int cudaGetDeviceCount(int[] count)
Returns the number of compute-capable devices.

SYNOPSIS
cudaError_t cudaGetDeviceCount( int* count )

DESCRIPTION
Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, cudaGetDeviceCount() returns 1 and device 0 only supports device emulation mode. Since this device will be able to emulate all hardware features, this device will report major and minor compute capability versions of 9999.

Returns:
cudaSuccess
See Also:
cudaGetDevice(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)

cudaSetDevice

public static int cudaSetDevice(int device)
Sets device to be used for GPU executions.

SYNOPSIS
cudaError_t cudaSetDevice( int dev )

DESCRIPTION
Records dev as the device on which the active host thread executes the device code.

Returns:
cudaSuccess, cudaErrorInvalidDevice
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)

cudaSetDeviceFlags

public static int cudaSetDeviceFlags(int flags)
Records flags as the flags to use when the active host thread executes device code. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions, this call returns cudaErrorSetOnActiveProcess.
The two LSBs of the flags parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.
- cudaDeviceScheduleAuto: The default value if the flags parameter is zero. Uses a heuristic based on the number of active CUDA contexts in the process (C) and the number of logical processors in the system (P). If C > P, CUDA yields to other OS threads when waiting for the device; otherwise, CUDA does not yield while waiting for results and actively spins on the processor.
- cudaDeviceScheduleSpin: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
- cudaDeviceScheduleYield: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
- cudaDeviceBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
- cudaDeviceMapHost: This flag must be set in order to allocate pinned host memory that is accessible to the device. If this flag is not set, cudaHostGetDevicePointer() will always return a failure code.
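
The cudaDeviceScheduleAuto rule above can be sketched in plain Java. This is only an illustration of the C > P comparison as described in the text; it is not JCuda code and makes no calls into the library. The names AutoScheduleSketch, WaitPolicy, and resolveAuto are hypothetical.

```java
/** Illustrative sketch of the cudaDeviceScheduleAuto heuristic (not part of JCuda). */
final class AutoScheduleSketch {

    /** Hypothetical names for the two ways a host thread can wait for the device. */
    enum WaitPolicy { SPIN, YIELD }

    /**
     * Resolves the automatic policy as described in the text: yield when the
     * process has more active CUDA contexts (C) than the system has logical
     * processors (P); otherwise spin.
     */
    static WaitPolicy resolveAuto(int activeContexts, int logicalProcessors) {
        return activeContexts > logicalProcessors ? WaitPolicy.YIELD : WaitPolicy.SPIN;
    }

    public static void main(String[] args) {
        int p = Runtime.getRuntime().availableProcessors();
        System.out.println("C=1, P=" + p + ": " + resolveAuto(1, p));
        System.out.println("C=" + (p + 1) + ", P=" + p + ": " + resolveAuto(p + 1, p));
    }
}
```

Note that the boundary case C == P resolves to spinning, because the text only yields when C is strictly greater than P.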

Parameters:
flags - Parameters for device operation
Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorSetOnActiveProcess
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDevice(int), cudaSetValidDevices(int[], int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)

cudaSetValidDevices

public static int cudaSetValidDevices(int[] device_arr,
                                      int len)
Sets a list of devices for CUDA execution in priority order using device_arr. The parameter len specifies the number of elements in the list. CUDA will try devices from the list sequentially until it finds one that works. If this function is not called, or if it is called with a len of 0, then CUDA will go back to its default behavior of trying devices sequentially from a default list containing all of the available CUDA devices in the system. If a specified device ID in the list does not exist, this function will return cudaErrorInvalidDevice. If len is not 0 and device_arr is NULL or if len is greater than the number of devices in the system, then cudaErrorInvalidValue is returned.
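
The argument rules above can be restated as a small plain-Java check. This is a sketch of the documented error conditions only, not the library's implementation: the Result enum stands in for the cudaError codes, the order of the checks is an assumption, device IDs are assumed to run from 0 to deviceCount - 1, and a negative len or a len larger than the Java array is treated as an invalid value (a Java-side assumption, since the C API has no array length).

```java
/** Illustrative restatement of the cudaSetValidDevices argument rules (not part of JCuda). */
final class ValidDevicesSketch {

    /** Hypothetical stand-ins for cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevice. */
    enum Result { SUCCESS, INVALID_VALUE, INVALID_DEVICE }

    static Result check(int[] deviceArr, int len, int deviceCount) {
        // A len of 0 restores the default behavior, so the list is not inspected.
        if (len == 0) {
            return Result.SUCCESS;
        }
        // NULL list with non-zero len, or more entries than devices in the system.
        if (len < 0 || deviceArr == null || len > deviceCount) {
            return Result.INVALID_VALUE;
        }
        // Java-side assumption: len must not exceed the actual array length.
        if (len > deviceArr.length) {
            return Result.INVALID_VALUE;
        }
        // Every listed device ID must exist.
        for (int i = 0; i < len; i++) {
            if (deviceArr[i] < 0 || deviceArr[i] >= deviceCount) {
                return Result.INVALID_DEVICE;
            }
        }
        return Result.SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(check(new int[] {1, 0}, 2, 2));
        System.out.println(check(new int[] {0, 5}, 2, 4));
    }
}
```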

Parameters:
device_arr - List of devices to try
len - Number of devices in specified list
Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevice
See Also:
cudaGetDeviceCount(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDeviceFlags(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)

cudaGetDevice

public static int cudaGetDevice(int[] device)
Returns which device is currently being used.

SYNOPSIS
cudaError_t cudaGetDevice( int* dev )

DESCRIPTION
Returns in *dev the device on which the active host thread executes the device code.

Returns:
cudaSuccess
See Also:
cudaGetDeviceCount(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)

cudaGetDeviceProperties

public static int cudaGetDeviceProperties(cudaDeviceProp prop,
                                          int device)
Returns information on the compute-device.

SYNOPSIS
cudaError_t cudaGetDeviceProperties( struct cudaDeviceProp* prop, int dev )

DESCRIPTION
Returns in *prop the properties of device dev. The cudaDeviceProp structure is defined as:
 struct cudaDeviceProp {
     char name[256];
     size_t totalGlobalMem;
     size_t sharedMemPerBlock;
     int regsPerBlock;
     int warpSize;
     size_t memPitch;
     int maxThreadsPerBlock;
     int maxThreadsDim[3];
     int maxGridSize[3];
     size_t totalConstMem;
     int major;
     int minor;
     int clockRate;
     size_t textureAlignment;
     int deviceOverlap;
     int multiProcessorCount;
 }
 
where:
- name: an ASCII string identifying the device.
- totalGlobalMem: the total amount of global memory available on the device, in bytes.
- sharedMemPerBlock: the maximum amount of shared memory available to a thread block, in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor.
- regsPerBlock: the maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor.
- warpSize: the warp size, in threads.
- memPitch: the maximum pitch, in bytes, allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch().
- maxThreadsPerBlock: the maximum number of threads per block.
- maxThreadsDim[3]: the maximum size of each dimension of a block.
- maxGridSize[3]: the maximum size of each dimension of a grid.
- totalConstMem: the total amount of constant memory available on the device, in bytes.
- major, minor: the major and minor revision numbers defining the device’s compute capability.
- clockRate: the clock frequency, in kilohertz.
- textureAlignment: the alignment requirement; texture base addresses that are aligned to textureAlignment bytes do not need an offset applied to texture fetches.
- deviceOverlap: 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not.
- multiProcessorCount: the number of multiprocessors on the device.

Returns:
cudaSuccess, cudaErrorInvalidDevice
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)

cudaChooseDevice

public static int cudaChooseDevice(int[] device,
                                   cudaDeviceProp prop)
Select compute-device which best matches criteria.

SYNOPSIS
cudaError_t cudaChooseDevice( int* dev, const struct cudaDeviceProp* prop )

DESCRIPTION
Returns in *dev the device whose properties best match *prop.

Returns:
cudaSuccess, cudaErrorInvalidValue
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int)

cudaMalloc3D

public static int cudaMalloc3D(cudaPitchedPtr pitchDevPtr,
                               cudaExtent extent)
Allocates logical 1D, 2D, or 3D memory objects on the GPU.

SYNOPSIS
 struct cudaPitchedPtr {
     void *ptr;
     size_t pitch;
     size_t xsize;
     size_t ysize;
 };
 
 struct cudaExtent {
     size_t width;
     size_t height;
     size_t depth;
 };
 
cudaError_t cudaMalloc3D( struct cudaPitchedPtr* pitchDevPtr, struct cudaExtent extent )

DESCRIPTION
Allocates at least width*height*depth bytes of linear memory on the device and returns a cudaPitchedPtr in pitchDevPtr, in which ptr is a pointer to the allocated memory. The function may pad the allocation to ensure hardware alignment requirements are met. The pitch returned in the pitch field of pitchDevPtr is the width in bytes of the allocation. The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation. For allocations of 2D or 3D objects, it is highly recommended that programmers perform allocations using cudaMalloc3D() or cudaMallocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing memory copies involving 2D or 3D objects (whether linear memory or CUDA arrays).
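
To make the role of the pitch concrete, the following plain-Java sketch shows how row padding and byte offsets in a pitched 3D allocation can be computed. It is not JCuda code. It assumes the conventional layout in which consecutive rows are pitch bytes apart and consecutive 2D slices are pitch * ysize bytes apart. The alignment value of 256 bytes is purely hypothetical; the actual padding is chosen by the CUDA runtime, and the names PitchedOffsetSketch, paddedPitch, and byteOffset are illustrative.

```java
/** Illustrative arithmetic for pitched 3D allocations (not part of JCuda). */
final class PitchedOffsetSketch {

    /** Rounds a row width in bytes up to the next multiple of a (hypothetical) alignment. */
    static long paddedPitch(long widthBytes, long alignment) {
        return ((widthBytes + alignment - 1) / alignment) * alignment;
    }

    /**
     * Byte offset of position (x, y, z) from the base pointer, where x is in
     * bytes, y in rows, and z in slices, assuming rows are pitch bytes apart
     * and slices are pitch * ysize bytes apart.
     */
    static long byteOffset(long pitch, long ysize, long x, long y, long z) {
        long slicePitch = pitch * ysize;
        return z * slicePitch + y * pitch + x;
    }

    public static void main(String[] args) {
        long width = 100, height = 4;                 // requested extent: bytes per row, rows
        long pitch = paddedPitch(width, 256);         // 256 is an assumed alignment
        System.out.println("pitch = " + pitch);
        System.out.println("offset of (10, 2, 1) = " + byteOffset(pitch, height, 10, 2, 1));
    }
}
```

Because the pitch can be larger than the requested width, code that walks such an allocation should step by the returned pitch rather than by the logical row width.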

Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMemcpy3D(jcuda.runtime.cudaMemcpy3DParms), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaMalloc3DArray

public static int cudaMalloc3DArray(cudaArray arrayPtr,
                                    cudaChannelFormatDesc desc,
                                    cudaExtent extent)
Allocate an array on the GPU.

SYNOPSIS
 struct cudaExtent {
     size_t width;
     size_t height;
     size_t depth;
 };
 
cudaError_t cudaMalloc3DArray( struct cudaArray** arrayPtr, const struct cudaChannelFormatDesc* desc, struct cudaExtent extent )

DESCRIPTION
Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *arrayPtr. The cudaChannelFormatDesc is defined as:
 struct cudaChannelFormatDesc {
     int x, y, z, w;
     enum cudaChannelFormatKind f;
 };
 
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.

cudaMalloc3DArray is able to allocate 1D, 2D, or 3D arrays.
A 1D array is allocated if the height and depth extent are both zero. For 1D arrays valid extents are {(1, 8192), 0, 0}.
A 2D array is allocated if only the depth extent is zero. For 2D arrays valid extents are {(1, 65536), (1, 32768), 0}.
A 3D array is allocated if all three extents are non-zero. For 3D arrays valid extents are {(1, 2048), (1, 2048), (1, 2048)}.

Note that because of the differing extent limits, it may be advantageous to use a degenerate array (with unused dimensions set to one) of higher dimensionality. For instance, a degenerate 2D array allows for significantly more linear storage than a 1D array.
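
The extent rules above can be restated as a small plain-Java classifier. This is an illustrative sketch only, not JCuda code: it uses the limits quoted in this description, treats each (1, N) range as inclusive at both ends (an assumption), and the names ArrayExtentSketch, Kind, and classify are hypothetical.

```java
/** Illustrative classifier for cudaMalloc3DArray extents (not part of JCuda). */
final class ArrayExtentSketch {

    enum Kind { ARRAY_1D, ARRAY_2D, ARRAY_3D, INVALID }

    /** True if 1 <= value <= max (ranges assumed inclusive). */
    private static boolean inRange(long value, long max) {
        return value >= 1 && value <= max;
    }

    static Kind classify(long width, long height, long depth) {
        if (height == 0 && depth == 0) {
            // 1D: valid extents {(1, 8192), 0, 0}
            return inRange(width, 8192) ? Kind.ARRAY_1D : Kind.INVALID;
        }
        if (depth == 0) {
            // 2D: valid extents {(1, 65536), (1, 32768), 0}
            return inRange(width, 65536) && inRange(height, 32768)
                    ? Kind.ARRAY_2D : Kind.INVALID;
        }
        // 3D: all three extents non-zero, each within (1, 2048)
        return inRange(width, 2048) && inRange(height, 2048) && inRange(depth, 2048)
                ? Kind.ARRAY_3D : Kind.INVALID;
    }

    public static void main(String[] args) {
        System.out.println(classify(8193, 0, 0)); // too wide for a 1D array
        System.out.println(classify(8193, 1, 0)); // the same width as a degenerate 2D array
    }
}
```

The two calls in main illustrate the note about degenerate arrays: a width that exceeds the 1D limit is still within the 2D limit when the height is set to one.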

Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaMemset3D

public static int cudaMemset3D(cudaPitchedPtr pitchDevPtr,
                               int value,
                               cudaExtent extent)
Initializes or sets GPU memory to a value.

SYNOPSIS
 struct cudaPitchedPtr {
     void *ptr;
     size_t pitch;
     size_t xsize;
     size_t ysize;
 };
 
 struct cudaExtent {
     size_t width;
     size_t height;
     size_t depth;
 };
 
cudaError_t cudaMemset3D( struct cudaPitchedPtr dstPitchPtr, int value, struct cudaExtent extent )

DESCRIPTION
Initializes each element of a 3D array to the specified value value. The object to initialize is defined by dstPitchPtr. The pitch field of dstPitchPtr is the width in memory in bytes of the 3D array pointed to by dstPitchPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows. The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices. Extents with width greater than or equal to the xsize of dstPitchPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of dstPitchPtr will perform faster than when the height is shorter than the ysize. This function performs fastest when the dstPitchPtr has been allocated by cudaMalloc3D().
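
The relationship between pitch, ysize, and the extent can be illustrated with a host-side simulation in plain Java. This sketch fills a byte array the way the description suggests a pitched 3D region is laid out; it is not JCuda code and does not touch device memory. It assumes that slices are pitch * ysize bytes apart, and the names Memset3DSketch and memset3D are hypothetical.

```java
/** Host-side simulation of setting a pitched 3D region (not part of JCuda). */
final class Memset3DSketch {

    /**
     * Sets a region of width bytes by height rows by depth slices to value,
     * inside a buffer whose rows are pitch bytes apart and whose slices are
     * pitch * ysize bytes apart. Padding bytes beyond width are left untouched.
     */
    static void memset3D(byte[] mem, int pitch, int ysize, int value,
                         int width, int height, int depth) {
        for (int z = 0; z < depth; z++) {
            for (int y = 0; y < height; y++) {
                int rowStart = (z * ysize + y) * pitch;
                java.util.Arrays.fill(mem, rowStart, rowStart + width, (byte) value);
            }
        }
    }

    public static void main(String[] args) {
        int pitch = 8, ysize = 2, depth = 2;
        byte[] mem = new byte[pitch * ysize * depth];
        memset3D(mem, pitch, ysize, 0xFF, 5, ysize, depth);
        System.out.println(java.util.Arrays.toString(mem));
    }
}
```

In this simulation each row contributes width set bytes followed by pitch - width untouched padding bytes, which is why an extent narrower than the pitch cannot be handled as one contiguous fill.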

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer
See Also:
cudaMemset(jcuda.Pointer, int, long), cudaMemset2D(jcuda.Pointer, long, int, long, long), cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent)

cudaMemcpy3D

public static int cudaMemcpy3D(cudaMemcpy3DParms p)
Copies data between 3D objects.

SYNOPSIS
 struct cudaExtent {
     size_t width, height, depth;
 };
 
 struct cudaPos {
     size_t x, y, z;
 };
 
 struct cudaMemcpy3DParms {
     struct cudaArray *srcArray;
     struct cudaPos srcPos;
     struct cudaPitchedPtr srcPtr;
     struct cudaArray *dstArray;
     struct cudaPos dstPos;
     struct cudaPitchedPtr dstPtr;
     struct cudaExtent extent;
     enum cudaMemcpyKind kind;
 };
 
cudaError_t cudaMemcpy3D( const struct cudaMemcpy3DParms *p )
cudaError_t cudaMemcpy3DAsync( const struct cudaMemcpy3DParms *p, cudaStream_t stream )

DESCRIPTION
cudaMemcpy3D() copies data between two 3D objects. The source and destination objects may be in host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:
cudaMemcpy3DParms myParms = {0};
The struct passed to cudaMemcpy3D() must specify exactly one of srcArray or srcPtr and exactly one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3D() to return an error.

The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object’s elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.

The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array’s elements. If no CUDA array is participating in the copy, the extent is defined in elements of unsigned char.

The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.

If the source and destination are both arrays, cudaMemcpy3D() will return an error if they do not have the same element size. The source and destination objects may not overlap. If overlapping source and destination objects are specified, undefined behavior will result.

cudaMemcpy3D() returns an error if the pitch of srcPtr or dstPtr is greater than the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.

cudaMemcpy3DAsync() is an asynchronous copy operation and can optionally be associated to a stream by passing a non-zero stream argument. If either the source or destination is a host object, it must be allocated in page-locked memory returned from cudaMallocHost(). cudaMemcpy3DAsync() returns an error if a pointer to memory not allocated with cudaMallocHost() is passed as input.
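
The rule that exactly one source and exactly one destination must be set can be expressed as a small check. The following is a minimal sketch in plain Java that does not call JCuda; the class Copy3DParamsCheck and its boolean arguments are hypothetical and only model whether each field of cudaMemcpy3DParms has been set.

```java
// Hypothetical helper (not part of JCuda): models the rule that a 3D copy
// needs exactly one source and exactly one destination.
final class Copy3DParamsCheck {

    /**
     * Each argument states whether the corresponding field of the
     * parameter struct is non-zero.
     */
    static boolean hasValidEndpoints(boolean srcArraySet, boolean srcPtrSet,
                                     boolean dstArraySet, boolean dstPtrSet) {
        // XOR is true only when exactly one of the two alternatives is set.
        boolean oneSource = srcArraySet ^ srcPtrSet;
        boolean oneDestination = dstArraySet ^ dstPtrSet;
        return oneSource && oneDestination;
    }

    public static void main(String[] args) {
        // Array source, pointer destination: accepted.
        System.out.println(hasValidEndpoints(true, false, false, true));  // true
        // Two sources at once: rejected.
        System.out.println(hasValidEndpoints(true, true, false, true));   // false
    }
}
```

A check of this kind could run before the call so that a misconfigured struct is caught on the Java side instead of surfacing as an error code.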

Returns:
cudaSuccess
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpy3DAsync

public static int cudaMemcpy3DAsync(cudaMemcpy3DParms p,
                                    cudaStream_t stream)
See Also:
cudaMemcpy3D(jcuda.runtime.cudaMemcpy3DParms)

cudaHostAlloc

public static int cudaHostAlloc(Pointer ptr,
                                long size,
                                int flags)
Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

The flags parameter enables different options to be specified that affect the allocation, as follows.
- cudaHostAllocDefault: This flag’s value is defined to be 0 and causes cudaHostAlloc() to emulate cudaMallocHost().
- cudaHostAllocPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
- cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
- cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.

All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
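
Because the flags are orthogonal, they can be combined with a bitwise OR. The sketch below is plain Java and does not call JCuda. The class HostAllocFlags is hypothetical, and the numeric values other than 0 for the default flag are assumptions mirroring the CUDA headers; real code should use the constants declared on JCuda.

```java
// Hypothetical stand-in for the JCuda flag constants. Only DEFAULT = 0 is
// stated in the text above; the other values are assumptions.
final class HostAllocFlags {
    static final int DEFAULT = 0;         // cudaHostAllocDefault
    static final int PORTABLE = 1;        // assumed value of cudaHostAllocPortable
    static final int MAPPED = 2;          // assumed value of cudaHostAllocMapped
    static final int WRITE_COMBINED = 4;  // assumed value of cudaHostAllocWriteCombined

    /** Combines any number of independent flags with bitwise OR. */
    static int combine(int... flags) {
        int result = DEFAULT;
        for (int flag : flags) {
            result |= flag;
        }
        return result;
    }

    /** Reports whether a non-zero flag bit is present in the combined value. */
    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        int flags = combine(PORTABLE, MAPPED);
        System.out.println(has(flags, MAPPED));          // true
        System.out.println(has(flags, WRITE_COMBINED));  // false
    }
}
```

With the bindings documented here, the combined value would be passed as the third argument, for example cudaHostAlloc(ptr, size, cudaHostAllocPortable | cudaHostAllocMapped).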

cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.

The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.

Memory allocated by this function must be freed with cudaFreeHost().

Parameters:
ptr - Pointer to the allocated page-locked host memory
size - Requested allocation size in bytes
flags - Requested properties of allocated memory
Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaSetDeviceFlags(int), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaHostGetDevicePointer

public static int cudaHostGetDevicePointer(Pointer pDevice,
                                           Pointer pHost,
                                           int flags)
Passes back the device pointer corresponding to the mapped, pinned host buffer allocated by cudaHostAlloc(). cudaHostGetDevicePointer() will fail if the cudaDeviceMapHost flag was not specified before deferred context creation occurred, or if called on a device that does not support mapped, pinned memory. The flags parameter is reserved for future releases. For now, it must be set to 0.

Note that this function may also return error codes from previous, asynchronous launches.

Parameters:
pDevice - Returned device pointer for mapped memory
pHost - Requested host pointer mapping
flags - Flags for extensions (must be 0 for now)
Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorMemoryAllocation
See Also:
cudaSetDeviceFlags(int), cudaHostAlloc(jcuda.Pointer, long, int)


cudaMalloc

public static int cudaMalloc(Pointer devPtr,
                             long size)
Allocate memory on the GPU.

SYNOPSIS
cudaError_t cudaMalloc( void** devPtr, size_t count )

DESCRIPTION
Allocates count bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. cudaMalloc() returns cudaErrorMemoryAllocation in case of failure.

Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaMallocHost

public static int cudaMallocHost(Pointer ptr,
                                 long size)
Allocates page-locked memory on the host.

SYNOPSIS
cudaError_t cudaMallocHost( void** hostPtr, size_t size )

DESCRIPTION
Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy*(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of memory with cudaMallocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaFreeHost(jcuda.Pointer)

cudaMallocPitch

public static int cudaMallocPitch(Pointer devPtr,
                                  long[] pitch,
                                  long width,
                                  long height)
Allocates memory on the GPU.

SYNOPSIS
cudaError_t cudaMallocPitch( void** devPtr, size_t* pitch, size_t widthInBytes, size_t height )

DESCRIPTION
Allocates at least widthInBytes*height bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. The pitch returned in *pitch by cudaMallocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as
T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cudaMallocPitch(). Due to pitch alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).
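
The address formula above translates into a byte-offset computation on the Java side. This is a minimal sketch in plain Java that does not call JCuda; the class PitchedAddressing, the method name, and the pitch of 512 bytes are hypothetical, and elementSize plays the role of sizeof(T).

```java
// Hypothetical helper (not part of JCuda): byte offset of an element in a
// pitched 2D allocation, following
//   T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
final class PitchedAddressing {

    /**
     * @param row          zero-based row index
     * @param column       zero-based column index, in elements
     * @param pitchInBytes row stride in bytes, as returned in pitch
     * @param elementSize  size of one element in bytes, i.e. sizeof(T)
     * @return offset in bytes from the base address of the allocation
     */
    static long elementByteOffset(long row, long column, long pitchInBytes, int elementSize) {
        // Rows advance by the pitch; columns advance by the element size.
        return row * pitchInBytes + column * elementSize;
    }

    public static void main(String[] args) {
        // Assumed example: 4-byte elements and a pitch of 512 bytes.
        System.out.println(elementByteOffset(2, 3, 512, 4)); // prints 1036
    }
}
```

Note that the pitch is measured in bytes while the column index is measured in elements, which is why only the column term is scaled by the element size.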

Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaMalloc(jcuda.Pointer, long), cudaFree(jcuda.Pointer), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaMallocArray

public static int cudaMallocArray(cudaArray array,
                                  cudaChannelFormatDesc desc,
                                  long width,
                                  long height)
Allocate an array on the GPU.

SYNOPSIS
cudaError_t cudaMallocArray( struct cudaArray** array, const struct cudaChannelFormatDesc* desc, size_t width, size_t height )

DESCRIPTION
Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array. The cudaChannelFormatDesc is defined as:
 struct cudaChannelFormatDesc {
     int x, y, z, w;
     enum cudaChannelFormatKind f;
 };
 
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, cudaChannelFormatKindFloat.

Returns:
cudaSuccess, cudaErrorMemoryAllocation
See Also:
cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaFree

public static int cudaFree(Pointer devPtr)
Frees memory on the GPU.

SYNOPSIS
cudaError_t cudaFree(void* devPtr)

DESCRIPTION
Frees the memory space pointed to by devPtr, which must have been returned by a previous call to cudaMalloc() or cudaMallocPitch(). Otherwise, or if cudaFree(devPtr) has already been called before, an error is returned. If devPtr is 0, no operation is performed. cudaFree() returns cudaErrorInvalidDevicePointer in case of failure.

Returns:
cudaSuccess, cudaErrorInvalidDevicePointer, cudaErrorInitializationError
See Also:
cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaFreeHost

public static int cudaFreeHost(Pointer ptr)
Frees page-locked memory.

SYNOPSIS
cudaError_t cudaFreeHost( void* hostPtr )

DESCRIPTION
Frees the memory space pointed to by hostPtr, which must have been returned by a previous call to cudaMallocHost().

Returns:
cudaSuccess, cudaErrorInitializationError
See Also:
cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long)

cudaFreeArray

public static int cudaFreeArray(cudaArray array)
Frees an array on the GPU.

SYNOPSIS
cudaError_t cudaFreeArray( struct cudaArray* array )

DESCRIPTION
Frees the CUDA array array. If array is 0, no operation is performed.

Returns:
cudaSuccess, cudaErrorInitializationError
See Also:
cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)

cudaMemcpy

public static int cudaMemcpy(Pointer dst,
                             Pointer src,
                             long count,
                             int cudaMemcpyKind_kind)
Copies data between GPU and host.

SYNOPSIS
cudaError_t cudaMemcpy( void* dst, const void* src, size_t count, enum cudaMemcpyKind kind )
cudaError_t cudaMemcpyAsync( void* dst, const void* src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream )

DESCRIPTION
Copies count bytes from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. The memory areas may not overlap. Calling cudaMemcpy() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpyAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpyToArray

public static int cudaMemcpyToArray(cudaArray dst,
                                    long wOffset,
                                    long hOffset,
                                    Pointer src,
                                    long count,
                                    int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpyToArray(struct cudaArray* dstArray, size_t dstX, size_t dstY, const void* src, size_t count, enum cudaMemcpyKind kind)
cudaError_t cudaMemcpyToArrayAsync(struct cudaArray* dstArray, size_t dstX, size_t dstY, const void* src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream)

DESCRIPTION
Copies count bytes from the memory area pointed to by src to the CUDA array dstArray starting at the upper left corner (dstX, dstY), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. cudaMemcpyToArrayAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpyFromArray

public static int cudaMemcpyFromArray(Pointer dst,
                                      cudaArray src,
                                      long wOffset,
                                      long hOffset,
                                      long count,
                                      int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpyFromArray(void* dst, const struct cudaArray* srcArray, size_t srcX, size_t srcY, size_t count, enum cudaMemcpyKind kind)
cudaError_t cudaMemcpyFromArrayAsync(void* dst, const struct cudaArray* srcArray, size_t srcX, size_t srcY, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream)

DESCRIPTION
Copies count bytes from the CUDA array srcArray starting at the upper left corner (srcX, srcY) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. cudaMemcpyFromArrayAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpyArrayToArray

public static int cudaMemcpyArrayToArray(cudaArray dst,
                                         long wOffsetDst,
                                         long hOffsetDst,
                                         cudaArray src,
                                         long wOffsetSrc,
                                         long hOffsetSrc,
                                         long count,
                                         int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpyArrayToArray(struct cudaArray* dstArray, size_t dstX, size_t dstY, const struct cudaArray* srcArray, size_t srcX, size_t srcY, size_t count, enum cudaMemcpyKind kind)

DESCRIPTION
Copies count bytes from the CUDA array srcArray starting at the upper left corner (srcX, srcY) to the CUDA array dstArray starting at the upper left corner (dstX, dstY), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpy2D

public static int cudaMemcpy2D(Pointer dst,
                               long dpitch,
                               Pointer src,
                               long spitch,
                               long width,
                               long height,
                               int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpy2D( void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind )
cudaError_t cudaMemcpy2DAsync( void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream )

DESCRIPTION
Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory in bytes of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas may not overlap. Calling cudaMemcpy2D() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpy2D() returns an error if dpitch or spitch is greater than the maximum allowed. cudaMemcpy2DAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
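
The roles of width, height, dpitch and spitch can be illustrated with a host-only model of the copy. This is a minimal sketch in plain Java that operates on byte arrays and does not call JCuda; the class PitchedCopy and its method are hypothetical, and the check that width does not exceed either pitch is an assumption, not something stated above.

```java
// Hypothetical host-only model of a pitched 2D copy (not part of JCuda).
final class PitchedCopy {

    /**
     * Copies height rows of width bytes each. Row r starts at byte
     * r * spitch in src and at byte r * dpitch in dst, so any padding
     * at the end of a row is skipped.
     */
    static void copy2D(byte[] dst, int dpitch, byte[] src, int spitch, int width, int height) {
        // Assumption: a row cannot be wider than the stride between rows.
        if (width > dpitch || width > spitch) {
            throw new IllegalArgumentException("width must not exceed either pitch");
        }
        for (int row = 0; row < height; row++) {
            System.arraycopy(src, row * spitch, dst, row * dpitch, width);
        }
    }

    public static void main(String[] args) {
        // Source: 2 rows of 3 payload bytes, each padded to a pitch of 4 bytes.
        byte[] src = {1, 2, 3, 0, 4, 5, 6, 0};
        // Destination: tightly packed, so its pitch equals the row width.
        byte[] dst = new byte[6];
        copy2D(dst, 3, src, 4, 3, 2);
        System.out.println(java.util.Arrays.toString(dst)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

The example shows why the two pitches are passed separately: the source and destination may use different amounts of row padding even though the copied payload has the same width.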

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidPitchValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpy2DToArray

public static int cudaMemcpy2DToArray(cudaArray dst,
                                      long wOffset,
                                      long hOffset,
                                      Pointer src,
                                      long spitch,
                                      long width,
                                      long height,
                                      int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpy2DToArray(struct cudaArray* dstArray, size_t dstX, size_t dstY, const void* src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind);
cudaError_t cudaMemcpy2DToArrayAsync(struct cudaArray* dstArray, size_t dstX, size_t dstY, const void* src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream);

DESCRIPTION
Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the CUDA array dstArray starting at the upper left corner (dstX, dstY), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. spitch is the width in memory in bytes of the 2D array pointed to by src, including any padding added to the end of each row. cudaMemcpy2D() returns an error if spitch is greater than the maximum allowed. cudaMemcpy2DToArrayAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpy2DFromArray

public static int cudaMemcpy2DFromArray(Pointer dst,
                                        long dpitch,
                                        cudaArray src,
                                        long wOffset,
                                        long hOffset,
                                        long width,
                                        long height,
                                        int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpy2DFromArray(void* dst, size_t dpitch, const struct cudaArray* srcArray, size_t srcX, size_t srcY, size_t width, size_t height, enum cudaMemcpyKind kind)
cudaError_t cudaMemcpy2DFromArrayAsync(void* dst, size_t dpitch, const struct cudaArray* srcArray, size_t srcX, size_t srcY, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream)

DESCRIPTION
Copies a matrix (height rows of width bytes each) from the CUDA array srcArray starting at the upper left corner (srcX, srcY) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch is the width in memory in bytes of the 2D array pointed to by dst, including any padding added to the end of each row. cudaMemcpy2D() returns an error if dpitch is greater than the maximum allowed. cudaMemcpy2DFromArrayAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpy2DArrayToArray

public static int cudaMemcpy2DArrayToArray(cudaArray dst,
                                           long wOffsetDst,
                                           long hOffsetDst,
                                           cudaArray src,
                                           long wOffsetSrc,
                                           long hOffsetSrc,
                                           long width,
                                           long height,
                                           int cudaMemcpyKind_kind)
Copies data between host and device.

SYNOPSIS
cudaError_t cudaMemcpy2DArrayToArray(struct cudaArray* dstArray, size_t dstX, size_t dstY, const struct cudaArray* srcArray, size_t srcX, size_t srcY, size_t width, size_t height, enum cudaMemcpyKind kind)

DESCRIPTION
Copies a matrix (height rows of width bytes each) from the CUDA array srcArray starting at the upper left corner (srcX, srcY) to the CUDA array dstArray starting at the upper left corner (dstX, dstY), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpyToSymbol

public static int cudaMemcpyToSymbol(java.lang.String symbol,
                                     Pointer src,
                                     long count,
                                     long offset,
                                     int cudaMemcpyKind_kind)
Copies data from host memory to GPU.

SYNOPSIS
template < class T > cudaError_t cudaMemcpyToSymbol( const T& symbol, const void* src, size_t count, size_t offset, enum cudaMemcpyKind kind)

DESCRIPTION
Copies count bytes from the memory area pointed to by src to the memory area located offset bytes from the start of the symbol symbol. The memory areas may not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSymbol, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemcpyFromSymbol

public static int cudaMemcpyFromSymbol(Pointer dst,
                                       java.lang.String symbol,
                                       long count,
                                       long offset,
                                       int cudaMemcpyKind_kind)
Copies data from GPU to host memory.

SYNOPSIS
template < class T > cudaError_t cudaMemcpyFromSymbol( void *dst, const T& symbol, size_t count, size_t offset, enum cudaMemcpyKind kind)

DESCRIPTION
Copies count bytes from the memory area located offset bytes from the start of symbol symbol to the memory area pointed to by dst. The memory areas may not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSymbol, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)

cudaMemcpyAsync

public static int cudaMemcpyAsync(Pointer dst,
                                  Pointer src,
                                  long count,
                                  int cudaMemcpyKind_kind,
                                  cudaStream_t stream)
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)

cudaMemcpyToArrayAsync

public static int cudaMemcpyToArrayAsync(cudaArray dst,
                                         long wOffset,
                                         long hOffset,
                                         Pointer src,
                                         long count,
                                         int cudaMemcpyKind_kind,
                                         cudaStream_t stream)
See Also:
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)

cudaMemcpyFromArrayAsync

public static int cudaMemcpyFromArrayAsync(Pointer dst,
                                           cudaArray src,
                                           long wOffset,
                                           long hOffset,
                                           long count,
                                           int cudaMemcpyKind_kind,
                                           cudaStream_t stream)
See Also:
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)

cudaMemcpy2DAsync

public static int cudaMemcpy2DAsync(Pointer dst,
                                    long dpitch,
                                    Pointer src,
                                    long spitch,
                                    long width,
                                    long height,
                                    int cudaMemcpyKind_kind,
                                    cudaStream_t stream)
See Also:
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)

cudaMemcpy2DToArrayAsync

public static int cudaMemcpy2DToArrayAsync(cudaArray dst,
                                           long wOffset,
                                           long hOffset,
                                           Pointer src,
                                           long spitch,
                                           long width,
                                           long height,
                                           int cudaMemcpyKind_kind,
                                           cudaStream_t stream)
See Also:
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)

cudaMemcpy2DFromArrayAsync

public static int cudaMemcpy2DFromArrayAsync(Pointer dst,
                                             long dpitch,
                                             cudaArray src,
                                             long wOffset,
                                             long hOffset,
                                             long width,
                                             long height,
                                             int cudaMemcpyKind_kind,
                                             cudaStream_t stream)
See Also:
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)

cudaMemcpyToSymbolAsync

public static int cudaMemcpyToSymbolAsync(java.lang.String symbol,
                                          Pointer src,
                                          long count,
                                          long offset,
                                          int cudaMemcpyKind_kind,
                                          cudaStream_t stream)
See Also:
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)

cudaMemcpyFromSymbolAsync

public static int cudaMemcpyFromSymbolAsync(Pointer dst,
                                            java.lang.String symbol,
                                            long count,
                                            long offset,
                                            int cudaMemcpyKind_kind,
                                            cudaStream_t stream)
See Also:
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)

cudaMemset

public static int cudaMemset(Pointer mem,
                             int c,
                             long count)
Initializes or sets GPU memory to a value.

SYNOPSIS
cudaError_t cudaMemset( void* devPtr, int value, size_t count )

DESCRIPTION
Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer
See Also:
cudaMemset2D(jcuda.Pointer, long, int, long, long), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent)
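
The byte-wise fill described for cudaMemset can be surprising when the memory holds wider types. The following is a minimal host-side sketch, not a JCuda call: the class MemsetByteFillSketch and its methods are hypothetical names, and the model assumes that only the low 8 bits of the value are used, as with the C memset function.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Host-side sketch only: mimics the byte-wise fill described for cudaMemset.
// No CUDA call is made here; all names are hypothetical.
class MemsetByteFillSketch {

    // Fills the first count bytes of mem with the low 8 bits of value
    // (assumption: the value is narrowed to a byte, as with C memset).
    static void fillBytes(byte[] mem, int value, int count) {
        Arrays.fill(mem, 0, count, (byte) value);
    }

    // Reinterprets the first four bytes as a little-endian 32-bit integer.
    static int firstInt(byte[] mem) {
        return ByteBuffer.wrap(mem).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    public static void main(String[] args) {
        byte[] mem = new byte[8]; // stands in for two 32-bit integers
        fillBytes(mem, 1, mem.length);
        // Every byte is 0x01, so each 32-bit integer reads as 0x01010101, not 1.
        System.out.printf("0x%08X%n", firstInt(mem));
    }
}
```

Under these assumptions, filling memory that holds 32-bit integers yields the intended per-element value only when that value consists of four identical bytes, such as 0 or -1.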

cudaMemset2D

public static int cudaMemset2D(Pointer mem,
                               long pitch,
                               int c,
                               long width,
                               long height)
Initializes or sets GPU memory to a value.

SYNOPSIS
cudaError_t cudaMemset2D( void* dstPtr, size_t pitch, int value, size_t width, size_t height )

DESCRIPTION
Sets a matrix (height rows of width bytes each) pointed to by dstPtr to the specified value value. pitch is the width in memory in bytes of the 2D array pointed to by dstPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer
See Also:
cudaMemset(jcuda.Pointer, int, long), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent)
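
The relationship between width and pitch in cudaMemset2D can be pictured with a flat host array. This is a host-side sketch under stated assumptions, not a JCuda call: Memset2DPitchSketch and fill2D are hypothetical names, and the value is assumed to be narrowed to a byte.

```java
import java.util.Arrays;

// Host-side sketch only: models a pitched 2D allocation as a flat byte array.
// No CUDA call is made here; all names are hypothetical.
class Memset2DPitchSketch {

    // Sets height rows of width bytes each to the low 8 bits of value.
    // Rows start pitch bytes apart, so padding bytes at the end of each row are left untouched.
    static void fill2D(byte[] mem, int pitch, int value, int width, int height) {
        if (width > pitch) {
            throw new IllegalArgumentException("width must not exceed pitch");
        }
        for (int row = 0; row < height; row++) {
            int start = row * pitch;
            Arrays.fill(mem, start, start + width, (byte) value);
        }
    }

    public static void main(String[] args) {
        int pitch = 8, width = 5, height = 3;
        byte[] mem = new byte[pitch * height];
        fill2D(mem, pitch, 0x7F, width, height);
        // Each row shows five filled bytes followed by three padding bytes.
        System.out.println(Arrays.toString(mem));
    }
}
```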

cudaGetChannelDesc

public static int cudaGetChannelDesc(cudaChannelFormatDesc desc,
                                     cudaArray array)
Low-level texture API.

SYNOPSIS
cudaError_t cudaGetChannelDesc(struct cudaChannelFormatDesc* desc, const struct cudaArray* array);

DESCRIPTION
Returns in *desc the channel descriptor of the CUDA array array.

Returns:
cudaSuccess, cudaErrorInvalidValue
See Also:
cudaCreateChannelDesc(int, int, int, int, int), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)

cudaCreateChannelDesc

public static cudaChannelFormatDesc cudaCreateChannelDesc(int x,
                                                          int y,
                                                          int z,
                                                          int w,
                                                          int cudaChannelFormatKind_f)
Returns a channel descriptor.

SYNOPSIS
struct cudaChannelFormatDesc cudaCreateChannelDesc(int x, int y, int z, int w, enum cudaChannelFormatKind f);

DESCRIPTION
Returns a channel descriptor with format f and number of bits of each component x, y, z, and w. The cudaChannelFormatDesc is defined as:
 struct cudaChannelFormatDesc {
     int x, y, z, w;
     enum cudaChannelFormatKind f;
 };
 
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, cudaChannelFormatKindFloat.

Returns:
cudaSuccess
See Also:
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)

cudaGetLastError

public static int cudaGetLastError()
Returns the last error from a run-time call.

SYNOPSIS
cudaError_t cudaGetLastError( void )

DESCRIPTION
Returns the last error that was returned from any of the runtime calls in the same host thread and resets it to cudaSuccess.

Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorLaunchFailure, cudaErrorPriorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture, cudaErrorInvalidTextureBinding, cudaErrorInvalidChannelDescriptor, cudaErrorTextureFetchFailed, cudaErrorTextureNotBound, cudaErrorSynchronizationError, cudaErrorUnknown, cudaErrorInvalidResourceHandle, cudaErrorNotReady. Note that this function may also return error codes from previous asynchronous launches.
See Also:
cudaGetErrorString(int), cudaError
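
The read-and-reset behaviour described for cudaGetLastError can be modelled on the host as a per-thread error slot. This is only an illustrative model, not how the runtime is implemented: LastErrorSketch, record and getLastError are hypothetical names, the code 11 is an arbitrary non-zero value, and the model assumes that a successful call does not overwrite a stored error.

```java
// Host-side model only: a per-thread "last error" slot with read-and-reset semantics.
// No CUDA call is made here; all names are hypothetical.
class LastErrorSketch {
    static final int SUCCESS = 0; // cudaSuccess has the value 0

    private static final ThreadLocal<Integer> lastError =
        ThreadLocal.withInitial(() -> SUCCESS);

    // Records the status of a simulated runtime call for the calling thread.
    // Assumption: a successful status does not clear a previously stored error.
    static int record(int status) {
        if (status != SUCCESS) {
            lastError.set(status);
        }
        return status;
    }

    // Returns the stored error and resets the slot to SUCCESS.
    static int getLastError() {
        int error = lastError.get();
        lastError.set(SUCCESS);
        return error;
    }

    public static void main(String[] args) {
        record(11); // arbitrary non-zero code chosen for illustration
        System.out.println(getLastError()); // the recorded code
        System.out.println(getLastError()); // SUCCESS again after the reset
    }
}
```

The practical consequence is that the error should be read once and stored in a local variable, since a second query in the same thread reports cudaSuccess.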

cudaGetErrorString

public static java.lang.String cudaGetErrorString(int error)
Returns the message string from an error.

SYNOPSIS
const char* cudaGetErrorString(cudaError_t error);

DESCRIPTION
Returns a message string from an error code.

Returns:
The error string
See Also:
cudaGetLastError()

cudaStreamCreate

public static int cudaStreamCreate(cudaStream_t stream)
Create an async stream.

SYNOPSIS
cudaError_t cudaStreamCreate( cudaStream_t* stream )

DESCRIPTION
Creates a stream.

Returns:
cudaSuccess, cudaErrorInvalidValue
See Also:
cudaStreamQuery(jcuda.runtime.cudaStream_t), cudaStreamSynchronize(jcuda.runtime.cudaStream_t), cudaStreamDestroy(jcuda.runtime.cudaStream_t)

cudaStreamDestroy

public static int cudaStreamDestroy(cudaStream_t stream)
Destroys and cleans-up a stream object.

SYNOPSIS
cudaError_t cudaStreamDestroy( cudaStream_t stream )

DESCRIPTION
Destroys a stream object.

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t), cudaStreamSynchronize(jcuda.runtime.cudaStream_t), cudaStreamQuery(jcuda.runtime.cudaStream_t)

cudaStreamSynchronize

public static int cudaStreamSynchronize(cudaStream_t stream)
Waits for stream tasks to complete.

SYNOPSIS
cudaError_t cudaStreamSynchronize( cudaStream_t stream )

DESCRIPTION
Blocks until the device has completed all operations in the stream.

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t), cudaStreamDestroy(jcuda.runtime.cudaStream_t), cudaStreamQuery(jcuda.runtime.cudaStream_t)

cudaStreamQuery

public static int cudaStreamQuery(cudaStream_t stream)
Queries a stream for completion-status.

SYNOPSIS
cudaError_t cudaStreamQuery(cudaStream_t stream)

DESCRIPTION
Returns cudaSuccess if all operations in the stream have completed, or cudaErrorNotReady if not.

Returns:
cudaSuccess, cudaErrorNotReady, cudaErrorInvalidResourceHandle
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t), cudaStreamDestroy(jcuda.runtime.cudaStream_t), cudaStreamSynchronize(jcuda.runtime.cudaStream_t)

cudaEventCreate

public static int cudaEventCreate(cudaEvent_t event)
Creates an event-object.

SYNOPSIS
cudaError_t cudaEventCreate( cudaEvent_t* event )

DESCRIPTION
Creates an event object.

Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidValue, cudaErrorMemoryAllocation
See Also:
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)

cudaEventCreateWithFlags

public static int cudaEventCreateWithFlags(cudaEvent_t event,
                                           int flags)
Creates an event object with the specified flags. Valid flags include: cudaEventDefault, cudaEventBlockingSync

Parameters:
event - Newly created event
flags - Flags for new event
Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidValue, cudaErrorMemoryAllocation
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)

cudaEventRecord

public static int cudaEventRecord(cudaEvent_t event,
                                  cudaStream_t stream)
Records an event.

SYNOPSIS
cudaError_t cudaEventRecord( cudaEvent_t event, cudaStream_t stream )

DESCRIPTION
Records an event. If stream is non-zero, the event is recorded after all preceding operations in the stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since this operation is asynchronous, cudaEventQuery() and/or cudaEventSynchronize() must be used to determine when the event has actually been recorded. If cudaEventRecord() has previously been called and the event has not been recorded yet, this function returns cudaErrorInvalidValue.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidResourceHandle
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)

cudaEventQuery

public static int cudaEventQuery(cudaEvent_t event)
Query if an event has been recorded.

SYNOPSIS
cudaError_t cudaEventQuery( cudaEvent_t event )

DESCRIPTION
Returns cudaSuccess if the event has actually been recorded, or cudaErrorNotReady if not. If cudaEventRecord() has not been called on this event, the function returns cudaErrorInvalidValue.

Returns:
cudaSuccess, cudaErrorNotReady, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)

cudaEventSynchronize

public static int cudaEventSynchronize(cudaEvent_t event)
Wait for an event to be recorded.

SYNOPSIS
cudaError_t cudaEventSynchronize( cudaEvent_t event )

DESCRIPTION
Blocks until the event has actually been recorded. If cudaEventRecord() has not been called on this event, the function returns cudaErrorInvalidValue.

Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)

cudaEventDestroy

public static int cudaEventDestroy(cudaEvent_t event)
Destroys an event-object.

SYNOPSIS
cudaError_t cudaEventDestroy( cudaEvent_t event )

DESCRIPTION
Destroys the event-object.

Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidValue
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)

cudaEventElapsedTime

public static int cudaEventElapsedTime(float[] ms,
                                       cudaEvent_t start,
                                       cudaEvent_t end)
Computes the elapsed time between events.

SYNOPSIS
cudaError_t cudaEventElapsedTime( float* time, cudaEvent_t start, cudaEvent_t end );

DESCRIPTION
Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds). If either event has not been recorded yet, this function returns cudaErrorInvalidValue. If either event has been recorded with a non-zero stream, the result is undefined.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInitializationError, cudaErrorPriorLaunchFailure, cudaErrorInvalidResourceHandle
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)

cudaThreadExit

public static int cudaThreadExit()
Exit and clean-up from CUDA launches.

SYNOPSIS
cudaError_t cudaThreadExit(void)

DESCRIPTION
Explicitly cleans up all runtime-related resources associated with the calling host thread. Any subsequent API call reinitializes the runtime. cudaThreadExit() is implicitly called on host thread exit.

Returns:
cudaSuccess
See Also:
cudaThreadSynchronize()

cudaThreadSynchronize

public static int cudaThreadSynchronize()
Wait for compute-device to finish.

SYNOPSIS
cudaError_t cudaThreadSynchronize(void)

DESCRIPTION
Blocks until the device has completed all preceding requested tasks. cudaThreadSynchronize() returns an error if one of the preceding tasks failed.

Returns:
cudaSuccess
See Also:
cudaThreadExit()

cudaGetSymbolAddress

public static int cudaGetSymbolAddress(Pointer devPtr,
                                       java.lang.String symbol)
Finds the address associated with a CUDA symbol.

SYNOPSIS
template < class T > cudaError_t cudaGetSymbolAddress(void** devPtr, const T& symbol)

DESCRIPTION
Returns in *devPtr the address of symbol symbol on the device. symbol can either be a variable that resides in global memory space, or it can be a character string naming a variable that resides in global memory space. If symbol cannot be found, or if symbol is not declared in global memory space, *devPtr is unchanged and an error is returned. cudaGetSymbolAddress() returns cudaErrorInvalidSymbol in case of failure.

Returns:
cudaSuccess, cudaErrorInvalidSymbol, cudaErrorAddressOfConstant
See Also:
cudaGetSymbolSize(long[], java.lang.String)

cudaGetSymbolSize

public static int cudaGetSymbolSize(long[] size,
                                    java.lang.String symbol)
Finds the size of the object associated with a CUDA symbol.

SYNOPSIS
template < class T > cudaError_t cudaGetSymbolSize(size_t* size, const T& symbol)

DESCRIPTION
Returns in *size the size of symbol symbol. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in global or constant memory space, *size is unchanged and an error is returned. cudaGetSymbolSize() returns cudaErrorInvalidSymbol in case of failure.

Returns:
cudaSuccess, cudaErrorInvalidSymbol
See Also:
cudaGetSymbolAddress(jcuda.Pointer, java.lang.String)

cudaBindTexture

public static int cudaBindTexture(long[] offset,
                                  textureReference texref,
                                  Pointer devPtr,
                                  cudaChannelFormatDesc desc,
                                  long size)
Binds size bytes of the memory area pointed to by devPtr to texture reference texRef.

SYNOPSIS
template < class T, int dim, enum cudaTextureReadMode readMode > static __inline__ __host__ cudaError_t cudaBindTexture(size_t* offset, const struct texture < T, dim, readMode >& texRef, const void* devPtr, const struct cudaChannelFormatDesc& desc, size_t size = UINT_MAX)

DESCRIPTION
Binds size bytes of the memory area pointed to by devPtr to texture reference texRef. desc describes how the memory is interpreted when fetching values from the texture. The offset parameter is an optional byte offset as with the low-level cudaBindTexture() function. Any memory previously bound to texRef is unbound.

The overload
template < class T, int dim, enum cudaTextureReadMode readMode > static __inline__ __host__ cudaError_t cudaBindTexture( size_t* offset, const struct texture < T, dim, readMode >& texRef, const void* devPtr, size_t size = UINT_MAX);
binds size bytes of the memory area pointed to by devPtr to texture reference texRef. The channel descriptor is inherited from the texture reference type. The offset parameter is an optional byte offset as with the low-level cudaBindTexture() function.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

cudaBindTexture2D

public static int cudaBindTexture2D(long[] offset,
                                    textureReference texref,
                                    Pointer devPtr,
                                    cudaChannelFormatDesc desc,
                                    long width,
                                    long height,
                                    long pitch)
Binds the 2D memory area pointed to by devPtr to the texture reference texref. The size of the area is constrained by width in texel units, height in texel units, and pitch in byte units. desc describes how the memory is interpreted when fetching values from the texture. Any memory previously bound to texref is unbound. Since the hardware enforces an alignment requirement on texture base addresses, cudaBindTexture2D() returns in offset a byte offset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so they can be applied to the tex2D() function. If the device memory pointer was returned from cudaMalloc(), the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.

Parameters:
offset - Offset in bytes
texref - Texture reference to bind
devPtr - 2D memory area on device
desc - Channel format
width - Width in texel units
height - Height in texel units
pitch - Pitch in bytes
Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture
See Also:
cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
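
The conversion of the returned byte offset into texel units, as described for cudaBindTexture2D, is plain integer arithmetic. The sketch below runs on the host without any CUDA call. TextureOffsetSketch and its methods are hypothetical names, the offset value 256 is made up for illustration, and the texel size is assumed to be the sum of the component bit counts from the channel descriptor divided by 8.

```java
// Host-side arithmetic sketch; no CUDA call is made. All names are hypothetical.
class TextureOffsetSketch {

    // Texel size in bytes for a channel format with x, y, z, w bits per component
    // (assumption: components are packed without extra padding).
    static long texelSizeBytes(int x, int y, int z, int w) {
        return (x + y + z + w) / 8;
    }

    // Converts a byte offset, as returned in offset[0], into texel units.
    static long offsetInTexels(long byteOffset, long texelSizeBytes) {
        return byteOffset / texelSizeBytes;
    }

    public static void main(String[] args) {
        long texel = texelSizeBytes(32, 32, 32, 32); // four 32-bit components
        long[] offset = { 256 };                     // hypothetical value written by the bind call
        // The result is the value a kernel would add to its texture coordinate.
        System.out.println(offsetInTexels(offset[0], texel));
    }
}
```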

cudaBindTextureToArray

public static int cudaBindTextureToArray(textureReference texref,
                                         cudaArray array,
                                         cudaChannelFormatDesc desc)
Binds the CUDA array array to texture reference texRef.

SYNOPSIS
template < class T, int dim, enum cudaTextureReadMode readMode > static __inline__ __host__ cudaError_t cudaBindTextureToArray( const struct texture < T, dim, readMode >& texRef, const struct cudaArray* cuArray, const struct cudaChannelFormatDesc& desc)

DESCRIPTION
Binds the CUDA array array to texture reference texRef. desc describes how the memory is interpreted when fetching values from the texture. Any CUDA array previously bound to texRef is unbound.

The overload
template < class T, int dim, enum cudaTextureReadMode readMode > static __inline__ __host__ cudaError_t cudaBindTextureToArray( const struct texture < T, dim, readMode >& texRef, const struct cudaArray* cuArray);
binds the CUDA array array to texture reference texRef. The channel descriptor is inherited from the CUDA array. Any CUDA array previously bound to texRef is unbound.

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture
See Also:
cudaCreateChannelDesc(int, int, int, int, int)

cudaUnbindTexture

public static int cudaUnbindTexture(textureReference texref)
Unbinds the texture bound to texture reference texRef.

SYNOPSIS
template < class T, int dim, enum cudaTextureReadMode readMode > static __inline__ __host__ cudaError_t cudaUnbindTexture(const struct texture < T, dim, readMode >& texRef)

DESCRIPTION
Unbinds the texture bound to texture reference texRef.

Returns:
cudaSuccess

cudaGetTextureAlignmentOffset

public static int cudaGetTextureAlignmentOffset(long[] offset,
                                                textureReference texref)
Returns in *offset the offset that was returned when texture reference texRef was bound.

SYNOPSIS
cudaError_t cudaGetTextureAlignmentOffset(size_t* offset, const struct textureReference* texRef);

DESCRIPTION
Returns in *offset the offset that was returned when texture reference texRef was bound.

Returns:
cudaSuccess, cudaErrorInvalidTexture, cudaErrorInvalidTextureBinding
See Also:
cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference)

cudaGetTextureReference

public static int cudaGetTextureReference(textureReference texref,
                                          java.lang.String symbol)
Returns in *texRef the structure associated to the texture reference defined by symbol symbol.

SYNOPSIS
cudaError_t cudaGetTextureReference( struct textureReference** texRef, const char* symbol)

DESCRIPTION
Returns in *texRef the structure associated to the texture reference defined by symbol symbol.

Returns:
cudaSuccess, cudaErrorInvalidTexture
See Also:
cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)

cudaConfigureCall

public static int cudaConfigureCall(dim3 gridDim,
                                    dim3 blockDim,
                                    long sharedMem,
                                    cudaStream_t stream)
Configure a device-launch.

SYNOPSIS
cudaError_t cudaConfigureCall(dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, cudaStream_t stream = 0)

DESCRIPTION
Specifies the grid and block dimensions for the device call to be executed, similar to the execution configuration syntax. cudaConfigureCall() is stack based. Each call pushes data on top of an execution stack. This data contains the dimensions of the grid and thread blocks, together with any arguments for the call.

Returns:
cudaSuccess, cudaErrorInvalidConfiguration
See Also:
cudaLaunch(java.lang.String), cudaSetupArgument(jcuda.Pointer, long, long)

cudaSetupArgument

public static int cudaSetupArgument(Pointer arg,
                                    long size,
                                    long offset)
Configure a device-launch.

SYNOPSIS
cudaError_t cudaSetupArgument(void* arg, size_t count, size_t offset)
template < class T > cudaError_t cudaSetupArgument(T arg, size_t offset)

DESCRIPTION
Pushes count bytes of the argument pointed to by arg at offset bytes from the start of the parameter passing area, which starts at offset 0. The arguments are stored in the top of the execution stack. cudaSetupArgument() must be preceded by a call to cudaConfigureCall().

Returns:
cudaSuccess
See Also:
cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t), cudaLaunch(java.lang.String)
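
Since each argument is pushed at an explicit byte offset from the start of the parameter passing area, the caller has to compute those offsets. The following host-side sketch shows one way to do so. ArgumentOffsetSketch and its methods are hypothetical names, and the rule that each argument is aligned to its own size (a power of two) is an assumption for illustration, not something stated in this documentation.

```java
import java.util.Arrays;

// Host-side arithmetic sketch; no CUDA call is made. All names are hypothetical.
class ArgumentOffsetSketch {

    // Rounds offset up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Returns the byte offset of each argument in the parameter area, which starts at offset 0.
    // Assumption: every argument is aligned to its own size, and sizes are powers of two.
    static long[] offsets(long[] sizes) {
        long[] result = new long[sizes.length];
        long offset = 0;
        for (int i = 0; i < sizes.length; i++) {
            offset = alignUp(offset, sizes[i]);
            result[i] = offset;
            offset += sizes[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // For example: a 4-byte int, an 8-byte pointer, then another 4-byte int.
        long[] sizes = { 4, 8, 4 };
        System.out.println(Arrays.toString(offsets(sizes)));
    }
}
```

Each computed value would then be passed as the offset parameter of the corresponding cudaSetupArgument call, with the matching entry of sizes as the size parameter.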

cudaFuncGetAttributes

public static int cudaFuncGetAttributes(cudaFuncAttributes attr,
                                        java.lang.String func)
This function obtains the attributes of a function specified via func. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.

Parameters:
attr - Return pointer to function's attributes
func - Function to get attributes of
Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction
See Also:
cudaLaunch(java.lang.String)

cudaLaunch

public static int cudaLaunch(java.lang.String symbol)
Launches a device function.

SYNOPSIS
template < class T > cudaError_t cudaLaunch(T entry)

DESCRIPTION
Launches the function entry on the device. entry can either be a function that executes on the device, or it can be a character string, naming a function that executes on the device. entry must be declared as a __global__ function. cudaLaunch() must be preceded by a call to cudaConfigureCall() since it pops the data that was pushed by cudaConfigureCall() from the execution stack.

Returns:
cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration
See Also:
cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t), cudaSetupArgument(jcuda.Pointer, long, long)

cudaGLSetGLDevice

public static int cudaGLSetGLDevice(int device)
Sets the CUDA device for use with GL Interoperability.

SYNOPSIS
cudaError_t cudaGLSetGLDevice(int device);

DESCRIPTION
Records device as the device on which the active host thread executes the device code. Records the thread as using GL Interoperability.

Returns:
cudaSuccess, cudaErrorInvalidDevice
See Also:
cudaGLRegisterBufferObject(int), cudaGLMapBufferObject(jcuda.Pointer, int), cudaGLUnmapBufferObject(int), cudaGLUnregisterBufferObject(int)

cudaGraphicsGLRegisterImage

public static int cudaGraphicsGLRegisterImage(cudaGraphicsResource resource,
                                              int image,
                                              int target,
                                              int Flags)
Registers the texture or renderbuffer object specified by image for access by CUDA. target must match the type of the object. A handle to the registered object is returned as resource. The map flags Flags specify the intended usage.


Parameters:
resource - Pointer to the returned object handle
image - name of texture or renderbuffer object to be registered
target - Identifies the type of object specified by image, and must be one of GL_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_TEXTURE_CUBE_MAP, GL_TEXTURE_3D, GL_TEXTURE_2D_ARRAY, or GL_RENDERBUFFER.
Flags - Map flags

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGLSetGLDevice(int), cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource), cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t), cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)

cudaGraphicsGLRegisterBuffer

public static int cudaGraphicsGLRegisterBuffer(cudaGraphicsResource resource,
                                               int buffer,
                                               int Flags)
Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as resource. The map flags Flags specify the intended usage.

Parameters:
resource - Pointer to the returned object handle
buffer - name of buffer object to be registered
Flags - Map flags

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource), cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t), cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)

cudaGLRegisterBufferObject

public static int cudaGLRegisterBufferObject(int bufObj)
Deprecated. As of CUDA 3.0

OpenGL interoperability.

SYNOPSIS
cudaError_t cudaGLRegisterBufferObject(GLuint bufferObj)

DESCRIPTION
Registers the buffer object of ID bufferObj for access by CUDA. This function must be called before CUDA can map the buffer object. While it is registered, the buffer object cannot be used by any OpenGL commands except as a data source for OpenGL drawing commands.

Returns:
cudaSuccess, cudaErrorNotInitialized
See Also:
cudaGLSetGLDevice(int), cudaGLMapBufferObject(jcuda.Pointer, int), cudaGLUnmapBufferObject(int), cudaGLUnregisterBufferObject(int)

cudaGLMapBufferObject

public static int cudaGLMapBufferObject(Pointer devPtr,
                                        int bufObj)
Deprecated. As of CUDA 3.0

OpenGL interoperability.

SYNOPSIS
cudaError_t cudaGLMapBufferObject(void** devPtr, GLuint bufferObj);

DESCRIPTION
Maps the buffer object of ID bufferObj into the address space of CUDA and returns in *devPtr the base pointer of the resulting mapping.

Returns:
cudaSuccess, cudaErrorMapBufferObjectFailed
See Also:
cudaGLSetGLDevice(int), cudaGLRegisterBufferObject(int), cudaGLUnmapBufferObject(int), cudaGLUnregisterBufferObject(int)

cudaGLUnmapBufferObject

public static int cudaGLUnmapBufferObject(int bufObj)
Deprecated. As of CUDA 3.0

OpenGL interoperability.

SYNOPSIS
cudaError_t cudaGLUnmapBufferObject(GLuint bufferObj);

DESCRIPTION
Unmaps the buffer object of ID bufferObj for access by CUDA.

Returns:
cudaSuccess, cudaErrorInvalidDevicePointer, cudaErrorUnmapBufferObjectFailed
See Also:
cudaGLSetGLDevice(int), cudaGLRegisterBufferObject(int), cudaGLMapBufferObject(jcuda.Pointer, int), cudaGLUnregisterBufferObject(int)

cudaGLUnregisterBufferObject

public static int cudaGLUnregisterBufferObject(int bufObj)
Deprecated. As of CUDA 3.0

OpenGL interoperability.

SYNOPSIS
cudaError_t cudaGLUnregisterBufferObject(GLuint bufferObj);

DESCRIPTION
Unregisters the buffer object of ID bufferObj for access by CUDA.

Returns:
cudaSuccess
See Also:
cudaGLSetGLDevice(int), cudaGLRegisterBufferObject(int), cudaGLMapBufferObject(jcuda.Pointer, int), cudaGLUnmapBufferObject(int)

cudaGLSetBufferObjectMapFlags

public static int cudaGLSetBufferObjectMapFlags(int bufObj,
                                                int flags)
Deprecated. As of CUDA 3.0

Set flags for mapping the OpenGL buffer bufObj.

Changes to flags will take effect the next time bufObj is mapped.

If bufObj has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If bufObj is presently mapped for access by CUDA, then cudaErrorUnknown is returned.

Parameters:
bufObj - Registered buffer object to set flags for
flags - Parameters for buffer mapping

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsResourceSetMapFlags(jcuda.runtime.cudaGraphicsResource, int)

cudaGLMapBufferObjectAsync

public static int cudaGLMapBufferObjectAsync(Pointer devPtr,
                                             int bufObj,
                                             cudaStream_t stream)
Deprecated. As of CUDA 3.0

Maps the buffer object of ID bufObj into the address space of CUDA and returns in *devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.


Stream stream is synchronized with the current GL context.

Parameters:
devPtr - Returned device pointer to CUDA object
bufObj - Buffer object ID to map
stream - Stream to synchronize

Returns:
cudaSuccess, cudaErrorMapBufferObjectFailed
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)

cudaGLUnmapBufferObjectAsync

public static int cudaGLUnmapBufferObjectAsync(int bufObj,
                                               cudaStream_t stream)
Deprecated. As of CUDA 3.0

Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

Stream stream is synchronized with the current GL context.

Parameters:
bufObj - Buffer object to unmap
stream - Stream to synchronize

Returns:
cudaSuccess, cudaErrorInvalidDevicePointer, cudaErrorUnmapBufferObjectFailed
See Also:
cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)

cudaDriverGetVersion

public static int cudaDriverGetVersion(int[] driverVersion)
Returns in driverVersion the version number of the installed CUDA driver. If no driver is installed, then 0 is returned as the driver version (via driverVersion). This function automatically returns cudaErrorInvalidValue if the driverVersion argument is NULL.

Parameters:
driverVersion - Returns the CUDA driver version.
Returns:
cudaSuccess, cudaErrorInvalidValue
See Also:
cudaRuntimeGetVersion(int[])

cudaRuntimeGetVersion

public static int cudaRuntimeGetVersion(int[] runtimeVersion)
Returns in runtimeVersion the version number of the installed CUDA Runtime. This function automatically returns cudaErrorInvalidValue if the runtimeVersion argument is NULL.

Parameters:
runtimeVersion - Returns the CUDA Runtime version.
Returns:
cudaSuccess, cudaErrorInvalidValue
See Also:
cudaDriverGetVersion(int[])
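
Both version queries above write a single integer into element 0 of the array argument. The sketch below shows one way to make that integer readable. It assumes the encoding 1000 * major + 10 * minor used by the CUDA headers (so 3020 would mean 3.2); CudaVersion and its methods are hypothetical names, not part of JCuda.

```java
/** Hypothetical helper, not part of JCuda: splits a CUDA version integer into its parts. */
final class CudaVersion {
    private CudaVersion() {}

    /** Major component, assuming the 1000 * major + 10 * minor encoding. */
    static int major(int version) {
        return version / 1000;
    }

    /** Minor component under the same assumed encoding. */
    static int minor(int version) {
        return (version % 1000) / 10;
    }

    /** Formats a version as "major.minor"; 0 is reported as a missing driver. */
    static String describe(int version) {
        if (version == 0) {
            return "no driver installed";
        }
        return major(version) + "." + minor(version);
    }
}
```

After int[] v = new int[1] and a successful JCuda.cudaDriverGetVersion(v), the result could be printed with CudaVersion.describe(v[0]). The 0 case only applies to the driver query, as described for cudaDriverGetVersion.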

cudaGraphicsUnregisterResource

public static int cudaGraphicsUnregisterResource(cudaGraphicsResource resource)
Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.

If resource is invalid then cudaErrorInvalidResourceHandle is returned.

Parameters:
resource - Resource to unregister

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsGLRegisterBuffer(jcuda.runtime.cudaGraphicsResource, int, int), cudaGraphicsGLRegisterImage(jcuda.runtime.cudaGraphicsResource, int, int, int)

cudaGraphicsResourceSetMapFlags

public static int cudaGraphicsResourceSetMapFlags(cudaGraphicsResource resource,
                                                  int flags)
Set flags for mapping the graphics resource resource.

Changes to flags will take effect the next time resource is mapped.

If resource is presently mapped for access by CUDA then cudaErrorUnknown is returned. If flags is not a valid map flag value then cudaErrorInvalidValue is returned.

Parameters:
resource - Registered resource to set flags for
flags - Parameters for resource mapping

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)

cudaGraphicsMapResources

public static int cudaGraphicsMapResources(int count,
                                           cudaGraphicsResource[] resources,
                                           cudaStream_t stream)
Maps the count graphics resources in resources for access by CUDA.

The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.

This function provides the synchronization guarantee that any graphics calls issued before cudaGraphicsMapResources() will complete before any subsequent CUDA work issued in stream begins.

If resources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of resources are presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
count - Number of resources to map
resources - Resources to map for CUDA
stream - Stream for synchronization

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource), cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int), cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
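
Because duplicate entries in resources are rejected with cudaErrorInvalidResourceHandle, it can help to check the array on the Java side before mapping. The following is a minimal sketch, not part of JCuda: ResourceLists and hasDuplicates are hypothetical names, and the check compares object references only, on the assumption that each registered resource is represented by exactly one Java object.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper, not part of JCuda: detects repeated references in a resource array. */
final class ResourceLists {
    private ResourceLists() {}

    /** Returns true if the same object reference occurs twice among the first count entries. */
    static boolean hasDuplicates(Object[] resources, int count) {
        // Identity-based set: two entries match only if they are the same object.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (int i = 0; i < count; i++) {
            if (!seen.add(resources[i])) {
                return true;
            }
        }
        return false;
    }
}
```

Two distinct Java objects that happen to wrap the same native handle would not be caught by this check, so the return value of cudaGraphicsMapResources still has to be inspected.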

cudaGraphicsUnmapResources

public static int cudaGraphicsUnmapResources(int count,
                                             cudaGraphicsResource[] resources,
                                             cudaStream_t stream)
Unmaps the count graphics resources in resources.

Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.

This function provides the synchronization guarantee that any CUDA work issued in stream before cudaGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.

If resources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of resources are not presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
count - Number of resources to unmap
resources - Resources to unmap
stream - Stream for synchronization

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
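
Mapping and unmapping must be paired so that the graphics API regains access to the resources even when the CUDA work in between fails. The sketch below illustrates that pairing with try/finally. To keep it runnable without a GPU, the two JCuda calls are hidden behind a hypothetical Interop interface; Interop, MappedScope and withMapped are illustrative names, not part of JCuda, and a status of 0 is assumed to mean cudaSuccess.

```java
/** Hypothetical stand-in for the JCuda calls, so the control flow can be run without a GPU. */
interface Interop {
    /** Stands in for cudaGraphicsMapResources; returns a CUDA status code. */
    int map();

    /** Stands in for cudaGraphicsUnmapResources; returns a CUDA status code. */
    int unmap();
}

/** Hypothetical helper, not part of JCuda: runs work between a map call and a guaranteed unmap. */
final class MappedScope {
    private MappedScope() {}

    /**
     * Maps, runs the CUDA work, and always unmaps once mapping has succeeded.
     * Returns the status of the failed map call, or otherwise the status of the unmap call.
     */
    static int withMapped(Interop interop, Runnable cudaWork) {
        int status = interop.map();
        if (status != 0) {
            return status; // nothing was mapped, so there is nothing to unmap
        }
        try {
            cudaWork.run();
        } finally {
            // Runs even if cudaWork throws; the exception then propagates to the caller.
            status = interop.unmap();
        }
        return status;
    }
}
```

A real implementation of Interop would forward map() and unmap() to JCuda.cudaGraphicsMapResources and JCuda.cudaGraphicsUnmapResources with the same count, resources and stream arguments.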

cudaGraphicsResourceGetMappedPointer

public static int cudaGraphicsResourceGetMappedPointer(Pointer devPtr,
                                                       long[] size,
                                                       cudaGraphicsResource resource)
Returns in *devPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in *size the size of the memory in bytes which may be accessed from that pointer. The value set in devPtr may change every time that resource is mapped.

If resource is not a buffer then it cannot be accessed via a pointer and cudaErrorUnknown is returned. If resource is not mapped then cudaErrorUnknown is returned.

Parameters:
devPtr - Returned pointer through which resource may be accessed
size - Returned size of the buffer accessible starting at *devPtr
resource - Mapped resource to access

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t), cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)

cudaGraphicsSubResourceGetMappedArray

public static int cudaGraphicsSubResourceGetMappedArray(cudaArray arrayPtr,
                                                        cudaGraphicsResource resource,
                                                        int arrayIndex,
                                                        int mipLevel)
Returns in *arrayPtr an array through which the subresource of the mapped graphics resource resource that corresponds to array index arrayIndex and mipmap level mipLevel may be accessed. The value set in arrayPtr may change every time that resource is mapped.

If resource is not a texture then it cannot be accessed via an array and cudaErrorUnknown is returned. If arrayIndex is not a valid array index for resource then cudaErrorInvalidValue is returned. If mipLevel is not a valid mipmap level for resource then cudaErrorInvalidValue is returned. If resource is not mapped then cudaErrorUnknown is returned.

Parameters:
arrayPtr - Returned array through which a subresource of resource may be accessed
resource - Mapped resource to access
arrayIndex - Index of the subresource to access: the array index for array textures, or the cubemap face index (as defined by cudaGraphicsCubeFace) for cubemap textures
mipLevel - Mipmap level for the subresource to access

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown
See Also:
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)