java.lang.Object
  jcuda.driver.JCudaDriver
public class JCudaDriver
Java bindings for the NVIDIA CUDA driver API.
Most comments are extracted from the CUDA online documentation.
Field Summary | |
---|---|
static Pointer |
CU_LAUNCH_PARAM_BUFFER_POINTER
Indicator that the next value in the extra parameter to cuLaunchKernel will be a pointer to a buffer containing all kernel parameters used for launching kernel f. |
static Pointer |
CU_LAUNCH_PARAM_BUFFER_SIZE
Indicator that the next value in the extra parameter to cuLaunchKernel will be a pointer to a size_t which contains the size of the buffer specified with CU_LAUNCH_PARAM_BUFFER_POINTER. |
static Pointer |
CU_LAUNCH_PARAM_END
End-of-array terminator for the extra parameter to cuLaunchKernel. |
static int |
CU_MEMHOSTALLOC_DEVICEMAP
If set, host memory is mapped into CUDA address space and cuMemHostGetDevicePointer() may be called on the host pointer. |
static int |
CU_MEMHOSTALLOC_PORTABLE
If set, host memory is portable between CUDA contexts. |
static int |
CU_MEMHOSTALLOC_WRITECOMBINED
If set, host memory is allocated as write-combined - fast to write, faster to DMA, slow to read except via SSE4 streaming load instruction (MOVNTDQA). |
static int |
CU_MEMHOSTREGISTER_DEVICEMAP
If set, host memory is mapped into CUDA address space and cuMemHostGetDevicePointer() may be called on the host pointer. |
static int |
CU_MEMHOSTREGISTER_PORTABLE
If set, host memory is portable between CUDA contexts. |
static int |
CU_MEMPEERREGISTER_DEVICEMAP
Deprecated. This value was added in CUDA 4.0 RC and removed in CUDA 4.0 RC2. |
static int |
CU_PARAM_TR_DEFAULT
For texture references loaded into the module, use default texunit from texture reference |
static int |
CU_TRSA_OVERRIDE_FORMAT
Override the texref format with a format inferred from the array |
static int |
CU_TRSF_NORMALIZED_COORDINATES
Use normalized texture coordinates in the range [0,1) instead of [0,dim) |
static int |
CU_TRSF_READ_AS_INTEGER
Read the texture as integers rather than promoting the values to floats in the range [0,1] |
static int |
CU_TRSF_SRGB
Perform sRGB->linear conversion during texture read. |
static int |
CUDA_ARRAY3D_2DARRAY
Deprecated. Use CUDA_ARRAY3D_LAYERED. |
static int |
CUDA_ARRAY3D_LAYERED
If set, the CUDA array is a collection of layers, where each layer is either a 1D or a 2D array and the Depth member of CUDA_ARRAY3D_DESCRIPTOR specifies the number of layers, not the depth of a 3D array. |
static int |
CUDA_ARRAY3D_SURFACE_LDST
This flag must be set in order to bind a surface reference to the CUDA array |
static int |
CUDA_VERSION
The CUDA version |
Method Summary | |
---|---|
static int |
align(int value,
int alignment)
Returns the given (address) value, adjusted to have the given alignment. |
static int |
cuArray3DCreate(CUarray pHandle,
CUDA_ARRAY3D_DESCRIPTOR pAllocateArray)
Creates a 3D CUDA array. |
static int |
cuArray3DGetDescriptor(CUDA_ARRAY3D_DESCRIPTOR pArrayDescriptor,
CUarray hArray)
Get a 3D CUDA array descriptor. |
static int |
cuArrayCreate(CUarray pHandle,
CUDA_ARRAY_DESCRIPTOR pAllocateArray)
Creates a 1D or 2D CUDA array. |
static int |
cuArrayDestroy(CUarray hArray)
Destroys a CUDA array. |
static int |
cuArrayGetDescriptor(CUDA_ARRAY_DESCRIPTOR pArrayDescriptor,
CUarray hArray)
Get a 1D or 2D CUDA array descriptor. |
static int |
cuCtxAttach(CUcontext pctx,
int flags)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuCtxCreate(CUcontext pctx,
int flags,
CUdevice dev)
Create a CUDA context. |
static int |
cuCtxDestroy(CUcontext ctx)
Destroy a CUDA context. |
static int |
cuCtxDetach(CUcontext ctx)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuCtxDisablePeerAccess(CUcontext peerContext)
Disables direct access to memory allocations in a peer context and unregisters any registered allocations. |
static int |
cuCtxEnablePeerAccess(CUcontext peerContext,
int Flags)
Enables direct access to memory allocations in a peer context. |
static int |
cuCtxGetCacheConfig(int[] pconfig)
Returns the preferred cache configuration for the current context. |
static int |
cuCtxGetCurrent(CUcontext pctx)
Returns the CUDA context bound to the calling CPU thread. |
static int |
cuCtxGetDevice(CUdevice device)
Returns the device ID for the current context. |
static int |
cuCtxGetLimit(long[] pvalue,
int limit)
Returns resource limits. |
static int |
cuCtxPopCurrent(CUcontext pctx)
Pops the current CUDA context from the current CPU thread. |
static int |
cuCtxPushCurrent(CUcontext ctx)
Pushes a context on the current CPU thread. |
static int |
cuCtxSetCacheConfig(int config)
Sets the preferred cache configuration for the current context. |
static int |
cuCtxSetCurrent(CUcontext ctx)
Binds the specified CUDA context to the calling CPU thread. |
static int |
cuCtxSetLimit(int limit,
long value)
Set resource limits. |
static int |
cuCtxSynchronize()
Block for a context's tasks to complete. |
static int |
cuDeviceCanAccessPeer(int[] canAccessPeer,
CUdevice dev,
CUdevice peerDev)
Queries if a device may directly access a peer device's memory. |
static int |
cuDeviceComputeCapability(int[] major,
int[] minor,
CUdevice dev)
Returns the compute capability of the device. |
static int |
cuDeviceGet(CUdevice device,
int ordinal)
Returns a handle to a compute device. |
static int |
cuDeviceGetAttribute(int[] pi,
int attrib,
CUdevice dev)
Returns information about the device. |
static int |
cuDeviceGetCount(int[] count)
Returns the number of compute-capable devices. |
static int |
cuDeviceGetName(byte[] name,
int len,
CUdevice dev)
Returns an identifier string for the device. |
static int |
cuDeviceGetProperties(CUdevprop prop,
CUdevice dev)
Returns properties for a selected device. |
static int |
cuDeviceTotalMem(long[] bytes,
CUdevice dev)
Returns the total amount of memory on the device. |
static int |
cuDriverGetVersion(int[] driverVersion)
Returns the CUDA driver version. |
static int |
cuEventCreate(CUevent phEvent,
int Flags)
Creates an event. |
static int |
cuEventDestroy(CUevent hEvent)
Destroys an event. |
static int |
cuEventElapsedTime(float[] pMilliseconds,
CUevent hStart,
CUevent hEnd)
Computes the elapsed time between two events. |
static int |
cuEventQuery(CUevent hEvent)
Queries an event's status. |
static int |
cuEventRecord(CUevent hEvent,
CUstream hStream)
Records an event. |
static int |
cuEventSynchronize(CUevent hEvent)
Waits for an event to complete. |
static int |
cuFuncGetAttribute(int[] pi,
int attrib,
CUfunction func)
Returns information about a function. |
static int |
cuFuncSetBlockShape(CUfunction hfunc,
int x,
int y,
int z)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuFuncSetCacheConfig(CUfunction hfunc,
int config)
Sets the preferred cache configuration for a device function. |
static int |
cuFuncSetSharedSize(CUfunction hfunc,
int bytes)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLCtxCreate(CUcontext pCtx,
int Flags,
CUdevice device)
Create a CUDA context for interoperability with OpenGL. |
static int |
cuGLInit()
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLMapBufferObject(CUdeviceptr dptr,
long[] size,
int bufferobj)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLMapBufferObjectAsync(CUdeviceptr dptr,
long[] size,
int buffer,
CUstream hStream)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLRegisterBufferObject(int bufferobj)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLSetBufferObjectMapFlags(int buffer,
int Flags)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLUnmapBufferObject(int bufferobj)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLUnmapBufferObjectAsync(int buffer,
CUstream hStream)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGLUnregisterBufferObject(int bufferobj)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuGraphicsGLRegisterBuffer(CUgraphicsResource pCudaResource,
int buffer,
int Flags)
Registers an OpenGL buffer object. |
static int |
cuGraphicsGLRegisterImage(CUgraphicsResource pCudaResource,
int image,
int target,
int Flags)
Register an OpenGL texture or renderbuffer object. |
static int |
cuGraphicsMapResources(int count,
CUgraphicsResource[] resources,
CUstream hStream)
Map graphics resources for access by CUDA. |
static int |
cuGraphicsResourceGetMappedPointer(CUdeviceptr pDevPtr,
long[] pSize,
CUgraphicsResource resource)
Get a device pointer through which to access a mapped graphics resource. |
static int |
cuGraphicsResourceSetMapFlags(CUgraphicsResource resource,
int flags)
Set usage flags for mapping a graphics resource. |
static int |
cuGraphicsSubResourceGetMappedArray(CUarray pArray,
CUgraphicsResource resource,
int arrayIndex,
int mipLevel)
Get an array through which to access a subresource of a mapped graphics resource. |
static int |
cuGraphicsUnmapResources(int count,
CUgraphicsResource[] resources,
CUstream hStream)
Unmap graphics resources. |
static int |
cuGraphicsUnregisterResource(CUgraphicsResource resource)
Unregisters a graphics resource for access by CUDA. |
static int |
cuInit(int Flags)
Initialize the CUDA driver API. |
static int |
cuLaunch(CUfunction f)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuLaunchGrid(CUfunction f,
int grid_width,
int grid_height)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuLaunchGridAsync(CUfunction f,
int grid_width,
int grid_height,
CUstream hStream)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuLaunchKernel(CUfunction f,
int gridDimX,
int gridDimY,
int gridDimZ,
int blockDimX,
int blockDimY,
int blockDimZ,
int sharedMemBytes,
CUstream hStream,
Pointer kernelParams,
Pointer extra)
Launches a CUDA function. |
static int |
cuMemAlloc(CUdeviceptr dptr,
long bytesize)
Allocates device memory. |
static int |
cuMemAllocHost(Pointer pointer,
long bytesize)
Allocates page-locked host memory. |
static int |
cuMemAllocPitch(CUdeviceptr dptr,
long[] pPitch,
long WidthInBytes,
long Height,
int ElementSizeBytes)
Allocates pitched device memory. |
static int |
cuMemcpy(CUdeviceptr dst,
CUdeviceptr src,
long ByteCount)
Copies memory. |
static int |
cuMemcpy2D(CUDA_MEMCPY2D pCopy)
Copies memory for 2D arrays. |
static int |
cuMemcpy2DAsync(CUDA_MEMCPY2D pCopy,
CUstream hStream)
Copies memory for 2D arrays. |
static int |
cuMemcpy2DUnaligned(CUDA_MEMCPY2D pCopy)
Copies memory for 2D arrays. |
static int |
cuMemcpy3D(CUDA_MEMCPY3D pCopy)
Copies memory for 3D arrays. |
static int |
cuMemcpy3DAsync(CUDA_MEMCPY3D pCopy,
CUstream hStream)
Copies memory for 3D arrays. |
static int |
cuMemcpy3DPeer(CUDA_MEMCPY3D_PEER pCopy)
Copies memory between contexts. |
static int |
cuMemcpy3DPeerAsync(CUDA_MEMCPY3D_PEER pCopy,
CUstream hStream)
Copies memory between contexts asynchronously. |
static int |
cuMemcpyAsync(CUdeviceptr dst,
CUdeviceptr src,
long ByteCount,
CUstream hStream)
Copies memory asynchronously. |
static int |
cuMemcpyAtoA(CUarray dstArray,
long dstIndex,
CUarray srcArray,
long srcIndex,
long ByteCount)
Copies memory from Array to Array. |
static int |
cuMemcpyAtoD(CUdeviceptr dstDevice,
CUarray hSrc,
long SrcIndex,
long ByteCount)
Copies memory from Array to Device. |
static int |
cuMemcpyAtoH(Pointer dstHost,
CUarray srcArray,
long srcIndex,
long ByteCount)
Copies memory from Array to Host. |
static int |
cuMemcpyAtoHAsync(Pointer dstHost,
CUarray srcArray,
long srcIndex,
long ByteCount,
CUstream hStream)
Copies memory from Array to Host. |
static int |
cuMemcpyDtoA(CUarray dstArray,
long dstIndex,
CUdeviceptr srcDevice,
long ByteCount)
Copies memory from Device to Array. |
static int |
cuMemcpyDtoD(CUdeviceptr dstDevice,
CUdeviceptr srcDevice,
long ByteCount)
Copies memory from Device to Device. |
static int |
cuMemcpyDtoDAsync(CUdeviceptr dstDevice,
CUdeviceptr srcDevice,
long ByteCount,
CUstream hStream)
Copies memory from Device to Device. |
static int |
cuMemcpyDtoH(Pointer dstHost,
CUdeviceptr srcDevice,
long ByteCount)
Copies memory from Device to Host. |
static int |
cuMemcpyDtoHAsync(Pointer dstHost,
CUdeviceptr srcDevice,
long ByteCount,
CUstream hStream)
Copies memory from Device to Host. |
static int |
cuMemcpyHtoA(CUarray dstArray,
long dstIndex,
Pointer pSrc,
long ByteCount)
Copies memory from Host to Array. |
static int |
cuMemcpyHtoAAsync(CUarray dstArray,
long dstIndex,
Pointer pSrc,
long ByteCount,
CUstream hStream)
Copies memory from Host to Array. |
static int |
cuMemcpyHtoD(CUdeviceptr dstDevice,
Pointer srcHost,
long ByteCount)
Copies memory from Host to Device. |
static int |
cuMemcpyHtoDAsync(CUdeviceptr dstDevice,
Pointer srcHost,
long ByteCount,
CUstream hStream)
Copies memory from Host to Device. |
static int |
cuMemcpyPeer(CUdeviceptr dstDevice,
CUcontext dstContext,
CUdeviceptr srcDevice,
CUcontext srcContext,
long ByteCount)
Copies device memory between two contexts. |
static int |
cuMemcpyPeerAsync(CUdeviceptr dstDevice,
CUcontext dstContext,
CUdeviceptr srcDevice,
CUcontext srcContext,
long ByteCount,
CUstream hStream)
Copies device memory between two contexts asynchronously. |
static int |
cuMemFree(CUdeviceptr dptr)
Frees device memory. |
static int |
cuMemFreeHost(Pointer p)
Frees page-locked host memory. |
static int |
cuMemGetAddressRange(CUdeviceptr pbase,
long[] psize,
CUdeviceptr dptr)
Get information on memory allocations. |
static int |
cuMemGetInfo(long[] free,
long[] total)
Gets free and total memory. |
static int |
cuMemHostAlloc(Pointer pp,
long bytes,
int Flags)
Allocates page-locked host memory. |
static int |
cuMemHostGetDevicePointer(CUdeviceptr ret,
Pointer p,
int Flags)
Passes back device pointer of mapped pinned memory. |
static int |
cuMemHostGetFlags(int[] pFlags,
Pointer p)
Passes back flags that were used for a pinned allocation. |
static int |
cuMemHostRegister(Pointer p,
long bytesize,
int Flags)
Registers an existing host memory range for use by CUDA. |
static int |
cuMemHostUnregister(Pointer p)
Unregisters a memory range that was registered with cuMemHostRegister(). |
static int |
cuMemPeerGetDevicePointer(CUdeviceptr pdptr,
CUdeviceptr peerPointer,
CUcontext peerContext,
int Flags)
Deprecated. This function was added in CUDA 4.0 RC and removed in CUDA 4.0 RC2. In the current release, it throws an UnsupportedOperationException. |
static int |
cuMemPeerRegister(CUdeviceptr peerPointer,
CUcontext peerContext,
int Flags)
Deprecated. This function was added in CUDA 4.0 RC and removed in CUDA 4.0 RC2. In the current release, it throws an UnsupportedOperationException. |
static int |
cuMemPeerUnregister(CUdeviceptr peerPointer,
CUcontext peerContext)
Deprecated. This function was added in CUDA 4.0 RC and removed in CUDA 4.0 RC2. In the current release, it throws an UnsupportedOperationException. |
static int |
cuMemsetD16(CUdeviceptr dstDevice,
short us,
long N)
Initializes device memory. |
static int |
cuMemsetD16Async(CUdeviceptr dstDevice,
short us,
long N,
CUstream hStream)
Sets device memory. |
static int |
cuMemsetD2D16(CUdeviceptr dstDevice,
long dstPitch,
short us,
long Width,
long Height)
Initializes device memory. |
static int |
cuMemsetD2D16Async(CUdeviceptr dstDevice,
long dstPitch,
short us,
long Width,
long Height,
CUstream hStream)
Sets device memory. |
static int |
cuMemsetD2D32(CUdeviceptr dstDevice,
long dstPitch,
int ui,
long Width,
long Height)
Initializes device memory. |
static int |
cuMemsetD2D32Async(CUdeviceptr dstDevice,
long dstPitch,
int ui,
long Width,
long Height,
CUstream hStream)
Sets device memory. |
static int |
cuMemsetD2D8(CUdeviceptr dstDevice,
long dstPitch,
byte uc,
long Width,
long Height)
Initializes device memory. |
static int |
cuMemsetD2D8Async(CUdeviceptr dstDevice,
long dstPitch,
byte uc,
long Width,
long Height,
CUstream hStream)
Sets device memory. |
static int |
cuMemsetD32(CUdeviceptr dstDevice,
int ui,
long N)
Initializes device memory. |
static int |
cuMemsetD32Async(CUdeviceptr dstDevice,
int ui,
long N,
CUstream hStream)
Sets device memory. |
static int |
cuMemsetD8(CUdeviceptr dstDevice,
byte uc,
long N)
Initializes device memory. |
static int |
cuMemsetD8Async(CUdeviceptr dstDevice,
byte uc,
long N,
CUstream hStream)
Sets device memory. |
static int |
cuModuleGetFunction(CUfunction hfunc,
CUmodule hmod,
java.lang.String name)
Returns a function handle. |
static int |
cuModuleGetGlobal(CUdeviceptr dptr,
long[] bytes,
CUmodule hmod,
java.lang.String name)
Returns a global pointer from a module. |
static int |
cuModuleGetSurfRef(CUsurfref pSurfRef,
CUmodule hmod,
java.lang.String name)
Returns a handle to a surface reference. |
static int |
cuModuleGetTexRef(CUtexref pTexRef,
CUmodule hmod,
java.lang.String name)
Returns a handle to a texture reference. |
static int |
cuModuleLoad(CUmodule module,
java.lang.String fname)
Loads a compute module. |
static int |
cuModuleLoadData(CUmodule module,
byte[] image)
Load a module's data. |
static int |
cuModuleLoadDataEx(CUmodule phMod,
Pointer p,
int numOptions,
int[] options,
Pointer optionValues)
Load a module's data with options. |
static int |
cuModuleLoadFatBinary(CUmodule module,
byte[] fatCubin)
Load a module's data. |
static int |
cuModuleUnload(CUmodule hmod)
Unloads a module. |
static int |
cuParamSetf(CUfunction hfunc,
int offset,
float value)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuParamSeti(CUfunction hfunc,
int offset,
int value)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuParamSetSize(CUfunction hfunc,
int numbytes)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuParamSetTexRef(CUfunction hfunc,
int texunit,
CUtexref hTexRef)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuParamSetv(CUfunction hfunc,
int offset,
Pointer ptr,
int numbytes)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuPointerGetAttribute(Pointer data,
int attribute,
CUdeviceptr ptr)
Returns information about a pointer. |
static int |
cuProfilerInitialize(java.lang.String configFile,
java.lang.String outputFile,
int outputMode)
Initialize the profiling. |
static int |
cuProfilerStart()
Start the profiling. |
static int |
cuProfilerStop()
Stop the profiling. |
static int |
cuStreamCreate(CUstream phStream,
int Flags)
Create a stream. |
static int |
cuStreamDestroy(CUstream hStream)
Destroys a stream. |
static int |
cuStreamQuery(CUstream hStream)
Determine status of a compute stream. |
static int |
cuStreamSynchronize(CUstream hStream)
Wait until a stream's tasks are completed. |
static int |
cuStreamWaitEvent(CUstream hStream,
CUevent hEvent,
int Flags)
Make a compute stream wait on an event. |
static int |
cuSurfRefGetArray(CUarray phArray,
CUsurfref hSurfRef)
Passes back the CUDA array bound to a surface reference. |
static int |
cuSurfRefSetArray(CUsurfref hSurfRef,
CUarray hArray,
int Flags)
Sets the CUDA array for a surface reference. |
static int |
cuTexRefCreate(CUtexref pTexRef)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuTexRefDestroy(CUtexref hTexRef)
Deprecated. This function is deprecated in the latest CUDA version |
static int |
cuTexRefGetAddress(CUdeviceptr pdptr,
CUtexref hTexRef)
Gets the address associated with a texture reference. |
static int |
cuTexRefGetAddressMode(int[] pam,
CUtexref hTexRef,
int dim)
Gets the addressing mode used by a texture reference. |
static int |
cuTexRefGetArray(CUarray phArray,
CUtexref hTexRef)
Gets the array bound to a texture reference. |
static int |
cuTexRefGetFilterMode(int[] pfm,
CUtexref hTexRef)
Gets the filter-mode used by a texture reference. |
static int |
cuTexRefGetFlags(int[] pFlags,
CUtexref hTexRef)
Gets the flags used by a texture reference. |
static int |
cuTexRefGetFormat(int[] pFormat,
int[] pNumChannels,
CUtexref hTexRef)
Gets the format used by a texture reference. |
static int |
cuTexRefSetAddress(long[] ByteOffset,
CUtexref hTexRef,
CUdeviceptr dptr,
long bytes)
Binds an address as a texture reference. |
static int |
cuTexRefSetAddress2D(CUtexref hTexRef,
CUDA_ARRAY_DESCRIPTOR desc,
CUdeviceptr dptr,
long PitchInBytes)
Binds an address as a 2D texture reference. |
static int |
cuTexRefSetAddressMode(CUtexref hTexRef,
int dim,
int am)
Sets the addressing mode for a texture reference. |
static int |
cuTexRefSetArray(CUtexref hTexRef,
CUarray hArray,
int Flags)
Binds an array as a texture reference. |
static int |
cuTexRefSetFilterMode(CUtexref hTexRef,
int fm)
Sets the filtering mode for a texture reference. |
static int |
cuTexRefSetFlags(CUtexref hTexRef,
int Flags)
Sets the flags for a texture reference. |
static int |
cuTexRefSetFormat(CUtexref hTexRef,
int fmt,
int NumPackedComponents)
Sets the format for a texture reference. |
static void |
setExceptionsEnabled(boolean enabled)
Enables or disables exceptions. |
static void |
setLogLevel(LogLevel logLevel)
Set the specified log level for the JCuda driver library. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int CUDA_VERSION
public static final int CU_MEMHOSTALLOC_PORTABLE
See Also: cuMemHostAlloc(jcuda.Pointer, long, int)
public static final int CU_MEMHOSTALLOC_DEVICEMAP
See Also: cuMemHostAlloc(jcuda.Pointer, long, int)
public static final int CU_MEMHOSTALLOC_WRITECOMBINED
See Also: cuMemHostAlloc(jcuda.Pointer, long, int)
public static final int CU_MEMHOSTREGISTER_PORTABLE
public static final int CU_MEMHOSTREGISTER_DEVICEMAP
public static final int CU_MEMPEERREGISTER_DEVICEMAP
public static final int CUDA_ARRAY3D_LAYERED
public static final int CUDA_ARRAY3D_2DARRAY
public static final int CUDA_ARRAY3D_SURFACE_LDST
public static final int CU_PARAM_TR_DEFAULT
public static final int CU_TRSA_OVERRIDE_FORMAT
public static final int CU_TRSF_READ_AS_INTEGER
public static final int CU_TRSF_NORMALIZED_COORDINATES
public static final int CU_TRSF_SRGB
public static final Pointer CU_LAUNCH_PARAM_END
public static final Pointer CU_LAUNCH_PARAM_BUFFER_POINTER
public static final Pointer CU_LAUNCH_PARAM_BUFFER_SIZE
Method Detail |
---|
public static void setLogLevel(LogLevel logLevel)
Parameters: logLevel - The log level to use.
public static void setExceptionsEnabled(boolean enabled)
Parameters: enabled - Whether exceptions are enabled
public static int align(int value, int alignment)
Parameters: value - The address value
alignment - The desired alignment
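As a rough illustration of what "adjusted to have the given alignment" might mean, the sketch below rounds a value up to the next multiple of the alignment. This is an assumption about the behavior, not taken from the JCuda source. The class name AlignSketch and the method alignUp are hypothetical, and the bit mask only works when the alignment is a power of two.

```java
/** Hypothetical stand-in for an align(value, alignment) helper; not the JCuda implementation. */
final class AlignSketch {
    /**
     * Rounds value up to the next multiple of alignment.
     * Assumes alignment is a positive power of two.
     */
    static int alignUp(int value, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a positive power of two");
        }
        // Adding (alignment - 1) reaches or passes the next boundary unless the
        // value is already aligned; the mask then clears the low bits.
        return (value + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        System.out.println(alignUp(13, 8)); // prints "16"
        System.out.println(alignUp(16, 8)); // prints "16"
    }
}
```

An already aligned value is returned unchanged, which is why both calls in main print the same number.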
public static int cuInit(int Flags)
CUresult cuInit(unsigned int Flags)
Initializes the driver API and must be called before any other function from the driver API. Currently, the Flags parameter must be 0. If cuInit() has not been called, any function from the driver API will return CUDA_ERROR_NOT_INITIALIZED.
public static int cuDeviceGet(CUdevice device, int ordinal)
CUresult cuDeviceGet(CUdevice *device, int ordinal)
Returns in *device a device handle given an ordinal in the range [0, cuDeviceGetCount()-1].
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(long[], jcuda.driver.CUdevice)
public static int cuDeviceGetCount(int[] count)
CUresult cuDeviceGetCount(int *count)
Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, cuDeviceGetCount() returns 0.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(long[], jcuda.driver.CUdevice)
public static int cuDeviceGetName(byte[] name, int len, CUdevice dev)
CUresult cuDeviceGetName(char *name, int len, CUdevice dev)
Returns an ASCII string identifying the device dev in the NULL-terminated string pointed to by name. len specifies the maximum length of the string that may be returned.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(long[], jcuda.driver.CUdevice)
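Because the Java binding of cuDeviceGetName takes a byte[] rather than a String, callers presumably need to turn the NULL-terminated ASCII bytes into a Java string themselves. The following is a minimal sketch of that conversion. The class DeviceNameSketch and its decode method are hypothetical helpers, and the buffer is filled by hand here instead of by a real driver call.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for decoding a NUL-terminated ASCII buffer; not part of JCuda. */
final class DeviceNameSketch {
    /** Decodes the bytes before the first NUL terminator as ASCII. */
    static String decode(byte[] name) {
        int end = 0;
        while (end < name.length && name[end] != 0) {
            end++;
        }
        return new String(name, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // In a real program this buffer would be filled by
        // JCudaDriver.cuDeviceGetName(name, name.length, dev).
        byte[] name = new byte[256];
        byte[] fake = "Example GPU".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(fake, 0, name, 0, fake.length);
        System.out.println(decode(name)); // prints "Example GPU"
    }
}
```

If the buffer contains no terminator at all, this sketch simply decodes the whole array.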
public static int cuDeviceComputeCapability(int[] major, int[] minor, CUdevice dev)
CUresult cuDeviceComputeCapability(int *major, int *minor, CUdevice dev)
Returns in *major and *minor the major and minor revision numbers that define the compute capability of the device dev.
See Also: cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(long[], jcuda.driver.CUdevice)
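A common use of the major and minor revision numbers is to check whether a device meets a minimum compute capability. The sketch below shows one way to do that comparison. The class ComputeCapabilitySketch and the method atLeast are hypothetical, and the two single-element arrays are filled by hand to mimic the int[] output parameters of the Java binding.

```java
/** Hypothetical helper for comparing compute capabilities; not part of JCuda. */
final class ComputeCapabilitySketch {
    /** Returns true if major.minor is at least requiredMajor.requiredMinor. */
    static boolean atLeast(int major, int minor, int requiredMajor, int requiredMinor) {
        if (major != requiredMajor) {
            return major > requiredMajor;
        }
        return minor >= requiredMinor;
    }

    public static void main(String[] args) {
        // In a real program these arrays would be filled by
        // JCudaDriver.cuDeviceComputeCapability(major, minor, dev).
        int[] major = {2};
        int[] minor = {1};
        System.out.println(atLeast(major[0], minor[0], 2, 0)); // prints "true"
        System.out.println(atLeast(major[0], minor[0], 3, 0)); // prints "false"
    }
}
```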
public static int cuDeviceTotalMem(long[] bytes, CUdevice dev)
CUresult cuDeviceTotalMem(size_t *bytes, CUdevice dev)
Returns in *bytes the total amount of memory available on the device dev in bytes.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice)
public static int cuDeviceGetProperties(CUdevprop prop, CUdevice dev)
CUresult cuDeviceGetProperties(CUdevprop *prop, CUdevice dev)
Returns in *prop the properties of device dev.
The CUdevprop structure is defined as:
typedef struct CUdevprop_st {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
    int sharedMemPerBlock;
    int totalConstantMemory;
    int SIMDWidth;
    int memPitch;
    int regsPerBlock;
    int clockRate;
    int textureAlign;
} CUdevprop;
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceTotalMem(long[], jcuda.driver.CUdevice)
public static int cuDeviceGetAttribute(int[] pi, int attrib, CUdevice dev)
CUresult cuDeviceGetAttribute(int *pi, CUdevice_attribute attrib, CUdevice dev)
Returns in *pi the integer value of the attribute attrib on device dev. The supported attributes are the values of CUdevice_attribute.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(long[], jcuda.driver.CUdevice)
public static int cuDriverGetVersion(int[] driverVersion)
CUresult cuDriverGetVersion(int *driverVersion)
Returns in *driverVersion the version number of the installed CUDA driver. This function automatically returns CUDA_ERROR_INVALID_VALUE if the driverVersion argument is NULL.
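The version number comes back as a single integer, so a caller will usually want to split it into major and minor parts. The sketch below assumes the encoding 1000 * major + 10 * minor (for example 4000 for CUDA 4.0). That encoding is not stated on this page and should be checked against the CUDA documentation. The class DriverVersionSketch is a hypothetical helper.

```java
/** Hypothetical helper for formatting an encoded CUDA version; not part of JCuda. */
final class DriverVersionSketch {
    /**
     * Formats an encoded version as "major.minor".
     * Assumed encoding: 1000 * major + 10 * minor.
     */
    static String format(int encoded) {
        int major = encoded / 1000;
        int minor = (encoded % 1000) / 10;
        return major + "." + minor;
    }

    public static void main(String[] args) {
        // In a real program this array would be filled by
        // JCudaDriver.cuDriverGetVersion(driverVersion).
        int[] driverVersion = {4000};
        System.out.println(format(driverVersion[0])); // prints "4.0"
    }
}
```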
public static int cuCtxCreate(CUcontext pctx, int flags, CUdevice dev)
CUresult cuCtxCreate(CUcontext *pctx, unsigned int flags, CUdevice dev)
Creates a new CUDA context and associates it with the calling thread. The flags parameter is described below. The context is created with a usage count of 1, and the caller of cuCtxCreate() must call cuCtxDestroy() when done using the context. If a context is already current to the thread, it is supplanted by the newly created context and may be restored by a subsequent call to cuCtxPopCurrent().
The three LSBs of the flags parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler when waiting for results from the GPU. Only one of the scheduling flags can be set when creating a context.
If the flags parameter is zero, CUDA uses a heuristic based on the number of active CUDA contexts in the process, C, and the number of logical processors in the system, P. If C > P, then CUDA will yield to other OS threads when waiting for the GPU; otherwise CUDA will not yield while waiting for results and will actively spin on the processor.
Note to Linux users:
Context creation will fail with CUDA_ERROR_UNKNOWN if the compute mode of the device is CU_COMPUTEMODE_PROHIBITED. Similarly, context creation will also fail with CUDA_ERROR_UNKNOWN if the compute mode for the device is set to CU_COMPUTEMODE_EXCLUSIVE and there is already an active context on the device. The function cuDeviceGetAttribute() can be used with CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to determine the compute mode of the device. The nvidia-smi tool can be used to set the compute mode for devices. Documentation for nvidia-smi can be obtained by passing a -h option to it.
See Also: cuCtxDestroy(jcuda.driver.CUcontext), cuCtxGetCacheConfig(int[]), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxGetLimit(long[], int), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetCacheConfig(int), cuCtxSetLimit(int, long), cuCtxSynchronize()
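The default scheduling heuristic described for cuCtxCreate() (flags equal to zero) can be written down as a small decision function. This is only a model of the rule as worded above, not driver code. The class SchedulingHeuristicSketch, the WaitMode enum, and the choose method are hypothetical names.

```java
/** Toy model of the default scheduling heuristic; not driver or JCuda code. */
final class SchedulingHeuristicSketch {
    /** How the host thread waits for the GPU in this model. */
    enum WaitMode { YIELD, SPIN }

    /**
     * Applies the rule "if C > P then yield, otherwise spin", where C is the
     * number of active CUDA contexts in the process and P is the number of
     * logical processors in the system.
     */
    static WaitMode choose(int activeContexts, int logicalProcessors) {
        return activeContexts > logicalProcessors ? WaitMode.YIELD : WaitMode.SPIN;
    }

    public static void main(String[] args) {
        int logicalProcessors = Runtime.getRuntime().availableProcessors();
        // With a single active context the model picks SPIN on any machine,
        // because 1 > P is never true for P >= 1.
        System.out.println(choose(1, logicalProcessors));
    }
}
```

Note that the boundary case C == P falls on the spinning side, since the rule only yields when C is strictly greater than P.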
public static int cuCtxDestroy(CUcontext ctx)
CUresult cuCtxDestroy(CUcontext ctx)
Destroys the CUDA context specified by ctx. The context ctx will be destroyed regardless of how many threads it is current to. It is the caller's responsibility to ensure that no API call is issued to ctx while cuCtxDestroy() is executing.
If ctx is current to the calling thread then ctx will also be popped from the current thread's context stack (as though cuCtxPopCurrent() were called). If ctx is current to other threads, then ctx will remain current to those threads, and attempting to access ctx from those threads will result in the error CUDA_ERROR_CONTEXT_IS_DESTROYED.
See Also: cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxGetCacheConfig(int[]), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxGetLimit(long[], int), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetCacheConfig(int), cuCtxSetLimit(int, long), cuCtxSynchronize()
public static int cuCtxAttach(CUcontext pctx, int flags)
CUresult cuCtxAttach(CUcontext *pctx, unsigned int flags)
Increments the usage count of the context and passes back a context handle in *pctx that must be passed to cuCtxDetach() when the application is done with the context. cuCtxAttach() fails if there is no context current to the thread.
Currently, the flags parameter must be 0.
See Also: cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetCacheConfig(int[]), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxGetLimit(long[], int), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetCacheConfig(int), cuCtxSetLimit(int, long), cuCtxSynchronize()
public static int cuCtxDetach(CUcontext ctx)
CUresult cuCtxDetach(CUcontext ctx)
Decrements the usage count of the context ctx, and destroys the context if the usage count goes to 0. The context must be a handle that was passed back by cuCtxCreate() or cuCtxAttach(), and must be current to the calling thread.
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
,
cuCtxGetCacheConfig(int[])
,
cuCtxGetDevice(jcuda.driver.CUdevice)
,
cuCtxGetLimit(long[], int)
,
cuCtxPopCurrent(jcuda.driver.CUcontext)
,
cuCtxPushCurrent(jcuda.driver.CUcontext)
,
cuCtxSetCacheConfig(int)
,
cuCtxSetLimit(int, long)
,
cuCtxSynchronize()
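The usage count that cuCtxAttach() and cuCtxDetach() maintain can be pictured with a small model. The following is only a sketch in plain Java: it does not call JCuda, the class name UsageCountModel is hypothetical, and the starting count of 1 after creation is an assumption made for the illustration.

```java
// Pure-Java model of the usage count described above; it makes no JCuda calls.
// UsageCountModel is a hypothetical name used only for this illustration.
final class UsageCountModel {
    // Assumption: a freshly created context starts with a usage count of 1.
    private int usageCount = 1;
    private boolean destroyed = false;

    /** Models cuCtxAttach(): increments the usage count. */
    void attach() {
        if (destroyed) {
            throw new IllegalStateException("context already destroyed");
        }
        usageCount++;
    }

    /** Models cuCtxDetach(): decrements the count and destroys the context at 0. */
    void detach() {
        if (destroyed) {
            throw new IllegalStateException("context already destroyed");
        }
        usageCount--;
        if (usageCount == 0) {
            destroyed = true;
        }
    }

    int usageCount() {
        return usageCount;
    }

    boolean isDestroyed() {
        return destroyed;
    }

    public static void main(String[] args) {
        UsageCountModel ctx = new UsageCountModel();
        ctx.attach();   // a second user attaches
        ctx.detach();   // one user is done
        System.out.println("destroyed after first detach: " + ctx.isDestroyed());
        ctx.detach();   // the last user is done
        System.out.println("destroyed after second detach: " + ctx.isDestroyed());
    }
}
```

In this model every attach() must be balanced by one detach(), and the context only goes away when the last holder detaches.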
public static int cuCtxPushCurrent(CUcontext ctx)
CUresult cuCtxPushCurrent(CUcontext ctx)
Pushes the given context ctx onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all CUDA functions that operate on the current context are affected.
The previous current context may be made current again by calling cuCtxDestroy() or cuCtxPopCurrent().
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
,
cuCtxGetCacheConfig(int[])
,
cuCtxGetDevice(jcuda.driver.CUdevice)
,
cuCtxGetLimit(long[], int)
,
cuCtxPopCurrent(jcuda.driver.CUcontext)
,
cuCtxSetCacheConfig(int)
,
cuCtxSetLimit(int, long)
,
cuCtxSynchronize()
public static int cuCtxPopCurrent(CUcontext pctx)
CUresult cuCtxPopCurrent(CUcontext *pctx)
Pops the current CUDA context from the CPU thread and passes back the old context handle in *pctx. That context may then be made current to a different CPU thread by calling cuCtxPushCurrent().
If a context was current to the CPU thread before cuCtxCreate() or cuCtxPushCurrent() was called, this function makes that context current to the CPU thread again.
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
,
cuCtxGetCacheConfig(int[])
,
cuCtxGetDevice(jcuda.driver.CUdevice)
,
cuCtxGetLimit(long[], int)
,
cuCtxPushCurrent(jcuda.driver.CUcontext)
,
cuCtxSetCacheConfig(int)
,
cuCtxSetLimit(int, long)
,
cuCtxSynchronize()
public static int cuCtxSetCurrent(CUcontext ctx)
CUresult cuCtxSetCurrent(CUcontext ctx)
Binds the specified CUDA context to the calling CPU thread. If ctx is NULL then the CUDA context previously bound to the calling CPU thread is unbound and CUDA_SUCCESS is returned.
If there exists a CUDA context stack on the calling CPU thread, this will replace the top of that stack with ctx. If ctx is NULL then this will be equivalent to popping the top of the calling CPU thread's CUDA context stack (or a no-op if the calling CPU thread's CUDA context stack is empty).
cuCtxGetCurrent(jcuda.driver.CUcontext)
,
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
public static int cuCtxGetCurrent(CUcontext pctx)
CUresult cuCtxGetCurrent(CUcontext *pctx)
Returns in *pctx the CUDA context bound to the calling CPU thread. If no context is bound to the calling CPU thread then *pctx is set to NULL and CUDA_SUCCESS is returned.
cuCtxSetCurrent(jcuda.driver.CUcontext)
,
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
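The per-thread context stack that cuCtxPushCurrent(), cuCtxPopCurrent(), cuCtxSetCurrent() and cuCtxGetCurrent() operate on can be modelled in a few lines. This is a sketch in plain Java, not a use of the JCuda API: contexts are represented by strings, the class name ContextStackModel is hypothetical, and binding a context to a thread with an empty stack is assumed to behave like a push.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Pure-Java model of one CPU thread's context stack; it makes no JCuda calls.
// ContextStackModel is a hypothetical name; contexts are plain strings here.
final class ContextStackModel {
    private final Deque<String> stack = new ArrayDeque<>();

    /** Models cuCtxPushCurrent(): ctx becomes the thread's current context. */
    void push(String ctx) {
        stack.push(ctx);
    }

    /** Models cuCtxPopCurrent(): removes and returns the current context. */
    String pop() {
        return stack.pop();
    }

    /** Models cuCtxSetCurrent(): replaces the top of the stack, or pops it for null. */
    void setCurrent(String ctx) {
        if (ctx == null) {
            stack.poll();          // no-op when the stack is empty
            return;
        }
        if (!stack.isEmpty()) {
            stack.pop();           // replace the top of the stack
        }
        stack.push(ctx);           // assumption: an empty stack simply receives ctx
    }

    /** Models cuCtxGetCurrent(): null when no context is bound. */
    String current() {
        return stack.peek();
    }

    int depth() {
        return stack.size();
    }

    public static void main(String[] args) {
        ContextStackModel thread = new ContextStackModel();
        thread.push("A");
        thread.push("B");
        System.out.println("popped: " + thread.pop());
        System.out.println("current: " + thread.current());
        thread.setCurrent("C");
        System.out.println("current after setCurrent: " + thread.current());
    }
}
```

The model makes the difference between pushing and setting visible: push() grows the stack, while setCurrent() keeps its depth unchanged unless the stack was empty or the argument is null.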
public static int cuCtxGetDevice(CUdevice device)
CUresult cuCtxGetDevice(CUdevice *device)
Returns in *device the ordinal of the current context's device.
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
,
cuCtxGetCacheConfig(int[])
,
cuCtxGetLimit(long[], int)
,
cuCtxPopCurrent(jcuda.driver.CUcontext)
,
cuCtxPushCurrent(jcuda.driver.CUcontext)
,
cuCtxSetCacheConfig(int)
,
cuCtxSetLimit(int, long)
,
cuCtxSynchronize()
public static int cuCtxSynchronize()
CUresult cuCtxSynchronize(void)
Blocks until the device has completed all preceding requested tasks. cuCtxSynchronize() returns an error if one of the preceding tasks failed. If the context was created with the CU_CTX_SCHED_BLOCKING_SYNC flag, the CPU thread will block until the GPU context has finished its work.
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuCtxDestroy(jcuda.driver.CUcontext)
,
cuCtxGetCacheConfig(int[])
,
cuCtxGetDevice(jcuda.driver.CUdevice)
,
cuCtxGetLimit(long[], int)
,
cuCtxPopCurrent(jcuda.driver.CUcontext)
,
cuCtxPushCurrent(jcuda.driver.CUcontext)
,
cuCtxSetLimit(int, long)
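All methods of this class return the CUresult status as a plain int, so a caller that ignores the return value will not notice a failed task. One possible way to handle this is a small checking helper, sketched below. The class name CudaResultCheck and its check method are hypothetical and not part of JCuda; the only value taken from CUDA is CUDA_SUCCESS being 0.

```java
// Hypothetical helper for turning int status codes into exceptions.
// It is plain Java and makes no JCuda calls.
final class CudaResultCheck {
    // CUDA_SUCCESS has the value 0 in the CUDA driver API.
    static final int CUDA_SUCCESS = 0;

    /** Throws if result is not CUDA_SUCCESS; call names the function for the message. */
    static void check(int result, String call) {
        if (result != CUDA_SUCCESS) {
            throw new IllegalStateException(call + " failed with CUresult " + result);
        }
    }

    public static void main(String[] args) {
        // In real code the first argument would be the return value of a
        // JCudaDriver method, for example cuCtxSynchronize().
        check(0, "cuCtxSynchronize");
        System.out.println("status 0 accepted");
        try {
            check(1, "cuCtxSynchronize");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```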
public static int cuModuleLoad(CUmodule module, java.lang.String fname)
CUresult cuModuleLoad(CUmodule *module, const char *fname)
Takes a filename fname and loads the corresponding module module into the current context. The CUDA driver API does not attempt to lazily allocate the resources needed by a module; if the memory for functions and data (constant and global) needed by the module cannot be allocated, cuModuleLoad() fails. The file should be a cubin file as output by nvcc, or a PTX file either as output by nvcc or handwritten, or a fatbin file as output by nvcc from toolchain 4.0 or later.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleLoadData(CUmodule module, byte[] image)
CUresult cuModuleLoadData(CUmodule *module, const void *image)
Takes a pointer image and loads the corresponding module module into the current context. The pointer may be obtained by mapping a cubin or PTX or fatbin file, passing a cubin or PTX or fatbin file as a NULL-terminated text string, or incorporating a cubin or fatbin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleLoadDataEx(CUmodule phMod, Pointer p, int numOptions, int[] options, Pointer optionValues)
CUresult cuModuleLoadDataEx(CUmodule *module, const void *image, unsigned int numOptions, CUjit_option *options, void **optionValues)
Takes a pointer image and loads the corresponding module module into the current context. The pointer may be obtained by mapping a cubin or PTX or fatbin file, passing a cubin or PTX or fatbin file as a NULL-terminated text string, or incorporating a cubin or fatbin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer. Options are passed as an array via options and any corresponding parameters are passed in optionValues. The number of total options is supplied via numOptions. Any outputs will be returned via optionValues. Supported options are (types for the option values are specified in parentheses after the option name):
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleLoadFatBinary(CUmodule module, byte[] fatCubin)
CUresult cuModuleLoadFatBinary(CUmodule *module, const void *fatCubin)
Takes a pointer fatCubin and loads the corresponding module module into the current context. The pointer represents a fat binary object, which is a collection of different cubin and/or PTX files, all representing the same device code, but compiled and optimized for different architectures.
Prior to CUDA 4.0, there was no documented API for constructing and using fat binary objects by programmers. Starting with CUDA 4.0, fat binary objects can be constructed by providing the -fatbin option to nvcc. More information can be found in the nvcc document.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleUnload(CUmodule hmod)
CUresult cuModuleUnload(CUmodule hmod)
Unloads a module hmod from the current context.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
public static int cuModuleGetFunction(CUfunction hfunc, CUmodule hmod, java.lang.String name)
CUresult cuModuleGetFunction(CUfunction *hfunc, CUmodule hmod, const char *name)
Returns in *hfunc the handle of the function of name name located in module hmod. If no function of that name exists, cuModuleGetFunction() returns CUDA_ERROR_NOT_FOUND.
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleGetGlobal(CUdeviceptr dptr, long[] bytes, CUmodule hmod, java.lang.String name)
CUresult cuModuleGetGlobal(CUdeviceptr *dptr, size_t *bytes, CUmodule hmod, const char *name)
Returns in *dptr and *bytes the base pointer and size of the global of name name located in module hmod. If no variable of that name exists, cuModuleGetGlobal() returns CUDA_ERROR_NOT_FOUND. Both parameters dptr and bytes are optional. If one of them is NULL, it is ignored.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleGetTexRef(CUtexref pTexRef, CUmodule hmod, java.lang.String name)
CUresult cuModuleGetTexRef(CUtexref *pTexRef, CUmodule hmod, const char *name)
Returns in *pTexRef the handle of the texture reference of name name in the module hmod. If no texture reference of that name exists, cuModuleGetTexRef() returns CUDA_ERROR_NOT_FOUND. This texture reference handle should not be destroyed, since it will be destroyed when the module is unloaded.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetSurfRef(jcuda.driver.CUsurfref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleGetSurfRef(CUsurfref pSurfRef, CUmodule hmod, java.lang.String name)
CUresult cuModuleGetSurfRef(CUsurfref *pSurfRef, CUmodule hmod, const char *name)
Returns in *pSurfRef the handle of the surface reference of name name in the module hmod. If no surface reference of that name exists, cuModuleGetSurfRef() returns CUDA_ERROR_NOT_FOUND.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuMemGetInfo(long[] free, long[] total)
CUresult cuMemGetInfo(size_t *free, size_t *total)
Returns in *free and *total, respectively, the free and total amount of memory available for allocation by the CUDA context, in bytes.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemHostAlloc(Pointer pp, long bytes, int Flags)
CUresult cuMemHostAlloc(void **pp, size_t bytesize, unsigned int Flags)
Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpyHtoD(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
The Flags parameter enables different options to be specified that affect the allocation, as follows.
CU_MEMHOSTALLOC_PORTABLE: The host memory is portable between CUDA contexts.
CU_MEMHOSTALLOC_DEVICEMAP: The host memory is mapped into the CUDA address space and cuMemHostGetDevicePointer() may be called on the host pointer.
CU_MEMHOSTALLOC_WRITECOMBINED: The host memory is allocated as write-combined: fast to write, faster to DMA, slow to read except via the SSE4 streaming load instruction (MOVNTDQA).
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
The CUDA context must have been created with the CU_CTX_MAP_HOST flag in order for the CU_MEMHOSTALLOC_DEVICEMAP flag to have any effect.
The CU_MEMHOSTALLOC_DEVICEMAP flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cuMemHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the CU_MEMHOSTALLOC_PORTABLE flag.
The memory allocated by this function must be freed with cuMemFreeHost().
Note that all host memory allocated using cuMemHostAlloc() will automatically be immediately accessible to all contexts on all devices which support unified addressing (as may be queried using CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING). Unless the flag CU_MEMHOSTALLOC_WRITECOMBINED is specified, the device pointer that may be used to access this host memory from those contexts is always equal to the returned host pointer *pp. If the flag CU_MEMHOSTALLOC_WRITECOMBINED is specified, then the function cuMemHostGetDevicePointer() must be used to query the device pointer, even if the context supports unified addressing. See Unified Addressing for additional details.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
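Because the cuMemHostAlloc() flags are orthogonal, they are combined with a bitwise OR and tested with a bitwise AND. The sketch below shows this in plain Java without calling JCuda. The class name HostAllocFlags and its constants are hypothetical stand-ins; their numeric values are assumed to mirror the CUDA header and real code should use the constants of JCudaDriver instead.

```java
// Plain-Java illustration of combining and testing host allocation flags.
// HostAllocFlags is a hypothetical name. The numeric values are assumed to
// mirror CU_MEMHOSTALLOC_PORTABLE, _DEVICEMAP and _WRITECOMBINED; real code
// should use the constants defined in JCudaDriver.
final class HostAllocFlags {
    static final int PORTABLE = 0x01;
    static final int DEVICEMAP = 0x02;
    static final int WRITECOMBINED = 0x04;

    /** Returns true if the given flag bit is set in flags. */
    static boolean isSet(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /**
     * Under unified addressing the device pointer equals the host pointer
     * unless the allocation is write-combined, in which case it has to be
     * queried with cuMemHostGetDevicePointer().
     */
    static boolean mustQueryDevicePointer(int flags) {
        return isSet(flags, WRITECOMBINED);
    }

    public static void main(String[] args) {
        int flags = PORTABLE | DEVICEMAP;
        System.out.println("portable: " + isSet(flags, PORTABLE));
        System.out.println("must query device pointer: " + mustQueryDevicePointer(flags));
        System.out.println("with write-combining: "
            + mustQueryDevicePointer(flags | WRITECOMBINED));
    }
}
```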
public static int cuMemHostGetDevicePointer(CUdeviceptr ret, Pointer p, int Flags)
CUresult cuMemHostGetDevicePointer(CUdeviceptr *pdptr, void *p, unsigned int Flags)
Passes back the device pointer pdptr corresponding to the mapped, pinned host buffer p allocated by cuMemHostAlloc().
cuMemHostGetDevicePointer() will fail if the CU_MEMHOSTALLOC_DEVICEMAP flag was not specified at the time the memory was allocated, or if the function is called on a GPU that does not support mapped pinned memory.
Flags is reserved for future releases. For now, it must be set to 0.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemHostGetFlags(int[] pFlags, Pointer p)
CUresult cuMemHostGetFlags(unsigned int *pFlags, void *p)
Passes back the flags pFlags that were specified when the pinned host buffer p was allocated by cuMemHostAlloc().
cuMemHostGetFlags() will fail if the pointer does not reside in an allocation performed by cuMemAllocHost() or cuMemHostAlloc().
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemHostAlloc(jcuda.Pointer, long, int)
public static int cuMemHostRegister(Pointer p, long bytesize, int Flags)
CUresult cuMemHostRegister(void *p, size_t bytesize, unsigned int Flags)
Page-locks the memory range specified by p and bytesize and maps it for the device(s) as specified by Flags. This memory range is also added to the same tracking mechanism as cuMemHostAlloc() to automatically accelerate calls to functions such as cuMemcpyHtoD(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory that has not been registered. Page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to register staging areas for data exchange between host and device.
This function is not yet supported on Mac OS X.
The Flags parameter enables different options to be specified that affect the allocation, as follows.
CU_MEMHOSTREGISTER_PORTABLE: The host memory is portable between CUDA contexts.
CU_MEMHOSTREGISTER_DEVICEMAP: The host memory is mapped into the CUDA address space and cuMemHostGetDevicePointer() may be called on the host pointer.
All of these flags are orthogonal to one another: a developer may page-lock memory that is portable or mapped with no restrictions.
The CUDA context must have been created with the CU_CTX_MAP_HOST flag in order for the CU_MEMHOSTREGISTER_DEVICEMAP flag to have any effect.
The CU_MEMHOSTREGISTER_DEVICEMAP flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cuMemHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the CU_MEMHOSTREGISTER_PORTABLE flag.
The pointer p and size bytesize must be aligned to the host page size (4 KB).
The memory page-locked by this function must be unregistered with cuMemHostUnregister().
cuMemHostUnregister(jcuda.Pointer)
,
cuMemHostGetFlags(int[], jcuda.Pointer)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
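The alignment requirement of cuMemHostRegister() comes down to simple bit arithmetic on the address and the size. The following sketch shows the arithmetic in plain Java, with addresses modelled as long values and no JCuda calls. The class name PageAlignment is hypothetical, and the page size is fixed to the 4 KB mentioned above, which is an assumption that may not hold on every platform.

```java
// Plain-Java illustration of page alignment arithmetic.
// PageAlignment is a hypothetical name; addresses are modelled as longs.
final class PageAlignment {
    // Assumption: the host page size is the 4 KB stated in the description.
    static final long PAGE_SIZE = 4096L;

    /** Returns true if value is a multiple of the page size. */
    static boolean isAligned(long value) {
        return (value & (PAGE_SIZE - 1)) == 0;
    }

    /** Rounds an address down to the start of its page. */
    static long alignDown(long address) {
        return address & ~(PAGE_SIZE - 1);
    }

    /** Rounds a size or address up to the next page boundary. */
    static long alignUp(long value) {
        return (value + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    }

    public static void main(String[] args) {
        long address = 0x10010L;  // hypothetical unaligned buffer address
        long length = 5000L;      // hypothetical buffer length in bytes
        long start = alignDown(address);
        long end = alignUp(address + length);
        System.out.println("range start aligned: " + isAligned(start));
        System.out.println("range size aligned: " + isAligned(end - start));
    }
}
```

A caller holding an unaligned buffer would register the enclosing range from alignDown(address) to alignUp(address + length), provided that the whole range belongs to memory it owns.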
public static int cuMemHostUnregister(Pointer p)
CUresult cuMemHostUnregister(void *p)
Unmaps the memory range whose base address is specified by p, and makes it pageable again.
The base address must be the same one specified to cuMemHostRegister().
cuMemHostRegister(jcuda.Pointer, long, int)
public static int cuMemcpy(CUdeviceptr dst, CUdeviceptr src, long ByteCount)
CUresult cuMemcpy(CUdeviceptr dst, CUdeviceptr src, size_t ByteCount)
Copies data between two pointers. dst and src are base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function infers the type of the transfer (host to host, host to device, device to device, or device to host) from the pointer values. This function is only allowed in contexts which support unified addressing. Note that this function is synchronous.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyPeer(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, long ByteCount)
CUresult cuMemcpyPeer(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount)
Copies from device memory in one context to device memory in another context. dstDevice is the base device pointer of the destination memory and dstContext is the destination context. srcDevice is the base device pointer of the source memory and srcContext is the source context. ByteCount specifies the number of bytes to copy.
Note that this function is asynchronous with respect to the host, but serialized with respect to all pending and future asynchronous work in the current context, srcContext, and dstContext (use cuMemcpyPeerAsync() to avoid this synchronization).
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpy3DPeer(jcuda.driver.CUDA_MEMCPY3D_PEER)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyPeerAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, long, jcuda.driver.CUstream)
,
cuMemcpy3DPeerAsync(jcuda.driver.CUDA_MEMCPY3D_PEER, jcuda.driver.CUstream)
public static int cuMemAlloc(CUdeviceptr dptr, long bytesize)
CUresult cuMemAlloc(CUdeviceptr *dptr, size_t bytesize)
Allocates bytesize bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. If bytesize is 0, cuMemAlloc() returns CUDA_ERROR_INVALID_VALUE.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemAllocPitch(CUdeviceptr dptr, long[] pPitch, long WidthInBytes, long Height, int ElementSizeBytes)
CUresult cuMemAllocPitch(CUdeviceptr *dptr, size_t *pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes)
Allocates at least WidthInBytes * Height bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. ElementSizeBytes specifies the size of the largest reads and writes that will be performed on the memory range. ElementSizeBytes may be 4, 8 or 16 (since coalesced memory transactions are not possible on other data sizes). If ElementSizeBytes is smaller than the actual read/write size of a kernel, the kernel will run correctly, but possibly at reduced speed. The pitch returned in *pPitch by cuMemAllocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:
T* pElement = (T*)((char*)BaseAddress + Row * Pitch) + Column;
The pitch returned by cuMemAllocPitch() is guaranteed to work with cuMemcpy2D() under all circumstances. For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cuMemAllocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).
The byte alignment of the pitch returned by cuMemAllocPitch() is guaranteed to match or exceed the alignment requirement for texture binding with cuTexRefSetAddress2D().
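The address formula above can be sketched in Java as a byte-offset computation. This is a minimal illustration that does not call JCuda: the class name PitchAddressing, the method byteOffset, and the pitch value of 512 are hypothetical, and a real pitch would be read from the pPitch array filled in by cuMemAllocPitch().

```java
// Hypothetical helper, not part of JCuda: computes where an element of a
// pitched 2D allocation lives, relative to the base address.
final class PitchAddressing {
    private PitchAddressing() {
    }

    /**
     * Returns the byte offset of element (row, column) from the base address,
     * which corresponds to Row * Pitch + Column * sizeof(T) in the C formula.
     */
    static long byteOffset(long row, long column, long pitchInBytes, int elementSizeBytes) {
        if (row < 0 || column < 0 || elementSizeBytes <= 0) {
            throw new IllegalArgumentException("row, column and element size must be valid");
        }
        long columnOffset = column * elementSizeBytes;
        if (columnOffset + elementSizeBytes > pitchInBytes) {
            throw new IllegalArgumentException("column lies outside the pitched row");
        }
        return row * pitchInBytes + columnOffset;
    }

    public static void main(String[] args) {
        long pitch = 512; // assumed value; normally taken from pPitch[0]
        // Element at row 2, column 3 of a float array (4 bytes per element).
        System.out.println(byteOffset(2, 3, pitch, 4));
    }
}
```

The point of the sketch is that rows are stepped by the pitch, not by WidthInBytes, because the allocation may be padded.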
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemFree(CUdeviceptr dptr)
CUresult cuMemFree(CUdeviceptr dptr)
Frees the memory space pointed to by dptr, which must have been returned by a previous call to cuMemAlloc() or cuMemAllocPitch().
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemGetAddressRange(CUdeviceptr pbase, long[] psize, CUdeviceptr dptr)
CUresult cuMemGetAddressRange(CUdeviceptr *pbase, size_t *psize, CUdeviceptr dptr)
Returns the base address in *pbase and size in *psize of the allocation by cuMemAlloc() or cuMemAllocPitch() that contains the input pointer dptr. Both parameters pbase and psize are optional. If one of them is NULL, it is ignored.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemAllocHost(Pointer pointer, long bytesize)
CUresult cuMemAllocHost(void **pp, size_t bytesize)
Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc().
Allocating excessive amounts of memory with cuMemAllocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
Note all host memory allocated using cuMemHostAlloc() will automatically be immediately accessible to all contexts on all devices which support unified addressing (as may be queried using CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING). The device pointer that may be used to access this host memory from those contexts is always equal to the returned host pointer *pp. See Unified Addressing for additional details.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemFreeHost(Pointer p)
CUresult cuMemFreeHost(void *p)
Frees the memory space pointed to by p, which must have been returned by a previous call to cuMemAllocHost().
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyHtoD(CUdeviceptr dstDevice, Pointer srcHost, long ByteCount)
CUresult cuMemcpyHtoD(CUdeviceptr dstDevice, const void *srcHost, size_t ByteCount)
Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyDtoH(Pointer dstHost, CUdeviceptr srcDevice, long ByteCount)
CUresult cuMemcpyDtoH(void *dstHost, CUdeviceptr srcDevice, size_t ByteCount)
Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyDtoD(CUdeviceptr dstDevice, CUdeviceptr srcDevice, long ByteCount)
CUresult cuMemcpyDtoD(CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount)
Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyDtoA(CUarray dstArray, long dstIndex, CUdeviceptr srcDevice, long ByteCount)
CUresult cuMemcpyDtoA(CUarray dstArray, size_t dstOffset, CUdeviceptr srcDevice, size_t ByteCount)
Copies from device memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting index of the destination data. srcDevice specifies the base pointer of the source. ByteCount specifies the number of bytes to copy.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyAtoD(CUdeviceptr dstDevice, CUarray hSrc, long SrcIndex, long ByteCount)
CUresult cuMemcpyAtoD(CUdeviceptr dstDevice, CUarray srcArray, size_t srcOffset, size_t ByteCount)
Copies from one 1D CUDA array to device memory. dstDevice specifies the base pointer of the destination and must be naturally aligned with the CUDA array elements. srcArray and srcOffset specify the CUDA array handle and the offset in bytes into the array where the copy is to begin. ByteCount specifies the number of bytes to copy and must be evenly divisible by the array element size.
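Because the offset is given in bytes and ByteCount must be a whole number of array elements, callers that think in element indices need a small conversion step. The following is a minimal sketch of that arithmetic only; the class ArrayCopyRange and its methods are hypothetical helpers and are not part of JCuda.

```java
// Hypothetical helper, not part of JCuda: converts element-based positions
// into the byte-based values described for the array copy functions.
final class ArrayCopyRange {
    private ArrayCopyRange() {
    }

    /** Converts an element index into an offset in bytes. */
    static long toByteOffset(long elementIndex, int elementSizeBytes) {
        if (elementIndex < 0 || elementSizeBytes <= 0) {
            throw new IllegalArgumentException("index and element size must be valid");
        }
        // multiplyExact throws ArithmeticException instead of overflowing silently.
        return Math.multiplyExact(elementIndex, (long) elementSizeBytes);
    }

    /** Checks that a byte count covers a whole number of elements. */
    static boolean isWholeElements(long byteCount, int elementSizeBytes) {
        if (elementSizeBytes <= 0) {
            throw new IllegalArgumentException("element size must be positive");
        }
        return byteCount >= 0 && byteCount % elementSizeBytes == 0;
    }

    public static void main(String[] args) {
        // Start at the 10th element of an array with 4-byte elements.
        System.out.println(toByteOffset(10, 4));
        // 64 bytes is a whole number of 4-byte elements; 6 bytes is not.
        System.out.println(isWholeElements(64, 4));
        System.out.println(isWholeElements(6, 4));
    }
}
```

Checking the byte count before the call is a design choice of this sketch; the driver performs its own validation and reports errors through the returned CUresult value.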
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyHtoA(CUarray dstArray, long dstIndex, Pointer pSrc, long ByteCount)
CUresult cuMemcpyHtoA(CUarray dstArray, size_t dstOffset, const void *srcHost, size_t ByteCount)
Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. pSrc specifies the base address of the source. ByteCount specifies the number of bytes to copy.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyAtoH(Pointer dstHost, CUarray srcArray, long srcIndex, long ByteCount)
CUresult cuMemcpyAtoH(void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount)
Copies from a 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpyAtoA(CUarray dstArray, long dstIndex, CUarray srcArray, long srcIndex, long ByteCount)
CUresult cuMemcpyAtoA(CUarray dstArray, size_t dstOffset, CUarray srcArray, size_t srcOffset, size_t ByteCount)
Copies from one 1D CUDA array to another. dstArray and srcArray specify the handles of the destination and source CUDA arrays for the copy, respectively. dstOffset and srcOffset specify the destination and source offsets in bytes into the CUDA arrays. ByteCount is the number of bytes to be copied. The elements in the two CUDA arrays need not have the same format, but they must be the same size, and ByteCount must be evenly divisible by that size.
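The size constraint described above can be illustrated with a small host-only check. This is only a sketch: the class ArrayCopyCheckSketch and the method isValidCopy are hypothetical names introduced here, the element sizes are assumed to be known to the caller in bytes, and the driver's own validation is not modeled.

```java
// Host-only illustration of the cuMemcpyAtoA size rule; no CUDA calls are made.
final class ArrayCopyCheckSketch {

    /**
     * Returns true if both arrays use the same element size (in bytes)
     * and byteCount is a whole multiple of that size.
     */
    static boolean isValidCopy(int dstElementSize, int srcElementSize, long byteCount) {
        return dstElementSize > 0
            && dstElementSize == srcElementSize
            && byteCount >= 0
            && byteCount % dstElementSize == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidCopy(4, 4, 16)); // same size, 16 is a multiple of 4
        System.out.println(isValidCopy(4, 4, 18)); // 18 is not a multiple of 4
        System.out.println(isValidCopy(4, 8, 16)); // element sizes differ
    }
}
```

A caller could run such a check before invoking cuMemcpyAtoA to fail early with a clearer message than a generic error code.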
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpy2D(CUDA_MEMCPY2D pCopy)
CUresult cuMemcpy2D(const CUDA_MEMCPY2D *pCopy)
Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;

srcMemoryType and dstMemoryType take values of the CUmemorytype enumeration:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03,
        CU_MEMORYTYPE_UNIFIED = 0x04
    } CUmemorytype;

For a host source (srcHost), the starting address is:

    void* Start = (void*)((char*)srcHost + srcY*srcPitch + srcXInBytes);

For a device source (srcDevice), the starting address is:

    CUdeviceptr Start = srcDevice + srcY*srcPitch + srcXInBytes;

For a host destination (dstHost), the starting address is:

    void* dstStart = (void*)((char*)dstHost + dstY*dstPitch + dstXInBytes);

For a device destination (dstDevice), the starting address is:

    CUdeviceptr dstStart = dstDevice + dstY*dstPitch + dstXInBytes;
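As an illustration of the starting-address formulas above, the following host-only Java sketch emulates a pitched 2D copy between two byte arrays. It does not call JCuda or CUDA. The class PitchedCopy2DSketch and its methods startOffset and copy2D are hypothetical names introduced here, and the parameter validation that the real driver performs is not modeled.

```java
// Host-only emulation of the 2D copy addressing; no CUDA calls are made.
final class PitchedCopy2DSketch {

    /** Byte offset of the first copied byte: y * pitch + xInBytes. */
    static long startOffset(long xInBytes, long y, long pitch) {
        return y * pitch + xInBytes;
    }

    /** Copies a widthInBytes x height rectangle between two pitched buffers. */
    static void copy2D(byte[] src, int srcXInBytes, int srcY, int srcPitch,
                       byte[] dst, int dstXInBytes, int dstY, int dstPitch,
                       int widthInBytes, int height) {
        int srcStart = (int) startOffset(srcXInBytes, srcY, srcPitch);
        int dstStart = (int) startOffset(dstXInBytes, dstY, dstPitch);
        // Each row advances by the pitch of its own buffer, not by the copy width.
        for (int row = 0; row < height; row++) {
            System.arraycopy(src, srcStart + row * srcPitch,
                             dst, dstStart + row * dstPitch,
                             widthInBytes);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[4 * 8];   // 4 rows with a pitch of 8 bytes
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[3 * 4];   // 3 rows with a pitch of 4 bytes
        // Copy a rectangle 2 bytes wide and 2 rows high,
        // from (x=3, y=1) in src to (x=1, y=1) in dst.
        copy2D(src, 3, 1, 8, dst, 1, 1, 4, 2, 2);
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

With these inputs the source rectangle starts at byte offset 1*8 + 3 = 11 and the destination rectangle at 1*4 + 1 = 5, so the program prints [0, 0, 0, 0, 0, 11, 12, 0, 0, 19, 20, 0].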
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpy2DUnaligned(CUDA_MEMCPY2D pCopy)
CUresult cuMemcpy2DUnaligned(const CUDA_MEMCPY2D *pCopy)
Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;

srcMemoryType and dstMemoryType take values of the CUmemorytype enumeration:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03,
        CU_MEMORYTYPE_UNIFIED = 0x04
    } CUmemorytype;

For a host source (srcHost), the starting address is:

    void* Start = (void*)((char*)srcHost + srcY*srcPitch + srcXInBytes);

For a device source (srcDevice), the starting address is:

    CUdeviceptr Start = srcDevice + srcY*srcPitch + srcXInBytes;

For a host destination (dstHost), the starting address is:

    void* dstStart = (void*)((char*)dstHost + dstY*dstPitch + dstXInBytes);

For a device destination (dstDevice), the starting address is:

    CUdeviceptr dstStart = dstDevice + dstY*dstPitch + dstXInBytes;
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpy3D(CUDA_MEMCPY3D pCopy)
CUresult cuMemcpy3D(const CUDA_MEMCPY3D *pCopy)
Perform a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:

    typedef struct CUDA_MEMCPY3D_st {
        unsigned int srcXInBytes, srcY, srcZ;
        unsigned int srcLOD;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;  // ignored when src is array
        unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1
        unsigned int dstXInBytes, dstY, dstZ;
        unsigned int dstLOD;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;  // ignored when dst is array
        unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1
        unsigned int WidthInBytes;
        unsigned int Height;
        unsigned int Depth;
    } CUDA_MEMCPY3D;

srcMemoryType and dstMemoryType take values of the CUmemorytype enumeration:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03,
        CU_MEMORYTYPE_UNIFIED = 0x04
    } CUmemorytype;

For a host source (srcHost), the starting address is:

    void* Start = (void*)((char*)srcHost + (srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);

For a device source (srcDevice), the starting address is:

    CUdeviceptr Start = srcDevice + (srcZ*srcHeight+srcY)*srcPitch + srcXInBytes;

For a host destination (dstHost), the starting address is:

    void* dstStart = (void*)((char*)dstHost + (dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);

For a device destination (dstDevice), the starting address is:

    CUdeviceptr dstStart = dstDevice + (dstZ*dstHeight+dstY)*dstPitch + dstXInBytes;
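The 3D starting-address formula above can be worked through with plain arithmetic. The following host-only Java sketch only computes the byte offset that is added to the base pointer; the class StartOffset3DSketch and the method startOffset are hypothetical names introduced here, and no JCuda or CUDA call is involved.

```java
// Host-only computation of the 3D start offset; no CUDA calls are made.
final class StartOffset3DSketch {

    /**
     * Byte offset relative to the base pointer:
     * (z * height + y) * pitch + xInBytes,
     * where pitch is the number of bytes per row and
     * height is the number of rows per 2D slice.
     */
    static long startOffset(long xInBytes, long y, long z, long pitch, long height) {
        return (z * height + y) * pitch + xInBytes;
    }

    public static void main(String[] args) {
        // 64 bytes per row, 10 rows per slice; start at x = 8 bytes, y = 2, z = 3.
        // (3 * 10 + 2) * 64 + 8 = 2056
        System.out.println(startOffset(8, 2, 3, 64, 10)); // prints 2056
    }
}
```

With z = 0 the expression reduces to y * pitch + xInBytes, which is the 2D case.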
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemcpy3DPeer(CUDA_MEMCPY3D_PEER pCopy)
CUresult cuMemcpy3DPeer(const CUDA_MEMCPY3D_PEER *pCopy)
Perform a 3D memory copy according to the parameters specified in pCopy. See the definition of the CUDA_MEMCPY3D_PEER structure for documentation of its parameters.
Note that this function is synchronous with respect to the host only if the source or destination memory is of type CU_MEMORYTYPE_HOST. Note also that this copy is serialized with respect to all pending and future asynchronous work in the current context, the copy's source context, and the copy's destination context (use cuMemcpy3DPeerAsync to avoid this synchronization).
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyPeer(jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyPeerAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, long, jcuda.driver.CUstream)
,
cuMemcpy3DPeerAsync(jcuda.driver.CUDA_MEMCPY3D_PEER, jcuda.driver.CUstream)
public static int cuMemcpyAsync(CUdeviceptr dst, CUdeviceptr src, long ByteCount, CUstream hStream)
CUresult cuMemcpyAsync(CUdeviceptr dst, CUdeviceptr src, size_t ByteCount, CUstream hStream)
Copies data between two pointers. dst and src are base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function infers the type of the transfer (host to host, host to device, device to device, or device to host) from the pointer values. This function is only allowed in contexts which support unified addressing. Note that this function is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpyPeerAsync(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, long ByteCount, CUstream hStream)
CUresult cuMemcpyPeerAsync(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount, CUstream hStream)
Copies from device memory in one context to device memory in another context. dstDevice is the base device pointer of the destination memory and dstContext is the destination context. srcDevice is the base device pointer of the source memory and srcContext is the source context. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous with respect to the host and all work in other streams in other devices.
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyPeer(jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, long)
,
cuMemcpy3DPeer(jcuda.driver.CUDA_MEMCPY3D_PEER)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpy3DPeerAsync(jcuda.driver.CUDA_MEMCPY3D_PEER, jcuda.driver.CUstream)
public static int cuMemcpyHtoDAsync(CUdeviceptr dstDevice, Pointer srcHost, long ByteCount, CUstream hStream)
CUresult cuMemcpyHtoDAsync(CUdeviceptr dstDevice, const void *srcHost, size_t ByteCount, CUstream hStream)
Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyHtoDAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpyDtoHAsync(Pointer dstHost, CUdeviceptr srcDevice, long ByteCount, CUstream hStream)
CUresult cuMemcpyDtoHAsync(void *dstHost, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)
Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyDtoHAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpyDtoDAsync(CUdeviceptr dstDevice, CUdeviceptr srcDevice, long ByteCount, CUstream hStream)
CUresult cuMemcpyDtoDAsync(CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)
Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpyHtoAAsync(CUarray dstArray, long dstIndex, Pointer pSrc, long ByteCount, CUstream hStream)
CUresult cuMemcpyHtoAAsync(CUarray dstArray, size_t dstOffset, const void *srcHost, size_t ByteCount, CUstream hStream)
Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy. In the Java signature, dstOffset and srcHost appear as dstIndex and pSrc.
cuMemcpyHtoAAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpyAtoHAsync(Pointer dstHost, CUarray srcArray, long srcIndex, long ByteCount, CUstream hStream)
CUresult cuMemcpyAtoHAsync(void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount, CUstream hStream)
Copies from a 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy. In the Java signature, srcOffset appears as srcIndex.
cuMemcpyAtoHAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpy2DAsync(CUDA_MEMCPY2D pCopy, CUstream hStream)
CUresult cuMemcpy2DAsync(const CUDA_MEMCPY2D *pCopy, CUstream hStream)
Performs a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:
    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;
The srcMemoryType and dstMemoryType members take a value of the CUmemorytype enumeration:
    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03,
        CU_MEMORYTYPE_UNIFIED = 0x04
    } CUmemorytype;
srcXInBytes and srcY specify the base address of the source data for the copy. For host pointers, the starting address is:
    void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
For device pointers, the starting address is:
    CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
dstXInBytes and dstY specify the base address of the destination data for the copy. For host pointers, the starting address is:
    void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
For device pointers, the starting address is:
    CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
cuMemcpy2DAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
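The start-address formulas for host and device pointers can be checked on the host without a GPU. The following is a minimal sketch that models them with plain Java byte arrays; it does not call JCuda, and the class name PitchedCopy2D and its methods are hypothetical helpers introduced only for illustration.

```java
import java.util.Arrays;

/** Host-side model of the CUDA_MEMCPY2D addressing rules. It does not call JCuda. */
final class PitchedCopy2D {

    /** Byte offset of the first copied byte inside a pitched buffer: y * pitch + xInBytes. */
    static long startOffset(long xInBytes, long y, long pitch) {
        return y * pitch + xInBytes;
    }

    /** Copies a widthInBytes x height rectangle between two pitched byte buffers. */
    static void copy(byte[] src, long srcXInBytes, long srcY, long srcPitch,
                     byte[] dst, long dstXInBytes, long dstY, long dstPitch,
                     int widthInBytes, int height) {
        long srcStart = startOffset(srcXInBytes, srcY, srcPitch);
        long dstStart = startOffset(dstXInBytes, dstY, dstPitch);
        for (int row = 0; row < height; row++) {
            // Each row begins one pitch after the previous one; only widthInBytes are copied.
            System.arraycopy(src, (int) (srcStart + row * srcPitch),
                             dst, (int) (dstStart + row * dstPitch),
                             widthInBytes);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[4 * 8];            // 4 rows with a pitch of 8 bytes
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[3 * 4];            // 3 rows with a pitch of 4 bytes
        // Copy a 2-byte by 2-row rectangle from (x=3, y=1) in src to (x=1, y=1) in dst.
        copy(src, 3, 1, 8, dst, 1, 1, 4, 2, 2);
        System.out.println(Arrays.toString(dst));
    }
}
```

The same arithmetic determines which bytes a real copy touches, so a model like this may help when choosing values for srcPitch, dstPitch, and WidthInBytes before filling in a CUDA_MEMCPY2D instance.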
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpy3DAsync(CUDA_MEMCPY3D pCopy, CUstream hStream)
CUresult cuMemcpy3DAsync(const CUDA_MEMCPY3D *pCopy, CUstream hStream)
Performs a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:
    typedef struct CUDA_MEMCPY3D_st {
        unsigned int srcXInBytes, srcY, srcZ;
        unsigned int srcLOD;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;  // ignored when src is array
        unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1
        unsigned int dstXInBytes, dstY, dstZ;
        unsigned int dstLOD;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;  // ignored when dst is array
        unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1
        unsigned int WidthInBytes;
        unsigned int Height;
        unsigned int Depth;
    } CUDA_MEMCPY3D;
The srcMemoryType and dstMemoryType members take a value of the CUmemorytype enumeration:
    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03,
        CU_MEMORYTYPE_UNIFIED = 0x04
    } CUmemorytype;
srcXInBytes, srcY and srcZ specify the base address of the source data for the copy. For host pointers, the starting address is:
    void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);
For device pointers, the starting address is:
    CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;
dstXInBytes, dstY and dstZ specify the base address of the destination data for the copy. For host pointers, the starting address is:
    void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);
For device pointers, the starting address is:
    CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;
cuMemcpy3DAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
The srcLOD and dstLOD members of the CUDA_MEMCPY3D structure must be set to 0.
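The 3D start-address expressions extend the 2D case by treating each slice as Height rows of Pitch bytes. The sketch below evaluates that expression in plain Java as a way to sanity-check values before they are placed in a CUDA_MEMCPY3D instance; the class PitchedOffset3D is a hypothetical helper, not part of JCuda.

```java
/** Host-side model of the CUDA_MEMCPY3D start-address formula. It does not call JCuda. */
final class PitchedOffset3D {

    /**
     * Byte offset of the element at (xInBytes, y, z) in a pitched 3D buffer:
     * (z * height + y) * pitch + xInBytes.
     */
    static long startOffset(long xInBytes, long y, long z, long pitch, long height) {
        return (z * height + y) * pitch + xInBytes;
    }

    public static void main(String[] args) {
        long pitch = 64;   // bytes between consecutive rows
        long height = 8;   // rows per slice
        // With z = 0 the expression reduces to the 2D form y * pitch + xInBytes.
        System.out.println(startOffset(4, 2, 0, pitch, height));
        // With z = 1 the offset skips one full slice of height * pitch bytes first.
        System.out.println(startOffset(4, 2, 1, pitch, height));
    }
}
```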
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemcpy3DPeerAsync(CUDA_MEMCPY3D_PEER pCopy, CUstream hStream)
CUresult cuMemcpy3DPeerAsync(const CUDA_MEMCPY3D_PEER *pCopy, CUstream hStream)
Performs a 3D memory copy according to the parameters specified in pCopy. See the definition of the CUDA_MEMCPY3D_PEER structure for documentation of its parameters.
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyPeer(jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyPeerAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, jcuda.driver.CUdeviceptr, jcuda.driver.CUcontext, long, jcuda.driver.CUstream)
,
cuMemcpy3DPeerAsync(jcuda.driver.CUDA_MEMCPY3D_PEER, jcuda.driver.CUstream)
public static int cuMemsetD8(CUdeviceptr dstDevice, byte uc, long N)
CUresult cuMemsetD8(CUdeviceptr dstDevice, unsigned char uc, size_t N)
Sets the memory range of N 8-bit values to the specified value uc.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD16(CUdeviceptr dstDevice, short us, long N)
CUresult cuMemsetD16(CUdeviceptr dstDevice, unsigned short us, size_t N)
Sets the memory range of N 16-bit values to the specified value us.
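The C prototypes of the memset functions take unsigned values, while the Java signatures take the signed types byte, short and int, and N counts elements rather than bytes. The sketch below shows one way to prepare such arguments in plain Java. It assumes the binding forwards the bit pattern unchanged to the native call, which this page does not state; MemsetValues and its methods are hypothetical helpers, not part of JCuda.

```java
/** Helpers for preparing cuMemsetD8/D16 style arguments. It does not call JCuda. */
final class MemsetValues {

    /** Narrows a value in [0, 255] to a Java byte that carries the same 8 bits. */
    static byte unsignedChar(int value) {
        if (value < 0 || value > 0xFF) {
            throw new IllegalArgumentException("not an 8-bit value: " + value);
        }
        return (byte) value;
    }

    /** Narrows a value in [0, 65535] to a Java short that carries the same 16 bits. */
    static short unsignedShort(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("not a 16-bit value: " + value);
        }
        return (short) value;
    }

    /** Number of bytes covered by n elements of the given size; throws on overflow. */
    static long byteCount(long n, int elementSizeInBytes) {
        return Math.multiplyExact(n, (long) elementSizeInBytes);
    }

    public static void main(String[] args) {
        short us = unsignedShort(0xABCD);
        System.out.println(us);                       // negative when read as a signed short
        System.out.println(Short.toUnsignedInt(us));  // the same bits read back as unsigned
        System.out.println(byteCount(1024, 2));       // bytes spanned by 1024 16-bit values
    }
}
```

Validating the range before the narrowing cast keeps an out-of-range value from being silently truncated.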
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD32(CUdeviceptr dstDevice, int ui, long N)
CUresult cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, size_t N)
Sets the memory range of N 32-bit values to the specified value ui.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD2D8(CUdeviceptr dstDevice, long dstPitch, byte uc, long Width, long Height)
CUresult cuMemsetD2D8(CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height)
Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
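The roles of Width, Height and dstPitch can be modelled on the host with a plain byte array. The following sketch is only an illustration of which bytes a pitched 2D fill touches; it does not call JCuda, and the class PitchedMemset2D is a hypothetical helper.

```java
import java.util.Arrays;

/** Host-side model of a pitched 2D fill with 8-bit values. It does not call JCuda. */
final class PitchedMemset2D {

    /**
     * Sets the first width bytes of each of height rows to value.
     * Rows start dstPitch bytes apart; padding bytes between rows are left untouched.
     */
    static void setD2D8(byte[] dst, int dstPitch, byte value, int width, int height) {
        if (width > dstPitch) {
            throw new IllegalArgumentException("width must not exceed dstPitch");
        }
        for (int row = 0; row < height; row++) {
            int rowStart = row * dstPitch;
            Arrays.fill(dst, rowStart, rowStart + width, value);
        }
    }

    public static void main(String[] args) {
        byte[] dst = new byte[3 * 4];          // 3 rows with a pitch of 4 bytes
        setD2D8(dst, 4, (byte) 7, 3, 2);       // fill 3 bytes in each of the first 2 rows
        System.out.println(Arrays.toString(dst));
    }
}
```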
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD2D16(CUdeviceptr dstDevice, long dstPitch, short us, long Width, long Height)
CUresult cuMemsetD2D16(CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height)
Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD2D32(CUdeviceptr dstDevice, long dstPitch, int ui, long Width, long Height)
CUresult cuMemsetD2D32(CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height)
Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
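For cuMemsetD2D32, Width counts 32-bit elements while dstPitch counts bytes, so the byte offset of an element mixes both units. The following host-side sketch shows that arithmetic on a java.nio.ByteBuffer. It makes no JCuda calls; the names PitchedFill32, offsetOf, and fill2D32, the little-endian byte order, and the fit check are assumptions of this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Host-side model of a pitched 2D fill with 32-bit elements; no JCuda calls are made. */
final class PitchedFill32 {

    static final int ELEMENT_BYTES = 4; // size of one 32-bit value

    /** Byte offset of element (row, col) when rows are pitchBytes apart. */
    static int offsetOf(int row, int col, int pitchBytes) {
        return row * pitchBytes + col * ELEMENT_BYTES;
    }

    /** Writes value into width elements of each of height rows. */
    static void fill2D32(ByteBuffer mem, int pitchBytes, int value, int width, int height) {
        // Assumption of this model: a row of width elements must fit inside the pitch.
        if (width * ELEMENT_BYTES > pitchBytes) {
            throw new IllegalArgumentException("row of " + width + " elements does not fit in pitch");
        }
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                mem.putInt(offsetOf(row, col, pitchBytes), value);
            }
        }
    }

    public static void main(String[] args) {
        int pitchBytes = 16; // room for 4 ints per row; only 3 are set
        ByteBuffer mem = ByteBuffer.allocate(2 * pitchBytes).order(ByteOrder.LITTLE_ENDIAN);
        fill2D32(mem, pitchBytes, 0xCAFEBABE, 3, 2);
        System.out.printf("%08X%n", mem.getInt(offsetOf(1, 2, pitchBytes))); // last element that was set
        System.out.printf("%08X%n", mem.getInt(offsetOf(1, 3, pitchBytes))); // padding, left untouched
    }
}
```

The same offset formula applies to the 8-bit and 16-bit variants with an element size of 1 or 2 bytes.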
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD8Async(CUdeviceptr dstDevice, byte uc, long N, CUstream hStream)
CUresult cuMemsetD8Async(CUdeviceptr dstDevice, unsigned char uc, size_t N, CUstream hStream)
Sets the memory range of N 8-bit values to the specified value uc.
cuMemsetD8Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.
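The asynchronous behaviour can be pictured with a rough analogy in plain Java. In the sketch below a single-threaded executor stands in for a stream: work is queued, the call returns at once, and the caller waits before reading the result. This is only an analogy under that assumption, not how JCuda or the driver is implemented. StreamAnalogy and fillAsync are hypothetical names, and in real JCuda code the wait would typically be a call such as cuStreamSynchronize.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Analogy only: a single-threaded executor stands in for a CUDA stream. */
final class StreamAnalogy {

    /** Enqueues a byte fill on the "stream" and returns without waiting for it. */
    static Future<?> fillAsync(ExecutorService stream, byte[] mem, byte value, int n) {
        return stream.submit(() -> Arrays.fill(mem, 0, n, value));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService stream = Executors.newSingleThreadExecutor();
        byte[] mem = new byte[8];
        try {
            // Two fills queued on the same stream run in submission order.
            fillAsync(stream, mem, (byte) 1, 8);
            Future<?> last = fillAsync(stream, mem, (byte) 2, 4);
            last.get(); // wait for the queued work before reading the result
            System.out.println(Arrays.toString(mem));
        } finally {
            stream.shutdown();
        }
    }
}
```

After the wait, the first four bytes hold 2 and the remaining four hold 1, because the second fill ran after the first.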
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD16Async(CUdeviceptr dstDevice, short us, long N, CUstream hStream)
CUresult cuMemsetD16Async(CUdeviceptr dstDevice, unsigned short us, size_t N, CUstream hStream)
Sets the memory range of N 16-bit values to the specified value us.
cuMemsetD16Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD32Async(CUdeviceptr dstDevice, int ui, long N, CUstream hStream)
CUresult cuMemsetD32Async(CUdeviceptr dstDevice, unsigned int ui, size_t N, CUstream hStream)
Sets the memory range of N 32-bit values to the specified value ui.
cuMemsetD32Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuMemsetD2D8Async(CUdeviceptr dstDevice, long dstPitch, byte uc, long Width, long Height, CUstream hStream)
CUresult cuMemsetD2D8Async(CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height, CUstream hStream)
Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
cuMemsetD2D8Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD2D16Async(CUdeviceptr dstDevice, long dstPitch, short us, long Width, long Height, CUstream hStream)
CUresult cuMemsetD2D16Async(CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height, CUstream hStream)
Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
cuMemsetD2D16Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD2D32Async(jcuda.driver.CUdeviceptr, long, int, long, long, jcuda.driver.CUstream)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuMemsetD2D32Async(CUdeviceptr dstDevice, long dstPitch, int ui, long Width, long Height, CUstream hStream)
CUresult cuMemsetD2D32Async(CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height, CUstream hStream)
Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
cuMemsetD2D32Async() is asynchronous and can optionally be associated with a stream by passing a non-zero stream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D8Async(jcuda.driver.CUdeviceptr, long, byte, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D16Async(jcuda.driver.CUdeviceptr, long, short, long, long, jcuda.driver.CUstream)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD8Async(jcuda.driver.CUdeviceptr, byte, long, jcuda.driver.CUstream)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD16Async(jcuda.driver.CUdeviceptr, short, long, jcuda.driver.CUstream)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
,
cuMemsetD32Async(jcuda.driver.CUdeviceptr, int, long, jcuda.driver.CUstream)
public static int cuFuncGetAttribute(int[] pi, int attrib, CUfunction func)
CUresult cuFuncGetAttribute(int *pi, CUfunction_attribute attrib, CUfunction hfunc)
Returns in *pi the integer value of the attribute attrib on the kernel given by hfunc. The supported attributes are the values of CUfunction_attribute.
cuCtxGetCacheConfig(int[])
,
cuCtxSetCacheConfig(int)
,
cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
,
cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
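For cuFuncGetAttribute, the Java signature takes an int[] where the C signature takes an int *, so the result comes back in element 0 of a caller-supplied array while the return value carries the status. The sketch below shows that calling convention with a hypothetical stand-in. OutParamSketch, getAttribute, the attribute ids, the status codes, and the returned numbers are all made up for illustration and are not JCuda values.

```java
/** Hypothetical stand-in showing the int[] out-parameter convention; not the JCuda implementation. */
final class OutParamSketch {

    // Illustrative status codes for this sketch only.
    static final int SUCCESS = 0;
    static final int INVALID_VALUE = 1;

    // Hypothetical attribute ids for a fake kernel description.
    static final int ATTR_MAX_THREADS = 0;
    static final int ATTR_NUM_REGS = 1;

    /** Writes the attribute value into pi[0] and returns a status code. */
    static int getAttribute(int[] pi, int attrib) {
        if (pi == null || pi.length < 1) {
            return INVALID_VALUE;
        }
        switch (attrib) {
            case ATTR_MAX_THREADS: pi[0] = 1024; return SUCCESS; // made-up value
            case ATTR_NUM_REGS:    pi[0] = 32;   return SUCCESS; // made-up value
            default:               return INVALID_VALUE;
        }
    }

    public static void main(String[] args) {
        int[] pi = new int[1]; // plays the role of the C "int *pi"
        int status = getAttribute(pi, ATTR_NUM_REGS);
        System.out.println(status == SUCCESS ? "value = " + pi[0] : "error " + status);
    }
}
```

Checking the returned status before reading pi[0] mirrors how the int return value of the JCudaDriver methods is meant to be used.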
public static int cuFuncSetBlockShape(CUfunction hfunc, int x, int y, int z)
CUresult cuFuncSetBlockShape(CUfunction hfunc, int x, int y, int z)
Specifies the x, y, and z dimensions of the thread blocks that are created when the kernel given by hfunc is launched.
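A block of dimensions x, y, and z contains x * y * z threads. The plain-Java sketch below computes that product and compares it with a limit passed in by the caller. BlockShapeSketch and its methods are hypothetical helpers, the limit of 512 is an assumed value for illustration, and no kernel is configured or launched.

```java
/** Plain-Java arithmetic on a block shape; no kernel is configured or launched. */
final class BlockShapeSketch {

    /** Number of threads in one block of dimensions x, y, z. */
    static int threadsPerBlock(int x, int y, int z) {
        if (x <= 0 || y <= 0 || z <= 0) {
            throw new IllegalArgumentException("block dimensions must be positive");
        }
        // multiplyExact throws on int overflow instead of wrapping around.
        return Math.multiplyExact(Math.multiplyExact(x, y), z);
    }

    /** Checks a shape against a per-block limit supplied by the caller. */
    static boolean fits(int x, int y, int z, int maxThreadsPerBlock) {
        return threadsPerBlock(x, y, z) <= maxThreadsPerBlock;
    }

    public static void main(String[] args) {
        int limit = 512; // assumed limit; real limits depend on the device and the kernel
        System.out.println(threadsPerBlock(16, 16, 1)); // 256
        System.out.println(fits(16, 16, 1, limit));     // true
        System.out.println(fits(32, 32, 1, limit));     // false, 1024 threads exceed 512
    }
}
```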
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
,
cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuFuncSetSharedSize(CUfunction hfunc, int bytes)
CUresult cuFuncSetSharedSize(CUfunction hfunc, unsigned int bytes)
Sets through bytes the amount of dynamic shared memory that will be available to each thread block when the kernel given by hfunc is launched.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
,
cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuFuncSetCacheConfig(CUfunction hfunc, int config)
CUresult cuFuncSetCacheConfig(CUfunction hfunc, CUfunc_cache config)
On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the device function hfunc. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute hfunc. Any context-wide preference set via cuCtxSetCacheConfig() will be overridden by this per-function setting unless the per-function setting is CU_FUNC_CACHE_PREFER_NONE. In that case, the current context-wide setting will be used.
This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
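The supported cache configurations are:
CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)
CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache
CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory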
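The precedence between the per-function and the context-wide preference described above can be sketched in plain Java. This is only an illustration of the documented rule, not part of JCuda: the class CachePreference and the method effectivePreference are hypothetical, and the numeric values are assumed to match the CUfunc_cache enum of the CUDA driver headers. The result is the preference that is requested for a launch; the driver may still choose a different configuration.

```java
/** Sketch of the precedence rule between per-function and context-wide cache preferences. */
final class CachePreference {
    // Assumed to match the CUfunc_cache enum values of the CUDA driver API.
    static final int CU_FUNC_CACHE_PREFER_NONE = 0x00;
    static final int CU_FUNC_CACHE_PREFER_SHARED = 0x01;
    static final int CU_FUNC_CACHE_PREFER_L1 = 0x02;

    /**
     * Hypothetical helper: returns the preference that applies to a launch.
     * A per-function preference wins unless it is CU_FUNC_CACHE_PREFER_NONE,
     * in which case the context-wide setting is used.
     */
    static int effectivePreference(int functionPref, int contextPref) {
        return functionPref == CU_FUNC_CACHE_PREFER_NONE ? contextPref : functionPref;
    }

    public static void main(String[] args) {
        int effective = effectivePreference(CU_FUNC_CACHE_PREFER_NONE, CU_FUNC_CACHE_PREFER_L1);
        System.out.println("Effective preference: " + effective);
    }
}
```

With JCuda, the two inputs would correspond to the values passed to cuFuncSetCacheConfig and cuCtxSetCacheConfig.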
cuCtxGetCacheConfig(int[])
,
cuCtxSetCacheConfig(int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuArrayCreate(CUarray pHandle, CUDA_ARRAY_DESCRIPTOR pAllocateArray)
CUresult cuArrayCreate(CUarray *pHandle, const CUDA_ARRAY_DESCRIPTOR *pAllocateArray)
Creates a CUDA array according to the CUDA_ARRAY_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY_DESCRIPTOR is defined as:
typedef struct { unsigned int Width; unsigned int Height; CUarray_format Format; unsigned int NumChannels; } CUDA_ARRAY_DESCRIPTOR;
Width and Height are the width and height of the CUDA array (in elements); the CUDA array is one-dimensional if Height is 0, two-dimensional otherwise.
Format specifies the format of the elements; CUarray_format is defined as:
typedef enum CUarray_format_enum { CU_AD_FORMAT_UNSIGNED_INT8 = 0x01, CU_AD_FORMAT_UNSIGNED_INT16 = 0x02, CU_AD_FORMAT_UNSIGNED_INT32 = 0x03, CU_AD_FORMAT_SIGNED_INT8 = 0x08, CU_AD_FORMAT_SIGNED_INT16 = 0x09, CU_AD_FORMAT_SIGNED_INT32 = 0x0a, CU_AD_FORMAT_HALF = 0x10, CU_AD_FORMAT_FLOAT = 0x20 } CUarray_format;
NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2, or 4.
Here are examples of CUDA array descriptions:
Description for a CUDA array of 2048 floats:
CUDA_ARRAY_DESCRIPTOR desc; desc.Format = CU_AD_FORMAT_FLOAT; desc.NumChannels = 1; desc.Width = 2048; desc.Height = 1;
Description for a 64 x 64 CUDA array of floats:
CUDA_ARRAY_DESCRIPTOR desc; desc.Format = CU_AD_FORMAT_FLOAT; desc.NumChannels = 1; desc.Width = 64; desc.Height = 64;
Description for a width x height CUDA array of 64-bit, 4x16-bit float16's:
CUDA_ARRAY_DESCRIPTOR desc; desc.Format = CU_AD_FORMAT_HALF; desc.NumChannels = 4; desc.Width = width; desc.Height = height;
Description for a width x height CUDA array of 16-bit elements, each of which is two 8-bit unsigned chars:
CUDA_ARRAY_DESCRIPTOR arrayDesc; arrayDesc.Format = CU_AD_FORMAT_UNSIGNED_INT8; arrayDesc.NumChannels = 2; arrayDesc.Width = width; arrayDesc.Height = height;
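The element sizes quoted in these descriptions (64 bits for four half-float channels, 16 bits for two unsigned 8-bit channels) follow from the format and the channel count. A minimal sketch of that arithmetic in plain Java is given here; the class ArrayElementSize and its methods are hypothetical helpers and not part of JCuda, while the format values are copied from the CUarray_format enum in this description.

```java
/** Sketch: size in bytes of one CUDA array element, derived from Format and NumChannels. */
final class ArrayElementSize {
    // Values copied from the CUarray_format enum.
    static final int CU_AD_FORMAT_UNSIGNED_INT8 = 0x01;
    static final int CU_AD_FORMAT_UNSIGNED_INT16 = 0x02;
    static final int CU_AD_FORMAT_UNSIGNED_INT32 = 0x03;
    static final int CU_AD_FORMAT_SIGNED_INT8 = 0x08;
    static final int CU_AD_FORMAT_SIGNED_INT16 = 0x09;
    static final int CU_AD_FORMAT_SIGNED_INT32 = 0x0a;
    static final int CU_AD_FORMAT_HALF = 0x10;
    static final int CU_AD_FORMAT_FLOAT = 0x20;

    /** Hypothetical helper: bytes occupied by a single component of the given format. */
    static int bytesPerComponent(int format) {
        switch (format) {
            case CU_AD_FORMAT_UNSIGNED_INT8:
            case CU_AD_FORMAT_SIGNED_INT8:
                return 1;
            case CU_AD_FORMAT_UNSIGNED_INT16:
            case CU_AD_FORMAT_SIGNED_INT16:
            case CU_AD_FORMAT_HALF:
                return 2;
            case CU_AD_FORMAT_UNSIGNED_INT32:
            case CU_AD_FORMAT_SIGNED_INT32:
            case CU_AD_FORMAT_FLOAT:
                return 4;
            default:
                throw new IllegalArgumentException("Unknown CUarray_format: " + format);
        }
    }

    /** Hypothetical helper: bytes occupied by one array element (all packed components). */
    static int bytesPerElement(int format, int numChannels) {
        if (numChannels != 1 && numChannels != 2 && numChannels != 4) {
            throw new IllegalArgumentException("NumChannels must be 1, 2, or 4");
        }
        return bytesPerComponent(format) * numChannels;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerElement(CU_AD_FORMAT_HALF, 4) * 8 + " bits per element");
    }
}
```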
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuArrayGetDescriptor(CUDA_ARRAY_DESCRIPTOR pArrayDescriptor, CUarray hArray)
CUresult cuArrayGetDescriptor(CUDA_ARRAY_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuArrayDestroy(CUarray hArray)
CUresult cuArrayDestroy(CUarray hArray)
Destroys the CUDA array hArray.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuArray3DCreate(CUarray pHandle, CUDA_ARRAY3D_DESCRIPTOR pAllocateArray)
CUresult cuArray3DCreate(CUarray *pHandle, const CUDA_ARRAY3D_DESCRIPTOR *pAllocateArray)
Creates a CUDA array according to the CUDA_ARRAY3D_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY3D_DESCRIPTOR is defined as:
typedef struct { unsigned int Width; unsigned int Height; unsigned int Depth; CUarray_format Format; unsigned int NumChannels; unsigned int Flags; } CUDA_ARRAY3D_DESCRIPTOR;
Width, Height, and Depth are the width, height, and depth of the CUDA array (in elements); the CUDA array is one-dimensional if Height and Depth are 0, two-dimensional if Depth is 0, and three-dimensional otherwise. If the CUDA_ARRAY3D_LAYERED flag is set, then the CUDA array is a collection of layers, where Depth indicates the number of layers. Each layer is a 1D array if Height is 0, and a 2D array otherwise.
Format specifies the format of the elements; CUarray_format is defined as:
typedef enum CUarray_format_enum { CU_AD_FORMAT_UNSIGNED_INT8 = 0x01, CU_AD_FORMAT_UNSIGNED_INT16 = 0x02, CU_AD_FORMAT_UNSIGNED_INT32 = 0x03, CU_AD_FORMAT_SIGNED_INT8 = 0x08, CU_AD_FORMAT_SIGNED_INT16 = 0x09, CU_AD_FORMAT_SIGNED_INT32 = 0x0a, CU_AD_FORMAT_HALF = 0x10, CU_AD_FORMAT_FLOAT = 0x20 } CUarray_format;
NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2, or 4.
Flags may be set to CUDA_ARRAY3D_LAYERED to enable creation of layered CUDA arrays. If this flag is set, Depth specifies the number of layers, not the depth of a 3D array.
Here are examples of CUDA array descriptions:
Description for a CUDA array of 2048 floats:
CUDA_ARRAY3D_DESCRIPTOR desc; desc.Format = CU_AD_FORMAT_FLOAT; desc.NumChannels = 1; desc.Width = 2048; desc.Height = 0; desc.Depth = 0;
Description for a 64 x 64 CUDA array of floats:
CUDA_ARRAY3D_DESCRIPTOR desc; desc.Format = CU_AD_FORMAT_FLOAT; desc.NumChannels = 1; desc.Width = 64; desc.Height = 64; desc.Depth = 0;
Description for a width x height x depth CUDA array of 64-bit, 4x16-bit float16's:
CUDA_ARRAY3D_DESCRIPTOR desc; desc.Format = CU_AD_FORMAT_HALF; desc.NumChannels = 4; desc.Width = width; desc.Height = height; desc.Depth = depth;
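How Width, Height, Depth, and the layered flag determine the kind of array that is created can be summarized in a few lines of plain Java. The following is only a sketch of the rules stated in this description: the class Array3DShape and the method describe are hypothetical and not part of JCuda, and the value 0x01 for CUDA_ARRAY3D_LAYERED is an assumption taken from the CUDA driver headers.

```java
/** Sketch: classifies a CUDA_ARRAY3D_DESCRIPTOR by its extents and flags. */
final class Array3DShape {
    // Assumed to match the CUDA_ARRAY3D_LAYERED definition of the CUDA driver headers.
    static final int CUDA_ARRAY3D_LAYERED = 0x01;

    /** Hypothetical helper: returns a short description of the array kind. */
    static String describe(int width, int height, int depth, int flags) {
        if ((flags & CUDA_ARRAY3D_LAYERED) != 0) {
            // With the layered flag, Depth is the number of layers.
            String layerKind = (height == 0) ? "1D" : "2D";
            return depth + " layers of " + layerKind;
        }
        if (height == 0 && depth == 0) {
            return "1D";
        }
        if (depth == 0) {
            return "2D";
        }
        return "3D";
    }

    public static void main(String[] args) {
        System.out.println(describe(2048, 0, 0, 0));
        System.out.println(describe(64, 64, 0, 0));
        System.out.println(describe(64, 0, 8, CUDA_ARRAY3D_LAYERED));
    }
}
```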
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuArray3DGetDescriptor(CUDA_ARRAY3D_DESCRIPTOR pArrayDescriptor, CUarray hArray)
CUresult cuArray3DGetDescriptor(CUDA_ARRAY3D_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.
This function may be called on 1D and 2D arrays, in which case the Height and/or Depth members of the descriptor struct will be set to 0.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, long)
,
cuMemAllocHost(jcuda.Pointer, long)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, long[], long, long, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, long, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, long, long)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, long, long, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, long, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, long)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, long, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, long, jcuda.Pointer, long)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, long, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, long)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, long, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(long[], long[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, long, byte, long, long)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, long, short, long, long)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, long, int, long, long)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, byte, long)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, long)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, long)
public static int cuTexRefCreate(CUtexref pTexRef)
CUresult cuTexRefCreate(CUtexref *pTexRef)
Creates a texture reference and returns its handle in *pTexRef. Once created, the application must call cuTexRefSetArray() or cuTexRefSetAddress() to associate the reference with allocated memory. Other texture reference functions are used to specify the format and interpretation (addressing, filtering, etc.) to be used when the memory is read through this texture reference.
cuTexRefDestroy(jcuda.driver.CUtexref)
public static int cuTexRefDestroy(CUtexref hTexRef)
CUresult cuTexRefDestroy(CUtexref hTexRef)
Destroys the texture reference specified by hTexRef.
cuTexRefCreate(jcuda.driver.CUtexref)
public static int cuTexRefSetArray(CUtexref hTexRef, CUarray hArray, int Flags)
CUresult cuTexRefSetArray(CUtexref hTexRef, CUarray hArray, unsigned int Flags)
Binds the CUDA array hArray to the texture reference hTexRef. Any previous address or CUDA array state associated with the texture reference is superseded by this function. Flags must be set to CU_TRSA_OVERRIDE_FORMAT. Any CUDA array previously bound to hTexRef is unbound.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetAddress(long[] ByteOffset, CUtexref hTexRef, CUdeviceptr dptr, long bytes)
CUresult cuTexRefSetAddress(size_t *ByteOffset, CUtexref hTexRef, CUdeviceptr dptr, size_t bytes)
Binds a linear address range to the texture reference hTexRef. Any previous address or CUDA array state associated with the texture reference is superseded by this function. Any memory previously bound to hTexRef is unbound.
Since the hardware enforces an alignment requirement on texture base addresses, cuTexRefSetAddress() passes back a byte offset in *ByteOffset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so that it can be applied to the tex1Dfetch() function.
If the device memory pointer was returned from cuMemAlloc(), the offset is guaranteed to be 0 and NULL may be passed as the ByteOffset parameter.
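The conversion of the returned byte offset into the texel offset that a kernel would add to its tex1Dfetch() index is a simple division. The sketch below shows it in plain Java; the class TexelOffset and the method toTexelOffset are hypothetical and not part of JCuda, and the check that the byte offset is a whole multiple of the texel size is an assumption made for this example.

```java
/** Sketch: converts a byte offset into a texel offset. */
final class TexelOffset {
    /**
     * Hypothetical helper: divides the byte offset by the texel size.
     * Rejecting offsets that are not a multiple of the texel size is an
     * assumption of this sketch, not documented driver behavior.
     */
    static int toTexelOffset(long byteOffset, int texelSizeInBytes) {
        if (texelSizeInBytes <= 0) {
            throw new IllegalArgumentException("Texel size must be positive");
        }
        if (byteOffset < 0 || byteOffset % texelSizeInBytes != 0) {
            throw new IllegalArgumentException("Byte offset is not a multiple of the texel size");
        }
        return Math.toIntExact(byteOffset / texelSizeInBytes);
    }

    public static void main(String[] args) {
        // Example values: a 256 byte offset for a texture of 4-channel floats (16 bytes per texel).
        System.out.println("Texel offset: " + toTexelOffset(256L, 16));
    }
}
```

In JCuda, the byte offset would be read from element 0 of the long[] array passed as ByteOffset.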
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetFormat(CUtexref hTexRef, int fmt, int NumPackedComponents)
CUresult cuTexRefSetFormat(CUtexref hTexRef, CUarray_format fmt, int NumPackedComponents)
Specifies the format of the data to be read by the texture reference hTexRef. fmt and NumPackedComponents are exactly analogous to the Format and NumChannels members of the CUDA_ARRAY_DESCRIPTOR structure: They specify the format of each component and the number of components per array element.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetAddress2D(CUtexref hTexRef, CUDA_ARRAY_DESCRIPTOR desc, CUdeviceptr dptr, long PitchInBytes)
CUresult cuTexRefSetAddress2D(CUtexref hTexRef, const CUDA_ARRAY_DESCRIPTOR *desc, CUdeviceptr dptr, size_t Pitch)
Binds a linear address range to the texture reference hTexRef. Any previous address or CUDA array state associated with the texture reference is superseded by this function. Any memory previously bound to hTexRef is unbound.
Using a tex2D() function inside a kernel requires a call to either cuTexRefSetArray() to bind the corresponding texture reference to an array, or cuTexRefSetAddress2D() to bind the texture reference to linear memory.
Function calls to cuTexRefSetFormat() cannot follow calls to cuTexRefSetAddress2D() for the same texture reference.
It is required that dptr be aligned to the appropriate hardware-specific texture alignment. You can query this value using the device attribute CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT. If an unaligned dptr is supplied, CUDA_ERROR_INVALID_VALUE is returned.
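Whether an address satisfies such an alignment requirement can be checked before binding. The following plain Java sketch shows the check; the class TextureAlignment and the method isAligned are hypothetical and not part of JCuda. The alignment is passed in as a plain number here, whereas in a real application it would be the value queried for CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT.

```java
/** Sketch: checks whether an address is a multiple of a required alignment. */
final class TextureAlignment {
    /**
     * Hypothetical helper: returns true if the address is aligned.
     * The address is treated as an unsigned 64-bit value.
     */
    static boolean isAligned(long address, long alignment) {
        if (alignment <= 0) {
            throw new IllegalArgumentException("Alignment must be positive");
        }
        return Long.remainderUnsigned(address, alignment) == 0;
    }

    public static void main(String[] args) {
        // Example values only; 256 is not claimed to be the alignment of any particular device.
        System.out.println(isAligned(0x1000L, 256L));
        System.out.println(isAligned(0x1008L, 256L));
    }
}
```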
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetAddressMode(CUtexref hTexRef, int dim, int am)
CUresult cuTexRefSetAddressMode(CUtexref hTexRef, int dim, CUaddress_mode am)
Specifies the addressing mode am for the given dimension dim of the texture reference hTexRef. If dim is zero, the addressing mode is applied to the first parameter of the functions used to fetch from the texture; if dim is 1, the second, and so on. CUaddress_mode is defined as:
typedef enum CUaddress_mode_enum { CU_TR_ADDRESS_MODE_WRAP = 0, CU_TR_ADDRESS_MODE_CLAMP = 1, CU_TR_ADDRESS_MODE_MIRROR = 2, CU_TR_ADDRESS_MODE_BORDER = 3 } CUaddress_mode;
Note that this call has no effect if hTexRef is bound to linear memory. Also, if the flag CU_TRSF_NORMALIZED_COORDINATES is not set, the only supported address mode is CU_TR_ADDRESS_MODE_CLAMP.
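What the four addressing modes mean for an out-of-range coordinate can be illustrated with a simplified model on integer texel indices. This plain Java sketch is conceptual only: the class AddressModes, the method resolve, and the OUT_OF_RANGE marker are hypothetical, and real texture hardware operates on floating point coordinates and may combine addressing with filtering. The mode values are copied from the CUaddress_mode enum.

```java
/** Sketch: maps an integer texel index into [0, size) under each address mode. */
final class AddressModes {
    // Values copied from the CUaddress_mode enum.
    static final int CU_TR_ADDRESS_MODE_WRAP = 0;
    static final int CU_TR_ADDRESS_MODE_CLAMP = 1;
    static final int CU_TR_ADDRESS_MODE_MIRROR = 2;
    static final int CU_TR_ADDRESS_MODE_BORDER = 3;

    // Hypothetical marker for "outside the texture" in border mode.
    static final int OUT_OF_RANGE = -1;

    /** Hypothetical helper: returns the texel index that is read for the given index. */
    static int resolve(int index, int size, int mode) {
        if (size <= 0) {
            throw new IllegalArgumentException("Size must be positive");
        }
        switch (mode) {
            case CU_TR_ADDRESS_MODE_WRAP:
                // The texture repeats with a period of size.
                return Math.floorMod(index, size);
            case CU_TR_ADDRESS_MODE_CLAMP:
                // Out-of-range indices read the nearest edge texel.
                return Math.max(0, Math.min(size - 1, index));
            case CU_TR_ADDRESS_MODE_MIRROR: {
                // The texture repeats with a period of 2 * size, every second copy reversed.
                int m = Math.floorMod(index, 2 * size);
                return (m < size) ? m : (2 * size - 1 - m);
            }
            case CU_TR_ADDRESS_MODE_BORDER:
                // Out-of-range indices do not map to any texel.
                return (index >= 0 && index < size) ? index : OUT_OF_RANGE;
            default:
                throw new IllegalArgumentException("Unknown address mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 3; mode++) {
            System.out.println("mode " + mode + ": index 5 of 4 -> " + resolve(5, 4, mode));
        }
    }
}
```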
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetFilterMode(CUtexref hTexRef, int fm)
CUresult cuTexRefSetFilterMode(CUtexref hTexRef, CUfilter_mode fm)
Specifies the filtering mode fm to be used when reading memory through the texture reference hTexRef. CUfilter_mode_enum is defined as:
typedef enum CUfilter_mode_enum { CU_TR_FILTER_MODE_POINT = 0, CU_TR_FILTER_MODE_LINEAR = 1 } CUfilter_mode;
Note that this call has no effect if hTexRef is bound to linear memory.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetFlags(CUtexref hTexRef, int Flags)
CUresult cuTexRefSetFlags(CUtexref hTexRef, unsigned int Flags)
Specifies optional flags via Flags to specify the behavior of data returned through the texture reference hTexRef.
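The valid flags are:
CU_TRSF_READ_AS_INTEGER, which suppresses the default behavior of having the texture promote integer data to floating point data in the range [0, 1];
CU_TRSF_NORMALIZED_COORDINATES, which suppresses the default behavior of having the texture coordinates range from [0, Dim) where Dim is the width or height of the CUDA array. Instead, the texture coordinates [0, 1.0) reference the entire breadth of the array dimension.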
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetAddress(CUdeviceptr pdptr, CUtexref hTexRef)
CUresult cuTexRefGetAddress(CUdeviceptr *pdptr, CUtexref hTexRef)
Returns in *pdptr the base address bound to the texture reference hTexRef, or returns CUDA_ERROR_INVALID_VALUE if the texture reference is not bound to any device memory range.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetArray(CUarray phArray, CUtexref hTexRef)
CUresult cuTexRefGetArray(CUarray *phArray, CUtexref hTexRef)
Returns in *phArray the CUDA array bound to the texture reference hTexRef, or returns CUDA_ERROR_INVALID_VALUE if the texture reference is not bound to any CUDA array.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetAddressMode(int[] pam, CUtexref hTexRef, int dim)
CUresult cuTexRefGetAddressMode(CUaddress_mode *pam, CUtexref hTexRef, int dim)
Returns in *pam the addressing mode corresponding to the dimension dim of the texture reference hTexRef. Currently, the only valid values for dim are 0 and 1.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetFilterMode(int[] pfm, CUtexref hTexRef)
CUresult cuTexRefGetFilterMode(CUfilter_mode *pfm, CUtexref hTexRef)
Returns in *pfm the filtering mode of the texture reference hTexRef.
cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetFormat(int[] pFormat, int[] pNumChannels, CUtexref hTexRef)
CUresult cuTexRefGetFormat(CUarray_format *pFormat, int *pNumChannels, CUtexref hTexRef)
Returns in *pFormat and *pNumChannels the format and number of components of the CUDA array bound to the texture reference hTexRef. If pFormat or pNumChannels is NULL, it will be ignored.
See Also: cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long), cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long), cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int), cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int), cuTexRefSetFilterMode(jcuda.driver.CUtexref, int), cuTexRefSetFlags(jcuda.driver.CUtexref, int), cuTexRefSetFormat(jcuda.driver.CUtexref, int, int), cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref), cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int), cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref), cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref), cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
public static int cuTexRefGetFlags(int[] pFlags, CUtexref hTexRef)
CUresult cuTexRefGetFlags(unsigned int *pFlags, CUtexref hTexRef)
Returns in *pFlags the flags of the texture reference hTexRef.
See Also: cuTexRefSetAddress(long[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, long), cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, long), cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int), cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int), cuTexRefSetFilterMode(jcuda.driver.CUtexref, int), cuTexRefSetFlags(jcuda.driver.CUtexref, int), cuTexRefSetFormat(jcuda.driver.CUtexref, int, int), cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref), cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int), cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref), cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref), cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuSurfRefSetArray(CUsurfref hSurfRef, CUarray hArray, int Flags)
CUresult cuSurfRefSetArray(CUsurfref hSurfRef, CUarray hArray, unsigned int Flags)
Sets the CUDA array hArray to be read and written by the surface reference hSurfRef. Any previous CUDA array state associated with the surface reference is superseded by this function. Flags must be set to 0. The CUDA_ARRAY3D_SURFACE_LDST flag must have been set for the CUDA array. Any CUDA array previously bound to hSurfRef is unbound.
See Also: cuModuleGetSurfRef(jcuda.driver.CUsurfref, jcuda.driver.CUmodule, java.lang.String), cuSurfRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUsurfref)
public static int cuSurfRefGetArray(CUarray phArray, CUsurfref hSurfRef)
CUresult cuSurfRefGetArray(CUarray *phArray, CUsurfref hSurfRef)
Returns in *phArray the CUDA array bound to the surface reference hSurfRef, or returns CUDA_ERROR_INVALID_VALUE if the surface reference is not bound to any CUDA array.
See Also: cuModuleGetSurfRef(jcuda.driver.CUsurfref, jcuda.driver.CUmodule, java.lang.String), cuSurfRefSetArray(jcuda.driver.CUsurfref, jcuda.driver.CUarray, int)
public static int cuDeviceCanAccessPeer(int[] canAccessPeer, CUdevice dev, CUdevice peerDev)
CUresult cuDeviceCanAccessPeer(int *canAccessPeer, CUdevice dev, CUdevice peerDev)
Returns in *canAccessPeer a value of 1 if contexts on dev are capable of directly accessing memory from contexts on peerDev and 0 otherwise. If direct access of peerDev from dev is possible, then access may be enabled on two specific contexts by calling cuCtxEnablePeerAccess().
See Also: cuCtxEnablePeerAccess(jcuda.driver.CUcontext, int), cuCtxDisablePeerAccess(jcuda.driver.CUcontext)
public static int cuCtxEnablePeerAccess(CUcontext peerContext, int Flags)
CUresult cuCtxEnablePeerAccess(CUcontext peerContext, unsigned int Flags)
If both the current context and peerContext are on devices which support unified addressing (as may be queried using CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING), then on success all allocations from peerContext will immediately be accessible by the current context. See Unified Addressing for additional details.
Note that access granted by this call is unidirectional and that in order to access memory from the current context in peerContext, a separate symmetric call to cuCtxEnablePeerAccess() is required.
Returns CUDA_ERROR_INVALID_DEVICE if cuDeviceCanAccessPeer() indicates that the CUdevice of the current context cannot directly access memory from the CUdevice of peerContext.
Returns CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED if direct access of peerContext from the current context has already been enabled.
Returns CUDA_ERROR_INVALID_CONTEXT if there is no current context, peerContext is not a valid context, or if the current context is peerContext.
Returns CUDA_ERROR_INVALID_VALUE if Flags is not 0.
See Also: cuDeviceCanAccessPeer(int[], jcuda.driver.CUdevice, jcuda.driver.CUdevice), cuCtxDisablePeerAccess(jcuda.driver.CUcontext)
public static int cuCtxDisablePeerAccess(CUcontext peerContext)
CUresult cuCtxDisablePeerAccess(CUcontext peerContext)
Returns CUDA_ERROR_PEER_ACCESS_NOT_ENABLED if direct peer access has not yet been enabled from peerContext to the current context.
Returns CUDA_ERROR_INVALID_CONTEXT if there is no current context, or if peerContext is not a valid context.
See Also: cuDeviceCanAccessPeer(int[], jcuda.driver.CUdevice, jcuda.driver.CUdevice), cuCtxEnablePeerAccess(jcuda.driver.CUcontext, int)
public static int cuMemPeerRegister(CUdeviceptr peerPointer, CUcontext peerContext, int Flags)
public static int cuMemPeerUnregister(CUdeviceptr peerPointer, CUcontext peerContext)
public static int cuMemPeerGetDevicePointer(CUdeviceptr pdptr, CUdeviceptr peerPointer, CUcontext peerContext, int Flags)
public static int cuParamSetSize(CUfunction hfunc, int numbytes)
CUresult cuParamSetSize(CUfunction hfunc, unsigned int numbytes)
Sets through numbytes the total size in bytes needed by the function parameters of the kernel corresponding to hfunc.
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetf(jcuda.driver.CUfunction, int, float), cuParamSeti(jcuda.driver.CUfunction, int, int), cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int), cuLaunch(jcuda.driver.CUfunction), cuLaunchGrid(jcuda.driver.CUfunction, int, int), cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuParamSeti(CUfunction hfunc, int offset, int value)
CUresult cuParamSeti(CUfunction hfunc, int offset, unsigned int value)
Sets an integer parameter that will be specified the next time the kernel corresponding to hfunc will be invoked. offset is a byte offset.
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetSize(jcuda.driver.CUfunction, int), cuParamSetf(jcuda.driver.CUfunction, int, float), cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int), cuLaunch(jcuda.driver.CUfunction), cuLaunchGrid(jcuda.driver.CUfunction, int, int), cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuParamSetf(CUfunction hfunc, int offset, float value)
CUresult cuParamSetf(CUfunction hfunc, int offset, float value)
Sets a floating-point parameter that will be specified the next time the kernel corresponding to hfunc will be invoked. offset is a byte offset.
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetSize(jcuda.driver.CUfunction, int), cuParamSeti(jcuda.driver.CUfunction, int, int), cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int), cuLaunch(jcuda.driver.CUfunction), cuLaunchGrid(jcuda.driver.CUfunction, int, int), cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuParamSetv(CUfunction hfunc, int offset, Pointer ptr, int numbytes)
CUresult cuParamSetv(CUfunction hfunc, int offset, void *ptr, unsigned int numbytes)
Copies an arbitrary amount of data (specified in numbytes) from ptr into the parameter space of the kernel corresponding to hfunc. offset is a byte offset.
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetSize(jcuda.driver.CUfunction, int), cuParamSetf(jcuda.driver.CUfunction, int, float), cuParamSeti(jcuda.driver.CUfunction, int, int), cuLaunch(jcuda.driver.CUfunction), cuLaunchGrid(jcuda.driver.CUfunction, int, int), cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
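The cuParamSet* functions above all take a byte offset into the kernel's parameter space, and cuParamSetSize takes the total size. The following is a minimal sketch of how such offsets might be computed on the Java side. It assumes that each parameter must be aligned to its own size, which holds for 4-byte int and float values and 8-byte pointers but not necessarily for structs. The class ParamOffsets and its methods are hypothetical helpers and are not part of JCuda.

```java
import java.util.Arrays;

final class ParamOffsets {
    /** Rounds offset up to the next multiple of alignment (must be a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    /**
     * Computes one byte offset per parameter from the parameter sizes in bytes.
     * ASSUMPTION: each parameter's alignment equals its size.
     * The extra last element is the total size, as would be passed to cuParamSetSize.
     */
    static int[] layout(int[] sizes) {
        int[] result = new int[sizes.length + 1];
        int offset = 0;
        for (int i = 0; i < sizes.length; i++) {
            offset = alignUp(offset, sizes[i]);
            result[i] = offset;
            offset += sizes[i];
        }
        result[sizes.length] = offset;
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical kernel signature: (int n, double *data, float scale),
        // with 8-byte device pointers.
        int[] offsets = layout(new int[] {4, 8, 4});
        System.out.println(Arrays.toString(offsets)); // prints [0, 8, 16, 20]
    }
}
```

With these values, the int would go to offset 0 via cuParamSeti, the pointer to offset 8 via cuParamSetv, the float to offset 16 via cuParamSetf, and 20 would be the argument to cuParamSetSize.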
public static int cuParamSetTexRef(CUfunction hfunc, int texunit, CUtexref hTexRef)
CUresult cuParamSetTexRef(CUfunction hfunc, int texunit, CUtexref hTexRef)
Makes the CUDA array or linear memory bound to the texture reference hTexRef available to a device program as a texture. In this version of CUDA, the texture-reference must be obtained via cuModuleGetTexRef() and the texunit parameter must be set to CU_PARAM_TR_DEFAULT.
public static int cuLaunch(CUfunction f)
CUresult cuLaunch(CUfunction f)
Invokes the kernel f on a 1 x 1 x 1 grid of blocks. The block contains the number of threads specified by a previous call to cuFuncSetBlockShape().
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetSize(jcuda.driver.CUfunction, int), cuParamSetf(jcuda.driver.CUfunction, int, float), cuParamSeti(jcuda.driver.CUfunction, int, int), cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int), cuLaunchGrid(jcuda.driver.CUfunction, int, int), cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuLaunchGrid(CUfunction f, int grid_width, int grid_height)
CUresult cuLaunchGrid(CUfunction f, int grid_width, int grid_height)
Invokes the kernel f on a grid_width x grid_height grid of blocks. Each block contains the number of threads specified by a previous call to cuFuncSetBlockShape().
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetSize(jcuda.driver.CUfunction, int), cuParamSetf(jcuda.driver.CUfunction, int, float), cuParamSeti(jcuda.driver.CUfunction, int, int), cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int), cuLaunch(jcuda.driver.CUfunction), cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
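Because cuLaunchGrid takes the grid size in blocks rather than in threads, callers usually derive grid_width and grid_height from the problem size and the block shape. The sketch below shows the usual round-up division. GridSize and blocksFor are hypothetical names, and the 16 x 16 block shape is only an example value that would have been set earlier with cuFuncSetBlockShape.

```java
final class GridSize {
    /**
     * Returns the number of blocks needed along one axis so that
     * blocks * blockDim >= elements (round-up integer division).
     */
    static int blocksFor(int elements, int blockDim) {
        if (elements < 0 || blockDim <= 0) {
            throw new IllegalArgumentException("elements must be >= 0 and blockDim > 0");
        }
        // Widen to long so that elements + blockDim - 1 cannot overflow.
        return (int) (((long) elements + blockDim - 1) / blockDim);
    }

    public static void main(String[] args) {
        // Hypothetical 1000 x 600 problem with 16 x 16 threads per block.
        int gridWidth = blocksFor(1000, 16);
        int gridHeight = blocksFor(600, 16);
        System.out.println(gridWidth + " x " + gridHeight); // prints 63 x 38
    }
}
```

Since the grid may cover more threads than there are elements, the kernel itself still has to check its indices against the real problem size.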
public static int cuLaunchGridAsync(CUfunction f, int grid_width, int grid_height, CUstream hStream)
CUresult cuLaunchGridAsync(CUfunction f, int grid_width, int grid_height, CUstream hStream)
Invokes the kernel f on a grid_width x grid_height grid of blocks. Each block contains the number of threads specified by a previous call to cuFuncSetBlockShape().
cuLaunchGridAsync() can optionally be associated to a stream by passing a non-zero hStream argument.
See Also: cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int), cuFuncSetSharedSize(jcuda.driver.CUfunction, int), cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction), cuParamSetSize(jcuda.driver.CUfunction, int), cuParamSetf(jcuda.driver.CUfunction, int, float), cuParamSeti(jcuda.driver.CUfunction, int, int), cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int), cuLaunch(jcuda.driver.CUfunction), cuLaunchGrid(jcuda.driver.CUfunction, int, int), cuLaunchKernel(jcuda.driver.CUfunction, int, int, int, int, int, int, int, jcuda.driver.CUstream, jcuda.Pointer, jcuda.Pointer)
public static int cuEventCreate(CUevent phEvent, int Flags)
CUresult cuEventCreate(CUevent *phEvent, unsigned int Flags)
Creates an event *phEvent with the flags specified via Flags. Valid flags include CU_EVENT_DEFAULT, CU_EVENT_BLOCKING_SYNC and CU_EVENT_DISABLE_TIMING.
See Also: cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream), cuEventQuery(jcuda.driver.CUevent), cuEventSynchronize(jcuda.driver.CUevent), cuEventDestroy(jcuda.driver.CUevent), cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventRecord(CUevent hEvent, CUstream hStream)
CUresult cuEventRecord(CUevent hEvent, CUstream hStream)
Records an event. If hStream is non-zero, the event is recorded after all preceding operations in hStream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since the operation is asynchronous, cuEventQuery() and/or cuEventSynchronize() must be used to determine when the event has actually been recorded.
If cuEventRecord() has previously been called on hEvent, then this call will overwrite any existing state in hEvent. Any subsequent calls which examine the status of hEvent will only examine the completion of this most recent call to cuEventRecord().
It is necessary that hEvent and hStream be created on the same context.
See Also: cuEventCreate(jcuda.driver.CUevent, int), cuEventQuery(jcuda.driver.CUevent), cuEventSynchronize(jcuda.driver.CUevent), cuStreamWaitEvent(jcuda.driver.CUstream, jcuda.driver.CUevent, int), cuEventDestroy(jcuda.driver.CUevent), cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventQuery(CUevent hEvent)
CUresult cuEventQuery(CUevent hEvent)
Query the status of all device work preceding the most recent call to cuEventRecord() (in the appropriate compute streams, as specified by the arguments to cuEventRecord()).
If this work has successfully been completed by the device, or if cuEventRecord() has not been called on hEvent, then CUDA_SUCCESS is returned. If this work has not yet been completed by the device then CUDA_ERROR_NOT_READY is returned.
See Also: cuEventCreate(jcuda.driver.CUevent, int), cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream), cuEventSynchronize(jcuda.driver.CUevent), cuEventDestroy(jcuda.driver.CUevent), cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
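Because cuEventQuery returns CUDA_ERROR_NOT_READY instead of blocking, it can be used in a polling loop that leaves the host thread free to do other work between checks. The sketch below shows such a loop under some assumptions: the query is passed in as an IntSupplier so that the example runs without a GPU, the class and method names are hypothetical, and the two result codes are redeclared locally with the numeric values that CUresult uses for them. In real code the supplier would be something like () -> JCudaDriver.cuEventQuery(hEvent).

```java
import java.util.function.IntSupplier;

final class EventPolling {
    // Local copies of the CUresult codes used here.
    static final int CUDA_SUCCESS = 0;
    static final int CUDA_ERROR_NOT_READY = 600;

    /**
     * Calls query until it returns anything other than CUDA_ERROR_NOT_READY,
     * or until maxPolls calls have been made. Returns the last result code.
     */
    static int pollUntilReady(IntSupplier query, int maxPolls) {
        int result = CUDA_ERROR_NOT_READY;
        for (int i = 0; i < maxPolls && result == CUDA_ERROR_NOT_READY; i++) {
            result = query.getAsInt();
            if (result == CUDA_ERROR_NOT_READY) {
                Thread.yield(); // placeholder for useful host-side work
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for cuEventQuery: not ready twice, then complete.
        int[] calls = {0};
        IntSupplier fakeQuery = () -> calls[0]++ < 2 ? CUDA_ERROR_NOT_READY : CUDA_SUCCESS;
        int result = pollUntilReady(fakeQuery, 10);
        System.out.println(result + " after " + calls[0] + " polls"); // prints 0 after 3 polls
    }
}
```

The loop also stops on any other error code, so the caller should compare the returned value against CUDA_SUCCESS rather than assume completion.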
public static int cuEventSynchronize(CUevent hEvent)
CUresult cuEventSynchronize(CUevent hEvent)
Wait until the completion of all device work preceding the most recent call to cuEventRecord() (in the appropriate compute streams, as specified by the arguments to cuEventRecord()).
If cuEventRecord() has not been called on hEvent, CUDA_SUCCESS is returned immediately.
Waiting for an event that was created with the CU_EVENT_BLOCKING_SYNC flag will cause the calling CPU thread to block until the event has been completed by the device. If the CU_EVENT_BLOCKING_SYNC flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.
See Also: cuEventCreate(jcuda.driver.CUevent, int), cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream), cuEventQuery(jcuda.driver.CUevent), cuEventDestroy(jcuda.driver.CUevent), cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventDestroy(CUevent hEvent)
CUresult cuEventDestroy(CUevent hEvent)
Destroys the event specified by hEvent.
In the case that hEvent has been recorded but has not yet been completed when cuEventDestroy() is called, the function will return immediately and the resources associated with hEvent will be released automatically once the device has completed hEvent.
See Also: cuEventCreate(jcuda.driver.CUevent, int), cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream), cuEventQuery(jcuda.driver.CUevent), cuEventSynchronize(jcuda.driver.CUevent), cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventElapsedTime(float[] pMilliseconds, CUevent hStart, CUevent hEnd)
CUresult cuEventElapsedTime(float *pMilliseconds, CUevent hStart, CUevent hEnd)
Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cuEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
If cuEventRecord() has not been called on either event then CUDA_ERROR_INVALID_HANDLE is returned. If cuEventRecord() has been called on both events but one or both of them has not yet been completed (that is, cuEventQuery() would return CUDA_ERROR_NOT_READY on at least one of the events), CUDA_ERROR_NOT_READY is returned. If either event was created with the CU_EVENT_DISABLE_TIMING flag, then this function will return CUDA_ERROR_INVALID_HANDLE.
See Also: cuEventCreate(jcuda.driver.CUevent, int), cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream), cuEventQuery(jcuda.driver.CUevent), cuEventSynchronize(jcuda.driver.CUevent), cuEventDestroy(jcuda.driver.CUevent)
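In the Java binding, the elapsed time is written into pMilliseconds[0], and a common next step is to turn it into a throughput figure. The sketch below shows only that arithmetic. The Throughput class is a hypothetical helper, and the 2.0 ms value is a made-up stand-in for what cuEventElapsedTime would write into the array.

```java
final class Throughput {
    /**
     * Converts a byte count moved in elapsedMs milliseconds into
     * gigabytes per second, where one gigabyte is 1e9 bytes.
     */
    static double gigabytesPerSecond(long bytes, float elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        double seconds = elapsedMs / 1e3;
        return (bytes / 1e9) / seconds;
    }

    public static void main(String[] args) {
        // Stand-in for the output array filled by cuEventElapsedTime.
        float[] pMilliseconds = {2.0f};
        long bytesCopied = 1_000_000_000L; // hypothetical transfer size
        double rate = gigabytesPerSecond(bytesCopied, pMilliseconds[0]);
        System.out.printf("%.1f GB/s%n", rate);
    }
}
```

As the description above notes, the measured interval can include unrelated work from other streams, so a figure derived this way is an estimate rather than an exact rate.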
public static int cuPointerGetAttribute(Pointer data, int attribute, CUdeviceptr ptr)
CUresult cuPointerGetAttribute(void *data, CUpointer_attribute attribute, CUdeviceptr ptr)
The supported attributes are:
CU_POINTER_ATTRIBUTE_CONTEXT:
Returns in *data the CUcontext in which ptr was allocated or registered. The type of data must be CUcontext *.
If ptr was not allocated by, mapped by, or registered with a CUcontext which uses unified virtual addressing then CUDA_ERROR_INVALID_VALUE is returned.
CU_POINTER_ATTRIBUTE_MEMORY_TYPE:
Returns in *data the physical memory type of the memory that ptr addresses as a CUmemorytype enumerated value. The type of data must be unsigned int.
If ptr addresses device memory then *data is set to CU_MEMORYTYPE_DEVICE. The particular CUdevice on which the memory resides is the CUdevice of the CUcontext returned by the CU_POINTER_ATTRIBUTE_CONTEXT attribute of ptr.
If ptr addresses host memory then *data is set to CU_MEMORYTYPE_HOST.
If ptr was not allocated by, mapped by, or registered with a CUcontext which uses unified virtual addressing then CUDA_ERROR_INVALID_VALUE is returned.
If the current CUcontext does not support unified virtual addressing then CUDA_ERROR_INVALID_CONTEXT is returned.
CU_POINTER_ATTRIBUTE_DEVICE_POINTER:
Returns in *data the device pointer value through which ptr may be accessed by kernels running in the current CUcontext. The type of data must be CUdeviceptr *.
If there exists no device pointer value through which kernels running in the current CUcontext may access ptr then CUDA_ERROR_INVALID_VALUE is returned.
If there is no current CUcontext then CUDA_ERROR_INVALID_CONTEXT is returned.
Except in the exceptional disjoint addressing cases discussed below, the value returned in *data will equal the input value ptr.
CU_POINTER_ATTRIBUTE_HOST_POINTER:
Returns in *data the host pointer value through which ptr may be accessed by the host program. The type of data must be void **. If there exists no host pointer value through which the host program may directly access ptr then CUDA_ERROR_INVALID_VALUE is returned.
Except in the exceptional disjoint addressing cases discussed below, the value returned in *data will equal the input value ptr.
See Also: cuMemAlloc(jcuda.driver.CUdeviceptr, long), cuMemFree(jcuda.driver.CUdeviceptr), cuMemAllocHost(jcuda.Pointer, long), cuMemFreeHost(jcuda.Pointer), cuMemHostAlloc(jcuda.Pointer, long, int), cuMemHostRegister(jcuda.Pointer, long, int), cuMemHostUnregister(jcuda.Pointer)
public static int cuStreamCreate(CUstream phStream, int Flags)
CUresult cuStreamCreate(CUstream *phStream, unsigned int Flags)
Creates a stream and returns a handle in phStream. Flags is required to be 0.
See Also: cuStreamDestroy(jcuda.driver.CUstream), cuStreamWaitEvent(jcuda.driver.CUstream, jcuda.driver.CUevent, int), cuStreamQuery(jcuda.driver.CUstream), cuStreamSynchronize(jcuda.driver.CUstream)
public static int cuStreamWaitEvent(CUstream hStream, CUevent hEvent, int Flags)
CUresult cuStreamWaitEvent(CUstream hStream, CUevent hEvent, unsigned int Flags)
Makes all future work submitted to hStream wait until hEvent reports completion before beginning execution. This synchronization will be performed efficiently on the device. The event hEvent may be from a different context than hStream, in which case this function will perform cross-device synchronization.
The stream hStream will wait only for the completion of the most recent host call to cuEventRecord() on hEvent. Once this call has returned, any functions (including cuEventRecord() and cuEventDestroy()) may be called on hEvent again, and the subsequent calls will not have any effect on hStream.
If hStream is 0 (the NULL stream) any future work submitted in any stream will wait for hEvent to complete before beginning execution. This effectively creates a barrier for all future work submitted to the context.
If cuEventRecord() has not been called on hEvent, this call acts as if the record has already completed, and so is a functional no-op.
See Also: cuStreamCreate(jcuda.driver.CUstream, int), cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream), cuStreamQuery(jcuda.driver.CUstream), cuStreamSynchronize(jcuda.driver.CUstream), cuStreamDestroy(jcuda.driver.CUstream)
public static int cuStreamQuery(CUstream hStream)
CUresult cuStreamQuery(CUstream hStream)
Returns CUDA_SUCCESS if all operations in the stream specified by hStream have completed, or CUDA_ERROR_NOT_READY if not.
See Also: cuStreamCreate(jcuda.driver.CUstream, int), cuStreamWaitEvent(jcuda.driver.CUstream, jcuda.driver.CUevent, int), cuStreamDestroy(jcuda.driver.CUstream), cuStreamSynchronize(jcuda.driver.CUstream)
public static int cuStreamSynchronize(CUstream hStream)
CUresult cuStreamSynchronize(CUstream hStream)
Waits until the device has completed all operations in the stream specified by hStream. If the context was created with the CU_CTX_SCHED_BLOCKING_SYNC flag, the CPU thread will block until the stream is finished with all of its tasks.
See Also: cuStreamCreate(jcuda.driver.CUstream, int), cuStreamDestroy(jcuda.driver.CUstream), cuStreamWaitEvent(jcuda.driver.CUstream, jcuda.driver.CUevent, int), cuStreamQuery(jcuda.driver.CUstream)
public static int cuStreamDestroy(CUstream hStream)
CUresult cuStreamDestroy(CUstream hStream)
Destroys the stream specified by hStream.
In the case that the device is still doing work in the stream hStream when cuStreamDestroy() is called, the function will return immediately and the resources associated with hStream will be released automatically once the device has completed all work in hStream.
See Also: cuStreamCreate(jcuda.driver.CUstream, int), cuStreamWaitEvent(jcuda.driver.CUstream, jcuda.driver.CUevent, int), cuStreamQuery(jcuda.driver.CUstream), cuStreamSynchronize(jcuda.driver.CUstream)
public static int cuGLInit()
CUresult cuGLInit(void)
See Also: cuGLCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuGLMapBufferObject(jcuda.driver.CUdeviceptr, long[], int), cuGLRegisterBufferObject(int), cuGLUnmapBufferObject(int), cuGLUnregisterBufferObject(int), cuGLMapBufferObjectAsync(jcuda.driver.CUdeviceptr, long[], int, jcuda.driver.CUstream), cuGLUnmapBufferObjectAsync(int, jcuda.driver.CUstream), cuGLSetBufferObjectMapFlags(int, int)
public static int cuGLCtxCreate(CUcontext pCtx, int Flags, CUdevice device)
CUresult cuGLCtxCreate(CUcontext *pCtx, unsigned int Flags, CUdevice device)
Creates a new CUDA context, initializes OpenGL interoperability, and associates the CUDA context with the calling thread. It must be called before performing any other OpenGL interoperability operations. It may fail if the needed OpenGL driver facilities are not available. For usage of the Flags parameter, see cuCtxCreate().
See Also: cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuGLInit(), cuGLMapBufferObject(jcuda.driver.CUdeviceptr, long[], int), cuGLRegisterBufferObject(int), cuGLUnmapBufferObject(int), cuGLUnregisterBufferObject(int), cuGLMapBufferObjectAsync(jcuda.driver.CUdeviceptr, long[], int, jcuda.driver.CUstream), cuGLUnmapBufferObjectAsync(int, jcuda.driver.CUstream), cuGLSetBufferObjectMapFlags(int, int)
public static int cuGraphicsGLRegisterBuffer(CUgraphicsResource pCudaResource, int buffer, int Flags)
CUresult cuGraphicsGLRegisterBuffer(CUgraphicsResource *pCudaResource, GLuint buffer, unsigned int Flags)
Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as pCudaResource. The register flags Flags specify the intended usage, as follows:
See Also: cuGLCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuGraphicsUnregisterResource(jcuda.driver.CUgraphicsResource), cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream), cuGraphicsResourceGetMappedPointer(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUgraphicsResource)
public static int cuGraphicsGLRegisterImage(CUgraphicsResource pCudaResource, int image, int target, int Flags)
CUresult cuGraphicsGLRegisterImage(CUgraphicsResource *pCudaResource, GLuint image, GLenum target, unsigned int Flags)
Registers the texture or renderbuffer object specified by image for access by CUDA. target must match the type of the object. A handle to the registered object is returned as pCudaResource. The register flags Flags specify the intended usage, as follows:
The following image classes are currently disallowed:
See Also: cuGLCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuGraphicsUnregisterResource(jcuda.driver.CUgraphicsResource), cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream), cuGraphicsSubResourceGetMappedArray(jcuda.driver.CUarray, jcuda.driver.CUgraphicsResource, int, int)
public static int cuGLRegisterBufferObject(int bufferobj)
CUresult cuGLRegisterBufferObject(GLuint buffer)
Registers the buffer object specified by buffer for access by CUDA. This function must be called before CUDA can map the buffer object. There must be a valid OpenGL context bound to the current thread when this function is called, and the buffer name is resolved by that context.
See Also: cuGraphicsGLRegisterBuffer(jcuda.driver.CUgraphicsResource, int, int)
public static int cuGLMapBufferObject(CUdeviceptr dptr, long[] size, int bufferobj)
CUresult cuGLMapBufferObject(CUdeviceptr *dptr, size_t *size, GLuint buffer)
Maps the buffer object specified by buffer into the address space of the current CUDA context and returns in *dptr and *size the base pointer and size of the resulting mapping.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
All streams in the current CUDA context are synchronized with the current GL context.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGLUnmapBufferObject(int bufferobj)
CUresult cuGLUnmapBufferObject(GLuint buffer)
Unmaps the buffer object specified by buffer for access by CUDA.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
All streams in the current CUDA context are synchronized with the current GL context.
See Also: cuGraphicsUnmapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGLUnregisterBufferObject(int bufferobj)
CUresult cuGLUnregisterBufferObject(GLuint buffer)
Unregisters the buffer object specified by buffer. This releases any resources associated with the registered buffer. After this call, the buffer may no longer be mapped for access by CUDA.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
See Also: cuGraphicsUnregisterResource(jcuda.driver.CUgraphicsResource)
public static int cuGLSetBufferObjectMapFlags(int buffer, int Flags)
CUresult cuGLSetBufferObjectMapFlags(GLuint buffer, unsigned int Flags)
Sets the map flags for the buffer object specified by buffer.
Changes to Flags will take effect the next time buffer is mapped. The Flags argument may be any of the following:
If buffer has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If buffer is presently mapped for access by CUDA, then CUDA_ERROR_ALREADY_MAPPED is returned.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
See Also: cuGraphicsResourceSetMapFlags(jcuda.driver.CUgraphicsResource, int)
public static int cuGLMapBufferObjectAsync(CUdeviceptr dptr, long[] size, int buffer, CUstream hStream)
CUresult cuGLMapBufferObjectAsync(CUdeviceptr *dptr, size_t *size, GLuint buffer, CUstream hStream)
Maps the buffer object specified by buffer into the address space of the current CUDA context and returns in *dptr and *size the base pointer and size of the resulting mapping.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
Stream hStream in the current CUDA context is synchronized with the current GL context.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGLUnmapBufferObjectAsync(int buffer, CUstream hStream)
CUresult cuGLUnmapBufferObjectAsync(GLuint buffer, CUstream hStream)
Unmaps the buffer object specified by buffer for access by CUDA.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
Stream hStream in the current CUDA context is synchronized with the current GL context.
See Also: cuGraphicsUnmapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGraphicsUnregisterResource(CUgraphicsResource resource)
CUresult cuGraphicsUnregisterResource(CUgraphicsResource resource)
Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.
If resource is invalid then CUDA_ERROR_INVALID_HANDLE is returned.
cuGraphicsGLRegisterBuffer(jcuda.driver.CUgraphicsResource, int, int), cuGraphicsGLRegisterImage(jcuda.driver.CUgraphicsResource, int, int, int)
public static int cuGraphicsSubResourceGetMappedArray(CUarray pArray, CUgraphicsResource resource, int arrayIndex, int mipLevel)
CUresult cuGraphicsSubResourceGetMappedArray(CUarray *pArray, CUgraphicsResource resource, unsigned int arrayIndex, unsigned int mipLevel)
Returns in *pArray an array through which the subresource of the mapped graphics resource resource which corresponds to array index arrayIndex and mipmap level mipLevel may be accessed. The value set in *pArray may change every time that resource is mapped.
If resource is not a texture then it cannot be accessed via an array and CUDA_ERROR_NOT_MAPPED_AS_ARRAY is returned. If arrayIndex is not a valid array index for resource then CUDA_ERROR_INVALID_VALUE is returned. If mipLevel is not a valid mipmap level for resource then CUDA_ERROR_INVALID_VALUE is returned. If resource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.
cuGraphicsResourceGetMappedPointer(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUgraphicsResource)
public static int cuGraphicsResourceGetMappedPointer(CUdeviceptr pDevPtr, long[] pSize, CUgraphicsResource resource)
CUresult cuGraphicsResourceGetMappedPointer(CUdeviceptr *pDevPtr, size_t *pSize, CUgraphicsResource resource)
Returns in *pDevPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in *pSize the size of the memory in bytes which may be accessed from that pointer. The value set in *pDevPtr may change every time that resource is mapped.
If resource is not a buffer then it cannot be accessed via a pointer and CUDA_ERROR_NOT_MAPPED_AS_POINTER is returned. If resource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.
cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream), cuGraphicsSubResourceGetMappedArray(jcuda.driver.CUarray, jcuda.driver.CUgraphicsResource, int, int)
public static int cuGraphicsResourceSetMapFlags(CUgraphicsResource resource, int flags)
CUresult cuGraphicsResourceSetMapFlags(CUgraphicsResource resource, unsigned int flags)
Set flags for mapping the graphics resource resource.
Changes to flags will take effect the next time resource is mapped. The flags argument may be any of the following:
If resource is presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned. If flags is not one of the above values then CUDA_ERROR_INVALID_VALUE is returned.
cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGraphicsMapResources(int count, CUgraphicsResource[] resources, CUstream hStream)
CUresult cuGraphicsMapResources(unsigned int count, CUgraphicsResource *resources, CUstream hStream)
Maps the count graphics resources in resources for access by CUDA.
The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.
This function provides the synchronization guarantee that any graphics calls issued before cuGraphicsMapResources() will complete before any subsequent CUDA work issued in hStream begins.
If resources includes any duplicate entries then CUDA_ERROR_INVALID_HANDLE is returned. If any of resources are presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned.
cuGraphicsResourceGetMappedPointer(jcuda.driver.CUdeviceptr, long[], jcuda.driver.CUgraphicsResource)
public static int cuGraphicsUnmapResources(int count, CUgraphicsResource[] resources, CUstream hStream)
CUresult cuGraphicsUnmapResources(unsigned int count, CUgraphicsResource *resources, CUstream hStream)
Unmaps the count graphics resources in resources.
Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.
This function provides the synchronization guarantee that any CUDA work issued in hStream before cuGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.
If resources includes any duplicate entries then CUDA_ERROR_INVALID_HANDLE is returned. If any of resources are not presently mapped for access by CUDA then CUDA_ERROR_NOT_MAPPED is returned.
cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
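The error rules stated for cuGraphicsMapResources, cuGraphicsResourceGetMappedPointer and cuGraphicsUnmapResources can be summarised as a small state model. The following is a plain-Java sketch, not JCuda code: the class MappingModel, the Result enum and the integer resource ids are hypothetical stand-ins for CUgraphicsResource handles and CUresult codes. It only mirrors the conditions described in this section, and the order in which the conditions are checked is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the documented map/unmap error rules; not part of JCuda. */
class MappingModel {
    /** Hypothetical stand-ins for the CUresult codes named in the text. */
    enum Result { SUCCESS, INVALID_HANDLE, ALREADY_MAPPED, NOT_MAPPED }

    /** Ids of the resources that are currently mapped. */
    private final Set<Integer> mapped = new HashSet<>();

    /** Mirrors cuGraphicsMapResources; all-or-nothing in this model. */
    Result map(int... resources) {
        if (hasDuplicates(resources)) {
            return Result.INVALID_HANDLE;
        }
        for (int r : resources) {
            if (mapped.contains(r)) {
                return Result.ALREADY_MAPPED;
            }
        }
        for (int r : resources) {
            mapped.add(r);
        }
        return Result.SUCCESS;
    }

    /** Mirrors cuGraphicsUnmapResources; all-or-nothing in this model. */
    Result unmap(int... resources) {
        if (hasDuplicates(resources)) {
            return Result.INVALID_HANDLE;
        }
        for (int r : resources) {
            if (!mapped.contains(r)) {
                return Result.NOT_MAPPED;
            }
        }
        for (int r : resources) {
            mapped.remove(r);
        }
        return Result.SUCCESS;
    }

    /** Mirrors the "not mapped" rule of cuGraphicsResourceGetMappedPointer. */
    Result access(int resource) {
        return mapped.contains(resource) ? Result.SUCCESS : Result.NOT_MAPPED;
    }

    /** Returns true if any id occurs more than once. */
    private static boolean hasDuplicates(int[] ids) {
        Set<Integer> seen = new HashSet<>();
        for (int id : ids) {
            if (!seen.add(id)) {
                return true;
            }
        }
        return false;
    }
}
```

In real JCuda code the corresponding sequence would be a call to cuGraphicsMapResources, then cuGraphicsResourceGetMappedPointer (queried again after every map, since the pointer may change), then the CUDA work, then cuGraphicsUnmapResources.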
public static int cuCtxSetLimit(int limit, long value)
CUresult cuCtxSetLimit(CUlimit limit, size_t value)
Setting limit to value is a request by the application to update the current limit maintained by the context. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use cuCtxGetLimit() to find out exactly what the limit has been set to.
Setting each CUlimit has its own specific restrictions, so each is discussed here.
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxGetCacheConfig(int[]), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxGetLimit(long[], int), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetCacheConfig(int), cuCtxSynchronize()
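Because the driver may adjust a requested limit, a caller would typically set the limit and then read back the value that is actually in effect. The sketch below shows that pattern in plain Java so that it runs without a GPU. LimitApi is a hypothetical interface that copies the two method signatures documented here (in JCuda they are static methods of JCudaDriver), FakeDriver is an in-memory stand-in, and its rounding to a multiple of 256 is invented purely for illustration. The limit id passed in is an arbitrary placeholder.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface mirroring the two documented signatures. */
interface LimitApi {
    int cuCtxSetLimit(int limit, long value);
    int cuCtxGetLimit(long[] pvalue, int limit);
}

/** In-memory stand-in for the driver; the rounding rule is invented. */
class FakeDriver implements LimitApi {
    private final Map<Integer, Long> limits = new HashMap<>();

    public int cuCtxSetLimit(int limit, long value) {
        // Assumption for the demo: values are rounded up to multiples of 256.
        long rounded = ((value + 255) / 256) * 256;
        limits.put(limit, rounded);
        return 0; // stands in for CUDA_SUCCESS
    }

    public int cuCtxGetLimit(long[] pvalue, int limit) {
        pvalue[0] = limits.getOrDefault(limit, 0L);
        return 0; // stands in for CUDA_SUCCESS
    }
}

class LimitExample {
    /** Requests a limit and returns the value actually in effect. */
    static long setAndReadBack(LimitApi api, int limit, long requested) {
        int status = api.cuCtxSetLimit(limit, requested);
        if (status != 0) {
            throw new IllegalStateException("cuCtxSetLimit failed: " + status);
        }
        // One-element array used as an out-parameter, like size_t* in C.
        long[] effective = new long[1];
        status = api.cuCtxGetLimit(effective, limit);
        if (status != 0) {
            throw new IllegalStateException("cuCtxGetLimit failed: " + status);
        }
        return effective[0];
    }
}
```

The point of the pattern is that the application should size its own buffers and checks from the value returned by cuCtxGetLimit, not from the value it requested.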
public static int cuCtxGetCacheConfig(int[] pconfig)
CUresult cuCtxGetCacheConfig(CUfunc_cache *pconfig)
On devices where the L1 cache and shared memory use the same hardware resources, this returns through pconfig the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
This will return a pconfig of CU_FUNC_CACHE_PREFER_NONE on devices where the size of the L1 cache and shared memory are fixed.
The supported cache configurations are:
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxGetLimit(long[], int), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetCacheConfig(int), cuCtxSetLimit(int, long), cuCtxSynchronize(), cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
public static int cuCtxSetCacheConfig(int config)
CUresult cuCtxSetCacheConfig(CUfunc_cache config)
On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cuFuncSetCacheConfig() will be preferred over this context-wide setting. Setting the context-wide cache configuration to CU_FUNC_CACHE_PREFER_NONE will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are:
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxGetCacheConfig(int[]), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxGetLimit(long[], int), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetLimit(int, long), cuCtxSynchronize(), cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
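The precedence described for cuCtxSetCacheConfig (a per-function preference wins over the context-wide one, a context-wide CU_FUNC_CACHE_PREFER_NONE leaves the configuration alone, and fixed-size devices ignore the setting) can be written as a small resolver. This is a plain-Java model, not JCuda code: the CachePref enum and CacheModel.effective are hypothetical names, the sketch assumes that a function-level NONE means the function has no preference of its own, and the real driver remains free to pick a different configuration.

```java
/** Toy model of cache-preference precedence; not part of JCuda. */
class CacheModel {
    /** Hypothetical mirror of the CU_FUNC_CACHE_PREFER_* values. */
    enum CachePref { NONE, SHARED, L1 }

    /**
     * Picks the configuration a launch would prefer in this model.
     *
     * @param function     preference set on the function (NONE if unset)
     * @param context      context-wide preference
     * @param current      configuration currently in effect
     * @param configurable false where L1 and shared memory sizes are fixed
     */
    static CachePref effective(CachePref function, CachePref context,
                               CachePref current, boolean configurable) {
        if (!configurable) {
            return current;   // the setting does nothing on such devices
        }
        if (function != CachePref.NONE) {
            return function;  // function preference beats the context
        }
        if (context != CachePref.NONE) {
            return context;   // fall back to the context-wide preference
        }
        return current;       // NONE everywhere: prefer not to change
    }
}
```

When the value returned differs from current, the model corresponds to the case in which a launch may insert a device-side synchronization point.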
public static int cuLaunchKernel(CUfunction f, int gridDimX, int gridDimY, int gridDimZ, int blockDimX, int blockDimY, int blockDimZ, int sharedMemBytes, CUstream hStream, Pointer kernelParams, Pointer extra)
public static int cuCtxGetLimit(long[] pvalue, int limit)
CUresult cuCtxGetLimit(size_t *pvalue, CUlimit limit)
Returns in *pvalue the current size of limit.
The supported CUlimit values are:
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxGetCacheConfig(int[]), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSetCacheConfig(int), cuCtxSetLimit(int, long), cuCtxSynchronize()
public static int cuProfilerInitialize(java.lang.String configFile, java.lang.String outputFile, int outputMode)
configFile - Name of the config file that lists the counters for profiling.
outputFile - Name of the outputFile where the profiling results will be stored.
outputMode - outputMode, can be CU_OUT_KEY_VALUE_PAIR or CU_OUT_CSV.
cuProfilerStart(), cuProfilerStop()
public static int cuProfilerStart()
cuProfilerInitialize(java.lang.String, java.lang.String, int), cuProfilerStop()
public static int cuProfilerStop()
cuProfilerInitialize(java.lang.String, java.lang.String, int), cuProfilerStart()
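One plausible way to use cuProfilerStart and cuProfilerStop together is to bracket only the region of interest and to make sure the stop call runs even if the work throws. The sketch below shows that bracket in plain Java. ProfilerApi is a hypothetical interface copying the two no-argument signatures listed above (in JCuda they are static methods of JCudaDriver), and RecordingProfiler is a fake that only records the call order so the example runs without a GPU.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface mirroring the two documented signatures. */
interface ProfilerApi {
    int cuProfilerStart();
    int cuProfilerStop();
}

/** Fake profiler that records the order of calls; not part of JCuda. */
class RecordingProfiler implements ProfilerApi {
    final List<String> calls = new ArrayList<>();

    public int cuProfilerStart() {
        calls.add("start");
        return 0; // stands in for CUDA_SUCCESS
    }

    public int cuProfilerStop() {
        calls.add("stop");
        return 0; // stands in for CUDA_SUCCESS
    }
}

class ProfileRegion {
    /** Runs the work between a start call and a guaranteed stop call. */
    static void run(ProfilerApi api, Runnable work) {
        int status = api.cuProfilerStart();
        if (status != 0) {
            throw new IllegalStateException("cuProfilerStart failed: " + status);
        }
        try {
            work.run();
        } finally {
            api.cuProfilerStop(); // always executed, even on exceptions
        }
    }
}
```

In a real program cuProfilerInitialize would be called once beforehand with the config file, output file and output mode described above.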