java.lang.Object
  jcuda.runtime.JCuda
public class JCuda
Java bindings for the NVIDIA CUDA runtime API.
Most comments are extracted from the CUDA online documentation.
Field Summary
Modifier and Type | Field | Description |
---|---|---|
static int | cudaArrayCubemap | Must be set in cudaMalloc3DArray to create a cubemap CUDA array |
static int | cudaArrayDefault | Default CUDA array allocation flag |
static int | cudaArrayLayered | Must be set in cudaMalloc3DArray to create a layered CUDA array |
static int | cudaArraySurfaceLoadStore | Must be set in cudaMallocArray or cudaMalloc3DArray in order to bind surfaces to the CUDA array |
static int | cudaArrayTextureGather | Must be set in cudaMallocArray or cudaMalloc3DArray in order to perform texture gather operations on the CUDA array |
static int | cudaDeviceBlockingSync | Deprecated. As of CUDA 4.0, replaced by cudaDeviceScheduleBlockingSync |
static int | cudaDeviceLmemResizeToMax | Device flag - Keep local memory allocation after launch |
static int | cudaDeviceMapHost | Device flag - Support mapped pinned allocations |
static int | cudaDeviceMask | Device flags mask |
static int | cudaDeviceScheduleAuto | Device flag - Automatic scheduling |
static int | cudaDeviceScheduleBlockingSync | Device flag - Use blocking synchronization |
static int | cudaDeviceScheduleMask | Device schedule flags mask |
static int | cudaDeviceScheduleSpin | Device flag - Spin default scheduling |
static int | cudaDeviceScheduleYield | Device flag - Yield default scheduling |
static int | cudaEventBlockingSync | Event uses blocking synchronization |
static int | cudaEventDefault | Default event flag |
static int | cudaEventDisableTiming | Event will not record timing data |
static int | cudaEventInterprocess | Event is suitable for interprocess use. cudaEventDisableTiming must be set |
static int | cudaHostAllocDefault | Default page-locked allocation flag |
static int | cudaHostAllocMapped | Map allocation into device space |
static int | cudaHostAllocPortable | Pinned memory accessible by all CUDA contexts |
static int | cudaHostAllocWriteCombined | Write-combined memory |
static int | cudaHostRegisterDefault | Default host memory registration flag |
static int | cudaHostRegisterMapped | Map registered memory into device space |
static int | cudaHostRegisterPortable | Pinned memory accessible by all CUDA contexts |
static int | cudaIpcMemLazyEnablePeerAccess | Automatically enable peer access between remote devices as needed |
static int | cudaPeerAccessDefault | Default peer addressing enable flag |
static int | CUDART_VERSION | CUDA runtime version |
static int | cudaSurfaceType1D | 1D surface type |
static int | cudaSurfaceType1DLayered | 1D layered surface type |
static int | cudaSurfaceType2D | 2D surface type |
static int | cudaSurfaceType2DLayered | 2D layered surface type |
static int | cudaSurfaceType3D | 3D surface type |
static int | cudaSurfaceTypeCubemap | Cubemap surface type |
static int | cudaSurfaceTypeCubemapLayered | Cubemap layered surface type |
static int | cudaTextureType1D | 1D texture type |
static int | cudaTextureType1DLayered | 1D layered texture type |
static int | cudaTextureType2D | 2D texture type |
static int | cudaTextureType2DLayered | 2D layered texture type |
static int | cudaTextureType3D | 3D texture type |
static int | cudaTextureTypeCubemap | Cubemap texture type |
static int | cudaTextureTypeCubemapLayered | Cubemap layered texture type |
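The allocation and registration constants above are bit flags that are combined with bitwise OR. The following plain-Java sketch shows the idea without needing the native library. The numeric values are an assumption copied from the CUDA runtime header (cudaHostAllocDefault = 0, cudaHostAllocPortable = 1, cudaHostAllocMapped = 2, cudaHostAllocWriteCombined = 4), and the class name HostAllocFlags is hypothetical; real code should use the JCuda.cudaHostAlloc* fields directly.

```java
/** Combining page-locked allocation flags; values assumed from the CUDA runtime header. */
final class HostAllocFlags {
    // Local copies so the sketch is self-contained. With JCuda on the classpath,
    // use JCuda.cudaHostAllocPortable and the related fields instead.
    static final int HOST_ALLOC_DEFAULT = 0x00;
    static final int HOST_ALLOC_PORTABLE = 0x01;
    static final int HOST_ALLOC_MAPPED = 0x02;
    static final int HOST_ALLOC_WRITE_COMBINED = 0x04;

    /** Combines any number of flags with bitwise OR (no arguments gives the default, 0). */
    static int combine(int... flags) {
        int result = HOST_ALLOC_DEFAULT;
        for (int flag : flags) {
            result |= flag;
        }
        return result;
    }

    /** True if every bit of {@code flag} is also set in {@code flags}. */
    static boolean isSet(int flags, int flag) {
        return (flags & flag) == flag;
    }

    public static void main(String[] args) {
        // Pinned memory that is mapped into device space and usable from all contexts
        int flags = combine(HOST_ALLOC_MAPPED, HOST_ALLOC_PORTABLE);
        System.out.println(flags);                                   // 3
        System.out.println(isSet(flags, HOST_ALLOC_MAPPED));         // true
        System.out.println(isSet(flags, HOST_ALLOC_WRITE_COMBINED)); // false
    }
}
```

The combined value is what would be passed as the flags argument of cudaHostAlloc(Pointer, long, int). Mapped allocations only take effect if the device was initialized with the cudaDeviceMapHost flag via cudaSetDeviceFlags(int).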
Method Summary
Modifier and Type | Method | Description |
---|---|---|
static int | cudaArrayGetInfo(cudaChannelFormatDesc desc, cudaExtent extent, int[] flags, cudaArray array) | Gets info about the specified cudaArray. |
static int | cudaBindSurfaceToArray(surfaceReference surfref, cudaArray array, cudaChannelFormatDesc desc) | Binds an array to a surface. |
static int | cudaBindTexture(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long size) | Binds a memory area to a texture. |
static int | cudaBindTexture2D(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long width, long height, long pitch) | Binds a 2D memory area to a texture. |
static int | cudaBindTextureToArray(textureReference texref, cudaArray array, cudaChannelFormatDesc desc) | Binds an array to a texture. |
static int | cudaChooseDevice(int[] device, cudaDeviceProp prop) | Selects the compute device that best matches the given criteria. |
static int | cudaConfigureCall(dim3 gridDim, dim3 blockDim, long sharedMem, cudaStream_t stream) | Configures a device launch. |
static cudaChannelFormatDesc | cudaCreateChannelDesc(int x, int y, int z, int w, int cudaChannelFormatKind_f) | Returns a channel descriptor using the specified format. |
static int | cudaDeviceCanAccessPeer(int[] canAccessPeer, int device, int peerDevice) | Queries if a device may directly access a peer device's memory. |
static int | cudaDeviceDisablePeerAccess(int peerDevice) | Disables direct access to memory allocations on a peer device and unregisters any registered allocations from that device. |
static int | cudaDeviceEnablePeerAccess(int peerDevice, int flags) | Enables direct access to memory allocations on a peer device. |
static int | cudaDeviceGetByPCIBusId(int[] device, java.lang.String pciBusId) | Returns a handle to a compute device. |
static int | cudaDeviceGetCacheConfig(int[] pCacheConfig) | Returns the preferred cache configuration for the current device. |
static int | cudaDeviceGetLimit(long[] pValue, int limit) | Returns resource limits. |
static int | cudaDeviceGetPCIBusId(java.lang.String[] pciBusId, int len, int device) | Returns a PCI Bus Id string for the device. |
static int | cudaDeviceReset() | Destroys all allocations and resets all state on the current device in the current process. |
static int | cudaDeviceSetCacheConfig(int cacheConfig) | Sets the preferred cache configuration for the current device. |
static int | cudaDeviceSetLimit(int limit, long value) | Sets resource limits. |
static int | cudaDeviceSynchronize() | Waits for the compute device to finish. |
static int | cudaDriverGetVersion(int[] driverVersion) | Returns the CUDA driver version. |
static int | cudaEventCreate(cudaEvent_t event) | Creates an event object. |
static int | cudaEventCreateWithFlags(cudaEvent_t event, int flags) | Creates an event object with the specified flags. |
static int | cudaEventDestroy(cudaEvent_t event) | Destroys an event object. |
static int | cudaEventElapsedTime(float[] ms, cudaEvent_t start, cudaEvent_t end) | Computes the elapsed time between events. |
static int | cudaEventQuery(cudaEvent_t event) | Queries an event's status. |
static int | cudaEventRecord(cudaEvent_t event, cudaStream_t stream) | Records an event. |
static int | cudaEventSynchronize(cudaEvent_t event) | Waits for an event to complete. |
static int | cudaFree(Pointer devPtr) | Frees memory on the device. |
static int | cudaFreeArray(cudaArray array) | Frees an array on the device. |
static int | cudaFreeHost(Pointer ptr) | Frees page-locked memory. |
static int | cudaFuncGetAttributes(cudaFuncAttributes attr, java.lang.String func) | Finds out attributes for a given function. |
static int | cudaGetChannelDesc(cudaChannelFormatDesc desc, cudaArray array) | Gets the channel descriptor of an array. |
static int | cudaGetDevice(int[] device) | Returns which device is currently being used. |
static int | cudaGetDeviceCount(int[] count) | Returns the number of compute-capable devices. |
static int | cudaGetDeviceProperties(cudaDeviceProp prop, int device) | Returns information about the compute device. |
static java.lang.String | cudaGetErrorString(int error) | Returns the message string from an error code. |
static int | cudaGetLastError() | Returns the last error from a runtime call. |
static int | cudaGetSurfaceReference(surfaceReference surfref, java.lang.String symbol) | Deprecated. As of CUDA 4.1 |
static int | cudaGetSymbolAddress(Pointer devPtr, java.lang.String symbol) | Finds the address associated with a CUDA symbol. |
static int | cudaGetSymbolSize(long[] size, java.lang.String symbol) | Finds the size of the object associated with a CUDA symbol. |
static int | cudaGetTextureAlignmentOffset(long[] offset, textureReference texref) | Gets the alignment offset of a texture. |
static int | cudaGetTextureReference(textureReference texref, java.lang.String symbol) | Deprecated. As of CUDA 4.1 |
static int | cudaGLGetDevices(int[] pCudaDeviceCount, int[] pCudaDevices, int cudaDeviceCount, int cudaGLDeviceList_deviceList) | Gets the CUDA devices associated with the current OpenGL context. |
static int | cudaGLMapBufferObject(Pointer devPtr, int bufObj) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGLMapBufferObjectAsync(Pointer devPtr, int bufObj, cudaStream_t stream) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGLRegisterBufferObject(int bufObj) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGLSetBufferObjectMapFlags(int bufObj, int flags) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGLSetGLDevice(int device) | Sets a CUDA device to use OpenGL interoperability. |
static int | cudaGLUnmapBufferObject(int bufObj) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGLUnmapBufferObjectAsync(int bufObj, cudaStream_t stream) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGLUnregisterBufferObject(int bufObj) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaGraphicsGLRegisterBuffer(cudaGraphicsResource resource, int buffer, int Flags) | Registers an OpenGL buffer object. |
static int | cudaGraphicsGLRegisterImage(cudaGraphicsResource resource, int image, int target, int Flags) | Registers an OpenGL texture or renderbuffer object. |
static int | cudaGraphicsMapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream) | Maps graphics resources for access by CUDA. |
static int | cudaGraphicsResourceGetMappedPointer(Pointer devPtr, long[] size, cudaGraphicsResource resource) | Gets a device pointer through which to access a mapped graphics resource. |
static int | cudaGraphicsResourceSetMapFlags(cudaGraphicsResource resource, int flags) | Sets usage flags for mapping a graphics resource. |
static int | cudaGraphicsSubResourceGetMappedArray(cudaArray arrayPtr, cudaGraphicsResource resource, int arrayIndex, int mipLevel) | Gets an array through which to access a subresource of a mapped graphics resource. |
static int | cudaGraphicsUnmapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream) | Unmaps graphics resources. |
static int | cudaGraphicsUnregisterResource(cudaGraphicsResource resource) | Unregisters a graphics resource for access by CUDA. |
static int | cudaHostAlloc(Pointer ptr, long size, int flags) | Allocates page-locked memory on the host. |
static int | cudaHostGetDevicePointer(Pointer pDevice, Pointer pHost, int flags) | Passes back the device pointer of mapped host memory allocated by cudaHostAlloc() or registered by cudaHostRegister(). |
static int | cudaHostRegister(Pointer ptr, long size, int flags) | Registers an existing host memory range for use by CUDA. |
static int | cudaHostUnregister(Pointer ptr) | Unregisters a memory range that was registered with cudaHostRegister(). |
static int | cudaIpcCloseMemHandle(Pointer devPtr) | Closes memory mapped with cudaIpcOpenMemHandle. |
static int | cudaIpcGetEventHandle(cudaIpcEventHandle handle, cudaEvent_t event) | Gets an interprocess handle for a previously allocated event. |
static int | cudaIpcGetMemHandle(cudaIpcMemHandle handle, Pointer devPtr) | Gets an interprocess memory handle for an existing device memory allocation. |
static int | cudaIpcOpenEventHandle(cudaEvent_t event, cudaIpcEventHandle handle) | Opens an interprocess event handle for use in the current process. |
static int | cudaIpcOpenMemHandle(Pointer devPtr, cudaIpcMemHandle handle, int flags) | Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process. |
static int | cudaLaunch(java.lang.String symbol) | Launches a device function. |
static int | cudaMalloc(Pointer devPtr, long size) | Allocates memory on the device. |
static int | cudaMalloc3D(cudaPitchedPtr pitchDevPtr, cudaExtent extent) | Allocates logical 1D, 2D, or 3D memory objects on the device. |
static int | cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent) | Calls cudaMalloc3DArray with the default value '0' as the last parameter. |
static int | cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent, int flags) | Allocates an array on the device. |
static int | cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height) | Calls cudaMallocArray with the default value '0' as the last parameter. |
static int | cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height, int flags) | Allocates an array on the device. |
static int | cudaMallocHost(Pointer ptr, long size) | Allocates page-locked memory on the host. |
static int | cudaMallocPitch(Pointer devPtr, long[] pitch, long width, long height) | Allocates pitched memory on the device. |
static int | cudaMemcpy(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpy2D(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpy2DArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long width, long height, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpy2DAsync(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data between host and device. |
static int | cudaMemcpy2DFromArray(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpy2DFromArrayAsync(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data between host and device. |
static int | cudaMemcpy2DToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpy2DToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data between host and device. |
static int | cudaMemcpy3D(cudaMemcpy3DParms p) | Copies data between 3D objects. |
static int | cudaMemcpy3DAsync(cudaMemcpy3DParms p, cudaStream_t stream) | Copies data between 3D objects. |
static int | cudaMemcpy3DPeer(cudaMemcpy3DPeerParms p) | Copies memory between devices. |
static int | cudaMemcpy3DPeerAsync(cudaMemcpy3DPeerParms p, cudaStream_t stream) | Copies memory between devices asynchronously. |
static int | cudaMemcpyArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long count, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpyAsync(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data between host and device. |
static int | cudaMemcpyFromArray(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpyFromArrayAsync(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data between host and device. |
static int | cudaMemcpyFromSymbol(Pointer dst, java.lang.String symbol, long count, long offset, int cudaMemcpyKind_kind) | Copies data from the given symbol on the device. |
static int | cudaMemcpyFromSymbolAsync(Pointer dst, java.lang.String symbol, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data from the given symbol on the device. |
static int | cudaMemcpyPeer(Pointer dst, int dstDevice, Pointer src, int srcDevice, long count) | Copies memory between two devices. |
static int | cudaMemcpyPeerAsync(Pointer dst, int dstDevice, Pointer src, int srcDevice, long count, cudaStream_t stream) | Copies memory between two devices asynchronously. |
static int | cudaMemcpyToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind) | Copies data between host and device. |
static int | cudaMemcpyToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data between host and device. |
static int | cudaMemcpyToSymbol(java.lang.String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind) | Copies data to the given symbol on the device. |
static int | cudaMemcpyToSymbolAsync(java.lang.String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream) | Copies data to the given symbol on the device. |
static int | cudaMemGetInfo(long[] free, long[] total) | Gets free and total device memory. |
static int | cudaMemset(Pointer mem, int c, long count) | Initializes or sets device memory to a value. |
static int | cudaMemset2D(Pointer mem, long pitch, int c, long width, long height) | Initializes or sets device memory to a value. |
static int | cudaMemset2DAsync(Pointer devPtr, long pitch, int value, long width, long height, cudaStream_t stream) | Initializes or sets device memory to a value. |
static int | cudaMemset3D(cudaPitchedPtr pitchDevPtr, int value, cudaExtent extent) | Initializes or sets device memory to a value. |
static int | cudaMemset3DAsync(cudaPitchedPtr pitchedDevPtr, int value, cudaExtent extent, cudaStream_t stream) | Initializes or sets device memory to a value. |
static int | cudaMemsetAsync(Pointer devPtr, int value, long count, cudaStream_t stream) | Initializes or sets device memory to a value. |
static int | cudaPeekAtLastError() | Returns the last error from a runtime call. |
static int | cudaPointerGetAttributes(cudaPointerAttributes attributes, Pointer ptr) | Returns attributes about a specified pointer. |
static int | cudaProfilerInitialize(java.lang.String configFile, java.lang.String outputFile, int outputMode) | Initializes profiling. |
static int | cudaProfilerStart() | Starts profiling. |
static int | cudaProfilerStop() | Stops profiling. |
static int | cudaRuntimeGetVersion(int[] runtimeVersion) | Returns the CUDA Runtime version. |
static int | cudaSetDevice(int device) | Sets the device to be used for GPU executions. |
static int | cudaSetDeviceFlags(int flags) | Sets flags to be used for device executions. |
static int | cudaSetupArgument(Pointer arg, long size, long offset) | Configures a device launch. |
static int | cudaSetValidDevices(int[] device_arr, int len) | Sets a list of devices that can be used for CUDA. |
static int | cudaStreamCreate(cudaStream_t stream) | Creates an asynchronous stream. |
static int | cudaStreamDestroy(cudaStream_t stream) | Destroys and cleans up an asynchronous stream. |
static int | cudaStreamQuery(cudaStream_t stream) | Queries an asynchronous stream for completion status. |
static int | cudaStreamSynchronize(cudaStream_t stream) | Waits for stream tasks to complete. |
static int | cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, int flags) | Makes a compute stream wait on an event. |
static int | cudaThreadExit() | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaThreadGetCacheConfig(int[] pCacheConfig) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaThreadGetLimit(long[] pValue, int limit) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaThreadSetCacheConfig(int cacheConfig) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaThreadSetLimit(int limit, long value) | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaThreadSynchronize() | Deprecated. This function is deprecated in the latest CUDA version |
static int | cudaUnbindTexture(textureReference texref) | Unbinds a texture. |
static void | initialize() | Initializes the native library. |
static void | setExceptionsEnabled(boolean enabled) | Enables or disables exceptions. |
static void | setLogLevel(LogLevel logLevel) | Sets the specified log level for the JCuda runtime library. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int CUDART_VERSION
public static final int cudaHostAllocDefault
public static final int cudaHostAllocPortable
public static final int cudaHostAllocMapped
public static final int cudaHostAllocWriteCombined
public static final int cudaHostRegisterDefault
public static final int cudaHostRegisterPortable
public static final int cudaHostRegisterMapped
public static final int cudaPeerAccessDefault
public static final int cudaEventDefault
public static final int cudaEventBlockingSync
public static final int cudaEventDisableTiming
public static final int cudaEventInterprocess
public static final int cudaDeviceScheduleAuto
public static final int cudaDeviceScheduleSpin
public static final int cudaDeviceScheduleYield
public static final int cudaDeviceScheduleBlockingSync
public static final int cudaDeviceBlockingSync
public static final int cudaDeviceScheduleMask
public static final int cudaDeviceMapHost
public static final int cudaDeviceLmemResizeToMax
public static final int cudaDeviceMask
public static final int cudaArrayDefault
public static final int cudaArrayLayered
public static final int cudaArraySurfaceLoadStore
public static final int cudaArrayCubemap
public static final int cudaArrayTextureGather
public static final int cudaIpcMemLazyEnablePeerAccess
public static final int cudaSurfaceType1D
public static final int cudaSurfaceType2D
public static final int cudaSurfaceType3D
public static final int cudaSurfaceTypeCubemap
public static final int cudaSurfaceType1DLayered
public static final int cudaSurfaceType2DLayered
public static final int cudaSurfaceTypeCubemapLayered
public static final int cudaTextureType1D
public static final int cudaTextureType2D
public static final int cudaTextureType3D
public static final int cudaTextureTypeCubemap
public static final int cudaTextureType1DLayered
public static final int cudaTextureType2DLayered
public static final int cudaTextureTypeCubemapLayered
Method Detail |
---|
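public static void initialize()
Initializes the native library.
public static void setLogLevel(LogLevel logLevel)
Sets the specified log level for the JCuda runtime library.
Parameters:
logLevel - The log level to use.
public static void setExceptionsEnabled(boolean enabled)
Enables or disables exceptions.
Parameters:
enabled - Whether exceptions are enabled.
public static int cudaGetDeviceCount(int[] count)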
cudaError_t cudaGetDeviceCount(int *count)
Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, cudaGetDeviceCount() returns cudaErrorNoDevice. If no driver can be loaded to determine whether any such devices exist, cudaGetDeviceCount() returns cudaErrorInsufficientDriver.
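In JCuda, a C output parameter such as int *count becomes a one-element int[] whose first element receives the result, and the int return value is the cudaError code (0 for cudaSuccess). Because this sketch has to run without the native library or a GPU, it calls a hypothetical stub, getDeviceCountStub, in place of JCuda.cudaGetDeviceCount; the calling and checking pattern is what matters, not the stub.

```java
/** Out-parameter and error-check pattern; getDeviceCountStub is a hypothetical stand-in. */
final class DeviceCountExample {
    // cudaSuccess is 0 in the CUDA runtime; other error codes are not modeled here
    static final int CUDA_SUCCESS = 0;

    /** Hypothetical stub: behaves like cudaGetDeviceCount by writing into count[0]. */
    static int getDeviceCountStub(int[] count) {
        count[0] = 2; // pretend that two devices were found
        return CUDA_SUCCESS;
    }

    /** Throws if a runtime call returned anything other than cudaSuccess. */
    static void check(int result, String call) {
        if (result != CUDA_SUCCESS) {
            throw new IllegalStateException(call + " failed with error code " + result);
        }
    }

    public static void main(String[] args) {
        int[] count = new int[1]; // the one-element array plays the role of "int *count"
        check(getDeviceCountStub(count), "cudaGetDeviceCount");
        System.out.println("Devices: " + count[0]); // Devices: 2
    }
}
```

With the real library, calling JCuda.setExceptionsEnabled(true) makes JCuda throw a jcuda.CudaException for non-success codes, which can replace a hand-written check like the one above.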
See Also:
cudaGetDevice(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaSetDevice(int device)
cudaError_t cudaSetDevice(int device)
Sets device as the current device for the calling host thread.
Any device memory subsequently allocated from this host thread using cudaMalloc(), cudaMallocPitch() or cudaMallocArray() will be physically resident on device. Any host memory allocated from this host thread using cudaMallocHost(), cudaHostAlloc() or cudaHostRegister() will have its lifetime associated with device. Any streams or events created from this host thread will be associated with device. Any kernels launched from this host thread using the <<<>>> operator or cudaLaunch() will be executed on device.
This call may be made from any host thread, to any device, and at any time. This function will do no synchronization with the previous or new device, and should be considered a very low overhead call.
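The current device is a per-host-thread setting. The following plain-Java model illustrates only that rule, not JCuda's implementation: a ThreadLocal holds each thread's current device and defaults to device 0, which is what the CUDA runtime uses when cudaSetDevice() has not been called. The class CurrentDeviceModel and its methods are hypothetical.

```java
/** Hypothetical model of the per-thread "current device" rule of cudaSetDevice. */
final class CurrentDeviceModel {
    // Each host thread sees its own current device; 0 until set explicitly
    private static final ThreadLocal<Integer> CURRENT = ThreadLocal.withInitial(() -> 0);

    static void setDevice(int device) { // models cudaSetDevice
        CURRENT.set(device);
    }

    static int getDevice() {            // models cudaGetDevice
        return CURRENT.get();
    }

    /** Reads the current device as seen from a freshly started thread. */
    static int deviceSeenByNewThread() {
        int[] seen = new int[1];
        Thread t = new Thread(() -> seen[0] = getDevice());
        t.start();
        try {
            t.join(); // join() also makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        setDevice(1);
        System.out.println(getDevice());             // 1 in this thread
        System.out.println(deviceSeenByNewThread()); // 0: other threads are unaffected
    }
}
```

In practice this is why a worker thread that should run on a particular device must call cudaSetDevice(int) itself.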
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaSetDeviceFlags(int flags)
cudaError_t cudaSetDeviceFlags(unsigned int flags)
Records flags as the flags to use when initializing the current device. If no device has been made current to the calling thread, then flags will be applied to the initialization of any device initialized by the calling host thread, unless that device has had its initialization flags set explicitly by this or any host thread.
If the current device has been set and that device has already been initialized, then this call will fail with the error cudaErrorSetOnActiveProcess. In this case it is necessary to reset the device using cudaDeviceReset() before its initialization flags may be set.
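The two paragraphs above describe a small state machine: flags can be recorded only while the device is uninitialized, and an initialized device has to be reset first. This plain-Java model encodes just that documented rule; the class DeviceFlagsModel, its methods and the Result enum are hypothetical names, not JCuda API, and the flag values used are arbitrary.

```java
/** Hypothetical model of when cudaSetDeviceFlags succeeds or fails. */
final class DeviceFlagsModel {
    enum Result { SUCCESS, ERROR_SET_ON_ACTIVE_PROCESS }

    private boolean initialized;
    private int flags; // recorded flags (their meaning is not modeled)

    /** Models cudaSetDeviceFlags: fails once the device has been initialized. */
    Result setDeviceFlags(int newFlags) {
        if (initialized) {
            return Result.ERROR_SET_ON_ACTIVE_PROCESS;
        }
        flags = newFlags;
        return Result.SUCCESS;
    }

    /** Models the device becoming initialized through earlier runtime activity. */
    void initialize() {
        initialized = true;
    }

    /** Models cudaDeviceReset: the device is no longer initialized. */
    void reset() {
        initialized = false;
    }

    public static void main(String[] args) {
        DeviceFlagsModel dev = new DeviceFlagsModel();
        System.out.println(dev.setDeviceFlags(0x1)); // SUCCESS (arbitrary flag value)
        dev.initialize();
        System.out.println(dev.setDeviceFlags(0x2)); // ERROR_SET_ON_ACTIVE_PROCESS
        dev.reset();
        System.out.println(dev.setDeviceFlags(0x2)); // SUCCESS after the reset
    }
}
```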
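The two LSBs of the flags parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.
cudaDeviceScheduleAuto: The default value if the flags parameter is zero. Uses a heuristic based on the number of active CUDA contexts in the process, C, and the number of logical processors in the system, P. If C > P, then CUDA will yield to other OS threads when waiting for the device; otherwise CUDA will not yield while waiting for results and will actively spin on the processor.
The remaining scheduling flags (cudaDeviceScheduleSpin, cudaDeviceScheduleYield, cudaDeviceScheduleBlockingSync) and the cudaDeviceMapHost and cudaDeviceLmemResizeToMax flags are listed in the Field Summary.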
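The cudaDeviceScheduleAuto heuristic can be written down directly. In this sketch the number of active contexts C and logical processors P are inputs supplied by the caller, because the runtime's internal bookkeeping is not visible from Java; the class AutoScheduleHeuristic and its method are hypothetical. Runtime.getRuntime().availableProcessors() serves as a stand-in for P.

```java
/** Hypothetical restatement of the documented cudaDeviceScheduleAuto heuristic. */
final class AutoScheduleHeuristic {
    enum WaitStrategy { YIELD, SPIN }

    /** Yield when active contexts C exceed logical processors P, otherwise spin. */
    static WaitStrategy choose(int activeContexts, int logicalProcessors) {
        return activeContexts > logicalProcessors ? WaitStrategy.YIELD : WaitStrategy.SPIN;
    }

    public static void main(String[] args) {
        int p = Runtime.getRuntime().availableProcessors(); // always at least 1
        System.out.println(choose(1, p));     // SPIN: one context, at least one CPU
        System.out.println(choose(p + 1, p)); // YIELD: more contexts than CPUs
    }
}
```

Note that C == P still spins: the rule only switches to yielding when C is strictly greater than P.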
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDevice(int), cudaSetValidDevices(int[], int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaSetValidDevices(int[] device_arr, int len)
cudaError_t cudaSetValidDevices(int *device_arr, int len)
Sets a list of devices for CUDA execution in priority order using device_arr. The parameter len specifies the number of elements in the list. CUDA will try devices from the list sequentially until it finds one that works. If this function is not called, or if it is called with a len of 0, then CUDA will go back to its default behavior of trying devices sequentially from a default list containing all of the available CUDA devices in the system. If a specified device ID in the list does not exist, this function will return cudaErrorInvalidDevice. If len is not 0 and device_arr is NULL, or if len exceeds the number of devices in the system, then cudaErrorInvalidValue is returned.
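The argument rules above can be summarized as a validation function. This is a hypothetical sketch of the documented checks, not the runtime's code: the class ValidDevicesCheck is illustrative, the order of the checks (array and length first, then individual IDs) is an assumption, and the device count is passed in rather than queried.

```java
/** Hypothetical restatement of the documented argument rules of cudaSetValidDevices. */
final class ValidDevicesCheck {
    enum Result { SUCCESS, ERROR_INVALID_VALUE, ERROR_INVALID_DEVICE }

    static Result validate(int[] deviceArr, int len, int deviceCount) {
        if (len == 0) {
            // Documented: fall back to the default list of all available devices
            return Result.SUCCESS;
        }
        if (deviceArr == null || len > deviceCount) {
            return Result.ERROR_INVALID_VALUE;
        }
        if (len < 0 || len > deviceArr.length) {
            // Java-side guard only; not one of the documented CUDA rules
            return Result.ERROR_INVALID_VALUE;
        }
        for (int i = 0; i < len; i++) {
            if (deviceArr[i] < 0 || deviceArr[i] >= deviceCount) {
                return Result.ERROR_INVALID_DEVICE; // this device ID does not exist
            }
        }
        return Result.SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(validate(new int[] {1, 0}, 2, 2)); // SUCCESS
        System.out.println(validate(new int[] {3}, 1, 2));    // ERROR_INVALID_DEVICE
        System.out.println(validate(null, 1, 2));             // ERROR_INVALID_VALUE
        System.out.println(validate(null, 0, 2));             // SUCCESS (default behavior)
    }
}
```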
See Also:
cudaGetDeviceCount(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDeviceFlags(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaGetDevice(int[] device)
cudaError_t cudaGetDevice(int *device)
Returns in *device the current device for the calling host thread.
See Also:
cudaGetDeviceCount(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaGetDeviceProperties(cudaDeviceProp prop, int device)
cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp *prop, int device)
Returns in *prop the properties of the device with index device.
The cudaDeviceProp structure is defined as:
    struct cudaDeviceProp {
        char name[256];
        size_t totalGlobalMem;
        size_t sharedMemPerBlock;
        int regsPerBlock;
        int warpSize;
        size_t memPitch;
        int maxThreadsPerBlock;
        int maxThreadsDim[3];
        int maxGridSize[3];
        int clockRate;
        size_t totalConstMem;
        int major;
        int minor;
        size_t textureAlignment;
        size_t texturePitchAlignment;
        int deviceOverlap;
        int multiProcessorCount;
        int kernelExecTimeoutEnabled;
        int integrated;
        int canMapHostMemory;
        int computeMode;
        int maxTexture1D;
        int maxTexture1DLinear;
        int maxTexture2D[2];
        int maxTexture2DLinear[3];
        int maxTexture2DGather[2];
        int maxTexture3D[3];
        int maxTextureCubemap;
        int maxTexture1DLayered[2];
        int maxTexture2DLayered[3];
        int maxTextureCubemapLayered[2];
        int maxSurface1D;
        int maxSurface2D[2];
        int maxSurface3D[3];
        int maxSurface1DLayered[2];
        int maxSurface2DLayered[3];
        int maxSurfaceCubemap;
        int maxSurfaceCubemapLayered[2];
        size_t surfaceAlignment;
        int concurrentKernels;
        int ECCEnabled;
        int pciBusID;
        int pciDeviceID;
        int pciDomainID;
        int tccDriver;
        int asyncEngineCount;
        int unifiedAddressing;
        int memoryClockRate;
        int memoryBusWidth;
        int l2CacheSize;
        int maxThreadsPerMultiProcessor;
    };
If cudaSetDevice() is called on an already occupied device with computeMode cudaComputeModeExclusive, cudaErrorDeviceAlreadyInUse will be returned immediately, indicating that the device cannot be used. When an occupied exclusive-mode device is chosen with cudaSetDevice(), all subsequent non-device-management runtime functions will return cudaErrorDevicesUnavailable.
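Several cudaDeviceProp fields bound a kernel launch configuration: maxThreadsPerBlock limits the total threads in a block, while maxThreadsDim and maxGridSize limit each dimension. The sketch below checks a block and grid size against those three fields. DeviceLimits is a hypothetical stand-in for the JCuda cudaDeviceProp object (which you would fill with cudaGetDeviceProperties(prop, device)), and the limit values in main are made up.

```java
/** Checks a launch configuration against a few cudaDeviceProp limits (hypothetical helper). */
final class LaunchConfigCheck {
    /** Hypothetical subset of cudaDeviceProp relevant to launch limits. */
    static final class DeviceLimits {
        final int maxThreadsPerBlock;
        final int[] maxThreadsDim; // per-dimension block limits (x, y, z)
        final int[] maxGridSize;   // per-dimension grid limits (x, y, z)

        DeviceLimits(int maxThreadsPerBlock, int[] maxThreadsDim, int[] maxGridSize) {
            this.maxThreadsPerBlock = maxThreadsPerBlock;
            this.maxThreadsDim = maxThreadsDim;
            this.maxGridSize = maxGridSize;
        }
    }

    /** True if every block and grid dimension, and the block's total size, fit the limits. */
    static boolean fits(DeviceLimits d, int[] block, int[] grid) {
        long threadsPerBlock = 1;
        for (int i = 0; i < 3; i++) {
            if (block[i] < 1 || block[i] > d.maxThreadsDim[i]) return false;
            if (grid[i] < 1 || grid[i] > d.maxGridSize[i]) return false;
            threadsPerBlock *= block[i];
        }
        return threadsPerBlock <= d.maxThreadsPerBlock;
    }

    public static void main(String[] args) {
        // Made-up limits for illustration only
        DeviceLimits d = new DeviceLimits(1024, new int[] {1024, 1024, 64},
                new int[] {65535, 65535, 65535});
        System.out.println(fits(d, new int[] {256, 1, 1}, new int[] {1000, 1, 1})); // true
        System.out.println(fits(d, new int[] {32, 32, 2}, new int[] {1, 1, 1}));    // false: 2048 threads
    }
}
```

The second call fails even though every individual dimension is within maxThreadsDim, because the product of the block dimensions exceeds maxThreadsPerBlock.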
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaChooseDevice(int[] device, cudaDeviceProp prop)
cudaError_t cudaChooseDevice(int *device, const struct cudaDeviceProp *prop)
Returns in *device the device which has properties that best match *prop.
See Also:
cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int)
public static int cudaMalloc3D(cudaPitchedPtr pitchDevPtr, cudaExtent extent)
cudaError_t cudaMalloc3D(struct cudaPitchedPtr *pitchedDevPtr, struct cudaExtent extent)
Allocates at least width * height * depth bytes of linear memory on the device, where width, height and depth are the fields of extent (for linear memory, width is given in bytes), and returns a cudaPitchedPtr in which ptr is a pointer to the allocated memory. The function may pad the allocation to ensure that hardware alignment requirements are met. The pitch returned in the pitch field of pitchedDevPtr is the width in bytes of the allocation, including any padding.
The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation.
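With a pitched allocation, element (x, y, z) is reached by stepping whole rows of pitch bytes and whole slices of pitch * ysize bytes, so padding never has to be computed by hand. This sketch does the arithmetic in plain Java; the pitch is whatever cudaMalloc3D wrote into the pitch field of the cudaPitchedPtr, the class PitchedAddressing is hypothetical, and the 512-byte pitch in main is made up.

```java
/** Byte-offset arithmetic for a pitched 3D allocation (hypothetical helper). */
final class PitchedAddressing {
    /**
     * Byte offset of element (x, y, z).
     * pitch: row length in bytes including padding (cudaPitchedPtr.pitch)
     * ysize: logical height in rows (cudaPitchedPtr.ysize)
     * elementSize: size of one element in bytes
     */
    static long byteOffset(long x, long y, long z, long pitch, long ysize, long elementSize) {
        long slicePitch = pitch * ysize; // bytes per 2D slice
        return z * slicePitch + y * pitch + x * elementSize;
    }

    /** Bytes spanned by the pitched layout: rows are pitch bytes wide, including padding. */
    static long totalBytes(long pitch, long ysize, long depth) {
        return pitch * ysize * depth;
    }

    public static void main(String[] args) {
        // 100 floats per row (400 bytes) padded to a made-up pitch of 512 bytes
        long pitch = 512, ysize = 64, depth = 8;
        System.out.println(byteOffset(3, 2, 1, pitch, ysize, 4)); // 1*512*64 + 2*512 + 3*4 = 33804
        System.out.println(totalBytes(pitch, ysize, depth));      // 262144
    }
}
```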
For allocations of 2D and 3D objects, it is highly recommended that programmers perform allocations using cudaMalloc3D() or cudaMallocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing memory copies involving 2D or 3D objects (whether linear memory or CUDA arrays).
See Also:
cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMemcpy3D(jcuda.runtime.cudaMemcpy3DParms), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer), cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent)
Calls cudaMalloc3DArray with the default value '0' as the last parameter.
See Also:
cudaMalloc3DArray(cudaArray, cudaChannelFormatDesc, cudaExtent, int)
public static int cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent, int flags)
cudaError_t cudaMalloc3DArray(struct cudaArray **array, const struct cudaChannelFormatDesc *desc, struct cudaExtent extent, unsigned int flags = 0)
Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.
The cudaChannelFormatDesc is defined as:
    struct cudaChannelFormatDesc {
        int x, y, z, w;
        enum cudaChannelFormatKind f;
    };
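The x, y, z and w members give the number of bits in each component, so the size of one array element follows directly from them; it is the factor needed to turn a width in elements into a width in bytes. The helper below is a hypothetical plain-Java sketch that takes the four bit counts as arguments; it is not part of JCuda.

```java
/** Element size implied by a channel format descriptor (hypothetical helper). */
final class ChannelElementSize {
    /** Bytes per element for components of x, y, z and w bits. */
    static int elementSizeBytes(int x, int y, int z, int w) {
        int bits = x + y + z + w;
        if (bits % 8 != 0) {
            throw new IllegalArgumentException("component bits do not add up to whole bytes: " + bits);
        }
        return bits / 8;
    }

    public static void main(String[] args) {
        System.out.println(elementSizeBytes(32, 0, 0, 0));  // one 32-bit channel: 4
        System.out.println(elementSizeBytes(8, 8, 8, 8));   // four 8-bit channels: 4
        System.out.println(elementSizeBytes(32, 32, 0, 0)); // two 32-bit channels: 8
    }
}
```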
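cudaMalloc3DArray() can allocate 1D, 2D, and 3D arrays, 1D and 2D layered arrays, cubemap arrays, and cubemap layered arrays. The valid extents for each type are listed in the table below.
The flags parameter enables different options to be specified that affect the allocation:
cudaArrayDefault: default CUDA array allocation flag.
cudaArrayLayered: must be set to create a layered CUDA array.
cudaArraySurfaceLoadStore: must be set in order to bind surfaces to the CUDA array.
cudaArrayCubemap: must be set to create a cubemap CUDA array.
cudaArrayTextureGather: must be set in order to perform texture gather operations on the CUDA array.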
The width, height and depth extents must meet certain size requirements as listed in the following table. All values are specified in elements.
Note that 2D CUDA arrays have different size requirements if the cudaArrayTextureGather flag is set. In that case, the valid range for (width, height, depth) is ((1,maxTexture2DGather[0]), (1,maxTexture2DGather[1]), 0).
CUDA array type | Valid extents that must always be met {(width range in elements), (height range), (depth range)} | Valid extents with cudaArraySurfaceLoadStore set {(width range in elements), (height range), (depth range)} |
1D | { (1,maxTexture1D), 0, 0 } | { (1,maxSurface1D), 0, 0 } |
2D | { (1,maxTexture2D[0]), (1,maxTexture2D[1]), 0 } | { (1,maxSurface2D[0]), (1,maxSurface2D[1]), 0 } |
3D | { (1,maxTexture3D[0]), (1,maxTexture3D[1]), (1,maxTexture3D[2]) } | { (1,maxSurface3D[0]), (1,maxSurface3D[1]), (1,maxSurface3D[2]) } |
1D Layered | { (1,maxTexture1DLayered[0]), 0, (1,maxTexture1DLayered[1]) } | { (1,maxSurface1DLayered[0]), 0, (1,maxSurface1DLayered[1]) } |
2D Layered | { (1,maxTexture2DLayered[0]), (1,maxTexture2DLayered[1]), (1,maxTexture2DLayered[2]) } | { (1,maxSurface2DLayered[0]), (1,maxSurface2DLayered[1]), (1,maxSurface2DLayered[2]) } |
Cubemap | { (1,maxTextureCubemap), (1,maxTextureCubemap), 6 } | { (1,maxSurfaceCubemap), (1,maxSurfaceCubemap), 6 } |
Cubemap Layered | { (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[1]) } | { (1,maxSurfaceCubemapLayered[0]), (1,maxSurfaceCubemapLayered[0]), (1,maxSurfaceCubemapLayered[1]) } |
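The table can be checked on the host before calling cudaMalloc3DArray(). The following is a minimal sketch under stated assumptions and is not part of JCuda. The class ArrayExtentCheck and its methods are hypothetical, and the caller is assumed to fill the limit arrays from the device properties named in the table (for example maxTexture3D). The sketch encodes three rows of the "must always be met" column. A range (1,max) means 1 <= value <= max, and a 0 entry means that extent must be exactly zero.

```java
/**
 * Hypothetical host-side check of cudaMalloc3DArray extents against rows of
 * the "must always be met" column. Limits are supplied by the caller, e.g.
 * from the device properties maxTexture3D, maxTexture1DLayered and
 * maxTextureCubemap. This is not a JCuda API.
 */
final class ArrayExtentCheck {
    private ArrayExtentCheck() {}

    /** A table range (1,max): the value must satisfy 1 <= v <= max. */
    static boolean inRange(long v, long max) {
        return v >= 1 && v <= max;
    }

    /** 3D row: { (1,max[0]), (1,max[1]), (1,max[2]) } */
    static boolean valid3D(long width, long height, long depth, long[] maxTexture3D) {
        return inRange(width, maxTexture3D[0])
            && inRange(height, maxTexture3D[1])
            && inRange(depth, maxTexture3D[2]);
    }

    /** 1D Layered row: { (1,max[0]), 0, (1,max[1]) }; the height must be zero. */
    static boolean valid1DLayered(long width, long height, long depth, long[] maxTexture1DLayered) {
        return inRange(width, maxTexture1DLayered[0])
            && height == 0
            && inRange(depth, maxTexture1DLayered[1]);
    }

    /** Cubemap row: { (1,max), (1,max), 6 }; the depth must be exactly 6. */
    static boolean validCubemap(long width, long height, long depth, long maxTextureCubemap) {
        return inRange(width, maxTextureCubemap)
            && inRange(height, maxTextureCubemap)
            && depth == 6;
    }
}
```

For example, valid3D(64, 64, 64, new long[]{2048, 2048, 2048}) returns true. A zero height fails the 3D row, because every range in that row starts at 1. The limit values used here are illustrative and were not queried from a real device.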
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMemset3D(cudaPitchedPtr pitchDevPtr, int value, cudaExtent extent)
cudaError_t cudaMemset3D(struct cudaPitchedPtr pitchedDevPtr, int value, struct cudaExtent extent)
Initializes each element of a 3D array to the specified value value. The object to initialize is defined by pitchedDevPtr. The pitch field of pitchedDevPtr is the width in memory in bytes of the 3D array pointed to by pitchedDevPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows.
The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices.
Extents with width greater than or equal to the xsize of pitchedDevPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of pitchedDevPtr will perform faster than when the height is shorter than the ysize.
This function performs fastest when the pitchedDevPtr has been allocated by cudaMalloc3D().
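The following sketch makes the pitch, xsize and ysize terminology concrete. It models on the host how a cudaMemset3D() region is laid out in a pitched allocation, and it is not a JCuda call. The class PitchedMemset3D is hypothetical, and a Java byte[] stands in for device memory. Slices are assumed to be pitch * ysize bytes apart, which follows from the description of pitch (bytes per row, including padding) and ysize (rows per slice) above. Bytes outside the width, height and depth of the extent, including row padding, are left untouched.

```java
import java.util.Arrays;

/** Hypothetical host-side model (not JCuda) of how cudaMemset3D walks a pitched 3D region. */
final class PitchedMemset3D {
    private PitchedMemset3D() {}

    /**
     * @param buf    stand-in for the pitched allocation
     * @param pitch  bytes per row, including padding
     * @param ysize  rows per 2D slice
     * @param value  only the low 8 bits are used, as with a byte-wise memset
     * @param width  extent width in bytes
     * @param height extent height in rows
     * @param depth  extent depth in slices
     */
    static void memset3D(byte[] buf, int pitch, int ysize, int value,
                         int width, int height, int depth) {
        if (width > pitch || height > ysize) {
            throw new IllegalArgumentException("extent exceeds pitched row or slice size");
        }
        long slicePitch = (long) pitch * ysize; // bytes from one 2D slice to the next
        for (int z = 0; z < depth; z++) {
            for (int y = 0; y < height; y++) {
                int rowStart = (int) (z * slicePitch + (long) y * pitch);
                // Fill only the first `width` bytes of the row; padding stays as it was.
                Arrays.fill(buf, rowStart, rowStart + width, (byte) value);
            }
        }
    }
}
```

With pitch 8, ysize 3 and an extent of 5 x 2 x 2, exactly 20 bytes are written. Bytes 5 to 7 of each row (padding) and the third row of each slice remain zero.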
See Also:
cudaMemset(jcuda.Pointer, int, long),
cudaMemset2D(jcuda.Pointer, long, int, long, long),
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t),
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent)
public static int cudaMemsetAsync(Pointer devPtr, int value, long count, cudaStream_t stream)
cudaError_t cudaMemsetAsync(void *devPtr, int value, size_t count, cudaStream_t stream = 0)
Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
cudaMemsetAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
See Also:
cudaMemset(jcuda.Pointer, int, long),
cudaMemset2D(jcuda.Pointer, long, int, long, long),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t),
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
public static int cudaMemset2DAsync(Pointer devPtr, long pitch, int value, long width, long height, cudaStream_t stream)
cudaError_t cudaMemset2DAsync(void *devPtr, size_t pitch, int value, size_t width, size_t height, cudaStream_t stream = 0)
Sets to the specified value value a matrix (height rows of width bytes each) pointed to by devPtr. pitch is the width in bytes of the 2D array pointed to by devPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().
cudaMemset2DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
See Also:
cudaMemset(jcuda.Pointer, int, long),
cudaMemset2D(jcuda.Pointer, long, int, long, long),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
public static int cudaMemset3DAsync(cudaPitchedPtr pitchedDevPtr, int value, cudaExtent extent, cudaStream_t stream)
cudaError_t cudaMemset3DAsync(struct cudaPitchedPtr pitchedDevPtr, int value, struct cudaExtent extent, cudaStream_t stream = 0)
Initializes each element of a 3D array to the specified value value. The object to initialize is defined by pitchedDevPtr. The pitch field of pitchedDevPtr is the width in memory in bytes of the 3D array pointed to by pitchedDevPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows.
The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices.
Extents with width greater than or equal to the xsize of pitchedDevPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of pitchedDevPtr will perform faster than when the height is shorter than the ysize.
This function performs fastest when the pitchedDevPtr has been allocated by cudaMalloc3D().
cudaMemset3DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
See Also:
cudaMemset(jcuda.Pointer, int, long),
cudaMemset2D(jcuda.Pointer, long, int, long, long),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent)
public static int cudaMemcpy3D(cudaMemcpy3DParms p)
cudaError_t cudaMemcpy3D(const struct cudaMemcpy3DParms *p)
struct cudaExtent { size_t width; size_t height; size_t depth; };
struct cudaExtent make_cudaExtent(size_t w, size_t h, size_t d);
struct cudaPos { size_t x; size_t y; size_t z; };
struct cudaPos make_cudaPos(size_t x, size_t y, size_t z);
struct cudaMemcpy3DParms {
    struct cudaArray *srcArray;
    struct cudaPos srcPos;
    struct cudaPitchedPtr srcPtr;
    struct cudaArray *dstArray;
    struct cudaPos dstPos;
    struct cudaPitchedPtr dstPtr;
    struct cudaExtent extent;
    enum cudaMemcpyKind kind;
};
cudaMemcpy3D() copies data between two 3D objects. The source and destination objects may be in host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy performed are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:
cudaMemcpy3DParms myParms = {0};
The struct passed to cudaMemcpy3D() must specify one of srcArray or srcPtr and one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3D() to return an error.
The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object's elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.
The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array's elements. If no CUDA array is participating in the copy, then the extents are defined in elements of unsigned char.
The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.
If the source and destination are both arrays, cudaMemcpy3D() will return an error if they do not have the same element size.
The source and destination objects may not overlap. If overlapping source and destination objects are specified, undefined behavior will result.
The region defined by srcPos and extent must lie entirely within the source object. The region defined by dstPos and extent must lie entirely within the destination object.
cudaMemcpy3D() returns an error if the pitch of srcPtr or dstPtr exceeds the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.
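When no CUDA array participates in the copy, positions and extents are counted in unsigned char elements, that is, in bytes. The sketch below models such a pitched-to-pitched cudaMemcpy3D() on the host. Its names (PitchedCopy3D, copy3D) are hypothetical, and Java byte[] buffers stand in for device or host pointers. It first checks that the region given by each position and the shared extent lies inside its object, bounding each row by the pitch for simplicity. It then copies one row of width bytes at a time.

```java
/**
 * Hypothetical host-side model of cudaMemcpy3D between two pitched byte
 * buffers (no CUDA arrays, so every element is one byte). Not a JCuda API.
 */
final class PitchedCopy3D {
    private PitchedCopy3D() {}

    /** Byte offset of (x, y, z) in a buffer with the given pitch and rows per slice. */
    static long offset(int pitch, int ysize, int x, int y, int z) {
        return (long) z * pitch * ysize + (long) y * pitch + x;
    }

    /** Throws if the region starting at pos with the given extent does not fit in the buffer. */
    static void checkRegion(byte[] buf, int pitch, int ysize, int[] pos,
                            int width, int height, int depth) {
        boolean ok = pos[0] >= 0 && pos[1] >= 0 && pos[2] >= 0
            && pos[0] + width <= pitch
            && pos[1] + height <= ysize
            && (long) (pos[2] + depth) * pitch * ysize <= buf.length;
        if (!ok) {
            throw new IllegalArgumentException("region lies outside the object");
        }
    }

    static void copy3D(byte[] src, int srcPitch, int srcYsize, int[] srcPos,
                       byte[] dst, int dstPitch, int dstYsize, int[] dstPos,
                       int width, int height, int depth) {
        checkRegion(src, srcPitch, srcYsize, srcPos, width, height, depth);
        checkRegion(dst, dstPitch, dstYsize, dstPos, width, height, depth);
        for (int z = 0; z < depth; z++) {
            for (int y = 0; y < height; y++) {
                long s = offset(srcPitch, srcYsize, srcPos[0], srcPos[1] + y, srcPos[2] + z);
                long d = offset(dstPitch, dstYsize, dstPos[0], dstPos[1] + y, dstPos[2] + z);
                // Copy one row of `width` bytes; bytes beyond it are not touched.
                System.arraycopy(src, (int) s, dst, (int) d, width);
            }
        }
    }
}
```

Because the two buffers can have different pitches, each row's source and destination offsets are computed separately. This per-row addressing lets a 3D copy move a sub-box between differently padded allocations.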
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemcpy3DAsync(jcuda.runtime.cudaMemcpy3DParms, jcuda.runtime.cudaStream_t),
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy3DPeer(cudaMemcpy3DPeerParms p)
cudaError_t cudaMemcpy3DPeer(const struct cudaMemcpy3DPeerParms *p)
Performs a 3D memory copy according to the parameters specified in p. See the definition of the cudaMemcpy3DPeerParms structure for documentation of its parameters.
Note that this function is synchronous with respect to the host only if the source or destination of the transfer is host memory. Note also that this copy is serialized with respect to all pending and future asynchronous work on the current device, the copy's source device, and the copy's destination device (use cudaMemcpy3DPeerAsync to avoid this synchronization).
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyPeerAsync(jcuda.Pointer, int, jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy3DAsync(cudaMemcpy3DParms p, cudaStream_t stream)
cudaError_t cudaMemcpy3DAsync(const struct cudaMemcpy3DParms *p, cudaStream_t stream = 0)
struct cudaExtent { size_t width; size_t height; size_t depth; };
struct cudaExtent make_cudaExtent(size_t w, size_t h, size_t d);
struct cudaPos { size_t x; size_t y; size_t z; };
struct cudaPos make_cudaPos(size_t x, size_t y, size_t z);
struct cudaMemcpy3DParms {
    struct cudaArray *srcArray;
    struct cudaPos srcPos;
    struct cudaPitchedPtr srcPtr;
    struct cudaArray *dstArray;
    struct cudaPos dstPos;
    struct cudaPitchedPtr dstPtr;
    struct cudaExtent extent;
    enum cudaMemcpyKind kind;
};
cudaMemcpy3DAsync() copies data between two 3D objects. The source and destination objects may be in host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy performed are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:
cudaMemcpy3DParms myParms = {0};
The struct passed to cudaMemcpy3DAsync() must specify one of srcArray or srcPtr and one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3DAsync() to return an error.
The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object's elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.
The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array's elements. If no CUDA array is participating in the copy, then the extents are defined in elements of unsigned char.
The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.
If the source and destination are both arrays, cudaMemcpy3DAsync() will return an error if they do not have the same element size.
The source and destination objects may not overlap. If overlapping source and destination objects are specified, undefined behavior will result.
The region defined by srcPos and extent must lie entirely within the source object. The region defined by dstPos and extent must lie entirely within the destination object.
cudaMemcpy3DAsync() returns an error if the pitch of srcPtr or dstPtr exceeds the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.
cudaMemcpy3DAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemcpy3D(jcuda.runtime.cudaMemcpy3DParms),
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy3DPeerAsync(cudaMemcpy3DPeerParms p, cudaStream_t stream)
cudaError_t cudaMemcpy3DPeerAsync(const struct cudaMemcpy3DPeerParms *p, cudaStream_t stream = 0)
Performs a 3D memory copy according to the parameters specified in p. See the definition of the cudaMemcpy3DPeerParms structure for documentation of its parameters.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyPeerAsync(jcuda.Pointer, int, jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemGetInfo(long[] free, long[] total)
Parameters:
free - Returned free memory in bytes
total - Returned total memory in bytes
public static int cudaArrayGetInfo(cudaChannelFormatDesc desc, cudaExtent extent, int[] flags, cudaArray array)
Parameters:
desc - Returned array type
extent - Returned array shape. 2D arrays will have a depth of zero
flags - Returned array flags
array - The cudaArray to get info for
public static int cudaHostAlloc(Pointer ptr, long size, int flags)
cudaError_t cudaHostAlloc(void **pHost, size_t size, unsigned int flags)
Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
The flags parameter enables different options to be specified that affect the allocation. Flags include cudaHostAllocDefault, cudaHostAllocPortable, cudaHostAllocMapped, and cudaHostAllocWriteCombined.
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.
The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.
Memory allocated by this function must be freed with cudaFreeHost().
See Also:
cudaSetDeviceFlags(int),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer)
public static int cudaHostRegister(Pointer ptr, long size, int flags)
cudaError_t cudaHostRegister(void *ptr, size_t size, unsigned int flags)
Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range is also added to the same tracking mechanism as cudaHostAlloc() to automatically accelerate calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory that has not been registered. Page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to register staging areas for data exchange between host and device.
The flags parameter enables different options to be specified that affect the registration. Flags include cudaHostRegisterPortable and cudaHostRegisterMapped.
All of these flags are orthogonal to one another: a developer may page-lock memory that is portable or mapped with no restrictions.
The CUDA context must have been created with the cudaDeviceMapHost flag (see cudaSetDeviceFlags()) in order for the cudaHostRegisterMapped flag to have any effect.
The cudaHostRegisterMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostRegisterPortable flag.
The pointer ptr and the size size must be aligned to the host page size (4 KB).
The memory page-locked by this function must be unregistered with cudaHostUnregister().
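Because ptr and size must both be page-aligned, a caller that wants to register an arbitrary byte range has to widen it to whole pages. The sketch below shows this arithmetic under stated assumptions. The class PageAlign is hypothetical, and addresses are treated as plain long values. The page size is passed in rather than queried: 4096 matches the 4 KB figure above, but it is an assumption and may differ on other systems.

```java
/** Hypothetical helper: widen [address, address + size) to whole host pages. */
final class PageAlign {
    private PageAlign() {}

    /** Rounds down to a multiple of pageSize (pageSize must be a power of two). */
    static long alignDown(long value, long pageSize) {
        return value & -pageSize;
    }

    /** Rounds up to a multiple of pageSize (pageSize must be a power of two). */
    static long alignUp(long value, long pageSize) {
        return (value + pageSize - 1) & -pageSize;
    }

    /** Returns {alignedBase, alignedSize}, the smallest page-aligned range covering the request. */
    static long[] pageRange(long address, long size, long pageSize) {
        long base = alignDown(address, pageSize);
        long end = alignUp(address + size, pageSize);
        return new long[] { base, end - base };
    }
}
```

For example, registering 100 bytes starting at address 5000 with 4096-byte pages widens the request to base 4096 and size 4096. A range that crosses a page boundary, such as 200 bytes at 4000, becomes base 0 and size 8192. The same aligned base address must later be passed to cudaHostUnregister().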
See Also:
cudaHostUnregister(jcuda.Pointer),
cudaHostGetDevicePointer(jcuda.Pointer, jcuda.Pointer, int)
public static int cudaHostUnregister(Pointer ptr)
cudaError_t cudaHostUnregister(void *ptr)
Unmaps the memory range whose base address is specified by ptr, and makes it pageable again.
The base address must be the same one specified to cudaHostRegister().
See Also: cudaHostRegister(jcuda.Pointer, long, int)
public static int cudaHostGetDevicePointer(Pointer pDevice, Pointer pHost, int flags)
cudaError_t cudaHostGetDevicePointer(void **pDevice, void *pHost, unsigned int flags)
Passes back the device pointer corresponding to the mapped, pinned host buffer allocated by cudaHostAlloc() or registered by cudaHostRegister().
cudaHostGetDevicePointer() will fail if the cudaDeviceMapHost flag was not specified before deferred context creation occurred, or if called on a device that does not support mapped, pinned memory.
The flags parameter is reserved for future releases. For now, it must be set to 0.
See Also:
cudaSetDeviceFlags(int),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMalloc(Pointer devPtr, long size)
cudaError_t cudaMalloc(void **devPtr, size_t size)
Allocates size bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. cudaMalloc() returns cudaErrorMemoryAllocation in case of failure.
See Also:
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocHost(Pointer ptr, long size)
cudaError_t cudaMallocHost(void **ptr, size_t size, unsigned int flags)
Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
The flags parameter of the C++ overload shown above enables different options to be specified that affect the allocation. The Java method cudaMallocHost(Pointer, long) takes no flags argument.
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.
The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.
Memory allocated by this function must be freed with cudaFreeHost().
See Also:
cudaSetDeviceFlags(int),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocPitch(Pointer devPtr, long[] pitch, long width, long height)
cudaError_t cudaMallocPitch(void **devPtr, size_t *pitch, size_t width, size_t height)
Allocates at least width (in bytes) * height bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. The pitch returned in *pitch by cudaMallocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:
T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cudaMallocPitch(). Due to pitch alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).
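The pointer arithmetic in the formula above reduces to a byte offset of Row * pitch + Column * sizeof(T) from the base address. Java has no pointer casts, so a caller working with JCuda has to compute this offset explicitly. The following is a minimal sketch: the class name is hypothetical, and the element size is passed in explicitly (for example 4 for float).

```java
/** Hypothetical helper for addressing elements of a pitched 2D allocation. */
final class PitchAddress {
    private PitchAddress() {}

    /**
     * Byte offset of element (row, column) from the base address, matching
     * (T*)((char*)BaseAddress + Row * pitch) + Column with sizeof(T) == elementSize.
     */
    static long byteOffset(long pitch, long row, long column, long elementSize) {
        return row * pitch + column * elementSize;
    }
}
```

For a float matrix with a pitch of 512 bytes, element (3, 10) lies 3 * 512 + 10 * 4 = 1576 bytes from the base. The pitch of 512 is only an illustrative value. The actual pitch is whatever cudaMallocPitch() returns.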
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height)
See Also: cudaMallocArray(cudaArray, cudaChannelFormatDesc, long, long, int)
public static int cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height, int flags)
cudaError_t cudaMallocArray(struct cudaArray **array, const struct cudaChannelFormatDesc *desc, size_t width, size_t height = 0, unsigned int flags = 0)
Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.
The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
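The flags parameter enables different options to be specified that affect the allocation:
cudaArrayDefault: default CUDA array allocation flag.
cudaArraySurfaceLoadStore: must be set in order to bind surfaces to the CUDA array.
cudaArrayTextureGather: must be set in order to perform texture gather operations on the CUDA array.
The width and height must meet certain size requirements. See cudaMalloc3DArray() for more details.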
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFree(Pointer devPtr)
cudaError_t cudaFree(void *devPtr)
Frees the memory space pointed to by devPtr, which must have been returned by a previous call to cudaMalloc() or cudaMallocPitch(). Otherwise, or if cudaFree(devPtr) has already been called before, an error is returned. If devPtr is 0, no operation is performed. cudaFree() returns cudaErrorInvalidDevicePointer in case of failure.
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFreeHost(Pointer ptr)
cudaError_t cudaFreeHost(void *ptr)
Frees the memory space pointed to by ptr, which must have been returned by a previous call to cudaMallocHost() or cudaHostAlloc().
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFreeArray(cudaArray array)
cudaError_t cudaFreeArray(struct cudaArray *array)
Frees the CUDA array array, which must have been returned by a previous call to cudaMallocArray(). If cudaFreeArray(array) has already been called before, cudaErrorInvalidValue is returned. If array is 0, no operation is performed.
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMemcpy(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind)
Copies count bytes from the memory area pointed to by src to the memory
area pointed to by dst, where kind is one of cudaMemcpyHostToHost,
cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice,
and specifies the direction of the copy. The memory areas may not
overlap. Calling cudaMemcpy() with dst and src pointers that do not match
the direction of the copy results in undefined behavior.
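The description above imposes two caller-side rules. First, kind must match where dst and src actually live. Second, the two byte ranges must not overlap. The sketch below restates both as plain Java. The constant values mirror the CUDA runtime's cudaMemcpyKind enumeration, which JCuda also exposes as int constants in jcuda.runtime.cudaMemcpyKind. The class and method names are hypothetical, and the code makes no CUDA calls.

```java
// Plain-Java helpers restating two rules of cudaMemcpy(); no CUDA calls.
// Constant values mirror the CUDA runtime's cudaMemcpyKind enumeration.
final class MemcpyRules {
    static final int HOST_TO_HOST = 0;      // cudaMemcpyHostToHost
    static final int HOST_TO_DEVICE = 1;    // cudaMemcpyHostToDevice
    static final int DEVICE_TO_HOST = 2;    // cudaMemcpyDeviceToHost
    static final int DEVICE_TO_DEVICE = 3;  // cudaMemcpyDeviceToDevice

    // Picks the kind that matches where the source and destination live.
    static int kindFor(boolean srcOnDevice, boolean dstOnDevice) {
        if (srcOnDevice) {
            return dstOnDevice ? DEVICE_TO_DEVICE : DEVICE_TO_HOST;
        }
        return dstOnDevice ? HOST_TO_DEVICE : HOST_TO_HOST;
    }

    // True if [dst, dst + count) and [src, src + count) share at least one byte,
    // i.e. the copy would violate the "may not overlap" rule.
    static boolean overlaps(long dst, long src, long count) {
        return count > 0 && dst < src + count && src < dst + count;
    }
}
```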
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyPeer(Pointer dst, int dstDevice, Pointer src, int srcDevice, long count)
cudaError_t cudaMemcpyPeer(void *dst, int dstDevice, const void *src, int srcDevice, size_t count)
Copies memory from one device to memory on another device. dst is the
base device pointer of the destination memory and dstDevice is the
destination device. src is the base device pointer of the source memory
and srcDevice is the source device. count specifies the number of bytes
to copy.
Note that this function is asynchronous with respect to the host, but
serialized with respect to all pending and future asynchronous work in
the current device, srcDevice, and dstDevice (use cudaMemcpyPeerAsync to
avoid this synchronization).
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy3DPeer(jcuda.runtime.cudaMemcpy3DPeerParms)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyPeerAsync(jcuda.Pointer, int, jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t)
,
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyToArray(struct cudaArray *dst, size_t wOffset, size_t hOffset,
    const void *src, size_t count, enum cudaMemcpyKind kind)
Copies count bytes from the memory area pointed to by src to the CUDA
array dst starting at the upper left corner (wOffset, hOffset), where
kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice,
cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the
direction of the copy.
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromArray(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyFromArray(void *dst, const struct cudaArray *src, size_t wOffset,
    size_t hOffset, size_t count, enum cudaMemcpyKind kind)
Copies count bytes from the CUDA array src starting at the upper left
corner (wOffset, hOffset) to the memory area pointed to by dst, where
kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice,
cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the
direction of the copy.
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyArrayToArray(struct cudaArray *dst, size_t wOffsetDst, size_t hOffsetDst,
    const struct cudaArray *src, size_t wOffsetSrc, size_t hOffsetSrc, size_t count,
    enum cudaMemcpyKind kind = cudaMemcpyDeviceToDevice)
Copies count bytes from the CUDA array src starting at the upper left
corner (wOffsetSrc, hOffsetSrc) to the CUDA array dst starting at the
upper left corner (wOffsetDst, hOffsetDst), where kind is one of
cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or
cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2D(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2D(void *dst, size_t dpitch, const void *src, size_t spitch,
    size_t width, size_t height, enum cudaMemcpyKind kind)
Copies a matrix (height rows of width bytes each) from the memory area
pointed to by src to the memory area pointed to by dst, where kind is one
of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost,
or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
dpitch and spitch are the widths in memory, in bytes, of the 2D arrays
pointed to by dst and src, including any padding added to the end of each
row. The memory areas may not overlap. width must not exceed either
dpitch or spitch. Calling cudaMemcpy2D() with dst and src pointers that
do not match the direction of the copy results in undefined behavior.
cudaMemcpy2D() returns an error if dpitch or spitch exceeds the maximum
allowed.
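The pitch arithmetic works as follows. Row r of the source starts at byte r * spitch, and row r of the destination starts at byte r * dpitch. Only the first width bytes of each row are copied. The sketch below is a plain-Java model of that copy on byte arrays. It makes no CUDA calls, and the class name Pitched2DCopy is hypothetical.

```java
// Host-side model of the pitched copy described for cudaMemcpy2D();
// no CUDA calls. Buffers are byte arrays and all sizes are in bytes.
final class Pitched2DCopy {
    static void copy2D(byte[] dst, int dpitch, byte[] src, int spitch, int width, int height) {
        // Mirrors the documented constraint: width may not exceed either pitch.
        if (width > dpitch || width > spitch) {
            throw new IllegalArgumentException("width must not exceed dpitch or spitch");
        }
        for (int row = 0; row < height; row++) {
            // Copy the first width bytes of each row; the padding bytes
            // between width and dpitch in dst are left untouched.
            System.arraycopy(src, row * spitch, dst, row * dpitch, width);
        }
    }
}
```

With JCuda, dpitch usually comes from cudaMallocPitch(Pointer, long[], long, long), which stores the pitch in pitch[0]. A host-to-device copy from a tightly packed host buffer would then pass width as spitch, for example cudaMemcpy2D(devPtr, pitch[0], Pointer.to(hostBytes), width, width, height, cudaMemcpyKind.cudaMemcpyHostToDevice).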
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2DToArray(struct cudaArray *dst, size_t wOffset, size_t hOffset,
    const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)
Copies a matrix (height rows of width bytes each) from the memory area
pointed to by src to the CUDA array dst starting at the upper left corner
(wOffset, hOffset), where kind is one of cudaMemcpyHostToHost,
cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice,
and specifies the direction of the copy. spitch is the width in memory,
in bytes, of the 2D array pointed to by src, including any padding added
to the end of each row. wOffset + width must not exceed the width of the
CUDA array dst. width must not exceed spitch. cudaMemcpy2DToArray()
returns an error if spitch exceeds the maximum allowed.
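To make the offset and bounds rules concrete, here is a plain-Java model of this copy. It makes no CUDA calls. The CUDA array is represented as byte rows (array[row][byteColumn]), so in this model wOffset, width and the array width are all measured in bytes. The class name ToArray2DModel is hypothetical.

```java
// Host-side model of cudaMemcpy2DToArray(); no CUDA calls.
// The CUDA array is modeled as byte rows: array[row][byteColumn].
final class ToArray2DModel {
    static void copy2DToArray(byte[][] array, int wOffset, int hOffset,
                              byte[] src, int spitch, int width, int height) {
        int arrayWidth = array[0].length;  // array width in bytes (model assumption)
        if (wOffset + width > arrayWidth) {
            throw new IllegalArgumentException("wOffset + width exceeds the array width");
        }
        if (width > spitch) {
            throw new IllegalArgumentException("width must not exceed spitch");
        }
        for (int row = 0; row < height; row++) {
            // Source rows start spitch bytes apart; the block lands at (wOffset, hOffset + row).
            System.arraycopy(src, row * spitch, array[hOffset + row], wOffset, width);
        }
    }
}
```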
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DFromArray(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2DFromArray(void *dst, size_t dpitch, const struct cudaArray *src,
    size_t wOffset, size_t hOffset, size_t width, size_t height, enum cudaMemcpyKind kind)
Copies a matrix (height rows of width bytes each) from the CUDA array src
starting at the upper left corner (wOffset, hOffset) to the memory area
pointed to by dst, where kind is one of cudaMemcpyHostToHost,
cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice,
and specifies the direction of the copy. dpitch is the width in memory,
in bytes, of the 2D array pointed to by dst, including any padding added
to the end of each row. wOffset + width must not exceed the width of the
CUDA array src. width must not exceed dpitch. cudaMemcpy2DFromArray()
returns an error if dpitch exceeds the maximum allowed.
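This is the inverse of cudaMemcpy2DToArray(). A block is read out of the array at (wOffset, hOffset) and written into a pitched linear buffer. The plain-Java model below makes no CUDA calls. It represents the array as byte rows and measures all widths in bytes, and the class name FromArray2DModel is hypothetical.

```java
// Host-side model of cudaMemcpy2DFromArray(); no CUDA calls.
// The CUDA array is modeled as byte rows: array[row][byteColumn].
final class FromArray2DModel {
    static void copy2DFromArray(byte[] dst, int dpitch, byte[][] array,
                                int wOffset, int hOffset, int width, int height) {
        int arrayWidth = array[0].length;  // array width in bytes (model assumption)
        if (wOffset + width > arrayWidth) {
            throw new IllegalArgumentException("wOffset + width exceeds the array width");
        }
        if (width > dpitch) {
            throw new IllegalArgumentException("width must not exceed dpitch");
        }
        for (int row = 0; row < height; row++) {
            // Destination rows start dpitch bytes apart.
            System.arraycopy(array[hOffset + row], wOffset, dst, row * dpitch, width);
        }
    }
}
```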
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2DArrayToArray(struct cudaArray *dst, size_t wOffsetDst, size_t hOffsetDst,
    const struct cudaArray *src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width,
    size_t height, enum cudaMemcpyKind kind = cudaMemcpyDeviceToDevice)
Copies a matrix (height rows of width bytes each) from the CUDA array src
starting at the upper left corner (wOffsetSrc, hOffsetSrc) to the CUDA
array dst starting at the upper left corner (wOffsetDst, hOffsetDst),
where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice,
cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the
direction of the copy. wOffsetDst + width must not exceed the width of
the CUDA array dst. wOffsetSrc + width must not exceed the width of the
CUDA array src.
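Each array has its own corner, so each gets its own bounds check. The plain-Java model below copies a width x height block between two arrays represented as byte rows. It makes no CUDA calls and measures widths in bytes, and the class name ArrayToArray2DModel is hypothetical. Both checks run before any byte is written.

```java
// Host-side model of cudaMemcpy2DArrayToArray(); no CUDA calls.
// Both CUDA arrays are modeled as byte rows: array[row][byteColumn].
final class ArrayToArray2DModel {
    static void copy2DArrayToArray(byte[][] dst, int wOffsetDst, int hOffsetDst,
                                   byte[][] src, int wOffsetSrc, int hOffsetSrc,
                                   int width, int height) {
        if (wOffsetDst + width > dst[0].length) {
            throw new IllegalArgumentException("wOffsetDst + width exceeds the width of dst");
        }
        if (wOffsetSrc + width > src[0].length) {
            throw new IllegalArgumentException("wOffsetSrc + width exceeds the width of src");
        }
        for (int row = 0; row < height; row++) {
            // Row hOffsetSrc + row of src maps to row hOffsetDst + row of dst.
            System.arraycopy(src[hOffsetSrc + row], wOffsetSrc, dst[hOffsetDst + row], wOffsetDst, width);
        }
    }
}
```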
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToSymbol(java.lang.String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyToSymbol(const char *symbol, const void *src, size_t count,
    size_t offset = 0, enum cudaMemcpyKind kind = cudaMemcpyHostToDevice)
Copies count bytes from the memory area pointed to by src to the memory
area offset bytes from the start of symbol symbol. The memory areas may
not overlap. symbol can either be a variable that resides in global or
constant memory space, or it can be a character string naming a variable
that resides in global or constant memory space. kind can be either
cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.
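The key detail is the offset. The bytes land offset bytes after the start of the symbol, not at its beginning. The plain-Java model below models the symbol's storage as a byte[] and makes no CUDA calls. The kind values mirror cudaMemcpyHostToDevice (1) and cudaMemcpyDeviceToDevice (3). The class name SymbolWriteModel is hypothetical, and the bounds check is an assumption of this model rather than a rule stated above.

```java
// Host-side model of the offset semantics of cudaMemcpyToSymbol(); no CUDA calls.
// The bounds check is an assumption of this model, not a documented rule.
final class SymbolWriteModel {
    static final int HOST_TO_DEVICE = 1;    // cudaMemcpyHostToDevice
    static final int DEVICE_TO_DEVICE = 3;  // cudaMemcpyDeviceToDevice

    static void copyToSymbol(byte[] symbolStorage, byte[] src, int count, int offset, int kind) {
        // Only the two documented directions are accepted.
        if (kind != HOST_TO_DEVICE && kind != DEVICE_TO_DEVICE) {
            throw new IllegalArgumentException("kind must be HostToDevice or DeviceToDevice");
        }
        if (offset < 0 || count < 0 || offset + count > symbolStorage.length) {
            throw new IllegalArgumentException("copy would run past the end of the symbol");
        }
        // The bytes land offset bytes after the start of the symbol.
        System.arraycopy(src, 0, symbolStorage, offset, count);
    }
}
```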
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromSymbol(Pointer dst, java.lang.String symbol, long count, long offset, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyFromSymbol(void *dst, const char *symbol, size_t count,
    size_t offset = 0, enum cudaMemcpyKind kind = cudaMemcpyDeviceToHost)
Copies count bytes from the memory area offset bytes from the start of
symbol symbol to the memory area pointed to by dst. The memory areas may
not overlap. symbol can either be a variable that resides in global or
constant memory space, or it can be a character string naming a variable
that resides in global or constant memory space. kind can be either
cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.
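Reading works the other way around. The copy starts offset bytes into the symbol and fills dst from its beginning. The plain-Java model below makes no CUDA calls and models the symbol's storage as a byte[]. The kind values mirror cudaMemcpyDeviceToHost (2) and cudaMemcpyDeviceToDevice (3). The class name SymbolReadModel is hypothetical, and its bounds check is a model assumption.

```java
// Host-side model of the offset semantics of cudaMemcpyFromSymbol(); no CUDA calls.
// The bounds check is an assumption of this model, not a documented rule.
final class SymbolReadModel {
    static final int DEVICE_TO_HOST = 2;    // cudaMemcpyDeviceToHost
    static final int DEVICE_TO_DEVICE = 3;  // cudaMemcpyDeviceToDevice

    static void copyFromSymbol(byte[] dst, byte[] symbolStorage, int count, int offset, int kind) {
        // Only the two documented directions are accepted.
        if (kind != DEVICE_TO_HOST && kind != DEVICE_TO_DEVICE) {
            throw new IllegalArgumentException("kind must be DeviceToHost or DeviceToDevice");
        }
        if (offset < 0 || count < 0 || offset + count > symbolStorage.length) {
            throw new IllegalArgumentException("read would run past the end of the symbol");
        }
        // Reading starts offset bytes after the start of the symbol.
        System.arraycopy(symbolStorage, offset, dst, 0, count);
    }
}
```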
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyAsync(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyAsync(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind,
    cudaStream_t stream = 0)
Copies count bytes from the memory area pointed to by src to the memory
area pointed to by dst, where kind is one of cudaMemcpyHostToHost,
cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice,
and specifies the direction of the copy. The memory areas may not
overlap. Calling cudaMemcpyAsync() with dst and src pointers that do not
match the direction of the copy results in undefined behavior.
cudaMemcpyAsync() is asynchronous with respect to the host, so the call
may return before the copy is complete. It only works on page-locked
host memory and returns an error if a pointer to pageable memory is
passed as input. The copy can optionally be associated with a stream by
passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or
cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with
operations in other streams.
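The description above states two conditions. First, any host memory involved must be page-locked. Second, overlap with other streams is possible when a host-to-device or device-to-host copy is issued on a non-zero stream. The sketch below restates both as plain-Java predicates and makes no CUDA calls. Streams are modeled as long handles where 0 means the default stream. That is an assumption of the sketch, since JCuda represents streams as cudaStream_t objects. The kind values mirror cudaMemcpyKind, and the class name AsyncCopyRules is hypothetical.

```java
// Plain-Java restatement of two rules from the cudaMemcpyAsync() description;
// no CUDA calls. Streams are modeled as long handles, 0 being the default stream.
final class AsyncCopyRules {
    static final int HOST_TO_HOST = 0;      // cudaMemcpyHostToHost
    static final int HOST_TO_DEVICE = 1;    // cudaMemcpyHostToDevice
    static final int DEVICE_TO_HOST = 2;    // cudaMemcpyDeviceToHost
    static final int DEVICE_TO_DEVICE = 3;  // cudaMemcpyDeviceToDevice

    // Host memory must be page-locked; a device-to-device copy touches no host buffer.
    static boolean hostMemoryAcceptable(int kind, boolean hostPageLocked) {
        return kind == DEVICE_TO_DEVICE || hostPageLocked;
    }

    // The documented case in which the copy may overlap with work in other
    // streams: a host<->device copy issued on a non-zero stream.
    static boolean mayOverlapOtherStreams(int kind, long stream) {
        return stream != 0 && (kind == HOST_TO_DEVICE || kind == DEVICE_TO_HOST);
    }
}
```

In JCuda, page-locked host memory for such copies can be obtained with cudaMallocHost(Pointer, long) or cudaHostAlloc(Pointer, long, int), and released with cudaFreeHost(Pointer).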
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyPeerAsync(Pointer dst, int dstDevice, Pointer src, int srcDevice, long count, cudaStream_t stream)
cudaError_t cudaMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice,
    size_t count, cudaStream_t stream = 0)
Copies memory from one device to memory on another device. dst is the
base device pointer of the destination memory and dstDevice is the
destination device. src is the base device pointer of the source memory
and srcDevice is the source device. count specifies the number of bytes
to copy.
Note that this function is asynchronous with respect to the host and all
work in other streams and other devices.
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long)
,
cudaMemcpy3DPeer(jcuda.runtime.cudaMemcpy3DPeerParms)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyToArrayAsync(struct cudaArray *dst, size_t wOffset, size_t hOffset,
    const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies count bytes from the memory area pointed to by src to the CUDA
array dst starting at the upper left corner (wOffset, hOffset), where
kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice,
cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the
direction of the copy.
cudaMemcpyToArrayAsync() is asynchronous with respect to the host, so
the call may return before the copy is complete. It only works on
page-locked host memory and returns an error if a pointer to pageable
memory is passed as input. The copy can optionally be associated with a
stream by passing a non-zero stream argument. If kind is
cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero,
the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromArrayAsync(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyFromArrayAsync(void *dst, const struct cudaArray *src, size_t wOffset, size_t hOffset, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies count bytes from the CUDA array src, starting at the upper left corner (wOffset, hOffset), to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
cudaMemcpyFromArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DAsync(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpy2DAsync(void *dst, size_t dpitch, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory, in bytes, of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas must not overlap. width must not exceed either dpitch or spitch. Calling cudaMemcpy2DAsync() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpy2DAsync() returns an error if dpitch or spitch is greater than the maximum allowed.
cudaMemcpy2DAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
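The pitch rules of cudaMemcpy2DAsync() can be illustrated with a small host-side model. The sketch below is plain Java with no JCuda or GPU dependency: it treats dst and src as byte arrays and copies height rows of width bytes, stepping by dpitch and spitch. The class and method names are hypothetical, and only the "width must not exceed either pitch" rule from this entry is checked; it models the copy semantics, not the asynchronous behavior.

```java
/** Hypothetical host-side model of a pitched 2D copy (no GPU involved). */
final class Pitched2DCopy {

    private Pitched2DCopy() {}

    /**
     * Copies height rows of width bytes from src to dst.
     * Row r starts at r * spitch in src and at r * dpitch in dst; the padding
     * bytes between width and the pitch are neither read nor written.
     */
    static void copy2D(byte[] dst, int dpitch, byte[] src, int spitch,
                       int width, int height) {
        // Documented constraint: width must not exceed either pitch.
        if (width > dpitch || width > spitch) {
            throw new IllegalArgumentException("width exceeds dpitch or spitch");
        }
        for (int row = 0; row < height; row++) {
            System.arraycopy(src, row * spitch, dst, row * dpitch, width);
        }
    }
}
```

For example, copying two rows of 3 bytes out of a source whose pitch is 4 bytes into a tightly packed destination (dpitch = 3) drops the one padding byte at the end of each source row.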
public static int cudaMemcpy2DToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpy2DToArrayAsync(struct cudaArray *dst, size_t wOffset, size_t hOffset, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the CUDA array dst, starting at the upper left corner (wOffset, hOffset), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. spitch is the width in memory, in bytes, of the 2D array pointed to by src, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array dst. width must not exceed spitch. cudaMemcpy2DToArrayAsync() returns an error if spitch exceeds the maximum allowed.
cudaMemcpy2DToArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
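The argument constraints of cudaMemcpy2DToArrayAsync() can be checked on the host before issuing the call. The sketch below is a hypothetical plain-Java validator, not part of JCuda. It assumes the width of the CUDA array is expressed in bytes (element count times element size), and maxPitch is a stand-in for the device limit (the memPitch field of cudaDeviceProp reports the maximum pitch allowed for memory copies).

```java
/** Hypothetical precondition check mirroring the rules documented for cudaMemcpy2DToArrayAsync(). */
final class Copy2DToArrayCheck {

    private Copy2DToArrayCheck() {}

    /**
     * Returns null if the arguments satisfy the documented constraints,
     * otherwise a short description of the first violated rule.
     *
     * @param arrayWidthBytes width of the destination CUDA array in bytes
     *                        (assumption of this sketch)
     * @param maxPitch        stand-in for the device's maximum allowed pitch
     */
    static String check(long wOffset, long spitch, long width,
                        long arrayWidthBytes, long maxPitch) {
        if (wOffset + width > arrayWidthBytes) {
            return "wOffset + width exceeds the width of the CUDA array";
        }
        if (width > spitch) {
            return "width exceeds spitch";
        }
        if (spitch > maxPitch) {
            return "spitch exceeds the maximum allowed";
        }
        return null;
    }
}
```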
public static int cudaMemcpy2DFromArrayAsync(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpy2DFromArrayAsync(void *dst, size_t dpitch, const struct cudaArray *src, size_t wOffset, size_t hOffset, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies a matrix (height rows of width bytes each) from the CUDA array src, starting at the upper left corner (wOffset, hOffset), to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch is the width in memory, in bytes, of the 2D array pointed to by dst, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array src. width must not exceed dpitch. cudaMemcpy2DFromArrayAsync() returns an error if dpitch exceeds the maximum allowed.
cudaMemcpy2DFromArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
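To make the role of wOffset, hOffset and dpitch concrete, the sketch below models cudaMemcpy2DFromArrayAsync() on the host. It is hypothetical plain Java, not JCuda code, and it makes one loud assumption purely for illustration: the CUDA array is modeled as a dense row-major byte grid with arrayWidthBytes bytes per row. Real CUDA arrays have an opaque, driver-defined layout.

```java
/** Hypothetical host-side model of cudaMemcpy2DFromArrayAsync() semantics. */
final class Copy2DFromArrayModel {

    private Copy2DFromArrayModel() {}

    /**
     * Copies height rows of width bytes, starting at byte column wOffset and
     * row hOffset of the modeled array, into dst with row pitch dpitch.
     * ASSUMPTION: the array is a dense row-major grid of arrayWidthBytes per row.
     */
    static void copy(byte[] dst, int dpitch, byte[] array, int arrayWidthBytes,
                     int wOffset, int hOffset, int width, int height) {
        // Documented constraints: wOffset + width <= array width, width <= dpitch.
        if (wOffset + width > arrayWidthBytes) {
            throw new IllegalArgumentException("wOffset + width exceeds array width");
        }
        if (width > dpitch) {
            throw new IllegalArgumentException("width exceeds dpitch");
        }
        for (int row = 0; row < height; row++) {
            int srcStart = (hOffset + row) * arrayWidthBytes + wOffset;
            System.arraycopy(array, srcStart, dst, row * dpitch, width);
        }
    }
}
```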
public static int cudaMemcpyToSymbolAsync(java.lang.String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyToSymbolAsync(const char *symbol, const void *src, size_t count, size_t offset, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies count bytes from the memory area pointed to by src to the memory area offset bytes from the start of symbol symbol. The memory areas must not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.
cudaMemcpyToSymbolAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromSymbolAsync(Pointer dst, java.lang.String symbol, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyFromSymbolAsync(void *dst, const char *symbol, size_t count, size_t offset, enum cudaMemcpyKind kind, cudaStream_t stream = 0)
Copies count bytes from the memory area offset bytes from the start of symbol symbol to the memory area pointed to by dst. The memory areas must not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.
cudaMemcpyFromSymbolAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
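The direction rules for symbol copies (HostToDevice or DeviceToDevice into a symbol, DeviceToHost or DeviceToDevice out of one) and the role of offset can be illustrated with a small host-side model. The sketch below is hypothetical plain Java: device memory is simulated by a map from symbol name to a byte array, and the integer constants mirror the values of the CUDA runtime's enum cudaMemcpyKind. The "unknown symbol" error is an assumption of the model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical host-side model of copies to and from a named device symbol. */
final class SymbolCopyModel {

    // Values mirror the CUDA runtime's enum cudaMemcpyKind.
    static final int HOST_TO_HOST = 0;
    static final int HOST_TO_DEVICE = 1;
    static final int DEVICE_TO_HOST = 2;
    static final int DEVICE_TO_DEVICE = 3;

    /** Stand-in for global/constant device memory, keyed by symbol name. */
    private final Map<String, byte[]> symbols = new HashMap<>();

    void declare(String symbol, int sizeInBytes) {
        symbols.put(symbol, new byte[sizeInBytes]);
    }

    /** Models cudaMemcpyToSymbolAsync(): only HostToDevice or DeviceToDevice. */
    void copyToSymbol(String symbol, byte[] src, int count, int offset, int kind) {
        if (kind != HOST_TO_DEVICE && kind != DEVICE_TO_DEVICE) {
            throw new IllegalArgumentException("invalid kind for a copy to a symbol");
        }
        System.arraycopy(src, 0, lookup(symbol), offset, count);
    }

    /** Models cudaMemcpyFromSymbolAsync(): only DeviceToHost or DeviceToDevice. */
    void copyFromSymbol(byte[] dst, String symbol, int count, int offset, int kind) {
        if (kind != DEVICE_TO_HOST && kind != DEVICE_TO_DEVICE) {
            throw new IllegalArgumentException("invalid kind for a copy from a symbol");
        }
        System.arraycopy(lookup(symbol), offset, dst, 0, count);
    }

    private byte[] lookup(String symbol) {
        byte[] storage = symbols.get(symbol);
        if (storage == null) {
            throw new IllegalArgumentException("unknown symbol: " + symbol);
        }
        return storage;
    }
}
```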
public static int cudaMemset(Pointer mem, int c, long count)
cudaError_t cudaMemset(void *devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
See Also:
cudaMemset2D(jcuda.Pointer, long, int, long, long),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t),
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
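Because cudaMemset() writes a byte value, it does not set multi-byte elements to an arbitrary integer. The sketch below is a hypothetical plain-Java model on a host byte array (not a JCuda call): like C's memset, it keeps only the low-order byte of value.

```java
import java.util.Arrays;

/** Hypothetical host-side model of cudaMemset() on a byte buffer. */
final class MemsetModel {

    private MemsetModel() {}

    /**
     * Fills the first count bytes of mem with value. Only the low-order byte
     * of value is written, since the fill is byte-wise.
     */
    static void memset(byte[] mem, int value, int count) {
        Arrays.fill(mem, 0, count, (byte) value);
    }
}
```

A practical consequence: filling a 4-byte int buffer with value 1 produces 0x01010101 in each int, not 1. Zero-filling with value 0 is the common case where the byte-wise semantics do not matter.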
public static int cudaMemset2D(Pointer mem, long pitch, int c, long width, long height)
cudaError_t cudaMemset2D(void *devPtr, size_t pitch, int value, size_t width, size_t height)
Sets to the specified value value a matrix (height rows of width bytes each) pointed to by devPtr. pitch is the width in bytes of the 2D array pointed to by devPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().
See Also:
cudaMemset(jcuda.Pointer, int, long),
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent),
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t),
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
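The interplay of pitch and width in cudaMemset2D() can be seen in a host-side model: only the first width bytes of each row are set, and the padding up to pitch is left alone. The sketch below is hypothetical plain Java operating on a byte array, not a JCuda call.

```java
/** Hypothetical host-side model of cudaMemset2D() on a pitched byte buffer. */
final class Memset2DModel {

    private Memset2DModel() {}

    /**
     * Sets width bytes in each of height rows to the low-order byte of value.
     * Row r starts at r * pitch; the padding bytes between width and pitch
     * are not modified.
     */
    static void memset2D(byte[] mem, int pitch, int value, int width, int height) {
        for (int row = 0; row < height; row++) {
            int start = row * pitch;
            java.util.Arrays.fill(mem, start, start + width, (byte) value);
        }
    }
}
```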
public static int cudaGetChannelDesc(cudaChannelFormatDesc desc, cudaArray array)
cudaError_t cudaGetChannelDesc(struct cudaChannelFormatDesc *desc, const struct cudaArray *array)
Returns in *desc the channel descriptor of the CUDA array array.
See Also:
cudaCreateChannelDesc(int, int, int, int, int),
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String),
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long),
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long),
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc),
cudaUnbindTexture(jcuda.runtime.textureReference),
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static cudaChannelFormatDesc cudaCreateChannelDesc(int x, int y, int z, int w, int cudaChannelFormatKind_f)
struct cudaChannelFormatDesc cudaCreateChannelDesc(int x, int y, int z, int w, enum cudaChannelFormatKind f)
Returns a channel descriptor with format f and the number of bits of each component x, y, z, and w. The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
Returns: Channel descriptor with format f
See Also:
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray),
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String),
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long),
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long),
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc),
cudaUnbindTexture(jcuda.runtime.textureReference),
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
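The component bit widths in a channel descriptor determine the size of one array element. The sketch below is a hypothetical plain-Java mirror of struct cudaChannelFormatDesc (JCuda's own class is jcuda.runtime.cudaChannelFormatDesc, created via cudaCreateChannelDesc(x, y, z, w, f) as documented above). The KIND_* constants mirror the values of enum cudaChannelFormatKind; the element-size formula simply sums the component bits and divides by 8.

```java
/** Hypothetical plain-Java mirror of struct cudaChannelFormatDesc. */
final class ChannelDesc {

    // Values mirror enum cudaChannelFormatKind in the CUDA runtime headers.
    static final int KIND_SIGNED = 0;
    static final int KIND_UNSIGNED = 1;
    static final int KIND_FLOAT = 2;

    final int x, y, z, w; // bits per component
    final int f;          // one of the KIND_* values

    ChannelDesc(int x, int y, int z, int w, int f) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
        this.f = f;
    }

    /** Size of one element in bytes: sum of the component bit widths / 8. */
    int elementSizeBytes() {
        return (x + y + z + w) / 8;
    }
}
```

For instance, a four-component float descriptor (32, 32, 32, 32, float) describes 16-byte elements, while a single 8-bit unsigned channel describes 1-byte elements.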
public static int cudaGetLastError()
cudaError_t cudaGetLastError(void)
Returns the last error that has been produced by any of the runtime calls in the same host thread and resets it to cudaSuccess.
See Also:
cudaPeekAtLastError(),
cudaGetErrorString(int),
cudaError
public static int cudaPeekAtLastError()
cudaError_t cudaPeekAtLastError(void)
Returns the last error that has been produced by any of the runtime calls in the same host thread. Unlike cudaGetLastError(), this call does not reset the error to cudaSuccess.
See Also:
cudaGetLastError(),
cudaGetErrorString(int),
cudaError
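The difference between cudaGetLastError() and cudaPeekAtLastError() is easiest to see with a model of the per-thread error slot. The sketch below is hypothetical plain Java, not the runtime's implementation: a ThreadLocal holds the last error, failing calls record into it, "get" reads and resets it, and "peek" only reads. The error code 11 used in the test is an arbitrary illustrative value.

```java
/** Hypothetical model of the per-thread "last error" kept by the CUDA runtime. */
final class LastErrorModel {

    static final int SUCCESS = 0; // stands in for cudaSuccess

    private static final ThreadLocal<Integer> lastError =
            ThreadLocal.withInitial(() -> SUCCESS);

    private LastErrorModel() {}

    /** Called by a (simulated) runtime function that failed with the given code. */
    static void record(int error) {
        lastError.set(error);
    }

    /** Like cudaGetLastError(): returns the last error and resets it to SUCCESS. */
    static int getLastError() {
        int error = lastError.get();
        lastError.set(SUCCESS);
        return error;
    }

    /** Like cudaPeekAtLastError(): returns the last error without resetting it. */
    static int peekAtLastError() {
        return lastError.get();
    }
}
```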
public static java.lang.String cudaGetErrorString(int error)
const char* cudaGetErrorString(cudaError_t error)
Returns the message string from an error code.
Returns: char* pointer to a NULL-terminated string
See Also:
cudaGetLastError(),
cudaPeekAtLastError(),
cudaError
public static int cudaStreamCreate(cudaStream_t stream)
cudaError_t cudaStreamCreate(cudaStream_t *pStream)
Creates a new asynchronous stream.
See Also:
cudaStreamQuery(jcuda.runtime.cudaStream_t),
cudaStreamSynchronize(jcuda.runtime.cudaStream_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int),
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamDestroy(cudaStream_t stream)
cudaError_t cudaStreamDestroy(cudaStream_t stream)
Destroys and cleans up the asynchronous stream specified by stream.
If the device is still doing work in the stream stream when cudaStreamDestroy() is called, the function will return immediately and the resources associated with stream will be released automatically once the device has completed all work in stream.
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t),
cudaStreamQuery(jcuda.runtime.cudaStream_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int),
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
public static int cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, int flags)
cudaError_t cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, unsigned int flags)
Makes all future work submitted to stream wait until event reports completion before beginning execution. This synchronization will be performed efficiently on the device. The event event may be from a different context than stream, in which case this function will perform cross-device synchronization. The flags argument is reserved and must be 0.
The stream stream will wait only for the completion of the most recent host call to cudaEventRecord() on event. Once this call has returned, any functions (including cudaEventRecord() and cudaEventDestroy()) may be called on event again, and the subsequent calls will not have any effect on stream.
If stream is NULL, any future work submitted in any stream will wait for event to complete before beginning execution. This effectively creates a barrier for all future work submitted to the device on this thread.
If cudaEventRecord() has not been called on event, this call acts as if the record has already completed, and so is a functional no-op.
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t),
cudaStreamQuery(jcuda.runtime.cudaStream_t),
cudaStreamSynchronize(jcuda.runtime.cudaStream_t),
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamSynchronize(cudaStream_t stream)
cudaError_t cudaStreamSynchronize(cudaStream_t stream)
Blocks until stream has completed all operations. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the stream is finished with all of its tasks.
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t),
cudaStreamQuery(jcuda.runtime.cudaStream_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int),
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamQuery(cudaStream_t stream)
cudaError_t cudaStreamQuery(cudaStream_t stream)
Returns cudaSuccess if all operations in stream have completed, or cudaErrorNotReady if not.
See Also:
cudaStreamCreate(jcuda.runtime.cudaStream_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int),
cudaStreamSynchronize(jcuda.runtime.cudaStream_t),
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaEventCreate(cudaEvent_t event)
cudaError_t cudaEventCreate(cudaEvent_t *event)
Creates an event object. To create an event with specific flags, use cudaEventCreateWithFlags().
See Also:
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int),
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t),
cudaEventQuery(jcuda.runtime.cudaEvent_t),
cudaEventSynchronize(jcuda.runtime.cudaEvent_t),
cudaEventDestroy(jcuda.runtime.cudaEvent_t),
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
public static int cudaEventCreateWithFlags(cudaEvent_t event, int flags)
cudaError_t cudaEventCreateWithFlags(cudaEvent_t *event, unsigned int flags)
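Creates an event object with the specified flags. Valid flags include cudaEventDefault (default event flag), cudaEventBlockingSync (the event uses blocking synchronization), cudaEventDisableTiming (the event will not record timing data), and cudaEventInterprocess (the event is suitable for interprocess use; cudaEventDisableTiming must also be set).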
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t),
cudaEventSynchronize(jcuda.runtime.cudaEvent_t),
cudaEventDestroy(jcuda.runtime.cudaEvent_t),
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
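Event flags are bit masks combined with bitwise OR, and the interprocess flag carries an extra requirement. The sketch below is a hypothetical plain-Java validator, not part of JCuda; its constants mirror the values of the cudaEvent* flags in the CUDA runtime headers (in JCuda they are available as the JCuda.cudaEventDefault, JCuda.cudaEventBlockingSync, JCuda.cudaEventDisableTiming and JCuda.cudaEventInterprocess fields). Rejecting unknown bits is an assumption of the sketch.

```java
/** Hypothetical validator for cudaEventCreateWithFlags() flag combinations. */
final class EventFlags {

    // Values mirror the cudaEvent* flag constants in the CUDA runtime headers.
    static final int DEFAULT = 0x00;
    static final int BLOCKING_SYNC = 0x01;
    static final int DISABLE_TIMING = 0x02;
    static final int INTERPROCESS = 0x04;

    private static final int ALL = BLOCKING_SYNC | DISABLE_TIMING | INTERPROCESS;

    private EventFlags() {}

    /** True if flags uses only known bits and obeys the interprocess rule. */
    static boolean isValid(int flags) {
        if ((flags & ~ALL) != 0) {
            return false; // unknown bit set (assumption of this sketch)
        }
        // Documented rule: an interprocess event must also have timing disabled.
        if ((flags & INTERPROCESS) != 0 && (flags & DISABLE_TIMING) == 0) {
            return false;
        }
        return true;
    }
}
```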
public static int cudaEventRecord(cudaEvent_t event, cudaStream_t stream)
cudaError_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream = 0)
Records an event. If stream is non-zero, the event is recorded after all preceding operations in stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since the operation is asynchronous, cudaEventQuery() and/or cudaEventSynchronize() must be used to determine when the event has actually been recorded.
If cudaEventRecord() has previously been called on event, then this call will overwrite any existing state in event. Any subsequent calls which examine the status of event will only examine the completion of this most recent call to cudaEventRecord().
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t),
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int),
cudaEventQuery(jcuda.runtime.cudaEvent_t),
cudaEventSynchronize(jcuda.runtime.cudaEvent_t),
cudaEventDestroy(jcuda.runtime.cudaEvent_t),
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t),
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
public static int cudaEventQuery(cudaEvent_t event)
cudaError_t cudaEventQuery(cudaEvent_t event)
Queries the status of all device work preceding the most recent call to cudaEventRecord() (in the appropriate compute streams, as specified by the arguments to cudaEventRecord()).
If this work has successfully been completed by the device, or if cudaEventRecord() has not been called on event, then cudaSuccess is returned. If this work has not yet been completed by the device, then cudaErrorNotReady is returned.
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t),
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int),
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t),
cudaEventSynchronize(jcuda.runtime.cudaEvent_t),
cudaEventDestroy(jcuda.runtime.cudaEvent_t),
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
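The rules of cudaEventRecord() and cudaEventQuery() combine into a small state machine: an event that was never recorded reports success, each new record overwrites the previous state, and the query reflects only the work preceding the latest record. The sketch below is a hypothetical plain-Java model of those rules; deviceCompletes() is an illustrative hook standing in for the device finishing the recorded work.

```java
/** Hypothetical model of how cudaEventRecord() and cudaEventQuery() interact. */
final class EventModel {

    enum Status { SUCCESS, NOT_READY }

    private boolean recorded;         // has record() ever been called?
    private boolean latestRecordDone; // has the work before the latest record finished?

    /** Like cudaEventRecord(): overwrites any earlier state of the event. */
    void record() {
        recorded = true;
        latestRecordDone = false;
    }

    /** Simulates the device finishing the work captured by the latest record. */
    void deviceCompletes() {
        latestRecordDone = true;
    }

    /** Like cudaEventQuery(): SUCCESS if never recorded or if the latest record completed. */
    Status query() {
        if (!recorded || latestRecordDone) {
            return Status.SUCCESS;
        }
        return Status.NOT_READY;
    }
}
```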
public static int cudaEventSynchronize(cudaEvent_t event)
cudaError_t cudaEventSynchronize(cudaEvent_t event)
Waits until the completion of all device work preceding the most recent call to cudaEventRecord() (in the appropriate compute streams, as specified by the arguments to cudaEventRecord()).
If cudaEventRecord() has not been called on event, cudaSuccess is returned immediately.
Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling CPU thread to block until the event has been completed by the device. If the cudaEventBlockingSync flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.
See Also:
cudaEventCreate(jcuda.runtime.cudaEvent_t),
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int),
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t),
cudaEventQuery(jcuda.runtime.cudaEvent_t),
cudaEventDestroy(jcuda.runtime.cudaEvent_t),
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
public static int cudaEventDestroy(cudaEvent_t event)
cudaError_t cudaEventDestroy(cudaEvent_t event)
Destroys the event specified by event.
If event has been recorded but has not yet been completed when cudaEventDestroy() is called, the function will return immediately and the resources associated with event will be released automatically once the device has completed event.
See Also: cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t), cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
public static int cudaEventElapsedTime(float[] ms, cudaEvent_t start, cudaEvent_t end)
cudaError_t cudaEventElapsedTime(float *ms, cudaEvent_t start, cudaEvent_t end)
Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
If cudaEventRecord() has not been called on either event, then cudaErrorInvalidResourceHandle is returned. If cudaEventRecord() has been called on both events but one or both of them has not yet been completed (that is, cudaEventQuery() would return cudaErrorNotReady on at least one of the events), cudaErrorNotReady is returned. If either event was created with the cudaEventDisableTiming flag, then this function will return cudaErrorInvalidResourceHandle.
See Also: cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)
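In JCuda, cudaEventElapsedTime writes its result (in milliseconds) into `ms[0]`. A common follow-up step converts that value into an effective bandwidth for the timed transfer. The sketch below is pure arithmetic. It assumes 1 GB = 10^9 bytes and uses a hypothetical 256 MiB copy timed at 50 ms. Because the timer resolution is only about 0.5 microseconds, very short intervals give imprecise figures. Timing larger transfers, or several repetitions, tends to be more reliable.

```java
public class ElapsedTimeMath {

    /** Effective bandwidth in GB/s (1 GB = 1e9 bytes) for bytesMoved transferred in elapsedMs milliseconds. */
    static double bandwidthGBps(long bytesMoved, float elapsedMs) {
        if (elapsedMs <= 0f) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        double seconds = elapsedMs / 1000.0;   // the events report milliseconds
        return bytesMoved / seconds / 1e9;
    }

    public static void main(String[] args) {
        float[] ms = {50.0f};            // stands in for the array filled by cudaEventElapsedTime
        long bytes = 256L * 1024 * 1024; // hypothetical 256 MiB transfer
        System.out.printf("%.2f GB/s%n", bandwidthGBps(bytes, ms[0]));
    }
}
```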
public static int cudaDeviceReset()
cudaError_t cudaDeviceReset(void)
Explicitly destroys and cleans up all resources associated with the current device in the current process. Any subsequent API call to this device will reinitialize the device.
Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
See Also: cudaDeviceSynchronize()
public static int cudaDeviceSynchronize()
cudaError_t cudaDeviceSynchronize(void)
Blocks until the device has completed all preceding requested tasks. cudaDeviceSynchronize() returns an error if one of the preceding tasks has failed. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the device has finished its work.
See Also: cudaDeviceSynchronize()
public static int cudaDeviceSetLimit(int limit, long value)
cudaError_t cudaDeviceSetLimit(enum cudaLimit limit, size_t value)
Setting limit to value is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet hardware requirements (for example, by clamping it to minimum or maximum values, or by rounding it up to the nearest element size). The application can use cudaDeviceGetLimit() to find out exactly what the limit has been set to.
Each cudaLimit value has its own specific restrictions on what may be set.
See Also: cudaDeviceGetLimit(long[], int)
public static int cudaDeviceGetLimit(long[] pValue, int limit)
cudaError_t cudaDeviceGetLimit(size_t *pValue, enum cudaLimit limit)
Returns in *pValue the current size of limit.
The supported values of limit are the constants defined in cudaLimit.
See Also: cudaDeviceSetLimit(int, long)
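The "request, then read back" pattern described for cudaDeviceSetLimit and cudaDeviceGetLimit can be sketched as follows. The setter and getter are passed in as functional interfaces so the logic runs without a GPU, and a status of 0 is treated as success (cudaSuccess). With JCuda, you might pass `v -> JCuda.cudaDeviceSetLimit(cudaLimit.cudaLimitMallocHeapSize, v)` and a getter that calls `cudaDeviceGetLimit` with a `long[1]` array and returns element 0. The rounding "driver" in `main` is made up for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.LongToIntFunction;

public class LimitRequest {

    /**
     * Requests a limit value through setLimit (which returns a status code,
     * 0 meaning success), then reads back and returns the value actually applied.
     */
    static long requestLimit(LongToIntFunction setLimit, LongSupplier getLimit, long requested) {
        int status = setLimit.applyAsInt(requested);
        if (status != 0) {
            throw new IllegalStateException("setting the limit failed with status " + status);
        }
        long actual = getLimit.getAsLong();
        if (actual != requested) {
            // The driver may clamp or round the request; report the difference.
            System.out.println("Requested " + requested + ", driver applied " + actual);
        }
        return actual;
    }

    public static void main(String[] args) {
        long[] stored = {0};
        // Hypothetical driver that rounds every request up to a multiple of 16.
        LongToIntFunction fakeSet = v -> { stored[0] = (v + 15) / 16 * 16; return 0; };
        LongSupplier fakeGet = () -> stored[0];
        System.out.println("applied = " + requestLimit(fakeSet, fakeGet, 1000));
    }
}
```

Always trusting the read-back value, rather than the requested one, keeps the application correct regardless of how the driver adjusts the request.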
public static int cudaDeviceGetCacheConfig(int[] pCacheConfig)
cudaError_t cudaDeviceGetCacheConfig(enum cudaFuncCache *pCacheConfig)
On devices where the L1 cache and shared memory use the same hardware resources, this returns through pCacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
This will return a pCacheConfig of cudaFuncCachePreferNone on devices where the sizes of the L1 cache and shared memory are fixed.
The supported cache configurations are the constants defined in cudaFuncCache.
See Also: cudaDeviceSetCacheConfig(int)
public static int cudaDeviceSetCacheConfig(int cacheConfig)
cudaError_t cudaDeviceSetCacheConfig(enum cudaFuncCache cacheConfig)
On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cudaFuncSetCacheConfig() (C or C++ API) will be preferred over this device-wide setting. Setting the device-wide cache configuration to cudaFuncCachePreferNone will cause subsequent kernel launches to prefer not to change the cache configuration unless required to launch the kernel.
This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are the constants defined in cudaFuncCache.
See Also: cudaDeviceGetCacheConfig(int[])
public static int cudaDeviceGetByPCIBusId(int[] device, java.lang.String pciBusId)
cudaError_t cudaDeviceGetByPCIBusId(int *device, char *pciBusId)
Returns in *device a device ordinal given a PCI bus ID string.
See Also: cudaDeviceGetPCIBusId(java.lang.String[], int, int)
public static int cudaDeviceGetPCIBusId(java.lang.String[] pciBusId, int len, int device)
cudaError_t cudaDeviceGetPCIBusId(char *pciBusId, int len, int device)
Returns an ASCII string identifying the device with ordinal device in the NULL-terminated string pointed to by pciBusId. len specifies the maximum length of the string that may be returned.
See Also: cudaDeviceGetByPCIBusId(int[], java.lang.String)
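Once cudaDeviceGetPCIBusId has filled a `String[1]`, an application may want the individual PCI fields, for example to match a GPU against system tools. The sketch below assumes the string uses the `[domain]:[bus]:[device].[function]` layout with hexadecimal fields (for example `0000:01:00.0`). It does not handle other layouts. The class and method names are hypothetical.

```java
public class PciBusId {

    /** Splits "domain:bus:device.function" (hex fields) into {domain, bus, device, function}. */
    static int[] parse(String id) {
        // trim() also strips any trailing '\0' characters left over from a C buffer.
        String[] colonParts = id.trim().split(":");
        if (colonParts.length != 3) {
            throw new IllegalArgumentException("expected domain:bus:device.function, got " + id);
        }
        String[] devFn = colonParts[2].split("\\.");
        if (devFn.length != 2) {
            throw new IllegalArgumentException("missing .function in " + id);
        }
        return new int[] {
            Integer.parseInt(colonParts[0], 16), // domain
            Integer.parseInt(colonParts[1], 16), // bus
            Integer.parseInt(devFn[0], 16),      // device
            Integer.parseInt(devFn[1], 16)       // function
        };
    }

    public static void main(String[] args) {
        int[] p = parse("0000:0a:1f.3");
        System.out.printf("domain=%d bus=%d device=%d function=%d%n", p[0], p[1], p[2], p[3]);
    }
}
```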
public static int cudaIpcGetEventHandle(cudaIpcEventHandle handle, cudaEvent_t event)
cudaError_t cudaIpcGetEventHandle(cudaIpcEventHandle_t *handle, cudaEvent_t event)
Takes as input a previously allocated event and returns in handle an opaque interprocess handle for it. The event must have been created with the cudaEventInterprocess and cudaEventDisableTiming flags set. This opaque handle may be copied into other processes and opened with cudaIpcOpenEventHandle to allow efficient hardware synchronization between GPU work in different processes.
After the event has been opened in the importing process, cudaEventRecord, cudaEventSynchronize, cudaStreamWaitEvent and cudaEventQuery may be used in either process. Performing operations on the imported event after the exported event has been freed with cudaEventDestroy will result in undefined behavior.
See Also: cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int), cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle), cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer), cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int), cudaIpcCloseMemHandle(jcuda.Pointer)
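An event can be exported only if it was created with both the cudaEventInterprocess and cudaEventDisableTiming flags. The sketch below shows how to combine and check these bits before calling cudaEventCreateWithFlags and cudaIpcGetEventHandle. The constant values mirror the CUDA runtime headers (cudaEventDisableTiming = 0x02, cudaEventInterprocess = 0x04). In JCuda code, use `JCuda.cudaEventDisableTiming` and `JCuda.cudaEventInterprocess` instead of these local copies.

```java
public class IpcEventFlags {
    static final int EVENT_DISABLE_TIMING = 0x02; // mirrors cudaEventDisableTiming
    static final int EVENT_INTERPROCESS   = 0x04; // mirrors cudaEventInterprocess

    /** Flags to pass to cudaEventCreateWithFlags for an event that will be exported. */
    static int ipcEventFlags() {
        return EVENT_INTERPROCESS | EVENT_DISABLE_TIMING;
    }

    /** True if flags contain both bits required for interprocess use (other bits are not checked). */
    static boolean isExportable(int flags) {
        int required = EVENT_INTERPROCESS | EVENT_DISABLE_TIMING;
        return (flags & required) == required;
    }

    public static void main(String[] args) {
        System.out.println(ipcEventFlags());    // 6
        System.out.println(isExportable(0x04)); // false: timing is still enabled
        System.out.println(isExportable(0x06)); // true
    }
}
```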
public static int cudaIpcOpenEventHandle(cudaEvent_t event, cudaIpcEventHandle handle)
cudaError_t cudaIpcOpenEventHandle(cudaEvent_t *event, cudaIpcEventHandle_t handle)
Opens an interprocess event handle exported from another process with cudaIpcGetEventHandle. This function returns a cudaEvent_t that behaves like a locally created event with the cudaEventDisableTiming flag specified. This event must be freed with cudaEventDestroy.
Performing operations on the imported event after the exported event has been freed with cudaEventDestroy will result in undefined behavior.
See Also: cudaEventCreate(jcuda.runtime.cudaEvent_t), cudaEventDestroy(jcuda.runtime.cudaEvent_t), cudaEventSynchronize(jcuda.runtime.cudaEvent_t), cudaEventQuery(jcuda.runtime.cudaEvent_t), cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int), cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t), cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer), cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int), cudaIpcCloseMemHandle(jcuda.Pointer)
public static int cudaIpcGetMemHandle(cudaIpcMemHandle handle, Pointer devPtr)
cudaError_t cudaIpcGetMemHandle(cudaIpcMemHandle_t *handle, void *devPtr)
Gets an interprocess memory handle for an existing device memory allocation.
Takes a pointer to the base of an existing device memory allocation created with cudaMalloc and exports it for use in another process. This is a lightweight operation and may be called multiple times on an allocation without adverse effects.
If a region of memory is freed with cudaFree and a subsequent call to cudaMalloc returns memory with the same device address, cudaIpcGetMemHandle will return a unique handle for the new memory.
See Also: cudaMalloc(jcuda.Pointer, long), cudaFree(jcuda.Pointer), cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t), cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle), cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int), cudaIpcCloseMemHandle(jcuda.Pointer)
public static int cudaIpcOpenMemHandle(Pointer devPtr, cudaIpcMemHandle handle, int flags)
cudaError_t cudaIpcOpenMemHandle(void **devPtr, cudaIpcMemHandle_t handle)
Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process.
Maps memory exported from another process with cudaIpcGetMemHandle into the current device address space. For contexts on different devices cudaIpcOpenMemHandle will attempt to enable peer access between the devices as if the user called cudaDeviceEnablePeerAccess. Calling cudaDeviceCanAccessPeer can determine if this mapping is possible.
Calling cudaFree on an exported memory region before calling cudaIpcCloseMemHandle in the importing context will result in undefined behavior.
Memory returned from cudaIpcOpenMemHandle must be freed with cudaIpcCloseMemHandle.
See Also: cudaMalloc(jcuda.Pointer, long), cudaFree(jcuda.Pointer), cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t), cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle), cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer), cudaIpcCloseMemHandle(jcuda.Pointer), cudaDeviceEnablePeerAccess(int, int), cudaDeviceCanAccessPeer(int[], int, int)
public static int cudaIpcCloseMemHandle(Pointer devPtr)
cudaError_t cudaIpcCloseMemHandle(void *devPtr)
Closes memory mapped with cudaIpcOpenMemHandle.
Unmaps memory returned by cudaIpcOpenMemHandle. The original allocation in the exporting process, as well as imported mappings in other processes, will be unaffected.
Any resources used to enable peer access will be freed if this is the last mapping using them.
See Also: cudaMalloc(jcuda.Pointer, long), cudaFree(jcuda.Pointer), cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t), cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle), cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer), cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int)
public static int cudaThreadExit()
cudaError_t cudaThreadExit(void)
Explicitly destroys and cleans up all resources associated with the current device in the current process. Any subsequent API call to this device will reinitialize the device.
Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
See Also: cudaDeviceReset()
public static int cudaThreadSynchronize()
cudaError_t cudaThreadSynchronize(void)
Blocks until the device has completed all preceding requested tasks. cudaThreadSynchronize() returns an error if one of the preceding tasks has failed. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the device has finished its work.
See Also: cudaDeviceSynchronize()
public static int cudaThreadSetLimit(int limit, long value)
cudaError_t cudaThreadSetLimit(enum cudaLimit limit, size_t value)
Setting limit to value is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet hardware requirements (for example, by clamping it to minimum or maximum values, or by rounding it up to the nearest element size). The application can use cudaThreadGetLimit() to find out exactly what the limit has been set to.
Each cudaLimit value has its own specific restrictions on what may be set.
See Also: cudaDeviceSetLimit(int, long)
public static int cudaThreadGetCacheConfig(int[] pCacheConfig)
cudaError_t cudaThreadGetCacheConfig(enum cudaFuncCache *pCacheConfig)
On devices where the L1 cache and shared memory use the same hardware resources, this returns through pCacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
This will return a pCacheConfig of cudaFuncCachePreferNone on devices where the sizes of the L1 cache and shared memory are fixed.
The supported cache configurations are the constants defined in cudaFuncCache.
See Also: cudaDeviceGetCacheConfig(int[])
public static int cudaThreadSetCacheConfig(int cacheConfig)
cudaError_t cudaThreadSetCacheConfig(enum cudaFuncCache cacheConfig)
On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cudaFuncSetCacheConfig() (C or C++ API) will be preferred over this device-wide setting. Setting the device-wide cache configuration to cudaFuncCachePreferNone will cause subsequent kernel launches to prefer not to change the cache configuration unless required to launch the kernel.
This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are the constants defined in cudaFuncCache.
See Also: cudaDeviceSetCacheConfig(int)
public static int cudaThreadGetLimit(long[] pValue, int limit)
cudaError_t cudaThreadGetLimit(size_t *pValue, enum cudaLimit limit)
Returns in *pValue the current size of limit.
The supported values of limit are the constants defined in cudaLimit.
See Also: cudaDeviceGetLimit(long[], int)
public static int cudaGetSymbolAddress(Pointer devPtr, java.lang.String symbol)
cudaError_t cudaGetSymbolAddress(void **devPtr, const T &symbol)
Returns in *devPtr the address of symbol symbol on the device. symbol can either be a variable that resides in global or constant memory space, or it can be a character string naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in the global or constant memory space, *devPtr is unchanged and the error cudaErrorInvalidSymbol is returned. If there are multiple global or constant variables with the same string name (from separate files) and the lookup is done via character string, cudaErrorDuplicateVariableName is returned.
See Also: cudaGetSymbolAddress(jcuda.Pointer, java.lang.String)
public static int cudaGetSymbolSize(long[] size, java.lang.String symbol)
cudaError_t cudaGetSymbolSize(size_t *size, const T &symbol)
Returns in *size the size of symbol symbol. symbol can either be a variable that resides in global or constant memory space, or it can be a character string naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in global or constant memory space, *size is unchanged and the error cudaErrorInvalidSymbol is returned. If there are multiple global variables with the same string name (from separate files) and the lookup is done via character string, cudaErrorDuplicateVariableName is returned.
See Also: cudaGetSymbolAddress(jcuda.Pointer, java.lang.String)
public static int cudaBindTexture(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long size)
cudaError_t cudaBindTexture(size_t *offset, const struct texture<T, dim, readMode> &tex, const void *devPtr, const struct cudaChannelFormatDesc &desc, size_t size = UINT_MAX)
Binds size bytes of the memory area pointed to by devPtr to texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. The offset parameter is an optional byte offset as with the low-level cudaBindTexture() function. Any memory previously bound to tex is unbound.
See Also: cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaBindTexture2D(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long width, long height, long pitch)
cudaError_t cudaBindTexture2D(size_t *offset, const struct texture<T, dim, readMode> &tex, const void *devPtr, const struct cudaChannelFormatDesc &desc, size_t width, size_t height, size_t pitch)
Binds the 2D memory area pointed to by devPtr to the texture reference tex. The size of the area is constrained by width in texel units, height in texel units, and pitch in byte units. desc describes how the memory is interpreted when fetching values from the texture. Any memory previously bound to tex is unbound.
Since the hardware enforces an alignment requirement on texture base addresses, cudaBindTexture2D() returns in *offset a byte offset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so that it can be applied to the tex2D() function. If the device memory pointer was returned from cudaMalloc(), the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.
See Also: cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
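The host-side arithmetic around cudaBindTexture2D can be sketched as follows. It covers three steps: converting the returned byte offset into the texel offset that kernels expect, checking that a row of width texels fits within pitch bytes, and locating texel (x, y) in pitched memory. Two assumptions apply. First, devPtr is aligned to the texel size, so the byte offset is a whole number of texels. Second, the example uses a 16-byte texel (such as float4). The helper names are hypothetical.

```java
public class Texture2DOffsets {

    /** Byte offset returned by cudaBindTexture2D -> texel offset to pass to the kernel. */
    static long texelOffset(long byteOffset, int texelSizeBytes) {
        if (byteOffset % texelSizeBytes != 0) {
            throw new IllegalArgumentException("offset is not a whole number of texels");
        }
        return byteOffset / texelSizeBytes;
    }

    /** width is in texels and pitch in bytes: one row must fit within one pitch. */
    static boolean pitchFits(long widthTexels, long pitchBytes, int texelSizeBytes) {
        return widthTexels * texelSizeBytes <= pitchBytes;
    }

    /** Byte position of texel (x, y) relative to devPtr in pitched memory. */
    static long byteIndex(long x, long y, long pitchBytes, int texelSizeBytes) {
        return y * pitchBytes + x * texelSizeBytes;
    }

    public static void main(String[] args) {
        int texel = 16;                                   // assumed float4 texels
        System.out.println(texelOffset(64, texel));       // offset in texels
        System.out.println(pitchFits(100, 1600, texel));  // a 100-texel row in a 1600-byte pitch
        System.out.println(byteIndex(3, 2, 1600, texel)); // byte position of texel (3, 2)
    }
}
```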
public static int cudaBindTextureToArray(textureReference texref, cudaArray array, cudaChannelFormatDesc desc)
cudaError_t cudaBindTextureToArray(const struct texture<T, dim, readMode> &tex, const struct cudaArray *array, const struct cudaChannelFormatDesc &desc)
Binds the CUDA array array to the texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. Any CUDA array previously bound to tex is unbound.
See Also: cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaUnbindTexture(textureReference texref)
cudaError_t cudaUnbindTexture(const struct texture<T, dim, readMode> &tex)
Unbinds the texture bound to tex.
See Also: cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaGetTextureAlignmentOffset(long[] offset, textureReference texref)
cudaError_t cudaGetTextureAlignmentOffset(size_t *offset, const struct texture<T, dim, readMode> &tex)
Returns in *offset the offset that was returned when texture reference tex was bound.
See Also: cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaGetTextureReference(textureReference texref, java.lang.String symbol)
cudaError_t cudaGetTextureReference(const struct textureReference **texref, const char *symbol)
Returns in *texref the structure associated with the texture reference defined by symbol symbol.
See Also: cudaCreateChannelDesc(int, int, int, int, int), cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray), cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference), cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long), cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long), cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc), cudaUnbindTexture(jcuda.runtime.textureReference)
public static int cudaBindSurfaceToArray(surfaceReference surfref, cudaArray array, cudaChannelFormatDesc desc)
cudaError_t cudaBindSurfaceToArray(const struct surface<T, dim> &surf, const struct cudaArray *array, const struct cudaChannelFormatDesc &desc)
Binds the CUDA array array to the surface reference surf. desc describes how the memory is interpreted when dealing with the surface. Any CUDA array previously bound to surf is unbound.
See Also: cudaBindSurfaceToArray(jcuda.runtime.surfaceReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
public static int cudaGetSurfaceReference(surfaceReference surfref, java.lang.String symbol)
cudaError_t cudaGetSurfaceReference(const struct surfaceReference **surfref, const char *symbol)
Returns in *surfref the structure associated with the surface reference defined by symbol symbol.
See Also: cudaBindSurfaceToArray(jcuda.runtime.surfaceReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
public static int cudaConfigureCall(dim3 gridDim, dim3 blockDim, long sharedMem, cudaStream_t stream)
cudaError_t cudaConfigureCall(dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, cudaStream_t stream = 0)
Specifies the grid and block dimensions for the device call to be executed, similar to the execution configuration syntax. cudaConfigureCall() is stack based: each call pushes data on top of an execution stack. This data contains the dimensions for the grid and thread blocks, together with any arguments for the call.
See Also: cudaDeviceSetCacheConfig(int), cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String), cudaLaunch(java.lang.String), cudaSetupArgument(jcuda.Pointer, long, long)
public static int cudaSetupArgument(Pointer arg, long size, long offset)
cudaError_t cudaSetupArgument(T arg, size_t offset)
Pushes size bytes of the argument pointed to by arg at offset bytes from the start of the parameter passing area, which starts at offset 0. The arguments are stored at the top of the execution stack. cudaSetupArgument() must be preceded by a call to cudaConfigureCall().
See Also: cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t), cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String), cudaLaunch(java.lang.String), cudaSetupArgument(jcuda.Pointer, long, long)
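Each cudaSetupArgument call needs the byte offset of its argument within the parameter passing area. The sketch below computes those offsets. It assumes each argument is placed at the next offset that is a multiple of its own size (natural alignment, valid for power-of-two primitive sizes such as 4 and 8). Structs or other alignment rules are not covered. In JCuda, the resulting offsets would presumably be passed as the third argument, for example `JCuda.cudaSetupArgument(Pointer.to(new int[]{n}), Sizeof.INT, off[0])`. The class and method names are hypothetical.

```java
public class ArgumentOffsets {

    /** Rounds offset up to the next multiple of alignment (alignment must be a power of two). */
    static long align(long offset, long alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    /**
     * Given argument sizes (alignment assumed equal to size), returns the
     * offset of each argument; the extra last element is the total size.
     */
    static long[] offsets(long... sizes) {
        long[] result = new long[sizes.length + 1];
        long offset = 0;
        for (int i = 0; i < sizes.length; i++) {
            offset = align(offset, sizes[i]); // pad up to the argument's alignment
            result[i] = offset;
            offset += sizes[i];
        }
        result[sizes.length] = offset;
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical kernel (int n, float* data, float scale) on a 64-bit host: sizes 4, 8, 4.
        System.out.println(java.util.Arrays.toString(offsets(4, 8, 4)));
    }
}
```

The padding step matters for the pointer argument, which moves from offset 4 to offset 8.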
public static int cudaFuncGetAttributes(cudaFuncAttributes attr, java.lang.String func)
cudaError_t cudaFuncGetAttributes(struct cudaFuncAttributes *attr, T *entry)
This function obtains the attributes of a function specified via entry. The parameter entry can either be a pointer to a function that executes on the device, or it can be a character string specifying the fully-decorated (C++) name of a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.
Note that some function attributes, such as maxThreadsPerBlock, may vary based on the device that is currently being used.
See Also: cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t), cudaDeviceSetCacheConfig(int), cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String), cudaLaunch(java.lang.String), cudaSetupArgument(jcuda.Pointer, long, long)
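Because maxThreadsPerBlock can differ per device, a common use of cudaFuncGetAttributes is to cap the block size before configuring a launch. The sketch below derives a 1D grid and block size from that limit. It assumes the limit has already been read, for instance from the maxThreadsPerBlock field of a cudaFuncAttributes filled by cudaFuncGetAttributes. The preferred block size is an arbitrary application choice.

```java
public class LaunchConfig {

    /** Returns {gridDimX, blockDimX} so that gridDimX * blockDimX >= n. */
    static int[] configFor(int n, int preferredBlock, int maxThreadsPerBlock) {
        int block = Math.min(preferredBlock, maxThreadsPerBlock); // respect the kernel's limit
        if (block <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        int grid = (int) (((long) n + block - 1) / block);       // ceiling division, overflow-safe
        return new int[] {grid, block};
    }

    public static void main(String[] args) {
        int[] cfg = configFor(1000, 512, 384); // e.g. a kernel limited to 384 threads per block
        System.out.println("grid=" + cfg[0] + " block=" + cfg[1]);
    }
}
```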
public static int cudaLaunch(java.lang.String symbol)
cudaError_t cudaLaunch(T *entry)
Launches the function entry on the device. The parameter entry can either be a function that executes on the device, or it can be a character string naming a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. cudaLaunch() must be preceded by a call to cudaConfigureCall() since it pops the data that was pushed by cudaConfigureCall() from the execution stack.
See Also: cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t), cudaDeviceSetCacheConfig(int), cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String), cudaLaunch(java.lang.String), cudaSetupArgument(jcuda.Pointer, long, long), cudaThreadGetCacheConfig(int[]), cudaThreadSetCacheConfig(int)
public static int cudaGLSetGLDevice(int device)
cudaError_t cudaGLSetGLDevice(int device)
Records the calling thread's current OpenGL context as the OpenGL context to use for OpenGL interoperability with the CUDA device device, and sets device as the current device for the calling host thread.
If device has already been initialized, then this call will fail with the error cudaErrorSetOnActiveProcess. In this case it is necessary to reset device using cudaDeviceReset() before OpenGL interoperability on device may be enabled.
See Also: cudaGLRegisterBufferObject(int), cudaGLMapBufferObject(jcuda.Pointer, int), cudaGLUnmapBufferObject(int), cudaGLUnregisterBufferObject(int), cudaGLMapBufferObjectAsync(jcuda.Pointer, int, jcuda.runtime.cudaStream_t), cudaGLUnmapBufferObjectAsync(int, jcuda.runtime.cudaStream_t), cudaDeviceReset()
public static int cudaGLGetDevices(int[] pCudaDeviceCount, int[] pCudaDevices, int cudaDeviceCount, int cudaGLDeviceList_deviceList)
cudaError_t cudaGLGetDevices(unsigned int *pCudaDeviceCount, int *pCudaDevices, unsigned int cudaDeviceCount, enum cudaGLDeviceList deviceList)
Returns in *pCudaDeviceCount the number of CUDA-compatible devices corresponding to the current OpenGL context. Also returns in *pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the current OpenGL context.
If any of the GPUs being used by the current OpenGL context are not CUDA capable, then the call will return cudaErrorNoDevice.
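Because *pCudaDeviceCount reports the total number of matching devices while *pCudaDevices receives at most cudaDeviceCount entries, a caller has to clamp before reading the device array. A minimal sketch, assuming the two int[] arrays were filled by JCuda.cudaGLGetDevices; the class GLDeviceList and its method are hypothetical helpers.

```java
import java.util.Arrays;

// Hypothetical helper: given the output arrays filled by
// JCuda.cudaGLGetDevices(pCudaDeviceCount, pCudaDevices, cudaDeviceCount, deviceList),
// return only the device ordinals that were actually written.
class GLDeviceList {
    static int[] reportedDevices(int[] pCudaDeviceCount, int[] pCudaDevices, int cudaDeviceCount) {
        // pCudaDeviceCount[0] is the total number of matching devices, but at most
        // cudaDeviceCount entries are written into pCudaDevices.
        int written = Math.min(pCudaDeviceCount[0], cudaDeviceCount);
        // Guard against a caller-supplied array shorter than cudaDeviceCount.
        written = Math.min(written, pCudaDevices.length);
        return Arrays.copyOf(pCudaDevices, written);
    }
}
```

For example, if three devices match but the caller passed cudaDeviceCount = 2, only the first two entries of pCudaDevices are meaningful.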
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource)
,
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
,
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)
,
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaGraphicsGLRegisterImage(cudaGraphicsResource resource, int image, int target, int Flags)
cudaError_t cudaGraphicsGLRegisterImage(struct cudaGraphicsResource **resource, GLuint image, GLenum target, unsigned int flags)
Registers the texture or renderbuffer object specified by image for access by CUDA. A handle to the registered object is returned as resource.
target must match the type of the object, and must be one of GL_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_TEXTURE_CUBE_MAP, GL_TEXTURE_3D, GL_TEXTURE_2D_ARRAY, or GL_RENDERBUFFER.
The register flags flags specify the intended usage, as follows:
The following image formats are supported. For brevity, the list is abbreviated; for example, {GL_R, GL_RG} X {8, 16} expands to the 4 formats {GL_R8, GL_R16, GL_RG8, GL_RG16}:
The following image classes are currently disallowed:
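The abbreviated "{prefixes} X {suffixes}" notation in the supported-format list is a plain cross product: every prefix is concatenated with every suffix. The sketch below (the class GLFormatNotation is a hypothetical illustration, not part of JCuda) expands such a term in the same order as the example in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that expands the "{prefixes} X {suffixes}" notation
// into individual GL internal-format names.
class GLFormatNotation {
    static List<String> expand(List<String> prefixes, List<String> suffixes) {
        List<String> formats = new ArrayList<>();
        // Combine each prefix with every suffix, preserving input order.
        for (String prefix : prefixes) {
            for (String suffix : suffixes) {
                formats.add(prefix + suffix);
            }
        }
        return formats;
    }
}
```

Calling expand(Arrays.asList("GL_R", "GL_RG"), Arrays.asList("8", "16")) yields [GL_R8, GL_R16, GL_RG8, GL_RG16], matching the four formats listed above.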
cudaGLSetGLDevice(int)
,
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
,
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)
public static int cudaGraphicsGLRegisterBuffer(cudaGraphicsResource resource, int buffer, int Flags)
cudaError_t cudaGraphicsGLRegisterBuffer(struct cudaGraphicsResource **resource, GLuint buffer, unsigned int flags)
Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as resource. The register flags flags specify the intended usage, as follows:
cudaGLSetGLDevice(int)
,
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource)
,
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
,
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaGLRegisterBufferObject(int bufObj)
cudaError_t cudaGLRegisterBufferObject(GLuint bufObj)
Registers the buffer object of ID bufObj for access by CUDA. This function must be called before CUDA can map the buffer object. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
cudaGraphicsGLRegisterBuffer(jcuda.runtime.cudaGraphicsResource, int, int)
public static int cudaGLMapBufferObject(Pointer devPtr, int bufObj)
cudaError_t cudaGLMapBufferObject(void **devPtr, GLuint bufObj)
Maps the buffer object of ID bufObj into the address space of CUDA and returns in *devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
All streams in the current thread are synchronized with the current GL context.
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGLUnmapBufferObject(int bufObj)
cudaError_t cudaGLUnmapBufferObject(GLuint bufObj)
Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
All streams in the current thread are synchronized with the current GL context.
cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGLUnregisterBufferObject(int bufObj)
cudaError_t cudaGLUnregisterBufferObject(GLuint bufObj)
Unregisters the buffer object of ID bufObj for access by CUDA and releases any CUDA resources associated with the buffer. Once a buffer is unregistered, it may no longer be mapped by CUDA. The GL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource)
public static int cudaGLSetBufferObjectMapFlags(int bufObj, int flags)
cudaError_t cudaGLSetBufferObjectMapFlags(GLuint bufObj, unsigned int flags)
Sets the map flags for the buffer object of ID bufObj.
Changes to flags will take effect the next time bufObj is mapped. The flags argument may be any of the following:
If bufObj has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If bufObj is presently mapped for access by CUDA, then cudaErrorUnknown is returned.
cudaGraphicsResourceSetMapFlags(jcuda.runtime.cudaGraphicsResource, int)
public static int cudaGLMapBufferObjectAsync(Pointer devPtr, int bufObj, cudaStream_t stream)
cudaError_t cudaGLMapBufferObjectAsync(void **devPtr, GLuint bufObj, cudaStream_t stream)
Maps the buffer object of ID bufObj into the address space of CUDA and returns in *devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
Stream stream is synchronized with the current GL context.
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGLUnmapBufferObjectAsync(int bufObj, cudaStream_t stream)
cudaError_t cudaGLUnmapBufferObjectAsync(GLuint bufObj, cudaStream_t stream)
Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
Stream stream is synchronized with the current GL context.
cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaDriverGetVersion(int[] driverVersion)
cudaError_t cudaDriverGetVersion(int *driverVersion)
Returns in *driverVersion the version number of the installed CUDA driver. If no driver is installed, then 0 is returned as the driver version (via driverVersion). This function automatically returns cudaErrorInvalidValue if the driverVersion argument is NULL.
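The integer written into driverVersion[0] is an encoded version number. A minimal sketch for decoding it, assuming CUDA's usual encoding of 1000 * major + 10 * minor (for example 4000 for CUDA 4.0); the class CudaVersionInfo is a hypothetical helper.

```java
// Hypothetical helper for interpreting the value written by
// JCuda.cudaDriverGetVersion(int[]) or JCuda.cudaRuntimeGetVersion(int[]).
// Assumes the encoding 1000 * major + 10 * minor.
//
// Intended use (requires JCuda and a driver, so not executed here):
//   int[] v = new int[1];
//   JCuda.cudaDriverGetVersion(v);
//   System.out.println(CudaVersionInfo.describe(v[0]));
class CudaVersionInfo {
    static int major(int version) {
        return version / 1000;
    }

    static int minor(int version) {
        return (version % 1000) / 10;
    }

    // A driver version of 0 means that no CUDA driver is installed.
    static String describe(int version) {
        if (version == 0) {
            return "no CUDA driver installed";
        }
        return major(version) + "." + minor(version);
    }
}
```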
cudaRuntimeGetVersion(int[])
public static int cudaRuntimeGetVersion(int[] runtimeVersion)
cudaError_t cudaRuntimeGetVersion(int *runtimeVersion)
Returns in *runtimeVersion the version number of the installed CUDA Runtime. This function automatically returns cudaErrorInvalidValue if the runtimeVersion argument is NULL.
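Comparing the values from cudaDriverGetVersion and cudaRuntimeGetVersion gives an early, readable check before any other runtime call fails. A minimal sketch, assuming the conservative rule that the installed driver must be at least as new as the runtime (the runtime reports cudaErrorInsufficientDriver otherwise); newer CUDA releases relax this rule, so treat the check as an assumption. The class CudaVersionCheck is hypothetical.

```java
// Hypothetical, conservative compatibility check between the values returned by
// JCuda.cudaDriverGetVersion(int[]) and JCuda.cudaRuntimeGetVersion(int[]).
// Assumption: the driver version must be >= the runtime version.
class CudaVersionCheck {
    static boolean driverSupportsRuntime(int driverVersion, int runtimeVersion) {
        // A driver version of 0 means no driver is installed at all.
        return driverVersion != 0 && driverVersion >= runtimeVersion;
    }
}
```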
cudaDriverGetVersion(int[])
public static int cudaPointerGetAttributes(cudaPointerAttributes attributes, Pointer ptr)
cudaError_t cudaPointerGetAttributes(struct cudaPointerAttributes *attributes, void *ptr)
Returns in *attributes the attributes of the pointer ptr.
The cudaPointerAttributes structure is defined as:
struct cudaPointerAttributes { enum cudaMemoryType memoryType; int device; void *devicePointer; void *hostPointer; }
memoryType identifies the physical location of the memory associated with ptr. It can be cudaMemoryTypeHost for host memory or cudaMemoryTypeDevice for device memory.
device is the device against which ptr was allocated. If ptr has memory type cudaMemoryTypeDevice, then this identifies the device on which the memory referred to by ptr physically resides. If ptr has memory type cudaMemoryTypeHost, then this identifies the device which was current when the allocation was made (and if that device is deinitialized, then this allocation will vanish with that device's state).
devicePointer is the device pointer alias through which the memory referred to by ptr may be accessed on the current device. If the memory referred to by ptr cannot be accessed directly by the current device, then this is NULL.
hostPointer is the host pointer alias through which the memory referred to by ptr may be accessed on the host. If the memory referred to by ptr cannot be accessed directly by the host, then this is NULL.
cudaGetDeviceCount(int[])
,
cudaGetDevice(int[])
,
cudaSetDevice(int)
,
cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaDeviceCanAccessPeer(int[] canAccessPeer, int device, int peerDevice)
cudaError_t cudaDeviceCanAccessPeer(int *canAccessPeer, int device, int peerDevice)
Returns in *canAccessPeer a value of 1 if device device is capable of directly accessing memory from peerDevice, and 0 otherwise. If direct access of peerDevice from device is possible, then access may be enabled by calling cudaDeviceEnablePeerAccess().
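Since cudaDeviceCanAccessPeer answers for one ordered pair (device, peerDevice), a multi-GPU setup typically queries every ordered pair and enables access only where the answer is 1. A minimal sketch, assuming the query results have already been collected into a matrix; the class PeerAccessPlan is hypothetical, and the JCuda calls appear only in a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: given cudaDeviceCanAccessPeer results for every ordered
// device pair, list the (currentDevice, peerDevice) pairs for which
// cudaDeviceEnablePeerAccess should be called. Access is one-directional,
// so (d, p) and (p, d) are planned separately.
class PeerAccessPlan {
    // canAccess[d][p] is the value written for (device = d, peerDevice = p): 1 or 0.
    static List<int[]> enableCalls(int[][] canAccess) {
        List<int[]> calls = new ArrayList<>();
        for (int device = 0; device < canAccess.length; device++) {
            for (int peer = 0; peer < canAccess[device].length; peer++) {
                // A device is not its own peer; skip pairs that cannot be enabled.
                if (device != peer && canAccess[device][peer] == 1) {
                    // With JCuda this pair would correspond to:
                    //   JCuda.cudaSetDevice(device);
                    //   JCuda.cudaDeviceEnablePeerAccess(peer, 0);
                    calls.add(new int[] {device, peer});
                }
            }
        }
        return calls;
    }
}
```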
cudaDeviceEnablePeerAccess(int, int)
,
cudaDeviceDisablePeerAccess(int)
public static int cudaDeviceEnablePeerAccess(int peerDevice, int flags)
cudaError_t cudaDeviceEnablePeerAccess(int peerDevice, unsigned int flags)
Enables registering memory on peerDevice for direct access from the current device. On success, allocations on peerDevice may be registered for access from the current device using cudaPeerRegister(). Registering peer memory will be possible until it is explicitly disabled using cudaDeviceDisablePeerAccess(), or until either the current device or peerDevice is reset using cudaDeviceReset().
If both the current device and peerDevice support unified addressing, then all allocations from peerDevice will immediately be accessible by the current device upon success. In this case, explicitly sharing allocations using cudaPeerRegister() is not necessary.
Note that access granted by this call is unidirectional, and that in order to access memory on the current device from peerDevice, a separate symmetric call to cudaDeviceEnablePeerAccess() is required.
Returns cudaErrorInvalidDevice if cudaDeviceCanAccessPeer() indicates that the current device cannot directly access memory from peerDevice.
Returns cudaErrorPeerAccessAlreadyEnabled if direct access of peerDevice from the current device has already been enabled.
Returns cudaErrorInvalidValue if flags is not 0.
cudaDeviceCanAccessPeer(int[], int, int)
,
cudaDeviceDisablePeerAccess(int)
public static int cudaDeviceDisablePeerAccess(int peerDevice)
cudaError_t cudaDeviceDisablePeerAccess(int peerDevice)
Disables registering memory on peerDevice for direct access from the current device. If there are any allocations on peerDevice which were registered in the current device using cudaPeerRegister(), then these allocations will be automatically unregistered.
Returns cudaErrorPeerAccessNotEnabled if direct access to memory on peerDevice has not yet been enabled from the current device.
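The error rules for enabling and disabling peer access, together with the fact that access is unidirectional, can be captured in a small state model. This is a toy sketch under stated assumptions: the class PeerAccessState is hypothetical, the current device is passed explicitly instead of being set with cudaSetDevice, and the order in which the checks are applied is an assumption because the text does not specify it.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, no CUDA calls) of the error rules stated for
// cudaDeviceEnablePeerAccess and cudaDeviceDisablePeerAccess.
class PeerAccessState {
    enum Result { SUCCESS, INVALID_DEVICE, INVALID_VALUE, PEER_ACCESS_ALREADY_ENABLED, PEER_ACCESS_NOT_ENABLED }

    // canAccess[current][peer] holds the cudaDeviceCanAccessPeer result (1 or 0).
    private final int[][] canAccess;
    // Enabled directions, stored as "current->peer"; (0->1) does not imply (1->0).
    private final Set<String> enabled = new HashSet<>();

    PeerAccessState(int[][] canAccess) {
        this.canAccess = canAccess;
    }

    Result enable(int current, int peer, int flags) {
        // The order of these checks is an assumption; the text does not specify it.
        if (flags != 0) {
            return Result.INVALID_VALUE;
        }
        if (canAccess[current][peer] != 1) {
            return Result.INVALID_DEVICE;
        }
        if (!enabled.add(current + "->" + peer)) {
            return Result.PEER_ACCESS_ALREADY_ENABLED;
        }
        return Result.SUCCESS;
    }

    Result disable(int current, int peer) {
        if (!enabled.remove(current + "->" + peer)) {
            return Result.PEER_ACCESS_NOT_ENABLED;
        }
        return Result.SUCCESS;
    }

    boolean isEnabled(int current, int peer) {
        return enabled.contains(current + "->" + peer);
    }
}
```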
cudaDeviceCanAccessPeer(int[], int, int)
,
cudaDeviceEnablePeerAccess(int, int)
public static int cudaGraphicsUnregisterResource(cudaGraphicsResource resource)
cudaError_t cudaGraphicsUnregisterResource(cudaGraphicsResource_t resource)
Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.
If resource is invalid, then cudaErrorInvalidResourceHandle is returned.
cudaGraphicsGLRegisterBuffer(jcuda.runtime.cudaGraphicsResource, int, int)
,
cudaGraphicsGLRegisterImage(jcuda.runtime.cudaGraphicsResource, int, int, int)
public static int cudaGraphicsResourceSetMapFlags(cudaGraphicsResource resource, int flags)
cudaError_t cudaGraphicsResourceSetMapFlags(cudaGraphicsResource_t resource, unsigned int flags)
Sets flags for mapping the graphics resource resource.
Changes to flags will take effect the next time resource is mapped. The flags argument may be any of the following:
cudaGraphicsMapFlagsNone: Specifies no hints about how resource will be used. It is therefore assumed that CUDA may read from or write to resource.
cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to resource.
cudaGraphicsMapFlagsWriteDiscard: Specifies that CUDA will not read from resource and will write over the entire contents of resource, so none of the data previously stored in resource will be preserved.
If resource is presently mapped for access by CUDA, then cudaErrorUnknown is returned. If flags is not one of the above values, then cudaErrorInvalidValue is returned.
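Two details above are easy to miss: new flags apply only at the next mapping, and the call is rejected while the resource is mapped. A toy sketch of those rules, assuming the cudaGraphicsMapFlags values 0 (None), 1 (ReadOnly) and 2 (WriteDiscard); the class MapFlagsModel is hypothetical, and the order of the two error checks is an assumption.

```java
// Toy model (hypothetical class, no CUDA calls) of the rules for
// cudaGraphicsResourceSetMapFlags.
class MapFlagsModel {
    // Assumed values of the cudaGraphicsMapFlags enumeration.
    static final int NONE = 0;          // cudaGraphicsMapFlagsNone
    static final int READ_ONLY = 1;     // cudaGraphicsMapFlagsReadOnly
    static final int WRITE_DISCARD = 2; // cudaGraphicsMapFlagsWriteDiscard

    // Simplified stand-ins for cudaSuccess, cudaErrorUnknown, cudaErrorInvalidValue.
    enum Result { SUCCESS, UNKNOWN, INVALID_VALUE }

    private boolean mapped;
    private int pendingFlags = NONE; // flags that the next map() will use
    private int activeFlags = NONE;  // flags in effect for the current mapping

    Result setMapFlags(int flags) {
        // Check order (mapped first, then flag value) is an assumption.
        if (mapped) {
            return Result.UNKNOWN;
        }
        if (flags != NONE && flags != READ_ONLY && flags != WRITE_DISCARD) {
            return Result.INVALID_VALUE;
        }
        pendingFlags = flags;
        return Result.SUCCESS;
    }

    // The new flags take effect only when the resource is mapped again.
    void map() {
        mapped = true;
        activeFlags = pendingFlags;
    }

    void unmap() {
        mapped = false;
    }

    int activeFlags() {
        return activeFlags;
    }
}
```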
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGraphicsMapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream)
cudaError_t cudaGraphicsMapResources(int count, cudaGraphicsResource_t *resources, cudaStream_t stream = 0)
Maps the count graphics resources in resources for access by CUDA.
The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.
This function provides the synchronization guarantee that any graphics calls issued before cudaGraphicsMapResources() will complete before any subsequent CUDA work issued in stream begins.
If resources contains any duplicate entries, then cudaErrorInvalidResourceHandle is returned. If any of resources are presently mapped for access by CUDA, then cudaErrorUnknown is returned.
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaGraphicsUnmapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream)
cudaError_t cudaGraphicsUnmapResources(int count, cudaGraphicsResource_t *resources, cudaStream_t stream = 0)
Unmaps the count graphics resources in resources.
Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.
This function provides the synchronization guarantee that any CUDA work issued in stream before cudaGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.
If resources contains any duplicate entries, then cudaErrorInvalidResourceHandle is returned. If any of resources are not presently mapped for access by CUDA, then cudaErrorUnknown is returned.
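The validation rules of cudaGraphicsMapResources and cudaGraphicsUnmapResources mirror each other: duplicates are rejected, mapping requires every resource to be unmapped, and unmapping requires every resource to be mapped. A toy sketch of both, assuming resources can be identified by name and that a failing call changes nothing (the text does not say whether partial mapping can occur); the class GraphicsMapModel is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical; resources identified by name) of the validation
// rules for cudaGraphicsMapResources and cudaGraphicsUnmapResources.
class GraphicsMapModel {
    // Simplified stand-ins for cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown.
    enum Result { SUCCESS, INVALID_RESOURCE_HANDLE, UNKNOWN }

    private final Set<String> mapped = new HashSet<>();

    private static boolean hasDuplicates(List<String> resources) {
        return new HashSet<>(resources).size() != resources.size();
    }

    Result map(List<String> resources) {
        if (hasDuplicates(resources)) {
            return Result.INVALID_RESOURCE_HANDLE;
        }
        // Every resource must currently be unmapped.
        for (String r : resources) {
            if (mapped.contains(r)) {
                return Result.UNKNOWN;
            }
        }
        mapped.addAll(resources);
        return Result.SUCCESS;
    }

    Result unmap(List<String> resources) {
        if (hasDuplicates(resources)) {
            return Result.INVALID_RESOURCE_HANDLE;
        }
        // Every resource must currently be mapped.
        for (String r : resources) {
            if (!mapped.contains(r)) {
                return Result.UNKNOWN;
            }
        }
        mapped.removeAll(resources);
        return Result.SUCCESS;
    }
}
```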
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGraphicsResourceGetMappedPointer(Pointer devPtr, long[] size, cudaGraphicsResource resource)
cudaError_t cudaGraphicsResourceGetMappedPointer(void **devPtr, size_t *size, cudaGraphicsResource_t resource)
Returns in *devPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in *size the size of the memory in bytes which may be accessed from that pointer. The value set in devPtr may change every time that resource is mapped.
If resource is not a buffer, then it cannot be accessed via a pointer and cudaErrorUnknown is returned. If resource is not mapped, then cudaErrorUnknown is returned.
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
,
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)
public static int cudaGraphicsSubResourceGetMappedArray(cudaArray arrayPtr, cudaGraphicsResource resource, int arrayIndex, int mipLevel)
cudaError_t cudaGraphicsSubResourceGetMappedArray(struct cudaArray **array, cudaGraphicsResource_t resource, unsigned int arrayIndex, unsigned int mipLevel)
Returns in *array an array through which the subresource of the mapped graphics resource resource, which corresponds to array index arrayIndex and mipmap level mipLevel, may be accessed. The value set in array may change every time that resource is mapped.
If resource is not a texture, then it cannot be accessed via an array and cudaErrorUnknown is returned. If arrayIndex is not a valid array index for resource, then cudaErrorInvalidValue is returned. If mipLevel is not a valid mipmap level for resource, then cudaErrorInvalidValue is returned. If resource is not mapped, then cudaErrorUnknown is returned.
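Which mipLevel values count as valid depends on how many levels the OpenGL texture actually has. As an illustration, a complete OpenGL mipmap chain for a texture whose largest dimension is n has floor(log2(n)) + 1 levels. The sketch below is hypothetical (the class MipLevels is not part of JCuda) and assumes the texture really has a complete chain allocated; a texture with fewer levels accepts fewer mipLevel values.

```java
// Hypothetical helper: number of levels in a complete OpenGL mipmap chain,
// used here to pre-check mipLevel before calling
// cudaGraphicsSubResourceGetMappedArray. Assumes a complete chain exists.
class MipLevels {
    static int completeChainLevels(int width, int height, int depth) {
        if (width < 1 || height < 1 || depth < 1) {
            throw new IllegalArgumentException("texture dimensions must be positive");
        }
        int largest = Math.max(width, Math.max(height, depth));
        // floor(log2(largest)) + 1 equals the number of significant bits of largest.
        return 32 - Integer.numberOfLeadingZeros(largest);
    }

    // Valid levels are 0 .. completeChainLevels - 1 under the assumption above.
    static boolean isValidMipLevel(int mipLevel, int width, int height, int depth) {
        return mipLevel >= 0 && mipLevel < completeChainLevels(width, height, depth);
    }
}
```

For a 256 x 256 2D texture (depth 1) this gives 9 levels, so mipLevel 0 through 8.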
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaProfilerInitialize(java.lang.String configFile, java.lang.String outputFile, int outputMode)
cudaError_t cudaProfilerInitialize(const char *configFile, const char *outputFile, cudaOutputMode_t outputMode)
Using this API, the user can specify the configuration file, the output file, and the output file format. This API is generally used to profile different sets of counters by looping the kernel launch. The configFile parameter can be used to select profiling options, including profiler counters. Refer to the "Command Line Profiler" section in the "Compute Visual Profiler User Guide" for supported profiler options and counters.
Configurations defined initially by environment variable settings are overwritten by cudaProfilerInitialize().
Limitation: The profiling APIs do not work when the application is running with any profiler tool, such as the Compute Visual Profiler. The user must handle the error cudaErrorProfilerDisabled returned by the profiler APIs if the application is likely to be used with any profiler tool.
Typical usage of the profiling APIs is as follows:
for each set of counters
{
    cudaProfilerInitialize(); // Initialize profiling; set the counters/options in the config file
    ...
    cudaProfilerStart();
    // code to be profiled
    cudaProfilerStop();
    ...
    cudaProfilerStart();
    // code to be profiled
    cudaProfilerStop();
    ...
}
cudaProfilerStart()
,
cudaProfilerStop()
public static int cudaProfilerStart()
cudaError_t cudaProfilerStart(void)
This API is used in conjunction with cudaProfilerStop() to selectively profile subsets of the CUDA program. The profiler must be initialized using cudaProfilerInitialize() before making a call to cudaProfilerStart(). The API returns the error cudaErrorProfilerNotInitialized if it is called without initializing the profiler.
cudaProfilerInitialize(java.lang.String, java.lang.String, int)
,
cudaProfilerStop()
public static int cudaProfilerStop()
cudaError_t cudaProfilerStop(void)
This API is used in conjunction with cudaProfilerStart() to selectively profile subsets of the CUDA program. The profiler must be initialized using cudaProfilerInitialize() before making a call to cudaProfilerStop(). The API returns the error cudaErrorProfilerNotInitialized if it is called without initializing the profiler.
cudaProfilerInitialize(java.lang.String, java.lang.String, int)
,
cudaProfilerStart()
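The ordering rule shared by cudaProfilerStart() and cudaProfilerStop() (both fail until cudaProfilerInitialize() has been called) can be expressed as a tiny state model. A toy sketch only: the class ProfilerModel is hypothetical, it ignores the file and output-mode arguments, and it does not model cudaErrorProfilerDisabled.

```java
// Toy model (hypothetical, no CUDA calls) of the profiler API ordering rule:
// start/stop return cudaErrorProfilerNotInitialized until initialize succeeds.
class ProfilerModel {
    // Simplified stand-ins for cudaSuccess and cudaErrorProfilerNotInitialized.
    enum Result { SUCCESS, PROFILER_NOT_INITIALIZED }

    private boolean initialized;
    private boolean running;

    // Mirrors cudaProfilerInitialize(configFile, outputFile, outputMode);
    // the arguments are accepted but ignored by this model.
    Result initialize(String configFile, String outputFile, int outputMode) {
        initialized = true;
        return Result.SUCCESS;
    }

    Result start() {
        if (!initialized) {
            return Result.PROFILER_NOT_INITIALIZED;
        }
        running = true;
        return Result.SUCCESS;
    }

    Result stop() {
        if (!initialized) {
            return Result.PROFILER_NOT_INITIALIZED;
        }
        running = false;
        return Result.SUCCESS;
    }

    boolean isRunning() {
        return running;
    }
}
```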