Table Comparing Syntax for Different Compute APIs

| Term | CUDA | HIP | HC | C++AMP | OpenCL |
|---|---|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `hc::accelerator` | `concurrency::accelerator` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `hc::accelerator_view` | `concurrency::accelerator_view` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `hc::completion_future` | `concurrency::completion_future` | `cl_event` |
| Memory | `void *` | `void *` | `void *`; `hc::array`; `hc::array_view` | `concurrency::array`; `concurrency::array_view` | `cl_mem` |
| | grid | grid | extent | extent | NDRange |
| | block | block | tile | tile | work-group |
| | thread | thread | thread | thread | work-item |
| | warp | warp | wavefront | N/A | sub-group |
| Thread-index | `threadIdx.x` | `hipThreadIdx_x` | `t_idx.local[0]` | `t_idx.local[0]` | `get_local_id(0)` |
| Block-index | `blockIdx.x` | `hipBlockIdx_x` | `t_idx.tile[0]` | `t_idx.tile[0]` | `get_group_id(0)` |
| Block-dim | `blockDim.x` | `hipBlockDim_x` | `t_ext.tile_dim[0]` | `t_idx.tile_dim0` | `get_local_size(0)` |
| Grid-dim | `gridDim.x` | `hipGridDim_x` | `t_ext[0]` | `t_ext[0]` | `get_global_size(0)` |
| Device Kernel | `__global__` | `__global__` | lambda inside `hc::parallel_for_each` or `[[hc]]` | `restrict(amp)` | `__kernel` |
| Device Function | `__device__` | `__device__` | `[[hc]]` (detected automatically in many cases) | `restrict(amp)` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | `[[cpu]]` (default) | `restrict(cpu)` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | `[[hc]] [[cpu]]` | `restrict(amp,cpu)` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` | `hc::parallel_for_each` | `concurrency::parallel_for_each` | `clEnqueueNDRangeKernel` |
| Global Memory | `__global__` | `__global__` | Unnecessary / Implied | Unnecessary / Implied | `__global` |
| Group Memory | `__shared__` | `__shared__` | `tile_static` | `tile_static` | `__local` |
| Constant | `__constant__` | `__constant__` | Unnecessary / Implied | Unnecessary / Implied | `__constant` |
| | `__syncthreads` | `__syncthreads` | `tile_static.barrier()` | `t_idx.barrier()` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `hc::atomic_fetch_add` | `concurrency::atomic_fetch_add` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `hc::precise_math::cos(f)` | `concurrency::precise_math::cos(f)` | `cos(f)` |
| Fast Math | `__cosf(f)` | `__cosf(f)` | `hc::fast_math::cos(f)` | `concurrency::fast_math::cos(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `hc::short_vector::float4` | `concurrency::graphics::float_4` | `float4` |

Notes

  1. For HC and C++AMP, assume a captured tiled_ext named “t_ext” and a captured extent named “ext”. These languages pass information to the kernel through captured variables rather than special built-in functions, so the exact variable names may vary.

  2. The indexing functions (starting with thread-index) show the terminology for a 1D grid. Some APIs use reverse order of xyz / 012 indexing for 3D grids.

  3. HC allows tile dimensions to be specified at runtime, while C++AMP requires that tile dimensions be specified at compile time. Thus the HC syntax for tile dimensions is t_ext.tile_dim[0], while the C++AMP syntax is t_ext.tile_dim0.
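
To show how the 1D indexing terms in the table fit together, here is a minimal CPU-only sketch in C++. It emulates a 1D launch with two nested loops and combines the block index, block dimension, and thread index into a flat element index, as in the common CUDA idiom `blockIdx.x * blockDim.x + threadIdx.x`. In OpenCL the same value is what `get_global_id(0)` returns when the global work offset is zero. The names `Launch1D`, `global_index`, `blocks_for`, and `vector_add` are hypothetical and chosen only for this illustration; they do not belong to any of the APIs above, and on a real device the loops are replaced by hardware scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D launch configuration (not part of any API).
// num_blocks plays the role of CUDA gridDim.x; block_size plays the role of
// CUDA blockDim.x or OpenCL get_local_size(0).
struct Launch1D {
    std::size_t num_blocks;
    std::size_t block_size;
};

// Flat index of one thread within the whole launch:
// block-index * block-dim + thread-index.
inline std::size_t global_index(std::size_t block_idx,
                                std::size_t block_dim,
                                std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Smallest number of blocks of block_size threads that covers n elements.
inline std::size_t blocks_for(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// CPU emulation of a vector-add kernel: the loop body runs once per
// (block, thread) pair, as a kernel body would on a device.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t block_size) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);
    const Launch1D launch{blocks_for(n, block_size), block_size};

    for (std::size_t block_idx = 0; block_idx < launch.num_blocks; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < launch.block_size; ++thread_idx) {
            const std::size_t i = global_index(block_idx, launch.block_size, thread_idx);
            if (i < n) {  // the last block may extend past the end of the data
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

The bounds check inside the inner loop mirrors the guard that kernels typically need when the element count is not a multiple of the block size.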