Table Comparing Syntax for Different Compute APIs

Term	CUDA	HIP	HC	C++AMP	OpenCL
Device	int deviceId	int deviceId	hc::accelerator	concurrency::accelerator	cl_device_id
Queue	cudaStream_t	hipStream_t	hc::accelerator_view	concurrency::accelerator_view	cl_command_queue
Event	cudaEvent_t	hipEvent_t	hc::completion_future	concurrency::completion_future	cl_event
Memory	void *	void *	void * ; hc::array ; hc::array_view	concurrency::array ; concurrency::array_view	cl_mem
Grid	grid	grid	extent	extent	NDRange
Block	block	block	tile	tile	work-group
Thread	thread	thread	thread	thread	work-item
Warp	warp	warp	wavefront	N/A	sub-group
Thread-index	threadIdx.x	hipThreadIdx_x	t_idx.local[0]	t_idx.local[0]	get_local_id(0)
Block-index	blockIdx.x	hipBlockIdx_x	t_idx.tile[0]	t_idx.tile[0]	get_group_id(0)
Block-dim	blockDim.x	hipBlockDim_x	t_ext.tile_dim[0]	t_ext.tile_dim0	get_local_size(0)
Grid-dim	gridDim.x	hipGridDim_x	t_ext[0]	t_ext[0]	get_global_size(0)
Device Kernel	__global__	__global__	lambda inside hc::parallel_for_each or [[hc]]	restrict(amp)	__kernel
Device Function	__device__	__device__	[[hc]] (detected automatically in many cases)	restrict(amp)	Implied in device compilation
Host Function	__host__ (default)	__host__ (default)	[[cpu]] (default)	restrict(cpu) (default)	Implied in host compilation
Host + Device Function	__host__ __device__	__host__ __device__	[[hc]] [[cpu]]	restrict(amp,cpu)	No equivalent
Kernel Launch	<<< >>>	hipLaunchKernel	hc::parallel_for_each	concurrency::parallel_for_each	clEnqueueNDRangeKernel
Global Memory	__device__	__device__	Unnecessary / Implied	Unnecessary / Implied	__global
Group Memory	__shared__	__shared__	tile_static	tile_static	__local
Constant	__constant__	__constant__	Unnecessary / Implied	Unnecessary / Implied	__constant
Barrier	__syncthreads()	__syncthreads()	t_idx.barrier.wait()	t_idx.barrier.wait()	barrier(CLK_LOCAL_MEM_FENCE)
Atomic Builtins	atomicAdd	atomicAdd	hc::atomic_fetch_add	concurrency::atomic_fetch_add	atomic_add
Precise Math	cos(f)	cos(f)	hc::precise_math::cos(f)	concurrency::precise_math::cos(f)	cos(f)
Fast Math	__cosf(f)	__cosf(f)	hc::fast_math::cos(f)	concurrency::fast_math::cos(f)	native_cos(f)
Vector	float4	float4	hc::short_vector::float4	concurrency::graphics::float_4	float4
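The indexing rows of the table are related by simple arithmetic. The following plain C++ sketch runs on the host and calls no GPU API; the Launch1D type and its helper functions are hypothetical. It models a 1D CUDA/HIP launch and checks that blockIdx.x * blockDim.x + threadIdx.x enumerates the same flat ids that OpenCL reports through get_global_id(0), assuming a zero global offset.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a 1D CUDA/HIP launch (not a real API type).
struct Launch1D {
    int grid_dim;   // gridDim.x / hipGridDim_x: number of blocks
    int block_dim;  // blockDim.x / hipBlockDim_x: threads per block
};

// Total thread count; this is what OpenCL's get_global_size(0) reports.
int global_size(const Launch1D& l) { return l.grid_dim * l.block_dim; }

// blockIdx.x * blockDim.x + threadIdx.x, which matches OpenCL's
// get_group_id(0) * get_local_size(0) + get_local_id(0) (zero global offset).
int global_id(const Launch1D& l, int block_idx, int thread_idx) {
    return block_idx * l.block_dim + thread_idx;
}

// Serially visit every (block, thread) pair and record the flat id each computes.
std::vector<int> enumerate_global_ids(const Launch1D& l) {
    std::vector<int> ids;
    for (int b = 0; b < l.grid_dim; ++b)        // plays the role of blockIdx.x
        for (int t = 0; t < l.block_dim; ++t)   // plays the role of threadIdx.x
            ids.push_back(global_id(l, b, t));
    return ids;
}
```

For example, with 4 blocks of 64 threads, gridDim.x is 4 while the global size is 256.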

Notes

For HC and C++AMP, assume a captured tiled_extent named “t_ext”, a captured extent named “ext”, and a tiled_index kernel parameter named “t_idx”. These languages pass launch information to the kernel through captured variables and lambda parameters rather than through special built-in variables or functions, so the exact variable names may vary.
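The indexing rows (Thread-index through Grid-dim) show the terminology for a 1D grid. For 3D grids, HC and C++AMP use the reverse order: their last index varies fastest, so the CUDA/HIP .x component corresponds to [2] rather than [0]. Note also that gridDim.x and hipGridDim_x give the number of blocks in the grid, whereas t_ext[0] and get_global_size(0) give the total number of threads (work-items); the OpenCL call that matches gridDim.x is get_num_groups(0).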
HC allows tile dimensions to be specified at runtime, while C++AMP requires them to be specified at compile time. Thus the HC syntax for tile dimensions is t_ext.tile_dim[0], while the C++AMP syntax is t_ext.tile_dim0.
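The captured-variable style used by HC and C++AMP can be sketched in plain C++. The TiledExtent1D and TiledIndex1D types and the serial_tiled_for_each function below are hypothetical CPU stand-ins, not the hc:: or concurrency:: API, and the loop runs serially rather than on a device. The kernel lambda reads the tile size from a captured t_ext and its position from a t_idx parameter, where CUDA code would read blockDim.x and threadIdx.x.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for HC's tiled_extent / tiled_index. They model only
// the members the table mentions and are NOT the real hc:: API.
struct TiledExtent1D {
    int global;       // like t_ext[0]: total number of threads
    int tile_dim[1];  // like HC's t_ext.tile_dim[0]: threads per tile
};

struct TiledIndex1D {
    int global[1];  // flat thread index
    int local[1];   // like t_idx.local[0] (CUDA threadIdx.x)
    int tile[1];    // like t_idx.tile[0]  (CUDA blockIdx.x)
};

// Serial CPU model of a tiled parallel_for_each: calls `kernel` once per thread.
// Assumes t_ext.global is a multiple of the tile size.
void serial_tiled_for_each(const TiledExtent1D& t_ext,
                           const std::function<void(TiledIndex1D)>& kernel) {
    const int tile = t_ext.tile_dim[0];
    for (int g = 0; g < t_ext.global; ++g) {
        TiledIndex1D t_idx{{g}, {g % tile}, {g / tile}};
        kernel(t_idx);
    }
}

// Kernel written in the HC/C++AMP style: the tile size comes from the captured
// t_ext and the position from the t_idx parameter, instead of from built-ins
// such as blockDim.x. It flags the last thread of each tile.
std::vector<int> mark_last_in_tile(const TiledExtent1D& t_ext) {
    std::vector<int> out(t_ext.global, 0);
    serial_tiled_for_each(t_ext, [t_ext, &out](TiledIndex1D t_idx) {
        if (t_idx.local[0] == t_ext.tile_dim[0] - 1) out[t_idx.global[0]] = 1;
    });
    return out;
}
```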
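For 3D grids, the difference in index order shows up in the flattening formula. This sketch assumes the row-major convention of C++AMP (and HC), in which the last index varies fastest, while the CUDA/HIP x component varies fastest; the function names are hypothetical. Reversing the component order makes both formulas produce the same linear index.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<int, 3>;

// CUDA/HIP (and OpenCL) convention: component x, stored here at [0], varies
// fastest. id = {x, y, z}, dim = {blockDim.x, blockDim.y, blockDim.z}.
int linear_xyz(const Vec3& id, const Vec3& dim) {
    return (id[2] * dim[1] + id[1]) * dim[0] + id[0];
}

// Row-major convention assumed for HC / C++AMP: the LAST index, [2], varies
// fastest. idx and ext are both ordered {[0], [1], [2]}.
int linear_row_major(const Vec3& idx, const Vec3& ext) {
    return (idx[0] * ext[1] + idx[1]) * ext[2] + idx[2];
}

// Translate between the two conventions by reversing the component order.
Vec3 reverse3(const Vec3& v) { return Vec3{{v[2], v[1], v[0]}}; }
```

With blockDim = (8, 4, 2) and thread (x, y, z) = (3, 1, 1), both forms give linear index 43.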
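The compile-time versus runtime distinction for tile dimensions explains why the two spellings differ: a template argument naturally surfaces as a static member, while runtime data lives in an ordinary array member. Both types below are hypothetical simplifications written in plain C++, not the real tiled_extent classes of either library.

```cpp
#include <cassert>

// Hypothetical C++AMP-style extent: the tile size is a template argument, so it
// is a compile-time constant exposed as a static member (cf. t_ext.tile_dim0).
template <int D0>
struct StaticTiledExtent1D {
    static constexpr int tile_dim0 = D0;
    int global;  // total number of threads
};

// Hypothetical HC-style extent: the tile size is ordinary runtime data, stored
// in an array member (cf. t_ext.tile_dim[0]).
struct DynamicTiledExtent1D {
    int global;       // total number of threads
    int tile_dim[1];  // threads per tile, chosen at run time
};

// Number of tiles (blocks) in each case.
template <int D0>
int num_tiles(const StaticTiledExtent1D<D0>& e) {
    return e.global / StaticTiledExtent1D<D0>::tile_dim0;
}

int num_tiles(const DynamicTiledExtent1D& e) { return e.global / e.tile_dim[0]; }

// Only the runtime form can take a tile size that is not known until run time.
DynamicTiledExtent1D make_dynamic(int global, int tile) {
    return DynamicTiledExtent1D{global, {tile}};
}
```

The static form lets the tile size appear in a static_assert or size a fixed array, at the cost of fixing it per instantiation.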