OpenCL GPU Utilities

Utility functions concerning GPU programming.

syris.gpu.util.are_images_supported()

Is the INTENSITY|FLOAT image format supported?

syris.gpu.util.cache(mem, shape, dtype, cache_type=1)

Cache a device memory object mem with shape and numpy data type dtype on host or device based on cache_type.

syris.gpu.util.execute_profiled(function)

Execute function, which can be an OpenCL kernel or another OpenCL-related function, and profile it.

syris.gpu.util.get_all_varconvolutions()

Create all variable convolutions.

syris.gpu.util.get_array(data, queue=None)

Get pyopencl.array.Array from data which can be a numpy array, a pyopencl.array.Array or a pyopencl.Image. queue is an OpenCL command queue.

syris.gpu.util.get_cache(buf)

Get a device memory object from cache buf, which can reside either on host or on device.

syris.gpu.util.get_command_queues(context, devices=None, queue_kwargs=None)

Create command queues for each of the devices within a specified context. If devices is None, they are obtained from context. queue_kwargs are passed to the CommandQueue constructor.

syris.gpu.util.get_cpu_platform()

Get any platform with CPUs.

syris.gpu.util.get_cuda_platform()

Get the NVIDIA CUDA platform if any.

syris.gpu.util.get_event_duration(event, start=4738, stop=4739)

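Get the OpenCL event duration. start and stop select the profiling timestamps between which the duration is measured; the defaults 4738 and 4739 are the values of pyopencl.profiling_info.START and pyopencl.profiling_info.END (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END).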
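
The sketch below shows the arithmetic behind an event duration, assuming the duration is taken as the difference of the two profiling timestamps. OpenCL reports these timestamps in nanoseconds; converting to seconds and the helper name event_duration_seconds are assumptions, not necessarily what syris returns. FakeEvent is a hypothetical stand-in for a profiled pyopencl.Event, which exposes get_profiling_info(param), so the example runs without a device.

```python
# OpenCL profiling query constants (values from the OpenCL specification).
CL_PROFILING_COMMAND_START = 0x1282  # == 4738
CL_PROFILING_COMMAND_END = 0x1283    # == 4739


def event_duration_seconds(event, start=CL_PROFILING_COMMAND_START,
                           stop=CL_PROFILING_COMMAND_END):
    """Hypothetical helper: time between two profiling timestamps, in seconds.

    OpenCL timestamps are in nanoseconds, hence the division by 1e9.
    *event* only needs a get_profiling_info(param) method, like pyopencl.Event.
    """
    return (event.get_profiling_info(stop) - event.get_profiling_info(start)) / 1e9


class FakeEvent:
    """Stand-in for a profiled pyopencl.Event, for illustration only."""

    def __init__(self, timestamps):
        self._timestamps = timestamps

    def get_profiling_info(self, param):
        return self._timestamps[param]


event = FakeEvent({CL_PROFILING_COMMAND_START: 1_000,
                   CL_PROFILING_COMMAND_END: 2_501_000})
print(event_duration_seconds(event))  # 0.0025
```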

syris.gpu.util.get_gpu_platform()

Get any platform with GPUs.

syris.gpu.util.get_host(data, queue=None)

Get data as numpy.ndarray.

syris.gpu.util.get_image(data, access=4, queue=None)

Get pyopencl.Image from data, which can be a numpy array, a pyopencl.array.Array or a pyopencl.Image. The image channel order is pyopencl.channel_order.INTENSITY and the channel type is pyopencl.channel_type.FLOAT. access is either pyopencl.mem_flags.READ_ONLY (the default, whose numeric value is 4) or pyopencl.mem_flags.WRITE_ONLY. queue is an OpenCL command queue.

syris.gpu.util.get_intel_platform()

Get the Intel platform if any.

syris.gpu.util.get_metaobjects_source()

Get source string for metaobjects creation.

syris.gpu.util.get_platform(name)

Get the first OpenCL platform which contains name as its substring.

syris.gpu.util.get_platform_by_device_type(device_type)

Get platform with specific device type (CPU, GPU, …).
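
A minimal sketch of the two platform lookups described above (by name substring and by device type), assuming each simply takes the first match. To keep it runnable without an OpenCL driver, Platform and Device are hypothetical namedtuple stand-ins for pyopencl objects (with pyopencl the list would come from pyopencl.get_platforms()), and platform_by_name and platform_by_device_type are illustrative names, not syris functions. Returning None when nothing matches is also an assumption.

```python
from collections import namedtuple

# Hypothetical stand-ins for pyopencl.Platform and pyopencl.Device.
Platform = namedtuple("Platform", ["name", "devices"])
Device = namedtuple("Device", ["name", "type"])

# Placeholders for pyopencl.device_type.CPU / GPU (real values are bit flags).
CPU, GPU = "CPU", "GPU"


def platform_by_name(platforms, name):
    """First platform whose name contains *name* as a substring, else None."""
    return next((p for p in platforms if name in p.name), None)


def platform_by_device_type(platforms, device_type):
    """First platform with at least one device of *device_type*, else None."""
    return next((p for p in platforms
                 if any(d.type == device_type for d in p.devices)), None)


platforms = [
    Platform("Intel(R) OpenCL", [Device("Core i7", CPU)]),
    Platform("NVIDIA CUDA", [Device("GeForce", GPU)]),
]
print(platform_by_name(platforms, "CUDA").name)      # NVIDIA CUDA
print(platform_by_device_type(platforms, GPU).name)  # NVIDIA CUDA
```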

syris.gpu.util.get_precision_header()

Return the single- or double-precision vfloat definitions header.

syris.gpu.util.get_program(src)

Create and build an OpenCL program from source string src.

syris.gpu.util.get_source(file_names, precision_sensitive=True)

Get source by concatenating the files in the file_names list and apply single- or double-precision parametrization if precision_sensitive is True.
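
The following sketch illustrates the idea of precision parametrization: kernels are written against a vfloat type, and a header prepended to the concatenated source decides whether vfloat means float or double. The header text, the vcomplex typedef and the helper concatenate_sources are hypothetical, and for brevity the sketch takes source strings rather than file names. The cl_khr_fp64 pragma is the standard OpenCL extension pragma for enabling double precision on devices that support it.

```python
# Hypothetical header contents; the header syris actually generates may differ.
SINGLE_HEADER = "typedef float vfloat;\ntypedef float2 vcomplex;\n"
DOUBLE_HEADER = ("#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
                 "typedef double vfloat;\ntypedef double2 vcomplex;\n")


def concatenate_sources(sources, double=False, precision_sensitive=True):
    """Join OpenCL source strings and prepend a vfloat header if requested."""
    body = "\n".join(sources)
    if not precision_sensitive:
        return body
    header = DOUBLE_HEADER if double else SINGLE_HEADER
    return header + body


kernel = ("kernel void scale(global vfloat *x, const vfloat a) "
          "{ x[get_global_id(0)] *= a; }")
print(concatenate_sources([kernel], double=True))
```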

syris.gpu.util.get_varconvolution_source(name, header='', inputs='', init='', compute_outer='', compute_inner='weight = 1.0;', after='', cplx=False, only_kernel=False)

Create a shift-dependent convolution kernel function called name. header is OpenCL code placed at the front of the source, before the kernel function. inputs are additional kernel inputs (see opencl/varconvolution.in for the fixed ones), init is the kernel initialization code, compute_outer is executed in every iteration of the outer (y) loop and compute_inner in every iteration of the inner (x) loop. after is the code placed after both loops. If cplx is True, the complex version of the kernel is used. Pseudo-code of the OpenCL source for the noncomplex version looks like this:

*header*

kernel void *name* (read_only image2d_t input,
                    global vfloat *output,
                    const sampler_t sampler,
                    int2 window, *inputs*)
{
    int idx = get_global_id (0);
    int idy = get_global_id (1);
    int width = get_global_size (0);
    int i, j, tx, ty, imx, imy;
    vfloat value, weight, result = 0.0;

    *init*

    for (j = 0; j < window.y; j++) {
        ty = window.y - j - 1;
        imy = idy + j - window.y / 2;
        *compute_outer*
        for (i = 0; i < window.x; i++) {
            imx = idx + i - window.x / 2;
            value = read_imagef (input, sampler, (int2)(imx, imy)).x;
            tx = window.x - i - 1;
            *compute_inner*
            result += value * weight;
        }
    }

    *after*

    output[idy * width + idx] = result;
}

The complex version uses two inputs, input_real and input_imag, which are also image2d_t instances. compute_inner must set the weight variable in order to apply the convolution kernel weight.
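
To make the noncomplex template concrete, here is a slow NumPy reference that follows the same loop structure, with weight_func(idx, idy, tx, ty) playing the role of compute_inner. varconvolve_reference is a hypothetical name, and reading out-of-image pixels as 0 assumes a sampler with CLK_ADDRESS_CLAMP addressing; the sampler actually passed by syris may treat borders differently.

```python
import numpy as np


def varconvolve_reference(image, window, weight_func):
    """CPU reference for the noncomplex template (illustration only).

    window = (wx, wy) like the kernel's int2 window; weight_func returns the
    weight for output pixel (idx, idy) and kernel tap (tx, ty). Pixels outside
    the image read as 0 (assumed CLK_ADDRESS_CLAMP behaviour).
    """
    height, width = image.shape
    wx, wy = window
    output = np.zeros_like(image, dtype=float)
    for idy in range(height):
        for idx in range(width):
            result = 0.0
            for j in range(wy):
                ty = wy - j - 1
                imy = idy + j - wy // 2
                for i in range(wx):
                    imx = idx + i - wx // 2
                    inside = 0 <= imx < width and 0 <= imy < height
                    value = image[imy, imx] if inside else 0.0
                    tx = wx - i - 1
                    result += value * weight_func(idx, idy, tx, ty)
            output[idy, idx] = result
    return output


image = np.ones((4, 5))
# Default compute_inner ("weight = 1.0;") sums the window.
box = varconvolve_reference(image, (3, 3), lambda idx, idy, tx, ty: 1.0)
print(box[0, 0], box[1, 1])  # 4.0 9.0
```

With the default compute_inner every output pixel is the sum over its window, so on an all-ones image a corner pixel sees only 4 in-image samples while an interior pixel sees all 9. A shift-dependent kernel would make weight_func depend on idx and idy as well.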

syris.gpu.util.get_varconvolve_disk(normalized=True, smooth=True, only_kernel=False)

Create variable circular kernel convolution. The kernel sum is 1 if normalized is True; if smooth is True, the sharp edges of the disk are smoothed out. If only_kernel is True, only the kernel is returned.
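
A sketch of what normalized and smooth mean for a single disk kernel of a given radius (in the variable convolution the radius changes from pixel to pixel). The linear one-pixel edge ramp is an assumed anti-aliasing scheme chosen for illustration, and disk_kernel is a hypothetical helper, not part of syris.

```python
import numpy as np


def disk_kernel(radius, normalized=True, smooth=True):
    """Sample a disk of *radius* pixels on a square grid (illustration only).

    With smooth=True the edge falls off linearly over one pixel, an assumed
    smoothing scheme; syris may smooth the edge differently.
    """
    half = int(np.ceil(radius)) + 1
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(x, y)
    if smooth:
        # 1 inside radius - 0.5, 0 outside radius + 0.5, linear ramp in between.
        kernel = np.clip(radius + 0.5 - r, 0.0, 1.0)
    else:
        kernel = (r <= radius).astype(float)
    if normalized:
        kernel /= kernel.sum()
    return kernel


k = disk_kernel(3.0)
print(k.shape, k.sum())
```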

syris.gpu.util.get_varconvolve_gauss(normalized=True, window_fwnm=1000, only_kernel=False)

Create variable Gaussian convolution. The kernel sum is 1 if normalized is True. The window is computed automatically for every x, y position in the original image from the sigma at x, y and window_fwnm as 2 * sqrt(2 * log(window_fwnm)) * sigma. If only_kernel is True, only the kernel is returned.
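
The window formula has a simple interpretation, assuming log is the natural logarithm: for a Gaussian exp(-x**2 / (2 * sigma**2)), evaluating at x = sqrt(2 * log(n)) * sigma gives exp(-log(n)) = 1 / n. The window 2 * sqrt(2 * log(window_fwnm)) * sigma is therefore the full width at which the Gaussian falls to 1 / window_fwnm of its peak; with window_fwnm = 2 it reduces to the familiar FWHM of about 2.355 * sigma. The helper names below are illustrative only.

```python
import math


def gauss_window_width(sigma, window_fwnm=1000):
    """Full width at which the Gaussian drops to 1 / window_fwnm of its peak."""
    return 2 * math.sqrt(2 * math.log(window_fwnm)) * sigma


def gauss(x, sigma):
    """Unnormalized Gaussian with peak value 1."""
    return math.exp(-x ** 2 / (2 * sigma ** 2))


sigma = 1.5
width = gauss_window_width(sigma)
print(round(width, 2))          # 11.15
print(gauss(width / 2, sigma))  # approximately 0.001 (= 1 / window_fwnm)
```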

syris.gpu.util.get_varconvolve_propagator(only_kernel=False)

Create the variable propagator convolution. If only_kernel is True, only the kernel is returned.

syris.gpu.util.init_programs()

Initialize all OpenCL kernels needed by syris.

syris.gpu.util.make_opencl_defaults(platform_name=None, device_type=None, device_index=None, profiling=True)

Create a default OpenCL context from platform_name and a command queue for the device at device_index in the devices list. If device_index is None, all devices are used in the context. If platform_name is not specified but device_type is, use a platform which has devices of that type. If profiling is True, enable profiling.

syris.gpu.util.make_vcomplex(value)

Make complex value for OpenCL based on the set floating point precision.
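
One plausible way to build such a value on the host, sketched with NumPy: complex64 stores (real, imag) as two float32 values, the same memory layout as OpenCL float2, and complex128 matches double2. make_vcomplex_sketch and its double flag are hypothetical; per the description above, syris picks the precision from its configured setting rather than from an argument.

```python
import numpy as np


def make_vcomplex_sketch(value, double=False):
    """Hypothetical stand-in: complex scalar matching OpenCL float2/double2.

    complex64 is laid out as two float32 values (real, imag) like float2;
    complex128 is laid out like double2.
    """
    return np.complex128(value) if double else np.complex64(value)


z = make_vcomplex_sketch(1 + 2j)
print(z.dtype, z.real, z.imag)  # complex64 1.0 2.0
```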

syris.gpu.util.qmap(func, items, queues=None, args=(), kwargs=None)

Apply func to items on multiple command queues. The function func should block until the execution on a device is finished; otherwise the command queue assigned to it might return to the pool of usable resources too soon and stall execution. If func is a kernel, consider another mechanism, e.g. realizing the multi-GPU execution without the threading used here. func is a callable with signature func(item, queue, *args, **kwargs), where item is the item to be processed and queue is the OpenCL command queue to be used. queues are the command queues used for the computation; if not specified, all the default ones are used. args is a list and kwargs a dictionary, both passed to func.
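
The blocking requirement is easiest to see in a small model of the pattern: worker threads borrow a command queue from a shared pool, call func, and put the queue back as soon as func returns. qmap_sketch is an illustrative reconstruction based on the description above, not the syris implementation, and plain strings stand in for pyopencl.CommandQueue objects so the example runs without a device. In real code, func would block with something like queue.finish() or event.wait() before returning.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def qmap_sketch(func, items, queues, args=(), kwargs=None):
    """Illustrative queue-pool map (not the syris implementation).

    Each worker borrows a command queue, runs
    func(item, cmd_queue, *args, **kwargs) and returns the queue afterwards,
    so func must not return before its device work has finished.
    """
    kwargs = kwargs or {}
    pool = queue.Queue()
    for q in queues:
        pool.put(q)

    def work(item):
        cmd_queue = pool.get()  # blocks until some queue is free
        try:
            return func(item, cmd_queue, *args, **kwargs)
        finally:
            pool.put(cmd_queue)  # queue becomes available to other items

    with ThreadPoolExecutor(max_workers=len(queues)) as executor:
        return list(executor.map(work, items))  # results keep item order


def square(item, cmd_queue, delay):
    time.sleep(delay)  # stands in for blocking on device work
    return item * item, cmd_queue


results = qmap_sketch(square, range(6), queues=["gpu0", "gpu1"], args=(0.01,))
print([value for value, _ in results])  # [0, 1, 4, 9, 16, 25]
```

Because the queue goes back into the pool in the finally clause, a func that only enqueues work and returns immediately would hand a still-busy queue to the next item, which is the premature return the description warns about.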