VP Research and Development, Codeplay Software
Michael Wong is the Vice President of Research and Development at Codeplay Software, Director and VP of ISOCPP.org, a senior member of the C++ Standards Committee with 15 years of experience, and Vice-Chair of Programming Languages for the Standards Council of Canada. He is the Head of Delegation for Canada to the C++ Standards Committee and the past CEO of OpenMP. Previously, Michael was the Senior Technical Strategy Architect for IBM compilers.
Michael chairs the WG21 SG5 Transactional Memory and SG14 Games Development/Low Latency/Financials, and is the co-author of a number of C++/OpenMP/Transactional Memory features including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. He has published numerous research papers and is the author of a book on C++11, and he is the current Editor for the Concurrency TS and the Transactional Memory TS.
Michael has been invited to speak and deliver keynotes at many conferences, universities, companies and research institutions including ACCU, C++Now, Meeting C++, ADC++, CASCON, Bloomberg, Activision/Blizzard, and CERN.
Massive Parallel Dispatch for Heterogeneous Computing in C++ for Self-Driving Cars
As Chair of the C++ Standard’s SG14, where game developers, financial traders, and embedded device programmers have been demanding a heterogeneous programming model, I have been studying programming models whose lessons could enable a future ISO C++ to support heterogeneous devices. There are many of them. My search has taken me through SYCL, HPX, Agency, HCC, OpenMP, OpenACC, OpenCL, C++ AMP, Halide, CUDA, Kokkos, Raja and many others. Yet as performance and power efficiency become the holy grail of modern C++ applications, the hardware solutions that deliver them differ greatly in their architectural decisions and designs. The combination of CPUs, GPUs, FPGAs and custom domain-specific hardware is gaining a lot of momentum. In view of this, C++ programming techniques and features are changing as well. Modern C++ standards are enabling more and more parallelism and heterogeneity in the library and language features. This talk will compare many of the most popular models in terms of their memory model, data movement, and execution abstraction.
This talk will cover the emerging C++ parallelism and concurrency technical specifications, some of which are in C++17, while the rest are coming in C++20/23. We will discuss how executors are a way to unify execution resources and parallel control constructs. Executors let us demonstrate how future heterogeneous computing can be added to ISO C++. One such model is Khronos’ SYCL and SYCL Parallel STL, which enable C++ libraries to be extended using standard C++ language features and libraries. SYCL provides the building blocks for such C++ libraries, bridging the gap between hardware-agnostic C++ features and C++ abstractions of hardware features. An implementation of SYCL has also been released as a free-to-download Community Edition called ComputeCpp to help you build higher abstractions for neural networks and machine vision, all leading to the ability to program self-driving cars. The Sunday Mastercourse will offer a hands-on demonstration of SYCL and Parallel STL, as well as a further description of Concurrency.
October 30, 2016
In English with translation to Russian
Requires separate registration
The Khronos™ Group maintains the OpenCL™ and SYCL™ standards, both designed to offer dispatch using C and C++ to heterogeneous devices such as GPUs, integrated CPUs, DSPs and even FPGAs. The C++ Standard is also building towards similar support, starting with the C++17 Parallel and Concurrency Technical Specifications, but is currently restricted to CPUs. With the help of early design experience from SYCL and other heterogeneous computing models, the C++ Standard will be able to offer a single high-level, performance-portable programming standard for programming autonomous vehicles, computer vision, and neural networks.
SYCL is already able to dispatch to heterogeneous devices, and it implements the C++17 Parallel STL, augmenting it with the ability to dispatch to GPUs in addition to CPUs. This workshop will demonstrate how to write parallel SYCL code and how to use the Khronos Group’s experimental Parallel STL implementation. The course outline is as follows:
- start with a basic Hello World-style program that shows how to submit a single task to a queue and use a stream-like object, comparing CPU, SYCL and OpenCL versions
- demonstrate how to access data across host and GPUs using buffers and accessors, the importance of life-time, and basic parallel constructs
- use advanced techniques such as Gaussian blur to process images, which can lead to computer vision useful for pedestrian detection
- demonstrate C++17 parallel algorithms using SYCL’s Parallel STL implementation, which can dispatch not just to CPUs but to heterogeneous devices
Along with the hands-on workshop, we will also deliver lectures that describe SYCL and OpenCL as well as C++ Parallelism and Concurrency Technical Specifications, including C++17’s Parallel STL, C++11 async, futures, atomics, and emerging specifications on continuation-style programming, latches, barriers, atomic shared_ptr, as well as lock-free programming toolkits.
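To give a flavour of the C++11 building blocks mentioned above (async, futures, and atomics), here is a minimal sketch in standard C++ only. It is not taken from the course material: the function name `chunked_sum` and its parameters are hypothetical, chosen purely for illustration, and it runs on CPU threads rather than dispatching to a heterogeneous device.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (an assumption for illustration, not a course API):
// sums `data` by splitting it into `chunks` contiguous ranges and launching
// one std::async task per range. Each task bumps the atomic `tasks_run`.
long long chunked_sum(const std::vector<int>& data, std::size_t chunks,
                      std::atomic<int>& tasks_run) {
    assert(chunks > 0);  // at least one chunk is required

    std::vector<std::future<long long>> parts;
    parts.reserve(chunks);

    // Chunk length, rounded up so that every element is covered.
    const std::size_t step = (data.size() + chunks - 1) / chunks;

    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = std::min(c * step, data.size());
        const std::size_t end = std::min(begin + step, data.size());

        // std::launch::async requests that the task run on its own thread.
        parts.push_back(std::async(std::launch::async,
                                   [&data, &tasks_run, begin, end] {
            // Atomic increment: safe even when tasks run concurrently.
            tasks_run.fetch_add(1, std::memory_order_relaxed);
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }

    // future::get() blocks until the task has finished and yields its result.
    long long total = 0;
    for (auto& part : parts) {
        total += part.get();
    }
    return total;
}
```

A caller would fill a `std::vector<int>`, create a `std::atomic<int>` counter initialised to zero, and call `chunked_sum(values, 4, counter)`. Once the function returns, every future has been waited on, so the counter holds the number of tasks that ran. On some toolchains the program needs to be linked with thread support (for example `-pthread`).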
Attendees are expected to have programming experience with C++ and a laptop. The suitable software will be provided on USB-sticks. This course is suitable for beginners, but is focused on intermediate to advanced parallel programming using C++.