Cedric Nugteren's home
Current job:
Machine learning software engineer
Location:
Amsterdam, NL
Year of birth:
1986
Contact:

Video presentations

Demo of the world’s fastest inference engine for Arm Cortex-M
ARM AI Tech Talk Mar 11, 2022 Link to video
Demoing the world’s fastest inference engine for Arm Cortex-M
TinyML talks Jan 4, 2022 Link to video

Handouts of talks

CLBlast: A Tuned BLAS Library
IWOCL '18, Oxford, UK May 16, 2018 Link to program  and  Handouts
CLBlast: A Tuned BLAS Library for Faster Deep Learning
GTC '17, San Jose, CA May 11, 2017 Link to program  and  Handouts
GPU Programming 101
C++ Meetup, Amsterdam, NL August 25, 2016 Link to program  and  Handouts
Better Than All the Rest: Finding Max-Performance GPU Kernels Using Auto-Tuning
GTC '16, San Jose, CA April 7, 2016 Link to program  and  Handouts
CLTune: A Generic Auto-Tuner for OpenCL Kernels
MCSoC '15, Torino, Italy September 24, 2015 Link to program  and  Handouts
A Study of the Potential of Locality-Aware Thread Scheduling for GPUs
MuCoCoS '14, Porto, Portugal August 26, 2014 Link to program  and  Handouts
A Detailed GPU Cache Model Based on Reuse Distance Theory
HPCA '14, Orlando, US February 2014 Link to program  and  Handouts
Algorithmic Species Revisited: A Program Code Classification Based on Array References
MuCoCoS '13, Edinburgh, UK September 7, 2013 Link to program  and  Handouts
Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification
APPT '13, Stockholm, Sweden August 28, 2013 Link to program  and  Handouts

Posters

Auto-Tuning OpenCL Matrix-Multiplication: K40 versus K80
GTC '15, San Jose, CA March 16, 2015 Link to program  and  Poster PDF