I've mentioned several times that the Predictive Vision Model (PVM) is not expressible in any of the current deep learning frameworks such as TensorFlow or Caffe (at least not easily or efficiently). This is due to its inherent feedback and multiscale structure. PVM is not an end-to-end trained system; it is a collection of intertwined sequence learners. That being said, I'm currently working in my free time to bring PVM to the GPU.
I'm not the most experienced person in the GPU programming domain, but I can definitely write a kernel and use the Nvidia profiler. So far my results look very encouraging: I can train more than 210 million float32 parameters at 21 fps on an Nvidia Titan X (Pascal architecture). In other words, that is about 4.4 billion trained float32 parameters per second. This training throughput is on par with that of deep learning models; e.g. ResNet-50, with ~25 million parameters, can be trained at approximately 100-150 samples/s on a single GPU. In fact, my GPU utilisation is now close to 97% with most kernels. To some degree I feel that PVM will be even better suited to GPU implementation than end-to-end deep learning because of its uniform structure. I can also clearly foresee a multi-GPU version with relatively straightforward scaling.
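The throughput comparison above is simple arithmetic: parameters trained per second is roughly the parameter count times the number of training samples (or frames) processed per second. Below is a minimal sketch using only the figures quoted in this post. The helper name `param_updates_per_second` is hypothetical, and the metric is only a rough proxy, since it ignores how much computation each parameter costs per sample.

```python
def param_updates_per_second(n_params: float, samples_per_second: float) -> float:
    """Rough throughput proxy: parameters updated per second of training.

    Hypothetical helper; it assumes one PVM frame and one ResNet-50 sample
    are comparable units of work, which is only an approximation.
    """
    return n_params * samples_per_second


# Figures quoted in the text above.
pvm = param_updates_per_second(210e6, 21)            # Titan X (Pascal), 21 fps
resnet50_low = param_updates_per_second(25e6, 100)   # ~100 samples/s
resnet50_high = param_updates_per_second(25e6, 150)  # ~150 samples/s

for name, value in [("PVM", pvm),
                    ("ResNet-50 (low)", resnet50_low),
                    ("ResNet-50 (high)", resnet50_high)]:
    print(f"{name:>16}: {value / 1e9:.2f} billion parameters/s")
```

This prints 4.41, 2.50 and 3.75 billion parameters/s respectively, so under this proxy the PVM figure sits at or slightly above the ResNet-50 range.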
My current implementation is a fairly basic distributed sequence-to-sequence learner (a large collection of relatively small MLPs), but from there it is a relatively short path to a full-blown PVM (just a lot of laborious arranging of memory). I'm excited about this, since it will allow me to train models at least an order of magnitude larger and more sophisticated (or, equivalently, train models similar to those I can train now at 10x the speed). Although getting anywhere close to the scale of a human (or any higher animal) brain would require boosting this performance by another 2-3 orders of magnitude, I'm confident this will soon bring very exciting new results.
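To illustrate why a large collection of small MLPs maps well onto a GPU, here is a minimal NumPy sketch of training many independent MLPs in lockstep. The weights of all units are stacked along a leading axis, so each layer of every unit is handled by one batched operation (on a GPU this would roughly correspond to one kernel launch). This is not the PVM implementation described above: the function names, sigmoid activations, squared-error loss and plain SGD update are all assumptions made for illustration. Each unit simply learns to map its current input to a target, e.g. the next frame of its own input stream.

```python
import numpy as np


def init_mlps(n_units, n_in, n_hidden, n_out, rng, scale=0.1):
    """Hypothetical helper: weights for n_units independent one-hidden-layer MLPs.

    The leading axis indexes the unit, so all units live in contiguous arrays.
    """
    return {
        "W1": rng.normal(0.0, scale, (n_units, n_in, n_hidden)),
        "b1": np.zeros((n_units, n_hidden)),
        "W2": rng.normal(0.0, scale, (n_units, n_hidden, n_out)),
        "b2": np.zeros((n_units, n_out)),
    }


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(params, x, y, lr=0.1):
    """One SGD step on every unit at once (assumed squared-error loss).

    x: (n_units, n_in) current inputs, y: (n_units, n_out) targets.
    Returns the per-unit loss measured before the update.
    """
    # Forward pass: one batched contraction per layer covers all units.
    h = sigmoid(np.einsum("ui,uih->uh", x, params["W1"]) + params["b1"])
    out = sigmoid(np.einsum("uh,uho->uo", h, params["W2"]) + params["b2"])

    # Backward pass for loss = 0.5 * sum((out - y)**2) per unit.
    err = out - y
    d_out = err * out * (1.0 - out)
    # Hidden-layer delta uses W2 before it is updated.
    d_h = np.einsum("uo,uho->uh", d_out, params["W2"]) * h * (1.0 - h)

    # Per-unit outer-product gradients, applied in place.
    params["W2"] -= lr * np.einsum("uh,uo->uho", h, d_out)
    params["b2"] -= lr * d_out
    params["W1"] -= lr * np.einsum("ui,uh->uih", x, d_h)
    params["b1"] -= lr * d_h
    return 0.5 * np.sum(err ** 2, axis=1)


# Usage: 4 hypothetical units, each learning to predict a "next frame" target.
rng = np.random.default_rng(0)
params = init_mlps(n_units=4, n_in=8, n_hidden=16, n_out=8, rng=rng)
x_t = rng.random((4, 8))     # current input of each unit
x_next = rng.random((4, 8))  # target: next input of each unit
for step in range(500):
    loss = train_step(params, x_t, x_next, lr=0.5)
print(loss)  # per-unit squared error late in training
```

Keeping every unit's weights in one contiguous array is the kind of memory arrangement the text alludes to. On a GPU the same idea would more likely be written as batched matrix multiplies or custom kernels than as NumPy calls.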