broadcast only works from top row of 64-core chips
some things just work on rows 2-3 . . . silicon errata (same for 16-core chips)
if want high-bandwidth streaming from Epiphany to e-link, need the streaming core on the edge of the array
can't access CPU regs or interrupt regs while doing a DMA, because the access goes out to the router and back in (shared buffer/gates)
can't know that a DMA is in progress w/o reading the DMA status register -- catch-22!
fix: write to a local mem location just before starting the DMA and have the interrupt routine clear it -- that should restrict the check to local transactions only
Anders will do microbenchmarks to test this
re-read systolic array stuff
can interrupts be broadcast/multicast? only for first row . . . only top leftmost (core 0,0) can do multicast
Mary: try programming FFT? sorting? (look at Adapteva's FFT and radix sort)
Anders: they found performance bugs (erratic numbers) due to external noise
testing weak memory models on GPUs (get refs from Mary) -- systematic way to find problems
no streaming examples out there -- could do many FFTs in a row to test streaming . . . convolution . . .
Erik Ryman (industrial PhD, Omnisys) has spent much time on placement (full custom chips) -- has journal paper accepted, so can get that from him
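
sketch of the local-flag fix for the DMA catch-22 noted above (not tested on hardware). Assumptions: all names here (dma_busy, platform_dma_start, start_dma_guarded, dma_done_isr, regs_safe_to_access) are hypothetical, and the DMA start is a stub so the logic can be checked on a host. On the chip, the stub would be replaced by the SDK's DMA-start call and the handler attached to the DMA-complete interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag kept in the core's own local memory. Polling it is meant to be a
 * purely local access, unlike reading the DMA status register. */
static volatile bool dma_busy = false;

/* HYPOTHETICAL stand-in for the platform call that starts a transfer. */
static void platform_dma_start(void)
{
    /* real code would program and start the DMA engine here */
}

/* Set the flag first, then start the DMA, so the flag is never stale. */
void start_dma_guarded(void)
{
    assert(!dma_busy);   /* this sketch allows one transfer at a time */
    dma_busy = true;
    platform_dma_start();
}

/* Handler for the DMA-complete interrupt: clears the flag. */
void dma_done_isr(int signum)
{
    (void)signum;        /* unused in this sketch */
    dma_busy = false;
}

/* True when no DMA is in flight, i.e. CPU/interrupt regs may be touched. */
bool regs_safe_to_access(void)
{
    return !dma_busy;
}
```

usage: call start_dma_guarded() instead of starting the DMA directly, and check regs_safe_to_access() before any CPU or interrupt register access. Whether this really keeps all traffic local is what the microbenchmarks should confirm.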