the entire raw image is already present in memory
Would it be viable to generate this real-time preview by having software or a driver on the ARM core read only every 8th pixel address and wrap the result into a new image/device? Then no FPGA processing would be required at all, and the load on the CPU would also be minimal, since no image resampling or processing is needed.
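To make the idea concrete, here is a minimal sketch of what I mean by reading every 8th pixel. It is not tied to any real driver API: the function name `decimate_preview`, the 16-bit monochrome pixel format, the row-major layout and the `stride` parameter are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any existing driver): copies every
 * `step`-th pixel of every `step`-th row from the raw frame into `dst`.
 *
 * Assumptions: pixels are 16-bit monochrome, stored row-major, and
 * `stride` is the row pitch in pixels (stride >= width). `dst` must
 * have room for ceil(width/step) * ceil(height/step) pixels.
 *
 * Returns the number of pixels written (0 if step is 0).
 */
size_t decimate_preview(const uint16_t *src, size_t width, size_t height,
                        size_t stride, size_t step, uint16_t *dst)
{
    size_t n = 0;

    if (step == 0) {
        return 0;
    }

    for (size_t y = 0; y < height; y += step) {
        const uint16_t *row = src + y * stride;

        /* Plain pick of every step-th pixel: no filtering, no averaging. */
        for (size_t x = 0; x < width; x += step) {
            dst[n++] = row[x];
        }
    }

    return n;
}
```

With `step = 8` in both directions, the CPU copies only 1/64 of the pixels, e.g. `decimate_preview(raw, width, height, stride, 8, preview)`. Since this is plain decimation with no filtering, fine detail in the raw image may show up as aliasing in the preview.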