embedded.fi values are objects that carry control information about the representation and the attached behaviour (such as overflow handling). The overhead for a single such number is considerable. In an array, however, each additional entry should not take up much extra space.
Fixed-point numbers that do not happen to match a hardware numeric format are going to be slower to calculate with on systems that have built-in floating-point hardware. They are best suited to a few cases:
- extended arithmetic beyond 53 bits of precision (the significand precision of an IEEE 754 double)
- targets such as VHDL designs or embedded processors that do not have appropriate floating-point hardware
- cases where the attached behaviour (such as overflow handling) needs to differ from the default (there are systems where it is more important that the behaviour be exactly specified than that it be as fast as possible)
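To make the precision and overflow points in the list above more concrete, here is a minimal sketch in plain Python. It is not the embedded.fi API: the helper name `to_fixed`, its parameters, and the rounding rule (round to nearest, ties upward) are all assumptions chosen for illustration.

```python
import math

def to_fixed(value, word_length=8, fraction_length=4, overflow="saturate"):
    """Quantize `value` to a signed fixed-point format and return it as a float.

    Hypothetical helper for illustration only, not part of any library.
    """
    scale = 1 << fraction_length           # 2**fraction_length
    lo = -(1 << (word_length - 1))         # smallest stored integer
    hi = (1 << (word_length - 1)) - 1      # largest stored integer
    raw = math.floor(value * scale + 0.5)  # round to nearest, ties upward

    if overflow == "saturate":
        # Clamp out-of-range values to the nearest representable limit.
        raw = max(lo, min(hi, raw))
    elif overflow == "wrap":
        # Two's-complement wraparound, as plain integer hardware would do.
        raw = (raw - lo) % (1 << word_length) + lo
    else:
        raise ValueError("overflow must be 'saturate' or 'wrap'")
    return raw / scale

# A double cannot tell 2**53 and 2**53 + 1 apart; exact integers can.
double_loses_a_bit = float(2**53) + 1.0 == float(2**53)

saturated = to_fixed(10.0, overflow="saturate")
wrapped = to_fixed(10.0, overflow="wrap")
```

With the assumed 8-bit word and 4 fraction bits, the representable range is -8 to 7.9375, so 10.0 saturates to 7.9375 but wraps to -6.0. Which of those two results is "right" depends entirely on the system being modelled, which is why being able to specify it exactly matters.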
If you are working on a host system rather than an embedded system, using a GPU might serve you better than worrying about memory transfer details.