Microprocessor Design Clinic
Machine learning (AI) is an exciting new field. Many systems are now being made adaptive to
contend with data, inputs and situations that have not been encountered during design and
A critical component in machine learning algorithms is matrix multiplication. Matrix multiplications
can be very time-consuming and taxing for the CPU, where each data element is loaded from memory and
processed in the ALU.
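
As a point of reference, the CPU-only computation described above can be sketched in C as follows. This is a minimal sketch, assuming each matrix is stored row-major in a flat array; the name `matmul_baseline` and the macro `N` are illustrative and not part of the brief.

```c
#include <stddef.h>

#define N 32  /* matrix dimension taken from the brief (32x32) */

/* Naive software multiply: C = A * B, all matrices N x N, row-major.
 * Assumed layout: element (i, j) lives at index i * N + j.
 * A and B are only read, so the input matrices are preserved. */
void matmul_baseline(const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < N; k++) {
                /* Two loads, one multiply and one add per iteration. */
                acc += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}
```

For 32x32 inputs the innermost statement executes 32 x 32 x 32 = 32768 times, each time loading two operands from memory. A routine like this can also serve as the software baseline when measuring speed-up.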
For this design project, you are asked to design a Matrix Multiplication accelerator and interface it
to the RISC-V CPU.
In groups of three (you and 2 of your fellow students), please undertake the following design
Presentation: Week of the 12th of June (Please contact Tutor and Lecturer) to organise a mutually
Report Due Date: June 16th, 5pm
1. The accelerator must be able to multiply 2 fixed-size matrices, where each matrix is 32x32
(i.e. 1024 elements).
2. The elements in the input matrices are of type floating point, IEEE 754 compliant, numbers.
3. The output matrix elements are to be floating point values, IEEE 754 compliant.
4. The input matrices passed to the accelerator will need to be preserved (must not
5. The matrix values are to be held in the main memory.
6. The CPU needs to configure the accelerator for the matrix multiplication it is requested to
7. The accelerator may have its own memory management (e.g., register or buffer) to load and
store the data operands and outputs. Ideally, memory transfers should occur
independently of the CPU (hint: DMA).
8. The accelerator may have its own internal memory, but total internal memory
should be limited.
9. You should design new R-Format instructions for the CPU to configure and interact with the
accelerator and support the functionality of your accelerator.
10. The accelerator should indicate to the CPU that its calculation has finished. The
mechanism by which this occurs is your design decision.
11. The accelerator should let the processor know if the calculation has completed
correctly and the result can be depended upon or some errors have occurred.
12. Provide test code (using RoCC macros to include your functions in custom instructions) that
shows the performance speed-up that your accelerator achieves relative to using the base
1. Modify your design to support arbitrary-size matrices. The size of the matrices is to be
determined and/or specified at run-time and can change at each operation.
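
To make the run-time sizing concrete, here is a minimal software reference model, assuming row-major storage. It multiplies an (a_rows x a_cols) matrix by a (b_rows x b_cols) matrix and rejects incompatible shapes. The status codes and the name `matmul_runtime` are hypothetical, since how errors are reported is left as a design decision.

```c
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum mm_status { MM_OK = 0, MM_ERR_DIM = 1, MM_ERR_NULL = 2 };

/* C (a_rows x b_cols) = A (a_rows x a_cols) * B (b_rows x b_cols).
 * All matrices are row-major; sizes are supplied at run time.
 * A and B are read-only, so the inputs are preserved. */
enum mm_status matmul_runtime(const float *A, size_t a_rows, size_t a_cols,
                              const float *B, size_t b_rows, size_t b_cols,
                              float *C)
{
    if (A == NULL || B == NULL || C == NULL) {
        return MM_ERR_NULL;
    }
    /* The inner dimensions must agree and no dimension may be zero. */
    if (a_rows == 0 || a_cols == 0 || b_cols == 0 || a_cols != b_rows) {
        return MM_ERR_DIM;
    }
    for (size_t i = 0; i < a_rows; i++) {
        for (size_t j = 0; j < b_cols; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < a_cols; k++) {
                acc += A[i * a_cols + k] * B[k * b_cols + j];
            }
            C[i * b_cols + j] = acc;
        }
    }
    return MM_OK;
}
```

An accelerator that supports this extension would need the same three dimensions as configuration inputs before each operation, and a shape check of this kind is one candidate source for an error flag.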
Presentation: (50%) (Group Presentation, Conducted in Question and Answer Format)
During the presentation, you will need to justify your design decisions and your choices for your
implementation. You will also need to present performance results and the testing you have undertaken
to verify the correct performance of your design.
Please provide a detailed report outlining your design. The report should include:
1. A description and information supporting your design decisions. Illustrate all
the functional blocks of your design and their interaction.
2. Your design schematics, data flow and behavior of the accelerator
3. Test code that exercises / tests your design
4. Performance Analysis of your test design
This is a design project. There is no single correct answer. Many answers are correct and
valid. The process is one of trading-off various aspects of the design components.
However, at the completion of your project, the accelerator should be able to correctly
multiply the two matrices.
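
One way to check correctness is to compare the accelerator output against a software reference, element by element. The sketch below assumes that a small tolerance is acceptable, because an accelerator may accumulate products in a different order than the reference loop and floating-point addition is not associative. The function name and tolerance parameters are illustrative, not prescribed by the brief.

```c
#include <stddef.h>

/* Absolute value without pulling in libm. */
static float abs_f32(float x)
{
    return x < 0.0f ? -x : x;
}

/* Compare 'got' (e.g. accelerator output) with 'ref' (software result).
 * Returns the index of the first element that differs by more than
 * abs_tol + rel_tol * |ref[i]|, or -1 if every element is within tolerance.
 * A NaN in either array makes the comparison fail, so it is reported too. */
long first_mismatch(const float *got, const float *ref, size_t count,
                    float rel_tol, float abs_tol)
{
    for (size_t i = 0; i < count; i++) {
        float diff  = abs_f32(got[i] - ref[i]);
        float limit = abs_tol + rel_tol * abs_f32(ref[i]);
        if (!(diff <= limit)) {
            return (long)i;
        }
    }
    return -1;
}
```

With both tolerances set to zero the check becomes an exact comparison, which is appropriate if the accelerator is designed to reproduce the reference accumulation order.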
1. Implement an IEEE 754 floating-point multiplier and accumulator.
2. Define how data for your input and output matrices are to be stored in memory
addressable by your CPU.
3. Consider how you would transfer the data to and from your accelerator without
requiring the CPU to perform the transfer (ie DMA) and when can the calculation
4. Define how your accelerator is going to interact with the rest of the processor
microarchitecture (ie design the accelerator interface)
5. Consider the errors that may occur. How are these errors going to be reported? How
is the CPU going to know that the end of the calculation has been completed
6. Algorithm: choose an algorithm for your matrix multiplier
7. Custom Instruction: indicate your input and output matrices using source registers (rs1 and
rs2) and destination register (rd); list all of the functionalities for your accelerator by using
8. Block diagram: Draw a block diagram to include important modules (registers, FSM,
buffers…) and signals (data, handshake, controlling…)
9. FSM: design an FSM for your data flow with states, input signals, state values…
10. Programming: implement your accelerator in the Chisel language based on your block diagram
and FSM (refer to workshop 4)
11. Testing: write a test program in C (include your macros) to test the performance
12. Provide the test code so that we can input 2 matrices and validate the
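
For the custom instruction hint, it may help to see how an R-format instruction word is laid out. The sketch below packs the standard RISC-V R-type fields (funct7, rs2, rs1, funct3, rd, opcode) into a 32-bit word and uses the custom-0 major opcode (0001011). It assumes the usual RoCC convention in which the three funct3 bits act as the xd, xs1 and xs2 flags. The funct7 values are hypothetical placeholders; choosing the real set of operations is part of your design.

```c
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu  /* RISC-V custom-0 major opcode, 0001011 */

/* Hypothetical funct7 assignments, for illustration only. */
enum mm_funct {
    MM_SET_INPUTS = 0,  /* rs1 = address of A, rs2 = address of B */
    MM_START      = 1,  /* rs1 = address of C, rd = status        */
    MM_STATUS     = 2   /* rd = done / error flags                */
};

/* Pack the R-type fields into one 32-bit instruction word:
 * [31:25] funct7 | [24:20] rs2 | [19:15] rs1 | [14:12] funct3 |
 * [11:7] rd | [6:0] opcode. Each field is masked to its width. */
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode)
{
    return ((funct7 & 0x7Fu) << 25) |
           ((rs2    & 0x1Fu) << 20) |
           ((rs1    & 0x1Fu) << 15) |
           ((funct3 & 0x07u) << 12) |
           ((rd     & 0x1Fu) << 7)  |
            (opcode & 0x7Fu);
}

/* Accelerator instruction that uses rd, rs1 and rs2, so the assumed
 * xd, xs1 and xs2 flags (funct3 bits 2, 1, 0) are all set. */
uint32_t encode_mm(enum mm_funct funct, uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    return encode_rtype((uint32_t)funct, rs2, rs1, 0x7u, rd, OPCODE_CUSTOM0);
}
```

As a sanity check on the field layout, `encode_rtype(0, 11, 10, 0, 12, 0x33)` yields `0x00B50633`, the encoding of `add x12, x10, x11`. In practice the RoCC macros referred to above would emit words like these from inline assembly, so a helper of this kind is mainly useful for documenting and checking your instruction table.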