Software Pipelining on the new Mill CPU (Talk July 14 2014)
Posted: Tue Jul 08, 2014 3:07 pm
Ivan Godard, CTO of Mill Computing, Inc., will be giving a talk at Facebook.
The particulars:
Monday, July 14, 2014
Doors open at 10:30 AM, Talk is from 11 AM to 12:30 PM
1 Hacker Way, Bldg 10
Menlo Park, CA 94025
Enter via the left lane of the Willow Road entrance. There is visitor parking along the front of Building 10; if all of those spaces are taken, there is overflow parking across the street. Guests should come to Building 10 and sign in. They will then be escorted to Room 11.2. Our hosts at Facebook are Edwin Smith and Jason Evans.
This will be the ninth topic publicly presented on the Mill general-purpose CPU architecture. It will cover the methods used to perform software pipelining on the Mill architecture. The talk will assume some general familiarity with software pipelining.
Software pipelining on the Mill CPU:
Instant pipeline: add loop, no stirring needed
The Mill CPU architecture is very wide, able to issue and execute 30+ independent MIMD operations per cycle. Non-looping open code often cannot use this raw compute capacity, but fortunately >80% of cycles are in loops. Loops potentially have unbounded instruction-level parallelism and can absorb all the capacity available – if the loop can be pipelined.
This talk addresses how loops are pipelined on the Mill architecture. On a conventional machine, pipelining requires lengthy prelude and postlude instruction sequences to get the pipeline started and wound down, frequently destroying the benefit of pipelining the main body. Conventional pipelining can even be a net loss on short loops, especially “while”-type loops whose length is unknown and data-dependent. Not so on a Mill: Mill pipelines have neither prelude nor postlude, and early conditional exit has no added cost.
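To make the prelude and postlude concrete, here is a minimal sketch of conventional software pipelining, written in Python purely as a model of the schedule rather than as real machine code. The function names and the two-stage split (load, then multiply-accumulate) are assumptions chosen for illustration; nothing here shows how the Mill itself does it.

```python
def scaled_sum_plain(xs, k):
    """Reference loop: each iteration loads a value, then multiplies and accumulates."""
    total = 0
    for x in xs:
        total += x * k
    return total


def scaled_sum_pipelined(xs, k):
    """Same loop, hand-pipelined in two stages (load, then multiply-accumulate)."""
    n = len(xs)
    total = 0
    if n == 0:
        return total          # short-trip guard: the code below assumes n >= 1

    # Prelude: start iteration 0's load before the steady state begins.
    loaded = xs[0]

    # Steady state: each pass starts iteration i and finishes iteration i-1.
    for i in range(1, n):
        next_loaded = xs[i]   # stage 1 of iteration i
        total += loaded * k   # stage 2 of iteration i-1
        loaded = next_loaded

    # Postlude: drain the last in-flight iteration.
    total += loaded * k
    return total
```

With only two stages the extra code is small, but it grows with pipeline depth, and for a loop that runs once or twice the guard, prelude and postlude are most of the work. That is the overhead the paragraph above refers to.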
Pipelines on conventional machines also have problems with loop-carried data: values produced by one iteration but consumed by another. Conventional code must resort to bucket-brigade register copies, or fail to pipeline altogether. Even architectures like the Itanium, which have special hardware to support pipelining, provide it only for the innermost loop. In contrast, the Mill needs no copies and can pipeline outer as well as inner loops.
Familiarity with prior talks in this series, especially the Belt and Metadata talks, will be helpful but not essential.
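As a rough illustration of the bucket-brigade copies mentioned above, the sketch below keeps two loop-carried values in named variables standing in for registers and shifts them down one slot on every iteration. The three-tap sum and the names `prev1` and `prev2` are hypothetical, picked only to show the pattern on a conventional machine; the sketch does not model the Mill's mechanism.

```python
def three_tap_sum(xs):
    """out[i] = xs[i] + xs[i-1] + xs[i-2], treating missing history as 0."""
    prev1 = 0  # value carried from iteration i-1
    prev2 = 0  # value carried from iteration i-2
    out = []
    for x in xs:
        out.append(x + prev1 + prev2)
        # Bucket brigade: every carried value moves down one slot,
        # costing one copy per carried value per iteration.
        prev2 = prev1
        prev1 = x
    return out
```

For example, `three_tap_sum([1, 2, 3, 4])` returns `[1, 3, 6, 9]`. The copies do no useful computation; they exist only to keep each carried value in a fixed place for the next iteration.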
Videos and other material about other aspects of the Mill can be found at http://millcomputing.com/docs.
Previous mentions of the Mill CPU on this forum: here and here.