Function level parallelism driven by data dependencies

tomato13 2008. 9. 13. 21:45

www.cse.ucsd.edu/~rakumar/dasCMP06/paper07.pdf

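This paper is a case study. It proposes a framework that applies parallelism to existing sequential programs by analyzing the call relationships between functions (the data flow graph) and the shared-variable relationships between them (the data sharing graph). The authors then apply the framework to real test programs and report considerably better results than the methods presented in other, earlier papers.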

The key point of the paper is that the clustering units are determined from the data flow graph and the data sharing graph. Because the resulting clusters are mutually independent, each one can run on its own core.
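To make the idea of independent clusters concrete, here is a minimal sketch. It is not the paper's clustering algorithm (this post does not reproduce it); it simply assumes that any two functions touching the same shared variable must end up in the same cluster, and groups them with a union-find. The function and variable names in `example` are hypothetical.

```python
from collections import defaultdict

def cluster_functions(accesses):
    """Group functions that touch a common shared variable into one cluster.

    accesses: dict mapping function name -> set of shared variable names.
    Returns a sorted list of sorted lists of function names.
    """
    parent = {f: f for f in accesses}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Functions that access the same variable must land in the same cluster.
    users = defaultdict(list)
    for func, variables in accesses.items():
        for var in variables:
            users[var].append(func)
    for funcs in users.values():
        for other in funcs[1:]:
            union(funcs[0], other)

    clusters = defaultdict(list)
    for func in accesses:
        clusters[find(func)].append(func)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical profile: which shared variables each function touches.
example = {
    "parse": {"buf"},
    "decode": {"buf", "table"},
    "render": {"frame"},
    "blit": {"frame"},
    "log": set(),
}
clusters = cluster_functions(example)
print(clusters)
```

Under this assumption no variable is shared across two clusters, which is what allows each cluster to be placed on a separate core.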

The main points are quoted below.

In this paper, we propose a framework for extracting potential parallelism from programs.

...

Our framework is profile-based, implying that it is not safe. It builds two new graph representations of the profile-data: the interprocedural data flow graph and the data sharing graph. These graphs show the data-flow between functions and the data structures facilitating this data-flow, respectively.
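The following sketch shows one way such graphs could be derived from a profile. The trace format, a list of `(function, "r" or "w", location)` events, is an assumption made for illustration and is not taken from the paper. A read of a location last written by another function is recorded as a producer-to-consumer edge (data flow), and the location itself is recorded as what carries that flow (data sharing).

```python
from collections import defaultdict

def build_graphs(trace):
    """Derive two graphs from a memory-access trace (hypothetical format).

    Returns (flow, sharing):
      flow:    (producer, consumer) -> number of cross-function reads seen
      sharing: (producer, consumer) -> set of locations carrying the data
    """
    last_writer = {}
    flow = defaultdict(int)
    sharing = defaultdict(set)
    for func, op, loc in trace:
        if op == "w":
            last_writer[loc] = func
        elif op == "r":
            producer = last_writer.get(loc)
            # Only reads of data produced by a different function count.
            if producer is not None and producer != func:
                flow[(producer, func)] += 1
                sharing[(producer, func)].add(loc)
    return dict(flow), dict(sharing)

# Hypothetical trace of three functions communicating through two buffers.
trace = [
    ("read_input", "w", "buf"),
    ("transform", "r", "buf"),
    ("transform", "w", "out"),
    ("emit", "r", "out"),
    ("emit", "r", "buf"),
]
flow, sharing = build_graphs(trace)
print(flow)
print(sharing)
```

Because the graphs come from one profiled run, they only reflect the dependencies that run happened to exercise, which is why the quoted passage calls the approach not safe.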

...

The paper explains why a transition from single core to multi-core is needed as follows.

Firstly, the instruction level parallelism (ILP) has been exploited to its full extent, such that extracting more ILP becomes overly complex. Secondly, power dissipation kept increasing alarmingly, caused by implementing a single large core with long wires and clocked at high frequencies.

...

To extract large amounts of thread-level parallelism, it is necessary to look past these limiting control flow and data flow restrictions. In this paper, we develop a framework for extracting thread-level parallelism from sequential programs that assumes perfect knowledge of these dependencies. As such, it is able to discover large amounts of TLP

...

Our analysis focusses on memory dependencies, since register dependencies can be predicted or precomputed.

...

If data streams between two clusters of functions go in both directions, it is hard to parallelize them further on.
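As a small illustration of this condition, the sketch below flags pairs of clusters that exchange data in both directions. The edge list and the `cluster_of` mapping are hypothetical inputs; the paper's own representation is not shown in this post.

```python
def bidirectional_pairs(flow_edges, cluster_of):
    """Return cluster pairs whose data flows in both directions.

    flow_edges: iterable of (producer_function, consumer_function).
    cluster_of: dict mapping function name -> cluster label.
    """
    directed = set()
    for producer, consumer in flow_edges:
        a, b = cluster_of[producer], cluster_of[consumer]
        if a != b:  # flow inside one cluster is irrelevant here
            directed.add((a, b))
    # Report each two-way pair once, with the smaller label first.
    return sorted((a, b) for a, b in directed if a < b and (b, a) in directed)

# Hypothetical example: A and B feed each other, B only feeds C.
cluster_of = {"f": "A", "g": "B", "h": "C"}
edges = [("f", "g"), ("g", "f"), ("g", "h")]
pairs = bidirectional_pairs(edges, cluster_of)
print(pairs)
```

In this example A and B would be hard to split further, while B and C form a one-way producer and consumer.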

...

The paper explains that clusters need to be kept at an appropriate (?) size.

So the second role of the call graph is to find clusters that are balanced in execution time.
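How the paper balances clusters is not described in this post. As a generic stand-in, the sketch below uses a greedy longest-processing-time heuristic: functions are sorted by profiled execution time and each is assigned to the currently lightest cluster. It ignores data dependencies entirely, and the timings are made-up values.

```python
import heapq

def balance(times, n_clusters):
    """Greedy assignment of functions to clusters by execution time.

    times: dict mapping function name -> profiled execution time.
    Returns (clusters, loads), where loads[i] is the total time of clusters[i].
    """
    heap = [(0.0, i) for i in range(n_clusters)]  # (current load, cluster index)
    heapq.heapify(heap)
    clusters = [[] for _ in range(n_clusters)]
    # Place the most expensive functions first; break ties by name.
    for func, t in sorted(times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        clusters[i].append(func)
        heapq.heappush(heap, (load + t, i))
    loads = [sum(times[f] for f in c) for c in clusters]
    return clusters, loads

# Made-up execution times for five functions, split over two clusters.
times = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
clusters, loads = balance(times, 2)
print(clusters, loads)
```

In a pipeline the slowest stage limits throughput, so the quantity to keep small is the maximum cluster load.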

...

The paper describes three ways of structuring the parallel execution.

- Master-Slave

- Workpile

- Pipeline

The paper uses the Pipeline approach.

...

The first is the heterogeneous pipeline in which each stage of the pipeline handles a different function cluster. The second is called the homogeneous pipeline where each stage executes the same code.
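The two pipeline shapes can be sketched with threads and queues. This is an illustrative structure only, not the paper's implementation: `run_pipeline` and the stage functions are hypothetical, and a heterogeneous pipeline is modelled by passing different functions per stage, a homogeneous one by passing the same function for every stage.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def run_pipeline(stages, items):
    """Run items through one thread per stage, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(func, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # forward shutdown to the next stage
                return
            outbox.put(func(item))

    threads = [
        threading.Thread(target=worker, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Heterogeneous: each stage runs different code.
hetero = run_pipeline([lambda x: x + 1, lambda x: x * 2, str], [1, 2, 3])
# Homogeneous: every stage runs the same code.
homo = run_pipeline([lambda x: x * 2] * 3, [1, 2, 3])
print(hetero, homo)
```

Each stage has a single worker and the queues are FIFO, so output order matches input order. In CPython the global interpreter lock means this shows the structure rather than a real multi-core speedup.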
