01_05. Cluster Topology

핀아의 저장소 ( •̀ ω •́ )✧

_핀아_ 2023. 5. 13. 02:10

[Execution Flow]

The Driver Program creates a SparkContext, which starts the Spark application. The SparkContext connects to the Cluster Manager, and the Cluster Manager allocates resources.
The Cluster Manager acquires Executors on the nodes in the cluster; these Executors perform the computations and store the data.
The SparkContext sends the tasks to be run to the Executors, and the results those tasks produce are sent back to the Driver Program.
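
The flow above can be imitated with a toy model in plain Python. This is only a sketch and does not use Spark: the thread pool stands in for the Executors handed out by the Cluster Manager, and the names `split_into_partitions`, `run_task`, and `driver_program` are hypothetical helpers made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: slice the data into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_task(partition):
    """Executor side: compute on one partition and return a partial result."""
    return sum(x * x for x in partition)


def driver_program(data, num_executors=3):
    """Toy driver: hand out one task per partition, then combine the results."""
    partitions = split_into_partitions(data, num_executors)
    # The pool plays the role of the executors allocated to the application.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        # Each task runs on an "executor"; its result is returned to the driver.
        partial_results = list(executors.map(run_task, partitions))
    # The driver combines the partial results into the final answer.
    return partial_results, sum(partial_results)


partial_results, total = driver_program(list(range(1, 11)))
print(partial_results, total)
```

In real Spark the Executors are separate processes, usually on other machines, so the tasks and their results travel over the network rather than staying in one process as they do here.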

RDD.foreach(lambda x: print(x))

For details, see the following post:

01_03. RDD Transformations and Actions

mydb-lib.tistory.com

foods = sc.parallelize(["짜장면", "마라탕", ...])
three = foods.take(3)
