Spark's journey begins the moment you type this command in a terminal and press Enter. The bin/spark-submit script behind it is essentially a one-line wrapper:
```bash
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
```
In spark-class, the key line hands everything over to the Java-side launcher, which assembles the JVM command that will actually be run:
```bash
"$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
```
In org.apache.spark.launcher.Main:
```java
if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
  try {
    builder = new SparkSubmitCommandBuilder(args);
  } catch (IllegalArgumentException e) {
    ...
  }
} else {
  builder = new SparkClassCommandBuilder(className, args);
}
```
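(A hedged aside: besides backing these scripts, the org.apache.spark.launcher package is also Spark's public programmatic launcher. The sketch below is not part of the trace in this post; the jar path, main class, and master are placeholders. Its launch() call forks a spark-submit child process, which then goes through exactly the chain described here.)

```scala
// Hedged sketch: submitting an application through the public SparkLauncher API.
// The jar, class name, and master are placeholders; SPARK_HOME must point at a
// Spark installation so the launcher can find bin/spark-submit.
import org.apache.spark.launcher.SparkLauncher

object LaunchExample {
  def main(args: Array[String]): Unit = {
    val sparkProcess = new SparkLauncher()
      .setSparkHome(sys.env("SPARK_HOME"))    // where bin/spark-submit lives
      .setAppResource("/path/to/app.jar")     // placeholder application jar
      .setMainClass("com.example.WordCount")  // placeholder --class value
      .setMaster("local[*]")
      .launch()                               // forks a spark-submit child process
    sparkProcess.waitFor()                    // wait for the application to finish
  }
}
```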
The method checks which class is being launched and validates the arguments that were passed in; the chosen builder is what assembles the final java command. For a job submission the class is org.apache.spark.deploy.SparkSubmit, and its main method looks like this:
```scala
override def main(args: Array[String]): Unit = {
  ...
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
```
Job submission takes the SparkSubmitAction.SUBMIT branch into the submit method; skipping the intermediate logic, the call eventually lands in runMain:
```scala
private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sparkConf: SparkConf,
    childMainClass: String,
    verbose: Boolean): Unit = {
  ...
  var mainClass: Class[_] = null

  try {
    mainClass = Utils.classForName(childMainClass)
  } catch {
    ...
  }

  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    mainClass.newInstance().asInstanceOf[SparkApplication]
  } else {
    // SPARK-4170
    if (classOf[scala.App].isAssignableFrom(mainClass)) {
      printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
    }
    new JavaMainApplication(mainClass)
  }
  ...
  try {
    app.start(childArgs.toArray, sparkConf)
  } catch {
    ...
  }
}
```
childMainClass is the class given after the --class option on the command line. Since what was submitted is plain Scala code with a main method rather than an implementation of SparkApplication, runMain wraps it in a JavaMainApplication and calls its start method:
```scala
override def start(args: Array[String], conf: SparkConf): Unit = {
  val mainMethod = klass.getMethod("main", new Array[String](0).getClass)
  ...
  val sysProps = conf.getAll.toMap
  sysProps.foreach { case (k, v) =>
    sys.props(k) = v
  }

  mainMethod.invoke(null, args)
}
```
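To make that reflective call concrete, here is a small standalone sketch (toy code, not from Spark or the post): a Scala object compiles to a class with a static main(String[]) forwarder, which is why the method can be invoked with a null receiver.

```scala
// Standalone illustration of the reflective call above (not Spark source).
object Hello {
  def main(args: Array[String]): Unit =
    println(s"hello, ${args.mkString(" ")}")
}

object ReflectionDemo {
  def main(args: Array[String]): Unit = {
    // Look up the compiled class of the Scala object and its static main forwarder.
    val klass = Class.forName("Hello")
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    // Receiver is null because main is static; the String[] is passed as the single argument.
    mainMethod.invoke(null, Array("from", "reflection"))
  }
}
```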
In the end, the class's main method is invoked via reflection. Inside that main method a SparkContext is created, and creating it is what brings the Driver to life.
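To close the loop, here is a minimal sketch of the kind of user application whose main method gets invoked this way (package, class, and input path are placeholders, not taken from the post). The new SparkContext(conf) line is the moment the Driver comes up.

```scala
// Hypothetical user application: this is the --class target that JavaMainApplication
// eventually invokes via reflection. Package, object, and paths are placeholders.
package com.example

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    // Creating the SparkContext is what actually brings up the Driver.
    val sc = new SparkContext(conf)

    val counts = sc.textFile(args(0))   // e.g. an input text file passed as the first argument
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```

Submitted with --class com.example.WordCount, this is the childMainClass that the whole chain above exists to reach.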