spark源码阅读1:一切源于spark-submit

在终端上输入这个指令并敲下回车时,spark的旅程就开始了。

exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"

在spark-class中:

"$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"

在org.apache.spark.launcher.Main中:

if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
  try {
    builder = new SparkSubmitCommandBuilder(args);
  } catch (IllegalArgumentException e) {
    ...
  }
} else {
  builder = new SparkClassCommandBuilder(className, args);
}

方法中对调用的类进行了判断,并对传入的参数进行了验证。提交任务的类是org.apache.spark.deploy.SparkSubmit,该类的main方法:

override def main(args: Array[String]): Unit = {
  ...
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}

提交任务调用的是submit方法,忽略中间的逻辑,最终调用的是runMain方法:

private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sparkConf: SparkConf,
    childMainClass: String,
    verbose: Boolean): Unit = {
  ...

  var mainClass: Class[_] = null

  try {
    mainClass = Utils.classForName(childMainClass)
  } catch {
    ...
  }

  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    mainClass.newInstance().asInstanceOf[SparkApplication]
  } else {
    // SPARK-4170
    if (classOf[scala.App].isAssignableFrom(mainClass)) {
      printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
    }
    new JavaMainApplication(mainClass)
  }

  ...

  try {
    app.start(childArgs.toArray, sparkConf)
  } catch {
    ...
  }
}

childMainClass就是指令中参数–class后面的类,因为提交的是scala的代码,所以创建的是JavaMainApplication,并调用了start方法:

override def start(args: Array[String], conf: SparkConf): Unit = {
  val mainMethod = klass.getMethod("main", new Array[String](0).getClass)
  ...

  val sysProps = conf.getAll.toMap
  sysProps.foreach { case (k, v) =>
    sys.props(k) = v
  }

  mainMethod.invoke(null, args)
}

最后通过反射的方式,调用了该类的main方法。main方法中创建了SparkContext,随之启动了Driver。

发表评论