Writing a Spark Program in IDEA - EW帮帮网

Step 1: Create a Maven project

Open IntelliJ IDEA and choose File > New > Project.
Select Maven, check Create from archetype, and choose org.apache.maven.archetypes:maven-archetype-quickstart.
Enter a GroupId (for example com.example) and an ArtifactId (for example spark-example), then click Next.
Review the Maven settings and click Finish.

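Step 2: Add the Spark dependencies

Add the following dependencies to pom.xml. The _2.12 suffix in each artifactId is the Scala binary version, and it must match the Scala SDK used by the project: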

xml

<dependencies>
    <!-- Spark Core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.4.1</version> <!-- adjust to match your Spark version -->
    </dependency>
    
    <!-- Spark SQL (provides SparkSession, which the example in step 3 uses) -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.4.1</version>
    </dependency>
    
    <!-- Spark Streaming (optional) -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.12</artifactId>
        <version>3.4.1</version>
    </dependency>
</dependencies>
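
Because the _2.12 suffix has to agree with the Scala version actually on the classpath, a quick check once the dependencies have resolved can save some confusion. The following is a minimal sketch, assuming the dependencies above are available to the project; the object name VersionCheck is made up for illustration.

```scala
import org.apache.spark.SPARK_VERSION

// Hypothetical helper object, not part of the original project
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Scala version the program is running on
    val scalaVersion = scala.util.Properties.versionNumberString
    println(s"Scala version: $scalaVersion")

    // Spark version resolved from the spark-core dependency
    println(s"Spark version: $SPARK_VERSION")

    // The artifacts above end in _2.12, so any other Scala line signals a mismatch
    if (!scalaVersion.startsWith("2.12.")) {
      println("Warning: the Scala version does not match the _2.12 artifact suffix")
    }
  }
}
```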

Step 3: Write the Spark program

Create a Scala or Java class and write the Spark program in it. Here is a simple Scala example:

scala

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]") // local mode, using all CPU cores
      .getOrCreate()

    // Read the text file as an RDD of lines
    val textFile = spark.sparkContext.textFile("src/main/resources/input.txt")

    // Count the occurrences of each word
    val counts = textFile
      .flatMap(line => line.split("\\s+")) // split on any run of whitespace
      .filter(word => word.nonEmpty)       // drop empty tokens, e.g. from leading spaces
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Print the results on the driver
    counts.collect().foreach(println)

    // Stop the SparkSession
    spark.stop()
  }
}
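
Since step 2 also pulls in spark-sql, the same count can be expressed with the Dataset API instead of RDDs. The version below is only a sketch under the same assumptions as the example above (local mode, the same input path); the object name WordCountDataset is invented here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical Dataset-based variant of the word count example
object WordCountDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountDataset")
      .master("local[*]")
      .getOrCreate()

    // Brings the encoders needed for Dataset[String] operations into scope
    import spark.implicits._

    // Each line of the file becomes one row in a single column named "value"
    val lines = spark.read.textFile("src/main/resources/input.txt")

    val counts = lines
      .flatMap(line => line.split("\\s+")) // one row per word
      .filter(word => word.nonEmpty)       // ignore empty tokens
      .groupBy("value")                    // group identical words
      .count()                             // adds a "count" column

    // Prints a table with the columns "value" and "count"
    counts.show()

    spark.stop()
  }
}
```

The RDD version returns tuples that are printed directly, while this one yields a DataFrame, which is convenient if the counts are to be sorted, filtered, or written out with Spark SQL afterwards.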

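Step 4: Configure the run environment

Add Scala support:
- If the project is not recognized as a Scala project, right-click the project > Add Framework Support and check Scala (this requires the Scala plugin for IntelliJ IDEA).
- Download and configure a Scala SDK whose version is compatible with Spark (Scala 2.12.x for the _2.12 artifacts above).
Set the run parameters:
- Choose Run > Edit Configurations.
- Add a new Application configuration and set:
  - Main class: WordCount (or the name of your main class).
  - VM options (optional): -Xmx2g (sets the maximum heap size).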
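
To confirm that the VM option is being picked up by the run configuration, the heap limit can be printed from inside the JVM. This is a small sketch with an invented object name, HeapCheck; the reported figure is usually somewhat below the -Xmx value, so treat it as a rough check.

```scala
// Hypothetical helper for checking the effective heap limit
object HeapCheck {
  // Maximum amount of memory the JVM will attempt to use, in MiB
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMiB MiB")
}
```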

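Step 5: Run the program

Create the file src/main/resources/input.txt under the project root and add some test text. The program reads this relative path from the working directory, which is the project root in a default IDEA run configuration.
Click the Run button or use the shortcut (for example Shift + F10) to run the program.
Check the console output to verify the word counts.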
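
One way to verify the counts for a small input is to compute them again with ordinary Scala collections and compare. The sketch below uses an invented object, WordCountReference, and a made-up two-line sample; it splits on whitespace and ignores empty tokens.

```scala
// Hypothetical reference implementation using plain Scala collections
object WordCountReference {
  // Split every line into words, then count how often each word occurs
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Made-up sample text standing in for input.txt
    val sample = Seq("hello spark", "hello idea")
    // Sort by word so the output order is stable
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

For the sample above this yields hello twice and idea and spark once each, which is what the Spark job should report for the same text.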

Step 6: Package and submit to a cluster (optional)

To run on a Spark cluster, package the project first.

Add the packaging plugin to pom.xml:

xml

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.4.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <filters>
              <filter>
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                </excludes>
              </filter>
            </filters>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

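Run mvn clean package to build the JAR file. Maven compiles Scala sources only when a Scala compiler plugin such as scala-maven-plugin is configured in the build, so add one if WordCount is written in Scala.

Submit the JAR to the cluster with spark-submit. Before doing so, remove the hard-coded .master("local[*]") from the program, because a master set in code takes precedence over the --master flag, and replace the local input path with one the cluster can read: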

bash

# --master can also be a standalone URL such as spark://host:port
spark-submit \
  --class "WordCount" \
  --master yarn \
  --deploy-mode cluster \
  /path/to/your-jar/spark-example-1.0-SNAPSHOT.jar
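
A cluster-friendly variant of the step 3 program might look like the following sketch. It leaves the master to spark-submit and takes the input location as a command-line argument. The object name WordCountCluster and the convention that args(0) holds the input path are assumptions made for this example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical variant of WordCount intended for spark-submit
object WordCountCluster {
  def main(args: Array[String]): Unit = {
    // Assumed convention: the first argument is the input path (e.g. an HDFS URI)
    require(args.length >= 1, "usage: WordCountCluster <input-path>")
    val inputPath = args(0)

    // No .master(...) here: spark-submit supplies it through --master
    val spark = SparkSession.builder()
      .appName("WordCount")
      .getOrCreate()

    val counts = spark.sparkContext.textFile(inputPath)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // take(20) avoids pulling an arbitrarily large result back to the driver
    counts.take(20).foreach(println)

    spark.stop()
  }
}
```

With this variant, --class would name WordCountCluster and the input path would follow the JAR path on the spark-submit command line. In cluster deploy mode the printed lines end up in the driver's log on the cluster rather than in the submitting terminal.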
