scala – Spark:按组排序记录?

我有一组记录,我需要:

1)按’日期’,’城市’和’亲切’分组

2)按奖项对每组进行排序

在我的代码中:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Sort {

  case class Record(name:String, day: String, kind: String, city: String, prize:Int)

  val recs = Array (
      Record("n1", "d1", "k1", "c1", 10),
      Record("n1", "d1", "k1", "c1", 9),
      Record("n1", "d1", "k1", "c1", 8),
      Record("n2", "d2", "k2", "c2", 1),
      Record("n2", "d2", "k2", "c2", 2),
      Record("n2", "d2", "k2", "c2", 3)
      )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Test")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)
    val rs = sc.parallelize(recs)
    val rsGrp = rs.groupBy(r => (r.day, r.kind, r.city)).map(_._2)
    val x = rsGrp.map{r => 
      val lst = r.toList
      lst.map{e => (e.prize, e)}
      }
    x.sortByKey()
  }

}

当我尝试对组进行排序时,我收到错误:

value sortByKey is not a member of org.apache.spark.rdd.RDD[List[(Int, 
 Sort.Record)]]

怎么了?怎么排序?

您需要定义一个Key,然后mapValues对它们进行排序.

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._

  object Sort {

    case class Record(name:String, day: String, kind: String, city: String, prize:Int)

    // Define your data

    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("Test")
        .setMaster("local")
        .set("spark.executor.memory", "2g")
      val sc = new SparkContext(conf)
      val rs = sc.parallelize(recs)

      // Generate pair RDD neccesary to call groupByKey and group it
      val key: RDD[((String, String, String), Iterable[Record])] = rs.keyBy(r => (r.day, r.city, r.kind)).groupByKey

      // Once grouped you need to sort values of each Key
      val values: RDD[((String, String, String), List[Record])] = key.mapValues(iter => iter.toList.sortBy(_.prize))

      // Print result
      values.collect.foreach(println)
    }
}
翻译自:https://stackoverflow.com/questions/28543510/spark-sort-records-in-groups

转载注明原文:scala – Spark:按组排序记录?