Programming Q&A   Published: 2022-06-02   Source site: 大佬教程  code.js-code.com
How to resolve "Caused by: MatchError: [Ljava.lang.String;@21a536b1 (of class [Ljava.lang.String;)" when joining 2 DataFrames?

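I am trying to join 2 DataFrames in Apache Spark using Scala in a Databricks environment. When I join these 2 DataFrames I get an error, and I cannot figure out what the problem is or how to fix it. Any help is much appreciated.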

First input file

   %scala
   import org.apache.spark.sql.types._
   import org.apache.spark.sql.functions._

   // Read the space-separated file and parse each line into three Int columns.
   val rawUserArtistData = sc.textFile("/FileStore/tables/user_artist_data.txt")
   val rawUserArtistDataDF = rawUserArtistData.map(_.split(" ")).map { case Array(a, b, c) =>
     (a.toInt, b.toInt, c.toInt) }.toDF("userID", "artist_ID", "playcount")

   rawUserArtistDataDF.show() 

Output

   +-------+---------+---------+
   | userID|artist_ID|playcount|
   +-------+---------+---------+
   |1000002|        1|       55|
   |1000002|  1000006|       33|
   |1000002|  1000007|        8|
   |1000002|  1000009|      144|
   |1000002|  1000010|      314|
   |1000002|  1000013|        8|
   |1000002|  1000014|       42|
   |1000002|  1000017|       69|
   |1000002|  1000024|      329|
   |1000002|  1000025|        1|
   +-------+---------+---------+

Second file

 %scala
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.functions._

 // Read the tab-separated file and parse each line into an (Int, String) pair.
 val rawArtistData = sc.textFile("/FileStore/tables/artist_data.txt")
 val rawArtistDataDF = rawArtistData.map(_.split("\t")).map { case Array(a, b) =>
   (a.toInt, b) }.toDF("artistID", "artist_name")

 rawArtistDataDF.show(10, false)

Output

 +--------+---------------------------------+
 |artistID|artist_name                      |
 +--------+---------------------------------+
 |1134999 |06Crazy life                     |
 |6821360 |Pang Nakarin                     |
 |10113088|Terfel, Bartoli- Mozart: Don     |
 |10151459|The Flaming Sidebur              |
 |6826647 |Bodenstandig 3000                |
 |10186265|Jota Quest e Ivete Sangalo       |
 |6828986 |Toto_XX (1977                    |
 |10236364|U.S Bombs -                      |
 |1135000 |artist formaly know as Mat       |
 |10299728|Kassierer - Musik für beide Ohren|
 +--------+---------------------------------+ 

Code to join the DataFrames

%scala

val CombinedDF = rawUserArtistDataDF.join(rawArtistDataDF,rawUserArtistDataDF("artist_ID") === rawArtistDataDF("artistID"),"leftouter")

CombinedDF.show()

Error

 Job aborted due to stage failure.
 Caused by: MatchError: [Ljava.lang.String;@21a536b1 (of class [Ljava.lang.String;)
at $line72e2ce7142694dbeb5cc11da58bc59cb37.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.$anonfun$rawArtistDataDF$2(command-1764271964671849:5)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:754)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:155)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:148)
at org.apache.spark.scheduler.Task.run(Task.scala:117)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:732)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1643)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:735)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
 Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2766)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2713)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2707)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2707)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1256)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2974)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2915)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2903)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: scala.MatchError: [Ljava.lang.String;@21a536b1 (of class [Ljava.lang.String;)
    at $line72e2ce7142694dbeb5cc11da58bc59cb37.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.$anonfun$rawArtistDataDF$2(command-1764271964671849:5)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:754)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:155)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:148)
    at org.apache.spark.scheduler.Task.run(Task.scala:117)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:732)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1643)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:735)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
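
The `$anonfun$rawArtistDataDF$2` frame in the trace shows that the `scala.MatchError` comes from the pattern match that builds `rawArtistDataDF`, not from the join itself. The following is a minimal sketch, runnable in a Scala REPL or notebook cell without Spark, of how `case Array(a, b)` fails on any line that does not split into exactly two fields. The sample lines and the name `strictParse` are made up for illustration and are not taken from the real file.

```scala
import scala.util.Try

// Hypothetical sample lines: one well-formed, one without a tab, one with an extra tab.
val sampleLines = Seq("1134999\t06Crazy Life", "Aerosmith", "15\tfoo\tbar")

// Same shape as the failing code: the pattern only covers arrays of length 2,
// so any other array length throws scala.MatchError.
def strictParse(line: String): (Int, String) =
  line.split("\t") match { case Array(a, b) => (a.toInt, b) }

// Wrap each call in Try so the failures can be inspected instead of aborting.
val results = sampleLines.map(line => Try(strictParse(line)))
results.foreach(println)
```

Only the first sample parses; the other two fail with a `MatchError` whose message prints the array as `[Ljava.lang.String;@...`, the same form seen in the trace.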

Solution

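The problem was in my second file. The stack trace points at `rawArtistDataDF`, where the `case Array(a, b)` pattern throws a `MatchError` for any line that does not split into exactly two tab-separated fields. The error only surfaced at the join because Spark evaluates lazily and the earlier `show` call needed only the first few rows. I solved it by parsing the file as follows, skipping malformed lines: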

%scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val rawArtistData = sc.textFile("/FileStore/tables/artist_data.txt")

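// flatMap drops every line that yields None, so malformed lines are skipped.
val rawArtistDataDF = rawArtistData.flatMap { line =>
  // Split at the first tab: id is the text before it, name starts with the tab.
  val (id, name) = line.span(_ != '\t')
  if (name.isEmpty) {
    None  // the line contains no tab
  } else {
    try {
      Some((id.toInt, name.trim))
    } catch {
      case _: NumberFormatException => None  // the id is not an integer
    }
  }
}.toDF("artistID", "artist_name")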
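
To see what the parsing logic above keeps and drops, the same steps can be tried on a few strings outside Spark. This is only a sketch: `parseArtistLine` is a hypothetical helper name, the sample lines are invented, and `Try(...).toOption` stands in for the `try`/`catch` block (it also swallows other non-fatal exceptions, which does not matter for `toInt`).

```scala
import scala.util.Try

// Hypothetical helper mirroring the flatMap body above, without Spark.
def parseArtistLine(line: String): Option[(Int, String)] = {
  // span splits at the first tab; name is empty when the line has no tab.
  val (id, name) = line.span(_ != '\t')
  if (name.isEmpty) None
  else Try(id.toInt).toOption.map(n => (n, name.trim))
}

// Invented samples: valid, no tab, non-numeric id, extra tab inside the name.
val samples = Seq("1134999\t06Crazy Life", "Aerosmith", "abc\tUnknown", "15\tfoo\tbar")

// flatMap keeps only the lines that parsed successfully.
val parsed = samples.flatMap(line => parseArtistLine(line))
parsed.foreach(println)
```

Because `span` splits only at the first tab, a line with an extra tab is kept with the remaining tab inside the name, whereas the original `split("\t")` pattern rejected it with a `MatchError`.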
