sql >> Database >  >> RDS >> Mysql

Twee dataframes transformeren in spark sql

Hier is de voorbeeldcode die u kunt gebruiken om deze uitvoer te genereren:

De code ziet er als volgt uit:

val df1=sc.parallelize(Seq((0,"v1"),(0,"v2"),(1,"v3"),(1,"v1"))).toDF("id","values")
val df2=sc.parallelize(Seq((0,"a1","b1","-"),(1,"a2","-","b2"))).toDF("id","v1","v2","v3")
val joinedDF=df1.join(df2,"id")
val resultDF=joinedDF.rdd.map{row=>
val id=row.getAs[Int]("id")
val values=row.getAs[String]("values")
val feilds=row.getAs[String](values)
(id,values,feilds)
}.toDF("id","values","feilds")

Tijdens het testen op de console:

scala> val df1=sc.parallelize(Seq((0,"v1"),(0,"v2"),(1,"v3"),(1,"v1"))).toDF("id","values")
df1: org.apache.spark.sql.DataFrame = [id: int, values: string]

scala> df1.show
+---+------+
| id|values|
+---+------+
|  0|    v1|
|  0|    v2|
|  1|    v3|
|  1|    v1|
+---+------+


scala> val df2=sc.parallelize(Seq((0,"a1","b1","-"),(1,"a2","-","b2"))).toDF("id","v1","v2","v3")
df2: org.apache.spark.sql.DataFrame = [id: int, v1: string ... 2 more fields]

scala> df2.show
+---+---+---+---+
| id| v1| v2| v3|
+---+---+---+---+
|  0| a1| b1|  -|
|  1| a2|  -| b2|
+---+---+---+---+


scala> val joinedDF=df1.join(df2,"id")
joinedDF: org.apache.spark.sql.DataFrame = [id: int, values: string ... 3 more fields]

scala> joinedDF.show
+---+------+---+---+---+                                                        
| id|values| v1| v2| v3|
+---+------+---+---+---+
|  1|    v3| a2|  -| b2|
|  1|    v1| a2|  -| b2|
|  0|    v1| a1| b1|  -|
|  0|    v2| a1| b1|  -|
+---+------+---+---+---+


scala> val resultDF=joinedDF.rdd.map{row=>
     | val id=row.getAs[Int]("id")
     | val values=row.getAs[String]("values")
     | val feilds=row.getAs[String](values)
     | (id,values,feilds)
     | }.toDF("id","values","feilds")
resultDF: org.apache.spark.sql.DataFrame = [id: int, values: string ... 1 more field]

scala> 

scala> resultDF.show
+---+------+------+                                                             
| id|values|feilds|
+---+------+------+
|  1|    v3|    b2|
|  1|    v1|    a2|
|  0|    v1|    a1|
|  0|    v2|    b1|
+---+------+------+

Ik hoop dat dit uw probleem kan zijn. Bedankt!




  1. LIMIT trefwoord op MySQL met voorbereide verklaring

  2. Verbinden door in Oracle SQL

  3. Retourneer de parameters van een opgeslagen procedure of door de gebruiker gedefinieerde functie in SQL Server (T-SQL-voorbeelden)

  4. SQL JOINs-zelfstudie met voorbeelden