Ja, dat is het . U kunt het op 2 manieren in Spark SQL laten werken door een nieuwe kolom te gebruiken in GROUP BY
en ORDER BY
clausules
Benader 1 met behulp van subquery :
SELECT timeHour, someThing FROM (SELECT
from_unixtime((starttime/1000)) AS timeHour
, sum(...) AS someThing
, starttime
FROM
some_table)
WHERE
starttime >= 1000*unix_timestamp('2017-09-16 00:00:00')
AND starttime <= 1000*unix_timestamp('2017-09-16 04:00:00')
GROUP BY
timeHour
ORDER BY
timeHour
LIMIT 10;
Benadering 2 met behulp van WITH // elegante manier :
-- create alias
WITH table_aliase AS(SELECT
from_unixtime((starttime/1000)) AS timeHour
, sum(...) AS someThing
, starttime
FROM
some_table)
-- use the same alias as table
SELECT timeHour, someThing FROM table_aliase
WHERE
starttime >= 1000*unix_timestamp('2017-09-16 00:00:00')
AND starttime <= 1000*unix_timestamp('2017-09-16 04:00:00')
GROUP BY
timeHour
ORDER BY
timeHour
LIMIT 10;
Alternatief met Spark DataFrame(wo SQL) API met Scala:
// This code may need additional import to work well
val df = .... //load the actual table as df
import org.apache.spark.sql.functions._
df.withColumn("timeHour", from_unixtime($"starttime"/1000))
.groupBy($"timeHour")
.agg(sum("...").as("someThing"))
.orderBy($"timeHour")
.show()
//another way - as per eliasah comment
df.groupBy(from_unixtime($"starttime"/1000).as("timeHour"))
.agg(sum("...").as("someThing"))
.orderBy($"timeHour")
.show()