How to explode an Array[String] field and group data in one pass


I'm new to Scala and Spark, and I don't know how to explode the "path" field and find the max and min of the "event_dttm" field in one pass. I have this data:

import sqlContext.implicits._  // needed for .toDF on an RDD (use spark.implicits._ on Spark 2.x)

val weblog = sc.parallelize(Seq(
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424954530, "so.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424964830, "so.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424978445, "so.01")
)).toDF("id", "domain", "path", "event_dttm", "load_src")

I need the following result:

"id"        |   "domain"   |"newpath" | "max_time" | min_time   | "load_src" 39f0412b4c91|staticnavi.com|  panel   | 1424978445 | 1424954530 | so.01 39f0412b4c91|staticnavi.com|  cm.html | 1424978445 | 1424954530 | so.01 

I think it's possible to do this via a Row function, but I don't know how.

You're looking for explode(), followed by a groupBy aggregation:

import org.apache.spark.sql.functions.{explode, min, max}
import sqlContext.implicits._  // for the $"..." column syntax (spark.implicits._ on Spark 2.x)

val result = weblog
  .withColumn("path", explode($"path"))
  .groupBy("id", "domain", "path", "load_src")
  .agg(min($"event_dttm").as("min_time"),
       max($"event_dttm").as("max_time"))

result.show()
+------------+--------------+-------+--------+----------+----------+
|          id|        domain|   path|load_src|  min_time|  max_time|
+------------+--------------+-------+--------+----------+----------+
|39f0412b4c91|staticnavi.com|  panel|   so.01|1424954530|1424978445|
|39f0412b4c91|staticnavi.com|cm.html|   so.01|1424954530|1424978445|
+------------+--------------+-------+--------+----------+----------+
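
If you want the output to match the question exactly (column named "newpath", with "max_time" before "min_time"), a minimal variant of the same query, just renaming the exploded column and reordering the output, would be something like:

val resultNamed = weblog
  .withColumn("newpath", explode($"path"))   // explode directly into a column named "newpath"
  .groupBy("id", "domain", "newpath", "load_src")
  .agg(max($"event_dttm").as("max_time"),
       min($"event_dttm").as("min_time"))
  .select("id", "domain", "newpath", "max_time", "min_time", "load_src")  // asked-for column order

resultNamed.show()

This is still a single explode plus one groupBy, so Spark computes both aggregates in one pass over the exploded rows.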
