I have an RDD[(Seq[String], Seq[String])] with some null values in the data. The RDD converted to a DataFrame looks like this:
    +----------+----------+
    |      col1|      col2|
    +----------+----------+
    |[111, aaa]|[xx, null]|
    +----------+----------+
The following is the sample code:
    val rdd = sc.parallelize(Seq((Seq("111","aaa"),Seq("xx",null))))
    val df = rdd.toDF("col1","col2")
    val keys = Array("col1","col2")
    val values = df.flatMap {
        case Row(t1: Seq[String], t2: Seq[String]) => Some((t1 zip t2).toMap)
        case Row(_, null) => None
    }
    val transposed = values.map(someFunc(keys))

    val schema = StructType(keys.map(name => StructField(name, DataTypes.StringType, nullable = true)))

    val transposedDf = sc.createDataFrame(transposed, schema)

    transposed.show()
It runs fine until the point where I create transposedDf, but as soon as I call show it throws the following error:
    scala.MatchError: null
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:294)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:97)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:260)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
        at org.apache.spark.sql.SQLContext$$anonfun$6.apply(SQLContext.scala:492)
        at org.apache.spark.sql.SQLContext$$anonfun$6.apply(SQLContext.scala:492)
If there are no null values in the RDD, the code works fine. I do not understand why it fails when there are null values, because I am specifying a schema of StringType with nullable = true. Am I doing something wrong? I am using Spark 1.6.1 and Scala 2.10.
The pattern match is performed linearly, in the order in which the cases appear in the source. So this line:

    case Row(t1: Seq[String], t2: Seq[String]) => Some((t1 zip t2).toMap)

which places no restrictions on the values inside t1 and t2, is tried first, and the null case that follows it is never reached for this data.

Effectively, put the null check before it and it should work.
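As a rough illustration of putting the null check first, here is a minimal plain-Scala sketch that runs without Spark. The object name NullFirstSketch and the helper pairUp are hypothetical, a plain tuple stands in for Row, and treating a null element inside either sequence as a reason to skip the whole row is an assumption rather than something stated in the question.

```scala
object NullFirstSketch {
  // Hypothetical stand-in for the Row match in the question: a plain tuple
  // of two sequences, so the sketch does not need Spark.
  def pairUp(row: (Seq[String], Seq[String])): Option[Map[String, String]] =
    row match {
      // Null checks come first, because cases are tried top to bottom.
      case (null, _) | (_, null) => None
      // Assumption: a null element inside either sequence also means "skip".
      case (t1, t2) if t1.contains(null) || t2.contains(null) => None
      // Only fully non-null input reaches the zip.
      case (t1, t2) => Some((t1 zip t2).toMap)
    }
}
```

With this ordering, a call such as NullFirstSketch.pairUp((Seq("111", "aaa"), Seq("xx", null))) falls into the guard case and yields None, so no null value reaches the later conversion to a row. If the rows with nulls need to be kept instead, the guard case would have to substitute a default value rather than return None.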