Apache Spark - Some(null) to StringType nullable scala.MatchError


I have an RDD[(Seq[String], Seq[String])] with null values in the data. The RDD converted to a DataFrame looks like this:

+----------+----------+
|      col1|      col2|
+----------+----------+
|[111, aaa]|[xx, null]|
+----------+----------+

Here is the sample code:

val rdd = sc.parallelize(Seq((Seq("111", "aaa"), Seq("xx", null))))
val df = rdd.toDF("col1", "col2")
val keys = Array("col1", "col2")
val values = df.flatMap {
    case Row(t1: Seq[String], t2: Seq[String]) => Some((t1 zip t2).toMap)
    case Row(_, null) => None
}
val transposed = values.map(someFunc(keys))

val schema = StructType(keys.map(name => StructField(name, DataTypes.StringType, nullable = true)))

val transposedDf = sqlContext.createDataFrame(transposed, schema)

transposedDf.show()

It runs fine up to the point where I create transposedDf, but as soon as I call show it throws the following error:

scala.MatchError: null
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:294)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:97)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:260)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
        at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
        at org.apache.spark.sql.SQLContext$$anonfun$6.apply(SQLContext.scala:492)
        at org.apache.spark.sql.SQLContext$$anonfun$6.apply(SQLContext.scala:492)

If there are no null values in the RDD, the code works fine. I don't understand why it fails when there are null values, because I am specifying the schema as StringType with nullable = true. Am I doing something wrong? I am using Spark 1.6.1 and Scala 2.10.

A pattern match is performed linearly, in the order the cases appear in the source, so this line:

case Row(t1: Seq[String], t2: Seq[String]) => Some((t1 zip t2).toMap)

which puts no restrictions on the values inside t1 and t2, will match regardless of any null values they contain, so the later null-handling case never gets a chance to match.
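A minimal standalone illustration of this (plain Scala, no Spark; names are hypothetical): a type pattern such as `s: Seq[String]` is erased at runtime, so a sequence that contains null elements still matches it, and a later case intended for nulls is only reached when the whole value is null.

```scala
// Sketch: type patterns are erased, so Seq("xx", null) matches `Seq[String]`
// even though one of its elements is null.
object ErasedMatch {
  def classify(value: Any): String = value match {
    case s: Seq[String @unchecked] => s"seq of ${s.size} elements" // matches first
    case null                      => "null"                       // only for a null value itself
    case _                         => "other"
  }
}
```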

Effectively, put the null check before it and it should work.
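A hedged sketch of that fix (plain Scala, no Spark; `zipCols` is a hypothetical stand-in for the flatMap body): run the null check in a guard before the broad pattern, so pairs containing nulls are handled first rather than silently zipped.

```scala
// Sketch: the null check comes first, so pairs with null columns or null
// elements are rejected before the broad case can claim them.
object NullFirst {
  def zipCols(t1: Seq[String], t2: Seq[String]): Option[Map[String, String]] =
    (t1, t2) match {
      case (a, b) if a == null || b == null ||
                     a.contains(null) || b.contains(null) => None // null check first
      case (a, b) => Some((a zip b).toMap)                        // safe to zip now
    }
}
```

Whether to drop such pairs entirely (as here) or substitute a placeholder value is a design choice; the essential point is only the ordering of the cases.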
