python – How to change a DataFrame column from String type to Double type in PySpark


I have a DataFrame with a column of string type.
I want to change the column type to Double in PySpark.

Here is what I did:

toDoublefunc = UserDefinedFunction(lambda x: x, DoubleType())
changedTypedf = joindf.withColumn("label", toDoublefunc(joindf["show"]))

I would like to know whether this is the right way to do it. I am getting
an error when running the result through Logistic Regression, so I wonder
whether this conversion is the cause of the trouble.

There is no need for a UDF here. Column already provides a cast method that accepts a DataType instance:

from pyspark.sql.types import DoubleType

changedTypedf = joindf.withColumn("label", joindf["show"].cast(DoubleType()))

or a short string:

changedTypedf = joindf.withColumn("label", joindf["show"].cast("double"))
http://stackoverflow.com/questions/32284620/how-to-change-a-dataframe-column-from-string-type-to-double-type-in-pyspark
