python – How to change a DataFrame column from String type to Double type in pyspark

I have a DataFrame with a column of type String.
I want to change that column's type to Double in pyspark.

Here is what I did:

toDoublefunc = UserDefinedFunction(lambda x: x,DoubleType())
changedTypedf = joindf.withColumn("label",toDoublefunc(joindf['show']))

I just want to know whether this is the right way to do it. I get an
error when running Logistic Regression on the result, so I wonder
whether this conversion is the cause of the trouble.

There is no need for a UDF here. Column already provides a cast method that accepts a DataType instance:

from pyspark.sql.types import DoubleType

changedTypedf = joindf.withColumn("label", joindf["show"].cast(DoubleType()))

or a short type-name string:

changedTypedf = joindf.withColumn("label", joindf["show"].cast("double"))
