How can we change the column type of a DataFrame in PySpark?
Suppose we have a DataFrame df with a column num, and we want to cast this column to type double.
Column provides a cast() method to convert a column to a specified data type.
cast() and the singleton DataTypes
We can pass one of the PySpark DataTypes to cast() to change a column's type.
    from pyspark.sql.types import DoubleType
    df = df.withColumn("num", df["num"].cast(DoubleType()))
    # OR
    df = df.withColumn("num", df.num.cast(DoubleType()))
We can also use the col() function to perform the cast.
    from pyspark.sql.functions import col
    from pyspark.sql.types import DoubleType
    df = df.withColumn("num", col("num").cast(DoubleType()))
cast() and simple strings
We can also use simple strings.
    # No DataType import is needed when passing the type as a string
    df = df.withColumn("num", df["num"].cast("double"))
    # OR
    df = df.withColumn("num", df.num.cast("double"))
Get simple string from DataType
Here is a list of DataTypes and their simple strings.
    BinaryType: binary
    BooleanType: boolean
    ByteType: tinyint
    DateType: date
    DecimalType: decimal(10,0)
    DoubleType: double
    FloatType: float
    IntegerType: int
    LongType: bigint
    ShortType: smallint
    StringType: string
    TimestampType: timestamp
The simple string for any DataType can be obtained by calling simpleString() on an instance of it:
    from pyspark.sql import types
    simpleString = getattr(types, 'BinaryType')().simpleString()
    # OR
    from pyspark.sql.types import BinaryType
    simpleString = BinaryType().simpleString()
We can also write out simple strings for arrays and maps, for example array<int> and map<string,int>.