
How to Change a Column Type of a DataFrame in PySpark

This post shows how to change a column type of a DataFrame in PySpark. If you get stuck or have questions at any point, simply comment below.

How can we change the column type of a DataFrame in PySpark?

Suppose we have a DataFrame df with column num of type string.

Let’s say we want to cast this column into type double.

Luckily, Column provides a cast() method to convert columns into a specified data type.

Cast using cast() and the singleton DataType

We can use the PySpark DataTypes to cast a column type.

from pyspark.sql.types import DoubleType
df = df.withColumn("num", df["num"].cast(DoubleType()))
# OR
df = df.withColumn("num", df.num.cast(DoubleType()))

We can also use the col() function to perform the cast.

from pyspark.sql.functions import col
from pyspark.sql.types import DoubleType
df = df.withColumn("num", col("num").cast(DoubleType())) 

Cast using cast() and simple strings

We can also use simple strings.

df = df.withColumn("num", df["num"].cast("double"))
# OR
df = df.withColumn("num", df.num.cast("double"))

Get simple string from DataType

Here is a list of DataTypes and their simple strings.

BinaryType: binary
BooleanType: boolean
ByteType: tinyint
DateType: date
DecimalType: decimal(10,0)
DoubleType: double
FloatType: float
IntegerType: int
LongType: bigint
ShortType: smallint
StringType: string
TimestampType: timestamp

The simple string for any DataType can be obtained with simpleString(), either by looking the class up by name with getattr() or by importing it directly:

from pyspark.sql import types
simpleString = getattr(types, "BinaryType")().simpleString()
# OR
from pyspark.sql.types import BinaryType
simpleString = BinaryType().simpleString()

We can also write out simple strings for arrays and maps: array<int> and map<string,int>.

Read more about cast() in the PySpark documentation for Column.cast.


Before using these casts in a production environment, test them on representative data and check the result for errors.
