Change Python Pandas Column based on a Date using numpy.where() Method


Python Data Wrangling with Numpy and Pandas

I came across an interesting problem today. A data provider changed the decimal format of how they provide their financial market prices. This is problematic because my previous database of prices is different. This will cause any analysis we do with a combination of the new data and the old data to be way out of whack.

Luckily, there is a quick an easy way to reformat your data back into your desired result using the numpy.where() method. Numpy is a popular python library that is part of the python science stack and is especially utilized in work with arrays. We will also use pandas.

Let’s do a quick exercise

For example, if you have a simple and example data set of:

dateprice_close
2/1/20221.01
2/2/20221.02
2/3/20221.03
2/4/20221.04
2/5/20221.05

But the new data coming in looks like this:

dateprice_close
2/6/20220.00106
2/7/20220.00107
2/8/20220.00108
2/9/20220.00109
2/10/20220.0011

When you combine the data set, it will look like this:

dateprice_close
2/1/20221.01
2/2/20221.02
2/3/20221.03
2/4/20221.04
2/5/20221.05
2/6/20220.00106
2/7/20220.00107
2/8/20220.00108
2/9/20220.00109
2/10/20220.0011

Ok, so that is the data we have. Let’s turn that small data set into a Pandas dataframe to work with.

import pandas as pd
import numpy as np

datafolder = 'My_Folder_Location'
pricedata = pd.read_csv('{}/Test_Data_Set.csv'.format(datafolder), parse_dates=['date'])
df = pd.DataFrame(data=pricedata)
Here above, we import the python libraries we are going to use and then specify our folder location for the CSV data on our computer.

Next, we specify the CSV within that folder we want to work with (we tell pandas we are working with dates in the ‘date’ column by using parse_dates).

Lastly, we turn the set into a pandas DataFrame. And then we print it out.

print(df)

Output:

Ok, we can see from the data above that 2/6/2022 is the date where we want to change the data to match up with the first half of the dataframe.

We use the numpy.where() method to transform any data on or after that date by 3 decimal places using the code below.

df['price_close'] = np.where(df['date'] >= '2/6/2022', df['price_close']*1000, df['price_close'])
Explaining the code above: we are specifying that we want the price_close column (df[‘price_close’] =) to overwrite the current values based on a condition.

That condition is np.where(df[‘date’] >= ‘2/6/2022’, one that says, using the np or numpy.where method on the date column of the dataframe, if that date column is greater than or equal to 2/6/2022 then do something.

That something for our condition is to take our input after the comma which is df[‘price_close’]*1000. This says to take the price_close column values and multiply them by 1000 so we can move that decimal place over.

The last bit of that code after the second comma is df[‘price_close’]. This is the input that will take place if the condition is not true (if dates do not fall on or after 2/6/2022).

Let’s print the newly modified dataframe.
print(df)

And success! We have now correctly formatted the decimals for our prices.

More columns:

If you needed to change multiple columns based on that same condition, you can just add another line (or lines) of code with your desired parameters like the example below.

df['price_close'] = np.where(df['date'] >= '2/6/2022', df['price_close']*1000, df['price_close'])
df['second_column'] = np.where(df['date'] >= '2/6/2022', df['second_column']*1000, df['second_column'])

Article by Zachary Storella – See more programming posts on our Python Page