If you want to organize the dataframe into some logical groups and perform some transformation on a column using your custom logic, then Group By Transform is a perfect choice.
Let’s see how it works with the data
After reading tabular data into a dataframe, you can group it by a specific column. Further, you can transform any column inside a group and create a new derived column.
Are there any built-in functions to transform the data?
Yes. Obviously, you don’t have to start from scratch. There is a good library of methods that can be readily used for transformation. Any method that a Series object supports like min, max, sum, etc can be passed as an input parameter to the Transform method.
Can I use my custom function to transform the data?
Yes, of course, that’s the main advantage. Certainly, you can write your own function. Such a function should take a Series object as an input and return either a scalar value or another series object of the same size.
Apply vs Transform: How are they different?
Apply method treats each group as a dataframe, not as a series. In addition, it doesn’t return a series after executing the apply method. Contrarily, the transform method deals with only one column at a time. As a one-dimensional series object. You cannot work on two columns at a time.
Transform is the better choice when you have to derive a new column from one existing column.
Can I write group-specific logic in my custom function?
Ideally, the transformation logic is expected to be common across all the groups. But there is a way to include group-specific logic. You can differentiate the groups by looking at the group index. The name of the input Series object contains the group index.
Will the transform method overwrite the values of the column?
No. It doesn’t. It returns a new series. It’s up to you to append a new column to the data frame or replace an existing column.
Hope this clears some quick questions. Now with this level of clarity let’s look into a business case. The module that we have implemented for a retail client who runs in-store and e-commerce.
Whenever a supplier delivers the merchandise at the store, the store manager pays the transportation costs to the supplier. Divides that cost and adds it as an indirect cost to every unit of the product delivered in that batch. Transportation cost differs from supplier to supplier.
It is very important to revise the cost of the product at every stage of the supply chain. Else, the business would incur a heavy loss if some cost elements are missed while pricing the product.
It’s a good opportunity for the Python developers to add business value by preventing this loss.
Let me explain it in detail:
- Suppliers S1 and S2 have just delivered their respective products to the store. Read the stock details from the inventory.
- Group the dataframe by the Supplier. One group created for the supplier S1 and second group created for the supplier S2.
- Now what should be our Transformation column. Is it the Unit Cost or the Purchase Cost field? If Unit Cost is taken then the products having smaller lot will take bigger cost. So, let’s apportion based on total purchase cost so far. It takes both the Unit Cost and the Quantity in to consideration and seems to be more logical to absorb the indirect cost.
- Now define a custom function ‘AddIndirectCost’. The transportation cost was decided as $100 for supplier 1 and $150 for the supplier 2. This function applies this cost in the raito of the column slected for transformation.
- But the pricing team wants the cost attributed at the unit level. They are not bothered about the lot size or the previous purchase price. To get there, just calculate the total cost and divide it with the quantity of that product. Total cost here is the sum of the purchase cost and the transportation cost.
- That’s it we got what we want. The unit cost was revised after paying transportation costs to the respective suppliers.
We are able to instantly transform purchase cost into an indirect cost and revise the unit cost of the product. Thus contributing to the accuracy of pricing.
If you would like to share your business case on how you’ve used this method, please send the details to email@example.com. Our editorial team shall get back to you in adding it to this publication. Please contribute to the integration of knowledge towards building a new generation of fast learners. An example speaks a thousand paragraphs.