If you want to organize the dataframe into some logical groups and find the distinct values of each column within each group, then Pandas GroupBy Nunique is a perfect choice.
Let’s see how it works with the data.
This method groups the data as per the specified column and calculates how many distinct values are there in every other column inside each group.
Does it work with both text and numeric data types?
Yes. The column data type could be anything. But if the column values are unique and not much repetitive then the distinct count may not be much meaningful. So, this method suits well for categorical column types.
Are the distinct counts at an individual column level or at the combination of all columns?
The distinct counts are exclusive to each column. Even if you drop a few columns in your analysis that doesn’t impact the counts of other columns.
Is it possible to get the distinct counts of a specific combination of columns?
Yes. It can be done by applying some workaround. On the fly, you can concatenate the values of the required columns and calculate the distinct counts of this temporary column.
Let’s see how it supports key decisions related to price elasticity in the retail sector.
Demand and Supply – these two words are responsible for the majority of the pricing changes in the market. Higher the supply lesser the demand. Correspondingly, the lesser the supply higher the demand. As a result, the price is elastic in between these two market situations.
The pricing team needs to be highly vigilant in this window. Whenever there is a change in either of them (demand or supply) the team should instantly and thoroughly explore the changing conditions. Immediately reset the optimum price for the affected products. Otherwise, the profitability will be at risk.
We have a client who runs a chain of retail stores (in-store and web store). The web store manager is responsible for the profitability of the products sold via the e-commerce platform. To stay competitive and clear the inventory faster, the manager keeps a close watch on the supply available at the competitor stores.
The price monitoring system regularly extracts the product data from the competitors’ websites. At hourly frequency for some fast-moving products. The raw data comprises several details. From those details, the manager has to quickly estimate the stock availability at different stores and adjust the margin price for various products at our store immediately.
Let’s see how the Python developers can solve this problem by using the Group By NUnique method.
A simplified version of the raw data extracted from various competitor websites looks like this.
The idea now is to adjust the margin price of a product, based on its supply in the market. If more competitors are selling the same product with active stock, then the supply is deemed to be more. Saying it in other words, if some competitors run out of stock, then the supply is deemed to be less.
When the supply is more, the customers will have more alternatives to choose the product from. Following that we need to reduce the margin and keep the product price more competitive. In this situation, the profit margin may be less but the overall profitability will be high due to increased sales volume.
On the other hand, if fewer competitors are selling the product, the supply for that product is less in the market. As the demand is more, we can set a higher margin for that product and increase profitability. Now, how to analyze this raw data and derive the required insights?
GroupBy Nunique method makes it very easy to perform this analysis.
It’s just a two-step process:
- Group the data by ‘Product Title’ and then
- Call the nunique() method.
The groupby command creates one group for each Product. Then within each group, the nquique() method finds the distinct values in each column.
To keep the example simple, we’ve removed all other columns in the dataframe and retained only the ‘Product Title’ and ‘Competitor‘ columns for analysis.
Upon executing the above command, the output would look like this:
This exploration immediately alerts the manager that the Liquid Handwash is out-of-stock with many competitors. Hence the supply is less. As a result, the margin price can be increased for this product at our store.
Likewise, Peanut oil and Moisturizer stocks are available with a maximum number of competitors. That means the competitors are constantly refilling them. Therefore the supply is more. As the supply is more, the customers will move to another web store if our price is not competitive. Hence we need to reduce the margin for these products and stay competitive.
We have explored the competitor data using group by unique method and derived the insights required to adjust the margin price. It’s a very simple method that supports many critical business decisions related to data mapping.
If you would like to share your use case, relevant to this discussion, please send the details to firstname.lastname@example.org. Our editorial team will get in touch to add your contribution to this story.
An example speaks a thousand paragraphs. Let’s build together a new generation of fast learners.