Use This Simple Rich Method For Best Accuracy, DF Where

An illustrative guide on 'where' method capabilities



If Dataframe is seen as a large object holding a variety of data points, the ‘where‘ method is like a very powerful torchlight that makes the navigation precise and accurate.

This method is very rich and flexible in accommodating almost any type of business logic and transforming the identified data points as required. Though it is very simple in the declaration, it is often underutilized by the programmers. Mainly because many of them are not clear on, how widely this can be used in normal business scenarios and simplify the rest of analytics.

Instead of me repeating the same thing as told by the API documentation, if I show you some examples, you can realize it’s potential better and apply it in your business scenarios. That’s the idea. Hope you would like it.

In this article, we’ll discuss 3 different problems of an automotive business. And see how the dataframe ‘where’ method solves them with flexibility and accuracy.


Introduction

Our client is a nationwide large-scale seller of pre-owned cars. They have several outlets in major cities and deal with all car models. We have partnered with them in developing an analytical system to support their business operations. The objective of this system is to curate and analyze the operations data effectively to save costs and improve sales. The client has seen a 12% growth in sales in the first year after deploying our analytical system.

Problems and Capabilities:

To begin with, here is the list of four scenarios we are going to talk about. As can be seen, each of these scenarios demands a different kind of data transformation.

  1. Allocating the parking lots to the cars arrived at an outlet.
  2. Applying a dynamic discount to the models selected for the day
  3. Curating the customer feedback text to provide better input to NLP modules.

Subsequently, we will discuss the following capabilities of the dataframe ‘where’ method:

  1. Value mapping
  2. Conditional mapping
  3. Custom function definitions

Let’s get into each of them step-by-step.


1. Allocate Park Lots | Value mapping

The Problem Detected

In January, during our first weekly meeting, the client has raised an important issue in their operations. Inefficient allocation of parking lots affects customer ease.

The customers are finding it very difficult to walk back and forth in the parking area while comparing various models of their interest. As of now, their multi-storeyed park area is allocated as per the car models. But most of the customers have a fixed budget in their mind and decide on the age of the car that fits the budget. Eg: A customer is decided to purchase a car between 5 to 10 years old. To have a look at various manufacturers, the salesperson has to walk the customer across several floors.

Solution Expected

To address this problem, the management wants to reorganize their multi-storeyed park as per the age of the car. Going forward each floor hosts all the cars that fit a particular age bracket, irrespective of their make or model. When a new lot of cars arrives, our system should segregate the cars as per the age. And inform the floor managers how many parking lots are required for each model on each floor.

Analytics Module

Determining the age of the car from its manufacturing year is a simple linear transformation. Thus we did a value mapping between a range of numbers to a specific number. That is mapping a range of years to a numerical value determining the age bracket.

Input: Our input dataframe looks as follows. It contains the model name and the manufacturing year.

Manufacturing Year

Transformation: New cars of age less than 6 years should be parked on the first floor. Accordingly, the oldest cars are to be parked on the top floor.

The ‘where’ method takes a condition and a new value. If the present value of the dataframe meets the condition, the same value will be retained. Else, the new value will be applied. Thus the output of executing the above code looks as follows.

Output: In our output dataframe, we convert these years into another number representing their age classification.

Age Category

Result: A report generated from this output dataframe tells, floor 5 needs 3 parking lots one each for Acura, Audi, and Buick. And floor 3 needs only one parking lot. That is for Audi.


2. Sale Price Discount | Conditional Mapping

Photo by James Lee on Unsplash

Problem Detected

You can’t predict how dynamically the other car prices change when a new model hits the market successfully. Not to mention, the old car prices also quickly adjust to the fluctuation in the demand-supply equation in the market. Indeed it adjusts inversely.

A similar thing has happened with our client in the very next month. In response to that, the inventory has been growing higher and higher for cars above certain price thresholds.

Solution Expected:

In order to check the growing inventory, the outlet managers should be given the authority to apply price discounts on specific car models. Before displaying the price tag for a car on the website, the operations manager applies a certain discount for the selected models.

Analytics module:

Unlike the previous case, this one involves an additional rule. Do not transform all the values of the dataframe. Look for some values matching a rule and accordingly modify those values. In summary, we do a Conditional Mapping of the values.

Input: The data frame contains the manufacturer name and the price of various cars.

Transformation: Apply a discount of 15% if the price exceeds 7000.

Here we have defined a negative condition to the system. That is, price less than or equal to 7000. If this system condition fails that means the business condition succeeds. We did like that because, the where method applies the given transformation function only on the failed items, not on the successful items.

Output: The output dataframe from the above transformation code looks as follows.

Result: Here the dataframe contains both the string and number data types. We have defined an inline condition. And have instructed the dataframe to modify the number, only if the inline condition fails. Finally, we got the transformed dataframe that has applied the discount for the prices crossing the given threshold.


3. Test Drive Feedback | Custom Functions

The Problem Detected

The most difficult part of business development is not about collecting feedback. But it’s about converting the customer feedback into action items.

At the end of the first quarter in March, the sales team has identified an issue in leads conversion. There are a good number of inquiries happening every day but only very few of them are getting converted to sales. With some primary investigation, the business analysts have found that many customers, who took the test drive, have reported the same issues with the car.

This should not have happened, because all the customer feedback collected online is sent to an NLP module. The NLP module processes the natural text and identifies the components that need to be addressed in the car. For example, issues with acceleration, overheating, wheel alignment, etc. But this module is giving poor results as the text entered by the customer is not curated up to the mark.

Solution Expected

Standardize the text entered by the user before sending it to the NLP module. Replace the uncommon words with their close meaning

Analytics Module

Take the keywords from the user text. And check if there are any uncommon words in them. In case if there are any words like that, replace them with some corresponding standard words.

Unlike the previous comparisons, here the condition for the ‘where’ method is a little complex. But it is very rich in accommodating our custom logic. We can define our own custom logic as a ‘condition’ function.

Let’s see the same in detail here with some code examples as usual.

Input: Say, this is the list of standard words.

And this is the map of uncommon words to the standard words, as given by the business team.

Similarly, the keywords extracted from user feedback are as shown in the following dataframe:

Transformation:

This code first checks if the word is present in the standard list or not. If it is present, the same word will be retained. In case if the word is found to be uncommon, then the ‘where’ method modifies the value as per the logic defined in the ‘CloseMatch’ function.

Output: Now the dataframe contains all the standard words.

Result: Now that, we’ve curated the user feedback to a higher degree. On the whole, we have replaced the uncommon words with the standard words of the business. This makes the underlying NLP module to process the text better and yield results with better accuracy. It tells with better confidence what are the problematic components of the car are. Accordingly, the operations team addresses them before the next customer visits for a test drive.


Conclusion

Dataframe ‘where’ method offers great flexibility from its implementation but it is not being used to its full potential by the programmers. We have discussed, how rich is this method in mapping the data and curating it using custom business logic. The three business scenarios discussed here shall provide informative pointers to the readers in applying this technique in their projects.

Write a Comment

Comment