An Easy Way to Detect Data Abnormalities: NumPy Accumulate

A step-by-step guide to using pandas in exploratory data analysis


If you want to compute the cumulative sum of an array and find the position at which it crosses a specific threshold, NumPy’s accumulate method (for a running sum, np.add.accumulate) is a good choice.
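In NumPy, accumulate is a method of universal functions (ufuncs), so the running sum is np.add.accumulate; for a 1-D array it gives the same result as np.cumsum. A minimal sketch with made-up values (not from the article’s dataset) that locates the first position where the running total exceeds a threshold:

```python
import numpy as np

# Illustrative values only; not taken from the article's dataset.
values = np.array([5, 12, 8, 20, 9, 15])
threshold = 50

# np.add.accumulate applies addition cumulatively along the array.
running_total = np.add.accumulate(values)

# Boolean mask: True wherever the running total exceeds the threshold.
crossed = running_total > threshold

# np.argmax returns the index of the first True, but it also returns 0
# when no element is True, so check crossed.any() before trusting it.
first_index = int(np.argmax(crossed)) if crossed.any() else None

print(running_total)  # [ 5 17 25 45 54 69]
print(first_index)    # 4
```

The crossed.any() guard is the one detail that is easy to miss: without it, an array that never crosses the threshold would be reported as crossing at position 0.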

This method is very useful for comparative analysis and for detecting data abnormalities at an early stage.

Let’s see how it can help in troubleshooting mechanical problems. We implemented this for a client who provides machine health monitoring services in the automobile industry.

Domain Notes

Here is a quick overview of what machine health monitoring is and why vibration analysis matters to it.

What is machine health monitoring?

From chocolate to aircraft, anything produced in a factory requires several machines operating in synergy. If one machine in the production chain goes down, the company may suffer a huge loss. So it is very important to continuously monitor the health of each machine.

What is vibration analysis?

When we humans feel sick, we can describe how we feel and visit a doctor. But how can machines do that? In fact, they do something similar: when a machine has a problem, it expresses it through different sounds and vibrations. Abnormal vibration is an indicator of a developing fault.

Every machine, whether a grinding machine, a compressor, or a refiner, produces some vibration while it is running.

Vibration analysis is a growing field of data analytics. It has great potential for the early detection of machine failures, which can save enormous failure costs.

What do the field engineers do?

Our field engineers love the machines like pets. They regularly visit the factories, read the machine condition, and send the data to our data analytics team. They measure machine vibrations with devices such as vibrometers and accelerometers. These devices translate the vibrations into mechanical attributes such as velocity, acceleration, and the machine’s RPM.

What does the data analytics team do?

The data engineers on the data analytics team enrich the datasets sent by the field engineers with additional attributes. They transform the raw machine readings into statistical measures, which the business analysts can interpret easily.

What do the business analysts do?

Here the business analysts are typically mechanical engineers who understand the machine internals well. From the statistics reported by the data analytics team, they determine the location of the problem and then prescribe the corrective action.

Business Scenario

Now let’s get into the specific requirement related to issue classification.

When a machine is reported to be defective, the field engineer takes readings at a more granular level. The data analytics team should compare these readings with those of a healthy machine and, accordingly, indicate whether the problem is likely to be with the motor’s power unit or with its bearings.

For this purpose, the engineering team has specified a rule. If the power unit is defective, the machine speed will be low from the very start. If the bearings are defective, it may take more than 10 seconds for the machine speed to drop. A machine is said to be defective if its cumulative speed difference from a healthy machine exceeds 50 units. Machine speed is measured in RPM (revolutions per minute).

Specifically, our analytics module should calculate the RPM difference between the defective machine and the healthy machine every second, and then compute the cumulative deviation at each second. If the cumulative deviation exceeds 50 units within 10 seconds, the module should flag the defect as a problem with the power unit; otherwise, it should flag it as a problem with the bearings.
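The rule above can be sketched as a small function. This is an illustrative sketch, not the client’s implementation: the function name classify_defect, the assumption that both motors have one reading per second starting at second 1, and the sign convention (healthy minus defective, so a slower defective motor gives a positive deviation) are all assumptions. If the defective motor could also run faster than the healthy one, taking np.abs of the deviation may be more appropriate.

```python
import numpy as np

THRESHOLD = 50.0     # cumulative RPM deviation that marks a machine as defective
WINDOW_SECONDS = 10  # a breach within this window points to the power unit


def classify_defect(healthy_rpm, defective_rpm,
                    threshold=THRESHOLD, window=WINDOW_SECONDS):
    """Hypothetical helper applying the engineering team's rule.

    Assumes both sequences hold one reading per second, index 0 = second 1.
    Returns (diagnosis, breach_second); breach_second is None when the
    cumulative deviation never exceeds the threshold in the given readings.
    """
    healthy = np.asarray(healthy_rpm, dtype=float)
    defective = np.asarray(defective_rpm, dtype=float)

    # Per-second deviation (assumed sign): positive when the defective motor is slower.
    deviation = healthy - defective
    cumulative = np.add.accumulate(deviation)

    breached = cumulative > threshold
    breach_second = int(np.argmax(breached)) + 1 if breached.any() else None

    if breach_second is not None and breach_second <= window:
        return "power unit", breach_second
    # Per the rule, anything that does not breach within the window
    # is attributed to the bearings.
    return "bearings", breach_second


# Invented readings for illustration only.
healthy = np.full(10, 1500.0)
power_case = healthy - np.array([5, 6, 8, 9, 12, 14, 15, 15, 15, 15])
bearing_case = healthy - np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4])

print(classify_defect(healthy, power_case))    # ('power unit', 6)
print(classify_defect(healthy, bearing_case))  # ('bearings', None)
```

Returning the breach second alongside the diagnosis lets the analysts see how close a case was to the 10-second boundary rather than only the label.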

Let’s see how we can solve this with NumPy’s accumulate method.

Solution

  1. First, read the source dataset. It contains the RPM readings of two motors captured during the first 10 seconds of their operation: one healthy and one defective. This is the starting point of our comparative analysis.
  1. Calculate the difference between the two motors’ RPMs at each timestamp. We treat this difference as the abnormality.
  1. Using NumPy’s accumulate method, calculate the abnormality accumulated after every second.
  1. Plot the accumulated abnormality for better clarity.
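The four steps can be sketched end to end with NumPy. The CSV layout (columns second, healthy_rpm and defective_rpm) and every number in it are hypothetical placeholders, since the article’s dataset is not reproduced here; the plotting helper assumes matplotlib is installed.

```python
import io

import numpy as np

THRESHOLD = 50.0  # cumulative abnormality threshold from the engineering rule

# Step 1: read the dataset. Column names and values are invented placeholders;
# pass a real file path to np.genfromtxt instead of this in-memory CSV.
CSV_TEXT = """second,healthy_rpm,defective_rpm
1,1500,1496
2,1500,1493
3,1500,1491
4,1500,1488
5,1500,1486
6,1500,1484"""

data = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)
seconds = data["second"]

# Step 2: abnormality = RPM difference between the motors at each timestamp.
abnormality = data["healthy_rpm"] - data["defective_rpm"]

# Step 3: abnormality accumulated after every second.
accumulated = np.add.accumulate(abnormality)

for s, a, c in zip(seconds, abnormality, accumulated):
    print(f"{int(s):>2} s  abnormality={a:5.1f}  accumulated={c:6.1f}")

# First second at which the accumulated abnormality exceeds the threshold.
over = accumulated > THRESHOLD
breach_second = int(seconds[np.argmax(over)]) if over.any() else None
print("threshold crossed at second:", breach_second)


# Step 4: plot the accumulated abnormality against the threshold.
def plot_accumulated(seconds, accumulated, threshold=THRESHOLD):
    import matplotlib.pyplot as plt  # imported lazily; needed only for plotting

    fig, ax = plt.subplots()
    ax.plot(seconds, accumulated, marker="o", label="accumulated abnormality")
    ax.axhline(threshold, color="red", linestyle="--", label=f"threshold = {threshold:g}")
    ax.set_xlabel("time (s)")
    ax.set_ylabel("accumulated RPM difference")
    ax.legend()
    return fig
```

To render step 4, call plot_accumulated(seconds, accumulated) and then matplotlib.pyplot.show(), or save the returned figure with fig.savefig(...). Reading the file with pandas.read_csv would work just as well; np.genfromtxt is used here only to keep the computation NumPy-only.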

Analysis: From the table and the plotted graph, it is evident that the accumulated abnormality crossed the threshold of 50 as early as the 8th second. So, as per the field engineers’ subjective hypothesis, the issue is most likely with the power unit of the defective motor: it may not be receiving the required amount of power.

Summary

We compared the RPM values of two motors, one healthy and one defective. By calculating the accumulated abnormality at every time index, we established that the defective motor breaches the threshold in the 8th second of operation. This supported the field engineers in making a subjective decision about the primary suspected cause of the defect.

What Next?

If you would like to share a business case in which you’ve used this method, please send the details to pub@additionalsheet.com. Our editorial team will get back to you about adding it to this publication. Please contribute to integrating knowledge and building a new generation of fast learners. An example speaks a thousand paragraphs.
