Data validity, integrity, and consistency

Data validity, integrity, and consistency: Useful tools to safeguard revenue

Measurement data have always been an integral part of industrial operations, with automation, mainly via sensors, now taking a more significant role. Systems such as plants and oil and gas fields are set up to collect many measurements, such as pressure, temperature, flow rate, and composition. Accurate data are essential for maintaining optimal and safe operating conditions, and in most cases significant monetary consequences are involved as well. The risks of inaccurate measurements are often (grossly) underestimated and, therefore, insufficient countermeasures are taken, leading to a false sense of security. This paper sheds light on the root causes of most measurement-related errors and on some countermeasures that can be applied to mitigate problems, including the potential loss of revenue.

Background and definitions

We will use the terms data validity, data integrity, and data consistency as follows:

  • Data validity: data stay within expected limits
  • Data integrity: data behave as expected, based on the system properties and the measurement itself
  • Data consistency: data are consistent with other system information, both past and present

We will illustrate these terms further with an example from aviation: the Turkish Airlines crash of 2009, which started with a sensor failure. During the approach, one height sensor suddenly reported that the plane had already reached the ground, and the automation reduced engine power accordingly. In reality, the plane was still some 250 m above the ground, as the other three sensors had correctly detected. The loss of engine power slowed the plane to below its stall speed and, subsequently, it dropped like a brick! By the time the three pilots found out what was happening and restored engine power, it was too late. Nine people lost their lives, a plane was destroyed, and Turkish Airlines’ reputation was at stake. Data validity, integrity, and consistency countermeasures could have prevented this accident, especially if the information had been available in near real time, allowing data-consistency checks to provide early alerts.

Comparing the results of the four height measurements would have revealed that, up to a particular moment in time, they were in general agreement, until one suddenly gave an utterly different reading. This should have set off alarms, certainly because the deviating sensor was the autopilot’s (only) source of height information. The software, obviously not designed to check data validity, data integrity, or data consistency, let alone all three, ordered the autopilot to finish the landing, which did happen, but not in the intended way.

So, based on the available information and by simply checking data validity, integrity, and consistency, the only viable conclusion should have been that the height measurement was in error. Its information should therefore have been ignored and control handed back from the autopilot to the human pilots. This severe case of negligence resulted in a major disaster!
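
The cross-comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how any avionics system works: the function name `find_outliers`, the 30 m tolerance, and the readings are hypothetical. Each reading is compared with the median of the redundant readings, and any sensor that deviates by more than the tolerance is flagged so that its value can be ignored and an alert raised.

```python
from statistics import median

def find_outliers(readings, tolerance):
    """Return the names of sensors whose reading deviates from the
    median of all readings by more than `tolerance` (same unit)."""
    mid = median(readings.values())
    return [name for name, value in readings.items()
            if abs(value - mid) > tolerance]

# Hypothetical height readings in metres; one sensor suddenly reports ground level.
heights = {"sensor_1": 251.0, "sensor_2": 249.5, "sensor_3": 250.2, "sensor_4": -2.0}
suspect = find_outliers(heights, tolerance=30.0)
print(suspect)  # ['sensor_4']
```

The median is used rather than the mean because a single wildly wrong reading pulls the mean towards itself, whereas the median stays with the majority of the sensors.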

Sensor errors: underlying causes

Although a complete sensor breakdown is usually easy to detect, other, sneakier effects can also occur, such as a gradual decrease in sensor performance. This is illustrated in the following examples:

Sensors are calibrated before they leave the factory. Usually, the manufacturer specifies a maximum period after which the sensor should be recalibrated, or at least have its performance checked, because the calibration curve tends to creep: it gradually changes over time. In contrast to the Turkish Airlines example, the reading does not change in a few tenths of a second but over months or even longer, and it stays within the expected values. However, the result is a systematic error, which can have grave financial consequences, as we will see below.

Another example is a change in the calibration curve or the zero-setting of a sensor caused by a (short) overload condition. A well-known case is a bent orifice plate, possibly caused by a liquid slug hitting the plate during start-up or a misoperation. The orifice will still work, meaning it will generate a differential pressure, but its discharge coefficient is likely to be different, so the relation between differential pressure and flow rate will show a systematic error. Without data consistency verification, this will go unnoticed until the orifice is inspected and/or recalibrated. Many sensors/transmitters provide diagnostic data next to the actual measurement, but these data are rarely used from a broader perspective than the sensor itself. However, they often carry valuable information that is rarely, if ever, combined across sensors for diagnostic purposes of the system as a whole.

Similarly, with other sensors, even when the recalibration period specified by the manufacturer has been exceeded, it is usually assumed that they still work OK as long as they give numbers. But a gradual change in the calibration will result in systematic errors and, thus, severe risks to revenue. What can be done about this is discussed later in this paper.

Oil & Gas industry: the need for accurate data

Concerning data measurement accuracy, the oil and gas industry tends to cling to some long-held misconceptions:

A common misunderstanding is that errors will average out. This is, however, only the case with random errors; systematic errors build up, as they always point in the same direction. Another misunderstanding concerns uncertainty: because it cannot be seen, it is assumed not to exist. No bookkeeper, for example, will add the uncertainty of the (measured) revenues to the books, but that does not mean it is not there! A third misunderstanding is that accurate metering is expensive and that the effort can therefore be cut back. But people forget the value of the information behind the metering results.
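
The difference between random and systematic errors can be made concrete with a small simulation. The sketch below is illustrative only: the daily volume, the 1% error levels, the one-year period, and the function name `relative_total_errors` are assumptions chosen for the example, and the random errors are assumed to be independent and normally distributed. It totals a year of daily readings once for a meter with purely random 1% noise and once for a meter that always reads 1% high.

```python
import random

def relative_total_errors(days=365, daily_volume=1000.0,
                          random_sd=0.01, bias=0.01, seed=1):
    """Compare how a random and a systematic error accumulate.

    Returns (random_rel, systematic_rel): the relative error of the
    summed volume for a meter with zero-mean random noise and for a
    meter that always reads `bias` too high.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    true_total = days * daily_volume
    noisy_total = sum(daily_volume * (1.0 + rng.gauss(0.0, random_sd))
                      for _ in range(days))
    biased_total = sum(daily_volume * (1.0 + bias) for _ in range(days))
    return ((noisy_total - true_total) / true_total,
            (biased_total - true_total) / true_total)

random_rel, systematic_rel = relative_total_errors()
print(f"random error on yearly total:     {random_rel:+.3%}")
print(f"systematic error on yearly total: {systematic_rel:+.3%}")
```

Under these assumptions the random contribution to the yearly total shrinks roughly with the square root of the number of readings, while the systematic error stays at the full 1% no matter how many readings are added.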

The complexity of ownership is another growing concern with oil and gas plays. In the old days, one company owned a reservoir, produced the oil and gas, and sold it (usually, the export meters were accurate). So, there was no need (people thought) for precise measurements upstream. Nowadays, the situation is far more complicated. Even in the simplest case, when everything is owned by one company, the assumption that no accurate measurements upstream are necessary is incorrect. This can be best understood by taking a more holistic view of the system.

The oil or gas reservoir itself is a complex system below the surface. To start the production of hydrocarbons, wells need to be drilled at several locations across the field. These locations are carefully chosen based on seismic data. But the composition is usually not the same at different locations in the reservoir: it can vary from gas (in the gas cap) to light oil (close to the gas cap) and heavy oil near the aquifer. The operator is constantly challenged to achieve the highest ultimate recovery (the total fraction of hydrocarbons recovered from the reservoir before abandonment) at the lowest cost. This requires knowledge of the initial situation in the reservoir and of its production history. If this knowledge is not (accurately) available, less than optimal production may result, reducing the ultimate recovery and/or increasing the costs and energy usage of secondary and tertiary recovery techniques. As the lifetime of a reservoir is typically 25 years, a holistic approach needs to be followed from design to abandonment, and the quality of the data needs constant attention during these years.

However, the current situation upstream is usually even more complex when it comes to ownership. A single company rarely owns a reservoir; it is mostly owned by several entities (either through a joint venture or because the field extends over several concession blocks). And in remote areas or subsea developments, not every field has its own flow line to the processing plant. So, the production of different reservoirs, with different compositions and owned by different companies, is commingled; as the streams mix, thermodynamic equilibria will shift (e.g., gas will come out of solution). At the processing facility, each company's share of the different export streams (with different monetary values) needs to be determined, and the revenues split. And this is all based on the input data from the measurements in the evacuation system. In essence, these data do not merely measure oil or gas; they measure $$$$$.

Again, it is often assumed that the measurement uncertainties will average out, but because the majority of the errors are systematic, this is not the case; they add up! So, the split of the revenues will be systematically biased. Note that this means some companies may receive too large a share of the payments, while others receive too little: it is a zero-sum game. The problem is further exacerbated by the fact that some hydrocarbons are more valuable than others; not only the produced volumes come into play, but the composition of the different fluids is crucial as well.
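
The zero-sum effect can be illustrated with a deliberately simplified allocation. In the sketch below the export value is split pro rata to the metered contributions; real allocation procedures are more involved and also account for composition, which is ignored here. The function name `allocate`, the two companies, the volumes, the export value, and the 2% meter bias are all hypothetical.

```python
def allocate(export_value, metered):
    """Split `export_value` pro rata to the metered contributions."""
    total = sum(metered.values())
    return {owner: export_value * qty / total for owner, qty in metered.items()}

# Hypothetical case: both companies truly deliver 500 units each,
# but company A's meter reads 2% high.
true_flows = {"A": 500.0, "B": 500.0}
metered_flows = {"A": 500.0 * 1.02, "B": 500.0}

fair = allocate(1_000_000.0, true_flows)
actual = allocate(1_000_000.0, metered_flows)
shift = {owner: actual[owner] - fair[owner] for owner in fair}
print(shift)  # A gains what B loses
```

In this example a 2% bias on one meter moves roughly 0.5% of the total export value from company B to company A, and it does so in every allocation period for as long as the bias goes undetected.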

Potential financial consequences

Another underestimated issue is the effect of uncertainties on a company’s revenue. It is often argued that it’s only 1%, so that’s not too bad. The problem is that the revenue is not only what is collected at the export meter but also depends on what it costs to get the product there. The net revenue is the difference between the gross revenue and the costs of production and transportation, which makes the net revenue much more sensitive to measurement uncertainty than the gross revenue, particularly for marginal fields. Thus, accurate data will provide a fairer share of the revenues and help a company to act as a trusted business partner.
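
A short calculation shows how strongly a small error on the gross revenue is amplified in the net revenue. The sketch assumes that the costs are known exactly and that the whole error sits in the measured gross revenue; the function name `net_revenue_error` and the numbers are hypothetical.

```python
def net_revenue_error(gross, costs, gross_rel_error):
    """Relative error in net revenue caused by a relative error in gross
    revenue, assuming the costs are known exactly."""
    net = gross - costs
    if net <= 0:
        raise ValueError("net revenue must be positive for a relative error")
    return gross * gross_rel_error / net

# Hypothetical fields, both with a 1% error on the gross revenue.
healthy = net_revenue_error(gross=100.0, costs=50.0, gross_rel_error=0.01)
marginal = net_revenue_error(gross=100.0, costs=90.0, gross_rel_error=0.01)
print(f"healthy field:  {healthy:.0%} error on net revenue")
print(f"marginal field: {marginal:.0%} error on net revenue")
```

With costs at half of the gross revenue, the 1% error doubles to 2% of the net revenue; for a marginal field where costs take up 90% of the gross revenue, the same 1% becomes 10% of the net revenue.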


An excellent approach to ensuring accurate data is applying the countermeasures of data validity, integrity, and consistency. Let’s take a look at each of these in action.

Data validity: First, the data should be checked to determine whether or not the results lie in the range of values to be expected. Values like negative pressures, too high or too low temperatures, and flows going from low to high pressure should be detected as soon as possible, and the cause analyzed and fixed. Again, it is often assumed that a sensor is OK as long as it gives numbers. This is questionable, as the calibration of sensors tends to drift, and (undetected) damage can introduce systematic errors, even if the sensors still give numbers. Once the calibration of a sensor has expired, measures should be taken to verify the operation and quality of the data. Data integrity checks can also be helpful and can even detect such deviations before the calibration has expired.
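
A validity check of this kind is little more than a comparison against expected limits. The following minimal sketch assumes hypothetical tag names and limits; in practice these would come from the design data of the specific system.

```python
# Hypothetical expected ranges per measurement (unit in the key name).
LIMITS = {
    "pressure_bar": (0.0, 250.0),
    "temperature_c": (-20.0, 150.0),
    "flow_m3_per_h": (0.0, 500.0),
}

def validity_violations(sample, limits=LIMITS):
    """Return a list of (name, value, reason) for every reading that is
    missing, not a number, or outside its expected range."""
    problems = []
    for name, (low, high) in limits.items():
        value = sample.get(name)
        if value is None or value != value:  # None or NaN
            problems.append((name, value, "missing"))
        elif not low <= value <= high:
            problems.append((name, value, f"outside [{low}, {high}]"))
    return problems

sample = {"pressure_bar": -1.2, "temperature_c": 64.0, "flow_m3_per_h": 120.0}
violations = validity_violations(sample)
print(violations)
```

Here only the negative pressure is reported. A reading that passes this check is not necessarily correct; it has merely not been shown to be impossible, which is why the integrity and consistency checks are needed as well.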

Data integrity: Changes, both rapid and gradual, can indicate deterioration of the quality of the calibration of a sensor. Such changes should be explained by changes in the system, like a change in flow rate, well pressure, etc., or by other causes like obstructions/deposits in the flow line. But if these changes cannot be easily explained, they are likely caused by calibration drift or a zero-setting error of the sensor. The root cause should be further investigated, taking into account information from other parts of the system to perform data consistency verification.
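
One possible way to flag both kinds of change in an evenly sampled series is sketched below: sudden jumps are found from the difference between consecutive samples, and gradual drift from the least-squares slope over the window. The function name `integrity_flags`, the thresholds, and the readings are hypothetical, and the sketch does not decide whether a flagged change has a legitimate process cause; that still requires knowledge of the system.

```python
def integrity_flags(values, max_step, max_slope):
    """Flag rapid and gradual changes in an evenly sampled series.

    Returns the indices where consecutive samples differ by more than
    `max_step`, the least-squares slope per sample, and whether the
    absolute slope exceeds `max_slope`.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    steps = [i for i in range(1, n)
             if abs(values[i] - values[i - 1]) > max_step]
    # Least-squares slope of value versus sample index.
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return {"steps": steps, "slope": slope, "drifting": abs(slope) > max_slope}

# Hypothetical pressure readings (bar) at steady operating conditions:
# a slow upward creep of 0.02 bar per sample, no sudden jumps.
readings = [50.0 + 0.02 * i for i in range(100)]
result = integrity_flags(readings, max_step=0.5, max_slope=0.005)
print(result)
```

For this series no step is reported, but the slope of about 0.02 bar per sample exceeds the drift threshold, which is exactly the kind of slow change that a simple range check would miss.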

Data consistency: All available information from the system should be consistent, period! Flows should go from high to lower pressure, temperatures should decrease when heat is leaking away, flow rates should relate to pressure differentials, and so forth. Inconsistencies point to sensor errors (including errors in the impulse lines of pressure sensors) and should be investigated. Historical data can be helpful in finding where the problems are located.

It is impossible in this brief paper to illustrate all possible options, as these are system-dependent. But it should be clear that a holistic view of the system (both in place and time) enables more accurate monitoring of the sensors and the system by using all available information and the relationships between the readings at different positions within the system.
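
As one illustration of using the relationships between readings, the sketch below checks two of them on a simple flow line: pressure should not rise in the direction of flow, and the flow leaving a junction should match the flows entering it. It assumes steady-state conditions, flow rates in comparable (mass) units, and no pump, compressor, or significant elevation change between the pressure points. The function name `consistency_issues`, the 2% tolerance, and the readings are hypothetical.

```python
def consistency_issues(pressures, inflows, outflow, balance_tol=0.02):
    """Check two physical relationships on a simple flow line.

    pressures: readings ordered in the direction of flow (upstream first).
    inflows:   metered flow rates entering a junction.
    outflow:   metered flow rate leaving it (same unit, steady state).
    """
    issues = []
    for i in range(1, len(pressures)):
        if pressures[i] > pressures[i - 1]:
            issues.append(f"pressure rises between points {i - 1} and {i}")
    total_in = sum(inflows)
    if total_in > 0 and abs(total_in - outflow) / total_in > balance_tol:
        issues.append(f"flow imbalance: in={total_in}, out={outflow}")
    return issues

issues = consistency_issues([82.0, 79.5, 80.3, 71.0], [120.0, 95.0], 228.0)
for issue in issues:
    print(issue)
```

Neither finding says which sensor is wrong; each only narrows the search to the sensors involved in the violated relationship, after which historical data and sensor diagnostics can help to locate the culprit.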


Accurate data are crucial for the optimum and safe operation of systems, for minimizing operational costs, and for obtaining a fair share of revenues in complex evacuation scenarios. To obtain and maintain accurate data, a holistic approach is a significant step forward: it incorporates data validity, integrity, and consistency checks, where possible in combination with diagnostic data from individual sensors, over the entire system. To realize such a holistic analysis, detailed information on the system needs to be available. The analysis can be created by combining routines from a software library and using a data historian. Hint B.V. (Wapenveld, Netherlands) can help you to set up such a system for your specific application.