Measuring Impact: Part Four – Understanding the Limitations of Measurement



Measurement has become something of a Holy Grail within management science.  We are often told, “You cannot manage what you cannot measure.” However, measurement can be fraught with ambiguity and unintended consequences.   

Just Blame Heisenberg? 

The uncertainty principle, also known as Heisenberg’s indeterminacy principle, is a fundamental concept in quantum mechanics.  It states that there is a limit to the precision with which certain pairs of physical properties, such as position and momentum, can be simultaneously known.  In other words, the more accurately one property is measured, the less accurately the other property can be known. 

Although, in theory, this concept should not apply at scales larger than those at which quantum physics predominates, it offers, by analogy, a compelling cautionary tale for the management sciences.

Cats: Dead or Alive 

Similarly, in physics, the observer effect is the principle that any observation affects the state of whatever is being observed. This has been powerfully captured by the thought experiment known as 'Schrödinger's Cat': the cat is both alive and dead until the box is opened, and opening the box changes its state to either alive or dead. Again, the principle can be extended by analogy to management science.

When we measure something, we inadvertently change the behaviour associated with what is being measured.  Often, individuals manipulate the system to satisfy the measurement criteria rather than genuinely delivering the intended outcome.   

The Sad Fact about Happy Sheets 

Moreover, the measurement process—whether asking schools to record data, requesting individuals to complete surveys, or conducting any other assessment—inevitably alters the perception of those being measured.  A classic example of this is the concept of “learning gain.”   

Many of you may be familiar with training providers distributing "happy sheets": forms completed just before the end of a course, which ask participants questions such as: 'Did you enjoy the session?' 'Was the room satisfactory?' 'Was the food acceptable?' 'Was the training useful?'

While superficially helpful, such feedback is ultimately of limited value.  Feeling positive about an experience does not necessarily indicate that any meaningful learning has occurred.  Thus, the next logical step was to attempt to measure “learning gain.”   

The Fallacy of Learning Gain 

The principle is straightforward: at the start of a course, you are asked to rate, on a scale of one to ten, how much you know about the topics you will be studying.  At the conclusion, you are asked again to rate your knowledge of the topics.  The difference between the two ratings is your “learning gain”—a subjective, self-reported measure.   

However, there is a fundamental flaw: you are often unaware of what you do not know at the outset. You might initially rate your knowledge as a 7 out of 10, not realising that the full scope of knowledge required covers 20 key areas. Upon completing the course, you may realise that your original 7 out of 10 was, in reality, 7 out of 20. Consequently, your perceived "learning gain" may be negative, not because you have lost learning, but because you are now more aware of the breadth of your previous ignorance.
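
A toy calculation makes the flaw concrete. The 7-out-of-10 rating and the 20 key areas come from the example above; the post-course rating of 6 out of 10 and the function names are illustrative assumptions, a minimal sketch rather than a formal model:

```python
def naive_learning_gain(before, after):
    """Self-reported 'learning gain': post-course rating minus
    pre-course rating, both on the scale the participant believed
    applied at the time (out of 10)."""
    return after - before

def scope_adjusted_gain(before, after, believed_scope=10, actual_scope=20):
    """Re-express the 'before' rating against the scope the
    participant only discovered during the course."""
    true_before = before * believed_scope / actual_scope  # 7/10 was really 3.5/10
    return after - true_before

# The article's example: a participant rates themselves 7/10 before the
# course and (assumed here) 6/10 after, once aware the field spans 20 areas.
print(naive_learning_gain(7, 6))   # -1.0: an apparent loss of learning
print(scope_adjusted_gain(7, 6))   # +2.5: a genuine gain once rescaled
```

The negative naive figure and the positive adjusted figure describe the same participant; only the understanding of the scale has changed.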

This phenomenon, in which simply asking a question alters the participant’s perception, has profound implications.   

Net Promoter Scores: Used or Abused? 

An increasingly popular measure is the Net Promoter Score (NPS). Respondents are asked how likely they are, based on their experience, to recommend a product or service to others, on a scale from 0 to 10. Scores of 0 to 6 count as detractors, 7 and 8 as passives (and are disregarded), and 9 and 10 as promoters. The Net Promoter Score is then calculated by subtracting the percentage of detractors from the percentage of promoters.
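
For concreteness, here is a minimal sketch of that standard calculation; the function name and the sample ratings are illustrative, not drawn from any real survey:

```python
def net_promoter_score(scores):
    """Compute NPS from a list of 0-10 ratings.

    Detractors: 0-6, Passives: 7-8 (ignored), Promoters: 9-10.
    Returns a value between -100 and +100.
    """
    if not scores:
        raise ValueError("No scores provided")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Illustrative ratings: 3 promoters, 2 detractors, 2 passives out of 7.
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10]))  # ~14.3
```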

Originally, NPS was designed as a simple, robust indicator of satisfaction. However, organisations have increasingly asked more targeted questions, such as, "Would you recommend this service based on [specific feature]?" In doing so, they can inadvertently elevate the perceived importance of that feature, distorting the feedback: a respondent who previously had no opinion about the feature may feel compelled to rate it, leading to misleading results.

Distorting Perceptions 

A similar distortion has historically occurred in hospitality.  It is often stated that the cleanliness of a pub or restaurant’s toilets is the most crucial factor determining whether a customer will return.   

Subsequently, customer surveys began to inquire about various aspects — the ambience, the food, the beer selection, the wine list, the toilets, etc.  Each element was artificially elevated in perceived importance by being assessed individually, even though, for many customers, the primary determinant of whether they returned remained the condition of the toilets.  Thus, achieving high satisfaction scores for numerous facets of the experience is possible while overlooking the critical issue that drives customer behaviour.   

Apples and Oranges 

Another major challenge in measurement is the problem of “comparing apples to oranges.” Are we consistently measuring the same thing over time, or has the measurement changed?  Sometimes, the entity changes.  If you measure the weight of fruit consumed but change the fruit provided from apples to oranges, you are literally comparing apples to oranges.  

Alternatively, different groups may have fundamentally different experiences: one group evaluates an apple, while the other evaluates an orange.  For instance, if you measure the freshness of fruit but change the type of fruit, from apples (which have a relatively long shelf life) to oranges (which dry out more quickly) to bananas (where freshness is critically dependent on timing), the reliability of your measure becomes compromised.   

Moreover, if you serve oranges on Mondays, bananas on Tuesdays, apples on Wednesdays, and perhaps pineapple chunks on Thursdays, your "fruit freshness" measure becomes an inconsistent and misleading composite.

Qualitative Measures: How do you Qualify Them? 

I want to emphasise the final challenge with measurement: the rise of qualitative measures.   

Asking individuals to rate how they feel about an experience on a scale of one to ten introduces significant subjectivity.

Each individual uses the scale differently.  Some may seldom score below seven, even for a mediocre experience.  Others may be hypercritical, reserving scores of eight, nine, or ten only for exceptional events.   

In my experience running CEO groups, I have repeatedly observed this phenomenon.  At the beginning of each session, participants were asked to rate, out of ten, their feelings about themselves, their families, and their work at that moment.  It quickly became apparent that individuals interpreted and used the scales differently, often reflecting personal dispositions more than objective realities.  
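
One common mitigation, offered here as an illustrative sketch rather than anything the CEO groups used, is to standardise each person's ratings against their own mean and spread, so that a habitual seven-giver and a hypercritical rater become comparable:

```python
from statistics import mean, stdev

def standardise_ratings(ratings_by_person):
    """Convert each person's raw 1-10 ratings into z-scores relative
    to their own history, so scores reflect deviation from that
    person's norm rather than their personal use of the scale."""
    standardised = {}
    for person, ratings in ratings_by_person.items():
        if len(ratings) < 2 or stdev(ratings) == 0:
            # Too little variation to standardise; report no deviation.
            standardised[person] = [0.0 for _ in ratings]
            continue
        m, s = mean(ratings), stdev(ratings)
        standardised[person] = [round((r - m) / s, 2) for r in ratings]
    return standardised

# Illustrative data: both raters had one comparatively bad week, which
# only becomes visible once each is measured against their own baseline.
print(standardise_ratings({
    "generous": [8, 9, 8, 6],
    "harsh":    [4, 5, 4, 2],
}))
```

Standardisation does not remove subjectivity, but it stops one person's habitual generosity from drowning out another's genuine signal.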

Conclusion 

To measure impact, you must be fully aware of the limitations of measurement and have strategies to avoid or mitigate them. Otherwise, what you measure will truly be unmanageable.

Charles McLachlan is the founder of FuturePerfect and on a mission to transform the future of work and business. The Portfolio Executive programme is a new initiative to help executives build a sustainable and impactful second-half-career. Creating an alternative future takes imagination, design, organisation and many other thinking skills. Charles is happy to lend them to you.