Big Data Valuation: The Potential Benefits of Mock Valuations for Guiding Data Collection
Disclaimer: I am not an IP lawyer, an economist or an accountant. This series of articles on the topic of data valuation captures opinions I developed from taking an online course on IP Valuation. They are not meant to represent the views of my employers, past or present. The purpose of this article concerning technical, legal or professional subject matter is meant to foster more dialogue about the subject and does not constitute legal advice.
This article describes my experience trying to better understand the value of Big Data by seeking inspiration from the way the future economic value of patents is assessed. I did this by taking the IP Business Academy’s IP Valuation 1 course and reflecting on how it could be adapted to value data.
In my last piece, I outlined my interest in understanding the topic of data valuation by seeking some kind of formal training in the subject. I found no valuation standard specifically for data, but I decided the best place to start was to take a course on IP Valuation through the Center for International Intellectual Property Studies at the University of Strasbourg. The course I took, IP Valuation 1, presents material that is also supported in the German standard DIN 77100: ‘General principles for monetary patent valuation’.
Now that I have completed the course, I want to share a few ideas as to how some of the concepts might be used to value Big Data. In this piece, my focus will be primarily on the implications that a systematic approach for valuing data sets could have on the way Big Data collections are developed within organisations.
My main conclusion in this article is that conducting some form of mock data valuation exercise could help produce more valuable data collections. I also think that valuations could address a kind of analysis debt that builds up when people decide to support large-scale passive data collection. But doing them can take significant resources, so organisations would need to invest in planning and resource allocation.
The Appeal of Broad Data Collection and Deferring Valuation
To understand why the activity of data valuation would have value itself, it is useful to consider the way the current technology marketplace motivates mass data collection. Technological developments and a broadening pool of vendors have made it easier to support the rapid accumulation of data along the dimensions of volume, variety and velocity that characterise Big Data.
The advent of cloud computing services has reduced the risk of investing in the computational resources needed to grow and process Big Data sets. Storage and processing power can be more predictably matched against forecasted analytical needs even as both kinds of resources become cheaper. Movements like the Internet of Things provide a widening range of devices that can generate and stream data. The advent of data lake technologies allows data to be rapidly collected and saves the time otherwise needed up front to define data structures, schemas and data transformation activities.
The ease with which data can physically be transferred, generated, stored, and initially processed makes passive data collection tempting. If the costs of initially providing the data are low, then an organisation would only need to expect a similarly low minimum future value to break even and justify provisioning it.
Even if the appeal of having the data is not immediately apparent, there may be a belief that as a data set grows into a Big Data set, its ever-increasing volume, variety and velocity will eventually produce some kind of synergistic effect, either on its own or in combination with other data provided in the future.
The ease of supporting the computational dimensions of Big Data, the difficulty in comprehending its future potential benefits and the belief it must have some value all make it inviting to capture data now and worry about valuing it later. An assumption could be made: there will exist some scenario where the data may be processed in a way that can provide some future economic benefit, either by making money or saving it.
Meanwhile, other costs will accumulate over the life cycle of a data source and raise the expectations for a minimum level of return on its investment. The process of turning data into marketable insights often involves human domain experts whose efforts contribute to the long-term costs of maintaining data. Besides the analysts themselves, research activities will likely involve the costs of software developers to provide custom tools and engineers needed to support knowledge infrastructure.
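The break-even reasoning in this section can be illustrated with a minimal sketch. Every figure and name below is hypothetical and chosen only for illustration, and the sketch deliberately ignores discounting and any other refinement a real valuation would apply. It simply shows how a low initial provisioning cost can be overtaken by the costs that accumulate over the life cycle of a data source.

```python
from itertools import accumulate


def break_even_thresholds(initial_cost, yearly_costs):
    """Return the cumulative cost after each year.

    Each entry is the minimum future value the data source would need
    to deliver by that point just to break even. The first entry is the
    initial provisioning cost alone. No discounting is applied.
    """
    return list(accumulate(yearly_costs, initial=initial_cost))


# Hypothetical figures: cheap to provision, costlier to maintain and analyse.
initial_cost = 1_000
yearly_costs = [5_000, 8_000, 12_000]  # analysts, developers, infrastructure

for year, threshold in enumerate(break_even_thresholds(initial_cost, yearly_costs)):
    print(f"After year {year}: value must be at least {threshold}")
```

In this made-up example the minimum value needed to justify the data source grows well beyond its initial cost within a few years, which is the effect the paragraph above describes.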
“What is the future economic value of the data?” is a difficult question to answer, and depends on whether organisations view their data sources as dependent or independent assets. In a dependent asset view, the value of data sources would be based on the perceived contribution data sets made to downstream products that would generate economic value. This view would fit well with hypothesis-driven analysis activities, where the nature of the question frames the nature of the data sets required to answer it. Although the data sets could be reused in other activities, it may be reasonable that the data sets are subservient to the hypothesis and any resulting IP products that come from it. In this case, data sets would not themselves warrant valuation.
In an independent asset view, the data sets would be perceived as having lives of their own that are independent of any one downstream product that generated economic value. This view may fit well with data-driven analysis activities, where data collection may be guided by a general intent, interest or line of inquiry rather than a single specific question. Analysis of the data may indeed answer a specific question that comes later, or it may raise or answer others. The breadth and depth of data available in a collection may invite multiple questions and its potential synergies might be presumed to exist but not be easily articulated. In this scenario, Big Data sets may warrant their own data valuation activity.
If organisations believe Big Data sets do have a ‘life of their own’, then they need to consider how they would estimate their value. As I will describe in my next article, valuations tend to place great emphasis on describing a scenario that shows how an asset would be used in practice. Describing to a valuator the ways data could be exploited is the part that requires attention from an organisation’s stakeholders. If the pace of providing new data sources outstrips an organisation’s capability for managing them and valuing them as assets, then the question of value will be more difficult to answer.
Managing the Competition of Use Case Scenarios
The cumulative analysis debt I mentioned earlier could build up if the pace of providing data outstrips the ability of constrained rational thinking to produce a manageable set of realistic use cases that would generate economic benefit. In this case, it may be difficult to identify all the most likely use case scenarios, and even when that is done, those use cases may end up competing for limited resources.
This problem is best illustrated by example.
Imagine you are asked to assess the potential value of multiple large, complex consumer data sets that are each growing exponentially. You are told the reason for the valuation, what your role in the task is, what is wanted from you and who the ultimate audience of your report will be. Suppose you only have a few days to investigate the problem.
Further suppose the data sets contain thousands of variables about people, their purchasing habits, their real-time locations and links to products and services they may be using.
Facing the limits of rationalising economic use scenarios
The first challenge could be dealing with the limits of rational thinking to identify realistic use cases that would provide economic benefit. The course I took includes a unit on decision making in management, and it describes three limitations that would constrain rational decision making:
· A lack of relevant information
· Too much information for a human to collect and process
· Too little time to make the decision
Not all relevant information may be obvious from the data sets themselves. Licensing information can indicate what stakeholders may do with data. Information about its provenance and data dictionaries can indicate what they should do with it. Information about its past in-house uses could guide how it could be reused or repurposed. Not knowing these things could stall or stop the decision making that would identify use case scenarios.
Other constraints also present a challenge. A decision maker could potentially drown in variables that appear to have little use, starve from a lack of obvious variables that do and lack the time to assess the synergistic benefits of using them in combination.
Competition between Big Data and Big Enough Data use cases
Suppose you had enough information available to identify multiple use case scenarios for a Big Data set, but you only had a few days allocated to do a valuation. Further consider two of the scenarios:
· Scenario 1: uses only a couple of dozen variables from a couple of data sets to produce a data product that has obvious value.
· Scenario 2: uses thousands of variables from multiple data sets to produce a data product that has less obvious value.
If you only had time to explore one scenario, which would it be? Would you feel compelled to use more data to imply value that would only be achieved at large scale? Would you favour use cases that used more data in order to justify the broad scope of previous data collection efforts?
Constraint of fitting uses to branding
A third challenge might be constraining use cases to those that match the branding of an organisation. Suppose you discover a use case that produces insights that could be valuable in an area outside your organisation’s known product lines or areas of expertise. Would such a use case dilute the value of a brand? Would the brand support an organisation’s competition in a new area? Could you sell the results to another organisation whose brand did support it?
Mock Valuations of Big Data could Reduce Analytical Debt of Assessing Future Value
Organisations which choose to do a valuation of their Big Data assets will need a systematic approach that provides credible estimates of future economic benefit. Broad data collection efforts for growing those assets may present challenges of doing valuations with limited resources. The dimensions of scale by which Big Data collections grow could present such a large set of use case scenarios that effort may be needed to triage them.
Some form of iterative mock valuation may help rationalise data provisioning in a way that makes portfolios of data sets have a more coherent economic value. Even if a mock valuation did not result in a specific economic value, the process of doing it would likely provide a useful reminder in asking why data are being collected, what data should be removed and what additional data would likely provide value.
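One way to picture the triage step in a mock valuation is the sketch below. It is only an illustration under stated assumptions: the UseCase fields, the benefit and effort figures, and the greedy "benefit per day" ranking are all hypothetical and are not taken from the course or from DIN 77100. In practice, producing credible benefit estimates is the hard part, especially for scenarios whose value is less obvious.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    estimated_benefit: float  # hypothetical estimate, any consistent currency unit
    effort_days: float        # hypothetical time needed to explore the scenario


def triage(use_cases, budget_days):
    """Greedily select use cases by estimated benefit per day of effort.

    This is a simple heuristic, not an optimal selection: it walks the
    ranked list and keeps every use case that still fits in the budget.
    """
    ranked = sorted(
        use_cases,
        key=lambda u: u.estimated_benefit / u.effort_days,
        reverse=True,
    )
    chosen, remaining = [], budget_days
    for use_case in ranked:
        if use_case.effort_days <= remaining:
            chosen.append(use_case)
            remaining -= use_case.effort_days
    return chosen


# Made-up numbers loosely mirroring the two scenarios described earlier.
candidates = [
    UseCase("Scenario 1: a couple of dozen variables", 50_000, 2),
    UseCase("Scenario 2: thousands of variables", 80_000, 10),
]

selected = triage(candidates, budget_days=3)
print([use_case.name for use_case in selected])
```

Repeating an exercise like this as estimates are revised is one possible form the iterative mock valuation could take. Use cases that never make the cut may point to data that could be removed, and promising ones that lack inputs may point to additional data worth collecting.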
The Next Article
In my first article on Big Data valuation, I outlined why data valuation activities were important for data-driven organisations. There is no valuation standard specifically for data, but there are standards for IP valuation. I took a course on IP valuation with the goal of trying to relate it to Big Data. In this article, I’ve reviewed the benefit of doing a systematic valuation of data to estimate its future economic benefit.
My next article will be entitled ‘Big Data Valuation: The Value of Synergy’. In it, I will begin to explore the anatomy of IP value for an intangible good, and what DIN 77100’s view of the characteristics of IP value implies for data.
Wurzer, Alexander. Certified University Course IP Valuation 1. IP Business Academy. https://ipbusinessacademy.org/certified-university-course-ip-valuation-i
Patent Valuation - General principles for monetary patent valuation. English translation, DIN 77100:2011–5. https://www.beuth.de/en/standard/din-77100/140168931
The essential elements of a Data Lake and Analytics solution. Amazon Web Services. https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/#:~:text=Data%20Lakes%20allow%20you%20to,structures%2C%20schema%2C%20and%20transformations