Big Data Valuations: A Pause in a Journey of Learning

Kevin Garwood
Jan 18, 2022

Disclaimer: I am not an IP lawyer, an economist or an accountant. This series of articles on the topic of data valuation captures opinions I developed from taking an online course on IP Valuation. They are not meant to represent the views of my employers, past or present. The purpose of this article concerning technical, legal or professional subject matter is meant to foster more dialogue about the subject and does not constitute legal advice.

I was inspired to write a set of articles on data valuation when I read that the accountancy profession has had difficulty recognising data as an asset on a balance sheet. Having spent most of my career building or working with knowledge infrastructure, I was led to wonder: how do you place a value on Big Data assets?

I knew little about how valuations worked, so I enrolled in a course on IP Valuation through the IP Business Academy [1]. The valuation course is meant to cover all intellectual assets, but it focused on supporting the German standard DIN 77100: ‘General principles for monetary patent valuation’ [2]. My goal in taking the course was to gain better insight into how the concepts and methods used to support IP valuations would apply to valuing Big Data assets.

A Reluctance to Automatically Equate Patent Valuation with Data Valuation

Throughout the course it seemed obvious that however IP valuations worked for patents, they would likely work for other forms of IP such as trademarks, copyrights and geographical indicators. However, I’ve long hesitated to draw too many parallels between the idea of a patent and the idea of a data set. Both are intangibles, but they differ in important ways.

“Patent” has a widely understood meaning that is reflected in international laws [3][4]. “Data” does not [5]. Patents have long been recognised by the accounting profession as assets that can appear on a balance sheet. Data has not [6][7][8]. The cost of obtaining a patent presumes the intellectual material it protects will generate some future financial benefit. In contrast, the intellectual material in data may sometimes be viewed as an ever-cheapening form of inventory that is subservient to value creation in better understood asset classes.

Where the economic value of intangible assets has been clearly appreciated, established IP rights follow. A patent describes both a source of intellectual material and a set of well-known exclusivity rights that underwrite the credibility of forecasts for generating future economic benefit. A data set is just the intellectual material: a thing still in search of an IP right. Currently, organisations tend to protect data by relying on an assortment of options that include copyright, sui generis database rights and trade secrets. There are no sui generis laws that protect data per se, and the prospect of having one will continue to generate lively debate.

The predictable structure, static content, and rigorous process of differentiating patents from prior art foster at least some comparability amongst them. Data sets, with their varied structures, their ability to evolve rapidly across multiple dimensions and the absence of anything resembling prior art, are harder to compare. Data sets may evolve in the dark and become unique, by intent or by circumstance, at multiple phases of their life cycles. Often, the unique contributions that data sets make to organisations leave them poorly comparable for valuation activities.

Even when Big Data sets support comparability with others, we need to consider how the intent of their development contributes to compelling valuation scenarios. Whereas the original intent of a patent may be well preserved in its subsequent applications, the re-use potential of data sets may be more opportunistic and less anticipated.

This can lead to different perceptions of long-term risk for realizing economic value. If we create a patent for a new kind of water pump and make water pumps, that economic scenario seems predictable and perhaps it would bring modest gains in an established market. Now consider if we create data about vehicle GPS signals in order to improve fuel economy. Later on, we end up combining it with an emerging database of grocery store locations. The value proposition may be less predictable but bring lucrative gains in new markets.

The Value of Mock Valuations

Although data and patents have many important differences, the main machinery of IP valuations seems applicable to both. In both cases, the cause of a valuation, its goal, the role of the valuator and the audience for the report all critically shape the valuation activity. The recognition of IP value as an alignment of a valuation object, a scenario for generating future economic benefit and a set of complementary goods required to support that scenario would also apply [1].

The major approaches that could be used for comparing IP values would remain the same but their emphasis or use in combination may differ. These approaches would include [1]:

· direct comparisons, which rely on determinants of value

· indirect comparisons, which rely on indicators of value

· economic effect comparisons, which rely on expenses (cost approach), transaction data (market approach) or risks (income approach)
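To make the contrast between these approaches concrete, the income approach can be sketched as a risk-adjusted discounted cash flow. This is a minimal illustration, not the procedure prescribed by DIN 77100; the cash flows, discount rate and success probability below are all invented for the example.

```python
# Hypothetical sketch of an income-approach valuation: the value of an
# asset is the sum of its forecast future cash flows, discounted for
# time and risk. All figures below are illustrative, not from any standard.

def income_approach_value(cash_flows, discount_rate, success_probability=1.0):
    """Risk-adjusted net present value of forecast annual cash flows."""
    return sum(
        success_probability * cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )

# Forecast benefits from an exploitation scenario over five years, with
# a 15% discount rate and a 70% chance the scenario succeeds at all.
forecast = [100_000, 120_000, 140_000, 140_000, 120_000]
value = income_approach_value(forecast, discount_rate=0.15, success_probability=0.7)
print(round(value))
```

The cost and market approaches would swap the forecast for accumulated expenses or for prices observed in comparable transactions, but the framing of a single number backed by an explicit scenario stays the same.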

At its most basic level of consideration, the act of trying to value a portfolio of Big Data sets seems warranted and beneficial. We have moved beyond the era when the value of a data set could be regarded purely through its utility in supporting a single well-articulated question and any IP products the answer would provide. Regardless of how well that view actually worked in practice (especially in research), we are now forced to acknowledge the character of Big Data sets. They mutate, split or merge with a life of their own that seems worthy of a starring role rather than a supporting role in valuations.

A growing marketplace of vendor technologies has allowed us to collect data and grow it in multiple dimensions that can quickly exceed our ability to fully comprehend its potential use. And it is at this point when we should pause and reflect on the basic essence of why we’re creating data — just like we would have to consider why we would continue to foster any other kind of asset. Valuation efforts force us to consider how an asset would be used to generate some kind of economic benefit. For data grown in large dimensions, the main asset of interest is Actionable Data, not Big Data.

Once data grows in a way that exceeds our comprehension, extra magnitudes of scale cease to add value. Big Data valuations should keep audiences from drifting through phases of awe, confusion, apathy and finally skepticism. The era of Big Data has given us a size fatigue that is best remedied by reminding us of its more concrete use cases.

In doing valuations, we must strike a compromise between trying to rationalise every piece of data we collect and allowing large amounts of data to accumulate in ways that encourage new insights that would otherwise be unlikely to be harvested. This is because the value of a Big Data asset resides in three sources:

· The certainty that parts of it can be used in expected ways to support some established use case

· The hope that parts of it may be used in unexpected ways to support unforeseen use cases

· The expectation that once it achieves one or more dimensions of large scale, it will produce insights that would not be able to come from smaller scaled data

What an Actual Big Data Valuation Might Look Like

My goal for taking a course on IP Valuation was to gain more insights about how Big Data assets could be valued in practice. I did not set out to discover a formula, but I have evolved some ideas of how it might be done.

Establish the valuation context. Identify the cause of a valuation, its goal, the role of the valuator and the audience for the valuation report [1]. The context will help identify data sets that could appear in a portfolio and begin to shape the exploitation scenarios that could make them realise a future economic benefit. Note here that ‘exploitation scenario’ does not carry a negative connotation. It is a technical term that appears in the patent valuation standard and means a: ‘concrete exploitation of the patent whose expected future financial benefit forms the basis for valuation’ [2].

Establish provenance and do a due diligence exercise. Ensure you’re able to link Big Data assets with any contracts or regulations that may limit their use. This provides a check that would identify liabilities and would limit exploitation scenarios. Gathering provenance about how and why the data sets were created lends credibility to their future uses.

Where appropriate, organise data sets into databases. This step is most relevant for structured data sets. The selection of data sets and how they would be related can lead to coherent use cases and simplifies portfolios of many data sets to fewer databases. Effort used to choose which data sets go in which collection would likely warrant sui generis database rights, where they can be enforced.

Imagine your data assets ageing through their life cycles. Whereas a patent will likely remain static throughout its useful life, data sets will evolve and age through their own life cycles. Try to imagine what those are, and consider how they may grow, split, merge or obsolesce. This can make exploitation scenarios seem more compelling. Stander provides an excellent discussion of valuation tied to data asset life cycles [9].

Demonstrate a use case that involves minimal asset content. We want to connect with the audience of a valuation by showing the simplest way the asset could reasonably generate future economic value. We want to establish a minimum certainty of value.

Imply a use case that could combine data across themes in the future. We want to hint at use cases that may not be evident now but may become evident later. Instead of describing how specific features would be combined, describe generally how themes of data could be combined to support emerging markets. Prefer showing how portfolio data sets with one theme may one day combine with emerging external data sets with another theme.

Imply a use case that could combine multiple data sources in the same portfolio. We want to convey the synergistic value that may develop amongst portfolio items. Imply uses by suggesting how features in one portfolio data set could complement or enrich features in another.

Imply a use case that could be ethically supported in some other market. Provide a compelling scenario to show how data originally created to support use cases in one market could yield data products that could be applied in another market.

Show that data has a large enough dimension to support specific analyses. Show there could be an expectation that some aspect of scale of the Big Data asset could support analyses that may not be supported with smaller data assets. For example, we may try to demonstrate that a data set has enough people represented in it to support some kind of statistical analysis.
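As a toy illustration of that kind of scale check, the standard normal-approximation formula for comparing two proportions tells us roughly how many people per group an analysis would need. The effect size, power and cohort size below are hypothetical, chosen only to show the shape of the argument a valuation might make.

```python
# Toy sketch: does a data set contain enough people to detect a given
# effect? Uses the standard two-proportion sample-size approximation;
# the proportions, power and significance level are hypothetical.
from statistics import NormalDist

def required_n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled_variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ((z_alpha + z_beta) ** 2 * pooled_variance) / (p1 - p2) ** 2

# Could this data set support detecting a rise in some rate from 10% to 12%?
needed = required_n_per_group(0.10, 0.12)
records_available = 25_000  # hypothetical size of one cohort in the data set
print(f"need ~{needed:.0f} per group; have {records_available}")
```

A valuation that pairs a claimed analysis with a calculation like this makes the "large enough dimension" assertion checkable rather than rhetorical.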

Describe data sets in terms of domain-specific determinants and indicators of value. We want to find ways of reducing the detail and complexity of data sets so that they are easier to compare. We can do this by describing the data in terms of characteristics that either determine or indicate value.

Determinants could include attributes that show how data sets comply with appropriate regulation. Asserting compliance guarantees a minimum value, whereas implying that data sets may not be compliant marks them as liabilities.

Indicators can be drawn from the multiple sources described in the previous article. These could include metrics for Big Data “V”s such as volume, variety and velocity. They could include the ability of a Big Data dimension to support certain analyses. They could also include domain-specific indicators (e.g. desirable qualities for a financial data set or for a biological data set).

Describing assets in terms of these more abstract attributes means that they will appear simpler, more comparable, and less volatile as they evolve.

Imply future value through comparisons that rely on market specific determinants and indicators of value. As I learned in the course, comparisons of valuation objects based on attributes are weak because they tend to be subjective and do not actually base value on an economic scenario [1]. Nevertheless, abstracting the details of data sets to simpler value indicators and determinants may provide the only opportunity to practically compare them.
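A minimal sketch of what such an attribute-based comparison might look like follows. The indicator names, the normalised scores and the weights are all invented for illustration; they are not drawn from any standard, and in practice each community would argue over its own set.

```python
# Hypothetical sketch of a weak, attribute-based comparison: each data
# set is reduced to a few normalised indicator scores (0.0 to 1.0), and
# a weighted sum gives a crude ranking. Attributes and weights are
# invented for illustration only.

INDICATOR_WEIGHTS = {
    "volume": 0.2,        # scale relative to peers in the same domain
    "velocity": 0.2,      # how current the data is kept
    "compliance": 0.4,    # a determinant: non-compliance implies a liability
    "linkability": 0.2,   # ease of joining with other portfolio data sets
}

def indicator_score(asset):
    """Weighted sum of an asset's indicator scores; missing indicators count as 0."""
    return sum(weight * asset.get(name, 0.0)
               for name, weight in INDICATOR_WEIGHTS.items())

gps_traces = {"volume": 0.9, "velocity": 0.8, "compliance": 1.0, "linkability": 0.6}
store_locations = {"volume": 0.3, "velocity": 0.4, "compliance": 1.0, "linkability": 0.9}

for name, asset in [("gps_traces", gps_traces), ("store_locations", store_locations)]:
    print(name, round(indicator_score(asset), 2))
```

As the course warned, a ranking like this says nothing about any economic scenario; its only virtue is that it lets otherwise incomparable data sets be lined up at all.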

The Future of Data Valuations

Now that I have finished the course, I have speculated a lot on how data valuations may be supported in the future. I should repeat that these predictions are based purely on my own opinion and may not come to pass.

Big Data valuations will be legitimized by value estimates derived from a systematic approach

As data-driven organisations proliferate, data sets will continue to gain legitimacy as intellectual assets in valuation activities. Accountancy standards will change to let organisations better report data as capital assets. More organisations will begin to regard their data holdings as assets rather than just inventory.

A growing recognition of data sets as assets will encourage new approaches to valuing data. The current cost, market and income methods are all applicable to data valuation, but they seem to have been developed in an era that emphasised manufactured goods. They may have to adapt to recognise a more volatile life cycle than the static nature of IP products such as patents.

As the fervour of discussion about the value in Big Data continues, more people will want to see a more systematic and reasoned approach for estimates of future economic value. IP Valuation standards such as DIN 77100 [2] will continue to provide useful guidance for framing data asset valuations.

Data valuations will need to practically deal with potentially large portfolios of data sets, many of which may not have initially been created to generate a future economic benefit. Mock valuation exercises may help organisations rationalise data collection efforts. In future, the perceived value of Big Data provisioning efforts may lie more in the way they discriminate desirable data rather than accommodate any data.

Data valuations will need to be designed so they do not overwhelm an audience with so much material that people cease to find value in greater dimensions of it. The scenarios will have to convey the right balance of values drawn from perceptions of certainty, hope and expectation.

Current forms of IP will continue to provide awkward protection for data

In valuation scenarios, the ability to exclude other parties from using the intellectual material underwrites the credibility of forecasts of future economic value [1]. If data sets can be easily copied, regenerated or adapted, it becomes difficult to be confident that they will hold a predictable value.

To that end, current IP rights will likely continue to provide an awkward form of protection for exclusive use [10][11]. Copyright is unlikely to cover many forms of data generated by the growing network of devices in the Internet of Things [11]. Patents will continue to protect data only in a limited set of circumstances [12][13]. Sui generis database rights will continue to hold effect in only certain parts of the world [14], and they may become less applicable to large collections of unstructured data [10].

A future sui generis data right will continue to warrant discussion, but I get the impression that it will be unlikely to attract widespread support. Apart from debates on whether data should warrant its own IP right, there is the further complication of assessing the impact a ‘data IP’ right would have on existing IP instruments [11].

If the landscape of current and future IP rights that might protect data seems brittle, organisations may decide it is easiest to protect data through trade secrets. Indeed, for many organisations, trade secrets represent the most valuable form of IP [15]. Protecting data sets as trade secrets has the advantage that they do not need to satisfy the criteria of other forms of IP, and the protection period is unlimited so long as the secret information is not disclosed [10]. Data sets can grow behind closed organisational walls until they achieve a state worthy of more traditional IP protection.

However, trade secrets incur their own risks and a steep maintenance cost. The main risk of a trade secret is that once secret data are legitimately disclosed by an organisation or similar material is published by other organisations, the secret will cease having protection. The maintenance costs would be reflected in an Information Security Management System (ISMS) [16]. The ISMS would be used to help data managers classify trade secrets and limit their dissemination throughout an organisation.
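As a loose illustration of the classification-and-dissemination discipline an ISMS imposes, the core check might be sketched as follows. The secrecy levels, asset names and clearance scheme are invented for the example; real ISMS implementations are far richer.

```python
# Hypothetical sketch of ISMS-style classification: each data set gets a
# secrecy level, and dissemination is checked against the requester's
# clearance. Levels, assets and roles are invented for illustration.

CLEARANCE_RANK = {"public": 0, "internal": 1, "confidential": 2, "trade_secret": 3}

def may_access(user_clearance, asset_level):
    """A user may access an asset only if their clearance meets its level."""
    return CLEARANCE_RANK[user_clearance] >= CLEARANCE_RANK[asset_level]

# A toy register of data assets and their classifications.
assets = {"gps_traces": "trade_secret", "store_locations": "internal"}

print(may_access("internal", assets["store_locations"]))  # ordinary staff: allowed
print(may_access("internal", assets["gps_traces"]))       # ordinary staff: denied
```

The point of the sketch is the maintenance burden it implies: every data set must be classified, and every access path must consult the register, or the trade secret argument weakens.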

Data management skills will become necessary to create value in Big Data asset valuations

One of the most important predictions I can make is that data management skills will become essential for Big Data assets to attain significant value. There are at least two reasons: tracking data assets through their life cycles makes them meaningful to use and restricting their access supports trade secret protections. In both cases, data management will increasingly align with an interest in intellectual asset management. The field will also become increasingly multi-disciplinary, and involve a mix of interests from intellectual property, finance, security, computing and other domain-area specialisms.

A marketplace of value determinants and indicators will evolve

The tendency for data sets to become unique and difficult to compare will encourage data owners to describe their assets in terms of attributes that either directly determine or indicate value. Because determining which attributes make a data asset valuable is inherently subjective, communities will form that appreciate a common set of attributes. These communities will likely reflect economic sectors, areas of knowledge or markets. Their attributes will support weak comparisons that lack the clout of economic effect comparisons but may enable comparisons to happen at all.

Conclusions

My experiment of learning about data valuation by taking an IP valuation course has made me look at data assets in a new way. I can now once again listen to seminars and discussions about the value of Big Data. I’m looking forward to finding ways of adding more value in data management.

Articles in the Series

  1. Big Data Valuation: Assessing the Future Economic Benefit of Data through IP Valuation Approaches
  2. Big Data Valuation: The Potential Benefits of Mock Evaluations for Guiding Data Collection
  3. Big Data Valuation: The Value of Synergies
  4. Big Data Valuation: The Importance of Market Protections, Complementary Goods and Data Asset Management
  5. Big Data Valuation: The Impact of Specificity of Complementary Goods on Data Reuse
  6. Big Data Valuation: Cost, Market and Income Approaches
  7. Big Data Valuation: Indicators and Determinants of Value
  8. Big Data Valuations: A Pause in a Journey of Learning

References

[1] Wurzer, Alexander. Certified University Course IP Valuation 1. IP Business Academy. https://ipbusinessacademy.org/certified-university-course-ip-valuation-i

[2] Patent Valuation — General principles for monetary patent valuation. English translation, DIN 77100:2011-05. https://www.beuth.de/en/standard/din-77100/140168931

[3] Paris Convention for the Protection of Industrial Property, WIPO, https://www.wipo.int/treaties/en/ip/paris/

[4] Patent Law treaty (PLT), WIPO, https://www.wipo.int/treaties/en/ip/plt/

[5] Borgman, C. L. (2016). Big data, little data, no data: Scholarship in the networked world. MIT press.

[6] Collins, Virginia. (2019, June) Managing Data as an Asset. Retrieved from: https://www.cpajournal.com/2019/06/24/managing-data-as-an-asset/

[7] Laney, D. B. (2017). Infonomics: how to monetize, manage, and measure information as an asset for competitive advantage. Routledge.

[8] Lam, J., Tan, G., (Jul 2019) Should data be recognised as an asset on a balance sheet? https://www.acutus-ca.com/2019/07/31/should-data-be-recognised-as-an-asset-on-the-balance-sheet/

[9] Stander, J. B. (2015). The modern asset: big data and information valuation (Doctoral dissertation, Stellenbosch: Stellenbosch University). https://scholar.sun.ac.za/handle/10019.1/97824

[10] Debussche, Julien, Cesar, Jasmien. Big Data and Issues and Opportunities: Intellectual Property Rights (Mar 2019). https://www.twobirds.com/en/news/articles/2019/global/big-data-and-issues-and-opportunities-ip-rights

[11] Barczewski, Maciej. Value of Information: Intellectual Property, Privacy and Big Data. No. 7. Peter Lang Publishing Group, 2018.

[12] Ahmad, Imran, Stepin, Nikita, Chou, Patrick, Suliman, Suzie. Where Data Meets IP (Sept 2021). https://www.dataprotectionreport.com/2021/09/where-data-meets-ip/

[13] Overview: the TRIPS Agreement. World Trade Organisation. https://www.wto.org/english/tratop_e/trips_e/intel2_e.htm

[14] Sharing Research Data and Intellectual Property Law: A Primer. PLOS Biology. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002235

[15] Schaller, W. (2018) Trade Secret Law: The Role of Information Governance Professionals. https://repository.law.uic.edu/cgi/viewcontent.cgi?article=1446&context=ripl

[16] Chagoya, H. (Oct 2015). Trade secrets: risk management in an open-innovation environment, IAM website, https://www.iam-media.com/trade-secrets-risk-management-open-innovation-environment
