An Investigation into Operational Issues for AI-Driven Medical Devices using Open FDA Medical Device Reports
1 Introduction
As populations age and as healthcare systems become more strained and costly, governments are showing more interest in AI-based services that support aspects of healthcare. For example:
· US: the Biden administration expressed that it is “…committed to the highest urgency on governing the development of AI safely and responsibly to drive improved health outcomes for Americans…”
· UK: the NHS Long Term Plan has prioritised investment in artificial intelligence to drive NHS digital transformation. It cites the potential use of AI to interpret CT and MRI images in support of clinical decision making.
· EU: the European Parliament study Artificial Intelligence in Healthcare notes: “The potential for the application of AI in the clinical setting is enormous and ranges from the automation of diagnostic processes to therapeutic decision making and clinical research…AI will play a major role in tasks such as automating image analysis (e.g. radiology, ophthalmology, dermatology, and pathology) and signal processing (e.g. electrocardiogram, audiology, and electroencephalography).”
As health data infrastructures and machine learning techniques evolve, opportunities for deploying AI-based medical services in healthcare settings continue to grow. As of August 7, 2024, the FDA had approved over 950 AI/ML-enabled medical devices for use. We are entering an era when many of these diagnostic services are being integrated into platforms that are meant to inform and be informed by large amounts of data, whether about a single patient or about large cohorts of patients.
For many of these services, the roles they play in supporting clinical decisions make them a kind of medical device that can be classified as Software in a Medical Device (SiMD) (e.g. the software in a pacemaker) or Software as a Medical Device (SaMD) (e.g. a standalone medical imaging application that can support disease diagnosis) [MHRA-SaMD, FDA-SaMD, FDA-SiMD].
AI-based software that is classified as a medical device will be subject to rigorous scrutiny by various device regulation authorities such as the FDA, the MHRA and the EMA. As AI algorithms continue to migrate from research environments into field use, it will become more important for the makers of these systems to appreciate what kinds of issues their products may encounter while they are in use.
One way to appreciate the issues is to learn more about software development for medical devices. I would highly recommend Philip Cosgriff’s book “Writing In-House Medical Device Software in Compliance with EU, UK and US Regulations”. He does an excellent job of outlining which regulations apply to this work and the common issues that arise in it.
Another way to explore the issues is to observe what kinds of issues are reported during the operation of AI-driven medical devices. Collections of medical device reports can provide valuable anecdotes about what happens when devices appear to present problems in the context of an adverse health event. The focus of this investigation is on this second approach. Through this activity, I have learned a lot about the data quality of medical device reports, the challenges of identifying those devices that appear to use AI, and various issues related to resolving whether medical devices are causing adverse health outcomes.
2 The aim and goals of the investigation
The aim of this investigation is to gain insights about the kinds of issues that may be associated with AI-driven medical devices that are deployed in healthcare settings. Three goals support this aim:
· Develop a set of programming scripts that can retrieve medical device reports describing events recorded in the field
· Develop search criteria that can select reports that are most relevant to the topic of AI-driven medical devices
· Identify qualitative trends in the reports
In support of the first goal, I have developed an open-source code base that is designed to extract medical device reports from the Open FDA’s Device API service. Different regulatory bodies around the world maintain these kinds of reports, but I chose to use Open FDA because it provided an easy-to-use interface that would support complex, programmatic analysis. The scripts can be run to capture the latest updates to the Open FDA collection, but I have based my analysis on a specific snapshot of results.
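To give a flavour of how such scripts can talk to the service, here is a minimal sketch (not the repository’s actual code) that builds and fetches a single Open FDA Device API query using only the Python standard library:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.fda.gov/device/event.json"

def build_query_url(search: str, limit: int = 100, skip: int = 0) -> str:
    """Build an Open FDA Device API query URL.

    `search` uses Open FDA's query syntax, e.g.
    'device.generic_name:"pacemaker"'.
    """
    params = urllib.parse.urlencode({"search": search, "limit": limit, "skip": skip})
    return f"{BASE_URL}?{params}"

def fetch_reports(search: str, limit: int = 100, skip: int = 0) -> list:
    """Fetch one page of medical device reports as a list of dicts."""
    with urllib.request.urlopen(build_query_url(search, limit, skip)) as resp:
        payload = json.load(resp)
    return payload.get("results", [])
```

Calling fetch_reports('device.generic_name:"pacemaker"', limit=5) would return up to five report records; the snapshot analysed in this article was produced by the repository’s own, more elaborate scripts.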
Readers who are interested in running or adapting the code can check out the code base. Those who are not can instead download the specific snapshot results. The results are labelled so they can be referenced within this article. For example, the reference [ALG-DIAG-036] refers to the 36th result that appears in the result file ‘open_fda_algorithms_diagnostic_results.html’. The file captures extracts of medical device reports that contain ‘algorithm’ and either ‘diagnostic’ or ‘diagnosis’. In this article, the hyperlinked reference [ALG-DIAG-036] will take you to the specific original record provided by Open FDA.
3 Understanding Medical Device Reports in Open FDA and MAUDE
Open FDA, launched by the US FDA in June 2014, provides a collection of Application Programming Interfaces (APIs) that support easy public access to collections maintained for drugs, devices, food, tobacco, animal and veterinary products, and other categories. Most relevant to this investigation is the Open FDA Device API, which provides access to at least 20,278,386 adverse event reports. Adverse event reports are submitted “…to the FDA to report serious events or undesirable experiences associated with the use of a medical device.” [OpenFDA-Device]. The FDA’s definition of a medical device is broad, and can include tongue depressors, bedpans, antigen test kits, and complex electronic devices that may rely on software.
Open FDA Device API’s data set is ultimately derived from Manufacturer and User Facility Device Experience (MAUDE), a dataset which contains “…medical device adverse event reports submitted by mandatory reporters — manufacturers, importers and device user facilities — and voluntary reporters such as health care professionals, patients, and consumers.” [OpenFDA-Device]. The collection contains reports dating between 1992 and the present, and it is updated weekly.
The vast majority of reports that appear in the Open FDA Device API are manufacturer reports submitted to comply with FDA regulations. As the main page for the MAUDE database indicates:
“In accordance with 21 CFR Part 803, manufacturers and importers must submit reports when they become aware of information that reasonably suggests that one of their marketed devices may have caused or contributed to a death or serious injury or has malfunctioned and the malfunction of the device or a similar device that they market would be likely to cause or contribute to a death or serious injury if the malfunction were to recur. Manufacturers must send reports of such deaths, serious injuries and malfunctions to the FDA. Importers must send reports of deaths and serious injuries to the FDA and the manufacturer, and reports of malfunctions to the manufacturer.”
4 Limitations of data quality in Open FDA Device reports that shape expectations of the investigation
Open FDA provides a wealth of information about medical device usage, but there are important limitations that need to be appreciated. Open FDA Device API provides the most important warning about the quality of its reports [OpenFDA-Device]:
“Adverse event reports submitted to FDA do not undergo extensive validation or verification. Therefore, a causal relationship cannot be established between product and reactions listed in a report. While a suspected relationship may exist, it is not medically validated and should not be the sole source of information for clinical decision making or other assumptions about the safety or efficacy of a product.”
There are good reasons to limit the expectations of what reading medical device reports can tell us about AI-driven medical device behaviour. Some of these reasons are described later in the Findings section of the report:
· reporting sources such as manufacturer and voluntary reports reflect biases that may stem from underlying legal tensions.
· many medical device reports repetitively describe the same event.
· searching for reports that specifically describe AI-driven medical devices can be difficult.
· missing data, poor quality reporting data and poor quality device data can make it difficult to draw meaningful conclusions about what device issues may actually be causing problems.
In light of these limitations and Open FDA’s own warnings about data quality, I’ve opted to emphasise a qualitative inspection of medical device reports rather than attempt to develop statistics about trends in how they’ve been worded or described.
5 Providing evidence for the investigation
The serious nature of some of these events has resulted in injury or death. Out of respect for patients, clinicians and device makers, I’ve limited my discussion to identifying trends; I try not to pass judgement about blame, about responsibility for adverse outcomes, or about whether specific devices have caused a reported problem. For the investigation, I thought it was important to develop a body of query results that is repeatable and traceable to the original medical device reports that Open FDA has made publicly available. If you think I’ve somehow missed context from the report fragments I’ve cited, I want you to be able to go back to the original source of the reports and draw your own conclusions.
I have developed a GitHub code repository which searches Open FDA Device API’s collection of medical device reports based on certain search criteria (see Fig. 1). It has been written in Python, using the integrated development environment PyCharm.
Each time the program runs, it generates a new time-stamped result folder that contains the results that can provide insights about medical device reports (see Fig. 2).
For each type of search query, both HTML and XLSX results are produced. The exception is open_fda_algorithm_results; the script will generate both kinds, but in the example run folder the HTML file is too large to manage in the repository.
Each result in the files is labelled, and in the HTML files you can click on “Open Source Query” to see the original query result page that was used to make the generated report (see Fig. 3).
Within all the other square bracketed references that appear in this investigation, I have hyperlinked the result number to an Open FDA API query that will show the whole medical device report from which an extract was taken.
6 Crafting a meaningful search query for Open FDA Device data
6.1 Search results for ML and AI were ambiguous
Initially, I tried a case-insensitive search for medical device reports that contained ‘ ML ’, ‘machine learning’, ‘ AI ’ or ‘artificial intelligence’ in any field. However, an unmanageable 248,582 results were returned. A case-insensitive search just for ‘ AI ’ yielded at least 5,270 results, but in some results AI stood for aortic insufficiency, analog interface, or augmentation index. The same kind of search for ‘ ML ’ yielded 243,346 results, but many showed ‘ML’ standing for millilitre and sometimes mediolateral.
I opted to develop searches that returned far fewer results so that I could inspect them by eye. I evolved the scripts to return all results and then filter them on the field adverse_event_flag=’Y’, which indicates that a report describes an incident where “…the use of the device is suspected to have resulted in an adverse outcome in a patient”.
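Once a page of report records has been downloaded, that filtering step amounts to a one-line check on the adverse_event_flag field. A sketch with a hypothetical helper and stand-in records, not the repository’s code:

```python
def suspected_adverse(reports):
    """Keep only reports whose use of the device is suspected to have
    resulted in an adverse outcome in a patient."""
    return [r for r in reports if r.get("adverse_event_flag") == "Y"]

# Stand-in records illustrating the three cases the filter sees:
sample = [
    {"report_number": "A", "adverse_event_flag": "Y"},
    {"report_number": "B", "adverse_event_flag": "N"},
    {"report_number": "C"},  # flag missing entirely
]
flagged = [r["report_number"] for r in suspected_adverse(sample)]
```

Only report “A” survives the filter; records with a missing flag are treated the same as ‘N’.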
6.2 Search results for machine learning or artificial intelligence were too few
One of the queries I used for the investigation was a case-insensitive search for reports that mentioned either ‘machine learning’ or ‘artificial intelligence’ in any field. The search yielded 67 results, of which 56 were suspected to have resulted in an adverse outcome in a patient. Given that the FDA has approved over 950 AI-driven medical devices, I was surprised that only these few results were returned from a report collection of millions.
As Lyell notes: “The methodology for identifying events involving ML devices was challenging. First, the FDA neither reports whether devices utilize ML nor is it possible to search the free text of the FDA approval documents. Second, while published lists of ML devices exist, most do not report any method for confirming ML utilization by the devices identified and therefore cannot be considered gold standard. To overcome these limitations, we searched MAUDE for reports about adverse events involving the ML medical devices that have been identified by previous studies. The results from the MAUDE search were screened to ensure they involved devices utilizing ML and then analyzed.”
Perhaps in future, new fields will be added to reports managed by the Open FDA Device API to indicate whether a device is AI-driven. Lyell adopted a more thorough but more labour-intensive means of identifying device reports that described devices using ML. I decided to broaden my investigation, partly because I began to conclude that AI-driven devices may share most of the same concerns as devices that used software but were not AI-driven.
6.3 Search results for ‘algorithm’ were ambiguous and too many
Another query I briefly considered for analysis was a case-insensitive search for all medical device reports that mentioned the term ‘algorithm’. However, the search returned an unmanageable 26,704 reports. To avoid a more complicated iterative search through results, I used the URL query format that applies to searches returning no more than 25,000 results. The main use of this query’s results was the generated spreadsheet, from which I gathered some basic field value frequencies, rather than the much longer HTML document.
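That constraint comes from paging with the API’s skip and limit parameters, where skip has a ceiling. A sketch of the paging arithmetic, assuming the documented maximums of 1,000 results per request and a skip value no greater than 25,000:

```python
MAX_SKIP = 25000   # Open FDA rejects skip values above this ceiling
PAGE_SIZE = 1000   # maximum number of results per request

def page_offsets(total_results: int):
    """Return the (skip, limit) pairs needed to walk through up to
    `total_results` matches without exceeding the skip ceiling."""
    offsets = []
    skip = 0
    while skip < total_results and skip <= MAX_SKIP:
        offsets.append((skip, min(PAGE_SIZE, total_results - skip)))
        skip += PAGE_SIZE
    return offsets
```

For example, page_offsets(26704) stops at skip=25,000, which is why a result set of that size cannot be walked exhaustively with simple URL paging.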
Most of my analysis was based on results returned from a case-insensitive search for reports that mentioned ‘algorithm’ and either ‘diagnostic’ or ‘diagnosis’. The search would likely include many results that were not related to devices that used AI-driven software. For example, many reports that involved antigen assays make mention of an ‘algorithm’ that refers to paper-based instructions rather than software. Consider text extracts from these reports:
· “there is no malfunction as when the sample was tested with the confirmatory assay per the package insert testing algorithm” [ALG-DIAG-249]
· “per product labelling: “repeatedly reactive samples must be confirmed according to cdc recommended confirmatory algorithms. the subresults for either (b)(6) or (b)(6) can be used as an aid in the selection of the confirmation algorithm for reactive samples.” [ALG-DIAG-036]
· “their standard-of-care algorithm identified streptococcus salivarius/vestibularis in subculture with identification by maldi-tof” [ALG-DIAG-265]
6.4 Search results for ‘algorithm’ and either ‘diagnosis’ or ‘diagnostic’ supported manual inspection
Despite the likelihood that the search would capture reports about medical devices that did not rely on AI-driven software, the query yielded a manageable number of results that could provide some insights. The search for ‘algorithm’ and either ‘diagnosis’ or ‘diagnostic’ yielded at least 1,367 results, of which only 425 were suspected of having resulted in an adverse outcome in a patient. Results from the run used for this article are shown in Fig. 4.
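One way to realise an ‘algorithm AND (diagnosis OR diagnostic)’ search without relying on server-side boolean grouping is to run two simpler conjunctive queries and merge their results client-side, deduplicating on the report identifier. A sketch using a hypothetical helper (mdr_report_number is the report-identifying field in the Device API data):

```python
def merge_by_report_number(*result_pages):
    """Union several result lists, dropping reports already seen
    under the same MDR report number."""
    seen = set()
    merged = []
    for page in result_pages:
        for report in page:
            key = report.get("mdr_report_number")
            if key is not None and key in seen:
                continue  # duplicate of an earlier query's hit
            if key is not None:
                seen.add(key)
            merged.append(report)
    return merged
```

The page of ‘algorithm’ + ‘diagnosis’ hits and the page of ‘algorithm’ + ‘diagnostic’ hits can then be merged into a single deduplicated result set.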
7 Findings
7.1 Litigious tones of evidence gathering
7.1.1 Assertions made through voluntary reports
It may be useful to examine the behaviour of software-based medical devices in a spectrum that begins with the most serious perceived device issues. We begin with an extract from a voluntary report that suggests there is an aspect of artificial intelligence in a MEDPOR porous polyethylene implant:
“…a neuromonitoring device (medpor®) manufactured by stryker corporation was placed underneath my eye after an intoxicated male attacked me with brass knuckles in a blind-sided incident… i have discovered it is more than just a simple piece of plastic designed to prop my eye up. after referencing the manufacturer’s website, turns out it is a neurotechnological implant, much more sophisticated than outlined by dr. (b)(6). this device has shown me first-hand its customizable features are granting others’ advantages. it gives the control group esp, voice synchronization paired with artificial intelligence, a controllable way to dispatch a “dizzy-sensation” [AI-56].
Assuming I have identified the correct product page, it makes no mention of artificial intelligence being used in the product. Next, we can consider another voluntary report featuring a patient recalling an explanation of the device’s behaviour from someone else who examined the device:
“i was told by the person who interrogated the icd that it decided to shock me as artificial intelligence may do, and i was found not to be in a life-threatening situation at the time the device shocked me.” [AI-55]
This example is noteworthy for two reasons. First, either the patient, the other person examining the device, or both assume the software uses an artificial intelligence approach. Second, the model mentioned in the wider report, “st jude icd model cd2231–40q”, is listed in a device recall notice described by the Hannon Law Firm. The attention these reports can garner from law firms is a reminder of why many of the reports that comment on device behaviour are carefully worded.
It seems that the most strongly worded claims of device malfunction come from voluntary reports. The reporter of this event asserts the presence of serious problems in the device design:
“…since this pt death, our engineers have studied this particular electrocardiograph machines interpretive algorithm and have found evidence of a crucial error in the design of the interpretive algorithm. we have also discovered serious and faulty interpretive criteria for the algorithms diagnosis of wolff-parkinson-white type a and b.” [ALG-DIAG-421]
In another voluntary report, the reporter warns of a perceived risk of injury to the patients:
“ekg system does not isolate pt from the risk of electrical shock. ekg platform is not electrically isolated from the pt. exposes pt to risk of voltage transients from the mains as well as the computer power supply. also system introduces an excessive amount of electrical noise into the software algorithm, making diagnostic quality tracings very difficult to obtain” [ALG-DIAG-419]
The vast majority of reports are manufacturer reports, some of which appear to note issues in the algorithms:
“based on an extensive investigation conducted, the likely contributing cause for failure to restart was the inability of the pump-start algorithm to provide sufficient torque to overcome abnormally high mechanical resistance caused by unknown conditions that existed prior to the failed restart attempt” [ALG-DIAG-218]
7.1.2 Disclaimers made in manufacturer reports
At this point in our analysis, it is worth noting the caveats that manufacturers may attach to their own reports. Consider that further in the report just mentioned, the manufacturers add this lengthy piece of text:
“medtronic is submitting this report to comply with fda reporting regulations under 21 cfr parts 4 and 803. this report is based upon information obtained by medtronic, which the company may not have been able to fully investigate or verify prior to the date the report was required by the fda. medtronic has made reasonable efforts to obtain more complete information and has provided as much relevant information as is available to the company as of the submission date of this report. this report does not constitute an admission or a conclusion by fda, medtronic, or its employees that the device, medtronic, or its employee caused or contributed to the event described in the report. in particular, this report does not constitute an admission by anyone that the product described in this report has any ‘defects’ or has ‘malfunctioned’. these words are included in the fda 3500a form and are fixed items for selection created by the fda to categorize the type of event solely for the purpose of regulatory reporting. medtronic objects to the use of these words and others like them because of the lack of definition and the connotations implied by these terms. this statement should be included with any information or report disclosed to the public under the freedom of information act.” [ALG-DIAG-218]
Two other manufacturer reports echo a more compact disclaimer:
· “…this report does not constitute an admission by irhythm that the product described in this report has any defects or has malfunctioned.” [ALG-DIAG-231].
· “this report does not reflect a conclusion by fda, depuy synthes or its employees that the report constitutes an admission that the device, depuy synthes, or its employees caused or contributed to the potential event described in this report.” [ALG-DIAG-159].
7.1.3 Evidence of assertions to show there is not a causal link between device and outcome
Some manufacturer reports make the effort to get explicit statements from physicians involved with events to assert the device did not cause the problems. Consider these examples of evidence gathered from people commenting on the devices:
· “the physician’s opinion, on what contributed to the adverse event is that it was the patient’s unique anatomy. physician did not feel acclarent technology was defective or responsible for the injury.” [AI-38]
· “additional information received from the site on 09-april-2019 clarified that the implanted flow diverter did not cause the patient’s dissection.”[ALG-DIAG-277]
· “additional information provided by a co-author of the article suggests the postoperative complications were not mesh-related. as reported per co-author, “the wound problem was caused by patient behavior or smoking and i believe it would have happened anyway and it was not mesh related.” [AI-01]
· “additional information received from the corresponding author indicates that the manufacturer products were not related to any of the deaths in the study. furthermore, there were no allegations or complications noted against the products. no patient complications have been reported as a result of this event.” [ALG-01375]
Thus far, these examples show how the type of reporter (e.g. voluntary report, manufacturer report) may reflect a bias on issues of perceived fault and liability. Again, it is worth recalling that Open FDA API is explicit in its heading of “Responsible use of the data”: “Adverse event reports submitted to FDA do not undergo extensive validation or verification. Therefore, a causal relationship cannot be established between product and reactions listed in a report.” [OpenFDA-Device]
7.2 Limitations of algorithm application
7.2.1 Inappropriate application of device
Some of the manufacturer reports remark that the device was being used in the wrong context. The most prominent example is where the manufacturer gives a reminder that the device should not be used to make a diagnosis on its own:
“physician notifications and ecg report(s) does not provide diagnosis; rather, it provides preliminary findings from which the clinician may make a diagnosis based on clinical judgement and patient clinical history.” [ALG-DIAG-229]
There are some cases where a medical device has been used in an inappropriate health condition:
“alivecor is a lead-i mobile ecg that is not intended to detect an infarct. the device labeling specifies that the device does not detect heart attack” [ALG-DIAG-279]
In other cases the medical device may not be appropriate for the severity of the health condition:
“it is indicated for use on patients 18 years or older who may be asymptomatic or who may suffer from transient symptoms such as palpitations, shortness of breath, dizziness, light-headedness, pre-syncope, syncope, fatigue, or anxiety. the reports are provided for review by the intended user to render a diagnosis based on clinical judgment and experience. it is not intended for use on critical care patients.” [ALG-DIAG-228]
7.2.2 Sensitivity issues with devices
Some applications of the device may technically be within the scope of allowable use but show the limits of what the device can detect.
“this very weak reaction was not detected by the reader of the instrument, because it is at the limit of detection of the instrument…since not enough points have been detected to consider this reaction as “+/-”. for this reason, the image has been classified as negative by the instrument… therefore, once again cannot be confirmed that the reaction could be attributed to the incorrect result but to the underlying condition of the patient” [ALG-DIAG-174]
Another manufacturer report provides a reminder that the sensitivity of the device is not 100% correct:
“the performance characteristics detailed in the product ifu do not claim 100% sensitivity or specificity, therefore there is a known low level of possible false negative and false positive results. therefore in line with contemporary diagnostic algorithms for suspected dvt/pe, high probability pts are not safe to exclude from further testing based on the results of d-dimer tests” [ALG-DIAG-197]
One manufacturer report suggested a potential algorithmic sensitivity without commenting on any quantitative value of correctness:
“the leading cause of the misclassified arrhythmia is attributed to a potential algorithm sensitivity. it was determined that the algorithm should have detected atrial flutter. the account stated that the patient presented to the hospital in atrial flutter. based on the available information, it is inconclusive whether or not the patient’s hospitalization was related to the atrial flutter.” [ALG-DIAG-230]
For some devices, the sensitivity is something that can be adjusted through re-programming the software:
“it was reported that the patient went to the emergency room after receiving a shock for t wave oversensing (twos) on the right ventricular lead. the device algorithm did not withhold detection because the r waves were large. after the shock was delivered the algorithm identified the twos and properly withheld. the sensitivity was reprogrammed and the device and lead remain in use.” [ALG-DIAG-150]
Another report shows how sensitivity problems were overcome through reprogramming:
“it was reported that during an unrelated hospitalization, the patient experienced syncope due to oversensing of the diagnostic testing associated with the corvue algorithm and resulting in pacing inhibition. the device was reprogrammed to resolve the event. the patient is in stable condition.” [ALG-DIAG-042]
7.2.3 Other limitations of algorithmic capabilities
One manufacturer report tried to qualify an algorithm limitation without quantifying it much:
“engineering determined that the navigational difficulty was due to a software algorithm limitation that failed to segment the patient anatomy correctly. the algorithm limitation lead to the poor virtual view rending near the target. a review of the device history record (dhr) was performed. there are no reports of nonconformance that relate to the reported incident.” [ALG-DIAG-284].
7.4 Misinterpretations of results
7.4.1 Misinterpretation of device instructions
In one manufacturer report, the reporter provides a reminder of the consequences if the person using the device fails to fully follow the instructions:
“freeze detect is an integral part of the coolsculpting system and is automatically employed when a treatment is initiated. failure to follow instructions could result in injury to the patient, including first- or second-degree burns. second-degree burns or complications of second-degree burns may result in hypopigmentation” [ALG-DIAG-239]
7.4.2 Misinterpretations of results: responsibilities of a device versus healthcare staff using a device
Some manufacturer reports identify the source of problems as the people using the devices rather than the devices themselves. Indeed, each report is characterised by the types of relevant product problems, which include ‘incorrect interpretation of signal’. Consider that in this report, the manufacturer states: “the misclassified arrhythmia was not caused by the algorithm, but rather by an error in interpretation by the certified cardiographic technician” [ALG-DIAG-231]
However, one voluntary report describes the challenge of apportioning responsibility for making a good interpretation:
“midmark’s ecg system is responsible for producing a reliable and accurate waveform to the clinical team to interpret, first and foremost. in cases where an internal firmware / hardware error that causes the waveform morphology to aberrant. i am much more concerned as the clinician cannot interpret the clinical situation accurately in those cases. in this case, the waveform is not in doubt, only the interpretation we provide. but even the interpretation is not necessarily ‘wrong’. in a purely classical and binary sense, this tracing did not fully qualify this pt as meeting the criteria for ischemia. “ [ALG-DIAG-424]
One manufacturer suggests that device users require a certain level of competence to avoid misinterpreting any results:
“in conclusion, both the default imar protocols and the user manual, remind the experienced user that imar image series should be read in conjunction with a respective non-imar image series to avoid misinterpretation.” [ALG-DIAG-189]
Some manufacturer reports indicate that ultimately, the decision to use a result resides with the clinicians:
“the issue was investigated by a cross team where it was determined that the device was working within specifications and was an acceptable risk. a computer-interpreted ecg report is not intended to be a substitute for interpretation by a qualified physician. the interpreted ecg is a tool to assist the physician in making a clinical diagnosis in conjunction with the physician’s knowledge of the patient, the results of the physical examination, and other findings. the algorithm helps to identify problem areas for the physician and saves time for the physician or editing technician who may only need to add, delete, or modify a few statements.” [ALG-DIAG-020]
7.5 Inadequate results
Another report describes not a misinterpreted result but one that was inadequate for supporting a course of action. The manufacturer report describes how the result of applying an ECG algorithm was not specific enough to support a decision that a healthcare organisation wanted. An extract of the text indicates:
“it was reported that the customer wants to be able to alias ecg interpretation statements at on the tc50, so that when the er or urgent care provider sees the ecg print out, they will see “st changes” instead of “st elevation” which activates a stemi protocol. the result is that a number of patients have been sent to the cath lab, with some actually making it to the table, for no reason. patient had an unnecessary invasive cardiac procedure… a follow up report will be submitted once the investigation is complete..there is no product malfunction. device was confirmed to be operating per specifications and displayed the appropriate interpretation by the algorithm. customer has an enhancement request for the automated interpretive statement on the top of the report printout to read st changes rather than st elevation because due to their hospital’s workflow they must initiate a stemi protocol if st elevation is stated on the report.” [ALG-DIAG-026]
7.6 Outdated or unapproved software algorithms
Another source of issues in software-enabled medical devices is that parts of the software may be outdated or not formally approved. For example, consider: “(b)(6) was loaded with a software containing an unapproved pump start algorithm. as a result, the reported events were confirmed.” [ALG-DIAG-227].
Another report, describing an event that happened in 2023, mentions a much earlier software update that might have provided better results:
“a software update was released in 2015 to further improve consistency of the impedance test results and provide the clinician with additional diagnostic tools, including programmable shock lead impedance alert limits” [ALG-DIAG-084]
7.7 Poor quality results
Various reports describe characteristics of an event that led to poor data quality. For one event involving an ECG reading: “further investigation revealed that the algorithm classified this event as a pause due to poor signal quality.” [ALG-DIAG-229]. A voluntary report described the problems device users had with signal noise: “also system introduces an excessive amount of electrical noise into the software algorithm, making diagnostic quality tracings very difficult to obtain” [ALG-DIAG-419]
ECG devices can have a variety of problems with signal artifacts that can distort the results. In one voluntary report about an ECG device, the reporter identifies artifacts which they believe may have influenced an adverse outcome:
“inconsistencies in the analyzed arrhythmia waveform due to the presence of the unipolar ventricular pacemaker artifacts lead to termination of the treatment algorithm. the vt eventually degenerated into ventricular fibrillation and the pt expired. in this case, we found there to be two potential explanations for the treatment failure. both possibilities involve misinterpretation of unipolar pacing artifacts by the wad, a potential interaction which the co identifies clearly in its literature on the device.” [ALG-DIAG-420]
In the area of medical imaging, the presence of metal prosthetics can distort the images that are created from radiation sources:
“metal strongly attenuates the x-ray beam. hence, image quality can be degraded significantly by photon starvation and/or beam hardening. imar is a correction algorithm that is intended to recover image information that is lost because of the metal objects. imar can reconstruct information in the image, especially when it is located further away from the metal object. regions that are mostly surrounded by metal (such as the inner area of the acetabular component of the hip implant) cannot be recovered without artifacts.” [ALG-DIAG-189]
7.8 Missing information
7.8.1 Specific device identities not known
Many of the manufacturer reports are responses to journal articles that describe a study where adverse outcomes occurred, in a context where the manufacturer’s devices may have been used. The reporters indicate that the research papers often do not provide enough information about individual patients, or do not associate individual patients with specific medical devices. Consider these examples:
· “of note, multiple patients were noted in the article; however, a one to one correlation could not be made with unique product serial/lot numbers. the baseline gender/age characteristics is male/67 years old. without a lot number or device serial number, the manufacturing date cannot be determined. since no device id was provided, it is unknown if this event has been previously reported.” [ALG-DIAG-092]
· “multiple patients and multiple manufacturers were noted in the article; however, a one to one correlation could not be made with unique manufacturer/device serial numbers.”[ALG-DIAG-203]
When this information is missing, it becomes difficult to fill in MedWatch forms, such as Section E: Suspect Medical Device, shown in the figure below.
7.8.2 Devices not returned to manufacturer for further analysis
Another cause of missing information is that the medical devices used during an event are not returned to the manufacturer for testing and diagnostic activities. In the case of implanted devices, it is understandable that returning them may be difficult:
“product event summary: the device remains implanted in the patient and is thus not available for return to the manufacturer. with a review of the available information there is no evidence to indicate any device malfunctions or performance issues that would impact the reported events.” [ALG-DIAG-204]
When a device is not returned, much of the information needed to support root cause analysis can be difficult to obtain:
“the results of the investigation are inconclusive since the device was not returned for analysis. based on the information received, the cause of the reported incident could not be conclusively determined.” [ALG-DIAG-261]
7.8.3 Failed attempts to gather more information from complainants
In some reports, manufacturers indicate that important context information was not provided by the device customer:
“the customer reported that upon arrival to evaluate a pt, the device displayed a heart rate of 20 bpm and alarmed for asystole. based on this information, and at the first responder’s discretion, no treatment was delivered at that time. subsequent to the incident, no documentation of the corresponding clinical rational has been provided to philips.” [ALG-DIAG-019]
In some reports, the reporters describe their attempts to obtain additional important information:
· “this complaint is from a literature source. the following complications were reported in this publication: 9 patient underwent catheter ablation of atrial fibrillation and suffered cardiac tamponade. no additional details were provided. multiple requests for clarification have been sent to the corresponding author, but no additional details were provided at this time.” [ALG-01044]
· “alivecor followed up with the customer 4 separate times to get additional information but was not successful.”[ALG-DIAG-279]
7.8.4 Data withheld due to data privacy concerns
Some manufacturer reports note that some patient information was deliberately withheld by hospitals or physicians:
· “ge healthcare’s investigation is ongoing. a f/u report will be submitted once the investigation has been completed. due to pt privacy protocol, the hosp will not release pt data” [ALG-01281].
· “medtronic was made aware of this event through a search of literature publications. this event occurred outside the us. patient information is limited due to confidentiality concerns. select patient information cannot be included in regulatory report due to regional privacy regulations.” [ALG-01395]
· “the cause of death was not provided by the hospital; and any pre-existing conditions are unk, and will not be provided by the hospital due to privacy policies. the hospital is not forthcoming with any hospital documentation related to the event. the customer is unwilling to reveal any details regarding the pt, pre-existing conditions, testing, treatment or timeline of events. therefore, no determination can be made regarding potential cause or contribution of biomerieux bact/alert or fn culture bottle products to the pt death.” [ALG-01781]
· “ge healthcare’s investigation is ongoing. a follow-up report will be submitted after the investigation has been completed. patient data not provided due to country privacy laws. initial reporter data not provided due to country privacy laws.”[ALG-19019]
The lack of this kind of information can make it difficult for manufacturers to fill out some parts of the MedWatch FDA 3500 Form, which is part of the FDA Safety Information and Adverse Event Reporting Program.
7.8.5 Device data was lost, over-written or not recorded
In some scenarios, data about an event could be captured by the medical device but is not stored long-term. Consider this report, where the ECG data themselves were not retained but the paper graphs printed from them were:
“according to the hospital contact, the site does not have ecg management software, nor do they save ecg files on the device’s internal memory. the only technical readouts saved by the site are in the form of an ecg paper print, which is subsequently scanned into the medical record system. therefore, it is not possible to evaluate why the algorithm used did not correctly generate the interpretive statement of prolonged qt interval.” [ALG-DIAG-025]
In some cases, data captured by a device is somehow lost: “upon device interrogation, loss of capture was observed in several episodes recorded in the device memory.” [ALG-DIAG-011].
In at least two reports, it seems that the device has overwritten data due to limited memory:
· “upon further review of the missing episode mentioned in the complaint, ts confirmed the episode was overwritten by the device. since the device did not have the most current software installed, this may have caused the device to retain the overwritten episode. this is not indicative of a malfunction of the device.” [ALG-DIAG-076]
· “the physician requested a technical service (ts) analysis to determine whether these events were inappropriate. however, due to the data storage algorithm, these events were over-written and the electrocardiogram (egm) data was unable to be reviewed.” [ALG-DIAG-082]
7.8.6 Old reports are missing new report data fields
The MAUDE database, which supplies the Open FDA Device API with records, contains reports dating back to 1992. Over the years, the fields captured in its reports have changed. For example, most reports have a field ‘date_of_event’, added in 2006, which means “Actual or best estimate of the date of first onset of the adverse event”. However, this field is not present in the initial submission voluntary report [ALG-DIAG-419], which appears to have been submitted in 1995.
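Checking how many retrieved records lack a given field can be sketched as follows. This is a minimal illustration, not the repository’s actual code: the field names `date_of_event` and `report_number` are real Open FDA Device API fields, but the sample records here are invented.

```python
def missing_field(reports, field="date_of_event"):
    """Return the report numbers of records that lack the given field.

    Older MAUDE records (pre-2006) may omit 'date_of_event' entirely.
    """
    return [r.get("report_number", "?") for r in reports if field not in r]

# Invented sample records illustrating the old and new report shapes.
sample_reports = [
    {"report_number": "1995-0001"},                               # pre-2006 style
    {"report_number": "2020-0042", "date_of_event": "20200315"},  # modern style
]

print(missing_field(sample_reports))  # → ['1995-0001']
```

Run against a full result set, a tally like this makes the era-dependent gaps in the data visible before any further analysis.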
7.9 Importance of audit trails provided by device log files
Within the generated reports, I looked for terms like ‘log’ and ‘download’ to gather insights about the data generated by devices and its use in assessing an event. I make a distinction between data files and log files, while acknowledging that in some contexts ‘download data’ could refer to both: files that contain generated device data, and log files that focus more on recording a chronology of activities.
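The term scan described above can be sketched as a simple filter over a report’s narrative entries. The `mdr_text` structure (a list of objects with a `text` key) follows the real Open FDA Device API; the example record is invented.

```python
def mentions_terms(report, terms=("log", "download")):
    """Return True if any narrative text in the report contains one of the terms."""
    texts = report.get("mdr_text", [])
    combined = " ".join(entry.get("text", "").lower() for entry in texts)
    return any(term in combined for term in terms)

# Invented example record, following the API's mdr_text structure.
example = {"mdr_text": [{"text": "Review of the log files found no irregularities."}]}
print(mentions_terms(example))  # → True
```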
Log files can reveal rich context information for an event:
· “review of the log files associated with (b)(6) revealed two electrical fault alarms logged on 17-may-2019, one (1) electrical fault alarm logged on 21-may-2019, and four (4) electrical fault alarms logged on 01-feb-2020; all of the alarms were due to an overcurrent condition on the front stator, resulting in the pump running on the rear stator only.” [ALG-DIAG-207]
· “response button use observed in the download data after the shock from 19:50:06 to 19:50:10. it is unclear who was pressing the response buttons. the device declared a non-treatable rhythm at 19:50:54. to the patient’s rhythm was asystole transitioning to severe bradycardia at 10 bpm from approximately 19:53:45 until the electrode belt disconnection at 20:01:08 on 11/2/2020. the patient reportedly passed away on (b)(6) 2020.” [ALG-DIAG-272]
· “review of the patient’s download data revealed that the patient received a total of 12 treatments from the lifevest.” [ALG-DIAG-268]
· “review of the patient’s downloaded flag file confirmed that the device declared a treatable arrhythmia and subsequently delivered a treatment defibrillation. [ALG-DIAG-267]
· “the zio at device was returned, and the and the clinical data was downloaded. a review of the clinical data found that the patient wore the at device for 9 days of the 14-day prescribed wear period.” [ALG-DIAG-232]
· “ review of the download data indicates that the patient experienced an asystole event, which is considered a non-life sustaining, non-treatable rhythm.” [ALG-DIAG-266]
One report cautions about the limits of what a log file can reveal: “note that log data cannot be used to confirm the occurrence or cause of a pneumothorax.” [ALG-DIAG-168]. In some reports, even if the log files cannot recreate the conditions of an irregularity, they can record that one did happen: “the reported event of “inadequate flow rate” could not be replicated but was confirmed via review of the controller log files which indicates a decrease in estimated flow.” [ALG-DIAG-199]
Log files can also provide evidence that a device was operating as expected:
· “upon receipt at our post market quality assurance laboratory, a thorough evaluation of the device was performed. a review of the device memory log found no irregularities” [ALG-DIAG-076]
· “the log files of the instrument provided by the customer were analysed and no evidence of any malfunction was found.” [ALG-DIAG-174]
· “android log files from the complaint device were uploaded to the cloud system by the user and downloaded for investigation. inspection of the android log files found no evidence of issues with insulin delivery.” [ALG-DIAG-182]
· “the download data from the received device does not contain any timeouts or pulse width increases that would indicate a failure to deliver insulin” [ALG-DIAG-184]
· “a review of the log files by the clinical application specialist (cas) concluded that the pic ix system and x3 monitor behaved as intended.” [ALG-DIAG-414]
The absence of generated device data can also undermine the evidence presented in a complaint: “additionally, the reported low flow alarms could not be confirmed as no log files were submitted for evaluation.” [ALG-DIAG-151]
8 Inconclusive analyses due to poor quality information
A general trend I observe across many of the medical device reports is that they often lack enough information to reach a definitive conclusion about the relationship between an adverse event outcome and the behaviour of a specific device:
· “current information is insufficient to permit conclusions as to the cause of the events. event details and product identification was not provided for the patients mentioned in the journal article. the following sections could not be completed with the limited information provided. date of event — unknown. catalog number, lot number and expiration date — unknown. date implanted — unknown.” [ALG-DIAG-002]
· “this report is for an unknown. part and lot number are unknown. without the specific part number; the udi number and 510-k number is unknown. complainant part is not expected to be returned for manufacturer review/investigation. concomitant medical products: unknown. without a lot number the device history records review could not be completed. product was not returned. based on the information available, it has been determined that no corrective and/or preventative action is proposed…this complaint will be accounted for and monitored via post market surveillance activities….”[ALG-DIAG-028]
· “customer has indicated that the product will not be returned to zimmer biomet for investigation. reported event was unable to be confirmed due to limited information received from the customer. device history record (dhr) review was unable to be performed as the lot number of the device involved in the event is unknown. root cause was unable to be determined as the necessary information to adequately investigate the reported event was not provided.” [ALG-DIAG-001]
· “without the return of the actual product involved, our investigation could not proceed.”[ALG-DIAG-013]
· “the contribution of the device to the reported event could not be determined as the device was not returned for evaluation. the root cause of the event could not be determined from the information available and without device evaluation” [ALG-DIAG-027]
· “since the product was not returned for analysis, no product failure analysis can be conducted, and no determination of possible contributing factors could be made.” [ALG-DIAG-063]
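Spotting this trend at scale could be automated with a crude phrase scan over report narratives. This is a heuristic sketch of my manual review, not a validated classifier; the phrase list and example text are my own choices drawn from the extracts above.

```python
# Hand-picked phrases that frequently signal an inconclusive analysis.
INCONCLUSIVE_PHRASES = (
    "not returned",
    "could not be determined",
    "insufficient",
    "unknown",
)

def inconclusive_flags(narrative):
    """Return the inconclusive-analysis phrases that appear in a report narrative."""
    lowered = narrative.lower()
    return [phrase for phrase in INCONCLUSIVE_PHRASES if phrase in lowered]

text = ("the root cause of the event could not be determined "
        "as the device was not returned for evaluation.")
print(inconclusive_flags(text))  # → ['not returned', 'could not be determined']
```

A scan like this could rank reports by how many such phrases they contain, flagging the least informative ones for manual review.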
9 Further Insights from examining categories of ‘product problems’
Thus far, the evidence I have reported has been based on specific search terms that I hope increase the likelihood that a search result describes the behaviour of an AI-driven medical device. Taking a broader view that considers any issue that might be algorithm- or software-related, useful insights may be gained by looking at the categories of medical device ‘product problems’ that tag each medical device report. If you look at the open_fda_algorithms_results.xlsx file in the code repository, you can see many of the categories. Useful ones include:
· device operates differently than expected
· incorrect interpretation of signal
· inaccurate information
· failure to capture
· over-sensing
· patient data problem
· signal/artifact noise
· application program problem
· dose calculation error
· parameter calculation error
· computer software problem
· under-sensing
· invalid sensing
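Tallying these categories across a result set can be sketched as below. The `product_problems` field (a list of category strings) is a real Open FDA Device API field; the sample records are invented for illustration.

```python
from collections import Counter

def problem_counts(reports):
    """Tally the 'product_problems' categories across a set of report records."""
    tally = Counter()
    for report in reports:
        tally.update(p.lower() for p in report.get("product_problems", []))
    return tally

# Invented sample records; 'product_problems' is a real Open FDA field.
sample = [
    {"product_problems": ["Computer Software Problem"]},
    {"product_problems": ["Computer Software Problem", "Over-Sensing"]},
]
print(problem_counts(sample).most_common(1))  # → [('computer software problem', 2)]
```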
10 Concluding thoughts
My aim for this investigation was to gain insights into the issues that AI-driven medical devices might be associated with in the field. One of the best sources of insights I have identified is the collection of medical device reports made accessible via the Open FDA Device API. However, multiple data quality issues make it difficult to tightly bind various problems and issues to those medical devices that are specifically driven by AI.
Regulatory bodies are becoming more interested in the prospect of using AI in medical devices, and they acknowledge that aspects of their development may warrant special consideration. Reporting on issues related to this specific family of devices will become more meaningful if medical device reports are enhanced with a field that specifically indicates their AI-based nature.
Reflecting on this aim, I reminded myself that teams working on AI-driven medical devices will have to consider all the other issues associated with devices that use algorithms, regardless of whether those algorithms rely on machine learning or not. I adapted my investigation to choose search queries that could work as proxies for identifying the reports most relevant to my focus. I made the most use of these search criteria:
· Reports that contained the term ‘algorithm’
· Reports that contained the term ‘machine learning’ or ‘artificial intelligence’
· Reports that contained the term ‘algorithm’ and either ‘diagnosis’ or ‘diagnostic’
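The three criteria could be expressed as Open FDA query URLs along the following lines. The endpoint and the `mdr_text.text` field are real, but the exact quoting and grouping of the search expressions here is a hypothetical sketch that would need tuning against the live API, not the repository’s actual query code.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/device/event.json"  # the Open FDA Device API endpoint

def build_query(search_expr, limit=100):
    """Build a query URL for the Open FDA Device API from a search expression."""
    return f"{BASE_URL}?{urlencode({'search': search_expr, 'limit': limit})}"

# The three criteria as hypothetical search expressions; the precise Open FDA
# syntax for AND/OR grouping may differ from what is shown here.
queries = [
    build_query('mdr_text.text:"algorithm"'),
    build_query('mdr_text.text:("machine learning" "artificial intelligence")'),
    build_query('mdr_text.text:"algorithm" AND mdr_text.text:(diagnosis diagnostic)'),
]
for url in queries:
    print(url)
```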
To make the task of gathering search results repeatable and traceable, I developed and published an open-source code repository. I have released it under the MIT license and hope it encourages others to make use of Open FDA resources.
The most surprising finding of my analysis is just how few medical device reports actually mention AI-related terms. Most of the reports that do are describing a manufacturer responding to a new journal article whose title contains ‘artificial intelligence’ or ‘machine learning’. Having reviewed a large collection of reports by eye, it became clear that they were not merely recording events; they described events that could have legal consequences and involve the sensitive task of attributing responsibility to manufacturers, clinicians and patients.
Understanding the complex nature of a medical device’s expected behaviour within specifically prescribed use-case limits is challenging. Relating that context to a specific event adds yet more challenge, especially considering that many of the reports seem to reflect poor or missing data. I am hopeful that the quality of these reports will improve, providing more detailed insights that could benefit AI development teams who wonder what working on medical devices is like.