Validation is a crucial step before a medical device (MD) is put on the market.
It is tempting to think that validating an artificial intelligence triage solution for radiology is easy: the answer is Yes or No, there is a single pathology, so it can't require much work!
However, it is a step that carries heavy responsibility, whether the aim is to reassure ourselves as software developers or to obtain certifications such as FDA clearance or the CE mark. It is also an important regulatory requirement.
The goal is to prove to the medical community, and to the competent authorities, that our product is accurate, reliable, fast, performs as intended and, above all, carries no risk for the patient!
This proof rests on statistical results that are rigorously chosen, reliable, robust and appropriate. We obtain them through validation studies, by testing our algorithms on a sample of real-world medical images from the clinic. This sample is supposed to represent the targeted population. Of course, before we start the actual validation process, we need to know our MD inside out and get the facts.
Three important preliminary steps must be carried out:
Carry out bibliographic research:
This is a very long and tedious part, which aims to define the state of the art for the device we want to validate. The points to address are:
– Search for and define similar/predicate devices:
a) Know their limitations/dysfunctions (to exclude them from our products). If we observe the same limitations in our products during the validation phase, this will give us something to discuss.
b) Understand the statistical performance of competitors, in order to set a benchmark for the performance and effectiveness we wish to achieve, or even surpass, with our device.
– Assess the incidence, prevalence, and the target and at-risk populations (age, gender). In short, know the epidemiology of the pathology targeted by our MD.
– This bibliography will allow us to set up a solid validation protocol adapted to the MD to be validated, by choosing the right methodologies and appropriate statistical methods.
Write the validation protocol (the Study Summary) based on the bibliographic research described above.
Build the database:
This step is also very important, and sometimes the longest. As we know, we will not be able to build a database that represents the whole targeted population. Nevertheless, thanks to the bibliography we have compiled, we can target certain characteristics to get as close to it as possible.
Indeed, if we know that 90% of cases of the pathology we are targeting occur in Asian men aged 70 and over, we are clearly not going to look for data from Caucasian women under the age of 30. It seems obvious, but much less so when we are under pressure to finalize the validation.
Another important point is to collect clinical data acquired at a large number of clinical sites (multi-center data), in order to cover as many different populations and acquisition protocols as possible.
Also, we need to set the performance we want to achieve. This can be based on the bibliographic research we have done on similar devices: we can decide from the outset not to perform worse than the competitor. Common sense also tells us that we do not want, for example, a sensitivity or specificity lower than 90%. Finally, this is also an FDA request for triage MDs.
These requirements allow us to calculate the minimum sample size of our database (sample size calculation). This size is indicative only: it is a minimum value, and it will inevitably increase according to the criteria we wish to explore during the validation.
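As a rough illustration of such a sample size calculation, here is a minimal sketch using the usual normal-approximation formula for estimating a proportion. The function name `min_cases`, the 90% expected performance and the ±5% margin are assumptions chosen for the example, not the values or the method of an actual protocol, which may rely on a different statistical approach.

```python
import math
from statistics import NormalDist


def min_cases(expected: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest number of cases so that a proportion close to `expected`
    is estimated within +/- `margin` at the given confidence level
    (normal approximation)."""
    if not 0.0 < expected < 1.0:
        raise ValueError("expected must be strictly between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * expected * (1.0 - expected) / margin ** 2)


# Hypothetical targets, for illustration only.
n_positive = min_cases(expected=0.90, margin=0.05)  # positive cases drive sensitivity
n_negative = min_cases(expected=0.90, margin=0.05)  # negative cases drive specificity
print(n_positive, n_negative, n_positive + n_negative)
```

With these illustrative targets, the formula asks for 139 positive and 139 negative cases. Sensitivity is estimated on positive cases only and specificity on negative cases only, which is why the two counts are computed separately.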
Performance is first calculated on the whole dataset, and then on stratified subgroups:
1) Scanner makes:
We will try to obtain data from all the major scanner manufacturers. For each manufacturer, we want to have as many models as possible, with all the available detector row configurations, etc.
The more subgroups we want to explore (validate), the more data we need in each one to obtain statistically meaningful results.
Hence, the minimum sample size that we have calculated can increase quickly.
2) In addition, we will explore the acquisition protocol recommended in current clinical practice:
We will collect the data while trying to get as close as possible to it, for example by aiming at the recommended slice thickness, exposure parameters (kVp, mAs, etc.), and more.
3) We NEED equivalent proportions in terms of sex (depending on the pathology and target population): ~50% for each group.
4) Age too.
Always refer to the target population. Of course, we can have small samples in young populations (<20 years old) as long as we can justify that the targeted pathology does not affect this population.
5) … And more parameters.
6) Also, this is where it gets more complex:
If you are aiming for FDA clearance, you should know that at least 50% of the data must come from the US. This may not be an easy task, especially when it applies IN ADDITION to all the selection criteria described above.
7) Finally, the location of the pathology must be taken into account.
For example, large vessel occlusion (LVO) in the anterior circulation affects the M1 MCA, proximal M2 MCA, distal M2 MCA and ICA. So, the LVO-positive data must be equally distributed across these groups for the stratification to hold up. BUT, as we have done our homework (i.e. the bibliography), we know that LVO affects the MCA more than the ICA, so we can justify a smaller sample for the latter category.
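To make the stratified analysis more concrete, here is a minimal sketch that computes sensitivity per subgroup together with a 95% Wilson score interval. The record fields (`truth`, `pred`, `scanner`), the function names and the toy data are all hypothetical; a real study would follow whatever the validation protocol specifies.

```python
import math
from collections import defaultdict


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion is unconstrained
    p = successes / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def stratified_sensitivity(cases, key: str) -> dict:
    """Sensitivity per stratum. Each case is a dict holding the ground truth
    ('truth'), the algorithm output ('pred') and the stratum value (`key`)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [true positives, positives]
    for case in cases:
        if case["truth"]:
            counts[case[key]][1] += 1
            if case["pred"]:
                counts[case[key]][0] += 1
    return {
        stratum: {"n": n, "sensitivity": tp / n, "ci95": wilson_interval(tp, n)}
        for stratum, (tp, n) in counts.items()
    }


# Toy data, for illustration only.
toy_cases = (
    [{"truth": True, "pred": True, "scanner": "vendor_A"}] * 9
    + [{"truth": True, "pred": False, "scanner": "vendor_A"}]
    + [{"truth": True, "pred": True, "scanner": "vendor_B"}] * 2
)
for stratum, stats in stratified_sensitivity(toy_cases, "scanner").items():
    print(stratum, stats)
```

The toy output shows why subgroup sizes matter: with only two positive cases, the interval for `vendor_B` is far too wide to support any conclusion, even though the point estimate is 100%.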
So far, I have only covered the preliminary steps.
Once all these steps are done, we start the actual validation.
The main difficulty of this part is to find radiologists who specialize in the relevant pathology to carry out the validation, whether it is a quantitative or qualitative validation, or just to establish the ground truth.
For the FDA, the operators must be US board-certified radiologists (for our indications for use). In addition, to establish the ground truth, a minimum of 3 physicians must be involved.
The validation process can be long, as it depends on the availability of the physicians and on the huge amount of clinical data they have to assess.
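One common way to combine several independent reads into a single ground-truth label is a strict majority vote, with disagreements sent to adjudication. The sketch below assumes that rule; it is only an illustration, and the function name `majority_label` and the adjudication step are hypothetical rather than a description of our protocol.

```python
from collections import Counter


def majority_label(reads):
    """Return the label chosen by a strict majority of readers,
    or None when no such majority exists (case needs adjudication)."""
    if len(reads) < 3:
        raise ValueError("at least 3 independent reads are required")
    label, votes = Counter(reads).most_common(1)[0]
    if votes <= len(reads) / 2:
        return None  # no strict majority among the readers
    return label


# Toy reads, for illustration only.
print(majority_label([True, True, False]))   # two readers out of three agree
print(majority_label(["M1", "M2", "ICA"]))   # full disagreement on the location
```

With binary labels and an odd number of readers, a majority always exists. Multi-class labels such as the occlusion location can end in full disagreement, which is why the sketch returns `None` instead of picking a label arbitrarily.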
When the statistical results are obtained, we validate (or not) the performance of our medical device.
If the statistics are not good (lower than the limit we set beforehand), if processing errors appear during the validation, or if bugs, display or calculation problems are found, then the validation stops and the software goes back into development!
Who still says that validating triage software is easy?
At Avicenna.AI we are fortunate to have an experienced team, having already carried out such validations, in a changing and increasingly demanding environment.
This has allowed us to quickly put in place the rigorous validation studies required to obtain FDA clearance for our LVO (large vessel occlusion) and ICH (intracranial hemorrhage) triage software.
This expertise is also used for the construction/training of our algorithms. It has allowed us to obtain applications with high performance and effectiveness that are robust enough for safe and accurate use, which is for us the key to the acceptance of AI algorithms in clinical routine.