Cross-Lingual Knowledge Transfer for Clinical Phenotyping
Schematic overview of our approaches. We compare cross-lingual with monolingual approaches. For knowledge transfer, we use sequential transfer learning starting from MIMIC (a high-resource dataset). We distinguish between cross-lingual encoders, cross-lingual encoders with adapters, and English domain-specific encoders with prior translation.


Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can be beneficial to doctors and clinics worldwide. It is especially beneficial for the automatic diagnosis of diseases that have so far occurred less often in one geographic location than in another, or when the population is relatively small, as in Greece. Here, we investigated cross-lingual knowledge transfer strategies for performing this task in clinics that do not use the English language and have only a small amount of in-domain data available. We evaluated these strategies for a Greek and a Spanish clinic, leveraging clinical notes from different clinical domains such as cardiology, oncology, and the ICU. We examined two cross-lingual knowledge transfer strategies and found that both perform especially well for classifying rare phenotypes; we also advise on which method to prefer in which situation. Our results show that using multilingual data improves clinical phenotyping models overall and can compensate for small datasets.

Article link: