[back to the DMCM2025 Programme]
Martin Krallinger, Leading Researcher, Life Sciences - NLP for Biomedical Information Analysis, Barcelona Supercomputing Center, Barcelona, Spain.
“Martin Krallinger is currently the head of the NLP4BIA team at the BSC. He is an expert in biomedical and clinical text mining and language technologies and has been working in this research field for more than fifteen years. He has developed biomedical language technology solutions for a variety of health-related application scenarios, including drug safety, biomaterials research, cardiovascular diseases, oncology, toxicology, and occupational health, among others. He has been particularly active in the evaluation and quality benchmarking of biomedical language technologies and LLM-based solutions, through the generation and release of annotated benchmark datasets and the organization of academic shared tasks. These efforts assess and measure the quality of tools for applications such as automatic clinical concept detection (entity recognition), concept normalization (entity linking), biomedical information extraction components, medical machine translation, and automatic semantic indexing of large datasets of heterogeneous health content types. In this respect, he is one of the main organizers of the BioCreative community assessment challenges for the evaluation of natural language processing tools in biomedicine, and has been involved in the organization of biomedical text mining shared tasks in various international community challenges, including IberEval, IberLEF, biomedical WMT, eHealth CLEF, BioASQ and BioNLP-ST. To address a key bottleneck in the development of robust biomedical NLP solutions, namely the lack of access to high-quality annotated datasets, his group has extensive experience in creating and releasing medical corpora and annotation protocols, and in building state-of-the-art transformer-based NLP components trained on these high-quality datasets by exploiting the high-performance computing infrastructure offered by BSC.”
An important source of information for understanding and characterizing diseases lies in unstructured data of different types, including scientific literature, clinical records, trials, and even social media. Moreover, non-English data sources, in particular clinical case reports and medical records, remain highly underexploited despite recent advances in NLP and clinical LLMs. This talk will summarize some of the current work and results of the clinical NLP solutions my group has been developing to enable a better understanding of clinical disease information, and how such strategies can be adapted across multiple languages. Some practical use cases related to cardiovascular diseases, occupational health-related conditions, disease surveillance, rare diseases, and topical medicine will be presented. Moreover, in the era of LLMs, quality evaluation of clinical language technology is becoming crucial. Community evaluation scenarios, shared tasks, and the importance of keeping clinical experts in the loop of technical developments will therefore be discussed in this presentation. Finally, this talk will also present some use cases and different projects (DataTools4Heart, AI4HF, BiomatDB+, BARITONE and PATTERN) where clinical NLP systems are used for predictive and disease modelling purposes.