ai-powered-affiliation-extraction
AI-Powered Affiliation Insights: LLM-Based Bibliometric Study of European Medical Informatics Conferences
https://github.com/ehsanbitaraf/ai-powered-affiliation-extraction
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.1%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: EhsanBitaraf
- License: cc0-1.0
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://ebooks.iospress.nl/doi/10.3233/SHTI250474
- Size: 31.9 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
AI-Powered Affiliation Insights: LLM-Based Bibliometric Study of European Medical Informatics Conferences
This repository contains detailed bibliometric analyses of MIE Conferences to be published at MIE 2025.
In this repository, we used the MIE conference dataset and transformed the affiliation information into a structured form with a pipeline built in Langflow (Fig. 1) using the gpt-3.5-turbo-0125 model. A prompt instructed the model to return its output as JSON, which we then analyzed.
Fig. 1. The Langflow pipeline
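As a rough illustration of the post-processing step, the sketch below parses a JSON string of the kind an LLM might return and normalizes it to a fixed set of affiliation fields. The helper name `parse_affiliation_output` and the sample response are hypothetical; the actual prompt and Langflow components used in the study are not reproduced here.

```python
import json

# Field names follow the structuralaffiliations sample in this README.
FIELDS = ["country", "institute", "department", "university",
          "city", "postalcode", "email"]

def parse_affiliation_output(raw: str) -> dict:
    """Parse one LLM response into a dict with a fixed set of keys.

    Hypothetical helper: missing keys become empty strings, and text that
    is not a valid JSON object yields an all-empty record instead of raising.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {key: str(data.get(key) or "").strip() for key in FIELDS}

# Made-up response, for illustration only.
sample = '{"country": "Austria", "university": "Medical University of Vienna", "city": " Vienna "}'
record = parse_affiliation_output(sample)
print(record["country"], "|", record["city"])
```

Falling back to an empty record keeps a single malformed response from stopping a long batch run; such records can be reviewed separately afterwards.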

Results
All output is generated using the Python programming language and is available here.
General Information
This section provides a quantitative overview of the dataset analyzed, including the total number of publications, authors, citations, average citations per publication, and the diversity of contributing countries and institutions. It establishes the scope and scale of the research landscape covered by the bibliometric study.

| Index | Value |
| --- | --- |
| Total Publications | 4606 |
| Total Authors | 11308 |
| Total Citations | 6191 |
| Average Citations | 1.34 |
| Total Countries | 95 |
| Total Universities | 352 |
| Total Unclean Universities | 1553 |
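The average-citations figure can be checked from the two totals in the table. This is only a consistency check on the reported numbers, assuming the average is total citations divided by total publications.

```python
# Totals as reported in the table above.
total_publications = 4606
total_citations = 6191

# Assumed definition: mean citations per publication.
average_citations = total_citations / total_publications
print(round(average_citations, 2))  # 1.34
```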
Publication and citation trend
The top 10 most cited works of literature
This part highlights the most influential articles by citation count, showcasing key contributions that have had significant impact within the field.

Annual and Cumulative Publication Trends
This visualization tracks how the number of publications has changed over time, both annually and cumulatively, revealing patterns of growth and periods of increased research activity.

Articles with No Citations vs At Least One Citation by Year
This chart compares the number of uncited articles to those with at least one citation each year, offering insight into the visibility and influence of conference outputs over time.
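A minimal sketch of how such a per-year split could be computed is shown here. The record keys `year` and `citations` and the sample data are assumptions for illustration, not the repository's actual schema or results.

```python
from collections import defaultdict

def cited_vs_uncited(records):
    """Count, per year, articles with zero citations and with at least one."""
    counts = defaultdict(lambda: {"uncited": 0, "cited": 0})
    for rec in records:
        bucket = "cited" if rec["citations"] > 0 else "uncited"
        counts[rec["year"]][bucket] += 1
    # Return a plain dict ordered by year for easier plotting.
    return dict(sorted(counts.items()))

# Made-up records, for illustration only.
sample_records = [
    {"year": 2020, "citations": 3},
    {"year": 2019, "citations": 0},
    {"year": 2019, "citations": 1},
]
by_year = cited_vs_uncited(sample_records)
print(by_year)
```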

Trends in Citation Patterns and Future Predictions
This section analyzes how citations accumulate annually and cumulatively, and may include projections to anticipate future citation trends, helping to assess the evolving impact of the field.

Authors
Top Authors by Articles and Citations
This section lists the top authors by number of articles and citations, identifying key contributors and research leaders whose work has shaped the field's development.

Countries
This section examines the geographical distribution of research output and impact, showing which countries contribute most to publications and citations, and how their roles have evolved over time. It also visualizes collaboration patterns and research productivity through various charts and maps.


Percentage of Annual Publications by Top 10 Countries
This visualization displays the share of annual publications from the leading countries, illustrating shifts in research leadership and international engagement.
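One way to derive such shares is sketched below. It assumes each record carries a `year` and a list of `countries` (hypothetical key names) and that an article counts once for every country among its affiliations, so the shares within a year can sum to more than 100%.

```python
from collections import Counter, defaultdict

def annual_share(records, top_n=10):
    """Percentage of each year's articles that involve each top country."""
    # Rank countries by the number of articles they appear in overall.
    totals = Counter(c for r in records for c in set(r["countries"]))
    top = [country for country, _ in totals.most_common(top_n)]

    papers_per_year = Counter()
    per_year = defaultdict(Counter)
    for r in records:
        papers_per_year[r["year"]] += 1
        for c in set(r["countries"]):
            per_year[r["year"]][c] += 1

    return {
        year: {c: 100 * per_year[year][c] / papers_per_year[year] for c in top}
        for year in sorted(papers_per_year)
    }

# Made-up records, for illustration only.
sample_records = [
    {"year": 2020, "countries": ["Germany", "Austria"]},
    {"year": 2020, "countries": ["Germany"]},
    {"year": 2021, "countries": ["Austria"]},
    {"year": 2021, "countries": ["France"]},
]
shares = annual_share(sample_records, top_n=2)
print(shares[2020])
```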

Bubble chart to visualize the top 10 countries
These bubble charts provide a comparative, visual representation of the top publishing countries, making it easy to spot dominant players and emerging contributors.

Citation per Article Index by Country
This section compares countries based on the average citations per article, highlighting differences in research impact and influence.
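The index can be read as total citations divided by number of articles for each country. The sketch below assumes full counting (a multi-country article credits every participating country with all of its citations) and hypothetical record keys `countries` and `citations`; the study may have used a different counting scheme.

```python
from collections import defaultdict

def citations_per_article(records):
    """Average citations per article for each country (full counting)."""
    articles = defaultdict(int)
    citations = defaultdict(int)
    for r in records:
        for country in set(r["countries"]):
            articles[country] += 1
            citations[country] += r["citations"]
    return {c: citations[c] / articles[c] for c in articles}

# Made-up records, for illustration only.
sample_records = [
    {"countries": ["Germany", "Austria"], "citations": 4},
    {"countries": ["Germany"], "citations": 0},
    {"countries": ["Austria"], "citations": 2},
]
cpa = citations_per_article(sample_records)
print(cpa)
```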

Heatmap of Top 10 Country Co-occurrence
The heatmap illustrates collaboration intensity among the top countries, revealing international research networks and partnerships.
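The counts behind a co-occurrence heatmap can be built by counting unordered country pairs per article, as in this sketch. The `countries` key and the sample records are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def country_cooccurrence(records):
    """Count how many articles each unordered pair of countries shares."""
    pairs = Counter()
    for r in records:
        # Sort so that (A, B) and (B, A) map to the same key.
        for a, b in combinations(sorted(set(r["countries"])), 2):
            pairs[(a, b)] += 1
    return pairs

# Made-up records, for illustration only.
sample_records = [
    {"countries": ["Germany", "Austria"]},
    {"countries": ["Austria", "Germany", "France"]},
    {"countries": ["France"]},
]
pairs = country_cooccurrence(sample_records)
print(pairs[("Austria", "Germany")])
```

The resulting pair counts can be pivoted into a symmetric matrix for plotting, or exported as an edge list for network tools.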

Number of Articles geomap
This map visualizes the global distribution of published articles, offering a spatial perspective on research activity.

Countries Collaboration
This network analysis file and visualization show how often countries collaborate, mapping the structure of international research cooperation.

Institute
This section presents data on research output and citation impact at the institutional level, allowing comparison of the most productive and influential institutes in the field.

University
Here, the focus narrows to universities, showing their publication and citation metrics, as well as their collaboration networks, often visualized using network analysis tools like VOSviewer.


Keywords
This section analyzes the most frequently used keywords in publications, revealing major research topics, emerging trends, and thematic evolution over time. Network visualizations further illustrate how topics are interconnected within the field.


Data Availability
After processing the MIE dataset with the LLM, a new structure called structural_affiliations was added to the original dataset; it contains the fields shown in the sample below. The final dataset can be found here.

structuralaffiliations fields sample:

```json
"structuralaffiliations": [
  {
    "country": "",
    "institute": "",
    "department": "",
    "university": "",
    "city": "",
    "postalcode": "",
    "email": "",
    "Status": "",
    "universityf": ""
  }
]
```
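A small sketch of how this structure might be consumed follows. It collects the distinct countries of one article from its affiliation entries, using the key names from the sample above; the helper name `countries_of` and the example article are hypothetical.

```python
def countries_of(article: dict) -> list:
    """Return the distinct, non-empty countries of an article, in order."""
    seen = []
    for aff in article.get("structuralaffiliations", []):
        country = (aff.get("country") or "").strip()
        if country and country not in seen:
            seen.append(country)
    return seen

# Made-up article record, for illustration only.
article = {
    "structuralaffiliations": [
        {"country": "Iran", "university": "Iran University of Medical Sciences"},
        {"country": " Iran "},
        {"country": "Austria"},
        {"country": ""},
    ]
}
print(countries_of(article))
```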
Citation
If you use this article or the dataset in a scientific publication, we would appreciate a citation of the following paper:
Biblatex entry:
@article{bitaraf-2025,
author = {Bitaraf, Ehsan and Jafarpour, Maryam},
journal = {Studies in health technology and informatics},
month = {5},
title = {{AI-Powered Affiliation Insights: LLM-Based Bibliometric Study of European Medical Informatics Conferences}},
year = {2025},
doi = {10.3233/shti250474},
url = {https://doi.org/10.3233/shti250474},
}
Contributors
Please see our contributing guidelines for more details on how to get involved.
License
This repository is available under the CC0-1.0 license.
Owner
- Name: Ehsan Bitaraf
- Login: EhsanBitaraf
- Kind: user
- Website: linkedin.com/in/ehsan-bitaraf-34aa28247
- Repositories: 2
- Profile: https://github.com/EhsanBitaraf
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  AI-Powered Affiliation Insights: LLM-Based Bibliometric
  Study of European Medical Informatics Conferences
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Ehsan
    family-names: Bitaraf
    email: ehsan.bitaraf@gmail.com
    affiliation: >-
      Rajaei Cardiovascular Research Institute, Iran
      University of Medical Sciences, Tehran, Iran
    orcid: 'https://orcid.org/0000-0002-6588-7349'
  - given-names: Maryam
    family-names: ' Jafarpour'
    email: maryam.jafarpoor@gmail.com
    affiliation: >-
      Center for Medical Data Science, Medical University of
      Vienna, Vienna, Austria
    orcid: 'https://orcid.org/0000-0001-7266-5018'
repository-code: >-
  https://github.com/EhsanBitaraf/ai-powered-affiliation-extraction
url: 'https://ebooks.iospress.nl/doi/10.3233/SHTI250474'
abstract: >-
  This study employs Large Language Models (LLMs) to analyze
  bibliometric data from European Medical Informatics
  conferences from 1996 to 2024. By enhancing traditional
  methods with LLM-based techniques, the researchers
  significantly improved affiliation extraction accuracy.
  The analysis reveals trends in publication volume, author
  impact, and institutional collaborations across Europe.
  Key findings include the identification of leading
  contributors, visualization of collaboration networks, and
  mapping of geographical and institutional centers of
  excellence. The study highlights the potential of LLMs in
  bibliometric analysis, offering deeper insights into
  research trends and collaborations while addressing
  challenges in data standardization and computational
  resources.
keywords:
  - Artificial Intelligence
  - LLM
  - Bibliometrics
  - Medical Informatics
  - affiliation parsing
license: CC-BY-NC-4.0
GitHub Events
Total
- Push event: 10
Last Year
- Push event: 10