ai-powered-affiliation-extraction

AI-Powered Affiliation Insights: LLM-Based Bibliometric Study of European Medical Informatics Conferences

https://github.com/ehsanbitaraf/ai-powered-affiliation-extraction

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords

affiliation-networks bibliometric-analysis bibliometric-network bibliometric-visualization llm
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Security

README.md

AI-Powered Affiliation Insights: LLM-Based Bibliometric Study of European Medical Informatics Conferences

This repository contains detailed bibliometric analyses of the MIE conferences, to be published at MIE 2025.

MIE 2025 Conference

In this repository, we took the MIE conference dataset and transformed its affiliation information into a structured form through a pipeline built with Langflow (Fig. 1) using the gpt-3.5-turbo-0125 model. We used a prompt for this purpose and received the outputs in JSON form, which we then analyzed.

Fig 1. The pipeline of langflow
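
As a rough illustration of what happens after the pipeline returns, the sketch below parses one JSON reply into a typed record. It is not the repository's code: the `Affiliation` class, the `parse_affiliation` helper, and the sample reply are hypothetical, and the field names are borrowed from the `structuralaffiliations` sample in the Data Availability section.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Affiliation:
    """Hypothetical container; field names follow the dataset sample."""
    country: str = ""
    institute: str = ""
    department: str = ""
    university: str = ""
    city: str = ""
    postalcode: str = ""
    email: str = ""


def parse_affiliation(raw_reply: str) -> Affiliation:
    """Parse one JSON object returned by the model into an Affiliation.

    Unknown keys are ignored; missing or null keys become empty strings.
    Invalid JSON still raises json.JSONDecodeError.
    """
    data = json.loads(raw_reply)
    known = {f.name for f in fields(Affiliation)}
    cleaned = {
        key: "" if value is None else str(value).strip()
        for key, value in data.items()
        if key in known
    }
    return Affiliation(**cleaned)


# Made-up reply, for illustration only.
reply = (
    '{"country": "Austria", "university": "Medical University of Vienna", '
    '"city": " Vienna ", "email": null, "note": "ignored"}'
)
record = parse_affiliation(reply)
print(record.country, "|", record.university, "|", record.city)
```

Keeping the parser tolerant of extra keys is a design choice for this sketch, since a model reply may contain fields the schema does not expect.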

Results

All output is generated using the Python programming language and is available here.

General Information

This section provides a quantitative overview of the dataset analyzed, including the total number of publications, authors, citations, average citations per publication, and the diversity of contributing countries and institutions. It establishes the scope and scale of the research landscape covered by the bibliometric study.

| Index | Value |
| --- | --- |
| Total Publications | 4606 |
| Total Authors | 11308 |
| Total Citations | 6191 |
| Average Citations | 1.34 |
| Total Countries | 95 |
| Total Universities | 352 |
| Total Unclean Universities | 1553 |
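
The "Average Citations" value is consistent with dividing total citations by total publications. The short check below reproduces it from the two totals in the table; the variable names are illustrative.

```python
# Totals taken from the General Information table.
total_publications = 4606
total_citations = 6191

# Average citations per publication, rounded to two decimals.
average_citations = round(total_citations / total_publications, 2)
print(average_citations)  # 1.34
```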

Publication and citation trend

The top 10 most cited works of literature
This part highlights the most influential articles by citation count, showcasing key contributions that have had significant impact within the field.

Annual and Cumulative Publication Trends
This visualization tracks how the number of publications has changed over time, both annually and cumulatively, revealing patterns of growth and periods of increased research activity.

Articles with No Citations vs At Least One Citation by Year
This chart compares the number of uncited articles to those with at least one citation each year, offering insight into the visibility and influence of conference outputs over time.
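
A minimal sketch of the counting behind such a chart, assuming each article is available as a (year, citation count) pair. The sample records and the `cited_vs_uncited` helper are hypothetical and are not taken from the repository.

```python
from collections import defaultdict

# Hypothetical records: (publication year, citation count).
articles = [(2020, 0), (2020, 3), (2021, 0), (2021, 0), (2021, 1)]


def cited_vs_uncited(records):
    """Count, per year, articles with zero citations and with at least one."""
    counts = defaultdict(lambda: {"uncited": 0, "cited": 0})
    for year, citations in records:
        key = "cited" if citations > 0 else "uncited"
        counts[year][key] += 1
    return dict(counts)


summary = cited_vs_uncited(articles)
for year in sorted(summary):
    print(year, summary[year])
```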

Trends in Citation Patterns and Future Predictions
This section analyzes how citations accumulate annually and cumulatively, and may include projections to anticipate future citation trends, helping to assess the evolving impact of the field.

Authors

Top Authors by Articles and Citations
This section lists the most productive and most cited authors, identifying key contributors and research leaders whose work has shaped the field's development.

Countries

This section examines the geographical distribution of research output and impact, showing which countries contribute most to publications and citations, and how their roles have evolved over time. It also visualizes collaboration patterns and research productivity through various charts and maps.

Percentage of Annual Publications by Top 10 Countries
This visualization displays the share of annual publications from the leading countries, illustrating shifts in research leadership and international engagement.
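
One way to compute such yearly shares is sketched below, assuming one (year, country) row per publication. The rows and the `annual_share` helper are made up for illustration, and restricting the result to the top 10 countries is left out for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical rows: one (year, country) pair per publication.
records = [
    (2023, "Germany"), (2023, "Germany"), (2023, "France"),
    (2024, "Germany"), (2024, "Austria"), (2024, "Austria"), (2024, "France"),
]


def annual_share(rows):
    """Return {year: {country: percentage of that year's publications}}."""
    per_year = defaultdict(Counter)
    for year, country in rows:
        per_year[year][country] += 1
    shares = {}
    for year, counts in per_year.items():
        total = sum(counts.values())
        shares[year] = {c: 100 * n / total for c, n in counts.items()}
    return shares


shares = annual_share(records)
print(shares[2024])
```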

Bubble chart to visualize the top 10 countries
These bubble charts provide a comparative, visual representation of the top publishing countries, making it easy to spot dominant players and emerging contributors.

Citation per Article Index by Country
This section compares countries based on the average citations per article, highlighting differences in research impact and influence.
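
The index can be read as total citations divided by number of articles for each country. The sketch below shows that calculation on made-up rows; the `citations_per_article` helper is hypothetical, and how the repository attributes multi-country articles is not stated here, so each row simply counts once for the country it names.

```python
# Hypothetical rows: (country, citations received by one article).
rows = [
    ("Germany", 4), ("Germany", 0),
    ("France", 3),
    ("Austria", 1), ("Austria", 2),
]


def citations_per_article(rows):
    """Return {country: total citations / number of articles}."""
    totals = {}
    for country, citations in rows:
        n_articles, n_citations = totals.get(country, (0, 0))
        totals[country] = (n_articles + 1, n_citations + citations)
    return {country: c / n for country, (n, c) in totals.items()}


index = citations_per_article(rows)
print(index)
```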

Heatmap of Top 10 Country Co-occurrence
The heatmap illustrates collaboration intensity among the top countries, revealing international research networks and partnerships.
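
A co-occurrence heatmap of this kind is typically built from pair counts: two countries co-occur once for every paper whose affiliations include both. The following sketch counts such pairs on invented data; the `country_cooccurrence` helper is hypothetical and the repository's own procedure may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the countries found in each paper's affiliations.
papers = [
    ["Germany", "Austria"],
    ["Germany", "France", "Austria"],
    ["France"],
    ["Germany", "Germany", "Austria"],  # repeated country counts once
]


def country_cooccurrence(papers):
    """Count, for each unordered country pair, the papers they share."""
    pairs = Counter()
    for countries in papers:
        unique = sorted(set(countries))  # deduplicate and fix pair order
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


pairs = country_cooccurrence(papers)
print(pairs.most_common())
```

Sorting the countries before pairing keeps each pair under a single key, so ("Austria", "Germany") and ("Germany", "Austria") are never counted separately.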

Number of Articles geomap
This map visualizes the global distribution of published articles, offering a spatial perspective on research activity.

Countries Collaboration
This network analysis file and visualization show how often countries collaborate, mapping the structure of international research cooperation.

VOSviewer

Institute

This section presents data on research output and citation impact at the institutional level, allowing comparison of the most productive and influential institutes in the field.

institutions_comparison_tables

University

Here, the focus narrows to universities, showing their publication and citation metrics, as well as their collaboration networks, often visualized using network analysis tools like VOSviewer.

VOSviewer

Keywords

This section analyzes the most frequently used keywords in publications, revealing major research topics, emerging trends, and thematic evolution over time. Network visualizations further illustrate how topics are interconnected within the field.

VOSviewer

Data Availability

After modifying the MIE dataset and applying the LLM, we added a new structure called structural_affiliations to the previous dataset; it contains the following fields. The final dataset can be found here.

structuralaffiliations fields sample:

```json
"structuralaffiliations": [
  {
    "country": "",
    "institute": "",
    "department": "",
    "university": "",
    "city": "",
    "postalcode": "",
    "email": "",
    "Status": "",
    "universityf": ""
  }
]
```
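
To show how this structure might be consumed, the sketch below extracts the distinct countries of one publication. Only the `structuralaffiliations` key and its field names come from the sample above; the record contents, the `title` key, and the `countries_of` helper are assumptions made for illustration.

```python
import json

# Hypothetical record: values and the "title" key are made up.
record_json = """
{
  "title": "Example paper",
  "structuralaffiliations": [
    {"country": "Iran", "university": "Iran University of Medical Sciences", "city": "Tehran"},
    {"country": "Austria", "university": "Medical University of Vienna", "city": "Vienna"},
    {"country": "Austria", "university": "", "city": "Vienna"},
    {"country": "", "university": "", "city": ""}
  ]
}
"""


def countries_of(record):
    """Return the distinct, non-empty countries of one publication, sorted."""
    affiliations = record.get("structuralaffiliations", [])
    found = {a.get("country", "").strip() for a in affiliations}
    return sorted(found - {""})


record = json.loads(record_json)
print(countries_of(record))  # ['Austria', 'Iran']
```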

Repo Size

Citation

If you use this article or the dataset in a scientific publication, we would appreciate references to the following paper:

Biblatex entry:

```latex
@article{bitaraf-2025,
  author = {Bitaraf, Ehsan and Jafarpour, Maryam},
  journal = {Studies in health technology and informatics},
  month = {5},
  title = {{AI-Powered Affiliation Insights: LLM-Based Bibliometric Study of European Medical Informatics Conferences}},
  year = {2025},
  doi = {10.3233/shti250474},
  url = {https://doi.org/10.3233/shti250474},
}
```

Contributors

Please see our contributing guidelines for more details on how to get involved.


License

This repository is available under the CC0-1.0 license.

Owner

  • Name: Ehsan Bitaraf
  • Login: EhsanBitaraf
  • Kind: user

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  AI-Powered Affiliation Insights: LLM-Based Bibliometric
  Study of European Medical Informatics Conferences 
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Ehsan
    family-names: Bitaraf
    email: ehsan.bitaraf@gmail.com
    affiliation: >-
      Rajaei Cardiovascular Research Institute, Iran
      University of Medical Sciences, Tehran, Iran
    orcid: 'https://orcid.org/0000-0002-6588-7349'
  - given-names: Maryam
    family-names: 'Jafarpour'
    email: maryam.jafarpoor@gmail.com
    affiliation: >-
      Center for Medical Data Science, Medical University of
      Vienna, Vienna, Austria
    orcid: 'https://orcid.org/0000-0001-7266-5018'
repository-code: >-
  https://github.com/EhsanBitaraf/ai-powered-affiliation-extraction
url: 'https://ebooks.iospress.nl/doi/10.3233/SHTI250474'
abstract: >-
  This study employs Large Language Models (LLMs) to analyze
  bibliometric data from European Medical Informatics
  conferences from 1996 to 2024. By enhancing traditional
  methods with LLM-based techniques, the researchers
  significantly improved affiliation extraction accuracy.
  The analysis reveals trends in publication volume, author
  impact, and institutional collaborations across Europe.
  Key findings include the identification of leading
  contributors, visualization of collaboration networks, and
  mapping of geographical and institutional centers of
  excellence. The study highlights the potential of LLMs in
  bibliometric analysis, offering deeper insights into
  research trends and collaborations while addressing
  challenges in data standardization and computational
  resources.
keywords:
  - Artificial Intelligence
  - LLM
  - Bibliometrics
  - Medical Informatics
  - affiliation parsing
license: CC-BY-NC-4.0

GitHub Events

Total
  • Push event: 10
Last Year
  • Push event: 10