Get started

Install biblioflow and run your first bibliometric workflow.

Installation

Install the core library from PyPI:

pip install biblioflow

The core package includes common file readers plus API clients for OpenAlex, Crossref, Scopus, PubMed, and PubMed Central.

Install application packages only when you need them:

pip install biblioflow-web
pip install biblioflow-nb

Note: The applications depend on the core biblioflow package. New analysis features should be implemented in the core library first and then exposed by the web and notebook interfaces.

Load bibliographic records

import biblioflow as bf

dataset = bf.load(
    "records.ris",
    source="auto",
    format="auto",
)

bf.load() returns a BibliographicDataset containing normalized records, raw records, metadata, warnings, and errors.

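len(dataset)               # number of records in the dataset
dataset.metadata           # dataset-level metadata
dataset.warning_dicts()    # warnings collected while loading
dataset.to_records()[:3]   # peek at the first three records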
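
If no export is at hand, a small RIS file is enough to try the calls above. The following is a minimal sketch using only the standard library; the two records are made up for illustration, and the helper name to_ris is hypothetical rather than part of biblioflow.

```python
from pathlib import Path

# Two made-up records, each a list of (RIS tag, value) pairs.
records = [
    [
        ("TY", "JOUR"),
        ("AU", "Doe, Jane"),
        ("TI", "A made-up article on science mapping"),
        ("JO", "Journal of Examples"),
        ("PY", "2021"),
        ("KW", "bibliometrics"),
        ("KW", "science mapping"),
    ],
    [
        ("TY", "JOUR"),
        ("AU", "Roe, Richard"),
        ("TI", "Another made-up article"),
        ("JO", "Journal of Examples"),
        ("PY", "2022"),
        ("KW", "bibliometrics"),
    ],
]


def to_ris(record):
    """Serialize one record as RIS lines: tag, two spaces, hyphen, space, value."""
    lines = [f"{tag}  - {value}" for tag, value in record]
    lines.append("ER  - ")  # every RIS record ends with an ER tag
    return "\n".join(lines)


ris_text = "\n\n".join(to_ris(record) for record in records) + "\n"
Path("records.ris").write_text(ris_text, encoding="utf-8")
```

The resulting records.ris can then be passed to bf.load() as in the example above.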

Supported input formats

Format/source              Typical extension  Notes
-------------------------  -----------------  -----
RIS                        .ris               Common export format from reference managers and scholarly databases.
BibTeX                     .bib               Common for LaTeX and citation managers.
Scopus CSV                 .csv               Scopus headers such as EID, Authors, and Source title are detected.
Web of Science plain text  .txt               Clarivate savedrecs.txt exports with multiline/repeated tags.
OpenAlex JSON              .json              Single works, lists, or API responses with results.
Crossref JSON              .json              Single works, lists, or work-list responses with message.items.
CSV/TSV                    .csv, .tsv         Useful for spreadsheets and custom exports.
JSON/JSONL                 .json, .jsonl      Supports generic records and provider payloads.
XML                        .xml               Includes PubMed XML support.
NBIB                       .nbib              PubMed/NLM text export format.
PubMed/PMC API             remote             Search live PubMed and PubMed Central with pymedx.

See Sources and import formats for the complete source guide.
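
The extensions listed above do not always identify a single format: .csv and .json each match several rows. The sketch below illustrates that ambiguity with a plain lookup table. It is not biblioflow's detection logic; the CANDIDATES mapping and the candidate_formats helper are hypothetical and simply restate the list above.

```python
from pathlib import Path

# Hypothetical lookup that restates the format list above.
CANDIDATES = {
    ".ris": ["RIS"],
    ".bib": ["BibTeX"],
    ".csv": ["Scopus CSV", "CSV/TSV"],
    ".tsv": ["CSV/TSV"],
    ".txt": ["Web of Science plain text"],
    ".json": ["OpenAlex JSON", "Crossref JSON", "JSON/JSONL"],
    ".jsonl": ["JSON/JSONL"],
    ".xml": ["XML"],
    ".nbib": ["NBIB"],
}


def candidate_formats(filename):
    """Return the formats whose typical extension matches the file name."""
    suffix = Path(filename).suffix.lower()
    return CANDIDATES.get(suffix, [])


for name in ["records.ris", "scopus.csv", "openalex.json", "notes.docx"]:
    print(name, "->", candidate_formats(name))
```

When an extension maps to more than one candidate, the file content has to decide, which is one reason to pass an explicit source when the origin of an export is known.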

Source inference

Use source="auto" and format="auto" for convenience, or pass explicit values when working with known exports:

dataset = bf.load("openalex.json", source="openalex", format="json")
dataset = bf.load("pubmed.xml", source="pubmed", format="xml")
dataset = bf.load("crossref.json", source="crossref", format="json")
dataset = bf.load("savedrecs.txt", source="wos")
dataset = bf.load("scopus.csv", source="scopus")

Provider-aware adapters normalize database-specific field names into the canonical biblioflow schema.
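
To make the idea of a provider-aware adapter concrete, here is a minimal sketch that maps a few Scopus CSV headers onto a common record shape. The target field names (source_id, authors, title, year, source_title), the SCOPUS_FIELD_MAP table, and the sample row are assumptions made for illustration; they are not the canonical biblioflow schema, and the semicolon author separator may not match every export.

```python
# Hypothetical mapping from Scopus CSV headers to illustrative field names.
SCOPUS_FIELD_MAP = {
    "EID": "source_id",
    "Authors": "authors",
    "Title": "title",
    "Year": "year",
    "Source title": "source_title",
}


def normalize_scopus_row(row):
    """Rename known headers, split authors, and convert the year to int."""
    record = {}
    for header, field in SCOPUS_FIELD_MAP.items():
        value = row.get(header, "").strip()
        if value:
            record[field] = value
    if "authors" in record:
        # Assumes authors are separated by semicolons.
        parts = record["authors"].split(";")
        record["authors"] = [p.strip() for p in parts if p.strip()]
    if "year" in record and record["year"].isdigit():
        record["year"] = int(record["year"])
    return record


# Made-up row, shaped like what csv.DictReader would yield.
row = {
    "EID": "2-s2.0-0000000000",
    "Authors": "Doe J.; Roe R.",
    "Title": "A made-up article on science mapping",
    "Year": "2021",
    "Source title": "Journal of Examples",
}
normalized = normalize_scopus_row(row)
print(normalized)
```

An adapter for another provider would swap in a different mapping table while producing the same output keys, which is what lets later analysis steps ignore where a record came from.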

The older provider= argument remains supported for compatibility with existing examples, but new documentation uses source=.

API imports

File loading and API querying are separate entry points:

openalex = bf.from_openalex(search="bibliometrics", limit=100)
crossref = bf.from_crossref(query="science mapping", limit=100)
pubmed = bf.from_pubmed(
    query="bibliometrics AND reproducibility",
    limit=100,
    email="researcher@example.org",
)
pmc = bf.from_pmc(
    query="open science",
    limit=50,
    email="researcher@example.org",
)

For PubMed and PubMed Central, you may also set BIBLIOFLOW_NCBI_EMAIL and optionally BIBLIOFLOW_NCBI_API_KEY instead of passing contact details in each call.
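
As a sketch, the two variables can be set from Python before any query is made. The values below are placeholders, and the source does not state when biblioflow reads the variables, so setting them before the first PubMed or PMC call is an assumption.

```python
import os

# Placeholder contact address; setdefault keeps any value already exported
# in the shell.
os.environ.setdefault("BIBLIOFLOW_NCBI_EMAIL", "researcher@example.org")

# The API key is optional, so it is only read here, never invented.
api_key = os.environ.get("BIBLIOFLOW_NCBI_API_KEY")

ncbi_settings = {
    "email": os.environ["BIBLIOFLOW_NCBI_EMAIL"],
    "has_api_key": api_key is not None,
}
print(ncbi_settings)
```

Exporting the variables in the shell or a job scheduler has the same effect and keeps contact details out of notebooks and scripts.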

Scopus API imports require pybliometrics configuration and access:

scopus = bf.from_scopus(
    query="TITLE-ABS-KEY(bibliometrics)",
    limit=100,
)

Descriptive analysis

summary = bf.analyze(dataset, top_n=20)
summary.to_dict()

The descriptive summary includes high-level information, annual production, top authors, top sources, and top keywords.
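
For intuition about what such a summary contains, the sketch below computes annual production and top-N counts with the standard library. It is not how bf.analyze() is implemented; the describe helper, the record keys (year, authors, keywords), and the sample records are all hypothetical.

```python
from collections import Counter

# Made-up records using illustrative field names.
records = [
    {"year": 2020, "authors": ["Doe, J.", "Roe, R."],
     "keywords": ["Bibliometrics", "science mapping"]},
    {"year": 2021, "authors": ["Doe, J."],
     "keywords": ["bibliometrics"]},
    {"year": 2021, "authors": ["Poe, P."],
     "keywords": ["open science", "bibliometrics"]},
]


def describe(records, top_n=20):
    """Count records per year and the most frequent authors and keywords."""
    years = Counter(r["year"] for r in records if r.get("year"))
    authors = Counter(a for r in records for a in r.get("authors", []))
    # Lowercase keywords so that case variants are counted together.
    keywords = Counter(
        k.lower() for r in records for k in r.get("keywords", [])
    )
    return {
        "n_records": len(records),
        "annual_production": dict(sorted(years.items())),
        "top_authors": authors.most_common(top_n),
        "top_keywords": keywords.most_common(top_n),
    }


summary_sketch = describe(records, top_n=2)
print(summary_sketch)
```

Here top_n plays the same role as in the call above: it caps the length of each ranked list.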

Matrix and network construction

keyword_matrix = bf.matrix(
    dataset,
    kind="co_occurrence",
    unit="keywords_all",
    normalize="association",
    min_occurrences=2,
)

keyword_network = bf.network(
    dataset,
    kind="co_occurrence",
    unit="keywords_all",
    normalize="association",
    min_occurrences=2,
)
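
The parameters above can be illustrated with a small, library-free sketch. It assumes that min_occurrences drops keywords found in fewer than that many documents, and that normalize="association" refers to association strength, taken here as the pair count divided by the product of the two keywords' occurrence counts. Both are assumptions about biblioflow's behavior, and its actual formula may differ, for example by a scaling constant. The function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs, min_occurrences=2):
    """Count keyword pairs that appear together in the same document."""
    # Occurrences: number of documents that contain each keyword.
    occurrences = Counter(kw for doc in docs for kw in set(doc))
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}
    pairs = Counter()
    for doc in docs:
        present = sorted(set(doc) & kept)
        # Sorting gives each unordered pair a single key.
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return occurrences, pairs


def association_strength(occurrences, pairs):
    """Divide each pair count by the product of the occurrence counts."""
    return {
        (a, b): count / (occurrences[a] * occurrences[b])
        for (a, b), count in pairs.items()
    }


# Made-up keyword lists, one per document.
docs = [
    ["bibliometrics", "science mapping", "citation analysis"],
    ["bibliometrics", "science mapping"],
    ["bibliometrics", "open science"],
    ["science mapping", "citation analysis"],
]
occurrences, pairs = cooccurrence(docs, min_occurrences=2)
strength = association_strength(occurrences, pairs)
for pair, value in sorted(strength.items()):
    print(pair, round(value, 3))
```

In this toy data "open science" occurs only once and is filtered out. A matrix and a network built from the same settings describe the same pairs: the matrix stores each weight in a cell, while the network stores it on an edge.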

Export results

bf.export(dataset, "records.json", format="json")
bf.export(keyword_matrix, "keywords.csv", format="csv")
bf.export(keyword_network, "keywords.graphml", format="graphml")

Supported network exports include JSON, GraphML, GEXF, Pajek, and VOSviewer-style edge lists.

Command-line interface

The command name is biblioflow:

biblioflow --help
biblioflow validate records.ris
biblioflow analyze records.ris --top-n 25 -o summary.json
biblioflow matrix records.ris -o keywords.csv --kind co_occurrence --unit keywords_all
biblioflow network records.ris -o network.graphml --to graphml
biblioflow search pubmed --query "bibliometrics" --email you@example.org -o pubmed.json

Do not use or document a hyphenated command alias.
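
The commands above can also be scripted. The sketch below builds biblioflow analyze invocations for several files, using only the flags shown above, and prints them unless dry_run is turned off. The helper names, the output naming scheme, and the second input file are hypothetical.

```python
import subprocess
from pathlib import Path


def analyze_command(path, top_n=25):
    """Build the argument list for one `biblioflow analyze` call."""
    output = f"{Path(path).stem}.summary.json"
    return [
        "biblioflow", "analyze", str(path),
        "--top-n", str(top_n),
        "-o", output,
    ]


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)


commands = [analyze_command(p) for p in ["records.ris", "scopus.csv"]]
run_all(commands, dry_run=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.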