Get started
Installation
Install the core library from PyPI:
pip install biblioflow
The core package includes common file readers plus API clients for OpenAlex, Crossref, Scopus, PubMed, and PubMed Central.
Install application packages only when you need them:
pip install biblioflow-web
pip install biblioflow-nb
The applications depend on the core biblioflow package. New analysis features should be implemented in the core library first and then exposed by the web and notebook interfaces.
Load bibliographic records
import biblioflow as bf
dataset = bf.load(
    "records.ris",
    source="auto",
    format="auto",
)
bf.load() returns a BibliographicDataset containing normalized records, raw records, metadata, warnings, and errors.
len(dataset)
dataset.metadata
dataset.warning_dicts()
dataset.to_records()[:3]
Supported input formats
| Format/source | Typical extension | Notes |
|---|---|---|
| RIS | .ris | Common export format from reference managers and scholarly databases. |
| BibTeX | .bib | Common for LaTeX and citation managers. |
| Scopus CSV | .csv | Scopus headers such as EID, Authors, and Source title are detected. |
| Web of Science plain text | .txt | Clarivate savedrecs.txt exports with multiline/repeated tags. |
| OpenAlex JSON | .json | Single works, lists, or API responses with results. |
| Crossref JSON | .json | Single works, lists, or work-list responses with message.items. |
| CSV/TSV | .csv, .tsv | Useful for spreadsheets and custom exports. |
| JSON/JSONL | .json, .jsonl | Supports generic records and provider payloads. |
| XML | .xml | Includes PubMed XML support. |
| NBIB | .nbib | PubMed/NLM text export format. |
| PubMed/PMC API | remote | Search live PubMed and PubMed Central with pymedx. |
See Sources and import formats for the complete source guide.
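If no database export is at hand, a small RIS file is enough to try the loader. The sketch below writes two made-up records using only the standard library. It relies on the standard RIS layout (a two-character tag, two spaces, a hyphen, a space, then the value, with an ER tag closing each record). The helper name ris_record and all bibliographic values are illustrative and not part of biblioflow.

```python
import tempfile
from pathlib import Path


def ris_record(fields):
    """Format (tag, value) pairs as one RIS record closed by an ER tag."""
    lines = [f"{tag}  - {value}" for tag, value in fields]
    lines.append("ER  - ")
    return "\n".join(lines)


# Placeholder records; none of these values describe real publications.
sample_records = [
    [
        ("TY", "JOUR"),
        ("AU", "Doe, Jane"),
        ("TI", "A placeholder article title"),
        ("PY", "2021"),
        ("KW", "bibliometrics"),
        ("KW", "science mapping"),
    ],
    [
        ("TY", "JOUR"),
        ("AU", "Roe, Richard"),
        ("TI", "Another placeholder title"),
        ("PY", "2022"),
        ("KW", "bibliometrics"),
    ],
]

# Write the file into a temporary directory so nothing is overwritten.
ris_path = Path(tempfile.mkdtemp()) / "records.ris"
ris_text = "\n\n".join(ris_record(r) for r in sample_records) + "\n"
ris_path.write_text(ris_text, encoding="utf-8")

# Count records by their closing ER tag as a quick sanity check.
record_count = sum(
    line.startswith("ER  -")
    for line in ris_path.read_text(encoding="utf-8").splitlines()
)
print(ris_path, record_count)
```

The resulting path can then be passed to bf.load() in place of "records.ris".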
Source inference
Use source="auto" and format="auto" for convenience, or pass explicit values when working with known exports:
dataset = bf.load("openalex.json", source="openalex", format="json")
dataset = bf.load("pubmed.xml", source="pubmed", format="xml")
dataset = bf.load("crossref.json", source="crossref", format="json")
dataset = bf.load("savedrecs.txt", source="wos")
dataset = bf.load("scopus.csv", source="scopus")
Provider-aware adapters normalize database-specific field names into the canonical biblioflow schema.
provider= remains supported for compatibility with older examples, but new documentation uses source=.
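To make the idea of source inference concrete, here is a rough sketch of how detection from a file extension and a few content markers could work. It is not biblioflow's actual detection logic: the function guess_source, the "generic" and "unknown" labels, and the exact rules are assumptions built only from the hints in the format table (Scopus column headers, an OpenAlex results key, a Crossref message.items key).

```python
import csv
import json
from pathlib import Path

# Column names the format table mentions as Scopus markers.
SCOPUS_COLUMNS = {"EID", "Authors", "Source title"}


def guess_source(path, text):
    """Return a hypothetical (source, format) guess for a file's content."""
    ext = Path(path).suffix.lower()
    if ext == ".ris":
        return ("generic", "ris")
    if ext == ".bib":
        return ("generic", "bibtex")
    if ext == ".nbib":
        return ("pubmed", "nbib")
    if ext == ".csv":
        lines = text.splitlines()
        header = next(csv.reader(lines[:1]), [])
        columns = {name.strip() for name in header}
        if SCOPUS_COLUMNS <= columns:
            return ("scopus", "csv")
        return ("generic", "csv")
    if ext == ".json":
        payload = json.loads(text)
        if isinstance(payload, dict):
            if "results" in payload:
                return ("openalex", "json")
            message = payload.get("message")
            if isinstance(message, dict) and "items" in message:
                return ("crossref", "json")
        return ("generic", "json")
    return ("unknown", "unknown")


print(guess_source("export.csv", "Authors,Title,Year,Source title,EID\n"))
print(guess_source("works.json", '{"message": {"items": []}}'))
```

Content markers like these are heuristics and can misfire on custom exports, which is one reason to pass explicit source= and format= values for known files.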
API imports
File loading and API querying are separate entry points:
openalex = bf.from_openalex(search="bibliometrics", limit=100)
crossref = bf.from_crossref(query="science mapping", limit=100)
pubmed = bf.from_pubmed(
    query="bibliometrics AND reproducibility",
    limit=100,
    email="researcher@example.org",
)
pmc = bf.from_pmc(
    query="open science",
    limit=50,
    email="researcher@example.org",
)
For PubMed and PubMed Central, you may also set BIBLIOFLOW_NCBI_EMAIL and optionally BIBLIOFLOW_NCBI_API_KEY instead of passing contact details in each call.
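One way to provide these variables from inside a script or notebook is sketched below. The variable names come from the paragraph above; the address is a placeholder. Whether biblioflow reads the variables at call time or at import time is not stated here, so setting them before importing biblioflow is the cautious choice.

```python
import os

# Keep any value already exported in the shell; otherwise use a placeholder.
os.environ.setdefault("BIBLIOFLOW_NCBI_EMAIL", "researcher@example.org")

# The API key is optional, so only report whether one is configured.
has_api_key = "BIBLIOFLOW_NCBI_API_KEY" in os.environ

print("NCBI email:", os.environ["BIBLIOFLOW_NCBI_EMAIL"])
print("API key configured:", has_api_key)
```

Exporting the variables in the shell or a project-level environment file keeps contact details and keys out of shared notebooks.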
Scopus API imports require pybliometrics configuration and access:
scopus = bf.from_scopus(
    query="TITLE-ABS-KEY(bibliometrics)",
    limit=100,
)
Descriptive analysis
summary = bf.analyze(dataset, top_n=20)
summary.to_dict()
The descriptive summary includes high-level information, annual production, top authors, top sources, and top keywords.
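As a rough illustration of what such a summary contains, the counts can be sketched with collections.Counter over plain dictionaries. The record fields (year, authors, source, keywords), the describe function, and the output keys are hypothetical and do not reflect biblioflow's schema or the actual structure returned by to_dict().

```python
from collections import Counter

# Made-up records with illustrative field names.
records = [
    {"year": 2020, "authors": ["Doe, J.", "Roe, R."], "source": "Journal A",
     "keywords": ["bibliometrics", "open science"]},
    {"year": 2021, "authors": ["Doe, J."], "source": "Journal B",
     "keywords": ["bibliometrics"]},
    {"year": 2021, "authors": ["Poe, E."], "source": "Journal A",
     "keywords": ["science mapping", "bibliometrics"]},
]


def describe(records, top_n=20):
    """Count records per year and rank authors, sources, and keywords."""
    annual = Counter(r["year"] for r in records)
    authors = Counter(a for r in records for a in r["authors"])
    sources = Counter(r["source"] for r in records)
    keywords = Counter(k for r in records for k in r["keywords"])
    return {
        "n_records": len(records),
        "annual_production": dict(sorted(annual.items())),
        "top_authors": authors.most_common(top_n),
        "top_sources": sources.most_common(top_n),
        "top_keywords": keywords.most_common(top_n),
    }


overview = describe(records, top_n=2)
print(overview)
```

Here top_n plays the same role as in bf.analyze(): it caps the length of each ranked list.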
Matrix and network construction
keyword_matrix = bf.matrix(
    dataset,
    kind="co_occurrence",
    unit="keywords_all",
    normalize="association",
    min_occurrences=2,
)
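For readers unfamiliar with the normalization, the sketch below computes association strength as it is commonly defined in science mapping: the co-occurrence count c_ij divided by the product of the two items' occurrence counts, s_i * s_j (some tools add a constant scaling factor). That normalize="association" uses exactly this definition, and that min_occurrences filters on the number of records containing a keyword, are assumptions. The function association_matrix is illustrative and not part of biblioflow.

```python
from collections import Counter
from itertools import combinations


def association_matrix(keyword_lists, min_occurrences=2):
    """Return {(kw_a, kw_b): c_ab / (s_a * s_b)} for frequent keywords."""
    # s_i: number of records in which each keyword appears.
    occurrences = Counter(k for kws in keyword_lists for k in set(kws))
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    # c_ij: number of records in which both keywords appear.
    pair_counts = Counter()
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws) & kept), 2):
            pair_counts[(a, b)] += 1

    return {
        (a, b): count / (occurrences[a] * occurrences[b])
        for (a, b), count in pair_counts.items()
    }


# Made-up keyword lists, one list per record.
docs = [
    ["bibliometrics", "open science"],
    ["bibliometrics", "science mapping"],
    ["bibliometrics", "science mapping", "open science"],
    ["peer review"],
]
weights = association_matrix(docs, min_occurrences=2)
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 3))
```

Dividing by the occurrence counts downweights pairs that co-occur often only because both keywords are very frequent.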
keyword_network = bf.network(
    dataset,
    kind="co_occurrence",
    unit="keywords_all",
    normalize="association",
    min_occurrences=2,
)
Export results
bf.export(dataset, "records.json", format="json")
bf.export(keyword_matrix, "keywords.csv", format="csv")
bf.export(keyword_network, "keywords.graphml", format="graphml")
Supported network exports include JSON, GraphML, GEXF, Pajek, and VOSviewer-style edge lists.
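To show what a GraphML file of this kind contains, the following sketch serializes a small weighted edge list with the standard library's xml.etree.ElementTree. It is not biblioflow's exporter: the function edges_to_graphml, the attribute keys label and weight, and the sample weights are assumptions. The output follows the general GraphML layout of key declarations followed by a graph of node and edge elements.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Serialize {(source, target): weight} as an undirected GraphML string."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as default namespace

    def tag(name):
        return f"{{{GRAPHML_NS}}}{name}"

    root = ET.Element(tag("graphml"))
    # Declare one node attribute (label) and one edge attribute (weight).
    ET.SubElement(root, tag("key"), {"id": "label", "for": "node",
                                     "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, tag("key"), {"id": "weight", "for": "edge",
                                     "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, tag("graph"),
                          {"id": "G", "edgedefault": "undirected"})

    # Use short ids (n0, n1, ...) and keep the keyword text as a label.
    names = sorted({name for pair in edges for name in pair})
    node_ids = {name: f"n{i}" for i, name in enumerate(names)}
    for name, node_id in node_ids.items():
        node = ET.SubElement(graph, tag("node"), {"id": node_id})
        ET.SubElement(node, tag("data"), {"key": "label"}).text = name

    for (source, target), weight in edges.items():
        edge = ET.SubElement(graph, tag("edge"), {"source": node_ids[source],
                                                  "target": node_ids[target]})
        ET.SubElement(edge, tag("data"), {"key": "weight"}).text = repr(float(weight))

    return ET.tostring(root, encoding="unicode")


# Illustrative weights only.
graphml_text = edges_to_graphml({
    ("bibliometrics", "open science"): 0.25,
    ("bibliometrics", "science mapping"): 0.5,
})
print(graphml_text)
```

Files in this shape can be opened by common network tools that read GraphML, which is the main reason to prefer it over a bare edge list when node labels and edge weights both matter.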
Command-line interface
The command name is biblioflow:
biblioflow --help
biblioflow validate records.ris
biblioflow analyze records.ris --top-n 25 -o summary.json
biblioflow matrix records.ris -o keywords.csv --kind co_occurrence --unit keywords_all
biblioflow network records.ris -o network.graphml --to graphml
biblioflow search pubmed --query "bibliometrics" --email you@example.org -o pubmed.json
Do not use or document a hyphenated command alias.