# dbparser: from complex drug databases to reproducible R workflows

Canonical HTML: https://dsti.school/techblog/dbparser-pharmacological-data-r-package

This Markdown copy is generated from the same DSTI static-site build as the canonical HTML page. It is intended for machine readability and concise retrieval.

Alumni Open-source research software

A student project became a maintained, peer-reviewed piece of research infrastructure. DSTI alumnus Mohammed Ali built `dbparser` to turn incompatible pharmacological databases into consistent R objects and reproducible integration workflows.

Mohammed Ali · DSTI alumnus · Author and maintainer of dbparser · R and pharmacological data integration

23 Jun 2026 · 12 min read · CRAN · rOpenSci · JOSS

Tags: R · dbparser · pharmacovigilance · bioinformatics · open-source · reproducible-research

![Official dbparser package logo](https://dsti.school/assets/dsti-techblog-dbparser-logo.6ebb275baa.png)

## Three sources. One analysis model.

DrugBank, OnSIDES and TWOSIDES become consistent, traceable R objects.

DrugBank (nested XML), OnSIDES (relational CSV) and TWOSIDES (compressed interactions) feed one `dvobject`: one consistent drugverse object.

3 supported databases · 2.2.1 current CRAN release · 2018 first CRAN release · 2026 JOSS publication

Large pharmacological databases are valuable because they preserve complex relationships between drugs, targets, pathways, products, adverse effects and interactions. They are difficult to analyse for exactly the same reason. DrugBank arrives as deeply nested XML; OnSIDES as relational CSV files; TWOSIDES as compressed interaction data. `dbparser` converts those different sources into consistent R objects and traceable integration workflows.

Data access and licensing: dbparser parses databases that the researcher is authorised to access. It does not redistribute restricted DrugBank content. Reproducibility still requires recording the source database release, access conditions and the exact package version used.
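
One way to keep that record next to the analysis is a small provenance list saved with the results. The sketch below is base R only and is not part of dbparser: the helper `record_provenance()` and all of its field names are hypothetical, and the demo uses a temporary file and the `stats` package so that it runs anywhere. In a real analysis the file would be the downloaded database and the package would be `dbparser`.

```r
# Hypothetical helper (not a dbparser function): capture what is needed to
# reproduce a parse later. All field names are illustrative.
record_provenance <- function(source_file, source_release, access_note, pkg) {
  stopifnot(file.exists(source_file))
  list(
    source_file     = basename(source_file),
    source_md5      = unname(tools::md5sum(source_file)),  # fingerprint of the input
    source_release  = source_release,
    access_note     = access_note,
    package         = pkg,
    package_version = as.character(utils::packageVersion(pkg)),
    r_version       = R.version.string,
    recorded_at     = format(Sys.time(), "%Y-%m-%dT%H:%M:%S%z")
  )
}

# Stand-in input; replace with the real database file and pkg = "dbparser".
demo_file <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,example"), demo_file)

prov <- record_provenance(
  source_file    = demo_file,
  source_release = "release label from the provider",
  access_note    = "terms under which the file was obtained",
  pkg            = "stats"
)
str(prov)
```

Saving `prov` with `saveRDS()` alongside the analysis output is one simple way to keep the three facts together; the checksum lets a later reader confirm they are parsing the same file.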

The useful abstraction is not merely a flatter file. It is a stable object model that preserves relationships, release information and provenance while giving analysts one consistent way to work.

## 01 The problem is structural, not cosmetic

A pharmacological database is not a spreadsheet with too many columns. Drug records connect to targets, enzymes, carriers, transporters, pathways, products, references and external identifiers. A parser that only flattens the file can make the result easier to load while silently destroying the relationships that give the data meaning.
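
A toy illustration of that loss, in base R. The records and identifiers below are made up for the example and are not DrugBank content; the point is only to contrast a flattened table with a pair of linked tables.

```r
# Toy nested records with made-up identifiers.
records <- list(
  list(id = "D1", name = "drug_one", targets = c("T1", "T2")),
  list(id = "D2", name = "drug_two", targets = c("T2"))
)

# Naive flattening: one row per drug, targets squeezed into a single string.
flat <- do.call(rbind, lapply(records, function(r) {
  data.frame(id = r$id, name = r$name,
             targets = paste(r$targets, collapse = ";"),
             stringsAsFactors = FALSE)
}))

# Relational form: one table per entity, linked by drug_id.
drugs <- do.call(rbind, lapply(records, function(r) {
  data.frame(drug_id = r$id, name = r$name, stringsAsFactors = FALSE)
}))
drug_targets <- do.call(rbind, lapply(records, function(r) {
  data.frame(drug_id = r$id, target_id = r$targets, stringsAsFactors = FALSE)
}))

# "Which drugs act on target T2?" is a plain filter on the link table...
drug_targets$drug_id[drug_targets$target_id == "T2"]

# ...but needs string parsing once the relationship has been flattened away.
flat$id[grepl("(^|;)T2(;|$)", flat$targets)]
```

Both queries return the same two drugs here, but only the relational form can be joined to a further table of target annotations without first undoing the flattening.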

The sources also disagree on formats and identifiers. DrugBank uses a large XML hierarchy. OnSIDES distributes related CSV tables derived from drug labels. TWOSIDES uses a compressed flat representation of adverse events associated with drug pairs. Ad-hoc scripts can bridge one analysis, but they usually hide assumptions about joins, versions and missing values.

- DrugBank (`XML hierarchy`): mechanisms, drug records, targets, pathways and identifiers.
- OnSIDES (`CSV tables`): adverse drug events extracted from FDA drug labels.
- TWOSIDES (`CSV.GZ`): adverse events associated with pairs of drugs.
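
The hidden-join problem is easy to reproduce. In this base R sketch, with made-up identifiers and events, the default `merge()` silently drops every record that does not match, which is exactly the kind of assumption an ad-hoc script tends to bury.

```r
# Made-up identifiers and events, for illustration only.
drugs <- data.frame(drug_id = c("D1", "D2", "D3"),
                    name    = c("drug_one", "drug_two", "drug_three"),
                    stringsAsFactors = FALSE)
events <- data.frame(drug_id = c("D1", "D1", "D4"),
                     event   = c("nausea", "headache", "rash"),
                     stringsAsFactors = FALSE)

# Default merge() is an inner join: D2, D3 and D4 vanish without a warning.
inner <- merge(drugs, events, by = "drug_id")

# Keeping every drug makes the unmatched rows visible as NA.
left <- merge(drugs, events, by = "drug_id", all.x = TRUE)

# Report what each side failed to match instead of discarding it silently.
unmatched_drugs  <- setdiff(drugs$drug_id, events$drug_id)
unmatched_events <- setdiff(events$drug_id, drugs$drug_id)

nrow(inner)        # 2
nrow(left)         # 4
unmatched_drugs    # "D2" "D3"
unmatched_events   # "D4"
```

Making the join type and the unmatched identifiers explicit turns an invisible assumption into something a reviewer can check.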

## 02 A common object without erasing the source

dbparser introduces the `dvobject`—a drugverse object implemented as an R list with consistent access patterns. It retains tidy tables for analysis, metadata about the database release and parse process, and mappings that describe how tables relate to one another.

For a single DrugBank release, the object can expose drug information, salts, products, references and the connected carrier–enzyme–target–transporter structures. When sources are merged, the same object gains nested database components and integrated tables rather than becoming an undocumented collection of joins.

### What a dvobject keeps together

One analysis-ready object:

- drugs: core drug tables
- cett: carriers, enzymes, targets, transporters
- products: commercial products
- references: articles, links and books
- metadata: release and provenance
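
To make the idea of a list-based container concrete, here is a small stand-in built in base R. It is an illustrative sketch only: the class name `toy_dvobject`, the table names and the metadata fields are assumptions for this example, and the real `dvobject` layout is defined by dbparser and may differ.

```r
# Illustrative stand-in for a list-based container. Identifiers are made up.
toy_object <- structure(
  list(
    drugs = list(
      general_information = data.frame(drug_id = c("D1", "D2"),
                                       name = c("drug_one", "drug_two"),
                                       stringsAsFactors = FALSE)
    ),
    cett = list(
      targets = data.frame(drug_id = c("D1", "D1", "D2"),
                           target_id = c("T1", "T2", "T2"),
                           stringsAsFactors = FALSE)
    ),
    metadata = list(
      source         = "toy data",
      source_release = "unknown",
      parsed_on      = Sys.Date()
    )
  ),
  class = c("toy_dvobject", "list")
)

# Tables are reached by name, and they share the drug_id key.
targets_named <- merge(toy_object$cett$targets,
                       toy_object$drugs$general_information,
                       by = "drug_id")

# Provenance travels with the tables instead of living in a separate note.
toy_object$metadata$source
names(toy_object)
```

Because the container is an ordinary list, the usual tools (`names()`, `$`, `str()`) work on it, while the shared key keeps the tables joinable.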

## 03 From parser to integration engine

The current package uses DrugBank as the mechanistic hub. OnSIDES contributes adverse drug events extracted from FDA labels, while TWOSIDES contributes adverse events associated with drug combinations. The hub-and-spoke decision reduces the number of identifier mappings that must be maintained and makes the integration path explicit.

That design is a trade-off: multi-database workflows depend on DrugBank identifiers and mappings. But it is a visible, testable trade-off rather than an implicit assumption buried inside a one-off notebook.

Diagram: OnSIDES (adverse drug events extracted from FDA drug labels) and TWOSIDES (adverse events associated with pairs of drugs) each connect to DrugBank, the mechanistic hub.
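
The mapping argument can be checked with a line of arithmetic. Assuming one identifier mapping per connected pair of sources, linking every source to every other needs `choose(n, 2)` mappings, while linking every source to a single hub needs `n - 1`. The function names below are hypothetical helpers for this example.

```r
# Number of identifier mappings needed to connect n sources,
# counting one mapping per connected pair.
pairwise_mappings <- function(n) choose(n, 2)  # every source mapped to every other
hub_mappings      <- function(n) n - 1         # every source mapped to one hub

n <- 3:6
mapping_counts <- data.frame(sources  = n,
                             pairwise = pairwise_mappings(n),
                             hub      = hub_mappings(n))
mapping_counts
```

With the three sources supported today the saving is modest (3 mappings versus 2), but the gap widens quickly: at six sources it is 15 versus 5.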

## 04 The software engineering around the parser

A useful research package needs more than working parsing functions. It needs a stable public interface, tests, documentation, metadata, examples, versioned releases and a process for reviewing changes. dbparser is distributed through CRAN, documented through rOpenSci, released under the MIT licence and maintained in a public repository.

The package was peer-reviewed through rOpenSci and its software paper was published in the Journal of Open Source Software in February 2026. That review record matters because it makes quality claims inspectable: users can see the repository, review thread, archived release, documentation and issue tracker.

- Versioned releases: CRAN archives make package evolution and the exact release used in an analysis visible.
- Peer review: rOpenSci review and the JOSS record expose documentation, testing and software-design decisions.
- Reproducible inputs: metadata keeps source versions and parse details alongside the analysis-ready object.

## 05 A compact, reproducible workflow

The code below shows the architectural idea without hiding it behind a graphical interface. Each source is parsed independently. The resulting objects are then merged through explicit, chainable operations. The code is deliberately unchanged from the package documentation.

Integration pipeline (R)

```r
library(dbparser)
library(dplyr)

drugbank_db <- parseDrugBank("data/drugbank.xml")
onsides_db  <- parseOnSIDES("data/onsides/")
twosides_db <- parseTWOSIDES("data/TWOSIDES.csv.gz")

final_db <- drugbank_db %>%
  merge_drugbank_onsides(onsides_db) %>%
  merge_drugbank_twosides(twosides_db)

head(final_db$integrated_data$drug_drug_interactions)
```
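
Chaining of this kind works when each step takes the accumulated object as its first argument and returns an enlarged object. The toy sketch below shows that pattern with the native pipe `|>` (R 4.1 or later). The function `add_component()` and the data are hypothetical; they are not dbparser functions and say nothing about how the package implements its merges.

```r
# Hypothetical merge step: takes the object first and returns the object,
# so calls can be chained. Not a dbparser function.
add_component <- function(obj, name, component) {
  stopifnot(is.list(obj), !name %in% names(obj))
  obj[[name]] <- component
  obj
}

hub <- list(drugbank = data.frame(drug_id = c("D1", "D2"),
                                  stringsAsFactors = FALSE))

piped <- hub |>
  add_component("onsides",
                data.frame(drug_id = "D1", event = "nausea",
                           stringsAsFactors = FALSE)) |>
  add_component("twosides",
                data.frame(drug_id_1 = "D1", drug_id_2 = "D2",
                           event = "dizziness", stringsAsFactors = FALSE))

# The pipe is only notation: this nested call builds the same object.
nested <- add_component(
  add_component(hub, "onsides",
                data.frame(drug_id = "D1", event = "nausea",
                           stringsAsFactors = FALSE)),
  "twosides",
  data.frame(drug_id_1 = "D1", drug_id_2 = "D2",
             event = "dizziness", stringsAsFactors = FALSE)
)

identical(piped, nested)  # TRUE
names(piped)
```

Writing the steps as a pipeline keeps the order of integration readable from top to bottom, which is what makes the path from sources to merged object easy to audit.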

## 06 From student project to research infrastructure

CRAN records the first dbparser release in December 2018. The public archive then shows a sequence of maintained releases rather than a one-off upload. By version 2.2.1, published in January 2026, the package had moved beyond DrugBank-only parsing to support integrated pharmacovigilance workflows across three sources.

The project documentation identifies use in more than ten peer-reviewed publications spanning drug repurposing, biomarkers, pathway modelling and clinical-trial analysis. The stronger story is therefore not that a student wrote a parser. It is that the work survived contact with other researchers, changing source databases, package review and long-term maintenance.

- 2018: dbparser 1.0.0 enters CRAN.
- 2023–24: The 2.x series modernises the package and its data model.
- 2026: Version 2.2.1 supports DrugBank, OnSIDES and TWOSIDES integration.
- 2026: The software paper is published in JOSS after open review.

## 07 The alumnus behind the open-source project

Mohammed Ali is a DSTI alumnus, author and maintainer of dbparser. Ali Ezzat is co-author of the package and the JOSS paper. Their public record lets readers inspect the software at several levels: the stable CRAN release, the full reference manual, the rOpenSci documentation, the source repository, the software-review discussion and the archived JOSS publication.

That openness is part of the engineering result. A reproducible research tool should make it possible to trace not only the output of an analysis, but also the software version, the source data release and the decisions that transformed one into the other.

### Software, documentation and publication

### Mohammed Ali

DSTI alumnus, author and maintainer of dbparser. His work brings R software engineering, pharmacological data integration and reproducible research infrastructure together.

[LinkedIn](https://www.linkedin.com/in/mohammedali85/)

Source and editorial note: Article developed from DSTI’s former student-project record, a supplied manuscript and the current CRAN, rOpenSci, repository and JOSS sources. Technical names, package functions, code and publication titles are preserved exactly.
