Posts by Collection

publications

BioinformaticsBench: A collaboratively built large language model benchmark for Bioinformatics reasoning

Published in ICML 2024, 2024

Most existing Large Language Model (LLM) benchmarks for bioinformatics reasoning focus on problems grounded in niche research domains, where datasets contain a small number of samples and are therefore not truly representative of the broad domain of bioinformatics. To systematically examine the reasoning capabilities required for solving complex bioinformatics problems, we introduce BioinformaticsBench, an expansive benchmark suite for LLMs.

Citation: Varuni Sarwal, Seungmo Lee, Rosemary He, Aingela Kattapuram, Xiaoxuan Wang, Eleazar Eskin, Wei Wang, Serghei Mangul. "BioinformaticsBench: A collaboratively built large language model benchmark for Bioinformatics reasoning." AccMLBio ICML 2024.
Download Paper

A blended genome and exome sequencing method captures genetic variation in an unbiased, high-quality, and cost-effective manner

Published in bioRxiv, 2024

We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low-pass whole genome (1-4x mean depth) and deep whole exome (30-40x mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations.

Citation: Boltz, T., Chu, B., Liao, C., Sealock, J., … Lee, Seungmo, … Martin, R. (2024). "A blended genome and exome sequencing method captures genetic variation in an unbiased, high-quality, and cost-effective manner." bioRxiv.
Download Paper

VISTA: an integrated framework for structural variant discovery

Published in Briefings in Bioinformatics, 2024

Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm.

Citation: Varuni Sarwal*, Seungmo Lee*, Jianzhi Yang, Sriram Sankararaman, Mark Chaisson, Eleazar Eskin, Serghei Mangul. "VISTA: an integrated framework for structural variant discovery." Briefings in Bioinformatics, Volume 25, Issue 5.
Download Paper

talks

RECOMB 2024 Short Talk Presentation

Published:

VISTA: an integrated framework for structural variant discovery

ISMB 2024, Satellite Talk

Published:

VISTA: an integrated framework for structural variant discovery

RECOMB 2025 Short Talk Presentation

Published:

Metagenomics agnostic Test

teaching

CS 122: Algorithms in Bioinformatics

Winter 2025, Computer Science Department, 2025