About Me
I help teams turn complex biological data into clear, reproducible, and actionable insights. I design and run bioinformatics and genomics analyses that scale efficiently and produce results teams can rely on from one project to the next. By combining workflow automation with statistical and computational methods, I make it easier for research and industry teams to interpret data, validate findings, and make informed decisions. My MS in Bioinformatics and Computational Biology and my research in Dr. Xuan's lab strengthened my ability to build systems that handle large datasets reliably. As a bioinformatics scientist, I take initiative, identify gaps in analysis and process, and develop solutions that improve both accuracy and impact.
Building reliable analyses and workflows that turn data into results teams can trust.
Skills & Expertise
I specialize in turning raw sequencing data into results that research teams can trust. My skills span reproducible genomics analysis, machine learning for biological data, and production-ready pipelines that scale across HPC and cloud systems.
Next-Generation Sequencing (NGS)
Developed end-to-end RNA-seq and scRNA-seq pipelines that convert raw reads into expression matrices and QC reports used in cancer and disease studies.
Genomic Data Analysis
Analyzed and interpreted genomic variation, integrating public databases and pathway tools to link variants to functional and regulatory impact.
Bioinformatics & Statistical Analysis
Applied statistical modeling and data cleaning to large datasets, improving reproducibility and clarity in gene expression and variant studies.
Machine Learning in Biology
Built classification and prediction models using scikit-learn and PyTorch for transcriptomic and histopathology data, improving interpretability and accuracy.
GWAS and TWAS Pipelines
Integrated genotype, expression, and chromatin data using TWAS and eQTL mapping to identify trait-associated genes beyond traditional GWAS.
Workflow Automation
Designed scalable workflows with Nextflow, Snakemake, Docker, and AWS that reduced runtime and ensured reproducibility across HPC and cloud systems.
Work Experience
Graduate Researcher — Genomic Data Modeling
Dr. Xuan's Lab, UT Dallas (Jan 2024 — May 2025)
- The lab needed reliable pipelines to connect genetic variation to gene expression and phenotype.
- Built and validated machine learning models using GTEx and Hi-C data; engineered modular pipelines on HPC systems.
- Improved the lab's ability to predict gene expression across tissues and interpret long-range regulatory effects, directly supporting translational genomics research.
Undergraduate Research Assistant — Antimicrobial Research
Sree Vidyanikethan Degree College (Aug 2020 — Mar 2021)
- The project aimed to study Biancaea sappan for antioxidant and antibacterial potential, but initial assays lacked consistency and yield.
- Refined assay protocols, optimized growth media, and tested extraction conditions to improve reproducibility.
- Achieved more reliable yields and confirmed antibacterial activity, giving the team stronger evidence to pursue natural compound studies further.
Projects & Pipelines
Computational strategies for deciphering biological heterogeneity, gene regulation, and clinical pathogenicity.
CRC-TME: Tumor Heterogeneity Analysis
Uncovering immune infiltration patterns in the colorectal tumor microenvironment. Integrated 63k+ single-cell transcriptomes across cohorts, using scVI for probabilistic batch correction and Scanpy for clustering.
View Analysis
Yeast-Stress: Automated RNA-Seq Pipeline
Reproducible Nextflow pipeline for quantifying oxidative stress response in S. cerevisiae. Automates QC, alignment, and differential expression analysis in a containerized environment.
View Pipeline
Pan-Cancer Expression Profiling
Standardized analysis workflow for five major cancer types (e.g., BRCA, KIRC). Automates normalization (DESeq2) and survival analysis to identify subtype-specific prognostic biomarkers.
View Workflow
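The normalization step can be illustrated with a small sketch of the median-of-ratios idea behind DESeq2's size factors. This is a plain-Python illustration for readers, not the workflow's actual code (which uses DESeq2 itself); the function names `size_factors` and `normalize` and the toy counts are assumptions made for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only genes with nonzero counts in every sample, so logs are defined.
    log_rows = [[math.log(c) for c in row] for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Median log-ratio of this sample to the pseudo-reference.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts):
    """Divide each raw count by its sample's size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]

# Toy data (hypothetical): sample 2 was sequenced at twice the depth of sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In the toy data the two size factors differ by a factor of two, so the normalized counts for the first three genes match across samples; the gene with a zero count is left out of the estimate but is still scaled.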
Gleason AI: Histology Classifier
Deep learning model for grading prostate cancer tissue. Trained a ResNet-50 CNN to identify Gleason patterns, using Grad-CAM to visualize the morphological features driving the diagnosis.
View Model
TinyVariant: Transformer Classifier
Experimental NLP-style model for classifying Variants of Uncertain Significance (VUS). Adapts BERT-like attention mechanisms to learn pathogenicity from genomic sequence context.
View Project
SeqMorph: Mutation Simulator
CLI tool for injecting synthetic mutations into sequencing data. Designed to stress-test the sensitivity of alignment algorithms against edge cases like indels and structural variants.
View Tool
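As a rough illustration of the idea, the sketch below injects SNVs, insertions, and deletions into a reference string and returns a truth set to compare against an aligner's or caller's output. It is a simplified stand-in, not SeqMorph's actual interface: `inject_mutations`, its parameters, and the spacing rule are all assumptions made for this example.

```python
import random

BASES = "ACGT"

def inject_mutations(seq, n_snv=0, n_ins=0, n_del=0, max_indel=3, seed=0):
    """Return (mutated_sequence, truth), where truth lists (pos, kind, ref, alt).

    Positions are 0-based reference coordinates. Events are placed on a grid
    spaced max_indel + 1 apart so that no two events can overlap.
    """
    rng = random.Random(seed)  # seeded for reproducible test cases
    kinds = ["snv"] * n_snv + ["ins"] * n_ins + ["del"] * n_del
    candidates = range(0, len(seq), max_indel + 1)
    if len(kinds) > len(candidates):
        raise ValueError("sequence too short for the requested number of events")
    positions = rng.sample(candidates, len(kinds))
    out = list(seq)
    truth = []
    # Apply right-to-left so earlier edits never shift later coordinates.
    for pos, kind in sorted(zip(positions, kinds), reverse=True):
        if kind == "snv":
            ref = out[pos]
            alt = rng.choice([b for b in BASES if b != ref])
            out[pos] = alt
            truth.append((pos, "snv", ref, alt))
        elif kind == "ins":
            ins = "".join(rng.choice(BASES) for _ in range(rng.randint(1, max_indel)))
            out[pos + 1:pos + 1] = ins  # inserted after reference position pos
            truth.append((pos, "ins", "", ins))
        else:
            length = rng.randint(1, max_indel)
            deleted = "".join(out[pos:pos + length])
            del out[pos:pos + length]
            truth.append((pos, "del", deleted, ""))
    return "".join(out), sorted(truth)

reference = "ACGT" * 25
mutated, truth = inject_mutations(reference, n_snv=3, n_ins=2, n_del=2, seed=1)
```

Because the generator is seeded, the same call always yields the same mutated sequence and truth set, which makes sensitivity comparisons between aligners repeatable. Structural variants would need a separate event model and are not covered here.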
CORE-seq: Sequence Compression
High-performance Python library for optimizing nucleotide storage. Reduces memory overhead during the pre-processing of large-scale genomic datasets for machine learning ingestion.
View Library
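One common way to cut nucleotide storage is 2-bit packing, which stores four bases per byte instead of one. The sketch below shows that general technique only; it is not CORE-seq's actual encoding or API, and the `pack` and `unpack` names are assumptions made for the example. It handles only A, C, G, and T, so ambiguous bases such as N would need a separate mask.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"

def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte.

    Returns (data, length); the length is needed to ignore padding bits
    in the final byte when unpacking.
    """
    buf = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        if base not in _CODE:
            raise ValueError(f"unsupported base {base!r} at position {i}")
        # Base i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        buf[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(buf), len(seq)

def unpack(data, length):
    """Recover the original string from packed bytes and its base count."""
    return "".join(_BASE[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))

packed, n_bases = pack("ACGTACGTAC")
restored = unpack(packed, n_bases)
```

Here ten bases fit in three bytes, and `unpack` returns the original string. A pure-Python loop like this shows the layout but is slow; a production library would typically vectorize the packing.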
Antimicrobial Assay Optimization
Refined extraction protocols for Biancaea sappan to improve the yield of bioactive compounds. Validated antioxidant activity and confirmed antibacterial efficacy against pathogens via zone-of-inhibition assays.
Nociception Study
Investigating the role of gut microbial metabolites in nociception pathways. Performed molecular docking simulations to estimate the binding affinity of secondary metabolites to host pain receptors.
Education & Certifications
MS: Bioinformatics and Computational Biology
University of Texas at Dallas (UTD) — May 2025
- Applied Bioinformatics
- Statistics in Bioinformatics
- Molecular Biology
- Algorithms & Data Structures
- Medical Image Analysis
Advanced Diploma: Bioinformatics
Bharati Vidyapeeth University (BVDU) — 2022
- Biological Informatics
- Biostatistics
- Data Mining & ML
- Molecular Modeling
- R & Data Analytics
BSc: Microbiology, Biochemistry, Chemistry
Sri Venkateswara University (SVU) — 2020
- Microbial Physiology
- Medical Microbiology
- Immunology
- Biomolecules
- Biotechnology
Certifications
AWS Educate: Cloud Computing 101
Completed foundational training on cloud computing infrastructure, services, deployment models, and best practices.
Hello Nextflow Certificate
Passed the Hello Nextflow test at the conclusion of the Nextflow training week (September 2025).
Contact Me
Open to opportunities in bioinformatics, computational biology, and data science. I also welcome collaborations on pipeline optimization and machine learning for genomics.