About me - Download my résumé

Mathematics PhD with data science and machine learning knowledge and experience - actively seeking opportunities to leverage my skills in finding data-driven solutions to real-world problems.

Skills and knowledge

I have a Ph.D. in mathematics and I am self-taught in data science and machine learning. Skills include:

  • Python programming, including libraries: Pandas, NumPy, Scikit-Learn, PyTorch, XGBoost, Statsmodels, Matplotlib, Seaborn, Plotly, Natural Language Toolkit, LangChain, Streamlit, Gradio, Transformers, Diffusers
  • Machine learning and statistical models: linear models, GLM and GAM, KNN, SVM, decision trees, clustering, ensemble models, MLP, CNN, RNN, GAN, encoders and decoders, transformers, diffusion, LLMs, ARIMA, (Neural) Prophet
  • Data science principles and techniques: dimensionality reduction (e.g. PCA, LDA, manifold learning); data cleaning and feature engineering; data exploration and visualization
  • Math expertise: linear algebra, calculus and differential equations, statistics and probability, discrete math and graph theory, abstract algebra, geometry and topology
  • Proficiency in SQL, LaTeX, Git, MS Office Suite

Recent projects:

BikeSaferPA

A project in which I built BikeSaferPA, a gradient-boosted decision tree classifier designed to predict the severity of bicycle crashes in PA based on crash input data.

  • BikeSaferPA is trained on a PennDOT dataset of over 26,000 bicycle crashes in PA from 2002-2021.
  • The project involved data procurement and cleaning, visualizations, a feature engineering pipeline, and a rigorous model selection process culminating in the BikeSaferPA model.
  • I investigated the importance of various features in explaining the model's predictions via SHAP value analysis, and used these results to make policy recommendations for improving safety outcomes for cyclists.
  • I used Streamlit to design and build an easy-to-use BikeSaferPA web app, a suite of tools that enables the user to visualize the data and experiment with the BikeSaferPA model.

Try out the BikeSaferPA web app

See the GitHub repository, or view the Jupyter notebooks in HTML format:

Brain tumor segmentation

A 3-D UNet model trained for segmentation of brain tumor regions in 3-dimensional MRI images.

  • I designed the model in PyTorch, trained the model on data from the BraTS 2020 challenge, and evaluated its performance using several segmentation metrics.
  • I implemented data augmentation for the training set to discourage overfitting and used test-time augmentation when predicting on the validation and testing sets to promote higher-quality predictions.

View a Jupyter notebook documenting the training and evaluation process

See the GitHub repository

Fine-tuned RoBERTa Q&A model and app

A RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers) language model, fine-tuned for the extractive question answering task using version 2 of SQuAD (Stanford Question Answering Dataset).

I designed a web app which demonstrates simple Q&A based on user-provided material, as well as a Wikipedia-aided question answering tool.

Research and teaching experience

I have over a decade of experience teaching a wide variety of advanced math classes to undergraduate and graduate students in STEM.

I have planned and executed individual and collaborative research projects in math, authored or co-authored seven peer-reviewed publications in national and international journals, and presented my work at seminars and national conferences. Please click here to visit my Google Scholar profile.