Hello, I'm
Morgantown, WV
Turning complex data into actionable insights with machine learning, statistical modeling, and a foundation in industrial & chemical engineering.
I'm a data scientist and machine learning engineer with a background in chemical and industrial engineering. I specialize in building predictive models, designing experiments, and extracting meaningful patterns from complex datasets.
My work spans energy analytics and optimization, NLP-driven compliance automation, public health modeling, and statistical process control. I apply rigorous ML methods to deliver measurable impact across industrial, government, and healthcare domains.
With hands-on experience leading client-facing assessments at industrial facilities and developing NLP pipelines for Fortune 500 cybersecurity programs, I bridge the gap between advanced analytics and real-world decision-making.
Classification, regression, and ensemble methods for industrial energy optimization and public health applications. End-to-end pipelines from data curation to deployment.
Sentence transformers, semantic similarity, and compliance automation. Built hybrid NLP pipelines combining embeddings with rule-based detection for cybersecurity.
Facility assessments, baseline consumption modeling, and retrofit analysis. Quantified $150K+ in annual savings across 9 industrial facilities per ASHRAE standards.
Multivariate SPC, PCA-based monitoring, and anomaly detection. Reduced dimensionality by 85% and flagged 12.3% of observations as out of control in manufacturing data.
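As a minimal illustration of PCA-based dimensionality reduction, the sketch below picks the smallest number of principal components whose cumulative explained variance reaches a target share. The function name `n_components_for_variance` and the 90% default are hypothetical choices for illustration. They are not the settings behind the 85% reduction above.

```python
import numpy as np

def n_components_for_variance(X: np.ndarray, threshold: float = 0.90) -> int:
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `threshold` (hypothetical default of 90%)."""
    # Standardize columns so variables on different scales contribute equally.
    # Assumes no column is constant (zero std would produce NaNs).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the centered data are proportional to PC variances.
    s = np.linalg.svd(Xs, full_matrices=False, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # searchsorted gives the first index where cumulative >= threshold.
    return int(np.searchsorted(cumulative, threshold) + 1)

# Synthetic example: 21 columns driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = np.repeat(latent, 7, axis=1) + 0.01 * rng.normal(size=(500, 21))
print(n_components_for_variance(X, 0.90))
```

In this synthetic case three components capture nearly all the variance, so 21 columns collapse to 3. The component count, not the raw variable count, then feeds downstream monitoring statistics.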
Applied the CRISP-DM framework to OPC-UA industrial sensor data from 27 production shifts. Engineered time-series features (Total Active Energy, Active Power L2) and implemented Time-Series K-Means with dynamic time warping (DTW) to identify anomalous operating regimes linked to energy inefficiency and quality defects.
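The sketch below computes the classic dynamic time warping distance and shows the nearest-centroid assignment step used inside DTW-based k-means. It is a simplified, hypothetical illustration. It uses absolute difference as the local cost, applies no warping window, and omits the barycenter (DBA) centroid update. A library such as tslearn (`TimeSeriesKMeans` with `metric="dtw"`) implements the full algorithm.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences.

    Local cost is |a_i - b_j|; no warping-window constraint is applied."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]

def assign_to_nearest(series, centroids):
    """Index of the DTW-nearest centroid (the assignment step of DTW k-means)."""
    return min(range(len(centroids)), key=lambda k: dtw_distance(series, centroids[k]))

# A ramp sampled at a different rate still matches a ramp with zero DTW cost.
print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))
print(assign_to_nearest([0, 1, 2, 3], [[3, 3, 3], [0, 1, 2, 3]]))
```

Unlike Euclidean distance, DTW tolerates local stretching in time. This lets shifts with similar load profiles but different timing land in the same cluster.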
Built end-to-end ML pipelines on data for 2,500+ U.S. counties with 300+ features to predict food insecurity, diabetes prevalence (regression), and obesity hotspots (classification) using Gradient Boosting and Logistic Regression. Designed robust preprocessing and validated models with nested cross-validation and bootstrap uncertainty analysis.
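Here is a minimal sketch of nested cross-validation with scikit-learn. It assumes a Gradient Boosting regressor and a small, hypothetical hyperparameter grid. The inner `GridSearchCV` tunes hyperparameters on each outer training fold, and the outer loop scores the tuned model on folds it never saw during tuning. Synthetic data from `make_regression` stands in for the county dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

def nested_cv_scores(X, y, param_grid, outer_splits=5, inner_splits=3, seed=0):
    """Outer-fold R^2 scores from nested CV (hypothetical helper)."""
    # Inner loop: hyperparameter search on each outer training split.
    inner = KFold(n_splits=inner_splits, shuffle=True, random_state=seed)
    # Outer loop: unbiased estimate of the whole tune-then-fit procedure.
    outer = KFold(n_splits=outer_splits, shuffle=True, random_state=seed + 1)
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_grid,
        cv=inner,
        scoring="r2",
    )
    return cross_val_score(search, X, y, cv=outer, scoring="r2")

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
grid = {"max_depth": [2, 3], "n_estimators": [50, 100]}
scores = nested_cv_scores(X, y, grid)
print(f"R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Tuning and evaluating on the same folds tends to overstate performance. The nested split keeps the reported score honest at the cost of more model fits.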
Developed a robust multivariate statistical process control (MSPC) framework on 552 manufacturing records with 209 variables. Used PCA and robust outlier detection to isolate a stable in-control baseline, then deployed a Phase II Hotelling's T² monitoring scheme for real-time anomaly detection.
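The Phase II monitoring step can be sketched as follows. The sketch assumes a clean Phase I baseline of m observations on p variables with m > p. In practice the statistic might be computed on PCA scores rather than on all 209 raw variables. The control limit uses the standard Phase II formula for individual observations, UCL = p(m+1)(m-1) / (m(m-p)) · F(1-α; p, m-p). The α = 0.0027 default and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def fit_phase1(baseline: np.ndarray):
    """Mean vector and inverse covariance from an in-control Phase I sample."""
    mean = baseline.mean(axis=0)
    cov = np.cov(baseline, rowvar=False)
    return mean, np.linalg.inv(cov)

def t2_statistics(X: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Hotelling's T^2 for each row: (x - mean)^T S^-1 (x - mean)."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def phase2_ucl(m: int, p: int, alpha: float = 0.0027) -> float:
    """Phase II upper control limit for individual observations."""
    if m <= p:
        raise ValueError("need more baseline observations than variables")
    f_crit = stats.f.ppf(1 - alpha, p, m - p)
    return p * (m + 1) * (m - 1) / (m * (m - p)) * f_crit

# Synthetic example: 4 variables, 200 baseline rows, one shifted new row.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(200, 4))
mean, cov_inv = fit_phase1(baseline)
ucl = phase2_ucl(m=200, p=4)

new = rng.normal(size=(50, 4))
new[-1] += 10.0  # simulate a process shift in the last observation
t2 = t2_statistics(new, mean, cov_inv)
flags = t2 > ucl
print(f"UCL = {ucl:.2f}, flagged rows: {np.flatnonzero(flags)}")
```

Because T² accounts for correlations between variables, it can flag observations that look normal on every univariate chart but break the joint pattern of the baseline.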
I'm always open to discussing data science projects, research collaborations, or opportunities.