Hello, I'm

Akshay Patel

Morgantown, WV

Turning complex data into actionable insights with machine learning, statistical modeling, and a foundation in industrial & chemical engineering.

9 Industrial Facilities Assessed
$150K+ Annual Savings Identified
$1M+ Grant Funding Supported
120 ton/yr CO₂ Reduction Estimated

About Me

I'm a data scientist and machine learning engineer with a background in chemical and industrial engineering. I specialize in building predictive models, designing experiments, and extracting meaningful patterns from complex datasets.

My work spans energy analytics and optimization, NLP-driven compliance automation, public health modeling, and statistical process control — applying rigorous ML methods to deliver measurable impact across industrial, government, and healthcare domains.

With hands-on experience leading client-facing assessments at industrial facilities and developing NLP pipelines for Fortune 500 cybersecurity programs, I bridge the gap between advanced analytics and real-world decision-making.

Education

Master of Science in Industrial Engineering

West Virginia University, Morgantown, WV

January 2024 — December 2025
Relevant Coursework: Machine Learning, Design of Experiments

Bachelor of Technology in Chemical Engineering

Maulana Azad National Institute of Technology (MANIT), Bhopal, India

August 2019 — May 2023

What I Do

Predictive Modeling & ML

Classification, regression, and ensemble methods for industrial energy optimization and public health applications. End-to-end pipelines from data curation to deployment.

NLP & Text Analytics

Sentence transformers, semantic similarity, and compliance automation. Built hybrid NLP pipelines combining embeddings with rule-based detection for cybersecurity.

Energy Analytics & Optimization

Facility assessments, baseline consumption modeling, and retrofit analysis. Quantified $150K+ in annual savings across 9 industrial facilities per ASHRAE standards.

Statistical Process Control

Multivariate SPC, PCA-based monitoring, and anomaly detection. Reduced dimensionality by 85% and flagged 12.3% of observations as out of control in manufacturing data.

Skills

Languages

Python, SQL, JavaScript

ML & Statistical Methods

Classification, Regression, Neural Networks, Time Series, SVM, Random Forest, Decision Trees, Naive Bayes, KNN, PCA, Hypothesis Testing, Predictive Analysis

NLP

Sentence Transformers, Cosine Similarity, Text Classification, MiniLM-L6-v2

Libraries & Frameworks

SciPy, NumPy, Pandas, Matplotlib, Scikit-Learn

Visualization & BI

Power BI, Excel, Matplotlib, Seaborn

Domain Knowledge

ASHRAE Standards, NIST 800-53, STIG, CRISP-DM, DOE

Experience

Jan 2024 — Dec 2025

Graduate Research Assistant (Data Scientist — Energy Analyst)

WVU IMSE Pollution Prevention Group, Morgantown, WV

  • Led energy assessments across 9 industrial facilities, conducting on-site data collection and stakeholder interviews to inform efficiency recommendations.
  • Built baseline consumption models from facility-level energy data (load profiles, equipment inventories, operating schedules) and evaluated retrofit scenarios per ASHRAE standards, identifying $150K+ in annual cost-reduction opportunities.
  • Performed sensitivity analysis, ROI, and payback calculations, estimating 120 ton/yr CO₂ reduction and presenting investment recommendations to executive decision-makers.
  • Developed Power BI/Excel dashboards and technical reports that contributed to $1M+ in successful USDA REAP grant applications.
  • Delivered technical webinars on energy-saving strategies to 50+ non-technical stakeholders.
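
The payback, ROI, and CO₂ arithmetic mentioned in the bullets above can be sketched as follows. This is a minimal illustration only: the function name, the capital cost, the energy price, and the emission factor are hypothetical placeholders, not figures from any assessed facility.

```python
def retrofit_metrics(capital_cost, annual_kwh_saved, price_per_kwh, kg_co2_per_kwh):
    """Return (simple payback in years, first-year ROI in %, CO2 avoided in metric tons/yr)."""
    annual_savings = annual_kwh_saved * price_per_kwh
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    payback_years = capital_cost / annual_savings
    roi_pct = 100.0 * annual_savings / capital_cost
    co2_tons = annual_kwh_saved * kg_co2_per_kwh / 1000.0  # kg -> metric tons
    return payback_years, roi_pct, co2_tons


# Hypothetical retrofit: every number here is an assumption for illustration.
payback, roi, co2 = retrofit_metrics(
    capital_cost=40_000, annual_kwh_saved=100_000,
    price_per_kwh=0.10, kg_co2_per_kwh=0.4,
)
print(f"payback={payback:.1f} yr, ROI={roi:.0f}%, CO2 avoided={co2:.0f} t/yr")

# One-at-a-time sensitivity: how payback moves with the energy price.
for price in (0.08, 0.10, 0.12):
    years, _, _ = retrofit_metrics(40_000, 100_000, price, 0.4)
    print(f"price={price:.2f} $/kWh -> payback={years:.2f} yr")
```

Simple payback ignores discounting and maintenance costs, so a fuller analysis would likely add net present value alongside it.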

Aug 2025 — Dec 2025

Cybersecurity Analyst (NLP Engineering)

Data Driven WV, Morgantown, WV

  • Built an NLP-based compliance automation PoC for a Fortune 500 government services organization, taking it from requirements gathering through system design validation with the client cybersecurity team.
  • Engineered a hybrid NLP pipeline (MiniLM-L6-v2 sentence transformers + rule-based detection) aligned with NIST 800-53 and STIG standards for automated log classification.
  • Presented results to 20+ executive stakeholders, securing approval to transition the solution to in-house production.
  • Tuned precision-recall thresholds (0.15–0.70) to balance false-positive and false-negative risk under regulatory compliance constraints.
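
The hybrid idea described above (embedding similarity combined with rule-based detection and a tunable threshold) can be sketched roughly as follows. This is not the production pipeline: the toy 3-dimensional vectors stand in for MiniLM-L6-v2 sentence embeddings, and the regex rule, function names, and the 0.5 threshold are all hypothetical.

```python
import math
import re


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical rule: an explicit pattern that should always be flagged.
FAILED_LOGIN = re.compile(r"failed (login|password)", re.IGNORECASE)


def classify_log(text, embedding, control_embedding, threshold=0.5):
    """Flag a log line if the rule fires or its embedding is close to the control text."""
    if FAILED_LOGIN.search(text):
        return "flagged (rule)"
    if cosine_similarity(embedding, control_embedding) >= threshold:
        return "flagged (semantic)"
    return "not flagged"


# Toy vectors standing in for real sentence embeddings (assumption).
control = [1.0, 0.0, 0.0]
print(classify_log("Failed password for root", [0.0, 1.0, 0.0], control))
print(classify_log("account locked after repeated attempts", [0.9, 0.1, 0.0], control))
print(classify_log("disk usage at 40%", [0.0, 0.0, 1.0], control))
```

Raising the threshold trades recall for precision on the semantic branch, while the rule branch is unaffected by it.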

Projects

October 2024

Time Series Clustering for Industrial Energy Optimization

Applied the CRISP-DM framework to analyze OPC-UA industrial sensor data across 27 production shifts. Engineered time-series features (Total Active Energy, Active Power L2) and implemented DTW-based Time-Series KMeans to identify anomalous operating regimes linked to energy inefficiency and quality defects.

Silhouette: 0.65–0.66, Calinski-Harabasz: up to 82, 27 production shifts
Time Series, DTW, KMeans, CRISP-DM, Python
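
For readers unfamiliar with DTW, the distance that drives this kind of clustering can be sketched in a few lines. This shows only the distance computation, not the k-means step, and it assumes an absolute-difference local cost and no warping-window constraint. The series below are made-up toy values, not shift data.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier point of b
                cost[i][j - 1],      # b[j-1] also matches an earlier point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Toy profiles (assumption): the second is the first with a delayed start.
profile = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]
flat = [2, 2, 2, 2, 2]
print(dtw_distance(profile, delayed))  # same shape shifted in time
print(dtw_distance(profile, flat))     # different shape
```

Because DTW tolerates time shifts, two shifts with the same load shape but different start times end up close together, which is the property that makes it useful for grouping operating regimes.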

November 2025

County-Level Food & Health Outcomes Modeling

Built end-to-end ML pipelines on 2,500+ U.S. counties and 300+ features to predict food insecurity, diabetes prevalence (regression), and obesity hotspots (classification) using Gradient Boosting and Logistic Regression. Designed robust preprocessing and validated with nested cross-validation and bootstrap uncertainty analysis.

2,500+ counties, 300+ features, Nested CV + Bootstrap
Gradient Boosting, Logistic Regression, Public Health, Python
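
The bootstrap part of the validation can be illustrated with a percentile confidence interval. This is a minimal sketch that covers only the bootstrap step, not nested cross-validation. The error values, the function name, and the resample count are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the estimates.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical held-out absolute errors for a handful of counties.
errors = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1, 1.1, 0.9, 2.2, 1.6]
ci_low, ci_high = bootstrap_ci(errors)
print(f"MAE={statistics.mean(errors):.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Reporting the interval alongside the point estimate shows how much of a model's apparent advantage could be sampling noise.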

November 2025

Multivariate Quality Control & Anomaly Detection

Developed a robust multivariate statistical process control (MSPC) framework on 552 manufacturing records with 209 variables. Used PCA and robust outlier detection to isolate a stable in-control baseline, then deployed a Phase II Hotelling's T² monitoring scheme for real-time anomaly detection.

85% dimensionality reduction, 209 → 46 PCs, 68 anomalies (12.3%)
PCA, Hotelling's T², SPC, Anomaly Detection, Python
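
The Phase I / Phase II logic behind a Hotelling's T² chart can be sketched on two variables. This simplified illustration skips the PCA step used in the project and works on raw variables. It also assumes a chi-square approximation for the control limit, which has a closed form for two variables. All data values and function names are hypothetical.

```python
import math


def fit_baseline(rows):
    """Phase I: estimate the mean vector and sample covariance from in-control data."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def t_squared(obs, mean, cov):
    """Hotelling's T^2 = d' S^-1 d, using the explicit inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = obs[0] - mean[0], obs[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical in-control baseline (Phase I).
baseline = [(10.0, 5.0), (10.2, 5.1), (9.8, 4.9), (10.1, 4.9), (9.9, 5.1), (10.0, 5.0)]
mean, cov = fit_baseline(baseline)

# Chi-square quantile with 2 degrees of freedom: -2 * ln(alpha). Approximation (assumption).
limit = -2.0 * math.log(0.01)

# Phase II: score new observations against the frozen baseline.
for obs in [(10.05, 5.0), (10.6, 4.6)]:
    t2 = t_squared(obs, mean, cov)
    status = "out of control" if t2 > limit else "in control"
    print(f"{obs}: T2={t2:.2f} ({status})")
```

With a small baseline, an F-distribution-based limit would be more appropriate than the chi-square approximation used here.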

Get in Touch

I'm always open to discussing data science projects, research collaborations, or opportunities.