Machine Learning Engineer

About Me

Building ML systems that make LLMs production-ready

I build ML systems that make large language models faster, smaller, and production-ready. With 5+ years of experience applying ML, NLP, and predictive analytics to healthcare and insurance data, I specialize in clinical text analysis, model compression, and production ML pipelines.

Impact Metrics

Model Compression: 110M → 65M

Inference Speed: 3.5x

F1 Retained: 93.2%

Docs Processed: 4,350/s

Education

M.S. Data Science — CU Boulder

B.S. ECE — JNTU Kakinada

Location

Colorado, United States

🟢 Available for opportunities

Awards

2023 DataScience Hackathon

Extra Mile Award (Team & Individual)

Currently Learning

RLHF (Reinforcement Learning from Human Feedback) & Reward Modeling

System Studio

Choose a real problem, tune the priority, and see the architecture tradeoffs behind the work

Latency-sensitive healthcare ML

Clinical NLP Compression

Make a clinical entity model smaller and faster without giving up useful accuracy.

Clinical Text

Stage 1/5

Validate schema, source quality, and entity coverage before training.

Optimization Focus

Latency: 39ms → 11ms

Model Size: 411MB → 62.6MB

F1 Retained: baseline → 93.2%

Decision

Serve the compact model first

Prioritize the distilled model and keep the larger model as a quality reference.

Result

39ms to 11ms inference with a much smaller artifact.

92% confidence

Production judgment: smaller models, measured tradeoffs, and deployment-aware evaluation.
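
The headline ratios follow directly from the raw figures quoted on this page. A minimal check in Python, using only the reported latency, artifact size, and parameter counts (the variable names are illustrative):

```python
# Raw figures as quoted on this page; variable names are illustrative.
latency_ms_before, latency_ms_after = 39.0, 11.0
size_mb_before, size_mb_after = 411.0, 62.6
params_m_before, params_m_after = 110.0, 65.0

speedup = latency_ms_before / latency_ms_after    # inference speedup
size_ratio = size_mb_before / size_mb_after       # artifact size reduction
param_ratio = params_m_before / params_m_after    # parameter count reduction

print(f"speedup: {speedup:.1f}x")
print(f"size ratio: {size_ratio:.1f}x")
print(f"parameter ratio: {param_ratio:.1f}x")
```

The artifact shrinks far more than the parameter count does, so part of the size reduction presumably comes from something other than distillation alone (for example, a lower-precision export); the page does not say which.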

Skills & Expertise

An interactive constellation of technologies — drag, hover, and filter

Featured Projects

Wisdom Vault — The Inheritable Agent

An AI-powered inheritance system that lets a parent's decision patterns be inherited through cryptographically scoped tokens via Auth0 Token Vault. Uses Claude for wisdom extraction and multi-generational delegation with 2-of-3 trustee multi-sig.

Python · Flask · Auth0 · Claude API · JWT · +1
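
The 2-of-3 trustee rule can be pictured as a simple threshold check over verified approvals. The sketch below is only an illustration of that idea: it does not use the Auth0 Token Vault API, and the trustee ids, shared secrets, and HMAC-based signatures are all assumptions made up for the example (a real system would more likely use asymmetric keys).

```python
import hashlib
import hmac

# Hypothetical trustee secrets, for illustration only.
TRUSTEE_KEYS = {"trustee_a": b"key-a", "trustee_b": b"key-b", "trustee_c": b"key-c"}
THRESHOLD = 2  # 2-of-3 approvals required


def sign(trustee, request_id):
    """Return the trustee's HMAC-SHA256 signature over a request id."""
    return hmac.new(TRUSTEE_KEYS[trustee], request_id.encode(), hashlib.sha256).hexdigest()


def is_authorized(request_id, approvals):
    """approvals maps trustee id -> signature; count distinct valid signers."""
    valid = {
        trustee
        for trustee, signature in approvals.items()
        if trustee in TRUSTEE_KEYS
        and hmac.compare_digest(signature, sign(trustee, request_id))
    }
    return len(valid) >= THRESHOLD


request = "delegate-001"
two_approvals = {t: sign(t, request) for t in ("trustee_a", "trustee_c")}
one_approval = {"trustee_a": sign("trustee_a", request)}
print(is_authorized(request, two_approvals))
print(is_authorized(request, one_approval))
```

Counting distinct trustee ids rather than raw signatures keeps one trustee from satisfying the threshold by approving twice.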

Agentic AI Parenting Assistant

An AI-driven modular Parenting Agent built with Google's Agent Development Kit (ADK) and FatSecret integration. Features specialized sub-agents for parenting advice, nutrition meal planning, and basic medical guidance with stateful sessions.

Python · Google ADK · FatSecret API · LangChain · Agentic AI

Clinical NLP — Healthcare NER Pipeline

End-to-end ML engineering project for clinical Named Entity Recognition. Distilled Bio_ClinicalBERT (110M) into DistilClinicalBERT (65M) retaining 93.2% F1, achieving 3.5x faster inference (39ms→11ms) and 6.6x model compression (411MB→62.6MB). Built distributed processing on AWS EMR handling 900K records in 3.5 minutes at 4,350 docs/sec, with weak labeling over 7K PubMed abstracts generating 19.5K entities.

PyTorch · HuggingFace · ONNX Runtime · PySpark · AWS EMR · FastAPI · Prometheus · Terraform

Key Metrics

93.2% F1 retention · 3.5x faster · 6.6x smaller
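
The page reports the outcome of distilling Bio_ClinicalBERT into a smaller student, but not the training objective. As a rough illustration only, the sketch below shows the common soft-target formulation of knowledge distillation for a single token, in plain Python. The temperature, the weighting `alpha`, and the example logits are assumptions, not values from this project.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy.

    temperature and alpha are illustrative defaults, not project values.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = kl * temperature ** 2
    # Cross-entropy against the gold label at temperature 1
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


teacher = [4.0, 1.0, 0.5]  # hypothetical logits over three entity tags
student = [3.0, 1.5, 0.2]
print(distillation_loss(student, teacher, label=0))
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted cross-entropy remains.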

Cold Email Generator

A cold email generator for service companies built with Groq, LangChain, and Streamlit. Extracts job listings from career pages and crafts personalized emails with relevant portfolio links from a vector database.

Groq · LangChain · Streamlit · ChromaDB · Python

SemEval-2024 Task 8a — AI Text Detection

Research project for SemEval-2024 on detecting machine-generated text. Experimented with DistilBERT, DeBERTa, RoBERTa, and ALBERT on 119K+ samples. DeBERTa tokenizer + RoBERTa model achieved best results.

PyTorch · HuggingFace · DeBERTa · RoBERTa · NLP

Space Data Analytics Dashboard

Comprehensive analytics project covering 4,630+ space missions from 1957–2022. Features interactive Power BI dashboards analyzing launch trends, mission success rates, rocket usage, and global cooperation patterns.

Power BI · DAX · Python · Data Modeling

PHI/PII Parser — FHIR Data Redaction

A service that reads HL7 FHIR Bundle JSON files from AWS S3, detects and redacts PII/PHI fields using key-name matching and regex patterns, and outputs cleaned CSVs. Supports both FastAPI local deployment and serverless AWS Lambda with automatic S3 triggers.

Python · FastAPI · AWS Lambda · S3 · Docker · Pydantic

Key Metrics

Redacts 10+ FHIR resource types across PII/PHI fields
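
A minimal sketch of how key-name matching and regex patterns can combine to redact a parsed FHIR Bundle. The sensitive key names, the patterns, and the sample bundle are hypothetical and chosen for illustration; the actual service's rules, S3 handling, and CSV output are not shown on this page.

```python
import re

# Hypothetical key names and patterns, for illustration only.
SENSITIVE_KEYS = {"name", "birthDate", "address", "telecom", "identifier"}
PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email addresses
    re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),   # US phone-like numbers
]
MASK = "[REDACTED]"


def redact(node):
    """Recursively redact a parsed FHIR-style JSON value."""
    if isinstance(node, dict):
        # Key-name matching: mask the whole value under a sensitive key.
        return {
            key: MASK if key in SENSITIVE_KEYS else redact(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact(item) for item in node]
    if isinstance(node, str):
        # Regex matching: mask sensitive substrings inside free text.
        for pattern in PATTERNS:
            node = pattern.sub(MASK, node)
        return node
    return node  # numbers, booleans, and None pass through unchanged


bundle = {
    "resourceType": "Bundle",
    "entry": [{"resource": {
        "resourceType": "Patient",
        "name": [{"family": "Doe", "given": ["Jane"]}],
        "birthDate": "1980-01-01",
        "note": "Call 303-555-0100 or mail jane@example.com",
    }}],
}
print(redact(bundle))
```

Key-name matching catches structured fields whose values have no recognizable shape, while the regex pass covers identifiers embedded in free-text fields.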

Data-Center Scale Computing

Projects focused on distributed computing at data-center scale, implementing scalable data processing pipelines and cloud-native architectures for large-scale ML workloads.

Python · PySpark · AWS · Distributed Systems

Experience

My career journey in ML and AI

Aug 2024 – Present · Work

Machine Learning Engineer

Blue Cross Blue Shield of Colorado

Used ClinicalBERT, Python, and Scikit-learn to analyze clinical notes, claims narratives, and prior authorization text, helping medical review teams find relevant clinical patterns faster. Built Python and SQL-based models to prioritize high-risk claims, predict likely denials, and surface high-cost cases. Built HIPAA-compliant data pipelines with Azure Data Factory, Azure Synapse, and Python to process claims from 6+ payer systems, reducing manual data preparation by 30%. Developed Power BI dashboards to track denial rates, turnaround time, and provider performance.

May 2023 – Aug 2023 · Work

Data Science Intern

Parlay (Techstars '23)

Engineered PySpark data lake architecture on AWS S3 with Apache Hudi, achieving 40% reduction in batch processing time for 500K+ daily records. Automated data ingestion and transformation pipelines using Python multiprocessing and SQLAlchemy. Designed REST APIs to standardize internal data access, reducing integration time for new data sources by 60%.

Jul 2019 – Aug 2022 · Work

Machine Learning Engineer

Accenture – Sun Life Insurance Client

Built predictive models with Random Forest, XGBoost, and Scikit-learn to analyze insurance claims, policyholder behavior, and risk patterns. Developed anomaly detection logic to flag suspicious billing activity and high-risk policy behavior. Prepared ML-ready datasets using Azure Data Factory, Azure Synapse, PySpark, and Python, improving ingestion throughput by 50% and reducing reporting costs by 31%. Deployed Azure Databricks pipelines with data quality checks, logging, and SLA monitoring.

2023 · Award

2023 DataScience Hackathon

Hackathon Award

Recognized for building innovative data science solutions. Also received Extra Mile Awards in both team and individual categories at Accenture.

Publications & Writing

Articles, papers, and talks

SemEval-2024 Task 8a: Multigenerator Multidomain Black-Box Machine-Generated Text Detection

SemEval-2024 / University of Colorado Boulder · Jun 2024

Wisdom Vault: Cryptographic Inheritance of AI-Extracted Decision Patterns

Auth0 Authorized to Act Hackathon · Apr 2026

Certifications

Professional credentials and continuous learning

AWS Certified Machine Learning Engineer - Associate

Amazon Web Services · Mar 2026 · Expires Mar 2029

AWS Certified Solutions Architect

Amazon Web Services

Applications of AI for Anomaly Detection

NVIDIA · Nov 2024

The Structured Query Language (SQL)

University of Colorado Boulder · May 2023

Data Analysis with R Programming

Google · May 2024

Share Data Through the Art of Visualization

Google · May 2024

Analyze Data to Answer Questions

Google · May 2024

Process Data from Dirty to Clean

Google · May 2024

Ask Questions to Make Data-Driven Decisions

Google

Prepare Data for Exploration

Google

Introduction to Programming and Tidyverse

University of Colorado Boulder · Aug 2021

R Programming and Tidyverse Capstone Project

University of Colorado Boulder · Sep 2022

Data Analysis with Tidyverse

University of Colorado Boulder · Aug 2022

The Ultimate MySQL Bootcamp

Udemy

GitHub Activity

Live stats and contributions

12 Stars

3 Forks

0 Repos

Currently Building

AlignLLM — LLM Alignment Pipeline on AWS

Building a 7B chat model alignment pipeline using SFT, DPO, RLHF, and LoRA/QLoRA on AWS SageMaker with Terraform infrastructure.

Progress: 45%
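
Of the alignment methods listed for this pipeline, DPO has the most compact objective. As an illustration only, here is the standard per-pair DPO loss in plain Python; the log-probability values and `beta` are made-up numbers, and nothing here reflects the project's actual training code.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Hypothetical sequence log-probabilities for one preference pair.
neutral = dpo_loss(-11.0, -11.0, -11.0, -11.0)
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(neutral, improved)
```

The loss falls below its neutral value once the policy raises the chosen response's log-probability relative to the frozen reference and lowers the rejected one.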

Top Languages

Python: 92%

SQL: 85%

HTML: 70%

Java: 65%

Jupyter Notebook: 55%

Get in Touch

Let's connect and discuss opportunities

Connect

Email

GitHub

Download Resume

Send a Message

What People Say

“Santosh is great at taking a messy ML problem and turning it into something that actually works in production. He helped us rethink our model compression approach and was always willing to dig into the details when things did not go as expected. Solid engineer, easy to work with.”

David Chen

ML Engineering Lead, Blue Cross Blue Shield Association

Santosh Adabala
