Sahil Batra

Autodidact with a passion for building machine learning systems and teaching

PROJECTS

Built an open-source wake word detector using the Mozilla Common Voice dataset & Google Speech-to-Text (85%+ accuracy).
Built an end-to-end Kubeflow pipeline for ML model deployment & inference.
Built a GPT-2-based transformer decoder model for next-word prediction.
Built an LSTM-based sentiment analysis model deployed on AWS SageMaker.
Built a Transformer encoder model whose embeddings are used for a downstream topic classification problem.

SKILLS

Cloud: GCP, AWS
Database Programming: Relational Databases (IBM DB2, SQL Server, Teradata), BigQuery, Redshift
ML Tools: SageMaker, Vertex AI, Transformer, MCP, LangGraph, TFX, Kubeflow
Programming Languages: Python, Bash, SQL
ETL Tools: Informatica 9.1, Dataproc (Spark), Dataflow (Beam)
Software/Tools: Airflow, Git (CI/CD), JupyterLab, VS Code, Docker, Kubernetes, GCP, Power BI, Comet
Packages: scikit-learn, numpy, pandas, tensorflow, pytorch, xgboost ray
Certifications: Data Engineering Nanodegree (Udacity), Deep Learning & Natural Language Processing Specialization (Coursera), Splunk Fundamentals & Advanced Searching (Splunk), GIAC Security Essentials (GSEC)

EXPERIENCE

2022 - Present
Sr. ML Engineer Remote, Canada
  • Built an end-to-end ML model for fraud onboarding use cases (point-of-payment, card cashing, and no-intent-to-ship fraud), covering feature engineering, model training, real-time inference, and adoption of fraud scores across products. Precision 90%; savings of $5M+.
  • Led MLOps productionization for fraud detection models, improving model evaluation, tracking, and real-time deployment. Saved 100+ manual hours and enabled model iteration within hours.
  • Helped build transformer decoder networks on sequence data to identify fraud at different points of the merchant journey, leading to the identification of bot-related incidents.
2018 - 2022
Sr. Data Scientist Remote, Canada
  • Managed the AI platform & strategy, stakeholder engagement, and technical work across 4 data scientists & data engineers.
  • Built AI systems & data pipelines:
  • Created an infection risk KPI and provided time-to-return estimates based on an LSTM model. The model was shared with the Security Executive Council and 500 companies across the globe.
  • Built a classical fraud risk scoring model on imbalanced data to detect fraudulent emails, with an end-to-end system to support fraud ops. Metrics on live data: precision 90%, $5M+ saved.
  • Identified characteristics of employees susceptible to phishing based on phishing simulation data (precision 83%, recall 89%).
  • Applied an autoencoder to reconstruct daily network data and identify anomalous traffic using producer-consumer ratio, sessions, and traffic volume across the network, reducing the false positive rate by 60%.
2020 - 2022
Mentor Canada
  • Taught and provided industry perspective for MIT & UT Austin Artificial Intelligence & Machine Learning courses, with an overall rating of 4.8/5.
2016 - 2018
Data Scientist, BI & Analytics Lead USA / Canada
  • Built metrics & KPIs from the ground up for the Servers Business Unit on weekly operational performance (100+ attendees).
  • Applied ML models in sales domain:
  • Customer market segmentation using K-Prototypes based on company attributes and recency, frequency & monetary (RFM) attributes for sales targeting, resulting in a 28% increase in qualified leads.
  • Commodity data clustering to identify trends in the most vs. least common configurations, reducing sample inventory holding costs by 20%.
2015 - 2016
Data Scientist, Analytics USA
  • Created ML models:
  • Clustering using KNN & Quantile Regression to identify customer software spend across different market segments, revealing 2 high-potential market segments for targeting.
  • Lead generation cross-sell model to help sales identify better candidates for cold calling, improving the cross-sell conversion rate from 9% to 37%.
2013 - 2014
Software Developer, Business Analytics India
  • Used Informatica to build data pipelines and Hadoop for data storage to ingest employee workday data.
2010 - 2013
Software Engineer, BI India / United Kingdom
  • Built ETL data pipelines using SAS & Informatica to pull data feeds and create a data warehouse & data marts in SQL & IBM DB2 using fact/dimension tables.

EDUCATION

2021
Machine Learning Engineer FourthBrain.AI
2015
MS in Business Analytics The University of Texas at Austin
2010
Bachelor of Technology, Electronics & Communication GGS Indraprastha University

WORK AUTHORIZATION

Canadian and Indian citizen; US H-1B visa and Australia/NZ Working Holiday Visa.