Sahil Batra

Autodidact with a passion for building machine learning systems & teaching

PROJECTS

Built an open-source wake word detector using the Mozilla Common Voice dataset & Google Speech-to-Text (85%+ accuracy).
Built an end-to-end Kubeflow pipeline for ML model deployment & inference.
Built a GPT-2-based transformer decoder model for next-word prediction.
Built an LSTM-based sentiment analysis model deployed on AWS SageMaker.

SKILLS

Cloud: GCP, AWS
Database Programming: Relational Databases (IBM DB2, SQL Server, Teradata), BigQuery, Redshift
ML Tools: SageMaker, Vertex AI, Transformer, MCP, LangChain, TFX, Kubeflow
Programming Language: Python, Bash, SQL
ETL Tools: Informatica 9.1, Dataproc (Spark), Dataflow (Beam)
Software/Tools: Airflow, Git (CI/CD), JupyterLab, VS Code, Docker, Kubernetes, GCP, Power BI, Comet
Packages: scikit-learn, numpy, pandas, tensorflow, pytorch, xgboost ray
Certifications: Data Engineering Nanodegree (Udacity), Deep Learning Specialization (Coursera), Natural Language Processing Specialization (Coursera), Splunk Fundamentals & Advanced Searching (Splunk), GIAC Security Essentials (GSEC)

EXPERIENCE

2022 - Present
Sr. ML Engineer Remote, Canada
  • Built an end-to-end ML model for fraud onboarding use cases (including point-of-payment, card cashing, and no-intent-to-ship fraud), covering feature engineering, model training, real-time inference, and adoption of fraud scores across products. Precision 90%; savings of $5M+.
  • Led MLOps productionization for fraud detection models, improving model evaluation, tracking, and real-time deployment. Saved 100+ manual hours and enabled model iteration within hours.
  • Helped build transformer-decoder networks on sequence data to identify fraud at different points of the merchant journey, leading to the identification of bot-related incidents.
2018 - 2022
Sr. Data Scientist Remote, Canada
  • Managed the AI platform & strategy, stakeholder engagement, and technical work across a team of 4 data scientists & data engineers.
  • Built AI systems & data pipelines:
  • Created an infection-risk KPI and provided a time-to-return estimate based on an LSTM model. The model was shared with the Security Executive Council and 500 companies across the globe.
  • Built a classical fraud risk scoring model on imbalanced data to detect fraudulent emails, with an end-to-end system to support fraud ops. Metrics on live data: precision 90%, $5M+ saved.
  • Identified characteristics of employees susceptible to phishing based on phishing simulation data (precision 83%, recall 89%).
  • Applied an autoencoder to reconstruct daily network data and identify anomalous traffic using producer-consumer ratio, sessions, and traffic volume across the network, reducing the false positive rate by 60%.
2020 - 2022
Mentor Canada
  • Taught and provided industry perspective for MIT & UT Austin Artificial Intelligence & Machine Learning courses, with an overall rating of 4.8/5.
2016 - 2018
Data Scientist, BI & Analytics Lead USA / Canada
  • Built metrics & KPIs from the ground up for the Servers Business Unit, reporting on weekly operational performance (100+ attendees).
  • Applied ML models in the sales domain:
  • Customer market segmentation using K-Prototypes, based on company attributes and recency, frequency & monetary (RFM) attributes, for sales targeting, resulting in a 28% increase in qualified leads.
  • Commodity data clustering to identify trends in the most vs. least common configurations, reducing sample inventory holding costs by 20%.
2015 - 2016
Data Scientist, Analytics USA
  • Created ML models:
  • Clustering using KNN & Quantile Regression to identify customer software spend across different market segments, revealing 2 high-potential market segments for targeting.
  • Lead generation cross-sell model to help sales identify better candidates for cold calling, improving the cross-sell conversion rate from 9% to 37%.
2013 - 2014
Software Developer, Business Analytics India
  • Used Informatica to build data pipelines and Hadoop for data storage to ingest employee workday data.
2010 - 2013
Software Engineer, BI Noida, India / United Kingdom
  • Built ETL data pipelines using SAS & Informatica to pull data feeds and create a data warehouse & data marts in SQL & IBM DB2 using fact/dimension tables.

EDUCATION

2021
Machine Learning Engineer FourthBrain.AI
2015
MS in Business Analytics The University of Texas at Austin
2010
Bachelor of Technology, Electronics & Communication GGS Indraprastha University

WORK AUTHORIZATION

Canadian and Indian citizen; US H-1B visa and Australia/NZ Working Holiday visa.