Hi, I'm Suvrakamal
Machine Learning Engineer. I love building things around ML and am active on Twitter. Read my latest research.

About

I'm a certified TensorFlow ML Engineer with publications at the SciPy Conference. Read my latest research here.

Work Experience

XRI Global USA

July 2024 - Present
Machine Learning Engineer
LLM Trans-tokenization: Developed a cross-lingual vocabulary transfer strategy for adapting Mistral-7B models to new languages, achieving 87% accuracy in Hindi-English translation. Implemented token alignment using SMT tools and fine-tuned LLMs on low-resource languages to optimize token mapping and performance across Latin and non-Latin scripts. Achieved better BLEU, perplexity, and chrF scores than the base models.

Dashboard and Data Integration: Developed a full-stack interactive language technology dashboard to visualize African language data. Integrated interactive charts (BLEU, chrF++), geospatial mapping (Leaflet.js), and RESTful APIs for data aggregation, using PySpark for processing. Improved data accessibility by 40% and ensured database reliability with automated backups.

ProjectX.Cloud

January 2023 - July 2024
Machine Learning Engineer
RAG System Development: Built a production-ready retrieval-augmented generation (RAG) chatbot using Python, FastAPI, LangChain, and Chroma DB over a corpus of 1,000+ documents. Implemented RESTful API endpoints to handle user queries, with Redis-based caching and Celery for background task processing.

Impact: Improved chat response time by 1.5x through RAG optimization and enabled the pull request review agent to detect 80% of code quality issues. The prototype platform handled 500 daily user queries before it was discontinued due to strategic challenges.

Performance Optimization: Implemented caching strategies and asynchronous processing to improve API response times by 45%, ensuring scalability for multiple concurrent language processing requests.

VisionWay

January 2022 - December 2022
Machine Learning Engineer
AI Interview Software: Built AI-powered interview software with a client-side cheating detection dashboard. Trained ML models in Python that use facial landmark analysis for eye and head tracking, on a dataset of 500 images. Integrated the OpenAI API for question generation and answer scoring, with results served as JSON to a FastAPI-backed dashboard, using Redis for caching and Celery for background tasks.

Backend Architecture: Designed and implemented a scalable backend using Python, FastAPI, Redis, and Celery that processed real-time video streams and AI analysis results. Created RESTful API endpoints that handled 100+ daily requests while maintaining sub-second response times.

Skills

TensorFlow
PyTorch
NLP
Computer Vision
Tableau
Power BI
SQL
Python
C++
TypeScript
Node.js
Go
Java
AWS (SageMaker, Lambda)
GCP (Vertex AI, Dialogflow CX)
Postgres
Docker
Kubernetes
Scrum
Jenkins
CI/CD
React
Next.js
My Projects

Check out my latest work

I've worked on a variety of projects, from simple websites to complex web applications. Here are a few of my favorites.

Transtokenization

Adapted the Mistral-7B LLM from English to Hindi through tokenizer reconfiguration and cross-lingual token mapping. Implemented subword unit alignment for better vocabulary coverage and fine-tuned on Hindi data, achieving significant improvements in perplexity, BLEU, and chrF scores without full retraining.
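
The token-mapping idea can be illustrated with a small sketch. This is not the project's code: it assumes the alignment step has already produced, for each new Hindi token, a set of weighted English source tokens, and it initializes each new embedding as the weighted mean of the aligned source embeddings. The function name `init_target_embeddings`, the toy vocabulary, and the fallback to the mean source embedding are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def init_target_embeddings(
    source_emb: Dict[str, List[float]],
    alignment: Dict[str, Dict[str, float]],
) -> Dict[str, List[float]]:
    """Initialize each target token as a weighted mean of aligned source embeddings.

    source_emb: source token -> embedding vector (all vectors share one dimension).
    alignment:  target token -> {source token: alignment weight}.
    """
    dim = len(next(iter(source_emb.values())))
    # Fallback vector for target tokens with no usable alignment (an assumption).
    mean_vec = [
        sum(vec[i] for vec in source_emb.values()) / len(source_emb)
        for i in range(dim)
    ]

    target_emb: Dict[str, List[float]] = {}
    for tgt_token, weights in alignment.items():
        # Ignore aligned tokens missing from the source vocabulary.
        known = {s: w for s, w in weights.items() if s in source_emb and w > 0}
        total = sum(known.values())
        if total == 0:
            target_emb[tgt_token] = list(mean_vec)
        else:
            target_emb[tgt_token] = [
                sum(w * source_emb[s][i] for s, w in known.items()) / total
                for i in range(dim)
            ]
    return target_emb


# Toy example with 2-dimensional embeddings and romanized Hindi tokens.
source = {"water": [1.0, 0.0], "aqua": [0.0, 1.0]}
aligned = {"paani": {"water": 3, "aqua": 1}, "unaligned": {}}
print(init_target_embeddings(source, aligned)["paani"])  # [0.75, 0.25]
```

Starting from embeddings like these, rather than random vectors, is what lets fine-tuning on the new language begin from a reasonable point instead of retraining the model from scratch.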

AI Agent based Pull Request Reviewer

An autonomous code review agent that uses AI to analyze GitHub pull requests. It runs on Celery workers with Redis as the broker for efficient API handling. Built with CrewAI on top of the OpenAI API, it receives pull requests, analyzes the code, and outputs its responses in JSON format.

Digital Divide AI

The digitaldivide.ai project is a collaborative initiative between XRI Global and students at the University of Arizona. Our mission is to provide real-time visibility into AI model capabilities across the world's languages. We believe that bridging the digital divide begins with establishing foundational technology support for all languages.

AI based changelog

A developer-facing tool that lets developers quickly generate a changelog with AI. It combines GitHub Actions, the GitHub API, the OpenAI API, and JSON files to generate the changelog automatically when a release is published, while keeping the option to add entries manually.

Hackathons

Side Quests

I have written and published research papers, and received scholarships from Hack The North and from the Linux Foundation to attend KubeCon. Some of my achievements are listed below.

  • Hack The North

    Waterloo, Ontario

    I received a scholarship to attend Hack the North, Canada's largest hackathon, organized by the University of Waterloo.
  • Delivered a talk on entrepreneurship

    Kolkata, India

    I talked about how we raised funds for our startup with the initial MVP and scaled it for customers. (https://www.linkedin.com/posts/academy-of-technology_a-special-talk-on-how-to-build-prize-winning-activity-7184774748038467584-P74E?utm_source=share&utm_medium=member_desktop&rcm=ACoAADb5VDMBBWBtC1aB-Mu0CuCEosyNC9Wd73g)
  • Yet another talk, on machine learning operations with cloud-native technologies

    Kolkata, India

    I discussed machine learning operations: building complete ML pipelines with Kubeflow and walking through an end-to-end MLOps project.
  • TensorFlow Certification

  • SciPy Conference

    Tacoma, WA, USA

    My paper was accepted at this conference.
  • Speaker at FOSS Kolkata

    Kolkata, India

    I delivered a talk on trans-tokenization for (monolingual) large language models. I covered fine-tuning, adapting trans-tokenization to your existing methods, and building your own tokenizer. (https://www.linkedin.com/feed/update/urn:li:activity:7248060113524031488/)
  • KubeCon India

    Delhi, India

    I received a scholarship to attend KubeCon India.
  • Conducted an AI session on trans-tokenization

    Remote, India

    Session topic: trans-tokenization.
Contact

Get in Touch

Want to chat? Just shoot me a DM with a direct question on Twitter and I'll respond whenever I can. I will ignore all soliciting.