ML Platform Engineer

Resume.pdf | LinkedIn | Hackernoon | GitHub | bento.me | raj.amitvikram@gmail.com

Skills

Programming Skills: Python, Bash
Database: SQL, MongoDB
Technical Skills:
- Amazon Web Services (AWS), Docker, FastAPI, AWS CDK, OpenTelemetry, Grafana, Prometheus
- TensorFlow, Scikit-Learn, Gensim, NLTK, Pytesseract
- BeautifulSoup, Pandas, NumPy, Matplotlib, Seaborn, Regex
Familiar:
- XGBoost, LightGBM, Streamlit, Flask, OpenCV, SciPy, Plotly, Bokeh, Selenium

About Me

🎧 I’m currently working on or plan to learn:
- Cloud Services: AWS
- Deploying AI/ML Models to Production
- Data Engineering - Databases, Warehouses, Lakes, and Pipelines for making data consumable
- Backend Engineering in general
🌼 In addition to my studies, I enjoy reading fiction, spiritual literature, and psychology books.
😄 I’m active on LinkedIn

Work Experience

Pattern | ML Platform Engineer | Jan 2025 - Present

Skills: Python, MLOps

MLOps & Cloud

MLOps Club | Course Co-author | June 2024 - Present

Skills: Python, AWS, Observability & Monitoring

Helping build a micro-degree that takes learners from "Coder" → "Software Engineer" → "Cloud/DevOps Engineer" → "MLOps Engineer".

SyncMOF | Backend Engineer Intern | May 2024 – Sept. 2024

Skills: Python, Pandas, NumPy, pwlf, SciPy, scikit-learn

Used interpolation methods to generate synthetic data; utilized DBSCAN & Hierarchical Clustering algorithms along with Principal Component Analysis (PCA) to group data.
Applied piecewise linear fitting (pwlf) and B-spline interpolation to improve the accuracy of feature extraction.
Built a comprehensive data analysis, visualization, and processing pipeline, automating previously manual Excel tasks and improving efficiency.
Wrote efficient, maintainable code by packaging it and adhering to the official Python PEP 8 style guide.

Wint Wealth | Data Science Intern | Oct 2023 – Feb 2024

Skills: Python, Web Scraping, Beautiful Soup, AWS Lambda, AWS Simple Queue Service, AWS S3, Cron, Regex, Code Refactoring

Built an internal Python utility library, centralizing the reused code in the ML codebase, thereby reducing code duplication and streamlining the whole codebase.
Set up SSH tunneling through EC2 to connect to DocumentDB from a local machine, enabling faster local testing.
Built a scalable web crawling and scraping pipeline covering 20+ finance news sources, reducing scraping time from 3 days to 4 hours.
Implemented a serverless solution using AWS Lambda, SQS, DocumentDB, and S3, optimizing efficiency and scalability in the scraping pipeline.
Built a dashboard to keep track of the Scraping Pipeline using Appsmith, fetching data from MongoDB, AWS CloudWatch, and AWS SQS.
Worked in a fast-paced startup environment.

SiviSoft | AI/ML Intern | Sept. 2023 – Oct 2023

Skills: Python, Code Refactoring, Code Debugging, AWS CLI, AWS S3, Regex, pdfplumber, Jira, Elasticsearch, Elasticview

Worked with medical PDF data, extracting patient data, including from scanned PDFs.
Performed extensive Code Debugging and Code Refactoring.
Assisted other interns and new employees with their Jira tickets and environment setup.
Worked for a little over 5 weeks; left due to mental health reasons and work culture.

Culinda Inc. | Data Science Intern | Aug 2022 – Jan 2023

Skills: Python, CyberSecurity, Statistics, Data Analysis, Machine Learning, IoT/IoMT

Created a Python proof of concept (POC) that applies the FAIR and STRIDE models to quantify cyber risk for IoMT/IoT devices.
Wrote Python scripts that analyzed terabytes of data to generate text and Excel reports, ensuring the data flow in the pipeline was functioning as expected (Data Validator Tool).
Worked on baselining hospitals' network data to identify any malicious behavior.

Articles

Projects

Python Projects

Python Cookiecutter Project Template | GitHub
- Technologies Used: Python, Cookiecutter, Pytest, GitHub Actions, CI/CD, GitHub CLI, Bash, setuptools, Linters, Pre-Commit
- Developed a customizable template using Cookiecutter, GitHub CLI, and GitHub Actions to automate the creation of Python project repositories, including setup for linting, testing, CI/CD, and secrets management.
- Implemented comprehensive GitHub Actions workflows for continuous integration (CI) and continuous delivery (CD), ensuring consistent code quality and automated testing.
- Integrated modern development tools and best practices such as VS Code settings, pyproject.toml configuration, and a suite of linting tools (flake8, black, mypy, etc.) to maintain code quality.
Basic Library Management System API | GitHub
- Technologies Used: Python, FastAPI, Pydantic, MongoDB, Docker, GCP
- Implemented a RESTful API for a Library Management System using FastAPI with MongoDB Atlas as the database, deployed as a Docker image on GCP.

NLP Projects

Fake News Classification | GitHub
- Technologies Used: Python, TensorFlow, scikit-learn, nltk, langdetect, wordcloud, matplotlib, regex, numpy, pandas
- Trained an LSTM model on the Kaggle Fake News dataset (over 70K news texts), achieving 97% accuracy.
- Along with standard text pre-processing, the langdetect library was used to identify and remove news in other languages (French, German, Arabic, etc.), improving model performance.
- For EDA, word clouds and plots of bi-grams and tri-grams were used to identify the most common words in the corpus.
- The LSTM Model was built using TensorFlow along with pre-trained GloVe Word Embeddings.
Topic Modeling Using RACE Dataset | GitHub
- Technologies Used: Python, Regex, NLTK, Gensim, Scikit-Learn, t-SNE, pyLDAvis, Bokeh, Git
- This NLP Project aims to use statistical models to reveal the abstract “topics” present in a large set of text documents, classifying documents based on different themes they convey.
- Three Topic Modeling algorithms were used: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF).
- BERTopic & Top2Vec were also explored, yielding strong results.
Medical Embeddings and Clinical Trial Search Engine | GitHub
- Technologies Used: Python, Gensim, Word2Vec, FastText, Streamlit, Git
- This project trains Word2Vec (skip-gram) and FastText models on the COVID-19 Clinical Trials Dataset and builds a search engine where users can enter COVID-19-related keywords to retrieve the top n most similar results from the dataset.

Computer Vision Projects

Image Coloring using Autoencoders | GitHub
- Technologies Used: Python, TensorFlow, Keras, scikit-image, matplotlib, numpy
- I tried using Autoencoders and Transfer Learning, using VGG16 and InceptionResNetV2 as encoder/feature extractor layers, paired with a custom decoder layer.
- The results weren't that great, though 🥲
Multi-class Image Classification Model | GitHub
- Technologies Used: Python, TensorFlow, Keras, matplotlib, Flask, Gunicorn, pathlib, numpy
- The project aims to classify images into three categories (driving license, social security, and others) using a CNN.
- An accuracy of 96% was achieved on test data of 150 images. The model was deployed as a Flask API served by Gunicorn.

Machine Learning/Other Projects

Business License Status Prediction | GitHub
- Technologies Used: Python, scikit-learn, H2O, TensorFlow, Flask, Gunicorn
- The project aims to predict if a customer's license should be issued, renewed, or canceled depending on features in the dataset.
- The problem statement was presented at ZS Data Science Challenge - 2019.
Medical Data Extraction Project | GitHub
- Technologies Used: Python, Regex, OpenCV, Pytesseract, FastAPI
- Built a Python backend using Pytesseract, OpenCV, regular expressions, and FastAPI as a web-serving framework.
- Automatically extracted important fields from patient details and medical prescriptions.
- Image processing was performed in OpenCV, followed by image-to-text conversion using Pytesseract, and then Regex for extracting key fields.
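
The final regex step of the pipeline above can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than the project's actual code: the sample text, the field names (`name`, `date`), and the patterns are all assumptions, and the OCR output is hard-coded so the snippet runs without OpenCV or Pytesseract.

```python
import re

# Hypothetical OCR output; in the real pipeline this text would come from Pytesseract.
SAMPLE_OCR_TEXT = """Name: Jane Doe
Date: 12/03/2023
Address: 12 Example Street
"""

# Illustrative patterns only; real documents would need patterns tuned to their layout.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:\s*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(\d{1,2}/\d{1,2}/\d{4})$", re.MULTILINE),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each known field, or None if it is absent."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[field] = match.group(1).strip() if match else None
    return fields


print(extract_fields(SAMPLE_OCR_TEXT))
```

Keeping the patterns in one dictionary makes it easy to add or adjust fields per document type without touching the extraction loop.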
SQL Project: Provide Insights to Management in Consumer Goods Domain
- A simple project that I made while learning SQL in 2023.
- Project GitHub Link & Certificate of Participation
Credit Card Default Prediction | GitHub
- A classic Credit Card Default Prediction project that predicts whether a borrower is likely to default in the next 2 years, based on customer profile data.
- Implemented models including Logistic Regression, Random Forest, XGBoost, LightGBM, and a vanilla Neural Network.
Regression Models for House Price Prediction | GitHub
- Predicted house prices on the Pune real-estate dataset using different regression models, including Linear, Ridge, Lasso, Elastic Net, Random Forest, XGBoost, K-Nearest Neighbours, and Support Vector Regressor.
- Also implemented a multi-layer perceptron (MLP) using TensorFlow.
Kaggle House Price Prediction | Link
- My very first project.

Knowledge Repo

NLP with TensorFlow

My Notes from the book Natural Language Processing with TensorFlow, 2nd ed., by Thushan Ganegedara.
Things I have become familiar with:
- Word Embeddings
- Project: Sentence Classification using CNN
- RNNs, LSTMs, GRUs
  - Project: NER with RNNs
- Seq2Seq Learning, Language Modelling, Neural Machine Translation (NMT)
  - Project: Neural Machine Translation: English to German
  - Project: Language Modelling: Generating Text using LSTMs
- Currently learning Transformers:
  - Project: QnA with BERT using HuggingFace

Machine Learning with PyTorch and Scikit-Learn

My Notes from Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka.
Things covered so far:
- Perceptron, Gradient Descent
- Logistic Regression, Decision Tree, SVM, KNN
- Feature Selection, Regularization (L1 & L2)
- Dimensionality Reduction: PCA, LDA
- Model Evaluation & HyperParameter Tuning
- Ensemble Learning: Bagging, Boosting
- Sentiment Analysis, Topic Modelling

Deep Learning with TensorFlow and Keras

My Notes from the book Deep Learning with TensorFlow and Keras, 3rd Edition.
Will cover selective topics from this book.

Machine Learning using Python

Notes from Machine Learning using Python by Manaranjan Pradhan, U Dinesh Kumar.
This was the very first ML book I read.

Education

BS in Data Science & Applications (CGPA: 8.5) IIT Madras 2021-2025 (Expected)
12th Std. CBSE Board (Percentage: 86.8%) Star International School, Ranchi, JH 2020