Amit Vikram Raj

Machine Learning Platform Engineer. This is my Portfolio. Active on LinkedIn

View the Project on GitHub avr2002/portfolio-avr

ML Platform Engineer

Resume.pdf | LinkedIn | Hackernoon | GitHub | bento.me | avr13405@gmail.com

Skills


About Me


Work Experience

Pattern | ML Platform Engineer | Jan 2025 - Present

Skills: Python, MLOps

  • MLOps & Cloud

MLOps Club | Course Co-author | June 2024 - Present

Skills: Python, AWS, Observability & Monitoring

  • Helping to build a micro-degree to bring you from "Coder" → "Software Engineer" → "Cloud/DevOps Engineer" → "MLOps Engineer".

SyncMOF | Backend Engineer Intern | May 2024 – Sept. 2024

Skills: Python, Pandas, Numpy, pwlf, SciPy, scikit-learn

  • Used interpolation methods to generate synthetic data; utilized DBSCAN & Hierarchical Clustering algorithms along with Principal Component Analysis (PCA) to group data.
  • Applied the Piecewise Linear Fit algorithm (pwlf) and B-splines Interpolation to enhance the accuracy of feature extraction processes.
  • Built a comprehensive data analysis, visualization, and processing pipeline, automating previously manual Excel tasks and improving efficiency.
  • Wrote fast, maintainable code by packaging it and following the official Python style guide (PEP 8).

Wint Wealth | Data Science Intern | Oct 2023 – Feb 2024

Skills: Python, Web Scraping, Beautiful Soup, AWS Lambda, AWS Simple Queue Service, AWS S3, Cron, Regex, Code Refactoring

  • Built an internal Python utility library, centralizing the reused code in the ML codebase, thereby reducing code duplication and streamlining the whole codebase.
  • Set up SSH tunneling through EC2 to connect to DocumentDB from a local machine, enabling faster local testing.
  • Built an efficient Web Crawling and Scraping Pipeline in a scalable fashion to scrape 20+ finance news sources, reducing the scraping time from 3 days to 4 hours.
  • Implemented a serverless solution using AWS Lambda, SQS, DocumentDB, and S3, optimizing efficiency and scalability in the scraping pipeline.
  • Built a dashboard to keep track of the Scraping Pipeline using Appsmith, fetching data from MongoDB, AWS CloudWatch, and AWS SQS.
  • Worked in a fast-paced startup environment.

SiviSoft | AI/ML Intern | Sept. 2023 – Oct 2023

Skills: Python, Code Refactoring, Code Debugging, AWS CLI, AWS S3, Regex, pdfplumber, Jira, Elasticsearch, Elasticview

  • Worked with Medical PDF data, including extracting patient data and scanned PDF data.
  • Performed extensive Code Debugging and Code Refactoring.
  • Assisted other interns and new employees with their Jira tickets and environment setup.
  • Worked for a little over 5 weeks; left due to mental health reasons and work culture.

Culinda Inc. | Data Science Intern | Aug 2022 – Jan 2023

Skills: Python, CyberSecurity, Statistics, Data Analysis, Machine Learning, IoT/IoMT

  • Created a Python POC that uses the FAIR and STRIDE models to quantify cyber risk to IoMT/IoT devices.
  • Wrote Python scripts that analyzed terabytes of data to generate text and Excel reports, ensuring the data flow in the pipeline was functioning as expected (Data Validator Tool).
  • Worked on baselining hospitals' network data to identify any malicious behavior.

Articles


Projects

Python Projects

  • Python Cookiecutter Project Template | GitHub
    • Technologies Used: Python, Cookiecutter, Pytest, GitHub Actions, CI/CD, GitHub CLI, Bash, setuptools, Linters, Pre-Commit
    • Developed a customizable template using Cookiecutter, GitHub CLI, and GitHub Actions to automate the creation of Python project repositories, including setup for linting, testing, CI/CD, and secrets management.
    • Implemented comprehensive GitHub Actions workflows for continuous integration (CI) and continuous delivery (CD), ensuring consistent code quality and automated testing.
    • Integrated modern development tools and best practices such as VS Code settings, pyproject.toml configuration, and a suite of linting tools (flake8, black, mypy, etc.) to maintain code quality.
  • Basic Library Management System API | GitHub
    • Technologies Used: Python, FastAPI, Pydantic, MongoDB, Docker, GCP
    • Implemented a RESTful API for a Library Management System using FastAPI with MongoDB Atlas as the database, deployed as a Docker image on GCP.

NLP Projects

  • Fake News Classification | GitHub
    • Technologies Used: Python, TensorFlow, scikit-learn, nltk, langdetect, wordcloud, matplotlib, regex, numpy, pandas
    • Implemented an LSTM Model on the Kaggle Fake News Dataset, which contains over 70K news texts, achieving 97% accuracy.
    • Along with standard text pre-processing, the langdetect library was used to identify & remove news in other languages (French, German, Arabic, etc.), improving model performance.
    • For EDA, word clouds and plots of bi-grams and tri-grams were used to identify the most common words in the corpus.
    • The LSTM Model was built using TensorFlow along with pre-trained GloVe Word Embeddings.
  • Topic Modeling Using RACE Dataset | GitHub
    • Technologies Used: Python, Regex, NLTK, Gensim, Scikit-Learn, tSNE, pyLDAvis, bokeh, Git
    • This NLP Project aims to use statistical models to reveal the abstract “topics” present in a large set of text documents, classifying documents based on different themes they convey.
    • Three Topic Modeling algorithms were used: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF).
    • BERTopic & Top2Vec were also explored, yielding strong results.
  • Medical Embeddings and Clinical Trial Search Engine | GitHub
    • Technologies Used: Python, Gensim, Word2Vec, FastText, Streamlit, Git
    • This project trains SkipGram and FastText Models on the COVID-19 Clinical Trials Dataset and builds a search engine where users can input COVID-19-related keywords to retrieve the top n similar results from the dataset.

Computer Vision Projects

  • Image Coloring using Autoencoders | GitHub
    • Technologies Used: Python, TensorFlow, Keras, scikit-image, matplotlib, numpy
    • I tried using Autoencoders and Transfer Learning, using VGG16 and InceptionResNetV2 as encoder/feature extractor layers, paired with a custom decoder layer.
    • The results weren't that great, though 🥲
  • Multi-class Image Classification Model | GitHub
    • Technologies Used: Python, TensorFlow, Keras, matplotlib, Flask, Gunicorn, pathlib, numpy
    • The project aims to classify images into three categories (driving license, social security, and others) using a CNN model architecture.
    • An accuracy of 96% was achieved on test data of 150 images. Deployment was done using Gunicorn and Flask API.

Machine Learning/Other Projects

  • Business License Status Prediction | GitHub
    • Technologies Used: Python, scikit-learn, h2o, tensorflow, flask, gunicorn
    • The project aims to predict if a customer's license should be issued, renewed, or canceled depending on features in the dataset.
    • The problem statement was presented at ZS Data Science Challenge - 2019.
  • Medical Data Extraction Project | GitHub
    • Technologies Used: Python, Regex, OpenCV, Pytesseract, FastAPI
    • Built a Python backend using Pytesseract, OpenCV, regular expressions, and FastAPI as a web-serving framework.
    • Automatically extracted important fields from patient details and medical prescriptions.
    • Image processing was performed in OpenCV, followed by image-to-text conversion using Pytesseract, and then Regex for extracting key fields.
  • SQL Project: Provide Insights to Management in Consumer Goods Domain
  • Credit Card Default Prediction | GitHub
    • A classic Credit Card Default Prediction project to predict whether a borrower is likely to default in the next 2 years or not, based on customer profile data.
    • Implemented models including Logistic Regression, Random Forest, XGBoost, LightGBM, and a vanilla Neural Network.
  • Regression Models for House Price Prediction | GitHub
    • Predicted house prices on the Pune real-estate dataset using different regression models, including Linear, Ridge, Lasso, Elastic Net, Random Forest, XGBoost, K-Nearest Neighbours, and Support Vector Regressor.
    • Also implemented a multi-layer perceptron (MLP) using TensorFlow.
  • Kaggle House Price Prediction | Link
    • My very first project.

Knowledge Repo

NLP with TensorFlow

Machine Learning with PyTorch and Scikit-Learn

  • My Notes from Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka.
  • Things covered so far:
    • Perceptron, Gradient Descent
    • Logistic Regression, Decision Tree, SVM, KNN
    • Feature Selection, Regularization (L1 & L2)
    • Dimensionality Reduction: PCA, LDA
    • Model Evaluation & HyperParameter Tuning
    • Ensemble Learning: Bagging, Boosting
    • Sentiment Analysis, Topic Modelling

Deep Learning with TensorFlow and Keras

  • My Notes from the book Deep Learning with TensorFlow and Keras, 3rd Edition.
  • Will cover selected topics from this book.

Machine Learning using Python

  • Notes from Machine Learning using Python by Manaranjan Pradhan, U Dinesh Kumar.
  • This was the very first ML book I read.

Education