Machine Learning Engineer
Skills
- Programming Skills: Python, Bash, LaTeX
- Database: SQL, MongoDB
- Technical Skills: TensorFlow, Keras, Scikit-Learn, Gensim, NLTK, Pytesseract, PyCaret, Pandas, NumPy, Matplotlib, Seaborn, Regex, SciPy, BeautifulSoup, Git, GitHub, Excel, PowerPoint
- Familiar Tools: XGBoost, LightGBM, Streamlit, Flask, FastAPI, Docker, OpenCV, Plotly, bokeh, Selenium
Work Experience
SyncMOF | Backend Engineer Intern | (May 2024 – August 2024)
Skills: Python, Pandas, NumPy, pwlf, SciPy, scikit-learn
- Generated synthetic data using interpolation; grouped the data with DBSCAN and hierarchical clustering combined with Principal Component Analysis (PCA).
- Used piecewise linear fitting (pwlf) and B-spline interpolation to improve the accuracy of feature extraction.
- Built a data analysis, visualization, and processing pipeline that automated previously manual tasks.
- Packaged the code and followed the PEP 8 style guide to keep it fast, efficient, and maintainable.
Wint Wealth | Data Science Intern | (Oct 2023 – Feb 2024)
Skills: Python, Web Scraping, Web Crawling, Beautiful Soup, AWS Lambda, AWS Simple Queue Service, AWS S3, Cron, Regex, Code Refactoring, Team Coordination, Teamwork, Notion
- Built an internal Python utility library that centralizes code reused across the ML codebase, reducing duplication. Set up SSH tunneling through EC2 to connect to DocumentDB from a local machine, which made local testing faster.
- Built a scalable web crawling and scraping pipeline covering 20+ finance news sources, reducing scraping time from 3 days to 4 hours.
- Implemented a serverless solution using AWS Lambda, SQS, DocumentDB, and S3 to make the scraping pipeline more efficient and scalable.
- Built a dashboard in Appsmith to track the scraping pipeline, fetching data from MongoDB, AWS CloudWatch, and AWS SQS.
- Worked in a fast-paced startup environment.
SiviSoft | AI/ML Intern | (Sept 2023 – Oct 2023)
Skills: Python, Code Refactoring, Code Debugging, AWS CLI, AWS S3, NLP, Regex, pdfplumber, Jira, Elasticsearch, Elasticview, Team Coordination, Teamwork
- Worked with medical PDF data (extracting patient data, handling scanned PDFs).
- Used Python and NLP, working mostly in Python.
- Did a lot of code debugging and code refactoring.
- Helped other interns and contract-based employees with their Jira tickets and environment setup.
- Worked for a little over 5 weeks; left due to mental health reasons and work culture.
Skills: Python, CyberSecurity, Statistics, Data Analysis, Machine Learning, IoT/IoMT
- Created a Python proof of concept (POC) that quantifies cyber risk to IoMT/IoT devices using the FAIR and STRIDE models.
- Wrote Python scripts that analyzed terabytes of data and generated text and Excel reports checking whether data flowed through the pipeline as expected (Data Validator Tool).
- Worked on baselining hospitals’ network data to identify malicious behavior.
Articles
Projects
NLP Projects
- Fake News Classification | GitHub
- Technologies Used: Python, TensorFlow, scikit-learn, nltk, langdetect, wordcloud, matplotlib, regex, numpy, pandas
- Implemented an LSTM model on the Kaggle Fake News dataset (over 70K news texts), reaching 97% accuracy.
- Along with standard text pre-processing, the langdetect library was used to identify and remove news in other languages (French, German, Arabic, etc.), which improved model performance.
- For EDA, a word cloud and plots of bi-grams and tri-grams were used to identify the most common words in the corpus.
- The LSTM model was built using TensorFlow along with pre-trained GloVe word embeddings.
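The bi-gram and tri-gram counting used for EDA can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the `top_ngrams` helper, the simplified tokenizer, and the sample headlines are all assumptions made up for the example.

```python
from collections import Counter
import re


def top_ngrams(texts, n=2, k=3):
    """Return the k most common word n-grams across an iterable of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only (a simplified tokenizer).
        tokens = re.findall(r"[a-z]+", text.lower())
        # Zip n shifted copies of the token list to form the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Made-up sample headlines, not taken from the Kaggle dataset.
sample = [
    "Breaking news: the senate votes today",
    "The senate votes on the budget",
    "Breaking news from the senate",
]
bigrams = top_ngrams(sample, n=2, k=3)
```

Passing `n=3` to the same helper counts tri-grams instead.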
- Topic Modeling Using RACE Dataset | GitHub
- Technologies Used: Python, Regex, NLTK, Gensim, Scikit-Learn, tSNE, pyLDAvis, bokeh, Git
- This NLP project uses statistical models to reveal the abstract “topics” in a large set of text documents, so that documents can be grouped by the themes they convey.
- Three topic modeling algorithms were used: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF).
- BERTopic and Top2Vec were also explored and gave good results.
- Medical Embeddings and Clinical Trial Search Engine | GitHub
- Technologies Used: Python, Gensim, Word2Vec, FastText, Streamlit, Git
- The project trains SkipGram and FastText models on a COVID-19 clinical trials dataset and builds a search engine where a user can type any COVID-19 related keyword and get the top n most similar results from the dataset.
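A top-n lookup of this kind typically ranks candidates by cosine similarity between embedding vectors. The sketch below assumes that approach and uses invented 3-dimensional toy vectors. The `cosine` and `top_n_similar` helpers are hypothetical, and real vectors would come from the trained SkipGram or FastText model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_similar(query, vectors, n=2):
    """Rank every other key in `vectors` by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy vectors invented for illustration only.
toy_vectors = {
    "fever": [0.9, 0.1, 0.0],
    "cough": [0.8, 0.2, 0.1],
    "vaccine": [0.1, 0.9, 0.2],
    "placebo": [0.0, 0.8, 0.3],
}
results = top_n_similar("vaccine", toy_vectors, n=2)
```

With these toy vectors, "placebo" ranks first for the query "vaccine" because the two vectors point in nearly the same direction.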
Computer Vision Projects
- Image Coloring using Autoencoders | GitHub
- Technologies Used: Python, TensorFlow, Keras, scikit-image, matplotlib, numpy
- I used autoencoders and transfer learning for this project, trying VGG16 and InceptionResNetV2 as the encoder/feature extractor with a custom decoder.
- Multi-class Image Classification Model | GitHub
- Technologies Used: Python, tensorflow, keras, matplotlib, flask, gunicorn, pathlib, numpy
- The project classifies images into three categories (driving license, social security, and others) using a CNN architecture.
- The model reached 96% accuracy on a test set of 150 images. It was deployed as a Flask API served with gunicorn.
Machine Learning & Python Projects
- Python Cookiecutter Project Template | GitHub
- Technologies Used: Python, Cookiecutter, Pytest, GitHub Actions, CI/CD, GitHub CLI, Bash, SetupTools, Linters, Pre-Commit
- Developed a customizable template using Cookiecutter, GitHub CLI, and GitHub Actions to automate the creation of Python project repositories, including setup for linting, testing, CI/CD, and secrets management.
- Implemented comprehensive GitHub Actions workflows for continuous integration (CI) and continuous delivery (CD), ensuring consistent code quality and automated testing.
- Integrated modern development tools and best practices such as VS Code settings, pyproject.toml configuration, and a suite of linting tools (flake8, black, mypy, etc.) to enhance developer productivity and maintain code quality.
- Basic Library Management System API | GitHub
- Technologies Used: Python, FastAPI, Pydantic, MongoDB, Docker, GCP
- This project implements a RESTful API for a Library Management System using FastAPI with MongoDB Atlas as the database, deployed as a Docker image on GCP.
- Business License Status Prediction | GitHub
- Technologies Used: Python, scikit-learn, h2o, tensorflow, flask, gunicorn
- The project aims to predict if a customer’s license should be issued, renewed, or cancelled depending on features in the dataset. The problem statement was presented at ZS Data Science Challenge - 2019.
- Medical Data Extraction Project | GitHub
- Technologies Used: Python, Regex, OpenCV, Pytesseract, FastAPI
- Built the Python backend with pytesseract, OpenCV, and regular expressions, using FastAPI as the web serving framework.
- Automatically extracted important fields from patient details and medical prescriptions: images were preprocessed with OpenCV, pytesseract converted them to text, and regular expressions pulled the important fields out of that text.
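The final regex step of such a pipeline might look like the sketch below. The field names, patterns, and sample text are all hypothetical, since the project's actual patterns are not shown here. The sample string stands in for text that pytesseract would return after OpenCV preprocessing.

```python
import re

# Hypothetical field patterns, invented for this illustration.
PATTERNS = {
    "name": re.compile(r"Name:\s*(.+)"),
    "phone": re.compile(r"Phone:\s*(\(?\d{3}\)?[\s-]?\d{3}-\d{4})"),
    "dob": re.compile(r"(?:DOB|Date of Birth):\s*(\d{2}/\d{2}/\d{4})"),
}


def extract_fields(text):
    """Return the first match for each field, or None when a field is missing."""
    fields = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[field] = match.group(1).strip() if match else None
    return fields


# Invented sample standing in for OCR output.
ocr_text = """Patient Information
Name: Jane Doe
DOB: 01/02/1980
Phone: (555) 123-4567
"""
fields = extract_fields(ocr_text)
```

Returning `None` for a missing field, rather than raising, lets a caller decide how to handle incomplete scans.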
- SQL Project: Provide Insights to Management in Consumer Goods Domain
- Credit Card Default Prediction | GitHub
- A classic credit card default prediction project: based on the customer profile, predict whether the borrower is likely to default in the next 2 years (a delinquency of more than 3 months) or not.
- Logistic Regression, Random Forest, XGBoost, LightGBM, and a vanilla neural network were implemented for modeling.
- Regression Models for House Price Prediction | GitHub
- House price prediction on a Pune real-estate dataset using regression models including Linear, Ridge, Lasso, Elastic Net, Random Forest, XGBoost, K-Nearest Neighbours, and Support Vector Regressor.
- A multi-layer perceptron (MLP) was also implemented using TensorFlow.
- Kaggle House Price Prediction | Link
Knowledge Repo
- My Notes from the book Natural Language Processing with TensorFlow, 2nd-ed. by Thushan Ganegedara
- Things I have become familiar with:
- My Notes from Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka.
- Things covered so far:
- Perceptron, Gradient Descent
- Logistic Regression, Decision Tree, SVM, KNN
- Feature Selection, Regularization(L1 & L2)
- Dimensionality Reduction: PCA, LDA
- Model Evaluation & Hyperparameter Tuning
- Ensemble Learning: Bagging, Boosting
- Sentiment Analysis, Topic Modelling
- My Notes from the book Deep Learning with TensorFlow and Keras, 3rd Edition
- Will cover selective topics from this book
- Notes from Machine Learning using Python by Manaranjan Pradhan, U Dinesh Kumar
- This was the very first ML book I read.
About Me
- 📖 I’m interested in NLP & ML Engineering and look forward to building my career there. I document my learning on GitHub and share it with the LinkedIn AI Community.
- 🕵🏼‍♂️ Besides my studies, I’m interested in learning about myself from a spiritual & psychological perspective.
- 👀 Looking for my first full-time role as a Machine Learning Engineer, preferably starting with an internship.
- 👉🏼 Priority For Me: I’m looking for a fun work environment and, especially, a mentor under whom I can work and learn a lot, one who is willing to commit to me just as I will to them and who sees my potential.
- ⭐ Open to Remote Opportunities (both Internationally & within India)
- 😃 Contact me if you find me interesting. I’m active on LinkedIn🌼
Education