Projects

Check out my GitHub for more details!

Brain Tumor Classification – Deployment-Ready Project

  • Deployed a scalable brain tumor classifier on Kubernetes, built with Streamlit and served through RESTful API endpoints.
  • Set up CI/CD with GitHub Actions, which pushes each new Docker image to the artifact registry; the deployment runs the latest image.
  • Developed the model as a CNN in TensorFlow Keras and managed versioning and artifacts with MLflow.
  • Automated the model's training and retraining pipelines with Airflow DAGs, triggered by events such as user feedback and model decay.
  • Monitored model confidence, prediction distribution, data drift, and skew in Tableau to extend the model's useful life.
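
As an illustration of the drift monitoring mentioned above, here is a minimal sketch of one common drift score, the Population Stability Index (PSI). The project description does not say which metric was used, so PSI, the function name `psi`, the bin count, and the sample values are all assumptions made for this example.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up illustrative samples, not project data.
reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]

same_score = psi(reference, reference)   # identical samples, no drift
drift_score = psi(reference, shifted)    # shifted sample, large drift
```

A commonly quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating; a score like this could feed a retraining trigger or a dashboard.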

Generative Models - VAEs, GANs

  • Designed and implemented Variational Autoencoders with a 32-dimensional latent space to reconstruct MNIST digits, enabling detailed exploration of latent space interpolations and smooth transitions between digit classes.
  • Developed Generative Adversarial Networks with CNN-based architectures for the generator and discriminator, trained adversarially with binary cross-entropy loss to generate synthetic CIFAR-10 images.
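
The latent space interpolation mentioned above can be sketched as a straight line between two latent codes. This is only an illustration: the function name `interpolate_latents` and the placeholder vectors are hypothetical, and a real run would take the codes from the trained encoder and pass each interpolated vector through the decoder.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first vector, 1.0 at the last
        path.append([a + t * (b - a) for a, b in zip(z_start, z_end)])
    return path


LATENT_DIM = 32  # matches the 32-dimensional latent space described above

# Placeholder latent codes; in practice these would come from the encoder,
# for example the encoded means of a "3" image and an "8" image.
z_three = [0.0] * LATENT_DIM
z_eight = [1.0] * LATENT_DIM

path = interpolate_latents(z_three, z_eight, steps=5)
# Each vector in `path` would then be decoded to render one frame
# of the transition between the two digits.
```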

RAG Database Assistant

  • Enabled users to query the database and retrieve results in a conversational format, using GPT-4o through LangChain.
  • Reduced prompt size by retrieving only the schemas relevant to each user query through semantic search on a vector database.
  • Enhanced SQL query accuracy by adding few-shot examples relevant to user queries and employing conversational memory.
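
The schema retrieval step above can be sketched with plain cosine similarity. Everything here is a stand-in: the three-dimensional vectors are hand-written toy embeddings, and the table names, `top_k_schemas`, and `build_prompt` are hypothetical. A real setup would use an embedding model and a vector database rather than a Python dictionary.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_schemas(query_vec, schema_index, k=2):
    """Return the names of the k schemas most similar to the query embedding."""
    ranked = sorted(
        schema_index,
        key=lambda name: cosine(query_vec, schema_index[name]["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, schema_index, selected):
    """Include only the selected schemas, which keeps the prompt small."""
    ddl = "\n".join(schema_index[name]["ddl"] for name in selected)
    return f"Schemas:\n{ddl}\n\nQuestion: {question}\nSQL:"


# Toy index: hypothetical tables with hand-written 3-d "embeddings".
schema_index = {
    "orders": {"vector": [0.9, 0.1, 0.0],
               "ddl": "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL)"},
    "customers": {"vector": [0.1, 0.9, 0.0],
                  "ddl": "CREATE TABLE customers (id INT, name TEXT)"},
    "employees": {"vector": [0.0, 0.1, 0.9],
                  "ddl": "CREATE TABLE employees (id INT, name TEXT, salary DECIMAL)"},
}

question = "What is the total order value per customer?"
query_vec = [0.8, 0.3, 0.1]  # stand-in for the embedded question

selected = top_k_schemas(query_vec, schema_index, k=2)
prompt = build_prompt(question, schema_index, selected)
```

Few-shot examples could be selected the same way, by embedding stored question and SQL pairs and keeping the nearest ones.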

Multimodal Document Handling

  • Retrieved relevant document pages for user queries by converting images of PDF pages into embeddings with a vision transformer, storing them in a vector database, and running a similarity search against the embeddings of user queries.
  • Reduced database storage with binary quantization and improved search accuracy with oversampling and reranking.
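
To show how binary quantization, oversampling, and reranking fit together, here is a minimal sketch under stated assumptions: vectors are quantized by sign into one bit per dimension, candidates are fetched by Hamming distance, and the oversampled candidates are rescored with the full-precision dot product. The function names and the four-dimensional toy vectors are hypothetical, and a vector database would do this internally rather than in Python.

```python
def binarize(vec):
    """Pack a float vector into an int, one bit per dimension (1 if the value is positive)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def search(query, vectors, k=1, oversample=2):
    q_bits = binarize(query)
    # In a real index the binary codes are computed once at insert time.
    codes = {doc_id: binarize(v) for doc_id, v in vectors.items()}
    # Step 1: cheap candidate search on binary codes, fetching k * oversample hits.
    candidates = sorted(codes, key=lambda d: hamming(q_bits, codes[d]))[: k * oversample]
    # Step 2: rerank the candidates with the original full-precision vectors.
    reranked = sorted(candidates, key=lambda d: dot(query, vectors[d]), reverse=True)
    return reranked[:k]


# Toy page embeddings, made up for illustration.
pages = {
    "page_1": [0.9, 0.1, -0.2, 0.4],
    "page_2": [0.2, 0.8, -0.1, 0.1],
    "page_3": [-0.5, -0.4, 0.9, -0.3],
}
query = [0.1, 0.9, -0.3, 0.2]

without_oversampling = search(query, pages, k=1, oversample=1)
with_oversampling = search(query, pages, k=1, oversample=2)
```

In this toy data `page_1` and `page_2` share the same binary code, so the binary search alone cannot tell them apart; fetching two candidates and rescoring them lets the full-precision vectors pick the closer page.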

MBTA – Machine Learning Model for Predicting Bus Load

  • Improved rider satisfaction by reducing bus load by 30% through scheduling strategies derived from a random forest model (84.8% accuracy) developed with the MBTA Data Science team.

Penalty Analysis and Goalkeeper Strategies

  • Identified key correlations between the direction of a penalty shot and game-specific factors such as whether the player faces home or away fans behind the goal, whether the team is leading or trailing, and the time elapsed in the game. #Beyond the Kick
  • Derived strategies for saving penalties by predicting shot direction with a KNN model built on the insights above.
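
A k-nearest-neighbours prediction of shot direction can be sketched as follows. The feature encoding (facing home fans, goal difference, fraction of the match elapsed), the value of `k`, and the six training rows are all made up for illustration; they are not the project's data or its actual feature set.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training rows closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (facing_home_fans, goal_difference, fraction_of_match_elapsed)
train = [
    ((1, 1, 0.2), "left"),
    ((1, 0, 0.3), "left"),
    ((1, 1, 0.5), "left"),
    ((0, -1, 0.8), "right"),
    ((0, -1, 0.9), "right"),
    ((0, 0, 0.7), "right"),
]

early_leading_home = knn_predict(train, (1, 1, 0.4))
late_trailing_away = knn_predict(train, (0, -1, 0.85))
```

With real data the features would usually be scaled first, since KNN distances are dominated by whichever feature has the largest range.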

Body Composition Scanner

  • Used a pretrained ResNet model to extract silhouettes from front-facing and side-facing (left or right) images of subjects.
  • Extracted features from the silhouettes, such as contour area and solidity.
  • Established a linear relation between wrist size and neck circumference based on the paper “Human Proportions” by Prof. John Verzani, and used it in later body-composition calculations such as fat percentage and lean mass.
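
For reference, contour area and solidity can be computed from a contour's vertices alone: area by the shoelace formula, and solidity as the contour area divided by the area of its convex hull. The sketch below uses only the standard library with a made-up L-shaped contour; the function names are hypothetical, and in an image pipeline OpenCV's `cv2.contourArea` and `cv2.convexHull` provide the equivalent building blocks.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Drop points that would make a clockwise or straight turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]


def solidity(contour):
    """Contour area divided by convex hull area (1.0 for a convex shape)."""
    hull_area = polygon_area(convex_hull(contour))
    return polygon_area(contour) / hull_area if hull_area else 0.0


# Made-up L-shaped contour: area 3, convex hull area 3.5.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]

l_shape_solidity = solidity(l_shape)
square_solidity = solidity(square)
```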

Classical Machine Learning Algorithms

  • Implemented classical machine learning algorithms (Linear Regression, Logistic Regression, and SVM) from scratch using Python, NumPy, and SciPy, demonstrating a strong foundational understanding of model optimization, regularization, and gradient descent.
  • Developed comprehensive evaluation metrics (Accuracy, RMSE, SSE, Precision, Recall) to assess model performance, improving the interpretability and robustness of predictions.
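
As a compact illustration of the from-scratch approach, here is a sketch of logistic regression trained by batch gradient descent with L2 regularization, plus accuracy, precision, and recall. It is written with the standard library only so it runs anywhere (the project itself used NumPy and SciPy), and the hyperparameters and the one-feature toy dataset are assumptions for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(row, w, b):
    return sum(wj * xj for wj, xj in zip(w, row)) + b


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the L2-regularized log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error per sample: p_i - y_i.
        errors = [sigmoid(score(row, w, b)) - yi for row, yi in zip(X, y)]
        grad_w = [
            sum(e * row[j] for e, row in zip(errors, X)) / n + l2 * w[j]
            for j in range(d)
        ]
        grad_b = sum(errors) / n  # the bias is not regularized
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    return [1 if sigmoid(score(row, w, b)) >= 0.5 else 0 for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, linearly separable data with a single feature.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
y_pred = predict(X, w, b)
```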

F1 2021 – Visual Storyboard and Analytical Dashboard

  • Crafted a data-driven story showcasing the season’s pivotal moments and dynamics, allowing the audience to grasp complex factors such as race strategies, driver performance, and points progression that contributed to the championship battle. #Beyond the Finish Line
  • Designed an interactive dashboard on Tableau featuring key insights into driver performance and championship progression.

Health Centre Database and Data Warehouse for Analysis

  • Designed a multidimensional model for tracking metrics with a focus on identifying diagnostic trends and regional hazards.
  • Utilized AWS Glue jobs for ETL from PostgreSQL to Redshift, leveraging S3 buckets for intermediate storage and Lambda functions to automate job triggers for efficient data processing.

European Football/Soccer Database Management System

  • Architected and implemented a scalable and robust database system for the European football/soccer league using MySQL and MongoDB.
  • Integrated the MySQL database with Python using SQLAlchemy for data analysis.

Time Series Analysis of Human Activities

  • Conducted time series analysis on human activity data, applying ARIMA, LSTM, and Prophet models to forecast and understand patterns.

Check out my GitHub for more projects and details!