"Life is like riding a bicycle. To keep your balance, you must keep moving."
I am a Software Engineer at Intel, currently focused on Deep Learning Performance R&D.
I received my Master's in Electrical Engineering from Rochester Institute of Technology in 2016, where I was advised by
Amlan Ganguly and Ray Ptucha
on Multi-Core Systems with NoC Architectures and Deep Learning. During my Master's, I interned at Intel,
where I worked on creating high-performance Deep Learning models for Intel Atom-based SoCs.
I received my Bachelor's degree in Electronics and Communication Engineering from Visvesvaraya Technological University, India, in 2012.
I enjoy interdisciplinary research and attending hackathons. I am a neuroscience, physics, and hardware architecture enthusiast.
Engineer (Mar 2016 – Present)
• Machine Learning algorithms and Deep Learning data science for Computer Vision
• Deep Learning Software and Computer Architecture
• Hybrid computing (Distributed + Heterogeneous)
• Deep Learning Software Architecture - Intel Movidius Next-Gen AI Product
Mar 2019 – Present
• Deep Learning Graph Compiler nGraph - Intel Nervana NNP-Training
Sep 2017 – Mar 2019
• Computer Vision and Deep Learning - Intel Atom+FPGA+iGPU
Mar 2016 – Sep 2017
Software Engineer Intern (Oct 2015 – Dec 2015)
Performance analysis and optimization of machine learning algorithms (Deep Learning) for Computer Vision and mobile applications, in Torch, OpenCV, and TensorFlow.
Research Assistant
@Multi-Core System Lab | (Aug 2014 - Mar 2016)
Improved the thermal performance of multi-core Network-on-Chip based architectures through a distributed, intelligent, proactive thermal-aware task reallocation algorithm, optimized for faster neural network training time.
@Machine Intelligence Lab | (May 2015 – Oct 2015)
Developed an improved video classification scheme using deeper convolutional neural networks for better accuracy and lower computation time.
Apprentice Engineer (Aug 2012 – May 2013)
Worked on data analysis of a Solid State Digital Video Recording system and an Electronic Flight Instrument System for fault detection and system integration.
SysML 2018 link
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new
topology remains challenging, as each requires some level of manual effort. This issue is compounded by the
proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires
deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs,
ASICs) and requires O(fp) effort, where f is the number of frameworks and p is the number of platforms. While optimized
kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks
(MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our
experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced
C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms.
Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel
Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported
compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our
overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range
of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference
optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of
operations).
Thesis link
Continuous improvement in silicon process technologies has made possible the integration of hundreds of cores on a
single chip. However, power and heat have become dominant constraints in designing these massive multicore chips, causing
issues with reliability, timing variations, and reduced chip lifetime. Dynamic Thermal Management (DTM) is a
solution to avoid high temperatures on the die. Typical DTM schemes only address core-level thermal issues. However, the
Network-on-chip (NoC) paradigm, which has emerged as an enabling methodology for integrating hundreds to thousands of
cores on the same die, can contribute significantly to the thermal issues. Moreover, typical DTM is triggered
reactively based on temperature measurements from on-chip thermal sensors, requiring long reaction times, whereas a
predictive DTM method estimates future temperatures in advance, eliminating the chance of temperature overshoot.
Artificial Neural Networks (ANNs) have been used in various domains for modeling and prediction with high accuracy due
to their ability to learn and adapt. This thesis concentrates on designing an ANN prediction engine to predict the thermal
profile of the cores and Network-on-Chip elements of the chip. This thermal profile of the chip is then used by the
predictive DTM that combines both core-level and network-level DTM techniques. An on-chip wireless interconnect,
recently envisioned to enable energy-efficient data exchange between cores in a multicore environment, is used to
provide a broadcast-capable medium to efficiently distribute thermal control messages that trigger and manage the DTM
schemes.
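The prediction step above can be sketched as a small sliding-window ANN in NumPy. This is a minimal illustration, not the thesis implementation: the temperature trace is synthetic, and the layer sizes and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-core temperature trace (hypothetical data; the thesis
# uses measurements from on-chip thermal sensors of a NoC-based multicore).
t = np.linspace(0, 20, 400)
temps = 50 + 10 * np.sin(t) + rng.normal(0, 0.3, t.size)

# Sliding window: predict the next temperature from the last w samples.
w = 8
X = np.stack([temps[i:i + w] for i in range(temps.size - w)])
y = temps[w:]

# Normalize for stable training.
mu, sigma = X.mean(), X.std()
Xn, yn = (X - mu) / sigma, (y - mu) / sigma

# One-hidden-layer ANN trained with plain gradient descent.
h = 16
W1 = rng.normal(0, 0.1, (w, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, 1)); b2 = np.zeros(1)
lr = 0.05

def forward(Xb):
    z = np.tanh(Xb @ W1 + b1)          # hidden activations
    return z, (z @ W2 + b2).ravel()    # scalar prediction per window

_, pred0 = forward(Xn)
loss0 = np.mean((pred0 - yn) ** 2)

for _ in range(500):
    z, pred = forward(Xn)
    err = (pred - yn)[:, None] / len(yn)   # gradient of 0.5 * MSE
    gW2, gb2 = z.T @ err, err.sum(0)
    dz = (err @ W2.T) * (1 - z ** 2)       # backprop through tanh
    gW1, gb1 = Xn.T @ dz, dz.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(Xn)
loss = np.mean((pred - yn) ** 2)
print(f"MSE before: {loss0:.4f}  after: {loss:.4f}")
```

In the thesis setting, such a predictor would feed the proactive DTM, which reallocates tasks before a predicted overshoot rather than after a sensor reading crosses a threshold.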
Identification of human finger snaps. Feature extraction using the cepstrum; Random Forest and PCA are used for training on the recorded data. Coded in Python.
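The cepstral feature step can be sketched as follows. This is a minimal illustration assuming mono audio frames as NumPy arrays; the recorded-snap dataset and the Random Forest/PCA stages are not reproduced, and the frame length and coefficient count are arbitrary choices.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Real cepstrum of one audio frame: the inverse FFT of the
    log magnitude spectrum. The first n_coeffs coefficients serve
    as a compact feature vector for a classifier."""
    spectrum = np.fft.rfft(frame * np.hanning(frame.size))
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # avoid log(0)
    cepstrum = np.fft.irfft(log_mag)
    return cepstrum[:n_coeffs]

# Usage on a synthetic "snap": a short impulsive burst of noise.
rng = np.random.default_rng(1)
snap = np.zeros(1024)
snap[100:160] = rng.normal(0, 1, 60) * np.hanning(60)
features = real_cepstrum(snap)
print(features.shape)  # (13,)
```

Vectors like `features` would then be stacked per recording and passed to PCA for dimensionality reduction before Random Forest training.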
A web app that helps users draw uploaded images by joining points (tracing the image). Built with Machine Learning, Python, OpenCV, HTML5, JS, and CSS.
A web app that converts plain articles into colorful, illustrated versions for children to read. Built with Machine Learning, Python, HTML, CSS, and JS.
Designed a robot arm that aims at moving targets (a cat-toy laser) in real time. Regression was used for arm movement, implemented in MATLAB.
Performed descriptive statistics on quantitative stock data to understand a company's stock performance.
Historical temperature analysis using statistics, and temperature prediction using machine learning techniques such as PCA, SVD, and Bayesian Networks.
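The PCA/SVD step can be sketched as follows. This is a minimal illustration on a synthetic multi-station temperature matrix; the historical dataset and the Bayesian Network model are not reproduced, and the station count and seasonal shape are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: daily temperatures from 10 stations over 365 days,
# dominated by one shared seasonal pattern plus per-station noise.
days = np.arange(365)
seasonal = 15 + 10 * np.sin(2 * np.pi * days / 365)
data = seasonal[:, None] + rng.normal(0, 1.0, (365, 10))

# PCA via SVD: center the columns, decompose, project onto the
# leading right singular vectors (principal components).
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)   # fraction of variance per component
scores = centered @ Vt[:2].T          # data in the first two PCs

print(f"variance explained by PC1: {explained[0]:.1%}")
```

With a strong shared seasonal signal, the first component absorbs most of the variance, so downstream prediction can work in a much lower-dimensional space.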
Designed the RTL and the verification models for the Adaptive Quantizer and Tone & Transition models, and some of their sub-models, for the pipelined design of MCAC.
Designed the RTL model and performed verification, logic synthesis, test insertion and detailed timing analysis
using Verilog HDL.
Hierarchically designed a Boundary Scan Sum with optimal sizing and clean DRC and LVS in Cadence Virtuoso, 0.6-micron technology.
Deep Neural Networks with feature extraction were implemented for image classification on the Caltech101 dataset.