"Life is like riding a bicycle. To keep your balance you must keep moving"
Albert Einstein

Bio

I am a Software Engineer at Intel, currently focused on Deep Learning Performance R&D. I received my Master's in Electrical Engineering from Rochester Institute of Technology in 2016, where I was advised by Amlan Ganguly and Ray Ptucha on multi-core systems with NoC architectures and deep learning. During my Master's, I interned at Intel, working on high-performance deep learning models for Intel Atom-based SoCs. I received my Bachelor's degree in Electronics and Communication Engineering from Visvesvaraya Technological University, India, in 2012.

I enjoy interdisciplinary research and attending hackathons. I am a neuroscience, physics, and hardware architecture enthusiast.


Experience

  • Intel Corporation:
  • Engineer (Mar 2016 – present)

    • Technologies
      • Machine learning algorithms and deep learning data science for computer vision
      • Deep learning software and computer architecture
      • Hybrid computing (distributed + heterogeneous)

    • Products
      • Deep Learning Software Architecture - Intel Movidius Next-Gen AI Product
        Mar 2019 – Present
      • Deep Learning Graph Compiler nGraph - Intel Nervana NNP-Training
        Sep 2017 – Mar 2019
      • Computer Vision and Deep Learning - Intel Atom+FPGA+iGPU
        Mar 2016 – Sep 2017

  • Intel Corporation (Hillsboro, OR):
  • Software Engineer Intern (Oct 2015 – Dec 2015)
    Performance analysis and optimization of machine learning (deep learning) algorithms for computer vision and mobile applications, using Torch, OpenCV, and TensorFlow.


  • Rochester Institute of Technology (Rochester, NY):
  • Research Assistant
    @Multi-Core System Lab | (Aug 2014 – Mar 2016)
    Improved the thermal performance of multi-core network-on-chip based architectures through a distributed, intelligent, and proactive thermal-aware task reallocation algorithm. The neural network was optimized for faster training and computation time.

    @Machine Intelligence Lab | (May 2015 – Oct 2015)
    Developed an improved scheme for video classification using deeper convolutional neural networks for better accuracy and computation time.

  • Hindustan Aeronautics Limited (Bangalore, India):
  • Apprentice Engineer (Aug 2012 – May 2013)
    Worked on data analysis of the Solid State Digital Video Recording system and the Electronic Flight Instrument System for fault detection and system integration.


Research Papers / Hackathons / Academic Projects

  • Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning:
  • SYSML 2018 link
    The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires O(fp) effort, where f is the number of frameworks and p is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon-to-be-open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially supported frameworks include TensorFlow, MXNet, and the Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel Nervana Neural Network Processor (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations).
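
    A minimal sketch of the common-IR idea in Python: each framework needs one bridge into the shared representation and each platform needs one backend from it, so integration effort scales as O(f+p) rather than O(fp). The names below (bridge_tensorflow, backend_cpu) are illustrative inventions, not the actual nGraph API.

        # Toy shared-IR dispatch: f bridges + p backends cover all f*p pairs.
        def bridge_tensorflow(tf_ops):
            """One bridge per framework: translate framework ops to IR op names."""
            mapping = {"MatMul": "dot", "Relu": "relu"}
            return [mapping[op] for op in tf_ops]

        def backend_cpu(ir_graph):
            """One backend per platform: map IR ops to device kernels."""
            kernels = {"dot": lambda x: x,                       # placeholder kernel
                       "relu": lambda x: [max(v, 0) for v in x]}
            return [kernels[op] for op in ir_graph]

        # Compile a two-op "framework" graph and run it on the "CPU" backend.
        plan = backend_cpu(bridge_tensorflow(["MatMul", "Relu"]))
        data = [3.0, -1.0]
        for kernel in plan:
            data = kernel(data)   # data == [3.0, 0] after the relu kernel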


  • An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform:
  • Thesis link
    Continuous improvement in silicon process technologies has made possible the integration of hundreds of cores on a single chip. However, power and heat have become dominant constraints in designing these massive multicore chips, causing issues with reliability, timing variations, and reduced lifetime of the chips. Dynamic Thermal Management (DTM) is a solution to avoid high temperatures on the die. Typical DTM schemes only address core-level thermal issues. However, the Network-on-Chip (NoC) paradigm, which has emerged as an enabling methodology for integrating hundreds to thousands of cores on the same die, can contribute significantly to the thermal issues. Moreover, typical DTM is triggered reactively based on temperature measurements from on-chip thermal sensors, requiring long reaction times, whereas a predictive DTM method estimates future temperature in advance, eliminating the chance of temperature overshoot. Artificial Neural Networks (ANNs) have been used in various domains for modeling and prediction with high accuracy due to their ability to learn and adapt. This thesis concentrates on designing an ANN prediction engine to predict the thermal profile of the cores and Network-on-Chip elements of the chip. This thermal profile is then used by a predictive DTM that combines both core-level and network-level DTM techniques. An on-chip wireless interconnect, recently envisioned to enable energy-efficient data exchange between cores in a multicore environment, is used to provide a broadcast-capable medium to efficiently distribute thermal control messages that trigger and manage the DTM schemes.
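
    For flavor, a minimal sketch of such a prediction engine (not the thesis implementation): a small feed-forward regressor that maps a core's recent temperature history and utilization to its temperature one control interval ahead. The feature layout, network size, and threshold below are illustrative assumptions, with synthetic data standing in for sensor traces.

        # Proactive DTM sketch: predict the next temperature, act before overshoot.
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        # Assumed features: last 4 temperature samples + current core utilization.
        X = rng.uniform(40.0, 90.0, size=(1000, 5))
        # Synthetic target: next temperature tracks the latest sample and utilization.
        y = 0.8 * X[:, 3] + 0.2 * X[:, 4] + rng.normal(0, 0.5, size=1000)

        ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
        ann.fit(X, y)

        THRESHOLD_C = 85.0
        next_temp = ann.predict(X[:1])[0]
        if next_temp > THRESHOLD_C:
            print(f"Predicted {next_temp:.1f} C: trigger task reallocation early")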


  • OhSnap @HackPrinceton-2015 | Princeton University:
  • Identification of human finger snaps. Feature extraction using the cepstrum; Random Forest and PCA are used for training on the recorded data. Implemented in Python.
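
    A hedged sketch of that pipeline in Python (not the original hackathon code): the real cepstrum of an audio frame is the inverse FFT of its log magnitude spectrum, and the frame size, component count, and forest size below are illustrative guesses.

        # Illustrative snap-detection pipeline: cepstrum features -> PCA -> Random Forest.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.ensemble import RandomForestClassifier

        def real_cepstrum(frame):
            """Real cepstrum: inverse FFT of the log magnitude spectrum."""
            spectrum = np.abs(np.fft.fft(frame)) + 1e-12   # avoid log(0)
            return np.real(np.fft.ifft(np.log(spectrum)))

        rng = np.random.default_rng(0)
        frames = rng.standard_normal((200, 1024))    # stand-in for recorded audio frames
        labels = rng.integers(0, 2, size=200)        # 1 = snap, 0 = not a snap

        features = np.array([real_cepstrum(f) for f in frames])
        features = PCA(n_components=20).fit_transform(features)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, labels)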



  • SketchitUp @BrickHack-2015 | Rochester Institute of Technology:
  • A web app that helps users draw uploaded images by joining points (tracing the image). Built with machine learning, Python, OpenCV, HTML5, JS, and CSS.
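
    One speculative way to produce the traceable points, sketched in Python with OpenCV (not the original SketchitUp code; the thresholds and sampling step are assumptions):

        # Turn an uploaded image into ordered "connect the dots" points.
        import cv2

        def trace_points(image_path, step=10):
            """Detect edges, then sample every Nth point along the largest contours."""
            gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            edges = cv2.Canny(gray, 100, 200)
            contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_NONE)
            points = []
            for contour in sorted(contours, key=cv2.contourArea, reverse=True):
                points.extend(tuple(pt[0]) for pt in contour[::step])
            return points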


  • Literade @HackMIT-2015 | Massachusetts Institute of Technology:
  • A web app that converts boring articles into colorful, image-rich pages for children to read. Built with machine learning, Python, HTML, CSS, and JS.

  • Other hackathons attended
    • Tinder4Food (Android App) @HackNY-2015 | New York University.
    • HackBU-2016 | Binghamton University.
    • BrickHack2-2016 | Rochester Institute of Technology.


  • Moving Target Detection and Aiming - 2013:
  • Designed a robot arm that aims at moving targets in real time using a cat-toy laser. Regression was used for the arm movement, implemented in MATLAB.


  • e-Data Analysis - 2014:
  • Performed descriptive statistics on quantitative stock data to understand a company's stock performance.


  • Weather Prediction - 2014:
  • Analyzed historical temperatures statistically and predicted temperature using machine learning techniques such as PCA, SVD, and Bayesian networks.


  • Multi-Channel ADPCM CODEC (MCAC) - 2013:
  • Designed the RTL and verification models for the Adaptive Quantizer and the Tone & Transition models, and some of their sub-models, for the pipelined MCAC design.


  • Memory Access Bus Arbiter (ARB) of a DTMF receiver - 2013:
  • Designed the RTL model and performed verification, logic synthesis, test insertion, and detailed timing analysis using Verilog HDL.


  • Boundary Scan Sum - 2013:
  • Hierarchically designed a Boundary Scan Sum with optimal sizing and clean DRC and LVS in Cadence Virtuoso, in 0.6-micron technology.


  • Deep Learning for Image Classification - 2013:
  • Deep neural networks with feature extraction were implemented for image classification on the Caltech-101 dataset.

