Publications
November 2021 · IEEE
We propose a method to recover and restore artwork that has been damaged over time by several factors. Our method produces strong results, completely removing damage in most images and accurately estimating the damaged regions. We achieve accurate results due to (i) a custom data augmentation technique that depicts realistic damage rather than just blobs, (ii) novel CResNetBlocks that successively upsample and downsample features to restore the image with efficient backpropagation, and (iii) the choice of patch discriminators to achieve sharpness and colorfulness. Our network architecture is a conditional Generative Adversarial Network in which the generator is optimized with a combination of adversarial and L1 losses, while the discriminator uses a binary cross-entropy loss. Since existing methods offer limited grounds for comparison, we report our results on several metrics for future comparison and showcase visuals of recovered artwork.
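A minimal sketch of the loss formulation described above, assuming PyTorch; the L1 weight of 100 is the common pix2pix default and is an assumption, not a value from the paper.

```python
# Minimal sketch of the cGAN objective described above (assumed PyTorch;
# the L1 weight of 100 is the common pix2pix default, not a paper value).
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # discriminator's binary cross-entropy
l1 = nn.L1Loss()              # generator's reconstruction term

def generator_loss(patch_logits_fake, restored, target, lambda_l1=100.0):
    # patch_logits_fake: patch-discriminator logits for the restored image,
    # one logit per image patch (this is what makes it a patch discriminator).
    adversarial = bce(patch_logits_fake, torch.ones_like(patch_logits_fake))
    return adversarial + lambda_l1 * l1(restored, target)

def discriminator_loss(patch_logits_real, patch_logits_fake):
    real = bce(patch_logits_real, torch.ones_like(patch_logits_real))
    fake = bce(patch_logits_fake, torch.zeros_like(patch_logits_fake))
    return 0.5 * (real + fake)
```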
May 2021 · IEEE

Facial emotion recognition and pose estimation have been topics of great research interest, especially over the last two decades. While several methods have achieved commendable results in both domains, most have focused on the fidelity and accuracy of predictions. We believe that combining facial emotion recognition with body pose/posture estimation provides a more comprehensive understanding of a person's actions and behavior. We propose an end-to-end pipeline that combines (i) pose estimation and (ii) facial emotion recognition to analyze human behavior. We achieve real-time performance due to an optimal network size and fewer parameters than existing techniques, with a minimal trade-off in accuracy. We train the networks separately and combine them at inference time to form an end-to-end pipeline. Our technique has low inference time and achieves real-time performance without compromising accuracy, which can benefit real-world applications such as surveillance, remote supervision, and proctoring systems.
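A minimal sketch of how two separately trained networks can be merged at inference time, as described above; the model callables are hypothetical placeholders, not the paper's actual networks.

```python
# Illustrative sketch of the inference-time combination described above.
# The model callables are hypothetical placeholders, not the paper's networks.
from typing import Any, Callable, Dict

def build_pipeline(pose_model: Callable, emotion_model: Callable) -> Callable:
    """Merge two independently trained models into one end-to-end call."""
    def pipeline(frame) -> Dict[str, Any]:
        keypoints = pose_model(frame)    # body joints per detected person
        emotions = emotion_model(frame)  # emotion label per detected face
        # A single merged record per frame gives a fuller view of behavior.
        return {"pose": keypoints, "emotion": emotions}
    return pipeline

# Usage: analyze = build_pipeline(pose_net, emotion_net)
#        result = analyze(frame)  # both models run on each incoming frame
```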
Master's Thesis
Evaluating Multi-View Stereo in a CPC Scenario
Multi-view Stereo (MVS) has been a fundamental research problem in computer vision and has been studied comprehensively for several years. Recent advancements in deep learning have shown that learning-based MVS networks can yield promising results when replacing parts of the traditional MVS pipeline. However, their performance and generalization ability on internet images, i.e. community photo collections (CPCs), remains an open question. CPCs consist of crowd-sourced images captured in the wild with drastically different cameras and viewpoints, often containing occluded entities, which is very challenging for learning-based MVS methods. We propose to evaluate the depth estimation and reconstruction quality of current MVS methods on internet images and prototype a new framework that can provide accurate and complete 3D reconstruction of scenes in a CPC scenario. Specifically, we analyze the performance of existing MVS networks on internet images and provide a detailed discussion of the capabilities and shortcomings of these approaches. We propose a supplementary mask estimation module that masks occlusions/noise in the foreground of these internet images, such as people, buildings, and objects.
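As a rough illustration of the mask-estimation step, a binary occluder mask can be derived from a per-pixel semantic segmentation; the class list below is an assumed example, not the thesis's actual module.

```python
# Rough illustration of the mask-estimation idea: derive a binary
# occluder mask from a per-pixel semantic segmentation. The class list
# is an assumed example; the thesis's actual module may differ.
import numpy as np

OCCLUDER_CLASSES = {"person", "car", "bus", "bicycle"}  # assumed occluders

def occluder_mask(class_id_map: np.ndarray, class_names: list) -> np.ndarray:
    """True where a pixel belongs to a foreground occluder to handle separately."""
    ids = [i for i, n in enumerate(class_names) if n in OCCLUDER_CLASSES]
    return np.isin(class_id_map, ids)
```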
Subsequently, we propose a depth alignment and depth estimation module that computes relative depth values for these masked foreground pixels and aligns them with the absolute depth of the scene. This yields accurate absolute depth maps that can be fused into a complete point cloud of the scene, foreground objects included, rather than a point cloud containing only the main contextual entity such as a building or a monument. The proposed methodology produces accurate 3D reconstructions from internet images, demonstrating its robustness for depth estimation and 3D reconstruction in the wild.
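One common way to perform such an alignment is a least-squares fit of a scale and shift between the relative depth prediction and the absolute MVS depth on reliable background pixels; the sketch below shows that idea under that assumption, not the thesis's exact formulation.

```python
# Sketch of the depth-alignment idea: fit scale s and shift t so that
# s * relative_depth + t matches the absolute MVS depth on reliable
# background pixels, then apply the fit everywhere (including masked
# foreground pixels). This least-squares scheme is an assumption, not
# necessarily the thesis's exact formulation.
import numpy as np

def align_relative_depth(rel_depth, abs_depth, background_mask):
    r = rel_depth[background_mask].ravel()
    a = abs_depth[background_mask].ravel()
    A = np.stack([r, np.ones_like(r)], axis=1)      # design matrix [rel, 1]
    (s, t), *_ = np.linalg.lstsq(A, a, rcond=None)  # least-squares fit
    return s * rel_depth + t                        # aligned absolute depth map
```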