
Prior work on image compression has focused on optimizing models to achieve better reconstruction at lower bit rates, typically by designing sophisticated architectures that enhance encoder or decoder performance. Some approaches jointly optimize both, together with a designed form of entropy encoding. In some instances, these approaches create many redundant components, which may or may not be useful.


This paper provides an in-depth study of a framework that combines trainable image and video codecs with machine task algorithms. The field of video coding optimization for machines is gaining traction, due to the increasing share of image and video content that is intended to be analyzed by machines rather than viewed by humans. Recent works in image compression have demonstrated the potential of end-to-end deep-learning-based auto-encoders that can be trained to optimally compress images and videos with respect to a target rate and any given differentiable quality metric.
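
Training against a target rate and a differentiable quality metric is commonly expressed as a rate-distortion objective of the form R + λ·D. The following is a minimal sketch of that objective, not the framework studied in the paper: the function names, the use of mean squared error as the distortion, and the per-sample rate normalization are all assumptions chosen for illustration.

```python
import math


def estimated_rate_bits(symbol_probs):
    """Ideal code length in bits for symbols with the given model probabilities."""
    return -sum(math.log2(p) for p in symbol_probs)


def mean_squared_error(original, reconstruction):
    """Distortion term; MSE is an assumed stand-in for any differentiable metric."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)


def rate_distortion_loss(original, reconstruction, symbol_probs, lam):
    """Hypothetical objective: bits per sample plus lam times the distortion."""
    rate = estimated_rate_bits(symbol_probs) / len(original)
    distortion = mean_squared_error(original, reconstruction)
    return rate + lam * distortion


# Toy values, invented for this example only.
loss = rate_distortion_loss(
    original=[0.0, 1.0, 2.0, 3.0],
    reconstruction=[0.0, 1.0, 2.0, 5.0],
    symbol_probs=[0.5, 0.5, 0.25, 0.25],
    lam=0.01,
)
```

A larger `lam` weights reconstruction quality more heavily and so pushes the trade-off toward higher bit rates. When the codec feeds a machine task, the distortion term could in principle be replaced by that task's loss.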


Holistic word recognition in handwritten documents is an important research topic in the field of Document Image Analysis. For some applications, given strong language models, it can be more robust and computationally less expensive than character segmentation and recognition. This paper presents HH-CompWordNet, a novel approach that applies a Convolutional Neural Network (CNN) directly to the DCT coefficients of compressed-domain word images.
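
For readers unfamiliar with compressed-domain input, the sketch below computes the orthonormal 2-D DCT-II of a single 8x8 block, the kind of coefficient array a compressed-domain network would consume in place of pixels. It is an illustration only: the function names are hypothetical, quantization is ignored, and in practice the coefficients would be read from the compressed file rather than recomputed from pixels.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][frequency]; transpose back to [frequency][column].
    return [list(row) for row in zip(*cols)]


# A flat block has no spatial variation, so only the DC coefficient is non-zero.
flat_block = [[10.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
```

With this normalization the DC coefficient of a constant 8x8 block equals 8 times the pixel value, and every other coefficient is zero up to floating-point error.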


Optimal power flow (OPF) is one of the most important optimization problems in the energy industry. In its simplest form, OPF attempts to find the optimal power that the generators within the grid have to produce to satisfy a given demand. Optimality is measured with respect to the cost that each generator incurs in producing this power. The OPF problem is non-convex due to the sinusoidal nature of electrical generation and thus is difficult to solve.
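
To make the cost-minimization idea concrete, the sketch below solves a heavily simplified dispatch problem: quadratic generator costs, a single demand-balance constraint, and no network model or generator limits. Dropping the power flow equations removes exactly the non-convexity described above, so this is not OPF itself. The cost coefficients and function name are assumptions made for illustration.

```python
def economic_dispatch(quad_costs, lin_costs, demand):
    """Minimize sum(a_i * p_i**2 + b_i * p_i) subject to sum(p_i) == demand.

    Assumes every a_i > 0 and ignores generator limits and the grid itself.
    At the optimum all generators share one marginal cost: 2*a_i*p_i + b_i = lam.
    """
    inverse_slopes = [1.0 / (2.0 * a) for a in quad_costs]
    offsets = [b / (2.0 * a) for a, b in zip(quad_costs, lin_costs)]
    lam = (demand + sum(offsets)) / sum(inverse_slopes)
    outputs = [(lam - b) / (2.0 * a) for a, b in zip(quad_costs, lin_costs)]
    return outputs, lam


# Two hypothetical generators serving a demand of 300 units.
outputs, marginal_cost = economic_dispatch([0.01, 0.02], [2.0, 1.0], demand=300.0)
```

Because limits are ignored, an output can come out negative for some inputs. A practical solver would add bounds and the power flow constraints, which is where the difficulty of the full problem arises.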


Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a perceptual-driven approach for single image super-resolution that is able to produce photorealistic images. Despite the visual quality of these generated images, there is still room for improvement. To this end, we extend the model to further improve the perceptual quality of the images. We have designed a network architecture with a novel basic block to replace the one used by the original ESRGAN. Moreover, we introduce noise inputs to the generator network in order to exploit stochastic variation.


Reinforcement Learning enables an agent to be trained via interaction with the environment. However, in the majority of real-world scenarios, the extrinsic feedback is sparse or insufficient, so intrinsic reward formulations are needed to train the agent successfully. This work investigates and extends the paradigm of curiosity-driven exploration. First, a probabilistic approach is taken to exploit the advantages of the attention mechanism, which has been applied successfully in other domains of Deep Learning.
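
As background, a common curiosity formulation uses the prediction error of a learned forward model as the intrinsic reward. The sketch below shows that generic idea only; it does not include the probabilistic or attention-based extensions mentioned above, and the function names and the scaling factor `eta` are assumptions.

```python
def intrinsic_reward(predicted_next, actual_next, eta=1.0):
    """Curiosity bonus: scaled squared error between predicted and observed next-state features."""
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


def total_reward(extrinsic, predicted_next, actual_next, eta=1.0):
    """Reward used for training: sparse extrinsic signal plus the curiosity bonus."""
    return extrinsic + intrinsic_reward(predicted_next, actual_next, eta)


# With no extrinsic feedback, a poorly predicted transition still yields a learning signal.
bonus = total_reward(0.0, predicted_next=[1.0, 2.0], actual_next=[1.0, 4.0])
```

Transitions the forward model already predicts well earn little bonus, so the agent is nudged toward states it has not yet learned to anticipate.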