Artificial Intelligence Preprint | 2019-07-14

Artificial Intelligence


Learning by Abstraction: The Neural State Machine (1907.03950v2)

Drew A. Hudson, Christopher D. Manning

2019-07-09

We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning. Given an image, we first predict a probabilistic graph that represents its underlying semantics and serves as a structured world model. Then, we perform sequential reasoning over the graph, iteratively traversing its nodes to answer a given question or draw a new inference. In contrast to most neural architectures that are designed to closely interact with the raw sensory data, our model operates instead in an abstract latent space, by transforming both the visual and linguistic modalities into semantic concept-based representations, thereby achieving enhanced transparency and modularity. We evaluate our model on VQA-CP and GQA, two recent VQA datasets that involve compositionality, multi-step inference and diverse reasoning skills, achieving state-of-the-art results in both cases. We provide further experiments that illustrate the model's strong generalization capacity across multiple dimensions, including novel compositions of concepts, changes in the answer distribution, and unseen linguistic structures, demonstrating the qualities and efficacy of our approach.
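
To make "iteratively traversing its nodes" concrete, here is a toy sketch of soft traversal over a probabilistic graph. The reasoner keeps a probability distribution over nodes and, at each step, shifts mass along edges in proportion to how relevant each edge is to the current instruction. The scene graph, the relevance scores and the function name `soft_traverse` are hypothetical illustrations, not the paper's model, which learns these quantities from the image and the question.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source node, target node)


def soft_traverse(nodes: List[str], start: Dict[str, float],
                  steps: List[Dict[Edge, float]]) -> Dict[str, float]:
    """Shift a distribution over graph nodes along instruction-relevant edges.

    Each element of `steps` maps edges to a non-negative relevance score for
    one reasoning instruction (hypothetical here; a trained model would
    compute these scores from the question and the scene graph).
    """
    dist = {n: start.get(n, 0.0) for n in nodes}
    for relevance in steps:
        nxt = {n: 0.0 for n in nodes}
        for (src, dst), score in relevance.items():
            # Mass flows from src to dst in proportion to the edge's relevance.
            nxt[dst] += dist[src] * score
        total = sum(nxt.values())
        if total == 0.0:
            raise ValueError("no relevant edge leaves the attended nodes")
        dist = {n: v / total for n, v in nxt.items()}
    return dist


# Toy scene graph for the question "What is the man holding?".
nodes = ["man", "umbrella", "street"]
holding_step = {("man", "umbrella"): 0.9, ("man", "street"): 0.1}
answer = soft_traverse(nodes, {"man": 1.0}, [holding_step])
print(max(answer, key=answer.get))  # umbrella
```

Keeping a full distribution rather than a single hard pointer is what makes such a traversal differentiable end to end.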

Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle (1907.05390v1)

Guojun Wu, Yanhua Li, Zhenming Liu, Jie Bao, Yu Zheng, Jieping Ye, Jun Luo

2019-07-11

Many real-world human behaviors can be characterized as sequential decision-making processes, such as urban travelers' choices of transport modes and routes (Wu et al. 2017). Whereas machine-controlled choices generally follow perfect rationality and adopt the policy with the highest reward, studies have revealed that human agents make sub-optimal decisions under bounded rationality (Tao, Rohde, and Corcoran 2014). Such behaviors can be modeled using the maximum causal entropy (MCE) principle (Ziebart 2010). In this paper, we define and investigate a general reward transformation problem (namely, reward advancement): recovering the range of additional reward functions that transform the agent's policy from its original policy to a predefined target policy under the MCE principle. We show that, given an MDP and a target policy, there are infinitely many additional reward functions that can achieve the desired policy transformation. Moreover, we propose an algorithm to extract the additional rewards with minimum "cost" to implement the policy transformation.
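
Under the MCE principle the agent's policy is the softmax of soft Q-values, which is why an added reward can steer it. The sketch below is our own elementary construction, not the paper's algorithm. It runs soft value iteration on a hypothetical two-state MDP and checks two facts. First, the additional reward r_add(s, a) = log π*(a|s) − r(s, a) produces the target policy π*. Second, adding any potential-based shaping term on top produces the same policy, which is one simple way to see why infinitely many additional rewards exist. The MDP, the potentials and all function names are assumptions.

```python
import math
from typing import List

Matrix2 = List[List[float]]         # indexed [state][action]
Matrix3 = List[List[List[float]]]   # indexed [state][action][next_state]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def mce_policy(P: Matrix3, R: Matrix2, gamma: float = 0.9, iters: int = 500) -> Matrix2:
    """Soft value iteration: Q = R + gamma * E[V(s')], V = logsumexp_a Q, pi = exp(Q - V)."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    Q = R
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [logsumexp(row) for row in Q]
    return [[math.exp(Q[s][a] - V[s]) for a in range(n_a)] for s in range(n_s)]


def advancement_reward(R: Matrix2, target: Matrix2) -> Matrix2:
    """One additional reward reaching `target`: the total reward becomes log pi*(a|s)."""
    return [[math.log(target[s][a]) - R[s][a] for a in range(len(R[s]))]
            for s in range(len(R))]


def shaping_term(P: Matrix3, phi: List[float], gamma: float = 0.9) -> Matrix2:
    """Potential-based term F(s, a) = gamma * E[phi(s')] - phi(s)."""
    n_s = len(phi)
    return [[gamma * sum(P[s][a][t] * phi[t] for t in range(n_s)) - phi[s]
             for a in range(len(P[s]))] for s in range(n_s)]


def add(*mats: Matrix2) -> Matrix2:
    """Element-wise sum of reward tables."""
    return [[sum(m[s][a] for m in mats) for a in range(len(mats[0][s]))]
            for s in range(len(mats[0]))]


# Hypothetical MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0], [0.5, 0.0]]
target = [[0.8, 0.2], [0.3, 0.7]]

r_add = advancement_reward(R, target)
pi_1 = mce_policy(P, add(R, r_add))
# A different additional reward (r_add plus shaping) yields the same policy.
pi_2 = mce_policy(P, add(R, r_add, shaping_term(P, [2.0, -1.0])))
```

The shaping term shifts the soft Q-values and V of each state by the same amount −φ(s), so the softmax over actions is unchanged. This also requires the same γ in `shaping_term` and `mce_policy`.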

Weakly Supervised Person Re-ID: Differentiable Graphical Learning and A New Benchmark (1904.03845v2)

Guangrun Wang, Guangcong Wang, Xujie Zhang, Jianhuang Lai, Liang Lin

2019-04-08

Person re-identification (Re-ID) benefits greatly from the accurate annotations of existing datasets (e.g., CUHK03 \cite{li2014deepreid} and Market-1501 \cite{zheng2015scalable}), which are quite expensive to produce because each image in these datasets has to be assigned a proper label. In this work, we ease the annotation of Re-ID by replacing accurate annotation with inaccurate annotation, i.e., we group the images into bags by time and assign a bag-level label to each bag. This greatly reduces the annotation effort and leads to the creation of a large-scale Re-ID benchmark called SYSU-30k. The new benchmark contains categories of persons, which is about times larger than CUHK03 ( categories) and Market-1501 ( categories), and times larger than ImageNet ( categories). It sums up to 29,606,918 images. Learning a Re-ID model with bag-level annotation is called the weakly supervised Re-ID problem. To solve this problem, we introduce a differentiable graphical model to capture the dependencies among all images in a bag and generate a reliable pseudo label for each person image. The pseudo label is further used to supervise the learning of the Re-ID model. When compared with fully supervised Re-ID models, our method achieves state-of-the-art performance on SYSU-30k and other datasets. The code, dataset, and pretrained model will be available at https://github.com/wanggrun/SYSU-30k.
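
Bag-level annotation only says which identities appear somewhere in a bag. A minimal sketch of turning that into per-image pseudo labels restricts each image's choice to the bag's label set. It assumes each image already has classifier scores over identities. This independent per-image argmax is a simplified stand-in of our own. The paper's differentiable graphical model additionally captures dependencies among all images in the bag. The identity names and scores are hypothetical.

```python
from typing import Dict, List, Set


def bag_pseudo_labels(scores: List[Dict[str, float]], bag_labels: Set[str]) -> List[str]:
    """Give each image in a bag its highest-scoring identity allowed by the bag label."""
    if not bag_labels:
        raise ValueError("a bag must carry at least one identity label")
    labels = []
    for per_image in scores:
        # Identities outside the bag are ruled out by the weak annotation.
        allowed = {pid: s for pid, s in per_image.items() if pid in bag_labels}
        if not allowed:
            raise ValueError("no score for any identity in the bag")
        labels.append(max(allowed, key=allowed.get))
    return labels


scores = [
    {"p1": 0.2, "p2": 0.1, "p9": 0.7},  # p9 scores highest but is not in the bag
    {"p1": 0.3, "p2": 0.6, "p9": 0.1},
]
print(bag_pseudo_labels(scores, {"p1", "p2"}))  # ['p1', 'p2']
```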

A Formalization of Kant's Second Formulation of the Categorical Imperative (1801.03160v3)

Felix Lindner, Martin Mose Bentzen

2018-01-09

We present a formalization and computational implementation of the second formulation of Kant's categorical imperative. This ethical principle requires an agent to never treat someone merely as a means but always also as an end. Here we interpret this principle in terms of how persons are causally affected by actions. We introduce Kantian causal agency models in which moral patients, actions, goals, and causal influence are represented, and we show how to formalize several readings of Kant's categorical imperative that correspond to Kant's concept of strict and wide duties towards oneself and others. Stricter versions handle cases where an action directly causally affects oneself or others, whereas the wide version maximizes the number of persons being treated as an end. We discuss limitations of our formalization by pointing to one of Kant's cases that the machinery cannot handle in a satisfying way.
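
As a toy illustration of the principle being formalized, suppose each candidate action records two sets: the persons it uses as a means to its goal, and the persons its goal benefits, i.e. treats as an end. This is not the paper's Kantian causal agency models. The strict check forbids using anyone merely as a means. The wide reading then prefers, among permissible actions, the one that treats the most persons as an end. The `Action` type, the function names and the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    means: FrozenSet[str] = field(default_factory=frozenset)  # persons used as a means
    ends: FrozenSet[str] = field(default_factory=frozenset)   # persons treated as an end


def permissible(action: Action) -> bool:
    """Nobody may be used merely as a means: every means must also be an end."""
    return action.means <= action.ends


def wide_choice(actions: List[Action]) -> Optional[Action]:
    """Among permissible actions, pick one treating the most persons as an end."""
    allowed = [a for a in actions if permissible(a)]
    return max(allowed, key=lambda a: len(a.ends), default=None)


deceive = Action("deceive Bob", means=frozenset({"Bob"}), ends=frozenset({"Alice"}))
ask = Action("ask Bob for help", means=frozenset({"Bob"}), ends=frozenset({"Alice", "Bob"}))
print(permissible(deceive), permissible(ask))  # False True
print(wide_choice([deceive, ask]).name)        # ask Bob for help
```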

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning (1905.04819v4)

Yilun Du, Karthik Narasimhan

2019-05-13

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment. A wide variety of domains have dynamics that share common foundations like the laws of classical mechanics, which are rarely exploited by existing algorithms. In fact, humans continuously acquire and use such dynamics priors to easily adapt to operating in new environments. In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. Our method involves pre-training a frame predictor on task-agnostic physics videos to initialize dynamics models (and fine-tune them) for unseen target environments. Our frame prediction architecture, SpatialNet, is designed specifically to capture localized physical phenomena and interactions. Our approach allows for both faster policy learning and convergence to better policies, outperforming competitive approaches on several different environments. We also demonstrate that incorporating this prior allows for more effective transfer between environments.

XGBoostLSS -- An extension of XGBoost to probabilistic forecasting (1907.03178v2)

Alexander März

2019-07-06

We propose a new framework of XGBoost that predicts the entire conditional distribution of a univariate response variable. In particular, XGBoostLSS models all moments of a parametric distribution, i.e., mean, location, scale and shape (LSS), instead of the conditional mean only. Since the distribution can be chosen from a wide range of continuous, discrete and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost: it allows one to gain additional insight into the data-generating process and to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. We present both a simulation study and real-world examples that demonstrate the benefits of our approach.
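
Gradient boosting needs a gradient and Hessian of the loss for every quantity it fits. Predicting a distribution therefore means differentiating the negative log-likelihood with respect to each distribution parameter. Below is a minimal sketch for a Gaussian with location μ and log-scale η = log σ. The log link, which keeps σ positive, is our assumption. The sketch also shows how a predicted (μ, σ) yields quantiles and a prediction interval. It is not XGBoostLSS's actual API.

```python
import math
from statistics import NormalDist
from typing import Tuple


def gaussian_nll_grad_hess(y: float, mu: float, log_sigma: float) -> Tuple[float, float, float, float]:
    """Gradient and Hessian of the Gaussian NLL w.r.t. mu and log_sigma.

    NLL = log_sigma + (y - mu)^2 / (2 * sigma^2) + 0.5 * log(2 * pi)
    """
    sigma2 = math.exp(2.0 * log_sigma)
    resid = y - mu
    g_mu = -resid / sigma2
    h_mu = 1.0 / sigma2
    z2 = resid * resid / sigma2
    g_ls = 1.0 - z2          # d NLL / d log_sigma
    h_ls = 2.0 * z2          # d^2 NLL / d log_sigma^2
    return g_mu, h_mu, g_ls, h_ls


def prediction_interval(mu: float, sigma: float, coverage: float = 0.9) -> Tuple[float, float]:
    """Central interval read off the predicted conditional distribution."""
    dist = NormalDist(mu, sigma)
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


print(gaussian_nll_grad_hess(y=2.0, mu=1.0, log_sigma=0.0))  # (-1.0, 1.0, 0.0, 2.0)
lo, hi = prediction_interval(mu=10.0, sigma=2.0, coverage=0.95)
```

In practice a small floor on the log-scale Hessian may be needed, since 2z² vanishes when y equals μ.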

Safe Policy Improvement with Soft Baseline Bootstrapping (1907.05079v1)

Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

2019-07-11

Batch Reinforcement Learning (Batch RL) consists of training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) guarantees, with high probability, that the trained policy performs better than the behavioural policy, also called the baseline in this setting. Previous work shows that the SPI objective improves mean performance compared to the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work and, more precisely, improve the SPI with Baseline Bootstrapping algorithm (SPIBB) by allowing the policy search over a wider set of policies. Instead of binarily classifying the state-action pairs into two sets (the uncertain ones and the safe-to-train-on ones), we adopt a softer strategy that controls the error in the value estimates by constraining the policy change according to the local model uncertainty. The method can take more risks on uncertain actions while remaining provably safe, and is therefore less conservative than state-of-the-art methods. We propose two algorithms (one optimal and one approximate) to solve this constrained optimization problem and empirically show a significant improvement over existing SPI algorithms, both on finite MDPs and on infinite MDPs with neural network function approximation.
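
The softer strategy limits how far the new policy may move from the baseline in each state. Each action's change is weighted by an error estimate e(s, a) for its value. The sketch below assumes a per-state constraint of the form Σ_a e(s, a)·|π(a|s) − π_b(a|s)| ≤ ε. It greedily shifts probability mass from low-value actions to the highest-value one until the budget ε is spent. It is an illustrative greedy of our own under that assumption, not a transcription of either algorithm in the paper. All names and numbers are hypothetical.

```python
from typing import List


def soft_constrained_improvement(q: List[float], baseline: List[float],
                                 errors: List[float], eps: float) -> List[float]:
    """Greedy policy change in one state under sum_a e(a) * |pi(a) - pi_b(a)| <= eps."""
    best = max(range(len(q)), key=q.__getitem__)
    policy = list(baseline)
    budget = eps
    # Take probability mass from the lowest-valued actions first.
    for a in sorted(range(len(q)), key=q.__getitem__):
        if a == best or budget <= 0.0:
            continue
        # Moving m from a to best changes both entries by m: cost m * (e(a) + e(best)).
        unit_cost = errors[a] + errors[best]
        moved = policy[a] if unit_cost == 0.0 else min(policy[a], budget / unit_cost)
        policy[a] -= moved
        policy[best] += moved
        budget -= moved * unit_cost
    return policy


def constraint_cost(policy: List[float], baseline: List[float], errors: List[float]) -> float:
    return sum(e * abs(p - b) for p, b, e in zip(policy, baseline, errors))


q = [1.0, 0.2, 0.5]
baseline = [0.2, 0.5, 0.3]
errors = [0.5, 0.1, 2.0]  # action 2's value estimate is highly uncertain
pi = soft_constrained_improvement(q, baseline, errors, eps=0.3)
print([round(p, 3) for p in pi])  # [0.7, 0.0, 0.3]
```

Here the well-estimated action 1 gives up its mass, while the uncertain action 2 keeps its baseline probability because changing it would consume the budget quickly.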

Deep Ordinal Reinforcement Learning (1905.02005v2)

Alexander Zap, Tobias Joppen, Johannes Fürnkranz

2019-05-06

Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, we present and motivate a general approach to adapting reinforcement learning problems to the use of ordinal rewards. We show how to convert common reinforcement learning algorithms into ordinal variants, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants perform comparably to the numerical variants on a number of problems. We also give first evidence that our ordinal variant can produce better results for problems with less engineered, simpler-to-design reward signals.
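
Without numeric rewards, actions can still be compared through the distribution of ordinal outcomes each one has produced. The sketch below keeps, per action, counts over ordered reward levels. It scores each action by its average probability of beating another action's outcome, with ties counted as one half. We assume this standard dominance measure for illustration. The paper's exact comparison rule for Ordinal Deep Q-Networks may differ. The counts and function names are hypothetical.

```python
from typing import List


def win_probability(p: List[float], q: List[float]) -> float:
    """P(X > Y) + 0.5 * P(X = Y) for independent ordinal X ~ p, Y ~ q (levels ascending)."""
    win = sum(p[i] * q[j] for i in range(len(p)) for j in range(i))
    tie = sum(pi * qi for pi, qi in zip(p, q))
    return win + 0.5 * tie


def normalise(counts: List[int]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]


def best_action(counts_per_action: List[List[int]]) -> int:
    """Pick the action whose outcome distribution beats the others most often."""
    dists = [normalise(c) for c in counts_per_action]
    n = len(dists)

    def score(a: int) -> float:
        others = [b for b in range(n) if b != a]
        return sum(win_probability(dists[a], dists[b]) for b in others) / len(others)

    return max(range(n), key=score)


# Reward levels ordered from worst to best: "lose", "draw", "win".
counts = [
    [5, 0, 5],   # action 0: risky
    [0, 10, 0],  # action 1: always draws
    [6, 0, 4],   # action 2
]
print(best_action(counts))  # 1
```

No arithmetic is ever done on the reward levels themselves; only their order matters.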

DeepIlluminance: Contextual Illuminance Estimation via Deep Neural Networks (1905.04791v2)

Jun Zhang, Tong Zheng, Shengping Zhang, Meng Wang

2019-05-12

Computational color constancy refers to the estimation of the scene illumination, which keeps the perceived color relatively stable under varying illumination. In the past few years, deep Convolutional Neural Networks (CNNs) have delivered superior performance in illuminant estimation. Several representative methods formulate it as a multi-label prediction problem by learning the local appearance of image patches using CNNs. However, these approaches inevitably make incorrect estimates for ambiguous patches affected by their neighborhood contexts. Inaccurate local estimates are likely to degrade performance when combined into a global prediction. To address these issues, we propose a contextual deep network for patch-based illuminant estimation equipped with refinement. First, the contextual net, with a center-surround architecture, extracts local contextual features from image patches and generates initial illuminant estimates and the corresponding color-corrected patches. The patches are sampled based on the observation that pixels with large color differences describe the illumination well. Then, the refinement net integrates the input patches with the corrected patches, in conjunction with intermediate features, to improve performance. To train such a network with numerous parameters, we propose a stage-wise training strategy, in which the features and the predicted illuminant from previous stages are provided to the next learning stage so that finer estimates are recovered. Experiments show that our approach obtains competitive performance on two illuminant estimation benchmarks.
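
Two steps in this pipeline are easy to state in code: producing a color-corrected patch from an illuminant estimate, and combining local estimates into a global one. The sketch assumes the common diagonal (von Kries) correction, which divides each channel by the illuminant. It aggregates patch estimates with a per-channel median. It also includes the angular error commonly used to score illuminant estimates. These are standard choices we assume for illustration, not the paper's networks. The sample values are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def normalise(rgb: Sequence[float]) -> RGB:
    """Scale an RGB triple to unit length (illuminants are defined up to scale)."""
    n = math.sqrt(sum(c * c for c in rgb))
    return (rgb[0] / n, rgb[1] / n, rgb[2] / n)


def correct_patch(pixels: List[RGB], illuminant: RGB) -> List[RGB]:
    """Diagonal (von Kries) correction: divide each channel by the illuminant."""
    scale = [c * math.sqrt(3.0) for c in normalise(illuminant)]  # white light -> 1
    return [tuple(p[k] / scale[k] for k in range(3)) for p in pixels]


def combine_estimates(local: List[RGB]) -> RGB:
    """Global estimate as the per-channel median of local patch estimates."""
    return normalise([median(e[k] for e in local) for k in range(3)])


def angular_error_deg(a: RGB, b: RGB) -> float:
    """Angle in degrees between two illuminant vectors."""
    na, nb = normalise(a), normalise(b)
    cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(na, nb))))
    return math.degrees(math.acos(cos))


local = [(0.6, 0.5, 0.3), (0.62, 0.5, 0.31), (0.2, 0.9, 0.1)]  # last patch is an outlier
estimate = combine_estimates(local)
print(angular_error_deg(estimate, (0.6, 0.5, 0.3)))
```

The median is used here because it ignores the outlier patch, which is exactly the failure mode of naive local-to-global combination that the abstract describes.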

Evidential positive opinion influence measures for viral marketing (1907.05028v1)

Siwar Jendoubi, Arnaud Martin

2019-07-11

Viral marketing is a relatively new form of marketing that exploits social networks to promote a brand, a product, etc. The idea behind it is to find a set of influencers on the network who can trigger a large cascade of propagation and adoption. In this paper, we introduce an evidential opinion-based influence maximization model for viral marketing. Our approach tackles three opinion-based scenarios for viral marketing in the real world. The first scenario concerns influencers who have a positive opinion about the product. The second scenario deals with influencers who have a positive opinion about the product and affect users who also have a positive opinion. The third scenario involves influencers who have a positive opinion about the product and affect the negative opinion that other users hold about the product in question. Next, we propose six influence measures, two for each scenario. We also use an influence maximization model to detect the set of influencers for each scenario. Finally, we show the performance of the proposed model with each influence measure through experiments conducted on a generated dataset and a real-world dataset collected from Twitter.
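
The influence maximization step can be illustrated with classic greedy seed selection: repeatedly add the user whose addition most increases the number of reached users. The sketch assumes a deliberately simple one-hop spread model in which a seed reaches its direct followers. It mimics the second scenario by considering only positive-opinion seeds and counting only positive-opinion users reached. The paper's evidential (belief-function) influence measures are not reproduced here. The graph, opinions and function names are hypothetical.

```python
from typing import Dict, List, Set


def reached(seeds: Set[str], followers: Dict[str, List[str]], positive: Set[str]) -> Set[str]:
    """Positive-opinion users reached in one hop from the seeds (seeds included if positive)."""
    out = {s for s in seeds if s in positive}
    for s in seeds:
        out.update(f for f in followers.get(s, []) if f in positive)
    return out


def greedy_influencers(followers: Dict[str, List[str]], positive: Set[str], k: int) -> List[str]:
    """Greedily choose up to k positive-opinion seeds maximising positive reach."""
    chosen: List[str] = []
    candidates = [u for u in followers if u in positive]
    for _ in range(min(k, len(candidates))):
        current = reached(set(chosen), followers, positive)
        best = max((u for u in candidates if u not in chosen),
                   key=lambda u: len(reached(set(chosen) | {u}, followers, positive) - current))
        chosen.append(best)
    return chosen


followers = {
    "ana": ["bo", "cy", "di"],
    "bo": ["cy"],
    "eve": ["fay", "gus"],
    "hal": ["ana", "bo", "cy", "di"],
}
positive = {"ana", "bo", "cy", "eve", "fay", "gus"}  # di and hal hold negative opinions
seeds = greedy_influencers(followers, positive, k=2)
print(seeds)  # ['ana', 'eve']
```

After "ana" is chosen, "bo" adds no new positive users, so the greedy step prefers "eve", whose audience does not overlap.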


