Data Structures And Algorithms | 2019-04-18



Faster and simpler algorithms for finding large patterns in permutations (1902.08809v2)

László Kozma

2019-02-23

Permutation patterns and pattern avoidance have been intensively studied in combinatorics and computer science, going back at least to the seminal work of Knuth on stack-sorting (1968). Perhaps the most natural algorithmic question in this area is deciding whether a given permutation of length n contains a given pattern of length k. In this work we give two new algorithms for this well-studied problem: one whose running time is n^{k/4 + o(k)}, and one whose running time is the better of O(1.6181^n) and O(n^{k/2 + 1}). These results improve the earlier best bounds of Ahal and Rabinovich (2000) and of Bruner and Lackner (2012), and are the fastest algorithms for the problem when the pattern is long relative to the permutation; for short patterns, the parameterized algorithm of Guillemot and Marx (2013) dominates. Our second algorithm uses polynomial space and is significantly simpler than all previous approaches with comparable running times, including an algorithm proposed by Guillemot and Marx. Our approach can be summarized as follows: "for every matching of the even-valued entries of the pattern, try to match all odd-valued entries left-to-right". For the special case of patterns that are Jordan permutations, we show an improved, subexponential running time.
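
The abstract above is about deciding containment; as a point of reference, here is a minimal brute-force containment check in Python (not the paper's algorithm, which avoids enumerating all subsequences). The function name and the example permutations are my own.

```python
from itertools import combinations

def contains_pattern(perm, pattern):
    """Return True if some subsequence of `perm` is order-isomorphic to
    `pattern`. Brute force over all length-k subsequences -- exponential
    in k, for intuition only."""
    k = len(pattern)
    # Argsort of the pattern, e.g. [1, 3, 2] -> (0, 2, 1).
    pat_ranks = tuple(sorted(range(k), key=lambda i: pattern[i]))
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if tuple(sorted(range(k), key=lambda i: sub[i])) == pat_ranks:
            return True
    return False

# 2-4-1-3 contains the pattern 1-3-2 (e.g. via the subsequence 2, 4, 3).
assert contains_pattern([2, 4, 1, 3], [1, 3, 2])
assert not contains_pattern([3, 2, 1], [1, 2])  # no increasing pair
```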

ProUM: Projection-based Utility Mining on Sequence Data (1904.07764v1)

Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Han-Chieh Chao, Hamido Fujita, Philip S. Yu

2019-04-16

In the last decade, utility mining has attracted a great deal of attention, but most existing studies are developed to deal with itemset-based data. Different from itemset-based data, time-ordered sequence data is more commonly seen in real-world situations. Current utility mining algorithms have limitations when dealing with sequence data, since they are time-consuming and require a large amount of memory. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. The utility-array structure is designed to store the necessary information about sequence order and utility. By utilizing the projection technique in generating the utility-array, ProUM can significantly improve mining efficiency and effectively reduce memory consumption. Besides, we propose a new upper bound named sequence extension utility. Several pruning strategies are further applied to improve the efficiency of ProUM. Experimental results show that the proposed ProUM algorithm significantly outperforms the state-of-the-art algorithms.
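
To make the utility notion concrete, here is a toy sketch of how the utility of a sequential pattern in a single quantitative sequence can be computed by brute force. The item names, unit-profit table, and function name are made up for illustration; ProUM itself works on projected utility-arrays with pruning rather than enumerating occurrences.

```python
from itertools import combinations

# Hypothetical toy data: a sequence of (item, quantity) events and unit profits.
UNIT_PROFIT = {"a": 3, "b": 1, "c": 5}
sequence = [("a", 2), ("b", 4), ("c", 1), ("a", 1)]

def pattern_utility(pattern, seq, profit):
    """Utility of `pattern` in `seq`: the maximum, over all occurrences of
    `pattern` as a subsequence, of the summed quantity * unit-profit."""
    best = 0
    for idx in combinations(range(len(seq)), len(pattern)):
        if [seq[i][0] for i in idx] == list(pattern):
            best = max(best, sum(seq[i][1] * profit[seq[i][0]] for i in idx))
    return best

# Utility of <a, c>: the occurrence (a:2, c:1) gives 2*3 + 1*5 = 11.
print(pattern_utility(["a", "c"], sequence, UNIT_PROFIT))  # 11
```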

p-Adic scaled space filling curve indices for high dimensional data (1904.07700v1)

Patrick Erik Bradley, Markus Wilhelm Jahn

2019-04-16

Space-filling curves are widely used in computer science. In particular, Hilbert curves and their generalisations to higher dimensions are used as an indexing method because of their nice locality properties. This article generalises this concept to the systematic construction of p-adic versions of Hilbert curves based on affine transformations of the p-adic Gray code, and develops an efficient scaled indexing method for data taken from high-dimensional spaces based on these new curves, which with increasing dimension is shown to be less space-consuming than the optimal standard static Hilbert curve index. A measure is derived which allows one to assess the local sparsity of a data set, and it is tested on some data.
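
The basic ingredient mentioned in the abstract is the p-adic (p-ary) Gray code; the paper's curves are built from affine transformations of it. Below is a small sketch of the standard modular p-ary Gray code only, with my own function name; it is not the paper's scaled index.

```python
def pary_gray(n, p, num_digits):
    """Modular p-ary Gray code of n: with base-p digits d_i (least significant
    first, d_{num_digits} = 0), the code digits are g_i = (d_i - d_{i+1}) mod p.
    Consecutive integers map to codewords differing in exactly one digit."""
    digits = [(n // p**i) % p for i in range(num_digits)]  # least significant first
    digits.append(0)                                       # implicit leading zero
    return [(digits[i] - digits[i + 1]) % p for i in range(num_digits)]

# For p = 3, successive codes differ in a single base-3 digit.
for n in range(9):
    print(n, pary_gray(n, p=3, num_digits=2))
```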

Fast Commutative Matrix Algorithm (1904.07683v1)

Andreas Rosowski

2019-04-16

We show that the product of an n×3 matrix and a 3×3 matrix over a commutative ring can be computed using 6n+3 multiplications. For two 3×3 matrices this gives an algorithm using 21 multiplications, an improvement over Makarov's algorithm, which uses 22 multiplications [10]. We generalize our result for n×3 and 3×3 matrices and present an algorithm for computing the product of an l×n matrix and an n×m matrix over a commutative ring, for odd n, using n(lm+l+m-1)/2 multiplications if m is odd and (n(lm+l+m-1)+l-1)/2 multiplications if m is even. Waksman's algorithm for odd n needs (n-1)(lm+l+m-1)/2 + lm multiplications [16]; thus, in both cases our algorithm requires fewer multiplications.
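
A quick sanity check of the multiplication counts stated in the abstract (the formulas are taken verbatim from it; the function names are mine):

```python
def new_count(l, n, m):
    """Multiplication count claimed in the abstract for an l x n times n x m
    product over a commutative ring, n odd."""
    assert n % 2 == 1
    if m % 2 == 1:
        return n * (l * m + l + m - 1) // 2
    return (n * (l * m + l + m - 1) + l - 1) // 2

def waksman_count(l, n, m):
    """Waksman's count for odd n, as quoted in the abstract."""
    assert n % 2 == 1
    return (n - 1) * (l * m + l + m - 1) // 2 + l * m

# The 3x3 case from the abstract: 21 multiplications; Waksman would need 23.
print(new_count(3, 3, 3), waksman_count(3, 3, 3))   # 21 23
# The n x 3 times 3 x 3 case: n(4n? no) -- new_count(n, 3, 3) equals 6n + 3.
print([new_count(n, 3, 3) for n in range(1, 5)])    # [9, 15, 21, 27]
```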

On the Randomized Complexity of Minimizing a Convex Quadratic Function (1807.09386v7)

Max Simchowitz

2018-07-24

Minimizing a convex quadratic objective of the form f(x) = (1/2) x^T A x - b^T x, with A positive definite, is a fundamental problem in machine learning and optimization. In this work, we prove gradient-query complexity lower bounds for minimizing convex quadratic functions which apply to both deterministic and randomized algorithms. Specifically, we exhibit a distribution over instances (A, b) with condition number κ, such that any randomized algorithm requires on the order of √κ gradient queries to find a solution whose distance to the optimal solution x* = A^{-1} b is at most a small constant fraction of the norm of x*. Setting the parameters appropriately, this lower bound implies the minimax rate on the number of queries required to minimize an arbitrary convex quadratic function up to a given error. Our lower bound holds for a distribution derived from classical ensembles in random matrix theory, and relies on a careful reduction from adaptively estimating a planted vector in a deformed Wigner model. A key step in deriving sharp lower bounds is demonstrating that the optimization error cannot align too closely with the planted vector. To this end, we prove an upper bound on the cosine of the angle between the optimization error and the plant in terms of the MMSE of estimating the plant in a deformed Wigner model. We then bound the MMSE by carefully modifying a result due to Lelarge and Miolane (2016), which rigorously establishes a general replica-symmetric formula for planted matrix models.
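
For context, the classical upper bound that such lower bounds are measured against is achieved by Krylov-type methods such as conjugate gradient, where each iteration issues one gradient (matrix-vector) query. A minimal sketch follows; the random instance at the end is a generic toy example, not the paper's hard Wigner-style distribution.

```python
import numpy as np

def conjugate_gradient(A, b, eps=1e-8):
    """Minimize f(x) = 0.5 x^T A x - b^T x for positive definite A,
    counting one 'gradient query' (one product with A) per iteration."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual = negative gradient at x
    p = r.copy()
    queries = 0
    while np.linalg.norm(r) > eps and queries < 10 * len(b):
        Ap = A @ p
        queries += 1
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, queries

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + np.eye(50)           # positive definite toy instance
b = rng.standard_normal(50)
x, q = conjugate_gradient(A, b)
print(q, np.linalg.norm(A @ x - b))
```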

Dynamic Packed Compact Tries Revisited (1904.07467v1)

Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

2019-04-16

Given a dynamic set K of strings of total length n whose characters are drawn from an alphabet of size σ, a keyword dictionary is a data structure built on K that provides locate, prefix search, and update operations on K. Under the assumption that multiple characters fit into a single machine word, we propose a keyword dictionary that represents K in compact space and supports all operations efficiently in expected time, for an input string of a given length, in the word RAM model. This data structure is complemented by an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure.
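
To illustrate the interface (locate, prefix search, update), here is a plain pointer-based trie sketch; it does not attempt the word-packing or space bounds of the proposed structure, and the class and method names are my own.

```python
class TrieNode:
    __slots__ = ("children", "is_key")
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_key = False

class KeywordDictionary:
    """Plain trie exposing the locate / prefix-search / insert interface."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def locate(self, word):
        node = self._walk(word)
        return node is not None and node.is_key

    def prefix_search(self, prefix):
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

d = KeywordDictionary()
d.insert("data")
d.insert("dataset")
print(d.locate("data"), d.prefix_search("datas"), d.locate("dat"))  # True True False
```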

Point-width and Max-CSPs (1904.07388v1)

Clement Carbonnel, Miguel Romero, Stanislav Zivny

2019-04-16

The complexity of (unbounded-arity) Max-CSPs under structural restrictions is poorly understood. The two most general hypergraph properties known to ensure tractability of Max-CSPs, β-acyclicity and bounded (incidence) MIM-width, are incomparable and lead to very different algorithms. We introduce the framework of point decompositions for hypergraphs and use it to derive a new sufficient condition for the tractability of (structurally restricted) Max-CSPs, which generalises both bounded MIM-width and β-acyclicity. On the way, we give a new characterisation of bounded MIM-width and discuss other hypergraph properties which are relevant to the complexity of Max-CSPs, such as β-hypertree width.
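
For readers unfamiliar with the problem, a brute-force Max-CSP solver makes the objective concrete; the point of the paper is precisely the structural restrictions (point-width) under which one can do much better than this. The instance below is a made-up MaxCut-style example.

```python
from itertools import product

def max_csp(domains, constraints):
    """Brute-force Max-CSP: maximise the number of satisfied constraints.
    `domains` maps variable -> iterable of values; each constraint is
    (scope, predicate) of arbitrary arity. Exponential in the number of
    variables -- no structural restriction is exploited here."""
    variables = list(domains)
    best = (-1, None)
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        satisfied = sum(pred(*(assignment[v] for v in scope))
                        for scope, pred in constraints)
        best = max(best, (satisfied, values), key=lambda t: t[0])
    return best

domains = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
constraints = [(("x", "y"), lambda a, b: a != b),
               (("y", "z"), lambda a, b: a != b),
               (("x", "z"), lambda a, b: a != b)]
print(max_csp(domains, constraints))   # at most 2 of the 3 constraints hold
```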

Approximation Algorithms for Distributionally Robust Stochastic Optimization with Black-Box Distributions (1904.07381v1)

Andre Linhares, Chaitanya Swamy

2019-04-16

Two-stage stochastic optimization is a framework for modeling uncertainty, where we have a probability distribution over possible realizations of the data, called scenarios, and decisions are taken in two stages: we make first-stage decisions knowing only the underlying distribution and before a scenario is realized, and may take additional second-stage recourse actions after a scenario is realized. The goal is typically to minimize the total expected cost. A criticism of this model is that the underlying probability distribution is itself often imprecise! To address this, a versatile approach that has been proposed is the distributionally robust 2-stage model: given a collection of probability distributions, our goal now is to minimize the maximum expected total cost with respect to a distribution in this collection. We provide a framework for designing approximation algorithms in such settings when the collection is a ball around a central distribution and the central distribution is accessed only via a sampling black box. We first show that one can utilize the sample average approximation (SAA) method to reduce the problem to the case where the central distribution has polynomial-size support. We then show how to approximately solve a fractional relaxation of the SAA (i.e., polynomial-scenario central-distribution) problem. By complementing this via LP-rounding algorithms that provide local (i.e., per-scenario) approximation guarantees, we obtain the first approximation algorithms for the distributionally robust versions of a variety of discrete-optimization problems, including set cover, vertex cover, edge cover, facility location, and Steiner tree, with guarantees that are, except for set cover, within O(1)-factors of the guarantees known for the deterministic version of the problem.
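
The first step of the framework, sample average approximation, is easy to illustrate on a toy two-stage problem. The sketch below shows plain SAA only, on a hypothetical newsvendor-style instance of my choosing; the distributionally robust step (worst case over a ball of distributions) and the LP-rounding are not reproduced.

```python
import random

# Toy 2-stage problem: buy x units now at cost C1 each; after demand D is
# revealed, buy any shortfall at recourse cost C2 > C1 per unit.
C1, C2 = 1.0, 3.0

def sample_demand():
    """Sampling black box for the (unknown) central distribution."""
    return random.gauss(100, 20)

def saa_solve(num_samples=2000, candidates=range(0, 201)):
    """Sample average approximation: replace the true expectation by an
    empirical average over sampled scenarios, then optimise the first-stage
    decision against that empirical objective."""
    scenarios = [sample_demand() for _ in range(num_samples)]
    def empirical_cost(x):
        return C1 * x + sum(C2 * max(d - x, 0) for d in scenarios) / num_samples
    return min(candidates, key=empirical_cost)

random.seed(0)
print(saa_solve())   # roughly the 1 - C1/C2 quantile of the demand
```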

Introduction to Multi-Armed Bandits (1904.07272v1)

Aleksandrs Slivkins

2019-04-15

Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a review of the more advanced results. The chapters are as follows: Stochastic bandits; Lower bounds; Bayesian Bandits and Thompson Sampling; Lipschitz Bandits; Full Feedback and Adversarial Costs; Adversarial Bandits; Linear Costs and Semi-bandits; Contextual Bandits; Bandits and Zero-Sum Games; Bandits with Knapsacks; Incentivized Exploration and Connections to Mechanism Design. Status of the manuscript: essentially complete (modulo some polishing), except for the last chapter, which the author plans to add over the next few months.
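
As a taste of the "Stochastic bandits" chapter, here is a sketch of the classic UCB1 algorithm; the arm means and helper names below are my own.

```python
import math, random

def ucb1(pull, num_arms, horizon):
    """UCB1: play each arm once, then always pull the arm maximising
    empirical mean + sqrt(2 ln t / n_pulls)."""
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1                      # initial round-robin
        else:
            arm = max(range(num_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Bernoulli arms with hypothetical means; the best arm should dominate the pulls.
random.seed(1)
means = [0.2, 0.5, 0.8]
print(ucb1(lambda a: float(random.random() < means[a]), len(means), horizon=5000))
```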

Stochastic Load Balancing on Unrelated Machines (1904.07271v1)

Anupam Gupta, Amit Kumar, Viswanath Nagarajan, Xiangkun Shen

2019-04-15

We consider the problem of makespan minimization on unrelated machines when job sizes are stochastic. The goal is to find a fixed assignment of jobs to machines that minimizes the expected value of the maximum load over all the machines. For the identical-machines special case, when the size of a job is the same across all machines, a constant-factor approximation algorithm has long been known. Our main result is the first constant-factor approximation algorithm for the general case of unrelated machines. This is achieved by (i) formulating a lower bound using an exponential-size linear program that is efficiently computable, and (ii) rounding this linear program while satisfying only a specific subset of the constraints that still suffice to bound the expected makespan. We also consider two generalizations. The first is the budgeted makespan minimization problem, where the goal is to minimize the expected makespan subject to scheduling a target number (or reward) of jobs. We extend our main result to obtain a constant-factor approximation algorithm for this problem. The second problem involves q-norm objectives, where we want to minimize the expected q-norm of the machine loads. Here we give an O(q/log q)-approximation algorithm, which is a constant-factor approximation for any fixed q.
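
As a baseline for intuition, here is a naive greedy heuristic plus a Monte Carlo estimate of the expected makespan; this is not the paper's LP-based constant-factor algorithm, and the instance (exponential job sizes with machine-dependent means) is made up.

```python
import random, statistics

def greedy_assign(num_machines, job_means):
    """Place each job on the machine whose expected load stays smallest.
    Ignores tail behaviour, which is what drives the expected makespan."""
    expected_load = [0.0] * num_machines
    assignment = []
    for job in job_means:                        # job[m] = mean size on machine m
        m = min(range(num_machines), key=lambda i: expected_load[i] + job[i])
        expected_load[m] += job[m]
        assignment.append(m)
    return assignment

def estimate_makespan(assignment, num_machines, samplers, trials=2000):
    """Monte Carlo estimate of E[max machine load] under a fixed assignment."""
    vals = []
    for _ in range(trials):
        loads = [0.0] * num_machines
        for job, m in enumerate(assignment):
            loads[m] += samplers[job](m)
        vals.append(max(loads))
    return statistics.mean(vals)

random.seed(0)
means = [[random.uniform(0.5, 2.0) for _ in range(3)] for _ in range(10)]
samplers = [lambda m, mu=row: random.expovariate(1.0 / mu[m]) for row in means]
assign = greedy_assign(3, means)
print(assign, round(estimate_makespan(assign, 3, samplers), 2))
```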


