Data Structures And Algorithms | 2019-04-14



Distributed Graph Clustering by Load Balancing (1607.04984v3)

He Sun, Luca Zanetti

2016-07-18

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. However, most of these methods are based on complicated spectral techniques or convex optimisation, and cannot be applied directly for clustering many networks that occur in practice, whose information is often collected on different sites. Designing a simple and distributed clustering algorithm is of great interest, and has wide applications for processing big datasets. In this paper we present a simple and distributed algorithm for graph clustering: for a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds, and recovers a partition of the graph close to an optimal partition. The main component of our algorithm is an application of the random matching model of load balancing, which is a fundamental protocol in distributed computing and has been extensively studied in the past 20 years. Hence, our result highlights an intrinsic and interesting connection between graph clustering and load balancing. At a technical level, we present a purely algebraic result characterising the early behaviours of load balancing processes for graphs exhibiting a cluster-structure. We believe that this result can be further applied to analyse other gossip processes, such as rumour spreading and averaging processes.
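
The random matching model mentioned above is a standard load-balancing protocol: in each round a matching of the graph's edges is chosen at random, and the two endpoints of every matched edge average their loads. The sketch below is only an illustration of that protocol on a toy two-cluster graph (the greedy matching routine, the graph, and the initial loads are my own illustrative choices, not the authors' clustering algorithm):

```python
import random

def random_matching(edges):
    """Greedily build a random matching: shuffle the edges and keep an edge
    only if neither endpoint has been matched yet."""
    random.shuffle(edges)
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def load_balancing_round(load, edges):
    """One round of the (random) matching model: matched endpoints average their loads."""
    for u, v in random_matching(list(edges)):
        avg = (load[u] + load[v]) / 2.0
        load[u] = load[v] = avg

# Toy example: two dense clusters of 5 nodes each, joined by a single edge.
cluster_a = [(i, j) for i in range(5) for j in range(i + 1, 5)]
cluster_b = [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges = cluster_a + cluster_b + [(4, 5)]
load = {v: (1.0 if v < 5 else 0.0) for v in range(10)}

for _ in range(20):
    load_balancing_round(load, edges)

# Loads mix quickly inside each dense cluster but slowly across the sparse cut,
# which is the early-round behaviour the paper exploits to reveal clusters.
print({v: round(x, 3) for v, x in load.items()})
```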

Multiplicative Up-Drift (1904.05682v1)

Benjamin Doerr, Timo Kötzing

2019-04-11

Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms; however, only for situations where the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible linear dependence on the growth parameter (the previous best was quadratic). This immediately gives stronger run time guarantees for a number of applications.
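
As a rough illustration of the setting (a toy process of my own, not the paper's theorem or its exact assumptions): if each step multiplies the current value by (1 + delta) in expectation, the hitting time of a target n should scale roughly like log(n)/delta, i.e. linearly in 1/delta. A minimal Monte Carlo sketch:

```python
import random

def hitting_time(n, delta, x0=1, trials=200):
    """Empirical mean hitting time of a toy process with multiplicative up-drift:
    each of the X_t current 'individuals' independently spawns one extra copy
    with probability delta, so E[X_{t+1} | X_t] = (1 + delta) * X_t."""
    total = 0
    for _ in range(trials):
        x, t = x0, 0
        while x < n:
            x += sum(1 for _ in range(x) if random.random() < delta)
            t += 1
        total += t
    return total / trials

# The empirical hitting time grows roughly like log(n)/delta, illustrating the
# linear dependence on the growth parameter discussed in the abstract.
for delta in (0.1, 0.2, 0.4):
    print(delta, hitting_time(n=1000, delta=delta))
```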

Tight Hardness Results for Consensus Problems on Circular Strings and Time Series (1804.02854v4)

Laurent Bulteau, Vincent Froese, Rolf Niedermeier

2018-04-09

Consensus problems for strings and sequences appear in numerous application contexts, ranging from bioinformatics over data mining to machine learning. Closing some gaps in the literature, we show that several fundamental problems in this context are NP- and W[1]-hard, and that the known (partially brute-force) algorithms are close to optimality assuming the Exponential Time Hypothesis. Among our main contributions is to settle the complexity status of computing a mean in dynamic time warping spaces which, as pointed out by Brill et al. [DMKD 2019], suffered from many unproven or false assumptions in the literature. We prove this problem to be NP-hard and additionally show that a recent dynamic programming algorithm is essentially optimal. In this context, we study a broad family of circular string alignment problems. This family also serves as a key for our hardness reductions, and it is of independent (practical) interest in molecular biology. In particular, we show tight hardness and running time lower bounds for Circular Consensus String; notably, the corresponding non-circular version is easily linear-time solvable.
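
For context on the dynamic time warping (DTW) spaces mentioned above, the pairwise DTW distance itself is computed by a standard quadratic dynamic program. This is the textbook computation only, not the consensus/mean algorithms studied in the paper:

```python
def dtw_distance(s, t):
    """Classic O(len(s)*len(t)) dynamic program for the DTW distance between
    two real-valued time series, using squared pointwise differences."""
    n, m = len(s), len(t)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            dp[i][j] = cost + min(dp[i - 1][j],      # s advances, t[j-1] re-matched
                                  dp[i][j - 1],      # t advances, s[i-1] re-matched
                                  dp[i - 1][j - 1])  # both advance
    return dp[n][m]

print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 3, 4]))  # 0.0: same shape, just stretched
```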

Tight Bounds for the Subspace Sketch Problem with Applications (1904.05543v1)

Yi Li, Ruosong Wang, David P. Woodruff

2019-04-11

In the subspace sketch problem one is given an $n \times d$ matrix $A$ with $O(\log(nd))$-bit entries, and would like to compress it in an arbitrary way to build a small space data structure $Q_p$, so that for any given $x \in \mathbb{R}^d$, with probability at least $2/3$, one has $Q_p(x) = (1 \pm \epsilon)\|Ax\|_p$, where $p \geq 0$, and where the randomness is over the construction of $Q_p$. The central question is: How many bits are necessary to store $Q_p$? This problem has applications to the communication of approximating the number of non-zeros in a matrix product, the size of coresets in projective clustering, the memory of streaming algorithms for regression in the row-update model, and embedding subspaces of $L_p$ in functional analysis. A major open question is the dependence on the approximation factor $\epsilon$. We show that if $p$ is not a positive even integer and $d = \Omega(\log(1/\epsilon))$, then $\tilde{\Omega}(\epsilon^{-2} d)$ bits are necessary. On the other hand, if $p$ is a positive even integer, then there is an upper bound of $O(d^p \log(nd))$ bits independent of $\epsilon$. Our results are optimal up to logarithmic factors, and show in particular that one cannot compress $A$ to a small number of "directions" $v_1, \dots, v_r$, such that for any $x$, $\|Ax\|_1$ can be well-approximated from $\langle v_1, x \rangle, \dots, \langle v_r, x \rangle$. Our lower bound rules out arbitrary functions of these inner products (and in fact arbitrary data structures built from them), and thus rules out the possibility of a singular value decomposition for $\ell_1$ in a very strong sense. Indeed, as $\epsilon \to 0$, for $p = 1$ the space complexity becomes arbitrarily large, while for $p = 2$ it is at most $O(d^2 \log(nd))$. As corollaries of our main lower bound, we obtain new lower bounds for a wide range of applications, including the above, which in many cases are optimal.
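
The even-integer upper bound has a simple special case for p = 2 that is worth keeping in mind (a standard observation, not the paper's construction for general even p): storing the d x d Gram matrix A^T A answers ||Ax||_2 exactly, with space independent of the approximation factor. The dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))

# "Sketch" for p = 2: the d x d Gram matrix, O(d^2) numbers regardless of epsilon.
G = A.T @ A

def q2(x):
    """Recover ||Ax||_2 exactly from the Gram matrix: ||Ax||_2^2 = x^T (A^T A) x."""
    return float(np.sqrt(x @ G @ x))

x = rng.standard_normal(d)
print(q2(x), np.linalg.norm(A @ x))  # identical up to floating point error
```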

Beyond trace reconstruction: Population recovery from the deletion channel (1904.05532v1)

Frank Ban, Xi Chen, Adam Freilich, Rocco A. Servedio, Sandip Sinha

2019-04-11

\emph{Population recovery} is the problem of learning an unknown distribution over an unknown set of $n$-bit strings, given access to independent draws from the distribution that have been independently corrupted according to some noise channel. Recent work has intensively studied such problems both for the bit-flip and erasure noise channels. We initiate the study of population recovery under the \emph{deletion channel}, in which each bit is independently \emph{deleted} with some fixed probability $\delta$ and the surviving bits are concatenated and transmitted. This is a far more challenging noise model than bit-flip noise or erasure noise; indeed, even the simplest case in which the population size is 1 (corresponding to a trivial distribution supported on a single string) corresponds to the \emph{trace reconstruction} problem, a challenging problem that has received much recent attention (see e.g. \cite{DOS17,NP17,PZ17,HPP18,HHP18}). We give algorithms and lower bounds for population recovery under the deletion channel when the population size is some $\ell > 1$. As our main sample complexity upper bound, we show that for any population size $\ell = o(\log n / \log \log n)$, a population of $\ell$ strings from $\{0,1\}^n$ can be learned under deletion channel noise using $2^{n^{1/2 + o(1)}}$ samples. On the lower bound side, we show that $n^{\Omega(\ell)}$ samples are required to perform population recovery under the deletion channel, for a wide range of values of $\ell$. Our upper bounds are obtained via a robust multivariate generalization of a polynomial-based analysis, due to Krasikov and Roditty \cite{KR97}, of how the $k$-deck of a bit-string uniquely identifies the string; this is a very different approach from recent algorithms for trace reconstruction (the $\ell = 1$ case). Our lower bounds build on moment-matching results of Roos \cite{Roo00} and Daskalakis and Papadimitriou \cite{DP15}.
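
For concreteness, the deletion channel itself is easy to simulate. This is a toy sketch of the noise model only (the population, mixing weights, and deletion probability are illustrative); it says nothing about the recovery algorithms:

```python
import random

def deletion_channel(bits, delta, rng=random):
    """Each bit is deleted independently with probability delta; the surviving
    bits are concatenated, so the receiver also loses the original positions."""
    return "".join(b for b in bits if rng.random() >= delta)

def draw_trace(population, weights, delta):
    """One sample for population recovery: draw a string from the unknown
    distribution, then pass it through the deletion channel."""
    s = random.choices(population, weights=weights, k=1)[0]
    return deletion_channel(s, delta)

population = ["10110100", "11110000"]   # unknown support (illustrative)
weights = [0.7, 0.3]                    # unknown mixing weights (illustrative)
print([draw_trace(population, weights, delta=0.2) for _ in range(5)])
```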

Restricted Isometry Property under High Correlations (1904.05510v1)

Shiva Prasad Kasiviswanathan, Mark Rudelson

2019-04-11

Matrices satisfying the Restricted Isometry Property (RIP) play an important role in the areas of compressed sensing and statistical learning. RIP matrices with optimal parameters are mainly obtained via probabilistic arguments, as explicit constructions seem hard. It is therefore interesting to ask whether a fixed matrix can be incorporated into a construction of restricted isometries. In this paper, we construct a new broad ensemble of random matrices with dependent entries that satisfy the restricted isometry property. Our construction starts with a fixed (deterministic) matrix $B$ satisfying some simple stable rank condition, and we show that the matrix $BA$, where $A$ is a random matrix drawn from various popular probabilistic models (including subgaussian, sparse, low-randomness, and matrices satisfying the convex concentration property), satisfies the RIP with high probability. These theorems have various applications in signal recovery, random matrix theory, dimensionality reduction, etc. Additionally, motivated by an application for understanding the effectiveness of word vector embeddings popular in natural language processing and machine learning applications, we investigate the RIP of the matrix $B A^{(k)}$, where $A^{(k)}$ is formed by taking all possible (disregarding order) $k$-way entrywise products of the columns of a random matrix $A$.
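
A small numerical sketch of the object being studied, not a proof: take a fixed matrix B, multiply by a Gaussian matrix A, and check empirically how well the product preserves the norm of random sparse vectors, which is what the restricted isometry property asks for. The dimensions, sparsity, and normalization below are heuristic assumptions of mine, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_cols, s = 80, 200, 5                        # sketch dim, ambient dim, sparsity

B = rng.standard_normal((m, m))                  # fixed matrix (drawn once, then frozen)
A = rng.standard_normal((m, n_cols)) / np.sqrt(m)  # random Gaussian matrix
M = B @ A / np.linalg.norm(B, 2)                 # heuristic normalization of the product

def distortion(M, s, trials=2000):
    """Empirical range of ||Mx|| / ||x|| over random s-sparse vectors x."""
    ratios = []
    for _ in range(trials):
        x = np.zeros(M.shape[1])
        idx = rng.choice(M.shape[1], size=s, replace=False)
        x[idx] = rng.standard_normal(s)
        ratios.append(np.linalg.norm(M @ x) / np.linalg.norm(x))
    return min(ratios), max(ratios)

# Min/max ratio over random sparse vectors; a well-behaved product keeps this range narrow.
print(distortion(M, s))
```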

PT-ISABB: A Hybrid Tree-based Complete Algorithm to Solve Asymmetric Distributed Constraint Optimization Problems (1902.06039v4)

Yanchen Deng, Ziyu Chen, Dingding Chen, Xingqiong Jiang, Qiang Li

2019-02-16

Asymmetric Distributed Constraint Optimization Problems (ADCOPs) have emerged as an important formalism in the multi-agent community due to their ability to capture personal preferences. However, the existing search-based complete algorithms for ADCOPs can only use local knowledge to compute lower bounds, which leads to inefficient pruning and prohibits them from solving large-scale problems. On the other hand, inference-based complete algorithms (e.g., DPOP) for Distributed Constraint Optimization Problems (DCOPs) require only a linear number of messages, but they cannot be directly applied to ADCOPs due to privacy concerns. Therefore, in this paper, we consider the possibility of combining inference and search to effectively solve ADCOPs at an acceptable loss of privacy. Specifically, we propose a hybrid complete algorithm called PT-ISABB, which uses a tailored inference algorithm to provide tight lower bounds and a tree-based complete search algorithm to exhaust the search space. We prove the correctness of our algorithm, and the experimental results demonstrate its superiority over other state-of-the-art complete algorithms.
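
The role of tight lower bounds in a search-based solver can be illustrated with a tiny centralized branch-and-bound over variable assignments. This is a generic sketch only: it is neither distributed nor PT-ISABB, and the toy cost function and domains are made up:

```python
# Toy constraint optimization: 3 variables with domain {0, 1, 2} and pairwise costs.
DOMAIN = [0, 1, 2]
VARS = [0, 1, 2]
PAIRS = [(0, 1), (1, 2), (0, 2)]

def cost(i, j, vi, vj):
    return (vi + vj) % 3 + (1 if vi == vj else 0)   # arbitrary illustrative costs

def lower_bound(assignment):
    """Admissible lower bound: exact cost of fully assigned pairs plus, for each
    partially assigned pair, the cheapest cost any completion could achieve.
    Tighter bounds (e.g., from inference-based preprocessing) prune more search."""
    lb = 0
    for i, j in PAIRS:
        if i in assignment and j in assignment:
            lb += cost(i, j, assignment[i], assignment[j])
        elif i in assignment:
            lb += min(cost(i, j, assignment[i], vj) for vj in DOMAIN)
        elif j in assignment:
            lb += min(cost(i, j, vi, assignment[j]) for vi in DOMAIN)
        else:
            lb += min(cost(i, j, vi, vj) for vi in DOMAIN for vj in DOMAIN)
    return lb

best = [float("inf")]

def branch_and_bound(assignment):
    lb = lower_bound(assignment)
    if lb >= best[0]:
        return                       # prune: cannot improve on the incumbent
    if len(assignment) == len(VARS):
        best[0] = lb                 # complete assignment: the bound is the exact cost
        return
    var = VARS[len(assignment)]
    for val in DOMAIN:
        branch_and_bound({**assignment, var: val})

branch_and_bound({})
print("optimal cost:", best[0])
```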

Efficient Distributed Workload (Re-)Embedding (1904.05474v1)

Monika Henzinger, Stefan Neumann, Stefan Schmid

2019-04-10

Modern networked systems are increasingly reconfigurable, enabling demand-aware infrastructures whose resources can be adjusted according to the workload they currently serve. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by moving frequently interacting communication partners closer, e.g., collocating them in the same server or datacenter. However, dynamically changing the embedding of workloads is algorithmically challenging: communication patterns are often not known ahead of time, but must be learned. During the learning process, overheads related to unnecessary moves (i.e., re-embeddings) should be minimized. This paper studies a fundamental model which captures the tradeoff between the benefits and costs of dynamically collocating communication partners on servers, in an online manner. Our main contribution is a distributed online algorithm which is asymptotically almost optimal, i.e., almost matches the lower bound (also derived in this paper) on the competitive ratio of any (distributed or centralized) online algorithm. As an application, we show that our algorithm can be used to solve a distributed union find problem in which the sets are stored across multiple servers.
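
Since the paper mentions a distributed union-find problem as an application, the standard centralized union-find (disjoint-set) structure is shown below for reference, with path compression and union by size. The distributed, server-aware version is the paper's contribution and is not reproduced here:

```python
class UnionFind:
    """Classic disjoint-set forest with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx                 # attach smaller tree under larger
        self.size[rx] += self.size[ry]
        return True

uf = UnionFind(6)
for a, b in [(0, 1), (1, 2), (4, 5)]:        # "collocate" communication partners
    uf.union(a, b)
print(uf.find(0) == uf.find(2), uf.find(0) == uf.find(4))  # True False
```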

Constant factor approximations to edit distance on far input pairs in nearly linear time (1904.05459v1)

Michal Koucký, Michael E. Saks

2019-04-10

For any $\varepsilon > 0$, there are constants $c = c(\varepsilon)$ and $\alpha = \alpha(\varepsilon) > 0$ and a randomized algorithm that takes as input an integer $n$ and two strings $x, y$ of length at most $n$, runs in time $O(n^{1+\varepsilon})$, and outputs an upper bound $U$ on the edit distance $d_{\mathrm{edit}}(x, y)$ that, with high probability, satisfies $U \leq c \cdot (d_{\mathrm{edit}}(x, y) + n^{1-\alpha})$. In particular, on any input with $d_{\mathrm{edit}}(x, y) \geq n^{1-\alpha}$ the algorithm outputs a constant factor approximation with high probability. A similar result has been proven independently by Brakensiek and Rubinstein (2019).
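
For reference, the exact edit distance is computed by the classic quadratic dynamic program below; the point of the result above is that a constant-factor upper bound can be obtained in nearly linear time when the true distance is large. This sketch is only the standard exact baseline, not the approximation algorithm:

```python
def edit_distance(x, y):
    """Levenshtein distance via the standard O(|x|*|y|) dynamic program,
    keeping only two rows of the table."""
    n, m = len(x), len(y)
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j] + 1,                           # delete x[i-1]
                         cur[j - 1] + 1,                        # insert y[j-1]
                         prev[j - 1] + (x[i - 1] != y[j - 1]))  # substitute or match
        prev = cur
    return prev[m]

print(edit_distance("kitten", "sitting"))  # 3
```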

What Storage Access Privacy is Achievable with Small Overhead? (1904.05452v1)

Sarvar Patel, Giuseppe Persiano, Kevin Yeo

2019-04-10

Oblivious RAM (ORAM) and private information retrieval (PIR) are classic cryptographic primitives used to hide the access pattern to data whose storage has been outsourced to an untrusted server. Unfortunately, both primitives require considerable overhead compared to plaintext access. For large-scale storage infrastructure with highly frequent access requests, the degradation in response time and the exorbitant increase in resource costs incurred by either ORAM or PIR prevent their usage. In an ideal scenario, privacy-preserving storage protocols with small overhead would be implemented for these heavily trafficked storage systems to avoid negatively impacting either performance or costs. In this work, we study the problem of the best privacy that is achievable with only small overhead over plaintext access. To answer this question, we consider $(\epsilon, \delta)$-differential privacy, which is a generalization of the security notions considered by ORAM and PIR. Quite surprisingly, we present strong evidence that constant-overhead storage schemes may only be achieved with privacy budgets of $\epsilon = \Omega(\log n)$. We present asymptotically optimal constructions for differentially private variants of both ORAM and PIR with privacy budgets of $\epsilon = \Theta(\log n)$ and only $O(\log \log n)$ overhead. In addition, we consider a more complex storage primitive called key-value storage, in which data is indexed by keys from a large universe (as opposed to consecutive integers in ORAM and PIR). We present a differentially private key-value storage scheme with privacy budget $\epsilon = \Theta(\log n)$ and $O(\log \log n)$ overhead. This construction uses a new oblivious, two-choice hashing scheme that may be of independent interest.
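
The "two-choice hashing" idea referenced at the end is easiest to see in its standard, non-oblivious and non-private form: each key gets two candidate bins from independent hash functions and is placed in the less loaded one, which keeps the maximum load low. A minimal sketch with illustrative hash functions; the paper's oblivious variant is more involved and is not reproduced here:

```python
import hashlib

def h(key, salt, num_bins):
    """Deterministic toy hash: SHA-256 of salt:key, reduced modulo the bin count."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).hexdigest()
    return int(digest, 16) % num_bins

def insert_two_choice(bins, key):
    """Place the key in the lighter of its two candidate bins."""
    b1, b2 = h(key, "salt1", len(bins)), h(key, "salt2", len(bins))
    target = b1 if len(bins[b1]) <= len(bins[b2]) else b2
    bins[target].append(key)

bins = [[] for _ in range(16)]
for k in range(200):
    insert_two_choice(bins, f"key{k}")

# The maximum load is typically smaller than with a single hash function.
print("max load:", max(len(b) for b in bins))
```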


