PODS program

Proceedings

PODS 2017 papers will be accessed here (TBD).

Conference Program: PODS Sessions

This page describes the complete PODS Conference program.

The PODS 2017 program will also be available at http://confer.csail.mit.edu/sigmod2017. Confer is a conference program management tool with a crowd-sourcing component that automatically groups papers into sessions to maximize the number of papers each attendee likes that he or she will get to see.

 

 

 

PODS Keynote Talk

Monday 8:15-9:40
Location: ♛Continental A
Session Chair: Jan Van den Bussche (Hasselt University)

  • Data Citation: a Computational Challenge (pods313sd)
    Susan Davidson (University of Pennsylvania)
    Abstract: Most information is now published in complex, structured, evolving datasets or databases. As such, there is increasing demand that this digital information should be treated in the same way as conventional publications and cited appropriately. Unlike traditional publications, however, which have a fixed granularity to which citations can be attached (e.g. a paper in a conference proceedings, or chapter in a book), the granularity of data varies when retrieved by a query over a database. Since there are a potentially infinite number of queries, each accessing and generating different subsets of the database, it is impossible to explicitly attach a citation to every possible result set and/or query. Data citation is therefore a computational challenge, whose solution draws on two well-studied problems in database theory: query answering using views, and provenance. In this talk, I will give an overview of what is being done in data citation and highlight several open research problems, both practical and theoretical.
    Bio: Susan B. Davidson received the B.A. degree in Mathematics from Cornell University in 1978, and the M.A. and Ph.D. degrees in Electrical Engineering and Computer Science from Princeton University in 1980 and 1982. Dr. Davidson is the Weiss Professor of Computer and Information Science (CIS) at the University of Pennsylvania, where she has been since 1982, and currently serves as Chair of the board of the Computing Research Association. Dr. Davidson’s research interests include database and web-based systems, scientific data management, provenance, crowdsourcing, and data citation. Dr. Davidson was the founding co-director of the Penn Center for Bioinformatics from 1997-2003, and the founding co-director of the Greater Philadelphia Bioinformatics Alliance. She served as Deputy Dean of the School of Engineering and Applied Science from 2005-2007 and Chair of CIS from 2008-2013. Her awards include: ACM Fellow, Corresponding Fellowship of the Royal Society of Edinburgh (2015), Lenore Rowe Williams Award (2002), Fulbright Scholar and recipient of a Hitachi Chair (2004), Trustees’ Council of Penn Women/Provost Award (April 2015) for her work on advancing women in engineering, and IEEE TCDE Impact Award (2017).

 

 

 

RESEARCH SESSIONS

 

Session 1: New formal frameworks

Monday 9:40-10:30
Location: ♛Continental A
Session Chair: Paris Koutris (University of Wisconsin-Madison)

  • A Relational Framework for Classifier Engineering (pods047)
    Benny Kimelfeld (Technion) and Christopher Ré (Stanford University)
  • Querying Probabilistic Preferences in Databases (pods094)
    Batya Kenig (Technion), Benny Kimelfeld (Technion), Haoyue Ping (Drexel University) and Julia Stoyanovich (Drexel University)

 

 

Session 2: Algorithms, data structures, benchmarking

Monday 11:00-12:40
Location: ♛Continental A
Session Chair: Floris Geerts (University of Antwerp)

  • Benchmarking the chase (pods043)
    Michael Benedikt (University of Oxford), George Konstantinidis (University of Oxford), Giansalvatore Mecca (University of Basilicata), Boris Motik (University of Oxford), Paolo Papotti (Arizona State University), Donatello Santoro (University of Basilicata) and Efthymia Tsamoura (University of Oxford)
  • Efficient and Provable Multi-Query Optimization (pods040)
    Tarun Kathuria (Microsoft Research) and S Sudarshan (Indian Institute of Technology)
  • Write-Optimized Skip Lists (pods118)
    Michael Bender (Stony Brook University), Martín Farach-Colton (Rutgers University), Rob Johnson (Stony Brook University), Simon Mauras (ENS Lyon), Tyler Mayer (Stony Brook University), Cynthia Phillips (Sandia National Laboratories) and Helen Xu (MIT)
  • Output-optimal Parallel Algorithms for Similarity Joins (pods085)
    Xiao Hu (HKUST), Yufei Tao (University of Queensland) and Ke Yi (HKUST)

 

 

Gems of PODS and Test-of-Time Award

Monday 14:00-15:30
Location: ♛Continental A
Session Chair: Floris Geerts (University of Antwerp)

  • The Semiring Framework for Database Provenance (ToT Award)
    (pods319vt)
    Val Tannen (University of Pennsylvania)
    Abstract: Imagine a computational process that uses a complex input consisting of multiple “items” (e.g., files, tables, tuples, parameters, configuration rules) The provenance analysis of such a process allows us to understand how the different input items affect the output of the computation. It can be used, for example, to derive confidence in the output (given confidences in the input items), to derive the minimum access clearance for the output (given input items with different classifications), to minimize the cost of obtaining the output (given a complex input item pricing scheme). It also applies to probabilistic reasoning about an output (given input item distributions), as well as to output maintenance, and to debugging.
    Provenance analysis for queries, views, database ETL tools, and schema mappings is strongly influenced by their declarative nature, providing mathematically nice descriptions of the output-inputs correlation. In a series of papers starting with PODS 2007 we have developed an algebraic framework for describing such provenance based on commutative semirings and semimodules over such semirings. So far, the framework has exploited usefully the observation that, for database provenance, data use has two flavors: joint and alternative.
    Here, we have selected several insights that we consider essential for the appreciation of this framework’s nature and effectiveness and we also give some idea of its applicability.

    Bio: Val Tannen is a professor in the Department of Computer and Information Science of the University of Pennsylvania. He joined Penn after receiving his PhD from the Massachusetts Institute of Technology in 1987. After working for a time in Programming Languages, his current research interests are in Databases. Moreover, he has always been interested in applications of Logic to Computer Science and since 1994 he has also worked in Bioinformatics, leading a number of interdisciplinary projects. In Databases, he and his students and collaborators have worked on query language design and on models and systems for query optimization, parallel query processing, and data integration. More recently their work has focused on models and systems for data sharing, data provenance, the management of uncertain information and algorithmic provisioning for what-if analysis. Tannen has received the 20 year Test-of-Time Award from ICDT and the 10 year Test-of-Time Award from PODS. He is an ACM Fellow.

 

  • Data Integration: From the Enterprise into Your Kitchen (pods316ah)
    Alon Halevy (Recruit Institute of Technology)
    Abstract: Twenty five years ago, data integration was a concern mainly for large enterprises with many autonomous data sources. Since then, while data integration became even more important to enterprises, the technology has ventured out into new areas, such as the Web and recently into voice-activated consumer devices that answer a plethora of questions in your home. At the core of these systems is a mechanism for describing the semantics and capabilities of individual data sources. Database theory has made important contributions to the area of modeling data sources and query answering in data integration systems. I this talk I will cover some of the main developments in the field of data integration since the mid-90’s, and discuss the challenges that lie ahead.

    Bio: Alon Halevy is the C.E.O of the Recruit Institute of Technology. From 2005 to 2015 he headed the Structured Data Management Research group at Google. Prior to that, he was a professor of Computer Science at the University of Washington in Seattle, where he founded the Database Group. In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space, and in 2004, Dr. Halevy founded Transformic, a company that created search engines for the deep web, and was acquired by Google. Dr. Halevy is a Fellow of the Association for Computing Machinery, the author of the book “The Infinite Emotions of Coffee”, and co-author of the book “Principles of Data Integration”.

 

 

PODS Session 3: Concurrency, JSON, learning and privacy

Monday 16:00-18:05
Location: ♛Continental A
Session Chair: Stijn Vansummeren (Université Libre de Bruxelles)

  • How Fast can a Distributed Transaction Commit? (pods050)
    Rachid Guerraoui (EPFL) and Jingjing Wang (EPFL)
  • JSON: data model, query languages and schema specification (pods130)
    Pierre Bourhis (INRIA Lille), Juan L. Reutter (PUC Chile), Fernando Suárez (PUC Chile) and Domagoj Vrgoč (PUC Chile)
  • J-Logic: Logical foundations for JSON querying (pods064)
    Jan Hidders (Free University Brussels), Jan Paredaens (University of Antwerp) and Jan Van den Bussche (Hasselt University)
  • Reverse Engineering SPJ-Queries from Examples (pods096)
    Yaacov Y. Weiss (Hebrew University of Jerusalem) and Sara Cohen (Hebrew University of Jerusalem)
  • Private Incremental Regression (pods042)
    Shiva Kasiviswanathan (Samsung Research America), Kobbi Nissim (Georgetown University) and Hongxia Jin (Samsung Research America)

 

 

PODS Session 4: Best paper award, ontologies and probabilistic databases

Tuesday 14:00-15:40
Location: ♛Continental A
Session Chair: Andreas Pieris (University of Edinburgh)

  • Best paper award: Dichotomies in Ontology-Mediated Querying with the Guarded Fragment (pods074)
    André Hernich (University of Liverpool), Carsten Lutz (University of Bremen), Fabio Papacchini (University of Liverpool) and Frank Wolter (University of Liverpool)
  • The Complexity of Ontology-Based Data Access with OWL2QL and Bounded Treewidth Queries (pods039)
    Meghyn Bienvenu (CNRS, University of Montpellier), Stanislav Kikot (Birkbeck University of London), Roman Kontchakov (Birkbeck University of London), Vladimir V. Podolskii (Steklov Mathematical Institute, National Research University Higher School of Economics), Vladislav Ryzhikov (Free University of Bozen-Bolzano) and Michael Zakharyaschev (Birkbeck University of London)
  • Conjunctive Queries on Probabilistic Graphs: Combined Complexity (pods131)
    Antoine Amarilli (Télécom Paris Tech), Mikael Monet (Télécom Paris Tech) and Pierre Senellart (École normale supérieure PSL Research University, INRIA Paris)
  • Circuit Treewidth, Sentential Decision, and Query Compilation (pods010)
    Simone Bova (TU Wien) and Stefan Szeider (TU Wien)

 

 

PODS Session 5: Enumeration problems

Tuesday 16:00-18:05
Location: ♛Continental A
Session Chair: Ke Yi (Hong Kong University of Science and Technology)

  • 2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection (pods104)
    David Eppstein (University of California, Irvine), Michael T. Goodrich (University of California, Irvine), Michael Mitzenmacher (Harvard University) and Manuel R. Torres (University of California, Irvine)
  • On Asymptotic Cost of Triangle Listing in Random Graphs (pods038)
    Di Xiao (Texas A&M University), Yi Cui (Texas A&M University), Daren Cline (Texas A&M University) and Dmitri Loguinov (Texas A&M University)
  • Efficiently Enumerating Minimal Triangulation (pods083)
    Nofar Carmeli (Technion), Batya Kenig (Technion) and Benny Kimelfeld (Technion)
  • Counting and Enumerating (Preferred) Database Repairs (pods071)
    Ester Livshits (Technion) and Benny Kimelfeld (Technion)
  • Answering Conjunctive Queries under Updates (pods027)
    Christoph Berkholz (Humboldt-Universität zu Berlin), Jens Keppeler (Humboldt-Universität zu Berlin) and Nicole Schweikardt (Humboldt-Universität zu Berlin)

 

 

PODS Session 6: Best student paper, streaming and sketches

Wednesday 14:00-15:40
Location: ♛Continental A
Session Chair: Yufei Tao (University of Queensland)

  • Best student paper award: Tight Space-Approximation Tradeoff for the Multi-Pass Streaming Set Cover Problem (pods105)
    Sepehr Assadi (University of Pennsylvania)
  • Streaming Algorithms for Measuring H-Impact (pods126)
    Priya Govindan (Rutgers University), Morteza Monemizadeh (Rutgers University) and S Muthukrishnan (Rutgers University)
  • Efficient Matrix Sketching over Distributed Data (pods129)
    Zengfeng Huang (UNSW), Xuemin Lin (UNSW), Wenjie Zhang (UNSW) and Ying Zhang (UTS)
  • BPTree: an L2 heavy hitters algorithm using constant memory (pods049)
    Vladimir Braverman (John Hopkins University), Stephen Chestnut (ETH Zurich), Nikita Ivkin (John Hopkins University), Jelani Nelson (Harvard University), Zhengyu Wang (Harvard University) and David Woodruff (IBM Research)

 

 

PODS Session 7: Dependencies, graphs and query evaluation

Wednesday 16:00-18:05
Location: ♛Continental A
Session Chair: Angela Bonifati (Université de Lyon)

  • Stable Model Semantics for Tuple-Generating Dependencies Revisited (pods041)
    Mario Alviano (University of Calabria), Michael Morak (TU Wien) and Andreas Pieris (University of Edinburgh)
  • Schema Mappings for Data Graphs (pods098)
    Nadime Francis (University of Edinburgh) and Leonid Libkin (University of Edinburgh)
  • Dependencies for Graphs (pods101)
    Wenfei Fan (University of Edinburgh) and Ping Lu (Beihang University)
  • A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries (pods014)
    Bas Ketsman (Hasselt University) and Dan Suciu (University of Washington)
  • What do Shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? (pods055)
    Mahmoud Abo Khamis (LogicBlox Inc), Hung Ngo (LogicBlox Inc) and Dan Suciu (LogicBlox Inc, University of Washington)

 

 

POSTER SESSIONS

 

SIGMOD/PODS Poster & Demo Session 1

Tuesday 16:00-18:00
Location: ⚓Stevens Salon D

  • All papers from PODS Research Sessions 1, 3, 4 and 7
  • All SIGMOD papers from Tuesday: SIGMOD Research Sessions 1-10
  • SIGMOD demos

 

SIGMOD/PODS Poster & Demo Session 2

Wednesday 16:00-18:00
Location: ⚓Stevens Salon D

  • All papers from PODS Research Sessions 2, 5 and 6
  • All SIGMOD papers from Wednesday: SIGMOD Research Sessions 11-20
  • SIGMOD demos

 

 

 

TUTORIAL SESSIONS

 

Invited Tutorial 1

Tuesday 11:00-12:20
Location: ♛Continental A
Session Chair: Benny Kimelfeld (Technion)

  • Statistical Relational Learning: Unifying AI & DB Perspectives on Structured Probabilistic Models (pods325tt)
    Lise Getoor (University of California, Santa Cruz)
    Abstract: Machine learning and database approaches to structured probabilistic models share many commonalities, yet exhibit certain important differences. Machine learning methods focus on learning probabilistic models from (certain) data and efficient learning and inference, whereas probabilistic database approaches focus on storing and efficiently querying uncertain data. Nonetheless, the structured probabilistic models that both use are often (almost) identical. In this tutorial, I will overview the field of statistical relational learning (SRL) and survey common approaches. I’ll make connections to work in probabilistic databases, and highlight commonalities and differences among them. I’ll close by describing some of our recent work on probabilistic soft logic.
    Bio: Lise Getoor is a Professor in the Computer Science Department at UC Santa Cruz. Her research areas include machine learning and reasoning under uncertainty; in addition, she works in data management, visual analytics and social network analysis. She has over 200 publications and extensive experience with machine learning and probabilistic modeling methods for graph and network data. She is a Fellow of the Association for Artificial Intelligence, an elected board member of the International Machine Learning Society, serves on the board of the Computing Research Association (CRA), DARPA ISAT (2016-19), and the External Advisory Boards for the Max Planck Institute for Software Systems and the San Diego Super Computer Center. She has served as Machine Learning Journal Action Editor, Associate Editor for the ACM Transactions of Knowledge Discovery from Data, JAIR Associate Editor, and on the AAAI Council. She was co-chair for ICML 2011, and has served on the PC of many conferences including the senior PC of AAAI, ICML, KDD, NIPS, UAI, WSDM and the PC of SIGMOD, VLDB, and WWW. She is a recipient of an NSF Career Award and 11 best paper and best student paper awards. She received her PhD from Stanford University in 2001, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a Professor in the Computer Science Department at the University of Maryland, College Park from 2001-2013.

 

 

Invited Tutorial 2

Wednesday 11:00-12:20
Location: ♛Continental A
Session Chair: Semih Salihoglu (University of Waterloo)

  • Communication Cost in Parallel Query Processing — A Tutorial (pods322ds)
    Dan Suciu (University oF Washington)
    Abstract: Populating a relational schema from textual content, a problem commonly known as Information Extraction, is pervasive in We consider the following problem: what is the amount of communication required to compute a query in parallel on p servers, over a large input database? To study this problem we define a variant of Valiant’s BSP, where servers are infinitely powerful and where the cost is given by the maximum communication per server, and the number of rounds. Query evaluation in this model has been studied for full conjunctive queries. A simple lower bound on the communication per server and number of rounds is expressed in terms of the fractional edge covering number of the query’s hypergraph, and it follows from Atserias, Grohe, and Marx’ (AGM) upper bound on the query size; however, no matching algorithm is known for this lower bound.
    Algorithms for computing full conjunctive queries are based on hash partitioning the input data, and the key difficulty is that they can only be applied when each partitioning step is applied to skew-free data. We will start by discussing the case when the computation is limited to a single round of communication. In this case, if the data is skew-fee, then a tight bound is given in terms of the fractional vertex covering number of the query’s hypergraph, and this result can be used to also derive a tight bound on arbitrary input data. Next, we consider the multi-round case. Here, the key algorithmic ingredient is a technique that uses additional rounds in order to handle skewed values in the data. Using this technique it has been shown recently that the tight bound for evaluating a query where all relations have arity at most two is given by the general bound derived from the AGM inequality. The case when the input relations have arbitrary arities remains open; it is not even known whether this bound is given in terms of fractional edge cover, or the fractional vertex cover.

    bio: Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, was a principal member of the technical staff at AT&T Labs and joined the University of Washington in 2000. Suciu is conducting research in data management, with an emphasis on topics related to Big Data and data sharing, such as probabilistic data, data pricing, parallel data processing, data security. He is a co-author of two books Data on the Web: from Relations to Semistructured Data and XML, 1999, and Probabilistic Databases, 2011. He is a Fellow of the ACM, holds twelve US patents, received the best paper award in SIGMOD 2000 and ICDT 2013, the ACM PODS Alberto Mendelzon Test of Time Award in 2010 and in 2012, the 10 Year Most Influential Paper Award in ICDE 2013, the VLDB Ten Year Best Paper Award in 2014, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship. Suciu serves on the VLDB Board of Trustees, and is an associate editor for JACM, VLDBJ, ACM TWEB, and Information Systems and is a past associate editor for ACM TODS and ACM TOIS. Suciu’s PhD students Gerome Miklau, Christopher Re and Paris Koutris received the ACM SIGMOD Best Dissertation Award in 2006, 2010, and 2016 respectively, and Nilesh Dalvi was a runner up in 2008.

 

PODS Welcome Reception

Sunday 18:00-20:00
Location: International South

 

PODS Business Meeting

Monday 19:00-20:00
Location: ♛Continental A