The first part of the talk illustrates, in simple and data-inspired terms, what a viral sequence is, what mutations are, how mutated sequences become organized into a “variant”, what the effects of individual mutations and of variants are, and how viral sequences are deposited in public repositories (GenBank, COGUK, GISAID). The second part of the talk presents the systems that were developed within my group, thanks to ERC and EIT funding. Specifically, I will illustrate (i) ViruSurf, a search system enabling free, metadata-driven search over the integrated and curated databases, now holding about 3 million SARS-CoV-2 sequences and continuously updated from the above repositories; (ii) VirusViz, a data visualization tool for comparatively analyzing query results; (iii) VirusLab, a tool for exploring user-provided viral sequences; (iv) EpiSurf, a tool for intersecting viral sequences with epitopes, which are used in vaccine design. I will also hint at ongoing projects for viral surveillance and for exploring a knowledge base of viral resources.
Computational social choice is an interdisciplinary field that studies collective decision-making from an algorithmic perspective. Determining the winners under various voting rules is a mainstream area of research in computational social choice. Such rules assume that the voters provide complete information about their preferences, an assumption that is often unrealistic because typically only partial preference information is available. This state of affairs has motivated the study of necessary winners and possible winners with respect to a variety of voting rules.
In the first part of the talk, we will present an overview of results about the complexity of winner determination under incomplete information. In the second part of the talk, we will discuss election databases, a new framework that aims to build bridges between the computational social choice and the data management communities. An election database contains incomplete information about the preferences of voters (in the form of partial orders), alongside standard database relations that provide contextual information. The availability of relational context enables the formulation of sophisticated queries about voting rules, candidates, winners, issues, and positions on issues. We will introduce the semantics of queries on election databases and explore their computational complexity.
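To make the notions of necessary and possible winners concrete, here is a minimal brute-force sketch. It is an illustration only, not the algorithms or the election-database framework discussed in the talk. It assumes the plurality rule with co-winners, represents each voter's partial order as a set of pairs (a, b) meaning "a is preferred to b", and enumerates every completion of the partial profile, so it is exponential and suitable only for tiny examples. All function names are hypothetical.

```python
from itertools import permutations, product


def linear_extensions(candidates, partial_order):
    """Yield every total ranking consistent with the pairs (a, b): a before b."""
    for ranking in permutations(candidates):
        position = {c: i for i, c in enumerate(ranking)}
        if all(position[a] < position[b] for a, b in partial_order):
            yield ranking


def plurality_winners(profile, candidates):
    """Return the set of candidates with the most first-place votes (co-winners)."""
    scores = {c: 0 for c in candidates}
    for ranking in profile:
        scores[ranking[0]] += 1
    best = max(scores.values())
    return {c for c, s in scores.items() if s == best}


def necessary_and_possible_winners(candidates, partial_profile):
    """Brute force over all completions; assumes each partial order is acyclic."""
    completions = [list(linear_extensions(candidates, po)) for po in partial_profile]
    necessary = set(candidates)  # shrinks: must win in every completion
    possible = set()             # grows: wins in at least one completion
    for profile in product(*completions):
        winners = plurality_winners(profile, candidates)
        necessary &= winners
        possible |= winners
    return necessary, possible


candidates = ["a", "b", "c"]
partial_profile = [
    {("a", "b"), ("a", "c")},  # voter 1: a is certainly on top
    {("a", "b")},              # voter 2: a above b, position of c unknown
    {("b", "c")},              # voter 3: b above c, position of a unknown
]
necessary, possible = necessary_and_possible_winners(candidates, partial_profile)
print(sorted(necessary), sorted(possible))
```

In this example, a is a winner in every completion, while b and c are winners only in the completion where all three candidates tie, so a is the only necessary winner and all three candidates are possible winners.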
The use of basic SQL aggregates in recursive queries enables programmers to employ query languages to develop complete big-data applications, including graph, machine learning and data mining applications. To achieve this goal, programmers must make sure that their SQL queries can be converted into equivalent Datalog programs that combine rigorous declarative semantics with very efficient and highly scalable fixpoint-based semantics. Thus, our approach provides methods and tools to verify that (i) queries with recursive aggregates have a Stable Model Semantics (SMS) and (ii) that SMS can be represented via a fixpoint-based computation that is conducive to bulk-synchronous and stale-synchronous parallelism. We also provide techniques to restructure queries that satisfy (i) but not (ii) into queries that satisfy both.
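As a rough illustration of what a fixpoint-based computation with an aggregate inside the recursion can look like, the sketch below evaluates a shortest-path query in which min is applied at every iteration rather than after the recursion finishes. It is a textbook-style example written for this abstract, not the authors' system or verification method. It assumes non-negative edge weights, uses a hypothetical function name, and keeps only the facts that improved in the previous round (a semi-naive style delta).

```python
def shortest_paths(edges):
    """Fixpoint evaluation of, informally:
         path(X, Y, min<D>) <- edge(X, Y, D).
         path(X, Z, min<D>) <- path(X, Y, D1), edge(Y, Z, D2), D = D1 + D2.
    edges: iterable of (src, dst, weight); weights assumed non-negative.
    """
    edges = list(edges)
    out = {}  # adjacency list: src -> [(dst, weight), ...]
    for x, y, d in edges:
        out.setdefault(x, []).append((y, d))

    best = {}   # current minimum distance per (src, dst) pair
    delta = {}  # pairs whose minimum improved in the last round
    for x, y, d in edges:
        if d < best.get((x, y), float("inf")):
            best[(x, y)] = d
            delta[(x, y)] = d

    while delta:  # stop when no fact improves: the fixpoint is reached
        new_delta = {}
        for (x, y), d1 in delta.items():
            for z, d2 in out.get(y, []):
                d = d1 + d2
                if d < best.get((x, z), float("inf")):
                    best[(x, z)] = d
                    new_delta[(x, z)] = d
        delta = new_delta
    return best


dist = shortest_paths([("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "a", 1)])
print(dist[("a", "c")])
```

Because only the current minimum per pair is kept, each round joins a small delta with the edge relation instead of materializing all paths first, which is the kind of evaluation that lends itself to parallel, iteration-by-iteration execution.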