NESTOR | data series management/exploration/analytics

Storage and retrieval system for complex analytics on big sequence collections

Features

Summarization and Indexing

NESTOR uses specialized summarization techniques both to reduce the size of data series and to enable fast analytics. It also supports the construction of domain-specific indexes, and decides when to use them by performing access path selection. Such indexes facilitate both analytical queries (such as similarity search) and aggregation queries.

Parallelization

Data storage, indexing, and query processing can all scale to large clusters of computing nodes, allowing multi-TB datasets to be processed and large analytical jobs to be performed in seconds.

Adaptive Reorganization

NESTOR's storage layer continuously and adaptively reorganizes the underlying data layout in order to match the current workload, without incurring any additional overhead.

Modern Hardware Optimizations

We exploit modern hardware through techniques such as SIMD, NUMA-aware multi-processing, GPUs, and SSD optimizations.

Team

Prof. Themis Palpanas (University of Paris)
Prof. Stratos Idreos (Harvard University)
Dr. Kostas Zoumpatianos (Harvard University & University of Paris)
PhD students: Qitong Wang, Ilias Azizi
Research Engineers: -
Collaborators: Anastasia Bezerianos, Niv Dayan, Panagiota Fatourou, Haridimos Kondylakis, Theophanis Tsandilas

Alumni: Paul Boniol, Manos Chatzakis, Karima Echihabi, Botao Peng, Luka Jakovljevic, Anna Gogolou, Michele Linardi, Federico Roncallo, Xucheng Tang

Research

We have developed the current state-of-the-art data series indexes: iSAX2+ (bulk loading), ADS+ and Dumpy (adaptive), DPiSAX and Odyssey (distributed), ParIS+ and Hercules (multi-core), SING (GPU), MESSI and Elpis (in-memory), Coconut-LSM (streaming series), ULISSE (variable-length), and ProS (progressive query answering). We have also developed the first data series query workload benchmark, as well as DSStat, a toolset for data series preprocessing and visualization.

We have applied our techniques on streaming and uncertain data series, and have worked with data from diverse domains, such as home networks, road tunnels, seismology, neuroscience, astrophysics, manufacturing, as well as from deep learning embeddings.

Extensive experimental evaluations demonstrate that our techniques are the state-of-the-art for exact search and approximate search with quality guarantees, and the only viable solution for disk-resident datasets for both data series and general high-dimensional vector datasets.

Moreover, we have developed unsupervised methods for subsequence anomaly detection: NormA and Series2Graph (offline), and SAND (online). These methods exhibit state-of-the-art performance across a variety of dataset characteristics and anomaly types, without needing domain knowledge, labeled data, or anomaly-free datasets to learn from.

Tutorials

In our tutorials we describe the most prevalent similarity search methods developed in both the data series and the high-dimensional communities, and comment on their merits and drawbacks. We present recent results from extensive experimental comparison studies, which demonstrate the superiority of the state-of-the-art data series methods. We also present and discuss the state-of-the-art methods in data series analytics, and subsequence anomaly detection in particular.
New Trends in Time Series Anomaly Detection.
Paul Boniol, John Paparrizos, Themis Palpanas.
EDBT 2023
Scalable Analytics on Large Sequence Collections.
Karima Echihabi, Themis Palpanas.
MDM 2022
New Trends in High-D Vector Similarity Search: AI-driven, Progressive, and Distributed.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas.
VLDB 2021
High-Dimensional Similarity Search for Scalable Data Science.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas.
ICDE 2021
Big Sequence Management: Scaling Up and Out.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas.
EDBT 2021
Big Sequence Management: on Scalability.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas.
IEEE BigData 2020
Big Sequence Management.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas.
ISCC 2020

Management

Applications in diverse domains increasingly need techniques that can index and mine very large collections of sequences (a.k.a. data series, or time series). Examples of such applications come from biology, astronomy, entomology, the web, and other domains.
Data Series Management (Dagstuhl Seminar 19282).
Anthony Bagnall, Richard L. Cole, Themis Palpanas, Kostas Zoumpatianos.
Dagstuhl Reports 2019
Report on the First and Second Interdisciplinary Time Series Analysis Workshop (ITISA).
Themis Palpanas and Volker Beckmann.
SIGMOD Record 2019
T-Store: Tunable Storage for Large Sequential Data [a.k.a. data series, or time series].
Kostas Zoumpatianos, Stratos Idreos, Themis Palpanas.
NEDB 2019
Data Series Management: Fulfilling the Need for Big Sequence Analytics.
Kostas Zoumpatianos, Themis Palpanas.
ICDE 2018
The Parallel and Distributed Future of Data Series Mining. [Invited Paper]
Themis Palpanas.
HPCS 2017
Data Series Management: The Next Challenge
Themis Palpanas.
ICDE 2016
Big Sequence Management: A Glimpse on the Past, the Present, and the Future. [Invited Paper]
Themis Palpanas.
LNCS 2016
Data Series Management: The Road to Big Sequence Analytics.
Themis Palpanas.
SIGMOD Record 2015

Indexing

For big data exploration, it is prohibitive to rely on full sequential scans for every single query; therefore, indexing is required. The goal of our indexing techniques is to make query processing efficient enough that analysts can repeatedly fire exploratory queries with quick response times and low initialization costs.
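To illustrate why summaries help avoid full sequential scans, here is a minimal sketch of exact nearest neighbor search with lower-bound pruning. It is not the implementation of any of the indexes listed below. It assumes a Piecewise Aggregate Approximation (PAA) summary and Euclidean distance, and all function names and the toy data are hypothetical. The PAA-based bound never exceeds the true distance, so a candidate whose bound is already no better than the best answer found so far can be skipped without reading its raw values.

```python
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    size = len(series) // segments  # assumes len(series) is a multiple of segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(paa_a, paa_b, length):
    """Never exceeds the Euclidean distance between the original series."""
    size = length // len(paa_a)
    return math.sqrt(size * sum((x - y) ** 2 for x, y in zip(paa_a, paa_b)))


def nearest_neighbor(query, collection, segments=4):
    """Exact 1-NN search. Returns (index, distance, full distance computations)."""
    summaries = [paa(s, segments) for s in collection]  # built once, ahead of time
    q_paa = paa(query, segments)
    # Visit candidates in order of increasing lower bound.
    bounds = sorted(
        (paa_lower_bound(q_paa, summary, len(query)), i)
        for i, summary in enumerate(summaries)
    )
    best_idx, best_dist, full = -1, math.inf, 0
    for bound, i in bounds:
        if bound >= best_dist:
            break  # every remaining candidate has an even larger lower bound
        full += 1
        dist = euclidean(query, collection[i])
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist, full


# Hypothetical toy data: 50 series of length 16.
collection = [[math.sin(0.1 * (k + 1) * i) + k for i in range(16)] for k in range(50)]
query = [math.sin(0.8 * i) + 7.2 for i in range(16)]
index, distance, full = nearest_neighbor(query, collection)
print(index, round(distance, 3), f"{full} of {len(collection)} series fully compared")
```

In this sketch the summaries are kept in a flat list and sorted per query; a tree-structured index would instead organize them so that whole groups of series can be pruned at once.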
FreSh: A Lock-Free Data Series Index.
Panagiota Fatourou, Eleftherios Kosmas, Themis Palpanas, George Paterakis.
SRDS 2023
Dumpy: A Compact and Adaptive Index for Large Data Series Collections.
Zeyu Wang, Qitong Wang, Peng Wang, Themis Palpanas, Wei Wang.
SIGMOD 2023
Elpis: Graph-Based Similarity Search for Scalable Data Science.
Ilias Azizi, Karima Echihabi, Themis Palpanas.
PVLDB 2023
Odyssey: A Journey in the Land of Distributed Data Series Similarity Search.
Manos Chatzakis, Panagiota Fatourou, Eleftherios Kosmas, Themis Palpanas, Botao Peng.
PVLDB 2023
Hercules Against Data Series Similarity Search.
Karima Echihabi, Panagiota Fatourou, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim.
PVLDB 2022
Efficient Range and kNN Twin Subsequence Search in Time Series.
Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos.
TKDE 2022
Data Series Similarity Search via Deep Learning.
Qitong Wang (supervised by: Themis Palpanas).
VLDB PhD Workshop 2022
SING: Sequence Indexing Using GPUs.
Botao Peng, Panagiota Fatourou, Themis Palpanas.
ICDE 2021
Twin Subsequence Search in Time Series.
Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos.
EDBT 2021
Fast Data Series Indexing for In-Memory Data.
Botao Peng, Panagiota Fatourou, Themis Palpanas.
VLDBJ 2021
BestNeighbor: Efficient Evaluation of kNN Queries on Large Time Series Databases.
Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas, Dennis Shasha, Patrick Valduriez.
KAIS 2020
Scalable Data Series Subsequence Matching with ULISSE.
Michele Linardi, Themis Palpanas.
VLDBJ 2020
Evolution of a Data Series Index - The iSAX Family of Data Series Indexes.
Themis Palpanas.
CCIS 2020
ParIS+: Data Series Indexing on Multi-Core Architectures.
Botao Peng, Panagiota Fatourou, Themis Palpanas.
TKDE 2020
Massively Distributed Time Series Indexing and Querying.
Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas.
TKDE 2020
Data Series Indexing Gone Parallel.
Botao Peng (supervised by Panagiota Fatourou, Themis Palpanas).
ICDE (PhD Workshop) 2020
Coconut Palm: Static and Streaming Data Series Exploration Now in your Palm.
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas.
SIGMOD 2019
Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim.
PVLDB 2020
MESSI: In-Memory Data Series Indexing.
Botao Peng, Panagiota Fatourou, Themis Palpanas.
ICDE 2020
Truly Scalable Data Series Similarity Search.
Karima Echihabi (supervised by Themis Palpanas and Houda Benbrahim).
VLDB PhD Workshop 2019
Effective and Efficient Variable-Length Data Series Analytics.
Michele Linardi (supervised by Themis Palpanas).
VLDB PhD Workshop 2019
Coconut: Sortable Summarizations for Scalable Indexes over Static and Streaming Data Series.
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas.
VLDBJ 2019
Local Similarity Search on Geolocated Time Series Using Hybrid Indexing.
Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos.
SIGSPATIAL 2019
Distributed Algorithms to Find Similar Time Series.
Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Dennis Shasha, Themis Palpanas, Patrick Valduriez, Reza Akbarinia, Florent Masseglia.
ECML/PKDD 2019
Local Pair and Bundle Discovery over Co-Evolving Time Series.
Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos.
SSTD 2019
The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim.
PVLDB 2019
Scalable, Variable-Length Similarity Search in Data Series: The ULISSE Approach.
Michele Linardi, Themis Palpanas.
PVLDB 2019
Generating Data Series Query Workloads.
Kostas Zoumpatianos, Yin Lou, Ioana Ileana, Themis Palpanas, Johannes Gehrke.
VLDBJ 2018
Massively Distributed Time Series Indexing and Querying.
Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas.
TKDE 2018
ParIS: The Next Destination for Fast Data Series Indexing and Query Answering.
Botao Peng, Themis Palpanas, Panagiota Fatourou.
IEEE BigData 2018
Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas.
PVLDB 2018
ULISSE: ULtra compact Index for Variable-Length Similarity SEarch in Data Series
Michele Linardi, Themis Palpanas.
ICDE 2018
DPiSAX: Massively Distributed Partitioned iSAX
Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas.
ICDM 2017
ADS: The Adaptive Data Series Index
Kostas Zoumpatianos, Stratos Idreos, Themis Palpanas.
VLDBJ 2016
Query Workloads for Data-Series Indexes
Kostas Zoumpatianos, Yin Lou, Themis Palpanas, Johannes Gehrke.
KDD 2015
Beyond One Billion Time Series: Indexing and Mining Very Large Time Series Collections with iSAX2+
Alessandro Camerra, Jin Shieh, Themis Palpanas, Thanawin Rakthanmanon, Eamonn Keogh.
KAIS 2014
Indexing for Interactive Exploration of Big Data Series
Kostas Zoumpatianos, Stratos Idreos, Themis Palpanas.
SIGMOD 2014
iSAX 2.0: Indexing and Mining One Billion Time Series.
Alessandro Camerra, Themis Palpanas, Jin Shieh, Eamonn Keogh.
ICDM 2010
Indexing Large Human-Motion Databases
Eamonn Keogh, Themis Palpanas, Victor B. Zordan, Dimitrios Gunopulos, Marc Cardle.
VLDB 2004

Analytics

Examples of analysis operations are queries by content (range and similarity queries, nearest neighbors), clustering, classification, outlier patterns, frequent sub-sequences, and others.
Choose Wisely: An Extensive Evaluation of Model Selection for Anomaly Detection in Time Series.
Emmanouil Sylligardos, Paul Boniol, John Paparrizos, Panos Trahanias, Themis Palpanas.
PVLDB 2023
Appliance Detection Using Very Low-Frequency Smart Meter Time Series.
Adrien Petralia, Philippe Charpentier, Paul Boniol, Themis Palpanas.
e-Energy 2023
dCAM: Dimension-wise Activation Map for Explaining Multivariate Data Series Classification.
Paul Boniol, Mohammed Meftah, Emmanuel Remy, Themis Palpanas.
SIGMOD 2022
iEDeaL: A Deep Learning Framework for Detecting Highly Imbalanced Interictal Epileptiform Discharges.
Qitong Wang, Stephen Whitmarsh, Vincent Navarro, Themis Palpanas.
PVLDB 2022
Predicting Dyslexia in Adolescents from Eye Movements During Free Painting Viewing.
Alae Eddine El Hmimdi, Lindsey M Ward, Themis Palpanas, Vivien Sainte Fare Garnot, Zoi Kapoula.
BrainSci 2022
Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly Detection.
John Paparrizos, Paul Boniol, Themis Palpanas, Ruey S. Tsay, Aaron Elmore, Michael J. Franklin.
PVLDB 2022
Theseus: Navigating the Labyrinth of Subsequence Anomaly Detection.
Paul Boniol, John Paparrizos, Yuhao Kang, Themis Palpanas, Ruey Tsay, Aaron J. Elmore, Michael J. Franklin.
PVLDB 2022
TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection.
John Paparrizos, Yuhao Kang, Ruey Tsay, Paul Boniol, Themis Palpanas, Michael J. Franklin.
PVLDB 2022
Predicting Dyslexia and Reading Speed in Adolescents from Eye Movements in Reading and Non-Reading Tasks: a Machine Learning Approach.
Alae Eddine El Hmimdi, Lindsey M Ward, Themis Palpanas, Zoi Kapoula.
BrainSci 2021
SAND: Streaming Subsequence Anomaly Detection.
Paul Boniol, John Paparrizos, Themis Palpanas, Michael J. Franklin.
PVLDB 2021
SAND in Action: Subsequence Anomaly Detection for Streams.
Paul Boniol, John Paparrizos, Themis Palpanas, Michael J. Franklin.
PVLDB 2021
Electricity Demand Activation Extraction: From Known to Unknown Signatures, Using Similarity Search.
Pauline Laviron, Zueqi Dai, Berenice Huquet, Themis Palpanas.
e-Energy 2021
Unsupervised and Scalable Subsequence Anomaly Detection in Large Data Series.
Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas, Mohammed Meftah, Emmanuel Remy.
VLDBJ 2021
GraphAn: Graph-based Subsequence Anomaly Detection.
Paul Boniol, Themis Palpanas, Mohammed Meftah, Emmanuel Remy.
PVLDB 2020
Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series.
Paul Boniol, Themis Palpanas.
PVLDB 2020
Unsupervised Subsequence Anomaly Detection in Large Sequences.
Paul Boniol (supervised by Themis Palpanas, Mohammed Meftah, Emmanuel Remy).
VLDB PhD Workshop 2020
Scalable Machine Learning on High-Dimensional Vectors: From Data Series to Deep Network Embeddings.
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas.
WIMS 2020
Automated Anomaly Detection in Large Sequences.
Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas.
ICDE 2020
SAD: An Unsupervised System for Subsequence Anomaly Detection.
Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas.
ICDE 2020
Matrix Profile Goes MAD: Variable-Length Motif and Discord Discovery in Data Series.
Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh.
DAMI 2020
Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series.
Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh.
SIGMOD 2018
VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series.
Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh.
SIGMOD 2018
Data Series Similarity Using Correlation-Aware Measures.
Katsiaryna Mirylenka, Michele Dallachiesa, Themis Palpanas.
SSDBM 2017
Correlation-Aware Distance Measures for Data Series
Katsiaryna Mirylenka, Michele Dallachiesa, Themis Palpanas.
EDBT 2017
Time Series Analysis for Near-Infrared Spectroscopy Data.
Novri Suhermi, Judit Gervain, Themis Palpanas.
fNIRS 2016
Characterizing Home Device Usage From Wireless Traffic Time Series.
Katsiaryna Mirylenka, Vassilis Christophides, Themis Palpanas, Ioannis Pefkianakis, Martin May.
EDBT 2016
Envelope-Based Anomaly Detection for High-Speed Manufacturing Processes.
Katsiaryna Mirylenka, Alice Marascu, Themis Palpanas, Matthias Fehr, Stefan Jank, Gunter Welde, Daniel Groeber.
APC|M 2013
Finding Interesting Correlations with Conditional Heavy Hitters.
Katsiaryna Mirylenka, Themis Palpanas, Graham Cormode, Divesh Srivastava.
ICDE 2013
Scalable Similarity Matching in Streaming Time Series.
Alice Marascu, Suleiman Ali Khan, Themis Palpanas.
PAKDD 2012
Real-Time Data Analytics in Sensor Networks.
Themis Palpanas.
Springer 2012

Exploration

Using our techniques, users can explore large datasets and find patterns of interest, using nearest neighbor search. They can draw queries (data series) using a mouse, or touch screen, or they can select from their own datasets.
ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees.
Karima Echihabi, Theophanis Tsandilas, Anna Gogolou, Anastasia Bezerianos, Themis Palpanas.
VLDBJ 2022
Data Series Progressive Similarity Search with Probabilistic Quality Guarantees.
Anna Gogolou, Theophanis Tsandilas, Karima Echihabi, Anastasia Bezerianos, Themis Palpanas.
SIGMOD 2020
Progressive Similarity Search on Time Series Data.
Anna Gogolou, Theophanis Tsandilas, Themis Palpanas, Anastasia Bezerianos.
BigVis@EDBT 2019
Comparing Similarity Perception in Time Series Visualizations.
Anna Gogolou, Theophanis Tsandilas, Themis Palpanas, Anastasia Bezerianos.
TVCG 2019
Comparing Similarity Perception in Time Series Visualizations
Anna Gogolou, Theophanis Tsandilas, Themis Palpanas, Anastasia Bezerianos.
IEEE VIS 2018
RINSE: Interactive Data Series Exploration with ADS+
Kostas Zoumpatianos, Stratos Idreos, Themis Palpanas.
VLDB 2015

Summarization

In order to support time- and space-efficient management and analytics, data series need to be summarized. Different summarization techniques are applicable to different applications and problem settings.
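As one concrete illustration of summarization, the sketch below computes a SAX word, the symbolic representation on which the iSAX family of indexes builds. It is a simplified sketch, not the code used in NESTOR: the function names, the default segment count, and the default alphabet size are assumptions made for this example. The series is z-normalized, reduced to segment means (PAA), and each mean is mapped to a symbol using breakpoints that split the standard normal distribution into equally probable regions.

```python
from bisect import bisect
from statistics import NormalDist, fmean, pstdev


def z_normalize(series):
    """Rescale to zero mean and unit variance (all zeros for a constant series)."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    size = len(series) // segments  # assumes len(series) is a multiple of segments
    return [fmean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax(series, segments=4, alphabet=4):
    """Return the SAX word as a list of integer symbols in [0, alphabet)."""
    # Breakpoints cut the standard normal distribution into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / alphabet) for i in range(1, alphabet)]
    return [bisect(breakpoints, v) for v in paa(z_normalize(series), segments)]


# Hypothetical example: a 16-point ramp summarized with 4 symbols.
word = sax(list(range(16)), segments=4, alphabet=4)
print(word)
```

With these defaults, a 16-point series is reduced to 4 symbols of 2 bits each. The trade-off is the usual one: fewer segments or a smaller alphabet give a more compact summary but looser distance bounds.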
SEAnet: A Deep Learning Architecture for Data Series Similarity Search.
Qitong Wang, Themis Palpanas.
TKDE 2023
Deep Learning Embeddings for Data Series Similarity Search.
Qitong Wang, Themis Palpanas.
KDD 2021
Practical Data Prediction for Real-World Wireless Sensor Networks.
Usman Raza, Alessandro Camerra, Amy L. Murphy, Themis Palpanas, Gian Pietro Picco.
TKDE 2015
What Does Model-Driven Data Acquisition Really Achieve in Wireless Sensor Networks? [Best Paper Award]
Usman Raza, Alessandro Camerra, Amy L. Murphy, Themis Palpanas, Gian Pietro Picco.
PerCom 2012
Real-Time Data Analytics in Sensor Networks.
Themis Palpanas.
Springer 2012
Streaming Time Series Summarization Using User-Defined Amnesic Functions.
Themis Palpanas, Michail Vlachos, Eamonn Keogh, Dimitrios Gunopulos.
TKDE 2008
Online Amnesic Approximation of Streaming Time Series.
Themis Palpanas, Michail Vlachos, Eamonn Keogh, Dimitrios Gunopulos, Wagner Truppel.
ICDE 2004

Uncertainty

Modeling tuples with value and existential uncertainty has several advantages. From an engineering perspective, a programmer can feed uncertain data directly into the system, without explicitly preprocessing data and forcing data approximations. From an application requirements perspective, maintaining possible values allows the application to provide results with confidence intervals.
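The following sketch shows, under strong simplifying assumptions, how keeping the uncertainty of each value lets an aggregate be reported with a confidence interval. It covers value uncertainty only (not existential uncertainty) and assumes that each reading carries an independent Gaussian error with a known standard deviation. The `UncertainValue` type and the `window_average` function are hypothetical names introduced for this example, not part of any system described here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    mean: float
    std: float  # standard deviation of the (assumed Gaussian) error


def window_average(values, z=1.96):
    """Average of independent Gaussian readings, with a confidence interval.

    z = 1.96 corresponds to a 95% interval under the Gaussian assumption.
    """
    n = len(values)
    mean = sum(v.mean for v in values) / n
    # Variances of independent errors add; dividing the sum by n scales std by 1/n.
    std = math.sqrt(sum(v.std ** 2 for v in values)) / n
    return mean, (mean - z * std, mean + z * std)


# Hypothetical window of four uncertain sensor readings.
window = [UncertainValue(10.0, 1.0), UncertainValue(12.0, 1.0),
          UncertainValue(11.0, 1.0), UncertainValue(11.0, 1.0)]
average, (low, high) = window_average(window)
print(average, low, high)
```

If the errors are correlated or not Gaussian, this closed form no longer applies, and the interval would have to be estimated differently, for example by sampling possible values.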
Sliding Windows over Uncertain Data Streams
Michele Dallachiesa, Gabriela Jacques-Silva, Bugra Gedik, Kun-Lung Wu, Themis Palpanas.
KAIS 2015
Top-k Nearest Neighbor Search In Uncertain Data Series
Michele Dallachiesa, Themis Palpanas, Ihab F. Ilyas.
VLDB 2015
Uncertain Time-Series Similarity: Return to the Basics
Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas.
VLDB 2012
Similarity Matching for Uncertain Time Series: Analytical and Experimental Comparison
Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas.
QUeST @ GIS 2011

Contact

France
Email: Prof. Themis Palpanas
LIPADE - University of Paris
45 rue des Saints Pères
Paris 75006, France
