Modern database systems for online analytical processing (OLAP) typically rely on in-memory processing. Keeping all active data in DRAM severely limits the data capacity and makes larger deployments much more expensive than disk-based alternatives. Byte-addressable persistent memory (PMEM) is an emerging storage technology that bridges the gap between slow-but-cheap SSDs and fast-but-expensive DRAM. Thus, research and industry have identified it as a promising alternative to pure in-memory data warehouses. However, recent work shows that PMEM’s performance is strongly dependent on access patterns and does not always yield good results when simply treated like DRAM. To characterize PMEM’s behavior in OLAP workloads, we systematically evaluate PMEM on a large, multi-socket server commonly used for OLAP workloads. Our evaluation shows that PMEM can be treated like DRAM for most read access but must be used differently when writing. To support our findings, we run the Star Schema Benchmark on PMEM and DRAM. We show that PMEM is suitable for large, read-heavy OLAP workloads with an average query runtime slowdown of 1.66x compared to DRAM. Following our evaluation, we present 7 best practices on how to maximize PMEM’s bandwidth utilization in future system designs.
Many business applications benefit from fast analysis of online data streams. Modern stream processing engines (SPEs) provide complex window types and user-defined aggregation functions to analyze streams. While SPEs run in central data centers, wireless sensors networks (WSNs) perform distributed aggregations close to the data sources, which is beneficial especially in modern IoT setups. However, WSNs support only basic aggregations and windows. To bridge the gap between complex central aggregations and simple distributed analysis, we propose Disco, a distributed complex window aggregation approach. Disco processes complex window types on multiple independent nodes while efficiently aggregating incoming data streams. Our evaluation shows that Disco’s throughput scales linearly with the number of nodes and that Disco already outperforms a centralized solution in a two-node setup. Furthermore, Disco reduces the network cost significantly compared to the centralized approach. Disco’s tree-like topology handles thousands of nodes per level and scales to support future data-intensive streaming applications.