The main research goal of this project is the quest for a rigorous mathematical theory of input-output efficient preprocessing. This new theory will develop the computational tools to design powerful preprocessing algorithms that efficiently compress very large instances of hard problems into smaller instances of guaranteed size. Our motivation is the inability of current preprocessing routines with a compression guarantee (kernelizations) to handle very large instances that do not fit into main memory. The theory also seeks to rigorously explain the practical success of heuristics, that is, preprocessing algorithms without a compression guarantee, on very large instances. It will also lead to a concept of computational intractability that explains the limitations of heuristics. The project aims to design preprocessing algorithms that efficiently compress big data sets by harnessing the full capabilities of the advanced processor technology and memory hierarchies of the computing hardware used in science and industry. With new multivariate computational models that exploit instance structure and hardware structure simultaneously, we will deepen the understanding of the mathematical origins of compressibility and help build more powerful algorithms for preprocessing massive data sets.