Versatility, so that data can be requested using any technology.
Efficiency of that access. The access method should make retrieving data from the database efficient and convenient.
Parallelism, because everything is now scalable and different servers access the database simultaneously, often for the same data. The storage should take full advantage of this parallelism in order to process data faster.
It is also important for the storage layer to handle this concurrent access correctly, so that data is not corrupted or overwritten.
At the same time, the data must be stored securely and read back reliably. That is, if we write something to the database, we must be sure that we will get it back.
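The concurrency requirement can be illustrated with a minimal sketch. It is not how any particular database works: `TinyStore` and `run_writers` are hypothetical names invented for this example, and a plain in-memory dictionary stands in for the storage layer. Several writers update the same key at once, and a lock around the read-modify-write step keeps them from overwriting each other's updates.

```python
import threading


class TinyStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def increment(self, key):
        # Read-modify-write under a lock, so two writers can never
        # read the same old value and overwrite each other's update.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def run_writers(store, key, writers=8, updates_per_writer=1000):
    """Start several threads that all update the same key, then wait for them."""

    def work():
        for _ in range(updates_per_writer):
            store.increment(key)

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(key)
```

With the lock in place, the final value equals the number of writers multiplied by the number of updates each one makes. Without it, two writers could read the same old value and one of the updates could be lost.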
If you have worked with old databases, for example, FoxPro, you know that corrupted data often appears there. In new databases like MongoDB, Cassandra and others, these problems also happen. They may simply go unnoticed more often, because there is so much data that corruption is harder to spot.
Hardware reliability is really important, but here we treat it as an assumption, since we are going to talk about theoretical things. In our model, once something reaches the disk, we consider it safe. Replacing a disk in a RAID array on time is, for today's purposes, the administrators' concern. We will not dive deeply into this issue, and we will barely touch on how efficiently the storage is organized physically.
To solve these problems, there are approaches that are very similar across different data stores, both new and classic.