Introduction

In recent years, the amount of data to be processed has grown significantly. A few years ago a typical database contained up to a million records; with the development and spread of the Internet, it became necessary to build databases with hundreds of millions or even billions of records. As database volume grew, a processing problem arose. When a large database is distributed across servers and contains a large number of tables, the time needed to find a required record increases sharply. In addition, since large databases are used on sites with millions of visitors, the number of simultaneous requests can reach several thousand, and each new request arrives before the previous one has been processed. The database servers quickly become overloaded, which leads to a denial of service. This makes clear the need to move away from relational databases and adopt a different approach to data storage.
One such approach is NoSQL.

What is NoSQL?

NoSQL is a concept that involves non-relational data models and the ability to scale horizontally (distributing a database over a very large number of domains should not affect processing speed). The term NoSQL was first used in 1998 by the Italian software developer Carlo Strozzi, and at that time it referred to an open-source relational database that did not use the SQL language. In its modern sense, the term has been used since 2009.
Why is NoSQL interesting?

There are two main reasons:

1. Productive application development. A large part of application development effort goes into mapping data structures that live, for example, in RAM onto the database. This phenomenon is known as impedance mismatch. NoSQL databases offer a data model that better fits the needs of the application, which simplifies interaction with the database. That means less code to write and less debugging. As a result, the database also requires fewer changes.

2. Large amounts of data. Organizations find it valuable to capture as much information as possible and to process it quickly, which is expensive with a relational database, if it is possible at all. The main reason is that a relational database is designed to run on a single machine, while it is more economical to handle a large database by distributing the load across clusters of many smaller and cheaper machines. Most NoSQL databases are designed to run on clusters, which is why they are better suited to large amounts of information.

Thus NoSQL databases represent a promising technology for manipulating huge amounts of data distributed among servers.

What Is a Key-Value Store?

Key-value stores are the simplest NoSQL data stores to use from an API perspective. The client can get the value for a key, put a value for a key, or delete a key from the data store. The value is a blob that the data store just stores, without caring or knowing what’s inside; it’s the responsibility of the application to understand what was stored.
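The three operations can be illustrated with a minimal in-memory sketch. The BlobStore class and its method names are hypothetical and do not correspond to any particular product; values are held as opaque byte arrays to mirror the idea that the store never looks inside them.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory key-value store; values are opaque byte arrays. */
final class BlobStore {
    private final Map<String, byte[]> data = new HashMap<>();

    /** Stores the value under the key, replacing any previous value. */
    void put(String key, byte[] value) {
        data.put(key, value.clone());
    }

    /** Returns a copy of the value for the key, or null if the key is absent. */
    byte[] get(String key) {
        byte[] value = data.get(key);
        return value == null ? null : value.clone();
    }

    /** Removes the key and its value, if present. */
    void delete(String key) {
        data.remove(key);
    }

    public static void main(String[] args) {
        BlobStore store = new BlobStore();
        // The store never inspects the bytes; only the application knows they are JSON text.
        store.put("user:42", "{\"name\":\"Ann\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("user:42"), StandardCharsets.UTF_8));
        store.delete("user:42");
        System.out.println(store.get("user:42") == null); // the key is gone
    }
}
```

Decoding the bytes back into text happens only in the application code, which is the division of responsibility described above.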
Since key-value stores always use primary-key access, they generally have great performance and can be easily scaled.

Figure 1: Typical example of a key-value store

The key-value model is one of the simplest non-trivial data models, and richer data models are often implemented as extensions of it. The model can be extended to a discretely ordered model that maintains keys in lexicographic order. This extension is computationally powerful in that it can efficiently retrieve selective key ranges.

Comparison of characteristics between traditional RDBMS and Key-Value Store

Relational databases and key-value stores differ radically and are used to solve different problems. Comparing their characteristics only helps to understand the difference between them:

| Relational database | Key-value store |
| --- | --- |
| The database consists of tables, tables contain columns and rows, and rows consist of column values. All rows in one table have the same structure. | Domains are analogous to tables, but unlike a table, a domain does not define the structure of its data. A domain is a box into which you can put anything you like. Records within the same domain can have different structures. |
| The data model is defined in advance. It is strongly typed and contains constraints and relations to ensure data integrity. | Records are identified by a key, and each record has a dynamic set of attributes associated with it. |
| The data model is based on the natural representation of the contained data, not on the functionality of the application. | In some implementations, the attributes can only be strings. In other implementations, the attributes have simple data types that reflect the types used in programming: integers, arrays of strings, and lists. |
| The data model is normalized to avoid data duplication. Normalization creates relationships between tables, and these relationships connect data in different tables. | Relationships are not explicitly defined, either between domains or within a single domain. |
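The ordered extension of the key-value model mentioned above, in which keys are kept in lexicographic order, can be sketched with a sorted map. The OrderedStore class is hypothetical; it only shows why ordered keys make key-range retrieval cheap, since a range is a contiguous slice of the sorted keys.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical ordered key-value store: keys are kept in lexicographic order. */
final class OrderedStore {
    private final TreeMap<String, String> data = new TreeMap<>();

    /** Stores the value under the key, replacing any previous value. */
    void put(String key, String value) {
        data.put(key, value);
    }

    /** Returns all entries whose key lies in [fromKey, toKey), in key order. */
    SortedMap<String, String> range(String fromKey, String toKey) {
        return data.subMap(fromKey, toKey);
    }
}
```

For example, if keys are prefixed with a category, calling range("b", "c") returns exactly the entries whose keys start with b, without scanning the other keys.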
Comparison of data access between traditional RDBMS and Key-Value Store

| Relational database | Key-value store |
| --- | --- |
| Data is created, updated, deleted, and queried using Structured Query Language (SQL). | Data is created, updated, deleted, and queried by calling API methods. |
| SQL queries can extract data from a single table or from multiple tables using joins. | Some implementations provide a SQL-like syntax to specify filter conditions. |
| SQL queries can include aggregation and complex filters. | Often only the basic comparison operators can be used (=, !=, <, >, <= and >=). |
| A relational database usually contains built-in logic, such as triggers, stored procedures, and functions. | All business logic and all logic that supports data integrity is contained in the application code. |

Comparison of interaction with applications between traditional RDBMS and Key-Value Store

| Relational database | Key-value store |
| --- | --- |
| Proprietary APIs, or generic ones such as OLE DB or ODBC, are most commonly used. | SOAP and/or REST APIs are most commonly used to access the data. |
| The data is stored in a format that reflects its natural structure, so application structures must be mapped to relational database structures. | Data can be mapped onto application structures more directly; the only code needed is the code that writes the data into objects. |
The advantages of Key-Value storage

Such systems have two distinct advantages over relational databases:

1. They are well suited to cloud services. Key-value stores are simpler and therefore scale more easily than relational databases. If you are building your own system and plan to deploy dozens or hundreds of servers that must cope with a growing workload on your data store, then key-value stores are the natural choice. Because such stores expand easily and dynamically, they are also useful for vendors that provide a multi-user web storage platform. Such a platform is a relatively low-cost means of storing data with great potential for scalability. Users typically pay only for what they use, but their needs may grow. The vendor can grow the platform dynamically and with virtually no restrictions, based on the load.

2. A more natural integration with the code. The relational data model and the object model of the code are usually constructed in different ways, which leads to some incompatibilities. Developers solve this problem by writing code that maps the relational model to the object model. This process has no clear, quickly achievable value, and it can take a lot of time that could be spent on developing the application itself. Meanwhile, many key-value stores keep data in a structure that maps onto objects more naturally. This can significantly reduce development time.

The disadvantages of Key-Value storage (the advantages of Relational DB)

1. Constraints in a relational database ensure data integrity at the lowest level. Data that does not satisfy the constraints physically cannot get into the database.
In key-value stores there are no such constraints, so maintaining data integrity rests entirely with the application. However, any code has bugs. Whereas errors in a properly designed relational database usually do not lead to data integrity issues, errors in key-value stores usually do.

2. Another advantage of relational databases is that they force you to go through the process of developing a data model. If the model is well developed, the database contains a logical structure that fully reflects the structure of the stored data rather than the structure of the application. Thus, the data becomes independent of the application. This means that another application can use the same data, and application logic can be changed without any changes to the database model. To achieve the same with a key-value store, you need to replace relational model design with class design, producing general classes based on the natural structure of the data.

3. Unlike relational databases, stores aimed at use in the “cloud” have far fewer common standards. Although they do not differ conceptually, they all have different APIs, query interfaces, and specifics. Therefore, you had better trust your vendor, because if something happens, it will not be easy to switch to another service provider. And given that almost all modern key-value stores are in beta versions, that trust is even riskier than with relational databases.

Key-Value Store Features on Riak example

Using NoSQL data stores requires understanding how their features compare with those of the standard RDBMS data stores that we are used to. The main point is to understand which features NoSQL stores lack and what changes must be made to the application architecture to use a key-value data store and its features more effectively. The common features of NoSQL data stores discussed here are consistency, transactions, query features, structure of the data, and scaling.

Consistency

Consistency applies only to operations on a single key.
Each such operation is a get, a put, or a delete on a single key. Optimistic writes are very expensive to implement, because the data store itself cannot determine that a value has changed. Distributed key-value stores such as Riak implement the eventually consistent model of consistency. Since the value may have already been replicated to other nodes, Riak has two ways of resolving update conflicts: either the newest write wins and older writes lose, or both (all) values are returned, allowing the client to resolve the conflict. In Riak, these options can be set up during bucket creation. Buckets are just a way to namespace keys so that key collisions can be reduced. Let’s assume that all customer keys reside in the customer bucket.
When creating a bucket we can provide default consistency values, such as “a write is considered good only when the data is consistent across all the nodes where the data is stored.”

    // Create the bucket, retrying up to 3 times; r sets how many nodes must answer a read.
    Bucket bucket = connection
        .createBucket(bucketName)
        .withRetrier(attempts(3))
        .r(numberOfNodesToRespondToRead)
        .execute();

To guarantee that the data on every node is consistent, we can raise the w value (numberOfNodesToRespondToWrite) to equal nVal. Of course, doing so will decrease the cluster’s write performance. To improve the handling of write or read conflicts, we can change the allowSiblings flag during bucket creation. If the flag is set to false, the store lets the last write win and does not create siblings.
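The two conflict-resolution modes can be sketched as follows. This is a minimal illustration rather than Riak's implementation: the Version and ConflictResolver classes are hypothetical, and comparing plain timestamps is a simplifying assumption (Riak itself tracks object versions with vector clocks).

```java
import java.util.ArrayList;
import java.util.List;

/** One conflicting version of a value, tagged with the time it was written. */
final class Version {
    final String value;
    final long timestamp;

    Version(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

/** Hypothetical resolver for the two modes: last write wins, or siblings. */
final class ConflictResolver {
    /**
     * With allowSiblings = true, all conflicting versions are returned so the
     * client can merge them. With allowSiblings = false, only the version with
     * the highest timestamp is kept (last write wins).
     */
    static List<Version> resolve(List<Version> conflicting, boolean allowSiblings) {
        if (allowSiblings || conflicting.size() <= 1) {
            return new ArrayList<>(conflicting);
        }
        Version newest = conflicting.get(0);
        for (Version candidate : conflicting) {
            if (candidate.timestamp > newest.timestamp) {
                newest = candidate;
            }
        }
        List<Version> result = new ArrayList<>();
        result.add(newest);
        return result;
    }
}
```

Last write wins is simpler for the client but silently discards the older update; returning siblings keeps every update at the cost of merge logic in the application.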
Transactions

Different products specify transactions differently, but in general there are no guarantees on writes. Many data stores do implement transactions, each in its own way. Riak uses the concept of a quorum, applied through the replication factor during the write API call. Let’s assume we have a Riak cluster with a replication factor (N) of 5 and we supply a numberOfNodesToRespondToWrite (W) value of 3. Riak then has a write tolerance of N - W = 2: up to two nodes can be down and the write operation will still succeed, though the data on those two nodes would be missing for reads.

Query Features

As the name implies, all key-value stores can query by key. When a query depends on some attribute of the value, the database alone cannot answer it; the application must read the value and check it.

There is an interesting side effect: most data stores will not return a list of all their primary keys. Even if they did, retrieving lists of keys and then querying for the values would be very costly. Some key-value databases compensate for this by allowing searches inside the value, as implemented in the Riak Search tool. This lets the user query the data much as when using indexes.

When using key-value stores, a lot of thought has to be given to the design of the key. Can the key be generated using some algorithm? Can the key be provided by the user (user ID, email, etc.)? Or can it be derived from timestamps or other data available outside the database?

These query characteristics make key-value stores likely candidates for storing session data (with the session ID as the key), shopping cart data, user profiles, and so on. The expiry_secs property can be used to expire keys after a certain time interval, especially for session and shopping cart objects.
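The three key-design questions can be made concrete with a small sketch. The Keys class, the prefixes, and the zero-padded timestamp layout are all illustrative assumptions, not conventions required by any particular store.

```java
import java.util.Locale;

/** Hypothetical key-building helpers for the three key sources discussed above. */
final class Keys {
    /** Key generated by the application: the session ID itself, with a prefix. */
    static String sessionKey(String sessionId) {
        return "session:" + sessionId;
    }

    /** Key provided by the user: a normalized email address. */
    static String userKey(String email) {
        return "user:" + email.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * Key derived from a timestamp. The timestamp is zero-padded to 13 digits
     * so that lexicographic key order matches chronological order.
     */
    static String eventKey(String userId, long epochMillis) {
        return String.format(Locale.ROOT, "event:%s:%013d", userId, epochMillis);
    }
}
```

Because each key can be rebuilt from data the application already has, a lookup never needs to ask the store for a list of keys first.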
When writing to a Riak bucket using the store API, the object is stored under the key provided. Similarly, we can get the value stored for the key using the fetch API. Riak provides an HTTP-based interface, so all operations can be performed from a web browser or on the command line using curl. For example, data can be saved to Riak by using the curl command to POST it to the session bucket with the key a7e618d9db25 (we have to provide this key).

Structure of Data

Key-value databases don’t care what is stored in the value part of the key-value pair. The value can be a blob, text, JSON, XML, and so on. In Riak, we can use the Content-Type header in the POST request to specify the data type.

Scaling

Many key-value stores scale by using sharding. With sharding, the value of the key determines the node on which the key is stored. Let’s assume we are sharding by the first character of the key; if the key is f4b19d79587d, which starts with an f, it will be sent to a different node than the key ad9c7a396542. This kind of sharding setup can increase performance as more nodes are added to the cluster.
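A minimal sketch of this first-character scheme is shown below. The Sharding class and the modulo mapping from character to node index are assumptions made for illustration; production stores usually hash the whole key instead, so that keys spread evenly regardless of their prefix.

```java
/** Hypothetical shard selection based on the first character of the key. */
final class Sharding {
    /** Returns the index (0 to nodeCount - 1) of the node responsible for the key. */
    static int nodeFor(String key, int nodeCount) {
        if (key.isEmpty() || nodeCount <= 0) {
            throw new IllegalArgumentException("key must be non-empty and nodeCount positive");
        }
        // Keys that share a first character always map to the same node.
        return key.charAt(0) % nodeCount;
    }
}
```

With four nodes, nodeFor("f4b19d79587d", 4) and nodeFor("ad9c7a396542", 4) return different indexes, so the two keys from the example are stored on different nodes.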
Sharding also introduces some problems. If the node used to store f goes down, the data stored on that node becomes unavailable, and new data with keys that start with f cannot be written. Data stores such as Riak allow you to control aspects of the CAP theorem: N (the number of nodes that store replicas of the key-value pair), R (the number of nodes that must have the data being fetched before the read is considered successful), and W (the number of nodes the write has to be written to before it is considered successful). Let’s assume we have a 5-node Riak cluster. Setting N to 3 means that all data is replicated to at least three nodes, setting R to 2 means that any two nodes must reply to a GET request for it to be considered successful, and setting W to 2 ensures that a PUT request is written to two nodes before the write is considered successful.
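The arithmetic behind N, R, and W can be written down in a few lines. The Quorum class below is a hypothetical sketch that only counts replies; it does not model which nodes hold which replicas.

```java
/** Hypothetical model of quorum settings: N replicas, R read replies, W write acknowledgements. */
final class Quorum {
    private final int n;
    private final int r;
    private final int w;

    Quorum(int n, int r, int w) {
        if (r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("R and W must each be between 1 and N");
        }
        this.n = n;
        this.r = r;
        this.w = w;
    }

    /** Number of replica nodes that can be down while writes still succeed. */
    int writeTolerance() {
        return n - w;
    }

    /** Number of replica nodes that can be down while reads still succeed. */
    int readTolerance() {
        return n - r;
    }

    /** A write succeeds once at least W replicas have acknowledged it. */
    boolean writeSucceeds(int acknowledgements) {
        return acknowledgements >= w;
    }

    /** A read succeeds once at least R replicas have replied. */
    boolean readSucceeds(int replies) {
        return replies >= r;
    }
}
```

With N = 3, R = 2, and W = 2 as in the example, both tolerances are 1: one replica can be unavailable and reads and writes still succeed.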
These settings allow us to fine-tune how node failures are tolerated for read or write operations. Based on our needs, we can change these values for better read availability or write availability. Generally speaking, choose a W value to match your consistency needs; these values can be set as defaults during bucket creation.

Suitable use cases

Let’s discuss some of the problems where key-value stores are a good fit.

1. Storing Session Information. Generally, every web session is unique and is assigned a unique sessionID value. Applications that store the sessionID on disk or in an RDBMS will greatly benefit from moving to a key-value store, since everything about the session can be stored by a single PUT request or retrieved using GET. This single-request operation makes it very fast, as everything about the session is stored in a single object.

2. User Profiles, Preferences. Almost every user has a unique userId, username, or some other attribute, as well as preferences such as language, color, timezone, which products the user has access to, and so on. This can all be put into an object, so getting the preferences of a user takes a single GET operation. Similarly, product profiles can be stored.

3. Shopping Cart Data. E-commerce websites have shopping carts tied to the user. As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value where the key is the userID.

When not to use

There are problem spaces where key-value stores are not the best solution.

1. Relationships among Data. If you need to have relationships between different sets of data, or correlate the data between different sets of keys, key-value stores are not the best solution to use, even though some key-value stores provide link-walking features.

2. Multioperation Transactions. If you’re saving multiple keys and there is a failure to save any one of them, and you want to revert or roll back the rest of the operations, key-value stores are not the best solution to use.

3. Query by Data. If you need to search the keys based on something found in the value part of the key-value pairs, then key-value stores are not going to perform well for you. There is no way to inspect the value on the database side, with the exception of some products like Riak Search or indexing engines like Lucene or Solr.

4. Operations by Sets. Since operations are limited to one key at a time, there is no way to operate upon multiple keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side.
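Handling a set of keys on the client side, as the last point describes, amounts to looping over single-key operations. The sketch below assumes a hypothetical getOne function standing in for the single-key GET of whatever store is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical client-side helper that emulates a multi-key read. */
final class MultiKey {
    /**
     * Fetches the keys one at a time through getOne and collects the values
     * that exist, in request order. Keys without a value are skipped.
     */
    static Map<String, String> getAll(List<String> keys, Function<String, String> getOne) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String key : keys) {
            String value = getOne.apply(key); // one round trip per key
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }
}
```

The loop is not atomic: values may change between individual reads, and each key costs a separate request, which is why set-oriented workloads are a poor fit.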
Conclusion

Key-value stores are most suitable for storing large amounts of loosely structured data that is meant to be distributed among several domains. In other words, such stores are suitable for sites with a very large number of visitors. Such a data store should also be chosen if the data is object-oriented or can have dynamic attributes.

Although the main disadvantage of such stores is the lack of a NoSQL standard, a standard may be adopted in the near future, which would make this type of storage more convenient to work with.