NoSQL

 

What is NoSQL?

NoSQL stands for "not only SQL". It refers to database management systems that store and retrieve data using non-relational (i.e., non-tabular) data models. Examples of such models are graph, document, and key-value databases. NoSQL systems offer high flexibility and operational speed, and they can be scaled across many servers.

 

Why is NoSQL used?

NoSQL is used to store and process large volumes of unstructured, semi-structured, and varied data. It allows developers to store data without defining a strict schema first, so the data structure can change over time without requiring structural modifications to the database itself. NoSQL systems are also designed for horizontal scaling: a system increases its processing capacity by adding more servers to a distributed network, rather than by upgrading the hardware components of a single server (vertical scaling).
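The two ideas above can be sketched in plain Python, without a real database. This is a minimal illustration under stated assumptions: the node names, the sample documents, and the helpers `node_for_key` and `distribute` are hypothetical, and simple modulo hashing stands in for the more elaborate partitioning schemes (such as consistent hashing) that production systems typically use.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for a key by hashing the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(docs: dict, nodes: list[str]) -> dict:
    """Return a mapping of node name -> list of keys stored on that node."""
    placement = {node: [] for node in nodes}
    for key in docs:
        placement[node_for_key(key, nodes)].append(key)
    return placement


# Flexible schema: records in the same collection need not share the same fields.
documents = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Lin", "tags": ["admin"], "last_login": "2024-01-01"},
    "user:3": {"name": "Sam"},
}

# Horizontal scaling: capacity grows by adding a node, not by upgrading one machine.
two_nodes = distribute(documents, ["node-a", "node-b"])
three_nodes = distribute(documents, ["node-a", "node-b", "node-c"])
```

With plain modulo hashing, adding a node can move many keys to a different node. That cost is one reason real systems tend to use schemes that move fewer keys when the cluster grows.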

 

Who uses NoSQL databases?

NoSQL is used mainly by software engineers, data engineers, and system architects. Organizations that build large-scale web applications, real-time analytics platforms, social media networks, and Internet of Things (IoT) systems rely on NoSQL. These groups use it to handle very high read and write throughput, up to millions of operations per second, across geographically distributed locations.

 

What are the main types of NoSQL databases?

There are four primary structural categories of NoSQL databases:

1. Document databases: Store data in documents using JSON or BSON formats (e.g., MongoDB, CouchDB).

2. Key-value stores: Map a unique identifier (key) directly to a specific data object (value) (e.g., Redis, Amazon DynamoDB).

3. Wide-column stores: Organize data into rows whose columns can differ from row to row, instead of a fixed set of columns for every row (e.g., Apache Cassandra, HBase).

4. Graph databases: Store entities as nodes and the relationships between them as edges (e.g., Neo4j, ArangoDB).
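To make the four categories more concrete, the sketch below models each one with ordinary Python data structures. It is only an illustration of the shape of the data, not of how any of the named products store it internally; all keys, field names, and the `neighbors` helper are made up for this example.

```python
# 1. Document model: one self-describing record with nested fields.
document = {"_id": "u1", "name": "Ada", "languages": ["Python", "Java"]}

# 2. Key-value model: a unique key maps directly to an opaque value.
key_value = {"session:42": b"serialized-session-data"}

# 3. Wide-column model: row key -> column family -> columns.
#    Different rows may hold different columns.
wide_column = {
    "sensor-1": {"readings": {"2024-01-01T00:00": 21.5, "2024-01-01T00:01": 21.7}},
    "sensor-2": {"readings": {"2024-01-01T00:00": 19.0}},
}

# 4. Graph model: nodes plus labeled edges that describe relationships.
nodes = {
    "ada": {"label": "Person"},
    "lin": {"label": "Person"},
    "acme": {"label": "Company"},
}
edges = [("ada", "KNOWS", "lin"), ("ada", "WORKS_AT", "acme")]


def neighbors(node_id: str, relation: str) -> list[str]:
    """Follow outgoing edges of one relation type from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


colleagues = neighbors("ada", "KNOWS")
```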

How is NoSQL used in data engineering?

In data engineering, a wide-column NoSQL database like Apache Cassandra is often used to ingest and store high-velocity time-series data. For example, a data processing system may continuously receive temperature and pressure readings from thousands of industrial sensors every second.

Cassandra distributes and replicates this data across multiple independent servers, which spreads the write load and ensures the data is recorded even if one server fails. Data engineers then use Python and the cassandra-driver library to read these raw sensor records and aggregate the numerical values for real-time monitoring software.
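The aggregation step described above might look roughly like the following. To keep the sketch runnable without a Cassandra cluster, it works on in-memory rows instead of rows fetched through cassandra-driver; the `Reading` tuple, its field names, the sample values, and the `summarize` helper are all assumptions made for illustration.

```python
from collections import defaultdict, namedtuple
from statistics import mean

# Hypothetical row shape: one temperature and pressure reading per sensor.
Reading = namedtuple("Reading", ["sensor_id", "temperature", "pressure"])


def summarize(rows):
    """Group raw readings by sensor and compute simple aggregates."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.sensor_id].append(row)
    return {
        sensor_id: {
            "avg_temperature": mean(r.temperature for r in readings),
            "max_pressure": max(r.pressure for r in readings),
            "count": len(readings),
        }
        for sensor_id, readings in grouped.items()
    }


# Stand-in for rows that would normally come from the database.
sample_rows = [
    Reading("sensor-1", 20.0, 101.0),
    Reading("sensor-1", 22.0, 103.0),
    Reading("sensor-2", 30.0, 99.5),
]
summary = summarize(sample_rows)
```

In a real pipeline, `sample_rows` would be replaced by the result of a query against the sensor table, and the summary would be pushed to the monitoring software rather than kept in a local variable.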

 

Which programming languages and libraries are used to interact with NoSQL?

Developers use general-purpose programming languages such as Python, Java, and JavaScript to interact with NoSQL databases. Because NoSQL databases do not share a universal query language, developers use database-specific libraries and drivers to communicate with each database. For example, Python developers use the PyMongo library to connect to MongoDB, Java developers use the DataStax Java Driver to execute commands in Apache Cassandra, and JavaScript developers use the Mongoose library to model and manage MongoDB documents in Node.js environments.
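As one example of a database-specific driver, the sketch below uses PyMongo to insert a document into MongoDB and read it back. It assumes PyMongo is installed and a MongoDB server is reachable at the URI you pass in; the database name `plant`, the collection name `readings`, and the helper functions are hypothetical choices for this illustration.

```python
from datetime import datetime, timezone


def build_reading(sensor_id: str, temperature: float) -> dict:
    """Build a plain dictionary that MongoDB can store as a document."""
    return {
        "sensor_id": sensor_id,
        "temperature": temperature,
        "recorded_at": datetime.now(timezone.utc),
    }


def save_and_fetch(uri: str, reading: dict):
    """Insert one document, then read one back by sensor_id (needs a running server)."""
    # Imported here so build_reading() works even where PyMongo is not installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        collection = client["plant"]["readings"]  # assumed database and collection names
        collection.insert_one(reading)
        return collection.find_one({"sensor_id": reading["sensor_id"]})
    finally:
        client.close()


reading = build_reading("sensor-1", 21.5)
```

A call such as `save_and_fetch("mongodb://localhost:27017", reading)` would only succeed against a locally running MongoDB instance. The query is written as a Python dictionary rather than as an SQL string, which reflects the point above that each NoSQL database exposes its own query interface.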