Despite its difficulties in detail handling, SQL, the most widely used language for structured data processing, is significantly more convenient in defining the fundamental processes than high-level languages like Java. This is because high-level languages are not set-oriented, whereas SQL is. Using Java as a comparative language, the following examines characteristics of a set-oriented language based on SQL syntax (other high-level languages are similar).
Ability to focus on the details
The structured data is typically present in sets, or batches. It is imperative that a programming language supports set-oriented data manipulation well in order to process structured data sets in an efficient manner.
High-level languages, such as Java, lack ready-to-use class libraries that enable the execution of set-oriented operations. Despite providing array type, which is the same as sets, they only describe a very small set of fundamental operations. For tasks like filtering, grouping, and aggregating, hundreds of lines of code are frequently required; even a basic sum requires 4-5 lines of loop statements. The extremely lengthy code makes it difficult to understand the algorithm’s overall structure because it handles countless specifics, making it easy to overlook important parts.
In contrast, SQL provides a wide range of set-oriented operations that enable the creation of short code. These operations include basic operations like intersection, union, and difference as well as aggregate operations like SUM/COUNT, WHERE clause filtering, and GROUP statement grouping.
parameter for expression
Is set-orientedness sufficient on its own for processing structured data? If we create a class library for the high-level languages to handle set-oriented operations, will it produce the same outcome as SQL does?
However, it still requires work to be comparable to SQL.
Consider the process of filtering as an illustration. In order to keep members who meet the given criterion—that is, members who technically make the criterion true—a filtering needs a criterion. The criterion in SQL is an expression, such as WHERE x>0, which means retrieving members over which the x>0 result is true. Instead of computing the expression prior to the SQL statement being executed, it does so while the set members are being traversed. The expression is essentially a function that takes the current member as an argument. For example, a function specified as an expression is used as a parameter in the WHERE clause.
This straightforward wording is not supported by the Java grammar. The syntax is convoluted and the code is bloated since each function must be specified explicitly, even though it permits sending a function as a parameter to another. If an expression is directly written into a function as the parameter, it will be computed before the traversal of each of the set members.
Obviously it’s convenient and intuitive for SQL to use an expression to define a function and pass it as a parameter.
This kind of wording has a specific name. It’s functional language, or lambda syntax.
SQL makes great use of the Lambda syntax. Apart from the operations like filtering where Lambda is a necessity, others become simple to perform with the syntax used. An aggregate function can use an expression to calculate the aggregate values while the operation specified by the function name is executed. The sum(x*x) function, for instance, gets the sum of squares, where the expression x*x is calculated during the execution of sum. With the absence of the Lambda syntax, we can first get a set consisting of the square of each of the members with the set-oriented processing, and then calculate the sums over the set. The non-Lambda phrasing is not as intuitive as the Lambda syntax because an expression computed over one member each time is easier to write and understand than one over a whole set,.
Direct reference of fields
The structured data constitutes not the single values but the records made up of field values.
In most cases, SQL allows referencing a field value in an expression parameter through the field name, rather than specifying the record containing the desired field value. A referenced field needs to be headed by its table name (or the table’s alias) only when there is a field (or there are fields) with the same name.
Although the Lambda syntax can be implemented in Java by predefining a function, this function can only accept the current record as a parameter, and this must be mentioned before each field name in the function expression. For instance, if x is the parameter for the current record, the equation for calculating the amount would be “x.unitpricex.number,” which would be the result of multiplying the unit price by the number. However, it is just expressed as “unitpricenumber” in SQL.
The advanced program design that underlies SQL’s straightforward syntax allows the syntax to be accurately understood by the parser and executed. Some programming languages have Lambda syntax, which enables the definition of a function as an expression and the passing of it as a parameter, although they lack the SQL-like phrasing capability. As an alternative, they employ the laborious language style “x.unitprice*x.number.” A language is not considered professional structured data processing if it does not allow fields to be directly referenced.
Data structure that changes dynamically
Additionally, SQL supports dynamic data structures well.
Structured data is frequently returned in the form of results from calculations with structured data. It is impossible to determine the structure of a returned result before the code is written because it depends on the type of operation. It is therefore necessary to be able to accommodate dynamic data structures.
A new data structure can be created by any SQL SELECT statement by adding or removing fields instead of first specifying a structure (or class). In the absence of this capability, Java defines the classes or structures that are anticipated to exist during code compilation and, theoretically, does not permit the dynamic generation of data structures during execution.
The language that was translated
Dynamic data structures are not supported by compiled languages. Furthermore, the compiler is unable to determine whether to compile the parameter expression as a function for parameter passing or to calculate the parameter expression first and then pass the value, making it unsuitable for implementing the Lambda syntax. Additional syntax notations are required to provide instructions to the compiler. The decision-making is left up to a function in an interpreted language. Another crucial aspect of a set-oriented language is its interpreted execution.
Happy Learning!