Language-Integrated Query

Cell Enumeration

With the cell selectors, we can select all cells of a specific type in the local memory storage, wrapped in an IEnumerable<CellType> or IEnumerable<CellType_Accessor>. This interface exposes only basic enumeration capabilities. By itself, an IEnumerable<T> is nothing more than a container that yields elements one after another, much like iterating through a whole database with a cursor in other database systems. It does not provide an indexer, so we cannot retrieve an element by specifying a subscript, and it has no rewind facility, so going back to revisit an element is impossible.
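The cursor-like behavior is easiest to see by driving an enumerator by hand. The sketch below is plain LINQ to Objects, not GE code: `NodeSelector` is a hypothetical local iterator standing in for a selector such as Global.LocalStorage.Node_Selector(), and the cells are value tuples rather than TSL-generated types. The rewind check applies to C# compiler-generated iterators; it is an illustration, not a statement about GE's selector internals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a cell selector such as Node_Selector():
// a C# iterator that yields (CellID, val) pairs one at a time.
IEnumerable<(long CellID, int val)> NodeSelector()
{
    yield return (1L, 3);
    yield return (2L, 4);
    yield return (3L, 5);
}

var visited = new List<long>();
bool canRewind = true;
using (var cursor = NodeSelector().GetEnumerator())
{
    // The enumerator can only move forward (MoveNext) and expose the
    // element currently under the cursor (Current).
    while (cursor.MoveNext())
        visited.Add(cursor.Current.CellID);

    // Compiler-generated iterators throw NotSupportedException on Reset.
    try { cursor.Reset(); }
    catch (NotSupportedException) { canRewind = false; }
}

// No indexer: NodeSelector()[1] does not compile. ElementAt(1) exists,
// but it starts a fresh enumeration and walks forward to position 1.
long second = NodeSelector().ElementAt(1).CellID;

Console.WriteLine($"visited: {string.Join(",", visited)}; second: {second}; rewind: {canRewind}");
```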

Enumerable Collection Operators

Custom logic can be performed on the cells while iterating through them. The .NET Framework provides a set of static extension methods, defined in the System.Linq.Enumerable class, for querying enumerable collections. For a complete list of query methods, refer to the System.Linq.Enumerable documentation on MSDN.

With the extension methods provided by System.Linq.Enumerable, we can use the cell selectors to manipulate data in a succinct style. Instead of writing data processing logic in a foreach loop, we can use the query interfaces to extract and aggregate information in a declarative way. For example, instead of writing:

var sum = 0;
foreach(var n in Global.LocalStorage.Node_Selector())
    sum += n.val;

We can simply write:

var sum = Global.LocalStorage.Node_Selector().Sum(n=>n.val);

Or:

var sum = Global.LocalStorage.Node_Selector().Select(n=>n.val).Sum();
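The three forms above compute the same value. A minimal self-contained check, assuming an in-memory array of anonymous objects in place of Global.LocalStorage.Node_Selector() (the data is hypothetical; in GE the selector would enumerate the cells in local storage):

```csharp
using System;
using System.Linq;

// Hypothetical in-memory cells standing in for Node_Selector().
var nodes = new[]
{
    new { CellID = 1L, val = 3 },
    new { CellID = 2L, val = 4 },
    new { CellID = 3L, val = 5 },
};

// Imperative: an explicit accumulator variable.
var loopSum = 0;
foreach (var n in nodes)
    loopSum += n.val;

// Declarative: Sum with a selector, or Select followed by Sum.
var sumWithSelector = nodes.Sum(n => n.val);
var sumAfterSelect = nodes.Select(n => n.val).Sum();

Console.WriteLine($"{loopSum} {sumWithSelector} {sumAfterSelect}"); // 12 12 12
```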

Written this way, the code is free of intermediate state (e.g., the sum variable in the loop version) and of implementation details. In GE, the query execution engine can also apply certain optimizations automatically to leverage the indexes defined in TSL. More specifically, it inspects the filters, extracts the substring queries, and redirects them to the appropriate substring query interfaces generated by the TSL compiler. The basic rules of expression rewriting are as follows:

  • Select operators are not allowed to return accessors.

  • For a Where operator, if there is an invocation of String.Contains on a string field of a cell and the field is indexed, the invocation is sent to the inverted index module as a substring query.

  • If a string container field (such as a list of strings or an array of strings) is marked as indexed, the TSL compiler generates a ContainerType.Contains extension method that accepts the same parameters as System.String.Contains. Invocations of these methods are also executed as inverted index queries.
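The rewritable pattern is a Where whose predicate calls String.Contains on a string field. The sketch below runs that shape over an in-memory array, so it is an ordinary linear scan; the `name` field and the data are hypothetical. Only in GE, with the field declared as indexed in TSL, would the Contains call be redirected to the inverted index.

```csharp
using System;
using System.Linq;

// Hypothetical cells with a string field; in GE this field would be
// marked as indexed in the TSL definition.
var movies = new[]
{
    new { CellID = 1L, name = "Graph Engine in Action" },
    new { CellID = 2L, name = "Distributed Systems" },
    new { CellID = 3L, name = "Engines of Creation" },
};

// Where + String.Contains on a string field: the shape the rewriter looks for.
// With LINQ to Objects, every element is simply checked one by one.
var hits = movies
    .Where(m => m.name.Contains("Engine"))
    .Select(m => m.CellID)   // project to plain values, never to accessors
    .ToArray();

Console.WriteLine(string.Join(",", hits)); // 1,3
```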

Language-Integrated Query (LINQ)

LINQ is a convenient way to query a data collection. The expressive power of LINQ is equivalent to that of the extension methods provided by the System.Linq.Enumerable class; LINQ is simply more convenient to use. The following example compares LINQ in GE with its imperative equivalent:

/*==========================  LINQ version ==============================*/
var result = from node in Global.LocalStorage.Node_Accessor_Selector()
             where node.color == Color.Red && node.degree > 5
             select node.CellID.Value;
/*==========================  Imperative version ========================*/
var result = Global.LocalStorage.Node_Accessor_Selector()
            .Where(node => node.color == Color.Red && node.degree > 5)
            .Select(node => node.CellID.Value);
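The equivalence of the two forms can be checked outside GE. In this sketch the cells are a hypothetical in-memory array and `color` is a string instead of the TSL-defined Color enum; the query expression and the chained Where/Select calls return the same sequence.

```csharp
using System;
using System.Linq;

// Hypothetical cells; color is a string here instead of a TSL enum.
var nodes = new[]
{
    new { CellID = 1L, color = "Red",  degree = 7 },
    new { CellID = 2L, color = "Blue", degree = 9 },
    new { CellID = 3L, color = "Red",  degree = 2 },
    new { CellID = 4L, color = "Red",  degree = 6 },
};

// Query syntax...
var fromQuery = (from node in nodes
                 where node.color == "Red" && node.degree > 5
                 select node.CellID).ToList();

// ...and the equivalent chain of System.Linq.Enumerable calls.
var fromMethods = nodes
    .Where(node => node.color == "Red" && node.degree > 5)
    .Select(node => node.CellID)
    .ToList();

Console.WriteLine(string.Join(",", fromQuery));            // 1,4
Console.WriteLine(fromQuery.SequenceEqual(fromMethods));   // True
```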

The LINQ and imperative versions compile to the same code: each element of the LINQ expression is mapped one-to-one onto a method of the System.Linq.Enumerable class. LINQ, however, lets us write cleaner code. For example, an imperative equivalent of the following LINQ expression requires nested lambda expressions:

 var positive_feedbacks = from user in Global.LocalStorage.User_Accessor_Selector()
                          from comment in user.comments
                          where comment.rating == Rating.Excellent
                          select new 
                          {
                            uid = user.CellID,
                            pid = comment.ProductID
                          };
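A hand-written imperative equivalent of a query with two from clauses can be sketched with SelectMany, where the inner Where and Select lambdas are nested inside the outer lambda so that they can still see `user`. The users and comments below are hypothetical in-memory data with a string `rating` in place of the Rating enum. (The C# compiler's own translation uses the two-argument SelectMany overload instead, but the result is the same.)

```csharp
using System;
using System.Linq;

// Hypothetical users with nested comment lists; rating is a string here
// instead of the Rating enum a TSL schema would define.
var users = new[]
{
    new { CellID = 10L, comments = new[] {
        new { ProductID = 100L, rating = "Excellent" },
        new { ProductID = 101L, rating = "Poor" } } },
    new { CellID = 11L, comments = new[] {
        new { ProductID = 102L, rating = "Excellent" } } },
};

// Query syntax with two from clauses.
var fromQuery = (from user in users
                 from comment in user.comments
                 where comment.rating == "Excellent"
                 select new { uid = user.CellID, pid = comment.ProductID }).ToList();

// Method syntax: the inner lambdas are nested inside the SelectMany lambda
// so that they can still refer to the outer user.
var fromMethods = users
    .SelectMany(user => user.comments
        .Where(comment => comment.rating == "Excellent")
        .Select(comment => new { uid = user.CellID, pid = comment.ProductID }))
    .ToList();

Console.WriteLine($"{fromQuery.Count} matches; same result: {fromQuery.SequenceEqual(fromMethods)}");
```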

Parallel LINQ (PLINQ)

PLINQ is a parallel implementation of LINQ (see MSDN). It runs the query on multiple processors simultaneously whenever possible. Calling AsParallel() on a selector turns it into a ParallelQuery<T>, a parallel enumerable container that works with PLINQ.
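A minimal PLINQ sketch, assuming plain cell objects generated in memory (hypothetical data standing in for a cell object selector; as the limitations below explain, accessor selectors must not be used this way). AsParallel() returns a ParallelQuery<T>, so the Where and Count that follow are PLINQ operators that may run on several cores.

```csharp
using System;
using System.Linq;

// Hypothetical cell objects (plain values, not accessors).
var nodes = Enumerable.Range(1, 1000)
    .Select(i => new { CellID = (long)i, degree = i % 10 })
    .ToArray();

// Parallel query: elements may be processed on several threads.
int highDegree = nodes.AsParallel().Where(n => n.degree > 5).Count();

// Sequential query for comparison. PLINQ does not preserve element order
// by default, which does not matter for an aggregate such as Count.
int sequential = nodes.Where(n => n.degree > 5).Count();

Console.WriteLine($"{highDegree} {sequential}"); // 400 400
```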

Limitations

IEnumerable<T> has a limitation: IDisposable elements are not disposed during the enumeration. However, disposing a cell accessor after use is crucial in GE, because a cell accessor that is never disposed leaves the target cell locked permanently.

This led to a design decision in GE: a cell accessor is actively disposed as soon as the user code finishes using it in the enumeration loop. As a result, user code must not capture the value or reference of an accessor during an enumeration and store it for later use. Because the reference is destroyed and the value is invalidated immediately after the enumeration loop body, any operation on the stored value or reference can cause data corruption or a system crash. This is the root cause of the following limitations:

  • The Select operator cannot return cell accessors, because each accessor is disposed as soon as the loop body is done with it.

  • LINQ operators that cache elements (such as join, group by) are not supported.

  • PLINQ caches some elements and then distributes them across multiple cores, so it does not work with cell accessors. It does work with cell object selectors, though.

  • Although an enumeration does not block the whole database, it does take trunk-level locks. Compound LINQ selectors with join operations are not supported, because the inner loop would try to acquire a trunk lock already held by the outer one.
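The dispose-after-use rule can be simulated with ordinary .NET types. In this sketch `AccessorSelector` is a hypothetical iterator, and MemoryStream stands in for a cell accessor only because it is a convenient IDisposable whose CanRead turns false once disposed; this is an illustration of the behavior described above, not GE's implementation. Reading the value inside the loop body works, whereas a Select that returns the accessor itself hands back objects that are already disposed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical accessor selector. MemoryStream stands in for a cell accessor
// only because it is an IDisposable whose CanRead turns false after Dispose.
IEnumerable<MemoryStream> AccessorSelector(int count)
{
    for (int i = 0; i < count; i++)
    {
        var accessor = new MemoryStream(new byte[] { (byte)i });
        try
        {
            yield return accessor;   // the consumer's loop body runs here
        }
        finally
        {
            accessor.Dispose();      // disposed once the loop body is done with it
        }
    }
}

// Allowed: read the value while the accessor is still live.
var values = new List<int>();
foreach (var a in AccessorSelector(3))
    values.Add(a.ReadByte());

// Not allowed in GE: Select hands the accessor itself out of the loop.
// By the time ToList returns, every captured accessor has been disposed.
var captured = AccessorSelector(3).Select(a => a).ToList();
bool anyUsable = captured.Any(a => a.CanRead);

Console.WriteLine($"values: {string.Join(",", values)}; captured accessors usable: {anyUsable}");
```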