Language-Integrated Query

Cell Enumeration #

With the cell selectors, we can select the cells of a given type in the local memory storage via IEnumerable<CellType> or IEnumerable<CellType_Accessor>. An IEnumerable<T> is nothing more than a sequence that yields its elements one after another. The interface exposes only basic enumeration capabilities. It does not provide an indexer, so we cannot take an element by specifying a subscript. There is no rewind facility either, so turning back to revisit an element is impossible.
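To make the forward-only behavior concrete, here is a minimal sketch that does not use GE at all. `Values()` is a hypothetical stand-in for a cell selector and `NthByWalking` is an illustrative helper, not part of any GE API. It shows that reaching the n-th element of a plain IEnumerable<T> means walking the sequence from the start.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationSketch
{
    // Hypothetical stand-in for a cell selector: yields values lazily, one at a time.
    public static IEnumerable<int> Values()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }

    // IEnumerable<T> has no indexer, so the n-th element (0-based) is reached
    // by advancing the enumerator n + 1 times from the beginning.
    public static int NthByWalking(IEnumerable<int> source, int n)
    {
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            for (int i = 0; i <= n; i++)
            {
                if (!e.MoveNext())
                    throw new ArgumentOutOfRangeException(nameof(n));
            }
            return e.Current;
        }
    }
}
```

Calling `EnumerationSketch.NthByWalking(EnumerationSketch.Values(), 1)` walks past the first element before it can return the second one.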

Enumerable Collection Operators #

Custom logic can be performed on the cells when iterating through them. .NET provides a set of static extension methods for querying enumerable collections. For a complete list of query methods, refer to the documentation for the System.Linq.Enumerable class.

With the extension methods provided by System.Linq.Enumerable, we can use the cell selectors to manipulate data in an elegant manner. Instead of writing data processing logic in a foreach loop, we can use the query interfaces to extract and aggregate information in a declarative way. For example, instead of writing:

var sum = 0;
foreach(var n in Global.LocalStorage.Node_Selector())
    sum += n.val;

We can simply write:

var sum = Global.LocalStorage.Node_Selector().Sum(n=>n.val);

Or:

var sum = Global.LocalStorage.Node_Selector().Select(n=>n.val).Sum();

The declarative versions eliminate the need for intermediate state (e.g., the sum variable in this example) and hide implementation details. In GE, certain query optimizations can be done automatically by the query execution engine to leverage the indexes defined in TSL. Specifically, the execution engine inspects the filters, extracts the substring queries, and dispatches them to the proper substring query interfaces generated by the TSL compiler.

Language-Integrated Query (LINQ) #

LINQ provides a convenient way of querying a data collection. The expressive power of LINQ is equivalent to the extension methods provided by the System.Linq.Enumerable class, only more convenient to use. The following example demonstrates LINQ in GE versus its imperative equivalent:

/*==========================  LINQ version ==============================*/
var result = from node in Global.LocalStorage.Node_Accessor_Selector()
             where node.color == Color.Red && node.degree > 5
             select node.CellID.Value;

/*==========================  Imperative version ========================*/
var result = Global.LocalStorage.Node_Accessor_Selector()
            .Where(  node => node.color == Color.Red && node.degree > 5 )
            .Select( node => node.CellID.Value  );

Both versions will be translated to the same binary code; the elements in the LINQ expression will eventually be mapped to the imperative interfaces provided by the System.Linq.Enumerable class. With LINQ, however, we can write cleaner code. For example, if we try to write an imperative equivalent for the following LINQ expression, a nested lambda expression must be used.

 var positive_feedbacks = from user in Global.LocalStorage.User_Accessor_Selector()
                          from comment in user.comments
                          where comment.rating == Rating.Excellent
                          select new
                          {
                            uid = user.CellID,
                            pid = comment.ProductID
                          };
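For comparison, one possible imperative form of the query above is sketched below. It is written against plain in-memory objects so that it runs on its own: the `User`, `Comment`, and `Rating` types are hypothetical stand-ins for the TSL-generated ones, and value tuples replace the anonymous type so that the result can be returned from a method. The nested lambda inside `SelectMany` is the part that the two `from` clauses hide.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for TSL-generated types.
public enum Rating { Poor, Good, Excellent }

public class Comment
{
    public long ProductID;
    public Rating rating;
}

public class User
{
    public long CellID;
    public List<Comment> comments = new List<Comment>();
}

public static class FeedbackSketch
{
    public static List<(long uid, long pid)> PositiveFeedbacks(IEnumerable<User> users)
    {
        return users
            // The outer lambda ranges over users; the inner lambdas range over
            // comments and still need access to the enclosing user.
            .SelectMany(user => user.comments
                .Where(comment => comment.rating == Rating.Excellent)
                .Select(comment => (uid: user.CellID, pid: comment.ProductID)))
            .ToList();
    }
}
```

The query-syntax version reads top to bottom, while the method-syntax version has to nest the comment-level operators inside the user-level lambda.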

Parallel LINQ (PLINQ) #

PLINQ is a parallel implementation of LINQ. It runs the query on multiple processors simultaneously whenever possible. Calling AsParallel() on a selector turns it into a parallel enumerable container that works with PLINQ.
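A minimal sketch of the pattern follows. The `Node` class here is a hypothetical in-memory stand-in so that the example runs without GE. With GE, the same call chain would presumably start from a cell object selector such as Global.LocalStorage.Node_Selector() from the earlier example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a TSL-generated cell object type.
public class Node
{
    public long CellID;
    public int val;
}

public static class PlinqSketch
{
    // AsParallel() turns the sequence into a ParallelQuery<Node>, so the
    // subsequent Sum is evaluated by PLINQ and may run on several cores.
    public static int ParallelSum(IEnumerable<Node> nodes)
    {
        return nodes.AsParallel().Sum(n => n.val);
    }
}
```

Because addition is associative and commutative, the result does not depend on how PLINQ partitions the input.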

Limitations #

IEnumerable<T> has a limitation: IDisposable elements are not disposed during the enumeration. Disposing a cell accessor after use is crucial, however, because an undisposed cell accessor leaves the target cell locked permanently.

This has led to the design decision to dispose a cell accessor actively as soon as the user code finishes using it in the enumeration loop. As a result, capturing the value or reference of an accessor during an enumeration and storing it for later use is not allowed. The reference will be destroyed and the value invalidated immediately after the enumeration loop, and any operation on the stored value or reference will cause data corruption or a system crash. This is the root cause of the following limitations:

  • The Select operator cannot return cell accessors, because the accessors are disposed as soon as the loop is done.

  • LINQ operators that cache elements, such as join, group by, are not supported.

  • PLINQ caches some elements and distributes them to multiple cores, so it will not work with cell accessors. It does work with cell object selectors, though.

  • Although an enumeration operation does not lock the whole local storage, it does take the trunk-level locks. Compound LINQ selectors with join operations are not supported, because the inner loop will try to obtain the trunk lock that has been taken by the outer one.
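The accessor lifetime rule behind these limitations can be illustrated without GE. In the sketch below, `FakeAccessor` and `Selector` are hypothetical stand-ins that only mimic the dispose-after-use behavior described above. Unlike a real accessor, the stand-in throws when used after disposal instead of corrupting data. The safe pattern is to copy plain values out while the accessor is still alive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a generated cell accessor.
public sealed class FakeAccessor : IDisposable
{
    private readonly long id;
    public bool Disposed { get; private set; }

    public FakeAccessor(long id) { this.id = id; }

    public long CellID
    {
        get
        {
            if (Disposed) throw new ObjectDisposedException(nameof(FakeAccessor));
            return id;
        }
    }

    public void Dispose() { Disposed = true; }
}

public static class AccessorLifetimeSketch
{
    // Mimics a selector that disposes each accessor once the consumer moves on.
    public static IEnumerable<FakeAccessor> Selector(int count)
    {
        for (long id = 0; id < count; id++)
        {
            using (var accessor = new FakeAccessor(id))
            {
                yield return accessor;
            }
        }
    }

    // Safe: reads the value inside the enumeration and stores only the value.
    public static List<long> CopyIds(int count)
    {
        return Selector(count).Select(a => a.CellID).ToList();
    }

    // Unsafe: stores the accessors themselves; every one of them has been
    // disposed by the time the list is returned.
    public static List<FakeAccessor> CaptureAccessors(int count)
    {
        return Selector(count).ToList();
    }
}
```

`CopyIds` returns usable values, whereas every element returned by `CaptureAccessors` is already disposed, which is the situation the first limitation rules out.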