Discussion:
Filter a datatable for values from another datatable
(too old to reply)
x***@y.com
2016-04-30 23:37:01 UTC
Permalink
Say I have two datatables, table1 and table2, where both have a column
named "id".
I want to filter table1 for those id values that are in table2.
Similar to sql query:

select * from table1
where id in (select id from table2)

I can do this using dataview and rowfilter and looping and all that.
I am looking for a more elegant solution, using LINQ perhaps?
Marcel Mueller
2016-05-01 07:19:12 UTC
Permalink
Post by x***@y.com
Say I have two datatables, table1 and table2, where both have a column
named "id".
I want to filter table1 for those id values that are in table2.
select * from table1
where id in (select id from table2)
I can do this using dataview and rowfilter and looping and all that.
I am looking for a more elegant solution, using LINQ perhaps?
Well, if you have a chance, get rid of the old .NET 1 DataTables. The
last application that we refactored this way became faster by several
orders of magnitude.

However, DataTables are containers and can be accessed with LINQ. Just
use the Rows property.


Marcel
x***@y.com
2016-05-01 16:46:04 UTC
Permalink
On Sun, 01 May 2016 09:19:12 +0200, Marcel Mueller
Post by Marcel Mueller
Post by x***@y.com
Say I have two datatables, table1 and table2, where both have a column
named "id".
I want to filter table1 for those id values that are in table2.
select * from table1
where id in (select id from table2)
I can do this using dataview and rowfilter and looping and all that.
I am looking for a more elegant solution, using LINQ perhaps?
Well, if you have a chance, get rid of the old .NET 1 DataTables. The
last application that we refactored this way became faster by several
orders of magnitude.
I don't understand. We use datatables throughout our app.
Get rid of datatables and replace them with what?
Post by Marcel Mueller
However, DataTables are containers and can be accessed with LINQ. Just
use the Rows property.
I have just started on LINQ.
Can you give me an example for the above query in LINQ?
Post by Marcel Mueller
Marcel
Marcel Mueller
2016-05-01 20:11:33 UTC
Permalink
Post by x***@y.com
Post by Marcel Mueller
Well, if you have a chance, get rid of the old .NET 1 DataTables. The
last application that we refactored this way became faster by several
orders of magnitude.
I don't understand. We use datatables throughout our app.
Get rid of datatables and replace them with what?
Are we both talking about System.Data.DataTable?

In this case POCOs are the alternative, i.e. classes that provide just
the properties you need. No more no less.
You may have a look at entity framework for database connectivity.
Post by x***@y.com
Post by Marcel Mueller
However, DataTables are containers and can be accessed with LINQ. Just
use the Rows property.
I have just started on LINQ.
Can you give me an example for the above query in LINQ?
DataTable ist .NET 1. It does not provide type information. So you need
a cast to the row type to make it LINQ compatible:
myTable.Rows.Cast<DataRow>()

You should not use sub queries in a where clause in LINQ directly since
LINQ has no optimizer and will execute the sub query over and over. I.e.
assign the sub query to a variable or just use join instead:

var result = myTable.Rows.Cast<DataRow>()
.Join(
myFilterTable.Rows.Cast<DataRow>(),
row => row[idcolumn],
filterrow => filterrow[idcolumn],
(row, filterrow) => row );

Something like that.

Of course, your key type should have a reasonable implementation of
Equals and GetHashCode.


An even smarter implementation might create a hash table of wanted keys
and select all matching rows.

var keys = myFilterTable.Rows.Cast<DataRow>()
.Select(filterrow => filterrow[idcolumn])
.ToHashSet();

var result = myTable.Rows.Cast<DataRow>()
.Where(row => keys.Contains(row[idcolumn]));

Note that LINQ results use deferred execution and will execute anew
every time you use the result unless you force evaluation with one of
the ToXXX() functions.


Marcel
x***@y.com
2016-05-03 00:22:25 UTC
Permalink
On Sun, 01 May 2016 22:11:33 +0200, Marcel Mueller
Post by Marcel Mueller
Post by x***@y.com
Post by Marcel Mueller
Well, if you have a chance, get rid of the old .NET 1 DataTables. The
last application that we refactored this way became faster by several
orders of magnitude.
I don't understand. We use datatables throughout our app.
Get rid of datatables and replace them with what?
Are we both talking about System.Data.DataTable?
In this case POCOs are the alternative, i.e. classes that provide just
the properties you need. No more no less.
You may have a look at entity framework for database connectivity.
Yes, I was talking about System.Data.DataTable.

So you wrote your own class as a substitute for Syatem.Data.DataTable?

What about other objects in System.Data namesapace?
We use DataView/DataViewRow a lot.
Do we need to wriet substitute classes for those too to work with the
POCO datatable?

Sounds like a lot of re-inventing the wheel.
Marcel Mueller
2016-05-04 08:29:33 UTC
Permalink
Post by x***@y.com
Yes, I was talking about System.Data.DataTable.
So you wrote your own class as a substitute for Syatem.Data.DataTable?
No. We never replaced the bulky DataTable. That would not be
significantly better.

If we need to retrieve a table 'Humans' from DB with the columns
int ID,
varchar Name,
int Age,
int Gender, -- 0 unkown, 1 male, 2 female
datetime LastUpdate,
varchar LastUpdateUser,
...
for example then the corresponding POCO (Plain Old Clr Object) looks like
enum Gender
{ Unknown,
Male,
Female
}
class Human
{ public int ID;
public string Name;
public int Age;
public Gender Gender;
public DateTime LastUpdate;
public int LastUpdateUserID;
}
and the table type for this row type is just
List<Human>

If you want to use the old ASP.NET DataBinder.Eval, then you need to add
{ get; set; }
to each line to crate auto properties since DataBinder cannot deal with
fields.

This takes significantly less resources than a DataTable.
The latter uses object[] for the row type, even if you used typed DataSets.
So every value type in the row is auto boxed (int, enum, DateTime in
this case). Autoboxing increases the size by factor 5 or more.
DataTables furthermore have a significant internal overhead for
maintaining state and metadata.
OK, this has been optimized to some degree by using strongly typed
internal column stores for common value types now. But still there is
the autoboxing overhead at each access. In fact the optimization
replaced memory consumption by GC pressure. And there are hundreds of
code lines to be executed for every single field access.
Post by x***@y.com
What about other objects in System.Data namesapace?
They belong together. You cannot use them independently. AFAIK
everything need to be in a DataSet. Even if you create a single table a
DataSet with all related objects is created internally.
Post by x***@y.com
We use DataView/DataViewRow a lot.
Do we need to wriet substitute classes for those too to work with the
POCO datatable?
Sounds like a lot of re-inventing the wheel.
More like reinventing a square wheel. :-)

DataTables are just fine for 'PowerPoint' requirements, i.E. show a
quick an dirty solution, where everything else does not count.

But in practice I remember only very few cases where they really met the
needs. I do not write applications all the day that open a 1:1 editor to
exactly one SQL table which allows you to edit one data row at a time
and save the result back to the SQL server like Management Studio does.
Most of the time the database point of view and the users point of view
are quite different. Users do not think in normalized relational data
models. They think in business objects with data that belong together
and do not care about cardinality that SQL cannot represent in a single
table.
E.g. the simple concept like 'Friends' does not fit into one DataTable
class Human
{ public int ID;
public string Name;
...
IList<Human> Friends;
...
}
You would need a second DataTable with foreign keys at this point. No
big deal so far. But now you need to write the GUI yourself since users
want to edit all together and not to edit friends in a separate dialog
and deal with bare IDs. If you need to write the GUI and the data
bidirectional binding yourself and therefore also need to track
insertions, and deletions in the friends table yourself, then the
DataTables do no longer simplify that task.
In case of foreign keys in general no user wants to edit them by value
(e.g. ID). So again you need to provide your own editor with change
tracking.
And if we are talking about backend data processing: LINQ will not care
whether you have fields, properties or DataColumns. So the POCOs will
fit your needs just as the DataRows do. But POCOs are faster by about 2
orders of magnitude.
The SQL style filtering and grouping of the DataTables is outperformed
by LINQ too. LINQ also removes the risk of SQL injection, although the
very limited features of DataTables also restrict the potential danger.

Of course, I do not refactor every old (migrated) .NET 1 application
which uses DataTables just because I can do so. But for new designs and
major changes DataTables are no longer an option since .NET 3.5.

(Note I only develop web applications and no Win Executables which are
more difficult to test and to maintain. Maybe WinForms interact smarter
with DataTables and there is more benefit in this case.)


Marcel
Arne Vajhøj
2016-05-05 01:48:22 UTC
Permalink
Post by Marcel Mueller
Well, if you have a chance, get rid of the old .NET 1 DataTables.
I agree with the advice. Either an ORM (I prefer NHibernate but
that is a detail) or plain ADO.NET depending on object usage or
bulk usage.

But in all fairness then MS took a long time to get rid
of DataSet:

.NET 1.0, 1.1 : untyped DataSet
.NET 2.0, 3.0 : typed DataSet
.NET 3.5+ : EF

Arne
Marcel Mueller
2016-05-05 18:04:49 UTC
Permalink
Post by Arne Vajhøj
Post by Marcel Mueller
Well, if you have a chance, get rid of the old .NET 1 DataTables.
I agree with the advice. Either an ORM (I prefer NHibernate but
that is a detail) or plain ADO.NET depending on object usage or
bulk usage.
But in all fairness then MS took a long time to get rid
.NET 1.0, 1.1 : untyped DataSet
.NET 2.0, 3.0 : typed DataSet
.NET 3.5+ : EF
The last line should better read as
.NET 4.0 : EF

The .NET 3.5 EF was a PowerPoint solution, i.e. it was sufficient to be
used in a presentation, no more, no less.

At .NET 3.5 time I wrote my own ORM. It was by far superior to EF. Cache
with in memory DB, immutable shared thread-safe objects with
deduplication, stateless snapshot isolation (i.e. version history with
global revisions), automatic parallelism and, of course, significantly
faster.
In some aspects it is still superior, but current EF is quite usable.


Marcel

Loading...