C# - Removing duplicates from a DataTable


I'm trying to remove duplicates from a DataTable, similar to this question. However, I need to do it on an ordered dataset: one of the criteria is a time held in one of the columns, and I need the earliest time instance to remain.

I came across this question on ordering a DataTable, but I'm not sure how to combine the two.

Basically, I'm reading a file into a DataSet, and I want to sort on time and three other columns and delete the duplicates, leaving the earliest time instance. The columns in question are name (int), phone number (long), time (int) and location (string). If name, phone and location are all duplicated, every row after the first (earliest) time should be removed.
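The sort-then-delete plan described above can also be done in place, without LINQ grouping. This is a minimal sketch, assuming only the four relevant columns with the types listed (the real table has many more), made-up sample rows, and a hypothetical helper named RemoveLaterDuplicates. It sorts a DataView by the key columns and time, remembers each (name, phone, location) key the first time it appears, and deletes every later row with that key.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample table (made up for illustration) with only the four columns that matter here.
var table = new DataTable("filedata");
table.Columns.Add("location", typeof(string));
table.Columns.Add("name", typeof(int));
table.Columns.Add("time", typeof(int));
table.Columns.Add("phone", typeof(long));

table.Rows.Add("north", 1, 30, 5550100L);
table.Rows.Add("north", 1, 10, 5550100L); // earliest row for key (1, 5550100, north)
table.Rows.Add("south", 1, 20, 5550100L); // different location, so a different key
table.Rows.Add("north", 1, 40, 5550100L);

RemoveLaterDuplicates(table);

foreach (DataRow row in table.Rows)
    Console.WriteLine($"{row["name"]} {row["phone"]} {row["location"]} {row["time"]}");

// Hypothetical helper: sorts by the key columns and then by time, so the first
// row seen for each key is the earliest one; every later row with that key is deleted.
static void RemoveLaterDuplicates(DataTable table)
{
    var view = new DataView(table) { Sort = "name, phone, location, time" };
    var seen = new HashSet<(int, long, string)>();
    var toDelete = new List<DataRow>();

    foreach (DataRowView rowView in view)
    {
        DataRow row = rowView.Row;
        var key = (row.Field<int>("name"), row.Field<long>("phone"), row.Field<string>("location"));
        if (!seen.Add(key))
            toDelete.Add(row); // key already seen at an earlier time
    }

    // Delete after the scan so the DataView is not modified while it is being enumerated.
    foreach (DataRow row in toDelete)
        row.Delete();
    table.AcceptChanges(); // physically remove rows that were marked as deleted
}
```

Collecting the rows first and deleting afterwards avoids changing the view while iterating over it.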

    // "field" stands for column names left out of the post; real names must be unique,
    // because Columns.Add throws a DuplicateNameException for a repeated name.
    dsholdingset.Tables["filedata"].Columns.Add("location", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("name", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("time", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("phone", typeof(long));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(int));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(long));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(bool));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));
    dsholdingset.Tables["filedata"].Columns.Add("field", typeof(string));

That's the table definition; I then add rows to it after validating each line in the file.

What you want to do is group the rows by their distinct values. If you want to use LINQ against a DataTable, the easiest way is the built-in DataTable.AsEnumerable() extension method, which returns an IEnumerable<DataRow>.
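A small sketch of what AsEnumerable() gives you, using a made-up two-column table: once the rows are an IEnumerable<DataRow>, ordinary LINQ operators such as Where and Select apply, and Field<T>() reads a column with a given type. The type argument has to match the column's declared type, otherwise Field<T>() throws an InvalidCastException.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample table (made up for illustration).
var table = new DataTable("filedata");
table.Columns.Add("name", typeof(int));
table.Columns.Add("location", typeof(string));
table.Rows.Add(1, "north");
table.Rows.Add(2, "south");
table.Rows.Add(3, "north");

// AsEnumerable() (System.Data.DataTableExtensions) exposes the rows as IEnumerable<DataRow>.
IEnumerable<DataRow> rows = table.AsEnumerable();

// Field<T>() reads a typed value; "name" is an int column, so it is read as int.
List<int> northNames = rows
    .Where(r => r.Field<string>("location") == "north")
    .Select(r => r.Field<int>("name"))
    .ToList();

Console.WriteLine(string.Join(", ", northNames));
```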

Once we've got that, we need to construct a comparable object out of a composite of the three values. Here I used string concatenation, because strings are easy to compare. There are other ways to do this, but this one is simple:

name|phone|location
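One of those other ways, sketched here on made-up sample rows: a C# value tuple has component-wise equality and hashing, so it can serve directly as a composite GroupBy key without formatting the values into a string, and each component keeps its own type. The tuple element names (Name, Phone, Location) are chosen for illustration only.

```csharp
using System;
using System.Data;
using System.Linq;

// Sample table (made up for illustration) with the four relevant columns.
var table = new DataTable("filedata");
table.Columns.Add("location", typeof(string));
table.Columns.Add("name", typeof(int));
table.Columns.Add("time", typeof(int));
table.Columns.Add("phone", typeof(long));
table.Rows.Add("north", 1, 30, 5550100L);
table.Rows.Add("north", 1, 10, 5550100L);
table.Rows.Add("south", 2, 20, 5550199L);

// A value tuple compares component by component, so it works as a composite key.
var groups = table.AsEnumerable()
    .GroupBy(row => (
        Name: row.Field<int>("name"),
        Phone: row.Field<long>("phone"),
        Location: row.Field<string>("location")))
    .ToList();

foreach (var g in groups)
    Console.WriteLine($"{g.Key.Name}|{g.Key.Phone}|{g.Key.Location}: {g.Count()} row(s)");
```

GroupBy yields the groups in the order their keys first appear, so the two "north" rows form the first group here.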

This produces a sequence of IGrouping<string, DataRow>. Each grouping is itself an IEnumerable<DataRow> containing the rows that share one key. If we sort each grouping by time and take the first row off, that's the earliest row for that key.

Here's the complete code:

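    using System.Data;
    using System.Linq;

    var rows = dsholdingset.Tables["filedata"].AsEnumerable()
        .GroupBy(row => string.Format("{0}|{1}|{2}",
            row.Field<int>("name"),          // int in the table definition above
            row.Field<long>("phone"),        // long until converted to string (see the notes below)
            row.Field<string>("location")))
        .Select(group =>
            // "time" is an int column for now; use TimeSpan or DateTime once converted
            group.OrderBy(row => row.Field<int>("time")).First());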
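To check the query end to end, here is a self-contained sketch that builds a small "filedata" table with only the four relevant columns (sample values made up), runs the same grouping, and copies the surviving rows into a new DataTable with CopyToDataTable(). The final OrderBy is an optional addition so the result comes back sorted by time, as the question asked for.

```csharp
using System;
using System.Data;
using System.Linq;

// Sample data set (made up for illustration) with only the relevant columns.
var dsholdingset = new DataSet();
DataTable table = dsholdingset.Tables.Add("filedata");
table.Columns.Add("location", typeof(string));
table.Columns.Add("name", typeof(int));
table.Columns.Add("time", typeof(int));
table.Columns.Add("phone", typeof(long));

table.Rows.Add("north", 1, 30, 5550100L);
table.Rows.Add("north", 1, 10, 5550100L); // earlier duplicate of the row above
table.Rows.Add("south", 1, 20, 5550100L);
table.Rows.Add("north", 2, 5, 5550199L);

// Group on name|phone|location and keep the earliest row of each group.
DataTable deduped = table.AsEnumerable()
    .GroupBy(row => string.Format("{0}|{1}|{2}",
        row.Field<int>("name"),
        row.Field<long>("phone"),
        row.Field<string>("location")))
    .Select(group => group.OrderBy(row => row.Field<int>("time")).First())
    .OrderBy(row => row.Field<int>("time")) // optional: sort the survivors by time
    .CopyToDataTable();                     // new table with the same schema; throws if there are no rows

foreach (DataRow row in deduped.Rows)
    Console.WriteLine($"{row["time"]}: {row["name"]} {row["phone"]} {row["location"]}");
```

Note that CopyToDataTable() throws an InvalidOperationException when the sequence is empty, so check for rows first if the file can be empty.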

Some other notes: phone should be a string, not a long; and unless time represents some other kind of measure you haven't gone into, it should be either a TimeSpan or a DateTime. The first thing you want to do when loading a data set for manipulation is coerce the data into robust and correct data types, because that makes the actual manipulation easier. You can convert back afterwards if you need to.
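As a sketch of that coercion step, the following copies raw rows into a table whose phone column is a string and whose time column is a TimeSpan. It assumes, purely for illustration, that the int time is a number of seconds since midnight; the column set and the sample row are made up, so adjust the conversion to whatever unit the file actually stores.

```csharp
using System;
using System.Data;
using System.Globalization;

// Raw table as loaded from the file (sample row made up for illustration).
var raw = new DataTable("filedata");
raw.Columns.Add("name", typeof(int));
raw.Columns.Add("phone", typeof(long));
raw.Columns.Add("time", typeof(int));
raw.Columns.Add("location", typeof(string));
raw.Rows.Add(1, 5550100L, 3600, "north");

// Target schema using the types recommended above.
var typed = new DataTable("filedata");
typed.Columns.Add("name", typeof(int));
typed.Columns.Add("phone", typeof(string));
typed.Columns.Add("time", typeof(TimeSpan));
typed.Columns.Add("location", typeof(string));

foreach (DataRow r in raw.Rows)
{
    typed.Rows.Add(
        r.Field<int>("name"),
        r.Field<long>("phone").ToString(CultureInfo.InvariantCulture),
        // ASSUMPTION: the int time is seconds since midnight; change this if the file uses another unit.
        TimeSpan.FromSeconds(r.Field<int>("time")),
        r.Field<string>("location"));
}

Console.WriteLine($"{typed.Rows[0]["phone"]} at {typed.Rows[0]["time"]}");
```

With the data in this shape, the grouping query can read the columns as Field<string>("phone") and Field<TimeSpan>("time").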

