Wednesday, September 9, 2015

Delete duplicate Rows from datatable

Hi,

To delete duplicate rows from datatable in vb.net, first we need to sort table on the keys that will be used to delete rows.

Sort asc if the row to be deleted has highest value.
Sort desc if the row to be deleted has least value of duplicate.

In this example, I want to keep the highest filecount in any duplicated time. So I will sort desc.
In the function below, I pass the datatable that has duplicate set.
I have also cloned of Filedt to store duplicated rows.
Then based on sortString, I will apply select on datatable and return array of rows.
Now when I start looping the rows, I will save the prevRow and then match it with next row. If next row is same, I will save it in prevRow, import in cloned datatable and then delete it.

Private Function RemoveDupsdt(ByRef Filedt As DataTable) As DataTable

        Dim cloneDT As DataTable = Filedt.Clone
        Dim sortString As String = "ProcessingTime asc, FileCount desc"
        Dim prevRow As String = ""
        Dim rowlist As DataRow() = Filedt.Select("", sortString)
        Dim dcIssue As New DataColumn("Issue", GetType(String))
        dcIssue.DefaultValue = "Duplicate"

        For Each dr As DataRow In rowlist
            If prevRow = dr.Item("ProcessingTime")  Then
                prevRow = dr.Item("ProcessingTime")
                cloneDT.ImportRow(dr)
                dr.Delete()
            Else
                prevRow = dr.Item("ProcessingTime")
            End If
        Next

        cloneDT.Columns.Add(dcIssue)
        RemoveDupsdt = cloneDT

    End Function

No comments: