They are Millions!
- Andy Gasparini
- Jan 6, 2020
- 6 min read
Updated: Feb 6, 2022
The Million Items Query
It is common for humans to think that if one can summarize a task in a few words, then the task itself will be easy to implement. In the case of SQL or Entity Framework the last statement is commonly false for projects that involve an extensive amount of stress on a database, particularly a single table. Most of the time we try to find ways around it, like design the database in different creative ways, or simply increasing the power on our cloud subscription. As the title referencing the video-game They Are Billions may have given it away, sometimes entries are almost like zombies, slowly lurking into a database one by one, clogging the systems and eating brains, or in this case computational power and consequently money, and valuable design time.
A Different Approach
Commonly the following statements are used to insert an item into a table with Entity Framework:
var myObject = new MyObject{ fields };
_repo.Insert(myObject);
try
{
await _repo.SaveChangesAsync().ConfigureAwait(false);
}
catch (Exception ex)
{
// Your emergency maneuvers!
}
But what if the service or application you are building requires the insertion of millions of items? Add a simple for loop, right?!
try
{
// This list could contain millions of items!
List<MyObject> objects = new List<MyObject>()
{
// Your objects
};
foreach (MyObject object in objects)
{
_repo.Insert(object);
await _repo.SaveChangesAsync().ConfigureAwait(false);
}
}
catch (Exception ex)
{
// Emergency maneuvers!
}
Would this work for 10,000 items? Maybe, yes! What about 100,000 items? Maybe, and most likely not! Then what about 1 million and above? I don’t think so!! So, what could the solution be, stored procedures?! Redesign the database to work around the million entries?! Pushing your cloud subscription to the max?! No. A simple yet efficient solution can be used by writing a few functions in C# that will allow you to use the System.Data SQL libraries in a clever way, using an almost forgotten class called SqlBulkCopy, which will allow the insertion by simply passing a DataTable object. So here is the approach:
1.Convert your object fields into DataTable columns (in this case I am using reflection but you don’t have to)
public DataTable ConvertCollectionToDataTable<T>(ICollection<T> itemsToConvert)
{
if (itemsToConvert.Count < 1)
{
// I made a custom exception, but you
// can really throw anything that suits you
throw new BulkException("The collection that is "
+ "to be inserted is empty");
}
PropertyDescriptorCollection props =
TypeDescriptor.GetProperties(typeof(T));
DataTable dataTable = new DataTable();
for (int i = 0; i < props.Count; i++)
{
PropertyDescriptor prop = props[i];
dataTable.Columns.Add(prop.Name, prop.PropertyType);
}
DataTableInsertion(itemsToConvert, props, dataTable);
return dataTable;
}
2.Insert the values of those fields for every single object into the DataTable
private void DataTableInsertion<T>(ICollection<T> itemsToConvert,
PropertyDescriptorCollection props,
DataTable dataTable)
{
object[] values = new object[props.Count];
foreach (T item in itemsToConvert)
{
for (int i = 0; i < values.Length; i++)
{
values[i] = props[i].GetValue(item); }
dataTable.Rows.Add(values);
}
}
3.Figure out the best options settings for your insertion and if you do want to create a transaction or not for the query execution, and then insert
With transaction
public void BulkInsert(DataTable dataTable,
SqlTransaction transaction,
string destinationTableName)
{
// Standard checks //
if (dataTable == null)
{
throw new BulkException("Cannot insert an null data table!");
}
if (dataTable.Columns.Count < 1)
{
throw new BulkException("Cannot insert an empty data table!");
}
try
{
// Create your SqlBulkCopy options, TableLock is optional
SqlBulkCopyOptions copyOptions =
SqlBulkCopyOptions.TableLock |
Sql-BulkCopyOptions.FireTriggers;
// Create your SqlBulkCopy object and pass in your connection
// (in this case in the transaction), your copy
// options, and your transaction
SqlBulkCopy bulkCopy =
new SqlBulkCopy(transaction.Connection,
copyOptions,
transaction);
// Set you destination table
bulkCopy.DestinationTableName = destinationTableName;
// Insert your items!
bulkCopy.WriteToServer(dataTable);
// Commit the transaction
transaction.Commit();
}
catch (Exception ex)
{
// In case of failure rollback and throw an
// exception (so you can go to a higher level catch block
// and throw a specific exception)
transaction.Rollback();
throw;
}
}
Without transaction
public void BulkInsert(DataTable dataTable, string destinationTableName)
{
// Simple standard checks! //
if (string.IsNullOrEmpty(_connectionString))
{
throw new BulkException("Connection string is null or empty!");
}
if (dataTable == null)
{
throw new BulkException("Cannot insert an null data table!");
}
if (dataTable.Columns.Count < 1)
{
throw new BulkException("Cannot insert an empty data table!");
}
// Create an SqlConnection passing in your connection string
using (SqlConnection sqlConnection =
new SqlConnection(_connectionString))
{
// Select your options for the insertion
// SqlBulkCopyOptions.TableLock can be
// removed if you don’t want to table lock
// SqlBulkCopyOptions.UseInternalTransaction,
// set it to no transaction
// and will throw an exception
// if something fails and you try to rollback
SqlBulkCopyOptions copyOptions =
SqlBulkCopyOptions.TableLock |
SqlBulkCopyOptions.FireTriggers |
SqlBulkCopyOptions.UseInternalTransaction;
// Open the connection to your db
sqlConnection.Open();
// Create an SqlBulkCopy object with all you
// have created until now,
// the last parameter is the transaction, which we don’t
// have and don’t want
// Mind: if you do happen to use a transaction,
// you must specify that in the options,
// otherwise passing in a transaction object as a parameter
// will throw an exception no matter what
SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnection,
copyOptions,
null);
// set the destination table
bulkCopy.DestinationTableName = destinationTableName;
// Insert your entries!
bulkCopy.WriteToServer(dataTable);
// Close the connection
sqlConnection.Close();
}
}
That’s it!
Still unsatisfied?!
What can we do to insert even more entries…faster?!
Simply Batch!
If your solution requires heavy duty entry insertion although you don’t really want to up that cloud subscription to the max, then simply batch your entries. This is straightforward and easy to adapt to different needs, the batches can be set by a key in the web.config file and injected into the class that is batching or simply define it as a field.
Something like this:
<add key="StandardBatchSize" value="100000"/>
Thus, create a method which splits your collection. Note: I am returning a stack here, you can easily adapt the methods ConvertCollectionToDataTable and DataTableInsertion to convert a stack to a DataTable. I am mainly using a stack for efficiency concerns as a stack is O(1) to create and it’s easy to keep track of when inserting or removing.
private Stack<T>[] Split<T>(ICollection<T> itemsToInsert)
{
// Make a list of stacks (our batches)
List<Stack<T>> splitStacks = new List<Stack<T>>()
{
new Stack<T>()
};
// Necessary evil, convert itemsToInsert to array
T[] itemsToInsertArray = itemsToInsert.ToArray();
// Start 2 counts, one is going to count our batches,
// one is going to our split split stack of batches
int splitStacksTrackCount = 0;
int trackingCount = 0;
for (int i = 0; i < itemsToInsertArray.Length; i++)
{
// Is a batch completed and ready to go,
// yes then package it for shipping!
if (trackingCount >= _batchSize)
{
splitStacks.Add(new Stack<T>());
splitStacksTrackCount++;
trackingCount = 0;
}
// Put in the items into the box!
splitStacks[splitStacksTrackCount]
.Push(itemsToInsertArray[i]);
trackingCount++;
}
// Yes an array of stacks ;),
// computer ram thanks you for your cooperation!
return splitStacks.ToArray();
}
Make a method to insert your items into the database (similar to our previous bulk insert):
private void BulkInsert(DataTable dataTable, string destinationTableName)
{
// Make SqlConnection with data string
using (SqlConnection sqlConnection = new SqlConnection(_connectionString))
{
// Select your options, in this case no transaction or internal transaction
SqlBulkCopyOptions copyOptions =
SqlBulkCopyOptions.TableLock |
SqlBulkCopyOptions.FireTriggers |
SqlBulkCopyOptions.UseInternalTransaction;
// Open connection
sqlConnection.Open();
// Create your SqlBulkCopy object with all your
// goodies, with a null transaction in this case
SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnection,
copyOptions,
null);
// Set the destination table
bulkCopy.DestinationTableName = destinationTableName;
// Insert your items
bulkCopy.WriteToServer(dataTable);
// Close your connection
sqlConnection.Close();
}
}
public void BatchedBulkInsert<T>(ICollection<T> itemsToInsert,
string destinationTableName)
{
// Standard checks //
if (string.IsNullOrEmpty(_connectionString))
{
throw new BulkException("Connection string is null "
+ "or empty!");
}
if (itemsToInsert == null)
{
throw new BulkException("Cannot insert an null collection!");
}
if (itemsToInsert.Count < 1)
{
throw new BulkException("Cannot insert an empty collection!");
}
// Split into batches
Stack<T>[] splitStacks = Split(itemsToInsert);
// Convert to data table, method adapted to receive a stack
// A stack is mainly used here to lower the overhead
// as the creation of a stack is O(1), and there are instances
// where thousands of stacks are created
DataTable[] dataTables = ConvertToDataTables(splitStacks);
// Insert!
for (int i = 0; i < dataTables.Length; i++)
{
BulkInsert(dataTables[i], destinationTableName);
}
}
Finally wrap everything into a method which will do the trick like the above
Voila'!
Conclusion
The intimidating task of designing a system that can enter million of entries often comes into play when designing an auditing feature or something similar, storing a lot of records all at the same time. Although sometimes the most daunting tasks of working around the database design can be transformed in a fun college-like algorithm design excursion that can be taken care of directly in C# in this case, without any stored procedures writing, and can be easily managed, changed and reused. All you need to remember is just woulda, coulda, shoulda.
Comments