Break up a list into batches with C# .NET

Here comes another classic interview question: write a function that accepts a collection and an integer. The function must divide the incoming collection up into individual collections of the size specified by the integer parameter. These individual collections can be called batches, groups or chunks and probably other things as well.

Let’s take the following collection of integers as an example:

[43, 65, 23, 56, 76, 454, 76, 54, 987]

If the incoming size integer is 3 then we want to end up with the following chunks:

[[43, 65, 23], [56, 76, 454], [76, 54, 987]]

…, i.e. a list of list elements where each chunk will be of size 3. So the size parameter means that we want to have at most 3 elements in each chunk, not that we want to have 3 chunks altogether.

If the size parameter is 2 then we want to end up with the following chunks:

[[43, 65], [23, 56], [76, 454], [76, 54], [987]]

We’ve got 5 chunks: 4 * 2 and the last chunk of size 1 as the 5th element in the list. Since 9 is not divisible by 2 we end up with a smaller chunk as the last element in the resulting list. That will always be the case if the total size of the collection parameter is not divisible by the size parameter. We just add the remainder into the last array.

In this post we’ll go through 4 different solutions. As expected, there are probably many more, but these should be enough in an interview setting. These classic interview questions are usually not difficult and you can most likely figure them out yourself based on your basic programming 101 skills. However, if you have never seen them then you might spend too much time on them and won’t have enough time over for the harder problems. So it’s good to be prepared and once you see one of these problems then you can just breeze through it quickly.

The 4 solutions presented here are based on the following:

  • iterating through the collection and building the chunks using a temporary list
  • chunking up the collection into ranges
  • using LINQ operators
  • using LINQ operators and the yield keyword

So if your interviewer says please don’t use LINQ, only bare-bones language constructs to test your basic training then you still have a couple of options.

Let’s start with a skeleton:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Algorithms
{
    public class Chunk
    {
        public IEnumerable<IEnumerable<T>> BuildChunks<T>(List<T> fullList, int batchSize)
        {
            return BuildChunksWithIteration(fullList, batchSize);            
        }

        private IEnumerable<IEnumerable<T>> BuildChunksWithLinq<T>(List<T> fullList, int batchSize)
        {
            return null;
        }

        private IEnumerable<IEnumerable<T>> BuildChunksWithLinqAndYield<T>(List<T> fullList, int batchSize)
        {
            return null;
        }

        private IEnumerable<IEnumerable<T>> BuildChunksWithIteration<T>(List<T> fullList, int batchSize)
        {
            return null;
        }

        private IEnumerable<IEnumerable<T>> BuildChunksWithRange<T>(List<T> fullList, int batchSize)
        {
            return null;
        }
    }
}

The private function names will give you a hint on which of the 4 solutions they represent.

We also add a couple of unit tests that are based on the same collection of 9 integers that we saw above:

using Microsoft.VisualStudio.TestTools.UnitTesting;
using Algorithms;
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;

namespace Algorithms.Tests
{
    [TestClass()]
    public class ChunkTests
    {
        [TestMethod()]
        public void BuildChunksTests()
        {
            Chunk ch = new Chunk();
            List<int> fullList = new List<int>() { 43, 65, 23, 56, 76, 454, 76, 54, 987 };
            var chunked = ch.BuildChunks(fullList, 3).ToList();

            Assert.AreEqual(3, chunked.Count);
            CollectionAssert.AreEqual(new List<int> { 43, 65, 23 }, chunked[0].ToList());
            CollectionAssert.AreEqual(new List<int> { 56, 76, 454 }, chunked[1].ToList());
            CollectionAssert.AreEqual(new List<int> { 76, 54, 987 }, chunked[2].ToList());

            chunked = ch.BuildChunks(fullList, 2).ToList();
            Assert.AreEqual(5, chunked.Count);
            CollectionAssert.AreEqual(new List<int> { 43, 65 }, chunked[0].ToList());
            CollectionAssert.AreEqual(new List<int> { 23, 56 }, chunked[1].ToList());
            CollectionAssert.AreEqual(new List<int> { 76, 454 }, chunked[2].ToList());
            CollectionAssert.AreEqual(new List<int> { 76, 54 }, chunked[3].ToList());
            CollectionAssert.AreEqual(new List<int> { 987 }, chunked[4].ToList());

            chunked = ch.BuildChunks(fullList, 5).ToList();
            Assert.AreEqual(2, chunked.Count);
            CollectionAssert.AreEqual(new List<int> { 43, 65, 23, 56, 76 }, chunked[0].ToList());
            CollectionAssert.AreEqual(new List<int> { 454, 76, 54, 987 }, chunked[1].ToList());

            chunked = ch.BuildChunks(fullList, 10).ToList();
            Assert.AreEqual(1, chunked.Count);
            CollectionAssert.AreEqual(new List<int> { 43, 65, 23, 56, 76, 454, 76, 54, 987 }, chunked[0].ToList());

            chunked = ch.BuildChunks(fullList, 1).ToList();
            Assert.AreEqual(9, chunked.Count);
            CollectionAssert.AreEqual(new List<int> { 43 }, chunked[0].ToList());
            CollectionAssert.AreEqual(new List<int> { 65 }, chunked[1].ToList());
            CollectionAssert.AreEqual(new List<int> { 23 }, chunked[2].ToList());
            CollectionAssert.AreEqual(new List<int> { 56 }, chunked[3].ToList());
            CollectionAssert.AreEqual(new List<int> { 76 }, chunked[4].ToList());
            CollectionAssert.AreEqual(new List<int> { 454 }, chunked[5].ToList());
            CollectionAssert.AreEqual(new List<int> { 76 }, chunked[6].ToList());
            CollectionAssert.AreEqual(new List<int> { 54 }, chunked[7].ToList());
            CollectionAssert.AreEqual(new List<int> { 987 }, chunked[8].ToList());
        }
    }
}

Let’s start with the iterative solution, i.e. BuildChunksWithIteration:

private IEnumerable<IEnumerable<T>> BuildChunksWithIteration<T>(List<T> fullList, int batchSize)
{
    var chunkedList = new List<List<T>>();
    var temporary = new List<T>();
    for (int i = 0; i < fullList.Count; i++)
    {
        var e = fullList[i];
        if (temporary.Count < batchSize)
        {
            temporary.Add(e);
        }
        else
        {
            chunkedList.Add(temporary);
            temporary = new List<T>() { e };
        }

        if (i == fullList.Count - 1)
        {
            chunkedList.Add(temporary);
        }
    }

    return chunkedList;
}

We initialise the return list and a temporary list. We then start the iteration in a classic for-loop. In the first if-block we look at the current value in the iteration and see if the temporary array is smaller than the batch size. If that’s the case then we add element to the temporary list. Otherwise we add the temporary list to the return list and re-instantiate it to a new list. We also add the current element to the newly built temporary list so that it doesn’t get lost in the loop.

The last if-statement…

if (i == fullList.Count – 1)

…checks if we have reached the end of the source list. If that’s the case then we add the most recent temporary list to the return list. This will ensure that the very last batch is also added to the final collection before finishing the loop even if its size is smaller than the required size.

Run the unit tests, they should all pass.

Next up is the range-based solution:

private IEnumerable<IEnumerable<T>> BuildChunksWithRange<T>(List<T> fullList, int batchSize)
{
    List<List<T>> chunkedList = new List<List<T>>();
    int index = 0;

    while (index < fullList.Count)
    {
        int rest = fullList.Count - index;
        if (rest >= batchSize)
        {
            chunkedList.Add(fullList.GetRange(index, batchSize));
        }
        else
        {
            chunkedList.Add(fullList.GetRange(index, rest));
        }
        index += batchSize;
    }

    return chunkedList;
}

We iterate through the collection in a while loop using the index variable which was initialised to 0. First we calculate the rest, i.e. the portion of the incoming collection that we haven’t chunked up yet. In the first iteration it will be 9 in our example as the list size is 9 and index is 0. The if-condition checks if the rest is at least as large as the required batch size. If that’s the case then we add a list to the return list using the GetRange function that’s available on lists. It takes a start index and a count variable. In the first iteration we want to get the range starting from position 0 of the source array and take 3 elements from 0. That will be the first chunk in the return list.

Otherwise if the rest is smaller than the batch size, then we fetch the rest of the source collection and add it to the return list. Finally we increment the index by the batch size.

Therefore in the second iteration the rest variable will be 9 – 3, i.e. 6 and we want to get the range from and including index 3 – which will be 4th element in a 0-based index – plus the batch size 3. We keep iterating through the source collection like that until index is larger than or equal to the size of the source collection, i.e. we have covered the source collection in its entirety.

Hook up the function from the main function:

public IEnumerable<IEnumerable<T>> BuildChunks<T>(List<T> fullList, int batchSize)
{
    return BuildChunksWithRange(fullList, batchSize);
    //return BuildChunksWithIteration(fullList, batchSize);
}

Re-run the unit tests and they will still pass.

Let’s now look at the solution with a couple of LINQ operators:

If you’re not sure how they are used then get more info on them from the linked blog posts. They are often used together for paging purposes and are similar to the GetRange function. However, with Take we don’t need to worry about an index-out-of-range exception, it will just take the available amount of elements from the requested range:

private IEnumerable<IEnumerable<T>> BuildChunksWithLinq<T>(List<T> fullList, int batchSize)
{
    int total = 0;
    var chunkedList = new List<List<T>>();
    while (total < fullList.Count)
    {
        var chunk = fullList.Skip(total).Take(batchSize);
        chunkedList.Add(chunk.ToList());
        total += batchSize;
    }

    return chunkedList;
}

You see that the loop is very similar to the GetRange-based solution, but we don’t need to calculate the rest, we can simply call Skip and Take and they are clever enough to figure out what we want.

Hook up the function list before:

public IEnumerable<IEnumerable<T>> BuildChunks<T>(List<T> fullList, int batchSize)
{
    //return BuildChunksWithRange(fullList, batchSize);
    //return BuildChunksWithIteration(fullList, batchSize);
    return BuildChunksWithLinq(fullList, batchSize);
}

The unit tests will still pass.

Finally we have the same LINQ-based solution but together with the yield keyword. This one can be a bit difficult to wrap your head around. It is similar to a recursive solution and is best demonstrated by various examples:

Here’s the solution:

private IEnumerable<IEnumerable<T>> BuildChunksWithLinqAndYield<T>(List<T> fullList, int batchSize)
{
    int total = 0;
    while (total < fullList.Count)
    {
        yield return fullList.Skip(total).Take(batchSize);
        total += batchSize;
    }
}

This is the short-hand form of BuildChunksWithLinq.

Call this function from the main function and the unit tests will still all pass.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Leave a comment

Elliot Balynn's Blog

A directory of wonderful thoughts

Software Engineering

Web development

Disparate Opinions

Various tidbits

chsakell's Blog

WEB APPLICATION DEVELOPMENT TUTORIALS WITH OPEN-SOURCE PROJECTS

Once Upon a Camayoc

Bite-size insight on Cyber Security for the not too technical.