Exploring a directory with the Java 8 Stream API

We saw an example of using the Java 8 Stream API in File I/O in this post. We saw how the Files object was enhanced with the lines() method to open a line reader stream to a text file.

There are other enhancements related to streams that make is simple to explore a directory on your hard drive. The following code example will collect all folders and files within the c:\gitrepos folder and add them to an ArrayList:

Path gitReposFolderPath = Paths.get("c:\\gitrepos");
gitReposFolderPath.toFile().getName();
try (Stream<Path> foldersWithinGitReposStream = Files.list(gitReposFolderPath))            
{
    List<String> elements = new ArrayList<>();
    foldersWithinGitReposStream.forEach(p -> elements.add(p.toFile().getName()));            
    System.out.println(elements);
}
catch (IOException ioe)
{

}

I got the following output:

[cryptographydotnet, dotnetformsbasedmvc5, entityframeworksixdemo, owinkatanademo, signalrdemo, singletondemoforcristian, text.txt, webapi2demo, windowsservicedemo]

The code returns both files and folders one level below the top directory, i.e. the “list” method does not dive into the subfolders. I put a text file into the folder – text.txt – just to test whether in fact all elements are returned.

Say you only need files – you can use the filter method:

foldersWithinGitReposStream.filter(p -> p.toFile().isFile()).forEach(p -> elements.add(p.toFile().getName())); 

This will only collect text.txt.

Let’s try something slightly more complex. We’ll organise the elements within the directory into a Map of Boolean and List of Paths. The key indicates whether the group of files are directories or not. We can use the collect method that we saw in this post:

try (Stream<Path> foldersWithinGitReposStream = Files.list(gitReposFolderPath))            
{
    Map<Boolean, List<Path>> collect = foldersWithinGitReposStream.collect(Collectors.groupingBy(p -> p.toFile().isDirectory()));
    System.out.println(collect);
}

This prints the following:

{false=[c:\gitrepos\text.txt], true=[c:\gitrepos\cryptographydotnet, c:\gitrepos\dotnetformsbasedmvc5, c:\gitrepos\entityframeworksixdemo, c:\gitrepos\owinkatanademo, c:\gitrepos\signalrdemo, c:\gitrepos\singletondemoforcristian, c:\gitrepos\webapi2demo, c:\gitrepos\windowsservicedemo]}

So we successfully grouped the paths.

As mentioned above the “list” method goes only one level deep. The “walk” method in turn digs deeper and extracts sub-directories as well:

try (Stream<Path> foldersWithinGitReposStream = Files.walk(gitReposFolderPath))
{
    List<String> elements = new ArrayList<>();
    foldersWithinGitReposStream.filter(p -> p.toFile().isFile()).forEach(p -> elements.add(p.toFile().getAbsolutePath()));
    System.out.println(elements);
}

We can also instruct the walk method to go n levels down with an extra integer argument:

try (Stream<Path> foldersWithinGitReposStream = Files.walk(gitReposFolderPath, 3))

View all posts related to Java here.

Advertisement

The Java Stream API part 5: collection reducers

Introduction

In the previous post we saw how to handle an ambiguous terminal reduction result of a Stream. There’s in fact another type of reducer function in Java 8 that we haven’t discussed so far: collectors, represented by the collect() function available for Stream objects. The first overload of the collect function accepts an object that implements the Collector interface.

Implementing the Collector interface involves implementing 5 functions: a supplier, an accumulator, a combiner, a finisher and characteristics provider. At this point I’m not sure how to implement all those methods. Luckily for us the Collectors object provides a long range of ready-made implementing classes that can be supplied to the collect function.

Purpose and first example

Collectors are similar to Maps and the Reducers we’ve seen up to now in this series at the same time. Depending on the exact implementation you take the collect function can e.g. map a certain numeric field of a custom object into an intermediary stream and calculate the average of that field in one step.

Let’s see that in action. We’ll revisit our Employee class:

public class Employee
{
    private UUID id;
    private String name;
    private int age;

    public Employee(UUID id, String name, int age)
    {
        this.id = id;
        this.name = name;
        this.age = age;
    }
        
    public UUID getId()
    {
        return id;
    }

    public void setId(UUID id)
    {
        this.id = id;
    }

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }    
    
    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
    
    public boolean isCool(EmployeeCoolnessJudger coolnessJudger)
    {
        return coolnessJudger.isCool(this);
    }
    
    public void saySomething(EmployeeSpeaker speaker)
    {
        speaker.speak();
    }
}

We’ve seen that some aggregation functions have ready-made methods in the Stream class: min, max, count and some others. However, there’s nothing for counting the average. What if I’d like to calculate the average age of my employees?

List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
        employees.add(new Employee(UUID.randomUUID(), "Marylin", 18));
        employees.add(new Employee(UUID.randomUUID(), "Freddie", 25));
        employees.add(new Employee(UUID.randomUUID(), "Mario", 43));
        employees.add(new Employee(UUID.randomUUID(), "John", 35));
        employees.add(new Employee(UUID.randomUUID(), "Julia", 55));        
        employees.add(new Employee(UUID.randomUUID(), "Lotta", 52));
        employees.add(new Employee(UUID.randomUUID(), "Eva", 42));
        employees.add(new Employee(UUID.randomUUID(), "Anna", 20)); 

It may not be obvious at first but the collect function can perform that – and a lot more. The Collectors class includes a ready-made implementation of Collector: averagingInt which accepts a ToIntFunction of T. The ToIntFunction will return an integer from the T object. In our case we need the age values so we can calculate the average age as follows:

ToIntFunction<Employee> toInt = Employee::getAge;
Double averageAge = employees.stream().collect(Collectors.averagingInt(toInt));     

averageAge will be 37.78.

Other examples

Collect all the names into a string list:

List<String> names = employees.stream().map(Employee::getName).collect(Collectors.toList());     

Compute sum of all ages in a different way:

int totalAge = employees.stream().collect(Collectors.summingInt(Employee::getAge));

Let’s change the age values a little before the next example:

employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
        employees.add(new Employee(UUID.randomUUID(), "Marilyn", 20));
        employees.add(new Employee(UUID.randomUUID(), "Freddie", 20));
        employees.add(new Employee(UUID.randomUUID(), "Mario", 30));
        employees.add(new Employee(UUID.randomUUID(), "John", 30));
        employees.add(new Employee(UUID.randomUUID(), "Julia", 50));
        employees.add(new Employee(UUID.randomUUID(), "Lotta", 30));
        employees.add(new Employee(UUID.randomUUID(), "Eva", 40));
        employees.add(new Employee(UUID.randomUUID(), "Anna", 20));    

We can group the employees by age into a Map of Integers:

Map<Integer, List<Employee>> employeesByAge = employees.stream().collect(Collectors.groupingBy(Employee::getAge));  

Here you’ll see that the key 20 will have 3 employees, key 50 will have 2 employees etc.

You can also supply another Collector to the groupingBy function if you want to have some different type as the value in the Map. E.g. the following will do the same as above except that the value will show the number of employees within an age group:

Map<Integer, Long> employeesByAge = employees.stream().collect(Collectors.groupingBy(Employee::getAge, Collectors.counting()));

You can partition the collection based on some boolean condition. Here we build a Map by putting the employees into one of two groups: younger than 40 or older. The partitionBy function will help solve this:

Map<Boolean, List<Employee>> agePartitioning = employees.stream().collect(Collectors.partitioningBy(emp -> emp.getAge()>= 40));

agePartitioning will have 6 employees who are younger than 40 and 3 who are either 40 or older which is the correct result.

You can create something like an ad-hoc toString() function:

String allEmployees = employees.stream().map(emp -> emp.getName().concat(",  ").concat(Integer.toString(emp.getAge()))).collect(Collectors.joining(" | "));

The above function will go through each employee, create a “name + , + age” string of each of them and then join all individual strings by a pipe character. The result will look like this:

Elvis, 50 | Marilyn, 20 | Freddie, 20 | Mario, 30 | John, 30 | Julia, 50 | Lotta, 30 | Eva, 40 | Anna, 20

Notice that the collector was intelligent not to put the pipe character after the last element.

The Collectors class has a lot more ready-made collectors. Just type “Collectors.” in an IDE which supports IntelliSense and you’ll be able to view the whole list. Chances are that if you need to perform a composite MapReduce operation on a collection then you’ll find something useful here.

This post concludes our discussion on the new Stream API in Java 8.

View all posts related to Java here.

The Java Stream API part 4: ambiguous reductions

Introduction

In the previous post we looked at the Reduce phase of the Java Stream API. We also discussed the role of the identity field in reductions. For example an empty integer list can be summed as the result of the operation will be the identity field.

Lack of identity

There are cases, however, where the identity field cannot be provided, such as the following functions:

  • findAny(): will select an arbitrary element from the collection stream
  • findFirst(): will select the first element
  • max(): finds the maximum value from a stream based on some compare function
  • min(): finds the minimum value from a stream based on some compare function
  • reduce(BinaryOperator): in the previous post we used an overloaded version of the reduce function where the ID field was provided as the first parameter. This overload is a generic version for all reduce functions where the first element is unknown

It made sense to supply an identity field for the summation function as it was used as the input into the first loop. For e.g. max() it’s not as straightforward. Let’s try to find the highest integer using the same reduce() function as before and pretend that the max() function doesn’t exist. A simple integer comparison function for an integers list is looping through the numbers and always taking the higher of the two being inspected:

Stream<Integer> integerStream = Stream.of(1, 2, 2, 70, 10, 4, 40);
        BinaryOperator<Integer> maxComparator = (i1, i2) ->
        {
            if (i1 > i2)
            {
                return i1;
            }
            return i2;
        };

Now we want to use the comparator in the reduce function and provide an identity. What value could we use to be sure that the first element in the comparison loop will always “win”? I.e. we need a value that will always be smaller than 1 in the above case so that 1 will be compared with 2 in the following step, assuming a sequential execution. In “hand-made” integer comparisons the first initial max value is usually the absolute minimum of an integer, i.e. Integer.MIN_VALUE. Let’s try that:

Integer handMadeMax = integerStream.reduce(Integer.MIN_VALUE, maxComparator);

handMadeMax will be 70. Similarly, a hand-made min function could look like this:

BinaryOperator<Integer> minComparator = (i1, i2) ->
        {
            if (i1 > i2)
            {
                return i2;
            }
            return i1;
        };

Integer handMadeMin = integerStream.reduce(Integer.MAX_VALUE, minComparator);

handMadeMin will yield 1.

So this solution works in most cases – except when the integer list is empty or if you have numbers that lie outside the int.max and int.min range in which case you’d use Long anyway. E.g. if you’re mapping some integer field from a list of custom objects, like the Employee class we saw in previous posts. If your search provides no Employee objects then the resulting integer collection will also be empty. What is the max value of an empty integer collection if we go with the above solution? It will be Integer.MIN_VALUE. We can simulate this scenario as follows:

Stream<Integer> empty = Stream.empty();
Integer handMadeMax = empty.reduce(Integer.MIN_VALUE, maxComparator);

handMadeMax will in fact be Integer.MIN_VALUE as it is the only element in the comparison loop. Is that the correct result? Not really. I’m not exactly what the correct mathematical response is but it is probably ambiguous.

Short tip: the Integer class has a built in comparator for min and max:

Integer::min
Integer::max

Optionals

Java 8 solves this dilemma with a new object type called Optional of T. The functions listed in the previous section all return an Optional. The max() function accepts a Comparator and we can use our good friends from Java 8, the lambda expressions to implement the Comparator interface and use it as a parameter to max():

Comparator<Integer> intComparatorAnonymous = Integer::compare;        
Optional<Integer> max = integerStream.max(intComparatorAnonymous);

An Optional object reflects the ambiguity of the result. It can be a valid integer from a non-empty integer collection or… …something undefined. The Optional object can be tested with the isPresent() method which returns true of there’s a valid value behind the calculation:

if (max.isPresent())
{
     int res = max.get();
}

“res” will be 70 as expected. If we perform the same logic on an empty integer list then isPresent() return false.

If there’s no valid value then you can use the orElse method to define a default without the need for an if-else statement:

Integer orElse = max.orElse(123);

You can also throw an exception with orElseThrow which accepts a lambda function that returns a Throwable:

Supplier<Exception> exceptionSupplier = () -> new Exception("Nothing to return");
Integer orElse = max.orElseThrow(exceptionSupplier);

A full map-filter-reduce example

Let’s return to our Employee object:

public class Employee
{
    private UUID id;
    private String name;
    private int age;

    public Employee(UUID id, String name, int age)
    {
        this.id = id;
        this.name = name;
        this.age = age;
    }
        
    public UUID getId()
    {
        return id;
    }

    public void setId(UUID id)
    {
        this.id = id;
    }

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }    
    
    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
    
    public boolean isCool(EmployeeCoolnessJudger coolnessJudger)
    {
        return coolnessJudger.isCool(this);
    }
    
    public void saySomething(EmployeeSpeaker speaker)
    {
        speaker.speak();
    }
}

We have the following employees list:

List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
        employees.add(new Employee(UUID.randomUUID(), "Marilyn", 18));
        employees.add(new Employee(UUID.randomUUID(), "Freddie", 25));
        employees.add(new Employee(UUID.randomUUID(), "Mario", 43));
        employees.add(new Employee(UUID.randomUUID(), "John", 35));
        employees.add(new Employee(UUID.randomUUID(), "Julia", 55));
        employees.add(new Employee(UUID.randomUUID(), "Lotta", 52));
        employees.add(new Employee(UUID.randomUUID(), "Eva", 42));
        employees.add(new Employee(UUID.randomUUID(), "Anna", 20));

Suppose we need to find the maximum age of all employees under 50:

  • map: we map all age values to an integer list
  • filter: we filter out those that are above 50
  • reduce: find the max of the filtered list

The three steps can be described in code as follows:

Stream<Integer> employeeAges = employees.stream().map(emp -> emp.getAge());
Stream<Integer> filter = employeeAges.filter(age -> age < 50);
Optional<Integer> maxAgeUnderFifty = filter.max(Integer::compare);
if (maxAgeUnderFifty.isPresent())
{
     int res = maxAgeUnderFifty.get();
}

“res” will be 43 which is the correct value.

Let’s see another example: check if any employee under 50 has a name start starts with an M. We’re expecting “true” as we have Marilyn aged 18. We’ll first need to filter out the employees based on their ages, then map the names to a string collection and finally check if any of them starts with an M:

Stream<Employee> allUnderFifty = employees.stream().filter(emp -> emp.getAge() < 50);
Stream<String> allNamesUnderFifty = allUnderFifty.map(emp -> emp.getName());
boolean anyMatch = allNamesUnderFifty.anyMatch(name -> name.startsWith("M"));

anyMatch will be true as expected.

View the next part of this series here.

View all posts related to Java here.

The Java Stream API part 3: the Reduce phase

Introduction

In the previous part of the Java Stream API course we looked at streams in more detail. We discussed why streams are really empty shells to describe our intentions but do not themselves contain any data. We saw the difference between terminal and intermediary operations and we looked at a couple of examples for both types. At the end of the post we discussed the first part of the MapReduce algorithm i.e. the map() and flatMap() functions.

We’ll move onto the Reduce phase of the MapReduce algorithm.

Reduce

Now that we know how to do the mapping we can look at the “Reduce” part of MapReduce. In .NET there is a range of pre-defined Reduce operations, like the classic SQL ones such as Min, Max, Sum, Average. There are similar functions – reducers – in the Stream API.

The most generic method to represent the Reduce phase is the “reduce” method. We’ll return to our Employee collection to run the examples:

List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
        employees.add(new Employee(UUID.randomUUID(), "Marylin", 18));
        employees.add(new Employee(UUID.randomUUID(), "Freddie", 25));
        employees.add(new Employee(UUID.randomUUID(), "Mario", 43));
        employees.add(new Employee(UUID.randomUUID(), "John", 35));
        employees.add(new Employee(UUID.randomUUID(), "Julia", 55));        
        employees.add(new Employee(UUID.randomUUID(), "Lotta", 52));
        employees.add(new Employee(UUID.randomUUID(), "Eva", 42));
        employees.add(new Employee(UUID.randomUUID(), "Anna", 20)); 

Say we want to calculate the sum of the ages in the collection. Not a very useful statistics but it’s fine for the demo. We can see the Map and Reduce phases in action:

Stream<Integer> employeeAges = employees.stream().map(emp -> emp.getAge());
int totalAge = employeeAges.reduce(0, (empAge1, empAge2) -> empAge1 + empAge2);

A quick tip, the lambda expression…:

(empAge1, empAge2) -> empAge1 + empAge2

…can be substituted with the static sum() method of Integer using the :: shorthand notation:

Integer::sum

The first line maps the Employee objects into integers through a lambda expression which selects the age property of each employee. Then the stream of integers is reduced by the “reduce” function. This particular overload of the reduce function accepts an identity for the reducer function and the reducer function itself.

Let’s look at the reducer function first. It is of type BinaryOperator from the java.util.function package which we discussed in this post. It is a specialised version of the BiFunction interface which accepts two parameters and returns a third one. BinaryOperator assumes that the input and output parameters are of the same type. In the above example we want to add the ages of the employees therefore we pass in two age integers and simply add them. As the reduce function is terminal, we can read the result in “totalAge”. In its current form totalAge will be equal to 340 which is in fact the sum of the ages.

The identity field will be an initial input into the reducer. If you run the above code with an identity of 100 instead of 0 then totalAge will be 440. The identity parameter will be inserted into the equation to calculate the first result, i.e. 0 + 50 = 50, which will be passed into the second step, i.e. 50 + 18 = 68 which in turn will be used as a parameter in the next step, and so on and so forth. Note that the reductions steps may well be executed in parallel without you adding any extra code. Hence don’t assume anything about the correct ordering of the steps but it doesn’t really matter as we’re adding numbers.

To make this point clearer let’s suppose we want to multiply all ages, i.e. 50*18*25…. We’ll need to change the age values otherwise not even a long will be able to hold the total. Let’s go with some small numbers – and risk being accused of favouring child employment:

List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(UUID.randomUUID(), "Elvis", 1));
        employees.add(new Employee(UUID.randomUUID(), "Marylin", 2));
        employees.add(new Employee(UUID.randomUUID(), "Freddie", 3));
        employees.add(new Employee(UUID.randomUUID(), "Mario", 4));
        employees.add(new Employee(UUID.randomUUID(), "John", 5));
        employees.add(new Employee(UUID.randomUUID(), "Julia", 6));        
        employees.add(new Employee(UUID.randomUUID(), "Lotta", 7));
        employees.add(new Employee(UUID.randomUUID(), "Eva", 8));
        employees.add(new Employee(UUID.randomUUID(), "Anna", 9)); 

What do you think will be the result of the below calculation?

Stream<Integer> employeeAges = employees.stream().map(emp -> emp.getAge());
int totalAge = employeeAges.reduce(0, (empAge1, empAge2) -> empAge1 * empAge2);

Those who responded with “0” are correct. 0 is passed in as the first parameter in the first step along with the first age. 0 multiplied by any number is 0 so even the second step will yield 0 and so on. So for a multiplication you’ll need to provide 1:

Stream<Integer> employeeAges = employees.stream().map(emp -> emp.getAge());
int totalAge = employeeAges.reduce(1, (empAge1, empAge2) -> empAge1 * empAge2);

…where totalAge will hold the correct value of 362880.

The identity value has another usage as well: if the source stream is empty after a terminal operation, i.e. if “employees” has no Employee objects at all then even the “employeeAges” stream will be empty. In that case the reduce function has nothing to work on so the identity value will be returned.

Example:

List<Employee> employees = new ArrayList<>();
Stream<Integer> employeeAges = employees.stream().map(emp -> emp.getAge());
int totalAge = employeeAges.reduce(10, (empAge1, empAge2) -> empAge1 + empAge2);

totalAge will be 10.

Also, if the source stream yields only one element then the result will be that element and the identity combined.

Example:

List<Employee> employees = new ArrayList<>();
employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
Stream<Integer> employeeAges = employees.stream().map(emp -> emp.getAge());
int totalAge = employeeAges.reduce(10, (empAge1, empAge2) -> empAge1 + empAge2);

totalAge will be 10 + 50 = 60.

There are other Reduce functions for streams that are pretty self-explanatory:

  • count()
  • allMatch(), noneMatch(), anyMatch()
  • min, max
  • findFirst, findAny

We will look at min, max, findFirst and findAny in the next post as they are slightly different from the others.

One last note before we finish: if you try to run two terminal operations on the same stream then you’ll get an exception. You can only execute one terminal operation on a stream and it will be closed after that. To prevent that you should avoid assigning a variable to the stream and instead call [collection].stream() every time you want to create a new stream.

In the next post we’ll take a look at cases when the reducer function may not return anything.

View all posts related to Java here.

The Java Stream API part 2: the Map phase

Introduction

In the previous post we started looking into the new Stream API of Java 8 which makes working with collections easier. LINQ to Collections in .NET makes it a breeze to run queries on lists, maps – dictionaries in .NET – and other list-like objects and Java 8 is now coming with something similar. My overall impression is that LINQ in .NET is more concise and straightforward than the Stream API in Java.

In this post we’ll investigate Streams in greater detail.

Lazy execution of streams

If you’re familiar with LINQ statements in .NET then the notion of lazy or deferred execution is nothing new to you. Just because you have a LINQ statement, such as…

IEnumerable<Customer> customers = from c in DbContext.Customers where c.Id > 30 select c;

…the variable “customers” will not hold any data yet. You can execute the filter query with various other non-deferring operators like “ToList()”. We have a similar situation in the Stream API. Recall our Java code from the previous part:

Stream<Integer> of = Stream.of(1, 2, 4, 2, 10, 4, 40);
Predicate<Integer> pred = Predicate.isEqual(4);
Stream<Integer> filter = of.filter(pred);

The object called “filter” will at this point not hold any data. Writing the C# LINQ statement above won’t execute anything – writing of.filter(pred) in Java won’t execute anything either. They are simply declarations that describe what we want to do with a Collection. This is true for all methods in the Stream interface that return another Stream. Such operations are called intermediary operations. Methods that actually “do something” are called terminal operations or final operations.

Recall our Employee class from the previous part. We also had a list of employees:

List<Employee> employees = new ArrayList<>();
employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
.
.
.
employees.add(new Employee(UUID.randomUUID(), "Anna", 20));

Based on the above statements about a Stream object, can you guess what the List object called “filteredNames” will contain?

List<String> filteredNames = new ArrayList<>();
Stream<Employee> stream = employees.stream();
        
Stream<Employee> peekEmployees = employees.stream().peek(System.out::println);
Stream<Employee> filteredEmployees = peekEmployees.filter(emp -> emp.getAge() > 30);
Stream<Employee> peekFilteredEmployees = filteredEmployees.peek(emp -> filteredNames.add(emp.getName()));

The “peek” method is similar to forEach but it returns a Stream whereas forEach is void. Here we simply build Stream objects from other Stream objects. Those who answered “nothing” in response to the above questions were correct. “filteredNames” will remain an empty collection as we only declared our intentions to filter the source. The first “peek” method which invokes println won’t be executed, there will be nothing printed on the output window.

So if you’d like to “execute your intentions” then you’ll need to pick a terminal operation, such as forEach:

List<String> filteredNames = new ArrayList<>();
Stream<Employee> stream = employees.stream();
       
Stream<Employee> peekEmployees = employees.stream().peek(System.out::println);
Stream<Employee> filteredEmployees = peekEmployees.filter(emp -> emp.getAge() > 30);
filteredEmployees.forEach(emp -> filteredNames.add(emp.getName()));

The forEach loop will fill the filteredNames list correctly. Also, the System.out::println bit will be executed.

The map() operation

We mentioned the MapReduce algorithm in the previous post as it is extensively used in data mining. We are looking for meaningful information from a data set using some steps, such as Map, Filter and Reduce. We don’t always need all of these steps and we saw some very simple examples before. The Map step is represented by the map() intermediary operation which returns another Stream – hence it won’t execute anything:

Stream<Employee> employeeStream = employees.stream();
Stream<String> employeeNamesStream = employeeStream.map(emp -> emp.getName());

Our intention is to collect the names of the employees. We can do it as follows:

List<String> employeeNames = new ArrayList<>();
Stream<Employee> employeeStream = employees.stream();
employeeStream.map(emp -> emp.getName()).forEach(employeeNames::add);

We can also do other string operations like here:

List<String> employeeNames = new ArrayList<>();
Stream<Employee> employeeStream = employees.stream();
employeeStream.map(emp -> emp.getId().toString().concat(": ").concat(emp.getName())).forEach(employeeNames::add);

…where the employeeNames list will contain concatenated strings of the employee ID and name.

The flatMap() operation

You can use the flatMap operation to flatten a stream of streams. Say we have 3 different Employee lists:

List<Employee> employeesOne = new ArrayList<>();
employeesOne.add(new Employee(UUID.randomUUID(), "Elvis", 50));
employeesOne.add(new Employee(UUID.randomUUID(), "Marylin", 18));
employeesOne.add(new Employee(UUID.randomUUID(), "Freddie", 25));
employeesOne.add(new Employee(UUID.randomUUID(), "Mario", 43));
        
List<Employee> employeesTwo = new ArrayList<>();
employeesTwo.add(new Employee(UUID.randomUUID(), "John", 35));
employeesTwo.add(new Employee(UUID.randomUUID(), "Julia", 55));        
employeesTwo.add(new Employee(UUID.randomUUID(), "Lotta", 52));
        
List<Employee> employeesThree = new ArrayList<>();
employeesThree.add(new Employee(UUID.randomUUID(), "Eva", 42));
employeesThree.add(new Employee(UUID.randomUUID(), "Anna", 20));

Then suppose that we have a list of lists of employees:

List<List<Employee>> employeeLists = Arrays.asList(employeesOne, employeesTwo, employeesThree);

We can collect all employee names as follows:

List<String> allEmployeeNames = new ArrayList<>();
        
employeeLists.stream()
                .flatMap(empList -> empList.stream())
                .map(emp -> emp.getId().toString().concat(": ").concat(emp.getName()))
                .forEach(allEmployeeNames::add);

We first flatten the streams from the individual Employee lists then run the map function to retrieve the concatenated IDs and names. We finally put the elements into the allEmployeeNames collection.

Find the next post here where we go through the Reduce phase.

View all posts related to Java here.

The Java Stream API part 1: the basics

Introduction

Java 8 has a new API called the Stream API. The Stream API, which is represented by the typed interface Stream of T, targets collections. It is a brand new concept in Java and its importance and purpose can be likened to that of LINQ to Collections in .NET. It provides a mechanism to process data in some collection using the MapReduce or Map/Filter/Reduce algorithm.

Short summary of MapReduce

MapReduce is eagerly used in data mining and big data applications to find information from a large, potentially unstructured data set. Don’t worry, we won’t need any big data cluster to test the Stream API as even the smallest collections can be analysed. E.g. finding the average age of all Employees who have been employed for more than 5 years is a good candidate for the Stream API.

The Stream API introduces automatic parallelism in the computations without us having to write any extra technical code. We can avoid tedious intermediary stages, like looping through all employees to find the ones who have spent more than 5 years at the company and then calculating the average on them. That is an important goal of the Stream API, i.e. to avoid intermediary results and collections for the computations.

The individual parts of Map/Filter/Reduce, i.e. the Map, the Filter and the Reduce are steps or operations in a chain to compute something from a collection. Not all 3 steps are required in all data mining cases. Examples:

  • Finding the average age of employees who have been working at a company for more than 5 years: you map the age property of each employee to a list of integers but filter out those who have been working for less than 5 years. Then you calculate the average of the elements in the integer list, i.e. reduce the list to a single outcome.
  • Finding the ids of every employee: if the IDs are strings then you can map the ID fields into a list of strings, there’s no need for any filtering or reducing.
  • Finding the average age of all employees: you map the age of each employee into an integer list and then calculate the average of those integers in the reduce phase, there’s no need for filtering
  • Find all employees over 50 years of age: we filter out the employees who are younger than 50, there’s no need for mapping or reducing the employees collection.

MapReduce implementations in reality can become quite complex depending on the query and structure of the source data. We won’t go into those at all – I couldn’t even if I wanted to as large-scale data mining is not exactly my specialty.

A Stream is an object that will represent one such step in the algorithm. Although Streams operate on Collections, a Stream is NOT a collection. A Stream will not hold any data in the same sense as a Java collection holds data. Also, a Stream should not change the source data in any way, i.e. the collection that the Stream operates on, will remain untouched by the Stream. Keep in mind though, that the Stream steps are carried out in parallel, so it’s vital that they work on the same data otherwise you’ll get unpredictable results.

First example

Enough of the theory, let’s see some code. The easiest way to create a stream is to call the stream() method on a Collection such as a List. Recall from the posts on lambda expressions how we defined a forEach loop on a list of strings. We’ll first add the names of the employees to a string list in the old way and the print the names according to the new Lambda way:

List<String> names = new ArrayList<>();
for (Employee employee : companyEmployees)
{
       names.add(employee.getName());
}
Consumer<String> printConsumer = System.out::println;
names.forEach(printConsumer);

Read further on for a reminder on the Employee class.

The forEach method is also available on a Stream so the below code will perform the same:

Consumer<String> printConsumer = System.out::println;
Stream<String> stream = names.stream();
stream.forEach(printConsumer);

A Stream has a lot more interesting functions of course. It’s those functions where the new java.util.function functional interfaces will come in handy. If you don’t know what that package does then read through the posts on lambda expressions in Java referred to above.

Let’s revisit our Employee class for the next examples:

public class Employee
{
    private UUID id;
    private String name;
    private int age;

    public Employee(UUID id, String name, int age)
    {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public UUID getId()
    {
        return id;
    }

    public void setId(UUID id)
    {
        this.id = id;
    }

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }    
    
    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
}

…and we have the following collection:

List<Employee> employees = new ArrayList<>();
employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
employees.add(new Employee(UUID.randomUUID(), "Marylin", 18));
employees.add(new Employee(UUID.randomUUID(), "Freddie", 25));
employees.add(new Employee(UUID.randomUUID(), "Mario", 43));
employees.add(new Employee(UUID.randomUUID(), "John", 35));
employees.add(new Employee(UUID.randomUUID(), "Julia", 55));        
employees.add(new Employee(UUID.randomUUID(), "Lotta", 52));
employees.add(new Employee(UUID.randomUUID(), "Eva", 42));
employees.add(new Employee(UUID.randomUUID(), "Anna", 20));

Say we need to find all employees aged 50 and above:

Stream<Employee> stream = employees.stream();
Stream<Employee> fiftyAndAbove = stream.filter(emp -> emp.getAge() >= 50);

The filter() method of a Stream accepts a Predicate of T – Employee in this case – which will return true if the age of the employee is at least 50. Predicates can be chained with the “and”, “or” and “negate” default methods available in the Predicate interface:

Stream<Employee> stream = employees.stream();
        
Predicate<Employee> fiftyAndBelow = emp -> emp.getAge() <= 50;
Predicate<Employee> olderThanTwenty = emp -> emp.getAge() > 20;
Predicate<Employee> startsWithE = emp -> emp.getName().startsWith("E");
        
Predicate<Employee> joined = fiftyAndBelow.and(olderThanTwenty).and(startsWithE.negate());
        
Stream<Employee> filtered = stream.filter(joined);

Here we want to collect all Employees older than 20, at most 50 and whose name doesn’t start with an ‘E’.

You can create arbitrary Streams using the static “of” method of Stream:

Stream<Integer> of = Stream.of(1, 2, 4, 2, 10, 4, 40);
Predicate<Integer> pred = Predicate.isEqual(4);
Stream<Integer> filter = of.filter(pred);

Here we have a stream of integers and we want to collect the ones that are equal to 4.

If you’d like to see the contents of the stream “filter” then you can call the forEach method on it:

filter.forEach(System.out::println);

…which will correctly output 4 and 4, i.e. the two elements from stream “of” that are equal to 4.

OK, but how can we access the filtered elements? How can we look at the result of the query? We’ll see that in the next post.

View all posts related to Java here.

Elliot Balynn's Blog

A directory of wonderful thoughts

Software Engineering

Web development

Disparate Opinions

Various tidbits

chsakell's Blog

WEB APPLICATION DEVELOPMENT TUTORIALS WITH OPEN-SOURCE PROJECTS

Once Upon a Camayoc

Bite-size insight on Cyber Security for the not too technical.

%d bloggers like this: