The Java Stream API part 2: the Map phase
November 16, 2014
Introduction
In the previous post we started looking into the new Stream API of Java 8, which makes working with collections easier. LINQ to Objects in .NET makes it a breeze to run queries on lists, dictionaries (the .NET counterpart of Java maps) and other list-like objects, and Java 8 now comes with something similar. My overall impression is that LINQ in .NET is more concise and straightforward than the Stream API in Java.
In this post we’ll investigate Streams in greater detail.
Lazy execution of streams
If you’re familiar with LINQ statements in .NET then the notion of lazy or deferred execution is nothing new to you. Just because you have a LINQ statement, such as…
IEnumerable<Customer> customers = from c in DbContext.Customers where c.Id > 30 select c;
…the variable “customers” will not hold any data yet. The filter query is only executed by a non-deferring operator such as “ToList()”. We have a similar situation in the Stream API. Recall our Java code from the previous part:
Stream<Integer> of = Stream.of(1, 2, 4, 2, 10, 4, 40);
Predicate<Integer> pred = Predicate.isEqual(4);
Stream<Integer> filter = of.filter(pred);
The object called “filter” will at this point not hold any data. Writing the C# LINQ statement above won’t execute anything, and writing of.filter(pred) in Java won’t execute anything either. They are simply declarations that describe what we want to do with a Collection. This is true for all methods in the Stream interface that return another Stream. Such operations are called intermediate operations. Methods that actually “do something” are called terminal operations.
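To make the deferred execution visible, here is a minimal sketch built on the snippet above. The counter, the class name LazyFilterDemo and the method countEvaluations are my own additions for illustration. The lambda does the same job as Predicate.isEqual(4) but also records how often it is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class LazyFilterDemo {

    // Returns three numbers: predicate evaluations before the terminal
    // operation, evaluations after it, and the number of matching elements.
    static int[] countEvaluations() {
        AtomicInteger evaluations = new AtomicInteger();

        Stream<Integer> of = Stream.of(1, 2, 4, 2, 10, 4, 40);
        // Same test as Predicate.isEqual(4), plus a counter (illustrative addition).
        Predicate<Integer> pred = n -> {
            evaluations.incrementAndGet();
            return n == 4;
        };

        // Intermediate operation: only describes the pipeline.
        Stream<Integer> filter = of.filter(pred);
        int before = evaluations.get();

        // Terminal operation: the predicate is evaluated for each element now.
        List<Integer> fours = new ArrayList<>();
        filter.forEach(fours::add);
        int after = evaluations.get();

        return new int[] {before, after, fours.size()};
    }

    public static void main(String[] args) {
        int[] counts = countEvaluations();
        System.out.println("evaluations before terminal operation: " + counts[0]);
        System.out.println("evaluations after terminal operation: " + counts[1]);
        System.out.println("matching elements: " + counts[2]);
    }
}
```

The predicate is not touched until forEach runs. At that point it is evaluated once for each of the seven source elements.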
Recall our Employee class from the previous part. We also had a list of employees:
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
. . .
employees.add(new Employee(UUID.randomUUID(), "Anna", 20));
Based on the above statements about a Stream object, can you guess what the List object called “filteredNames” will contain?
List<String> filteredNames = new ArrayList<>();
Stream<Employee> stream = employees.stream();
Stream<Employee> peekEmployees = stream.peek(System.out::println);
Stream<Employee> filteredEmployees = peekEmployees.filter(emp -> emp.getAge() > 30);
Stream<Employee> peekFilteredEmployees = filteredEmployees.peek(emp -> filteredNames.add(emp.getName()));
The “peek” method is similar to forEach, but it returns a Stream whereas forEach returns void. Here we simply build Stream objects from other Stream objects. Those who answered “nothing” to the above question were correct. “filteredNames” will remain an empty collection because we only declared our intention to filter the source. The first “peek” call, which invokes println, won’t be executed either, so nothing will be printed to the output window.
So if you’d like to “execute your intentions” then you’ll need to pick a terminal operation, such as forEach:
List<String> filteredNames = new ArrayList<>();
Stream<Employee> stream = employees.stream();
Stream<Employee> peekEmployees = stream.peek(System.out::println);
Stream<Employee> filteredEmployees = peekEmployees.filter(emp -> emp.getAge() > 30);
filteredEmployees.forEach(emp -> filteredNames.add(emp.getName()));
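The forEach call will fill the filteredNames list with the names of the employees older than 30. The System.out::println step will also be executed, once for every employee in the source list, since peek sits before the filter.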
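Here is a self-contained sketch of the same pipeline that you can run to check this. The Employee class is a minimal stand-in for the one from the previous post, and its exact shape is an assumption. The names PeekAndForEachDemo, sampleEmployees and namesOlderThan are made up for this example. Instead of printing, peek records the names it sees so that the two lists can be compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Stream;

public class PeekAndForEachDemo {

    // Minimal stand-in for the Employee class of the previous post (assumed shape).
    static class Employee {
        private final UUID id;
        private final String name;
        private final int age;

        Employee(UUID id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        UUID getId() { return id; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee(UUID.randomUUID(), "Elvis", 50),
                new Employee(UUID.randomUUID(), "Marylin", 18),
                new Employee(UUID.randomUUID(), "Mario", 43),
                new Employee(UUID.randomUUID(), "Anna", 20));
    }

    // Adds every name that passes through peek to 'peeked' and returns
    // the names of the employees that survive the age filter.
    static List<String> namesOlderThan(List<Employee> employees, int age, List<String> peeked) {
        List<String> filteredNames = new ArrayList<>();
        Stream<Employee> filteredEmployees = employees.stream()
                .peek(emp -> peeked.add(emp.getName()))
                .filter(emp -> emp.getAge() > age);
        // Nothing has run so far: both lists are still empty at this point.
        filteredEmployees.forEach(emp -> filteredNames.add(emp.getName()));
        return filteredNames;
    }

    public static void main(String[] args) {
        List<String> peeked = new ArrayList<>();
        List<String> names = namesOlderThan(sampleEmployees(), 30, peeked);
        System.out.println("seen by peek: " + peeked);
        System.out.println("older than 30: " + names);
    }
}
```

With this sample data peek sees all four employees, while only Elvis and Mario end up in the filtered list.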
The map() operation
We mentioned the MapReduce algorithm in the previous post as it is extensively used in data mining. The idea is to extract meaningful information from a data set through steps such as Map, Filter and Reduce. We don’t always need all of these steps, and we saw some very simple examples before. The Map step is represented by the map() intermediate operation, which returns another Stream, so on its own it won’t execute anything:
Stream<Employee> employeeStream = employees.stream();
Stream<String> employeeNamesStream = employeeStream.map(emp -> emp.getName());
Our intention is to collect the names of the employees. We can do it as follows:
List<String> employeeNames = new ArrayList<>();
Stream<Employee> employeeStream = employees.stream();
employeeStream.map(emp -> emp.getName()).forEach(employeeNames::add);
We can also apply other string operations inside the mapping function:
List<String> employeeNames = new ArrayList<>();
Stream<Employee> employeeStream = employees.stream();
employeeStream.map(emp -> emp.getId().toString().concat(": ").concat(emp.getName())).forEach(employeeNames::add);
…where the employeeNames list will contain concatenated strings of the employee ID and name.
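As a side note, here is a possible alternative to filling a list from forEach. It is a sketch only and is not what the code above does. It passes the method reference Employee::getName to map and lets the terminal operation collect(Collectors.toList()) build the list. The Employee class is again an assumed stand-in for the one from the previous post, and MapNamesDemo, sampleEmployees, namesOf and labelsOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class MapNamesDemo {

    // Minimal stand-in for the Employee class of the previous post (assumed shape).
    static class Employee {
        private final UUID id;
        private final String name;
        private final int age;

        Employee(UUID id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        UUID getId() { return id; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee(UUID.randomUUID(), "Elvis", 50),
                new Employee(UUID.randomUUID(), "Anna", 20));
    }

    // map() turns each Employee into its name; collect() is the terminal
    // operation that runs the pipeline and gathers the results in a List.
    static List<String> namesOf(List<Employee> employees) {
        return employees.stream()
                .map(Employee::getName)
                .collect(Collectors.toList());
    }

    // Same idea with the "id: name" concatenation used in the post.
    static List<String> labelsOf(List<Employee> employees) {
        return employees.stream()
                .map(emp -> emp.getId().toString().concat(": ").concat(emp.getName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = sampleEmployees();
        System.out.println(namesOf(employees));
        System.out.println(labelsOf(employees));
    }
}
```

With collect, the lambda no longer has to modify a list that lives outside the stream. Whether that reads better than forEach is largely a matter of taste.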
The flatMap() operation
You can use the flatMap operation to flatten a stream of streams into a single stream. Say we have three different Employee lists:
List<Employee> employeesOne = new ArrayList<>();
employeesOne.add(new Employee(UUID.randomUUID(), "Elvis", 50));
employeesOne.add(new Employee(UUID.randomUUID(), "Marylin", 18));
employeesOne.add(new Employee(UUID.randomUUID(), "Freddie", 25));
employeesOne.add(new Employee(UUID.randomUUID(), "Mario", 43));
List<Employee> employeesTwo = new ArrayList<>();
employeesTwo.add(new Employee(UUID.randomUUID(), "John", 35));
employeesTwo.add(new Employee(UUID.randomUUID(), "Julia", 55));
employeesTwo.add(new Employee(UUID.randomUUID(), "Lotta", 52));
List<Employee> employeesThree = new ArrayList<>();
employeesThree.add(new Employee(UUID.randomUUID(), "Eva", 42));
employeesThree.add(new Employee(UUID.randomUUID(), "Anna", 20));
Then suppose that we have a list of lists of employees:
List<List<Employee>> employeeLists = Arrays.asList(employeesOne, employeesTwo, employeesThree);
We can collect all employee names as follows:
List<String> allEmployeeNames = new ArrayList<>();
employeeLists.stream()
.flatMap(empList -> empList.stream())
.map(emp -> emp.getId().toString().concat(": ").concat(emp.getName()))
.forEach(allEmployeeNames::add);
We first flatten the streams from the individual Employee lists, then run the map function to build the concatenated IDs and names. Finally, we put the elements into the allEmployeeNames collection.
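The flattening step is easier to see on plain strings. The sketch below uses lists of names instead of Employee objects to stay short. The class FlatMapDemo and the method flattenAndUpperCase are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlatMapDemo {

    // Flattens a list of name lists into one stream, upper-cases each
    // name in the map step and gathers the results with forEach.
    static List<String> flattenAndUpperCase(List<List<String>> nameLists) {
        List<String> allNames = new ArrayList<>();
        nameLists.stream()
                .flatMap(names -> names.stream()) // one stream per inner list, merged into one
                .map(name -> name.toUpperCase())
                .forEach(allNames::add);
        return allNames;
    }

    public static void main(String[] args) {
        List<List<String>> nameLists = Arrays.asList(
                Arrays.asList("Elvis", "Mario"),
                Arrays.asList("John", "Julia", "Lotta"),
                Arrays.asList("Eva", "Anna"));
        System.out.println(flattenAndUpperCase(nameLists));
    }
}
```

The elements come out in the order of the inner lists, so the three groups of names end up one after the other in a single list of seven entries.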
The next post goes through the Reduce phase.