The Java Stream API part 1: the basics

November 15, 2014

Introduction

Java 8 introduced a new API called the Stream API. The Stream API, which is represented by the generic interface Stream<T>, targets collections. It is a brand new concept in Java, and its importance and purpose can be likened to that of LINQ to Objects in .NET. It provides a mechanism to process the data in a collection using the MapReduce, or Map/Filter/Reduce, algorithm.

Short summary of MapReduce

MapReduce is widely used in data mining and big data applications to extract information from large, potentially unstructured data sets. Don’t worry, we won’t need a big data cluster to test the Stream API, as even the smallest collections can be analysed. For example, finding the average age of all employees who have been employed for more than 5 years is a good candidate for the Stream API.

The Stream API makes it easy to parallelise the computations without us having to write any threading code. A stream is sequential by default; calling parallelStream() on the collection, or parallel() on the stream, is enough to request parallel execution. We can also avoid tedious intermediary stages, like looping through all employees to find the ones who have spent more than 5 years at the company and then calculating the average over them. That is an important goal of the Stream API: to avoid intermediary results and collections in the computations.
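
As a rough sketch of the point about intermediary results, the code below computes the same average twice: first with explicit loops and a temporary list, then with a single stream pipeline. The Worker class and its yearsEmployed field are hypothetical and exist only for this illustration; filter, mapToInt and average are covered properly later in the series. The third method shows that parallel execution is requested with parallelStream(), while the pipeline itself stays the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AverageAgeSketch
{
    // Hypothetical type for this sketch only; yearsEmployed is an assumed field.
    static class Worker
    {
        final int age;
        final int yearsEmployed;

        Worker(int age, int yearsEmployed)
        {
            this.age = age;
            this.yearsEmployed = yearsEmployed;
        }
    }

    // Old way: build an intermediary list, then loop over it a second time.
    static double averageWithLoops(List<Worker> workers)
    {
        List<Worker> longServing = new ArrayList<>();
        for (Worker worker : workers)
        {
            if (worker.yearsEmployed > 5)
            {
                longServing.add(worker);
            }
        }
        if (longServing.isEmpty())
        {
            return 0.0;
        }
        double sum = 0;
        for (Worker worker : longServing)
        {
            sum += worker.age;
        }
        return sum / longServing.size();
    }

    // Stream way: one pipeline, no intermediary collection.
    static double averageWithStream(List<Worker> workers)
    {
        return workers.stream()
                .filter(worker -> worker.yearsEmployed > 5)
                .mapToInt(worker -> worker.age)
                .average()
                .orElse(0.0);
    }

    // Same pipeline; parallel execution is requested explicitly.
    static double averageWithParallelStream(List<Worker> workers)
    {
        return workers.parallelStream()
                .filter(worker -> worker.yearsEmployed > 5)
                .mapToInt(worker -> worker.age)
                .average()
                .orElse(0.0);
    }

    // Made-up sample data: two of the three workers have more than 5 years.
    static List<Worker> sampleWorkers()
    {
        return Arrays.asList(new Worker(50, 10), new Worker(30, 2), new Worker(40, 6));
    }
}
```

All three methods return the same value for the same input; only the amount of bookkeeping code differs.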

The individual parts of Map/Filter/Reduce, i.e. the Map, the Filter and the Reduce, are steps or operations in a chain that computes something from a collection. Not all 3 steps are required in every case. Examples:

Finding the average age of employees who have been working at a company for more than 5 years: you filter out those who have been working for 5 years or less and map the age property of each remaining employee to a list of integers. Then you calculate the average of the elements in the integer list, i.e. reduce the list to a single outcome.
Finding the IDs of every employee: if the IDs are strings then you can map the ID fields into a list of strings. There’s no need for any filtering or reducing.
Finding the average age of all employees: you map the age of each employee into an integer list and then calculate the average of those integers in the reduce phase. There’s no need for filtering.
Finding all employees over 50 years of age: we filter out the employees who are 50 or younger. There’s no need for mapping or reducing the employees collection.
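
The examples that need only an ID and an age can be sketched as follows. This is an illustration under assumptions, not the code used later in the series: the Person class and its String id are hypothetical stand-ins, and collect(Collectors.toList()) is used here only to make the results visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapFilterReduceSketch
{
    // Hypothetical stand-in for an employee; the String id is an assumption.
    static class Person
    {
        final String id;
        final int age;

        Person(String id, int age)
        {
            this.id = id;
            this.age = age;
        }
    }

    // Map only: every person is turned into its ID.
    static List<String> allIds(List<Person> people)
    {
        return people.stream()
                .map(person -> person.id)
                .collect(Collectors.toList());
    }

    // Map + reduce: ages are extracted, then reduced to a single average.
    static double averageAge(List<Person> people)
    {
        return people.stream()
                .mapToInt(person -> person.age)
                .average()
                .orElse(0.0);
    }

    // Filter only: keep the people who are older than 50.
    static List<Person> olderThanFifty(List<Person> people)
    {
        return people.stream()
                .filter(person -> person.age > 50)
                .collect(Collectors.toList());
    }

    // Made-up sample data for the sketch.
    static List<Person> samplePeople()
    {
        return Arrays.asList(new Person("a1", 50), new Person("b2", 18), new Person("c3", 55));
    }
}
```

Each method uses only the steps its query needs, which is the point of the list above: map, filter and reduce are building blocks, not a fixed sequence.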

MapReduce implementations in reality can become quite complex depending on the query and structure of the source data. We won’t go into those at all – I couldn’t even if I wanted to as large-scale data mining is not exactly my specialty.

A Stream is an object that represents one such step in the algorithm. Although Streams operate on Collections, a Stream is NOT a collection. A Stream does not hold any data in the same sense as a Java collection holds data. Also, a Stream should not change the source data in any way, i.e. the collection that the Stream operates on remains untouched by the Stream. Keep in mind, though, that the Stream steps may be carried out in parallel, so it’s vital that the source data is not modified while the Stream is being processed, otherwise you’ll get unpredictable results.
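
A small sketch may make the difference between a Stream and a collection more concrete. The class and method names below are made up for this illustration. The filter step only describes what should happen; the elements are read when a terminal operation such as collect runs, and the source list is left as it was.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSourceSketch
{
    static List<Integer> evensOf(List<Integer> source)
    {
        // filter() is lazy: it describes a step and reads nothing from the list yet.
        Stream<Integer> evens = source.stream().filter(number -> number % 2 == 0);

        // collect() is a terminal operation: it runs the pipeline and puts
        // the matching elements into a NEW list. The source is not modified.
        return evens.collect(Collectors.toList());
    }
}
```

Calling evensOf on a list holding 1, 2, 3 and 4 returns a new list with 2 and 4, while the original list still contains all four numbers.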

First example

Enough of the theory, let’s see some code. The easiest way to create a stream is to call the stream() method on a Collection such as a List. Recall from the posts on lambda expressions how we defined a forEach loop on a list of strings. We’ll first add the names of the employees to a string list the old way and then print the names the new lambda way:

List<String> names = new ArrayList<>();
for (Employee employee : companyEmployees)
{
       names.add(employee.getName());
}
Consumer<String> printConsumer = System.out::println;
names.forEach(printConsumer);

Read on for a reminder of the Employee class.

The forEach method is also available on a Stream, so the code below does the same:

Consumer<String> printConsumer = System.out::println;
Stream<String> stream = names.stream();
stream.forEach(printConsumer);

A Stream has a lot more interesting methods of course. It’s with those methods that the new java.util.function functional interfaces come in handy. If you don’t know what that package does then read through the posts on lambda expressions in Java referred to above.

Let’s revisit our Employee class for the next examples:

public class Employee
{
    private UUID id;
    private String name;
    private int age;

    public Employee(UUID id, String name, int age)
    {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public UUID getId()
    {
        return id;
    }

    public void setId(UUID id)
    {
        this.id = id;
    }

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }    
    
    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
}

…and we have the following collection:

List<Employee> employees = new ArrayList<>();
employees.add(new Employee(UUID.randomUUID(), "Elvis", 50));
employees.add(new Employee(UUID.randomUUID(), "Marylin", 18));
employees.add(new Employee(UUID.randomUUID(), "Freddie", 25));
employees.add(new Employee(UUID.randomUUID(), "Mario", 43));
employees.add(new Employee(UUID.randomUUID(), "John", 35));
employees.add(new Employee(UUID.randomUUID(), "Julia", 55));        
employees.add(new Employee(UUID.randomUUID(), "Lotta", 52));
employees.add(new Employee(UUID.randomUUID(), "Eva", 42));
employees.add(new Employee(UUID.randomUUID(), "Anna", 20));

Say we need to find all employees aged 50 and above:

Stream<Employee> stream = employees.stream();
Stream<Employee> fiftyAndAbove = stream.filter(emp -> emp.getAge() >= 50);

The filter() method of a Stream accepts a Predicate of T – Employee in this case – which will return true if the age of the employee is at least 50. Predicates can be chained with the “and”, “or” and “negate” default methods available in the Predicate interface:

Stream<Employee> stream = employees.stream();
        
Predicate<Employee> fiftyAndBelow = emp -> emp.getAge() <= 50;
Predicate<Employee> olderThanTwenty = emp -> emp.getAge() > 20;
Predicate<Employee> startsWithE = emp -> emp.getName().startsWith("E");
        
Predicate<Employee> joined = fiftyAndBelow.and(olderThanTwenty).and(startsWithE.negate());
        
Stream<Employee> filtered = stream.filter(joined);

Here we want to collect all Employees who are older than 20, at most 50 years old and whose name doesn’t start with an ‘E’.
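
To check which employees pass the joined predicate, here is a self-contained sketch that applies it to the sample data from this post. The nested Employee class is a cut-down copy without the id, and the map and collect calls are only there to make the result readable; both are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class JoinedPredicateSketch
{
    // Cut-down version of the Employee class from this post (id left out).
    static class Employee
    {
        private final String name;
        private final int age;

        Employee(String name, int age)
        {
            this.name = name;
            this.age = age;
        }

        String getName()
        {
            return name;
        }

        int getAge()
        {
            return age;
        }
    }

    static List<String> namesPassingJoinedFilter()
    {
        List<Employee> employees = Arrays.asList(
                new Employee("Elvis", 50), new Employee("Marylin", 18),
                new Employee("Freddie", 25), new Employee("Mario", 43),
                new Employee("John", 35), new Employee("Julia", 55),
                new Employee("Lotta", 52), new Employee("Eva", 42),
                new Employee("Anna", 20));

        Predicate<Employee> fiftyAndBelow = emp -> emp.getAge() <= 50;
        Predicate<Employee> olderThanTwenty = emp -> emp.getAge() > 20;
        Predicate<Employee> startsWithE = emp -> emp.getName().startsWith("E");

        // All three conditions must hold; negate() flips the name check.
        Predicate<Employee> joined = fiftyAndBelow.and(olderThanTwenty).and(startsWithE.negate());

        return employees.stream()
                .filter(joined)
                .map(Employee::getName)
                .collect(Collectors.toList());
    }
}
```

With this data the method returns Freddie, Mario and John. Elvis and Eva fall within the age range but are dropped by the negated name check, and Anna is dropped because 20 is not greater than 20.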

You can create arbitrary Streams using the static “of” method of Stream:

Stream<Integer> of = Stream.of(1, 2, 4, 2, 10, 4, 40);
Predicate<Integer> pred = Predicate.isEqual(4);
Stream<Integer> filter = of.filter(pred);

Here we have a stream of integers and we want to collect the ones that are equal to 4.

If you’d like to see the contents of the stream “filter” then you can call the forEach method on it:

filter.forEach(System.out::println);

…which will correctly output 4 and 4, i.e. the two elements from stream “of” that are equal to 4.
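
One thing worth knowing at this point is that a Stream can only be traversed once. The sketch below, whose class and method names are made up for this illustration, repeats the example and then calls forEach a second time on the same stream, which throws an IllegalStateException.

```java
import java.util.function.Predicate;
import java.util.stream.Stream;

class SingleUseStreamSketch
{
    static boolean secondTraversalFails()
    {
        Stream<Integer> of = Stream.of(1, 2, 4, 2, 10, 4, 40);
        Predicate<Integer> pred = Predicate.isEqual(4);
        Stream<Integer> filter = of.filter(pred);

        // First traversal: prints the two matching elements.
        filter.forEach(System.out::println);

        try
        {
            // Second traversal of the same stream is not allowed.
            filter.forEach(System.out::println);
            return false;
        }
        catch (IllegalStateException e)
        {
            // The stream has already been operated upon.
            return true;
        }
    }
}
```

If the elements are needed again, a new stream has to be created from the source.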

OK, but how can we access the filtered elements? How can we look at the result of the query? We’ll see that in the next post.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
