How complicated is it to truly learn the Streams in Java?
Why am I about to write an article about a well-documented feature of the Java language that has been used in production environments for more than two years? Well, as I wrote in the article about the Lambda expressions, there is a significant difference between using a feature and understanding it. With the Streams, I am in the same situation as I was with the Lambdas. Frankly, I have been using the Streams for more than a year and a half, and I am not satisfied with my understanding of them or my ability to use them. As with the Lambdas, I am going to change that. But how complicated are the Streams?
A Word on Intelligent IDEs
First, a small note. I would say that the majority of us developers have a major problem with IDEs that hand us a big set of coding capabilities. We are too lazy to read API documentation when our IDE is intelligent enough to, for example, pick the right method from the context, or to replace some older (maybe more verbose) code with a newer construct. This intelligent replacement is exactly the bad situation we are in with the Streams. Even for complex for loops, the IDE offers a hint or a two-click way to replace them with the Streams. Do we look more impressive when we use the Streams? Maybe yes. Is our new code easier to understand and maintain? Well, not always. Did we learn anything from this automatic replacement? Indeed, almost nothing. I can point out only one thing we learned: our eyes get used to seeing streams in the code, and we pick up at least the basic syntax. But we learn nothing about what happens under the hood, nothing about the performance of such a replacement, and of course nothing about the most efficient way to structure our code. I am not saying we should stop using IDEs. I am just saying that before using such built-in features, we need to make sure we know what the IDE is about to do and whether it is right for that context. That's all, so let's get back to the Streams.
Brief History of the Streams in Java
Talk about streams in Java started ten years ago (2006). For developers, it became interesting much later, about five years ago (2011). By then it was clear that the Streams and the Lambdas would be part of the Java language in release 8, and the first previews and beta builds of the JDK were already available. It is still surprising that I am writing this post five years after the API was born. Maybe there was other stuff I was interested in, but the main reason I did not look under the hood was that the majority of projects were using Java 7 with no clear prospect of migrating to Java 8. A developer does not always change projects just for newer technologies.
The Streams build on a programming technique called Monads (I'll make a note on them later). This technique comes from the functional languages, where it is also a cornerstone. The collection pipeline itself originally comes from the Unix command line and its pipe operator, | . The Streams made parallel processing easy peasy lemon squeezy (oh, beautiful phrase) through a built-in API. The Lambdas significantly increased readability and removed the majority of boilerplate code, for the Streams as well. Most stream operations are stateless and do not modify the source data, which pushes us toward explicit assignments and away from side effects. The Streams favor internal iteration over external iteration. After all, it looks like the bright tomorrows.
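To illustrate the difference between external and internal iteration, here is a minimal sketch. The class name IterationStyles, the method names, and the sample data are made up for this example; it only assumes the standard java.util.stream API from Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IterationStyles {

    // External iteration: the caller drives the loop and fills a result list by hand.
    static List<String> upperCaseShortNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() <= 4) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Internal iteration: the pipeline describes what should happen,
    // and the Stream API drives the traversal.
    static List<String> upperCaseShortNamesStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 4)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Anna", "Bartholomew", "Eve", "Christopher");
        System.out.println(upperCaseShortNamesLoop(names));   // [ANNA, EVE]
        System.out.println(upperCaseShortNamesStream(names)); // [ANNA, EVE]
        System.out.println(names); // the source list is left untouched
    }
}
```

Both methods return the same result; the stream version simply hands the iteration over to the library, and the source list stays as it was.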
Monads from the Functional Languages
Developers used to coding in imperative languages are often afraid of functional concepts. I am (still) afraid as well. To be concrete, I am frightened just to learn something about the Monads. Even though there are statements like "Monads are simple", they are not. I have heard that once you learn Monads and understand them, you are unable to explain them afterward. Huh. After reading plenty of materials (this, this, and this for the start) covering Monads in the Java language, I would say that for Java, it is only a definition. Monads have no real impact on us when using the Lambdas, the Streams, or any other interface providing monadic behavior (Optional, for example). So you can be calm again: it is not necessary to learn Monads in any profound way in order to learn the Streams.
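For the curious, the "monadic behavior" mostly shows up as the map and flatMap methods that Optional and Stream share. The sketch below is only an illustration of that shape, not a treatment of Monads; the class FlatMapShapes and the helper parse are hypothetical names invented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class FlatMapShapes {

    // Hypothetical helper: parses a string, returning an empty Optional for non-numbers.
    static Optional<Integer> parse(String text) {
        try {
            return Optional.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Optional: flatMap chains a step that itself returns an Optional,
    // map transforms the value if one is present.
    static Optional<Integer> parseAndDouble(Optional<String> input) {
        return input.flatMap(FlatMapShapes::parse).map(n -> n * 2);
    }

    // Stream: flatMap replaces each element with a stream of elements and flattens the result.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAndDouble(Optional.of("21")));   // Optional[42]
        System.out.println(parseAndDouble(Optional.of("oops"))); // Optional.empty
        System.out.println(words(Arrays.asList("a b", "c")));    // [a, b, c]
    }
}
```

You can use both methods comfortably without knowing the theory behind them, which is the point made above.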
A Word on the Parallel Streams
Parallelization is intended to improve runtime performance when processing large inputs. As in other situations, making something run in parallel requires deeper knowledge than almost anything else. Otherwise, you can end up with bugs in your software that are hard to spot and hard to find. A parallel stream seems easy enough and looks like low-hanging fruit. Well, in the majority of cases that is true, but I can only recommend learning the common pitfalls of using streams: this or this.
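To show how small the syntactic step to a parallel stream is, here is a minimal sketch. The class ParallelSum and its method are made up for this example. It only demonstrates that a stateless computation gives the same answer in both modes; it makes no claim about speed, which depends on the input size and the machine.

```java
import java.util.stream.LongStream;

class ParallelSum {

    // Sums the squares of 1..n, either sequentially or in parallel.
    // The lambda is stateless, so both modes must produce the same result.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel(); // the only difference between the two modes
        }
        return range.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, false)); // 333833500
        System.out.println(sumOfSquares(1000, true));  // 333833500
    }
}
```

The single call to parallel() is exactly why it looks like low-hanging fruit; whether it pays off is a separate question that needs measuring.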
Bright Tomorrows …
After some time with the Streams, I figured out that their learning curve is not so straightforward. It is very likely asymptotic. There are plenty of examples, lots of questions and answers out there, and plenty of explanations. What I learned is that even with significant experience, there are still questions in my mind, and they are harder and harder to answer. Therefore, it's complicated. Even when you check the official documentation, there are passages you need to go through several times (see this) to understand them properly. I don't suspect the creators of the API of over-complicating it, but things are not as straightforward as they look when we see the code. And therefore, one should be extremely careful about letting an IDE make automatic replacements.
Let’s discuss some of the usual areas for mistakes:
- Order of operations matters
  By ordering the operations that precede the terminal operation, you can significantly improve or reduce the performance of the processing. Calling an expensive operation before filter() rarely makes sense when most of its output gets filtered out anyway.
- One line expressions
  When a lambda inside a stream operation has more than one line, it looks pretty obscure and unconventional. Using one-liners inside streams is standard practice. When the lambdas inside an operation tend to become more verbose, I prefer to use a plain old for loop. It is much more readable and understandable than a four-line lambda. Using more than one line is not a mistake in itself, but unreadable and unconventional code is.
- Stateful behavior
  A stateful lambda is one whose result depends on state that may change during the execution of the stream. Such lambdas may cause unpredictable results, are not thread-safe, and may hurt the performance of the processing. So the best approach is to avoid any dependency on mutable state and to restructure the operations to be entirely stateless.
- Reusing the Streams
  Java Streams cannot be reused. After the terminal operation is performed, the stream is consumed, and any further operation on it throws an IllegalStateException.
- Side-effects
  Changing variables from the outer scope, assigning values to them, or writing something to the filesystem: these side effects can easily lead to confusing results. We should focus on writing stream operations without side effects wherever possible. In contrast, there is a small number of stream operations that can only work through side effects: forEach and peek. These should be treated with special care, and you are encouraged to think twice before using them.
- Modifying backing collection
  When you are using a plain old for loop, you must not modify the collection you are iterating over. With streams, nothing stops you at compile time from modifying the backing collection, and doing so leads to unpredictable results. So, just as before Java 8, do not modify the backing collection inside stream operations.
- Parallelization of reduction operations
  The safest way to leverage parallelization is with reduction operations. A reduction operation takes the input sequence and produces a single summary result, so there is a high probability that your code will be parallel-safe. But be careful with more complicated reduction operations, where parallelization can be counterproductive.
- And more and more
  The points above are only a subset of the areas where you can make a mistake. Many of the mistakes can be resolved by searching for the problem together with the concrete stream operation, and there is a good chance that you will get the right answer quickly and with a proper explanation.
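A few of the pitfalls listed above (order of operations, reusing a stream, and side effects) can be made concrete with a small sketch. The class StreamPitfalls, its method names, and the sample data are invented for this illustration, and the call counter is itself a deliberate side effect that exists only to make the work visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamPitfalls {

    // Order of operations: counts how often the (pretend) expensive mapper runs.
    // The counter is used here purely for measurement.
    static int mapperCalls(List<Integer> numbers, boolean filterFirst) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> stream = numbers.stream();
        if (filterFirst) {
            stream = stream.filter(n -> n % 2 == 0)
                           .map(n -> { calls.incrementAndGet(); return n * n; });
        } else {
            stream = stream.map(n -> { calls.incrementAndGet(); return n * n; })
                           .filter(n -> n % 2 == 0);
        }
        stream.collect(Collectors.toList()); // the terminal operation triggers the work
        return calls.get();
    }

    // Reusing a stream: a second terminal operation on the same stream fails.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.count(); // the first terminal operation consumes the stream
        try {
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Side effect: the pipeline fills a list that lives outside of it.
    static List<String> viaSideEffect(List<String> words) {
        List<String> result = new ArrayList<>();
        words.stream().map(String::toUpperCase).forEach(result::add);
        return result;
    }

    // No side effect: collect builds and returns the result itself.
    static List<String> viaCollect(List<String> words) {
        return words.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(mapperCalls(numbers, true));  // 5
        System.out.println(mapperCalls(numbers, false)); // 10
        System.out.println(secondUseFails());            // true
        System.out.println(viaCollect(Arrays.asList("a", "b"))); // [A, B]
    }
}
```

Filtering first halves the number of mapper calls for this input while producing the same list. The forEach variant happens to work on a sequential stream, but the collect variant is the one that stays correct if the pipeline is later switched to parallel.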
So how hard is it to truly learn the Streams in Java?
To be honest, it took me a year and a half of using the Streams before I could use them the way I used classic for loops before. By that I mean not depending on IDE suggestions and being familiar with the standard API methods and their possibilities. And it took me 14 days to dive deep into the collection pipeline mechanism, questions and answers, and the official documentation. Even after these 14 days I am not convinced that I know everything, and without in-depth experience I would be nowhere. Nevertheless, I am pretty sure that I can write the Streams on demand, without side effects where they are not needed, and with optimal performance. So, after all, it's hard and not straightforward. After the initial excitement and pleasure of working with the Streams, a point of disappointment, unexpected results, and poor performance awaits everybody. At those times, you will need to dive a little bit deeper into the Streams, and I would be glad if my post is helpful to you.
P.S. If you enjoyed this post, feel free to share it anywhere and follow me on Twitter to stay in touch with my further articles and other thoughts.
P.S.2 Cover image by Zachary Young | unsplash.com.