Why Parallel
I've written a couple of blog posts about how the Parallel type class has changed in Cats 2.0, but those posts don't really say much about why someone using Cats should care about Parallel in the first place. The name suggests that it has something to do with running computations at the same time, and while that's one of the things you can do with it (via the instance for IO in cats-effect, for example), it has a much, much wider range of applications. This post will focus on a real-world use case for Parallel that at a glance might not seem to have much in common with running things in parallel: accumulating errors while validating form inputs.
Example problem
This morning I remembered a Stack Overflow question about validation in Scala that I asked almost six years ago. In the question I give an example where I have a list of pairs of strings, and I want to parse all of the strings into integers, while also verifying that the second number in each pair is larger than the first. So the following would be valid input (in CSV form):
1, 2
-100, 100
200, 3000
While this would not:
a, 1
b, c
1, 0
I want to be able to present a complete list of problems to the person who submitted the data, so I need to be able to accumulate errors while parsing and validating. In the second data set, for example, there are four errors: three non-integer strings and one wrongly ordered pair.
Without Parallel
Suppose I have some Scala code for parsing numbers and validating the pairs:
import scala.util.Try

case class InvalidSizes(x: Int, y: Int) extends Exception(
  s"Error: $x is not smaller than $y!"
)

def parseInt(input: String): Either[Throwable, Int] = Try(input.toInt).toEither

def checkValues(p: (Int, Int)): Either[InvalidSizes, (Int, Int)] =
  if (p._1 >= p._2) Left(InvalidSizes(p._1, p._2)) else Right(p)
I can then compose these methods using operations from Cats type classes like Traverse (my original Stack Overflow question used Scalaz, but in this post I'll translate):
import cats.data.EitherNel
import cats.instances.either._
import cats.instances.list._
import cats.syntax.apply._
import cats.syntax.either._
import cats.syntax.traverse._

def checkParses(p: (String, String)): EitherNel[Throwable, (Int, Int)] =
  (parseInt(p._1).toValidatedNel, parseInt(p._2).toValidatedNel).tupled.toEither

def parse(input: List[(String, String)]): EitherNel[Throwable, List[(Int, Int)]] =
  input.traverse(
    checkParses(_).flatMap(checkValues(_).toEitherNel).toValidated
  ).toEither
This solution works just fine:
scala> val badInput = List(("a", "1"), ("b", "c"), ("1", "0"))
badInput: List[(String, String)] = List((a,1), (b,c), (1,0))
scala> parse(badInput).leftMap(_.toList.foreach(println))
java.lang.NumberFormatException: For input string: "a"
java.lang.NumberFormatException: For input string: "b"
java.lang.NumberFormatException: For input string: "c"
InvalidSizes: Error: 1 is not smaller than 0!
The problem that I'm complaining about in this ancient Stack Overflow question is that all of these conversions between Either and Validated feel inelegant:
"I bounce back and forth between ValidationNel and \/ [Scalaz's Either] as appropriate depending on whether I need error accumulation or monadic binding."
I didn't know it at the time, but I was asking for Parallel, which as far as I can tell was first introduced into a library in a mainstream language a year later.
With Parallel
The Parallel type class really isn't anything more than a way to generalize this process of going back and forth between monadic and applicative contexts. It allows us to rewrite our checkParses and parse methods without ever referring to Validated:
import cats.data.EitherNel
import cats.instances.either._
import cats.instances.list._
import cats.instances.parallel._
import cats.syntax.either._
import cats.syntax.parallel._

def checkParses(p: (String, String)): EitherNel[Throwable, (Int, Int)] =
  (parseInt(p._1).toEitherNel, parseInt(p._2).toEitherNel).parTupled

def parse(input: List[(String, String)]): EitherNel[Throwable, List[(Int, Int)]] =
  input.parTraverse(checkParses(_).flatMap(checkValues(_).toEitherNel))
These do exactly the same thing as the versions above, but instead of having to convert Either to Validated manually when we want error accumulation (and then convert back to Either when we want monadic binding), all we have to do is use the par versions of tupled and traverse.
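To make the difference between the plain and par versions concrete, here is a minimal sketch that runs the same validator through traverse and parTraverse. It assumes Cats 2.x on Scala 2.13 (or 2.12 with partial unification enabled) and uses the catch-all cats.implicits._ import rather than the à la carte imports above. The object name FailFastVsAccumulate and the parseNumber helper are made up for this illustration and are not part of the original code.

```scala
import scala.util.Try
import cats.data.EitherNel
import cats.implicits._

object FailFastVsAccumulate {
  // Hypothetical validator (not from the post): parse one string,
  // reporting a failure as a single-element non-empty list of messages.
  def parseNumber(s: String): EitherNel[String, Int] =
    Try(s.toInt).toOption.toRight(s"not an integer: $s").toEitherNel

  val inputs: List[String] = List("1", "x", "2", "y")

  // Sequential traversal uses the Either monad and stops at the first Left.
  val failFast: EitherNel[String, List[Int]] = inputs.traverse(parseNumber)

  // Parallel traversal uses the Parallel instance for Either and combines every Left.
  val accumulated: EitherNel[String, List[Int]] = inputs.parTraverse(parseNumber)
}
```

With these inputs, failFast should hold only the message for "x", while accumulated should hold the messages for both "x" and "y", in input order.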
These parallelized operations are available here because Either[E, ?] has a Parallel instance (assuming we have a Semigroup for E). This instance encodes the fact that we can convert Either values to Validated values and then use applicative operations on those values to get error accumulation, but it hides all of that from the user, who only has to know that somehow validation is being done in parallel instead of sequentially (where it would fail fast).
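The round trip that the instance hides can be spelled out by hand. The sketch below is only an illustration of the idea, not the actual implementation inside Cats: tupledByHand, the Errors alias, and the object name ParallelByHand are hypothetical names introduced here, and the code again assumes Cats 2.x with cats.implicits._ in scope.

```scala
import cats.Semigroup
import cats.data.NonEmptyList
import cats.implicits._

object ParallelByHand {
  // Hypothetical helper: convert both sides to Validated, combine them
  // applicatively (which needs a Semigroup for E), then convert back to Either.
  def tupledByHand[E: Semigroup, A, B](a: Either[E, A], b: Either[E, B]): Either[E, (A, B)] =
    (a.toValidated, b.toValidated).tupled.toEither

  type Errors = NonEmptyList[String]

  val first: Either[Errors, Int] = Left(NonEmptyList.one("first failed"))
  val second: Either[Errors, Int] = Left(NonEmptyList.one("second failed"))

  // The manual round trip through Validated...
  val byHand: Either[Errors, (Int, Int)] = tupledByHand(first, second)

  // ...and the same combination via the Parallel instance for Either.
  val viaParallel: Either[Errors, (Int, Int)] = (first, second).parTupled
}
```

Both values should come out as a Left containing the two messages in order, which is the sense in which parTupled saves us from writing the conversions ourselves.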
This is a pattern that comes up often. To give one other example, IO.Par in cats-effect is a version of IO that doesn't support monadic sequencing, but does support parallel processing. We don't want users to have to think about IO.Par, though. What we want is for them to be able to indicate which traversals or other operations in IO they want performed in parallel as opposed to sequentially. As in our validation example, this is exactly the value added by Parallel: it gives us a richer set of operations without forcing us to know or care about the underlying implementation.