Today, we will finally dive in a more technical article! There are so many subjects to choose from, and I decided to focus on optimization, and will write a serie of articles on the subject. In this post, we will discover what submodular functions are and how one can maximize them in order to solve various problems. In these types of post, I will assume a basic knowledge of mathematics and mathematical notation.
What is a submodular function?
Let us first start with some definitions and notation. We first look at what a set function is.
Definition: A set function
is a function that assigns real values to every subset
of a set
often called the “ground set”.
As an example, imagine you are going on a hike, and you are wondering what you will put inside your backpack. Then the ground set in this case could be something like
, the items you can put inside your backpack. Let’s imagine that each combination of items brings a certain value. We can then represent that using a set function! For example, we could have
and so on for every possible subset of items.
Instead of enumerating all possibilities, we could also define a function as follows. Imagine that each item
has some value
. A possible set function on this ground set could be a function computing the total value of a subset that we put inside the backpack. That is, the function in this case is
Here, the value of a subset is the sum of the individual value of each element. Each element has some value as we explained that does not depend on the other elements we choose in
. However, in many cases, the value of one item may depend on whether we have already selected some other items. For example, suppose you are building a collection of stamps of different types. The first time you find a stamp of a new type, it adds a lot of value to your collection, whereas if you find a stamp of a type you already have, it won’t add much value to your collection.
In our example, supposing we want to maximize the total value of what we put in our backpack without any more constraints, the best thing to do is to obviously pick all items (if the all
‘s are positive), so that the sum of values is as high as possible. However things could become more complicated if for example we want to maximize the value of elements we pick, but the backpack cannot take more than three items and the function is a bit more complicated. We will come back to that later.
Now, let us define what a submodular function is.
Definition: A set function f is submodular if
is true for all
This definition being a bit out of nowhere, we try to find another way to define a submodular function, in terms of how adding an element changes the value of the set function
. For that, we define what the marginal contribution of an element to an existing set is.
Definition: Given a set function
, a subset
, and an element
, we define the marginal contribution of
with respect to
With this new definition, we can define submodularity in the following different way.
Definition: A set function
is submodular if and only if for all
, we have that
This basically tells us that adding a new element
to a set
has more value than adding it to a bigger
containing all elements in
already. This is what we explained with the stamp collection. So submodularity captures the notion of diminishing returns.
Now, let us see some examples of submodular functions. One can verify that these functions are indeed submodular, but we won’t do it here.
Cut of a graph
The first example of submodular function is one that computes the size of a cut in a graph
being the set of vertices and the set of edges respectively. We recall that the size of a cut of
induced by a set
is the number of edges that crosses the cut
Formally, we can define the function that computes the cut size as being
for every subset of vertices
Spread of information
Let us take the example where we are organizing an event, and we are ready to give tickets with reduced price to
people so that people talk about the event to their friends. We would want to select the
people that are most influencial. Now the function that computes how influential the set of
people we choose is submodular.
Many problems can be described as “coverage” problems. That is, assume we have a finite collection of sets
, all subsets of some universe
. We define the following set function
. Basically, choosing a certain number of the sets
‘s, the function computes how much we cover the universe
with the union of these sets.
Maximizing monotone submodular functions with cardinality constraints
You may wonder what can we do with these functions? Well first, there are many settings where the problem at hand can be represented through a set function. When that function is additionally submodular, then there exist algorithms that can help us solve our problem. Most of the time, we are interested in optimizing the function, maybe the goal could be to find its maximum. In our example with the hiking, we would like to find which subset of items brings the highest overall value, so that is a maximization problem.
More formally, given a ground set
and set function
, if we are interested in maximizing it then we aim to find
. Which methods could we use to solve this problem?
This method is the “brute force” one, it tries every possible subset
, compute the corresponding value
, and return the maximum value found. Even if the procedure is simple, it would take way too much time to do in practice!
Indeed, let’s compute the number of different possible subsets supposing that the ground set has size
. This is basically equal to
and we see that this is exponential in the size
To put some numbers, assuming it takes 1 millisecond to compute the function value of one of the subset, then the total compute time would be
milliseconds since we compute the function for all possible subsets. In our backpacking example, there are
elements in the ground set, so the total compute time of the naive maximization would be
milliseconds, pretty fast I agree. But now what if our problem has a ground set of size 50? Then the compute time would be
milliseconds, which is roughly 13 million years so not very practical you will surely agree. I let you imagine how long it would be when the size of the ground set is 1000…
Can we do better?
We have just seen that the naive procedure is simply too time-consuming. The upside though is that after waiting a long time, we are sure to know the maximum value. We will see that with the greedy algorithm, we get a solution which is not necessarily the best one, but can be a good approximation under certain conditions.
First, let us change a bit our goal. We will now focus on maximization under cardinality constraint. All this means is that now we still want to maximize the set function, but we only care about subsets which have a size that does not exceed some integer
. Formally, we are interested in
Notice that using a naive maximization in this case still would take a very long time because we would have to compute the function value for all subsets of size smaller or equal than
Instead, let’s see what the greedy algorithm does. It starts with an empty set
. Then it repeats
times the following: add to
the element of ground set
(not yet in
) which brings the largest marginal contribution to the our current set
The precise algorithm is the following:
This is much faster than the naive method because the computation time does not depend exponentially on
like in the nave maximization where it would take
milliseconds. In fact, here the computation time is always less than
milliseconds. So taking the same numbers as before, when it would take 13 million years to solve with the naive method, greedy would do it in 2.5 seconds, that’s quite an improvement!
Again here, let me stress that when using the greedy method, it is not guaranteed at all to find a very good solution. However under certain conditions, there exists a theorem which guarantees that the solution found is not too far from the optimal one. But for this , we need a new definition.
be a set function on ground set
is called monotone if for all
we have that
This simply means that adding more elements to an existing subset of the ground set can only increase the value of the function. When dealing with such a function which is also submodular, we can state the following:
be a monotone submodular function, and suppose we wish to maximize it under cardinality constraint
. Denote by
the optimal subset which maximizes the function (under the cardinality constraint). Then, if we use the greedy procedure, the set
produced is such that
This theorem basically say the following: if
is the maximum value we are looking for, then greedy guarantees that the set
it finds has function value greater than a not too small fraction (0.632 to be precise) of the optimal value. So we know that greedy will never perform too bad in that case.
It’s time to see how the methods we discovered actually perform in practice. For that, I suggest you to click on this link and follow the Jupyter Notebook I created which contains a few examples and will show you how fast and accurate these algorithms are.
Well this is already the end, I decided not to make this post too long as it is technical, and in any case I hope you enjoyed reading it!
Please let me know what you think about the article, and if you have any tips! Would you be interested in a more practical type of article when discussing such algorithms? Or would you prefer to have step-by-step proofs of the theorems stated? Don’t hesitate to share your ideas, and I will do my best to improve on the content 🙂
The notation and examples shown are greatly inspired from the lecture notes of the Advanced Algorithms class taught at EPFL in 2019-2020.