Submodular Functions and Greedy Maximization

8 min read

Today, we will finally dive into a more technical article! There are many subjects to choose from; I decided to focus on optimization and will write a series of articles on the subject. In this post, we will discover what submodular functions are and how one can maximize them in order to solve various problems. In this type of post, I will assume a basic knowledge of mathematics and mathematical notation.

What is a submodular function?

Let us first start with some definitions and notation. We first look at what a set function is.

Definition: A set function { f: 2^N \to \mathbb{R}} is a function that assigns a real value to every subset { S \subseteq N} of a set { N}, often called the “ground set”.

As an example, imagine you are going on a hike, and you are wondering what you will put inside your backpack. Then the ground set in this case could be something like { N = \{bottle, compass, sandwich, sunglasses, jacket, shoes\}}, the items you can put inside your backpack. Let’s imagine that each combination of items brings a certain value. We can then represent that using a set function! For example, we could have { f(\emptyset)=0}, { f(\{sunglasses\}) = 1}, { f(\{bottle, compass\})=3}, and so on for every possible subset of items.

Instead of enumerating all possibilities, we could also define a function as follows. Imagine that each item { i \in N} has some value { v_i \in \mathbb{R}}. A possible set function on this ground set could be a function computing the total value of a subset that we put inside the backpack. That is, the function in this case is { f(S) = \sum_{i \in S} v_i} for { S \subseteq N}. Here, the value of a subset is the sum of the individual values of its elements. As explained, each element has a value that does not depend on the other elements we choose in { S}. However, in many cases, the value of one item may depend on whether we have already selected some other items. For example, suppose you are building a collection of stamps of different types. The first time you find a stamp of a new type, it adds a lot of value to your collection, whereas if you find a stamp of a type you already have, it won’t add much value to your collection.
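As a quick sketch, this additive set function can be written in a few lines of Python. The item values below are made-up numbers for illustration, chosen to match the examples above:

```python
# Additive ("modular") set function: the value of a subset is the sum
# of the individual item values, independent of the other chosen items.
VALUES = {"bottle": 2, "compass": 1, "sandwich": 3,
          "sunglasses": 1, "jacket": 2, "shoes": 4}  # made-up values

def f(S):
    """Total value of the subset S of items."""
    return sum(VALUES[i] for i in S)

print(f(set()))                  # 0
print(f({"sunglasses"}))         # 1
print(f({"bottle", "compass"}))  # 3
```
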

In our example, supposing we want to maximize the total value of what we put in our backpack without any further constraints, the best thing to do is obviously to pick all items (if all the { v_i}‘s are positive), so that the sum of values is as high as possible. However, things become more complicated if, for example, we want to maximize the value of the elements we pick, but the backpack cannot hold more than three items and the function is a bit more complicated. We will come back to that later.

The value of a stamp collection can be represented as a submodular function (Image by Frantisek Krejci)

Now, let us define what a submodular function is.

Definition: A set function f is submodular if { f(A) + f(B) \geq f(A \cup B) + f(A \cap B)} is true for all { A, B \subseteq N}.

Since this definition may seem to come out of nowhere, let us find another way to define a submodular function, in terms of how adding an element changes the value of the set function { f}. For that, we define the marginal contribution of an element to an existing set.

Definition: Given a set function { f}, a subset { S \subseteq N}, and an element { u \in N}, we define the marginal contribution of { u} to { S} with respect to { f} as { f(u|S) = f(S \cup \{u\})-f(S)}.

With this new definition, we can define submodularity in the following different way.

Definition: A set function { f} is submodular if and only if for all { A \subseteq B \subseteq N} and all { u \in N \setminus B}, we have that { f(u|A) \geq f(u|B)}.

This tells us that adding a new element { u} to a set { A} brings more value than adding it to a bigger set { B} that already contains all the elements of { A}. This is what we explained with the stamp collection. So submodularity captures the notion of diminishing returns.
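For small ground sets, this diminishing-returns definition can be checked by brute force. Here is a sketch in Python (exponential time, so for illustration only; the two example functions are made up):

```python
from itertools import combinations

def is_submodular(f, N):
    """Brute-force check of the diminishing-returns definition:
    f(u|A) >= f(u|B) for all A subset of B subset of N and u not in B.
    Exponential in |N|; only usable for tiny ground sets."""
    subsets = [set(c) for r in range(len(N) + 1)
               for c in combinations(N, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for u in set(N) - B:
                # compare marginal contributions f(u|A) and f(u|B)
                if f(A | {u}) - f(A) < f(B | {u}) - f(B):
                    return False
    return True

# A concave function of the cardinality is submodular...
print(is_submodular(lambda S: min(len(S), 2), [1, 2, 3, 4]))  # True
# ...whereas a convex one is not (marginal gains increase).
print(is_submodular(lambda S: len(S) ** 2, [1, 2, 3, 4]))     # False
```
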

Some examples

A maximum cut in a graph (By Miym – Own work, CC BY-SA 3.0)

Now, let us see some examples of submodular functions. One can verify that these functions are indeed submodular, but we won’t do it here.

Cut of a graph

The first example of a submodular function is one that computes the size of a cut in a graph { G(V, E)}, with { V} and { E} being the set of vertices and the set of edges respectively. We recall that the size of the cut of { G} induced by a set { S \subseteq V} is the number of edges that cross the cut { (S, V \setminus S)}. Formally, we can define the function that computes the cut size as { f(S) = |\{(u, v) \in E: u \in S, v \in V \setminus S\}|} for every subset of vertices { S \subseteq V}.
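As a small sketch, assuming an undirected graph stored as a list of edge tuples, the cut function counts the edges with exactly one endpoint in { S}:

```python
def cut_size(S, edges):
    """Size of the cut induced by S: edges with exactly one endpoint in S."""
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

# A 4-cycle: 1-2-3-4-1 (made-up example graph)
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(cut_size({1}, edges))     # 2
print(cut_size({1, 3}, edges))  # 4, the maximum cut of the cycle
```
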

Spread of information

Let us take the example where we are organizing an event, and we are ready to give tickets at a reduced price to { k} people so that they talk about the event to their friends. We would want to select the { k} most influential people. The function that computes how influential our chosen set of { k} people is turns out to be submodular.


Coverage

Many problems can be described as “coverage” problems. That is, assume we have a finite collection of sets { T_1, T_2, \dots, T_n}, all subsets of some universe { U}. We define the set function { f(S) = |\cup_{i \in S} T_i|}. In other words, after choosing some of the sets { T_i}, the function computes how much of the universe { U} is covered by their union.
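A minimal Python sketch of the coverage function, on a made-up family of sets, also shows the diminishing returns at work:

```python
def coverage(S, T):
    """Size of the union of the chosen sets T[i], i in S."""
    covered = set()
    for i in S:
        covered |= T[i]
    return len(covered)

# Made-up family of sets over the universe {a, b, c, d, e}
T = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
print(coverage({1}, T))        # 2
print(coverage({1, 2}, T))     # 3: T[2] only contributes "c" on top of T[1]
print(coverage({1, 2, 3}, T))  # 5
```
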

Maximizing monotone submodular functions with cardinality constraints

You may wonder: what can we do with these functions? First, there are many settings where the problem at hand can be represented through a set function. When that function is additionally submodular, there exist algorithms that can help us solve our problem. Most of the time, we are interested in optimizing the function; for instance, the goal could be to find its maximum. In our hiking example, we would like to find which subset of items brings the highest overall value, so that is a maximization problem.

More formally, given a ground set { N} and a set function { f}, if we are interested in maximizing it then we aim to find { \max_{S \subseteq N}f(S)}. Which methods could we use to solve this problem?

Naive maximization

This method is the “brute force” one: it tries every possible subset { S} of { N}, computes the corresponding value { f(S)}, and returns the maximum value found. Even though the procedure is simple, it would take way too much time in practice!

Indeed, let’s compute the number of different possible subsets, supposing that the ground set has size { |N|}. This is equal to

{ \sum_{k=0}^{|N|} \binom{|N|}{k} = 2^{|N|},}

and we see that this is exponential in the size { |N|}.

To put some numbers on it, assume it takes 1 millisecond to compute the function value of one of the subsets; then the total compute time would be { 2^{|N|}} milliseconds, since we compute the function for all possible subsets. In our backpacking example, there are { |N| = |\{bottle, compass, sandwich, sunglasses, jacket, shoes\}|=6} elements in the ground set, so the total compute time of the naive maximization would be { 2^{|N|}=2^6=64} milliseconds, pretty fast I agree. But now what if our problem has a ground set of size 50? Then the compute time would be { 2^{50}} milliseconds, which is roughly 35,000 years, so not very practical you will surely agree. I let you imagine how long it would be when the size of the ground set is 1000…
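The naive procedure itself is easy to write down; here is a sketch, using a tiny made-up instance (one item has a negative value, so the best subset is not simply everything):

```python
from itertools import combinations

def naive_maximize(f, N):
    """Brute force: evaluate f on all 2^|N| subsets and keep the best.
    Only viable when |N| is small."""
    best_S, best_val = set(), f(set())
    for r in range(1, len(N) + 1):
        for c in combinations(N, r):
            val = f(set(c))
            if val > best_val:
                best_S, best_val = set(c), val
    return best_S, best_val

# Made-up additive example with a negative value
values = {"bottle": 2, "compass": 1, "sandwich": -1}
S, v = naive_maximize(lambda S: sum(values[i] for i in S), list(values))
print(S, v)  # best subset is {bottle, compass} with value 3
```
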

Can we do better?

Greedy maximization

We have just seen that the naive procedure is simply too time-consuming. The upside, though, is that after waiting a long time, we are sure to know the maximum value. We will see that with the greedy algorithm, we get a solution which is not necessarily the best one, but which can be a good approximation under certain conditions.

First, let us change our goal a bit. We will now focus on maximization under a cardinality constraint. All this means is that we still want to maximize the set function, but we only care about subsets whose size does not exceed some integer { k}. Formally, we are interested in { \max_{S \subseteq N: |S| \leq k}f(S)}.

Notice that naive maximization in this case would still take a very long time, because we would have to compute the function value for all subsets of size at most { k}.

Instead, let’s see what the greedy algorithm does. It starts with an empty set { S}. Then it repeats the following { k} times: add to { S} the element of the ground set { N} (not yet in { S}) which brings the largest marginal contribution to our current set { S}.

The precise algorithm is the following: start from { S = \emptyset}, and for { k} iterations add { \arg\max_{u \in N \setminus S} f(u|S)} to { S}.
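In code, the loop might look like this sketch, assuming the set function is given as a Python callable (the coverage instance at the bottom is made up):

```python
def greedy_maximize(f, N, k):
    """Greedy maximization of f under the cardinality constraint |S| <= k:
    at each of the k steps, add the element with the largest marginal
    contribution f(u|S) = f(S | {u}) - f(S)."""
    S = set()
    for _ in range(k):
        candidates = [u for u in N if u not in S]
        if not candidates:
            break
        best = max(candidates, key=lambda u: f(S | {u}) - f(S))
        S.add(best)
    return S

# Made-up coverage example: pick k = 2 sets to cover as much as possible.
T = {1: {"a", "b", "c"}, 2: {"c", "d"}, 3: {"d", "e"}}
f = lambda S: len(set().union(*(T[i] for i in S)))
print(greedy_maximize(f, list(T), 2))  # picks 1 first (gain 3), then 3 (gain 2)
```
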

This is much faster than the naive method because the computation time does not depend exponentially on { |N|} as in the naive maximization, which would take { 2^{|N|}} milliseconds. In fact, here the computation time is at most { k \cdot |N| \leq |N|^2} milliseconds under the same 1-millisecond-per-evaluation assumption. So taking the same ground set of size 50 as before, where the naive method is hopeless, greedy would finish in at most { 50^2 = 2500} milliseconds, that is, 2.5 seconds. Quite an improvement!

Again, let me stress that the greedy method is not guaranteed to find a very good solution. However, under certain conditions, there exists a theorem which guarantees that the solution found is not too far from the optimal one. But for this, we need a new definition.

Definition: Let { f} be a set function on ground set { N}. { f} is called monotone if for all { S \subseteq T \subseteq N} we have that { f(S) \leq f(T)}.

This simply means that adding more elements to an existing subset of the ground set can only increase the value of the function. When dealing with a monotone function that is also submodular, we can state the following:

Theorem: Let { f} be a monotone submodular function, and suppose we wish to maximize it under cardinality constraint { k}. Denote by { O} the optimal subset which maximizes the function (under the cardinality constraint). Then, if we use the greedy procedure, the set { S} produced is such that { f(S) \geq (1-(1-1/k)^k)f(O) \geq (1-1/e)f(O) \approx 0.632f(O)}.

This theorem basically says the following: if { f(O)} is the maximum value we are looking for, then greedy guarantees that the set { S} it finds has a function value greater than a not-too-small fraction (about 0.632, to be precise) of the optimal value. So we know that greedy will never perform too badly in this case.
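To make the guarantee concrete, here is a small sketch on a made-up coverage instance where greedy is suboptimal but still well within the bound (for { k = 2}, the guarantee is { 1-(1-1/2)^2 = 0.75}):

```python
from itertools import combinations

# Made-up coverage instance where greedy is suboptimal but within the bound.
T = {1: {"a", "b", "c", "d"}, 2: {"a", "b", "e"}, 3: {"c", "d", "f"}}
f = lambda S: len(set().union(*(T[i] for i in S)))
k = 2

# Greedy: repeatedly add the element with the largest marginal gain.
# It grabs T[1] first (gain 4) and then only gains 1 more element.
S = set()
for _ in range(k):
    u = max((u for u in T if u not in S), key=lambda u: f(S | {u}) - f(S))
    S.add(u)

# Optimal: brute force over all subsets of size k (fine here, n = 3).
# The optimum is {2, 3}, which covers the whole universe of 6 elements.
opt = max((set(c) for c in combinations(T, k)), key=f)

print(f(S), f(opt))                           # 5 6
print(f(S) >= (1 - (1 - 1/k) ** k) * f(opt))  # True: 5 >= 0.75 * 6
```
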


It’s time to see how the methods we discovered actually perform in practice. For that, I suggest you click on this link and follow the Jupyter Notebook I created, which contains a few examples and will show you how fast and accurate these algorithms are.

Well, this is already the end. I decided not to make this post too long, since it is technical, and in any case I hope you enjoyed reading it!

Please let me know what you think about the article, and if you have any tips! Would you be interested in a more practical type of article when discussing such algorithms? Or would you prefer to have step-by-step proofs of the theorems stated? Don’t hesitate to share your ideas, and I will do my best to improve on the content 🙂


The notation and examples shown are greatly inspired from the lecture notes of the Advanced Algorithms class taught at EPFL in 2019-2020.
