Methodology

The methodology section introduces the Linear Threshold Model (LTM), a tool used to simulate influence propagation in networks. It explains the significance of solving the influence maximization problem and outlines the hill-climbing algorithm used for optimization.

Setting Up

Linear Threshold Model (LTM) is one of the famous models simulating the spread of influence in networks. Continuing with the movie-inspired analogy, let’s delve into the mechanics of how influence spreads through a network, denoted as \(G\). In this directed network, each individual is represented as a node, categorized as either active (an adopter of the idea) or inactive. Drawing from our motivation, the general assumption is that each node tends to become active.

Now, let’s consider the process in which nodes transition from being inactive to active. This transition occurs monotonically, meaning nodes can only shift from being inactive to active, but not the other way around. Thus, as time progresses, more and more of a node \(v\)’s neighbors become active. At some point, a neighbor node \(w\) influences \(v\) to become active, triggering further decisions among other nodes connected to \(v\).

Linear Threshold Model (LTM)

Linear Threshold Model example

Figure 1: Linear Threshold Model example

In the Linear Threshold model, a node \(v\) is influenced by each neighbor \(w\) with a weight \(b_{v,w}\) such that \[\sum_{w \text{ neighbor of v}} b_{v,w} \le 1\] . Each node \(v\) has a threshold \(\theta_v\) which is randomly drawn from the uniform distribution over the interval \([0,1]\), representing the different levels of tendency for each node to adopt the idea from their neighbors.

With a given a random threshold and an initial set of active nodes \(A_0\) (while all other nodes are inactive), the model operates in discrete steps. In step \(t\), all nodes that are active in step \(t-1\) remain active, and any inactive node \(v\) in step \(t-1\) becomes active if the weighted sum of its active neighbors is at least \(\theta_v\): \[\sum_{w\text{ active neighbors of v}} b_{v,w}\ge \theta_v\]

Influence Maximization Problem

The Influence Maximization Problem focuses on finding the best starting nodes to kick off the spread of influence in a network. It’s all about figuring out: Where should we begin to maximize the nodes’ impact?

To crack this problem, we define the influence of a set of nodes \(A\), denoted as \(\sigma(A)\), as the expected number of active nodes at the end of the process, assuming \(A\) is our initial set of active nodes \(A_0\). So, the goal of the influence maximization problem is to identify the best initial set of nodes \(A_0\), given a certain number \(k\) where \(k\in \bf{N}\), to maximize our influence.

Why is it important?

This problem is important because it helps us understand how information or behavior spreads in networks. By finding the most influential starting points, we can set off a chain reaction that gets more and more people on board with our idea or action. This matters a lot in areas like marketing, where we want to reach as many customers as possible, in disease control, where we aim to stop outbreaks before they spread, and in understanding social networks, where we want to see how trends catch on among groups of people.

Apporximation for Influence Maximization

However, it turns out that the influence maximization problem is acutally NP-hard, meaning that it could not be solved in polynomial time (assuming \(P\neq NP\)). Thus, solving it efficiently isn’t straightforward–it’s a really complex problem that could take a long time to solve, especially as the size of the network is huge (Processing 30,000 nodes will take days to complete).

Therefore, the best approach will be approximating the influence maximization problem. Here are several approximation guarantees to ensure that the following approximation method works.

Algorithm: Greedy Hill-Climbing

One proposed approximation strategy is to use the greedy hill-climbing.

Greedy hill-climbing is an iterative algorithm that begins with an arbitrary solution to a problem and then tries to enhance it by making small incremental adjustments. If a change results in a better solution, the algorithm adopts it and continues making further adjustments until no further improvements can be made.

In the context of the influence maximization problem, we employ greedy hill-climbing to identify the most influential nodes. We begin with an empty active set \(A_0\). At each step \(i\), the algorithm selects one node to activate, aiming to maximize its influence on the network. However, it’s important to note that this approach may only lead to finding local maximum seed nodes, rather than a globally optimal solution.

This approach achieves an approximation ratio of 63% but performs fairly slow and not salable.

Hill Climbing Algorithm. Created by Mahyavanshi, B.

Figure 2: Hill Climbing Algorithm. Created by Mahyavanshi, B.

References