Task Singular Vector

Task Singular Vector Merging (TSVM) is a model merging technique that uses Singular Value Decomposition (SVD) to combine multiple task-specific fine-tuned models into a single merged model. It targets the setting where all models have been fine-tuned on different tasks from a common pre-trained base model.

Mathematical Foundation

Problem Setup

Let \(\theta_0\) be the parameters of a pre-trained base model, and \(\{\theta_1, \theta_2, \ldots, \theta_T\}\) be the parameters of \(T\) models fine-tuned on different tasks from the same base model \(\theta_0\).

Task Vector Computation

For each fine-tuned model \(i\), we first compute the task vector \(\tau_i\), which represents the parameter changes from the base model:

\[\tau_i = \theta_i - \theta_0\]
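As a minimal sketch of this subtraction, assume each model is available as a dictionary mapping parameter names to NumPy arrays. The `compute_task_vector` helper and the toy dictionaries below are hypothetical and only illustrate the per-parameter difference:

```python
import numpy as np

def compute_task_vector(finetuned, base):
    """Return tau_i = theta_i - theta_0 for every parameter name in the base model."""
    return {name: finetuned[name] - base[name] for name in base}

# Toy stand-ins for real checkpoints (illustrative values only).
base = {"linear.weight": np.zeros((2, 3)), "linear.bias": np.zeros(2)}
finetuned = {
    "linear.weight": np.full((2, 3), 0.5),
    "linear.bias": np.array([0.1, -0.1]),
}

tau = compute_task_vector(finetuned, base)
```

Keeping the task vector keyed by parameter name makes it straightforward to treat 2-dimensional and non-2-dimensional parameters differently in later steps.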

SVD-Based Merging Algorithm

The core innovation of TSVM lies in applying SVD to the task vectors and then merging them in the singular vector space.

Step 1: SVD Decomposition

For each parameter matrix \(W^{(i)}_k \in \mathbb{R}^{m \times n}\) in task vector \(\tau_i\) (where \(k\) indexes the layer or parameter group), provided the parameter is 2-dimensional, we compute its reduced SVD:

\[W^{(i)}_k = U^{(i)}_k S^{(i)}_k (V^{(i)}_k)^T\]

where:

\(U^{(i)}_k \in \mathbb{R}^{m \times r}\) contains the left singular vectors
\(S^{(i)}_k \in \mathbb{R}^{r \times r}\) is a diagonal matrix of singular values
\(V^{(i)}_k \in \mathbb{R}^{n \times r}\) contains the right singular vectors
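\(r = \min(m, n)\), for \(W^{(i)}_k\) of shape \(m \times n\), is the number of singular values in the reduced SVD (an upper bound on the rank)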
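The shapes above can be checked with a short NumPy sketch. The random matrix `W` is only a stand-in for one 2-dimensional entry of a task vector. Note that `np.linalg.svd` returns the right singular vectors already transposed, so it is transposed back here to match the \(V\) convention used in this section:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
W = rng.standard_normal((m, n))  # stand-in for one 2-D task-vector matrix

# full_matrices=False gives the reduced SVD; the third output is V transposed.
U, s, Vh = np.linalg.svd(W, full_matrices=False)
S = np.diag(s)  # r x r diagonal matrix of singular values
V = Vh.T        # n x r, right singular vectors as columns

r = min(m, n)
reconstruction = U @ S @ V.T  # recovers W up to floating-point error
```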

Step 2: Dimension Reduction and Concatenation

To reduce memory usage and computational complexity, we apply a reduction factor:

\[\text{reduction_factor} = \frac{1}{T}\]

where \(T\) is the number of tasks.

For each task \(i\), we select only the top \(\lfloor r \cdot \text{reduction_factor} \rfloor\) singular components and place them in task-specific positions within larger matrices:

\[U = [U^{(1)}_k[:, :d], U^{(2)}_k[:, :d], \ldots, U^{(T)}_k[:, :d]]\]

\[S = \text{diag}(S^{(1)}_k[:d], S^{(2)}_k[:d], \ldots, S^{(T)}_k[:d])\]

\[V = [V^{(1)}_k[:, :d], V^{(2)}_k[:, :d], \ldots, V^{(T)}_k[:, :d]]\]

where \(d = \lfloor r \cdot \text{reduction_factor} \rfloor\).
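The truncation and concatenation can be sketched as follows, assuming all task matrices for a given parameter share the same shape. The function name `concat_top_components` is hypothetical, and the integer division `r // T` is used as an equivalent of \(\lfloor r \cdot \frac{1}{T} \rfloor\):

```python
import numpy as np

def concat_top_components(task_matrices):
    """Keep the top-d singular components per task and place them side by side."""
    T = len(task_matrices)
    m, n = task_matrices[0].shape
    r = min(m, n)
    d = r // T  # floor(r * reduction_factor) with reduction_factor = 1 / T

    U = np.zeros((m, d * T))
    s = np.zeros(d * T)
    V = np.zeros((n, d * T))
    for i, W in enumerate(task_matrices):
        U_i, s_i, Vh_i = np.linalg.svd(W, full_matrices=False)
        cols = slice(i * d, (i + 1) * d)  # task-specific column block
        U[:, cols] = U_i[:, :d]
        s[cols] = s_i[:d]
        V[:, cols] = Vh_i[:d, :].T  # NumPy returns V transposed
    return U, np.diag(s), V

rng = np.random.default_rng(0)
tasks = [rng.standard_normal((6, 4)) for _ in range(2)]  # T = 2, r = 4, d = 2
U, S, V = concat_top_components(tasks)
```

With \(T\) tasks each contributing \(d\) components, the concatenated matrices have \(dT \le r\) columns, so the merged parameter stays within the rank budget of a single task matrix.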

Step 3: Second-Level SVD

We then compute the SVD of the concatenated matrices:

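\[U = \hat{U}_U \hat{S}_U \hat{V}_U^T\]

\[V = \hat{U}_V \hat{S}_V \hat{V}_V^T\]

The subscripts distinguish the two decompositions, which in general have different factors.

Step 4: Final Reconstruction

The merged task vector for parameter \(k\) is reconstructed by dropping the second-level singular values \(\hat{S}_U\) and \(\hat{S}_V\), which orthogonalizes the concatenated singular vectors across tasks, and recombining the result with the concatenated singular values \(S\):

\[\tau_{\text{TSVM}} = \left(\hat{U}_U \hat{V}_U^T\right) S \left(\hat{U}_V \hat{V}_V^T\right)^T\]

Here \(\hat{U}_U \hat{V}_U^T \in \mathbb{R}^{m \times dT}\) and \(\hat{U}_V \hat{V}_V^T \in \mathbb{R}^{n \times dT}\), so the result has the same \(m \times n\) shape as the original parameter.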
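Steps 1 to 4 for a single 2-dimensional parameter can be sketched in NumPy as below. This is an illustration under stated assumptions rather than a reference implementation: the names `orthogonalize` and `tsvm_merge_matrix` are hypothetical, all task matrices are assumed to share one shape, and \(V\) is stored with singular vectors as columns, so the right-hand factor enters transposed.

```python
import numpy as np

def orthogonalize(M):
    """Drop the singular values of M: return P @ Qh from the reduced SVD of M."""
    P, _, Qh = np.linalg.svd(M, full_matrices=False)
    return P @ Qh

def tsvm_merge_matrix(task_matrices):
    """Merge same-shaped 2-D task-vector matrices following Steps 1 to 4."""
    T = len(task_matrices)
    m, n = task_matrices[0].shape
    d = min(m, n) // T  # components kept per task

    U_blocks, s_blocks, V_blocks = [], [], []
    for W in task_matrices:
        U_i, s_i, Vh_i = np.linalg.svd(W, full_matrices=False)  # Step 1
        U_blocks.append(U_i[:, :d])                             # Step 2
        s_blocks.append(s_i[:d])
        V_blocks.append(Vh_i[:d, :].T)

    U = np.concatenate(U_blocks, axis=1)   # m x (d * T)
    S = np.diag(np.concatenate(s_blocks))  # (d * T) x (d * T)
    V = np.concatenate(V_blocks, axis=1)   # n x (d * T)

    U_orth = orthogonalize(U)              # Step 3 for U
    V_orth = orthogonalize(V)              # Step 3 for V
    return U_orth @ S @ V_orth.T           # Step 4, shape m x n

rng = np.random.default_rng(0)
tasks = [rng.standard_normal((6, 4)) for _ in range(2)]
merged = tsvm_merge_matrix(tasks)
```

One sanity check on this sketch: with a single task, \(d = r\), the concatenated factors already have orthonormal columns, and the merge returns the input matrix unchanged up to floating-point error.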

Handling Non-2D Parameters

For parameters that are not 2-dimensional (e.g., bias vectors, normalization parameters), TSVM simply computes the arithmetic mean:

\[\tau_{\text{TSVM,non-2D}} = \frac{1}{T} \sum_{i=1}^{T} \tau^{(i)}_k\]
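A minimal sketch of this averaging, assuming the per-task entries for one parameter are same-shaped NumPy arrays (the helper name `mean_task_vector` and the toy bias values are hypothetical):

```python
import numpy as np

def mean_task_vector(task_params):
    """Arithmetic mean over tasks for one non-2D parameter."""
    return np.mean(np.stack(task_params, axis=0), axis=0)

# Toy bias task vectors from T = 2 tasks (illustrative values only).
biases = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
merged_bias = mean_task_vector(biases)
```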

Final Model Construction

The final merged model parameters are obtained by adding the merged task vector to the base model:

\[\theta_{\text{TSVM}} = \theta_0 + \alpha \tau_{\text{TSVM}}\]

where \(\alpha\) is an optional global scaling factor.
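The final update can be sketched per parameter as follows. The `apply_task_vector` helper is hypothetical, and the default `alpha=1.0` is an assumption made for this illustration, not a recommended value:

```python
import numpy as np

def apply_task_vector(base, merged_tau, alpha=1.0):
    """Return theta_0 + alpha * tau for every parameter name in the base model."""
    return {name: base[name] + alpha * merged_tau[name] for name in base}

# Toy values only: one bias parameter and its merged task vector.
base = {"linear.bias": np.array([1.0, 1.0])}
merged_tau = {"linear.bias": np.array([2.0, -2.0])}

merged_model = apply_task_vector(base, merged_tau, alpha=0.5)
```

In practice \(\alpha\) would typically be chosen by evaluating the merged model on held-out data for the tasks of interest.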