Task Singular Vector
Task Singular Vector Merging (TSVM) is a model merging technique that uses Singular Value Decomposition (SVD) to combine multiple task-specific fine-tuned models into a single merged model. This method is particularly effective for merging models that have been fine-tuned on different tasks from a common pre-trained base model.
Mathematical Foundation
Problem Setup
Let \(\theta_0\) be the parameters of a pre-trained base model, and \(\{\theta_1, \theta_2, \ldots, \theta_n\}\) be the parameters of \(n\) models fine-tuned on different tasks from the same base model \(\theta_0\).
Task Vector Computation
For each fine-tuned model \(i\), we first compute the task vector \(\tau_i\), which represents the parameter changes from the base model:
\[
\tau_i = \theta_i - \theta_0
\]
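As a minimal sketch of this step, assuming each model is stored as a dictionary that maps parameter names to NumPy arrays (the function name `compute_task_vectors` and the parameter names are hypothetical, chosen only for illustration), the task vectors can be computed per parameter:

```python
import numpy as np

def compute_task_vectors(base, finetuned):
    """Return one task vector (a dict of arrays) per fine-tuned model.

    Assumes every model shares the parameter names and shapes of `base`.
    """
    return [{name: ft[name] - base[name] for name in base} for ft in finetuned]

# Toy base model and two toy fine-tuned models (illustrative values only).
base = {"linear.weight": np.zeros((2, 3)), "linear.bias": np.zeros(2)}
finetuned = [
    {"linear.weight": np.full((2, 3), 0.5), "linear.bias": np.ones(2)},
    {"linear.weight": np.full((2, 3), -1.0), "linear.bias": np.full(2, 2.0)},
]
task_vectors = compute_task_vectors(base, finetuned)
```

Each entry of `task_vectors` has the same keys and shapes as the base model, so the later steps can be applied parameter by parameter.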
SVD-Based Merging Algorithm
The core innovation of TSVM lies in applying SVD to the task vectors and then merging them in the singular vector space.
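Step 1: SVD Decomposition
For each parameter matrix \(W^{(i)}_k\) in task vector \(\tau_i\) (where \(k\) indexes the layer or parameter group), if the matrix is 2-dimensional with shape \(m \times n\) (here \(n\) is the number of columns, not the number of models), we compute its SVD:
\[
W^{(i)}_k = U^{(i)}_k S^{(i)}_k \left(V^{(i)}_k\right)^\top
\]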
where:
- \(U^{(i)}_k \in \mathbb{R}^{m \times r}\) contains the left singular vectors
- \(S^{(i)}_k \in \mathbb{R}^{r \times r}\) is a diagonal matrix of singular values
- \(V^{(i)}_k \in \mathbb{R}^{n \times r}\) contains the right singular vectors
- \(r = \min(m, n)\) is the number of singular values, i.e. the maximum possible rank of an \(m \times n\) matrix
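The shapes listed above correspond to the reduced SVD. A small NumPy sketch (the matrix `W` is random and purely illustrative) shows the decomposition and the shape of each factor. Note that `np.linalg.svd` returns \(V^\top\) rather than \(V\), and the singular values as a 1-D array rather than a diagonal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))  # a toy 2-D task-vector entry with m = 6, n = 4

# full_matrices=False requests the reduced SVD with r = min(m, n) components.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
V = Vt.T                 # right singular vectors as columns, shape (n, r)
r = min(W.shape)

# Rebuild W from its factors to confirm the decomposition.
W_rebuilt = U @ np.diag(s) @ V.T
```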
Step 2: Dimension Reduction and Concatenation
To reduce memory usage and computational complexity, we apply a reduction factor:
\[
\text{reduction_factor} = \frac{1}{T}
\]
where \(T\) is the number of tasks, i.e. the number of fine-tuned models being merged.
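For each task \(i\), we keep only the top \(\lfloor r \cdot \text{reduction_factor} \rfloor\) singular components and place them in task-specific column blocks of larger matrices. Writing \([:, :d]\) for the first \(d\) columns and \([:d, :d]\) for the leading \(d \times d\) block:
\[
\begin{aligned}
U_{\text{concat}} &= \left[\, U^{(1)}_k[:, :d] \;\; \cdots \;\; U^{(T)}_k[:, :d] \,\right] \in \mathbb{R}^{m \times Td} \\
S_{\text{concat}} &= \operatorname{blockdiag}\left( S^{(1)}_k[:d, :d], \ldots, S^{(T)}_k[:d, :d] \right) \in \mathbb{R}^{Td \times Td} \\
V_{\text{concat}} &= \left[\, V^{(1)}_k[:, :d] \;\; \cdots \;\; V^{(T)}_k[:, :d] \,\right] \in \mathbb{R}^{n \times Td}
\end{aligned}
\]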
where \(d = \lfloor r \cdot \text{reduction_factor} \rfloor\).
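The truncation and placement can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the reference implementation: the helper name `concat_top_components` is hypothetical, all task matrices are assumed to share one shape, and \(\lfloor r / T \rfloor\) is computed with integer division to avoid floating-point rounding.

```python
import numpy as np

def concat_top_components(task_mats):
    """Keep the top-d singular components of each task and place them side by side."""
    T = len(task_mats)
    m, n = task_mats[0].shape
    r = min(m, n)
    d = r // T  # floor(r * reduction_factor) with reduction_factor = 1 / T

    U_cat = np.zeros((m, T * d))
    S_cat = np.zeros(T * d)       # diagonal of S_concat, stored as a vector
    V_cat = np.zeros((n, T * d))
    for i, W in enumerate(task_mats):
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        cols = slice(i * d, (i + 1) * d)  # column block reserved for task i
        U_cat[:, cols] = U[:, :d]
        S_cat[cols] = s[:d]
        V_cat[:, cols] = Vt[:d, :].T
    return U_cat, S_cat, V_cat

rng = np.random.default_rng(0)
mats = [rng.standard_normal((6, 4)) for _ in range(2)]  # T = 2, r = 4, d = 2
U_cat, S_cat, V_cat = concat_top_components(mats)
```

With \(T\) tasks each contributing \(d = \lfloor r / T \rfloor\) components, the concatenated matrices never hold more than \(r\) columns, which is what keeps the memory footprint close to that of a single task.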
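Step 3: Second-Level SVD
The columns of the concatenated matrices \(U_{\text{concat}}\) and \(V_{\text{concat}}\) come from different tasks, so they are generally not orthogonal to each other. We then compute the SVD of the concatenated matrices:
\[
\begin{aligned}
U_{\text{concat}} &= \hat{U}_U \hat{S}_U \hat{V}_U^\top \\
V_{\text{concat}} &= \hat{U}_V \hat{S}_V \hat{V}_V^\top
\end{aligned}
\]
Dropping the singular values \(\hat{S}_U\) and \(\hat{S}_V\) yields matrices with orthonormal columns, \(U_\perp = \hat{U}_U \hat{V}_U^\top\) and \(V_\perp = \hat{U}_V \hat{V}_V^\top\).
Step 4: Final Reconstruction
The merged task vector for parameter \(k\) is reconstructed as:
\[
W^{\text{merged}}_k = U_\perp \, S_{\text{concat}} \, V_\perp^\top = \hat{U}_U \hat{V}_U^\top \, S_{\text{concat}} \, \left( \hat{U}_V \hat{V}_V^\top \right)^\top
\]
where \(S_{\text{concat}}\) is the diagonal matrix holding the retained singular values of all tasks.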
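Steps 2 to 4 can be put together for a single 2-D parameter as in the sketch below. It assumes the right singular vectors are stored as columns, that all task matrices share one shape, and that \(T \le r\) so at least one component per task is kept. The names `orthogonalize` and `merge_2d` are hypothetical.

```python
import numpy as np

def orthogonalize(M):
    """SVD M = P diag(s) Q^T, then drop s: P Q^T has orthonormal columns."""
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return P @ Qt

def merge_2d(task_mats):
    """Merge the 2-D task-vector entries of one parameter across tasks."""
    T = len(task_mats)
    d = min(task_mats[0].shape) // T  # components kept per task
    U_blocks, s_blocks, V_blocks = [], [], []
    for W in task_mats:
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        U_blocks.append(U[:, :d])
        s_blocks.append(s[:d])
        V_blocks.append(Vt[:d, :].T)
    U_cat = np.concatenate(U_blocks, axis=1)   # shape (m, T*d)
    S_cat = np.diag(np.concatenate(s_blocks))  # shape (T*d, T*d)
    V_cat = np.concatenate(V_blocks, axis=1)   # shape (n, T*d)
    # Second-level SVD removes cross-task correlation between singular vectors.
    return orthogonalize(U_cat) @ S_cat @ orthogonalize(V_cat).T

# Two toy tasks whose singular vectors are already mutually orthogonal.
W1 = np.diag([3.0, 1.0, 0.0, 0.0])
W2 = np.diag([0.0, 0.0, 2.0, 0.5])
merged = merge_2d([W1, W2])
```

In this toy case the tasks do not interfere, so orthogonalization leaves the singular vectors unchanged and `merged` equals `W1 + W2`. When singular vectors of different tasks overlap, the second-level SVD is what decorrelates them before reconstruction.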
Handling Non-2D Parameters
For parameters that are not 2-dimensional (e.g., bias vectors, normalization parameters), TSVM simply computes the arithmetic mean:
\[
W^{\text{merged}}_k = \frac{1}{T} \sum_{i=1}^{T} W^{(i)}_k
\]
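A short sketch of this fallback, assuming the per-task entries of one parameter are NumPy arrays of identical shape (the function name `merge_non_2d` is hypothetical):

```python
import numpy as np

def merge_non_2d(task_params):
    """Average the task-vector entries of a non-2-D parameter across tasks."""
    return np.mean(np.stack(task_params, axis=0), axis=0)

# Toy bias entries from two task vectors.
bias_entries = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
merged_bias = merge_non_2d(bias_entries)
```

A merging loop would typically dispatch on the number of dimensions of each entry (`ndim == 2` in NumPy) to choose between the SVD-based path and this mean.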
Final Model Construction
The final merged model parameters are obtained by adding the merged task vector to the base model:
\[
\theta_{\text{merged}} = \theta_0 + \alpha \cdot \tau_{\text{merged}}
\]
where \(\alpha\) is an optional global scaling factor.
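As a sketch of this last step, again assuming parameters are stored as dictionaries of NumPy arrays (the function name `apply_merged_task_vector` and the default `alpha=1.0` are assumptions made for illustration):

```python
import numpy as np

def apply_merged_task_vector(base, merged_task_vector, alpha=1.0):
    """Add the scaled merged task vector to the base parameters."""
    return {name: base[name] + alpha * merged_task_vector[name] for name in base}

# Toy base parameters and a toy merged task vector.
base = {"w": np.ones((2, 2)), "b": np.zeros(2)}
merged_tv = {"w": np.full((2, 2), 0.5), "b": np.array([1.0, -1.0])}
merged_model = apply_merged_task_vector(base, merged_tv, alpha=2.0)
```

Setting `alpha=0.0` recovers the base model, and larger values move the merged model further along the merged task direction.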