Tim Sort
Table of Contents
- IntroductionInsertion SortVisualizationImplementationTime & Space ComplexityMerge SortVisualizationImplementationTime & Space ComplexityTim SortImplementationTime & Space ComplexityOptimization1. Descending Order Judgment2. Dynamic Calculation of minRun3. Binary Insertion Sort4. Optimizing merge (balanced merge)Summary參考資料
Introduction
Array.prototype.sort
is a frequently used array method in development. Although ECMAScript does not specify which sorting algorithm to use, the Node V8 engine and most browsers use Tim Sort.
Tim Sort was invented by Tim Peters in 2002, initially to address performance issues with Python's list.sort() method under certain conditions. It was added to Java SE 7 in 2009, and to ECMAScript 2019 in 2017.
Tim Sort is a stable sorting algorithm with several characteristics:
- Time complexity is O(n log n), and space complexity is O(n).
- Stability: It ensures that equal elements maintain their original order.
- Adaptivity: Tim Sort performs very efficiently on partially sorted arrays.
- In-place sorting: No extra memory space is required.
Since Tim Sort is a combination of Insertion Sort and Merge Sort, we will first introduce these two sorting algorithms.
Insertion Sort
Think back to when you play poker, at the beginning we sort by size, and Insertion Sort simulates this way, starting from the second card, inserting the card into the previously sorted cards until the last card.
Visualization
Or refer to the visualization algorithm of Princeton
Implementation
Insertion Sort will traverse from the lower bound lo to the upper bound hi of the array. In each iteration, it will compare (compare) the current element arr[i] with the previous element arr[i - 1]. If the current value arr[i] is larger, it will swap (swap) positions until the last element. Although Insertion Sort is not efficient when there is a large amount of data, it performs better when there is a small amount of data."
const sort = require('./sort.js'); const http = require('http'); const hostname = '127.0.0.1'; const port = 3000; const server = http.createServer((req, res) => { res.statusCode = 200; res.setHeader('Content-Type', 'text/html'); res.end(`${sort.insertion([7, 10, 5, 3, 8, 4, 2, 9, 6])}`); }); server.listen(port, hostname, () => { console.log(`Server running at http://${hostname}:${port}/`); });
Time & Space Complexity
Sorting Algorithm | Time Complexity (Best/Worst) | Space Complexity |
---|---|---|
Insertion Sort | O(n) / O(n^2) | O(1) |
The best-case time complexity of Insertion Sort is O(n)
, the worst-case is O(n^2)
, and the average is O(n^2)
. This is because the time complexity depends on the number of swaps, and the worst-case scenario is when the array is in reverse order. The space complexity is O(1)
.
Merge Sort
Merge Sort is a divide and conquer algorithm that splits the array into two subarrays through recursion until there is only one element in the array, and then merges the two subarrays into a sorted array.
As the name suggests, divide and conquer involves dividing the problem into multiple subproblems through the division step, then solving the subproblems through recursion, and finally merging the solutions to the subproblems.
Visualization
Or refer to the visualization algorithm of Princeton
Implementation
Merge Sort can be divided into two steps. The first step is to split the array into two subarrays and sort the two subarrays. The second step is to compare the first element of the two subarrays and put the smaller element into a new array until both subarrays are put into the new array.
const sort = require('./sort.js'); const http = require('http'); const hostname = '127.0.0.1'; const port = 3000; const server = http.createServer((req, res) => { res.statusCode = 200; res.setHeader('Content-Type', 'text/html'); let arr = [7, 10, 5, 3, 8, 4, 2, 9, 6]; sort.mergeSort(arr); res.end(`${arr}`); }); server.listen(port, hostname, () => { console.log(`Server running at http://${hostname}:${port}/`); });
Time & Space Complexity
Sorting Algorithm | Time Complexity (Best/Worst) | Space Complexity |
---|---|---|
Insertion Sort | O(n) / O(n^2) | O(1) |
Merge Sort | O(nlog(n)) / O(n^2) | O(n) |
From the diagram, you can see that during the divide process, the array is divided into two subarrays, so it will execute log(n)
times. During the conquer process, the two subarrays are merged into a sorted array, so it will execute n
times. Therefore, the time complexity is O(n log n)
. The space complexity is O(n)
.
Tim Sort
The main concept of Tim Sort is the assumption that the elements in the array are partially sorted, and these partially sorted elements are called runs. Therefore, in Tim Sort, the array is first divided into multiple runs, and then these runs are merged into a sorted array. This continues until all elements in the array are merged into a sorted array.
Implementation
- Initially, a minimum run length (
MIN_MERGE
) is defined. For example, if the array size is 10 and the minimum run length is 4, then the array will be divided into 3 runs, which are 4, 4, 2 respectively. - Next, the array is traversed to find consecutive runs, and these runs are sorted using insertion sort.
- These runs are then put into a stack.
- The runs in the stack are merged one by one using merge, until there is only one run left in the stack. This run is the sorted array.
const sort = require('./sort.js'); const http = require('http'); const hostname = '127.0.0.1'; const port = 3000; const server = http.createServer((req, res) => { res.statusCode = 200; res.setHeader('Content-Type', 'text/html'); let arr = [7, 10, 5, 3, 8, 4, 2, 9, 6]; sort.timSort(arr); res.end(`${arr}`); }); server.listen(port, hostname, () => { console.log(`Server running at http://${hostname}:${port}/`); });
Let's break down the process using the input array [7, 10, 5, 3, 8, 4, 2, 9, 6]
:
Since MIN_MERGE
is 4, in the first iteration, we will sort the run [7, 10, 5, 3]
using insertion sort and push it into the stack. At this point, the stack is [{ base: 0, len: 4 }]
.
Next, we sort the run [8, 4, 2, 9]
using insertion sort and push it into the stack. At this point, the stack is [{ base: 0, len: 4 }, { base: 4, len: 4 }]
.
Finally, we sort the run [6]
using insertion sort and push it into the stack. At this point, the stack is [{ base: 0, len: 4 }, { base: 4, len: 4 }, { base: 8, len: 1 }]
.
Before merging, arr becomes [3, 5, 7, 10, 2, 4, 8, 9, 6]
. Then we start merging. We first merge the two runs [{ base: 0, len: 4 }, { base: 4, len: 4 }]
. After merging, the result is [2, 3, 4, 5, 7, 8, 9, 10, 6]
, and the stack is [{ base: 0, len: 8 }, { base: 8, len: 1 }]
.
Next, we merge the two runs [{ base: 0, len: 8 }, { base: 8, len: 1 }]
. After merging, the result is [2, 3, 4, 5, 6, 7, 8, 9, 10]
, and the stack is [{ base: 0, len: 9 }]
. At this point, there is only one run left in the stack, so the sorting is complete.
Time & Space Complexity
Sorting Algorithm | Time Complexity (Best/Worst) | Space Complexity |
---|---|---|
Insertion Sort | O(n) / O(n^2) | O(1) |
Merge Sort | O(nlog(n)) /O(nlog(n)) | O(n) |
Tim Sort | O(n) / O(nlog(n)) | O(n) |
Optimization
1. Descending Order Judgment
The first possible optimization is that the run may be in descending order, like [4, 3, 2, 1]
. In this case, it only needs to be reversed to become ascending order, instead of performing an insertion sort like dealing with a random order like [7, 10, 5, 3]
. So, during traversal, we can judge if this run is in descending order, then reverse it if so.
First, we can create a new function countRunAndMakeAscending to calculate the length of the run, and determine whether it is descending. If it is, it will be reversed through reverseRange.
function countRunAndMakeAscending(arr, lo, hi) {let runHi = lo + 1;if (runHi === hi) {return 1;}// check if it'sif (arr[lo] > arr[runHi]) {while (runHi < hi && arr[runHi] > arr[runHi - 1]) {runHi++;}reverseRange(arr, lo, runHi);} else {// 升冪while (runHi < hi && arr[runHi] >= arr[runHi - 1]) {runHi++;}}return runHi - lo;}
// reversefunction reverseRange(arr, lo, hi) {hi--;while (lo < hi) {[arr[lo], arr[hi]] = [arr[hi], arr[lo]];lo++;hi--;}}
Next, we add this countRunAndMakeAscending
function to the previously written timSort
. But if runLen is less than MIN_MERGE
, we will need to forcibly add more elements to the run and sort them.
Another optimization is that after countRunAndMakeAscending
, we know that [lo, lo+runLen] is already sorted, so we can start directly from lo+runLen during the insertion sort.
minRun
2. Dynamic Calculation of minRun
is a concept in Tim sort, referring to the minimum length of a run (a non-decreasing or non-increasing value sequence). The setting of minRun ensures that this size can achieve a balance between the efficiency of internal sorting in the run and the efficiency of merging runs.
The calculation method of minRun
is as follows:
- If
n < MIN_MERGE
, thenminRun = n
. (If the length of the array is less than MIN_MERGE (such as 64), there is no need to decompose it further. Direct sorting is fine.) - If
n >= 64
, a minRun will be chosen. For example, if the length of the array is 2048, then the ideal run size should be 32, so that it can be evenly decomposed into 64 runs. This ensures that we have enough runs to carry out the merge operation, and each run is the same size, making full use of the efficiency of the merge operation.
This means that when n is a power of 2, minRun will be n/2, so that we can ensure that each merged run is a power of 2, ensuring the efficiency of the merge.
function minRunLength(n) {let r = 0;while (n >= MIN_MERGE) {r |= n & 1;n >>= 1;}return n + r;}
This function uses bitwise operations to determine the minRun value. The MIN_MERGE constant is the minimum size for a run to be eligible for merging. This function keeps halving the input n until it is less than MIN_MERGE. Then, it returns n + r, which gives the minRun length.
3. Binary Insertion Sort
The third optimization is to change insertion sort into binary insertion sort. But before that, let's review what binary search is. Binary search is a search algorithm that finds a specific element in a sorted array. The search starts from the middle. If the target element is smaller than the middle element, the search range is reduced to the left half. Otherwise, it's reduced to the right half. This continues until the target element is found or the search range has been reduced to 0.
This can be applied directly to insertion sort. Since insertion sort starts from the left and ensures that the elements on the left are already sorted, binary search can be used to replace the original linear search (compare one by one) and reduce the comparison time. At most, it only requires O(log n) time complexity. However, the time complexity of such insertion sort still needs to be O(n^2) because in the worst case, all elements need to be moved one place to the right. So this optimization only reduces the comparison time, but still requires element movement.
function binaryInsertionSort(arr, lo = 0, hi = arr.length, start = 0) {if (lo === start) {start += 1;}for (let i = lo; i < hi; i++) {let current = arr[i];let left = 0;let right = i;while (left < right) {let mid = Math.floor((left + right) / 2);if (current < arr[mid]) {right = mid;} else {left = mid + 1;}}for (let j = i; j > left; j--) {arr[j] = arr[j - 1];}arr[left] = current;}return arr;}
4. Optimizing merge (balanced merge)
The fourth optimization is for the merging part, with the goal of keeping each run as close to the same size as possible to improve merging efficiency.
Previously, each run was placed in a stack, and during the final merge, two runs were taken from the stack to be merged until the stack length was 1. However, this could mean that the sizes of each run might not be the same, which could lead to inefficient merging.
So the stack should meet the following two conditions:
stack[i-1].len > stack[i].len + stack[i+1].len
: The length of the i-1 run in the stack needs to be greater than the sum of the lengths of the next runs.stack[i].len > stack[i+1].len
: The length of the i run in the stack needs to be greater than the length of the next run.
Next, create a mergeCollapse function to check whether the stack meets the above two conditions. If not, it merges until it does.
function mergeCollapse(arr, stack) {while (stack.length > 1) {let n = stack.length - 2;if ((n > 0 && stack[n - 1].len <= stack[n].len + stack[n + 1].len) ||(n > 1 && stack[n - 2].len <= stack[n - 1].len + stack[n].len)) {if (stack[n - 1].len < stack[n + 1].len) {// if stack[n-1].len < stack[n+1].len, merge stack[n-1] and stack[n]// else merge stack[n] and stack[n+1]n -= 1;}} else if (stack[n].len > stack[n + 1].len) {break;}mergeAt(arr, stack, n);}}
Next is the mergeAt function. This function will merge the i-th run and the (i+1)-th run in the stack. The merged run is then placed at the i-th position in the stack, and the (i+1)-th run is removed.
function mergeAt(arr, stack, i) {if (i !== stack.length - 2 || i !== stack.length - 3) {throw new Error('Invalid arguments');}const { base: base1, len: len1 } = stack[i];const { base: base2, len: len2 } = stack[i + 1];stack[i] = {base: base1,len: len1 + len2,};if (i === stack.length - 3) {stack[i + 1] = stack[i + 2];}stack.pop();merge(arr, base1, base2, base2 + len2);}
Summary
- Tim Sort is essentially a combination of insertion sort and merge sort.
- Most of the optimizations mainly target the insertion sort and Merge sort parts, making the insertion sort more efficient when the data size is small, and enhancing the efficiency of the merge part in merge sort.
- The first optimization is to reverse a run if it's in descending order, to ascending order, which can avoid unnecessary sorting.
- The second optimization uses minRun to ensure that during the sorting process, each run (i.e., each sub-array that needs to be sorted separately) has a minimum size. This ensures that this size can achieve a balance between the efficiency of performing internal sorting and merging runs.
- The third optimization modifies the part of insertion sort to binary insertion sort. Although the time complexity is still O(n^2), binary insertion sort is more efficient than the general insertion sort because it reduces the number of comparisons.
- Lastly, the optimization targets the merging part, ensuring that each run maintains the same size during the merging stage to enhance the efficiency of merging.
const sort = require('./sort.js'); const http = require('http'); const hostname = '127.0.0.1'; const port = 3000; const server = http.createServer((req, res) => { res.statusCode = 200; res.setHeader('Content-Type', 'text/html'); let arr = [7, 10, 5, 3, 8, 4, 2, 9, 6]; sort.timSort(arr); res.end(`${arr}`); }); server.listen(port, hostname, () => { console.log(`Server running at http://${hostname}:${port}/`); });