数据结构之优先级队列-EW帮帮网

本文主要介绍了优先级队列和堆的实现原理。首先讲解了堆的基本概念，包括大根堆和小根堆的存储结构和父子节点关系。然后详细讲解了堆的模拟实现过程，包括创建堆、插入元素和删除元素的操作步骤及时间复杂度分析。接着分析了Java中PriorityQueue的源码实现，包括构造方法、扩容机制和比较机制。最后讨论了topK问题的解决方案，提出使用大根堆/小根堆来高效获取前k个最小/最大元素的算法思路。文章通过代码示例展示了如何实现这些数据结构，并分析了相关操作的时间复杂度。

一、优先级队列和堆

优先级队列的两个基本操作：返回高优先级对象，添加新的对象；

PriorityQueue 底层使用了堆这种数据结构，堆是在完全二叉树的基础上进行了一些调整；

堆是按照完全二叉树的顺序存储（层序遍历）的方式存储；

小根堆：孩子节点都比根节点大；

大根堆：孩子节点都比根节点小；

如果根节点的下标是 i，左孩子节点的下标为 2 * i + 1，右孩子节点的下标为 2 * i + 2；

如果孩子节点的下标是 i，根节点的下标为（i - 1）/ 2；

二、堆的模拟实现

1. 堆的创建

思路（以大根堆为例）：

从后往前遍历每一棵子树，如果根节点的值比左右孩子的节点的值小，就将左右孩子节点的最大值和根节点的值交换；

交换完成后，这棵树的左右子树有可能出现孩子节点的值比根节点的值大的情况，因此需要将已经交换的孩子节点再次作为根节点，判断根节点和左右孩子节点的值，直至交换完最后一棵子树；

createHeap(): void 创建堆，从后往前遍历每一棵子树，向下调整根节点值；

shiftDown(int parent, int usedSize): void 计算孩子节点下标，如果孩子节点的值大于根节点的值，向下调整根节点的值；

public class Heap {
    private int[] elem;
    private int usedSize;

    public Heap(){
        this.elem = new int[10];
    }

    public Heap(int[] array){
        int n = array.length;
        this.elem = new int[n];
        for(int i = 0; i < array.length; i++){
            this.elem[i] = array[i];
            this.usedSize++;
        }
    }

    public void createHeap(){
        for(int parent = (this.usedSize - 1 - 1) / 2; parent >= 0; parent--){
            shiftDown(parent, this.usedSize);
        }
    }

    private  void shiftDown(int parent, int usedSize){
        int child = parent * 2 + 1;
        while(child < usedSize){
            if(child + 1 < usedSize && this.elem[child] < this.elem[child + 1]){
                child++;
            }
            if(this.elem[parent] < this.elem[child]){
                swap(parent, child);
                parent = child;
                child = parent * 2 + 1;
            }
        }
    }

    private void swap(int i, int j){
        int tmp = this.elem[i];
        this.elem[i] = this.elem[j];
        this.elem[j] = tmp;
    }
}

2. 计算建堆的时间复杂度

假设有 n 个节点，那么堆的高度为：h = log2^(n + 1)；

计算节点的个数：从第一层到最后一层分别为 2^0, 2^1, 2^2,..., 2^(h-3), 2^(h-2), 2^(h-1)；

计算要调整的高度：最坏情况下，从第一层到倒数第二层都需要调整，最后一层不需要调整，调整的高度分别是 h - 1, h - 2, h - 3, ..., 2, 1, 0；

因此需要花费的时间为：

T(n) = 2^0 * (h - 1) + 2^1 * (h - 2) + 2^2 * (h - 3) + 2^3 * (h - 4) + ... + 2^(h - 3) * 2 + 2^(h - 2) * 1；

2 * T(n) = 2^1 * (h - 1) + 2^2 * (h - 2) + 2^3 * (h - 3) + ... + 2^(h - 2) * 2 + 2^(h - 1) * 1;

使用错位相减计算：

T(n) = -2^0 * (h - 1) + 2^1 + 2^2 + 2^3 + ... + 2^(h - 3) + 2^(h - 2) + 2^(h - 1)；

T(n) = 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^(h - 3) + 2^(h - 2) + 2^(h - 1) - h；

使用等比数列求和公式：

T(n) = 1 * (1 - 2^h) / (1 - 2) - h = 2^h - 1 - h；

将 h = log2^(n + 1) 带入公式：T(n) = n - log2^(n + 1)；

当 n 足够的大情况下，log2^(n + 1）相比于 n 是可以忽略的，因此建堆的时间复杂度为 O(N)；

3. 堆的插入

思路：

先在最后一个位置放上插入的元素，再将这个元素向上调整；

向上调整时，该节点作为 child 节点和根节点发生了交换，这个节点又会作为新的 child 节点，可能仍然需要继续向上调整，因此需要循环直至 child 作为整个完全二叉树的根节点；

如果没有发生交换，说明已经是一个大根堆了，则跳出循环即可；

offer(int val): void 向堆中添加元素；

shiftUp(int child): void 如果根节点的值比孩子节点的值小，向上调整孩子节点的值；

    public void offer(int val){
        if(isFull()){
            this.elem = Arrays.copyOf(this.elem,  2 * this.elem.length);
        }
        this.elem[this.usedSize] = val;
        this.usedSize++;
        shiftUp(this.usedSize - 1);
    }

    private boolean isFull(){
        return this.usedSize == this.elem.length;
    }

    private void shiftUp(int child){
        while(child != 0){
            int parent = (child - 1) / 2;
            if(this.elem[child] > this.elem[parent]){
                swap(parent, child);
            }else{
                break;
            }
            child = parent;
        }
    }

4. 堆的删除

思路：

将根节点和最后一个位置的元素进行交换，再将根节点位置的元素向下调整；

poll(): int 弹出堆顶元素；

    public int poll(){
        if(isEmpty()){
            return -1;
        }
        swap(0, this.usedSize - 1);
        this.usedSize--;
        shiftDown(0, this.usedSize);
        return this.elem[this.usedSize];
    }

    public boolean isEmpty(){
        return this.usedSize == 0;
    }

三、优先级队列源码

PriorityQueue 默认是小根堆；

PriorityQueue 中放置的元素必须能够比较大小，否则会抛出 ClassCastException 异常；

PriorityQueue 中不能放置 null 对象，否则会抛出空指针异常；

插入和删除元素的时间复杂度为 O(logN)；

1. 构造方法

两个构造方法本质上都是调用：

public PriorityQueue(int initialCapacity,Comparator<? super E> comparator)

第一个参数为初始化容量，第二个参数为比较器；

默认的初始化容量 DEFAULT_INITIAL_CAPCITY 为 11；

public class PriorityQueue<E> extends AbstractQueue<E>
    implements java.io.Serializable {

    private static final long serialVersionUID = -7720805057305804111L;

    private static final int DEFAULT_INITIAL_CAPACITY = 11;

    transient Object[] queue; // non-private to simplify nested class access

    private int size = 0;

    private final Comparator<? super E> comparator;

    transient int modCount = 0; // non-private to simplify nested class access

    public PriorityQueue() {
        this(DEFAULT_INITIAL_CAPACITY, null);
    }

    public PriorityQueue(int initialCapacity) {
        this(initialCapacity, null);
    }

    public PriorityQueue(Comparator<? super E> comparator) {
        this(DEFAULT_INITIAL_CAPACITY, comparator);
    }

    public PriorityQueue(int initialCapacity,
                         Comparator<? super E> comparator) {
        // Note: This restriction of at least one is not actually needed,
        // but continues for 1.5 compatibility
        if (initialCapacity < 1)
            throw new IllegalArgumentException();
        this.queue = new Object[initialCapacity];
        this.comparator = comparator;
    }
}

2. 扩容机制

MAX_ARRAY_SIZE 最大的数组容量为 Integer.MAX_VALUE - 8；

grow(int minCapacity): void 扩容：

当前的容量小于 64 字节：当前容量 + 当前容量 + 2 字节；

当前容量大于等于 64 字节：当前容量 * 1.5；

如果扩容后的容量大于 MAX_ARRAY_SIZE，调用 hugeCapacity(int minCapacity): int

hugeCapacity(int minCapacity): int 大容量扩容机制：

如果所需的最小容量溢出了（变成负数），则抛出异常；

如果所需的最小容量大于 MAX_ARRAY_SIZE，扩容为 Integer.MAX_VALUE；

如果所需的最小容量小于等于 MAX_ARRAY_SIZE，扩容为 MAX_ARRAY_SIZE；

    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Increases the capacity of the array.
     *
     * @param minCapacity the desired minimum capacity
     */
    private void grow(int minCapacity) {
        int oldCapacity = queue.length;
        // Double size if small; else grow by 50%
        int newCapacity = oldCapacity + ((oldCapacity < 64) ?
                                         (oldCapacity + 2) :
                                         (oldCapacity >> 1));
        // overflow-conscious code
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        queue = Arrays.copyOf(queue, newCapacity);
    }

    private static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }

3. 比较机制

offer(E e): boolean 向堆中添加元素，如果堆中没有元素，直接添加到 0 下标，如果堆中有元素，调用 siftUp() 方法；

siftUp(int k, E x): void 向上调整，如果传了比较器，调用 siftUpUsingComparator() 方法，否则调用 siftUpComparable() 方法；

siftUpUsingComparator(int k, E x): void 使用比较器，向上调整；

计算根节点下标，将根节点下标的元素保存在变量 e 里面；

假设比较器是 a.compareTo(b):

使用比较器（调用比较器的 compare() 方法）比较 x 和 e，如果大于等于 0（x 大于等于根节点的值），跳出循环，把 x 放到堆的下一个位置，此时建立的是一个小根堆；

如果小于 0（x 的值比根节点小），把 e 放到下一个位置，向上调整，直到 x 比某一个根节点大或者已经到 0 下标的位置，把 x 放到这个位置，此时建立的是一个小根堆；

因此比较器传 a.compareTo(b)，建立的就是小根堆；

反之，比较器传 b.compareTo(b)，建立的就是大根堆；

    public boolean offer(E e) {
        if (e == null)
            throw new NullPointerException();
        modCount++;
        int i = size;
        if (i >= queue.length)
            grow(i + 1);
        size = i + 1;
        if (i == 0)
            queue[0] = e;
        else
            siftUp(i, e);
        return true;
    }

    private void siftUp(int k, E x) {
        if (comparator != null)
            siftUpUsingComparator(k, x);
        else
            siftUpComparable(k, x);
    }

    private void siftUpUsingComparator(int k, E x) {
        while (k > 0) {
            int parent = (k - 1) >>> 1;
            Object e = queue[parent];
            if (comparator.compare(x, (E) e) >= 0)
                break;
            queue[k] = e;
            k = parent;
        }
        queue[k] = x;
    }

    private void siftUpComparable(int k, E x) {
        Comparable<? super E> key = (Comparable<? super E>) x;
        while (k > 0) {
            int parent = (k - 1) >>> 1;
            Object e = queue[parent];
            if (key.compareTo((E) e) >= 0)
                break;
            queue[k] = e;
            k = parent;
        }
        queue[k] = key;
    }

四、topK问题

以前 k 个最小的元素为例：

使用优先级队列进行排序：

    public int[] smallestK(int[] arr, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for(int x : arr) heap.offer(x);

        int[] ret = new int[k];
        for(int i = 0; i < k; i++) {
            ret[i] = heap.poll();
        }

        return ret;
    }

如果数组中元素非常多，优先级队列可能存不下，并且这样做是效率不高的；

推荐的做法：

使用数组中前 k 个数据建立一个容量为 k 的大根堆（堆顶元素是 k 个元素中最大的）；

遍历后续的元素，如果元素小于堆顶元素，那么堆顶元素一定不是前 k 个最小的元素之一，堆顶元素出堆，新元素入堆；

    public int[] smallestK(int[] arr, int k) {
        int[] ret = new int[k];
        
        if(k == 0) return ret;

        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> b - a);
        for(int i = 0; i < k; i++) heap.offer(arr[i]);

        int n = arr.length;
        for(int i = k; i < n; i++){
            if(arr[i] < heap.peek()){
                heap.poll();
                heap.offer(arr[i]);
            }
        }
        
        for(int i = 0; i < k; i++){
            ret[i] = heap.poll();
        }

        return ret;
    }

找前 k 小的元素，建大根堆；找前 k 大的元素，建小根堆；

堆中元素就是前 k 小或者大的元素，如果是找第 k 小或者第 k 大元素，直接返回堆顶元素即可；

数据结构之优先级队列

系列文章目录

前言