哈希表HashTable-EW帮帮网

此时又会引入一个新的问题，取模时数据如果重复怎么办，比如 1 和 10001 ，都要存储在当前容量为 10 的哈希表中，应该怎么办呢？
此时的办法时，按照插入的顺序，先来者居上，这里 10001 的位置被 1 顶掉了，所以它要朝后去，那键值为 2 的地方被 10001 挤掉了，后来插入的 2 只能继续挤后面的位置：

2.2 节点的定义

从这里就可以看出，每一个位置都要去标记一下是否已经存在值，若没有，对应的元素可以落座；若有，那么该元素只能向后走，这里使用枚举来记录节点状态：

enum State
{
	EXIST,
	EMPTY,
	DELETE
};

毕竟数据结构还是哈希表，每个节点就必要存储一个 KV 模型，下面就是节点的定义：

template<class K, class V>
struct HashNode
{
	pair<K, V> _kv;
	State _state = EMPTY;
};

2.3 HashTable的定义

定义好了单个节点，下面就要来看一下整个表是怎么存储节点的，其实哈希表使用什么数据结构都可以存储它的节点，单个节点的定义一旦明确了，节点之间的连接方式相较之下就自由多了，这里使用数组 vector 进行存储。此外，根据上面的描述，应该可以认识到，哈希表不一定是满的，因为后续扩容的需要，这里还要记录一下哈希表当前的有效数据个数：

template<class K, class V>
class HashTable
{
private:
    vector<HashNode<K, V>> _table;
    size_t _n
}

三、Insert 函数

3.1 插入的逻辑

首先，就是对于数据的精简，上面已经说了，可以使用Key的值模上HashTable的大小：

size_t index = kv.first % _table.size();

其次，就是要确认一下哈希表该节点的状态，若为 EXIT ，那么就需要继续向哈希表的后面查找，直到某一节点为 EMPTY 。
此外，还有一特殊情况，那就是若从哈希表的某点开始向后查找，有可能查找到哈希表的结尾，此时就需要再从哈希表头开始查找，这时应该怎么办呢？那就是让索引模上哈希表的size。

        size_t index = kv.first % _table.size();
		while (_table[index]._state == EXIST)
		{
			index++;
			index %= _table.size();
		}

当这个 while 循环结束后，说明终于找到了空节点，此时可以进行插入，但别忘了有效数据要进行自增：

        size_t index = kv.first % _table.size();
		while (_table[index]._state == EXIST)
		{
			index++;
			index %= _table.size();
		}
		_table[index]._kv = kv;
		_table[index]._state = EXIST;
		_n++;

3.2 扩容

当然，当整个表满了之后，要对表进行扩容。
但是当表还没满时，插入的效率就已经大大缩减，因为表快满了时，每个数的插入基本都要对哈希表进行遍历，这样就发挥不出它应有的优势，使用这里引入载荷因子的概念。

何时进行扩容

哈希表的扩容通常依赖于一个关键参数——载荷因子（Load Factor）。载荷因子是当前存储在哈希表中的元素数量与哈希表总容量的比例，公式为：

当载荷因子超过某个阈值（如 0.75）时，就可能进行扩容。这个阈值是一个平衡空间效率和时间效率的折中选择。载荷因子越高，表中的冲突可能越多，从而影响查找和插入操作的效率；载荷因子过低，则可能导致空间的浪费。

下面使用图来解释一下上面如何进行扩容，这里以载荷因子为 0.7 举例：

如何进行扩容

哈希表扩容通常涉及以下几个步骤：

创建更大的哈希表：通常新的哈希表大小是原来的两倍。
重新计算哈希：因为哈希表的大小改变了，大部分元素的哈希值相对于新表的大小也会改变。因此，需要重新计算每个元素的哈希值，并将它们插入到新的哈希表中。
迁移数据：将原哈希表中的所有元素按照新的哈希值重新插入到新表中。这一步是成本最高的，因为它涉及到遍历旧表中的所有元素并重新插入。
释放旧表空间：一旦旧表中的所有元素都迁移到新表中，旧表的空间可以被释放。

下面使用图来解释一下上面如何进行扩容，这里以载荷因子为 0.7 举例：

通过上图的解释，就可以得到结论：扩容后，所有数据都需要重新建立映射关系，这也是成本最高的一步！这里使用新的HashTable，并直接借用它的Insert。注意：不可以直接在原HashTable中进行扩容和数据的重新插入，这样可能会导致原来的数据被覆盖！

        if (_n * 10 / _table.size() >= 7)
		{
			HashTable<K, V> newHT;
			newHT._table.resize(_table.size() * 2);
			for (size_t i = 0; i < _table.size(); i++)
			{
				if (_table[i]._state == EXIST)
					newHT.Insert(_table[i]._kv);
			}
			_table.swap(newHT._table);
		}

3.3 插入函数代码

    bool Insert(const pair<K, V>& kv)
	{
		if (_n * 10 / _table.size() >= 7)
		{
			HashTable<K, V> newHT;
			newHT._table.resize(_table.size() * 2);
			for (size_t i = 0; i < _table.size(); i++)
			{
				if (_table[i]._state == EXIST)
					newHT.Insert(_table[i]._kv);
			}
			_table.swap(newHT._table);
		}
		size_t index = kv.first % _table.size();
		while (_table[index]._state == EXIST)
		{
			index++;
			index %= _table.size();
		}
		_table[index]._kv = kv;
		_table[index]._state = EXIST;
		_n++;
		return true;
	}

四、Find 函数

Find 函数也是根据 Key 的值进行的，所以这和 Insert 函数查找空节点时的逻辑一样，可以追踪哈希表各个节点的状态，如果该节点非空，再根据哈希表的键值与要查找键值的比对来确认查找节点的地址。另外，使用节点的地址作为函数的返回值，也是为了后面使用 Erase 函数的方便：

	HashNode<K, V>* Find(const K& Key)
	{
		size_t index = Key % _table.size();
		while (_table[index]._state != EMPTY)
		{
			if (_table[index]._state == EXIST 
            && _table[index]._kv.first == Key)
			{
				return &_table[index];
			}
			index++;
			index %= _table.size();
		}
		return nullptr;
	}

五、Erase 函数

使用 Erase 函数，也是从更改哈希表节点的状态下手，只需要更改节点的状态，不需要实现 Value 的覆盖即可完成节点的删除。

	bool Erase(const K& Key)
	{
		HashNode<K, V>* HD = Find(Key);
		if (HD == nullptr)
			return false;
		else
		{
			HD->_state = DELETE;
			_n--;
			return true;
		}
	}

六、完成代码

#pragma once
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include<vector>
using namespace std;

enum State
{
	EXIST,
	EMPTY,
	DELETE
};

template<class K, class V>
struct HashNode
{
	pair<K, V> _kv;
	State _state = EMPTY;
};

template<class K, class V>
class HashTable
{
public:
	HashTable()
	{
		_table.resize(10);
	}
	bool Insert(const pair<K, V>& kv)
	{
		if (_n * 10 / _table.size() >= 7)
		{
			HashTable<K, V> newHT;
			newHT._table.resize(_table.size() * 2);
			for (size_t i = 0; i < _table.size(); i++)
			{
				if (_table[i]._state == EXIST)
					newHT.Insert(_table[i]._kv);
			}
			_table.swap(newHT._table);
		}
		size_t index = kv.first % _table.size();
		while (_table[index]._state == EXIST)
		{
			index++;
			index %= _table.size();
		}
		_table[index]._kv = kv;
		_table[index]._state = EXIST;
		_n++;
		return true;
	}
	HashNode<K, V>* Find(const K& Key)
	{
		size_t index = Key % _table.size();
		while (_table[index]._state != EMPTY)
		{
			if (_table[index]._state == EXIST && _table[index]._kv.first == Key)
			{
				return &_table[index];
			}
			index++;
			index %= _table.size();
		}
		return nullptr;
	}

	bool Erase(const K& Key)
	{
		HashNode<K, V>* HD = Find(Key);
		if (HD == nullptr)
			return false;
		else
		{
			HD->_state = DELETE;
			_n--;
			return true;
		}
	}
private:
	vector<HashNode<K, V>> _table;
	size_t _n;
};
void TestHT1()
{
	int tmp[] = {11, 22, 8, 3, 13, 55};
	HashTable<int, int> HT;
	for (auto e : tmp)
	{
		HT.Insert(make_pair(e, e));
	}
	cout << HT.Find(11) << endl;
	cout << HT.Find(22) << endl;

	HT.Erase(11);
	cout << HT.Find(11) << endl;
	cout << HT.Find(22) << endl;
}
void TestHT2()
{
	int tmp[] = { 11, 22, 8, 3, 13, 55 };
	HashTable<int, int> HT;
	for (auto e : tmp)
	{
		HT.Insert(make_pair(e, e));
	}
	int tmp2[] = { 0, 2, 4, 6, 8 };
	for (auto e : tmp)
	{
		HT.Insert(make_pair(e, e));
	}
}

七、Key值为其他时

上面的代码都存在取模的运算，显而易见，只有整型才能进行取模运算，当 Key 值为浮点数或者负数甚至是字符串时，哈希表就不管用了。

7.1 Key 为非字符串

当 Key 为非字符串时，如果是负数、浮点数，都可以对它们进行强制的显式类型转化，可以借助一个仿函数，对负数和浮点数进行优化处理，至于负数如何强转为正数，请看下图：

仿函数的写法也很简单：

template<class K>
struct HashFunc
{
    size_t operator()(const K& Key)
    {
        return (size_t)Key;
    }
}

此时只需要在HashTable的模板参数中加入该模板即可：

template<class K, class V, class Hash = HashFunc<K>>
class HashTable
{
};

调用时，使用 Way 实例化一个对象，然后直接使用对象(Key)即可：

        Hash hs;
		size_t index = hs(kv.first) % _table.size();

7.2 Key 为字符串

Key 为字符串时，其实有很多种转换方法，下面介绍的是BKDR算法：

以下是BKDR算法的一个简单实现：

#include <iostream>
#include <string>

unsigned long BKDRHash(const std::string& str) 
{
    unsigned long hash = 0;
    size_t seed = 131; // 31 131 1313 13131等，常见质数基数
    for (char c : str) 
    {
        hash = hash * seed + c;
    }
    return hash;
}

int main() 
{
    std::string input = "hello";
    std::unsigned long hashValue = BKDRHash(input);
    std::cout << "The BKDR hash of \"" << input << "\" is " << hashValue << std::endl;
    return 0;
}

因为模板只有Key为String和非String这两种情况，完全可以进行模板的特化：

template<>
struct HashFunc<string>
{
	size_t operator()(string& Key)
	{
		size_t hash = 0;
		for (auto ch : Key)
		{
			hash *= 131;
			hash += ch;
		}
		return hash;
	}
};

7.3 总结

1. 通用模板

对于一般类型 K 的实例：

template<class K>
struct HashFunc
{
    size_t operator()(const K& Key)
    {
        return (size_t)Key;
    }
};

这个模板尝试将任何类型 K 的对象直接转换为 size_t 类型。这种方式假定类型 K 可以安全地转换为 size_t，通常适用于基本数据类型如 int、char 等。

2. 特化模板对 `std::string`

对 std::string 类型进行了特化：

template<>
struct HashFunc<string>
{
    size_t operator()(string& Key)
    {
        size_t hash = 0;
        for (auto ch : Key)
        {
            hash *= 131;
            hash += ch;
        }
        return hash;
    }
};

这个特化版本使用了一个简单的哈希算法，通过遍历字符串中的每个字符，将哈希值乘以一个基数（这里使用了131，这是一个常见的质数基数用于哈希函数中），然后加上字符的ASCII值。

通用模板的调用

这个模板旨在处理那些可以直接转换为 size_t 的数据类型，比如基本的数值类型（int, double, char 等）。当你使用这种类型的数据时，模板会尝试将其转换成一个 size_t 值。

示例代码：

HashFunc<int> hashFunc;
int num = 42;
size_t hashValue = hashFunc(num);  // 直接调用通用模板

特化模板的调用

针对 std::string 类型的特化版本设计是为了处理那些不能直接转换为 size_t 的复杂数据类型，如字符串。这个特化版本使用了一个自定义的哈希函数来计算字符串的哈希值。

示例代码：

HashFunc<std::string> stringHashFunc;
std::string text = "hello";
size_t stringHash = stringHashFunc(text);  // 调用针对 std::string 的特化版本

八、完整代码

可以直接使用Gitee查看只包含整型的代码和兼容型的代码：

登录 - Gitee.comhttps://gitee.com/bright-and-sparkling-at-night/studying/commit/3325b294a780b29de7ccf7bf24a541895ba5d91a以下是代码和测试用例：

#pragma once
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include<vector>
#include<string>
using namespace std;

enum State
{
	EXIST,
	EMPTY,
	DELETE
};

template<class K>
struct HashFunc
{
	size_t operator()(const K& Key)
	{
		return (size_t)Key;
	}
};
template<>
struct HashFunc<string>
{
	size_t operator()(string& Key)
	{
		size_t hash = 0;
		for (auto ch : Key)
		{
			hash *= 131;
			hash += ch;
		}
		return hash;
	}
};

template<class K, class V>
struct HashNode
{
	pair<K, V> _kv;
	State _state = EMPTY;
};

template<class K, class V, class Hash = HashFunc<K>>
class HashTable
{
public:
	HashTable()
	{
		_table.resize(10);
	}
	bool Insert(const pair<K, V>& kv)
	{
		if (_n * 10 / _table.size() >= 7)
		{
			HashTable<K, V> newHT;
			newHT._table.resize(_table.size() * 2);
			for (size_t i = 0; i < _table.size(); i++)
			{
				if (_table[i]._state == EXIST)
					newHT.Insert(_table[i]._kv);
			}
			_table.swap(newHT._table);
		}
		Hash hs;
		size_t index = hs(kv.first) % _table.size();
		while (_table[index]._state == EXIST)
		{
			index++;
			index %= _table.size();
		}
		_table[index]._kv = kv;
		_table[index]._state = EXIST;
		_n++;
		return true;
	}
	HashNode<K, V>* Find(const K& Key)
	{
		Hash hs;
		size_t index = hs(Key) % _table.size();
		while (_table[index]._state != EMPTY)
		{
			if (_table[index]._state == EXIST && _table[index]._kv.first == Key)
			{
				return &_table[index];
			}
			index++;
			index %= _table.size();
		}
		return nullptr;
	}

	bool Erase(const K& Key)
	{
		HashNode<K, V>* HD = Find(Key);
		if (HD == nullptr)
			return false;
		else
		{
			HD->_state = DELETE;
			_n--;
			return true;
		}
	}
private:
	vector<HashNode<K, V>> _table;
	size_t _n;
};
void TestHT1()
{
	int tmp[] = {11.2, -22, 8, 3, 13, 55};
	HashTable<int, int> HT;
	for (auto e : tmp)
	{
		HT.Insert(make_pair(e, e));
	}
	cout << "11.2:" << HT.Find(11.2) << endl;
	cout << "-22:" << HT.Find(-22) << endl;

	HT.Erase(11.2);
	cout << "11.2:" << HT.Find(11.2) << endl;
	cout << "-22:" << HT.Find(-22) << endl;
}
void TestHT2()
{
	int tmp[] = { 11, 22, 8, 3, 13, 55 };
	HashTable<int, int> HT;
	for (auto e : tmp)
	{
		HT.Insert(make_pair(e, e));
	}
	int tmp2[] = { 0, 2, 4, 6, 8 };
	for (auto e : tmp)
	{
		HT.Insert(make_pair(e, e));
	}
}

哈希表HashTable

一、引言

二、哈希表的构成

2.1 数据分散时的处理