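[Author homepage]: 吴秋霖
[About the author]: Specializes in web scraping and reverse engineering of JS encryption. Recognized Python creator, CSDN blog expert, Alibaba Cloud blog expert, and Huawei Cloud expert, with a long-standing focus on research and development in Python and web scraping.
[Recommended by the author]: Readers interested in web scraping and JS reverse engineering can follow the columns 《爬虫JS逆向实战》 and 《深耕爬虫领域》.
The author will keep publishing articles on techniques used, learned, and seen along the way, including but not limited to: bypassing various CAPTCHAs, reverse engineering of apps and JS for scraping, RPA automation, distributed crawlers, and Python in general.
Author's statement: This article is for learning, exchange, and reference only. Any commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. In case of infringement, please contact the author for removal.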
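1. Preface
Search requests on this site must carry a Token. The Token is produced by the site's own sec endpoint (an invisible, no-interaction verification), and the request that produces it carries signed parameters that have to be reproduced; the source is obfuscated. There is also a chance of a secondary verification (GeeTest). Overall it is fairly simple. A reader who asked about this earlier was hitting exceptions while patching the environment, and the cause was the anti-debugging code. Only two things need handling: the formatting check and the runaway recursion that exhausts memory.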
Target site (Base64-encoded):
aHR0cHM6Ly9tLmFwcC5taS5jb20v
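2. API analysis
Search for any keyword and the submitted request parameters include a Token. Its value is returned in the response of the preceding request; it is valid for a single use and cannot be hard-coded, as shown below: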
{"msg":"非正常请求","code":403001,"data":null,"logId":"MO-29s4w-elibom-3c-noitcudorp-noitargetni-bew-erotsppa_0825121058059_33aa"}
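The request that generates the Token has two dynamic parameters (s and d) that need handling. Following the call stack leads into m.js, an obfuscated JS file. Locate where the request is sent and step through from there to reach the place where the parameters are finally generated, as shown below: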
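3. Environment patching analysis
The obfuscated JS implements a number of common anti-debugging measures, including but not limited to environment checks, a Function.prototype.toString check, and checks for automation tools. The source is obfuscated with control-flow flattening and string encryption (all strings live in the _0x3fb6 array) and is restored dynamically at runtime, as shown below: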
// Function.prototype.toString check
var _0x4ef304 = function() {
    var _0x5ca3e4 = new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');
    return !_0x5ca3e4['\x74\x65\x73\x74'](_0x20e69d['\x74\x6f\x53\x74\x72\x69\x6e\x67']());
};
// Browser fingerprint check
function _0x836b91() {
    /Android ((\d).\d+)/['test'](navigator['userAgent']);
    return parseInt(RegExp['$2']) < 6;
}
// WebGL check
var _0x5c8ed2 = document.createElement('canvas');
var _0x510957 = _0x5c8ed2.getContext('webgl') || _0x5c8ed2.getContext('experimental-webgl');
// Infinite recursion (crashes the process)
function _0x3fa0e2(_0x16d6fa) {
    if (_0x16d6fa['indexOf']('\x69' === -1)) {
        _0x3f7dc2(_0x16d6fa);
    }
}
// Memory consumption
var _0x2e4c9a = [];
for (var i = 0; i < 1000000; i++) {
    _0x2e4c9a.push(Math.random());
}
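To see why beautifying the script trips the toString check, the hex-escaped pattern from the first snippet can be decoded and tried against a compact and a pretty-printed function body. The sketch below uses Python rather than JavaScript, and the two sample function sources are made up for illustration; only the pattern itself comes from the code above.

```python
import re

# Same \xNN escapes as in the JS RegExp call above; Python decodes them
# to the identical pattern text.
PATTERN = (
    "\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a"
    "\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d"
)


def looks_tampered(function_source: str) -> bool:
    """Mirror of the JS check: True when the source does NOT match the pattern."""
    return re.search(PATTERN, function_source) is None


compact = "function () {return 'dev';}"         # hypothetical single-line form
pretty = "function () {\n    return 'dev';\n}"  # the same code after beautifying

print(PATTERN)
print(looks_tampered(compact), looks_tampered(pretty))
```

The pattern only matches when the opening brace is followed directly by the function body on the same line, so any formatter that inserts a line break after the brace flips the result.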
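In the figure above, _0x27edce is the entry encryption function. It takes two parameters: the structured env data, and a fixed string search that can be passed or omitted, as shown in the figure below:
If you go with environment patching and do not want to analyze the obfuscated logic of the whole script, taking the complete m.js source is enough. The author used jsdom for a quick setup here (patching by hand or using another framework works too). The environment header is as follows: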
const { JSDOM } = require("jsdom");
const baseUrl = "https://m.app.mi.com/";
const dom = new JSDOM("", {
url: baseUrl,
referrer: baseUrl,
userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
runScripts: "dangerously"
});
window = dom.window;
document = window.document;
window.HTMLCanvasElement.prototype.getContext = function() {
return {
fillRect: function() {},
clearRect: function() {},
getImageData: function(x, y, w, h) {
return {
data: new Uint8ClampedArray(w * h * 4)
};
},
putImageData: function() {},
createImageData: function() {
return [];
},
setTransform: function() {},
drawImage: function() {},
save: function() {},
fillText: function() {},
restore: function() {},
beginPath: function() {},
moveTo: function() {},
lineTo: function() {},
closePath: function() {},
stroke: function() {},
translate: function() {},
scale: function() {},
rotate: function() {},
arc: function() {},
fill: function() {},
measureText: function() {
return { width: 0 };
},
transform: function() {},
rect: function() {},
clip: function() {},
};
};
window.HTMLCanvasElement.prototype.toDataURL = function() {
return "";
};
navigator = {
appVersion:"5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
platform:'MacIntel',
appCodeName:'Mozilla',
appName:'Netscape',
language:'en-US',
product:'Gecko',
vendorSub:'',
vendor:'Google Inc.',
userAgent:'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36'
}
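Then expose _0x27edce on window so it can be called globally. Two small details need handling: the infinite recursion from the detection points above, which runs until memory is exhausted, and the formatting check. Comment them out or modify them, as shown below:
4. Pure-algorithm reconstruction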
function _0x27edce(_0xd7d75d, _0x264211) {
var _0x4874ab = function(_0xd7d75d) {
for (var _0x264211 = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '-', '=', '_', '+', '~', '`', '{', '}', '[', ']', '|', ':', '<', '>', '?', '/', '.'], _0x4874ab = [], _0xb6b2f = 0x0; _0xb6b2f < _0xd7d75d; _0xb6b2f += 0x1)
_0x4874ab[_0x5ebc('0x17')](_0x264211[parseInt(0x59 * Math['random'](), 0xa)]);
return _0x4874ab[_0x5ebc('0x19')]('');
}(0x10)
, _0xb6b2f = _0x25aa39[_0x5ebc('0x261')][_0x5ebc('0x262')][_0x5ebc('0x263')](_0x5ebc('0x264'))
, _0xd88633 = _0x25aa39[_0x5ebc('0x265')]['pkcs7'][_0x5ebc('0x266')](_0x25aa39[_0x5ebc('0x261')][_0x5ebc('0x262')][_0x5ebc('0x263')](JSON['stringify'](_0xd7d75d)))
, _0xd88633 = new _0x25aa39[(_0x5ebc('0x267'))][(_0x5ebc('0x12f'))](_0x25aa39['utils']['utf8'][_0x5ebc('0x263')](_0x4874ab),_0xb6b2f)[_0x5ebc('0x12b')](_0xd88633)
, _0xd88633 = _0x5aeeb2['encode'](_0x3969ee[_0x5ebc('0x268')](_0x25aa39['utils'][_0x5ebc('0x11d')][_0x5ebc('0x269')](_0xd88633)))
, _0x4874ab = _0x3250e2[_0x5ebc('0x12b')](_0x5aeeb2[_0x5ebc('0x117')](_0x4874ab), _0x3250e2[_0x5ebc('0x26a')](_0x5ebc('0x26b')))
, _0xd7d75d = _0x5aeeb2[_0x5ebc('0x117')](JSON[_0x5ebc('0x26c')](_0xd7d75d))
, _0x264211 = (_0x264211 = _0x264211 + _0xd7d75d,
_0x379e77[_0x5ebc('0x143')](_0x264211));
return Object(_0x1c50fb['i'])() ? {
's': _0x264211,
'd': _0xd7d75d
} : {
's': _0x4874ab,
'd': _0xd88633
};
}
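Reading the function above gets easier once the _0x5ebc('0x..') lookups are inlined. The sketch below assumes the simplest form of this obfuscation, where the argument is a hex index into a string table. The table entries are inferred from how each call is used in the snippet (for example, '0x17' is called like push and '0x19' like join); they are not dumped from the real _0x3fb6 array, and a real build may also rotate or encode that array.

```python
import re

# Hypothetical partial table (index -> string), inferred from usage only.
INFERRED_STRINGS = {
    0x17: "push",
    0x19: "join",
    0x51: "length",
    0x26c: "stringify",
}

LOOKUP_CALL = re.compile(r"_0x5ebc\('(0x[0-9a-fA-F]+)'\)")


def inline_lookups(js_source: str, table: dict) -> str:
    """Replace _0x5ebc('0x..') with a quoted literal when the index is known."""

    def substitute(match):
        index = int(match.group(1), 16)
        if index not in table:
            return match.group(0)  # unknown entry: leave the call untouched
        return "'" + table[index] + "'"

    return LOOKUP_CALL.sub(substitute, js_source)


line = "return _0x4874ab[_0x5ebc('0x19')](''), _0xd7d75d[_0x5ebc('0x999')];"
print(inline_lookups(line, INFERRED_STRINGS))
```

In practice the table is more reliably obtained by calling the decoder function inside the patched environment and recording its outputs than by guessing from usage.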
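Starting from the core obfuscated code above, the pure-algorithm encryption flow can be reconstructed. _0xd7d75d is originally the large env object and _0x264211 is an optional parameter. _0x4874ab draws a 16-character random string from a character table; this is the AES key. Inside that inner function, the local _0x264211 shadows the outer parameter and holds the character set for the key. The implementation is as follows: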
import random

def generate_aes_key():
    # 89-character table, matching the 0x59 multiplier in the JS
    charset = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-=_+~`{}[]|:<>?/.")
    return "".join(random.choice(charset) for _ in range(16)).encode("utf-8")
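One detail worth checking: the JS picks an index with parseInt(0x59 * Math.random(), 10), so the table has to hold exactly 0x59 (89) characters for the index range 0..88 to cover it. The sketch below verifies that for the charset string used above. Using secrets instead of random is an optional variation introduced here, not something the original code does.

```python
import secrets

CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-=_+~`{}[]|:<>?/."


def generate_aes_key_secure(length: int = 16) -> bytes:
    """Pick `length` characters uniformly from CHARSET, like the JS loop."""
    return "".join(secrets.choice(CHARSET) for _ in range(length)).encode("utf-8")


# int(0x59 * r) for r in [0, 1) yields indices 0..88, i.e. the whole table.
assert len(CHARSET) == 0x59

key = generate_aes_key_secure()
print(len(key), key)
```

Every character in the table is ASCII, so 16 characters always encode to the 16 bytes that AES-128 expects.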
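At this point we have the AES key and the structured env data. Next, look at how _0xd88633, which corresponds to parameter d, is produced. The obfuscated JS has obvious markers here, pkcs7 among them; jumping to it leads to the following code: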
var _0x25aa39 = {
'AES': _0x4cca13,
'ModeOfOperation': {
'cbc': _0x35f959
},
'utils': {
'hex': _0x264211,
'utf8': _0x51dd15
},
'padding': {
'pkcs7': {
'pad': function(_0xd7d75d) {
var _0x264211 = 0x10 - (_0xd7d75d = _0x300d04(_0xd7d75d, !0x0))['length'] % 0x10
, _0x4874ab = _0x2d1977(_0xd7d75d[_0x5ebc('0x51')] + _0x264211);
_0x294129(_0xd7d75d, _0x4874ab);
for (var _0xb6b2f = _0xd7d75d[_0x5ebc('0x51')]; _0xb6b2f < _0x4874ab[_0x5ebc('0x51')]; _0xb6b2f++)
_0x4874ab[_0xb6b2f] = _0x264211;
return _0x4874ab;
}
}
}
}
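The pad function above is plain PKCS#7: it always appends between 1 and 16 bytes, each holding the pad length. A minimal stdlib-only sketch of the same rule follows; the function names are chosen here for illustration.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # 1..block_size, never 0
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes) -> bytes:
    """Inverse of pkcs7_pad; reads the last byte and performs no validation."""
    return padded[: -padded[-1]]


print(pkcs7_pad(b"abc").hex())
print(len(pkcs7_pad(b"0123456789abcdef")))
```

Note that input already aligned to 16 bytes still gains a full extra block, which is why the ciphertext for d is always at least one block longer than an aligned plaintext.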
_0x5ebc('0x264') is the IV for the AES encryption and _0x5ebc('0x12f') is the AES mode in use. The _0x5aeeb2 object being called is the following:
var _0x5aeeb2 = {
'base64': _0x5ebc('0x119'),
'encode': function(_0xd7d75d) {
if (!_0xd7d75d)
return !0x1;
for (var _0x264211, _0x4874ab, _0xb6b2f, _0xd88633, _0x49094b, _0x3aca4c, _0x253b33 = '', _0x2f77ed = 0x0; _0xb6b2f = (_0x3aca4c = _0xd7d75d[_0x5ebc('0xec')](_0x2f77ed++)) >> 0x2,
_0xd88633 = (0x3 & _0x3aca4c) << 0x4 | (_0x264211 = _0xd7d75d[_0x5ebc('0xec')](_0x2f77ed++)) >> 0x4,
_0x49094b = (0xf & _0x264211) << 0x2 | (_0x4874ab = _0xd7d75d[_0x5ebc('0xec')](_0x2f77ed++)) >> 0x6,
_0x3aca4c = 0x3f & _0x4874ab,
isNaN(_0x264211) ? _0x49094b = _0x3aca4c = 0x40 : isNaN(_0x4874ab) && (_0x3aca4c = 0x40),
_0x253b33 += this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0xb6b2f) + this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0xd88633) + this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0x49094b) + this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0x3aca4c),
_0x2f77ed < _0xd7d75d[_0x5ebc('0x51')]; )
;
return _0x253b33;
},
'decode': function(_0xd7d75d) {
if (!_0xd7d75d)
return !0x1;
_0xd7d75d = _0xd7d75d[_0x5ebc('0x43')](/[^A-Za-z0-9\+\/\=]/g, '');
for (var _0x264211, _0x4874ab, _0xb6b2f, _0xd88633, _0x49094b = '', _0x3aca4c = 0x0; _0x264211 = this[_0x5ebc('0x11a')][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),
_0x4874ab = this['base64'][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),
_0xb6b2f = this[_0x5ebc('0x11a')][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),
_0xd88633 = this['base64'][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),
_0x49094b += String[_0x5ebc('0x11b')](_0x264211 << 0x2 | _0x4874ab >> 0x4),
0x40 != _0xb6b2f && (_0x49094b += String[_0x5ebc('0x11b')]((0xf & _0x4874ab) << 0x4 | _0xb6b2f >> 0x2)),
0x40 != _0xd88633 && (_0x49094b += String['fromCharCode']((0x3 & _0xb6b2f) << 0x6 | _0xd88633)),
_0x3aca4c < _0xd7d75d[_0x5ebc('0x51')]; )
;
return _0x49094b;
}
}
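The encode loop above reads like ordinary Base64: three bytes in, four table lookups out, with index 0x40 selecting the padding character. The sketch below replays that loop in Python. It assumes the table behind _0x5ebc('0x119') is the standard alphabet followed by "="; the table itself is not shown in the article, although the article's Python code using the standard base64 module is consistent with that assumption.

```python
import base64

# Assumed table: standard alphabet plus '=' at index 64 (0x40).
TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def js_style_b64encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        b0 = chunk[0]
        b1 = chunk[1] if len(chunk) > 1 else None  # None plays the role of NaN
        b2 = chunk[2] if len(chunk) > 2 else None
        e0 = b0 >> 2
        e1 = (b0 & 0x3) << 4 | (b1 or 0) >> 4
        e2 = ((b1 or 0) & 0xF) << 2 | (b2 or 0) >> 6
        e3 = (b2 or 0) & 0x3F
        if b1 is None:       # one byte left: two padding characters
            e2 = e3 = 0x40
        elif b2 is None:     # two bytes left: one padding character
            e3 = 0x40
        out.append(TABLE[e0] + TABLE[e1] + TABLE[e2] + TABLE[e3])
    return "".join(out)


sample = b"0123456789abcdef"
print(js_style_b64encode(sample))
print(js_style_b64encode(sample) == base64.b64encode(sample).decode())
```

One difference from the JS: for empty input the original returns false, while this sketch returns an empty string.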
From the analysis above, parameter d is built by first JSON-serializing the env_data object, as shown below:
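Because the ciphertext depends on the exact bytes being encrypted, the Python serialization should match what JSON.stringify emits: no spaces after separators and non-ASCII characters left unescaped. A small sketch with a made-up env_data follows (the field names and values are placeholders, not the site's real fields):

```python
import json

# Hypothetical payload, for illustration only.
env_data = {"ua": "Mozilla/5.0", "lang": "中文", "ts": 1700000000000}

default_form = json.dumps(env_data)
compact_form = json.dumps(env_data, separators=(",", ":"), ensure_ascii=False)

print(default_form)   # spaces after ',' and ':' plus \uXXXX escapes
print(compact_form)   # compact, non-ASCII kept as-is
print(len(compact_form.encode("utf-8")))
```

Both Python dicts and JS objects keep insertion order for string keys like these, so the field order of the payload should be built in the same order the script builds it.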
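Then the AES key is generated, and the serialized data is encrypted with the CBC mode and IV observed while debugging and finally encoded to produce parameter d. The reconstructed algorithm is as follows: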
import json
import base64
from Crypto.Util.Padding import pad
from Crypto.Cipher import AES, PKCS1_v1_5

def aes_cbc_encrypt_fixed_iv(key: bytes, data: bytes) -> bytes:
    # Fixed IV as observed while debugging; PKCS#7 padding to the AES block size
    iv = b"0102030405060708"
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return cipher.encrypt(pad(data, AES.block_size))

def sign(env_data: dict) -> dict:
    # Compact JSON, matching JSON.stringify
    json_data = json.dumps(env_data, separators=(',', ':'), ensure_ascii=False).encode('utf-8')
    # Random 16-character key
    aes_key = generate_aes_key()
    encrypted_data = aes_cbc_encrypt_fixed_iv(aes_key, json_data)
    aes_key_b64 = base64.b64encode(aes_key).decode()
    d = base64.b64encode(encrypted_data).decode()
    # The key is returned alongside d because s must be derived from this same key
    return {"d": d, "aes_key_b64": aes_key_b64}
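Next, look at how parameter s is generated. At _0x4874ab the code calls getPublicKey on the public key stored at _0x5ebc('0x26b') and then performs an RSA encryption with it. The related markers can also be seen in the large string array at the start, as shown below: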
_0x3250e2 = {
'getPublicKey': function(_0xd7d75d) {
return !(_0xd7d75d[_0x5ebc('0x51')] < 0x32) && (_0x5ebc('0x11e') == _0xd7d75d['substr'](0x0, 0x1a) && ('-----END\x20PUBLIC\x20KEY-----' == (_0xd7d75d = _0xd7d75d[_0x5ebc('0x115')](0x1a))[_0x5ebc('0x115')](_0xd7d75d[_0x5ebc('0x51')] - 0x18) && (_0xd7d75d = _0xd7d75d['substr'](0x0, _0xd7d75d['length'] - 0x18),
!(_0xd7d75d = new _0x56ab29(_0x5aeeb2['decode'](_0xd7d75d)))[_0x5ebc('0x42')] && (_0x5ebc('0x11f') === (_0xd7d75d = _0xd7d75d[_0x5ebc('0x11')])[0x0][0x0][0x0] && new _0x8f65e0(_0xd7d75d[0x0][0x1][0x0][0x0],_0xd7d75d[0x0][0x1][0x0][0x1])))));
},
'encrypt': function(_0xd7d75d, _0x264211) {
if (!_0x264211)
return !0x1;
var _0x4874ab = _0x264211[_0x5ebc('0x116')][_0x5ebc('0x10f')]() + 0x7 >> 0x3;
if (!(_0xd7d75d = this[_0x5ebc('0x120')](_0xd7d75d, _0x4874ab)))
return !0x1;
if (!(_0xd7d75d = _0xd7d75d[_0x5ebc('0x121')](_0x264211[_0x5ebc('0x118')], _0x264211[_0x5ebc('0x116')])))
return !0x1;
for (_0xd7d75d = _0xd7d75d[_0x5ebc('0x54')](0x10); _0xd7d75d[_0x5ebc('0x51')] < 0x2 * _0x4874ab; )
_0xd7d75d = '0'[_0x5ebc('0x18')](_0xd7d75d);
return _0x5aeeb2[_0x5ebc('0x117')](_0x3969ee['decode'](_0xd7d75d));
},
'pkcs1pad2': function(_0xd7d75d, _0x264211) {
if (_0x264211 < _0xd7d75d[_0x5ebc('0x51')] + 0xb)
return null;
for (var _0x4874ab = [], _0xb6b2f = _0xd7d75d[_0x5ebc('0x51')] - 0x1; 0x0 <= _0xb6b2f && 0x0 < _0x264211; )
_0x4874ab[--_0x264211] = _0xd7d75d[_0x5ebc('0xec')](_0xb6b2f--);
for (_0x4874ab[--_0x264211] = 0x0; 0x2 < _0x264211; )
_0x4874ab[--_0x264211] = Math['floor'](0xfe * Math[_0x5ebc('0x3d')]()) + 0x1;
return _0x4874ab[--_0x264211] = 0x2,
_0x4874ab[--_0x264211] = 0x0,
new _0x20635b(_0x4874ab);
}
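The encrypt and pkcs1pad2 routines above are textbook RSA with PKCS#1 v1.5 type-2 padding: build 00 02, non-zero random filler, 00, then the message; raise the result to the public exponent modulo n; left-pad the hex to the modulus length; Base64 the bytes. The sketch below reproduces those steps with plain Python integers. The key is a toy one built from two Mersenne primes purely so the example runs without any library; it is not the site's key and offers no security.

```python
import base64
import secrets

# Toy key (assumption, illustration only): n = (2^127 - 1) * (2^521 - 1).
P, Q = 2**127 - 1, 2**521 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, used only to verify


def pkcs1_pad2(message: bytes, k: int) -> bytes:
    """00 02 PS 00 M, where PS is non-zero random filler and k the modulus size."""
    if k < len(message) + 11:
        raise ValueError("message too long for this modulus")
    filler = bytes(secrets.randbelow(254) + 1 for _ in range(k - len(message) - 3))
    return b"\x00\x02" + filler + b"\x00" + message


def rsa_encrypt_b64(message: bytes, n: int, e: int) -> str:
    k = (n.bit_length() + 7) >> 3                  # modulus length in bytes
    m = int.from_bytes(pkcs1_pad2(message, k), "big")
    c = pow(m, e, n)                               # modular exponentiation
    # Fixed-width output plays the role of the zero-padded hex in the JS
    return base64.b64encode(c.to_bytes(k, "big")).decode()


def rsa_decrypt_b64(token: str, n: int, d: int) -> bytes:
    k = (n.bit_length() + 7) >> 3
    c = int.from_bytes(base64.b64decode(token), "big")
    block = pow(c, d, n).to_bytes(k, "big")
    return block[block.index(b"\x00", 2) + 1:]     # strip 00 02 PS 00


aes_key_b64 = base64.b64encode(b"0123456789abcdef")  # stand-in for the random key
s = rsa_encrypt_b64(aes_key_b64, N, E)
print(s)
print(rsa_decrypt_b64(s, N, D) == aes_key_b64)
```

Because the filler is random, s differs on every call even for the same key, which matches the behaviour of the original script.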
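Putting this together, parameter s is the Base64-encoded AES key, RSA-encrypted and then Base64-encoded again. The server-side check therefore presumably Base64-decodes s, decrypts it with the RSA private key to recover the AES key, and then uses that 16-byte key to decrypt the business data carried in d, which validates the request. The flow chart of the pure-algorithm encryption and its implementation are as follows: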
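import base64
from Crypto.Cipher import PKCS1_v1_5
from Crypto.PublicKey import RSA

def rsa_encrypt_pkcs1_v1_5(data: bytes, public_key_pem: str) -> bytes:
    rsa_key = RSA.import_key(public_key_pem)
    cipher = PKCS1_v1_5.new(rsa_key)
    return cipher.encrypt(data)

def build_s(aes_key: bytes, public_key_pem: str) -> str:
    # aes_key must be the key that encrypted d; the PEM is the string at _0x5ebc('0x26b')
    aes_key_b64 = base64.b64encode(aes_key).decode()
    encrypted = rsa_encrypt_pkcs1_v1_5(aes_key_b64.encode(), public_key_pem)
    return base64.b64encode(encrypted).decode()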
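Whether you patch the environment or use the pure algorithm, one small detail needs attention: when building env_data, every timestamp field must be generated dynamically. If the encrypted parameters are wrong, the request fails signature verification and returns something like the following: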
{'msg': '参数错误', 'code': 400, 'data': {'message': 'invalid data', 'status': 463}}
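As mentioned at the beginning, there is a chance of triggering GeeTest's secondary behavioral verification (a slider). Those interested can analyze it as well; the trigger rate is very low, and it appeared only once during debugging, as shown below: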