Baidu Login and Tieba Sign-in

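I've long wanted to pick up Python as a side skill, but I never knew how to go about it. Liao's tutorial always felt conceptual to me: I understand it while reading and forget it as soon as I finish... So I went back to tinkering with Baidu login. It isn't too hard, and it lets me get familiar with Python syntax along the way.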

Process

Basic Analysis

Login page: https://wappass.baidu.com/passport/?login

  • Two JS files are involved:

    https://wappass.baidu.com/static/touch/js/base_650ea03.js
    https://wappass.baidu.com/static/touch/js/login_54355d0.js

  • Enter the username and password and click Log in; the page sends a POST request:

    POST https://wappass.baidu.com/wp/api/login

  • Form parameters (empty or irrelevant ones removed):

    username=codemoon
    password=552cf20136a71e9e24b77972373e1369d39488b6790d91241c969655559a9c1a7dc40e1c44c2ec6854098b0b111704e56b9400433baf604dfe1ccb9ef817c12c6ce04cf3e8f893a34ccda24b37a9ba58a8e459f3200df87d3905cff52efbba9fc1bef777f2ad3f0bd40f3b0b0211a99c00e67b24d6ff3c96da2beab602105b56
    verifycode=
    vcodestr=
    servertime=b4ad7c520c
    gid=D600999-A410-4847-8481-B5E406FA5F8C
    logLoginType=wap_loginTouch
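  • verifycode and vcodestr relate to the captcha and are discussed later.

    Note that the request must also carry some cookie (any cookie will do); otherwise the server responds with "开启cookie之后才能登录" ("You must enable cookies before logging in").

  • On a successful login the response body is: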

    {
    "errInfo": {
    "no": "0",
    "msg": ""
    },
    "data": {
    "u": "https:\/\/wap.baidu.com?uid=1494376458795_520&ssid=ff48ab865551d1709191b7f779813736.3.1494376473.1.AjJChlKzM9fk",
    "serverTime": "",
    "codeString": "",
    "bduss": "ARTcWhYQ2JOSVhNYkJFZVZprX5UVk93TjlBNVlTbWttVDJVbm9-Mi1BZ1o3VGxaSVFBqUFBJCQAAAAAAAAAAAEAAAA9nC4BY29kZW1vb24AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABlgElkZYBJZZ",
    "ptoken": "ed799d5b919aa00ef4f13b65ffcd748b",
    "bcsn": "",
    "bcsync": "",
    "bcchecksum": "",
    "bctime": "",
    "gotoUrl": "",
    "userid": "",
    "phone": "",
    "appealurl": "",
    "second_u": "",
    "ppU": ""
    }
    }
  • The response also sets cookies:

    Set-Cookie: BAIDU_WISE_UID=wpass_1494398827734_660; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=wappass.baidu.com
    Set-Cookie: HISTORY=0e67dd3c395772a48505fdb800377569; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=wappass.baidu.com
    Set-Cookie: UBI=fi_PncwhpxZ%7ETaJcxSJpjJBawYlkm7tJf8Kg4WcvaT2LOjE4zbLUvOSLJL6mmFaGKW3hwApmXkpe4eVpsc2UKcVoRG5ZYwNIKWcZE19irIW2sXrg%7EYL4WrW2zBaPbXGVf8frk-s94igMg7hw31rCbrqAuA_; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=wappass.baidu.com; httponly
    Set-Cookie: PASSID=FRLg; expires=Tue, 10-May-2016 06:47:07 GMT; path=/; domain=wappass.baidu.com; httponly
    Set-Cookie: WAP20_1494398827=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
    Set-Cookie: PTOKEN=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.baidu.com
    Set-Cookie: BDUSS=ARTcWhYQ2JOSVhNYkJFZVZprX5UVk93TjlBNVlTbWttVDJVbm9-Mi1BZ1o3VGxaSVFBqUFBJCQAAAAAAAAAAAEAAAA9nC4BY29kZW1vb24AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABlgElkZYBJZZ; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=.baidu.com; httponly
    Set-Cookie: SAVEUSERID=1eb65baf66ac574da855c134; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=wappass.baidu.com; httponly
    Set-Cookie: PTOKEN=ed799d5b919aa00ef4f13b65ffcd748b; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=wappass.baidu.com; secure; httponly
    Set-Cookie: STOKEN=3172d1b0b8a3c82e860f33df1b2926bf494084922684b11dd6d7b50120502b81; expires=Sun, 27-Jul-2025 06:47:07 GMT; path=/; domain=wappass.baidu.com; secure; httponly
    Set-Cookie: BAIDUID=EC02381AF1CCE974F6E3E422B4EE2A78:FG=1; expires=Thu, 10-May-18 06:47:07 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
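
As a rough illustration of how the login response above might be consumed, the sketch below checks errInfo.no and pulls out bduss and ptoken. The function name parse_login_response and the shape of the returned dictionary are hypothetical and not taken from the original script; only the field names come from the sample response.

```python
import json

def parse_login_response(body: str) -> dict:
    """Extract the success flag, message, bduss and ptoken from the login JSON body."""
    payload = json.loads(body)
    err = payload.get("errInfo", {})
    data = payload.get("data", {})
    return {
        "ok": err.get("no") == "0",      # "0" is the success code in the sample above
        "msg": err.get("msg", ""),
        "bduss": data.get("bduss", ""),
        "ptoken": data.get("ptoken", ""),
    }

# Shortened sample modelled on the response above (values are placeholders)
sample = '{"errInfo": {"no": "0", "msg": ""}, "data": {"bduss": "ARTcWh", "ptoken": "ed799d"}}'
result = parse_login_response(sample)
print(result["ok"], result["bduss"])
```

Any value of errInfo.no other than "0" is treated as a failure here, which is an assumption; the post only documents "0" and "500001".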

With the cookies in hand, later actions such as posting and signing in become easy. First, here is how each of the form parameters is obtained.

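Getting the servertime parameter

Send a GET request to the antireplaytoken endpoint (see the code below) and read the time field from the JSON response, for example fc5f705266. Appending it to the plaintext password (say 123456) gives the transformed password string: 123456fc5f705266

  • The code: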

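    import requests

    def getServerTime():
        # getTimestamp() is a helper defined elsewhere in the script; it returns the current timestamp as a string
        url = "https://wappass.baidu.com/wp/api/security/antireplaytoken?v=" + getTimestamp()
        res = requests.get(url)
        try:
            return res.json()["time"]
        except (ValueError, KeyError):
            # the body was not JSON, or it had no "time" field
            return ""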

Computing the gid parameter

A gid looks like xxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx. The JS that generates it comes from base_xxxxxx.js on the login page:

e.guideRandom = function() {
    return "xxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function(t) {
        var e = 16 * Math.random() | 0, // random integer in 0..15
            i = ("x" == t) ? e : 3 & e | 8; // x: use the random value; y: use (3 & random | 8), i.e. 8..11
        return i.toString(16) // convert to a hexadecimal digit
    }).toUpperCase() // convert to upper case
}()

Translated into Python:

import random

def getGID():
    def transform(char):
        # "4" and "-" are literal characters in the template
        if char == "4" or char == "-": return char
        number = random.randint(0, 15)
        # "y" maps to 8..11; "x" keeps the full 0..15 range
        if char != "x": number = 3 & number | 8
        return format(number, "x").upper()
    return "".join([transform(c) for c in "xxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx"])
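
To sanity-check a generated gid, one option is to match it against the template's shape. The sketch below is self-contained: make_gid is a compact restatement of the rule above, and is_valid_gid, GID_TEMPLATE and GID_PATTERN are hypothetical names introduced only for this illustration.

```python
import random
import re

GID_TEMPLATE = "xxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx"
# 7-4-4-4-12 upper-case hex digits; third group starts with 4, fourth with 8, 9, A or B
GID_PATTERN = re.compile(
    r"[0-9A-F]{7}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}"
)

def make_gid() -> str:
    """Fill the template: x becomes a random hex digit, y becomes one of 8, 9, A, B."""
    def pick(ch: str) -> str:
        if ch == "x":
            return format(random.randint(0, 15), "X")
        if ch == "y":
            return format(random.randint(0, 15) & 3 | 8, "X")
        return ch  # "4" and "-" are kept literally
    return "".join(pick(ch) for ch in GID_TEMPLATE)

def is_valid_gid(gid: str) -> bool:
    """True if the whole string has the gid shape described above."""
    return GID_PATTERN.fullmatch(gid) is not None

print(is_valid_gid(make_gid()))
```

The gid captured in the request earlier, D600999-A410-4847-8481-B5E406FA5F8C, also passes this check.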

Computing the password parameter

login_xxxxx.js contains this snippet:

var u = new RSAKeyPair("10001", "", a.rsa);
c.password = encryptedString(u, c.password)

The second argument to encryptedString is the transformed password string described above.
a.rsa is a fixed string:

B3C61EBBA4659C4CE3639287EE871F1F48F7930EA977991C7AFE3CC442FEA49643212E7D570C853F368065CC57A2014666DA8AE7D493FD47D171C0D894EEE3ED7F99F6798B7FFD7B5873227038AD23E3197631A8CB642213B9F27D4901AB0D92BFA27542AE890855396ED92775255C977F5C302F1E7ED4B1E369C12CB6B1822F

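We need the return value of encryptedString to use as the password field of the login request.

Opening login_xxxxx.js reveals more than 1,000 lines of code...
RSAKeyPair pulls in a lot of functions, and my JS is weak, so just reading it gave me a headache, never mind translating it all into Python. So I asked a front-end colleague whether Python could simply execute the JS directly.

I found an answer to that and, after reading it, decided to use the PyExecJS library to run the JS. Of course, running the whole of login_xxxxx.js through PyExecJS doesn't work, because there is no window object, so the window-related code has to be stripped out.

As for trimming login_xxxxx.js down to the core algorithm: after pestering the front-end colleague with assorted small questions, I managed to isolate it, at a bit over 310 lines. The trimmed code is at the very end of the article.

Then I added one function of my own that takes the original password and servertime and returns the encrypted password: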

function encryptPass(pass, serverTime) {
    var password = SBCtoDBC(pass) + serverTime; // transformed password
    setMaxDigits(131);
    var u = new RSAKeyPair("10001", "", "B3C61EBBA4659C4CE3639287EE871F1F48F7930EA977991C7AFE3CC442FEA49643212E7D570C853F368065CC57A2014666DA8AE7D493FD47D171C0D894EEE3ED7F99F6798B7FFD7B5873227038AD23E3197631A8CB642213B9F27D4901AB0D92BFA27542AE890855396ED92775255C977F5C302F1E7ED4B1E369C12CB6B1822F");
    password = encryptedString(u, password);
    //console.log(password);
    return password;
}

Calling encryptPass from Python then yields the encrypted password:

ctx = execjs.compile(jssource)
result = ctx.call("encryptPass", password, serverTime)
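
The password transformation that happens before RSA encryption can be mirrored in Python for debugging. The body of SBCtoDBC is not shown in this post, so the sketch below assumes it performs the conventional full-width to half-width conversion its name suggests; sbc_to_dbc and transform_password are hypothetical names, and the RSA step itself is still left to the JS.

```python
def sbc_to_dbc(text: str) -> str:
    """Map full-width characters to their half-width ASCII counterparts (assumed behaviour)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # full-width (ideographic) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:    # full-width forms of ASCII '!' through '~'
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)

def transform_password(password: str, servertime: str) -> str:
    """Build the string that gets RSA-encrypted: normalized password followed by servertime."""
    return sbc_to_dbc(password) + servertime

print(transform_password("123456", "fc5f705266"))  # 123456fc5f705266
```

Comparing this string with the value inside encryptPass (for example through the commented-out console.log) is one way to confirm that both sides agree before encryption.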

Login

Once all the parameters are known, a plain POST does the job. See the code:

import pprint
import requests

servertime = getServerTime()
gid = getGID()
password = encryptPassword(password, servertime)

postData = {
    "username" : username,
    "password" : password,
    "servertime" : servertime,
    "gid" : gid,
    "verifycode" : "",
    "vcodestr" : "",
    "logLoginType" : "wap_loginTouch",
}

# Some cookie must be sent (any will do); otherwise the server replies "开启cookie之后才能登录" (cookies must be enabled to log in)
anyCookie = {"aaaa": "2A64BAAC0FF64743A2CE8A60A71075E7"}
url = "https://wappass.baidu.com/wp/api/login"
res = requests.post(url, data=postData, cookies=anyCookie)
result = res.json()  # parse the body once and reuse it
pprint.pprint(result)
errCode, errMsg, codeString = result["errInfo"]["no"], result["errInfo"]["msg"], result["data"]["codeString"]
cookie = res.cookies

Yes, it's that simple: the cookie jar now contains BDUSS and the rest.
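
Since the same form is needed again once a captcha is involved, it may be convenient to build it in one place. This is a minimal sketch; build_login_form is a hypothetical helper, not part of the original script, and it only assembles the fields listed in the analysis above.

```python
def build_login_form(username, encrypted_password, servertime, gid,
                     verifycode="", vcodestr=""):
    """Assemble the POST form for the login endpoint; captcha fields default to empty."""
    return {
        "username": username,
        "password": encrypted_password,   # output of the RSA step, not the plaintext
        "servertime": servertime,
        "gid": gid,
        "verifycode": verifycode,
        "vcodestr": vcodestr,
        "logLoginType": "wap_loginTouch",
    }

form = build_login_form("codemoon", "552cf2", "b4ad7c520c",
                        "D600999-A410-4847-8481-B5E406FA5F8C")
print(sorted(form))
```

The resulting dictionary can be passed as the data argument of requests.post, together with a cookies argument as in the code above.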

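Captcha

After too many wrong-password attempts (only a few are needed to trigger it), the server's JSON response carries error number 500001 and the message 请您输入验证码 ("Please enter the verification code"):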

{
'data': {'appealurl': '',
'bcchecksum': '',
'bcsn': '',
'bcsync': '',
'bctime': '',
'bduss': '',
'codeString': 'jxGe807e28dbd5cc1d1020e14ba9801587b7a8e4406150231e3',
'gotoUrl': '',
'phone': '',
'ppU': '',
'ptoken': '',
'second_u': '',
'serverTime': '1494506062',
'u': '',
'userid': ''},
'errInfo': {'msg': '请您输入验证码', 'no': '500001'}
}

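Note the codeString field in the JSON: it identifies the captcha to fetch. Append it to this URL: https://wappass.baidu.com/cgi-bin/genimage?jxGe807e28dbd5cc1d1020e14ba9801587b7a8e4406150231e3
Open it in a browser and you'll see a PNG image. Once a human has read the captcha, fill two more parameters into the login request: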

postData = {
    "username" : username,
    "password" : password,
    "servertime" : servertime,
    "gid" : gid,
    "verifycode" : "abcd",
    "vcodestr" : "jxGe807e28dbd5cc1d1020e14ba9801587b7a8e4406150231e3",
    "logLoginType" : "wap_loginTouch",
}

verifycode is the captcha text and vcodestr is its identifier. POST once more with these and the login succeeds.
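
The captcha round trip can be sketched as two small helpers: one that detects error 500001 and builds the image URL, and one that fills the two captcha fields into a copy of the form. The names captcha_url_from and with_captcha are hypothetical; the error number, URL prefix and field names are the ones shown above.

```python
CAPTCHA_ERRNO = "500001"
GENIMAGE_URL = "https://wappass.baidu.com/cgi-bin/genimage?"

def captcha_url_from(response: dict):
    """Return the captcha image URL if the response demands a captcha, else None."""
    if response.get("errInfo", {}).get("no") != CAPTCHA_ERRNO:
        return None
    return GENIMAGE_URL + response["data"]["codeString"]

def with_captcha(form: dict, response: dict, typed_code: str) -> dict:
    """Copy the form and fill in the captcha text plus its identifier."""
    updated = dict(form)
    updated["verifycode"] = typed_code
    updated["vcodestr"] = response["data"]["codeString"]
    return updated

# Trimmed-down version of the response shown above
resp = {
    "errInfo": {"no": "500001", "msg": "请您输入验证码"},
    "data": {"codeString": "jxGe807e28dbd5cc1d1020e14ba9801587b7a8e4406150231e3"},
}
print(captcha_url_from(resp))
retry_form = with_captcha({"verifycode": "", "vcodestr": ""}, resp, "abcd")
```

The typed_code value still has to come from a person looking at the image; nothing here attempts to recognize it automatically.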

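Deployment

With the script written, the obvious next step was to upload it to my own VPS and have crontab run it a few times around midnight every day.

A friend happened to have opened an Amazon AWS account with the one-year free tier, so I scp'd the script up to try it. It turned out I had to install a Python 3 environment and various third-party packages; I won't go into detail, since there are plenty of tutorials online.

Also, the server has no GUI, so tkinter is unusable there. I removed all the related code and made a server-only version.

  • Third-party packages to install for python3: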
sudo /usr/local/python3/bin/pip3 install requests
sudo /usr/local/python3/bin/pip3 install PyExecJS
sudo /usr/local/python3/bin/pip3 install beautifulsoup4
  • execJS needs a JavaScript engine; if the server doesn't have one, you can install NodeJS yourself:
wget https://nodejs.org/dist/v6.10.3/node-v6.10.3.tar.gz
tar -zvxf node-v6.10.3.tar.gz
cd node-v6.10.3
./configure
make
sudo make install
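  • Configure crontab with crontab -e. I never changed the server's time zone (presumably UTC, where 16:00 is midnight in Beijing), so I wrote 16 as the hour: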
0,1 16 * * * /usr/local/python3/bin/python3 /home/ec2-user/tieba/tieba_signin_server.py
59 15 * * * /usr/local/python3/bin/python3 /home/ec2-user/tieba/tieba_signin_server.py
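
The hour values in those crontab lines can be double-checked with a small conversion. The sketch below assumes the server clock runs on UTC (the post only says the time zone was left unchanged) and that the intended run times are 23:59, 00:00 and 00:01 Beijing time; utc_cron_fields is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # China Standard Time, fixed UTC+8, no DST

def utc_cron_fields(hour: int, minute: int) -> tuple:
    """Convert a Beijing wall-clock time to the (minute, hour) crontab fields in UTC."""
    local = datetime(2017, 5, 10, hour, minute, tzinfo=BEIJING)  # the date is arbitrary
    utc = local.astimezone(timezone.utc)
    return utc.minute, utc.hour

for hh, mm in [(23, 59), (0, 0), (0, 1)]:
    print(utc_cron_fields(hh, mm))
# (59, 15)
# (0, 16)
# (1, 16)
```

These match the entries above: minute 59 of hour 15, and minutes 0 and 1 of hour 16.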

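After a few manual test runs I found that logging in from a different location triggers security verification (SMS codes and so on), which is awkward to handle (and I'm terminally lazy). I recommend logging in locally, scp-ing the cookies to the script directory on the server, and letting the server do only the sign-in.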
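
One simple way to hand the cookies over is a JSON file. This is a minimal sketch, assuming the cookies are first flattened to a plain name-to-value dictionary (requests.utils.dict_from_cookiejar(res.cookies) produces one); save_cookies, load_cookies and the file name cookies.json are hypothetical and not taken from the original script.

```python
import json
from pathlib import Path

def save_cookies(cookies: dict, path: str) -> None:
    """Write a name -> value cookie dictionary to a JSON file (run on the local machine)."""
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")

def load_cookies(path: str) -> dict:
    """Read the cookie dictionary back (run on the server)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

# Placeholder values; a real file would hold BDUSS and the other cookies from the login
save_cookies({"BDUSS": "placeholder", "PTOKEN": "placeholder"}, "cookies.json")
print(load_cookies("cookies.json")["BDUSS"])
```

On the server, the loaded dictionary can be passed as the cookies argument of requests.get or requests.post. Note that a flat dictionary drops the domain and expiry attributes, and the file holds login credentials, so it deserves restrictive file permissions.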

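Summary

This is the first time I've written this much Python (half of it is JS...). It was a bumpy ride: I googled the syntax as I went and asked the front-end colleague whenever I got stuck. It felt like being back at the moment I first learned to program.

For small utilities, Python needs fairly little code, and the libraries you import are all pleasant to use. Syntax-wise, though, it doesn't feel as comfortable as Swift (yes, I'm a fanboy...). Maybe that's because I'm just starting out and haven't yet grasped what Python is really about. Well, time to study harder!

The code is on GitHub: https://github.com/darkhandz/BaiduLoginWithTiebaSignin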