
The selenium Module in Python

Posted on 2022-01-18, updated on 2022-01-22, category: python

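selenium started out as an automated testing tool. It drives a browser by simulating what a user does: visiting pages, clicking buttons, downloading, scrolling, typing and so on. It also supports all of the mainstream browsers.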

Installing the module

pip install selenium

Preparation before using the module

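First, check your browser version. I use Google Chrome, where the version is shown under Settings -> About Chrome. Mine is: Version 96.0.4664.45 (Official Build) (64-bit).
Next, download the browser driver that matches that version. Unzip it and put the file inside your Python project; the script will point to it later.
Here is the download page for the Chrome driver (for other browsers, search for the corresponding driver):

http://chromedriver.storage.googleapis.com/index.html
Pick the build that matches your own version.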
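Matching the driver to the browser by eye is easy to get wrong, so here is a minimal sketch of the comparison. It assumes the rule from ChromeDriver's version-selection guidance that the first three version components should agree. The helper names `build_prefix` and `driver_matches_browser` are my own, and the driver versions are only illustrative.

```python
def build_prefix(version: str) -> str:
    """Return the first three dot-separated components, e.g. '96.0.4664'."""
    parts = version.strip().split(".")
    return ".".join(parts[:3])


def driver_matches_browser(browser_version: str, driver_version: str) -> bool:
    """True when browser and driver agree on major.minor.build."""
    return build_prefix(browser_version) == build_prefix(driver_version)


# The browser version is the one from this post; the driver versions are examples.
print(driver_matches_browser("96.0.4664.45", "96.0.4664.35"))  # True
print(driver_matches_browser("96.0.4664.45", "97.0.4692.71"))  # False
```

If the check fails, download a different driver build rather than changing the browser.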

Usage example

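# The selenium module
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Instantiate a browser object for your own browser (Chrome here) and give it the correct driver path.
# Selenium 4 takes the driver path through a Service object.
# obj_bro = webdriver.Firefox(service=Service("driver path"))  # with Service imported from selenium.webdriver.firefox.service
obj_bro = webdriver.Chrome(service=Service('./chromedriver_win32/chromedriver.exe'))

# Have the browser request a URL
obj_bro.get("https://www.tommonkey.cn/")
page_content = obj_bro.page_source  # page_source holds the HTML source of the current page

# Elements are located with XPath here; other strategies include class name, id, name, tag name, CSS selector, etc.
# click(): click the located link
obj_bro.find_element(By.XPATH, "/html/body/main/header/div[1]/nav/ul/li[6]/a").click()
time.sleep(2)

# Show the URL and title of the page currently being viewed
print("Current URL: " + obj_bro.current_url + "\nPage title: " + obj_bro.title)

# Back
obj_bro.back()
time.sleep(2)
# Forward
obj_bro.forward()
time.sleep(2)

# Locate the "搜索" (search) link by its link text and click it to open the search box
obj_bro.find_element(By.LINK_TEXT, "搜索").click()
time.sleep(1)

# Once the search box pops up, locate the input field and type the query.
# path.submit() is meant to simulate pressing Enter after typing, but it raised an error for me,
# so the trailing "\n" stands in for Enter. Not bad, if I do say so myself.
path = obj_bro.find_element(By.XPATH, "/html/body/main/header/div[1]/div[2]/div/div[1]/div/input")
path.send_keys("python\n")
time.sleep(1)
# path.submit()
time.sleep(2)

# Clear the search box
path.clear()

# Close the browser
obj_bro.quit()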
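The fixed time.sleep() calls above either wait too long or not long enough. Below is a minimal sketch of the alternative, polling until a condition holds. It is a generic helper of my own (`wait_until` is not part of selenium), and the "page" is simulated so the snippet runs without a browser. selenium ships its own WebDriverWait in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page that only becomes ready on the third check.
attempts = {"count": 0}


def page_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


wait_until(page_ready, timeout=2.0, poll_interval=0.01)
print(attempts["count"])  # 3
```

With a real driver, the condition could be something like `lambda: obj_bro.find_elements(By.XPATH, "...")`, assuming `By` is imported from selenium.webdriver.common.by. find_elements returns an empty list until the element exists, so the helper keeps polling until it shows up.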

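Other common operations

Refresh: driver.refresh()
Set the browser window size: driver.set_window_size(200, 400)
Double-click: double_click()
Right-click: context_click()
Drag and drop: drag_and_drop()
Hover: move_to_element()
The last four are ActionChains methods; they are queued on the chain and only run once perform() is called.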

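selenium has a huge number of features. It can do practically anything a person can do in a browser, and even some things that are hard to do by hand. For more usage, search online yourself!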

Not over yet! I've added some more content!!!

In my spare time I wrote a small practice demo that uses selenium to log in to 12306 automatically.

# Scrape information from 12306
# author: tommonkey
# date: 2022.1.18

# Automated login with selenium

from selenium import webdriver
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import ChromeOptions  # used to evade automation detection
from selenium.webdriver import ActionChains   # action chains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Evade detection
option = ChromeOptions()
option.add_experimental_option("excludeSwitches", ['enable-automation'])  # start in developer mode, where the webdriver property has its normal value
option.add_experimental_option('useAutomationExtension', False)  # hide the "Chrome is being controlled by automated test software" notice
option.add_argument("--disable-blink-features")
option.add_argument("--disable-blink-features=AutomationControlled")

# Turn off the "save password?" prompt
prefs = {
    "credentials_enable_service": False,
    "profile.password_manager_enabled": False,
}
option.add_experimental_option("prefs", prefs)

# option.add_argument('--headless')  # run the browser headless
# option.add_argument('--disable-gpu')

# Login page
def login(username, passwd, url):
    # script = 'Object.defineProperty(navigator,"webdriver",{get:() => undefined,});'
    home_page = webdriver.Chrome(service=Service("./chromedriver.exe"), options=option)
    home_page.maximize_window()  # maximize the window
    time.sleep(1)
    home_page.get(url)
    # home_page.execute_script(script)  # run JS to evade anti-scraping checks
    time.sleep(1)
    home_page.find_element(By.XPATH, "//*[@id='J-btn-login']").click()
    time.sleep(1)
    home_page.find_element(By.XPATH, "//*[@id='J-userName']").send_keys(username)
    time.sleep(1)
    home_page.find_element(By.XPATH, "//*[@id='J-password']").send_keys(passwd)
    time.sleep(1)
    home_page.find_element(By.LINK_TEXT, "立即登录").click()
    time.sleep(4)
    # Get past the verification step with a selenium action chain
    action = ActionChains(home_page)
    # home_page.save_screenshot("verify.jpg")  # screenshot of the current page
    verify_path = home_page.find_element(By.XPATH, "//*[@id='J-slide-passcode']")  # locate the verification slider
    # The browser's inspector shows that the slider track is 340px long.
    # move_by_offset(x, y): x is the horizontal offset, y the vertical one.
    # Nothing runs until perform() is called, so the whole press, drag and release gesture is queued first.
    action.click_and_hold(verify_path).pause(1).move_by_offset(340, 0).pause(2).release().perform()
    time.sleep(20)

    tip = home_page.window_handles   # handles of all open windows
    time.sleep(20)
    for n in tip:
        home_page.switch_to.window(n)  # switch to each window in turn
        time.sleep(20)
        # Dismiss the notice about prevention and control of the novel coronavirus pneumonia epidemic, if it pops up
        try:
            home_page.find_element(By.XPATH, "//*[@id='pop_164278112297010379']/div[2]/div[3]/a").click()
        except NoSuchElementException:
            pass
        print("User login succeeded! Current URL: " + home_page.current_url)
        print(home_page)
    input("Press Enter to exit")  # keeps the browser open until Enter is pressed

if __name__ == "__main__":
    username = "your_username"  # replace with your own 12306 account
    passwd = "your_password"    # replace with your own password
    url = "https://www.12306.cn/index/"  # make sure this URL is exactly right; I got it wrong at first, the verification step never appeared, and I wasted half a day!
    login(username, passwd, url)
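The demo drags the slider the full 340px in a single move. One variation is to break the drag into several smaller moves that start fast and slow down, more like a hand would. Here is a minimal sketch of that arithmetic only. `split_drag` is a hypothetical helper of my own, and I have not tested whether a stepped drag makes any difference to 12306's verification.

```python
from typing import List


def split_drag(total: int, steps: int) -> List[int]:
    """Split a drag of `total` pixels into `steps` integer offsets.

    The offsets shrink from first to last (fast start, slow finish)
    and always add up to exactly `total`.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    weights = range(steps, 0, -1)  # steps, steps - 1, ..., 1
    weight_sum = sum(weights)
    offsets = []
    moved = 0       # pixels covered so far
    cumulative = 0  # running sum of the weights
    for weight in weights:
        cumulative += weight
        target = round(total * cumulative / weight_sum)
        offsets.append(target - moved)
        moved = target
    return offsets


print(split_drag(340, 4))  # [136, 102, 68, 34]
```

Each offset could then be queued as move_by_offset(dx, 0), with a short pause() between moves, before the final release().perform().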

OK! IT’S OVER!!!
