https://github.com/dataabc/weiboSpider
https://github.com/dataabc/weibo-crawler
https://github.com/dataabc/weibo-search
Operating Environment
Programming Language: Python 3
- To install Python, visit the official Python website (https://www.python.org).
- During installation on Windows, remember to select the “Add to PATH” option.
Operating Systems: Windows/Linux/macOS
weiboSpider
Program Installation - Source Code Installation
Extract the downloaded source archive and navigate to the program directory.
$ pip install -r requirements.txt
If dependency installation fails, open requirements.txt, remove the version specifiers (for example, the ==x.y.z part) at the end of each line, and run the command again.
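Removing the version numbers by hand is tedious for a long file. As a minimal sketch, the edit could be automated as follows. The helper name strip_version_pins is hypothetical and not part of weiboSpider, and the code assumes requirements.txt contains only simple lines such as name==version (no environment markers or URLs):

```python
import re

# Matches a version specifier such as "==2.0.1", ">=1.2" or "~=3.0"
# together with everything that follows it on the line.
_PIN = re.compile(r"\s*(===|==|~=|!=|<=|>=|<|>).*$")


def strip_version_pins(requirements_text: str) -> str:
    """Return the requirements text with version specifiers removed from each line."""
    cleaned = []
    for line in requirements_text.splitlines():
        stripped = line.strip()
        # Leave blank lines, comments, and pip options (e.g. "-r other.txt") untouched.
        if not stripped or stripped.startswith(("#", "-")):
            cleaned.append(line)
            continue
        cleaned.append(_PIN.sub("", stripped))
    return "\n".join(cleaned) + "\n"
```

To apply it, read requirements.txt, pass its text through the function, and write the result back. Keeping a backup copy of the original file is a sensible precaution.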
All users are also advised to install into an embedded Python or a virtual environment, so that the default Python environment is not polluted.
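One way to follow this advice is Python's built-in venv module. The sketch below is only an illustration: the helper names and the example directory name .venv are assumptions, not something weiboSpider requires.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir) -> Path:
    """Return the path of the Python executable inside a virtual environment."""
    if os.name == "nt":
        # Windows places the interpreter under Scripts\
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_env(env_dir) -> Path:
    """Create a virtual environment with pip available and return its interpreter path."""
    venv.create(env_dir, with_pip=True)
    return env_python(env_dir)


def install_requirements(python: Path, requirements="requirements.txt") -> None:
    """Install the listed dependencies using the environment's own pip."""
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,  # raise CalledProcessError if pip reports a failure
    )
```

For example, calling create_env(".venv") from the program directory and passing the returned path to install_requirements keeps the dependencies out of the system-wide Python.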
Executing the Program
If you installed from source, run the following command inside the weiboSpider directory. If you installed with pip, you can run it from any directory where you have write permission.
$ python -m weibo_spider
On the first run, a config.json configuration file is created automatically in the current directory. After you have edited it, run the same command again to start retrieving Weibo data.
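Before the second run, it can help to confirm that config.json exists and is still valid JSON after editing. The following is a minimal sketch; the helper name load_config is hypothetical, the file is assumed to be UTF-8 encoded, and no particular configuration keys are assumed:

```python
import json
from pathlib import Path


def load_config(directory="."):
    """Return the parsed config.json from `directory`, or None if it does not exist yet."""
    path = Path(directory) / "config.json"
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError if the edited file is malformed.
        return json.load(f)
```

A return value of None suggests the program has not yet been run in that directory, while a JSONDecodeError usually points to a stray comma or quote introduced while editing.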
Program Configuration
For details on program configuration, see the README and docs in the source code.
weibo-crawler
weibo-crawler is a variant of weiboSpider that shares similar underlying logic. See its README for installation instructions.
weibo-crawler tutorial (CN).pdf
weibo-search
All configuration for this program is done in the settings.py file, located at “weibo-search\weibo\settings.py”.
1. Download Script
Download the script and unzip the folder.
2. Install Scrapy
pip install scrapy
If you run into compatibility issues, reinstall Twisted at a pinned version:
pip uninstall Twisted
pip install Twisted==22.10.0
Verify that the installation succeeded by entering the following command in the terminal:
scrapy version
3. Install dependencies
pip install -r requirements.txt
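The installation can also be checked from Python rather than from the terminal. This is a small sketch using the standard importlib.metadata module (Python 3.8 or later); the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_install(packages=("scrapy", "Twisted")):
    """Map each package name to its installed version (None means missing)."""
    return {name: installed_version(name) for name in packages}
```

Calling check_install() in the same environment where pip was run shows which of the two packages are present and which Twisted version is active.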
4. Configuration Procedure
Edit the settings in weibo\settings.py.
For an explanation of each setting, see the README and docs in the source code.
Be sure to configure the cookie, and keep the search time span within 5 days. If you need a longer time span, split the crawl into multiple runs. To obtain the cookie:
- Open weibo.cn in your browser and log in.
- Press F12 and switch to the Network tab.
- Refresh the page, making sure the URL has no suffix after weibo.cn.
- In the request headers of the weibo.cn request, locate the Cookie field and copy its value into settings.py.
- Remember, you need to retrieve your own cookie before each program run.
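To stay within the 5-day limit on a longer period, the overall date range can be cut into consecutive windows, with one crawl run per window. The following is a minimal sketch: the function name split_date_range is hypothetical, and it assumes that a window of 5 calendar days (start and end dates both included) satisfies the limit. Each resulting pair would then be entered as the start and end date in settings.py for a separate run.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, max_days: int = 5):
    """Split [start, end] (inclusive) into consecutive windows of at most max_days days."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    windows = []
    current = start
    while current <= end:
        # The window covers max_days calendar days, clipped to the overall end date.
        window_end = min(current + timedelta(days=max_days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


# Example: split_date_range(date(2024, 1, 1), date(2024, 1, 12)) returns three windows:
# 2024-01-01..2024-01-05, 2024-01-06..2024-01-10, 2024-01-11..2024-01-12
```

Because the cookie has to be refreshed before each run, it is worth updating it in settings.py whenever the dates are changed for the next window.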
5. Executing the Program
In the command line, navigate to the directory that contains scrapy.cfg, or right-click that directory and open it in a terminal.
Then enter:
scrapy crawl search
The run may take a long time; you can stop it early by pressing Ctrl+C.
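If the command is launched from the wrong directory, Scrapy cannot find the project. As an illustrative sketch (the helper names are hypothetical), a small launcher could look for scrapy.cfg in the current directory and its parents, then start the same command from there:

```python
import subprocess
from pathlib import Path


def find_scrapy_project(start="."):
    """Walk up from `start` and return the first directory containing scrapy.cfg, or None."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None


def run_search_spider(start="."):
    """Run `scrapy crawl search` from the project directory and return its exit code."""
    project_dir = find_scrapy_project(start)
    if project_dir is None:
        raise FileNotFoundError("scrapy.cfg not found in this directory or any parent")
    # Equivalent to changing into project_dir and typing: scrapy crawl search
    return subprocess.run(["scrapy", "crawl", "search"], cwd=project_dir).returncode
```

This assumes the scrapy executable is on the PATH of the environment in which the launcher runs.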