Version 0.1.2 of Resilipipe: Added opengraph column, Anchor texts for outgoing_links, URL-level curlielabels and warc_offset
- README.md: 127 additions, 7 deletions
- resilipipe/pyproject.toml: 6 additions, 4 deletions
- resilipipe/resilipipe/__init__.py: 1 addition, 1 deletion
- resilipipe/resilipipe/conf/config.py: 97 additions, 80 deletions
- resilipipe/resilipipe/conf/modules.yaml: 1 addition, 0 deletions
- resilipipe/resilipipe/jobs/spark.py: 18 additions, 10 deletions
- resilipipe/resilipipe/parse/README.md: 5 additions, 5 deletions
- resilipipe/resilipipe/parse/modules/abstract.py: 25 additions, 22 deletions
- resilipipe/resilipipe/parse/modules/collection_indices.py: 5 additions, 4 deletions
- resilipipe/resilipipe/parse/modules/curlielabels.py: 267 additions, 76 deletions
- resilipipe/resilipipe/parse/modules/geoparsing.py: 2 additions, 2 deletions
- resilipipe/resilipipe/parse/modules/links.py: 100 additions, 33 deletions
- resilipipe/resilipipe/parse/modules/ows_headers.py: 76 additions, 0 deletions
- resilipipe/resilipipe/parse/warc_preprocessing.py: 28 additions, 48 deletions
- scripts/run_preprocessor.sh: 2 additions, 0 deletions
- tests/data/sample.warc.gz: 0 additions, 0 deletions
- tests/jobs/test_spark.py: 14 additions, 1 deletion
- tests/parse/test_warc_preprocessing.py: 14 additions, 14 deletions