Updated pipeline with (1) Datacenter parameters in conf.yaml (2) Improved language labeling (3) More extensive logging of languages (4) Bash scripts with less redundancy
- README.md: 34 additions, 27 deletions
- conf/config.py: 12 additions, 2 deletions
- conf/config.yaml: 8 additions, 9 deletions
- estimate_resources.py: 4 additions, 6 deletions
- log/README.md: 9 additions, 9 deletions
- log/minio_logging.py: 6 additions, 49 deletions
- log/statistics.py: 2 additions, 1 deletion
- parse/README.md: 2 additions, 2 deletions
- parse/language.py: 7 additions, 18 deletions
- parse/warc_preprocessing.py: 8 additions, 3 deletions
- requirements.txt: 1 addition, 1 deletion
- scripts/prepare.sh: 3 additions, 2 deletions
- scripts/submit_spark.sh: 13 additions, 9 deletions