Contributing to EDS-NLP
We welcome contributions! There are many ways to help. For example, you can:
- Help us track bugs by filing issues
- Suggest and help prioritise new functionalities
- Develop a new pipe! Fork the project and propose a new functionality through a pull request
- Help us make the library as straightforward as possible, by simply asking questions about anything that does not seem clear to you.
Development installation
To be able to run the test suite, run the example notebooks and develop your own pipeline component, you should clone the repo and install it locally.
# Clone the repository and change directory
$ git clone https://github.com/aphp/edsnlp.git
$ cd edsnlp
# Optional: create a virtual environment
$ python -m venv venv
$ source venv/bin/activate
# Install the package in editable mode, with the common, dev and setup dependencies
$ pip install -e '.[dev,setup]'
# And build resources
$ python scripts/conjugate_verbs.py
To make sure the pipeline will not fail because of formatting errors, we added pre-commit hooks using the pre-commit Python library. To use them, install the hooks in your local clone:
$ pre-commit install
The pre-commit hooks defined in the configuration will automatically run when you commit your changes, letting you know if something went wrong.
The hooks only run on staged changes. To force-run them on all files, run:
$ pre-commit run --all-files
Proposing a pull request
At the very least, your changes should:
- Be well-documented;
- Pass every test, and preferably come with their own tests;
- Follow the style guide.
Testing your code
We use the Pytest testing framework.
The following command runs the test suite. Writing your own tests is encouraged!
$ python -m pytest
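As a rough illustration, a minimal test module might look like the following. The file name and the `find_dates` function are hypothetical stand-ins: in a real test you would import your component from `edsnlp` instead of defining it in the test file. Pytest collects files named `test_*.py` and runs every function whose name starts with `test_`.

```python
# test_find_dates.py (hypothetical file name)
import re
from typing import List


def find_dates(text: str) -> List[str]:
    """Toy function under test: return every DD/MM/YYYY date found in `text`."""
    return re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text)


def test_find_dates_single():
    assert find_dates("Admis le 12/03/2021.") == ["12/03/2021"]


def test_find_dates_none():
    # No date in the text: the result should be empty, not an error
    assert find_dates("Pas de date ici.") == []
```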
Testing Cython code
Make sure the package is installed in editable mode. Otherwise, Pytest won't be able to find the Cython modules.
If your contribution fixes a bug, we require that the bug be thoroughly tested.
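A regression test can be sketched as below: it pins the exact input that triggered the bug, along with neighbouring edge cases, so that a later change cannot silently reintroduce it. The `split_sentences` function and its empty-input bug are hypothetical, chosen only to show the shape of such a test.

```python
from typing import List


def split_sentences(text: str) -> List[str]:
    """Toy sentence splitter: split on '. ' and drop blank pieces.

    Hypothetical bug being fixed: an earlier version returned [''] for an
    empty string, because ''.split('. ') == [''].
    """
    return [part for part in text.split(". ") if part.strip()]


def test_split_sentences_empty_text_regression():
    # The exact input that triggered the bug
    assert split_sentences("") == []
    # Neighbouring edge case: whitespace-only input
    assert split_sentences("   ") == []


def test_split_sentences_nominal():
    # The fix must not break the usual behaviour
    assert split_sentences("Patient vu. Pas de fièvre") == ["Patient vu", "Pas de fièvre"]
```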
Architecture of a pipeline component
Pipes should follow the same pattern:
edsnlp/pipes/<pipe>
|-- <pipe>.py # Defines the component logic
|-- patterns.py # Defines matched patterns
|-- factory.py # Declares the component to spaCy
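To make this split of responsibilities concrete, here is a dependency-free sketch with the three files collapsed into one block. It is not the EDS-NLP API: real components process spaCy `Doc` objects and are declared to spaCy in `factory.py`, whereas here a plain string stands in for the document and a dictionary (`REGISTRY`, hypothetical) stands in for spaCy's factory registry. The `PatternTagger` class and the `eds.my_pipe` name are hypothetical as well.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# --- patterns.py: matched patterns, kept apart from the logic ---
PATTERNS: Dict[str, List[str]] = {
    "fever": [r"\bfi[eè]vre\b", r"\bhyperthermie\b"],
}


# --- <pipe>.py: the component logic ---
class PatternTagger:
    """Hypothetical component returning (label, start, end) character spans."""

    def __init__(self, patterns: Dict[str, List[str]]):
        # Compile every pattern once, when the component is created
        self.regexes = {
            label: [re.compile(p, re.IGNORECASE) for p in pats]
            for label, pats in patterns.items()
        }

    def __call__(self, text: str) -> List[Tuple[str, int, int]]:
        spans = []
        for label, regexes in self.regexes.items():
            for regex in regexes:
                spans.extend((label, m.start(), m.end()) for m in regex.finditer(text))
        # Sort matches by start offset
        return sorted(spans, key=lambda span: span[1])


# --- factory.py: declares the component under a name ---
# Stand-in for spaCy's factory registry (hypothetical)
REGISTRY: Dict[str, Callable[..., PatternTagger]] = {}


def register(name: str):
    def decorator(fn):
        REGISTRY[name] = fn
        return fn

    return decorator


@register("eds.my_pipe")
def create_component(patterns: Optional[Dict[str, List[str]]] = None) -> PatternTagger:
    # Fall back to the default patterns defined in patterns.py
    return PatternTagger(patterns if patterns is not None else PATTERNS)
```

Keeping the patterns in their own module lets them be reviewed and extended without touching the matching logic.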
Style Guide
We use Black to reformat the code. While other formatters only enforce PEP8 compliance, Black also makes the code uniform. In short:
Black reformats entire files in place. It is not configurable.
Moreover, the CI/CD pipeline enforces a number of checks on the "quality" of the code. For instance, code that is not formatted with Black will make the test pipeline fail. We use pre-commit to keep our codebase clean.
Refer to the development installation section for tips on how to format your files automatically. Most modern editors offer extensions that format files on save.
Documentation
Make sure to document your improvements, both in the code, with comprehensive docstrings, and in the documentation itself if needed.
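For instance, a documented function might look like the following. This sketch assumes a NumPy-style docstring (`Parameters` / `Returns` sections); check the existing modules for the exact convention used in the codebase. `normalize_whitespace` is a hypothetical function.

```python
def normalize_whitespace(text: str, keep_newlines: bool = False) -> str:
    """
    Collapse runs of whitespace into single spaces.

    Parameters
    ----------
    text : str
        Text to normalize.
    keep_newlines : bool, default False
        Whether to preserve line breaks, collapsing only the whitespace
        inside each line.

    Returns
    -------
    str
        Normalized text, without leading or trailing whitespace.
    """
    if keep_newlines:
        # Normalize each line independently, then rejoin with newlines
        return "\n".join(" ".join(line.split()) for line in text.split("\n")).strip()
    # str.split() without arguments splits on any run of whitespace
    return " ".join(text.split())
```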
We use MkDocs for EDS-NLP's documentation. You can check out the changes you make with:
# Install the requirements
$ pip install -e '.[docs]'
# Run the documentation
$ mkdocs serve
Go to localhost:8000 to see your changes. MkDocs watches for changes in the documentation folder and automatically reloads the page.