Filters

pygit2 supports defining and registering libgit2 blob filters implemented in Python.

The Filter type

class pygit2.Filter

Base filter class to be used with libgit2 filters.

Inherit from this class and override the check(), write() and close() methods to define a filter which can then be registered via pygit2.filter_register().

A new Filter instance is created for each stream that needs to be filtered. For each stream, the filter methods are called in this order:

  • check()

  • write() (may be called multiple times)

  • close()

Filtered output data should be written to the next filter in the chain during write() and close() by calling the write_next callable passed to those methods. All output data must be written to the next filter before close() returns.

If a filter is dependent on reading the complete input data stream, the filter should only write output data in close().
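To make the call order concrete, here is a minimal sketch of how one filter instance is driven over one stream. The drive() helper and UpperFilter class are hypothetical: in practice libgit2 does the driving and your class subclasses pygit2.Filter. Here the class stands alone and src is None, so the sketch runs without pygit2.

```python
from typing import Callable, Iterable, List, Optional


class UpperFilter:
    """Toy filter with the same method signatures as pygit2.Filter.

    It streams: each chunk is transformed and forwarded as soon as it arrives.
    """

    attributes = ""

    def check(self, src, attr_values: List[Optional[str]]) -> None:
        pass  # accept every source

    def write(self, data: bytes, src, write_next: Callable[[bytes], None]) -> None:
        write_next(data.upper())

    def close(self, write_next: Callable[[bytes], None]) -> None:
        pass  # nothing was buffered, so there is nothing left to flush


def drive(filter_cls, chunks: Iterable[bytes], src=None,
          attr_values: Iterable[Optional[str]] = ()) -> bytes:
    """Hypothetical driver for one stream: check(), write() per chunk, close()."""
    out: List[bytes] = []
    f = filter_cls()                 # a fresh instance for each stream
    f.check(src, list(attr_values))  # once per stream
    for chunk in chunks:             # possibly many write() calls
        f.write(chunk, src, out.append)
    f.close(out.append)              # once, after the last write()
    return b"".join(out)
```

For example, drive(UpperFilter, [b"hel", b"lo"]) returns b"HELLO": each chunk is forwarded through write_next as soon as write() sees it.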

attributes: str = ''

Space-separated list of the attributes whose values are passed to check().

check(src: FilterSource, attr_values: List[str | None])

Check whether this filter should be applied to the given source.

check() is called once per stream.

If Passthrough is raised, the filter will not be applied.

Parameters:

src: The source of the filtered blob.

attr_values: The values of each attribute for the blob being filtered.

attr_values is a list of attribute values, in the same order as the attributes are listed in cls.attributes.
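Because attr_values is positional, it can help to pair it with the names from attributes. The sketch below is hypothetical: attr_names() drops any =pattern from each entry, a local Passthrough class stands in for pygit2's Passthrough so the code runs on its own, and the policy of skipping blobs whose eol value is not crlf is invented purely for illustration.

```python
from typing import Dict, List, Optional


class Passthrough(Exception):
    """Stand-in for pygit2's Passthrough exception (assumption, for a runnable sketch)."""


def attr_names(attributes: str) -> List[str]:
    # "text eol=*" -> ["text", "eol"]: keep each name, drop any "=pattern"
    return [item.split("=", 1)[0] for item in attributes.split()]


class EolAwareFilter:
    attributes = "text eol=*"

    def __init__(self) -> None:
        self.values: Dict[str, Optional[str]] = {}

    def check(self, src, attr_values: List[Optional[str]]) -> None:
        # attr_values[i] belongs to the i-th name listed in attributes
        values = dict(zip(attr_names(self.attributes), attr_values))
        if values.get("eol") != "crlf":
            raise Passthrough()  # hypothetical policy: nothing to do, skip this blob
        self.values = values
```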

close(write_next: Callable[[bytes], None])

Close this filter.

close() is called once per stream, after all write() calls for that stream have completed.

Parameters:

write_next: The write() method of the next filter in the chain.

Any remaining filtered output data must be written to write_next before returning.
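A filter whose output depends on the complete input can buffer in write() and emit everything in close(). A minimal sketch, assuming a hypothetical transformation that reverses the order of lines (impossible until the last chunk has arrived); it mirrors the pygit2.Filter method signatures without subclassing it, so it runs without pygit2.

```python
import io
from typing import Callable


class ReverseLinesFilter:
    """Hypothetical whole-stream filter: its output depends on the complete input."""

    attributes = ""

    def __init__(self) -> None:
        self.buffer = io.BytesIO()

    def check(self, src, attr_values) -> None:
        pass

    def write(self, data: bytes, src, write_next: Callable[[bytes], None]) -> None:
        # Only buffer: later chunks may still change the output.
        self.buffer.write(data)

    def close(self, write_next: Callable[[bytes], None]) -> None:
        lines = self.buffer.getvalue().splitlines()
        # Reverse the line order (normalizing endings to LF) and hand all of
        # the output to write_next before close() returns.
        if lines:
            write_next(b"\n".join(reversed(lines)) + b"\n")
```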

write(data: bytes, src: FilterSource, write_next: Callable[[bytes], None])

Write input data to this filter.

write() may be called multiple times per stream.

Parameters:

data: Input data.

src: The source of the filtered blob.

write_next: The write() method of the next filter in the chain.

Filtered output data should be written to write_next whenever it is available.
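Chunk boundaries are arbitrary, so a streaming filter may need to hold back a few bytes. A hypothetical sketch: CRLF-to-LF conversion that forwards each chunk immediately except a trailing b"\r", which could be the first half of a b"\r\n" split across two writes. Unlike buffering the whole stream, memory use stays bounded by the chunk size. The class mirrors the pygit2.Filter signatures but does not subclass it.

```python
from typing import Callable


class StreamingCRLFToLF:
    """Hypothetical streaming filter mirroring pygit2.Filter's method signatures."""

    attributes = ""

    def __init__(self) -> None:
        self.pending = b""  # a b"\r" held back from the previous chunk, if any

    def check(self, src, attr_values) -> None:
        pass

    def write(self, data: bytes, src, write_next: Callable[[bytes], None]) -> None:
        data = self.pending + data
        self.pending = b""
        if data.endswith(b"\r"):
            # may be the start of a CRLF split across chunks: wait for more input
            data, self.pending = data[:-1], b"\r"
        if data:
            write_next(data.replace(b"\r\n", b"\n"))

    def close(self, write_next: Callable[[bytes], None]) -> None:
        # a lone trailing b"\r" was not part of a CRLF; flush it unchanged
        if self.pending:
            write_next(self.pending)
            self.pending = b""
```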

class pygit2.FilterSource

A filter source represents the file/blob to be processed.

Registering filters

pygit2.filter_register(name: str, filter_cls: Type[Filter], [priority: int = C.GIT_FILTER_DRIVER_PRIORITY]) -> None

Register a filter under the given name.

Filters will be run in ascending order of priority on smudge (to the workdir) and in descending order of priority on clean (to the ODB).

Two filters are preregistered with libgit2:
  • GIT_FILTER_CRLF with priority 0

  • GIT_FILTER_IDENT with priority 100

priority defaults to GIT_FILTER_DRIVER_PRIORITY, which imitates a core Git filter driver: such a filter runs last on checkout (smudge) and first on checkin (clean).
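The ordering rule amounts to a sort on priority. A hypothetical sketch: run_order() is not a pygit2 function, and the value 200 for a driver-style filter is an assumption taken from libgit2's definition of GIT_FILTER_DRIVER_PRIORITY; only the relative order matters here.

```python
from typing import Dict, List


def run_order(priorities: Dict[str, int], mode: str) -> List[str]:
    """Return filter names in the order they would run for 'smudge' or 'clean'."""
    ascending = sorted(priorities, key=priorities.__getitem__)
    if mode == "smudge":   # to the workdir: lowest priority first
        return ascending
    if mode == "clean":    # to the ODB: highest priority first
        return ascending[::-1]
    raise ValueError(f"unknown mode: {mode!r}")


# crlf (0) and ident (100) match the preregistered filters; 200 is assumed
PRIORITIES = {"crlf": 0, "ident": 100, "my_driver": 200}
```

With these values, a driver-priority filter comes last on smudge and first on clean, matching the description of GIT_FILTER_DRIVER_PRIORITY.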

Note that the filter registry is not thread-safe. Register or unregister filters only while no filtering can be in progress.

pygit2.filter_unregister(name: str) -> None

Unregister the given filter.

Note that the filter registry is not thread-safe. Register or unregister filters only while no filtering can be in progress.
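One way to keep registration outside any filtering is to scope it around the work: register before worker threads start and unregister after they finish. The registered_filter() context manager below is a hypothetical helper, not part of pygit2. It takes the register and unregister functions as arguments, so the sketch runs without pygit2; in real code you would pass pygit2.filter_register and pygit2.filter_unregister.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def registered_filter(register: Callable[..., None],
                      unregister: Callable[[str], None],
                      name: str,
                      filter_cls: type,
                      priority: Optional[int] = None) -> Iterator[None]:
    """Hypothetical helper: keep filter_cls registered for one with-block.

    Enter it before any thread could be filtering and leave it only after all
    filtering has finished, since the registry itself is not thread-safe.
    """
    if priority is None:
        register(name, filter_cls)            # use the default priority
    else:
        register(name, filter_cls, priority)
    try:
        yield
    finally:
        unregister(name)                      # runs even if the block raises
```

In real code this might read with registered_filter(pygit2.filter_register, pygit2.filter_unregister, "py_crlf", CRLFFilter): followed by the checkout or add operations that should use the filter.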

Example

The following example is a simple Python implementation of a filter which enforces that blobs are stored with Unix LF line endings in the ODB, and checked out with the line endings given by the eol setting in .gitattributes.

import io
import os

import pygit2


class CRLFFilter(pygit2.Filter):
    attributes = "text eol=*"

    def __init__(self):
        super().__init__()
        self.linesep = b'\r\n' if os.name == 'nt' else b'\n'
        self.buffer = io.BytesIO()

    def check(self, src, attr_values):
        if src.mode == pygit2.enums.FilterMode.SMUDGE:
            # attr_values contains the values of the 'text' and 'eol'
            # attributes in that order (as they are defined in
            # CRLFFilter.attributes)
            eol = attr_values[1]

            if eol == 'crlf':
                self.linesep = b'\r\n'
            elif eol == 'lf':
                self.linesep = b'\n'
        else:  # src.mode == pygit2.enums.FilterMode.CLEAN
            # always use LF line-endings when writing to the ODB
            self.linesep = b'\n'

    def write(self, data, src, write_next):
        # buffer input data in case line-ending sequences span chunk boundaries
        self.buffer.write(data)

    def close(self, write_next):
        # normalize CRLF to LF, then convert to the target line ending;
        # unlike splitlines() + join(), this keeps a trailing newline intact
        data = self.buffer.getvalue().replace(b'\r\n', b'\n')
        if self.linesep != b'\n':
            data = data.replace(b'\n', self.linesep)
        # write all of our output data before returning
        write_next(data)