0.8.0 (2023-05-31)#

Breaking Changes#

  • Rename methods of FileConnection classes:

    • get_directoryresolve_dir

    • get_fileresolve_file

    • listdirlist_dir

    • mkdircreate_dir

    • rmdirremove_dir

    New naming should be more consistent.

    They were undocumented in previous versions, but someone could use these methods, so this is a breaking change. (#36)

  • Deprecate onetl.core.FileFilter class, replace it with new classes:

    • onetl.file.filter.Glob

    • onetl.file.filter.Regexp

    • onetl.file.filter.ExcludeDir

    Old class will be removed in v1.0.0. (#43)

  • Deprecate onetl.core.FileLimit class, replace it with new class onetl.file.limit.MaxFilesCount.

    Old class will be removed in v1.0.0. (#44)

  • Change behavior of BaseFileLimit.reset method.

    This method should now return self instead of None. Return value could be the same limit object or a copy, this is an implementation detail. (#44)

  • Replaced FileDownloader.filter and .limit with new options .filters and .limits:

    onETL < 0.8.0#
    FileDownloader(
        ...,
        filter=FileFilter(glob="*.txt", exclude_dir="/path"),
        limit=FileLimit(count_limit=10),
    )
    
    onETL >= 0.8.0#
    FileDownloader(
        ...,
        filters=[Glob("*.txt"), ExcludeDir("/path")],
        limits=[MaxFilesCount(10)],
    )
    

    This allows to developers to implement their own filter and limit classes, and combine them with existing ones.

    Old behavior still supported, but it will be removed in v1.0.0. (#45)

  • Removed default value for FileDownloader.limits, user should pass limits list explicitly. (#45)

  • Move classes from module onetl.core:

    before#
    from onetl.core import DBReader
    from onetl.core import DBWriter
    from onetl.core import FileDownloader
    from onetl.core import FileUploader
    from onetl.core import FileResult
    from onetl.core import FileSet
    

    with new modules onetl.db and onetl.file:

    after#
    from onetl.db import DBReader
    from onetl.db import DBWriter
    
    from onetl.file import FileDownloader
    from onetl.file import FileUploader
    
    # not a public interface
    from onetl.file.file_result import FileResult
    from onetl.file.file_set import FileSet
    

    Imports from old module onetl.core still can be used, but marked as deprecated. Module will be removed in v1.0.0. (#46)

Features#

  • Add rename_dir method.

    Method was added to following connections:

    • FTP

    • FTPS

    • HDFS

    • SFTP

    • WebDAV

    It allows to rename/move directory to new path with all its content.

    S3 does not have directories, so there is no such method in that class. (#40)

  • Add onetl.file.FileMover class.

    It allows to move files between directories of remote file system. Signature is almost the same as in FileDownloader, but without HWM support. (#42)

Improvements#

  • Document all public methods in FileConnection classes:

    • download_file

    • resolve_dir

    • resolve_file

    • get_stat

    • is_dir

    • is_file

    • list_dir

    • create_dir

    • path_exists

    • remove_file

    • rename_file

    • remove_dir

    • upload_file

    • walk (#39)

  • Update documentation of check method of all connections - add usage example and document result type. (#39)

  • Add new exception type FileSizeMismatchError.

    Methods connection.download_file and connection.upload_file now raise new exception type instead of RuntimeError, if target file after download/upload has different size than source. (#39)

  • Add new exception type DirectoryExistsError - it is raised if target directory already exists. (#40)

  • Improved FileDownloader / FileUploader exception logging.

    If DEBUG logging is enabled, print exception with stacktrace instead of printing only exception message. (#42)

  • Updated documentation of FileUploader.

    • Class does not support read strategies, added note to documentation.

    • Added examples of using run method with explicit files list passing, both absolute and relative paths.

    • Fix outdated imports and class names in examples. (#42)

  • Updated documentation of DownloadResult class - fix outdated imports and class names. (#42)

  • Improved file filters documentation section.

    Document interface class onetl.base.BaseFileFilter and function match_all_filters. (#43)

  • Improved file limits documentation section.

    Document interface class onetl.base.BaseFileLimit and functions limits_stop_at / limits_reached / reset_limits. (#44)

  • Added changelog.

    Changelog is generated from separated news files using towncrier. (#47)

Misc#

  • Improved CI workflow for tests.

    • If developer haven’t changed source core of a specific connector or its dependencies, run tests only against maximum supported versions of Spark, Python, Java and db/file server.

    • If developed made some changes in a specific connector, or in core classes, or in dependencies, run tests for both minimal and maximum versions.

    • Once a week run all aganst for minimal and latest versions to detect breaking changes in dependencies

    • Minimal tested Spark version is 2.3.1 instead on 2.4.8. (#32)