0.8.0 (2023-05-31)
Breaking Changes
Rename methods of FileConnection classes:
get_directory → resolve_dir
get_file → resolve_file
listdir → list_dir
mkdir → create_dir
rmdir → remove_dir
The new names are more consistent with each other.
These methods were undocumented in previous versions, but they may still have been in use, so this is a breaking change. (#36)
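For code that has to work with both the old and the new method names, one option is a thin adapter that forwards old names to new ones and emits a warning. The sketch below is only an illustration: `LegacyNamesMixin` and `FakeConnection` are hypothetical names that are not part of onETL, and the stand-in connection simply echoes its argument.

```python
import warnings

# Old name -> new name, taken from the rename list above.
RENAMED_METHODS = {
    "get_directory": "resolve_dir",
    "get_file": "resolve_file",
    "listdir": "list_dir",
    "mkdir": "create_dir",
    "rmdir": "remove_dir",
}


class LegacyNamesMixin:
    """Hypothetical helper: forwards old method names to the new ones."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails,
        # so the new names are resolved directly and never reach here.
        new_name = RENAMED_METHODS.get(name)
        if new_name is None:
            raise AttributeError(name)
        warnings.warn(
            f"{name}() is renamed to {new_name}()",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self, new_name)


class FakeConnection(LegacyNamesMixin):
    """Stand-in for a file connection; it only echoes the path."""

    def list_dir(self, path):
        return f"list_dir({path})"


conn = FakeConnection()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = conn.listdir("/data")  # old name still works, with a warning
```

An adapter like this is best treated as a temporary bridge; updating call sites to the new names is the simpler long-term fix.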
Deprecate onetl.core.FileFilter class, replace it with new classes:
onetl.file.filter.Glob
onetl.file.filter.Regexp
onetl.file.filter.ExcludeDir
The old class will be removed in v1.0.0. (#43)
Deprecate onetl.core.FileLimit class, replace it with new class onetl.file.limit.MaxFilesCount. The old class will be removed in v1.0.0. (#44)
Change behavior of BaseFileLimit.reset method.
This method should now return self instead of None. The return value can be the same limit object or a copy; this is an implementation detail. (#44)

Replaced FileDownloader.filter and .limit with new options .filters and .limits.
Before:

    from onetl.core import FileDownloader, FileFilter, FileLimit

    FileDownloader(
        ...,
        filter=FileFilter(glob="*.txt", exclude_dir="/path"),
        limit=FileLimit(count_limit=10),
    )

After:

    from onetl.file import FileDownloader
    from onetl.file.filter import ExcludeDir, Glob
    from onetl.file.limit import MaxFilesCount

    FileDownloader(
        ...,
        filters=[Glob("*.txt"), ExcludeDir("/path")],
        limits=[MaxFilesCount(10)],
    )

This allows developers to implement their own filter and limit classes and combine them with existing ones.
The old behavior is still supported, but it will be removed in v1.0.0. (#45)
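To illustrate why list-valued filters and limits are useful, here is a minimal, self-contained sketch of custom filter and limit objects being combined. It does not import onETL: the method names `match`, `stops_at`, and `reset`, the stand-in classes, and the helper `select_files` are assumptions made for this illustration, so check the `BaseFileFilter` and `BaseFileLimit` documentation for the real interface. Note that `reset` returns the limit itself, in line with the `BaseFileLimit.reset` change described above.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


class GlobFilter:
    """Stand-in filter: keep files whose name matches a glob pattern."""

    def __init__(self, pattern):
        self.pattern = pattern

    def match(self, path):
        return fnmatch(PurePosixPath(path).name, self.pattern)


class ExcludeDirFilter:
    """Stand-in filter: drop files located under a given directory."""

    def __init__(self, directory):
        self.directory = PurePosixPath(directory)

    def match(self, path):
        return self.directory not in PurePosixPath(path).parents


class MaxCountLimit:
    """Stand-in limit: stop after a fixed number of accepted files."""

    def __init__(self, count):
        self.count = count
        self.seen = 0

    def reset(self):
        self.seen = 0
        return self  # return the limit itself instead of None

    def stops_at(self, path):
        self.seen += 1
        return self.seen > self.count


def select_files(paths, filters, limits):
    """Hypothetical helper: apply every filter, stop when any limit is hit."""
    limits = [limit.reset() for limit in limits]
    selected = []
    for path in paths:
        if not all(f.match(path) for f in filters):
            continue
        if any(limit.stops_at(path) for limit in limits):
            break
        selected.append(path)
    return selected


paths = ["/data/a.txt", "/data/b.csv", "/path/c.txt", "/data/d.txt", "/data/e.txt"]
selected = select_files(
    paths,
    filters=[GlobFilter("*.txt"), ExcludeDirFilter("/path")],
    limits=[MaxCountLimit(2)],
)
```

Because each filter and limit is a small independent object, a project-specific rule can be added to the list without touching the existing ones.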
Removed the default value for FileDownloader.limits; users should pass the limits list explicitly. (#45)

Move classes from module onetl.core:

    from onetl.core import DBReader
    from onetl.core import DBWriter
    from onetl.core import FileDownloader
    from onetl.core import FileUploader
    from onetl.core import FileResult
    from onetl.core import FileSet

to new modules onetl.db and onetl.file:

    from onetl.db import DBReader
    from onetl.db import DBWriter

    from onetl.file import FileDownloader
    from onetl.file import FileUploader

    # not a public interface
    from onetl.file.file_result import FileResult
    from onetl.file.file_set import FileSet

Imports from the old module onetl.core can still be used, but are marked as deprecated. The module will be removed in v1.0.0. (#46)
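When migrating a larger codebase, the import mapping above can be applied mechanically. The following is a rough sketch that uses only the standard library; `migrate_import` is a hypothetical helper, and it handles only the simple one-name-per-line import form shown in this entry.

```python
import re

# New location of each class, taken from the import lists above.
NEW_MODULES = {
    "DBReader": "onetl.db",
    "DBWriter": "onetl.db",
    "FileDownloader": "onetl.file",
    "FileUploader": "onetl.file",
    "FileResult": "onetl.file.file_result",
    "FileSet": "onetl.file.file_set",
}

# Matches single-name imports such as "from onetl.core import DBReader".
OLD_IMPORT = re.compile(r"^(\s*)from onetl\.core import (\w+)\s*$")


def migrate_import(line):
    """Rewrite one old-style import line; leave anything else unchanged."""
    found = OLD_IMPORT.match(line)
    if found is None:
        return line
    indent, name = found.groups()
    module = NEW_MODULES.get(name)
    if module is None:
        return line  # not covered by this changelog entry, keep as-is
    return f"{indent}from {module} import {name}"


old_source = [
    "from onetl.core import DBReader",
    "from onetl.core import FileSet",
    "import os",
]
new_source = [migrate_import(line) for line in old_source]
```

Names that are not in the mapping, such as the deprecated `FileFilter`, are deliberately left untouched because they need a manual replacement rather than a new import path.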
Features
Add rename_dir method.
The method was added to the following connections:
FTP
FTPS
HDFS
SFTP
WebDAV
It allows renaming or moving a directory, with all its content, to a new path.
S3 does not have directories, so there is no such method in that class. (#40)

Add onetl.file.FileMover class.
It allows moving files between directories of a remote file system. The signature is almost the same as in FileDownloader, but without HWM support. (#42)
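As a rough mental model for `rename_dir`, the sketch below performs the same kind of operation on a local filesystem using only the standard library. It is an analogy, not onETL code: `rename_local_dir` is a hypothetical function, creating missing parent directories is a choice made for this example, and the local `DirectoryExistsError` class merely mirrors the exception name mentioned under Improvements.

```python
import tempfile
from pathlib import Path


class DirectoryExistsError(Exception):
    """Local stand-in that mirrors the exception name in this changelog."""


def rename_local_dir(source, target):
    """Move a directory, with all its content, to a new path."""
    source, target = Path(source), Path(target)
    if target.exists():
        raise DirectoryExistsError(f"Target directory already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    source.rename(target)  # content moves together with the directory
    return target


with tempfile.TemporaryDirectory() as root:
    old_dir = Path(root) / "incoming"
    old_dir.mkdir()
    (old_dir / "report.txt").write_text("data")

    new_dir = rename_local_dir(old_dir, Path(root) / "archive" / "2023")
    moved_files = sorted(p.name for p in new_dir.iterdir())
    old_dir_still_exists = old_dir.exists()
```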
Improvements
Document all public methods in FileConnection classes:
download_file
resolve_dir
resolve_file
get_stat
is_dir
is_file
list_dir
create_dir
path_exists
remove_file
rename_file
remove_dir
upload_file
walk
(#39)
Update documentation of check method of all connections - add usage example and document result type. (#39)

Add new exception type FileSizeMismatchError.
Methods connection.download_file and connection.upload_file now raise the new exception type instead of RuntimeError if the target file has a different size than the source after download/upload. (#39)

Add new exception type DirectoryExistsError - it is raised if the target directory already exists. (#40)

Improved FileDownloader / FileUploader exception logging.
If DEBUG logging is enabled, print the exception with its stack trace instead of only the exception message. (#42)

Updated documentation of FileUploader.
The class does not support read strategies; added a note to the documentation.
Added examples of using the run method with an explicitly passed list of files, with both absolute and relative paths.
Fixed outdated imports and class names in examples. (#42)
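The exception-logging change for FileDownloader / FileUploader described above follows a common pattern: log only the message at normal verbosity, and attach the stack trace when DEBUG is enabled. The sketch below shows that pattern with the standard `logging` module; `log_failure` and `capture` are hypothetical helpers, and this is not a copy of the onETL implementation.

```python
import io
import logging


def log_failure(logger, message, exc):
    """Log a failure; include the traceback only at DEBUG level."""
    if logger.isEnabledFor(logging.DEBUG):
        # exc_info attaches the full stack trace to the log record
        logger.error("%s: %s", message, exc, exc_info=exc)
    else:
        logger.error("%s: %s", message, exc)


def capture(level):
    """Run one failing operation and return what was logged."""
    stream = io.StringIO()
    logger = logging.getLogger(f"sketch.{logging.getLevelName(level)}")
    logger.setLevel(level)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    try:
        raise OSError("connection reset")
    except OSError as exc:
        log_failure(logger, "Download failed", exc)
    return stream.getvalue()


info_output = capture(logging.INFO)    # message only
debug_output = capture(logging.DEBUG)  # message plus traceback
```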
Updated documentation of DownloadResult class - fixed outdated imports and class names. (#42)

Improved file filters documentation section.
Document interface class onetl.base.BaseFileFilter and function match_all_filters. (#43)

Improved file limits documentation section.
Document interface class onetl.base.BaseFileLimit and functions limits_stop_at / limits_reached / reset_limits. (#44)

Added changelog.
Changelog is generated from separate news files using towncrier. (#47)
Misc
Improved CI workflow for tests.
If a developer has not changed the source code of a specific connector or its dependencies, run tests only against the maximum supported versions of Spark, Python, Java and db/file server.
If a developer has made changes in a specific connector, in core classes, or in dependencies, run tests for both minimal and maximum versions.
Once a week, run all tests against both minimal and latest versions to detect breaking changes in dependencies.
Minimal tested Spark version is 2.3.1 instead of 2.4.8. (#32)
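The selection rules above can be summarized as a small decision function. This is a sketch for illustration only: `select_test_versions` is a hypothetical name, the returned labels are placeholders rather than real version numbers, and the actual logic lives in the project's CI configuration.

```python
def select_test_versions(connector_changed, weekly_run=False):
    """Return which dependency versions to test, per the rules above.

    The labels are placeholders; real version numbers live in CI config.
    """
    if weekly_run:
        # scheduled run: catch breaking changes in dependencies
        return ["minimal", "latest"]
    if connector_changed:
        # connector, core classes, or dependencies were touched
        return ["minimal", "maximum"]
    # nothing relevant changed: cheapest useful check
    return ["maximum"]


pull_request_untouched = select_test_versions(connector_changed=False)
pull_request_touched = select_test_versions(connector_changed=True)
scheduled = select_test_versions(connector_changed=False, weekly_run=True)
```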