arclib¶
Python has a very thorough standard library, including a variety of modules for compressing and decompressing data. However, each of them has a different API, which makes it difficult to support multiple archive formats. arclib is an attempt to fix this by providing a common API that bridges the various archive modules in Python.
Rationale¶
A while back, I was trying to port a Python application from using zip files to using tar files. To say it was “painful” is an understatement. The two modules have conceptually similar but immensely different APIs. Therefore, I started work on arclib to provide a unified API between the two, as well as between the other archive modules in the standard library.
Archive categories¶
arclib divides Python’s five archive modules (zipfile, tarfile, lzma, bz2, and gzip) into three categories:
- Basic, one-shot: the only compression method in this category is gzip. With gzip, the only allowed method of compression and decompression is “one-shot”, where all the data must be processed at once.
- Basic, incremental: LZMA and bzip2 fall into this category. These allow not only one-shot but also incremental compression and decompression, where the data is fed in small chunks.
- Complex, file-system: Tar and zip are both considered “complex” in arclib, in that they both allow one to store entire directory trees, not just chunks of data.
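To make the three categories concrete, here is a minimal sketch that uses only the standard-library modules that arclib wraps, not arclib itself. The payload and member names are made up for illustration.

```python
import gzip
import io
import lzma
import zipfile

payload = b'hello, archive world! ' * 100

# Basic, one-shot: the whole payload goes in, the whole result comes out.
gz_blob = gzip.compress(payload)
gz_restored = gzip.decompress(gz_blob)

# Basic, incremental: feed the data in pieces and flush at the end.
compressor = lzma.LZMACompressor()
xz_blob = b''
for start in range(0, len(payload), 256):
    xz_blob += compressor.compress(payload[start:start + 256])
xz_blob += compressor.flush()
xz_restored = lzma.decompress(xz_blob)

# Complex, file-system: members carry paths, so a directory tree can be stored.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('docs/readme.txt', 'top-level notes')
    archive.writestr('docs/api/index.txt', 'nested notes')
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
```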
Modules¶
arclib contains the following modules:
Basic compression/decompression¶
- arclib.gz: A module that exposes a basic, one-shot gzip compression/decompression API.
- arclib.bz2: A module that exposes both a basic, one-shot bzip2 compression/decompression API and an incremental API.
- arclib.lzma: A module that exposes both a basic, one-shot LZMA compression/decompression API and an incremental API.
Note that arclib.gz does not expose an incremental API.
Complex, file-system compression/decompression¶
Both arclib.zip and arclib.tar expose an API that allows for manipulation of their respective complex archive formats.
The APIs¶
Base¶
All modules in arclib implement this API:
-
open
(*args, **kw)¶ Returns a File with the given arguments.
Basic, one-shot¶
All the “basic” modules (gz, bz2, and lzma) implement this API:
-
compress
(data)¶ Compresses data using the corresponding compression method.
Parameters: data (bytes) – The data to compress. Returns: The compressed data. Return type: bytes
-
decompress
(data)¶ Decompresses data using the corresponding decompression method.
Parameters: data (bytes) – The data to decompress. Returns: The decompressed data. Return type: bytes
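As a rough illustration of this one-shot call shape, the sketch below runs the same two calls against the standard-library gzip, bz2, and lzma modules as stand-ins. The roundtrip helper is hypothetical, and the assumption is that arclib.gz, arclib.bz2, and arclib.lzma could be passed to it in the same way.

```python
import bz2
import gzip
import lzma


def roundtrip(module, data):
    """Compress and then decompress data with a module's one-shot functions."""
    compressed = module.compress(data)
    return module.decompress(compressed)


original = b'The same two calls work for every basic format.'

# Stand-ins: the stdlib modules all expose compress(data) and decompress(data).
results = {
    module.__name__: roundtrip(module, original)
    for module in (gzip, bz2, lzma)
}
```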
In addition, their open function is an alias for the corresponding Python module’s open and therefore returns the module’s File. For instance, arclib.gz.open is an alias for Python’s own gzip.open and returns gzip.GzipFile.
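Since the open function is described as an alias, its behaviour can be sketched with the standard library directly. The example below uses gzip.open on an in-memory buffer; calling arclib.gz.open the same way is an assumption based on the alias described above.

```python
import gzip
import io

buffer = io.BytesIO()

# gzip.open accepts a path or an existing file object; binary mode yields a GzipFile.
with gzip.open(buffer, 'wb') as f:
    opened_type = type(f)
    f.write(b'written through a file-like interface')

# Rewind the buffer and read the data back through the same interface.
buffer.seek(0)
with gzip.open(buffer, 'rb') as f:
    restored = f.read()
```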
Basic, incremental¶
Both arclib.bz2 and arclib.lzma (*not* arclib.gz) implement this API.
-
class
Compressor
¶ A class that implements incremental compression. All types of this kind are instances of arclib.AbstractBasicCompressor. Example usage:

    import arclib.bz2

    my_compressor = arclib.bz2.Compressor()  # The compressor object.
    compressed_data = b''  # The resulting compressed data.
    compressed_data += my_compressor.compress(b'Something to compress...')
    compressed_data += my_compressor.compress(b'More stuff!')
    compressed_data += my_compressor.flush()  # Always remember the flush call!
-
compress
(data)¶ Incrementally compresses data using the corresponding compression method.
Parameters: data (bytes) – The data to compress. Returns: A portion of compressed data, or an empty byte string. Note that this data is not considered valid on its own, and must be combined with the results of both the other calls to compress and the final call to flush(). Return type: bytes
-
flush
()¶ Flushes the compressor’s internal buffers.
Returns: The rest of the compressed data. Return type: bytes
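The compress-then-flush pattern generalises to any iterable of chunks. The sketch below uses a hypothetical compress_chunks helper with the standard library's bz2.BZ2Compressor as a stand-in; passing arclib.bz2.Compressor() instead is an assumption based on the interface described here.

```python
import bz2


def compress_chunks(compressor, chunks):
    """Feed an iterable of bytes chunks through an incremental compressor."""
    pieces = [compressor.compress(chunk) for chunk in chunks]
    # flush() returns whatever is still buffered; without it the stream is incomplete.
    pieces.append(compressor.flush())
    return b''.join(pieces)


chunks = [b'first part, ', b'second part, ', b'third part']
compressed = compress_chunks(bz2.BZ2Compressor(), chunks)
```

Collecting the pieces in a list and joining once avoids repeatedly reallocating a growing bytes object.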
-
-
class
Decompressor
¶ A class that implements incremental decompression. All types of this kind are instances of arclib.AbstractBasicDecompressor. Example usage:

    import arclib.bz2

    compressed_data = arclib.bz2.compress(b'Some data to compress!')
    my_decompressor = arclib.bz2.Decompressor()  # The decompressor object.
    decompressed_data = b''  # The resulting decompressed data.
    # Decompress some data.
    decompressed_data += my_decompressor.decompress(compressed_data[:5])
    # And some more data!
    decompressed_data += my_decompressor.decompress(compressed_data[5:])
    assert decompressed_data == b'Some data to compress!'
    assert my_decompressor.eof
-
decompress
(data)¶ Incrementally decompresses data using the corresponding decompression method.
Parameters: data (bytes) – The data to decompress. Returns: A portion of decompressed data, or an empty byte string. Note that this data is not the complete decompressed data, and must be combined with the results of other calls to decompress. Return type: bytes
-
eof
¶ Whether or not the end of the compressed data has been reached.
-
unused_data
¶ Any unused data left over after the decompression completed.
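The eof and unused_data attributes are easiest to see with a stream that has extra bytes after the compressed data. This sketch uses the standard library's bz2.BZ2Decompressor, which has attributes of the same names, as a stand-in. That arclib's Decompressor behaves identically is an assumption.

```python
import bz2

# A complete bzip2 stream followed by bytes that do not belong to it.
stream = bz2.compress(b'payload') + b'trailing bytes'

# Before the end of the stream is seen, eof stays False.
partial = bz2.BZ2Decompressor()
partial.decompress(stream[:4])
eof_after_partial = partial.eof

# Once the whole stream has been fed in, eof is True and the extra
# bytes are exposed through unused_data.
decompressor = bz2.BZ2Decompressor()
output = decompressor.decompress(stream)
reached_end = decompressor.eof
leftover = decompressor.unused_data
```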
-
Complex¶
Both arclib.zip and arclib.tar implement this API.
-
open
(*args, **kw)¶ Opens an archive. All arguments are passed to the corresponding function; for instance, arclib.zip.open passes all its arguments to zipfile.ZipFile. Returns: The opened archive file. Return type: File
-
openobj
(fileobj, **kw)¶ Opens the given file object. Whereas open opens a file path, openobj opens an in-memory file object.
Returns: The opened archive file. Return type: File
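The path versus file object distinction can be sketched with the standard library alone. The example below opens the same zip archive once by path and once from an in-memory io.BytesIO; the file names are made up. For comparison, tarfile.open needs the file object passed through its fileobj keyword, which is the kind of difference a shared openobj function hides.

```python
import io
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, 'example.zip')
    with zipfile.ZipFile(archive_path, 'w') as archive:
        archive.writestr('note.txt', 'hello')

    # Path-based opening: the archive is located by its file-system path.
    with zipfile.ZipFile(archive_path) as archive:
        names_from_path = archive.namelist()

    # Copy the archive bytes into memory before the temporary file disappears.
    with open(archive_path, 'rb') as raw:
        in_memory = io.BytesIO(raw.read())

# Object-based opening: no path is involved, only a file-like object.
with zipfile.ZipFile(in_memory) as archive:
    names_from_object = archive.namelist()
```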
-
class
File
¶ An opened archive file. Can be used as a context manager. Example:

    import arclib.zip

    with arclib.zip.open('myfile.zip') as f:
        # Stuff here.
        pass
    # f is automatically closed.
-
close
()¶ Close the archive file.
-
info_for
(member)¶ Returns an Info object containing information about the given archive member.
Parameters: member (str) – A string describing the path to the archive member, e.g. x/y/z. Returns: The member information object. Return type: Info
-
all_info
()¶ Retrieves Info objects for all the archive’s members.
Returns: A list of the Info objects for all the archive’s members. Return type: list of Info
-
members
()¶ Retrieves all the archive’s members.
Returns: A list of strings, one for each archive member. Return type: list of str
-
dump
()¶ Dumps a description of the archive’s contents to standard output.
-
add
(path, arcname=None, recursive=True)¶ Adds a file or directory to the archive.
Parameters: - path (str) – The path to add to the archive.
- arcname (str) – The name to give the file when placing it in the archive. If None, then it will be the same as path, but with leading roots and the drive removed.
- recursive (bool) – If path is a directory and this is a truthy value, then the directory’s contents will also be added to the archive.
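The default arcname rule (same as path, minus the drive and any leading roots) could look roughly like the sketch below. The default_arcname helper and its pathmod parameter are hypothetical, and arclib's actual implementation may differ in its details.

```python
import ntpath
import os
import posixpath


def default_arcname(path, pathmod=os.path):
    """Derive an archive member name by dropping the drive and leading roots."""
    # splitdrive separates 'C:' (if any) from the rest of the path.
    _, tail = pathmod.splitdrive(path)
    # Archive member names conventionally use forward slashes.
    tail = tail.replace(pathmod.sep, '/')
    # Remove leading roots so the name is relative inside the archive.
    return tail.lstrip('/')


# The path flavour is passed explicitly so the results do not depend on the host OS.
posix_name = default_arcname('/var/log/app.log', pathmod=posixpath)
windows_name = default_arcname('C:\\Users\\me\\notes.txt', pathmod=ntpath)
relative_name = default_arcname('relative/file.txt', pathmod=posixpath)
```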
-
add_data
(path, data)¶ Adds a bytes object to the archive.
Parameters: - path (str) – The name to give the file when placing it in the archive.
- data (bytes) – The file’s contents.
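Adding in-memory bytes is one place where zipfile and tarfile diverge noticeably, so a unified add_data has real work to do. The sketch below shows one way such a function might dispatch on the archive type using only the standard library. This free-standing add_data helper is hypothetical and is not arclib's implementation.

```python
import io
import tarfile
import zipfile


def add_data(archive, path, data):
    """Store a bytes object as a member of an open zip or tar archive."""
    if isinstance(archive, zipfile.ZipFile):
        # zipfile can write bytes directly under a member name.
        archive.writestr(path, data)
    elif isinstance(archive, tarfile.TarFile):
        # tarfile needs an explicit header with the size, plus a file object.
        info = tarfile.TarInfo(path)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    else:
        raise TypeError('unsupported archive type: %r' % type(archive))


zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as archive:
    add_data(archive, 'greeting.txt', b'hello from zip')

tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode='w') as archive:
    add_data(archive, 'greeting.txt', b'hello from tar')
```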
-
extract
(member, path=None)¶ Extracts a member from the archive.
Parameters: - member (str) – The member to extract.
- path (str) – The target path to extract the member to; if None, then it will be the current directory.
arclib.zip.File.extract also takes the following keyword argument:
Parameters: pwd (str) – The password to use to extract the file, or None.
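For reference, the member, path, and pwd parameters mirror those of the standard library's zipfile.ZipFile.extract. The sketch below extracts one member into a temporary directory with zipfile directly. That arclib forwards these arguments unchanged is an assumption.

```python
import io
import os
import tempfile
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('docs/readme.txt', 'extracted text')

with tempfile.TemporaryDirectory() as target_dir:
    with zipfile.ZipFile(buffer) as archive:
        # path=None would extract into the current directory instead;
        # pwd only matters for encrypted members.
        written_path = archive.extract('docs/readme.txt', path=target_dir, pwd=None)
    with open(written_path) as f:
        extracted_text = f.read()
    # The member's directory structure is recreated below the target path.
    relative_path = os.path.relpath(written_path, target_dir)
```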
-
open
(member, universal_newlines=False)¶ Extracts a member from the archive into memory rather than onto the disk. Returns a binary file-like object with the following properties:
- name - The name of the member.
- read(size=-1) - Read and return size bytes from the file.
If universal_newlines is True, then the file object will be an instance of io.TextIOWrapper that also has the name property.
Parameters: - member (str) – The member to extract.
- universal_newlines (bool) – If True, returns an io.TextIOWrapper that also has a property name, which is the name of the member. Otherwise, returns a file-like object as mentioned above.
arclib.zip.File.open also takes the following keyword argument:
Parameters: pwd (str) – The password to use to extract the file, or None.
Returns: The file-like object as explained above, if the member is present. If it is not present, returns None.
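The behaviour described for open (binary by default, text with universal newlines on request, None for a missing member) can be approximated on top of zipfile as follows. The open_member helper is hypothetical, and the UTF-8 encoding is an assumption, since this section does not say which encoding is used.

```python
import io
import zipfile


def open_member(archive, member, universal_newlines=False):
    """Return a file-like object for a zip member, or None if it is missing."""
    try:
        raw = archive.open(member)
    except KeyError:
        # zipfile signals a missing member with KeyError.
        return None
    if universal_newlines:
        # TextIOWrapper translates '\r\n' and '\r' to '\n' when reading.
        # The encoding is an assumption made for this sketch.
        return io.TextIOWrapper(raw, encoding='utf-8')
    return raw


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('lines.txt', b'one\r\ntwo\r\n')

with zipfile.ZipFile(buffer) as archive:
    binary_contents = open_member(archive, 'lines.txt').read()
    text_file = open_member(archive, 'lines.txt', universal_newlines=True)
    text_name = text_file.name
    text_contents = text_file.read()
    missing = open_member(archive, 'absent.txt')
```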
-
-
class
Info
¶ An object containing information about an archive member.
-
info
¶ The underlying, “true” info object. With arclib.zip.Info, this is an instance of zipfile.ZipInfo; with arclib.tar.Info, this is an instance of tarfile.TarInfo.
-
filename
¶ The name of the file within the archive.
-
size
¶ The number of bytes that the file takes up within the archive.
-
mtime
¶ A datetime.datetime object containing the last modification time of the file.
-
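To show how the four Info attributes could be derived from the two underlying info types, here is a minimal sketch. The MemberInfo class and both converter functions are hypothetical and are not arclib's code. In particular, using ZipInfo.file_size (the uncompressed size) for size is an assumption, since zipfile also offers compress_size.

```python
import datetime
import io
import tarfile
import zipfile
from dataclasses import dataclass


@dataclass
class MemberInfo:
    info: object                # The underlying ZipInfo or TarInfo.
    filename: str
    size: int
    mtime: datetime.datetime


def from_zipinfo(zinfo):
    # ZipInfo.date_time is a (year, month, day, hour, minute, second) tuple.
    # file_size is the uncompressed size (an assumption for this sketch).
    return MemberInfo(zinfo, zinfo.filename, zinfo.file_size,
                      datetime.datetime(*zinfo.date_time))


def from_tarinfo(tinfo):
    # TarInfo.mtime is a number of seconds since the epoch.
    return MemberInfo(tinfo, tinfo.name, tinfo.size,
                      datetime.datetime.fromtimestamp(tinfo.mtime))


zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as archive:
    member = zipfile.ZipInfo('a.txt', date_time=(2020, 1, 2, 3, 4, 6))
    archive.writestr(member, b'abc')
    zip_info = from_zipinfo(archive.getinfo('a.txt'))

tar_member = tarfile.TarInfo('a.txt')
tar_member.size = 3
tar_member.mtime = 0
tar_info = from_tarinfo(tar_member)
```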