Comparison with luigi

The major source of inspiration for the apetype package was luigi, developed at Spotify. When luigi was developed, type hints did not yet exist in Python, so it is understandable that luigi does not use them. However, declaring many luigi.Parameter attributes makes the code look bloated, and inheritance between tasks did not work smoothly for me, especially for tasks that had to be executed for a dynamically generated list of samples. apetype was developed with that in mind. This section gives a one-to-one comparison with luigi's toy example.

Top artists example

The code snippets on luigi’s documentation page ( https://luigi.readthedocs.io/en/stable/example_top_artists.html ) are not directly executable, but the full code can be found at examples/top_artists.py in the luigi repository. The snippets in this comparison are likewise not meant to be executed; they only serve to compare the syntax.

Aggregate Artist Streams with luigi

>>> from collections import defaultdict
... import luigi
...
... class AggregateArtists(luigi.Task):
...     date_interval = luigi.DateIntervalParameter()
...
...     def output(self):
...         return luigi.LocalTarget("data/artist_streams_%s.tsv" % self.date_interval)
...
...     def requires(self):
...         return [Streams(date) for date in self.date_interval]
...
...     def run(self):
...         artist_count = defaultdict(int)
...
...         for input in self.input():
...             with input.open('r') as in_file:
...                 for line in in_file:
...                     timestamp, artist, track = line.strip().split()
...                     artist_count[artist] += 1
...
...         with self.output().open('w') as out_file:
...             for artist, count in artist_count.items():
...                 print(artist, count, file=out_file)

Aggregate Artist Streams with apetype

>>> import apetype as at
... import datetime, pathlib
... from collections import defaultdict
...
... class DateInterval(at.TaskBase):
...     days: float = 0
...     hours: float = 0
...
...     def timedelta(_) -> datetime.timedelta:
...         return datetime.timedelta(days=_.days, hours=_.hours)
...
... class Streams(at.TaskBase):
...     date: float
...
... class AggregateArtists(at.TaskBase):
...     date_interval: DateInterval
...
...     def date(_, date_interval) -> list:
...         return [d for d in date_interval.timedelta]
...
...     def output(_, date_interval) -> pathlib.Path:
...         return pathlib.Path("data/artist_streams_%s.tsv" % date_interval)
...
...     def streams_list(_, date: at.tasks.InjectItems) -> list:
...         return Streams(date)
...
...     def main(_, streams_list, output) -> type(None):
...         artist_count = defaultdict(int)
...
...         for input in streams_list:
...             with input.open('r') as in_file:
...                 for line in in_file:
...                     timestamp, artist, track = line.strip().split()
...                     artist_count[artist] += 1
...
...         with output.open('w') as out_file:
...             for artist, count in artist_count.items():
...                 print(artist, count, file=out_file)
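
In the apetype version, parameters are declared as annotated class attributes, and each method names the values it needs as its own arguments (for example `main(_, streams_list, output)`). How apetype implements this internally is not shown in this section. The following is only a minimal, stdlib-only sketch of the general idea, under the assumption that dependencies are resolved by argument name; `MiniTask`, `resolve` and `Report` are hypothetical names and not part of apetype or luigi.

```python
import inspect
from typing import get_type_hints


class MiniTask:
    """Hypothetical stand-in for a task base class (NOT apetype's TaskBase)."""

    def __init__(self, **settings):
        # Annotated class attributes act as the declared parameters.
        hints = get_type_hints(type(self))
        self._params = set(hints)
        self._cache = {}
        for name in hints:
            if name in settings:
                value = settings[name]
            elif hasattr(type(self), name):
                value = getattr(type(self), name)  # class-level default
            else:
                raise TypeError(f"missing parameter: {name}")
            setattr(self, name, value)

    def resolve(self, name):
        """Return a parameter value, or the cached result of method `name`."""
        if name in self._params:
            return getattr(self, name)
        if name not in self._cache:
            method = getattr(self, name)
            # Argument names of the bound method are treated as dependencies;
            # the first argument (`_`) is already bound and not listed.
            deps = inspect.signature(method).parameters
            self._cache[name] = method(*(self.resolve(dep) for dep in deps))
        return self._cache[name]


class Report(MiniTask):
    days: int = 2
    prefix: str = "data"

    def dates(_, days):
        return list(range(days))

    def output(_, prefix, days):
        return f"{prefix}/artist_streams_{days}.tsv"

    def main(_, dates, output):
        return f"{len(dates)} inputs -> {output}"


task = Report(days=3)
summary = task.resolve("main")
print(summary)
```

In this sketch, asking for `main` first resolves `dates` and `output`, which in turn pull in the `days` and `prefix` parameters. This plays the role that the explicit `requires()`, `input()` and `output()` calls play in the luigi version.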

Conclusion

Watch out, Luigi: Donkey Kong is back, and he wants to take over the plumbing business ... in Python programming, with a keen sense and (in)appropriate use of class.