Comparison with luigi
The major source of inspiration for the apetype package was luigi,
developed at Spotify. When luigi was written, type hints did not yet
exist in Python, so it is understandable that luigi does not use them.
In my experience, however, declaring many luigi.Parameter attributes
makes the code look bloated, and inheritance between tasks did not
work smoothly for me either, especially for tasks that had to be
executed for a dynamically generated list of samples. apetype was
developed with that in mind. This section gives a one-to-one
comparison with luigi's toy example.
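To make the contrast with explicit parameter objects concrete, here is a minimal sketch of how plain type hints can drive parameter handling. It is not apetype's implementation: the `Settings` class and the `build_parser` helper are hypothetical names, and only the standard library (`typing.get_type_hints`, `argparse`) is used.

```python
import argparse
from typing import get_type_hints


class Settings:
    """Hypothetical settings class; the annotations declare the parameters."""
    name: str          # no default, so the option becomes required
    days: float = 0
    hours: float = 0


def build_parser(cls) -> argparse.ArgumentParser:
    """Create one command-line option per annotated class attribute."""
    parser = argparse.ArgumentParser(prog=cls.__name__)
    for attr, attr_type in get_type_hints(cls).items():
        if hasattr(cls, attr):
            # A class-level value acts as the default.
            parser.add_argument(f"--{attr}", type=attr_type,
                                default=getattr(cls, attr))
        else:
            # No class-level value: the option must be supplied.
            parser.add_argument(f"--{attr}", type=attr_type, required=True)
    return parser


args = build_parser(Settings).parse_args(["--name", "demo", "--days", "1.5"])
print(args.name, args.days, args.hours)
```

The annotation supplies both the parameter name and its conversion function, which is the kind of declaration that otherwise needs one parameter object per attribute.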
Top artists example
The code snippets on luigi’s documentation page
( https://luigi.readthedocs.io/en/stable/example_top_artists.html )
are not directly executable, but the full code can be found at
examples/top_artists.py
in the luigi repository. The code in the
comparison here is likewise not meant to be executed; it only serves
to compare the syntax.
Aggregate Artist Streams with luigi
>>> import luigi
... from collections import defaultdict
...
... # Streams is another task, defined in luigi's examples/top_artists.py
... class AggregateArtists(luigi.Task):
...     date_interval = luigi.DateIntervalParameter()
...
...     def output(self):
...         return luigi.LocalTarget("data/artist_streams_%s.tsv" % self.date_interval)
...
...     def requires(self):
...         return [Streams(date) for date in self.date_interval]
...
...     def run(self):
...         artist_count = defaultdict(int)
...
...         for input in self.input():
...             with input.open('r') as in_file:
...                 for line in in_file:
...                     timestamp, artist, track = line.strip().split()
...                     artist_count[artist] += 1
...
...         with self.output().open('w') as out_file:
...             # dict.items() replaces the Python 2 only iteritems()
...             for artist, count in artist_count.items():
...                 print(artist, count, file=out_file)
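Since the snippet above cannot be run without luigi and the remaining tasks, the following sketch isolates the counting step of `run` so it can be tried on its own. The helper name `count_artists` and the sample lines are made up for illustration; in-memory `io.StringIO` objects stand in for the task inputs.

```python
import io
from collections import Counter


def count_artists(streams):
    """Count how often each artist occurs across several line-based streams."""
    counts = Counter()
    for stream in streams:
        for line in stream:
            # Each line holds three whitespace-separated fields.
            timestamp, artist, track = line.strip().split()
            counts[artist] += 1
    return counts


# Made-up sample data standing in for two daily stream files.
day1 = io.StringIO("1001 artist_a track_1\n1002 artist_b track_2\n")
day2 = io.StringIO("2001 artist_a track_3\n")

artist_count = count_artists([day1, day2])
for artist, count in artist_count.items():
    print(artist, count)
```

Because the fields are split on whitespace, this only works as long as artist and track names contain no spaces, which is also true of the snippet above.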
Aggregate Artist Streams with apetype
>>> import apetype as at
... import datetime, pathlib
... from collections import defaultdict
...
... class DateInterval(at.TaskBase):
...     days: float = 0
...     hours: float = 0
...
...     def timedelta(_) -> datetime.timedelta:
...         return datetime.timedelta(days=_.days, hours=_.hours)
...
... class Streams(at.TaskBase):
...     date: float
...
... class AggregateArtists(at.TaskBase):
...     date_interval: DateInterval
...
...     def date(_, date_interval) -> list:
...         return [d for d in date_interval.timedelta]
...
...     def output(_, date_interval) -> pathlib.Path:
...         return pathlib.Path("data/artist_streams_%s.tsv" % date_interval)
...
...     def streams_list(_, date: at.tasks.InjectItems) -> list:
...         return Streams(date)
...
...     def main(_, streams_list, output) -> type(None):
...         artist_count = defaultdict(int)
...
...         for input in streams_list:
...             with input.open('r') as in_file:
...                 for line in in_file:
...                     timestamp, artist, track = line.strip().split()
...                     artist_count[artist] += 1
...
...         with output.open('w') as out_file:
...             # dict.items() replaces the Python 2 only iteritems()
...             for artist, count in artist_count.items():
...                 print(artist, count, file=out_file)
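In the apetype snippet, the method parameters (`date_interval`, `streams_list`, `output`) carry the same names as other members of the class, which suggests that dependencies are wired up by name. The sketch below illustrates that general idea with `inspect.signature`. It is an assumption about the pattern, not apetype's actual mechanism, and `MiniTask`, `resolve` and `Aggregate` are hypothetical names.

```python
import inspect


class MiniTask:
    """Hypothetical base class that resolves method arguments by name."""

    def __init__(self):
        self._results = {}  # cache, so every member is computed only once

    def resolve(self, name):
        if name not in self._results:
            member = getattr(self, name)
            if inspect.ismethod(member):
                # Parameter names of the method are looked up as members.
                deps = inspect.signature(member).parameters
                member = member(**{dep: self.resolve(dep) for dep in deps})
            self._results[name] = member
        return self._results[name]


class Aggregate(MiniTask):
    prefix: str = "artist_streams"

    def output(self, prefix):
        return f"data/{prefix}.tsv"

    def main(self, output):
        return f"would write to {output}"


result = Aggregate().resolve("main")
print(result)
```

Resolving `main` first resolves `output`, which in turn picks up the `prefix` attribute. The sketch has no cycle detection, so two methods that depend on each other would recurse without end.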
Conclusion
Watch out, Luigi: Donkey Kong is back and he wants to take over the plumbing business... in Python programming, with a keen sense and (in)appropriate use of class.