Python 2.7 - Modifying an existing, timezone-naive scheduler to deal with daylight savings time?
We have a timezone-unaware scheduler in pure Python.
It uses a heapq (a Python binary heap) of ordered events, each containing a time, a callback, and the arguments for the callback. It gets the least-valued time from the heapq, computes the number of seconds until the event is to occur, and sleeps that number of seconds before running the job.
We don't need to worry about computers being suspended; this runs on a dedicated server, not a laptop.
We'd like to make the scheduler cope with timezone changes, so that we don't have the problem in November that we did (we had an important job that had to be adjusted in the database to make it run at 8:15am instead of 9:15am; it is meant to run at 8:15am). I'm thinking we could:
- Store all times in UTC.
- Make the scheduler sleep 1 minute and test, in a loop, recomputing "now" each time, and doing a <= comparison against the job datetimes.
- Jobs that run more than once an hour should "just run normally".
- Hourly jobs that run in between 2:00am and 2:59am (inclusive) on a time change day should skip an hour for PST->PDT, and run an extra time for PDT->PST.
- Jobs that run less than hourly should avoid rerunning in either case on days that have a time change.
Does that sound about right? Where might I be off?
Thanks!
I've written about scheduling a few times before with respect to other programming languages. The concepts are valid for Python as well. You may wish to read some of these posts: 1, 2, 3, 4, 5, 6
I'll try to address the specific points again, from a Python perspective:
It's important to separate the recurrence pattern from the execution time. The recurrence pattern should store the time as the user would enter it, which is usually a local time. Even if the recurrence pattern is "just one time", it should still be stored as local time. Scheduling is one of a handful of use cases where the common advice of "always work in UTC" does not hold up!
You also need to store the time zone identifier. These should be IANA time zones, such as America/Los_Angeles or Europe/London. In Python, you can use the pytz library to work with time zones like these.
The execution time should indeed be based on UTC. The next execution time for any event should be calculated from the local time in the recurrence pattern. You may wish to calculate and store these execution times in advance, so that you can easily determine which events are next to run.
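As a minimal sketch of that split, the snippet below keeps the recurrence pattern as a wall-clock time plus an IANA zone identifier and derives the UTC execution time from it. It assumes Python 3.9+ and uses the standard-library `zoneinfo` module as a stand-in for pytz (which needs the system tz database or the `tzdata` package); the function name `execution_time_utc` and the sample dates are hypothetical, chosen only for illustration.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; swapped in here for pytz


def execution_time_utc(local_date, local_time, zone_id):
    """Turn a stored wall-clock time in an IANA zone into a UTC datetime."""
    local = datetime.combine(local_date, local_time, tzinfo=ZoneInfo(zone_id))
    return local.astimezone(timezone.utc)


# The stored pattern never changes: "9:15 in America/Los_Angeles".
pattern_time = time(9, 15)
pattern_zone = "America/Los_Angeles"

# The derived UTC execution time differs between standard and daylight time.
winter = execution_time_utc(date(2014, 1, 15), pattern_time, pattern_zone)
summer = execution_time_utc(date(2014, 7, 15), pattern_time, pattern_zone)
print(winter.isoformat())  # 2014-01-15T17:15:00+00:00
print(summer.isoformat())  # 2014-07-15T16:15:00+00:00
```

Because only the derived UTC values are tied to the current rules, they can be thrown away and recomputed from the stored pattern whenever the time zone data changes.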
You should be prepared to recalculate these execution times. You may wish to do that periodically, but at minimum it should be done any time you apply a time zone update to your system. You can (and should) subscribe to the tz update announcements from IANA, and then look for the corresponding pytz updates on PyPI.
Think of it this way. When you convert a local time to UTC, you're assuming that you know what the time zone rules will be at that point in time, but nobody can predict what governments will do in the future. Time zone rules can change, and they do. You need to take that into consideration.
You should test for invalid and ambiguous times, and have a plan for dealing with them. These are easy to hit when scheduling, especially with recurring events.
For example, you might schedule a task to run at 2:00 am every day, but on the day of the spring-forward transition that time doesn't exist. So what should you do? In many cases, you'll want to run at 3:00 am on that day, since it's the next time after 1:59 am. But in some (rarer) contexts, you might run at 1:00 am, or at 1:59 am, or skip that day entirely.
Likewise, you might schedule a task to run at 1:00 am every day, but on the day of the fall-back transition, 1:00 am occurs twice. So what do you do? In many cases, the first instance (which is the daylight instance) is the right time to fire. In other (rarer) cases, the second instance may be more appropriate, or (even rarer) it might be appropriate to run the job twice.
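One way to detect both cases is sketched below. It assumes Python 3.9+ and the standard-library `zoneinfo` module rather than pytz, and relies on the `fold` attribute from PEP 495; the helper names `classify` and `to_utc` are hypothetical. With pytz, a comparable check is `tz.localize(naive, is_dst=None)`, which raises `NonExistentTimeError` or `AmbiguousTimeError`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; swapped in here for pytz

UTC = timezone.utc


def classify(naive, zone_id):
    """Return 'normal', 'invalid' (in a DST gap) or 'ambiguous' (occurs twice)."""
    tz = ZoneInfo(zone_id)
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "normal"
    # A wall time inside a gap does not survive a round trip through UTC.
    round_trip = first.astimezone(UTC).astimezone(tz).replace(tzinfo=None)
    return "ambiguous" if round_trip == naive else "invalid"


def to_utc(naive, zone_id, prefer_first=True):
    """Resolve a wall time to UTC under one possible policy.

    Ambiguous times: pick the first (daylight) or second (standard) instance.
    Invalid times with prefer_first=True: the pre-transition offset is used,
    which shifts the time forward by the length of the gap (2:00 -> 3:00).
    """
    tz = ZoneInfo(zone_id)
    return naive.replace(tzinfo=tz, fold=0 if prefer_first else 1).astimezone(UTC)


LA = "America/Los_Angeles"
print(classify(datetime(2014, 3, 9, 2, 0), LA))   # invalid
print(classify(datetime(2014, 11, 2, 1, 0), LA))  # ambiguous
print(to_utc(datetime(2014, 3, 9, 2, 0), LA))     # 2014-03-09 10:00:00+00:00
```

The policy in `to_utc` is only one of the options listed above; a scheduler that needs to skip the day or run twice would branch on the result of `classify` instead.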
With regard to jobs that run on an "every X [hours/minutes/seconds]" type of schedule:
- These are easiest to schedule by UTC, and should not be affected by DST changes.
- If these are the only types of jobs you are running, you can base your whole system on UTC. But if you're running a mix of different types of jobs, you might consider setting the "local time zone" to "UTC" in the recurrence pattern.
- Alternatively, you could schedule them by true local time; just make sure that when the job runs, it calculates the next execution time based on the current execution time, which should already be in UTC.
You shouldn't distinguish between jobs that run more than hourly and jobs that run less than hourly. I would expect an hourly job to run 25 times on the day of a fall-back transition, and 23 times on the day of a spring-forward transition.
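The 25 and 23 counts fall out naturally when the interval is stepped in UTC. The sketch below, which assumes Python 3.9+ with the standard-library `zoneinfo` module in place of pytz and uses a hypothetical helper name, lists the hourly UTC execution times that land on one local calendar day.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; swapped in here for pytz

UTC = timezone.utc


def hourly_runs_on_local_day(day, zone_id):
    """List the UTC times of an hourly job that fall on one local calendar day."""
    tz = ZoneInfo(zone_id)
    start = datetime.combine(day, time(0), tzinfo=tz).astimezone(UTC)
    end = datetime.combine(day + timedelta(days=1), time(0), tzinfo=tz).astimezone(UTC)
    runs = []
    current = start
    while current < end:
        runs.append(current)
        # Next execution = current execution + interval, always in UTC.
        current += timedelta(hours=1)
    return runs


LA = "America/Los_Angeles"
print(len(hourly_runs_on_local_day(date(2014, 11, 2), LA)))  # 25 (fall back)
print(len(hourly_runs_on_local_day(date(2014, 3, 9), LA)))   # 23 (spring forward)
print(len(hourly_runs_on_local_day(date(2014, 6, 1), LA)))   # 24
```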
With regard to your plan to sleep and wake up once per minute in a loop: that will probably work, as long as you don't have sub-minute tasks to deal with. It may not be the most efficient way to deal with it, though. If you pre-calculate and store the execution times, you could set a single task to wake up at the next time to run, run what needs to run, and then set a new task for the next execution time. You don't have to wake up once per minute.
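A rough sketch of that wake-at-next-execution approach, reusing the heapq design from the question, might look like the following. The `Scheduler` class, its method names, and the injectable `now`/`sleep` parameters are hypothetical; execution times are assumed to be pre-calculated UTC epoch seconds.

```python
import heapq
import itertools
import time


class Scheduler:
    """Heap of (utc_epoch_seconds, sequence, callback, args) entries."""

    def __init__(self, now=time.time, sleep=time.sleep):
        self._heap = []
        # The counter breaks ties so two jobs due at the same instant
        # never cause the heap to compare callbacks.
        self._counter = itertools.count()
        self._now = now
        self._sleep = sleep

    def schedule(self, run_at, callback, *args):
        """Queue a callback for a pre-calculated UTC execution time."""
        heapq.heappush(self._heap, (run_at, next(self._counter), callback, args))

    def run_next(self):
        """Sleep until the earliest job is due, then run it once."""
        run_at, _, callback, args = heapq.heappop(self._heap)
        delay = run_at - self._now()
        if delay > 0:
            self._sleep(delay)
        return callback(*args)


# Usage with a fake clock so the example finishes immediately.
slept = []
ran = []
scheduler = Scheduler(now=lambda: 100.0, sleep=slept.append)
scheduler.schedule(130.0, ran.append, "later job")
scheduler.schedule(110.0, ran.append, "earlier job")
scheduler.run_next()
scheduler.run_next()
print(ran)    # ['earlier job', 'later job']
print(slept)  # [10.0, 30.0]
```

A recurring job would call `schedule` again from inside its callback with the next UTC execution time. One trade-off of a single long sleep is that a job added while the scheduler is asleep waits until the current sleep ends, so a real implementation would likely need an interruptible wait.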
You should also think about the resources you will need to run the scheduled jobs. What happens if you schedule 1000 tasks that all need to run at midnight? They won't necessarily all be able to run simultaneously on a single computer. You might queue them up to run in batches, or spread the load out across different time slots. In a cloud environment, perhaps you spin up additional workers to handle the load.