Stop Wrestling with UNIX Timestamps: A Clean Pandas Solution
During the initial stages of building my International Space Station (ISS) data engineering project, I quickly realised a format issue with the timestamps broadcast by the ISS Open API. The timestamps used by the API were set in the UNIX format, which aren't terribly user friendly for consumers of the data.
My goal was to ensure that my dataframe's timestamps are correctly converted from UNIX to ISO 8601 during ingestion. UNIX timestamps represent how many seconds have passed since midnight 1 January 1970.
An example UNIX timestamp looks like this:
1707568200 → Converts to 2024-02-10T12:30:00 UTC
While interesting from a historical point of view, I didn't want my ETL application to keep timestamps in this format. So after doing research, the pandas library has a useful function called to_datetime(series). This function can then be combined with another pandas function called dt.strftime to finally convert the timestamp to the ISO 8601 format.
def convert_unix_to_iso(series: pd.Series) -> pd.Series:
"""
Convert Unix timestamps to ISO 8601 formatted datetime strings.
Args:
series (pd.Series): Series containing Unix timestamps
Returns:
pd.Series: Series with timestamps in ISO format (YYYY-MM-DDTHH:MM:SS)
"""
return pd.to_datetime(series, unit='s').dt.strftime('%Y-%m-%dT%H:%M:%S')
The above code takes a series right after my ISS ETL script ingests data, and immediately converts the relevant column of timestamps into ISO 8601 format, which is far more useful for end users.