The LSST is a large optical survey project funded by the National Science Foundation and the Department of Energy. It will continually image the sky, identify changes in near real time, and, over a decade of operations, collect tens of petabytes of data, building up the deepest, widest image of the Universe. Its data will enable a range of science goals, from the identification of near-Earth asteroids to understanding the nature of Dark Energy.
A survey of this scale requires significant computing resources, but also a modern, high-performance, scalable data processing and analysis system. The LSST Data Management team is guiding an effort to build such a suite. Primarily written in Python and C++, open source, and composed of modular codes ranging from science pipelines to web user interfaces, the LSST software stack will power the LSST and form a basis that other projects can reuse in the future.
The LSST DM team is distributed across a number of partner institutions — the LSST Project Office, the Infrared Processing and Analysis Center, the National Center for Supercomputing Applications, Princeton University, SLAC National Accelerator Laboratory, and the University of Washington — and is also helped by contributors from the community, the LSST science collaborations, and other project subsystems.
The LSST Science Pipelines will implement the core image processing and data analysis algorithms needed to process optical survey imaging data at low latency and unprecedented scale and accuracy. We are writing pipelines for single-epoch image processing, coaddition, image differencing, optimal multi-epoch measurements, and (global) photometric and astrometric calibration, among others.
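To give a feel for one of these steps, here is a toy illustration of image differencing: subtracting a template (reference) image from a science image so that anything new or changed stands out. The real pipelines also match PSFs and backgrounds between the two images; this sketch assumes identical seeing and uses plain Python lists, so it is a conceptual example rather than the actual pipeline algorithm.

```python
def difference_image(science, template):
    """Pixel-by-pixel subtraction of two equally sized images."""
    return [[s - t for s, t in zip(srow, trow)]
            for srow, trow in zip(science, template)]

# A flat template of the field...
template = [[10, 10, 10],
            [10, 10, 10],
            [10, 10, 10]]

# ...and the same field, where a transient has appeared at the center.
science = [[10, 10, 10],
           [10, 60, 10],
           [10, 10, 10]]

diff = difference_image(science, template)

# Flag pixels that rise well above the (assumed) noise level.
threshold = 10
detections = [(r, c) for r, row in enumerate(diff)
              for c, v in enumerate(row) if v > threshold]
print(detections)  # the transient stands out at (1, 1)
```

Everything that was present in both images cancels out, leaving only the new source — the same principle, at vastly greater scale and rigor, behind LSST's nightly alert stream.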
To satisfy the need to efficiently store, query, and analyze catalogs running into trillions of rows and petabytes of data, we are developing Qserv, a distributed shared-nothing SQL database query system.
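The shared-nothing idea can be sketched in a few lines (this is an illustration of the concept, not Qserv's actual implementation): the catalog is spatially partitioned into chunks, each chunk lives in its own independent database, and a query is scattered to every chunk before the partial results are merged. Here each "node" is simply an in-memory SQLite database.

```python
import sqlite3

# Build three independent chunk databases, each holding part of a
# hypothetical object catalog of (id, magnitude) rows.
chunks = []
for rows in [[(1, 21.5), (2, 19.0)],
             [(3, 18.2), (4, 22.1)],
             [(5, 20.0)]]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE object (id INTEGER, mag REAL)")
    db.executemany("INSERT INTO object VALUES (?, ?)", rows)
    chunks.append(db)

# Scatter: run the same query on every chunk independently.
query = "SELECT id, mag FROM object WHERE mag < 21.0"
partials = [db.execute(query).fetchall() for db in chunks]

# Gather: merge the per-chunk results into the final answer.
result = sorted(row for part in partials for row in part)
print(result)  # [(2, 19.0), (3, 18.2), (5, 20.0)]
```

Because no chunk needs to see any other chunk's data, the system scales by adding nodes — the property that makes trillion-row catalogs tractable.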
One of the most important jobs of a large survey is to provide access. This includes access to catalogs, processed images, and raw images. Access in the next generation of surveys will extend to visualization and analysis. We are writing interfaces that will allow thousands of users to query, download, visualize, and analyze petabytes of LSST data.
In order to build a scalable, portable processing system, we are creating extensible middleware to transparently access data irrespective of storage location or format.
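The data-access idea can be sketched as follows. Client code asks for a dataset by type and data ID, and pluggable backends decide where and how it is actually stored; the names here (`DataStore`, `MemoryBackend`) are illustrative, not the real LSST middleware API.

```python
class MemoryBackend:
    """One possible storage backend; others might wrap disk or a
    remote object store behind the same put/get interface."""
    def __init__(self):
        self._store = {}
    def put(self, key, value):
        self._store[key] = value
    def get(self, key):
        return self._store[key]

class DataStore:
    """Routes each dataset type to whichever backend holds it."""
    def __init__(self):
        self._backends = {}
    def register(self, dataset_type, backend):
        self._backends[dataset_type] = backend
    def put(self, dataset_type, data_id, value):
        key = frozenset(data_id.items())
        self._backends[dataset_type].put(key, value)
    def get(self, dataset_type, data_id):
        # Callers never learn which backend served the request.
        key = frozenset(data_id.items())
        return self._backends[dataset_type].get(key)

store = DataStore()
store.register("calexp", MemoryBackend())
store.put("calexp", {"visit": 903334, "ccd": 23}, "<calibrated exposure>")
print(store.get("calexp", {"visit": 903334, "ccd": 23}))
```

Science code written against such an interface never changes when data moves between local disk, tape, or a remote archive — only the backend registration does.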
The LSST data processing pipelines will need to efficiently scale from single core execution to tens of thousands of cores. To meet this requirement we are building an orchestration framework to launch and monitor jobs on many different systems at many different scales.
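The scaling requirement can be illustrated with the standard library alone: the same per-detector task runs unchanged whether it is mapped over one core or many. A real orchestration layer adds batch submission, monitoring, and retries; this toy just shows the pattern, with an arithmetic stand-in for single-frame processing.

```python
from concurrent.futures import ThreadPoolExecutor

def process_ccd(ccd_id):
    # Stand-in for processing one detector's image.
    return ccd_id, sum(i * i for i in range(1000))

ccds = list(range(8))

# Serial execution over every detector...
serial = [process_ccd(c) for c in ccds]

# ...and the identical task fanned out across a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(process_ccd, ccds))

print(serial == parallel)  # True: same results, more workers
```

Because each task is independent, nothing in the task code changes as the worker count grows — the framework's job is to launch, track, and retry those tasks across whatever system is available.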
The LSST data processing codes are being developed in an iterative, agile fashion. Though engineering first light is still six years away, prototype versions of a number of LSST codes are already being tested on simulations and applied to existing data (e.g., reprocessing SDSS Stripe 82, or processing HSC Survey data).
While already state-of-the-art in many areas, LSST software is still in its infancy when it comes to end-user friendliness, documentation, and API stability. There is no binary distribution yet — builds must be done from source. Knowledge of Python (and a willingness to write some Python code) is necessary to work with the current code base.
Warning: At this stage, the LSST software will be of greatest interest to the LSST Science Collaborations, large survey builders (or those reprocessing large survey data sets), and astronomical image processing enthusiasts. If you're just looking to reduce a few observations with a ready-to-use tool, it may be better to look into one of the more polished and/or established packages such as Astropy or the AstrOmatic suite.
curl -OL https://raw.githubusercontent.com/lsst/lsst/12.0/scripts/newinstall.sh
bash newinstall.sh
source loadLSST.bash
eups distrib install -t v12_0 lsst_apps
Once you've installed the stack, see the documentation for examples of what you can do with it.
All LSST DM code is visible on GitHub, spread across 100+ repositories.
We're in the process of assembling the team of 45+ scientists, software engineers, and IT experts needed to build, commission, and operate the data system for LSST.
Current LSST DM job openings:
The LSST data processing system, though still in an early construction phase, is an open source (GPLv3) software project free for anyone to use and is open to contributions from the community.
We invite you to:
8.4 meter, wide-field, f/1.2 telescope.
3.2 Gigapixel, 189 4k x 4k CCD camera, with 2-second readout.
PetaFLOPS of computing power, hundreds of PB of storage, gigabit long-haul networks.
Beginning early in the next decade, the LSST will collect over 50 PB of raw data, resulting in over 30 trillion observations of 40 billion astronomical sources. It will measure the positions and properties of over 20 billion stars, or 10% of all stars in the Milky Way.
The LSST will scan the visible sky once every three days, charting objects that change or move: from exploding supernovae to potentially hazardous near-Earth asteroids.
LSST data will be available with no proprietary period to all astronomers in the United States, Chile, and International Partners. Alerts about variable sources will be available world-wide within 60 seconds. The LSST data processing stack will be open source (GPL v3).