Understanding Software-2.0: A Study of Machine Learning library usage and evolution

Dilhara, Malinda; Ketkar, Ameya; Dig, Danny

Citeable URL: https://ir.library.oregonstate.edu/concern/defaults/3b591h056

Descriptions

Attribute Name	Values
Creator	Dilhara, Malinda; Ketkar, Ameya; Dig, Danny
Abstract	Enabled by a rich ecosystem of Machine Learning (ML) libraries, programming using learned models, i.e., Software-2.0, has gained substantial adoption. However, we do not know what challenges developers encounter when they use ML libraries. Because of this knowledge gap, researchers miss opportunities to contribute to new research directions, tool builders do not invest resources where automation is most needed, library designers cannot make informed decisions when releasing ML library versions, and developers fail to follow common practices when using ML libraries. We present the first large-scale quantitative and qualitative empirical study to shed light on how developers in Software-2.0 use ML libraries, and how the evolution of these libraries affects their code. In particular, using static analysis we perform a longitudinal study of 3,394 top-rated open-source projects with 46,125 contributors. To further understand the challenges of ML library evolution, we survey 109 developers who introduce and evolve ML libraries. Using this rich dataset we reveal several novel findings. Among others, we found an increasing trend of using ML libraries: the ratio of new Python projects that use ML libraries increased from 2% in 2013 to 50% in 2018. We identify several usage patterns, including: (i) 36% of the projects use multiple ML libraries to implement various stages of the ML workflow, (ii) developers update ML libraries more often than traditional libraries, (iii) strict upgrades are the most popular kind of update for ML libraries, (iv) ML library updates often result in cascading library updates, and (v) ML libraries are often downgraded (22.04% of cases). We also observed unique challenges when evolving and maintaining Software-2.0, such as (i) binary incompatibility of trained ML models, and (ii) benchmarking ML models. Finally, we present actionable implications of our findings for researchers, tool builders, developers, educators, library vendors, and hardware vendors.
License	All rights reserved
Resource Type	Research Paper
Date Issued	2020-07-01
Degree Level	Doctoral
Degree Field	Computer Science
Academic Affiliation	Electrical Engineering and Computer Science
Rights Statement	In Copyright
Publisher	Oregon State University
Peer Reviewed	No
Language	English [eng]

Relationships

Parents: This work has no parents.

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	MalindaDilhara2023.pdf	2020-11-16	Public	Download

ScholarsArchive@OSU
