Slide 1: Sociology of Programming Languages
LEO A. MEYEROVICH, @LMEYEROV, BERKELEY // SWIVEL.IO
ARIEL S. RABKIN, @ARITALKING, PRINCETON

Slide 2: Sociology of Programming Languages ^ ADOPTION

Slide 3: MAN AND THE MACHINE
CULT OF PERSONALITY

Slide 4: ENIAC

Slide 5: 2011

Slide 6: PL Blind Spot: Social Design
I can only stand on shoulders of giants for intrinsic feature design
Performance: parallel algorithms/compilers/synthesis
Abstraction: concurrency/dynamism (FRP)

Slide 7: Zeroing In
“Psychological aspects of programming and in the computational aspects of psychology”
ICSE, FSE, MSR, CHASE, …
CSCW: Computer Supported Cooperative Work and Social Computing

Slide 8: Sociology of Languages
Principles
Languages
Standalone topic! [Implications for Design, Paul Dourish, CHI’06]

Slide 9: Why Start with Adoption?
Change Function [P. Coburn; switching costs]
threshold to adopt: (perceived adoption need) / (perceived adoption pain) > 1
FP!!! new language

Slide 10: Why Start with Adoption?
“From now on, my goal in life would be to also drive the denominator down to zero”
- Erik Meijer, Confession of a Used Language Salesman

Slide 11: Why Start with Adoption?
FP!!! new language
FP!! same language

Slide 12:
SOCIAL THEORIES [Onward! ’12]
Adoption Model? Decision Making? Acquisition?
ANALYSIS OF 200K PROJECTS AND 20K DEVS [PLATEAU ’12, OOPSLA ’13]
Challenge Problems: Design &

Slide 13: Well-Studied Social Theories of Adoption [Onward! ’12]
Fields: Technology, Toys, Economics, Music, Medicine, Religion, Policy, Linguistics, …
Callouts: optimize for longevity; rational; quantitative; “different”, not “better”

Slide 14: Ecological Theory [Mark, 1998]
Music is fun with friends!
Can’t listen to all music…

Slide 15: Ecological Theory [Mark, 1998]
Social Network ~ Preferences
Music competes for social networks, not individuals

Slide 16: 200K Projects (2000-2010) [PLATEAU 2013]

Slide 17: Popularity Across Niches
[Figure: two bar charts of popularity (y-axis) over 223 project categories (x-axis). Java: scale 0% to 60%, labeled categories: blogging, search. Scheme: scale 0% to 4%, labeled category: build tools.]

Slide 18: Popularity vs.
Niche: Dispersion
[Figure: scatter plot of popularity (log scale, 0.0001 to 1) against dispersion across niches (σ / µ, 0 to 5). Labeled languages: Java, C#, PL/SQL, Assembly, Fortran, VBScript, Scheme, Prolog.]

Slide 19: Most Used Languages CDF (Ohloh)
[Figure: cumulative popularity (0% to 100%) by language. Labeled languages: xml, css, html, c, shell, java, javascript, c++, python, make, php, bat, sql, ruby, c#.]
Callouts: DSLs dominate; winner takes all

Slide 20: Odds for Unpopular Languages?
[Figure: proportion of projects for language (log scale, 0.0001% to 100%) against language rank (decreasing, log scale, 1 to 100). Series: SourceForge, Ohloh.]
BUGGY DATA: sources only track certain languages

Slide 21: Slashdot + … + Wired Survey
1,600 responses (2 days)
• International audience
• 83% have at least 1 degree
• 73% are out of school

Slide 22: Use of Unpopular Languages
[Figure: proportion of projects for language (log scale, 0.0001% to 100%) against language rank (decreasing, log scale, 1 to 100). Series: SourceForge, Slashdot Survey, Ohloh.]
Long Tail: design for niches and grow

Slide 23: How Do People Pick Languages?
http://bpodgursky.wordpress.com/2013/08/22/updates-to-language-vs-income-breakdown-post/

Slide 24: P(L’ | L)
p(popular): 75%
p(prev): 30%
[Diagram: L, L’]

Slide 25: Poll to Dig Deeper
Typically, what factor most influences language selection?

Slide 26: Polling Perceived Reality
In your last project, what factor most influenced language selection?

Slide 27: Survey of 1,679 Developers
Extrinsic factors dominate! (on last project)

Slide 28: Demographics Matter
Probability of Using a Language on Last Project

Slide 29: Surveys of Biased Samples
                 [image]    MOOC survey (massive open online course)
Avg. Age         37         30
Degree           53%        55%
Employed Dev     92%        62%
Female           3%         16%
< 20yr olds: correctness less important
Hobbyists learn quickly
More latent biases?
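The dispersion axis on the "Popularity vs. Niche" slide, σ / µ, reads as a coefficient of variation of a language's popularity across project categories. The following is a minimal sketch under that reading, not the authors' analysis code: the function name `dispersion`, the use of the population standard deviation, and the share values are all assumptions made up for illustration.

```python
from statistics import mean, pstdev


def dispersion(shares):
    """Coefficient of variation (sigma / mu) of per-category popularity shares."""
    mu = mean(shares)
    if mu == 0:
        raise ValueError("language is unused in every category")
    # Population standard deviation is an assumption; the slide does not say which is used.
    return pstdev(shares) / mu


# Hypothetical share of projects using a language in four project categories.
general_purpose = [0.30, 0.25, 0.35, 0.30]  # used about equally everywhere
niche = [0.00, 0.00, 0.00, 0.04]            # concentrated in one category

print(dispersion(general_purpose))
print(dispersion(niche))
```

Under these made-up inputs the evenly used language has low dispersion and the single-category language has high dispersion, which matches the direction of the slide's axis: languages to the right are concentrated in fewer niches.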
Slide 30: Sample Bias in Repo Software
Early adopters: SourceForge
Late adopters: GitHub etc.
[Timeline: 2000, 2002, 2008, 2010]

Slide 31: Cross-Validating Adoption of Java Generics
Sources: Top 20 Projects [Parnin 2011]; (online course)
People Define:  class List<T>{…}
People Invoke:  n = new List<Int>()
Reported values: 14%, 28%, 44%, 60%
Vague self-reported jargon:
  How often do you create …: never, sometimes, …
  How often do you invoke …: never, sometimes, …
Also: very different values for C++ templates

Slide 32: Detailed Model: Diffusion of Innovation [Rogers 1963, Ryan & Gross 1943]
Adoption: 12 Years
500+ tech adoption studies later…

Slide 33: Diffusion of Innovation: Process [Rogers 1963]
1. Knowledge
2. Persuasion
3. Decision
4. Trial
5. Confirmation
Callouts: not so bad; talk + read

Slide 34: Actionable Example: Safe Sex [Kelly 1991, Limanonda 1994]
Process
  knowledge and persuasion
  x decision/trial/confirmation
Catalysts
  rel. advantage and simple
  x observability
  x trialability and compatibility
Sounds like static typing...

Slide 35: Two Weekends to Spread Safe Sex [Kelly 1991, Limanonda 1994]
1. hang out at gay bars, identify opinion leaders
2. teach to promote, give visible badge

Slide 36: Safe Sex Intervention: Success! [Kelly 1991, Limanonda 1994]
[Figure: reported activity (0% to 80%) for safe sex and unsafe sex, from 3 months to 3 years.]

Slide 37: Noteworthy Diffusion of Innovations
Trialability         Coverity                 post-mortem result
Relative advantage   Hadoop, EC2              niche, quant. benefit
Compatibility        jQuery > Scala > …       JS libraries, E-DSLs, JVMLs
Observability        jsFiddle                 shareable URLs
Simplicity           Scheme, Ruby, Scala      pay-as-you-go abstractions & language-as-a-library

Slide 38:

Slide 39: ?
Adoption Language

Slide 40: Goal?
Adoption Language

Slide 41: Process?
Adoption Language

Slide 42: Fuel
Adoption Language

Slide 43: JavaScript Posts over Time on StackOverflow
[Figure: new answers (log scale, 1,000 to 1,000,000) from 11/1/08 to 11/1/12 (2009 to 2013); secondary CDF axis (log scale, 0.001 to 1).]

Slide 44: JavaScript Posts over Time on StackOverflow
[Figure: as on the previous slide, with cumulative answers added alongside new answers.]

Slide 45: JavaScript Posts over Time on StackOverflow
[Figure: new answers and cumulative answers (log scale, 1,000 to 1,000,000) from 11/1/08 to 11/1/12 (2009 to 2013); cumulative distribution (CDF) on a linear axis, 0 to 1.]

Slide 46: Metcalfe’s Law (Network Effect, Commons, …)
Developers: 1,000 – 1,000,000
Users: 100 – 1,000,000,000+
Artifacts: CPUs, libraries, program traces, REPL sessions, …
Network’s Value: O(N^2)

Slide 47: Data 1/2: Guided REPLs/APIs
    install("fit"); import("fit")
    a = fitdistr(data, distr="exp")
    plot(a); summary(a)
Callouts: Useful? Others? Preprocessing? Postprocessing? Others?

Slide 48: Data 2/2: Traces for Whitebox Testing
input, null check
Z3: synthesize input1 for null, input2 for !null [Sen, Engler, Godefroid..]
Augment facts with “TraceDB”
Synth can fail..

Slide 49: Empirical Tools 1/2: Data Rich Packaging
REPL sessions? traces, Executable states, aliases, s MWEs + Tests

Slide 50: Empirical Tools 2/2: Analyzers
• Survey design for language design & prospecting
• Rapid prototyping for social learning
• Repo mining is being tackled by many people

Slide 51: Recap
1. BIG GAP: “social” language principles & designs
2. Adoption: social literature & empirical analysis
   Onward! 2012, PLATEAU 2012, OOPSLA 2013
3. Empirical tools: needs instrument design research
   Surveys (MOOCs!) >> repository mining
4. Big opportunities for social languages
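
The "Network’s Value: O(N^2)" line on the Metcalfe’s Law slide can be made concrete with a little arithmetic. The sketch below assumes the common reading of Metcalfe’s Law, where value is proportional to the number of distinct pairs among N participants; the function name `pairwise_links` is hypothetical, and the two inputs are simply the ends of the developer range given on that slide.

```python
def pairwise_links(n):
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


small = pairwise_links(1_000)      # low end of the slide's developer range
large = pairwise_links(1_000_000)  # high end of the slide's developer range

# Growing the community 1,000x grows the pair count roughly 1,000,000x.
print(small, large, large / small)
```

Under this assumption a thousandfold increase in developers yields about a millionfold increase in potential pairwise connections, which is the quadratic growth the slide points to.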