Speaker: Prof. Ion Stoica, University of California, Berkeley
Date: Tue, 6th September, 2016
Time: 9.15 a.m. - 10.30 a.m.
Session Chair: Surajit Chaudhuri, Microsoft Research
Almost six years ago we started the Spark project at UC Berkeley. Spark is a cluster computing engine that is optimized for in-memory processing, and unifies support for a variety of workloads, including batch, interactive querying, streaming, and iterative computations. Spark is now the most active big data project in the open source community, and is already being used by over one thousand organizations.
One of the reasons behind Spark's success has been our early bet on the continuous increase in memory capacity and on the feasibility of fitting many realistic workloads in the aggregate memory of typical production clusters. Today, we are witnessing new trends, such as the slowing of Moore's law and the emergence of a variety of computation and storage technologies, such as GPUs, FPGAs, and 3D XPoint. In this talk, I'll discuss some of the lessons we learned in developing Spark as a unified computation platform, and the implications of today's hardware and software trends for the development of the next generation of big data processing systems.
Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley. He does research on cloud computing and networked computer systems. Past work includes Dynamic Packet State (DPS), the Chord DHT, the Internet Indirection Infrastructure (i3), declarative networks, replay debugging, and multi-layer tracing in distributed systems.
He is an ACM Fellow and has received numerous awards, including the SIGCOMM Test of Time Award (2011) and the ACM Doctoral Dissertation Award (2001). In 2006, he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution, and in 2013, he co-founded Databricks, a startup to commercialize Apache Spark.
Speaker: Anand Rajaraman, Founding Partner, Rocketship.vc
Date: Wed, 7th September, 2016
Time: 9.15 a.m. - 10.30 a.m.
Session Chair: Jayant Haritsa, IISc Bangalore
We live in an era where software is transforming industries, the sciences, and society as a whole. This exciting phenomenon has been described by the phrase "software is eating the world." It is becoming increasingly apparent that data is the fuel powering software's conquests. Data is the new disruptor.
It's hard to believe that the first decade of the Big Data era is already behind us. Silicon Valley has been at the forefront of developing and applying data-driven approaches to create disruption at many levels: infrastructure (e.g., Hadoop and Spark), capabilities (e.g., image recognition and machine translation), and killer apps (e.g., self-driving cars and messaging bots).
In this talk, we first look back on the past decade and share lessons from the frontlines of data-driven disruption. Looking ahead, we then describe challenges and opportunities for the next decade. Since this has also been a personal journey, we will use examples drawn from personal experience to illustrate each point.
Anand Rajaraman is a Founding Partner of two Silicon Valley venture capital funds focused on early-stage technology companies: Milliways Ventures and RocketshipVC. He was the co-founder of two successful startups: Junglee (acquired by Amazon.com) and Kosmix (acquired by Walmart). At Walmart, he created and led WalMartLabs as its Senior Vice President. As an academic, Anand has focused his research on the intersection of database systems, the World-Wide Web, and social media. His research publications have won several awards at prestigious academic conferences, including three retrospective 10-year Best Paper awards at ACM SIGMOD, VLDB, and ICDT. His textbook "Mining of Massive Datasets", co-authored with Jeff Ullman and Jure Leskovec, has been published.