VP Technology Strategy, MapR
Crystal Valentine is Vice President of Technology Strategy at MapR, a Silicon Valley-based big data company. She has an extensive background in big data research and practice. Before joining MapR, she was a professor of computer science at Amherst College. She is the author of several academic publications in the areas of algorithms, high-performance computing, and computational biology and holds a patent for Extreme Virtual Memory.
As a former consultant at Ab Initio Software working with Fortune 500 companies to design and implement high-throughput, mission-critical applications and as a tech expert consulting for equity investors focused on technology, she has developed significant business experience in the enterprise computing industry.
Dr. Valentine received her doctorate in Computer Science from Brown University and was a Fulbright Scholar to Italy.
LinkedIn: https://www.linkedin.com/in/crystal-valentine-29003a53
The “Next-Gen” is Now: How Big Data Technologies are Revolutionizing the Application Landscape
From Silicon Valley to Moscow, in industry and academia, the rise of big data platforms is shifting the value of data. By lowering the cost of collecting and analyzing data at scale, big data platforms empower us to extract greater value from data. The result is a fundamental shift in computational architectures. Enterprise data warehouses and high-performance clusters are being replaced by massively distributed systems built on commodity hardware in order to support petabyte-size, heterogeneous data sets cost-effectively.
As architectures change, a concomitant shift has occurred at the software and application levels. A new vanguard of big data technologies (including real-time data streams, cloud computing, microservices, artificial intelligence algorithms, containers and virtualization, and converged data platforms) is changing programming paradigms, enabling new data-intensive applications while increasing computational efficiency and developer productivity. In this talk, Crystal will highlight real-world examples of how these emerging big data technologies are forming the foundation of next-gen enterprise applications.
Master-class: Build a Time Series Application with Spark Streaming, MapR Streams and HBase
October 30, 2016
In English
Requires separate registration
More and more applications have to provide scalable, high-throughput, fault-tolerant processing of live data streams. Internet of Things (IoT) applications are a prime example. In this in-depth workshop you will get a jump-start on scaling distributed computing by taking an example time series application and coding through different aspects of working with such a dataset.
What you will learn
The entire workshop will take 3 hours. You will get up to speed on emerging techniques and technologies by analyzing case studies, develop new technical skills, and share emerging best practices in big data (Hadoop, Spark, IoT, the Kafka API, and Apache HBase).
We will cover building an end-to-end distributed processing pipeline that uses various distributed stream input sources to rapidly ingest, process, and store large volumes of high-speed data.
The exercises require basic knowledge of Scala and Java. They teach the features of Spark Streaming for processing live data streams ingested from sources such as Apache Kafka, sockets, or files, and for storing the processed data in HBase.
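To give a feel for the ingest, process, and store steps of such a pipeline, here is a minimal sketch in plain Java. It is not the workshop's code and uses neither the Spark Streaming nor the HBase API: a list of strings stands in for one micro-batch, and a sorted map stands in for a table. The CSV layout (`sensorId,epochSeconds,value`), the class and method names, and the row-key layout (sensor ID plus the start of a one-minute window) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TimeSeriesSketch {

    /** One parsed sensor reading (hypothetical schema). */
    static final class Reading {
        final String sensorId;
        final long epochSeconds;
        final double value;

        Reading(String sensorId, long epochSeconds, double value) {
            this.sensorId = sensorId;
            this.epochSeconds = epochSeconds;
            this.value = value;
        }
    }

    /** Ingest: parse a CSV line of the assumed form "sensorId,epochSeconds,value". */
    static Reading parse(String line) {
        String[] f = line.split(",");
        return new Reading(f[0].trim(),
                Long.parseLong(f[1].trim()),
                Double.parseDouble(f[2].trim()));
    }

    /**
     * Store: build a row key from the sensor ID and the start of the
     * one-minute window. The timestamp is zero-padded so that keys for one
     * sensor sort chronologically (an assumed key design).
     */
    static String rowKey(String sensorId, long epochSeconds) {
        long windowStart = (epochSeconds / 60) * 60;
        return String.format(Locale.ROOT, "%s_%010d", sensorId, windowStart);
    }

    /** Process: average the readings of one micro-batch per row key. */
    static Map<String, Double> averageByWindow(List<String> batch) {
        Map<String, double[]> acc = new TreeMap<>(); // row key -> [sum, count]
        for (String line : batch) {
            Reading r = parse(line);
            double[] sumCount = acc.computeIfAbsent(
                    rowKey(r.sensorId, r.epochSeconds), k -> new double[2]);
            sumCount[0] += r.value;
            sumCount[1] += 1;
        }
        // The sorted map stands in for a table ordered by row key.
        Map<String, Double> table = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            table.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return table;
    }

    public static void main(String[] args) {
        // A hand-written micro-batch; a real pipeline would read from a stream.
        List<String> batch = Arrays.asList(
                "pump1,1477821600,10.0",
                "pump1,1477821630,20.0",
                "pump1,1477821660,30.0",
                "pump2,1477821605,5.0");
        averageByWindow(batch).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

In this sketch the first two `pump1` readings fall into the same one-minute window and are averaged together, while the third starts a new window and therefore gets its own row. Putting the sensor ID first in the key keeps all rows for one sensor adjacent, which is one common choice for time series data; other key layouts trade this off differently.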
Additional information
https://www.mapr.com/services/mapr-academy/big-data-hadoop-online-training
https://www.mapr.com/blog/getting-started-sample-programs-apache-kafka-09
https://www.mapr.com/blog/getting-started-sample-programs-mapr-streams
https://www.mapr.com/blog/high-speed-kafka-api-publish-subscribe-streaming-architecture-how-works-message-level
https://www.mapr.com/blog/spark-streaming-hbase
https://www.mapr.com/blog/guidelines-hbase-schema-design
NB!
Participants are asked to bring laptops with internet access (a Wi-Fi connection will be available) and the following software installed:
- JDK 8
- Git
- Maven 3.x or later
- VirtualBox