Big data

abstract

  1. Replication, indexes, caching..Kafka
  2. WebRTC

    capture and/or stream data between browsers without an intermediary, plugins, or third-party software

  3. Column-oriented DBMS
  4. Shared nothing architecture
  5. CAP theorem
  6. About CAP

MOM

A consequence of a messaging-based approach: a single conceptual model is no longer needed to underpin the integration effort.

  1. nanomsg
  2. Asynchronous Communication in SOA
  3. Redis based message queue
  4. Disque – a distributed message broker | Hacker News

Kafka

Distributed commit log. Good for high-volume data processing pipelines and for both real-time and batch consumers.

  1. Kafka rx slides
  2. cjdev/kafka-rx
  3. Apache Kafka for Beginners
  4. Clients - Apache Kafka - Apache Software Foundation
  5. MODELING SPECIFIC DATA TYPES IN KAFKA
  6. Documentation
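
The "distributed commit log" idea can be sketched in a few lines. This is a toy, single-process illustration, not Kafka's API: the `CommitLog` and `Consumer` classes and their methods are hypothetical names made up for this example. It shows only the core property, an append-only log where each consumer keeps its own offset, so a real-time reader and a batch reader can share the same data.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy append-only log (hypothetical, not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=None):
        """Return records starting at `offset`; reading never removes anything."""
        end = None if max_records is None else offset + max_records
        return self.records[offset:end]


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own offset into a shared log."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=None):
        """Fetch unread records and advance this consumer's offset only."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
realtime = Consumer(log)
batch = Consumer(log)

log.append("click:1")
first = realtime.poll()      # the real-time consumer reads as events arrive
log.append("click:2")
log.append("click:3")
second = realtime.poll()     # only the records it has not seen yet
everything = batch.poll()    # the batch consumer catches up in one read
```

Because reads do not consume records, adding another consumer costs nothing on the write side. A real deployment would also partition and replicate the log, which this sketch leaves out.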

Confluent

  1. Blog
  2. Intro
  3. Avro

    backward compatibility, forward compatibility, and full compatibility

  4. Zookeeper
  5. Camus

    a MapReduce implementation that dramatically eases continuous upload of data into Hadoop clusters

  6. Load data into Hadoop with Camus
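
The three Avro compatibility modes noted above can be illustrated with a simplified check. This sketch does not use the Avro library: schemas are modeled as plain dicts mapping field name to default value, `NO_DEFAULT`, `can_read`, and `compatibility` are hypothetical names, and type promotion is ignored. The only rule applied is that a reader field missing from the writer's schema needs a default.

```python
NO_DEFAULT = object()  # sentinel: the field declares no default value


def can_read(reader, writer):
    """True if data written with `writer` can be decoded with `reader`.

    Simplified rule: every reader field that the writer did not write
    must have a default. Extra writer fields are simply ignored.
    """
    return all(
        name in writer or default is not NO_DEFAULT
        for name, default in reader.items()
    )


def compatibility(old, new):
    """Classify a schema change from `old` to `new`."""
    backward = can_read(reader=new, writer=old)  # new code reads old data
    forward = can_read(reader=old, writer=new)   # old code reads new data
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


v1 = {"id": NO_DEFAULT, "name": NO_DEFAULT}

add_with_default = {**v1, "email": None}     # new optional field
add_required = {**v1, "email": NO_DEFAULT}   # new field without a default
drop_required = {"id": NO_DEFAULT}           # removes "name"

results = {
    "add_with_default": compatibility(v1, add_with_default),
    "add_required": compatibility(v1, add_required),
    "drop_required": compatibility(v1, drop_required),
}
```

Under these assumptions, adding a field with a default is a full-compatible change, while adding a field without a default is only forward compatible and removing a field without a default is only backward compatible.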

IMDB

High throughput; low latency. No buffer management. If execution is serialized: no locking/latching.

  1. In-memory database
  2. tarantool.org

    Two storage engines: lockless in-memory, and disk with a 2-level B-tree. Transactions and replication, with a WAL for CD.

  3. MemSQL, VoltDB
  4. Comparison
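
The "if serialized, no locking" point can be shown with a small sketch. It is an illustration of the general idea only, not how any of the databases listed here is implemented: `SerialStore` and `increment` are hypothetical names. All transactions are handed to a single worker thread, so the data itself is never touched concurrently and needs no locks (the hand-off queue is the only synchronized piece).

```python
import queue
import threading


class SerialStore:
    """Toy in-memory store that runs every transaction on one worker thread."""

    def __init__(self):
        self.data = {}
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Transactions run one at a time, in arrival order.
        while True:
            txn = self._inbox.get()
            if txn is None:  # shutdown marker
                break
            txn(self.data)

    def submit(self, txn):
        """Queue a transaction: a callable that receives the data dict."""
        self._inbox.put(txn)

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._inbox.put(None)
        self._worker.join()


def increment(data):
    # A read-modify-write that would race if run from several threads at once.
    data["counter"] = data.get("counter", 0) + 1


def client(store, n):
    for _ in range(n):
        store.submit(increment)


store = SerialStore()
clients = [threading.Thread(target=client, args=(store, 1000)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()
store.close()
final_count = store.data["counter"]
```

Four clients submit 1000 increments each and none is lost, even though `increment` takes no lock. The trade-off is that a single slow transaction delays everything queued behind it.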

Datascript

  1. tonsky/datascript

VoltDB

Ultra-fast transaction throughput at low latencies. Commonly used with a data warehouse (or Hadoop) to optimize OLTP throughput and analytic queries/repor

  1. Blog
  2. Comparing VoltDB to Postgres

MemSQL

  1. MemSQL: The Fastest In-Memory Database

RocksDB

Embedded, persistent key-value store. Not distributed. No failover. Not highly available: if the machine dies, you lose your data.

  1. Blog
  2. MongoDB + RocksDB
  3. Parse announcement
  4. with osquery

deepstream

messaging server persisting to RethinkDB

  1. A Scalable Server for Realtime Web Apps
  2. configure deepstream to use the RethinkDB storage connector
  3. Offline support

Magnet

  1. Magnet Message

Hive

facilitates querying and managing large datasets residing in distributed storage

  1. Apache Hive

SmartStack

  1. airbnb/smartstack-cookbook
  2. nerds.airbnb.com
  3. SmartStack vs. Consul

Consul

  1. consul.io

Stream processing systems

Compute new streams from other data streams in real time, as events occur. Useful in areas with many complex transformations.

  1. Storm
  2. Spark
  3. Samza
  4. Samza vs Storm
  5. Samza vs Spark
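
Computing one stream from another can be sketched with plain Python generators. This is a single-process illustration of the concept, not the API of Storm, Spark, or Samza: the functions `running_totals` and `large_only` and the sample data are made up for this example. Each stage consumes events one at a time and emits derived events as they occur.

```python
from collections import defaultdict


def running_totals(events):
    """Derive a stream of (user, total so far) from a stream of (user, amount)."""
    totals = defaultdict(int)
    for user, amount in events:
        totals[user] += amount
        yield user, totals[user]


def large_only(stream, threshold):
    """Derive a stream that keeps only totals at or above `threshold`."""
    for user, total in stream:
        if total >= threshold:
            yield user, total


purchases = [("ann", 30), ("bob", 20), ("ann", 80), ("bob", 90)]

# Stages are chained lazily: each purchase flows through both
# transformations before the next purchase is read.
alerts = list(large_only(running_totals(iter(purchases)), threshold=100))
```

With the sample data, `alerts` contains one entry per user, emitted at the moment that user's running total first reaches the threshold. The systems linked above add what this sketch lacks: distribution across machines, fault tolerance, and durable state.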

Hadoop

computational engine

  1. Six Super-Scale Hadoop Deployments
  2. Hadoop pioneered this approach.