A Tale of Two Services – Links

Here are reference links from my presentation, “A Tale of Two Services”

NoSQL-related:

Windows Azure Table Service
Amazon Web Services SimpleDB
Oracle Berkeley DB
Redis
MemcacheDB
Hibari
Project Voldemort
Riak
GT.M
Apache CouchDB
MongoDB
Raven DB
Google Bigtable
Apache Cassandra
Hypertable
Microsoft Dryad
Neo4j
“NoSQL Distilled” by Sadalage & Fowler

mongoDB – related:

mongodb.org
10gen.com
mongovue.com
MSDN March 2010
Hanselman’s Blog Post
Nielsen’s Blog Post
Mahugh’s Blog Post

Neo4j – related:

Neo4j.org
Using A Graph Database to Power the “Web of Things”
Cypher “Cheat Sheet”
Neo4j in a .NET World – Tatham Oddie

ASP.Net Web API – related:

For Visual Studio 2010: http://www.asp.net/web-api
Getting-started lessons: http://www.asp.net/web-api/videos/getting-started

Posted in Uncategorized

The Most Common Problem(s) – Mark’s Rant

I can’t print!

I wish I had a dollar for every time I had to help a person with a printing problem.

I can’t find my file.

“I know I saved it somewhere, but I don’t know where and I can’t remember what the file was named. I think I was using Word.”

How do I do this?

“I don’t have time to read the instructions, read a book, or take a course. Just tell me everything you know that you spent years learning about this subject – in 5 minutes.”

Here’s the solution (instead of here’s the problem).

“We need a whole new system to…”

I need a new computer (aka – my computer is too slow).

“I need to have 15 applications running at the same time to listen to music, watch YouTube, Facebook, e-mail, chat, … Oh, plus I might need to run this application that is my actual job.”

The system is down.

The person next to me is not having any problems, but for some reason my machine doesn’t work, so the entire system is down.

This is not “User Friendly”

“I would like the application to do my job for me so that I can spend more time getting paid to talk with my friends.”

Posted in Uncategorized

A Contact is a Contact (is a Contact…)

Contacts are at the heart of every business application that I have worked with. Whenever I start work on a new application for a client, the first thing I have to plan for is the migration of contact data from some legacy system into the new application. Wouldn’t it be nice if I could just use a service that I could use as a master contact management system? That way I could use the service instead of constantly having to re-invent a custom system or integrate with other systems that have their own contact management system built-in. When I use the word “contact”, I mean a person, company, or organization. When I work with contacts, I want to be able to see all of their basic information, such as names, phone numbers, email addresses, postal and street addresses, etc. I also want to see all the other information in my system associated with a contact, which may include documents, invoices, orders, agreements, appointments, and any correspondences. The correspondences may include email, letters, messages, and recordings. I think this is what anyone would want from a contact management system, but I am finding it hard to find a system like that as a desktop or web-based application.

What I envision is a list of contacts that can be organized into groups, much like the way you can organize them with Gmail Contacts, or Live Mail Contacts. However, these are email clients, so they are primarily designed to work with email (of course). It is interesting that you can’t display a contact and show all of the emails that were sent to and received from that contact. You can sort a list of emails by sent-to or received-from and even filter the results, but you can’t just show the contact with the related emails.

I can find document management systems that come close, but they have to be integrated with email and messaging systems. They also require more integration with customer relationship management systems, human resource management systems, and other enterprise systems. As it turns out, for every major business entity that I work with, contacts are the one entity type that ends up being related to every other major entity. For example, emails have email-to and email-from, orders have orders-to and orders-from, projects have people responsible for each task, etc.
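The idea that a contact is the one entity related to every other entity can be sketched as a small in-memory index. This is only an illustration under stated assumptions: the names `Contact`, `ContactHub`, `link`, and `view`, and the sample records, are hypothetical and not part of any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Contact:
    contact_id: str
    name: str
    kind: str = "person"  # person, company, or organization


@dataclass
class ContactHub:
    """Hypothetical index: contact id -> related records of any type."""
    contacts: dict = field(default_factory=dict)
    related: dict = field(default_factory=lambda: defaultdict(list))

    def add_contact(self, contact):
        self.contacts[contact.contact_id] = contact

    def link(self, contact_id, record_type, role, record):
        # role names the relationship, e.g. "email-to" or "order-from"
        if contact_id not in self.contacts:
            raise KeyError(f"unknown contact: {contact_id}")
        self.related[contact_id].append((record_type, role, record))

    def view(self, contact_id, record_type=None):
        """Return everything linked to a contact, optionally filtered by type."""
        return [
            item for item in self.related[contact_id]
            if record_type is None or item[0] == record_type
        ]


# Usage with made-up sample data
hub = ContactHub()
hub.add_contact(Contact("c1", "Pat Example"))
hub.link("c1", "email", "email-to", {"subject": "Quote request"})
hub.link("c1", "order", "order-from", {"number": 1001})
emails = hub.view("c1", "email")
everything = hub.view("c1")
```

Displaying a contact together with its emails, orders, or documents then becomes a single lookup by contact id instead of a filtered search through each separate system.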

Let’s see what can happen when we start with contact management.

Posted in InfoTrail, Uncategorized

Mark’s Manifesto

I have collected some baseline principles that guide my software development process.

The Requirements Principle:
Requirements are never  _____ (fill in the blank) enough.
Most clients are of the “I’ll know what I want when I see it.” variety. By the way, I am that way too. I don’t think it’s possible to fully specify any software project that is worth doing.
“Can you make it do this?” is another common question. My answer: “It’s just a matter of time & money and the mythical man-month” (see the Pi Principle below).
“I want it to be user friendly.” – Of course, I always try to make it as un-user-friendly as possible.

Pi Principle:
Everything takes 3.14 … times as long to complete as what you originally estimate.
Corollary: Everything costs 3.14 … times as much as management wants to spend.
Pi has such a mystical quality about it, the way it goes on forever without repeating. It’s not unlike some projects that seem to never end.

The Golden Rule:
He (or She) who has the gold makes the rules.
But oftentimes the rules are unclear and subject to change. The one who has the gold may not be right, but that person has the power to pucker up and stop a project as well as keep progress flowing. (You don’t have to have brains or a heart to be a boss, you just have to be an ….)

Commitment Principle:
It’s easy to say anything, but hard to do everything.
Corollary: Talk is cheap. Do what you say you will do (DWYSYWD).

Planning Principle:
A bad plan isn’t better than no plan at all, it’s just a bad plan.
Corollary: Success Principle: “If at first you don’t succeed, try, try again. Then quit. There’s no point in being a damn fool about it.” – W.C. Fields

Completion Principle:
The end is not in sight, except when death is looming.
Continuous improvement is good. Deadlines are good too. (Git ‘er done!)

Knowledge Principle:
You only know what you know.
But you can learn anything (and forget a lot of what you’ve learned).
Corollary: Nothing is easy, unless you know it.

The Code of Excellence:
Quality (not necessarily cleanliness) is next to Godliness.
Quality, the level of excellence, is often something that you can’t define, but you know it when it moves you spiritually.

Posted in Uncategorized

Adventures in NoSQL Land

I first got interested in NoSQL databases from a May 2010 article in MSDN Magazine. This article started me on a quest to learn more about the potential of using databases other than relational databases like MS SQL Server, Oracle, or MySQL. Although I have worked mostly with MongoDB, I have been amazed at the growth that has occurred in the NoSQL community. Here is just some of the stuff I have discovered:

Wikipedia has an overview of NoSQL. It says that NoSQL could be called “Not only SQL” and that it differs from relational database management systems in that these data storage approaches may not require fixed table schemas, usually avoid join operations, and typically scale horizontally (by adding more servers rather than using more powerful servers). I was particularly drawn to the idea of no fixed table schemas because I have found the total cost of making changes to database structures to be high, particularly for the type of business applications that I develop.

Roots of NoSQL

I think that NoSQL attempts to address (or accept and deal with) some of the data storage needs for large-scale distributed computing:

  • Distributed Computing – arose from the evolution of the Internet and the ability to create systems “in the cloud”. (I like “The Eight Fallacies of Distributed Computing” attributed to Peter Deutsch, 1994, Sun Microsystems:
    1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn’t change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous)
  • CAP theorem for distributed systems – Eric Brewer, 2000 (UC Berkeley and Inktomi)
    – Consistency
    – Availability
    – Partition tolerance
    – (Pick any two)

Challenges

Some major challenges for data storage systems are:

  • Scalability
    • Decentralization – distributed systems
    • Flexibility – elasticity
    • Fault tolerance – how defective machines affect the system
    • Consistency – relative levels and how they affect the system
  • Caching
    • Frequency of reads and writes
    • Eventual consistency
  • Speed
    • Joins and relationships
    • De-normalization
  • Schemas
    • keys
    • indexes
    • views
    • stored procedures
    • etc.
  • Transactions (ACID)
    • Atomicity
    • Consistency
    • Isolation
    • Durability

NoSQL and relational databases address these challenges in different ways, so they have strengths and weaknesses associated with the design decisions.

NoSQL Databases

There are many database products that are called NoSQL. (I was surprised at how many there are and the number is increasing.) Here are some of them, by category:

I wish I could say that I have worked with all of these products, but I can’t. It has been fun fiddling with many of them, though.

NoSQL Means?

With all of these categories and products, what does “NoSQL” really mean? I have a few ideas:

  • No tables – objects, collections, nodes
  • No (or fewer) foreign keys and constraints
  • No ACID – can’t have it all
  • No sophisticated query planners: mostly REST
  • No declarative query language (more procedural)
  • More flexible, fluid designs (dynamic schemas)
  • More natural (and richer?) data representations
  • Highly scalable (horizontal scaling, i.e., more machines, not bigger machines)
  • Sparse data – optional/multi-value fields
  • Large datasets (but small datasets too)
  • Meaningful identifiers
  • Access patterns (such as map-reduce)
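Two of the ideas above, sparse data with optional or multi-value fields and map-reduce as an access pattern, can be sketched with plain Python dictionaries standing in for documents. This is a minimal illustration, not how any particular NoSQL product implements either idea. The sample documents and the function names `map_tags` and `reduce_counts` are made up for the example.

```python
from collections import defaultdict

# Hypothetical documents in one collection. Fields are optional and
# may hold multiple values; no two documents need the same shape.
documents = [
    {"_id": 1, "name": "Acme", "phones": ["555-0100", "555-0101"], "tags": ["customer"]},
    {"_id": 2, "name": "Pat", "email": "pat@example.com", "tags": ["customer", "vendor"]},
    {"_id": 3, "name": "Lee"},  # no phones, email, or tags at all
]


def map_tags(doc):
    """Map step: emit a (tag, 1) pair for every tag the document has."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


tag_counts = reduce_counts(pair for doc in documents for pair in map_tags(doc))
```

The document with no `tags` field simply emits nothing in the map step, so no NULL handling or schema change is needed for it.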

Why Use NoSQL?

NoSQL has made inroads into applications when:

  • The cost of scaling up a relational database is too high (when compared with NoSQL).
  • There are lots of temporary data that don’t need to be stored in a relational database.
  • There are complex queries with large datasets that need to be optimized.
  • Transactions don’t need to be very durable.
  • Object models would involve too many joins or would have to be greatly de-normalized.
  • Large quantities of Large Objects (CLOB or BLOB) are stored in a database.
  • There is a need for fast data reads (but maybe not writes).

Considerations for Using NoSQL

Here are some things to consider, particularly when evaluating using a NoSQL database:

  • What is the problem that needs to be solved? (I know this one seems obvious. 🙂 )
  • Data storage growth requirements – scalability & Big Data
  • Data structure changes – potential shoehorned tables and queries
  • Object inter-dependencies and/or coupling
  • Cardinalities of relationships
  • Data access patterns
  • Application structure
  • Transactions
  • Single collection opportunities
  • Operating system(s)
  • Drivers – availability, support
  • File storage
  • Indexes
  • Map reduce/path traversal
  • Hybrid solution potential

Impact on InfoTrail

I have been experimenting with using MongoDB as the main data storage engine for InfoTrail modules. So far it has shown some significant benefits:

  • A collection-per-entity has reduced the number of tables to deal with. I am looking into the potential to use a single collection for all entities that would benefit from caching and another single collection for all entities that are not cached. This could greatly reduce development time and cost as well as fit well into the software factory approach. As a relational database, basic InfoTrail has over 500 tables, and more tables are added for individual customizations.
  • Dynamic schema saves time in modification/enhancement of the modules. I would like to have the ability to have the user/admin add, update, or delete keys, values, and sub-collections from a system admin screen. This also would fit well with the need for change/version control.
  • Data retrieval rates appear to be faster (by at least an order of magnitude), but this will have to be benchmarked.

I haven’t found any reason not to use MongoDB yet. I have tried CouchDB, but so far I have preferred MongoDB. I also am working with Neo4j because it works really well for linking entities together.
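
The single-collection and dynamic-schema ideas described above can be sketched without a database at all. The `SingleCollection` class below is a hypothetical in-memory stand-in, not the MongoDB driver API. The `entity_type` discriminator field and the method names are assumptions made for illustration.

```python
import itertools


class SingleCollection:
    """Hypothetical stand-in for one collection that holds every entity type."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, entity_type, **fields):
        # Every document carries a discriminator instead of living in its own table.
        doc = {"_id": next(self._ids), "entity_type": entity_type, **fields}
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, **criteria):
        # Match documents whose fields equal all of the given criteria.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]

    def set_field(self, doc_id, key, value):
        # Dynamic schema: add or update a key on one document, no migration.
        self._docs[doc_id][key] = value

    def unset_field(self, doc_id, key):
        # Remove a key from one document; other documents are unaffected.
        self._docs[doc_id].pop(key, None)


# Usage with made-up sample data
store = SingleCollection()
contact_id = store.insert("contact", name="Pat Example")
store.insert("order", number=1001, contact_id=contact_id)
store.set_field(contact_id, "twitter", "@pat")
contacts = store.find(entity_type="contact")
orders = store.find(entity_type="order", contact_id=contact_id)
```

An admin screen that lets a user add, update, or delete keys would map onto calls like `set_field` and `unset_field` here. Whether one collection for everything performs acceptably in a real database would still have to be benchmarked.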

Links

Here are some links that I found helpful so far:

NoSQLDatabase.org – a great overview site for all things NoSQL
NoSQLTapes – a collection of video discussions and interviews by Tim Anglade

Posted in InfoTrail, NoSQL

InfoTrail – Backgrounder Part2

(continued from InfoTrail – Backgrounder Part1)

Foundations of InfoTrail

InfoTrail modules are built on the shoulders of the works of many others, so let’s review some of the seminal works that are being used and evaluated:

Where to Start: Top Down, Bottom Up, Middle Out

Start with requirements? These should sound familiar:

  • “I’m a visual person. I’ll know what I want when I see it.”
  • “Give me some alternatives and I’ll pick the one I like best.”
  • “Give me an estimate for the price and delivery for the project I just described to you.”
  • “Above all, we need the application to be user friendly…”

The U2 Syndrome

Business software development is still too much of an iterative process. We talk about reusable components, patterns, and levels of abstraction, but I still haven’t found what I’m looking for. When I want to build an application that has scheduling and calendars, I want to be able to go online and find an open-source (free or very low cost) bundle of code and documentation in the language(s) of my choice, much like I can find a 1/4-20 screw, an 8 foot 2×4 piece of wood, or a lawnmower at Home Depot. So since I can’t find what I’m looking for, I’m going to try to build it.

Let’s Get Started

Consider the common business entities that are listed in InfoTrail – Backgrounder Part1. Couldn’t there be some kind of standardization that would allow the “manufacture” of software that handles those fundamental areas?

After enough (whatever that means) of the specifications for a project were agreed upon, I used to think that it was best to start a project by developing the database first. This was because I wanted to base the application on the universal data models in “The Data Model Resource Book, Volumes 1, 2, and 3”, by Len Silverston. I wanted to build in the standardized but flexible data structures from the beginning. I also found it very helpful to review the current data structures (legacy data) that were being used to ensure that all the entities, elements, and data requirements were being met. Also, the data structure, to a great degree, dictated what could be done with the user interfaces. Getting the database structure correct at the outset saved a lot of time because changes to the data structure have ripple effects through the whole project. This can translate to considerable time and money.

However, application stakeholders (my customers) found this approach very difficult to accept for a variety of reasons. Probably the primary reason is that the output, e.g., reports and user interfaces, is what most people deal with when using a system. So it makes sense to start with the user interface and report designs. Stakeholders like to see an application’s output from the outset of a project (if not before).

We want the system to be service-oriented, so we need to start with our services first, right? The middle tiers of the application will have a huge impact on the scalability, reliability, security, etc. of the application.

So what’s the best approach to start with?

Let’s look at where the various elements of the application can be connected into a unified whole (framework) so we can start development at all levels and make iterations continuously as the application is developed. These iterations are necessary until we can establish conventions and standards to follow. We are trying to use a software factory as a business model, so let’s make the design of the factory be the starting point. Also let’s decide to use metadata combined with for the various elements of the designs that our software machines use to create the application. The metadata can serve as a good starting place for the project.
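
As a rough illustration of metadata serving as the starting point, the sketch below builds entity classes from a metadata table at runtime. It is an assumption-laden stand-in for real code generation: the `ENTITY_METADATA` layout, the field names, and `build_entity` are all hypothetical, and a real software factory would more likely emit source files, schemas, and UI definitions.

```python
from dataclasses import field, fields, make_dataclass

# Hypothetical metadata: one entry per entity, one spec per field.
ENTITY_METADATA = {
    "Contact": [
        {"name": "name", "type": str, "required": True},
        {"name": "email", "type": str, "required": False},
    ],
    "Task": [
        {"name": "title", "type": str, "required": True},
        {"name": "done", "type": bool, "required": False},
    ],
}


def build_entity(entity_name, field_specs):
    """Create a dataclass from metadata; required fields come first."""
    required = [(s["name"], s["type"]) for s in field_specs if s["required"]]
    optional = [
        (s["name"], s["type"], field(default=None))
        for s in field_specs
        if not s["required"]
    ]
    return make_dataclass(entity_name, required + optional)


ENTITIES = {name: build_entity(name, specs) for name, specs in ENTITY_METADATA.items()}

# Usage: the same metadata drives both the data class and the form labels.
Contact = ENTITIES["Contact"]
pat = Contact(name="Pat Example")
form_labels = [f.name.title() for f in fields(Contact)]
```

Because the data layer and the user interface both read the same metadata, a change to one field spec can ripple through every tier without hand edits.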

Before we really get started, let’s do some research into NoSQL, RESTful services, jQuery, single page applications, and federated authentication.

Posted in InfoTrail

InfoTrail – Backgrounder Part1

What is InfoTrail?

Trails End Systems’ product line, InfoTrail, is based on the concept of a software factory that creates software products and services from application modules that can be assembled, configured, and used for a variety of information technology solutions.

Decisions, decisions

Let’s look at some major elements of a modern application development project (in no particular order):

  • User Interfaces
    • Reports
    • Dashboards
    • Operations
    • Menus & navigation
    • Analysis (Business Intelligence)
    • User experience
    • Multimedia
    • Multiple targets (desktop, browser, phone, devices)
    • Multiple screen sizes
    • Graphic design
  • Data
    • Operational
    • Logs
    • Warehouse
    • Metadata
    • Indexes
    • Views
    • Stored procedures
    • Operational procedures
    • Legacy data
    • External data sources
    • Maintenance
  • Extract, transform, load (legacy data and data warehousing)
    • Tools
    • Validation
    • Logging
  • Services (Middle Tier)
    • Data access
    • Authentication
    • Navigation
    • Business rules
    • Communications
    • Integration
    • Service bus
  • Make vs. buy decisions
    • Product evaluations
    • Cost analysis
  • Development tools
    • Deployment
    • Version control
    • Testing
    • Debugging
  • Development methodologies
    • Waterfall
    • Prototyping
    • Incremental
    • Spiral
    • RAD
    • Extreme
  • Programming languages
    • Assembler
    • C
    • C++
    • C#
    • F#
    • Ruby
    • PHP
    • Java
    • JavaScript
    • XSL
    • HTML
  • Project management
    • Tools
    • Measurements
  • Documentation
    • Help files
    • Code
    • Procedures
    • Promotional materials
  • Users
    • Requirements
    • Training
  • Legal and regulatory requirements
  • Target hardware infrastructure

… and these aren’t all of the things that require major decisions that impact the success or failure of a software project. Probably the most important is staffing, which isn’t on the list above. Finding the “best” approach is a daunting process. There are many products, services, and techniques available, and they continue to emerge and evolve. Keeping up with all that is readily available is daunting too.

(I want to blog about this stuff while my company works through the process of developing InfoTrail modules.)

The software factory approach reduces the complexity and cycle time of software development by assembling applications using standardization (patterns, models, templates, and frameworks), modularization, and code generation. This is just automation applied to software development. It is similar to what has happened to the production of injection molds. The old process involved hand-making drawings of the part to be molded, then hand-making drawings of the mold, and then giving the drawings to a mold maker who made the parts of the mold with a milling machine. The more modern approach is to design the part with a computer-aided design program. Prototypes are then made with a 3-D printer. When the design is finalized, the part design file is imported into a mold designing program. The mold design file is then loaded into a computer-controlled electrical discharge machining tool that makes the mold.

Software factories, like manufacturing facilities, can be very specialized, or they can be general-purpose, like a custom manufacturer that simply assembles pre-made components.

So what’s a module?

At the lowest level, a software module could be the software equivalent of a nut or bolt in manufacturing. This could be a single file (.dll, text, .exe). But at the highest level it could be the software equivalent of an entire manufacturing facility with all the tools, people, and machines. This could be a group of web sites, web services, applications, and systems. The key is that the modules must be designed to connect (communicate) with other modules so that they can become components of even bigger systems.

The purpose of the InfoTrail product line is to allow the assembly of personal or business information systems from modules that manage data associated with common entities. InfoTrail modules encapsulate a basic business or personal entity including user interfaces, middle tier services and components, and data storage. They are designed to be discoverable, configurable, extensible, and scalable. They also are intended to be self-documenting in a way that is discoverable, configurable, extensible, and scalable because of the software factory methods and components that are used to manufacture the code.

Here is a list of major entities that make up the foundation of InfoTrail (in alphabetical order):

  • Account (Transactions)
  • Agreement
  • Budget
  • Calendar (Events)
  • Campaign
  • Claim
  • Contact
  • Document
  • Facility
  • Location
  • Navigation (Menus)
  • Order
  • Product
  • Quote (Requests for Quote)
  • Report
  • Requirement
  • Rule
  • Shipment
  • Task

User interface shells provide a view into which the various user interface components can be placed. They are designed to be specific to the target device.

Posted in InfoTrail