Having worked on the data ingestion side of IoT for six months with IoT Hub, what I have understood is that a few strategies should be applied during data ingress to keep it smooth and meaningful.
- Understanding the frequency of data ingress
If you look at the tiers Azure itself offers for data ingress, the total number of messages per day and the size of each message packet determine which of the 4 available you choose. The more messages there are, the more difficult it is to stuff everything into a database and then parse it to make it meaningful. The ideal solution is to remove data that is off the chart, or to send data to specific repositories, at the outset itself. For example, data from event 1 is processed and moved to datastore 1, and data from event 2 is moved to datastore 2.
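To make the filter-and-route idea concrete, here is a minimal sketch in plain Python. It does not use any IoT Hub API; the valid range, the event names, the store names and the `route` function are all hypothetical and only illustrate dropping off-the-chart readings and splitting the rest by event type before anything hits a database.

```python
from collections import defaultdict

# Hypothetical plausible range for a reading; anything outside is "off the chart".
VALID_RANGE = (-40.0, 125.0)

# Hypothetical mapping from event type to destination store.
ROUTES = {"event1": "datastore1", "event2": "datastore2"}


def route(messages):
    """Drop out-of-range readings and group the rest by destination store."""
    stores = defaultdict(list)
    low, high = VALID_RANGE
    for msg in messages:
        if not low <= msg["value"] <= high:
            continue  # discard before it ever reaches storage
        destination = ROUTES.get(msg["event"], "unrouted")
        stores[destination].append(msg)
    return dict(stores)


messages = [
    {"event": "event1", "value": 21.5},
    {"event": "event2", "value": 19.0},
    {"event": "event1", "value": 900.0},  # sensor glitch, filtered out
]
routed = route(messages)
```

With the sample messages above, `routed` holds one reading for `datastore1` and one for `datastore2`, and the glitch never gets stored.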
- Annotating data with metadata
Now that your IoT devices have collected billions of data points and you have stored them in some format, would you search through entire tables afterwards to build relationships and history? I guess you are better off adding metadata on the fly, as the data comes in.
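As a rough sketch of what adding metadata on the fly could look like, the snippet below attaches context to each reading at ingest time. The `DEVICE_REGISTRY` lookup, the field names and the `annotate` function are assumptions made up for illustration, not part of any particular platform.

```python
from datetime import datetime, timezone

# Hypothetical device registry; in practice this could be any lookup table.
DEVICE_REGISTRY = {
    "sensor-17": {"site": "plant-a", "line": "line-3", "unit": "celsius"},
}


def annotate(reading, received_at=None):
    """Return a copy of the reading with its context attached."""
    meta = DEVICE_REGISTRY.get(reading["device_id"], {})
    received_at = received_at or datetime.now(timezone.utc)
    return {**reading, **meta, "received_at": received_at.isoformat()}


record = annotate(
    {"device_id": "sensor-17", "value": 21.5},
    received_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Each stored record then already carries its site, line and unit, so building the hierarchy later does not need a scan across whole tables.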
- Stream Analytics is a must
If you want to raise an alarm when there is a temperature spike or fall, or start another process when process 1 is continuously failing or not sending data, you want a stream analytics solution from whichever provider is in your ecosystem, be it Azure, Amazon, Google, IBM or the like. No one wants to build analytics and then wait 10 minutes for what should be an instant alert.
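The kind of checks a stream analytics job runs can be sketched in a few lines of plain Python. This is not any vendor's query language or API; `SpikeDetector`, `is_silent`, the window size, the threshold and the timeout are hypothetical names and values chosen for illustration. The first check compares each reading against the average of the last few readings, the second flags a device that has gone quiet.

```python
from collections import deque


class SpikeDetector:
    """Flag a reading that deviates from the recent moving average."""

    def __init__(self, window=5, threshold=10.0):
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Return True if value is a spike or fall relative to the window."""
        alarm = False
        # Only judge once the window is full, so early readings cannot alarm.
        if len(self.recent) == self.recent.maxlen:
            mean = sum(self.recent) / len(self.recent)
            alarm = abs(value - mean) > self.threshold
        self.recent.append(value)
        return alarm


def is_silent(last_seen, now, timeout=60.0):
    """Return True if no message arrived within `timeout` seconds."""
    return now - last_seen > timeout


detector = SpikeDetector(window=3, threshold=5.0)
alarms = [detector.observe(v) for v in [20.0, 21.0, 20.5, 35.0, 21.0]]
```

Here only the fourth reading (35.0) raises an alarm. The point is that the decision is made per message as it arrives, not after a batch job has run.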
(An article in KDnuggets helped me verify that my understanding is in tune with what industry and academia think about data ingress in general.)