Big Data Archives - AppFerret


The Difference Between Artificial Intelligence, Machine Learning, and Deep Learning

2018-04-08

Simple explanations of Artificial Intelligence, Machine Learning, and Deep Learning and how they’re all different. Plus, how AI and IoT are inextricably connected.

We’re all familiar with the term “Artificial Intelligence.” After all, it’s been a popular focus in movies such as The Terminator, The Matrix, and Ex Machina (a personal favorite of mine). But you may have recently been hearing about other terms like “Machine Learning” and “Deep Learning,” sometimes used interchangeably with artificial intelligence. As a result, the difference between artificial intelligence, machine learning, and deep learning can be very unclear.

I’ll begin by giving a quick explanation of what Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) actually mean and how they’re different. Then, I’ll share how AI and the Internet of Things are inextricably intertwined, with several technological advances all converging at once to set the foundation for an AI and IoT explosion.

So what’s the difference between AI, ML, and DL?

First coined in 1956 by John McCarthy, AI involves machines that can perform tasks that are characteristic of human intelligence. While this is rather general, it includes things like planning, understanding language, recognizing objects and sounds, learning, and problem solving.

We can put AI in two categories, general and narrow. General AI would have all of the characteristics of human intelligence, including the capacities mentioned above. Narrow AI exhibits some facet(s) of human intelligence, and can do that facet extremely well, but is lacking in other areas. A machine that’s great at recognizing images, but nothing else, would be an example of narrow AI.

At its core, machine learning is simply a way of achieving AI.

Arthur Samuel coined the phrase not too long after AI, in 1959, defining it as “the ability to learn without being explicitly programmed.” You see, you can get AI without using machine learning, but this would require building millions of lines of code with complex rules and decision trees.

So instead of hard-coding software routines with specific instructions to accomplish a particular task, machine learning is a way of “training” an algorithm so that it can learn how. “Training” involves feeding huge amounts of data to the algorithm and allowing it to adjust itself and improve.

To give an example, machine learning has been used to make drastic improvements to computer vision (the ability of a machine to recognize an object in an image or video). You gather hundreds of thousands or even millions of pictures and then have humans tag them. For example, the humans might tag pictures that have a cat in them versus those that do not. Then, the algorithm tries to build a model that can tag a picture as containing a cat or not as accurately as a human can. Once the accuracy level is high enough, the machine has now “learned” what a cat looks like.
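
To make that “training” step concrete, here is a minimal, hypothetical sketch in Python using scikit-learn. The features and labels are synthetic stand-ins for real, human-tagged images, but the workflow is the same: fit a model on labeled examples, then measure how often its tags agree with the human ones.

```python
# A minimal sketch of "training" a classifier from human-labeled examples.
# The features and labels below are synthetic stand-ins for real image data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Pretend each image has been reduced to 64 numeric features, and humans
# have tagged each one: 1 = "contains a cat", 0 = "no cat".
X = rng.normal(size=(10_000, 64))
y = (X[:, :8].sum(axis=1) > 0).astype(int)  # a hidden rule the model must discover

# Hold back some examples to check how well the learned model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # the "training" step: the algorithm adjusts itself

predictions = model.predict(X_test)
print(f"Agreement with the human tags: {accuracy_score(y_test, predictions):.1%}")
```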

Deep learning is one of many approaches to machine learning. Other approaches include decision tree learning, inductive logic programming, clustering, reinforcement learning, and Bayesian networks, among others.

Deep learning was inspired by the structure and function of the brain, namely the interconnecting of many neurons. Artificial Neural Networks (ANNs) are algorithms that mimic the biological structure of the brain.

In ANNs, there are “neurons” arranged in discrete layers, with connections to the “neurons” in neighboring layers. Each layer picks out a specific feature to learn, such as curves or edges in image recognition. It’s this layering that gives deep learning its name: depth is created by using multiple layers rather than a single one.
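
As an illustration of that layering, here is a minimal, hypothetical Keras sketch (assuming TensorFlow is installed); the input size and layer widths are arbitrary placeholders, and each stacked layer is free to learn progressively more abstract features.

```python
# A minimal sketch of a "deep" network: the depth comes from stacking layers.
# The 64-feature input and the layer sizes are arbitrary placeholders.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(64,)),  # first layer: low-level patterns
    tf.keras.layers.Dense(16, activation="relu"),                     # deeper layer: more abstract features
    tf.keras.layers.Dense(1, activation="sigmoid"),                   # output: e.g. "cat" vs "not a cat"
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # prints the stack of layers that makes the network "deep"
```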

AI and IoT are Inextricably Intertwined

I think of the relationship between AI and IoT much like the relationship between the human brain and body.

Our bodies collect sensory input such as sight, sound, and touch. Our brains take that data and make sense of it, turning light into recognizable objects and turning sounds into understandable speech. Our brains then make decisions, sending signals back out to the body to command movements like picking up an object or speaking.

All of the connected sensors that make up the Internet of Things are like our bodies: they provide the raw data about what’s going on in the world. Artificial intelligence is like our brain, making sense of that data and deciding what actions to perform. And the connected devices of IoT are again like our bodies, carrying out physical actions or communicating with others.

Unleashing Each Other’s Potential

The value and the promises of both AI and IoT are being realized because of the other.

Machine learning and deep learning have led to huge leaps for AI in recent years. As mentioned above, machine learning and deep learning require massive amounts of data to work, and this data is being collected by the billions of sensors that are continuing to come online in the Internet of Things. IoT makes better AI.

Improving AI will also drive adoption of the Internet of Things, creating a virtuous cycle in which both areas will accelerate drastically. That’s because AI makes IoT useful.

On the industrial side, AI can be applied to predict when machines will need maintenance or analyze manufacturing processes to make big efficiency gains, saving millions of dollars.

On the consumer side, rather than having to adapt to technology, technology can adapt to us. Instead of clicking, typing, and searching, we can simply ask a machine for what we need. We might ask for information like the weather or for an action like preparing the house for bedtime (turning down the thermostat, locking the doors, turning off the lights, etc.).

Converging Technological Advancements Have Made this Possible

Shrinking computer chips and improved manufacturing techniques mean cheaper, more powerful sensors.

Quickly improving battery technology means those sensors can last for years without needing to be connected to a power source.

Wireless connectivity, driven by the advent of smartphones, means that data can be sent in high volume at cheap rates, allowing all those sensors to send data to the cloud.

And the birth of the cloud has allowed for virtually unlimited storage of that data and virtually infinite computational ability to process it.

Of course, there are one or two concerns about the impact of AI on our society and our future. But as advancements and adoption of both AI and IoT continue to accelerate, one thing is certain: the impact is going to be profound.

 

Original article here.



Why SQL is beating NoSQL, and what this means for the future of data

2018-03-29

After years of being left for dead, SQL today is making a comeback. How come? And what effect will this have on the data community?

Since the dawn of computing, we have been collecting exponentially growing amounts of data, constantly asking more from our data storage, processing, and analysis technology. In the past decade, this caused software developers to cast aside SQL as a relic that couldn’t scale with these growing data volumes, leading to the rise of NoSQL: MapReduce and Bigtable, Cassandra, MongoDB, and more.

Yet today SQL is resurging. All of the major cloud providers now offer popular managed relational database services: e.g., Amazon RDS, Google Cloud SQL, and Azure Database for PostgreSQL (Azure launched just this year). In Amazon’s own words, its PostgreSQL- and MySQL-compatible Aurora database product has been the “fastest growing service in the history of AWS”. SQL interfaces on top of Hadoop and Spark continue to thrive. And just last month, Kafka launched SQL support. Your humble authors themselves are developers of a new time-series database that fully embraces SQL.

In this post we examine why the pendulum today is swinging back to SQL, and what this means for the future of the data engineering and analysis community.


Part 1: A New Hope

To understand why SQL is making a comeback, let’s start with why it was designed in the first place.

Our story starts at IBM Research in the early 1970s, where the relational database was born. At that time, query languages relied on complex mathematical logic and notation. Two newly minted PhDs, Donald Chamberlin and Raymond Boyce, were impressed by the relational data model but saw that the query language would be a major bottleneck to adoption. They set out to design a new query language that would be (in their own words): “more accessible to users without formal training in mathematics or computer programming.”
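
As a small illustration of the accessibility they were after (my own example, not from the original post), here is a self-contained snippet using Python's built-in sqlite3 module; the table and rows are made up, but the declarative query reads close to plain English.

```python
# A self-contained illustration of SQL's readability, using Python's built-in sqlite3.
# The table and rows are invented purely for demonstration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Engineering", 120000), ("Grace", "Engineering", 125000), ("Linus", "Support", 90000)],
)

# No relational calculus required: the query states *what* you want, not *how* to get it.
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees GROUP BY department ORDER BY department"
).fetchall()
print(rows)  # [('Engineering', 122500.0), ('Support', 90000.0)]
```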

Read the Full article here.

 



70 Amazing Free Data Sources You Should Know

2017-12-20

70 free data sources for 2017 on government, crime, health, financial and economic data, marketing and social media, journalism and media, real estate, company directory and review, and more to start working on your data projects.

Every great data visualization starts with good and clean data. Most people believe that collecting big data is a tough job, but it’s simply not true. There are thousands of free data sets available online, ready to be analyzed and visualized by anyone. Here we’ve rounded up 70 free data sources for 2017 on government, crime, health, financial and economic data, marketing and social media, journalism and media, real estate, company directory and review, and more.

We hope you enjoy this list and save a lot of time and energy otherwise spent searching blindly online.

Free Data Source: Government

  1. Data.gov: The first port of call, acting as a portal to all sorts of amazing information on everything from climate to crime, published freely by the US Government.
  2. Data.gov.uk: There are datasets from all UK central departments and a number of other public sector and local authorities. It acts as a portal to all sorts of information on everything, including business and economy, crime and justice, defence, education, environment, government, health, society and transportation.
  3. U.S. Census Bureau: The website provides government statistics on the lives of US citizens, including population, economy, education, geography, and more.
  4. The CIA World Factbook: Facts on every country in the world; focuses on history, government, population, economy, energy, geography, communications, transportation, military, and transnational issues of 267 countries.
  5. Socrata: Socrata is a mission-driven software company and another interesting place to explore government-related data, with some visualization tools built in. Its data-as-a-service offering has been adopted by more than 1,200 government agencies for open data, performance management and data-driven government.
  6. European Union Open Data Portal: It is the single point of access to a growing range of data from the institutions and other bodies of the European Union. The data covers economic development within the EU and transparency within the EU institutions, including geographic, geopolitical and financial data, statistics, election results, legal acts, and data on crime, health, the environment, transport and scientific research. It can be reused in different databases and reports, and a variety of digital formats are available from the EU institutions and other EU bodies. The portal provides a standardised catalogue, a list of apps and web tools reusing these data, a SPARQL endpoint query editor, REST API access, and tips on how to make the best use of the site.
  7. Canada Open Data: A pilot project with many government and geospatial datasets. It can help you explore how the Government of Canada creates greater transparency and accountability, increases citizen engagement, and drives innovation and economic opportunities through open data, open information, and open dialogue.
  8. Datacatalogs.org: It offers open government data from the US, the EU, Canada, CKAN, and more.
  9. U.S. National Center for Education Statistics: The National Center for Education Statistics (NCES) is the primary federal entity for collecting and analyzing data related to education in the U.S. and other nations.
  10. UK Data Service: The UK Data Service collection includes major UK government-sponsored surveys, cross-national surveys, longitudinal studies, UK census data, international aggregate, business data, and qualitative data.

Free Data Source: Crime

  1. Uniform Crime Reporting: The UCR Program has been the starting place for law enforcement executives, students, researchers, members of the media, and the public seeking information on crime in the US.
  2. FBI Crime Statistics: Statistical crime reports and publications detailing specific offenses and outlining trends to understand crime threats at both local and national levels.
  3. Bureau of Justice Statistics: Information on anything related to U.S. justice system, including arrest-related deaths, census of jail inmates, national survey of DNA crime labs, surveys of law enforcement gang units, etc.
  4. National Sex Offender Search: It is an unprecedented public safety resource that provides the public with access to sex offender data nationwide. It presents the most up-to-date information as provided by each Jurisdiction.

Free Data Source: Health

  1. U.S. Food & Drug Administration: Here you will find a compressed data file of the Drugs@FDA database. Drugs@FDA, is updated daily, this data file is updated once per week, on Tuesday.
  2. UNICEF: UNICEF gathers evidence on the situation of children and women around the world. The data sets include accurate, nationally representative data from household surveys and other sources.
  3. World Health Organisation: Statistics concerning nutrition, disease and health in more than 150 countries.
  4. Healthdata.gov: 125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics.
  5. NHS Health and Social Care Information Centre: Health data sets from the UK National Health Service. The organization produces more than 260 official and national statistical publications. This includes national comparative data for secondary uses, developed from the long-running Hospital Episode Statistics which can help local decision makers to improve the quality and efficiency of frontline care.

Free Data Source: Financial and Economic Data

  1. World Bank Open Data: Education statistics about everything from finances to service delivery indicators around the world.
  2. IMF Economic Data: An incredibly useful source of information that includes global financial stability reports, regional economic reports, international financial statistics, exchange rates, directions of trade, and more.
  3. UN Comtrade Database: Free access to detailed global trade data with visualizations. UN Comtrade is a repository of official international trade statistics and relevant analytical tables. All data is accessible through API.
  4. Global Financial Data: With data on over 60,000 companies covering 300 years, Global Financial Data offers a unique source to analyze the twists and turns of the global economy.
  5. Google Finance: Real-time stock quotes and charts, financial news, currency conversions, or tracked portfolios.
  6. Google Public Data Explorer: Google’s Public Data Explorer provides public data and forecasts from a range of international organizations and academic institutions including the World Bank, OECD, Eurostat and the University of Denver. These can be displayed as line graphs, bar graphs, cross sectional plots or on maps.
  7. U.S. Bureau of Economic Analysis: U.S. official macroeconomic and industry statistics, most notably reports about the gross domestic product (GDP) of the United States and its various units. They also provide information about personal income, corporate profits, and government spending in their National Income and Product Accounts (NIPAs).
  8. Financial Data Finder at OSU: Plentiful links to anything related to finance, no matter how obscure, including World Development Indicators Online, World Bank Open Data, Global Financial Data, International Monetary Fund Statistical Databases, and EMIS Intelligence.
  9. National Bureau of Economic Research: Macro data, industry data, productivity data, trade data, international finance data, and more.
  10. U.S. Securities and Exchange Commission: Quarterly datasets of extracted information from exhibits to corporate financial reports filed with the Commission.
  11. Visualizing Economics: Data visualizations about the economy.
  12. Financial Times: The Financial Times provides a broad range of information, news and services for the global business community.

Free Data Source: Marketing and Social Media

  1. Amazon API: Browse Amazon Web Services’ Public Data Sets by category for a huge wealth of information. Amazon API Gateway allows developers to securely connect mobile and web applications to APIs that run on AWS Lambda, Amazon EC2, or other publicly addressable web services that are hosted outside of AWS.
  2. American Society of Travel Agents: ASTA is the world’s largest association of travel professionals. It provides members information including travel agents and the companies whose products they sell such as tours, cruises, hotels, car rentals, etc.
  3. Social Mention: Social Mention is a social media search and analysis platform that aggregates user-generated content from across the universe into a single stream of information.
  4. Google Trends: Google Trends shows how often a particular search-term is entered relative to the total search-volume across various regions of the world in various languages.
  5. Facebook API: Learn how to publish to and retrieve data from Facebook using the Graph API.
  6. Twitter API: The Twitter Platform connects your website or application with the worldwide conversation happening on Twitter.
  7. Instagram API: The Instagram API Platform can be used to build non-automated, authentic, high-quality apps and services.
  8. Foursquare API: The Foursquare API gives you access to our world-class places database and the ability to interact with Foursquare users and merchants.
  9. HubSpot: A large repository of marketing data. You could find the latest marketing stats and trends here. It also provides tools for social media marketing, content management, web analytics, landing pages and search engine optimization.
  10. Moz: Insights on SEO that includes keyword research, link building, site audits, and page optimization insights in order to help companies to have a better view of the position they have on search engines and how to improve their ranking.
  11. Content Marketing Institute: The latest news, studies, and research on content marketing.

Free Data Source: Journalism and Media

  1. The New York Times Developer Network: Search Times articles from 1851 to today, retrieving headlines, abstracts and links to associated multimedia. You can also search book reviews, NYC event listings, movie reviews, top stories with images and more.
  2. Associated Press API: The AP Content API allows you to search and download content using your own editorial tools, without having to visit AP portals. It provides access to images from AP-owned, member-owned and third-party sources, and to videos produced by AP and selected third parties.
  3. Google Books Ngram Viewer: It is an online search engine that charts frequencies of any set of comma-delimited search strings using a yearly count of n-grams found in sources printed between 1500 and 2008 in Google’s text corpora.
  4. Wikipedia Database: Wikipedia offers free copies of all available content to interested users.
  5. FiveThirtyEight: It is a website that focuses on opinion poll analysis, politics, economics, and sports blogging. The data and code behind its stories and interactives are available on GitHub.
  6. Google Scholar: Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. It includes most peer-reviewed online academic journals and books, conference papers, theses and dissertations, preprints, abstracts, technical reports, and other scholarly literature, including court opinions and patents.

Free Data Source: Real Estate

  1. Castles: Castles is a successful, privately owned independent estate agency. Established in 1981, it offers a comprehensive service incorporating residential sales, lettings and management, and surveys and valuations.
  2. RealEstate.com: RealEstate.com serves as the ultimate resource for first-time home buyers, offering easy-to-understand tools and expert advice at every stage in the process.
  3. Gumtree: Gumtree is the first site for free classified ads in the UK. Buying and selling items, cars and properties, and finding or offering jobs in your area are all available on the website.
  4. James Hayward: It provides an innovative database approach to residential sales, lettings & management.
  5. Lifull Homes: Japan’s property website.
  6. Immobiliare.it: Italy’s property website.
  7. Subito: Italy’s property website.
  8. Immoweb: Belgium’s leading property website.

Free Data Source: Business Directory and Review

  1. LinkedIn: LinkedIn is a business- and employment-oriented social networking service that operates via websites and mobile apps. It has 500 million members in 200 countries, and you can find its business directory here.
  2. OpenCorporates: OpenCorporates is the largest open database of companies and company data in the world, with in excess of 100 million companies in a similarly large number of jurisdictions. Its primary goal is to make information on companies more usable and more widely available for the public benefit, particularly to tackle the use of companies for criminal or anti-social purposes, for example corruption, money laundering and organised crime.
  3. Yellowpages: The original source to find and connect with local plumbers, handymen, mechanics, attorneys, dentists, and more.
  4. Craigslist: Craigslist is an American classified advertisements website with sections devoted to jobs, housing, personals, for sale, items wanted, services, community, gigs, résumés, and discussion forums.
  5. GAF Master Elite Contractor: Founded in 1886, GAF has become North America’s largest manufacturer of commercial and residential roofing (source: Freedonia Group study), growing to nearly $3 billion in sales through a relentless pursuit of quality combined with industry-leading expertise and comprehensive roofing solutions. Jim Schnepper is the President of GAF, an operating subsidiary of Standard Industries.
  6. CertainTeed: You could find contractors, remodelers, installers or builders in the US or Canada on your residential or commercial project here.
  7. Companies in California: All information about companies in California.
  8. Manta: Manta is one of the largest online resources that deliver products, services and educational opportunities. The Manta directory boasts millions of unique visitors every month who search its comprehensive database for individual businesses, industry segments and geography-specific listings.
  9. EU-Startups: Directory about startups in EU.
  10. Kansas Bar Association: Directory for lawyers. The Kansas Bar Association (KBA) was founded in 1882 as a voluntary association for dedicated legal professionals and has more than 7,000 members, including lawyers, judges, law students, and paralegals.

Free Data Source: Other Portal Websites

  1. Capterra: Directory about business software and reviews.
  2. Monster: Data source for jobs and career opportunities.
  3. Glassdoor: Directory about jobs and the inside scoop on companies, with employee reviews, personalized salary tools, and more.
  4. The Good Garage Scheme: Directory about car service, MOT or car repair.
  5. OSMOZ: Information about fragrance.
  6. Octoparse: A free data extraction tool to collect all the web data mentioned above online.

Do you know some great data sources? Contact us to let us know and help us share the data love.

More Related Sources:

Top 30 Big Data Tools for Data Analysis

Top 30 Free Web Scraping Software

 

Original article here.

 



Google Launches Public Beta of Cloud Dataprep

2017-09-24

Google recently announced that Google Cloud Dataprep—the new managed data  wrangling service developed in collaboration with Trifacta—is now available in public beta. This service enables analysts and data scientists to visually explore and prepare data for analysis in seconds within the Google Cloud Platform.

Now that the Google Cloud Dataprep beta is open to the public, more companies can experience the benefits of Trifacta’s data preparation platform. From predictive transformation to interactive exploration, Trifacta’s  intuitive workflow has accelerated the data preparation process for Google Cloud Dataprep customers who have tried it out within private beta.

In addition to the same functionality found in Trifacta, Google Cloud Dataprep users also benefit from features that are unique to the collaboration with Google:

True SaaS offering 
With Google Cloud Dataprep, there’s no software to install or manage. Unlike a marketplace offering that deploys into the Google ecosystem, Cloud Dataprep is a fully managed service that does not require configuration or administration.

Single Sign On through Google Cloud Identity & Access Management
All users can easily access Cloud Dataprep using the same login credentials they already use for any other Google service. This ensures highly secure and consistent access to Google services and data based on the permissions and roles defined through Google IAM.

Integration to Google Cloud Storage and Google BigQuery (read & write)
Users can browse, preview, and import data from, and publish results to, Google Cloud Storage and Google BigQuery directly through Cloud Dataprep. This is a huge boon for the teams that rely upon Google-generated data. For example:

  • Marketing teams leveraging DoubleClick Ads data can make that data available in Google BigQuery, then use Cloud Dataprep to prepare and publish the result back into BigQuery for downstream analysis. Learn more here.
  • Telematics data scientists can connect Cloud Dataprep directly to raw log data (often in JSON format) stored on Google Cloud Storage, and then prepare it for machine learning models executed in TensorFlow.
  • Retail business analysts can upload Excel data from their desktop to Google Cloud Storage, parse and combine it with BigQuery data to augment the results (beyond the limits of Excel), and eventually make the data available to various analytic tools like Google Data Studio, Looker, Tableau, Qlik or Zoomdata.

Big data scale provided by Cloud Dataflow 
By leveraging a serverless, auto-scaling data processing engine (Google Cloud Dataflow), Cloud Dataprep can handle any size of data, located anywhere in the world. This means that users don’t have to worry about optimizing their logic as their data grows, nor have to choose where their jobs run. At the same time, IT can rely on Cloud Dataflow to efficiently scale resources only as needed. Finally, it allows for enterprise-grade monitoring and logging in Google Stackdriver.

World-class Google support
As a Google service, Cloud Dataprep is subject to the same standards as other Google beta products and services. These benefits include:

  • World class uptime and availability around the world
  • Official support provided by Google Cloud Platform
  • Centralized usage-based billing managed on a per project basis with quotas and detailed reports

Early Google Cloud Dataprep Customer Feedback

Although Cloud Dataprep has only been in private beta for a short amount of time, we’ve had substantial participation from thousands of early private beta users and we’re excited to share some of the great feedback. Here’s a sample of what early users are saying:

Merkle Inc. 
“Cloud Dataprep allows us to quickly view and understand new datasets, and its flexibility supports our data transformation needs. The GUI is nicely designed, so the learning curve is minimal. Our initial data preparation work is now completed in minutes, not hours or days,” says Henry Culver, IT Architect at Merkle. “The ability to rapidly see our data, and to be offered transformation suggestions in data delivery, is a huge help to us as we look to rapidly assimilate new datasets.”

Venture Development Center

“We needed a platform that was versatile, easy to utilize and provided a migration path as our needs for data review, evaluation, hygiene, interlinking and analysis advanced. We immediately knew that Google Cloud Platform, with Cloud Dataprep and BigQuery, were exactly what we were looking for. As we develop our capability and movement into the data cataloging, QA and delivery cycle, Cloud Dataprep allows us to accomplish this quickly and adeptly,” says Matthew W. Staudt, President of Venture Development Center.

For more information on these customers check out Google’s blog on the public beta launch here.

Cloud Dataprep Public Beta: Furthering Wrangling Success

Now that the beta version of Google Cloud Dataprep is open to the public, we’re excited to see more organizations achieve data wrangling success. From multinational banks to consumer retail companies to government agencies, there’s a growing number of customers using Trifacta’s consistent transformation logic, user experience, workflow, metadata management, and comprehensive data governance to reduce data preparation times and improve data quality.

If you’re interested in Google Cloud Dataprep, you can sign up with your own personal account for free access or log in using your company’s existing Google account. Visit cloud.google.com/dataprep to learn more.

For more information about how Trifacta interoperates with cloud providers like Google Cloud and with on-prem infrastructure, download our brief.

 

Original article here.

 



Big Data Analytics in Healthcare: Fuelled by Wearables and Apps

2017-07-11

Driven by specialised analytics systems and software, big data analytics has cut the time required to double medical knowledge in half, thus compressing the healthcare innovation cycle, shows the much-discussed Mary Meeker study titled Internet Trends 2017.

The presentation of the study is seen as evidence of the proverbial big data-enabled revolution that was predicted by experts like McKinsey and Company. “A big data revolution is under way in health care. Over the last decade pharmaceutical companies have been aggregating years of research and development data into medical data bases, while payors and providers have digitised their patient records,” the McKinsey report had said four years ago.

The Mary Meeker study shows that in the 1980s it took seven years to double medical knowledge, a figure that has dropped to only 3.5 years after 2010 on account of the massive use of big data analytics in healthcare. Though most of the samples used in the study were US based, the global trends revealed in it are well visible in India too.

“Medicine and underlying biology is now becoming a data-driven science where large amounts of structured and unstructured data relating to biological systems and human health is being generated,” says Dr Rohit Gupta of MedGenome, a genomics driven research and diagnostics company based in Bengaluru.

Dr Gupta told Firstpost that big data analytics has made it possible for MedGenome, which focuses on improving global health by decoding genetic information contained in an individual genome, to dive deeper into genetics research.

“While any individual’s genome information is useful for detecting the known mutations for diseases, underlying new patterns of complicated diseases and their progression requires genomics data from many individuals across populations — sometimes several thousands to even few millions amounting to exabytes of information,” he said.

All of which would have been a cumbersome process without the latest data analytics tools that big data analytics has brought forth.

The company, which started work in 2015 on building India-specific baseline data to develop more accurate gene-based diagnostic testing kits, now conducts 400 genetic tests across all key disease areas.

What is Big Data

According to Mitali Mukerji, senior principal scientist at the Council of Scientific and Industrial Research, when a large number of people and institutions digitally record health data, either in health apps or in digitised clinics, this information becomes big data about health. The data acquired from these sources can be analysed to search for patterns or trends, enabling deeper insight into health conditions for early, actionable interventions.

Big data is growing bigger
But big data analytics requires big data. And the proliferation of information technology in the health sector has increased the flow of big data exponentially from sources such as dedicated wearable health gadgets like fitness trackers and hospital databases. Big data collection in the health sector has also been made possible by the proliferation of smartphones and health apps.

The Meeker study shows that downloads of health apps increased worldwide to nearly 1,200 million in 2016 from nearly 1,150 million the year before; 36 percent of these apps belong to the fitness category and 24 percent to diseases and treatment.

Health apps help users monitor their health. From watching calorie intake to fitness training, the apps have every assistance required to maintain one’s health. 7 Minute Workout, a health app with three million users, helps one get a flat tummy, lose weight and strengthen the core with 12 different exercises. Fooducate, another app, helps keep track of what one eats. This app not only counts the calories one is consuming, but also shows the user a detailed breakdown of the nutrition present in a packaged food.

For Indian users, there’s Healthifyme, which comes with a comprehensive database of more than 20,000 Indian foods. It also offers an on-demand fitness trainer, yoga instructor and dietician. With this app, one can set goals to lose weight and track their food and activity. There are also companies like GOQii, which provide Indian customers with subscription-based health and fitness services on their smartphones using fitness trackers that come free.

Dr Gupta of MedGenome explains that data accumulated in wearable devices can either be sent directly to the healthcare provider for any possible intervention or even predict possible hospitalisation in the next few days.

The Meeker study shows that global shipment of wearable gadgets grew from 26 million in 2014 to 102 million in 2016.

Another area that has shown growth is electronic health records. In the US, adoption of electronic health records among office-based physicians has soared from 21 percent in 2004 to 87 percent in 2015. In fact, every US hospital with 500 beds generates 50 petabytes of health data.

Back home, the Ministry of Electronics and Information Technology, Government of India, runs the Aadhaar-based Online Registration System, a platform to help patients book appointments in major government hospitals. The portal has the potential to emerge as a source of big data offering insights on diseases, age groups, shortcomings in hospitals and areas to improve. The website claims to have already been used to make 8,77,054 appointments to date across 118 hospitals.

On account of the permeation of digital technology in health care, health data has recorded 48 percent growth year on year, the Meeker study says. The accumulated mass of data, according to it, has provided deeper insights into health conditions. The study shows a drastic increase in citations, from 5 million in 1977 to 27 million in 2017. Easy access to big data has ensured that scientists can now direct their investigations following patterns analysed from such information, and less time is required to arrive at conclusions.

“If a researcher has huge sets of data at his disposal, he/she can also find out patterns and simulate it through machine learning tools, which decreases the time required to arrive at a conclusion. Machine learning methods become more robust when they are fed with results analysed from big data,” says Mukerji.

She further adds, “These data simulation models rely on primary information generated from a study to build predictive models that can help assess how the human body would respond to a given perturbation.”

The Meeker study also shows that the Archimedes data simulation model can conduct clinical trials on data related to 50,000 patients collected over a period of 30 years in a span of just two months. In the absence of this model, it took seven years to conduct clinical trials on data related to 2,838 patients collected over a period of seven years.

As per this report, in 2016 the results of 25,400 clinical trials were publicly available, against 1,900 in 2009.

The study also shows that data simulation models used by laboratories have drastically decreased the time required for clinical trials. With the emergence of big data, the number of publicly available clinical trials has also risen, it adds.

Big data in scientific research

The developments around big data in healthcare have broken silos in scientific research. For example, the field of genomics has taken giant strides towards personalised and genetic medicine with the help of big data.

A good example of how big data analytics can help modern medicine is the Human Genome Project: it, along with the innumerable studies on genetics that paved the way for personalised medicine, would have been difficult without the democratisation of data, which is another boon of big data analytics. The study shows that in 2008 there were only 5 personalised medicines available, a number that had risen to 132 by 2016.

In India, a Bangalore-based integrated biotech company recently launched ‘Avestagenome’, a project to build a complete genetic, genealogical and medical database of the Parsi community. Avestha Gengraine Technologies (Avesthagen), which launched the project believes that the results from the Parsi genome project could result in disease prediction and accelerate the development of new therapies and diagnostics both within the community as well as outside.

MedGenome has also been working in the same direction. “We collaborate with leading hospitals and research institutions to collect samples with research consent, generate sequencing data in our labs and analyse it along with clinical data to discover new mutations and disease causing perturbations in genes or functional pathways. The resultant disease models and their predictions will become more accurate as and when more data becomes available.”

Mukerji says that democratisation of data fuelled by proliferation of technology and big data has also democratised scientific research across geographical boundaries. “Since data has been made easily accessible, any laboratory can now proceed with research,” says Mukerji.

“We only need to ensure that our efforts and resources are put in the right direction,” she adds.

Challenges with big data

But Dr Gupta warns that big data in itself does not guarantee reliability, for collecting quality data is a difficult task.

Moreover, he said, “In medicine and clinical genomics, domain knowledge often helps and is almost essential to not only understand but also finding ways to effectively use the knowledge derived from the data and bring meaningful insights from it.”

Besides, big data gathering is heavily dependent on the adoption of digital health solutions, which further restricts the data to certain age groups. As per the Meeker report, 40 percent of millennial respondents covered in the study owned a wearable. On the other hand, 26 percent and 10 percent of Generation X and baby boomers, respectively, owned wearables.

Similarly, 48 percent of millennials, 38 percent of Generation X and 23 percent of baby boomers go online to find a physician. The report also shows that 10 percent of people using telemedicine and wearables proved themselves super-adopters of the new healthcare technology in 2016, as compared to 2 percent in 2015.

Every technology brings its own challenges. With big data analytics, secure storage and collection of data without violating the privacy of research subjects is an added challenge, something even the Meeker study does not answer.

“Digital world is really scary,” says Mukerji.

“Though we try to secure our data with passwords in our devices, but someone somewhere has always access to it,” she says.

The health apps downloaded onto mobile phones often become a source of big data not only for the company that produced them but also for other agencies hunting for data on the internet. “We often click various options while browsing internet and thus knowingly or unknowingly give a third party access to some data stored in the device or in the health app,” she adds.

Dimiter V Dimitrov, a health expert, makes similar assertions in his report, ‘Medical Internet of Things and Big Data in Healthcare’. He reports that wearables often have a server they interact with in a different language, providing it with the required information.

“Although many devices now have sensors to collect data, they often talk with the server in their own language,” he said in his report.

Even though the industry is still at a nascent stage, and privacy remains a concern, Mukerji says that agencies possessing health data can certainly share it with laboratories without disclosing patient identities.

Original article here.

 



Great R packages for data import, wrangling and visualization

2017-06-23

One of the great things about R is the thousands of packages users have written to solve specific problems in various disciplines — analyzing everything from weather or financial data to the human genome — not to mention analyzing computer security-breach data.

Some tasks are common to almost all users, though, regardless of subject area: data import, data wrangling and data visualization. The table below shows my favorite go-to packages for each of these three tasks (plus a few miscellaneous ones tossed in). The package names in the table are clickable if you want more information. To find out more about a package once you’ve installed it, type help(package = "packagename") in your R console (of course substituting the actual package name).

See original article and interactive table here.

 



New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll

2017-05-24

Python caught up with R and (barely) overtook it; Deep Learning usage surges to 32%; RapidMiner remains top general Data Science platform; Five languages of Data Science.

The 18th annual KDnuggets Software Poll again got huge participation from the analytics and data science community and vendors, attracting about 2,900 voters, almost exactly the same as last year. Here is the initial analysis, with more detailed results to be posted later.

Python, whose share has been growing faster than R for the last several years, has finally caught up with R, and (barely) overtook it, with 52.6% share vs 52.1% for R.

The biggest surprise is probably the phenomenal share of Deep Learning tools, now used by 32% of all respondents, while only 18% used DL in 2016 and 9% in 2015. Google TensorFlow rapidly became the leading Deep Learning platform with 20.2% share, up from only 6.8% in the 2016 poll, and entered the top 10 tools.

While in 2014 I wrote about Four main languages for Analytics, Data Mining, Data Science being R, Python, SQL, and SAS, the 5 main languages of Data Science in 2017 appear to be Python, R, SQL, Spark, and TensorFlow.

RapidMiner remains the most popular general platform for data mining/data science, with about 33% share, almost exactly the same as in 2016.

We note that many vendors have encouraged their users to vote, but all vendors had equal chances, so this does not violate KDnuggets guidelines. We have not seen any bot voting or direct links to vote for only one tool this year.

Spark grew to about 23% and kept its place in top 10 ahead of Hadoop.

Besides TensorFlow, another new tool in the top tier is Anaconda, with 22% share.

Top Analytics/Data Science Tools

Fig 1: KDnuggets Analytics/Data Science 2017 Software Poll: top tools in 2017, and their share in the 2015-6 polls

See original full article here.



Cloud, backup and storage devices—how best to protect your data

2017-03-31

We are producing more data than ever before, with more than 2.5 quintillion bytes produced every day, according to computer giant IBM. That’s a staggering 2,500,000,000 gigabytes of data and it’s growing fast.

We have never been so connected through smart phones, smart watches, laptops and all sorts of wearable technologies inundating today’s marketplace. There were an estimated 6.4 billion connected “things” in 2016, up 30% from the previous year.

We are also continuously sending and receiving data over our networks. This unstoppable growth is unsustainable without some kind of smartness in the way we all produce, store, share and back up data now and in the future.

In the cloud

Cloud services play an essential role in achieving sustainable data management by easing the strain on bandwidth, storage and backup solutions.

But is the cloud paving the way to better backup services or is it rendering backup itself obsolete? And what’s the trade-off in terms of data safety, and how can it be mitigated so you can safely store your data in the cloud?

The cloud is often thought of as an online backup solution that works in the background on your devices to keep your photos and documents, whether personal or work related, backed up on remote servers.

In reality, the cloud has a lot more to offer. It connects people together, helping them store and share data online and even work together online to create data collaboratively.

It also makes your data ubiquitous, so that if you lose your phone or your device fails you simply buy a new one, sign in to your cloud account and voila! – all your data are on your new device in a matter of minutes.

Do you really back up your data?

An important advantage of cloud-based backup services is also the automation and ease of use. With traditional backup solutions, such as using a separate drive, people often discover, a little too late, that they did not back up certain files.

Relying on the user to do backups is risky, so automating it is exactly where cloud backup is making a difference.

Cloud solutions have begun to evolve from online backup services to primary storage services. People are increasingly moving from storing their data on their device’s internal storage (hard drives) to storing them directly in cloud-based repositories such as DropBox, Google Drive and Microsoft’s OneDrive.

Devices such as Google’s Chromebook do not use much local storage to store your data. Instead, they are part of a new trend in which everything you produce or consume on the internet, at work or at home, would come from the cloud and be stored there too.

Recently announced cloud technologies such as Google’s Drive File Stream or Dropbox’s Smart Sync are excellent examples of how we are heading in a new direction, with less data on the device and a bigger primary storage role for the cloud.

Here is how it works. Instead of keeping local files on your device, placeholder files (sort of empty files) are used, and the actual data are kept in the cloud and downloaded back onto the device only when needed.

Edits to the files are pushed to the cloud so that no local copy is kept on your device. This drastically reduces the risk of data leaks when a device is lost or stolen.

So if your entire workspace is in the cloud, is backup no longer needed?

No. In fact, backup is more relevant than ever, as disasters can strike cloud providers themselves, with hacking and ransomware affecting cloud storage too.

Backup has always had the purpose of reducing risks using redundancy, by duplicating data across multiple locations. The same can apply to the cloud, which can be duplicated across multiple cloud locations or multiple cloud service providers.

Privacy matters

Yet beyond the disruption of the market, the number-one concern about the use of the cloud for storing user data is privacy.

Data privacy is strategically important, particularly when customer data are involved. Many privacy-related problems can happen when using the cloud.

There are concerns about the processes used by cloud providers for privacy management, which often trade privacy for convenience. There are also concerns about the technologies put in place by cloud providers to overcome privacy related issues, which are often not effective.

When it comes to technology, encryption tools protecting your sensitive data have actually been around for a long time.

Encryption works by scrambling your data with a very large digital number (called a key) that you keep secret so that only you can decrypt the data. Nobody else can decode your data without that key.

Using encryption tools to encrypt your data with your own key before transferring it into the cloud is a sensible thing to do. Some cloud providers are now offering this option and letting you choose your own key.
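
As a minimal sketch of that idea, here is a hypothetical Python example using the third-party cryptography package (an assumption; the article doesn't name a tool). The data is encrypted with your own key before it ever leaves your machine, and only someone holding that key can decrypt it.

```python
# A minimal sketch: encrypt data locally with your own key before uploading it.
# Assumes the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

# Generate a secret key once and keep it safe, not in the cloud next to the data.
key = Fernet.generate_key()
fernet = Fernet(key)

# In practice this would be the contents of a file you are about to upload.
plaintext = b"holiday photos, tax records, anything you would rather keep private"

ciphertext = fernet.encrypt(plaintext)   # this is what you would send to the cloud
recovered = fernet.decrypt(ciphertext)   # only possible with the key you kept

assert recovered == plaintext
print(ciphertext[:40], "...")            # unreadable without the key
```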

Share vs encryption

But if you store data in the cloud for the purpose of sharing it with others – and that’s often the precise reason that users choose the cloud – then you might require a process to distribute encryption keys to multiple participants.

This is where the hassle can start. People you share with would need to get the key too, in some way or another. Once you share that key, how would you revoke it later on? How would you prevent it from being re-shared without your consent?

More importantly, how would you keep using the collaboration features offered by cloud providers, such as Google Docs, while working on encrypted files?

These are the key challenges ahead for cloud users and providers. Solutions to those challenges would truly be game-changing.

 

Original article here.



10 new AWS cloud services you never expected

2017-01-27

From data scooping to facial recognition, Amazon’s latest additions give devs new, wide-ranging powers in the cloud

In the beginning, life in the cloud was simple. Type in your credit card number and—voilà—you had root on a machine you didn’t have to unpack, plug in, or bolt into a rack.

That has changed drastically. The cloud has grown so complex and multifunctional that it’s hard to jam all the activity into one word, even a word as protean and unstructured as “cloud.” There are still root logins on machines to rent, but there are also services for slicing, dicing, and storing your data. Programmers don’t need to write and install as much as subscribe and configure.

Here, Amazon has led the way. That’s not to say there isn’t competition. Microsoft, Google, IBM, Rackspace, and Joyent are all churning out brilliant solutions and clever software packages for the cloud, but no company has done more to create feature-rich bundles of services for the cloud than Amazon. Now Amazon Web Services is zooming ahead with a collection of new products that blow apart the idea of the cloud as a blank slate. With the latest round of tools for AWS, the cloud is that much closer to becoming a concierge waiting for you to wave your hand and give it simple instructions.

Here are 10 new services that show how Amazon is redefining what computing in the cloud can be.

Glue

Anyone who has done much data science knows it’s often more challenging to collect data than it is to perform analysis. Gathering data and putting it into a standard data format is often more than 90 percent of the job.

Glue is a new collection of Python scripts that automatically crawls your data sources to collect data, apply any necessary transforms, and stick it in Amazon’s cloud. It reaches into your data sources, snagging data using all the standard acronyms, like JSON, CSV, and JDBC. Once it grabs the data, it can analyze the schema and make suggestions.

The Python layer is interesting because you can use it without writing or understanding Python—although it certainly helps if you want to customize what’s going on. Glue will run these jobs as needed to keep all the data flowing. It won’t think for you, but it will juggle many of the details, leaving you to think about the big picture.
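
As a rough, hypothetical sketch of what driving Glue from code can look like, here is a boto3 example; the IAM role, database name, and S3 path are placeholders, and the exact options will vary by account.

```python
# A hypothetical sketch: point a Glue crawler at an S3 data source using boto3.
# The IAM role ARN, database name, and S3 path are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Define a crawler that scans the raw files and infers a schema for them.
glue.create_crawler(
    Name="sales-logs-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    DatabaseName="sales_catalog",                            # catalog database to populate
    Targets={"S3Targets": [{"Path": "s3://my-example-bucket/raw/sales/"}]},
)

# Kick off the crawl; Glue inspects the files (CSV, JSON, ...) and suggests table schemas.
glue.start_crawler(Name="sales-logs-crawler")

# Once the crawler finishes, the inferred tables appear in the Data Catalog.
tables = glue.get_tables(DatabaseName="sales_catalog")
print([t["Name"] for t in tables["TableList"]])
```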

FPGA

Field Programmable Gate Arrays have long been a secret weapon of hardware designers. Anyone who needs a special chip can build one out of software. There’s no need to build custom masks or fret over fitting all the transistors into the smallest amount of silicon. An FPGA takes your software description of how the transistors should work and rewires itself to act like a real chip.

Amazon’s new AWS EC2 F1 brings the power of FPGA to the cloud. If you have highly structured and repetitive computing to do, an EC2 F1 instance is for you. With EC2 F1, you can create a software description of a hypothetical chip and compile it down to a tiny number of gates that will compute the answer in the shortest amount of time. The only thing faster is etching the transistors in real silicon.

Who might need this? Bitcoin miners compute the same cryptographically secure hash function a bazillion times each day, which is why many bitcoin miners use FPGAs to speed up the search. If you have a similarly compact, repetitive algorithm that can be written into silicon, the FPGA instance lets you rent machines to do it now. The biggest winners are those who need to run calculations that don’t map easily onto standard instruction sets—for example, when you’re dealing with bit-level functions and other nonstandard, nonarithmetic calculations. If you’re simply adding a column of numbers, the standard instances are better for you. But for some, EC2 with FPGA might be a big win.

Blox

As Docker eats its way into the stack, Amazon is trying to make it easier for anyone to run Docker instances anywhere, anytime. Blox is designed to juggle the clusters of instances so that the optimum number are running—no more, no less.

Blox is event driven, so it’s a bit simpler to write the logic. You don’t need to constantly poll the machines to see what they’re running. They all report back, so the right number can run. Blox is also open source, which makes it easier to reuse Blox outside of the Amazon cloud, if you should need to do so.

X-Ray

Monitoring the efficiency and load of your instances used to be simply another job. If you wanted your cluster to work smoothly, you had to write the code to track everything. Many people brought in third parties with impressive suites of tools. Now Amazon’s X-Ray is offering to do much of the work for you. It’s competing with many third-party tools for watching your stack.

When a website gets a request for data, X-Ray traces the request as it flows through your network of machines and services. Then X-Ray will aggregate the data from multiple instances, regions, and zones so that you can stop in one place to flag a recalcitrant server or a wedged database. You can watch your vast empire with only one page.
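
For a sense of what instrumenting a service looks like, here is a rough, hypothetical sketch using the AWS X-Ray SDK for Python; the service name, function, and DynamoDB table are placeholders, and it assumes an X-Ray daemon is available to receive the traces.

```python
# A hypothetical sketch of instrumenting Python code with the AWS X-Ray SDK.
# Assumes the aws-xray-sdk package and a running X-Ray daemon to receive traces.
import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

xray_recorder.configure(service="orders-api")  # placeholder service name
patch_all()  # auto-trace supported libraries such as boto3 and requests

@xray_recorder.capture("lookup_order")  # records a subsegment for this call
def lookup_order(order_id):
    table = boto3.resource("dynamodb").Table("orders")  # placeholder table
    return table.get_item(Key={"order_id": order_id})

# Outside Lambda, segments are opened and closed explicitly.
xray_recorder.begin_segment("handle_request")
try:
    lookup_order("1234")
finally:
    xray_recorder.end_segment()
```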

Rekognition

Rekognition is a new AWS tool aimed at image work. If you want your app to do more than store images, Rekognition will chew through images searching for objects and faces using some of the best-known and tested machine vision and neural-network algorithms. There’s no need to spend years learning the science; you simply point the algorithm at an image stored in Amazon’s cloud, and voilà, you get a list of objects and a confidence score that ranks how likely the answer is correct. You pay per image.

The algorithms are heavily tuned for facial recognition. The algorithms will flag faces, then compare them to each other and to reference images to help you identify them. Your application can store the meta information about the faces for later processing. Once you put a name to the metadata, your app will find people wherever they appear. Identification is only the beginning. Is someone smiling? Are their eyes closed? The service will deliver the answer, so you don’t need to get your fingers dirty with pixels. If you want to use impressive machine vision, Amazon will charge you not by the click but by the glance at each image.
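
Here is a small, hypothetical boto3 sketch of both kinds of call described above; the bucket and image names are placeholders.

```python
# A hypothetical sketch of calling Rekognition on an image stored in S3, via boto3.
# The bucket and object names are placeholders.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
image = {"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/party.jpg"}}

# Object detection: a list of labels, each with a confidence score.
labels = rekognition.detect_labels(Image=image, MaxLabels=10)
for label in labels["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))

# Face analysis: attributes such as whether the person is smiling or has eyes closed.
faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])
for face in faces["FaceDetails"]:
    print("Smiling:", face["Smile"]["Value"], "Eyes open:", face["EyesOpen"]["Value"])
```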

Athena

Working with Amazon’s S3 has always been simple: if you want an object, you request it by key and S3 returns it. Amazon’s Athena now makes working with the data inside those objects much simpler. It runs queries directly against data stored in S3, so you don’t need to write the looping code yourself. Yes, we’ve become too lazy to write loops.

Athena uses SQL syntax, which should make database admins happy. Amazon will charge you for every byte that Athena churns through while looking for your answer. But don’t get too worried about the meter running out of control, because the price is only $5 per terabyte. That works out to roughly half a billionth of a cent per byte. It makes the penny candy stores look expensive.
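The workflow from code is: submit a SQL string, wait for the query to finish, then read the rows. A minimal sketch with the boto3 Python SDK; the database, table, and results bucket are placeholders:

import time
import boto3

athena = boto3.client("athena")

# Submit the query; Athena writes the results as CSV to the bucket you name.
query = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS n FROM web_logs GROUP BY status",  # placeholder table
    QueryExecutionContext={"Database": "my_database"},                         # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},         # placeholder bucket
)
query_id = query["QueryExecutionId"]

# Poll until the query reaches a terminal state, then print the rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])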

Lambda@Edge

The original idea of a content delivery network was to speed up the delivery of simple files like JPG images and CSS files by pushing out copies to a vast array of content servers parked near the edges of the Internet. Amazon is taking this a step further by letting us push Node.js code out to these edges, where it will run and respond. Your code won’t sit on one central server waiting for requests to poke along the backbone from people around the world. It will clone itself, so it can respond in microseconds without being impeded by all that network latency.

Amazon will bill you only for the time your code is running. You won’t need to set up separate instances or rent out full machines to keep the service up. It is currently in a closed test, and you must apply to get your code into their stack.
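The preview described here was Node.js only; Lambda@Edge has since added Python runtimes as well. Assuming that newer Python runtime, a viewer-request handler that answers a health check directly at the edge might look roughly like this (the /ping route is just an illustration):

def handler(event, context):
    # CloudFront hands the edge function a record describing the incoming request.
    request = event["Records"][0]["cf"]["request"]

    # Illustration only: answer health checks at the edge, without touching the origin.
    if request["uri"] == "/ping":
        return {"status": "200", "statusText": "OK", "body": "pong"}

    # Otherwise pass the (possibly modified) request on toward the origin.
    return request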

Snowball Edge

If you want some kind of physical control of your data, the cloud isn’t for you. The power and reassurance that comes from touching the hard drive, DVD-ROM, or thumb drive holding your data isn’t available to you in the cloud. Where is my data exactly? How can I get it? How can I make a backup copy? The cloud makes anyone who cares about these things break out in cold sweats.

The Snowball Edge is a box filled with data that can be delivered anywhere you want. It even has a shipping label that’s really an E-Ink display exactly like Amazon puts on a Kindle. When you want a copy of massive amounts of data that you’ve stored in Amazon’s cloud, Amazon will copy it to the box and ship the box to wherever you are. (The documentation doesn’t say whether Prime members get free shipping.)

Snowball Edge serves a practical purpose. Many developers have collected large blocks of data through cloud applications, and downloading these blocks across the open internet is far too slow. If Amazon wants to attract large data-processing jobs, it needs to make it easier to get large volumes of data out of the system.

If you’ve accumulated an exabyte of data that you need somewhere else for processing, Amazon has a bigger version called Snowmobile that’s built into an 18-wheel truck complete with GPS tracking.

Oh, it’s worth noting that the boxes aren’t dumb storage boxes. They can run arbitrary Node.js code too, so you can search, filter, or analyze … just in case.

Pinpoint

Once you’ve amassed a list of customers, members, or subscribers, there will be times when you want to push a message out to them. Perhaps you’ve updated your app or want to convey a special offer. You could blast an email to everyone on your list, but that’s barely a step above spam. A better solution is to target your message, and Amazon’s new Pinpoint tool offers the infrastructure to make that simpler.

You’ll need to integrate some code with your app. Once you’ve done that, Pinpoint helps you send out the messages when your users seem ready to receive them. Once you’re done with a so-called targeted campaign, Pinpoint will collect and report data about the level of engagement with your campaign, so you can tune your targeting efforts in the future.
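For a one-off direct send, the call is a single API request. A minimal sketch with the boto3 Python SDK, assuming an SMS channel is already enabled; the Pinpoint project ID and phone number are placeholders, and a real campaign would target a segment rather than one address:

import boto3

pinpoint = boto3.client("pinpoint")

pinpoint.send_messages(
    ApplicationId="0123456789abcdef0123456789abcdef",  # placeholder Pinpoint project ID
    MessageRequest={
        "Addresses": {"+15555550123": {"ChannelType": "SMS"}},  # placeholder recipient
        "MessageConfiguration": {
            "SMSMessage": {
                "Body": "Version 2.0 is out -- here is 20% off your next order.",
                "MessageType": "PROMOTIONAL",
            }
        },
    },
)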

Polly

Who gets the last word? Your app can, if you use Polly, the latest generation of speech synthesis. In goes text and out comes sound—sound waves that form words that our ears can hear, all the better to make audio interfaces for the internet of things.
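A minimal sketch with the boto3 Python SDK; the voice is one of Polly's stock voices and the output file name is a placeholder:

import boto3

polly = boto3.client("polly")

# Text in, audio out: Polly returns a stream of MP3 (or OGG/PCM) bytes.
response = polly.synthesize_speech(
    Text="Your laundry is done.",
    OutputFormat="mp3",
    VoiceId="Joanna",  # one of Polly's built-in voices
)

with open("laundry.mp3", "wb") as f:  # placeholder output path
    f.write(response["AudioStream"].read())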

Original article here.


standard

IoT + Big Data Means 92% Of Everything We Do Will Be In The Cloud

2016-12-24 - By 

You don’t need Sherlock Holmes to tell you that cloud computing is on the rise, and that cloud traffic keeps going up. However, it is enlightening to see the degree by which it is increasing, which is, in essence, about to quadruple in the next few years. By that time, 92 percent of workloads will be processed by cloud data centers, versus only 8 percent processed by traditional data centers.

Cisco, which does a decent job of measuring such things, just released estimates that show cloud traffic is likely to rise 3.7-fold by 2020, increasing from 3.9 zettabytes (ZB) per year in 2015 (the latest full year for which data is available) to 14.1 ZB per year by 2020.

Big data and the associated Internet of Things are a big part of this growth, the study’s authors state. By 2020, database, analytics and IoT workloads will account for 22% of total business workloads, compared to 20% in 2015. The total volume of data generated by IoT will reach 600 ZB per year by 2020, 275 times higher than projected traffic going from data centers to end users/devices (2.2 ZB) and 39 times higher than total projected data center traffic (15.3 ZB).

Public cloud is growing faster than private cloud, the survey also finds. By 2020, 68% (298 million) of cloud workloads will be in public cloud data centers, up from 49% (66.3 million) in 2015. During the same period, 32% (142 million) of cloud workloads will be in private cloud data centers, down from 51% (69.7 million) in 2015.

As the Cisco team explains it, much of the shift to public cloud will likely be part of hybrid cloud strategies. For example, “cloud bursting is an example of hybrid cloud where daily computing requirements are handled by a private cloud, but for sudden spurts of demand the additional traffic demand — bursting — is handled by a public cloud.”

The Cisco estimates also show that while Software as a Service (SaaS, for online applications) will keep soaring, there will be less interest in Infrastructure as a Service (IaaS, for online servers, capacity, storage). By 2020, 74% of total cloud workloads will be SaaS workloads, up from 65% today. Platform as a Service (PaaS, for development tools, databases, middleware) will also grow in absolute terms, though its share slips to eight percent of total cloud workloads, down from nine percent in 2015. IaaS workloads, meanwhile, will fall to 17% of total cloud workloads, down from 26%.

The Cisco analysts explain that the lower percentage growth for IaaS may be attributable to the growing shift away from private cloud to public cloud providers. For starters, IaaS was far less disruptive to the business — a rearrangement of data center resources, if you will. As SaaS offerings gain in sophistication, those providers may offer IaaS support behind the scenes. “In the private cloud, initial deployments were predominantly IaaS. Test and development types of cloud services were the first to be used in the enterprise; cloud was a radical change in deploying IT services, and this use was a safe and practical initial use of private cloud for enterprises. It was limited, and it did not pose a risk of disrupting the workings of IT resources in the enterprise. As trust in adoption of SaaS or mission-critical applications builds over time with technology enablement in processing power, storage advancements, memory advancements, and networking advancements, we foresee the adoption of SaaS type applications to accelerate over the forecast period, while shares of IaaS and PaaS workloads decline.”

On the consumer side, video and social networking will lead the increase in consumer workloads. By 2020, consumer cloud storage traffic per user will be 1.7 GB per month, compared to 513 MB per month in 2015. By 2020, video streaming workloads will account for 34% of total consumer workloads, compared to 29% in 2015. Social networking workloads will account for 24% of total consumer workloads, up from 20 percent in 2015. In the next four years, 59% (2.3 billion users) of the consumer Internet population will use personal cloud storage, up from 47% (1.3 billion users) in 2015.

Original article here.


standard

Gartner’s Top 10 Strategic Technology Trends for 2017

2016-12-05 - By 

Artificial intelligence, machine learning, and smart things promise an intelligent future.

Today, a digital stethoscope has the ability to record and store heartbeat and respiratory sounds. Tomorrow, the stethoscope could function as an “intelligent thing” by collecting a massive amount of such data, relating the data to diagnostic and treatment information, and building an artificial intelligence (AI)-powered doctor assistance app to provide the physician with diagnostic support in real-time. AI and machine learning increasingly will be embedded into everyday things such as appliances, speakers and hospital equipment. This phenomenon is closely aligned with the emergence of conversational systems, the expansion of the IoT into a digital mesh and the trend toward digital twins.

Three themes — intelligent, digital, and mesh — form the basis for the Top 10 strategic technology trends for 2017, announced by David Cearley, vice president and Gartner Fellow, at Gartner Symposium/ITxpo 2016 in Orlando, Florida. These technologies are just beginning to break out of an emerging state and stand to have substantial disruptive potential across industries.

Intelligent

AI and machine learning have reached a critical tipping point and will increasingly augment and extend virtually every technology-enabled service, thing or application. Creating intelligent systems that learn, adapt and potentially act autonomously, rather than simply execute predefined instructions, is the primary battleground for technology vendors through at least 2020.

Trend No. 1: AI & Advanced Machine Learning

AI and machine learning (ML), which include technologies such as deep learning, neural networks and natural-language processing, can also encompass more advanced systems that understand, learn, predict, adapt and potentially operate autonomously. Systems can learn and change future behavior, leading to the creation of more intelligent devices and programs.  The combination of extensive parallel processing power, advanced algorithms and massive data sets to feed the algorithms has unleashed this new era.

In banking, you could use AI and machine-learning techniques to model current real-time transactions, as well as to build predictive models that score transactions on their likelihood of being fraudulent. Organizations seeking to drive digital innovation with this trend should evaluate a number of business scenarios in which AI and machine learning could drive clear and specific business value and consider experimenting with one or two high-impact scenarios.

Trend No. 2: Intelligent Apps

Intelligent apps, which include technologies like virtual personal assistants (VPAs), have the potential to transform the workplace by making everyday tasks easier (prioritizing emails) and its users more effective (highlighting important content and interactions). However, intelligent apps are not limited to new digital assistants – every existing software category from security tooling to enterprise applications such as marketing or ERP will be infused with AI enabled capabilities.  Using AI, technology providers will focus on three areas — advanced analytics, AI-powered and increasingly autonomous business processes and AI-powered immersive, conversational and continuous interfaces. By 2018, Gartner expects most of the world’s largest 200 companies to exploit intelligent apps and utilize the full toolkit of big data and analytics tools to refine their offers and improve customer experience.

Trend No. 3: Intelligent Things

New intelligent things generally fall into three categories: robots, drones and autonomous vehicles. Each of these areas will evolve to impact a larger segment of the market and support a new phase of digital business, but these represent only one facet of intelligent things. Existing things, including IoT devices, will become intelligent things, delivering the power of AI-enabled systems everywhere, including the home, office, factory floor and medical facility.

As intelligent things evolve and become more popular, they will shift from a stand-alone to a collaborative model in which intelligent things communicate with one another and act in concert to accomplish tasks. However, nontechnical issues such as liability and privacy, along with the complexity of creating highly specialized assistants, will slow embedded intelligence in some scenarios.

Digital

The lines between the digital and physical world continue to blur creating new opportunities for digital businesses.  Look for the digital world to be an increasingly detailed reflection of the physical world and the digital world to appear as part of the physical world creating fertile ground for new business models and digitally enabled ecosystems.

Trend No. 4: Virtual & Augmented Reality

Virtual reality (VR) and augmented reality (AR) transform the way individuals interact with each other and with software systems, creating an immersive environment. For example, VR can be used for training scenarios and remote experiences. AR, which enables a blending of the real and virtual worlds, means businesses can overlay graphics onto real-world objects, such as hidden wires on the image of a wall. Immersive experiences with AR and VR are reaching tipping points in terms of price and capability but will not replace other interface models. Over time, AR and VR will expand beyond visual immersion to include all human senses. Enterprises should look for targeted applications of VR and AR through 2020.

Trend No. 5: Digital Twin

Within three to five years, billions of things will be represented by digital twins, a dynamic software model of a physical thing or system. Using physics data on how the components of a thing operate and respond to the environment, as well as data provided by sensors in the physical world, a digital twin can be used to analyze and simulate real-world conditions, respond to changes, improve operations and add value. Digital twins function as proxies for the combination of skilled individuals (e.g., technicians) and traditional monitoring devices and controls (e.g., pressure gauges). Their proliferation will require a cultural change, as those who understand the maintenance of real-world things collaborate with data scientists and IT professionals. Digital twins of physical assets, combined with digital representations of facilities and environments as well as people, businesses and processes, will enable an increasingly detailed digital representation of the real world for simulation, analysis and control.

Trend No. 6: Blockchain

Blockchain is a type of distributed ledger in which value exchange transactions (in bitcoin or another token) are sequentially grouped into blocks. Blockchain and distributed-ledger concepts are gaining traction because they hold the promise of transforming operating models in industries such as music distribution, identity verification and title registry. They promise a model to add trust to untrusted environments and reduce business friction by providing transparent access to the information in the chain. While there is a great deal of interest, the majority of blockchain initiatives are in alpha or beta phases, and significant technology challenges remain.

Mesh

The mesh refers to the dynamic connection of people, processes, things and services supporting intelligent digital ecosystems.  As the mesh evolves, the user experience fundamentally changes and the supporting technology and security architectures and platforms must change as well.

Trend No. 7: Conversational Systems

Conversational systems can range from simple informal, bidirectional text or voice conversations such as an answer to “What time is it?” to more complex interactions such as collecting oral testimony from crime witnesses to generate a sketch of a suspect.  Conversational systems shift from a model where people adapt to computers to one where the computer “hears” and adapts to a person’s desired outcome.  Conversational systems do not use text/voice as the exclusive interface but enable people and machines to use multiple modalities (e.g., sight, sound, tactile, etc.) to communicate across the digital device mesh (e.g., sensors, appliances, IoT systems).

Trend No. 8: Mesh App and Service Architecture

The intelligent digital mesh will require changes to the architecture, technology and tools used to develop solutions. The mesh app and service architecture (MASA) is a multichannel solution architecture that leverages cloud and serverless computing, containers and microservices as well as APIs and events to deliver modular, flexible and dynamic solutions.  Solutions ultimately support multiple users in multiple roles using multiple devices and communicating over multiple networks. However, MASA is a long term architectural shift that requires significant changes to development tooling and best practices.

Trend No. 9: Digital Technology Platforms

Digital technology platforms are the building blocks for a digital business and are necessary to break into digital. Every organization will have some mix of five digital technology platforms: Information systems, customer experience, analytics and intelligence, the Internet of Things and business ecosystems. In particular new platforms and services for IoT, AI and conversational systems will be a key focus through 2020.   Companies should identify how industry platforms will evolve and plan ways to evolve their platforms to meet the challenges of digital business.

Trend No. 10: Adaptive Security Architecture

The evolution of the intelligent digital mesh and digital technology platforms and application architectures means that security has to become fluid and adaptive. Security in the IoT environment is particularly challenging. Security teams need to work with application, solution and enterprise architects to consider security early in the design of applications or IoT solutions.  Multilayered security and use of user and entity behavior analytics will become a requirement for virtually every enterprise.

Original article here.


standard

How to profit from the IoT: 4 quick successes and 4 bigger ideas

2016-11-24 - By 

During the past few years, much has been made of the billions of sensors, cameras, and other devices being connected exponentially in the “Internet of Things” (IoT)—and the trillions of dollars in potential economic value that is expected to come of it. Yet as exciting as the IoT future may be, a lot of the industry messaging has gone right over the heads of people who today operate plants, run businesses and are responsible for implementing IoT-based solutions. Investors find themselves wondering what is real, and what is a hyped-up vision of a future that is still years away.

Over the past decade, I have met with dozens of organizations in all corners of the globe, talking with people about IoT. I’ve worked with traditional industrial companies struggling to change outmoded manufacturing processes, and I’ve worked with innovative young startups that are redefining long-held assumptions and roles. And I can tell you that the benefits of IoT are not in some far-off future scenario. They are here and now—and growing. The question is not whether companies should begin deploying IoT—the benefits of IoT are clear—but how.

So, how do companies get started on the IoT journey? It’s usually best to begin with a small, well-defined project that improves efficiency and productivity around existing processes. I’ve seen countless organizations, large and small, enjoy early success in their IoT journey by taking one of the following “fast paths” to IoT payback:

 
  • Connected operations. By connecting key processes and devices in their production process on a single network, iconic American motorcycle maker Harley Davidson increased productivity by 80%, reduced its build-to-order cycle from 18 months to two weeks, and grew overall profitability by 3%-4%.
  • Remote operations. A dairy company in India began remotely monitoring the freezers in its 150 ice cream stores, providing alerts in case of power outages. The company began realizing a payback within a month and saw a five-fold return on its investment within 13 months.
  • Predictive analytics. My employer Cisco has deployed sensors and used energy analytics software in manufacturing plants, reducing energy consumption by 15% to 20%.
  • Predictive maintenance. Global mining company Rio Tinto uses sensors to monitor the condition of its vehicles, identifying maintenance needs before they become problems—and saves $2 million a day every time it avoids a breakdown.

These four well-proven scenarios are ideal candidates to get started on IoT projects. Armed with an early success, companies can then build momentum and begin to tackle more transformative IoT solutions. Here, IoT provides rich opportunities across many domains, including:

 
  • New business opportunities and revenue streams. Connected operations combined with 3D printing, for example, are making personalization and mass customization possible in ways not imagined a few years ago.
  • New business models. IoT enables equipment manufacturers to adopt service-oriented business models. By gathering data from devices installed at a customer site, manufacturers like Japanese industrial equipment maker Fanuc can offer remote monitoring, analytics and predictive maintenance services to reduce costs and improve uptime.
  • New business structures. In many traditional industries, customers have typically looked to a single vendor for a complete end-to-end solution, often using closed, proprietary technologies. Today IoT, with its flexibility, cost, and time-to-market advantages, is driving a shift to an open technology model where solution providers form an ecosystem of partners. As a result, each participant provides its best-in-class capabilities to contribute to a complete IoT solution for their customers.
  • New value propositions for consumers. IoT is helping companies provide new hyper-relevant customer experiences and faster, more accurate services than ever before. Just think of the ever-increasing volume of holiday gift orders placed online on “Cyber Monday.” IoT is speeding up the entire fulfillment process, from ordering to delivery. Connected robots and Radio Frequency Identification (RFID) tags in the warehouse make the picking and packing process faster and more accurate. Real-time preventive maintenance systems keep delivery vehicles up and running. Telematic sensors record temperature and humidity throughout the process. So, not only can you track your order to your doorstep, your packages are delivered on time—and they arrive in optimal condition.

 

So, yes, IoT is real today and is already having a tremendous impact. It is gaining traction in industrial segments, logistics, transportation, and smart cities. Other industries, such as healthcare, retail, and agriculture are following closely.

We are just beginning to understand IoT’s potential. But if you are an investor wondering where the smart money is going, one thing is certain: 10 years from now, you’ll have to look hard to find an industry that has not been transformed by IoT.

Original article here.

 


standard

Artificial Intelligence Will Grow 300% in 2017

2016-11-06 - By 

Insights matter. Businesses that use artificial intelligence (AI), big data and the Internet of Things (IoT) technologies to uncover new business insights “will steal $1.2 trillion per annum from their less informed peers by 2020.” So says Forrester in a new report, “Predictions 2017: Artificial Intelligence Will Drive The Insights Revolution.”

Across all businesses, there will be a greater than 300% increase in investment in artificial intelligence in 2017 compared with 2016. Through the use of cognitive interfaces into complex systems, advanced analytics, and machine learning technology, AI will provide business users access to powerful insights never before available to them. It will help, says Forrester, “drive faster business decisions in marketing, ecommerce, product management and other areas of the business by helping close the gap from insights to action.”

The combination of AI, big data, and IoT technologies will enable businesses that invest in them and implement them successfully to overcome barriers to data access and to mining useful insights. In 2017 these technologies will increase businesses’ access to data, broaden the types of data that can be analyzed, and raise the level of sophistication of the resulting insight. As a result, Forrester predicts an acceleration in the trend towards democratization of data analysis. While in 2015 it found that only 51% of data and analytics decision-makers said that they were able to easily obtain data and analyze it without the help of technologists, Forrester expects this figure to rise to around 66% in 2017.

Big data technologies will mature and vendors will increasingly integrate them with their traditional analytics platforms which will facilitate their incorporation in existing analytics processes in a wide range of organizations. The use of a single architecture for big data convergence with agile and actionable insights will become more widespread.

The third set of technologies supporting insight-driven businesses, those associated with IoT, will also become integrated with more traditional analytics offerings and Forrester expects the number of digital analytics vendors offering IoT insights capabilities to double in 2017. This will encourage their customers to invest in networking more devices and exploring the data they produce. For example, Forrester has found that 67% of telecommunications decision-makers are considering or prioritizing developing IoT or M2M initiatives in 2017.

The increased investment in IoT will lead to new types of analytics, which in turn will lead to new business insights. Currently, much of the data generated by edge devices such as mobile phones, wearables, or cars goes unused, as “immature data and analytics practices cause most firms to squander these insights opportunities,” says Forrester. In 2016, less than 50% of data and analytics decision-makers had adopted location analytics, but Forrester expects adoption of location analytics to grow to over two-thirds of businesses by the end of 2017. The resulting new insights will enable firms to optimize their customers’ experiences as they engage in the physical world with products, services and support.

In general, Forrester sees encouraging signs that more companies are investing in initiatives to get rid of existing silos of customer knowledge so they can coordinate better and drive insights throughout the entire enterprise. Specifically, Forrester sees three such initiatives becoming prominent in 2017:

Organizations with Chief Data Officers (CDOs) will become the majority in 2017, up from a global average of 47% in 2016. But to become truly insights-driven, says Forrester, “firms must eventually assign data responsibilities to CIOs and CMOs, and even CEOs, in order to drive swift business action based on data driven insights.”

Customer data management projects will increase by 75%. In 2016, for the first time, 39% of organizations have embarked on a big data initiative to support cross-channel tracking and attribution, customer journey analytics, and better segmentation. And nearly one-third indicated plans to adopt big data technologies and solutions in the next twelve months.

Forrester expects to see a marked increase in the adoption of enterprise-wide insights-driven practices as firms digitally transform their business in 2017. Leading customer intelligence practices and strategies will become “the poster child for business transformation,” says Forrester.

Longer term, according to Forrester’s “The Top Emerging Technologies To Watch: 2017 To 2021,” artificial intelligence-based services and applications will eventually change most industries and redistribute the workforce.

Original article here.


standard

Legal Marijuana needs Big Data to Grow

2016-10-26 - By 

Cannabis-specific data services are optimizing business and educating consumers

When it comes to growing a plant-based industry, big data and technology may be more valuable than fertilizer.

As the legal marijuana industry continues to expand, so has the need for data services to increase efficiency and consumer education. But companies like SAP and NetSuite that typically provide data collection and organization to more traditional industries aren’t extending their business to the legal cannabis industry, creating an opening for new companies catering just to the marijuana market to fill the space.

“They’re bringing normal business processes into the cannabis industry,” says Alan Brochstein, founder of 420 Investor, an investor community for publicly traded cannabis companies. “It’s about time.” SAP and NetSuite didn’t respond to requests for comment.

With the legal cannabis market expected to reach $6.7 billion in medical and recreational sales in 2016, industry experts expect data to play a primary role in accelerating this growth. These data services track everything from plant cultivation to consumer purchasing trends, helping marijuana retailers comply with state regulations and optimize their inventory to meet demand.

One of those services, Flowhub, was founded in late 2014 as a cannabis-tracking software to help marijuana growers operate in compliance with state regulations. Before that, many manufacturers had a blind spot when it came to supply chain management, opening them up to potential missteps like missing plants or pesticide use, says Kyle Sherman, the company’s chief executive. “We wanted to provide tools to help people so we can legalize [cannabis] responsibly,” Sherman says.

Read more: The marijuana business might have a high-stakes pest problem

About 100 companies in Colorado and Oregon are signing onto the Flowhub system, according to Sherman. The software system also allows for other data applications to be incorporated with it, similar to how mobile apps work within smartphones.

One of these data applications is Headset, which launched nearly two months ago but is already tracking $65 million worth of cannabis transaction information, according to Cy Scott, the company’s chief executive. Scott described Headset as “the Nielsen of cannabis,” providing market intelligence data for marijuana sellers including guidance ranging from how to stock inventory to how to price products. “It was almost obvious,” Scott says. “Every other retail industry has this service.”

The Headset application can tell retailers specific details they can use to base inventory decisions on, like whether granola edibles are outselling caramel edibles, and overall trends, like the decline in popularity of marijuana flowers — the smokable form of the plant — Scott says. “Our customers range from the largest retailers to the newest retailers in the industry,” he adds.

There are also data resources for consumers. Leafly, a cannabis information resource website, uses crowdsourced data to provide reviews of strains and dispensary directories to help customers navigate the legal marketplace. The gradual legalization of the industry has brought a new source of community-based feedback, says Zack Hutson, director of public relations at Privateer Holdings, a cannabis private-equity firm that owns Leafly, adding that the site had about 9 million unique visitors in February.

Cannabis Reports also provides a comprehensive database for the strains of cannabis on the legal market. The company’s chief executive, David Drake, says he brought the website online after noticing the absence of tech services within the legal cannabis industry. The database includes more than 30,000 strains of marijuana, the companies that produce them, the lab tests performed on them, medical studies and other information gathered from online research.

“It’s a really big responsibility to have that amount of data, and we’re making it available in a very open fashion,” Drake says. “We’re looking to try and serve anybody trying to find out about cannabis.”

The company provides free information for consumers on its websites, and businesses can pay a monthly fee for customer insight data and data organization like charts and pricing information. “It makes people a lot more comfortable about the industry when you know all the data is there and it’s all transparent,” Drake says.

This transparency may be crucial for the industry as legalization movements across the country continue to gain steam. Much of the negative reputation marijuana has garnered in past decades has been drawn from “false data,” says Flowhub’s Sherman. “What we really need to squash prohibition is great data.”

Original article here.


standard

Big Data and Cloud – Are You Ready to Embrace Both?

2016-09-23 - By 

This week’s Economist magazine has a cover story about Uber, the world’s most valuable startup and a symbol of disruptive innovation. The race to reinvent transportation worldwide is moving so fast that it will dramatically change the way we travel in the next 5-10 years. While studying Uber’s success story, I was most interested in the factors that led to the company’s exceptional growth – spreading to 425 cities around the globe in seven years, with a valuation of $70 billion.

There are surely multiple factors that contributed to its success, but what surprised me was how heavily the company capitalizes on data analytics. In 2014, Uber launched UberPool, which uses algorithms to match riders based on location and sets the price based on the likelihood of picking up another passenger. It analyzes consumers’ transaction history and spending patterns and provides intelligent recommendations for personalized services.

Uber is just one example; thousands of enterprises have already embraced big data and predictive analytics for HR management, hiring, financial management, and employee relations management. The latest startups are already leveraging analytics to bring data-driven, practical recommendations to the market. However, this does not mean that the situation is ideal.

According to MIT Technology Review, roughly 0.5 percent of digital data is analyzed, which means companies are missing millions of opportunities to make smart decisions, improve efficiency, attract new prospects and achieve business goals. The reason is simple: they are not leveraging the potential offered by data analytics.

Though the percentage of data being analyzed is disappointing, research points to a growing realization among businesses that analytics must be adopted. By 2020, around 1.7 megabytes of new information will be created every single second for every human being on the planet.

Another thing deeply associated with this growing data asset is the cloud. As the statistics show, data creation is on the rise, and that will lead to storage and security issues for businesses. Though free cloud services are available, the adoption rate is still disappointing.

When we explore why big data analysis is lagging behind and how to fix the problem, it’s vital to assess storage services too. Though there are organizations that have been using cloud storage for years, overall adoption has been slow. It’s usually a good option to host general data in the cloud while keeping sensitive information on premises.

Big Data and Cloud for Business:

As we noted in the previous post, private cloud adoption increased from 63% to 77%, which has driven hybrid cloud adoption up from 58% to 71% year-over-year. There are enough reasons and stats to explain the need for cloud storage and big data analytics for small businesses. Here are four fundamental reasons why companies need reliable cloud technology to carry out a big data analytics exercise.

1. Cost:

Looking at the available options at this point, there are two concerns: some are too costly and time-consuming, while others are simply unreliable and insecure. Without a clear solution, the default has been to do the bare minimum with the available data. If data can be successfully integrated into the cloud, the combined cost of storage and analytics flattens out and benefits the business.

2. Security:

We have already discussed that companies have a gigantic amount of data but no clue what to do with it. The first thing they need is to keep that data in a secure environment where no breach can occur. Look at the recent revelations about the Dropbox hack, now reported to have affected over 65 million accounts associated with the service. Since moving significant amounts of data in and out of the cloud comes with security risks, you have to ensure that the cloud service you are using is reliable.

There are concerns and risks, but thanks to big players like IBM, Microsoft, and Google, trust in cloud services is increasing day by day and adoption is on the rise.

3. Integration:

If you look at the different sales, marketing, and social media management tools, they all offer integration with other apps. For example, you can integrate Facebook with MailChimp, or Salesforce with MailChimp, which means your marketing or sales cloud offers a two-in-one service. It not only processes your data and provides analytics but also ensures that the findings and data remain in a secure environment.

4. Automation:

Once you have removed the uncertainty and found a reliable, cost-effective solution for the business, the next consideration is the feature set. There are cloud services that offer broad automation features, enabling users to save time for more important work. Data management, campaign management, data downloads, real-time analytics, automatic alerts, and drip management are some of the key automation features that any data analytics architect will be looking for.

While integrating the cloud with data analytics, make sure that it serves your purpose while keeping the cost under control; otherwise, the entire objective of the exercise will be lost. As big data becomes an integral part of any business, data management applications will become more user-friendly and more affordable. It is a challenge, but there are plenty of opportunities for small businesses to take big data into account and achieve significant results.

Original article here.


standard

Azure Stack will be a disruptive, game changing technology

2016-08-29 - By 

Few companies will use pure public or private cloud computing and certainly no company should miss the opportunity to leverage a combination. Hybrids of private and public cloud, multiple public cloud services and non-cloud services will serve the needs of more companies than any single cloud model and so it’s important that companies stop and consider their long term cloud needs and strategy.

Providing insight into the future of cloud computing is something that Pulsant has a lot of experience in and our focus on hybrid IT and hybrid services allows us to see where the adoption of public and private cloud benefits our customers’ strategies and requirements.

Since so much of IT’s focus in the recent past (and in truth, even now) has been on private cloud, any analytics that show the growth of public cloud give us a sense of how the hybrid idea will progress. The business use of SaaS is increasingly driving a hybrid model by default. Much of hybrid cloud use comes because of initial trials of public cloud services. As business users adopt more public cloud, SaaS in particular, they will need more support from companies, such as Pulsant, to help provide solutions for true integration and governance of their cloud.

Game changer

The challenge, as always in the cloud arena, is that there is no strict definition of the term ‘hybrid.’ There has been, until recently, a distinct lack of vendors and service providers able to offer simple solutions to some of the day-to-day challenges faced by most companies who are trying to develop a cloud strategy. Challenges include those of governance, security, consistent experiences between private and public services and the ability to simply ‘build once’ and ‘operate everywhere’.

Enter Azure Stack — it’s not often that I use language like “game changing” and “disruptive technology” but in the case of Azure Stack I don’t think these terms can be understated. For the first time you have a service provider (for that’s what Microsoft is becoming) that is addressing what hybrid IT really means and how to make it simple and easy to use.

So what is Azure Stack?

This is the simple question that completely differentiates Azure (public) / Azure Stack from a traditional VM-based environment. When you understand this, you understand how Azure Stack is a disruptive and game changing technology.

For a long time now, application scalability has been achieved by simply adding more servers (memory, processors, storage, etc.). If there was a need for more capacity, the answer was “add more servers”. Ten years ago, that still meant buying another physical server and putting it in a rack. With virtualisation (VMware, Hyper-V, OpenStack) it has been greatly simplified, with the ability to simply “spin up” another virtual machine on request. Even this is now being superseded by the advent of cloud technologies.

Virtualisation may have freed companies from having to buy and own hardware (a capital drain and a constant need for upgrades), but companies still have the overhead of an operating system (Windows/Linux), possibly a core application (e.g. Microsoft SQL) and, most annoyingly, a raft of servers and software to patch, maintain and manage. Even with virtualisation, there is a lot of overhead required to run applications, as is the case when running dozens of “virtual machines” to host the applications and services being used.

The public cloud takes the next step and allows the aggregation of things like CPUs, storage, networking, database tiers and web tiers, simply allocating a company the amount of capacity it needs and giving applications the necessary resources dynamically. More importantly, resources can be added and removed at a moment’s notice without the need to add or remove VMs. This in turn means fewer ‘virtual machines’ to patch and manage, and so less overhead.

The point of Azure Stack is that it takes the benefits of the public cloud and takes the next logical step in this journey — bringing those exact capabilities and services into your (private) data centre. This will enable a host of new ideas, letting companies develop a whole Azure Stack ecosystem where:

  • Hosting companies can sell private Azure Services direct from their datacentres
  • System integrators can design, deploy and operate Azure solutions once but deliver them in both private and public clouds
  • ISVs can write Azure-compatible software once and deploy in both private and public clouds
  • Managed service providers can deploy, customise and operate Azure Stack themselves

I started by making the comment that I thought Azure Stack will be a disruptive, game changing technology for Pulsant and its customers. I believe that it will completely change how datacentres will manage large scale applications, and even address dev/test and highly secured and scalable apps. It will be how hosting companies like Pulsant will offer true hybrid cloud services in the future.

Original article here.


standard

These R packages import sports, weather, stock data and more

2016-08-27 - By 

There are lots of good reasons you might want to analyze public data, from detecting salary trends in government data to uncovering insights about a potential investment (or your favorite sports team).

But before you can run analyses and visualize trends, you need to have the data. The packages listed below make it easy to find economic, sports, weather, political and other publicly available data and import it directly into R — in a format that’s ready for you to work your analytics magic.

Packages that are on CRAN can be installed on your system by using the R command install.packages("packageName") — you only need to run this once. GitHub packages are best installed with the devtools package — install that once with install.packages("devtools") and then use it to install packages from GitHub using the format devtools::install_github("repositoryName/packageName"). Once installed, you can load a package into your working session once each session using the format library("packageName").

Some of the sample code below comes from package documentation or blog posts by package authors. For more information about a package, you can run help(package="packageName") in R to get info on the functions included in the package and, if available, links to package vignettes (R-speak for additional documentation). To see sample code for a particular function, try example(topic="functionName", package="packageName") or simply ?functionName for all available help about a function, including any sample code (not all documentation includes samples).

For more useful R packages, see Great R Packages for data import, wrangling and visualization.

R packages to import public data

  • blscrapeR (Economics, Government; CRAN) — For specific information about U.S. salaries and employment, the Bureau of Labor Statistics offers a wealth of data available via this new package. The blsAPI package is another option. Sample code: bls_api(c("LEU0254530800", "LEU0254530600"), startyear = 2000, endyear = 2015). More info: blog post by the package author.
  • FredR (Finance, Government; GitHub; free API key needed) — If you’re interested just in Fed data, FredR can access data from the Federal Reserve Economic Data API, including 240,000 US and international data sets from 77 sources. Sample code: fred <- FredR(api.key); fred$series.search("GDP"); gdp <- fred$series.observations(series_id = 'GDPC1'). More info: the project’s GitHub page.
  • quantmod (Finance, Government; CRAN) — This package is designed for financial modelling but also has functions to easily pull data from Google Finance, Yahoo Finance and the St. Louis Federal Reserve (FRED). Sample code: getSymbols("DEXJPUS", src="FRED"). More info: intro on getting data.
  • censusapi (Government; GitHub; API key required) — There are several other R packages that work with data from the U.S. Census, but this one aims to be complete and offer data from all the bureau’s APIs, not just from one or two surveys. Sample code: mydata <- getCensus(name="acs5", vintage=2014, key=mycensuskey, vars=c("NAME", "B01001_001E", "B19013_001E"), region="congressional district:*", regionin="state:36"). More info: an Urban Institute presentation has more details; the project GitHub page offers some basics.
  • RSocrata (Government; CRAN) — Pull data from any municipality that uses the Socrata data platform. Created by the City of Chicago data team. Sample code: mydata <- read.socrata("https://data.cityofchicago.org/Transportation/Towed-Vehicles/ygr5-vcbg"). More info: RSocrata blog post.
  • forbesListR (Misc; GitHub) — A bit of a niche offering, this taps into lists maintained by Forbes, including largest private companies, top business schools and top venture capitalists. Sample code (top venture capitalists 2012-2016): mydata <- get_years_forbes_list_data(years = 2012:2016, list_name = "Top VCs"). More info: the project GitHub page; you may need to manually load the tidyr package for the code to work.
  • pollstR (Politics; CRAN) — This package pulls political polling data from the Huffington Post Pollster API. Sample code: elec_2016_polls <- pollster_chart_data("2016-general-election-trump-vs-clinton"). More info: the Intro vignette.
  • Lahman (Sports; CRAN) — R interface for the famed Lahman baseball database. Sample code: batavg <- battingStats(). More info: the blog post “Hacking the new Lahman Package 4.0-1 with RStudio”.
  • stattleshipR (Sports; GitHub; API key, currently still free, needed) — Stattleship offers NFL, NBA, NHL and MLB game data via a partnership with Gracenote. Sample code: set_token("your-API-token"); sport <- 'baseball'; league <- 'mlb'; ep <- 'game_logs'; q_body <- list(team_id='mlb-bos', status='ended', interval_type='regularseason'); gls <- ss_get_result(sport=sport, league=league, ep=ep, query=q_body, walk=TRUE); game_logs <- do.call('rbind', lapply(gls, function(x) x$game_logs)). More info: the Stattleship blog post.
  • weatherData (Weather; CRAN) — Pull historical weather data from cities/airports around the world. If you have trouble pulling data, especially on a Mac, try uninstalling and re-installing a different version with install_github("ozagordi/weatherData"). Sample code: mydata <- getWeatherForDate("BOS", "2016-08-01", end_date="2016-08-15"). More info: a post by the package author.

 

Original article here.


standard

Is Anything Ever ‘Forgotten’ Online?

2016-08-13 - By 

When someone types your name into Google, suppose the first link points to a newspaper article about you going bankrupt 15 years ago, or to a YouTube video of you smoking cigarettes 20 years ago, or simply a webpage that includes personal information such as your current home address, your birth date, or your Social Security number. What can you do — besides cry?

Unlike those living in the United States, Europeans actually have some recourse. The European Union’s “right to be forgotten” (RTBF) law allows EU residents to fill out an online form requesting that a search engine (such as Google) remove links that compromise their privacy or unjustly damage their reputation. A committee at the search company, primarily consisting of lawyers, will review your request, and then, if deemed appropriate, the site will no longer display those unwanted links when people search for your name.

But privacy efforts can backfire. A landmark example of this happened in 2003, when actress and singer Barbra Streisand sued a California couple who took aerial photographs of the entire length of the state’s coastline, which included Streisand’s Malibu estate. Streisand’s suit argued that her privacy had been violated, and tried to get the photos removed from the couple’s website so nobody could see them. But the lawsuit itself drew worldwide media attention; far more people saw the images of her home than would have through the couple’s online archive.

In today’s digital world, privacy is a regular topic of concern and controversy. If someone discovered the list of all the things people had asked to be “forgotten,” they could shine a spotlight on that sensitive information. Our research explored whether that was possible, and how it might happen. Our research has shown that hidden news articles can be unmasked with some hacking savvy and a moderate amount of financial resources.

Keeping the past in the past

The RTBF law does not require websites to take down the actual web pages containing the unwanted information. Rather, just the search engine links to those pages are removed, and only from results from searches for specific terms.

In most circumstances, this is perfectly fine. If you shoplifted 20 years ago, and people you have met recently do not suspect you shoplifted, it is very unlikely they would discover — without the aid of a search engine — that you ever shoplifted by simply browsing online content. By removing the link from Google’s results for searches of your name, your brief foray into shoplifting would be, for all intents and purposes, “forgotten.”

This seems like a practical solution to a real problem that many people are facing today. Google has received requests to remove more than 1.5 million links from specific search results and has removed 43 percent of them.

‘Hiding’ in plain sight

But our recent research has shown that a transparency activist or private investigator, with modest hacking skills and financial resources, can find newspaper articles that have been removed from search results and identify the people who requested those removals. This data-driven attack has three steps.

First, the searcher targets a particular online newspaper, such as the Spanish newspaper El Mundo, and uses automated software tools to download articles that may be subject to delisting (such as articles about financial or sexual misconduct). Second, he again uses automated tools to get his computer to extract the names mentioned in the downloaded articles. Third, he runs a program to query google.es with each of those names, to see if the corresponding article is in the google.es search results or not. If not, then it is most certainly a RTBF delisted link, and the corresponding name is the person who requested the delisting.

As a proof of concept, we did exactly this for a subset of articles from El Mundo, a Madrid-based daily newspaper we chose in part because one of our team speaks Spanish. From the subset of downloaded articles, we discovered two that are being delisted by google.es, along with the names of the corresponding requesters.

Using a third-party botnet to send the queries to Google from many different locations, and with moderate financial resources ($5,000 to $10,000), we believe the effort could cover all candidate articles in all major European newspapers. We estimate that 30 to 40 percent of the RTBF delisted links in the media, along with their corresponding requesters, could be discovered in this manner.

Lifting the veil

Armed with this information, the person could publish the requesters’ names and the corresponding links on a new website, naming those who have things they want forgotten and what it is they hope people won’t remember. Anyone seeking to find information on a new friend or business associate could visit this site — in addition to Google — and find out what, if anything, that person is trying to bury in the past. One such site already exists.

At present, European law only requires the links to be removed from country- or language-specific sites, such as google.fr and google.es. Visitors to google.com can still see everything. This is the source of a major European debate about whether the right to be forgotten should also require Google to remove links from searches on google.com. But because our approach does not involve using google.com, it would still work even if the laws were extended to cover google.com.

Should the right to be forgotten exist?

Even if delisted links to news stories can be discovered, and the identities of their requesters revealed, the RTBF law still serves a useful and important purpose for protecting personal privacy.

By some estimates, 95 percent of RTBF requests are not seeking to delist information that was in the news. Rather, people want to protect personal details such as their home address or sexual orientation, and even photos and videos that might compromise their privacy. These personal details typically appear in social media like Facebook or YouTube, or in profiling sites, such as profileengine.com. But finding these delisted links for social media is much more difficult because of the huge number of potentially relevant web pages to be investigated.

People should have the right to retain their privacy — particularly when it comes to things like home addresses or sexual orientation. But you may just have to accept that the world might not actually forget about the time when, as a teenager, your friend challenged you to shoplift.

Original article here.


standard

OpenAI, Elon Musk’s Wild Plan to Set Artificial Intelligence Free

2016-08-07 - By 

THE FRIDAY AFTERNOON news dump, a grand tradition observed by politicians and capitalists alike, is usually supposed to hide bad news. So it was a little weird that Elon Musk, founder of electric car maker Tesla, and Sam Altman, president of famed tech incubator Y Combinator, unveiled their new artificial intelligence company at the tail end of a weeklong AI conference in Montreal this past December.

But there was a reason they revealed OpenAI at that late hour. It wasn’t that no one was looking. It was that everyone was looking. When some of Silicon Valley’s most powerful companies caught wind of the project, they began offering tremendous amounts of money to OpenAI’s freshly assembled cadre of artificial intelligence researchers, intent on keeping these big thinkers for themselves. The last-minute offers—some made at the conference itself—were large enough to force Musk and Altman to delay the announcement of the new startup. “The amount of money was borderline crazy,” says Wojciech Zaremba, a researcher who was joining OpenAI after internships at both Google and Facebook and was among those who received big offers at the eleventh hour.

How many dollars is “borderline crazy”? Two years ago, as the market for the latest machine learning technology really started to heat up, Microsoft Research vice president Peter Lee said that the cost of a top AI researcher had eclipsed the cost of a top quarterback prospect in the National Football League—and he meant under regular circumstances, not when two of the most famous entrepreneurs in Silicon Valley were trying to poach your top talent. Zaremba says that as OpenAI was coming together, he was offered two or three times his market value.

OpenAI didn’t match those offers. But it offered something else: the chance to explore research aimed solely at the future instead of products and quarterly earnings, and to eventually share most—if not all—of this research with anyone who wants it. That’s right: Musk, Altman, and company aim to give away what may become the 21st century’s most transformative technology—and give it away for free.

Zaremba says those borderline crazy offers actually turned him off—despite his enormous respect for companies like Google and Facebook. He felt like the money was at least as much of an effort to prevent the creation of OpenAI as a play to win his services, and it pushed him even further towards the startup’s magnanimous mission. “I realized,” Zaremba says, “that OpenAI was the best place to be.”

That’s the irony at the heart of this story: even as the world’s biggest tech companies try to hold onto their researchers with the same fierceness that NFL teams try to hold onto their star quarterbacks, the researchers themselves just want to share. In the rarefied world of AI research, the brightest minds aren’t driven by—or at least not only by—the next product cycle or profit margin. They want to make AI better, and making AI better doesn’t happen when you keep your latest findings to yourself.

This morning, OpenAI will release its first batch of AI software, a toolkit for building artificially intelligent systems by way of a technology called “reinforcement learning”—one of the key technologies that, among other things, drove the creation of AlphaGo, the Google AI that shocked the world by mastering the ancient game of Go. With this toolkit, you can build systems that simulate a new breed of robot, play Atari games, and, yes, master the game of Go.
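For readers who want a sense of what that toolkit looks like in practice, here is a minimal sketch of the reinforcement-learning loop it exposes, assuming the Gym-style Python API OpenAI released; the environment name and the random policy are illustrative, not drawn from the article.

import gym  # OpenAI's reinforcement-learning toolkit

env = gym.make("CartPole-v0")           # a simple control task bundled with the toolkit
observation = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # random policy; a learning agent would use the reward signal
    observation, reward, done, info = env.step(action)
    if done:                            # episode finished, start a new one
        observation = env.reset()

env.close()

A real agent replaces the random sampling with a policy that is adjusted, episode after episode, to favor the actions that earned higher rewards.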

But game-playing is just the beginning. OpenAI is a billion-dollar effort to push AI as far as it will go. In both how the company came together and what it plans to do, you can see the next great wave of innovation forming. We’re a long way from knowing whether OpenAI itself becomes the main agent for that change. But the forces that drove the creation of this rather unusual startup show that the new breed of AI will not only remake technology, but remake the way we build technology.

AI Everywhere

Silicon Valley is not exactly averse to hyperbole. It’s always wise to meet bold-sounding claims with skepticism. But in the field of AI, the change is real. Inside places like Google and Facebook, a technology called deep learning is already helping Internet services identify faces in photos, recognize commands spoken into smartphones, and respond to Internet search queries. And this same technology can drive so many other tasks of the future. It can help machines understand natural language—the natural way that we humans talk and write. It can create a new breed of robot, giving automatons the power to not only perform tasks but learn them on the fly. And some believe it can eventually give machines something close to common sense—the ability to truly think like a human.

But along with such promise comes deep anxiety. Musk and Altman worry that if people can build AI that can do great things, then they can build AI that can do awful things, too. They’re not alone in their fear of robot overlords, but perhaps counterintuitively, Musk and Altman also think that the best way to battle malicious AI is not to restrict access to artificial intelligence but expand it. That’s part of what has attracted a team of young, hyper-intelligent idealists to their new project.

OpenAI began one evening last summer in a private room at Silicon Valley’s Rosewood Hotel—an upscale, urban, ranch-style hotel that sits, literally, at the center of the venture capital world along Sand Hill Road in Menlo Park, California. Elon Musk was having dinner with Ilya Sutskever, who was then working on the Google Brain, the company’s sweeping effort to build deep neural networks—artificially intelligent systems that can learn to perform tasks by analyzing massive amounts of digital data, including everything from recognizing photos to writing email messages to, well, carrying on a conversation. Sutskever was one of the top thinkers on the project. But even bigger ideas were in play.

Sam Altman, whose Y Combinator helped bootstrap companies like Airbnb, Dropbox, and Coinbase, had brokered the meeting, bringing together several AI researchers and a young but experienced company builder named Greg Brockman, previously the chief technology officer at Stripe, a high-profile Silicon Valley digital payments startup and another Y Combinator company. It was an eclectic group. But they all shared a goal: to create a new kind of AI lab, one that would operate outside the control not only of Google, but of anyone else. “The best thing that I could imagine doing,” Brockman says, “was moving humanity closer to building real AI in a safe way.”

Musk was there because he’s an old friend of Altman’s—and because AI is crucial to the future of his various businesses and, well, the future as a whole. Tesla needs AI for its inevitable self-driving cars. SpaceX, Musk’s other company, will need it to put people in space and keep them alive once they’re there. But Musk is also one of the loudest voices warning that we humans could one day lose control of systems powerful enough to learn on their own.

The trouble was: so many of the people most qualified to solve all those problems were already working for Google (and Facebook and Microsoft and Baidu and Twitter). And no one at the dinner was quite sure that these thinkers could be lured to a new startup, even if Musk and Altman were behind it. But one key player was at least open to the idea of jumping ship. “I felt there were risks involved,” Sutskever says. “But I also felt it would be a very interesting thing to try.”

Breaking the Cycle

Emboldened by the conversation with Musk, Altman, and others at the Rosewood, Brockman soon resolved to build the lab they all envisioned. Taking on the project full-time, he approached Yoshua Bengio, a computer scientist at the University of Montreal and one of the founding fathers of the deep learning movement. The field’s other two pioneers—Geoff Hinton and Yann LeCun—are now at Google and Facebook, respectively, but Bengio is committed to life in the world of academia, largely outside the aims of industry. He drew up a list of the best researchers in the field, and over the next several weeks, Brockman reached out to as many on the list as he could, along with several others.

Many of these researchers liked the idea, but they were also wary of making the leap. In an effort to break the cycle, Brockman picked the ten researchers he wanted the most and invited them to spend a Saturday getting wined, dined, and cajoled at a winery in Napa Valley. For Brockman, even the drive into Napa served as a catalyst for the project. “An underrated way to bring people together are these times where there is no way to speed up getting to where you’re going,” he says. “You have to get there, and you have to talk.” And once they reached the wine country, that vibe remained. “It was one of those days where you could tell the chemistry was there,” Brockman says. Or as Sutskever puts it: “the wine was secondary to the talk.”

By the end of the day, Brockman asked all ten researchers to join the lab, and he gave them three weeks to think about it. By the deadline, nine of them were in. And they stayed in, despite those big offers from the giants of Silicon Valley. “They did make it very compelling for me to stay, so it wasn’t an easy decision,” Sutskever says of Google, his former employer. “But in the end, I decided to go with OpenAI, partly because of the very strong group of people and, to a very large extent, because of its mission.”

The deep learning movement began with academics. It’s only recently that companies like Google and Facebook and Microsoft have pushed into the field, as advances in raw computing power have made deep neural networks a reality, not just a theoretical possibility. People like Hinton and LeCun left academia for Google and Facebook because of the enormous resources inside these companies. But they remain intent on collaborating with other thinkers. Indeed, as LeCun explains, deep learning research requires this free flow of ideas. “When you do research in secret,” he says, “you fall behind.”

As a result, big companies now share a lot of their AI research. That’s a real change, especially for Google, which has long kept the tech at the heart of its online empire secret. Recently, Google open sourced the software engine that drives its neural networks. But it still retains the inside track in the race to the future. Brockman, Altman, and Musk aim to push the notion of openness further still, saying they don’t want one or two large corporations controlling the future of artificial intelligence.

The Limits of Openness

All of which sounds great. But for all of OpenAI’s idealism, the researchers may find themselves facing some of the same compromises they had to make at their old jobs. Openness has its limits. And the long-term vision for AI isn’t the only interest in play. OpenAI is not a charity. Musk’s companies could benefit greatly from the startup’s work, and so could many of the companies backed by Altman’s Y Combinator. “There are certainly some competing objectives,” LeCun says. “It’s a non-profit, but then there is a very close link with Y Combinator. And people are paid as if they are working in the industry.”

According to Brockman, the lab doesn’t pay the same astronomical salaries that AI researchers are now getting at places like Google and Facebook. But he says the lab does want to “pay them well,” and it’s offering to compensate researchers with stock options, first in Y Combinator and perhaps later in SpaceX (which, unlike Tesla, is still a private company).

Nonetheless, Brockman insists that OpenAI won’t give special treatment to its sister companies. OpenAI is a research outfit, he says, not a consulting firm. But when pressed, he acknowledges that OpenAI’s idealistic vision has its limits. The company may not open source everything it produces, though it will aim to share most of its research eventually, either through research papers or Internet services. “Doing all your research in the open is not necessarily the best way to go. You want to nurture an idea, see where it goes, and then publish it,” Brockman says. “We will produce a lot of open source code. But we will also have a lot of stuff that we are not quite ready to release.”

Both Sutskever and Brockman also add that OpenAI could go so far as to patent some of its work. “We won’t patent anything in the near term,” Brockman says. “But we’re open to changing tactics in the long term, if we find it’s the best thing for the world.” For instance, he says, OpenAI could engage in pre-emptive patenting, a tactic that seeks to prevent others from securing patents.

But to some, patents suggest a profit motive—or at least a weaker commitment to open source than OpenAI’s founders have espoused. “That’s what the patent system is about,” says Oren Etzioni, head of the Allen Institute for Artificial Intelligence. “This makes me wonder where they’re really going.”

The Super-Intelligence Problem

When Musk and Altman unveiled OpenAI, they also painted the project as a way to neutralize the threat of a malicious artificial super-intelligence. Of course, that super-intelligence could arise out of the tech OpenAI creates, but they insist that any threat would be mitigated because the technology would be usable by everyone. “We think it’s far more likely that many, many AIs will work to stop the occasional bad actors,” Altman says.

But not everyone in the field buys this. Nick Bostrom, the Oxford philosopher who, like Musk, has warned against the dangers of AI, points out that if you share research without restriction, bad actors could grab it before anyone has ensured that it’s safe. “If you have a button that could do bad things to the world,” Bostrom says, “you don’t want to give it to everyone.” If, on the other hand, OpenAI decides to hold back research to keep it from the bad guys, Bostrom wonders how it’s different from a Google or a Facebook.

He does say that the not-for-profit status of OpenAI could change things—though not necessarily. The real power of the project, he says, is that it can indeed provide a check for the likes of Google and Facebook. “It can reduce the probability that super-intelligence would be monopolized,” he says. “It can remove one possible reason why some entity or group would have radically better AI than everyone else.”

But as the philosopher explains in a new paper, the primary effect of an outfit like OpenAI—an outfit intent on freely sharing its work—is that it accelerates the progress of artificial intelligence, at least in the short term. And it may speed progress in the long term as well, provided that it, for altruistic reasons, “opts for a higher level of openness than would be commercially optimal.”

“It might still be plausible that a philanthropically motivated R&D funder would speed progress more by pursuing open science,” he says.

Like Xerox PARC

In early January, Brockman’s nine AI researchers met up at his apartment in San Francisco’s Mission District. The project was so new that they didn’t even have white boards. (Can you imagine?) They bought a few that day and got down to work.

Brockman says OpenAI will begin by exploring reinforcement learning, a way for machines to learn tasks by repeating them over and over again and tracking which methods produce the best results. But the other primary goal is what’s called “unsupervised learning”—creating machines that can truly learn on their own, without a human hand to guide them. Today, deep learning is driven by carefully labeled data. If you want to teach a neural network to recognize cat photos, you must feed it a certain number of examples—and these examples must be labeled as cat photos. The learning is supervised by human labelers. But like many other researchers, OpenAI aims to create neural nets that can learn without carefully labeled data.
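To make that contrast concrete, here is a toy sketch of the supervised setup described above, in which every example arrives with a human-provided label; the feature vectors and the scikit-learn classifier are illustrative stand-ins for a real image pipeline, not OpenAI's code.

from sklearn.linear_model import LogisticRegression

# Tiny hand-labeled dataset: each row stands in for image features,
# each label is the human tag (1 = "cat", 0 = "not cat").
features = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]

model = LogisticRegression().fit(features, labels)  # the supervision step
print(model.predict([[0.85, 0.2]]))                 # -> [1], i.e. tagged as "cat"

Unsupervised learning drops the labels entirely and asks the model to find structure in the raw examples on its own.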

“If you have really good unsupervised learning, machines would be able to learn from all this knowledge on the Internet—just like humans learn by looking around—or reading books,” Brockman says.

He envisions OpenAI as the modern incarnation of Xerox PARC, the tech research lab that thrived in the 1970s. Just as PARC’s largely open and unfettered research gave rise to everything from the graphical user interface to the laser printer to object-oriented programming, Brockman and crew seek to delve even deeper into what we once considered science fiction. PARC was owned by, yes, Xerox, but it fed so many other companies, most notably Apple, because people like Steve Jobs were privy to its research. At OpenAI, Brockman wants to make everyone privy to its research.

This month, hoping to push this dynamic as far as it will go, Brockman and company snagged several other notable researchers, including Ian Goodfellow, another former senior researcher on the Google Brain team. “The thing that was really special about PARC is that they got a bunch of smart people together and let them go where they want,” Brockman says. “You want a shared vision, without central control.”

Giving up control is the essence of the open source ideal. If enough people apply themselves to a collective goal, the end result will trounce anything you concoct in secret. But if AI becomes as powerful as promised, the equation changes. We’ll have to ensure that new AIs adhere to the same egalitarian ideals that led to their creation in the first place. Musk, Altman, and Brockman are placing their faith in the wisdom of the crowd. But if they’re right, one day that crowd won’t be entirely human.

Original article here.


standard

Talend soars after topping targets in second Bay Area tech IPO of 2016

2016-07-29 - By 

Big Data integration software company Talend on Thursday raised about $95 million in a closely watched IPO that is just the second one from a Bay Area tech company this year.

American Depositary Shares in the Redwood City company, which has roots in France, opened their first day of trading (NASDAQ:TLND) Friday up more than 50 percent.

The company late on Thursday priced 5.25 million shares at $18, above the target range of between $15 and $17 it set earlier this month. Underwriters have the option to buy an additional 787,500, so the final tally on money raised could approach $110 million.
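A quick back-of-the-envelope check of those figures, using only the share counts and price quoted above (the calculation itself is ours, not the company's):

base_shares = 5_250_000
overallotment = 787_500
price_per_share = 18

print(f"Base raise:        ${base_shares * price_per_share / 1e6:.1f}M")                    # ~$94.5M, the "about $95 million"
print(f"With option taken: ${(base_shares + overallotment) * price_per_share / 1e6:.1f}M")  # ~$108.7M, approaching $110 million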

It opened at $27.66 and traded between $25 and $28 in early trading after CEO Mike Tuchen rang the opening bell on the Nasdaq exchange in New York.

The only other IPO in the region was by San Francisco-based cloud communications business Twilio, which raised $150 million last month and whose stock (NYSE:TWLO) has nearly tripled since.

Many observers believe that a strong showing by Talend, following close on the heels of Twilio’s success, could convince others waiting in the wings to make the leap.

Talend is the seventh tech IPO of 2016, according to Renaissance Capital, which said the average return of the previous six this year has topped 70 percent.

CEO Tuchen issued a statement about his company’s IPO that emphasized Talend has a long way to go to fully capitalize on the market it is addressing. Spending on software that connects and integrates data is expected to grow to as much as $21 billion in 2019, according to industry research firm IDC.

Talend last year lost $22 million on revenue of $76 million, compared to a $22.5 million loss on revenue of $62.6 million in 2014.

It has more than 1,300 customers worldwide, including Air France, Citi, General Electric and Travis Perkins.

“We’re looking forward to maximizing the opportunity we have in front of us and building our business for long-term success,” Tuchen said.

Before going public, Talend raised more than $100 million from investors who included Galileo Partners, Balderton Capital, Idinvest Partners, Iris Capital and Silver Lake Partners.

Original article here.


standard

What That Election Probability Means

2016-07-28 - By 

We now have our presidential candidates, and for the next few months you get to hear about the changing probability of Hillary Clinton and Donald Trump winning the election. As of this writing, the Upshot estimates a 68% probability for Clinton and 32% for Donald Trump. FiveThirtyEight estimates 52% and 48% for Clinton and Trump, respectively. Forecasts are kind of all over the place this far out from November. Plus, the numbers aren’t especially accurate post-convention.

But the probabilities will start to converge and grow more significant.

So what does it mean when Clinton has a 68% chance of becoming president? What if there were a 90% chance that Trump wins?

Some interpret a high percentage as a landslide, which often isn’t the case with these election forecasts, and it certainly doesn’t mean the candidate with a low chance will lose. If this were the case, the Cleveland Cavaliers would not have beaten the Golden State Warriors, and I would not be sitting here hating basketball.

Fiddle with the probabilities in the interactive simulation in the original article to see what I mean.

Even when you shift the probability far left or far right, the opposing candidate still gets some wins. That doesn’t mean a forecast was wrong. That’s just randomness and uncertainty at play.

The probability estimates the percentage of times you get an outcome if you were to do something multiple times. In the case of Clinton’s 68% chance, run an election hundreds of times, and the statistical model that spit out the percentage thinks that Clinton wins about 68% of those theoretical elections. Conversely, it thinks Trump wins 32% of them.
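That interpretation is easy to see in a minimal simulation sketch (the 68% figure is from the forecast above; everything else is illustrative):

import random

def simulate_elections(p_win=0.68, n=100_000):
    """Count how often the favored candidate wins across many hypothetical elections."""
    wins = sum(random.random() < p_win for _ in range(n))
    return wins / n

share = simulate_elections()
print(f"Favored candidate wins about {share:.0%} of simulated elections; the underdog wins the rest.")

Run it a few times: the favored candidate wins roughly 68 percent of the simulated elections, but the underdog still takes the remaining 32 percent, which is the whole point.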

So as we get closer to election day, even if there’s a high probability for one candidate over the other, what I’m saying is — there’s a chance.

Original article here.


standard

IBM Watson: Machine-Of-All-Trades

2016-07-28 - By 

From fashion to food to healthcare, IBM’s Watson has many guises across different industries. Here’s a look at some of the work IBM’s AI system has been doing since its Jeopardy! heyday.

After defeating Brad Rutter and Ken Jennings in a game of Jeopardy! in 2011, IBM’s Watson couldn’t survive on its $77,147 in winnings. Unlike Microsoft’s Cortana and Apple’s Siri, Watson lacked a parent willing to let it continue living in the basement rent-free, so it got a paying job in healthcare, helping insurer Wellpoint and doctors by providing treatment advice.

Since then, and following investments of more than $1 billion, Watson has become a machine-of-all-trades. Through a combination of machine learning, natural language processing, and a variety of other technologies, Watson is helping companies across a broad spectrum of businesses. Beyond healthcare, Watson earns its keep in fashion, hospitality, food, gaming, retail, financial services, and veterinary medicine.

Its latest engagement involves protecting computers from its own kind. On Tuesday, IBM announced Watson for Cyber Security, a role that comes with residence in the cloud, instead of the Power 750 systems it inhabits on corporate premises.

This fall, Watson, with the assistance of researchers at eight universities, will begin learning to recognize cyber-security threats in the hope that its cognitive capabilities will help identify malicious code and formulate mitigation strategies. The core of its training data will come from IBM’s X-Force research library, which includes data on 8 million spam and phishing attacks, as well as more than 100,000 vulnerabilities.

IBM believes that Watson’s ability to understand unstructured data makes it well-suited for malware hunting. The firm said that 80% of all Internet data is unstructured, and that the typical organization makes use of only about 8% of this data. Given AI’s already considerable role in fraud detection, it wouldn’t be surprising to see Watson excel in cyber-security.

Marc van Zadelhoff, general manager of IBM Security, sees Watson as an answer to the cyber-security talent shortage. “Even if the industry was able to fill the estimated 1.5 million open cyber-security jobs by 2020, we’d still have a skills crisis in security,” he said in a statement.

So robots, of a sort, are taking jobs. But this may be for the best, since the work of processing more than 15,000 security documents a month, according to IBM, would be rather a chore.

At the same time, Watson will benefit from free labor, in the form of job training provided by students at the eight universities involved with the project.

You, too, may see Watson or something similar working in your industry. It won’t be a particularly social colleague, but at least it will get you those reports you need on time.

To take a look at several of the many faces of Watson, go here.

Original article here.

