2018 has been an eventful year for AI to say the least! We’ve seen advances in generative models, the AlphaGo victory, several data breach scandals, and so much more. I’m going to briefly review AI in 2018 before giving 10 predictions on where the space is going in 2019. Prepare yourself: my predictions range from more Kubernetes-infused ML pipelines to the first business use case of generative modeling of 3D worlds. Happy New Year and enjoy!
Original video here.
Simple explanations of Artificial Intelligence, Machine Learning, and Deep Learning and how they’re all different. Plus, how AI and IoT are inextricably connected.
We’re all familiar with the term “Artificial Intelligence.” After all, it’s been a popular focus in movies such as The Terminator, The Matrix, and Ex Machina (a personal favorite of mine). But you may have recently been hearing about other terms like “Machine Learning” and “Deep Learning,” sometimes used interchangeably with artificial intelligence. As a result, the difference between artificial intelligence, machine learning, and deep learning can be very unclear.
I’ll begin by giving a quick explanation of what Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) actually mean and how they’re different. Then, I’ll share how AI and the Internet of Things are inextricably intertwined, with several technological advances all converging at once to set the foundation for an AI and IoT explosion.
So what’s the difference between AI, ML, and DL?
First coined in 1956 by John McCarthy, AI involves machines that can perform tasks that are characteristic of human intelligence. While this is rather general, it includes things like planning, understanding language, recognizing objects and sounds, learning, and problem solving.
We can put AI in two categories, general and narrow. General AI would have all of the characteristics of human intelligence, including the capacities mentioned above. Narrow AI exhibits some facet(s) of human intelligence, and can do that facet extremely well, but is lacking in other areas. A machine that’s great at recognizing images, but nothing else, would be an example of narrow AI.
At its core, machine learning is simply a way of achieving AI.
Arthur Samuel coined the phrase not too long after AI, in 1959, defining it as, “the ability to learn without being explicitly programmed.” You see, you can get AI without using machine learning, but this would require building millions of lines of code with complex rules and decision trees.
So instead of hard coding software routines with specific instructions to accomplish a particular task, machine learning is a way of “training” an algorithm so that it can learn how to perform the task. “Training” involves feeding huge amounts of data to the algorithm and allowing the algorithm to adjust itself and improve.
To give an example, machine learning has been used to make drastic improvements to computer vision (the ability of a machine to recognize an object in an image or video). You gather hundreds of thousands or even millions of pictures and then have humans tag them. For example, the humans might tag pictures that have a cat in them versus those that do not. Then, the algorithm tries to build a model that can tag a picture as containing a cat or not as accurately as a human can. Once the accuracy level is high enough, the machine has now “learned” what a cat looks like.
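To make the idea of “training” concrete, here is a minimal, hypothetical sketch in Python. The tiny random arrays stand in for photos, and a simple off-the-shelf classifier stands in for the far larger models used on real images; none of this reflects any particular production system.

```python
# A minimal, hypothetical sketch of supervised "training": the model adjusts
# itself to fit labeled examples instead of following hand-coded rules.
# Real image classifiers use millions of labeled photos and deep networks;
# here the "photos" are tiny random arrays, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each "photo" is an 8x8 grayscale image, flattened to 64 numbers.
cat_photos = rng.normal(loc=1.0, size=(200, 64))      # label 1 = "cat"
other_photos = rng.normal(loc=-1.0, size=(200, 64))   # label 0 = "no cat"

X = np.vstack([cat_photos, other_photos])
y = np.array([1] * 200 + [0] * 200)

# "Training": the algorithm adjusts its internal weights to fit the labels.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Once accuracy on held-out photos is high enough, the model has "learned"
# what distinguishes the two classes in this toy setting.
new_photo = rng.normal(loc=1.0, size=(1, 64))
print("predicted label:", model.predict(new_photo)[0])
```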
Deep learning is one of many approaches to machine learning. Other approaches include decision tree learning, inductive logic programming, clustering, reinforcement learning, and Bayesian networks, among others.
Deep learning was inspired by the structure and function of the brain, namely the interconnecting of many neurons. Artificial Neural Networks (ANNs) are algorithms that mimic the biological structure of the brain.
In ANNs, there are “neurons” which have discrete layers and connections to other “neurons”. Each layer picks out a specific feature to learn, such as curves/edges in image recognition. It’s this layering that gives deep learning its name: depth is created by using multiple layers as opposed to a single layer.
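The layered structure is easy to see in code. The sketch below is only an illustration with arbitrary sizes and random weights; it shows how each layer transforms the previous layer’s output, and how stacking layers creates the “depth” in deep learning.

```python
# A minimal sketch of the layered structure described above: each layer of
# "neurons" is a weight matrix plus a nonlinearity, and depth comes from
# stacking several such layers. Sizes and weights here are arbitrary.
import numpy as np

rng = np.random.default_rng(1)

def layer(x, weights, biases):
    # Each neuron sums its weighted inputs and applies a nonlinearity (ReLU).
    return np.maximum(0.0, x @ weights + biases)

x = rng.normal(size=64)                                   # input, e.g. 64 pixel intensities

# Three stacked layers: this stacking is what makes the network "deep".
h1 = layer(x, rng.normal(size=(64, 32)), np.zeros(32))    # low-level features (edges, curves)
h2 = layer(h1, rng.normal(size=(32, 16)), np.zeros(16))   # higher-level features
out = h2 @ rng.normal(size=(16, 1))                       # final score, e.g. "cat?"

print(out.shape)  # (1,)
```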
AI and IoT are Inextricably Intertwined
I think of the relationship between AI and IoT much like the relationship between the human brain and body.
Our bodies collect sensory input such as sight, sound, and touch. Our brains take that data and make sense of it, turning light into recognizable objects and turning sounds into understandable speech. Our brains then make decisions, sending signals back out to the body to command movements like picking up an object or speaking.
All of the connected sensors that make up the Internet of Things are like our bodies: they provide the raw data of what’s going on in the world. Artificial intelligence is like our brain, making sense of that data and deciding what actions to perform. And the connected devices of IoT are again like our bodies, carrying out physical actions or communicating to others.
Unleashing Each Other’s Potential
The value and promise of AI and IoT are each being realized because of the other.
Machine learning and deep learning have led to huge leaps for AI in recent years. As mentioned above, machine learning and deep learning require massive amounts of data to work, and this data is being collected by the billions of sensors that are continuing to come online in the Internet of Things. IoT makes better AI.
Improving AI will also drive adoption of the Internet of Things, creating a virtuous cycle in which both areas will accelerate drastically. That’s because AI makes IoT useful.
On the industrial side, AI can be applied to predict when machines will need maintenance or analyze manufacturing processes to make big efficiency gains, saving millions of dollars.
On the consumer side, rather than having to adapt to technology, technology can adapt to us. Instead of clicking, typing, and searching, we can simply ask a machine for what we need. We might ask for information like the weather or for an action like preparing the house for bedtime (turning down the thermostat, locking the doors, turning off the lights, etc.).
Converging Technological Advancements Have Made this Possible
Shrinking computer chips and improved manufacturing techniques mean cheaper, more powerful sensors.
Quickly improving battery technology means those sensors can last for years without needing to be connected to a power source.
Wireless connectivity, driven by the advent of smartphones, means that data can be sent in high volume at cheap rates, allowing all those sensors to send data to the cloud.
And the birth of the cloud has allowed for virtually unlimited storage of that data and virtually infinite computational ability to process it.
Of course, there are one or two concerns about the impact of AI on our society and our future. But as advancements and adoption of both AI and IoT continue to accelerate, one thing is certain: the impact is going to be profound.
Original article here.
No one really knows how the most advanced algorithms do what they do. That could be a problem.
Last year, a strange self-driving car was released onto the quiet roads of Monmouth County, New Jersey. The experimental vehicle, developed by researchers at the chip maker Nvidia, didn’t look different from other autonomous cars, but it was unlike anything demonstrated by Google, Tesla, or General Motors, and it showed the rising power of artificial intelligence. The car didn’t follow a single instruction provided by an engineer or programmer. Instead, it relied entirely on an algorithm that had taught itself to drive by watching a human do it.
Getting a car to drive this way was an impressive feat. But it’s also a bit unsettling, since it isn’t completely clear how the car makes its decisions. Information from the vehicle’s sensors goes straight into a huge network of artificial neurons that process the data and then deliver the commands required to operate the steering wheel, the brakes, and other systems. The result seems to match the responses you’d expect from a human driver. But what if one day it did something unexpected—crashed into a tree, or sat at a green light? As things stand now, it might be difficult to find out why. The system is so complicated that even the engineers who designed it may struggle to isolate the reason for any single action. And you can’t ask it: there is no obvious way to design such a system so that it could always explain why it did what it did.
The mysterious mind of this vehicle points to a looming issue with artificial intelligence. The car’s underlying AI technology, known as deep learning, has proved very powerful at solving problems in recent years, and it has been widely deployed for tasks like image captioning, voice recognition, and language translation. There is now hope that the same techniques will be able to diagnose deadly diseases, make million-dollar trading decisions, and do countless other things to transform whole industries.
But this won’t happen—or shouldn’t happen—unless we find ways of making techniques like deep learning more understandable to their creators and accountable to their users. Otherwise it will be hard to predict when failures might occur—and it’s inevitable they will. That’s one reason Nvidia’s car is still experimental.
Already, mathematical models are being used to help determine who makes parole, who’s approved for a loan, and who gets hired for a job. If you could get access to these mathematical models, it would be possible to understand their reasoning. But banks, the military, employers, and others are now turning their attention to more complex machine-learning approaches that could make automated decision-making altogether inscrutable. Deep learning, the most common of these approaches, represents a fundamentally different way to program computers. “It is a problem that is already relevant, and it’s going to be much more relevant in the future,” says Tommi Jaakkola, a professor at MIT who works on applications of machine learning. “Whether it’s an investment decision, a medical decision, or maybe a military decision, you don’t want to just rely on a ‘black box’ method.”
There’s already an argument that being able to interrogate an AI system about how it reached its conclusions is a fundamental legal right. Starting in the summer of 2018, the European Union may require that companies be able to give users an explanation for decisions that automated systems reach. This might be impossible, even for systems that seem relatively simple on the surface, such as the apps and websites that use deep learning to serve ads or recommend songs. The computers that run those services have programmed themselves, and they have done it in ways we cannot understand. Even the engineers who build these apps cannot fully explain their behavior.
This raises mind-boggling questions. As the technology advances, we might soon cross some threshold beyond which using AI requires a leap of faith. Sure, we humans can’t always truly explain our thought processes either—but we find ways to intuitively trust and gauge people. Will that also be possible with machines that think and make decisions differently from the way a human would? We’ve never before built machines that operate in ways their creators don’t understand. How well can we expect to communicate—and get along with—intelligent machines that could be unpredictable and inscrutable? These questions took me on a journey to the bleeding edge of research on AI algorithms, from Google to Apple and many places in between, including a meeting with one of the great philosophers of our time.
In 2015, a research group at Mount Sinai Hospital in New York was inspired to apply deep learning to the hospital’s vast database of patient records. This data set features hundreds of variables on patients, drawn from their test results, doctor visits, and so on. The resulting program, which the researchers named Deep Patient, was trained using data from about 700,000 individuals, and when tested on new records, it proved incredibly good at predicting disease. Without any expert instruction, Deep Patient had discovered patterns hidden in the hospital data that seemed to indicate when people were on the way to a wide range of ailments, including cancer of the liver. There are a lot of methods that are “pretty good” at predicting disease from a patient’s records, says Joel Dudley, who leads the Mount Sinai team. But, he adds, “this was just way better.”
At the same time, Deep Patient is a bit puzzling. It appears to anticipate the onset of psychiatric disorders like schizophrenia surprisingly well. But since schizophrenia is notoriously difficult for physicians to predict, Dudley wondered how this was possible. He still doesn’t know. The new tool offers no clue as to how it does this. If something like Deep Patient is actually going to help doctors, it will ideally give them the rationale for its prediction, to reassure them that it is accurate and to justify, say, a change in the drugs someone is being prescribed. “We can build these models,” Dudley says ruefully, “but we don’t know how they work.”
Artificial intelligence hasn’t always been this way. From the outset, there were two schools of thought regarding how understandable, or explainable, AI ought to be. Many thought it made the most sense to build machines that reasoned according to rules and logic, making their inner workings transparent to anyone who cared to examine some code. Others felt that intelligence would more easily emerge if machines took inspiration from biology, and learned by observing and experiencing. This meant turning computer programming on its head. Instead of a programmer writing the commands to solve a problem, the program generates its own algorithm based on example data and a desired output. The machine-learning techniques that would later evolve into today’s most powerful AI systems followed the latter path: the machine essentially programs itself.
At first this approach was of limited practical use, and in the 1960s and ’70s it remained largely confined to the fringes of the field. Then the computerization of many industries and the emergence of large data sets renewed interest. That inspired the development of more powerful machine-learning techniques, especially new versions of one known as the artificial neural network. By the 1990s, neural networks could automatically digitize handwritten characters.
But it was not until the start of this decade, after several clever tweaks and refinements, that very large—or “deep”—neural networks demonstrated dramatic improvements in automated perception. Deep learning is responsible for today’s explosion of AI. It has given computers extraordinary powers, like the ability to recognize spoken words almost as well as a person could, a skill too complex to code into the machine by hand. Deep learning has transformed computer vision and dramatically improved machine translation. It is now being used to guide all sorts of key decisions in medicine, finance, manufacturing—and beyond.
The workings of any machine-learning technology are inherently more opaque, even to computer scientists, than a hand-coded system. This is not to say that all future AI techniques will be equally unknowable. But by its nature, deep learning is a particularly dark black box.
You can’t just look inside a deep neural network to see how it works. A network’s reasoning is embedded in the behavior of thousands of simulated neurons, arranged into dozens or even hundreds of intricately interconnected layers. The neurons in the first layer each receive an input, like the intensity of a pixel in an image, and then perform a calculation before outputting a new signal. These outputs are fed, in a complex web, to the neurons in the next layer, and so on, until an overall output is produced. Plus, there is a process known as back-propagation that tweaks the calculations of individual neurons in a way that lets the network learn to produce a desired output.
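To illustrate the loop this paragraph describes, here is a deliberately tiny sketch in Python: inputs flow forward through two layers, the output is compared to the desired label, and back-propagation nudges each connection to reduce the error. It is a toy under assumed sizes and data, not the architecture of Nvidia’s car or any real system.

```python
# Toy two-layer network trained with manual back-propagation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                        # 100 examples, 4 input signals
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]   # desired output: 1 or 0

W1 = rng.normal(size=(4, 8)) * 0.5                   # input -> hidden connections
W2 = rng.normal(size=(8, 1)) * 0.5                   # hidden -> output connections
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    # Forward pass: each layer computes outputs from the layer below.
    h = np.tanh(X @ W1)
    pred = sigmoid(h @ W2)

    # Compare the overall output to the correct labels.
    error = pred - y

    # Back-propagation: push the error back down and adjust each connection.
    delta_out = error * pred * (1 - pred)
    grad_W2 = h.T @ delta_out / len(X)
    grad_h = delta_out @ W2.T
    grad_W1 = X.T @ (grad_h * (1 - h ** 2)) / len(X)

    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print("final mean error:", float(np.abs(error).mean()))
```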
The many layers in a deep network enable it to recognize things at different levels of abstraction. In a system designed to recognize dogs, for instance, the lower layers recognize simple things like outlines or color; higher layers recognize more complex stuff like fur or eyes; and the topmost layer identifies it all as a dog. The same approach can be applied, roughly speaking, to other inputs that lead a machine to teach itself: the sounds that make up words in speech, the letters and words that create sentences in text, or the steering-wheel movements required for driving.
Ingenious strategies have been used to try to capture and thus explain in more detail what’s happening in such systems. In 2015, researchers at Google modified a deep-learning-based image recognition algorithm so that instead of spotting objects in photos, it would generate or modify them. By effectively running the algorithm in reverse, they could discover the features the program uses to recognize, say, a bird or building. The resulting images, produced by a project known as Deep Dream, showed grotesque, alien-like animals emerging from clouds and plants, and hallucinatory pagodas blooming across forests and mountain ranges. The images proved that deep learning need not be entirely inscrutable; they revealed that the algorithms home in on familiar visual features like a bird’s beak or feathers. But the images also hinted at how different deep learning is from human perception, in that it might make something out of an artifact that we would know to ignore. Google researchers noted that when its algorithm generated images of a dumbbell, it also generated a human arm holding it. The machine had concluded that an arm was part of the thing.
Further progress has been made using ideas borrowed from neuroscience and cognitive science. A team led by Jeff Clune, an assistant professor at the University of Wyoming, has employed the AI equivalent of optical illusions to test deep neural networks. In 2015, Clune’s group showed how certain images could fool such a network into perceiving things that aren’t there, because the images exploit the low-level patterns the system searches for. One of Clune’s collaborators, Jason Yosinski, also built a tool that acts like a probe stuck into a brain. His tool targets any neuron in the middle of the network and searches for the image that activates it the most. The images that turn up are abstract (imagine an impressionistic take on a flamingo or a school bus), highlighting the mysterious nature of the machine’s perceptual abilities.
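The “probe” idea can be sketched in a few lines: pick a unit inside a network and use gradient ascent on the input to find a pattern that activates it strongly. The tiny random network below is an assumed stand-in for a real trained vision model, so the resulting “image” is meaningless; only the procedure is the point, and it is not a reconstruction of Yosinski’s actual tool.

```python
# Hedged sketch of activation maximization for a chosen unit.
import torch

torch.manual_seed(0)

# Stand-in network (assumption): two linear layers over a flattened 16x16 image.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

target_unit = 3                        # the "neuron" we want to probe
image = torch.zeros(1, 256, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    activation = model(image)[0, target_unit]
    (-activation).backward()           # ascend the activation by descending its negative
    optimizer.step()

print("final activation:", float(model(image)[0, target_unit]))
```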
We need more than a glimpse of AI’s thinking, however, and there is no easy solution. It is the interplay of calculations inside a deep neural network that is crucial to higher-level pattern recognition and complex decision-making, but those calculations are a quagmire of mathematical functions and variables. “If you had a very small neural network, you might be able to understand it,” Jaakkola says. “But once it becomes very large, and it has thousands of units per layer and maybe hundreds of layers, then it becomes quite un-understandable.”
In the office next to Jaakkola is Regina Barzilay, an MIT professor who is determined to apply machine learning to medicine. She was diagnosed with breast cancer a couple of years ago, at age 43. The diagnosis was shocking in itself, but Barzilay was also dismayed that cutting-edge statistical and machine-learning methods were not being used to help with oncological research or to guide patient treatment. She says AI has huge potential to revolutionize medicine, but realizing that potential will mean going beyond just medical records. She envisions using more of the raw data that she says is currently underutilized: “imaging data, pathology data, all this information.”
After she finished cancer treatment last year, Barzilay and her students began working with doctors at Massachusetts General Hospital to develop a system capable of mining pathology reports to identify patients with specific clinical characteristics that researchers might want to study. However, Barzilay understood that the system would need to explain its reasoning. So, together with Jaakkola and a student, she added a step: the system extracts and highlights snippets of text that are representative of a pattern it has discovered. Barzilay and her students are also developing a deep-learning algorithm capable of finding early signs of breast cancer in mammogram images, and they aim to give this system some ability to explain its reasoning, too. “You really need to have a loop where the machine and the human collaborate,” Barzilay says.
The U.S. military is pouring billions into projects that will use machine learning to pilot vehicles and aircraft, identify targets, and help analysts sift through huge piles of intelligence data. Here more than anywhere else, even more than in medicine, there is little room for algorithmic mystery, and the Department of Defense has identified explainability as a key stumbling block.
David Gunning, a program manager at the Defense Advanced Research Projects Agency, is overseeing the aptly named Explainable Artificial Intelligence program. A silver-haired veteran of the agency who previously oversaw the DARPA project that eventually led to the creation of Siri, Gunning says automation is creeping into countless areas of the military. Intelligence analysts are testing machine learning as a way of identifying patterns in vast amounts of surveillance data. Many autonomous ground vehicles and aircraft are being developed and tested. But soldiers probably won’t feel comfortable in a robotic tank that doesn’t explain itself to them, and analysts will be reluctant to act on information without some reasoning. “It’s often the nature of these machine-learning systems that they produce a lot of false alarms, so an intel analyst really needs extra help to understand why a recommendation was made,” Gunning says.
This March, DARPA chose 13 projects from academia and industry for funding under Gunning’s program. Some of them could build on work led by Carlos Guestrin, a professor at the University of Washington. He and his colleagues have developed a way for machine-learning systems to provide a rationale for their outputs. Essentially, under this method a computer automatically finds a few examples from a data set and serves them up in a short explanation. A system designed to classify an e-mail message as coming from a terrorist, for example, might use many millions of messages in its training and decision-making. But using the Washington team’s approach, it could highlight certain keywords found in a message. Guestrin’s group has also devised ways for image recognition systems to hint at their reasoning by highlighting the parts of an image that were most significant.
One drawback to this approach and others like it, such as Barzilay’s, is that the explanations provided will always be simplified, meaning some vital information may be lost along the way. “We haven’t achieved the whole dream, which is where AI has a conversation with you, and it is able to explain,” says Guestrin. “We’re a long way from having truly interpretable AI.”
It doesn’t have to be a high-stakes situation like cancer diagnosis or military maneuvers for this to become an issue. Knowing AI’s reasoning is also going to be crucial if the technology is to become a common and useful part of our daily lives. Tom Gruber, who leads the Siri team at Apple, says explainability is a key consideration for his team as it tries to make Siri a smarter and more capable virtual assistant. Gruber wouldn’t discuss specific plans for Siri’s future, but it’s easy to imagine that if you receive a restaurant recommendation from Siri, you’ll want to know what the reasoning was. Ruslan Salakhutdinov, director of AI research at Apple and an associate professor at Carnegie Mellon University, sees explainability as the core of the evolving relationship between humans and intelligent machines. “It’s going to introduce trust,” he says.
Just as many aspects of human behavior are impossible to explain in detail, perhaps it won’t be possible for AI to explain everything it does. “Even if somebody can give you a reasonable-sounding explanation [for his or her actions], it probably is incomplete, and the same could very well be true for AI,” says Clune, of the University of Wyoming. “It might just be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual, or subconscious, or inscrutable.”
If that’s so, then at some stage we may have to simply trust AI’s judgment or do without using it. Likewise, that judgment will have to incorporate social intelligence. Just as society is built upon a contract of expected behavior, we will need to design AI systems to respect and fit with our social norms. If we are to create robot tanks and other killing machines, it is important that their decision-making be consistent with our ethical judgments.
To probe these metaphysical concepts, I went to Tufts University to meet with Daniel Dennett, a renowned philosopher and cognitive scientist who studies consciousness and the mind. A chapter of Dennett’s latest book, From Bacteria to Bach and Back, an encyclopedic treatise on consciousness, suggests that a natural part of the evolution of intelligence itself is the creation of systems capable of performing tasks their creators do not know how to do. “The question is, what accommodations do we have to make to do this wisely—what standards do we demand of them, and of ourselves?” he tells me in his cluttered office on the university’s idyllic campus.
He also has a word of warning about the quest for explainability. “I think by all means if we’re going to use these things and rely on them, then let’s get as firm a grip on how and why they’re giving us the answers as possible,” he says. But since there may be no perfect answer, we should be as cautious of AI explanations as we are of each other’s—no matter how clever a machine seems. “If it can’t do better than us at explaining what it’s doing,” he says, “then don’t trust it.”
Original article here.
A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.
Even as machines known as “deep neural networks” have learned to converse, drive cars, beat video games and Go champions, dream, paint pictures and help make scientific discoveries, they have also confounded their human creators, who never expected so-called “deep-learning” algorithms to work so well. No underlying principle has guided the design of these learning systems, other than vague inspiration drawn from the architecture of the brain (and no one really understands how that operates either).
Like a brain, a deep neural network has layers of neurons — artificial ones that are figments of computer memory. When a neuron fires, it sends signals to connected neurons in the layer above. During deep learning, connections in the network are strengthened or weakened as needed to make the system better at sending signals from input data — the pixels of a photo of a dog, for instance — up through the layers to neurons associated with the right high-level concepts, such as “dog.” After a deep neural network has “learned” from thousands of sample dog photos, it can identify dogs in new photos as accurately as people can. The magic leap from special cases to general concepts during learning gives deep neural networks their power, just as it underlies human reasoning, creativity and the other faculties collectively termed “intelligence.” Experts wonder what it is about deep learning that enables generalization — and to what extent brains apprehend reality in the same way.
Last month, a YouTube video of a conference talk in Berlin, shared widely among artificial-intelligence researchers, offered a possible answer. In the talk, Naftali Tishby, a computer scientist and neuroscientist from the Hebrew University of Jerusalem, presented evidence in support of a new theory explaining how deep learning works. Tishby argues that deep neural networks learn according to a procedure called the “information bottleneck,” which he and two collaborators first described in purely theoretical terms in 1999. The idea is that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts. Striking new computer experiments by Tishby and his student Ravid Shwartz-Ziv reveal how this squeezing procedure happens during deep learning, at least in the cases they studied.
Tishby’s findings have the AI community buzzing. “I believe that the information bottleneck idea could be very important in future deep neural network research,” said Alex Alemi of Google Research, who has already developed new approximation methods for applying an information bottleneck analysis to large deep neural networks. The bottleneck could serve “not only as a theoretical tool for understanding why our neural networks work as well as they do currently, but also as a tool for constructing new objectives and architectures of networks,” Alemi said.
Some researchers remain skeptical that the theory fully accounts for the success of deep learning, but Kyle Cranmer, a particle physicist at New York University who uses machine learning to analyze particle collisions at the Large Hadron Collider, said that as a general principle of learning, it “somehow smells right.”
Geoffrey Hinton, a pioneer of deep learning who works at Google and the University of Toronto, emailed Tishby after watching his Berlin talk. “It’s extremely interesting,” Hinton wrote. “I have to listen to it another 10,000 times to really understand it, but it’s very rare nowadays to hear a talk with a really original idea in it that may be the answer to a really major puzzle.”
According to Tishby, who views the information bottleneck as a fundamental principle behind learning, whether you’re an algorithm, a housefly, a conscious being, or a physics calculation of emergent behavior, that long-awaited answer “is that the most important part of learning is actually forgetting.”
Tishby began contemplating the information bottleneck around the time that other researchers were first mulling over deep neural networks, though neither concept had been named yet. It was the 1980s, and Tishby was thinking about how good humans are at speech recognition — a major challenge for AI at the time. Tishby realized that the crux of the issue was the question of relevance: What are the most relevant features of a spoken word, and how do we tease these out from the variables that accompany them, such as accents, mumbling and intonation? In general, when we face the sea of data that is reality, which signals do we keep?
“This notion of relevant information was mentioned many times in history but never formulated correctly,” Tishby said in an interview last month. “For many years people thought information theory wasn’t the right way to think about relevance, starting with misconceptions that go all the way to Shannon himself.”
Claude Shannon, the founder of information theory, in a sense liberated the study of information starting in the 1940s by allowing it to be considered in the abstract — as 1s and 0s with purely mathematical meaning. Shannon took the view that, as Tishby put it, “information is not about semantics.” But, Tishby argued, this isn’t true. Using information theory, he realized, “you can define ‘relevant’ in a precise sense.”
Imagine X is a complex data set, like the pixels of a dog photo, and Y is a simpler variable represented by those data, like the word “dog.” You can capture all the “relevant” information in X about Y by compressing X as much as you can without losing the ability to predict Y. In their 1999 paper, Tishby and co-authors Fernando Pereira, now at Google, and William Bialek, now at Princeton University, formulated this as a mathematical optimization problem. It was a fundamental idea with no killer application.
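In the common presentation of that 1999 formulation, the trade-off is written as an objective over a compressed representation T of X; the notation below follows the usual statement of the idea and is a paraphrase, not a quote from the paper.

```latex
% Information bottleneck: find a compressed representation T of X that keeps
% as much information about Y as possible. I(.;.) is mutual information and
% \beta > 0 sets the trade-off between compression and prediction.
\min_{p(t \mid x)} \; \mathcal{L} \;=\; I(X;T) \;-\; \beta \, I(T;Y)
```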
“I’ve been thinking along these lines in various contexts for 30 years,” Tishby said. “My only luck was that deep neural networks became so important.”
Eyeballs on Faces on People on Scenes
Though the concept behind deep neural networks had been kicked around for decades, their performance in tasks like speech and image recognition only took off in the early 2010s, due to improved training regimens and more powerful computer processors. Tishby recognized their potential connection to the information bottleneck principle in 2014 after reading a surprising paper by the physicists David Schwab and Pankaj Mehta.
The duo discovered that a deep-learning algorithm invented by Hinton called the “deep belief net” works, in a particular case, exactly like renormalization, a technique used in physics to zoom out on a physical system by coarse-graining over its details and calculating its overall state. When Schwab and Mehta applied the deep belief net to a model of a magnet at its “critical point,” where the system is fractal, or self-similar at every scale, they found that the network automatically used the renormalization-like procedure to discover the model’s state. It was a stunning indication that, as the biophysicist Ilya Nemenman said at the time, “extracting relevant features in the context of statistical physics and extracting relevant features in the context of deep learning are not just similar words, they are one and the same.”
The only problem is that, in general, the real world isn’t fractal. “The natural world is not ears on ears on ears on ears; it’s eyeballs on faces on people on scenes,” Cranmer said. “So I wouldn’t say [the renormalization procedure] is why deep learning on natural images is working so well.” But Tishby, who at the time was undergoing chemotherapy for pancreatic cancer, realized that both deep learning and the coarse-graining procedure could be encompassed by a broader idea. “Thinking about science and about the role of my old ideas was an important part of my healing and recovery,” he said.
In 2015, he and his student Noga Zaslavsky hypothesized that deep learning is an information bottleneck procedure that compresses noisy data as much as possible while preserving information about what the data represent. Tishby and Shwartz-Ziv’s new experiments with deep neural networks reveal how the bottleneck procedure actually plays out. In one case, the researchers used small networks that could be trained to label input data with a 1 or 0 (think “dog” or “no dog”) and gave their 282 neural connections random initial strengths. They then tracked what happened as the networks engaged in deep learning with 3,000 sample input data sets.
The basic algorithm used in the majority of deep-learning procedures to tweak neural connections in response to data is called “stochastic gradient descent”: Each time the training data are fed into the network, a cascade of firing activity sweeps upward through the layers of artificial neurons. When the signal reaches the top layer, the final firing pattern can be compared to the correct label for the image — 1 or 0, “dog” or “no dog.” Any differences between this firing pattern and the correct pattern are “back-propagated” down the layers, meaning that, like a teacher correcting an exam, the algorithm strengthens or weakens each connection to make the network layer better at producing the correct output signal. Over the course of training, common patterns in the training data become reflected in the strengths of the connections, and the network becomes expert at correctly labeling the data, such as by recognizing a dog, a word, or a 1.
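A compact illustration of that loop is sketched below in Python with PyTorch. The toy data and small network are assumptions standing in for “dog”/“no dog” images; only the structure of the stochastic-gradient-descent loop matters here, not the specifics.

```python
# Hypothetical SGD training loop on toy data.
import torch

torch.manual_seed(0)
X = torch.randn(3000, 12)                          # 3,000 toy input examples
y = (X[:, 0] > 0).float().unsqueeze(1)             # label: 1 ("dog") or 0 ("no dog")

net = torch.nn.Sequential(
    torch.nn.Linear(12, 8), torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
)
loss_fn = torch.nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(net.parameters(), lr=0.1)

for epoch in range(20):
    for i in range(0, len(X), 64):                 # feed the data in small batches
        xb, yb = X[i:i + 64], y[i:i + 64]
        opt.zero_grad()
        loss = loss_fn(net(xb), yb)                # compare the output to the correct label
        loss.backward()                            # back-propagate the error down the layers
        opt.step()                                 # strengthen or weaken each connection

print("final loss:", float(loss))
```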
In their experiments, Tishby and Shwartz-Ziv tracked how much information each layer of a deep neural network retained about the input data and how much information each one retained about the output label. The scientists found that, layer by layer, the networks converged to the information bottleneck theoretical bound: a theoretical limit derived in Tishby, Pereira and Bialek’s original paper that represents the absolute best the system can do at extracting relevant information. At the bound, the network has compressed the input as much as possible without sacrificing the ability to accurately predict its label.
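As a rough idea of how “information retained” can be measured, the sketch below estimates mutual information by discretizing two signals into bins and comparing the joint distribution with the product of its marginals. The bin count, the stand-in signals, and other details are assumptions; the published experiments use a similar binning approach but differ in specifics.

```python
# Histogram-based estimate of mutual information between two 1-D signals.
import numpy as np

def mutual_information(a, b, bins=30):
    # Estimate I(A;B) in bits by discretizing both signals into bins.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0
    return float((pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                               # stand-in for an input signal
t = np.tanh(2.0 * x + 0.1 * rng.normal(size=10_000))      # stand-in for a layer activation

print("I(X;T) ~", round(mutual_information(x, t), 3), "bits")
```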
Tishby and Shwartz-Ziv also made the intriguing discovery that deep learning proceeds in two phases: a short “fitting” phase, during which the network learns to label its training data, and a much longer “compression” phase, during which it becomes good at generalization, as measured by its performance at labeling new test data.
As a deep neural network tweaks its connections by stochastic gradient descent, at first the number of bits it stores about the input data stays roughly constant or increases slightly, as connections adjust to encode patterns in the input and the network gets good at fitting labels to it. Some experts have compared this phase to memorization.
Then learning switches to the compression phase. The network starts to shed information about the input data, keeping track of only the strongest features — those correlations that are most relevant to the output label. This happens because, in each iteration of stochastic gradient descent, more or less accidental correlations in the training data tell the network to do different things, dialing the strengths of its neural connections up and down in a random walk. This randomization is effectively the same as compressing the system’s representation of the input data. As an example, some photos of dogs might have houses in the background, while others don’t. As a network cycles through these training photos, it might “forget” the correlation between houses and dogs in some photos as other photos counteract it. It’s this forgetting of specifics, Tishby and Shwartz-Ziv argue, that enables the system to form general concepts. Indeed, their experiments revealed that deep neural networks ramp up their generalization performance during the compression phase, becoming better at labeling test data. (A deep neural network trained to recognize dogs in photos might be tested on new photos that may or may not include dogs, for instance.)
It remains to be seen whether the information bottleneck governs all deep-learning regimes, or whether there are other routes to generalization besides compression. Some AI experts see Tishby’s idea as one of many important theoretical insights about deep learning to have emerged recently. Andrew Saxe, an AI researcher and theoretical neuroscientist at Harvard University, noted that certain very large deep neural networks don’t seem to need a drawn-out compression phase in order to generalize well. Instead, researchers program in something called early stopping, which cuts training short to prevent the network from encoding too many correlations in the first place.
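For readers unfamiliar with early stopping, a minimal sketch of the idea follows: keep training only while performance on held-out validation data keeps improving. The two placeholder functions are purely illustrative, not a real training API.

```python
# Minimal early-stopping sketch with placeholder training and validation steps.
import random

random.seed(0)

def train_one_epoch():
    pass  # placeholder for one pass of SGD over the training data

def validation_loss(epoch):
    # Placeholder: pretend validation loss improves, then plateaus with noise.
    return max(0.2, 1.0 - 0.05 * epoch) + 0.01 * random.random()

best_loss = float("inf")
patience, bad_epochs = 5, 0

for epoch in range(1000):
    train_one_epoch()
    loss = validation_loss(epoch)
    if loss < best_loss:
        best_loss, bad_epochs = loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # no improvement for `patience` epochs
            break                    # stop before the network encodes too many spurious correlations

print("stopped at epoch", epoch, "with best validation loss", round(best_loss, 3))
```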
Tishby argues that the network models analyzed by Saxe and his colleagues differ from standard deep neural network architectures, but that nonetheless, the information bottleneck theoretical bound defines these networks’ generalization performance better than other methods. Questions about whether the bottleneck holds up for larger neural networks are partly addressed by Tishby and Shwartz-Ziv’s most recent experiments, not included in their preliminary paper, in which they train much larger, 330,000-connection deep neural networks to recognize handwritten digits in the 60,000-image Modified National Institute of Standards and Technology database, a well-known benchmark for gauging the performance of deep-learning algorithms. The scientists saw the same convergence of the networks to the information bottleneck theoretical bound; they also observed the two distinct phases of deep learning, separated by an even sharper transition than in the smaller networks. “I’m completely convinced now that this is a general phenomenon,” Tishby said.
Humans and Machines
The mystery of how brains sift signals from our senses and elevate them to the level of our conscious awareness drove much of the early interest in deep neural networks among AI pioneers, who hoped to reverse-engineer the brain’s learning rules. AI practitioners have since largely abandoned that path in the mad dash for technological progress, instead slapping on bells and whistles that boost performance with little regard for biological plausibility. Still, as their thinking machines achieve ever greater feats — even stoking fears that AI could someday pose an existential threat — many researchers hope these explorations will uncover general insights about learning and intelligence.
Brenden Lake, an assistant professor of psychology and data science at New York University who studies similarities and differences in how humans and machines learn, said that Tishby’s findings represent “an important step towards opening the black box of neural networks,” but he stressed that the brain represents a much bigger, blacker black box. Our adult brains, which boast several hundred trillion connections between 86 billion neurons, in all likelihood employ a bag of tricks to enhance generalization, going beyond the basic image- and sound-recognition learning procedures that occur during infancy and that may in many ways resemble deep learning.
For instance, Lake said the fitting and compression phases that Tishby identified don’t seem to have analogues in the way children learn handwritten characters, which he studies. Children don’t need to see thousands of examples of a character and compress their mental representation over an extended period of time before they’re able to recognize other instances of that letter and write it themselves. In fact, they can learn from a single example. Lake and his colleagues’ models suggest the brain may deconstruct the new letter into a series of strokes — previously existing mental constructs — allowing the conception of the letter to be tacked onto an edifice of prior knowledge. “Rather than thinking of an image of a letter as a pattern of pixels and learning the concept as mapping those features” as in standard machine-learning algorithms, Lake explained, “instead I aim to build a simple causal model of the letter,” a shorter path to generalization.
Such brainy ideas might hold lessons for the AI community, furthering the back-and-forth between the two fields. Tishby believes his information bottleneck theory will ultimately prove useful in both disciplines, even if it takes a more general form in human learning than in AI. One immediate insight that can be gleaned from the theory is a better understanding of which kinds of problems can be solved by real and artificial neural networks. “It gives a complete characterization of the problems that can be learned,” Tishby said. These are “problems where I can wipe out noise in the input without hurting my ability to classify. This is natural vision problems, speech recognition. These are also precisely the problems our brain can cope with.”
Meanwhile, both real and artificial neural networks stumble on problems in which every detail matters and minute differences can throw off the whole result. Most people can’t quickly multiply two large numbers in their heads, for instance. “We have a long class of problems like this, logical problems that are very sensitive to changes in one variable,” Tishby said. “Classifiability, discrete problems, cryptographic problems. I don’t think deep learning will ever help me break cryptographic codes.”
Generalizing — traversing the information bottleneck, perhaps — means leaving some details behind. This isn’t so good for doing algebra on the fly, but that’s not a brain’s main business. We’re looking for familiar faces in the crowd, order in chaos, salient signals in a noisy world.
Original article here.
The market for artificial intelligence (AI) technologies is flourishing. Beyond the hype and the heightened media attention, the numerous startups and the internet giants racing to acquire them, there is a significant increase in investment and adoption by enterprises. A Narrative Science survey found last year that 38% of enterprises are already using AI, growing to 62% by 2018. Forrester Research predicted a greater than 300% increase in investment in artificial intelligence in 2017 compared with 2016. IDC estimated that the AI market will grow from $8 billion in 2016 to more than $47 billion in 2020.
Coined in 1955 to describe a new computer science sub-discipline, “Artificial Intelligence” today includes a variety of technologies and tools, some time-tested, others relatively new. To help make sense of what’s hot and what’s not, Forrester just published a TechRadar report on Artificial Intelligence (for application development professionals), a detailed analysis of 13 technologies enterprises should consider adopting to support human decision-making.
Based on Forrester’s analysis, here’s my list of the 10 hottest AI technologies:
- Natural Language Generation: Producing text from computer data. Currently used in customer service, report generation, and summarizing business intelligence insights. Sample vendors: Attivio, Automated Insights, Cambridge Semantics, Digital Reasoning, Lucidworks, Narrative Science, SAS, Yseop.
- Speech Recognition: Transcribe and transform human speech into a format useful for computer applications. Currently used in interactive voice response systems and mobile applications. Sample vendors: NICE, Nuance Communications, OpenText, Verint Systems.
- Virtual Agents: “The current darling of the media,” says Forrester (I believe they refer to my evolving relationships with Alexa), from simple chatbots to advanced systems that can network with humans. Currently used in customer service and support and as a smart home manager. Sample vendors: Amazon, Apple, Artificial Solutions, Assist AI, Creative Virtual, Google, IBM, IPsoft, Microsoft, Satisfi.
- Machine Learning Platforms: Providing algorithms, APIs, development and training toolkits, data, as well as computing power to design, train, and deploy models into applications, processes, and other machines. Currently used in a wide range of enterprise applications, mostly involving prediction or classification. Sample vendors: Amazon, Fractal Analytics, Google, H2O.ai, Microsoft, SAS, Skytree.
- AI-optimized Hardware: Graphics processing units (GPU) and appliances specifically designed and architected to efficiently run AI-oriented computational jobs. Currently primarily making a difference in deep learning applications. Sample vendors: Alluviate, Cray, Google, IBM, Intel, Nvidia.
- Decision Management: Engines that insert rules and logic into AI systems and are used for initial setup/training and ongoing maintenance and tuning. A mature technology, it is used in a wide variety of enterprise applications, assisting in or performing automated decision-making. Sample vendors: Advanced Systems Concepts, Informatica, Maana, Pegasystems, UiPath.
- Deep Learning Platforms: A special type of machine learning consisting of artificial neural networks with multiple abstraction layers. Currently primarily used in pattern recognition and classification applications supported by very large data sets. Sample vendors: Deep Instinct, Ersatz Labs, Fluid AI, MathWorks, Peltarion, Saffron Technology, Sentient Technologies.
- Biometrics: Enable more natural interactions between humans and machines, including but not limited to image and touch recognition, speech, and body language. Currently used primarily in market research. Sample vendors: 3VR, Affectiva, Agnitio, FaceFirst, Sensory, Synqera, Tahzoo.
- Robotic Process Automation: Using scripts and other methods to automate human action to support efficient business processes. Currently used where it’s too expensive or inefficient for humans to execute a task or a process. Sample vendors: Advanced Systems Concepts, Automation Anywhere, Blue Prism, UiPath, WorkFusion.
- Text Analytics and NLP: Natural language processing (NLP) uses and supports text analytics by facilitating the understanding of sentence structure and meaning, sentiment, and intent through statistical and machine learning methods. Currently used in fraud detection and security, a wide range of automated assistants, and applications for mining unstructured data. Sample vendors: Basis Technology, Coveo, Expert System, Indico, Knime, Lexalytics, Linguamatics, Mindbreeze, Sinequa, Stratifyd, Synapsify.
There are certainly many business benefits gained from AI technologies today, but according to a survey Forrester conducted last year, there are also obstacles to AI adoption as expressed by companies with no plans of investing in AI:
- There is no defined business case: 42%
- Not clear what AI can be used for: 39%
- Don’t have the required skills: 33%
- Need first to invest in modernizing the data management platform: 29%
- Don’t have the budget: 23%
- Not certain what is needed for implementing an AI system: 19%
- AI systems are not proven: 14%
- Do not have the right processes or governance: 13%
- AI is a lot of hype with little substance: 11%
- Don’t own or have access to the required data: 8%
- Not sure what AI means: 3%
Once enterprises overcome these obstacles, Forrester concludes, they stand to gain from AI driving accelerated transformation in customer-facing applications and developing an interconnected web of enterprise intelligence.
Original article here.
Fei-Fei Li is a big deal in the world of AI. As the director of the Artificial Intelligence and Vision labs at Stanford University, she oversaw the creation of ImageNet, a vast database of images designed to accelerate the development of AI that can “see.” And, well, it worked, helping to drive the creation of deep learning systems that can recognize objects, animals, people, and even entire scenes in photos—technology that has become commonplace on the world’s biggest photo-sharing sites. Now, Fei-Fei will help run a brand new AI group inside Google, a move that reflects just how aggressively the world’s biggest tech companies are remaking themselves around this breed of artificial intelligence.
Alongside a former Stanford researcher—Jia Li, who more recently ran research for the social networking service Snapchat—the China-born Fei-Fei will lead a team inside Google’s cloud computing operation, building online services that any coder or company can use to build their own AI. This new Cloud Machine Learning Group is the latest example of AI not only re-shaping the technology that Google uses, but also changing how the company organizes and operates its business.
Google is not alone in this rapid re-orientation. Amazon is building a similar cloud computing group for AI. Facebook and Twitter have created internal groups akin to Google Brain, the team responsible for infusing the search giant’s own tech with AI. And in recent weeks, Microsoft reorganized much of its operation around its existing machine learning work, creating a new AI and research group under executive vice president Harry Shum, who began his career as a computer vision researcher.
Oren Etzioni, CEO of the not-for-profit Allen Institute for Artificial Intelligence, says that these changes are partly about marketing—efforts to ride the AI hype wave. Google, for example, is focusing public attention on Fei-Fei’s new group because that’s good for the company’s cloud computing business. But Etzioni says this is also part of a very real shift inside these companies, with AI poised to play an increasingly large role in our future. “This isn’t just window dressing,” he says.
The New Cloud
Fei-Fei’s group is an effort to solidify Google’s position on a new front in the AI wars. The company is challenging rivals like Amazon, Microsoft, and IBM in building cloud computing services specifically designed for artificial intelligence work. This includes services not just for image recognition, but speech recognition, machine-driven translation, natural language understanding, and more.
Cloud computing doesn’t always get the same attention as consumer apps and phones, but it could come to dominate the balance sheet at these giant companies. Even Amazon and Google, known for their consumer-oriented services, believe that cloud computing could eventually become their primary source of revenue. And in the years to come, AI services will play right into the trend, providing tools that allow a world of businesses to build machine learning services they couldn’t build on their own. Iddo Gino, CEO of RapidAPI, a company that helps businesses use such services, says they have already reached thousands of developers, with image recognition services leading the way.
When it announced Fei-Fei’s appointment last week, Google unveiled new versions of cloud services that offer image and speech recognition as well as machine-driven translation. And the company said it will soon offer a service that allows others access to vast farms of GPU processors, the chips that are essential to running deep neural networks. This came just weeks after Amazon hired a notable Carnegie Mellon researcher to run its own cloud computing group for AI—and just a day after Microsoft formally unveiled new services for building “chatbots” and announced a deal to provide GPU services to OpenAI, the AI lab established by Tesla founder Elon Musk and Y Combinator president Sam Altman.
The New Microsoft
Even as they move to provide AI to others, these big internet players are looking to significantly accelerate the progress of artificial intelligence across their own organizations. In late September, Microsoft announced the formation of a new group under Shum called the Microsoft AI and Research Group. Shum will oversee more than 5,000 computer scientists and engineers focused on efforts to push AI into the company’s products, including the Bing search engine, the Cortana digital assistant, and Microsoft’s forays into robotics.
The company had already reorganized its research group to move new technologies into products more quickly. With AI, Shum says, the company aims to move even faster. In recent months, Microsoft pushed its chatbot work out of research and into live products—though not quite successfully. Still, it’s the path from research to product the company hopes to accelerate in the years to come.
“With AI, we don’t really know what the customer expectation is,” Shum says. By moving research closer to the team that actually builds the products, the company believes it can develop a better understanding of how AI can do things customers truly want.
The New Brains
In similar fashion, Google, Facebook, and Twitter have already formed central AI teams designed to spread artificial intelligence throughout their companies. The Google Brain team began as a project inside the Google X lab under another former Stanford computer science professor, Andrew Ng, now chief scientist at Baidu. The team provides well-known services such as image recognition for Google Photos and speech recognition for Android. But it also works with potentially any group at Google, such as the company’s security teams, which are looking for ways to identify security bugs and malware through machine learning.
Facebook, meanwhile, runs its own AI research lab as well as a Brain-like team known as the Applied Machine Learning Group. Its mission is to push AI across the entire family of Facebook products, and according to chief technology officer Mike Schroepfer, it’s already working: one in five Facebook engineers now make use of machine learning. Schroepfer calls the tools built by Facebook’s Applied ML group “a big flywheel that has changed everything” inside the company. “When they build a new model or build a new technique, it immediately gets used by thousands of people working on products that serve billions of people,” he says. Twitter has built a similar team, called Cortex, after acquiring several AI startups.
The New Education
The trouble for all of these companies is that finding the talent needed to drive all this AI work can be difficult. Given that deep neural networking has only recently entered the mainstream, only so many Fei-Fei Lis exist to go around. Everyday coders won’t do. Deep neural networking is a very different way of building computer services. Rather than coding software to behave a certain way, engineers coax results from vast amounts of data—more like a coach than a player.
As a result, these big companies are also working to retrain their employees in this new way of doing things. As it revealed last spring, Google is now running internal classes in the art of deep learning, and Facebook offers machine learning instruction to all engineers inside the company alongside a formal program that allows employees to become full-time AI researchers.
Yes, artificial intelligence is all the buzz in the tech industry right now, which can make it feel like a passing fad. But inside Google and Microsoft and Amazon, it’s certainly not. And these companies are intent on pushing it across the rest of the tech world too.
Original article here.