Peter Hopkins:
Among other projects - you're doing lots of stuff - you get
involved in some very heady questions about the origins of truth
on the internet.
And this is where we're getting to, folks, because
the work that Danny's describing now in theory ultimately became
a venture, right? Metaweb.
Danny Hillis: So that's right.
So what I really thought is that
what we need to do is have a way of representing the knowledge
of the world in a way that machines can get at it and take
advantage of it - and that that should be shared.
Everybody
should be able to get at it. That is, in some sense, if human
knowledge isn't a shared resource - then what is? I mean what
has civilization been doing all these years?
So I created a
company that built this database called Freebase.
It was a free
database. And the company basically took any kind of public
knowledge that we could get, information about anything and put
it in machine-readable format.
We were kind of creating it with the idea that this is going to be
useful to the world. We didn't really have a business model. And
we started building it up, and then it became useful to lots of
different people including particularly all the search engines.
So eventually Google bought it, of course. And then I got Google
to agree to keep it open for three years, but they only kept
open the part that was already open, and they started building
it up.
And so now Google has something called the Knowledge Graph,
which is the evolution of this. And it probably has about 100
billion different entities. So everybody in this room is in that
graph. This building is in that graph.
Peter Hopkins: Yes, I took a screenshot earlier of when you just
Googled NeueHouse, and all of these different...
Danny Hillis: That's right. NeueHouse is obviously in the graph.
So this event is, and yes. So anything like a person, a place,
an event.
Anything like that is in this huge knowledge base, and
all the relationships between them are. So when you, for
instance, print out a Google map, that is rendered from the
Knowledge Graph.
So the Knowledge Graph knows the bus schedules
and it knows the address of the restaurant and the traffic.
Peter Hopkins: It's drawing all this information together around
the thing that the searcher cares about.
Danny Hillis: That's right.
So the map is just in some sense a
custom rendering of a piece of the Knowledge Graph for your
particular purpose.
And also, by the way - this one
doesn't have any ads on it - but the other thing is that the ads
also draw on a lot of Knowledge Graph data about what the
products are about, and it probably has knowledge about you,
specifically, and so on.
So it's gone way beyond the kind of
public knowledge; it probably has very particular
private knowledge about people too.
Peter Hopkins: Now, from Google's perspective it's safe to say
that this is a quantum leap in terms of the original basis of
its citation-based search model.
All of a sudden it is now
providing this multidimensional search that is drawing in way
more richness.
Danny Hillis: It still does the old kind of search.
So right now,
let's say I put in "museums in New York." Well, it still does
the old keyword search for pages that have the word "museum" and
the phrase "New York." But if you say "an exhibition in
Manhattan" or something, you might have something that's a
museum in New York that didn't actually use the words "museum"
and "New York" on the page.
But the Knowledge Graph
knows that Manhattan is in New York, and it knows that
exhibitions are in museums, or may know something is a museum
even if it doesn't use the word museum in its title.
And so it's actually able to pick that up even though the page
doesn't have the keyword. So that will play into the search
results that come up. It does a search that's based on the
semantics. And, of course, that's very important because that
kind of knowledge is completely language independent too.
So the
same knowledge that informs your search in English also informs
somebody's search in Mandarin or Hindi or something like that.
So the good news is it's turned out to be really useful. There
are these big representations of knowledge.
But the bad news is
the whole idea of it being this free, open thing that everybody
was going to use has actually become really just something that
is a competitive advantage of Google, and now other search
engines and other companies will make their own, I'm sure. Apple
is working on it, Amazon, you know. Each of the big companies -
IBM, Microsoft. They'll each work on their own database.
So the world could go in one of two directions:
- we could either have this sort of oligarchy of big companies
  that have giant knowledge bases that they use for proprietary
  advantage
- or it could flip over and become a public resource, where we
  say, "We want knowledge to be a public resource. And we want,
  in particular, knowledge that's tied to who said what,"
...because this doesn't represent truth, remember! It
represents who said stuff, and that then becomes a resource for
doing things like sorting out what's fake news or deciding what
medical treatments, what effects are in the scientific
literature, things like that that really don't align very well
with commercial goals.
Peter Hopkins: And this is where Underlay comes in.
Underlay in
many respects is your attempt to kind of reclaim this technology
as the public good that you kind of initially envisioned it as.
Danny Hillis: Yes, it's my penance for having sold the other one
to Google.
Peter Hopkins: So I'm actually stuck on the screen here.
I
thought there was a very nice paragraph on the very simple
Underlay website, which basically in written terms explains kind
of what it's attempting to do.
And it says: The Underlay
aggregates statements and reported observations, along with
citations of who made and who published them. For example, it
would not contain the bare assertion that "Sudan's population
was 39M in 2008", but rather that "Sudan's population was
'provisionally' 39M in 2008, according to the UN's statistics
division in 2011, referencing Sudan's national census, as
reported by its Central Bureau of Statistics, and as contested
by the Southern People's Liberation Movement."
Danny Hillis: And it would do that not in those words, but in a
kind of machine-readable form.
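
As a rough illustration of what "machine-readable" could mean here, the Sudan example quoted above can be written as a structured record that keeps the bare assertion separate from the chain of attributions. This is only a minimal Python sketch, not the Underlay's actual schema: the class names, field names, and role labels are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Assertion:
    """The bare claim, with no information about who made it."""
    subject: str
    predicate: str
    value: str
    as_of_year: int
    qualifier: Optional[str] = None  # e.g. "provisional"


@dataclass(frozen=True)
class Attribution:
    """One link in the provenance: who did what with the claim."""
    source: str
    role: str  # hypothetical labels: "published_by", "reported_by", ...
    year: Optional[int] = None  # left empty when the text gives no date


@dataclass(frozen=True)
class Statement:
    """An assertion together with everyone attached to it."""
    assertion: Assertion
    attributions: Tuple[Attribution, ...]


def sources_with_role(statement: Statement, role: str) -> List[str]:
    """List the sources attached to a statement in a given role."""
    return [a.source for a in statement.attributions if a.role == role]


sudan = Statement(
    assertion=Assertion("Sudan", "population", "39M", 2008, "provisional"),
    attributions=(
        Attribution("UN statistics division", "published_by", 2011),
        Attribution("Sudan national census", "referenced"),
        Attribution("Sudan Central Bureau of Statistics", "reported_by"),
        Attribution("Southern People's Liberation Movement", "contested_by"),
    ),
)

print(sources_with_role(sudan, "contested_by"))
```

Keeping the attributions as data means a program can ask who contests a figure without parsing the sentence.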
Peter Hopkins: Right.
So that those could be - and ultimately
this version of what you are going after becomes almost a kind
of record of all of these observations over time, which can then
be tracked.
So if we wanted to get to the heart of, let's say,
whether in one of these hearings we just watched, somebody said
one thing or the other, we could potentially trace it back to
the first recorded instance.
Danny Hillis: Yes. And if you take a problem like that, I would
regard that as an application of the Underlay, just like Google
Maps, say, drawing a map is.
But if you take sorting through
fake news and recognizing when rumors are getting out of
control, in order to do that you really need a very complex
representation of who's saying what.
So you can kind of trace
whether this person said that or this person said that this
person said that. Or the New York Times said that, you know, the
Drudge Report said that.
And so there is something that needs to
be built on top of the Underlay that is essentially a network of
trust for that purpose. So somebody has to say, well, okay, I
trust the New York Times more than I trust Fox News, or vice versa.
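
The "this person said that this person said that" structure can be pictured as nested records that a program unwraps to find who is at the end of the chain. The sketch below is only an illustration in Python. The `Said` class, the placeholder claim, the trust scores, and the "weakest link" scoring rule are all assumptions, not something described in the conversation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Said:
    """A record that `speaker` said `content`.

    `content` is either plain text or another Said, which is how
    "this person said that this person said that" gets nested.
    """
    speaker: str
    content: Union["Said", str]


def trace(report: Said) -> Tuple[List[str], str]:
    """Unwrap a nested report into (speakers outermost-first, claim)."""
    speakers: List[str] = []
    node: Union[Said, str] = report
    while isinstance(node, Said):
        speakers.append(node.speaker)
        node = node.content
    return speakers, node


def weakest_link(speakers: List[str], trust: Dict[str, float]) -> float:
    """Score a chain by its least trusted speaker (unknown = 0.0)."""
    return min(trust.get(name, 0.0) for name in speakers)


# Placeholder claim and scores: none of these come from the source.
report = Said("New York Times",
              Said("Drudge Report",
                   Said("anonymous poster", "placeholder claim")))

speakers, claim = trace(report)
my_trust = {"New York Times": 0.8, "Drudge Report": 0.4}

print(speakers)
print(weakest_link(speakers, my_trust))
```

The trust table belongs to whoever is reading, so a different reader could pass a different `my_trust` and get a different score for the same chain.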
Peter Hopkins: And these would be organizations or individuals
with some sort of framework of analysis that would leverage the
Underlay for interpretative purposes.
Danny Hillis: And it's going to be for different purposes. I
mean an awful lot of the things that people argue about - I
mean, is Taiwan a province of China?
Well, you know, if you're
doing something with the Chinese government you've got to count
it as one. If you're doing something with Taiwan you're probably
not going to count it. So for some purposes it "is", for some
purposes it "isn't".
And so what's the truth of that? Well there
isn't exactly a truth.
It's, you know, what's the purpose,
what's the trust in it? and so on.
And many of these - so I sort
of feel like the Underlay is, in some sense it's a piece of the
plumbing that we need to deal with the fact that the amount of
information has become overwhelming, that no human can hold it
all in their heads.
Nobody can be sort of familiar with all the
news sources or things like that. And then that lets us build
these things on top of it where computers help us be smarter in
sort of navigating these networks of trust.
Peter Hopkins: And so you were conceiving of this challenge...
This was in the early-to-mid 2000s. What were the first inklings
of an approach that technology could provide to addressing this,
and to kind of capturing the chain of custody, if you will, of
information?
Danny Hillis: So the idea was to build something that basically
said what the agreed-on things were that you were talking about,
the entities that you were talking about...
Let people make
statements about the relationships between them but then have
some provenance of who made those statements, so that instead of
recording that "the glass is sitting on the table," you record,
"Danny said the glass is sitting on the table on such and such a
day."
And then once you have all that information recorded, then,
first of all, it lets you record information
without worrying too much about whether it's true. It's true that
I said that, which is much easier to determine than whether it's
true that the glass is actually on the table.
But then it also
lets you apply basically your idea of trust afterwards, after
you get more information about who I am - or later you find out
I'm a liar or later you find out the glass was someplace else.
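
The glass-on-the-table example can be sketched in a few lines: the log records only who said what and when, and trust is applied later as a separate step. This is a minimal Python illustration under stated assumptions. The `Claim` class, the numeric trust scores, the 0.5 threshold, and the date are all placeholders, not part of anything Danny describes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Claim:
    """A record that someone said something on a given day."""
    speaker: str
    content: str
    said_on: date


def accepted(log: List[Claim], trust: Dict[str, float],
             threshold: float = 0.5) -> List[str]:
    """Return the contents of claims whose speaker is trusted enough."""
    return [c.content for c in log
            if trust.get(c.speaker, 0.0) >= threshold]


# The date is a stand-in for "such and such a day".
log = [Claim("Danny", "the glass is sitting on the table", date(2019, 1, 1))]

trust_at_first = {"Danny": 0.9}
trust_later = {"Danny": 0.1}  # e.g. after finding out Danny is a liar

print(accepted(log, trust_at_first))
print(accepted(log, trust_later))
```

The log never changes between the two calls. Only the trust table does, which is the point of recording "Danny said it" rather than "it is true".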
Peter Hopkins: You can weigh those previous recordings against
it.
Danny Hillis: Exactly.
So the idea is that what we really need
to do is we need to separate out two things.
We need to separate the record of what different people said and
who said it - the provenance of what was said... And then
separately have in some sense a network of trust which is going
to be different for different purposes.
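
One way to picture the separation is a single shared record of who said what, with several independent trust networks laid over it, one per purpose. The Python sketch below is only illustrative. The claims are placeholders, and the trust scores and the `ranked_view` function are assumptions rather than a description of how the Underlay works.

```python
from typing import Dict, List, Tuple

# The shared record: (who said it, what was said). Placeholder claims.
Record = List[Tuple[str, str]]


def ranked_view(record: Record,
                trust: Dict[str, float]) -> List[Tuple[str, str]]:
    """Order the record by one trust network, dropping untrusted sources."""
    scored = [(trust.get(who, 0.0), who, what) for who, what in record]
    kept = [row for row in scored if row[0] > 0.0]
    kept.sort(key=lambda row: row[0], reverse=True)
    return [(who, what) for _, who, what in kept]


record: Record = [
    ("New York Times", "claim A"),
    ("Fox News", "claim B"),
]

# Two hypothetical trust networks for two different purposes.
trust_for_purpose_one = {"New York Times": 0.9, "Fox News": 0.3}
trust_for_purpose_two = {"New York Times": 0.3, "Fox News": 0.9}

print(ranked_view(record, trust_for_purpose_one))
print(ranked_view(record, trust_for_purpose_two))
```

Both views read from the same `record`, so disagreement about whom to trust never requires editing the record of what was said.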
Ultimately there are lots of kinds of knowledge that I think
really are fundamentally part of the public commons, the public
good.
And I hope that those will end up in it, and I think it's
not as complicated as copyright law where you're taking the
expression of the individual artist and things like that. A fact
is a fact. It's not copyrightable; you can't own truth.
If somebody
figures out the geographical location of this building, that's
just a truth.
Nobody owns that.
And, really, it's to everybody's
advantage to share that.