Hello, I'm Professor World. Today we'll be discussing how to aggregate network flows into application flows. If you remember from previous lessons, we talked about learning, or discovering, application flows from network traffic, either by deploying a sniffer and capturing full pcap files, or by using a NetFlow source and processing that.
Today I want to look a little deeper into this processing part, to see what sort of considerations go into doing that well.

Our starting point is, let's say, a NetFlow capture: a list of records, each with a source, a destination, and a service, which we want to represent as flows supporting some application. If we look
at this, an easy observation to make is that you can see a lot of repetition in the service, and also, here in this example, all the destinations are the same: it's the one IP address for all these destinations, and there are multiple sources.

So the easiest thing to do is to take this representation and convert it to something that looks like what we have here in option number one at the top, where the service is listed once, the destination is listed once, as just an IP address, and in the source we see the list of all the IP addresses that were discovered in the NetFlow, sorted by IP address. This can be done quite easily. However, I would argue that
this is not very satisfactory, for a few reasons. First of all, it's too detailed. Here we have only six, but there could be hundreds or thousands of separate IP addresses appearing in the source, and this is too long and too detailed for a person to look at and understand.

It's also very accurate; it's too accurate. It records exactly the IP addresses that were seen in the network capture, but it's not future-proof. I mean, if you look at this list and you see that IP addresses .7, .8, and .13 connected to this destination, it's quite plausible that IP addresses .9, .10, .11, and .12 would also connect to the same web server at some point in the future. They just didn't do so while we were capturing the traffic. If we restrict ourselves to only the IP addresses that we observed, we get something that is very accurate but not future-proof. So this might not be the best way to represent the flows that we observed.

An
alternative is what we have here in number 2. Instead of having the individual IP addresses listed in the source, we just have the source of "any", and then the destination would be that web server, and the service was HTTP. In terms of usability this is great: this is very compact, just a single record describing all the flows that were actually observed, and it's completely future-proof. Every possible IP address could appear in the source and the flow still describes it. So this is good.
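
To make options 1 and 2 concrete, here is a minimal Python sketch. It is not the tool's actual implementation. The record list is invented for illustration: the destination `192.0.2.10`, the two sources `10.2.1.21` and `10.2.1.40`, and the full form of the other addresses are assumptions, as are the function names `exact_flows` and `any_source_flows`.

```python
from collections import defaultdict
from ipaddress import ip_address

# Hypothetical NetFlow-style records: (source, destination, service).
# All addresses here are illustrative assumptions, not real capture data.
RECORDS = [
    ("10.2.1.13", "192.0.2.10", "http"),
    ("3.7.1.7", "192.0.2.10", "http"),
    ("10.2.1.7", "192.0.2.10", "http"),
    ("10.2.1.8", "192.0.2.10", "http"),
    ("10.2.1.7", "192.0.2.10", "http"),  # a repeated connection
    ("10.2.1.21", "192.0.2.10", "http"),
    ("10.2.1.40", "192.0.2.10", "http"),
]


def exact_flows(records):
    """Option 1: one flow per (destination, service), listing every observed source."""
    sources = defaultdict(set)
    for src, dst, svc in records:
        sources[(dst, svc)].add(ip_address(src))
    # Sorting address objects orders them numerically, not as strings.
    return {key: sorted(addrs) for key, addrs in sources.items()}


def any_source_flows(records):
    """Option 2: the same grouping, with the source collapsed to 'any'."""
    return {(dst, svc): "any" for _, dst, svc in records}


option1 = exact_flows(RECORDS)
option2 = any_source_flows(RECORDS)

for (dst, svc), srcs in option1.items():
    print("option 1:", [str(s) for s in srcs], "->", dst, svc)
for (dst, svc), src in option2.items():
    print("option 2:", src, "->", dst, svc)
```

Parsing the addresses with `ipaddress` rather than sorting the raw strings matters here: as text, `10.2.1.13` would sort before `10.2.1.7`.
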
The downside is, of course, that it is too broad. This is very, very inaccurate. It's not that we saw every single IP address in the Internet, and the IP addresses we did see are not uniformly distributed. They are quite focused: you can see that the 10.2 subnet is quite visible, and then we have this outlier in the 3.7 IP address, but it's still pretty focused. So we could do better than this as well.

Trying to strike the balance
is what I have here in option number 3, where you can see that the destination and service are as before, but in terms of the source, the 3.7.1.7 IP address appears separately as a /32 CIDR block, and the five other IP addresses appear as 10.2.1.0/24, which is wider than what was strictly observed. This allows, or describes, 256 possible IP addresses, but they're all grouped together in one subnet. So it is future-proof up to a point: all the IP addresses in the 10.2 subnet are described by this flow. It's not too accurate, and it's not as detailed as what we had up here, but it's reasonably accurate, and it's quite usable because it's still very compact.

So it is possible to find this
middle ground algorithmically, trying to balance the accuracy against the usability, and to produce a compact representation that's still reasonable and usable. Thank you for your attention.
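
One simple way to sketch the middle ground of option 3 in Python is shown below. This is an assumed heuristic, not necessarily the algorithm the lecture has in mind: sources are bucketed by their enclosing /24, a bucket with at least `min_hosts` observed addresses is widened to the whole subnet, and anything else stays a /32. The names `aggregate_sources` and `precision`, the threshold, the fixed /24 prefix, and the two addresses `10.2.1.21` and `10.2.1.40` are all assumptions made for illustration, and the sketch handles IPv4 only.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Illustrative observed sources; the exact addresses are assumptions.
OBSERVED = ["3.7.1.7", "10.2.1.7", "10.2.1.8", "10.2.1.13", "10.2.1.21", "10.2.1.40"]


def aggregate_sources(addresses, prefix_len=24, min_hosts=2):
    """Widen well-populated subnets to CIDR blocks, keep outliers as /32."""
    buckets = defaultdict(set)
    for text in addresses:
        addr = ip_address(text)
        # strict=False masks the host bits, yielding the enclosing subnet.
        subnet = ip_network(f"{addr}/{prefix_len}", strict=False)
        buckets[subnet].add(addr)

    blocks = []
    for subnet, members in buckets.items():
        if len(members) >= min_hosts:
            blocks.append(subnet)  # enough evidence: describe the whole subnet
        else:
            blocks.extend(ip_network(f"{a}/32") for a in members)
    return sorted(blocks)


def precision(addresses, blocks):
    """Rough accuracy measure: observed addresses / addresses described."""
    observed = len({ip_address(a) for a in addresses})
    described = sum(block.num_addresses for block in blocks)
    return observed / described


blocks = aggregate_sources(OBSERVED)
print([str(b) for b in blocks])
print("entries:", len(blocks), "precision:", precision(OBSERVED, blocks))
```

Raising `min_hosts` pushes the result back toward option 1 (every source as its own /32), while shortening `prefix_len` pushes it toward option 2, so these two parameters are one way to express the accuracy versus usability trade-off.
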