FB alumni and current employees share - Engineer
By Puput
at 2021-10-21T23:23
Hello everyone! I'd like to recommend a truly excellent podcast: Iterative Venture.
This post shares the highlights of EP4:
Building Data-Driven Products with Data Science Founders Panel
It is well worth your time, full of lessons learned and personal reflections.
If you're interested after reading, please give it a listen, and support the founder's Facebook fan page:
https://www.facebook.com/richardcheniterativeventure/
The podcast is updated monthly; subscribe, listen, and recommend it to your friends!
Most of the interview guests are current or former Facebook employees who share
distilled technical and growth topics, startup stories, and first-rate Silicon Valley tech insights and work experience.
The podcast's founder previously worked at Facebook in Silicon Valley in
Data Science, Data Engineering, and Backend Software Engineering.
He was also honored to be interviewed on the podcast 『矽谷為什麼』 about
how to enter the data science field and become a data scientist,
and through his fan page he shares all kinds of in-depth data topics!
According to media reports, "data scientist" is the sexiest job of the 21st century.
What do data scientists actually do?
What skills do they need, from statistics to building predictive models?
Why do we need data? How can historical data drive more accurate and effective decisions?
Over the next decade, how will data science meet the challenge of making decisions and assessing risk?
How do we build better data analysis and evaluation, whose design shapes a company's product analytics,
and use data to help the business reach its goals and surface new recommendations?
Whatever the business model, we all need more insight to solve problems.
【Podcast Highlights】
The panelists shared their perspectives on the current landscape of the Data
Science startup ecosystem, the importance of a data-driven decision-making
culture in building products, and the tools the founders on the panel are
building, such as Statsig's product experimentation tool.
【Discussion】
【Iterative Venture Newsletter: Data Landscape in the 2020s】
(From our podcast EP4 episode discussing the latest data trends.)
"First reckon, then risk." - Von Moltke.
Why we need data: a historical perspective
Oftentimes we overhear conversations about companies being acquired for their
“data” or their data platform, or about having clean (or unclean) “data”, etc.
What are they trying to say? Why would a company get acquired purely for its
data? What is data, and why do we need it anyway?
The way we understand data is that data are sample points much like the
experiences we have in life. The more we see the world, the more data points we
have, and therefore the more apt we are to make decisions (hopefully good ones)
based on the experiences we have had. This is no different for a company.
Thus, data is essential knowledge for a company, and our brains are just very
complex and efficient data infrastructures, enabling us to store data, make
decisions, explain those decisions, and iterate after incorporating new
information.
To make good decisions, therefore, we need many data points to learn from and
efficient ways to access them. This is what led Facebook to create Hive,
Presto, Scuba, and the like, each suited to a different problem: quick data
access for debugging and simple insights (Scuba), a cost-effective big-data
warehouse for heavy data crunching (Hive), and a mixture of the two (Presto),
all so that there can be more “citizen data scientists,” as per Ashu.
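
As a concrete illustration (ours, not from the episode), here is a minimal
sketch of what interactive access to such a warehouse can look like, using the
open-source PyHive client against a Presto coordinator; the host and the
events table are hypothetical placeholders:

# A minimal sketch: an ad-hoc aggregate query against a Presto cluster.
# The coordinator host/port and the app_events table are hypothetical.
from pyhive import presto

conn = presto.connect(host="presto.example.internal", port=8080)
cur = conn.cursor()

# The kind of quick, interactive insight Presto was built to serve.
cur.execute("""
    SELECT event_name, COUNT(*) AS n
    FROM hive.default.app_events
    WHERE ds = '2021-10-20'
    GROUP BY event_name
    ORDER BY n DESC
    LIMIT 10
""")
for event_name, n in cur.fetchall():
    print(event_name, n)
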
The new decade
Humanizing AI
At the turn of the new decade, with rising optimism about what machine
learning models and AI can bring, so too comes the challenge of explaining how
such models make their decisions. This is where Krishna's company, Fiddler.ai,
comes in.
(Fiddler.ai: https://www.fiddler.ai/)
When it comes to complex machine learning models such as random forests,
XGBoost, and neural networks, the models can be so complex, with so many
intertwined interactions, that explanation is simply impossible: there is no
single formula we can boil them down to.
For reference, below is a high-level illustration of how a neural network
makes a decision, with each circle representing a variable:
(Picture 2: Source: IBM)
Just how does it work? Many of us just take it as a black box.
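
To make the "black box" concrete, here is a toy sketch (our own illustration,
not from the episode) of a two-layer neural network's forward pass; the
weights are made up, and even at this scale the decision is a tangle of
weighted sums and nonlinearities rather than one readable rule:

# A toy two-layer neural network forward pass with made-up weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.7, 0.1, 0.4])        # input variables (the circles)
W1 = np.array([[0.2, -1.3, 0.5],
               [0.9, 0.4, -0.8]])    # hidden-layer weights (hypothetical)
b1 = np.array([0.1, -0.2])
W2 = np.array([[1.1, -0.7]])         # output-layer weights (hypothetical)
b2 = np.array([0.05])

h = sigmoid(W1 @ x + b1)             # hidden activations
y = sigmoid(W2 @ h + b2)             # decision score in (0, 1)
print(y)                             # e.g. approve if y > 0.5
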
We spoke with Krishna (Podcast EP1) a while ago about this, and he said that
the reason he is solving this problem is that there is a monumental shift
underway in traditional industries such as banking and insurance.
Namely, there is a paradigm shift in the way risks are assessed. In the past,
models were deterministic: think of a complex if-else statement that evaluates
whether someone is a risk and therefore whether we should deny their credit
card application. The good thing is that we know why we rejected someone. The
downside is that the model is fixed and the importance of each criterion is
pre-determined.
Machine learning models, on the other hand, can easily incorporate new data
and can easily be swapped for other models, and thus offer a dynamic solution
to the problem.
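
As a rough sketch of that contrast (our own illustration, with made-up
features, thresholds, and training data), compare a fixed rule-based check
with a learned model that can be retrained or swapped out:

# Deterministic rules vs. a learned model for a credit decision.
# Features, thresholds, and training data are hypothetical.
from sklearn.linear_model import LogisticRegression

def rule_based_decision(income, debt_ratio):
    # Fixed criteria with pre-determined importance: explainable, but rigid.
    if income < 30_000:
        return "deny"
    if debt_ratio > 0.4:
        return "deny"
    return "approve"

# The learned alternative: retrain on new data, or swap in another model.
X = [[55_000, 0.2], [28_000, 0.5], [80_000, 0.1], [32_000, 0.45]]
y = [1, 0, 1, 0]  # 1 = repaid, 0 = defaulted (toy labels)
model = LogisticRegression(max_iter=1000).fit(X, y)

print(rule_based_decision(40_000, 0.3))   # transparent rule
print(model.predict([[40_000, 0.3]]))     # dynamic, but harder to explain
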
Check out the podcast episode we had with Krishna to learn more about how
Fiddler is solving this problem.
Faster Iteration Loop
Another challenge tech companies face in the new decade is the increasingly
competitive landscape, with more and more entrants.
For reference, the number of apps hitting the Apple App Store is growing
exponentially.
(Picture 3: Source: Statista (note the time scale changes in 2020))
In this landscape, differentiation and the ability to learn and incorporate new
changes become absolutely key. To this point, there is a saying in Silicon
Valley that if there are two startups in competition against each other, the
one that iterates faster will win.
This is where Vijaye and Statsig (short for statistical significance) come in.
What Statsig hopes to achieve is to let any tech company easily set up the
gatekeeper mechanism (the ability to show a feature being rolled out to some
users and not others, for comparison purposes) and the dashboards that collect
the relevant statistics for the control and test groups, in order to better
understand whether a product launch achieved its intended purpose.
Furthermore, Statsig lets companies gradually roll out their features, release
by release, in a controlled fashion.
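
For intuition, here is a minimal sketch of one common way such gating and
gradual rollout can be implemented (a simplified illustration of ours, not
Statsig's actual mechanism): deterministically hash the user and gate name
into a bucket, so each user consistently lands in the test or control group:

# A simplified gatekeeper: deterministic bucketing by user and gate name.
# An illustrative sketch, not Statsig's actual implementation.
import hashlib

def check_gate(user_id: str, gate_name: str, rollout_pct: int) -> bool:
    """Return True if this user should see the gated feature."""
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100       # stable bucket in [0, 100)
    return bucket < rollout_pct          # e.g. 10 -> roughly 10% of users

# Gradual rollout: raise the percentage release by release.
print(check_gate("user_42", "new_checkout_flow", rollout_pct=10))
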
This not only lets teams rely on one platform for both releases and data
collection, it also informs the relevant product teams of the effect of each
release, inherently building in a data-driven culture, since every release can
now be backed by statistics and data.
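
Deciding whether a launch "worked" then typically comes down to a significance
test between the two groups. Here is a minimal sketch with made-up metric
samples, using SciPy:

# Did the test group move the metric? A two-sample t-test on made-up data.
from scipy import stats

control = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3]   # e.g. sessions per user
test = [2.6, 2.4, 2.8, 2.5, 2.7, 2.3, 2.9]

t_stat, p_value = stats.ttest_ind(test, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant difference between test and control.")
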
Democratizing Data Technology
From a broader overview, and from investors' perspectives, there is also a
general democratization of data technology, as well as vertical integration
for efficiency, as per Ashu and Ravi.
As more and more folks from companies such as Google, Amazon, and Facebook
spread the data-driven culture, more and more people are realizing the power
of data and are seeking to democratize its use, allowing for more “citizen
data scientists”.
One angle is to lower the technical bar for achieving the same result. One
prominent example we came across in the past is Looker, acquired by Google in
2019 for $2.6 billion. Looker treats data as objects: users create LookML
models (data objects) that can be joined with other datasets, enriching the
data in a drag-and-drop fashion and reducing the need to write complex joins
in SQL.
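
For a sense of what is being abstracted away, here is a sketch of the kind of
hand-written join and aggregation a drag-and-drop modeling layer replaces,
using pandas with hypothetical tables and columns:

# The hand-written join/aggregation a modeling layer abstracts away.
# Tables and columns are hypothetical.
import pandas as pd

orders = pd.DataFrame({"user_id": [1, 2, 2], "amount": [30, 45, 10]})
users = pd.DataFrame({"user_id": [1, 2], "region": ["US", "TW"]})

# Equivalent SQL: SELECT region, SUM(amount) FROM orders
#                 JOIN users USING (user_id) GROUP BY region
enriched = orders.merge(users, on="user_id", how="left")
print(enriched.groupby("region")["amount"].sum())
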
On the other hand, having vertically integrated platforms also means that
fewer people can run the same system, since systems can be centrally
configured without companies having to write custom software.
One prominent example is the rise of cloud technology: in the past, a company
might have had to set up its own data centers, but now everything can be
configured elastically via AWS, GCP, or Microsoft Azure.
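
As a small illustration of "configured elastically" (our example, not from the
episode), provisioning storage that once meant racking hardware is now a
single API call, e.g. with AWS's boto3 SDK; the bucket name is a hypothetical
placeholder:

# Provisioning cloud storage as an API call instead of racking hardware.
# The bucket name is hypothetical; assumes AWS credentials are configured.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket="example-analytics-landing-zone")
print("Bucket ready; no data center required.")
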
This trend is only accelerating as buzzwords such as DataOps and MLOps enter
common use.
Just a matter of time
With these trends emerging, the challenge for many companies, especially older
ones, is that a lot of data still resides in on-premises data centers, as per
Ashu. Because of archaic technologies, such data centers do not lend
themselves to transferring data out easily, so people have to physically
remove the hard drives in order to copy over the data.
On the other hand, while these trends are emerging in Silicon Valley and other
tech hubs, many companies elsewhere (a good number of which are legacy ones)
may not be able to afford the same level of compensation for tech employees,
making it hard to justify those employees joining them and spreading the
data-driven culture outside of Silicon Valley.
However, as with all problems, it is only a matter of time before this culture
spreads beyond the Valley, and I have no doubt that talented folks will come
up with new solutions that are cheaper, more accessible, and easier to use, so
that companies of all kinds can adopt more data-driven solutions and,
ultimately, shape a more data-driven culture.
Podcast Links:
Spotify: https://lnkd.in/gYSm-Vvm
Apple: https://lnkd.in/g6kYiitq
Google: https://lnkd.in/gB_p7gMR
Any feedback would be appreciated
Thank you so much for reading this far.
If this article helped you learn something,
please head to Facebook and
like and follow Richard Chen - Iterative Venture.
If you have any questions, feel free to message the fan page, or leave a comment below :)
--
Tags:
Engineer