I'd like to check whether there is additional material for EDA and end-to-end ML pipelines (e2e MLP). There is some practice material in FG3, Chapter 6, and I found more through Google, but "good" can be subject to individual interpretation. I'd therefore like to align with the group on what is ideal to look at as we progress in our learning journey.
Likewise, it would be helpful if we could be pointed towards e2e MLP resources as well. Perhaps a sample Git repository?
I may have asked these in a separate post, but I wish to bring them up again, as my concern stems from my lack of conviction about what counts as "definitively good".
Many thanks for the help!
By the way, in FG9 the link to DataKind is outdated. Is there a replacement for it?
My teammate @Ryzal should have reached out to you regarding additional resources on E2E ML pipelining. For the benefit of the rest of the members here, and for future reference, here are links for learning more about that, as well as MLOps and EDA content:
Hi @CaffeinePowered. It seems my reply to you didn't get published. No worries, @Syak captured most of what I wanted to relay.
Regarding the link for DataKind SG, the working link here provides more information about the chapter. You can also join their Meetup group here. However, do note that their last session was back in 2019 and they have yet to resume. There are also other Meetup groups relevant to data analytics/science where you can get together with other like-minded individuals:
[quote]Assessment 2 requires applicants to perform exploratory data analysis (part 1) and build an end-to-end machine learning pipeline (part 2) on an unseen dataset. In part 1, applicants are expected to extract the dataset from a database before performing an exploratory data analysis of the dataset (EDA). In the EDA, applicants should develop a good understanding of the dataset by analyzing each feature individually as well as their interactions. Applicants should also form hypotheses about the dataset and verify them during the EDA. A good submission involves presenting these findings in a logical and well-thought-out process with insights.
In part 2, multiple machine learning models have to be developed and evaluated based on the given task. In the development of these models, applicants are required to document their design decisions in data pre-processing and reasoning to support their choice of models. In addition, models should be evaluated in a meaningful manner with an evaluation metric that is appropriate for the task at hand. When tackling a problem using different approaches, applicants may have to be creative to find a common evaluation metric to ensure consistency across comparisons.
As part of the submission process, applicants will be required to package their work for review. Submissions should include an iPython notebook (.ipynb file format), Python (.py file format) and bash (.sh file format) scripts. Other files required to recreate the development environment should also be included. Reproducibility is an extremely important tenet in machine learning and the ability to reproduce an applicant’s results is a key assessment point. Submissions should also demonstrate an understanding of software engineering fundamentals and all code submitted should incorporate proper coding conventions while being readable and easily understood.[/quote]
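To make the EDA expectations in the quote above a little more concrete, here is a minimal sketch of part 1: extracting a dataset from a database, then profiling each feature individually and looking at feature interactions. The table name, column names, and in-memory database here are all hypothetical stand-ins for the real assessment dataset.

```python
import sqlite3
import pandas as pd

# Build a tiny in-memory example database (stand-in for the real one;
# table "records" and its columns are illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (age REAL, income REAL, churned INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(25, 40000, 0), (37, 55000, 0), (52, 48000, 1), (41, None, 1)],
)

# Extract the dataset from the database
df = pd.read_sql_query("SELECT * FROM records", conn)
conn.close()

# Univariate analysis: summary statistics and missing values per feature
summary = df.describe()
missing = df.isna().sum()

# Interactions: pairwise correlations between numeric features
corr = df.select_dtypes("number").corr()
```

From here, a good submission would turn each of these views into a stated hypothesis (e.g. about missingness or a correlation) and verify it, rather than just printing the tables.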
I wish to clarify whether the following is correct:
EDA: digest the data, analyze, and visualize
MLP: data pre-processing, model training, evaluation, and logging (if applicable)